Because measuring two objects against an inferior benchmark does not make them equal, as I pointed out. Whether the context is PvE or PvP is irrelevant to how the data is measured. The difference is that PvP exposes the imbalances, because the other players are actually capable of using them.
Consider the AI to be the control group in the context of balance. Everything the player has is pitted against this test.
The experimental test is determining whether a player with x is stronger than a player with y. Any huge imbalances can more or less be found on paper without applying PvP situations at all. If a little bit of math shows that x=50 and y=25, there's probably a game-breaking discrepancy between the two.
I know why you think this is the best approach, but I disagree.
First, calling the AI an inferior benchmark ignores that it is the one that matters for SP.
And second, it completely leaves out that the AI is a differently weighted challenge. Compared to a human player, the AI has advantages and disadvantages that another human player doesn't have. Using it as a control group doesn't work either, because it is not a baseline compared to players.
The issue is more that when X returns 50, Y returns 40, and the AI returns 45, simply comparing X and Y and deciding to bring them in line with each other is a Hail Mary with regard to the AI. And that's before we compare the summands that make up those totals.
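To put rough numbers on that, here is a minimal sketch in Python. The values 50, 40 and 45 are the ones from the example above; the three "alignment" targets and the helper names (margin_vs_ai, align) are my own assumptions for illustration, not anything taken from the game.

```python
# Values from the example above; everything else is a hypothetical illustration.
AI = 45
options = {"X": 50, "Y": 40}


def margin_vs_ai(value, ai=AI):
    """Positive: the option beats the AI. Negative: it loses to the AI."""
    return value - ai


def align(opts, target):
    """Pure X-vs-Y balancing: set every option to the same target value."""
    return {name: target for name in opts}


# Margins against the AI before any balancing.
before = {name: margin_vs_ai(v) for name, v in options.items()}

# Three ways to make X and Y "equal" (assumed targets, for illustration).
targets = {"nerf X to Y": 40, "buff Y to X": 50, "meet in the middle": 45}
results = {}
for label, target in targets.items():
    aligned = align(options, target)
    results[label] = {name: margin_vs_ai(v) for name, v in aligned.items()}

print("before:", before)
for label, margins in results.items():
    print(label, margins)
```

All three targets make X equal to Y, so on paper each one "fixes" the imbalance. Against the AI they give three different results: both options lose, both win, or both break even. Which of those you end up with is not something the X-vs-Y comparison can tell you.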