Small sample size is what I meant by unpredictable and small numbers. There is just not enough data to say anything, and the data that is there is very noisy and prone to random fluctuations.
And what makes it extra hard: you cannot even measure the quality of a game in an objective, neutral way. So you also cannot measure any of the ratios and other numbers that would matter, like quality per time spent creating it, success as a function of quality, and so on.
It would be as futile as judging the quality of a book by its page count.
And with games (and books) you would also need the same target audience to even begin contemplating a comparison. In practical terms, the games would have to share most of their tags, and the tags would have to be accurate. Most games inside the same jam do not even share that much. They might share exposure to a core group of people who rate them. That should at least give a bit of information about the appeal and fun the games achieved. But this is art and taste, not the 713th installment of a sports game series.
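
To make the "share most of their tags" point a bit more concrete, here is a rough sketch of how tag overlap between two games could be checked. The function name, the tag lists, and the choice of Jaccard similarity (shared tags divided by all distinct tags) are my own assumptions for illustration, not something any jam site actually computes, and it only helps if the tags are accurate in the first place.

```python
def tag_overlap(tags_a, tags_b):
    """Jaccard similarity: shared tags divided by all distinct tags."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0  # no tags at all: treat the pair as not comparable
    return len(a & b) / len(a | b)


# Hypothetical tag lists for two entries in the same jam
game_a = ["puzzle", "pixel-art", "singleplayer", "short"]
game_b = ["horror", "3d", "singleplayer", "short"]

overlap = tag_overlap(game_a, game_b)
print(f"{overlap:.2f}")  # prints 0.33: 2 shared tags out of 6 distinct ones
```

Two games from the same jam that only share generic tags like these end up with a low score, which is roughly what I mean by them not having the same target audience.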