Post by leafo in Jam ratings calculation issue

Viewing post in Jam ratings calculation issue

I really appreciate your response, but my original points still stand.

The design goal of the current algorithm does not include that you should individually have direct control over if your score adjusted or not. Just like you can’t directly control how many people how many people decide to rate your game. You can only influence how visible your game is by how much you participate in rating & comments.

The goal is to rank each project relative to every other project. As an individual participating in a ranked jam you should be prepared to accept that there may be entries that organically are more well received than yours, with a high average and number of ratings.

I full understand your point about the aliasing that happens around the median. This aliasing still happens even with your suggestion, but will happen at a different spot and, as I mentioned, increase the odds of the ‘fluke’ scenario: a game with 6 votes ranking above a game with 200 votes (since generally medians are pretty low). By lowering the bar you diminish the reward that the proven submissions get (those with high number of ratings & score).

I also want to note, since I don’t know if you are even familiar with the algorithm, but the score adjustment is avg_score * sqrt(min(median, votes_received) / median). Note the sqrt which effectively reduces the slope of the penalty for those near the median, trying to mitigate some of the aliasing issue. (One experiment worth exploring might be changing the denominator of the exponent)

Another thing to note is the nature of games jams, especially apparent in larger ones. There are quite a few broken, incomplete, partially made, or just low quality entries. Many people will skip voting these depending on the circumstances. The median can map over this curve of completion for entries. There’s also the group of people who don’t participate in voting so their games get restricted exposure.

I can generally confidently suggest to anyone that if they put in a genuine effort into their game (including its presentation) and put in a genuine effort into rating entries they’re unlikely to get caught in the adjusted group since the median work required is lower than they might think.

(It’s possible your intuition is mixing up average effort with median effort for number of ratings. The average number of ratings is always larger than the median. Your concerns are based around the scenario when median == average, which doesn’t really happen)

Also, one last thing, if a jam host ever reaches out and they have specific goals in mind with how voting and ranking works, I’m happy to work with them.

Lastly, if you’re curious to experiment, here are some ranked jams you can look through:

https://itch.io/jam/brackeys-4/results
https://itch.io/jam/cgj/results
https://itch.io/jam/brackeys-3/results
https://itch.io/jam/lowrezjam-2020/results
https://itch.io/jam/igmc2018/results
https://itch.io/jam/united-game-jam-2020/results
https://itch.io/jam/nokiajam3/results

Like Reply

Alice3 years ago(+2)

Thank you for your response (and, uh, for other responses so far too ^^).

I understand the point about the design goal of the system, and it being primarily to compare entries against one another. My point about agency likely stems from this emphasized suggestion "participate in rating games" and generally encouraging participation. It's a very valid message, but also made it sound like it's primarily participant's responsibility to get their game above median, rather than a (closer to reality) combination of participant's involvement and out-of-control factors.

I'd like to point out there are two key aspects to the Jam experience:

"global" aspect - what kind of games were created and whether top-ranked entries deserve their spots (since it's the top places people are mostly excited about)
"individual" aspect - how one's entry performed, both in terms of feedback and ranking

The median measure seems focused on improving the global aspect - making sure that ratings are fairer.
Except it's a finnicky measure, because:

You mention 6-votes entry ranking above 200-votes entry, which I presume is about the 80% (rounded-up) measure of mine; but this implies the median is 7, and 7-votes entry ranking above 200-votes entry doesn't seem like a massive improvement.
In a recent (non-itch) Jam I participated in, there were 25 entries with votes from 19 entrants + 4 more people. Most of them ranked nearly all entries (people couldn't rank their own entry). The median-and-above entries got 20-22 votes, the below-median entries got mostly 18-19 votes (two entries got 14 and 16 votes). Also, one of the 19-voted entries was 5th out of 25, making it a relevant contender.
With the strict median measure, an entry getting 19 votes would have its score adjusted while 22-votes (most-voted entry) would not. It means that, depending on a situation 19 is deemed too unreliable vs 22, but in another Jam 7 seems reliable enough vs 200. Now, even with my proposal it would be 16-22 vs 6-200 spread, but it goes to show that median system adds extra noise - potentially near top-ranking entries, too - when all entries are voted-for almost evenly. The difference is that raw median semi-randomly punishes 11 out of 25 entries, while with my adjustment only 1 of 25 entries qualifies for score adjustment - that's one 1 less!

I guess the problem of extreme-voted entries can be tackled two-fold (maybe even both measures at once):

Promote the high-ranking (e.g. top 5%) low-voted entries, so that more people will see them and either prove their worth or get them off their high horse. People don't even need to specifically be aware these are near-top entries (especially since temporary score isn't revealed), what matters is that they'll play, rate and verify.
It's sort of "unfair" for poorer-quality entries, but chances are already stacked against them and it can improve the quality of top rankings by whittling down the number of undeserved all-5-star outliers. And let's face it - who really minds that 6-voted entry with all 2s ranks above a 200-voted entry with mostly 2s and some 1s?
More work in this one, but with great potential to improve jam experience - streamline the voting process.
In that Jam I mentioned, we have a tool called "Jam Player". It's packaged with the ZIP of all games, and from there you can browse the entries, run their executables, write comments, sort entries etc. As the creator of the Jam player I might be blowing my own horn, but before lots of voters played only a fraction of games. Ever since introducing the Jam player, the vast majority of voters play all or nearly all entries, even when the number of entries reach 50 or so (with 80 the split between complete-played and partially-played votes was more even, but still in favour of complete-played).
I imagine similar tool for integrated voting process could work for itch.io - obviously there are lots of technical challenges between a ZIP-embedded app for a local jam and a tool handling potentially very large Jams, but with itch.io hosting all the Jam games it might be feasible (compare that with Ludum Dare and its free links). With such a player app, same people would play more entries, making the votes distributions more even and thus reliable (say, something like 16/20 vs 220 instead of 6/7 vs 200).
Perhaps I should write up a thread on the itch.io Jam Player proposal...

The 80% median seeks to improve the individual aspect - making sure it's easier to avoid the disappointment of getting score adjusted on own entry despite one's efforts

If someone cares about not getting their score adjusted and isn't a self-entitled buffoon, they'll do their best to participate and make their entry known. If someone doesn't care, then they won't really mind whether their entry gets score adjusted or not. The question is, how many people care and how many don't.
If less than 50% people care, they'll likely end up in the higher-voted half of entries. Thus, no score adjustment for them, the lower-voted half doesn't care, everything is good.
However, if more than 50% people care, there'll inevitably be some that get in the score-adjusted lower half. E.g. if 70% people cared about score adjustment, then roughly 20% would get score-adjusted despite their efforts not to. The score adjustment might not even be that much numerically, but it still can have a psychological impact like "I failed to participate enough" or "I was wronged by bad luck". I'm pretty sure it would sour the Jam experience, which goes against the notion of "the jam is the best experience possible for as many people as possible". The fact that 60-70% Ludum Dare entries end up above 20 entries threshold, and that 19/25 entrants voted in the Jam I mentioned, I'd expect in a typical jam at least half of participants would care.

Do note that in the example Jam from earlier, 9 of 19 voting entrants would get score-adjusted with 100% median system despite playing and ranking all or near-all entries. Most of that with quality feedback, too, you can hardly participate more than that. Now, I don't know how about you, but if I lost a rank or several to the score-adjustment despite playing, ranking and reviewing all entries - just because someone didn't have time to play my game and its votes count got below median - I'd be quite salty indeed.
With the 80% median system, all voting entrants would pass at the cost of 16 vs 22 variance, which isn't all that great compared to 20 vs 22 variance (the least voted entrant didn't vote).

To sum it up:

if the votes count variance is outrageous in the first place (like 6/7 vs 200), then sticking to strict median won't help much
if the votes count variance is relatively tame (like 18 vs 22), then using strict median adds more noise than it reduces
provided that someone cares about score adjustment and actively participates to avoid it, the very fact of score-adjustment can souring/discouraging, even if the adjustment amount isn't all that much
rather than adhering to strict median, the votes variance problem may be better solved by promoting high-ranked low-voted entries (so that they won't be so low-voted anymore) and increasing number-of-votes-per-person by making the voting process smoother (like the Jam Player app; this one is ambitious, though)
with more votes-per-person and thus more even distribution of votes, we should be able to afford a leeway in the form of 80% median system

Also, thanks for the links to the historical Jams. Is there some JSON-like API that could fetch the past Jam results (entry, score, adjusted score, number of times entry was voted on) for easier computer processing? Scraping all this information from webpages might be quite time-consuming and transfer-inefficient.

Like Reply