Over the last week I have been programming an API that retrieves your comments under every game submitted to the "Brackeys Jam" and calculates a "Comment Uniqueness Index", abbreviated as "CUI". The CUI is a value between 0 and 100,000, where 0 means that every comment you have made is a copy of every other comment and 100,000 means that you have never used the same word twice.
So without further ado, here are my findings:
[Figure: Bell curve of the CUI scores]
Ratings, top 10:
Rank | Name | Game ID | CUI | Number of comments*
1 | OgelGames | 722621 | 90536 | 43 |
2 | basaOnly | 720648 | 90187 | 19 |
3 | Pix1lDev | 723136 | 89994 | 40 |
4 | Haredo | 724910 | 89690 | 48 |
5 | sohwathismelvin | 723687 | 89666 | 115 |
6 | Jason Flores | 723454 | 89515 | 29 |
7 | ldd | 718085 | 89478 | 21 |
8 | Mr. Minticuz | 724863 | 89453 | 30 |
9 | Tipu | 722753 | 89394 | 30 |
10 | NoelOskar | 724990 | 89269 | 28 |
In total, 751 participants met the minimum requirement for a score (at least 5 comments on other participants' rating pages). Because of the post limits in the community tab, I couldn't include all the scores here. If you wish to see the full list, visit my GitHub and open "CUI_Score.pdf".
*Note that comments under your own submission aren't counted, since it is reasonable to assume you wrote a lot of "thank you"s there, which would in turn count against your score.
Some questions you might have:
If you can't find yourself in the ratings, it could be for one of several reasons:
- you didn't meet the minimum requirement of 5 comments on the rating pages of other participants.
- this data set was taken on Sunday the 16th, so it might not include comments you have written since then.
- some of your comments got eaten by the database Kraken (he doesn't like special characters) and I didn't bother adding them manually. I am a lazy void after all.
- your name contained special characters. In this case you are still in my database, but with a jumbled name.
Note that a low score doesn't automatically mean that you are a bot or a bad rater. Some low scores can be explained by the low minimum requirement of 5 comments. Short, generic comments ("I liked the game", "really good implementation of the theme") as well as heavy use of common words ("a", "the", "I"...) can lead to a low score. The algorithm is very simple and doesn't necessarily reflect the content of the comments accurately.
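To illustrate why short, generic comments pull the score down, here is a small sketch of the Jaccard index on word sets: the number of distinct words two comments share, divided by the number of distinct words in either. The tokenization (lowercasing and keeping runs of letters, digits and apostrophes) is my assumption; it is not necessarily how the actual API splits words.

```python
import re


def word_set(comment: str) -> set[str]:
    # Assumed tokenization: lowercase, then keep runs of letters, digits and apostrophes.
    return set(re.findall(r"[a-z0-9']+", comment.lower()))


def jaccard(a: str, b: str) -> float:
    # Shared distinct words divided by all distinct words; two empty comments count as identical.
    x, y = word_set(a), word_set(b)
    union = x | y
    return len(x & y) / len(union) if union else 1.0


# Two short, generic comments: {i, liked, the} shared out of 5 distinct words.
generic = jaccard("I liked the game", "I liked the art")
# Two more specific comments share only the common word "the": 1 out of 8.
specific = jaccard("Clever use of the theme", "The music fits perfectly")
print(generic, specific)  # 0.6 0.125
```

A higher similarity means a lower CUI, so a rater who mostly writes comments like the first pair ends up with a lower score than one who writes like the second, even if both played the games carefully.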
About the process:
I wrote an API in C# that collects all games and authors from the Brackeys Jam 2020.2 submissions page and saves them in a MySQL database. It then goes through the list of submitted games and collects all comments made on the rating page of each game.
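The filtering rules from above (skip comments under your own submission, require at least 5 remaining comments) can be sketched as a query over the stored data. This is a hypothetical example: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table names, columns and the one-author-per-game simplification are my assumptions, not the real schema.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: SQLite stands in for the MySQL database,
# and each game is simplified to a single author.
SCHEMA = """
CREATE TABLE games (game_id INTEGER PRIMARY KEY, author TEXT NOT NULL);
CREATE TABLE comments (
    game_id   INTEGER NOT NULL REFERENCES games(game_id),
    commenter TEXT NOT NULL,
    body      TEXT NOT NULL
);
"""

# Every comment on someone else's rating page, in insertion order per commenter.
OTHERS_COMMENTS = """
SELECT c.commenter, c.body
FROM comments AS c
JOIN games AS g ON g.game_id = c.game_id
WHERE c.commenter <> g.author  -- skip comments under your own submission
ORDER BY c.commenter, c.rowid
"""


def eligible_comments(conn: sqlite3.Connection, min_comments: int = 5) -> dict[str, list[str]]:
    """Group comments by commenter and keep only commenters meeting the minimum."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for commenter, body in conn.execute(OTHERS_COMMENTS):
        grouped[commenter].append(body)
    return {name: bodies for name, bodies in grouped.items() if len(bodies) >= min_comments}


# Tiny in-memory example with made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO games VALUES (?, ?)", [(1, "alice"), (2, "bob")])
rows = [(2, "alice", f"comment {i}") for i in range(5)]  # 5 comments on bob's game
rows += [(1, "alice", "thank you!"), (1, "bob", "nice game")]  # own-game reply, plus bob's only comment
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
eligible = eligible_comments(conn)
print(eligible)  # {'alice': ['comment 0', 'comment 1', 'comment 2', 'comment 3', 'comment 4']}
```

Here alice's "thank you!" under her own game is dropped, and bob is left out entirely because he only has one comment.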
This data set is then fed into a Jaccard index calculation function that compares every comment a participant made under other people's games with every other such comment, including itself. Comparing a comment with itself always yields an index of 1, which introduces some inaccuracy that I accept (because I am a lazy void). The average is calculated from these indexes. Finally, I subtracted this average from one and multiplied the result by 100,000. The only reason for this is to make the scores look more impressive; displaying the CUI as a float from 0 to 1 would have been basically the same.
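The calculation described above can be sketched in a few lines. This is not the actual C# code: the tokenization is assumed, and I read "every other such comment, including itself" as all ordered pairs of comments (so each pair is counted twice and every comment is compared with itself once).

```python
import re
from itertools import product


def word_set(comment: str) -> set[str]:
    # Assumed tokenization: lowercase, keep runs of letters, digits and apostrophes.
    return set(re.findall(r"[a-z0-9']+", comment.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cui(comments: list[str]) -> int:
    """Comment Uniqueness Index: (1 - mean pairwise Jaccard index) * 100,000."""
    sets = [word_set(c) for c in comments]
    # Every comment is compared with every comment, itself included (ordered pairs),
    # so the n self-comparisons each contribute an index of 1.
    indexes = [jaccard(a, b) for a, b in product(sets, repeat=2)]
    mean_index = sum(indexes) / len(indexes)
    return round((1 - mean_index) * 100_000)


copy_paste = ["Great game!"] * 5
distinct = ["fun jam entry", "great art style", "loved the music", "nice puzzle idea", "cool theme twist"]
print(cui(copy_paste), cui(distinct))  # 0 80000
```

Under this reading, the self-comparisons cap the score at (1 − 1/n) × 100,000 for n comments, which is why five comments with no shared words still land at 80,000 rather than 100,000, and why participants close to the minimum of 5 comments are at a disadvantage.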
If you have other questions regarding the data and how it was acquired, have spotted a mistake, or simply want to offer a suggestion, you can write me a comment!
If you're interested in the data set I used and want to perform your own analysis, you can request the MySQL database or an Excel file (I am currently searching for a good place to upload it).