Viewing post in Bound by Your Word jam comments
Yay, I'm glad the absurd humor resonated! That kind of goofiness makes me giggle, and the giggles are what keep me building :D. I'm not sure how much you want to know about the word association stuff, but here's an attempt to answer! Sorry for the wall of text.
I think you're asking about how the game knows which answers are partially correct (checkmark vs. cross)? That's largely based on the Google NGram dataset. The 2-gram data lists the frequency of certain 2-word pairs [as found in books], which can be used to figure out which word pairs have relationships or special meaning, relative to other words. Around 2020, I wrote code to slurp that up and make it searchable via my Dillfrog Context Search website (example). So for the jam, I used my internal API to search that data and wrote code that "bakes" level files from my manually-generated list of correct answers and clues.
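The core idea of using 2-gram frequency to spot word relationships can be sketched like this (toy counts and a made-up threshold, not the real Dillfrog API or the actual Ngram numbers):

```python
# Toy 2-gram counts, standing in for the Google NGram 2-gram data.
BIGRAM_COUNTS = {
    ("hot", "dog"): 9000,
    ("hot", "table"): 3,
}

def pair_frequency(w1, w2):
    """Return the 2-gram count for an ordered word pair (0 if unseen)."""
    return BIGRAM_COUNTS.get((w1, w2), 0)

def looks_related(w1, w2, threshold=100):
    """A pair with a high 2-gram count likely has some special relationship."""
    return pair_frequency(w1, w2) >= threshold

print(looks_related("hot", "dog"))    # frequent pair -> True
print(looks_related("hot", "table"))  # rare pair -> False
```

In practice you'd want to normalize by how common each word is on its own (more on that below), but raw pair frequency is the starting point.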
To generate the puzzles (i.e. the "inputs" to the baking process), I manually read through a list of "ambiguous" words that had multiple meanings, in the hopes that they'd lead to more interesting contexts and puzzles. For each word, I searched Wiktionary (primarily) and my Dillfrog Context data (secondarily) to see which words might be recognizable clues. Then I created a TSV file that included the intended correct answers and the clues to use.
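For the curious, a TSV like that can be parsed with nothing but the standard library. The column names and layout here are my guess at a plausible format, not the actual file:

```python
import csv
import io

# Hypothetical puzzle rows: one clue per line, answers comma-separated.
TSV = "clue\tanswers\nbank\triver,money\nbat\tcave,baseball\n"

# DictReader with a tab delimiter turns each line into a dict keyed by header.
rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))

for row in rows:
    print(row["clue"], "->", row["answers"].split(","))
```

A real baker would read the file from disk instead of a string, but the parsing is the same.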
Then, to build the JSON "output" that the game actually reads and uses, I run that baking process. The baker slurps the TSV. For each clue, it uses the Google NGram data to spit out the top ~500 words that occur before or after it (based on a PMI score, if I remember correctly...), so the game can acknowledge partially-correct guesses [for the top ~500 results]. It also adds correct answers that I missed (e.g. plurals, if I listed the answer in singular form), based on those top ~500 connections, and does some other cleanup.
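For anyone wondering what a PMI-style ranking looks like: PMI (pointwise mutual information) scores a pair by log2(P(w1,w2) / (P(w1)·P(w2))), which rewards pairs that co-occur more than their individual frequencies would predict. A minimal sketch, assuming that's the scoring used (the post hedges with "if I remember correctly"), with toy counts:

```python
import math

# Toy corpus counts (not real NGram figures).
UNIGRAMS = {"hot": 10_000, "dog": 8_000, "the": 500_000}
BIGRAMS = {("hot", "dog"): 900, ("hot", "the"): 50}
TOTAL = 1_000_000  # total token/pair count in the toy corpus

def pmi(w1, w2):
    """log2( P(w1,w2) / (P(w1)*P(w2)) ); higher = more strongly associated."""
    c12 = BIGRAMS.get((w1, w2), 0)
    if c12 == 0:
        return float("-inf")  # never co-occur: no association signal
    # P(w1,w2)/(P(w1)*P(w2)) simplifies to c12*TOTAL / (c1*c2).
    return math.log2(c12 * TOTAL / (UNIGRAMS[w1] * UNIGRAMS[w2]))

# "the" follows "hot" plenty in raw counts, but PMI discounts it
# because "the" is frequent next to everything.
print(pmi("hot", "dog") > pmi("hot", "the"))  # True
```

Ranking every neighbor of a clue by this score and keeping the top ~500 gives exactly the kind of "partially related" word list the baker needs.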
Hope that helps explain it!