https://www.kaggle.com/datasets/rtatman/english-word-frequency is also a good reference
Word frequency is not a good statistical metric for a game where you care about letter frequency within words. If you use word frequency you end up drowning in H's and T's unless every other word you make is THE or contains TH. If you can track down a pre-V23 copy you can feel that first hand.
I is less common in shorter words than it is in longer words. Considering only words 3 to 6 letters long, it's around the 5th most common letter. Above that, it's the most common letter by an increasing margin as the words get longer.
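To make the short-versus-long comparison concrete, here is a minimal sketch of how one could rank letters within a word-length range. The function name `letter_ranking` and the idea of passing in a plain list of words are assumptions for illustration, not the game's actual analysis code.

```python
from collections import Counter

def letter_ranking(words, min_len, max_len=None):
    """Rank letters by total occurrences among words whose length
    falls in [min_len, max_len]; max_len=None means no upper bound."""
    counts = Counter()
    for word in words:
        w = word.strip().upper()
        if not w.isalpha():
            continue  # skip entries with digits, apostrophes, etc.
        if len(w) < min_len:
            continue
        if max_len is not None and len(w) > max_len:
            continue
        counts.update(w)  # counts every letter occurrence in the word
    # Most frequent letter first
    return [letter for letter, _ in counts.most_common()]
```

Running `letter_ranking(words, 3, 6)` and `letter_ranking(words, 7)` on the same word list and checking where `"I"` lands in each result would show whether the gap described above holds for that list.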
Making it less common on its own would make it harder to make 7+ letter words. Having the algorithm consider what's already on the board would be a possibility, but that might inadvertently nerf things like the Book of 4-heal.
As it currently stands, I is the third most common letter, which is a middle-of-the-road compromise between short words and long words.
I think you misunderstand. I'm not saying to base your distribution on how often these words appear, but to use that list as the dictionary you run your tile improvement method against.
So instead of "The new tile rarities are based on an analysis of the in-game dictionary, weighted based on source word length with a bias curving towards 7 letter words."
It would be "The new tile rarities are based on an analysis of the most used words dictionary, weighted based on source word length with a bias curving towards 7 letter words."
Edit: if you do this, delete any one- or two-letter words from that dictionary. There is a lot of slang and bullshit in there that should be removed before running an analysis.
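A rough sketch of what that could look like: drop the one- and two-letter entries, then count letters with each word weighted by its length. The thread does not say what the game's real curve is, so `length_weight` here is a purely hypothetical bell-shaped curve peaking at 7 letters, and `tile_weights` is an illustrative name.

```python
import math
from collections import Counter

def length_weight(length, peak=7, width=2.5):
    # Hypothetical curve: the actual weighting used by the game is not
    # given anywhere in this thread. This one simply peaks at 7 letters.
    return math.exp(-((length - peak) ** 2) / (2 * width ** 2))

def tile_weights(words, min_len=3):
    """Return each letter's share of the length-weighted letter count,
    ignoring words shorter than min_len."""
    totals = Counter()
    for word in words:
        w = word.strip().upper()
        if len(w) < min_len or not w.isalpha():
            continue  # drops one- and two-letter entries and non-words
        weight = length_weight(len(w))
        for letter in w:
            totals[letter] += weight
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {letter: value / grand_total for letter, value in totals.items()}
```

Feeding this the word column of the most-used-words list instead of the in-game dictionary is the only change being suggested; the weighting step stays the same.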
Also, just checking that you 100% have the rights to use the dictionary you're using, because the hint found a unique word that isn't in the Oxford or Scrabble dictionaries (I googled it because I'd never heard it before): ooecia. Cool word, tonnes of vowels, but not in just about any dictionary used for games. (Some dictionaries include words unique to them to test whether people are using the dictionary without proper authorization.)