Solved it by coding. Some pretty weird words are counted AS words, while others that can be found in a normal dictionary are not. Still, the idea is interesting.
Can you give some examples of common words that are missing? I might be using a problematic dictionary.txt.