You could try this word list, cut down to the top 50k words or so: https://gist.github.com/h3xx/1976236
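
For the trimming step, here is a minimal sketch. It assumes you have saved the list locally as plain text with one word per line, most frequent first, and that any header or comment lines start with `#`. I have not checked the file's exact layout, so adjust the filter if it differs. The function name `top_words` and the file names `wordlist.txt` and `top50k.txt` are made up for illustration.

```python
def top_words(lines, limit=50_000):
    """Return the first `limit` distinct words from an ordered word list.

    Assumes one word per line, already sorted by frequency (most common
    first). Blank lines and lines starting with '#' are skipped; the '#'
    rule is an assumption about the file format, not a confirmed detail.
    """
    seen = set()
    result = []
    for line in lines:
        if len(result) >= limit:
            break
        word = line.strip()
        if not word or word.startswith("#"):
            continue  # skip blanks and assumed comment/header lines
        if word in seen:
            continue  # keep only the first (highest-ranked) occurrence
        seen.add(word)
        result.append(word)
    return result


if __name__ == "__main__":
    # Hypothetical file names; point these at wherever you saved the list.
    with open("wordlist.txt", encoding="utf-8") as src:
        words = top_words(src)
    with open("top50k.txt", "w", encoding="utf-8") as dst:
        dst.write("\n".join(words) + "\n")
```

Because it stops reading as soon as it has enough words, it only works as intended if the list really is ordered by frequency. If it is not, you would need frequency counts from somewhere else to pick the top 50k.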