There's a lot of room for improvement in Lil performance. The "each" loop in your example is particularly expensive for three reasons: "each" is a map operation that yields a result list, "range" eagerly creates the entire list up front, and every iteration of the loop body is executed in its own scope. Contrast with "while", which is considerably faster because it involves less bookkeeping and keeps much less in memory at once:
while x<1000000 x:1+x end
Or, when it's possible, just use the natural "conforming" of arithmetic over lists, which applies the operation to every element at once:
1+range 1000000
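To make the equivalence concrete, here is a minimal sketch on a short list, so the output stays readable. The names a and b are only for illustration, and I'm assuming show[] is available for printing, as in a standalone Lil interpreter:

```lil
# "each" maps over the list and collects one result per element
a:each v in range 5 1+v end

# conforming applies the addition to every element in one step
b:1+range 5

# both a and b should hold the numbers 1 through 5
show[a]
show[b]
```

The results should match; the difference is only in how much work the interpreter does per element to get there.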
Performance for "each" over a huge list is particularly bad in c-lil; I'll do some investigating.