Reminds me of a simplified vector graphics renderer of some sort. The speediness sounds really handy for, say, game development - every cycle counts when you're rendering a game, even in 2D! IFJMP is instant now (fixed that bug), so, yes, it takes 2 cycles per pixel, which is... frankly, I didn't even know you could optimize the code that much. Very impressive!
Your preprocessor is really cool. The base source file for the line-renderer/pixel-pusher looks like real Assembly, like the sort I often see in, say, online 6502 development tutorials. It's quite a bit easier to understand, and omg the use of underscores in the binary sections is a huge improvement to readability.
The only major comment I have is that it might be worth adding the estimated/intended address of every word to the comment on that word's line - it makes debugging easier when something (inevitably) goes catastrophically wrong. Combined with the comments for, say, "label here" or "overlay here", it should be fairly easy to figure out where the problem is in the original pre-preprocessing source file. One of the latest updates (dunno if it's the one I released last, or the one that's WIP w/ the code for the cycle timer & such) has a built-in way to check the current address in the Debug screen. Being able to search for that address in your code should help you find what blew up faster than manually trawling through the RAM viewer, looking for the highlighted entry.
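To show what I mean by the address comments, here's a rough Python sketch. It's not how your preprocessor works, just an illustration, and it makes some assumptions I haven't checked against your output format: one word per line, `;` as the comment character, addresses counting up by one from a base of 0, and blank or comment-only lines emitting nothing. The function name `annotate_addresses` and the `@0000` address style are made up for the example.

```python
def annotate_addresses(lines, base=0, comment=";"):
    """Append the intended address to every line that emits a word.

    Assumes one word per line; blank and comment-only lines emit nothing.
    """
    out = []
    addr = base
    for line in lines:
        # Everything before the comment character counts as the word itself.
        code = line.split(comment, 1)[0].strip()
        if not code:
            # Blank or comment-only line (e.g. "; label here"): keep as-is.
            out.append(line)
            continue
        # Tag the line with its address in hex, then move to the next word.
        out.append(f"{line.rstrip()} {comment} @{addr:04X}")
        addr += 1
    return out


if __name__ == "__main__":
    demo = [
        "; label here",
        "0001_0010",
        "0000_1111 ; overlay here",
        "",
        "1111_0000",
    ]
    for row in annotate_addresses(demo):
        print(row)
```

With something like this in the output, searching the file for the address shown in the Debug screen (say, `@0001`) would land you on the exact word, right next to any "overlay here" note on the same line.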
(Also, my apologies for taking so long to reply to this. Internet blew up. Again. I feel like such an unprofessional developer >.<)