While admittedly not as neat as your paint program, the program I want to share in this post doesn't need more than the default mode to work. It's an image viewer - that manages to fit both the viewer and the image to show into ROM (doesn't use the harddrive), with some space left over. (Not quite enough space for a screen one bittage larger, but I think it would only need two more words than there is room for, and only for the additional image data - the code itself should only need two numbers changed.)
I actually thought I had been clever by using self-modifying code to do it, but only because I hadn't done the bootloader task yet (I had the idea and wanted to see if I could "cheat" by drawing the image myself instead) - when I did, I discovered that using self-modifying code is actually required for that (since there's no pointers), so I apparently wasn't being quite as clever as I thought... ^_^' :)
Obviously, this program does things a bit differently than the "win" program on the disk does (which I took a look at only after writing mine). Mine only stores the color values for each pixel, and draws them in a fixed order, which goes column by column.
SET 15 3 1 // R15=1 : for doing various math
SET 14 1 10000000 // R14[8:8]=1 : 1-value for incrementing Y
MOVI 10 26 // R10=ram[26] : this instruction is used as data
MOVI 10 18 // R10=ram[18] : load the next word of image data : self-modified
MATH 2 2 1 // R2=0 : initialize loop counter
PMOV 15 3 0 31 4 0 // R3=16 : max value that ends the loop, pixels per word
PMOV 10 1 0 1 0 0 // R1[0:1]=R10[0:1] : copy current color into R1
SETDATA 0 3 1 // write(screen, R1) : write pixel to screen
MATH 14 1 0 // R1+=R14 : increment Y position in R1 (overflows to X position)
PMOV 10 10 0 31 2 0 // R10<<2 : move next pixel into place
MATH 15 2 0 // R2++ : add one to the loop counter
IFJMP 2 5 3 // loop: if R2 < R3 then jump back 5 instructions
MOVI 2 3 // R2=ram[3] : load MOVI instruction at address 3 into R2
MATH 15 2 0 // R2++ : increment address of MOVI instruction
MOVO 2 3 // ram[3]=R2 : overwrite MOVI instruction at address 3
MOVI 3 2 // R3=ram[2] : load instruction at address 2 into register 3
IFJMP 1 3 3 // loop: if R2 < R3 then jump to instruction at address 3
HLT // end of program - image is being displayed
DATAC 11111111111111111101010110101011 // image data: 16x8 pixels, 2 bits
DATAC 11101010010101111110100101101011 // per pixel (hence 8 words total),
DATAC 11101010010101111101010110101011 // stored in column-first order
DATAC 11101010101010111101010101010111 // (so two columns per word).
DATAC 11101010101010111101010101010111
DATAC 11100101101010111110101001011011
DATAC 11010101010101111110101010101011
DATAC 11010101011001111111111111111111
(I grabbed the image from the gif in the game description, but I also discovered that the PC already has a working bootloader built into it - starting the PC without first compiling a program works just fine and shows the "win" image...)
Since I had some room left over, and my initial simple bootloader implementation was rather small, I considered combining them to show an error if the disk program wanted too many words loaded from disk, but it turned out there wasn't quite that much room left over. (I ended up just making the bootloader draw an S instead (for "Size error") if more than 26 words was asked for, since that's the most it could actually do. It still has some room left over that I'm not sure what to do with.)
By the way, thanks for the nice game/sandbox. Already just from the description it reminded me of TIS-100 (which is not a bad thing, quite the opposite), just with a more conventional architecture, and the game itself didn't disappoint on that front. While it does have some rough edges, considering it started as a Ludum Dare entry and thus currently has very little development time behind it (as compared to most games), I'm actually impressed how well it does work. Seeing as you seem to be continuing work on it, I figure those edges will probably be smoothed out over time. Thumbs up from me. :)