Here's a program that is a bit unusual for mine, in that it doesn't really work in the Default Mode of Senbir. So I've tested it in the Extended Mode instead, and it seems to work there.
It's a TC-06 emulator. (For the version of Senbir I currently have installed - so without UTL/OFST.)
Technically I suppose you could say that the emulator itself fits in the default mode, since all its code and internal data takes less than 256 words, but to be used it needs some data areas for the emulated memory, registers and disk drive - and while I've managed to squeeze in the area for the memory, that still leaves the registers starting at address 256 (exactly), and the disk area comes after that (to take the rest).
It's currently set up to emulate approximately the default mode - 16 regs, 32 words of RAM, and whatever disk space remains available after the emulator code and data areas. I believe it could fairly easily be modified for larger memory though - even emulating more memory than the host has - as long as there's enough disk space for storing it.
While I haven't really tested it comprehensively, it appears to work with the programs I have tested. It currently includes a basic bootloader in ROM and my image displaying program on the emulated disk (the one that fits in ROM while showing a win screen), and those seem to work.
I'm not saying the behaviour is 100% identical - especially for code doing things it shouldn't like trying to read/write memory outside of what's available - but for well-behaved programs I believe it should work pretty much as expected. (Well, as long as you don't reboot without rewriting the disk at least, since the emulated RAM isn't reset at boot...)
It's horribly slow, though. Slowness is not really unexpected for an emulator like this, but even so.
That bootloader and image program, when run directly on the extended mode machine, takes about 3 seconds to draw the first pixel, and 14 more until it's done (machine HLTed), so about 17 seconds total from power-on to HLT.
When run in the emulator, the first pixel was drawn after about 228 seconds (3m 48s), and it was done about 1174 seconds later (19m 34s), for a total of about 1402 seconds (23m 22s) from power-on to HLT.
That's a factor of 82.5 in how much more time it takes in the emulator... ouch.
A lot of that (maybe most) is due to needing to use the disk runner technique for the emulator itself, of course - if it could fit in RAM (and was written to do so), it would obviously be significantly faster (though still much slower than running natively).
One potentially interesting tidbit is that the code for handling getdata/setdata is comparable in size to the code for the rest of the instructions put together. That gives an indication of how complicated those instructions are...
Also, in addition to/instead of emulating larger memory, I think it wouldn't really take all that much to modify this program to essentially do preemptive multitasking (not cooperative), as long as there was enough disk space on the host for storing the data for two or more programs (plus the extra code). I don't think it would be very difficult to make it switch between two programs, executing one instruction at a time for each of them, which basically makes them run concurrently, without being aware of each other - since they each have their own set of registers, memory, etc. - or even needing to know about the emulator. (It would be a particularly slow kind of multitasking though.)
Given a larger monitor, they could even be given separate sections of it to display themselves simultaneously, though that might come at the cost of the emulated video memory not really having the full 32-bit range of storage per pixel (since the secretly higher resolution takes some of the bits), or being much slower because the emulator would have to keep a separate copy of that memory.
That's getting into OS territory, though - it wouldn't really surprise me too much if either of those turned out to be more complicated to add than I currently think...