(Note: This is based on the version that was out before the update mentioned in the below post about the OS kernel. I don't know if that update changes anything that matters for this post, but thought I'd mention it just in case.)
Regarding the op-codes, one thing I've noticed is that the current instruction set is fairly RISC-like. It even mostly uses a load/store architecture, with MOVI/MOVO being the only memory-accessing instructions - except for GETDATA/SETDATA with flag 1 or 2. That's pretty neat.
Also, several of the existing instructions already have the kind of sub-code you mentioned, though usually called "flag", and their bit positions vary.
E.g. MATH has sub-codes for ADD, SUB, MOV, etc., and IFJMP has two separate ones for forward/back and for EQ, GT, etc..
Regarding the max clock speed, based on the numbers you're using, I'm guessing your clock ticks are currently tied directly to the game's rendering frame rate, or maybe to some kind of timer (with a limited resolution).
However, I don't think you really need them to be tied that way - after all, for an emulator, it doesn't really matter if the ticks are evenly spaced. The most you might need is for them to appear that way to the player.
So, a fairly simple technique for that is to run two or more ticks per frame (or per timer triggering). This would double (or more) the apparent clock frequency, as long as the CPU can actually keep up with the emulation.
I used that technique in my JS TC-06 emulator, and have managed to hit 500kHz that way. Admittedly only with the debugger off, and a very simple program (the max I've reached depends on what it's doing) - but this was in JavaScript, running in Firefox, with code that isn't primarily built for max speed - so I suspect you can do better than that in C#, without needing to move to C++ or using things like threads.
Of course, that assumes you actually are timer bound and not CPU bound, but I don't really see how you could be CPU bound at that frequency, without trying to be...