Regarding the RAM being useless: yeah, for a while now I've actually been considering it a place that is primarily for code, and not very useful for data. You may have noticed that I've been using the disk even for temporary runtime data (e.g. in Snake) - that wasn't _just_ because of the size...
(A disclaimer: I haven't actually done any real work on/with compilers, so much of this post is really just guesswork and assumptions - since you've apparently been researching it, you might already know better.)
I noticed an implicit assumption in your function implementations: that the arguments are always in RAM, never in registers.
However, I think one of the things modern compilers usually spend some effort on is keeping as much as possible in registers (and keeping track of what is in which register), to avoid having to access RAM (which is usually much slower than the registers).
Obviously, this is limited by the registers usually being far less numerous than the available RAM (and sometimes by other constraints), but IIUC things like local variables in particular are often kept purely in registers.
This suggests that it may be better to think of x=y as a sequence of several operations:
- loading the address of y (if necessary)
- loading the value of y from that location (if necessary)
- setting x to y (this happens in the registers)
- loading the address of x (if necessary)
- storing the value of x (if necessary) into that location.
so that the compiler can skip the ones that are not necessary in that particular situation (e.g. "x=y;z=y;" doesn't need to re-load y).
This would also simplify the implementation of e.g. x=y+z, since it would do the same first two steps for y, then again for z in exactly the same way (no real difference in implementation, just different register assignments and pointer locations) - only the middle step is different.
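To make that more concrete, here's a rough sketch in Python (not anything TC-06-specific - all the names like lower, LOAD, ADD and STORE are made up purely for illustration) of a compiler lowering assignments into those separate steps, skipping loads for values it knows are already in a register:
def lower(statements):
    regs = {}      # variable name -> register currently holding its value
    code = []
    next_reg = 0

    def reg_for(var):
        # "loading the value of y from that location (if necessary)"
        nonlocal next_reg
        if var not in regs:
            regs[var] = next_reg
            code.append(f"LOAD  r{next_reg}, [{var}]")
            next_reg += 1
        return regs[var]

    for dest, op, *srcs in statements:
        if op == "copy":                             # x = y
            result = reg_for(srcs[0])
        elif op == "add":                            # x = y + z
            left, right = reg_for(srcs[0]), reg_for(srcs[1])
            result = next_reg
            next_reg += 1
            code.append(f"ADD   r{result}, r{left}, r{right}")
        regs[dest] = result                          # dest now lives in a register too
        code.append(f"STORE r{result}, [{dest}]")    # "storing the value of x"
    return code

# "x=y;z=y;" - note that y only gets loaded once:
print("\n".join(lower([("x", "copy", "y"), ("z", "copy", "y")])))
(Of course a real compiler also has to handle running out of registers, what happens when y changes after x=y, and so on - this is just the skeleton of the idea.)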
I'll note that some of this probably depends on the ISA - e.g. modern x86_64 I think has a lot of instructions that work on data in RAM without needing separate load/store operations, and I think its calling conventions generally use RAM in the form of a stack, which it has instructions to manipulate directly. I believe those are still slower than the register equivalents, though. (Oh, and those calling conventions don't really apply to the functions we're talking about here, since these are at a rather lower level.)
As such, it may be more useful to look at what operations the compiler expects to be given when implementing a RISC-style architecture that doesn't have those features, since the TC-06 doesn't either (except for a few special cases). I suspect that it may look more like what I suggested above, with separate operations for load/store and for the actual action (set/add/etc).
Of course, I could be wrong - for all I know the compiler might optimize by parsing the resulting code and deduplicating the load/stores somehow - but I think that would require rather deeper information about what each of the instructions/operations does, which the compiler wouldn't be given simply by implementing those functions you mentioned.
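For what it's worth, that alternative could look roughly like this (again just illustrative Python, not how any real compiler does it): a pass over the already-generated load/store sequence that drops a LOAD when it can prove the register still mirrors that address - and the "else" branch is exactly where the deeper knowledge would be needed:
def dedup_loads(instructions):              # instructions are (op, reg, addr) tuples
    known = {}   # register -> RAM address whose value it currently mirrors
    out = []
    for op, reg, addr in instructions:
        if op == "LOAD":
            if known.get(reg) == addr:
                continue                    # register already holds that value, skip
            known[reg] = addr
        elif op == "STORE":
            # the memory at addr changes, so other registers mirroring it are stale
            known = {r: a for r, a in known.items() if a != addr}
            known[reg] = addr
        else:
            # For anything else, the pass would need to know exactly which registers
            # and addresses the instruction can change (the "deeper information"
            # above); here we just assume it only writes to reg and forget that one.
            known.pop(reg, None)
        out.append((op, reg, addr))
    return out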
Regarding MOVR, I'll note that it makes it possible to write a bootloader that doesn't use self-modifying code (and is smaller and faster):
SET 15 3 1
GETDATA 1 3 0
MATH 1 3 5
JMP 3 6
NILLIST 22
GETDATA 1 3 3
MATH 15 3 1
MOVO 1 0
MOVR 0 0 3
IFJMP 2 4 3
JMP 1 0
Other than that meta-point, though, I don't think it lets you do anything technically new, unless the only reason you couldn't do it before was lack of space or execution time.
So, if you consider the need for self-modifying code (or the space/time restrictions) as central to Senbir, you may want to avoid adding it.
On the other hand, if you don't, there are other variants that would be far more useful - e.g. MOVI/MOVO variants that take the memory address from a register instead of an immediate (or even better, accept both at the same time) - so you might still not want to add MOVR itself, depending on just how easy you want these things to be.
(I think the decision about "should" is more up to you than anyone else, since it's your game. Especially the decision about how far to go, since it doesn't have to be all or nothing.)
In some ways, OFST is a similar change, in that it lets you do much the same thing as before but far more easily, with much less code modification (you could have achieved the same thing by modifying the MOVI/MOVO instructions it affects) - with the exception of the meta-point that it allows you to run self-modifying code that wasn't designed to allow such offsets (except in its JMPs).
I'll also note that I'm not aware of any other processor with an instruction quite like OFST, and I think you might be starting to run into (one of) the reason(s) for this here: it is global state that significantly affects other instructions, in such a way that you need to be aware of it and turn it off/on often - and depending on what you're doing, even that may not be enough.
Consider, for example, if OFST is used by the OS to set up an area for the program to work within, and the compiler thus generates variable addresses within that area (instead of truly global ones, which might be what it's expecting), and then OFST is also used for these internal functions like you've done here... Then the current implementation of turning it off wouldn't be enough, since it would actually need to restore the previous offset to end up at the correct address. And if the program itself also uses OFST internally for other things, it gets even more complicated, and/or it becomes necessary to carefully track the stack of offsets.
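In other words, something like this (Python pseudocode, purely illustrative - set_ofst is a made-up stand-in for however the offset actually gets changed):
def set_ofst(value):
    print("OFST", value)        # stand-in for actually emitting/executing OFST

offset_stack = [0]              # whatever offset is currently in effect

def enter_region(base):
    # e.g. the OS setting up the program's area, or a function its own scratch area
    offset_stack.append(base)
    set_ofst(base)

def leave_region():
    offset_stack.pop()
    # Setting it back to 0 is only correct at the outermost level; everything
    # else has to restore whatever offset its caller was using, or the caller's
    # addresses end up pointing at the wrong place.
    set_ofst(offset_stack[-1])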
(I'll note that modern systems do tend to have something kind of similar, but a bit more flexible and yet in some ways simpler - namely an MMU: a device that sits between the processor and the RAM, remapping all the addresses, so that all the programs the OS loads can use the same addresses without conflicting with each other or with the OS.)
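Conceptually it's roughly like this (illustrative Python again; real MMUs typically remap in pages via tables rather than a single base/limit, but the effect is similar):
class SimpleMMU:
    def __init__(self, ram, base, limit):
        self.ram = ram
        self.base = base        # set up by the OS for each program it loads
        self.limit = limit      # size of the area the program is allowed to use

    def translate(self, addr):
        if addr >= self.limit:
            raise IndexError("access outside the program's own area")
        return self.base + addr # the program's address 0 is really base+0, etc.

    def read(self, addr):
        return self.ram[self.translate(addr)]

    def write(self, addr, value):
        self.ram[self.translate(addr)] = value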
Regarding making it a separate ISA/processor you can select in a custom mode, I think that's a pretty clean solution, especially if you want to maintain the basic nature of the original challenge in the default modes while also allowing more complex things (like an OS) to be more easily built in a custom mode. Allowing a choice here would also let you push at least some of this decision to the player, so that the "should" is up to them and what they want.
(In some ways you've already done that, except only allowing users to switch processor by starting a different version of Senbir itself, since newer versions of Senbir have a CPU with extra features...)
To be honest, I've actually been intending to do pretty much exactly that (custom ISAs/CPUs) in my own simulator (once I've gotten the default mode simulation working right - it currently has some significant bugs I want to fix first, like using only unsigned numbers instead of signed ones).
A while back I even designed a different ISA that I think allows all the same things the TC-06 does (except UTL, since that didn't exist yet), but without using sub-codes, and that is also much more flexible/easier to use (and maybe to implement, though I haven't actually tried to do that yet). There are probably some details I missed or could improve, but IIRC it seemed pretty nice. It was quite different in several ways, though.
By the way, out of curiosity, does the TC in TC-06 stand for anything?