I think the start-of-disk function index system will mostly be useful for when you're doing it manually and referring to the functions by ID yourself.
If your assembler/compiler/preprocessor/IDE (or whatever you want to call it) lets you use "FUNC <name>" and inserts the code to jump there on its own, keeping track of the IDs for you - then it must also keep track of the addresses for those IDs, which suggests it can probably just as easily use those addresses directly, without needing the IDs or such an index table.
Though, I suppose we would want to minimize the number of instructions (or other resources) needed to do a call, to maximize what's available for other code, so I guess havin the table might help with that.
Note: the rest of this post ended up being a rather in-depth analysis of the available approaches that such an assembler/etc. could use. Feel free to correct me if I've missed something, but apparently there's no cut-and-dried "best" approach, and it depends on circumstances... Also, I originally intended to respond to other parts of your post as well, but this is too long already, so I'll add separate replies for those later.
Let's see now. I don't think the JMP to the loader routine can be avoided (unless you can put the call at the end of the memory area), so that's one instruction, but since it's unavoidable we can ignore it for the purpose of comparing the different approaches.
So the main thing we need is to somehow get the actual disk address to load the function from into a register for the loader to use.
With index table
Now, if we have and use an index table somewhere on disk, what would the code for that look like?
Well, to start with, it means we must have a GETDATA instruction to load the actual address from the index table, based on the index.
GETDATA supports an immediate (constant) value, so can we use that for the index, to load the address directly, with just one instruction?
*checks the docs* ... well, that constant value gets shifted up by 10 bits (documentation says 11, but I think that's incorrect, and 11 would be even worse for this). This means that it can only be used to load from addresses that are multiples of 1024. Which means we get one function per 4KiB of disk space, minus one for address 0 since that's where the initially bootloaded program is. Not really useful with the default 256-word disk.
So, we can't practically have the table index inside that one instruction, which means it must be stored either in memory or in a register.
The memory variant means having the function index stored in memory somewhere, probably loaded as part of the current function, and having the GETDATA instruction point to that address. The obvious way to do this would leave each call site using two memory addresses.
Trying to be clever and putting the instruction in the loader doesn't really help either, since we'll either be limited to one call per function, or have to spend instructions on modifying memory (either the instruction or the address with the index), which will probably take more addresses total.
The register variant means getting the function index into a register somehow, which will (generally) require at least one instruction, but in return it's possible to move the GETDATA instruction to the loader.
If we dedicate a register to it and limit ourselves to 256 functions (minus the initially bootloaded program), we can use SET to have it use just one instruction/memory address per call site.
If we don't dedicate a register to it, or need more functions, we have to either store the function index in memory somewhere (to use MOVI), or use multiple instructions (e.g. MATH to clear + SET) - so at least two addresses per call site. This is no better than the memory variant, so we can disregard this option.
So, with an index table, we end up with these options to choose from:
- two memory addresses per call site
- one memory address per call site, plus one in the loader and one register, with limitations
This is in addition to the space taken by the table itself, and the JMP for each call site.
Without index table
Now, if we don't have an index table, and just use the address directly, what can we do then?
Using MOVI will work, and will take two memory addresses - one for the instruction, one for the function address. Simple and straightforward.
Using SET can work, and will take just one instruction, but requires all functions to start at addresses before 256 (since we can only use 8 bits), and that we're careful about how the load-address register is used (since the other 3 bytes must remain 0).
However, the loader is currently using R2 for the load address (to use IFJMP), which I doubt we can be that careful with, so we'll need to use another one, and copy it into R2 as needed. That copy instruction can be in the loader, but this will essentially require a dedicated register.
It's also possible to use multiple SET instructions, to ease (or even eliminate) the limitations, but that means using between two and four instructions per call site, without being any better than using MOVI, so we can disregard this option.
Using PMOV or MATH essentially requires already having the address (or something that can easily be turned into the address) in a register, which seems rather unlikely in the general case, without spending extra instructions on it. Generating a function index using this method is somewhat more likely, but not much. So either way, this can only be used for special cases.
So, without an index table, we end up with these options to choose from:
- two memory addresses per call site
- one memory address per call site, plus one in the loader and one register, with limitations
This is in addition to the JMP for each call site, but avoids taking space for a table.
These options are very similar to the ones with a table, though not quite the same, in that the limitations of the second option are a bit different.
Summary of options
The first option for both uses the same number of instructions, addresses and clock cycles, which suggests that with those approaches, the table is an unnecessary indirection that just takes extra disk space.
The second option for both also uses the same number of instructions, addresses and clock cycles. But they have different limitations.
Now, if the disk is no larger than 256 addresses (like the default disk), then the limitations are void for both, since they are based on needing more than 8 bits to address things. So in this case the table again just takes extra disk space.
However, if the disk is larger than that, then the limitations come into play, and then the table is actually useful as it allows us to have functions stored anywhere on the disk instead of just in the first 256 addresses. However, even with the table, we are limited to at most 256 functions, minus the space taken by the initial program.
Which approach to choose
The above suggests that the choice of approach will mainly be based on which resources are most dear, and how many function calls you have.
First off, if you need more than 256 functions, you obviously need to use the first option, and thus don't need the function table.
If you need all the registers you can get, but have room to spare in memory, then again the first option is probably best.
If you can spare a register, but have lots of calls and little room in memory, then the second option is probably best, as it saves you one word of memory per function call.
If you have little disk space, and at least a couple calls, then the second option without a table is probably best, as it saves you both the table and one word per function call (not just in memory, but on disk).
If your dearest resource is your sanity (because you don't have an assembler/etc. that does this work for you), then I think the second option with a table is probably best, closely followed by the first option with a table, with the non-table options pretty far behind...