Writing a loader in Malbolge
Malbolge,
for those not familiar with it, is a language designed to be difficult
(or perhaps impossible - it's not been proved to be Turing complete) to
program in. See the link above for the gory details.
Why write a loader?
One of the many difficulties in Malbolge programming is that only legal
instructions can be initially loaded into memory, and all of these are
small (less than 128). This is fine for instructions, but makes it very
hard to write big programs, since there is no way to enter branch
target addresses, or data register values, or data locations, that are
larger than this. This essentially limits all directly entered programs
to less than 255 bytes. (The numbers 129 through 255 can be entered
directly due to a bug in the
interpreter.)
So the first thing you need to write is a loader, that feeds a
(carefully crafted) byte string to a small bootstrap program, and thus
allows the user to start with any memory location set to any value.
It's not at all obvious such a program can be written (it's certainly
much more ambitious than any existing Malbolge programming).
The basic idea for the loader
If we can repeatably set two locations to completely arbitrary values,
we can build a loader. Call the locations ADDR (for address) and
DATA (for data). Then the following code, repeated often enough,
will allow us to set any memory location to any value.
ROT DATA ; this loads the D value
into A. Of course it's rotated, but since we assuming D can be
set to any value, this can be pre-compensated away.
LOAD ADDR ; this loads the D register. Of
course D will be incremented before we use it, but once again this can
be compensated for
OP
; This does the actual mod. May need to be repeated several
times, with different values, to get the desired result
How to set a single location:
We can read a byte from the input, and use it to do an OP. But
this can only change the lowest 5 trits, since when we do an input the
upper part of the word is always set to 0. We can also do a
rotate. Unfortunately, many values cannot be reached by any
combination of OPs of input values and ROTATES. This is because
the upper 0s force all 5 of these trits to 0 or 1. If a value of 2 is
needed here it is lost.
So next we introduce the 'idler' (named for the small gear between 2
big gears). Now the sequence is
- INPUT
-
OP ; with the
idler
-
OP ; the desired
value.
We will set the idler initially to 22222xxxxx, where x does not matter
as long as it is known. Because of the properties of OP, the 2s
in the upper trits of the idler will stay 2s, and the lower trits can
be set to anything we want. And for the second OP, the upper
trits are always 2, which does not lose any information. (the 1s
and 2s in the target are swapped). A test program shows that we
can reach any value with a combination of ops of the form 22222xxxxx
and ROTATES. In fact this can be done with just one operation
combining the OP and the ROTATE. There is one complication, the
first INPUT/OP (steps 1 and 2) combination may need to be done twice
before doing the
second OP (step 3) , since there are some values of the idler that
cannot be
reached in one step from the previous idler.
The basic plan of the loader:
So now the basic plan of the loader is clear. We read a byte, and
use it to do a computed branch to one of these operations.
- Do an input and OP into the idler
- Op the idler into ADDR
- Op the idler into DATA
- OP the location indicated by ADDR with the data at DATA
- Branch to the start of the newly loaded code.
This needs a 5 way computed branch. The first observation is that
you don't really need step 5. After all, if you can change any
location, you can change some branch target within the loader. So
you simply set all the locations you want, then zap one of the branch
addresses within the loader to go there on the next pass through.
Building a computed branch.
This is not straightforward (What did you expect?). The basic
problem is that all branch addresses are in memory, and the only to
save a value in memory is through an OP, and the 0s in the top of the
input values cause the value to radically change. So one again,
we use an idler, in this case of the form 11111xxxxx. We read an
input byte, OP it into the idler, then branch to the idler. Once
again the basic problem is that you cannot reach all values of
11111xxxxx from all other values. This means you cannot chose
your next state arbitrarily. However, there are subsets of the
values where abtritrary next state selection is possible. We will
endeavor to use one of these.
You can also compute a D value, then do a LOAD, then a branch.
I'm not yet sure which is better
Combining steps (2), (3), and (4):
This is a really cool optimization, using the weirdest 'features' of
Malbolge. Imagine a loop built basically like this: (actually
there are branches between each step, for two reasons - the
instructions need to be located at certain memory locations for proper
effect, and the branches are used to 'undo' some of the code
self-modification. However they are omitted here for clarity).
- I: Input
- Op into idler
- I1 - Instruction 1, explained below
- I2 - Instruction 2, explained below
- reset D and branch back to I:
Since the D is auto incremented, operations I1 and I2 will work on
different memory locations. First, make sure the input and the OP
always work on every cycle, by putting them at locations where they are
a 2-cycle and fixing them after each
use (this can be done by branching
to the instruction afterwords, as explained on the main page). Now pick
both I1 and I2 to be part of an instruction
cycle of the same
length. 2 of the 9-cycles look very good. A ROTATE loaded
at location 71 or 83 (mod 94) does only ROTATES, OPS, and NOPs.
Now by starting the one at location 83 as a NOP, the two can be put out
of phase. So now, if we execute this loop 9 times, we get the
following behavior, where N means NOP:
Pass through
the loop:
|
71 mod 94
|
83 mod 94
|
1
|
OP
|
N (loadable)
|
2
|
N
|
N
|
3
|
N
|
N
|
4
|
N
|
OP
|
5
|
N
|
ROT
|
6
|
ROT
|
N
|
7
|
N
|
N
|
8
|
N
|
N
|
9
|
N
|
N
|
So if we execute this 9 times, we get exactly what we want. Both
locations get one OP followed by a ROTATE. The OPs are at least 2
locations apart, so we can change the idle to the right value before
the OP is done. If we call this routine COMBINED, then we only
need a 2 way branch.
- COMBINED - used to set both ADDR and DATA, by making multiple
passes.
- perform OP, using DATA, at the location ADDR.
There is one possible fly in the ointment. Say we need to change
OLD-ADDR->ADDR and OLD-DATA->DATA. Using this technique
this must be possible using exactly the same number of OP-ROTATE pairs
for both changes. Is this always possible? For example, you
could imagine a scenerio where the change OLD-ADDR->ADDR always
required an odd number of OP-ROTATE pairs, but the change
OLD-DATA->DATA requires an even number, perhaps due to some parity
consideration in the OP. Then this technique would fail.
Fortunately this fear can be shown to be groundless. Instead of
seaching for a sequence that transforms OLD_DATA->DATA directly,
first transform OLD-DATA to 0, then 0 to DATA. Then do the same
for OLD_ADDR->0->ADDR. Now pick the shorter of the two
sequences. The 0 step can be extended as many cycles as needed,
since 0000000000 OPed with 2222222222 -> 0000000000 again.
Hence we can always pad the shorter sequence to be exactly the same
length as the longer one. A side benefit of this technique is
that if we wish to calculate the transforms needed by table lookup,
then the going through 0 requires 2 tables of size O(N), instead of the
direct technique which requires a table of size O(N^2), where N is the
numbe of values a Malbolge memory location can assume (59049).
The downside of this approach is that the sequences are longer than is
strictly necessary, but if efficiency is your main concern perhaps
Malbolge is not the right programming language to use.
Discovering this is what convinced me it is indeed possible to write a
generic loader within the confines of the small code space accessible
through the loader bug. I'm
now completely convinced this can be done.
Creating unloadable constants:
Lots of the constants you would like, as well as useful instructiions
such as immutable nops at all
locations, cannot be loaded directly. (This is part of the language
spec, but a bug in the interpreter
lets a somewhat wider variety of constants be loaded, but by no
means all.) However, these can be created
systematically. At the very beginning of your code, create the
location 2000000000 by ROTATING 0000000002, which can be loaded.
Then branch to it, and at this location we place a long chunk of
straight line code. Since there is a small value in D, this code
will increment D through your main code one step per instruction.
Most of these instructions will be NOPs, but a few can be OPs or
ROTATEs to set up the unloadable constants. Note that once you
make a pass through all your code, you could (if you need to)
reload the D register and make another pass.