Introduction to Malbolge
Malbolge, for those not familiar with it, is a language
designed to be difficult (or perhaps impossible - until recently,
there was not even an informal argument showing
Turing completeness) to program in. For example, the effect of
any instruction depends on where it is located in memory (mod 94, of
course), all instructions are self-modifying (according to a
permutation table) and both the code and data pointers are
incremented after every instruction, making it hard to re-use any
code or data. There is no way to initialize memory except to one of
the 8 instruction characters, there is no LOAD or STORE operator,
and the only available memory operators (both of them) work in
trinary and are designed to be opaque. The only control flow
construct is an unconditional computed jump, which is also nearly
worthless since there is no way (or certainly no obvious way) to set
memory to anything except the 8 instruction characters.
Believe it or not,
Counter
02101012220 people (counted in trinary, of course)
have so far have expressed an interest in programming in Malbolge!
Note that if your first impression is that counting visitors in
trinary is rather inconvenient, then you are missing the point of
Malbolge altogether.
Originally, information about Malbolge was published at the site
below, though this site is now dead, according to the author:
http://www.mines.edu/students/b/bolmstea/malbolge/
Fortunately (or maybe not) this information has been
preserved. A copy of the original site was archived at
http://web.archive.org/web/20000815230017/http:/www.mines.edu/students/b/bolmstea/malbolge/
The language specification copied from the original site: Malbolge
language specification
The reference interpreter copied from the original site: Malbolge
Interpreter.
Note: Where the spec and the interpreter differ (for example,
the spec calls '<' an INPUT instruction and '/' an OUPUT, but the
interpreter does the opposite), in this work the interpreter is held
to be correct.
Although the language had been out since 1998, for many years the
most complex known programs was 'hello, world', available in
several versions.
http://www.acooke.org/andrew/writing/malbolge.html
http://www.antwon.com/index.php?p=234
http://www.wikipedia.org/wiki/Malbolge_programming_language
http://www2.latech.edu/~acm/helloworld/malbolge.html
In 2004, based on the analysis below, I wrote a program that
copied input to output, though it did not terminate properly on
end of input.
There was a claim that the '99 bottles of beer' program had been
written in Malbolge. ( The site was (now dead): http://99-bottles-of-beer.ls-la.net/m.html)
The implication is that the program was doing looping,
testing and printing. However, closer examination shows that
the programmer was just doing a printf("") of the desired result
using straight line code. Conceptually this is exactly the
same as the 'hello world' example above.
This difficult task of writing a general program in Malbolge was
completed for real in 2005 by Hisashi Iizawa , Toshiki
Sakabe, Masahiko Sakai , Keiichirou Kusakari, and Naoki
Nishida. Their paper "Programming Method in Obfuscated
Language Malbolge" (in Japanese) can be found at http://www.sakabe.i.is.nagoya-u.ac.jp/~nishida/DB/pdf/iizawa05ss2005-22.pdf.
The resulting source code for '99 bottles of beer' can be found
at: http://www.99-bottles-of-beer.net/language-malbolge-995.html
. Though some of the theory developed here was used,
reducing this to practice was an amazing feat of programming
prowess.
Malbolge as a cryptosystem
The correct way to think about Malbolge, I'm
convinced, is as a cryptographer and not a programmer. Think of it
as a complex code and/or algorithm that transforms input to output.
Then study it to see if you can take advantage of its weaknesses to
forge a message that produced the output you want.
Looked at as a cryptosystem, it has several weaknesses:
Some permutation cycles are short
First, the self modifying instructions do not form one large
permutation. (If they did, then any instruction executed enough
times would always turn into a 'HALT' at some point). So we can find
instructions, which when executed, turn into other instructions, and
then back. For example, an OP instruction, when located at location
20 (mod 94), will become a LOAD instruction, then a NOP, then
another NOP, then back to a OP instruction, and so on. Cycles are
length 2, 9, 4, 5, 6, and 68, and the instructions executed by each
cycle depend on the starting position mod 94. In general the short
cycles are more useful than the long ones, but the long cycle at 2
mod 94 is very good, as it cycles between input, output, and load D
register instructions. A list of all the cycles can be found here.
Jump instructions do not self modify
The next weakness is a biggie - any JUMP instruction is not self
modifying! This happens because the order is:
- Instruction at C is executed
- Instruction at C is scrambled by the permutation table
- C is incremented
But the branch instruction changes C between steps (1) and (2). Thus
the branch address is one less than the intended target, and neither
the branch instruction itself nor the target is modified (the work
before the target IS modified, but that's not so bad. In fact
we will use this to great advantage later). This is very helpful
since it's much easier to cope with changing instructions than
changing control flow. Also, once an instruction permutes into a
branch instruction, it will not change any further.
Initializing values
The next weakness is in the program reader. It is clear from the
text description that the intent is allow only valid
instructions to be written into memory, and the rest of memory
will be filled by the OP loop. This in general prevents the (ab)user
from starting with memory set to any useful values. However, close
inspection of the code reveals that non-printing characters (0-31)
and (128-255) are written directly to memory without checking
(except newline, tab, and a few other whitespace characters (those
selected by isspace()), which are ignored). One could argue this is
simply a bug in the interpreter, but taking advantage of a bug in
the interpreter seems very much in character (so to speak). This is
very helpful, allowing the programmer to ensure that the right
branch target address is in the right spot in memory, for example.
Using these weaknesses, I've succeeded in writing a Malbolge
program that copies its input to its output. Since some of it is
non-printing, here it is uu-encoded:
begin 666 copy.mb
M1"="04 _/CT\.SHY.#<V-30S,C$P+RXM+"LJ*2@G)B4D(R(A?GU\>WIY>'=V
M=71S<G%P;VYM;&MJ:6AG9F5D8V)A8%]>75Q;6EE85U955%-245!/3DU,2TI)
M2$=&141#0D% /SX]/#LZ.3@W-C4T,S(Q,"\N+2PK*BDH)R8E)",B(7Y]?'MZ
M>7AW=G5T<W)Q<&]N;6QK:FEH9V9E9&-B86!?7EU<6UI96%=655134E%03TY-
M3$M*24A'1D5$0R9?O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]Y+V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O>2]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
DO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;T*
end
Here are some more observations, not taken advantage of yet:
Getting aroung the effect of self
modifying instructions
All the various instructions appear in loops of length 2,
although only when located at certain memory locations. This means
that you can write a routine that does the right thing every other
time, and nothing the alternate times. However, since the
instructions will only do this when located at the proper spots, you
need to branch after each one. For example, suppose you wanted
to do
- Input
- OP
- rotate
In that order. You can do this at any location, but only
once. If you try to execute it again, the instructions
will have self modified in various ways, and the next attempt will
do something un-intended. However, if you do it this way:
- Input (at location 53, mod 94)
- Branch to 82
- OP (at location 82, mod 94)
- branch to 59
- rotate (at location 59, mod 94)
- branch
Then each instruction is located where it is part of a
2-cycle. So the first time it will do what is desired, the
next time it will do nothing at all, then the next time each will do
the correct thing again.. In fact you can do even better - you can
execute the routine, then branch to each of the branches (this can
work since the branch target is not coded with the branch, it's in
data segment. Thus the same branch can jump to multiple
different locations.) So why branch to a branch instead of
going to the final location directly?? Because the side effect
of a branch is to permute the instruction just before the branch
target! Thus executing the code, then chaining through the
branches, restores the code (if 2-cycle) to the original state.
Many modifications are possible: pre-munging, munge as you go,
or post-fixing as described above. Each is useful in some
circumstances.
Building immutable NOPs
Only some locations have NOPs which can be loaded initially, and
will always remain NOPs. Such cycles
exist for all locations (mod 94), but most never go through
any official instruction and hence cannot be loaded directly.
However, all can be loaded by loading a constant from 129-255,
then doing a single rotate (used as a divide by three.) This
gives numbers in the range of 43-85 (Note that the numbers must be
evenly divisible by three since the LST will be rotated into the
most significant trit). Each location has at least one NOP
loop that contains one such number.
Note that in the case of NOPs , as in branches, we do not need to
worry about the cycle length, though for different reasons (branches
don't change, and NOPs change but we don't care as long as they
change into other NOPs).
Some observations on the OP operator:
OP is defined as:
| A trit:
________|_0__1__2_
0 | 1 0 0
*D 1 | 1 0 2
trit 2 | 2 2 1
If the memory (*D) is all ones, then the result is just the A
register with 1s and 0s swapped.
If the A trit is all 2s, then the result is the memory with 1 and 2
swapped.
All the values that are easy to come by (instructions or input) will
have 0s in their upper trits. Thus after any OP they will have
all 1s in these position.
You can set a memory location to a known value as follows:
First OP a location with itself. (Any ROTATE or OP
instruction will set the A register and memory to the same value).
After resetting the D register, then OPing with itself, a location
will contain only 0s and 1s. Then if you OP with an A of 0,
you'll get all ones. If you OP with an A of all 1s,
you'll set A and memory to 0.
Loads and Stores:
You can synthesize a load from 10 rotates (which restores the
original). Alternatively, you can fill A with all 2s, then OP
the location (which swaps 1s and 2s in the memory location. Then
repeat this process, which swaps the memory back and loads A with
the newly corrected value.. You can synthesize a 'store' by
OPing twice into locations fulled with all '1's. (If the
memory trit is 1, then the OP bit gets written with 0 and 1 reversed
and 2 kept the same. If you do this twice you get the original
back.)
Doing arithmetic in Malbolge
I suspect the best way to do arithmetic is by table lookup.
Even this is difficult (you need a table filled with computed
values followed by an equally sized array of branch targets.
Then you load the value with a ROTATE or OP instruction
followed by a tables worth of immutable NOPs, followed by a LOAD and
BRANCH. Of course this scrambles your table entry, which must
then be undone, and so on.). This still looks easier than
synthesizing arithmetic out of OPs and ROTATEs, at least to me.
Code Density
Most of these techniques require considerable code to perform simple
operations. In general this is no problem, and does not affect
theoretical Turing completeness. (If your primary concern is
code density, perhaps Malbolge is not such a good choice of
languages....) It is a problem in practice since only the bottom 256
locations can be easily addressed by any program you can input - any
higher addresses must be synthesized. This in itself takes
lots of code.
A general strategy for writing larger Malbolge programs, and
proving practical Turing completeness.
If the code can be made to fit, then a combinations of OPs, rotates,
and computed branches should allow a bootstrap
program that reads an arbitrary byte string into memory, which
would help get around the 8 allowed character restriction, and
allows use of more of the address space. Then it might be possible
to write a BrainF***->Malbolge
compiler, and so on.... This shows that Molbolge meets
the practical definition of Turing Completeness - it can compute any
problem that fits within its memory. However, Malbolge can
never meet the formal
definition of Turing completeness, which requires access to an
unlimited amount of memory.
Proving formal Turing completeness.
However, a very slight addition to Malbolge makes it truly and
formally Turing complete. There is nothing in the Malbolge
spec that states that the INPUT and OUTPUT operations cannot refer
to the same data stream, which then can be considered a tape with a
257 symbol alphabet. (257 since INPUT can return the values
0..255, plus the special value 2222222222 upon end of file.
We'll assume this really means upon any attempt to read an undefined
byte.) INPUT can read the symbol on the tape and move the head
one byte to the right, exactly what it currently does. OUTPUT
can write the symbol (A mod 256) to the current position and move
the head one to the right, also exactly what it currently does.
Now for the change: an OUTPUT with A=2222222222
(normally an obscure way to write the byte 168) moves the tape head
one position to the left (in UNIX terms, it backs up the file
location by one byte). Call this variant Malboge-T, with the T
standing for Turing completeness.
As far as I can determine, Malbolge-T would give results identical
to classic Malbolge with all existing Malbolge programs*. (As if
compatibility between Malbolge varients was a big practical
problem). However, since Malbolge-T has access to a
potentially unlimited external memory, it at least has the
possibility of being Turing complete. In fact it's not hard to show,
using the table lookup techniques utilized in the BrainF***->Malbolge compiler,
that a simple state machine with completely arbitrary state
transitions can be implemented. And if you can implement a
relatively small arbitrary state machine (5 states times 5 symbols
is sufficient), and combine it with the bi-directional tape,
then you can implement a Universal Turing Machine, and hence show
true Turing completeness.
*except for the copy program above, which would now backup forever
after EOF on the input, instead of spewing an infinite number of
bytes with a value of 168.
It could be worse
Malbolge, although obviously difficult, could be worse. Here
are some suggestions for making it even tougher:
- Redo the instruction permutation table to remove all short
cycles.
- In particular, if every possible cycle for every location
contains at least one non-NOP instruction, then you could not
even construct a NOP that you could rely on.
- Remove the oversight in the reference interpreter that lets
the user load non-ascii values directly.
- You could make the OP even less useful by modifying it so that
as few rows and columns as possible contain all three digit
values. This makes it hard to set specific values that contain
all three trits. Alternatively, make OP so that as many
rows and columns as possible contain all three trit values.
This makes it very hard to set a memory address to
anything unless you know the prior value.
- Modify instructions as they are fetched, not when they are
done. Then branches too would self modify.
Happy programming,
Lou Scheffer
From Ryan Kusnery's weird languages page:
The day that someone writes, in Malbolge, a program
that simply copies its input to it's output, is the day my hair
spontaneously turns green. It's the day that elephants are purple
and camels fly, and a cow can fit through a needle's eye.
I hereby release all my work on Malbolge, in any and all forms, into
the public domain.
This page last modified 17 Apr 2015.
Back to
LScheffer home page