Creating a programming language that can't be interacted with
I recently participated in the 4th installment of langjam, a competition that makes its competitors create a programming language over a period of 48 hours. If that sounds like less time than it should take to make a good programming language, you’d be right, but I reckon that’s not really the point of the competition.
Anyway, I learned that this competition existed about half-way through the first day. I decided to look up the prompt which was the sound(ness) of one hand typing
. I thought of the old adage which relates to the clapping of a single hand, and got to thinking about ways to design a programming language that was missing half of it, in such a way as to preclude its use. Moreover, we will answer the question “how shit can we make a programming language, before the language becomes completely unusable?”
The result is write-only-wanguage, or ‘wow’. Available on github here.
Seizing the means of non-interaction.
If you want to remove half of “read/write”, you can choose to either have a read-only language, which sounds boring, or a write-only language. What could a write-only language be? Someone must have made one before, but I didn’t bother to check. Can I design a language so absolutely shit that my write-only language is the world’s shittest write-only language?
I started writing a parser that would take in data files, and have no further input. But then, isn’t that what all compilers do? No, a write-only language would have to have no means of input, no reading of files, or loaded values. Let’s push this further and isolate it from the outside world entirely. It has no input, no output. That’s the rules.
We’re going to allow internal reads and comparisons internally, despite being technically reading, because I decided it was OK, and I’m making up the rules to end up with the most entertaining result.
How do we design a language that can’t be interacted with? It can’t be compiled, since the compiled code is an output, and it can’t really be interpreted since the interpreter needs an input.
But do we really need an input for an interpreted language?
If we take some inspiration from virtual machines, we can have a state machine that modifies its own state starting from a hardcoded value and proceeding from there. As such, the user does not pass in the input and our interpreter still runs. Magnificent!
A crappy state machine: designed for jankiness
We’re already designing a language with no means of interaction. How could we make this worse? Let’s go to a throwback of old. Let’s imagine we have a really crappy computer as the basis of our state machine.
Our state machine has an accumulator (ACC, 64-bit), an instruction register (INS, 16-bit). The choice of a 16-bit non-extendable instruction set really increases the jank. It has 256 unsigned 64-bit integers as memory, which we will call “heap”. That’s it. No stack pointer. No counter. No flags. You have to manually change the instruction register if you want to make it do things.
I’m not certain of this, but I think that not even the worst retrocomputers would be this bad. They had to actually sell units, after all, so we’re trailblazing here. This is groundbreaking fundamental research!
Here is what most of the logic flow looks like.
// Let us define our static registers
static ACC: AtomicU64 = AtomicU64::new(0);
static INS: AtomicU16 = AtomicU16::new(0);
fn main() {
eval_loop();
}
fn eval_loop() {
// Here we initialize our main source of memory
let mut heap: [u64; 256] = [0; 256];
// Main loop
loop {
let ins = INS.load(Ordering::SeqCst);
INS.store(0, Ordering::SeqCst);
// Function dispatch on instruction
let cont: Continuation = match ins {
0xFFFF => terminate(),
0x0000 => processor_idle_sleep(),
// {... more commands ...}
_ => break,
};
match cont {
Continuation::Stop => break,
Continuation::Continue => continue,
};
}
}
The program consists of a single loop that does the following in order:
- Fetches INS from the atomic static variable, then sets INS = 0;
- Takes INS, matches it to a function, and then calls the function.
- The function returns a Continuation enum, which is either Stop or Continue
- If stop, then break the loop and terminate the program.
ACC starts off at 0, as does INS. The 0 instruction is a no-op. As such the program will loop forever and you have to kill the program to terminate it. Great language!
Functions
In order to simplify the previous code, I have omitted the functions. The whole list is available on github. Mainly, the functionality is grouped into 4 categories:
Instruction OPCode Range | Functionality Group |
---|---|
0x0000, 0xFFFF | Sleeps for 10ms, terminates program respectively |
0xA000…0xA8FF | Operations that modify ACC |
0xAA00…0xAB00 | Operations that write ACC to the heap |
0xBA00…0xBCFF | Operations that write to INS |
0xC000…0xC3FF | Operations that write to INS, depending on the value of ACC |
INS is a single value, and does not allow for multiple instruction calls to be queued. As such, it would be seemingly impossible for any meaningful work to be performed.
For instance, let’s say we want to zero the ACC. One way this could happen is if INS happened to equal 0xBA04. This is the op-code for “load the value at heap memory address 04 into INS”. At address 0x04, if the value 0xA000 was written, it would have the effect of loading A000 into INS, which would then zero the ACC.
Let’s work through the control flow here.
Step | Place in code | heap[0x04] | ACC | INS |
---|---|---|---|---|
0 | Start of loop | 0xA000 | any | 0xBA04 |
1 | Dispatch -> set_ins() | 0xA000 | any | 0 |
2 | After set_ins() | 0xA000 | any | 0xA000 |
3 | Start of loop | 0xA000 | any | 0xA000 |
4 | Dispatch -> zero_acc() | 0xA000 | any | 0 |
5 | After zero_acc() | 0xA000 | 0 | 0 |
6 | (At this point, it loops forever) | 0xA000 | 0 | 0 |
We need a second instruction that tells the machine where the next instruction is located.
To remedy this, I have invented what is probably the pinnacle of this project.
The jammer
What if we had a friend that would just jam another instruction into INS at step 4?
Enter modern multithreaded programming. We detach a thread whose only purpose is to jam another instruction into INS as soon as possible. This basically keeps the next instruction in memory somewhere, and could be replaced by a queue, but I think this mechanism is fun. Plus, it’s definitely a write-only mechanism.
fn deferred_jam_instruction(ins: u16) {
std::thread::spawn( move || {
let mut cycles = 10; // Lol deadlock prevention
while INS.load(Ordering::SeqCst) != 0 && cycles > 0 {
cycles -= 1;
// Wait for INS = 0
thread::sleep(time::Duration::from_millis(2));
}
INS.store(ins, Ordering::SeqCst);
});
}
The op-code BBXX, loads the instruction at (XX+1) into memory, and then creates a jammer to load BB(XX + 2) into INS if INS == 0. Beautiful. If you are wondering whether this causes problems, or has race-condition issues, the answer is yes.
We can now set ACC=2 and do the next instruction by simply having INS = BB04, and having the following values in memory:
Address | Value | Instruction functionality |
---|---|---|
0x04 | 0xA000 | Zero ACC |
0x05 | 0xBB06 | Set INS to value of 0x06 and start jammer for 0x07 |
0x06 | 0xA001 | Incr ACC |
0x07 | 0xA001 | Incr ACC |
We can chain jammers to keep the code moving
Address | Value | Instruction functionality |
---|---|---|
0x04 | 0xA000 | Zero ACC |
0x05 | 0xBB06 | Set INS to value of 0x06 and start jammer for 0x07 |
0x06 | 0xA001 | Incr ACC |
0x07 | 0xBB08 | Set INS to value of 0x06 and start jammer for 0x07 |
0x08 | 0xA001 | Incr ACC |
0x09 | 0xAA04 | Store ACC to 0x04 |
Let’s see this in action.
Step | Place in code | heap[0x04] | ACC | INS |
---|---|---|---|---|
0 | Start of loop | 0xA000 | any | 0xBB04 |
1 | Dispatch -> set_ins_and_jam() | 0xA000 | any | 0 |
2 | After set_ins_and_jam() | 0xA000 | any | 0xA000 |
3 | Dispatch -> zero_acc() | 0xA000 | any | 0xBB06 |
4 | After zero_acc() | 0xA000 | 0 | 0xBB06 |
5 | Dispatch -> set_ins_and_jam() | 0xA000 | 0 | 0 |
6 | After set_ins_and_jam | 0xA000 | 0 | 0xA001 |
7 | (Prior to dispatch) | 0xA000 | 0 | 0 |
8 | Dispatch -> incr_acc() | 0xA000 | 0 | 0xBB08 (jammed in) |
9 | After incr_acc() | 0xA000 | 1 | 0xBB08 |
10 | Dispatch -> set_ins_and_jam() | 0xA000 | 1 | 0 |
11 | After set_ins_and_jam() | 0xA000 | 1 | 0xA001 |
12 | Dispatch -> incr_acc() | 0xA000 | 1 | 0 |
13 | After incr_acc() | 0xA000 | 2 | 0xAA04 (jammed in) |
14 | Dispatch -> write_acc() | 0xA000 | 2 | 0 |
15 | After write_acc() | 2 | 2 | 0 |
(At this point, it loops forever) | 2 | 2 | 0 |
Fantastic! The jammer is a very nice and not at all problematic way to do what could be done by a simple data structure. But this still doesn’t solve the main problem: how is this scenario possible if all the memory values, including ACC and INS, are set to 0?
Interacting with the uninteractive machine
So far, I have withheld one critical piece of information, which is that the program runs on a general purpose computer on which we have access to memory addresses. While this is not surprising, it does allow us to do one neat trick.
The trick is that we can write and read to values in memory. We can also interrupt the control flow. In fact, most programmers have done this at some point in their life, through a tool that exploits this same flaw; the debugger.
Hooking up the debugger and writing a program
We are going to be using rust-gdb
for this example, though other debuggers would certainly work.
rust-gdb target/debug/write-only
For this example, let us go through how we would run a predetermined program, then we’ll go over what the program does.
We are going to set a couple breakpoints. The first one is intended to interrupt before the assignment of the value of INT to the temporary variable int. The second one is intended to intercept calls to write_acc
so that we can change the value of ACC and thus write whatever we want into memory. The third catches the program when it is terminated, allowing us to read the value of ACC.
Let’s set-up the breakpoints, run the program and continue until the first breakpoint.
b 23
b write_acc
b terminate
r
c
At the first breakpoint, we can now write our instruction to INS = 0xAB00, and continue on until we reach the second breakpoint.
set INS.v.value = 0xAB00
c
This will hit the dispatch table and end up running the function write_acc_all
.
fn write_acc_all(heap:&mut [u64; 256]) -> Continuation {
for index in 0..0xFF {
write_acc(heap, index);
}
Continuation::Continue
}
For every iteration of the loop, we have one call to write_acc
. This is where we placed our second breakpoint.
fn write_acc(heap:&mut [u64; 256], ins: u16) -> Continuation {
let index: usize = (ins & 0xFF).into();
heap[index] = ACC.load(Ordering::SeqCst);
Continuation::Continue
}
If we set ACC before every call, it will load whatever value we want into memory.
set ACC.v.value = 0
c
This will continue until the next iteration of write_acc_all
. We can proceed to mass assign our instructions and variables.
set ACC.v.value = 1
c
set ACC.v.value = 1000
c
set ACC.v.value = 0xBB04
c
set ACC.v.value = 0xA000
c
set ACC.v.value = 0xBB06
c
set ACC.v.value = 0xA100
c
set ACC.v.value = 0xBB08
c
set ACC.v.value = 0xA101
c
set ACC.v.value = 0xBB0A
c
set ACC.v.value = 0xAA00
c
set ACC.v.value = 0xBB0C
c
set ACC.v.value = 0xA000
c
set ACC.v.value = 0xBB0E
c
set ACC.v.value = 0xA101
c
set ACC.v.value = 0xBB10
c
set ACC.v.value = 0xA001
c
set ACC.v.value = 0xBB12
c
set ACC.v.value = 0xA001
c
set ACC.v.value = 0xBB14
c
set ACC.v.value = 0xAA01
c
set ACC.v.value = 0xC316
c
set ACC.v.value = 100
c
set ACC.v.value = 0xBA05
c
set ACC.v.value = 0xBB19
c
set ACC.v.value = 0xA000
c
set ACC.v.value = 0xBB1B
c
set ACC.v.value = 0xA100
c
set ACC.v.value = 0xFFFF
c
We have many more loops, but we don’t have anything to write, so let’s clear the second breakpoint and continue.
cl write_acc
c
We will hit the first breakpoint again. There is nothing left to do but launch our program.
set INS.v.value = 0xBA03
cl 23
c
This will run until we hit termination. The value in ACC is the one we care about so
print ACC.v.value
And we are done.
Um, what does that program do?
Well, it’s supposed to sum up all the odd numbers below 100, but I found that the limit is 255 before it enters an infinite loop. It also does not output the correct value, and I haven’t had time to figure out why it didn’t work during the 48-hour allocated period.
I think this deserves extra points. The language is so utterly shite that even its creator couldn’t produce an example that runs properly.
The version hosted on my personal github should have a version that works, at some point in the future.
Here’s what the memory, and each address does
Addr-hex | Value | Effect |
---|---|---|
00 | 0 | Storage, summation variable “sum” |
01 | 1 | Increment “inc” |
02 | 1000 | –unused, kept to not mess up line numbers- |
03 | 0xBB04 | Set INS to the value at 0x04, and jam at 0x05 |
04 | 0xA000 | Zero ACC |
05 | 0xBB06 | … |
06 | 0xA100 | Add “sum” to ACC |
07 | 0xBB08 | … |
08 | 0xA101 | Add “inc” to ACC |
09 | 0xBB0A | … |
0A | 0xAA00 | Set “sum” to ACC |
0B | 0xBB0C | … |
0C | 0xA000 | Set ACC to 0 |
0D | 0xBB0E | … |
0E | 0xA101 | Add “inc” to ACC |
0F | 0xBB10 | … |
10 | 0xA001 | Add 1 to ACC |
11 | 0xBB12 | … |
12 | 0xA001 | Add 1 to ACC |
13 | 0xBB14 | … |
14 | 0xAA01 | Set “inc” to ACC |
15 | 0xC316 | Compare ACC to the value in 16, if false, set INS to value of 17, if true set INS to val of 18 |
16 | 100 | Limit for increment, |
17 | 0xBA05 | Set INS to value of 05, which is 0xBB06 |
18 | 0xBB19 | … |
19 | 0xA000 | Zero ACC |
1A | 0xBB1B | … |
1B | 0xA100 | Add “sum” to ACC |
1C | 0xFFFF | Terminate |
Or, if you write it in pseudocode
s = 0
inc = 1
while True:
# Doing "sum = sum + inc"
acc = 0
acc += s
acc += inc
sum = acc
# Doing "inc = inc + 2"
acc = 0
acc += inc
acc += 1
acc += 1
inc = acc
if inc > 100:
break
acc = 0
acc = s
This gives the right result in Python.
¯\_(ツ)_/¯
Work left to do
Other than making it work, there are also some exploits to the language, mainly thatyou can write to the heap directly. It’s too convenient, making the language half-usable. As such, there needs to be some indirection nonsense to make this more inconvenient than the method I presented before. I have not yet conceptualized what I intend for this.