[EN] Writing a Compiler is Surprisingly Easy (part 2)

In my first few attempts at writing a compiler, a big hurdle was compiling control flow (conditionals, loops, etc.). I think this stemmed from the fact that a lot of the common advice floating around the internet is aimed at writing an optimizing compiler, so I would hear things like 'static single-assignment', 'control flow graphs', 'phi-nodes' and the like, and get lost trying to take it all in.

Because of such things, my first serious attempt at implementing a programming language ended up being a tree-walking interpreter, and it was several years before I dared write a compiler.

I'd like to show that all these things, while interesting and valuable in their fields of application, are not required to write a useful compiler. In particular, we'll see at the end that even a naive compilation strategy will handily beat a fast tree-walking interpreter in speed.


In the previous part, we implemented a simple code generator for a small language of arithmetic expressions. Today, let's extend our simple language with some statements. These are pieces of code that do not evaluate to a value, but instead exist for their effects: modifying state, directing control flow, and so on.

I want to implement an imperative language, so it will definitely need a way to express mutation. An easy way to get some mutation going is variable assignment.

In our simple language, an assignment will evaluate some expression and then store its result into a predetermined stack slot.

struct assignment {
	int slot;
	struct expression* value;
};

Translating to assembly is really easy: after evaluating the expression, its result will be stored in %rax, so we just move it from there into the stack.

void compile_assignment(struct assignment* e) {
	compile_expression(e->value); // leaves the result in %rax
	int slot = e->slot;
	printf("mov %%rax, %d(%%rsp)\n", -8 * (slot + 1));
}


To keep the compiler simple, I decided that a program in this language consists of a single statement. However, no useful program does just one assignment, just one branch or just one loop. We really need the ability to do several.

To achieve this, let's add a way to execute multiple statements: the block. A block represents a sequence of statements that execute one after the other, but is conceptually a single entity.

struct block {
	int step_count;
	struct statement** steps;
};
struct statement is a tagged union, which we will define later.

The translation is also straightforward.

void compile_block(struct block* e) {
	for (int i = 0; i < e->step_count; ++i) {
		compile_statement(e->steps[i]);
	}
}


This is when things get interesting.

The way to direct control flow in most machines is through jumps. We can add labels to different parts of the code, and there are some instructions ("jumps") that will transfer control to that label. For example, here is a loop that adds one to %rax forever in x86-64 assembler:

infinite_loop:
  add $1, %rax
  jmp infinite_loop

There are also some instructions ("conditional jumps") that will only perform the jump if some condition is met, and will otherwise just proceed to execute the next instruction. For example, this loops as long as %rax is non-zero:

this_loop_terminates:
  add $1, %rax
  jnz this_loop_terminates
jnz stands for "jump-non-zero"

I claimed that a decision is being made based on the value of %rax, yet the jnz instruction does not mention %rax at all. How can this be?

This is because, in x86-64, conditional jumps make their decisions based on the CPU's flag register. This is a special register that stores a bunch of boolean flags, encoding some metadata about the last instruction that was executed. One of those flags, the zero flag, records whether that instruction produced a zero result. jnz performs the jump if the zero flag is not set.

Actually, not all instructions update the flag register. Notably, mov doesn't, so we'll need to do something to deal with that.

This will allow us to implement an if-statement that conditionally executes another statement if its condition evaluated to a non-zero value (just like in C!).

struct if_statement {
	struct expression* cond;
	struct statement* body;
};

We can achieve this by inserting a conditional jump that skips over the statement code if the condition turned out to be zero. Therefore, a plausible implementation might look something like this:

void compile_if_statement(struct if_statement* e) {
	compile_expression(e->cond);
	printf("jz skip\n"); // skip statement if condition is zero
	compile_statement(e->body);
	printf("skip:\n");
}
Note the jz ("jump-zero") instruction, with the opposite behavior of jnz.

There are some problems with this, though.

First, if there is more than one if statement in our program, we will use the same label twice, which is not allowed.

This is fixed with a simple counter, just like stack slots.

Second, we are relying on the code that evaluates the expression to update the flag register, which it might not do (e.g. reading a variable, which we compile down to a single mov, won't).

We can address this by emitting a test instruction. test updates the flag register as if it performed a bitwise-and between its two operands, but doesn't actually modify its destination. This means that when test-ing a register against itself, the zero flag will be set if and only if the register was zero.

int label_counter = 0;

void compile_if_statement(struct if_statement* e) {
	int label_id = label_counter++;
	compile_expression(e->cond);
	printf("test %%rax, %%rax\n");
	printf("jz label%d\n", label_id); // skip statement if zero
	compile_statement(e->body);
	printf("label%d:\n", label_id);
}
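
As a quick sanity check on that reasoning (ordinary C, not part of the compiler): test computes the bitwise-and of its operands for the flags, and x & x equals x for any x, so the zero flag ends up set exactly when the register held zero.

```c
#include <stdint.h>

// Models what the zero flag holds after `test x, x`:
// the flag is set iff the and-result is zero, and x & x == x.
int zero_flag_after_test(uint64_t x) {
	return (x & x) == 0; // same truth value as x == 0
}
```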

Instead of if-statements, I chose to implement if-else-statements, which are a bit more general. Their representation is still very simple:

struct if_else {
	struct expression* cond;
	struct statement* true_branch;
	struct statement* false_branch;
};

And the translation is also fairly easy. We just have to remember to skip the false branch after executing the true branch:

void compile_if_else(struct if_else* e) {
	int false_label = label_counter++;
	int end_label = label_counter++;
	compile_expression(e->cond);
	printf("test %%rax, %%rax\n");
	printf("jz label%d\n", false_label); // skip true branch if zero
	compile_statement(e->true_branch);
	printf("jmp label%d\n", end_label); // skip false branch after true branch
	printf("label%d:\n", false_label);
	compile_statement(e->false_branch);
	printf("label%d:\n", end_label);
}


Right now, our little language can't do much in the way of useful computation. Most useful programs tend to involve repetition, usually expressed as loops or recursion. Recursion is a bit out of reach at the moment, so let's implement loops.

In particular, let's add a staple of imperative programming: the while loop.

struct while_loop {
	struct expression* cond;
	struct statement* body;
};

A while loop executes a statement for as long as some condition is true. In our case, we will stick by the convention that non-zero means true:

void compile_while_loop(struct while_loop* e) {
	int start_label = label_counter++;
	int end_label = label_counter++;
	printf("label%d:\n", start_label);
	compile_expression(e->cond);
	printf("test %%rax, %%rax\n");
	printf("jz label%d\n", end_label); // exit loop if zero
	compile_statement(e->body);
	printf("jmp label%d\n", start_label); // loop to the top
	printf("label%d:\n", end_label);
}

The translation is quite similar to that of a conditional: if the condition is zero, we skip the loop body. The difference is that after the loop body we jump backwards to the beginning of the loop, where the condition will be checked again.

Putting it all together

Finally, we just need to dispatch over the different statement types and then we're done.

enum statement_tag {
	STMT_ASSIGNMENT,
	STMT_BLOCK,
	STMT_IF_ELSE,
	STMT_WHILE_LOOP,
};

struct statement {
	enum statement_tag tag;
	union {
		struct assignment as_assignment;
		struct block as_block;
		struct if_else as_if_else;
		struct while_loop as_while_loop;
	};
};

void compile_statement(struct statement* e) {
	switch (e->tag) {
		case STMT_ASSIGNMENT: compile_assignment(&e->as_assignment); break;
		case STMT_BLOCK:      compile_block(&e->as_block); break;
		case STMT_IF_ELSE:    compile_if_else(&e->as_if_else); break;
		case STMT_WHILE_LOOP: compile_while_loop(&e->as_while_loop); break;
	}
}

Testing it out

To test out the functionality we added today, I created a little program that computes Fibonacci numbers.

int main() {
	// stack slot indices for our variables
	int n = 0;
	int a = 1;
	int b = 2;
	int c = 3;

	struct statement* stmt = block3(
		assignment(a, int_literal(1)),
		assignment(b, int_literal(1)),
		while_loop(variable(n), block4(
			assignment(c, addition(variable(a), variable(b))),
			assignment(a, variable(b)),
			assignment(b, variable(c)),
			assignment(n, addition(variable(n), int_literal(-1)))
		))
	);

	compile_statement(stmt);
}


block3, block4, assignment and while_loop are helpers that construct values of our tagged union.
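
For completeness, here's a sketch of what one such constructor could look like (an illustrative version, not my exact helper code, and the STMT_ASSIGNMENT tag name is just a stand-in): each one allocates a node and fills in the tag plus the matching union member.

```c
#include <stdlib.h>

/* Illustrative sketch; the types repeat the definitions from
   earlier in the post, trimmed down to one statement kind. */
struct expression; /* opaque for this sketch */

struct assignment { int slot; struct expression* value; };

enum statement_tag { STMT_ASSIGNMENT /* ..., one tag per statement kind */ };

struct statement {
	enum statement_tag tag;
	union { struct assignment as_assignment; /* ... */ };
};

/* Allocate a statement node and fill in tag + payload; block3,
   block4, while_loop and the rest follow the same pattern. */
struct statement* assignment(int slot, struct expression* value) {
	struct statement* s = malloc(sizeof *s);
	s->tag = STMT_ASSIGNMENT;
	s->as_assignment.slot = slot;
	s->as_assignment.value = value;
	return s;
}
```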

If we run the compiler right now, we get this output:

mov $1, %rax
mov %rax, -16(%rsp)
mov $1, %rax
mov %rax, -24(%rsp)
label0:
mov -8(%rsp), %rax
test %rax, %rax
jz label1
mov -16(%rsp), %rax
mov %rax, -88(%rsp)
mov -24(%rsp), %rax
add -88(%rsp), %rax
mov %rax, -32(%rsp)
mov -24(%rsp), %rax
mov %rax, -16(%rsp)
mov -32(%rsp), %rax
mov %rax, -24(%rsp)
mov -8(%rsp), %rax
mov %rax, -88(%rsp)
mov $-1, %rax
add -88(%rsp), %rax
mov %rax, -8(%rsp)
jmp label0
label1:

To test it, we can once again strap it into a C program. First we add some more assembly to make it take an argument into stack slot 0 and return the contents of stack slot 1 at the end.

.global fib
fib:
	mov %rdi, -8(%rsp)

... generated code omitted ...

	mov -16(%rsp), %rax
	ret

Then we can just call it from C:

#include <stdint.h>
#include <stdio.h>

uint64_t fib(uint64_t n);

int main() {
	for (int i = 1; i <= 10; ++i) {
		printf("fib(%d) = %lu\n", i, fib(i));
	}
}

Compile and run:

$ gcc fib.s fib_test.c -o fib
$ ./fib
fib(1) = 1
fib(2) = 2
fib(3) = 3
fib(4) = 5
... and so on ...

I also took a few minutes to write a really simple interpreter for the language we have so far. We can use it to compare the performance of interpreted code versus our naive compiler.

To make this a bit more fair, I changed the test to only make one call to fib instead of several. (The way I wrote the interpreter benchmark, it has to rebuild the AST on each call)

$ gcc fib_test.c interpreter.c ast.c -g -o fib_interpreted -O2
$ gcc fib.s fib_test.c -g -o fib
$ time ./fib
fib(50000000) = 1661595531329882850

real	0m0,208s
user	0m0,207s
sys	0m0,000s

$ time ./fib_interpreted 
fib(50000000) = 1661595531329882850

real	0m1,775s
user	0m1,776s
sys	0m0,000s

I think the interpreter I used, while simple, is a reasonable stand-in for an "optimized" tree-walking interpreter. It uses the native stack and calling conventions, and all values are unboxed and untagged.
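
To give a rough idea of its shape, here is a minimal sketch of such an interpreter's core (illustrative names and flattened node types, not the exact source I benchmarked): it walks the tree, dispatches on a tag, and keeps every value as an unboxed uint64_t in a flat array of stack slots.

```c
#include <stdint.h>

/* Sketch of a tree-walking evaluator over unboxed values; only a
   few node kinds shown, and the node layout is flattened compared
   to the compiler's tagged union. */

enum expr_tag { EXPR_LITERAL, EXPR_VARIABLE, EXPR_ADDITION };
struct expression {
	enum expr_tag tag;
	uint64_t literal;             /* EXPR_LITERAL */
	int slot;                     /* EXPR_VARIABLE */
	struct expression *lhs, *rhs; /* EXPR_ADDITION */
};

enum stmt_tag { STMT_ASSIGN, STMT_WHILE };
struct statement {
	enum stmt_tag tag;
	int slot;                     /* STMT_ASSIGN: destination slot */
	struct expression* expr;      /* assigned value / loop condition */
	struct statement* body;       /* STMT_WHILE */
};

uint64_t eval(struct expression* e, uint64_t* slots) {
	switch (e->tag) {
	case EXPR_LITERAL:  return e->literal;
	case EXPR_VARIABLE: return slots[e->slot];
	case EXPR_ADDITION: return eval(e->lhs, slots) + eval(e->rhs, slots);
	}
	return 0;
}

void exec(struct statement* s, uint64_t* slots) {
	switch (s->tag) {
	case STMT_ASSIGN:
		slots[s->slot] = eval(s->expr, slots);
		break;
	case STMT_WHILE:
		while (eval(s->expr, slots) != 0)
			exec(s->body, slots);
		break;
	}
}
```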

A bytecode interpreter would be better, but I didn't want to spend that much time, and then I'd be writing a (bytecode) compiler anyway.

That's an 8.5x ratio between a fast interpreter and code generated by the most naive possible compiler.


Compiling structured control flow is surprisingly easy! Again, we spent more time talking about details of the underlying architecture than anything specific to compilers.

Also, we've seen that even very naive compilers are way faster than "fast" interpreters (though I recognize that "fast" is being used loosely here). In my opinion, this tells us that the techniques shown above are more than good enough for a first or second compiler (or even N-th compiler, if used as a starting point for later experimentation).

In the following parts we will implement pointers and function calls, making our language Turing-complete.

About me

My name is Sebastian. I'm a competitive programmer (ICPC World Finalist) and coach. I also enjoy learning about programming language theory and computer graphics.
