Posts

Showing posts from November, 2023

[EN] Writing a Compiler is Surprisingly Easy (part 2)

Part 1: expressions, arithmetic Part 2: statements, control flow In my first few attempts at writing a compiler, a big hurdle was compiling control flow (conditionals, loops, etc). I think this stemmed from the fact that the a lot of the common advice that's floating around the internet is for writing an optimizing compiler, so I would hear things like 'static single-assignment', 'control flow graphs', 'phi-nodes' and the like, and get lost trying to take it all in. Because of such things, my first serious attempt at implementing a programming language ended up begin a tree-walking interpreter, and it was several years until I dared write a compiler. I'd like to show that all these things, while interesting and valuable in their fields of application, are not required to write a useful compiler. In particular, we'll see at the end that even a naive compilation strategy will handily beat a fast tree-walking interpreter in speed. State

[EN] O(log^2(N)) Range Queries on Fenwick Tree Without Inverse Operation

Background: Fenwick Tree The Fenwick tree is a data structure that is commonly used to perform prefix sum queries and point updates on some array A. Leaving the issue of updates aside the general idea is that, for each index i, it stores the sum of A over the range (i−f(i),i] (yes, that's an open-closed interval), where f(i) returns the least significant bit of i. To compute a prefix sum over [1,i1] we start with the interval (i2,i1] (where i2=i1−f(i1)), then we add (i3,i2] (where i3=i2−f(i2)), and so on until we cover the entire range. Since each i[k+1] has one fewer bit than i[k], this can only happen O(log(i1)) times in the worst case. Therefore any prefix sum query is computed in O(logn) worst case running time. int fenwick_tree[MAXN]; int prefix(int i) { int x = 0; for (; i > 0; i -= f(i)) x += fenwick_tree[i]; return x; } It is also easy to compute any range sum query over a range [l,r] by taking the sum over the prefix [1,r] and subtracting [1,l−1]. int

[EN] Writing a Compiler is Surprisingly Easy (part 1)

Part 1: expressions, arithmetic Part 2: statements, control flow Ever since I was a teenager I wanted to create my own systems programming language. Such a programming language would certainly have to be compiled to native code, which meant I'd have to write a compiler. Even though I managed to write several half-working parsers, I'd always fail at the stage of generating assembly code, as the task turned too complex. In this blog I intend to show my teenage self how writing a code generator is, in fact, not complex at all, and it can be fully done in a couple of weekends. (As long as we make some simplifying assumptions) I will that assume you already know that representing programs within a program is surprisingly easy Where do we even start? The goal today is to translate a high level language to x86-64 machine code. Our high-level language will have integer literals, variables, negation, and addition. To make our job simpler, we will translate the hig

[EN] Representing Programs Within Programs is Surprisingly Easy

Suppose that you want to create a brand new programming language that will change the world. Or that you want to auto-formatter that gets indentation just right . Or maybe you want to write a pre-processor for an existing language, to hack some missing feature into it . Whatever it may be, there are many reasons to write programs that manipulate other programs as data and the first step is to figure out how you are going to represent that data within the program. My goal in this article is to explain one of the most standard approaches (Abstract Syntax Trees) and show how they are actually really easy to understand and implement. This article has many code examples in the C language. What kind of language are we defining? Let's start with a simple example: a language with variables, numbers, and addition. Note that this language does not have statements, but only expressions that evaluate to simple values. This means that we will be able to represent expressions such as