Debugging Programs Generated by Buggy Compilers
A test failed and produced an unexpected output. What do you do now? Sometimes staring at the phases of your compiler won’t help you in finding the bug. And even worse, sometimes staring at the generated assembly code won’t help you either. When this happens, the best course of action is to run your program and see what happens. This is exactly what dynamic analysis is about; it helps you understand what your code does by running it. In this blog post, I will explain how to debug a compiler by running the generated code. I will use a buggy version of the Adder compiler and Diamondback compiler as an example.
1 Setting up the environment
1.1 Dynamic debugging
To debug the Adder Compiler, we need to set up a few things. Firstly, we need a debugger, and we will use The GNU Project Debugger (GDB). On linux, GDB is usually installed by default. However, vanilla GDB does not have the best debugging experience, as you often have to type long commands to display the state of the program. To improve it, we will use GDB Enhanced Features (GEF). This is a plugin for GDB that adds many useful features, such as a context panel that shows the values of the registers, a stack panel that shows the state of the stack, a disassembly panel that shows the disassembly of the current function, and a trace panel that shows the call history of the program. To install it, use the install script provided in the repository. WARNING: On the Khoury machines, GEF won’t work because of a bug in the Python version installed on the machines. To fix it, you can either locally update the Python version or install peda instead of GEF; it’s pretty similar. There are also many other distributions of GDB that you can use, I highly recommend researching them and finding the one that suits you best.
1.2 Static debugging
Another way to debug the Adder Compiler is to use static analysis. This will allow you to
debug each phase of your compiler separately. The compiler is written in OCaml, and we can
use Dune to build it. Dune is a build system for OCaml
that is similar to Make. It is very easy to use and it is the standard build system for
OCaml projects. To install it, follow the instructions on the website.
Newer versions of the Snake Compiler allow to print a trace of the compiler’s execution by
using the -t
flag on the main
binary. However, the Adder Compiler does not have this feature,
therefore we will use dune utop
to trace the compilation process. dune utop
is a command
that starts an OCaml REPL with the project loaded. It is very useful for debugging and
experimenting with the code, as it will allow you to see the output of every function
call and to modify the code on the fly. In this blog, we will focus on dynamic debugging,
but I highly recommend using dune utop
or the pipeline trace to debug the compiler before trying to use GDB.
2 Debugging the Adder Compiler
Do Now!
Our motivating example for this blog post will be the following program:
(let ((x 1))
(let ((y 2))
(add1 y)))
The expected output of this program is 3
. However, the Adder Compiler generates the following
assembly code, which leads to an output of 2
:
section .text
global our_code_starts_here
our_code_starts_here:
mov [RSP + 8*-1], RAX
mov RAX, 1
mov [RSP + 8*-2], RAX
mov RAX, 2
mov RAX, [RSP + 8*-2]
add RAX, 1
ret
Most people will immediately notice exactly what is wrong with this code, but let’s ignore that for now. Let’s see how we can debug this program using GDB.
2.1 Running the program
Let’s start by running the program. We can do this by writing our program in a file
inside the ./input
directory, in this case ./input/buggy.adder
. We can then run
make output/buggy.run
to compile the program to a binary. We can then run the binary
in GDB by typing gdb output/buggy.run
. If you did everything correctly, you should
see the following output:
$ gdb output/buggy.run
<wall of text omitted...>
Reading symbols from output/buggy.run...
gef➤
Great! We are now inside the GDB debugger. Let’s run the program by typing run
or r
.
gef➤ run
Starting program: /home/federico/adder/output/buggy.run
[Inferior 1 (process 1) exited normally]
2
Alright, so the program exited normally and printed 2
as noted above. Let’s see what
happened. We can use the disassemble
or disas
command to see the assembly code of a function.
But first, we want to set the assembly code to be displayed in Intel syntax, as it is
the one we are used to. We can do this by typing set disassembly-flavor intel
.
gef➤ set disassembly-flavor intel
gef➤ disas our_code_starts_here
Dump of assembler code for function our_code_starts_here:
0x00005555555551c0 <+0>: mov QWORD PTR [rsp-0x8],rax
0x00005555555551c5 <+5>: mov eax,0x1
0x00005555555551ca <+10>: mov QWORD PTR [rsp-0x10],rax
0x00005555555551cf <+15>: mov eax,0x2
0x00005555555551d4 <+20>: mov rax,QWORD PTR [rsp-0x10]
0x00005555555551d9 <+25>: add rax,0x1
0x00005555555551dd <+29>: ret
Cool, so far nothing new. We can now break
at a specific line of the assembly code
to stop the program at that point. We can do this by typing break *0x00005555555551c0
,
to stop at the first instruction of the function —break *our_code_starts_here
. We can then run the program again
by typing run
or r
. This time, the program will stop at the first instruction of the
function.
gef➤ break *our_code_starts_here
Breakpoint 1 at 0x5555555551c0
gef➤ run
2.2 Viewing the context
Now, you will see A LOT on your screen. This is because GEF is displaying the context
of the program. The context is the state of the program at the current point of execution.
You can bring up the context panel by typing context
or ctx
.
The context panel is divided into 4 sections: the registers, the stack, the disassembly,
and the trace (also threads, but we don’t care about that in our case).
Each value on the screen will be colored in a different way, depending on its type;
GEF will display a legend at the top of the context panel to explain what each color
means. For example, if the the register was modified by the last instruction, it will
be colored in red.
Let’s start to make sense of this. First, let’s look at the registers.
───────────────────────────────────────────────── registers ────
$rax : 0x0
$rbx : 0x007fffffffe198 → 0x007fffffffe51d → "omitted"
$rcx : 0x00555555557dc8 → 0x005555555550f0 → endbr64
$rdx : 0x007fffffffe1a8 → 0x007fffffffe563 → "SHELL=/usr/bin/zsh"
$rsp : 0x007fffffffe058 → 0x00555555555186 → <main+54> mov rsi, rax
$rbp : 0x007fffffffe080 → 0x0000000000000001
$rsi : 0x007fffffffe198 → 0x007fffffffe51d → "omitted"
$rdi : 0x1
$rip : 0x005555555551c0 → <our_code_starts_here+0>
$r8 : 0x0
$r9 : 0x007ffff7fced70 → endbr64
$r10 : 0x007fffffffddb0 → 0x0000000000800000
$r11 : 0x202
$r12 : 0x0
$r13 : 0x007fffffffe1a8 → 0x007fffffffe563 → "SHELL=/usr/bin/zsh"
$r14 : 0x00555555557dc8 → 0x005555555550f0 → endbr64
$r15 : 0x007ffff7ffd000 → 0x007ffff7ffe2c0 → 0x00555555554000
Each register will display its name and its value. If the value is a pointer, GEF will
follow the pointer and display the value it points to, recursively. For example, the
$rsp
register points to 0x007fffffffe058
, which points to 0x00555555555186
, which
points to <main+54> mov rsi, rax
. This is the return address of the main
function in the C runtime,
where the program will return after the our_code_starts_here
function returns.
Let’s now look at the stack.
───────────────────────────────────────────────── stack ────
0x007fffffffe058│+0x0000: 0x00555555555186 → <main+54> ← $rsp
0x007fffffffe060│+0x0008: 0x0000000000000000
0x007fffffffe068│+0x0010: 0x007fffffffe198 → 0x007fffffffe51d → "..."
0x007fffffffe070│+0x0018: 0x0000000000000001
0x007fffffffe078│+0x0020: 0x07503e7a1bd6eb00
0x007fffffffe080│+0x0028: 0x0000000000000001 ← $rbp
0x007fffffffe088│+0x0030: 0x007ffff7ddb790 → mov edi, eax
0x007fffffffe090│+0x0038: 0x007fffffffe188 → 0x00000000000038 ("8"?)
On this panel, we can see the a window of the stack, with addresses increasing from
top to bottom. More notably GEF will display
where the $rsp
and $rbp
registers are pointing to. Additionally, GEF will display
the offsets of the stack from the $rsp
register. This will be useful to visualize
the stack when we will start to modify it.
Finally, let’s look at the disassembly.
───────────────────────────────────────────────── code:x86:64 ────
0x5555555551b6 <main+102> ret
0x5555555551b7 <main+103> call 0x555555555030 <__stack_chk_fail@plt>
0x5555555551bc nop DWORD PTR [rax+0x0]
→ 0x5555555551c0 <our_code_starts_here+0> mov QWORD PTR [rsp-0x8], rax
0x5555555551c5 <our_code_starts_here+5> mov eax, 0x1
0x5555555551ca <our_code_starts_here+10> mov QWORD PTR [rsp-0x10], rax
0x5555555551cf <our_code_starts_here+15> mov eax, 0x2
0x5555555551d4 <our_code_starts_here+20> mov rax, QWORD PTR [rsp-0x10]
0x5555555551d9 <our_code_starts_here+25> add rax, 0x1
This panel displays the disassembly of the code. We can see that the program is
currently executing the first instruction of the our_code_starts_here
function,
which is mov QWORD PTR [rsp-0x8], rax
. This instruction is moving the value of the
$rax
register into the memory address pointed by $rsp-0x8
. The next
instruction is mov eax, 0x1
, which is moving the value 0x1
into the $rax
register.
2.3 Stepping through the program
We can step through every instruction of the program by typing stepi
or si
. Every time
we step, GEF will update the context panel to reflect the new state of the program.
gef➤ si
rax
contains the value 0x0
, so now we would expect the value of the memory address
pointed by $rsp-0x8
to be 0x0
. Let’s check.
gef➤ x/gx $rsp-0x8
0x7fffffffe050: 0x0000000000000000
x
is the examine
command, which allows us to examine the memory. gx
is a specifier,
which tells GEF to display the value of the memory address as a hexadecimal (the x
)
and to display it as a 64-bit value (the g
). $rsp-0x8
is the address of the memory
we want to examine.
Let’s continue stepping through the program.
gef➤ si
Now, rax
contains the value 0x1
, we can see this by looking at the context panel.
2.4 Modifying the program
We can modify the value of the registers by typing set $register=value
. For example,
let’s set the value of $rax
to 0x2
.
gef➤ set $rax=0x2
Now, if we step through the program, to the next instruction (mov QWORD PTR [rsp-0x10], rax
),
we can see that the value of the memory address pointed by $rsp-0x10
is now 0x2
, which
is what we wanted to see given our input program.
gef➤ si
gef➤ x/gx $rsp-0x10
0x7fffffffe048: 0x0000000000000002
Let’s now break at the end to see what our_code_starts_here
returns.
gef➤ disas our_code_starts_here
Dump of assembler code for function our_code_starts_here:
0x00005555555551c0 <+0>: mov QWORD PTR [rsp-0x8],rax
0x00005555555551c5 <+5>: mov eax,0x1
0x00005555555551ca <+10>: mov QWORD PTR [rsp-0x10],rax
0x00005555555551cf <+15>: mov eax,0x2
0x00005555555551d4 <+20>: mov rax,QWORD PTR [rsp-0x10]
0x00005555555551d9 <+25>: add rax,0x1
0x00005555555551dd <+29>: ret
End of assembler dump.
gef➤ b *our_code_starts_here+29
Breakpoint 1 at 0x5555555551dd
We can resume the program by typing continue
or c
. We will then hit the breakpoint
we just set.
gef➤ c
Continuing.
We can see that the value of $rax
is 0x3
, which is the value we wanted to see.
gef➤ p $rax
$1 = 3
2.5 Diagnosis
So why was our program not working? Well, we were not setting the value of $rax
to
0x2
before the mov QWORD PTR [rsp-0x10], rax
instruction. Instead, we were setting
the value of $rax
to 0x2
after the mov QWORD PTR [rsp-0x10], rax
instruction.
This means that the value of the memory address pointed by $rsp-0x10
was 0x1
, because
of the compilation of the previous binding, x
.
3 Debugging Call Stack Issues
Do Now!
Now that we are familiar with basic debugging, let’s look at a more complex example. We will take a look at a compiler with support for basic function definitions and calls. We will use the following program as input to our compiler.
def add(x, y):
let res = x + y in
res
let n1 = add(1, 2) in
let n2 = add(4, 5) in
add(n1, n2)
When the program is compiled and run, it should print 12
. However, when we run the program, we
get the following output.
$ ./output/buggy_stack.run
Segmentation fault (core dumped)
A segmentation fault is a type of error that occurs when a program tries to access memory that it does not have access to. This is usually caused by a bug in the program, which typically involves a value that is wrongfully interpreted as a memory address.
Let’s look at the assembly produced by our compiler.
section .text
global our_code_starts_here
add:
push RBP ; save old base pointer
mov RBP, RSP ; set new base pointer
sub RSP, 8 ; allocate space for local vars
mov RAX, [RBP+16]
add RAX, [RBP+24]
mov [RBP-8], RAX
mov RAX, [RBP-8]
add RSP, 8 ; pop local vars
mov RSP, RBP ; restore stack pointer
pop RBP ; restore base pointer
ret
our_code_starts_here:
push RBP ; save old base pointer
mov RBP, RSP ; set new base pointer
sub RSP, 16 ; allocate space for local vars
mov R11, 4
push R11 ; push arg number 1
mov R11, 2
push R11 ; push arg number 2
call add
mov [RBP-8], RAX
mov R11, 10
push R11 ; push arg number 1
mov R11, 8
push R11 ; push arg number 2
call add
mov [RBP-16], RAX
mov R11, [RBP-16]
push R11 ; push arg number 1
mov R11, [RBP-8]
push R11 ; push arg number 2
call add
add RSP, 16 ; pop local vars
pop RBP
ret
Unless you have worked with stack frames before, the issue is not immediately obvious. Let’s debug the program in GDB to see when the segmentation fault occurs.
$ gdb output/buggy_stack.run
To debug these kinds of call stack issues, it’s useful to add breakpoints before and after every function call. This way, we can see what the call stack looks like in between calls.
gef➤ b *&our_code_starts_here+24 # break at the first call to add
Breakpoint 1 at 0x5555555556c9
gef➤ b *&our_code_starts_here+49 # break at the second call to add
Breakpoint 2 at 0x5555555556e2
gef➤ b *&our_code_starts_here+70 # break at the third call to add
Breakpoint 3 at 0x5555555556f7
gef➤ b *&our_code_starts_here+80 # break on return instruction
Breakpoint 4 at 0x555555555701
We can run our program with r
and cycle through the breakpoints with n
.
gef➤ r
gef➤ n
gef➤ n
gef➤ n
gef➤ n
Having reached the last breakpoint without the program crashing, we can see in the stack panel
that the value of $rsp
is misaligned, which is causing the segmentation fault.
───────────────────────────────────────────────── stack ────
0x007fffffffdfe8│+0x0000: 0x0000000000000a ("\n"?) ← $rsp
0x007fffffffdff0│+0x0008: 0x0000000000000002
0x007fffffffdff8│+0x0010: 0x0000000000000004
0x007fffffffe000│+0x0018: 0x0000000000000012
0x007fffffffe008│+0x0020: 0x0000000000000006
0x007fffffffe010│+0x0028: 0x007fffffffe040 → 0x0000000000000001
0x007fffffffe018│+0x0030: 0x00555555555652 → <main+50> mov QWORD PTR [rsp], rax
0x007fffffffe020│+0x0038: 0x0000000000000000
When the ret
address is executed, the program tries to jump to the address pointed by
$rsp
, which is 0x0000000000000a
, which is not a valid address.
Instead, $rsp
should have pointed to the 0x00555555555652
address, which is the address of the mov QWORD PTR [rsp], rax
instruction,
making the control flow of the program jump to the frame in the stack of the main
function.
Let’s rewind the program to the first breakpoint and look at the stack before the first call
gef➤ r
───────────────────────────────────────────────── stack ────
0x007fffffffdff0│+0x0000: 0x0000000000000002 ← $rsp
0x007fffffffdff8│+0x0008: 0x0000000000000004
0x007fffffffe000│+0x0010: 0x0000000000000000
0x007fffffffe008│+0x0018: 0x0000000000000000
0x007fffffffe010│+0x0020: 0x007fffffffe040 → 0x0000000000000001 ← $rbp
0x007fffffffe018│+0x0028: 0x00555555555652 → <main+50> mov QWORD PTR [rsp], rax
0x007fffffffe020│+0x0030: 0x0000000000000000
We can see that both 2
(1
in snakeval) and 4
(2
in snakeval) are pushed to the stack,
nothing wrong here. Let’s continue to the next breakpoint.
gef➤ n
───────────────────────────────────────────────── stack ────
0x007fffffffdfe0│+0x0000: 0x0000000000000008 ← $rsp
0x007fffffffdfe8│+0x0008: 0x0000000000000a ("\n"?)
0x007fffffffdff0│+0x0010: 0x0000000000000002
0x007fffffffdff8│+0x0018: 0x0000000000000004
0x007fffffffe000│+0x0020: 0x0000000000000000
0x007fffffffe008│+0x0028: 0x0000000000000006
0x007fffffffe010│+0x0030: 0x007fffffffe040 → 0x0000000000000001 ← $rbp
0x007fffffffe018│+0x0038: 0x00555555555652 → <main+50> mov QWORD PTR [rsp], rax
Now, this is interesting. We can see that 2
and 4
are still on the stack, but
8
(4
in snakeval) and 10
(5
in snakeval) are also on the stack.
What should have happened is that 2
and 4
should have been popped off the stack
after the first call to add
, and 8
and 10
should have taken their place.
Additionally, we see that the $rsp
register is advanced by 16
more
bytes than it should be.
Let’s go back to the assembly code of our_code_starts_here
and see what’s going on.
gef➤ disas our_code_starts_here
Dump of assembler code for function our_code_starts_here:
0x00005555555556b1 <+0>: push rbp
0x00005555555556b2 <+1>: mov rbp,rsp
0x00005555555556b5 <+4>: sub rsp,0x10
0x00005555555556b9 <+8>: mov r11d,0x4
0x00005555555556bf <+14>: push r11
0x00005555555556c1 <+16>: mov r11d,0x2
0x00005555555556c7 <+22>: push r11
0x00005555555556c9 <+24>: call 0x555555555690 <add>
0x00005555555556ce <+29>: mov QWORD PTR [rbp-0x8],rax
0x00005555555556d2 <+33>: mov r11d,0xa
0x00005555555556d8 <+39>: push r11
0x00005555555556da <+41>: mov r11d,0x8
0x00005555555556e0 <+47>: push r11
=> 0x00005555555556e2 <+49>: call 0x555555555690 <add>
0x00005555555556e7 <+54>: mov QWORD PTR [rbp-0x10],rax
0x00005555555556eb <+58>: mov r11,QWORD PTR [rbp-0x10]
0x00005555555556ef <+62>: push r11
0x00005555555556f1 <+64>: mov r11,QWORD PTR [rbp-0x8]
0x00005555555556f5 <+68>: push r11
0x00005555555556f7 <+70>: call 0x555555555690 <add>
0x00005555555556fc <+75>: add rsp,0x10
0x0000555555555700 <+79>: pop rbp
0x0000555555555701 <+80>: ret
End of assembler dump.
We are pushing 2
and 4
to the stack, then calling add
, then pushing 8
and 10
to the stack, then calling add
again. However, we are never
popping the arguments off the stack after the function returns. This is why the stack
is messed up.
3.1 Diagnosis
Whenever our_code_starts_here
calls the add
function, it pushes the arguments to the
stack. However, it does not pop the arguments off the stack after the function returns.
The easiest way to fix this is to add a add RSP, 16
instruction after every call add
,
which will add 16
to the value of $rsp
, which will effectively pop the arguments off
the stack.