From our (a programmer’s) perspective, main memory (think RAM, even though the story is more complicated) is a byte array indexed by addresses from 0 to 2^64 − 1 (on our 64-bit systems)
In assembly, we normally access memory indirectly using an address stored in a register or via a label
E.g., if a valid address is stored in %rax, the following will first save the number 1234 at that address and then move the contents of that address into %rbx:
# move the quadword 1234 into memory at the address stored in %rax
movq $1234, (%rax)
# move the quadword stored in memory, at the address stored in %rax, into %rbx
movq (%rax), %rbx
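The example above reaches memory through a register. For access via a label, a minimal sketch might look like the following. It assumes GNU as syntax on x86-64 Linux, where a label is commonly addressed relative to %rip; the label name counter is made up for illustration.

```asm
    .data
counter:                        # hypothetical label naming 8 bytes in .data
    .quad 0

    .text
    .global main
main:
    enter $0, $0
    movq $1234, counter(%rip)   # store the quadword 1234 at the address named by counter
    movq counter(%rip), %rcx    # load the quadword stored at counter into %rcx
    movq $0, %rax               # return 0
    leave
    ret
```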
We can also use offsets (displacement) to access memory a certain number of bytes before or after an address stored in a register:
# copy the 4 bytes at the address %rax - 8 to the address %rax + 4
movl -8(%rax), %ebx
movl %ebx, 4(%rax)
Remember: how much data is actually moved depends on the instruction's size suffix (the l in movl, the q in movq)
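As a sketch of the size rule, the lines below store the same constant with each of the four suffixes. This is a fragment rather than a full program, and it assumes %rax already holds a valid, writable address with at least 8 bytes available.

```asm
# assumption: %rax holds a valid, writable address with 8 bytes available
movb $1, (%rax)     # store 1 byte
movw $1, (%rax)     # store 2 bytes
movl $1, (%rax)     # store 4 bytes
movq $1, (%rax)     # store 8 bytes

movl (%rax), %ebx   # load 4 bytes; writing %ebx also zeroes the upper half of %rbx
movq (%rax), %rbx   # load 8 bytes
```

The register name has to match the suffix: %ebx is the 4-byte view of %rbx, so it pairs with movl.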
Note: x86 assembly does not allow you to move a value between two memory locations using just one mov instruction, so the following is not valid:
movl -8(%rax), 4(%rax) # ERROR
When our program is loaded into memory, some of the things that go there come directly from the executable: the code (.text) and global variables (.data)
Other things are created while the program is running: the stack and dynamically allocated memory (the heap)
A program’s memory space (that is, the portion of memory that a program can access and use) is partitioned into a few chunks (segments):
+-----------------------------------+ <- high address
|      Environment vars + args      |
+-----------------------------------+
|               STACK               |
|                 |                 |
|                 v                 |
|...................................|
|                                   |
|                                   |
|                                   |
|                                   |
|                                   |
|                                   |
|                                   |
|                                   |
|                                   |
|                                   |
|...................................|
|                 ^                 |
|                 |                 | Dynamically allocated memory
|               HEAP                |
+-----------------------------------+
|   Uninitialized globals (.bss)    |
+-----------------------------------+
|    Initialized globals (.data)    | .text, .data, .bss come from the executable
+-----------------------------------+
|                                   |
|           Code (.text)            |
+-----------------------------------+
|             OS stuff              |
+-----------------------------------+ <- low address
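To connect the diagram to source code, here is a small sketch of how each region is typically reached from assembly. It assumes GNU as on x86-64 Linux; the labels answer and scratch are made up for illustration.

```asm
    .data                       # initialized globals
answer:                         # hypothetical label
    .quad 42

    .bss                        # uninitialized globals, zero-filled when the program is loaded
scratch:                        # hypothetical label
    .zero 8

    .text                       # code
    .global main
main:
    enter $16, $0               # reserve 16 bytes on the stack at run time
    movq answer(%rip), %rax     # read from .data
    movq %rax, scratch(%rip)    # write to .bss
    movq %rax, -8(%rbp)         # write to the stack
    movq $0, %rax               # return 0
    leave
    ret
```

The heap is left out of this sketch, since using it means calling an allocator such as malloc.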
The stack is managed through two registers: %rsp (the stack pointer) and %rbp (the base pointer)
A stack frame is an area of the stack delimited by the registers %rbp and %rsp
Normally, every time a function is called, it sets up a stack frame for itself for storing local information
Once the function exits, the stack frame is released
Setting up the stack frame is exactly what the instruction enter does
On the other hand, releasing the stack frame is the job of leave: this cleans up whatever the function might have stored on the stack
Setting up the stack frame can also be achieved using the following pair of instructions:
pushq %rbp # save the previous stack frame base to the stack
movq %rsp, %rbp # copy the current stack pointer into the base pointer, creating a stack frame of size 0
leave can then be replaced by the following instructions:
movq %rbp, %rsp # drop the current stack frame by resetting %rsp to the base of the frame
popq %rbp # restore the previous frame base
As mentioned above, stack frames are useful for storing local information a function needs during its lifetime
This can be done either using push/pop or by using offsets from %rbp as local variables
How does this work?
First we need to allocate some number of bytes on the stack
Let’s say that we want to store two long variables (let’s call them a and b) on the stack
That’s a total of 16 bytes
We’ll tell enter that we want an initial stack frame of size 16 bytes instead of 0:
enter $16, $0
...
Now we can map a to -8(%rbp) and b to -16(%rbp) (remember that the stack grows downward!)
...
movq $42, -8(%rbp) # a = 42
movq $1, -16(%rbp) # b = 1
addq $12, -16(%rbp) # b += 12
# return b;
movq -16(%rbp), %rax
leave
ret
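The same computation can also be sketched with push/pop instead of offsets from %rbp. The function name push_pop_demo is hypothetical; the point is only that pushq stores a value and moves %rsp down by 8, while popq does the reverse.

```asm
    .text
    .global push_pop_demo

# long push_pop_demo(void)     (hypothetical function)
push_pop_demo:
    enter $0, $0
    pushq $42           # a = 42, stored at -8(%rbp)
    pushq $1            # b = 1, stored at -16(%rbp); %rsp now points at b
    addq $12, (%rsp)    # b += 12
    popq %rax           # return b (13); %rsp moves back up by 8
    leave               # resets %rsp to %rbp, which also discards a
    ret
```

Offsets from %rbp are usually easier to keep track of, because the location of each variable does not shift as values are pushed and popped.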
We’ll need to use local variables if:
In assembly, a function is represented as a label, a prologue (with stack frame setup), a body, and an epilogue (stack frame teardown and return)
In this class, we also add comments with the C signature and variable mappings (following the Assembly Design Recipe)
Here’s an example:
# long double(long x)
# x -> %rdi
double:
# PROLOGUE
enter $0, $0
# BODY
# return x + x;
movq %rdi, %rax
addq %rdi, %rax
# EPILOGUE
leave
ret
The ret instruction jumps back to the instruction right after the call that called the given function
How does ret know where to jump? The call instruction pushes the return address onto the stack just before it jumps to the function’s body; ret pops that address off the stack and jumps to it
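To see call and ret working together, here is a minimal sketch of a complete program that calls the double function from the example above. It assumes x86-64 Linux with main invoked by the C runtime (for example, when the file is assembled and linked with gcc).

```asm
    .text
    .global main

# long double(long x)
# x -> %rdi
double:
    enter $0, $0
    movq %rdi, %rax
    addq %rdi, %rax     # %rax = x + x
    leave
    ret                 # pops the return address pushed by call and jumps to it

# int main(void)
main:
    enter $0, $0
    movq $21, %rdi      # first argument: x = 21
    call double         # pushes the address of the next instruction, then jumps to double
    # execution resumes here after double's ret; %rax holds 42
    leave
    ret                 # main returns 42, which becomes the program's exit status
```

Running the program and then checking the shell's exit status (echo $? in most shells) should show 42.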
Now, go and read Nat Tuck’s Assembly Design Recipe
The recipe breaks the process of writing a function into 5 steps:
Our function will take an unsigned long and will return an unsigned long
# unsigned long fact(unsigned long n)
unsigned long fact(unsigned long n) {
    if (n < 2)
        return 1;
    else
        return n * fact(n - 1);
}
# n -> %rsi
Skeleton
…
Body
…
To be continued…