Once in the previous post we have obtained a way to execute our program with "stable" addresses we can now enter in the real game.
In this post we are going to cover the heap part; before starting coding a quick (and incomplete) overview is (again) necessary.
What we have to know about the heap (or data segment) is that is simply a portion of memory that the operating system give us to store data dynamically.
Again using the picture above we can see where the heap is typically located in the present day. We can also observe that it grows upward.
Obviously we all know that "dynamically" means we don't know the amount of memory necessary for the execution of the program before the execution itself.
The linux kernel gives us this classical UNIX interface to tackle the problem.
#include <unistd.h>
int brk(void *addr);
void *sbrk(intptr_t increment);
A call to brk sets the new end of the data segment to the value specified in addr, and on success returns 0.
Calling sbrk is possible to increment (or to decrement) the size of the data sector of a certain value. This function returns the new pointer to the end of the data segment; if the increment is 0 it returns the current end of the heap.
For more info on this two functions obviously the man page will be more exhaustive.
It's time now for two observations:
1 the entire problem of the dynamic allocation is brilliantly solved at the interface with only two simple functions.
2 thinking on it we realize that not both are needed to solve the problem. Maybe for this reason none of the two is specified either in the C or POSIX standards but traditionally UNIX systems implement at least one. In fact I'm just taking a look to the source of the UNIX Version 6 in the Lions' book and I find only trace of sbrk (at the time named sbreak). But the research provides me with another interesting/funny result in form of annotated comment:
/* break system call.
* -- bad planning: "break" is a dirty word in C.
*/
Luckily nobody's perfect...
Now that we have introduced the UNIX interface we can point out that in fact malloc is a function executed in user space by the standard library that effectively calls sbrk.
Instead of taking a look to the glibc implementation if you are interested in a simpler but working example of malloc implementation you can find it in the K&R. So now we have clear that all the allocators that we commonly use in the end pass trought either brk or sbrk calls. Working on the top of the operating system interface will make our implementation agnostic on the kind of memory allocator that is used by the program (if any is used).
Ok let's go with the implementation!
First thing to do when we launch our program is to find and to store the heap address. It's very important to do this action before every allocation in the data segment. According to the interface previously this simply translate in:
heap_start = sbrk(0);
Now we need the function, that we will call before freezing our program, to save the content of the data segment to file. With our previous knowledge the implementation is pretty straightforward. StoreHeap simply reads the current last address of the data segment and taking as argument the initial address, it stores the portion of memory between the two values directly onto a binary file named heap.data.
We now need the symmetric function that will load back the previously stored data.
This is a little bit more tricky... We are using fopen from stdio.h to keep our life as simple as possible, but fopen itself allocates some memory in the data segment.
We will have to:
- open the file
- store the new heap end
- allocate the necessary memory to load back our data
- write the data in memory (starting just after the portion allocated by fopen)
- close the file
- shift our heap content to the original position
- move the end of the data segment back in the original position.
We are almost there and we miss only the main function.
Here is an example (written as simple as possible):
You will easily notice a couple of details:
#include <malloc.h>
mallopt(M_MMAP_MAX, 0);
This is because in fact the kernel linux provides a third way to allocate memory that is Anonymous Memory Mapping through mmap.
Because this is typically convenient for big portion of memory, malloc sometimes could also use that.
If (as I'm now assuming) we will be interested in using malloc, it's a good idea to instruct this not to use mmap with the code just showed.
Moreover you will have noticed that I stored the value of str directly in the code with:
str = (void*)0x602010;
I did this simply because reloading the program we will not have the correct value of the pointer in str.
Because this variable is in the stack we will learn in the next post how to restore that. For now I've simply discovered the pointer value reading the print in the first execution and I've hardcoded it in the code.
Do the same because depending on various factor this address probably will be different on your computer!
Here is the example on how to how our little library:
And here the output on my computer.
To prove that we are not doing something too illegal here is the valgrind output with no errors.
Next post stack time!