Some years ago I was studying some memory topics: in short, the kind of stuff that makes every C coder worthy of the name happy...
As a result I asked myself: is there a simple way to save "the state" of a generic C program?
It would be really nice to save the state of a running program to the hard disk and reload it when needed, restarting it from the saved state!
Even though I think the resulting solution is hardly applicable in production, I suppose it could be quite instructive for anyone trying to get into this kind of topic.
Because the main focus of this post is to build a simple example of a C program with freeze, save and restart capabilities, we will introduce some concepts only in a very informal way. Further research on these topics is left to the interested reader.
First we have to take a look at the classical memory layout of a C program.
Thanks to virtual memory, every program thinks it has all the memory to itself, from address 0 up to 2^64 (obviously talking about 64-bit systems).
We will see some more details later, but for now we will focus on 3 main points:
- Stack: The history of the called functions is on the stack (basically what gdb shows you with the backtrace command while stopped at a breakpoint). The values of local variables are stored there too. As you can see, the stack starts from a really high address and "grows" downward, toward zero.
- Heap: The heap (sometimes loosely referred to as the data segment) holds all the dynamically allocated memory, so typically most of our data.
- CPU registers: The registers hold values the processor uses during calculation. The register we are most interested in is the instruction pointer, also called the program counter, which simply stores the address of the next instruction the processor will execute. Instructions (i.e. our program) are stored in the text segment (see first figure). Clearly the instruction pointer is a critical value to save for our purpose, because it basically holds the current position of execution inside the code.
A last important point: for simplicity we assume for now a program without global variables (in general, using them is not good practice anyway).
Before starting we face the first problem...
Try to compile and execute this example:
Here is a screenshot of the output on my computer.
First we can observe that, as expected, the stack variable has a really high address. The address of the main function (which lives in the text segment) and that of the memory allocated on the heap via malloc are low values, also as expected.
But bad news!! The addresses of the memory allocated on the heap and on the stack change at every run!
In fact only the address of the main function (which is stored in the text segment) stays stable.
The problem is pretty clear: if heap and stack do not start at the same addresses as before, then when we reload the saved data into memory everything will sit at a new location, shifted by some offset. In this situation all the pointers will point to wrong addresses and nothing will work.
But with bad news comes good news too... In the lower buffer of the trusty emacs you can see the output of the same program running under gdb.
The good news is pretty evident: there the addresses are stable. Why?
The answer is Address Space Layout Randomization.
ASLR (to its friends) is a technique adopted to make attacks such as buffer overflow exploits harder; it basically consists in randomly changing the start addresses of the stack and the heap at every program run.
While this technique is good for security, it is not good for our purpose, so we first have to disable ASLR in some way.
Luckily gdb disables ASLR to provide stable addresses across debug sessions (a treatment that not all debuggers will grant you...)
So to do the same it was only necessary to take a quick look at the gdb source code and extract something like this:
We now have a tool that can spawn a child process after having set the "personality" of the executable in the proper way. This tells the kernel not to apply ASLR to the new process.
The next part follows here.