                OVERVIEW of CHECKPOINT-RESTART

This document provides an overview of the homework. Some of the details
in this overview are clarified in 000-README and, in some cases, by small
example programs that illustrate a systems technique. Therefore, the
recommended reading order is:

  I.   OVERVIEW   - conceptual overview
  II.  000-README - step-by-step description of what to implement and how
  III. BACKGROUND - example programs that illustrate the systems
                    techniques used

However, if something is unclear, feel free to stop in the middle (for
example, in the middle of OVERVIEW) and read the appropriate section for
details (e.g., in 000-README).

====

The rest of this document has three parts, with three different views of
the code:
  A. SPATIAL (compile-time) VIEW
  B. TEMPORAL (runtime) VIEW
  C. DEBUGGING VIEW

A. SPATIAL (compile-time) VIEW:
  libckpt.so:
    constructor
      signal (register signal handler for SIGUSR2)
    signal handler
      getcontext => writes a new ucontext_t into a buffer
      save ucontext_t (save reg's)
        [ An older version based on sigsetjmp and sigjmp_buf exists;
          use getcontext/setcontext. ]
      save /proc/self/maps
  restart:
    See 000-README for details of this program.
  myckpt:
    Generated when checkpointing

B. TEMPORAL (runtime) VIEW:
  Phase 1: constructor called
    stack: constructor call frame
    constructor returns (does not call exit(0))
  Phase 2: main called
    stack: main -> other functions of user program
  Phase 3 (checkpoint): SIGUSR2 is sent
    stack: main -> other user fnc's -> (signal frame) -> my_signal_handler
      [ the signal frame is a call frame added by the kernel ]
    my_signal_handler  [ SEE: SPATIAL VIEW: libckpt.so: signal handler ]
      getcontext => writes a new ucontext_t into a buffer
      save ucontext_t (save reg's)
        [ For technical reasons, we recommend setcontext, _not_ sigsetjmp ]
      save /proc/self/maps
  Phase 4 (restart): restart program
    restore memory (restoring original /proc/self/maps in ckpt image)
      mmap() with extra PROT_WRITE, read()
    restore ucontext_t from ckpt image file (which contains the old
      registers)
    setcontext is called, but it returns from getcontext (see man page)
      [ This changes $pc, $sp, etc. ]
  Phase 5 (restart):
    The program counter changes from the restart program to the original
    user program because we called setcontext.
    Unlike sigsetjmp, getcontext returns 0 on success in both cases:
    when it is called directly, and when execution resumes from it
    because of setcontext (see man page). The return value therefore
    cannot tell us whether this is checkpoint or restart.
    We returned from getcontext into the signal handler function of
    Phase 3.
    To tell the two cases apart, we will have a _global_ variable,
    is_restart. For an example, see save-restore-context.c, where we
    declare 'static int is_restart = 0;'. At file scope, 'static' gives
    the variable static storage duration, so it lives in the data
    segment like any other global variable (it is only hidden from
    other source files).
    So:
      signal_handler():
        int getcontext(ucontext_t *ucp);
        if (is_restart == 0)  // This is checkpoint, Phase 3
        if (is_restart != 0)  // This is restart, Phase 5
        is_restart = 1;  // When we ckpt memory, is_restart
                         //   will be '1' in the ckpt file.
                         // Later during restart, we restore memory,
                         //   which has the value of is_restart as '1'.
        if this is restart (Phase 5), then:
          return from signal handler immediately

C. DEBUGGING VIEW:
  GDB:
    break my_constructor      (Phase 1)
    break main                (Phase 2)
    break my_signal_handler   (Phase 3)
    ./restart                 (Phase 4)
  To get from Phase 4 to Phase 5 under GDB, use '(gdb) si' when you
  start to execute setcontext().
    [ In fact, setcontext is written in assembly. ]

ADVANCED TIPS FOR DEBUGGING:
  DEBUGGING AT ASSEMBLY LEVEL with GDB
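
A small stand-alone practice target may help before stepping through the
real restart program. The sketch below is NOT the homework code and is
not the contents of save-restore-context.c; the names demo_context_jump
and saved_context are hypothetical. It assumes Linux with glibc. It stays
inside one process (no checkpoint file, no mmap) and shows only the
control flow of Phase 3 and Phase 5: getcontext returns 0 both times, so
is_restart is what tells the first pass from the resumed pass.

```c
#include <assert.h>
#include <stdio.h>
#include <ucontext.h>

/* Hypothetical names, for illustration only. */
static ucontext_t saved_context;      /* plays the role of the saved registers */
static volatile int is_restart = 0;   /* lives in the data segment */

/* Returns how many times execution passed the point just after
 * getcontext(), or -1 on error. */
int demo_context_jump(void)
{
    volatile int passes = 0;          /* volatile: always re-read from the stack */

    is_restart = 0;
    if (getcontext(&saved_context) != 0) {
        perror("getcontext");
        return -1;
    }

    /* Execution arrives here twice: once after the direct call, and once
     * after setcontext() below. getcontext() returned 0 in both cases. */
    passes++;

    if (is_restart == 0) {
        /* First pass (analogous to checkpoint, Phase 3). */
        is_restart = 1;               /* set BEFORE jumping, as before saving memory */
        printf("first pass: calling setcontext\n");
        setcontext(&saved_context);   /* does not return on success */
        perror("setcontext");
        return -1;
    }

    /* Second pass (analogous to restart, Phase 5). */
    printf("second pass: resumed after getcontext\n");
    assert(passes == 2);
    return passes;
}
```

To try it, call demo_context_jump() from a main() of your own and
compile with 'gcc -g'. Under GDB, one possible session is
'(gdb) break setcontext', then '(gdb) run', then repeated '(gdb) si'
to watch the registers being reloaded until the program counter is back
inside demo_context_jump.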