Hi Prof. Cooperman,

Here are the notes I took on Friday morning. I hope they'll be helpful to everyone. Sorry for the delay on this! Also, do you have links to the three papers you mentioned on interposition, shadow device drivers, and process virtualization?

See you tomorrow!
-Austin

01-13-17 Notes

Virtualization: Interposition on complex systems

Examples:
- Wrapper functions
  - The linker, ld, has --wrap (creates a wrapper at link time)
  - Shared libraries (e.g. .so)
    - man dlsym
  - Modify the ELF tables
    - Symbol table, relocation table
  - Interposition techniques (paper from the 90s)
- Wrapping signals
  - In Linux, can write a kernel module to change the signal table
- Device drivers
  - Use virtualization based on 5 or 6 I/O calls
    - open, read, write, close...
- Kernel tables
  - Different ways to virtualize the pid (e.g. for checkpointing)
    - could write a kernel module to modify the pid
    - hope it doesn't matter if the pid is different
    - expose the pid and allow the user to modify it
- DMTCP example
  - Wrap around syscalls
  - Checkpoint / restart (save/restore workspace)
  - Interpose on the checkpoint or restart event
  - Example of Intel using DMTCP (Electronic Design Automation)
    - Test if MS Office runs on a new chip; don't want to boot Windows every time
    - Create a semantically equivalent model of the computation
- Control groups (cgroups)
  - Allow assignment of cores, CPU time, etc.
- Union file system
  - Containers are "lightweight virtual machines"
  - Read-only layer, read-write layer

Examples of "full" virtualization (full in the sense that it encompasses a "complete" system):
- Virtual machines
  - Interpose on the hardware
  - Snapshots save the state of the system
- Linux containers
  - Consist of namespaces, cgroups, union filesystem
  - Interpose on the kernel
  - Save the state by saving the read/write portion of the filesystem
- Process virtualization
  - Interpose on the runtime library (libc and friends)
- Language virtual machines (e.g. Java)
  - Checkpoints aren't usually done at this level

Layers for interposition:
- Application
- ELF tables
- Runtime libraries  <- POSIX layer or kernel calls (Linux API), proc filesystem
  - % libnetwork.so
  - % libEDA.so
  - % libmypid.so
  - libc.so
  (% => our added layers)
- Kernel
- Hardware

Save/restore models:
- Virtual machine: snapshot (full OS, processes and filesystem)
- Container: filesystem (namely the read/write portion)
- Process virtualization: checkpointing
- Language VM: save the state of the language VM

Save/restore leads to migration (e.g. load balancing of servers):
- Copy the state to another machine and restore
- Or, live migration: skip the middle man
  - Uses kernel support for COW (copy on write) to incrementally move the process
  - Fork the process; the child is migrated
  - In that time, note any progress the parent made and do copy-on-write to the child
  - Suspend the parent

Live migration will use:
  /proc/*/clear_refs
  /proc/*/pagemap

How do we migrate between servers and keep the connection alive?
  a. Create a stub process on the old machine and relay through it
  b. Use network tricks to advertise the new listener port

Example usage in the 90s:
- Condor (we'll see it in the paper)

Wrapper functions (virtualization of pids):

Create a library (e.g. libmypid.so):

    pid_t getpid() {
        pid_t rc = NEXTFUNC(getpid)();
        pid_t virtual_pid = real_to_virtual(rc);
        return virtual_pid;
    }

Utilities:
    real_to_virtual()
    virtual_to_real()

Assume dynamically linked shared libraries (implies a library search order).
The search is greedy (first match wins), so our wrapper is found first:
    ... libmypid.so ... libc.so ...
    (LD_PRELOAD=libmypid.so => found in man ld.so)

    NEXTFUNC(getpid)() => (*dlsym(RTLD_NEXT, "getpid"))()

Now we can have nested wrappers to give us layer modularity.
- Originally led to a problem in DMTCP

Layers of interposition:
- Requires nested wrappers
    libmypid.so ... libc.so
- Calling down to libc works; the problem arises at checkpoint/restore events
  - Register callbacks top-down (respect the library search order)
  - On restore we would also register top-down, which is WRONG
    (look at C++ constructors or pthread_atfork() for the correct method)
- Another problem => at restore, need to make sure we get the updated real pid
  (libc used to cache the old one)

Symbol tables:
- name (string)
- address

Relocation tables:
- call site (address of the instruction)
- name (string)

Internal symbols vs. external (dynamic) symbols:
- Externals are registered in the relocation table
- Internals are harder to wrap but can be done
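
To make the getpid wrapper from the "Wrapper functions" notes concrete, here is a minimal sketch of what libmypid.so might look like. The fixed VIRT_OFFSET translation and the kill() wrapper are my own assumptions for illustration (a real pid layer would presumably keep a table); only getpid(), dlsym() and RTLD_NEXT come from the notes.

```c
#define _GNU_SOURCE            /* exposes RTLD_NEXT in <dlfcn.h> */
#include <assert.h>
#include <dlfcn.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical translation: a fixed offset stands in for a real pid table. */
#define VIRT_OFFSET 100000

static pid_t real_to_virtual(pid_t real) { return real + VIRT_OFFSET; }
static pid_t virtual_to_real(pid_t virt) { return virt - VIRT_OFFSET; }

/* Wrapper: shadows libc's getpid() because this object is searched first. */
pid_t getpid(void)
{
    static pid_t (*next_getpid)(void);
    if (next_getpid == NULL)
        next_getpid = (pid_t (*)(void))dlsym(RTLD_NEXT, "getpid");
    assert(next_getpid != NULL);
    return real_to_virtual(next_getpid());
}

/* Calls that take a pid need the inverse translation on the way down. */
int kill(pid_t pid, int sig)
{
    static int (*next_kill)(pid_t, int);
    if (next_kill == NULL)
        next_kill = (int (*)(pid_t, int))dlsym(RTLD_NEXT, "kill");
    assert(next_kill != NULL);
    /* pid <= 0 addresses process groups; pass those through untouched. */
    return next_kill(pid > 0 ? virtual_to_real(pid) : pid, sig);
}
```

Built as a shared library (e.g. gcc -shared -fPIC -o libmypid.so mypid.c, adding -ldl on older glibc) and loaded with LD_PRELOAD=./libmypid.so, the application would see only virtual pids, while libc and the kernel keep seeing real ones.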
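
On the callback-ordering problem in the "Layers of interposition" notes: I read the pointer to C++ constructors and pthread_atfork() as meaning that restore hooks should run in the reverse of the checkpoint order, so the layer nearest libc is rebuilt before the layers that call down through it. The sketch below assumes that reading; the registry, hook names and trace letters are all hypothetical and not DMTCP's actual API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LAYERS 8

typedef void (*hook_fn)(void);

struct layer {
    const char *name;
    hook_fn pre_checkpoint;
    hook_fn post_restart;
};

static struct layer layers[MAX_LAYERS];
static size_t n_layers;

/* Layers register top-down, i.e. in library search order. */
static void register_layer(const char *name, hook_fn pre, hook_fn post)
{
    assert(n_layers < MAX_LAYERS);
    layers[n_layers++] = (struct layer){ name, pre, post };
}

/* Checkpoint: outermost layer first. */
static void run_pre_checkpoint(void)
{
    for (size_t i = 0; i < n_layers; i++)
        layers[i].pre_checkpoint();
}

/* Restart: reverse order, so the lowest layer is restored first. */
static void run_post_restart(void)
{
    for (size_t i = n_layers; i-- > 0; )
        layers[i].post_restart();
}

/* Demo hooks that record the order in which they ran. */
static char trace[32];
static size_t trace_len;
static void note(char c) { if (trace_len < sizeof trace - 1) trace[trace_len++] = c; }

static void net_pre(void)  { note('N'); }
static void net_post(void) { note('n'); }
static void pid_pre(void)  { note('P'); }
static void pid_post(void) { note('p'); }

static const char *demo(void)
{
    n_layers = 0;
    trace_len = 0;
    register_layer("libnetwork.so", net_pre, net_post);
    register_layer("libmypid.so", pid_pre, pid_post);
    run_pre_checkpoint();
    run_post_restart();
    return trace;
}
```

With this ordering, libnetwork.so's restart hook can already rely on libmypid.so's translation being valid again when it runs.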