Debugging and Other Systems Tricks:

Copyright © Gene Cooperman, 2017, 2018.
This may be freely copied and modified as long as this copyright notice remains. I would appreciate any enhancements being sent back for possible inclusion here.

If the suggestions are unclear, use "man" to find out more about the commands.

  1. pgrep -n a.out
    pkill -9 a.out
    where you replace a.out by the name of your binary (-n selects the newest matching process). Also, consider pgrep with -o (the oldest matching process), and with no flags.
  2. GDB (beyond the basics): The use of gdb is essential.
    1. To use GDB, you must compile for debugging. With gcc or g++, I use the following flags:
      gcc -g3 -O0 FILE.c
      NOTE: -O0 is for optimization level 0. Otherwise, the compiler will intermix assembly instructions from different source-level statements. The -g3 flag says to include debugging information, including values of C/C++ macros (level 3 debugging). The simpler -g flag will not save the values of your C macros.
    2. I invoke gdb --args <COMMAND STRING>
      This form allows you to include the command and its arguments.
    3. Read up on the basic GDB commands: break, run, continue, step, next, where, frame, list, print
    4. An example of a common pattern that I use is:
      (gdb) info threads # What threads are there?
      (gdb) thread 2 # Show me thread number 2
      (gdb) where 5  # Show me the 5 innermost (most recent) call frames of the stack for this thread
      (gdb) where -5 # Show me the 5 outermost (oldest) call frames of the stack
      (gdb) frame 3  # Show me call frame number 3
      (gdb) print x  # Show me the local variable x of call frame number 3
      (gdb) list
      (gdb) macro expand MACRO_CALL(ARGS)
      Do (gdb) help COMMAND to see the meanings of the commands.
      Also, consider (gdb) apropos kernel-calls (or whatever).
    5. Within gdb, try this ASCII graphics mode (the TUI): ^Xa (control-x a)
    6. Include useful utility functions in your code. You can then do things like:
      (gdb) print my_utility_len_linked_list(mylist)
      This also works on system calls:
      (gdb) print lseek(4, 0, 1)
      This reports the current file offset for file descriptor number 4 without moving it, because the final 1 corresponds to SEEK_CUR. (A final 0, SEEK_SET, would instead reset the offset to 0.) The values are found from grep -R SEEK_CUR /usr/include . Similarly, you can find out what fd 4 corresponds to:
      (gdb) set $pid = getpid()
      (gdb) eval "shell ls -l /proc/%d/fd", $pid
      NOTE: A plain shell ls -l /proc/$pid/fd does not work, because the shell (not GDB) would expand $pid. Also, getpid() can be called only after the gdb run (after the target is running).
    7. In some cases, GDB where will show you the file and line number of a call frame, but GDB list will not show you the source code. This is because GDB has a default directory search path that does not yet include the directory for the current call frame. The solution is:
      (gdb) dir PREFIX_DIR_PATH
      As an example, suppose libc.so crashes on you. And the call frame shows that you are at line 100 in glibc: "io/readlink.c:100"
      % ldd /bin/ls
            ...
            libc.so.6 => /lib64/libc.so.6 (0x00007f9ab5d8f000)
      % ls -l /lib64/libc.so.6
      lrwxrwxrwx. 1 root root 12 Dec 23 11:42 /lib64/libc.so.6 -> libc-2.17.so
      % wget .../glibc-2.17.tar.gz # From GLIBC downloads on the web
      % tar xf glibc-2.17.tar.gz
      And now you are ready to see the source code of the call frame inside glibc, to help in debugging:
      (gdb) dir DOWNLOAD_DIR/glibc-2.17
      (gdb) list # Should now show source code for io/readlink.c:100
    8. set follow-fork-mode child will have GDB follow the child process on fork. set follow-fork-mode parent will have it follow the parent process. break fork will have GDB stop before executing the fork system call.
    9. A more advanced alternative to set follow-fork-mode is set detach-on-fork off . This will suspend either the parent or child (whichever process you are not debugging). Then use info inferiors and inferior INFERIOR_NUMBER to choose which process to debug currently. (This extends the hierarchy of what to debug by using: "frame NUMBER", "thread NUMBER", "inferior NUMBER".)
    10. If your program under GDB exits too soon, execute both lines below:
      (gdb) break _exit
      (gdb) break exit
      (You can also break on other system calls in libc.)
    11. gdb a.out <PID>
      OR:
      gdb a.out
      (gdb) attach <PID>
      (where the attach command is given within gdb); a convenient single command that finds the PID is:
      gdb a.out `pgrep a.out | tail -1`
    12. Newer Linux kernels may be configured to not allow attaching. If so, try (as root): echo 0 > /proc/sys/kernel/yama/ptrace_scope
      Or else, add prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0); to the source code of your target early (e.g., in main or in a constructor like DmtcpWorker::DmtcpWorker()), and re-compile.
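      Before changing the setting, it can help to check what the current value means. The following is a minimal sketch in Python that maps the four Yama ptrace_scope values to their meanings, as described in the kernel's Yama documentation. The function names are hypothetical, and the file is absent on kernels built without Yama.

```python
"""Interpret /proc/sys/kernel/yama/ptrace_scope (a sketch; names are illustrative)."""

PTRACE_SCOPE_PATH = "/proc/sys/kernel/yama/ptrace_scope"

# Meanings of the four Yama modes, per the kernel's Yama documentation.
MEANINGS = {
    0: "classic: any process may attach to another process of the same user",
    1: "restricted: attach only to descendants, or to targets that called "
       "prctl(PR_SET_PTRACER, ...)",
    2: "admin-only: attaching requires CAP_SYS_PTRACE (e.g., root)",
    3: "no attach: ptrace attach is disabled until reboot",
}


def explain_ptrace_scope(value):
    """Return a one-line explanation for a ptrace_scope value."""
    return MEANINGS.get(int(value), "unknown value: %r" % (value,))


def read_ptrace_scope(path=PTRACE_SCOPE_PATH):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    scope = read_ptrace_scope()
    if scope is None:
        print("ptrace_scope not available (kernel without Yama?)")
    else:
        print("ptrace_scope = %d: %s" % (scope, explain_ptrace_scope(scope)))
```

      A value of 1 is the case where the prctl(PR_SET_PTRACER, ...) call in the target is enough; values 2 and 3 need root or a reboot, respectively.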
    13. Consider delay loops in combination with "gdb attach". These are better than calls to 'sleep' since they do not access external libraries. For example, insert into your source code:
      {volatile int x = 1; while(x);}
      Then do "gdb attach" and:
      (gdb) print x=0
      An alternate form to stop at the 5th occurrence is:
      {static volatile int x = 1; if (x++ >= 5) while(x);}
    14. To use gdb with assembly, consider x/10i $pc, and x/10i $pc-12, etc. A convenient version is: (gdb) display/5i $pc followed by stepi (si) or nexti (ni). To set a breakpoint in assembly, try: (gdb) break *0xbfdea000 for a breakpoint at the given address.
    15. To use gdb on multi-threaded programs, try:
      (gdb) info threads
      (gdb) thread <NUM>
      (gdb) thread apply all where full
      (gdb) set scheduler-locking on

      NOTE: the last command (scheduler-locking) has the possibility of creating deadlock -- for example, if one thread is holding a low-level libc or C++ lock, and you try to advance a second thread whose execution requires the lock.
    16. If the code segfaults, let it create a core dump:
      ulimit -c unlimited
      Then: gdb a.out core
      If your core dump is not in the local directory, then note that some recent Linux distros or WSL are hiding the core dump. Try ls core.* or on some machines, either:
      1. examine /proc/sys/kernel/core_pattern to find where the core file was saved; or
      2. use coredumpctl to access the core file.
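      To decide which of the two options applies, look at the first character of core_pattern: a leading '|' means the kernel pipes the dump to a helper program instead of writing a file. The following is a minimal sketch of that check in Python; the function name is hypothetical.

```python
"""Explain where the kernel sends core dumps (a sketch; names are illustrative)."""

CORE_PATTERN_PATH = "/proc/sys/kernel/core_pattern"


def describe_core_pattern(pattern):
    """Classify a core_pattern string as piped, absolute, or relative."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading '|' means the kernel pipes the dump to a helper program.
        parts = pattern[1:].split()
        helper = parts[0] if parts else "(none)"
        return "piped to helper program: " + helper
    if pattern.startswith("/"):
        return "written to an absolute path: " + pattern
    return ("written relative to the crashing process's working directory: "
            + pattern)


if __name__ == "__main__":
    try:
        with open(CORE_PATTERN_PATH) as f:
            print(describe_core_pattern(f.read()))
    except OSError:
        print("cannot read " + CORE_PATTERN_PATH)
```

      If the helper turns out to be systemd-coredump, then coredumpctl is the way to reach the core file.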
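    17. In rare cases, you may want to debug inside the glibc source code, to see why it does something. For this, consider both the earlier tip on pointing GDB at a downloaded glibc source tree with dir, and the later tip on installing a libc.so with debugging symbols.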
    18. You may sometimes see GDB saying Cannot insert breakpoint. The solution to this is:
      (gdb) set breakpoint always-inserted on
      For details, try: help set breakpoint always-inserted.
    19. To use gdb on programs generating multiple processes, try:
      (gdb) set detach-on-fork off
      (gdb) info inferiors
      (gdb) inferior <NUM>

      NOTE: info inferiors will list each thread of a new child process as a separate inferior.
    20. In C++, there may be more than one method for the same function name. So, if we want to set a breakpoint in GDB, we must specify the C++ signature. We do this by beginning with a quote (apostrophe) and the prefix of the signature, and then typing the TAB key. Try, for example:
      (gdb) break 'dmtcp::FileConnection::doLocki<TAB>
    21. In C++, it's possible that we don't even know the prefix of the function. (What namespace is it in? Do we need to specify just a class or also a subclass?) We would like an "apropos" command that will find all symbols matching some substring. Rather than guessing or examining the source code, ask for all methods matching a given substring (doLocki in this example):
      (gdb) info functions doLocki
      NOTE: For related functionality, see:
          Examining the Symbol Table (in the GDB manual)
      In particular, note: info symbol address, and info types regexp, and info functions regexp (as above), and info variables regexp.
    22. In C++, one further difficulty occurs. In GDB, it is easy to call a C function to help in debugging (for example: print getpid()). The corresponding technique in C++ is:
      (gdb) info functions methodName
      and look at the mangled names under "Non-debugging symbols". Then call the function using the mangled name (while recalling that the first argument of a mangled name is this, the current object).
    23. If you suspect a C++ STL container of interacting with your code unsafely (e.g., you use a data structure after it is freed), try compiling with "-D_GLIBCXX_DEBUG" for debugging in interactions with libstdc++.so. See libstdc++ debug mode for further details.
    24. Signals affect the behavior of your program. Some tricks for finding out about signals in GDB are:
      (gdb) # Stop in gdb before passing a signal to your process
      (gdb) handle SIGCHLD stop
      (gdb) print $_siginfo # if GDB stops at signal
      (gdb) # Diagnose signal handlers (Here, the SIGCHLD macro is: 17)
      (gdb) call malloc(sizeof(struct sigaction))
      $1 = (void *) 0x6034c0
      (gdb) call __sigaction(17, 0, $1)
      (gdb) print *((struct sigaction *)$1)
    25. In GDB, try help catch to stop at fork, exec, exceptions, etc.
    26. Start using GDB's ability to source in scripts: gdb -x gdbinit a.out
      OR:
      (gdb) source gdbinit
      Place the first few GDB commands of your session in gdbinit, instead of remembering the sequence of commands.
      (I prefer not to use .gdbinit (the GDB initialization file), because I don't like hidden files, but you can also do ln -s .gdbinit gdbinit to "unhide" the hidden file.)
      Also, consider set history save on to capture the GDB commands from your previous session, and copy them into gdbinit.
    27. Start using Python to extend the GDB scripts:
      In mygdbinit:
              define procmaps
              python gdb.execute("shell cat /proc/" + str(gdb.selected_inferior().pid) + "/maps")
              end
              
      And execute:
              (gdb) source mygdbinit
              (gdb) procmaps
              
      Or alternatively, integrate a new GDB command. In mygdbinit:
              python
              class Procmaps(gdb.Command):
                  """procmaps (using the Python API)"""
                  def __init__(self):
                      super(Procmaps, self).__init__("procmaps", gdb.COMMAND_USER)
                      self.dont_repeat()
                  def invoke(self, arg, from_tty):
                      # argv = gdb.string_to_argv(arg)  # not needed here
                      gdb.execute("shell cat /proc/" +
                                  str(gdb.selected_inferior().pid) + "/maps")
              Procmaps()
              end
              
      And execute:
              (gdb) source mygdbinit
              (gdb) help procmaps
              (gdb) procm  (auto-complete the GDB command)
              
  3. strace -o myoutput a.out (trace system calls based on kernel API: /usr/include/asm/unistd*.h ; decide in advance if it should trace all child processes or not; the flags -f and -ff exist for tracing parent and all children)
  4. ltrace -o myoutput a.out (not as useful as strace, but sometimes interesting: trace library calls instead of system calls).
  5. ps auxw | grep a.out
  6. pstree -pu $USER or pstree -lu $USER (tree of processes and child process; names in curly braces are additional threads); Note idioms like: pstree -p | grep -C2 a.out
  7. top and htop (and you can use things like strace directly from inside htop)
  8. When your program runs too slowly, it might not be CPU-bound. Check man iostat
    man vmstat
    for disk/file I/O (Blk_read/s / Blk_wrtn/s), and paging to disk (bi/bo/id), respectively. A local disk (not on the network; SANs are different) can sequentially read or write (not both at once) roughly at a rate from 50 MB/s to 100 MB/s. If you are accessing files mostly and you don't see that bandwidth, then your program is not efficient. If you are paging to disk and you do see a bandwidth anywhere near that bandwidth, then you are using too much RAM.
  9. watch -d ls -l /tmp/myfile.txt
    watch -d "pstree -l | grep -A1 `basename $SHELL`"
    (repeatedly execute COMMAND for watch -d COMMAND)
  10. Search for SUBSTRING in all dmtcp/src/* files: find dmtcp| xargs grep SUBSTRING
    or alternatively: grep -r SUBSTRING
    Don't forget: grep -C3 ...; grep -A5 ...; (and so on.)
  11. grep and google are your friends when searching for information. Besides "grep'ing" through source code, here is a grep trick you may not have seen:
    find /usr/share/man/man3 | xargs zgrep MYSTRING
    find /usr/share/man/man3 | xargs gzip -dc | grep -C3 MYSTRING
  12. less /proc/PID/maps
  13. ls -l /proc/PID/fd
  14. List open file descriptors: lsof | grep a.out If you discover an interesting socket with SOCKET_ID through 'lsof' or 'ls -l /proc/PID/fd', then find the other end of the socket: lsof | grep SOCKET_ID
    For finding out what TCP ports your software is using, try:
    lsof | grep TCP (or else: lsof | grep TCP| grep $USER)
    and man lsof for other information on sockets, ports, etc.
  15. List environment variables of a process: cat /proc/PID/environ | tr '\0' '\n'
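    The same file can be read from a script. The following is a minimal sketch in Python that splits the NUL-separated entries into a dictionary; the function names are hypothetical. Note that /proc/PID/environ generally reflects the environment the process started with, so later changes made by the process itself may not appear.

```python
"""Parse /proc/PID/environ (a sketch; function names are illustrative)."""


def parse_environ(raw):
    """Split the NUL-separated bytes of /proc/PID/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the trailing NUL leaves an empty final entry
        name, sep, value = entry.partition(b"=")
        if sep:  # skip malformed entries that lack '='
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid="self"):
    """Read and parse the environment of process pid (default: this process)."""
    with open("/proc/%s/environ" % pid, "rb") as f:
        return parse_environ(f.read())


if __name__ == "__main__":
    try:
        for name, value in sorted(read_environ().items()):
            print("%s=%s" % (name, value))
    except OSError as exc:
        print("cannot read environ: %s" % exc)
```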
  16. In general, look up the wealth of information available at man proc, or at https://www.kernel.org/doc/Documentation/filesystems/proc.txt
  17. And the /dev directory can be informative. For example, compare: ls /dev   with: cat /proc/cpuinfo
  18. Similarly, for non-process information about the system (e.g., system hardware and drivers), see man sys, man sysctl, and https://www.kernel.org/doc/Documentation/filesystems/sysfs.txt
  19. And for information about the current filesystems, try mount, and also consider /etc/fstab to see the default filesystems that are mounted over your root filesystem (over '/').
  20. List ttys being used by you: ls -l /dev/pts | grep $USER
    (and compare with your current tty: tty)
  21. nm -D a.out (or nm -D library.so) exists for seeing all of the dynamic symbols in the ELF symbol table. (ELF specifies a static symbol table (extended using gcc -g), which is used by GDB, but can be stripped out with strip. ELF also specifies a separate dynamic symbol table so that if a base executable calls "foo", the runtime loader can search the "library search path" to find a matching definition of "foo" in some library.)
    Note also the form nm -o for printing out filenames. This can be useful with brute force strategies:
    nm -o /usr/lib/lib* | grep MY_SYMBOL
    (Also see discussion of readelf and objdump below.)
  22. strings -a a.out (for some binary, a.out)
  23. If there is a syntax error in a .h file, try: cpp -I. -Iother/path/to/include/files myfile.c and you can see the expanded C or C++ code with no #include files. This often makes it easier to find the syntax error.
  24. Sometimes, a macro was expanded, and it's hard to track down what happened. Try: cpp -dM -I. -Iother/path/to/include/files myfile.c
    This lists the #define directives for all macros in effect. cpp -fdirectives-only exists to process #include and other directives, but without expanding macros.
    gcc -E will stop after the preprocessing stage, and before the compilation stage.
  25. Sometimes, it's not clear which include paths to use. In the above example, try: rm myfile.o; make myfile.o and copy the command line used by 'make' to build myfile.o. If 'make' uses libtool, you may also have to remove hidden directories with names like .libs .
    A similar technique is: touch myfile.c && make -n myfile.o
  26. Get to know your loader. It executes before your executable file: man ld.so
  27. env LD_DEBUG=help a.out
    env LD_DEBUG=files a.out
    (and try other options to LD_DEBUG)
  28. ldd a.out (for some binary, a.out)
  29. Replace PID in following:
    pushd /proc/PID;
    ls -l exe;
    echo -n "cmdline: "; cat -v cmdline;
    echo ""; cat -v environ; echo "";
    popd
  30. DMTCP: ./configure --enable-debug; make clean; make and then run and look at /tmp/dmtcp-USER@HOST/jassert* files for your value of USER and HOST. Before your next test, rm -rf /tmp/dmtcp-USER@HOST .
  31. MTCP: Look at mtcp/Makefile and uncomment the line that adds to CFLAGS the flag: -DDEBUG
  32. More on gdb: Using gdb with C++ : For a C++ function with namespace, class, and signature (e.g.: dmtcp::myClass::foo(int, bool) ), try listing it first:
    list 'dmtcp::myC<TAB>
    It will autocomplete. Extend it, and type the final quote mark ('). A related strategy (also described earlier) is to do:
    info functions foo
    Once you're sure you can list it, you can do things like set a breakpoint:
    break 'dmtcp::myC<TAB> (and complete it with quote mark as before).
    Don't forget info functions substring to discover the full signature in C++.
  33. Using gdb with errno: In glibc, the global variable errno (see man errno) is a macro that is redefined to:
    *(int *)__errno_location()
    If you want to p errno within gdb, you will have to modify this into p *(int *)__errno_location() On 64-bit Linux, glibc seems to do something even more complicated, requiring a more complicated solution.
  34. If you look at gdb and some call frames on the stack have no information (only a hex address and "?"), then find out where the call frames come from. Look at the hexadecimal address. Then do:
    (gdb) shell cat /proc/PID/maps (where the PID of the current process is given by (gdb) info proc )
    Alternatively:
    (gdb) info proc mappings
    Find which library or other memory segment the unknown hexadecimal address came from. Knowing which library was called is useful, but you may be able to find out more. If it comes from libc.so (or some other well-known library), then see the next two tips for how to get the library to show you its internal debugging information.
  35. (Continued) If you need a libc.so (or other well-known library) with debugging symbols, then:
    1. Install the package libc6-dbg. (The package name might differ for you. Also, this assumes you have root privilege on your Linux.) This will install a special libc.so in the directory /usr/lib/debug . Please note that the CCIS Ubuntu Linux machines already have a debugging version of libc installed, currently as /usr/lib/debug/libc-2.7.so .
    2. Next, do:
      env LD_LIBRARY_PATH=/usr/lib/debug dmtcp_checkpoint a.out (Presumably, after you checkpoint, the restarted a.out process will be using the pre-checkpoint libraries and hence the debugging versions. So, probably you don't need to use env LD_LIBRARY_PATH=/usr/lib/debug for the restart command. But if you're unsure, it doesn't hurt.)
    3. The a.out process above should now be using a debugging version of libc.so and perhaps other libraries. You can verify this by looking at /proc/PID/maps for your process. Now, in gdb, you will see the symbol information in the call frame and a source code file and line number.
    4. To read the corresponding source code, you can either download it from the main source code location: http://www.gnu.org/software/libc/libc.html#Availability (try to choose the same libc version, and note that the line numbers may be different in your Linux distro), or download the source package for your particular Linux distro.
  36. (Continued) [You can find a more conceptual version of this discussion here.] If gdb still shows some call frames with "?", and you have the full pathname of the library on disk, then you can often fix it as follows. (Once you understand this procedure, you may want to try the bin/gdb-add-symbol-file shell script found in DMTCP.)
    1. In /proc/PID/maps look up the full pathname of the library you need to load. The address of the call frame with missing information should be in the address range of that library.
    2. In gdb, read help add-symbol-file
    3. In gdb, type add-symbol-file FILE ADDR where FILE is the full pathname you identified in the /proc/PID/maps file. The ADDR will be the hexadecimal sum of:
      1. beginning of text segment address (text segment normally has r-x permission) in /proc/PID/maps; and
      2. hexadecimal address under the Addr heading corresponding to .text when you look it up under Headers: with either of the following commands:
        readelf -S FILE
        objdump -h FILE
    4. In the last step, the maps file provided the beginning address of the whole segment, but the binary library on disk contains many sections for a segment, and the .text section need not be the first section in the file. So, we must add the offset of the .text section, found by analyzing the binary library on disk.
    5. In gdb, a convenient way to add hexadecimal numbers is:
      p/x addr1 + addr2 where addr1 and addr2 are the two addresses we discussed. If those addresses are in hexadecimal, make sure to include 0x at the beginning of each hexadecimal number.
    6. Now do 'where' in gdb, and you should see full call frame information.
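    The address arithmetic in the steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not the DMTCP script: the function names are hypothetical, the inputs are the text of /proc/PID/maps and of readelf -S FILE, and the r-x mapping is assumed to begin at file offset 0. (With newer linkers a read-only mapping may precede the r-x one, in which case the sum needs adjusting.)

```python
"""Compute ADDR for 'add-symbol-file FILE ADDR' (a sketch; names are illustrative)."""
import re


def text_segment_start(maps_text, lib_path):
    """Return the start address of the first r-x mapping of lib_path."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Fields: address-range perms offset dev inode [pathname]
        if (len(fields) >= 6 and fields[5] == lib_path
                and fields[1].startswith("r-x")):
            return int(fields[0].split("-")[0], 16)
    raise ValueError("no r-x mapping found for " + lib_path)


def text_section_addr(readelf_text):
    """Return the address column of the .text row in 'readelf -S' output."""
    match = re.search(r"\]\s+\.text\s+\S+\s+([0-9a-fA-F]+)", readelf_text)
    if match is None:
        raise ValueError("no .text section found")
    return int(match.group(1), 16)


def add_symbol_file_addr(maps_text, readelf_text, lib_path):
    """Sum the two addresses, as in the procedure above.

    Assumption: the r-x mapping begins at file offset 0.
    """
    return (text_segment_start(maps_text, lib_path)
            + text_section_addr(readelf_text))


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    maps = ("7f9ab5d8f000-7f9ab5f45000 r-xp 00000000 fd:01 1234 "
            "/lib64/libc-2.17.so\n")
    sections = "  [13] .text   PROGBITS   000000000001f4f0  0001f4f0\n"
    addr = add_symbol_file_addr(maps, sections, "/lib64/libc-2.17.so")
    print("add-symbol-file /lib64/libc-2.17.so %#x" % addr)
```

    The result is the same value that (gdb) p/x addr1 + addr2 would give for the two addresses.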
  37. It's sometimes annoying to search where your current distro has hidden the libc.so file and its version number. Try:
    ldd /bin/ls | grep libc.so; ls -l /lib64/libc.so.6 to find libc and its version number (assuming that ldd points you to /lib64).
  38. For statically linked executables, you can use ld --wrap=symbol to create wrapper functions when statically linking some .o files together. This plays tricks with the ELF symbol tables to create new symbols, __wrap_symbol and __real_symbol. See man ld for more information.
  39. If gdb is inconvenient, you can set a breakpoint directly in your program using a technique of Nikolay Igotti. In short, one defines a handler for SIGTRAP, forks a child, and uses ptrace on the child to set some of the child's x86 hardware debug registers using the POKEUSER option of ptrace and include/sys/user.h.
  40. The two commands readelf and objdump are useful for inspecting the contents of binary files. These are related to the other commands, nm and strings, but these commands have many more options, including the ability to disassemble into assembly code, the ability to display section headers, etc. Scan the man pages quickly to see if something might be useful for you.
  41. For an assembly level listing as you do stepi in gdb, try objdump -S a.out > a.out.listing where a.out should be replaced by your binary. For a more verbose form, try one of: gcc -c -g -Wa,-alh,-L file.c > file.s
    gcc -c -g -Wa,-ahls=file.s file.c
    Variations of this can also produce assembly code that can be directly assembled by gcc or by as. For example, if you want to modify and re-compile the source code for libc.so, this is normally quite painful. A nice trick is to disassemble libc.so into assembly, and then cut or copy out the particular assembly routines that you want to assemble into a modified library.
  42. UNIX system calls, by Open Group; (enter system call in search box); These are the clearest, most precise man pages for system calls you will ever find.
  43. Valgrind (Memory and leak detection utility); This is easy-to-use and surprisingly powerful.
  44. Starting with GCC 5.0, -fsanitize is available. Search for -fsanitize in Options for Debugging Your Program or GCC. This alternative to valgrind will do some of the same things as valgrind (detect memory leaks, detect races, array bounds checking, enum checking, etc.)
  45. If you want to see the stack just before a segfault, a quick idea that may help is: catchsegv COMMAND_LINE
  46. Another method for diagnosing segfaults that may give you more control is to try the glibc backtrace function: man backtrace It mangles any C++ names, but they are mostly readable (and utilities exist for demangling the names). Read the notes of man backtrace . (For example, compile with gcc -rdynamic to get symbol names.) Also, note man addr2line.
    Look at the example file, backtrace.c, for this course.
    Also, for any call frames with no symbol name, look up the hex address in /proc/<PID>/maps. Use addr2line to translate hex addresses into line numbers in source code. (If it's a .so dynamic library, give it the offset, the hex address minus the beginning library address as shown by /proc/<PID>/maps.)
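    The offset calculation for a .so library can be sketched in Python as follows. The function names are hypothetical, and the input is the text of /proc/<PID>/maps. The sketch takes the "beginning library address" to be the lowest start address among the mappings of that library.

```python
"""Turn a runtime address into a library offset for addr2line (a sketch)."""


def parse_maps(maps_text):
    """Yield (start, end, pathname) for each file-backed line of a maps file."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Skip anonymous mappings and pseudo-entries such as [stack].
        if len(fields) >= 6 and fields[5].startswith("/"):
            start, end = (int(x, 16) for x in fields[0].split("-"))
            yield start, end, fields[5]


def library_offset(maps_text, address):
    """Return (pathname, offset) for the file-backed mapping holding address."""
    mappings = list(parse_maps(maps_text))
    for start, end, path in mappings:
        if start <= address < end:
            # The library begins at its lowest mapped address.
            base = min(s for s, _, p in mappings if p == path)
            return path, address - base
    raise ValueError("address %#x is not in a file-backed mapping" % address)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    maps = (
        "7f9ab5d8f000-7f9ab5f45000 r-xp 00000000 fd:01 1234 /lib64/libc-2.17.so\n"
        "7f9ab5f45000-7f9ab6145000 ---p 001b6000 fd:01 1234 /lib64/libc-2.17.so\n"
    )
    path, offset = library_offset(maps, 0x7f9ab5f45010)
    print("addr2line -e %s %#x" % (path, offset))
```

    For an executable that is not position-independent, addr2line takes the address as is, so the subtraction applies only to libraries (and other position-independent files).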
  47. Understand addresses of symbols:
    1. Your executable will be loaded into RAM at an unknown base address. But once it is loaded, you can find the base address it was loaded to: less /proc/PID/maps
    2. Your executable file provides the offset from the base address at which you will find the beginning of: 'text', 'data' (and possibly 'bss' segment). Commands like 'objdump' and 'readelf' will show you this offset in the file. readelf -S a.out | grep '\.text '
    3. You may want to know the address of a particular symbol within the 'text', 'data', or 'bss' segment. Use 'nm', 'objdump', or 'readelf'.
      SHELL% nm a.out | grep main
      080483e4 T main
      (As described in the 'man' pages, 't', 'T', 'd', 'D', 'b', 'B', 'U' tell you if the symbol is in text, data, bss, or undefined (presumably defined in a different library). Lower case means file-private, and upper case means a globally visible symbol. Look up __attribute__ ((visibility ("hidden"))) for declaring a symbol library-private: globally visible within a .o file, but file-private within the .so (library) file.)
    4. In gdb, if it fails to correctly show you the stack, maybe some memory was mmapped in (e.g. by DMTCP) that confused it. The command (gdb) help add-symbol-file along with the above information will allow you to tell gdb at what address in RAM the executable or library file on disk was loaded. The file on disk contains the symbol information.
    5. Some of these calculations can be automated for you by a DMTCP utility: (gdb) shell utils/gdb-add-symbol-file
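    As an illustration of the nm letter codes described above, the following Python sketch parses one line of nm output. The function name and the returned dictionary layout are hypothetical, and only the letters listed above are classified; any other letter is reported as "other".

```python
"""Decode one line of 'nm' output (a sketch; names are illustrative)."""

# Only the letters discussed above; see 'man nm' for the full list.
SECTION_BY_LETTER = {"t": "text", "d": "data", "b": "bss", "u": "undefined"}


def parse_nm_line(line):
    """Return name, address, section, and visibility for one nm line."""
    fields = line.split()
    if len(fields) == 2:
        # Undefined symbols have no address column.
        address = None
        letter, name = fields
    elif len(fields) == 3:
        address = int(fields[0], 16)
        letter, name = fields[1], fields[2]
    else:
        raise ValueError("unexpected nm line: %r" % line)
    return {
        "name": name,
        "address": address,
        "section": SECTION_BY_LETTER.get(letter.lower(), "other"),
        # Upper case means globally visible; lower case means file-private.
        "global": letter.isupper(),
    }


if __name__ == "__main__":
    print(parse_nm_line("080483e4 T main"))
```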
  48. An interesting Linux command: addr2line
    ('main' is on line 2 of tmp.c in the example below.)
    SHELL% nm a.out
    ...
    080483e4 T main
    SHELL% addr2line -C -e /tmp/a.out 080483e4
    /tmp/tmp.c:2
  49. In comparing two versions of a file, consider programs such as: kompare, kdiff3, meld, gvimdiff (or text-based vimdiff).
  50. To examine the Linux source code, try "google LXR" (Linux Cross Reference), which should lead you to lxr.linux.no . LXR is free software for hyper-linking large code bases. Another popular choice is Doxygen (available as a package in many Linux distros).
  51. To see which virtual memory pages are currently mapped to physical memory, see /proc/PID/pagemap .
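    The pagemap file holds one 64-bit entry per virtual page. The following is a minimal sketch in Python of how an entry can be located and decoded, based on the kernel's pagemap documentation: bit 63 means "present in RAM", bit 62 means "swapped", and bits 0-54 hold the page frame number when the page is present. The function names are hypothetical. On recent kernels the frame number reads as zero without root privilege, but the present bit is still reported.

```python
"""Look up a virtual address in /proc/PID/pagemap (a sketch; names illustrative)."""
import os
import struct

PAGEMAP_ENTRY_SIZE = 8  # one 64-bit entry per virtual page


def decode_pagemap_entry(entry):
    """Decode the present bit, swapped bit, and page frame number."""
    present = bool((entry >> 63) & 1)
    swapped = bool((entry >> 62) & 1)
    # Bits 0-54 are the page frame number only when the page is present.
    pfn = entry & ((1 << 55) - 1) if present else None
    return {"present": present, "swapped": swapped, "pfn": pfn}


def read_pagemap_entry(vaddr, pid="self"):
    """Return the raw 64-bit pagemap entry for virtual address vaddr."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (vaddr // page_size) * PAGEMAP_ENTRY_SIZE
    with open("/proc/%s/pagemap" % pid, "rb") as f:
        f.seek(offset)
        data = f.read(PAGEMAP_ENTRY_SIZE)
    if len(data) != PAGEMAP_ENTRY_SIZE:
        raise ValueError("no pagemap entry at address %#x" % vaddr)
    return struct.unpack("=Q", data)[0]  # native byte order, 8 bytes
```

    A typical use is to take a start address from /proc/PID/maps and call decode_pagemap_entry(read_pagemap_entry(address, PID)) for each page of the region.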
  52. To find out information known to BIOS: sudo dmidecode -t help
  53. When using google to get technical information, stackoverflow.com tends to have high quality answers. Try those hits first.