NetBSD Documentation: Debugging the NetBSD kernel with GDB HOWTO

Using KGDB


Introduction

Although the DDB debugger that can be included with the NetBSD kernel is useful for gathering crash tracebacks, examining the values of variables, and other minor debugging tasks, serious kernel hacking calls for the remote debugger, KGDB, instead.

The advantage of KGDB over DDB is that you can step through the kernel's source code rather than through disassembled machine code. Nearly all GDB facilities work, including the various graphical front ends for gdb (e.g., devel/ddd).

Prerequisites

  • Two machines of the same architecture (including object code format) both running NetBSD:

    • TARGET - the machine that will be running the debug kernel

    • REMOTE - the machine that will run/display gdb

    It is possible to build gdb hosted on one architecture and targeted for another, but that is currently beyond the scope of this document.

  • A free serial port on each machine.

  • A null modem cable (see the NetBSD Serial Port Primer for more information).

  • Knowledge of how to build and install a kernel, and how to use gdb.

Instructions

In the following, we will assume that you'll be using /dev/tty01 on the REMOTE machine (the one running gdb) and /dev/tty00 on the TARGET (the one being debugged). You may need to substitute the correct devices for the serial ports for your hardware, e.g., /dev/dty00.

  1. Build a kernel with KGDB enabled

    (NOTE: It may be best to build the kernels on the REMOTE machine. That way all the proper source and symbol files are already there when it comes time to debug.)

    • Comment out the following lines in the kernel config file for the TARGET machine:

        #options 	DDB			# in-kernel debugger
        #options 	DDB_HISTORY_SIZE=100	# enable history editing

      and uncomment (or add) the following three lines:

        options 	KGDB		# remote debugger
        options 	KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x3f8,KGDB_DEVRATE=9600
        makeoptions	DEBUG="-g"	# compile full symbol table

      Change KGDB_DEVADDR to match the I/O address of the serial port you will use on the TARGET (0x3f8 is tty00, 0x2f8 is tty01), and KGDB_DEVRATE to match the serial bitrate you want to use.

    • Configure and build the kernel for the TARGET.

  2. Prepare the TARGET machine

    Copy the file netbsd from the kernel build directory to the root directory of the TARGET machine. DO NOT INSTALL THIS KERNEL ON THE REMOTE MACHINE (especially if you're using the same tty on both machines!).

  3. Prepare the REMOTE machine

    • If you built the kernel on the TARGET machine, make a copy of all of /usr/src/sys from there to the REMOTE machine. (Note: you can't just NFS-mount the directory from the TARGET! When gdb hits a breakpoint, everything on the TARGET will stop, including nfsd!)

    • Change the line in /etc/ttys for the tty you plan to use on the REMOTE machine (and only the REMOTE) to something like:

        tty01 "/usr/libexec/getty std.9600" unknown off local

      The important parts here are off (so that init won't run getty on the port) and local. ttyflags sets up the defaults for the port according to /etc/ttys at boot time, and gdb requires local to be set so that it doesn't wait for DTR.

      You may also want to change std.9600 to a different bitrate; it should match the rate you set in the kernel options for the TARGET as well as the remotebaud value you set in gdb (below). Make sure there is actually an entry in /etc/gettytab to match the name you give here.

    • Reboot the REMOTE machine, or otherwise have ttyflags run and reread /etc/ttys. (kill -1 1 may be sufficient, although init can get confused by a change in the ordering of items in /etc/ttys.)

  4. Connect the serial ports with the null modem cable.

  5. Reboot the TARGET, and hit the space bar as soon as the boot loader message comes up. Enter the following command:

      boot -d

    This will cause the kernel to load, after which the message waiting for kgdb will be printed, and the TARGET will stop.

  6. On the REMOTE machine, cd to the directory where you built the kernel (usually /usr/src/sys/arch/something/compile/config-name) and run gdb:

    # gdb netbsd.gdb

    After a couple of seconds of churning, you will get the (gdb) prompt.

  7. Set a few gdb flags:

      # this one lets you stop the TARGET any time with Ctrl-C
      (gdb) set remotebreak 1
      # this sets the baudrate gdb will use (default 9600,
      # MUST match the setting in the kernel installed on the TARGET)
      (gdb) set remotebaud 9600
      # this one speeds up retransmission of debugger
      # commands when there is an error on the serial line
      (gdb) set remotetimeout 3

  8. Connect to the TARGET machine (assuming you're using tty01 on the REMOTE):

      (gdb) target remote /dev/tty01

    You should be greeted with something like the following:

      Remote debugging using /dev/tty01
      kgdb_connect (verbose=1) at ../../../../arch/i386/i386/kgdb_machdep.c:244
      244             if (verbose)
      (gdb)

    If GDB instead appears to hang, you may have something wrong with your serial hardware, cable, or settings. See the troubleshooting section below.

  9. If you did get a prompt back, then you're ready to hack: you can set breakpoints, examine data, single-step, etc., just as when debugging a user-level application running on the local machine. To continue with the kernel boot process, use cont, and to pop back into the debugger at a later time, hit Ctrl-C.

  10. To automate steps 6 - 8, create a file called .gdbinit in the kernel build directory containing the following lines:

      file netbsd.gdb
      set remotebreak 1
      set remotebaud 9600
      target remote /dev/tty01

    Now you can start debugging by just typing gdb.
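
Because the bitrate has to agree between the TARGET's kernel config (KGDB_DEVRATE) and the gdb settings (set remotebaud), a quick consistency check can save a debugging session. The following is only a sketch in POSIX sh: the function names are made up for illustration and are not part of NetBSD, and the file paths you pass in are your own.

```shell
#!/bin/sh
# Sketch: confirm that the kernel config and .gdbinit agree on the bitrate.
# All function names here are illustrative helpers, not NetBSD tools.

# Print the KGDB_DEVRATE value from the first uncommented line that sets it.
kgdb_config_rate() {
    grep -v '^[[:space:]]*#' "$1" |
        sed -n 's/.*KGDB_DEVRATE=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the value from the first "set remotebaud N" line in a gdb command file.
gdbinit_rate() {
    sed -n 's/^[[:space:]]*set[[:space:]][[:space:]]*remotebaud[[:space:]][[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" |
        head -n 1
}

# Return 0 if both rates were found and are equal, 1 on a mismatch,
# and 2 if either file does not contain a rate.
kgdb_rates_match() {
    cfg_rate=$(kgdb_config_rate "$1")
    gdb_rate=$(gdbinit_rate "$2")
    if [ -z "$cfg_rate" ] || [ -z "$gdb_rate" ]; then
        echo "could not find a rate in $1 or $2" >&2
        return 2
    fi
    if [ "$cfg_rate" = "$gdb_rate" ]; then
        echo "ok: both use $cfg_rate"
    else
        echo "mismatch: kernel config uses $cfg_rate, gdb uses $gdb_rate" >&2
        return 1
    fi
}
```

After sourcing these functions, you might run kgdb_rates_match with the path to your kernel config file as the first argument and the .gdbinit from step 10 as the second. The rate in /etc/ttys on the REMOTE is not covered by this check and still needs to be compared by hand.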

Troubleshooting

If things don't work, try some of these:

  • Reboot the TARGET without specifying -d. You should see a message similar to the following one displayed in the device probes. If you don't see the second line, either you didn't get KGDB enabled in the kernel you built, or you're running the wrong kernel:

      com0 at isa0 port 0x3f8-0x3ff irq4: ns16550a, working fifo
      com0: kgdb

  • Make sure the serial ports and cable work with a normal application: boot the TARGET with a non-KGDB kernel and try running tip between the two machines. If you don't know about tip, here's a quick rundown on what to do:

    • put the following lines in /etc/remote on both the TARGET and the REMOTE machines:

        tty00-9600:dv=/dev/tty00:br#9600:pa=none:dc:
        tty01-9600:dv=/dev/tty01:br#9600:pa=none:dc:

    • on the TARGET, give the command tip tty00-9600, and on the REMOTE do tip tty01-9600

    • type characters at the keyboard of each machine - the characters should echo to the other machine's display.

  • Double check the line for your tty in /etc/ttys, and reboot to make sure it has taken effect.

  • In all the above discussion, I've assumed you were running as root. tip and gdb may not work if you're running as a normal user (depending on the permissions of /dev/tty0*). Running as root routinely is not advisable, however. Instead, you should do this:

    1. put /dev/tty0* in group wheel (if it isn't already)

    2. add your username to the wheel line in /etc/group

    3. add your username to the dialer line in /etc/group

    Step 2 allows your gdb process (and other processes you run) to open the tty; step 3 allows you to run tip.
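
To confirm the group changes above took effect, a small check of the member lists can help. This is a minimal sketch in POSIX sh; user_in_group is a hypothetical helper, not a system command, and it only looks at the member list in the group file, so it will not notice a group that is merely the user's primary group in /etc/passwd.

```shell
#!/bin/sh
# Sketch: report whether a user appears in a group's member list.
# user_in_group is an illustrative helper; the file defaults to /etc/group.
user_in_group() {
    user=$1
    group=$2
    file=${3:-/etc/group}
    awk -F: -v u="$user" -v g="$group" '
        $1 == g {
            # Field 4 is the comma-separated member list.
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == u) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}

# Possible usage, run as the user who will be debugging:
#   for g in wheel dialer; do
#       user_in_group "$(id -un)" "$g" || echo "not listed in $g"
#   done
#   ls -l /dev/tty0*    # shows the group and permissions on the ports
```

New group memberships normally apply only to sessions started after the change, so log out and back in before retrying tip or gdb.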

General Caveats

  1. You may sometimes notice a long pause between entering a command and getting a response back. This seems to be due to bad data on the serial connection; after a short pause and a retransmit, everything is back on track. Setting remotetimeout to a value lower than the default of 20 seconds helps immensely. (One person reported that, in his case, the cause was a kernel printf() executing between commands, which apparently corrupted the gdb data.)

  2. Ctrl-C may not work if the kernel has blocked some high-priority interrupts (depending on the port); i.e., you can't break into an endless loop at splimp() on the i386, but if you place a breakpoint before the loop you can single-step through it.

