B Writing a Remote Debugger Server

This appendix describes how to write a Ladebug remote debugger server for an Alpha target (operating system or hardware platform). It describes the functionality required of the server and explains how this functionality is implemented in the Digital UNIX server and in the server included in the debugger monitors for the Alpha server evaluation boards. It also includes information about writing new Ladebug remote debugger servers.

B.1 Reasons for Using a Remote Debugger

The main reason for using a remote debugger is to debug software on a system that cannot run a fully functional debugger locally. Examples of when you might use a remote debugger are:

Remote debuggers are also useful simply for debugging software on systems that are remote from where you are working, although in this case there are often other alternatives (for example, logging into the system across a network). Whether it is better to use remote debugging or one of these alternatives will often depend on the precise characteristics of the network and the debugger used.

B.2 Alternatives to Using a Remote Debugger

In most cases, the alternatives to using a remote debugger are either to put debugging code, such as print statements, in the software being debugged or to develop a simple local debugger for the target system.

Adding debugging code to the software increases the complexity of the software, which can introduce additional bugs, and it is often difficult to determine in advance what information will be needed to debug the software. A further problem is that the debugging code can itself change the behavior of the software, and as such it normally has to be removed before the software is released.

Developing a local debugger can itself be a major task. Since such a debugger is normally a one-off development, you cannot normally justify including support for high level features (such as source level debugging). Even where providing such facilities locally can be justified, the target system will often not have the resources (memory, for example) required to run a high level debugger.

B.3 The Structure of a Remote Debugger

All remote debuggers consist of two parts:

These communicate through a remote debugger protocol that runs over some communication mechanism such as a serial line or the Internet.

The client provides the user interface to the debugger and most of the intelligence of the debugger. For example, the client will normally do all translation between addresses and variable or function names.

The server makes available low level functions that allow the client to examine and control software running on the target system. For example, the server provides functions to read and modify the target system's memory. The client requests these functions, and receives responses, using the remote debugger protocol.

In general, a server is much simpler than a client and will require only minimal functionality from the target system on which it is running. This allows servers to be implemented for environments in which the functionality of a full workstation operating system is not available.

B.4 Types of Targets

Most target systems for remote debugging are of these two types:

B.5 Ladebug as a Remote Debugger

Ladebug is a debugger running on Digital UNIX systems. It supports a wide range of languages including Ada, C, C++, COBOL, and Fortran. Besides providing local debugging on Digital UNIX systems, it supports remote debugging through the Ladebug remote debugger protocol. The same text and windows based interfaces are available for use with both local and remote debugging and almost all the commands that are available for local debugging are also available for remote debugging.

B.5.1 Target and Programming System Requirements

Ladebug servers must be able to:

For Ladebug to debug a program there must be symbolic information for the program available to Ladebug on the host in a form that it understands. At present, the only form of symbolic information that Ladebug understands for programs running on Alpha processors is extended COFF (ECOFF) for Digital UNIX. The program must follow the register usage, function calling, and other conventions expected of programs that have this form of symbolic information. For example, a program for which the symbolic information is ECOFF must use Digital UNIX register usage and function calling conventions.

B.5.2 The Protocol

The Ladebug Remote Debugger Protocol is a request/response protocol running over UDP. The debugger client (Ladebug) initiates all transactions by sending a request to the server. On receiving the request, the server acts upon it and sends a response. The server never sends any messages except in response to requests received from the client. The requests that the client can send are listed in Table B-1.

Table B-1 Remote Debugger Protocol Client Requests

Request  Action 
Load Process  Loads a new process for debugging. 
Connect to Process  Connects to an existing process. 
Connect to Process Insist  Connects to an existing process even if other debugger sessions are already connected to it. 
Probe Process  Checks the state of the process being debugged. 
Disconnect from Process  Disconnects, ending a debugger session. 
Kill Process  Kills the process and then disconnects. 
Stop Process  Stops a running process. 
Continue Process  Continues running a stopped process. 
Step  Executes one instruction in the process being debugged. 
Set Breakpoint  Sets a breakpoint at an address. 
Clear Breakpoint  Clears a breakpoint at an address. 
Get Next Breakpoint  Gets the "next" breakpoint that is known to the server. Breakpoints are returned in an arbitrary order but no breakpoint will be returned more than once in a single scan of the list. 
Get Registers  Gets the contents of all the registers. 
Set Registers  Sets the contents of all the registers. 
Read  Reads memory. 
Write  Writes to memory. 

Section B.11 contains a full description of the protocol.

B.5.3 Starting a Remote Debugger Session

The protocol provides three alternative requests for starting a debugger session:

All servers should implement either Load Process or Connect to Process but a server need not implement both of them. A server that implements Connect to Process can choose any of the following:

To allow a single target machine to run multiple remote debugger sessions at the same time, clients always send Connect and Load requests to a fixed, privileged UDP port on the target. The server is expected to allocate a new unprivileged UDP port before replying. The new port is used on the target as the source and destination of all messages for the remainder of the debugger session.

To allow servers to be run on systems on which security is an issue (for example, typical Digital UNIX systems), the Connect and Load requests contain the client and server login names. The server can use these login names, together with the name of the host system, to check that the remote user is authorized to run programs on the target system, as is done when a user runs programs through rsh.

B.5.4 Ending a Remote Debugger Session

There are two different requests that end a remote debugger session:

Although there are explicit Ladebug commands that call up each of these requests, Ladebug normally kills processes that it has loaded and disconnects from processes to which it has connected. As such, a server that implements the Load Process function should at least implement the Kill Process function, and a server that implements the Connect to Process function should implement the Disconnect from Process function. The following are optional:

B.6 Example Servers

Two example Ladebug servers are available. The source code of these example servers is available from Digital Equipment Corporation for unrestricted reuse on Alpha based platforms (see the copyright notice in the source code for details).

B.6.1 The Digital UNIX Server

The Digital UNIX server is designed to allow the debugging of user processes running on remote Digital UNIX systems. The version described in this section supports loading new processes using the Load Process request but does not support the Connect to Process or Connect to Process Insist requests.

The server consists of a server daemon and user servers:

The remote debugger daemon must be run as a root process. It would normally be started at system start-up. The user server loads the debuggee using Digital UNIX's fork() and exec() functions and uses Digital UNIX's ptrace() interface to implement the low level debugger facilities required. The user server and daemon both use Digital UNIX UDP sockets to communicate with the client.

B.6.2 Evaluation Board Server

The evaluation board server is included in the evaluation board debug monitor provided with the Alpha evaluation boards (EB64, EB64+, EB66, etc.). It is designed to provide source level debugging of operating system kernels being ported to these boards, and of programs running on these boards without an operating system. The complete monitor (including the remote debugger server) is designed to be easily ported to other Alpha based hardware.

The server only supports starting debugger sessions through the Connect to Process or Connect to Process Insist requests. This server does not support the Load Process request. Since the monitor is not a multiprocessing system, the server ignores the process ID in the Connect requests. It also ignores the login names in Connect requests.

The user is expected to load the test program using the monitor load facilities (LOAD, NETLOAD, etc.) before starting the debugger server. The server interprets all addresses it receives as physical addresses. The server then performs all debugger functions by directly reading and writing memory.

To set breakpoints, the debugger patches a PAL call into the code being debugged. To avoid conflict with the use of the breakpoint PAL call by operating system kernels, this is not the standard breakpoint PAL call (the BPT PAL call) but a special PAL call (DBGSTOP). DBGSTOP exhibits identical behaviour but has its own system entry address. It is implemented in the debugger version of evaluation boards' PAL code. When this PAL call is executed, it results in a call back to the monitor at which point the state of the debuggee is saved and the server is reentered.

The monitor's ethernet software allows the server to register to receive packets addressed to a particular UDP port and to send packets on any UDP port. The server depends on interrupts to receive packets while the debuggee is running. Upon receiving any interrupt, the monitor polls the ethernet driver for messages. The ethernet software passes any appropriate messages to the server.

A consequence of using interrupts to receive messages is that some care is needed when debugging programs that do their own interrupt handling. To allow such programs to be debugged, the Evaluation Board user library contains a function that polls the ethernet. This function would normally be called by the application every time it receives an interrupt.

B.6.3 Structure of the Servers

Each server consists of:

B.6.4 Creating a Server for a New Target

The simplest way of creating a server for a new target is to base it upon one of the example servers. Normally, if you are developing the server to be part of a monitor program, you should base it upon the evaluation board server.

If, however, you are developing it as an operating system utility you should probably base it upon the Digital UNIX server. You should try to make as few changes as possible to the example servers, since you are likely to have no satisfactory way of debugging software (and hence the servers themselves) until you have successfully ported them to your target system.

B.7 The Communicators

This section describes

B.7.1 Communicator Interface Functions

The communicator contains the main function of the debugger server; the interface to this function is target-dependent. It also contains some functions that the other components of the server can call. These functions are as follows:

B.7.2 Digital UNIX Communicator

The Digital UNIX communicator is implemented in the C source file server_main.c. This file contains the daemon's entry point (main()), the main function of the user servers (user_server_main()), and the interface functions previously described.

When the daemon is started, main() creates a socket and binds it to the Ladebug remote debugger connect port. It then reads packets from this port, ignoring any packets that are not load requests. When it receives a load request, it checks that the client user is allowed to run remote debugger sessions on this machine under the requested server user name. It does this by calling the Digital UNIX function ruserok().

If the request is valid, the daemon creates a child process (by forking). The parent process then simply continues round the packet reading loop. The child process:

  1. Creates a new session.

  2. Changes its group ID to the primary group ID of the requested server user.

  3. Changes its login name to the server user name.

  4. Changes its uid to the server user's uid.

  5. Calls user_server_main() with the client address and the load request packet as arguments. When user_server_main() returns, the child process exits.

If, for any reason, the daemon is unable to start the user server, it then sends a load request response to the client containing an error code.

The user server function user_server_main() starts by creating a UDP socket for communicating with the client. It then:

  1. Finds a free unprivileged UDP port and binds the socket to this address.

  2. Processes the load packet passed to it (by calling ProcessPacket()) and sends the response to the client.

  3. Enters its main loop; in this loop it reads packets from the client, processes them, and sends the responses back to the client.

  4. Breaks out of this loop and exits from the user server when the client disconnects from it.
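The first of these steps can be sketched as follows. This is a minimal illustration and not the code in server_main.c: the helper name open_session_socket() is hypothetical, and the sketch assumes a sockets implementation in which binding to port 0 asks the system to choose a free port outside the privileged range.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to a free port chosen by
 * the system. Returns the socket descriptor, or -1 on failure, and stores
 * the chosen port (in host byte order) in *port_out. */
int open_session_socket(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);      /* assumption: 0 lets the system pick */

    /* Bind, then ask the system which port it actually assigned. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

The port number returned here is the one the user server would report to the client in its load response, so that the rest of the session uses the new port.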

One complication in the code of the communicator is that the daemon has to be able to handle the receipt of duplicate load messages. The client sends such duplicate load messages when the server's load response message is lost or does not reach the client within the client's time-out time.

To handle such duplicate load messages, a pipe is created between the daemon and each user server. When the daemon receives a duplicate load message, it uses this pipe to pass it on to the appropriate user server. The user server treats this message like any other duplicate message.

B.7.3 Evaluation Board Monitor

The evaluation boards' Ethernet driver software passes received frames to other parts of the monitor's software through call-back functions. A component of the monitor that wishes to receive frames on a UDP port calls a registration function provided by the Ethernet driver software.

The registration functions take as an argument the address of the call-back function to be called when such frames are received. When a component registers a call-back function, it can do either of the following:

If the ethernet hardware has received a packet for any registered UDP port, then the driver will call the appropriate call-back function. The call-back function called may be in a completely different component of the monitor from that which called ethernet_process_one_packet(). Once any call-back function has completed its processing, ethernet_process_one_packet() will return with a result indicating whether any packets were processed.

All packets passed to the ethernet driver must be built in fixed sized buffers provided by the driver, so that the ethernet driver never has to copy any data. The ethernet driver allocates these buffers at addresses from which the ethernet devices can send data and to which they can receive data.

To avoid the need for complex allocation algorithms, and complex error handling if buffer allocation fails, any component of the monitor can allocate a number of buffers at start-up. To maintain this buffer count the ethernet send functions always return to the caller a buffer to replace the buffer containing the packet to be sent.

On completion, the call-back functions used to receive frames must always return an ethernet buffer to the ethernet drivers. This can be, but need not be, the buffer that contained the received frame.

The debugger server does not need its own pool of buffers, since it only sends a frame immediately following the receipt of a frame. As such it handles received frames in the following steps:

  1. Receive a frame through the call-back function.

  2. Process the frame.

  3. Send a response frame in the received buffer. The send function returns a buffer (most likely a different one).

  4. Exit the call-back function, returning the buffer returned by the send function.
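The buffer exchange in these steps can be sketched as follows. All names here (eth_buffer, mock_ethernet_send(), debug_frame_received()) are hypothetical, the buffer size is arbitrary, and the "driver" is a mock holding a single spare buffer; the monitor's real interfaces may differ.

```c
#include <stddef.h>

#define ETH_BUF_SIZE 64   /* hypothetical; the real size is fixed by the driver */

typedef struct {
    unsigned char data[ETH_BUF_SIZE];
} eth_buffer;

/* Mock driver state: one spare buffer to hand out on the next send. */
eth_buffer  driver_pool[1];
eth_buffer *driver_spare = &driver_pool[0];
eth_buffer *last_sent = NULL;

/* Hypothetical send function: takes ownership of `frame` and returns a
 * replacement, so the caller always holds the same number of buffers. */
eth_buffer *mock_ethernet_send(eth_buffer *frame)
{
    eth_buffer *replacement = driver_spare;

    last_sent = frame;
    driver_spare = frame;   /* mock: the frame is free again once "sent" */
    return replacement;
}

/* Call-back for received debugger frames (steps 1 to 4 above). */
eth_buffer *debug_frame_received(eth_buffer *frame)
{
    frame->data[0] |= 0x80;             /* placeholder for real processing */
    return mock_ethernet_send(frame);   /* send, then hand back the swap */
}
```

Because every send yields a replacement buffer, the call-back always has a buffer to return to the driver without allocating one of its own.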

The server's communicator is largely implemented in the C source file server_read_loop.c. This contains the following code:


Note
The evaluation board library contains an assembler function called ladbx_poll() that calls app_poll() through the pointer at this address. Applications that do their own interrupt handling should call ladbx_poll() frequently (for example, every time they receive an interrupt) to ensure that the debugger server receives all debugger protocol packets without excessive delay.

In addition, the file kutil.s contains the source of the monitor's interrupt function. The monitor only enables interrupts when an application is running. When the monitor receives any interrupt, it saves the debuggee's state and polls the Ethernet for received frames. Since the monitor will normally receive regular 1ms timer interrupts this will ensure that it receives all the client's debugger frames.

B.7.4 Porting the Communicators to Other Systems

It should be possible to port the Digital UNIX communicator to most other versions of UNIX and UNIX derivatives with few changes. For operating systems that are not derived from UNIX, the mechanism for starting user servers may have to be significantly modified.

In particular, many operating systems have no exact equivalent of fork() and instead start a new process by running a new executable file. On such a system, the communicator will have to be split into two separate executable files (one for the daemon and the other for the server). Also in such systems the new process typically does not have access to data set up by its parent before it was created, so some other mechanism will have to be used to transfer the first packet, and other data, to the user process.

The mechanism for setting the user identifier of the user server will vary widely between operating systems. Be aware that although the term daemon is a UNIX term, almost all operating systems have some mechanism for installing and running privileged background processes.

The socket mechanism used to read data from the network is quite widely available. If this mechanism is not available, then any other mechanism that allows the communicator to wait for the receipt of UDP packets on particular ports can be used.

If the operating system does not provide any such mechanism (for example, a real time kernel that does not include networking support), then one option is to port part or all of the networking code in the Evaluation Board Monitor to this environment. In this case, it may be easier to base your communicator upon that in the Evaluation Board Server rather than upon that in the Digital UNIX Server.

For embedded servers, few (if any) changes should be needed to the Evaluation Board's communicator. However, for many such systems you will need to rewrite the network device drivers. These are contained in the ethernet code of the Evaluation Board Monitor.

B.8 The Protocol Handler: Interface Functions and Implementation

The protocol handler's main interface function is ProcessPacket(). The only argument to this function is a pointer to the packet that it is to process. As a part of its processing of the packet, ProcessPacket() converts the request packet passed to it into a response packet. The caller must ensure that the buffer pointed to by the argument is large enough to contain any possible response packet.

The function DumpPacket() can also be called by the communicator. This dumps the contents of packets passed to it, if the protocol handler is compiled with tracing enabled.

The code for the protocol handler is identical for the two servers. It should not need to change for other server implementations. The source code is in the files packet-handling.c and packet-util.c; packet-handling.c contains the function ProcessPacket(). When this function receives a packet, it:

  1. Checks whether the packet is a duplicate of the previous packet, by checking whether the sequence number is the same:

  2. Carries out the action appropriate to the request it has received.

    This normally consists of extracting some arguments from the packet and passing them to the appropriate debugger kernel function. Where a request changes the state of the connection to the client (for example the kill command disconnects from the client), it calls the appropriate communicator interface function to inform the communicator that this has happened. In some cases, it also retrieves data from kernel functions (for example, the contents of the registers) and copies them into the packet.

  3. Converts the packet into a response packet by setting the top bit of the packet's command code.

    It also sets the packet's return value. Before it returns the response packet to the communicator, it makes a copy of it so that it can be resent if the next packet duplicates the request packet.
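The duplicate check and the conversion to a response can be sketched as follows. This is an illustration only: the packet layout, the 8-bit command code, and the names packet, run_request(), and process_packet_sketch() are assumptions made for the sketch and are not taken from packet-handling.c or from the protocol description in Section B.11.

```c
#define RESPONSE_BIT 0x80u       /* top bit of an assumed 8-bit command code */

typedef struct {
    unsigned char command;       /* request code; top bit set in responses */
    unsigned int  sequence;      /* sequence number chosen by the client */
    int           return_value;  /* filled in by the server */
    unsigned char body[32];      /* request arguments or response data */
} packet;

packet last_response;            /* copy kept so that it can be resent */
int    have_last_response;
int    actions_run;              /* counts real executions, for illustration */

/* Placeholder for extracting arguments and calling the debugger kernel. */
static int run_request(packet *p)
{
    (void)p;
    actions_run++;
    return 0;
}

/* Converts the request in *p into a response, in place. */
void process_packet_sketch(packet *p)
{
    if (have_last_response && p->sequence == last_response.sequence) {
        *p = last_response;      /* duplicate: resend, do not re-execute */
        return;
    }

    p->return_value = run_request(p);
    p->command |= RESPONSE_BIT;  /* mark the packet as a response */

    last_response = *p;          /* remember it in case the request repeats */
    have_last_response = 1;
}
```

Resending the saved response, rather than executing the request again, matters for requests such as Step, which would otherwise advance the debuggee twice when a response is lost.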

packet-util.c contains utility functions for reading and writing the fields of packets and for dumping the contents of a packet. To avoid any possible alignment problems the utility functions read and write packet fields a byte at a time.
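A byte-at-a-time accessor pair might look like the following sketch. The function names are hypothetical, and the little-endian field order is an assumption made for the example; the actual byte order is defined by the protocol description.

```c
#include <stddef.h>

/* Read a 32-bit field that starts at `offset`, one byte at a time, so that
 * the field may sit at any alignment within the packet.
 * Assumption: fields are stored least significant byte first. */
unsigned long read_field32(const unsigned char *pkt, size_t offset)
{
    return  (unsigned long)pkt[offset]
         | ((unsigned long)pkt[offset + 1] << 8)
         | ((unsigned long)pkt[offset + 2] << 16)
         | ((unsigned long)pkt[offset + 3] << 24);
}

/* Write a 32-bit field at `offset`, again one byte at a time. */
void write_field32(unsigned char *pkt, size_t offset, unsigned long value)
{
    pkt[offset]     = (unsigned char)(value & 0xff);
    pkt[offset + 1] = (unsigned char)((value >> 8) & 0xff);
    pkt[offset + 2] = (unsigned char)((value >> 16) & 0xff);
    pkt[offset + 3] = (unsigned char)((value >> 24) & 0xff);
}
```

Accessing the field through a cast such as *(unsigned int *)(pkt + offset) would be shorter, but it can fault or be slow when the field is not naturally aligned, which is the problem the byte-wise approach avoids.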

B.9 The Debugger Kernels

This section describes:

B.9.1 The Debugger Kernel Interface Functions

The debugger kernels provide the following interface functions to the protocol handler:

B.9.2 Digital UNIX Server Debugger Kernel

In the Digital UNIX server, the debugger kernel is implemented using the ptrace() function. This is a Digital UNIX function that allows a parent process to examine and control its children. Since ptrace() can only be used to debug child processes, the Digital UNIX debugger kernel only supports the loading of new processes and not connection to existing processes.

Since this means that the debuggee always runs as a child of the server, and the server is killed when the client disconnects, the Digital UNIX server does not support disconnecting from a debuggee without killing it.

When kload() loads a new debuggee, it does so using the Digital UNIX functions fork() and exec(). kload() creates the new process using the fork() function. This new child process:

  1. Makes a ptrace() call to allow its parent to control it using ptrace() calls. This also sets up a breakpoint on executing new images.

  2. Opens the standard input, output, and error files. If it is unable to open any of these it terminates.

  3. Loads and executes the debuggee by calling exec(). When it reaches the debuggee's entry point, it will stop and its parent will receive a SIGCHLD signal.

The parent process meanwhile waits for a signal from the child. When it receives a signal, it checks that the debuggee has stopped at a breakpoint (rather than, for example, having exited). If it has, then the kernel checks whether the debuggee uses shared libraries.

If the debuggee does use shared libraries the server tells it to continue (through a ptrace() call) and waits for it to stop once more.


Note
This is done because starting a program that uses shared libraries on Digital UNIX involves executing two new images. The first is a special program loader and the second is the image of the program itself. The debuggee cannot be accessed by ptrace() until the child process stops for the second time.

Once the debuggee has been started the server stores the state of the debuggee in the variable child_state. The Digital UNIX kernel inserts breakpoints by using ptrace() to write a breakpoint PAL call to the address of the breakpoint. On Alpha Digital UNIX ptrace() always reads and writes 8-byte (2 instruction) quantities, so the kernel has to insert and remove breakpoints through a read, modify, write sequence.
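The read, modify, write sequence can be sketched as follows. The sketch replaces the ptrace() calls with two stand-in functions that access a simulated memory array, so the names peek_quad(), poke_quad(), and patch_instruction() are hypothetical. It assumes little-endian layout (the instruction at the lower address occupies the low 32 bits of the quadword) and uses 0x00000080 for the breakpoint PAL call, which should be checked against the PAL code in use.

```c
#include <stdint.h>

#define BPT_INSTR 0x00000080u   /* assumed encoding of the breakpoint PAL call */

uint64_t child_memory[4];       /* simulated debuggee memory, 8 bytes per entry */

/* Stand-ins for the ptrace() read and write of one aligned quadword. */
static uint64_t peek_quad(uint64_t addr)
{
    return child_memory[addr / 8];
}

static void poke_quad(uint64_t addr, uint64_t value)
{
    child_memory[addr / 8] = value;
}

/* Replace the 4-byte instruction at `addr` with `instr` and return the
 * instruction that was there before, leaving its neighbor untouched. */
uint32_t patch_instruction(uint64_t addr, uint32_t instr)
{
    uint64_t base  = addr & ~(uint64_t)7;       /* containing quadword */
    unsigned shift = (addr & 4) ? 32 : 0;       /* low or high half */
    uint64_t quad  = peek_quad(base);           /* read */
    uint32_t old   = (uint32_t)(quad >> shift);

    quad &= ~((uint64_t)0xffffffffu << shift);  /* modify */
    quad |= (uint64_t)instr << shift;
    poke_quad(base, quad);                      /* write */
    return old;
}
```

Calling patch_instruction(addr, BPT_INSTR) inserts a breakpoint and yields the original instruction for the breakpoint table; calling it again with the saved instruction removes the breakpoint.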

The kernel implements the stop request (function kstop()) by sending a SIGINT signal to the debuggee. This will stop the debuggee unless it has disabled receiving SIGINT signals. kkill() uses the same technique to stop the debuggee before killing it.

When kgo() is called, it first checks whether there is a breakpoint at the current program counter. If there is a breakpoint there, then the kernel executes the original instruction at this location using the internal function kstepoverbreak(). This function temporarily puts the instruction back into the code and then uses ptrace() with the PT_STEP function code to execute this instruction. Once it has executed the instruction, it restores the breakpoint.

When kstepoverbreak() returns, kgo() calls the internal function kasyncwait() to set up kstopped() as a signal handler for the SIGCHLD signal. It then calls ptrace() with the PT_CONTINUE function code. This tells the debuggee to continue from its current program counter.

When a running debuggee stops for any reason, the server will receive a SIGCHLD signal. This will cause kstopped() to be called. kstopped() checks why the debuggee has stopped and sets the child_state appropriately. If the debuggee has stopped as a result of a breakpoint PAL call the program counter will point to the instruction after the breakpoint. Under these circumstances, kstopped() will move the program counter back one instruction to point at the breakpointed address.

When the kernel reads memory, it uses the breakpoint table functions to check whether there is a breakpoint on either of the longwords it is reading. If there is such a breakpoint, it reads the data for that longword from the breakpoint table rather than from the debuggee's memory. Similarly, when the kernel is asked to write to memory, it will update breakpoint table entries if necessary and will not overwrite breakpoint PAL calls.

The mapping of register to register number used by ptrace() is the same as that used by the kernel interface. This means that kregister() and ksetreg() translate very directly into ptrace() calls.

B.9.3 Evaluation Board Server

The evaluation board server's debugger kernel is implemented by directly reading from and writing to memory. It runs in the same environment as the debuggee, with the same mapping of virtual to physical addresses.

As such, there is no distinction between its memory and the debuggee's memory. This means that it can satisfy requests to read or write the debuggee's memory by simply reading or writing its own virtual memory.

The evaluation board kernel implements breakpoints through a special additional PAL call, DBGSTOP. The monitor's PAL code provides this additional PAL call. It is functionally identical to the standard Digital UNIX breakpoint PAL call, except that its system entry address can be set up independently by passing a different function code value to wrest(). This allows the monitor to set breakpoints even in applications (for example, operating systems) that do their own breakpoint handling using the standard breakpoint PAL call.

The evaluation boards' debugger kernel is implemented in the C source file kernel.c and the assembler source file kutil.s. The functions in these files are also used to implement the low level debugger commands provided by the monitor on its dumb terminal interface. kernel.c contains the main body of the debugger kernel, including all the interface functions previously listed. kutil.s contains the system entry points for interrupts, traps, and breakpoints, and functions that provide a C interface to various PAL calls.

B.9.3.1 Initialization

kernel.c contains three functions that are used to initialize the debugger kernel:

B.9.3.2 Setting Breakpoints

A breakpoint is set by saving the original instruction in the breakpoint table, and is inserted by writing a DBGSTOP instruction to the location at which the breakpoint is required. To simplify other memory access in the debugger monitor, the kernel does not write DBGSTOP instructions into memory until just before the program is allowed to run, and replaces them with the original instructions as soon as the program stops.

The function kinstall_breakpoints() writes DBGSTOP instructions for all current breakpoints to memory, and the restore_breakpoint_instrs() internal function restores the original instructions at these locations whenever the debuggee stops. Because any modification to the debuggee's memory can alter its instruction stream, the kernel follows all writes to the debuggee's memory with instruction barrier PAL calls.
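The deferred insertion scheme can be sketched as follows. This is not the code in kernel.c: the table layout and function names are hypothetical, the DBGSTOP encoding shown is a made-up placeholder, and the instruction barrier PAL call is represented by an empty stub.

```c
#include <stddef.h>
#include <stdint.h>

#define DBGSTOP_INSTR 0x000000adu   /* placeholder value, NOT the real encoding */
#define MAX_BPTS 8

typedef struct {
    uint32_t *addr;    /* location of the breakpoint */
    uint32_t  saved;   /* original instruction at that location */
} bpt_entry;

bpt_entry bpt_table[MAX_BPTS];
size_t    bpt_count;

/* Stub standing in for the instruction barrier PAL call. */
static void instruction_barrier(void) { }

/* Setting a breakpoint only records it; memory is not modified yet. */
int set_breakpoint(uint32_t *addr)
{
    if (bpt_count == MAX_BPTS)
        return 0;
    bpt_table[bpt_count].addr  = addr;
    bpt_table[bpt_count].saved = *addr;
    bpt_count++;
    return 1;
}

/* Just before the debuggee runs: write DBGSTOP at every breakpoint. */
void install_breakpoints(void)
{
    size_t i;

    for (i = 0; i < bpt_count; i++)
        *bpt_table[i].addr = DBGSTOP_INSTR;
    instruction_barrier();
}

/* As soon as the debuggee stops: put the original instructions back. */
void restore_breakpoints(void)
{
    size_t i;

    for (i = 0; i < bpt_count; i++)
        *bpt_table[i].addr = bpt_table[i].saved;
    instruction_barrier();
}
```

Because memory holds the original instructions whenever the debuggee is stopped, functions that examine memory do not need to consult the breakpoint table.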

B.9.3.3 Hitting a Breakpoint or an Exception

When the debuggee hits a breakpoint (that is, executes a DBGSTOP PAL call), the PAL code calls the monitor's assembler breakpoint function (dbgentry()), which:

  1. Saves the processor's complete register set (including the program counter and the processor status) in a static area.

  2. Calls kreenter(), which:

    1. Removes any temporary breakpoints (used for single stepping and to implement stop requests).

    2. Calls through a function pointer the kernel's current breakpoint continuation function.

Unless the debuggee has just single stepped or processed a stop request, this function will be katbpt(), which:

  1. Steps the saved program counter back one instruction so that it points at the actual breakpoint (rather than at the instruction following it).

  2. Calls kwaitforcontinue() , which:

    1. Restores the original instructions at all the breakpoints.

    2. Sets a flag to indicate that the debuggee is stopped.

    3. Listens for either debugger packets or commands by repeatedly calling either read_packets() or user_main() .

    kwaitforcontinue() will stay in this loop until some other kernel function clears the stopped flag.
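The control flow from re-entry to the wait loop can be sketched as follows. The functions are loosely modeled on kreenter(), katbpt(), and kwaitforcontinue() but are not the monitor's code: the names are hypothetical, the removal of temporary breakpoints and the restoring of original instructions are omitted, and a counter stands in for the arrival of a continue request.

```c
#define INSTR_SIZE 4UL           /* Alpha instructions are 4 bytes long */

typedef void (*continuation_fn)(void);

unsigned long saved_pc;          /* stands in for the saved register set */
volatile int  stopped;           /* set while the debuggee is stopped */
int           polls_left;        /* hypothetical: polls before "continue" */

/* Stand-in for read_packets() or user_main(): after polls_left calls it
 * behaves as if a continue request had cleared the stopped flag. */
static void poll_for_commands(void)
{
    if (--polls_left <= 0)
        stopped = 0;
}

/* Modeled on kwaitforcontinue(): loop until the stopped flag is cleared. */
static void wait_for_continue(void)
{
    stopped = 1;
    while (stopped)
        poll_for_commands();
}

/* Modeled on katbpt(): point the saved PC at the breakpoint itself. */
static void at_breakpoint(void)
{
    saved_pc -= INSTR_SIZE;
    wait_for_continue();
}

/* The current breakpoint continuation function. */
continuation_fn bpt_continuation = at_breakpoint;

/* Modeled on kreenter(): dispatch through the continuation pointer. */
void reenter(unsigned long pc_after_trap)
{
    saved_pc = pc_after_trap;
    bpt_continuation();
}
```

Dispatching through a function pointer lets the same entry path serve ordinary breakpoints, single steps, and stop requests; only the pointer is changed before the debuggee is resumed.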

The handling of exceptions is similar to the handling of breakpoints. All the unused system entry points are initially set up to point to dbgtrap. This sets a flag (in a register that the PAL code has already saved) to indicate that the server was reentered as a result of an exception and then jumps to dbgentry2.

This is an alternative entry point to the function dbgentry(). dbgentry() saves the processor's registers, as previously described, but then, instead of calling kreenter(), calls ktrap(). ktrap() removes any temporary breakpoints, sets child_state appropriately, and then calls kwaitforcontinue().

B.9.3.4 Receiving and Processing Commands

A command can be received as a result of:

When a command is received, either the command processor or the protocol handler calls the appropriate kernel function. The functions that can be called are the previously listed interface functions. Table B-2 explains the behavior.

Table B-2 Kernel Functions

Kernel Function  Action 
kload(), kload_implemented()   Always return FALSE 
kconnect_implemented()   Returns TRUE 
kconnect()   Always returns TRUE, does nothing else 
kkill_possible()   Always returns FALSE 
kkill()   Does nothing 
kdisconnect_possible()   Always returns TRUE 
kdisconnect()   Does nothing 
kpid()   Always returns 0 
kgo()   Checks whether the debuggee is stopped at a breakpoint. If it is, it uses ksetstepbreak() to set the temporary breakpoints so that the debuggee stops again after executing one instruction. It also sets the breakpoint continuation function to be ksteppedoverbreak(). If the debuggee is not at a breakpoint, kgo() places breakpoint instructions (DBGSTOP PAL calls) at all the breakpoints and sets the breakpoint continuation function to be katbpt(). Then, whether or not the debuggee was at a breakpoint, it clears the stopped flag so that the debuggee will continue the next time kwaitforcontinue() checks the flag. 
kstop()   Checks whether the debuggee is still running or stopped. If the debuggee is stopped, kstop() does nothing. If the debuggee is running, it places a temporary breakpoint at the current instruction. 
kaddressok()   Returns TRUE if the address is quadword aligned and FALSE otherwise. 
kcexamine()   Reads the requested location. It does not need to check the breakpoint table because it is only called when the debuggee is stopped. 
kcdeposit()   Writes to the requested location. 
kstep()   Uses ksetstepbreak() to set up temporary breakpoints everywhere the program counter can be after executing the next instruction. This requires a maximum of two temporary breakpoints, since ksetstepbreak() can work out the destination of a jump instruction by reading the instruction's argument register. It also sets the breakpoint continuation function to be katbpt() and clears the stopped flag. 
kpc()   Returns the saved program counter. 
ksetpc()   Modifies the saved program counter. 
kregister()   Returns the value of the appropriate entry in the saved register array. 
ksetreg()   Sets the value of the appropriate entry in the saved register array. 
kbreak()   Calls bptinsert(). The kernel does not write to the debuggee's memory until the debuggee is about to be run or resumed. 
kremovebreak()   Calls bptremove()
kpoll()   Returns the value of child_state
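
The kstep() entry says that at most two temporary breakpoints are needed. The following is a minimal sketch of how the possible next program counter values could be computed for an Alpha instruction. It is not the server's ksetstepbreak() source; the function name step_targets() and its parameters are hypothetical, and the opcode values (0x1A for the jump group, 0x30 through 0x3F for branches, with 0x30 and 0x34 unconditional) are assumptions taken from the Alpha instruction encoding that should be checked against the architecture manual.

```c
#include <stdint.h>

/* Hypothetical helper: lists every address the program counter can hold
 * after the instruction `insn` at address `pc` has executed.
 * Returns the number of entries written to targets[] (1 or 2). */
int step_targets(uint32_t insn, uint64_t pc, const uint64_t regs[32],
                 uint64_t targets[2])
{
    uint32_t opcode = insn >> 26;
    uint64_t next = pc + 4;            /* fall-through address */

    if (opcode == 0x1A) {              /* assumed: JMP, JSR, RET group */
        uint32_t rb = (insn >> 16) & 0x1F;
        /* R31 always reads as zero on Alpha */
        uint64_t value = (rb == 31) ? 0 : regs[rb];
        targets[0] = value & ~(uint64_t)3;
        return 1;
    }
    if (opcode >= 0x30) {              /* assumed: branch format */
        int64_t disp = (int64_t)(insn & 0x1FFFFF);
        uint64_t taken;

        if (disp & 0x100000)
            disp -= 0x200000;          /* sign-extend the 21-bit field */
        taken = next + (uint64_t)(disp * 4);
        if (opcode == 0x30 || opcode == 0x34) {   /* BR, BSR: always taken */
            targets[0] = taken;
            return 1;
        }
        targets[0] = next;             /* conditional: either outcome */
        targets[1] = taken;
        return 2;
    }
    targets[0] = next;                 /* any other instruction */
    return 1;
}
```

Only a conditional branch yields two targets, which is consistent with the two-breakpoint limit stated in the table.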

B.9.3.5 Continuing from a Breakpoint or Exception

When kwaitforcontinue() sees that the stopped flag is clear, it returns (through a number of intermediate functions) to bptentry() . This restores the processor registers and then calls the PAL RTI function to return to the debuggee.

If the debuggee was continuing from a (permanent) breakpoint as a result of a kgo() call, it will hit a new (temporary) breakpoint after executing one instruction. The state will be saved as it would be with a permanent breakpoint, but the breakpoint continuation function called will be ksteppedoverbreak(). This backs up the program counter by one instruction, places DBGSTOP PAL calls at all the breakpoints in the breakpoint table, and then once again returns to bptentry() to resume the debuggee.

The debuggee will now run until it is stopped by hitting a further permanent breakpoint or by an exception, or by a stop command.

B.9.3.6 Interrupt Handling

The assembler source file kutil.s contains the function dbgint(). This is the monitor's system entry point for interrupts. On receiving any interrupt, the monitor saves the previous state and calls data_received() to tell the communicator that the Ethernet device may have received data and that it should poll the Ethernet driver.

The one complication in dbgint() is that if the server receives a Stop Request packet, then the debugger kernel will need to know the debuggee's current program counter. This is not necessarily the program counter saved by the PAL code because the interrupt routine can itself be interrupted (and therefore be called recursively).

The global variable containing the saved program counter is checked for a nonzero value. A value of zero is used to indicate that it is not in use. If it is already set, it is not reset, but data_received() is still called.

If the program counter has not already been saved in this global variable, the value that was saved on the stack by the PAL code is examined. If it is within dbgint() , then this is a recursive call to dbgint() with the second interrupt having happened before the first call to dbgint() saved the program counter. In these circumstances, there is no need to call data_received() since it will be called by the outer call to dbgint() . Otherwise, this value is saved as the debuggee's program counter and data_received() is called.

This procedure requires that the code that saves the value of the program counter should be in the function dbgint() and not within another function called by dbgint() .
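
The decision logic described above can be sketched in C as follows. The real dbgint() is written in assembler in kutil.s; the names sketch_saved_pc, sketch_dbgint(), and the dbgint_start and dbgint_end parameters (standing for the address range occupied by dbgint() itself) are hypothetical, and data_received() is replaced by a counting stub so that the sketch is self-contained.

```c
#include <stdint.h>

/* 0 means "not in use", as in the description above. */
static uint64_t sketch_saved_pc;
static int sketch_data_received_calls;

/* Stand-in for the communicator's data_received() function. */
static void sketch_data_received(void)
{
    sketch_data_received_calls++;
}

/* pal_pc is the program counter that the PAL code saved on the stack. */
void sketch_dbgint(uint64_t pal_pc, uint64_t dbgint_start,
                   uint64_t dbgint_end)
{
    if (sketch_saved_pc != 0) {
        /* Already saved by an outer call: keep it, but still poll. */
        sketch_data_received();
        return;
    }
    if (pal_pc >= dbgint_start && pal_pc < dbgint_end) {
        /* Interrupted dbgint() before it saved the program counter.
         * The outer call will save it and call data_received(). */
        return;
    }
    sketch_saved_pc = pal_pc;   /* this is the debuggee's program counter */
    sketch_data_received();
}
```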

The kernel also contains a function, knullipl() , that clears an interrupt. On the EB64 version of the kernel, it simply writes two commands to the 82C59 (the interrupt controller used on the EB64). This function will clearly have to be rewritten for target systems that use different interrupt controllers.

B.9.4 Porting the Debugger Kernels to Other Systems

Few, if any, changes should be needed to port the Digital UNIX debugger kernel to other Digital UNIX-like operating systems. The operating system functions used in the Digital UNIX debugger kernel seem to be available in all Digital UNIX dialects.

A problem that may arise is that in some Digital UNIX dialects, when the debuggee stops at a breakpoint, the program counter may point to the actual breakpoint instruction rather than the instruction following the breakpoint. For some of the newer dialects of Digital UNIX, a server with greater functionality (in particular the additional ability to connect to existing processes) could be implemented by rewriting the debugger kernel to use the /proc debugger interface.

Porting the server to other operating systems will involve replacing the ptrace(), fork() and exec() calls with the equivalent calls (if they exist) for the target operating system. Assuming it is possible to read and write a subprocess's memory and registers, this should not be difficult.

On operating systems where this is not possible you may have to link some low level debugger functions into the debuggee and communicate between these functions and the kernel through shared memory. On such operating systems, there is also a need for a mechanism for detecting that the debuggee has stopped at a breakpoint. How this is done will vary widely between operating systems.

Few changes are likely to be needed to port the Evaluation Board Server to other embedded systems that use the Digital UNIX PAL code interface. The changes that will often be needed are as follows:

If some other PAL code interface is used, then this will probably require changes in how breakpoints are set and how the server's entry points are called when the debuggee hits a breakpoint or receives an interrupt. It may also alter how much of the debuggee's state is saved by the PAL code before the server's entry points are called from the PAL code, and the value of the program counter that is passed to the server on reaching a breakpoint.

The most common problems that have arisen when modifying the code of the debugger kernels are as follows:

B.10 The Breakpoint Table Handler: Interface Functions and Implementation

The breakpoint table handler provides the following interface functions:

The code for the breakpoint table handler is identical for the two servers. It should not need to change for other server implementations. The source code is in bptable.c. The table is implemented as three arrays of 100 entries each. The breakpoint number of a breakpoint is used as an index into these arrays. The three arrays are:

New entries are inserted in the first available entry and entries are found by a linear search.
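
As an illustration of this layout, the following sketch keeps three parallel arrays of 100 entries, inserts at the first free index, and finds entries by linear search. It is not the contents of bptable.c: the choice of arrays (an in-use flag, the breakpoint address, and the saved instruction) and all function names are assumptions made for the example.

```c
#include <stdint.h>

#define SKETCH_BPT_MAX 100

/* Assumed contents of the three parallel arrays. */
static int      sketch_bpt_used[SKETCH_BPT_MAX];
static uint64_t sketch_bpt_addr[SKETCH_BPT_MAX];
static uint32_t sketch_bpt_saved[SKETCH_BPT_MAX];

/* Inserts at the first available entry.
 * Returns the breakpoint number, or -1 if the table is full. */
int sketch_bpt_insert(uint64_t addr, uint32_t saved_insn)
{
    int i;

    for (i = 0; i < SKETCH_BPT_MAX; i++) {
        if (!sketch_bpt_used[i]) {
            sketch_bpt_used[i] = 1;
            sketch_bpt_addr[i] = addr;
            sketch_bpt_saved[i] = saved_insn;
            return i;
        }
    }
    return -1;
}

/* Linear search. Returns the breakpoint number, or -1 if not found. */
int sketch_bpt_find(uint64_t addr)
{
    int i;

    for (i = 0; i < SKETCH_BPT_MAX; i++) {
        if (sketch_bpt_used[i] && sketch_bpt_addr[i] == addr)
            return i;
    }
    return -1;
}

/* Frees an entry. Returns 0 on success, -1 for a bad or unused number. */
int sketch_bpt_remove(int number)
{
    if (number < 0 || number >= SKETCH_BPT_MAX || !sketch_bpt_used[number])
        return -1;
    sketch_bpt_used[number] = 0;
    return 0;
}
```

With only 100 entries, a linear search costs little, and a freed slot is reused by the next insertion.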

B.11 Ladebug Remote Debugger Protocol

The Ladebug Remote Debugger Protocol is a request/response protocol running over UDP. The debugger client (Ladebug) initiates all transactions by sending a request to the server. On receiving the request, the server acts upon the request and sends a response.

If the client does not receive a response within a time-out, it repeats the request (with an indication that the message is a duplicate). The time-out varies between a tenth of a second and 12.8 seconds, depending on how long it took to get responses to previous requests.

If the client still has not received a response after a number of attempts with increasing retry time-outs, it assumes that the communication path to the server has failed. The server never sends any messages except in response to messages received from the client.

Section B.11.1.1 through Section B.12 describe more about the Ladebug remote debugger protocol:

B.11.1 Messages and Message Formats

This section describes the Ladebug Remote Debugger Protocol messages and the format of each message.

B.11.1.1 Message Headers

Table B-3 shows header names, byte numbers, format, and contents of the message headers. Section B.11.1.2 shows the possible values of the messages.

Table B-3 Header Format

Name  Byte Number  Format  Content 
Protocol Version   0   Integer   Should be 2 
Retransmit Count   1   Integer   In requests, 0 the first time a packet is transmitted; each retransmission of the packet increments it by one. In responses, the retransmit count of the request. 
Command code   2 to 3   Integer in network order, most significant byte first[1]  Identifies the type of request or response. 
Sequence Number   4 to 7   Integer in network order   Identifies the message.  
Process ID   8 to 11   Integer in target machine order, least significant byte first for Alpha targets  Identifies the process being debugged. The value is not defined in load request messages. 
Return value   12 to 15   Integer in network order   Ignored in requests. In replies, tells the client whether the requested action was successful, and if not, why not. 

[1] The protocol sends multibyte integer fields whose meaning is independent of the target architecture in conventional network order (that is, most significant byte first). Examples of such fields are the command code or byte counts. Multibyte integer fields that can only be interpreted with knowledge of the target architecture, such as addresses or register values, are sent in target machine order. For Alpha targets this means that such fields are sent least significant byte first.
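
The mixed byte ordering in the header can be illustrated with a short packing routine. This is a sketch under stated assumptions, not code from either server: the struct and function names are hypothetical, the header is taken to be 16 bytes long (the length that Table B-5 gives for it), so the return value occupies bytes 12 through 15, and the target is assumed to be an Alpha, so the process ID is written least significant byte first.

```c
#include <stdint.h>

#define RD_HEADER_LEN 16

/* Hypothetical in-memory form of the message header. */
struct rd_header {
    uint8_t  version;      /* byte 0, should be 2 */
    uint8_t  retransmit;   /* byte 1 */
    uint16_t command;      /* bytes 2 to 3, network order */
    uint32_t sequence;     /* bytes 4 to 7, network order */
    uint32_t pid;          /* bytes 8 to 11, target order */
    uint32_t retval;       /* bytes 12 to 15, network order */
};

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Writes the header into out[] one byte at a time, so the result does
 * not depend on the byte order of the machine running this code. */
void rd_pack_header(const struct rd_header *h, uint8_t out[RD_HEADER_LEN])
{
    out[0] = h->version;
    out[1] = h->retransmit;
    put_be16(out + 2, h->command);
    put_be32(out + 4, h->sequence);
    put_le32(out + 8, h->pid);       /* Alpha target: little-endian */
    put_be32(out + 12, h->retval);
}
```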

B.11.1.2 Message Values

Table B-4 explains the values returned by messages.

Table B-4 Message Table

Value  Message  Explanation 
0  OK  Request succeeded. 
1  Bad process ID  The process ID of the message is not that of the debuggee, or, in the case of Connect to Process, the server could not connect to that process. 
2  No resources  The server did not have the resources to carry out the request. 
3  Not connected  The server is not connected to a debuggee. The request requires that it should be. 
4  Not stopped  The debuggee is running. The request can only be carried out with the debuggee stopped. 
5  Bad address  The address given in the request is bad. The precise meaning of this varies between the different types of responses that can give this return value. 
6  Not implemented  The server does not implement this request. 
7  Bad load name  See Section B.11.1.4 
8  Already connected  The server is already debugging the requested debuggee. 
9  Cannot disconnect from process  See Section B.11.1.8 
10  Cannot kill process  See Section B.11.1.10 
11  Cannot step  See Section B.11.1.12 

B.11.1.3 Load Process Request and Response

The load process request is a request to the server to load a new process and to start a new debugger session. Section B.11.1.4 describes the possible responses to a load process request.

Table B-5 shows the fields of the load process request. The fields are transmitted in the request in the order shown, with no unused bytes between them.

Table B-5 Fields of the Load Process Request Message

Name  Length  Format  Contents 
Header  16 bytes   See Table B-3  See Table B-3 
Client User Name  Variable   Null terminated character string   Name of the user of the client on the host. This can be used by the server to check that the client is allowed to load the requested process.  
Server User Name  Variable   Null terminated character string   User name of user to run the process on the target. This will be ignored by some servers.  
Program Name  Variable   Null terminated character string   Name of program to be loaded. The form and interpretation of this name will vary between servers.  
Number of arguments  1 byte   Integer   Count of program argument fields  
Arguments  Variable   Variable number of null terminated strings. The number of arguments field gives the number of strings.   Arguments to be passed to the loaded process. May be ignored by some servers.  
Standard input  Variable   Null terminated character string   Name of file from which standard input is to be redirected. An empty string (just a 0 byte) indicates no redirection; otherwise the interpretation of the file name is server dependent.  
Standard output  Variable   Null terminated character string   Name of file to which standard output is to be redirected. This file name is interpreted in the same way as the standard input file name.  
Standard error  Variable   Null terminated character string   Name of file to which standard error is to be redirected. This file name is interpreted in the same way as the standard input file name.  
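
Because the fields after the header are packed back to back, the body of a Load Process request can be built by appending each string with its terminating 0 byte. The following is a minimal sketch of that layout; the function names and the error convention (returning -1 when the buffer is too small) are hypothetical, and the 16-byte header is assumed to be written separately in front of this body.

```c
#include <stddef.h>
#include <string.h>

/* Appends s and its terminating 0 byte at offset off.
 * Returns the new offset, or -1 if there is no room (or off is -1). */
static long put_cstr(unsigned char *buf, size_t cap, long off, const char *s)
{
    size_t n = strlen(s) + 1;

    if (off < 0 || (size_t)off + n > cap)
        return -1;
    memcpy(buf + off, s, n);
    return off + (long)n;
}

/* Builds the fields that follow the header, in the order of Table B-5.
 * Returns the body length in bytes, or -1 on error. */
long rd_build_load_body(unsigned char *buf, size_t cap,
                        const char *client_user, const char *server_user,
                        const char *program,
                        int argc, const char *const argv[],
                        const char *in, const char *out, const char *err)
{
    long off = 0;
    int i;

    if (argc < 0 || argc > 255)        /* the count is a 1-byte field */
        return -1;
    off = put_cstr(buf, cap, off, client_user);
    off = put_cstr(buf, cap, off, server_user);
    off = put_cstr(buf, cap, off, program);
    if (off < 0 || (size_t)off + 1 > cap)
        return -1;
    buf[off++] = (unsigned char)argc;
    for (i = 0; i < argc; i++)
        off = put_cstr(buf, cap, off, argv[i]);
    off = put_cstr(buf, cap, off, in);   /* "" means no redirection */
    off = put_cstr(buf, cap, off, out);
    off = put_cstr(buf, cap, off, err);
    return off;
}
```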

B.11.1.4 Responses to the Load Process Request

The command code for a Load Process request is 1. The server should ignore the PID received in a Load Process request.

The fields of a Load Process response are:

The command code of a load response is 0x8001.

B.11.1.5 Connect to Process Request and Response

A connect request message requests that the server should start a debugger session by connecting to an existing process on the host. The fields of a connect request are:

The command code of a connect request is 2.

The format of a connect response is identical to that of a load response. Possible failure reasons are:

The command code of a connect response is 0x8002.

B.11.1.6 Connect to Process Insist Request and Response

A connect insist request message requests that the server should take over debugging a process to which there may already be a server connected. The formats of connect insist requests and responses differ from those of connect requests and responses only in the command codes. A Connect to Process Insist request has a command code of 3 and its response has a command code of 0x8003. A server should only return the Already Connected failure reason if it could not terminate the old debugger session.

B.11.1.7 Probe Process Request and Response

The Probe Process request asks the server what the state of the debuggee is. It contains no fields other than the standard header. Its command code is 0x81.

The Probe Process response returns the state of the debuggee. Following the standard header (at byte 16) it contains a 1-byte integer field giving the state of the debuggee. Possible values are:

Possible failure reasons are:

The command code of a Probe Process response is 0x8081.

B.11.1.8 Disconnect from Process Request and Response

The Disconnect from Process request asks the server to disconnect from both the current debuggee and the client. It will only succeed if the server can disconnect from the debuggee without killing it, or if the debuggee is already dead.

The effect on breakpoints of disconnecting from a process may vary between servers. In particular, the protocol does not define whether disconnecting from a stopped process will allow it to run on, or what happens if the process reaches a breakpoint after the server has disconnected from it.

The request contains no fields other than the message header. Its command code is 0x82.

The Disconnect from Process response contains no fields other than the message header. Possible failure reasons are:

If the server cannot disconnect from the debuggee, it will remain connected to the client and to the debuggee. The command code of a Disconnect from Process response is 0x8082.

B.11.1.9 Stop Process Request and Response

The Stop Process request asks the server to stop a running debuggee as soon as possible. It contains no fields other than the message header. Its command code is 0x83.

The Stop Process response contains no fields other than the message header. Possible failure reasons are:

The command code of a Stop Process response is 0x8083.

B.11.1.10 Kill Process Request and Response

The Kill Process request asks the server to kill the current debuggee and disconnect from the client. It will only succeed if the server can kill the debuggee, or if the debuggee is already dead. The request contains no fields other than the message header. Its command code is 0x84.

The Kill Process response contains no fields other than the message header. Possible failure reasons are:

If the server cannot kill the debuggee, it will remain connected to the client and to the debuggee. The command code of a Kill Process response is 0x8084.

B.11.1.11 Continue Process Request and Response

The Continue Process request asks the server to make the debuggee run on until it hits a breakpoint, terminates, is stopped by the server acting on a Stop Process request, or stops for some other reason (for example, executing a trap or exception). It contains no fields other than the message header. Its command code is 0xA1.

The Continue Process response contains no fields other than the message header. Possible failure reasons are:

If the debuggee is terminated, the request will succeed, but its effect is undefined. The command code of a Continue Process response is 0x80A1.

B.11.1.12 Step Request and Response

The Step request asks the server to make the debuggee execute one instruction. It contains no fields other than the message header. Its command code is 0xA2.

The Step response contains no fields other than the message header. Possible failure reasons are:

If the debuggee is terminated, the request may succeed, but its effect is undefined. The command code of a Step response is 0x80A2.

B.11.1.13 Set Breakpoint Request and Response

The Set Breakpoint request asks the server to set a breakpoint in the code of the debuggee. Although a server is not required to be able to set a breakpoint on any particular address, to be useful it must be able to set breakpoints on a significant portion of the instructions of the debuggee.

The effect of setting a breakpoint on anything other than an instruction of the debuggee is not defined. Furthermore, the effect of setting a breakpoint on an instruction that the debuggee modifies or reads as data is not defined.

The fields of a Set Breakpoint request are:

The command code of a Set Breakpoint request is 0xA3.

The Set Breakpoint response contains no fields other than the message header. Possible failure reasons are:

The command code of a Set Breakpoint response is 0x80A3.

B.11.1.14 Clear Breakpoint Request and Response

The Clear Breakpoint request asks the server to remove a breakpoint from the debuggee. Its fields are:

The command code of a Clear Breakpoint request is 0xA4.

The Clear Breakpoint response contains no fields other than the message header. Possible failure reasons are:

The command code of a Clear Breakpoint response is 0x80A4.

B.11.1.15 Get Next Breakpoint Request and Response

Using Get Next Breakpoint requests the client can get a complete list of the breakpoints known to the server that affect the current debuggee. In some servers this will include breakpoints set in the debuggee by previous remote debugger sessions or through an alternative interface. For example, in the evaluation board server it includes breakpoints set by previous debugger sessions and those set through the local debugger interface.

To get a complete list of breakpoints, the client should start by sending a Get Next Breakpoint request with a breakpoint address of zero. It should then send further Get Next Breakpoint requests, each containing the address returned by the previous Get Next Breakpoint response. A server that receives this sequence of requests with no other requests intervening must return each breakpoint it knows about precisely once. The protocol does not define the order in which the server will return the breakpoints it knows about.

The fields of a Get Next Breakpoint request are:

The command code of a Get Next Breakpoint request is 0xA5.

The fields of a Get Next Breakpoint response are:

Possible failure reasons are:

The command code of a Get Next Breakpoint response is 0x80A5.

B.11.1.16 Get Registers Request and Response

The Get Registers request asks the server to send the client the contents of all the debuggee's registers and pseudo registers. It contains no fields other than the message header. Its command code is 0xA6.

The fields of the Get Registers response are:

Possible failure reasons are:

The command code of a Get Registers response is 0x80A6.

B.11.1.17 Set Registers Request and Response

The Set Registers request asks the server to set all the debuggee's registers and pseudo registers, including the debuggee's program counter. The request succeeds even if the server is unable to change the values of some of the registers. Its fields are:

The command code of a Set Registers request is 0xA7.

A Set Registers response contains no fields other than the message header. Possible failure reasons are:

The command code of a Set Registers response is 0x80A7.

B.11.1.18 Read Request and Response

A Read request asks the server to read a portion of the debuggee's memory. Its fields are, in order:

Its command code is 0xA8.

The fields of a Read response are, in order:

Possible failure reasons are:

The command code of a Read response is 0x80A8.

B.11.1.19 Write Request and Response

A Write request asks the server to overwrite a portion of the debuggee's memory. Its fields are, in order:

Its command code is 0xA9.

A Write response contains no fields other than the message header. Possible failure reasons are:

The command code of a Write response is 0x80A9.
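
The command codes listed in Sections B.11.1.4 through B.11.1.19 can be collected in one place. In the sketch below, the enumerator names and helper functions are hypothetical. The helper derives a response code by setting bit 15 (0x8000) of the request code, which is the pattern the listed request and response pairs follow; the protocol text does not state this as a rule, so treat it as an observation rather than a guarantee.

```c
#include <stdint.h>

/* Request command codes as given in the sections above. */
enum rd_request {
    RD_LOAD                = 0x01,
    RD_CONNECT             = 0x02,
    RD_CONNECT_INSIST      = 0x03,
    RD_PROBE               = 0x81,
    RD_DISCONNECT          = 0x82,
    RD_STOP                = 0x83,
    RD_KILL                = 0x84,
    RD_CONTINUE            = 0xA1,
    RD_STEP                = 0xA2,
    RD_SET_BREAKPOINT      = 0xA3,
    RD_CLEAR_BREAKPOINT    = 0xA4,
    RD_GET_NEXT_BREAKPOINT = 0xA5,
    RD_GET_REGISTERS       = 0xA6,
    RD_SET_REGISTERS       = 0xA7,
    RD_READ                = 0xA8,
    RD_WRITE               = 0xA9
};

#define RD_RESPONSE_BIT 0x8000u

/* Response code that pairs with a request code (observed pattern). */
uint16_t rd_response_code(uint16_t request)
{
    return (uint16_t)(request | RD_RESPONSE_BIT);
}

/* Nonzero if the code has the response bit set. */
int rd_is_response(uint16_t code)
{
    return (code & RD_RESPONSE_BIT) != 0;
}
```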

B.11.2 Order of Messages

A server can be modeled as a single control thread plus a debuggee thread for each debuggee. Each thread is uniquely identified by its UDP port number. Once a client has sent a message to a thread, it cannot send further messages to that thread (other than copies of the original message) until it receives a response.

A server (control or debuggee) thread can only send responses to the requests it receives. Server threads are expected to respond promptly to all requests they receive. A server thread must never send more than one response to each message it receives. If it receives a duplicate request, it must send a copy of its original response, with an updated retransmission count, without acting a second time on the request.

The only messages a client can send to a control thread are Connect to Process, Connect to Process Insist, and Load Process requests. A positive response to any of these requests identifies a new debuggee thread that the client should use for debugging the new debuggee.


Note
None of the previous information suggests that a server must be implemented as a multithreaded program. A server that can only debug one process at a time can send replies containing the No resources failure code to clients that attempt to connect to it when it is already debugging a process.

A debuggee thread can be in either running or stopped state. Initially a debuggee thread is in running state. In running state it will accept the following requests:

The client must not send any other requests to a debuggee thread when it is in running state. A debuggee thread will enter stopped state from running state whenever the debuggee stops. The client can discover the state of a debuggee thread by sending it a probe request. Any debuggee state other than running indicates that the debuggee thread is in stopped state. When it is in stopped state, the client can send the debuggee thread any request except Connect to Process, Connect to Process Insist, or Load Process.

A debuggee thread will return from stopped state to running state when it receives a Continue or Step request that it can act upon.

A debuggee thread will exit immediately after sending a positive Disconnect from Process or Kill Process response. It can also exit at any other time, either through some external cause, or as a result of some other client taking over the debuggee using a Connect to Process Insist request. Once a debuggee thread has exited, the server will either ignore requests sent to it or send responses with the Not connected failure code.

B.11.3 Recovering from Packet Loss

The standard packet header contains two fields that are used to recover from packet loss:

The sequence number is used to distinguish between different messages. The retransmission count is used to distinguish between copies of the same message. A client should give every message it sends to a particular server thread a different sequence number.

To avoid confusion between different clients started by the same user on the same host, it should also attempt to give load and connect requests sequence numbers that will not be used by other clients. One way to do this would be to base the sequence number upon the time at which the message is sent.

The first time the client sends a message, it gives it a retransmission count of 0. If it does not receive a response to a request within a reasonable time, it increments the retransmission count and repeats the message. If after a number of attempts it has still not received a response with the same sequence number and retransmission count as the last message it sent, it assumes that the server thread has exited or that the communications link has failed.


Note
The best values for "reasonable time" and "a number of attempts" are still under investigation. At present, Ladebug starts off by waiting 1.6 seconds. Each time it receives a response within its time-out, it halves its time-out, down to a minimum of 1/10 of a second. Each time it fails to receive a response, it doubles its time-out, up to a maximum of 12.8 seconds. It makes a maximum of 8 attempts at sending any message.
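
The time-out adjustment described in this note can be sketched as follows. The names are hypothetical, the values are held in milliseconds, and the constants are the figures quoted above for the current Ladebug client, which the note says may change.

```c
/* Figures quoted in the note above, in milliseconds. */
#define RD_TIMEOUT_START_MS  1600u
#define RD_TIMEOUT_MIN_MS     100u
#define RD_TIMEOUT_MAX_MS   12800u
#define RD_MAX_ATTEMPTS         8

struct rd_timeout {
    unsigned ms;   /* current time-out */
};

void rd_timeout_init(struct rd_timeout *t)
{
    t->ms = RD_TIMEOUT_START_MS;
}

/* A response arrived in time: halve the time-out, down to the minimum. */
void rd_timeout_on_response(struct rd_timeout *t)
{
    t->ms /= 2;
    if (t->ms < RD_TIMEOUT_MIN_MS)
        t->ms = RD_TIMEOUT_MIN_MS;
}

/* No response arrived: double the time-out, up to the maximum. */
void rd_timeout_on_expiry(struct rd_timeout *t)
{
    t->ms *= 2;
    if (t->ms > RD_TIMEOUT_MAX_MS)
        t->ms = RD_TIMEOUT_MAX_MS;
}
```

A client loop would call rd_timeout_on_expiry() before each retransmission and give up after RD_MAX_ATTEMPTS sends of the same message.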

On receiving a duplicate packet, the server should copy the new retransmission count into its original response and send this updated response to the client.
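
One way a server thread could meet this requirement is to cache its last response and compare the sequence number of each incoming request with that of the request it last acted on. The sketch below is an assumption about how this might be structured, not the code of either server: all names are hypothetical, the 1500-byte buffer size is arbitrary, the header is assumed to be 16 bytes, and rd_demo_handler() is a stand-in that only copies the header and sets the response bit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RD_HEADER_LEN 16
#define RD_MAX_MSG    1500   /* arbitrary bound chosen for the sketch */

struct rd_thread {
    int      have_last;              /* nonzero once a response is cached */
    uint32_t last_seq;               /* sequence number last acted upon */
    size_t   last_len;
    uint8_t  last_resp[RD_MAX_MSG];
};

typedef size_t (*rd_handler)(const uint8_t *req, size_t req_len,
                             uint8_t *resp, size_t resp_cap);

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Produces the response to send for req. Calls act() only for a new
 * sequence number. Returns the response length, or 0 to send nothing. */
size_t rd_serve(struct rd_thread *t, const uint8_t *req, size_t req_len,
                rd_handler act, uint8_t *out, size_t out_cap)
{
    uint32_t seq;

    if (req_len < RD_HEADER_LEN)
        return 0;
    seq = get_be32(req + 4);         /* sequence number, bytes 4 to 7 */

    if (!t->have_last || seq != t->last_seq) {
        size_t n = act(req, req_len, t->last_resp, sizeof t->last_resp);

        t->have_last = 0;
        if (n < RD_HEADER_LEN || n > sizeof t->last_resp)
            return 0;
        t->last_len = n;
        t->last_seq = seq;
        t->have_last = 1;
    }
    t->last_resp[1] = req[1];        /* echo the new retransmit count */
    if (t->last_len > out_cap)
        return 0;
    memcpy(out, t->last_resp, t->last_len);
    return t->last_len;
}

/* Stand-in handler for demonstration: counts how often it is called. */
static int rd_demo_calls;

size_t rd_demo_handler(const uint8_t *req, size_t req_len,
                       uint8_t *resp, size_t resp_cap)
{
    (void)req_len;
    if (resp_cap < RD_HEADER_LEN)
        return 0;
    memcpy(resp, req, RD_HEADER_LEN);
    resp[2] |= 0x80;                 /* turn the request code into a response code */
    rd_demo_calls++;
    return RD_HEADER_LEN;
}
```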

The server cannot normally detect communication failure and will wait indefinitely for messages from a client.

B.12 Transport Layer

The Ladebug Remote Debugger Protocol uses UDP running over an IP network layer as its transport. The client can use any UDP socket as its source but will always send load and connect requests to UDP socket 410. If the request is successful, the response will have as its source socket the socket allocated to the new debuggee thread. The client will send all messages for this debuggee thread to this socket.


Note
This socket has been allocated to this protocol by the Internet Assigned Numbers Authority (IANA).

The server should always send responses to the source socket of the associated request. The source socket for any message sent by the server should be either 410 (for responses to rejected load and connect messages) or the socket allocated to the associated debuggee thread (for all other messages).
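
On the client side, this addressing scheme means that load and connect requests go to UDP port 410, and every later request goes to whatever source address the successful response came from. The following sketch shows only the address handling, using the POSIX sockets API and assuming an IPv4 target; the function names are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#define RD_CONTROL_PORT 410   /* port used for load and connect requests */

/* Fills addr with the control address of the server at the dotted-quad
 * address ipv4. Returns 0 on success, -1 if ipv4 cannot be parsed. */
int rd_control_addr(const char *ipv4, struct sockaddr_in *addr)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(RD_CONTROL_PORT);
    return inet_pton(AF_INET, ipv4, &addr->sin_addr) == 1 ? 0 : -1;
}

/* After a successful load or connect response, later requests are sent
 * to the source address of that response (as reported by recvfrom()). */
void rd_adopt_thread_addr(struct sockaddr_in *dest,
                          const struct sockaddr_in *response_source)
{
    *dest = *response_source;
}
```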