3    Understanding CPU and Bus Issues That Influence Device Driver Design

This chapter discusses design issues related to writing device drivers that can operate on multiple CPU and bus architectures. The issues relate specifically to the Alpha CPU architecture and to bus architectures that Digital implements (EISA, ISA, PCI, and TURBOchannel). However, these issues might be applicable to other CPU and bus architectures. The chapter begins with a discussion of the CPU issues that influence device driver design and concludes with a summary of the bus issues that you need to consider when designing your device drivers.


3.1    CPU Issues That Influence Device Driver Design

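Whenever possible, you should design a device driver so that it can accommodate peripheral devices that operate on more than one CPU architecture. You need to consider the following issues to make your drivers portable across CPU architectures: control status register access, I/O copy operations, direct memory access operations, memory mapping, 32-bit versus 64-bit data types, and memory barriers.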

The discussion centers around 64-bit Alpha CPU platforms and 32-bit CPU platforms, but the topics may be applicable to other CPU architectures. The following sections discuss each of these issues.


3.1.1    Control Status Register Issues

Many device drivers based on the UNIX operating system access a device's control status register (CSR) addresses directly through a device register structure. This method involves declaring a device register structure that describes the device's characteristics, which include a device's control status register. After declaring the device register structure, the driver accesses the device's CSR addresses through the member that maps to it.

Some CPU architectures do not allow you to access the device CSR addresses directly. If you want your device driver to operate on both types of CPU architectures, you can write one device driver with the appropriate conditional compilation statements. Alternatively, you can avoid a potentially confusing proliferation of conditional compilation statements by using the CSR I/O access kernel interfaces that Digital UNIX provides to read from and write to the device's CSR addresses. Because the CSR I/O access interfaces are designed to be CPU hardware independent, using them not only makes the driver easier to read but also makes it more portable across different CPU architectures and across different CPU types within the same architecture.

Section 7.1.9 shows how the /dev/none driver uses the CSR I/O access kernel interfaces read_io_port and write_io_port to read from and write to the device's CSR addresses. See Section 7.1.10 to learn how to build your own macros based on the read and write macros that Digital provides.

Device drivers operating on Alpha CPUs must access the device registers by defining device register offsets and passing them as arguments to the I/O access interfaces.
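
The offset-based style described above can be sketched in user space as follows. This is a minimal sketch, not Digital UNIX code: sim_io_handle_t, sim_write_io_port, and sim_read_io_port are hypothetical stand-ins that only mimic the argument order (address, width in bytes, flags) of the write_io_port and read_io_port calls shown in this chapter. The register offsets and the 64-byte simulated bus space are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel interfaces. */
typedef size_t sim_io_handle_t;          /* offset into the simulated bus space */

#define SIM_CSR   0                      /* command/status register offset */
#define SIM_REG1  8                      /* general register 1 offset */
#define SIM_REG2  16                     /* general register 2 offset */

static unsigned char sim_bus_space[64];  /* assumed size, illustration only */

/* Write the low `width` bytes of `value`, least significant byte first. */
void sim_write_io_port(sim_io_handle_t addr, int width, int flags, uint64_t value)
{
    int i;

    (void)flags;                         /* unused in this sketch */
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    assert(addr + (size_t)width <= sizeof sim_bus_space);
    for (i = 0; i < width; i++)
        sim_bus_space[addr + (size_t)i] = (unsigned char)(value >> (8 * i));
}

/* Read `width` bytes starting at `addr` and assemble them into one value. */
uint64_t sim_read_io_port(sim_io_handle_t addr, int width, int flags)
{
    uint64_t value = 0;
    int i;

    (void)flags;                         /* unused in this sketch */
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    assert(addr + (size_t)width <= sizeof sim_bus_space);
    for (i = 0; i < width; i++)
        value |= (uint64_t)sim_bus_space[addr + (size_t)i] << (8 * i);
    return value;
}

/* Driver-style code names registers by offset, never by structure member. */
uint32_t sim_load_and_verify(sim_io_handle_t device, uint32_t value)
{
    sim_write_io_port(device + SIM_REG1, 4, 0, value);
    return (uint32_t)sim_read_io_port(device + SIM_REG1, 4, 0);
}
```

Because the driver-style helper touches the device only through the two accessor functions, retargeting it to a platform with a different access mechanism would mean changing the accessors rather than every register reference.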


3.1.2    I/O Copy Operation Issues

I/O copy operations can differ markedly from one device driver to another because of the differences in CPU architectures. If you use techniques other than the generic kernel interfaces that Digital provides for performing I/O copy operations, you probably cannot write one device driver that operates on more than one CPU architecture or on more than one CPU type within the same architecture.

To provide portability when performing I/O copy operations, Digital UNIX provides generic kernel interfaces to the system-level interfaces required by device drivers to perform an I/O copy operation. Because these I/O copy interfaces are designed to be CPU hardware independent, their use makes the driver more portable across different CPU architectures and across different CPU types within the same architecture.

Section 18.5.1 shows you how to call these I/O copy operation interfaces.


3.1.3    Direct Memory Access Operation Issues

Direct memory access (DMA) operations can differ markedly from one device driver to another because of the DMA hardware support features for buses on Alpha systems and because of the diversity of the buses themselves. If you use the current techniques for performing DMA, you probably cannot write one device driver that operates on more than one CPU architecture or on more than one CPU type within the same architecture.

To provide portability with regard to DMA operations, Digital UNIX provides generic kernel interfaces to the system-level interfaces required by device drivers to perform a DMA operation. These generic interfaces are typically called ``mapping interfaces.'' The name reflects their historical purpose: acquiring the hardware and software resources needed to map contiguous I/O bus addresses and accesses into discontiguous system memory addresses and accesses. Because these interfaces are designed to be CPU hardware independent, their use makes the driver more portable across different CPU architectures and across different CPU types within the same architecture.

Section 18.6 shows you how to use these mapping interfaces to achieve device driver portability across different CPU architectures.


3.1.4    Memory Mapping Issues

Many device drivers based on the UNIX operating system provide a memory map section to handle applications that make use of the mmap system call. An application calls mmap to map a character device's memory into user address space. Some CPU architectures, including the Alpha architecture, do not support an application's use of the mmap system call. If your device driver operates only on CPUs that support the mmap feature, you can continue writing a memory map section. If, however, you want the device driver to operate on CPUs that do not support the mmap feature, you should design the device driver so that it uses something other than a memory map section.


3.1.5    32-Bit Versus 64-Bit Issues

This section describes issues related to declaring data types for 32-bit and 64-bit CPU architectures. By paying careful attention to data types, you can make your device drivers work on both 32-bit and 64-bit systems. Table 3-1 lists the C compiler data types and bit sizes for 32-bit CPUs and the Alpha 64-bit CPUs.

Table 3-1: C Compiler Data Types and Bit Sizes

C Type         32-Bit Data Size    Alpha 64-Bit Data Size
short          16 bits             16 bits
int            32 bits             32 bits
long           32 bits             64 bits
* (pointer)    32 bits             64 bits
long long      64 bits             64 bits
char           8 bits              8 bits

The following sections describe some common declaration situations.

Note

The /usr/sys/include/io/common/iotypes.h file defines constants used for 64-bit conversions. See Writing Device Drivers: Reference for a description of the contents of this file.


3.1.5.1    Declaring 32-Bit Variables

Declare any variable that you want to be 32 bits in size as type int, not type long. The size of variables declared as type int is 32 bits on both 32-bit systems and the 64-bit Alpha systems.

Look at any variables declared as type int in your existing device drivers to determine if they hold an address. On Alpha systems, sizeof (int) is not equal to sizeof (char *).

In your existing device drivers, also look at any variable declared as type long. If it must be 32 bits in size, you have to change the variable declaration to type int.
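
The effect of holding an address in a 32-bit variable can be sketched with fixed-width types, so that the result does not depend on the machine that compiles it. This is a minimal sketch under stated assumptions: the function names are hypothetical, and uint64_t and uint32_t (from the C99 stdint.h header) stand in for a 64-bit Alpha pointer and a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* Models storing a 64-bit address in a 32-bit int: only the low 32 bits survive. */
uint32_t low_32_bits(uint64_t address)
{
    return (uint32_t)address;            /* unsigned conversion is modulo 2^32 */
}

/* Returns 1 if the address survives a round trip through 32 bits, else 0. */
int fits_in_32_bits(uint64_t address)
{
    return (uint64_t)low_32_bits(address) == address;
}

/* Defensive narrowing: trap in debug builds instead of silently truncating. */
uint32_t checked_narrow(uint64_t address)
{
    assert(fits_in_32_bits(address));
    return low_32_bits(address);
}
```

For an arbitrary illustrative address such as 0xfffffc0000310000, low_32_bits returns 0x00310000, which no longer identifies the original location.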


3.1.5.2    Declaring 32-Bit and 64-Bit Variables

If a variable should be 32 bits in size on a 32-bit system and 64 bits in size on a 64-bit Alpha system, declare it as type long.


3.1.5.3    Declaring Arguments to C Functions

Be aware of arguments to C interfaces (functions) where the argument is not explicitly declared and typed. You should explicitly declare the formal parameters to C interfaces; otherwise, their sizes may not match up with the calling program. The default size is type int, which truncates 64-bit addresses.


3.1.5.4    Declaring Register Variables

When you declare a variable with the register keyword but without a type, the compiler defaults its type to int. For example:

register somevariable;

Declarations that specify only a signedness keyword also default to type int. For example:

unsigned somevariable;

Thus, if you want the variable to be 32 bits in size on both 32-bit systems and 64-bit Alpha systems, the above declarations are correct. However, if you want the variable to be 32 bits in size on a 32-bit system and 64 bits in size on a 64-bit Alpha system, declare the variables explicitly, using the type long.


3.1.5.5    Performing Bit Operations on Constants

By default, constants are 32-bit quantities. When you perform shift or bit operations on constants, the compiler gives 32-bit results. If you want a 64-bit result, you must follow the constant with an L. Otherwise, you get a 32-bit result. For example, the following is a left shift operation that uses the L:

long foo, bar;
foo = 1L << bar;
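
The same idea can be sketched so that it behaves identically on any host. This is a minimal sketch under stated assumptions: the function names are hypothetical, and it swaps the manual's 1L for the C99 macro UINT64_C(1), because type long is not 64 bits on every platform even though it is on a 64-bit Alpha system.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a mask with only bit `bit` set, computed in 64-bit arithmetic. */
uint64_t bit_mask64(unsigned bit)
{
    assert(bit < 64);                    /* shifting by the full width is undefined */
    return UINT64_C(1) << bit;
}

/* What remains when the same mask is kept in a 32-bit quantity. */
uint32_t bit_mask32_truncated(unsigned bit)
{
    return (uint32_t)bit_mask64(bit);
}
```

With bit 40, bit_mask64 returns 0x10000000000, while the 32-bit copy is 0 because the set bit lies above bit 31.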


3.1.5.6    Using NULL and Zero Values

Using the value zero (0) where you should use the value NULL means that you get a 32-bit constant. On Alpha systems, this usage could mean the value zero (0) in the low 32 bits and indeterminate bit values in the high 32 bits. Using NULL from the types.h file allows you to obtain the correct value for both the 32-bit CPUs and the 64-bit Alpha CPUs.


3.1.5.7    Modifying Type char

Modifying a variable declared as type char is not atomic on Alpha systems. You will get a load of 32 or 64 bits and then byte operations to extract, mask, and shift the byte, followed by a store of 32 or 64 bits.
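
The load, mask, and store sequence described above can be written out explicitly in C. This is a minimal sketch, not compiler output: the function names are hypothetical, the sketch always uses a 64-bit container, and it numbers bytes by significance (byte 0 is the least significant).

```c
#include <assert.h>
#include <stdint.h>

/* Returns `quad` with byte number `index` replaced by `value`. */
uint64_t insert_byte(uint64_t quad, unsigned index, uint8_t value)
{
    unsigned shift;
    uint64_t mask;

    assert(index < 8);
    shift = index * 8;
    mask = UINT64_C(0xff) << shift;
    return (quad & ~mask) | ((uint64_t)value << shift);
}

/* Three separate steps: this is why the update is not atomic. */
void store_byte_rmw(uint64_t *quad, unsigned index, uint8_t value)
{
    uint64_t loaded = *quad;                     /* 1. load the whole quadword */
    loaded = insert_byte(loaded, index, value);  /* 2. mask and insert the byte */
    *quad = loaded;                              /* 3. store the whole quadword */
}
```

If another writer changes a neighboring byte of the same quadword between steps 1 and 3, step 3 overwrites that change with the stale copy.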


3.1.5.8    Declaring Bit Fields

Bit fields declared as type int on Alpha systems generate a load/store of a longword (32 bits). Bit fields declared as type long on Alpha systems generate a load/store of a quadword (64 bits).


3.1.5.9    Using printf Formats

The printf formats %d and %x will print 32 bits of data. To obtain 64 bits of data, use %ld and %lx.
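
A user-space illustration of the %lx format follows. This is a minimal sketch under stated assumptions: format_long_hex is a hypothetical helper, and it uses the standard C library snprintf function rather than the kernel's printf interface.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Formats an unsigned long in hexadecimal with %lx.
 * Returns the number of characters needed, excluding the terminating NUL.
 */
int format_long_hex(char *buf, size_t size, unsigned long value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%lx", value);
}
```

Where type long is 64 bits, as on Alpha systems, formatting ~0UL this way produces 16 hexadecimal digits. Passing the same value to %x would be a type mismatch, not just a shorter result.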


3.1.5.10    Using mb and wbflush

Device drivers used the wbflush interface with ULTRIX systems on MIPS CPUs. Although wbflush is aliased to the mb (memory barrier) interface for Alpha CPUs, Digital recommends that all new device drivers call the mb interface. The remainder of this section discusses when to call the mb interface on Alpha CPUs.

In most situations that would require a cache flush on other CPU architectures, you should call the mb interface on Digital UNIX Alpha systems. The reason is not that mb is equivalent to a cache flush (it is not). Rather, a common reason for doing a cache flush is to make data that the host CPU wrote available in main memory for access by a DMA device, or to let the host CPU access data that a DMA device put in main memory. In each case, on an Alpha CPU you should use a memory barrier to synchronize with that event.

A call to mb is occasionally needed even where a call to wbflush was not needed. In general, a memory barrier causes loads/stores to be serialized (not out-of-order), empties memory pipelines and write buffers, and ensures that the data cache is coherent.

You should use the mb interface to synchronize DMA buffers. Use it before the host releases the buffer to the device and before the host accesses a buffer filled by the device.

Alpha CPUs do not guarantee to preserve write ordering, so memory barriers are required between multiple writes to I/O registers where order is important. The same is also true for read ordering.

Use the memory barrier to prevent writes from being collapsed in the write buffer, that is, to prevent bytes, shorts, and ints from being merged into one 64-bit write.

Alpha CPUs require that data caches be transparent. Because there is no way to explicitly flush the data cache on an Alpha platform, you need not call mb before or after. The following code fragment illustrates the use of a memory barrier:


.
.
.
bcopy (data, DMA_buffer, nbytes);
mb();
device->csr = GO;
mb();
.
.
.

Another example is presented in the following code fragment:


.
.
.
device_intr()
{
    mb();
    bcopy (DMA_buffer, data, nbytes);
    /* If we need to update a device register, do: */
    mb();
    device->csr = DONE;
    mb();
}

Another way to look at this issue is to recognize that Alpha CPUs maintain cache coherency for you. However, Alpha CPUs are free to do the cache coherency in any manner and time. The events that cause you to want to read buffers, or the events you want to trigger to release a buffer you have written, are not guaranteed to occur at a time consistent with when the hardware maintains cache coherency. You need the memory barrier to achieve this synchronization.
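
The ordering pattern from the first fragment can be approximated in portable user-space C. This is a minimal sketch by analogy only: struct dma_sim_device, DMA_SIM_GO, full_barrier, and dma_sim_start_output are hypothetical names, memcpy replaces bcopy, and the C11 atomic_thread_fence call stands in for mb. A C11 fence orders memory operations between threads. It is not a substitute for the kernel's mb interface when a real DMA device is involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMA_SIM_GO 1u                    /* hypothetical "start" command */

struct dma_sim_device {
    unsigned char dma_buffer[64];        /* assumed buffer size */
    volatile uint32_t csr;               /* simulated command/status register */
};

/* Stand-in for mb(): a full sequentially consistent fence (C11). */
static void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Fill the buffer first, then tell the simulated device to go. */
void dma_sim_start_output(struct dma_sim_device *dev,
                          const void *data, size_t nbytes)
{
    assert(dev != NULL && data != NULL);
    assert(nbytes <= sizeof dev->dma_buffer);

    memcpy(dev->dma_buffer, data, nbytes);
    full_barrier();                      /* buffer contents before the CSR write */
    dev->csr = DMA_SIM_GO;
    full_barrier();                      /* CSR write before whatever follows */
}
```

The point of the sketch is the placement of the barriers: one between filling the buffer and releasing it, and one after the register write.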


3.1.5.11    Using the volatile Compiler Keyword

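The volatile keyword prevents the compiler from optimizing accesses to the data structures and variables it qualifies, for example by keeping a value in a CPU register or by removing a load or store that appears redundant. On device registers, such optimizations could result in unexpected behavior. The following example shows the use of the volatile keyword on a device register structure: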

typedef volatile struct {
    unsigned adder;
    unsigned pad1;
    unsigned data;
    unsigned pad2;
    unsigned csr;
    unsigned pad3;
    unsigned test;
    unsigned pad4;
} CB_REGISTERS;

The following variables or data structures should be declared as volatile by device drivers:

The purpose of using the volatile keyword on the example data structure is to prevent compiler optimizations from being performed on it; such actions could result in unexpected behavior.
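
A polling loop shows why the qualifier matters. This is a minimal sketch under stated assumptions: wait_until_ready and SIM_CSR_READY are hypothetical, and an ordinary variable stands in for a device status register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_CSR_READY 0x1u               /* hypothetical "ready" status bit */

/*
 * Polls a status register until the READY bit is set.
 * Returns the number of polls used, or -1 if max_polls is exhausted.
 */
int wait_until_ready(volatile const uint32_t *csr, int max_polls)
{
    int polls;

    assert(csr != NULL);
    for (polls = 1; polls <= max_polls; polls++) {
        /* volatile forces a fresh load from *csr on every iteration */
        if (*csr & SIM_CSR_READY)
            return polls;
    }
    return -1;
}
```

Without the volatile qualifier, a compiler would be free to load *csr once, before the loop, and test the same stale value on every iteration.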


3.1.6    Memory Barrier Issues

The Alpha architecture does not guarantee read/write ordering. That is, the memory subsystem is free to complete read and write operations in any order that is optimal, without regard for the order in which they were issued. Read/write ordering is not the same as cache coherency, which is handled separately and is not an issue.

The Alpha architecture also contains a write buffer (as do many high-performance RISC CPUs, including the MIPS R3000). This write buffer can coalesce multiple writes to identical or adjacent addresses into a single write, effectively losing earlier write requests. Similarly, multiple reads to identical or adjacent addresses can be coalesced into a single read.

This coalescing has implications for multiprocessor systems, as well as systems with off-board I/O or DMA engines that can read or modify memory asynchronously or that can require multiple writes to actually issue multiple data items. The mb (memory barrier) interface guarantees ordering of operations. The mb interface is derived from the MB instruction, which is described in the Alpha Architecture Reference Manual.
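
The loss of earlier writes can be illustrated with a toy model of a write buffer. This is a minimal sketch, not a description of Alpha hardware: struct wb_model, wb_write, and wb_barrier are hypothetical, the buffer holds a single entry, and only writes to the identical address are merged (merging of adjacent addresses is not modeled).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct wb_model {
    int      pending;        /* nonzero while a write waits in the buffer */
    unsigned addr;           /* address of the waiting write */
    uint32_t value;          /* value of the waiting write */
    int      device_writes;  /* number of writes that reached the device */
    uint32_t last_seen;      /* last value the device observed */
};

/* Pushes the waiting write, if any, out to the simulated device. */
static void wb_drain(struct wb_model *wb)
{
    if (wb->pending) {
        wb->device_writes++;
        wb->last_seen = wb->value;
        wb->pending = 0;
    }
}

/* Queues a write; a write to the same address replaces the waiting one. */
void wb_write(struct wb_model *wb, unsigned addr, uint32_t value)
{
    assert(wb != NULL);
    if (wb->pending && wb->addr != addr)
        wb_drain(wb);        /* different address: the old write goes out */
    wb->pending = 1;
    wb->addr = addr;
    wb->value = value;
}

/* Models a memory barrier: nothing may remain in the buffer afterward. */
void wb_barrier(struct wb_model *wb)
{
    assert(wb != NULL);
    wb_drain(wb);
}
```

In this model, two writes to the same register with no barrier between them reach the device as one write carrying only the second value. Placing wb_barrier between them makes the device see both.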

The mb interface is a superset of the wbflush interface that ULTRIX drivers use. For compatibility, wbflush is aliased to mb on Digital UNIX Alpha systems.

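You call mb in a device driver under the following circumstances: to force a barrier between load/store operations, after the CPU has prepared a data buffer in memory, before attempting to read any device CSRs, and between writes.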

Each of these is briefly discussed in the following sections.

Note

Device drivers and the Digital UNIX operating system are the primary users of the mb interface. However, some user programs, such as a graphics program that directly maps the frame buffer and manipulates registers, might need to call mb. The Digital UNIX operating system does not provide a C library interface for mb. User programs that require use of mb should use the following asm construct:

#include <c_asm.h>

asm ("mb");


3.1.6.1    Forcing a Barrier Between Load/Store Operations

You can call the mb interface to force a barrier between load/store operations. This call ensures that all previous load/store operations access memory or I/O space before any subsequent load/store operations. The following call to mb ensures that the first register is physically written before the load attempts to read the second register. The example assumes that device is an I/O handle that you can use to reference a device register or memory located in bus address space (either I/O space or memory space). You can perform standard C mathematical operations (addition and subtraction only) on the I/O handle. For example, this code fragment adds the offsets of general register 1 and general register 2 to the I/O handle in the calls to write_io_port and read_io_port.


.
.
.
#define csr  0    /* Command/Status register */
#define reg1 8    /* General register 1 */
#define reg2 16   /* General register 2 */
io_handle_t device;
.
.
.
write_io_port(device + reg1, 4, 0, value);         [1]
mb ();                                             [2]
next_value = read_io_port(device + reg2, 4, 0);    [3]
.
.
.

  1. Writes the first value to general register 1 by calling the write_io_port interface. See Section 7.1.9 for a detailed discussion of write_io_port.

  2. Calls the mb interface to ensure that the write of the value is completed.

  3. Reads the new value from general register 2 by calling the read_io_port interface. See Section 7.1.9 for a detailed discussion of read_io_port.


3.1.6.2    After the CPU Has Prepared a Data Buffer in Memory

You call the mb interface after the CPU has prepared a data buffer in memory and before the device driver tries to perform a DMA out of the buffer. You also call mb in device drivers that perform a DMA into memory, before using the data in the DMA buffer. The following calls to mb ensure that data is available (out of memory pipelines/write buffers) and that the data cache is coherent. The example assumes that device is an I/O handle that you can use to reference a device register or memory located in bus address space (either I/O space or memory space). You can perform standard C mathematical operations (addition and subtraction only) on the I/O handle. For example, this code fragment adds the offset of the command/status register to the I/O handle in the calls to write_io_port and read_io_port.


.
.
.
#define csr  0    /* Command/Status register */
#define reg1 8    /* General register 1 */
#define reg2 16   /* General register 2 */
io_handle_t device;
.
.
.
bcopy (data, dma_buf, nbytes);                     [1]
mb ();                                             [2]
write_io_port(device + csr, 4, 0, START_DMA);      [3]
/* or */                                           [4]
if (read_io_port(device + csr, 4, 0) & DMA_DONE) {
    mb ();
    bcopy (dma_buf, data, nbytes);
}
.
.
.

  1. Writes the data into the DMA buffer.

  2. Calls the mb interface to ensure that the write of the data is completed.

  3. Issues the start command to the device.

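  4. This sequence of code shows the complementary case, in which the device performs a DMA into memory. See Section 7.1.9 for a detailed discussion of read_io_port.

    If the DMA is finished, the code calls the mb interface to ensure that the buffer is correct and then gets the data from the DMA buffer by calling the bcopy interface.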


3.1.6.3    Before Attempting to Read Any Device CSRs

You call the mb interface before attempting to read any device CSRs after taking a device interrupt. The example assumes that device is an I/O handle that you can use to reference a device register or memory located in bus address space (either I/O space or memory space). You can perform standard C mathematical operations (addition and subtraction only) on the I/O handle. For example, this code fragment adds the offset of the command/status register to the I/O handle in the call to read_io_port.


.
.
.
#define csr  0    /* Command/Status register */
#define reg1 8    /* General register 1 */
#define reg2 16   /* General register 2 */
io_handle_t device;
.
.
.
device_intr()
{
    mb();                                          [1]
    stat = read_io_port(device + csr, 4, 0);       [2]
    /* or */                                       [3]
    mb();
    bcopy (dma_buf, data, nbytes);
}
.
.
.

  1. Calls the mb interface to ensure that the device CSR write has completed.

  2. Reads the status from the device. See Section 7.1.9 for a detailed discussion of read_io_port.

  3. This sequence of code presents another way to accomplish the same thing as the previous code. It calls the mb interface to ensure that the buffer is correct. It then gets the data from the DMA buffer by calling the bcopy interface.

Note

Digital UNIX on an Alpha system provides a memory barrier in the interrupt stream before calling any device interrupt handlers. Thus, a call to mb is not strictly necessary in the device interrupt case. For performance reasons in the device interrupt case, you can omit the call to mb.


3.1.6.4    Between Writes

You call the mb interface between writes if you do not want a write buffer to collapse the writes (merge bytes/shorts/ints/quads or reorder). The example assumes that device is an I/O handle that you can use to reference a device register or memory located in bus address space (either I/O space or memory space). You can perform standard C mathematical operations (addition and subtraction only) on the I/O handle. For example, this code fragment adds the offsets of general register 1 and general register 2 to the I/O handle in the calls to write_io_port.


.
.
.
#define csr  0    /* Command/Status register */
#define reg1 8    /* General register 1 */
#define reg2 16   /* General register 2 */
io_handle_t device;
.
.
.
*ptr = value;                                      [1]
mb ();                                             [2]
*(ptr+1) = value2;                                 [3]
/* or */                                           [4]
write_io_port(device + reg1, 4, 0, value);
mb ();
write_io_port(device + reg2, 4, 0, value2);
.
.
.

  1. Writes the first location.

  2. Calls the mb interface to force a write out of the write buffer.

  3. Writes the second location.

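  4. This sequence of code illustrates an example more specifically tailored to device drivers. Note that this use of mb is exactly equivalent to wbflush. See Section 7.1.9 for a detailed discussion of write_io_port.

    This sequence writes the first value to general register 1 by calling write_io_port, calls mb to force the write out of the write buffer, and then writes the second value to general register 2 by calling write_io_port.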

Note

The Alpha Architecture Reference Manual (1992 edition) has a technical error in the description of the MB instruction. It specifies that MB is needed only on multiprocessor systems. This statement is not accurate. The MB instruction must be used in any system to guarantee correctly ordered access to I/O registers or memory that can be accessed via off-board DMA. All such off-board I/O and DMA engines are considered ``processors'' in the Alpha architecture's definition of multiprocessor.


3.2    Bus Issues That Influence Device Driver Design

Whenever possible, you should design a device driver so that it can accommodate peripheral devices that can operate on more than one bus architecture. You can achieve portability across bus architectures particularly when the bus architectures themselves exhibit common features and attributes. For example, you should be able to write one device driver for a device that operates on the Industry Standard Architecture (ISA) and Extended Industry Standard Architecture (EISA) buses because their architectures exhibit common features and attributes. In other cases, it may not be feasible to write one device driver for multiple bus architectures if these architectures exhibit dissimilar features and attributes.

You must consider the following issues to make your drivers portable across bus architectures. The discussion centers around the following Digital-implemented buses: EISA, ISA, PCI, and TURBOchannel. However, the issues may be applicable to other bus architectures.