5 Kernel-Mode Capabilities

Tru64 UNIX offers several kernel-mode programming capabilities. This chapter describes the tasks that you can do in kernel mode:

Work with string routines (Section 5.1)

Use data copying routines (Section 5.2)

Use kernel-related routines (Section 5.3)

Manage system time (Section 5.4)

Use kernel threads (Section 5.5)

Use locks (Section 5.6)

This chapter discusses the routines most commonly used and provides code fragments to illustrate how to call them in a kernel module. These code fragments and associated descriptions supplement the reference page descriptions for these and the other routines presented in Reference Pages, Section 9r, Device Drivers (Volume 1).

5.1 Using String Routines

String routines allow kernel modules to:

Compare two null-terminated strings (Section 5.1.1)

Compare two strings by using a specified number of characters (Section 5.1.2)

Copy a null-terminated character string (Section 5.1.3)

Copy a null-terminated character string with a specified limit (Section 5.1.4)

Return the number of characters in a null-terminated string (Section 5.1.5)

The following sections describe the routines that perform these tasks.

5.1.1 Comparing Two Null-Terminated Strings

To compare two null-terminated character strings, call the strcmp routine. The following code fragment shows a call to strcmp:


.
.
.
register struct device *device;
struct controller *ctlr;

.
.
.
if (strcmp(device->ctlr_name, ctlr->ctlr_name)) { [1]

.
.
.
}

[1] Shows that the strcmp routine takes two arguments:
- The first argument specifies a pointer to a string (an array of characters terminated by a null character). In this example, this argument is the controller name pointed to by the ctlr_name field of the pointer to the device structure.
- The second argument also specifies a pointer to a string. In the example, this argument is the controller name pointed to by the ctlr_name field of the pointer to the controller structure.

The code fragment sets up a condition statement that performs tasks based on the result of the comparison. Because strcmp returns 0 (zero) when the strings are identical, the body of this if statement runs only when the two controller names differ. Figure 5-1 shows how strcmp compares two sample character-string values in the code fragment. In item 1, strcmp compares the two controller names and returns the value 0 (zero) because the two strings are identical.

In item 2, strcmp returns an integer that is less than zero because the lexicographical comparison indicates that the characters in the first controller name, fb, come before the characters in the second controller name, ipi. In other words, the first pair of characters at the same position in both strings that do not match is f and i, and f is less than i.

Figure 5-1: Results of the strcmp Routine
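The lexicographic rule described above can be sketched in portable user-space C. The function name sketch_strcmp is hypothetical; this illustrates the comparison rule and is not the Tru64 UNIX kernel source. Characters are compared as unsigned char values, as ISO C specifies for its strcmp.

```c
#include <stddef.h>

/*
 * sketch_strcmp: hypothetical user-space illustration of the
 * lexicographic comparison that strcmp performs.
 * Returns 0 if the strings are identical, a negative value if s1
 * sorts first, and a positive value if s2 sorts first.
 */
int sketch_strcmp(const char *s1, const char *s2)
{
    /* Advance while the characters match and s1 has not ended. */
    while (*s1 != '\0' && *s1 == *s2) {
        s1++;
        s2++;
    }
    /* The first mismatching pair (or the terminators) decides the result. */
    return (int)(unsigned char)*s1 - (int)(unsigned char)*s2;
}
```

Callers should test only the sign of the result, or its equality with 0, because the magnitude carries no meaning.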

5.1.2 Comparing Two Strings by Using a Specified Number of Characters

To compare two strings by using a specified number of characters, call the strncmp routine. The following code fragment shows a call to strncmp:


.
.
.
register struct device *device;

.
.
.
if (strncmp(device->dev_name, "rz", 2) == 0) [1]

.
.
.

[1] Shows that the strncmp routine takes three arguments:
- The first argument specifies a pointer to a string. In the example, this argument is the device name pointed to by the dev_name field of the pointer to the device structure.
- The second argument also specifies a pointer to a string. In the example, this argument is the character string rz.
- The third argument specifies the number of bytes to be compared. In the example, the number of bytes to compare is 2.

The code fragment sets up a condition statement that performs tasks that are based on the results of the comparison. Figure 5-2 shows how strncmp compares two sample character-string values in the code fragment. In item 1, strncmp compares the first two characters of the device name none with the string rz. It then returns an integer less than the value 0 (zero), because strncmp makes a lexicographical comparison between the two strings and the string no comes before the string rz. In item 2, strncmp compares the first two characters of the device name rza with the string rz and returns the value 0 (zero), because strncmp makes a lexicographical comparison between the two strings and the string rz is equal to the string rz.

Figure 5-2: Results of the strncmp Routine
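The bounded comparison can be sketched the same way. The name sketch_strncmp is hypothetical and this is a user-space illustration, not the kernel routine: the loop stops at the first mismatch, at a shared terminating null, or after n characters, whichever comes first.

```c
#include <stddef.h>

/*
 * sketch_strncmp: hypothetical illustration of comparing at most
 * n characters of two strings.
 */
int sketch_strncmp(const char *s1, const char *s2, size_t n)
{
    for (; n > 0; n--, s1++, s2++) {
        if (*s1 != *s2)
            return (int)(unsigned char)*s1 - (int)(unsigned char)*s2;
        if (*s1 == '\0')
            return 0;   /* both strings ended at the same point */
    }
    return 0;           /* the first n characters are equal */
}
```

With a count of 2, rza and rz compare equal, which is why the code fragment above matches any device name that begins with rz.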

5.1.3 Copying a Null-Terminated Character String

To copy a null-terminated character string, call the strcpy routine. The following code fragment shows a call to strcpy:


.
.
.
struct tc_slot  tc_slot[TC_IOSLOTS]; [1]
char curr_module_name[TC_ROMNAMLEN + 1]; [2]

.
.
.
strcpy(tc_slot[i].modulename, curr_module_name); [3]

.
.
.

[1] Declares an array of tc_slot structures of size TC_IOSLOTS.

[2] Declares a variable to store the module name from the ROM of a device on the TURBOchannel bus.

[3] Shows that the strcpy routine takes two arguments:
- The first argument specifies a pointer to a buffer large enough to hold the string to be copied. In the example, this buffer is the modulename field of the tc_slot structure for the specified bus.
- The second argument specifies a pointer to a string. This is the string to be copied to the buffer that the first argument specifies. In the example, this is the module name from the ROM, which is stored in the curr_module_name variable.

Figure 5-3 shows how strcpy copies a sample value in the code fragment. The routine copies the string CB (the value that is contained in curr_module_name) to the modulename field of the tc_slot structure for the specified bus. This field is presumed large enough to store the character string. The strcpy routine returns the pointer to the location following the end of the destination buffer.

Figure 5-3: Results of the strcpy Routine
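The text notes that the destination field is presumed to be large enough. One way to make that assumption explicit, sketched here in user-space C, is to check the source length against the destination size before copying. The helper checked_strcpy and its -1 return value are hypothetical, not part of the kernel interface.

```c
#include <string.h>

/*
 * checked_strcpy: hypothetical helper. Copies src into dst only if
 * dst (dstsize bytes) can hold every character plus the terminating
 * null. Returns 0 on success, or -1 (leaving dst untouched) if the
 * string would not fit.
 */
int checked_strcpy(char *dst, size_t dstsize, const char *src)
{
    if (strlen(src) >= dstsize)   /* need strlen(src) + 1 bytes */
        return -1;
    strcpy(dst, src);
    return 0;
}
```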

5.1.4 Copying a Null-Terminated Character String with a Specified Limit

To copy a null-terminated character string with a specified limit, call the strncpy routine. The following code fragment shows a call to strncpy:


.
.
.
register struct device *device;
char *buffer;   /* must point to storage for at least 2 bytes */

.
.
.
strncpy(buffer, device->dev_name, 2); [1]
if (strncmp(buffer, somevalue, 2) == 0)   /* compare characters, not pointers */

.
.
.

[1] Shows that strncpy takes three arguments:
- The first argument specifies a pointer to a buffer of at least the same number of bytes as specified in the third argument. In the example, this is the pointer to the buffer variable.
- The second argument specifies a pointer to a string. This is the character string to be copied and in the example is the value pointed to by the dev_name field of the pointer to the device structure.
- The third argument specifies the number of characters to copy, which in the example is two characters.

The code fragment sets up a condition statement that performs some tasks based on the characters stored in the buffer that the buffer variable points to.

Figure 5-4 shows how strncpy copies a sample value in the code fragment. The routine copies the first two characters of the string none (the value pointed to by the dev_name field of the pointer to the device structure). The strncpy routine stops copying after it copies a null character or the number of characters that are specified in the third argument, whichever comes first.

The figure also shows that strncpy returns a pointer to the null character at the end of the first string (or to the location following the last copied character if there is no null character). The copied string is not null terminated if the length of the source string is greater than or equal to the number of characters specified in the third argument.

Figure 5-4: Results of the strncpy Routine
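Because the result is not null terminated when the source is at least as long as the count, a common pattern is to terminate the destination explicitly. The following user-space sketch uses the ISO C strncpy, which returns its first argument and pads any remaining bytes with nulls, so its return value and padding behavior may differ from the kernel routine described above. The helper name copy_prefix is hypothetical.

```c
#include <string.h>

/*
 * copy_prefix: hypothetical helper. Copies at most n characters of
 * src into dst and always null-terminates the result.
 * dst must have room for n + 1 bytes.
 */
char *copy_prefix(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);   /* may leave dst unterminated */
    dst[n] = '\0';          /* guarantee termination */
    return dst;
}
```

Applied to the sample value none with a count of 2, the result is the terminated string no.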

5.1.5 Returning the Number of Characters in a Null-Terminated String

To return the number of characters in a null-terminated character string, call the strlen routine. The following code fragment shows a call to strlen:


.
.
.
char *strptr;

.
.
.
if ((strlen(strptr)) > 1) [1]

[1] Shows that the strlen routine takes one argument: a pointer to a string. In the example, this pointer is the variable strptr.

The code fragment sets up a condition statement that performs some tasks based on the length of the string. Figure 5-5 shows how strlen determines the number of characters in a sample string. As the figure shows, strlen returns the number of characters in the string that strptr points to, which for the sample string is four. The strlen routine does not count the terminating null character.

Figure 5-5: Results of the strlen Routine
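The counting rule, including the exclusion of the terminating null, is shown in the following user-space sketch. The name sketch_strlen is hypothetical; this is an illustration, not the kernel source.

```c
#include <stddef.h>

/*
 * sketch_strlen: hypothetical illustration. Counts characters up to,
 * but not including, the terminating null.
 */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```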

5.2 Using Data Copying Routines

The data copying routines allow kernel modules to:

Copy a series of bytes with a specified limit (Section 5.2.1)

Zero a block of kernel memory (Section 5.2.2)

Zero a block of memory in user space (Section 5.2.3)

Copy data from user address space to kernel address space (Section 5.2.4)

Copy data from kernel address space to user address space (Section 5.2.5)

Move data between user virtual space and system virtual space (Section 5.2.6)

The following sections describe the routines that perform these tasks.

5.2.1 Copying a Series of Bytes with a Specified Limit

To copy a series of bytes with a specified limit, call the bcopy routine. The following code fragment shows a call to bcopy:


.
.
.
struct tc_slot  tc_slot[TC_IOSLOTS]; [1]

.
.
.
char *cp; [2]

.
.
.
bcopy(tc_slot[index].modulename, cp, TC_ROMNAMLEN + 1); [3]

.
.
.

[1] Declares an array of tc_slot structures of size TC_IOSLOTS.

[2] Declares a pointer to a buffer that stores the bytes of data that are copied from the first argument.

[3] Shows that the bcopy routine takes three arguments:
- The first argument is a pointer to a byte string (array of characters). In the example, this array is the modulename field of the tc_slot structure for this bus.
- The second argument is a pointer to a buffer that is at least the size specified in the third argument. In the example, this buffer is the one that the cp variable points to.
- The third argument is the number of bytes to be copied. In the example, the number of bytes is the value of the constant TC_ROMNAMLEN plus 1.

Figure 5-6 shows how bcopy copies a series of bytes by using a sample value in the code fragment. As the figure shows, bcopy copies the characters CB to the buffer cp without searching for null bytes. The copy is nondestructive; that is, the address ranges of the first two arguments can overlap.

Figure 5-6: Results of the bcopy Routine
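A nondestructive copy must choose a direction when the ranges overlap. The following user-space sketch, with the hypothetical name sketch_bcopy, keeps bcopy's argument order (source first, destination second) and copies backward when the destination starts above the source. It illustrates the technique and is not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * sketch_bcopy: hypothetical overlap-safe byte copy.
 * Every byte is copied, including null bytes.
 */
void sketch_bcopy(const void *src, void *dst, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination below source: copying front to back never
         * overwrites a source byte before it has been read. */
        while (len-- > 0)
            *d++ = *s++;
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination above source: copy back to front instead. */
        d += len;
        s += len;
        while (len-- > 0)
            *--d = *--s;
    }
    /* Equal addresses: nothing to do. */
}
```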

5.2.2 Zeroing a Block of Memory in Kernel Address Space

To zero a block of memory in kernel address space, call the bzero routine. The following code fragment shows a call to bzero:


.
.
.
struct bus *new_bus;

.
.
.
bzero(new_bus, sizeof(struct bus)); [1]

.
.
.

[1] Shows that the bzero routine takes two arguments:
- The first argument is a kernel address at which the zeroing operation starts. In the example, the first argument is a pointer to a bus structure.
- The second argument is the number of bytes to be zeroed. In the example, this size is expressed through the use of the sizeof operator, which returns the size of a bus structure.

In the example, bzero is used to zero the number of bytes that are associated with the size of the bus structure, starting at the address specified by new_bus.

5.2.3 Zeroing a Block of Memory in User Address Space

To zero a block of memory in user address space, call the uzero routine. The following code fragment shows a call to uzero:


.
.
.
void *user_addr;
size_t cnt;

.
.
.
int err;

.
.
.
if ((err = uzero(user_addr, cnt)) != 0) [1]

.
.
.

[1] Shows that the uzero routine takes two arguments:
- The first argument is a user address at which the zeroing operation starts.
- The second argument is the number of bytes to be zeroed.

In the example, uzero is used to zero cnt bytes starting at address user_addr. It returns the value 0 (zero) upon successful completion. If the address in user address space cannot be accessed, uzero returns the error EFAULT.

5.2.4 Copying Data from User Address Space to Kernel Address Space

To copy data from the unprotected user address space to the protected kernel address space, call the copyin routine. The following code fragment shows a call to copyin:


.
.
.
struct buf *bp;
int err;
void* buff_addr;
void* kern_addr;

.
.
.
if ((err = copyin(buff_addr, kern_addr, bp->b_resid)) != 0) { [1]

.
.
.

[1] Shows that the copyin routine takes three arguments:
- The first argument specifies the address in user space of the data to be copied. In the example, this address is the user buffer's address.
- The second argument specifies the address in kernel space to which to copy the data. In the example, this address is the address of the kernel buffer.
- The third argument specifies the number of bytes to copy. In the example, the number of bytes is contained in the b_resid field of the pointer to the buf structure.

The code fragment sets up a condition statement that performs tasks that are based on whether copyin executes successfully. Figure 5-7 shows how copyin copies data from user address space to kernel address space by using sample data.

As Figure 5-7 shows, copyin copies the data from the unprotected user address space (specified by buff_addr) to the protected kernel address space (specified by kern_addr). The b_resid field indicates the number of bytes. The figure also shows that copyin returns the value 0 (zero) upon successful completion. If the address in user address space cannot be accessed, copyin returns the error EFAULT.

Figure 5-7: Results of the copyin Routine
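What distinguishes copyin from an ordinary memory copy is that the user address is validated, and a bad address produces EFAULT instead of a kernel fault. The following user-space model makes that explicit. The struct fake_uspace type and the sketch_copyin function are hypothetical: a real kernel checks page mappings and recovers from faults during the copy rather than comparing against a single range, and the real copyin takes only the three arguments shown above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a process's user address space as one
 * mapped region [base, base + size). */
struct fake_uspace {
    unsigned long  base;   /* lowest valid user address */
    size_t         size;   /* number of mapped bytes */
    unsigned char *mem;    /* storage backing the region */
};

/* Returns 1 if [uaddr, uaddr + len) lies inside the mapped region. */
static int user_range_ok(const struct fake_uspace *us,
                         unsigned long uaddr, size_t len)
{
    if (uaddr < us->base || uaddr - us->base > us->size)
        return 0;
    /* Written this way so that uaddr + len cannot overflow. */
    return len <= us->size - (uaddr - us->base);
}

/* Copies len bytes from user address uaddr to kaddr.
 * Returns 0 on success or EFAULT if the user range is not accessible. */
int sketch_copyin(const struct fake_uspace *us, unsigned long uaddr,
                  void *kaddr, size_t len)
{
    if (!user_range_ok(us, uaddr, len))
        return EFAULT;
    memcpy(kaddr, us->mem + (uaddr - us->base), len);
    return 0;
}
```

A copyout model would perform the same range check on the destination address and copy in the opposite direction.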

5.2.5 Copying Data from Kernel Address Space to User Address Space

To copy data from the protected kernel address space to the unprotected user address space, call the copyout routine. The following code fragment shows a call to copyout:


.
.
.
register struct buf *bp;
int err;
void * buff_addr;
void * kern_addr;

.
.
.
if ((err = copyout(kern_addr, buff_addr, bp->b_resid)) != 0) { [1]

.
.
.

[1] Shows that the copyout routine takes three arguments:
- The first argument specifies the address in kernel space of the data to be copied. In the example, this address is the kernel buffer's address, which is stored in the kern_addr argument.
- The second argument specifies the address in user space to which to copy the data. In the example, this address is the user buffer's virtual address, which is stored in the buff_addr argument.
- The third argument specifies the number of bytes to copy. In the example, the number of bytes is contained in the b_resid field of the pointer to the buf structure.

Figure 5-8 shows the results of copyout, based on the code fragment. As the figure shows, copyout copies the data from the protected kernel address space (specified by kern_addr) to the unprotected user address space (specified by buff_addr). The number of bytes is indicated by the b_resid field. The figure also shows that copyout returns the value 0 (zero) upon successful completion. If the address in kernel address space cannot be accessed or if the number of bytes to copy is invalid, copyout returns the error EFAULT.

Figure 5-8: Results of the copyout Routine

5.2.6 Moving Data Between User Virtual Space and System Virtual Space

To move data between user virtual space and system virtual space, call the uiomove routine. The following code fragment shows a call to uiomove:


.
.
.
struct uio *uio;
void * kern_addr;
int err;
long cnt;

.
.
.
err = uiomove(kern_addr, cnt, uio); [1]

.
.
.

[1] Shows that the uiomove routine takes three arguments:
- The first argument specifies a pointer to the kernel buffer in system virtual space.
- The second argument specifies the number of bytes of data to be moved. In this example, the number of bytes to be moved is stored in the cnt variable.
- The third argument specifies a pointer to a uio structure. This structure describes the current position within a logical user buffer in user virtual space.
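The text does not spell out how uiomove walks the uio structure. The following user-space sketch shows the general technique used by BSD-derived kernels: copy into or out of each user segment (iovec) in turn, advancing the segment and decreasing the residual count. All type and field names carry a sketch_ prefix and are hypothetical simplifications, not the Tru64 declarations; a real uiomove also validates the user addresses it touches.

```c
#include <stddef.h>
#include <string.h>

enum sketch_rw { SKETCH_UIO_READ, SKETCH_UIO_WRITE };

struct sketch_iovec {
    char  *iov_base;   /* next byte of this user segment */
    size_t iov_len;    /* bytes remaining in this segment */
};

struct sketch_uio {
    struct sketch_iovec *uio_iov;   /* current segment */
    int    uio_iovcnt;              /* segments remaining */
    long   uio_offset;              /* logical offset of the transfer */
    size_t uio_resid;               /* bytes remaining to transfer */
    enum sketch_rw uio_rw;          /* READ: kernel -> user; WRITE: user -> kernel */
};

/* Moves up to n bytes between kbuf and the user segments. */
int sketch_uiomove(char *kbuf, size_t n, struct sketch_uio *uio)
{
    while (n > 0 && uio->uio_resid > 0 && uio->uio_iovcnt > 0) {
        struct sketch_iovec *iov = uio->uio_iov;
        size_t cnt = iov->iov_len;

        if (cnt == 0) {              /* segment used up: go to the next one */
            uio->uio_iov++;
            uio->uio_iovcnt--;
            continue;
        }
        if (cnt > n)
            cnt = n;
        if (uio->uio_rw == SKETCH_UIO_READ)
            memcpy(iov->iov_base, kbuf, cnt);   /* kernel -> user */
        else
            memcpy(kbuf, iov->iov_base, cnt);   /* user -> kernel */

        iov->iov_base += cnt;
        iov->iov_len -= cnt;
        uio->uio_resid -= cnt;
        uio->uio_offset += (long)cnt;
        kbuf += cnt;
        n -= cnt;
    }
    return 0;
}
```

Because the uio structure records the current position, repeated calls continue where the previous call stopped.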

5.3 Using Kernel-Related Routines

The kernel-related routines allow kernel modules to:

Print text to the console and error logger (Section 5.3.1)

Put a calling process to sleep (Section 5.3.2)

Wake up a sleeping process (Section 5.3.3)

Initialize a timer (callout) queue element (Section 5.3.4)

Remove the scheduled routine from the timer queues (Section 5.3.5)

Set the interrupt priority mask (Section 5.3.6)

Allocate memory (Section 5.3.7)

The following sections describe the routines that perform these tasks.

5.3.1 Printing Text to the Console and Error Logger

To print text to the console terminal and the error logger, call the printf routine. The kernel printf routine is a scaled-down version of the C library printf routine. The printf routine prints diagnostic information directly on the console terminal and writes ASCII text to the error logger. Because printf is not interrupt driven, all system activities are suspended when you call it. Only a limited number of characters (currently 128) can be sent to the console display during each call to printf because the characters are formatted into a fixed-size buffer whose address may be handed off to the primary CPU for console output. If more than 128 characters are generated in a single call to printf, all characters following the first 128 will be discarded.

If you need to see the results on the console terminal, limit each message sent from within the module to at most 128 characters. However, printf also stores the messages in an error log file, which you can view with the uerf command. See the uerf(8) reference page for information on this command. The messages are easier to read if you use uerf with the -o terse option.

The following code fragment shows a call to this routine:


.
.
.
printf("CBprobe ctlr = %8x\n",ctlr); 

.
.
.

The code example shows a typical use for the printf routine in the debugging of kernel modules. In the example, printf takes two arguments:

The first argument specifies a pointer to a string that contains two types of objects. One type is ordinary characters, such as "hello, world", which are copied to the output stream. The other type is a conversion specification, such as %d. (Supported conversion specifications include %c, %d, %ld, %lx, %o, %s, and %x. See printf(9) for explanations of these specifications.)

The second argument specifies the value to be formatted in place of the %8x specifier in the format string. In this example, the argument is ctlr.
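The 128-character limit follows from formatting into a fixed-size buffer. The following user-space sketch reproduces that behavior with the ISO C vsnprintf, so output beyond the limit is silently dropped. The function name sketch_kprintf_format and the constant SKETCH_PRINTF_MAX are hypothetical; the kernel printf does not expose such an interface.

```c
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_PRINTF_MAX 128   /* per-call limit described in the text */

/*
 * Formats into a fixed buffer of SKETCH_PRINTF_MAX + 1 bytes.
 * Characters past the limit are discarded. Returns the number of
 * characters actually kept.
 */
int sketch_kprintf_format(char out[SKETCH_PRINTF_MAX + 1], const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(out, SKETCH_PRINTF_MAX + 1, fmt, ap);
    va_end(ap);
    if (n < 0)
        return 0;                      /* encoding error: nothing kept */
    return n > SKETCH_PRINTF_MAX ? SKETCH_PRINTF_MAX : n;
}
```

With the format string from the code fragment and a value of 0x1234, the result is 24 characters, well within the limit.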

The operating system also supports the uprintf routine. The uprintf routine prints to the current user's terminal. Never have interrupt service routines call uprintf. Do not use this routine to print verbose messages. The uprintf routine does not log messages to the error logger.

5.3.2 Putting a Calling Process to Sleep

To put a calling process to sleep in a symmetric multiprocessing (SMP) environment, call the mpsleep routine. The mpsleep routine blocks the current kernel thread until a wakeup is issued (see Section 5.3.3).

Generally, kernel modules call this routine to wait for a transfer to complete, which the device signals with an interrupt. That is, the write routine of the kernel module sleeps on the address of a known location, and the device's interrupt service routine wakes the process when the device interrupts. The awakened process then determines whether the condition for which it was sleeping has been removed. The following code fragment shows a call to this routine:


.
.
.
mpsleep((vm_offset_t)&sc->error_recovery_flag, PCATCH,
         "ftaerr", 0, &sc->lk_fta_kern_str,
         MS_LOCK_SIMPLE | MS_LOCK_ON_ERROR); [1]

.
.
.

[1] Calls the mpsleep routine to block the current kernel thread. The mpsleep routine takes several arguments:
- The channel argument specifies an address to associate with the calling kernel thread that is put to sleep. In this example, the address (or event) is the address of the error_recovery_flag field.
- The pri argument specifies whether the sleep request is interruptible. Setting this argument to the PCATCH flag causes the process to sleep in an interruptible state (that is, the kernel thread can take asynchronous signals). Not setting the PCATCH flag causes the process to sleep in an uninterruptible state (that is, the kernel thread cannot take asynchronous signals).
- The wmesg argument specifies the wait message. In this call, which is made from the fta_error_recovery routine, the wait message is the string ftaerr.
- The timo argument specifies the maximum amount of time that the kernel thread should block. If you pass the value 0 (zero), mpsleep assumes there is no timeout.
- The lockp argument specifies a pointer to a simple or complex lock. You pass a simple or complex lock structure pointer if you want to release the lock. Pass the value 0 (zero) if you do not want to release the lock.
- The flags argument specifies the lock type. You can pass the bitwise inclusive OR of the valid lock bits defined in /usr/sys/include/sys/param.h.

5.3.3 Waking Up a Sleeping Process

To wake up all processes that are sleeping on a specified address, call the wakeup routine. The following code fragment shows a call to this routine:


.
.
.
wakeup(&ctlr->bus_name); [1]

.
.
.

[1] Shows that the wakeup routine takes one argument: the address on which the wakeup is to be issued. In the example, this address is the address of the bus_name field, which identifies the bus to which this controller is connected. This address was specified in a previous call to the mpsleep routine. All processes that are sleeping on this address are awakened.

5.3.4 Initializing a Timer (Callout) Queue Element

To initialize a timer queue element, call the timeout routine. The following code fragment shows a call to this routine:


.
.
.
#define NONEIncSec  1

.
.
.
none = &none_unit[unit];

.
.
.
timeout(noneincled, (caddr_t)none, NONEIncSec*hz); [1]

.
.
.

[1] Shows that the timeout routine takes three arguments:
- The first argument specifies a pointer to the routine to be called. In the example, timeout will call the noneincled routine on the interrupt stack (not in processor context) as dispatched from the softclock routine.
- The second argument specifies a single argument to be passed to the called routine. In the example, this argument is the pointer to the NONE device's none_unit data structure. This argument is passed to the noneincled routine. Because the data types of the arguments are different, the code fragment performs a type-casting operation that converts the argument type to be of type caddr_t.
- The third argument specifies the amount of time to delay before calling the specified routine, expressed in ticks of the system clock. The global variable hz contains the number of clock ticks per second, so to delay for a given number of seconds, multiply the number of seconds by hz.
  In the example, the constant NONEIncSec is multiplied by hz to determine the amount of time before timeout calls noneincled. Because NONEIncSec is 1, the example illustrates a 1-second delay.
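The tick arithmetic in the third argument generalizes to any delay. The following minimal sketch assumes only that hz holds the number of clock ticks per second, as described above; the helper names are hypothetical. The millisecond conversion rounds up so that any nonzero delay maps to at least one tick.

```c
/* Hypothetical helpers for converting a delay into clock ticks. */

/* Whole seconds to ticks, as in NONEIncSec * hz. */
long sec_to_ticks(long seconds, long hz)
{
    return seconds * hz;
}

/* Milliseconds to ticks, rounded up. */
long ms_to_ticks(long ms, long hz)
{
    return (ms * hz + 999) / 1000;
}
```

Passing hz as a parameter reflects the fact that the tick rate is platform dependent; kernel code would use the global variable directly.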

5.3.5 Removing Scheduled Routines from the Timer (Callout) Queue

To remove the scheduled routines from the timer queue, call the untimeout routine. The following code fragment shows a call to this routine:


.
.
.
untimeout(noneincled, (caddr_t)none); [1]

.
.
.

[1] Shows that the untimeout routine takes two arguments:
- The first argument specifies a pointer to the routine to be removed from the timer queue. In the example, untimeout removes the noneincled routine from the timer queue. This routine was placed on the timer queue in a previous call to the timeout routine.
- The second argument specifies the argument that was passed to the scheduled routine in the previous call to timeout. In the example, this argument is the pointer to the NONE device's none_unit data structure. Because the data types of the arguments are different, the code fragment performs a type-casting operation that converts the argument type to be of type caddr_t.
The two arguments together uniquely identify which timeout entry to remove. This is useful if more than one thread has called timeout with the same routine argument.
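The following user-space sketch models why both arguments are needed: entries in a small callout table are matched on the routine and the argument together, so two pending entries that share a routine but not an argument can be removed independently. All names are hypothetical, and the fixed array is a simplification, not the Tru64 timer-queue implementation.

```c
#include <stddef.h>

typedef void (*sketch_func_t)(void *);

/* One pending entry; func == NULL marks a free slot. */
struct sketch_callout {
    sketch_func_t func;   /* routine to call when the entry expires */
    void *arg;            /* argument to pass to func */
    long ticks;           /* delay, in clock ticks */
};

#define SKETCH_NCALLOUT 8
static struct sketch_callout sketch_callouts[SKETCH_NCALLOUT];

/* Hypothetical stand-in for a routine such as noneincled. */
void sketch_noop(void *arg)
{
    (void)arg;
}

/* Records (func, arg, ticks); returns 0, or -1 if the table is full. */
int sketch_timeout(sketch_func_t func, void *arg, long ticks)
{
    for (int i = 0; i < SKETCH_NCALLOUT; i++) {
        if (sketch_callouts[i].func == NULL) {
            sketch_callouts[i].func = func;
            sketch_callouts[i].arg = arg;
            sketch_callouts[i].ticks = ticks;
            return 0;
        }
    }
    return -1;
}

/* Removes the first entry matching BOTH func and arg.
 * Returns 1 if an entry was removed, 0 otherwise. */
int sketch_untimeout(sketch_func_t func, void *arg)
{
    for (int i = 0; i < SKETCH_NCALLOUT; i++) {
        if (sketch_callouts[i].func == func &&
            sketch_callouts[i].arg == arg) {
            sketch_callouts[i].func = NULL;
            return 1;
        }
    }
    return 0;
}
```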

5.3.6 Setting the Interrupt Priority Mask

To set the interrupt priority level (IPL) mask to a specified level, call one of the spl routines. Table 5-1 summarizes the uses for the different spl routines.

Table 5-1: Uses for spl Routines

spl Routine	Meaning
`splextreme`	Highest priority; blocks everything except halt interrupts (for example, realtime devices, machine checks, and so forth).
`splrt`	Blocks realtime devices but allows machine checks and halt interrupts.
`splclock`	Masks all hardware clock and lower-level interrupts.
`splhigh`	Masks all interrupts except clock interrupts, realtime devices, machine checks, and halt interrupts.
`spldevhigh`	Masks all device and software interrupts.
`splbio`	Masks all disk and tape controller interrupts.
`splimp`	Masks all LAN hardware interrupts.
`splvm`	Masks all interrupts that affect virtual memory operations.
`splnet`	Masks all network software interrupts.
`splsoftclock`	Masks all software clock interrupts.
`splx`	Resets the CPU priority to the level specified by the argument.
`splnone`	Unmasks (enables) all interrupts.

The spl routines set the CPU priority to various interrupt levels. The current CPU priority level determines which types of interrupts are masked (disabled) and which are unmasked (enabled). Historically, seven levels of interrupts were supported, with eight different spl routines to handle the possible cases. For example, calling spl0 unmasked all interrupts and calling spl7 masked all interrupts. Calling an spl routine between 0 and 7 masked all interrupts at that level and at all lower levels.

Specific interrupt levels were assigned for different device types. For example, before it handled a given interrupt, a kernel module set the CPU priority level to mask all other interrupts of the same level or lower. This setting meant that the kernel module could be interrupted only by interrupt requests from devices of a higher priority.

The operating system currently supports the naming of spl routines to indicate the associated device types. Named spl routines make it easier to determine which routine to use to set the priority level for a given device type.

The following code fragment shows the use of spl routines as part of a disk strategy routine:


.
.
.
int s;

.
.
.
s = splbio(); [1]

.
.
.
/* Code that deals with data that the disk interrupt code can modify */
splx(s); [2]

.
.
.

[1] Calls the splbio routine to mask (disable) all disk interrupts. This routine does not take an argument.

[2] Calls the splx routine to reset the CPU priority to the level that the s argument specifies. The argument associated with splx is a CPU priority level, which in the example is the value that splbio returns. (The splx routine is the only one of the spl routines that takes an argument.) Upon successful completion, each spl routine returns an integer value that represents the CPU priority level that existed before it was changed by a call to the specified spl routine.
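The save-and-restore pattern is what makes nested critical sections safe: each caller restores exactly the level it found. The following user-space sketch models the CPU priority as a single variable. The names and level values are hypothetical; real levels are platform specific, and changing them affects hardware interrupt delivery.

```c
/* Hypothetical priority levels, in increasing order. */
enum { SKETCH_IPL_NONE = 0, SKETCH_IPL_BIO = 4, SKETCH_IPL_HIGH = 6 };

static int sketch_ipl = SKETCH_IPL_NONE;   /* models the current CPU priority */

/* Sets the priority to level and returns the previous level,
 * as the named spl routines do. */
int sketch_splset(int level)
{
    int old = sketch_ipl;

    sketch_ipl = level;
    return old;
}

/* Restores a level previously returned by sketch_splset, like splx. */
void sketch_splx(int s)
{
    sketch_ipl = s;
}

int sketch_current_ipl(void)
{
    return sketch_ipl;
}
```

An inner section raised to SKETCH_IPL_HIGH inside an outer section at SKETCH_IPL_BIO returns to SKETCH_IPL_BIO, not to SKETCH_IPL_NONE, when it calls sketch_splx with its saved value.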

5.3.7 Allocating Memory

A kernel module may need to declare a significant number of data structures to contain a large amount of data. For example, a kernel module that is a device driver may need to support a large number of disks and controllers. Statically allocating the maximum number of data structures wastes space. Dynamically allocating memory for the required data structures is a better use of system resources, especially when working with temporary or transient data.

To dynamically allocate memory, you need to:

Use the MALLOC macro to allocate the data structures

Use the FREE macro to free up the dynamically allocated data structures

The following sections describe these steps.

5.3.7.1 Allocating Data Structures with MALLOC

Use the MALLOC macro to dynamically allocate a variable-size section of kernel virtual memory. The MALLOC macro maintains a pool of preallocated memory for quick allocation and returns the address of the allocated memory. The MALLOC macro is actually a wrapper that calls malloc. Do not allow a kernel module to directly call the malloc routine.

The syntax for the MALLOC macro is as follows:

MALLOC(
       addr,
       cast,
       u_long size,
       int type,
       int flags );

Call the MALLOC macro with the following parameters:

addr

Specifies the memory location that points to the allocated memory. You specify the addr argument's data type in the cast argument.

cast

Specifies the data type of the addr argument and the type of the memory pointer that MALLOC returns.

size

Specifies the size in bytes of the memory to allocate. Typically, you pass the size as a constant to speed up the memory allocation.

type

Specifies the purpose for which the memory is being allocated. The memory types are defined in the file /usr/sys/include/sys/malloc.h. Typically, kernel modules use the constant M_DEVBUF to indicate that kernel module memory is being allocated (or freed).

flags

Specifies one of the following flag constants that are defined in /usr/sys/include/sys/malloc.h:

M_WAITOK: Allocates memory from the virtual memory subsystem if there is not enough memory in the preallocated pool. This constant signifies that MALLOC can block.
M_NOWAIT: Does not allocate memory from the virtual memory subsystem if there is not enough memory in the preallocated pool. This constant signifies that MALLOC cannot block. M_NOWAIT must be used when calling MALLOC from an interrupt context or if the caller is holding a simple lock. Otherwise, a system panic will occur.
M_ZERO: Allocates zero-filled memory. You pass this bit value OR'd with M_WAITOK or M_NOWAIT.

The following example illustrates how to allocate memory using the MALLOC macro:

struct foo *foo1;
struct foo *foo2;
struct bar **bar;    /* array of pointers, allocated below */

.
.
.
MALLOC(foo1, struct foo *, sizeof(struct foo),
  M_DEVBUF, M_NOWAIT|M_ZERO);[1]
 
 if (!foo1) {

.
.
.
return;[2]
 }

.
.
.
MALLOC(foo2, struct foo *,
     nfoo * sizeof(struct foo), M_DEVBUF,
     M_WAITOK|M_ZERO);[3]

.
.
.
MALLOC(bar, struct bar **,
     nbar * sizeof(struct bar *), M_DEVBUF,
     M_WAITOK|M_ZERO);[4]

.
.
.
MALLOC(bar[1], struct bar *, sizeof(struct bar),
    M_DEVBUF, M_WAITOK|M_ZERO);[5]

[1] Allocates a single data structure.

[2] Because M_NOWAIT is specified, examines the return value to determine whether the allocation failed.

[3] Allocates an array of structures with nfoo elements.

[4] Allocates an array of nbar pointers to structures.

[5] Allocates a single structure and stores its address in the second element of bar (bar[1]).

5.3.7.2 Freeing Up Dynamically Allocated Memory

When a block of memory that was allocated through MALLOC is no longer needed, free it back to the system by using the FREE macro. The FREE macro takes two arguments:

The first argument specifies the memory pointer that points to the allocated memory to be freed. You must have previously set this argument in the call to MALLOC.

The second argument specifies the memory type that was passed to MALLOC when the memory was allocated. The memory types are defined in the file /usr/sys/include/sys/malloc.h. Typically, kernel modules that are device drivers use the constant M_DEVBUF to indicate that memory is being allocated (or freed).

The following example shows how to use the FREE macro:

 FREE(foo1, M_DEVBUF);
 
 /*
  * Free the second element from the array of pointers
  */
 FREE(bar[1], M_DEVBUF);
 bar[1] = NULL;
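The allocation and release pattern in these examples can be sketched in user space, with calloc and free standing in for MALLOC with M_ZERO and for FREE. The SKETCH_MALLOC and SKETCH_FREE macros and the helper functions are hypothetical; they drop the type and flags arguments, and kernel code must use MALLOC and FREE as described above.

```c
#include <stdlib.h>

struct bar { int id; };

/* Hypothetical user-space stand-ins; calloc supplies the zero fill
 * that M_ZERO requests. */
#define SKETCH_MALLOC(addr, cast, size) ((addr) = (cast)calloc(1, (size)))
#define SKETCH_FREE(addr)               free(addr)

/* Allocates an array of nbar pointers and fills element 1.
 * Returns NULL if any allocation fails. */
struct bar **sketch_alloc_bars(size_t nbar)
{
    struct bar **bar;

    SKETCH_MALLOC(bar, struct bar **, nbar * sizeof(struct bar *));
    if (bar == NULL)
        return NULL;
    if (nbar > 1) {
        SKETCH_MALLOC(bar[1], struct bar *, sizeof(struct bar));
        if (bar[1] == NULL) {
            SKETCH_FREE(bar);
            return NULL;
        }
    }
    return bar;
}

/* Frees each element before the pointer array itself. */
void sketch_free_bars(struct bar **bar, size_t nbar)
{
    for (size_t i = 0; i < nbar; i++)
        SKETCH_FREE(bar[i]);   /* free(NULL) is a no-op */
    SKETCH_FREE(bar);
}
```

The elements are freed first because freeing the pointer array first would lose the only copies of the element addresses. The sketch also relies on zero-filled storage reading as null pointers, which holds on common platforms.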

5.4 Working with System Time

This section describes considerations for working with system time. Information in this section explains the following concepts:

Understanding system time concepts (Section 5.4.1)

Fetching time (Section 5.4.2)

Modifying a timestamp (Section 5.4.3)

Enabling an application to convert time to a string (Section 5.4.4)

Delaying a routine a specified number of microseconds (Section 5.4.5)

5.4.1 Understanding System Time Concepts

This section discusses concepts for working with system time:

How a kernel module fetches or modifies time

How time is created

5.4.1.1 How a Kernel Module Uses Time

Kernel modules can save timestamps for many purposes and pass them to applications on request. For example, a kernel module might record:

When a bus was last scanned

When the last error on a disk occurred

When the last interrupt for some device (for example, a line printer) occurred

When the system booted

When the file system was mounted on a particular disk

The application then needs to print the date and time. Your kernel module code must determine several things for each timestamp that it wants to preserve:

When it needs to fetch time

Whether or not the time value that was fetched needs modification to reflect accurate time

How to pass the time value to the application

5.4.1.2 How Is System Time Created?

System time, which is platform-dependent, is measured in ticks of the system clock; the hz variable defines the number of ticks per second. The operating system makes system time available to kernel modules. System time does not start out as the current calendar time of day, because the actual time value does not become available to the operating system until partway through the boot sequence.

From the beginning of a boot sequence to the dispatch point CFG_PT_TOPOLOGY_CONF, the operating system time value is 0 (zero). In Tru64 UNIX, zero is equivalent to January 1, 1970, 00:00:00 UTC. At the dispatch point CFG_PT_TOPOLOGY_CONF, the operating system begins incrementing system time from zero. Later, at the dispatch point CFG_PT_ROOT_FS_AVAIL, system time is set to the actual time of day.

The time between CFG_PT_TOPOLOGY_CONF and CFG_PT_ROOT_FS_AVAIL is called the boot delta. Figure 5-9 illustrates these concepts.

Figure 5-9: When Time Becomes Available During a System Boot

At the start of a boot sequence, the value is 0 (zero).

At CFG_PT_TOPOLOGY_CONF, the kernel starts incrementing time. The initial date and time is recorded as 00:00:00 UTC 1 Jan 1970 (the Epoch).

At CFG_PT_ROOT_FS_AVAIL, the kernel sets the time to the correct calendar date and time.

If your kernel module fetches time before CFG_PT_ROOT_FS_AVAIL is reached, the time value it fetches is incorrect and you will need to modify that timestamp later (see Section 5.4.3).

5.4.2 Fetching System Time

A kernel module decides when to fetch system time. The TIME_READ macro provides a way for your kernel module to fetch the current time. The following code fragment shows how to use this macro in your kernel module:

#include <sys/time.h>[1]

.
.
.
extern struct timeval time;[2]

.
.
.
{
    struct timeval my_time;[3]

.
.
.
    TIME_READ(my_time);[4]
.
.
.
}

Includes the time.h header file. [Return to example]

Declares the global time variable as external. [Return to example]

Declares your own storage for your timestamp. [Return to example]

Fetches the current time and stores it in your own time variable by using the TIME_READ macro. TIME_READ takes one parameter, of type struct timeval, which specifies the memory location in which to store the current time. [Return to example]
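For experimenting outside the kernel, the POSIX gettimeofday function fills a struct timeval in much the same way. The sketch below is a user-space analogue only, not TIME_READ itself; read_time is a hypothetical wrapper. Note that gettimeofday takes a pointer to the structure, whereas the TIME_READ example above names the variable directly.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * User-space analogue of TIME_READ: gettimeofday() stores the current
 * time of day in a caller-supplied struct timeval.
 * Returns 0 on success and -1 on failure.
 */
int read_time(struct timeval *my_time)
{
    return gettimeofday(my_time, NULL);
}
```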

5.4.3 Modifying a Timestamp

If your kernel module fetches time before the operating system sets the current time at CFG_PT_ROOT_FS_AVAIL, you must modify the timestamp you fetched and stored. For example, assume your kernel module keeps track of when it last scanned the bus. Because scanning the bus takes place prior to CFG_PT_ROOT_FS_AVAIL, the fetched time is interpreted as approximately Jan. 1, 1970, 00:00:00. (This is because time was not set to the proper value when you fetched it.) The global variable bootdelta keeps track of how many seconds and microseconds have been counted between the two configuration points.

To modify a timestamp, perform these steps:

Register a callback for CFG_PT_ROOT_FS_AVAIL in your kernel module.

Use the following algorithm to modify the timestamp:
1. Subtract from the current time the number of seconds (tv_sec) and microseconds (tv_usec) that were counted before time was set to the actual time of day (the bootdelta value).
2. Add the number of seconds and microseconds that had been counted when the kernel module fetched time (the stored timestamp).
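Written as a formula, the two steps follow from the fact that the clock read t_now - Δ_boot at CFG_PT_TOPOLOGY_CONF in real time, and the stored timestamp t_fetched counts time elapsed since that point. The symbol names are ours, t_now is assumed to be read in the CFG_PT_ROOT_FS_AVAIL callback, and the numbers are illustrative only, not values from a real system:

```latex
t_{\text{corrected}} = t_{\text{now}} - \Delta_{\text{boot}} + t_{\text{fetched}}

% Illustrative values (seconds):
% t_now = 1000000000.300000,  Delta_boot = 20.500000,  t_fetched = 5.100000
t_{\text{corrected}} = 1000000000.300000 - 20.500000 + 5.100000 = 999999984.900000
```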

The following code example subtracts bootdelta from the current time and adds the my_time timestamp, handling the microsecond borrow and carry:

#include <sys/time.h>

.
.
.
extern struct timeval bootdelta;

.
.
.
struct timeval temp_time;
TIME_READ(temp_time);[1]

.
.
.
temp_time.tv_sec -= (bootdelta.tv_sec - my_time.tv_sec);[2]

if (bootdelta.tv_usec > temp_time.tv_usec) {
    temp_time.tv_usec = 1000000 -
        (bootdelta.tv_usec - temp_time.tv_usec);
    temp_time.tv_sec--;
} else {
    temp_time.tv_usec -= bootdelta.tv_usec;[3]
}

.
.
.
temp_time.tv_usec += my_time.tv_usec;[4]

if (temp_time.tv_usec >= 1000000) {
    temp_time.tv_usec -= 1000000;
    temp_time.tv_sec++;[5]
}

.
.
.
my_time = temp_time;[6]

Obtains the current time, which is set to the actual time of day. [Return to example]

Subtracts bootdelta seconds from the current time and adds the number of seconds in the timestamp. [Return to example]

Subtracts bootdelta microseconds, borrowing one second if necessary so that the microseconds value does not become negative. [Return to example]

Adds my_time microseconds. [Return to example]

Carries any microsecond overflow (1000000 or more) into the seconds field. [Return to example]

Stores the corrected result in my_time. [Return to example]
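The same correction can be packaged as a pure function, which makes the borrow and carry logic easy to test in user space. This is a sketch, assuming every input is already normalized (0 <= tv_usec < 1000000); adjust_timestamp and USEC_PER_SEC are hypothetical names, and in the kernel the now argument would come from TIME_READ.

```c
#include <sys/time.h>

#define USEC_PER_SEC 1000000L

/*
 * Returns now - bootdelta + fetched, keeping tv_usec in [0, USEC_PER_SEC).
 * Mirrors the fragment above step by step.
 */
struct timeval adjust_timestamp(struct timeval now,
                                struct timeval bootdelta,
                                struct timeval fetched)
{
    struct timeval t = now;

    /* Seconds: subtract bootdelta and add the stored timestamp. */
    t.tv_sec -= (bootdelta.tv_sec - fetched.tv_sec);

    /* Microseconds: subtract bootdelta, borrowing a second if needed. */
    if (bootdelta.tv_usec > t.tv_usec) {
        t.tv_usec = USEC_PER_SEC - (bootdelta.tv_usec - t.tv_usec);
        t.tv_sec--;
    } else {
        t.tv_usec -= bootdelta.tv_usec;
    }

    /* Add the stored microseconds, carrying into seconds on overflow. */
    t.tv_usec += fetched.tv_usec;
    if (t.tv_usec >= USEC_PER_SEC) {
        t.tv_usec -= USEC_PER_SEC;
        t.tv_sec++;
    }
    return t;
}
```

For example, with now = {1000000000, 300000}, bootdelta = {20, 500000}, and fetched = {5, 100000}, the function takes the borrow branch and returns {999999984, 900000}.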

5.4.4 Enabling Applications to Convert a Kernel Timestamp to a String

A user application can receive a timestamp from a kernel module in a variety of ways. The standard way is for a kernel module to pass a timestamp to the application as a struct timeval.

To convert the timestamp that it received from the kernel module, an application uses the ctime function that is defined in /usr/include/sys/time.h. Because ctime takes a pointer to a time_t value, the application passes it the seconds (tv_sec) field of the struct timeval, copied into a time_t variable if necessary; the microseconds field is not used. The ctime function and the related routines described on its reference page convert time values between tm structures, time_t variables, and strings.

A time_t variable, whose type is also defined in /usr/include/sys/time.h, contains the number of seconds since the Epoch, 00:00:00 UTC 1 Jan 1970. The ctime function converts the time_t variable to which its timer parameter points into a string with five fields, for example:

Tue Jul 11 15:37:29 2000

For more information on converting timestamps to strings, see the ctime(3) reference page.
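On the application side, the conversion amounts to copying tv_sec into a time_t and calling ctime. The sketch below assumes the application already holds the struct timeval (how it obtains the value from the kernel module is outside its scope); timeval_to_string is a hypothetical helper. ctime returns a pointer to static storage that later calls overwrite, and the string ends with a newline.

```c
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/*
 * Converts a struct timeval received from a kernel module into the
 * five-field ctime() string, for example "Tue Jul 11 15:37:29 2000\n".
 * tv_usec is discarded because ctime() works in whole seconds.
 */
const char *timeval_to_string(const struct timeval *tv)
{
    time_t secs = (time_t)tv->tv_sec;   /* ctime() needs a time_t pointer */
    return ctime(&secs);                /* NULL if the value cannot be converted */
}
```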

5.4.5 Delaying the Calling Routine a Specified Number of Microseconds

To delay the calling routine a specified number of microseconds, use the DELAY macro. The following code fragment shows how to use this macro:


.
.
.
DELAY(10000);[1]

.
.
.

Shows that the DELAY macro takes one argument: the number of microseconds for the calling thread to spin.
DELAY spins (busy-waits) until the specified number of microseconds has passed and then continues execution. The example shows a 10000-microsecond (10-millisecond) delay. The range of delays is system dependent because it is tied to the granularity of the system clock. The system defines the number of clock ticks per second in the hz variable, so one tick lasts 1/hz seconds, or 1000000/hz microseconds. Specifying to the DELAY macro any value smaller than one tick results in an unpredictable delay. For any delay value, the actual delay may vary by plus or minus one clock tick.
We do not recommend using the DELAY macro because the processor will be consumed for the specified time interval and therefore will be unavailable to service other threads. In cases where kernel modules need timing mechanisms, use the sleep and timeout routines instead of the DELAY macro. The most common usage of the DELAY macro is in the system boot path. Using DELAY in the boot timeline is often acceptable because there are no other threads in contention for the processor. [Return to example]
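The granularity rule is simple arithmetic on hz. The sketch below passes hz in as a parameter so that it can run anywhere; in a kernel module the value comes from the system's hz variable. The function names are hypothetical, and the integer division truncates the tick length (for example, hz = 1024 gives 976 rather than 976.5625 microseconds).

```c
#include <stdbool.h>

/* Length of one clock tick in microseconds: 1000000 / hz (truncated). */
long tick_usecs(long hz)
{
    return 1000000L / hz;
}

/* A delay shorter than one tick is unpredictable. */
bool delay_is_predictable(long usecs, long hz)
{
    return usecs >= tick_usecs(hz);
}

/* The actual delay may be off by one tick in either direction. */
void delay_bounds(long usecs, long hz, long *min_usecs, long *max_usecs)
{
    long tick = tick_usecs(hz);
    *min_usecs = usecs - tick;
    *max_usecs = usecs + tick;
}
```

With a hypothetical hz of 1000, the DELAY(10000) call in the example is well above one tick (1000 microseconds) and should take between 9000 and 11000 microseconds.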

5.5 Using Kernel Threads

A kernel thread is a single sequential flow of control within a kernel module or other systems-based program. The kernel module or other systems-based program uses kernel thread routines (instead of a threads library package such as the POSIX Threads Library) to start, terminate, and delete threads, and to perform other kernel thread operations.

Kernel threads execute within (and share) a single address space. Therefore, kernel threads can read and write the same memory locations.

You use kernel threads to improve the performance of a kernel module. Multiple kernel threads are useful in a multiprocessor environment, where kernel threads run concurrently on separate CPUs. However, multiple kernel threads also improve kernel module performance on single-processor systems by permitting the overlap of input, output, or other slow operations with computational operations.

Kernel threads allow kernel modules to perform other useful work while waiting for a device to produce its next event, such as the completion of a disk transfer or the receipt of a packet from the network. For more information on using kernel threads, see Chapter 9.

5.6 Using Locks

In a single-processor environment, kernel modules need not protect the integrity of a resource from activities that result from the actions of another CPU. However, in a symmetric multiprocessing (SMP) environment, the kernel module must protect (lock) the resource against simultaneous access by multiple CPUs to prevent corruption. A resource, from the kernel module's standpoint, is data that more than one kernel thread can manipulate. Locks are the mechanism for sharing resources in an SMP environment.
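To make the idea of a lock concrete, the following user-space sketch builds a spin lock from the C11 atomic_flag type and uses it to serialize updates to a shared counter. It illustrates the concept only; it is not the Tru64 UNIX simple lock or complex lock interface, and every name in it (struct spinlock, spin_lock, counter_increment, and so on) is hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical spin lock: the flag is set while some thread holds the lock. */
struct spinlock {
    atomic_flag held;
};

void spin_lock_init(struct spinlock *l)
{
    atomic_flag_clear(&l->held);
}

void spin_lock(struct spinlock *l)
{
    /* test_and_set returns the previous value: spin while it was already set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* A shared resource: data that more than one thread may manipulate. */
struct shared_counter {
    struct spinlock lock;
    long value;
};

void counter_increment(struct shared_counter *c)
{
    spin_lock(&c->lock);
    c->value++;             /* only one thread at a time executes this update */
    spin_unlock(&c->lock);
}
```

Like DELAY, a spin lock keeps the CPU busy while it waits, so it suits only short critical sections.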

See Chapter 6 for an overview of symmetric multiprocessing and the two locking methods that you can use when your kernel modules execute in an SMP environment. Chapter 7 provides information for using simple locks in your kernel module. Chapter 8 provides information for using complex locks.