Before you attempt to tune your system to improve performance, you must fully understand your applications, users, and system environment and you must correctly diagnose the source of your performance problem. This chapter provides information on the major elements of the system environment that must be considered in a performance and tuning analysis:
For more information on all components of the operating system, refer to the manual Technical Overview.
The Alpha architecture contains instructions that can operate directly on 64- and 32-bit data items. It does not contain instructions that operate directly on data items that are smaller than 32 bits. As a result, if a program uses a data item that is smaller than 32 bits, the compilers generate a sequence of instructions to extract the data item from a 32-bit quantity. Thus, it consumes more system resources to access a data item that is less than 32 bits than it does to access a 32-bit or 64-bit data item.
This increase in overhead will not cause a problem if a program uses small data only occasionally. However, if a program uses small data regularly (for example, in the body of a critical loop), this overhead can be significant. For information on how to modify data declarations in your program to avoid this problem, see the Programmer's Guide.
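As a hedged sketch (the specific guidance is in the Programmer's Guide), the following C fragment illustrates the kind of declaration change involved: replacing a sub-32-bit accumulator in a critical loop with a 32-bit one lets the compiler use direct Alpha loads and stores instead of extract/insert instruction sequences.

```c
/*
 * Illustrative sketch only.  On Alpha, the sub-32-bit accumulator forces
 * the compiler to emit extract/insert instruction sequences on every
 * access; the 32-bit version maps directly onto native loads and stores.
 */
#include <stdio.h>

#define N 100000

int main(void)
{
    unsigned short narrow_sum = 0; /* smaller than 32 bits: extra work per access */
    unsigned int   wide_sum   = 0; /* 32 bits: handled directly by the hardware */
    int i;

    for (i = 0; i < N; i++) {
        narrow_sum += (unsigned short)(i & 0x7f); /* costly inside a hot loop */
        wide_sum   += (unsigned int)(i & 0x7f);   /* cheaper equivalent */
    }
    printf("%u %u\n", (unsigned)narrow_sum, wide_sum);
    return 0;
}
```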
The Alpha architecture also affects disk space and memory usage. While the 64-bit architecture benefits applications that would otherwise exhaust the address space in a 32-bit implementation, the Digital UNIX operating system implementation on Alpha systems does result in larger memory and disk space requirements than those associated with operating systems based on a 32-bit architecture. For details on the Alpha architecture, see the Alpha Architecture Reference Manual.
Programs that are being executed by the Digital UNIX operating system are known as processes. Each process runs within a protected virtual address space. The process abstraction is separated into two low-level abstractions, the task and the thread:
The kernel schedules threads. A process priority can be managed by the nice interface or by the real-time interface. The nice interface allows adjustments of priorities within the range 19 through -19, where 19 is the lowest priority. You can adjust real-time priorities on those systems running the real-time kernel by using the sched_setscheduler interface.
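As a minimal sketch (assuming a kernel with the POSIX real-time interfaces and sufficient privilege), a process can place itself under real-time scheduling as follows:

```c
/* Minimal sketch: moving the calling process to a real-time scheduling
 * policy.  Requires the real-time kernel and appropriate privilege. */
#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param param;

    /* Use the lowest real-time priority so we do not starve the system. */
    param.sched_priority = sched_get_priority_min(SCHED_FIFO);

    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) { /* 0 = this process */
        perror("sched_setscheduler");
        return 1;
    }
    return 0;
}
```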
Under the Digital UNIX operating system, most applications will execute as traditional UNIX processes (that is, as a task with a single thread).
Interprocess communication (IPC) is the mechanism that facilitates the exchange of information among processes. The IPC facilities include shared memory, pipes, semaphores, and messages. The IPC facilities are described in Section 1.4.
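For example, the following minimal sketch uses one of these facilities, a System V shared memory segment (the facilities themselves are described in Section 1.4):

```c
/* Minimal sketch: create, attach, use, and remove a System V shared
 * memory segment, one of the IPC facilities listed above. */
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    char *mem;

    if (id == -1)
        return 1;

    mem = (char *)shmat(id, NULL, 0);   /* map the segment into our space */
    if (mem == (char *)-1)
        return 1;

    strcpy(mem, "shared data");         /* visible to any attached process */
    printf("%s\n", mem);

    shmdt(mem);                         /* detach from our address space */
    shmctl(id, IPC_RMID, NULL);         /* mark the segment for removal */
    return 0;
}
```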
The memory management system is responsible for distributing the available main memory space among competing processes and buffers. You have some level of control over the following components of the memory management system:
The Digital UNIX memory management components constantly interact with each other. As a result, a change in one of the components can also affect the other components. The following sections discuss each component in more detail.
The virtual memory subsystem controls the allocation of pages in physical memory and keeps track of the pages that have been paged out. Specifically, the virtual memory subsystem coordinates the allocation of resources for a task among the hardware components shown in Table 1-1 (in the order of fastest to slowest access time).
| Resource | Description |
|----------|-------------|
| CPU cache | Internal instruction and data caches (one of each) that reside on the CPU chip and vary in size up to a maximum of 64KB, depending on processor type. Also includes the instruction and data translation lookaside buffers (ITLBs and DTLBs). |
| Secondary cache | Direct-mapped physical data cache that is external to the CPU but usually resides on the main processor board. Block sizes for the secondary cache vary from 32 bytes to 256 bytes, depending on processor type. The secondary cache ranges in size from 128KB to 8MB. |
| Tertiary cache | Not available on all Alpha CPUs, and typically does not reside on the main processor board. Otherwise, the same as the secondary cache. |
| System memory | The actual physical memory. Size varies from 24MB to 14GB. |
| Swap disk | Block special device. (Avoiding the file system saves overhead.) |
For more information on the CPU, secondary cache, and tertiary cache, see the Alpha Architecture Reference Manual.
Figure 1-1 gives an overview of how instructions and data can be moved among various storage components during the execution of a program.
Much of the movement of addresses and data among the CPU cache, secondary and tertiary cache, and physical memory is controlled by the hardware logic and the Privileged Architecture Library (PAL) code, which is transparent to the Digital UNIX operating system. The virtual memory subsystem becomes involved when the CPU's translation buffer is unable to map a requested virtual address to a physical address and then traps to the PAL's page lookup code, which is responsible for monitoring and loading addresses from the page table into the CPU's translation buffer.
If the requested address is in the page table, the PAL lookup code loads the address into the translation buffer, which in turn passes the address to the CPU. If the address is not in the page table, the PAL code issues a virtual memory fault, which is the virtual memory subsystem's cue to locate the requested page and to load its physical address into the page table for use by the PAL lookup code:
Page-in and copy-on-write page faults are handled by the virtual memory subsystem's paging and swapping mechanism, which is described in Section 1.3.1.1.
The virtual memory subsystem attempts to keep the movement of pages as fast as possible. To do this, it tracks the utilization and the location of all pages in the memory subsystem.
The virtual memory subsystem maintains five lists to perform its tasks.
Each existing page can be found on one of the following lists:
The virtual memory subsystem tries to maintain a reasonable number of pages on the free page list so that pages are available for use by processes. All pages are shared by virtual memory and the UBC. Four configuration attributes in the sysconfigtab file define the size of the free page list and thus control when paging and swapping occur:
See Section 2.2.9 for general information on sysconfigtab configuration attributes.
Figure 1-2 shows the default values of the sysconfigtab configuration attributes that control paging and swapping.
If the number of pages on the free page list falls below the value associated with the vm-page-free-min attribute, the virtual memory subsystem first trims down the size of the UBC until the percentage associated with the ubc-borrowpercent attribute is reached. If this does not satisfy the memory deficit, it then activates two page-stealer routines that reclaim the least recently used pages from the virtual memory system's inactive list and the UBC's LRU list. This process continues until the number of pages on the free page list reaches the value associated with the vm-page-free-target attribute. If necessary, the contents of the reclaimed pages are moved to swap space.
When the free page list reaches this target, the page-stealer routines become dormant again. This procedure enables the virtual memory subsystem to keep the most recently used pages in memory and to move the least recently used pages to swap space, where they can be retrieved if they are needed again.
The value associated with the vm-page-free-reserved attribute specifies the absolute minimum number of pages on the free page list. If the free page list falls below the value associated with the vm-page-free-reserved attribute, only privileged tasks can get memory, thus preventing deadlocks.
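For illustration only (the values below are placeholders, not recommendations; the actual defaults appear in Figure 1-2), these attributes are set in the vm stanza of the /etc/sysconfigtab file:

```
vm:
        vm-page-free-min = 20
        vm-page-free-target = 128
        vm-page-free-reserved = 10
        ubc-borrowpercent = 20
```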
The page-stealer daemon maintains a ratio of one active page to two inactive pages. If the inactive list becomes too small, the page-stealer daemon deactivates the oldest and least recently used pages and moves them to the inactive list.
When the virtual memory subsystem maps an application into memory, it tries to anticipate which pages the task will need next. Using an algorithm that considers which pages were most recently used, the size of the free page list, and other factors, it passes some number of pages to the task in addition to the requested page. This anticipation accelerates the execution of the application by lowering the chance that a page fault will occur.
The virtual memory subsystem also attempts to optimize the utilization of the secondary cache. To do this, it uses a nontunable technique called page coloring. Essentially, it attempts to map the most recently referenced pages of a running task's virtual address space into the secondary cache and to execute the entire task (text and data) out of that cache. If the task fits in the secondary cache, it does not have to fetch from physical memory, and its execution time decreases.
The virtual memory subsystem maintains system-wide counters for all of the physical pages that it manages. The following counters, which can be viewed with the vmstat command, track the overall use of physical memory:
See Section 2.2.3 for additional information on the vmstat command.
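For example, an invocation such as the following reports the counters at five-second intervals (the interval is arbitrary):

```
# vmstat 5
```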
To determine how much memory an application uses, you can use the ps command. The ps aux command displays the virtual address size (VSZ), which is the total amount of virtual memory allocated to the process, and its resident set size (RSS), which is the total amount of physical memory mapped to virtual pages at some point in time. See Section 2.2.1 for additional information on the ps command.
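For example (the application name is a placeholder), the VSZ and RSS columns of the resulting output give the virtual address size and the resident set size, respectively:

```
# ps aux | grep myapp
```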
Figure 1-3 shows the amount of time that it takes to access data and instructions in memory and on disk. It illustrates the impact that excessive paging and swapping could have on the performance of an application.
Paging and swapping move pages between memory and disk to ensure that a task has in memory the pages it needs to run. The virtual memory subsystem controls this activity. It initiates paging and swapping activity under the following circumstances:
To perform a page-in, the virtual memory subsystem allocates a physical page from the free page list, which is a linked list of available pages. When it has the address, the virtual memory subsystem fills the physical page with the contents of the page that it obtained from disk, loads the physical address into the page table, and marks the page as active.
If the number of pages in the free page list falls below the value associated with the vm-page-free-optimal attribute for more than five seconds, the task swapper (an extension of the page reclamation code) is activated. The task swapper thread suspends processes, writes to disk all of the dirtied pages associated with the suspended processes, and places those pages on the free page list.
The task swapper first swaps out all swappable tasks that have been idle for 30 seconds or more. If this does not satisfy the memory demand, it begins swapping out the lowest priority tasks with the largest resident set size, one at a time, until the memory demands are satisfied (that is, until the number of pages on the free page list reaches the value associated with the vm-page-free-target attribute).
From a performance viewpoint, swapping is worse than paging because swapped out processes can experience a long latency that is unsuitable for interactive processes. In addition, swapping can reduce system throughput. However, swapping does move long-sleeping threads out of memory and thus "cleans up" memory.
The virtual memory subsystem prewrites pages to disk under the following circumstances:
To facilitate the movement of data between memory and disk, the virtual memory subsystem uses two types of swap buffers: synchronous and asynchronous.
The virtual memory subsystem uses the two types of swap buffers in order to satisfy the immediate demands of a page-in request without having to wait for the completion of a page out, which is a relatively slow process.
Swap space is allocated in one of two modes: immediate mode or deferred mode. The two strategies differ in when swap space is reserved.
The Digital UNIX operating system's default swap mode is immediate mode. The operating system will reserve swap space for anonymous memory (for example, stack space, heap space, and memory allocated by the malloc or sbrk routines) when that memory is allocated. This results in more swap space being reserved than is probably required. (Note: anonymous memory is any memory that is not backed by a file; it is backed by swap space.)
You can change the swap mode to deferred (or overcommitment) mode. This causes the reservation and allocation of swap space for anonymous memory to be postponed until the physical pages actually need to be reclaimed.
Deferred mode requires less swap space than immediate mode and causes the system to run faster because less swap bookkeeping is required. However, because deferred mode does not reserve swap space in advance, the swap space may not be available when it is needed by a task and the process may be killed asynchronously. You should ensure that you have sufficient swap space if you want to use deferred mode.
Immediate swap mode is used if the /sbin/swapdefault file exists. This file is a symbolic link to /dev/rzxx, which is the first defined swap device. If this file does not exist, the system uses deferred mode. If you change from one mode to another, you must reboot the system to activate the new mode.
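As a hedged sketch (the device name is a placeholder for your first defined swap device), switching modes amounts to creating or removing the symbolic link and then rebooting:

```
# ln -s /dev/rz3b /sbin/swapdefault   # select immediate mode
# rm /sbin/swapdefault                # select deferred mode
# shutdown -r now                     # reboot to activate the new mode
```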
Refer to the manual System Administration for more information on swap space allocation modes.
The Digital UNIX operating system uses a unified buffer cache (UBC) to hold the actual file data, which includes reads and writes from conventional file activity and page faults from mapped file sections. The UBC and the virtual memory subsystem share and compete for all of main memory and utilize the same physical pages. This means that all available physical memory can be used both for buffering I/O and for the address space of the processes.
For AdvFS, the UBC contains file data and metadata. For UFS, the UBC contains only file data, and metadata (for example, file header information, blocks, directories, and inodes) is contained in the metadata buffer cache.
The UBC uses buffers to move data between memory and disk. The vm-ubcbuffers attribute specifies the number of UBC I/O requests that can be outstanding at one time.
The UBC is dynamic, and it can potentially utilize all physical memory; thus the UBC can respond to changing file system demands. You can limit the amount of memory allocated to the UBC:
Changes in relative rates of demand can enlarge or shrink the size of the UBC. Heavy virtual memory activity, such as large increases in the working set caused by large executable files or by large amounts of uninitialized data being accessed, will increase the number of pages reserved for virtual memory and decrease the number reserved for the UBC. Conversely, heavy file system activity will increase the number of pages reserved for the UBC and decrease the number of pages reserved for virtual memory.
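For illustration only (the values are placeholders, not recommendations), the UBC attributes discussed in this section are also set in the vm stanza of /etc/sysconfigtab:

```
vm:
        ubc-maxpercent = 100
        ubc-borrowpercent = 20
        vm-ubcbuffers = 256
```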
Interprocess communication (IPC) is the exchange of information between two or more processes. Some examples of IPC include messages, shared memory, semaphores, pipes, signals, process tracing, and processes communicating with other processes over a network. IPC cuts across several operating system subsystems; elements of it appear in scheduling and networking.
In single-process programming, modules within a single process communicate with each other using global variables and function calls, with data passing between the functions and the callers. When programming using separate processes, with images in separate address spaces, you need to use additional communication mechanisms.
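As a minimal sketch, the simplest of these mechanisms, a pipe, replaces the function-call data flow with an explicit read/write channel between two processes:

```c
/* Minimal sketch: a parent and child exchanging data through a pipe
 * instead of shared global variables. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    char buf[32];

    if (pipe(fd) == -1)
        return 1;

    if (fork() == 0) {              /* child: the writer */
        close(fd[0]);
        write(fd[1], "hello", 6);
        _exit(0);
    }

    close(fd[1]);                   /* parent: the reader */
    read(fd[0], buf, sizeof(buf));
    printf("received: %s\n", buf);
    return 0;
}
```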
The Digital UNIX operating system provides the following facilities for interprocess communication:
The I/O subsystems comprise the software and hardware that perform all reading and writing operations:
The sections that follow describe the various I/O subsystems: disk systems (Section 1.5.1), file systems (Section 1.5.3), and network systems (Section 1.5.4).
The Digital UNIX operating system supports two hardware storage architectures: Small Computer System Interface (SCSI) and Digital Storage Architecture (DSA).
All Alpha systems support SCSI devices. This support is provided through the Common Access Method (CAM) architecture. The CAM architecture defines a software model that is layered, providing hardware independence for SCSI device drivers. In the CAM model, a single SCSI/CAM peripheral driver controls SCSI devices of the same type, for example, direct access devices. This driver communicates with a device on the bus through a defined interface. Using this interface makes a SCSI/CAM peripheral device driver independent of the underlying SCSI Host Bus Adapter.
This hardware independence is achieved by using the Transport (XPT) and SCSI Interface Module (SIM) components of CAM. Because the XPT/SIM interface is defined and standardized, users and third parties can write SCSI/CAM peripheral device drivers for a variety of devices and use existing operating system support for SCSI. The drivers do not contain SCSI HBA dependencies; therefore, they can run on any hardware platform that has an XPT/SIM interface.
The Digital Storage Architecture (DSA) conforms to the Mass Storage Control Protocol (MSCP).
LSM is a disk storage management subsystem that protects against data loss and improves disk I/O performance. It also allows you to perform administrative tasks, such as performance monitoring and online disk reconfiguration.
LSM builds virtual disks, called volumes. A volume is a Digital UNIX special device that contains data used by a file system (UFS or AdvFS), a database, or other application. A volume exists transparently between a physical disk and an application. Under LSM, file system I/O operations are handled at the volume level, not the physical disk level. I/O operations involving a physical disk are handled by LSM.
Duplicate copies of file systems and databases can be set up under LSM. This capability is referred to as mirroring. Mirroring speeds up read operations and protects against data loss from disk malfunctions. (Mirroring can slightly degrade the performance of applications with more write requests than read requests because of the need to perform multiple writes in parallel to multiple disks.)
Striping can also be used under LSM to improve disk I/O performance by spreading the data within a volume across several physical disks.
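The command forms below are assumptions based on typical LSM usage, not prescriptions from this manual; the volume names and sizes are placeholders:

```
# volassist make datavol 2g                  # create a 2GB LSM volume
# volassist make fastvol 2g layout=stripe    # stripe a volume across disks
# volassist mirror datavol                   # add a mirror to an existing volume
```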
The file system architecture for the Digital UNIX operating system is based on the OSF/1 Virtual File System (VFS), which is based on Berkeley 4.3 Reno VFS. VFS provides an abstract layer interface into files regardless of the file systems in which the files reside. Included in VFS is the namei cache, which stores recently used file system pathname/inode number pairs. It also stores inode information for files that were referenced but not found. Having this information in the cache substantially reduces the amount of searching that is needed to perform pathname translations. (See Section 3.6.1.1 for information on VFS tuning.)
Layered below VFS, the Digital UNIX operating system supports the following file systems:
The UFS file system uses the UBC to avoid disk I/O. Because the cache absorbs many requests, the accesses that actually reach the disk may appear random even when the application's access pattern is sequential. The UBC shares all of memory with the virtual memory subsystem and adjusts itself dynamically to accommodate varying I/O loads. As the I/O load increases, the UBC grows up to the limit defined by the ubc-maxpercent attribute. All file system I/O passes through the UBC and is periodically flushed to disk by the update daemon.
Laying out your file system tree across multiple disks can improve performance. The access time tends to be more important than the transfer rate for most workstation, time-share, and server environments. Access time is the seek time plus the rotational delay time, that is, the time the disk takes to access the requested block.
You can modify file system fragment sizes to optimize either I/O performance or disk space usage. Large fragment sizes optimize for I/O performance, and small fragment sizes optimize for disk space usage.
Block clustering is an important factor in UFS performance. Block clustering causes the file system and the UBC to combine multiple small I/O operations into a single larger I/O operation to disk. This results in a dramatic decrease in read/write requests to disk, which reduces kernel overhead. With clustering, I/O can nearly attain raw-device bandwidth for sequential operations.
Clusters are groups of file system blocks in a contiguous sequence. For a standard 8KB/1KB (block size/fragment size) UNIX file system, the default cluster size is 8 blocks (64KB). This is determined by multiplying the default number of blocks (8) by the block size (8192 bytes). You can modify the number of blocks that are combined into a single read request by using either the tunefs or newfs command to establish a new value for maxcontig. You can modify the number of blocks that are combined into a single write request by using dbx to establish a new value for the cluster_maxcontig global variable.
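As a hedged example (the device names are placeholders and the values are illustrative, not recommendations), maxcontig can be set when the file system is created or adjusted afterward, while the write-side variable is patched with dbx:

```
# newfs -b 8192 -f 1024 /dev/rrz3c        # 8KB blocks, 1KB fragments
# tunefs -a 16 /dev/rrz3c                 # combine up to 16 blocks per read
# dbx -k /vmunix /dev/mem
(dbx) patch cluster_maxcontig = 16        # combine up to 16 blocks per write
```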
UFS tries to group contiguous writes into clusters. Individual contiguous block writes are collected into a cluster, which is written asynchronously as a unit either when it reaches its full size or when a discontiguous block is encountered. Specifically, contiguous writes are done in 64KB units, which is the file system block size (8KB) multiplied by the default value of cluster_maxcontig.
UFS uses clusters to make read-ahead more efficient and effective as follows:
See Section 3.6.1.2 for information on how to tune read-ahead and write clusters.
A network provides a means to move data from one computer to another. The data may be as simple as an electronic mail message. You can copy files containing printable data (for example, word processor files) or binary data from a local computer to a remote computer as easily as you copy files from one directory to another on the local computer. With remote login, users can log in to a remote computer on which they have an account and access programs and data as if they were at a terminal connected to their own host computer.
A network consists of two essential components: the hardware implementation and the software that runs the network. The hardware consists of controllers and connectors.
The controller sends and receives packets of data over the network. Controllers are specialized and are designed to work with a particular type of computer (bus architecture). For example, controllers designed to work with a Digital workstation will not work with a Sun or Hewlett-Packard workstation or an IBM PC.
The cables or wires connecting different computers (or nodes) on a network can be twisted-pair (as with telephone wires), thick or thin Ethernet cable, or optical fiber. The type of controller determines the type of connector.
The Digital UNIX operating system supports one network software implementation by default, TCP/IP (Transmission Control Protocol/Internet Protocol). It also supports a variety of other network software implementations as layered products, for example, DECnet, PATHWORKS, and X.25. Each of these network software implementations uses its own set of protocols, which are the rules and formats that are used to conduct communications on a network. Protocols govern relationships among network nodes, polling, the exchange of control information, and the way messages are packaged, addressed, and routed.
The following list provides some general background information on TCP/IP, NFS (Network File System), and UDP (User Datagram Protocol).
NFS allows users to mount remote file systems in their own local directories, thereby giving the appearance of an extension of their local file system. The machine that offers file systems for other machines to access is called the server or file server; the machines that access these file systems by remotely mounting them are called clients.
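For example (the server name and paths are placeholders), a client mounts a remote file system as follows:

```
# mount -t nfs server:/usr/share/man /mnt
```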
NFS, however, is not a network extension of UNIX and does not adhere to UNIX semantics. It does not support all UNIX file system operations, cannot obtain access to remote devices (that is, files and file systems can be operated on, but not the physical devices on which they reside), and does not guarantee atomic operations. It operates independently of the machine and operating system and can be used on non-UNIX machines as well as those running UNIX.
Like TCP, UDP provides a mechanism for user applications to communicate with IP. UDP differs from TCP in that it is a simple protocol that adds no reliability to IP's best-effort delivery. UDP does not guarantee delivery, occasionally generates duplicate data packets, and may deliver data out of order. However, layers above UDP can build reliable services on top of it.
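As a minimal sketch (the address and port are placeholders), a single unreliable datagram is sent as follows; any retransmission or reordering logic belongs to the layer above UDP:

```c
/* Minimal sketch: send one UDP datagram.  Delivery, ordering, and
 * duplicate suppression are not guaranteed. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in dest;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s == -1)
        return 1;

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(7);                      /* echo port, illustrative */
    dest.sin_addr.s_addr = inet_addr("10.0.0.1");  /* placeholder address */

    sendto(s, "ping", 4, 0, (struct sockaddr *)&dest, sizeof(dest));
    close(s);
    return 0;
}
```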
Both the host (client) and remote (server) machines start network daemon processes running when they are booted. Machines that can be reached from the network are listed in a data file with their network addresses. Each local machine knows its own name and network address. As data is sent out over the network, the address and routing information are filled in by the sending network daemon. Network daemons on receiving machines decode the address to determine for whom the message is intended. If the message is intended for the receiving machine, it decodes the message and processes it; otherwise, it does nothing.