10    Managing Network Performance

This chapter describes how to manage Tru64 UNIX network subsystem performance. The following sections describe these tasks:

10.1    Gathering Network Information

Table 10-1 describes the commands you can use to obtain information about network operations.

Table 10-1:  Network Monitoring Tools

Name Use Description

netstat

Displays network statistics (Section 10.1.1)

Displays a list of active sockets for each protocol, information about network routes, and cumulative statistics for network interfaces, including the number of incoming and outgoing packets and packet collisions. Also, displays information about memory used for network operations.

traceroute

Displays the packet route to a network host

Tracks the route network packets follow from gateway to gateway. See traceroute(8) for more information.

ping

Determines if a system can be reached on the network

Sends an Internet Control Message Protocol (ICMP) echo request to a host in order to determine if a host is running and reachable, and to determine if an IP router is reachable. Enables you to isolate network problems, such as direct and indirect routing problems. See ping(8) for more information.

sobacklog_hiwat attribute

Reports the maximum number of pending requests to any server socket (Section 10.1.2)

Allows you to display the maximum number of pending requests to any server socket in the system.

sobacklog_drops attribute

Reports the number of backlog drops that exceed a socket backlog limit (Section 10.1.2)

Allows you to display the number of times the system dropped a received SYN packet, because the number of queued SYN_RCVD connections for a socket equaled the socket backlog limit.

somaxconn_drops attribute

Reports the number of drops that exceed the value of the somaxconn attribute (Section 10.1.2)

Allows you to display the number of times the system dropped a received SYN packet because the number of queued SYN_RCVD connections for a socket equaled the upper limit on the backlog length (somaxconn attribute).

tcpdump

Monitors network interface packets

Monitors and displays packet headers on a network interface. You can specify the interface on which to listen, the direction of the packet transfer, or the type of protocol traffic to display.

The tcpdump command allows you to monitor the network traffic associated with a particular network service and to identify the source of a packet. It lets you determine whether requests are being received and acknowledged and, in the case of slow network performance, identify the source of network requests.

Your kernel must be configured with the packetfilter option to use the command. See tcpdump(8) and packetfilter(7) for more information.

The following sections describe some of these commands in detail.

10.1.1    Monitoring Network Statistics by Using the netstat Command

To check network statistics, use the netstat command. Some problems to look for are as follows:

Most of the information provided by netstat is used to diagnose network hardware or software failures, not to analyze tuning opportunities. See the Network Administration manual for more information on how to diagnose failures.

The following example shows the output produced by the netstat -i command:

# /usr/sbin/netstat -i
Name  Mtu   Network     Address         Ipkts Ierrs    Opkts Oerrs  Coll
ln0   1500  DLI         none           133194     2    23632     4  4881
ln0   1500  <Link>                     133194     2    23632     4  4881
ln0   1500  red-net     node1          133194     2    23632     4  4881
sl0*  296   <Link>                          0     0        0     0     0
sl1*  296   <Link>                          0     0        0     0     0
lo0   1536  <Link>                        580     0      580     0     0
lo0   1536  loop        localhost         580     0      580     0     0
 

Use the following netstat command to determine the causes of the input errors (Ierrs) and output errors (Oerrs) shown in the preceding example:

# /usr/sbin/netstat -is
 
ln0 Ethernet counters at Fri Jan 14 16:57:36 1998
 
        4112 seconds since last zeroed
    30307093 bytes received
     3722308 bytes sent
      133245 data blocks received
       23643 data blocks sent
    14956647 multicast bytes received
      102675 multicast blocks received
       18066 multicast bytes sent
         309 multicast blocks sent
        3446 blocks sent, initially deferred
        1130 blocks sent, single collision
        1876 blocks sent, multiple collisions
           4 send failures, reasons include:
                Excessive collisions
           0 collision detect check failure
           2 receive failures, reasons include:
                Block check error
                Framing Error
           0 unrecognized frame destination
           0 data overruns
           0 system buffer unavailable
           0 user buffer unavailable

The netstat -s command displays the following statistics for each protocol:

# /usr/sbin/netstat -s
ip:
        67673 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        8616 fragments received
        0 fragments dropped (dup or out of space)
        5 fragments dropped after timeout
        0 packets forwarded
        8 packets not forwardable
        0 redirects sent
icmp:
        27 calls to icmp_error
        0 errors not generated  old message was icmp
        Output histogram:
                echo reply: 8
                destination unreachable: 27
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                echo reply: 1
                destination unreachable: 4
                echo: 8
        8 message responses generated
igmp:
        365 messages received
        0 messages received with too few bytes
        0 messages received with bad checksum
        365 membership queries received
        0 membership queries received with invalid field(s)
        0 membership reports received
        0 membership reports received with invalid field(s)
        0 membership reports received for groups to which we belong
        0 membership reports sent
tcp:
        11219 packets sent
                7265 data packets (139886 bytes)
                4 data packets (15 bytes) retransmitted
                3353 ack-only packets (2842 delayed)
                0 URG only packets
                14 window probe packets
                526 window update packets
                57 control packets
        12158 packets received
                7206 acks (for 139930 bytes)
                32 duplicate acks
                0 acks for unsent data
                8815 packets (1612505 bytes) received in-sequence
                432 completely duplicate packets (435 bytes)
                0 packets with some dup. data (0 bytes duped)
                14 out-of-order packets (0 bytes)
                1 packet (0 bytes) of data after window
                0 window probes
                1 window update packet
                5 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
        19 connection requests
        25 connection accepts
        44 connections established (including accepts)
        47 connections closed (including 0 drops)
        3 embryonic connections dropped
        7217 segments updated rtt (of 7222 attempts)
        4 retransmit timeouts
                0 connections dropped by rexmit timeout
        0 persist timeouts
        0 keepalive timeouts
                0 keepalive probes sent
                0 connections dropped by keepalive
udp:
        12003 packets sent
        48193 packets received
        0 incomplete headers
        0 bad data length fields
        0 bad checksums
        0 full sockets
        12943 for no port (12916 broadcasts, 0 multicasts)
 

See netstat(1) for information about the output produced by the various command options.

10.1.2    Checking Socket Listen Queue Statistics by Using the sysconfig Command

You can determine whether you need to increase the socket listen queue limit by using the sysconfig -q socket command to display the values of the following attributes:

It is recommended that you set the value of the sominconn attribute equal to the value of the somaxconn attribute. If the values are equal, the value of somaxconn_drops will be the same as the value of sobacklog_drops.

However, if the value of the sominconn attribute is 0 (the default), and one or more server applications uses an inadequate value for the backlog argument to its listen system call, the value of sobacklog_drops may increase faster than the value of somaxconn_drops. If this occurs, you may want to increase the value of the sominconn attribute.
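For example, you can display all three counters with a single query. The command form is a sketch; the output values shown are illustrative only, not from a real system:

```shell
# sysconfig -q socket sobacklog_hiwat sobacklog_drops somaxconn_drops
socket:
sobacklog_hiwat = 8
sobacklog_drops = 0
somaxconn_drops = 0
```

Nonzero drop counters that grow over time indicate that the listen queue limits are too low for the current load.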

See Section 10.2.3 for information on tuning socket listen queue limits.

10.2    Tuning the Network Subsystem

Most resources used by the network subsystem are allocated and adjusted dynamically; however, there are some tuning recommendations that you can use to improve performance, particularly with systems that are Internet servers.

Network performance is affected when the supply of resources is unable to keep up with the demand for resources. The following two conditions can cause this congestion to occur:

Neither of these problems is a network tuning issue. In the case of a problem on the network, you must isolate and eliminate the problem. In the case of high network traffic (for example, the hit rate on a Web server has reached its maximum value while the system is 100 percent busy), you must redesign the network and redistribute the load, reduce the number of network clients, or increase the number of systems handling the network load. See the Network Programmer's Guide and the Network Administration manual for information on how to resolve network problems.

Table 10-2 lists network subsystem tuning guidelines and performance benefits as well as tradeoffs.

Table 10-2:  Network Tuning Guidelines

Action Performance Benefit Tradeoff
Increase the size of the hash table that the kernel uses to look up TCP control blocks (Section 10.2.1) Improves the TCP control block lookup rate and increases the raw connection rate Slightly increases the amount of wired memory
Increase the number of TCP hash tables (Section 10.2.2) Reduces head lock contention for SMP systems Slightly increases the amount of wired memory
Increase the limits for partial TCP connections on the socket listen queue (Section 10.2.3) Improves throughput and response time on systems that handle a large number of connections Consumes memory when pending connections are retained in the queue
Increase the number of outgoing connection ports (Section 10.2.4) Allows more simultaneous outgoing connections None
Modify the range of outgoing connection ports (Section 10.2.5) Allows you to use ports from a specific range None
Enable TCP keepalive functionality (Section 10.2.6) Enables inactive socket connections to time out None
Increase the size of the kernel interface alias table (Section 10.2.7) Improves the IP address lookup rate for systems that serve many domain names Slightly increases the amount of wired memory
Make partial TCP connections time out more quickly (Section 10.2.8) Prevents clients from overfilling the socket listen queue A short time limit may cause viable connections to break prematurely
Make the TCP connection context time out more quickly at the end of the connection (Section 10.2.9) Frees connection resources sooner Reducing the timeout limit increases the potential for data corruption, so guideline should be applied with caution
Reduce the TCP retransmission rate (Section 10.2.10) Prevents premature retransmissions and decreases congestion A long retransmit time is not appropriate for all configurations
Enable the immediate acknowledgement of TCP data (Section 10.2.11) Can improve network performance for some connections May adversely affect network bandwidth
Increase the TCP maximum segment size (Section 10.2.12) Allows sending more data per packet May result in fragmentation at router boundary
Increase the size of the transmit and receive socket buffers (Section 10.2.13) Buffers more TCP packets per socket May decrease available memory when the buffer space is being used
Increase the size of the transmit and receive buffers for a UDP socket (Section 10.2.14) Helps to prevent dropping UDP packets May decrease available memory when the buffer space is being used
Allocate sufficient memory to the UBC (Section 10.2.15) Improves disk I/O performance May decrease the physical memory available to the virtual memory subsystem
Disable the use of a PMTU (Section 10.2.16) Improves the efficiency of servers that handle remote traffic from many clients May reduce server efficiency for LAN traffic
Increase the size of the ARP table (Section 10.2.17) May improve network performance on a system that is simultaneously connected to many nodes on the same LAN Consumes memory resources
Increase the maximum size of a socket buffer (Section 10.2.18) Allows large socket buffer sizes Consumes memory resources
Increase the number of IP input queues (Section 10.2.19) Reduces IP input queue lock contention for SMP systems None
Prevent dropped input packets (Section 10.2.20) Allows high network loads None
Enable mbuf cluster compression (Section 10.2.21) Improves efficiency of network memory allocation None
Modify the NetRAIN retry limit (Section 10.2.22) Controls the time to detect an interface failure Aggressive monitoring can increase CPU usage
Modify the NetRAIN monitoring timer (Section 10.2.23) Controls the time to detect an interface failure Aggressive monitoring can increase CPU usage

The following sections describe these tuning guidelines in detail.

10.2.1    Improving the Lookup Rate for TCP Control Blocks

You can modify the size of the hash table that the kernel uses to look up Transmission Control Protocol (TCP) control blocks. The inet subsystem attribute tcbhashsize specifies the number of hash buckets in the kernel TCP connection table (the number of buckets in the inpcb hash table). The default value is 512.

The kernel must look up the connection block for every TCP packet it receives, so increasing the size of the table can speed the search and improve performance.

For Internet, Web, proxy, firewall, and gateway servers, set the tcbhashsize attribute to 16384.
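Because the attribute sizes a table that is built at boot time, the new value is normally recorded in /etc/sysconfigtab so that it takes effect at the next reboot. The following stanza is a sketch, assuming the standard sysconfigtab stanza format:

```shell
inet:
    tcbhashsize = 16384
```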

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.2    Increasing the Number of TCP Hash Tables

If you have an SMP system, you may be able to reduce head lock contention by increasing the number of hash tables that the kernel uses to look up Transmission Control Protocol (TCP) control blocks.

Because the kernel must look up the connection block for every TCP packet it receives, a bottleneck may occur at the TCP hash table in SMP systems. Increasing the number of tables distributes the load and may improve performance.

The inet subsystem attribute tcbhashnum specifies the number of TCP hash tables. For busy Internet server SMP systems, you can increase the value of the tcbhashnum attribute to 16. The minimum and default values are 1; the maximum value is 64.

It is recommended that you make the value of the tcbhashnum attribute the same as the value of the inet subsystem attribute ipqs. See Section 10.2.19 for information.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.3    Tuning the Socket Listen Queue Limits

You may be able to improve performance by increasing the limits for the socket listen queue (only for TCP). The socket subsystem attribute somaxconn specifies the maximum number of pending TCP connections (the socket listen queue limit) for each server socket. If the listen queue connection limit is too small, incoming connect requests may be dropped. Note that pending TCP connections can be caused by lost packets in the Internet or denial of service attacks. The default value of the somaxconn attribute is 1024; the maximum value is 65535.

To improve throughput and response time with fewer drops, you can increase the value of the somaxconn attribute. A busy system running applications that generate a large number of connections may have many pending connections. For Internet, Web, proxy, firewall, and gateway servers, set the value of the somaxconn attribute to the maximum value of 65535.

The socket subsystem attribute sominconn specifies the minimum number of pending TCP connections (backlog) for each server socket. The attribute controls how many SYN packets can be handled simultaneously before additional requests are discarded. The default value is zero. The value of the sominconn attribute overrides the application-specific backlog value, which may be set too low for some server software.

To improve performance without recompiling an application and for Internet, Web, proxy, firewall, and gateway servers, set the value of the sominconn attribute to the maximum value of 65535. The value of the sominconn attribute should be the same as the value of the somaxconn attribute.

Network performance can degrade if a client saturates a socket listen queue with erroneous TCP SYN packets, effectively blocking other users from the queue. To eliminate this problem, increase the value of the sominconn attribute to 65535. If the system continues to drop incoming SYN packets, you can decrease the value of the inet subsystem attribute tcp_keepinit to 30 (15 seconds).
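For example, you can apply the recommended values at run time by using the sysconfig -r (reconfigure) command. This is a sketch; the change does not persist across reboots unless you also record it in /etc/sysconfigtab:

```shell
# sysconfig -r socket somaxconn=65535 sominconn=65535
```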

See Section 10.1.2 for information about monitoring the sobacklog_hiwat, sobacklog_drops, and somaxconn_drops attributes. If the values show that the queues are overflowing, you may need to increase the socket listen queue limit.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.4    Increasing the Number of Outgoing Connection Ports

When a TCP or UDP application creates an outgoing connection, the kernel dynamically allocates a nonreserved port number for each connection. The kernel selects the port number from a range of values between the value of the inet subsystem attribute ipport_userreserved_min and the value of the ipport_userreserved attribute. Using the default attribute values, the number of simultaneous outgoing connections is limited to 3976 (5000 minus 1024).

If your system requires many outgoing ports, you may want to increase the value of the ipport_userreserved attribute. If your system is a proxy server (for example, a Squid Caching Server or a firewall system) with a load of more than 4000 simultaneous connections, increase the value of the ipport_userreserved attribute to the maximum value of 65000.
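For example, the following command raises the limit to the maximum value recommended above. This is a sketch using the runtime reconfiguration form of sysconfig:

```shell
# sysconfig -r inet ipport_userreserved=65000
```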

Do not reduce the value of the ipport_userreserved attribute to less than 5000 or increase it to more than 65000.

You can also modify the range of outgoing connection ports. See Section 10.2.5 for information.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.5    Modifying the Range of Outgoing Connection Ports

When a TCP or UDP application creates an outgoing connection, the kernel dynamically allocates a nonreserved port number for each connection. The kernel selects the port number from a range of values between the value of the inet subsystem attribute ipport_userreserved_min and the value of the ipport_userreserved attribute. Using the default values for these attributes, the range of outgoing ports starts at 1024 and stops at 5000.

If your system requires outgoing ports from a particular range, you can modify the values of the ipport_userreserved_min and ipport_userreserved attributes. The maximum value of both attributes is 65000. Do not reduce the ipport_userreserved attribute to a value that is less than 5000 or reduce the ipport_userreserved_min attribute to a value that is less than 1024.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.6    Enabling TCP Keepalive Functionality

Keepalive functionality enables the periodic transmission of messages on a connected socket in order to keep connections active. If you enable keepalive, sockets that do not exit cleanly are cleaned up when the keepalive interval expires. If keepalive is not enabled, those sockets will continue to exist until you reboot the system.

Applications enable keepalive for sockets by setting the setsockopt function's SO_KEEPALIVE option. To override programs that do not set keepalive on their own or if you do not have access to the application sources, set the inet subsystem attribute tcp_keepalive_default to 1 in order to enable keepalive for all sockets.
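For example, the following command enables keepalive for all sockets at run time. This is a sketch; add the attribute to /etc/sysconfigtab to make the setting persist across reboots:

```shell
# sysconfig -r inet tcp_keepalive_default=1
```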

If you enable keepalive, you can also configure the following TCP options for each socket:

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.7    Improving the Lookup Rate for IP Addresses

The inet subsystem attribute inifaddr_hsize specifies the number of hash buckets in the kernel interface alias table (in_ifaddr). The default value is 32; the maximum value is 512.

If a system is used as a server for many different server domain names, each of which is bound to a unique IP address, the code that matches arriving packets to the correct server address uses the hash table to speed lookup operations for the IP addresses. Increasing the number of hash buckets in the table can improve performance on systems that use large numbers of aliases.

The value of the inifaddr_hsize attribute is always rounded down to the nearest power of 2. If you are using more than 500 interface IP aliases, specify the maximum value of 512. If you are using fewer than 250 aliases, use the default value of 32.
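The power-of-2 rounding can be illustrated with a small, self-contained shell function. This is a sketch of the arithmetic only; the function name and example values are illustrative, and the function does not query or modify the kernel:

```shell
# Round a bucket count down to the nearest power of 2, mimicking how
# the kernel treats the inifaddr_hsize attribute (illustrative only).
round_down_pow2() {
    n=$1
    p=1
    # Double p until the next doubling would exceed n.
    while [ $((p * 2)) -le "$n" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

round_down_pow2 500   # prints 256, so requesting 500 buckets yields 256
round_down_pow2 512   # prints 512, already a power of 2
```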

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.8    Decreasing the Partial TCP Connection Timeout Limit

The inet subsystem attribute tcp_keepinit specifies the amount of time that a partially established TCP connection remains on the socket listen queue before it times out. The value of the attribute is in units of 0.5 seconds. The default value is 150 units (75 seconds).

Partial connections consume listen queue slots and fill the queue with connections in the SYN_RCVD state. You can make partial connections time out sooner by decreasing the value of the tcp_keepinit attribute. However, do not set the value too low, because you may prematurely break connections associated with clients on network paths that are slow or network paths that lose many packets. Do not set the value to less than 20 units (10 seconds). If you have a 32000 socket queue limit, the default (75 seconds) is usually adequate.

Network performance can degrade if a client overfills a socket listen queue with TCP SYN packets, effectively blocking other users from the queue. To eliminate this problem, increase the value of the sominconn attribute to the maximum value of 65535 (see Section 10.2.3). If the system continues to drop SYN packets, decrease the value of the tcp_keepinit attribute to 30 units (15 seconds).

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.9    Decreasing the TCP Connection Context Timeout Limit

You can make the TCP connection context time out more quickly at the end of a connection. However, this will increase the chance of data corruption.

The TCP protocol includes a concept known as the Maximum Segment Lifetime (MSL). When a TCP connection enters the TIME_WAIT state, it must remain in this state for twice the value of the MSL, or else undetected data errors on future connections can occur. The inet subsystem attribute tcp_msl determines the maximum lifetime of a TCP segment and the timeout value for the TIME_WAIT state.

The value of the attribute is set in units of 0.5 seconds. The default value is 60 units (30 seconds), which means that the TCP connection remains in TIME_WAIT state for 60 seconds (or twice the value of the MSL). In some situations, the default timeout value for the TIME_WAIT state (60 seconds) is too large, so reducing the value of the tcp_msl attribute frees connection resources sooner than the default behavior.

Do not reduce the value of the tcp_msl attribute unless you fully understand the design and behavior of your network and the TCP protocol. It is strongly recommended that you use the default value; otherwise, there is the potential for data corruption.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.10    Decreasing the TCP Retransmission Rate

The inet subsystem attribute tcp_rexmit_interval_min specifies the minimum amount of time before the first TCP retransmission. For some wide area networks (WANs), the default value may be too small, causing premature retransmission timeouts. This may lead to duplicate transmission of packets and the erroneous invocation of the TCP congestion-control algorithms.

The tcp_rexmit_interval_min attribute is specified in units of 0.5 seconds. The default value is 2 units (1 second).

You can increase the value of the tcp_rexmit_interval_min attribute to slow the rate of TCP retransmissions, which decreases congestion and improves performance. However, not every connection needs a long retransmission time. Usually, the default value is adequate. Do not specify a value that is less than 1 unit. Do not change the attribute unless you fully understand TCP algorithms.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.11    Disabling Delaying the Acknowledgment of TCP Data

The value of the inet subsystem attribute tcpnodelack determines whether the system delays acknowledging TCP data. The default is 0, which delays the acknowledgment of TCP data. Usually, the default is adequate. However, for some connections (for example, loopback), the delay can degrade performance.

You may be able to improve network performance by setting the value of the tcpnodelack attribute to 1, which disables the acknowledgment delay. However, this may adversely impact network bandwidth. Use the tcpdump command to check for excessive delays.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.12    Increasing the Maximum TCP Segment Size

The inet subsystem attribute tcp_mssdflt specifies the TCP maximum segment size; the default value is 536. You can increase the value to 1460. This allows more data to be sent per packet, but may cause fragmentation at the router boundary.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.13    Increasing the Transmit and Receive Socket Buffers

The inet subsystem attribute tcp_sendspace specifies the default transmit buffer size for a TCP socket. The tcp_recvspace attribute specifies the default receive buffer size for a TCP socket. The default value of both attributes is 32 KB. You can increase the value of these attributes to 60 KB. This allows you to buffer more TCP packets per socket. However, increasing the values uses more memory when the buffers are being used by an application (sending or receiving data).

You may want to increase the maximum size of a socket buffer before you increase the transmit and receive buffers. See Section 10.2.18 for information.
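For example, the following command sets both buffers to 60 KB (61440 bytes) at run time. This is a sketch using the runtime reconfiguration form of sysconfig:

```shell
# sysconfig -r inet tcp_sendspace=61440 tcp_recvspace=61440
```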

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.14    Increasing the Transmit and Receive Buffers for a UDP Socket

The inet subsystem attribute udp_sendspace specifies the default transmit buffer size for an Internet User Datagram Protocol (UDP) socket; the default value is 9 KB. The inet subsystem attribute udp_recvspace specifies the default receive buffer size for a UDP socket; the default value is 40 KB. You can increase the values of these attributes to 64 KB. However, increasing the values uses more memory when the buffers are being used by an application (sending or receiving data).

These attributes do not have an impact on the Network File System (NFS).

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.15    Allocating Sufficient Memory to the UBC

You must ensure that sufficient memory is allocated to the Unified Buffer Cache (UBC). Servers that perform a large amount of file I/O (for example, Web and proxy servers) make extensive use of both the UBC and the virtual memory subsystem. In most cases, use the default value of 100 percent for the vm subsystem attribute ubc-maxpercent, which specifies the maximum percentage of physical memory that can be allocated to the UBC. If necessary, you can decrease the value of the attribute in increments of 10 percent.

See Section 9.2.3 for more information about tuning the UBC.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.16    Disabling Use of a PMTU

Packets transmitted between servers are fragmented into units of a specific size in order to ease transmission of the data over routers and small-packet networks, such as Ethernet networks. When the inet subsystem attribute pmtu_enabled is enabled (set to 1, which is the default behavior), the system determines the largest common path maximum transmission unit (PMTU) value between servers and uses it as the unit size. The system also creates a routing table entry for each client network that attempts to connect to the server.

On a server that handles local traffic and some remote traffic, enabling the use of a PMTU can improve bandwidth. However, if a server handles traffic among many remote clients, enabling the use of a PMTU can cause an excessive increase in the size of the kernel routing tables, which can reduce server efficiency.

If an Internet, Web, proxy, firewall, or gateway server has poor performance and the routing table increases to more than 1000 entries, set the value of the pmtu_enabled attribute to 0 to disable the use of the PMTU protocol.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.17    Increasing the Size of the ARP Table

The net subsystem attribute arptab_nb specifies the number of hash buckets in the address resolution protocol (ARP) table (that is, the table's width). The default value is 37.

You can increase the value of the arptab_nb attribute if the ARP table is thrashing, which may improve performance. You can view the ARP table by using the arp -a command or the kdbx arp debugger extension. However, changing the attribute value will not affect performance unless the system is simultaneously connected to many nodes on the same LAN. See the Kernel Debugging manual and kdbx(8) for more information.

You can increase the width of the ARP table by increasing the value of the arptab_nb attribute. In general, a wide ARP table decreases the chance that a search will be needed to match an address to an ARP entry. However, increasing the value of the arptab_nb attribute increases the memory used by the ARP table.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.18    Increasing the Maximum Size of a Socket Buffer

If you require a large socket buffer, increase the maximum socket buffer size by increasing the value of the socket subsystem attribute sb_max before you increase the size of the transmit and receive socket buffers (see Section 10.2.13).

The default maximum socket buffer size is 128 KB.

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.19    Increasing the Number of IP Input Queues

For SMP systems, you may be able to reduce lock contention at the IP input queue by increasing the number of queues and distributing the load. The inet subsystem attribute ipqs specifies the number of IP input queues.

For busy Internet server SMP systems, you may want to increase the value of the ipqs attribute to 16. The default value of the ipqs attribute is 1. The minimum value is 1; the maximum value is 64.

It is recommended that you make the value of the ipqs attribute the same as the value of the inet subsystem attribute tcbhashnum. See Section 10.2.2 for information.
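A sketch of tuning both attributes together on a busy Internet server SMP system, using the values from the guidelines above (the sysconfigtab stanza format and sysconfig syntax are assumed):

```shell
# Boot-time settings in /etc/sysconfigtab (illustrative):
#   inet:
#       ipqs = 16
#       tcbhashnum = 16

# Verify the values after reboot
sysconfig -q inet ipqs tcbhashnum
```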

See Section 4.4 for information about modifying kernel subsystem attributes.

10.2.20    Preventing Dropped Input Packets

If the IP input queue overflows under a heavy network load, input packets may be dropped. To check for dropped packets, examine the ipintrq kernel structure by using dbx. For example:

# dbx -k /vmunix
(dbx) print ipintrq
struct {
    ifq_head = (nil)
    ifq_tail = (nil)
    ifq_len = 0
    ifq_maxlen = 512
    ifq_drops = 0
 .
 .
 .

If the ifq_drops field is not 0, increase the value of the inet subsystem attribute ipqmaxlen; for example, you may want to increase the value to 2000. The default and minimum value is 512; the maximum value is 65535.
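As a sketch, a persistent boot-time setting for the value suggested above might be added with the sysconfigdb utility (the stanza format and sysconfigdb syntax are assumed):

```shell
# Write a stanza for ipqmaxlen to a temporary file
cat > /tmp/ipq.stanza <<'EOF'
inet:
        ipqmaxlen = 2000
EOF

# Merge the stanza into /etc/sysconfigtab
sysconfigdb -a -f /tmp/ipq.stanza inet
```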

See Section 4.4 for information about modifying kernel subsystem attributes.

The ipqmaxlen attribute cannot be tuned at run time; a new value does not take effect until the system is rebooted. However, you can immediately determine the impact of the modification by using dbx to increase the value of the ipintrq.ifq_maxlen kernel variable. For example:

# dbx -k /vmunix
(dbx) patch ipintrq.ifq_maxlen=2000

See Section 4.4.6 for information about using dbx.

10.2.21    Enabling mbuf Cluster Compression

The socket subsystem attribute sbcompress_threshold controls whether mbuf clusters are compressed.

By default, mbuf clusters are not compressed (sbcompress_threshold is set to 0), which can cause proxy servers to consume all the available mbuf clusters. This situation is more likely to occur if you are using FDDI instead of Ethernet.

To enable mbuf cluster compression, modify the default value of the socket subsystem attribute sbcompress_threshold. Packets are copied into existing mbuf clusters if the packet size is less than this value. For proxy servers, specify a value of 600 bytes.
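For a proxy or firewall server, the change might look like this (sysconfig syntax assumed):

```shell
# Compress packets smaller than 600 bytes into existing mbuf clusters
sysconfig -r socket sbcompress_threshold=600

# Confirm the new value
sysconfig -q socket sbcompress_threshold
```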

To determine the memory that is being used for mbuf clusters, use the netstat -m command. The following example is from a firewall server with 128 MB memory that does not have mbuf cluster compression enabled:

# netstat -m
  2521 Kbytes for small data mbufs (peak usage 9462 Kbytes)
 78262 Kbytes for mbuf clusters (peak usage 97924 Kbytes)
  8730 Kbytes for sockets (peak usage 14120 Kbytes)
  9202 Kbytes for protocol control blocks (peak usage 14551 Kbytes)
     2 Kbytes for routing table (peak usage 2 Kbytes)
     2 Kbytes for socket names (peak usage 4 Kbytes)
     4 Kbytes for packet headers (peak usage 32 Kbytes)
 39773 requests for mbufs denied
     0 calls to protocol drain routines
 98727 Kbytes allocated to network

The previous example shows that 39773 requests for mbufs were denied. This indicates a problem, because this value should be 0. The example also shows that 78 MB of memory is allocated to mbuf clusters and that 98 MB of memory is being consumed by the network subsystem.

If you increase the value of the sbcompress_threshold attribute to 600, the memory allocated to the network subsystem immediately decreases to 18 MB, because compression at the kernel socket buffer interface results in a more efficient use of memory.

10.2.22    Modifying the NetRAIN Retry Limit

The netrain subsystem attribute nr_max_retries specifies how many failed tests must occur before a NetRAIN interface is determined to have failed and a backup interface is brought on line. The default value is 4.

Decreasing the default value will cause NetRAIN to be more aggressive about declaring an interface to be failed and forcing an interface failover, but will increase CPU usage. Increasing the default value will cause NetRAIN to be more tolerant of temporary failures, but may result in long failover times.

For ATM LAN Emulation (LANE) interfaces, set the value of the nr_max_retries attribute to 5.
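As a sketch, using the value recommended above for ATM LANE interfaces (sysconfig syntax assumed):

```shell
# Allow five failed tests before a NetRAIN interface failover
sysconfig -r netrain nr_max_retries=5

# Confirm the new value
sysconfig -q netrain nr_max_retries
```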

10.2.23    Modifying the NetRAIN Monitoring Timer

The netrain subsystem attribute netrain_timeout specifies the number of clock ticks between runs of the kernel thread that monitors the health of the network interfaces. All other NetRAIN timers are based on this frequency. The default value is 1000 ticks (1 second).

Decreasing the value of the netrain_timeout attribute causes NetRAIN to monitor network interfaces more aggressively so that it can quickly detect a failed interface, but it will increase CPU usage. Increasing the value will cause NetRAIN to be more tolerant of temporary failures, but may result in long failover times.
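As a sketch (the new interval is illustrative; sysconfig syntax is assumed), you might halve the monitoring interval for faster failure detection, at the cost of additional CPU usage:

```shell
# Query the monitoring interval (default 1000 ticks = 1 second)
sysconfig -q netrain netrain_timeout

# Run the monitoring thread twice per second (illustrative value)
sysconfig -r netrain netrain_timeout=500
```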