SUMMARY
When examining the SQL Server errorlog, you may periodically see the
following types of messages:
bufwait: timeout, BUF_IO, bp 0xd37ab0, pg 0x5e74e, stat 0x1008/0x400003,
obj 0x6d4f8c1b, bpss 0x2069e1f0
writelog: timeout, dbid 2, dbstat2 0x22(0x22), q1/q2 0/1, owner=3,
waittype=0x81
This article explains the meaning of bufwait and writelog timeout messages.
MORE INFORMATION
SQL Server uses the asynchronous I/O calls ReadFile(), ReadFileEx(),
WriteFile(), and WriteFileEx() when performing reads and writes. These
Win32 API calls submit an I/O request to Windows NT Server, and then allow
the application to continue with other operations while waiting for the
operation to complete. At completion, it signals the application that the
write has completed by means of a synchronization event or completion
routine.
Both the bufwait and logwrite messages are actually warnings, and may not
necessarily be a sign of any problem with the computer or SQL Server. If
SQL Server has waited longer than the number of seconds configured in the
'resource timeout' setting (which by default is 10 seconds), it prints an
error message indicating the timeout. However, it must continue waiting on
the I/O request to complete before it can continue. The bufwait message
corresponds to a delay in either reading a page from disk to put in cache,
or flushing a page in cache out to disk. The logwrite message is specific
to reading or writing log records.
SQL Server uses normal thread synchronization methods to guarantee ordered
access to the transaction log. In a writelog timeout message, the database
ID is supplied in the dbid field, as well as the process ID (owner) of the
other connection on which it is waiting. In the sample message above,
notice that the current process was using tempdb (dbid 2), and that it is
waiting on the checkpoint process (owner=3). When investigating the cause
of writelog timeouts, it may be beneficial to periodically query
sysprocesses to gather information about what types of processing are being
performed by the "owner" spid listed in the writelog message.
Heavy rollback activity adds additional log and disk I/O contention, and
such activity has been observed as contributing to writelog timeouts. For
best performance, attempt to minimize the number of transactions that must
be rolled back; as activity on the server increases, using a query timeout
that is too short may further exacerbate the problem by causing unnecessary
rollbacks. Beginning with SQL Server 6.5 Service Pack 2, a new
configuration option, LogLRU buffers, was added that may improve
performance in environments requiring heavy log reads, such as would be
required in processing a rollback or trigger. Note that using this option
reserves space to hold log pages in cache, reducing the amount of data
cache available. See the Readme.txt file included in the service pack for
more information on determining an optimum value for this parameter.
If these messages appear only intermittently, they may not be indicative of
any problem. However, if they appear on a fairly regular basis, it may be a
sign that the disk subsystem is reaching capacity, and that you may benefit
from spreading the I/O load over a larger number of controllers or disks,
to reduce the length of time you have to wait for the operation to
complete.
To determine what load the disk subsystem is handling, you can use
Performance Monitor to watch the disk counters appropriate to your system.
Monitor the %Disk Time counter for LogicalDisk and/or PhysicalDisk that
correspond to your log and data devices; if the value is consistently high,
consider ways to spread the I/O load, and look for indications that the
disk subsystem is reaching its limits. You can perform additional fine
tuning by using the %Disk Read Time and %Disk Write Time counters, which
would allow finer control in spreading the I/O load across the disks and
controllers. A large Disk Queue Length value indicates a large backlog in
I/O requests, and you may want to investigate adding further disks or
additional controllers to spread the I/O load. You can find additional
information about the disk counters in the Windows NT Server documentation,
the Windows NT Server Resource Kit, and the following article in the
Microsoft Knowledge Base:
102020
: How to Monitor Disk Performance with Performance Monitor
While these messages result from delay in performing I/O, several SQL
Server configuration values have a direct impact on the overall I/O
throughput of the system: max async I/O; max lazywrite I/O; logwrite sleep;
and resource timeout.
The 'max async I/O' configuration value determines the maximum number of
outstanding asynchronous I/O requests that SQL Server will allow,
throttling the number of pending requests. When an asynchronous I/O request
is made, Windows NT Server must queue the request, which has an associated
overhead. While asynchronous I/O requests can increase performance when
used appropriately, a limit can be reached at which point further use may
actually begin to degrade system performance. This limit largely depends on
the controller, driver, and associated disk subsystem. If you persistently
receive any of the above errors and have changed this setting from its
default value, consider setting it back to its default value.
The 'max lazywrite I/O' parameter is closely related to the 'max async I/O'
setting, but specifically throttles the number of outstanding requests made
by the Lazywriter process. The Lazywriter process attempts to flush changed
pages from cache to disk so that there is always some free space in data
cache. The amount of space that the process attempts to keep free is
controlled by the 'free buffers' configuration option.
The 'logwrite sleep' configuration setting is an advanced configuration
option that forces an intentional delay before writing log pages to disk.
Because the log pages must be physically committed to disk before data
modifications are made, you generally want these writes to be performed as
quickly as possible. In certain circumstances, you can enhance performance
by intentionally forcing a delay so that multiple processes can "pack" log
records on a single page and thus reduce the number of write operations.
The amount of time used in this intentional delay is counted as part of the
timeout period for the overall write operation, so a writelog timeout may
be artificially induced by increasing the 'logwrite sleep' configuration
setting. Changes to this configuration value should only be made in a
controlled fashion, to ensure that performance is not hindered.
The 'resource timeout' configuration setting controls the overall timeout
on the asynchronous I/O operations. By default, the setting is set to 10
(seconds), so any operation that takes longer than 10 seconds to complete
results in a timeout warning. Setting this value too low may cause timeouts
in an unrealistically short time. If you are frequently seeing bufwait or
writelog timeouts, you can increase this setting to reduce the frequency
with which these errors are generated.
Because SQL Server stresses the I/O subsystem so extensively, it is also
important to verify that you are running with the latest versions of the
drive and controller firmware, as well as the latest driver. Also consult
your hardware manufacturer for any diagnostics they may have for stressing
the disk subsystem.
If the above information does not help resolve the issue, you may want to
consider stressing the system in a manner similar to that done by SQL
Server. To assist you in doing this, a utility called SQLHDTST has been
written. For more information about the utility and locations from which it
can be downloaded, see the following article in the Microsoft Knowledge
Base:
178444 SQL Server Utility Files Available