This chapter discusses the features of the operating system that support this process, including the following topics:
Enabling, Disabling, and Tuning Memory Trolling (Section 6.2)
Controlling system resource use (Section 6.3)
Memory Troller Messages (Section 6.4)
Interactions with OLAR (Section 6.5)
The operating system handles memory errors with a just-in-time scrubbing model, in which correctable errors are scrubbed when the operating system or an application encounters them. To enhance this capability, a trigger mechanism called the memory troller proactively locates and scrubs correctable memory errors. The memory troller systematically reads each memory location. If it discovers a correctable memory error, it triggers the just-in-time scrubbing mechanism.
Because the memory troller reads all memory available to the operating system, it also might discover uncorrectable memory errors, which would lead to an unrecoverable machine check. To avoid this, the operating system recognizes that the machine check resulted from memory trolling, dismisses the error, and continues normal operation. The memory troller then causes the memory page containing the uncorrectable error to be marked as a bad page. If the bad page is free (or when it becomes free), it is mapped out so that it will not be reused.
6.2 Enabling, Disabling, and Tuning Memory Trolling
For systems supported by the memory troller, use the vm_troll_percent variable to enable, disable, and tune the trolling rate. This parameter is part of the kernel's vm subsystem. The trolling rate is expressed as a percentage of the system's total memory trolled per hour and can be changed at any time.
Valid troll rate settings are as follows:
4 percent (default): This value is used by default if you do not specify any value for vm_troll_percent. At this default rate, each 8-kilobyte memory page is trolled approximately once every 24 hours.
0 (zero): A value of zero disables the memory troller.
1 to 100 percent: The troll rate is set to the specified percentage of memory to troll per hour. For example, a 50 percent troll rate reads half the total memory in one hour. After all memory is read, the troller starts a new pass at the beginning of memory.
Greater than 100 percent: Any value greater than 100 percent invokes one-pass accelerated trolling. All memory is trolled at a rate of approximately 6000 8-kilobyte pages per second, then trolling is disabled. This mode is intended for trolling all memory quickly during off-peak hours. For example, on a GS320 system with 32 processors and 128 gigabytes of memory, one-pass accelerated trolling takes approximately five minutes.
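The settings above can be summarized in a short shell sketch. The function names troll_mode and hours_per_pass, and the labels they print, are hypothetical helpers written for illustration; they are not commands supplied by the operating system. The pass-time arithmetic simply divides 100 by the rate and should be read as an approximation.

```shell
#!/bin/sh
# Sketch only: troll_mode and hours_per_pass are hypothetical helper names,
# not commands supplied by the operating system.

# Report which behavior a given vm_troll_percent value selects.
troll_mode() {
    case $1 in
        ''|*[!0-9]*|0?*) echo "invalid"; return 1 ;;  # not a plain integer
    esac
    if [ "$1" -eq 0 ]; then
        echo "disabled"
    elif [ "$1" -le 100 ]; then
        echo "continuous"
    else
        echo "one-pass accelerated"
    fi
}

# Hours for one full pass over memory at a rate of 1 to 100 percent per hour,
# rounded up to a whole hour.
hours_per_pass() {
    if [ "$(troll_mode "$1")" != "continuous" ]; then
        echo "rate must be 1 to 100" >&2
        return 1
    fi
    echo $(( (100 + $1 - 1) / $1 ))
}

troll_mode 0        # prints: disabled
troll_mode 50       # prints: continuous
hours_per_pass 50   # prints: 2
troll_mode 101      # prints: one-pass accelerated
```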
Enter the following command to display the current value of vm_troll_percent (the troll rate):
# /sbin/sysconfig -q vm vm_troll_percent
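If a script needs the current rate as a bare number, the query output can be filtered. The following is a minimal sketch that assumes the attribute is printed in the form "vm_troll_percent = value"; that output format is an assumption to verify on your system, and current_troll_rate is a hypothetical function name.

```shell
#!/bin/sh
# Sketch only: current_troll_rate is a hypothetical helper.
# ASSUMPTION: the query prints the attribute as "vm_troll_percent = <value>".

# Read query output on standard input and print only the value.
current_troll_rate() {
    awk -F'=' '$1 ~ /^[ \t]*vm_troll_percent[ \t]*$/ {
        gsub(/[ \t]/, "", $2)   # strip blanks around the value
        print $2
    }'
}

# Sample text standing in for real command output.
printf 'vm:\nvm_troll_percent = 4\n' | current_troll_rate   # prints: 4
```

On a live system, the same function would be fed from the command itself, for example by piping the output of /sbin/sysconfig -q vm vm_troll_percent into current_troll_rate.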
You can override the default troll rate by adding the following lines to the /etc/sysconfigtab file:
vm:
    vm_troll_percent=percent_rate
The percent_rate variable is the troll rate as described previously. Use the sysconfigdb command to add entries to the /etc/sysconfigtab file. See sysconfigdb(8) for more information. The new rate takes effect on the next system boot.
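Before handing an entry to sysconfigdb, it can help to generate the stanza text from a validated rate. The sketch below only prints the stanza; make_troll_stanza is a hypothetical function name, and the options needed to load the result are left to sysconfigdb(8).

```shell
#!/bin/sh
# Sketch only: make_troll_stanza is a hypothetical helper that prints a
# vm stanza in the form shown above. It does not modify /etc/sysconfigtab.

make_troll_stanza() {
    case $1 in
        ''|*[!0-9]*)
            echo "rate must be a non-negative integer" >&2
            return 1 ;;
    esac
    # Subsystem name on the first line, then the indented attribute.
    printf 'vm:\n\tvm_troll_percent=%s\n' "$1"
}

make_troll_stanza 10
```

The printed text can be saved to a file and supplied to sysconfigdb as described in its reference page.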
You can enable, disable, or change the troll rate at any time using the following command:
# /sbin/sysconfig -r vm vm_troll_percent=percent_rate
The percent_rate variable is the troll rate as described previously. Only the superuser (root) or a user authorized by division of privileges (DOP) can use this command. See dop(8) for more information on sharing superuser privileges.
6.2.1 Understanding the Configuration Messages
If the memory troller does not support your system, the following error is displayed on your terminal when you attempt to configure the memory troller using /sbin/sysconfig:
vm_configure: Memory Trolling not supported on this system.
Enter the following command to disable trolling:
# /sbin/sysconfig -r vm vm_troll_percent=0
The following warning message is displayed on your terminal when the preceding command is executed:
vm_configure: shutting down memory troller. [WARNING: disabling the memory troller is not recommended on this system.]
This message notifies you that permanently disabling
memory trolling is not recommended.
6.2.2 Configuring Accelerated Trolling
To schedule one-pass accelerated trolling at off-peak hours, follow this procedure:
1. Create a shell script named /usr/local/fast_troll.sh containing the following lines:
#!/sbin/sh
/sbin/sysconfig -r vm vm_troll_percent=101
2. Enter the following commands to set the file owner and permissions of /usr/local/fast_troll.sh:
# chown root /usr/local/fast_troll.sh
# chmod 744 /usr/local/fast_troll.sh
3. Use the cron facility to schedule execution of the shell script as the root user at the desired time. See cron(8) for more information.
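As an illustration of the scheduling step, the following sketch builds a crontab entry that runs the script once a day. The function name troll_cron_line and the 02:30 run time are assumptions chosen for the example; pick a time that matches your own off-peak period.

```shell
#!/bin/sh
# Sketch only: troll_cron_line is a hypothetical helper that prints a
# daily crontab entry for the accelerated-trolling script.

troll_cron_line() {
    hour=$1
    minute=$2
    # crontab fields: minute hour day-of-month month day-of-week command
    printf '%s %s * * * /usr/local/fast_troll.sh\n' "$minute" "$hour"
}

troll_cron_line 2 30   # prints: 30 2 * * * /usr/local/fast_troll.sh
```

The printed line would then be added to the root user's crontab file; see crontab(1) for the editing procedure.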
6.3 Controlling the Use of System Resources
Low trolling rates, such as the 4 percent default, have negligible impact on system performance. Processor usage for memory trolling increases as the troll rate is increased. To approximate the performance overhead, use the following procedure:
1. Log in as root or become superuser.
2. Choose a time when the system is idle, and enter the following command to disable the memory troller:
# /sbin/sysconfig -r vm vm_troll_percent=0
3. With the memory troller disabled, enter the following command to establish a performance baseline:
# vmstat 1
  procs      memory        pages                            intr       cpu
  r   w   u  act free wire fault  cow zero react  pin pout  in  sy  cs us sy id
   2130  21  15K  40K 7682  104M  37M  27M   19K  22M  184  70 178 177  1  1 98
In the command output, note the system time, labeled sy under the cpu heading.
4. Enter the following command to adjust the vm_troll_percent value:
# /sbin/sysconfig -r vm vm_troll_percent=percent_rate
5. Repeat step 3 and note any change in the value of sy under the cpu heading. An increase in system time (sy) of 1 percent or less represents negligible performance cost.
Repeat the procedure, adjusting the value of vm_troll_percent until the performance cost is acceptable.
For example, a GS320 system with 32 processors and 128 GB of memory shows approximately 25 percent system time during one-pass accelerated trolling. The same system at the 4 percent default troll rate shows 1 percent or less system time.
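The comparison in the procedure above can be scripted. The following sketch pulls the sy value out of vmstat output and reports the change against a baseline. It assumes that the cpu sy column is the second-to-last field of each data line, as in the sample output; vmstat_sy is a hypothetical function name, and the second sample line is invented for illustration.

```shell
#!/bin/sh
# Sketch only: vmstat_sy is a hypothetical helper.
# ASSUMPTION: the cpu columns are "us sy id", so sy is the second-to-last
# field of each data line.

# Print the sy value from the last data line read on standard input.
# Heading lines are skipped because their last field is not numeric.
vmstat_sy() {
    awk 'NF >= 3 && $NF ~ /^[0-9]+$/ { sy = $(NF-1) }
         END { if (sy != "") print sy }'
}

# Baseline: the sample data line shown in the procedure.
baseline=$(echo "2130 21 15K 40K 7682 104M 37M 27M 19K 22M 184 70 178 177 1 1 98" | vmstat_sy)

# Invented sample standing in for a reading taken with the troller enabled.
trolled=$(echo "2130 21 15K 40K 7682 104M 37M 27M 19K 22M 184 70 178 177 1 3 96" | vmstat_sy)

echo "sy increase: $(( trolled - baseline ))"   # prints: sy increase: 2
```

A single vmstat sample can be noisy, so it is reasonable to average several readings before comparing.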
6.4 Understanding Memory Troller Messages
The memory troller might produce both informational messages and error
messages as described in the following sections.
6.4.1 Informational Messages
The following messages provide information about events associated with memory troller operation. These messages do not indicate a failure in the memory troller:
When the memory troller locates a memory page containing an uncorrectable error and the bad page will be mapped out, the following message is displayed:
Memory Troller: bad page found (address = 0x################)
In addition to the bad page found... message, machine check messages similar to the following are displayed on the system's console when the memory troller encounters a bad page:
25-Mar-2000 17:24:25 [700] CPU machine check/exception - CPU 0
25-Mar-2000 17:24:25 [700] CPU machine check/exception - CPU 18
These messages come from the event notification subsystem. They indicate that the machine checks resulting from the memory troller reading the bad page have been entered into the binary error log.
6.4.2 Error Messages
If any of the following error messages is displayed on the console terminal, a malfunction has occurred in the memory troller and you must contact your technical support organization.
VM_CONFIGURE: Memory Trolling is currently disabled on this system
The memory troller has been disabled due to a fatal error.
adjust_troll_quantity: null MAD pointer, disabling troller
A fatal internal error has occurred, and the troller is disabled.
adjust_troll_quantity: invalid troll_percent 0 defaulting to 4 percent
The troller is active, but the troll rate is zero. The troller continues operating at the default troll rate. This is a serious error.
vm_memory_troller: CPU # vmmt_get_mad() failed, disabling troller
A fatal internal error has occurred, and the troller is disabled.
vm_memory_troller: MAD # invalid state [#], shutting down
A fatal internal error has occurred, and the troller is disabled.
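When reviewing saved console output, it can be useful to separate the informational troller message from the error messages listed above. The following sketch does this with simple pattern matching on the message prefixes documented in this section. The function name classify_troller_messages, the INFO and ERROR tags, and the sample address are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: classify_troller_messages is a hypothetical helper that tags
# troller messages read on standard input and ignores all other lines.

classify_troller_messages() {
    awk '
        /Memory Troller: bad page found/        { print "INFO: " $0; next }
        /Memory Trolling is currently disabled/ { print "ERROR: " $0; next }
        /adjust_troll_quantity:/                { print "ERROR: " $0; next }
        /vm_memory_troller:/                    { print "ERROR: " $0; next }
    '
}

# Sample lines standing in for saved console output.
printf '%s\n' \
    'Memory Troller: bad page found (address = 0x0000000000123000)' \
    'some unrelated console line' \
    'adjust_troll_quantity: null MAD pointer, disabling troller' |
    classify_troller_messages
```

Any line tagged ERROR corresponds to a condition for which this section directs you to contact your technical support organization.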
6.5 Memory Troller Interactions with OLAR
The memory troller automatically reconfigures itself when CPUs are taken off line (for removal) or placed back on line (after addition).
If the CPU designated as the vm_primary CPU is taken off line, the memory troller automatically switches to another CPU.
When a new CPU comes on line, the memory troller reconfigures itself only if it needs to select a new vm_primary.
If there are enough CPUs available in a system, the memory troller avoids the following configurations:
Resource Affinity Domains that contain only memory
Using the same processor both as a vm_primary processor and for handling interrupts