6    Memory Trolling

This chapter describes the operating system features that support memory trolling, as covered in the sections that follow.

6.1    Overview

The operating system handles memory errors with a just-in-time scrubbing model, in which correctable errors are scrubbed when the operating system or an application encounters them. To enhance this capability, a trigger mechanism called the memory troller proactively locates and scrubs correctable memory errors. The memory troller systematically reads each memory location. If it discovers a correctable memory error, it triggers the just-in-time scrubbing mechanism.

Because the memory troller reads all memory available to the operating system, it might also discover uncorrectable memory errors, which would normally lead to an unrecoverable machine check. To avoid this, the operating system recognizes that the machine check resulted from memory trolling, dismisses the error, and continues normal operation. The memory troller then causes the memory page containing the uncorrectable error to be marked as a bad page. If the bad page is free (or when it becomes free), it is mapped out so that it will not be reused.

6.2    Enabling, Disabling, and Tuning Memory Trolling

For systems supported by the memory troller, use the vm_troll_percent variable to enable, disable, and tune the trolling rate. This parameter is part of the kernel's vm subsystem. The trolling rate is expressed as a percentage of the system's total memory trolled per hour and can be changed at any time. Valid troll rate settings are as follows:

Default value: 4 percent per hour

This value is used by default if you do not specify any value for vm_troll_percent. At this default rate, each 8-kilobyte memory page is trolled about once a day (a full pass at 4 percent per hour takes 25 hours).

Disable value: 0 (zero)

A value of zero disables the memory troller.

Range: 1 - 100 percent

The troll rate is set to the specified percentage of memory to troll per hour. For example, a 50 percent troll rate reads half the total memory in one hour. After all memory is read, the troller starts a new pass at the beginning of memory.

Accelerated trolling: 101 percent

Any value greater than 100 percent invokes one-pass accelerated trolling. All memory is trolled at a rate of approximately 6000 8-kilobyte pages per second, and then trolling is disabled. This mode is intended for trolling all memory quickly during off-peak hours. For example, on a GS320 system with 32 processors and 128 gigabytes of memory, one-pass accelerated trolling takes approximately five minutes.
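
The settings above can be summarized in a small helper. The following is a minimal sketch, assuming a POSIX-compatible shell; the function names troll_mode and hours_per_pass are hypothetical and are not part of the operating system. It classifies a vm_troll_percent value and, for continuous rates, estimates how long one full pass takes (100 divided by the rate, rounded up to a whole hour).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with the operating system.

# troll_mode VALUE: describe how a vm_troll_percent value is interpreted.
troll_mode() {
    case "$1" in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo "disabled"
    elif [ "$1" -le 100 ]; then
        echo "continuous"
    else
        echo "accelerated"
    fi
}

# hours_per_pass PERCENT: hours needed to read all memory once at a
# continuous rate, rounded up to a whole hour.
hours_per_pass() {
    [ "$(troll_mode "$1")" = "continuous" ] || return 1
    echo $(( (100 + $1 - 1) / $1 ))
}

troll_mode 0        # prints: disabled
troll_mode 101      # prints: accelerated
hours_per_pass 4    # prints: 25
hours_per_pass 50   # prints: 2
```

At exactly 4 percent per hour the computed pass time is 25 hours, which is roughly once a day.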

Enter the following command to display the current value of vm_troll_percent (the troll rate):

#  /sbin/sysconfig -q vm vm_troll_percent
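
To use the current rate in a script, the value can be extracted from the command output. This sketch assumes that sysconfig -q prints the subsystem name followed by a line of the form "vm_troll_percent = 4"; verify the format on your system before relying on it. The function name parse_troll_percent is hypothetical.

```shell
#!/bin/sh
# parse_troll_percent: read `sysconfig -q vm vm_troll_percent` output on
# standard input and print only the numeric value.
# ASSUMPTION: the attribute line looks like "vm_troll_percent = 4".
parse_troll_percent() {
    awk -F= '$1 ~ /vm_troll_percent/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Sample input stands in for real command output.
printf 'vm:\nvm_troll_percent = 4\n' | parse_troll_percent

# On a live system (untested sketch):
#   /sbin/sysconfig -q vm vm_troll_percent | parse_troll_percent
```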

You can override the default troll rate by adding the following lines to the /etc/sysconfigtab file:

vm:
	vm_troll_percent=percent_rate 

The percent_rate variable is the troll rate as described previously. Use the sysconfigdb command to add entries to the /etc/sysconfigtab file. See sysconfigdb(8) for more information. The new rate takes effect on the next system boot.
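
The stanza can also be generated by a script before it is added with sysconfigdb. The following sketch only writes the stanza to a file; the file name /tmp/troll.stanza and the function name write_troll_stanza are assumptions for illustration, and the sysconfigdb options needed to add the stanza are left to sysconfigdb(8).

```shell
#!/bin/sh
# write_troll_stanza RATE FILE: write a vm stanza that sets vm_troll_percent.
write_troll_stanza() {
    case "$1" in
        ''|*[!0-9]*) echo "rate must be a non-negative integer" >&2; return 1 ;;
    esac
    # The attribute line is indented with a tab, as in /etc/sysconfigtab.
    printf 'vm:\n\tvm_troll_percent=%s\n' "$1" > "$2"
}

# Example: a stanza for a 10 percent per hour troll rate.
write_troll_stanza 10 /tmp/troll.stanza && cat /tmp/troll.stanza
```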

You can enable, disable, or change the troll rate at any time using the following command:

#  /sbin/sysconfig -r vm vm_troll_percent=percent_rate 

The percent_rate variable is the troll rate as described previously. Only the superuser (root) or a user authorized by division of privileges (DOP) can use this command. See dop(8) for more information on sharing superuser privileges.

6.2.1    Understanding the Configuration Messages

If the memory troller does not support your system, the following error is displayed on your terminal when you attempt to configure the memory troller using /sbin/sysconfig:

vm_configure: Memory Trolling not supported on this system.
 

Enter the following command to disable trolling:

#  /sbin/sysconfig -r vm vm_troll_percent=0

The following warning message is displayed on your terminal when the preceding command is executed:

vm_configure: shutting down memory troller.
[WARNING: disabling the memory troller is not recommended on
this system.]

This message notifies you that permanently disabling memory trolling is not recommended.

6.2.2    Configuring Accelerated Trolling

To schedule one-pass accelerated trolling at off peak hours, follow this procedure:

  1. Create a shell script named /usr/local/fast_troll.sh containing the following lines:

    #!/sbin/sh
     
    /sbin/sysconfig -r vm vm_troll_percent=101
    

  2. Enter the following commands to set the file owner and permissions of /usr/local/fast_troll.sh:

    #  chown root /usr/local/fast_troll.sh
    #  chmod 744 /usr/local/fast_troll.sh
    

  3. Use the cron facility to schedule execution of the shell script as the root user at the desired time. See cron(8) for more information.
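
As an illustration of step 3, the following sketch builds a crontab entry for the script. The 02:30 run time and the function name cron_entry are assumptions; choose a time that is off-peak on your system and install the resulting line in the root user's crontab.

```shell
#!/bin/sh
# cron_entry HOUR MINUTE: print a crontab line that runs the accelerated
# trolling script once a day at the given time.
cron_entry() {
    hour=$1 minute=$2
    if [ "$hour" -lt 0 ] || [ "$hour" -gt 23 ] ||
       [ "$minute" -lt 0 ] || [ "$minute" -gt 59 ]; then
        echo "usage: cron_entry HOUR(0-23) MINUTE(0-59)" >&2
        return 1
    fi
    # crontab fields: minute hour day-of-month month day-of-week command
    echo "$minute $hour * * * /usr/local/fast_troll.sh"
}

cron_entry 2 30    # prints: 30 2 * * * /usr/local/fast_troll.sh
```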

6.3    Controlling the Use of System Resources

Low trolling rates, such as the 4 percent default, have negligible impact on system performance. Processor usage for memory trolling increases as the troll rate is increased. To approximate the performance overhead, use the following procedure:

  1. Log in as root or become superuser.

  2. Choose a time when the system is idle and enter the following command to disable the memory troller:

    #  /sbin/sysconfig -r vm vm_troll_percent=0
    

  3. Enter the following command with the memory troller disabled to establish a performance baseline:

    #  vmstat 1
    

        procs       memory                  pages                  intr        cpu
      r   w   u  act free wire fault  cow zero react  pin pout  in  sy  cs  us  sy  id
      2 130  21  15K  40K 7682  104M  37M  27M   19K  22M  184  70 178 177   1   1  98
     
    

  4. In the command output, note the system time, labeled sy under the cpu heading. Enter the following command to adjust the vm_troll_percent value:

    #  /sbin/sysconfig -r vm vm_troll_percent=percent_rate
    

    Repeat step 3 and note any change in the value of sy under the cpu heading.

An increase in system time (sy) of one percentage point or less represents negligible performance cost. Repeat the procedure, adjusting the value of vm_troll_percent until the performance cost is acceptable.

For example, a GS320 system with 32 processors and 128 GB of memory will show approximately 25 percent of system time during one-pass accelerated trolling. The same system at the 4 percent default troll rate will show one percent or less system time.
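
The comparison in steps 3 and 4 can be scripted. This sketch assumes, as in the sample vmstat output in step 3, that the cpu sy value is the second-to-last field of a data line. The function names cpu_sy and sy_increase are hypothetical, and the second sample line is invented to show a higher sy value.

```shell
#!/bin/sh
# cpu_sy: print the cpu "sy" column (second-to-last field) of a vmstat
# data line read from standard input.
cpu_sy() {
    awk 'NF >= 2 { print $(NF - 1) }'
}

# sy_increase BASELINE CURRENT: print the change in system time.
sy_increase() {
    echo $(( $2 - $1 ))
}

# Sample data lines; the second is invented for illustration.
baseline=$(echo "2 130 21 15K 40K 7682 104M 37M 27M 19K 22M 184 70 178 177 1 1 98" | cpu_sy)
trolling=$(echo "2 130 21 15K 40K 7682 104M 37M 27M 19K 22M 184 70 178 177 1 3 96" | cpu_sy)
sy_increase "$baseline" "$trolling"    # prints: 2
```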

6.4    Understanding Memory Troller Messages

The memory troller might produce both informational messages and error messages as described in the following sections.

6.4.1    Informational Messages

The following messages provide information about events associated with memory troller operation. These messages do not indicate a failure in the memory troller:

6.4.2    Error Messages

If any of the following error messages is displayed on the console terminal, the memory troller has malfunctioned and you must contact your technical support organization.

VM_CONFIGURE: Memory Trolling is currently disabled on this system

The memory troller has been disabled due to a fatal error.

adjust_troll_quantity: null MAD pointer, disabling troller

A fatal internal error has occurred and the troller is disabled.

adjust_troll_quantity: invalid troll_percent 0 defaulting to 4 percent

The troller is active, but the troll rate is zero. The troller continues operating, but at the default troll rate. This is a serious error.

vm_memory_troller: CPU # vmmt_get_mad() failed, disabling troller

A fatal internal error has occurred and the troller is disabled.

vm_memory_troller: MAD # invalid state [#], shutting down

A fatal internal error has occurred and the troller is disabled.
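
When console output is captured to a file, the messages above can be searched for automatically. The following is a minimal sketch that matches the message prefixes listed in this section. How console output is captured, and the function name troller_errors, are assumptions.

```shell
#!/bin/sh
# troller_errors: print lines from standard input that match one of the
# memory troller error messages listed in this section.
troller_errors() {
    grep -E \
        -e 'VM_CONFIGURE: Memory Trolling is currently disabled' \
        -e 'adjust_troll_quantity: (null MAD pointer|invalid troll_percent)' \
        -e 'vm_memory_troller: CPU .* vmmt_get_mad\(\) failed' \
        -e 'vm_memory_troller: MAD .* invalid state'
}

# Sample input: one error message and one routine message.
# Only the first line is printed.
printf '%s\n' \
    'adjust_troll_quantity: null MAD pointer, disabling troller' \
    'vm_configure: shutting down memory troller.' | troller_errors
```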

6.5    Memory Troller Interactions with OLAR

The memory troller automatically reconfigures itself when CPUs are taken off line (for removal) or placed back on line (after addition).

The memory troller automatically switches to another CPU if the CPU designated as the vm_primary CPU is taken off line.

When a new CPU comes on line, the memory troller reconfigures itself and selects a new vm_primary only if necessary. Otherwise, the current vm_primary is kept.

If there are enough CPUs available in a system, the memory troller avoids the following configurations: