TITLE: HP Tru64 UNIX - V5.1B-3 Multiple Virtual Memory Fixes

Copyright (c) Hewlett-Packard Company 2006. All rights reserved.

PRODUCT:  HP Tru64 UNIX [R] V5.1B-3

SOURCE:   Hewlett-Packard Company

ECO INFORMATION:

     ECO Name:                 T64KIT1000910-V51BB26-E-20060928
     ECO Kit Approximate Size: 4.39 MB
     Kit Applies To:           HP Tru64 UNIX V5.1B-3 PK5 (BL26)

ECO Kit CHECKSUMS:

     /usr/bin/sum results:   28933 4500
     /usr/bin/cksum results: 2875295415 4608000
     MD5 results:            02b397f6c90fde5305b2387f121b226c
     SHA1 results:           f425f8fc206bcc844bda9cef055763e89948cc75

ECO KIT SUMMARY:

A dupatch-based, Early Release Patch kit exists for HP Tru64 UNIX
V5.1B-3 that contains solutions to the following problems:

DESCRIPTION

This Early Release Patch (ERP) kit provides several virtual
memory-related fixes. Specifically, the ERP provides the following:

- Enhancements to the vm_overflow tunable. This tunable may be
  enabled to allow NUMA systems (GS80, GS160, GS320, GS1280, ES47,
  ES80) to more easily allocate memory from other RADs (Resource
  Affinity Domains). This is mostly a benefit for single applications
  that use large amounts of memory. For other applications, the added
  NUMA latency may actually degrade system performance. To enable,
  set vm_overflow to 1. To disable, set it to 0.

- Changes to the way page migrations occur on NUMA systems, to
  address poor system performance due to excessive paging.

- Corrections that address an incompatibility between the cpus_in_rad
  and gh_chunks/rad_gh_regions tunables that can result in the
  following boot failures:

      vm_mad_init[0]: unable to allocate vm_page array for region 0
      vm_mad_init[0]: unable to allocate vm_page array for region 1
      pmap_update_send: missing ack from cpu
      trap: invalid memory read access from kernel mode

- Correction to a problem where a thread could be left waiting on the
  original RAD when a page table page allocation requires an overflow
  to another RAD. This problem presents itself as an increased
  elapsed time for fork operations to complete.

- Changes that increase the maximum value of cpus_in_rad to 64 and
  unhide this tunable.

- Correction for "ubc_wire: hash failed" panics on non-NUMA systems.
  The following is a typical stack trace of this panic:

      1  panic
      2  ubc_wire
      3  u_vp_oop_pagecontrol
      4  u_anon_update_pmap
      5  u_anon_fault_backed
      6  u_anon_fault
      7  u_anon_lockop
      8  u_map_lockvas
      9  plock
      10 syscall
      11 _Xsyscall

- Corrections for the "not wired" panic with System V shared memory
  and bigpages. This panic can occur if callers of the shmat syscall
  provide addresses with different page alignments when attaching the
  same shared memory region. A typical stack trace follows:

      1  panic
      2  pmap_lw_unwire_new
      3  lw_unwire_new
      4  vm_map_pageable
      5  cfs_condio_issue_io
      6  cfs_blkmap_directio
      7  cfs_condio_rw
      8  cfs_read
      9  vn_read
      10 rwuio
      11 read
      12 syscall
      13 _Xsyscall

- Corrections for a boot failure on systems with sparsely populated
  CPUs.

- Fixes for a kernel memory fault panic in _OtsMove() called from
  various I/O or filesystem routines. The following are typical stack
  traces:

      Kernel Memory Fault
      4  panic
      5  trap
      6  _XentMM
      7  _OtsMove
      8  bs_refpg_direct
      9  fs_read_direct
      10 fs_read
      11 msfs_read
      12 vn_pread
      13 msfs_strategy
      14 aio_rw
      15 syscall
      16 _Xsyscall

      Kernel Memory Fault
      0  stop_secondary_cpu
      1  panic
      2  event_timeout
      3  printf
      4  panic
      5  trap
      6  _XentMM
      7  _OtsZero
      8  cfs_condio_issue_io
      9  cfs_blkmap_directio
      10 cfs_condio_rw
      11 cfs_read
      12 vn_read
      13 rwuio
      14 read
      15 syscall
      16 _Xsyscall

- Fixes for a vl_unwire panic when gh_chunks are in use.

- Fixes the "vm_pg_free: page wired" panic when bigpages are enabled.

- Fixes a bug in the wiring code and lightweight wirings.

- Fixes lock management issues within the UBC that can lead to
  "mcs_unlock: current lock not found" and "mcs_lock: time limit
  exceeded" panics.

- Fixes a "kernel memory fault" panic when the vm tunable
  anon_rss_enforce is set to the hard limit (2).

- Fixes a panic in pmap_pagemove() when getblk() is invoked under
  certain situations.

- Fixes a race condition in the UBC bigpage allocation routine.
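Several of the fixes above involve attributes of the vm and generic
kernel subsystems. As an illustrative check (not part of the official
kit instructions; exact output format varies by system and patch
level), the sysconfig utility can be used to confirm the current
values of the affected tunables once the kit is installed and the
system rebooted:

   > sysconfig -q vm vm_overflow
   > sysconfig -q vm gh_chunks
   > sysconfig -q generic cpus_in_rad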
The Patch Kit Installation Instructions and the Patch Summary and
Release Notes documents provide patch kit installation and removal
instructions and a summary of each patch. Please read these documents
prior to installing patches on your system.

The patches in this ERP kit will also be available in the next
mainstream patch kit, HP Tru64 UNIX V5.1B-4.

INSTALLATION NOTES:

1) Install this kit with the dupatch utility that is included in the
   patch kit. You may need to baseline your system if you have
   manually changed system files on your system. The dupatch utility
   provides the baselining capability.

2) The patch in this ERP kit does not have any file intersections
   with any other ERP available at this time for this product
   version.

3) This ERP kit will NOT install over any Customer Specific Patches
   (CSPs) that have file intersections with this ERP kit. Contact
   your normal Service Provider for assistance if the installation of
   this ERP kit is blocked by any of your installed CSPs.

INSTALLATION PREREQUISITES:

You must have installed HP Tru64 UNIX V5.1B-3 PK5 (BL26) prior to
installing this Early Release Patch kit.

SUPERSEDE INFORMATION:

None

KNOWN PROBLEMS WITH THE PATCH KIT:

None

RELEASE NOTES FOR T64KIT1000910-V51BB26-E-20060928:

Release Notes

This document summarizes the contents and special instructions for
the Tru64 UNIX V5.1B patches contained in this kit. For information
about installing or removing patches, baselining, and general patch
management, see the Patch Kit Installation Instructions document.

1 Release Notes

This Early Release Patch Kit Distribution contains:

- fixes that resolve the problems reported in:

  o 19082 19182 19283 19290 19399 19411 19415 19435 19491 19580

  * for Tru64 UNIX V5.1B T64V51BB26AS0005-20050502.tar (BL26)

This kit includes a patch which requires a system reboot.

The patches in this kit are being released early for general customer
use. Refer to the Release Notes for a summary of each patch and for
installation prerequisites.

Patches in this kit are installed by running dupatch from the
directory in which the kit was untarred. For example, as root on the
target system:

   > mkdir -p /tmp/CSPkit1
   > cd /tmp/CSPkit1
   > tar -xpvf DUV40D13-C0044900-1285-20000328.tar
   > cd patch_kit
   > ./dupatch

2 Special Instructions

SPECIAL INSTRUCTIONS for Tru64 UNIX V5.1B Patch C1884.00

In the V5.1B-3 release, performance enhancements for NUMA-class
systems have been provided through two new tuning options.

The vm_overflow feature changes how large-memory applications
"borrow" memory from other resource affinity domains (RADs) when
their local memory resources are exhausted. This can be used on all
NUMA-class systems.

For the ES47, ES80, and GS1280 systems, the previously hidden
cpus_in_rad variable can now be set to more than two CPUs; this
allows pooling together the CPU, memory, and I/O resources of a set
of CPUs and treating them as a single resource affinity domain (RAD).
Like vm_overflow, this can allow large-memory applications to have
access to more memory that is managed as if it were all physically
"local".
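The sections below describe each option in detail. As a sketch of how
such tunables are typically made persistent on Tru64 UNIX (the stanza
and flag usage below are illustrative; consult sysconfigdb(8) and the
configuration guidelines that follow before choosing values), entries
are merged into /etc/sysconfigtab and take effect at the next reboot:

   > cat > /tmp/c1884.stanza << 'EOF'
   vm:
           vm_overflow = 1
   generic:
           cpus_in_rad = 2
   EOF
   > sysconfigdb -m -f /tmp/c1884.stanza vm
   > sysconfigdb -m -f /tmp/c1884.stanza generic
   > shutdown -r now

Because cpus_in_rad determines how RADs are constructed, it can only
take effect at boot time.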
generic: cpus_in_rad

With the default value of cpus_in_rad (zero), every CPU is in its own
resource affinity domain. This is equivalent to setting the value
to 1. When the value of cpus_in_rad is set larger than 1, certain
configuration restrictions must be considered:

1) Values for the cpus_in_rad tunable must be a power of two.

2) Take caution when setting cpus_in_rad to 64 on a 64-processor
   system. When the number of RADs is decreased by increasing the
   value of cpus_in_rad, the number of per-RAD locks needed to manage
   resources also decreases. This may result in increased lock
   contention, which may in turn cause poor performance or system
   panics. The system automatically adjusts the generic: locktimeout
   tunable if cpus_in_rad is set to 64 on a 64-processor system, but
   depending on system load it may need to be manually increased to
   avoid locktimeout panics. The maximum value for locktimeout is
   60 seconds. If the value needs to be increased, do so in 5-second
   increments. If the maximum value is reached and the system is
   still unstable, reduce the value of cpus_in_rad and escalate the
   problem through your support channels.

3) "Missing" CPUs are included in the count of CPUs in a RAD.
   Consider a system configured with CPUs 0, 1, 4, 5, 8, 9, 12, 13.

   Setting cpus_in_rad to 2 on this system would result in the
   following resource affinity domain configuration:

      RAD[0] - cpus 0, 1
      RAD[2] - cpus 4, 5
      RAD[4] - cpus 8, 9
      RAD[6] - cpus 12, 13

   Setting cpus_in_rad to 4 on this system would result in the
   following resource affinity domain configuration:

      RAD[0] - cpus 0, 1    (2 and 3 are missing)
      RAD[1] - cpus 4, 5    (6 and 7 are missing)
      RAD[2] - cpus 8, 9    (10 and 11 are missing)
      RAD[3] - cpus 12, 13

Interaction of cpus_in_rad and the rad_gh_regions tunables:

When cpus_in_rad is increased, the number of RADs configured
decreases. If the system is configured with settings for
rad_gh_regions, those settings must also be changed.

Consider a system configured with CPUs 0, 1, 4, 5, 8, 9, 12, 13 and
rad_gh_regions configured to allocate 4 gigabytes of granularity hint
memory. With the default setting of cpus_in_rad (zero) or cpus_in_rad
set to 1, rad_gh_regions would have the following settings:

      rad_gh_regions[0]  = 512
      rad_gh_regions[1]  = 512
      rad_gh_regions[4]  = 512
      rad_gh_regions[5]  = 512
      rad_gh_regions[8]  = 512
      rad_gh_regions[9]  = 512
      rad_gh_regions[12] = 512
      rad_gh_regions[13] = 512

If the system is configured to place two CPUs in a RAD
(cpus_in_rad=2), the rad_gh_regions settings would need to be changed
to the following:

      rad_gh_regions[0] = 1024
      rad_gh_regions[2] = 1024
      rad_gh_regions[4] = 1024
      rad_gh_regions[6] = 1024

Because of how missing CPUs are handled, if cpus_in_rad is set to 4,
the RADs would still contain only two CPUs each (two existing, two
missing), but the RAD numbers change, so rad_gh_regions would have
the following settings:

      rad_gh_regions[0] = 1024
      rad_gh_regions[1] = 1024
      rad_gh_regions[2] = 1024
      rad_gh_regions[3] = 1024
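For illustration only, the cpus_in_rad=2 case above could be captured
as /etc/sysconfigtab entries along these lines (a hypothetical
stanza; verify the exact array-attribute syntax against
sysconfigtab(4) on your system before use):

   generic:
           cpus_in_rad = 2

   vm:
           rad_gh_regions[0] = 1024
           rad_gh_regions[2] = 1024
           rad_gh_regions[4] = 1024
           rad_gh_regions[6] = 1024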
vm: vm_overflow

When memory resources are depleted on a RAD in a NUMA system, the vm
subsystem will automatically overflow to another RAD to fulfill the
memory allocation request. The default overflow behavior is to:

   attempt an allocation from the "local" RAD
   if that fails, page out a page of memory and "steal" it
   if that fails, attempt an allocation from the "next" RAD
   if that fails, page out a page on that RAD and "steal" it

This continues until allocation/stealing has been attempted on all
RADs.

Setting the vm_overflow tunable to 1 changes the order of page
allocations and page stealing:

   attempt an allocation from the "local" RAD
   if that fails, attempt an allocation from the "next" RAD

This continues until allocation has been attempted on all RADs. If
the memory allocation is still not successful, the vm subsystem
reverts to the original behavior stated above. Using a setting of 1
may result in less paging activity for some applications and may
improve performance.

3 Summary of CSPatches contained in this kit

Tru64 UNIX V5.1B

   PatchId      Summary Of Fix
   ----------------------------------------
   C1884.00     Fixes to cpus_in_rad, gh_chunks and UBC

4 Additional information from Engineering

None

5 Affected system files

This patch delivers the following files:

Tru64 UNIX V5.1B Patch C1884.00

   ./sys/BINARY/arch_alphapmap.mod
        CHECKSUM: 52033 351
        SUBSET:   OSFHWBIN540
   ./sys/BINARY/generic.mod
        CHECKSUM: 50039 12
        SUBSET:   OSFBIN540
   ./sys/BINARY/marvel_cpu.mod
        CHECKSUM: 28870 147
        SUBSET:   OSFHWBIN540
   ./sys/BINARY/marvel_soc.mod
        CHECKSUM: 10168 237
        SUBSET:   OSFHWBIN540
   ./sys/BINARY/vfs.mod
        CHECKSUM: 51074 656
        SUBSET:   OSFBIN540
   ./sys/BINARY/vm.mod
        CHECKSUM: 32272 674
        SUBSET:   OSFBIN540
   ./usr/sys/BINARY/alpha_init.o
        CHECKSUM: 47020 142
        SUBSET:   OSFHWBIN540
   ./usr/sys/BINARY/pmap_init.o
        CHECKSUM: 63788 145
        SUBSET:   OSFBIN540

[R] UNIX is a registered trademark in the United States and other
countries, licensed exclusively through X/Open Company Limited.

Copyright Hewlett-Packard Company 2006. All rights reserved.

This software is proprietary to and embodies the confidential
technology of Hewlett-Packard Company. Possession, use, or copying of
this software and media is authorized only pursuant to a valid
written license from Hewlett-Packard or an authorized sublicensor.

This ECO has not been through an exhaustive field test process. Due
to the experimental stage of this ECO/workaround, Hewlett-Packard
makes no representations regarding its use or performance. The
customer shall have the sole responsibility for adequate protection
and back-up of data used in conjunction with this ECO/workaround.