TITLE: HP Tru64 UNIX - V5.1B-3 Multiple Virtual Memory Fixes

Copyright (c) Hewlett-Packard Company 2006. All rights reserved.

PRODUCT:  HP Tru64 UNIX [R] V5.1B-3

SOURCE:   Hewlett-Packard Company

ECO INFORMATION:

     ECO Name:                 T64KIT1000910-V51BB26-E-20060928
     ECO Kit Approximate Size: 4.39 MB
     Kit Applies To:           HP Tru64 UNIX V5.1B-3 PK5 (BL26)

ECO Kit CHECKSUMS:

     /usr/bin/sum results:   28933 4500
     /usr/bin/cksum results: 2875295415 4608000
     MD5 results:            02b397f6c90fde5305b2387f121b226c
     SHA1 results:           f425f8fc206bcc844bda9cef055763e89948cc75

ECO KIT SUMMARY:

A dupatch-based, Early Release Patch kit exists for HP Tru64 UNIX
V5.1B-3 that contains solutions to the following problems:

DESCRIPTION

This Early Release Patch (ERP) kit provides several virtual
memory-related fixes. Specifically, the ERP provides the following:

- Enhancements to the vm_overflow tunable. This tunable may be
  enabled to allow NUMA systems (GS80, GS160, GS320, GS1280, ES47,
  ES80) to more easily allocate memory from other RADs (Resource
  Affinity Domains). This is mostly a benefit for single applications
  that use large amounts of memory. For other applications, the added
  NUMA latency may actually degrade system performance. To enable,
  set vm_overflow to 1. To disable, set it to 0.

- Changes to the way page migrations occur on NUMA systems, to
  address poor system performance due to excessive paging.

- Corrections that address an incompatibility between the cpus_in_rad
  and gh_chunks/rad_gh_regions tunables that can result in the
  following boot failures:

      vm_mad_init[0]: unable to allocate vm_page array for region 0
      vm_mad_init[0]: unable to allocate vm_page array for region 1
      pmap_update_send: missing ack from cpu
      trap: invalid memory read access from kernel mode

- Correction to a problem where a thread could be left waiting on the
  original RAD when a page table page allocation requires an overflow
  to another RAD. This problem presents itself as an increased
  elapsed time for fork operations to complete.

- Changes that increase the maximum value of cpus_in_rad to 64 and
  unhide this tunable.

- Correction for "ubc_wire: hash failed" panics on non-NUMA systems.
  The following is a typical stack trace of this panic:

      1  panic
      2  ubc_wire
      3  u_vp_oop_pagecontrol
      4  u_anon_update_pmap
      5  u_anon_fault_backed
      6  u_anon_fault
      7  u_anon_lockop
      8  u_map_lockvas
      9  plock
      10 syscall
      11 _Xsyscall

- Corrections for the "not wired" panic with System V shared memory
  and bigpages. This panic can occur if callers of the shmat syscall
  provide addresses with different page alignments when attaching the
  same shared memory region. A typical stack trace follows:

      1  panic
      2  pmap_lw_unwire_new
      3  lw_unwire_new
      4  vm_map_pageable
      5  cfs_condio_issue_io
      6  cfs_blkmap_directio
      7  cfs_condio_rw
      8  cfs_read
      9  vn_read
      10 rwuio
      11 read
      12 syscall
      13 _Xsyscall

- Corrections for a boot failure on systems with sparsely populated
  CPUs.

- Fixes for a kernel memory fault panic in _OtsMove() called from
  various I/O or filesystem routines. The following are typical stack
  traces:

      Kernel Memory Fault
      4  panic
      5  trap
      6  _XentMM
      7  _OtsMove
      8  bs_refpg_direct
      9  fs_read_direct
      10 fs_read
      11 msfs_read
      12 vn_pread
      13 msfs_strategy
      14 aio_rw
      15 syscall
      16 _Xsyscall

      Kernel Memory Fault
      0  stop_secondary_cpu
      1  panic
      2  event_timeout
      3  printf
      4  panic
      5  trap
      6  _XentMM
      7  _OtsZero
      8  cfs_condio_issue_io
      9  cfs_blkmap_directio
      10 cfs_condio_rw
      11 cfs_read
      12 vn_read
      13 rwuio
      14 read
      15 syscall
      16 _Xsyscall

- Fixes for a vl_unwire panic when gh_chunks are in use.

- Fixes the "vm_pg_free: page wired" panic when bigpages are enabled.

- Fixes a bug in the wiring code and lightweight wirings.

- Fixes lock management issues within the UBC that can lead to
  "mcs_unlock: current lock not found" and "mcs_lock: time limit
  exceeded" panics.

- Fixes a "kernel memory fault" panic when the vm tunable
  anon_rss_enforce is set to the hard limit (2).

- Fixes a panic in pmap_pagemove() when getblk() is invoked under
  certain situations.

- Fixes a race condition in the UBC bigpage allocation routine.
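Several of the fixes above involve attributes of the vm and generic
kernel subsystems. As an illustrative check (not part of the official
kit instructions; exact output format varies by system and patch
level), the sysconfig utility can be used to confirm the current
values of the affected tunables once the kit is installed and the
system rebooted:

   > sysconfig -q vm vm_overflow
   > sysconfig -q vm gh_chunks
   > sysconfig -q generic cpus_in_rad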
The Patch Kit Installation Instructions and the Patch Summary and
Release Notes documents provide patch kit installation and removal
instructions and a summary of each patch. Please read these documents
prior to installing patches on your system.

The patches in this ERP kit will also be available in the next
mainstream patch kit, HP Tru64 UNIX V5.1B-4.

INSTALLATION NOTES:

1) Install this kit with the dupatch utility that is included in the
   patch kit. You may need to baseline your system if you have
   manually changed system files on your system. The dupatch utility
   provides the baselining capability.

2) The patch in this ERP kit does not have any file intersections
   with any other ERP available at this time for this product
   version.

3) This ERP kit will NOT install over any Customer Specific Patches
   (CSPs) that have file intersections with this ERP kit. Contact
   your normal Service Provider for assistance if the installation of
   this ERP kit is blocked by any of your installed CSPs.

INSTALLATION PREREQUISITES:

You must have installed HP Tru64 UNIX V5.1B-3 PK5 (BL26) prior to
installing this Early Release Patch kit.

SUPERSEDE INFORMATION:

None

KNOWN PROBLEMS WITH THE PATCH KIT:

None

RELEASE NOTES FOR T64KIT1000910-V51BB26-E-20060928:

Release Notes

This document summarizes the contents and special instructions for
the Tru64 UNIX V5.1B patches contained in this kit. For information
about installing or removing patches, baselining, and general patch
management, see the Patch Kit Installation Instructions document.

1 Release Notes

This Early Release Patch Kit Distribution contains:

- fixes that resolve the problems reported in:

  o 19082 19182 19283 19290 19399 19411 19415 19435 19491 19580

  * for Tru64 UNIX V5.1B T64V51BB26AS0005-20050502.tar (BL26)

This kit includes a patch which requires a system reboot.

The patches in this kit are being released early for general customer
use. Refer to the Release Notes for a summary of each patch and for
installation prerequisites.

Patches in this kit are installed by running dupatch from the
directory in which the kit was untarred. For example, as root on the
target system:

   > mkdir -p /tmp/CSPkit1
   > cd /tmp/CSPkit1
   > tar -xpvf DUV40D13-C0044900-1285-20000328.tar
   > cd patch_kit
   > ./dupatch

2 Special Instructions

SPECIAL INSTRUCTIONS for Tru64 UNIX V5.1B Patch C1884.00

In the V5.1B-3 release, performance enhancements for NUMA-class
systems have been provided through two new tuning options.

The vm_overflow feature changes how large-memory applications
"borrow" memory from other resource affinity domains (RADs) when
their local memory resources are exhausted. This can be used on all
NUMA-class systems.

For the ES47, ES80, and GS1280 systems, the previously hidden
cpus_in_rad variable can now be set to more than two CPUs; this
allows pooling together the CPU, memory, and I/O resources of a set
of CPUs and treating them as a single resource affinity domain (RAD).
Like vm_overflow, this can allow large-memory applications to have
access to more memory that is managed as if it were all physically
"local".
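The sections below describe each option in detail. As a sketch of how
such tunables are typically made persistent on Tru64 UNIX (the stanza
and flag usage below are illustrative; consult sysconfigdb(8) and the
configuration guidelines that follow before choosing values), entries
are merged into /etc/sysconfigtab and take effect at the next reboot:

   > cat > /tmp/c1884.stanza << 'EOF'
   vm:
           vm_overflow = 1
   generic:
           cpus_in_rad = 2
   EOF
   > sysconfigdb -m -f /tmp/c1884.stanza vm
   > sysconfigdb -m -f /tmp/c1884.stanza generic
   > shutdown -r now

Because cpus_in_rad determines how RADs are constructed, it can only
take effect at boot time.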
generic: cpus_in_rad

With the default value of cpus_in_rad (zero), every CPU is in its own
resource affinity domain. This is equivalent to setting the value
to 1. When the value of cpus_in_rad is set larger than 1, certain
configuration restrictions must be considered:

1) Values for the cpus_in_rad tunable must be a power of two.

2) Take caution when setting cpus_in_rad to 64 on a 64-processor
   system. When the number of RADs is decreased by increasing the
   value of cpus_in_rad, the number of per-RAD locks needed to manage
   resources also decreases. This may result in increased lock
   contention, which may in turn cause poor performance or system
   panics. The system automatically adjusts the generic: locktimeout
   tunable if cpus_in_rad is set to 64 on a 64-processor system, but
   depending on system load it may need to be manually increased to
   avoid locktimeout panics. The maximum value for locktimeout is
   60 seconds. If the value needs to be increased, do so in 5-second
   increments. If the maximum value is reached and the system is
   still unstable, reduce the value of cpus_in_rad and escalate the
   problem through your support channels.

3) "Missing" CPUs are included in the count of CPUs in a RAD.
   Consider a system configured with CPUs 0, 1, 4, 5, 8, 9, 12, 13.

   Setting cpus_in_rad to 2 on this system would result in the
   following resource affinity domain configuration:

      RAD[0] - cpus 0, 1
      RAD[2] - cpus 4, 5
      RAD[4] - cpus 8, 9
      RAD[6] - cpus 12, 13

   Setting cpus_in_rad to 4 on this system would result in the
   following resource affinity domain configuration:

      RAD[0] - cpus 0, 1    (2 and 3 are missing)
      RAD[1] - cpus 4, 5    (6 and 7 are missing)
      RAD[2] - cpus 8, 9    (10 and 11 are missing)
      RAD[3] - cpus 12, 13

Interaction of cpus_in_rad and the rad_gh_regions tunables:

When cpus_in_rad is increased, the number of RADs configured
decreases. If the system is configured with settings for
rad_gh_regions, those settings must also be changed.

Consider a system configured with CPUs 0, 1, 4, 5, 8, 9, 12, 13 and
rad_gh_regions configured to allocate 4 gigabytes of granularity hint
memory. With the default setting of cpus_in_rad (zero) or cpus_in_rad
set to 1, rad_gh_regions would have the following settings:

      rad_gh_regions[0]  = 512
      rad_gh_regions[1]  = 512
      rad_gh_regions[4]  = 512
      rad_gh_regions[5]  = 512
      rad_gh_regions[8]  = 512
      rad_gh_regions[9]  = 512
      rad_gh_regions[12] = 512
      rad_gh_regions[13] = 512

If the system is configured to place two CPUs in a RAD
(cpus_in_rad=2), the rad_gh_regions settings would need to be changed
to the following:

      rad_gh_regions[0] = 1024
      rad_gh_regions[2] = 1024
      rad_gh_regions[4] = 1024
      rad_gh_regions[6] = 1024

Because of how missing CPUs are handled, if cpus_in_rad is set to 4,
the RADs would still contain only two CPUs each (two existing, two
missing), but the RAD numbers change, so rad_gh_regions would have
the following settings:

      rad_gh_regions[0] = 1024
      rad_gh_regions[1] = 1024
      rad_gh_regions[2] = 1024
      rad_gh_regions[3] = 1024
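For illustration only, the cpus_in_rad=2 case above could be captured
as /etc/sysconfigtab entries along these lines (a hypothetical
stanza; verify the exact array-attribute syntax against
sysconfigtab(4) on your system before use):

   generic:
           cpus_in_rad = 2

   vm:
           rad_gh_regions[0] = 1024
           rad_gh_regions[2] = 1024
           rad_gh_regions[4] = 1024
           rad_gh_regions[6] = 1024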
vm: vm_overflow

When memory resources are depleted on a RAD in a NUMA system, the vm
subsystem will automatically overflow to another RAD to fulfill the
memory allocation request. The default overflow behavior is to:

   attempt an allocation from the "local" RAD
   if that fails, page out a page of memory and "steal" it
   if that fails, attempt an allocation from the "next" RAD
   if that fails, page out a page on that RAD and "steal" it

This continues until allocation/stealing has been attempted on all
RADs.

Setting the vm_overflow tunable to 1 changes the order of page
allocations and page stealing:

   attempt an allocation from the "local" RAD
   if that fails, attempt an allocation from the "next" RAD

This continues until allocation has been attempted on all RADs. If
the memory allocation is still not successful, the vm subsystem
reverts to the original behavior stated above. Using a setting of 1
may result in less paging activity for some applications and may
improve performance.

3 Summary of CSPatches contained in this kit

Tru64 UNIX V5.1B

   PatchId      Summary Of Fix
   ----------------------------------------
   C1884.00     Fixes to cpus_in_rad, gh_chunks and UBC

4 Additional information from Engineering

None

5 Affected system files

This patch delivers the following files:

Tru64 UNIX V5.1B Patch C1884.00

   ./sys/BINARY/arch_alphapmap.mod
        CHECKSUM: 52033 351
        SUBSET:   OSFHWBIN540
   ./sys/BINARY/generic.mod
        CHECKSUM: 50039 12
        SUBSET:   OSFBIN540
   ./sys/BINARY/marvel_cpu.mod
        CHECKSUM: 28870 147
        SUBSET:   OSFHWBIN540
   ./sys/BINARY/marvel_soc.mod
        CHECKSUM: 10168 237
        SUBSET:   OSFHWBIN540
   ./sys/BINARY/vfs.mod
        CHECKSUM: 51074 656
        SUBSET:   OSFBIN540
   ./sys/BINARY/vm.mod
        CHECKSUM: 32272 674
        SUBSET:   OSFBIN540
   ./usr/sys/BINARY/alpha_init.o
        CHECKSUM: 47020 142
        SUBSET:   OSFHWBIN540
   ./usr/sys/BINARY/pmap_init.o
        CHECKSUM: 63788 145
        SUBSET:   OSFBIN540

[R] UNIX is a registered trademark in the United States and other
countries, licensed exclusively through X/Open Company Limited.

Copyright Hewlett-Packard Company 2006. All rights reserved.

This software is proprietary to and embodies the confidential
technology of Hewlett-Packard Company. Possession, use, or copying of
this software and media is authorized only pursuant to a valid
written license from Hewlett-Packard or an authorized sublicensor.

This ECO has not been through an exhaustive field test process. Due
to the experimental stage of this ECO/workaround, Hewlett-Packard
makes no representations regarding its use or performance. The
customer shall have the sole responsibility for adequate protection
and back-up of data used in conjunction with this ECO/workaround.