AlphaServer SC patch kit:
==========================

AlphaServer SC Version : SC V2.6 SSB
Kit Name               : SCV26SSB108023
Release Date           : 07/15/2004
PTR                    : 153-2-1446
IPMT Number            : CFS.108023, CFS.107975, CFS.107489, CFS.105805,
                         CFS.107549
Abstract               : Various libshmem and libelan fixes

Description of Patch:
=====================

This kit fixes various problems/cases as follows:

1) CFS.108023: Lack of performance symmetry with touchbuf code in shmalloc()

   This performance issue affects shmem communications only, and does not
   affect MPI communications.

   Processes on the first node of an allocation touched all of the
   shmalloc'ed region, thereby loading both the elan page translations and
   the intranode shmem page translations for the entire shmalloc'ed region.
   However, processes on the other nodes failed to touch the shmalloc'ed
   region, thereby deferring the loading of the intranode shmem page
   translations until the shmalloc'ed region was subsequently used by those
   processes.

   With this patch, by default, all processes load just the elan page
   translations at shmalloc() time and do not load the intranode shmem page
   translations for the shmalloc'ed region.

   The default behaviour can be changed using the environment variable
   SHMALLOC_TOUCHBUF:

      0x0   defer both elan and intranode translations until memory is
            actually used by the job
      0x1   during shmem_init - do just elan page translations
      0x2   during shmalloc - do just elan page translations
      0x5   during shmem_init - do both elan and intranode page translations
      0x6   during shmalloc - do both elan and intranode page translations

   When the SHMALLOC_TOUCHBUF environment variable is undefined, a value of
   0x2 is assumed (i.e. the default).

2) CFS.107975: Dual-rail shmem put issue

   Resolves outgoing cache performance related issues in libelan.
   This patch supersedes patch kit SCV26SSB105460.

3) CFS.107489: Intranode shmem memory corruption

   This patch fixes a memory corruption problem encountered when calling
   shmalloc during intranode shmem.
4) CFS.105805: An MPI benchmark caused a SEGV on dual rail

   Fixes a problem with unmatched messages when the receive buffer is
   bigger than the send buffer.

5) CFS.107549: An MPI benchmark hung at n=49 unless MPI_USE_LIBELAN_SUB=0

   Fixes a subtle problem with subgroups in the libelan library, where
   nodes in the middle of an allocation used the wrong libelan virtual
   process (i.e. the libelan process corresponding to the MPI rank) to do
   the hardware broadcast. This problem would only manifest itself when the
   first process of an allocation was not included within the MPI subgroup.

Kit location:
=============

The patch kit is SCV26SSB108023.tar.gz and it is available from
ftp://ftp.ilo.cpqcorp.net/pub/sierra/patches/V2.6_SSB/108023

Prerequisites:
==============

Before installing this patch kit, you should ensure the following:

1) You have all mandatory patches for this release installed
2) You have the following specific patches installed:

      SCV26105882

Kit checksum:
=============

   # cksum SCV26SSB108023.tar.gz
   2039796239 4441093 SCV26SSB108023.tar.gz

Updated files:
==============

A list of the files included in this patch is given below along with the
cksum values for each file.

    924413096  1340108  /usr/opt/rms/lib/libshmem.a
    273816187   591984  /usr/opt/rms/lib/libshmem.so
    301138563     6070  /usr/opt/rms/lib/libelan_thread.a
   2823930912  4012868  /usr/opt/rms/lib/libelan.a
   2621977283   969376  /usr/opt/rms/lib/libelan.so
    206928481  1350986  /usr/opt/rms/lib/dbg/libshmem.a
    647735006  1088848  /usr/opt/rms/lib/dbg/libelan.so
     83798522   600752  /usr/opt/rms/lib/dbg/libshmem.so
   2703063183     6262  /usr/opt/rms/lib/dbg/libelan_thread.a
      2392028  4215076  /usr/opt/rms/lib/dbg/libelan.a

Instructions:
=============

This patch is provided as a setld installable kit. Unpack it into a
directory that is NFS mounted on all domains, e.g.
/nfs/SCV26SSB108023 and install it as follows:

Patch required on Management Server (if used) : YES
Patch required on Domains                     : YES

On the Management Server (if used):
-----------------------------------

Install the patch as follows:

   # cd /nfs/SCV26SSB108023
   # setld -l /nfs/SCV26SSB108023 SCV26SSB108023

Restart any applications which use the libelan and libshmem shared objects.
Rebuild and restart any applications which were linked with the archive
files.

To remove the patch, use the following command:

   # setld -d SCV26SSB108023

Restart any applications which use the libelan and libshmem shared objects.
Rebuild and restart any applications which were linked with the archive
files.

On Domains:
-----------

Install the patch as follows:

   # cd /nfs/SCV26SSB108023
   # scrun -d all setld -l /nfs/SCV26SSB108023 SCV26SSB108023

Restart any applications which use the libelan and libshmem shared objects.
Rebuild and restart any applications which were linked with the archive
files.

To remove the patch, use the following command:

   # scrun -d all setld -d SCV26SSB108023

Restart any applications which use the libelan and libshmem shared objects.
Rebuild and restart any applications which were linked with the archive
files.

=======================================================================
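
After installation, the cksum values listed under "Updated files" can be
compared against the installed libraries. The following is a minimal sketch
only: verify_cksum is a hypothetical helper that is not part of the patch
kit, and it assumes that cksum prints the checksum, the size in bytes and
the pathname in that order, as in the "Kit checksum" output shown earlier.

```shell
#!/bin/sh
# verify_cksum: hypothetical helper, NOT shipped with the patch kit.
# Usage:   verify_cksum EXPECTED_CRC EXPECTED_SIZE FILE
# Returns: 0 if both values match, 1 on a mismatch, 2 if cksum fails
#          (for example because FILE does not exist).
verify_cksum() {
    expected_crc=$1
    expected_size=$2
    file=$3

    # Assumed output format: "<crc> <size in bytes> <pathname>"
    out=$(cksum "$file" 2>/dev/null) || {
        echo "MISSING  $file"
        return 2
    }

    # Split the output into fields; the arguments were saved above,
    # so overwriting the positional parameters is safe here.
    set -- $out
    actual_crc=$1
    actual_size=$2

    if [ "$actual_crc" = "$expected_crc" ] &&
       [ "$actual_size" = "$expected_size" ]; then
        echo "OK       $file"
        return 0
    fi

    echo "MISMATCH $file (got $actual_crc $actual_size)"
    return 1
}
```

For example, "verify_cksum 273816187 591984 /usr/opt/rms/lib/libshmem.so"
checks the libshmem shared object against the values listed in this
document. On domains, the same check would have to be run on each member
that received the patch.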