AlphaServer SC patch kit:
==========================

AlphaServer SC 2.6 SSB Kit Name: SCV26SSB107370
Release Date: 05/13/2004
PTR: 153-2-1424
IPMT Number: CFS.107370

Abstract: Jobs submitted via RMS api not cleaned up properly after node crash

This patch does not contain any kernel mods so you do not have to build
or deploy kernels or reboot any nodes to install it. It provides a new
version of pmanager, which cannot be replaced while it is in use, so you
will have to stop and start all RMS processes to install it. This
procedure is outlined below.

Description of Patch:
=====================

1. Fixes a problem in pmanager which could cause it to fail to close the
   connection to a third party job scheduler after deallocating a
   resource that was originally created through the RMS api.

2. Minor fix to a pmanager error message. One argument to printf was
   being omitted so the error text was always (null). For example:

   pmanager[parallel]: Warning: failed to deallocate resource 1104880768: (null)

   This did not cause pmanager any problems but made the error message
   meaningless.

This kit also contains the following fix from SCV26SSB105292:

3. A change to pmanager to ensure that the core-file directories are
   created with the correct ownership, i.e. the second argument to
   rms_createResource(), which specifies the user id on whose behalf the
   resource should be allocated.

Kit location:
=============

The patch kit is SCV26SSB107370.tar.gz and it is available from ITRC.

Dependencies:
=============

Before installing this Patch kit, you should ensure the following:

1) You have all mandatory patches for this release installed

Kit checksum:
=============

# cksum SCV26SSB107370.tar.gz
1352102142 622282 SCV26SSB107370.tar.gz

Updated files:
==============

A list of the files included in this patch is given below along with
the cksum values for each file.

3575726990 2699376 /usr/opt/rms/bin/pmanager

Instructions:
=============

This patch is provided as a setld installable kit.
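
Before unpacking, the downloaded archive can be compared against the
cksum values quoted in this README. The following is a minimal sketch,
not part of the kit: the function name verify_cksum is made up for this
example, and it assumes a cksum that prints the CRC and the size as the
first two whitespace-separated fields when reading standard input.

```shell
#!/bin/sh
# Sketch only: verify_cksum is a hypothetical helper, not shipped with the kit.
# Usage: verify_cksum FILE EXPECTED_CRC EXPECTED_SIZE
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read.
verify_cksum() {
    file=$1
    want_crc=$2
    want_size=$3

    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi

    # Reading from stdin makes cksum print "<crc> <size>" without a file name;
    # set -- splits that output into $1 (crc) and $2 (size).
    set -- $(cksum < "$file")

    if [ "$1" = "$want_crc" ] && [ "$2" = "$want_size" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $1 $2, expected $want_crc $want_size)" >&2
        return 1
    fi
}
```

With the values from this README, the archive would be checked with
"verify_cksum SCV26SSB107370.tar.gz 1352102142 622282", and the installed
binary with "verify_cksum /usr/opt/rms/bin/pmanager 3575726990 2699376".
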
Unpack it into a directory that is NFS mounted on all domains, e.g.
/usr/kits/, and install it as follows:

1. Stop Partitions, e.g.:

   # rcontrol stop partition=parallel

2. Stop RMS on all nodes:

   # sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms stop"

3. Stop RMS and msql on Management Server:

   # /sbin/init.d/rms stop
   # /sbin/init.d/msqld stop

4. Install on Management Server:

   # /usr/sbin/setld -l SCV26SSB107370

5. Start RMS and msql on Management Server:

   # /sbin/init.d/msqld start
   # /sbin/init.d/rms start

6. Install across all domains, e.g.:

   # sra command -domains all -m 1 -command "/usr/sbin/setld -l SCV26SSB107370"

7. Start RMS on all nodes, e.g.:

   # sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms start"

8. Restart partitions, e.g.:

   # rcontrol start partition=parallel

--------

To remove the patch use the following steps:

1. Stop Partitions, e.g.:

   # rcontrol stop partition=parallel

2. Stop RMS on all nodes:

   # sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms stop"

3. Delete across all domains:

   # sra command -domains all -m 1 -command "/usr/sbin/setld -d SCV26SSB107370"

4. Stop RMS and msql on Management Server:

   # /sbin/init.d/rms stop
   # /sbin/init.d/msqld stop

5. Delete from Management Server:

   # /usr/sbin/setld -d SCV26SSB107370

6. Start RMS and msql on Management Server:

   # /sbin/init.d/msqld start
   # /sbin/init.d/rms start

7. Start RMS on all nodes:

   # sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms start"

8. Restart partitions, e.g.:

   # rcontrol start partition=parallel
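
The installation sequence above can be rehearsed before it is run for
real. The following is a minimal sketch, not part of the kit: the names
run, install_patch, DRY_RUN, KIT and PARTITION are made up for this
example, and the partition name "parallel" is only the example value used
in the steps above. The commands themselves are copied from those steps.

```shell
#!/bin/sh
# Sketch only: a dry-run wrapper around the install steps listed above.
# run, install_patch, DRY_RUN, KIT and PARTITION are hypothetical names.
KIT=${KIT:-SCV26SSB107370}
PARTITION=${PARTITION:-parallel}
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode; otherwise execute it and stop the
# whole procedure at the first failure.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Note: the printed line does not show the original quoting.
        printf '+ %s\n' "$*"
    else
        "$@" || { echo "step failed: $*" >&2; exit 1; }
    fi
}

install_patch() {
    # 1. Stop partitions
    run rcontrol stop "partition=$PARTITION"
    # 2. Stop RMS on all nodes
    run sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms stop"
    # 3. Stop RMS and msql on the Management Server
    run /sbin/init.d/rms stop
    run /sbin/init.d/msqld stop
    # 4. Install on the Management Server
    run /usr/sbin/setld -l "$KIT"
    # 5. Start msql and RMS on the Management Server
    run /sbin/init.d/msqld start
    run /sbin/init.d/rms start
    # 6. Install across all domains
    run sra command -domains all -m 1 -command "/usr/sbin/setld -l $KIT"
    # 7. Start RMS on all nodes
    run sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms start"
    # 8. Restart partitions
    run rcontrol start "partition=$PARTITION"
}
```

Calling install_patch with the default DRY_RUN=1 only prints the ten
commands in order. Setting DRY_RUN=0 would execute them, so that mode
should only be used as root on the Management Server. A removal wrapper
would follow the same pattern with "setld -d" and the ordering given in
the removal steps.
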