AlphaServer SC patch kit:
==========================

AlphaServer SC 2.5 UK1
Kit Name: SCV25UK199935
Release Date: 042903
PTR: 153-2-1075
IPMT Number: CFS.99935

Abstract: Support for 'Partition Blocked Timeout', more frequent polling
of the housekeeper when the partition is blocked, and a correction to CPU
usage counts after frequent resource suspend/resume.

Description of Patch:
=====================

This kit installs a pmanager which supports the new 'Partition Blocked
Timeout' behaviour. This defines the time (default 15 minutes) that
pmanager will wait when the partition is blocked before it polls each
rmsd in turn. Any nodes that do not respond will be configured out with
the comment 'no response' in the pmanager logfile. This process will
continue until a working set of nodes is reached.

This timeout value can be controlled by the partition-blocked-timeout
attribute. Its value is in seconds (900 corresponds to the 15 minute
default) and must be at least twice the inactivity timeout.

  # rmsquery -v "select * from attributes where name = 'partition-blocked-timeout'"
  name                       val
  --------------------------------
  partition-blocked-timeout  900

This pmanager also polls the housekeeper more frequently when the
partition is blocked, incrementing up to the default polling interval.

Finally, it fixes a bug relating to the incorrect calculation of CPU
usage counts after repeated resource suspend/resume.

Kit checksum:
=============

  bash-2.02$ cksum SCV25UK199935.tar.gz
  2820657694 595646 SCV25UK199935.tar.gz

Updated files:
==============

  /usr/opt/rms/bin/pmanager
  /usr/bin/pmanager

Dependencies:
=============

This patch should be installed over the RMS kit shipped with UK1.

Instructions:
=============

This patch is provided as a setld installable kit. Unpack it into a
directory that is NFS mounted on all domains, e.g. /usr/kits/, and
install it as follows:

1. Stop Partitions, e.g.:
   # rcontrol stop partition=parallel

2. Stop RMS on all nodes, e.g.:
   # sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms stop"

3. Stop RMS and msql on Management Server:
   # /sbin/init.d/rms stop
   # /sbin/init.d/msqld stop

4. Install on Management Server:
   # /usr/sbin/setld -l SCV25UK199935

5. Start RMS and msql on Management Server:
   # /sbin/init.d/msqld start
   # /sbin/init.d/rms start

6. Install across all domains, e.g.:
   # sra command -domains all -m 1 -command "/usr/sbin/setld -l SCV25UK199935"

7. Start RMS on all nodes, e.g.:
   # sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms start"

8. Create the partition-blocked-timeout attribute with a value of 900:
   # rcontrol create attribute = partition-blocked-timeout val=900

9. Restart Parallel partition:
   # rcontrol start partition=parallel

--------

To remove the patch use the following steps:

1. Stop Partitions, e.g.:
   # rcontrol stop partition=parallel

2. Remove the partition-blocked-timeout attribute:
   # rcontrol remove attribute = partition-blocked-timeout

3. Stop RMS on all nodes, e.g.:
   # sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms stop"

4. Delete across all domains, e.g.:
   # sra command -domains all -m 1 -command "/usr/sbin/setld -d SCV25UK199935"

5. Stop RMS and msql on Management Server:
   # /sbin/init.d/rms stop
   # /sbin/init.d/msqld stop

6. Delete from Management Server:
   # /usr/sbin/setld -d SCV25UK199935

7. Start RMS and msql on Management Server:
   # /sbin/init.d/msqld start
   # /sbin/init.d/rms start

8. Start RMS on all nodes, e.g.:
   # sra command -domains all -m 1 -command "CluCmd /sbin/init.d/rms start"

9. Restart Parallel partition:
   # rcontrol start partition=parallel
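
--------

Note on choosing a partition-blocked-timeout value: the description
states that the value must be at least twice the inactivity timeout.
The sketch below shows one way to check a proposed value before
creating the attribute. It is not part of the kit: the function name
check_blocked_timeout is hypothetical, and the inactivity timeout must
be supplied by the caller, since this document does not say how it is
stored or what its default is.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with SCV25UK199935.
# Usage: check_blocked_timeout <blocked-timeout-secs> <inactivity-timeout-secs>
# Returns 0 if the blocked timeout is at least twice the inactivity
# timeout, 1 if it is too small, and 2 if an argument is not a
# non-negative decimal integer.
check_blocked_timeout() {
    # Reject empty or non-numeric arguments before doing arithmetic.
    for v in "$1" "$2"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "check_blocked_timeout: arguments must be whole seconds" >&2
                return 2
                ;;
        esac
    done

    # The rule from the patch description: blocked >= 2 * inactivity.
    if [ "$1" -ge $(( 2 * $2 )) ]; then
        return 0
    fi

    echo "partition-blocked-timeout $1 is less than twice the inactivity timeout $2" >&2
    return 1
}
```

For example, with an assumed inactivity timeout of 300 seconds,
"check_blocked_timeout 900 300" succeeds, so the value 900 used in
step 8 of the installation would be acceptable. With an inactivity
timeout of 600 seconds the same call fails, and a larger value would
be needed.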