AlphaServer SC patch kit:
==========================

AlphaServer SC Version : SC V2.6 UK1
Kit Name:                SCV26UK13209964871
Release Date:            07/04/2005
QuIX:                    QXCM1000220347/QXCM1000225914
WFM Number:              1206623459/3209964871

Abstract: scfs: Use REMOTEREAD to avoid elan driver prefetch problem

Description of Patch:
=====================

The elan driver has a built-in prefetch algorithm that tries to predict
what pages a remote process will read next and speculatively prepares
these pages for transfer. This prefetcher has problems with DMAs that go
right up to a 256k elan page boundary if the next page has the permission
"local_read" (which is the most secure and means "can only be read by
local processes"). The prefetcher can attempt to read ahead into a
protected page where it will get an "access denied" error. This error
causes the prefetcher to place an entry in the DMA retry queue that is
very difficult for the ep_dma_retry thread to retransmit if the
destination node is very busy - there is a race condition that the
sending node is likely to lose over and over again for an extended
period (maybe hours).

SCFS uses local_read pages for all data that it will only read locally.
This is the correct thing for SCFS to do, but the prefetcher will have
problems if it happens to read ahead into one of these pages. Quadrics
believe that this is exactly what is happening in the "scfs_buf_wait"
stalls being experienced by several large customers. The larger and
busier the system the more likely this is to happen and the longer it
will last.

The most elegant solution would be to rewrite this part of the elan
prefetcher, however this is a very complex task. Instead we are opting
for a simpler and more reliable solution - we can avoid the problem
completely by having SCFS use a lower protection level for its pages
(remote_read) that allows the prefetcher to read them. That's exactly
what this patch does.

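The boundary condition described above (a DMA that runs right up to a
256k elan page boundary) can be illustrated with simple arithmetic. This
is only a sketch for reasoning about the description, not the elan
driver's actual logic: the helper name ends_on_page_boundary and the
variable ELAN_PAGE_BYTES are hypothetical, and it assumes "256k" means
262144 bytes.

```shell
# Assumption: "256k" in the description means 256 * 1024 bytes.
ELAN_PAGE_BYTES=262144

# Hypothetical helper: succeeds (exit status 0) if a transfer of $2 bytes
# starting at byte offset $1 ends exactly on a 256k page boundary, which
# is the case where a read-ahead would touch the following page.
ends_on_page_boundary() {
    end=$(( $1 + $2 ))
    [ $(( end % ELAN_PAGE_BYTES )) -eq 0 ]
}

# Example calls (offsets and lengths are made-up values):
#   ends_on_page_boundary 258048 4096 && echo "ends on a boundary"
#   ends_on_page_boundary 4096 1024   || echo "ends inside a page"
```
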
Note: In order for this workaround to operate reliably, we must be sure
that all DMAs are correctly aligned on 256k boundaries. This is why the
updated ep3.mod elan driver SCV26UK11206623459C is a pre-requisite for
this patch.

Kit location:
=============

The patch kit is SCV26UK13209964871.tar.gz and it is available from ITRC.

Prerequisites:
==============

Before installing this Patch kit, you should ensure the following:

1) You have all mandatory patches for this release installed
2) You have SCV26UK11206623459C installed. You can find this on ITRC

Kit checksum:
=============

# cksum SCV26UK13209964871.tar.gz
3984345051 702080 SCV26UK13209964871.tar.gz

Updated files:
==============

A list of the files included in this patch is given below along with
the cksum values for each file:

751783857  559290 /usr/opt/SCFS/sys/scfs.mod
1554028406 661224 /usr/opt/SCFS/sys/scfs_client.mod

Instructions:
=============

This patch is provided as a setld installable kit. Unpack it into a
directory that is NFS mounted on all domains e.g.
/nfs/SCV26UK13209964871 and install it as follows:

Patch required on Management Server (if used) : NO
Patch required on Domains                     : YES

On Domains:
-----------

Install the patch as follows:

# cd /nfs/SCV26UK13209964871
# scrun -d all setld -l /nfs/SCV26UK13209964871 SCV26UK13209964871
#
# scrun -d all BuildKernels
# scrun -d all DeployKernels
# sra shutdown -domains all
# sra boot -domains all

Installation is complete at this point.

To remove the patch use the following commands:

# scrun -d all setld -d SCV26UK13209964871
#
# scrun -d all BuildKernels
# scrun -d all DeployKernels
# sra shutdown -domains all
# sra boot -domains all

Once the patch is installed and you are satisfied that it is working
correctly the GENERIC kernels should also be updated, using the
following command:

# scrun -d all DeployKernels -g

=======================================================================
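
The checksums listed under "Kit checksum" and "Updated files" can be
compared mechanically rather than by eye. The following is a minimal
sketch, assuming a POSIX shell and a cksum that prints only the CRC and
the size when it reads standard input. The helper name check_cksum is
hypothetical and is not part of the kit.

```shell
# Hypothetical helper: succeeds (exit status 0) only if the file named
# by $1 has the CRC given in $2 and the size in bytes given in $3.
check_cksum() {
    # Reading from standard input makes cksum omit the file name,
    # so its output is just "CRC size".
    actual=$(cksum < "$1") || return 2
    [ "$actual" = "$2 $3" ]
}

# Example calls, using the values listed in this document:
#   check_cksum SCV26UK13209964871.tar.gz 3984345051 702080 && echo "kit OK"
#   check_cksum /usr/opt/SCFS/sys/scfs.mod 751783857 559290 && echo "scfs.mod OK"
#   check_cksum /usr/opt/SCFS/sys/scfs_client.mod 1554028406 661224 \
#       && echo "scfs_client.mod OK"
```

The kit check applies before unpacking; the two module checks only make
sense after the kit has been installed with setld.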