PROBLEM: (#80500, #80226) (PATCH ID: OSF445-080)
********

Fixes an incorrect header length in GS-series Hierarchical Switch machine check frames. Early revisions of PCA chips do not allow DMA Window3.

PROBLEM: (81882) (PATCH ID: OSF445-129)
********

Under a specific and unlikely set of circumstances, revision 4 PCA hardware can falsely report PCI hung bus errors, which cause an uncorrectable hardware machine check and an operating system panic. The following console error messages indicate that this problem has occurred. The "PCI bus hung during Fault" text in the error message is the key to detecting this error.

Note: The false PCI bus hung error occurs only with revision 4 PCA hardware.

    iop: ioa_err_sum = 0x100000000
            Bit 32: Hose 0 PCA reported an Uncorrectable Error
    pca: pca_whatami = 0x8000034
            Bit 2: ASIC revision
            Bit 4: Backplane revision
            Bit 5: Backplane revision
            Bit 27: Microprocessor present
    pca: pca_err_sum = 0x8000
            Bit 15: PCI bus hung during Fault
    pca: ne_whatami = 0x100000302
    pca: fe_whatami = 0x100000202
    pca: pci0_err_sum = 0x10068000
            Bit 15: PCI bus hung during Fault
            Bit 17: Failing PCI command
            Bit 18: Failing PCI command
            Bit 28: PCA was PCI initiator when bus was hung
    pca: pci0_err_addr = 0x2030000
    pca: pci1_err_sum = 0x4060000
            Bit 17: Failing PCI command
            Bit 18: Failing PCI command
            Bit 26: PCI slot that was active during the failure
    pca: pci1_err_addr = 0xc0011500

    Machine Check SYSTEM Fatal Abort
    Machine check code = 0x100000202
    Ibox Status = 0000000000000000
    Dcache Status = 0000000000000000
    Cbox Address = 0000000000000000
    Fill Syndrome 1 = 0000000000000000
    Fill Syndrome 0 = 0000000000000000
    Cbox Status = 0000000000000000
    EV6 captured status of Bcache mode = 0000000000000000
    EV6 Exception Address = fffffc000097aff0
    EV6 Interrupt Enablement and Current Processor mode = 0000007ee0000000
    EV6 Interrupt Summary Register = 0000002000000000
    EV6 TBmiss or Fault status = 0000000000000000
    EV6 PAL Base Address = 0000000000020000
    EV6 Ibox control = fffffffc0c306396
    EV6 Ibox Process_context = 0000718000000004
    Cpu Fault Summary = 0x1
    QBB bit-directed reg. dump
    QBB 0 CSRs to be logged summary = 0x1100
    QBB 1 CSRs to be logged summary = 0x0
    QBB 2 CSRs to be logged summary = 0x0
    QBB 3 CSRs to be logged summary = 0x0
    QBB 4 CSRs to be logged summary = 0x0
    QBB 5 CSRs to be logged summary = 0x0
    QBB 6 CSRs to be logged summary = 0x0
    QBB 7 CSRs to be logged summary = 0x0
    panic (cpu 0): System Uncorrectable Machine Check

PROBLEM: (81582) (PATCH ID: OSF445-127)
********

When an AlphaServer GS80, GS160, or GS320 system reaches a critical temperature, it shuts itself down as a precaution. However, the power supplies can reach this temperature while it is still safe for them, and the system then issues an environmental warning and shuts itself down unnecessarily. The fix is to no longer take the power supply temperature into account when determining the system temperature.

PROBLEM: (83129, 84172) (PATCH ID: OSF445-206)
********

On Wildfire systems with PCI adapters (PCAs) of mixed revisions, the PCA registers are set up incorrectly. This leads to kernel memory faults with varying symptoms. Although this configuration is not sold, it may exist in the field and will certainly exist in-house. PCA revisions can be identified by model number:

    PCA4: B4171-AC
    PCA3: B4171-AA E02 or B4171-AB (either variant)

PROBLEM: (90551) (PATCH ID: OSF445-346)
********

This patch adds ECC information to the error log.

PROBLEM: (91926) (PATCH ID: OSF445-475)
********

This patch fixes a bit-masking error in the kernel so that correctable error reporting is turned back on for any CPU, not just CPU 0, when the correctable error throttling period expires. Throttling is a kernel mechanism that turns off correctable error reporting for a period of time when many correctable errors are reported within a short time frame. This prevents the error logs from filling up with correctable error messages.
The problem is that once correctable errors are throttled for any CPU other than CPU 0, correctable error reporting for that CPU stays turned off indefinitely, rather than being turned back on after the throttling period expires.
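The following is a hypothetical Python model, not the Tru64 UNIX kernel code. The class name CrdThrottle, the THROTTLE_PERIOD value, and the method names are assumptions made for illustration, and the notes above do not say exactly which mask was wrong. It shows one plausible form of a bit-masking error with the reported symptom: if the re-enable step clears a hard-wired bit 0 instead of the bit for the CPU whose throttling period expired, CPU 0 recovers while every other CPU stays throttled.

```python
# Hypothetical model of per-CPU correctable error throttling.
# Names and the 60-second period are illustrative assumptions,
# not actual Tru64 UNIX kernel identifiers or values.

THROTTLE_PERIOD = 60.0  # seconds that reporting stays off (assumed value)


class CrdThrottle:
    def __init__(self):
        self.throttled_mask = 0   # bit N set => reporting is off for CPU N
        self.throttled_at = {}    # cpu -> time at which throttling began

    def throttle(self, cpu, now):
        """Turn off correctable error reporting for one CPU."""
        self.throttled_mask |= 1 << cpu
        self.throttled_at[cpu] = now

    def reporting_enabled(self, cpu):
        """True if this CPU's bit is clear in the throttle mask."""
        return not (self.throttled_mask & (1 << cpu))

    def expire_buggy(self, cpu, now):
        """Illustrative bug: always clears bit 0, so only CPU 0 recovers."""
        if now - self.throttled_at.get(cpu, now) >= THROTTLE_PERIOD:
            self.throttled_mask &= ~1

    def expire_fixed(self, cpu, now):
        """Clears the bit belonging to the CPU whose period has expired."""
        if now - self.throttled_at.get(cpu, now) >= THROTTLE_PERIOD:
            self.throttled_mask &= ~(1 << cpu)


if __name__ == "__main__":
    t = CrdThrottle()
    t.throttle(cpu=2, now=0.0)
    t.expire_buggy(cpu=2, now=120.0)
    print(t.reporting_enabled(2))  # False: CPU 2 stays throttled
    t.expire_fixed(cpu=2, now=120.0)
    print(t.reporting_enabled(2))  # True: CPU 2 reporting restored
```

Using a shifted mask, 1 << cpu, ties the cleared bit to the CPU being serviced, which is the property the fix restores: with the buggy version, throttling CPU 0 and expiring it still works, which is why the problem went unnoticed on CPU 0.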