Memory parity errors: Causes and suggestions (101272)



The information in this article applies to:

  • Microsoft Windows 2000 Server
  • Microsoft Windows 2000 Advanced Server
  • Microsoft Windows 2000 Professional
  • Microsoft Windows 2000 Datacenter Server
  • Microsoft Windows NT Server 3.1
  • Microsoft Windows NT Server 3.5
  • Microsoft Windows NT Server 3.51
  • Microsoft Windows NT Server 4.0
  • Microsoft Windows NT Workstation 3.1
  • Microsoft Windows NT Workstation 3.5
  • Microsoft Windows NT Workstation 3.51
  • Microsoft Windows NT Workstation 4.0
  • Microsoft Windows NT Advanced Server

This article was previously published under Q101272
Warning: The information in this article includes suggestions regarding the examination and cleaning of hardware. If you do not have chip-maintenance experience, Microsoft recommends that you closely examine your hardware warranty information to avoid invalidating any warranty you may have, and that you seek help from a trained hardware technician to avoid any damage to the hardware. ANY USE BY YOU OF THE INFORMATION PROVIDED IN THIS ARTICLE IS AT YOUR OWN RISK. Microsoft provides this information "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

SUMMARY

This article discusses an extensive study, carried out with the aid of a high-tech SIMM tester, to determine the causes of some NMI (non-maskable interrupt) memory parity errors in Windows. The results are not conclusive, and research is ongoing.

MORE INFORMATION

Both IBM OS/2 2.x and Windows seem to experience problems that appear to be associated with system memory in some circumstances. It can be frustrating to have a system that is able to run DOS, Windows 3.1, or OS/2 1.x and suddenly find it cannot run Windows due to this problem. The first issue to clear up is that not all NMI errors are due to memory. Other boards in the system can cause this problem, and components directly on the system motherboard can be at fault.

When memory is at fault, it is usually for the following reasons:
  • The memory is not functioning at the access rate that the system board requires. If the system specification calls for an 80 ns access rate, Windows most likely fails if the memory is accessing at a slower rate, such as 90 ns. Even though the chips may be marked as 80 ns, some fail to meet this access rate in testing. Quite often, memory chips run at a slower speed when they reach operating temperature. This produces an effect called "speed drift." The symptom is a system that runs Windows when first turned on but starts having memory errors after 15 minutes or so. A high-quality SIMM tester can cycle the chips through various voltage and heat cycles, so this is fairly easy to see.
  • The memory meets the system specifications, but the speeds differ between individual SIMM modules. The average access rate may be 70 ns on one SIMM module while the next is running at 60 ns. We have found SIMMs stamped at the factory with a 70 ns average access rate that actually run as fast as 50 ns. Although such SIMMs are well within the access specification that the system requires, a difference of 10 ns or more between them can often cause problems on some systems. Note that you can move these SIMMs to a different system board that uses a different BIOS and chip set, and that board may not have any memory problems. This is because each BIOS and chip set regulates the "refresh wait states" used for timing, and this difference often makes the variance in speed acceptable. If your system's BIOS allows you to adjust the wait states for memory refresh, doing so often allows the system to run with SIMMs or DRAM memory chips that are running at different access rates. The downside to increasing the number of wait states is a slower system.
  • The individual chips on the SIMM module are running at different access rates. This requires a sensitive memory testing device to determine. It must be able to gauge the access rate of each individual bit (chip) on the module. A difference of 10 ns or more between bits has been known to cause problems. This once again can be regulated somewhat by the BIOS and chip set of the system board if it allows you to lengthen the refresh wait states for memory access.
  • One of the memory chips is being affected by "cell leakage." This ends up being a true parity error and is also known as a "soft error." It occurs when a change in the state of an individual cell (a zero or one) electrically leaks into a neighboring cell, changing its state. When the memory is read back, it no longer matches the parity bit's value, and an NMI is issued to the processor to signal that a parity error has occurred. This memory SIMM must be replaced. If problems persist with replacement chips, there is quite possibly a voltage or heat anomaly in the socket or circuitry that is damaging the chips.
  • Cache memory is another thing to suspect. We have seen instances where the cache memory access rates were too slow and caused enormous problems. On most Intel-based 486 computers, an access rate of 15 ns to 25 ns is normal. You will most likely have problems if it is slower than 25 ns. The system manufacturer can provide the specifications and locations of these chips.
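The parity check described in the cell-leakage item above can be illustrated with a minimal Python sketch. It assumes even parity over a single 8-bit value, which is an illustrative choice and not a statement about any particular chip set. The function names are hypothetical and do not come from any Windows or BIOS interface.

```python
def parity_bit(byte: int) -> int:
    """Return the even-parity bit for an 8-bit value (assumed scheme)."""
    # The parity bit is 1 when the data holds an odd number of 1 bits,
    # so that data plus parity always holds an even number of 1 bits.
    return bin(byte & 0xFF).count("1") % 2


def check(byte: int, stored_parity: int) -> bool:
    """Return True when the byte read back still matches its stored parity."""
    return parity_bit(byte) == stored_parity


# Write a byte and record its parity bit alongside it.
data = 0b10110010
parity = parity_bit(data)

# Simulate cell leakage: one neighboring cell changes state.
leaked = data ^ 0b00000100

print(check(data, parity))    # unchanged byte passes the check
print(check(leaked, parity))  # single flipped bit fails the check
```

In this model, a failed check corresponds to the condition that raises the NMI. A single parity bit detects only an odd number of flipped bits; two flipped bits in the same byte would pass the check unnoticed.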
In general, you should first carefully clean the system of dust. This includes the ventilation areas, so that heat does not build up abnormally. Clean the contacts of all boards and SIMMs to ensure good contact; you can use a pencil eraser to do this. Be certain that all boards are firmly seated in their slots or sockets. It may be necessary to replace old cabling, which can degrade over time and under high temperatures. Power supplies can also cause many problems, so have the output voltages checked if possible. Monitors can cause strange behavior on your system as well. It is also highly recommended that computers be placed on some type of surge-suppression power strip, because the return of power after an outage usually produces a fairly high surge that can permanently damage sensitive electrical components of your system.

If you add more memory to the system, it is possible that the BIOS will recognize the full amount of physical RAM that is installed in the server but that Windows will recognize only a part of the RAM. If the server has a redundant memory feature or a memory mirroring feature that is enabled, the full complement of memory may not be visible to Windows. Redundant memory provides the system with a failover memory bank when a memory bank fails. Memory mirroring splits the memory banks into a mirrored set. Both features are enabled or disabled in the BIOS and cannot be accessed through Windows. To modify the settings for these features, you may have to refer to the system user manual or the OEM Web site. Alternatively, you may have to contact the hardware vendor.

For example, if you are running a system that has 4 GB of RAM installed and you then add 4 GB of additional RAM, Windows may recognize only 4 GB of physical memory or possibly 6 GB instead of the full 8 GB. The redundant memory feature or the memory mirroring feature may be enabled on the new memory banks without your knowledge. These symptoms are similar to the symptoms that occur when you do not add the /PAE switch to the Boot.ini file.
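The arithmetic behind the 4 GB and 6 GB figures can be sketched as follows. This is an illustration only: it assumes that a mirrored bank exposes half of its capacity to Windows and that a bank held in reserve as redundant memory exposes none. These fractions, the mode names, and the function name are assumptions made for the example; actual behavior depends on the hardware vendor's implementation.

```python
# Fraction of each bank's capacity visible to the operating system,
# per mode. These values are assumptions for illustration.
USABLE_FRACTION = {"normal": 1.0, "mirrored": 0.5, "spare": 0.0}


def visible_memory_gb(banks):
    """Sum the capacity Windows would see for a list of (size_gb, mode) banks."""
    return sum(size_gb * USABLE_FRACTION[mode] for size_gb, mode in banks)


# 4 GB original + 4 GB added, no redundancy features enabled.
print(visible_memory_gb([(4, "normal"), (4, "normal")]))      # 8.0

# Mirroring enabled only on the newly added banks.
print(visible_memory_gb([(4, "normal"), (4, "mirrored")]))    # 6.0

# Mirroring enabled across all banks.
print(visible_memory_gb([(4, "mirrored"), (4, "mirrored")]))  # 4.0

# New banks held entirely in reserve as redundant memory.
print(visible_memory_gb([(4, "normal"), (4, "spare")]))       # 4.0
```

Under these assumptions, both reduced figures in the example are consistent with a BIOS-level feature withholding part of the installed memory, not with a fault in the new modules.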

Modification Type: Major
Last Reviewed: 6/17/2005
Keywords: kbHardware KB101272