FIX: MSMQ: Network Traffic from Site Controllers Increases When Site Controllers Are Unreachable (268813)



The information in this article applies to:

  • Microsoft Message Queue Server (MSMQ) 1.0

This article was previously published under Q268813

SYMPTOMS

When site controllers are unavailable, you may observe the following behavior:
  • Traffic congestion may occur, depending on network bandwidth.
  • Messages may seem to get stuck on MSMQ computers or may be delivered sporadically.
  • Network monitors may show bandwidth being exhausted by traffic between MSMQ primary site controllers.
  • MSMQ traffic to offline primary site controllers occurs on a frequent basis.

CAUSE

MSMQ site controllers (PEC/PSC) send periodic synchronization (sync) and hello messages to all other site controllers for replication and availability testing. When one of these messages cannot reach one or more site controllers, it is resent every 5 seconds until a response is received or until the time-to-live for the message expires. By default, the time-to-live is 20 minutes and is controlled by the following registry value:

HKLM\software\microsoft\msmq\parameters\ReplicationMsgTimeout
					

This registry value also controls the interval at which a new sync or hello message is sent. Because this value controls both intervals, every time one of these messages expires, a new replacement message is sent, resulting in unending traffic from all primary site controllers to an offline primary site controller until the offline primary site controller comes back online.
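The scale of this traffic can be estimated with a short calculation. The following Python sketch is a simplified model, not MSMQ code: it counts transmissions of one message type from one site controller to one offline site controller, and the function names are hypothetical. It assumes only the figures given above (a 5-second resend period and a 20-minute default time-to-live).

```python
RESEND_PERIOD_S = 5            # resend period stated in this article
DEFAULT_TIMEOUT_S = 20 * 60    # default ReplicationMsgTimeout: 20 minutes


def sends_per_message(timeout_s=DEFAULT_TIMEOUT_S, resend_s=RESEND_PERIOD_S):
    """Transmissions of one unanswered message before its time-to-live expires."""
    return timeout_s // resend_s


def sends_per_hour(interval_s, timeout_s, resend_s=RESEND_PERIOD_S):
    """Approximate transmissions per hour toward one offline controller.

    A new message is created every interval_s seconds and is resent every
    resend_s seconds until timeout_s elapses. When interval_s equals
    timeout_s (the behavior before the fix), resending never pauses.
    """
    active_s = min(timeout_s, interval_s)   # seconds of resending per cycle
    return (3600 / resend_s) * active_s / interval_s


if __name__ == "__main__":
    print(sends_per_message())          # resends of a single message
    print(sends_per_hour(1200, 1200))   # one value controls both intervals
    print(sends_per_hour(1200, 60))     # hypothetical: separate, shorter timeout
```

In this model, a single registry value that controls both intervals keeps the sender transmitting continuously, which matches the unending traffic described above.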

RESOLUTION

A supported hotfix is now available from Microsoft, but it is only intended to correct the problem that is described in this article. Only apply it to systems that are experiencing this specific problem. This hotfix may receive additional testing. Therefore, if you are not severely affected by this problem, we recommend that you wait for the next Windows NT 4.0 service pack that contains this hotfix.

To resolve this problem immediately, contact Microsoft Product Support Services to obtain the fix. For a complete list of Microsoft Product Support Services phone numbers and information about support costs, visit the following Microsoft Web site:

NOTE: In special cases, charges that are ordinarily incurred for support calls may be canceled if a Microsoft Support Professional determines that a specific update will resolve your problem. The typical support costs will apply to additional support questions and issues that do not qualify for the specific update in question.

The English version of this fix should have the following file attributes or later:
Date          Time    Version    Size      File name     Platform
-----------------------------------------------------------------
05-JUL-2000   7:06    309        227,088   Mqis.dll      x86
05-JUL-2000   7:06    309        504,080   Mqqm.dll      x86
05-JUL-2000   7:06    309        106,768   Mqutil.dll    x86
				

STATUS

Microsoft has confirmed that this is a problem in the Microsoft products that are listed at the beginning of this article.

MORE INFORMATION

This fix adds two new registry values that control the intervals at which new hello and sync request messages are sent. The ReplicationMsgTimeout value continues to control the time-to-live for both hello and sync request messages. To take advantage of the fix, add all three registry values. Set the two new interval values to 20 minutes and the ReplicationMsgTimeout value to a smaller value.
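As an illustration of this configuration, the following Python sketch generates the text of a registration (.reg) file in the REGEDIT4 format that Windows NT 4.0 uses. The 120-second ReplicationMsgTimeout value is an assumption chosen only for the example; this article recommends "a smaller value" without naming one. The helper name reg_file_text is hypothetical. Review the output before you import it, and back up the registry first.

```python
# Registry key named in this article, with HKLM written out in full
# because .reg files require the full root key name.
KEY = r"HKEY_LOCAL_MACHINE\software\microsoft\msmq\parameters"

# All three values are DWORDs expressed in seconds.
SETTINGS = {
    "IntervalBetweenHello": 20 * 60,     # 20 minutes, as recommended above
    "IntervalBetweenSyncReq": 20 * 60,   # 20 minutes, as recommended above
    "ReplicationMsgTimeout": 120,        # ASSUMPTION: illustrative smaller value
}


def reg_file_text(settings):
    """Build REGEDIT4-format text that sets each name to a DWORD value."""
    lines = ["REGEDIT4", "", "[" + KEY + "]"]
    for name, seconds in settings.items():
        # .reg files store DWORDs as eight hexadecimal digits.
        lines.append('"{}"=dword:{:08x}'.format(name, seconds))
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(reg_file_text(SETTINGS))
```

Generating the file instead of writing to the registry directly keeps the example safe to run on any computer and leaves the import step under the administrator's control.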

The ReplicationMsgTimeout value determines how long an unanswered message continues to be resent at 5-second intervals before it is discarded. Do not set this value to less than 5 seconds. Keep in mind that when a server comes back up after being offline, it may not be recognized as being online again for a period equal to the difference between the interval values and the ReplicationMsgTimeout value. This may delay message delivery significantly.
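The tradeoff can be made concrete with a small calculation. The following Python sketch is an illustration only; the function names are hypothetical, and the 120-second timeout in the usage lines is an assumed example value.

```python
MIN_TIMEOUT_S = 5   # lower bound stated in this article


def check_timeout(timeout_s):
    """Reject ReplicationMsgTimeout values below the 5-second minimum."""
    if timeout_s < MIN_TIMEOUT_S:
        raise ValueError("ReplicationMsgTimeout must be at least 5 seconds")
    return timeout_s


def recognition_gap_s(interval_s, timeout_s):
    """Worst-case quiet period, in seconds, between the expiry of one
    message and the start of the next hello or sync request cycle.

    A server that comes back online during this period is not contacted
    until the next cycle begins.
    """
    return max(0, interval_s - check_timeout(timeout_s))


if __name__ == "__main__":
    print(recognition_gap_s(20 * 60, 120))      # assumed 120-second timeout
    print(recognition_gap_s(20 * 60, 20 * 60))  # timeout equal to the interval
```

A shorter timeout reduces traffic to offline site controllers but lengthens this quiet period, so choose the value with both effects in mind.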

Existing MQIS message traffic value:

HKLM\software\microsoft\msmq\parameters\ReplicationMsgTimeout (DWORD, in seconds; default: 20 minutes)
					

The ReplicationMsgTimeout value defines the time-to-live of most MQIS messages: replication, sync request, sync reply, ack, and hello. This value existed before the fix. It is not set by default, in which case MQIS uses the 20-minute default.

New registry values to control hello and sync request messages:

HKLM\software\microsoft\msmq\parameters\IntervalBetweenHello (DWORD, in seconds; default: 20 minutes)
					

The IntervalBetweenHello value defines the interval between hello cycles.

HKLM\software\microsoft\msmq\parameters\IntervalBetweenSyncReq (DWORD, in seconds; default: 20 minutes)
					

The IntervalBetweenSyncReq value defines the interval between sync request cycles. When a PSC sends a sync request to another PSC, it waits this amount of time (multiplied by 2 and incremented by 5 seconds) before it checks for a reply. If a reply is not found, a new sync request is sent.
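As a worked example of this wait time, the following Python sketch reads "multiplied by 2 and incremented by 5 seconds" as (interval x 2) + 5. That reading is an assumption; the article does not spell out the order of operations. The function name is hypothetical.

```python
DEFAULT_SYNC_INTERVAL_S = 20 * 60   # default IntervalBetweenSyncReq: 20 minutes


def sync_reply_wait_s(interval_s=DEFAULT_SYNC_INTERVAL_S):
    """Seconds a PSC waits before it checks for a sync reply.

    ASSUMPTION: the interval is doubled first, then 5 seconds are added.
    """
    return interval_s * 2 + 5


if __name__ == "__main__":
    wait = sync_reply_wait_s()
    print("{} seconds (about {:.1f} minutes)".format(wait, wait / 60))
```

Under this reading, the default interval produces a wait of slightly more than 40 minutes before a new sync request is sent.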

NOTE: You cannot set the ReplicationMsgTimeout value to be greater than any of the intervals. In such a case, the MQIS code will reset the interval to be equal to the message timeout value.
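The adjustment described in this note can be sketched as follows. This Python code is an illustration of the stated rule, not the MQIS implementation, and the function name is hypothetical.

```python
def effective_intervals(hello_s, sync_req_s, timeout_s):
    """Return the (hello, sync request) intervals after the rule in the
    note is applied: an interval shorter than ReplicationMsgTimeout is
    raised to equal the timeout. All values are in seconds.
    """
    return max(hello_s, timeout_s), max(sync_req_s, timeout_s)


if __name__ == "__main__":
    # Timeout below both intervals: the configured values are kept.
    print(effective_intervals(1200, 1200, 120))
    # Timeout above the hello interval: the hello interval is raised.
    print(effective_intervals(300, 1200, 600))
```

A practical consequence is that lowering an interval has no effect unless the ReplicationMsgTimeout value is at or below the new interval.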

Modification Type: Minor
Last Reviewed: 10/12/2005
Keywords:kbHotfixServer kbQFE kbbug kbfix KB268813