INTRODUCTION
This article discusses common link state issues and common routing issues that
you may experience in Microsoft Exchange 2000 Server and in Microsoft Exchange Server 2003.
The purpose of a routing group
The routing group is the smallest unit of servers that are
likely to always be connected to one another. The routing group can be assumed to be one
node on the graph of connector paths, with multiple possible connectors between
routing groups.
To configure the way that messages are routed between
servers so that point-to-point connections between servers are always made, the
servers must be grouped in routing groups, and the Routing Group connectors
must be defined.
In a routing group, link state information updates and routing
information updates are pushed between master nodes and member nodes through a persistent
port 691 Transmission Control Protocol (TCP) connection. Between two routing
groups, servers advertise the X-LINK2STATE verb to exchange link state information
by comparing the MD5 digest in the Exchange organization information packet of the two
routing group bridgeheads. A mismatch triggers an exchange of link state
information between the two servers through SMTP port 25.
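The digest comparison described above can be pictured with a minimal sketch. The article does not describe how the organization information packet is laid out, so the dictionary layout, the canonical string form, and the function names here are all hypothetical; only the idea of "compare MD5 digests, exchange link state on a mismatch" comes from the text.

```python
import hashlib


def digest(link_state: dict) -> str:
    """Return an MD5 hex digest of a link state table.

    The table layout (connector name -> state) is an assumption made
    for illustration, not the real Exchange packet format.
    """
    # Sort the keys so that two tables with the same content always
    # produce the same digest, regardless of insertion order.
    canonical = "\n".join(f"{key}={link_state[key]}" for key in sorted(link_state))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def needs_exchange(local: dict, remote: dict) -> bool:
    """Return True when the digests differ, which would trigger an exchange."""
    return digest(local) != digest(remote)
```

When the two bridgeheads hold the same information, `needs_exchange` returns False and no link state data would have to be sent.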
The role of a routing group master
The routing group master coordinates changes to link states that are
learned by servers in its routing group and retrieves updates from the directory service. By having
a single server coordinate the changes, you can treat a routing group as a single
entity for the purposes of computing a least-cost path between routing groups in an organization.
What occurs when the routing group master stops responding
All servers in the routing group continue to operate on the same
information that they had at the time that they lost contact with the master.
When the routing group master comes back up, it examines the status of all other
servers, reconstructs the link state information, processes the State Change
Queue (SCQ), and then updates members in the routing group.
Common issues
The following sections present several routing issues that you
may experience. Additionally, the following sections suggest methods that you can use to troubleshoot
the issues.
Routing member node is not connected to master
When you use the WinRoute tool (Winroute.exe) to view Exchange organization routing, you may see the words "connected to master - NO" and a red X next to the organization's name. These words and the red X indicate that the routing member node is not connected to the master.
In a routing group, the routing group
nodes, including the master, must be connected to the master node on
Transmission Control Protocol (TCP) port 691 to propagate routing information and link
state information to and from the master node.
Note The Microsoft Exchange Server 2003 WinRoute tool (Winroute.exe), which you can use to troubleshoot routing in an Exchange 2000 and Exchange 2003 mail-handling environment, is available for download from the Microsoft Download Center.
For more information about how to download Microsoft support files, see the following article in the Microsoft Knowledge Base:
119591 How to obtain Microsoft support files from online services
Microsoft scanned this file for viruses. Microsoft used the most current virus-detection software that was available on the date that the file was posted. The file is stored on security-enhanced servers that help prevent any unauthorized changes to the file.
To resolve this issue, follow these steps:
- Make sure that the Exchange Routing Engine
service (RESvc) is started on all affected servers in the routing group and
that it remains in a stable state. If the service is in an unstable state,
the server may not connect to the master node. Investigate the root cause of
any unstable services before you go to the next step.
- Verify that a firewall does not restrict TCP port 691. To
do this, initiate a Telnet session to port 691 on the affected servers and on the
master node. A Microsoft Routing Engine banner indicates an active state.
- At the command prompt, run the netstat -a -n command.
The output of this command should show all member nodes, and the master itself,
connected to TCP port 691 on the master node.
- In Event Viewer, check the application logs for any events that
indicate a failure to authenticate by using the computer account, such as Domain\serverName$. Events such as Transport events 962 and 961 indicate a
failure of the RESvc service to connect.
- Verify that the SendAs right is not missing, denied, or denied through
nested membership in another group for the affected servers or for the
Exchange Domain Servers group that they belong to. To do this, run the Exchange
Trace Utility (Regtrace.exe), and then restart the RESvc service.
For more information about RegTrace setup on Exchange 2000, click the following article number to view the article in the Microsoft Knowledge Base:
238614
How to set up Regtrace for Exchange 2000
Note For additional information about tools and processes that you can use to troubleshoot and to diagnose transport issues and routing issues in Exchange 2003, see the Exchange Server 2003 Transport and Routing Guide online book.
- Verify that the affected servers can generate a
ServicePrincipalName (SPN) for authentication. To verify this, check the Network address attribute of the
affected servers by using the ADSI Edit tool (ADSIEdit.exe) or by using the Lightweight
Directory Access Protocol tool (Ldp.exe).
Nodes in a routing group have to
mutually authenticate with the routing group master to be connected. To do
this, they use the ncacn_ip_tcp value in the Network address attribute of the
Exchange Server computer to generate the SPN for the master node by using Kerberos authentication.
Make sure that this value is a Fully Qualified Domain Name (FQDN) instead of a NetBIOS
name or an IP address. Restart the RESvc service.
- Check the application log and the system log on all the affected servers for any Kerberos
authentication errors. Kerberos authentication errors may be caused by an expired domain computer account
password. To gain additional information about this
issue, run the NLTEST utility with debug flags.
For more information about how to run the NLTEST utility with debug flags, click the following article number to view the article in the Microsoft Knowledge Base:
109626
Enabling debug logging for the Net Logon service
Important If the domain computer account password has apparently expired, you must contact Microsoft Product Support Services (PSS) to confirm and to correct the issue.
- Verify that the FQDN of the
virtual server matches the FQDN in Domain Name System (DNS).
- If the membership
of the routing group spans multiple domains, make sure that DNS is correctly
designed and implemented between the domains.
- Look for any third-party applications that use Group Policy objects to restrict
permissions or to restrict security settings.
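The Telnet check on TCP port 691 in the steps above can also be scripted. The following is a minimal sketch that only tests whether a TCP connection can be opened; it does not read or validate the Routing Engine banner, and the function name is an assumption made for illustration.

```python
import socket


def port_is_open(host: str, port: int = 691, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_is_open("master.example.com")` (a hypothetical server name) returns False if a firewall blocks port 691 or if the RESvc service is not listening. A True result shows only that the port is reachable, not that the member node has attached to the master.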
Routing group master wars
In a routing group, the first server installed in the
routing group is automatically elected as the master node. As other servers are
installed, the administrator has the option to appoint another server as
master.
When the new routing group master is elected, only one server should be assigned the master role at a time. This rule is enforced
by an algorithm that is based on the formula "(N/2) + 1" (where N denotes the number of
servers in the routing group). The algorithm calculates the number of servers in the
routing group that must agree and that must acknowledge the master. Therefore, the member nodes send link state ATTACH data (information about the routing group)
to the master.
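As a worked example of the "(N/2) + 1" formula, the following sketch computes how many servers must acknowledge the master. The article does not say how N/2 is rounded when N is odd; integer (floor) division is assumed here, and the function name is hypothetical.

```python
def servers_required(n: int) -> int:
    """Return (N/2) + 1 for a routing group of n servers.

    Assumption: N/2 is rounded down, which yields a simple majority.
    """
    if n < 1:
        raise ValueError("a routing group has at least one server")
    return n // 2 + 1
```

Under this assumption, a routing group of 4 servers needs 3 acknowledgements, and a routing group of 5 servers also needs 3.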
It is not uncommon for two or more servers to have
erroneous information about which server is the current routing group master.
For example, if a routing group master was moved or was deleted, and another master node was not chosen, the MsExchRoutingMasterDN
attribute may point to a non-existent server.
This issue may also occur
when an old master does not detach as master, or when a problematic node keeps sending
incorrect link state ATTACH information.
Note In Microsoft Exchange
Server 2003, if a routing group points to a deleted object, the master
node gives up its role as master and initiates a shutdown.
To
resolve this issue, use one of the following methods:
- Look for link state data propagation through TCP port 691, for
firewall hindrances such as firewall blocking of TCP port 691, and for SMTP filters.
- Look for Active Directory replication latencies.
- Look for network problems and latencies.
- Look for deleted routing group masters or servers that no
longer exist. If this is the case, a Transport event 958 that references a
routing group master distinguished name that no longer exists is logged in
the application log. Use the Lightweight Directory Access Protocol (Ldp.exe) tool or
the ADSI Edit (Adsiedit.exe) tool to verify that this is the case.
Deleted routing groups are followed by [object_not_found_in_DS]
When servers are moved between routing groups, and when the routing
groups are subsequently deleted, if you use Winroute.exe you may see the text
[object_not_found_in_DS] next to the object name.
This issue may occur if the routing engine service
tries to correlate an object that still exists in a dynamic routing library
that is maintained by the server with objects in Active Directory, where the object
does not exist any more.
Tips to resolve this issue:
- Restart all servers in the organization at the same time. This action updates routing information. Additionally, this action removes deleted routing groups and
deleted connectors.
- Use the Remonitor.exe tool in injection mode.
Note Contact Microsoft Product Support Services for information about the Remonitor.exe tool in injection mode.
- Make sure that the servers are on a recent build of Exchange Server and that they have the Exchange Server service pack rollups installed.
Note If your servers are on a recent build of Exchange Server and have the current Exchange Server service pack rollups installed, you do not have to apply the following hotfix. If you cannot install the most recent Exchange Server service pack rollups, apply the hotfix that is described in the following Knowledge Base article:
330279 Deleted routing groups are listed in the WinRoute tool; fix requires Exchange 2000 SP3
- Restart all Exchange Server services and Windows Management
Instrumentation (WMI) services on all Exchange Server computers in the
organization. This resolution is effective only if all servers are restarted at
the same time.
Note Contact Microsoft Product Support Services for information about restarting all servers at the same time.
- Make sure that the account that is logged on to the server has
sufficient permissions. To do this, run Winroute.exe under the System Account.
Note The lack of sufficient read permissions may cause
Winroute.exe to incorrectly report [object_not_found_in_DS].
Connectors are not reported to be marked as "DOWN"
When you use the Winroute.exe tool to view Exchange
routing topology, you may see that connectors that are unavailable are reported as
being available (they are marked as "UP"). This behavior may occur for the following connectors:
- Connectors that use DNS to route. For example, this behavior may occur for SMTP
connectors that use DNS instead of a smart host.
- Microsoft Exchange 5.5 Server connectors or Exchange
Development Kit (EDK) connectors. These connectors do not use link state
routing.
- Routing group connectors with source bridgeheads of the
"any" type.
- Any connectors where one bridgehead is an Exchange 5.5 Server computer.
- Connectors that use smart host settings and recently
changed smart hosts.
Link state oscillations: connectors are repeatedly marked as "UP" and then as "DOWN"
This common scenario involves connectors being marked as "UP"
and then as "DOWN" repeatedly. It causes excessive link state updates between
servers. These excessive link state updates cause a very expensive and frequent recalculation
of routes within the server. This is also indicated by Event 4005 Reset Routes.
This issue may occur in the following scenarios:
- Network problems. Use a network trace to diagnose this
scenario.
- A reaction to link status notification calls from
underlying protocol services, such as SMTP/AQ and message transfer agent (MTA). This behavior is caused by interference on the X.400 protocol levels or on the SMTP protocol levels by third-party
applications.
In this scenario, only a network monitor capture can reveal
the issues that are involved. Additionally, if you notice very frequent changes of
the major versions, of the minor versions, and of the user versions in the WinRoute tool, this may also
indicate a link state problem (see the WinRoute
routing version changes section).
To reduce link state oscillations, apply the hotfix that is described in the following article in the Microsoft Knowledge Base:
825314 Link state traffic saturates slow links between servers
After the hotfix has been applied, you must enable the AttachedTimeout registry value to make sure that the hotfix works as expected.
Warning Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall your operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.
To enable the AttachedTimeout registry value, follow these steps:
- Click Start, click Run, type regedit, and then click OK.
- Locate the HKLM\SYSTEM\CurrentControlSet\Services\RESvc\Parameters subkey.
- Right-click the Parameters subkey, point to New, and then click DWORD value.
- Name the new value AttachedTimeout.
- Double-click AttachedTimeout, click Decimal under Base, and then type a value from 1 to 604800.
Note The AttachedTimeout value represents time in seconds. The valid range for this value is 1 second to 604,800 seconds (7 days).
- Click OK, and then quit Registry Editor.
Note Contact Microsoft Product Support Services for more information about the AttachedTimeout registry value.
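If the same value has to be set on several servers, the steps above can be captured in a registration (.reg) file. The following sketch only generates the text of such a file and checks the documented range; it does not touch the registry. The function name and the example value of 3600 seconds are assumptions for illustration, not a recommended setting.

```python
# Full name of the HKLM subkey that is named in the steps above.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\RESvc\Parameters"


def attached_timeout_reg(seconds: int) -> str:
    """Return .reg file text that sets AttachedTimeout to the given seconds."""
    # The article gives the valid range as 1 second to 604,800 seconds (7 days).
    if not 1 <= seconds <= 604800:
        raise ValueError("AttachedTimeout must be from 1 to 604800 seconds")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        # DWORD data in a .reg file is written as eight hexadecimal digits.
        f'"AttachedTimeout"=dword:{seconds:08x}\n'
    )
```

For example, `attached_timeout_reg(3600)` produces a file whose value line reads `"AttachedTimeout"=dword:00000e10`. The registry warning earlier in this section applies equally to importing a .reg file.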
How connector states affect link states
A connector can be located anywhere in any routing group in the
Exchange organization. A specific connector that is frequently marked as "UP"
and as "DOWN" may seriously affect the possible routes that a message can take through
the organization. Such a connector may even lead to mail loops.
Exchange routing
chooses the optimal path, based on variables such as cost, message type,
and restrictions. Exchange routing locates the next server for a message to make the next hop
to, and then gives the name of that server to the Advanced Queuing engine (AQ). Because the oscillating state of a connector causes link state changes,
Exchange has to repeatedly recalculate the optimal path. This
recalculation process involves queries to the directory service.
How link states affect connector states
When the Advanced Queuing engine (AQ) detects that a link to the bridgehead server on a
connector failed, it calls into routing by using a method that is named LinkStateNotify().
Routing then suppresses this information for up to 10 minutes to prevent
connector state fluctuation, and then routing relays this information to the routing group master. If routing decides to mark the connector as "DOWN," this change is
propagated to all computers in the organization, including the computer where the original failure occurred. This behavior leads to a very expensive process
that is named "reset routes." Thereafter, the routing engine no longer recommends that AQ connect to the "failed" next-hop computer. The reverse is true for
a connector that is marked as "UP."
WinRoute routing version changes
The WinRoute tool reports routing versions in the following format:
"RoutingGroup (d5.2.3)." The three numbers that are separated by periods that follow the
routing group name are the major version, the minor version, and the user version.
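When WinRoute output is collected over time, the three version numbers can be pulled out of each line and compared to see which one changes. The following is a minimal sketch that parses strings in the format shown above. The article does not explain the letter that precedes the first number ("d" in the example), so this sketch simply skips any non-digit prefix; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class RoutingVersion(NamedTuple):
    group: str
    major: int
    minor: int
    user: int


# Group name, then "(", an optional non-digit prefix, and three dotted numbers.
_PATTERN = re.compile(r"^(.+?)\s*\(\D*(\d+)\.(\d+)\.(\d+)\)$")


def parse_routing_version(text: str) -> Optional[RoutingVersion]:
    """Parse 'RoutingGroup (d5.2.3)' style text; return None if it does not match."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    group, major, minor, user = match.groups()
    return RoutingVersion(group, int(major), int(minor), int(user))
```

Parsing "RoutingGroup (d5.2.3)" yields a major version of 5, a minor version of 2, and a user version of 3. Comparing two parsed snapshots field by field shows whether the change was in the major, minor, or user version.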
Major version changes are typically changes in the directory service
that involve routing and connectors. If the major version changes frequently, monitor it
by using the Remonitor.exe tool, and then investigate the probable root cause. For
example, an administrator may make significant changes in the directory service. A
major version of zero is shown for isolated routing groups with no routing
and no link state exchange with other nodes. Additionally, a
major version of zero is shown for Microsoft Exchange 5.5
Server-based sites because they do not use link state information.
A minor version change may indicate changes to the state of a
connector. Frequent changes may be caused by faulty links or by links
that fluctuate between the "UP" state and the "DOWN" state. AQ
tries to send a message over a connector. If AQ fails, it sends a
notification to routing to mark the connector as "DOWN." Then, AQ initiates retry pings to the connector. After AQ detects that
the connector is up, AQ notifies routing by calling the LinkStateNotify()
method.
User version changes may occur in the following
situations:
- Servers attach to or detach from master nodes.
- WMI services send data to the routing group master.
- There is callback registration by routing clients such as MTA or
SMTP.
- There are routing group membership changes.
- The routing group is renamed.
- A new master node is elected.
Base-level callbacks
Routing base-level callbacks are updates that occur after a routing group
object is modified, and after the updates are then propagated throughout the organization.
The Winroute.exe major version changes may be triggered by the following
events:
- Renaming a routing group
- Electing a new routing group master
- Removing a routing group member
- Adding a routing group member
One-level callbacks
One-level callbacks are typically updates to routing when changes
that are one level below the routing group object are detected. Examples
include deleting a connector in the routing group and adding a connector to the routing group.
DNS
Incorrect configuration of Domain Name System (DNS) may cause several
routing issues. These issues are addressed in the following sections.
The DNS Resolver sink event on the SMTP virtual server
The DNS Resolver sink event
is primarily for resolving external SMTP domains. Your internal Active
Directory servers and DNS servers still have to be able to resolve all Exchange Server computers
internally.
The SMTP virtual server DNS Resolver sink event is synchronous
and can affect performance on a heavily used server. To slightly improve
the situation, increase the number of threads that are used for DNS lookups.
The
DNS Resolver sink event is used only when a server is not in the Exchange organization. Exchange Server determines this by querying Active
Directory directory service.
Windows 2000 DNS API
If you use the DNS Resolver tool for name resolution, the
lookups that are created by this tool are asynchronous and are much faster than using the default settings of the external DNS
Resolver sink event.
Exchange DNS that uses the Windows DNS API or
the Exchange DNS Resolver sink event has to be able to resolve an Internet Protocol address (IP address) in the following
ways:
- mail exchanger resource record (MX record)-to-IP address
- MX record-to-A record-to-IP address
- MX record-to-CNAME record-to-A record-to-IP address
- CNAME record-to-A record-to-IP address
- A record-to-IP address
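The resolution chains in the preceding list can be illustrated with a small sketch that follows MX, CNAME, and A records through an in-memory table until it reaches an IP address. The record table, the host names, and the function names are hypothetical, no real DNS queries are made, and MX preference values are ignored for simplicity.

```python
import ipaddress
from typing import Dict, List, Tuple

# Hypothetical zone data: name -> list of (record type, value).
RECORDS: Dict[str, List[Tuple[str, str]]] = {
    "example.com": [("MX", "mail.example.com")],
    "mail.example.com": [("CNAME", "mx1.example.com")],
    "mx1.example.com": [("A", "192.0.2.10")],
}


def _is_ip(value: str) -> bool:
    """Return True if value is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def resolve(name: str, records=RECORDS, max_hops: int = 8) -> List[str]:
    """Follow records from name and return the chain that ends in an IP address."""
    chain = [name]
    for _ in range(max_hops):
        rrset = records.get(name)
        if not rrset:
            raise LookupError(f"no records for {name}")
        rtype, value = rrset[0]  # only the first record is followed
        chain.append(value)
        if rtype == "A" or _is_ip(value):
            return chain
        name = value
    # A chain that never reaches an address (for example, a CNAME loop).
    raise LookupError(f"no address found within {max_hops} hops")
```

With the sample table, `resolve("example.com")` walks the MX record-to-CNAME record-to-A record-to-IP address path. The hop limit shows why incorrectly configured or looping records end in a lookup failure instead of an address.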
DNS records that are incorrectly configured, especially MX records and
CNAME records, may seriously affect mail flow.
Note Although Microsoft
Exchange Server 2003 does provide limited support for chained CNAME records, we
do not recommend implementing this configuration.
In Microsoft Exchange Server 2003,
the external DNS Resolver sink event has been improved. Additionally, you can
use the DNS Diagnostic utility (DNSdiag.exe) from the Windows Server 2003 Resource Kit
to troubleshoot DNS issues that involve the external SMTP resolver and the
Windows TCP/IP DNS. DNSdiag.exe shows the asynchronous queries and the synchronous queries
to Global DNS servers or to the DNS servers that are called by the DNS sink event. Additionally, DNSdiag.exe shows any
corresponding failures or errors.
Note The DNS Diagnostic utility is also known as the DNS Resolver tool. They are the same file, DNSdiag.exe.
The following file is available for download from the Microsoft Download Center:
Download the Dnsdiag.exe package now.