SYMPTOMS
An organization's intersite directory replication that previously ran
without problems may begin failing to keep up with the changes made to the
organization's directory.
This can happen when a Directory Replication bridgehead server cannot cope
with the volume of replication messages that it must send and receive. The
likely reason is that the schedule set on the organization's Directory
Replication Connectors is too demanding for the intersite replication
topology and the messaging infrastructure that support it.
The Intersite Directory Replication Schedule
Each Directory Replication Connector can be scheduled. The default for this
schedule is once every three hours. Administrators can make this schedule
more (or less) frequent, up to four times an hour.
The schedule controls when and how often the intersite replication
connector sends out update requests to the destination directory
replication connector.
When making the decision about how often directory replication should take
place, Administrators must consider the potential load put on the
organization's Directory Replication bridgehead servers and the messaging
infrastructure.
NOTE: When you set the activation schedule with the detail view set to '1
Hour', selecting a one-hour time block activates the connector four times
in that hour. To activate the connector only once an hour, you must use the
15 minute detail view.
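The effect described in the NOTE above can be sketched in a few lines of
Python. This is an illustration only: the slot model (96 fifteen-minute
slots a day, one activation per selected slot) is inferred from the NOTE,
and the function names are hypothetical, not part of any Exchange tool.

```python
SLOTS_PER_HOUR = 4  # assumption: the schedule grid resolves to 15-minute slots


def hour_block(hour: int) -> set[int]:
    """Slots selected by picking one block in the '1 Hour' detail view."""
    return {hour * SLOTS_PER_HOUR + quarter for quarter in range(SLOTS_PER_HOUR)}


def quarter_slot(hour: int, quarter: int = 0) -> set[int]:
    """Slot selected by picking one block in the 15 minute detail view."""
    return {hour * SLOTS_PER_HOUR + quarter}


def activations_per_day(selected_slots: set[int]) -> int:
    """Assumed model: the connector activates once per selected slot."""
    return len(selected_slots)


# "Every hour" set in the 1 Hour view versus the 15 minute view.
via_hour_view = set().union(*(hour_block(h) for h in range(24)))
via_quarter_view = set().union(*(quarter_slot(h) for h in range(24)))

print(activations_per_day(via_hour_view))     # 96
print(activations_per_day(via_quarter_view))  # 24
```

Under this model, an administrator who intends an hourly schedule but uses
the 1 Hour view ends up with four times as many activations a day.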
How Intersite Directory Replication Works
Below is the minimum set of events seen when a directory replication
connector is activated (according to the schedule) and the diagnostic
logging category 'Replication' (on the Directory Service object) is set to
Maximum (on both adjacent directory bridgehead servers).
The requesting bridgehead connector for each context will log the
following:
   1068 - Ask for updates for naming context (either site or
          configuration)
   1100 -> Message submitted
   1058 - Completed successfully
For each request message, the responding bridgehead server will log the
following:
   1099 <- Message received from requesting directory
   1070 - The context to get, from the starting USN
   1071 - The number of objects retrieved, and entries up to the USN
   1101 -> Message submitted back to the requesting connector
Finally, the requesting bridgehead server will log the following for each
received message:
   1099 <- Message received back from the remote Directory
This entire set of events will be seen twice (once for the configuration
naming context and once for the site naming context) for each site listed
as an 'Inbound site' on the requesting directory replication connector's
Sites tab.
NOTE: Every message sent out by the requesting bridgehead server will
result in a reply from the remote bridgehead server.
Maximum Objects Sent per Request
A maximum of 512 objects will be sent back to a requesting bridgehead
server in any one response message. If the remote directory bridgehead
server has more than 512 objects to send, it will send an additional
message indicating that it has more objects. Subsequently, the requesting
bridgehead server, when ready, will issue a request for the next set of
objects. This prevents the requesting bridgehead server from becoming
overloaded (for example, when doing a 'refresh all items in directory').
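As a rough illustration of the 512-object limit, the following Python
sketch estimates how many response messages one naming context needs for a
given backlog. The function name is hypothetical, and the sketch assumes
that a request with nothing to return still produces one reply, which
follows from the NOTE that every request results in a reply.

```python
import math

MAX_OBJECTS_PER_RESPONSE = 512  # limit stated in this article


def response_messages_needed(pending_objects: int) -> int:
    """Estimate the replies needed to transfer a backlog of objects.

    Assumption: each reply carries at most 512 objects, and even an empty
    result set still costs one reply.
    """
    if pending_objects <= 0:
        return 1
    return math.ceil(pending_objects / MAX_OBJECTS_PER_RESPONSE)


for backlog in (0, 512, 513, 10000):
    print(backlog, response_messages_needed(backlog))
```

Each reply beyond the first also requires a further request from the
requesting bridgehead server, so a large backlog adds to the message counts
discussed elsewhere in this article.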
The Number of Directory Replication Messages a Day in the Organization
To work out the minimum number of messages intersite directory replication
will generate in your organization on a typical day, you can apply the
simple formula:
   N = number of sites participating in intersite directory replication
   M = (N-1) * 2 = number of request messages each site sends per
       activation (one for each naming context of every other site)
   F = number of times each connector is active a day (24 / 3 = 8 by
       default)
   2 = factor, because every request gets a reply
   N sites * (M * F * 2) = Intersite Replication Messages per day
For example, using the default schedule:
   10 sites * (18 * 8 * 2) = 2880 messages a day
   20 sites * (38 * 8 * 2) = 12160 messages a day
   30 sites * (58 * 8 * 2) = 27840 messages a day
NOTE: Additional replication messages will be generated for the 'address
book views' naming context, but these are relatively few and their number
is not affected by the directory replication schedule.
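The formula above can be checked with a short Python sketch. The function
name and parameter names are illustrative only. The calculation is exactly
the one given in the formula and ignores the 'address book views' messages.

```python
def intersite_messages_per_day(sites: int, activations_per_day: int = 8) -> int:
    """Minimum intersite replication messages a day for the organization.

    sites               -- N, sites taking part in intersite replication
    activations_per_day -- F, connector activations a day (default 24 / 3)
    """
    if sites < 2:
        return 0  # a single site has nothing to replicate between sites
    # M: one request for each of the two naming contexts of every other site
    requests_per_activation = (sites - 1) * 2
    # The trailing factor of 2 accounts for the reply to every request.
    return sites * requests_per_activation * activations_per_day * 2


for n in (10, 20, 30):
    print(n, intersite_messages_per_day(n))
# 10 2880
# 20 12160
# 30 27840

# The most frequent schedule, four activations an hour (F = 96):
print(intersite_messages_per_day(10, activations_per_day=96))  # 34560
```

Because M grows with N, the total grows roughly with the square of the
number of sites, so a schedule that is comfortable for 10 sites may be far
too demanding for 30.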
CAUSE
Directory replication will slow down significantly if the Directory
Replication Connector becomes active (according to the schedule) before the
replies from the remote bridgehead server for the previous cycle have been
processed.
In this situation, the connector must presume that the replies that have
not been processed will not be forthcoming. Thus, the connector will
request the same updates again (plus any changes made since) from the
remote bridgehead server.
It is possible that in a large organization where an aggressive schedule
for intersite replication has been set, these messages may be getting held
up in the directory service mailbox on the directory bridgehead server.
NOTE: This can reasonably be expected in organizations that have
implemented the widely adopted hub and spoke directory replication
topology, where central hub server(s) are responsible for passing directory
updates between the spokes.
The process of taking the messages from the directory service's inbox and
giving them to the Directory Replication Agent is performed on a single
thread in the Directory service (DSAMAIN). The messages in the inbox must
also be sorted (TABLE_SORT_ASCEND) on the client submit time
(PR_CLIENT_SUBMIT_TIME). This sort becomes a computationally expensive
operation as messages build up.
To see the number of messages waiting to be processed by the Directory
Replication Agent, view the "Total no. Items" column for the Directory
Service on the mailbox resources page of the Private Information Store
object in the Exchange Server Administrator program.
If you believe messages are building up in this mailbox, you might want to
observe what the Directory Service is doing. To do this using Performance
Monitor, add the following counter:
   Object: Thread
   Counter: % Processor Time
   Instance: All the instances for DSAMAIN (use shift key to select)
If the directory service is failing to keep up with the demand, you will
notice a single thread consuming the majority of the processor time
(between 50 percent and 90 percent), while the remaining threads each use
less than 5 percent. The busy thread is responsible for passing the
messages from the Directory Service inbox to the Directory Replication
Agent while keeping the inbox sorted on 'client submit time'.
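If the Performance Monitor readings are exported or noted down, the pattern
described above can be checked with a small Python sketch. Everything here
is hypothetical: the function name, the instance labels, and the sample
values are made up for illustration, and the thresholds are simply the 50
percent and 5 percent figures quoted above.

```python
from typing import Optional

BUSY_THRESHOLD = 50.0  # percent; lower bound quoted for the busy thread
IDLE_THRESHOLD = 5.0   # percent; upper bound quoted for the other threads


def find_bottleneck_thread(samples: dict[str, float]) -> Optional[str]:
    """Return the one busy DSAMAIN thread if the readings match the pattern.

    samples maps a thread instance label to its % Processor Time reading.
    Returns None when the readings do not show exactly one busy thread
    with all other threads nearly idle.
    """
    if len(samples) < 2:
        return None  # the pattern needs other threads to compare against
    busy = [name for name, pct in samples.items() if pct >= BUSY_THRESHOLD]
    if len(busy) != 1:
        return None
    others_idle = all(
        pct < IDLE_THRESHOLD for name, pct in samples.items() if name != busy[0]
    )
    return busy[0] if others_idle else None


# Made-up readings for illustration only.
readings = {"DSAMAIN/0": 1.2, "DSAMAIN/1": 78.5, "DSAMAIN/2": 0.4}
print(find_bottleneck_thread(readings))  # DSAMAIN/1
```

A match is only a hint that the inbox-processing thread is saturated; the
"Total no. Items" count for the Directory Service mailbox should be checked
as well.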