SUMMARY
This step-by-step article describes the various symptoms and
resolution methods that you can use to troubleshoot intra-site replication
failure issues.
Important Although Directory Services (DS) replication and File Replication
Services (FRS) use the same connection setup mechanisms and replication
schedules, they are two completely different components. This article describes
common tools and techniques to troubleshoot replication connection objects to
diagnose intra-site replication problems.
Common Symptoms of Replication Failure
Common symptoms that indicate intra-site replication failure
include the following:
- Users and computers do not receive updated
policies.
- The correct SYSVOL share content is not replicated to all
domain controllers (DCs).
Note This may also occur because of an FRS failure.
To troubleshoot these issues, use the following utilities:
- Domain Controller Diagnostics (Dcdiag.exe) and Network
Diagnostics (Netdiag.exe) utilities. You can obtain these tools from the
Windows 2000 Support Tools on the Windows 2000 CD-ROM. For additional information
about how to obtain and use the Dcdiag.exe and Netdiag.exe diagnostic
utilities, click the following article number to view the article in the
Microsoft Knowledge Base:
265706
DCDiag and NetDiag in Windows 2000 Facilitate Domain Join and DC
Creation
- Replication diagnostics utility (Repadmin.exe). Use this
tool to verify correct site links and to display inbound and outbound
connections. You can also use it to display the replication queue. You can
obtain this tool from the Windows 2000 Support Tools on the Windows 2000
CD-ROM. For additional information about how to obtain and use
the Repadmin.exe utility, click the following article number to view the
article in the Microsoft Knowledge Base:
229896
Using Repadmin.exe to Troubleshoot Active Directory
Replication
- File Replication Service utility (Ntfrsutl.exe).
- Active Directory Replication Monitor (Replmon.exe). You can
obtain this tool from the Windows 2000 Support Tools on the Windows 2000
CD-ROM.
The following list describes the basic steps to follow when you
try to troubleshoot problems of this type:
- Make sure that the Domain Name System (DNS) is correctly
configured. A correct DNS configuration is necessary for directory
replication to work correctly.
- Make sure that you can use the Ping.exe utility to "ping"
the domain controller by host name and IP address from its hub
partner.
- Make sure that computers in the branch can resolve names in
the hub. For example, "ping" server1.domain1.site1.forest.com.
- Make sure that you can ping servers by their Globally
Unique Identifiers (GUIDs) as they are listed in the event logs. If you can
successfully ping a server by its host name, but not by its GUID, a DNS
configuration problem exists.
- Run the Dcdiag.exe utility. This utility runs a series of
tests, with the result of either "Passed" or "Failed". Make sure that all tests
pass.
- View the Directory Service log of the Event Viewer on the
branch with which you experience problems. Investigate and resolve all errors.
- Verify correct site links by using the Repadmin.exe utility
with the /showreps switch.
- Verify inbound connections by using the Repadmin.exe
utility with the /showconn switch.
- View all the log files in the Winnt\Debug folder.
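The GUID-based ping in the list above depends on the alias that each domain controller registers in DNS in the form GUID._msdcs.forest_root_name. The following Python sketch only builds that name so that you can pass it to Ping.exe. It is an illustration, not part of the Support Tools; the function name and the forest name forest.com are assumptions, and the GUID is the sample value that appears later in this article.

```python
import uuid


def dsa_guid_dns_name(dsa_guid: str, forest_root: str) -> str:
    """Return the GUID-based DNS alias of a domain controller.

    dsa_guid    -- the GUID as it is listed in the event log
    forest_root -- DNS name of the forest root domain (assumed known)
    """
    # uuid.UUID validates the text and normalizes it to lowercase
    # hyphenated form; surrounding braces are tolerated.
    guid = uuid.UUID(dsa_guid.strip("{}"))
    return f"{guid}._msdcs.{forest_root.strip('.').lower()}"


if __name__ == "__main__":
    # Hypothetical forest name; substitute your own.
    print(dsa_guid_dns_name("B2F6F255-4446-45E8-81A3-0649D5D71A66", "forest.com"))
```

If the resulting name does not resolve but the host name does, the DNS configuration is the likely cause, as the list above describes.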
Specific Symptoms and Troubleshooting Steps
Note In the following sections, the domain controller that is
reporting the problem is referred to as the "destination server". The domain
controller from which the destination server tries to replicate content is
referred to as the "source server".
"Access Denied" Errors
When you use the Repadmin.exe tool with the
/showreps switch, one or more "Access Denied" error messages are listed in
the replication status information that is returned. This indicates that the
domain controller was unsuccessful when it last tried to contact the other
domain controller. Because a domain controller is a member of the Enterprise
Domain Controllers group, it is authorized to call any function on another
domain controller. If calls between domain controllers result in
"Access Denied" errors, the cause is therefore not a lack of correct
credentials but a misconfiguration on one of the domain controllers.
- If the error is "ERROR_ACCESS_DENIED", look for a Kerberos
problem.
- If the error is "ERROR_DRA_ACCESS_DENIED", verify that the
computer accounts of both computers involved are correct in both copies
of the directory. Make sure that the
userAccountControl attribute holds the correct value for a domain
controller.
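As an illustration of what a correct userAccountControl value looks like for a domain controller, the following Python sketch tests a value against the two flags that this article names later, UF_SERVER_TRUST_ACCOUNT (0x2000) and UF_TRUSTED_FOR_DELEGATION (0x80000). The function name is hypothetical, and the sketch assumes that you have already read the decimal value from the computer account, for example with LDP.

```python
from typing import List

# Flag values as listed in this article.
UF_SERVER_TRUST_ACCOUNT = 0x2000
UF_TRUSTED_FOR_DELEGATION = 0x80000

DC_REQUIRED_FLAGS = {
    "UF_SERVER_TRUST_ACCOUNT": UF_SERVER_TRUST_ACCOUNT,
    "UF_TRUSTED_FOR_DELEGATION": UF_TRUSTED_FOR_DELEGATION,
}


def missing_dc_flags(user_account_control: int) -> List[str]:
    """Return the names of the required flags that are not set."""
    return [
        name
        for name, bit in DC_REQUIRED_FLAGS.items()
        if not user_account_control & bit
    ]


if __name__ == "__main__":
    value = 532480  # the example value from this article
    print(hex(value), missing_dc_flags(value))
```

An empty list means that both flags are present; any name in the list identifies a flag to investigate on that computer account.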
Repadmin.exe or Replmon.exe Report "Access Denied" for a Particular Directory Partition
This issue typically indicates a Kerberos authentication problem,
although there are several exceptions. To resolve the replication failure in
this case, resolve the authentication failure before you try to fix the
replication problem. To resolve this issue:
- Make sure that the "Access this computer from network" user
right in the source server's security policy includes the destination server's
computer account. You can grant this right through the Everyone group, through
the Enterprise Domain Controllers group, or by specifying the account individually.
- Make sure that the Key Distribution Center service is
started. You can use Dcdiag.exe to test for service failure on all domain
controllers by using the dcdiag /test:services
command.
Note In this command, there is a colon between "test" and
"services".
- Make sure that the destination server has connection
objects from other source servers. If it does not, you may have to create
manual connections if the Knowledge Consistency Checker (KCC) does not
automatically create them, or if it has been disabled.
- Make sure that the KCC topology is connected. If the KCC
has not formed a full topology, changes cannot be replicated. To test this, use
the dcdiag /test:topology command, specifying the domain
topology that you want to check.
- Make sure that the Trust computer for
delegation check box is selected on the General tab
of the domain controller Properties
dialog box in the Active Directory Users and Computers MMC snap-in.
- If the problem exists between domain controllers from
different domains, check the trust relationship. To do so, use the Active
Directory Domains and Trusts snap-in or the netdom trust
trusting_domain_name /domain:trusted_domain_name /verify /kerberos command.
- Make sure that each computer is synchronized for the
Configuration Naming Context (Config NC). The KCC must know what the servers
and sites are. You can use the repadmin /syncall command
to force a server to become up-to-date with the whole enterprise. Specify that
the naming context that you want to synchronize is the Config NC. Make sure
that your site link topology is correct. Force the KCC to run on each server to
rebuild the topology, or wait 15 minutes.
- Make sure that key bridgehead servers are operational. You
must determine if changes can flow throughout the enterprise. Run the
dcdiag /test:intersite command one time for each site.
This command returns the names of the bridgehead servers and whether or not
they are reporting errors.
- Check the value of the
userAccountControl attribute. Make sure that the UF_SERVER_TRUST_ACCOUNT
(0x2000) and the UF_TRUSTED_FOR_DELEGATION (0x80000) flags are set. For example, if you convert the attribute
value of 532480 decimal to hexadecimal, it becomes 0x82000, of which 0x80000
corresponds to UF_TRUSTED_FOR_DELEGATION and 0x2000 corresponds to UF_SERVER_TRUST_ACCOUNT.
- Use the Replmon.exe utility to determine if the pwdLastSet and unicodePwd attributes have consistent time/date stamps across
computers.
- Make sure that service principal names (SPNs) are
registered on each domain controller. Use the
dcdiag /test:outboundsecurechannels command to test this.
You can identify the SPN that is used for replication by the GUID at the start of the SPN, for example:
E3514235-4B06-11D1-AB04-00C04FC2DCD2/b2f6f255-4446-45e8-81a3-0649d5d71a66/domain.com.
- Force all computer accounts to be replicated throughout
the enterprise. That means that all domain controllers must be synchronized
with all other copies of their domain. For each computer that is reporting a
replication error such as "Access Denied", use the
repadmin /syncall command to force that computer to
become up-to-date. Note that you must specify the domain that you want to
synchronize.
- You may receive the following error message when you run
the previous Repadmin.exe command:
The security context
could not be established due to a failure in the requested quality of
service.
If you do, increase the Internal Processing diagnostic logging level and
look for "DSID" values in the logged events.
Contact Microsoft Product Support Services (PSS) for information about how to
obtain the Dsid.exe tool.
- Make sure that the Enterprise Domain Controllers group has
the required permissions on the directory partition ACLs:
- Start the Active Directory Users and Computers
snap-in.
- On the View menu, click
Advanced Features, if it is not already selected.
- Right-click the root domain object, and then click
Properties.
- Click the Security tab, click
ENTERPRISE DOMAIN CONTROLLERS in the name list, and then make
sure that the following permissions are selected under Allow:
Manage Replication Topology
Replicating Directory Changes
Replication Synchronization
- Use the Active Directory Sites and Services snap-in to
make sure that the Server object and its corresponding "NTDS Settings" child
object exist in the correct site.
- Check the destination server for old or invalid tickets to
the source server. Use the Kerbtray and Klist Windows 2000 Resource
Kit utilities to perform these tests. Use the NETDOM
RESETPWD command to reset the account password and write this
change to an immediate replication partner. This effectively changes the
password, sets the old and new passwords to be the same, and then writes this
change to the replication partner. This requires that you use the following
command or that you restart the computer:
"The DSA Operation Is Unable to Proceed Because of a DNS Lookup Failure" Error
To troubleshoot this error:
- Use the Nltest /dsgetdc: /pdc /force /avoidself command to determine if the correct PDC is returned.
- If there is a connection object but no replication link is
reported by the Replmon.exe or Repadmin.exe utilities, the problem might be related to the KCC.
- Run the following commands on the PDC, and then submit the
output to Microsoft PSS for more troubleshooting:
nltest /DBFLAG:0x2000FFFF
-and-
nltest /DSGETDC: /GC
- Run the nltest /dsgetdc: /gc /force
command to determine if you can contact a global catalog server
(GC).
- Check the "password last changed" parameter on both the PDC
and the server(s) with which you experience the problem.
Operation Queued or No Replication Links Displayed
No replication links are reported when you run the Repadmin.exe or
Replmon.exe utilities. To troubleshoot this issue, trigger the KCC and look in
the Directory Service log for any events that relate to the KCC. This symptom
typically points to a failure to communicate with a domain controller.
Replication Access Denied or Naming Context Is in the Process of Being Deleted
You receive one of the following messages when you try to trigger
replication:
Replication access is
denied.
-or-
The naming context is in the process
of being deleted.
This may occur if the user who is using the Active
Directory Sites and Services snap-in to trigger replication on a domain
controller does not have the appropriate permission to initiate replication.
Check the credentials of the user who performs this operation.
Duplicate Connection Objects Between Sites
To troubleshoot this issue:
WARNING: If you use Registry Editor incorrectly, you may cause serious
problems that may require you to reinstall your operating system. Microsoft
cannot guarantee that you can solve problems that result from using Registry
Editor incorrectly. Use Registry Editor at your own risk.
- Determine if explicit bridgeheads between sites were used
in the past and not removed, or are currently used and misconfigured. One way
to verify this is to use the LDP tool to connect to the Inter-Site Topology
Generator (ISTG) in the site that has duplicate connections. Browse
through the Config NC to the Inter-Site Transports container, then to
cn=ip, and view this object. If it contains the
"bridgeheadServerListBL" attribute, explicit bridgeheads exist.
For additional information about how to determine
the ISTG of a Site, click the following article number to view the article in
the Microsoft Knowledge Base:
224599
Determining the Inter-Site Topology Generator (ISTG) of a Site in the Active Directory
- Determine if the duplicate connections appear in all sites
or in a particular subset. Look for a pattern such as duplicate connections
between certain sets of servers. In a site that has duplicate connections, view
the fromServer attribute on the duplicate connection. For that "fromServer",
consider the site in which the "fromServer" resides. Try to isolate the
activities in that site. How many servers are in that site? Are there any
servers that are reachable, by using the Ping utility from the ISTG?
- Make sure that the replication interval is appropriately
set and the ISTG can complete its replication.
- To help isolate duplicate connection issues:
- Pick a DC that is building duplicate inbound intersite
connections, that is, connections with the same source DC and destination DC, not just the
same source site and destination site. The selected DC must be the ISTG for its
site. You can determine the ISTG for a site by viewing the NTDS Site Settings
properties for that site in the Active Directory Sites and Services snap-in.
- Increase the Directory Service event log to a very
large size. For example, 64 megabytes (MB).
- Use Registry Editor to set the
1 Knowledge Consistency Checker value to a data value of 5 and the
9 Internal Processing value to a data value of 1 in the
following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics
- Run the ldifde -f before.ldf -d
"CN=Sites,CN=Configuration,DC=Site1,DC=Forest1,DC=com"
command.
- Let T0=current time.
- Run the repadmin /kcc command,
and then wait for it to complete.
- Start the Event Viewer, and then make sure that the
Directory Service event log recorded informational events back to time T0 (including KCC event 1009, "The consistency checker has started
updating the replication topology for this server"). If not, double the event
log size and go back to the "Let T0=current time" step.
- Save the Directory Service event log.
- Run the ldifde -f after.ldf -d
"CN=Sites,CN=Configuration,DC=Site1,DC=Forest1,DC=com"
command.
- Review the Before.ldf, the After.ldf, and the Directory
Service event log for more analysis.
Group Policy Is Applied Inconsistently Across Domain Controllers
You can use the following example script to make sure that Group
Policy has replicated correctly throughout the domain controllers in your
domain.
Microsoft provides
programming examples for illustration only, without warranty either expressed
or implied, including, but not limited to, the implied warranties of
merchantability and/or fitness for a particular purpose. This article assumes
that you are familiar with the programming language being demonstrated and the
tools used to create and debug procedures. Microsoft support professionals can
help explain the functionality of a particular procedure, but they will not
modify these examples to provide added functionality or construct procedures to
meet your specific requirements. If you have limited programming experience,
you may want to contact a Microsoft Certified Partner or the Microsoft
fee-based consulting line at (800) 936-5200. For more information about
Microsoft Certified Partners, visit the following Microsoft Web site
For additional information about the support options available
from Microsoft, visit the following Microsoft Web site:
Use the chkpolicy domain_name command to run this script, where
domain_name is the name of your domain:
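@echo off
REM Usage: \logs\chkpolicy domain_name
REM Lists the GPTTMPL.INF file of the Default Domain Controllers Policy on each
REM domain controller so that you can compare file sizes and time stamps.
set dom_name=%1
set filename=sysvol\%dom_name%\Policies\{6AC1786C-016F-11D2-945F-00C04fB984F9}\Machine\Microsoft\Windows NT\SecEdit\GPTTMPL.INF
nltest /dclist:%dom_name% > dclist.tmp
if exist dclist1.tmp del dclist1.tmp
REM Keep only the first token of each line of the Nltest output.
FOR /F "eol=; tokens=1 delims=, " %%i in (dclist.tmp) do (
@echo %%i >> dclist1.tmp
)
REM Reduce each entry to its host name, and then list the file on that server.
FOR /F "eol=. tokens=1 delims=. " %%i in (dclist1.tmp) do (
@echo %%i
dir "\\%%i\%filename%"
)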
The Directory Service Is Too Busy to Complete the Operation
You may receive error 8438, ERROR_DS_DRA_BUSY, "The directory
service is too busy to complete the replication operation at this time." This
is the error that the Directory Service returns when it has made progress
removing the Naming Context (having removed 500 objects), but there are too
many objects to complete in one pass without tying up the replication queue. If
Global Catalog cleanup is preventing successful replication, you can create a batch
file to speed up the process. You can then re-promote the computer to act as a
global catalog server. The following example script provides this
functionality:
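setlocal
REM Replace the placeholder with the name of the destination global catalog server.
set destgc=__setgcnamehere__.site1.forest1.com
:domain1
repadmin /delete DC=domain1,DC=site1,DC=forest1,DC=com %destgc% /nosource
REM 8438 (ERROR_DS_DRA_BUSY) means that objects remain; repeat the same delete.
if %errorlevel% == 8438 goto :domain1
:domain2
repadmin /delete DC=domain2,DC=Site1,DC=forest1,DC=com %destgc% /nosource
if %errorlevel% == 8438 goto :domain2
REM ... repeat the same pattern for each remaining domain
endlocal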
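The idea behind this script is to repeat the delete operation for as long as the directory service answers with error 8438. The following Python sketch shows that retry logic in isolation. It is only an illustration under stated assumptions: run_delete is a hypothetical callable that performs one delete pass and returns its exit code (in practice it might wrap a call to Repadmin.exe), and the pass limit is an arbitrary safety value.

```python
import time
from typing import Callable, Tuple

ERROR_DS_DRA_BUSY = 8438  # error code quoted in this article


def delete_until_done(
    run_delete: Callable[[], int],
    max_passes: int = 100,
    pause_seconds: float = 0.0,
) -> Tuple[int, int]:
    """Repeat one delete pass while the directory service reports busy.

    Returns (number_of_passes, last_exit_code).
    """
    for attempt in range(1, max_passes + 1):
        code = run_delete()
        if code != ERROR_DS_DRA_BUSY:
            # Either the naming context is gone or a different error occurred.
            return attempt, code
        time.sleep(pause_seconds)
    raise RuntimeError("still busy after %d passes" % max_passes)


if __name__ == "__main__":
    # Simulated exit codes: busy twice, then success.
    codes = iter([8438, 8438, 0])
    print(delete_until_done(lambda: next(codes)))
```

Returning the last exit code lets the caller distinguish a completed removal from a different failure that also ends the loop.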
Advanced Troubleshooting Techniques
Knowledge Consistency Checker and ISTG
You can create an event log for the Knowledge Consistency Checker
that contains more diagnostic information. To do this, perform the following
steps on the ISTG of the site where duplicate connections appear:
- Save the contents of the event log, and then clear the
event log.
- Set the 1 Knowledge Consistency Checker
registry DWORD value to 5 in the following registry
subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics
- Run the Knowledge Consistency Checker by running the
repadmin /kcc command.
- Reset the 1 Knowledge Consistency Checker
registry DWORD value to 0 (zero).
- Save the new event log.
To obtain a new baseline measurement:
- Make sure that the computer has a site link to the hub. If
it does not, create one.
- Delete all connection objects that come into the
computer.
- Run the Knowledge Consistency Checker by running the
repadmin /kcc command.
- Make sure that it has created the connections you expect by
running the repadmin /showconn command.
- Look in the Directory Service event log for errors. You may
see errors (for example, event ID 1265) indicating that a replica cannot be
added for naming context X, and error
Y. Determine if the error is related to a DNS issue
or if it is a connectivity error, and then try to correct the corresponding
problem. If the error indicates that a target account name is incorrect or if
it is an SPN error, it may be more difficult to resolve.
- If the event log reports that the replica was added
successfully, check this by running the repadmin /showreps
command.
After you adjust site link replication intervals, wait for the
configuration change to replicate to other hub servers, and then restart each
of the hub servers to clear the replication queue. You can use the
repadmin /sync command or the Active Directory Sites and
Services snap-in to force replication of the Configuration naming context so
that the updated site links are visible on each of the hub servers before you
restart them. Use the Dcdiag.exe utility to assess the replication health of
each site. You can run it remotely through a script and parse the output for
the word "fail". You can use the following sample script as an example:
REM check replications in site site1
dcdiag /s:dc1 /test:replications /a /n:domain1
dcdiag /s:dc1 /test:replications /a /n:domain2
dcdiag /s:dc1 /test:replications /a /n:domain3
REM check replications in site site2
REM continue Dcdiag statements for domains in site2
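Parsing the output for the word "fail" can be sketched in Python as follows. The sketch assumes that the Dcdiag.exe output has already been redirected to a text file and read into a string; the function name and the sample lines are illustrative only and are not captured from a real run.

```python
from typing import List


def failing_lines(dcdiag_output: str) -> List[str]:
    """Return every output line that contains the word "fail" in any case."""
    return [
        line.strip()
        for line in dcdiag_output.splitlines()
        if "fail" in line.lower()
    ]


if __name__ == "__main__":
    # Illustrative text only; real output depends on your environment.
    sample = """
    ......................... DC1 passed test Connectivity
    ......................... DC1 failed test Replications
    """
    for line in failing_lines(sample):
        print(line)
```

An empty result suggests that no test reported a failure, but it is still worth reviewing the complete output for warnings.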
File Replication Service (FRS)
- If you suspect that Directory Service replication is
working, but that FRS is failing, make sure the FRS post-Service Pack 1 (SP1)
hotfix is installed on all replication partners. This update is included in
Service Pack 2 and Service Pack 3 for Windows 2000.
- Run the Ntfrsutl ds command to verify the following:
- Make sure that there is only one subscriber object with
the name "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" and that it has a "Member Ref".
For example:
- SUBSCRIBER: DOMAIN SYSTEM VOLUME (SYSVOL
SHARE)
- Member Ref:
CN=TEST1,CN=Domain System Volume (SYSVOL
share),CN=File Replication Se...
- Locate the member object output ("dump") for this
domain controller, and then make sure that it has a Server Ref
and a Computer Ref attribute. Also make sure that at least one
connection exists right under this member object. This is the inbound
connection to this domain controller. For example: MEMBER: TEST1
- Server Ref : CN=NTDS
Settings,CN=TEST1,CN=Servers,CN=Default-First-Site-Name,CN=Sit...
- Computer Ref :
cn=test1,ou=domain
controllers,dc=domain1,dc=site1,dc=forest1,dc=com...
- DN :
cn=d7874204-c331-4750-82ec-30b96a8ec732,cn=ntds
settings,cn=test1,cn=s...
- Make sure that at least one other member object has
this domain controller as its inbound partner. The Partner
Dn attribute indicates which partner this connection is from.
- Partner Dn : cn=ntds
settings,cn=test1,cn=servers,cn=default-first-site-name,cn=sit...
- Run the Ntfrsutl command to check the following:
- Make sure that the replica set DOMAIN SYSTEM
VOLUME (SYSVOL SHARE) has a Service State value of
ACTIVE. For example:
ServiceState : 3 (ACTIVE)
- Make sure that there is at least one inbound and one
outbound connection from this domain controller. For example:
Inbound : FALSE
Inbound : TRUE
- Increase FRS logging levels. To do this, add the following
registry values to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters
registry subkey:
Value name: Debug Log Severity
Value type: REG_DWORD
Value: 0x00000004
Value name: Debug Maximum Log Messages
Value type: REG_DWORD
Value: 50000
Value name: Debug Log Files
Value type: REG_DWORD
Value: 0x00000032
- To aid troubleshooting, you can "dump" the state of the FRS
on a domain controller to a file. Use the following sample script as an example
of how to do this:
@echo off
REM FRS_CHECK.CMD - Records the state of FRS
SETLOCAL ENABLEEXTENSIONS
SET FRSCK=C:\FRS_CHECK
if NOT EXIST %FRSCK% (md %FRSCK%)
REM run dcdiag
dcdiag > %FRSCK%\dcdiag.txt
REM For FRS
ntfrsutl ds > %FRSCK%\ntfrs_ds.txt
ntfrsutl sets > %FRSCK%\ntfrs_sets.txt
ntfrsutl inlog > %FRSCK%\ntfrs_inlog.txt
ntfrsutl outlog > %FRSCK%\ntfrs_outlog.txt
ntfrsutl version > %FRSCK%\ntfrs_version.txt
regdmp HKEY_LOCAL_MACHINE\system\currentcontrolset\services\NtFrs\Parameters > %FRSCK%\ntfrs_reg.txt
dir \\.\sysvol /s > %FRSCK%\ntfrs_sysvol.txt
REM scan the frs debug logs for errors.
findstr /i ":SO: error invalid fail abort warn" %windir%\debug\ntfrs_*.log | findstr /v "IO_PEND ERROR_SUCCESS FrsErrorSuccess" > %FRSCK%\ntfrs_errscan.txt
REM For DS replication
repadmin /showreps > %FRSCK%\ds_showreps.txt
repadmin /showconn > %FRSCK%\ds_showconn.txt
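The error scan near the end of this script keeps log lines that contain any of the listed keywords in any letter case, and then drops lines that contain one of the known harmless strings. The following Python sketch reproduces that two-stage filter for readers who want to post-process the logs elsewhere. It treats every search term as a literal substring, which is an assumption about how the terms are meant to match, and the function name is hypothetical.

```python
from typing import Iterable, List

# Keywords from the first findstr call (matched without regard to case).
KEYWORDS = (":so:", "error", "invalid", "fail", "abort", "warn")
# Strings from the second findstr call (matched with exact case, then excluded).
EXCLUDE = ("IO_PEND", "ERROR_SUCCESS", "FrsErrorSuccess")


def scan_frs_log(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like errors and are not known noise."""
    hits = []
    for line in lines:
        lowered = line.lower()
        has_keyword = any(word in lowered for word in KEYWORDS)
        is_noise = any(text in line for text in EXCLUDE)
        if has_keyword and not is_noise:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Made-up lines for illustration, not real FRS log entries.
    demo = [
        "status ERROR_SUCCESS",
        "Replication failed badly",
        "nothing to report",
    ]
    print(scan_frs_log(demo))
```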