How to identify logical corruption problems in Exchange Server (828068)

SUMMARY

This article discusses methods that you can use to identify and correct common problems that are caused by logical corruption. The following are possible causes of logical corruption in databases of Microsoft Exchange Server versions that are later than Microsoft Exchange Server 5.5:

Deleting Edb.log

A single transaction can change several pages. Some records in the log file (Edb.log) may describe changes to pages that have already been flushed to the database, while other changes from the same transaction exist only in the log. If Edb.log is deleted, those remaining changes are lost, and the result is a partially committed transaction. A partially committed transaction may cause a minor problem, such as a partially moved message, or it may cause a major internal inconsistency in one or more B-trees.

Write-Back Caching

If you run an unprotected write-back cache against the log drive, logical corruption may occur. An unprotected write-back cache is a cache that loses its data if power is interrupted. When a write operation to the log disk is complete, Extensible Storage Engine 98 (ESE 98) commits the transaction and treats the data as durable on the disk. Durable updates persist even if the computer stops responding immediately after the transaction is committed, because the recovery process at restart completes any unfinished operations that the transaction requires. After the data is written to the log disk, ESE 98 is free to flush the corresponding page to the database disk. If the cache reported the write as complete but the data never reached the log disk, the same problem occurs as when Edb.log is deleted: a transaction may be only partially committed, and the recovery process cannot restore the database to the state that it was in before the database was corrupted. Therefore, the database remains corrupted.

The problems that may occur when you use write-back caching do not prevent you from using write-back caching. However, because the potential for problems does exist, you must make sure that the data in the cache is correctly protected with a battery backup, with error checking, with error correction, and with sound operational procedures.
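The ordering that is described in this section can be illustrated with a small toy model. The following sketch is not ESE code. The class name, the page names, and the way that the cache is modeled are all assumptions that are made for illustration. The sketch shows why a log write that is acknowledged but then lost leaves a transaction only partially applied after recovery.

```python
# Toy model only: illustrates write-ahead ordering, not ESE internals.
# All names here (ToyStore, "inbox", "archive") are hypothetical.

class ToyStore:
    def __init__(self):
        self.log_disk = []   # log records that are durable on the log disk
        self.db_disk = {}    # pages that are durable on the database disk
        self.cache = []      # log records held only in a volatile cache

    def commit(self, txn, protected_cache):
        """Log every page change of a transaction, then report the commit."""
        for page, value in txn.items():
            record = (page, value)
            if protected_cache:
                self.log_disk.append(record)  # the write survives power loss
            else:
                self.cache.append(record)     # acknowledged, but volatile

    def flush_page(self, page, value):
        """Flush one changed page to the database disk."""
        self.db_disk[page] = value

    def power_loss(self):
        """An unprotected cache loses everything it has not written out."""
        self.cache.clear()

    def recover(self):
        """Replay the durable log records into the database."""
        for page, value in self.log_disk:
            self.db_disk[page] = value
        return dict(self.db_disk)


def run(protected_cache):
    store = ToyStore()
    # One transaction that moves a message and therefore touches two pages.
    txn = {"inbox": "message removed", "archive": "message added"}
    store.commit(txn, protected_cache)
    store.flush_page("inbox", txn["inbox"])  # only one page reaches the database
    store.power_loss()
    return store.recover()


print(run(protected_cache=True))
print(run(protected_cache=False))
```

With a protected cache, recovery replays both changes. With an unprotected cache, the log records are gone, so only the page that was already flushed carries the change, and the message is partially moved.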

Repairing by using Eseutil /p

Logical corruption may also occur when you run the eseutil /p command. Jet-level database repairs cause some data loss and inefficient space usage in the database because the repairs fix the database by removing the data that is corrupted. After such a repair, you must run an offline defragmentation (eseutil /d) to reclaim the wasted space. After you run the offline defragmentation, you must run an integrity check (Isinteg -fix) to resolve any Store-level logical corruption that the repair process might have introduced. Additionally, ordinary transactions against the database could fail if the space tree prevented the insertion of new records into a table. Therefore, run the defragmentation before you run the integrity check.
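The order of the three steps can be written down as a short sketch. The function name and the database path are hypothetical, and the sketch only builds and prints the commands. It does not run them. Only the switches that this article names are used, so the Isinteg command line is incomplete as shown.

```python
def repair_plan(database_path):
    """Return the repair steps in the order that the article describes."""
    return [
        ("repair", ["eseutil", "/p", database_path]),
        ("offline defragmentation", ["eseutil", "/d", database_path]),
        # The article names only the -fix switch. Isinteg needs further
        # arguments that are not covered here, so they are left out.
        ("integrity check", ["isinteg", "-fix"]),
    ]


# Hypothetical database path, used only to show the output format.
for step, command in repair_plan(r"D:\exchsrvr\mdbdata\priv1.edb"):
    print(step + ": " + " ".join(command))
```

Keeping the steps in one ordered list makes it harder to run the integrity check before the defragmentation by mistake.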

MORE INFORMATION

Microsoft Exchange Server 2003 and Microsoft Exchange 2000 Server have safeguards that help prevent logical corruption. These safeguards are not included in Exchange Server 5.5 or earlier versions. Extensible Storage Engine 98 (ESE 98) includes additional functionality that prevents the wrong log or the wrong log sequence from being written into a database.

In Exchange 2000 Server and later versions, each log record carries a before time stamp and an after time stamp. Each time stamp is a DBTIME value. When a logged change is about to be written to a page in the database, if the DBTIME value in the log record is higher than the DBTIME value on the page, ESE 98 logs the following error in the application event log:

JET_errDbTimeTooOld (-566)

If the DBTIME timestamp value in the log record is lower than the DBTIME timestamp value on the page, ESE 98 logs the following error in the application event log:

JET_errDbTimeTooNew (-567)

In both cases, the recovery process is not successful, and the failure of the recovery process prevents logical corruption in the database.
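The comparison above can be sketched as a small function. This is a simplified illustration and not ESE code. The function and parameter names are hypothetical, and the sketch assumes that the record's "before" time stamp is the value that is compared with the page. It models nothing else about recovery.

```python
# Error codes as given in this article.
JET_errDbTimeTooOld = -566
JET_errDbTimeTooNew = -567


def check_dbtime(record_dbtime_before, page_dbtime):
    """Return 0 if the record matches the page, else the error code.

    Assumption: the record's "before" DBTIME is the value that is
    compared with the DBTIME that is stored on the page.
    """
    if record_dbtime_before > page_dbtime:
        return JET_errDbTimeTooOld  # the page is older than the record expects
    if record_dbtime_before < page_dbtime:
        return JET_errDbTimeTooNew  # the page is newer than the record expects
    return 0


print(check_dbtime(10, 7))
print(check_dbtime(5, 7))
print(check_dbtime(7, 7))
```

In this sketch, any mismatch returns an error instead of applying the record, which mirrors how a failed recovery keeps the wrong log from changing the database.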

If an administrator restores the database from an online backup and then incorrectly runs soft recovery by using the eseutil /r command, the patch files are ignored, and only the log files are replayed. This may cause logical corruption in the database. To prevent this problem, ESE 98 makes it impossible to run soft recovery on a database that requires hard recovery. If you try to run soft recovery on a database that requires hard recovery, ESE 98 logs the following error in the application event log:

JET_errSoftRecoveryOnBackupDatabase (-544)

When an inconsistent database is initialized, the initialization process verifies that the log range that is specified in the database header is present in the log directory and that the database and log signatures are correct. If the log range that is specified in the Log Required value is not present, the recovery process stops and the following error is logged:

JET_errRequiredLogFilesMissing (-543)
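The presence check on the required log range can be sketched as follows. This is an illustration under stated assumptions and not the ESE implementation. The function name is hypothetical, log files are represented only by their generation numbers, and the signature check is not modeled.

```python
# Error code as given in this article.
JET_errRequiredLogFilesMissing = -543


def check_required_logs(required_range, present_generations):
    """Check that every log generation in the required range is present.

    required_range: (low, high), inclusive, as it would be read from
    the Log Required value in the database header.
    present_generations: generation numbers that are found in the log
    directory.
    Returns (error_code, missing_generations).
    """
    low, high = required_range
    missing = [g for g in range(low, high + 1) if g not in present_generations]
    if missing:
        return JET_errRequiredLogFilesMissing, missing
    return 0, []


print(check_required_logs((5, 8), {5, 6, 8}))
print(check_required_logs((5, 8), {4, 5, 6, 7, 8, 9}))
```

Returning the list of missing generations together with the error code shows which files have to be found before recovery can continue.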

The check for required log files prevents logical corruption when a log file is missing.

When the recovery process starts, ESE 98 also detects whether a log transaction that is about to be written into a consistent database is not the next transaction in the log sequence. If there is a gap between the last transaction that was written into the database and the log transaction that is about to be written, ESE 98 logs the following error in the application event log:

JET_errAttachedDatabaseMismatch (-1216)
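When you review the application event log, it can help to have the error codes from this article in one place. The following lookup is a convenience sketch. The table and function names are hypothetical, and the descriptions are shortened from the text of this article.

```python
# Error names and codes as given in this article.
ESE_ERRORS = {
    -543: ("JET_errRequiredLogFilesMissing",
           "a log file in the required range is not present"),
    -544: ("JET_errSoftRecoveryOnBackupDatabase",
           "soft recovery was tried on a database that requires hard recovery"),
    -566: ("JET_errDbTimeTooOld",
           "DBTIME in the log record is higher than DBTIME on the page"),
    -567: ("JET_errDbTimeTooNew",
           "DBTIME in the log record is lower than DBTIME on the page"),
    -1216: ("JET_errAttachedDatabaseMismatch",
            "a gap exists in the log sequence for a consistent database"),
}


def describe(code):
    """Return a one-line description of an error code from this article."""
    if code not in ESE_ERRORS:
        return "%d: not covered in this article" % code
    name, meaning = ESE_ERRORS[code]
    return "%s (%d): %s" % (name, code, meaning)


for code in sorted(ESE_ERRORS, reverse=True):
    print(describe(code))
```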