Large Text Files Are Not Fully Indexed (318747)



The information in this article applies to:

  • Microsoft SharePoint Portal Server 2001

This article was previously published under Q318747
IMPORTANT: This article contains information about modifying the registry. Before you modify the registry, make sure to back it up and make sure that you understand how to restore the registry if a problem occurs. For information about how to back up, restore, and edit the registry, click the following article number to view the article in the Microsoft Knowledge Base:

256986 Description of the Microsoft Windows Registry

SYMPTOMS

If you crawl documents on a computer that is running SharePoint Portal Server, large text files may not be fully indexed.

The Microsoft Search service may log a warning message that is similar to the following in the Application log in Event Viewer:
Event Type: Warning
Event Source: Microsoft Search
Event Category: Gatherer
Event ID: 3035
Date: 1/1/2002
Time: 12:00:00 PM
User: N/A
Computer: COMPUTERNAME
Description:
One or more warnings or errors were logged to file <C:\Program Files\SharePoint Portal Server\Data\FTData\SharePointPortalServer\GatherLogs\WORKSPACE\WORKSPACE.1.gthr>. If you are interested in these messages, please, look at the file using the gatherer log query object (gthrlog.vbs, log viewer web page).

Context: SharePointPortalServer Application, WORKSPACE Catalog

The Content Source log may also contain error messages that are similar to the following:
Time: 1/1/2002 12:00:00 PM
Type: Document Added
Message: Error fetching URL, (8004173e - The document was too large to filter in its entirety. Portions of the document were not emitted.)
URL: file://./backofficestorage/localhost/sharepoint portal server/workspaces/HOME/Do...

Time: 1/1/2002 12:00:00 PM
Type: Document Added
Message: Error fetching URL, (8004173e - The document was too large to filter in its entirety. Portions of the document were not emitted.)
URL: \\.\backofficestorage\localhost\sharepoint portal server\workspaces\HOME\documen...
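If the Content Source log is long, it can help to list only the entries that report error 8004173e. The following Python sketch assumes that you have copied the log text into a string and that each entry uses the Time, Type, Message, and URL lines shown in the samples above, with a blank line between entries. The function name find_truncated_documents is hypothetical and is not part of SharePoint Portal Server.

```python
import re

# HRESULT that the log reports when a document is too large to filter completely.
TOO_LARGE_HRESULT = "8004173e"


def find_truncated_documents(log_text: str) -> list[str]:
    """Return the URL of each log entry whose Message contains 8004173e.

    Assumes entries are blocks of "Name: value" lines (Time, Type, Message,
    URL) separated by blank lines, as in the samples in this article.
    """
    urls = []
    for entry in re.split(r"\n\s*\n", log_text.strip()):
        fields = {}
        for line in entry.splitlines():
            # Split at the first colon only, so URLs and times keep their colons.
            name, sep, value = line.partition(":")
            if sep:
                fields[name.strip().lower()] = value.strip()
        if TOO_LARGE_HRESULT in fields.get("message", "").lower() and "url" in fields:
            urls.append(fields["url"])
    return urls
```

The returned URLs identify the documents that were only partly indexed and that are candidates for the registry changes in the "Resolution" section.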
NOTE: To view the Content Source log for a workspace, browse to the following URL on your SharePoint Portal Server computer (where computer_name is the name of your SharePoint Portal Server computer, and workspace is the name of your workspace):

http://computer_name/workspace/portal/resources/updatelog.asp?Workspace=workspace
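The URL in the NOTE above follows a fixed pattern. As a small illustration, the following Python sketch builds it from a computer name and a workspace name. The function name content_source_log_url and the sample computer name SPSSERVER are hypothetical, and the sketch assumes that the workspace name needs only standard URL escaping.

```python
from urllib.parse import quote, urlencode


def content_source_log_url(computer_name: str, workspace: str) -> str:
    """Build the updatelog.asp URL for a workspace's Content Source log."""
    # The workspace name appears twice: in the path and as the Workspace parameter.
    path = f"/{quote(workspace)}/portal/resources/updatelog.asp"
    query = urlencode({"Workspace": workspace})
    return f"http://{computer_name}{path}?{query}"
```

For example, content_source_log_url("SPSSERVER", "HOME") returns http://SPSSERVER/HOME/portal/resources/updatelog.asp?Workspace=HOME.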

CAUSE

This issue can occur if a file is larger than the size limits that SharePoint Portal Server uses by default. These default limits are set for performance reasons, and content beyond them is not indexed.

RESOLUTION

Indexing Large Text Files

To resolve this issue for large text files (.txt), increase the MaxTextFilterBytes registry value.
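Before you change the value, you may want to find out which text files are larger than the current limit. The following Python sketch walks a folder and reports .txt files whose size on disk exceeds a given MaxTextFilterBytes value. It assumes that the file size on disk approximates the amount of data that the text filter processes, and that the crawled files are reachable from the computer where you run it. The function name oversized_text_files is hypothetical.

```python
import os

# Default MaxTextFilterBytes value from this article, in bytes.
DEFAULT_MAX_TEXT_FILTER_BYTES = 25_000_000


def oversized_text_files(root: str, max_text_filter_bytes: int = DEFAULT_MAX_TEXT_FILTER_BYTES):
    """Yield (path, size) for each .txt file under root that is larger than the limit.

    Assumes the size on disk approximates what the text filter must process.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".txt"):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size > max_text_filter_bytes:
                    yield path, size
```

Under the same assumption, a file that this sketch reports needs a MaxTextFilterBytes value at least as large as its size to be processed in full.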

WARNING: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.

To change the MaxTextFilterBytes registry value:
  1. Start Registry Editor (Regedt32.exe).
  2. Locate the following key in the registry:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex

  3. Double-click the MaxTextFilterBytes value, set the radix to Decimal, and then type the new value. The value is the maximum amount of data (in bytes) that the text filter processes from a single file. (The default value is 25,000,000 bytes, or approximately 25 megabytes.)
  4. Quit Registry Editor.
See the "More Information" section of this article for a description of the MaxTextFilterBytes value.
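As an alternative to editing the value by hand, you can describe the change in a registry (.reg) file and import it with Regedit.exe. This approach is not part of the procedure above, and the registry warning in this article still applies. The following Python sketch, with the hypothetical function name max_text_filter_bytes_reg, generates the text of a REGEDIT4-format file that sets MaxTextFilterBytes. It assumes that the value is a REG_DWORD, which .reg files express as eight hexadecimal digits.

```python
# Registry path and limits taken from this article.
CONTENT_INDEX_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex"
MAX_DWORD = 0xFFFFFFFF


def max_text_filter_bytes_reg(new_value: int) -> str:
    """Return the text of a REGEDIT4 .reg file that sets MaxTextFilterBytes."""
    if not 1 <= new_value <= MAX_DWORD:
        raise ValueError("MaxTextFilterBytes must be between 1 and 4294967295")
    lines = [
        "REGEDIT4",
        "",
        f"[{CONTENT_INDEX_KEY}]",
        # REG_DWORD data is written as eight hexadecimal digits in .reg files.
        f'"MaxTextFilterBytes"=dword:{new_value:08x}',
        "",  # end the file with a line break
    ]
    return "\r\n".join(lines)
```

For example, max_text_filter_bytes_reg(100000000) produces the line "MaxTextFilterBytes"=dword:05f5e100, and the default of 25,000,000 bytes corresponds to dword:017d7840.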

Indexing Other Types of Large Documents

To fully index large documents of most other types, increase the MaxDownloadSize and MaxGrowFactor registry values:
  1. Start Registry Editor (Regedt32.exe).
  2. Locate the following key in the registry:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Search\1.0\Gathering Manager

  3. Double-click the MaxDownloadSize value, set the radix to Decimal, and then type the new value. The value is the maximum size (in megabytes) of a file that the gatherer downloads.
  4. Double-click the MaxGrowFactor value, set the radix to Decimal, and then type the new value. The value limits the size of the index filter output for a document, expressed as a multiple of the MaxDownloadSize value.
  5. Quit Registry Editor.
See the "More Information" section of this article for a description of the MaxDownloadSize and MaxGrowFactor values.
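As with MaxTextFilterBytes, these two values can also be described in a .reg file and imported with Regedit.exe instead of being edited by hand; this alternative is not part of the procedure above, and the registry warning still applies. The following Python sketch, with the hypothetical function name gathering_manager_reg, generates REGEDIT4-format text for both values. It assumes that both values are REG_DWORD, written as eight hexadecimal digits.

```python
# Registry path and limits taken from this article.
GATHERING_MANAGER_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Search\1.0\Gathering Manager"
MAX_DWORD = 0xFFFFFFFF


def gathering_manager_reg(max_download_size_mb: int, max_grow_factor: int) -> str:
    """Return REGEDIT4 .reg text that sets MaxDownloadSize and MaxGrowFactor."""
    values = {"MaxDownloadSize": max_download_size_mb, "MaxGrowFactor": max_grow_factor}
    lines = ["REGEDIT4", "", f"[{GATHERING_MANAGER_KEY}]"]
    for name, value in values.items():
        if not 1 <= value <= MAX_DWORD:
            raise ValueError(f"{name} must be between 1 and 4294967295")
        # Both values are REG_DWORD, written as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    lines.append("")  # end the file with a line break
    return "\r\n".join(lines)
```

For example, gathering_manager_reg(64, 4) sets MaxDownloadSize to dword:00000040 (64 MB) and keeps MaxGrowFactor at its default of 4.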

NOTE: After you make these changes, restart the Microsoft Search service. If you want your documents to be re-indexed immediately, do a full update on the content source that contains the large files.

MORE INFORMATION

For additional information, click the article number below to view the article in the Microsoft Knowledge Base:

287231 Search Only Indexes 16 Megabytes of a Document

The following registry keys and values are used in this article:
  • Indexing Service. The Content Indexing service registry values are located in the following registry path:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex

    The Content Indexing service value that is used in this article is:
    • MaxTextFilterBytes. The MaxTextFilterBytes value specifies the maximum amount of information that the text filter can process from a single file with a well-known extension.

      Type: REG_DWORD
      Units: Bytes
      Default: 25000000 (approximately 25 MB)
      Range: 1-4294967295 (0xFFFFFFFF)
  • Gatherer Service. The Gatherer service registry values are located in the following registry path:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Search\1.0\Gathering Manager

    The Gatherer service values that are used in this article are:
    • MaxDownloadSize. The MaxDownloadSize value specifies the maximum size of the document text that is filtered.

      Type: REG_DWORD
      Units: Megabytes
      Default: 16 (16 MB)
      Range: 1-4294967295 (0xFFFFFFFF)
    • MaxGrowFactor. The MaxGrowFactor value specifies how large (as a factor of the MaxDownloadSize value) the output of the Index Filter on the document can be.

      Type: REG_DWORD
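      Units: Multiplier (a factor of the MaxDownloadSize value)
      Default: 4 (with the default MaxDownloadSize of 16 MB, filter output of up to 64 MB)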
      Range: 1-4294967295 (0xFFFFFFFF)
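The Gatherer service values combine as follows: the gatherer downloads at most MaxDownloadSize megabytes of a document, and the index filter output can be up to MaxGrowFactor times the MaxDownloadSize value. The following Python sketch computes these limits in bytes, and the smallest whole-megabyte MaxDownloadSize setting that covers a document of a given size. It assumes that a megabyte here means 1,048,576 bytes; the function names gatherer_limits and min_download_size_mb are hypothetical.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def gatherer_limits(max_download_size_mb: int = 16, max_grow_factor: int = 4) -> tuple[int, int]:
    """Return (download limit, filter output limit) in bytes.

    The defaults are the MaxDownloadSize and MaxGrowFactor defaults from this article.
    """
    download_limit = max_download_size_mb * BYTES_PER_MB
    # MaxGrowFactor is a multiple of the MaxDownloadSize value.
    return download_limit, download_limit * max_grow_factor


def min_download_size_mb(document_size_bytes: int) -> int:
    """Return the smallest whole-megabyte MaxDownloadSize that covers the document."""
    # Round up to the next whole megabyte; the registry range starts at 1.
    return max(1, -(-document_size_bytes // BYTES_PER_MB))
```

With the default settings, gatherer_limits() returns (16777216, 67108864): a 16 MB download limit and a 64 MB filter output limit.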

Modification Type: Major    Last Reviewed: 1/3/2003
Keywords: kbprb KB318747