Description of crawl behavior and of crawl types in SharePoint Portal Server 2003 (840167)



The information in this article applies to:

  • Microsoft Office SharePoint Portal Server 2003

INTRODUCTION

This article lists the crawl types that are available in Microsoft Office SharePoint Portal Server 2003. It also identifies, for each crawl type, the conditions under which content is removed from a content index.

MORE INFORMATION

The following table lists conditions under which content may be removed from a content index and shows, for each crawl type, whether the content is removed.

Condition | Full crawl | Incremental crawl | Incremental-inclusive crawl | Adaptive crawl
--------- | ---------- | ----------------- | --------------------------- | --------------
An HTTP 300 error message is returned. | Content is not removed. | Content is not removed. | Content is not removed. | Content is not removed.
An HTTP 400 error message is returned. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed.
An HTTP 500 error message is returned. | Content is not removed. | Content is not removed. | Content is not removed. | Content is not removed.
A Web page on the portal site is deleted. | Content is removed after the third crawl. | Content is not removed. | Content is immediately removed. | Content is not removed.
A Web page on a Microsoft Windows SharePoint Services Web site is deleted. | Content is removed after the third crawl. | Content is not removed. | Content is immediately removed. | Content is not removed.
A rule is created to exclude content. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed.
A content source is deleted. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed. | Content is immediately removed.
A URL in the content index has no hits. | Content is removed after the third crawl. | Content is not removed. | Content is not removed. | Content is not removed.

For conditions where content is removed after the third crawl, three full updates of the content index must occur before a previously crawled page is removed from the content index. Full crawls use this logic because it is generally impossible to know exactly why a particular URL was not visited during a full crawl: the crawler in SharePoint Portal Server 2003 does not keep track of all links between URLs. The delay is therefore a precautionary measure to prevent the unintended removal of content from the content index.

For conditions where content is immediately removed, the actual time that it takes for content to be removed may vary. The actual time depends on the time it takes for the crawl operation to complete and the time it takes for the index to propagate from the index management server to the search server.
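
The removal behavior in the table above can be expressed as a small lookup, which may be convenient when you script checks around crawl schedules. The following Python sketch is illustrative only: the condition keys, the Removal enumeration, and the removal_behavior function are hypothetical names chosen for this example and are not part of SharePoint Portal Server 2003.

```python
from enum import Enum


class Removal(Enum):
    """The three outcomes that appear in the table."""
    NOT_REMOVED = "Content is not removed."
    IMMEDIATE = "Content is immediately removed."
    AFTER_THIRD_CRAWL = "Content is removed after the third crawl."


# Column order used by every row below.
CRAWL_TYPES = ("full", "incremental", "incremental-inclusive", "adaptive")

N, I, T = Removal.NOT_REMOVED, Removal.IMMEDIATE, Removal.AFTER_THIRD_CRAWL

# One row per condition in the table; the keys are hypothetical labels.
REMOVAL_TABLE = {
    "http_300": (N, N, N, N),
    "http_400": (I, I, I, I),
    "http_500": (N, N, N, N),
    "portal_page_deleted": (T, N, I, N),
    "wss_page_deleted": (T, N, I, N),
    "exclusion_rule_created": (I, I, I, I),
    "content_source_deleted": (I, I, I, I),
    "url_has_no_hits": (T, N, N, N),
}


def removal_behavior(condition: str, crawl_type: str) -> Removal:
    """Return the table entry for a condition and a crawl type."""
    column = CRAWL_TYPES.index(crawl_type)
    return REMOVAL_TABLE[condition][column]


if __name__ == "__main__":
    print(removal_behavior("portal_page_deleted", "incremental-inclusive").value)
```

Keeping the rows as tuples in the same column order as the table makes it easy to compare the sketch with the table line by line.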

You may see documents removed after fewer than three full crawls. This can happen when another crawl ran between the time that the document was deleted and the time that you started a full crawl, or between full crawls. The constant that controls how many crawls a document is kept for before it is deleted is still set to three.
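
To illustrate why removal can appear to happen after fewer than three full crawls, the following Python sketch models the counter as a simple per-URL tally. It is a simplified model under stated assumptions, not the product's implementation: the IndexEntry class and the full_crawls_at_removal function are hypothetical, and the sketch assumes that every crawl that does not visit the URL counts toward the threshold and that a successful visit resets the count.

```python
REMOVAL_THRESHOLD = 3  # the article states that the constant is set to three


class IndexEntry:
    """Hypothetical record for one URL in a content index."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.missed_crawls = 0

    def record_crawl(self, visited: bool) -> bool:
        """Update the tally after a crawl; return True when the entry is due for removal."""
        if visited:
            # Assumption: visiting the URL again resets the tally.
            self.missed_crawls = 0
            return False
        self.missed_crawls += 1
        return self.missed_crawls >= REMOVAL_THRESHOLD


def full_crawls_at_removal(crawl_kinds):
    """Return how many full crawls have run when a deleted URL is removed.

    Assumption: every crawl in crawl_kinds fails to visit the deleted URL
    and counts toward the threshold. Returns None if the URL is not removed.
    """
    entry = IndexEntry("http://example.invalid/deleted-page")
    full_crawls = 0
    for kind in crawl_kinds:
        if kind == "full":
            full_crawls += 1
        if entry.record_crawl(visited=False):
            return full_crawls
    return None


if __name__ == "__main__":
    print(full_crawls_at_removal(["full", "full", "full"]))
    print(full_crawls_at_removal(["full", "incremental", "full"]))
```

In this model, an extra crawl between two full crawls uses up one of the three counts, so the entry is removed when only two full crawls have run.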

REFERENCES

For more information about how to manage search settings in SharePoint Portal Server 2003, see the "Managing Search Settings" topic in the "Administration" chapter of the Microsoft Office SharePoint Portal Server 2003 Administrator's Guide. The Microsoft Office SharePoint Portal Server 2003 Administrator's Guide (Administrator's Help.chm) is located in the Docs folder in the root of the SharePoint Portal Server 2003 CD.

For more information about SharePoint Portal Server 2003, visit the following Microsoft Web site:

Modification Type: Minor
Last Reviewed: 9/2/2004
Keywords: kbinfo KB840167 kbAudITPRO