How to customize SharePoint Portal Server 2003 by using IFilters, noise word files, and thesaurus files (837847)



The information in this article applies to:

  • Microsoft Office SharePoint Portal Server 2003

Important This article contains information about how to modify the registry. Make sure to back up the registry before you modify it. Make sure that you know how to restore the registry if a problem occurs. For more information about how to back up, restore, and modify the registry, click the following article number to view the article in the Microsoft Knowledge Base:

256986 Description of the Microsoft Windows registry

SUMMARY

This article describes how to use IFilters such as the TIFF filter, noise word files, thesaurus files, and the Robots.txt file to customize SharePoint Portal Server 2003. This article describes how to enable optical character recognition for Tagged Image File Format (TIFF) files, how to change the TIFF file size limit, how to enable automatic file rotation, and how to log TIFF error messages to the application event log. This article also contains information about how to change noise word files and how to change thesaurus files that are included in SharePoint Portal Server 2003.

INTRODUCTION

This article describes how to use the Tagged Image File Format (TIFF) IFilter, noise word files, thesaurus files, and the Robots.txt file to customize Microsoft Office SharePoint Portal Server 2003.

Overview of IFilters

To crawl documents that have proprietary file extensions, you have to register the IFilter for that file type in SharePoint Portal Server 2003. When you configure a content source, you can specify the file types that you want to include in the content index. For example, you might want to include files that have an .xyz extension and a .yyy extension in the content index. The inclusion of a file type applies only to content that is stored outside the portal site and that is included in the content index by using content sources. The inclusion of a file type does not apply to content that is stored in the portal site.

If an IFilter is available for a file type, you have to register that IFilter on the SharePoint Portal Server 2003 computer that crawls the file type. After you register the IFilter, SharePoint Portal Server 2003 can crawl documents that use that file type and include those documents in the content index. If you add a file type, and you do not register the IFilter for that file type, SharePoint Portal Server 2003 only includes the file properties in the content index.

The steps that you follow to register an IFilter vary according to the IFilter that you want to register. For more information about how to register an IFilter, see the documentation that is included with the IFilter that you want to register. SharePoint Portal Server 2003 includes filters for the following items:
  • Microsoft Office documents, including Microsoft Publisher documents and Microsoft Visio documents
  • HTML files
  • TIFF files
  • Text files
SharePoint Portal Server 2003 also accepts third-party IFilters for custom file types.

The TIFF IFilter

When you install SharePoint Portal Server 2003, the Setup program automatically installs an IFilter for TIFF files. The TIFF filter handles both the .tif extension and the .tiff extension. The following sections explain how to do the following tasks:
  • Enable optical character recognition (OCR) for TIFF files
  • Change the TIFF file size limit
  • Enable automatic file rotation
  • Log TIFF error messages to the application event log
Note After you edit registry entries that are associated with TIFF files, you have to restart the Microsoft Search service.

How to enable optical character recognition in TIFF Files

By default, when SharePoint Portal Server 2003 crawls TIFF files, it looks only at the file properties. If you enable optical character recognition (OCR), SharePoint Portal Server 2003 scans each TIFF file and tries to recognize characters in the document so that the recognized text can be included in the content index.

To enable optical character recognition in TIFF files, use one of the following methods.

Method 1: Manually edit the registry

Add the PerformOCR registry entry to the following registry subkey, and then set the PerformOCR registry entry to a value of 1:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper

To enable optical character recognition in TIFF files, follow these steps.

Warning Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall your operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.
  1. Click Start, and then click Run.
  2. In the Open box, type regedit, and then click OK.
  3. Locate and then click the following registry subkey:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper

  4. On the Edit menu, point to New, and then click DWORD Value.
  5. Type PerformOCR, and then press ENTER.
  6. On the Edit menu, click Modify.
  7. To enable optical character recognition, type 1 in the Value data box, and then click OK.

    Note To disable optical character recognition, set the PerformOCR registry entry to 0 (zero).
  8. Quit Registry Editor.
  9. Restart the Microsoft Search service. To do this, follow these steps:
    1. Click Start, point to Administrative Tools, and then click Services.
    2. Right-click Microsoft Search, and then click Restart.

Method 2: Use the Tiff_ocr_on.reg file

Use the Tiff_ocr_on.reg file to add the PerformOCR registry entry to the following registry subkey:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper



Warning Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall your operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.
  1. Locate the Support\Tools folder on the SharePoint Portal Server 2003 CD, and then double-click the Tiff_ocr_on.reg file.
  2. Restart the Microsoft Search service. To do this, follow these steps:
    1. Click Start, point to Administrative Tools, and then click Services.
    2. Right-click Microsoft Search, and then click Restart.
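
The article does not show the contents of the Tiff_ocr_on.reg file. As a rough sketch only, the following Python snippet builds registration-file text that would set the same PerformOCR entry that Method 1 creates by hand. The helper name build_reg_text is an illustrative assumption, and the generated text is not necessarily identical to the file on the CD.

```python
# Sketch: build .reg text that sets a DWORD entry under the MSPaper subkey.
# The helper name and the idea of generating the file are assumptions for
# illustration; only the subkey and the PerformOCR entry come from the article.

MSPAPER_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper"


def build_reg_text(key: str, name: str, value: int) -> str:
    """Return registration-file text that sets one DWORD entry."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"{name}"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # 1 enables optical character recognition; 0 disables it.
    print(build_reg_text(MSPAPER_KEY, "PerformOCR", 1))
```

As with either method, the Microsoft Search service would still have to be restarted after the entry is imported.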

How to change the TIFF file size limit

By default, when optical character recognition is enabled, SharePoint Portal Server 2003 does not include any single-page TIFF files that are larger than 1 megabyte (MB) in the content index. To change the size limit for TIFF files, change the MaxImageSize registry entry in the following registry subkey:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper



Warning Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall your operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.
  1. Click Start, and then click Run.
  2. In the Open box, type regedit, and then click OK.
  3. Locate and then click the following registry subkey:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper

  4. Right-click MaxImageSize, and then click Modify.
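  5. Type 100000 in the Value data box, and then click OK. Do not type a comma.

    Note A value of 100,000 is equal to a 1-MB file size limit. This is consistent with a hexadecimal entry that is measured in bytes: hexadecimal 100000 is 1,048,576 in decimal, and Hexadecimal is the default Base option in the Edit DWORD Value dialog box.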
  6. Quit Registry Editor.
  7. Restart the Microsoft Search service. To do this, follow these steps:
    1. Click Start, point to Administrative Tools, and then click Services.
    2. Right-click Microsoft Search, and then click Restart.
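
The relationship between the value 100,000 and a 1-MB limit can be checked with a little arithmetic. The sketch below assumes that MaxImageSize is measured in bytes and that the value is typed with the Hexadecimal base selected; the article states neither detail, and the function names are hypothetical.

```python
# Sketch: relate the value typed in Registry Editor to a size in bytes.
# Assumption: MaxImageSize is measured in bytes and the value is typed with
# the Hexadecimal base selected. The article states neither detail.

BYTES_PER_MB = 1024 * 1024


def hex_text_to_bytes(value_data: str) -> int:
    """Interpret Value data text as hexadecimal, ignoring grouping commas."""
    return int(value_data.replace(",", ""), 16)


def mb_to_hex_text(megabytes: int) -> str:
    """Return the hexadecimal Value data text for a limit in whole megabytes."""
    return format(megabytes * BYTES_PER_MB, "x")


if __name__ == "__main__":
    print(hex_text_to_bytes("100,000"))  # size in bytes under the assumption
    print(mb_to_hex_text(2))             # text to type for a 2-MB limit
```

Under these assumptions, doubling the limit to 2 MB would mean typing 200000 with the Hexadecimal base selected.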

How to enable automatic file rotation

If you enable optical character recognition, you can also enable automatic file rotation. When automatic file rotation is enabled, the filter rotates TIFF files that are oriented upside down or sideways in memory before the filter scans them, which increases scanning accuracy. Rotating a file uses resources, but the results from scanning a file that is oriented upside down or sideways may be poor. If you know that all your TIFF files are oriented upright, you do not have to enable this option.

To enable automatic file rotation, set the AutoRotation registry entry in the following registry subkey to a value of 1:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper

By default, automatic file rotation is enabled when you install SharePoint Portal Server 2003. However, if the PerformOCR registry entry is set to 0 (zero) or does not exist, the AutoRotation registry entry has no effect.

To enable automatic file rotation, follow these steps.

Warning Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall your operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.
  1. Click Start, and then click Run.
  2. In the Open box, type regedit, and then click OK.
  3. Locate and then click the following registry subkey:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\MSPaper

  4. Right-click AutoRotation, and then click Modify.
  5. Type 1 in the Value data box, and then click OK.

    Note To disable automatic file rotation, set the AutoRotation registry entry to 0 (zero).
  6. Quit Registry Editor.
  7. Restart the Microsoft Search service. To do this, follow these steps:
    1. Click Start, point to Administrative Tools, and then click Services.
    2. Right-click Microsoft Search, and then click Restart.
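
The interaction between the PerformOCR entry and the AutoRotation entry can be summarized in a few lines. The following sketch models only the rule that is stated in this section; the dictionary stands in for the values under the MSPaper subkey, and the function name is hypothetical.

```python
# Sketch of the rule described above: AutoRotation only matters when
# PerformOCR is present and nonzero. The dictionary stands in for the values
# under the MSPaper subkey; reading the real registry is out of scope here.


def rotation_in_effect(entries: dict) -> bool:
    """Return True when the TIFF filter would rotate files before scanning."""
    ocr_enabled = entries.get("PerformOCR", 0) != 0
    # The article says rotation is enabled by default, so a missing
    # AutoRotation entry is treated as 1 here (an assumption).
    auto_rotation = entries.get("AutoRotation", 1) != 0
    return ocr_enabled and auto_rotation


if __name__ == "__main__":
    print(rotation_in_effect({"PerformOCR": 1, "AutoRotation": 1}))
    print(rotation_in_effect({"AutoRotation": 1}))
```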

How to log TIFF error messages to the application event log

By default, SharePoint Portal Server 2003 logs error messages that are associated with TIFF files in the gatherer log. If you want SharePoint Portal Server 2003 to log error messages that are associated with TIFF files in the application event log, set the LoggingLevel registry entry in the following registry subkey to the value that you want:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\Application\Microsoft Office Document Imaging

You can set the LoggingLevel registry entry to one of the following values:
  • To disable logging, set the LoggingLevel registry entry to a value of 0 (zero). This setting is the default setting.
  • To log information messages and error messages, set the LoggingLevel registry entry to a value of 1.
  • To log warning messages and error messages, set the LoggingLevel registry entry to a value of 2.
  • To log all messages, set the LoggingLevel registry entry to a value of 3.
  • To log only error messages, set the LoggingLevel registry entry to a value of 4.
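
For scripts that audit or document this setting, the five values in the list above can be captured in a small lookup. This is only a convenience sketch; the enumeration member names are invented for readability and do not come from the product.

```python
from enum import IntEnum


class LoggingLevel(IntEnum):
    """Values of the LoggingLevel registry entry, as listed above."""
    DISABLED = 0                # the default setting
    INFORMATION_AND_ERRORS = 1
    WARNINGS_AND_ERRORS = 2
    ALL_MESSAGES = 3
    ERRORS_ONLY = 4


def describe(value: int) -> str:
    """Return a readable name for a LoggingLevel value, or raise ValueError."""
    return LoggingLevel(value).name.replace("_", " ").lower()


if __name__ == "__main__":
    for level in LoggingLevel:
        print(int(level), describe(level))
```

A value outside the range 0 through 4 raises ValueError, which makes a mistyped setting easy to spot.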


To enable logging of TIFF file messages in the application event log, follow these steps.

Warning Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall your operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.
  1. Click Start, and then click Run.
  2. In the Open box, type regedit, and then click OK.
  3. Locate and then click the following registry subkey:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\Application\Microsoft Office Document Imaging

  4. Right-click LoggingLevel, and then click Modify.
  5. Type the value that you want in the Value data box, and then click OK.
  6. Quit Registry Editor.
  7. Restart the Microsoft Search service. To do this, follow these steps:
    1. Click Start, point to Administrative Tools, and then click Services.
    2. Right-click Microsoft Search, and then click Restart.

Noise word files

A noise word is a word that is not useful in a search. For example, the following words are noise words:
  • the
  • an
A list of noise words for a language is stored in the noise word file for that language. SharePoint Portal Server 2003 includes noise word files for the following languages:
  • Chinese-Simplified (Noisechs.txt)
  • Chinese-Traditional (Noisecht.txt)
  • Czech (Noisecsv.txt)
  • Dutch (Noisenld.txt)
  • English-International (Noiseeng.txt)
  • English-US (Noiseenu.txt)
  • Finnish (Noisefin.txt)
  • French (Noisefra.txt)
  • German (Noisedeu.txt)
  • Hungarian (Noisehun.txt)
  • Italian (Noiseita.txt)
  • Japanese (Noisejpn.txt)
  • Korean (Noisekor.txt)
  • Polish (Noiseplk.txt)
  • Portuguese (Brazil) (Noiseptb.txt)
  • Russian (Noiserus.txt)
  • Spanish (Noiseesn.txt)
  • Swedish (Noisesve.txt)
  • Thai (Noisetha.txt)
  • Turkish (Noisetrk.txt)
If a noise word list does not exist for a language, SharePoint Portal Server 2003 uses the neutral Noiseneu.txt noise word file. The word breaker for the language parses noise words.

By default, noise word files are stored in the following location on the server:

Drive:\Program Files\SharePoint Portal Server\Data\Config

If you installed SharePoint Portal Server 2003 in a location that is different from the default location, the Data folder is located in a different folder on your server.

You can change the noise word file. If you add noise words, the accuracy of your searches may decrease. However, the size of the content index also decreases. A smaller content index helps increase performance. You can delete noise words if you want searches to return those words.

If you remove words from the noise word file, the changes do not take effect until you reset the content indexes and perform a full update of the content indexes in SharePoint Portal Server 2003. Noise words are removed from documents before the documents are included in an index, so words that were in the noise word file during an earlier crawl are missing from the existing index. You must update the content index after you modify the noise word list. Otherwise, documents that contain the removed noise words are not returned in queries.

Do not delete noise word files. If you do not want noise words removed during an update or a query, remove those specific entries from the file. If you delete the noise word file, all single characters are removed as noise words. If you remove all noise words from your noise word file, you will experience errors during crawling. Therefore, you must have at least one noise word in the file, even if the noise word is something as simple as a period character.
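
The rule that at least one noise word must remain can be enforced when the file is edited by script instead of by hand. The following sketch assumes that the noise word file holds whitespace-separated words, which the article does not confirm; check the layout of your own file first. The function name is hypothetical.

```python
# Sketch: remove entries from noise word text without leaving the list empty.
# Assumption: the file holds whitespace-separated words, one per line in
# typical use. Check the layout of your own noise word file first.


def remove_noise_words(file_text: str, to_remove: set) -> str:
    """Return new file text with the given words dropped (case-insensitive)."""
    drop = {word.lower() for word in to_remove}
    kept = [word for word in file_text.split() if word.lower() not in drop]
    if not kept:
        # The article warns that an empty list causes errors during crawling.
        raise ValueError("keep at least one noise word in the file")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    print(remove_noise_words("the\nan\nof\n", {"an"}))
```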

Noise word files are copied to the Drive:\Program Files\SharePoint Portal Server\DATA\Applications\ProgramUID\Config folder. You can specify noise words at the program level instead of at the server level or at the server farm level. For example, if SharePoint Portal Server 2003 and Microsoft SQL Server are installed on the same server, you can specify one noise word list for SharePoint Portal Server and a different noise word list for Microsoft SQL Server.

How to change the noise word file

To change the noise word file:
  1. Start Notepad, and then open the noise word file.
  2. Add or delete the words that you want.
  3. Save the noise word file, and then quit Notepad.
  4. Restart the Microsoft SharePointPS Search service. To do this, follow these steps:
    1. Click Start, point to Administrative Tools, and then click Services.
    2. Right-click Microsoft SharePointPS Search, and then click Restart.
  5. Perform a full update of the content index.
Note When you search the portal site, SharePoint Portal Server 2003 may discard some query terms as noise words even if the query term itself is not a noise word. This behavior occurs in situations when the query term is an inflectional form of the noise word. For example, if the noise word file contains the word be, and you search for the word am, the word am is treated as a noise word because it is a form of be.

Thesaurus files

The thesaurus is a query-expansion search feature in SharePoint Portal Server 2003. The thesaurus permits you to type a phrase in a search query and to receive results for words that are related to the phrase that you typed. For example, you can search for the word run and receive results that contain either the word run or the word jog if the two terms are related in the thesaurus. Additionally, the thesaurus permits the server farm administrator to configure search rankings by assigning different weights to words. SharePoint Portal Server 2003 includes thesaurus files for the following languages:
  • Chinese-Simplified (Tschs.xml)
  • Chinese-Traditional (Tscht.xml)
  • Czech (Tscsv.xml)
  • Dutch (Tsnld.xml)
  • English-International (Tseng.xml)
  • English-US (Tsenu.xml)
  • Finnish (Tsfin.xml)
  • French (Tsfra.xml)
  • German (Tsdeu.xml)
  • Hungarian (Tshun.xml)
  • Italian (Tsita.xml)
  • Japanese (Tsjpn.xml)
  • Korean (Tskor.xml)
  • Polish (Tsplk.xml)
  • Portuguese (Brazil) (Tsptb.xml)
  • Russian (Tsrus.xml)
  • Spanish (Tsesn.xml)
  • Swedish (Tssve.xml)
  • Thai (Tstha.xml)
  • Turkish (Tstrk.xml)
The thesaurus files contain inactive sample content. The neutral Tsneu.xml thesaurus file is always applied to queries. It is applied when no thesaurus file is associated with the query language, and it is also applied together with the language-specific thesaurus file when one exists.

By default, SharePoint Portal Server 2003 stores thesaurus files in the following folder on the server:

Drive:\Program Files\SharePoint Portal Server\Data\Config

If you installed SharePoint Portal Server 2003 in a location that is different from the default location, the Data folder is located in a different folder on your server.

Thesaurus files are also copied to the Drive:\Program Files\SharePoint Portal Server\Data\Applications\Application UID\Config folder for each instance of the Microsoft Search service or the Microsoft SharePointPS Search service. You can modify the thesaurus at the program level instead of at the server level or at the server farm level. For example, if SharePoint Portal Server 2003 and Microsoft SQL Server are installed on the same server, you can specify one thesaurus file for SharePoint Portal Server and a different thesaurus file for Microsoft SQL Server.

You can change the thesaurus entries by changing the thesaurus file in a text editor. The thesaurus file must use well-formed XML that contains matching opening and closing tags around each entry. If the XML is malformed, SharePoint Portal Server 2003 logs an error in the application event log. When you change the thesaurus file, make sure that you do not change the case of the tags. Only the <XML> tag and its closing tag use uppercase letters. All other tags use lowercase letters. For example, the <replacement> tag must use lowercase letters.

Important There is a file named Tsschema.xml that is installed with the thesaurus files. Do not modify the Tsschema.xml file.

Thesaurus files contain two types of thesaurus entries. These types are replacement sets and expansion sets. Thesaurus files also permit you to configure the word weighting and word stemming options in a replacement set or an expansion set.

Replacement sets

A replacement set specifies a pattern that is replaced by one or more substitutions in a search query. For example, you can add a replacement set where W2K is the pattern and where Windows 2000 is the substitution. If you query the term W2K, SharePoint Portal Server 2003 only returns search results that contain the term Windows 2000. You do not receive items in the search results that contain the term W2K.

Each replacement set is enclosed in a <replacement> tag. In the replacement tag, you specify one or more patterns by enclosing the patterns in a <pat> tag. You specify one or more substitutions by enclosing the substitutions in a <sub> tag. Patterns and substitutions can contain a word or a sequence of words. For example, to add a replacement set where W2K is the pattern and Windows 2000 is the substitution, use the following:
 <replacement>
<pat>W2K</pat>
<sub>Windows 2000</sub>
</replacement> 
You can have more than one substitution for each pattern that you specify. By default, patterns are case sensitive. For example, if your thesaurus file contains the term W2K, and a user searches for the term w2k, SharePoint Portal Server 2003 does not return search results that contain the term Windows 2000. SharePoint Portal Server 2003 does not recognize the term w2k as the term W2K because the case of the text is different.
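
The case-sensitive matching that is described above can be modeled in a few lines of Python. This is a sketch of the documented behavior, not the search engine's implementation; the function name substitute and the case_sensitive parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sketch: how a case-sensitive replacement set treats a query term.
# This models the behavior described above; it is not the search engine's
# actual implementation.

REPLACEMENT_XML = """
<replacement>
  <pat>W2K</pat>
  <sub>Windows 2000</sub>
</replacement>
"""


def substitute(term: str, replacement_xml: str, case_sensitive: bool = True) -> list:
    """Return the substitutions for term, or [term] when no pattern matches."""
    root = ET.fromstring(replacement_xml)
    patterns = [pat.text for pat in root.findall("pat")]
    subs = [sub.text for sub in root.findall("sub")]
    if case_sensitive:
        term_key = term
    else:
        term_key = term.lower()
        patterns = [pat.lower() for pat in patterns]
    return subs if term_key in patterns else [term]


if __name__ == "__main__":
    print(substitute("W2K", REPLACEMENT_XML))
    print(substitute("w2k", REPLACEMENT_XML))
```

With the default case-sensitive setting, only the query W2K is rewritten; the query w2k passes through unchanged.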

You can specify patterns to be case sensitive or not to be case sensitive if you add a <case> tag to the thesaurus file for your language. For example, if you specify that patterns are not case sensitive, the <pat> and <sub> terms match query terms regardless of the case of the query term.

When you query by using the CONTAINS FORMSOF syntax, the thesaurus works as described previously. For more information about the CONTAINS FORMSOF syntax, see the Microsoft SharePoint Products and Technologies 2003 Software Development Kit.

By default, a portal site uses the FREETEXT query type. FREETEXT queries automatically use the thesaurus. However, if you type your search terms in quotation marks, SharePoint Portal Server 2003 does not run a FREETEXT query and does not use the thesaurus. Therefore, SharePoint Portal Server returns results that are based on the exact search term or terms that are enclosed by the quotation marks. If the thesaurus replaces one word of a phrase with another word, a FREETEXT query returns results for the new version of the whole phrase.

For the replacement set where the term Windows 2000 replaces the term W2K, the following table shows the results that occur based on different user input from the search interface on the portal site. This example assumes that the thesaurus is set as case sensitive and that the search is not case sensitive.
User input | Whether a thesaurus is used | Text in documents that are returned in the search results
w2k | Yes. A FREETEXT query. | W2k, W2K, w2k, or w2K. No results are returned for Windows 2000 because the pattern in the thesaurus is uppercase W2K.
"w2k" | No | W2K, w2k, W2k, or w2K.
W2K | Yes. A FREETEXT query. | Windows 2000, windows 2000, w2k, W2k, w2K, or case combinations such as wInDows 2000. No results are returned for W2K.
"W2K" | No | W2K, w2k, W2k, or w2K.
W2K Server | Yes. A FREETEXT query. | Windows 2000, windows 2000, and case combinations such as wInDows 2000; w2k, W2k, or w2K; Server, server, and case combinations such as SeRvEr; W2K Server and case combinations of that term. No results are returned for W2K operating system.
"W2K Server" | No | W2K Server, w2k Server, W2k Server, w2K Server, W2K server, w2k server, W2k server, or w2K server.

Note In each of the previous examples in the table, the case-sensitivity setting for search is specified as false. If the case-sensitivity setting is specified as true, all the case differences are significant when pattern matching is performed.

If two replacement sets that have similar patterns are being matched, the replacement set that has the longer pattern takes precedence. For example, if you have the following two replacement sets, the term Internet Explorer takes precedence over the term Internet:
 <replacement>
<pat>Internet</pat>
<sub>intranet</sub>
</replacement> 

 <replacement>
<pat>Internet Explorer</pat>
<sub>IE</sub>
<sub>IE 5</sub>
</replacement> 
For these replacement sets, the following table shows the results that occur based on user input from the search interface on the portal site.
User input | Whether a thesaurus is used | Text in documents that are returned in the search results
Internet | Yes. A FREETEXT query. | Intranet, intranet, or case combinations such as iNtranEt. No results are returned for IE or IE 5.
Internet Explorer | Yes. A FREETEXT query. | IE, IE 5, and case combinations such as iE or Ie 5. No results are returned for Internet, Internet Explorer, or intranet.
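
The precedence rule for overlapping patterns can be illustrated with a short sketch. The dictionary mirrors the two replacement sets above. The matching rule that is used here (a case-sensitive match on whole words at the start of the query) is a simplifying assumption, and the function name is hypothetical.

```python
# Sketch: choose the replacement set whose pattern is the longest match.
# The dictionary mirrors the two replacement sets above; the matching rule
# (case-sensitive match on whole words at the start of the query) is a
# simplifying assumption.

REPLACEMENTS = {
    "Internet": ["intranet"],
    "Internet Explorer": ["IE", "IE 5"],
}


def pick_substitutions(query: str) -> list:
    """Return substitutions for the longest pattern that starts the query."""
    words = query.split()
    matches = [
        pattern for pattern in REPLACEMENTS
        if words[:len(pattern.split())] == pattern.split()
    ]
    if not matches:
        return [query]
    longest = max(matches, key=lambda pattern: len(pattern.split()))
    return REPLACEMENTS[longest]


if __name__ == "__main__":
    print(pick_substitutions("Internet"))
    print(pick_substitutions("Internet Explorer"))
```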

Expansion sets

An expansion set is a group of substitutions that are synonyms of each other. Queries that contain matches in one substitution are expanded to include all other substitutions in the expansion set. For example, you can add an expansion set where the following substitutions are synonyms:
  • writer
  • author
  • journalist
If you query the term author, SharePoint Portal Server 2003 also returns search results that contain the term writer and the term journalist.

Each expansion set is enclosed in an <expansion> tag. In the expansion tag, you specify one or more substitutions that are enclosed by a <sub> tag. For the example that is described earlier, add the following lines:
 <expansion>
<sub>writer</sub>
<sub>author</sub>
<sub>journalist</sub>
</expansion> 
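
As a rough illustration of what an expansion set does to a query term, the following Python sketch parses the example above and returns every synonym when the term is a member of the set. The function name expand is hypothetical, and the sketch ignores case handling, weighting, and stemming.

```python
import xml.etree.ElementTree as ET

# Sketch: expand a query term by using the expansion set shown above.

EXPANSION_XML = """
<expansion>
  <sub>writer</sub>
  <sub>author</sub>
  <sub>journalist</sub>
</expansion>
"""


def expand(term: str, expansion_xml: str) -> list:
    """Return every synonym in the set when term is a member, else [term]."""
    synonyms = [sub.text for sub in ET.fromstring(expansion_xml).findall("sub")]
    return synonyms if term in synonyms else [term]


if __name__ == "__main__":
    print(expand("author", EXPANSION_XML))
```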

Word weighting

Substitution entries support word weighting. You can use word weighting to rank certain words higher in the search results. You can rank words higher in the search results by assigning the words a higher value relative to the other words in the substitution set. You can specify a value between 0 and 1. For example, you can weight the following substitutions as follows:
 <expansion>
<sub weight="0.8">Internet Explorer</sub>
<sub weight="0.2">IE</sub>
<sub weight="0.9">IE5</sub>
</expansion> 
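
Before you save a weighted set, it can help to confirm that every weight falls in the range from 0 to 1 and to see the order that the weights imply. The sketch below does both for the example above. The function name is hypothetical, and treating a missing weight attribute as an error is an assumption, because the article does not say what the default weight is.

```python
import xml.etree.ElementTree as ET

# Sketch: validate and rank the weighted substitutions shown above.

WEIGHTED_XML = """
<expansion>
  <sub weight="0.8">Internet Explorer</sub>
  <sub weight="0.2">IE</sub>
  <sub weight="0.9">IE5</sub>
</expansion>
"""


def ranked_substitutions(expansion_xml: str) -> list:
    """Return (text, weight) pairs sorted from highest to lowest weight."""
    pairs = []
    for sub in ET.fromstring(expansion_xml).findall("sub"):
        raw = sub.get("weight")
        if raw is None:
            # Assumption: this sketch requires an explicit weight.
            raise ValueError(f"missing weight for {sub.text!r}")
        weight = float(raw)
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight out of range for {sub.text!r}: {weight}")
        pairs.append((sub.text, weight))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for text, weight in ranked_substitutions(WEIGHTED_XML):
        print(weight, text)
```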

Word stemming

Word stemming maps a linguistic stem to all matching words. You can specify word stemming in pattern entries and substitution entries. For example, in English, the stem buy matches the following:
  • bought
  • buying
  • buys
You can specify word stemming by adding two asterisks to the end of the string. SharePoint Portal Server 2003 then returns matches for variations of the word. For example, you might want to create queries for the term run that also return the following:
  • running
  • jog
  • jogging
To do this, modify the expansion set as follows:
 <expansion>
<sub weight="0.5">run**</sub>
<sub weight="0.5">jog**</sub>
</expansion>
If you query the term run or the term running, the search results include the term jog and the term jogging. If you query the term running, you receive the same search results that you receive when you query the term run.

For example, if your thesaurus file includes the <pat>User1 ran to the store**</pat> pattern or the <sub>User1 ran to the store**</sub> substitution, the query returns the following strings, or search adds the following strings to the query:
  • User1 runs to the store
  • User1 running to the store
  • User1 ran to the store
  • User1 runs to the stores
  • User1 running to the stores
  • User1 ran to the stores
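
A script that reviews thesaurus entries can detect the stemming marker with a simple suffix check. The following sketch only separates the entry text from the two trailing asterisks; it does not perform linguistic stemming, which is done by the search service. The function name is hypothetical.

```python
# Sketch: detect the two-asterisk stemming marker at the end of an entry.

STEM_MARKER = "**"


def parse_entry(entry_text: str) -> tuple:
    """Split a thesaurus entry into (text, stemming_requested)."""
    text = entry_text.strip()
    if text.endswith(STEM_MARKER):
        return text[:-len(STEM_MARKER)].rstrip(), True
    return text, False


if __name__ == "__main__":
    print(parse_entry("run**"))
    print(parse_entry("jog"))
```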

How to change a thesaurus file

To change the thesaurus file, follow these steps:
  1. Start Notepad, and then open the thesaurus file.

    Note If the thesaurus file contains double-byte character set (DBCS) characters, you must save the thesaurus file in Unicode format before you change the thesaurus file.
  2. If you are changing the thesaurus file for the first time, remove the following comment lines that appear at the beginning and the end of the file:
     <!---Commented out---> 
  3. If you do not want the patterns to be case sensitive, add the following tag at the beginning of the file:

    <case caseflag="false"></case>

    If you want the patterns to be case sensitive later in the file, change the setting from "false" to "true" in the tag as follows:

    <case caseflag="true"></case>

  4. Make the changes that you want. Add, modify, or delete a replacement set or an expansion set. Add, modify, or delete the weighting or the stemming that is configured for a set.

    Note The entries that you add to the thesaurus file cannot contain only special characters or only noise words. However, you can have blank entries. For example, if you want to make sure that queries for a specific term return no results, change the entry. In the following example, queries for the term windows do not return results:
    <replacement>
    <pat>windows</pat>
    <sub></sub>
    </replacement>
  5. Save the thesaurus file, and then quit Notepad.
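
Because malformed XML and changed tag case both cause problems, a quick check of the edited text before you save it can catch mistakes early. The following sketch uses the standard Python XML parser to test well-formedness and then flags any tag other than XML that is not lowercase. The function name is hypothetical, and the check does not validate the file against Tsschema.xml.

```python
import xml.etree.ElementTree as ET

# Sketch: check thesaurus text for well-formedness and lowercase tags.


def check_thesaurus_text(xml_text: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return [f"not well-formed: {error}"]
    problems = []
    for element in root.iter():
        # Strip any namespace prefix that ElementTree adds in braces.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag != "XML" and tag != tag.lower():
            problems.append(f"tag <{tag}> is not lowercase")
    return problems


if __name__ == "__main__":
    sample = "<replacement><pat>W2K</pat><sub>Windows 2000</sub></replacement>"
    print(check_thesaurus_text(sample))
```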

How to use the Robots.txt file and HTML tags to prevent access to content on the portal site

You can use a Robots.txt file to control where robots (Web crawlers) can go on a Web site. You can also use the Robots.txt file to indicate whether to exclude specific crawlers. Web servers use these rules to control access to Web sites by preventing robots from accessing certain areas. SharePoint Portal Server 2003 looks for this file when it crawls, and it obeys the restrictions that are contained in the Robots.txt file.

You can prevent another server from crawling content on the portal site by modifying the Robots.txt file. For example, you might want to restrict a specific robot from accessing the server because the frequency of requests from the robot is blocking the Web site. You may also want to restrict all robots from certain areas on the server.

SharePoint Portal Server 2003 does not install a Robots.txt file. However, you can create a Robots.txt file and put the Robots.txt file in the home directory of the Default Web Site on the server. To determine the home directory of the Default Web Site on the server, follow these steps:
  1. Start Internet Information Services (IIS) Manager.
  2. Expand server name, and then expand Web Sites.
  3. Right-click Default Web Site, and then click Properties.
  4. Click the Home Directory tab.
  5. Make a note of the path that appears in the Local Path box, and then click Cancel.

    Put the Robots.txt file in the path that appears in the Local Path box. For example, if the path is D:\Inetpub\Wwwroot, put the Robots.txt file in the D:\Inetpub\Wwwroot folder on the server. To confirm that the Robots.txt file is in the correct folder on the server, start your Web browser, and then type http://server name/robots.txt.
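
The article does not include a sample Robots.txt file. As an illustration only, the following sketch defines hypothetical rules that block one robot completely and keep all other robots out of one folder, and then uses the robots.txt parser in the Python standard library to show how a well-behaved crawler would interpret the rules. The robot name, the folder, and the URLs are invented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical Robots.txt content: block one robot entirely and keep all
# other robots out of one folder. The robot name and paths are invented.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True when the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleBot", "http://server/default.aspx"))
    print(is_allowed("OtherBot", "http://server/private/report.htm"))
    print(is_allowed("OtherBot", "http://server/default.aspx"))
```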
You can restrict access to certain documents by using HTML META tags. HTML META tags tell the robot whether a document can be included in the index and whether the robot can follow the links in the document by using the INDEX/NOINDEX attribute and the FOLLOW/NOFOLLOW attributes in the tag. For example, you can mark a document with the following if you do not want the document crawled and you do not want links in the document followed:

<META name="robots" content="NOINDEX, NOFOLLOW">
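
To see which robots directives a page declares, you can extract them with an HTML parser. The following sketch uses the parser in the Python standard library; the class name and the function name are hypothetical, and the sketch only reads the tag. It does not show how SharePoint Portal Server 2003 processes the tag.

```python
from html.parser import HTMLParser

# Sketch: collect the directives from robots META tags in a page.


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <META name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().upper() for part in content.split(",") if part.strip()
            )


def robots_directives(html_text: str) -> set:
    """Return the set of robots directives that the page declares."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META name="robots" content="NOINDEX, NOFOLLOW"></head></html>'
    print(sorted(robots_directives(page)))
```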

SharePoint Portal Server 2003 automatically obeys the restrictions that are contained in the Robots.txt file.

REFERENCES

For more information about how to administer and configure SharePoint Portal Server 2003, see the Microsoft Office SharePoint Portal Server 2003 Administrator's Guide. The Microsoft Office SharePoint Portal Server 2003 Administrator's Guide (Administrator's Help.chm) is located in the Docs folder in the root of the SharePoint Portal Server 2003 CD.

Modification Type: Major
Last Reviewed: 8/4/2005
Keywords: kbregistration kbHOWTOmaster KB837847 kbAudITPRO