You obtain incorrect results when you run a full-text search query that uses a thesaurus file in SQL Server 2005 (923317)



The information in this article applies to:

  • Microsoft SQL Server 2005 Standard Edition
  • Microsoft SQL Server 2005 Workgroup
  • Microsoft SQL Server 2005 Developer Edition
  • Microsoft SQL Server 2005 Enterprise Edition

SYMPTOMS

Consider the following scenario:
  • You define a term in a thesaurus file in Microsoft SQL Server 2005.
  • The term contains a special character. For example, the term contains one of the following characters:
    • $ (dollar sign)
    • & (ampersand)
    • # (number sign)
    • ' (single quotation mark)
    • - (hyphen)
  • You run a full-text search query that uses the thesaurus file.
In this scenario, you obtain incorrect results.

Additionally, if you use a hyphen in the thesaurus file, you will find that an error message is logged in the event log. For example, the thesaurus file contains the follow tag:
        <expansion>

            <sub>pa</sub>

            <sub>posterior anterior</sub>

            <sub>posterior-anterior</sub>

        </expansion>
You receive the following event log when you run a full-text search query that uses this thesaurus file:
Event Type: Error
Event Source: MSFTESQL
Event Category: MSFTESQL Service
Event ID: 4154
Date: Date
Time: Time
User: N/A
Computer: ComputerName
Description:
The thesaurus file C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\tsenu.xml for lcid 1033 has invalid format. Please edit the file and correct it. Use retail tracing for detailed error description. See user documentation on fulltext retail tracing.Component: MicrosoftIndexer
Note The special characters that are listed here are only examples. Other special characters may also cause the problem.

WORKAROUND

To work around this problem, avoid using special characters in thesaurus file terms.

STATUS

Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.

MORE INFORMATION

When the word breaker encounters a word-breaking character in a term, the word breaker parses the character as a white space character. For example, if the word breaker encounters the term "look&out," the word breaker parses the term as the two words "look" and "out."

Note Word-breaking characters include the following:
  • $ (dollar sign)
  • , (comma)
  • & (ampersand)
  • # (number sign)
When the word breaker encounters a single quotation mark (') in a term, the word breaker correctly parses the term. However, the full-text thesaurus component keeps only the characters that follow the single quotation mark in the term. For example, if the original term is "L'Mary," the term appears as "Mary" in the thesaurus file.

When the word breaker encounters a hyphen (-) in a term, the word breaker correctly parses the term. However, the full-text thesaurus component treats the characters that are connected by the hyphen together with the hyphen itself as empty characters. For example, if the original term is "well-known celebrity," the term appears as "celebrity" in the thesaurus file.

The problem that is described in the "Symptoms" section may also cause a duplicate entry or an empty entry for a term in the thesaurus file. The full-text thesaurus component cannot load a thesaurus file if the file contains a duplicate entry or an empty entry.

Note In SQL Server 2005, if the full-text thesaurus component finds an error in the thesaurus file, the full-text thesaurus component does not load the thesaurus file.

For example, you define the following terms in the thesaurus file:
<expansion>
	<sub>Windows</sub>
	<sub>Windows&you</sub>
<expansion>
In this scenario, the ampersand will be parsed as a white space character. By default, the word "you" is listed in the noise file and is ignored. Therefore, the word breaker parses the term "Windows&you" as "Windows." Because the thesaurus file already contains the term "Windows," the thesaurus file now contains duplicate terms. Therefore, the full-text thesaurus component does not load the thesaurus file.

Modification Type:MajorLast Reviewed:9/22/2006
Keywords:kbsql2005fts kbExpertiseAdvanced kbtshoot kbprb KB923317 kbAudDeveloper kbAudITPRO