FIX: Full-Text Searches for Data That is Enclosed in Parenthesis() Inside HTML Tags is Not Being Returned (321332)
The information in this article applies to:
- Microsoft SQL Server 2000 (all editions)
This article was previously published under Q321332 SYMPTOMS When you run SQL Server full-text searches, the search may
not return the correct results if all of the following conditions are
true: - The rows contain an HTML document that is stored in a text column.
- The data that you are searching for is enclosed in
parenthesis "()".
- There is no blank space between the parenthesis and the
HTML tag.
- The full-text search is using an English word
breaker.
The following example illustrates the problem:
CREATE TABLE [tbl_fts_test] (
[row_num] int NOT NULL constraint PK_fts_test primary key ,
[field_value] [varchar] (8000) NULL
)
go
---insert data
insert into tbl_fts_test values (1,'<br>Type: Web Page<br>Title: Microsoft Security Bulletin (MS01-060)<br>Author: Microsoft<br>')
insert into tbl_fts_test values (2,'<br>Type: Web Page<br>Title: Microsoft Security Bulletin (MS01-060) <br>Author: Microsoft<br>')
insert into tbl_fts_test values (3,'<br>Title: Microsoft Security Bulletin (MS01-060)<br>Author: Microsoft<br>')
insert into tbl_fts_test values (4,'Title: Microsoft Security Bulletin (MS01-060) Author: Microsoft')
If you run the following query after the full-text index is created and
populated, the query only returns rows with row_num 2 and 4:
select * from tbl_fts_test where contains(field_value, 'MS01-060')
---Row_Num=2 is returned because there is a blank space between the parenthesis and the HTML (<br>) tag.
---Row_Num=4 is returned because it does not contain any HTML tags.
CAUSE The English word breaker does not perform HTML filtering.
Instead, it looks at the text and applies the rules of the English language.
The English word breaker is designed to tokenize English text (not necessarily
HTML). RESOLUTION To resolve this problem, obtain the latest
service pack for Microsoft SQL Server 2000. For additional information, click
the following article number to view the article in the Microsoft Knowledge
Base: 290211 INF: How To Obtain the Latest SQL Server 2000 Service Pack NOTE: The following hotfix was created before the release of Microsoft
SQL Server 2000 Service Pack 3. WORKAROUND To work around this behavior, you can either:
- Use a Neutral word breaker for HTML data that is stored in
text columns. To avoid the problem, you can modify the code in the More Information section as follows:
--add column to indexing
exec sp_fulltext_column @tabname = 'dbo.tbl_fts_test' ,
@colname = 'field_value' ,
@action = 'add' ,
@language = 0 ---------This is the additional parameter that you must add to force the Neutral word breaker.
go
-or-
- Store the HTML as a Binary Large Object (BLOB), and then
use the HTML IFilter. For more information see the "Full-Text Catalogs and
Indexes", "Using Full-text Predicates to Query image Columns", and "Filtering
Supported File Types" topics in SQL Server Books Online.
NOTE: When you use the Neutral word breaker, linguistic based searches
may not work. Linguistic analysis involves finding word boundaries
(word-breaking) and conjugating verbs (stemming). For example, any
full-text query that uses the FORMSOF(INFLECTIONAL) may not work with the
Neutral word breaker. For more information, see the "Column-Level Linguistic
Analysis" topic in SQL Server Books Online.
STATUSMicrosoft
has confirmed that this is a problem in the Microsoft products that are listed
at the beginning of this article.
This problem was first corrected in Microsoft SQL Server
2000 Service Pack 3.
Modification Type: | Major | Last Reviewed: | 11/5/2003 |
---|
Keywords: | kbfix kbSQLServ2000sp3fix kbprb KB321332 |
---|
|