FIX: Full-Text Searches for Data That is Enclosed in Parenthesis() Inside HTML Tags is Not Being Returned (321332)



The information in this article applies to:

  • Microsoft SQL Server 2000 (all editions)

This article was previously published under Q321332

SYMPTOMS

When you run SQL Server full-text searches, the search may not return the correct results if all of the following conditions are true:

  • The rows contain an HTML document that is stored in a text column.
  • The data that you are searching for is enclosed in parenthesis "()".
  • There is no blank space between the parenthesis and the HTML tag.
  • The full-text search is using an English word breaker.
The following example illustrates the problem:
CREATE TABLE [tbl_fts_test] (
	[row_num]  int  NOT NULL constraint PK_fts_test primary key ,
	[field_value] [varchar] (8000) NULL 
) 
go

---insert data
insert into tbl_fts_test values (1,'<br>Type: Web Page<br>Title: Microsoft Security Bulletin (MS01-060)<br>Author: Microsoft<br>')
insert into tbl_fts_test values (2,'<br>Type: Web Page<br>Title: Microsoft Security Bulletin (MS01-060) <br>Author: Microsoft<br>')
insert into tbl_fts_test values (3,'<br>Title: Microsoft Security Bulletin (MS01-060)<br>Author: Microsoft<br>')
insert into tbl_fts_test values (4,'Title: Microsoft Security Bulletin (MS01-060) Author: Microsoft')
				
If you run the following query after the full-text index is created and populated, the query only returns rows with row_num 2 and 4:
select * from tbl_fts_test where contains(field_value, 'MS01-060')

---Row_Num=2 is returned because there is a blank space between the parenthesis and the HTML (<br>) tag.

---Row_Num=4 is returned because it does not contain any HTML tags. 
				

CAUSE

The English word breaker does not perform HTML filtering. Instead, it looks at the text and applies the rules of the English language. The English word breaker is designed to tokenize English text (not necessarily HTML).

RESOLUTION

To resolve this problem, obtain the latest service pack for Microsoft SQL Server 2000. For additional information, click the following article number to view the article in the Microsoft Knowledge Base:

290211 INF: How To Obtain the Latest SQL Server 2000 Service Pack

NOTE: The following hotfix was created before the release of Microsoft SQL Server 2000 Service Pack 3.

WORKAROUND

To work around this behavior, you can either:
  • Use a Neutral word breaker for HTML data that is stored in text columns. To avoid the problem, you can modify the code in the More Information section as follows:
    
    --add column to indexing
    exec sp_fulltext_column  @tabname =  'dbo.tbl_fts_test' , 
         @colname =  'field_value' , 
         @action =  'add' , 
         @language = 0     ---------This is the additional parameter that you must add to force the Neutral word breaker.
    go
    					
    -or-

  • Store the HTML as a Binary Large Object (BLOB), and then use the HTML IFilter. For more information see the "Full-Text Catalogs and Indexes", "Using Full-text Predicates to Query image Columns", and "Filtering Supported File Types" topics in SQL Server Books Online.
NOTE: When you use the Neutral word breaker, linguistic based searches may not work. Linguistic analysis involves finding word boundaries (word-breaking) and conjugating verbs (stemming).

For example, any full-text query that uses the FORMSOF(INFLECTIONAL) may not work with the Neutral word breaker. For more information, see the "Column-Level Linguistic Analysis" topic in SQL Server Books Online.

STATUS

Microsoft has confirmed that this is a problem in the Microsoft products that are listed at the beginning of this article.
This problem was first corrected in Microsoft SQL Server 2000 Service Pack 3.

MORE INFORMATION

The following code segment illustrates the problem.
use master
go
---Create database
drop database testfts
go
create database testfts
go


use testfts
go
---Create table
if object_id ('tbl_fts_test') is not null
	drop table tbl_fts_test
go
CREATE TABLE [tbl_fts_test] (
	[row_num]  int  NOT NULL constraint PK_fts_test primary key,
	[field_value] [varchar] (8000) NULL 
) 
go

---Insert data
insert into tbl_fts_test values (1,'<br>Type: Web Page<br>Title: Microsoft Security Bulletin (MS01-060)<br>Author: Microsoft<br>')
insert into tbl_fts_test values (2,'<br>Type: Web Page<br>Title: Microsoft Security Bulletin (MS01-060) <br>Author: Microsoft<br>')
insert into tbl_fts_test values (3,'<br>Title: Microsoft Security Bulletin (MS01-060)<br>Author: Microsoft<br>')
insert into tbl_fts_test values (4,'Title: Microsoft Security Bulletin (MS01-060) Author: Microsoft')
go

---Enable database for full-text
exec sp_fulltext_database 'enable'
go

---Create catalog
exec sp_fulltext_catalog  
	@ftcat =  'fts_test'
	, @action =  'create' 
go

---Enable fulltext for the table
exec sp_fulltext_table  @tabname =  'dbo.tbl_fts_test' 
    ,  @action =  'create' 
     ,  @ftcat =  'fts_test' 
    ,  @keyname =  'PK_fts_test' 
go

---Add column to indexing
exec sp_fulltext_column  @tabname =  'dbo.tbl_fts_test' , 
     @colname =  'field_value' , 
     @action =  'add' 
go

---Start full population
exec sp_fulltext_catalog  @ftcat =  'fts_test' , 
     @action =  'start_full' 

---Check status
while (fulltextcatalogproperty ('fts_test', 'populatestatus')) <>0
begin
	waitfor delay '0:0:2'
end
--Make sure status column is 0, which means population is completed.
exec sp_help_fulltext_catalogs 'fts_test'
go

---Query that shows that data does exist, returns 4 rows
select * from tbl_fts_test where field_value like '%MS01-060%'
order by field_value
go

---Run query, which returns 2 rows when it ought to return 4 rows.
select * from tbl_fts_test where contains(field_value, '*MS01-060*')
select * from tbl_fts_test where contains(field_value, '"*MS01-060*"')
select * from tbl_fts_test where contains(field_value, '"MS01-060"')
select * from tbl_fts_test where contains(field_value, 'MS01-060')
select * from tbl_fts_test where contains(field_value, '(MS01-060)')
go
				

Modification Type:MajorLast Reviewed:11/5/2003
Keywords:kbfix kbSQLServ2000sp3fix kbprb KB321332