Korean Documents Detected as Japanese (248306)
The information in this article applies to:
- Microsoft Site Server 3.0
- Microsoft Index Server 2.0
This article was previously published under Q248306
SYMPTOMS
Microsoft Site Server 3.0 Search incorrectly detects Korean documents as Japanese.
CAUSE
Microsoft has confirmed that this is a problem in the Microsoft products that are listed at the beginning of this article.
WORKAROUND
If it is possible to pre-process the documents, convert them to HTML and then add the language
and charset tags. Otherwise, a Site Server Search crawl server (also known as the Gatherer) must be dedicated to
crawling Korean documents so that Korean-language text documents are handled properly.
Text documents cannot be tagged. Therefore, using document tagging to identify the language of the document
is not an option in this case.
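Where pre-processing is possible, the conversion to tagged HTML can be sketched as follows. This is a minimal illustration, not a documented Site Server procedure: the function names `wrap_text_as_html` and `convert_file` are hypothetical, the source files are assumed to be encoded as EUC-KR, and the two `<meta http-equiv>` tags are the conventional HTML way to declare charset and language. Whether Site Server Search honors exactly these tags is an assumption that should be verified against a test crawl.

```python
import html
import os


def wrap_text_as_html(text, title="", lang="ko", charset="euc-kr"):
    """Wrap plain text in a minimal HTML page that declares charset and language.

    The tag choice (Content-Type and Content-Language meta tags) is an
    assumption; adjust it to whatever the crawler is confirmed to read.
    """
    # Escape &, <, > and quotes so the text cannot be mistaken for markup.
    body = html.escape(text)
    return (
        "<html>\n<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        f'<meta http-equiv="Content-Language" content="{lang}">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n<pre>\n"
        f"{body}\n"
        "</pre>\n</body>\n</html>\n"
    )


def convert_file(src_path, dst_path, src_encoding="euc-kr", charset="euc-kr"):
    """Read a text file (assumed EUC-KR) and write the tagged HTML version."""
    with open(src_path, encoding=src_encoding) as src:
        text = src.read()
    page = wrap_text_as_html(text, title=os.path.basename(src_path), charset=charset)
    # Write using the same charset that the meta tag declares.
    with open(dst_path, "w", encoding=charset, newline="") as dst:
        dst.write(page)
```

For example, `convert_file("notice.txt", "notice.htm")` would produce an HTML copy of a hypothetical `notice.txt` that the crawler can index in place of the untagged original. The text is placed inside `<pre>` so that the original line breaks survive the conversion.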
The dedicated crawl server requires the following configuration on Site Server Service Pack 2 or later:
Regional Settings
Set the region to Korean and select the Set as system default locale option. This installs the Korean character
set and makes iso-8959-5 the default character set. Restart the computer to activate the system locale change.
Input Locales
Both Korean and Japanese must be listed, with Korean as the default input locale.
The Japanese character set is needed to recognize some of the characters.
Internet Explorer Language Settings
In Internet Explorer, open Internet Options, click the General tab, and then click Languages.
Make sure that Korean is listed, because Site Server Search uses a part of Internet Explorer (WinInet) to crawl the documents.
With the above settings, all Korean and most Japanese text documents are recognized as Korean.
English text documents, however, are correctly recognized as English.
Modification Type: Major
Last Reviewed: 6/12/2001
Keywords: kbprb KB248306