Korean Documents Detected as Japanese (248306)



The information in this article applies to:

  • Microsoft Site Server 3.0
  • Microsoft Index Server 2.0

This article was previously published under Q248306

SYMPTOMS

Microsoft Site Server 3.0 Search incorrectly detects Korean documents as Japanese.

CAUSE

Microsoft has confirmed that this is a problem in the Microsoft products that are listed at the beginning of this article.

WORKAROUND

If the documents can be pre-processed, convert them to HTML and add the language and charset tags. Otherwise, the Site Server Search crawl server (also known as the Gatherer) must be dedicated to crawling Korean documents so that Korean-language text documents are handled properly. Plain text documents cannot be tagged, so document tagging cannot be used to identify their language.
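The pre-processing approach can be sketched as follows. This is a minimal illustration, not a Microsoft-supplied tool: the function names, the choice of the euc-kr charset, the cp949 source encoding, and the specific META tags are all assumptions. Verify which tags your Site Server Search configuration actually honors before relying on it.

```python
import html


def wrap_text_as_html(text, lang="ko", charset="euc-kr", title=""):
    """Return an HTML page that declares its language and charset.

    The tag set below is an assumption: a standard Content-Type META tag,
    a Content-Language META tag, and a lang attribute on <html>.
    """
    body = html.escape(text)  # keep literal <, > and & from being parsed as markup
    return (
        f'<html lang="{lang}">\n'
        "<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        f'<meta http-equiv="Content-Language" content="{lang}">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body><pre>{body}</pre></body>\n"
        "</html>\n"
    )


def convert_file(src_path, dst_path, src_encoding="cp949", charset="euc-kr"):
    """Read a Korean text file and write it back out as a tagged HTML file.

    src_encoding is a guess at how the text files were saved; adjust as needed.
    """
    with open(src_path, encoding=src_encoding) as src:
        text = src.read()
    page = wrap_text_as_html(text, charset=charset, title=src_path)
    # Characters that the target charset cannot represent are written as
    # numeric character references instead of raising an error.
    with open(dst_path, "w", encoding=charset, errors="xmlcharrefreplace") as dst:
        dst.write(page)
```

For example, calling convert_file("notice.txt", "notice.htm") (hypothetical file names) produces an HTML copy that the crawler can read in place of the untagged text file.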

The following configuration is required on Site Server Service Pack 2 or later:

Regional Settings

Set the region to Korean and select the Set as system default locale option. This installs the Korean character set and makes iso-8959-5 the default character set. Restart the computer to activate the system locale change.

Input Locales

Both Korean and Japanese must be listed, with Korean as the default input locale. The Japanese character set is needed to recognize some of the characters.

Internet Explorer Language Settings

In Internet Explorer, click Internet Options, click the General tab, and then click Languages. Make sure that Korean is listed, because Site Server Search uses a part of Internet Explorer (WinInet) to crawl the documents.

With these settings, all Korean text documents and most Japanese text documents are recognized as Korean. English text documents are still correctly recognized as English.

Modification Type: Major
Last Reviewed: 6/12/2001
Keywords:kbprb KB248306