eucTW(5)

NAME
  eucTW - A character encoding system (codeset) for Traditional Chinese

DESCRIPTION
  The Taiwanese EUC (Extended UNIX Code), or eucTW, codeset consists of the
  following character sets:

    ·  ASCII

    ·  CNS 11643 (Plane 1 to Plane 16)

  Taiwanese EUC uses a combination of single-byte data and 2-byte data to
  represent ASCII characters, symbols, and ideographic characters. Because
  CNS 11643 defines multiple character planes, Taiwanese EUC uses different
  leading codes to designate different character planes.

  ASCII characters are represented as single-byte, 7-bit data in Taiwanese
  EUC; that is, the most significant bit (MSB) of the byte that represents
  an ASCII character is always clear (0). For more information, refer to
  ascii(5).

  Although the standard Taiwanese EUC codeset includes all characters defined
  by the CNS 11643-1992 standard, the operating system's eucTW implementation
  currently supports the following:

    ·  Characters defined in the first and second planes of CNS 11643

    ·  The EDPC Recommended Character Set (refer to dechanyu(5) for more
       information)

    ·  CNS 11643-1986 and DTSCS characters that have been remapped into the
       third and fourth character planes by the CNS 11643-1992 standard

  Characters that were added to CNS 11643-1986 by the CNS 11643-1992 standard
  are not supported.

  The characters that are defined in plane 1 and plane 2 of CNS 11643-1992
  and that are the same as those defined in CNS 11643-1986 are as follows:

  ___________________________________________________________________
  Character Plane   Character Type               Number of Characters
  ___________________________________________________________________
  1                 Special characters           651
                    Control characters           33
                    Frequently-used characters   5401
  2                 Less frequently-used         7650
                    characters
  ___________________________________________________________________

  The characters defined in plane 3 and plane 4 of CNS 11643-1992 are as
  follows:

  _________________________________________________________________________
  Character Plane   Character Type                           Number of
                                                             Characters
  _________________________________________________________________________
  3                 Rarely-used characters (EDPC Part I)     6148
  4                 Used for residency system, ISO 2nd       7298
                    edition DIS 10646 Han characters, 171
                    EDPC Part II Characters
  _________________________________________________________________________

  The characters that have been remapped into the third and fourth character
  planes of CNS 11643-1992 as specified by the EDPC are as follows:

  ________________________________________________________
  EDPC Characters   Character Plane   Number of Characters
  ________________________________________________________
  Part I	    Plane 3	      6148
  Part II	    Plane 4	      171
  ________________________________________________________

  Taiwanese EUC Encoding

  Except for characters in the first plane of CNS 11643-1986, Taiwanese EUC
  uses a leading code, which consists of the 8-bit Single-Shift 2 control
  character (SS2) followed by an additional byte, to designate the character
  plane to which a character belongs.

  The position of a character on a plane is specified by two bytes. The first
  byte determines the character's row number and the second byte determines
  the character's column number. The MSB of both bytes is set (1).
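
  The mapping between a plane position and its two bytes can be sketched as
  follows. This is an illustration only, not the operating system's
  implementation. It assumes that rows and columns are numbered from 1 to 94
  and that each byte is the number plus hexadecimal A0, which yields bytes in
  the range A1 to FE with the MSB set. The function names are hypothetical.

```python
from typing import Tuple


def position_to_bytes(row: int, column: int) -> bytes:
    """Encode a plane position (row, column) as two bytes with the MSB set.

    Assumption: rows and columns are numbered 1..94, so adding 0xA0 gives
    bytes in the range 0xA1..0xFE.
    """
    if not (1 <= row <= 94 and 1 <= column <= 94):
        raise ValueError("row and column must be in the range 1..94")
    return bytes([0xA0 + row, 0xA0 + column])


def bytes_to_position(pair: bytes) -> Tuple[int, int]:
    """Recover (row, column) from a two-byte code with both MSBs set."""
    if len(pair) != 2 or not all(0xA1 <= b <= 0xFE for b in pair):
        raise ValueError("expected two bytes in the range 0xA1..0xFE")
    return pair[0] - 0xA0, pair[1] - 0xA0
```

  The two functions are inverses of each other, so a position converted to
  bytes and back is unchanged.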

  The following table shows the encoding of Taiwanese EUC characters:

  ______________________________________________________
  CNS 11643-1986 Code Plane   Leading Code   Code Range
  ______________________________________________________
  1			      [nil]	     A1A1 - FEFE
  2			      SS2 A2	     A1A1 - FEFE
  3			      SS2 A3	     A1A1 - FEFE
  4			      SS2 A4	     A1A1 - FEFE
  5			      SS2 A5	     A1A1 - FEFE
  6			      SS2 A6	     A1A1 - FEFE
  7			      SS2 A7	     A1A1 - FEFE
  8			      SS2 A8	     A1A1 - FEFE
  9			      SS2 A9	     A1A1 - FEFE
  10			      SS2 AA	     A1A1 - FEFE
  11			      SS2 AB	     A1A1 - FEFE
  12			      SS2 AC	     A1A1 - FEFE
  13			      SS2 AD	     A1A1 - FEFE
  14			      SS2 AE	     A1A1 - FEFE
  15			      SS2 AF	     A1A1 - FEFE
  16			      SS2 B0	     A1A1 - FEFE
  ______________________________________________________
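
  The table above can be read as a small parsing rule. The following sketch
  splits a byte string into ASCII bytes and (plane, row, column) positions.
  It is a hedged illustration, not the system's decoder. It assumes that SS2
  has the value hexadecimal 8E (its ISO 6429 code), that rows and columns
  are numbered from 1, and that only the leading codes listed in the table
  are valid. The function name split_euctw is hypothetical.

```python
SS2 = 0x8E  # 8-bit Single-Shift 2 control character (assumed ISO 6429 value)


def split_euctw(data: bytes):
    """Split eucTW bytes into ('ascii', byte) or (plane, row, column) items."""
    items = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # MSB clear: a single-byte ASCII character.
            items.append(("ascii", b))
            i += 1
            continue
        if b == SS2:
            # Leading code: SS2 followed by a byte that selects the plane.
            if i + 1 >= len(data):
                raise ValueError(f"truncated leading code at offset {i}")
            plane = data[i + 1] - 0xA0  # A2 -> plane 2, ..., B0 -> plane 16
            if not 2 <= plane <= 16:
                raise ValueError(f"unlisted plane byte at offset {i + 1}")
            i += 2
        else:
            # No leading code: the two bytes belong to plane 1.
            plane = 1
        pair = data[i:i + 2]
        if len(pair) != 2 or not all(0xA1 <= x <= 0xFE for x in pair):
            raise ValueError(f"invalid two-byte code at offset {i}")
        items.append((plane, pair[0] - 0xA0, pair[1] - 0xA0))
        i += 2
    return items
```

  For example, the bytes 41, A4 A1, and 8E A2 A1 A1 (hexadecimal) split into
  one ASCII byte, one plane 1 position, and one plane 2 position.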

  Codeset Conversion

  The following codeset converter pairs are available for converting
  Traditional Chinese characters between eucTW and other encoding formats.
  Refer to iconv_intro(5) for an introduction to codeset conversion. For more
  information about the other codeset for which eucTW is the input or output,
  see the reference page specified in the list item.

    ·  big5_eucTW, eucTW_big5

       Converting from and to the Big-5 codeset: big5(5).

       Note that Big-5 encoding is equivalent to the Microsoft code-page
       format used on PCs for Traditional Chinese. You can therefore use this
       set of converters to convert Traditional Chinese text between the
       eucTW and PC code-page formats. For information about how the
       operating system supports PC code pages, see code_page(5).

    ·  dechanyu_eucTW, eucTW_dechanyu

       Converting from and to the DEC Hanyu codeset: dechanyu(5).

    ·  dechanzi_eucTW, eucTW_dechanzi

       Converting from and to the DEC Hanzi codeset: dechanzi(5).

    ·  sbig5_eucTW, eucTW_sbig5

       Converting from and to the Shift Big-5 codeset: sbig5(5).

    ·  telecode_eucTW, eucTW_telecode

       Converting from and to the Telecode codeset: telecode(5).

    ·  UTF-16_eucTW, eucTW_UTF-16

       Converting from and to UTF-16 format: Unicode(5).

    ·  UCS-4_eucTW, eucTW_UCS-4

       Converting from and to UCS-4 format: Unicode(5).

    ·  UTF-8_eucTW, eucTW_UTF-8

       Converting from and to UTF-8 format: Unicode(5).
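
  The converter names in this list follow the pattern source_target. The
  sketch below checks a requested pair against the list and builds an argument
  vector for the iconv command. It assumes that the -f and -t options of
  iconv accept the same codeset names that appear in the converter names;
  verify this on your system. The function names are hypothetical.

```python
# Codesets that this page lists as conversion partners for eucTW.
PARTNERS = ("big5", "dechanyu", "dechanzi", "sbig5", "telecode",
            "UTF-16", "UCS-4", "UTF-8")


def converter_name(source: str, target: str) -> str:
    """Return the converter name (source_target) if this page lists the pair."""
    if source == target or "eucTW" not in (source, target):
        raise ValueError("exactly one side of the conversion must be eucTW")
    other = target if source == "eucTW" else source
    if other not in PARTNERS:
        raise ValueError(f"no eucTW converter listed for {other!r}")
    return f"{source}_{target}"


def iconv_argv(source: str, target: str, path: str) -> list:
    """Build an iconv argument vector for a listed converter pair.

    Assumption: iconv -f/-t take the codeset names used in converter names.
    """
    converter_name(source, target)  # raises ValueError for unlisted pairs
    return ["iconv", "-f", source, "-t", target, path]
```

  The returned list can be passed to subprocess.run to perform the
  conversion on a system where those codeset names are installed.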

  Fonts for Taiwanese EUC

  For both display devices and printers, the operating system supports
  Taiwanese EUC through internal conversion to DEC Hanyu code and use of DEC
  Hanyu fonts (see dechanyu(5)).

  For general information on printing non-English text, refer to
  i18n_printing(5).

SEE ALSO
  Commands: locale(1)

  Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanyu(5),
  dechanzi(5), GBK(5), iconv_intro(5), i18n_intro(5), i18n_printing(5),
  l10n_intro(5), sbig5(5), telecode(5), Unicode(5)