
GB18030(5)

NAME

GB18030, gb18030 - A Chinese character set that extends GBK by means of 4-byte code points

DESCRIPTION

The GB18030-2000 character set, defined by the Chinese national standards organization, is an extension of the GBK character set, which itself is an extension of the GB2312-80 character set. (See the GBK(5) reference page.) GB18030 incorporates GBK support for all the Hanzi characters specified by the Unicode Version 3.0 and ISO/IEC 10646-2001 standards.

GB18030 Code Space and Code Points

The GB18030 character set has 1-byte, 2-byte, and 4-byte encoding with the following structure:

________________________________________________________________
Number of Bytes   Code Space                   Total Code Points
________________________________________________________________
1-byte            0x00 to 0x7F                 128
2-byte            0x81 to 0xFE                 23940
                  0x40 to 0xFE (except 0x7F)
4-byte            0x81 to 0xFE                 1587600
                  0x30 to 0x39
                  0x81 to 0xFE
                  0x30 to 0x39
________________________________________________________________

For the 2-byte and 4-byte codes, each row under Code Space gives the range of one byte, in order from first byte to last.

The GB18030 1-byte code provides support for ASCII. The 2-byte code provides support for all the CJK characters (Chinese, Japanese, and Korean) defined in the Unicode 2.1 standard. The 4-byte code provides support for the Unicode Version 3.0 additions to Version 2.1. The 4-byte code also leaves a large number of unassigned code points that are available for future use.

The GB18030 character set maps the invalid Unicode code points U+FFFE and U+FFFF to 4-byte codes. Because these two characters are invalid in UCS, this mapping can cause problems with round-trip character conversions.

The GB18030 character set does not map any 4-byte code to the UCS surrogate area (U+D800 through U+DFFF).

Codeset Converters for GB18030

The following codeset converter pairs are available for converting Simplified Chinese characters between GB18030 and UCS formats. Refer to Unicode(5) for more information about the UTF-16, UCS-4, and UTF-8 encoding formats. Refer to iconv_intro(5) for an introduction to codeset conversion.
· UTF-16_GB18030, GB18030_UTF-16
  Converting from and to UTF-16 format

· UCS-4_GB18030, GB18030_UCS-4
  Converting from and to UCS-4 format

· UTF-8_GB18030, GB18030_UTF-8
  Converting from and to UTF-8 format

Fonts for GB18030

The operating system provides the following Simplified Chinese TrueType fonts for GB18030:

FangSong
    -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso8859-1
    -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso10646-1

HeiTi
    -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-iso8859-1
    -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-iso10646-1

KaiTi
    -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-iso8859-1
    -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-iso10646-1

SongTi
    -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-iso8859-1
    -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-iso10646-1

These fonts can be used for printing with Chinese text printers. The operating system uses Unicode fonts and the SongTi font style as the default screen font for the GB18030 codeset. See wwpsof(8) for information on the PostScript print filter and TrueType fonts.
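As an illustration of the code space table under GB18030 Code Space and Code Points, the following Python sketch derives the three code point totals from the byte ranges and determines the length of a code from its leading bytes. The names sequence_length and code_point_totals are hypothetical and are not part of the operating system or of any library. The check is structural only; it does not tell whether a code point is assigned.

```python
# Byte ranges taken from the code space table in this reference page.
LEAD = range(0x81, 0xFF)    # first byte of 2-byte codes; first and third of 4-byte codes
TRAIL2 = [b for b in range(0x40, 0xFF) if b != 0x7F]  # second byte of 2-byte codes
DIGIT = range(0x30, 0x3A)   # second and fourth bytes of 4-byte codes


def sequence_length(data: bytes, pos: int = 0) -> int:
    """Return the length (1, 2, or 4) of the GB18030 code starting at pos.

    Raises ValueError if the bytes do not fit the code space structure.
    """
    first = data[pos]
    if first <= 0x7F:
        return 1
    if first not in LEAD or pos + 1 >= len(data):
        raise ValueError(f"invalid or truncated code at offset {pos}")
    second = data[pos + 1]
    if second in TRAIL2:
        return 2
    if second in DIGIT:
        tail = data[pos + 2:pos + 4]
        if len(tail) == 2 and tail[0] in LEAD and tail[1] in DIGIT:
            return 4
    raise ValueError(f"invalid or truncated code at offset {pos}")


def code_point_totals() -> dict:
    """Recompute the Total Code Points column from the byte ranges."""
    return {
        1: 0x80,
        2: len(LEAD) * len(TRAIL2),
        4: (len(LEAD) * len(DIGIT)) ** 2,
    }


if __name__ == "__main__":
    print(code_point_totals())
    print(sequence_length(b"\x81\x30\x81\x30"))
```

The second byte alone distinguishes 2-byte from 4-byte codes, because the ranges 0x30 to 0x39 and 0x40 to 0xFE do not overlap.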
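The round-trip and surrogate behavior described under GB18030 Code Space and Code Points can be explored with a short Python sketch. It uses Python's built-in gb18030 codec, not the codeset converters listed on this reference page, and that codec may follow a different revision of the standard, so results for individual characters can differ from those of the operating system converters. The helper names roundtrip and can_encode are hypothetical.

```python
def roundtrip(text: str) -> bytes:
    """Encode text to GB18030 and confirm that decoding restores it."""
    encoded = text.encode("gb18030")
    if encoded.decode("gb18030") != text:
        raise ValueError("round trip changed the text")
    return encoded


def can_encode(char: str) -> bool:
    """Report whether Python's gb18030 codec accepts the character."""
    try:
        char.encode("gb18030")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # One character for each code length: ASCII "A" (1 byte),
    # U+4E2D (2 bytes), and U+0080 (4 bytes).
    sample = "A\u4e2d\u0080"
    print(roundtrip(sample).hex(" "))

    # A code point in the surrogate area has no GB18030 mapping,
    # so the codec is expected to reject it.
    print(can_encode("\ud800"))
```

A check of this kind is one way to detect characters that do not survive conversion in both directions before data is exchanged between systems.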

SEE ALSO

Commands: locale(1)

Others: ascii(5), big5(5), Chinese(5), dechanyu(5), dechanzi(5), eucTW(5), GBK(5), i18n_intro(5), i18n_printing(5), l10n_intro(5), sbig5(5), telecode(5)
