GB18030(5)
NAME
GB18030, gb18030 - A Chinese character set that extends GBK by means of
4-byte code points
DESCRIPTION
The GB18030-2000 character set, defined by the Chinese national standards
organization, is an extension of the GBK character set, which is itself an
extension of the GB2312-80 character set. (See the GBK(5) reference page.)
GB18030 retains GBK and adds support for all the Hanzi characters specified
by the Unicode Version 3.0 and ISO/IEC 10646-2001 standards.
GB18030 Code Space and Code Points
The GB18030 character set uses 1-byte, 2-byte, and 4-byte encodings with
the following structure:
________________________________________________________________
Number of Bytes   Code Space                          Total Code Points
________________________________________________________________
1-byte            0x00 to 0x7F                        128
2-byte            byte 1: 0x81 to 0xFE                23940
                  byte 2: 0x40 to 0xFE (except 0x7F)
4-byte            byte 1: 0x81 to 0xFE                1587600
                  byte 2: 0x30 to 0x39
                  byte 3: 0x81 to 0xFE
                  byte 4: 0x30 to 0x39
________________________________________________________________
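The totals in the table follow from the number of values allowed in each byte position: 126 * 190 for 2-byte codes and 126 * 10 * 126 * 10 for 4-byte codes. The Python sketch below derives those totals and splits a byte string into 1-, 2-, and 4-byte codes using only the ranges in the table. The names LEAD, TRAIL2, DIGIT, and split_codes are hypothetical, and the sketch checks code structure only, not whether a code is actually assigned to a character.

```python
# Byte ranges taken from the GB18030 code space table.
LEAD = range(0x81, 0xFF)                                 # 0x81..0xFE: byte 1 (and byte 3 of 4-byte codes)
TRAIL2 = [b for b in range(0x40, 0xFF) if b != 0x7F]     # 0x40..0xFE except 0x7F: byte 2 of 2-byte codes
DIGIT = range(0x30, 0x3A)                                # 0x30..0x39: bytes 2 and 4 of 4-byte codes

# Totals from the table: 126 * 190 and (126 * 10) ** 2.
TWO_BYTE_TOTAL = len(LEAD) * len(TRAIL2)
FOUR_BYTE_TOTAL = (len(LEAD) * len(DIGIT)) ** 2


def split_codes(data: bytes) -> list[bytes]:
    """Split GB18030 bytes into 1-, 2-, or 4-byte codes (hypothetical helper).

    Raises ValueError on a sequence that fits none of the three forms.
    """
    out = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 <= 0x7F:                          # 1-byte code (ASCII)
            out.append(data[i:i + 1])
            i += 1
            continue
        if b1 not in LEAD or i + 1 >= len(data):
            raise ValueError(f"bad or truncated sequence at offset {i}")
        b2 = data[i + 1]
        if b2 in TRAIL2:                        # 2-byte code
            out.append(data[i:i + 2])
            i += 2
        elif b2 in DIGIT:                       # 4-byte code: check bytes 3 and 4
            if (i + 3 >= len(data) or data[i + 2] not in LEAD
                    or data[i + 3] not in DIGIT):
                raise ValueError(f"bad or truncated 4-byte code at offset {i}")
            out.append(data[i:i + 4])
            i += 4
        else:
            raise ValueError(f"bad second byte at offset {i + 1}")
    return out
```

For example, split_codes(b"A\xd6\xd0\x81\x30\x81\x30") returns [b"A", b"\xd6\xd0", b"\x81\x30\x81\x30"]. The second byte alone decides between the 2-byte and 4-byte forms, because 0x30 to 0x39 never occurs as the second byte of a 2-byte code.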
The GB18030 1-byte code provides support for ASCII. The 2-byte code
provides support for all the CJK (Chinese, Japanese, and Korean) characters
defined in the Unicode Version 2.1 standard. The 4-byte code provides
support for the Unicode Version 3.0 additions to Version 2.1, and also
leaves a large number of unassigned code points available for future use.
The GB18030 character set maps the invalid Unicode code points U+FFFE and
U+FFFF to 4-byte codes. Because these two code points are not valid
characters in UCS, this mapping can cause problems with round-trip
character conversions. No 4-byte GB18030 code maps to the UCS surrogate
area (U+D800 through U+DFFF).
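Because every byte of a 4-byte code has a fixed range, the 1587600 four-byte codes can be numbered in order by treating a code as a mixed-radix number with 126, 10, 126, and 10 values per byte. This linear numbering is a common implementation technique assumed here, not something this page defines; four_byte_index, four_byte_code, and is_surrogate are hypothetical helper names. Under this numbering, 0x84 0x31 0xA4 0x39, the last code of GB18030's BMP mapping range and the code that corresponds to U+FFFF, has index 39419.

```python
def four_byte_index(code: bytes) -> int:
    """Linear position of a 4-byte GB18030 code, counting from 0x81 0x30 0x81 0x30.

    Assumed numbering scheme for illustration (hypothetical helper).
    """
    b1, b2, b3, b4 = code
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a 4-byte GB18030 code")
    # Mixed-radix number: bytes 1 and 3 have 126 values, bytes 2 and 4 have 10.
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def four_byte_code(index: int) -> bytes:
    """Inverse of four_byte_index() (hypothetical helper)."""
    if not 0 <= index < 1587600:
        raise ValueError("index outside the 4-byte code space")
    index, b4 = divmod(index, 10)
    index, b3 = divmod(index, 126)
    b1, b2 = divmod(index, 10)
    return bytes([b1 + 0x81, b2 + 0x30, b3 + 0x81, b4 + 0x30])


def is_surrogate(cp: int) -> bool:
    """True for U+D800..U+DFFF, the range that no 4-byte GB18030 code maps to."""
    return 0xD800 <= cp <= 0xDFFF
```

A converter built on such a numbering would still need the standard's range tables to map indexes to UCS code points, and would reject surrogate code points rather than emit a 4-byte code for them.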
Codeset Converters for GB18030
The following codeset converter pairs are available for converting
Simplified Chinese characters between GB18030 and UCS formats. Refer to
Unicode(5) for more information about the UTF-16, UCS-4, and UTF-8 encoding
formats. Refer to iconv_intro(5) for an introduction to codeset conversion.
· UTF-16_GB18030, GB18030_UTF-16
Converting from and to UTF-16 format
· UCS-4_GB18030, GB18030_UCS-4
Converting from and to UCS-4 format
· UTF-8_GB18030, GB18030_UTF-8
Converting from and to UTF-8 format
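The converter pairs above are the operating system's own codeset converters. As a portable illustration only, the sketch below performs the same kinds of round trips with Python's built-in gb18030 codec rather than those converters; it is not the interface this page documents. The helper names to_gb18030 and from_gb18030 are hypothetical, and using the Python codecs utf-16-le and utf-32-le to stand in for UTF-16 and UCS-4 is an assumption (the system converters' byte order and byte-order-mark handling may differ).

```python
def to_gb18030(data: bytes, ucs_form: str) -> bytes:
    """UCS bytes -> GB18030 bytes (the role of, e.g., UTF-8_GB18030)."""
    return data.decode(ucs_form).encode("gb18030")


def from_gb18030(data: bytes, ucs_form: str) -> bytes:
    """GB18030 bytes -> UCS bytes (the role of, e.g., GB18030_UTF-8)."""
    return data.decode("gb18030").encode(ucs_form)


# Python codec names standing in for the UTF-16, UCS-4, and UTF-8 formats.
UCS_FORMS = {"UTF-16": "utf-16-le", "UCS-4": "utf-32-le", "UTF-8": "utf-8"}

text = "GB18030 中文"
for name, codec in UCS_FORMS.items():
    ucs = text.encode(codec)
    gb = to_gb18030(ucs, codec)
    # Ordinary characters survive the round trip unchanged.
    assert from_gb18030(gb, codec) == ucs, name
```

In the GB18030 form of this text, the ASCII portion stays as 1-byte codes and "中文" becomes the two 2-byte codes 0xD6 0xD0 and 0xCE 0xC4.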
Fonts for GB18030
The operating system provides the following Simplified Chinese TrueType
fonts for GB18030:
FangSong
-css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso8859-1
-css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso10646-1
HeiTi
-css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-iso8859-1
-css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-iso10646-1
KaiTi
-css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-iso8859-1
-css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-iso10646-1
SongTi
-css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-iso8859-1
-css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-iso10646-1
These fonts can be used for printing on Chinese text printers. By default,
the operating system displays GB18030 text on screen using Unicode fonts
in the SongTi style. See wwpsof(8) for information on the PostScript print
filter and TrueType fonts.
SEE ALSO
Commands: locale(1)
Others: ascii(5), big5(5), Chinese(5), dechanyu(5), dechanzi(5), eucTW(5),
GBK(5), i18n_intro(5), i18n_printing(5), l10n_intro(5), sbig5(5),
telecode(5)