iconv_KEIS(5)
NAME
iconv_KEIS - Specification for controlling conversion between Hitachi KEIS
and Tru64 UNIX Japanese codesets
DESCRIPTION
The iconv utility supports the ability to convert the encoding of
characters between Hitachi KEIS (Kanji processing Extended Information
System) code and one of the following Tru64 UNIX codesets: DEC Kanji, Super
DEC Kanji, Japanese EUC, or Shift JIS. You choose the type of conversion by
specifying the appropriate values for the utility's from-code and to-code
parameters, as follows:
________________________________________________
Type of Code Conversion    from-code   to-code
________________________________________________
KEIS to DEC Kanji          KEIS        deckanji
KEIS to Super DEC Kanji    KEIS        sdeckanji
KEIS to Japanese EUC       KEIS        eucJP
KEIS to Shift JIS          KEIS        SJIS
DEC Kanji to KEIS          deckanji    KEIS
Super DEC Kanji to KEIS    sdeckanji   KEIS
Japanese EUC to KEIS       eucJP       KEIS
Shift JIS to KEIS          SJIS        KEIS
________________________________________________
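As an illustration of the table above, the following Python sketch builds an argument list for the iconv command from a from-code and to-code pair. The helper name iconv_argv and the input file name are hypothetical. Only the -f and -t options of iconv(1) and the codeset names in the table are taken as given.

```python
# Tru64 UNIX codeset names that can be paired with KEIS (from the table above).
TRU64_CODESETS = {"deckanji", "sdeckanji", "eucJP", "SJIS"}


def iconv_argv(from_code, to_code, input_path):
    """Return an argv list for iconv(1); exactly one side must be KEIS.

    iconv_argv is a hypothetical helper, not part of any Tru64 UNIX library.
    """
    other = to_code if from_code == "KEIS" else from_code
    if "KEIS" not in (from_code, to_code) or other not in TRU64_CODESETS:
        raise ValueError(f"unsupported conversion: {from_code} to {to_code}")
    return ["iconv", "-f", from_code, "-t", to_code, input_path]
```

The resulting list can be passed to a process-launching call such as subprocess.run() on a system where the KEIS converters are installed.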
Conversion behavior for the following items is affected by the definition
of environment variables or profile entries in the user's environment. For
more information, see the "Environment Variables" and "Profile" sections.
· The UDC (User-Defined Character) mapping table that is used for UDC
conversion
This table must be an ASCII text file that contains UDC mapping
information. The table affects conversion of user-defined characters
between the codesets.
· The EBCDIC to/from ISO code (ASCII, JIS Roman characters) mapping
table that is used for conversion
This table must be an ASCII text file that contains information on how to
map characters between EBCDIC and ISO code.
· The K-shift code
This is a one- or two-byte hexadecimal code that marks the beginning
of Kanji mode.
· The A-shift code
This is a one- or two-byte hexadecimal code that marks the beginning
of EBCDIC mode.
· The status of the initial mode (Kanji or EBCDIC) at the time the iconv
command starts, or the first time a program calls the iconv() function
after calling the iconv_open() function to initialize the converter
The status keywords are either kanji_mode or ebcdic_mode.
· How to treat undefined characters when these are detected in Kanji
mode
Specify this action by using one of the following keywords:
abort
Stop codeset conversion.
pass
Output the undefined characters without any processing and
continue codeset conversion.
replace
Output padding characters instead of the undefined characters and
continue codeset conversion.
dismiss
Ignore the undefined characters and continue codeset conversion.
· The two-byte padding character used in Kanji mode
This value is meaningful when replace is chosen for the processing of
undefined characters in Kanji mode. Specify the padding character by
its hexadecimal value.
· How to treat undefined characters when these are detected in EBCDIC
mode
Specify this action by using one of the following keywords:
abort
Stop codeset conversion.
pass
Output the undefined characters without any processing and
continue codeset conversion.
replace
Output padding characters instead of the undefined characters and
continue codeset conversion.
dismiss
Ignore the undefined characters and continue codeset conversion.
· The one-byte padding character used in EBCDIC mode
This value is meaningful when replace is chosen for the processing of
undefined characters in EBCDIC mode. Specify the padding character by
its hexadecimal value.
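The four keywords for undefined characters can be summarized with a short Python sketch. It models only the behavior described in the list above and is not the converter's actual code. The names handle_undefined and ConversionAborted are hypothetical.

```python
class ConversionAborted(Exception):
    """Hypothetical exception that stands in for the 'abort' action."""


def handle_undefined(action, raw, padding):
    """Return the bytes to output for an undefined character.

    action  -- one of the case-sensitive keywords abort, pass, replace, dismiss
    raw     -- the undefined character as it appeared in the input
    padding -- the padding character (two bytes in Kanji mode, one in EBCDIC mode)
    """
    if action == "abort":
        raise ConversionAborted(raw.hex())   # stop codeset conversion
    if action == "pass":
        return raw                           # output without any processing
    if action == "replace":
        return padding                       # output the padding character
    if action == "dismiss":
        return b""                           # ignore the character
    raise ValueError(f"unknown keyword: {action!r}")
```

For example, with replace and the Kanji-mode padding value 0xa1a1, an undefined two-byte character is output as the two bytes 0xa1 0xa1.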
When the to-code parameter for the conversion is KEIS, you can also specify
the following items for conversion behavior:
· Whether the initial shift code is output at the start of conversion if
the status of the initial mode (Kanji or EBCDIC) is different from the
mode of the first input character
The start of conversion is the time the iconv utility starts
processing, or when the iconv() function is called just after opening
the converter with iconv_open(). Keyword values for this item are yes
or no.
· Whether or not the utility outputs the last shift code when iconv() is
called with a zero length input string, and the current mode (Kanji or
EBCDIC) is different from the mode specified by the last shift state
Keyword values for this item are yes or no.
· The last status (Kanji mode or EBCDIC mode)
Specify kanji_mode or ebcdic_mode for this value. It is meaningful
only when yes is the setting for whether the utility outputs the last
shift code.
If the items that control conversion behavior are specified by both
environment variables and the profile file, values set by environment
variables override values set by comparable entries in the profile. Note
that values for all conversion control items are case-sensitive, whether
they are set by environment variables or in the profile. The following
table contains the default values for each conversion control item:
___________________________________________________
Conversion Control Item                Default Value
___________________________________________________
UDC mapping table                      None
K shift code                           0x0a42
A shift code                           0x0a41
Initial state                          ebcdic_mode
Processing for undefined characters
in Kanji mode                          abort
Processing for undefined characters
in EBCDIC mode                         pass
___________________________________________________
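To show how the shift codes and the initial state interact, the following Python sketch splits a KEIS byte stream into EBCDIC-mode and Kanji-mode segments by using the default values from the table above. It is a simplified model, not the converter's implementation. It assumes that Kanji-mode characters are always two bytes, that EBCDIC-mode characters are one byte, and that shift codes are recognized only at character boundaries. The function name split_by_mode is hypothetical.

```python
K_SHIFT = bytes.fromhex("0a42")   # default K-shift code: start of Kanji mode
A_SHIFT = bytes.fromhex("0a41")   # default A-shift code: start of EBCDIC mode


def split_by_mode(data, initial_state="ebcdic_mode",
                  k_shift=K_SHIFT, a_shift=A_SHIFT):
    """Return a list of (mode, bytes) segments with the shift codes removed."""
    segments = []
    mode = initial_state
    buf = bytearray()
    i = 0
    while i < len(data):
        if data.startswith(k_shift, i):
            new_mode, step = "kanji_mode", len(k_shift)
        elif data.startswith(a_shift, i):
            new_mode, step = "ebcdic_mode", len(a_shift)
        else:
            # Assumption: two bytes per character in Kanji mode, else one.
            step = 2 if mode == "kanji_mode" else 1
            buf += data[i:i + step]
            i += step
            continue
        if buf:                       # close the segment that just ended
            segments.append((mode, bytes(buf)))
            buf.clear()
        mode = new_mode
        i += step
    if buf:
        segments.append((mode, bytes(buf)))
    return segments
```

With the default initial state, input that begins with Kanji text must start with the K-shift code. Otherwise the first bytes are treated as EBCDIC characters.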
The default padding characters are white spaces, whose code values for each
destination codeset are noted in the following table. These padding
characters are output when you specify replace for processing of undefined
characters and do not explicitly specify the padding character.
__________________________________________________
Mode          Default Value   Destination Codeset
__________________________________________________
Kanji mode    0xa1a1          KEIS, deckanji,
                              sdeckanji, or eucJP
              0x8140          SJIS
EBCDIC mode   0x40            KEIS
              0x20            deckanji, sdeckanji,
                              eucJP, or SJIS
__________________________________________________
The default EBCDIC-ISO mapping table is as follows:
· For conversion from KEIS to other codesets:
/usr/lib/nls/loc/iconv/data/ebcdic_kana.tbl
· For conversion from other codesets to KEIS:
/usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl
These mapping tables map both EBCDIC and ISO code, which includes JIS Roman
characters. The kana_ebcdic.tbl mapping table also maps ISO lowercase
characters to EBCDIC uppercase characters.
The following default values for conversion control items are meaningful
when the iconv utility's to-code conversion parameter is KEIS:
____________________________________________
Conversion Control Item          Default
____________________________________________
Output the initial shift code?   yes
Output the last shift code?      yes
Output the last status?          ebcdic_mode
____________________________________________
Environment Variables
This section discusses the environment variables that you can set to
control conversion behavior. The names for these variables adhere to the
following format:
fromcode_tocode_controlitem
The name segments for fromcode or tocode can be one of the following key
words:
___________________________
For Codeset:      Use:
___________________________
Hitachi KEIS      KEIS
DEC Kanji         DECKANJI
Super DEC Kanji   SDECKANJI
Japanese EUC      EUCJP
Shift JIS         SJIS
___________________________
The name segments for controlitem can be one of the following keywords:
_______________________________________________________
For Control Item:                      Use:
_______________________________________________________
UDC mapping table                      UDC_TABLE
EBCDIC-ISO mapping table               EBCDIC_TABLE
K shift code                           K_SHIFT_CODE
A shift code                           A_SHIFT_CODE
Initial state                          INITIAL_STATE
Processing of undefined characters
in Kanji mode                          KANJI_EXCEPT_PROC
Processing of undefined characters
in EBCDIC mode                         EBCDIC_EXCEPT_PROC
Padding characters
in Kanji mode                          PADDING_2BYTE_CHAR
Padding characters
in EBCDIC mode                         PADDING_1BYTE_CHAR
Output initial
shift code                             INITIAL_SHIFT_CODE
Output last
shift code                             TRAILER_SHIFT_CODE
Last status                            LAST_STATE
File path of the profile               PROFILE
_______________________________________________________
Following are examples of using the setenv C shell command to define
environment variables to control conversion behavior. In these examples,
the fromcode name segment indicates Japanese EUC and the tocode name
segment indicates KEIS:
setenv EUCJP_KEIS_UDC_TABLE eucjp_keis_udc.tbl
setenv EUCJP_KEIS_EBCDIC_TABLE kana_ebcdic.tbl
setenv EUCJP_KEIS_K_SHIFT_CODE 0x0a42
setenv EUCJP_KEIS_A_SHIFT_CODE 0x0a41
setenv EUCJP_KEIS_INITIAL_STATE ebcdic_mode
setenv EUCJP_KEIS_KANJI_EXCEPT_PROC replace
setenv EUCJP_KEIS_EBCDIC_EXCEPT_PROC replace
setenv EUCJP_KEIS_PADDING_2BYTE_CHAR 0xa1a1
setenv EUCJP_KEIS_PADDING_1BYTE_CHAR 0x40
setenv EUCJP_KEIS_INITIAL_SHIFT_CODE yes
setenv EUCJP_KEIS_TRAILER_SHIFT_CODE yes
setenv EUCJP_KEIS_LAST_STATE ebcdic_mode
setenv EUCJP_KEIS_PROFILE .eucjp_keis_profile
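The naming rule fromcode_tocode_controlitem can be expressed as a small Python sketch that assembles a variable name and the matching setenv line. The function names control_variable and setenv_line are hypothetical. The keyword set is copied from the control item table in this section.

```python
# Control item keywords from the table in this section.
CONTROL_ITEMS = {
    "UDC_TABLE", "EBCDIC_TABLE", "K_SHIFT_CODE", "A_SHIFT_CODE",
    "INITIAL_STATE", "KANJI_EXCEPT_PROC", "EBCDIC_EXCEPT_PROC",
    "PADDING_2BYTE_CHAR", "PADDING_1BYTE_CHAR", "INITIAL_SHIFT_CODE",
    "TRAILER_SHIFT_CODE", "LAST_STATE", "PROFILE",
}


def control_variable(fromcode, tocode, controlitem):
    """Build a variable name such as EUCJP_KEIS_K_SHIFT_CODE."""
    if controlitem not in CONTROL_ITEMS:
        raise ValueError(f"unknown control item: {controlitem}")
    # Uppercasing turns iconv codeset names (eucJP) into keywords (EUCJP).
    return f"{fromcode}_{tocode}_{controlitem}".upper()


def setenv_line(fromcode, tocode, controlitem, value):
    """Return a C shell setenv command; the value is left untouched
    because control item values are case-sensitive."""
    return f"setenv {control_variable(fromcode, tocode, controlitem)} {value}"
```

For example, setenv_line("eucJP", "KEIS", "K_SHIFT_CODE", "0x0a42") yields the same text as the corresponding line in the examples above.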
Directory Search Path
When you specify a file name without a directory, the iconv utility
searches the following directories and uses the first file found:
1. Current directory
2. Home directory
3. The subdirectory iconv/data of the directory specified by the
environment variable LOCPATH
4. /usr/lib/nls/loc/iconv/data
5. /usr/i18n/lib/nls/loc/iconv/data
If you specify a relative directory path for a file, the utility searches
these same directories in the same order and uses the first file found.
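The search order can be sketched in Python as a list of candidate paths that are tried in turn. This is an illustration only. The function names are hypothetical, and skipping the LOCPATH entry when the variable is unset is an assumption that this reference page does not state.

```python
import posixpath


def candidate_paths(name, cwd, home, locpath=None):
    """Return the paths that would be tried, in order, for a file name
    given without a directory or with a relative directory path."""
    dirs = [cwd, home]
    if locpath:  # assumption: entry 3 is skipped when LOCPATH is not set
        dirs.append(posixpath.join(locpath, "iconv/data"))
    dirs.append("/usr/lib/nls/loc/iconv/data")
    dirs.append("/usr/i18n/lib/nls/loc/iconv/data")
    return [posixpath.join(d, name) for d in dirs]


def first_found(paths, exists):
    """Return the first path for which exists(path) is true, else None."""
    return next((p for p in paths if exists(p)), None)
```

In a real program, exists would typically be os.path.exists, and cwd and home would come from os.getcwd() and the HOME environment variable.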
Profile File
Entry lines in the profile file adhere to the following format:
entry_name string_value
The entry_name and string_value fields are separated by spaces or tabs. Do
not append a colon (:) after entry_name. The file can also include blank
lines and comment entries, which begin with the # character.
Following are the entry_name values for different conversion control items:
___________________________________________________________
Conversion Control Item                entry_name
___________________________________________________________
UDC mapping table                      udc_mapping_table
EBCDIC-ISO mapping table               ebcdic_mapping_table
K shift code                           k_shift_code
A shift code                           a_shift_code
Initial state                          initial_state
Processing undefined characters
in Kanji mode                          kanji_except_proc
Processing undefined characters
in EBCDIC mode                         ebcdic_except_proc
Padding character
in Kanji mode                          padding_2byte_char
Padding character
in EBCDIC mode                         padding_1byte_char
Output initial
shift code                             output_initial_shift_code
Output last
shift code                             output_trailer_shift_code
Last state                             last_state
___________________________________________________________
Following is a sample profile for converting from Japanese EUC to Hitachi
KEIS:
#
# sample profile for eucJP_KEIS
#
udc_mapping_table eucjp_keis_udc.tbl
ebcdic_mapping_table kana_ebcdic.tbl
k_shift_code 0x0a42 # ebcdic -> kanji
a_shift_code 0x0a41 # kanji -> ebcdic
initial_state ebcdic_mode
kanji_except_proc replace
ebcdic_except_proc replace
padding_2byte_char 0xa1a1 # kanji mode
padding_1byte_char 0x40 # ebcdic mode
output_initial_shift_code yes
output_trailer_shift_code yes
last_state ebcdic_mode
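The profile format is simple enough to read with a few lines of Python. The following sketch is not the parser that iconv uses. The function name parse_profile is hypothetical. Treating everything after a # as a comment, including on entry lines, is an assumption based on the sample above, and it means a value that contains # cannot be represented.

```python
def parse_profile(text):
    """Parse profile text into a dictionary of entry_name -> string_value."""
    entries = {}
    for number, line in enumerate(text.splitlines(), start=1):
        # Drop comments, then surrounding spaces and tabs.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue                      # blank line or comment-only line
        parts = line.split(None, 1)       # fields are separated by spaces or tabs
        if len(parts) != 2:
            raise ValueError(f"line {number}: missing string_value")
        name, value = parts
        entries[name] = value.strip()     # values stay case-sensitive
    return entries
```

Applied to the sample profile, the result maps k_shift_code to the string 0x0a42 and initial_state to ebcdic_mode. Converting hexadecimal strings to bytes is left to the caller.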
The default file names for the profile are as follows:
___________________________________________________
Code Conversion            Default Profile Name
___________________________________________________
KEIS to DEC Kanji          .keis_deckanji_profile
KEIS to Super DEC Kanji    .keis_sdeckanji_profile
KEIS to Shift JIS          .keis_sjis_profile
KEIS to Japanese EUC       .keis_eucjp_profile
DEC Kanji to KEIS          .deckanji_keis_profile
Super DEC Kanji to KEIS    .sdeckanji_keis_profile
Shift JIS to KEIS          .sjis_keis_profile
Japanese EUC to KEIS       .eucjp_keis_profile
___________________________________________________
By default, the iconv utility checks the directory search path mentioned in
the "Directory Search Path" section and uses the first profile it finds.
However, you can also specify an arbitrary file path for your profile
instead of the default names by defining the following environment
variables:
___________________________________________________________
Code Conversion            Profile Path Environment Variable
___________________________________________________________
KEIS to DEC Kanji          KEIS_DECKANJI_PROFILE
KEIS to Super DEC Kanji    KEIS_SDECKANJI_PROFILE
KEIS to Shift JIS          KEIS_SJIS_PROFILE
KEIS to Japanese EUC       KEIS_EUCJP_PROFILE
DEC Kanji to KEIS          DECKANJI_KEIS_PROFILE
Super DEC Kanji to KEIS    SDECKANJI_KEIS_PROFILE
Shift JIS to KEIS          SJIS_KEIS_PROFILE
Japanese EUC to KEIS       EUCJP_KEIS_PROFILE
___________________________________________________________
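The following Python sketch ties together the profile naming rules in this section and the precedence rule stated in the DESCRIPTION section (environment variable first, then profile entry, then default). It is a model of the documented behavior under those rules only. All function names are hypothetical, and env and profile are plain dictionaries supplied by the caller.

```python
def default_profile_name(fromcode, tocode):
    """Return the default profile name, for example .keis_eucjp_profile."""
    return f".{fromcode}_{tocode}_profile".lower()


def profile_file(fromcode, tocode, env):
    """Return the profile to use: the *_PROFILE variable if it is set,
    otherwise the default name (located through the directory search path)."""
    variable = f"{fromcode}_{tocode}_PROFILE".upper()
    return env.get(variable, default_profile_name(fromcode, tocode))


def resolve(fromcode, tocode, env_item, entry_name, env, profile, default=None):
    """Resolve one conversion control item.

    env_item   -- environment keyword, for example K_SHIFT_CODE
    entry_name -- matching profile entry, for example k_shift_code
    """
    variable = f"{fromcode}_{tocode}_{env_item}".upper()
    if variable in env:            # environment variables override the profile
        return env[variable]
    if entry_name in profile:      # profile entries override the defaults
        return profile[entry_name]
    return default
```

For example, if EUCJP_KEIS_INITIAL_STATE is set to kanji_mode while the profile contains initial_state ebcdic_mode, resolve() returns kanji_mode.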
UDC Mapping Table
Entries in a UDC mapping table adhere to the following format:
fromcode tocode
Each of these values is a two-byte hexadecimal number. In the case of Super
DEC Kanji and Japanese EUC, three-byte hexadecimal values that begin with
SS3 (0x8f), such as 0x8fxxxx, are also valid.
You can specify ranges of UDC from and to values in the same file entry by
using a hyphen to separate the codes that start and end each range:
start_fromcode-end_fromcode start_tocode-end_tocode
When specifying entries that include ranges of values, the number of codes
in the from range must always equal the number of codes in the to range. A
UDC mapping table can also include blank lines and comment lines, which
begin with the # character. Following is an example of a UDC mapping table:
# KEIS eucJP
0x81a1-0x8afe 0xf5a1-0xfefe # udc
0x8ba1-0x94fe 0x8ff5a1-0x8ffefe # udc
0x95a1-0x9afe 0x8feea1-0x8ff3fe # udc
0x9ba1-0x9bfe 0x8ff4a1-0x8ff4fe # udc
The first entry in this file specifies a range of KEIS values from 0x81a1
to 0x8afe that are mapped to Japanese EUC code values in the range 0xf5a1
to 0xfefe. You can find additional sample UDC mapping table files in the
/usr/i18n/examples/iconv/data directory.
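A UDC mapping entry can be checked with a short Python sketch like the one below. This reference page does not define how the codes in a range are counted. The sketch assumes that a range covers every combination of the row byte (second to last) and the cell byte (last) between its start and end codes, so 0x81a1-0x8afe counts as 10 rows of 94 cells. The function names are hypothetical.

```python
def parse_code(token):
    """Convert a token such as 0x81a1 or 0x8ff5a1 to bytes."""
    if not token.lower().startswith("0x"):
        raise ValueError(f"not a hexadecimal code: {token}")
    code = bytes.fromhex(token[2:])
    if len(code) not in (2, 3) or (len(code) == 3 and code[0] != 0x8F):
        raise ValueError(f"expected 2 bytes, or 3 bytes starting with SS3: {token}")
    return code


def code_count(start, end):
    """Count the codes in a range (assumption: rows times cells)."""
    if len(start) != len(end) or start[:-2] != end[:-2]:
        raise ValueError("range endpoints must have the same length and prefix")
    rows = end[-2] - start[-2] + 1
    cells = end[-1] - start[-1] + 1
    if rows < 1 or cells < 1:
        raise ValueError("range end precedes range start")
    return rows * cells


def parse_udc_line(line):
    """Return (from_start, from_end, to_start, to_end), or None for a
    blank or comment line."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    from_field, to_field = line.split()
    f = [parse_code(t) for t in from_field.split("-")]
    t = [parse_code(t) for t in to_field.split("-")]
    if len(f) > 2 or len(t) > 2:
        raise ValueError("a range has exactly one start and one end code")
    if code_count(f[0], f[-1]) != code_count(t[0], t[-1]):
        raise ValueError("from and to ranges differ in size")
    return f[0], f[-1], t[0], t[-1]
```

Under the stated counting assumption, each of the four entries in the example table above passes the size check.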
EBCDIC-ISO Mapping Table
Entries in an EBCDIC-ISO mapping table adhere to the following format:
fromcode tocode
Each code is a one-byte hexadecimal number. You can specify a range of
character codes as follows:
start_fromcode-end_fromcode start_tocode-end_tocode
When using the range format, the number of hex values in the from range
must be the same as the number of hex values in the to range.
The EBCDIC-ISO mapping table can also include blank lines and comment
entries, which begin with the # character.
Following is an example of an EBCDIC-ISO code mapping table:
# EBCDIC Kana
0x40 0x20 # space
0x4f 0x21 # '!'
0x7f 0x22 # '"'
. .
. .
. .
0xc1-0xc9 0x41-0x49 # 'A' - 'I'
0xd1-0xd9 0x4a-0x52 # 'J' - 'R'
0xe2-0xe9 0x53-0x5a # 'S' - 'Z'
. .
. .
. .
In this example, the first column contains the fromcode values and the
second column contains the tocode values. The first three entry lines
specify mappings for single characters, whereas the last three entry
lines specify mappings for ranges of characters. You can find additional
sample EBCDIC-ISO mapping tables in the /usr/i18n/lib/nls/loc/iconv/data
directory.
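Because each code in this table is one byte, a range expands to consecutive byte values. The following Python sketch expands entry lines into a lookup dictionary. It illustrates the file format only, and the function names are hypothetical. The rows of dots in the example above mark omitted entries and are not part of the file syntax, so the sketch does not accept them.

```python
def parse_byte_range(field):
    """Return the byte values named by 0xNN or 0xNN-0xMM as a range."""
    parts = [int(p, 16) for p in field.split("-")]
    if len(parts) > 2:
        raise ValueError(f"malformed range: {field}")
    start, end = parts[0], parts[-1]
    if not 0 <= start <= end <= 0xFF:
        raise ValueError(f"not a one-byte range: {field}")
    return range(start, end + 1)


def expand_entry(line):
    """Return {fromcode: tocode} for one entry line; comments and blank
    lines yield an empty dictionary."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return {}
    from_field, to_field = line.split()
    sources = parse_byte_range(from_field)
    targets = parse_byte_range(to_field)
    if len(sources) != len(targets):
        raise ValueError("from and to ranges differ in size")
    return dict(zip(sources, targets))


def load_table(text):
    """Build one mapping from all entry lines in the table text."""
    table = {}
    for line in text.splitlines():
        table.update(expand_entry(line))
    return table
```

For the entry 0xc1-0xc9 0x41-0x49, the expansion maps 0xc1 to 0x41, 0xc2 to 0x42, and so on through 0xc9 to 0x49.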
NOTES
This reference page contains code conversion specifications that apply only
to conversion between Hitachi KEIS code and the DEC Kanji, Super DEC Kanji,
Japanese EUC, and Shift JIS codesets. Refer to iconv_ibmkanji(5) for code
conversion specifications between IBM Kanji System characters and the DEC
Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to
iconv_JEF(5) for code conversion specifications between Fujitsu JEF
characters and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS
codesets. Refer to iconv_intro(5) for information about conversion between
DEC Kanji, Super DEC Kanji, Japanese EUC, Shift JIS, and other Tru64 UNIX
codesets.
SEE ALSO
Commands: iconv(1)
Functions: iconv(3), iconv_close(3), iconv_open(3)
Others: deckanji(5), eucJP(5), iconv_ibmkanji(5), iconv_intro(5),
iconv_JEF(5), Japanese(5), sdeckanji(5), SJIS(5)