iconv_KEIS(5)

NAME
  iconv_KEIS - Specification for controlling conversion between Hitachi KEIS
  and Tru64 UNIX Japanese codesets

DESCRIPTION
  The iconv utility converts character encodings between Hitachi KEIS (Kanji
  processing Extended Information System) code and one of the following Tru64
  UNIX codesets: DEC Kanji, Super DEC Kanji, Japanese EUC, or Shift JIS. You
  choose the type of conversion by specifying the appropriate values for the
  utility's from-code and to-code parameters, as follows:

  _______________________________________________
  Type of Code Conversion   from-code	to-code
  _______________________________________________
  KEIS to DEC Kanji	    KEIS	deckanji
  KEIS to Super DEC Kanji   KEIS	sdeckanji
  KEIS to Japanese EUC	    KEIS	eucJP
  KEIS to Shift JIS	    KEIS	SJIS
  DEC Kanji to KEIS	    deckanji	KEIS
  Super DEC Kanji to KEIS   sdeckanji	KEIS
  Japanese EUC to KEIS	    eucJP	KEIS
  Shift JIS to KEIS	    SJIS	KEIS
  _______________________________________________

  The following aspects of conversion behavior are controlled by environment
  variables or profile entries in the user's environment. For more
  information, see the "Environment Variables" and "Profile File" sections.

    ·  The UDC (User-Defined Character) mapping table that is used for UDC
       conversion

       This table must be an ASCII text file that contains UDC mapping
       information.  The table affects conversion of user-defined characters
       between the codesets.

    ·  The EBCDIC to/from ISO code (ASCII, JIS Roman characters) mapping
       table that is used for conversion

       This table must be an ASCII text file that contains information on how
       to map characters between EBCDIC and ISO code.

    ·  The K-shift code

       This is a one- or two-byte hexadecimal code that marks the beginning
       of Kanji mode.

    ·  The A-shift code

       This is a one- or two-byte hexadecimal code that marks the beginning
       of EBCDIC mode.

    ·  The status of the initial mode (Kanji or EBCDIC) at the time the iconv
       command starts or the first time the iconv() function is called after
       calling the iconv_open() function that initializes the converter in a
       program

       The status keywords are either kanji_mode or ebcdic_mode.

    ·  How to treat undefined characters when these are detected in Kanji
       mode

       Specify this action by using one of the following keywords:

       abort   Stop codeset conversion.

       pass    Output the undefined characters without any processing and
	       continue codeset conversion.

       replace Output padding characters instead of the undefined characters
	       and continue codeset conversion.

       dismiss Ignore the undefined characters and continue codeset
	       conversion.

    ·  The two-byte padding character used in Kanji mode

       This value is meaningful when replace is chosen for the processing of
       undefined characters in Kanji mode. Specify the padding character by
       its hexadecimal value.

    ·  How to treat undefined characters when these are detected in EBCDIC
       mode

       Specify this action by using one of the following keywords:

       abort
	   Stop codeset conversion.

       pass
	   Output the undefined characters without any processing and
	   continue codeset conversion.

       replace
	   Output padding characters instead of the undefined characters and
	   continue codeset conversion.

       dismiss
	   Ignore the undefined characters and continue codeset conversion.

    ·  The one-byte padding character used in EBCDIC mode

       This value is meaningful when replace is chosen for the processing of
       undefined characters in EBCDIC mode. Specify the padding character by
       its hexadecimal value.
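
  The role of the K-shift and A-shift codes can be illustrated with a small
  Python sketch that splits a KEIS byte string into Kanji-mode and
  EBCDIC-mode segments. The function name split_keis_modes and its
  parameters are hypothetical. The sketch assumes that a shift code is
  recognized wherever it appears at a character boundary and that Kanji-mode
  characters are always two bytes wide; the actual converter may apply
  stricter rules.

```python
def split_keis_modes(data, initial_state="ebcdic_mode",
                     k_shift=b"\x0a\x42", a_shift=b"\x0a\x41"):
    """Split KEIS bytes into (mode, payload) segments (illustrative only)."""
    segments = []
    mode = initial_state
    buf = bytearray()
    i = 0
    while i < len(data):
        if data.startswith(k_shift, i):
            new_mode, step = "kanji_mode", len(k_shift)
        elif data.startswith(a_shift, i):
            new_mode, step = "ebcdic_mode", len(a_shift)
        else:
            # Ordinary character: two bytes in Kanji mode, one in EBCDIC mode.
            width = 2 if mode == "kanji_mode" else 1
            buf += data[i:i + width]
            i += width
            continue
        # A shift code closes the current segment and switches the mode.
        if buf:
            segments.append((mode, bytes(buf)))
            buf = bytearray()
        mode = new_mode
        i += step
    if buf:
        segments.append((mode, bytes(buf)))
    return segments
```

  The defaults in the sketch are the documented default shift codes and
  initial state; pass other values to model a nondefault configuration.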

  When the to-code parameter for the conversion is KEIS, you can also specify
  the following items for conversion behavior:

    ·  Whether the initial shift code is output at the start of conversion if
       the status of the initial mode (Kanji or EBCDIC) is different from the
       mode of the first input character

       The start of conversion is the time the iconv utility starts
       processing, or when the iconv() function is called just after opening
       the converter with iconv_open(). Keyword values for this item are yes
       or no.

    ·  Whether the utility outputs the last shift code when iconv() is called
       with a zero-length input string and the current mode (Kanji or EBCDIC)
       is different from the mode specified by the last status

       Keyword values for this item are yes or no.

    ·  The last status (Kanji mode or EBCDIC mode)

       Specify kanji_mode or ebcdic_mode for this value. It is meaningful
       only when yes is the setting for whether the utility outputs the last
       shift code.
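
  The interaction of these three items can be sketched in Python as follows.
  The function frame_keis is hypothetical; it takes payloads that are
  assumed to be already converted to KEIS and only decides where shift codes
  are placed. In the real converter the last shift code is produced when
  iconv() is called with a zero-length input string; the sketch simply
  appends it at the end.

```python
def frame_keis(segments, initial_state="ebcdic_mode",
               output_initial_shift="yes", output_trailer_shift="yes",
               last_state="ebcdic_mode",
               k_shift=b"\x0a\x42", a_shift=b"\x0a\x41"):
    """Join (mode, payload) segments into one KEIS byte string (sketch)."""
    shift_for = {"kanji_mode": k_shift, "ebcdic_mode": a_shift}
    out = bytearray()
    mode = initial_state
    first = True
    for seg_mode, payload in segments:
        if seg_mode != mode:
            # Only the very first shift depends on the initial-shift item.
            if not first or output_initial_shift == "yes":
                out += shift_for[seg_mode]
            mode = seg_mode
        out += payload
        first = False
    # Trailer: shift to last_state if requested and not already in it.
    if output_trailer_shift == "yes" and mode != last_state:
        out += shift_for[last_state]
    return bytes(out)
```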

  If the items that control conversion behavior are specified by both
  environment variables and the profile file, values set by environment
  variables override values set by comparable entries in the profile. Note
  that values for all conversion control items are case-sensitive, whether
  they are set by environment variables or in the profile. The following
  table contains the default values for each conversion control item:

  ___________________________________________________
  Conversion Control Item		Default Value
  ___________________________________________________
  UDC mapping table			None
  K shift code				0x0a42
  A shift code				0x0a41
  Initial state				ebcdic_mode
  Processing for undefined characters
  in Kanji mode				abort
  Processing for undefined characters
  in EBCDIC mode			pass
  ___________________________________________________
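
  The precedence rule (environment variable, then profile entry, then
  default) can be sketched in Python as follows. The function resolve_item
  and the dictionary keys are illustrative labels chosen for this sketch,
  not names used by the converter; only the default values come from the
  table above.

```python
# Defaults taken from the table above; the UDC mapping table has none.
DEFAULTS = {
    "k_shift_code": "0x0a42",
    "a_shift_code": "0x0a41",
    "initial_state": "ebcdic_mode",
    "kanji_except_proc": "abort",
    "ebcdic_except_proc": "pass",
}


def resolve_item(item, env_values, profile_values, defaults=DEFAULTS):
    """Return the effective value of one conversion control item."""
    # Environment values win over profile values, which win over defaults.
    for source in (env_values, profile_values, defaults):
        if item in source:
            return source[item]
    return None


# Values are compared as given: they are case-sensitive.
effective = resolve_item("initial_state",
                         {"initial_state": "kanji_mode"},
                         {"initial_state": "ebcdic_mode"})
```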

  The default padding characters are space characters, whose code values for each
  destination codeset are noted in the following table. These padding
  characters are output when you specify replace for processing of undefined
  characters and do not explicitly specify the padding character.

  __________________________________________________
  Mode		Default Value	Destination Codeset
  __________________________________________________
  Kanji mode	0xa1a1		KEIS, deckanji,
				sdeckanji, or eucJP
		0x8140		SJIS
  EBCDIC mode	0x40		KEIS
		0x20		deckanji, sdeckanji,
				eucJP, or SJIS
  __________________________________________________
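
  The four processing keywords and the default padding values can be
  combined in a short Python sketch. The names handle_undefined,
  UndefinedCharacter, and DEFAULT_PADDING are hypothetical; the padding
  bytes are those listed in the table above, and the sketch assumes the
  undefined character is handed over as raw bytes.

```python
KANJI_PAD = {"KEIS": b"\xa1\xa1", "deckanji": b"\xa1\xa1",
             "sdeckanji": b"\xa1\xa1", "eucJP": b"\xa1\xa1",
             "SJIS": b"\x81\x40"}
EBCDIC_PAD = {"KEIS": b"\x40", "deckanji": b"\x20", "sdeckanji": b"\x20",
              "eucJP": b"\x20", "SJIS": b"\x20"}
DEFAULT_PADDING = {"kanji_mode": KANJI_PAD, "ebcdic_mode": EBCDIC_PAD}


class UndefinedCharacter(Exception):
    """Raised by the sketch when the action is 'abort'."""


def handle_undefined(action, raw, mode, to_code, padding=None):
    """Return the bytes to output for an undefined character."""
    if action == "abort":
        raise UndefinedCharacter(raw.hex())    # stop conversion
    if action == "pass":
        return raw                             # output unchanged
    if action == "replace":
        # Use the explicit padding character if one was specified.
        if padding is not None:
            return padding
        return DEFAULT_PADDING[mode][to_code]
    if action == "dismiss":
        return b""                             # drop the character
    raise ValueError("unknown action: " + action)
```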

  The default EBCDIC-ISO mapping table is as follows:

    ·  For conversion from KEIS to other codesets:
       /usr/lib/nls/loc/iconv/data/ebcdic_kana.tbl

    ·  For conversion from other codesets to KEIS:
       /usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl

  These mapping tables map between EBCDIC and ISO code, which includes JIS
  Roman characters. The kana_ebcdic.tbl mapping table also maps ISO lowercase
  characters to EBCDIC uppercase characters.

  The following default values for conversion control items are meaningful
  when the iconv utility's to-code conversion parameter is KEIS:

  ____________________________________________
  Conversion Control Item	   Default
  ____________________________________________
  Output the initial shift code?   yes
  Output the last shift code?	   yes
  Last status			   ebcdic_mode
  ____________________________________________

  Environment Variables

  This section discusses the environment variables that you can set to
  control conversion behavior. The names for these variables adhere to the
  following format:

       fromcode_tocode_controlitem

  The fromcode and tocode name segments must each be one of the following
  keywords:

  ___________________________
  For Codeset:	    Use:
  ___________________________
  Hitachi KEIS	    KEIS
  DEC Kanji	    DECKANJI
  Super DEC Kanji   SDECKANJI
  Japanese EUC	    EUCJP
  Shift JIS	    SJIS
  ___________________________

  The name segments for controlitem can be one of the following keywords:

  _______________________________________________________
  For Control Item:		       Use:
  _______________________________________________________
  UDC mapping table		       UDC_TABLE
  EBCDIC-ISO mapping table	       EBCDIC_TABLE
  K shift code			       K_SHIFT_CODE
  A shift code			       A_SHIFT_CODE
  Initial state			       INITIAL_STATE
  Processing of undefined characters
  in Kanji mode			       KANJI_EXCEPT_PROC
  Processing of undefined characters
  in EBCDIC mode		       EBCDIC_EXCEPT_PROC
  Padding characters
  in Kanji mode			       PADDING_2BYTE_CHAR
  Padding characters
  in EBCDIC mode		       PADDING_1BYTE_CHAR
  Output initial
  shift code			       INITIAL_SHIFT_CODE
  Output last
  shift code			       TRAILER_SHIFT_CODE
  Last status			       LAST_STATE
  File path of the profile	       PROFILE
  _______________________________________________________

  Following are examples of using the setenv C shell command to define
  environment variables to control conversion behavior. In these examples,
  the fromcode name segment indicates Japanese EUC and the tocode name
  segment indicates KEIS:

       setenv EUCJP_KEIS_UDC_TABLE eucjp_keis_udc.tbl
       setenv EUCJP_KEIS_EBCDIC_TABLE kana_ebcdic.tbl
       setenv EUCJP_KEIS_K_SHIFT_CODE 0x0a42
       setenv EUCJP_KEIS_A_SHIFT_CODE 0x0a41
       setenv EUCJP_KEIS_INITIAL_STATE ebcdic_mode
       setenv EUCJP_KEIS_KANJI_EXCEPT_PROC replace
       setenv EUCJP_KEIS_EBCDIC_EXCEPT_PROC replace
       setenv EUCJP_KEIS_PADDING_2BYTE_CHAR 0xa1a1
       setenv EUCJP_KEIS_PADDING_1BYTE_CHAR 0x40
       setenv EUCJP_KEIS_INITIAL_SHIFT_CODE yes
       setenv EUCJP_KEIS_TRAILER_SHIFT_CODE yes
       setenv EUCJP_KEIS_LAST_STATE ebcdic_mode
       setenv EUCJP_KEIS_PROFILE .eucjp_keis_profile
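
  The naming format can be expressed as a small Python helper. The function
  control_env_name is hypothetical; the keyword tables it uses are the ones
  listed in this section.

```python
# iconv codeset name -> keyword used in environment variable names
CODESET_SEGMENT = {
    "KEIS": "KEIS",
    "deckanji": "DECKANJI",
    "sdeckanji": "SDECKANJI",
    "eucJP": "EUCJP",
    "SJIS": "SJIS",
}


def control_env_name(from_code, to_code, control_item):
    """Build a fromcode_tocode_controlitem environment variable name."""
    return "_".join((CODESET_SEGMENT[from_code],
                     CODESET_SEGMENT[to_code],
                     control_item))


name = control_env_name("eucJP", "KEIS", "UDC_TABLE")
```

  For the Japanese EUC to KEIS conversion, this produces the name
  EUCJP_KEIS_UDC_TABLE used in the first setenv example.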

  Directory Search Path

  When you specify a file name without a directory, the iconv utility
  searches the following directories and uses the first file found:

   1.  Current directory

   2.  Home directory

   3.  The subdirectory iconv/data of the directory specified by the
       environment variable LOCPATH

   4.  /usr/lib/nls/loc/iconv/data

   5.  /usr/i18n/lib/nls/loc/iconv/data

  If you specify a relative directory path for a file, the utility searches
  these same directories in the same order and uses the first file found.
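
  The search order can be sketched in Python as follows. The functions
  candidate_paths and find_first are hypothetical. The sketch assumes that
  the LOCPATH entry is skipped when the variable is not set, which this
  reference page does not state.

```python
import os
import posixpath


def candidate_paths(name, cwd, home, locpath=None):
    """List the places a relative file name is looked up, in order."""
    dirs = [cwd, home]
    if locpath:
        # Assumption: skipped entirely when LOCPATH is unset.
        dirs.append(posixpath.join(locpath, "iconv", "data"))
    dirs.append("/usr/lib/nls/loc/iconv/data")
    dirs.append("/usr/i18n/lib/nls/loc/iconv/data")
    return [posixpath.join(d, name) for d in dirs]


def find_first(name, cwd, home, locpath=None, exists=os.path.isfile):
    """Return the first candidate that exists, or None."""
    for path in candidate_paths(name, cwd, home, locpath):
        if exists(path):
            return path
    return None
```

  The exists parameter is there only so that the lookup can be exercised
  without touching the file system.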

  Profile File

  Entry lines in the profile file adhere to the following format:

       entry_name	 string_value

  The entry_name and string_value fields are separated by spaces or tabs. Do
  not append a colon (:) after entry_name. The file can also include blank
  lines and comment entries, which begin with the # character.

  Following are the entry_name values for different conversion control items:

  ___________________________________________________________
  Conversion Control Item	    entry_name
  ___________________________________________________________
  UDC mapping table		    udc_mapping_table
  EBCDIC-ISO mapping table	    ebcdic_mapping_table
  K shift code			    k_shift_code
  A shift code			    a_shift_code
  Initial state			    initial_state
  Processing undefined characters
  in Kanji mode			    kanji_except_proc
  Processing undefined characters
  in EBCDIC mode		    ebcdic_except_proc
  Padding character
  in Kanji mode			    padding_2byte_char
  Padding character
  in EBCDIC mode		    padding_1byte_char
  Output initial
  shift code			    output_initial_shift_code
  Output last
  shift code			    output_trailer_shift_code
  Last state			    last_state
  ___________________________________________________________

  Following is a sample profile for converting from Japanese EUC to Hitachi
  KEIS:

       #
       #  sample profile for eucJP_KEIS
       #
       udc_mapping_table	       eucjp_keis_udc.tbl
       ebcdic_mapping_table	       kana_ebcdic.tbl
       k_shift_code		       0x0a42	       # ebcdic -> kanji
       a_shift_code		       0x0a41	       # kanji -> ebcdic
       initial_state		       ebcdic_mode
       kanji_except_proc	       replace
       ebcdic_except_proc	       replace
       padding_2byte_char	       0xa1a1	       # kanji mode
       padding_1byte_char	       0x40	       # ebcdic mode
       output_initial_shift_code       yes
       output_trailer_shift_code       yes
       last_state		       ebcdic_mode
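
  A profile in this format can be read with a few lines of Python. The
  function parse_profile is hypothetical. Based on the sample above, the
  sketch assumes that a # character also starts a comment when it follows a
  value on the same line, and that values never contain a # character.

```python
def parse_profile(text):
    """Parse profile text into a dict of entry_name -> string_value."""
    entries = {}
    for line in text.splitlines():
        # Drop comments (whole-line or trailing) and surrounding blanks.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split(None, 1)   # split on spaces or tabs
        if len(fields) != 2:
            raise ValueError("expected 'entry_name string_value': " + line)
        name, value = fields
        entries[name] = value.strip()
    return entries


sample = "\n".join([
    "#  sample profile for eucJP_KEIS",
    "udc_mapping_table\teucjp_keis_udc.tbl",
    "k_shift_code\t\t0x0a42\t# ebcdic -> kanji",
    "",
    "initial_state\t\tebcdic_mode",
])
profile = parse_profile(sample)
```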

  The default file names for the profile are as follows:

  _________________________________________________
  Code Conversion	    Default Profile Name
  _________________________________________________
  KEIS to DEC Kanji	    .keis_deckanji_profile
  KEIS to Super DEC Kanji   .keis_sdeckanji_profile
  KEIS to Shift JIS	    .keis_sjis_profile
  KEIS to Japanese EUC	    .keis_eucjp_profile
  DEC Kanji to KEIS	    .deckanji_keis_profile
  Super DEC Kanji to KEIS   .sdeckanji_keis_profile
  Shift JIS to KEIS	    .sjis_keis_profile
  Japanese EUC to KEIS	    .eucjp_keis_profile
  _________________________________________________

  By default, the iconv utility checks the directory search path mentioned in
  the "Directory Search Path" section and uses the first profile it finds.
  However, you can also specify an arbitrary file path for your profile
  instead of the default names by defining the following environment
  variables:

  ___________________________________________________________
  Code Conversion	    Profile Path Environment Variable
  ___________________________________________________________
  KEIS to DEC Kanji	    KEIS_DECKANJI_PROFILE
  KEIS to Super DEC Kanji   KEIS_SDECKANJI_PROFILE
  KEIS to Shift JIS	    KEIS_SJIS_PROFILE
  KEIS to Japanese EUC	    KEIS_EUCJP_PROFILE
  DEC Kanji to KEIS	    DECKANJI_KEIS_PROFILE
  Super DEC Kanji to KEIS   SDECKANJI_KEIS_PROFILE
  Shift JIS to KEIS	    SJIS_KEIS_PROFILE
  Japanese EUC to KEIS	    EUCJP_KEIS_PROFILE
  ___________________________________________________________

  UDC Mapping Table

  Entries in a UDC mapping table adhere to the following format:

       fromcode	     tocode

  Each of these values is a two-byte hexadecimal number. In the case of Super
  DEC Kanji and Japanese EUC, three-byte hexadecimal values that begin with
  SS3 (0x8f), such as 0x8fxxxx, are also valid.

  You can specify ranges of UDC from and to values in the same file entry by
  using a hyphen to separate the codes that start and end each range:

       start_fromcode-end_fromcode   start_tocode-end_tocode

  When specifying entries that include ranges of values, the number of codes
  in the from range must always equal the number of codes in the to range. A
  UDC mapping table can also include blank lines and comment lines, which
  begin with the # character. Following is an example of a UDC mapping table:

       # KEIS		       eucJP

       0x81a1-0x8afe	       0xf5a1-0xfefe	       # udc
       0x8ba1-0x94fe	       0x8ff5a1-0x8ffefe       # udc
       0x95a1-0x9afe	       0x8feea1-0x8ff3fe       # udc
       0x9ba1-0x9bfe	       0x8ff4a1-0x8ff4fe       # udc

  The first entry in this file specifies a range of KEIS values from 0x81a1
  to 0x8afe that are mapped to Japanese EUC code values in the range 0xf5a1
  to 0xfefe. You can find additional sample UDC mapping table files in the
  /usr/i18n/examples/iconv/data directory.
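
  The entry format and the equal-length rule for ranges can be checked with
  a Python sketch such as the following. The function names are
  hypothetical. The sketch compares range sizes by simple numeric
  difference, which is an assumption: the converter may count only valid
  code positions, although both methods agree for the sample entries above.

```python
def parse_code_range(field):
    """Parse 'start' or 'start-end' hexadecimal codes into (lo, hi)."""
    start, _, end = field.partition("-")
    lo = int(start, 16)
    hi = int(end, 16) if end else lo
    if hi < lo:
        raise ValueError("range end precedes start: " + field)
    return lo, hi


def parse_mapping_entry(line):
    """Return ((from_lo, from_hi), (to_lo, to_hi)), or None if no entry."""
    line = line.split("#", 1)[0].strip()   # drop comments
    if not line:
        return None                        # blank or comment-only line
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected 'fromcode tocode': " + line)
    src = parse_code_range(fields[0])
    dst = parse_code_range(fields[1])
    # Assumption: sizes compared numerically, not by valid code points.
    if src[1] - src[0] != dst[1] - dst[0]:
        raise ValueError("from and to ranges differ in size: " + line)
    return src, dst


entry = parse_mapping_entry("0x81a1-0x8afe\t0xf5a1-0xfefe\t# udc")
```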

  EBCDIC-ISO Mapping Table

  Entries in an EBCDIC-ISO mapping table adhere to the following format:

       fromcode	      tocode

  Each code is a one-byte hexadecimal number. You can specify a range of
  character codes as follows:

       start_fromcode-end_fromcode     start_tocode-end_tocode

  When using the range format, the number of hex values in the from range
  must be the same as the number of hex values in the to range.

  The EBCDIC-ISO mapping table can also include blank lines and comment
  entries, which begin with the # character.

  Following is an example of an EBCDIC-ISO mapping table:

       # EBCDIC		       Kana

       0x40		       0x20	       # space
       0x4f		       0x21	       # '!'
       0x7f		       0x22	       # '"'
	 .			 .
	 .			 .
	 .			 .
       0xc1-0xc9	       0x41-0x49       # 'A' - 'I'
       0xd1-0xd9	       0x4a-0x52       # 'J' - 'R'
       0xe2-0xe9	       0x53-0x5a       # 'S' - 'Z'
	 .			 .
	 .			 .
	 .			 .

  In this example, the first column contains from codes and the second
  column contains to codes.  The first three value entry lines specify
  mapping for single characters, whereas the last three value entry lines
  specify mapping for ranges of characters.  You can find additional sample
  EBCDIC-ISO mapping tables in the /usr/i18n/lib/nls/loc/iconv/data
  directory.
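
  Because every code in this table is one byte, a range expands to
  consecutive byte values. The following Python sketch loads a table in this
  format and applies it to a byte string. The functions load_byte_table and
  map_bytes are hypothetical, and the sample text contains only the entries
  shown in the example above, so most byte values are left unmapped.

```python
def load_byte_table(text):
    """Expand a one-byte mapping table into a dict of int -> int."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        src_field, dst_field = line.split()
        src = [int(code, 16) for code in src_field.split("-")]
        dst = [int(code, 16) for code in dst_field.split("-")]
        src_lo, src_hi = src[0], src[-1]
        dst_lo, dst_hi = dst[0], dst[-1]
        if src_hi - src_lo != dst_hi - dst_lo:
            raise ValueError("from and to ranges differ in size: " + line)
        for offset in range(src_hi - src_lo + 1):
            table[src_lo + offset] = dst_lo + offset
    return table


def map_bytes(data, table):
    """Translate each byte through the table (KeyError if unmapped)."""
    return bytes(table[byte] for byte in data)


sample = "\n".join([
    "# EBCDIC    Kana",
    "0x40        0x20        # space",
    "0x4f        0x21        # '!'",
    "0xc1-0xc9   0x41-0x49   # 'A' - 'I'",
    "0xd1-0xd9   0x4a-0x52   # 'J' - 'R'",
    "0xe2-0xe9   0x53-0x5a   # 'S' - 'Z'",
])
table = load_byte_table(sample)
text = map_bytes(b"\xc1\xc2\xd1\xe9\x40\x4f", table)
```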

NOTES
  This reference page contains code conversion specifications that apply only
  to conversion between Hitachi KEIS code and the DEC Kanji, Super DEC Kanji,
  Japanese EUC, and Shift JIS codesets. Refer to iconv_ibmkanji(5) for code
  conversion specifications between IBM Kanji System characters and the DEC
  Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to
  iconv_JEF(5) for code conversion specifications between Fujitsu JEF
  characters and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS
  codesets.  Refer to iconv_intro(5) for information about conversion between
  DEC Kanji, Super DEC Kanji, Japanese EUC, Shift JIS, and other Tru64 UNIX
  codesets.

SEE ALSO
  Commands: iconv(1)

  Functions: iconv(3), iconv_close(3), iconv_open(3)

  Others: deckanji(5), eucJP(5), iconv_ibmkanji(5), iconv_intro(5),
  iconv_JEF(5), Japanese(5), sdeckanji(5), SJIS(5)