iconv_intro(5)

NAME
  iconv_intro, iconv - Introduction to codeset conversion

DESCRIPTION
  Converting character data from one coded character set (codeset) to
  another is an operation that the operating system and some applications
  often must perform. For example, the man command supports codeset
  conversion so that one set of reference page files can meet the needs of
  locales that support the same language and territory but different codesets
  (see man(1)).

  The following commands and library interfaces give users and application
  developers direct access to codeset conversion operations:

    ·  The iconv command converts characters in a data file from one codeset
       to another (see iconv(1)).

    ·  The iconv(), iconv_open(), and iconv_close() functions convert a
       string of characters from one codeset to another (see iconv(3),
       iconv_open(3), and iconv_close(3)).  The iconv command uses these
       interfaces to convert characters.

  There are two types of codeset converters: algorithmic and table.
  Algorithmic converters, which reside in the /usr/lib/nls/loc/iconv
  directory, are shared libraries with a predefined entry point for
  invocation by functions in the libiconv.so library. Algorithmic converters
  are needed for multibyte codesets, partly because table converters cannot
  handle the number of character values these codesets contain and partly
  because some of these codesets require complex handling (see NOTES).
  Algorithmic converters are supplied as part of the operating system
  product; the internal interfaces that they require are not published for
  external use.

  Table converters, which reside in the /usr/lib/nls/loc/iconvTable
  directory, can be created by using the genxlt command (see genxlt(1)).
  These converters support only single-byte codesets, which have at most 256
  encoded character values.

  Names of codeset converters are in the following form:

  from-codeset_to-codeset

  For example, the following converter converts values from Super DEC Kanji
  to Japanese Extended UNIX Code:

  sdeckanji_eucJP

  The codeset converters produce an invalid character error in response to
  characters that cannot be converted from the source codeset to the
  destination codeset. This error is always produced for character codes that
  are invalid in the source codeset. However, if the error results from
  characters that are valid in the source codeset but have no counterparts in
  the destination codeset, you can eliminate the error by defining the
  ICONV_DEFSTR environment variable to specify a substitute output string.
  See the ENVIRONMENT VARIABLES section for more information about using the
  ICONV_DEFSTR variable.

  It is possible to convert data directly between two codesets or by way of
  an intermediate codeset, such as UTF-16, UCS-4, or UTF-8. For conversion of
  Chinese characters, be aware that the results of converting a Traditional
  Chinese codeset directly to a Simplified Chinese codeset may not be the
  same as the results of converting Traditional Chinese first to UTF-16,
  UCS-4, or UTF-8 and then to Simplified Chinese.

ENVIRONMENT VARIABLES
  Some codeset converters require more complex algorithms than can be
  provided through tables. The following environment variables provide
  control over conversion behavior for different kinds of codeset converters:

  ICONV_ACTION
      Controls how converters from Traditional Chinese (except Traditional
      Chinese encoded in Telecode) to Simplified Chinese handle characters
      that have more than one possible mapping value. The valid settings for
      this environment variable are as follows:

      batch
	  Specifies that the preferred mapping value (the first one in the
	  one-to-many mapping list) is always taken.  The batch setting is
	  the ICONV_ACTION default.

      conv_all
	  Specifies that all the possible values are printed to the standard
	  output, enclosed by braces ({ }), so that the user can later
	  manually edit the converted file and select the one to use.

      conv_all_nosym
	  Specifies that all the possible values are printed to the standard
	  output except for punctuation symbols, for which only the preferred
	  mapping value is printed. As with conv_all, the
	  conv_all_nosym setting prints value choices enclosed by braces so
	  that the converted file can later be edited.

  ICONV_BYTEORDER
      Sets byte ordering for UTF-16 or UCS-4 (UTF-32) converters only. Valid
      values are little-endian or big-endian.

      If ICONV_NOBOM is set to a non-null value, the default byte ordering is
      big-endian. If ICONV_NOBOM is not set, the default byte ordering is
      little-endian.  Setting the ICONV_BYTEORDER and ICONV_NOBOM environment
      variables may be necessary when producing UTF-16 or UCS-4 output that
      will be processed by codeset converters on platforms other than Tru64
      UNIX.

  ICONV_DEFSTR[_from-codeset_to-codeset]
      Defines the default string to be substituted in output for valid input
      characters that cannot be converted from the source codeset to the
      destination codeset. The variable value can be an arbitrary string or a
      code number. If the value is a code number (for example, 10, 07, 0x10,
      or, for Unicode converters, U+1234), the corresponding character in the
      output codeset (to-codeset) is printed.

      For a given conversion, a variable named
      ICONV_DEFSTR_from-codeset_to-codeset, where from-codeset_to-codeset is
      the name of the codeset converter to which the variable applies, takes
      precedence over ICONV_DEFSTR without the suffix. The unsuffixed
      ICONV_DEFSTR variable is used only when no variable of the form
      ICONV_DEFSTR_from-codeset_to-codeset has been defined for the
      converter in use.

      If these variables are not defined or are set to the null string, the
      characters that cannot be converted are skipped and have no
      representation in converted output.

      The following converter-specific restrictions apply to ICONV_DEFSTR*
      variables:

	·  ICONV_DEFSTR* environment variables do not work for converters
	   that convert between Japanese codesets or between Korean codesets.

	·  For converters that handle UTF-16, UCS-4 or UTF-8 format, the only
	   valid variable value is a code number (such as U+1234 or 0x10) or
	   a string whose value is a single ASCII character (such as ?). For
	   these converters, any string value other than a single ASCII
	   character is ignored and any characters that cannot be converted
	   have no representation in output.

	·  For converters that handle output in UTF-16, UCS-4 or UTF-8
	   format, characters that cannot be converted and for which no valid
	   ICONV_DEFSTR* value has been defined produce an error condition
	   that aborts the conversion process.

  ICONV_NOBOM
      Disables generation of the byte-order mark at the beginning of UTF-16
      or UCS-4 (UTF-32) output. A valid setting is any value other than a
      null string. If ICONV_NOBOM is set, big-endian is established as the
      default byte ordering and BOM generation is disabled. If ICONV_NOBOM is
      not set, little-endian is established as the default byte ordering and
      BOM generation is enabled.

      Codeset converters that process UTF-16 or UCS-4 data on platforms other
      than Tru64 UNIX usually require the byte-order mark. The ICONV_NOBOM
      and ICONV_BYTEORDER environment variables provide you with the means to
      control the generation of a byte-order mark and byte ordering. Thus,
      you can establish codeset conversion that is appropriate to the
      requirements of other platforms or is compatible with output produced
      by codeset converters that were included in versions of Tru64 UNIX
      prior to Version 4.0D.

  ICONV_PHRCONV
      Activates phrase conversion for converters that convert from a
      Traditional Chinese codeset (except for Traditional Chinese encoded in
      Telecode) to a Simplified Chinese codeset or the reverse. When phrase
      conversion is activated, a whole phrase in Traditional Chinese is
      converted to a different phrase in Simplified Chinese or the reverse.

      If ICONV_PHRCONV is set to mark, the converted phrases are bracketed
      by [ and ] to highlight the conversion result for visual checking.

      The phrase conversion databases in the /usr/share/phrdb directory are
      normal text files with the same file names as those of the algorithmic
      converters in /usr/lib/nls/loc/iconv/*.  These phrase conversion
      databases contain entries for phrase conversion pairs.

FILES
  /usr/lib/nls/loc/iconv/*
      Algorithmic converters

  /usr/lib/nls/loc/iconvTable/*
      Table converters

  /usr/share/phrdb/*
      Phrase conversion databases

SEE ALSO
  Commands: genxlt(1), iconv(1), phrase(1)

  Functions: iconv(3), iconv_close(3), iconv_open(3)

  Others: i18n_intro(5), l10n_intro(5)