sort(1)

NAME
  sort - Sorts or merges files

SYNOPSIS
  sort [-m] [-o output_file] [-Abdfinru] [-k keydef]... [-t character] [-T
  directory] [-y] [kilobytes] [-z record_size]... file...

  sort -c  [-u] [-Abdfinru] [-k keydef]... [-t character] [-T directory] [-y]
  [kilobytes] [-z record_size]... file...

  The following older syntax is now maintained for backward compatibility,
  but may be withdrawn in future issues:

  sort [-Abcdfimnru] [-o output_file] [-t character] [-T directory] [-y]
  [kilobytes] [-z record_size] [+fskip] [.cskip] [-fskip] [.cskip]
  [-bdfinr]... file...

STANDARDS
  Interfaces documented on this reference page conform to industry standards
  as follows:

  sort:	 XCU5.0

  Refer to the standards(5) reference page for more information about
  industry standards and associated tags.

OPTIONS
  The -d, -f, -i, -n, and -r options override the default ordering rules.
  When ordering options appear independent of any key field specifications,
  the requested field ordering rules are applied globally to all sort keys.
  When attached to a specific key (see -k), the specified ordering options
  override all global ordering options for that key.  In the obsolescent
  forms, if one or more of these options follows a +fskip option, it affects
  only the key field specified by that preceding option.

  -A  [Tru64 UNIX]  Sorts on a byte-by-byte basis using each character's
      encoded value.  On some systems, extended characters will be considered
      negative values, and so sort before ASCII characters.  If you are
      sorting ASCII characters in a non-C/POSIX locale, this option performs
      much faster.

  -b  Ignores leading spaces and tabs when determining the starting and
      ending positions of a restricted sort key.  If the -b option is
      specified before the first -k option, the -b option is applied to all
      -k options on the command line; otherwise, the -b option can be
      independently attached to each -k field_start or field_end argument.

  -c  Checks that the input is sorted according to the ordering rules
      specified in the options and the collating sequence of the current
      locale.  No output is produced; only the exit code is affected.

  -d  Specifies that only spaces and alphanumeric characters (according to
      the current setting of LC_CTYPE) are significant in comparisons.

  -f  Treats all lowercase characters as their uppercase equivalents
      (according to the current setting of LC_CTYPE) for the purposes of
      comparison.

  -i  Sorts only by printable characters (according to the current setting of
      LC_CTYPE).

  -k keydef
      Specifies one or more (up to 50) restricted sort key field definitions.
      This option replaces the obsolescent +fskip.cskip and -fskip.cskip
      options. A field comprises a maximal sequence of non-separating
      characters and, in the absence of the -t option, any preceding field
      separator.

      The format of a key field definition is as follows:

      field_start[type][,field_end[type]]

      The field_start and field_end arguments define a key field that is
      restricted to a portion of the line, and type is a modifier specified
      by b, d, f, i, n, r, or t.  The b modifier behaves like the -b option,
      but applies only to the field_start or field_end argument to which it
      is attached.  The t modifier indicates that the key field is processed
      as CPU time. The other modifiers behave like their corresponding
      options, but apply only to the key field to which they are attached;
      these modifiers have this effect if specified with field_start,
      field_end or both.

      Modifiers attached to a field_start or field_end argument override any
      specifications made by the options.  A missing field_end argument means
      the last character of the line.  When multiple sort keys are specified,
      it is advisable to specify a field_end argument to avoid possible
      confusion.

      The field_start portion of the keydef argument takes the following
      form:

      field_number[.first_character]

      Fields and characters within fields are numbered starting with 1. The
      field_number and first_character pieces, interpreted as positive
      decimal integers, specify the character to be used as part of a sort
      key.  If first_character is not specified, the default is the first
      character of the field.

      The field_end portion of the keydef argument takes the following form:

      field_number[.last_character]

      The field_number syntax is the same as that described for field_start.
      The last_character argument, interpreted as a nonnegative decimal
      integer, specifies the last character to be used as part of the sort
      key.  If last_character evaluates to 0 (zero) or is not specified, the
      default is the last character of the field specified by field_number.

      If -b is in effect, characters within a field are counted from the
      first nonspace character in the field.  (This applies separately to
      first_character and last_character.)

      If -k is not specified, the default sort key is the entire line.
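
      The key definitions described above can be tried out with a short
      sketch such as the following.  The sample data and the variable names
      are hypothetical, and the C locale is forced so that the ordering does
      not depend on the caller's environment.

```shell
# Hypothetical sample data: field 1 is a two-letter code, field 2 is a count.
data='xb:1
ya:2
zc:0'

# Key restricted to character 2 of field 1 (field_start 1.2, field_end 1.2).
by_second_char=$(printf '%s\n' "$data" | LC_ALL=C sort -t : -k 1.2,1.2)

# Key is the whole of field 2, compared numerically (n modifier).
by_count=$(printf '%s\n' "$data" | LC_ALL=C sort -t : -k 2,2n)

printf '%s\n' "$by_second_char" "$by_count"
```

      The first command orders the lines by the letters a, b, c in the
      second character position; the second orders them by the counts 0, 1,
      2.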

      When there are multiple key fields, later keys are compared only after
      all earlier keys compare as equal.  Except when the -u option is
      specified, lines that otherwise compare as equal are ordered as though
      none of the options -d, -f, -i, -n, or -k were present (but with -r
      still in effect, if it was specified) and with all bytes in the lines
      significant to the comparison.

      The algorithm for the -k option can be summarized as follows:

	   /*
	    * -ka.b,c.d = if d==0 then +(a-1).(b-1) -c.d
	    *		   else +(a-1).(b-1) -(c-1).d
	    */
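
      As an illustration only, the summary above can be written as a small
      shell function.  The function name k_to_old is hypothetical, and it
      assumes that all four numbers a, b, c, and d are supplied explicitly
      (an omitted b corresponds to 1 and an omitted d to 0).

```shell
# k_to_old a b c d
# Prints the obsolescent +fskip.cskip -fskip.cskip form that corresponds
# to "-k a.b,c.d", following the summary given above.
k_to_old() {
    a=$1 b=$2 c=$3 d=$4
    if [ "$d" -eq 0 ]; then
        # d == 0: the key ends at the last character of field c
        printf '+%d.%d -%d.%d\n' "$((a - 1))" "$((b - 1))" "$c" "$d"
    else
        # d > 0: the key ends d characters into field c
        printf '+%d.%d -%d.%d\n' "$((a - 1))" "$((b - 1))" "$((c - 1))" "$d"
    fi
}

k_to_old 2 1 2 0    # -k 2.1,2.0
k_to_old 1 3 1 5    # -k 1.3,1.5
```

      For these two calls the function prints +1.0 -2.0 and +0.2 -0.5,
      respectively.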

  -m  Merges only (assumes sorted input).
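
      A minimal sketch of a merge follows.  The file names a and b and the
      use of mktemp to create a scratch directory are assumptions made for
      the illustration; both input files must already be sorted.

```shell
# Create two presorted input files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'apple\ncherry\n' > "$tmpdir/a"
printf 'banana\ndate\n'  > "$tmpdir/b"

# Merge the presorted files without re-sorting them.
merged=$(LC_ALL=C sort -m "$tmpdir/a" "$tmpdir/b")
rm -rf "$tmpdir"

printf '%s\n' "$merged"
```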

  -n  Sorts any initial numeric strings (strings consisting of optional
      spaces, an optional minus sign, and zero (0) or more digits with an
      optional radix character and thousands separator, as defined by the
      current locale) by arithmetic value.  An empty digit string is treated
      as zero; leading zeros and signs on zeros do not affect ordering.  Only
      one period (.) can be used in a numeric string; any subsequent period
      (.) and all characters to its right are ignored.
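
      The difference between the default ordering and -n can be seen with a
      short sketch.  The input values are hypothetical, and the C locale is
      assumed.

```shell
# The same four lines sorted by character value and by arithmetic value.
lexical=$(printf '10\n9\n-3\n007\n' | LC_ALL=C sort)
numeric=$(printf '10\n9\n-3\n007\n' | LC_ALL=C sort -n)

printf 'lexical:\n%s\n' "$lexical"
printf 'numeric:\n%s\n' "$numeric"
```

      In the default ordering 10 precedes 9 because the character 1 sorts
      before 9; with -n the lines appear as -3, 007, 9, 10, and the leading
      zeros of 007 do not affect its position.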

  -o output_file
      Directs output to output_file instead of standard output.  The
      output_file can be the same as one of the input files.
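
      The following sketch sorts a file in place.  The file name list and
      the scratch directory created with mktemp are assumptions made for the
      illustration.  Note that redirecting standard output to the input file
      (sort list > list) is not equivalent, because the shell truncates the
      file before sort reads it.

```shell
# Build a small unsorted file in a scratch directory.
tmpdir=$(mktemp -d)
printf 'b\nc\na\n' > "$tmpdir/list"

# Sort the file onto itself; -o names the same file as the input.
LC_ALL=C sort -o "$tmpdir/list" "$tmpdir/list"

in_place=$(cat "$tmpdir/list")
rm -rf "$tmpdir"
printf '%s\n' "$in_place"
```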

  -r  Reverses the order of the specified sort.

  -t character
      Sets the field separator character to character. The character argument
      is not considered to be part of a field (although it can be included in
      a sort key).  Each occurrence of character is significant (for example,
      two consecutive occurrences of character delimit an empty field).  To
      specify the tab character as the field separator, you must enclose it
      in ' ' (single quotes).

      The default field separator is one or more spaces.
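
      As an alternative to typing a literal tab between single quotes, a
      script can place the tab character in a variable and quote the
      variable.  The sketch below takes that approach; the variable names
      and the sample data are hypothetical.

```shell
# Hold a single tab character in a variable.
tab=$(printf '\t')

# Sort tab-separated lines numerically on the second field.
by_second=$(printf 'b\t2\na\t10\nc\t1\n' | LC_ALL=C sort -t "$tab" -k 2,2n)

printf '%s\n' "$by_second"
```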

  -T directory
      [Tru64 UNIX]  Places all the temporary files that are created in
      directory.

  -u  Suppresses all but one in each set of equal lines (that is, lines
      whose sort keys match exactly).  Ignored characters such as leading
      tabs and spaces, and characters outside of sort keys are not considered
      in this type of comparison.

      If used with the -c option, -u checks that there are no lines with
      duplicate keys, in addition to checking that the input file is sorted.

  -y [kilobytes]
      [Tru64 UNIX]  Starts the sort command using kilobytes of main storage
      and adds storage as needed.  (If kilobytes is less than the minimum
      storage size or greater than the maximum, the minimum or maximum is
      used instead.)  If the -y option is omitted, the sort command starts
      with the default storage size; -y 0 starts with minimum storage, and -y
      (with no value) starts with the maximum storage.  The amount of storage
      used by the sort command has a significant impact on performance.
      Sorting a small file in a large amount of storage is wasteful.

  -z record_size
      Prevents abnormal termination if lines being sorted are longer than the
      default buffer size can handle.  When the -c or -m options are
      specified, the sorting phase is omitted and a system default size
      buffer is used.  If sorted lines are longer than this size, sort
      terminates abnormally.  The -z option specifies that the longest line
      be recorded in the sort phase so that adequate buffers can be allocated
      in the merge phase.  The record_size argument must be a value in bytes
      equal to or greater than the number of bytes in the longest line to be
      merged.

  +fskip.cskip
      Specifies the start position of a key field.  See the -k option for a
      description of the current way to perform this operation.
      (Obsolescent)

      The fskip variable specifies the number of fields to skip from the
      beginning of the input line, and the cskip variable specifies the
      number of additional characters to skip to the right beyond that point.
      For both the starting point (+fskip.cskip) and the ending point
      (-fskip.cskip) of a sort key, fskip is measured from the beginning of
      the input line, and cskip is measured from the last field skipped.  If
      you omit .cskip, .0 (zero) is assumed.  If you omit fskip, 0 (zero) is
      assumed.  If you omit the ending field specifier (-fskip.cskip), the
      end of the line is the end of the sort key.

      You can supply more than one sort key by repeating +fskip.cskip and
      -fskip.cskip.  In cases where you specify more than one sort key, keys
      specified further to the right on the command line are compared only
      after all earlier keys compare as equal.  For example, if the first key
      is to be sorted in numerical order and the second according to the
      collating sequence, all lines whose first key is the number 1 are
      ordered among themselves by the collating sequence and appear before
      the lines whose first key is the number 2.  Lines that are identical in
      all keys are sorted with all characters significant.  You can also
      specify different options for different sort keys.

  -fskip.cskip
      Specifies the end position of a key field.  See the -k option for a
      description of the current way to perform this operation.
      (Obsolescent)

DESCRIPTION
  The sort command sorts lines in its input files and writes the result to
  standard output.

  The sort command performs one of the following functions:

   1.  Sorts lines of all the named files together and writes the result to
       the specified output.

   2.  Merges lines of all the named (presorted) files together and writes
       the result to the specified output.

   3.  Checks that a single input file is correctly presorted.

  Comparisons are based on one or more sort keys extracted from each line of
  input (or the entire line if no sort keys are specified), and are performed
  using the collating sequence of the current locale.

  The sort command treats all of its input files as one file when it performs
  the sort.  A - (dash) in place of a file name specifies standard input.  If
  you do not specify a file name, it sorts standard input.
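
  The following sketch combines standard input with a named file by using -
  as one of the operands.  The file name file1 and the scratch directory
  created with mktemp are assumptions made for the illustration.

```shell
# One input comes from a file, the other from a pipe.
tmpdir=$(mktemp -d)
printf 'pear\napple\n' > "$tmpdir/file1"

# The - operand stands for standard input; both inputs are sorted together.
combined=$(printf 'mango\n' | LC_ALL=C sort - "$tmpdir/file1")
rm -rf "$tmpdir"

printf '%s\n' "$combined"
```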

  The sort command can handle a variety of collation rules typically used in
  Western European languages, including primary/secondary sorting, one-to-two
  character mapping, N-to-one character mapping, and ignore-character
  mapping.  To summarize briefly:

  Primary/Secondary Sorting

  In this system, a group of characters all sort to the same primary
  location.  If there is a tie, a secondary sort is applied.  For example, in
  French, the plain and accented a's all sort to the same primary location.
  If two strings collate to the same primary location, the secondary sort
  goes into effect.  These words are in correct French order:

       abord
       âpre
       après
       âpreté
       azur

  One-to-Two Character Mappings

  This system requires that certain single characters be treated as if they
  were two characters.  For example, in German, the ß (scharfes-S) is
  collated as if it were ss.

  N-to-One Character Mappings

  Some languages treat a string of characters as if it were one single
  collating element.  For example, in Spanish, the ch and ll sequences are
  treated as their own elements within the alphabet.  (ch comes between c and
  d in the alphabet, and ll comes between l and m.)

  Ignore-Character Mappings

  In some cases, certain characters may be ignored in collation.  For
  example, if - were defined as an ignore-character, the strings re-locate
  and relocate would sort to the same place.

  The results that you get from sort depend on the collating sequence as
  defined by the current setting of the LC_COLLATE environment variable.
  The configuration files for collation and character classification
  information are /usr/lib/nls/loc/src/locale.src.

  A field is one or more characters bounded by the beginning of a line and
  the current field separator, or one or more characters bounded by a field
  separator on either side.  The space character is the default field
  separator.

  Lines longer than 1024 bytes are truncated by sort.  The maximum number of
  fields on a line is 50.

EXIT STATUS
  The sort command returns the following exit values:

  0   All input files were output successfully, or -c was specified and the
      input file was correctly sorted.

  1   Under the -c option, the file was not ordered as specified, or if the
      -c and -u options were both specified, two input lines were found with
      equal keys.

  >1  An error occurred.
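
  A script can branch on these exit values.  The sketch below is
  illustrative only: the helper name check, the file names, and the use of
  mktemp are assumptions.  Diagnostic output is discarded because some
  implementations write a message to standard error when a -c check fails.

```shell
# Build three small test files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/sorted"
printf 'b\na\nc\n' > "$tmpdir/unsorted"
printf 'a\nb\nb\n' > "$tmpdir/dups"

# check [options] file: print the exit status of sort with those arguments.
check() {
    status=0
    LC_ALL=C sort "$@" >/dev/null 2>&1 || status=$?
    echo "$status"
}

st_sorted=$(check -c "$tmpdir/sorted")        # correctly sorted input
st_unsorted=$(check -c "$tmpdir/unsorted")    # input out of order
st_dups=$(check -c "$tmpdir/dups")            # sorted, duplicates allowed
st_dups_unique=$(check -c -u "$tmpdir/dups")  # sorted, duplicates rejected
rm -rf "$tmpdir"

echo "$st_sorted $st_unsorted $st_dups $st_dups_unique"
```

  With the files shown, the four values are 0, 1, 0, and 1.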

EXAMPLES
  The following examples apply to the C locale, unless it is specifically
  stated otherwise.

   1.  To perform a simple sort, enter:
	    sort fruits

       This displays the contents of fruits sorted in ascending lexicographic
       order.  This means that the characters in each column are compared one
       by one, including spaces, digits, and special characters.

       For instance, if fruits contains the text:

	    banana
	    orange
	    Persimmon
	    apple
	    %%banana
	    apple
	    ORANGE

       Then sort fruits displays:
	    %%banana
	    ORANGE
	    Persimmon
	    apple
	    apple
	    banana
	    orange

       This order follows from the fact that in the ASCII collating sequence,
       symbols (such as %) precede uppercase letters, and all uppercase
       letters precede the lowercase letters. If you are using a different
       collating order, your results may be different.

   2.  To group lines that contain uppercase and special characters with
       similar lowercase lines, and remove duplicate lines, enter:
	    sort -d -f -u fruits

       The -u option tells sort to remove duplicate lines, making each line
       of the file unique.  This displays:
	    apple
	    %%banana
	    orange
	    Persimmon

       Not only was the duplicate apple removed, but banana and ORANGE were
       removed as well. The -d option told sort to ignore symbols, so
       %%banana and banana were considered to be duplicate lines and banana
       was removed.  The -f option told sort not to differentiate between
       uppercase and lowercase, so ORANGE and orange were considered to be
       duplicate lines and ORANGE was removed.

       When the -u option is used with input that contains nonidentical lines
       that are considered by sort (due to other options) to be duplicates,
       there is no way to predict which lines sort will keep and which it
       will remove.

   3.  To sort as in Example 2, but remove duplicates unless capitalized or
       punctuated differently, enter:
	    sort -u -k 1df -k 1 fruits

       Options appearing between sort key specifiers apply only to the
       specifier preceding them.  There are two sorts specified in this
       command line. The -k 1df argument specifies the first sort, of the
       same type done with -d -f in Example 2.  Then -k 1 performs another
       comparison to distinguish lines that are not actually identical.  This
       prevents -u, which applies to both sorts because it precedes the first
       sort key specifier, from removing lines that are not exactly identical
       to other lines.

       Given the fruits file shown in Example 1, the added -k 1 distinguishes
       %%banana from banana and ORANGE from orange. However, the two
       instances of apple are exactly identical, so one of them is deleted.
	    apple
	    %%banana
	    banana
	    ORANGE
	    orange
	    Persimmon

   4.  To specify a new field separator, enter:
	    sort -t : -k 2 vegetables

       This sorts vegetables, comparing the text that follows the first colon
       on each line.  The -t : option tells sort that colons separate fields.
       The -k 2 argument tells sort to ignore the first field and to compare
       from the start of the second field to the end of the line.  If
       vegetables contains:

	    yams:104
	    turnips:8
	    potatoes:15
	    carrots:104
	    green beans:32
	    radishes:5
	    lettuce:15

       then sort -t : -k 2 vegetables displays:
	    carrots:104
	    yams:104
	    lettuce:15
	    potatoes:15
	    green beans:32
	    radishes:5
	    turnips:8

       The numbers are not in ascending order. This is because a
       lexicographic sort compares each character from left to right.  In
       other words, 3 comes before 5 so 32 comes before 5.

   5.  To sort on more than one field, enter:
	    sort -t : -k 2n -k 1r vegetables

       This performs a numeric sort on the second field (-k 2n) and then,
       within that ordering, sorts the first field in reverse collating order
       (-k 1r).  The output looks like this:
	    radishes:5
	    turnips:8
	    potatoes:15
	    lettuce:15
	    green beans:32
	    yams:104
	    carrots:104

       The lines are sorted in numeric order; when two lines have the same
       number, they appear in reverse collating order.

   6.  To replace the original file with the sorted text, enter:
	    sort -o vegetables vegetables

       The -o vegetables option stores the sorted output into the file
       vegetables.

   7.  To collate using Spanish rules, first set the LC_COLLATE (or LANG)
       environment variable to a Spanish locale, and then use sort in the
       regular way.  For example, enter:
	    sort sp.words

       If an input file named sp.words contains the following Spanish words:

	    dama
	    loro
	    chapa
	    canto
	    mover
	    chocolate
	    curioso
	    llanura

       The sorted file looks like this:
	    canto
	    curioso
	    chapa
	    chocolate
	    dama
	    loro
	    llanura
	    mover

       If you sort the file in the default C locale, the output looks like
       this:
	    canto
	    chapa
	    chocolate
	    curioso
	    dama
	    llanura
	    loro
	    mover
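
  Examples 4 and 5 can be reproduced with a short script such as the
  following sketch.  It recreates the vegetables file from Example 4 in a
  scratch directory (the use of mktemp and the variable names are
  assumptions) and forces the C locale so that the results match the
  listings shown above.

```shell
# Recreate the vegetables file used in Examples 4 and 5.
tmpdir=$(mktemp -d)
cat > "$tmpdir/vegetables" <<'EOF'
yams:104
turnips:8
potatoes:15
carrots:104
green beans:32
radishes:5
lettuce:15
EOF

# Example 4: lexicographic comparison of the text after the first colon.
lexical=$(LC_ALL=C sort -t : -k 2 "$tmpdir/vegetables")

# Example 5: numeric sort on field 2, then field 1 in reverse order.
numeric=$(LC_ALL=C sort -t : -k 2n -k 1r "$tmpdir/vegetables")
rm -rf "$tmpdir"

printf 'Example 4:\n%s\n\nExample 5:\n%s\n' "$lexical" "$numeric"
```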

ENVIRONMENT VARIABLES
  The following environment variables affect the execution of sort:

  LANG
      Provides a default value for the internationalization variables that
      are unset or null. If LANG is unset or null, the corresponding value
      from the default locale is used.  If any of the internationalization
      variables contain an invalid setting, the utility behaves as if none of
      the variables had been defined.

  LC_ALL
      If set to a non-empty string value, overrides the values of all the
      other internationalization variables.

  LC_CTYPE
      Determines the locale for the interpretation of sequences of bytes of
      text data as characters (for example, single-byte as opposed to
      multibyte characters in arguments) and the behavior of character
      classification for the -b, -d, -f, -i, and -n options.

  LC_MESSAGES
      Determines the locale for the format and contents of diagnostic
      messages written to standard error.

  NLSPATH
      Determines the location of message catalogues for the processing of
      LC_MESSAGES.

FILES
  /usr/lib/nls/loc/src/locale.src
      Configuration files

SEE ALSO
  Commands:  comm(1), join(1), uniq(1)

  Functions:  setlocale(3), tolower(3)

  Files:  locale(4)

  Standards:  standards(5)