8    Document Data Filtering

Often, document data needs to be modified before it can be printed. For example, simple text documents need to be translated into PostScript before they can be printed on a PostScript printer. Or, documents using the EBCDIC character set might need to be converted to the ASCII character set before they can be printed on common desktop printers.

Because the need for document data modification varies by customer and country, the supervisor includes a mechanism for user-written or platform-supplied programs to modify document data before it is sent to a printer. These programs are known as filters or data type translators and can be applied to documents printed as part of a job.

Filters are executed by the supervisor in a child process. The supervisor pipes the document data to the filter, reads the filter's output back, and then sends that output to the printer. The supervisor, not the filter, handles all communication with and control of the printer.
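Because the supervisor talks to a filter only through a pipe, a filter is essentially a program that reads document data from standard input and writes the result to standard output. The following is a minimal sketch in POSIX shell; the function name strip_cr_filter and the transformation it performs (removing carriage returns from DOS-style text) are hypothetical and not part of Advanced Printing Software.

```shell
#!/bin/sh
# Hypothetical modification filter: convert DOS (CR LF) line endings to LF.
# The supervisor writes document data to the filter's standard input and
# reads the modified data back from its standard output.
strip_cr_filter() {
    # tr reads stdin and writes stdout; it opens no files or devices.
    tr -d '\r'
}

# A standalone filter script would end by invoking: strip_cr_filter
```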

The information in this chapter applies only to physical printers supported by the supervisor, pdspvr. The Outbound Gateway Supervisor does not perform data filtering. It relies on a remote host or printer to perform filtering tasks.

8.1    Types of Filters

Advanced Printing Software supports translation filters and modification filters. Translation filters perform the following functions:

Modification filters perform the following functions:

There is no difference in how the two types of filter programs are written, and the supervisor does not verify that they are used properly. Both modification and translation filtering can be applied to a document. When this occurs, the modification filter receives the original document data, the output of the modification filter is piped to the translation filter, and the output from the translation filter is sent by the supervisor to the printer.
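The ordering rule above can be sketched with two stand-in filters. Both functions are hypothetical: modify only changes letter case, and translate wraps each line in a PostScript-like show call purely for illustration; its output is not a valid PostScript program.

```shell
#!/bin/sh
# Hypothetical modification filter: changes content, not format.
modify() {
    tr '[:lower:]' '[:upper:]'
}

# Hypothetical translation filter: wraps each line as "(line) show".
# Illustration only; the output is not a complete PostScript program.
translate() {
    sed 's/^/(/; s/$/) show/'
}

# Original data -> modification filter -> translation filter -> printer.
run_filters() {
    modify | translate
}
```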

The supervisor cannot completely control what the filter does. A filter should not, for example, write to a file or directly to a device, but there is nothing the supervisor can do to prevent this.

8.1.1    Filter-Related Attributes

The following attributes provide information about filters:

filter-definition Server Attribute

The server filter-definition attribute defines a program as a filter and contains the information needed to invoke the program. The filter-definition attribute is a complex attribute with the syntax:

filter-definition={name type input-format output-format command}

In addition, the filter-definition attribute is multivalued. You can define any number of named filters.

Each component field of the attribute value is described in the following table.

Field           Value                         Description
name            text                          The name of the filter must be unique within the server. The print system uses the name as a search key for new filter definitions.
type            translation or modification   Type of filter. Defines the mechanism used to invoke the filter. The default is translation.
input-format    Document format               The document format the filter supports as input. If omitted, the filter can take any format as input. Used only for translation filter invocation.
output-format   Document format               The document format the filter produces on output.
command         text                          The command that the server executes to invoke the filter.
     

A filter must be defined in the supervisor before it can be used. An administrator defines filters by setting the filter-definition attribute with the pdset command. For example, the following command line adds a simple-text-to-PostScript translation filter to the list of filters known to the supervisor:

# pdset -c server -x "filter-definition+={name=my-text-to-ps \
  type=translation input-format=simple-text \
  output-format=PostScript command='/usr/bin/ttp'}" \
  blue_sup 

Once the filter-definition attribute has a value, more filters are added using the += syntax. To remove one filter while retaining others, use the -= syntax and express all five fields exactly. To remove all filter definitions, use the == syntax as follows:

# pdset -c server -x filter-definition== blue_sup
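Because the -= form must repeat all five fields exactly, one way to avoid mismatches is to build the value once and reuse it for both the add and the remove. This is a sketch under that assumption; filter_def is a hypothetical helper, and the pdset lines are shown as comments because they need a running supervisor.

```shell
#!/bin/sh
# Hypothetical helper: print a filter-definition value from its five fields.
# Usage: filter_def NAME TYPE INPUT-FORMAT OUTPUT-FORMAT COMMAND
filter_def() {
    printf "{name=%s type=%s input-format=%s output-format=%s command='%s'}" \
        "$1" "$2" "$3" "$4" "$5"
}

def=$(filter_def my-text-to-ps translation simple-text PostScript /usr/bin/ttp)

# Add the filter, and later remove exactly the same definition:
#   pdset -c server -x "filter-definition+=$def" blue_sup
#   pdset -c server -x "filter-definition-=$def" blue_sup
```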

Important Security Note

Always specify a command executable that can only be replaced or modified by the root account. Specifying a filter program that resides in the directory of a nonprivileged user constitutes a serious security risk.
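One way to act on this note before defining a filter is to confirm that both the program and its directory are owned by root and are not writable by group or other users. The sketch below assumes a POSIX find that supports -L and symbolic -perm modes; check_filter_cmd and root_only are hypothetical helpers, and the check is not exhaustive (it ignores ACLs and directories higher up the path).

```shell
#!/bin/sh
# Succeeds only if the path exists, is owned by root, and is writable by
# neither its group nor other users (symbolic links are followed).
root_only() {
    [ -n "$(find -L "$1" -prune -user root ! -perm -g+w ! -perm -o+w 2>/dev/null)" ]
}

# Check the program and the directory that contains it, since write access
# to the directory would let another user replace the program.
check_filter_cmd() {
    root_only "$1" && root_only "$(dirname "$1")"
}
```

For example, check_filter_cmd /usr/pd/bin/trn_textps should succeed on a correctly installed system, while any program kept under a user's home directory should fail.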

excluded-filters Printer Attribute

Use the printer excluded-filters attribute to disallow the use of certain translation filters for a particular printer. The value of excluded-filters is a list of filter names. When the supervisor chooses a translation filter for documents directed to the printer, it excludes any filter listed in this attribute. The supervisor does not verify that the names on the excluded-filters list are actually defined filters, and it does not update the excluded-filters attribute when filters are removed from the filter-definition list.
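The exclusion step can be pictured as a pass over an ordered list of candidate translation filters. This sketch assumes, hypothetically, that the supervisor takes the first acceptable candidate; this chapter does not state how candidates are ordered. Excluding a name that is not defined is harmless, which matches the behavior described above.

```shell
#!/bin/sh
# Hypothetical: print the first candidate filter not named in excluded-filters.
# $1 = space-separated excluded-filters value; remaining args = candidates.
choose_filter() {
    excluded=$1
    shift
    for f in "$@"; do
        case " $excluded " in
            *" $f "*) ;;                          # excluded: skip it
            *) printf '%s\n' "$f"; return 0 ;;    # first usable candidate
        esac
    done
    return 1                                      # no usable translation filter
}
```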

modification-filter Document Attribute

Users specify the modification-filter document attribute to apply a modification filter to documents in a job. The value of this attribute is the name of a filter to be applied to the document data prior to any translation filtering. The print system does not verify that the filter specified is known to the supervisor when the document is submitted.

translation-filter Document Attribute

Users can specify the translation-filter document attribute to override the automatic invocation of a translation filter when more than one filter is available that can perform the specified translation. If this attribute is specified, the value of this attribute is the name of a filter that is applied to the document data regardless of the value of the document-format and native-document-formats-ready attributes.

no-filtering Document Attribute

Users can specify the Boolean document attribute, no-filtering, to disable both translation and modification filtering. If the no-filtering attribute is true, the server invokes no translation filters and ignores the value of the modification-filter attribute.
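Taken together, the three document attributes above determine which filters run and in what order. The following sketch models that decision; the shell variable names stand in for the attributes, and auto_translation is a hypothetical placeholder for whatever filter the supervisor would otherwise select automatically.

```shell
#!/bin/sh
# Hypothetical: print the filters that would run, in pipeline order.
# Inputs (shell variables standing in for document attributes):
#   no_filtering, modification_filter, translation_filter, auto_translation
plan_filters() {
    if [ "$no_filtering" = true ]; then
        return 0    # no-filtering=true: no translation, modification ignored
    fi
    # A requested modification filter always runs first.
    [ -n "$modification_filter" ] && printf '%s\n' "$modification_filter"
    # An explicit translation-filter overrides automatic selection.
    if [ -n "$translation_filter" ]; then
        printf '%s\n' "$translation_filter"
    elif [ -n "$auto_translation" ]; then
        printf '%s\n' "$auto_translation"
    fi
    return 0
}
```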

8.1.2    Command Text Processing

The command field of the filter-definition attribute contains the command that the supervisor executes to invoke the filter. This command field can contain variables that name attribute values. The supervisor replaces the variables with corresponding attribute values. The syntax for a substitution field is:

${attribute-name,[default-value],[substitution-expression]}

Items in square brackets are optional. The default-value and substitution-expression fields can be empty strings. The attribute name can be any of the document attributes listed in Table 8-1. Attribute names and values cannot be abbreviated.

If the attribute has a value, the supervisor replaces the substitution field with substitution-expression if one is specified, and otherwise with the attribute value. If the attribute has no value and default-value is specified, the supervisor replaces the substitution field with default-value. If the attribute has no value and a default value is not specified, the supervisor replaces ${...} with an empty string.

Examples:

  1. "-N${number-up,0}"

    Evaluates to "-N2" if number-up has the value "2".

  2. "-N${number-up,0}"

    Evaluates to "-N0" if number-up is not defined.

  3. "${number-pages,,-P}"

    Evaluates to "-P" if number-pages is defined, but to an empty string ("") if number-pages is not defined.
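POSIX shell parameter expansion has close analogues of these rules: ${var:-default} yields a default when the variable is unset or empty, and ${var:+word} yields word only when the variable has a value. The sketch below reproduces the three examples that way. It is an analogy only, not how the supervisor is implemented; because hyphens are not valid in shell variable names, number-up and number-pages become number_up and number_pages.

```shell
#!/bin/sh
# Analogy only: POSIX parameter expansion reproducing the documented results.
# An unset or empty shell variable stands in for "the attribute has no value".
substitution_examples() {
    # Examples 1 and 2: ${number-up,0}  ~  ${number_up:-0}
    printf '%s\n' "-N${number_up:-0}"
    # Example 3: ${number-pages,,-P}  ~  ${number_pages:+-P}
    printf '%s\n' "${number_pages:+-P}"
}
```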

Nested Evaluation

In some instances, it is necessary to include the value of one or more attributes in the substitution-expression field. This is done by including attribute substitution arguments within the substitution-expression field.

Examples:

  1. "${number-up,,-N${number-up}}"

    Evaluates to an empty string if number-up is not defined, or -N2 if number-up is defined with the value "2".

  2. "${top-margin,,-M${top-margin},${left-margin},${right-margin},${bottom-margin}}"

    Evaluates to "-M4,0,0,4" when top-margin=4, left-margin=0, right-margin=0, and bottom-margin=4.
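The same shell analogy extends to nested evaluation, because the word in ${var:+word} may itself contain expansions. As before, this only illustrates the documented results, with hyphens replaced by underscores in the variable names.

```shell
#!/bin/sh
# Analogy only: nested substitution mirrored with ${var:+word}, where the
# word itself contains further expansions.
nested_examples() {
    # ${number-up,,-N${number-up}}
    printf '%s\n' "${number_up:+-N$number_up}"
    # ${top-margin,,-M${top-margin},${left-margin},${right-margin},${bottom-margin}}
    printf '%s\n' "${top_margin:+-M$top_margin,$left_margin,$right_margin,$bottom_margin}"
}
```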

Table 8-1 lists the attributes that can be used in command substitution fields. The print system supports some attributes that are used primarily with simple-text documents. These attributes include: bottom-margin, footer-text, header-text, left-margin, length, number-pages, repeated-tab-stops, right-margin, top-margin, width, and content-orientation.

OID (Object Identifier) denotes a standardized value. NameOrOid indicates that the attribute can have either a standard OID value or a site-specific name value.

Table 8-1:  Document Attributes Used in Command Substitution Fields

Attribute Name          Syntax            Description
bottom-margin           Integer           Distance, in characters, between bottom edge of page and bottom of text area.
content-orientation     Oid               Portrait or landscape.
default-character-set   NameOrOid         The character set name of the document.
default-font            Text              A font name.
default-medium          Oid-name or Text  Requested media name.
document-format         Oid               The document's page description language.
document-length         Integer           Length, in characters, of a formatted page.
document-name           Text              The document or file name.
footer-text             Text              The footer line of each page.
header-text             Text              The header line of each page.
left-margin             Integer           Distance, in characters, between the left edge of the logical page and left edge of the text area.
number-pages            Boolean           Indicates whether or not to number the pages.
number-up               Integer           Number of page images per sheet; OID values are converted to their integer equivalents.
page-select             Integer           One or more page ranges separated by commas.
plex                    Oid-name          Simplex, duplex, or tumble.
repeated-tab-stops      Integer           Number of characters between tab stops.
right-margin            Integer           Distance, in characters, between the right edge of the page and the right edge of the text area.
top-margin              Integer           Distance, in characters, between the top of the page and the top of the text area.
width                   Integer           Maximum line width in characters.

For simplicity, Table 8-1 excludes attributes intended to control the printer (such as sides), attributes with complex syntax, and multivalued attributes (such as explicit-tab-stops).

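For example, suppose the command for a translation filter is /usr/pd/my-filter -d${document-format}, the command for a modification filter is /usr/pd/your-filter -o${content-orientation} -n${number-up}, and a user requests both modification and translation for a simple-text document with content-orientation=portrait and number-up=2. Because the modification filter receives the original data and its output is piped to the translation filter, the supervisor executes a child process with a command such as the following:

/usr/pd/your-filter -oportrait -n2 | /usr/pd/my-filter -dsimple-text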

8.1.3    Invoking a Filter

The rules the supervisor uses to invoke a filter are the following:

8.1.4    Error Handling

In general, errors that occur while setting up, invoking, or executing a filter result in the job (not just the document) being aborted. Some of the conditions that result in an aborted job include:

The supervisor notifies the user of these conditions by way of event notification (job-aborted-by-server), through messages stored in the job-state-message attribute, and through an error page that is printed on the target physical printer. When an error occurs, the job is put into the retained state on the spooler.
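The exact error conditions are not listed here, so the following rests on an assumption: a common convention for Unix filters is to report problems on standard error and exit with a nonzero status. This sketch follows that convention for a hypothetical filter that rejects an empty document; that the supervisor treats a nonzero exit status as a job-aborting error is assumed, not stated in this section.

```shell
#!/bin/sh
# Hypothetical modification filter that refuses an empty document.
# ASSUMPTION: a nonzero exit status is treated by the supervisor as a
# filter execution error. Lines pass through unchanged (awk ensures each
# output line ends with a newline).
nonempty_filter() {
    awk '
        { print }
        END {
            if (NR == 0) {
                print "filter: empty document" > "/dev/stderr"
                exit 2
            }
        }'
}
```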

8.1.5    Creating a Filter Program

Filter programs must adhere to the following rules:

8.2    Using the Text-to-PostScript Translation Filter

Advanced Printing Software includes one translation filter. This program translates simple text documents to PostScript and, optionally, performs number-up processing. The installation procedure installs this filter as /usr/pd/bin/trn_textps.

Simple text documents sent to printers that handle only the PostScript language need to be translated to PostScript. This translation occurs when the document's document-format attribute value is simple-text and the physical printer's native-document-formats-ready attribute value is PostScript. If other formats, in particular PCL, are specified by the native-document-formats-ready attribute, the supervisor sends the data directly to the printer.

The print system software includes a command script, /usr/pd/scripts/pd_get_started, that automatically configures the text-to-PostScript translation filter when you create a supervisor.

Table 8-2 lists all the command options supported by the text-to-PostScript translator program. Administrators can set up the filter-definition attribute with command option substitutions that relate print system attributes to translation options.

The following example shows how the command options are used.

# pdset -c server \
-x filter-definition=\
'{name=text-to-ps \
type=translation \
input-format=simple-text \
output-format=PostScript \
command="/usr/pd/bin/trn_textps -N${number-up,0} \
${content-orientation,,-O${content-orientation}} \
${top-margin,,-a${top-margin}} \
${bottom-margin,,-b${bottom-margin}} \
${left-margin,,-c${left-margin}} \
${right-margin,,-d${right-margin}} \
${length,,-l${length}} \
${width,,-w${width}} \
${number-pages,,-P} \
${repeated-tab-stops,,-t${repeated-tab-stops}}" }' red_sup

The substitution rules described in Section 8.1.2, together with the options in Table 8-2, are applied in the following example. The command is used on a document that requires number-up=2 and width=80:

# /usr/pd/bin/trn_textps -N2 -w80

Attributes that are not specified in the print request are either omitted from the command or replaced with their default values (number-up, for example, defaults to -N0), while attributes that are specified are replaced with their substitution expressions.
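The following sketch reproduces the result shown above by building the trn_textps command line from attribute values in the style of the filter-definition example: -N is always present with a default of 0, and every other option appears only when its attribute has a value. It is illustrative only; textps_command and the variable names (hyphens replaced by underscores) are hypothetical, and the sketch assumes each option accepts its value attached, as -N2 and -w80 do.

```shell
#!/bin/sh
# Hypothetical: assemble a trn_textps command line from attribute values.
textps_command() {
    set -- /usr/pd/bin/trn_textps "-N${number_up:-0}"   # -N always present
    # Each remaining option appears only if its attribute has a value.
    [ -n "$content_orientation" ] && set -- "$@" "-O$content_orientation"
    [ -n "$top_margin" ]          && set -- "$@" "-a$top_margin"
    [ -n "$bottom_margin" ]       && set -- "$@" "-b$bottom_margin"
    [ -n "$left_margin" ]         && set -- "$@" "-c$left_margin"
    [ -n "$right_margin" ]        && set -- "$@" "-d$right_margin"
    [ -n "$length" ]              && set -- "$@" "-l$length"
    [ -n "$width" ]               && set -- "$@" "-w$width"
    [ -n "$number_pages" ]        && set -- "$@" -P
    [ -n "$repeated_tab_stops" ]  && set -- "$@" "-t$repeated_tab_stops"
    printf '%s\n' "$*"                                  # space-joined command
}
```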

In addition, if the number-up document attribute has a value of 1, 2, or 4, the filter prints 1 (with margins), 2, or 4 pages per sheet, respectively. A number-up value of 0 or none is also valid and suppresses number-up processing.
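Because only a few number-up values are accepted, a site-written wrapper might validate the value before passing it to -N. This is a hypothetical helper; mapping none to 0 follows the statement above that both values suppress number-up processing.

```shell
#!/bin/sh
# Hypothetical: map a number-up attribute value to a trn_textps -N argument.
number_up_arg() {
    case ${1:-none} in
        none)    echo 0 ;;          # same effect as 0: no number-up processing
        0|1|2|4) echo "$1" ;;       # values accepted by -N
        *)       echo "unsupported number-up value: $1" >&2
                 return 1 ;;
    esac
}
```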

The content-orientation document attribute affects number-up processing by determining how the logical pages are placed on the sheet of paper.

Table 8-2:  Text-to-PostScript Translator Command Options

Option  Corresponding Attribute  Description
-a      top-margin               The number of lines to add to the default margin at the top of the page. Valid values: Integer >= 0
-B      No attribute             Prints alternating grey bars three lines in width.
-b      bottom-margin            The number of lines to add to the default margin at the bottom of the page. Valid values: Integer >= 0
-c      left-margin              The number of characters to add to the default margin at the left side of the page. Valid values: Integer >= 0
-d      right-margin             The number of characters to add to the default margin at the right side of the page. Valid values: Integer >= 0
-F      footer-text              Prints page footer text.
-L      No attribute             Prints line numbers.
-l      document-length          Lines per page: the number of rows printed on a page before a new page is started. Valid values: Integer > 0
-N      number-up                The number-up value, which specifies the number of page spots printed on the physical sheet. Valid values: 0, 1, 2, or 4
-O      content-orientation      Orientation value that specifies whether the page is formatted for long- or short-edge printing. Valid values: landscape, portrait
-P      number-pages             Specifies that page numbers are printed at the top of the page. This option takes no arguments. The default is to not print page numbers.
-p      page-select              One or more page selection ranges separated by commas. A range can be an integer page number or two integers separated by a colon. For example, to print pages 3 through 6 and page 9, specify 3:6,9
-Q      No attribute             Nowrap: truncates lines longer than the page allows (either by an explicit -w setting or as derived from the sheet size). This option takes no arguments; its presence specifies truncation. The default is line wrapping.
-S      default-medium           Sheet size for which the translated page is formatted. Default values for rows and columns are derived from the sheet size, though they can be overridden by the -w and -l options. Valid values: a, a0, a1, a2, a3, a4, a5, a6, b, b4, b5, b6, c, d, e, business_envelope, c4_envelope, c5_envelope, com10, d1_envelope, executive, folio, halfletter, ledger, legal, letter, monarch, postcard, 7_envelope, 7x9, 9x12_envelope, 10x13_envelope, 10x14, 11x14
-T      header-text              Prints page header text.
-t      repeated-tab-stops       Tab width; expands tabs to byte positions number+1, 2*number+1, 3*number+1, and so on. The default value of number is 8. Tab characters in the input expand to the number of spaces needed to reach the next tab setting. Valid values: Integer > 0
-w      width                    Characters per line: the number of columns printed on a line before a line wrap or truncation occurs. Valid values: Integer > 0
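The page-select syntax used by -p can be illustrated by expanding a range list into individual page numbers. expand_pages is a hypothetical helper for illustration only; it does not validate its input and is not part of trn_textps.

```shell
#!/bin/sh
# Hypothetical: expand a page-select value such as 3:6,9 into the individual
# page numbers it names. No input validation is performed.
expand_pages() {
    old_ifs=$IFS
    IFS=,
    set -- $1                  # split the list on commas
    IFS=$old_ifs
    pages=
    for range in "$@"; do
        case $range in
            *:*) first=${range%%:*}; last=${range#*:} ;;   # "3:6" -> 3 through 6
            *)   first=$range; last=$range ;;              # "9" -> 9 only
        esac
        n=$first
        while [ "$n" -le "$last" ]; do
            pages="$pages $n"
            n=$((n + 1))
        done
    done
    printf '%s\n' "${pages# }"   # drop the leading space
}
```

For example, expand_pages 3:6,9 prints 3 4 5 6 9.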