WASD Hypertext Services - Technical Overview


11 - Scripting

Scripts are mechanisms for creating simple "servers" that send data to a client, extending the services provided by the basic server. Anything that can write to SYS$OUTPUT can be used to generate script output: a DCL procedure or an executable can be the basis for a script, and even simply TYPE-ing a file can provide script output.

Scripts are enabled using the exec or script rules in the mapping file (see 8 - Mapping Rules). The script portion of the result must be a URL equivalent of the physical VMS procedure or executable specification.


11.1 - Caution!

Scripts are executed within unprivileged subprocesses spawned by the HTTPd server. These subprocesses are owned by the HTTPd server account (HTTP$SERVER). Script actions can potentially affect server behaviour. For example, it is possible for a script to issue an "HTTPD/DO=ABORT" command, or to create or modify logical name values in the JOB table (e.g. change the value of LNM$FILE_DEV, altering the logical search path). Obviously these types of actions are undesirable. In addition, scripts can access any WORLD-readable and modify any WORLD-writable resource in the system/cluster, opening a window for information leakage or mischievous/malicious actions (some might argue that anyone with important WORLD-accessible resources on their system deserves all that happens to them, but we know they're out there :^). Script authors should be aware of any potential side-effects of their scripts, and Web administrators should be vigilant against possible malicious behaviour of scripts they did not author.

As of version 4.2 it is possible to exercise some control over the privileges of spawned subprocesses. Environments that require scripts to have minimum privileges (e.g. NETMBX, TMPMBX for IPC) can provide them using the server account's authorized privileges. See 6 - Server Configuration.


11.2 - Scripting Environment

WASD HTTPd scripting underwent a major redesign between v4.1 and v4.2 to provide a faster and more efficient scripting environment. It also provided the opportunity for a much-needed review of the DCL mechanism within the server. As a result, two capabilities not found in earlier versions became available: persistent subprocesses (see below) and CGIplus (see 11.7 - CGIplus Scripting).

Process creation under the VMS operating system is notoriously slow and expensive. This is an inescapable overhead when scripting via child processes. An obvious strategy is to avoid, at least as much as possible, the creation of subprocesses. The only way to do this is to share subprocesses between multiple scripts/requests, addressing the attendant complications of isolating potential interactions between requests. These could occur through changes made by any script to the subprocess' environment. For VMS this involves symbol and logical name creation, and files opened at the DCL level. In practice few scripts need to make logical name changes, and symbols are easily removed between uses. DCL-opened files are a little more problematic but, again, in practice most scripts doing file manipulation will be images.

A reasonable assumption is that for almost all environments scripts can quite safely share subprocesses, with great benefit to response latency and system impact (see 13.2 - Scripting for a table with some comparative performances). If the local environment requires absolute script isolation for some reason, subprocess persistence may easily be disabled, with a consequent trade-off in performance.

NOTE: With the form of subprocess management used in v4.2 and following, BYTLM can become an issue. When setting the HTTPd account BYTLM quota, allow approximately 12,500 bytes per subprocess that can be concurrently active, plus a general allowance (technically, allow 1.0 x /NETBUF= plus (1.0 + 0.5 + 0.5) x /SUBBUF=). That is, if the subprocess hard-limit (see below) is 20 then BYTLM should be set to at least 250,000 (20 x 12,500) plus a general allowance of 50,000. In such a case PRCLM should also be set to at least 20, preferably 40. These and other relevant quotas may be monitored using the HTTPDMON utility or the server administration menu.
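The BYTLM rule of thumb above is simple arithmetic. The C sketch below applies it using the per-subprocess figure from the text (about 12,500 bytes) and a general allowance supplied by the caller. The function names are hypothetical and not part of WASD. The PRCLM helper simply encodes the "preferably twice the hard-limit" advice.

```c
/* Approximate BYTLM needed per concurrently active subprocess (from the text). */
#define BYTLM_PER_SUBPROCESS 12500L

/* Hypothetical helper: minimum BYTLM for a given subprocess hard-limit,
   plus a general allowance chosen by the site (50,000 in the text's example). */
long MinimumBytlm (int SubprocessHardLimit, long GeneralAllowance)
{
   return (long)SubprocessHardLimit * BYTLM_PER_SUBPROCESS + GeneralAllowance;
}

/* Hypothetical helper: PRCLM should be at least the hard-limit,
   preferably twice it. */
int PreferredPrclm (int SubprocessHardLimit)
{
   return SubprocessHardLimit * 2;
}
```

For a hard-limit of 20, MinimumBytlm (20, 50000L) gives 300,000 and PreferredPrclm (20) gives 40, matching the figures in the note.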

Zombies

The term zombie is used to describe subprocesses persisting between uses (the reason should be obvious: they are neither "alive", processing a request, nor "dead", deleted :^). Zombie subprocesses have a finite time to exist (non-life-time?) before they are automatically purged from the system (see 6 - Server Configuration). This keeps process clutter on the system to a minimum.


11.3 - Script Run-Time

Scripts are merely executed or interpreted files. Although by default VMS executables and DCL procedures can be used as scripts, other run-time environments may also be configured. For example, scripts written for the Perl language may be transparently given to the Perl interpreter in a script subprocess. This type of script activation is based on a unique file type (the extension following the file name). For the Perl example this is most commonly ".PL", or sometimes ".CGI". Both of these may be configured to automatically invoke the site's Perl interpreter, or any other run-time environment for that matter.

This configuration is performed using the [DclScriptRunTime] parameter, where a file type is associated with a run-time environment. This parameter takes two components, the file extension and the run-time verb. The verb may be specified as a simple, globally-accessible verb (e.g. one embedded in the CLI tables), or in the format used to construct a foreign verb, providing reasonable versatility. Run-time parameters may also be appended to the verb if desired. The server ensures the verb is foreign-assigned if necessary, then uses it on a command line with the script file name as the final parameter.

The following is an example showing a Perl run-time environment being specified. The first line assumes the "Perl" verb is globally accessible on the system (e.g. perhaps provided by the DCL$PATH logical). The second line, for the sake of illustration, shows the same Perl interpreter being configured for a different file type using the foreign verb syntax.

  [DclScriptRunTime]
  .PL PERL
  .CGI $PERL_EXE:PERL

A file containing a Perl script may then be activated merely by specifying a path such as the following:

  /cgi-bin/example.pl

To add any required parameters just append them to the verb specified.

  [DclScriptRunTime]
  .XYZ XYZ_INTERPRETER -vms -verbose -etc
  .XYZ $XYZ_EXE:XYZ_INTERPRETER /vms /verbose /etc
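The command-line construction described above can be sketched in C as follows. This is an illustration under stated assumptions, not the server's code. The function name is hypothetical, and a leading "$" is taken to mark the foreign-verb form. The foreign symbol is also assumed to be named after the image, that is, the text following the last ":" or "]" (for example PERL for "$PERL_EXE:PERL").

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: build the DCL lines for one run-time entry such as
   "PERL", "$PERL_EXE:PERL" or "XYZ_INTERPRETER -vms -verbose".
   Foreign receives a symbol assignment, or "" when none is needed;
   Command receives the line that runs the script, whose file name is
   always the final parameter. */
void BuildRunTimeCommand (const char *Entry, const char *ScriptFile,
                          char *Foreign, size_t ForeignSize,
                          char *Command, size_t CommandSize)
{
   char  Image [256];
   const char  *Params, *Verb, *colon, *bracket;
   size_t  len;

   /* the verb (or foreign image specification) ends at the first space */
   len = strcspn (Entry, " ");
   if (len >= sizeof(Image)) len = sizeof(Image) - 1;
   memcpy (Image, Entry, len);
   Image[len] = '\0';
   Params = Entry + len;              /* "" or " -param ..." */

   if (Image[0] == '$')
   {
      /* foreign-verb form: ASSUME the symbol is named after the image,
         i.e. whatever follows the last ':' or ']' */
      Verb = Image + 1;
      colon = strrchr (Image, ':');
      bracket = strrchr (Image, ']');
      if (colon != NULL && colon + 1 > Verb) Verb = colon + 1;
      if (bracket != NULL && bracket + 1 > Verb) Verb = bracket + 1;
      snprintf (Foreign, ForeignSize, "$ %s = \"%s\"", Verb, Image);
   }
   else
   {
      /* globally-accessible verb: no symbol assignment needed */
      Verb = Image;
      if (ForeignSize > 0) Foreign[0] = '\0';
   }

   /* any run-time parameters, then the script file name last */
   snprintf (Command, CommandSize, "$ %s%s %s", Verb, Params, ScriptFile);
}
```

With the ".PL PERL" entry this yields no assignment and the command "$ PERL EXAMPLE.PL". With "$PERL_EXE:PERL" it first assigns the symbol PERL and then runs the same style of command line.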

If a more complex run-time environment is required it may be necessary to wrap the script's execution in a DCL procedure.

Script File Extensions

The WASD server does not require a file type (extension) to be explicitly provided when activating a script. This can help hide the implementation detail of any script. If the script path does not contain a file type, the server searches the script location for a file with one of the known file types. It tries ".COM" first for a DCL procedure, then ".EXE" for an executable, then any file types specified using the [DclScriptRunTime] configuration directive, in the order specified.

For instance, the script activated in the Perl example above could have been specified as below and (provided there was no "EXAMPLE.COM" or "EXAMPLE.EXE" in the search) the same script would have been executed.

  /cgi-bin/example
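The search order just described can be sketched as follows. The function name is hypothetical. The existence test is passed in as a function pointer so the logic does not depend on how a site checks for files; the simple FileExists() example here just tries fopen(). This illustrates the documented order, not the server's implementation.

```c
#include <stdio.h>
#include <string.h>

/* Example existence test: can the file be opened for reading? */
int FileExists (const char *Path)
{
   FILE  *fp = fopen (Path, "r");

   if (fp == NULL) return 0;
   fclose (fp);
   return 1;
}

/* Hypothetical sketch of the script file search order: an explicit file
   type is used as-is; otherwise try ".COM", then ".EXE", then each
   run-time file type in configuration order.  Returns 1 and fills Found
   with the first file that exists, or returns 0. */
int FindScriptFile (const char *ScriptPath,
                    const char *RunTimeTypes[], int RunTimeCount,
                    int (*Exists)(const char *),
                    char *Found, size_t FoundSize)
{
   static const char  *BuiltInTypes[] = { ".COM", ".EXE" };
   const char  *name;
   int  idx;

   /* does the final path component already carry a file type? */
   name = strrchr (ScriptPath, '/');
   name = (name != NULL) ? name + 1 : ScriptPath;
   if (strchr (name, '.') != NULL)
   {
      snprintf (Found, FoundSize, "%s", ScriptPath);
      return Exists (Found);
   }

   /* built-in types first ... */
   for (idx = 0; idx < 2; idx++)
   {
      snprintf (Found, FoundSize, "%s%s", ScriptPath, BuiltInTypes[idx]);
      if (Exists (Found)) return 1;
   }
   /* ... then the configured run-time types, in order */
   for (idx = 0; idx < RunTimeCount; idx++)
   {
      snprintf (Found, FoundSize, "%s%s", ScriptPath, RunTimeTypes[idx]);
      if (Exists (Found)) return 1;
   }
   Found[0] = '\0';
   return 0;
}
```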


11.4 - CGI Compliance

The HTTPd scripting mechanism is designed to be WWW CGI (Common Gateway Interface) compliant, based in part on the INTERNET-DRAFT authored by D.Robinson (drtr@ast.cam.ac.uk), 8 January 1996.

CGI Compliant Variables

Environment variables are created in a similar way to the CERN VMS HTTPd implementation, where CGI environment variables are provided to the script via DCL global symbols. By default each CGI variable symbol name is prefixed with "WWW_". This prefix can be changed using the /CGI_PREFIX qualifier (see 5.3 - HTTPd Command Line). Doing so is not recommended if the WASD VMS scripts are to be used, as they expect CGI variable symbols to be prefixed in this manner.

Extensions to CGI Variables

In line with other CGI implementations, additional, non-compliant variables are provided to ease CGI interfacing. These provide the various components of the query string. A keyword query string and a form query string are parsed into separate variables, named

  WWW_KEY_number
  WWW_KEY_COUNT
  WWW_FORM_form-element-name

See the example below.
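The following C sketch illustrates the kind of splitting that produces these variables. It is an illustration built on assumptions, not the server's code. It treats a query string containing "=" as a form query and anything else as a keyword query. It upper-cases form element names, numbers keywords from 1, and decodes %xx escapes; within form values "+" is also decoded as a space. The CgiSymbol type and the function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one generated CGI variable symbol. */
typedef struct {
   char  Name [64];
   char  Value [256];
} CgiSymbol;

/* Decode %xx escapes (and, if PlusIsSpace, '+' as a space) from Len
   characters of Src into Dst, always NUL-terminating Dst. */
static void UrlDecode (const char *Src, size_t Len, int PlusIsSpace,
                       char *Dst, size_t DstSize)
{
   size_t  in, out = 0;

   for (in = 0; in < Len && out + 1 < DstSize; in++)
   {
      if (Src[in] == '+' && PlusIsSpace)
         Dst[out++] = ' ';
      else if (Src[in] == '%' && in + 2 < Len &&
               isxdigit ((unsigned char)Src[in+1]) &&
               isxdigit ((unsigned char)Src[in+2]))
      {
         char  hex[3] = { Src[in+1], Src[in+2], '\0' };
         Dst[out++] = (char)strtol (hex, NULL, 16);
         in += 2;
      }
      else
         Dst[out++] = Src[in];
   }
   Dst[out] = '\0';
}

/* Split a query string into WWW_KEY_n/WWW_KEY_COUNT (keyword query) or
   WWW_FORM_name (form query) symbols.  Returns the number produced. */
int ParseQueryString (const char *Query, CgiSymbol *Sym, int MaxSym)
{
   int  count = 0, keys = 0;
   int  IsForm = (strchr (Query, '=') != NULL);
   const char  *cptr = Query;

   while (*cptr && count < MaxSym)
   {
      size_t  len = strcspn (cptr, IsForm ? "&" : "+");

      if (IsForm)
      {
         const char  *eq = memchr (cptr, '=', len);
         size_t  nlen = (eq != NULL) ? (size_t)(eq - cptr) : len;
         size_t  i, n = strlen (strcpy (Sym[count].Name, "WWW_FORM_"));

         /* DCL symbol names are case-insensitive; upper-case the field */
         for (i = 0; i < nlen && n + 1 < sizeof(Sym[count].Name); i++)
            Sym[count].Name[n++] = (char)toupper ((unsigned char)cptr[i]);
         Sym[count].Name[n] = '\0';
         if (eq != NULL)
            UrlDecode (eq + 1, len - nlen - 1, 1,
                       Sym[count].Value, sizeof(Sym[count].Value));
         else
            Sym[count].Value[0] = '\0';
      }
      else
      {
         snprintf (Sym[count].Name, sizeof(Sym[count].Name),
                   "WWW_KEY_%d", ++keys);
         UrlDecode (cptr, len, 0, Sym[count].Value, sizeof(Sym[count].Value));
      }
      count++;
      cptr += len;
      if (*cptr) cptr++;                 /* step over the separator */
   }

   if (!IsForm && *Query && count < MaxSym)
   {
      strcpy (Sym[count].Name, "WWW_KEY_COUNT");
      snprintf (Sym[count].Value, sizeof(Sym[count].Value), "%d", keys);
      count++;
   }
   return count;
}
```

With the queries used in the demonstration that follows, "string1+string2" yields WWW_KEY_1, WWW_KEY_2 and WWW_KEY_COUNT (value "2"). "FirstField=for&SecondField=this" yields WWW_FORM_FIRSTFIELD and WWW_FORM_SECONDFIELD.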

CGI Variable Capacity

DCL symbol values are limited to approximately 1024 characters. The CGI interface will provide symbols with values up to that limit if required. This should be sufficient for most circumstances.

CGI Variable Descriptions

Remember, all variables are prefixed by "WWW_".


  Name                  Description                                         "Standard" CGI
  ----                  -----------                                         --------------
  AUTH_GROUP            authentication group (or empty)                     no
  AUTH_REALM            authentication realm (or empty)                     no
  AUTH_TYPE             authentication type (BASIC or DIGEST)               yes
  CONTENT_LENGTH        "Content-Length:" from request header               yes
  CONTENT_TYPE          "Content-Type:" from request header                 yes
  FORM_field            query string "&" separated form elements            no
  GATEWAY_INTERFACE     "CGI/1.1"                                           yes
  HTTP_ACCEPT           any list of browser-accepted content types          optional
  HTTP_ACCEPT_CHARSET   any list of browser-accepted character sets         optional
  HTTP_ACCEPT_LANGUAGE  any list of browser-accepted languages              optional
  HTTP_AUTHORIZATION    any from request header                             optional
  HTTP_COOKIE           any cookie sent by the client                       optional
  HTTP_FORWARDED        any proxy/gateway hosts that forwarded the request  optional
  HTTP_HOST             host and port request was sent to                   optional
  HTTP_IF_NOT_MODIFIED  any last modified GMT time string                   optional
  HTTP_PRAGMA           any pragma directive of request header              optional
  HTTP_REFERER          any source document URL for this request            optional
  HTTP_USER_AGENT       client/browser identification string                optional
  KEY_n                 query string "+" separated elements                 no
  KEY_COUNT             number of "+" separated elements                    no
  PATH_INFO             virtual path of data requested in URL               yes
  PATH_TRANSLATED       VMS file path of data requested in URL              yes
  QUERY_STRING          un-URL-decoded string following "?" in URL          yes
  REMOTE_ADDR           IP host address of HTTP client                      yes
  REMOTE_HOST           IP host name of HTTP client                         yes
  REMOTE_USER           authenticated remote user name (or empty)           yes
  REQUEST_METHOD        "GET", "PUT", etc.                                  yes
  REQUEST_TIME_GMT      GMT time request received                           no
  REQUEST_TIME_LOCAL    local time request received                         no
  SCRIPT_NAME           name of script being executed (e.g. "/query")       yes
  SERVER_NAME           IP host name of server system                       yes
  SERVER_PROTOCOL       HTTP protocol version (always "HTTP/1.0")           yes
  SERVER_PORT           IP port request was received on                     yes
  SERVER_SOFTWARE       software ID of HTTP server                          yes


CGI Variable Demonstration

The basic CGI symbol names are demonstrated here with a call to a script that simply executes the following DCL code:

  $ SHOW SYMBOL WWW_*
  $ SHOW SYMBOL *

Note how the request components are represented for ISINDEX-style searching (third item) and a forms-based query (fourth item).

  1. /script/cgi_symbols
  2. /script/cgi_symbols/ht_root/doc/htd
  3. /script/cgi_symbols/ht_root/doc/htd/*.*?string1+string2
  4. /script/cgi_symbols/ht_root/doc/htd?FirstField=for&SecondField=this

CGI Compliant Output

Script output must behave in a CGI-compliant fashion (by way of contrast, see 11.5 - Raw HTTP Output). That is, a CGI script may redirect the location of the document, using a Location: header line, or may supply a data stream beginning with a Content-Type: header line. Both must be followed by a blank line.

If the script output begins with a CGI-compliant "Content-Type: text/..." (text document) header, the HTTPd assumes the output will be line-oriented and will require HTTP carriage-control (each record/line terminated by a line-feed). It thereafter ensures each record it receives is correctly terminated before passing it to the client. In this way DCL procedure output (and the VMS CLI in general) is supported transparently. Any other content-type is assumed to be binary and no carriage-control is enforced.
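The carriage-control behaviour just described can be sketched in C. The helpers below are hypothetical, not the server's code. One decides from the first header line whether the response is text. The other copies an output record and appends a line-feed only when the response is text and the record does not already end with one.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: does a CGI header line declare a text document?
   Compares case-insensitively against "Content-Type: text/". */
int IsTextContentType (const char *HeaderLine)
{
   const char  *Want = "content-type: text/";
   size_t  i;

   /* stops at the first mismatch, including the end of HeaderLine */
   for (i = 0; Want[i] != '\0'; i++)
      if (tolower ((unsigned char)HeaderLine[i]) != Want[i]) return 0;
   return 1;
}

/* Hypothetical helper: copy one output record of Len bytes to Out,
   appending a line-feed if the response is text and the record does not
   already end with one.  Out must hold at least Len+2 bytes.  Returns the
   length written (0 if Out is too small). */
size_t TerminateRecord (const char *Record, size_t Len, int IsText,
                        char *Out, size_t OutSize)
{
   if (OutSize < Len + 2) return 0;
   memcpy (Out, Record, Len);
   if (IsText && (Len == 0 || Record[Len-1] != '\n'))
      Out[Len++] = '\n';
   Out[Len] = '\0';
   return Len;
}
```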


11.4.1 - Example DCL Scripts

A simple script to provide the system time might be:

  $ say = "write sys$output"
  $! the next two lines make it CGI-compliant
  $ say "Content-Type: text/plain"
  $ say ""
  $! start of plain-text script output
  $ show time

A script to provide the system time more elaborately (using HTML):

  $ say = "write sys$output"
  $! the next two lines make it CGI-compliant
  $ say "Content-Type: text/html"
  $ say ""
  $! start of HTML script output
  $ say "<HTML>"
  $ say "Hello ''WWW_REMOTE_HOST'"  !(CGI variable)
  $ say "<P>"
  $ say "System time on node ''f$getsyi("nodename")' is:"
  $ say "<H1>''f$cvtime()'</H1>"
  $ say "</HTML>"


11.5 - Raw HTTP Output

A script does not have to output a CGI-compliant data stream. If its output begins with an HTTP header status line (e.g. "HTTP/1.0 200 OK"), the HTTPd assumes the script will supply a raw HTTP data stream containing all the HTTP requirements.

Any such script must observe the HyperText Transfer Protocol. Every line must be terminated by a carriage-return and line-feed (represented as "\r\n"), or at a minimum by a single line-feed. In particular, the type of the data being returned by the script must be included in an HTTP header sent prior to the data itself. Headers for the two most common data types are illustrated here. Note that the blank line is strictly necessary; it terminates the header.

Plain-Text

  HTTP/1.0 200 ok\r\n
  Content-Type: text/plain\r\n
  \r\n

HTML

  HTTP/1.0 200 ok\r\n
  Content-Type: text/html\r\n
  \r\n
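A server, or a test harness, could tell a raw HTTP stream from CGI-compliant output by examining the first line of script output, as described at the start of this section. The C sketch below illustrates only that decision. The names are hypothetical, and the server's actual checks may be more thorough.

```c
#include <ctype.h>
#include <string.h>

typedef enum { OUTPUT_UNKNOWN, OUTPUT_RAW_HTTP, OUTPUT_CGI } OutputKind;

/* case-insensitive "does Text begin with Prefix?" */
static int StartsWithNoCase (const char *Text, const char *Prefix)
{
   size_t  i;

   for (i = 0; Prefix[i] != '\0'; i++)
      if (tolower ((unsigned char)Text[i]) !=
          tolower ((unsigned char)Prefix[i])) return 0;
   return 1;
}

/* Hypothetical sketch: classify the first line of script output.
   An "HTTP/" status line means the script supplies a raw HTTP stream;
   a "Content-Type:" or "Location:" header means CGI-compliant output. */
OutputKind ClassifyFirstLine (const char *Line)
{
   if (strncmp (Line, "HTTP/", 5) == 0) return OUTPUT_RAW_HTTP;
   if (StartsWithNoCase (Line, "Content-Type:") ||
       StartsWithNoCase (Line, "Location:")) return OUTPUT_CGI;
   return OUTPUT_UNKNOWN;
}
```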

Raw HTTP DCL Script

The following example shows a non-CGI-compliant DCL script similar in function to the CGI-compliant one above. Note the full HTTP header, with each header line explicitly terminated by a carriage-return and line-feed pair.

  $ cr[0,8] = %x0d
  $ lf[0,8] = %x0a
  $ say = "write sys$output"
  $! the next line determines it is raw HTTP stream
  $ say "HTTP/1.0 200 Success''cr'''lf'"
  $ say "Content-Type: text/html''cr'''lf'"
  $! response header separating blank line
  $ say "''cr'''lf'"
  $! start of HTML script output
  $ say "<HTML>''lf'"
  $ say "Hello ''WWW_REMOTE_HOST'''lf'"
  $ say "<P>''lf'"
  $ say "Local time is ''WWW_REQUEST_TIME_LOCAL'''lf'"
  $ say "</HTML>''lf'"

Raw HTTP C Script

When scripting in the C programming language and providing a full HTTP response, there can be considerable efficiencies gained by providing a binary output stream from the script. This may be achieved using a code construct similar to the following, which reopens <stdout> in binary mode.

  /* reopen output stream so that the '\r' and '\n' are not filtered */
  #ifdef __DECC
     if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL)
        exit (vaxc$errno);
  #endif

This is used consistently in WASD scripts. After that, of course, the full HTTP header must be supplied.

     fprintf (stdout,
  "HTTP/1.0 200 Success\r\n\
  Content-Type: text/html\r\n\
  \r\n\
  <HTML>\n\
  Hello %s\n\
  <P>\n\
  System time is %s\n\
  </HTML>\n",
     getenv("WWW_REMOTE_HOST"),
     getenv("WWW_REQUEST_TIME_LOCAL"));


11.6 - Raw HTTP Input

The logical name SYS$INPUT (with HTTP$INPUT as a synonym for backward compatibility), or <stdin> for C language scripts, refers to a mailbox providing a stream containing the request body (if any). This is available for procedures and executables to explicitly open and read.

Note that this is a raw stream, and HTTP lines (carriage-return/line-feed terminated sequences of characters) may have been blocked together for network transport. These need to be explicitly parsed by the program.
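Because lines in the raw input stream may arrive blocked together, or split across reads, a script has to reassemble them itself. The following C sketch shows one way of doing so. The LineBuf type and its functions are hypothetical, and a fixed 4096-byte buffer is assumed for simplicity. Blocks read from <stdin> (for example with fread()) are fed in, and complete lines are taken out with any CR-LF or LF terminator removed.

```c
#include <stddef.h>
#include <string.h>

#define LINE_BUF_SIZE 4096

/* Hypothetical accumulator for raw input read in arbitrary blocks. */
typedef struct {
   char    Data [LINE_BUF_SIZE];
   size_t  Length;
} LineBuf;

/* Append a block of raw input; returns 0 if it would overflow. */
int LineBufFeed (LineBuf *Buf, const char *Block, size_t Len)
{
   if (Len > sizeof(Buf->Data) - Buf->Length) return 0;
   memcpy (Buf->Data + Buf->Length, Block, Len);
   Buf->Length += Len;
   return 1;
}

/* Extract the next complete line (LF or CR-LF terminated) without its
   terminator.  Returns 1 if a line was extracted, 0 if more input is
   needed.  Lines longer than LineSize-1 are truncated. */
int LineBufNext (LineBuf *Buf, char *Line, size_t LineSize)
{
   char    *lf = memchr (Buf->Data, '\n', Buf->Length);
   size_t  used, len;

   if (lf == NULL) return 0;
   used = (size_t)(lf - Buf->Data) + 1;           /* includes the LF */
   len = used - 1;
   if (len > 0 && Buf->Data[len-1] == '\r') len--;   /* drop CR of CR-LF */
   if (len > LineSize - 1) len = LineSize - 1;
   memcpy (Line, Buf->Data, len);
   Line[len] = '\0';
   /* shift any remaining data to the front of the buffer */
   memmove (Buf->Data, Buf->Data + used, Buf->Length - used);
   Buf->Length -= used;
   return 1;
}
```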

NOTE: Versions of the server prior to 4.3 supplied the full request (header then body) to the script. This was not fully CGI-compliant. Versions 4.3 and following supply only the body, although the previous behaviour may be explicitly selected using the configuration parameter [DclFullRequest].


11.7 - CGIplus Scripting

Common Gateway Interface ... plus lower latency,
plus greater efficiency,
plus far less system impact!

I know, I know! The term CGIplus is a bit too cute but I had to call it something!

CGIplus attempts to eliminate the overhead associated with creating the subprocess and then executing the image of a CGI script. It does this by allowing the subprocess and any associated image/application to continue executing between uses, eliminating any startup overheads. This reduces both the load on the system and the request latency. In this sense these advantages parallel those offered by commercial HTTP server-integration APIs, such as Netscape NSAPI and Microsoft ISAPI, without the disadvantages of such proprietary interfaces: API complexity, language dependency and server process integration.

CGIplus is not as complex (and consequently not as versatile) as another approach to improving CGI performance, Open Market's FastCGI, see http://www.fastcgi.com/ (which, while having laudable platform-independence, strikes the author as being "CGI" in nomenclature only).

CGIplus design is generic enough to be easily implemented by other server architectures if found desirable. (For example, it is imagined Unix platforms would implement the CGIplus variable stream using named pipes, one of which would be designated by the CGIPLUSIN environment variable.) The CGIplus-specific script environment and example code have been made as platform-neutral as possible, providing potential for more widespread adoption. Existing CGI scripts can rapidly and elegantly be modified to additionally support CGIplus. The capability of scripts to easily differentiate between, and operate in, both standard CGI and CGIplus environments with a minimum of code revision offers great versatility.

CGIplus Performance

A simple performance evaluation indicates the advantage of CGIplus. See 13.2 - Scripting for some test results comparing the non-persistent-process, persistent-process and CGIplus environments.

Without a doubt, the subjective difference in activating the same script within the standard CGI and CGIplus environments is quite startling!

CGIplus Programming

The script interface is still CGI, which means a new API does not need to be learned and existing CGI scripts are simple to modify.

See examples in HT_ROOT:[SRC.CGIPLUS]

Instead of having the CGI variables available from the environment (generally accessed via the C language getenv() standard library call), a CGIplus script must read the CGI variables from CGIPLUSIN. They are supplied as a series of records (lines), each containing a CGI variable name (in upper-case), an equate symbol and then the variable value. A line will never contain more than 1024 characters. The format is easily parsed, and as the value contains no encoded characters it may be used directly.
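Parsing a CGIPLUSIN record amounts to splitting it at the first equate symbol. The hypothetical C helper below is one way of doing that. It tolerates a trailing newline, and it treats the empty record as the end of the variables, as the request loop in the code examples does. Any further "=" characters belong to the value.

```c
#include <string.h>

/* Hypothetical helper: split one CGIPLUSIN record of the form
   NAME=value (trailing newline optional) into its parts.  Returns 1 for
   a variable record, 0 for the empty record that ends the variables,
   a record with no '=', or one too long for the caller's buffers. */
int SplitCgiVariable (const char *Record,
                      char *Name, size_t NameSize,
                      char *Value, size_t ValueSize)
{
   size_t  reclen = strcspn (Record, "\r\n");   /* ignore any terminator */
   const char  *eq = memchr (Record, '=', reclen);
   size_t  nlen, vlen;

   if (reclen == 0 || eq == NULL) return 0;
   nlen = (size_t)(eq - Record);
   vlen = reclen - nlen - 1;
   if (nlen >= NameSize || vlen >= ValueSize) return 0;
   memcpy (Name, Record, nlen);
   Name[nlen] = '\0';
   memcpy (Value, eq + 1, vlen);        /* value needs no decoding */
   Value[vlen] = '\0';
   return 1;
}
```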

Requirements when using CGIplus:

After processing, the CGIplus script can loop, waiting to read the details of the next request from CGIPLUSIN.

Request output (to the client) is written to SYS$OUTPUT (<stdout>) as per normal CGI behaviour. End of output MUST be indicated by writing a special EOF record to the output stream. This is a bit of a kludge, and the least elegant part of the CGIplus design, but it is also the simplest implementation. A unique EOF sequence is generated for each use of DCL via a zombie or CGIplus subprocess. A non-repeating series of bits most unlikely to occur in normal output is employed ... but there is still a very, very, very small chance of premature termination of output (one in 2^280 I think!) See DCL.c for how the value is generated.

The CGIplus EOF string is obtained by the script from the logical name CGIPLUSEOF, defined in the script subprocess' process table, using the scripting language's equivalent of F$TRNLNM(), SYS$TRNLNM(), or a getenv() call (in the C standard library). This string will always contain fewer than 64 characters and comprise only printable characters. It must be written to the output stream at the conclusion of a request's output as a single record (line), but may also contain <CR><LF> or just <LF> trailing carriage-control (to allow for programming language requirements). It only has to be evaluated once, as processing begins, and remains the same for all requests over the life-time of that instance of the script.

HTTP input (raw request stream, header and any body) is still available to a CGIplus script.

Code Examples

Of course a CGIplus script should only have a single exit point, and should explicitly close files, free allocated memory, etc., after processing a request (i.e. not rely on image run-down to clean up after itself). When modifying existing scripts to work in the CGIplus environment it is particularly important to ensure this requirement is met (who of us hasn't thought "well, this file will close when the image exits anyway"?)

It is a simple task to design a script to modify its behaviour according to the environment it is executing in. Detecting the presence or absence of the CGIPLUSEOF logical is sufficient indication. The following C code fragment simultaneously determines whether it is a standard or CGIplus environment (setting an appropriate boolean) and gets the CGIplus EOF sequence (if it exists).

  int  IsCgiPlus;
  char  *CgiPlusEofPtr;

  IsCgiPlus = ((CgiPlusEofPtr = getenv("CGIPLUSEOF")) != NULL);

The following C code fragment shows a basic CGIplus request loop, reading lines from CGIPLUSIN, and some basic processing to select required CGI variables for request processing.

  if (IsCgiPlus)
  {
     char  *cptr;
     /* a CGIplus record is at most 1024 characters, plus newline and NUL */
     char  Line [1024+2],
           RemoteHost [128];
     FILE  *CgiPlusIn;

     if ((CgiPlusIn = fopen (getenv("CGIPLUSIN"), "r")) == NULL)
     {
        perror ("CGIplus: fopen");
        exit (0);
     }

     for (;;)
     {
        /* reset per-request state before reading the next request */
        RemoteHost[0] = '\0';

        /* will block waiting for subsequent requests */
        for (;;)
        {
           /* should never have a problem reading CGIPLUSIN, but ... */
           if (fgets (Line, sizeof(Line), CgiPlusIn) == NULL)
           {
              perror ("CGIplus: fgets");
              exit (0);
           }
           /* first empty line signals the end of CGIplus variables */
           if (Line[0] == '\n') break;
           /* remove the trailing newline */
           if ((cptr = strchr(Line, '\n')) != NULL) *cptr = '\0';

           /* process the CGI variable(s) we are interested in,
              copying with a bound so a long value cannot overflow */
           if (!strncmp (Line, "WWW_REMOTE_HOST=", 16))
           {
              strncpy (RemoteHost, Line+16, sizeof(RemoteHost)-1);
              RemoteHost[sizeof(RemoteHost)-1] = '\0';
           }
        }

        /* ... process the request, then write the CGIplus EOF ... */
     }
  }
CGI scripts can write output in record (line-by-line) or binary mode (binary is more efficient because of buffering by the C RTL). When in binary mode, the output stream must be flushed immediately before and after writing the CGIplus EOF sequence (note that in binary mode a full HTTP stream must also be used). This code fragment shows placing a script output stream into binary mode, and the flushing steps.

  /* reopen output stream so that the '\r' and '\n' are not filtered */
  if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL)
     exit (vaxc$errno);

  do {

     /* ... read the request (CGI variables from CGIPLUSIN when IsCgiPlus) ... */

     /* HTTP response header */
     fprintf (stdout, "HTTP/1.0 200 ok\r\nContent-Type: text/html\r\n\r\n");

     /* ... other output ... */

     if (IsCgiPlus)
     {
        /* the CGIplus EOF must be an independent I/O record */
        fflush (stdout);
        fprintf (stdout, "%s", CgiPlusEofPtr);
        fflush (stdout);
     }

  } while (IsCgiPlus);

If the script output is not binary (using the default <stdout>) it is only necessary to ensure the EOF string has a record-delimiting newline.

  fprintf (stdout, "%s\n", CgiPlusEofPtr);

Other languages may not have this same requirement. DCL procedures are quite capable of being used as CGIplus scripts.

See examples in HT_ROOT:[SRC.CGIPLUS]

Whenever developing CGIplus scripts/applications (unlike standard CGI) don't forget that after compiling, the old image must be purged from the server before trying out the new!!! (I've been caught a number of times :^)

Scripting subprocesses may be purged or deleted using (see 5.3.2.4 - DCL/Scripting Subprocesses):

  $ HTTPD /DO=DCL=DELETE
  $ HTTPD /DO=DCL=PURGE

Other Considerations

Multiple CGIplus scripts may be executing in subprocesses at any one time. This includes multiple instances of any particular script. It is the server's task to track these, distributing appropriate requests to idle subprocesses, monitoring those currently processing requests, creating new instances if and when necessary, and deleting the least-used, idle CGIplus subprocesses when configurable thresholds are reached. Of course it is the script's job to maintain coherency if multiple instances may result in resource conflicts or race conditions, etc., between the scripts.

The CGIplus subprocess can be given a finite life-time set by configuration parameter (see 6 - Server Configuration). If this life-time is not set, the CGIplus subprocess will persist indefinitely (i.e. until purged due to soft-limits being reached, or explicitly purged/deleted). When a life-time has been set, the CGIplus subprocess is automatically deleted after being idle (i.e. not processing a request) for the specified period. This can be useful in preventing sporadically used scripts from cluttering up the system indefinitely.

In addition, an idle CGIplus script can be terminated by the server (its subprocess SYS$DELPRC()ed) at any time the subprocess soft-limit is reached, so resources should be largely quiescent when not actually processing. Of course, CGIplus subprocesses may also be manually terminated from the command line (e.g. STOP/ID=).

Some CGIplus scripting information and management is available via the server administration menu, see 10.1 - HTTPd Server Reports.

CGIplus Rule Mapping

CGIplus scripts are differentiated from standard CGI scripts in the mapping rule configuration file using the "script+" and "exec+" directives. See 8 - Mapping Rules.

Scripts capable of operating in both standard CGI and CGIplus environments may simply be accessed in either via rules such as

  exec /cgi-bin/* /cgi-bin/*
  exec+ /cgiplus-bin/* /cgi-bin/*

while specific scripts can be individually designated as CGIplus using

  script+ /cgiplus_example* /cgi-bin/cgiplus_example*

Caution! If changing CGIplus script mapping it is advised to restart the server rather than reloading the rules. Some conflict is possible when using new rules while existing CGIplus scripts are executing.


11.8 - HTTP Persistent-State Cookies

The server is cookie-aware. That is, if the client supplies a "Cookie:" request header line it is passed to a CGI script as "WWW_HTTP_COOKIE" CGI variable symbol. If a cookie is not part of the request this symbol does not exist. A script may use the "Set-Cookie:" response header line to set cookies.
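A script looking for a particular cookie must pick it out of the WWW_HTTP_COOKIE value. The C sketch below assumes the common "name1=value1; name2=value2" form of the Cookie: header. The function name is hypothetical, no decoding of values is attempted, and a NULL value (the symbol not existing) is treated as "no cookies".

```c
#include <string.h>

/* Hypothetical sketch: look up one cookie by name in the value of the
   WWW_HTTP_COOKIE symbol (e.g. as returned by getenv()).  Returns 1 and
   copies the value if found; returns 0 if the cookie, or the symbol
   itself, is absent.  Over-long values are truncated. */
int GetCookie (const char *CookieHeader, const char *Name,
               char *Value, size_t ValueSize)
{
   size_t  nlen = strlen (Name);
   const char  *cptr = CookieHeader;

   if (cptr == NULL) return 0;          /* no Cookie: header in request */
   while (*cptr)
   {
      size_t  len;

      while (*cptr == ' ' || *cptr == ';') cptr++;   /* skip separators */
      len = strcspn (cptr, ";");
      /* match "Name=" exactly at the start of this name=value pair */
      if (len > nlen && strncmp (cptr, Name, nlen) == 0 && cptr[nlen] == '=')
      {
         size_t  vlen = len - nlen - 1;

         if (vlen >= ValueSize) vlen = ValueSize - 1;
         memcpy (Value, cptr + nlen + 1, vlen);
         Value[vlen] = '\0';
         return 1;
      }
      cptr += len;
   }
   return 0;
}
```

A C script might call GetCookie (getenv("WWW_HTTP_COOKIE"), "theme", Value, sizeof(Value)). To set a cookie, it would write a line such as "Set-Cookie: theme=dark" among its response header lines, before the terminating blank line.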

Here is a small demonstration of cookie processing using a DCL script.

