HTTPd Scripting

WASD Hypertext Services - Technical Overview

9 - HTTPd Scripting

Scripts are mechanisms for creating simple ``servers'' sending data to a client, extending the services provided by the basic server. Anything that can write to SYS$OUTPUT can be used to generate script output. A DCL procedure or an executable can be the basis for a script. Simply TYPE-ing a file can provide script output.

Scripts are enabled using the exec or script rules in the mapping file (see 6 - HTTPd Mapping Rules). The script portion of the result must be a URL equivalent of the physical VMS procedure or executable specification. It is not necessary to supply the .COM or .EXE file type (although it is not forbidden either). The server will first check for a procedure and, if none is found, then check for an executable.

9.1 - Caution!

Scripts are executed within unprivileged subprocesses spawned by the HTTPd server. These subprocesses are owned by the HTTPd server account (HTTP$SERVER). Script actions can potentially affect server behaviour. For example, it is possible for a script to issue an ``HTTPD/DO=ABORT'' command, or to create or modify logical name values in the JOB table (e.g. change the value of LNM$FILE_DEV, altering the logical search path). Obviously these types of actions are undesirable. In addition, scripts can access any WORLD-readable and modify any WORLD-writable resource in the system/cluster, opening a window for information leakage or mischievous/malicious actions (some might argue that anyone with important WORLD-accessible resources on their system deserves all that happens to them - but we know they're out there :^) Script authors should be aware of any potential side-effects of their scripts, and Web administrators should be vigilant against possible malicious behaviour by scripts they do not author.

As of version 4.2 it has become possible to exercise some control over the privileges of spawned subprocesses, allowing environments that require scripts to have minimum privileges (e.g. NETMBX, TMPMBX for IPC) to provide them using the server account's authorized privileges. See 5 - HTTPd Configuration.

9.2 - Scripting Environment

WASD HTTPd scripting underwent a major redesign between v4.1 and v4.2. This was to provide a faster and more efficient scripting environment. It provided the opportunity for a much needed review of the DCL mechanism within the server. As a result two capabilities not found in earlier versions became available: persistent subprocesses (see below) and CGIplus (see 9.6 - CGIplus Scripting).

Process creation under the VMS operating system is notoriously slow and expensive. This is an inescapable overhead when scripting via child processes. An obvious strategy is to avoid, at least as much as possible, the creation of subprocesses. The only way to do this is to share subprocesses between multiple scripts/requests, addressing the attendant complications of isolating potential interactions between requests. These could occur through changes made by any script to the subprocess' environment. For VMS this involves symbol and logical name creation, and files opened at the DCL level. In reality few scripts need to make logical name changes and symbols are easily removed between uses. DCL-opened files are a little more problematic, but again, in reality most scripts doing file manipulation will be images.

A reasonable assumption is that for almost all environments scripts can quite safely share subprocesses with great benefit to response latency and system impact (see 10.2 - Scripting for a table with some comparative performances). If the local environment requires absolute script isolation for some reason then this subprocess persistence may easily be disabled, with a consequent trade-off in performance.

NOTE: With the form of subprocess management used in v4.2 and later, BYTLM can become an issue. When setting the HTTPd account BYTLM quota allow approximately 12,500 bytes per subprocess that can be concurrently active, plus a general allowance (technically, allow 1.0 x /NETBUF= plus 1.0 x + 0.5 x + 0.5 x /SUBBUF=). That is, if the subprocess hard-limit (see below) is 20 then BYTLM should be set to at least 250,000 plus 50,000. Of course in such a case PRCLM should be set to at least 20, preferably 40. These and other relevant quotas may be monitored using the HTTPDMON utility or the server administration menu.
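
The arithmetic in the note can be written out as a small C sketch. The function name and constants are illustrative only: the 12,500 bytes per subprocess and the 50,000 byte general allowance are simply the figures quoted in the example, not values read from the server.

```c
/* figures quoted in the note above, not taken from the server itself */
#define BYTES_PER_SUBPROCESS 12500L
#define GENERAL_ALLOWANCE    50000L

/* hypothetical helper: rough minimum BYTLM for a subprocess hard-limit */
long EstimateBytLm (int SubprocessHardLimit)
{
   return SubprocessHardLimit * BYTES_PER_SUBPROCESS + GENERAL_ALLOWANCE;
}
```

For a hard-limit of 20 this gives 20 x 12,500 + 50,000 = 300,000 bytes, matching the ``250,000 plus 50,000'' of the note.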

Zombies

The term zombie is used to describe subprocesses when persisting between uses (the reason should be obvious, they are neither ``alive'' (processing a request) nor are they ``dead'' (deleted) :^) Zombie subprocesses have a finite time to exist (non-life-time?) before they are automatically purged from the system (see 5 - HTTPd Configuration). This keeps process clutter on the system to a minimum.

9.3 - CGI Compliance

The HTTPd scripting mechanism is designed to be WWW CGI (Common Gateway Interface) compliant, based in part on the INTERNET-DRAFT authored by D.Robinson (drtr@ast.cam.ac.uk), 8 January 1996.

CGI Compliant Variables

Environment variables are created in a similar way to the CERN VMS HTTPd implementation, where CGI environment variables are provided to the script via DCL global symbols. Each CGI variable symbol name is prefixed with ``WWW_'' by default. This can be changed using the /CGI_PREFIX qualifier (see 4.3 - HTTPd Command Line), but doing so is not recommended if the WASD VMS scripts are to be used, as they expect CGI variable symbols to be prefixed in this manner.

Extensions to CGI Variables

In line with other CGI implementations, additional, non-compliant variables are provided to ease CGI interfacing. These provide the various components of the query string. A keyword query string and a form query string are parsed into separate variables, named

  WWW_KEY_number
  WWW_KEY_COUNT
  WWW_FORM_form-element-name

See the example below.

CGI Variable Capacity

DCL symbol values are limited to approximately 1000 characters. The CGI interface will provide symbols with values up to that limit if required. This should be sufficient for most circumstances.

The basic CGI symbol names are demonstrated here with a call to a script that simply executes the following DCL code:

  $ SHOW SYMBOL WWW_*
  $ SHOW SYMBOL *

Note how the request components are represented for ISINDEX-style searching (third item) and a forms-based query (fourth item).

  1. <A HREF="/script/cgi_symbols">
  2. <A HREF="/script/cgi_symbols/ht_root/doc/htd">
  3. <A HREF="/script/cgi_symbols/ht_root/doc/htd/*.*?string1+string2">
  4. <A HREF="/script/cgi_symbols/ht_root/doc/htd?FirstField=for&SecondField=this">
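
To make the naming of the extension variables concrete, here is a minimal C sketch that derives WWW_KEY_number, WWW_KEY_COUNT and WWW_FORM_form-element-name pairs from a query string. It is not the server's code. The helper name, the structure and the rule ``a query string containing '=' is a form query'' are assumptions made for illustration, no URL-decoding is attempted, and the form element name is copied as written.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 16

/* one name/value pair, named the way the symbols above are named */
struct CgiSymbol { char Name[64]; char Value[128]; };

/* Hypothetical helper: returns the number of symbols stored.  Keyword
   queries also get a trailing WWW_KEY_COUNT entry. */
int SplitQueryString (const char *Query, struct CgiSymbol *Symbols, int Max)
{
   char  Copy[256];
   char  *Field, *Next, *Equals;
   int  Count = 0, KeyCount = 0;
   int  IsForm = (strchr (Query, '=') != NULL);

   if (Max < 1) return 0;
   strncpy (Copy, Query, sizeof(Copy)-1);
   Copy[sizeof(Copy)-1] = '\0';

   /* Max-1 keeps one slot free for WWW_KEY_COUNT */
   for (Field = Copy; Field != NULL && Count < Max-1; Field = Next)
   {
      /* form fields are separated by '&', keywords by '+' */
      if ((Next = strchr (Field, IsForm ? '&' : '+')) != NULL) *Next++ = '\0';
      if (!*Field) continue;
      if (IsForm)
      {
         if ((Equals = strchr (Field, '=')) == NULL) continue;
         *Equals++ = '\0';
         snprintf (Symbols[Count].Name, sizeof(Symbols[Count].Name),
                   "WWW_FORM_%s", Field);
         snprintf (Symbols[Count].Value, sizeof(Symbols[Count].Value),
                   "%s", Equals);
      }
      else
      {
         snprintf (Symbols[Count].Name, sizeof(Symbols[Count].Name),
                   "WWW_KEY_%d", ++KeyCount);
         snprintf (Symbols[Count].Value, sizeof(Symbols[Count].Value),
                   "%s", Field);
      }
      Count++;
   }
   if (!IsForm)
   {
      strcpy (Symbols[Count].Name, "WWW_KEY_COUNT");
      snprintf (Symbols[Count].Value, sizeof(Symbols[Count].Value),
                "%d", KeyCount);
      Count++;
   }
   return Count;
}
```

Given the query string of the third item, ``string1+string2'', the sketch yields WWW_KEY_1, WWW_KEY_2 and WWW_KEY_COUNT. Given that of the fourth item it yields one WWW_FORM_ entry per field.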

CGI Compliant Output

Script output must behave in a CGI-compliant fashion (by way of contrast, see 9.4 - Non-CGI Compliance Output). That is, a CGI script may redirect the location of the document, using a Location: header line, or may supply a data stream beginning with a Content-Type: header line. Both must be followed by a blank line.

If the script output begins with either of these two lines HTTPd assumes that output will be line-oriented, without HTTP carriage-control (each line terminated by a carriage-return then a line-feed), and will thereafter ensure each record it receives is correctly terminated before passing it to the client. In this way DCL procedure output (and the VMS CLI in general) is supported transparently.

9.3.1 - Example DCL Scripts

A simple script to provide the system time might be:

  $ say = "write sys$output"
  $! the next two lines make it CGI-compliant
  $ say "Content-Type: text/plain"
  $ say ""
  $! start of plain-text script output
  $ show time

A script to provide the system time more elaborately (using HTML):

  $ say = "write sys$output"
  $! the next two lines make it CGI-compliant
  $ say "Content-Type: text/html"
  $ say ""
  $! start of HTML script output
  $ say "<HTML>"
  $ say "Hello ''WWW_REMOTE_HOST'"  !(CGI variable)
  $ say "<P>"
  $ say "System time on node ''f$getsyi("nodename")' is:"
  $ say "<H1>''f$cvtime()'</H1>"
  $ say "</HTML>"
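
Since an executable can equally be the basis for a script, the same CGI-compliant output can be sketched in C. This is a minimal illustration rather than code from the WASD distribution: the function name is hypothetical, and the remote host value is written out without any HTML escaping.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write a CGI-compliant response to Out: the Content-Type line, the
   blank line ending the header, then the HTML body.  RemoteHost may be
   NULL when the CGI variable is not available. */
int WriteCgiGreeting (FILE *Out, const char *RemoteHost)
{
   if (RemoteHost == NULL) RemoteHost = "(unknown)";
   /* the next two lines make it CGI-compliant */
   fprintf (Out, "Content-Type: text/html\n");
   fprintf (Out, "\n");
   /* start of HTML script output */
   fprintf (Out, "<HTML>\nHello %s\n</HTML>\n", RemoteHost);
   return ferror (Out) ? -1 : 0;
}
```

A script image would typically call it as WriteCgiGreeting (stdout, getenv("WWW_REMOTE_HOST")), leaving record termination to the server as described above.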

9.4 - Non-CGI Compliance Output

A script does not have to output a CGI-compliant data stream. If it begins with an HTTP header status line (e.g. ``HTTP/1.0 200 OK''), HTTPd assumes it will supply a raw HTTP data stream, containing all the HTTP requirements.

Any such script must observe the HyperText Transfer Protocol. Every line must be terminated by a carriage-return and line-feed (represented as ``\r''``\n''), or as a minimum by a single line-feed. In particular, the type of the data being returned by the script must be included in an HTTP header sent prior to the data itself. Headers for the two most common data types are illustrated here. Note that the blank line is strictly necessary; it terminates the header.

Plain-Text

  HTTP/1.0 200 ok\r\n
  Content-Type: text/plain\r\n
  \r\n

HTML

  HTTP/1.0 200 ok\r\n
  Content-Type: text/html\r\n
  \r\n

Non-CGI-Compliant DCL script

The following example shows a non-CGI-compliant DCL script similar in function to the CGI-compliant one above. Note the full HTTP header and each line explicitly terminated with a carriage-return and line-feed pair.

  $ cr[0,8] = %x0d
  $ lf[0,8] = %x0a
  $ say = "write sys$output"
  $! the next line makes it non-CGI-compliant
  $ say "HTTP/1.0 200 Time follows.''cr'''lf'"
  $ say "Content-Type: text/html''cr'''lf'"
  $ say "''cr'''lf'"
  $! start of HTML script output
  $ say "<HTML>''cr'''lf'"
  $ say "Hello ''WWW_REMOTE_HOST'''cr'''lf'"  !(CGI variable)
  $ say "<P>''cr'''lf'"
  $ say "System time on node ''f$getsyi("nodename")' is:''cr'''lf'"
  $ say "<H1>''f$cvtime()'</H1>''cr'''lf'"
  $ say "</HTML>''cr'''lf'"

9.5 - Raw HTTP Input

The logical name SYS$INPUT (with a synonym of HTTP$INPUT for backward compatibility) defines a mailbox providing the raw HTTP input stream from the client. This is available for procedures and executables to explicitly open and read.

Note that this is a raw stream, and HTTP lines (carriage-return/line-feed terminated sequences of characters) may have been blocked together for network transport. These would need to be explicitly parsed by the program.
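
As a sketch of what that parsing might look like, the following C function pulls one line at a time out of a buffer that may hold several blocked-together lines. The function name is hypothetical, and the buffer is assumed to have been read from the input stream and NUL-terminated by the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the next line from a raw buffer into Line
   (without its terminator) and return a pointer to the start of the
   following line, or NULL when no complete line remains.  A line ends
   with <CR><LF>, or as a minimum a single <LF>. */
const char *NextHttpLine (const char *Buffer, char *Line, size_t LineSize)
{
   const char  *End = strchr (Buffer, '\n');
   size_t  Length;

   if (End == NULL || LineSize == 0) return NULL;
   Length = (size_t)(End - Buffer);
   /* drop the carriage-return of a <CR><LF> pair */
   if (Length > 0 && Buffer[Length-1] == '\r') Length--;
   /* truncate rather than overflow the caller's buffer */
   if (Length >= LineSize) Length = LineSize - 1;
   memcpy (Line, Buffer, Length);
   Line[Length] = '\0';
   return End + 1;
}
```

Calling it repeatedly, each time passing the pointer it last returned, walks the request header line by line. An empty line marks the end of the header, with any request body following.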

9.6 - CGIplus Scripting

Common Gateway Interface ... plus lower latency, plus greater efficiency, plus far less system impact!

I know, I know! The term CGIplus is a bit too cute but I had to call it something!

CGIplus attempts to eliminate the overhead associated with creating the subprocess and then executing the image of a CGI script. It does this by allowing the subprocess and any associated image/application to continue executing between uses, eliminating any startup overheads. This reduces both the load on the system and the request latency. In this sense these advantages parallel those offered by commercial HTTP server-integration APIs, such as Netscape NSAPI and Microsoft ISAPI, without the disadvantages of such proprietary interfaces: the API complexity, language dependency and server process integration.

CGIplus is not as complex (and consequently not as versatile) as another approach to improving CGI performance, Open Market's FastCGI (see http://www.fastcgi.com/), which strikes the author as being ``CGI'' in nomenclature only. Interestingly, CGIplus could be used as a viable interface to FastCGI servers by using an adaptation of the cgi-fcgi program from the FastCGI Developer's Kit within a CGIplus application (see http://www.fastcgi.com/kit/doc/fcgi-devel-kit.htm).

CGIplus design is generic enough to be easily implemented by other server architectures if found desirable. (For example, it is imagined Unix platforms would implement the CGIplus variable stream using named pipes one of which would be designated by the CGIPLUSIN environment variable.) The CGIplus-specific script environment and example code has been made as platform-neutral as possible, providing potential for a more wide-spread adoption. Existing CGI scripts can rapidly and elegantly be modified to additionally support CGIplus. The capability of scripts to easily differentiate between and operate in both standard CGI and CGIplus environments with a minimum of code revision offers great versatility.

CGIplus Performance

A simple performance evaluation indicates the advantage of CGIplus. See 10.2 - Scripting for some test results comparing the non-persistent-process, persistent-process and CGIplus environments.

Without a doubt, the subjective difference in activating the same script within the standard CGI and CGIplus environments is quite startling!

CGIplus Programming

The script interface is still CGI, which means a new API does not need to be learned and existing CGI scripts are simple to modify.

See examples in HT_ROOT:[SRC.CGIPLUS]

Instead of having the CGI variables available from the environment (generally accessed via the C Language getenv() standard library call) a CGIplus script must read the CGI variables from CGIPLUSIN. They are supplied as a series of records (lines) containing a CGI variable name (in upper-case), an equate symbol and then the variable value. The line will never contain more than 1024 characters. The format may be easily parsed and as the value contains no encoded characters may be directly used.
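
A record in this format can be split with very little code. The following C sketch (the function name is hypothetical) separates a record read from CGIPLUSIN into its name and value at the first equate symbol, so that any further ``='' characters stay part of the value.

```c
#include <string.h>

/* Hypothetical helper: split "NAME=value" in place at the first '='.
   The name is left NUL-terminated at the start of Record.  Returns a
   pointer to the value with any trailing newline removed, or NULL if
   the record contains no '='. */
char *SplitCgiPlusRecord (char *Record)
{
   char  *Equals = strchr (Record, '=');
   char  *NewLine;

   if (Equals == NULL) return NULL;
   *Equals++ = '\0';
   if ((NewLine = strchr (Equals, '\n')) != NULL) *NewLine = '\0';
   return Equals;
}
```

Because the split is done in place, the caller must pass a writable buffer, such as the line buffer filled by fgets().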

Requirements when using CGIplus:

After processing, the CGIplus script can loop, waiting to read the details of the next request from CGIPLUSIN.

Request output (to the client) is written to SYS$OUTPUT (<stdout>) as per normal CGI behaviour. End of output MUST be indicated by writing a special EOF record to the output stream. This is a bit of a kludge, and the least elegant part of CGIplus design, but it is also the simplest implementation. A unique EOF sequence is generated for each use of DCL via a zombie or CGIplus subprocess. A non-repeating series of bits most unlikely to occur in normal output is employed ... but there is still a very, very, very small chance of premature termination of output (one in 2^280 I think!) See DCL.c for how the value is generated.

The CGIplus EOF string is obtained by the script from the logical name CGIPLUSEOF, defined in the script subprocess' process table, using the scripting language's equivalent of F$TRNLNM(), SYS$TRNLNM(), or a getenv() call (in the C standard library). This string will always contain fewer than 64 characters and comprise only printable characters. It must be written at the conclusion of a request's output to the output stream as a single record (line) but may also contain a <CR><LF> or just <LF> trailing carriage-control (to allow for programming language requirements). It only has to be evaluated once, as the processing begins, remaining the same for all requests over the life-time of that instance of the script.

HTTP input (raw request stream, header and any body) is still available to a CGIplus script.

Code Examples

Of course a CGIplus script should only have a single exit point and should explicitly close files, free allocated memory, etc., after processing a request (i.e. not rely on image run-down to clean-up after itself). It is particularly important when modifying existing scripts to work in the CGIplus environment to ensure this requirement is met (who of us hasn't thought ``well, this file will close when the image exits anyway''?)
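
One common C idiom for meeting this requirement is to route every outcome of a request through a single clean-up label. The sketch below is only an illustration of that idiom: the function, its parameters and the 1024 byte buffer are hypothetical and not part of the CGIplus examples.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-request routine: every resource acquired for the
   request is released before returning, through one exit path, so that
   nothing accumulates across requests in a persistent CGIplus image. */
int ProcessRequest (const char *FileName, FILE *Out)
{
   int  Status = -1;
   char  *Buffer = NULL;
   FILE  *Data = NULL;

   if ((Buffer = malloc (1024)) == NULL) goto Cleanup;
   if ((Data = fopen (FileName, "r")) == NULL) goto Cleanup;
   /* copy the file to the output stream, line by line */
   while (fgets (Buffer, 1024, Data) != NULL) fputs (Buffer, Out);
   Status = 0;

Cleanup:
   /* explicit clean-up, do not rely on image run-down */
   if (Data != NULL) fclose (Data);
   free (Buffer);
   return Status;
}
```

Whether the request succeeds or fails part-way, the file is closed and the memory freed before the script loops to wait for the next request.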

It is a simple task to design a script to modify its behaviour according to the environment it is executing in. Detecting the presence or absence of the CGIPLUSEOF logical is sufficient indication. The following C code fragment shows simultaneously determining whether it is a standard or CGIplus environment (and setting an appropriate boolean), and getting the CGIplus EOF sequence (if it exists).

  int  IsCgiPlus;
  char  *CgiPlusEofPtr;

  IsCgiPlus = ((CgiPlusEofPtr = getenv("CGIPLUSEOF")) != NULL);

The following C code fragment shows a basic CGIplus request loop, reading lines from CGIPLUSIN, and some basic processing to select required CGI variables for request processing.

  if (IsCgiPlus)
  {
     char  *cptr;
     char  Line [1024],
           RemoteHost [128];
     FILE  *CgiPlusIn;

     if ((CgiPlusIn = fopen (getenv("CGIPLUSIN"), "r")) == NULL)
     {
        perror ("CGIplus: fopen");
        exit (0);
     }

     for (;;)
     {
        /* will block waiting for subsequent requests */
        for (;;)
        {
           /* should never have a problem reading CGIPLUSIN, but */
           if (fgets (Line, sizeof(Line), CgiPlusIn) == NULL)
           {
              perror ("CGIplus: fgets");
              exit (0);
           }
           /* first empty line signals the end of CGIplus variables */
           if (Line[0] == '\n') break;
           /* remove the trailing newline */
           if ((cptr = strchr(Line, '\n')) != NULL) *cptr = '\0';

           /* process the CGI variable(s) we are interested in */
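           if (!strncmp (Line, "WWW_REMOTE_HOST=", 16))
           {
              /* bounded copy, the value may be longer than RemoteHost */
              strncpy (RemoteHost, Line+16, sizeof(RemoteHost)-1);
              RemoteHost[sizeof(RemoteHost)-1] = '\0';
           }
        }

        /* (process request, signal end-of-output) */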
     }
  }

CGI scripts can write output in record (line-by-line) or binary mode (more efficient because of buffering by the C RTL). When in binary mode the output stream must be flushed immediately before and after writing the CGIplus EOF sequence (note that in binary a full HTTP stream must also be used). This code fragment shows placing a script output stream into binary mode and the flushing steps.

  /* reopen output stream so that the '\r' and '\n' are not filtered */
  if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL)
     exit (vaxc$errno);

  do {

     /* (read request ...) */

     /* HTTP response header */
     fprintf (stdout, "HTTP/1.0 200 ok\r\nContent-Type: text/html\r\n\r\n");

     /* (other output ...) */

     if (IsCgiPlus)
     {
        /* the CGIplus EOF must be an independent I/O record */
        fflush (stdout);
        fprintf (stdout, "%s", CgiPlusEofPtr);
        fflush (stdout);
     }

  } while (IsCgiPlus);

If the script output is not binary (using default <stdout>) it is only necessary to ensure the EOF string has a record-delimiting new-line.

  fprintf (stdout, "%s\n", CgiPlusEofPtr);

Other languages may not have this same requirement. DCL procedures are quite capable of being used as CGIplus scripts (see examples).

Whenever developing CGIplus scripts/applications (unlike standard CGI) don't forget that after compiling, the old image must be purged from the server before trying out the new!!! (I've been caught a number of times :^)

See examples in HT_ROOT:[SRC.CGIPLUS]

Other Considerations

Multiple CGIplus scripts may be executing in subprocesses at any one time. This includes multiple instances of any particular script. It is the server's task to track these, distributing appropriate requests to idle subprocesses, monitoring those currently processing requests, creating new instances if and when necessary, and deleting the least-used, idle CGIplus subprocesses when configurable thresholds are reached. Of course it is the script's job to maintain coherency if multiple instances may result in resource conflicts or race conditions, etc.

The CGIplus subprocess can be given a finite life-time set by configuration parameter (see 5 - HTTPd Configuration). If this life-time is not set then the CGIplus subprocess will persist indefinitely (i.e. until purged due to soft-limits being reached, or explicitly purged/deleted). When a life-time has been set the CGIplus subprocess is automatically deleted after being idle for the specified period (i.e. not having processed a request). This can be useful in preventing sporadically used scripts from cluttering up the system indefinitely.

In addition, an idle CGIplus script can be terminated by the server at any time the subprocess soft-limit is reached (the subprocess SYS$DELPRC()ed) so resources should be largely quiescent when not actually processing. Of course CGIplus subprocesses may also be manually terminated from the command line (e.g. STOP/ID=).

Some CGIplus scripting information and management is available via the server administration menu, see 8.1 - HTTPd Server Reports.

CGIplus Rule Mapping

CGIplus scripts are differentiated from standard CGI scripts in the mapping rule configuration file using the ``script+'' and ``exec+'' directives. See 6 - HTTPd Mapping Rules.

Scripts capable of operating in both standard CGI and CGIplus environments may simply be accessed in either via rules such as

  exec /cgi-bin/* /cgi-bin/*
  exec+ /cgiplus-bin/* /cgi-bin/*

while specific scripts can be individually designated as CGIplus using

  script+ /cgiplus_example* /cgi-bin/cgiplus_example*

Caution! If changing CGIplus script mapping it is advised to restart the server rather than reloading the rules. Some conflict is possible when using new rules while existing CGIplus scripts are executing.

9.7 - HTTP Persistent-State Cookies

The server is cookie-aware. That is, if the client supplies a ``Cookie:'' request header line it is passed to a CGI script as the ``WWW_HTTP_COOKIE'' CGI variable symbol. If a cookie is not part of the request this symbol does not exist. A script may use the ``Set-Cookie:'' response header line to set cookies.
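
On the script side, reading a cookie amounts to picking one name=value pair out of the WWW_HTTP_COOKIE value. The following C sketch assumes the usual ``name1=value1; name2=value2'' layout of a ``Cookie:'' header. The function name is hypothetical and no decoding of the value is attempted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find cookie Name in a "Cookie:" header value and
   copy its value into Value.  Returns 1 if found, 0 otherwise (including
   when CookieHeader is NULL, i.e. the request carried no cookie). */
int GetCookie (const char *CookieHeader, const char *Name,
               char *Value, size_t ValueSize)
{
   size_t  NameLength = strlen (Name);
   const char  *cptr = CookieHeader;

   if (CookieHeader == NULL || ValueSize == 0) return 0;
   while (*cptr)
   {
      /* skip the separators between name=value pairs */
      while (*cptr == ' ' || *cptr == ';') cptr++;
      if (!strncmp (cptr, Name, NameLength) && cptr[NameLength] == '=')
      {
         size_t  Length;
         cptr += NameLength + 1;
         Length = strcspn (cptr, ";");
         /* truncate rather than overflow the caller's buffer */
         if (Length >= ValueSize) Length = ValueSize - 1;
         memcpy (Value, cptr, Length);
         Value[Length] = '\0';
         return 1;
      }
      /* not this one, move on to the next pair */
      cptr += strcspn (cptr, ";");
   }
   return 0;
}
```

A script image might call it as GetCookie (getenv("WWW_HTTP_COOKIE"), "session", Value, sizeof(Value)), where ``session'' is just an example cookie name.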

Here is a small demonstration of cookie processing using a DCL script.

