WASD HTTP Server -"Nuts and Bolts"


3 - HTTPd Modules

The HTTPd server comprises several main modules, implementing the principal functionality of the server, and a number of smaller support modules.


3.1 - ADMIN.C

Code for: Admin.c

(INCOMPLETE AS YET!)

This module provides the on-line server administration menu and functionality. Some administration pages are provided by the Upd.c module, "piggy-backed" into normal editing dialogues.


3.2 - AUTH.C

Code for: Auth.c

(INCOMPLETE AS YET!)

Sorry, it's a fairly complex module, so I'll have to plead being too busy!

The HTAdmin.c module helps administer the authentication databases.

The authorization module handles user authentication and path method authorization for all requests received by the server. Authenticated username/password information is cached in a balanced binary tree, improving performance compared to on-disk checking each time a request is received.
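
As an illustration only, a minimal sketch of the credential-caching idea follows, using the POSIX tsearch()/tfind() tree routines in place of the server's own tree code. The structure and function names are hypothetical, and the password would in practice be held as a digest rather than plain text.

  /* Hypothetical sketch of credential caching in a binary tree.  Uses the
     POSIX tsearch()/tfind() routines; WASD's own tree code differs. */
  #include <search.h>
  #include <stdlib.h>
  #include <string.h>

  typedef struct {
     char  User [32];
     char  Password [32];       /* in reality a digest, never plain text */
  } AUTH_CACHE_ENTRY;

  static void  *AuthCacheRoot = NULL;

  static int AuthCacheCompare (const void *a, const void *b)
  {
     return (strcmp (((const AUTH_CACHE_ENTRY*)a)->User,
                     ((const AUTH_CACHE_ENTRY*)b)->User));
  }

  /* returns true if the username/password pair has already been validated */
  int AuthCacheCheck (const char *User, const char *Password)
  {
     AUTH_CACHE_ENTRY  key, **found;
     strncpy (key.User, User, sizeof(key.User)-1);
     key.User[sizeof(key.User)-1] = '\0';
     found = tfind (&key, &AuthCacheRoot, AuthCacheCompare);
     if (found == NULL) return (0);
     return (strcmp ((*found)->Password, Password) == 0);
  }

  /* remember a successfully authenticated username/password pair
     (real code would first check for, and update, an existing entry) */
  void AuthCacheAdd (const char *User, const char *Password)
  {
     AUTH_CACHE_ENTRY  *entry = calloc (1, sizeof(AUTH_CACHE_ENTRY));
     if (entry == NULL) return;
     strncpy (entry->User, User, sizeof(entry->User)-1);
     strncpy (entry->Password, Password, sizeof(entry->Password)-1);
     tsearch (entry, &AuthCacheRoot, AuthCacheCompare);
  }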

HTTPd-specific authentication databases are binary files with fixed-length, 512-byte records. Within each record there is provision for:

Server-host SYSUAF authentication is also provided.


3.3 - BASIC.C

Code for: Basic.c

This module provides authentication functionality for the BASIC method.


3.4 - CACHE.C

Code for: Cache.c

This module implements a file data and revision time cache. It is different from most other modules in that it doesn't have a "task" structure. The small amount of storage required is integrated into the request structure. It is designed to provide an efficient static document request (file transfer) mechanism. Unlike the file module, which may interleave its activities with those of other modules (e.g. the directory module using it to provide read-me information), it can only be used once, stand-alone, per request.

Cache data is loaded by the file module while concurrently transferring that data to the original requesting client, using buffer space supplied by the cache module, space that can then be retained for reuse as cache. Hence the cache load adds no significant overhead to the actual reading and initial transfer of the file.

Space for a file's data is dynamically allocated, and reallocated if necessary as cache entries are reused. It is allocated in user-specifiable chunks. It is expected this mechanism provides some efficiencies when reusing cache entries. Memory may be reclaimed from entries towards the end of the list (least used) if it is required for the file data of an entry needing loading. This process is termed memory scavenging.

Cache entries are maintained in a linked list with the most recent and most frequently hit entries towards the head of the list. A case-insensitive hash index into the list entries is maintained.

The search is based on three factors: first, a simple, efficiently generated, case-insensitive hash value providing a rapid but inconclusive index into the cached paths; secondly, the length of the two paths; finally, a conclusive, case-insensitive string comparison if the previous two tests matched. When the paths do not match a collision list allows rapid subsequent searching.
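
As a purely illustrative sketch (structure and function names are hypothetical, and the real hash differs), the three-factor lookup might look something like this:

  /* Hypothetical sketch of the three-factor search: a cheap, case-insensitive
     hash gives a rapid but inconclusive index, the path length is a quick
     secondary check, and a case-insensitive compare is conclusive. */
  #include <ctype.h>
  #include <string.h>
  #include <strings.h>

  typedef struct CacheEntryStruct {
     struct CacheEntryStruct  *NextPtr;    /* collision list */
     unsigned long  Hash;
     int  PathLength;
     char  Path [256];
     /* ... file data pointers, hit counts, revision time, etc. ... */
  } CACHE_ENTRY;

  static unsigned long CacheHash (const char *Path, int *LengthPtr)
  {
     unsigned long  hash = 0;
     const char  *cptr = Path;
     while (*cptr) hash = (hash << 1) + tolower((unsigned char)*cptr++);
     *LengthPtr = (int)(cptr - Path);
     return (hash);
  }

  CACHE_ENTRY* CacheSearch (CACHE_ENTRY *HashTable[], int TableSize,
                            const char *Path)
  {
     int  length;
     unsigned long  hash = CacheHash (Path, &length);
     CACHE_ENTRY  *entry = HashTable [hash % TableSize];
     while (entry != NULL)
     {
        if (entry->Hash == hash &&                    /* rapid, inconclusive */
            entry->PathLength == length &&            /* quick secondary check */
            strcasecmp (entry->Path, Path) == 0)      /* conclusive */
           return (entry);
        entry = entry->NextPtr;                       /* collision list */
     }
     return (NULL);
  }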

The linked-list organisation also allows a simple implementation of a least-recently-used (LRU) algorithm for selecting an entry when a new request demands an entry and space for cache loading. The linked list is naturally ordered from most recently and most frequently accessed at the head, to the least recently and least frequently accessed at the tail. Hence an infrequently accessed entry is selected from the tail end of the list, its data invalidated and given to the new request for cache loading. Invalidated data cache entries are also immediately placed at the tail of the list for reuse/reloading.

When a new entry is initially loaded it is placed at the top of the list. Hits on other entries result in a check being made against the number of hits of the head entry in the list. If the entry being hit has a higher hit count it is placed at the head of the list, pushing the previous head entry "down". If not, it is then checked against the entry immediately before it in the list, and if higher the two are swapped. This results in the most recently loaded and more frequently hit entries migrating towards the start of the search.
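
A hypothetical sketch of that promote-on-hit behaviour (the list links, names and structure are illustrative only):

  /* Hypothetical sketch: on a hit the entry is either moved to the head of
     the list (more hits than the current head) or swapped with the entry
     immediately before it (more hits than that neighbour). */
  typedef struct CacheEntryStruct {
     struct CacheEntryStruct  *PrevPtr, *NextPtr;
     int  HitCount;
     /* ... path, file data, etc. ... */
  } CACHE_ENTRY;

  void CacheHit (CACHE_ENTRY **HeadPtr, CACHE_ENTRY *entry)
  {
     entry->HitCount++;
     if (entry == *HeadPtr) return;

     if (entry->HitCount > (*HeadPtr)->HitCount)
     {
        /* unlink and place at the head of the list */
        entry->PrevPtr->NextPtr = entry->NextPtr;
        if (entry->NextPtr != NULL) entry->NextPtr->PrevPtr = entry->PrevPtr;
        entry->PrevPtr = NULL;
        entry->NextPtr = *HeadPtr;
        (*HeadPtr)->PrevPtr = entry;
        *HeadPtr = entry;
     }
     else
     if (entry->HitCount > entry->PrevPtr->HitCount)
     {
        /* swap with the entry immediately before it */
        CACHE_ENTRY  *before = entry->PrevPtr;
        before->NextPtr = entry->NextPtr;
        if (entry->NextPtr != NULL) entry->NextPtr->PrevPtr = before;
        entry->PrevPtr = before->PrevPtr;
        if (before->PrevPtr != NULL) before->PrevPtr->NextPtr = entry;
        entry->NextPtr = before;
        before->PrevPtr = entry;
     }
  }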

To help prevent the cache thrashing with floods of requests for not currently loaded files, any entry that has a suitably high number of hits over the recent past (suitably high ... how many is that, and recent past ... how long is that?) is not reused until no hits have occurred within that period. Hopefully this prevents lots of unnecessary loads of one-offs at the expense of genuinely frequently accessed files.

To prevent multiple loads of the same path/file, for instance if a subsequent request demands the same file as a previous request is still currently loading, any subsequent request will merely transfer the file, not concurrently load it into the cache.

Contents Validation

The cache will automatically revalidate the file data after a specified number of seconds by comparing the original file revision time to the current revision time. If they differ the file contents have changed and the cache contents are declared invalid. If found invalid the file transfer then continues outside of the cache, with the new contents being concurrently reloaded into the cache. Cache validation is also always performed if the request uses "Pragma: no-cache" (i.e. as with the Netscape Navigator reload function). Hence there is no need for any explicit flushing of the cache under normal operation. If a document does not immediately reflect any changes made to it (i.e. the validation time has not been reached) validation (and consequent reload) can be "forced" with a browser reload. The entire cache may be purged of cached data either from the server administration menu or using command line server control.
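
A minimal sketch of that revalidation decision (times shown as time_t for simplicity; the real code uses VMS binary revision times, and the names are hypothetical):

  /* Hypothetical sketch: once the validation interval has expired (or the
     request carries "Pragma: no-cache") the recorded revision time is
     compared with the file's current one; a mismatch invalidates the data. */
  #include <time.h>

  typedef struct {
     time_t  LoadedRdt;           /* file revision time when entry loaded */
     time_t  LastValidated;       /* when the entry was last checked */
     int  Valid;
  } CACHE_VALIDATION;

  void CacheValidate (CACHE_VALIDATION *entry, time_t CurrentRdt,
                      int ValidateSeconds, int PragmaNoCache)
  {
     time_t  now = time(NULL);
     if (!PragmaNoCache &&
         difftime (now, entry->LastValidated) < ValidateSeconds)
        return;                              /* still within the interval */
     entry->LastValidated = now;
     if (CurrentRdt != entry->LoadedRdt)
        entry->Valid = 0;                    /* contents changed, reload */
  }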


3.5 - CGI.C

Code for: Cgi.c

The CGI module provides the CGI scripting support functions used by the DCL.C and DECNET.C modules. These functions provide the following.


3.6 - DCL.C

Code for: DCL.c

(MAJOR REVISION FOR v4.2!)

The DCL execution functionality must interface and coordinate with an external subprocess. It too is asynchronously driven by I/O once the subprocess has been created and is executing independently. Communication with the subprocess (IPC) is via mailboxes.

Process creation by the VMS operating system is notoriously slow and expensive. This is an inescapable factor when scripting using child processes within the environment. An obvious strategy is to avoid, at least as much as possible, the creation of subprocesses. The only way to do this is to share subprocesses between multiple scripts/requests. The obvious complication becomes isolating the potential interactions due to changes made by any script to the subprocess' environment. For VMS these changes are basically symbol and logical name creation, and files opened at the DCL level. In reality few scripts need to make logical name changes and symbols are easily removed between uses. DCL-opened files are a little more problematic, but again, in reality most scripts doing file manipulation will be images.

The conclusion arrived at is that for almost all environments scripts can quite safely share subprocesses, with great benefit to response latency and system impact. If the local environment requires absolute script isolation for some reason then this subprocess-persistence may easily be disabled, with a consequent trade-off in performance.

The term zombie is used affectionately to describe these subprocesses when persisting between uses (the reason should be obvious, they are neither "alive" (processing a request) nor are they "dead" (deleted) :^)

The DCL facility is used in three modes:

  1. To execute independent DCL commands.
    This is used to provide DCL command output for SSI (pre-processed HTML). Subprocesses will exist over multiple commands (if zombies are enabled).

  2. To execute standard CGI scripts.
    Subprocesses will exist over multiple requests (if zombies are enabled).

  3. To execute CGIplus scripts.
    Subprocesses will exist over multiple requests. Technically the CGIplus script only executes once and then remains blocking until a request is provided to it. It then processes the request, provides output, then blocks again.

The DCL module creates a data structure that allows subprocesses to be managed independently of any request. This allows both CGIplus and standard CGI subprocesses (in the form of zombies) to persist across multiple requests. There are a fixed number of subprocesses that can exist for all purposes. This is set by the subprocess hard-limit configuration parameter. Each of these structures is created and populated on an as-required basis, linked into a list growing from zero to maximum based on demand and the life of the server. Subprocesses can come and go depending on requirements. CGIplus script subprocesses and any zombies are semi-permanent. In summary:

The four mailboxes serve as the subprocess' IPC:

  1. SYS$COMMAND
    This stream controls the subprocess execution, providing DCL commands to the subprocess' CLI. For DCL commands and standard CGI it creates DCL symbols representing the CGI variables, after which the command or script is executed.

  2. SYS$OUTPUT
    The subprocess simply writes output to SYS$OUTPUT (<stdout>). Due to buffering in the C RTL, binary-mode streams are more efficient and faster than record-mode. See any of the CGI applications in this package for example code for changing script <stdout> to binary-mode (a minimal sketch is also shown after this list). CGIplus scripts must indicate the end of a single request's output using a special EOF string which is specifically detected by the output functions. As this mailbox persists between requests it is essential to ensure no output from a previous request lingers in the mailbox due to request cancellation or abnormal subprocess termination.

  3. SYS$INPUT
    For CGI script execution, this is available for subprocess access to the HTTP data stream as <stdin>. A synonym exists for backward compatibility, the logical name HTTP$INPUT, a stream which can be explicitly opened.

    NOTE: Versions of the server prior to 4.3 supplied the full request (header then body) to the script. This was not fully CGI-compliant. Versions 4.3 and following supply only the body, although the previous behaviour may be explicitly selected by enabling this configuration parameter.

  4. CGIPLUSIN
    For CGIplus this mailbox provides access to a request's CGI variables. The first line of any request can always be discarded (for synchronization) and the end of a request's variables is indicated by an empty record (blank line). For standard CGI and DCL commands this mailbox is not used.
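
The binary-mode change mentioned for SYS$OUTPUT above might, for a C-language CGI script, look something like the following minimal sketch. It assumes the DEC C RTL extension allowing RMS attributes to be passed to freopen(); the CGI example programs supplied with the package are the authoritative reference.

  /* Minimal sketch: reopen <stdout> in binary (stream) mode so that the C
     RTL does not impose record carriage-control on the mailbox stream.
     Assumes the DEC C RTL freopen() extension; see the supplied CGI
     examples for the definitive code. */
  #include <stdio.h>
  #include <stdlib.h>

  int main (void)
  {
     if (freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin") == NULL)
        exit (EXIT_FAILURE);

     /* a CGI response may now be written without record-mode overhead */
     fprintf (stdout, "Content-Type: text/plain\r\n\r\n");
     fprintf (stdout, "Hello from a binary-mode CGI script!\n");
     return (0);
  }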

DCL Module Processing

  1. The primary DCL function ensures any required file specification exists (e.g. script procedure). It then allocates a slot to the request. Slot allocation is a very fundamental activity:

    A function then writes to the subprocess, creating a number of logical names and CGI-compliant symbol names, executing the command or invoking the execution of a DCL procedure, etc., and supplies the CGIplus variable stream via CGIPLUSIN if appropriate (a hypothetical sketch of the symbol creation is shown after this list). If the use of zombies is enabled, DCL to clean up the environment as much as possible is provided first.

  2. When the subprocess writes to the SYS$OUTPUT stream the I/O completion AST routine associated with reading that mailbox is called.

    If CGIplus script execution, the I/O is always examined for the CGIplus end-of-output signature. This must always be at the start of the record and, if detected, the request is concluded.

    If CGI script execution, the first I/O from this stream is analyzed for CGI-compliance. It is determined whether a raw HTTP data stream will be supplied by the script, or whether the script will be CGI-compliant (requiring the addition of HTTP header, etc.) and whether HTTP carriage-control needs to be checked/added for each record.

    A CGI local redirection header (partial URL) is a special case. When this is received all output from the subprocess is suppressed until the script processing is ready to be concluded. At that time the "Location:" information of the header is used to reinitiate the request, using the same thread data structure.

    When normal SYS$OUTPUT processing is complete the record received can be handled in one of two ways. If it is raw HTTP it is asynchronously written to the network. The AST completion routine specified with the network write will queue another read from the subprocess' SYS$OUTPUT. If it is record-oriented I/O (e.g. from DCL output), it has its carriage-control checked for HTTP compliance before the record is asynchronously written. Hence a script supplying its own raw, HTTP-compliant data stream is much more efficient than line-by-line.

    The SYS$OUTPUT stream is a little problematic. For standard CGI and DCL command execution at subprocess exit there may be one or more records waiting in the mailbox for reading and subsequent writing to the client over the network, delaying processing conclusion. Detection of completion is accomplished by making each QIO sensitive to mailbox status via the SS$_NOWRITER status, which indicates there is no channel assigned to the mailbox, and the mailbox buffer is empty. It then becomes safe to dispose of the client thread without loss of data.

  3. If CGI-script execution is for a POST or PUT method, the HTTP data stream made available is also AST driven. If the subprocess opens the stream and reads from it, the I/O completion routine called queues another asynchronous read from the buffered request header and/or body.
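
As a hypothetical illustration of the symbol creation referred to in step 1 (the symbol prefix, procedure name and WriteMailbox() stand-in are illustrative only; the real module queues these records to the subprocess mailbox using asynchronous $QIOs):

  /* Hypothetical sketch: render CGI variables as DCL symbol assignments and
     send them, followed by the command invoking the script, to the
     subprocess.  WriteMailbox() is a stand-in for the real mailbox write. */
  #include <stdio.h>
  #include <string.h>

  static int WriteMailbox (const char *Record, int Length)
  {
     /* stand-in: the real code queues an asynchronous write to the mailbox */
     return (printf ("%.*s\n", Length, Record) >= 0);
  }

  /* e.g.  WWW_REQUEST_METHOD == "GET"
     (real code must also escape any quote characters in the value) */
  static int DclCgiVariableSymbol (const char *Name, const char *Value)
  {
     char  record [1024];
     int  length = sprintf (record, "%s == \"%s\"", Name, Value);
     return (WriteMailbox (record, length));
  }

  int main (void)
  {
     const char  *command = "@HT_ROOT:[SCRIPT]EXAMPLE.COM";   /* hypothetical */
     DclCgiVariableSymbol ("WWW_REQUEST_METHOD", "GET");
     DclCgiVariableSymbol ("WWW_QUERY_STRING", "example=1");
     return (!WriteMailbox (command, (int)strlen(command)));
  }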


3.7 - DECNET.C

Code for: DECnet.c

The DECnet module provides scripting based on process management using DECnet. Both standard WASD CGI scripting and an emulation of OSU (DECthreads) scripting are supported. Both function by activating specific DECnet object DCL procedures on the target node using transparent task-to-task communication. These procedures act to set up and control the scripting environment and script activation. Separate functions, and associated object procedures, support the dialog associated with each of the two scripting styles.

The DECnet node and task specification string is determined by examination of the script specification. Connection to the object on the node is made asynchronously. When established one of two dialogs is maintained.

CGI

Using successive execution states the CGI dialog function handles interaction with the fairly simple CGIWASD.COM procedure, used as the DECnet object, and the script output. The fundamentals of the object are reproduced here:

  $!'f$verify(0)
  $! CGIWASD.COM (DECnet object for WASD HTTPd DECnet-based CGI scripting)
  $! 07-JAN-98  MGD  initial
  $ set noon
  $ delete = "delete"
  $ delete/symbol/global/all
  $ delete/symbol/local/all
  $ if f$trnlnm("CGIEOF") .nes. "" then deassign/process CGIEOF
  $ open /read /write /share net$link SYS$NET
  $ define/nolog sys$output net$link
  $ define/nolog sys$input net$link
  $ define/nolog http$input net$link
  $ Loop:
  $    read /error=EndLoop /end=EndLoop /time=30 net$link Line
  $'Line'
  $    goto Loop
  $ DoIt:
  $    read /error=EndLoop /end=EndLoop /time=30 net$link Line
  $'Line'
  $    write = "write"
  $    write sys$output f$trnlnm("CGIEOF")
  $ EndLoop:
  $ close net$link
  $ exit

The actual procedure is available as: CGIWASD.COM

This procedure is executed within the NETSERVER process on the remote node. Its function is very simple. Processing in a loop, it receives a record from the HTTP server and then executes it by substitution on the command line. In this way DCL commands to set up the CGI environment can be sent to the network process where they are executed, creating a CGI environment in much the same way as is done with subprocess-based CGI scripts. Once set up, the server sends the DCL command "GOTO DOIT" which causes the procedure to branch out of the loop and to read one last record from the server ... the actual DCL command to activate the script. After the script finishes the procedure writes the end-of-output indicator to the server, which then concludes the script.

Once the CGI environment is set up and the script activation DCL is sent the function assumes the role of accepting output from the script over the network link, processing that as necessary for CGI compliance, etc., and then writing the data to the client.

OSU

The behaviour for the OSU dialog has been determined from reverse-engineering the OSU v3.1 'script_execute.c' module, and a certain measure of trial-and-error.

Using successive execution states the OSU dialog function handles interaction with the standard OSU WWWEXEC.COM procedure, used as the DECnet object, and the script output. The actual procedure may be viewed here: WWWEXEC.COM

OSU scripts operate in two distinct and successive phases.

  1. Dialog - During part of this phase the script has not been activated and the link is in the process of setting up the script execution environment. The network object (WWWEXEC.COM) can request the HTTP server to supply specific data which it does by writing one or more records to the network link. The object then searches for and activates the script.

    The dialog phase is not yet complete for the script may now request the server to supply more data. The dialog phase ends when the script indicates to the server that it is ready to supply output. The script then enters the output phase.

  2. Output - During the output phase the server is responsible for ensuring the CGI compliance, or at least HTTP compliance, of that output. End-of-output is indicated by writing a special tag that the server detects.

    Output may be made in one of a number of modes. Basically these are: raw, where the script is totally responsible for HTTP compliance; record, where correct carriage-control is enforced for each record (line); and CGI, where CGI compliance is enforced.

    The output phase may be entered before any script is activated, such as when the DECnet object needs to report errors, for example the script file not being found.


3.8 - DESCR.C

Code for: Descr.c

(INCOMPLETE AS YET!)

The Descr.c module generates a file description by searching HTML files for the first occurrence of <TITLE>...</TITLE> or <Hn>...</Hn> tags, using the description provided therein. It is primarily used by the directory listing module, but can also be used by the menu module.

It does this search asynchronously (surprise-surprise!)

To asynchronously locate a description in an HTML file, the file is opened and then each record asynchronously read and examined for the <TITLE> element. Once obtained a synchronous call is made to a function to list the file details. After the file details are listed another asynchronous search call is made, with the file search function specified for AST completion. The function then immediately completes.
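
A minimal, hypothetical sketch of the <TITLE> extraction within a single record (the real module is record-by-record, asynchronous, and also handles <Hn> headings and elements spanning records):

  /* Hypothetical sketch: find <TITLE>...</TITLE> in one record (line),
     case-insensitively, and copy the enclosed text as the description. */
  #include <string.h>
  #include <strings.h>

  static const char* StrStrNoCase (const char *Haystack, const char *Needle)
  {
     size_t  nlen = strlen (Needle);
     for (; *Haystack; Haystack++)
        if (strncasecmp (Haystack, Needle, nlen) == 0) return (Haystack);
     return (NULL);
  }

  /* returns true when a description has been found */
  int DescrFromRecord (const char *Record, char *Descr, int SizeOfDescr)
  {
     const char  *sptr, *eptr;
     int  length;
     if ((sptr = StrStrNoCase (Record, "<TITLE>")) == NULL) return (0);
     sptr += 7;                                    /* step over "<TITLE>" */
     if ((eptr = StrStrNoCase (sptr, "</TITLE>")) == NULL)
        eptr = sptr + strlen (sptr);               /* element spans records */
     length = (int)(eptr - sptr);
     if (length >= SizeOfDescr) length = SizeOfDescr - 1;
     memcpy (Descr, sptr, length);
     Descr[length] = '\0';
     return (1);
  }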


3.9 - DIGEST.C

Code for: Digest.c

This module provides authentication functionality for the DIGEST method.

This module uses code derived in part from RSA Data Security, Inc., under licence:

granted to make and use derivative works provided that such works are identified as "derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm" in all material mentioning or referencing the derived work.


3.10 - DIR.C

Code for: Dir.c

There is some fairly complex and convoluted behaviour in this code!

This module implements the HTTPd directory listing functionality. Directories are listed first, then files. The file detail format is customizable, with the default resembling the default CERN and NCSA server layout. Output from this module is buffered to reduce network writes, improving I/O efficiency. HTML files have the <TITLE></TITLE> element extracted as a "Description" item.

Essential behaviour:

  1. The primary function obtains the file specification from the request data structure. Server directives, controlling some features of the directory listing behaviour, are checked for and parsed out if present. The directory listing layout is initialized. The directory specification (path information) is parsed to obtain the directory and file name/type components. After successfully parsing the specification it generates an HTTP response header if required.

  2. Column headings and (possibly) a parent directory item are buffered in an asynchronous function call. An RMS structure is initialized to allow the asynchronous search for all files in the specified directory ending in ".DIR".

  3. For each directory file found the directory search AST completion function is called. Status is checked for success or otherwise. If an error the status is reported to the client and the request processing concluded. If the directory contained no directory files, or the directory files are exhausted a call to a function to begin a listing of non-directory files is made and the function then completes.

    If a directory file was returned a synchronous call to list the details of that directory is made and then another asynchronous search call made with an AST completion function again back to this function.

  4. When the directory files are exhausted the RMS structure is reinitialized to allow the search for all specification-matching, non-directory files in the directory. An asynchronous search call is made.

  5. For each matching file found the file search AST completion function is called. Status is checked for success or otherwise. If an error the status is reported to the client and the processing concluded. If the directory contained no matching files, or the files are exhausted, the processing is concluded and the function immediately completes.

    If a file was returned a call is made to the Descr.c module to check whether a file description can be obtained (HTML files only). If it can, that module is used to generate it and the function completes. If no description can be obtained a synchronous call is made to a function to list the file details. After the file details are listed another asynchronous search call is made, with the same function specified for AST completion. The function then immediately completes.


3.11 - FILE.C

Code for: File.c

This module implements the file transfer functionality. It obtains the file specification and MIME content type information from the request data structure. It handles VARIABLE or VFC files differently from STREAM, STREAM_LF, STREAM_CR, FIXED and UNDEFINED. With STREAM(_*), FIXED and UNDEFINED files the assumption is that HTTP carriage-control is within the file itself (i.e. at least the newline (LF), which is all that is required by most browsers), and does not require additional processing. With VARIABLE, etc., files the carriage-control is implied and therefore each record requires additional processing by the server to supply it. Record-oriented files will have multiple records buffered before writing them collectively to the network (improving efficiency). Stream and binary file reads are by Virtual Block, and are written to the network immediately, making the transfer of these very efficient indeed! The essential behaviour however is much the same.
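
To make the record/stream distinction concrete, here is a minimal, hypothetical sketch of buffering one record with explicit carriage-control added (stream and binary content would be passed through unmodified):

  /* Hypothetical sketch: append one record plus CRLF carriage-control to the
     network buffer.  Returns false if there is insufficient space, in which
     case the buffer would be flushed to the network and the record retried. */
  #include <string.h>

  int FileBufferRecord (char *Buffer, int SizeOfBuffer, int *LengthPtr,
                        const char *Record, int RecordLength)
  {
     if (*LengthPtr + RecordLength + 2 > SizeOfBuffer) return (0);
     memcpy (Buffer + *LengthPtr, Record, RecordLength);
     *LengthPtr += RecordLength;
     Buffer[(*LengthPtr)++] = '\r';
     Buffer[(*LengthPtr)++] = '\n';
     return (1);
  }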

If file caching is enabled, and this file is to be cached, a pointer provides a cache entry. Storage for the file contents is provided by the cache structure. Instead of loading the file into temporary storage before writing to the network it is loaded into cache storage and retained at the end of the request. In this way a cache load adds insignificant overhead to a generic file transfer.

If conversion to STREAM_LF files is enabled this module will, upon encountering a VARIABLE or VFC file, initiate its conversion to STREAM_LF record format. This is done using the StmLf.c module.

(Versions prior to 3.2 used a configuration directive for the MIME content-type to determine whether a file was transferred record-by-record or in binary. This is no longer required.)

  1. The primary function allocates a task structure. This function then gets some file information using ACP I/O. If the file does not exist it immediately returns the error status to the calling routine for further action (this behaviour is used to try each of multiple home pages by detecting file-not-found, for example). If it does exist the ACP information provides modification date/time, size and record-format. If the record format is VARIABLE and STREAM-LF conversion is enabled, conversion is initiated. If the request specified an "If-Modified-Since:" header line the modification date is checked and a possible "304 Not Modified" response generated.

  2. After successfully opening the file it generates an HTTP response header if required. It then calls one of two functions to queue the first read from the file, one for variable-record files (record-oriented transfer), another for stream (STREAM-LF and stream record formats) text and binary files (block-oriented transfer). After the read is queued it returns with a success status code.

  3. When the asynchronous file read completes one of two AST completion functions (one for record, the other for block) is called to post-process the I/O. Status is checked for success or otherwise. If an error, the status is reported to the client, the file closed, and the request thread concluded.

    If end-of-file, the file is closed, for record-oriented files the buffer checked and if necessary flushed. If an end task function was specified control is now passed to that, otherwise the thread is concluded.

    If not end-of-file, for record files multiple records may be buffered before writing to the network. If the buffer is full (the read was unsuccessful due to insufficient space) the contents are asynchronously written to the network, with the network write completion routine specifying a function to re-read the file record that just failed. If there is still space in the buffer another asynchronous read of the file is queued in an attempt to append the next record into the buffer. After the read is queued the function completes.

    If not end-of-file, for stream and binary files a successful read results in a call to the network write function to send this to the client. This call contains the address of the function to read the next blocks from the file as an AST completion routine. After the asynchronous network write is queued the function completes.

For text files the contents can be encapsulated as plain text. This involves prefixing the file contents with a <PRE> HTML tag and appending a closing </PRE> tag. The buffer is filled as per normal, but when ready to output a function is called that first escapes all HTML-forbidden characters (e.g. "<", ">", "&", etc.) This is used by the SSI.C module.
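
A minimal sketch of that escaping (the function name and buffer handling are illustrative; the real buffered output functions differ):

  /* Hypothetical sketch: copy text into an output buffer, substituting the
     HTML-forbidden characters "<", ">" and "&" with their entities. */
  #include <string.h>

  /* returns the number of characters placed in the output buffer */
  int EscapeHtml (const char *in, char *out, int SizeOfOut)
  {
     char  *optr = out, *ozptr = out + SizeOfOut - 1;
     for (; *in && optr < ozptr; in++)
     {
        const char  *esc;
        switch (*in)
        {
           case '<' : esc = "&lt;"; break;
           case '>' : esc = "&gt;"; break;
           case '&' : esc = "&amp;"; break;
           default : *optr++ = *in; continue;
        }
        if ((int)strlen(esc) > (int)(ozptr - optr)) break;
        strcpy (optr, esc);
        optr += strlen (esc);
     }
     *optr = '\0';
     return ((int)(optr - out));
  }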


3.12 - HTADMIN.C

Code for: HTAdmin.c

(INCOMPLETE AS YET!)

The HTAdmin.c module allows on-line administration of the HTTPd-specific authentication databases.


3.13 - HTTPD.C

Code for: HTTPd.c

This is the main() module of the server. It performs server startup and shutdown, along with other miscellaneous functionality.


3.14 - ISMAP.C

Code for: IsMap.c

The clickable-image support module provides this functionality as an integrated part of the server. It supports image configuration file directives in either NCSA or CERN format. Extensive configuration specification error reporting has been implemented.

Acknowledgement:

Three coordinate mapping functions have been plagiarized from the NCSA imagemap.c script program. These have been inserted unaltered in the module and an infrastructure built around the essential processing they provide. Due acknowledgement to the original authors and maintainers of that application. Any copyright over portions of that code is also acknowledged:

  ** mapper 1.2
  ** 7/26/93 Kevin Hughes, kevinh@pulua.hcc.hawaii.edu
  ** "macmartinized" polygon code copyright 1992 by Eric Haines, erich@eye.com

Essential behaviour:

  1. The primary function allocates a task structure and then attempts to open the map configuration file. If unsuccessful it generates an error report and concludes processing.

  2. After successfully opening the configuration file it extracts the client-supplied coordinate from the query string. A call is then made to asynchronously read a record (line) from the configuration file. Configuration file processing is asynchronous from that point.

  3. The record (line) read AST function checks for end-of-file, at which point it will return the default URL (if supplied). After end-of-file the file is closed and the processing is concluded.

    If not end-of-file, a function is called to parse the record for an image mapping directive. When the components have been parsed the NCSA imagemap.c routines are used to determine if the click coordinates are within the specified region coordinates.

    If it is within the region the click has been mapped: the URL is placed in heap memory and the thread's redirection location pointer set to it. The file is closed and the processing conclusion function called. This function detects the redirection location and, if it is a local URL, instead of disposing of the thread generates a new, internal request from the redirection information. For a non-local URL the client is sent a redirection response and the thread is then concluded.

    If not within the region a call is made to asynchronously read the next record from the configuration file.


3.15 - LOGGING.C

Code for: Logging.c

The logging module provides an access log (server logs, including error messages, are generated by the detached HTTPd process).

The access log format can be that of the Web-standard, "common"-format, "common+server"-format or "combined"-format, along with user-definable formats, allowing processing by most log-analysis tools.

The "common"-format entries (record, line) comprise:

  client_host r_ident auth_user [time] "request" response_status bytes_sent
where:

The "common+server"-format entry appends the server name to the common-format entry (for multi-homed services).

The "combined"-format entry appends quote-delimited, the referer and then the user-agent to the common-format entry.

In addition to legitimate request entries the server adds bogus entries to the "common"-format log for time-stamping server startup, shutdown, and the log being explicitly opened or closed. These entries are correctly formatted so as to be processed by a log analysis tool, and are recognisable as being "POST" method and coming from user "HTTPd". The request path contains the event and a hexadecimal VMS status code, that represents a valid exit status only in "END" entries.

Clickable-image requests are logged as "302" entries, and the resulting, redirected request entry logged as well.

When a log entry is required the file is opened if closed. The file is again closed one minute after the initial request. This flushes the contents of the write-behind buffers.


3.16 - MENU.C

Code for: Menu.c

This module implements the WASD menu interpretation functionality. It obtains the file specification from the request data structure. Output from this module is buffered to reduce network writes, improving I/O efficiency.

Essential behaviour:

  1. The primary function allocates a task structure and then attempts to open the file. If unsuccessful it immediately returns the error status to the calling routine for further action (this behaviour is used to try multiple home pages, for example). No checking of modification date/times is done as menu documents are considered dynamic in a similar way to SSI documents.

  2. After successfully opening the file it generates an HTTP response header if required. A call is then made to asynchronously read a record from the file opened. After the asynchronous file read is queued the function returns with a success status code.

    When the asynchronous file read completes the AST completion function is called to interpret the line, dependent on the section number it occurs in. Status is checked for success or otherwise. If an error, the status is reported to the client, the file closed, and the request concluded. If end-of-file, the file is closed and the processing concluded. For a successful record read the line can be either title, description or menu item. When the line is interpreted and written to the network another read is queued, with an AST completion routine again specifying the contents interpretation function. The function then completes.


3.17 - MSG.C

Code for: Msg.c

The message database for the server is maintained by this module. Some structures are fixed in size at compilation, but the actual messages themselves are stored using allocated memory so each and all may be of greatly variable size.

There are three main functions in the module.

  1. MsgLoad() loads a message database. It is called at server startup and whenever a report is to be generated from the message file.
  2. MsgFor() is called each time the server needs to provide a message that originates from the database. The request pointer (if available) and the message number (from the defined-by-macro number in msg.h) are supplied. If there is no request pointer the preferred language is used (lowest number). If there is a request pointer it is checked for a preferred language before getting that message pointer from whichever language array is to be used.
  3. MsgReport() is called via the server administration menu to provide an HTML-formatted listing of messages in the server's volatile database or from the on-disk file. For the on-disk file it calls MsgInit() with a local message structure, completely loading a new instance of all messages from the file, displaying the report and then disposing of them.


3.18 - NET.C

Code for: Net.c

This module handles all TCP/IP network activities, from creating the server socket and listening on the port, to reading and writing network I/O. It manages request initiation and rundown, and controls connection persistence. The network read and write functions have provision for specifying I/O completion AST function addresses. If these are provided then the function is called upon completion of the network I/O. If not provided then the I/O completes without calling an AST routine.

As of v4.3 this module supports the MadGoat NETLIB network programming package. This excellent freeware tool provides a generic, asynchronous interface to a number of underlying TCP/IP packages. It behaves in much the same manner as the $QIO interface and so dove-tails perfectly into this server. It took less than eight hours to build NETLIB support into the original UCX version! To avoid complete dependence on, and the slight extra overhead of, NETLIB, both a UCX version and a NETLIB version are maintained via conditional compilation using C macros.

The server begins by creating a network socket and then binding that to the HTTP port. The server then enters an infinite loop, waiting for IP connections.

When a connection request is received the remote host is checked as an allowed connection. If allowed, a request data structure is created from dynamic memory, and an asynchronous read is queued from the network client. The pointer to this dynamic data structure becomes the request thread, and is passed from function to function, AST routine to AST routine. The AST completion routine of the network request read(s) specifies a request analysis function. The function then returns to the connection acceptance loop.

When the network read(s) complete an AST completion function in the Request() module is called to process the HTTP request.

This module also contains the code for the NetWriteBuffered() function described above, 2.6 - Output Buffering.

With the introduction of SSL (see 3.21 - SESOLA.C) in v5.0 the NetWrite() and NetRead() functions no longer have the role of the lowest-level network interface functions, but instead deliver data via a raw network interface or via an SSL-encrypted one, depending on a particular request's requirement. They no longer read or write directly to the network; this functionality has been devolved to NetReadRaw() and NetWriteRaw(). The SSL routines also use NetReadRaw() and NetWriteRaw() when receiving or transmitting the encrypted data stream from the client.
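
A hypothetical sketch of that layering (the structure, flag and stand-in function bodies are illustrative only; the function names are those used by the server):

  /* Hypothetical sketch: NetWrite() delivers plain-text data either via the
     SSL layer or via the raw network write, according to the request. */
  typedef struct RequestStruct {
     int  UsingSSL;              /* illustrative; the real structure differs */
  } REQUEST_STRUCT;

  typedef void (*AST_FUNCTION)(REQUEST_STRUCT*);

  /* stand-ins for the real raw-network and SSL write functions */
  static int NetWriteRaw (REQUEST_STRUCT *rqptr, const char *DataPtr,
                          int DataLength, AST_FUNCTION AstFunction)
  { return (1); }

  static int SeSoLaWrite (REQUEST_STRUCT *rqptr, const char *DataPtr,
                          int DataLength, AST_FUNCTION AstFunction)
  { return (1); }

  int NetWrite (REQUEST_STRUCT *rqptr, const char *DataPtr, int DataLength,
                AST_FUNCTION AstFunction)
  {
     if (rqptr->UsingSSL)
        return (SeSoLaWrite (rqptr, DataPtr, DataLength, AstFunction));
     return (NetWriteRaw (rqptr, DataPtr, DataLength, AstFunction));
  }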


3.19 - PUT.C

Code for: Put.c

(INCOMPLETE AS YET!)

The PUT module allows files to be uploaded to, and stored by, the server. It also allows the deletion of files, and the creation and deletion of directories. This same module handles PUT, POST and DELETE methods appropriately. It requires authorization to be enabled on the server. Created files have a three-version limit applied.

The Request.c module controls the size of any POSTed or PUTed request via a configurable parameter limiting the size in kilobytes. The request is completely read by that same module before being parsed and handed over to the Put.c module (or DCL.c if a script). Hence the Put.c module has a complete, in-memory request body that it can process.

POSTed or PUTed requests are processed differently according to the MIME content-type:

The parent directory of any file/directory operation is always checked for permission to modify its contents. This permission is usually granted to the HTTPd account via an ACL. Files are written using Virtual Block I/O, making these operations very efficient. They are handled asynchronously, not disturbing the multi-threading of the server.


3.20 - REQUEST.C

Code for: Request.c

This module reads the request header from the client, parses this, and then calls the appropriate task function to execute the request (i.e. send a file, SSI an HTML file, generate a directory listing, execute a script, etc.)

The request header is contained in the network read buffer. If it cannot be completely read in the first chunk, the read buffer is dynamically expanded so as to be read in multiple chunks. The request header is addressed by a specific pointer that allows the parse-and-execute function to process either a genuine, initial client request header, or a pseudo-header generated to effect a redirection.
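
A simplified, synchronous sketch of that expansion follows (the real code is asynchronous and AST-driven; NetReadChunk() here is just a stand-in reading from <stdin>):

  /* Hypothetical sketch: read the request header in chunks, doubling the
     buffer until the blank line terminating the header has been seen. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static int NetReadChunk (char *Buffer, int Size)
  {
     /* stand-in for the real asynchronous network read */
     return ((int)fread (Buffer, 1, Size, stdin));
  }

  char* RequestReadHeader (void)
  {
     int  size = 1024, length = 0, count;
     char  *buffer = malloc (size);
     if (buffer == NULL) return (NULL);
     for (;;)
     {
        count = NetReadChunk (buffer + length, size - length - 1);
        if (count <= 0) { free (buffer); return (NULL); }
        length += count;
        buffer[length] = '\0';
        if (strstr (buffer, "\r\n\r\n") != NULL) return (buffer);
        if (length >= size - 1)
        {
           char  *expanded = realloc (buffer, size *= 2);
           if (expanded == NULL) { free (buffer); return (NULL); }
           buffer = expanded;
        }
     }
  }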

The method, path information and query string are parsed from the first line of the header. Other, specific request header fields are also parsed out and stored for later reference. Once this has been done the header is not further used.

Once the relevant information is obtained from the request header, processing commences on implementing the request. This comprises the rule-mapping of any path information, the RMS parsing of any resulting VMS file specification and decision-making on how to execute the request.

This functionality is used to parse and execute both the initial client request and any pseudo-request generated internally to effect a redirection.

If a POST/PUT method is used the entire request body (using the "Content-Length:" header line to determine length) is also read into dynamic memory before passing to the PUT.C or DCL.C modules for processing.


3.21 - SESOLA.C

Code for: SeSoLa.c

This module provides the optional Secure Sockets Layer (SSL) encrypted communication link functionality for WASD. This section will not discuss how SSL works, or even the SSLeay package (see below), it is merely a thumb-nail sketch of some quite complex functionality, much of it hidden by the SSLeay package.

WASD implements SSL using a freely available software library known as "SSLeay" (pronounced "S-S-L-E-A-Y", i.e. all letters spelt), version 0.8.1, authored and copyright by Eric Young and Tim Hudson. It is not a GNU-licensed package, but does make unrestricted commercial and non-commercial use available. The FAQ for SSLeay may be found at

http://www.psy.uq.oz.au/~ftp/Crypto/

This should be consulted for all information on the SSL technology employed by WASD.

If the SSLeay component of WASD is installed it can be found at

HT_ROOT:[SRC.SSLEAY-0_8_1]

It has been necessary to make minor modifications to the v0.8.1 distribution to support VMS (there was rudimentary support that looked like a hang-over from a previous distribution). These changes are very minor and designed to address the differences in VMS and DECC versions. All changes to source can be found by searching for "MGD" in [...]*.C, [...]*.H and [...]*.COM files.

These changes have been made only to support WASD's use of the package, they are not proposed as general SSLeay modifications, i.e. they were purely pragmatic!

SeSoLa

This module is named "SeSoLa" to avoid any confusion and/or conflict with SSLeay routines. SSLeay and WASD supports SSL v2 and v3 protocols. The module has two distinct roles controlled by the SESOLA define. If this is defined the SeSoLa module is compiled into an interface with SSLeay. If it is not defined it just provides the function stubs required by, but not used by, the other modules in the server. In this way only the SeSoLa module needs to be recompiled and the server relinked to provide the SSL functionality, all other modules stay the same.

Non-Blocking I/O

SSL I/O is implemented as an SSLeay BIO_METHOD, named "sesola_method". It provides NON-BLOCKING SSL input/output. All routines that are part of this functionality are named "sesola_..." and are grouped towards the end of the module.

SSLeay supports non-blocking I/O by requiring the BIO (Basic Input/Output) routine to indicate (using a -1 return) when the I/O is not available but will be later. It then expects the same routine to be called with the same parameters when it is, completing that part of the processing. WASD utilizes this with the sesola_read() and sesola_write() functions, and their respective AST functions. A state variable tracks where in a session a particular read or write is occurring and the AST re-calls the appropriate function to complete the processing.

The main functions directly used when processing a request are SeSoLaAccept(), which establishes the SSL session with the client, SeSoLaRead(), which is the equivalent of NetRead() and is in fact called from it, and SeSoLaWrite(), the equivalent of NetWrite() and again called from it. SeSoLaRead() accepts encrypted data from the network, decrypts it and returns plain data to the AST routine. SeSoLaWrite() accepts plain data from the calling routine, encrypts it and writes that encrypted data to the network.


3.22 - SSI.C

Code for: SSI.c

The Server Side Includes (HTML pre-processor) module provides this functionality as an integrated part of the server. Output from this module is buffered to reduce network writes, improving I/O efficiency.

Essential behaviour:

  1. The primary function attempts to open the file. If unsuccessful it immediately returns the error status to the calling routine for further action (this behaviour is used to try multiple home pages, for example). SSI documents can be dynamic in undetectable ways so no checking of file modification date/time is done.

  2. After successfully opening the file it generates an HTTP response header if required. A call is then made to asynchronously read a record from the file opened. The record read AST function scans the record (line) looking for pre-processor directives embedded in HTML comments (a minimal sketch of this detection is shown after this list). If no directive is found the record is output buffered and another queued to be read.

  3. If a directive is detected any part of the line up to the directive is output buffered and a function called to parse the directive. This function reports an error if the directive specified is not supported (unknown, etc.) If it is a supported directive a specific function is called according to the directive specified. These functions provide the pre-processor information in one of four ways:

    1. Internally

      Information such as the system time, current document information, etc., can be provided from information contained in the request data, etc., or in the case of specified document/file information obtained via the file system. These directives have the relevant information buffered and then the function returns to the directive parsing function.

    2. Via DCL Execution

      Information that must be obtained through DCL execution is obtained using an asynchronous call to the Dcl() module. The next-task function is specified as the line parsing function. When the DCL module has finished executing the required command control is passed back to this function.

    3. Sending a File

      If a file is #included this is provided with an asynchronous call to the File() module. The next-task function is specified as the line parsing function. When the File() module has finished transferring the included file control is passed back to this function.

    4. Directory Listing

      If a directory listing is requested this is provided via an asynchronous call to the Dir() module. The next-task function is specified as the line parsing function. When the Dir() module has finished generating the listing control is passed back to this function.

  4. Directives continue to be parsed, and executed, asynchronously if necessary (as just described), from within a line until the end-of-line is reached. Any remaining characters are output buffered. Lines continue to be read from the file using the AST mechanism until end-of-file.
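
The directive detection mentioned in step 2 might, in skeletal form, look like this (assuming the conventional <!--#directive ...--> comment syntax; the names are illustrative):

  /* Hypothetical sketch: locate an SSI directive embedded in an HTML comment
     within one record (line).  Text before the directive is buffered as-is;
     the delimited directive is handed to the parsing function. */
  #include <string.h>

  /* returns true if a directive was found within the record */
  int SsiFindDirective (const char *Record, int *BeforeLengthPtr,
                        const char **DirectivePtr, int *DirectiveLengthPtr)
  {
     const char  *sptr, *eptr;
     if ((sptr = strstr (Record, "<!--#")) == NULL) return (0);
     if ((eptr = strstr (sptr, "-->")) == NULL) return (0);   /* malformed */
     *BeforeLengthPtr = (int)(sptr - Record);    /* text to output-buffer */
     *DirectivePtr = sptr;
     *DirectiveLengthPtr = (int)((eptr + 3) - sptr);
     return (1);
  }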


3.23 - STMLF.C

Code for: stmLF.c

The stmLF.c module converts VARIABLE format records to STREAM-LF. It is only called from the File.c module and only then when STREAM-LF conversion is enabled within the server configuration.

The conversion is done asynchronously (naturally), concurrently with any reads being performed by the File.c module in transferring the file to the client. After successful conversion the original file is purged.


3.24 - SUPPORT.C

Code for: Support.c

The support module provides a number of miscellaneous support functions for the HTTPd (well go-o-o-lee!).


3.25 - UPD.C

Code for: Upd.c

(INCOMPLETE AS YET!)

The Upd.c module implements the on-line web directory administration, file editing and upload facility. It requires authorization to be enabled on the server. File and directory modification are still performed by the Put.c module. Upd.c is an overly large body of code generating all of the dialogues and editing pages. It also provides additional functionality for server administration, adding admin-specific portions of dialogues as required.


3.26 - VM.C

Code for: Vm.c

The virtual memory management module provides dynamic memory allocation and deallocation functions. These functions use the VMS system library virtual memory routines. Also see general comments in 2.5 - Memory Management.

Separate virtual memory zones are created and used for specific dynamic memory requirements within the server. These are:

If a request for dynamic memory, made for any of the purposes listed above, fails, the server exits reporting the problem (e.g. insufficient dynamic memory). Memory availability is considered so crucial that any problem with it affects the server severely enough for it to be reported immediately. This also makes the code requesting the memory simpler; it is not necessary to check the success of the call, as it is part of the design that only successful requests ever return!
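
A minimal sketch of that policy (calloc() standing in for the LIB$ virtual memory zone routines, and the message text purely illustrative):

  /* Hypothetical sketch: either return usable memory or report the problem
     and exit the server; callers never need to check for failure. */
  #include <stdio.h>
  #include <stdlib.h>

  void* VmGet (size_t Size)
  {
     void  *ptr = calloc (1, Size);
     if (ptr != NULL) return (ptr);
     fprintf (stderr, "%%HTTPD-F-VM, insufficient dynamic memory\n");
     exit (EXIT_FAILURE);          /* crucial resource; report and exit */
  }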

