WASD HTTP Server - "Nuts and Bolts"


2 - General Design

Without apology (almost), the server is a large, monolithic, shall we be kind and say old-fashioned, piece of software.  Lots of things would be done differently if it were being started over.  Other things wouldn't be.  The code is always attempting to do things faster or more efficiently (especially because it's VMS) and so it's a bit clunky in places.  Other clunkiness is due entirely to the author ;^)


2.1 - Server Behaviour

The HTTPd executes permanently on the server host, listening for client connection requests on TCP/IP port 80 (by default).  It provides concurrent services for a (technically) unlimited number of clients, constrained only by server resources.  When a client connects the server performs the following tasks:

  1. creates a thread for this request (this term does not denote the use of DECthreads or another specific thread library, just a thread of execution, see 2.2 - Multi-Threaded)
  2. reads and analyzes the HTTP request sent, then proceeds depending on the nature of the request ...
  3. closes the connection to the client and disposes of the thread data structures

For I/O intensive activities like file transfer and directory listing, the AST-driven code provides an efficient, multi-threaded environment for the concurrent serving of multiple clients. 


2.2 - Multi-Threaded

The WASD HTTPd is written to exploit VMS operating system characteristics allowing the straightforward implementation of event-driven, multi-threaded code.  Asynchronous System Traps (ASTs), or software interrupts, at the conclusion of an I/O (or other) event allow functions to be activated to post-process the event.  The event traps are automatically queued on a FIFO basis, allowing a series of events to be processed sequentially.  When not responding to an event the process is quiescent, or otherwise occupied, effectively interleaving I/O and processing, and allowing sophisticated multi-threading of clients. 

Multi-threaded code is inherently more complex than single-threaded code, and there are issues involved in the synchronization of some activities in such an environment.  Fortunately VMS handles many of these issues internally.  After connection acceptance, all of the processing done within the server is at USER mode AST delivery level, and for all intents and purposes the processing done therein is atomic, implicitly handling its own synchronization issues. 

The HTTPd is written to make longer duration activities, such as the transfer of a file's contents, event-driven.  Other, shorter duration activities, such as accepting a client connection request, are handled synchronously. 

It is worth noting that with asynchronous, AST-driven output, the data being written must be guaranteed to exist unmodified for the duration of the write (completion being indicated by AST delivery).  This means data written must be static or in buffers that persist with the thread.  Function-local (automatic) storage cannot be used.  The server allocates dynamic storage for general (e.g. output buffering) or specific (e.g. response headers) uses. 


2.3 - ASTs

With server functions having AST capability, in particular $QIO, the server is designed to rely on the AST routine to report any error: both errors that occur during the I/O operation and any that occur when initiating it (which would normally prevent the AST being queued), even if that requires directly setting the I/O status block with the offending status and explicitly declaring the AST.  This eliminates any ambiguity about the conditions under which ASTs are delivered ... ASTs are always delivered. 

If a call to a server function with AST capability does not supply an AST routine then the caller must check the return status to determine whether it can continue processing.  If it supplies an AST routine address then it must not act on any error status returned; it must allow the AST routine to process according to the I/O status block status. 


2.4 - Tasks

Each request can have one or more tasks executed sequentially to fulfil the request.  This occurs most obviously with Server-Side Includes (SSI, the HTML pre-processor) but also, to a more limited extent, with directory listing and its read-me file inclusion.  A task is more-or-less defined as one of:

Each one of the associated modules executes relatively independently.  Before commencing a task, a next-task pointer can be set to the function required to execute at the conclusion of that task.  At that conclusion, the next-task functionality checks for a specified task to start or continue.  If one has been specified, control is passed to that next-task function via an AST. 

Some tasks can be called only once per request: for example, image mapping, file transfer using cache, file upload and menu interpretation. 

Other tasks may be called within other tasks, or multiple times serially during a request.  An example is the (non-cache) file transfer task, which can be used within directory listings to insert read-me files, and when using <!--#include to insert multiple files within an SSI document. 

Two tasks, directory listing and SSI interpretation, can be called multiple times and can also have concurrent instances running.  For example, an SSI file can <!--#include another SSI file, nesting the SSI execution.  The same SSI document can have an embedded directory listing that contains an SSI read-me file with another directory listing.  It can get quite convoluted!  The tasks are implemented using a linked-list FILO stack allowing this nesting.  SSI documents have a maximum nesting depth, preventing recursive document inclusion. 


2.5 - Memory Management

Memory management is done exclusively using the VMS system library virtual memory routines.  Using these rather than the generic C library routines is a deliberate design decision, made with the following considerations. 

Per-request memory is managed in three distinct portions. 

  1. A fixed-size structure of dynamic memory is used to contain the core request thread data.  It is allocated from a specific virtual memory zone tailored for fixed-size management, and is released at thread disposal. 

  2. A heap of dynamically allocated memory is maintained during the life of a thread structure. 

    When a dynamic structure is required during request processing it is allocated from a request-thread-specific zone of virtual memory.  This zone is released in one operation at thread disposal, making it quite efficient.  Maintaining a thread-specific heap of virtual memory also makes it easier to avoid memory leakage. 

  3. Per-task data structures are allocated using the above heap. 

    These structures are used to store task-specific data.  If a task is used multiple times within the one request (see above) the previously allocated and now finished-with (but not deallocated) task structures can be reused, reducing overhead. 


2.6 - Output Formatting

The increasing complexity of output formatting (particularly with the introduction of extended file specifications with ODS-5) prompted the development of a SYS$FAO()-like set of functions for writing formatted output.  These can write directly into the request's dynamic network buffers or into static character storage. 

The directives parallel those supported by SYS$FAO(), although the implementation is not complete and contains a number of variants of, and extensions to, that service's behaviour.  See 3.41 - SUPPORT.C.


2.7 - Output Buffering

To reduce the number of individual network writes, and thus provide significant improvements in efficiency, generated output can be buffered into larger packets before sending to the client.  Not all modules use this (e.g. File.c) and not all modules use it all of the time, but all modules work to implement a seamless integration of output via this mechanism (best seen in the SSI.c module). 

The output buffer functionality underwent a complete redesign for v5.0. It is now based on a list of one or more buffers that can be used in two modes. 

  1. When both an AST address and data to be buffered are supplied, the buffering function operates to fill one entire buffer, overflowing into a second linked into the list.  When that overflow occurs the first is written to the network asynchronously (calling the supplied AST when complete) and the second is moved to the head of the list, effectively to the front of the buffer, and so on. 
  2. When no AST address is supplied with the data to be buffered, it keeps on filling buffers and adding others to the tail of the list as required, creating a virtual buffer with no fixed length. 

The first mode is used for general buffering (e.g. SSI and directory listings), streaming data to the client in a sequence of larger aggregates.  The second mode is useful for functions that must block (e.g. those reporting on data structures such as the file cache) and write a lot of output for a report, but should not block general server activity for a long-ish period due to network throughput (e.g. again the caching reports).  In these cases the entire report can be written to buffer, then simply output asynchronously, unblocking any resource it may have held. 


2.8 - Auto-Scripting

The WASD VMS HTTP server has the facility to automatically invoke a script to process a non-HTML document (file).  This facility is based on detecting the MIME content data type (via the file's extension) and causing a transparent, local redirection, invoking the script as if it had been specified in the original request. 


2.9 - Internal Directives and "Scripts"

The HTTPd server detects certain paths and query strings as directives about its behaviour.  Certain paths are interpreted as pseudo, or internal, scripts handled within the server itself.  Other directives are passed in the query string component of the request as reserved sequences that cannot occur in normal requests (an unlikely combination of characters has been selected). 


2.10 - Server Security and Privileges

As a major security design criterion the WASD environment specifies the use of a non-privileged, non-SYSTEM, non-system-group server account.  In this way it begins from a fairly restricted and safe base, with resources limited to those world-accessible or explicitly allowed to the server account.  For access to selected, essential resources (such as privileged IP ports, for example 80) selected privileges are enabled only on an as-required basis, then disabled as soon as the need for that privilege has passed.  Hence the executable is installed with the minimum required extended privileges, which are enabled and used only as required during the course of processing.  The server program is almost always executing with only NETMBX and TMPMBX enabled ... in other words as a completely average VMS user! 

Extended privileges are required for the purposes listed below:

Not that the author doesn't have at least some confidence in his code ;^) but he has also placed a sanity check which, when the server becomes quiescent, establishes that only the NETMBX and TMPMBX privileges are enabled.  The server will exit with an error message if any extended privileges are enabled at the time of the check.  (During development in 1997 this check discovered an instance where an EnableSysPrv() call had inadvertently been coded instead of a DisableSysPrv() call :^( so it does work in real life :^)

The capacity for the server to write into the file system is a major concern, and a lot of care has been taken to make it as secure as possible.  Of course there is always the chance of a problem :^( The main defence against a system design or programming problem allowing write access to the file system is having the server account as a separate user and group (and definitely non-SYSTEM).  In this way a part of the file system must explicitly have write access granted to the server account for it to be able to write there (or have world write access ... but then what is the problem with server access if the world has access?)  This is recommended to be done using an ACE (see the Technical Overview). 

