WASD Hypertext Services - Technical Overview

[next] [previous] [contents] [full-page]

21 - Server Performance

The server has a single-process, multi-threaded, asynchronous I/O design. On a single-processor system this is the most efficient approach. On a multi-processor system it is limited by the single process context (with scripts executing within their own context).
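Purely as an illustration (this is not WASD source code), the single-process asynchronous model amounts to multiplexing many descriptors from one context rather than dedicating a process per client. In C that idea can be sketched with select():

```c
#include <sys/select.h>
#include <unistd.h>

/* Sketch of asynchronous multiplexing: service whichever of a set of
 * descriptors is readable, without blocking on any one of them -- the
 * essence of handling many clients from a single process context.
 * Returns the total number of bytes drained from ready descriptors. */
static int service_ready(const int fds[], int nfds)
{
    fd_set readable;
    FD_ZERO(&readable);
    int maxfd = -1;
    for (int i = 0; i < nfds; i++) {
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    struct timeval poll = { 0, 0 };     /* poll; never block the loop */
    if (select(maxfd + 1, &readable, NULL, NULL, &poll) <= 0)
        return 0;
    int total = 0;
    char buf[64];
    for (int i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &readable))
            total += (int)read(fds[i], buf, sizeof buf);
    return total;
}
```

A real server loop would also watch for writability and new connections; the point here is simply that one process services every ready client per pass.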

The server has been tested with up to 30 concurrent requests originating from 6 different systems and continues to provide an even distribution of data flow to each client (albeit more slowly :^)

The test system was a lightly-loaded AlphaServer 2100 4/275, VMS v7.1 and DEC TCP/IP 4.2. No Keep-Alive: functionality was employed so each request required a complete TCP/IP connection and disposal, although the WWWRKOUT utility (see 23.6 - Server Workout (stress-test)) was used on the same system as the HTTP server, eliminating actual network transport. DNS (name resolution) and access logging were disabled.

As of v6.0 the performance data is collected and collated automatically using the WWWRKOUT metric functionality, which reads and interprets directive input via the /METRIC= qualifier, running batches of requests then averaging the results. The raw performance data is available in HT_ROOT:[EXERCISE], and the summaries are compiled below.
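The batch-then-average calculation itself is trivial arithmetic; the following is a hedged sketch of it (WWWRKOUT's own bookkeeping is more involved), where each batch contributes its own requests-per-second figure and the figures are averaged:

```c
/* Average requests/second over a series of batches, each batch
 * contributing its own request count and elapsed time. An
 * illustration of the calculation only, not WWWRKOUT's actual code. */
static double average_rps(const int requests[], const double seconds[],
                          int batches)
{
    double sum = 0.0;
    for (int i = 0; i < batches; i++)
        sum += requests[i] / seconds[i];
    return sum / batches;
}
```

For example, two batches of 200 requests taking 2.0 and 2.5 seconds (100 and 80 requests/second) average to 90 requests/second.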

Test results are all obtained using the native Digital TCP/IP Services executable. The NETLIB image may provide very slightly lower results due to the additional NETLIB layer. These results are indicative only! On a clustered, multi-user system too many things vary slightly all the time. Hence the batching of accesses, interleaved between servers, attempting to provide a representative result.


OSU Comparison

Until v5.3 a direct comparison of performance between OSU and WASD had not been made (even to satisfy the author's own occasional curiosity). After a number of users with experience in both environments commented ... WASD seemed faster, was it? ... it was decided to make and provide comparisons using the same metrics that had been used on WASD for some time.

Every endeavour has been made to ensure the comparison is as equitable as possible (e.g. each server executes at the same process priority, has a suitable cache enabled, and runs on the same machine in the same relatively quiescent environment, with 2 CPUs to allow OSU to take advantage of VMS V7.1's kernel threads - checked to be enabled via the THREADCP utility and "$ SHOW SYSTEM"). Each test run was interleaved between the two servers to try and distribute any environment variations. Tests showing /PORT=7777 were to the OSU server, any others to WASD. Both servers were configured "out-of-the-box" with minimal changes (generally just path mappings), WASD executing via the FREEWARE_DEMO.COM procedure.

Of course performance is just one of a number of considerations in any software environment (otherwise we wouldn't be using VMS now would we? ;^) No specific conclusions are promoted by the author. Readers may draw their own from the results recorded below.

For this document the results were derived using the WASD v6.0 and OSU 3.4 servers.


21.1 - Simple File Request Turn-Around

A series of tests, each a batch of 200 accesses, was made and the results averaged. The first test returned an empty file, measuring response and file access time without any actual transfer. The second requested a file of 64K characters, testing performance with a more realistic load. Both were done using one and ten concurrent requests.

Cache Disabled - Requests/Second

Response  Concurrent  WASD  OSU
0K        1            97    49
0K        10          116    58
64K       1            32    12
64K       10           42    16

Cache Enabled - Requests/Second

Response  Concurrent  WASD  OSU
0K        1           232   117
0K        10          308   156
64K       1            50    21
64K       10           52    24

Configuration and result files:

Significantly, with both WASD cached and non-cached (as with OSU), throughput actually improves at ten concurrent requests (probably because the latency of serial TCP/IP connection/disconnection, performed one-by-one, is hidden when several happen concurrently).

Note that the response and transfer benefits decline noticeably with file size (transfer time). The difference between cached and non-cached with the zero file size (no actual data transfer involved) gives some indication of the raw difference in response latency, some 200-300% improvement. This is a fairly crude analysis, but does give some indication of cache efficiencies.
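The arithmetic behind that crude 200-300% figure is just the cached rate expressed as a percentage of the non-cached rate; using the zero-size, single-concurrency WASD figures as read from the tables above (97 and 232 requests/second):

```c
/* Express an improved rate as a percentage of a base rate -- the
 * crude comparison used for the "200-300%" latency figure above. */
static double percent_of_base(double base, double improved)
{
    return 100.0 * improved / base;
}
```

percent_of_base(97.0, 232.0) gives roughly 239%, and the ten-concurrent and OSU zero-size rows fall in the same 200-300% band.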

One other indicative metric of the two servers is the CPU time consumed during the file measurement runs.

CPU Time Consumed (Seconds)

Cache     WASD  OSU
Disabled    96  489
Enabled     45  248


File Transfer Rate

Under similar conditions results indicate a potential transfer rate well in excess of 1 Mbyte per second. (Remember, both client and server are on the same system, so the data, although being transported by TCP/IP networking, is not actually ending up out on a physical network.) This serves to demonstrate that server architecture should not be the limiting factor in file throughput.

Transfer Rate - MBytes/Second

Response             Concurrent  WASD  OSU
2.4MB (4700 blocks)  1            3.6  1.5
2.4MB (4700 blocks)  10           4.0  1.7

Configuration and result files are the same as for file request turn-around metrics above.

Significantly, there were no dramatic drops in transfer rate between one and ten concurrent requests! In fact there was a small increase in throughput!


File Record Format

The server can handle STREAM, STREAM_LF, STREAM_CR, FIXED and UNDEFINED record formats very much more efficiently than VARIABLE or VFC files.

With STREAM, FIXED and UNDEFINED files the assumption is that HTTP carriage-control is within the file itself (i.e. at least the newline (LF), which is all that is required by browsers), and so no additional processing is needed. With VARIABLE record files the carriage-control is implied, and each record therefore requires additional processing by the server to supply it. Even though the HTTPd buffers multiple variable-format records before writing them collectively to the network (improving efficiency), stream and binary files are read by Virtual Block and written to the network immediately, making the transfer of these very efficient indeed!

So significant is this efficiency improvement that a module exists to automatically convert VARIABLE record files to STREAM-LF when detected by the file transfer module. This is disabled by default, but the user is strongly encouraged to enable it, and to ensure that stream format files are provided to the server by other hypertext-generating and processing utilities.
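The extra per-record work can be seen in a small sketch (an illustration of the principle, not WASD's actual conversion code). VMS VARIABLE records are stored as a 16-bit byte count followed by the data, word-aligned, with no carriage-control of their own; producing a STREAM-LF byte stream means copying each record and appending the implied newline:

```c
#include <stddef.h>
#include <string.h>

/* Convert length-prefixed VARIABLE records to a STREAM-LF byte
 * stream: copy each record and supply the implied carriage-control.
 * "in" holds the records as a VARIABLE file does on disk; returns
 * the number of bytes written to "out". */
static size_t variable_to_streamlf(const unsigned char *in, size_t inlen,
                                   char *out, size_t outmax)
{
    size_t i = 0, o = 0;
    while (i + 2 <= inlen) {
        /* each record is prefixed by a 16-bit (little-endian) count */
        size_t reclen = in[i] | (in[i + 1] << 8);
        i += 2;
        if (i + reclen > inlen || o + reclen + 1 > outmax)
            break;
        memcpy(out + o, in + i, reclen);
        o += reclen;
        out[o++] = '\n';            /* the implied carriage-control */
        i += reclen;
        if (reclen & 1)             /* records are word-aligned */
            i++;
    }
    return o;
}
```

A STREAM-LF file needs none of this per-record handling: its blocks can go to the network as read, which is the efficiency difference described above.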


21.2 - Scripting

Persistent subprocesses are probably the most efficient solution for child-process scripting under VMS. See 14.2 - Scripting Environment. The script I/O still needs to be relayed to the client by the server.

A simple performance evaluation shows the relative merits of the four WASD scripting environments available, plus a comparison with OSU. These were obtained using the WWWRKOUT utility (see 23.6 - Server Workout (stress-test)) accessing the same CGI test utility script, HT_ROOT:[SRC.CGIPLUS]CGIPLUSTEST.C, which executes in both standard CGI and CGIplus environments, and an ISAPI example DLL, HT_ROOT:[SRC.CGIPLUS]ISAPIEXAMPLE.C, which provides equivalent output. A series of 200 accesses were made and the results averaged. The first test returned only the HTTP header, evaluating raw request turn-around time. The second test requested a body of 64K characters, again testing performance with a more realistic load.
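For reference, the essential output of such a test script is just an HTTP header, a blank line, and the body. This minimal C sketch shows the shape of the response (an illustration only, not the CGIPLUSTEST source; formatting into a buffer simply keeps the example self-contained, where a real CGI script writes to SYS$OUTPUT/stdout):

```c
#include <stdio.h>
#include <string.h>

/* Build a minimal CGI-style response: header, blank line, body.
 * Returns the number of characters written. */
static int cgi_response(char *buf, size_t max, const char *body)
{
    return snprintf(buf, max,
                    "Content-Type: text/plain\r\n"
                    "\r\n"
                    "%s", body);
}
```

The measurements below therefore compare the environments' request turn-around and data relay, with almost no script processing time of their own.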

DECnet-based scripting was tested using essentially the same environment as subprocess-based CGI, assessing the performance of the same script being executed using DECnet to manage the processes. Three separate environments have been evaluated, WASD-DECnet-CGI, WASD-OSU-emulation and OSU. The OSU script used the WASD CGISYM.C utility to generate the required CGI symbols (also see WASD/OSU Comparison). DECnet Phase-IV was in use.

CGI Scripting - Requests/Second

Response  Concurrent  CGI  CGIplus  ISAPI  DECnet-CGI  OSU-emul  OSU
0KB       1            17       73     64          11         9    8
0KB       10           28       88     85          16        13   11
64KB      1            12       25     24           9         7    5
64KB      10           16       22     25          12        10    6

Configuration and result files:


Scripting Observations

Although these results are indicative only, they do show CGIplus and ISAPI to have a potential for improvement over standard CGI of up to 400%, a not inconsiderable improvement. Of course this test generates the output stream very simply and efficiently, and so excludes any actual processing time that may be required by a "real" application. If the script/application has a large activation time the reduction in response latency could be even more significant (e.g. Perl scripts and RDBMS access languages).


DECnet Observations

This section comments on non-persistent scripts (i.e. those that must run-up and run-down with each request - general CGI behaviour). Although not shown here, measurements of connection reuse show significant benefits in reduced response times, consistency of response times and overall throughput, with a difference of some 200% over non-reuse (similar improvements were reported with the OSU 3.3a server).

With ten simultaneous and back-to-back scripts and no connection reuse, many more network processes are generated than just ten. This is due to NETSERVER maintenance tasks such as log creation and purging, task activation and deactivation, etc., adding latency to this scripting environment. Throughput was generally still lower than with subprocess-based scripting.

While earlier versions cautioned on the use of DECnet-based scripting, this caution has been relaxed somewhat by connection reuse.


WASD/OSU Comparison

A direct comparison of CGI performance between WASD and OSU scripting is biased in favour of WASD, as OSU scripting is based on its own protocol with CGI behaviour layered-in above scripts that require it. Therefore a non-CGI comparison was devised. The script, HT_ROOT:[SCRIPT]FACE2FACE.COM, is designed to favour neither environment, merely returning the plain-text string "Hello!" as quickly as possible.

  $! OSU and WASD scripting face-to-face in a script that favours neither unduly
  $ if f$type(WWWEXEC_RUNDOWN_STRING) .nes. ""
  $ then
  $    write net_link "<DNETTEXT>"
  $    write net_link "200 Success"
  $    write net_link "Hello!"
  $    write net_link "</DNETTEXT>"
  $ else
  $    write sys$output "Hello!"
  $ endif

Face-to-Face - Requests/Second

Response  Concurrent  CGI  CGIplus  ISAPI  DECnet-CGI  OSU-emul  OSU
"Hello!"  1            38      n/a    n/a          16        30   22
"Hello!"  10           58      n/a    n/a          24        42   26

Configuration and result files are the same as for scripting metrics above.


21.3 - SSL

At this time there are no definitive measurements of SSL performance (see 10 - Secure Sockets Layer), as work on an SSL version of the WWWRKOUT utility has not yet been undertaken. One might expect that, because of the CPU-intensive cryptography employed, SSL request performance, particularly where concurrent requests are in progress, would be significantly lower. In practice SSL seems to provide more-than-acceptable responsiveness.


21.4 - Suggestions

Here are some suggestions for improving the performance of the server, listed in approximate order of significance. Note that these will have proportionally less impact on an otherwise heavily loaded system.

  1. Disable host name resolution (configuration parameter [DNSLookup]). DNS latency can slow request processing significantly! Most log analysis tools can convert numeric addresses so DNS resolution is often an unnecessary burden.

  2. Ensure served files are not VARIABLE record format (see above). Enable STREAM-LF conversion using a value such as 250 (configuration parameter [StreamLF], and SET against required paths using mapping rules).

  3. Use persistent-subprocess DCL/scripting (configuration parameter [ZombieLifeTime]).

  4. Use CGIplus-capable or ISAPI scripts whenever possible.

  5. Enable caching (configuration parameter [Cache]).

  6. Disable logging (configuration parameter [Logging]).

  7. Set the HTTP server process priority higher, say to 6 (use startup qualifier /PRIORITY=).

  8. Reduce to as few as possible the number of mapping rules.

  9. Use a pre-defined log format (e.g. "common", configuration parameter [LogFormat]).

  10. Disable request history (configuration parameter [RequestHistory]).

  11. Disable activity statistics (configuration parameter [ActivityDays]).
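Several of these suggestions map directly onto global configuration directives. The following fragment is illustrative only: the directive names come from the list above, but the values shown (other than the [StreamLF] 250 suggested there) are assumptions to be checked against the configuration chapters.

```
# performance-related settings, illustrative values only
[DNSLookup] disabled
[Cache] enabled
[StreamLF] 250
[ZombieLifeTime] 00:05:00
[Logging] disabled
[RequestHistory] 0
[ActivityDays] 0
```

If logging must remain enabled, use a pre-defined [LogFormat] such as "common" as suggested above.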

