WASD Hypertext Services - Technical Overview


13 - Server Performance

The server has a single-process, multi-threaded, asynchronous I/O design. On a single-processor system this is the most efficient approach. On a multi-processor system it is limited by the single process context (ignoring scripts which execute within their own context). An obvious improvement would be to have multi-processor threading or a pool of server processes, one per CPU, servicing requests. The latter may be the approach of future refinements.
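The single-process, asynchronous I/O model can be illustrated with a short sketch. This is not WASD code; it is a generic readiness loop in Python, with the `listener` socket and `serve_one_event()` helper invented purely for illustration. One process registers every socket with a multiplexor and services whichever become ready, so no per-connection process is needed.

```python
import selectors
import socket

# Sketch of the single-process, asynchronous-I/O idea: one readiness
# loop multiplexes the listening socket and every client connection.

sel = selectors.DefaultSelector()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))      # ephemeral port, demo only
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ, "accept")

def serve_one_event(timeout=1.0):
    """Handle one batch of ready sockets: accept new connections,
    or read a request and reply in place (here, a trivial echo)."""
    for key, _ in sel.select(timeout):
        if key.data == "accept":
            conn, _ = key.fileobj.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, "read")
        else:
            data = key.fileobj.recv(4096)
            if data:
                key.fileobj.sendall(b"echo:" + data)
            else:                     # client closed the connection
                sel.unregister(key.fileobj)
                key.fileobj.close()
```

A client connecting and sending data is serviced entirely within the one process context, which is efficient on a single CPU but, as noted above, cannot spread across additional CPUs.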

The server has been tested with up to 30 concurrent requests originating from 6 different systems and continues to provide an even distribution of data flow to each client (albeit more slowly :^).

Test results are all obtained using the native Digital TCP/IP Services executable. The NETLIB image may provide very slightly lower results due to the additional NETLIB layer. These results are indicative only!

Simple File Request Turn-Around

Two sets of data are now reported, one with caching disabled, the other enabled.

A series of tests using batches of 200 accesses was made and the results averaged. The first test returned an empty file, measuring response and file-access time without any actual data transfer. The second requested files of 16K and 64K characters, testing performance under more realistic scenarios. Both were done using one and then ten concurrent requests.

The test system was a lightly-loaded AlphaServer 2100, VMS v6.2 and DEC TCP/IP 4.1. No Keep-Alive: functionality was employed, so each request required a complete TCP/IP connection and disposal, although the WWWRKOUT utility (see 15.6 - Server Workout (stress-test)) was used on the same system as the HTTPd server, eliminating actual network transport. DNS (name resolution) was disabled. The command lines are shown below.

  $ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/ht_root/exercise/0k.txt"
  $ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/ht_root/exercise/0k.txt"
  $ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/ht_root/exercise/16k.txt"
  $ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/ht_root/exercise/16k.txt"
  $ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/ht_root/exercise/64k.txt"
  $ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/ht_root/exercise/64k.txt"

The following results were derived using the v4.5 server. Improvements in throughput over the 4.2/4.3 servers are due to optimizations in frequently executed code and the elimination of debug code in production executables. (An apparent improvement in throughput for multiple, concurrent connections over the 4.2/4.3 server is due to the redesign of the testing tool, WWWRKOUT, which improved its multi-threading!) Anyone comparing these results with those published in the v4.4 documentation will notice a considerable anomaly. I cannot explain or reproduce those figures now; notes taken at the time confirm the results reported. I will continue to investigate the phenomenon.

Cache Disabled

  file   concurrent   duration (seconds)   requests/second
  0K          1              2.30                 87
  0K         10              1.95                103
  16K         1              3.45                 58
  16K        10              2.95                 68
  64K         1              7.68                 26
  64K        10              5.98                 33

Significantly, throughput actually improves at ten concurrent requests (probably due to the latency of serial TCP/IP connection/disconnection when requests are handled one-by-one, compared with several happening concurrently).
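As a sanity check (my own arithmetic, not part of the original test harness), the requests/second column is simply the 200-request batch size divided by the measured duration:

```python
# Requests/second derived from the 200-access batches reported above.
REQUESTS = 200

def req_per_sec(duration_seconds):
    return REQUESTS / duration_seconds

# 0K file, cache disabled: 2.30 s single, 1.95 s with ten concurrent
assert round(req_per_sec(2.30)) == 87
assert round(req_per_sec(1.95)) == 103
```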

Cache Enabled

  file   concurrent   duration (seconds)   requests/second
  0K          1              0.95                210
  0K         10              0.75                266
  16K         1              1.95                103
  16K        10              1.75                114
  64K         1              4.95                 40
  64K        10              4.83                 41

Again, with caching, throughput increases with ten concurrent requests. Note that the response and transfer benefits decline noticeably with file size (transfer time). The difference between cached and non-cached for the zero-length file (no actual data transfer involved) gives some indication of the raw difference in response latency: some 240% improvement for single serial requests, and 260% with ten concurrent. This is a fairly crude analysis, but does give some indication of cache efficiencies.
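Those percentages are simply the ratio of uncached to cached duration for the zero-length file (again my own arithmetic, shown for illustration):

```python
# Cached vs. uncached response latency, zero-length file.
def speedup_percent(uncached_seconds, cached_seconds):
    return 100.0 * uncached_seconds / cached_seconds

single = speedup_percent(2.30, 0.95)   # single serial requests, ~240%
ten = speedup_percent(1.95, 0.75)      # ten concurrent requests, 260%
```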

Simple File Request Transfer Rate

The simple text file request under similar conditions indicates a potential transfer rate in excess of 1 Mbyte per second. (Remember, both client and server are on the same system, so the data, although being transported by TCP/IP networking, is not actually ending up out on a physical network.) This serves to demonstrate that server architecture should not be the limiting factor in file throughput.

  $ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOOUT /COUNT=10 /PATH="/ht_root/log/access.log"
  $ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOOUT /COUNT=10 /PATH="/ht_root/log/access.log"
  $ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOOUT /COUNT=10 /PATH="/ht_root/log/total.log"
  $ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOOUT /COUNT=10 /PATH="/ht_root/log/total.log"

The following results were derived using the v4.2 server.

Transfer Rate

  file                 concurrent   duration (seconds)   Kbytes/second
  2M (4123 blocks)          1                8                2,570
  2M (4123 blocks)         10               16                1,336
  7M (14852 blocks)         1               59                1,278
  7M (14852 blocks)        10               62                1,224

Significantly, there were no dramatic drops in transfer rate between one and ten concurrent requests!
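The Kbytes/second column can be roughly reconstructed from the block counts, assuming 512-byte VMS disk blocks and ten transfers of each file per test (the /COUNT=10 in the command lines). This is my own back-of-envelope arithmetic, not the tool's exact accounting:

```python
BLOCK_BYTES = 512     # standard VMS disk block
TRANSFERS = 10        # /COUNT=10 in the WWWRKOUT command lines

def kbytes_per_second(blocks, duration_seconds):
    total_kbytes = blocks * BLOCK_BYTES * TRANSFERS / 1024
    return total_kbytes / duration_seconds

rate = kbytes_per_second(4123, 8)   # the 2M file, single client, 8 s
# comes out within a few Kbytes/second of the reported 2,570
```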


13.1 - File Record Format

The server can handle STREAM, STREAM_LF, STREAM_CR, FIXED and UNDEFINED record formats very much more efficiently than VARIABLE or VFC files.

With STREAM, FIXED and UNDEFINED files the assumption is that HTTP carriage-control is within the file itself (i.e. at least the newline (LF) required by browsers), so no additional processing is needed. With VARIABLE record files the carriage-control is implied, and each record therefore requires additional processing by the server to supply it. Although the HTTPd buffers multiple variable-format records and writes them collectively to the network to improve efficiency, stream and binary files are read by Virtual Block and written to the network immediately, making their transfer very efficient indeed!

So significant is this efficiency improvement that a module exists to automatically convert VARIABLE record files to STREAM-LF when detected by the file transfer module. This is disabled by default, but the user is strongly encouraged to enable it, and to ensure that stream-format files are provided to the server by other hypertext generating and processing utilities.
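The conversion itself is conceptually simple. Here is a sketch (not the server's actual module) of turning implied carriage-control records into STREAM-LF data:

```python
def variable_to_stream_lf(records):
    """Make each record's implied carriage-control an explicit LF,
    so the result can be read and sent as raw bytes with no
    per-record processing."""
    return b"".join(record + b"\n" for record in records)

# Three VARIABLE-format records become one STREAM-LF byte stream.
stream = variable_to_stream_lf([b"<HTML>", b"<BODY>", b"A line of text."])
```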


13.2 - Scripting

Persistent subprocesses are probably the most efficient solution for child-process scripting under VMS. See 11.2 - Scripting Environment. The script's I/O must still be relayed to the client by the server.

A simple performance evaluation shows the relative merits of the three scripting environments available. Two results are provided here. Both were obtained using the WWWRKOUT utility (see 15.6 - Server Workout (stress-test)) accessing the same CGI test utility script, HT_ROOT:[SRC.CGIPLUS]CGIPLUSTEST.C, which executes in both standard CGI and CGIplus environments. A series of 200 accesses was made and the results averaged. The first test returned only the HTTP header, evaluating raw request turn-around time. The second test requested a body of 16K characters, again testing performance under a more realistic scenario.

No Keep-Alive: functionality was employed, so each request required a complete TCP/IP connection and disposal, although the WWWRKOUT utility was used on the same system as the HTTPd server, eliminating actual network transport. DNSLookup (host name resolution) was disabled. The test system was a lightly-loaded AlphaServer 2100, VMS v6.2 and DEC TCP/IP 4.1. (An apparent improvement in throughput for multiple, concurrent connections over the 4.2/4.3 server is due to the redesign of the testing tool, WWWRKOUT, which improved its multi-threading!) The command lines are shown below:

  $ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/cgi-bin/cgiplustest?0"
  $ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/cgi-bin/cgiplustest?200"
  $ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/cgiplus-bin/cgiplustest?0"
  $ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOOUT /COUNT=200 /PATH="/cgiplus-bin/cgiplustest?200"

The following results were derived using the v4.5 server.

Single Concurrent CGI Script

                           CGI (non-persistent)   CGI (persistent)   CGIplus
  0K duration (seconds)            28.2                 16.6            5.5
  0K requests/second                7.1                 12.0           36.0
  16K duration (seconds)           29.3                 17.7            6.7
  16K requests/second               6.8                 11.2           29.7

Ten Concurrent CGI Scripts

                           CGI (non-persistent)   CGI (persistent)   CGIplus
  0K duration (seconds)            20.7                 10.5            4.7
  0K requests/second                9.7                 19.1           43.0
  16K duration (seconds)           27.8                 15.6            8.9
  16K requests/second               7.2                 12.9           22.6

Again, significantly, throughput actually improved at ten concurrent requests (probably due to the latency of serial TCP/IP connection/disconnection when requests are handled one-by-one, compared with several happening concurrently). Slight reductions in throughput compared with 4.2/4.3 server scripting are probably due to some additional CGI variables being supplied to the script.

Although these results are indicative only, they do show CGIplus to have a potential for improvement over standard CGI in the order of 250-300% (when using persistent subprocesses, or zombies; 400-600% over non-persistent subprocesses), a not inconsiderable increase. Of course, this test generates the output stream very simply and efficiently, and so excludes any actual processing time that may be required by a "real" application. If the script/application has a long activation time the reduction in response latency could be even more significant (e.g. Perl scripts and relational database access languages).
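Those improvement figures follow directly from the single-concurrent 0K durations in the table above (my arithmetic, for illustration):

```python
# CGIplus improvement over CGI, from the 0K single-concurrent durations.
def improvement_percent(slower_seconds, faster_seconds):
    return 100.0 * slower_seconds / faster_seconds

vs_persistent = improvement_percent(16.6, 5.5)       # ~300%
vs_non_persistent = improvement_percent(28.2, 5.5)   # ~500%
```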


13.3 - Suggestions

Here are some suggestions for improving the performance of the server, listed in approximate order of significance. Note that these will have proportionally less impact on an otherwise heavily loaded system.

  1. Disable host name resolution (configuration parameter [DNSLookup]). This can slow processing significantly. Most log analysis tools can convert numeric addresses so DNS resolution is often an unnecessary burden.

    This can actually make a remarkable difference. The same test provided very different throughputs with DNS lookup enabled and disabled (v4.5 server, cache enabled).


                     duration (seconds)   requests/second
    DNSLookup ON           6.30                 32
    DNSLookup OFF          0.95                210

  2. Ensure served files are not VARIABLE record format (see above). Enable STREAM-LF conversion using a value such as 250 (configuration parameter [StreamLF]).

  3. Use persistent-subprocess DCL/scripting (configuration parameter [ZombieLifeTime]).

  4. Use CGIplus-capable scripts whenever possible.

  5. Enable caching (configuration parameter [Cache]).

  6. Disable logging (configuration parameter [Logging]).

  7. Set the HTTPd server process priority higher, say to 6 (use startup qualifier /PRIORITY=).

  8. Reduce to as few as possible the number of mapping rules.

  9. Place more-commonly resolved content-types towards the top of the configuration list (configuration parameter [AddType]).

  10. Use a pre-defined log format (e.g. "common", configuration parameter [LogFormat]).

  11. Disable request history (configuration parameter [RequestHistory]).

  12. Disable activity statistics (configuration parameter [ActivityDays]).

