2.1 - Document Content Type
2.2 - Explicitly Specifying Content-Type
2.3 - Document Specification
2.3.1 - Absolute File Path
2.3.2 - Partial (or Relative) File Path
2.4 - Extended File Specifications (ODS-5)
2.4.1 - Characters In Request Paths
2.4.2 - Characters In Server-Generated Paths
2.4.3 - Document Cache
Arbitrary documents may not be accessed.
The server can only access files whose path is permitted by the set of rules configured for the hypertext environment.
Documents must be read-accessible.
The server can only access files that are world readable, or that have an
ACL specifically controlling access for "HTTP$SERVER", the server
account.
2.1 - Document Content Type
Document (file) retrieval is initiated by providing the server with the file specification as a URL path. Server configuration determines the format in which the file is returned to the client. It may contain text or images immediately displayable by the browser, or a viewer external to the browser may be spawned. The server may automatically activate a script to provide a gateway to non-native information (see description of [AddType] configuration directive in the Technical Overview). The file type (extension) determines the content type with which the server returns (and/or interprets) the file.
The following table lists some of the current file types (as examples) and
their associated MIME-style content type. HTML documents are presented
laid out according to the full HTML capabilities of the browser. Plain-text
documents are presented in a fixed-font format. Other types require an
external viewer to be activated.
File Type   Description                              Content Type
.BKB        Bookreader document (BNU)                text/html, gateway script activated
.BKS        Bookreader shelf (BNU)                   text/html, gateway script activated
.C          C source                                 text/plain
.COM        DCL procedure                            text/plain
.CONF       Configuration file                       text/plain
.CPP        C++ source                               text/plain
.DECW$BOOK  Bookreader document                      text/html, gateway script activated
.FOR        Fortran source                           text/plain
.GIF        GIF image                                image/gif
.H          C header                                 text/plain
.HLB        VMS Help library                         text/html, gateway script activated
.HTML       HyperText Markup Language                text/html
.HTM        HyperText Markup Language                text/html
.JPG        JPEG image                               image/jpeg
.LIS        Listing                                  text/plain
.MAR        Macro source                             text/plain
.PAS        Pascal source                            text/plain
.PRO        IDL source                               text/plain
.PS         PostScript                               application/PostScript
.TEXT       Text                                     text/plain
.TLB        VMS text library                         text/html, gateway script activated
.TXT        Text                                     text/plain
.SHTML      HyperText Markup Language pre-processed  text/html
.ZIP        Zipped file                              application/binary
If other file types need to be defined, contact the Web
administrator.
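The lookup from file type to content type can be pictured with a short Python sketch. This is an illustration only, not the server's implementation: the dictionary holds a handful of entries copied from the table above, and the function name content_type_for is hypothetical.

```python
import os

# Illustrative subset of the table above. A real server reads these
# mappings from its configuration rather than a hard-coded dictionary.
CONTENT_TYPES = {
    ".HTML": "text/html",
    ".HTM": "text/html",
    ".TXT": "text/plain",
    ".C": "text/plain",
    ".GIF": "image/gif",
    ".JPG": "image/jpeg",
    ".ZIP": "application/binary",
}


def content_type_for(path, default=None):
    """Return the content type implied by the file type (extension) of path."""
    _, ext = os.path.splitext(path)
    # File types are compared in upper case because VMS file
    # specifications are not case-sensitive.
    return CONTENT_TYPES.get(ext.upper(), default)


print(content_type_for("/web/home.html"))
print(content_type_for("file.unknown"))
```

A path whose file type is not in the mapping yields the default (None here), which corresponds to the situation where the administrator needs to define a new type.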
2.2 - Explicitly Specifying Content-Type
When accessing files it is possible to explicitly specify the identifying content-type to be returned to the browser in the HTTP response header. Of course this does not change the actual content of the file, just the header content-type! This is primarily provided to allow access to plain-text documents that have obscure, non-"standard" or non-configured file extensions.
It could also be used for other purposes, "forcing" the browser to accept a particular file as a particular content-type. This can be useful if the extension is not configured (as mentioned above) or in the case where the file contains data of a known content-type but with an extension conflicting with an already configured extension specifying data of a different content-type.
Enter the file path into the browser's URL specification field ("Location:",
"Address:"). Then, for plain-text, append the following query string:
?httpd=content&type=text/plain
For another content-type substitute it appropriately.
For example, to retrieve a text file as binary (why, I can't imagine :^) use
?httpd=content&type=application/octet-stream
This is an example, showing a file path first without and then with the query string:
file.unknown
file.unknown?httpd=content&type=text/plain
It is also possible to "force" the content-type for all files in a
particular directory. See 3.3.8 - Specifying Content-Type.
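When such links are generated by a program rather than typed by hand, the query string can be appended mechanically. The following Python sketch assumes the path carries no query string of its own; the helper name with_content_type is hypothetical.

```python
def with_content_type(path, content_type):
    """Append the explicit content-type query string to a URL path.

    Assumes the path does not already contain a query string.
    """
    return f"{path}?httpd=content&type={content_type}"


print(with_content_type("file.unknown", "text/plain"))
print(with_content_type("file.unknown", "application/octet-stream"))
```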
2.3 - Document Specification
For the "http:" protocol, file and directory locations are specified using URL path syntax where slash-separated ("/") elements delineate a hierarchy leading to a data item. Anyone familiar with the syntax of the Unix file system, or the MS-DOS file system (where back-slashes are hierarchy delimiters), will feel at home with URL syntax. Specifications under VMS are not case-sensitive.
A VMS directory specification
WEB:[TECHNICAL.HTML-PRIMER]
would be represented in URL syntax as
/web/technical/html-primer/
and a VMS file specification
WEB:[TECHNICAL.HTML-PRIMER]HTML-PRIMER.HTML
represented as
/web/technical/html-primer/html-primer.html
NOTE
It is not required (although not forbidden) to supply a VMS master file directory component ("[000000]", "[000000.", etc.) in a URL specification. Hence the file specification
WEB:[000000]HOME.HTML
should be represented as
/web/home.html
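The correspondence between the two syntaxes can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the server's mapping logic: it handles only the plain DEVICE:[DIR.SUBDIR]NAME.TYPE form shown above, ignores version numbers and escaped characters, and lower-cases the result purely for presentation. The function name vms_to_url_path is hypothetical.

```python
def vms_to_url_path(spec):
    """Convert a simple VMS specification to URL path syntax.

    Handles only DEVICE:[DIR.SUBDIR]NAME.TYPE; versions and
    escaped characters are not considered in this sketch.
    """
    device, _, rest = spec.partition(":")
    directory, _, name = rest.partition("]")
    directory = directory.lstrip("[")
    # Drop empty elements and the master file directory, which need
    # not appear in a URL specification.
    parts = [p for p in directory.split(".") if p and p != "000000"]
    path = "/" + "/".join([device] + parts) + "/" + name
    # Lower case is used only for readability; VMS specifications
    # are not case-sensitive.
    return path.lower()


print(vms_to_url_path("WEB:[TECHNICAL.HTML-PRIMER]"))
print(vms_to_url_path("WEB:[TECHNICAL.HTML-PRIMER]HTML-PRIMER.HTML"))
print(vms_to_url_path("WEB:[000000]HOME.HTML"))
```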
2.3.1 - Absolute File Path
A file may be specified using an absolute, or full, path. This
must specify the location of the file exactly. Absolute paths
always begin with a forward-slash ("/"). For example:
/web/committee/minutes/1994/1994-09-27.txt
/web/committee/constitution.txt
/web/committee/membership/fred-bloggs.txt
2.3.2 - Partial (or Relative) File Path
A file may be specified relative to its current location. That is, a current document (or menu) may specify another document file relative to itself. This may be at the current level, in a subdirectory, or in another part of the directory tree related to the current one. Relative paths never begin with a forward-slash ("/").
(Strictly speaking, it is a function of the client to construct a full URL from such a relative URL before sending the request to the server.)
For example, documents at the same level as the current may be specified
without any hierarchy being indicated:
1994-07-22.txt
1994-08-24.txt
1994-09-27.txt
Documents at an inferior point in the hierarchy may be specified as in the
following example:
1993/1993-02-17.txt
1993/reports/membership.txt
other/etc.txt
Documents in a related part of the hierarchy may be referenced using the
"../" construct. As with MS-DOS and Unix this syntax
indicates the immediately superior directory.
../other_committee/1993/1993-02-17.txt
../other_committee/1993/reports/balance-sheet.txt
../../other_section/committee/constitution.txt
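The way a client turns a relative path into a full URL can be tried out with urljoin from Python's standard library. In this sketch the host name host.example and the base document index.html are made up for illustration; the relative paths are taken from the examples above.

```python
from urllib.parse import urljoin

# Hypothetical URL of the current document.
BASE = "http://host.example/web/committee/index.html"


def resolve(relative):
    """Resolve a relative path against the current document, as a client would."""
    return urljoin(BASE, relative)


print(resolve("1994-09-27.txt"))
print(resolve("1993/reports/membership.txt"))
print(resolve("../other_committee/1993/1993-02-17.txt"))
```

The first result stays in /web/committee/, the second descends into /web/committee/1993/reports/, and the third climbs one level before descending into /web/other_committee/1993/.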
2.4 - Extended File Specifications (ODS-5)
OpenVMS Alpha V7.2 introduced a new on-disk file system structure, ODS-5.
This brings to VMS in general, and WASD and other Web servers in particular, a
number of issues regarding the handling of characters previously not
encountered during (ODS-2) file system activities.
2.4.1 - Characters In Request Paths
There is a standard for characters used in HTTP request paths and query strings (URLs). This includes conventions for the handling of reserved characters that have specific meanings in a request (for example "?", "+", "&", "="), characters that are completely forbidden (for example white-space and control characters, 0x00 to 0x1f), and others that have usages by convention (for example the "~", commonly used to indicate a username mapping). The request can otherwise contain these characters provided they are URL-encoded (i.e. a percent symbol followed by two hexadecimal digits representing the character value).
There is also an RMS standard for handling characters in extended file specifications, some of which are forbidden in the ODS-2 file naming conventions, and others which have a reserved meaning to either the command-line interpreter (e.g. the space) or the file system structure (e.g. the ":", "[", "]" and "."). Generally the allowed but reserved characters can be used in ODS-5 file names if escaped using the "^" character. For example, the ODS-2 file name "THIS_AND_THAT.TXT" could be named "This^_^&^_That.txt" on an ODS-5 volume. More complex rules control the use of character combinations with significance to RMS, for instance multiple periods. The following file name is allowed on an ODS-5 volume, "A-GNU-zipped-TAR-archive^.tar.gz", where the non-significant period has been escaped making it acceptable to RMS.
The WASD server will accept request paths for file specifications in both formats, URL-encoded and RMS-escaped. Of course characters absolutely forbidden in request paths must still be URL-encoded, the most obvious example being the space. RMS will accept the file name "This^ and^ that.txt" (i.e. containing escaped spaces) but the request path would need to be specified as "This%20and%20that.txt", or possibly "This^%20and^%20that.txt", although in that case the RMS escape character is basically redundant.
Unlike ODS-2 volumes, ODS-5 volumes do not have "invalid"
characters, so for ODS-5 the server performs no processing to
ensure RMS compliance.
2.4.2 - Characters In Server-Generated Paths
When the server generates a path to be returned to the browser, either in a viewable page such as a directory listing or error message, or as a part of the HTTP transaction such as a redirection, each escaped character in the canonical form of an extended file specification is replaced in the path by its URL-encoded equivalent. For example, the file name "This^_and^_that.txt" will be represented by "This%20and%20that.txt".
When presenting a file name in a viewable page the general rule is to also
provide this URL-equivalent of the unescaped file name, with a small number of
exceptions. The first is a directory listing where VMS format has been
requested by including a version component in the request file specification.
The second is similar, but with the tree facility,
displaying a directory tree. The third is in the navigation page of the
UPDate menu. In all of these instances the canonical form of the
extended file specification is presented (although any actual reference to the
file is URL-encoded as described above).
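The translation from an RMS-escaped file name to its URL-encoded form can be approximated in a few lines of Python. This is a rough sketch, not the server's algorithm: it assumes only the simple escapes used in the examples in this section ("^_" for a space, and "^" followed by a single literal character), and does not handle hexadecimal or other multi-character escape sequences. The function name rms_escaped_to_url is hypothetical.

```python
from urllib.parse import quote


def rms_escaped_to_url(name):
    """URL-encode an RMS-escaped file name (simple escapes only)."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "^" and i + 1 < len(name):
            nxt = name[i + 1]
            # "^_" stands for a space; any other escaped character
            # is taken literally in this simplified sketch.
            out.append(" " if nxt == "_" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    # quote() percent-encodes the space and other reserved characters.
    return quote("".join(out))


print(rms_escaped_to_url("This^_and^_that.txt"))
print(rms_escaped_to_url("A-GNU-zipped-TAR-archive^.tar.gz"))
```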
2.4.3 - Document Cache
The Web server is most commonly set up to cache static documents (files). A cache is higher-speed, in-memory storage within the server itself. Cached documents are checked periodically for changes when they are requested. Changes to a file are detected by comparing the modification date/time and file length. A common check period is one minute, though it can be set longer or even disabled. If a document has changed, the old one is discarded from the cache (called invalidation) and the new one is loaded into the cache while being transferred to the client.
After making changes to a document it is possible the server will continue to serve the old one for a short period. This can be overridden by using the browser's Reload facility. This directs the server to go and check the on-disk file regardless, invalidating it if necessary.
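The checking behaviour described above can be modelled with a small Python sketch. It is a conceptual illustration only, not WASD's cache: the class name CacheEntry, the CHECK_PERIOD constant and the force_check parameter (standing in for a browser reload) are all assumptions made for the example.

```python
import os
import time

CHECK_PERIOD = 60.0  # seconds between on-disk checks (assumed value)


class CacheEntry:
    """One cached file, revalidated by modification time and length."""

    def __init__(self, path):
        self.path = path
        self._load()

    def _load(self):
        # Read the file into memory and remember what it looked like on disk.
        st = os.stat(self.path)
        with open(self.path, "rb") as f:
            self.data = f.read()
        self.mtime = st.st_mtime
        self.size = st.st_size
        self.checked = time.monotonic()

    def get(self, force_check=False):
        """Return the cached content, reloading it if the file has changed."""
        now = time.monotonic()
        if force_check or now - self.checked >= CHECK_PERIOD:
            st = os.stat(self.path)
            if (st.st_mtime, st.st_size) != (self.mtime, self.size):
                self._load()  # invalidate the old copy and load the new one
            else:
                self.checked = now
        return self.data
```

Within the check period get() returns the cached copy even if the file on disk has changed, which is the short window of stale content mentioned above; get(force_check=True) models the reload case.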