Digital UNIX Version 4.0 supports the following file systems, which are accessed through the OSF/1 Version 1.0 Virtual File System (VFS):

- UNIX File System (UFS)
- Network File System (NFS)
- CD-ROM File System (CDFS)
- Memory File System (MFS)
- /proc file system (PROCFS)
- File-on-File Mounting (FFM) file system
- File Descriptor File System (FDFS)
- POLYCENTER Advanced File System (AdvFS)
Note that all of the file systems are integrated with the Virtual Memory Unified Buffer Cache (UBC).
In addition, Digital UNIX Version 4.0 supports the Logical Storage Manager (LSM) and the Prestoserve file system accelerator.
Note that the Logical Volume Manager is being retired in this release.
The following sections briefly discuss VFS, the file systems supported in Digital UNIX Version 4.0, the Logical Storage Manager, and the Prestoserve file system accelerator.
The Virtual File System (VFS), which is based on the Berkeley 4.3 Reno Virtual File System, provides a uniform interface abstracted from the file system layer, allowing common access to files regardless of the file system on which the files reside.
A structure known as a vnode (analogous to an inode) contains information about each file in a mounted file system and is essentially a wrapper around file system-specific nodes.
If, for example, a read or write is requested on a file, the vnode directs the system call to the routine appropriate for that file system (a read request is directed to ufs_read if the request is made on a file in a UFS file system, or to nfs_read if the request is made on a file in an NFS-mounted file system). As a result, file access across different file systems is transparent to the user.
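The dispatch can be pictured as a table of function pointers attached to each vnode. The following minimal sketch illustrates the idea in C; the structure and function names (my_vnode, my_vnodeops, vn_read) are invented for illustration and are not the actual Digital UNIX kernel definitions:

    /* Illustrative sketch of vnode-style dispatch; not the real
       Digital UNIX kernel structures. */
    struct my_vnode;
    struct uio;                         /* I/O descriptor (opaque here) */

    struct my_vnodeops {
        int (*vop_read)(struct my_vnode *vp, struct uio *uiop);
        int (*vop_write)(struct my_vnode *vp, struct uio *uiop);
    };

    struct my_vnode {
        struct my_vnodeops *v_op;       /* file-system-specific operations */
        void               *v_data;     /* e.g., a UFS inode or an NFS rnode */
    };

    /* The generic read path: the caller never needs to know whether
       ufs_read or nfs_read ultimately does the work. */
    static int
    vn_read(struct my_vnode *vp, struct uio *uiop)
    {
        return vp->v_op->vop_read(vp, uiop);
    }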
Digital's VFS implementation also supports Extended File Attributes (XFAs). Although originally intended to provide support for system security (Access Control Lists) and the Pathworks PC server (so that a Pathworks PC server could assign PC-specific attributes to a file, such as icon color, the startup size of the application, its backup date, and so forth), the XFA implementation was expanded to support any application that wants to assign an XFA to a file.
Currently, both UFS and AdvFS support XFAs, as does the pax backup utility, which has tar and cpio front ends. XFAs are also supported for remote UFS file systems when the server supports a special protocol that, at present, only Digital implements.
For more information on XFAs, see setproplist(2). For more information on pax, see pax(1).
In Digital UNIX Version 4.0, the VOP_READDIR kernel vnode operation interface has been changed to accommodate a new structure, kdirent, in addition to the existing dirent structure. The new kdirent structure was developed to make file systems other than UFS work properly over NFS.
Note, however, that if you implement a file system under Digital UNIX, you do not need to make any changes to your VOP_READDIR interface routine for Digital UNIX Version 4.0, and applications see the same interface as before the addition of the new kdirent structure.
Unlike the dirent structure, the kdirent structure has a kd_off field that subordinate file systems can set to point to the on-disk offset of the next directory entry. Arrays of struct kdirent must be padded to 8-byte boundaries, using the KDIRSIZE macro, so that the off_t is properly aligned; arrays of struct dirent are padded only to 4-byte boundaries.
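As a rough illustration, a kdirent-style entry and its padding macro might look like the following sketch. Only the kd_off field and the KDIRSIZE name come from the description above; the remaining fields and the exact layout are assumptions:

    #include <sys/types.h>      /* off_t */

    /* Hypothetical layout: only kd_off and the KDIRSIZE name are
       taken from the text; the other fields are assumed. */
    struct kdirent {
        off_t          kd_off;          /* on-disk offset of the next entry */
        unsigned int   kd_ino;          /* file serial number (assumed) */
        unsigned short kd_reclen;       /* length of this record (assumed) */
        unsigned short kd_namlen;       /* length of kd_name (assumed) */
        char           kd_name[256];    /* entry name (assumed) */
    };

    /* Round each record up to an 8-byte boundary so that the off_t
       at the start of the next entry is properly aligned. */
    #define KDIRSIZE(dp) \
        ((sizeof(struct kdirent) - 256 + ((dp)->kd_namlen + 1) + 7) & ~7UL)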
Each mounted file system has the option of setting the M_NEWRDDIR flag in the mount structure's m_flag field. If the M_NEWRDDIR flag is set, then the routine calling VOP_READDIR expects the readdir on that vnode to return an array of struct kdirent; if the M_NEWRDDIR flag is clear (the default), then the readdir on that vnode returns an array of struct dirent.
In terms of NFS, if the M_NEWRDDIR flag is not set, then the NFS server uses the dirent structures and calculates the necessary offsets to pass back to the client. Thus, to ensure proper operation over NFS, any file system that does not have the M_NEWRDDIR flag set must be prepared to have VOP_READDIR called with offsets based on a packed array of struct dirent, which may conflict with the offsets in the on-disk directory structure.
However, if the M_NEWRDDIR flag is set, then the NFS server uses the kd_off fields of the kdirent structures to generate the necessary offsets to pass back to the client.
A new vnode operation, VOP_PATHCONF, was added to the kernel to return file-system-specific information for the fpathconf() and pathconf() system calls. This vnode operation takes as arguments the pointer to the struct vnode, the pathconf name (an int), a return value pointer to long, and an error int. It also sets the return value and errno.
Note that each file system must implement the vnode operation by providing a function in the vnodeops structure after the vn_delproplist component (at the end of the structure). This function takes as arguments the pointer to the vnode, the pathconf name, and the return value pointer to long. The function sets the return value and returns zero for success or an error number.
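A minimal sketch of such a function follows. The name myfs_pathconf, the _PC_ cases, and the values returned are hypothetical, but the argument list and return convention match the description above:

    #include <errno.h>
    #include <unistd.h>         /* _PC_NAME_MAX, _PC_LINK_MAX */

    struct vnode;               /* the kernel vnode (opaque here) */

    /* Hypothetical pathconf handler for an illustrative file system. */
    int
    myfs_pathconf(struct vnode *vp, int name, long *retval)
    {
        switch (name) {
        case _PC_NAME_MAX:
            *retval = 255;      /* longest file name component */
            return 0;
        case _PC_LINK_MAX:
            *retval = 32767;    /* maximum link count (assumed) */
            return 0;
        default:
            return EINVAL;      /* attribute not supported here */
        }
    }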
The UNIX File System (UFS) is compatible with the Berkeley 4.3 Tahoe release. UFS allows a pathname component to be up to 255 bytes, with a fully qualified pathname length limit of 1023 bytes. The Digital UNIX Version 4.0 implementation of UFS supports file sizes that exceed 2 GB.
Digital added support for file block clustering, which provides sequential read and write access at speeds equivalent to the raw device speed of the disk, up to a 300% performance increase over previous releases of the operating system; file-on-file mounting (FFM) for STREAMS; and integration of UFS with the Unified Buffer Cache. UFS also supports Extended File Attributes (XFAs). For more information on XFAs, see Section 4.2.
The Network File System (NFS) is a facility for sharing files in a heterogeneous environment of processors, operating systems, and networks, by mounting a remote file system or directory on a local system and then reading or writing the files as though they were local.
Digital UNIX Version 4.0 supports NFS Version 3, in addition to NFS Version 2. NFS Version 2 code is based on ONC Version 4.2, which Digital licensed from Sun Microsystems. The NFS Version 3 code supersedes ONC Version 4.2, although at the time that NFS Version 3 was ported to Digital UNIX, Sun Microsystems had not yet released a newer, public version of ONC with NFS Version 3 support.
NFS Version 3 supports all the features of NFS Version 2 as well as the following:

- Allows users to access files larger than 2 GB over NFS

- Provides a READDIRPLUS procedure that returns file handles and attributes with directory names, eliminating LOOKUP calls when scanning a directory

- Reduces the number of GETATTR procedure calls

- Provides an ACCESS procedure that fixes the problems in NFS Version 2 with superuser permission mapping and allows access checks at file-open time, so that the server can better support Access Control Lists (ACLs)

- Provides a PATHCONF procedure
Since Digital UNIX supports both NFS Version 3 and Version 2, the NFS client and server bind at mount time using the highest NFS version number they both support. For example, a Digital UNIX Version 4.0 client will use NFS Version 3 when it is served by a Digital UNIX Version 4.0 NFS server; however, when it is served by an NFS server running an earlier version of Digital UNIX, the Digital UNIX Version 4.0 NFS client will use NFS Version 2.
For more detailed information on NFS Version 3, see the paper NFS Version 3: Design and Implementation (USENIX, 1994).
In addition to the NFS Version 3 functionality, Digital UNIX supports the following Digital enhancements to NFS:
NFS has traditionally been run over the UDP protocol. Digital UNIX Version 4.0 now supports NFS over the TCP protocol. See mount(8) for additional details.
On an NFS server, multiple write requests to the same file are combined to reduce the number of actual writes as much as possible. The data portions of successive writes are cached and a single metadata update is done that applies to all the writes. Replies are not sent to the client until all data and associated metadata are written to disk to ensure that write-gathering does not violate the NFS crash recovery design.
As a result, write-gathering increases write throughput by up to 100%, and the CPU overhead associated with writes is substantially reduced, further increasing server capacity.
Using the fcntl system call to control access to file regions, NFS locking allows you to place locks on file records over NFS, thereby protecting, among other things, segments of a shared, NFS-served database. The status daemon, rpc.statd, monitors the NFS servers and maintains the NFS lock if the server goes down. When the NFS server comes back up, a reclaiming process allows the lock to be reattached.
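For example, a client might lock one record of a shared, NFS-served database file before updating it. The following sketch uses only the standard fcntl(2) locking interface; the file name and record geometry are made up for illustration:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct flock fl;
        int fd = open("/nfs/db/records.dat", O_RDWR);   /* hypothetical path */

        if (fd == -1) {
            perror("open");
            return 1;
        }

        fl.l_type = F_WRLCK;        /* exclusive write lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = 1024;          /* lock one 512-byte record */
        fl.l_len = 512;

        if (fcntl(fd, F_SETLKW, &fl) == -1) {   /* wait for the lock */
            perror("fcntl");
            return 1;
        }

        /* ... update the record ... */

        fl.l_type = F_UNLCK;        /* release the lock */
        (void) fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
    }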
The automount daemon automatically and transparently mounts and unmounts NFS file systems on an as-needed basis. It provides an alternative to using the /etc/fstab file for NFS mounting file systems on client machines. The automount daemon can be started from the /etc/rc.config file or from the command line. Once started, it sleeps until a user attempts to access a directory that is associated with an automount map, or any directory or file in that directory structure. The daemon then awakens, consults the appropriate map, and mounts the NFS file system. After a specified period of inactivity on a file system (5 minutes by default), the automount daemon unmounts that file system.
The maps indicate where to find the file system to be mounted and the mount options to use. An individual automount map is either local or served by NIS. A system, however, can use both local and NIS automount maps.
Automounting NFS-mounted file systems provides the following advantages over static mounts:
- automount will connect you to the fastest server that responds.

- If at least one of the servers is available, the mount will not hang.

- The automount daemon conserves system resources, particularly memory.
PC-NFS, a product for PC clients available from Sun Microsystems, allows personal computers running DOS to access NFS servers and provides a variety of other functionality.
Digital supports the PC-NFS server daemon, pcnfsd, which allows PC clients with PC-NFS configured to do the following:
- The pcnfsd daemon, in compliance with Versions 1.0 and 2.0 of the pcnfsd protocol, assigns UIDs and GIDs to PC clients so that they can talk to NFS.

- The pcnfsd daemon performs UNIX login-like password and username verification on the server for the PC client. If the authentication succeeds, the pcnfsd daemon then grants the PC client the same permissions accorded to that username.

- The PC client can mount NFS file systems by talking to the mountd daemon, as long as the NFS file systems are exported to the PC client in the /etc/exports file on the server.

- Since there is no mechanism in DOS to perform file permission checking, the PC client calls the authentication server to check the user's credentials against the file's attributes. This happens when the PC client makes NFS requests to the server for file access that requires permission checking, such as opening a file.

- The pcnfsd daemon authenticates the PC client and then spools and prints the file on behalf of the client.
Digital UNIX Version 4.0 supports the ISO-9660 CDFS standard for data interchange between multiple vendors; the High Sierra Group standard for backward compatibility with earlier CD-ROM formats; and an implementation of the Rock Ridge Interchange Protocol (RRIP), Version 1.0, Revision 1.09. The RRIP extends ISO-9660, using the system use areas defined by ISO-9660, to provide mixed-case and long file names; symbolic links; device nodes; deep directory structures (deeper than ISO-9660 allows); UIDs, GIDs, and permissions on files; and POSIX time stamps.
This code was taken from the public domain and enhanced by Digital.
In addition, Digital UNIX Version 4.0 also supports X/Open Preliminary Specification (1991) CD-ROM Support Component (XCDR). XCDR allows users to examine selected ISO-9660 attributes through defined utilities and shared libraries, and allows system administrators to substitute different file protections, owners, and file names for the default CD-ROM files.
Digital UNIX Version 4.0 supports a Memory File System (MFS), which is essentially a UNIX File System that resides in memory. No permanent file structures or data are written to disk, so the contents of an MFS file system are lost on reboots, unmounts, or power failures. Since it does not write data to disk, MFS is a very fast file system and is quite useful for storing temporary files or read-only files that are loaded into it after it is created.
For example, if you are performing a software build that would have to be restarted if it failed, MFS is a very appropriate choice for storing the temporary files created during the build, since by virtue of its speed it would reduce the build time. For more information, see the newfs(8) reference page.
The /proc file system enables running processes to be accessed and manipulated as files by the system calls open, close, read, write, lseek, and ioctl. While the /proc file system is most useful for debuggers, it enables any process with the correct permissions to control another running process. Thus, a parent/child relationship does not have to exist between a debugger and the process being debugged.
The dbx debugger that ships in Digital UNIX Version 4.0 supports attaching to running processes through /proc. For more information, see the proc(4) and dbx(1) reference pages.
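As a rough sketch, a suitably privileged process might stop and resume a target process through /proc as follows. This assumes the SVR4-style procfs ioctl requests (such as PIOCSTOP and PIOCRUN) and the prrun_t structure described in proc(4); the details are illustrative:

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/procfs.h>     /* SVR4-style PIOC* requests and prrun_t */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(int argc, char *argv[])
    {
        char path[32];
        prrun_t run;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s pid\n", argv[0]);
            return 1;
        }

        /* Each running process appears as a file named by its PID. */
        sprintf(path, "/proc/%05d", atoi(argv[1]));
        fd = open(path, O_RDWR);
        if (fd == -1) {
            perror(path);
            return 1;
        }

        if (ioctl(fd, PIOCSTOP, 0) == -1)       /* stop the process */
            perror("PIOCSTOP");

        /* ... a debugger would examine or modify the process here ... */

        memset(&run, 0, sizeof(run));           /* no special run flags */
        if (ioctl(fd, PIOCRUN, &run) == -1)     /* set it running again */
            perror("PIOCRUN");

        close(fd);
        return 0;
    }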
The File-on-File Mounting (FFM) file system allows regular, character-special, or block-special files to be mounted over regular files and, for the most part, is used only by the SVR4-compatible system calls fattach and fdetach on a STREAMS-based pipe (or FIFO). With FFM, a FIFO, which normally has no file system object associated with it, is given a name in the file system space. As a result, a process that is unrelated to the process that created the FIFO can then access the FIFO.
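As a brief sketch, a process might give one end of a pipe a name in the file system as follows; the path /tmp/named_pipe is invented for illustration, and error handling is abbreviated:

    #include <stropts.h>        /* fattach(), fdetach() */
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fds[2];

        if (pipe(fds) == -1) {  /* a STREAMS-based pipe on SVR4-style systems */
            perror("pipe");
            return 1;
        }

        /* Give one end of the pipe a name in the file system space;
           /tmp/named_pipe must already exist as a regular file. */
        if (fattach(fds[1], "/tmp/named_pipe") == -1) {
            perror("fattach");
            return 1;
        }

        /* Unrelated processes can now open /tmp/named_pipe and
           communicate with this process through the pipe. */
        pause();

        (void) fdetach("/tmp/named_pipe");
        return 0;
    }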
In addition to programs using FFM through the fattach system call, users can mount one regular file on top of another using the mount command. Mounting a file on top of another file does not destroy the contents of the covered file; it simply associates the name of the covered file with the mounted file, making the contents of the covered file temporarily unavailable. The covered file can be accessed after the file mounted on top of it is unmounted, whether by a reboot, by a call to fdetach, or by entering the umount command.
Note that the contents of the covered file are still available to any process that had the file open at the time of the call to fattach or when a user issued a mount command that covered the file.
The File Descriptor File System (FDFS) allows applications to reference a process's open file descriptors (0, 1, 2, 3, and so forth) as if they were files in the UNIX File System (for example, /dev/fd/0, /dev/fd/1, /dev/fd/2) by aliasing a process's open file descriptors to file objects. When the FDFS is mounted, opening or creating a file descriptor file has the same effect as calling the dup(2) system call.
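For instance, assuming the FDFS is mounted on /dev/fd, the following sketch shows the equivalence: opening /dev/fd/1 yields a new descriptor for the process's standard output, just as dup(1) would:

    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
        /* Assuming FDFS is mounted on /dev/fd, these two descriptors
           are equivalent: both are duplicates of file descriptor 1. */
        int fd1 = open("/dev/fd/1", O_WRONLY);  /* through FDFS */
        int fd2 = dup(1);                       /* through dup(2) */

        if (fd1 != -1)
            write(fd1, "hello\n", 6);
        if (fd2 != -1)
            write(fd2, "world\n", 6);

        close(fd1);
        close(fd2);
        return 0;
    }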
The FDFS allows applications that were not written with support for UNIX I/O to avail themselves of pipes, named pipes, and I/O redirection.
The FDFS is not mounted by default; it must be mounted either by hand or through an entry placed in the /etc/fstab file. For more information on the FDFS, see the fd(4) reference page.
The POLYCENTER Advanced File System (AdvFS), which consists of a file system that ships with the base system and a set of file system utilities that are available as a separate, layered product, is a log-based (journaled) file system that is especially valuable on systems with large amounts of storage. Because it maintains a log of active file-system transactions, AdvFS avoids lengthy file system checks on reboot and can therefore recover from a system failure in seconds. AdvFS writes log records to disk before data records, ensuring that file domains (file systems) are recovered to a consistent state. AdvFS uses extent-based allocation for optimal performance.
To users and applications, AdvFS looks like any other UNIX file system. It is compliant with POSIX and SPEC 1170 file-system specifications. AdvFS file domains and other Digital UNIX file systems, like UFS, can exist on the same system and are integrated with the Virtual File System (VFS) and the Unified Buffer Cache (UBC). AdvFS file domains can also be remote-mounted with NFS and support extended file attributes (XFAs). For more information on XFAs, see Section 4.2.
In addition to providing rapid restart and increased file-system integrity, AdvFS supports files and file systems much larger than 2 GBs and, by separating the file system directory layer from the logical storage layer, provides increased file-system flexibility and manageability.
In addition to the Advanced File System that ships as part of the base operating system, the POLYCENTER Advanced File System Utilities are available as a layered product. The AdvFS Utilities enable a system administrator to create multivolume file domains, add and remove volumes online, clone filesets for online backup, unfragment and balance file domains online, stripe individual files, and establish trashcans so that users can restore their deleted files. The AdvFS Utilities also provide a Graphical User Interface for configuring and managing AdvFS file domains. The AdvFS Utilities require a separate license Product Authorization Key (PAK). Contact your Digital representative for additional information on the AdvFS Utilities product. For more information on AdvFS, see the System Administration guide and the POLYCENTER Advanced File System Utilities Technical Summary.
Digital UNIX Version 4.0 supports the Logical Storage Manager (LSM), a more robust logical storage manager than the Logical Volume Manager (LVM), which it replaces. LSM supports all of the following:
- Disk spanning, which allows you to concatenate entire disks or parts (regions) of multiple disks to use as one logical volume. For example, you could combine two RZ26s and have them contain the /usr file system.

- Mirroring, which allows you to write simultaneously to two or more disk drives to protect against data loss in the event of a disk failure.

- Striping, which improves performance by breaking data into segments that are written to several different physical disks in a "stripe set."

- Disk management utilities that, among other things, change the disk configuration without disrupting users while the system is up and running.

Mirroring, striping, and the graphical interface require a separate license PAK. The LSM code came from VERITAS (the VERITAS Volume Manager) and was enhanced by Digital.
For each logical volume defined in the system, the LSM volume device driver maps logical volume I/O to physical disk I/O. In addition, LSM uses a user-level volume configuration daemon (vold) that controls changes to the configuration of logical volumes. Users can administer LSM either through a series of command-line utilities or through an intuitive Motif-based graphical interface.
To ensure a smooth migration from LVM to LSM, Digital has developed a migration utility that maps existing LVM volumes into nonstriped, nonmirrored LSM volumes while preserving all of the LVM data. After the migration is complete, administrators can mirror the volumes if they so desire.
Similarly, to help users transform their existing UFS or AdvFS partitions into LSM logical volumes, Digital has developed a utility that will transform each partition in use by UFS or AdvFS into a nonstriped, nonmirrored LSM volume. After the transformation is complete, administrators can mirror the volumes if they so desire.
Note that LSM volumes can be used in conjunction with AdvFS, as part of an AdvFS domain; with RAID disks; and with the Available Server Environment (ASE), since LSM supports logical volume failover. For more information on LSM, see the Logical Storage Manager.
The enhancements related to overlap partition checking are described next. Partition overlap checks were added to a number of commands in Digital UNIX Version 4.0. Some of the commands that use these checks are newfs, fsck, mount, mkfdmn, swapon, voldisksetup, and voldisk.
The enhanced checks require a disk label to be installed on the disk. Refer to the disklabel(8) reference page for further information.
The checks ensure that if a partition or an overlapping partition is already in use (for example, mounted or used as a swap device), the partition will not be overwritten. Additionally, the checks ensure that partitions will not be overwritten if the specific partition or an overlapping partition is marked as in use in the fstype field of the disk label.
If a partition or an overlapping partition has an in-use fstype field in the disk label, some commands ask interactively whether the partition can be overwritten.
Two new functions, check_usage(3) and set_usage(3), are available for use by applications. These functions check whether a disk partition is marked for use and set the fstype of the partition in the disk label. See the appropriate reference pages for these functions for more information.
The Prestoserve file system accelerator is a hardware option that speeds up synchronous disk writes, including NFS server access, by reducing the amount of disk I/O. Frequently written data blocks are cached in nonvolatile memory and then written to disk asynchronously.
The software required to drive the board ships as an optional subset in Digital UNIX Version 4.0; once the subset is installed, it can be accessed with a PAK that comes with the board.
Prestoserve uses a write cache for synchronous disk I/O, working in a way similar to how the system buffer cache speeds up asynchronous disk I/O requests. Prestoserve is interposed between the operating system and the device drivers for the disks on a server. Mounted file systems and unmounted block devices selected by the administrator are accelerated.
When a synchronous write request is issued to a disk with accelerated file systems or block devices, it is intercepted by the Prestoserve pseudodevice driver, which stores the data in nonvolatile memory instead of on the disk. Thus, synchronous writes occur at memory speeds, not at disk speeds.
As the nonvolatile memory in the Prestoserve cache fills up, it asynchronously flushes the cached data to the disk in portions that are large enough to allow the disk drivers to optimize the order of the writes. A modified form of Least Recently Used (LRU) replacement is used to determine the order. Reads that hit (match blocks) in the Prestoserve cache also benefit.
Nonvolatile memory is required because data must not be lost if the power fails or if the system crashes. As a result, the hardware board contains a battery that protects data in case the system crashes. From the point of view of the operating system, Prestoserve appears to be a very fast disk.
Note that there is a substantial performance gain when Prestoserve is used on an NFS Version 2 server.
The dxpresto command allows you to monitor Prestoserve activity and to enable or disable Prestoserve on machines that allow that operation. For more information on Prestoserve, see the Guide to Prestoserve and the dxpresto(8X) reference page.