Oplocks, SMB/NFS Retrospective, Commercial Expectations (long)

From: Gordon Waidhofer (gww@traakan.com)
Date: 02/28/99-04:25:00 AM Z


Date: Sun, 28 Feb 1999 02:25:00 -0800
From: Gordon Waidhofer <gww@traakan.com>
Message-Id: <199902281025.CAA16766@traakan.com>
Subject: Oplocks, SMB/NFS Retrospective, Commercial Expectations (long)


Oplocks have been the subject of many recent postings.  David Hitz's
posting was really good, though I think it may have left the
uninitiated a little lost. There is confusion about oplocks being
a region/file locking mechanism, or a cache coherency mechanism,
or both.  Carl Beame routinely posts commercial requirements and
motivations, which I believe are often misunderstood and more often
underappreciated. There have been postings, both recent and long
ago, that NFS cache coherency should yet again be deferred.  Yet
commercial realities demand that such issues be addressed.

Oplocks? What are they? Where did they come from? What is the
motivation? How important are they to NFSv4?

This is a poor man's brief history of SMB, NFS, and oplocks.  By
reviewing these histories we can understand the respective
characteristics, motivations, and importance.  Caveat: I could have
a few fine points wrong.

SMB and NFS are both old: over ten years each.

SMB emerged on the original, wimpy PC platform:  8086, 16-bit this
and that, and a mere 640kb of premium memory to support all major
functions.  In some circles, the SMB client-side is still referred
to as "the redirector". A TSR (a patch) intercepted the file API
calls (like system calls). API calls which involved "network drives"
(like mounts) were synchronously forwarded to the file server. All
state -- even the binding of the file descriptors to the underlying
file -- was managed by the server.  Using any of the 640kb for
client-side caching was an unthinkable luxury.  File (region)
locking was supported by the redirector long before there was local
file locking, or even multi-tasking on PC platforms.  A result of
the cacheless design was that all clients had a current (highly
coherent) view of the files.  Multi-client applications emerged
that took advantage of this, and they grew in popularity, and they
enjoyed commercial success ($$$). Multi-client applications were
the chief functional advantage that merited a file server.  Bear
in mind that multi-client was the only technique available for
implementing multi-user applications.

NFS emerged on relatively privileged platforms:  32-bit everything,
multi-tasking, virtual memory, VAXen and Sun/2 and let's not forget
Pyramid's early contribution.  NFS succeeded over LucasFilm EFS,
which somewhat resembled the SMB redirector architecture, and AT&T
USG RFS, which is half-way between EFS and NFS. (AFS, et al, came
later).  The UNIX API is separated from NFS by several architectural
layers, including the VNODES (an internal file service abstraction
largely motivated by NFS). This distance is a constant challenge
to NFS developers.  Other UNIX file systems, converted to VNODES,
cached file data, so it seemed perfectly natural for the NFS
client-side (under VNODES) to do the same (ain't beefy hardware
handy).  UNIX file region locking predates NFS by just a few years,
and there were several designs which ultimately converged (I believe
John Bass presented the first UNIX region locking, and his original
design is reflected in modern designs).  Multi-user applications
on a single UNIX host were practical because region locking worked
and there was only one file (disk) cache, thus coherency.  UNIX
historically provides multi-tasking and multi-user capabilities.
Yet, historically, UNIX predominantly hosts single-user applications
such as text editors and compilers.  NFS developers were compelled
to focus on file service for single-user applications, and felt
unconstrained by coherent concurrent access.  Applications requiring
concurrent file access were deemed the domain of beefy "servers"
using local storage and local region locking.  NFS always has been,
and still is, very weak in the coherency area, and client-solicited
time stamps are the only cue for stale cache contents.  Consequently,
GETATTR requests are the lion's share of NFS traffic.  The NLM
(network lock manager) -- referred to in some circles as the problem
child -- was little more than an afterthought.  There have been
numerous band-aids, dependent on cues from the API, to help NFS with
concurrent access.
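
To make the time-stamp mechanism concrete, here is a minimal Python
sketch. It is a toy model, not real NFS: ToyServer, ToyClient, the
logical clock, and the call counters are all hypothetical names made up
for illustration. It assumes the client asks for the modification time
before every cached read; real clients also cache attributes for a short
interval, which this sketch omits.

```python
class ToyServer:
    """Hypothetical in-memory file server (illustration only, not NFS)."""

    def __init__(self):
        self.files = {}   # name -> (data, mtime)
        self.clock = 0    # logical clock standing in for real time stamps
        self.calls = {"GETATTR": 0, "READ": 0, "WRITE": 0}

    def write(self, name, data):
        self.calls["WRITE"] += 1
        self.clock += 1
        self.files[name] = (data, self.clock)

    def getattr(self, name):
        # The client has to ask; the server never volunteers a change.
        self.calls["GETATTR"] += 1
        return self.files[name][1]

    def read(self, name):
        self.calls["READ"] += 1
        return self.files[name]


class ToyClient:
    """Caches file data and revalidates it by comparing time stamps."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # name -> (data, mtime)

    def read(self, name):
        mtime = self.server.getattr(name)          # client-solicited cue
        cached = self.cache.get(name)
        if cached is None or cached[1] != mtime:   # stale or missing
            self.cache[name] = self.server.read(name)
        return self.cache[name][0]


server = ToyServer()
server.write("report", b"v1")
client = ToyClient(server)
for _ in range(5):
    client.read("report")
print(server.calls)   # GETATTR count grows with every read; READ does not
```

Even in this toy, five reads of an unchanged file cost five GETATTRs and
one READ, which is the traffic pattern described above.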

Enter oplocks (opportunistic locks).  It's difficult to tell exactly
when oplocks arrived on the scene.  SMB does not have versions, it
has dialects. And it doesn't merely evolve, it forks and branches.

The name "oplock" is confusing for us UNIX folks.  I'm guessing
that the term "lock" was used by SMB developers to refer to all
multi-client issues and solutions.  UNIX folks think of locks as
pertaining to NLM issues, and the NLM doesn't pertain to very much,
so the relevance of oplocks to NFS is obscure.  UNIX folks might
better comprehend oplocks if they think of them as "revocable
exclusive leases" (leases a la NQNFS).

The idea behind oplocks is that most file accesses are in fact done
by a single user on a single client, and so there is a performance
opportunity. SMB developers seized the opportunity.  Free of the
original "wimpy" constraints, they remained constrained by commercial
expectation.  SMB developers could not break the coherent multi-client
applications.  Further, they needed a mechanism that was fully
transparent to the API.  Their solution was oplocks.

If the client (at its discretion) obtains an oplock (exclusive
lease), then all client/server interaction for the sake of multi-client
access may be suppressed, including synchronous data writes and
posting of region locks (a la NLM).  The cues to the server for the
sake of maintaining (highly current) state may be deferred, and
perhaps eliminated.  While the oplock is in effect, GETATTRs are
not necessary for cache coherency, thus mitigating network congestion.
Prior to releasing its oplock (exclusive lease), the client flushes
all net-change state to the server.  If the server detects a second
client requiring concurrent access to the same file(s) while the
oplock is in effect, the server signals the first client to perform
the oplock release procedure.  If the client cannot obtain the
oplock, or releases the oplock for any reason, the client/server
interactions follow the synchronous methods of SMB origin.
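
The grant / flush / break sequence above can be sketched as a small
Python model. This is not the SMB wire protocol: OplockServer,
OplockClient, break_oplock, and the round_trips counter are hypothetical
names, and the rule "grant the oplock only to the sole opener" is a
simplifying assumption of mine.

```python
class OplockServer:
    """Hypothetical server that grants an exclusive, revocable lease."""

    def __init__(self):
        self.data = {}        # name -> bytes
        self.openers = {}     # name -> set of clients with the file open
        self.oplock = {}      # name -> client currently holding the oplock
        self.round_trips = 0  # client-initiated requests seen by the server

    def open(self, client, name):
        self.round_trips += 1
        holder = self.oplock.get(name)
        if holder is not None and holder is not client:
            holder.break_oplock(name)   # signal the first client to release
        openers = self.openers.setdefault(name, set())
        openers.add(client)
        granted = len(openers) == 1     # only a sole opener gets the oplock
        if granted:
            self.oplock[name] = client
        return granted

    def release_oplock(self, client, name):
        if self.oplock.get(name) is client:
            del self.oplock[name]

    def write(self, name, data):
        self.round_trips += 1
        self.data[name] = data

    def read(self, name):
        self.round_trips += 1
        return self.data.get(name, b"")


class OplockClient:
    def __init__(self, server):
        self.server = server
        self.held = set()    # files on which this client holds an oplock
        self.cache = {}      # name -> locally cached data
        self.dirty = set()   # cached files not yet written to the server

    def open(self, name):
        if self.server.open(self, name):
            self.held.add(name)

    def write(self, name, data):
        if name in self.held:            # oplock held: defer the write
            self.cache[name] = data
            self.dirty.add(name)
        else:                            # no oplock: synchronous write
            self.server.write(name, data)

    def read(self, name):
        if name in self.held:
            if name not in self.cache:
                self.cache[name] = self.server.read(name)
            return self.cache[name]
        return self.server.read(name)    # no oplock: always ask the server

    def break_oplock(self, name):
        # Flush net-change state to the server, then give up the lease.
        if name in self.dirty:
            self.server.write(name, self.cache[name])
            self.dirty.discard(name)
        self.cache.pop(name, None)
        self.held.discard(name)
        self.server.release_oplock(self, name)


server = OplockServer()
a, b = OplockClient(server), OplockClient(server)
a.open("doc")
a.write("doc", b"draft 1")
a.write("doc", b"draft 2")     # both writes stay in a's cache
print(server.round_trips)      # only the open has reached the server
b.open("doc")                  # server breaks a's oplock; a flushes
print(b.read("doc"))           # b sees a's latest data
```

After the break, both clients fall back to the synchronous path: every
read and write is a round trip, and each sees the other's changes
immediately.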

Oplocks win big. Single-user applications (like word processing)
run at full speed.  Multi-client applications continue to work with
highly-coherent concurrent file access.  Multi-client applications
with only one active user win big.  All this without cues from the
API, which is another big, spiffy point.

I concur with David Hitz.  I'm a UNIX bigot (I have the "Live Free
or Die" license plate on my wall).  I'm impressed with oplocks.
They are remarkably practical, remarkably efficient, and remarkably
transparent with respect to the API.

Are oplocks important to NFSv4? I would say that oplocks per se
are not absolutely required.  The highly-coherent multi-client
issues are.  Oplocks represent one successful technique.  There
are certainly others.

It's a little troubling that SMB, with its humble origins, should
outstrip NFS, with its noble origins, on both highly-coherent
multi-client access and also on single-user high-performance access.
The constraints under which SMB was developed led to compelling
functionality and attendant commercial expectations.  NFS presumes
single-user high-performance access at all times, and that led to
compelling mistrust of its coherency and of the reliability of its
locking.  SMB moved
from fully synchronous to coherent caching easily.  NFS has not
easily moved to multi-client applications, and, for example, some
database system vendors refuse to support their products when used
with NFS.  SMB file consistency is managed transparently to the
API.  Meanwhile, NFS applications must use tricky modes and calls
(fsync()).
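
As an illustration of the fsync() burden, here is a minimal Python
sketch of what an application has to do by hand. The helper name
durable_write is hypothetical, and the post does not say which "tricky
modes" it has in mind, so only the fsync() call itself is shown. The
demo writes to a local temporary file; on an NFS mount the same call
asks the client to push its cached writes to the server.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data, then block until the OS reports it has been flushed."""
    with open(path, "wb") as f:
        f.write(data)           # lands in a user-space buffer
        f.flush()               # hand the bytes to the operating system
        os.fsync(f.fileno())    # ask the OS to push them to stable storage


# Demo on a local temporary file (an assumption; any writable path works).
fd, path = tempfile.mkstemp()
os.close(fd)
durable_write(path, b"record 1\n")
with open(path, "rb") as f:
    print(f.read())
os.remove(path)
```

The point is only that the application must know to ask. Nothing in the
ordinary write path does this on its behalf.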

It has been suggested that highly-coherent multi-client issues
(cache coherency, et al) be deferred until after NFSv4.  Further,
it has been suggested that NFSv4, once complete, could be used as
a "core" protocol to study the multi-client issues.  Bad idea.
Better to table NFSv4, use NFSv3 as the core, and study the issues
now.  Deadlines don't matter if the deliverable is not commercially
viable, and March of 2001 might as well be the due date for the
NFS wake.

As Carl Beame indicated, telling a customer he is wrong rarely has
the desired effect. VisualBasic ISAM files and Microsoft Access
are examples of popular applications dependent on coherent, concurrent
file access. It just wouldn't be helpful to tell a customer that NFS
would work much better for him if he would just stop using Microsoft.

Here's an anecdote. At the first CIFS conference, I was chatting
with one of the Microsoft SMB engineers. He liked NFS, largely
because it is free of the evolutionary baggage of SMB. There is
only one file READ opcode -- how nice. We were talking about READ_RAW
and other speedups, and their impact on coherency.  I mentioned
NFSv3 weak cache consistency (WCC).  Without hesitation, he said, "If I
suggested weak cache consistency to my group I would get fired.
It's just not good enough." It's telling that an SMB developer was
fully conversant with NFS and WCC. They are most certainly watching
the progress of NFS.

The competition takes coherency and performance seriously.  We can
do no less with NFSv4 without seriously impacting its commercial
viability. Coherent multi-client is not a priority for NFSv4, it
is THE priority.  If NFSv4 lacks such capabilities, then NFSv4 must
otherwise offer enough benefit for end-users to justify multiple
file sharing protocols on their machines:  the one that gets
multi-client right and NFSv4 which gets "what" right.

	-gww

P.S. Dennis Chapman's posting arrived after I drafted this. His
     link is well worth following.


