Replication/Migration conference call minutes, 2/26


From: Robert Thurlow (Robert.Thurlow@eng.sun.com)
Date: 02/27/02-12:00:09 AM Z


Date: Tue, 26 Feb 2002 23:00:09 -0700 (MST)
From: Robert Thurlow <Robert.Thurlow@eng.sun.com>
Subject: Replication/Migration conference call minutes, 2/26
Message-ID: <Roam.SIMC.2.0.6.1014789609.17512.thurlow@jurassic.eng>

Conference call minutes, Feb 26 2002, as reported by Rob Thurlow

The following people met today in a conference call to get the
ball rolling on replication and migration:

	"Dave Noveck" <Dave.Noveck@netapp.com>
	"Spencer Shepler" <Spencer.Shepler@sun.com>
	"William A.(Andy) Adamson" <andros@umich.edu> and the CITI team
	"Rob Thurlow" <robert.thurlow@sun.com>

The first thing we talked about was Rob's draft requirements
bullets, as follows:

        Must be transparent to applications on client
                Exceptions?
        Must be as secure as NFSv4
        Must be efficient
                Minimal lock-out time
                Reasonable bandwidth requirements
                Propagate differences
                Restartable
        Must be scalable to large and small filesystems

We spent the most time talking about the first requirement, especially
on what "transparent" means.  Rob thought the toughest edge case
was what happens to an application which has a file open when a
migration is complete or a different replica has to be chosen.  The
ideal thing is that the application notices nothing except a slight
delay in access to the file.  This view makes the data the critical
thing; most metadata need not be quite so correct.  We talked about
the issues that made this difficult; we agreed that renames on open
files were a tough (though not insoluble) issue.  Dave talked about
a way the server could choose not to expire filehandles for open
files and continue to provide service until the files were closed.
It was agreed that this capability would be worth having, even if it
meant waiting for a minor revision of the NFSv4 protocol.

We talked about what the unit of migration is or should be.  The
current NFSv4 spec talks about the replication or migration of
filesystems with a single fsid, which most implementors accept
from the underlying local filesystem.  This could be an issue;
there is a need for migration of portions of large filesystems,
e.g. the 1 TB LUN your sysadmin set up might contain your 500 MB
home directory, which you would like to manage as a unit.  Dave
mentioned that the server could choose to give out a modified
fsid to permit this kind of carving up of large filesystems, and
that this would not require a protocol change.  Rob mentioned
that a prior project had done work on the problem of how to
lock subtrees of larger filesystems.  It was agreed that we
needed to look at the current spec to remind ourselves how the
spec defined and referred to fsids.
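Dave's fsid-carving idea can be sketched roughly as follows.  This is only an illustration of the concept, not any real server's implementation: the export table, the synthetic (major, minor) fsid pairs, and the function name are all hypothetical.

```python
# Sketch: the server hands out a distinct, synthetic fsid for each
# exported subtree, so a subtree of one large local filesystem can
# be replicated or migrated as its own unit, without any protocol
# change.  All names and values here are hypothetical.

SUBTREE_EXPORTS = {
    "/bigfs":           (100, 0),   # the whole 1 TB filesystem
    "/bigfs/home/rob":  (100, 1),   # a 500 MB home dir, carved out
    "/bigfs/home/andy": (100, 2),
}

def fsid_for(path):
    """Return the synthetic (major, minor) fsid for the longest
    matching exported subtree; paths outside any carved subtree
    fall back to the enclosing export's fsid."""
    best = None
    for export, fsid in SUBTREE_EXPORTS.items():
        if path == export or path.startswith(export + "/"):
            if best is None or len(export) > len(best[0]):
                best = (export, fsid)
    return best[1] if best else None

print(fsid_for("/bigfs/home/rob/mail"))  # inside the carved subtree
print(fsid_for("/bigfs/etc"))            # rest of the big filesystem
```

A client crossing from /bigfs into /bigfs/home/rob would see the fsid change, exactly as it does today at a real filesystem boundary, so the carved subtree can migrate independently.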

Andy asked whether, when using read-only replication, the server
may issue read delegations.  Probably - this would mean a client
could cache aggressively until a new replica update is received.

We spent some time talking about the purpose of the fs_locations
attribute; it is likely that the client will do an initial mount
based on information from a name service, and then fetch the
fs_locations attribute; this means possibly conflicting information
from the two sources, conflicting information from the servers
hosting the different replicas, the issue of keeping the name
service updated, and other horrors.  Dave mentioned that he had
been thinking about having clients point to a "referral server"
(my term, I think, not Dave's - R) at e.g. nfsv4:/, and having
all resources in the domain be found by getting redirected to
the correct server during the lookup process.  Rob mentioned that
we had had discussions within Sun about something similar, and
that he liked the idea as it could free us from dependency on a
namespace.
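The referral-server idea can be sketched as follows.  This is a toy model of the lookup flow only; the referral table, server names, and function are hypothetical, and a real implementation would express the redirection through the protocol's lookup machinery rather than a client-side table.

```python
# Sketch: the client mounts one well-known root (e.g. nfsv4:/) and
# finds the real server for any resource by being redirected during
# the lookup process.  The table and host names are hypothetical.

REFERRALS = {                       # what the referral server knows
    "/home": "serverA.example.com",
    "/src":  "serverB.example.com",
}

def locate(path):
    """Walk the path from the referral root; a matching referral
    redirects the remainder of the lookup to the server that
    actually hosts the data."""
    for prefix, server in REFERRALS.items():
        if path == prefix or path.startswith(prefix + "/"):
            return server, path
    return "referral-root", path    # served by the root itself

print(locate("/home/rob/bin"))
```

The appeal noted on the call is visible even in this toy: clients depend only on the well-known root, not on a name-service-maintained namespace, so replicas and migrations can be reflected by updating the referral data in one place.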

We talked about the granularity of the updates - when a file
had been touched, would a replica update transfer the entire
file or just the byte ranges which had been changed?  It was
clear to all that it was important for the protocol to be able
to express the transfer of just modified portions of a file,
as e.g. a large database table would be far more efficient to
update via changed byte ranges.
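The saving from transferring only modified portions can be sketched with a simple block-compare, assuming (hypothetically) that both versions of the file are available for comparison; real difference propagation would more likely track dirty ranges or use checksums rather than reading both copies.

```python
# Sketch of "propagate differences" at byte-range granularity:
# compare two versions of a file block by block and emit only the
# (offset, data) ranges that changed, instead of the whole file.
# The block size and names are illustrative only.

BLOCK = 4096

def changed_ranges(old: bytes, new: bytes, block=BLOCK):
    """Return (offset, data) for each block that differs between
    old and new; a large, mostly-unchanged database table then
    costs only its modified blocks, not a full transfer."""
    ranges = []
    for off in range(0, max(len(old), len(new)), block):
        if old[off:off + block] != new[off:off + block]:
            ranges.append((off, new[off:off + block]))
    return ranges

# A 10000-byte file with a 100-byte modification in the middle:
old = b"A" * 10000
new = b"A" * 5000 + b"B" * 100 + b"A" * 4900
for off, data in changed_ranges(old, new):
    print(off, len(data))   # one 4096-byte block instead of 10000 bytes
```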

We spent some time talking about filesystem versioning.  Andy thought
that some kind of version attribute was important to be able to at
least debug problems that occur when replicas don't get updated
properly, or when an application running from an NFS filesystem
dies because its pages are no longer valid due to a replica update.
Rob wondered if a per-filesystem modification time might be more
straightforward and better.
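Either proposal reduces to comparing one filesystem-wide value across replicas, as in this minimal sketch; the attribute name and the replica records are hypothetical, since no such attribute exists in the current spec.

```python
# Sketch: with a per-filesystem version (or modification time)
# attribute, a client or admin could at least detect that a replica
# lags, which helps debug replicas that don't get updated properly.
# "fs_version" is a hypothetical attribute, not in the NFSv4 spec.

def fresh_replicas(replicas, seen_version):
    """Return the replicas whose filesystem-wide version is at
    least as new as the version the client last observed."""
    return [r for r in replicas if r["fs_version"] >= seen_version]

replicas = [
    {"host": "a", "fs_version": 41},   # lagging replica
    {"host": "b", "fs_version": 42},
]
print(fresh_replicas(replicas, 42))    # only host "b" qualifies
```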

We discussed the other requirements briefly, and agreed that they
were reasonable and much less controversial than "transparency".
Rob said he'd wondered if "minimal lock-out time" wasn't more an
end-user requirement of a product, decoupled from the protocol
we're interested in, but Dave thought it was not out of place as
a corollary to "propagate differences".  We agreed that once Rob
had minutes posted, we would have some further discussions about
the transparency requirement to make it work.

We discussed single-writer/multiple-reader replication; it would
be nice for the client to be able to determine which replica in
fs_locations was the writable master, and to force use of that
replica whenever a file was opened for writing, as
this would give us the closest match to DCE/DFS.  People felt
that this was probably not a current issue, but that if a way
could be found to support it in the protocol, that future work
might take advantage of this.  The biggest issue seemed to be
that the client's caching strategies might make implementing
this difficult.
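The selection policy discussed above can be sketched as follows; the "is_master" flag is a hypothetical extension to the fs_locations information, and the sketch deliberately ignores the client-caching complications the call identified as the hard part.

```python
# Sketch of single-writer/multiple-reader selection: an open for
# write is forced to the replica flagged as the writable master,
# while reads may use any replica.  The "is_master" flag is a
# hypothetical extension, not part of current fs_locations.

def select_replica(locations, open_for_write):
    """Pick the writable master for writes, else the first replica."""
    if open_for_write:
        for loc in locations:
            if loc.get("is_master"):
                return loc
        raise RuntimeError("no writable master among replicas")
    return locations[0]

locations = [
    {"server": "replica1", "is_master": False},
    {"server": "master0",  "is_master": True},
]
print(select_replica(locations, open_for_write=True)["server"])
print(select_replica(locations, open_for_write=False)["server"])
```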

We closed after a brief discussion of what future issues we
should discuss and how we should keep the working group
informed of what we talk about.  It was decided that we would
always post minutes of these conference calls to the working
group alias to make sure our work remains visible.

Rob T


