RE: Replication/migration proposal

Message-ID: <7F608EC0BDE6D111B53A00805FA7F7DA03303640@tahoe.corp.netapp.com>
From: "Noveck, Dave" <dave.noveck@netapp.com>
Subject: RE: Replication/migration proposal
Date: Mon, 14 Jun 1999 09:38:10 -0700

 
> The proposal is more palatable for migration than for 
> replication. 

I would say it is more critical for migration than for
replication, since there is no other proposal to support
migration, and migration, under the rubric "Reliable and
Available" is one of the things the Design Considerations
puts forth as important, not to mention the fact that 
people need it.

If there are other approaches to migration, let's hear them.
While I don't think my proposal is perfect, I think the
basic outlines are just about mandated by the requirements.
We need to be able to make read-write filesystems available
even when a server is not functional for whatever reason.
Filesystems may move around for lots of legitimate reasons
(particularly in the world of SAN).  We must allow clients 
(but not necessarily servers) to deal with that within the
NFS protocol, because to do otherwise would impose unacceptable
burdens on the clients and raise firewall issues that nfs-v4 is
supposed to avoid.  Given all that, I was looking for the
simplest way to do it.  I wasn't looking for the best, most
complete way because of the kitchen-sink-problem.

> The
> overall problems with getting this info from the server are:
> 1.  Redundancy of information.  If a filesystem is available at n
> sources, each of these n will have to store information about 
> the other
> n-1.  If one thing changes, the change needs to be propagated 
> to all the
> others.

You're assuming a particular implementation at least in outline.
I call it the hack-it-up version and it's the first way I would think
about implementing it, too.  It may not be the best way, but it
might be the first way.

The point is that the servers may get this information any way they
want.  My proposal doesn't tell them how to do it, the spec shouldn't
either, and the answer may change.  It may be hack-it-up today, nis+ tomorrow,
ldap the day after that.  The server can be notified of changes or
it can make requests when it needs to.  This is all up to the 
implementation.  The point is that the clients don't have to change
no matter how many times the server implementation changes.
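
Just to make the division of labor concrete, here is a rough sketch
(Python, purely illustrative; every name in it is mine, not anything
in the proposal) of a server keeping its location lookups behind a
swappable backend while the client-visible attribute stays put:

    # A rough sketch, not part of the proposal: one way a server might keep
    # its location lookups behind a swappable backend, so the client-visible
    # attribute never changes shape.  Every name here is hypothetical.

    class LocationBackend:
        """However the server chooses to learn who else serves a filesystem."""
        def locations_for(self, fsid):
            raise NotImplementedError

    class FlatFileBackend(LocationBackend):
        # The "hack-it-up" version: a hand-maintained table on the server,
        # one "fsid hostname fs-root-path" triple per line.
        def __init__(self, table_path):
            self.table_path = table_path

        def locations_for(self, fsid):
            locations = []
            with open(self.table_path) as table:
                for line in table:
                    fields = line.split()
                    if len(fields) == 3 and fields[0] == fsid:
                        locations.append((fields[1], fields[2]))
            return locations

    def server_locations_attr(backend, fsid):
        # What gets handed to the client is the same no matter which
        # backend (flat file today, nis+ or ldap tomorrow) is plugged in.
        return backend.locations_for(fsid)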

> 2.  Naming issues in complex networks.  Today's servers have 
> multiple IP
> interfaces, usually with different names for each interface.  
> Each server
> will have to keep knowledge of all the interfaces of all the 
> servers that
> provide access to this filesystem.  How does the client 
> decide which of
> these interfaces to choose?  The situation is worse if interface names
> are not the same from the perspective of each client.  Not to say this
> situation is handled cleanly by today's methods, but do we 
> really want to
> drag this complexity into the NFS protocol?  There is a lot 
> of merit to
> having separate solutions for separate problems rather than adopting a
> kitchen-sink approach and throwing everything into NFSv4 just 
> because it
> is needed for file sharing (why not hostname-to-IPaddr 
> translation also,
> while we are at it?).

To push the kitchen-sink metaphor a bit (probably past its breaking 
point), the intention of this proposal is precisely to keep the kitchen
sink in the kitchen behind a thick concrete wall, but, before pouring 
that concrete, to prepare a very small opening, through which will be
routed some quarter-inch copper tubing, so we can do what we need to 
do, without having to deal with the kitchen-sink directly.

All of the complexity you talk about above illustrates the kind of 
thing we don't want in the nfs protocol.  This is a complicated problem
which is going to be solved a number of times in different ways, possibly
without end, as things get more complex.

The question is, should the clients bear the burden of that complexity
and all the change associated with it, or should they rely on the server?
If they have to do the job themselves, then effectively it will not get
done in many important cases.  Either supporting nfs-v4 will entail 
supporting a long list of other protocols (and also note the firewall
issues that these bring in their wake), or clients will not have the 
benefits of "Realiable and Available" servers, which we should provide 
them, unless there is something that makes that impossible.

I don't believe the Marie Antoinette answer, "Let them support LDAP",
is appropriate here.  We got rid of the Mount protocol because it is
not firewall friendly.  I am working on a caching proposal under the
constraint that callbacks have to be optional, and it ain't fun.  If
the rule is that nfs-v4 clients only need to speak one protocol to get
the functionality they need so that firewalls only need to accommodate
that one protocol, then it has to apply to this stuff as well, and the
answer follows: the clients have to get the information from the server
and they have to do it using the nfs-v4 protocol.

This proposal leaves the servers alone.  They can provide the information
to the clients, if they want to provide this functionality, but they
don't have to.  They can use any number of protocols to get it.  Since
a group of servers capable of providing the same file system are going
to be in the same administrative domain and are most likely to be in
the same firewall domain, lots of issues go away.  Also, the servers
providing access to the same filesystem, particularly a read-write one,
are almost certainly (a DOS floppy is an exception) running the same 
species/genus/family of operating system, so the need for standardization 
doesn't exist.

Someday, if SAN standardization prospers, and vendors can agree on a
filesystem format, then it would make sense to standardize the means
by which a group of NFS servers agree on how they are to determine
what servers are providing NFS service to what filesystems.  That
day certainly isn't now.


> 3.  Even if we look past the above issues, what information must the
> server return?  You suggest hostname:filename.  What filename is this?
> The pathname of the file corresponding to this handle?  How will the
> server know this, once the filesystem is no longer on the server?  The
> pathname of the filesystem root?  What does this mean to the 
> client, who
> may have mounted a subdirectory of the root?  The pathname of the
> directory the client mounted?  How does the server know this, 
> regardless
> of whether the file system is still on the server?  Perhaps the only
> thing that will work is to return just the hostname, expecting the new
> server to interpret the old handles.  Of course, this is why we have
> volatile filehandles in the first place, but I don't recall 
> there being
> consensus on whether volatile handles will actually work, especially
> under such circumstances.

I don't see the problem here.  server_locations is a per-fsid attribute
so that it is uniform throughout all filesystem objects with a
given fsid, which means that it designates the locations for the 
filesystem as a whole in terms of a hostname and a pathname on that host.  
That pathname is exactly the pathname of the root of that filesystem on 
the new host.  There are two cases: persistent filehandles and volatile
filehandles (or handles that persist upon migration and those that don't 
if we go down the volatility-upon-a-specific-occasion route).
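
To be concrete about the shape of the thing (illustrative only; the
field names, fsid, hostnames and paths below are made up, the proposal
just says hostname plus pathname per location):

    # Illustrative only: the per-fsid attribute as a list of
    # (hostname, pathname of the fs root on that host) pairs.
    # The fsid, hostnames and paths below are made up.

    from collections import namedtuple

    Location = namedtuple("Location", ["hostname", "fs_root_path"])

    # Every object sharing this fsid reports the same list:
    server_locations = {
        "fsid_1234": [
            Location("serverb", "/export/home"),
            Location("serverc", "/vol/vol0/home"),
        ],
    }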

If handles persist, then the pathname is, strictly speaking, not necessary, 
but I would be very uncomfortable without it.  Since the protocol's 
basic model is that handles are owned by a server, transitioning these 
opaque entities to another server, even if the first server told you 
to do so, is something I wouldn't do without a safety check.  So what 
I would do is start at the new server's root and use LOOKUP to take
me (using the pathname part of the server_locations attribute) to the 
root for the filesystem.  If that handle is not the same as the filehandle
on the old server, something has gone terribly wrong and we should not
proceed.  In the case in which the client has mounted some subtree 
of a server filesystem, the client should, when doing his LOOKUP at 
mount time, save the handle of the server filesystem as well as the 
handle of the root of his subtree and the pathname to get from one 
to the other.  This is saved as per-fs information on the client, 
entirely apart from saving pathname components for volatile file 
handles.  Then, upon migration, the client can validate both the
filehandle for the server-fs root and for the client root.  This is 
good engineering practice, which the spec should probably mandate by 
saying that clients "SHOULD" make those checks.
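
In code terms, the check I have in mind is roughly the following
sketch.  lookup() is just a stand-in for doing an NFSv4 LOOKUP against
the new server, new_server_root_fh is the handle of that server's
root, and every other name is hypothetical:

    # Sketch of the safety check for persistent handles after migration.
    # lookup(fh, component) stands in for an NFSv4 LOOKUP sent to the new
    # server; new_server_root_fh is the handle of that server's root.
    # All of these names are made up for the sketch.

    def walk(lookup, fh, pathname):
        # Follow a pathname one LOOKUP at a time from a starting filehandle.
        for component in pathname.strip("/").split("/"):
            if component:
                fh = lookup(fh, component)
        return fh

    def validate_after_migration(lookup, new_server_root_fh, fs_root_path,
                                 saved_fs_root_fh, subtree_path,
                                 saved_client_root_fh):
        # 1. server_locations pathname: new server's root -> fs root.
        fs_root_fh = walk(lookup, new_server_root_fh, fs_root_path)
        if fs_root_fh != saved_fs_root_fh:
            raise RuntimeError("fs-root handle mismatch; do not proceed")
        # 2. pathname saved at mount time: fs root -> client's mounted root.
        if subtree_path:
            client_root_fh = walk(lookup, fs_root_fh, subtree_path)
        else:
            client_root_fh = fs_root_fh
        if client_root_fh != saved_client_root_fh:
            raise RuntimeError("client-root handle mismatch; do not proceed")
        return fs_root_fh, client_root_fh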

If the handles are volatile (have evaporated?  Ok, I'll stop), then 
you use the pathname in server_locations to get from the server's 
global root to the server's fs-root.  You then use saved pathname 
components to get from the server's fs-root to the client root in 
that fs, if it is different.  Note that you are going to have to 
save that part of the pathname between the server-fs root and the 
root of the client-tree even apart from migration.  Since the root 
of the client-tree is volatile, you need to be able to get from the
server's fs-tree root to your root whenever the handle for the client
root expires, which it can always do.
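
The volatile flavor of the same sketch (reusing the hypothetical
walk()/lookup() stand-ins from above) has nothing to compare against,
so it just rebuilds both handles from the pathnames, and it serves
whenever the client-root handle expires, migration or not:

    # Volatile-handle recovery: nothing survives to compare against, so
    # rebuild both handles from the pathnames, reusing the hypothetical
    # walk()/lookup() stand-ins from the sketch above.

    def recover_volatile(lookup, new_server_root_fh, fs_root_path,
                         subtree_path):
        # server_locations pathname: new server's global root -> fs root.
        fs_root_fh = walk(lookup, new_server_root_fh, fs_root_path)
        # Saved components: fs root -> client root, if they differ.
        if subtree_path:
            client_root_fh = walk(lookup, fs_root_fh, subtree_path)
        else:
            client_root_fh = fs_root_fh
        return fs_root_fh, client_root_fh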
 
As far as volatile filehandles go, there is certainly no consensus about
them (whether they will "work" and what exactly "work" should mean).  We 
have N engineers and about 2N opinions.  My two (printable) opinions
are: "They stink" and "We're stuck with them".  If you have a server
that implements volatile handles, for replication (likely) or for 
migration (unlikely), or for anything else, it's your job to tell your
customers what works and what doesn't and deal with the calls.  Unless 
there is some prospect of a consensus different from our current agree-
to-disagree-and-hope-that-it-doesn't-matter, I don't see spending more 
time on it.

