Re: Replication/Migration issues list


From: Ted Anderson (TedAnderson@mindspring.com)
Date: 10/03/02-07:20:42 AM Z


Message-ID: <3D9C361A.9060007@mindspring.com>
Date: Thu, 03 Oct 2002 08:20:42 -0400

This note addresses some of the harder questions about scope and goals.

Regarding file system state, it seems to me that there are three cases:
read-only replicas, writable replicas (e.g. a fully populated cache) and
migration.  All these require addressing the question of file handles
(and file ids), since this involves issues of file system implementation
which may differ from one replica/host to another.  The first is
distinguished because it has very limited (or absent) requirements for
managing opens, shares and delegations.  For the latter two, however,
this management is important because of the need to coordinate writers
with other users.  Consistency of writable replicas depends upon
communicating information about multiple readers and writers with a
central coordinating authority.  On the other hand, migration involves
transferring this coordination from one authority to another.  These
latter two activities seem to require different mechanisms.  Thus, I
conclude there is a need for three mechanisms to support the full range
of file sharing functionality.

As mentioned in the previous message in connection with right-sizing
filesystems, I think we should move towards depending upon pathnames and
deprecating file ids, hardlinks, and ZLC files, though directory renames
remain an issue.  This shift, already begun by the introduction of
volatile file handles, eases the job of transferring file handles and
file ids between heterogeneous servers.

For writable replicas or caches, the NFSv4 client/server protocol
contains a suite of mechanisms that are arguably adequate for consistency
management.  I don't see any moves to add this capability to the
repl-mig protocol.  Is support for writable replicas (or even strongly
consistent read-only replicas which would benefit from something like
delegation) out of scope for the repl-mig protocol?

To support the migration of filesystems, there seem to be two minds on
the subject of this meta file system information.  Does the repl-mig
protocol rely on clients to reestablish state with the new server, as
they would after a crash, or does the protocol need to help?  I've heard
opinions both ways on the WG mailing list.  Has a consensus developed?

The approach of relying on the server crash recovery logic seems
workable.  I am not aware of any show stoppers with this approach,
except the limit that the grace period places on high availability[1].
A few difficult corner cases probably do exist.  We should collect
problematic scenarios so they can be evaluated against the application
transparency goal.

The grace period problem might admit a third option, namely to
enhance the client-server protocol to ease the transfer of client state.
Specifically, the source server could issue a callback which tells
clients to start transferring their state to the target server.  Clients
would then send a finish message to advise the target that they have
completed their state transfer, which would allow the target to exit
its grace period promptly.  This does require that clients in an
environment requiring high availability support callbacks, but that
does not sound onerous.
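
To make the third option concrete, here is a minimal sketch of the
target's side of such a handshake.  It is only an illustration under
stated assumptions: the class and method names (MigrationTarget,
receive_state, receive_finish) are hypothetical and are not NFSv4
operations, and the source's callback is simulated by a plain function
call.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationTarget:
    """Hypothetical target server tracking client state transfer."""
    expected_clients: set      # clients the source server reported
    grace_limit: float         # fallback grace period, in seconds
    started_at: float = 0.0    # time at which the grace period began
    finished: set = field(default_factory=set)
    state: dict = field(default_factory=dict)

    def receive_state(self, client, item):
        # A client re-establishes one piece of state (an open, a lock, ...).
        self.state.setdefault(client, []).append(item)

    def receive_finish(self, client):
        # The client advises that its state transfer is complete.
        self.finished.add(client)

    def in_grace(self, now):
        # Exit early once every expected client has finished;
        # otherwise fall back to the fixed time limit.
        if self.expected_clients <= self.finished:
            return False
        return (now - self.started_at) < self.grace_limit


def migrate(source_state, target):
    # Stand-in for the source server's callback: each client replays its
    # own state to the target and then sends the finish message.
    for client, items in source_state.items():
        for item in items:
            target.receive_state(client, item)
        target.receive_finish(client)
```

The point of the finish message is visible in in_grace(): the target
can leave the grace period as soon as all known clients report in,
rather than always waiting out the full limit.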

On 9/10/2002 13:25:51 MDT, Robert Thurlow wrote:
 >>Updates can be "pulled" as well as "pushed".
 >
 > The current proposal deals only with making information flow
 > from the master copy out to the replicas, and this flow is
 > unidirectional (modulo rsync-like examination of files).  The
 > way a transfer is initiated is unspecified - earlier thoughts
 > of also defining an third-party admin protocol to control this
 > were abandoned since we have things like SNMP and WBEM/CIM.
 > That means that the impetus to perform an update can come from
 > a human being, from the destination server, or from a "cron"
 > job, and happen out-of-band to cause the source to contact
 > the destination server(s) via this protocol.  We do need to
 > also define the SNMP MIB or WBEM/CIM schema to do this.

Okay, clearly at the lowest level all updates have to be "pushed" from
where writes occur to other sites.  The rest is "control".

So if this out-of-band mechanism is where all this control is located,
then this repl-mig protocol just ships the data when directed.  I
suppose with a sufficiently sophisticated control protocol to determine
what and where to send data, one could implement any kind of
replication.  For example, such a repl-mig protocol doesn't need support
for describing updates, such as incremental dumps.  Instead, the
contents of an update are negotiated out-of-band and the protocol just
ships the selected data.
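
A rough sketch of this split, with purely hypothetical names and with
in-memory dictionaries standing in for filesystems: the control side
decides what goes into an update, and the transfer side only ships the
paths it is handed.

```python
def select_changed(source, since):
    # Control side (out-of-band): decide which paths make up the update.
    # `source` maps path -> (mtime, data); this layout is an assumption.
    return sorted(p for p, (mtime, _) in source.items() if mtime > since)


def ship(source, dest, paths):
    # Transfer side: copy exactly the selected entries, nothing more.
    # It has no notion of "incremental"; that lives in the control side.
    for p in paths:
        dest[p] = source[p]
    return len(paths)
```

Swapping select_changed() for a different policy changes the kind of
replication without touching ship().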

Replication specification and maintenance has several components:
    * Where are the replicas positioned - location - list of sites.
    * When are updates received at the replicas - consistency -
      guarantees made about data to readers regarding currency and
      writers regarding safety.
    * Which objects does the replica house - subset - lots of choices are
      possible here, including "recently used" and "all".
    * What name does the replica have - read-only - writable secondaries
      usually have the same name as the primary so they are interposed,
      while read-only secondaries have different names from the writable
      primary.
These are all control-like functions which I guess would be out of
scope.  While I can see how SNMP could be used to manage the list of
replica sites, it seems less suitable for maintaining the consistency of
a replica.
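
For illustration, the four components above could be captured in a
small record like the following.  This is a sketch only: the field
names, the staleness bound used to express consistency, and the
".readonly" naming suffix are all assumptions, not part of any
proposal.

```python
from dataclasses import dataclass
from enum import Enum


class Subset(Enum):
    ALL = "all"
    RECENTLY_USED = "recently used"


@dataclass(frozen=True)
class ReplicaSpec:
    site: str                # where: location of the replica
    max_staleness_s: float   # when: 0 would mean strongly consistent
    subset: Subset           # which: objects the replica houses
    writable: bool           # affects naming, see name()
    primary_name: str        # name of the writable primary

    def name(self):
        # Writable secondaries share the primary's name so they are
        # interposed; read-only secondaries get a distinct name
        # (hypothetical suffix).
        if self.writable:
            return self.primary_name
        return self.primary_name + ".readonly"
```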

If this protocol is targeted at shipping the filesystem data, then what
distinguishes its goals from those of the client-server protocol?  My
tentative answer is that bulk transfers are optimized relative to
requests for information about individual files[2].  A protocol limited
to file transfer is likely to be significantly simpler than the
client-server protocol burdened with operational semantics (e.g. opens
and reads).
But is this the whole list of benefits?  Is there any data to support
the assumption that repl-mig can perform better than "big honking
compounds ops"[3]?

Ted Anderson

[1] http://playground.sun.com/pub/nfsv4/nfsv4-wg-archive/2001/0463.html
[2] http://playground.sun.com/pub/nfsv4/nfsv4-wg-archive/2001/0476.html
[3] http://playground.sun.com/pub/nfsv4/nfsv4-wg-archive/2002/0246.html



This archive was generated by hypermail 2.1.2 : 03/04/05-01:50:24 AM Z CST