RE: V4 complexity (was re: Proposed charter addition)


From: Neil Brown (neilb@cse.unsw.edu.au)
Date: 12/19/02-10:14:43 PM Z


From: Neil Brown <neilb@cse.unsw.edu.au>
Date: Fri, 20 Dec 2002 15:14:43 +1100
Message-ID: <15874.39219.3092.703227@notabene.cse.unsw.edu.au>
Subject: RE: V4 complexity (was re: Proposed charter addition)

On Thursday December 19, Dave.Noveck@netapp.com wrote:
> 
> The other related issue is suppose you have a PUTFH - op - GETFH,
> for whatever reason.  In your method, you might return a different
> filehandle than the one you started with.  Also this strikes me
> as strange, even though I don't know of any negative consequences.
> Or maybe this isn't what you were saying.  Are you supposing that
> the file handle would be copied into the fd?
> 

No, it is what I am saying.  I think you understand exactly how I
imagine filehandles would be managed on the server.
If a server used volatile filehandles then
   PUTFH - GETFH
could well return a different filehandle than was put.  In fact the
Linux NFSv4 server would work that way.  You can convince Linux to
include the directory inode in a filehandle so that moving between
directories changes the filehandle.  It should be fairly easy to
demonstrate this with the Linux NFS server and a scriptable client.
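
A minimal sketch of the effect, assuming a toy server that remembers only
the file's inode as the "current file" and rebuilds the handle on GETFH.
The FileHandle layout, the ToyServer class and its methods are hypothetical
names for illustration, not the Linux server's actual code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    dir_ino: int   # inode of the containing directory (assumed to be encoded)
    file_ino: int  # inode of the file itself


class ToyServer:
    """Hypothetical server that regenerates the handle on every GETFH."""

    def __init__(self):
        self.parent_of = {}  # file inode -> directory inode

    def putfh(self, fh):
        # Only the file inode is kept as the "current file".
        return fh.file_ino

    def getfh(self, file_ino):
        # The handle is rebuilt from the file's current location.
        return FileHandle(self.parent_of[file_ino], file_ino)

    def rename(self, file_ino, new_dir_ino):
        self.parent_of[file_ino] = new_dir_ino


server = ToyServer()
server.parent_of[100] = 2          # file 100 lives in directory 2
old_fh = FileHandle(2, 100)        # handle the client obtained earlier
server.rename(100, 3)              # the file is moved to directory 3
new_fh = server.getfh(server.putfh(old_fh))  # PUTFH - GETFH
```

Here new_fh names the same file as old_fh but carries the new directory
inode, so the two handles compare unequal.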

> > But why would a client want to release any open without releasing the
> > delegation?  The whole point of a delegation is the client doesn't
> > have to tell the server what it is doing.
> > As you say, it "normally" "will not".  So there is no need for the
> > protocol to be able to support this.
> 
> Assume you have the process-direct-to-server model as v4 does.
> 
> So you do an open for process A and get back a delegation stateid and
> a stateid for A's open.  I'm not saying this is the only possible way 
> to model this, but given the model that V4 chose, this seems a very 
> clean way for the protocol to implement this.  I think it makes the
> protocol cleanly match the model.  You don't like the model but that's
> another story.

I think we might be violently in agreement here.
The current model makes a distinction between open and delegation, and
you rightly argue that if the distinction is in the model, then it
should be in the protocol.  I agree.

I'm saying that the distinction does not need to be in the model, in
which case it obviously wouldn't be in the protocol.  I think you
agree that a model in which the distinction is much less strong is
possible, and that with such a model, there would be no place in the
protocol for the distinction.

Then I start talking about the primacy of elegance and we start to
part company.

> 
> You don't talk about how this would work for byte-range locking.  I haven't
> thought about it in detail but I worry about the case of a POSIX client
> and non-POSIX (e.g. Windows) server.  It is part of semantics of the 
> Windows locking interface that locks are not divisible or mergeable and
> I'm wondering how you would deal with this case. 
> 

I think this issue doesn't change between NFSv4 and my vision.
Section 8.2 seems to say that it is up to the application to not try
to do anything that a WIN32 server wouldn't support, but allows for
POSIX clients talking to POSIX servers to do what they like, but
doesn't allow the client to ask the server if it is POSIX in this sense
(though I guess it could try a sub-range operation and see).

Certainly in my model the client would not merge lock requests from
different applications but would keep them separate in case the server
had a WIN32 style locking protocol.
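
A rough sketch of such client-side bookkeeping, assuming each lock is
recorded as its own (owner, start, length) entry and can only be released
as a whole.  ClientLockTable is a hypothetical name, and conflict checking
between owners is deliberately left out:

```python
class ClientLockTable:
    """Hypothetical client-side table: one entry per lock request."""

    def __init__(self):
        self.locks = []  # list of (owner, start, length), never merged

    def lock(self, owner, start, length):
        # Adjacent or overlapping ranges are kept as separate entries,
        # so each one maps onto exactly one lock held at the server.
        self.locks.append((owner, start, length))

    def unlock(self, owner, start, length):
        # WIN32-style rule: only an exact match may be released;
        # a lock is not split to free a sub-range.
        entry = (owner, start, length)
        if entry not in self.locks:
            raise ValueError("no lock with exactly this owner and range")
        self.locks.remove(entry)


table = ClientLockTable()
table.lock("appA", 0, 100)
table.lock("appB", 100, 100)   # adjacent to appA's range, still separate
```

With this layout an unlock of (appA, 0, 50) is refused rather than
splitting the first lock, which is the behaviour a WIN32-style server
would expect.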


> > > 
> > > I think I've been here before.  I didn't like having these two either.
> > > I don't consider that part of the model, but rather part of V4's troubled
> > > approach to replay protection.  I tried a lot of things and wasn't able
> > > to do significantly better.  I'm not unwilling to believe that you 
> > > succeeded where I failed (well maybe a little bit :-)
> 
> > This is how I would do sequencing:
> >    We already have a concept of a 'client'.  It is represented by a
> >    'clientid'.
> >    First, allow that we can have several 'clientid's for the one
> >    client. A new clientid is allocated (if the server is willing) with
> >    something like:
> >
> >      FORKCLIENTID: clientid -> newclientid
> >
> >   Each clientid has an associated sequence number.  It starts at 0
> >   when the clientid is created, either by SETCLIENTID or
> >   FORKCLIENTID.
> >   Each clientid  has a cached "reply" - a complete reply to
> >   some compound request.
> 
> This in general is very interesting to me.  I'm looking for some
> solution to the general sequencing/replay problem (possible exception
> of open-create-exclusive) that will work in the V4/COMPOUND environment.
> DAFS, where I first saw the beauty/value of amalgamating all that,
> does things in an entirely different way that wouldn't carry over to
> V4.
> 
> Let me explain that approach so that you can understand where I'm
> coming from.  DAFS is session-oriented and when a session is created,
> some number of streams are negotiated (this falls out of the necessity 
> to flow control rdma) and so each request has a stream number and
> a sequence number within the stream.  Then everything is sequenced
> as you want.  The replay cache (only used when reconnecting) allows
> you to remember one operation for each stream.  When you issue a 
> request for a stream, you implicitly recognize that you have received
> the previous response for the stream so the replay cache information
> is strictly bounded.  Then you don't need sequence id in the messages 
> or changing state-id or open-confirm.  It's a nice package. 

This sounds very like my model, though set in a somewhat different
context.
A 'stream number' maps to a 'clientid'.
Each stream, and each clientid, has a sequence number and a replay
cache of one entry.
As RPC doesn't recognise the existence of sessions, each request is
effectively its own session, so the replay cache in my model is
relevant for every request.
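
A minimal sketch of a per-clientid sequence number with a one-entry replay
cache, assuming the first request after creation carries seqnum 1 and a
retransmission repeats the last seqnum.  ClientSlot, ToyServer and handle()
are hypothetical names, and the "reply" is just a string:

```python
class ClientSlot:
    def __init__(self):
        self.seq = 0              # starts at 0 when the clientid is created
        self.cached_reply = None  # complete reply to the last request


class ToyServer:
    def __init__(self):
        self.slots = {}
        self.executed = 0  # how many requests were really executed

    def new_clientid(self, clientid):
        self.slots[clientid] = ClientSlot()

    def handle(self, clientid, seqnum, request):
        slot = self.slots[clientid]
        if seqnum == slot.seq and slot.cached_reply is not None:
            # Retransmission: answer from the cache, do not re-execute.
            return slot.cached_reply
        if seqnum != slot.seq + 1:
            raise ValueError("bad sequence number")
        self.executed += 1
        reply = "done:" + request
        # Sending seqnum N+1 implies reply N was received, so the
        # single cached entry can be overwritten.
        slot.seq = seqnum
        slot.cached_reply = reply
        return reply


server = ToyServer()
server.new_clientid("c1")
first = server.handle("c1", 1, "OPEN")
again = server.handle("c1", 1, "OPEN")   # retransmitted request
```

The second call returns the cached reply, so the OPEN is executed only
once, and the cache never holds more than one reply per clientid.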

> 
> I assume that you then don't rely on sequence id within operations.
> 

No.  seqid in individual operations becomes unnecessary.

> I don't really understand the barrierflag.  Why would you issue one
> of these with the barrier flag off, as opposed to just not having a 
> PUTCLIENT?

Having barrierflag not set is the analog of using a stateid but not
changing it, as is done by the WRITE operation (and READ, though
sequencing READ is less significant).

When you send a lot of write requests, you want them to be sequenced
properly between any preceding lock/open/whatever, and any following
unlock/close/etc.  However you don't need any sequencing within them
(though you might if you found a need to over-write a block without
any locking changes).
So PUTCLIENTID with barrierflag==FALSE would normally only be used with
WRITE operations.

If you are happy to have lots of clientids per client -  enough to cover
the maximum number of in-flight write requests - then barrierflag
would not be needed.
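
One possible reading of the barrier flag, sketched under the assumption
that a barrier request advances the clientid's sequence number while a
non-barrier request merely quotes the current number without changing it.
The Slot class and its return values are hypothetical, and replay handling
is omitted:

```python
class Slot:
    """Hypothetical per-clientid sequencing state."""

    def __init__(self):
        self.seq = 0

    def putclientid(self, seqnum, barrierflag):
        if barrierflag:
            # Barrier: must be the next number, and advances the sequence.
            if seqnum != self.seq + 1:
                return False
            self.seq = seqnum
            return True
        # Non-barrier: accepted only while its number is still current,
        # i.e. between the preceding and the following barrier.
        return seqnum == self.seq


slot = Slot()
lock_ok = slot.putclientid(1, True)      # LOCK, barrier
write1_ok = slot.putclientid(1, False)   # WRITE, any order
write2_ok = slot.putclientid(1, False)   # WRITE, any order
unlock_ok = slot.putclientid(2, True)    # UNLOCK, barrier
late_write_ok = slot.putclientid(1, False)  # WRITE arriving after UNLOCK
```

The two WRITEs are ordered with respect to the LOCK and UNLOCK but not
with respect to each other, and the late WRITE is refused.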

> 
> >    Note that here we are using the power of compound to separate the
> >    sequencing from the actual operation, thus making sequencing both
> >    simple and uniform.
> 
> As a die-hard anti-COMPOUNDer, I note that if you put this information
> in the RPC request, you wouldn't need it in COMPOUND.  But given that
> we have COMPOUND, it is nice that you can add this feature by using
> it.  And it fits very well in the minor versioning model.

We do have very different perspectives, don't we!!  I think COMPOUND
is great.  It's like down-loading a tiny program into the server to be
run there.  I liked some of the ideas in Brent's original proposal that
were dropped (conditionals etc).  I bet you don't :-)

> 
> >    One interesting issue that comes from combining this with the
> >    earlier discussion of state is that:
> >    a/ the client needs to sequence all operations on a particular
> >       file
> >    b/ the client may not know which file a given OPEN request is
> >       affecting until it gets the filehandle in the reply.  This
> >       makes it hard for the client to sequence the OPEN with, e.g. a
> >       separate close.
> >
> >    To deal with this, the server must enforce sequencing of operations
> >    on a given file.  One approach is to track which files have
> >    significant operations in cached replies, and not to allow further
> >    significant operations on them.  Another approach is to require the
> >    operation before and after an OPEN use the same clientid as the
> >    OPEN.  One way or the other it is fairly easy to address.
> 
> Or if you apply this to the current v4 model, then you don't have to
> do that sequencing at all.

No, because opens are per-lockowner and each lockowner must fully
serialise their requests (which makes sense).
I admit that the extra machinery in NFSv4 (lockowners/stateids) makes
this case a little easier to handle.

> 
> >
> >    Given the existence of PUTCLIENT, FORKCLIENT can just use the
> >    'current clientid' so we have:
> 
> >      FORKCLIENTID: (currentclientid) -> newclientid
> >      PUTCLIENTID: clientid seqnum barrierflag -> (currentclientid)
> >      DROPCLIENTID: clientid seqnum
> 
> >    DROPCLIENT must be sequenced by some OTHER clientid if there is
> >    one, and must give the seqnum that the dropped clientid is up to.
> 
> I like this approach.  It fits very well with COMPOUND and minor
> versioning.  Since this happens after a client-id is established,
> I think that still leaves us with more than one sequencing/replay
> mechanism (other than open-create-exclusive):
> 
>      setclient-id confirm.
> 
>      PUTCLIENT/replay cache.
> 
> I think I can live with that though.

I could probably reduce it to one, but I would only bother if creating
a completely new protocol. :-)

On the open-create-exclusive issue... I've toyed with the idea that
the create verifier could be implicit in the request instead of having
to be explicit.  e.g. it could be a hash of the 'id' in the
nfs_client_id4 structure.  This would be fairly unique among clients -
as unique as a locally generated createverf.

If this were used as a verifier, and if clients always included a
PUTCLIENTID in requests with CREATE, then mkdir, mknod etc could be
implemented as exclusive creates (as they should be) without any
explicit support in the protocol.
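
A small sketch of the idea, assuming the verifier is the first 8 bytes of
a SHA-256 hash of the client's 'id' string and that the server stores the
verifier with the object it creates.  The choice of hash, the ToyDirectory
class and the returned strings are illustrative assumptions only:

```python
import hashlib


def implicit_verifier(client_id: bytes) -> bytes:
    # 8 bytes, the same size as an explicit create verifier.
    return hashlib.sha256(client_id).digest()[:8]


class ToyDirectory:
    """Hypothetical server-side directory with exclusive-create semantics."""

    def __init__(self):
        self.entries = {}  # name -> verifier stored at creation time

    def exclusive_create(self, name, verifier):
        if name not in self.entries:
            self.entries[name] = verifier
            return "created"
        if self.entries[name] == verifier:
            # Same client retrying its own create: treat as success.
            return "already created by this client"
        return "NFS4ERR_EXIST"


d = ToyDirectory()
v1 = implicit_verifier(b"client-one")
v2 = implicit_verifier(b"client-two")
r1 = d.exclusive_create("newdir", v1)   # first attempt
r2 = d.exclusive_create("newdir", v1)   # retry from the same client
r3 = d.exclusive_create("newdir", v2)   # a different client
```

A retried create from the same client succeeds quietly, while a second
client sees the name as already existing, which is the exclusive
behaviour wanted for mkdir, mknod and friends.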

NeilBrown



This archive was generated by hypermail 2.1.2 : 03/04/05-01:50:43 AM Z CST