RE: Stateid and seqid


From: Neil Brown (neilb@cse.unsw.edu.au)
Date: 01/31/01-10:30:08 PM Z


From: Neil Brown <neilb@cse.unsw.edu.au>
Date: Thu, 1 Feb 2001 15:30:08 +1100 (EST)
Message-ID: <14968.58960.594822.614268@notabene.cse.unsw.edu.au>
Subject: RE: Stateid and seqid

On Wednesday January 31, Dave.Noveck@netapp.com wrote:
> >    If the operation is successful and state changes, then the stateid
> >      changes.
> 
> According to David Robinson, the server may in some cases return the
> same stateid (ie. not change it).

I can see, from an operational perspective, that you could argue this:
 - all operations that affect state maintenance also have a seqid to
   ensure sequencing and
 - the operations that only have a stateid (read/write) will not be
   adversely affected by certain state changes - e.g. gaining a lock.

However, I think that if you try to understand a protocol like NFSv4
from a purely operational perspective, you will get badly confused very
quickly (well, maybe you won't, but I suspect that many would, myself
included).  Complexity needs to be managed by abstraction (that is what
computer science is all about) and so we need to understand and reason
about the protocol at an abstract level.

When you are reasoning like this (abstractly), having a "stateid" that
"uniquely defines the locking state granted by the server for a
specific lock owner for a specific file" and then not changing that
stateid when the locking state changes is a recipe for disaster.
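
As a rough illustration of that point, here is a small Python sketch.
It is a toy model, not anything from the spec: the LockState class, the
integer stateids and the byte ranges are all made up for the example.
It counts how many distinct locking states end up sharing one stateid
when the stateid is, or is not, changed on every state change.

```python
import itertools


class LockState:
    """Toy model of the locking state for one lock owner on one file.

    Hypothetical: stateids are plain integers and the 'state' is just
    the set of byte ranges currently locked.
    """

    def __init__(self, always_new_stateid):
        self.always_new_stateid = always_new_stateid
        self._ids = itertools.count(1)
        self.stateid = next(self._ids)
        self.locks = frozenset()

    def add_lock(self, byte_range):
        # The locking state changes here.
        self.locks = self.locks | {byte_range}
        # Only hand out a fresh stateid if the policy says so.
        if self.always_new_stateid:
            self.stateid = next(self._ids)
        return self.stateid


def max_states_per_stateid(always_new_stateid):
    """Return the largest number of distinct states named by one stateid."""
    state = LockState(always_new_stateid)
    seen = {state.stateid: {state.locks}}
    for byte_range in [(0, 10), (10, 20)]:
        sid = state.add_lock(byte_range)
        seen.setdefault(sid, set()).add(state.locks)
    return max(len(states) for states in seen.values())
```

If the stateid changes with every state change, each stateid names
exactly one state.  If it does not, one stateid names several different
states, and it no longer "uniquely defines" anything.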


> 
> >    If the operation fails and the state doesn't change, then the
> >      seqid changes.
> 
> The current approach is that seqid does not change if the operation
> fails.  That's what people are implementing to, but there has been
> disagreement about whether the spec says that (e.g. Mike Eisler believes 
> it does)
> 

ButButButButButButButButButButButButButBut
Numbers in [brackets] are seqids.  As our story opens, the "last
sequence number (L) received" is [L].

 Client says [L+1] "lock this bit of that file".
 Server says "No, there is a conflicting lock" but reply is delayed in
      transit.  As this is an error return (NFS4ERR_DENIED), the
      server doesn't touch the seqid (you seem to say), it leaves it at
      [L]. 
 Client is impatient and again says [L+1] "lock this bit of that file".
 Client gets first reply and says "oh well, such is life".
 SomeotherClient unlocks "this bit of that file".
 Server gets resend.  The rpc reply cache has been busy and it doesn't
   catch this one. The seqid looks fine ([L+1] just like the last time), and
   the server notices that there is no conflicting lock now, so it
   returns with success. It updates the "last sequence number
   received" to [L+1] and stores "the response that was returned".
 Client doesn't notice this reply because it doesn't really care any
   more.  Maybe it was lost in transit anyway - it happens.
 Client tries to [L+1] "lock that other bit of that file".
   Now this has the same seqid as the last successful lock request
   for this lockowner, so "the stored response is returned" -
   success.

Net result: client thinks that it locked "that other bit" and server
thinks that it has "this bit" locked.  Not a good situation.
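
The sequence above can be replayed with a short Python sketch.  This is
a simplified model under stated assumptions, not server code: the
Server class, the string replies and the choice of L = 7 are invented
for the example, the RPC reply cache is assumed to miss the resend, and
the server is assumed to treat any request carrying the last seqid as a
retransmission without comparing its arguments.  The bump_on_error flag
switches between "seqid unchanged on failure" and "seqid changes on
failure".

```python
class Server:
    """Toy per-lockowner seqid handling (hypothetical model)."""

    def __init__(self, last_seqid, bump_on_error):
        self.last_seqid = last_seqid
        self.stored_reply = None
        self.bump_on_error = bump_on_error
        self.held = {}  # byte range name -> holder

    def lock(self, owner, seqid, byte_range):
        # Same seqid as the last one recorded: treat as a retransmission
        # and return the stored response (arguments are not compared).
        if seqid == self.last_seqid and self.stored_reply is not None:
            return self.stored_reply
        if seqid != self.last_seqid + 1:
            return "BAD_SEQID"
        holder = self.held.get(byte_range)
        if holder is not None and holder != owner:
            if self.bump_on_error:
                self.last_seqid = seqid
                self.stored_reply = "DENIED"
            return "DENIED"
        self.held[byte_range] = owner
        self.last_seqid = seqid
        self.stored_reply = "OK"
        return "OK"


def run(bump_on_error):
    """Replay the scenario; return (client's view, server's view)."""
    L = 7
    server = Server(L, bump_on_error)
    server.held["this bit"] = "other client"
    seq = L + 1

    first = server.lock("client", seq, "this bit")  # denied, reply delayed
    del server.held["this bit"]                     # other client unlocks
    server.lock("client", seq, "this bit")          # resend; reply ignored

    # The client only ever saw the first reply.  It moves to the next
    # seqid only if that reply consumed one under the rule in force.
    if first == "OK" or bump_on_error:
        seq += 1
    last = server.lock("client", seq, "that other bit")

    client_view = {"that other bit"} if last == "OK" else set()
    server_view = {r for r, h in server.held.items() if h == "client"}
    return client_view, server_view
```

In this model run(False) leaves the client believing it holds "that
other bit" while the server records "this bit", and run(True) leaves
both sides agreeing on "that other bit".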

Am I missing something?

(all quotes from rfc3010 section 8.1.5).

> 
> I'm certainly curious to see if we could have something simpler (but
> not too simple).  Even if people don't think it enough better to
> change v4 at this point (and that is going to be a high hurdle),
> it would be valuable as input to the next minor version.
> 
I'm working on it, but it won't be today.

NeilBrown


