From: Neil Brown (neilb@cse.unsw.edu.au)
Date: 01/31/01-10:30:08 PM Z
From: Neil Brown <neilb@cse.unsw.edu.au>
Date: Thu, 1 Feb 2001 15:30:08 +1100 (EST)
Message-ID: <14968.58960.594822.614268@notabene.cse.unsw.edu.au>
Subject: RE: Stateid and seqid

On Wednesday January 31, Dave.Noveck@netapp.com wrote:
> > If the operation is successful and state changes, then the stateid
> > changes.
>
> According to David Robinson, the server may in some cases return the
> same stateid (ie. not change it).

I can see, from an operational perspective, that you could argue this:
 - all operations that affect state maintenance also have a seqid to
   ensure sequencing, and
 - the operations that only have a stateid (read/write) will not be
   adversely affected by certain state changes, e.g. gaining a lock.

However, I think that if you try to understand a protocol like NFSv4
from a purely operational perspective, you will get badly confused very
quickly (well, maybe you won't, but I suspect that many would, myself
included). Complexity needs to be managed by abstraction (that is what
computer science is all about), and so we need to understand and reason
about the protocol at an abstract level.

When you are reasoning like this (abstractly), having a "stateid" that
"uniquely defines the locking state granted by the server for a
specific lock owner for a specific file" and then not changing that
stateid when the locking state changes is a recipe for disaster.

> >
> > If the operation fails and the state doesn't change, then the
> > seqid changes.
>
> The current approach is that seqid does not change if the operation
> fails. That's what people are implementing to, but there has been
> disagreement about whether the spec says that (e.g. Mike Eisler believes
> it does)
>

ButButButButButButButButButButButButButBut

Numbers in [brackets] are seqids. As our story opens, the "last
sequence number (L) received" is [L].

Client says [L+1] "lock this bit of that file".
Server says "No, there is a conflicting lock", but the reply is delayed
in transit.
As this is an error return (NFS4ERR_DENIED), the server doesn't touch
the seqid (you seem to say); it leaves it at [L].

Client is impatient and again says [L+1] "lock this bit of that file".
Client gets the first reply and says "oh well, such is life".

SomeotherClient unlocks "this bit of that file".

Server gets the resend. The RPC reply cache has been busy and it
doesn't catch this one. The seqid looks fine ([L+1], just like last
time), and the server notices that there is no conflicting lock now,
so it returns success. It updates the "last sequence number received"
to [L+1] and stores "the response that was returned".

Client doesn't notice this reply because it doesn't really care any
more. Maybe it was lost in transit anyway; it happens.

Client tries [L+1] "lock that other bit of that file". Now this has
the same seqid as the last successful lock request for this lockowner,
so "the stored response is returned": success.

Net result: the client thinks that it locked "that other bit" and the
server thinks that it has "this bit" locked. Not a good situation.

Am I missing something?
(All quotes are from RFC 3010, section 8.1.5.)

> I'm certainly curious to see if we could have something simpler (but
> not too simple). Even if people don't think it enough better to
> change v4 at this point (and that is going to be a high hurdle),
> it would be valuable as input to the next minor version.
>

I'm working on it, but it won't be today.

NeilBrown
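The lost-reply race in this message can be replayed in a small model. This is a minimal sketch, assuming a single lock owner, byte ranges named by plain strings, and a server that applies the quoted RFC 3010 section 8.1.5 text literally: a request carrying the last seqid received gets the stored response, and (per the disputed reading) the seqid only advances on success. `ToyServer`, `LockOwnerState` and `run_scenario` are hypothetical names, the status codes are just strings, and the `advance_on_error=True` variant is an assumed contrasting rule for comparison, not something this message proposes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Status names borrowed from NFSv4; here they are just strings.
OK = "NFS4_OK"
DENIED = "NFS4ERR_DENIED"
BAD_SEQID = "NFS4ERR_BAD_SEQID"


@dataclass
class LockOwnerState:
    """Per-lock-owner sequencing state in a hypothetical toy server."""
    last_seqid: int                   # "last sequence number received"
    cached_reply: str | None = None   # response stored for last_seqid
    held: set[str] = field(default_factory=set)  # ranges this owner holds


class ToyServer:
    """Toy model of seqid replay handling; not real NFSv4 server code.

    advance_on_error=False: seqid only advances on success (the reading
    under dispute). advance_on_error=True: an assumed contrasting rule
    where failed requests also advance the seqid and cache their reply.
    """

    def __init__(self, start_seqid: int, advance_on_error: bool) -> None:
        self.owner = LockOwnerState(last_seqid=start_seqid)
        self.locked_by_others: set[str] = set()
        self.advance_on_error = advance_on_error

    def lock(self, seqid: int, byte_range: str) -> str:
        st = self.owner
        if seqid == st.last_seqid and st.cached_reply is not None:
            # Replay: "the stored response is returned", without checking
            # what this request actually asks for.
            return st.cached_reply
        if seqid != st.last_seqid + 1:
            return BAD_SEQID
        if byte_range in self.locked_by_others:
            reply = DENIED
        else:
            st.held.add(byte_range)
            reply = OK
        if reply == OK or self.advance_on_error:
            st.last_seqid = seqid
            st.cached_reply = reply
        return reply


def run_scenario(advance_on_error: bool):
    """Replays the story; returns the three replies and the ranges the
    server believes this lock owner holds."""
    L = 5  # arbitrary starting "last sequence number received"
    srv = ToyServer(L, advance_on_error)
    srv.locked_by_others.add("this bit")  # a conflicting lock exists

    first = srv.lock(L + 1, "this bit")       # DENIED; reply delayed
    srv.locked_by_others.discard("this bit")  # SomeotherClient unlocks
    resend = srv.lock(L + 1, "this bit")      # missed by RPC reply cache
    # The client saw DENIED and picks its next seqid per the same rule.
    next_seqid = L + 2 if advance_on_error else L + 1
    third = srv.lock(next_seqid, "that other bit")
    return first, resend, third, srv.owner.held


if __name__ == "__main__":
    print(run_scenario(advance_on_error=False))
    print(run_scenario(advance_on_error=True))
```

With `advance_on_error=False` the client is told `NFS4_OK` for "that other bit" while the server's record shows only "this bit" held, which is the mismatch described above. Under the assumed alternative rule, the resend gets the cached `NFS4ERR_DENIED` and the next request, carrying [L+2], is processed normally. The model deliberately leaves out the RPC reply cache and stateids.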
This archive was generated by hypermail 2.1.2 : 03/04/05-01:48:27 AM Z CST