RE: comments on draft-ietf-nfsv4-06.txt


From: Noveck, Dave (Dave.Noveck@netapp.com)
Date: 04/14/00-02:48:41 PM Z


Message-ID: <641BC3DDCEB3D3118C3700902745938E8AB0AF@black.eng.netapp.com>
From: "Noveck, Dave" <Dave.Noveck@netapp.com>
Subject: RE: comments on draft-ietf-nfsv4-06.txt
Date: Fri, 14 Apr 2000 12:48:41 -0700

Vern Paxson wrote:
> 
> >    The server implementor needs to be careful in developing a migration
> >    solution.  The server must consider all of the state information
> >    clients may have outstanding at the server.  This includes but is not
> >    limited to locking/share state, delegation state, and asynchronous
> >    file writes which are represented by WRITE and COMMIT verifiers.  The
> 
> Are there any others known besides the "includes but is not limited to"
> list?
> It's the "not limited to" that threw me ...

None that I can think of.  I was just being over-careful.

> 
>   
> 
> >    If the server determines that the client holds no associated state
> >    for its clientid and no activity from that client has been received
> >    some long period of time, the server may choose to release the
> 
> Need advice on the suggested "long period of time".

Actually, the server may make this choice immediately (with no long 
period of inactivity), but he would be unwise to do so.

I would strike "and no activity from that client has been received
some long period of time,".

Then I would add to the end of that paragraph, "Typically a server would
not release a clientid unless there had been no activity from that 
client for many minutes."
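
A minimal sketch of that policy in Python.  The function name, its
parameters, and the ten-minute threshold are assumptions made for
illustration; the text above says only "many minutes".

```python
# Hypothetical idle threshold; "many minutes" is the only guidance given.
IDLE_THRESHOLD_SECONDS = 10 * 60


def may_release_clientid(has_state, last_activity, now,
                         idle_threshold=IDLE_THRESHOLD_SECONDS):
    """Return True if a cautious server would release the clientid.

    has_state      -- True if the client holds any state for this clientid
    last_activity  -- time (seconds) of the last request from the client
    now            -- current time (seconds), same clock as last_activity
    """
    if has_state:
        # A clientid with associated state is never a candidate.
        return False
    # Permitted immediately, but a prudent server waits out the idle period.
    return (now - last_activity) >= idle_threshold
```

The idle check is policy rather than a protocol requirement, so it is
kept as a tunable parameter.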

>  
> 
> >    If a request with a previous sequence number (r < L) is received, it
> >    is rejected with the return of error NFS4ERR_BAD_SEQID.  Given a
> >    properly-functioning client, the response to (r) must have been
> >    received before the last request (L) was sent.  If a duplicate of
> 
> Can't there in principle be concurrent RPCs, which means that a response
> might not have been received before the last request was sent?

I also found this hard to understand until Dave Robinson clued me in.
I think the restriction that the client should only have one request
with a sequence number outstanding for each owner should be explained
up front.  I think that would make subsequent stuff easier to understand.
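
With only one sequenced request outstanding per owner, the server-side
check becomes easy to state.  A rough sketch in Python follows.  Only
the r < L rejection appears in the quoted text; treating r == L as a
replay answered from a saved response, and r == L + 1 as the next new
request, is my assumption here, and seqid wraparound is ignored.  The
class and method names are made up for illustration.

```python
NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"


class OwnerSeqState:
    """Per-owner sequencing state kept by the server (illustrative only)."""

    def __init__(self, last_seqid, last_response):
        self.last_seqid = last_seqid        # L: last sequence number accepted
        self.last_response = last_response  # saved reply to request L

    def handle(self, r, execute):
        """Process a request carrying sequence number r.

        execute is a callable that performs the operation and returns
        its response; it is invoked only for a new request.
        """
        if r == self.last_seqid + 1:
            # Next request in sequence: execute it and remember the reply.
            response = execute()
            self.last_seqid = r
            self.last_response = response
            return response
        if r == self.last_seqid:
            # Assumed replay handling: return the saved reply, do not re-execute.
            return self.last_response
        # r < L (stale) or r > L + 1 (gap): reject.
        return NFS4ERR_BAD_SEQID
```

Because the client waits for the reply to (r) before sending the next
sequenced request for that owner, a stale r can only come from a
misbehaving client or a long-delayed retransmission.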

>   
> >    Clients should be prepared for the return of NFS4ERR_GRACE errors for
> >    non-reclaim lock and I/O requests.  In this case the client should
> >    employ a backoff and retry mechanism for the request.  Timeout
> 
> Need more specifics explaining how the backoff should work.  (Should it
> be exponential?)

Actually, I think we should just strike "backoff and" and replace the
following sentence with "A delay (on the order of several seconds) between
retries should be used to avoid overwhelming the server."
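
A sketch of such a client retry loop in Python, using a fixed delay
rather than a backoff.  The five-second delay, the attempt limit, and
the function and parameter names are assumptions for illustration only.

```python
import time

NFS4ERR_GRACE = "NFS4ERR_GRACE"


def retry_during_grace(send_request, delay_seconds=5.0, max_attempts=20,
                       sleep=time.sleep):
    """Resend a non-reclaim request until the server leaves its grace period.

    send_request -- callable that issues the request and returns its status
    sleep        -- injectable for testing; defaults to time.sleep
    """
    for attempt in range(max_attempts):
        status = send_request()
        if status != NFS4ERR_GRACE:
            return status
        if attempt + 1 < max_attempts:
            # Fixed delay between retries, to avoid overwhelming the server.
            sleep(delay_seconds)
    # Still in grace after max_attempts; let the caller decide what to do.
    return NFS4ERR_GRACE
```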

> 
> >    by the client.  As a result of revocation, the client will receive an
> >    error of NFS4ERR_EXPIRED and the error is received within the lease
> >    period for the lock.  In this instance the client may assume that
> 
> Seems there's a race here - "within the lease period of the lock" isn't
> tightly defined.  This is worth commenting on even if it really shouldn't
> come up in practice (in which case, that's part of the comment).

I think this would be clearer if the second and third cases in this
discussion were interchanged.  In what is now the third case, the client
determines that the lease period may have expired.  If in fact it did
not expire, then normally he will just revalidate all of his locks
successfully.  In the case in which the server revoked a lock without
lease expiration, he will be unsure whether the lease expired and he
was unable to revalidate the lock before a conflicting lock was given
to someone else, or whether the lock was revoked without lease
expiration.  It doesn't matter to him; he only cares that he lost
his lock.

The point here is that even when the client determines that lock
expiration could not possibly have occurred (taking into account
all worst-case propagation delays), he can still get EXPIRED and
in that case can infer that his lock was lost due to revocation.
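
One way a client might make that determination, sketched in Python.
The assumptions are mine: times are taken on the client's clock, the
lease is measured from when the client sent its last successfully
answered renewing request (the server cannot have started the lease
earlier than that), and safety_margin is a hypothetical allowance for
clock drift.

```python
def lease_could_have_expired(renewal_sent_at, error_received_at,
                             lease_period, safety_margin=0.0):
    """Conservatively decide whether the lease might have run out."""
    elapsed = error_received_at - renewal_sent_at
    return elapsed >= lease_period - safety_margin


def interpret_expired(renewal_sent_at, error_received_at,
                      lease_period, safety_margin=0.0):
    """Classify an NFS4ERR_EXPIRED reply; the lock is lost either way."""
    if lease_could_have_expired(renewal_sent_at, error_received_at,
                                lease_period, safety_margin):
        # Cannot tell expiration from revocation, and need not.
        return "lease-expired-or-revoked"
    # The lease cannot have expired, so the server revoked the lock.
    return "revoked"
```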

> 
> 
> >    with these delegations will need to wait.  This behavior is
> >    consistent with the normal recall process may take significant time
> >    because of the client's need to flush state to the server.  This
> 
> "This behavior" sentence is quite hard to parse.

How about replacing it with:  "Because the normal recall process may
require significant time for the client to flush changed state to
the server, other clients need to be prepared in any case for delays
that occur because of a conflicting delegation."

> 
> >         achieve its purpose.  The other aspect to flushing the data
> >         before close is that the data must be committed to stable
> >         storage before the CLOSE operation is requested by the client.
> 
> It's unclear whether this means committed to stable storage *locally at
> the client* or *on the server*.  Presumably, it's the latter, since the
> client might not have any stable storage.  This occurs again later:

Yes, on the server.

> 
> >    The data that is written to the server as a pre-requisite to the
> >    unlocking of a region must be written to stable storage.  The client

Also, on the server.

> 
> >    o    For a write open delegation, if, at the time of recall, if the
> >         file is not open for write, all modified data for the file must
> >         be flushed to the server.  If the delegation had not existed,
> 
> "if ... if" in the first sentence scans awkwardly.

I think the second "if" should be removed.
 
> 
> >         ... If the delegation had not existed,
> >         the client would have done this data flush before the CLOSE
> >         operation.
> 
> This is at odds with the last bullet:
> 
> >    o    Any modified data for the file needs to be flushed to the the
> >         server.
> 
> which sounds like the flush is done anyway.

Good point.  What the client would have done before the CLOSE is flush
the data to the server and force it to stable storage (it must be
committed), as was discussed a bit above.

In the second case, you can just do the write (but don't have to commit).
Since you have the file open for write, you have a stateid with which to
redo asynchronous writes, at least until you do a CLOSE, at which point
the rule discussed above applies.
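
The distinction between the two cases, as a small Python sketch.  The
function name and the representation of steps as operation names are
assumptions for illustration.

```python
def recall_flush_steps(file_open_for_write):
    """Operations the client applies to modified data at delegation recall.

    file_open_for_write -- True if the client still has the file open
                           for write when the recall arrives
    """
    if file_open_for_write:
        # An open stateid remains, so asynchronous writes can be redone
        # later; committing can wait until the eventual CLOSE.
        return ["WRITE"]
    # No open remains: same obligation as before a CLOSE, so the data
    # must also reach stable storage on the server.
    return ["WRITE", "COMMIT"]
```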

> 
> Also, "the the" -> "the"
> 
> >    In the case of modified data existing in the client's cache, that
> >    data must be removed from the client without it being written to the
> >    server.
> 
> Explain why this is necessary.

Since your lock was revoked, a conflicting lock may have been given to
another client.  If you were allowed to write the data based on your
old locks, the assumptions that other clients validly make about the
data would be violated.  This applies even if another client no longer
holds a conflicting lock or you use the special stateids to write.
Once a lock has been revoked, the modified data that you put into
your cache while you held it should not be written to the server.
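
In client terms, that rule might look like the following Python sketch.
The class, its fields, and the send_write callback are hypothetical;
a real client cache is considerably more involved.

```python
class CachedFile:
    """Client-side cache of modified data written under a lock."""

    def __init__(self):
        self.dirty = {}         # offset -> bytes not yet sent to the server
        self.lock_valid = True  # False once the server revokes the lock

    def write_cached(self, offset, data):
        self.dirty[offset] = data

    def on_lock_revoked(self):
        """Drop modified data instead of writing it; return blocks dropped."""
        self.lock_valid = False
        dropped = len(self.dirty)
        self.dirty.clear()
        return dropped

    def flush(self, send_write):
        """Send dirty blocks via send_write(offset, data); return count sent."""
        if not self.lock_valid:
            # Never write data cached under a revoked lock.
            return 0
        sent = 0
        for offset in sorted(self.dirty):
            send_write(offset, self.dirty[offset])
            sent += 1
        self.dirty.clear()
        return sent
```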

> 
>   
>  
> 
> >      This operation should also be used by clients which do have
> >      delegation information on stable storage after doing all of
> >      delegation recovery that is needed.  Using DELEGPURGE will prevent
> 
> I found this sentence confusing.

How about this paragraph instead:

     This operation should be used by clients which do record delegation
     information on stable storage on the client.  In this case, DELEGPURGE
     should be issued immediately after doing delegation recovery on all
     delegations known to the client.  Doing so will notify the server that
     no additional delegations for the client will be recovered, allowing
     it to free resources and avoid delaying other clients who make
     requests that conflict with the unrecovered delegations.  The set of
     delegations known to the server and the client may be different, since
     a client may fail after making a request that resulted in a delegation
     but before it received the results and committed them to its own
     stable storage.
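
The ordering I have in mind, as a short Python sketch.  The reclaim and
delegpurge callables stand in for whatever the client uses to issue
those requests; they and the function name are assumptions for
illustration.

```python
def recover_delegations(recorded, reclaim, delegpurge):
    """Reclaim every delegation the client recorded, then purge the rest.

    recorded   -- delegations read back from the client's stable storage
    reclaim    -- callable(delegation) -> True if the reclaim succeeded
    delegpurge -- callable() issuing DELEGPURGE to the server
    """
    recovered = [d for d in recorded if reclaim(d)]
    # Issued only after all known delegations have been attempted; tells
    # the server that no further delegations will be recovered.
    delegpurge()
    return recovered
```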
     
>  
> > 15.2.  Procedure 1: CB_COMPOUND - Compound Operations
> > 
> > ...
> >    ERRORS
> > 
> >      All errors defined in the protocol
> 
> (actually, it can't return them all, or even close ...)

True enough.  We can substitute the union of the errors defined for
each of the callback ops.

> 
> >      The client returns attrbits and the associated attribute values
> >      only for attributes that it may change (change, time_modify,
> >      object_size).  It may further limit the response to attributes that
> >      it has in fact changed during the scope of the delegation.
> 
> Why would it bother limiting the response to just those that have
> changed?  Surely this is a very minor savings ... ?

You're right.  I would strike the last sentence.

 


