From: Noveck, Dave (Dave.Noveck@netapp.com)
Date: 04/14/00-02:48:41 PM Z
Message-ID: <641BC3DDCEB3D3118C3700902745938E8AB0AF@black.eng.netapp.com>
From: "Noveck, Dave" <Dave.Noveck@netapp.com>
Subject: RE: comments on draft-ietf-nfsv4-06.txt
Date: Fri, 14 Apr 2000 12:48:41 -0700

Vern Paxson wrote:

> > The server implementor needs to be careful in developing a migration
> > solution. The server must consider all of the state information
> > clients may have outstanding at the server. This includes but is not
> > limited to locking/share state, delegation state, and asynchronous
> > file writes which are represented by WRITE and COMMIT verifiers. The
>
> Are there any others known besides the "includes but is not limited to" list?
> It's the "not limited to" that threw me ...

None that I can think of. I was just being over-careful.

> > If the server determines that the client holds no associated state
> > for its clientid and no activity from that client has been received
> > some long period of time, the server may choose to release the
>
> Need advice on the suggested "long period of time".

Actually, the server may make this choice immediately (with no long
period of inactivity), but he would be unwise to do so. I would strike
"and no activity from that client had been received for some long period
of time,". Then I would add to the end of that paragraph, "Typically a
server would not release a clientid unless there had been no activity
from that client for many minutes."

> > If a request with a previous sequence number (r < L) is received, it
> > is rejected with the return of error NFS4ERR_BAD_SEQID. Given a
> > properly-functioning client, the response to (r) must have been
> > received before the last request (L) was sent. If a duplicate of
>
> Can't there in principle be concurrent RPCs, which means that a response
> might not have been received before the last request was sent?

I also found this hard to understand until Dave Robinson clued me in.
I think the restriction that the client should only have one request
with a sequence number outstanding for each owner should be explained
up front. I think that would make subsequent stuff easier to
understand.

> > Clients should be prepared for the return of NFS4ERR_GRACE errors for
> > non-reclaim lock and I/O requests. In this case the client should
> > employ a backoff and retry mechanism for the request. Timeout
>
> Need more specifics explaining how the backoff should work. (Should it
> be exponential?)

Actually, I think we should just strike "backoff and" and replace the
sentence following with "A delay (on the order of several seconds)
between retries should be used to avoid overwhelming the server."

> > by the client. As a result of revocation, the client will receive an
> > error of NFS4ERR_EXPIRED and the error is received within the lease
> > period for the lock. In this instance the client may assume that
>
> Seems there's a race here - "within the lease period of the lock" isn't
> tightly defined. This is worth commenting on even if it really shouldn't
> come up in practice (in which case, that's part of the comment).

I think this would be clearer if the second and third cases in this
discussion were interchanged. In what is now the third case, the client
determines that the lease period may have expired. If in fact it did
not expire, then normally he will just revalidate all of his locks
successfully. In the case in which the server revoked a lock without
lease expiration, he will be unsure about whether the lease expired
and he was unable to revalidate it before a conflicting lock was given
to someone else, or whether it was revoked without lease expiration.
It doesn't matter to him; he only cares that he lost his lock.
The point here is that even when the client determines that lock
expiration could not possibly have occurred (taking into account all
worst-case propagation delays), he can still get EXPIRED and in that
case can infer that his lock was lost due to revocation.

> > with these delegations will need to wait. This behavior is
> > consistent with the normal recall process may take significant time
> > because of the client's need to flush state to the server. This
>
> "This behavior" sentence is quite hard to parse.

How about replacing it with: "Because the normal recall process may
require significant time for the client to flush changed state to the
server, other clients need to be prepared for delays that occur because
of a conflicting delegation, in any case."

> > achieve its purpose. The other aspect to flushing the data
> > before close is that the data must be committed to stable
> > storage before the CLOSE operation is requested by the client.
>
> It's unclear whether this means committed to stable storage *locally at
> the client* or *on the server*. Presumably, it's the latter, since the
> client might not have any stable storage. This occurs again later:

Yes, on the server.

> > The data that is written to the server as a pre-requisite to the
> > unlocking of a region must be written to stable storage. The client

Also, on the server.

> > o For a write open delegation, if, at the time of recall, if the
> >   file is not open for write, all modified data for the file must
> >   be flushed to the server. If the delegation had not existed,
>
> "if ... if" in the first sentence scans awkwardly.

I think the second "if" should be removed.

> > ... If the delegation had not existed,
> >   the client would have done this data flush before the CLOSE
> >   operation.
>
> This is at odds with the last bullet:
>
> > o Any modified data for the file needs to be flushed to the the
> >   server.
>
> which sounds like the flush is done anyway.

Good point.
What the client would have done before the CLOSE is flush the data to
the server and force it to stable storage (it must be committed), as was
discussed a bit above. In the second case, you can just do the write
(but don't have to commit): since you have a file open for write, you
have a stateid to redo asynchronous writes, at least until you do a
CLOSE, and then the rule discussed above applies.

>
> Also, "the the" -> "the"
>
> > In the case of modified data existing in the client's cache, that
> > data must be removed from the client without it being written to the
> > server.
>
> Explain why this is necessary.

Since your lock was revoked, a conflicting lock may have been given to
another client. If you were allowed to write the data based on your old
locks, the assumptions that other clients validly make about the data
would be violated. This applies even if another client no longer holds
a conflicting lock or you use the special stateids to write. Once a
lock has been revoked, the modified data that you put into your cache
when you had it should not be written to the server.

> > This operation should also be used by clients which do have
> > delegation information on stable storage after doing all of
> > delegation recovery that is needed. Using DELEGPURGE will prevent
>
> I found this sentence confusing.

How about this paragraph instead:

This operation should be used by clients which do record delegation
information on stable storage on the client. In this case, DELEGPURGE
should be issued immediately after doing delegation recovery on all
delegations known to the client. Doing so will notify the server that
no additional delegations for the client will be recovered, allowing it
to free resources and avoid delaying other clients who make requests
that conflict with the unrecovered delegations.
The set of delegations known to the server and the client may be
different, since a client may fail after making a request that resulted
in delegation but before it received the results and committed them to
its own stable storage.

> > 15.2. Procedure 1: CB_COMPOUND - Compound Operations
> >
> > ...
> > ERRORS
> >
> > All errors defined in the protocol
>
> (actually, it can't return them all, or even close ...)

True enough. We can substitute the union of the errors defined for
each of the callback ops.

> > The client returns attrbits and the associated attribute values
> > only for attributes that it may change (change, time_modify,
> > object_size). It may further limit the response to attributes that
> > it has in fact changed during the scope of the delegation.
>
> Why would it bother limiting the response to just those that have
> changed? Surely this is a very minor savings ... ?

You're right. I would strike the last sentence.
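
For reference, the sequence-number rule discussed earlier in this
message (r < L is rejected with NFS4ERR_BAD_SEQID) can be sketched
roughly as below. This is only an illustration, not text from the draft.
It assumes at most one outstanding sequenced request per owner, as
described above. The class and method names are hypothetical. The
handling of a duplicate (r == L) as a replay of the stored response, and
the rejection of anything beyond L + 1, are assumptions, since the quoted
draft text is cut off at that point. Sequence-number wraparound is
ignored.

```python
NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"


class OwnerSeqState:
    """Hypothetical per-owner sequencing state kept by a server."""

    def __init__(self, last_seqid=0):
        self.last_seqid = last_seqid   # L: last sequence number processed
        self.cached_reply = None       # stored response to request L

    def handle(self, seqid, execute):
        """Apply the sequence check to request r = seqid.

        `execute` is a callable that performs the operation and
        returns its reply; it runs only for the next expected request.
        """
        # Assumption: a duplicate of the last request (r == L) gets the
        # stored response back without re-executing the operation.
        if seqid == self.last_seqid and self.cached_reply is not None:
            return self.cached_reply
        # r < L (quoted draft text) or r > L + 1 (assumption): reject.
        if seqid != self.last_seqid + 1:
            return NFS4ERR_BAD_SEQID
        # r == L + 1: the next request in sequence, so run it once.
        reply = execute()
        self.last_seqid = seqid
        self.cached_reply = reply
        return reply
```

Because the client has only one such request outstanding per owner, the
server needs to remember just one reply per owner, and a retransmission
after a lost response does not execute the operation a second time.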