From: Noveck, Dave (Dave.Noveck@netapp.com)
Date: 11/08/02-01:16:58 PM Z
Message-ID: <C8CF60CFC4D8A74E9945E32CF096548A0708E8@SILVER.nane.netapp.com>
From: "Noveck, Dave" <Dave.Noveck@netapp.com>
Subject: RE: OPEN_CONFIRM implementation issues
Date: Fri, 8 Nov 2002 11:16:58 -0800

First, some philosophy. I tend to lean toward a more restrictive implementation than Spencer does. We have to obey the spec, but where it isn't clear, my tendency is to read it relatively strictly. I think doing so helps clients find their bugs faster.

This is also affected by the phase of protocol implementation one is in. While the initial servers are being implemented, a strict interpretation helps establish more uniform client behavior. Later servers are probably best advised to lean to the liberal side, since they will have to deal with clients that may have been developed against more permissive servers.

To see how I relate this to OPEN_CONFIRM, let's first consider a further case, e), which we may even agree on. Suppose someone does a LOCK and then passes the stateid returned by LOCK to OPEN_CONFIRM. The spec doesn't explicitly say that you must not do this. However, the point of OPEN_CONFIRM is to confirm an OPEN, and the discussion of OPEN_CONFIRM is entirely in those terms. Also, the synopsis lists the parameter as open_stateid. Given that, doing an OPEN_CONFIRM with the stateid from LOCK looks to me like using an inappropriate type of stateid for the operation, which I always treat as BAD_STATEID. For example, this is what I do when you do a CLOSE on a lock stateid. My impression had been that the definition of BAD_STATEID included a clause to deal with such a case. I don't see it in the current spec, but the only alternative is INVAL, which seems really bogus. What do you return in the CLOSE(lock_stateid) case? I guess you could treat it as a no-op and return OK, but that seems very troublesome, as it is particularly prone to hiding client bugs.

From a practical point of view, the client who is doing this (i.e. OPEN_CONFIRM of a lock stateid) has a bug and would probably want to know about it, and getting BAD_STATEID tells him. Either he is pointlessly adding RPCs, which is not good for him in high-latency environments (a V4 goal is better performance in such environments), or he meant to confirm the OPEN but is sending the wrong stateid. In either case, being liberal and treating the operation as a no-op just makes it harder to find the problem.

I consider cases c) and d) below in exactly the same way. The spec doesn't forbid them, but the entire discussion of OPEN_CONFIRM deals with confirming an open as the next operation in that owner's sequence. Indeed, the spec discusses what happens if something other than a confirm is done (getting rid of the OPEN). So again, doing an OPEN_CONFIRM past the appropriate time seems like using a stateid that is not appropriate to the operation being performed (i.e. because it is not the one returned by OPEN). So I return BAD_STATEID in this case. The same practical considerations also apply as for e): the client has some sort of bug and probably wants to find out about it sooner rather than later.

I think a) and b) are closer cases, but I would still follow the same line of reasoning. The spec says that OPEN_CONFIRM is "used to confirm the sequence id usage for the first time that a open_owner is used by a client." This is unfortunately ambiguous. There are three viewpoints from which to interpret "first time": that of an omniscient observer, the client's, and the server's. It is particularly unfortunate that the spec's wording seems to lean toward the client's view, while the design of OPEN_CONFIRM is based on the server's point of view. That is, confirmation may be required whenever it appears to the server that this is the first use of an owner. That can happen either because it really is the first use, or because the server has lost its knowledge of that owner after recycling the information it kept about it.
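A minimal sketch of the strict check argued for above may make the reasoning concrete. This is a hypothetical toy model, not code from any real server. The names StateKind, StateEntry and open_confirm are invented for illustration, a stateid is reduced to an opaque string key, and seqid and replay handling are left out entirely. The model rejects a LOCK stateid (case e) with BAD_STATEID. It does the same for any OPEN stateid on which the server is not waiting for a confirmation (cases a through d).

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "NFS4_OK"
    BAD_STATEID = "NFS4ERR_BAD_STATEID"


class StateKind(Enum):
    OPEN = "open"
    LOCK = "lock"


@dataclass
class StateEntry:
    """Hypothetical server-side record for one stateid."""
    kind: StateKind
    awaiting_confirm: bool  # True only while an OPEN's confirm-bit is pending


def open_confirm(table: dict, stateid: str) -> Status:
    """Strict OPEN_CONFIRM check (toy model; no seqid or replay handling)."""
    entry = table.get(stateid)
    if entry is None:
        return Status.BAD_STATEID
    # Case e): a LOCK stateid is the wrong type of stateid for OPEN_CONFIRM.
    if entry.kind is not StateKind.OPEN:
        return Status.BAD_STATEID
    # Cases a) to d): the server is not waiting for a confirm on this open,
    # so this stateid is not one that OPEN_CONFIRM applies to.
    if not entry.awaiting_confirm:
        return Status.BAD_STATEID
    entry.awaiting_confirm = False
    return Status.OK


table = {
    "open-1": StateEntry(StateKind.OPEN, awaiting_confirm=True),
    "open-2": StateEntry(StateKind.OPEN, awaiting_confirm=False),
    "lock-1": StateEntry(StateKind.LOCK, awaiting_confirm=False),
}
assert open_confirm(table, "lock-1") is Status.BAD_STATEID  # case e)
assert open_confirm(table, "open-2") is Status.BAD_STATEID  # case a)
assert open_confirm(table, "open-1") is Status.OK           # required confirm
assert open_confirm(table, "open-1") is Status.BAD_STATEID  # case c)
```

A liberal server would return OK in some of these branches instead. The strict version's point is that each mistaken request surfaces to the client as an error.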
Given that the client sends the OPEN_CONFIRM but the need for it depends on server knowledge, the protocol provides the open-confirm-bit in the OPEN response. So the design of OPEN_CONFIRM relies on the client using the open-confirm-bit to decide whether a confirm is to be done. Certainly, if the open-confirm-bit is set, the client must do one. However, I would also argue that where a confirm is not required, the OPEN is in precisely the same state as in c) or d). In those cases, because the open has already been confirmed, I consider OPEN_CONFIRM inappropriate for that stateid. I think it is reasonable to treat an OPEN for which confirmation is not required in exactly the same way. The server is saying that this is not the initial sequence for a new owner, which makes OPEN_CONFIRM an inappropriate operation. The spec discusses the use of OPEN_CONFIRM only in the context of such initial use of a lockowner, and none of that discussion makes any sense in any other context.

Again, in practical terms, treating this as a no-op is liable to hide bugs. If a client is sending confirm requests when the open-confirm-bit is not set, then it's quite likely that he may not send them when the open-confirm-bit is set. He may have coded something on the order of "if (open_confirm_bit || phase_of_the_moon == FULL) open_confirm()", but two other explanations are more likely. He may always be sending an OPEN_CONFIRM after every OPEN (grossly inefficient). Even more likely, he may have misinterpreted "first use" in the spec to mean the *client's* first use. If that's the case, then at some point after a period of inactivity, the server will recycle the owner. At that point he isn't going to do an OPEN_CONFIRM, even when the server requires it, since from the client's point of view this is not a new owner. Since the logic of OPEN_CONFIRM is that the server's view prevails, enforcing it from the beginning just seems better.
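The failure mode described above, a client keying on its own notion of "first use", can be shown with a small hypothetical model. ToyServer and open_file are invented names. The server remembers only the open-owners it has seen confirmed and may discard one after inactivity. Seqids, stateids and errors are omitted. open_file either trusts the open-confirm-bit or confirms only on the client's own first use of the owner. In the published NFSv4.0 specification this bit appears as the OPEN4_RESULT_CONFIRM flag in the OPEN result's rflags. The model simply treats it as a boolean.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical server that remembers only open-owners it has confirmed."""
    known_owners: set = field(default_factory=set)

    def open(self, owner: str) -> bool:
        """Return the open-confirm-bit: set when the owner is new to the server."""
        return owner not in self.known_owners

    def open_confirm(self, owner: str) -> None:
        self.known_owners.add(owner)

    def recycle(self, owner: str) -> None:
        # After a period of inactivity the server may discard an owner's state.
        self.known_owners.discard(owner)


def open_file(server: ToyServer, owner: str, seen: set, trust_bit: bool):
    """Do one OPEN; return (bit_was_set, confirm_sent)."""
    bit = server.open(owner)
    if trust_bit:
        send = bit                # Correct: the server's view prevails.
    else:
        send = owner not in seen  # Suspected bug: the client's own first use.
    seen.add(owner)
    if send:
        server.open_confirm(owner)
    return bit, send


srv, seen = ToyServer(), set()
print(open_file(srv, "owner-A", seen, trust_bit=False))  # (True, True)
srv.recycle("owner-A")
print(open_file(srv, "owner-A", seen, trust_bit=False))  # (True, False)
```

On the first OPEN both views agree, so the bug stays hidden. After the server recycles the owner, the bit is set again, but the client-view variant sends no confirmation. With trust_bit=True the same sequence yields (True, True) both times.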
Of course, writing the server, I would think that :-)

I think this is an area where the spec is not very clear and opinions will differ. I'm not sure that it is so bad for servers to differ, as long as all of the cases that they differ on are things that clients don't really want to do.

-----Original Message-----
From: Eric Kustarz [mailto:Eric.Kustarz@eng.sun.com]
Sent: Thursday, November 07, 2002 5:53 PM
To: nfsv4-wg@sunroof.eng.sun.com
Subject: OPEN_CONFIRM implementation issues

There seems to be an unresolved issue with respect to OPEN_CONFIRM... What if the server doesn't ask for an OPEN_CONFIRM but the client sends one anyway? What should the server do?

Dave Noveck identified four possible cases:

a) The file was just opened and you were not told to open-confirm.
b) The file was just opened and you were told not to open-confirm and it is wrong to open-confirm (create or delegation present).
c) The file was opened and the open was already confirmed, but no other IO was done.
d) The file was opened and confirmed if necessary, and IO was done.

Here's how Dave is handling those cases:

"So I'm saying BAD_STATEID on all of a-d, with the exception that if the stateid takes you to the owner and the seqid indicates that this is a DUP then you would get OK."

The official SUN stance is to apply the Shepler principle that the "server should be flexible in what it accepts". So we currently have the server returning NFS4_OK for a) and c). We would like to see why, in b) and d), an OPEN_CONFIRM is considered "wrong". Dave?

Dave also had this side note:

"One final point: the spec lists OLD_STATEID in the error list, although I'm hard-pressed to figure out when you might actually return it. If you are returning OK to c-d, then this opens the door to OLD_STATEID as well, although returning OK to c-d seems very weird to me."

eric
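Dave's rule as quoted in Eric's message is BAD_STATEID for cases a) through d), except that a duplicate request (same seqid) gets OK. It can be sketched in Python as follows. This is a hypothetical model, not any real server's code. OpenOwner caches only the reply to the last OPEN_CONFIRM, and the sketch assumes that a BAD_STATEID reply does not consume the owner's seqid.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "NFS4_OK"
    BAD_STATEID = "NFS4ERR_BAD_STATEID"
    BAD_SEQID = "NFS4ERR_BAD_SEQID"


@dataclass
class OpenOwner:
    """Hypothetical per-owner state: last seqid and cached OPEN_CONFIRM reply."""
    last_seqid: int
    pending_confirm: bool = False
    last_reply: Optional[Status] = None


def strict_open_confirm(owner: OpenOwner, seqid: int) -> Status:
    # DUP: a retransmission of the last OPEN_CONFIRM gets its cached reply.
    # (A real server would also replay a cached OPEN reply; omitted here.)
    if seqid == owner.last_seqid and owner.last_reply is not None:
        return owner.last_reply
    if seqid != owner.last_seqid + 1:
        return Status.BAD_SEQID
    # Cases a) to d): no confirm is pending. Assumed not to consume the seqid.
    if not owner.pending_confirm:
        return Status.BAD_STATEID
    owner.pending_confirm = False
    owner.last_seqid = seqid
    owner.last_reply = Status.OK
    return Status.OK


owner = OpenOwner(last_seqid=1, pending_confirm=True)  # OPEN used seqid 1
print(strict_open_confirm(owner, 2))  # Status.OK
print(strict_open_confirm(owner, 2))  # Status.OK (duplicate request)
print(strict_open_confirm(owner, 3))  # Status.BAD_STATEID (case c)
```

The replay check comes first, so a retransmitted confirm that already succeeded still gets OK. A fresh OPEN_CONFIRM after confirmation does not.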