From: David Robinson (robinson@jetsun.eng.sun.com)
Date: 09/28/98-10:20:29 PM Z
Date: Mon, 28 Sep 1998 20:20:29 -0700 (PDT)
From: David Robinson <robinson@jetsun.eng.sun.com>
Message-Id: <199809290320.UAA00558@jetsun.eng.sun.com>
Subject: Re: locking errors

> After reading Carl's comments, it is clear how much we need to drop UDP.
>
> The document I am writing discusses UDP-only notifications.
>
> Because TCP is used, if a client drops the connection it has a certain
> amount of time (e.g. 15 minutes) to reconnect before its locks are dropped.
>
> If the server reboots, the clients have 15 minutes to reconnect before
> locks are dropped.
>
> A client then reconnects, re-authenticates, and gets its locks back if it
> is within the timeout.

Why does this imply that we need to drop UDP? What you have described is exactly what has been proposed with a 15-minute lease. The only difference is that you sometimes get an active notification when the TCP connection is dropped. But if I remember my TCP protocol correctly, if one side drops the connection, the other side may not detect it until it tries to read from or write to the socket. This can be fixed by enabling keepalives or a higher-level ping. But that is just another form of lease renewal, all of which can be done over UDP.

Another factor is the maximum number of connections. TCP only supports 64K connections, and most NFS servers allow even fewer. If a connection must be held open for every client, things will not scale well. Most current NFS-over-TCP implementations allow the server to unilaterally disconnect an idle client, and the client reconnects when it needs to. Your proposal will cause a lot of connection flapping on servers with a limited number of available sockets.

> The server records all locks, their associated file handles, and
> authentication information on permanent media, i.e. a partition or file
> system such as /export/home/locks.nfs for /export/home.
>
> If HA is required, two servers would share the same hard disk. If one
> server is detected as being down, the other server would load
> /export/home/locks.nfs and take over the failed server's IP address.

This is a very nice optimization that will speed up recovery in every proposal I have seen so far (including the existing NLM). But from what you have described, I don't think it is required.

> The clients would detect the server going down because the TCP
> connections would be dropped, and they would just reconnect.

If the client is not actively using the connection, it will not detect someone pulling the server's plug out of the wall.

> * There are no leases on individual locks.
> * Locks are only dropped if the client has a problem.

How do you define "the client has a problem"? The server cannot tell the difference between a network partition and the client being unplugged from the wall, so how does it know there is a problem without some sort of active ping? Thus you need some form of lease renewal, initiated by either the client or the server.

In my alternative proposal (which I need to clean up based on all the feedback I got), there are leases on individual locks, but any I/O from the client to the server implicitly renews them. So you don't have the problem of one application losing a lock while another application was able to keep its lock.

> This is a very clean way of doing locks.

You still have to solve the pulled-plug versus network-partition case.

-David
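The following sketch models the implicit-renewal lease scheme from the last reply. It is a minimal, hypothetical Python model, not NLM or any real NFS server code. `LockServer`, `lock`, `io`, and `reap` are invented names, and times are plain seconds supplied by the caller. For simplicity, it keeps one renewal timestamp per client, so any request from a client renews every lock that client holds. This approximates per-lock leases that are all renewed by any I/O. The 15-minute period is the figure used in the thread.

```python
from dataclasses import dataclass, field

# The 15-minute lease period discussed in the thread, in seconds.
LEASE_SECONDS = 15 * 60


@dataclass
class LockServer:
    """Hypothetical server-side lease table (not an actual NFS/NLM interface)."""
    lease_seconds: float = LEASE_SECONDS
    # client_id -> time of that client's most recent request
    last_renewal: dict[str, float] = field(default_factory=dict)
    # file_handle -> client_id currently holding the lock
    locks: dict[str, str] = field(default_factory=dict)

    def _renew(self, client_id: str, now: float) -> None:
        # Any request from the client implicitly renews its lease.
        self.last_renewal[client_id] = now

    def lock(self, client_id: str, file_handle: str, now: float) -> bool:
        """Try to take a lock; the request itself also counts as a renewal."""
        self._renew(client_id, now)
        owner = self.locks.get(file_handle)
        if owner is not None and owner != client_id:
            return False  # held by another client whose lease has not been reaped
        self.locks[file_handle] = client_id
        return True

    def io(self, client_id: str, file_handle: str, now: float) -> None:
        """Stand-in for READ/WRITE: the data path doubles as lease renewal.

        file_handle is unused here; it is kept only to mirror a real I/O call.
        """
        self._renew(client_id, now)

    def reap(self, now: float) -> set[str]:
        """Drop every lock held by a client silent for a full lease period.

        The server cannot tell a crashed client from a partitioned one; it
        only knows that no request has arrived within the lease period.
        Returns the set of client IDs whose leases expired.
        """
        expired = {c for c, t in self.last_renewal.items()
                   if now - t >= self.lease_seconds}
        for fh in [fh for fh, c in self.locks.items() if c in expired]:
            del self.locks[fh]
        for c in expired:
            del self.last_renewal[c]
        return expired


if __name__ == "__main__":
    srv = LockServer()
    srv.lock("clientA", "fh1", now=0)
    srv.lock("clientB", "fh2", now=0)
    srv.io("clientA", "fh3", now=600)  # I/O on another file still renews A's lease
    print(srv.reap(now=900))           # {'clientB'}
    print(srv.locks)                   # {'fh1': 'clientA'}
```

`reap` only learns that a client has been silent for a full lease period. As the thread points out, it cannot tell whether that client crashed or is on the far side of a network partition.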