Re: locking errors

From: David Robinson (robinson@jetsun.eng.sun.com)
Date: 09/28/98-10:20:29 PM Z

Next message: Brent Callaghan: "Last call for NFSv4 Requirements Document"
Previous message: Mike Eisler: "Re: Printing and UID/GID mapping"
Maybe in reply to: Brent Callaghan: "Re: locking errors"
Next in thread: Mike Eisler: "Re: locking errors"
Reply: Mike Eisler: "Re: locking errors"
Reply: Damon Atkins: "Re: locking errors"

Date: Mon, 28 Sep 1998 20:20:29 -0700 (PDT)
From: David Robinson <robinson@jetsun.eng.sun.com>
Message-Id: <199809290320.UAA00558@jetsun.eng.sun.com>
Subject: Re: locking errors

> After reading Carl's comments this shows how much we need to drop UDP.
> 
> As part of the document I am writing it talks about using UDP only
> notifications.
> 
> Because TCP is used, if a client drops the connection it has a certain amount
> of time to reconnect before its locks are dropped, e.g. 15 minutes.
> 
> If the server reboots the clients have 15 minutes to reconnect before locks
> are dropped.
> 
> A client then reconnects, authenticates, and gets its locks back, if within
> the timeout.

Why does this imply that we need to drop UDP?  What you have described
is exactly what has been proposed with a 15 minute lease.  The only
difference is that you get an active notification sometimes if the TCP
connection is dropped.  But if I remember my TCP protocol correctly,
if one side drops the connection the other side may not detect it unless
it tries to read or write to the socket.  This can be fixed by establishing
keepalives or a higher level ping.  But this is just another form of
lease renewal, all of which can be done using UDP.
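
As a minimal sketch of the keepalive option mentioned above (Python, assuming
a BSD-style sockets API; the helper name enable_keepalive and the timer values
are illustrative, not part of any proposal), a side that wants to notice a
dead peer without sending application data could do something like:

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive probes for sock; return True if enabled.

    idle_s, interval_s and probes are illustrative values only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timers below are platform-specific (present on Linux),
    # so each one is set only if this platform's socket module exposes it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

The probes are periodic traffic whose only purpose is to prove the peer is
still there, which is the sense in which this is just another form of lease
renewal.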

Another factor is the maximum number of connections. TCP only supports 64K
connections, and most NFS servers allow even fewer.  If a connection
must be held open for all clients things will not scale well.  Most
current TCP connections allow the server to unilaterally disconnect
an idle client and the client reconnects when it needs to.  Your proposal
will cause a lot of connection flapping for servers with limited
numbers of available sockets.
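
To illustrate the flapping concern, here is a toy model (Python; the class
ConnectionTable, its eviction policy, and the numbers are assumptions made
up for this sketch, not how any particular NFS server behaves). The server
holds at most max_connections sockets and drops the least recently used
client when a new one arrives:

```python
from collections import OrderedDict

class ConnectionTable:
    """Toy model of a server with a hard cap on open connections."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open = OrderedDict()   # client_id -> True, idlest client first
        self.connects = 0           # how many times a connection was set up

    def use(self, client_id):
        if client_id in self.open:
            # Already connected: just mark it as most recently used.
            self.open.move_to_end(client_id)
            return
        if len(self.open) >= self.max_connections:
            # Out of sockets: unilaterally drop the idlest client.
            self.open.popitem(last=False)
        self.open[client_id] = True
        self.connects += 1
```

In this model, three clients taking turns against a cap of two reconnect on
every single request, while a cap of three lets each client connect once and
stay connected.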

> The server records all locks and associated file handles, authentication on
> permanent media i.e.
> partition or FS    /export/home/locks.nfs for /export/home
> 
> If HA is required then two servers would share the same Hard Disk, if a
> server
> is detected as being down then the other server would load
> /export/home/locks.nfs
> and take over the other server's IP address.

This is a very nice optimization that will speed up recovery in all
proposals I have seen so far (including the existing NLM).  But I don't
think it is required from what you have described so far.

> The Clients would detect the server going down because the TCP connections
> would be dropped and they would just reconnect.

If the client is not actively using the connection, it will not detect
someone pulling the server's plug out of the wall.

> *There are no leases on individual locks.
> *Locks are only dropped if the client has a problem.

How do you define "the client has a problem"?  The server cannot detect
the difference between a network partition and the client being
unplugged from the wall.  So how does it know there is a problem without
some sort of active ping?  Thus you need some sort of lease renewal,
either initiated by the client or the server.

In my alternative proposal (which I need to clean up from all the feedback
I got), there are leases on individual locks, but any I/O from
client to server is implicitly a lease renewal.  So you don't have
the problem of one application losing a lock while another was
able to keep it.
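
A rough sketch of that idea (Python; LeaseTable, on_io, acquire and reap are
hypothetical names invented for this illustration, and the 15 minute default
is just the figure used earlier in this thread):

```python
import time

class LeaseTable:
    """Per-lock leases, all renewed by any I/O from the owning client."""

    def __init__(self, lease_seconds=15 * 60, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.expiry = {}   # client_id -> {lock_name: expiry time}

    def on_io(self, client_id):
        # Any request from the client renews every lease it holds.
        deadline = self.clock() + self.lease_seconds
        held = self.expiry.get(client_id, {})
        for name in held:
            held[name] = deadline

    def acquire(self, client_id, name):
        # Taking a lock is itself I/O, so it renews the other leases too.
        self.expiry.setdefault(client_id, {})[name] = 0.0
        self.on_io(client_id)

    def reap(self):
        # Drop every lock whose lease has run out; return what was dropped.
        now = self.clock()
        dropped = []
        for client_id, held in list(self.expiry.items()):
            for name in [n for n, t in held.items() if t <= now]:
                del held[name]
                dropped.append((client_id, name))
            if not held:
                del self.expiry[client_id]
        return dropped
```

Because on_io moves every lease for a client to the same deadline, the locks
of one client expire together or not at all, regardless of which application
on that client generated the traffic.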

> This is a very clean way of doing locks.

You still have to solve the pull-the-plug versus network-partition case.

	-David

This archive was generated by hypermail 2.1.2 : 03/04/05-01:46:24 AM Z CST