From: Robert Thurlow (Robert.Thurlow@eng.sun.com)
Date: 02/27/02-12:00:09 AM Z
Date: Tue, 26 Feb 2002 23:00:09 -0700 (MST)
From: Robert Thurlow <Robert.Thurlow@eng.sun.com>
Subject: Replication/Migration conference call minutes, 2/26
Message-ID: <Roam.SIMC.2.0.6.1014789609.17512.thurlow@jurassic.eng>

Conference call minutes, Feb 26 2002, as reported by Rob Thurlow

The following people met today in a conference call to get the ball rolling on replication and migration:

    "Dave Noveck" <Dave.Noveck@netapp.com>
    "Spencer Shepler" <Spencer.Shepler@sun.com>
    "William A.(Andy) Adamson" <andros@umich.edu> and the CITI team
    "Rob Thurlow" <robert.thurlow@sun.com>

The first thing we talked about was Rob's draft requirements bullets, as follows:

    Must be transparent to applications on client
        Exceptions?
    Must be as secure as NFSv4
    Must be efficient
        Minimal lock-out time
        Reasonable bandwidth requirements
        Propagate differences
        Restartable
    Must be scalable to large and small filesystems

We spent the most time talking about the first requirement, especially on what "transparent" means. Rob thought the toughest edge case was what happens to an application which has a file open when a migration completes or a different replica has to be chosen. Ideally, the application notices nothing except a slight delay in access to the file. This view makes the data the critical thing; most metadata need not be quite so correct. We talked about the issues that made this difficult; we agreed that renames of open files were a tough (though not insoluble) issue. Dave talked about a way the server could choose not to expire filehandles for open files and continue to provide service until the files were closed. It was agreed that this would be a good capability to have, even if it meant waiting for a minor revision of the NFSv4 protocol.

We talked about what the unit of migration is or should be. The current NFSv4 spec talks about the replication or migration of filesystems with a single fsid, which most implementors accept from the underlying local filesystem.
This could be an issue; there is a need for migration of portions of large filesystems, e.g. the 1 TB LUN your sysadmin set up might contain your 500 MB home directory, which you would like to manage as a unit. Dave mentioned that the server could choose to give out a modified fsid to permit this kind of carving up of large filesystems, and that this would not require a protocol change. Rob mentioned that a prior project had done work on the problem of how to lock subtrees of larger filesystems. It was agreed that we needed to look at the current spec to remind ourselves how it defines and refers to fsids.

Andy asked whether, when using read-only replication, the server may issue read delegations. Probably - this would mean a client could cache aggressively until a new replica update is received.

We spent some time talking about the purpose of the fs_locations attribute; it is likely that the client will do an initial mount based on information from a name service, and then fetch the fs_locations attribute. This means possibly conflicting information from the two sources, conflicting information from the servers hosting the different replicas, the issue of keeping the name service updated, and other horrors. Dave mentioned that he had been thinking about having clients point to a "referral server" (my term, I think, not Dave's - R) at e.g. nfsv4:/, and having all resources in the domain be found by getting redirected to the correct server during the lookup process. Rob mentioned that we had had discussions within Sun about something similar, and that he liked the idea as it could free us from dependency on a namespace.

We talked about the granularity of the updates - when a file had been touched, would a replica update transfer the entire file or just the byte ranges which had been changed? It was clear to all that it was important for the protocol to be able to express the transfer of just the modified portions of a file, as e.g.
a large database table would be far more efficient to update via changed byte ranges.

We spent some time talking about filesystem versioning. Andy thought that some kind of version attribute was important, at least for debugging problems that occur when replicas don't get updated properly, or when an application running from an NFS filesystem dies because its pages are no longer valid due to a replica update. Rob wondered if a per-filesystem modification time might be more straightforward and better.

We discussed the other requirements briefly, and agreed that they were reasonable and much less controversial than "transparency". Rob said he'd wondered whether "minimal lock-out time" was more an end-user requirement of a product, decoupled from the protocol we're interested in, but Dave thought it was not out of place as a corollary to "propagate differences". We agreed that once Rob had minutes posted, we would have some further discussions about the transparency requirement to make it work.

We discussed single-writer/multiple-reader replication; it would be nice for the client to be able to determine which replica in fs_locations was the writable master, and to be able to force use of that replica whenever a file was opened for writing, as this would give us the closest match to DCE/DFS. People felt that this was probably not a current issue, but that if a way could be found to support it in the protocol, future work might take advantage of it. The biggest issue seemed to be that the client's caching strategies might make implementing this difficult.

We closed after a brief discussion of what future issues we should discuss and how we should keep the working group informed of what we talk about. It was decided that we would always post minutes of these conference calls to the working group alias to make sure our work remains visible.

Rob T