From: Yoder, Alan (agy@netapp.com)
Date: 01/27/03-11:31:51 AM Z
Message-ID: <6440EA1A6AA1D5118C6900902745938E07470B57@black.eng.netapp.com>
From: "Yoder, Alan" <agy@netapp.com>
Subject: RE: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]
Date: Mon, 27 Jan 2003 09:31:51 -0800

I'm confused. Is Dan talking about filenames, or about the actual
contents of files?

Alan

===============================================================
Alan G. Yoder, Ph.D.                             agy@netapp.com
Technical Staff
Network Appliance, Inc.                            408-822-6919
===============================================================

> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams@sun.com]
> Sent: Monday, January 27, 2003 9:10 AM
> To: Dan Oscarsson
> Cc: nfsv4-wg@sunroof.eng.sun.com
> Subject: Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4
> rfc3010bis-05 draft]
>
>
> On Mon, Jan 27, 2003 at 08:55:37AM +0100, Dan Oscarsson wrote:
> >
> > >I think NFSv4 must require at the very least that filenames be stored
> > >normalized to some form (we should probably specify if it can be a K
> > >form or not, but D vs. C is not so important) and let clients and
> > >servers deal with that. This is pretty much what the draft says or
> > >implies.
> >
> > The Open Source Unix and Linux community have for internationalisation
> > selected UCS normalised using form C and encoded using UTF-8 as
> > the standard to be used on Unix and Linux.
> > The same form and encoding have been selected by W3C for the web.
> >
> > So there is a lot of software that does or will handle form C encoded
> > text. From what I have seen there will be very little software
> > that will handle or use normalisation form D, KD or KC.
> >
> > So NFSv4 should use the same as most do (or will do): normalising form C.
>
> Normalization form C ALWAYS starts with decomposition, that is,
> normalization form D.
>
> Any software which can perform normalization form C can also perform
> normalization form D.
>
> > But we cannot require it in storage, only on the wire.
>
> In storage we must leave the fileserver free to do as it pleases, but
> for one restriction: it must use a reasonably canonical form in storage,
> otherwise equal filenames with unequal encodings could be allowed.
>
> I believe that this is what the draft specifies. An on-the-wire
> normalization form specification would be an optimization, but is not
> absolutely necessary.
>
> > It is on the wire, that is between systems, that it must be standardised
> > to one simple format. Systems can use any format they want.
>
> I remain unconvinced.
>
> > A system which uses normalising form C as its local format for storage
> > will have a simpler implementation than others, and that will help
> > push system vendors to move to the most common format used.
> > UCS normalising form C is compact and does not destroy any information,
> > so it is best. The K forms destroy data and the D form takes more space
> > and breaks the semantic concept of letter on some letters.
>
> Normalization form C is limited to composed characters defined in
> Unicode 3.0 (we're past 3.0); as such it really means "composed for
> these characters, decomposed for everything else" - so why not just
> decomposed then?
>
> I don't think encoding length is that big a deal, but cycles spent in
> normalization, space dedicated to normalization data structures, *that*
> is a big deal.
>
> This is why I'm for form D (on the wire as an optimization).
>
> > Yes, it may result in additional code in servers, but many systems can
> > create very efficient code to convert between legacy character set
> > and UCS normalising form C. So I think it will not be that expensive.
>
> I have no figures close at hand, but I don't think that Unicode
> normalization data structures and code are small (remember, we're
> talking about kernel constraints here, complete with small stacks).
>
> Nico
> --
>
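
The points argued in the quoted exchange can be made concrete with a small sketch. It is only an illustration, assuming Python's standard `unicodedata` module; the helper name `canonical_key` and the choice of form D as the comparison form are hypothetical and are not taken from the draft or from either poster. It shows two encodings of the same filename comparing unequal until normalized, the one-byte size difference between forms C and D for this name, and a K form discarding a distinction that form C preserves.

```python
import unicodedata


def to_nfc(name: str) -> str:
    """Normalization form C: canonical decomposition, then composition."""
    return unicodedata.normalize("NFC", name)


def to_nfd(name: str) -> str:
    """Normalization form D: canonical decomposition only."""
    return unicodedata.normalize("NFD", name)


def canonical_key(name: str) -> str:
    """Hypothetical comparison key for filenames.

    Form D is an arbitrary choice here; any single canonical form
    would make canonically equivalent names compare equal.
    """
    return to_nfd(name)


# The same visible filename, encoded two ways.
composed = "caf\u00e9"      # 'e with acute' as one code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by combining acute (U+0301)

# Without normalization these are different strings, so a server that
# compared raw code points could hold both as separate files.
raw_equal = composed == decomposed

# With a canonical form applied, they collapse to one name.
keys_equal = canonical_key(composed) == canonical_key(decomposed)

# Form D costs one extra byte in UTF-8 for this name (6 bytes vs. 5).
nfc_bytes = len(to_nfc(decomposed).encode("utf-8"))
nfd_bytes = len(to_nfd(composed).encode("utf-8"))

# A K form is lossy: the ligature U+FB01 becomes the two letters "fi",
# while form C leaves it untouched.
ligature = "\ufb01"
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
nfc_ligature = to_nfc(ligature)

if __name__ == "__main__":
    print("raw strings equal:      ", raw_equal)
    print("canonical keys equal:   ", keys_equal)
    print("UTF-8 bytes, NFC vs NFD:", nfc_bytes, nfd_bytes)
    print("NFKC changes ligature:  ", nfkc_ligature != ligature)
    print("NFC keeps ligature:     ", nfc_ligature == ligature)
```

Whether such a key is applied on the wire, in storage, or only at lookup time is exactly the question the thread leaves open; the sketch takes no position on it.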