RE: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]


From: Yoder, Alan (agy@netapp.com)
Date: 01/27/03-11:31:51 AM Z


Message-ID: <6440EA1A6AA1D5118C6900902745938E07470B57@black.eng.netapp.com>
From: "Yoder, Alan" <agy@netapp.com>
Subject: RE: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]
Date: Mon, 27 Jan 2003 09:31:51 -0800

I'm confused.  Is Dan talking about filenames,
or about the actual contents of files?

Alan

===============================================================
Alan G. Yoder, Ph.D.                             agy@netapp.com
Technical Staff                          
Network Appliance, Inc.                            408-822-6919
=============================================================== 

> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams@sun.com]
> Sent: Monday, January 27, 2003 9:10 AM
> To: Dan Oscarsson
> Cc: nfsv4-wg@sunroof.eng.sun.com
> Subject: Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]
> 
> 
> On Mon, Jan 27, 2003 at 08:55:37AM +0100, Dan Oscarsson wrote:
> > 
> > >I think NFSv4 must require at the very least that filenames be stored
> > >normalized to some form (we should probably specify if it can be a K
> > >form or not, but D vs. C is not so important) and let clients and
> > >servers deal with that.  This is pretty much what the draft says or
> > >implies.
> > 
> > The Open Source Unix and Linux community have for internationalisation
> > selected UCS normalised using form C and encoded using UTF-8 as
> > the standard to be used on Unix and Linux.
> > The same form and encoding have been selected by W3C for the web.
> > 
> > So there is a lot of software that does or will handle form C encoded
> > text. From what I have seen there will be very little software
> > that will handle or use normalisation form D, KD or KC.
> > 
> > So NFSv4 should use the same as most do (or will do): normalising form C.
> 
> Normalization form C ALWAYS starts with decomposition, that is,
> normalization form D.
> 
> Any software which can perform normalization form C can also perform
> normalization form D.
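
[Illustration of the point above: a minimal sketch, assuming Python's
standard unicodedata module as a stand-in for whatever normalization
code a client or server would actually carry. It shows the same letter
in decomposed and composed form, and that composing after decomposing
gives the form C result. The variable names are illustrative only.]

```python
import unicodedata

composed = "\u00e9"      # "é" as one precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Form D: canonical decomposition.
nfd_of_composed = unicodedata.normalize("NFD", composed)

# Form C: canonical decomposition followed by canonical composition.
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)

# Decomposing first and then asking for form C gives the same result
# as asking for form C directly.
nfc_via_nfd = unicodedata.normalize(
    "NFC", unicodedata.normalize("NFD", composed)
)

print([hex(ord(c)) for c in nfd_of_composed])    # ['0x65', '0x301']
print([hex(ord(c)) for c in nfc_of_decomposed])  # ['0xe9']
```
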
> 
> > But we cannot require it in storage, only on the wire.
> 
> In storage we must leave the fileserver free to do as it pleases, but
> for one restriction: it must use a reasonably canonical form in storage,
> otherwise equal filenames with unequal encodings could be allowed.
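
[Illustration of the restriction above: a rough sketch, not taken from
the draft, of why some canonical form is needed in storage. Two
spellings of the same name differ byte for byte in UTF-8, so a
byte-wise directory lookup would treat them as two files. The helper
canonical_key and the dict standing in for a directory are
hypothetical; form C is picked here only as an example.]

```python
import unicodedata

def canonical_key(name: str, form: str = "NFC") -> bytes:
    """Hypothetical helper: normalize a name, then encode it as UTF-8."""
    return unicodedata.normalize(form, name).encode("utf-8")

name_a = "caf\u00e9"    # precomposed "é"
name_b = "cafe\u0301"   # "e" plus combining acute accent

# Without normalization the two encodings differ.
raw_equal = name_a.encode("utf-8") == name_b.encode("utf-8")

# With a canonical form they compare equal.
key_equal = canonical_key(name_a) == canonical_key(name_b)

# A directory keyed on the canonical form holds a single entry.
directory = {}
for name in (name_a, name_b):
    directory[canonical_key(name)] = name

print(raw_equal, key_equal, len(directory))  # False True 1
```
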
> 
> I believe that this is what the draft specifies.  An on-the-wire
> normalization form specification would be an optimization, but is not
> absolutely necessary.
> 
> > It is on the wire, that is between systems, that it must be standardised
> > to one simple format. Systems can use any format they want.
> 
> I remain unconvinced.
> 
> > A system which uses normalising form C as its local format for storage
> > will have a simpler implementation than others, and that will help
> > push system vendors to move to the most common format used.
> > UCS normalising form C is compact and does not destroy any information,
> > so it is best. The K forms destroy data and the D form takes more space
> > and breaks the semantic concept of letter on some letters.
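
[Illustration of the two claims above: a small sketch, again assuming
Python's unicodedata module. The first half shows a compatibility (K)
form folding a ligature into plain letters, which cannot be undone; the
second half compares UTF-8 lengths of one word in form C and form D.
The sample strings are arbitrary.]

```python
import unicodedata

# K forms are lossy: the "fi" ligature (U+FB01) becomes plain "f" + "i".
ligature = "\ufb01le"
nfkc = unicodedata.normalize("NFKC", ligature)
nfc = unicodedata.normalize("NFC", ligature)   # left unchanged by form C

# Form D is longer on the wire for accented text.
word = "\u00e5ngstr\u00f6m"   # "ångström"
nfc_len = len(unicodedata.normalize("NFC", word).encode("utf-8"))
nfd_len = len(unicodedata.normalize("NFD", word).encode("utf-8"))

print(nfkc, nfc == ligature)   # file True
print(nfc_len, nfd_len)        # 10 12
```
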
> 
> Normalization form C is limited to composed characters defined in
> Unicode 3.0 (we're past 3.0); as such it really means "composed for
> these characters, decomposed for everything else" - so why not just
> decomposed then?
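
[Illustration: one concrete case where form C output stays decomposed
is a character on Unicode's composition exclusion list. This is offered
as a related example, on the assumption that it is the kind of
behaviour meant above; it does not by itself show the version cutoff.
U+0958 is used because it is a well-known excluded character.]

```python
import unicodedata

qa = "\u0958"   # DEVANAGARI LETTER QA, a composition exclusion

nfc = unicodedata.normalize("NFC", qa)
nfd = unicodedata.normalize("NFD", qa)

# Form C does not recompose this character, so C and D agree here.
print([hex(ord(c)) for c in nfc])   # ['0x915', '0x93c']
print(nfc == nfd)                   # True
```
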
> 
> I don't think encoding length is that big a deal, but cycles spent in
> normalization, space dedicated to normalization data structures, *that*
> is a big deal.
> 
> This is why I'm for form D (on the wire as an optimization).
> 
> > Yes, it may result in additional code in servers, but many systems can
> > create very efficient code to convert between legacy character set
> > and UCS normalising form C. So I think it will not be that expensive.
> 
> I have no figures close at hand, but I don't think that Unicode
> normalization data structures and code are small (remember, we're
> talking about kernel constraints here, complete with small stacks).
> 
> Nico
> -- 
> 



This archive was generated by hypermail 2.1.2 : 03/04/05-01:50:50 AM Z CST