RE: Invalid UTF-8 strings

From: Noveck, Dave (Dave.Noveck@netapp.com)
Date: 12/11/01-09:48:10 AM Z


Message-ID: <8C610D86AF6CD4119C9800B0D0499E336A8B53@red.nane.netapp.com>
From: "Noveck, Dave" <Dave.Noveck@netapp.com>
Subject: RE: Invalid UTF-8 strings
Date: Tue, 11 Dec 2001 07:48:10 -0800

This raises another interesting question.  What do we do about the
case in which you have a string that is a valid UTF-8 string but
which represents characters that don't fit in 16 bits?  These
would not be a problem for file systems which use native ASCII
names and simply store the UTF-8 but they definitely would be
a problem for file systems that store names as Unicode.

The spec now says that names are UTF8-encoded UCS strings but 
doesn't say much about what you do if the strings contain characters 
that are not valid UCS characters.  

The first case to consider is characters that fit in 16 bits
but have no valid assignments in UCS.  I think the spec should 
explicitly say that these are valid and servers should not 
reject them.

The more interesting case concerns characters that are beyond the 
16-bit range.  The spec talks about how Unicode may not be big enough
(in the future), leaving it unclear about what servers and clients
are supposed to do now.

My preference for dealing with this is for the spec to say that for
this version of nfs-v4, UTF-8 should be used to encode 16-bit
characters and that anything else is invalid (NFS4ERR_INVAL).  It
would still be noted that UTF-8 allows encoding of larger
characters and that if any such characters are added to UCS, the
issue can be addressed in a minor version.  Otherwise, we have a
lot of complicated interoperability issues to consider and deal with
when the event which may trigger them may in fact never happen.
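
To make the proposal concrete, here is a rough sketch of the check a
server might apply to an incoming name. It is only an illustration: the
function name check_name_16bit is hypothetical, and it leans on Python's
strict UTF-8 decoder, which also rejects overlong forms, surrogates and
5- and 6-byte sequences, so it is somewhat stricter than the rule
described above. NFS4ERR_INVAL is 22 in the protocol's error list.

```python
NFS4_OK = 0
NFS4ERR_INVAL = 22  # numeric value from the NFSv4 status codes


def check_name_16bit(raw: bytes) -> int:
    """Hypothetical helper: accept only UTF-8 names whose characters fit in 16 bits."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not decodable as UTF-8 at all.
        return NFS4ERR_INVAL
    # Characters beyond the 16-bit range are rejected for this protocol version.
    if any(ord(ch) > 0xFFFF for ch in text):
        return NFS4ERR_INVAL
    # Code points inside the 16-bit range are accepted even when they have
    # no assignment in UCS; no assignment table is consulted.
    return NFS4_OK
```

With this sketch, a name containing U+10000 is refused, while a name
containing an unassigned 16-bit code point such as U+0378 is accepted.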

-----Original Message-----
From: Peter Åstrand [mailto:peter@cendio.se]
Sent: Tuesday, December 11, 2001 10:15 AM
To: nfsv4-wg@sunroof.eng.sun.com
Subject: Re: Invalid UTF-8 strings


On Tue, 27 Nov 2001, Peter Åstrand wrote:

I'll answer a few of my own questions:

> 1. What actually is an invalid UTF-8 string? Is it any string with an
> invalid character, like 11000000 11000001 (0xC0 0xC1)?

I've been told that the following byte sequences are invalid:
10xxxxxx
110xxxxx 0xxxxxxx
110xxxxx 11xxxxxx
1110xxxx 0xxxxxxx
1110xxxx 11xxxxxx
1110xxxx 10xxxxxx 0xxxxxxx
1110xxxx 10xxxxxx 11xxxxxx
11110xxx 0xxxxxxx
11110xxx 11xxxxxx
11110xxx 10xxxxxx 0xxxxxxx
11110xxx 10xxxxxx 11xxxxxx
11110xxx 10xxxxxx 10xxxxxx 0xxxxxxx
11110xxx 10xxxxxx 10xxxxxx 11xxxxxx
...
1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 11xxxxxx

Plus, the stream should not end too early. 
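
The patterns above boil down to three structural rules: a continuation
byte (10xxxxxx) may not appear where a lead byte is expected, every byte
after a lead byte must be a continuation byte, and the stream may not
end in the middle of a sequence. A minimal sketch of such a check
follows. The function names are made up for illustration, sequences of
up to six bytes are allowed as in the list above, and the sketch checks
structure only; it does not detect overlong encodings.

```python
def expected_length(lead: int) -> int:
    """Sequence length implied by a lead byte, or 0 if it cannot start a sequence."""
    if lead < 0x80:
        return 1  # 0xxxxxxx
    if lead < 0xC0:
        return 0  # 10xxxxxx: continuation byte in lead position
    if lead < 0xE0:
        return 2  # 110xxxxx
    if lead < 0xF0:
        return 3  # 1110xxxx
    if lead < 0xF8:
        return 4  # 11110xxx
    if lead < 0xFC:
        return 5  # 111110xx
    if lead < 0xFE:
        return 6  # 1111110x
    return 0      # 0xFE and 0xFF never appear in UTF-8


def is_well_formed(data: bytes) -> bool:
    """Structural check only: lead bytes, continuation bytes, no early end."""
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            return False
        if i + n > len(data):
            return False  # the stream ends too early
        for b in data[i + 1:i + n]:
            if b & 0xC0 != 0x80:
                return False  # expected 10xxxxxx
        i += n
    return True
```

For example, is_well_formed(b"\xc0\xc1") is False because 0xC1 is not a
continuation byte, and is_well_formed(b"\xc3") is False because the
two-byte sequence is cut short.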

> 2. Should operations check UTF-8 strings before returning them? E.g, 
> should READLINK verify the linkdata before returning? Or is it OK for a 

I suggest no. It would limit performance. However, I think checks should 
be done at creation time. This seems to be the intention of the NFS4 
protocol. However, some checks are missing, I think:

* CREATE should return NFS4ERR_INVAL if trying to create a symbolic link 
with linkdata not obeying UTF-8. Maybe the comment in the READLINK 
description needs to clarify this also. 

* SECINFO should return NFS4ERR_INVAL for non-UTF8 "name" argument. I
believe this is Issue66. (Btw, why does SECINFO have a name arg? To me, it
makes more sense to have the (cfh) representing the file instead.)

* SETATTR, NVERIFY, VERIFY should also return NFS4ERR_INVAL on invalid 
attributes, for example a non-UTF8 fattr4_owner. 

All these ops have NFS4ERR_INVAL as a possible return value, but maybe the 
spec should clarify that this error should be returned for invalid attributes. 
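
As a rough illustration of "check at creation time, not on return", here
is a toy in-memory model. The ToyServer class and its method names are
hypothetical and are not taken from the spec or from any implementation;
only the NFS4ERR_INVAL value (22) comes from the protocol.

```python
NFS4_OK = 0
NFS4ERR_INVAL = 22  # numeric value from the NFSv4 status codes


def is_utf8(raw: bytes) -> bool:
    """True if raw decodes under Python's strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


class ToyServer:
    """Toy model: validate strings when they are stored, not when they are read."""

    def __init__(self):
        self.symlinks = {}

    def create_symlink(self, objname: bytes, linkdata: bytes) -> int:
        # CREATE path: both the object name and the link data are checked.
        if not is_utf8(objname) or not is_utf8(linkdata):
            return NFS4ERR_INVAL
        self.symlinks[objname] = linkdata
        return NFS4_OK

    def readlink(self, objname: bytes) -> bytes:
        # READLINK path: stored bytes are returned without re-validation.
        return self.symlinks[objname]
```

In this model nothing invalid can be stored through create_symlink, so
readlink has nothing to re-check. Data placed on disk by other means
would still pass through unchecked.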


/Peter

> But what about attributes, for example "owner". Should SETATTR accept
> setting the owner to an invalid UTF-8 string? Should GETATTR return
> invalid UTF-8 strings without errors, if such strings are stored on disk?
> 
> I've made a table, summarizing where UTF-8 is used. Have I understood the 
> policy for UTF-8 checking correctly? Should the "?" be "yes" or "no"?
> 
> ---
> 
> Data types derived from or containing utf8string:
> COMPOUND4args
> COMPOUND4res
> CB_COMPOUND4args
> CB_COMPOUND4res
> component4
> pathname4
> fs_location4
> fs_locations4
> READLINK4resok
> CREATE4args
> createtype4
> linktext4
> nfsace4
> fattr4_acl
> open_read_delegation4
> open_write_delegation4
> open_delegation4
> OPEN4resok
> fattr4_owner
> fattr4_owner_group
> fattr4_mimetype
> 
> Types used in attributes:
> fattr4_owner, fattr4_owner_group, fattr4_mimetype, fs_locations4, fattr4_acl
> 
> 
>                      Check input                Check output
> ==================================================================
> Procedures
> ==========
> COMPOUND:            no(tag)                    no(tag)
> CB_COMPOUND:         no(tag)                    no(tag)
>
> Operations
> ==========
> ACCESS               -                          -
> CLOSE                -                          -
> COMMIT               -                          -
> CREATE               yes(objname, linkdata)     -
> DELEGPURGE           -                          -
> DELEGRETURN          -                          -
> GETATTR              -                          ?(obj_attributes)
> GETFH                -                          -
> LINK                 yes(newname)               -
> LOCK                 -                          -
> LOCKT                -                          -
> LOCKU                -                          -
> LOOKUP               yes(objname)               -
> LOOKUPP              -                          -
> NVERIFY              ?(obj_attributes)          -
> OPEN                 yes(open_claim.file)       ?(delegation.*.permissions.who)
> OPENATTR             -                          -
> OPEN_CONFIRM         -                          -
> OPEN_DOWNGRADE       -                          -
> PUTFH                -                          -
> PUTPUBFH             -                          -
> PUTROOTFH            -                          -
> READ                 -                          -
> READDIR              -                          ?(reply.entries.name,
>                                                   reply.entries.attrs)
> READLINK             -                          ?(linkdata)
> REMOVE               yes(target)                -
> RENAME               yes(oldname, newname)      -
> RENEW                -                          -
> RESTOREFH            -                          -
> SAVEFH               -                          -
> SECINFO              yes(name)                  -
> SETATTR              yes(obj_attributes)        -
> SETCLIENTID          -                          -
> SETCLIENTID_CONFIRM  -                          -
> VERIFY               yes(obj_attributes)        -
> WRITE                -                          -
> CB_GETATTR           -                          ?(obj_attributes)
> CB_RECALL            -                          -
> 
> 
> 

-- 
Peter Åstrand                Telephone: +46-13-21 46 00
Cendio Systems               E-mail: peter@cendio.se
Teknikringen 3
583 30 Linköping
Sweden

