From: Noveck, Dave (Dave.Noveck@netapp.com)
Date: 12/11/01-09:48:10 AM Z
Message-ID: <8C610D86AF6CD4119C9800B0D0499E336A8B53@red.nane.netapp.com>
From: "Noveck, Dave" <Dave.Noveck@netapp.com>
Subject: RE: Invalid UTF-8 strings
Date: Tue, 11 Dec 2001 07:48:10 -0800

This raises another interesting question. What do we do about the
case in which you have a string that is a valid UTF-8 string but
which represents characters that don't fit in 16 bits? These would
not be a problem for file systems which use native ASCII names and
simply store the UTF-8, but they definitely would be a problem for
file systems that store names as Unicode.

The spec now says that names are UTF8-encoded UCS strings but doesn't
say much about what you do if the strings contain characters that
are not valid UCS characters.

The first case to consider is characters that fit in 16 bits but
have no valid assignments in UCS. I think the spec should explicitly
say that these are valid and servers should not reject them.

The more interesting case concerns characters that are beyond the
16-bit range. The spec talks about how Unicode may not be big enough
(in the future), leaving it unclear what servers and clients are
supposed to do now.

My preference for dealing with this is for the spec to say that for
this version of nfs-v4, UTF-8 should be used to encode 16-bit
characters and that anything else is invalid (NFS4ERR_INVAL). It
would still be noted that UTF-8 allows encoding of larger characters
and that if any such characters are added to UCS, the issue can be
addressed in a minor version. Otherwise, we have a lot of complicated
interoperability issues to consider and deal with when the event
which may trigger them may in fact never happen.

-----Original Message-----
From: Peter Åstrand [mailto:peter@cendio.se]
Sent: Tuesday, December 11, 2001 10:15 AM
To: nfsv4-wg@sunroof.eng.sun.com
Subject: Re: Invalid UTF-8 strings

On Tue, 27 Nov 2001, Peter Åstrand wrote:

I'll answer a few of my own questions:

> 1. What actually is an invalid UTF-8 string?
> Is it any string with an invalid character, like 11000000 11000001
> (0xC0 0xC1)?

I've been told:

10xxxxxx
110xxxxx 0xxxxxxx
110xxxxx 11xxxxxx
1110xxxx 0xxxxxxx
1110xxxx 11xxxxxx
1110xxxx 10xxxxxx 0xxxxxxx
1110xxxx 10xxxxxx 11xxxxxx
11110xxx 0xxxxxxx
11110xxx 11xxxxxx
11110xxx 10xxxxxx 0xxxxxxx
11110xxx 10xxxxxx 11xxxxxx
11110xxx 10xxxxxx 10xxxxxx 0xxxxxxx
11110xxx 10xxxxxx 10xxxxxx 11xxxxxx
...
1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 11xxxxxx

Plus, the stream should not end too early.

> 2. Should operations check UTF-8 strings before returning them? E.g.,
> should READLINK verify the linkdata before returning? Or is it OK for a

I suggest no. It would limit performance. However, I think checks
should be done at creation time. This seems to be the intention of
the NFS4 protocol. However, some checks are missing, I think:

* CREATE should return NFS4ERR_INVAL if trying to create a symbolic
  link with linkdata not obeying UTF-8. Maybe the comment in the
  READLINK description needs to clarify this also.

* SECINFO should return NFS4ERR_INVAL for a non-UTF8 "name" argument.
  I believe this is Issue66. (Btw, why does SECINFO have a name arg?
  To me, it makes more sense to have the (cfh) representing the file
  instead.)

* SETATTR, NVERIFY, VERIFY should also return NFS4ERR_INVAL on invalid
  attributes, for example a non-UTF8 fattr4_owner. All these ops
  have NFS4ERR_INVAL as a possible return value, but maybe the spec
  should clarify that this error should be returned for invalid
  attributes.

/Peter

> But what about attributes, for example "owner". Should SETATTR accept
> setting the owner to an invalid UTF-8 string? Should GETATTR return
> invalid UTF-8 strings without errors, if such strings are stored on disk?
>
> I've made a table, summarizing where UTF-8 is used. Have I understood the
> policy for UTF-8 checking correctly? Should the "?" be "yes" or "no"?
>
> ---
>
> Data types derived from or containing utf8string:
> COMPOUND4args
> COMPOUND4res
> CB_COMPOUND4args
> CB_COMPOUND4res
> component4
> pathname4
> fs_location4
> fs_locations4
> READLINK4resok
> CREATE4args
> createtype4
> linktext4
> nfsace4
> fattr4_acl
> open_read_delegation4
> open_write_delegation4
> open_delegation4
> OPEN4resok
> fattr4_owner
> fattr4_owner_group
> fattr4_mimetype
>
> Types used in attributes:
> fattr4_owner, fattr4_owner_group, fattr4_mimetype, fs_locations4, fattr4_acl
>
>
>                      Check input                Check output
> ====================================================
> Procedures
> ==========
> COMPOUND:            no(tag)                    no(tag)
> CB_COMPOUND:         no(tag)                    no(tag)
>
> Operations
> ==========
> ACCESS               -                          -
> CLOSE                -                          -
> COMMIT               -                          -
> CREATE               yes(objname, linkdata)     -
> DELEGPURGE           -                          -
> DELEGRETURN          -                          -
> GETATTR              -                          ?(obj_attributes)
> GETFH                -                          -
> LINK                 yes(newname)               -
> LOCK                 -                          -
> LOCKT                -                          -
> LOCKU                -                          -
> LOOKUP               yes(objname)               -
> LOOKUPP              -                          -
> NVERIFY              ?(obj_attributes)          -
> OPEN                 yes(open_claim.file)       ?(delegation.*.permissions.who)
> OPENATTR             -                          -
> OPEN_CONFIRM         -                          -
> OPEN_DOWNGRADE       -                          -
> PUTFH                -                          -
> PUTPUBFH             -                          -
> PUTROOTFH            -                          -
> READ                 -                          -
> READDIR              -                          ?(reply.entries.name,
>                                                   reply.entries.attrs)
> READLINK             -                          ?(linkdata)
> REMOVE               yes(target)                -
> RENAME               yes(oldname, newname)      -
> RENEW                -                          -
> RESTOREFH            -                          -
> SAVEFH               -                          -
> SECINFO              yes(name)                  -
> SETATTR              yes(obj_attributes)        -
> SETCLIENTID          -                          -
> SETCLIENTID_CONFIRM  -                          -
> VERIFY               yes(obj_attributes)        -
> WRITE                -                          -
> CB_GETATTR           -                          ?(obj_attributes)
> CB_RECALL            -                          -
>
>
>
--
Peter Åstrand           Telephone: +46-13-21 46 00
Cendio Systems          E-mail: peter@cendio.se
Teknikringen 3
583 30 Linköping
Sweden
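
The two checks discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not text from the spec or from any NFS implementation: the names `NameCheck`, `sequence_length`, and `check_name` are hypothetical. It applies the structural rules listed above (a lead byte must be followed by the right number of 10xxxxxx bytes, and the stream must not end early) and, optionally, the proposed restriction to 16-bit characters. It does not reject overlong encodings such as 0xC0 0x80, which a stricter checker would also refuse.

```python
from enum import Enum


class NameCheck(Enum):
    """Hypothetical result codes; a server might map both failures to NFS4ERR_INVAL."""
    OK = "ok"
    MALFORMED = "malformed"          # bad lead/continuation byte, or truncated
    BEYOND_16_BIT = "beyond-16-bit"  # well-formed, but code point > 0xFFFF


def sequence_length(lead: int) -> int:
    """Return the sequence length implied by a lead byte, or 0 if it cannot start one."""
    if lead & 0x80 == 0x00:
        return 1  # 0xxxxxxx
    if lead & 0xE0 == 0xC0:
        return 2  # 110xxxxx
    if lead & 0xF0 == 0xE0:
        return 3  # 1110xxxx
    if lead & 0xF8 == 0xF0:
        return 4  # 11110xxx
    if lead & 0xFC == 0xF8:
        return 5  # 111110xx
    if lead & 0xFE == 0xFC:
        return 6  # 1111110x
    return 0      # 10xxxxxx (stray continuation byte), 0xFE or 0xFF


def check_name(data: bytes, limit_16_bit: bool = True) -> NameCheck:
    """Check the byte structure of a name and, optionally, the 16-bit limit."""
    i = 0
    while i < len(data):
        lead = data[i]
        n = sequence_length(lead)
        if n == 0:
            return NameCheck.MALFORMED
        if i + n > len(data):
            return NameCheck.MALFORMED  # the stream ends too early
        # Payload bits carried by the lead byte: 7 for one byte, else 7 - n.
        code_point = lead if n == 1 else lead & (0x7F >> n)
        for byte in data[i + 1:i + n]:
            if byte & 0xC0 != 0x80:
                return NameCheck.MALFORMED  # expected 10xxxxxx
            code_point = (code_point << 6) | (byte & 0x3F)
        if limit_16_bit and code_point > 0xFFFF:
            return NameCheck.BEYOND_16_BIT
        i += n
    return NameCheck.OK
```

With `limit_16_bit=False` the function only applies the structural rules, which corresponds to a server that stores the UTF-8 bytes as they are; with the default it also flags the characters that would not fit a file system storing 16-bit Unicode names.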