Re: Invalid UTF-8 strings


From: Erik Seaberg (erk@flyingcroc.com)
Date: 12/31/01-02:23:40 PM Z


Subject: Re: Invalid UTF-8 strings
From: Erik Seaberg <erk@flyingcroc.com>
Date: 31 Dec 2001 12:23:40 -0800
Message-ID: <86d70vnnoz.fsf@unx51.staff.flyingcroc.net>

Peter Åstrand <peter@cendio.se> writes:

> There exists UTF-8 sequences that are impossible to represent even
> with surrogate pairs.

Can an NFS4 server impose local filesystem rules against client
requests?  If so, NFS4 over a UTF-16 filesystem simply never allows
codepoints beyond U+10FFFF in filenames, just as over a Unix
filesystem it never allows "/" or "\0" in filenames.  RFC 3010 states
that ".." has no special semantics, but that seems meant to forbid
clients from using it in place of LOOKUPP, not to oblige servers to
map each and every UTF-8 string to some locally legal filename.
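
To make the server-side check concrete, here is a rough sketch in
Python.  It assumes the older UTF-8 definition of RFC 2279, where a
sequence may run to six octets and reach U+7FFFFFFF.  The names
decode_rfc2279() and fits_utf16() are made up for illustration and are
not part of any NFS implementation; overlong forms and lone surrogates
are deliberately not checked.

```python
UTF16_MAX = 0x10FFFF  # highest code point reachable with surrogate pairs


def decode_rfc2279(data: bytes) -> list:
    """Decode 1-6 octet UTF-8 (RFC 2279 style) into a list of code points.

    Hypothetical helper: it validates lead and continuation octets only,
    and does not reject overlong encodings.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 0, b
        elif 0xC0 <= b < 0xE0:
            n, cp = 1, b & 0x1F
        elif 0xE0 <= b < 0xF0:
            n, cp = 2, b & 0x0F
        elif 0xF0 <= b < 0xF8:
            n, cp = 3, b & 0x07
        elif 0xF8 <= b < 0xFC:
            n, cp = 4, b & 0x03
        elif 0xFC <= b < 0xFE:
            n, cp = 5, b & 0x01
        else:
            # Stray continuation octet, or 0xFE/0xFF which never appear.
            raise ValueError("invalid lead octet at offset %d" % i)
        tail = data[i + 1:i + 1 + n]
        if len(tail) != n or any(t & 0xC0 != 0x80 for t in tail):
            raise ValueError("bad continuation at offset %d" % i)
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        out.append(cp)
        i += n + 1
    return out


def fits_utf16(name: bytes) -> bool:
    """True if every code point in the name can be stored as UTF-16."""
    return all(cp <= UTF16_MAX for cp in decode_rfc2279(name))
```

A server backed by a UTF-16 filesystem could refuse any name for which
fits_utf16() is false, the same way a Unix-backed server refuses "/".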

> Should a Unix-kernel-client pass the octets given in open()
> unmodified over the wire, to the server?

I suppose the only other option is a libc wrapper that runs the
filename through mbstowcs() in the current locale and then wcstombs()
in a UTF-8 locale, just as Win32's CreateFileW() will have to convert
from UTF-16.  Do ANSI or POSIX have anything to say about I18N of
open() or fopen(), or is the filename arg just assumed to be in some
unknown single-byte encoding?
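
As a stand-in for the libc wrapper idea, the same two-step conversion
can be sketched in Python: decode the octets in the caller's locale
encoding (the mbstowcs() step), then encode as UTF-8 (the wcstombs()
step).  The function name to_wire_name() is hypothetical, and the
sketch assumes the filename really is valid in the locale's encoding.

```python
import locale


def to_wire_name(name: bytes, locale_encoding: str = None) -> bytes:
    """Re-encode a filename from the local encoding to UTF-8.

    Hypothetical helper.  If no encoding is given, the current locale's
    preferred encoding is assumed to be the one the caller used.
    """
    if locale_encoding is None:
        locale_encoding = locale.getpreferredencoding(False)
    # Octets -> abstract characters, analogous to mbstowcs().
    text = name.decode(locale_encoding)
    # Abstract characters -> UTF-8 octets, analogous to wcstombs()
    # running under a UTF-8 locale.
    return text.encode("utf-8")
```

For example, to_wire_name(b"\xc5strand", "iso-8859-1") yields the
UTF-8 octets b"\xc3\x85strand".  A name that is not valid in the
stated encoding raises UnicodeDecodeError instead of being passed
through, which is one possible answer to the "unmodified octets"
question, not the only one.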


