Trash Can Question
hey,

which character encoding is getting used in trash info files? There should be info about it in the specification, in my opinion.
--
Ozan オザン
Close the world, txEn eht nepO
_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg
Re: Trash Can Question
On Wednesday 05 August 2009, Ozan Türkyılmaz wrote:
> hey,
>
> which character encoding is getting used in trash info files? There
> should be info about it in the specification, in my opinion.

The same encoding as the file system. The filename is copied "as is" into the info file.

In practice I would recommend using utf8 everywhere and getting rid of the whole "filesystem encoding" mess in the first place.
--
David Faure, faure@..., sponsored by Qt Software @ Nokia to work on KDE, Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
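A minimal Python sketch of what copying the name "as is" can look like in practice. It assumes the `.trashinfo` layout from the freedesktop.org Trash specification (a `[Trash Info]` group with `Path` and `DeletionDate` keys, where `Path` holds the original bytes escaped as in URLs); the helper name `make_trashinfo` and the sample path are hypothetical. The filename bytes are escaped, never decoded or re-encoded, so no filesystem encoding has to be guessed.

```python
from datetime import datetime
from urllib.parse import quote_from_bytes


def make_trashinfo(original_path: bytes, deleted_at: datetime) -> str:
    """Build .trashinfo text for a file whose name is an arbitrary byte string.

    Hypothetical helper: the path bytes are percent-escaped, never decoded,
    so no character set is assumed for the filename itself.
    """
    # Keep '/' literal; every byte outside the unreserved set becomes %XX.
    escaped = quote_from_bytes(original_path, safe="/")
    # DeletionDate in the YYYY-MM-DDThh:mm:ss form.
    stamp = deleted_at.strftime("%Y-%m-%dT%H:%M:%S")
    return f"[Trash Info]\nPath={escaped}\nDeletionDate={stamp}\n"


# A Latin-1 encoded name ("café.txt"); byte 0xE9 is not valid UTF-8 here.
info = make_trashinfo(b"/home/user/caf\xe9.txt", datetime(2009, 8, 5, 12, 0, 0))
print(info)
```

The escaped line comes out as `Path=/home/user/caf%E9.txt`: plain ASCII that any reader of the file can handle, yet it still records the exact original bytes.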
Re: Trash Can Question
2009/8/5 David Faure <faure@...>
> In practice I would recommend using utf8 everywhere and getting rid
> of the whole "filesystem encoding" mess in the first place.

Who is interested in working on a new draft of the spec that solves this and the other problems that have emerged?
--
Andrea Francia
http://andreafrancia.blogspot.com/
Re: Trash Can Question
On Thursday 06 August 2009, Andrea Francia wrote:
> 2009/8/5 David Faure <faure@...>
> > In practice I would recommend using utf8 everywhere and getting rid
> > of the whole "filesystem encoding" mess in the first place.
>
> Who is interested in working on a new draft of the spec that solves
> this and the other problems that have emerged?

Well, I am. But you'll probably need Alex Larsson on board too, for it to make sense.
Anyway, how do you propose to "solve this"?

David (who also has something to suggest for an updated trash spec: a file containing the size of the trash, as a "cache" of that information; more details once I have time to formalize this a little bit)
Re: Trash Can Question
On Thu, 2009-08-06 at 00:05 +0200, Andrea Francia wrote:
> 2009/8/5 David Faure <faure@...>
> In practice I would recommend using utf8 everywhere and getting rid
> of the whole "filesystem encoding" mess in the first place.
>
> Who is interested in working on a new draft of the spec that solves
> this and the other problems that have emerged?

This is not a "problem" that should be "solved". It was very deliberately added to the spec in order to allow all files to be trashed. How would you trash a file named some non-utf8 string if only utf8 is allowed in the format?

Filenames on linux are zero terminated arrays of bytes. If you treat it like anything else you will just fail in some corner cases.

Of course, we should all move towards all filenames being in UTF8, avoid creating non-UTF8 filenames, etc. That is a different issue, and should not make us limit our specifications to only work on a subset of the valid filenames.
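To make the point about byte arrays concrete, here is a small Python sketch (the helper `is_valid_utf8` and the sample names are hypothetical). All three byte strings are acceptable Linux filename components, since only `/` and the terminating zero byte are special, but only the first is valid UTF-8. A trash format restricted to UTF-8 could not record the other two.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Report whether a raw filename happens to be valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


names = [
    "naïve.txt".encode("utf-8"),    # UTF-8 encoded name
    "naïve.txt".encode("latin-1"),  # Latin-1 encoded: b'na\xefve.txt'
    b"\xff\xfe\x80junk",            # arbitrary bytes, not valid UTF-8
]
for raw in names:
    print(raw, is_valid_utf8(raw))
```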
Re: Trash Can Question
2009/8/21 Alexander Larsson <alexl@...>
> Filenames on linux are zero terminated arrays of bytes. If you treat it
> like anything else you will just fail in some corner cases.

For me, filenames are a list of Unicode characters. How those filenames are represented as arrays of bytes is a different issue.
As far as I know, it is possible to create filenames with the zero character ('\0') or the newline ('\n') in them.

> Of course, we should all move towards all filenames being in UTF8, avoid
> creating non-UTF8 filenames, etc.

This sounds strange to me: UTF-8 is about encoding, not about the character set. Maybe there is a little misunderstanding about UTF-8, Unicode and encoding systems.
It seems to me that you are using the term UTF-8 as a character set. I see two different aspects:
1) Which character set should the trash system be able to handle?
2) How should the trash system handle it?
I think that the trash system should be able to manage filenames and paths expressed in Unicode. One way to encode Unicode characters is UTF-8, but there are also UTF-16 and others.
I don't see any problem with filesystems whose filenames aren't encoded in UTF-8. All the pre-Unicode character sets are part of Unicode, and every Unicode character can be represented in UTF-8.

> That is a different issue, and should
> not make us limit our specifications to only work on a subset of the
> valid filenames.

That's true, but currently I see the following problems:
- the subset of valid filenames doesn't contain filenames with '\n' or '\0' in them
- it isn't clear (perhaps only to me) which encoding should be used for reading .trashinfo files
- using a character set like latin1 to encode the contents of .trashinfo files could lead to a loss of information
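On the first point in that list: percent-escaping, which the Trash specification uses for the `Path` value, turns both `'\n'` and `'\0'` into plain ASCII escapes. The Python sketch below (sample names are hypothetical) checks that the escaped form contains no raw line break, so it cannot break the line-oriented `key=value` layout, and that unescaping restores the exact bytes. Whether a zero byte can occur in a real filename at all is a separate question.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

for raw in (b"first\nsecond", b"nul\x00inside"):
    escaped = quote_from_bytes(raw)
    # The escaped value is plain ASCII with no newline or zero byte in it.
    assert "\n" not in escaped and "\x00" not in escaped
    # Unescaping gives back exactly the original bytes.
    assert unquote_to_bytes(escaped) == raw
    print(escaped)  # first%0Asecond, then nul%00inside
```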
Re: Trash Can Question
On Sat, Aug 22, 2009 at 3:23 AM, Andrea Francia <andrea@...> wrote:
> 2009/8/21 Alexander Larsson <alexl@...>
>> On Thu, 2009-08-06 at 00:05 +0200, Andrea Francia wrote:
>> > 2009/8/5 David Faure <faure@...>
>> > In practice I would recommend using utf8 everywhere and
>> > getting rid
>> > of the whole "filesystem encoding" mess in the first place.
>> >
>> > Who is interested in working on a new draft of the spec that
>> > solves this and the other problems that have emerged?
>>
>> This is not a "problem" that should be "solved". It was very
>> deliberately added to the spec in order to allow all files to be
>> trashed. How would you trash a file named some non-utf8 string if only
>> utf8 is allowed in the format?
>>
>> Filenames on linux are zero terminated arrays of bytes. If you treat it
>> like anything else you will just fail in some corner cases.
>
> For me, filenames are a list of Unicode characters. How those filenames
> are represented as arrays of bytes is a different issue.
> As far as I know, it is possible to create filenames with the zero
> character ('\0') or the newline ('\n') in them.
>>
>> Of course, we should all move towards all filenames being in UTF8, avoid
>> creating non-UTF8 filenames, etc.

This is not a real solution if you're going to support remote filesystems.
On a local machine, you can use any filename encoding you want. The remote servers, however, sometimes cannot be completely migrated to UTF-8. So unless your VFS implementation can convert the encodings and only show UTF-8 to applications, handling non-UTF-8 will always be needed.

> This sounds strange to me: UTF-8 is about encoding, not about the character
> set. Maybe there is a little misunderstanding about UTF-8, Unicode and
> encoding systems.
> It seems to me that you are using the term UTF-8 as a character set.
> I see two different aspects:
> 1) Which character set should the trash system be able to handle?
> 2) How should the trash system handle it?
> I think that the trash system should be able to manage filenames and paths
> expressed in Unicode.
> One way to encode Unicode characters is UTF-8, but there are also UTF-16
> and others.
> I don't see any problem with filesystems whose filenames aren't encoded in
> UTF-8.
> All the pre-Unicode character sets are part of Unicode, and every Unicode
> character can be represented in UTF-8.

Most of the pre-Unicode characters are defined in Unicode, but some of them are not. So if you convert everything to UTF-8, some data might be lost.

>> That is a different issue, and should
>> not make us limit our specifications to only work on a subset of the
>> valid filenames.
> That's true, but currently I see the following problems:
> - the subset of valid filenames doesn't contain filenames with '\n' or
> '\0' in them

If the stored path is URL-encoded, this is not an issue, since they will be escaped. Besides, even if the filesystem supported using \0 inside a filename, I don't believe there is a real-world file manager able to handle it. In theory this should be allowed, but it's an extremely rare use case that doesn't exist in the real world.

> - it isn't clear (perhaps only to me) which encoding should be used for
> reading .trashinfo files
> - using a character set like latin1 to encode the contents of .trashinfo
> files could lead to a loss of information
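A Python sketch of a closely related kind of loss. It does not reproduce the unmapped legacy-character case described above; instead it shows bytes that simply do not decode as UTF-8. The sample names and the helper `to_utf8_lossy` are hypothetical. Forcing such names into UTF-8 text with replacement characters cannot be undone, and two different files can even end up with the same converted name.

```python
# Hypothetical filenames whose bytes are not valid UTF-8.
original = b"report-\xa4\xe5.txt"
other = b"report-\xff\xfe.txt"


def to_utf8_lossy(raw: bytes) -> str:
    """Force raw bytes into text, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Re-encoding the forced text does not give back the original bytes...
print(to_utf8_lossy(original).encode("utf-8") == original)  # False
# ...and two different files collapse to the same converted name.
print(to_utf8_lossy(original) == to_utf8_lossy(other))      # True
```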
Re: Trash Can Question
On Fri, 2009-08-21 at 21:23 +0200, Andrea Francia wrote:
> 2009/8/21 Alexander Larsson <alexl@...>
> > On Thu, 2009-08-06 at 00:05 +0200, Andrea Francia wrote:
> > > 2009/8/5 David Faure <faure@...>
> > > In practice I would recommend using utf8 everywhere and
> > > getting rid
> > > of the whole "filesystem encoding" mess in the first place.
> > >
> > > Who is interested in working on a new draft of the spec that
> > > solves this and the other problems that have emerged?
> >
> > This is not a "problem" that should be "solved". It was very
> > deliberately added to the spec in order to allow all files to be
> > trashed. How would you trash a file named some non-utf8 string if only
> > utf8 is allowed in the format?
> >
> > Filenames on linux are zero terminated arrays of bytes. If you treat it
> > like anything else you will just fail in some corner cases.
>
> For me, filenames are a list of Unicode characters. How those filenames
> are represented as arrays of bytes is a different issue.
> As far as I know, it is possible to create filenames with the zero
> character ('\0') or the newline ('\n') in them.

I don't know what operating system you are running, but I'm running UNIX, and the UNIX APIs define filenames as a list of bytes, terminated by a zero byte, where only byte 47 ("/" in ASCII) is treated specially. What you believe filenames are does not really affect things. On the filesystem a file has a name consisting of an array of bytes, and only if you can give this exact array of bytes can you open the file. There is no implicit encoding or character set; there are only bytes.

A filename may be a string of bytes that is not valid in any existing encoding of any existing character set. You still need to be able to represent this in the trashinfo file. And even if the filename *is* in some valid encoding, you don't know which one it is. The various locale settings can give you a hint about what it may be, and should be used when you *create* a new filename from a known Unicode string. However, what's on the disk is what's on the disk and has no strict connection to Unicode.

> > Of course, we should all move towards all filenames being in UTF8, avoid
> > creating non-UTF8 filenames, etc.
>
> This sounds strange to me: UTF-8 is about encoding, not about the character
> set. Maybe there is a little misunderstanding about UTF-8, Unicode and
> encoding systems.

I'm not confused about this.

> It seems to me that you are using the term UTF-8 as a character set.
>
> I see two different aspects:
> 1) Which character set should the trash system be able to handle?

The trash system is not about handling any character set at all. It is about storing the identifier the operating system uses to access the file. With anything else, there are valid files that the trash system wouldn't handle.

> 2) How should the trash system handle it?
>
> I think that the trash system should be able to manage filenames and
> paths expressed in Unicode.
> One way to encode Unicode characters is UTF-8, but there are also UTF-16
> and others.

So, say you have a filename that is basically a random set of bytes, not valid UTF-8, not valid UTF-16. It's e.g. valid Latin-1, because all byte strings are, but if you view it as Latin-1 it's full of unprintable characters and gobbledygook. This is a valid filename in UNIX, and you must give exactly this array of bytes in order to access the file (to open it, rename it, delete it, etc.). How do you propose to "encode this in UTF-8"?

> I don't see any problem with filesystems whose filenames aren't encoded
> in UTF-8.
> All the pre-Unicode character sets are part of Unicode, and every Unicode
> character can be represented in UTF-8.
>
> > That is a different issue, and should
> > not make us limit our specifications to only work on a subset of the
> > valid filenames.
>
> That's true, but currently I see the following problems:
> - the subset of valid filenames doesn't contain filenames with '\n'
> or '\0' in them

Oh, the string itself in the file is URI-style encoded, so it can contain any sort of bytes. (In fact, it can even contain encoded \0s, I guess, but that is unlikely to work well, as you can't e.g. pass such a filename to the kernel, since it thinks supplied filenames stop at the first zero.)

> - it isn't clear (perhaps only to me) which encoding should be used
> for reading .trashinfo files

In general, utf8 is the character set of the whole file, but the name key, when unescaped, is to be treated as an array of bytes representing the actual filename, which may be in any encoding or none.

> - using a character set like latin1 to encode the contents of .trashinfo
> files could lead to a loss of information

Since the filename is URI-encoded, we can store whatever bytes we want in the filename. However, once it is decoded we can't expect it to be in any encoding or character set, because that would, as you say, lead to a loss of information.
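The reading rule described above (the file is UTF-8 text, but the unescaped `Path` value is raw bytes) could be sketched in Python roughly as follows. The helper `read_trashinfo_path` is hypothetical and deliberately minimal: it only looks for `Path=` inside the `[Trash Info]` group and ignores every other key.

```python
from urllib.parse import unquote_to_bytes


def read_trashinfo_path(text: str) -> bytes:
    """Return the original path from .trashinfo contents as raw bytes.

    The caller reads the file as UTF-8 text; only the Path value is
    unescaped, and the result is kept as bytes rather than decoded.
    """
    in_group = False
    for line in text.splitlines():
        if line.startswith("[") and line.endswith("]"):
            # Track whether we are inside the [Trash Info] group.
            in_group = line == "[Trash Info]"
        elif in_group and line.startswith("Path="):
            return unquote_to_bytes(line[len("Path="):])
    raise ValueError("no Path key in [Trash Info] group")


sample = "[Trash Info]\nPath=/home/user/caf%E9.txt\nDeletionDate=2009-08-05T12:00:00\n"
path = read_trashinfo_path(sample)
print(path)  # b'/home/user/caf\xe9.txt'
```

On POSIX, the resulting bytes can be passed straight to functions such as `open()` or `os.rename()`, which accept bytes paths. The name is then used exactly as stored, with no guess about its encoding.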
Re: Trash Can Question
On Sat, 2009-08-22 at 08:54 +0800, PCMan wrote:
> On Sat, Aug 22, 2009 at 3:23 AM, Andrea Francia <andrea@...> wrote:
> >>
> >> Of course, we should all move towards all filenames being in UTF8, avoid
> >> creating non-UTF8 filenames, etc.
> This is not a real solution if you're going to support remote filesystems.
> On a local machine, you can use any filename encoding you want.
> The remote servers, however, sometimes cannot be completely migrated to UTF-8.
> So unless your VFS implementation can convert the encodings and only
> show UTF-8 to applications, handling non-UTF-8 will always be needed.

I'm of course talking about local files here. Obviously you have to treat remote systems differently; exactly how depends on the remote filesystem. In gvfs, for instance, this is handled by all filenames being strings of bytes in an undefined encoding, but there are backend-implemented ways to get the "display name" of a file (and back), so that you can display it in a user interface.

> Besides, even if the filesystem supported using \0 inside a filename, I don't
> believe there is a real-world file manager able to handle it. In
> theory this should be allowed, but it's an extremely rare use case
> that doesn't exist in the real world.

How could this be allowed? How would you pass such a filename to the filesystem via e.g. the open() API? Any pathname passed via the POSIX APIs is a "C string", i.e. it ends at the first zero byte passed in.
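The C-string limitation shows up directly in Python: `os.open` wraps the POSIX `open()` call, and CPython rejects any path containing a zero byte with `ValueError` before making the call. The helper `try_open` below is hypothetical and only reports what happened.

```python
import os


def try_open(path: bytes) -> str:
    """Try to open a path read-only and describe the outcome."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except ValueError as exc:
        # CPython refuses the path before calling open(2): a C string would
        # end at the first zero byte, so the full name cannot be passed on.
        return f"rejected: {exc}"
    except OSError as exc:
        # The name reached the kernel, which reported an error (e.g. ENOENT).
        return f"OS error: {exc.strerror}"
    os.close(fd)
    return "opened"


print(try_open(b"/tmp/before\x00after"))  # a "rejected: ..." message
```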
Re: Trash Can Question
I reviewed the messages from Larsson and PCMan, and I think I was wrong about what constitutes the identifier of a file in a filesystem. Thanks to you both for helping me revise my opinions.
There are also some things I don't get yet; I'll write about my doubts after I have reviewed some more documentation.
2009/8/22 Alexander Larsson <alexl@...>