Content string encoding?

by Murray Cumming

We are wrapping the g_content_type_* functions for giomm, and have a
question:

Can/must the content strings here be UTF-8, or are they a blob of data
of unknown encoding (a bit like a URI)
http://library.gnome.org/devel/gio/unstable/gio-GContentType.html
 
--
murrayc@...
www.murrayc.com
www.openismus.com

_______________________________________________
gnome-vfs-list mailing list
gnome-vfs-list@...
http://mail.gnome.org/mailman/listinfo/gnome-vfs-list

Re: Content string encoding?

by Alexander Larsson

On Fri, 2008-02-01 at 01:01 +0100, Murray Cumming wrote:
> We are wrapping the g_content_type_* functions for giomm, and have a
> question:
>
> Can/must the content strings here be UTF-8, or are they a blob of data
> of unknown encoding (a bit like a URI)
> http://library.gnome.org/devel/gio/unstable/gio-GContentType.html

I'm not sure. I mean, on unix they are mimetypes, and on windows they
are extension strings like ".doc", "audio", "*". Both of these will in
practice be ASCII strings in all cases, but I don't think there is
anything prohibiting e.g. adding a non-ascii type in the windows
registry which then could be returned to the app via gio.

For unix the source of mime-types is the freedesktop shared mime spec,
and its files are defined in utf8, so all unix mimetypes should be utf8.
Maybe we can say that the content type must be utf8, and then we filter
out those who are not (in practice none).
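
To illustrate the "require UTF-8 and filter out the rest" idea, here is a minimal standalone sketch in plain C. The names content_type_is_utf8() and filter_content_types() are hypothetical and are not gio API; inside GLib itself the check would more naturally be done with g_utf8_validate(). The validator rejects overlong forms, surrogates and values above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not gio API): returns true if the NUL-terminated
 * string s is well-formed UTF-8. */
bool content_type_is_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned int cp;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return false;   /* stray continuation or invalid lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return false;   /* truncated or malformed sequence */
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) ||
            cp > 0x10FFFF)
            return false;
    }
    return true;
}

/* Hypothetical helper: compacts the array in place, keeping only the
 * entries that are valid UTF-8, and returns the new count. */
size_t filter_content_types(const char **types, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (content_type_is_utf8(types[i]))
            types[kept++] = types[i];
    return kept;
}
```

Since pure ASCII is always valid UTF-8, strings such as "text/plain" or ".doc" pass unchanged; only a type containing stray non-UTF-8 bytes would be dropped.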

Also, URIs are not undefined, they are a limited subset of ASCII. If any
non-ascii character is unescaped in the URI it is invalid (by the spec).
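
The point about URIs can be sketched as a simple byte check. The function name uri_is_plain_ascii() is hypothetical, and the check is deliberately narrow: it only verifies that no raw non-ASCII byte, control character or space appears, which is a necessary condition and not a full validation of URI syntax.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if every byte of the URI is a
 * visible ASCII character (0x21..0x7E). Anything else would have to be
 * percent-escaped to appear in a valid URI. */
bool uri_is_plain_ascii(const char *uri)
{
    for (const unsigned char *p = (const unsigned char *)uri; *p; p++)
        if (*p <= 0x20 || *p >= 0x7F)
            return false;
    return true;
}
```

For example, "file:///tmp/caf%C3%A9.txt" passes, while the same path with the two raw UTF-8 bytes for "é" in place of the escapes does not.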


Re: Content string encoding?

by Murray Cumming

On Fri, 2008-02-01 at 09:54 +0100, Alexander Larsson wrote:

> On Fri, 2008-02-01 at 01:01 +0100, Murray Cumming wrote:
> > We are wrapping the g_content_type_* functions for giomm, and have a
> > question:
> >
> > Can/must the content strings here be UTF-8, or are they a blob of data
> > of unknown encoding (a bit like a URI)
> > http://library.gnome.org/devel/gio/unstable/gio-GContentType.html
>
> I'm not sure. I mean, on unix they are mimetypes, and on windows they
> are extension strings like ".doc", "audio", "*". Both of these will in
> practice be ASCII strings in all cases, but I don't think there is
> anything prohibiting e.g. adding a non-ascii type in the windows
> registry which then could be returned to the app via gio.

Could that mean that they are ever some odd encoding such as UCS2, which
would not be UTF-8?

> For unix the source of mime-types is the freedesktop shared mime spec,
> and its files are defined in utf8, so all unix mimetypes should be utf8.
> Maybe we can say that the content type must be utf8, and then we filter
> out those who are not (in practice none).
>
> Also, URIs are not undefined, they are a limited subset of ASCII. If any
> non-ascii character is unescaped in the URI it is invalid (by the spec).

--
murrayc@...
www.murrayc.com
www.openismus.com


Re: Content string encoding?

by Alexander Larsson

On Fri, 2008-02-01 at 17:36 +0100, Murray Cumming wrote:

> On Fri, 2008-02-01 at 09:54 +0100, Alexander Larsson wrote:
> > On Fri, 2008-02-01 at 01:01 +0100, Murray Cumming wrote:
> > > We are wrapping the g_content_type_* functions for giomm, and have a
> > > question:
> > >
> > > Can/must the content strings here be UTF-8, or are they a blob of data
> > > of unknown encoding (a bit like a URI)
> > > http://library.gnome.org/devel/gio/unstable/gio-GContentType.html
> >
> > I'm not sure. I mean, on unix they are mimetypes, and on windows they
> > are extension strings like ".doc", "audio", "*". Both of these will in
> > practice be ASCII strings in all cases, but I don't think there is
> > anything prohibiting e.g. adding a non-ascii type in the windows
> > registry which then could be returned to the app via gio.
>
> Could that mean that they are ever some odd encoding such as UCS2, which
> would not be UTF-8?

Actually, looking at the win32 code, the registry strings are stored as
Unicode and we always convert them to UTF-8 in libgio.
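
As an illustration of that conversion step, here is a minimal standalone sketch in plain C, assuming the registry strings arrive as NUL-terminated UTF-16 (which is what the Windows wide-character APIs return). The function utf16_to_utf8() is hypothetical and is not the code libgio uses; GLib provides g_utf16_to_utf8() for this purpose.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts NUL-terminated UTF-16 to NUL-terminated
 * UTF-8. Returns the number of bytes written (excluding the NUL), or -1
 * on an unpaired surrogate or if the output buffer is too small. */
long utf16_to_utf8(const uint16_t *in, char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0)
        return -1;

    while (*in) {
        uint32_t cp = *in++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (*in < 0xDC00 || *in > 0xDFFF)
                return -1;                           /* unpaired */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                               /* lone low surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= out_size)
            return -1;                               /* keep room for NUL */

        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    out[o] = '\0';
    return (long)o;
}
```

With a conversion like this at the boundary, even a non-ASCII type added to the registry would reach the application as UTF-8, so a wrapper such as giomm could treat content type strings as UTF-8 on both platforms.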
