multipart/form-data filename encoding: unicode and special characters

I am not an experienced web standards wonk, so please forgive me if I'm making a mistake here.

When uploading files that contain special characters in their name, it appears to me that it is unspecified as to how those file names should be escaped. As a result, Webkit/Safari/Chrome appear to handle these filenames in one way, while Firefox handles them in another. I'm implementing the server side of this equation, and it is unclear to me what I should be doing. Am I missing something? Webkit even has a bug on this issue that states "I suggest working with WHATWG or HTML WG to get something specified in HTML5, and getting browsers converge on that." Is anyone working on this?


EXAMPLE

Create a file named: bàz'\"hi%22.txt
e.g. using the unix command: touch bàz\'\\\"hi%22.txt


Firefox (13.0 beta on Mac) sends the following header, backslash escaping the double quote but not escaping the backslash.

Content-Disposition: form-data; name="somefile"; filename="bàz'\\"hi%22.txt"


Webkit (latest nightly r115711 on Mac): %-escapes the double quote, but does nothing to the literal %

Content-Disposition: form-data; name="somefile"; filename="bàz'\%22hi%22.txt"


THE SPECS: HTML5 states:

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data

Encode the (now mutated) form data set using the rules described by RFC 2388. […] File names […] must use the character encoding selected above, though the precise name may be approximated if necessary (e.g. […]). User agents must not use the RFC 2231 encoding suggested by RFC 2388.


… this seems contradictory: Encode using RFC 2388, but do not use the encoding suggested by the RFC. Worse, no browser actually follows the RFC (e.g. they all use UTF-8 encoded parameter values), so that doesn't seem like the right answer. Is there a way out of this mess?

Evan

--
http://evanjones.ca/
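To make the two behaviours concrete, here is a minimal Python sketch that reproduces the two headers quoted in the message above. The function names are hypothetical, and the escaping rules are inferred only from the single test file in that message, not from either browser's source code, so treat them as assumptions.

```python
def firefox_style(filename: str) -> str:
    """Assumed rule: backslash-escape double quotes, leave backslashes alone."""
    return filename.replace('"', '\\"')


def webkit_style(filename: str) -> str:
    """Assumed rule: replace double quotes with %22, leave literal % alone."""
    return filename.replace('"', "%22")


def content_disposition(escaped_filename: str) -> str:
    """Build the part header in the shape shown in the message."""
    return (
        'Content-Disposition: form-data; name="somefile"; '
        f'filename="{escaped_filename}"'
    )


# The test file name from the message: b à z ' \ " h i % 2 2 . t x t
name = 'bàz\'\\"hi%22.txt'

print(content_disposition(firefox_style(name)))
print(content_disposition(webkit_style(name)))
```

A server that receives either header cannot tell from the header alone which rule was applied, which is the crux of the question.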

Re: multipart/form-data filename encoding: unicode and special characters

On Tue, 2012-05-01 at 21:12 -0400, Evan Jones wrote:
> I am not an experienced web standards wonk, so please forgive me if I'm making a mistake here.
>
> When uploading files that contain special characters in their name, it appears to me that it is unspecified as to how those file names should be escaped. As a result, Webkit/Safari/Chrome appear to handle these filenames in one way, while Firefox handles them in another. I'm implementing the server side of this equation, and it is unclear to me what I should be doing. Am I missing something? Webkit even has a bug on this issue that states "I suggest working with WHATWG or HTML WG to get something specified in HTML5, and getting browsers converge on that." Is anyone working on this?
>
>
> EXAMPLE
>
> Create a file named: bàz'\"hi%22.txt
> e.g. using the unix command: touch bàz\'\\\"hi%22.txt
>
>
> Firefox (13.0 beta on Mac) sends the following header, backslash escaping the double quote but not escaping the backslash.
>
> Content-Disposition: form-data; name="somefile"; filename="bàz'\\"hi%22.txt"
>
>
> Webkit (latest nightly r115711 on Mac): %-escapes the double quote, but does nothing to the literal %
>
> Content-Disposition: form-data; name="somefile"; filename="bàz'\%22hi%22.txt"
>
>
> THE SPECS: HTML5 states:
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data
>
> Encode the (now mutated) form data set using the rules described by RFC 2388. […] File names […] must use the character encoding selected above, though the precise name may be approximated if necessary (e.g. […]). User agents must not use the RFC 2231 encoding suggested by RFC 2388.
>
>
> … this seems contradictory: Encode using RFC 2388, but do not use the encoding suggested by the RFC. Worse, no browser actually follows the RFC (e.g. they all use UTF-8 encoded parameter values), so that doesn't seem like the right answer. Is there a way out of this mess?
>
> Evan
>
> --
> http://evanjones.ca/
>

Although this test case does show an issue, I would question what real problem it is likely to cause. It uses several characters that are considered unsafe on the file systems of the most popular operating system (Windows, with either NTFS or FAT32), so by association users of other operating systems are probably (even unconsciously) avoiding those characters too, purely for interoperability reasons.

The Webkit method looks the better of the two with regards to how server-side languages might interpret it, but it would need work to ensure everything that should be escaped is, and that everything that is unescaped on the server should be and is done so correctly.

--
Thanks,
Ash
http://www.ashleysheridan.co.uk

Re: multipart/form-data filename encoding: unicode and special characters

On May 1, 2012, at 22:38 , Ashley Sheridan wrote:
> The Webkit method looks the better of the two with regards to how
> server-side languages might interpret it, but it would need work to
> ensure everything that should be escaped is, and that everything that is
> unescaped on the server should be and is done so correctly.

The problem is that currently I am unable to correctly "round trip" an uploaded file name. I would like users to upload a file, and be able to later download the file with the *exact same* file name. If you follow the specifications, this is not possible. Firefox is closer to the MIME RFCs (which specify backslash quoting in quoted-strings), but apparently that will break IE6, 7, and 8:

https://bugs.webkit.org/show_bug.cgi?id=62107
http://java.net/jira/browse/JERSEY-759

Webkit's %-escaping behaviour is *not* part of the referenced MIME RFCs (which specify either backslash quoting in quoted-strings, base64 encoding, or %-escaping in special "filename*=" arguments). Thus, if this is the "right answer," it should be specified somewhere. I'm assuming that this needs to be in the HTML5 spec, since HTTP calls this the "body" of the POST and declares that it is outside the HTTP specification.

Webkit's escaping is also flawed (see bug 62107 above). Files that contain %-escapes (e.g. my%22file.txt, admittedly very rare) will get mangled, because there is no difference between my%22file.txt and my"file.txt.

Currently, I need to detect the browser in order to figure out what kind of unescaping to apply to the file name, and even then in some cases I can't figure out what the right file name is. Webkit claims this is a specification bug, so I'm hoping someone here might tell me if this is the case, and if so where can I file bugs, create test cases, etc?

Evan

--
http://evanjones.ca/
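The mangling described above can be shown in a few lines of Python. This is only a sketch: it assumes the Webkit behaviour is exactly "replace each double quote with %22 and leave % alone", as observed in the test case earlier in the thread, and the helper name is made up for illustration.

```python
def webkit_style_escape(filename: str) -> str:
    """Assumed Webkit rule: '"' becomes %22, a literal '%' is left untouched."""
    return filename.replace('"', "%22")


literal_percent = "my%22file.txt"   # the name really contains the characters % 2 2
literal_quote = 'my"file.txt'       # the name really contains a double quote

wire_a = webkit_style_escape(literal_percent)
wire_b = webkit_style_escape(literal_quote)

# Two different file names collapse to one value on the wire, so no
# server-side unescaping rule can recover both of them.
print(wire_a == wire_b)
```

Both names are sent as my%22file.txt, so whichever way the server chooses to unescape, one of the two uploads comes back with the wrong name.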

Re: multipart/form-data filename encoding: unicode and special characters

On 2012-05-02 13:05, Evan Jones wrote:
> On May 1, 2012, at 22:38 , Ashley Sheridan wrote:
>> The Webkit method looks the better of the two with regards to how
>> server-side languages might interpret it, but it would need work to
>> ensure everything that should be escaped is, and that everything that is
>> unescaped on the server should be and is done so correctly.
>
> The problem is that currently I am unable to correctly "round trip" an uploaded file name. I would like users to upload a file, and be able to later download the file with the *exact same* file name. If you follow the specifications, this is not possible. Firefox is closer to the MIME RFCs (which specify backslash quoting in quoted-strings), but apparently that will break IE6, 7, and 8:
>
> https://bugs.webkit.org/show_bug.cgi?id=62107
> http://java.net/jira/browse/JERSEY-759
>
> Webkit's %-escaping behaviour is *not* part of the referenced MIME RFCs (which specify either backslash quoting in quoted-strings, base64 encoding, or %-escaping in special "filename*=" arguments). Thus, if this is the "right answer," it should be specified somewhere. I'm assuming that this needs to be in the HTML5 spec, since HTTP calls this the "body" of the POST and declares that it is outside the HTTP specification.
>
> Webkit's escaping is also flawed (see bug 62107 above). Files that contain %-escapes (e.g. my%22file.txt, admittedly very rare) will get mangled, because there is no difference between my%22file.txt and my"file.txt.
>
> Currently, I need to detect the browser in order to figure out what kind of unescaping to apply to the file name, and even then in some cases I can't figure out what the right file name is. Webkit claims this is a specification bug, so I'm hoping someone here might tell me if this is the case, and if so where can I file bugs, create test cases, etc?
>
> Evan
>
> --
> http://evanjones.ca/

I did spend a considerable amount of time with Content-Disposition, the *response* header field (resulting in RFC 6266 and <http://greenbytes.de/tech/tc2231/>). However, this has little to do with the representation in form uploads.

If browser implementers want to try something new that will not affect the old code paths, supporting the encoding defined in RFC 5987 might be the right thing to do (yes, it's ugly, but it's unambiguous).

Best regards, Julian
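As a rough illustration of what the RFC 5987 form would look like for the test file name from this thread, here is a small Python sketch. The function names are hypothetical, only the UTF-8 case with an empty language tag is handled on the encoding side, and nothing here reflects what any browser actually sends in form uploads.

```python
from urllib.parse import quote, unquote

# Characters RFC 5987 allows unescaped (attr-char) beyond the letters, digits
# and "-._~" that urllib.parse.quote never escapes.
ATTR_CHAR_EXTRA = "!#$&+^`|"


def encode_ext_value(filename: str) -> str:
    """Encode a name as an RFC 5987 ext-value: charset'language'pct-encoded."""
    return "UTF-8''" + quote(filename, safe=ATTR_CHAR_EXTRA, encoding="utf-8")


def decode_ext_value(value: str) -> str:
    """Reverse encode_ext_value; the language tag is ignored."""
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict")


name = 'bàz\'\\"hi%22.txt'
wire = encode_ext_value(name)
print("filename*=" + wire)
print(decode_ext_value(wire) == name)
```

Because the literal % is itself escaped as %25, names such as my%22file.txt and my"file.txt stay distinct, which is the sense in which this form is unambiguous.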

Re: multipart/form-data filename encoding: unicode and special characters

On May 2, 2012, at 7:43 , Julian Reschke wrote:
> If browser implementers want to try something new that will not affect the old code paths, supporting the encoding defined in RFC 5987 might be the right thing to do (yes, it's ugly, but it's unambiguous).

It seems to me like that is a potential solution that could be evaluated. It would be nice to have both the HTTP response header and the POST form encoding be the same. However, a critical question is whether the server software that parses the form headers would do the "right thing" if it sees both an ASCII fallback filename= and an escaped filename*= parameter in the Content-Disposition header. Without looking at any code, I suspect some will and some won't.

My conclusion: I would be willing to help with bugs, testing, test cases, looking at server code, etc. related to this issue. However, I believe someone who is experienced with the technology and politics of web standards needs to really champion any change, because I don't fully understand the processes or the issues. If I don't hear anything in a few days, I'll try filing some additional bugs with Webkit, Firefox, and the HTML5 spec and otherwise give up.

Thanks,

Evan Jones

--
http://evanjones.ca/

Re: multipart/form-data filename encoding: unicode and special characters

On 2012-05-02 19:26, Evan Jones wrote:
> On May 2, 2012, at 7:43 , Julian Reschke wrote:
>> If browser implementers want to try something new that will not affect the old code paths, supporting the encoding defined in RFC 5987 might be the right thing to do (yes, it's ugly, but it's unambiguous).
>
> It seems to me like that is a potential solution that could be evaluated. It would be nice to have both the HTTP response header and the POST form encoding be the same. However, a critical question is whether the server software that parses the form headers would do the "right thing" if it sees both an ASCII fallback filename= and an escaped filename*= parameter in the Content-Disposition header. Without looking at any code, I suspect some will and some won't.

I'm pretty sure everybody will ignore filename* for now. Which means servers need to upgrade, but at least it would be an upgrade that doesn't break any existing behavior.

> My conclusion: I would be willing to help with bugs, testing, test cases, looking at server code, etc. related to this issue. However, I believe someone who is experienced with the technology and politics of web standards needs to really champion any change, because I don't fully understand the processes or the issues. If I don't hear anything in a few days, I'll try filing some additional bugs with Webkit, Firefox, and the HTML5 spec and otherwise give up.
> ...

Sounds like a plan.

Best regards, Julian
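For the "upgraded server" case discussed above, the selection logic might look roughly like the following Python sketch. It assumes the Content-Disposition parameters of a part have already been split into a dict by some other code; `pick_filename` and `decode_ext_value` are hypothetical names, not part of any existing server framework.

```python
from typing import Dict, Optional
from urllib.parse import unquote


def decode_ext_value(value: str) -> str:
    """Decode an RFC 5987 ext-value of the form charset'language'pct-encoded."""
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict")


def pick_filename(params: Dict[str, str]) -> Optional[str]:
    """Prefer filename* when it decodes cleanly, else fall back to filename."""
    extended = params.get("filename*")
    if extended is not None:
        try:
            return decode_ext_value(extended)
        except (ValueError, LookupError):
            # Malformed ext-value, undecodable bytes, or unknown charset:
            # behave like a server that never looked at filename*.
            pass
    return params.get("filename")


print(pick_filename({"filename": "baz.txt", "filename*": "UTF-8''b%C3%A0z.txt"}))
print(pick_filename({"filename": "baz.txt"}))
```

A server that has not been upgraded simply never reads the `filename*` key and keeps using `filename`, which is why adding the new parameter would not break existing behaviour on that side.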

Re: multipart/form-data filename encoding: unicode and special characters

On Tue, 01 May 2012 18:12:36 -0700, Evan Jones <evanj@...> wrote:
> … this seems contradictory: Encode using RFC 2388, but do not use the
> encoding suggested by the RFC. Worse, no browser actually follows the
> RFC (e.g. they all use UTF-8 encoded parameter values), so that doesn't
> seem like the right answer. Is there a way out of this mess?

Yes. I think we should define multipart/form-data directly in HTML and thereby obsolete http://tools.ietf.org/html/rfc2388 as it is outdated and not maintained.

--
Anne van Kesteren
http://annevankesteren.nl/

Re: multipart/form-data filename encoding: unicode and special characters

On May 3, 2012, at 17:09 , Anne van Kesteren wrote:
> Yes. I think we should define multipart/form-data directly in HTML and thereby obsolete http://tools.ietf.org/html/rfc2388 as it is outdated and not maintained.

Right; that would be ideal. Despite the fact that HTML5 references that RFC, browsers don't really follow it.

I would be interested in trying to help with this, but again I would certainly need some guidance from people who know more about the vagaries of how the various browsers encode their form parameters / uploaded file names, and why things got that way. It probably would not be helpful for me to try to draft an update to the spec without getting the right implementers on board.

Evan

--
http://evanjones.ca/

Re: multipart/form-data filename encoding: unicode and special characters

On Thu, 3 May 2012, Evan Jones wrote:
> On May 3, 2012, at 17:09 , Anne van Kesteren wrote:
> >
> > Yes. I think we should define multipart/form-data directly in HTML and
> > thereby obsolete http://tools.ietf.org/html/rfc2388 as it is outdated
> > and not maintained.
>
> Right; that would be ideal. Despite the fact that HTML5 references that
> RFC, browsers don't really follow it.
>
> I would be interested in trying to help with this, but again I would
> certainly need some guidance from people who know more about the
> vagaries of how the various browsers encode their form parameters /
> uploaded file names, and why things got that way. It probably would not
> be helpful for me to try to draft an update to the spec without getting
> the right implementers on board.

If this is still something for which you have some time available, then the starting point for anything like this would be test cases, lots and lots of test cases. In this case, it would have to be something like a server that echoes the precise bytes sent by the client, for a huge variety of different setups:

 - various submission encodings
 - various form field names and types
 - various file submission filenames

...etc.

I'd be happy to advise if this is something that still interests you.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
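A server of the kind described above could be sketched with nothing more than the Python standard library. This is a minimal, assumed design rather than an agreed test harness: `EchoHandler` and `make_server` are made-up names, only POST is handled, and only the request body (which contains the multipart part headers, including the Content-Disposition filename) is echoed.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Send the request body back unchanged, byte for byte."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        # octet-stream so the client does not reinterpret or re-decode the bytes
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the console quiet; the echoed bytes are the interesting output.
        pass


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS choose a free port."""
    return HTTPServer(("127.0.0.1", port), EchoHandler)
```

To try it, call `make_server(8000).serve_forever()` and point a form with `enctype="multipart/form-data"` and a file input at http://127.0.0.1:8000/. Saving the response for each browser, page encoding, field name and file name gives the raw material for the test cases listed above.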

Re: multipart/form-data filename encoding: unicode and special characters

On 2012-07-09 23:01, Ian Hickson wrote:
> On Thu, 3 May 2012, Evan Jones wrote:
>> On May 3, 2012, at 17:09 , Anne van Kesteren wrote:
>>>
>>> Yes. I think we should define multipart/form-data directly in HTML and
>>> thereby obsolete http://tools.ietf.org/html/rfc2388 as it is outdated
>>> and not maintained.
>>
>> Right; that would be ideal. Despite the fact that HTML5 references that
>> RFC, browsers don't really follow it.
>>
>> I would be interested in trying to help with this, but again I would
>> certainly need some guidance from people who know more about the
>> vagaries of how the various browsers encode their form parameters /
>> uploaded file names, and why things got that way. It probably would not
>> be helpful for me to try to draft an update to the spec without getting
>> the right implementers on board.
>
> If this is still something for which you have some time available, then
> the starting point for anything like this would be test cases, lots and
> lots of test cases. In this case, it would have to be something like a
> server that echoes the precise bytes sent by the client, for a huge
> variety of different setups:
>
>  - various submission encodings
>  - various form field names and types
>  - various file submission filenames
>
> ...etc.
>
> I'd be happy to advise if this is something that still interests you.

I agree with the methodology. However, I would suggest simply revising RFC 2388.

Best regards, Julian

Re: multipart/form-data filename encoding: unicode and special characters

On Mon, 9 Jul 2012, Julian Reschke wrote:
>
> I agree with the methodology. However, I would suggest simply revising
> RFC 2388.

The precise details of the process of how it's done are up to whoever writes the spec text; they're not really relevant to this list.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'