stability of content type sniffing algorithm? contentTypeOverride-24 / issue-24

by Dan Connolly

I recently gave the mime-sniff draft a somewhat closer look,
including these two paragraphs, which looked familiar:

[[
   This document describes a mime sniffing algorithm that carefully
   balances the compatibility needs of browser vendors with the security
   constraints.  The algorithm has been constructed with reference to
   mime sniffing algorithms present in popular Web browsers, an
   extensive database of Web content, and metrics collected from
   implementations deployed to a sizable number of Web users.

   Warning!  It is imperative that the algorithm in this document be
   followed exactly.  When a user agent uses different heuristics for
   content type detection than the server expects, security problems can
   occur.  For example, if a server believes that the client will treat
   a contributed file as an image (and thus treat it as benign), but a
   Web browser believes the content to be HTML (and thus execute any
   scripts contained therein), the end user can be exposed to malicious
   content, making the user vulnerable to cookie theft attacks and other
   cross-site scripting attacks.
]]
 -- http://ietfreport.isoc.org/idref/draft-abarth-mime-sniff/
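
To make the scenario in the warning concrete, here is a minimal sketch of
the mismatch it describes. It is a toy heuristic, not the algorithm from
draft-abarth-mime-sniff; the function names, the 512-byte window, and the
HTML prefixes checked are all assumptions made up for illustration.

```python
# Toy illustration only: this is NOT the draft-abarth-mime-sniff algorithm.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # the standard 8-byte PNG signature


def server_view(declared_type: str) -> str:
    """Hypothetical server: it trusts the Content-Type header it sends."""
    return declared_type


def toy_client_sniff(declared_type: str, body: bytes) -> str:
    """Hypothetical client: overrides the declared type if the body looks like HTML."""
    head = body[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html", b"<script")):
        return "text/html"
    if body.startswith(PNG_MAGIC):
        return "image/png"
    return declared_type


# A user-contributed "image" that is really markup with a script in it.
upload = b"<script>alert(document.cookie)</script>"
declared = "image/png"

# The server thinks "benign image"; the toy client thinks "HTML".
mismatch = server_view(declared) != toy_client_sniff(declared, upload)
```

When the two sides disagree like this, the server's decision to treat the
upload as harmless no longer holds on the client.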

I had an uneasiness about them that I wasn't sure how to articulate,
but then I just read this:

-------- Forwarded Message --------
http://lists.w3.org/Archives/Public/public-html/2009May/0524.html
> From: Sam Ruby <rubys@...>
> To: Anne van Kesteren <annevk@...>
> Cc: Maciej Stachowiak <mjs@...>, Roy T. Fielding
> <fielding@...>, Larry Masinter <masinter@...>, HTML WG
> <public-html@...>
> Subject: Re: HTML interpreter vs. HTML user agent
> Date: Thu, 28 May 2009 09:41:36 -0400
[...]

> The actual observed behavior of user agents designed to (primarily)
> process content of a certain media type (either in general, or in the
> specific context) is to make every effort to parse the content according
> to those rules, and only if such rules fail to produce meaningful
> results will they investigate alternatives.
>
> Browsers will first attempt to process content as HTML.
> FeedReaders will first attempt to process content as a feed.
> Media players will first attempt to process content as media.
>
> Browsers, when chasing an image tag, will make different assumptions
> than when presented with a raw uri from the chrome.
>
> All are equally "right" or "wrong".
>
> None of this is meant to imply that the behavior that is being settled
> upon by browser manufacturers isn't worth specifying or standardizing.
>
> - Sam Ruby

Is there any reason to believe that the next sort of content
to hit the web won't disrupt things much like Java .jar files
and RSS/Atom feeds and mp3/wma media?

I think it's worthwhile to update our finding on authoritative
metadata* to acknowledge draft-abarth-mime-sniff and the practice
it represents... but I'm struggling to figure out exactly
what to say.

 * http://www.w3.org/2001/tag/doc/mime-respect-20060412

It's pretty clear to me that people will take the shortest path
to their target, and that usually doesn't involve editing
the .htaccess file when they test their RSS file with their
RSS readers. It's not until the RSS reader gets integrated
into the web browser that the HTTP client's presumption
that it's getting a feed goes away (and even then,
not completely).


--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
gpg D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E



Re: stability of content type sniffing algorithm? contentTypeOverride-24 / issue-24

by mnot

I must say I have the same concern; the next Great New Format is going  
to place the exact same pressure on browser vendors as RSS, Atom and  
the rest have.

I don't think it's valid to read this document with the expectation  
that it will be the *last* content-sniffing spec.



On 29/05/2009, at 2:34 AM, Dan Connolly wrote:

> [...]


--
Mark Nottingham     http://www.mnot.net/