Content-MD5 and partial responses

by mnot

After a quick look, my reading is that a Content-MD5 header on a  
partial response reflects the bytes in that message, rather than the  
whole (non-partial) response:

> The entity-header field "Content-MD5", as defined in [RFC1864], is an
> MD5 digest of the entity-body for the purpose of providing an end-to-
> end message integrity check (MIC) of the entity-body.  (Note: a MIC
> is good for detecting accidental modification of the entity-body in
> transit, but is not proof against malicious attacks.)
>
>   Content-MD5   = "Content-MD5" ":" OWS Content-MD5-v
>   Content-MD5-v = <base64 of 128 bit MD5 digest as per [RFC1864]>
>
> The Content-MD5 header field MAY be generated by an origin server or
> client to function as an integrity check of the entity-body.  Only
> origin servers or clients MAY generate the Content-MD5 header field;
> proxies and gateways MUST NOT generate it, as this would defeat its
> value as an end-to-end integrity check.  Any recipient of the entity-
> body, including gateways and proxies, MAY check that the digest value
> in this header field matches that of the entity-body as received.
>
> The MD5 digest is computed based on the content of the entity-body,
> including any content-coding that has been applied, but not including
> any transfer-encoding applied to the message-body.  If the message is
> received with a transfer-encoding, that encoding MUST be removed
> prior to checking the Content-MD5 value against the received entity.
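For illustration, the field value described in the quoted text can be sketched in Python as below. This is a minimal sketch, assuming the standard hashlib and base64 modules; the function name content_md5 is hypothetical, and the input is taken to be the entity-body after any content-coding and with any transfer-encoding already removed.

```python
import base64
import hashlib


def content_md5(entity_body: bytes) -> str:
    """Return a Content-MD5 field value for the given entity-body bytes.

    The MD5 digest is 16 raw bytes (128 bits); the header carries the
    base64 encoding of those raw bytes, not the hex string.
    """
    digest = hashlib.md5(entity_body).digest()
    return base64.b64encode(digest).decode("ascii")
```

For an empty body this yields "1B2M2Y8AsgTpgAmY7PhCfg==", the base64 form of the well-known MD5 of zero bytes.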

Also, note that a multipart message is allowed to have C-MD5 on
individual parts:
> The entity-body for composite types MAY contain many body-parts,
> each with its own MIME and HTTP headers (including Content-MD5,
> Content-Transfer-Encoding, and Content-Encoding headers).

For a multipart/byteranges response, this really only helps if the
digests apply to the individual parts...

However, I'm wondering what a cache should do when combining partial  
responses that include Content-MD5. This doesn't seem to be addressed  
in 2616, nor in p5 or p6.

It looks like there are two options here:

a) C-MD5 applies to the bytes in the entity-body (as above), and  
therefore we need to specify what a cache does with it when it  
combines partial responses (throw it away?).

b) C-MD5 applies to the *full* response body, avoiding the combination  
issues, and allowing clients to do a MIC of the full response  
(assuming they have it), but removing the ability to do a MIC on a  
partial response on its own.

Anybody aware of C-MD5 being used with partial responses in the wild  
(I'm looking at you, Adobe)?

Cheers,

--
Mark Nottingham     http://www.mnot.net/



Re: Content-MD5 and partial responses

by mnot

RFC 3230 (n.b., standards track) seems to agree:
> HTTP/1.1 defines a Content-MD5 header that allows a server to
> include a digest of the response body. However, this is specifically
> defined to cover the body of the actual message, not the contents of
> the full file (which might be quite different, if the response is a
> Content-Range, or uses a delta encoding).

This is now tracked as <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/178>.




--
Mark Nottingham     http://www.mnot.net/



Re: Content-MD5 and partial responses

by Yves Lafon

On Mon, 29 Jun 2009, Mark Nottingham wrote:

> a) C-MD5 applies to the bytes in the entity-body (as above), and therefore we
> need to specify what a cache does with it when it combines partial responses
> (throw it away?).
>
> b) C-MD5 applies to the *full* response body, avoiding the combination
> issues, and allowing clients to do a MIC of the full response (assuming they
> have it), but removing the ability to do a MIC on a partial response on its
> own.
>
> Anybody aware of C-MD5 being used with partial responses in the wild (I'm
> looking at you, Adobe)?

A while back, I implemented option a) as it seemed to be the most logical
interpretation of the spec.

HEAD /Distrib/jigsaw_2.2.6.zip HTTP/1.1
Host: jigsaw.w3.org

HTTP/1.1 200 OK
Content-Length: 9331520
Content-Md5: fBhlh9ttr14YAqe45Yi+xg==

------

GET /Distrib/jigsaw_2.2.6.zip HTTP/1.1
Host: jigsaw.w3.org
Range: bytes=0-1

HTTP/1.1 206 Partial Content
Content-Length: 2
Content-Md5: 1xvdIsi7k7jSh9zm9GrtJQ==
Content-Range: bytes 0-1/9331520

Caches should "throw away" the MD5 (after verifying the partial body
received), and it is up to the cache to recompute the MD5 sum of the bytes
served.
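To make option a) concrete, here is a minimal Python sketch of the check a recipient could run on a single 206 body. The helper name and the sample bytes are hypothetical (they are not the actual contents of the zip file above); only hashlib and base64 from the standard library are assumed.

```python
import base64
import hashlib


def matches_content_md5(body: bytes, header_value: str) -> bool:
    """True if header_value is the base64 MD5 of exactly these body bytes."""
    expected = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return expected == header_value.strip()


# Hypothetical entity and the two bytes a "Range: bytes=0-1" request returns.
full_entity = b"PK hypothetical archive contents"
part = full_entity[0:2]

# Under option a) the server digests only the bytes it actually sends.
part_header = base64.b64encode(hashlib.md5(part).digest()).decode("ascii")

print(matches_content_md5(part, part_header))         # the part verifies alone
print(matches_content_md5(full_entity, part_header))  # the full entity does not
```

Under this reading, the digest on the 200 response and the digest on the 206 response are expected to differ, as they do in the two responses above.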

--
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves



Re: Content-MD5 and partial responses

by Yves Lafon

On Tue, 30 Jun 2009, Yves Lafon wrote:

> Caches should "throw away" the md5 (after verification of the partial body
> received), and it is up to the cache to recompute the md5 sum of the bytes
> served.

(Well, no: at worst it should request it from the origin server and check it
against the stored bytes, because of the MUST NOT on Content-MD5
generation by proxies.)

--
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves



Re: Content-MD5 and partial responses

by Adrian Chadd

On Tue, Jun 30, 2009, Yves Lafon wrote:

> >a) C-MD5 applies to the bytes in the entity-body (as above), and therefore
> >we need to specify what a cache does with it when it combines partial
> >responses (throw it away?).
> >
> >b) C-MD5 applies to the *full* response body, avoiding the combination
> >issues, and allowing clients to do a MIC of the full response (assuming
> >they have it), but removing the ability to do a MIC on a partial response
> >on its own.
> >
> >Anybody aware of C-MD5 being used with partial responses in the wild (I'm
> >looking at you, Adobe)?
>
> A while back, I implemented option a) as it seemed to be the most logical
> interpretation of the spec.

Have you tested how well this particular assumption scales for intermediary
proxies and origin servers working on large objects?

> Caches should "throw away" the md5 (after verification of the partial body
> received), and it is up to the cache to recompute the md5 sum of the bytes
> served.

This requires the origin and cache to slurp in the required reply data off
disk before calculating the C-MD5 header. For small objects this is fine
but for large objects it could mean a -lot- of doubled up disk IO.

C-MD5 would make sense in this situation (as a message integrity check)
if it were a trailer header rather than an initial reply header.
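The single-pass alternative can be sketched as follows. This is only an illustration, assuming Python's hashlib and base64; the function name and the write callback are hypothetical stand-ins for whatever the server uses to send body bytes. The digest is updated as each chunk goes out, so its value is only known once the body has been sent, which is why it would fit a trailer and not an initial header.

```python
import base64
import hashlib
from typing import Callable, Iterable


def send_body_then_digest(chunks: Iterable[bytes],
                          write: Callable[[bytes], object]) -> str:
    """Send each chunk once and return the Content-MD5 value at the end.

    The data is read a single time: every chunk is hashed and written in
    the same loop, so no second pass over the stored object is needed.
    """
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)  # incremental update, equivalent to hashing the whole body
        write(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")
```

Sending the digest as an initial header instead requires either a cached value or reading the range once to hash it and again to send it.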

How does this compare to the use of ETags, for example, with full and
range responses? Why would you do something different?

2c,


Adrian



Re: Content-MD5 and partial responses

by Yves Lafon

On Mon, 29 Jun 2009, Mark Nottingham wrote:

> However, I'm wondering what a cache should do when combining partial
> responses that include Content-MD5. This doesn't seem to be addressed in
> 2616, nor in p5 or p6.

If a cache is aware of partial responses, then we assume that it knows
about p5 and p6, so it is safe to assume that any rule in the prose will
be implemented (as opposed to a proxy/cache handling headers without
knowing their semantics).
So the best course of action would be to change, in p5 section 4,
<<
    If either requirement is not met, the cache MUST use only the most
    recent partial response (based on the Date values transmitted with
    every response, and using the incoming response if these values are
    equal or missing), and MUST discard the other partial information.
>>
to
<<
    If either requirement is not met, the cache MUST use only the most
    recent partial response (based on the Date values transmitted with
    every response, and using the incoming response if these values are
    equal or missing), and MUST discard the other partial information.

    If Content-MD5 is present in partial responses, it MUST be removed
    in the combined response.
>>

However, this also applies to Content-Length, so should we explicitly state
how the combination process works?
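A rough Python sketch of what such a combination step might look like, under stated assumptions: the two stored partial responses have already been validated as belonging to the same entity, they are contiguous, and the total entity length is known. The Partial tuple layout, the function name, and the total_length parameter are hypothetical, not taken from p5.

```python
from typing import Dict, Tuple

# (offset of first byte, body bytes, headers) for one stored partial response
Partial = Tuple[int, bytes, Dict[str, str]]

# Headers that describe one message's bytes and cannot simply be copied over
PER_MESSAGE = {"content-md5", "content-length", "content-range"}


def combine_adjacent(first: Partial, second: Partial, total_length: int) -> Partial:
    """Merge two contiguous partial responses into one stored range."""
    first_start, first_body, first_headers = first
    second_start, second_body, second_headers = second
    if first_start + len(first_body) != second_start:
        raise ValueError("ranges are not contiguous")

    body = first_body + second_body
    merged = {**first_headers, **second_headers}
    # Content-MD5 is removed outright; a cache must not generate a new one.
    headers = {k: v for k, v in merged.items() if k.lower() not in PER_MESSAGE}
    # Content-Length and Content-Range are recomputed for the combined bytes.
    headers["Content-Length"] = str(len(body))
    last = first_start + len(body) - 1
    headers["Content-Range"] = f"bytes {first_start}-{last}/{total_length}"
    return first_start, body, headers
```

The sketch treats Content-Length and Content-Range the same way as Content-MD5 in one respect: none of them is copied from either input, which is the point raised about stating the combination process explicitly.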


--
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves



Re: Content-MD5 and partial responses

by Henrik Nordstrom-5

Mon 2009-06-29 at 12:00 +1000, Mark Nottingham wrote:
> After a quick look, my reading is that a Content-MD5 header on a  
> partial response reflects the bytes in that message, rather than the  
> whole (non-partial) response:

RFC2616 can apparently be read both ways depending on which parts of the
specs you read, which is a bit of a problem for Content-MD5.

My reading is that Content-MD5 is computed on the variant and not the
message-body. The reasons behind this are:

      * 206 is described as containing only ranges of the entity-body
        (which, by the way, conflicts with the general messaging format
        definition of entity-body, making 206 a special case). p4 4.
        Combining Ranges
      * How partial responses, including their headers, may be combined.
        p4 4. Combining Ranges
      * It is an Entity-Header. p3 5.8 Content-MD5
      * Sending Entity-Headers is forbidden in a conditional 206
        response (MUST/SHOULD NOT), and they are required in
        unconditional 206 responses if they would have been sent in a
        200 response.



Re: Content-MD5 and partial responses

by Yves Lafon

On Thu, 23 Jul 2009, Henrik Nordstrom wrote:

> Mon 2009-06-29 at 12:00 +1000, Mark Nottingham wrote:
>> After a quick look, my reading is that a Content-MD5 header on a
>> partial response reflects the bytes in that message, rather than the
>> whole (non-partial) response:
>
> RFC2616 can apparently be read both ways depending on which parts of the
> specs you read, which is a bit of a problem for Content-MD5.
>
> My reading is that Content-MD5 is computed on the variant and not the
> message-body. The reasoning behind this are:
>
>      * 206 is talked about to only contain ranges of the entity-body
>        (which btw conflicts with the general messaging format
>        definition of entity-body making 206 a special case).p4 4.
>        Combining Ranges
206 is indeed a very special case.

>      * How partial responses including their headers may be combined.
>        p4 4. Combining Ranges
Same for CL (which can be extracted from Content-Range).
There is indeed a story to be told about combining partial responses when
Content-MD5 is there (or we can forbid C-MD5 in partial responses).

>      * It being an Entity-Header. p3 5.8 Content-MD5

Well, Content-Length is also an entity header; however, it applies to the
transferred bytes in the case of 206.
What would be the use of C-MD5 if it applies to the whole bag of bytes
when you only get a part of it? It can't serve its purpose, which is
integrity verification, so it makes far more sense if C-MD5 is applied to
the transferred bytes, like Content-Length.

>      * That sending Entity-Headers is forbidden in an conditional 206
>        response (MUST/SHOULD NOT) and required to be included in
>        unconditional 206 responses if it would have been sent in an 200
>        response.
>      *
>      *
>
>
>

--
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves



Re: Content-MD5 and partial responses

by Henrik Nordstrom-5

Fri 2009-07-24 at 04:18 -0400, Yves Lafon wrote:

> Well, Content-Length is also an entity header, however it applies to the
> transferred bytes in case of 206.

And is specified separately for 206 responses, referring to message-body
and not entity, ruling out any doubt.

> What would be the use of C-MD5 if it applies to the whole bag of bytes
> when you only get a part of it?

Well, Content-MD5 is often not even allowed in a 206 response
(SHOULD/MUST NOT include it if an If-Range validator was used), which kind
of defeats the per-message idea on partial responses.

And if a validator was not used, then the definition of 206 says "MUST
include all of the entity-headers that would have been returned with a
200 (OK) response to the same request", which to me says it should be
the same value as in the 200 OK, enabling clients to compare with other
responses to verify they all refer to the same "200 response". Yes, this
conflicts somewhat for Content-Length, but as already said, the rules for
Content-Length in 206 are explicitly stated some paragraphs up in the
same section.

But it's easy to (imho wrongly) assume the per-message semantics on a
quick reading of just the definition of Content-MD5. I cannot make
the per-message semantics fit well at all when taking 206 responses into
account.

One more point on this:

      * "Only origin servers or clients MAY generate the Content-MD5
        header field; proxies and gateways MUST NOT generate it"
      * Caching proxies MAY support Range requests, turning a 200
        response into 206 partial response.
      * There is no explicit rule specifying that Content-MD5 is to be
        recalculated when making a 206 partial response from a 200
        response, other than the "copy" rule quoted above.

Similar not-per-message hints are also seen indirectly in the definition
of 304, where a careful distinction is made between message-body and
entity-body.

Regards
Henrik



Re: Content-MD5 and partial responses

by mnot

Just another point of data which has come up before:

> HTTP/1.1 defines a Content-MD5 header that allows a server to  
> include a digest of the response body. However, this is specifically  
> defined to cover the body of the actual message, not the contents of  
> the full file (which might be quite different, if the response is a  
> Content-Range, or uses a delta encoding).

That's the beginning of RFC3230, which is on the standards track.


On 24/07/2009, at 8:23 PM, Henrik Nordstrom wrote:

> Fri 2009-07-24 at 04:18 -0400, Yves Lafon wrote:
>
>> Well, Content-Length is also an entity header, however it applies to the
>> transferred bytes in case of 206.
>
> And is specified separately for 206 responses, referring to message-body
> and not entity, ruling out any doubt.
>
>> What would be the use of C-MD5 if it applies to the whole bag of bytes
>> when you only get a part of it?
>
> Well, Content-MD5 is often not even allowed in a 206 response
> (SHOULD/MUST NOT include it if an If-Range validator was used), which kind
> of defeats the per-message idea on partial responses.
>
> And if a validator was not used, then the definition of 206 says "MUST
> include all of the entity-headers that would have been returned with a
> 200 (OK) response to the same request", which to me says it should be
> the same value as in the 200 OK, enabling clients to compare with other
> responses to verify they all refer to the same "200 response". Yes, this
> conflicts somewhat for Content-Length, but as already said, the rules for
> Content-Length in 206 are explicitly stated some paragraphs up in the
> same section.
>
> But it's easy to (imho wrongly) assume the per-message semantics on a
> quick reading of just the definition of Content-MD5. I cannot make
> the per-message semantics fit well at all when taking 206 responses into
> account.
>
> One more point on this:
>
>      * "Only origin servers or clients MAY generate the Content-MD5
>        header field; proxies and gateways MUST NOT generate it"
>      * Caching proxies MAY support Range requests, turning a 200
>        response into 206 partial response.
>      * There is no explicit rule specifying that Content-MD5 is to be
>        recalculated when making a 206 partial response from a 200
>        response, other than the "copy" rule quoted above.
>
> Similar not-per-message hints are also seen indirectly in the definition
> of 304, where a careful distinction is made between message-body and
> entity-body.
>
> Regards
> Henrik
>


--
Mark Nottingham     http://www.mnot.net/



Re: Content-MD5 and partial responses

by Adrien de Croy


I also would have assumed that the MD5 would cover the whole entity, for
the reason that it's used as a signature on the entity.

If you get a file in N parts, you can be fairly certain they are all
parts of the same entity if the C-MD5 is the same for each.

But this doubles up with ETag.

If the MD5 were calculated on only the transferred partial body it would
need to be calculated each time a part were served.

So I think it comes down to what is the intended purpose of the header
in the first place.  My assumption would have been to cover potential
corruption in transit or detect modifications.  But obviously not secure
since any agent in the chain can recalculate it.  So in the end I feel
it's of little value which is probably why it's seldom used.

Regards

Adrien


--
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com



Re: Content-MD5 and partial responses

by Yves Lafon

On Fri, 24 Jul 2009, Adrien de Croy wrote:

>
> I also would have assumed that MD5 would cover the whole entity.  For the
> reason that it's used as a signature on the entity.
>
> If you get a file in N parts, you can be fairly certain they are all parts of
> the same entity if the C-MD5 is the same for each.
>
> But this doubles up with ETag.
>
> If the MD5 were calculated on only the transferred partial body it would need
> to be calculated each time a part were served.
>
> So I think it comes down to what is the intended purpose of the header in the
> first place.  My assumption would have been to cover potential corruption in
> transit or detect modifications.  But obviously not secure since any agent in
> the chain can recalculate it.  So in the end I feel it's of little value
> which is probably why it's seldom used.

Another reason is that for big content, on servers that do not cache
the computed metadata, computing it is a good way to bring a server almost
to a halt, so the server architecture may mandate not using it at all.



--
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves



Re: Content-MD5 and partial responses

by Jamie Lokier

Adrien de Croy wrote:
> I also would have assumed that MD5 would cover the whole entity.  For
> the reason that it's used as a signature on the entity.

I made the same assumption.

> If you get a file in N parts, you can be fairly certain they are all
> parts of the same entity if the C-MD5 is the same for each.

More importantly, you can compute the MD5 over all the N parts and
compare it against the Content-MD5 value.

That's a kind of end-to-end check.  You can't do that if Content-MD5
applies to the parts individually.
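As a sketch of that end-to-end check, assuming the whole-entity reading of Content-MD5: the parts are hashed in offset order and the result is compared with the one header value. The function name and the (offset, bytes) tuple layout are hypothetical; only Python's hashlib and base64 are assumed.

```python
import base64
import hashlib
from typing import Iterable, Tuple


def verify_reassembled(parts: Iterable[Tuple[int, bytes]],
                       full_entity_md5: str) -> bool:
    """Check N received ranges against a digest of the whole entity.

    parts holds (offset of first byte, bytes) pairs in any order. The check
    fails if the ranges leave a gap or overlap, or if the digest differs.
    """
    md5 = hashlib.md5()
    expected_offset = 0
    for offset, data in sorted(parts, key=lambda p: p[0]):
        if offset != expected_offset:
            return False  # gap or overlap: the entity cannot be rebuilt as-is
        md5.update(data)
        expected_offset += len(data)
    return base64.b64encode(md5.digest()).decode("ascii") == full_entity_md5
```

The check only becomes possible once every part has arrived, which is the trade-off against verifying each part on its own.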

> If the MD5 were calculated on only the transferred partial body it would
> need to be calculated each time a part were served.

Or cached, if the parts have regular sizes.

> So I think it comes down to what is the intended purpose of the header
> in the first place.  My assumption would have been to cover potential
> corruption in transit or detect modifications.  But obviously not secure
> since any agent in the chain can recalculate it.  So in the end I feel
> it's of little value which is probably why it's seldom used.

I assumed it was not for security (as it can be changed in transit),
but to detect errors - simple bit corruption in transit or storage, as
well as more complex software errors.

As an end-to-end error checking device, it is better if the
calculation is closer to the origin of the data.

For example, a simple static file server, having Content-MD5 stored
along with the files, perhaps when they are uploaded with PUT, would
be better for "end-to-end" corruption checking than calculating it by
the server responding to GET.
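A toy illustration of that arrangement, with a hypothetical in-memory StaticStore class standing in for a real file server: the digest is computed once when the content arrives and is replayed unchanged on every read, so later corruption of the stored bytes shows up as a mismatch at the client.

```python
import base64
import hashlib
from typing import Dict, Tuple


class StaticStore:
    """In-memory stand-in for a static file server (illustrative only)."""

    def __init__(self) -> None:
        self._files: Dict[str, Tuple[bytes, str]] = {}

    def put(self, path: str, body: bytes) -> None:
        # Digest is computed once, at upload time, close to the data's origin.
        digest = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
        self._files[path] = (body, digest)

    def get(self, path: str) -> Tuple[bytes, Dict[str, str]]:
        # The stored digest is returned as-is; nothing is rehashed on GET.
        body, digest = self._files[path]
        return body, {"Content-MD5": digest, "Content-Length": str(len(body))}
```

A server that recomputed the digest on each GET would hash whatever bytes it currently holds, so it could not reveal storage corruption that happened after upload.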

-- Jamie


Re: Content-MD5 and partial responses

by Henrik Nordstrom-5

Fri 2009-07-24 at 20:26 +1000, Mark Nottingham wrote:
> Just another point of data which has come up before:
>
> > HTTP/1.1 defines a Content-MD5 header that allows a server to  
> > include a digest of the response body. However, this is specifically  
> > defined to cover the body of the actual message, not the contents of  
> > the full file (which might be quite different, if the response is a  
> > Content-Range, or uses a delta encoding).
>
> That's the beginning of RFC3230, which is on the standards track.

I know, and I obviously do not share the same view of HTTP as RFC3230,
not only in this aspect, as I also tried to point out earlier.

But with the amount of damage already done to Content-MD5, I am fine with
deprecating it as historic if that is seen as the viable solution to
this discussion, effectively removing it from HTTPbis with a mention
that there was an ambiguity as to whether it applied to the variant or the
message-entity (or whatever to call it, before T-E) of 206 responses.

But I do not think it is necessary to go that way, as I would be very
surprised if any implementation could be found implementing Content-MD5
on the partial entity of a 206 response and not the corresponding 200
response. I would expect that the implementations that can be found all
implement Content-MD5 based on the corresponding 200 response.

Regards
Henrik



Re: Content-MD5 and partial responses

by Yves Lafon

On Sat, 25 Jul 2009, Henrik Nordstrom wrote:

> Fri 2009-07-24 at 20:26 +1000, Mark Nottingham wrote:
>> Just another point of data which has come up before:
>>
>>> HTTP/1.1 defines a Content-MD5 header that allows a server to
>>> include a digest of the response body. However, this is specifically
>>> defined to cover the body of the actual message, not the contents of
>>> the full file (which might be quite different, if the response is a
>>> Content-Range, or uses a delta encoding).
>>
>> That's the beginning of RFC3230, which is on the standards track.
>
> I know, and I obviously do not share the same view of HTTP as RFC3230,
> not only in this aspect, as I also tried to point out earlier.
>
> But with the amount of damage already done to Content-MD5, I am fine with
> deprecating it as historic if that is seen as the viable solution to
> this discussion, effectively removing it from HTTPbis with a mention
> that there was an ambiguity as to whether it applied to the variant or the
> message-entity (or whatever to call it, before T-E) of 206 responses.
>
> But I do not think it is necessary to go that way, as I would be very
> surprised if any implementation could be found implementing Content-MD5
> on the partial entity of a 206 response and not the corresponding 200
> response. I would expect that the implementations that can be found all
> implement Content-MD5 based on the corresponding 200 response.

HEAD /Distrib/jigsaw_2.2.6.zip HTTP/1.1
Host: jigsaw.w3.org

HTTP/1.1 200 OK
Cache-Control: max-age=432000
Date: Sat, 25 Jul 2009 06:15:07 GMT
Content-Length: 9331520
Content-Md5: fBhlh9ttr14YAqe45Yi+xg==
Content-Type: application/zip
Etag: "1k8cb5f:127e0lb8o"
Expires: Thu, 30 Jul 2009 06:15:07 GMT
Last-Modified: Tue, 10 Apr 2007 15:09:24 GMT
Server: Jigsaw/2.3.0-beta1

-----

GET /Distrib/jigsaw_2.2.6.zip HTTP/1.1
Host: jigsaw.w3.org
Range: bytes=0-1

HTTP/1.1 206 Partial Content
Cache-Control: max-age=432000
Date: Sat, 25 Jul 2009 06:15:23 GMT
Content-Length: 2
Content-Md5: 1xvdIsi7k7jSh9zm9GrtJQ==
Content-Range: bytes 0-1/9331520
Content-Type: application/zip
Etag: "1k8cb5f:127e0lb8o"
Expires: Thu, 30 Jul 2009 06:15:23 GMT
Last-Modified: Tue, 10 Apr 2007 15:09:24 GMT
Server: Jigsaw/2.3.0-beta1

--
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves



Re: Content-MD5 and partial responses

by Henrik Nordstrom-5

Sat 2009-07-25 at 02:16 -0400, Yves Lafon wrote:

> HEAD /Distrib/jigsaw_2.2.6.zip HTTP/1.1
> Host: jigsaw.w3.org
>
> HTTP/1.1 200 OK
> Content-Md5: fBhlh9ttr14YAqe45Yi+xg==
> Server: Jigsaw/2.3.0-beta1
>
> -----
>
> GET /Distrib/jigsaw_2.2.6.zip HTTP/1.1
> Host: jigsaw.w3.org
> Range: bytes=0-1
>
> HTTP/1.1 206 Partial Content
> Content-Md5: 1xvdIsi7k7jSh9zm9GrtJQ==

Crap. So then we definitely have conflicting implementations out there.

Same response via a Squid cache which does not implement any bit of
Content-MD5 calculation (and is not allowed to touch Content-MD5 by
HTTP/1.1):

GET http://jigsaw.w3.org/Distrib/jigsaw_2.2.6.zip HTTP/1.1
Range: bytes=0-1

HTTP/1.0 206 Partial Content
Date: Sat, 25 Jul 2009 14:40:35 GMT
Content-MD5: fBhlh9ttr14YAqe45Yi+xg==
Content-Type: application/zip
Expires: Thu, 30 Jul 2009 14:40:35 GMT
Last-Modified: Tue, 10 Apr 2007 15:09:24 GMT
Server: Jigsaw/2.3.0-beta1
Age: 126
X-Cache: HIT from henrik
X-Cache-Lookup: HIT from henrik:3128
Via: 1.0 henrik (squid/3.HEAD-BZR)
Proxy-Connection: close
Content-Range: bytes 0-1/9331520
Content-Length: 2
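
To show what this means for a recipient, here is a hedged sketch of a
client-side check. `digest` and `body_matches_content_md5` are hypothetical
helpers and the entity bytes are made up. The sketch only demonstrates that
a whole-entity digest relayed on a 2-byte 206 fails a check made against the
body actually received.

```python
import base64
import hashlib


def digest(data: bytes) -> str:
    """base64 of the MD5 digest, the form used in a Content-MD5 field."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def body_matches_content_md5(headers: dict, body: bytes) -> bool:
    """Compare the received entity-body with Content-MD5, if present."""
    # Field names are case-insensitive ("Content-Md5" vs "Content-MD5").
    lowered = {name.lower(): value for name, value in headers.items()}
    expected = lowered.get("content-md5")
    if expected is None:
        return True  # no digest supplied, nothing to verify
    return digest(body) == expected.strip()


# Made-up entity bytes standing in for the real file.
entity = b"hypothetical entity bytes"
received = entity[0:2]  # what a "bytes=0-1" request returns

# Digest of the whole entity relayed on the partial body.
relayed_ok = body_matches_content_md5({"Content-MD5": digest(entity)}, received)
# Digest recomputed over the partial body itself.
recomputed_ok = body_matches_content_md5({"Content-Md5": digest(received)}, received)

print("whole-entity digest vs 2-byte body:", relayed_ok)
print("per-message digest vs 2-byte body: ", recomputed_ok)
```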




Btw, the jigsaw server also does the following:

GET /Distrib/jigsaw_2.2.6.zip HTTP/1.1
Host: jigsaw.w3.org
Range: bytes=0-1
If-Range: "1k8cb5f:127e0lb8o"

HTTP/1.1 206 Partial Content
Cache-Control: max-age=432000
Date: Sat, 25 Jul 2009 08:58:59 GMT
Content-Length: 2
Content-Md5: 1xvdIsi7k7jSh9zm9GrtJQ==
Content-Range: bytes 0-1/9331520
Content-Type: application/zip
Etag: "1k8cb5f:127e0lb8o"
Expires: Thu, 30 Jul 2009 08:58:59 GMT
Last-Modified: Tue, 10 Apr 2007 15:09:24 GMT
Server: Jigsaw/2.3.0-beta1

which contradicts the SHOULD NOT on conditional requests...



Regards
Henrik



Re: Content-MD5 and partial responses

by Roy T. Fielding

On Jun 28, 2009, at 7:00 PM, Mark Nottingham wrote:

> After a quick look, my reading is that a Content-MD5 header on a  
> partial response reflects the bytes in that message, rather than  
> the whole (non-partial) response:

That is correct.  Content-MD5 is a MIME header referring to the content
of a specific body-part (or the entire body if it is in the main  
headers).

If folks want to define a resource metadata header for a hash on
the entire representation, then I suggest one be defined or use
one of the many defined elsewhere (algorithm-independent, of course).
Just don't use the "Content-" prefix on the name.

> However, I'm wondering what a cache should do when combining  
> partial responses that include Content-MD5. This doesn't seem to be  
> addressed in 2616, nor in p5 or p6.

Toss the MD5s unless it plans to recompose them.

> It looks like there are two options here;
>
> a) C-MD5 applies to the bytes in the entity-body (as above), and  
> therefore we need to specify what a cache does with it when it  
> combines partial responses (throw it away?).
>
> b) C-MD5 applies to the *full* response body, avoiding the  
> combination issues, and allowing clients to do a MIC of the full  
> response (assuming they have it), but removing the ability to do a  
> MIC on a partial response on its own.
>
> Anybody aware of C-MD5 being used with partial responses in the  
> wild (I'm looking at you, Adobe)?

It is a MIME header field.  (a) is the definition.
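
A rough sketch of the "toss the MD5s" approach for a cache combining partial
responses. The function name, the (start, headers, body) tuple layout and
the set of dropped headers are assumptions for illustration; none of them
comes from spec text or from a real cache.

```python
# Assumption: headers treated as describing one message's bytes only.
PER_MESSAGE_HEADERS = {"content-md5", "content-range", "content-length"}


def combine_partials(parts):
    """Merge contiguous (start, headers, body) partial responses.

    Assumes all parts share the same validator and cover adjacent ranges.
    """
    if not parts:
        raise ValueError("no partial responses to combine")
    ordered = sorted(parts, key=lambda part: part[0])
    merged_headers = {}
    merged_body = b""
    next_offset = ordered[0][0]
    for start, headers, chunk in ordered:
        if start != next_offset:
            raise ValueError("ranges are not contiguous")
        for name, value in headers.items():
            if name.lower() in PER_MESSAGE_HEADERS:
                continue  # toss per-message digests and framing headers
            merged_headers[name] = value
        merged_body += chunk
        next_offset = start + len(chunk)
    return merged_headers, merged_body


# Hypothetical partial responses for bytes 2-3 and 0-1 of one entity.
merged_headers, merged_body = combine_partials([
    (2, {"ETag": '"v1"', "Content-MD5": "digest-of-bytes-2-3"}, b"cd"),
    (0, {"ETag": '"v1"', "Content-MD5": "digest-of-bytes-0-1"}, b"ab"),
])
print(merged_headers, merged_body)
```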

....Roy



Re: Content-MD5 and partial responses

by Henrik Nordstrom-5

On Sat 2009-07-25 at 23:32 -0700, Roy T. Fielding wrote:

> > However, I'm wondering what a cache should do when combining  
> > partial responses that include Content-MD5. This doesn't seem to be  
> > addressed in 2616, nor in p5 or p6.
>
> Toss the MD5s unless it plans to recompose them.

Unfortunately, the process for combining responses says nothing about
tossing this header. Additionally, the HTTP specification of Content-MD5
says:

   The Content-MD5 header field MAY be generated by an origin server or
   client to function as an integrity check of the entity-body. Only
   origin servers or clients MAY generate the Content-MD5 header field;
   proxies and gateways MUST NOT generate it, as this would defeat its
   value as an end-to-end integrity check.

which forbids shared caches from recomposing Content-MD5.

There are also the issues with how a 206 response is to be composed, which
I mentioned earlier.

> It is a MIME header field.  (a) is the definition.

Then the sections on how to combine responses and on 206 both need to be
extended to emphasize this, if Content-MD5 is to be kept at all.

As already demonstrated, there are significant implementations out there
handling Content-MD5 differently, simply from implementing different
parts of the spec literally:

  - Jigsaw implements it on the message body, recomposing Content-MD5 on
    each response and returning a different Content-MD5 in a 206 than in
    a 200.

  - Squid indirectly implements it on the resource by implementing the
    206 composition requirements literally ("MUST include all of the
    entity-headers that would have been returned with a 200 (OK)") and
    not caring about Content-MD5, as the spec says it is not allowed to
    recompose it.

Regards
Henrik



Re: Content-MD5 and partial responses

by Jamie Lokier

Henrik Nordstrom wrote:
>   - Squid indirectly implements it on the resource by implementing the
> requirements of 206 composing literally ("MUST include all of the
> entity-headers that would have been returned with a 200 (OK)") and not
> caring about Content-MD5 as the spec says it is not allowed to
> recompose it.

Indeed, if Squid simply didn't know about Content-MD5, it *couldn't*
toss it or change it.

If someone defines Content-SHA3 some day, caches will not modify or
discard that when combining or splitting partial responses - they'll
pass the header along unchanged.
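
A sketch of that behaviour: a cache serving a range from a stored 200 and
copying every stored header it does not specifically understand. The
function name and header values are hypothetical, and Content-SHA3 is the
made-up header from the paragraph above.

```python
def range_response_from_cache(cached_headers: dict, cached_body: bytes,
                              first: int, last: int):
    """Build a 206 from a cached 200, copying stored headers unchanged."""
    chunk = cached_body[first:last + 1]
    headers = dict(cached_headers)  # unknown headers pass through untouched
    # Only the headers this cache knows about are rewritten.
    headers["Content-Range"] = f"bytes {first}-{last}/{len(cached_body)}"
    headers["Content-Length"] = str(len(chunk))
    return 206, headers, chunk


# Hypothetical cached 200 response.
cached = {
    "Content-Length": "8",
    "Content-MD5": "digest-of-whole-entity",
    "Content-SHA3": "hypothetical-digest",
}
status, headers, chunk = range_response_from_cache(cached, b"abcdefgh", 0, 1)
print(status, headers, chunk)
```

Both digest headers come out of the cache still describing the whole entity,
even though the body now holds only two bytes.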

-- Jamie


Re: Content-MD5 and partial responses

by Yves Lafon

On Sat, 25 Jul 2009, Henrik Nordstrom wrote:

> Btw, the jigsaw server also does the following:
>
> GET /Distrib/jigsaw_2.2.6.zip HTTP/1.1
> Host: jigsaw.w3.org
> Range: bytes=0-1
> If-Range: "1k8cb5f:127e0lb8o"
>
> HTTP/1.1 206 Partial Content
> Cache-Control: max-age=432000
> Date: Sat, 25 Jul 2009 08:58:59 GMT
> Content-Length: 2
> Content-Md5: 1xvdIsi7k7jSh9zm9GrtJQ==
> Content-Range: bytes 0-1/9331520
> Content-Type: application/zip
> Etag: "1k8cb5f:127e0lb8o"
> Expires: Thu, 30 Jul 2009 08:58:59 GMT
> Last-Modified: Tue, 10 Apr 2007 15:09:24 GMT
> Server: Jigsaw/2.3.0-beta1
>
> which contradicts the SHOULD NOT on conditional requests...
>

[[
    If the 206 response is the result of an If-Range request, the
    response SHOULD NOT include other entity-headers.  Otherwise, the
    response MUST include all of the entity-headers that would have been
    returned with a 200 (OK) response to the same request.
]]
In that case, it includes all the entity-headers that would have been
returned by a 200, so it's already behaving per spec.
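
Read literally, the quoted paragraph amounts to a header filter along these
lines. This is only a sketch: the function name is hypothetical, the
entity-header list is the one in RFC 2616 section 7.1, and the choice of
which entity-headers survive an If-Range 206 is an assumption made for the
example.

```python
# Entity-header fields as listed in RFC 2616 section 7.1.
ENTITY_HEADERS = {
    "allow", "content-encoding", "content-language", "content-length",
    "content-location", "content-md5", "content-range", "content-type",
    "expires", "last-modified",
}
# Assumption: entity-headers this sketch still sends on an If-Range 206.
KEPT_FOR_IF_RANGE = {"content-range", "content-length",
                     "content-location", "expires"}


def headers_for_206(headers_for_200: dict, content_range: str,
                    length: int, used_if_range: bool) -> dict:
    """Select the headers for a 206 according to the quoted rule."""
    out = {}
    for name, value in headers_for_200.items():
        lname = name.lower()
        if (used_if_range and lname in ENTITY_HEADERS
                and lname not in KEPT_FOR_IF_RANGE):
            continue  # "SHOULD NOT include other entity-headers"
        out[name] = value  # otherwise: everything the 200 would carry
    out["Content-Range"] = content_range
    out["Content-Length"] = str(length)
    return out


# Hypothetical headers of the corresponding 200 response.
full = {"ETag": '"v1"', "Content-Type": "application/zip",
        "Content-MD5": "digest-of-whole-entity", "Content-Length": "8"}
with_if_range = headers_for_206(full, "bytes 0-1/8", 2, used_if_range=True)
without_if_range = headers_for_206(full, "bytes 0-1/8", 2, used_if_range=False)
print(with_if_range)
print(without_if_range)
```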

One thing we can do to clarify this is to add the following to part5 3.1,
after the paragraph I quoted above:
[[
Headers that apply to the message-body and not to the full entity MUST be
ignored by caches.
]]

--
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves

