Multi-server HTTP

Multi-server HTTP

by Ford, Alan

Hi all,

At the IETF this week, Mark Handley and I submitted a floating-an-idea
draft on multi-server HTTP and presented it in tsvarea.

http://www.ietf.org/id/draft-ford-http-multi-server-00.txt

Slides are at: http://www.ietf.org/proceedings/75/slides/tsvarea-0.pdf

I realise the Transport Area session didn't capture a large number of
HTTP people. The main reason for presenting it there was that our key
motivation is to improve Internet resource usage, and we have been doing
other such work (notably multipath TCP) in that area. We were also very
short on preparation time before the IETF, so apologies for missing many
of you.

However, we would very much like input and guidance from the HTTP
community. I am grateful to Henrik Nordstrom for suggesting we should
bring it to the HTTPbis WG, even though as an extension it is not within
the charter.

This is a brief summary of the proposal:

  * We are aiming to achieve better usage of Internet resources by
applying BitTorrent-like chunked downloading of large files from
different servers.
  * Upon connection to a Multi-Server HTTP server, when a client says
they are Multi-server capable, in the response the server will provide a
list of mirrors for that resource, a checksum for the file, and a chunk
of the file with a Content-Range header.
  * The client will then send more GET requests, this time with Range:
headers, to the original server and to zero or more of the mirror
servers, along with a verification header to ensure the checksum matches
and so the resource is the same. The client will handle the scheduling
of Range requests in order to make the most effective use of the least
loaded servers.
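
As a rough illustration of the client-side step described above, the sketch below splits the remainder of a resource into Range requests and spreads them over the original server and its mirrors. The function names, the fixed chunk size, and the simple round-robin assignment are assumptions made for illustration; the draft leaves scheduling to the client, and a real client would favour the least loaded servers rather than rotate blindly.

```python
from typing import Dict, List, Tuple


def plan_ranges(total_size: int, first_chunk_end: int,
                chunk_size: int) -> List[Tuple[int, int]]:
    """Split the bytes after the server's first chunk into inclusive ranges."""
    ranges = []
    start = first_chunk_end + 1
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def assign_round_robin(ranges: List[Tuple[int, int]],
                       servers: List[str]) -> Dict[str, List[str]]:
    """Assign each range to a server in turn, as Range header values."""
    plan: Dict[str, List[str]] = {server: [] for server in servers}
    for i, (start, end) in enumerate(ranges):
        plan[servers[i % len(servers)]].append(f"bytes={start}-{end}")
    return plan


# Hypothetical example: a 1000-byte resource whose first 100 bytes
# arrived in the initial response, fetched in 400-byte pieces.
ranges = plan_ranges(total_size=1000, first_chunk_end=99, chunk_size=400)
plan = assign_round_robin(ranges, ["origin.example", "mirror.example"])
```

Each value in `plan` is a list of Range header values to send to that server, alongside whatever verification header the proposal settles on.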

We realise that the draft itself is not making the best use of existing
proposals. During the presentation, Instance-Digests (RFC3230) were
mentioned which look ideal instead of X-Checksum, although we will still
need an If-Digest-Match header. Content-MD5 was also suggested but that
appears to be a checksum of just the data that is sent, not the whole
resource.
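
To make the distinction concrete, here is a small sketch that computes a digest over a whole resource and a digest over just one transmitted part. The sample bytes, the helper name, and the choice of SHA-1 and MD5 are assumptions for illustration only; the point is that a digest of the part alone cannot confirm that two servers hold the same complete file.

```python
import base64
import hashlib


def b64_digest(algorithm: str, data: bytes) -> str:
    """Return the base64-encoded digest of data using the named algorithm."""
    return base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")


resource = b"0123456789" * 10   # hypothetical complete resource (100 bytes)
part = resource[0:40]           # the bytes carried in one partial response

# Instance-style digest: computed over the whole resource, so it is the
# same value no matter which byte range a given response carries.
instance_digest = b64_digest("sha1", resource)

# Content-MD5-style digest: computed over only the bytes that were sent,
# so it changes with every range and says nothing about the rest.
part_md5 = b64_digest("md5", part)
whole_md5 = b64_digest("md5", resource)
```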

I discounted ETags along with If-Match in the proposal since RFC2616
says "Entity tags are used for comparing two or more entities from the
same requested resource" but if I have understood the terminology
correctly, in our proposal we are fetching chunks from different
resources (even though the content should be the same). Indeed the RFC
also says, "The use of the same entity tag value in conjunction with
entities obtained by requests on different URIs does not imply the
equivalence of those entities." Please correct me if I'm wrong!

There is also a question of whether we could make further extensions,
specifically:

  * Wildcarded mirror lists (e.g. a server that mirrors all /a/*.jpg).
  * Checksums could be provided for file chunks allowing broken chunks
to be re-fetched.
  * Servers could store multiple versions of the file indexed by
checksum.
  * The initial server could send no data, or very little, itself and
act purely as a load balancer, or redirect immediately when it is
overloaded.

These may change the mechanism quite considerably, however (e.g. with
wildcards, no longer would you be getting all checksums from the same
server; and for verification checksum chunks need to be pre-determined
and calculated).
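
The per-chunk checksum idea can be sketched as follows. This assumes fixed, pre-agreed chunk boundaries and SHA-256, neither of which the draft specifies, and the function names are hypothetical. It shows why the boundaries must be known in advance: the client can only compare digests if it cuts the data at the same offsets the server used.

```python
import hashlib
from typing import List


def chunk_digests(data: bytes, chunk_size: int) -> List[str]:
    """Digest each fixed-size chunk; the last chunk may be shorter."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def broken_chunks(received: bytes, expected: List[str],
                  chunk_size: int) -> List[int]:
    """Return indexes of chunks whose digest differs from the expected one.

    Assumes received has the same length as the original resource.
    """
    actual = chunk_digests(received, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Hypothetical example: corrupt one byte in the second 256-byte chunk.
original = bytes(range(256)) * 4
expected = chunk_digests(original, 256)
damaged = bytearray(original)
damaged[300] ^= 0xFF
to_refetch = broken_chunks(bytes(damaged), expected, 256)
```

Only the chunks listed in `to_refetch` would need to be requested again, rather than the whole file.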

We believe that the extension as it stands can bring significant benefit
to HTTP, making much more efficient use of Internet resources. The
experiments conducted so far suggest it had no negative impact in any of
the scenarios tested.

Looking forward to your comments and advice!

Regards,
Alan

------------------------------------------------------------------------
Alan Ford

Tel: +44 (0)1794 833465
Fax: +44 (0)1794 833433
alan.ford@...


--
Roke Manor Research Ltd, Romsey,
Hampshire, SO51 0ZN, United Kingdom

A Siemens company
Registered in England & Wales at:
Siemens plc, Faraday House, Sir William Siemens Square,
Frimley, Camberley, GU16 8QD. Registered No: 267550
------------------------------------------------------------------------
Visit our website at www.roke.co.uk
------------------------------------------------------------------------
The information contained in this e-mail and any attachments is
proprietary to Roke Manor Research Ltd and must not be passed to any
third party without permission. This communication is for information
only and shall not create or change any contractual relationship.
------------------------------------------------------------------------

Please consider the environment before printing this email



Re: Multi-server HTTP

by mnot

Alan and Mark,

There unfortunately hasn't been much discussion of this yet, at least  
on the list. Has there been progress elsewhere?

For my part, this looks like interesting work. If I understand it  
correctly, it's entirely application-layer (or at least able to be  
implemented within the application layer), so if you want to, I think  
it's entirely appropriate to discuss it on this list.

Also, have you made contact with the folks doing Metalink
<http://www.metalinker.org/>? They have deployed implementations, and it's
my understanding that they're looking at revising the spec now, so it may
be an excellent time to collaborate.

Personally, I'd like to see the end result able to use the same URL  
for multi-server downloads and "traditional" single-server downloads;  
i.e., it should be transparent to clients.

Cheers,


On 31/07/2009, at 9:59 PM, Ford, Alan wrote:

> [...]


--
Mark Nottingham     http://www.mnot.net/



RE: Multi-server HTTP

by Ford, Alan

Hi Mark, all,

Thanks for your response. Unfortunately we have not received any further
feedback on this, which is a shame since we'd really like to know if
there is interest in trying to move this forward.

I have (admittedly only briefly) looked at metalink. It seems to cover
some of what we need (list of mirrors, pieces, checksumming) but seems
mostly to be concerned with finding a single appropriate source rather
than downloading from multiple HTTP servers. This seems to mostly be a
client rather than a spec choice, however. Nevertheless, one of the
disadvantages of metalink, from our point of view, is that it is an
overhead. This is negligible for large files, but one of our (longer
term) use cases is for mirrors of a whole site allowing e.g. a set of
images to be downloaded from different servers. As such, there is a
moderate delay before a download would start since first the metalink
must be downloaded, then decisions made, then new downloads started.

In our case, the download starts immediately, just as in standard HTTP,
and the client can take over the requesting of various parts when it is
ready, so there is no delay introduced by metadata handshaking.

Our solution is indeed designed to operate on the same URLs as  It seems
that it is feasible for metalink to also be done transparently (by the
client declaring "Accept: application/metalink+xml" as I understand it).

So, folks... any more thoughts? :)

Regards,
Alan

> -----Original Message-----
> From: Mark Nottingham [mailto:mnot@...]
> Sent: 25 August 2009 07:14
> To: Ford, Alan
> Cc: ietf-http-wg@...; Mark Handley
> Subject: Re: Multi-server HTTP
>
> [...]





RE: Multi-server HTTP

by Ford, Alan

Well that would certainly be a great solution for many use cases. If
there was a distributed set of virtual hosts of a given server, I can
see that working quite well. Plus, this would solve the ETag issue that
I mention (assuming I have understood correctly - nobody has yet
corrected me!), since in this case the client /is/ requesting the same
resource from each IP address.

There are two extra issues that spring to mind that our solution
handles, however, that this does not:

It permits the use of any mirroring service (e.g. mirror.ac.uk,
sourceforge) where a mirror of the server is not a dedicated virtual
host, but is a subdirectory of a larger mirror, or indeed is just a
completely different server.

Also, in the case you outline, how would the client know what Range: to
request to begin with, since it does not know the size of the resource
at that stage? I realise our proposal is not hugely elegant in that
sense either, but it seemed the best way to get the connection going (by
initially delegating a range choice to the server, which of course knows
the resource size).
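
The point about learning the size from the first response can be illustrated with a small parser for a Content-Range value of the form `bytes first-last/total`. The function name is hypothetical and this is only a sketch of the idea: once the server has chosen and sent an initial range, the client can read the total length and plan its own Range requests from there.

```python
import re
from typing import Optional, Tuple

# Matches values such as "bytes 0-499/1234" or "bytes 0-499/*".
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Return (first_byte, last_byte, total_size); total is None if unknown."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised Content-Range value: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


# Hypothetical first response: the server picked the first 500 bytes.
first, last, total = parse_content_range("bytes 0-499/1234")
remaining_start = last + 1   # the client takes over from here
```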

Regards,
Alan

> -----Original Message-----
> From: Robert Siemer [mailto:Robert.Siemer-http@...]
> Sent: 25 August 2009 07:53
> To: Mark Nottingham
> Cc: Ford, Alan; ietf-http-wg@...; Mark Handley
> Subject: Re: Multi-server HTTP
>
> Is there anything substantially different from setting up a bunch of
> servers behind a DNS round-robin and changing the HTTP client to go
> for all IPs?
>
> The client could wait for the response headers from the first server
> and spawn more connections with range requests using ETag and URL.
>
> Basically everything is in HTTP already, apart from some
> clarifications...
>
>
> Robert
>
> On Tue, 2009-08-25 at 16:13 +1000, Mark Nottingham wrote:
> > [...]





Re: Multi-server HTTP

by Anthony Bryan

On Tue, Aug 25, 2009 at 5:24 AM, Ford, Alan<alan.ford@...> wrote:
> Hi Mark, all,
>
> I have (admittedly only briefly) looked at metalink. It seems to cover
> some of what we need (list of mirrors, pieces, checksumming) but seems
> mostly to be concerned with finding a single appropriate source rather
> than downloading from multiple HTTP servers. This seems to mostly be a
> client rather than a spec choice, however. Nevertheless, one of the

This wasn't really a spec choice, more an inadequate explanation of
what metalink offers in the abstract and introduction of our ID. :)
Looking at our ID, it doesn't really spell out what we've solved in
the past 4 years to those unfamiliar with metalink. Our ID is focused
more on the format, not on what the client does with it.

All but a few of the 30-some metalink clients support downloading from
multiple HTTP servers, although clients aren't required to support
multi-source downloads. Most metalink clients are download managers /
accelerators, but I think using mirrors for fallback / failover is just
as important.
See http://en.wikipedia.org/wiki/Metalink or (our embarrassing)
http://www.metalinker.org/implementation.html

Your excellent introduction put ours to shame, so I've tried to update ours:

All the information about a download, including mirrors, checksums,
digital signatures, and more can be stored in a machine-readable
Metalink file. This Metalink file transfers the knowledge of the
download server (and mirror database) to the client. Clients can
fall back to alternate mirrors if the current one has an issue. With
this knowledge, the client is enabled to work its way to a successful
download even under adverse circumstances. All this is done
transparently to the user and the download is much more reliable and
efficient. In contrast, a traditional HTTP redirect to a mirror
conveys only extremely minimal information - one link to one server,
and there is no provision in the HTTP protocol to handle failures.
Other features that some clients provide include multi-source
downloads, where chunks of a file are downloaded from multiple mirrors
(and optionally, Peer-to-Peer) simultaneously, which frequently
results in a faster download. Metalinks also provide structured
information about downloads that can be indexed by search engines.

http://tools.ietf.org/html/draft-bryan-metalink#section-1
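
The fallback behaviour described in that introduction can be sketched as a loop over a mirror list. This is not taken from any Metalink client; the function name and the injected `fetch` callable are assumptions, chosen so the sketch stays independent of any particular HTTP library (for example, `urllib.error.URLError` is a subclass of `OSError` and would be caught here).

```python
from typing import Callable, Iterable, List


def fetch_with_fallback(mirrors: Iterable[str],
                        fetch: Callable[[str], bytes]) -> bytes:
    """Try each mirror in order and return the first successful download."""
    errors: List[str] = []
    for url in mirrors:
        try:
            return fetch(url)
        except OSError as exc:
            # Remember the failure and move on to the next mirror.
            errors.append(f"{url}: {exc}")
    raise OSError("all mirrors failed: " + "; ".join(errors))


# Hypothetical stand-in for a real downloader: the first mirror is down.
def fake_fetch(url: str) -> bytes:
    if "mirror1" in url:
        raise OSError("connection refused")
    return b"file contents"


data = fetch_with_fallback(
    ["http://mirror1.example/f.iso", "http://mirror2.example/f.iso"],
    fake_fetch,
)
```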

I should note though that metalink requires no changes to a server. A
user can create a metalink.

> disadvantages of metalink, from our point of view, is that it is an
> overhead. This is negligible for large files, but one of our (longer
> term) use cases is for mirrors of a whole site allowing e.g. a set of
> images to be downloaded from different servers. As such, there is a
> moderate delay before a download would start since first the metalink
> must be downloaded, then decisions made, then new downloads started.

We could add this if people want. No one had requested it.

> In our case, the download starts immediately, just as in standard HTTP,
> and the client can take over the requesting of various parts when it is
> ready, so there is no delay introduced by metadata handshaking.

Downloads with metalink start immediately as well.

> Our solution is indeed designed to operate on the same URLs as  It seems
> that it is feasible for metalink to also be done transparently (by the
> client declaring "Accept: application/metalink+xml" as I understand it).

Yes, we've been experimentally using transparent content negotiation,
which we have since learned is bad. :)

We'll be using Mark's Link header in the future.

--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads


>> -----Original Message-----
>> From: Mark Nottingham [mailto:mnot@...]
>> Sent: 25 August 2009 07:14
>> To: Ford, Alan
>> Cc: ietf-http-wg@...; Mark Handley
>> Subject: Re: Multi-server HTTP
>>
>> Alan and Mark,
>>
>> There unfortunately hasn't been much discussion of this yet, at least
>> on the list. Has there been progress elsewhere?
>>
>> For my part, this looks like interesting work. If I understand it
>> correctly, it's entirely application-layer (or at least able to be
>> implemented within the application layer), so if you want to, I think
>> it's entirely appropriate to discuss it on this list.
>>
>> Also, have you made contact with the folks doing Metalink
>> <http://www.metalinker.org/>? They have deployed implementations, and
>> it's my understanding that they're looking at revising the spec now,
>> so it may be an excellent time to collaborate.
>>
>> Personally, I'd like to see the end result able to use the same URL
>> for multi-server downloads and "traditional" single-server downloads;
>> i.e., it should be transparent to clients.
>>
>> Cheers,


RE: Multi-server HTTP

by Henrik Nordstrom-5

Tue 2009-08-25 at 10:24 +0100, Ford, Alan wrote:
> Well that would certainly be a great solution for many use cases. If
> there was a distributed set of virtual hosts of a given server, I can
> see that working quite well. Plus, this would solve the ETag issue that
> I mention (assuming I have understood correctly - nobody has yet
> corrected me!), since in this case the client /is/ requesting the same
> resource from each IP address.

ETag is per URI, but it is entirely fine for a specification like this
to require that the participating servers use the same ETag for the same
object version, for example based on a hash of the object data. How
servers compose the ETag is outside the HTTP specification and a property
of the server implementation; the only requirement HTTP places is
uniqueness among versions or variants of the same URI. The base HTTP
specifications place no direct requirements on how ETags from different
URIs relate to each other, but do hint that for objects having multiple
URIs where those URIs are equal, the ETag is expected to be the same.

However, many server implementations available today do not easily allow
this on the wide scale you require without additions on the server, as
they base ETag on other metadata that may differ between mirrors of the
same object such as local file timestamp, filesystem inode numbers etc,
not really taking the actual content of the object into consideration.
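
A minimal sketch of the difference, assuming SHA-256 as the hash (the thread does not name one) and using hypothetical helper names. It contrasts an ETag derived only from the object bytes with one built from local filesystem metadata:

```python
import hashlib


def content_etag(data: bytes) -> str:
    """Return a strong ETag derived only from the object's bytes."""
    digest = hashlib.sha256(data).hexdigest()
    return f'"{digest}"'


def metadata_etag(inode: int, mtime: int, size: int) -> str:
    """Mimic an ETag built from filesystem metadata (illustrative format only)."""
    return f'"{inode:x}-{size:x}-{mtime:x}"'


body = b"example object data"

# Two mirrors holding identical bytes agree on the content-based ETag...
same = content_etag(body) == content_etag(bytes(body))

# ...but not on a metadata-based one, because inode and mtime are local.
differs = metadata_etag(1001, 1251400000, len(body)) != metadata_etag(
    2002, 1251403600, len(body)
)
```

With the content-based form, an If-Match check against any mirror only succeeds when that mirror holds the same bytes.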

Regards
Henrik



RE: Multi-server HTTP

by Ford, Alan

Hi Henrik, all,

Thanks for the clarification, so it seems we could in theory define the
ETag for this specification to ensure it matches across servers. That
would remove the need for all except the Mirrors: header, and possibly
Multiserver-Version (so that the server knows it's talking to a
multiserver-capable client and thus the ETag is defined this way). If we
didn't mind a small delay, we could probably do away with that too and
say the client could infer capability by getting a Mirrors: header back
from a HEAD request first, and then deciding what to do (assuming the
connection can be kept alive).

Which brings me onto another thing about Mirrors: header. One of our
longer-term goals with this would be to somehow provide wildcarded lists
of mirrors, so that a client could immediately run off and fetch bits of
a website from many mirrors, potentially speeding up loading time
considerably, and providing an alternative method of load balancing.

However, I'm struggling to see a neat way of doing this reliably, since
we couldn't get checksums for every file on the first handshake (or if
all content was static we might be able to, but it's a big overhead).
Does anybody have any ideas as to a neat way of doing this? Best I can
think of so far is some sort of version number/(pseudo)hash of the
entire directory structure!

Regards,
Alan




RE: Multi-server HTTP

by Daniel Stenberg

On Fri, 28 Aug 2009, Ford, Alan wrote:

> the client could infer capability by getting a Mirrors: header back from a
> HEAD request first, and then deciding what to do (assuming the connection
> can be kept alive).

That would work even if the connection isn't kept alive, wouldn't it?

> Which brings me onto another thing about Mirrors: header. One of our
> longer-term goals with this would be to somehow provide wildcarded lists of
> mirrors, so that a client could immediately run off and fetch bits of a
> website from many mirrors, potentially speeding up loading time
> considerably, and providing an alternative method of load balancing.
>
> However, I'm struggling to see a neat way of doing this reliably, since we
> couldn't get checksums for every file on the first handshake (or if all
> content was static we might be able to, but it's a big overhead). Does
> anybody have any ideas as to a neat way of doing this? Best I can think of
> so far is some sort of version number/(pseudo)hash of the entire directory
> structure!

This idea is attractive methinks, but coming up with a fine protocol for it is
really tricky.

A hash of the entire directory would be problematic, I think, since it would
imply that both directory structures need to remain identical - not only hold
the right files and no extra files.

I'm thinking like: you have two sites A and B, they show one picture each
A.jpg and B.jpg. Both sites refer to a mirror that holds BOTH those images in
the same directory. It could work fine, but the mirror's dir doesn't look the
same as the dir of A nor B. That concept would break too easily I think.
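
The A.jpg / B.jpg case can be sketched as follows. The directory-hash construction here is an assumption made for illustration (sorted file names plus per-file SHA-256); nobody in the thread specifies one:

```python
import hashlib
from typing import Dict


def directory_hash(files: Dict[str, bytes]) -> str:
    """Hash a directory as the sorted sequence of (name, content hash) pairs."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


site_a = {"A.jpg": b"picture A"}
site_b = {"B.jpg": b"picture B"}
mirror = {"A.jpg": b"picture A", "B.jpg": b"picture B"}

# The mirror's directory hash matches neither site...
matches_a = directory_hash(mirror) == directory_hash(site_a)
matches_b = directory_hash(mirror) == directory_hash(site_b)

# ...even though the individual file it shares with site A is byte-identical.
per_file_ok = (
    hashlib.sha256(mirror["A.jpg"]).digest()
    == hashlib.sha256(site_a["A.jpg"]).digest()
)
```

Any extra or missing file changes the directory hash, which is why per-object checks look more robust here.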

We want to avoid doing requests to non-existing resources on the mirror that'd
respond with a 404 back (which then would have to be retried against the master
site or another mirror) - we need a decent way for a client to know which URIs
it can try to get from a mirror instead of the master...

I think all this makes me favour not a wildcard concept, but more a
list-concept where a site can list not only that "this object also exists HERE
and HERE" but then also "THESE OTHER OBJECTS also exist HERE and HERE" and
"THESE OTHER" would then be a list of (relative?) URIs somehow. But this
becomes awkward if the list of items is long.

Then we come to the concept of changing items. How long can a client assume
that the mirrors have the corresponding object? Would they need some kind of
cache control headers to specify that? In the mirror-for-a-single-object case
I think we can assume that the mirror will have the object for at least a very
short while after the response said so but then it too gets this problem.

--

  / daniel.haxx.se


RE: Multi-server HTTP

by Henrik Nordstrom-5

Fri 2009-08-28 at 12:38 +0100, Ford, Alan wrote:

> Multiserver-Version (so that the server knows it's talking to a
> multiserver-capable client and thus the ETag is defined this way).

Not needed. It's sufficient that the server announces the support. In
fact it's strongly recommended that it always announces it, or you'll run
into some hairy issues with caching.

> Which brings me onto another thing about Mirrors: header. One of our
> longer-term goals with this would be to somehow provide wildcarded lists
> of mirrors, so that a client could immediately run off and fetch bits of
> a website from many mirrors, potentially speeding up loading time
> considerably, and providing an alternative method of load balancing.

That should imho be in a profile which you reference from a header, i.e.
by using the Link header referring to a mirror profile.

> However, I'm struggling to see a neat way of doing this reliably, since
> we couldn't get checksums for every file on the first handshake (or if
> all content was static we might be able to, but it's a big overhead).

Right.. so the client needs to pick one known server (perhaps "at
random") as the master server for any given request, giving the needed
object metadata, based on whatever prior knowledge it has about the
mirror setup.

> Does anybody have any ideas as to a neat way of doing this? Best I can
> think of so far is some sort of version number/(pseudo)hash of the
> entire directory structure!

Such a hash isn't useful unless you retrieve the complete structure,
which most often is not what you want to do.

Imho what you can provide in the mirror profile is just the URL patterns
where content may be found. Hashes etc have to be resolved per object
when fetched.

Additionally the list of mirrors can be fairly large, making it
unsuitable to be sent in HTTP headers. Consider for example a site with
hundreds of mirrors which is not unrealistic (even the little Squid
project has in the range of 70 registered and verified mirrors).

So I would recommend the following slightly different approach to your
problem.

* Define a new Mirror profile object, similar to MetaLink but defining
the mirror URL policy for groups of URLs on the server, without going
into checksums etc (HTTP will give those).

* Instance-Digest header returning the object checksum

* HTTP addendum that servers participating in this mirror scheme should
all share the same ETag policy, i.e. base it on the file contents and
not server-unique filesystem metadata..


1. First request for a mirrored URL. A plain GET request, perhaps with a
Range limit (not required). The client discovers the mirror profile link in
the headers, and maybe a MetaLink relation as well (the two happily
coexist). From this response the client learns the following metadata
about the requested object, while also starting to receive the
object:

    * ETag
    * Instance-Digest
    * Mirror profile link.
    * Object size
    * Recovery profile link

2. If the object is large and gets delivered slower than expected then
the client fetches the mirror profile, and then starts a number of
parallel ranged downloads (one per selected mirror server other than the
first) using If-Match conditions based on the ETag to quickly detect
out-of-date mirrors. If no Range limit was given in the original request
then work from the tail of the object (the first is still running and
will eventually catch up), otherwise continue after the range requested
in the first request.

2b. If a server rejects the If-Match condition then something is fishy.
If the metadata came from the master server or the master server has
already acknowledged the validity by accepting an If-Match condition
then ignore those other servers rejecting If-Match. If the master server
has not yet been queried then pick the master server as fallback for the
first failed range. If the master server rejects the If-Match then
restart the download from the beginning using the master server for the
initial range.


3. If the first request was not Range limited then abort it by closing
the connection when it catches up with the other parallel downloads of
the same object.

4. On the next requested URL the mirror profile of the server is already
known, and the client can pick the server that seems fastest for the
initial request, where it will learn the required object-specific
metadata (ETag, Size, Instance-Digest, Recovery profile link).


5. If the object checksum does not match the instance-digest then fetch
the recovery profile link, where partial checksums etc can be found
allowing detection of which server returned bad information.
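
The scheduling in step 2 (parallel ranged requests working from the tail, guarded by If-Match) might look roughly like this. It is a sketch under assumptions: a fixed chunk size, one chunk per mirror, and hypothetical function names and mirror URLs. A mirror holding a different version would be expected to answer such a request with 412 Precondition Failed.

```python
from typing import Dict, List, Tuple


def plan_tail_ranges(
    size: int, mirrors: List[str], chunk: int
) -> List[Tuple[str, int, int]]:
    """Assign one chunk per mirror, working backwards from the end of the object.

    Returns (mirror, first_byte, last_byte) tuples with inclusive offsets,
    matching the semantics of the HTTP Range header.
    """
    plan = []
    end = size - 1
    for mirror in mirrors:
        if end < 0:
            break  # nothing left to hand out
        start = max(0, end - chunk + 1)
        plan.append((mirror, start, end))
        end = start - 1
    return plan


def range_request_headers(etag: str, first: int, last: int) -> Dict[str, str]:
    """Headers for a ranged GET that fails fast on an out-of-date mirror."""
    return {"Range": f"bytes={first}-{last}", "If-Match": etag}


plan = plan_tail_ranges(
    1000, ["http://m1.example.com/f", "http://m2.example.com/f"], 300
)
headers = range_request_headers('"abc123"', plan[0][1], plan[0][2])
```

In this example the mirrors cover bytes 400-999, so the initial unranged connection can be closed once it has delivered bytes 0-399.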



In this approach all servers providing the mirror service SHOULD use the
same ETag and preferably also provide an Instance-Digest checksum. It's
possible to specify this property of the available servers per server in
the mirror profile however, and the modification for servers not sharing
the same ETag is that If-Match won't be used for those servers. This
slightly increases the risk of a failed transfer, requiring recovery
after the download is supposed to be complete. And at least one of the
selected servers needs to provide Instance-Digest to be able to detect
corrupted transfers.
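
Detection and recovery could be sketched like this: check the whole object first, and only consult per-piece checksums when that fails. The piece size, the use of SHA-1, and the shape of the "recovery profile" data are all assumptions; the thread leaves that format open.

```python
import hashlib
from typing import List


def verify_object(data: bytes, expected_sha1_hex: str) -> bool:
    """Whole-object check, as an Instance-Digest would allow."""
    return hashlib.sha1(data).hexdigest() == expected_sha1_hex


def find_bad_pieces(
    data: bytes, piece_size: int, piece_sha1_hex: List[str]
) -> List[int]:
    """Return indexes of pieces whose checksum does not match the recovery data."""
    bad = []
    for index, expected in enumerate(piece_sha1_hex):
        piece = data[index * piece_size:(index + 1) * piece_size]
        if hashlib.sha1(piece).hexdigest() != expected:
            bad.append(index)
    return bad


original = b"0123456789abcdefghij"
whole = hashlib.sha1(original).hexdigest()
pieces = [
    hashlib.sha1(original[i:i + 5]).hexdigest()
    for i in range(0, len(original), 5)
]

# Simulate one mirror returning a damaged byte at offset 12 (inside piece 2).
corrupted = original[:12] + b"X" + original[13:]
bad = find_bad_pieces(corrupted, 5, pieces)
```

Mapping a bad piece index back to the server that supplied that byte range identifies which mirror to distrust and which range to re-fetch.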

I.e. it's in most cases sufficient that the master server provides
mirror profile and instance-digest information, but operation will be
more robust and efficient if the mirror servers do implement a common
ETag and preferably Instance-Digest as well. In fact the emitted ETag
may be implemented as the same as the instance digest for simplicity,
but there is no need to specify how the ETag is generated, just that it
needs to be shared among the mirror servers.

Regards
Henrik



Re: Multi-server HTTP

by Anthony Bryan

On Fri, Aug 28, 2009 at 11:27 AM, Henrik
Nordstrom<henrik@...> wrote:
> Fri 2009-08-28 at 12:38 +0100, Ford, Alan wrote:
>
> * Define a new Mirror profile object, similar to MetaLink but defining
> the mirror URL policy for groups of URLs on the server, without going
> into checksums etc (HTTP will give those).
>
> * Instance-Digest header returning the object checksum

My connection has been down for a few days but here are my very rough
ideas on doing Metalink in HTTP headers with the Link header, Instance
Digests, and perhaps Content-MD5.

Briefly, it's:

   Link: <http://www2.example.com/example.ext>; rel="alternate";
   Link: <ftp://ftp.example.com/example.ext>; rel="alternate";
   Link: <http://example.com/example.ext.torrent>; rel="describedby";
   type="torrent";
   Link: <http://example.com/example.ext.asc>; rel="describedby";
   type="application/pgp-signature";
   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

http://www.ietf.org/id/draft-bryan-metalinkhttp-00.txt
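
For reference, a value in the form of the Digest line above can be produced as below. This assumes the "SHA" algorithm token of RFC 3230, i.e. a base64-encoded SHA-1 of the whole instance; the function names are made up for the example.

```python
import base64
import hashlib


def sha_digest_header(data: bytes) -> str:
    """Build a Digest header value using the RFC 3230 'SHA' (SHA-1) algorithm."""
    encoded = base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")
    return f"SHA={encoded}"


def digest_matches(data: bytes, header_value: str) -> bool:
    """Check downloaded bytes against a single 'SHA=...' Digest value."""
    return sha_digest_header(data) == header_value.strip()


value = sha_digest_header(b"hello world")
ok = digest_matches(b"hello world", value)
tampered = digest_matches(b"hello w0rld", value)
```

A client would run the same check over the reassembled file once all ranges have arrived.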

I've been meaning to ask Mark Nottingham if "alternate" from Link
header fits what we are using it for, to mean identical, duplicate
copy, etc?

   The Link Relation Type registry's initial contents are:

   o  Relation Name: alternate
   o  Description: Designates a substitute for the link's context.
   o  Reference: [W3C.REC-html401-19991224]


--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads


RE: Multi-server HTTP

by Henrik Nordstrom-5

Fri 2009-08-28 at 15:45 +0200, Daniel Stenberg wrote:

> I think all this make me favour not a wildcard concept, but more a
> list-concept where a site can list not only that "this object also exist HERE
> and HERE" but then also "THESE OTHER OBJECTS also exist HERE and HERE" and
> "THESE OTHER" would then be a list of (relative?) URIs somehow. But this
> becomes awkward if the list of items is long.

I am in favor of wildcards and similar patterns. It's a one-way mapping,
mapping original URL to possible mirrors, not the other way around.
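
A one-way pattern mapping of this kind could be sketched as follows. The profile structure (path pattern to list of mirror base URLs) and the host names are hypothetical; only the "/a/*.jpg" style of pattern comes from the original proposal.

```python
import fnmatch
from typing import Dict, List

# Hypothetical mirror profile: path pattern -> mirror base URLs.
PROFILE: Dict[str, List[str]] = {
    "/a/*.jpg": ["http://mirror1.example.com", "http://mirror2.example.com"],
    "/downloads/*": ["http://mirror2.example.com"],
}


def candidate_urls(path: str, profile: Dict[str, List[str]]) -> List[str]:
    """Map an original URL path to the mirror URLs that may hold it.

    Note that fnmatch's '*' also matches '/', so "/a/*.jpg" covers
    subdirectories of /a/ as well.
    """
    urls: List[str] = []
    for pattern, bases in profile.items():
        if fnmatch.fnmatchcase(path, pattern):
            for base in bases:
                url = base + path
                if url not in urls:
                    urls.append(url)
    return urls


jpg_mirrors = candidate_urls("/a/cat.jpg", PROFILE)
no_mirrors = candidate_urls("/index.html", PROFILE)
```

The mapping says only where content may be found; ETag and digest still have to be checked per object when it is fetched.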

> Then we come to the concept of changing items. How long can a client assume
> that the mirrors have the corresponding object? Would they need some kind of
> cache control headers to specify that?

In the mirror-for-a-single-object case based on response headers, the
mirror had better keep the object for as long as we say the object is fresh
(Cache-Control: max-age etc). HTTP does not define different freshness
for headers and the rest of the object, only for a response as a whole.
Adding new cache-controls for these headers is pretty useless as the
rest of the HTTP infrastructure (caches/proxies) will continue to use
the normal HTTP freshness definitions.
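
In other words, a client could treat the mirror list as valid for exactly as long as the response carrying it is fresh. A deliberately simplified sketch follows. It looks only at max-age and ignores Expires, s-maxage and heuristic freshness, and the function names are illustrative.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract max-age from a Cache-Control header value, if present."""
    for directive in cache_control.split(","):
        match = re.fullmatch(
            r'\s*max-age\s*=\s*"?(\d+)"?\s*', directive, re.IGNORECASE
        )
        if match:
            return int(match.group(1))
    return None


def mirror_info_usable(cache_control: str, age_seconds: int) -> bool:
    """Treat mirror headers as usable only while the response itself is fresh."""
    limit = max_age_seconds(cache_control)
    return limit is not None and age_seconds < limit


one_hour = max_age_seconds("public, max-age=3600")
```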

>  In the mirror-for-a-single-object case
> I think we can assume that the mirror will have the object for at least a very
> short while after the response said so but then it too gets this problem.

In my experience the problem in most mirror setups is the reverse: that
the mirror hasn't yet got the object or, more troublesome to deal with,
that the mirror has not yet updated an existing object that got
changed. The first is a rather trivial error condition to deal with, no
worse than when a mirror site isn't reachable (actually easier). The
latter is worse and will cause a bad download unless there are reasonable
means to protect against it (i.e. a common ETag and/or a hash Digest).

Regards
Henrik



Re: Multi-server HTTP

by mnot

I don't think so; 'alternate' doesn't specify for what purpose it's an  
alternate, and you need a very precise definition (byte-for-byte  
equivalence of representations). 'alternate' is often used to mean  
"here's a copy in another format" and similar.

Perhaps you should mint 'duplicate'...

Cheers,




--
Mark Nottingham     http://www.mnot.net/



Re: Multi-server HTTP

by Anthony Bryan

On Mon, Aug 31, 2009 at 3:39 AM, Mark Nottingham<mnot@...> wrote:

> I don't think so; 'alternate' doesn't specify for what purpose it's an
> alternate, and you need a very precise definition (byte-for-byte equivalence
> of representations). 'alternate' is often used to mean "here's a copy in
> another format" and similar.
>
> Perhaps you should mint 'duplicate'...

Ok, this is what I have in the ID now:

Link Relation Type Registration: "duplicate"

o Relation Name: duplicate
o Description: Refers to an identical resource that is a byte-for-byte
equivalence of representations.
o Reference: This specification.

--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads


Re: Multi-server HTTP

by mnot

That's a good start, but it deserves a bit of discussion.

"byte-for-byte" implies that the bodies are the same, but what about
things like:

* Entity headers (e.g., Content-Type)
* Available content-encodings
* Whether partial content is supported
* Whether the same set of methods are supported (e.g., if A is a  
duplicate of B, will POSTing something to either have the same effect  
as on the other?)

I think the answer is that entity headers should generally be the  
same, so the real question is whether we're talking about the relation  
describing:

a) resources with duplicate representations (i.e., a GET on any of the  
dups will return the same reps)
b) duplicate resources (i.e., any method will have the same effect)

If it's (b), we should consider whether the resources are in fact the  
same "behind the curtains" (e.g., POSTing to A has the exact same  
effect on the world as POSTing to B), or whether they may be in fact  
separate systems (i.e., A and B have the same "interface", but POSTing  
to A may affect a different part of the world to B).

Just food for thought...





--
Mark Nottingham     http://www.mnot.net/



Re: Multi-server HTTP

by Nicolas Alvarez-2

Mark Nottingham wrote:

> That's a good start, but it deserves a bit of discussion.
>
> "byte-for-byte" implies that the bodies are the same, but what about
> things like:
>
> * Entity headers (e.g., Content-Type)
> * Available content-encodings
> * Whether partial content is supported
> * Whether the same set of methods are supported (e.g., if A is a
> duplicate of B, will POSTing something to either have the same effect
> as on the other?)
>
> I think the answer is that entity headers should generally be the
> same, so the real question is whether we're talking about the relation
> describing:
>
> a) resources with duplicate representations (i.e., a GET on any of the
> dups will return the same reps)
> b) duplicate resources (i.e., any method will have the same effect)
>
> If it's (b), we should consider whether the resources are in fact the
> same "behind the curtains" (e.g., POSTing to A has the exact same
> effect on the world as POSTing to B), or whether they may be in fact
> separate systems (i.e., A and B have the same "interface", but POSTing
> to A may affect a different part of the world to B).

Well, we're talking about static GETable resources with a single
representation. But I agree that if you make a Link relation, you'd want it
to be applicable to as many HTTP resources as possible... Or is it
possible / reasonable to say "this relation doesn't make sense for dynamic
or POSTable resources and shouldn't be used for those"?




Re: Multi-server HTTP

by mnot

Totally; we just need to be crisp about it.

My inclination would be that if we can be more inclusive without  
making it significantly more complex or risky, we should; otherwise,  
just do what's needed.

Cheers,




--
Mark Nottingham     http://www.mnot.net/



Re: Multi-server HTTP

by Anthony Bryan

Here's what I have now. More inclusive is good but I think someone
else would be better at writing it than me.

http://tools.ietf.org/html/draft-bryan-metalinkhttp

Link Relation Type Registration: "duplicate"

   o Relation Name: duplicate
   o Description: Refers to an identical resource that is a
     byte-for-byte equivalence of representations.
   o Reference: This specification.
   o Notes: This relation is for static resources.  That is, an HTTP
     GET request on any duplicate will return the same representation.
     It does not make sense for dynamic or POSTable resources and
     should not be used for them.

And here's the introduction (Content-MD5 is now mentioned):

   MetaLinkHeader is an alternative to Metalink, usually represented in
   an XML-based document format [draft-bryan-metalink].  MetaLinkHeader
   attempts to provide as much functionality as the Metalink XML format
   by using existing standards such as Web Linking
   [draft-nottingham-http-link-header], Instance Digests in HTTP
   [RFC3230], and Content-MD5 [RFC1864].  MetaLinkHeader is used to list
   information about a file to be downloaded.  This includes lists of
   multiple URIs (mirrors), Peer-to-Peer information, checksums, and
   digital signatures.

Here's what it looks like:

   Link: <http://www2.example.com/example.ext>; rel="duplicate";
   Link: <ftp://ftp.example.com/example.ext>; rel="duplicate";
   Link: <http://example.com/example.ext.torrent>; rel="describedby";
   type="torrent";
   Link: <http://example.com/example.ext.asc>; rel="describedby";
   type="application/pgp-signature";
   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=
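
As an illustration only (not part of the draft), here is a minimal Python sketch of how a client might pull the mirror URIs out of Link headers like the ones above. The function names parse_link_header and duplicate_uris are made up for this example, folded header lines are assumed to have been joined already, and the parsing is deliberately simplified: it tolerates the trailing ";" shown in the example but does not handle commas or semicolons inside quoted parameter values.

```python
import re


def parse_link_header(value):
    """Parse one Link header value into a list of (uri, params) pairs.

    Simplified sketch: assumes parameter values contain no ',' or ';'.
    """
    links = []
    # Each link-value is <URI> followed by zero or more ";param" parts.
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,<]*)*)', value):
        uri, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(';'):
            part = part.strip()
            if '=' not in part:
                continue  # skips empty parts, e.g. after a trailing ";"
            name, _, val = part.partition('=')
            params[name.strip().lower()] = val.strip().strip('"')
        links.append((uri, params))
    return links


def duplicate_uris(header_values):
    """Return the URIs of all links whose rel includes "duplicate"."""
    uris = []
    for value in header_values:
        for uri, params in parse_link_header(value):
            if 'duplicate' in params.get('rel', '').split():
                uris.append(uri)
    return uris


if __name__ == '__main__':
    headers = [
        '<http://www2.example.com/example.ext>; rel="duplicate";',
        '<ftp://ftp.example.com/example.ext>; rel="duplicate";',
        '<http://example.com/example.ext.torrent>; rel="describedby"; type="torrent";',
    ]
    print(duplicate_uris(headers))
```

The "describedby" links are left out of the mirror list on purpose; a client would treat those as metadata about the file rather than as alternative download locations.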

And more description:

   Metalink servers are HTTP servers that MUST have lists of mirrors and
   use the Link header [draft-nottingham-http-link-header] to indicate
   them.  They also MUST provide checksums of files via Instance Digests
   in HTTP [RFC3230].  Mirror and checksum information provided by the
   originating Metalink server is considered authoritative.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers MAY
   be Metalink servers.  Mirror servers MUST support serving partial
   content.  Mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server with
   Link header [draft-nottingham-http-link-header].  Metalink clients
   MUST support HTTP and MAY support FTP, BitTorrent, or other download
   methods.  Metalink clients MUST switch downloads from one mirror to
   another if the one mirror becomes unreachable.  Metalink clients are
   RECOMMENDED to support multi-source, or parallel, downloads, where
   chunks of a file are downloaded from multiple mirrors simultaneously
   (and optionally, Peer-to-Peer).  Metalink clients MUST support
   Instance Digests in HTTP [RFC3230] by requesting and verifying
   checksums.  Metalink clients MAY make use of digital signatures if
   they are offered.
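
To make the client requirements above a little more concrete, here is a rough Python sketch of mirror failover combined with the checksum check. It rests on several assumptions that are not in the draft text: fetch is a hypothetical callable supplied by the caller that returns the whole body for a URI or raises OSError when the mirror is unreachable; the Digest header carries a single value using the "SHA" algorithm (SHA-1, base64-encoded, as defined in RFC 3230); and the file is fetched whole rather than in ranges from several mirrors at once.

```python
import base64
import hashlib


def sha_digest_value(data):
    """Return the base64-encoded SHA-1 of data, as used in 'Digest: SHA=...'."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode('ascii')


def matches_digest(data, digest_header):
    """Check data against a single 'SHA=<base64>' Digest header value."""
    algorithm, _, expected = digest_header.partition('=')
    if algorithm.strip().upper() != 'SHA':
        raise ValueError('this sketch only handles the SHA algorithm')
    return sha_digest_value(data) == expected.strip()


def download_with_failover(uris, fetch, digest_header):
    """Try each mirror in order and return the first body that verifies.

    fetch(uri) is a hypothetical callable: it returns bytes, or raises
    OSError if the mirror cannot be reached.
    """
    for uri in uris:
        try:
            data = fetch(uri)
        except OSError:
            continue  # mirror unreachable: switch to the next one
        if matches_digest(data, digest_header):
            return data
        # Checksum mismatch: treat this mirror as bad and move on.
    raise RuntimeError('no mirror returned a body matching the digest')
```

Taking the digest only from the originating server fits the statement that its mirror and checksum information is authoritative; a mirror that returns different bytes is simply skipped.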



On Tue, Sep 1, 2009 at 3:08 AM, Mark Nottingham<mnot@...> wrote:

> Totally; we just need to be crisp about it.
>
> My inclination would be that if we can be more inclusive without making it
> significantly more complex or risky, we should; otherwise, just do what's
> needed.
>
> Cheers,
>
>
> On 01/09/2009, at 1:49 PM, Nicolas Alvarez wrote:
>
>> Mark Nottingham wrote:
>>>
>>> That's a good start, but it deserves a bit of discussion.
>>>
>>> "byte-for-byte" implies that the bodies are the same, but what about
>>> things like:
>>>
>>> * Entity headers (e.g., Content-Type)
>>> * Available content-encodings
>>> * Whether partial content is supported
>>> * Whether the same set of methods are supported (e.g., if A is a
>>> duplicate of B, will POSTing something to either have the same effect
>>> as on the other?)
>>>
>>> I think the answer is that entity headers should generally be the
>>> same, so the real question is whether we're talking about the relation
>>> describing:
>>>
>>> a) resources with duplicate representations (i.e., a GET on any of the
>>> dups will return the same reps)
>>> b) duplicate resources (i.e., any method will have the same effect)
>>>
>>> If it's (b), we should consider whether the resources are in fact the
>>> same "behind the curtains" (e.g., POSTing to A has the exact same
>>> effect on the world as POSTing to B), or whether they may be in fact
>>> separate systems (i.e., A and B have the same "interface", but POSTing
>>> to A may affect a different part of the world to B).
>>
>> Well, we're talking about static GETable resources with a single
>> representation. But I agree that if you make a Link relation, you'd want
>> it to be applicable to as many HTTP resources as possible... Or is it
>> possible / reasonable to say "this relation doesn't make sense for dynamic
>> or POSTable resources and shouldn't be used for those"?



--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads


Re: Multi-server HTTP

by mnot


On 08/09/2009, at 11:19 AM, Anthony Bryan wrote:

> Here's what I have now. More inclusive is good but I think someone
> else would be better at writing it than me.
>
> http://tools.ietf.org/html/draft-bryan-metalinkhttp
>
> Link Relation Type Registration: "duplicate"
>
>   o Relation Name: duplicate
>   o Description: Refers to an identical resource that is a
> byte-for-byte equivalence of representations.

Does this imply that each resource has exactly the same set of  
representations, or that when two resources share representations,  
those representations are duplicates?


>   o Reference: This specification.
>   o Notes: This relation is for static resources.  That is, an HTTP
> GET request on any duplicate will return the same representation.  It
> does not make sense for dynamic or POSTable resources and should not
>   be used for them.
>
> And here's the introduction (Content-MD5 is now mentioned):
>
>   MetaLinkHeader is an alternative to Metalink, usually represented in
>   an XML-based document format [draft-bryan-metalink].  MetaLinkHeader
>   attempts to provide as much functionality as the Metalink XML format
>   by using existing standards such as Web Linking
>   [draft-nottingham-http-link-header], Instance Digests in HTTP
>   [RFC3230], and Content-MD5 [RFC1864].  MetaLinkHeader is used to list
>   information about a file to be downloaded.  This includes lists of
>   multiple URIs (mirrors), Peer-to-Peer information, checksums, and
>   digital signatures.
>
> Here's what it looks like:
>
>   Link: <http://www2.example.com/example.ext>; rel="duplicate";
>   Link: <ftp://ftp.example.com/example.ext>; rel="duplicate";
>   Link: <http://example.com/example.ext.torrent>; rel="describedby";
>   type="torrent";

Do torrents have media types yet?

>   Link: <http://example.com/example.ext.asc>; rel="describedby";
>   type="application/pgp-signature";
>   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=
>
> And more description:
>
>   Metalink servers are HTTP servers that MUST have lists of mirrors and
>   use the Link header [draft-nottingham-http-link-header] to indicate
>   them.  They also MUST provide checksums of files via Instance Digests
>   in HTTP [RFC3230].  Mirror and checksum information provided by the
>   originating Metalink server is considered authoritative.
>
>   Mirror servers are typically FTP or HTTP servers that "mirror"
>   another server.  That is, they provide identical copies of (at least
>   some) files that are also on the mirrored server.  Mirror servers MAY
>   be Metalink servers.  Mirror servers MUST support serving partial
>   content.  Mirror servers SHOULD support Instance Digests in HTTP
>   [RFC3230].
>
>   Metalink clients use the mirrors provided by a Metalink server with
>   Link header [draft-nottingham-http-link-header].  Metalink clients
>   MUST support HTTP and MAY support FTP, BitTorrent, or other download
>   methods.  Metalink clients MUST switch downloads from one mirror to
>   another if the one mirror becomes unreachable.  Metalink clients are
>   RECOMMENDED to support multi-source, or parallel, downloads, where
>   chunks of a file are downloaded from multiple mirrors simultaneously
>   (and optionally, Peer-to-Peer).  Metalink clients MUST support
>   Instance Digests in HTTP [RFC3230] by requesting and verifying
>   checksums.  Metalink clients MAY make use of digital signatures if
>   they are offered.
>
>
>
> On Tue, Sep 1, 2009 at 3:08 AM, Mark Nottingham<mnot@...> wrote:
>> Totally; we just need to be crisp about it.
>>
>> My inclination would be that if we can be more inclusive without making it
>> significantly more complex or risky, we should; otherwise, just do what's
>> needed.
>>
>> Cheers,
>>
>>
>> On 01/09/2009, at 1:49 PM, Nicolas Alvarez wrote:
>>
>>> Mark Nottingham wrote:
>>>>
>>>> That's a good start, but it deserves a bit of discussion.
>>>>
>>>> "byte-for-byte" implies that the bodies are the same, but what about
>>>> things like:
>>>>
>>>> * Entity headers (e.g., Content-Type)
>>>> * Available content-encodings
>>>> * Whether partial content is supported
>>>> * Whether the same set of methods are supported (e.g., if A is a
>>>> duplicate of B, will POSTing something to either have the same effect
>>>> as on the other?)
>>>>
>>>> I think the answer is that entity headers should generally be the
>>>> same, so the real question is whether we're talking about the relation
>>>> describing:
>>>>
>>>> a) resources with duplicate representations (i.e., a GET on any of the
>>>> dups will return the same reps)
>>>> b) duplicate resources (i.e., any method will have the same effect)
>>>>
>>>> If it's (b), we should consider whether the resources are in fact the
>>>> same "behind the curtains" (e.g., POSTing to A has the exact same
>>>> effect on the world as POSTing to B), or whether they may be in fact
>>>> separate systems (i.e., A and B have the same "interface", but POSTing
>>>> to A may affect a different part of the world to B).
>>>
>>> Well, we're talking about static GETable resources with a single
>>> representation. But I agree that if you make a Link relation, you'd want
>>> it to be applicable to as many HTTP resources as possible... Or is it
>>> possible / reasonable to say "this relation doesn't make sense for dynamic
>>> or POSTable resources and shouldn't be used for those"?
>
>
>
> --
> (( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
>  )) Easier, More Reliable, Self Healing Downloads


--
Mark Nottingham     http://www.mnot.net/



Re: Multi-server HTTP

by Anthony Bryan

On Mon, Sep 14, 2009 at 11:06 PM, Mark Nottingham <mnot@...> wrote:

>
> On 08/09/2009, at 11:19 AM, Anthony Bryan wrote:
>
>> Here's what I have now. More inclusive is good but I think someone
>> else would be better at writing it than me.
>>
>> http://tools.ietf.org/html/draft-bryan-metalinkhttp
>>
>> Link Relation Type Registration: "duplicate"
>>
>>  o Relation Name: duplicate
>>  o Description: Refers to an identical resource that is a
>> byte-for-byte equivalence of representations.
>
> Does this imply that each resource has exactly the same set of
> representations, or that when two resources share representations, those
> representations are duplicates?

The latter.

Any suggestions for replacement text? Because what I have isn't cutting it.

>> Here's what it looks like:
>>
>>  Link: <http://www2.example.com/example.ext>; rel="duplicate";
>>  Link: <ftp://ftp.example.com/example.ext>; rel="duplicate";
>>  Link: <http://example.com/example.ext.torrent>; rel="describedby";
>>  type="torrent";
>
> Do torrents have media types yet?

Not as far as I know.

Which is also why in draft-bryan-metalink we have this:

4.2.10.2. The "type" Attribute

   metalink:metaurl elements MUST have a "type" attribute that indicates
   the MIME type of the metadata available at the IRI.  In the case of
   BitTorrent as specified in [BITTORRENT], the value "torrent" is
   required.  Types without "/" are reserved.  Currently, "torrent" is
   the only reserved value.

--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads


Re: Multi-server HTTP

by mnot


On 15/09/2009, at 1:59 PM, Anthony Bryan wrote:

> On Mon, Sep 14, 2009 at 11:06 PM, Mark Nottingham <mnot@...> wrote:
>>
>> On 08/09/2009, at 11:19 AM, Anthony Bryan wrote:
>>
>>> Here's what I have now. More inclusive is good but I think someone
>>> else would be better at writing it than me.
>>>
>>> http://tools.ietf.org/html/draft-bryan-metalinkhttp
>>>
>>> Link Relation Type Registration: "duplicate"
>>>
>>>  o Relation Name: duplicate
>>>  o Description: Refers to an identical resource that is a
>>> byte-for-byte equivalence of representations.
>>
>> Does this imply that each resource has exactly the same set of
>> representations, or that when two resources share representations, those
>> representations are duplicates?
>
> The latter.
>
> Any suggestions for replacement text? Because what I have isn't cutting it.

Hm.

Refers to a resource whose available representations are byte-for-byte  
identical with the corresponding representations of the context IRI.


>>> Here's what it looks like:
>>>
>>>  Link: <http://www2.example.com/example.ext>; rel="duplicate";
>>>  Link: <ftp://ftp.example.com/example.ext>; rel="duplicate";
>>>  Link: <http://example.com/example.ext.torrent>; rel="describedby";
>>>  type="torrent";
>>
>> Do torrents have media types yet?
>
> Not as far as I know.
>
> Which is also why in draft-bryan-metalink we have this:
>
> 4.2.10.2. The "type" Attribute
>
>   metalink:metaurl elements MUST have a "type" attribute that indicates
>   the MIME type of the metadata available at the IRI.  In the case of
>   BitTorrent as specified in [BITTORRENT], the value "torrent" is
>   required.  Types without "/" are reserved.  Currently, "torrent" is
>   the only reserved value.


Overloading type like that is bad; register a media type (or get the  
appropriate people to do it).

Cheers,


--
Mark Nottingham     http://www.mnot.net/

