Inconsistencies in Discovery methods


by Eran Hammer-Lahav


In HTTP-based Resource Descriptor Discovery [1], I am trying to define a
uniform way to attach metadata (descriptors) to resources. The idea is to
define three methods for obtaining the location (URI) of the descriptor
document via the resource (URI or representation). All three methods use the
'describedby' relation type.

1. <LINK> elements in HTML, XHTML, and Atom documents.
2. Link: headers in HTTP responses.
3. /site-meta documents [2], using a Link-Template (transforming the
resource URI to the descriptor URI using a URI template).
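
As a rough illustration of the third method (not taken from either draft), the sketch below derives the /site-meta location from a resource URI and expands a template into a descriptor URI. The `{uri}` placeholder, the function names, and the template value are assumptions made for this example; the actual Link-Template syntax is whatever the drafts define.

```python
from urllib.parse import quote, urlsplit


def site_meta_location(resource_uri: str) -> str:
    """Return the /site-meta URI for the authority of resource_uri."""
    parts = urlsplit(resource_uri)
    return f"{parts.scheme}://{parts.netloc}/site-meta"


def expand_link_template(template: str, resource_uri: str) -> str:
    """Replace the hypothetical {uri} placeholder with the encoded resource URI."""
    return template.replace("{uri}", quote(resource_uri, safe=""))


resource = "http://example.com/resource/1"
# Hypothetical template, as it might be listed in a /site-meta document.
template = "http://example.com/describe?uri={uri}"

print(site_meta_location(resource))
print(expand_link_template(template, resource))
```

The transformation is purely textual: it never dereferences the resource URI, which is why it cannot observe status codes or redirects.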

A descriptor contains information about a resource, but it is hard to define
this association in practical terms (that can translate directly to code).
Instead, the proposal defines the descriptor as 'information about a
resource identified by a URI'.

In the current draft I tried to use the HTTP status codes (obtained with the
first two methods, <LINK> and Link:), by instructing the client to follow
redirects and only use links from a small subset of status codes (200, 303,
401). This approach proved broken for 2 reasons:

1. It is up to the application to decide how redirects should be followed.
If a URI (when dereferenced and requested using an HTTP GET) returns a 307,
any links associated with that response may contain valid metadata that is
not the same as the metadata describing the URI the user-agent is being
redirected to (which in this example returns a 200).

2. It makes information obtained from <LINK> and Link: inconsistent with
that obtained from /site-meta. /site-meta has no way of following redirects
(it is a static transformation template) and will always produce a URI
identifying the location of the descriptor associated with the 307 response,
not the follow-up 200.
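
The mismatch can be reproduced with a toy model. Everything here is hypothetical: the two URIs, the in-memory table standing in for a server, and the ".meta" suffix standing in for a /site-meta template. It is a sketch of the problem, not of either draft's algorithm.

```python
# Hypothetical server behaviour: status, redirect target, and the
# 'describedby' Link target attached to each response.
RESPONSES = {
    "http://example.com/a": {
        "status": 307,
        "location": "http://example.com/b",
        "describedby": "http://example.com/a.meta",
    },
    "http://example.com/b": {
        "status": 200,
        "location": None,
        "describedby": "http://example.com/b.meta",
    },
}


def descriptor_via_link_following_redirects(uri: str, max_hops: int = 5) -> str:
    """Follow redirects, then use the Link from the final response."""
    for _ in range(max_hops):
        response = RESPONSES[uri]
        if response["status"] in (301, 302, 307) and response["location"]:
            uri = response["location"]
            continue
        return response["describedby"]
    raise RuntimeError("too many redirects")


def descriptor_via_site_meta(uri: str) -> str:
    """Static template: it only ever sees the original URI."""
    return uri + ".meta"


start = "http://example.com/a"
print(descriptor_via_link_following_redirects(start))
print(descriptor_via_site_meta(start))
```

For the same starting URI the two methods name different descriptors, which is the inconsistency described in point 2.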

To address that, I started taking a different approach with my upcoming
revision (-02) that basically tries to ignore HTTP status codes. It moves
the focus away from the 'resource' to the URI. But Roy's recent comment made
this approach (ignoring HTTP status codes) incomplete as well.

On 2/6/09 11:03 AM, "Roy T. Fielding" <fielding@...> wrote:

> There are many resources involved in HTTP,
> only one of which is identified by the requested URI.  Each of those
> resources may have representations, and the meaning of the payload in a
> response message is defined by the status code.  A 404 response is going
> to contain a representation of a resource on the server that describes
> that error. A 200 response is going to contain a representation of the
> resource that was identified as the request target.

What this means is that a Link header in the HTTP response to a GET request
might not be about the resource identified by the URI used to make that
request.

For example, if:

GET /resource/1 HTTP/1.1
Host: example.com

returns:

HTTP/1.1 404 Not Found
Link: <http://example.com/about>; rel="describedby"

The Link is about the "resource on the server that describes that error",
and not about the resource identified by the URI
(http://example.com/resource/1).
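
For readers who want to experiment with this case, here is a minimal sketch of pulling 'describedby' targets out of a Link header value. The regular expressions are a simplification written for this example and do not cover the full header grammar; the function names are likewise assumptions.

```python
import re

# Simplified patterns: a <target> followed by zero or more ;name=value params.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)'
)
PARAM = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value: str) -> list:
    """Return a list of (target, params) pairs from a Link header value."""
    links = []
    for match in LINK_VALUE.finditer(value):
        params = {}
        for name, quoted, bare in PARAM.findall(match.group(2)):
            params[name.lower()] = quoted or bare
        links.append((match.group(1), params))
    return links


def describedby_targets(value: str) -> list:
    """Return the targets whose rel value includes 'describedby'."""
    return [
        target
        for target, params in parse_link_header(value)
        if "describedby" in params.get("rel", "").lower().split()
    ]


print(describedby_targets('<http://example.com/about>; rel="describedby"'))
```

Note that the parser returns the link regardless of the response status; deciding which resource the link is about is a separate step that the header value alone cannot answer.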

Because /site-meta does not provide access to the HTTP status code, if it
returned http://example.com/about as the descriptor location of
http://example.com/resource/1, it would be incorrect (due to lack of
information about the 404 condition involved). In such a case, it is really
the Link: header that is limited, because the representation of the resource
isn't available (and therefore there is no place to put its links).

---

I am trying to find a way to keep the three methods in sync without further
limiting the usefulness of this protocol. So far the only approach I have is
to limit Link elements and headers (for use in this protocol) to HTTP
responses with a status code that can only be interpreted as about the
request URI.

From a (very) quick review of the status codes, this means that only the
following codes do not bind the response representation to the request URI:

* 1xx
* 202 - about the request's status, is this the same as the resource?
* 205 - does not seem to represent anything.
* 303 - not sure.
* 4xx, except maybe 406 - not sure, seems to be about the resource.
* 5xx

This seems to suggest most 2xx, most 3xx, and maybe 406, as the only valid
status codes to be allowed when looking for a 'describedby' link.
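
The tentative classification above could be written down as follows. This is only a sketch of the quick review in this message, with the 'not sure' entries kept as a separate outcome rather than resolved; the enum and function names are made up for the example.

```python
from enum import Enum


class Binding(Enum):
    REQUEST_URI = "representation is about the request URI"
    OTHER = "representation is about something else"
    UNSURE = "marked as uncertain in the review"


def classify_status(status: int) -> Binding:
    """Classify a status code according to the quick review above."""
    if status in (202, 303, 406):
        # Open questions in the review; left undecided here on purpose.
        return Binding.UNSURE
    if 100 <= status < 200 or status == 205 or 400 <= status < 600:
        return Binding.OTHER
    if 200 <= status < 400:
        return Binding.REQUEST_URI
    raise ValueError(f"not an HTTP status code: {status}")


for code in (200, 205, 303, 307, 404, 406):
    print(code, classify_status(code).name)
```

A client following this sketch would only accept 'describedby' links from responses classified as REQUEST_URI, and would need a policy decision for the UNSURE codes.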

If this approach is acceptable, should the spec explicitly define which
status codes are valid, or make do with a definition such as 'HTTP responses
whose status code indicates a representation of the resource identified by
the request URI'? The second option is generally preferred, but at this
point even the spec author (me) cannot fully determine how to implement it
(as indicated by the 'not sure' entries above).

Comments?

EHL

[1] http://tools.ietf.org/html/draft-hammer-discovery-01
[2] http://www.ietf.org/internet-drafts/draft-nottingham-site-meta-00.txt



Re: Inconsistencies in Discovery methods

by Roy T. Fielding





On Feb 6, 2009, at 4:03 PM, Eran Hammer-Lahav wrote:

> On 2/6/09 11:03 AM, "Roy T. Fielding" <fielding@...> wrote:
>
>> There are many resources involved in HTTP,
>> only one of which is identified by the requested URI.  Each of those
>> resources may have representations, and the meaning of the payload in a
>> response message is defined by the status code.  A 404 response is going
>> to contain a representation of a resource on the server that describes
>> that error. A 200 response is going to contain a representation of the
>> resource that was identified as the request target.
>
> What this means is that a Link header in the HTTP response to a GET request
> might not be about the resource identified by the URI used to make that
> request.

The Link header field defines what it is about: [RFC2068]

    The Link entity-header field provides a means for describing a
    relationship between two resources, generally between the requested
    resource and some other resource.

It says "requested resource" there for a reason.  It seems that has
been muddled a bit in Mark's draft, probably because you guys have had
too many discussions about what it could mean.

If you think it would be helpful to distinguish the Link header
field (resource metadata) from a Content-Link header field
(representation metadata), then that is a separate discussion.

....Roy



Re: Inconsistencies in Discovery methods

by mnot



On 07/02/2009, at 11:40 AM, Roy T. Fielding wrote:

> On Feb 6, 2009, at 4:03 PM, Eran Hammer-Lahav wrote:
>> On 2/6/09 11:03 AM, "Roy T. Fielding" <fielding@...> wrote:
>>
>>> There are many resources involved in HTTP,
>>> only one of which is identified by the requested URI.  Each of those
>>> resources may have representations, and the meaning of the payload in a
>>> response message is defined by the status code.  A 404 response is going
>>> to contain a representation of a resource on the server that describes
>>> that error. A 200 response is going to contain a representation of the
>>> resource that was identified as the request target.
>>
>> What this means is that a Link header in the HTTP response to a GET request
>> might not be about the resource identified by the URI used to make that
>> request.
>
> The Link header field defines what it is about: [RFC2068]
>
>   The Link entity-header field provides a means for describing a
>   relationship between two resources, generally between the requested
>   resource and some other resource.
>
> It says "requested resource" there for a reason.  It seems that has
> been muddled a bit in Mark's draft, probably because you guys have had
> too many discussions about what it could mean.

Yes; this should be better in -04 (which is waiting for the IPR  
contributions clarification).

>
>
> If you think it would be helpful to distinguish the Link header
> field (resource metadata) from a Content-Link header field
> (representation metadata), then that is a separate discussion.
>
> ....Roy
>


--
Mark Nottingham     http://www.mnot.net/



RE: Inconsistencies in Discovery methods

by Eran Hammer-Lahav


This solves my problem with regard to the Link header.

On Feb 06, 2009 4:41 PM, "Roy T. Fielding" <fielding@...> wrote:

> The Link header field defines what it is about: [RFC2068]
>
>     The Link entity-header field provides a means for describing a
>     relationship between two resources, generally between the requested
>     resource and some other resource.

Isn't this a bit of a contradiction? The same spec defines entity-header as:

    Entity-header fields define optional metainformation about the
    entity-body or, if no body is present, about the resource identified
    by the request.

(which is identical to the language in the most recent draft without the word 'optional').

A 404 response can have an entity-body, which you defined as "representation of a resource on the server that describes that error". So a Link header on a 404 with no body is consistent between the Link header definition and the entity-header definition. But if a body is present, they contradict each other.

> If you think it would be helpful to distinguish the Link header
> field (resource metadata) from a Content-Link header field
> (representation metadata), then that is a separate discussion.

My use case needs a resource metadata field, so a Content-Link header would not be needed.

This does not seem to help me with the case where a 404 response includes an HTML body with a <LINK> element, and a Link header. According to the explanation above, each has a very different context URI. The subject of the Link header is the requested resource, while the subject of the HTML <LINK> element is the "resource on the server that describes that error".

So in order to keep the three methods synced (Link: header, <LINK> element, /site-meta), we would still need to restrict the HTTP status codes... this time because of <LINK> elements.
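
To make the <LINK> side of this concrete, here is a minimal sketch that collects 'describedby' targets from an HTML body using Python's standard html.parser module. The class and function names and the sample error page are assumptions for illustration; the sketch says nothing about which resource the links describe, which is exactly the open question.

```python
from html.parser import HTMLParser


class DescribedByCollector(HTMLParser):
    """Collect href values of <link> elements whose rel includes 'describedby'."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <LINK REL=...> matches.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "describedby" in rels and attributes.get("href"):
            self.targets.append(attributes["href"])


def describedby_in_html(markup: str) -> list:
    collector = DescribedByCollector()
    collector.feed(markup)
    collector.close()
    return collector.targets


# Hypothetical body of a 404 response.
error_page = (
    '<html><head><LINK REL="describedby" HREF="http://example.com/about">'
    "</head><body>Not found</body></html>"
)
print(describedby_in_html(error_page))
```

In the 404 case discussed here, a link found this way would describe the error page rather than the requested resource, so a client would have to look at the status code before trusting it.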

EHL




Re: Inconsistencies in Discovery methods

by Roy T. Fielding





On Feb 6, 2009, at 10:48 PM, Eran Hammer-Lahav wrote:

> This solves my problem with regard to the Link header.
>
> On Feb 06, 2009 4:41 PM, "Roy T. Fielding" <fielding@...> wrote:
>
>> The Link header field defines what it is about: [RFC2068]
>>
>>     The Link entity-header field provides a means for describing a
>>     relationship between two resources, generally between the requested
>>     resource and some other resource.
>
> Isn't this a bit of a contradiction? The same spec defines entity-header as:
>
>     Entity-header fields define optional metainformation about the
>     entity-body or, if no body is present, about the resource identified
>     by the request.
>
> (which is identical to the language in the most recent draft without the
> word 'optional').
>
> A 404 response can have an entity-body, which you defined as
> "representation of a resource on the server that describes that error". So
> a Link header on a 404 with no body is consistent between the Link header
> definition and the entity-header definition. But if a body is present, they
> contradict each other.

Yes, we had several "category errors" in 2068, largely because we
chose the wrong names for the categories.  Some of them were fixed
in 2616, and they'll most likely be different in 2616bis.
Don't worry about that.

However, I think your attempt to make all types of links in the same
message be mirrors is unnecessary. In many cases, the relation name
will have implications beyond the resource being targeted, and in
other cases the links will simply be wrong if expressed as resource
metadata (e.g., a link rel="author" for which the relationship is
only true for one of the representations of this resource).

We could resolve that ambiguity by differentiating where the link
is indicated (link: vs content-link, <link> vs <a>) or by
differentiating the relation names (e.g., owner vs author).
Neither option has been used consistently in the past and I doubt
that it will ever be consistent in the future.

....Roy



RE: Inconsistencies in Discovery methods

by Eran Hammer-Lahav


On Feb 07, 2009 11:49 AM, "Roy T. Fielding" <fielding@...> wrote:

> On Feb 6, 2009, at 10:48 PM, Eran Hammer-Lahav wrote:
>
> > A 404 response can have an entity-body, which you defined as
> > "representation of a resource on the server that describes that
> > error". So a Link header on a 404 with no body is consistent
> > between the Link header definition and the entity-header
> > definition. But if a body is present, they contradict each other.
>
> Yes, we had several "category errors" in 2068, largely because we
> chose the wrong names for the categories.  Some of them were fixed
> in 2616, and they'll most likely be different in 2616bis.
> Don't worry about that.

How do you expect this contradiction to be resolved? Should the Link header simply override the meaning of 'entity-header' and be always considered 'about the request URI'?

> However, I think your attempt to make all types of links in the same
> message be mirrors is unnecessary. In many cases, the relation name
> will have implications beyond the resource being targeted, and in
> other cases the links will simply be wrong if expressed as resource
> metadata (e.g., a link rel="author" for which the relationship is
> only true for one of the representations of this resource).

I am not.

My focus is very narrow, and deals with a single relation type 'describedby' (unless this path takes me to a place that is not compatible with the ideas expressed by the POWDER spec, in which case I will mint a new relation type, like 'about').

I'm just trying to make all types of links in the same message identical for this one relation, that is, to define how different types of links should be used to express the same 'describedby' relation (context-type-target). From your reply it seems I can accomplish this rather easily by saying:

A 'describedby' link from a resource URI (X) to its descriptor URI can be expressed by:

* Link: headers in responses to requests where the request URI is X.
* <LINK> elements where the document is a valid representation of the resource identified by URI X.
* /site-meta templates for URI X's authority, where URI X is the transformation input.

In the example we're discussing (404), the presence of a <LINK> element in the body is simply irrelevant. This approach removes any need to explicitly discuss the HTTP status code associated with the response.
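
A minimal sketch of the rule proposed above, under stated assumptions: the `Response` type, its field names, and the `body_represents_request_uri` flag are invented for this example, and how that flag gets set is precisely the part that depends on the HTTP specification being clear.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Response:
    request_uri: str
    header_links: List[str] = field(default_factory=list)  # 'describedby' Link: targets
    body_links: List[str] = field(default_factory=list)    # 'describedby' <LINK> targets
    body_represents_request_uri: bool = False


def describedby_for(
    uri: str, response: Response, site_meta_target: Optional[str] = None
) -> List[str]:
    """Collect descriptor URIs for uri using the three rules above."""
    found = []
    if response.request_uri == uri:
        # Link: headers always count when the request URI is X.
        found.extend(response.header_links)
        # <LINK> elements count only if the body is a representation of X.
        if response.body_represents_request_uri:
            found.extend(response.body_links)
    # A /site-meta template result for X's authority, if one was computed.
    if site_meta_target is not None:
        found.append(site_meta_target)
    return found


not_found = Response(
    request_uri="http://example.com/resource/1",
    header_links=["http://example.com/resource/1;about"],  # hypothetical target
    body_links=["http://example.com/about"],
    body_represents_request_uri=False,  # a 404 body describes the error
)
print(describedby_for("http://example.com/resource/1", not_found))
```

With this shape, the status code never appears in the discovery logic itself; it only influences the single boolean that says whether the body represents the requested resource.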

But for this to work, 2616bis will need to be very clear about which entity-body (based on the response status code) is a representation of the request URI and which is of something else. Am I asking for too much?

EHL





Re: Inconsistencies in Discovery methods

by Jonathan Rees-3



On Feb 7, 2009, at 1:48 AM, Eran Hammer-Lahav wrote:

> This solves my problem with regard to the Link header.
>
> On Feb 06, 2009 4:41 PM, "Roy T. Fielding" <fielding@...> wrote:
>
>> The Link header field defines what it is about: [RFC2068]
>>
>>    The Link entity-header field provides a means for describing a
>>    relationship between two resources, generally between the requested
>>    resource and some other resource.
>
> Isn't this a bit of a contradiction? The same spec defines entity-header as:
>
>    Entity-header fields define optional metainformation about the
>    entity-body or, if no body is present, about the resource identified
>    by the request.

This makes me wonder if Link: in its reincarnation ought to be defined
to be a response-header instead of an entity-header:

    The response-header fields allow the server to pass additional
    information about the response which cannot be placed in the
    Status-Line. These header fields give information about the server and
    about further access to the resource identified by the Request-URI.
    [RFC 2616]

What would this break? I would guess that there are implications for
content negotiation and caching, but I am not sure whether the change would
be an improvement or damaging.

Jonathan