memento: time warp for mediawiki
Hi all

The Memento Project <http://www.mementoweb.org/> (including the Los Alamos National Laboratory (!) featuring Herbert Van de Sompel of OpenURL fame) is proposing a new HTTP header, X-Accept-Datetime, to fetch old versions of a web resource. They already wrote a MediaWiki extension for this <http://www.mediawiki.org/wiki/Extension:Memento> - which would of course be particularly interesting for use on Wikipedia.

Do you think we could have this for Wikimedia projects? I think that would be very nice indeed. I recall that ways to look at last week's main page have been discussed before, and I see several issues:

* the timestamp isn't a unique identifier; multiple revisions *might* have the same timestamp. We need a tiebreak (rev_id would be the obvious choice).
* templates and images also need to be "time warped". It seems like the extension does not address this at the moment. For flagged revisions we do have such a mechanism, right? Could that be used here?
* Squids would need to know about the new header, and bypass the cache when it's used.

So, what do you think? What does it take? Can we point them to the missing bits?

-- daniel
_______________________________________________
Wikitech-l mailing list
Wikitech-l@...
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
|
|
Re: memento: time warp for mediawiki
Instead of writing it as an extra header to the HTTP protocol ... why don't they write it as a proxy to Wikimedia (or any other site they want to temporally proxy)? Getting a new HTTP header out there is not an easy task: at best a small percentage of sites will support it, and then you need to deploy clients and write user interfaces that support it as well.

If viewing old versions of sites is something interesting to them, it is probably best to write an interface, such as a Firefox extension or Greasemonkey script, that builds a "temporal" interface of their liking on top of the MediaWiki API (presumably the "history button" fails to represent their vision?). Non-MediaWiki sites could be served through the Wayback Machine.

If the purpose is to support searching or archival, then it's probably best to proxy the MediaWiki API through a proxy that they set up and that supports those temporal requests across all sites (i.e. an enhanced interface to the Wayback Machine?).

--michael
|
|
Re: memento: time warp for mediawiki
Daniel Kinzler wrote:
> * the timestamp isn't a unique identifier, multiple revisions *might* have the
> same timestamp. We need a tiebreak (rev_id would be the obvious choice).

I'd say it is, if sufficiently precise :) If not, either use the lowest/highest rev_id, or the user could be asked to choose a version.

> * templates and images also need to be "time warped". It seems like the
> extension does not address this at the moment. For flagged revisions we do have
> such a mechanism, right? Could that be used here?

I see three independent things here:

1) When viewing a past version of a page, show appropriate templates, images, magic words etc.
2) When viewing a past version of a page, link to other pages as appropriate (show red links if they didn't exist yet, link to their appropriate past version if they did). I'd say this is the easiest to implement, and the most interesting for readers.
3) Ability to view a page as it looked at a certain time (as opposed to a certain revision).
|
|
Re: memento: time warp for mediawiki
On Thu, Nov 12, 2009 at 10:43 AM, Nikola Smolenski <smolensk@...> wrote:
> I'd say it is, if sufficiently precise :)

MediaWiki only keeps timestamps to one-second precision, so it's not.
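The tiebreak discussed above can be sketched in a few lines. This is only an illustration, not the Memento extension's code: the `Revision` tuple and the `revision_at` function are hypothetical names, and the sketch assumes a page's history is already available as (rev_id, timestamp) pairs. It picks the newest revision at or before the requested datetime, and among revisions sharing the same one-second timestamp it prefers the highest rev_id.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional, Sequence


class Revision(NamedTuple):
    """Hypothetical stand-in for a row of a page's revision history."""
    rev_id: int
    timestamp: datetime  # stored with one-second precision


def revision_at(revisions: Sequence[Revision], when: datetime) -> Optional[Revision]:
    """Return the revision that was current at `when`, or None if none existed yet."""
    candidates = [r for r in revisions if r.timestamp <= when]
    if not candidates:
        return None
    # The latest timestamp wins; equal timestamps are broken by the highest rev_id.
    return max(candidates, key=lambda r: (r.timestamp, r.rev_id))


# Two edits saved within the same second are disambiguated by rev_id.
same_second = datetime(2009, 11, 12, 10, 43, 0, tzinfo=timezone.utc)
history = [Revision(101, same_second), Revision(102, same_second)]
print(revision_at(history, same_second).rev_id)
```

Choosing the highest rev_id is one of the two options Nikola mentions; returning all tied revisions and letting the user choose would be the other.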
|
|
Re: memento: time warp for mediawiki
Hi Michael and all,

The first thing we implemented was exactly this idea of a proxy using the Wikipedia API. The proxy is here:

http://mementoproxy.lanl.gov/wiki/timegate/(wikipedia URI)

For example:

http://mementoproxy.lanl.gov/wiki/timegate/http://en.wikipedia.org/wiki/Clock

We have also implemented proxies for the Internet Archive, Archive-It, WebCitation.org and several others, as proof-of-concept pieces for the research.

There are several reasons why a native implementation is better for all concerned:

1. The browser somehow needs to know where the proxy is, rather than being natively redirected to the correct page. For a few websites, and a few proxies, this is tolerable. However, even one proxy per CMS would be an impossible burden to maintain, let alone one proxy per website!
2. If the website redirected to the proxy, rather than the client knowing where to go, then the client would have to trust that the proxy behaved correctly. In a native implementation, you're never redirected off-site.
3. The proxy will redirect back to the appropriate history page; however, this page doesn't know that it's being treated as a Memento, and will not issue the X-Datetime-Validity or X-Archive-Interval headers. This makes it difficult (but not impossible) for the client to detect that it has been redirected correctly.
4. The off-site redirection adds at least 2 extra HTTP transactions per resource, slowing down the retrieval. In the native implementation the main page redirects to the history page directly. In the proxy, the browser goes to the main page, then either knows of or is redirected to the proxy, the proxy makes one or more API calls to fetch the history for the page to calculate the right revision, and then redirects the client back there.
5. We don't have to maintain the proxies :)

So for Wikimedia installations the native approach is better as it's trusted, faster, and involves fewer API calls. For the client it's better as it's faster and doesn't require intelligence or a list of proxies. For the proxy maintainers it's better as the proxies are no longer needed.

I hope that helps clarify things,

Rob Sanderson
(Also at Los Alamos with Herbert Van de Sompel)
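To make the proxy flow concrete, here is a minimal client-side sketch that only builds the request and does not contact the network. The timegate prefix is the one given in the message; sending the datetime in an X-Accept-Datetime header as an RFC 1123 date is an assumption based on the rest of the thread, and `timegate_request` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Prefix quoted in the message above; the original URI is appended verbatim.
TIMEGATE_PREFIX = "http://mementoproxy.lanl.gov/wiki/timegate/"


def timegate_request(page_uri: str, when: datetime) -> Request:
    """Build (but do not send) a request asking the proxy for `page_uri` as of `when`."""
    # Assumption: the header value is an HTTP date in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE_PREFIX + page_uri, headers={"X-Accept-Datetime": http_date})


request = timegate_request(
    "http://en.wikipedia.org/wiki/Clock",
    datetime(2009, 11, 12, 10, 43, tzinfo=timezone.utc),
)
print(request.full_url)
```

The point Rob makes in item 1 is visible here: the client has to know `TIMEGATE_PREFIX` in advance, which a native implementation would make unnecessary.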
|
|
Re: memento: time warp for mediawiki
Hello Herbert.

Herbert Van de Sompel wrote:
> 2. Let me describe the actual status and challenges faced in the
> Memento plug-in work:
>
> 2.1. The plug-in detects a client's X-Accept-Datetime header, and
> returns the mediawiki page that was active at the datetime specified
> in the header. Same for images, actually.

> 2.2. Display history pages with the template that was active at
> the time the history page acted as the current one. [Snip] So, we are
> looking at the mediawiki code to see whether a history page, when
> rendered, could itself retrieve the appropriate (old) template from
> the database. If we are successful, we will share that code also at
> http://www.mediawiki.org/wiki/Extension:Memento once available. It
> will obviously be up to the mediawiki community whether they are
> willing to adopt the proposed change to the codebase.

Obviously it's a server issue.

> 2.3. We have looked into another issue raised by Jakob: Display
> deleted pages as they existed at the datetime expressed in
> X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
> - as is the case with mediawiki in general, deleted pages are only
> accessible by those with appropriate permissions;
> - as is the case with mediawiki in general, deleted pages show up in
> Edit mode.
> This code will soon be included at http://www.mediawiki.org/wiki/Extension:Memento

Showing deleted pages in edit mode is not always the case, since they can be rendered (albeit not with the old templates, which would be an interesting enhancement from your work).

It is impressive how far you have gone. However, I don't think you can do a *complete* implementation.

First, you should be aware that timemachining the pages has been tried in the past. Discussions about FlaggedRevs are also relevant for your project. FlaggedRevs is an extension which allows marking the status of a page (e.g. not vandalised) at a point in time. A naive implementation would store the timestamp and get the old version from the archive. They ended up storing the page content with templates transcluded in a table specific to the extension. However, FlaggedRevs is a tool to fight vandalism. Yours is an archival one. You could accept imperfect results under certain circumstances.

Problematic aspects:

Page moves/image moves:
* You want to see the content of Foo at epoch, but the history now at Foo is wrong. Instead you need to look at the history of the page now at Foo_(disambiguation). You need to follow the move logs (perhaps even many times) to find the real page.

Page merges:
* When two pages have been merged, you will want to show the revision which was originally at the page the user wants to timemachine. You can no longer just rely on the timestamps. You may be able to get that by splitting the sources at the merge time and going back via rev_parent_id. Needless to say, this is very inefficient; this piece wouldn't be put live on Wikipedia.

Partial undeletions:
* When a page is undeleted, the summary shows how many revisions were undeleted, but not *which* ones. Case:
  * Page A has two edits (#1 and #2).
  * A vandal adds obscene content to it (#3).
  * An admin deletes the page and restores the first two revisions.
  * Several months later, the page is completely deleted.
  When an admin wants to view what the page looked like during those months, an application is unable to determine whether the two revisions which had been shown were #1 and #2 or perhaps #2 and #3. revdelete may have similar issues.

> 2.4. We do not feel that all pages should necessarily be subject to
> datetime content negotiation, in the same way that not all URIs are
> subject to content negotiation in other dimensions. We feel that the
> Special Pages fall under this category, as they do not have History.
>
> 2.5. We have ideas regarding how to address the issue raised by
> Daniel: the timestamp isn't a unique identifier, multiple revisions
> *might* have the same timestamp. From the perspective of Memento, a
> datetime is obviously the only "globally" recognizable value that can
> be used for negotiation. If cases occur where multiple versions of a
> page exist for the same second, the thing to do according to RFC 2295
> would be to return a "300 Multiple Choices", listing the URIs (and
> metadata) of those versions in an Alternates header. The client then
> has to take it from there.

> 2.6. The caching issue is a general problem arising from introducing
> Memento in a web that does not (yet) do Memento: when in datetime
> content negotiation mode all caches between client and server (both
> included) need to be bypassed. As described in our paper, we currently
> address this problem by adding the following client headers:
>
> Cache-Control: no-cache => to force cache revalidation, and
> If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce
> validation failure
>
> We very much understand this is not elegant but it tends to work ;-) .

The caching issue is IMHO the bigger problem in your approach using the new header. Disabling the cache on the request kind of works (although not in the long term), but you also need to disable caching at the server, so that someone accessing the current page through your same proxy (ignorant of X-Accept-Datetime) doesn't get the cached page you were served earlier.

RFC 2145 states very clearly that "A proxy MUST forward an unknown header", but in your case it'd have been preferable that the header wasn't forwarded if the proxy isn't Memento aware.

Which leads us to another issue, which is that it seems your server implementation doesn't "acknowledge" Memento, so given a response to an X-Accept-Datetime request, you don't know if what you're getting is the version you requested or the current one (because the server ignored the header). It can be as simple as requiring Last-Modified <= X-Accept-Datetime on X-Accept-Datetime responses (that would allow the server to explicitly tell since when the version is valid), but extended to all response codes.
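The two mechanisms in this message, the cache-bypassing client headers from 2.6 and the suggested "Last-Modified <= X-Accept-Datetime" acknowledgement check, can be sketched together. This is a sketch under stated assumptions rather than anything the extension implements: both function names are hypothetical, headers are modelled as plain dictionaries, and dates are assumed to be HTTP dates in GMT.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Mapping


def memento_request_headers(when: datetime) -> Dict[str, str]:
    """Client headers for a datetime-negotiated request, including the cache workaround."""
    return {
        "X-Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True),
        # The two workaround headers quoted in section 2.6:
        "Cache-Control": "no-cache",
        "If-Modified-Since": "Thu, 01 Jan 1970 00:00:00 GMT",
    }


def response_honours_request(request_headers: Mapping[str, str],
                             response_headers: Mapping[str, str]) -> bool:
    """Apply the suggested check: Last-Modified must not be later than the requested datetime."""
    requested = request_headers.get("X-Accept-Datetime")
    last_modified = response_headers.get("Last-Modified")
    if requested is None or last_modified is None:
        # Without both values the client cannot tell whether the header was ignored.
        return False
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(requested)
```

A client would call `response_honours_request` after each response; a result of False means the body may simply be the current version served by a server, or a cache, that ignored the header.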
|
|
Re: memento: time warp for mediawiki
On Nov 12, 2009, at 3:19 PM, Platonides wrote:
> It is impressive how far you have gone. However, I don't think you can
> do a *complete* implementation.
>
> First, you should be aware that timemachining the pages has been tried
> in the past. Discussions about FlaggedRevs are also relevant for
> your project.
> FlaggedRevs is an extension which allows marking the status of a page
> (e.g. not vandalised) at a point in time. A naive implementation would
> store the timestamp and get the old version from the archive. They
> ended up storing the page content with templates transcluded in a
> table specific to the extension.
> However, FlaggedRevs is a tool to fight vandalism. Yours is an
> archival one. You could accept imperfect results under certain
> circumstances.

Indeed, it suffices to look at the Internet Archive and comparable web archives to see that one needs to live with what is reasonably achievable, not with what one would love to have. Imperfection is allowed when looking at this problem from an archival perspective.

Related to this, one must be careful not to cross the border between:

(a) what can purely be achieved using the primitives of the web architecture (URI, resource, representation), and HTTP, with datetime content negotiation added to the mix;
(b) what is in the realm of content, interpretation, etc.

Let me explain what I mean: Wikipedia used to have a page for "Alito". The page got discontinued and in its place came a page "Samuel Alito". Both have their separate URIs, and so for each individually datetime content negotiation will work nicely. That is what I mean with (a) above. However, connecting "Alito" and "Samuel Alito" moves us into the realm of (b).

Things could be done in this specific type of case, as redirects are in place between the Alito and Samuel Alito URIs (unfortunately not the 301 or 302 one would expect but rather a 200), meaning such redirection info is in the database. Hence it could be acted upon. And so we could explore this, although I feel this gets us into the (b) zone. Again, generally speaking we must remain aware of the line between (a) and (b) above.

Cheers

herbert

==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
|
|
Re: memento: time warp for mediawiki
On Nov 12, 2009, at 3:19 PM, Platonides wrote:
>> 2.6. The caching issue is a general problem arising from introducing
>> Memento in a web that does not (yet) do Memento: when in datetime
>> content negotiation mode all caches between client and server (both
>> included) need to be bypassed. As described in our paper, we
>> currently address this problem by adding the following client headers:
>>
>> Cache-Control: no-cache => to force cache revalidation, and
>> If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce
>> validation failure
>>
>> We very much understand this is not elegant but it tends to
>> work ;-) .
>
> The caching issue is IMHO the bigger problem in your approach using
> the new header.
> Disabling the cache on the request kind of works (although not in the
> long term), but you also need to disable caching at the server, so that
> someone accessing the current page through your same proxy (ignorant
> of X-Accept-Datetime) doesn't get the cached page you were served
> earlier.

Agreed, of course, that our current cache fix is a temp solution.

Not sure what you mean by the above remark, but it is totally fine to cache the current page in mediawiki because the history pages are not served from the URI of the current page, neither by our plug-in nor in Memento in general (see http://www.mementoweb.org/guide/http/local/). Rather, an X-Accept-Datetime request is redirected (302 Found) to an appropriate history resource that has its own URI (with title and oldid in case of mediawiki). Hence, even those history pages can be cached by mediawiki equipped with the Memento plug-in.

> RFC 2145 states very clearly that "A proxy MUST forward an unknown
> header", but in your case it'd have been preferable that the header
> wasn't forwarded if the proxy isn't Memento aware.
>
> Which leads us to another issue, which is that it seems your server
> implementation doesn't "acknowledge" Memento, so given a response to an
> X-Accept-Datetime request, you don't know if what you're getting is the
> version you requested or the current one (because the server ignored it).
> It can be as simple as requiring Last-Modified <= X-Accept-Datetime on
> X-Accept-Datetime responses (that would allow the server to explicitly
> tell since when the version is valid), but extended to all response codes.

Actually, have a look at http://www.mementoweb.org/guide/http/local/ . You will note that the following response header is always included:

X-Archive-Interval: {datetime_start} - {datetime_end}

This allows a client to understand that it received a history resource. The values to use are the start datetime and end datetime for which the server has representations for the URI at hand.

Our plug-in implements this for mediawiki. Our proxy can't do this.

Cheers

herbert
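A client could use the X-Archive-Interval header as the acknowledgement Platonides asks for. The sketch below shows one way to read it. The exact wire format is not given in this thread, so the parser assumes two HTTP dates separated by " - ", optionally wrapped in braces; `parse_archive_interval` and `interval_covers` are hypothetical names.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Tuple


def parse_archive_interval(value: str) -> Tuple[datetime, datetime]:
    """Split an X-Archive-Interval value into (start, end) datetimes.

    Assumed format: "<HTTP date> - <HTTP date>", braces tolerated.
    """
    start_text, separator, end_text = value.partition(" - ")
    if not separator:
        raise ValueError(f"no ' - ' separator in {value!r}")
    start = parsedate_to_datetime(start_text.strip(" {}"))
    end = parsedate_to_datetime(end_text.strip(" {}"))
    if start > end:
        raise ValueError("interval start is after its end")
    return start, end


def interval_covers(value: str, when: datetime) -> bool:
    """True if the server reports having representations around `when`."""
    start, end = parse_archive_interval(value)
    return start <= when <= end
```

The presence of the header tells the client that it received a history resource; `interval_covers` additionally lets it check that the requested datetime lies inside the range the server claims to hold.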
|
|
Re: memento: time warp for mediawiki
We have made some updates to the Memento extension and we have also written a fix to perform datetime content negotiation on transcluded templates. Details can be found on the wiki page for the extension: http://www.mediawiki.org/wiki/Extension:Memento

Harihar
(Los Alamos National Labs)
|
|
Re: memento: time warp for mediawiki
Daniel Kinzler wrote:
> Do you think we could have this for Wikimedia project? I think that would be
> very nice indeed. I recall that ways to look at last week's main page have been
> discussed before, and I see several issues:

You can't view the main page as it was in the past, because users routinely upload temporary images to display there, so that they can be protected, and then delete them once they're off the page.

Also, we can't have people crawling Wikipedia while requesting old versions, because of the excessive disk seeking and CPU usage that would generate. That's why the history page has a robot policy of noindex, nofollow.

-- Tim Starling
|
|
Re: memento: time warp for mediawiki
Hi Tim,

If there's a problem with viewing past versions of the main page, that's perfectly okay -- it can be excluded from the resources that are datetime content negotiable, like the Special: pages.

I admit to not following the second issue completely. A regular robot would never issue the X-Accept-Datetime header to jump back in time, so that's okay. A regular robot would also respect the history page policy and not crawl backwards either, as you say. A robot that did issue X-Accept-Datetime would end up crawling old revision pages and never hit a history list, but this could also be forbidden via robots.txt if the revision pages were excluded too. However, it seems like it will be a long time before people write past-web crawlers, and a use case for doing it at all is pretty hard to come up with. :)

Hope this addresses your concerns!

Rob
|
|
Re: memento: time warp for mediawiki
On Thu, Nov 12, 2009 at 3:55 PM, Herbert Van de Sompel <hvdsomp@...> wrote:
> 2.1. The plug-in detects a client's X-Accept-Datetime header, and
> returns the mediawiki page that was active at the datetime specified
> in the header. Same for images, actually. This effectively allows
> navigating (as in clicking links) a mediawiki collection as it existed
> in the past: as long as a client issues an X-Accept-Datetime header,
> matching history pages/images will be retrieved.

Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.

It seems to me like a better path would be to have different URLs for different dates. The obvious way to do this would be to take an approach like OpenSearch, and provide a URL pattern in some standard format. Maybe the page could contain <link rel=oldversions> or such, with the client appending a query parameter to the given URL, say time=T where T is an ISO 8601 string.
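The query-parameter alternative proposed here can be sketched as follows. The parameter name `time` comes from the message; everything else is an assumption for illustration: `url_at_time` is a hypothetical helper, the base URL stands in for whatever a `<link rel=oldversions>` element would advertise, and the timestamp is rendered as an ISO 8601 string in UTC.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def url_at_time(base_url: str, when: datetime) -> str:
    """Append time=<ISO 8601 UTC timestamp> to `base_url`, keeping existing parameters."""
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("time", when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")))
    # urlencode percent-encodes the colons in the timestamp.
    return urlunsplit(parts._replace(query=urlencode(query)))


print(url_at_time(
    "http://en.wikipedia.org/w/index.php?title=Clock",
    datetime(2009, 11, 12, 10, 43, tzinfo=timezone.utc),
))
```

Because the datetime is part of the URL, each past state gets its own bookmarkable and cacheable address, which is the property the message argues a header alone does not provide.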
|
|
Re: memento: time warp for mediawiki
Aryeh Gregor wrote:
> Doesn't the use of a header here violate the idea of each URL
> representing only one resource? The server will be returning totally
> different things for a GET to the same URL. That seems like it would
> cause all sorts of problems -- not only do caching proxies break
> (which I'd think by itself makes the feature unusable for users behind
> caching proxies), but how do you deal with things like bookmarking, or
> sending a link to a particular version of the page to someone? These
> would become impossible, unless the server goes to the extra effort to
> return a redirect.
>
> It seems to me like a better path would be to have different URLs for
> different dates. The obvious way to do this would be to take an
> approach like OpenSearch, and provide a URL pattern in some standard
> format. Maybe the page could contain <link rel=oldversions> or such,
> with the client appending a query parameter to the given URL, say
> time=T where T is an ISO 8601 string.

How about doing both? If an X-Accept-Datetime header is received, it could trigger a 302 redirect, pointing at a URL that specifies the desired point in time.

-- daniel
|
|
Re: memento: time warp for mediawiki
On Nov 13, 2009, at 2:08, Daniel Kinzler <daniel@...> wrote:
> Aryeh Gregor wrote:
>> Doesn't the use of a header here violate the idea of each URL
>> representing only one resource? The server will be returning totally
>> different things for a GET to the same URL. That seems like it would
>> cause all sorts of problems -- not only do caching proxies break
>> (which I'd think by itself makes the feature unusable for users behind
>> caching proxies), but how do you deal with things like bookmarking, or
>> sending a link to a particular version of the page to someone? These
>> would become impossible, unless the server goes to the extra effort to
>> return a redirect.
>>
>> It seems to me like a better path would be to have different URLs for
>> different dates. The obvious way to do this would be to take an
>> approach like OpenSearch, and provide a URL pattern in some standard
>> format. Maybe the page could contain <link rel=oldversions> or such,
>> with the client appending a query parameter to the given URL, say
>> time=T where T is an ISO 8601 string.
>
> How about doing both? If an X-Accept-Datetime header is received, it could
> trigger a 302 redirect, pointing at a url that specifies the desired
> point in time.

This is exactly what we do in Memento and with the plug-in: datetime content negotiation (X-Accept-Datetime header) on the generic URI (say /clock in wikipedia) followed by a 302 redirect to the time-specific URI (title="clock"&oldid=123456 in wikipedia). The generic URI always serves only the current version of the page; the history URIs serve the history pages.

Herbert
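The negotiate-then-redirect flow described above can be sketched roughly as follows. This is only an illustration, not the Memento plug-in's code: the `negotiate` function, the `REVISIONS` table, and the 406 fallback for dates before the first revision are assumptions made for the example. The `Vary` value and the `title=...&oldid=...` URI shape are taken from the thread.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical revision history for one page: (rev_id, timestamp), oldest first.
REVISIONS = [
    (100, datetime(2008, 6, 1, 12, 0, 0, tzinfo=timezone.utc)),
    (200, datetime(2009, 1, 15, 8, 30, 0, tzinfo=timezone.utc)),
    (300, datetime(2009, 11, 1, 17, 45, 0, tzinfo=timezone.utc)),
]

VARY = {"Vary": "negotiate, X-Accept-Datetime"}


def negotiate(title, request_headers, revisions=REVISIONS):
    """Return (status, response_headers) for a GET on the generic URI."""
    raw = request_headers.get("X-Accept-Datetime")
    if raw is None:
        # No datetime negotiation requested: serve the current version.
        return 200, dict(VARY)

    # The thread's example wraps the date in braces, so strip them if present.
    wanted = parsedate_to_datetime(raw.strip("{} "))

    # Revisions that already existed at the requested datetime.
    candidates = [rev_id for rev_id, ts in revisions if ts <= wanted]
    if not candidates:
        # Assumption: nothing existed yet, so nothing acceptable can be served.
        return 406, dict(VARY)

    # Redirect to the time-specific URI of the newest matching revision.
    location = f"/w/index.php?title={title}&oldid={candidates[-1]}"
    return 302, {**VARY, "Location": location}
```

Because the client ends up on the `oldid` URI, bookmarking and link sharing operate on a URL that identifies exactly one revision.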
|
|
Re: memento: time warp for mediawikiI'd like to expound on Herbert's point below. We chose 302/Location style CN (instead of 200/Content-Location) to provide more transparency in the process. So I can link to: http://en.wikipedia.org/wiki/The_Cribs but if I have my Memento FF add-on set to: X-Accept-Datetime: {Tue, 29 January 2009 11:41:00 GMT} I'll get redirected to: http://en.wikipedia.org/w/index.php?title=The_Cribs&oldid=187673999 which will show up in my browser's location bar and thus linking, sharing, etc. will be done with the correct "old" URI. This would not be the case with 200/Content-Location style CN. If the old version is not what the user wants to link, share, etc., then turning off the Memento add-on and doing a reload (possibly a shift-reload) will cause FF to correctly go back to the original URI (b/c FF does the right thing w/ the 302 semantics that say you should reuse the original URI). Wikipedia is sort of a special case in that the URI: http://en.wikipedia.org/wiki/The_Cribs will return both the current representation as well as an older representation (if CN is requested by the client). That is, that URI is both URI-R and URI-G in the parlance of: http://www.mementoweb.org/guide/http/local/ Most servers that are not hooked to a CMS (like a wiki) will have URI-G be a separate URI, presumably in a separate archive. See: http://www.mementoweb.org/guide/http/remote/ There is already support for caching & CN, see: http://httpd.apache.org/docs/2.3/content-negotiation.html#caching http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6 Of course, the current caches don't know about "X-Accept-Datetime", but that can come in the future (esp. when an RFC is written and the "X-" are removed from the various headers introduced by Memento). I'm not sure if they'll need to be aware of "Accept-Datetime" specifically, or (hopefully) they'll do the right thing with whatever values are returned in the "Vary" response header. We'll see. 
The goal of introducing a 5th dimension for CN (to complement type, encoding, language & charset) is that we are more likely to integrate with the existing http infrastructure. More so, we suspect, than introducing an RPC-like convention of arguments tacked onto URIs (e.g., "foo?datetime=xxx" or "foo?datetime=now") or overloading URI fragments. regards, Michael On Fri, 13 Nov 2009, Herbert Van de Sompel wrote: > On Nov 13, 2009, at 2:08, Daniel Kinzler <daniel@...> wrote: > >> Aryeh Gregor schrieb: >>> Doesn't the use of a header here violate the idea of each URL >>> representing only one resource? The server will be returning totally >>> different things for a GET to the same URL. That seems like it would >>> cause all sorts of problems -- not only do caching proxies break >>> (which I'd think by itself makes the feature unusable for users >>> behind >>> caching proxies), but how do you deal with things like bookmarking, >>> or >>> sending a link to a particular version of the page to someone? These >>> would become impossible, unless the server goes to the extra effort >>> to >>> return a redirect. >>> >>> It seems to me like a better path would be to have different URLs for >>> different dates. The obvious way to do this would be to take an >>> approach like OpenSearch, and provide a URL pattern in some standard >>> format. Maybe the page could contain <link rel=oldversions> or such, >>> with the client appending a query parameter to the given URL, say >>> time=T where T is an ISO 8601 string. >> >> How about doing both? If a X-Datetime-Accept header is received, it >> could >> trigger a 302 redirect, pointing at a url that specifies the desired >> point in time. > > This is exactly what we do in Memento and with the plug-in: datetime > content negotiation (X-Accept-Datetime header) on the generic URI > (say /clock in wikipedia) followed by a 302 redirect to the time- > specific URI (title="clock"&oldid=123456 in wikipedia). 
The generic > URI is always only serving the current version of the page; the > history URIs are serving the history pages. > > Herbert > > >> >> -- daniel >> >> _______________________________________________ >> Wikitech-l mailing list >> Wikitech-l@... >> https://lists.wikimedia.org/mailman/listinfo/wikitech-l > > _______________________________________________ > Wikitech-l mailing list > Wikitech-l@... > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > ---- Michael L. Nelson mln@... http://www.cs.odu.edu/~mln/ Dept of Computer Science, Old Dominion University, Norfolk VA 23529 +1 757 683 6393 +1 757 683 4900 (f) _______________________________________________ Wikitech-l mailing list Wikitech-l@... https://lists.wikimedia.org/mailman/listinfo/wikitech-l |
|
|
Re: memento: time warp for mediawiki
On Thursday 12 November 2009 16:52:54, Aryeh Gregor wrote:
> On Thu, Nov 12, 2009 at 10:43 AM, Nikola Smolenski <smolensk@...> wrote:
> > I'd say it is, if sufficiently precise :)
>
> MediaWiki only keeps timestamps to one-second precision, so it's not.

I propose the following heuristics:

1. If no revision in the database has the requested timestamp, use the newest revision older than the requested one.

2. If the timestamp exists and only one revision has it, use that revision.

3. If more than one revision shares the same timestamp, divide that second into as many equal parts as there are revisions, and use the revision whose part contains the requested timestamp.

Suppose that someone asks for Wikipedia as it looked on 2009-11-13 18:53:11.4281. There are four revisions that have the timestamp 2009-11-13 18:53:11: revisions 123456, 123457, 123459 and 123460. Each revision gets its quarter of the second, and since the request falls in the 2nd quarter, use revision 123457.
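The three heuristics above could be sketched like this. The function name `pick_revision` and the in-memory list of `(rev_id, timestamp)` pairs are made up for illustration; a real implementation would query the revision table. Ordering same-second revisions by rev_id is an assumption that follows the tiebreak suggested at the start of the thread.

```python
from datetime import datetime


def pick_revision(revisions, requested):
    """Pick a rev_id for `requested`, or None if no revision is old enough.

    revisions: iterable of (rev_id, timestamp) with whole-second timestamps.
    requested: datetime that may carry a sub-second part.
    """
    second = requested.replace(microsecond=0)
    same_second = sorted(rev_id for rev_id, ts in revisions if ts == second)

    # Rule 2: exactly one revision carries the requested timestamp.
    if len(same_second) == 1:
        return same_second[0]

    # Rule 3: several revisions share the second; split it into equal slices
    # and take the revision whose slice contains the requested fraction.
    if len(same_second) > 1:
        fraction = requested.microsecond / 1_000_000
        return same_second[int(fraction * len(same_second))]

    # Rule 1: no exact match; use the newest revision older than the request.
    older = [(ts, rev_id) for rev_id, ts in revisions if ts < second]
    return max(older)[1] if older else None
```

With the four revisions from the example and a request for 18:53:11.4281, the fraction 0.4281 falls in the second of four slices, so the function selects revision 123457.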
|
|
Re: memento: time warp for mediawiki
The scenario of multiple URIs for a single Datetime (second granularity, which I think is all that RFC-822/RFC-1123 format supports) might be a good candidate for the HTTP response "300 Multiple Choices":

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.1

The entity sent back with the 300 could be:

1. a TimeMap (read: ORE Resource Map), in Atom, RDF, or whatever (see the RDF example at: http://www.mementoweb.org/guide/api/map1.rdf)

2. a custom mediawiki html entity, like a history page with just the values for that Datetime, that allows the user to browse, compare, & select the version they desire.

3. a combination of #1 with an XSLT that transforms the XML into an HTML with the functionality of #2.

4. other ideas?

regards,

Michael

On Fri, 13 Nov 2009, Nikola Smolenski wrote:
> On Thursday 12 November 2009 16:52:54, Aryeh Gregor wrote:
>> On Thu, Nov 12, 2009 at 10:43 AM, Nikola Smolenski <smolensk@...> wrote:
>> > I'd say it is, if sufficiently precise :)
>>
>> MediaWiki only keeps timestamps to one-second precision, so it's not.
>
> I propose the following heuristics:
>
> 1. If appropriate timestamp doesn't exist in the database, use the newest one
> older than the requested one.
>
> 2. If it exists, and only one revision has the timestamp, use that revision.
>
> 3. If more than one revision share the same timestamp, divide the second in
> the number of revisions parts, and use the revision that falls in the
> requested timestamp.
>
> Suppose that someone asks for Wikipedia as it looked on 2009-11-13
> 18:53:11.4281. There are four revisions that have 2009-11-13 18:53:11
> timestamp, revisions 123456, 123457, 123459 and 123460. Each revision gets
> its quarter of the second, and since the request falls in the 2nd quarter,
> use revision 123457.

----
Michael L. Nelson mln@... http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)
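A rough sketch of the "300 Multiple Choices" idea, for comparison with the sub-second heuristic: when several revisions share the requested second, the server lists them all instead of guessing. The `respond` function and the plain-text body are placeholders invented for this example; the thread suggests a TimeMap or a small history page as the actual entity.

```python
from datetime import datetime


def respond(title, revisions, wanted):
    """Return (status, headers, body) for a datetime-negotiated request.

    revisions: iterable of (rev_id, timestamp) with whole-second timestamps.
    wanted: requested datetime, already truncated to whole seconds.
    """
    def uri(rev_id):
        return f"/w/index.php?title={title}&oldid={rev_id}"

    matches = sorted(rev_id for rev_id, ts in revisions if ts == wanted)

    if len(matches) > 1:
        # Ambiguous second: let the client choose among all candidates.
        # Placeholder body; a TimeMap or HTML history page would go here.
        body = "\n".join(uri(rev_id) for rev_id in matches)
        return 300, {"Content-Type": "text/plain"}, body

    if len(matches) == 1:
        return 302, {"Location": uri(matches[0])}, ""

    # No exact match: redirect to the newest revision older than the request.
    older = [(ts, rev_id) for rev_id, ts in revisions if ts < wanted]
    if older:
        return 302, {"Location": uri(max(older)[1])}, ""

    # Assumption: no revision existed yet at the requested datetime.
    return 404, {}, ""
```

The trade-off is that a 300 response needs a client or user to make the final choice, whereas the slicing heuristic always resolves to a single revision automatically.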
|
|
Re: memento: time warp for mediawikiOn 13/11/2009, at 2:25 AM, Aryeh Gregor wrote: > On Thu, Nov 12, 2009 at 3:55 PM, Herbert Van de Sompel > <hvdsomp@...> wrote: >> 2.1. The plug-in detects a client's X-Accept-Datetime header, and >> returns the mediawiki page that was active at the datetime specified >> in the header. Same for images, actually. This effectively allows >> navigating (as in clicking links) a mediawiki collection as it >> existed >> in the past: as long as a client issues an X-Accept-Datetime header, >> matching history pages/images will be retrieved. > > Doesn't the use of a header here violate the idea of each URL > representing only one resource? The server will be returning totally > different things for a GET to the same URL. That seems like it would > cause all sorts of problems -- not only do caching proxies break > (which I'd think by itself makes the feature unusable for users behind > caching proxies), but how do you deal with things like bookmarking, or > sending a link to a particular version of the page to someone? These > would become impossible, unless the server goes to the extra effort to > return a redirect. I assume the solution to this would be a Vary: X-Accept-Datetime header. -- Andrew Garrett agarrett@... http://werdn.us/ _______________________________________________ Wikitech-l mailing list Wikitech-l@... https://lists.wikimedia.org/mailman/listinfo/wikitech-l |
|
|
Re: memento: time warp for mediawikiOn Nov 13, 2009, at 2:55 PM, Andrew Garrett wrote:
> On 13/11/2009, at 2:25 AM, Aryeh Gregor wrote: > >> On Thu, Nov 12, 2009 at 3:55 PM, Herbert Van de Sompel >> <hvdsomp@...> wrote: >>> 2.1. The plug-in detects a client's X-Accept-Datetime header, and >>> returns the mediawiki page that was active at the datetime specified >>> in the header. Same for images, actually. This effectively allows >>> navigating (as in clicking links) a mediawiki collection as it >>> existed >>> in the past: as long as a client issues an X-Accept-Datetime header, >>> matching history pages/images will be retrieved. >> >> Doesn't the use of a header here violate the idea of each URL >> representing only one resource? The server will be returning totally >> different things for a GET to the same URL. That seems like it would >> cause all sorts of problems -- not only do caching proxies break >> (which I'd think by itself makes the feature unusable for users >> behind >> caching proxies), but how do you deal with things like bookmarking, >> or >> sending a link to a particular version of the page to someone? These >> would become impossible, unless the server goes to the extra effort >> to >> return a redirect. > > I assume the solution to this would be a Vary: X-Accept-Datetime > header. Please have a look at the HTTP Transactions for datetime content negotiation available at: http://www.mementoweb.org/guide/http/local/ This shows that we indeed include a response header: Vary: negotiate, X-Accept-Datetime Cheers Herbert Van de Sompel == Herbert Van de Sompel Digital Library Research & Prototyping Los Alamos National Laboratory, Research Library http://public.lanl.gov/herbertv/ tel. +1 505 667 1267 _______________________________________________ Wikitech-l mailing list Wikitech-l@... https://lists.wikimedia.org/mailman/listinfo/wikitech-l |
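To illustrate why the Vary header matters for caches, here is a simplified sketch of how a cache might derive its lookup key from the request headers named in Vary. The `cache_key` function is hypothetical and does not reflect how Squid or any particular proxy is implemented; it treats every token in Vary, including "negotiate", as a request header name.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers listed in Vary.

    Returns None when the response must not be served from cache ("Vary: *").
    """
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    if "*" in names:
        return None

    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}

    # A header missing from the request contributes an empty value.
    selected = tuple((name, lowered.get(name, "")) for name in sorted(names))
    return (url, selected)
```

Two requests for the same URL with different X-Accept-Datetime values then map to different cache entries, while a cache that ignored Vary would hand the same stored response to both.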