[PATCH v2] index-webjump: New module to define webjumps for index pages.

A webjump to access URLs referenced from an index page can be defined
using define_xpath_webjump. An xpath expression is used to extract the indexed URLs and the anchor text; this provides completion for the webjump. The completion must be enabled using webjump-get-index once for each index webjump.

This module also subsumes define_gitweb_summary_webjump, which results in changes to how gitweb webjumps are set up.
---

The following features have been added since version 1 of this patch: $description, use of html tidy, index_webjump_try_xpath. Also there's further description and examples below.

This patch only shows the new index-webjump.js module. The eventual commit will also remove the existing gitweb-webjumps.js module.

Conkeror wiki pages will also be updated as follows.

= Writing Webjumps =

== Index webjumps ==

Index webjumps provide convenient access to a set of web pages that are indexed (referenced) from another page. Two kinds are provided: xpath webjumps and gitweb summary webjumps.

Completions can be provided for the webjump by saving a copy of the index page to `index_webjumps_directory`, which can be set as follows.

{{{
require("index-webjump.js");
index_webjumps_directory = get_home_directory();
index_webjumps_directory.appendRelativePath(".conkerorrc/index-webjumps");
}}}

For each defined index webjump the index page can be saved using `M-x webjump-get-index`.

=== Gitweb summary webjumps ===

These webjumps help you visit repositories at a gitweb server:

{{{
define_gitweb_summary_webjump("gitweb-ko", "http://git.kernel.org");
define_gitweb_summary_webjump("gitweb-cz", "http://repo.or.cz/w");
}}}

You can now use the following webjumps:

{{{
gitweb-cz conkeror
gitweb-ko git/git
}}}

To make completions available use `M-x webjump-get-index` and select `gitweb-cz`; once the download is finished, completions will be available for that webjump. Sites with many repositories (such as the two given) can take many minutes to return the OPML data.

When defining the webjump, a default repository at the gitweb server can be specified using the `$default` keyword. An `$alternative` may otherwise be given as usual. If neither is given then the alternative url for the webjump is defined to be the gitweb repository list page.

=== XPath webjumps ===

An xpath webjump extracts the set of referenced web pages from an index page using an [[http://www.w3.org/TR/xpath|XPath]] expression.

For these webjumps to work, the index must be downloaded using `M-x webjump-get-index`.

Unfortunately, the xulrunner parser that is used is quite fussy and, in particular, is an xml parser. Many web pages fail to parse correctly. To correct this problem the downloaded index page is automatically cleaned up using `index_xpath_webjump_tidy_command`. The html [[http://tidy.sourceforge.net|tidy]] program should be installed for this to work.

It can be a bit tricky to figure out an appropriate XPath expression; `index_webjump_try_xpath` is provided to help with that process.

Examples:

{{{
define_xpath_webjump(
    "gitdoc",
    "http://www.kernel.org/pub/software/scm/git/docs/",
    '//xhtml:dt/xhtml:a',
    $description = "Git documentation");
}}}

The following examples require the html tidy program to be installed.

{{{
define_xpath_webjump(
    "conkerorwiki-page",
    "http://conkeror.org/",
    '//xhtml:li/xhtml:p/xhtml:a[starts-with(@href,"/")]',
    $description = "Conkeror wiki pages linked from the front page");
define_xpath_webjump(
    "imagemagick-options",
    "http://imagemagick.org/script/command-line-options.php",
    '//xhtml:p[@class="navigation-index"]/xhtml:a',
    $description = "Imagemagick command line options");
}}}

= BreakingChanges =

Gitweb summary webjumps are now implemented as index webjumps. The `webjump-get-index` command and `index_webjumps_directory` variable are used rather than the previous gitweb equivalents.

Existing gitweb opml files can be moved to the new locations using something like:

{{{
cd ~/.conkerorrc
mkdir index-webjumps
for f in gitweb-webjumps-opml/*.opml; do
    mv $f index-webjumps/$(basename $f .opml).index
done
rmdir gitweb-webjumps-opml
}}}

The `$completer` option is no longer available.

= User Variables =

index_webjumps_directory::
:: A directory for storing the index files corresponding to index webjumps; the index data can be downloaded from the index URL using `webjump-get-index`. If the index file is available for an index webjump then the webjump will provide completions for the indexed URLs.

index_xpath_webjump_tidy_command::
:: A command to run on the downloaded index. The xulrunner parser is quite fussy and specifically requires xhtml (or other xml). Running something like html tidy can avoid parser problems.
---
 modules/index-webjump.js |  312 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 312 insertions(+), 0 deletions(-)
 create mode 100644 modules/index-webjump.js

diff --git a/modules/index-webjump.js b/modules/index-webjump.js
new file mode 100644
index 0000000..16d1a49
--- /dev/null
+++ b/modules/index-webjump.js
@@ -0,0 +1,312 @@
+/**
+ * (C) Copyright 2009 David Kettler
+ *
+ * Use, modification, and distribution are subject to the terms specified in the
+ * COPYING file.
+ *
+ * Construct a webjump (with completer) to visit URLs referenced from
+ * an index page. An xpath expression is used to extract the indexed
+ * URLs. A specialized form is also provided for gitweb summary
+ * pages.
+**/
+
+require("webjump.js");
+
+/* Objects with completion data for index webjumps. */
+index_webjumps = {};
+
+define_variable("index_webjumps_directory", null,
+    "A directory for storing the index files corresponding to " +
+    "index webjumps; the index data can be downloaded from the " +
+    "index URL using webjump-get-index. " +
+    "If the index file is available for an index webjump then " +
+    "the webjump will provide completions for the indexed URLs.");
+
+define_variable("index_xpath_webjump_tidy_command",
+    "tidy -asxhtml -wrap 0 -modify -quiet --show-warnings no",
+    "A command to run on the downloaded index. The xulrunner " +
+    "parser is quite fussy and specifically requires xhtml (or " +
+    "other xml). Running something like html tidy can avoid " +
+    "parser problems.");
+
+function index_webjump(key, url, file) {
+    this.key = key;
+    this.url = url;
+    this.file = this.canonicalize_file(file);
+
+    if (this.require_completions && !this.file)
+        throw interactive_error("Index file not defined for " + this.key);
+}
+index_webjump.prototype = {
+    constructor : index_webjump,
+
+    mime_type : null,
+    xpath_expr : null,
+    make_completion : null,
+    require_completions : false,
+    completions : null,
+    file_time : 0,
+    tidy_command : null,
+
+    /* Extract full completion list from index file. */
+    extract_completions : function () {
+        /* Parse the index file. */
+        var stream = Cc["@mozilla.org/network/file-input-stream;1"]
+            .createInstance(Ci.nsIFileInputStream);
+        stream.init(this.file, MODE_RDONLY, 0644, false);
+        var parser = Cc["@mozilla.org/xmlextras/domparser;1"]
+            .createInstance(Ci.nsIDOMParser);
+        var doc = parser.parseFromStream(stream, null,
+                                         this.file.fileSize, this.mime_type);
+
+        /* Extract the completion items. */
+        var cmpl = [], node, res;
+        res = doc.evaluate(
+            this.xpath_expr, doc, xpath_lookup_namespace,
+            Ci.nsIDOMXPathResult.UNORDERED_NODE_ITERATOR_TYPE, null);
+        while ((node = res.iterateNext()))
+            cmpl.push(this.make_completion(node));
+
+        cmpl.sort(function(a, b) {
+            if (a[1] < b[1]) return -1;
+            if (a[1] > b[1]) return 1;
+            if (a[0] < b[0]) return -1;
+            if (a[0] > b[0]) return 1;
+            return 0;
+        });
+
+        this.completions = cmpl;
+    },
+
+    /* The guts of the completer. */
+    internal_completer : function (input, pos, conservative) {
+        if (pos == 0 && conservative)
+            yield co_return(undefined);
+
+        let require = this.require_completions;
+
+        /* Update full completion list if necessary. */
+        if (require && !this.file.exists())
+            throw interactive_error("Index file missing for " + this.key);
+        if (this.file.exists() &&
+            this.file.lastModifiedTime > this.file_time) {
+            this.file_time = this.file.lastModifiedTime;
+            this.extract_completions();
+        }
+        if (require && !this.completions)
+            throw interactive_error("No completions for " + this.key);
+        if (!this.completions)
+            yield co_return(null);
+
+        /* Match completions against input. */
+        let words = trim_whitespace(input.toLowerCase()).split(/\s+/);
+        let data = this.completions.filter(function (x) {
+            for (var i = 0; i < words.length; ++i)
+                if (x[0].toLowerCase().indexOf(words[i]) == -1 &&
+                    x[1].toLowerCase().indexOf(words[i]) == -1)
+                    return false;
+            return true;
+        });
+
+        let c = { count: data.length,
+                  get_string: function (i) data[i][0],
+                  get_description: function (i) data[i][1],
+                  get_input_state: function (i) [data[i][0]],
+                  get_match_required: function() require
+                };
+        yield co_return(c);
+    },
+
+    /* A completer suitable for supplying to define_webjump. */
+    make_completer : function() {
+        if (!this.file)
+            return null;
+        let jmp = this;
+        return function (input, pos, conservative) {
+            return jmp.internal_completer(input, pos, conservative);
+        };
+    },
+
+    /* Fetch and save the index for later use with completion.
+     * (buffer is used only to associate with the download) */
+    get_index : function (buffer) {
+        if (!this.file)
+            throw interactive_error("Index file not defined for " + this.key);
+
+        var cwd = null;
+        if (index_webjumps_directory instanceof Ci.nsILocalFile)
+            cwd = index_webjumps_directory.path;
+        else if (index_webjumps_directory)
+            cwd = index_webjumps_directory;
+
+        var info = save_uri(load_spec(this.url), this.file,
+                            $buffer = buffer, $use_cache = false,
+                            $temp_file = true);
+
+        // Note: it would be better to run this before the temp file
+        // is renamed; that requires support in save_uri.
+        if (this.tidy_command)
+            info.set_shell_command(this.tidy_command, cwd);
+    },
+
+    /* Try to make a suitable file object when the supplied file is a
+     * string or null. */
+    canonicalize_file : function (file) {
+        if (typeof file == 'string')
+            file = make_file(file);
+        if (!file && index_webjumps_directory) {
+            file = Cc["@mozilla.org/file/local;1"]
+                .createInstance(Ci.nsILocalFile);
+            if (index_webjumps_directory instanceof Ci.nsILocalFile)
+                file.initWithFile(index_webjumps_directory);
+            else
+                file.initWithPath(index_webjumps_directory);
+            file.appendRelativePath(this.key + ".index");
+        }
+        return file;
+    }
+}
+
+
+function index_webjump_xhtml(key, url, file, xpath_expr) {
+    index_webjump.call(this, key, url, file);
+    this.xpath_expr = xpath_expr;
+}
+index_webjump_xhtml.prototype = {
+    constructor : index_webjump_xhtml,
+
+    require_completions : true,
+    mime_type : "application/xhtml+xml",
+    tidy_command : index_xpath_webjump_tidy_command,
+
+    make_completion : function (node) {
+        return [makeURLAbsolute(this.url, node.href), node.text];
+    },
+
+    __proto__ : index_webjump.prototype
+}
+
+
+function index_webjump_gitweb(key, url, file) {
+    index_webjump.call(this, key, url, file);
+}
+index_webjump_gitweb.prototype = {
+    constructor : index_webjump_gitweb,
+
+    mime_type : "text/xml",
+    xpath_expr : '//outline[@type="rss"]',
+
+    make_completion : function (node) {
+        var name = node.getAttribute("text");
+        return [name.replace(/\.git$/, ""), ""];
+    },
+
+    __proto__ : index_webjump.prototype
+}
+
+
+interactive("webjump-get-index",
+    "Fetch and save the index URL corresponding to an index " +
+    "webjump. It will then be available to the completer.",
+    function (I) {
+        var completions = [];
+        for (let i in index_webjumps)
+            completions.push(i);
+        completions.sort();
+
+        var key = yield I.minibuffer.read(
+            $prompt = "Fetch index for index webjump:",
+            $history = "webjump",
+            $completer =
+                all_word_completer($completions = completions),
+            $match_required = true);
+
+        var jmp = index_webjumps[key];
+        if (jmp)
+            jmp.get_index(I.buffer);
+    });
+
+/**
+ * Construct a webjump to visit URLs referenced from an index page.
+ *
+ * The index page must be able to be parsed as xhtml. The anchor
+ * nodes indexed are those that match the given xpath_expr. Don't
+ * forget to use xhtml: prefixes on the xpath steps.
+ *
+ * If an alternative is not specified then it is set to the index page.
+ *
+ * A completer is provided that uses the index page. A local file for
+ * the index must be specified either with $index_file or via
+ * index_webjumps_directory. The index must be manually downloaded;
+ * eg. using webjump-get-index. Each time the completer is used it
+ * will check if the file has been updated and reload if necessary.
+ * This kind of webjump is not useful without the completions.
+ */
+define_keywords("$alternative", "$index_file", "$description");
+function define_xpath_webjump(key, index_url, xpath_expr) {
+    keywords(arguments);
+    let alternative = arguments.$alternative || index_url;
+
+    var jmp = new index_webjump_xhtml(key, index_url, arguments.$index_file,
+                                      xpath_expr);
+    index_webjumps[key] = jmp;
+
+    define_webjump(key, function (term) {return term;},
+                   $completer = jmp.make_completer(),
+                   $alternative = alternative,
+                   $description = arguments.$description);
+}
+
+/**
+ * Modify the xpath for an index webjump and show the resulting
+ * completions. Useful for figuring out an appropriate xpath. Either
+ * run using mozrepl or eval in the browser with the dump parameter
+ * set.
+ */
+function index_webjump_try_xpath(key, xpath_expr, dump) {
+    jmp = index_webjumps[key];
+    if (xpath_expr)
+        jmp.xpath_expr = xpath_expr;
+    jmp.extract_completions();
+    if (dump)
+        dumpln(dump_obj(jmp.completions,
+                        "Completions for index webjump " + key));
+    return jmp.completions;
+}
+
+
+/**
+ * Construct a webjump to visit repository summary pages at a gitweb
+ * server.
+ *
+ * If a repository name is supplied as $default then the alternative
+ * url is set to that repository at the gitweb site. If an
+ * alternative is not specified by either $default or $alternative
+ * then it is set to the repository list page of the gitweb site.
+ *
+ * A completer is provided that uses the list of repositories from the
+ * OPML data on the gitweb server. The completer is setup in the same
+ * way as for define_xpath_webjump, but the webjump will work without
+ * the completions.
+ */
+define_keywords("$default", "$alternative", "$opml_file", "$description");
+function define_gitweb_summary_webjump(key, base_url) {
+    keywords(arguments);
+    let alternative = arguments.$alternative;
+    let gitweb_url = base_url + "/gitweb.cgi";
+    let summary_url = gitweb_url + "?p=%s.git;a=summary";
+    let opml_url = gitweb_url + "?a=opml";
+
+    if (arguments.$default)
+        alternative = summary_url.replace("%s", arguments.$default);
+    if (!alternative)
+        alternative = gitweb_url;
+
+    var jmp = new index_webjump_gitweb(key, opml_url, arguments.$opml_file);
+    index_webjumps[key] = jmp;
+
+    define_webjump(key, summary_url,
+                   $completer = jmp.make_completer(),
+                   $alternative = alternative,
+                   $description = arguments.$description);
+}
--
1.6.3.1

_______________________________________________
Conkeror mailing list
Conkeror@...
https://www.mozdev.org/mailman/listinfo/conkeror
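
For anyone who wants to try the completer's word matching outside Conkeror, here is a rough stand-alone sketch in plain JavaScript. It is only an illustration of the "Match completions against input" step in internal_completer, not code from the patch: `matchCompletions` and the sample data are hypothetical names, and `String.prototype.trim` is assumed to be an adequate stand-in for Conkeror's trim_whitespace.

```javascript
// Hypothetical stand-alone sketch; matchCompletions is not part of the patch.
// Each completion is a [string, description] pair, as in the module.
function matchCompletions(completions, input) {
    // Split the trimmed, lower-cased input into words
    // (trim() is assumed to behave like Conkeror's trim_whitespace).
    var words = input.trim().toLowerCase().split(/\s+/);
    return completions.filter(function (x) {
        // Keep a pair only if every word occurs in its string
        // or in its description (case-insensitive substring match).
        return words.every(function (w) {
            return x[0].toLowerCase().indexOf(w) !== -1 ||
                   x[1].toLowerCase().indexOf(w) !== -1;
        });
    });
}

// Sample data made up for illustration.
var sample = [
    ["http://dummy/xpath/bar", "The bar"],
    ["http://dummy/foo", "The foo"]
];

// Both words must match, in any order and in either field.
var hits = matchCompletions(sample, "  dummy FOO ");
console.log(hits);
```

Note that an empty input splits into a single empty word, which every pair contains, so nothing is filtered out in that case.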
[PATCH] index-webjump: test suite

---
 tests/simple/gitweb-webjump-test.opml |   12 +++++++++
 tests/simple/index-webjump.js         |   45 +++++++++++++++++++++++++++++++++
 tests/simple/xpath-webjump-test.xhtml |   12 +++++++++
 3 files changed, 69 insertions(+), 0 deletions(-)
 create mode 100644 tests/simple/gitweb-webjump-test.opml
 create mode 100644 tests/simple/index-webjump.js
 create mode 100644 tests/simple/xpath-webjump-test.xhtml

diff --git a/tests/simple/gitweb-webjump-test.opml b/tests/simple/gitweb-webjump-test.opml
new file mode 100644
index 0000000..661eefc
--- /dev/null
+++ b/tests/simple/gitweb-webjump-test.opml
@@ -0,0 +1,12 @@
+<?xml version="1.0" encoding="utf-8"?>
+<opml version="1.0">
+<head>
+  <title>foobar repositories OPML Export</title>
+</head>
+<body>
+<outline text="git RSS feeds">
+<outline type="rss" text="foo.git" title="foo.git" xmlUrl="dummy://gitweb/gitweb.cgi?p=foo.git;a=rss" htmlUrl="dummy://gitweb/gitweb.cgi?p=foo.git;a=summary"/>
+<outline type="rss" text="bar.git" title="bar.git" xmlUrl="dummy://gitweb/gitweb.cgi?p=bar.git;a=rss" htmlUrl="dummy://gitweb/gitweb.cgi?p=bar.git;a=summary"/>
+</outline>
+</body>
+</opml>
diff --git a/tests/simple/index-webjump.js b/tests/simple/index-webjump.js
new file mode 100644
index 0000000..2b023c1
--- /dev/null
+++ b/tests/simple/index-webjump.js
@@ -0,0 +1,45 @@
+require('walnut.js');
+require('index-webjump.js');
+
+{ let suite = {
+    suite_setup: function () {
+        this.real_webjumps = webjumps;
+        this.real_index_webjumps = index_webjumps;
+        conkeror.webjumps = {};
+        conkeror.index_webjumps = {};
+    },
+    suite_teardown: function () {
+        conkeror.webjumps = this.real_webjumps;
+        conkeror.index_webjumps = this.real_index_webjumps;
+    },
+    path: conkeror_source_code_path + "/tests/simple",
+    test_xpath_webjump: function () {
+        define_xpath_webjump(
+            "xpath", "http://dummy/xpath", '//xhtml:a[@class="index"]',
+            $index_file = this.path + '/xpath-webjump-test.xhtml');
+        assert_equals(getWebJump("xpath foo"), "foo");
+        var w = index_webjumps.xpath;
+        w.extract_completions();
+        assert_equals(w.completions.length, 2);
+        assert_equals(w.completions[0][1], "The bar");
+        assert_equals(w.completions[0][0], "http://dummy/xpath/bar");
+        assert_equals(w.completions[1][1], "The foo");
+        assert_equals(w.completions[1][0], "http://dummy/foo");
+    },
+    test_gitweb_webjump: function() {
+        define_gitweb_summary_webjump(
+            "gitweb", "http://dummy/gitweb", $default = "bar",
+            $opml_file = this.path + '/gitweb-webjump-test.opml');
+        assert_equals(getWebJump("gitweb"),
+                      "http://dummy/gitweb/gitweb.cgi?p=bar.git;a=summary");
+        assert_equals(getWebJump("gitweb foo"),
+                      "http://dummy/gitweb/gitweb.cgi?p=foo.git;a=summary");
+        var w = index_webjumps.gitweb;
+        w.extract_completions();
+        assert_equals(w.completions.length, 2);
+        assert_equals(w.completions[0][0], "bar");
+        assert_equals(w.completions[1][0], "foo");
+    },
+    };
+    walnut_run(suite);
+}
diff --git a/tests/simple/xpath-webjump-test.xhtml b/tests/simple/xpath-webjump-test.xhtml
new file mode 100644
index 0000000..89e1d0b
--- /dev/null
+++ b/tests/simple/xpath-webjump-test.xhtml
@@ -0,0 +1,12 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<title>Dummy</title>
+</head>
+<body>
+<a class="index" href="foo">The foo</a>
+<a class="index" href="http://dummy/xpath/bar">The bar</a>
+<a href="baz">The baz</a>
+</body>
+</html>
--
1.6.3.3

_______________________________________________
Conkeror mailing list
Conkeror@...
https://www.mozdev.org/mailman/listinfo/conkeror
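
The xpath test expects "The bar" at index 0 even though the foo anchor comes first in xpath-webjump-test.xhtml. That appears to follow from the comparator passed to cmpl.sort in extract_completions, which orders by description first and by string second. Below is a small plain-JavaScript sketch of that ordering; `compareCompletions` and `extracted` are hypothetical names used only for this illustration, and the pairs mirror the values the test asserts.

```javascript
// Hypothetical stand-alone sketch; compareCompletions is an illustrative
// name for the anonymous comparator used in extract_completions.
// Completions are [string, description] pairs.
function compareCompletions(a, b) {
    if (a[1] < b[1]) return -1;   // description is the primary key
    if (a[1] > b[1]) return 1;
    if (a[0] < b[0]) return -1;   // the string breaks ties
    if (a[0] > b[0]) return 1;
    return 0;
}

// Pairs in the order the anchors appear in the test xhtml file.
var extracted = [
    ["http://dummy/foo", "The foo"],
    ["http://dummy/xpath/bar", "The bar"]
];

// Sort a copy so the extraction order stays visible for comparison.
var sorted = extracted.slice().sort(compareCompletions);
console.log(sorted);
```

In the gitweb case every description is the empty string, so the same comparator falls through to the repository name, which is why the test sees "bar" before "foo".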