Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

Hi.

(Having read only what was forwarded to -qa.) May I suggest providing a few more details about this initiative on a page on wiki.debian.org, so that the context is clearer for everybody potentially interested? There is probably a lot of interest beyond UDD in such metadata standardization.

My 2 cents,

On Thursday 22 October 2009 at 09:49 +0200, Andreas Tille wrote:
> [debian-qa in CC because here we are discussing UDD issues.]
>
> On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > First of all, let's summarise the situation. We want to integrate some metadata
> > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.
>
> I would like to add that most probably there might evolve even other use
> cases for this kind of data. Keeping this in mind we might consider
> moving the topic to debian-devel in the next stage of development.
>
> > What I propose is to have a special file in the source packages for gathering
> > all possible useful information, debian/upstream-metadata.yaml.

--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)

--
To UNSUBSCRIBE, email to debian-qa-REQUEST@... with a subject of "unsubscribe". Trouble? Contact listmaster@...
|
|
Using RDF and ontologies for such metadata (combined DOAP and other ontologies) Was: Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

Hi.

(Responding with more feedback, as I have taken the time to dig through the debian-med archives.)

If I get it right, you intend to match bibliographic references with software projects / packages?

I would very much suggest adopting a Semantic Web perspective: provide such links as RDF descriptions that reuse ontologies already used by other applications, and hence contribute to LinkedData [0] (maybe through microformats embedded as RDFa in the current 'web sentinels', or as specific RDF feeds).

For an example of such an application, see http://www.connotea.org/rss/search?q=SAMtools, which mixes RSS 1.0 with other ontologies (and is more or less exactly the example you provided).

Here, you could then link DOAP [1] with existing bibliographic ontologies like PRISM.

This could of course also be provided from UDD, if UDD were to participate more in the Semantic Web, as I proposed in http://lists.debian.org/debian-qa/2009/02/msg00016.html and further discussions.

Just my 2 cents,

[0] : http://linkeddata.org/
[1] : http://trac.usefulinc.com/doap

Btw:

On Thursday 22 October 2009 at 09:49 +0200, Andreas Tille wrote:
> Alternatively we could do
>
> CREATE TABLE upstream-metadata (
>     package text,
>     key text,
>     value text,
>     PRIMARY KEY (package,key)
> );

This very much looks like RDF triples, which could store any metadata expressed in any RDF ontology, so that might be really useful ;)

Best regards,

--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)
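To make the remark about triples concrete, here is a minimal Python sketch of how rows of the proposed (package, key, value) table could be written out as N-Triples statements. It is only an illustration: the two base URIs, the function names and the sample values are assumptions made up for this example, not vocabularies or data from the thread, and a real export would reuse properties from DOAP or PRISM instead.

```python
# Hypothetical sketch: rows of a (package, key, value) table as N-Triples.
# Both base URIs are placeholders invented for this example.
PACKAGE_BASE = "http://example.org/package/"
KEY_BASE = "http://example.org/upstream-metadata#"


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def rows_to_ntriples(rows):
    """Turn each (package, key, value) row into one N-Triples statement."""
    statements = []
    for package, key, value in rows:
        subject = "<%s%s>" % (PACKAGE_BASE, package)
        predicate = "<%s%s>" % (KEY_BASE, key)
        obj = '"%s"' % escape_literal(value)
        statements.append("%s %s %s ." % (subject, predicate, obj))
    return statements


rows = [
    ("samtools", "Homepage", "http://example.org/samtools"),  # placeholder URL
    ("samtools", "DOI", "10.1000/example"),                   # placeholder DOI
]
for statement in rows_to_ntriples(rows):
    print(statement)
```

Each row maps to subject (the package), predicate (the key) and object (the value), which is why the key/value schema is the closer match to RDF of the two table layouts.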
|
|
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

> On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > First of all, let's summarise the situation. We want to integrate some metadata
> > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.

Dear Andreas and Olivier,

thank you for your encouraging comments.

I have made one more step forward: upstream-metadata.debian.net now stores its information in a Berkeley database, and when a record is accessed it is refreshed only if it is older than a given age.

For the moment, only 17 source packages have an upstream-metadata.yaml file in their debian directory that is accessible through a public VCS. Nevertheless, I think that this is enough for a proof of principle. After resetting the database, I ‘loaded’ the data by accessing it:

    for package in bioperl clustalx mummer seaview perlprimer samtools dicomscope clustalw r-cran-combinat r-cran-haplo.stats r-cran-qvalue r-cran-randomforest r-cran-rocr r-other-bio3d mira bwa infernal ; do
        wget "http://upstream-metadata.debian.net/$package/DOI" -O /dev/stdout 2> /dev/null
    done

After loading, the resulting table is available here:

http://upstream-metadata.debian.net/table/DOI

Obviously, not all packages contain programs that have been described in an academic article (http://dx.doi.org/)…

For the moment, one has to access an arbitrary key to trigger a refresh, but later the best would be to have a special key, for instance YAML-UPDATE, that would force the update. If it is possible to have a per-file commit hook, then each time an upstream-metadata.yaml file is modified, the debian.net site can be updated.

The next step is to feed the UDD. For the moment, the site produces one table per keyword. The rationale is that for many keywords, the data will be too sparse to be interesting for the UDD. My current idea is to generate the tables for a limited set of curated keywords, assemble them (with the unix join command?), and leave the result in a public place that the UDD can read.

In parallel, as Olivier suggested, each table could be exported in RDF format. But I am not sure I understand it. Olivier, could you suggest a Perl module to use?

As long as we are in a draft phase, I think that we can live with what is currently the biggest limitation: the lack of support for packages that are not stored in a VCS. One possible way to solve the problem is to provide a repository, for instance in collab-maint on Alioth, where people can drop one YAML file per source package. We could also unpack source files, as Andreas suggested.

For the UDD import, which of Andreas's two proposals would be the most suitable?

> CREATE TABLE upstream-metadata (
>     package text,
>     key1 text,
>     key2 text,
>     ...
>     keyN text,
>     PRIMARY KEY package
> );

> CREATE TABLE upstream-metadata (
>     package text,
>     key text,
>     value text,
>     PRIMARY KEY (package,key)
> );

Since the addition of more meta-data to our source packages is a frequent issue raised on debian-devel, I think that there is a general interest in standardising ‘field’ names, whichever technical solution is adopted. I will try to find a proper place on wiki.debian.org to let people document the fields they create and, if necessary, discuss them.

Have a nice day,

--
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
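The refresh-on-access behaviour described in this message can be sketched in a few lines of Python. This is not the code running on upstream-metadata.debian.net: a plain dict stands in for the Berkeley database, the `fetch` callable is hypothetical (in practice it would read the YAML file from the package's VCS), and the one-day threshold is an assumption, since the thread does not say which age is used.

```python
import time

MAX_AGE_SECONDS = 24 * 3600  # assumed threshold; the thread does not state one


class MetadataCache:
    """Refresh-on-access cache; a dict stands in for the Berkeley database."""

    def __init__(self, fetch, max_age=MAX_AGE_SECONDS, clock=time.time):
        self.fetch = fetch      # hypothetical callable: (package, key) -> value
        self.max_age = max_age
        self.clock = clock      # injectable so the behaviour can be tested
        self.store = {}         # (package, key) -> (timestamp, value)

    def get(self, package, key):
        now = self.clock()
        entry = self.store.get((package, key))
        if entry is None or now - entry[0] > self.max_age:
            # Missing or stale: fetch again and record when we did so.
            entry = (now, self.fetch(package, key))
            self.store[(package, key)] = entry
        return entry[1]


# Usage with a stand-in fetcher that just counts how often it is called.
calls = []


def fake_fetch(package, key):
    calls.append((package, key))
    return "value-%d" % len(calls)


fake_now = [0]
cache = MetadataCache(fake_fetch, max_age=10, clock=lambda: fake_now[0])
print(cache.get("samtools", "DOI"))  # first access fetches
fake_now[0] = 5
print(cache.get("samtools", "DOI"))  # still fresh, served from the store
fake_now[0] = 20
print(cache.get("samtools", "DOI"))  # stale, fetched again
```

With this design nothing is refreshed until somebody asks for it, which is why the database had to be ‘loaded’ by requesting a key for each package.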
|
|
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

On Thu, Oct 22, 2009 at 09:49:10AM +0200, Andreas Tille wrote:
>
> BTW, I'm a bit concerned about mixing different database formats: On one
> hand you are using yaml on the other hand BibTeX. Well, for sure having
> a BibTeX record is very valuable. But on the other hand the tools that
> are working with this data will need a BibTeX parser. I did not dive
> into this and for sure it is doable - but I just wanted to raise this
> topic here to hear opinions.

Hi Andreas and all,

since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to transfer this part of the discussion to debian-science@..., where I reopened an old thread.

http://lists.debian.org/msgid-search/20091026145532.GA6594@...

Have a nice day,

--
Charles
|
|
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

On Mon, Oct 26, 2009 at 11:05:10PM +0900, Charles Plessy wrote:
> For the moment, one has to access an arbitrary key, but later the best would be
> to have a special key, for instance YAML-UPDATE, that would force the update.

Or rather "upstream-metadata update". You certainly would not like to update the YAML standard. ;-)

> If it is possible to have a per-file commit hook, then each time a
> upstream-metadata.yaml is modified, the debian.net site can updated.

As I said: I'm afraid it is hard to ensure that *every* potential VCS has a properly configured commit hook. I'm no VCS expert, but it sounds hard to maintain.

> Next step is to feed the UDD. For the moment, the site produces one table per
> keyword. The rationale is that for many keywords, the data will be too sparse
> to be interesting for the UDD. My current idea is to generate the tables for a
> limited set of curated keywords, assemble them (with the unix join command?),
> and give leave this in a public place that the UDD can read.

As I said in my previous mail, it is perfectly OK if there is a reasonable way to fetch the original upstream-metadata.yaml files. Reading these is probably much easier than reading any aggregated format.

> For the UDD import, what would be the most suitable among the two propositions
> of Andreas?

Well, I have no idea - it was a question, and I gave the pros and cons of both variants in my mail.

> > CREATE TABLE upstream-metadata (
> >     package text,
> >     key1 text,
> >     key2 text,
> >     ...
> >     keyN text,
> >     PRIMARY KEY package
> > );
>
> > CREATE TABLE upstream-metadata (
> >     package text,
> >     key text,
> >     value text,
> >     PRIMARY KEY (package,key)
> > );
>
> Since the addition of more meta-data to our source packages is a frequent issue
> raised on debian-devel, I think that there is a general interst for
> standardising ‘field’ names, whichever the technical solution that will be
> adopted.

So if we have a really standardised set of keywords, the first method probably sounds appropriate for the problem.

> I will try to find a proper place on wiki.debian.org to let pepole document
> the fields they create, and if necessary discuss them.

Sounds good

Andreas.

--
http://fam-tille.de
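The two table layouts quoted above are not mutually exclusive: the key/value form can be stored as is, and the one-column-per-key form derived from it once a curated keyword set exists. The Python sketch below shows this with an in-memory SQLite database. It makes several assumptions: the table is named upstream_metadata because a hyphenated name would need quoting in SQL, the curated keys DOI and PMID are illustrative (the thread only mentions the DOI and the PubMed record ID without fixing key names), the package names and values are placeholders, and UDD itself uses PostgreSQL rather than SQLite.

```python
import sqlite3

CURATED_KEYS = ["DOI", "PMID"]  # assumed curated keywords, for illustration only

conn = sqlite3.connect(":memory:")
# Second proposal from the thread: one row per (package, key) pair.
conn.execute(
    "CREATE TABLE upstream_metadata ("
    " package TEXT, key TEXT, value TEXT,"
    " PRIMARY KEY (package, key))"
)
conn.executemany(
    "INSERT INTO upstream_metadata VALUES (?, ?, ?)",
    [
        ("pkg-a", "DOI", "10.1000/example-a"),  # placeholder values
        ("pkg-a", "PMID", "0000001"),
        ("pkg-b", "DOI", "10.1000/example-b"),
    ],
)


def wide_view(connection, keys):
    """Pivot the key/value rows into one row per package, one column per key."""
    columns = ", ".join(
        "MAX(CASE WHEN key = ? THEN value END)" for _ in keys
    )
    query = (
        "SELECT package, %s FROM upstream_metadata "
        "GROUP BY package ORDER BY package" % columns
    )
    return connection.execute(query, keys).fetchall()


for row in wide_view(conn, CURATED_KEYS):
    print(row)
```

Packages that lack a key simply get an empty column in the derived view, which is the sparseness concern raised earlier in the thread.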
|
|
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

On Tue, Oct 27, 2009 at 12:00:48AM +0900, Charles Plessy wrote:
> since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
> transfer this part of the discussion on debian-science@..., where I reopened
> an old thread.

Well, the question is not really about BibTeX or not. The question is whether it is a good idea to have a database format as a field value. If you have the field "Publication" and a complete BibTeX record as its value, I wonder whether this is useful in the end, or whether we should rather translate the record into an SQL table.

Kind regards

Andreas.

--
http://fam-tille.de
|
|
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

On Mon, Oct 26, 2009 at 04:05:17PM +0100, Andreas Tille wrote:
> On Tue, Oct 27, 2009 at 12:00:48AM +0900, Charles Plessy wrote:
> > since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
> > transfer this part of the discussion on debian-science@..., where I reopened
> > an old thread.
>
> Well, the question is not really about BibTeX or not. The question is
> whether it is a good idea to have a database format as a field value.
> If you have the field "Publication" and a complete BibTeX record as
> value I somehow wonder whether this is useful in the end or whether we
> rather should translate the record in a SQL table.

That is a good question, which I would rephrase: what should be stored, and should everything be exported?

For the moment the stored BibTeX reference is a rather experimental feature, and its purpose is also to test the YAML format. As you probably noticed, the key parts of the BibTeX reference that allow one to construct a web link to the published article, namely the digital object identifier (DOI) and the PubMed record ID, have their own YAML mapping: I do not expect the BibTeX reference to be extracted and parsed, nor to be exported to SQL. On the other hand, it can easily be popped out at build time with a Perl one-liner (‘http://lists.debian.org/msgid-search/20090808073608.GF17276@...’).

[For further discussion about how to make nice links on the Blends web sentinels, I propose to elaborate on another list.]

There is another piece of volatile meta-data, with a much broader scope, that could be included in the upstream-metadata.yaml file (or whichever smarter name we give to it): the Debian watch file. All the objections you made above apply.

We could either store it raw in a YAML mapping, like:

    Watch: |
      version=3
      opts=dversionmangle=s/~dfsg// \
      http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2

Or split the information into multiple mappings:

    Watch-Version : 3
    Watch-Options : dversionmangle=s/~dfsg//
    Watch-Regexp : http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2

While the last option looks more structured, we should really think twice about whether it makes sense to have the ‘Watch’ metadata in a tabular SQL database, or whether simply storing it raw somewhere else is enough. The same conclusion may apply to similar resources like the BibTeX reference.

Have a nice day,

--
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
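For comparison of the two options, the split form can be derived mechanically from the raw one, at least for simple cases. The following Python sketch does this for the samtools example quoted in the message. It is a simplification resting on stated assumptions: the watch file has one `version=` line and a single entry whose first token may be `opts=...`, and the function name `split_watch` is made up for this example. Real watch files can hold several entries and further fields, which this sketch ignores.

```python
def split_watch(text):
    """Split a simple debian/watch file into the three fields proposed above.

    Assumes one 'version=' line and a single entry whose first token may be
    'opts=...'; anything more complex is outside the scope of this sketch.
    """
    # Join backslash-continued lines, then drop comments and blank lines.
    joined = text.replace("\\\n", " ")
    lines = [line.strip() for line in joined.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]

    fields = {}
    for line in lines:
        if line.startswith("version="):
            fields["Watch-Version"] = line[len("version="):]
            continue
        tokens = line.split()
        if tokens and tokens[0].startswith("opts="):
            fields["Watch-Options"] = tokens[0][len("opts="):]
            tokens = tokens[1:]
        if tokens:
            # The first remaining token is the URL pattern.
            fields["Watch-Regexp"] = tokens[0]
    return fields


# The samtools example from the message, as it would appear in debian/watch.
watch = r"""version=3
opts=dversionmangle=s/~dfsg// \
http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2
"""
for name, value in sorted(split_watch(watch).items()):
    print("%s : %s" % (name, value))
```

Since the split can be computed on demand, storing the raw text loses nothing for simple files, which supports the point that a tabular representation may not be needed.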
|
|
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

[Moving a thread from debian-med to debian-science because the problem originated here some time ago.]

On Wed, Oct 28, 2009 at 07:35:10PM +0900, Charles Plessy wrote:
>
> That is a good question, that I would rephrase: what should be stored, and
> should everything be exported?

The current use of specific publication data is a good application for upstream-metadata.yaml, and here we actually need single fields of the BibTeX record. So the question whether it should be exported can clearly be answered with yes. The only question is how to store it in UDD.

> For the moment the BibTeX stored reference is a rather experimental feature,
> and its purpose is also to test the YAML format.

Sure, that's perfectly all right. But we have an application for exactly this *now*, and IMHO it makes sense to clarify this at the beginning.

> As you probalbly noticed, the
> key parts of the BibTeX reference that allow to construct a weblink to the
> published article, the digital object identifier (DOI) and the PubMed record
> ID, have their own YAML mapping:

Ahh, good that you bring up this point again, because I stumbled upon it but forgot to mention it in my reply: I do not consider it a good idea to store one field two times. This just sucks. IMHO DOI and PubMed ID are simply publication data, and mentioning them twice is wrong.

> I do not expect the BibTeX reference to be
> extracted and parsed, nor to be exported to SQL.

But that's exactly what I need to do to solve the original problem, which is to publish the publication data on the tasks pages. I perfectly agree that the scope of your suggestion is much wider - but if we see a need for storing the publication data, we should clarify at the beginning how they should be handled and whether the form is appropriately chosen.

> On the other hand, it can be
> easily popped out at build time with a Perl oneliner
> (‘http://lists.debian.org/msgid-search/20090808073608.GF17276@...’).

Well, yes, that prints any YAML field - but you need to parse the BibTeX format if you extract the Reference field.

> [For further discussion about how to make nice links on the Blends web
> sentinels, I propose to elaborate on another list.]

I'm not sure whether debian-science, where I moved this, was the list you had in mind - but I think it is a wider forum which has an interest in publication issues.

> There is another volatile meta-data with a much broader scope that could be
> included in the upstream-metadata.yaml file (or whichever smarter name we give
> to it), the Debian watch file. All the objections you made above apply.
>
> ...
>
> While the last option looks more structured, we should really think twice if it
> makes sense to have the ‘Watch’ metadata in a tabluar SQL database, or if
> simply storing it raw somewhere else is enough. The same conclusion may apply
> to similar resources like the BibTeX reference.

While I perfectly agree that the data in watch files is actually upstream metadata, I do not think that any attempt to move this data to another place would be really successful. The reason I think so is that you are trying to fix a non-existent problem. Normally you change something if you realise something is broken. Even then it is hard to replace an established system. But with watch files nothing is broken in principle. (Well, there are issues with uscan and there are several attempts to enhance it - but the problem is neither *where* the data is stored (debian/watch or a different file) nor *how* (text or YAML).)

So if you are attempting to gather agreement for a new control file (which is reasonable in my opinion), I would not start by convincing people to change things which do not really need a change.

Kind regards

Andreas.

--
http://fam-tille.de
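The cost of parsing BibTeX, which is the crux of this disagreement, depends on how much of the format has to be supported. As a rough illustration, the Python sketch below extracts single fields from a flat record with one regular expression. It rests on strong assumptions: values are delimited by one pair of braces or double quotes and contain no nested braces, and the record, its citation key and its values are invented placeholders. Real BibTeX allows nesting, string concatenation and macros, so a proper parser would be needed for anything beyond such simple records.

```python
import re

# Matches 'name = {value}' or 'name = "value"'; nested braces are NOT handled.
FIELD = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def bibtex_fields(record):
    """Return a dict mapping lower-cased field names to values for one flat record."""
    fields = {}
    for name, braced, quoted in FIELD.findall(record):
        # Exactly one of the two alternatives matched; the other is empty.
        fields[name.lower()] = braced or quoted
    return fields


# Placeholder record; the key, title and identifiers are made up.
record = """@article{example2009,
  title = {An example title},
  doi = {10.1000/example},
  pmid = "0000001"
}"""

fields = bibtex_fields(record)
print(fields["doi"])
print(fields["pmid"])
```

If the DOI and the PubMed ID were taken from the record this way, they would not have to be stored a second time as separate YAML mappings, at the price of every consumer needing such a parser.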
|
|
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

This idea of extra metadata storage is really excellent.

I'd like to suggest the following:

Move this thread to debian-devel for a wider discussion.

Move the upstream-metadata.yaml, Homepage, debian/watch out of source packages since they need to be able to change independently of the Debian package. Not sure what the right location is, but I'd suggest UDD could be the canonical location for it and a web interface at alioth be the way to edit it (like the debtags interface).

--
bye,
pabs

http://wiki.debian.org/PaulWise
|
|
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

On 29/10/09 at 10:29 +0800, Paul Wise wrote:
> This idea of extra metadata storage is really excellent.
>
> I'd like to suggest the following:
>
> Move this thread to debian-devel for a wider discussion.
>
> Move the upstream-metadata.yaml, Homepage, debian/watch out of source
> packages since they need to be able to change independently of the
> Debian package. Not sure what the right location is, but I'd suggest
> UDD could be the canonical location for it and a web interface at
> alioth be the way to edit it (like the debtags interface).

Some comments:

1/ UDD currently can't be the canonical location for this data: there are no backups of UDD, because it is supposed to be possible to remove the database, create it again, and import everything back in less than 2 days. So ideally, there would be another place where that data is stored, and it would simply be imported into UDD. Or we would have to talk to DSA about backups (also possible).

2/ You might be trying to be too generic here. There are not so many different kinds of package metadata that would be suitable for this, so you might want to just build it for your purpose (BibTeX metadata) and forget the general picture. Past efforts to build a unified repository of metadata have basically failed, because of a lack of interest in the end, I think. It might be better to store that data directly in packages (debian/foo.bib), and have a UDD importer that extracts that data from packages.

--
| Lucas Nussbaum
| lucas@... http://www.lucas-nussbaum.net/ |
| jabber: lucas@... GPG: 1024D/023B3F4F |
|
|
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

Hi.

(Responding a little late after vacation time.)

On Monday 26 October 2009 at 23:05 +0900, Charles Plessy wrote:
> > On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > > First of all, let's summarise the situation. We want to integrate some metadata
> > > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.
>
> Dear Andreas and Olivier,
>
> thank you for your encouraging comments.

SNIP

> In parallel, as Olivier suggested, each table could be exported in RDF format.
> But I am not sure I understand it.

What exactly don't you understand? ;)

If you look back at the pointers I provided in http://lists.debian.org/debian-qa/2009/10/msg00050.html, you'll find an example of using the PRISM and CONNOTEA ontologies for links with DOI and PubMed IDs (more details in http://www.prismstandard.org/resources/mod_prism.html, maybe).

> Olivier, could you suggest a Perl module to
> use?
>

I suppose that searching for perl+rdf on your preferred search engine will retrieve useful code ;)

I'm not a Perl hacker myself, but as RDF is a W3C standard, there is probably plenty of Perl code to produce RDF.

http://search.cpan.org/~mthurn/RDF-Simple-0.415/lib/RDF/Simple/Serialiser.pm seems to be a valid candidate for first experiments.

Hope this helps.

Best regards,

--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)