Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)


by Andreas Tille-5

[debian-qa in CC because here we are discussing UDD issues.]

On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> First of all, let's summarise the situation. We want to integrate some metadata
> in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.

I would like to add that other use cases for this kind of data will
most probably evolve.  Keeping this in mind we might consider
moving the topic to debian-devel in the next stage of development.

> What I propose is to have a special file in the source packages for gathering
> all potentially useful information, debian/upstream-metadata.yaml.

I have noticed this and I really like this effort very much (even if I
have not actively supported it by adding such a file to the packages I
touched recently).

> In contrast to
> debian/control, this file would not contribute data to the Packages.gz files of
> the Debian archive. I think that there are enough source packages managed in
> version control systems that we can use them as the main source of our data.

I'm not really happy about this "we ignore packages which are not
maintained in a VCS" attitude, but it sounds reasonable to assume that in
practice all packages that potentially contain this kind of
information are actually maintained in a VCS.  An alternative way to
gather the information came to my mind:  There is some code that
checks the translation status of upstream sources by unpacking all
source packages and checking for <lang>.po files.  So there is already
some code which handles complete unpacking of Debian source packages
and which might be used to fetch debian/upstream-metadata.yaml as well.
The pro is that it covers all packages - the con is that it only looks at
already uploaded packages.
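
A minimal sketch of that alternative, assuming the source packages have
already been unpacked into one directory per package below a common root.
The function name and the directory layout are hypothetical; this is not
the existing translation-check code.

```python
from pathlib import Path


def find_metadata_files(unpacked_root):
    """Map package directory name -> path of its upstream-metadata.yaml.

    Assumption: every unpacked source package lives in its own
    sub-directory directly below unpacked_root.
    """
    found = {}
    for package_dir in sorted(Path(unpacked_root).iterdir()):
        candidate = package_dir / "debian" / "upstream-metadata.yaml"
        # Packages without the file are simply skipped.
        if candidate.is_file():
            found[package_dir.name] = candidate
    return found
```

Such a scan only sees what has been uploaded, which is exactly the
limitation mentioned above.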

> This makes debian/upstream-metadata.yaml available independently of the Debian
> archive, and more importantly, will allow updating the metadata without
> uploading the package, but in a way that only the maintainers can do the
> update, which keeps things under control.

This has a certain advantage of flexibility over the method I suggested
above.  I'm not sure which way I would prefer.  The VCS method is
probably much easier to implement - so we should probably
stick to your decision - but I wanted to mention an alternative
which IMHO might have slightly better chances of being accepted on
debian-devel for general purposes because people there might be
interested in completeness.
 

> The missing piece of the puzzle is then an aggregator that would collect the
> information from the source packages and prepare tables for the UDD. I am drafting
> such a program at http://upstream-metadata.debian.net/. Currently, it does
> not do much:
>
> http://upstream-metadata.debian.net/<package>/ALL gets debian/upstream-metadata.yaml if
> the package is in a subversion server that is available to ‘debcheckout’. Luckily,
> most of our packages are.
>
> http://upstream-metadata.debian.net/<package>/<key> gives the content of the
> metadata for one key.

This sounds really good.

> For instance, http://upstream-metadata.debian.net/samtools/PMID gives the
> PubMed identification number for the article describing SamTools, 19505943.
>
> This is the proof of principle for data retrieval. Then, we need to construct
> the tables.  I plan to have the program store the results in a BerkeleyDB
> database, and to make it output tables at constant intervals, for instance
> daily. The update of the internal database would be done in two ways.

If you plan to propagate this data to UDD this might not be an optimal
solution.  UDD imports are usually a two step process:

  1. Fetch text data from whatever source in clear text.
  2. Delete table, read text data and put it into the table.

If we want to follow this scheme for our specific case, IMHO the
best idea would be to just drop a <package>.yaml file in a directory where rsync or
wget can fetch these files.  The second step, reading the yaml files, is quite
simple.
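
The two steps could be sketched as follows.  This is only an illustration
under stated assumptions: SQLite stands in for UDD's PostgreSQL, the table
layout is just one possibility, and parse_flat_yaml is a hypothetical
helper that understands only flat "Key: value" lines (a real importer
would use a proper YAML parser).

```python
import sqlite3


def parse_flat_yaml(text):
    """Parse flat 'Key: value' lines only; NOT a general YAML parser."""
    record = {}
    for line in text.splitlines():
        # Skip blank lines, comments and indented continuation lines.
        if not line.strip() or line.lstrip().startswith("#") or line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


def reload_table(conn, files):
    """Step 2: delete the table and refill it from the fetched text.

    files maps a package name to the text of its <package>.yaml file
    (step 1, the fetching via rsync or wget, is assumed to be done).
    """
    conn.execute("DROP TABLE IF EXISTS upstream_metadata")
    conn.execute(
        "CREATE TABLE upstream_metadata ("
        "package TEXT, key TEXT, value TEXT, PRIMARY KEY (package, key))"
    )
    for package, text in files.items():
        for key, value in parse_flat_yaml(text).items():
            conn.execute(
                "INSERT INTO upstream_metadata VALUES (?, ?, ?)",
                (package, key, value),
            )
    conn.commit()
```

For example, reload_table(sqlite3.connect(":memory:"), {"samtools": "PMID: 19505943\n"})
leaves exactly one row for samtools in the table.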
 
> First, updates could be pushed with commit hooks when package maintainers
> commit changes to debian/upstream-metadata.yaml. It could be as simple as
> having an url that triggers an update, and using wget or curl to activate the
> aggregator.
>
> Second, normal read access could trigger an update if the record is getting old.

Currently UDD updates are time based (per cron job) and not event based
(per commit of some data).  If you gather the data by any means at
upstream-metadata.debian.net this is not really relevant for UDD import
(OK, it makes sense to synchronise the cron jobs to make sure that the
upstream-metadata cron job runs before the UDD cron job fetches data).  So I
would vote for the option which is safer to implement.  In this respect I
would prefer the second method and run the job once a day.  The reason
is that, if I'm not completely wrong, the VCS push would require
configuring *every* VCS which *potentially* might contain
upstream-metadata.yaml files.  This is a weak approach because you do not
have control over all VCSes, so chances are very high that this will not
happen on all of them, and it sounds quite hard to propagate changes to the
commit hooks (imagine upstream-metadata.debian.net becomes
upstream-metadata.debian.org or whatever).  In this sense I would vote
for relying on the Vcs fields in the packaging information and fetching the
information via cron job using the Vcs specified in debian/control.
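
As a rough sketch of the first part of such a cron job, the Vcs-* fields
can be read from the source paragraph of debian/control.  The function
name is hypothetical, and the sketch ignores multi-line field values; it
only shows that no per-VCS configuration is needed on the repository side.

```python
def vcs_fields(control_text):
    """Return the Vcs-* fields of the first (source) paragraph."""
    fields = {}
    for line in control_text.splitlines():
        if not line.strip():
            break  # the first blank line ends the source paragraph
        if line[0].isspace() or line.startswith("#"):
            continue  # continuation lines and comments are ignored
        name, sep, value = line.partition(":")
        if sep and name.lower().startswith("vcs-"):
            fields[name] = value.strip()
    return fields
```

The cron job would then check out debian/upstream-metadata.yaml from the
location found this way, once a day.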
 
> In summary, I propose to store metadata in YAML format in the source packages,
> retrieve and store it in a central place using a web agent through the VCS in
> which the source packages are stored, and periodically output tables for the
> UDD, which keeps a central role for the generation of our web sentinel pages.

I like this approach.  But there is one thing I'm not really sure about:
How should we design the UDD table?  There are two options:

CREATE TABLE upstream_metadata (
    package text,
    key1    text,
    key2    text,
    ...
    keyN    text,
    PRIMARY KEY (package)
);

with a defined set of keys allowed in upstream-metadata.yaml and exactly
one row per package.  Every unknown key will be ignored.  The
advantage of this approach is that tools *know* which keys to expect and
can rely on knowing how to handle them.

Alternatively we could do

CREATE TABLE upstream_metadata (
    package text,
    key     text,
    value   text,
    PRIMARY KEY (package,key)
);

with an arbitrary number of rows per package but no duplicated keys for
one package.  This is more flexible: if you need some new kind of
data you do not need to touch the UDD table structure.  However, it restricts
each key to a single value per package.

The third option is to leave out the PRIMARY KEY constraint altogether, which
allows maximum flexibility (for instance there might be more than one
citation record).
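
To make the difference between the three variants concrete, here is a
small sketch.  SQLite is used only because it ships with Python (UDD
itself is PostgreSQL), and the table and column names are made up for
the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: fixed set of keys, exactly one row per package.
conn.execute(
    "CREATE TABLE metadata_wide (package TEXT PRIMARY KEY, doi TEXT, pmid TEXT)"
)
# Option 2: arbitrary keys, but each key at most once per package.
conn.execute(
    "CREATE TABLE metadata_kv ("
    "package TEXT, key TEXT, value TEXT, PRIMARY KEY (package, key))"
)
# Option 3: no primary key, so a key may repeat for one package.
conn.execute("CREATE TABLE metadata_free (package TEXT, key TEXT, value TEXT)")

# Option 2 rejects a second value for the same (package, key) pair.
conn.execute("INSERT INTO metadata_kv VALUES ('samtools', 'PMID', '19505943')")
try:
    conn.execute("INSERT INTO metadata_kv VALUES ('samtools', 'PMID', 'second value')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

# Option 3 accepts it, e.g. for several citation records.
conn.execute("INSERT INTO metadata_free VALUES ('samtools', 'PMID', '19505943')")
conn.execute("INSERT INTO metadata_free VALUES ('samtools', 'PMID', 'second value')")
free_rows = conn.execute("SELECT COUNT(*) FROM metadata_free").fetchone()[0]
```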

BTW, I'm a bit concerned about mixing different database formats: on one
hand you are using yaml, on the other hand BibTeX.  Well, for sure having
a BibTeX record is very valuable.  But on the other hand the tools that
work with this data will need a BibTeX parser.  I have not dived
into this and for sure it is doable - but I just wanted to raise this
topic here to hear opinions.

> The proof of principle presented above is only a few lines of code, but I would
> prefer to discuss the idea further before putting more time into it.

Thanks for pushing this forward!
 
> Lastly, I have accumulated a dozen debian/upstream-metadata.yaml files in
> the packages I maintain, so that meaningful tests are doable for table
> generation later. I do not remember the list by heart, but it contains seaview,
> bwa, clustalw, clustalx, perlprimer, samtools, and most of the packages I have
> updated recently.
>
> Since I am quite inexperienced in programming, help is of course most welcome.

As I said above: IMHO most of the work is done if you can provide a set
of <package>.yaml files at a freely accessible place.

Kind regards

       Andreas.

--
http://fam-tille.de


--
To UNSUBSCRIBE, email to debian-qa-REQUEST@...
with a subject of "unsubscribe". Trouble? Contact listmaster@...


Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Olivier Berger

Hi.

(Having read what was forwarded to -qa only)

May I suggest providing a few more details about this initiative in a
wiki page on wiki.debian.org, so that the context is clearer
for everybody potentially interested?

I think there's probably a lot of interest beyond UDD for such metadata
standardization.

My 2 cents,

On Thursday 22 October 2009 at 09:49 +0200, Andreas Tille wrote:

> [debian-qa in CC because here we are discussing UDD issues.]
>
> On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > First of all, let's summarise the situation. We want to integrate some metadata
> > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.
>
> I would like to add that most probably there might evolve even other use
> cases for this kind of data.  Keeping this in mind we might consider
> moving the topic to debian-devel in the next stage of development.
>
> > What I propose is to have a special file in the source packages for gathering
> > all potentially useful information, debian/upstream-metadata.yaml.
>

--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)




Using RDF and ontologies for such metadata (combined DOAP and other ontologies) Was: Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Olivier Berger

Hi.

(responding with more feedback as I have taken time to dig through the
debian-med archives)

If I get it right, you intend to match bibliographic references and
software projects / packages?

I'd very much suggest adopting a Semantic Web perspective, so as to
provide such links as RDF descriptions that can use ontologies already
used by other applications, hence contributing to LinkedData [0]
(maybe through microformats embedded as RDFa in the current 'web
sentinels' or as specific RDF feeds).

For an example of such an application, see:
http://www.connotea.org/rss/search?q=SAMtools
which mixes RSS 1.0 with other ontologies (and is more or less exactly
the same example you provided).

Here, you may then link DOAP [1] with existing bibliographic ontologies
like PRISM.

This could of course also be provided from UDD if UDD were to participate
more in the Semantic Web, as I proposed in
http://lists.debian.org/debian-qa/2009/02/msg00016.html and further
discussions.

Just my 2 cents,

[0] : http://linkeddata.org/
[1] : http://trac.usefulinc.com/doap

By the way:

On Thursday 22 October 2009 at 09:49 +0200, Andreas Tille wrote:

> Alternatively we could do
>
> CREATE TABLE upstream_metadata (
>     package text,
>     key     text,
>     value   text,
>     PRIMARY KEY (package,key)
> );

This looks very much like RDF triples, which could store any metadata
expressed in any RDF ontology, so that might be really useful ;)
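
To illustrate the similarity, here is a small sketch that writes
(package, key, value) rows as N-Triples lines.  The base URI and the way
subjects and predicates are built are purely hypothetical; a real export
would map the keys to terms of existing ontologies such as DOAP.

```python
# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/upstream-metadata/"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(rows):
    """Turn (package, key, value) rows into one N-Triples line each."""
    lines = []
    for package, key, value in rows:
        subject = f"<{BASE}package/{package}>"
        predicate = f"<{BASE}key/{key}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines
```

Each row of the key/value table becomes one subject-predicate-object
statement, with the package as subject and the key as predicate.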

Best regards,
--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Charles Plessy-12

> On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > First of all, let's summarise the situation. We want to integrate some metadata
> > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.

Dear Andreas and Olivier,

thank you for your encouraging comments. I have made one more step forward, and
upstream-metadata.debian.net now stores its information in a Berkeley database,
refreshing a record only when it is accessed and is older than a given age.
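
The refresh-on-access idea can be sketched like this.  It is not the code
running on the site: a plain dictionary stands in for the Berkeley
database, and the class and parameter names are invented for the example.

```python
import time


class AgingCache:
    """Return stored data, re-fetching it only when it is too old."""

    def __init__(self, fetch, max_age_seconds, clock=time.time):
        self._fetch = fetch            # callable: package -> fresh data
        self._max_age = max_age_seconds
        self._clock = clock            # injectable for testing
        self._store = {}               # package -> (timestamp, data)

    def get(self, package):
        now = self._clock()
        entry = self._store.get(package)
        # Fetch on first access, or when the record has aged out.
        if entry is None or now - entry[0] > self._max_age:
            entry = (now, self._fetch(package))
            self._store[package] = entry
        return entry[1]
```

A record that nobody reads is never refreshed, which is why the database
has to be 'loaded' by accessing it.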

For the moment, we only have 17 source packages that have an
upstream-metadata.yaml file in their debian directory that is accessible
through a public VCS. Nevertheless, I think that it is enough for a proof of
principle.

After resetting the database, I ‘loaded’ the data by accessing it:

for package in bioperl clustalx mummer seaview perlprimer samtools dicomscope \
    clustalw r-cran-combinat r-cran-haplo.stats r-cran-qvalue \
    r-cran-randomforest r-cran-rocr r-other-bio3d mira bwa infernal
do
    wget -q -O - "http://upstream-metadata.debian.net/$package/DOI"
done

After loading, the resulting table is available here:
http://upstream-metadata.debian.net/table/DOI

Obviously, not all packages contain programs that have been described in an
academic article (http://dx.doi.org/)…

For the moment, one has to access an arbitrary key, but later the best would be
to have a special key, for instance YAML-UPDATE, that would force the update.
If it is possible to have a per-file commit hook, then each time an
upstream-metadata.yaml is modified, the debian.net site can be updated.

The next step is to feed the UDD. For the moment, the site produces one table per
keyword. The rationale is that for many keywords, the data will be too sparse
to be interesting for the UDD. My current idea is to generate the tables for a
limited set of curated keywords, assemble them (with the unix join command?),
and leave the result in a public place that the UDD can read.
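
As a sketch of the assembling step, the per-keyword tables could be merged
on the package name as below.  The data structures and the function name
are assumptions made for the illustration.  Note that this keeps packages
that lack some keys (with an empty field), whereas the unix join command
by default only prints lines present in both inputs.

```python
def join_tables(tables, keys):
    """Merge per-key tables into rows of [package, value1, value2, ...].

    tables maps a key name to a {package: value} dictionary; keys gives
    the curated keywords and the column order.  Missing values become ''.
    """
    packages = sorted(set().union(*(tables.get(key, {}) for key in keys)))
    return [
        [package] + [tables.get(key, {}).get(package, "") for key in keys]
        for package in packages
    ]
```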

In parallel, as Olivier suggested, each table could be exported in RDF format.
But I am not sure I understand it. Olivier, could you suggest a Perl module to
use?

As long as we are in a draft phase, I think that we can live with what is currently
the biggest limitation: the lack of support for packages that are not stored in a
VCS. One possible way to solve the problem is to provide a repository, for
instance in collab-maint on Alioth, where people can drop one yaml file per
source package. We could also unpack source files, as Andreas suggested.

For the UDD import, which of Andreas's two proposals
would be the most suitable?

> CREATE TABLE upstream_metadata (
>     package text,
>     key1    text,
>     key2    text,
>     ...
>     keyN    text,
>     PRIMARY KEY (package)
> );
 
> CREATE TABLE upstream_metadata (
>     package text,
>     key     text,
>     value   text,
>     PRIMARY KEY (package,key)
> );

Since the addition of more meta-data to our source packages is a frequent issue
raised on debian-devel, I think that there is a general interest in
standardising ‘field’ names, whichever technical solution is
adopted. I will try to find a proper place on wiki.debian.org to let people document
the fields they create, and if necessary discuss them.

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Charles Plessy-12

On Thu, Oct 22, 2009 at 09:49:10AM +0200, Andreas Tille wrote:
>
> BTW, I'm a bit concerned about mixing different database formats: on one
> hand you are using yaml, on the other hand BibTeX.  Well, for sure having
> a BibTeX record is very valuable.  But on the other hand the tools that
> work with this data will need a BibTeX parser.  I have not dived
> into this and for sure it is doable - but I just wanted to raise this
> topic here to hear opinions.

Hi Andreas and all,

since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
transfer this part of the discussion to debian-science@..., where I reopened
an old thread.

http://lists.debian.org/msgid-search/20091026145532.GA6594@...

have a nice day,

--
Charles




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Andreas Tille-5

On Mon, Oct 26, 2009 at 11:05:10PM +0900, Charles Plessy wrote:
> For the moment, one has to access an arbitrary key, but later the best would be
> to have a special key, for instance YAML-UPDATE, that would force the update.

Or rather "upstream-metadata update".  You certainly would not like to update
the YAML standard. ;-)

> If it is possible to have a per-file commit hook, then each time an
> upstream-metadata.yaml is modified, the debian.net site can be updated.

As I said: I'm afraid it is hard to ensure that *every* potential VCS has
a properly configured commit hook.  I'm no VCS expert but it sounds hard
to maintain.

> The next step is to feed the UDD. For the moment, the site produces one table per
> keyword. The rationale is that for many keywords, the data will be too sparse
> to be interesting for the UDD. My current idea is to generate the tables for a
> limited set of curated keywords, assemble them (with the unix join command?),
> and leave the result in a public place that the UDD can read.

As I said in my previous mail it is perfectly OK if there is a way to fetch
the original upstream-metadata.yaml files in some reasonable way.  Reading
these is probably much easier than any aggregated format.
 
> For the UDD import, which of Andreas's two proposals
> would be the most suitable?

Well, I have no idea - it was a question and I gave the pros and cons for both
variants in my mail.
 

> > CREATE TABLE upstream_metadata (
> >     package text,
> >     key1    text,
> >     key2    text,
> >     ...
> >     keyN    text,
> >     PRIMARY KEY (package)
> > );
>  
> > CREATE TABLE upstream_metadata (
> >     package text,
> >     key     text,
> >     value   text,
> >     PRIMARY KEY (package,key)
> > );
>
> Since the addition of more meta-data to our source packages is a frequent issue
> raised on debian-devel, I think that there is a general interest in
> standardising ‘field’ names, whichever technical solution is
> adopted.

So if we have a really standardised set of keywords, the first method
probably sounds appropriate for the problem.

> I will try to find a proper place on wiki.debian.org to let people document
> the fields they create, and if necessary discuss them.

Sounds good

     Andreas.

--
http://fam-tille.de




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Andreas Tille-5

On Tue, Oct 27, 2009 at 12:00:48AM +0900, Charles Plessy wrote:
> since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
> transfer this part of the discussion on debian-science@..., where I reopened
> an old thread.

Well, the question is not really about BibTeX or not.  The question is
whether it is a good idea to have a database format as a field value.
If you have the field "Publication" and a complete BibTeX record as
value I somehow wonder whether this is useful in the end or whether we
should rather translate the record into an SQL table.

Kind regards

        Andreas.

--
http://fam-tille.de




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Charles Plessy-12

On Mon, Oct 26, 2009 at 04:05:17PM +0100, Andreas Tille wrote:

> On Tue, Oct 27, 2009 at 12:00:48AM +0900, Charles Plessy wrote:
> > since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
> > transfer this part of the discussion on debian-science@..., where I reopened
> > an old thread.
>
> Well, the question is not really about BibTeX or not.  The question is
> whether it is a good idea to have a database format as a field value.
> If you have the field "Publication" and a complete BibTeX record as
> value I somehow wonder whether this is useful in the end or whether we
> should rather translate the record into an SQL table.

That is a good question, that I would rephrase: what should be stored, and
should everything be exported?

For the moment the BibTeX stored reference is a rather experimental feature,
and its purpose is also to test the YAML format. As you probably noticed, the
key parts of the BibTeX reference that allow constructing a weblink to the
published article, namely the digital object identifier (DOI) and the PubMed record
ID, have their own YAML mapping: I do not expect the BibTeX reference to be
extracted and parsed, nor to be exported to SQL. On the other hand, it can be
easily popped out at build time with a Perl one-liner
(‘http://lists.debian.org/msgid-search/20090808073608.GF17276@...’).

[For further discussion about how to make nice links on the Blends web
sentinels, I propose to elaborate on another list.]

There is another piece of volatile meta-data with a much broader scope that could be
included in the upstream-metadata.yaml file (or whichever smarter name we give
to it), the Debian watch file. All the objections you made above apply.

We could either store it raw in a YAML mapping, like:

Watch: |
 version=3
 opts=dversionmangle=s/~dfsg// \
   http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2

Or split the information in multiple mappings:

Watch-Version : 3
Watch-Options : dversionmangle=s/~dfsg//
Watch-Regexp  : http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2

While the last option looks more structured, we should really think twice about whether it
makes sense to have the ‘Watch’ metadata in a tabular SQL database, or if
simply storing it raw somewhere else is enough. The same conclusion may apply
to similar resources like the BibTeX reference.

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Andreas Tille-5

[Moving a thread from debian-med to debian-science because the problem
 originated here some time ago.]

On Wed, Oct 28, 2009 at 07:35:10PM +0900, Charles Plessy wrote:
>
> That is a good question, that I would rephrase: what should be stored, and
> should everything be exported?

The current use of specific publication data is a good application for
upstream-metadata.yaml and here we actually need single fields of the
BibTeX record.  So the question whether it should be exported can clearly be
answered with yes.  The only remaining question is how to store it in UDD.
 
> For the moment the BibTeX stored reference is a rather experimental feature,
> and its purpose is also to test the YAML format.

Sure, that's perfectly all right.  But we have an application for exactly
this *now* and IMHO it makes sense to clarify this in the beginning.

> As you probably noticed, the
> key parts of the BibTeX reference that allow constructing a weblink to the
> published article, namely the digital object identifier (DOI) and the PubMed record
> ID, have their own YAML mapping:

Ahh, good that you bring up this point again because I stumbled upon this but
forgot to mention it in my reply: I do not consider it a good idea to store
one field twice.  This just sucks.  IMHO DOI and PubMed ID are just
publication data and mentioning them twice is wrong.

> I do not expect the BibTeX reference to be
> extracted and parsed, nor to be exported to SQL.

But that's exactly what I need to do to solve the original problem of
publishing the publication data on the tasks pages.  I perfectly agree the
scope of your suggestion is much wider - but if we see a need for
storing the publication data we should clarify in the beginning how it
should be handled and whether the form is appropriately chosen.

> On the other hand, it can be
> easily popped out at build time with a Perl one-liner
> (‘http://lists.debian.org/msgid-search/20090808073608.GF17276@...’).

Well, yes, that presents any YAML field - but you need to parse the
BibTeX format in case you extract the Reference field.
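
To give an idea of what such parsing involves, here is a deliberately
naive sketch that pulls single fields out of a simple BibTeX record.  The
function name is hypothetical and the approach is an assumption of mine:
it handles only flat `name = {value}` or `name = "value"` fields, with no
nested braces, string concatenation or macros.

```python
import re

# One field: a name, '=', then a value in braces or double quotes.
FIELD = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def bibtex_fields(record):
    """Return the simple fields of one BibTeX record as a dictionary."""
    fields = {}
    for name, braced, quoted in FIELD.findall(record):
        # Exactly one of the two value groups matched; the other is ''.
        fields[name.lower()] = braced or quoted
    return fields


# Placeholder record, not a real reference.
example = """@article{example-key,
  title = {Example title},
  year = "2009",
  doi = {10.0000/example}
}"""
example_fields = bibtex_fields(example)
```

Anything beyond such flat records would need a real BibTeX parser, which
is the extra dependency I am concerned about.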
 
> [For further discussion about how to make nice links on the Blends web
> sentinels, I propose to elaborate on another list.]

I'm not sure whether debian-science, where I moved this thread, is the list you had
in mind - but I think it is a wider forum which has an interest in
publication issues.

> There is another piece of volatile meta-data with a much broader scope that could be
> included in the upstream-metadata.yaml file (or whichever smarter name we give
> to it), the Debian watch file. All the objections you made above apply.
>
> ...
>
> While the last option looks more structured, we should really think twice about whether it
> makes sense to have the ‘Watch’ metadata in a tabular SQL database, or if
> simply storing it raw somewhere else is enough. The same conclusion may apply
> to similar resources like the BibTeX reference.

While I perfectly agree that the data in watch files is actually
upstream metadata, I do not think that any attempt to move this data to
another place would be really successful.  The reason I think
so is that you are trying to fix a non-existent problem.  Normally you
change something if you realise something is broken.  Even then it is
hard to replace an established system.  But with watch files nothing is
broken in principle.  (Well, there are issues with uscan and there are
several attempts to improve it - but the problem is neither *where*
(debian/watch or a different file) the data is stored nor *how* (text or
yaml).)  So if you are attempting to gather agreement for a new control
file (which is reasonable in my opinion) I would not start by
convincing people to change things which do not really need a change.

Kind regards

     Andreas.

--
http://fam-tille.de




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Paul Wise-3

This idea of extra metadata storage is really excellent.

I'd like to suggest the following:

Move this thread to debian-devel for a wider discussion.

Move the upstream-metadata.yaml, Homepage, debian/watch out of source
packages since they need to be able to change independently of the
Debian package. Not sure what the right location is, but I'd suggest
UDD could be the canonical location for it and a web interface at
alioth be the way to edit it (like the debtags interface).

--
bye,
pabs

http://wiki.debian.org/PaulWise




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Lucas Nussbaum

On 29/10/09 at 10:29 +0800, Paul Wise wrote:

> This idea of extra metadata storage is really excellent.
>
> I'd like to suggest the following:
>
> Move this thread to debian-devel for a wider discussion.
>
> Move the upstream-metadata.yaml, Homepage, debian/watch out of source
> packages since they need to be able to change independently of the
> Debian package. Not sure what the right location is, but I'd suggest
> UDD could be the canonical location for it and a web interface at
> alioth be the way to edit it (like the debtags interface).

Some comments:
1/ UDD currently can't be the canonical location for this data: there
are no backups of UDD, because it's supposed to be possible to
remove the database, create it again, and import everything back in less
than 2 days. So ideally, there would be another place where that data is
stored, and it would simply be imported into UDD. Or we would have to talk to DSA
about backups (also possible).

2/ You might be trying to be too generic here. There are not so many
different kinds of package metadata that would be suitable for this
thing, so you might want to just build it for your purpose (bibtex
metadata) and forget the general picture.

Past efforts for building a unified repository of metadata have
basically failed, because of a lack of interest in the end, I think. It
might be better to store that data directly into packages
(debian/foo.bib), and have a UDD importer that extracts that data from
packages.
--
| Lucas Nussbaum
| lucas@...   http://www.lucas-nussbaum.net/ |
| jabber: lucas@...             GPG: 1024D/023B3F4F |




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Olivier Berger

Hi.

(Responding a little late after vacation time.)

On Monday 26 October 2009 at 23:05 +0900, Charles Plessy wrote:
> > On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > > First of all, let's summarise the situation. We want to integrate some metadata
> > > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.
>
> Dear Andreas and Olivier,
>
> thank you for your encouraging comments.

SNIP

> In parallel, as Olivier suggested, each table could be exported in RDF format.
> But I am not sure I understand it.

What exactly don't you understand ? ;) If you look back at the pointers
I provided in http://lists.debian.org/debian-qa/2009/10/msg00050.html
you'll find an example of using the PRISM and CONNOTEA ontologies for
links with DOI and PUBMED IDs (more details in
http://www.prismstandard.org/resources/mod_prism.html maybe).

>  Olivier, could you suggest a Perl module to
> use?
>

I suppose that searching for perl+rdf on your preferred search engine
will retrieve useful code ;)

I'm not a Perl hacker myself, but as RDF is a W3C standard, there
is probably plenty of Perl code to produce RDF.

http://search.cpan.org/~mthurn/RDF-Simple-0.415/lib/RDF/Simple/Serialiser.pm seems to be a valid candidate for first experiments.

Hope this helps.

Best regards,
--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)

