more formally indicating the registration URL

by Steffen Moeller-3

Hello,

following up the idea of Andreas to flag those software packages that ask for a
registration of their users, I skimmed through the tango-icon-theme package and found the
following icons that I thought to fit:

/usr/share/icons/Tango/32x32/emotes/face-angel.png
(also found here in large
https://sharesource.org/svn/phoneme/theme/png/256x256/face-angel.png)

        I found it to fit rather nicely since we are requested
        to be nice and register

/usr/share/icons/Tango/32x32/devices/stock_mic.png
(in large http://municipality.zlatograd.com/tango-icons/stock_mic.png)

        since we are opening a channel to talk back to upstream, I found
        this mike to be also rather nice.

I would be prepared to follow Michael's suggestion to flag debian/control files with a
separate URL for the registration and parse that information for the pure-blends' package
presentation scripts. Does anything speak against such a pilot? Or should it rather be
a comment in the description in analogy to the introduction of the "  Homepage:" indication?

Many greetings

Steffen


--
To UNSUBSCRIBE, email to debian-med-REQUEST@...
with a subject of "unsubscribe". Trouble? Contact listmaster@...


Re: more formally indicating the registration URL

by Charles Plessy-12

On Fri, Jul 31, 2009 at 05:12:24PM +0200, Steffen Moeller wrote:
>
> I would be prepared to follow Michael's suggestion to flag debian/control
> files with a separate URL for the registration and parse that information for
> the pure-blends' package presentation scripts. Does anything speak against
> such a pilot? Or should it rather be a comment in the description in analogy
> to the introduction of the "  Homepage:" indication?

Dear all,

I think that the issue of managing package metadata goes beyond homepage and
registration. I propose to start a discussion on the subject on debian-devel,
and to fall back on using the debian/control file if this discussion is not
fruitful.

The key problem is that the metadata can evolve independently of the source
code, and that updating the binary packages just for changing URLs is not
suitable.

Have a nice week-end,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


Re: more formally indicating the registration URL

by Daniel Leidert

On Saturday, 01.08.2009, at 23:29 +0900, Charles Plessy wrote:

[..]
> The key problem is that the metadata can evolve independently of the source
> code, and that updating the binary packages just for changing URLs is not
> suitable.

Well, if upstream chooses to change a registration URL, they should be
smart enough to create a permanent redirect or other workarounds, so
the above is definitely not our problem.

Regards, Daniel


Re: more formally indicating the registration URL

by Andreas Tille-5

On Fri, Jul 31, 2009 at 05:12:24PM +0200, Steffen Moeller wrote:

> /usr/share/icons/Tango/32x32/emotes/face-angel.png
> (also found here in large
> https://sharesource.org/svn/phoneme/theme/png/256x256/face-angel.png)
>
> I found it to fit rather nicely since we are requested
> to be nice and register
>
> /usr/share/icons/Tango/32x32/devices/stock_mic.png
> (in large http://municipality.zlatograd.com/tango-icons/stock_mic.png)
>
> since we are opening a channel to talk back to upstream, I found
> this mike to be also rather nice.

Well, I admit that I do not really understand at which places you want to
put these icons - I think we wanted to ask for textual information ...
 
> I would be prepared to follow Michael's suggestion to flag debian/control files with a
> separate URL for the registration and parse that information for the pure-blends' package
> presentation scripts.

... and I replied that these X?-fields will *not* be propagated to a place
where we can parse it for the Blends pages - so sorry, it will not work
this way.

> Does anything speak against such a pilot? Or should it rather be
> a comment in the description in analogy to the introduction of the "  Homepage:" indication?

My suggestion was to use the "Remark" field which ends up in these grayish
additions to a description.

Kind regards

     Andreas.

--
http://fam-tille.de


Re: more formally indicating the registration URL

by Steffen Moeller-3

Hello,

Andreas Tille wrote:

> On Fri, Jul 31, 2009 at 05:12:24PM +0200, Steffen Moeller wrote:
>> /usr/share/icons/Tango/32x32/emotes/face-angel.png
>> (also found here in large
>> https://sharesource.org/svn/phoneme/theme/png/256x256/face-angel.png)
>>
>> I found it to fit rather nicely since we are requested
>> to be nice and register
>>
>> /usr/share/icons/Tango/32x32/devices/stock_mic.png
>> (in large http://municipality.zlatograd.com/tango-icons/stock_mic.png)
>>
>> since we are opening a channel to talk back to upstream, I found
>> this mike to be also rather nice.
>
> Well, I admit that I do not really understand at which places you want to
> put these icons - I think we wanted to ask for textual information ...

My hunch was that we should have some symbol that is shown together with the program name
to indicate that a program is requesting a registration. That symbol should appear (in my
mind) as consistently and as unobtrusively as possible together with the package names.

I would prefer not to read text, whenever that is avoidable.

>> I would be prepared to follow Michael's suggestion to flag debian/control files with a
>> separate URL for the registration and parse that information for the pure-blends' package
>> presentation scripts.
>
> ... and I replied that these X?-fields will *not* be propagated to a place
> where we can parse it for the Blends pages - so sorry, it will not work
> this way.

Fine. This seconds Michael's (?) objections towards an extension of the debian/* files for
such non-technical meta-issues.

>> Does anything speak against such a pilot? Or should it rather be
>> a comment in the description in analogy to the introduction of the "  Homepage:" indication?
>
> My suggestion was to use the "Remark" field which ends up in these grayish
> additions to a description.

I am not aware of the Remark field, but maybe this would be compatible with something
analogous to

--- bio (Revision 1027)
+++ bio (working copy)
@@ -18,7 +18,7 @@
 Depends:     arb, clustalw | clustalw-mpi, clustalx
 Why:         Sequence alignments and related programs (Non-free, thus only suggested).

-Depends:     adun.app, garlic, gdpc, ghemical, gromacs, pymol, rasmol, autodock, autogrid, r-other-bio3d
+Depends:     adun.app, garlic, gdpc, ghemical, gromacs, pymol, rasmol, r-other-bio3d

 Why:         Molecular modelling and molecular dynamics.

 Depends:     plasmidomics
@@ -35,6 +35,10 @@
 Depends: glam2
 Why:         Motif search

+Depends: autodock, autogrid
+Pkg-Registration: http://autodock.scripps.edu/downloads/autodock-registration
+Why:         Molecular modelling and molecular dynamics.
+
 Suggests: pdb2pqr


? And similarly there could be a "Pkg-Reference" for instance, or "Pkg-Publication". I
don't know about how close such information needs to be to the package. Both belong rather
to debian/copyright than to debian/control, but either could be parsed to present the
information to portals like that of Debian-Med.
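
A minimal sketch of how such a field could be read back from a tasks file,
assuming stanzas are separated by blank lines and every field fits on one
"Field: value" line. The helper names parse_stanzas and registration_urls are
made up for illustration, and continuation lines are simply skipped:

```python
SAMPLE = """\
Depends: glam2
Why:         Motif search

Depends: autodock, autogrid
Pkg-Registration: http://autodock.scripps.edu/downloads/autodock-registration
Why:         Molecular modelling and molecular dynamics.

Suggests: pdb2pqr
"""


def parse_stanzas(text):
    """Split the text into stanzas, each a dict of field name -> value."""
    stanzas, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line[0].isspace() or line.startswith("#"):
            # Simplification: continuation lines and comments are ignored.
            continue
        field, sep, value = line.partition(":")
        if sep:
            current[field.strip()] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def registration_urls(text):
    """Map each package of a stanza to that stanza's Pkg-Registration URL."""
    urls = {}
    for stanza in parse_stanzas(text):
        url = stanza.get("Pkg-Registration")
        if not url:
            continue
        for alternatives in stanza.get("Depends", "").split(","):
            for name in alternatives.split("|"):
                if name.strip():
                    urls[name.strip()] = url
    return urls


print(registration_urls(SAMPLE))
```

Stanzas without the field are left out of the result, so a presentation
script could show the symbol only for the packages that are returned.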

Cheers,

Steffen


Re: more formally indicating the registration URL

by Andreas Tille-5

On Tue, Aug 04, 2009 at 03:32:55PM +0200, Steffen Moeller wrote:
> My hunch was that we should have some symbol that is shown together with the program name
> to indicate that a program is requesting a registration. That symbol should appear (in my
> mind) as consistently and as unobtrusively as possible together with the package names.
>
> I would prefer not to read text, whenever that is avoidable.

Well, patches for the template file[1] are perfectly welcome
 
> I am not aware of the Remark field, but maybe this would be compatible with something
> analogous to

Check out [2] and try 'grep "^Remark:" *', then see what's on the resulting tasks
page[3].  Unfortunately it is not yet documented in the docs ...

> ? And similarly there could be a "Pkg-Reference" for instance, or "Pkg-Publication".

In principle doable - but I do not volunteer to maintain this myself.  You
will notice that editing these tasks files is really easy.  Once there are >= 3
such fields set I'll be happy to publish this information on the tasks pages.

> I don't know about how close such information needs to be to the package. Both belong rather
> to debian/copyright than to debian/control, but either could be parsed to present the
> information to portals like that of Debian-Med.

[Side note: Please use "Debian Med" (without the dash)]
Yes, the usage of Pkg-Something ... I would rather use "Reference" or
"Publication" (the prefix "Pkg-" was only used because "Description" is
already in use) ... would do the trick for the moment until we have better
means.  Just adding those fields will not harm at all (unknown fields
are ignored - so you can't really break anything).  Let's start investigating
what might make sense here.

Kind regards

     Andreas.

[1] svn://svn.debian.org/svn/blends/blends/trunk/webtools/templates/tasks.xhtml
[2] svn://svn.debian.org/svn/blends/projects/med/trunk/debian-med/tasks
[3] http://debian-med.alioth.debian.org/tasks/

--
http://fam-tille.de
Klarmachen zum Ändern!


Re: more formally indicating the registration URL

by Charles Plessy-12

On Mon, Aug 03, 2009 at 08:21:30PM +0200, Andreas Tille wrote:
>
> My suggestion was to use the "Remark" field which ends up in these grayish
> additions to a description.

Hi all,

I have been thinking a bit on the issue. How about the following workflow:

 - Create a new file with a ‘Name: contents’ field syntax in the Debian source
   packages, for ‘online meta-data’ that typically require internet access to
   be useful.

 - Write a script to use this file to feed the Ultimate Debian database, that
   would be authoritative. This way, the meta-data does not need to be added to
   the Packages and Source files of the Debian mirrors.

 - Keep the meta-data file up to date in our version control systems, but do
   not trigger an upload only for this. When we get tired of calling the UDD
   update by hand, we can perhaps write commit hooks for this.


With this workflow, we get the best of all systems we were thinking about:

 - The blends task files get a central point where to find the data.

 - The packages maintainers have an easy way to update the meta-data.

 - Offline users who have access to the source packages have a copy of
   the meta-data that was up to date at the time of the last upload.

The flaw is that it may be difficult in some cases to push meta-data for
packages that we are not maintaining ourselves. But my feeling is that the
packages for which registration is really an issue are in our hands.

If you like the idea, I will wrap up a more detailed proposal, and submit it on
-blends and -science and then on -devel.

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


Re: more formally indicating the registration URL

by Andreas Tille-5

On Wed, Aug 05, 2009 at 11:38:02AM +0900, Charles Plessy wrote:
> I have been thinking a bit on the issue. How about the following workflow:
>
>  - Create a new file with a ‘Name: contents’ field syntax in the Debian source
>    packages, for ‘online meta-data’ that typically require internet access to
>    be useful.

Sounds reasonable.
 
>  - Write a script to use this file to feed the Ultimate Debian database, that
>    would be authoritative. This way, the meta-data does not need to be added to
>    the Packages and Source files of the Debian mirrors.

While it makes sense to create an UDD table featuring

   packagename, version, release, metadatatag, metadatavalue

or something like this,  I'm wondering how to reliably fetch this data.
For the moment this seems to end up in browsing all Vcs-* locations for
such a file which sounds not really reliable to me considering the different
ways a repository layout might be built.  While it sounds doable it looks
like "not the kind of jobs I really want to do" for me personally ...
 
>  - Keep the meta-data file up to date in our version control systems, but do
>    not trigger an upload only for this. When we get tired of calling the UDD
>    update by hand, we can perhaps write commit hooks for this.

Ahhh, this brings up the idea of pushing data to UDD or some intermediate
file which might be read later.  Hmmm, currently UDD gatherers are written
to gather information from a certain place (like fetching Packages.gz files)
and then read this file into UDD.  But at DebConf we thought about alternative
methods to handle Package Entropy Tracker (PET) which also is more like
pushing the data in than fetching a large chunk of data at once.
 
> With this workflow, we get the best of all systems we were thinking about:
>
>  - The blends task files get a central point where to find the data.
>  - The packages maintainers have an easy way to update the meta-data.
>  - Offline users who have access to the source packages have a copy of
>    the meta-data that was up to date at the time of the last upload.

Yes to all 3 items.
 
> The flaw is that it may be difficult in some cases to push meta-data for
> packages that we are not maintaining ourselves. But my feeling is that the
> packages for which registration is really an issue are in our hands.

Well, we might announce this option once it is solved for our packages.
 
> If you like the idea, I will wrap up a more detailed proposal, and submit it on
> -blends and -science and then on -devel.

Sounds promising

   Andreas.
 
--
http://fam-tille.de
Klarmachen zum Ändern!


Re: more formally indicating the registration URL

by Steffen Moeller-3

Andreas Tille wrote:
> On Wed, Aug 05, 2009 at 11:38:02AM +0900, Charles Plessy wrote:
>> I have been thinking a bit on the issue. How about the following workflow:
>>
>>  - Create a new file with a ‘Name: contents’ field syntax in the Debian source
>>    packages, for ‘online meta-data’ that typically require internet access to
>>    be useful.
>
> Sounds reasonable.

I agree.

Could we somehow prototype what we want to achieve?

>  
>>  - Write a script to use this file to feed the Ultimate Debian database, that
>>    would be authoritative. This way, the meta-data does not need to be added to
>>    the Packages and Source files of the Debian mirrors.
>
> While it makes sense to create an UDD table featuring
>
>    packagename, version, release, metadatatag, metadatavalue
>
> or something like this,  I'm wondering how to reliably fetch this data.
> For the moment this seems to end up in browsing all Vcs-* locations for
> such a file which sounds not really reliable to me considering the different
> ways a repository layout might be built.  While it sounds doable it looks
> like "not the kind of jobs I really want to do" for me personally ...

It could be some RDF file to store the data.

>>  - Keep the meta-data file up to date in our version control systems, but do
>>    not trigger an upload only for this. When we get tired of calling the UDD
>>    update by hand, we can perhaps write commit hooks for this.
>
> Ahhh, this brings up the idea of pushing data to UDD or some intermediate
> file which might be read later.  Hmmm, currently UDD gatherers are written
> to gather information from a certain place (like fetching Packages.gz files)
> and then read this file into UDD.  But at DebConf we thought about alternative
> methods to handle Package Entropy Tracker (PET) which also is more like
> pushing the data in than fetching a large chunk of data at once.

We need to know "who the boss is". It seems like we are starting to collect data
redundantly, because of different ways to update the info. I personally like the
idea to talk back to the online repositories of the packages to get the latest info,
but, still, we need ways to deal with semantic conflicts.


>> With this workflow, we get the best of all systems we were thinking about:
>>
>>  - The blends task files get a central point where to find the data.
>>  - The packages maintainers have an easy way to update the meta-data.
>>  - Offline users who have access to the source packages have a copy of
>>    the meta-data that was up to date at the time of the last upload.
>
> Yes to all 3 items.

Fine.

>> The flaw is that it may be difficult in some cases to push meta-data for
>> packages that we are not maintaining ourselves. But my feeling is that the
>> packages for which registration is really an issue are in our hands.
>
> Well, we might announce this option once it is solved for our packages.
>  
>> If you like the idea, I will wrap up a more detailed proposal, and submit it on
>> -blends and -science and then on -devel.

Could you pair that with an incremental implementation plan? And ask for help where you
want help?

Cheers,

Steffen



Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Charles Plessy-12

On Wed, Aug 05, 2009 at 12:05:53PM +0200, Steffen Moeller wrote:

> Andreas Tille wrote:
> > On Wed, Aug 05, 2009 at 11:38:02AM +0900, Charles Plessy wrote:
> >> I have been thinking a bit on the issue. How about the following workflow:
> >>
> >>  - Create a new file with a ‘Name: contents’ field syntax in the Debian source
> >>    packages, for ‘online meta-data’ that typically require internet access to
> >>    be useful.
> >
> > Sounds reasonable.
>
> I agree.
>
> Could we somehow prototype what we want to achieve?

> Could you pair that with an incremental implementation plan? And ask for help where you
> want help?

Dear all,

It took some time, but I now have a more concrete proposal.

First of all, let's summarise the situation. We want to integrate some metadata
in our “web sentinels”, like ‘http://debian-med.alioth.debian.org/tasks/bio’.
The simplest for creating these pages is to centralise all the information in
the Ultimate Debian Database (http://udd.debian.org/). Typical metadata is
bibliographic information or registration URL. The UDD is fed with tables that
have to be deposited in a trusted location. The issue is how to prepare the
tables with data collected by multiple package maintainers.

What I propose is to have a special file in the source packages for gathering
all potentially useful information, debian/upstream-metadata.yaml. In contrast to
debian/control, this file would not contribute data to the Packages.gz files of
the Debian archive. I think that there are enough source packages managed in
version control systems that we can use them as the main source of our data.
This makes debian/upstream-metadata.yaml available independently of the Debian
archive and, more importantly, allows updating the metadata without
uploading the package, but in a way that only the maintainers can do the
update, which keeps things under control.

The missing piece of the puzzle is then an aggregator that would collect the
information from the source packages and prepare tables for the UDD. I am drafting
such a program at http://upstream-metadata.debian.net/. Currently, it does
not do much:

http://upstream-metadata.debian.net/<package>/ALL gets debian/upstream-metadata.yaml if
the package is in a Subversion server that is available to ‘debcheckout’. Luckily,
most of our packages are.

http://upstream-metadata.debian.net/<package>/<key> gives the content of the
metadata for one key.

For instance, http://upstream-metadata.debian.net/samtools/PMID gives the
PubMed identification number for the article describing SamTools, 19505943.
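
The two URL forms above could be served along these lines. This is only a
sketch under stated assumptions: the metadata is taken to be flat
"Key: value" lines (real YAML would need a proper YAML parser), the
function names are invented for the example, and the "Registration" key
with its example.org URL is a placeholder. Only the PMID value comes from
this thread.

```python
# In-memory stand-in for the files that would be fetched from the VCS.
FILES = {
    "samtools": "PMID: 19505943\nRegistration: http://example.org/register\n",
}


def parse_flat_metadata(text):
    """Parse flat 'Key: value' lines; nested YAML is out of scope here."""
    data = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            data[key.strip()] = value.strip()
    return data


def answer(path, files):
    """Resolve '/<package>/ALL' or '/<package>/<key>' against the files."""
    package, _, key = path.strip("/").partition("/")
    text = files.get(package)
    if text is None or not key:
        return None
    if key == "ALL":
        return text
    return parse_flat_metadata(text).get(key)


print(answer("/samtools/PMID", FILES))
```

Splitting on the first colon only keeps values that themselves contain a
colon, such as URLs, intact.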

This is the proof of principle for data retrieval. Then, we need to construct
the tables.  I plan to have the program store the results in a BerkeleyDB
database, and to make it output tables at constant intervals, for instance
daily. The update of the internal database would be done in two ways.

First, updates could be pushed with commit hooks when package maintainers
commit changes to debian/upstream-metadata.yaml. It could be as simple as
having an url that triggers an update, and using wget or curl to activate the
aggregator.

Second, normal read access could trigger an update if the record is getting old.
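
Both update paths can be sketched with one small cache. Everything here is
an assumption made for illustration: the class name, the one-day limit, and
the fetch callable that stands in for whatever retrieves the file from the
VCS. A plain dictionary replaces the BerkeleyDB store to keep the example
self-contained.

```python
import time

ONE_DAY = 24 * 3600  # assumed age limit in seconds


class MetadataCache:
    """Keep one record per package and refetch it when it gets old."""

    def __init__(self, fetch, max_age=ONE_DAY, clock=time.time):
        self._fetch = fetch      # callable: package name -> metadata text
        self._max_age = max_age
        self._clock = clock      # injectable so the logic can be tested
        self._records = {}       # package -> (timestamp, text)

    def refresh(self, package):
        """Forced update, for example triggered by a commit hook."""
        text = self._fetch(package)
        self._records[package] = (self._clock(), text)
        return text

    def read(self, package):
        """Normal read access; updates the record first if it is stale."""
        record = self._records.get(package)
        if record is None or self._clock() - record[0] > self._max_age:
            return self.refresh(package)
        return record[1]
```

A commit hook would then only need to request a URL that ends up calling
refresh(), while ordinary visitors go through read().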

In summary, I propose to store metadata in YAML format in the source packages,
retrieve and store it in a central place using a web agent through the VCS in
which the source packages are stored, and periodically output tables for the
UDD, which keeps a central role for the generation of our web sentinel pages.

The proof of principle presented above is only a few lines of code, but I would
prefer to discuss the idea further before putting more time into it.

Lastly, I have accumulated a dozen debian/upstream-metadata.yaml files in
the packages I maintain, so that meaningful tests are doable for table
generation later. I do not remember the list by heart, but it contains seaview,
bwa, clustalw, clustalx, perlprimer, samtools, and most of the packages I have
updated recently.

Since I am quite inexperienced in programming, help is of course most welcome.

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Andreas Tille-5

[debian-qa in CC because here we are discussing UDD issues.]

On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> First of all, let's summarise the situation. We want to integrate some metadata
> in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.

I would like to add that most probably there might evolve even other use
cases for this kind of data.  Keeping this in mind we might consider
moving the topic to debian-devel in the next stage of development.

> What I propose is to have a special file in the source packages for gathering
> all potentially useful information, debian/upstream-metadata.yaml.

I have noticed this and I really like this effort very much (even if I
did not actively support it by adding such a file for packages I
touched recently).

> In contrast to
> debian/control, this file would not contribute data to the Packages.gz files of
> the Debian archive. I think that there are enough source packages managed in
> version control systems that we can use them as the main source of our data.

I'm not really happy about this "we ignore packages which are not
maintained in VCS" attitude but it sounds reasonable to assume that in
practice all those packages that potentially contain such kind of
information are actually maintained in a VCS.  An alternative way to
gather the information popped up in my mind:  There is some code that
checks the translation status of upstream sources by unpacking all
source packages and checking for <lang>.po files.  So there is actually
some code which handles complete unpacking of Debian source packages
which might be used to fetch debian/upstream-metadata.yaml as well.
The pro is to get all packages - the con is that it only seeks in
already uploaded packages.

> This makes debian/upstream-metadata.yaml available independently of the Debian
> archive and, more importantly, allows updating the metadata without
> uploading the package, but in a way that only the maintainers can do the
> update, which keeps things under control.

This has a certain advantage of flexibility over the method I suggested
above.  I'm not sure what way I would prefer.  Implementation wise
probably the VCS method is way easier to implement - so we probably
should stick to your decision - but I wanted to mention an alternative
way which IMHO might have slightly more chances to get accepted on
debian-devel for general purposes because people there might be
interested in completeness.
 

> The missing piece of the puzzle is then an aggregator that would collect the
> information from the source packages and prepare tables for the UDD. I am drafting
> such a program at http://upstream-metadata.debian.net/. Currently, it does
> not do much:
>
> http://upstream-metadata.debian.net/<package>/ALL gets debian/upstream-metadata.yaml if
> the package is in a Subversion server that is available to ‘debcheckout’. Luckily,
> most of our packages are.
>
> http://upstream-metadata.debian.net/<package>/<key> gives the content of the
> metadata for one key.

This sounds really good.

> For instance, http://upstream-metadata.debian.net/samtools/PMID gives the
> PubMed identification number for the article describing SamTools, 19505943.
>
> This is the proof of principle for data retrieval. Then, we need to construct
> the tables.  I plan to have the program store the results in a BerkeleyDB
> database, and to make it output tables at constant intervals, for instance
> daily. The update of the internal database would be done in two ways.

If you plan to propagate this data to UDD this might not be an optimal
solution.  UDD imports are usually a two step process:

  1. Fetch text data from whatever source in clear text.
  2. Delete table, read text data and put it into the table.

If we want to follow this scheme for our specific case IMHO it would be the
best idea to just drop a <package>.yaml file in a directory where rsync or
wget can fetch these files.  The second step, reading the YAML files, is quite
simple.
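
A rough sketch of that two-step import, with several stand-ins that are
assumptions rather than UDD code: SQLite replaces the real database, the
files are read as flat "key: value" lines instead of full YAML, and the
table and function names are invented for the example.

```python
import sqlite3
from pathlib import Path


def read_flat_file(path):
    """Step 1: turn one <package>.yaml file into (package, key, value) rows."""
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip():
            rows.append((path.stem, key.strip(), value.strip()))
    return rows


def reload_table(conn, directory):
    """Step 2: empty the table and fill it again from all fetched files."""
    rows = []
    for path in sorted(Path(directory).glob("*.yaml")):
        rows.extend(read_flat_file(path))
    with conn:  # delete and insert are committed together
        conn.execute(
            "CREATE TABLE IF NOT EXISTS upstream_metadata "
            "(package TEXT, key TEXT, value TEXT)"
        )
        conn.execute("DELETE FROM upstream_metadata")
        conn.executemany(
            "INSERT INTO upstream_metadata VALUES (?, ?, ?)", rows
        )
    return len(rows)
```

Because the table is emptied on every run, calling reload_table() twice on
the same directory leaves the same rows, which fits a daily cron job.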
 
> First, updates could be pushed with commit hooks when package maintainers
> commit changes to debian/upstream-metadata.yaml. It could be as simple as
> having an url that triggers an update, and using wget or curl to activate the
> aggregator.
>
> Second, normal read access could trigger an update if the record is getting old.

Currently UDD updates are time based (per cron job) and not event based
(per commit of some data).  If you gather the data by any means at
upstream-metadata.debian.net this is not really relevant for UDD import
(OK, it makes sense to synchronise the cron jobs to make sure that
upstream-metadata cron job runs before UDD cron job fetches data.)  So I
would vote for the option which is safer to implement.  In this aspect I
would prefer the second method and run the job once a day.  The reason
is that if I'm not completely wrong the VCS push would require
configuring *every* VCS which *potentially* might contain
upstream-metadata.yaml files.  This is a weak approach because you do not
have control over all VCSes and chances are very high that this will not
happen on all VCSes and it sounds quite hard to propagate changes to the
commit hooks (imagine upstream-metadata.debian.net becomes
upstream-metadata.debian.org or whatever).  In this sense I would vote
for relying on the VCS fields in the packaging information and fetching
information via cron job using the Vcs specified in debian/control.
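
For illustration, finding the Vcs-* fields of a package could look like the
sketch below. It assumes the source stanza is the first paragraph of
debian/control and that the fields fit on one line. The function name and
the example.org URLs are placeholders, not real repository locations.

```python
SAMPLE_CONTROL = """\
Source: samtools
Section: science
Vcs-Svn: svn://svn.example.org/samtools/trunk/
Vcs-Browser: http://svn.example.org/samtools/

Package: samtools
Architecture: any
"""


def vcs_fields(control_text):
    """Return the Vcs-* fields found in the first stanza of a control file."""
    fields = {}
    for line in control_text.strip().splitlines():
        if not line.strip():
            break  # the first blank line ends the source stanza
        if line[0].isspace() or line.startswith("#"):
            continue  # skip continuation lines and comments
        name, sep, value = line.partition(":")
        if sep and name.lower().startswith("vcs-"):
            fields[name] = value.strip()
    return fields


print(vcs_fields(SAMPLE_CONTROL))
```

A cron job could loop over all packages, call such a helper, and hand the
repository URL to whatever checkout tool matches the field name.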
 
> In summary, I propose to store metadata in YAML format in the source packages,
> retrieve and store it in a central place using a web agent through the VCS in
> which the source packages are stored, and periodically output tables for the
> UDD, which keeps a central role for the generation of our web sentinel pages.

I like this approach.  But there is one thing I'm not really sure about:
How should we design the UDD table?  There are two options:

CREATE TABLE upstream_metadata (
    package text,
    key1    text,
    key2    text,
    ...
    keyN    text,
    PRIMARY KEY (package)
);

with a defined set of keys allowed in upstream-metadata.yaml and exactly
one row per package.  Every unknown key will be ignored.  The
advantage of this approach is that tools *know* what keys to expect and
can just rely on how to handle these.

Alternatively we could do

CREATE TABLE upstream_metadata (
    package text,
    key     text,
    value   text,
    PRIMARY KEY (package,key)
);

with an arbitrary number of rows per package but no duplicated keys for
one package.  This is more flexible: in case you need some new kind of
data you do not need to touch the UDD table structure, but it restricts
each key to only one value per package.

The third option is to leave out the PRIMARY KEY constraint altogether, which
allows maximum flexibility (for instance there might be more than one
citation records).
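
The difference between the second and the third option can be tried out
quickly. The sketch below uses SQLite instead of the database behind UDD,
invented table names, and a second PMID value that is only a placeholder:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Second option: at most one value per (package, key) pair.
conn.execute(
    "CREATE TABLE with_pk (package TEXT, key TEXT, value TEXT, "
    "PRIMARY KEY (package, key))"
)
# Third option: no constraint, repeated keys are allowed.
conn.execute("CREATE TABLE without_pk (package TEXT, key TEXT, value TEXT)")


def try_insert(table, rows):
    """Insert rows one by one and count how many the table accepts."""
    accepted = 0
    for row in rows:
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # rejected by the PRIMARY KEY constraint
    return accepted


rows = [
    ("samtools", "PMID", "19505943"),
    ("samtools", "PMID", "00000000"),  # placeholder for a second citation
]
accepted_with_pk = try_insert("with_pk", rows)
accepted_without_pk = try_insert("without_pk", rows)
print(accepted_with_pk, accepted_without_pk)
```

With the constraint the second citation is refused, so several citations
per package would need either the unconstrained table or distinct key names.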

BTW, I'm a bit concerned about mixing different database formats: On one
hand you are using yaml on the other hand BibTeX.  Well, for sure having
a BibTeX record is very valuable.  But on the other hand the tools which
work with this data will need a BibTeX parser.  I have not dived
into this and for sure it is doable - but I just wanted to raise this
topic here to hear opinions.

> The proof of principle presented above is only a few lines of code, but I would
> prefer to discuss the idea further before putting more time into it.

Thanks for pushing this forward!
 
> Lastly, I have accumulated a dozen debian/upstream-metadata.yaml files in
> the packages I maintain, so that meaningful tests are doable for table
> generation later. I do not remember the list by heart, but it contains seaview,
> bwa, clustalw, clustalx, perlprimer, samtools, and most of the packages I have
> updated recently.
>
> Since I am quite inexperienced in programming, help is of course most welcome.

As I said above: IMHO most of the work is done if you can provide a set
of <package>.yaml files at a freely accessible place.

Kind regards

       Andreas.

--
http://fam-tille.de


Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Olivier Berger

Hi.

(Having read what was forwarded to -qa only)

May I suggest providing a few more details about this initiative in a wiki
page on wiki.debian.org, so that the context is clearer for everybody
potentially interested?

I think there's probably a lot of interest beyond UDD for such metadata
standardization.

My 2 cents,

Le jeudi 22 octobre 2009 à 09:49 +0200, Andreas Tille a écrit :

> [debian-qa in CC because here we are discussing UDD issues.]
>
> On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > First of all, let's summarise the situation. We want to integrate some metadata
> > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.
>
> I would like to add that most probably there might evolve even other use
> cases for this kind of data.  Keeping this in mind we might consider
> moving the topic to debian-devel in the next stage of development.
>
> > What I propose is to have a special file in the source packages for gathering
> > all possible useful informations, debian/upstream-metadata.yaml.
>

--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)




Using RDF and ontologies for such metadata (combined DOAP and other ontologies) Was: Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Olivier Berger

Hi.

(responding with more feedback as I have taken time to dig through the
debian-med archives)

If I get it right, you intend to match bibliographic references and
software projects / packages ?

I'd very much suggest adopting a Semantic Web perspective, so as to
provide such links as RDF descriptions that use ontologies already used
by other applications, hence contributing to LinkedData [0]
(maybe through microformats embedded as RDFa in the current 'web
sentinels' or as specific RDF feeds).

For an example of such application, see :
http://www.connotea.org/rss/search?q=SAMtools
which mixes RSS 1.0 with other ontologies (and exactly the same example
you provided more or less).

Here, you may then link DOAP [1] with existing bibliographic ontologies
like PRISM.

This of course could be provided from UDD also if UDD was to participate
more to the Semantic Web as I proposed in
http://lists.debian.org/debian-qa/2009/02/msg00016.html and further
discussions.

Just my 2 cents,

[0] : http://linkeddata.org/
[1] : http://trac.usefulinc.com/doap

Btw, :

Le jeudi 22 octobre 2009 à 09:49 +0200, Andreas Tille a écrit :

> Alternatively we could do
>
> CREATE TABLE upstream-metadata (
>     package text,
>     key     text,
>     value   text,
>     PRIMARY KEY (package,key)
> );

This very much looks like triples of RDF, which could store any metadata
expressed in any RDF ontology, so that might be really useful ;)
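
As a rough illustration of that correspondence, here is a minimal Python
sketch that turns one (package, key, value) row into an N-Triples line. The
two base URIs are hypothetical placeholders, not an existing vocabulary, the
DOI is a dummy value, and package and key names are assumed to need no URI
escaping.

```python
# Hypothetical base URIs, chosen only for illustration.
PACKAGE_BASE = "http://example.org/package/"
KEY_BASE = "http://example.org/upstream-metadata#"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriple(package, key, value):
    """Map a (package, key, value) row to subject, predicate and object."""
    subject = f"<{PACKAGE_BASE}{package}>"
    predicate = f"<{KEY_BASE}{key}>"
    obj = f'"{escape_literal(value)}"'
    return f"{subject} {predicate} {obj} ."


# Placeholder row; the DOI is not a real reference.
example_triple = row_to_ntriple("samtools", "DOI", "10.1000/example")
```

A real export would rather map each key to a term of an existing ontology
(DOAP, PRISM, ...) instead of a made-up namespace.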

Best regards,
--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Charles Plessy-12

> On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > First of all, let's summarise the situation. We want to integrate some metadata
> > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.

Dear Andreas and Olivier,

thank you for your encouraging comments. I have made one more step forward, and
upstream-metadata.debian.net now stores its information in a Berkeley database,
refreshing only the data when it is older than a given age when it is accessed.
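
The refresh-on-access behaviour can be sketched as follows. This is only an
illustration in Python, not the code running on the site: a plain dictionary
stands in for the Berkeley database, and the names RefreshingStore, fetch,
max_age and clock are hypothetical.

```python
import time


class RefreshingStore:
    """Cache that re-fetches an entry only if it is older than max_age seconds."""

    def __init__(self, fetch, max_age, clock=time.time):
        self._fetch = fetch      # callable: package name -> metadata
        self._max_age = max_age  # maximum age in seconds before a refresh
        self._clock = clock      # injectable clock, handy for testing
        self._entries = {}       # package -> (timestamp, metadata)

    def get(self, package):
        """Return the metadata, refreshing it first if missing or too old."""
        now = self._clock()
        entry = self._entries.get(package)
        if entry is None or now - entry[0] > self._max_age:
            entry = (now, self._fetch(package))
            self._entries[package] = entry
        return entry[1]
```

Nothing is fetched until a package is first requested, which is why the
database has to be 'loaded' by accessing it after a reset.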

For the moment, we only have 17 source packages that have an
upstream-metadata.yaml file in their debian directory that is accessible
through a public VCS. Nevertheless, I think that it is enough for a proof of
principle.

After resetting the database, I ‘loaded’ the data by accessing it:

for package in bioperl clustalx mummer seaview perlprimer samtools dicomscope clustalw r-cran-combinat r-cran-haplo.stats r-cran-qvalue r-cran-randomforest r-cran-rocr r-other-bio3d mira bwa infernal ;
do wget http://upstream-metadata.debian.net/$package/DOI -O /dev/stdout 2> /dev/null;
done

After loading, the resulting table is available here:
http://upstream-metadata.debian.net/table/DOI

Obviously, not all packages contain programs that have been described in an
academic article (http://dx.doi.org/)…

For the moment, one has to access an arbitrary key, but later the best would be
to have a special key, for instance YAML-UPDATE, that would force the update.
If it is possible to have a per-file commit hook, then each time an
upstream-metadata.yaml is modified, the debian.net site can be updated.

Next step is to feed the UDD. For the moment, the site produces one table per
keyword. The rationale is that for many keywords, the data will be too sparse
to be interesting for the UDD. My current idea is to generate the tables for a
limited set of curated keywords, assemble them (with the unix join command?),
and leave this in a public place that the UDD can read.
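
The assembly step could look roughly like the sketch below, written in Python
rather than with the join command. The function name and the sample values
are hypothetical. Note that it keeps packages that lack some keyword (an
outer join, with empty cells), whereas join by default only prints lines that
have a match in both files.

```python
def join_keyword_tables(tables, keywords):
    """Combine per-keyword {package: value} tables into one row per package.

    Packages missing a keyword get an empty string in that column.
    """
    packages = sorted(set().union(*(tables.get(k, {}) for k in keywords)))
    return [
        [package] + [tables.get(k, {}).get(package, "") for k in keywords]
        for package in packages
    ]


# Placeholder data: one table per keyword, as the site produces them.
tables = {
    "DOI": {"samtools": "doi-a", "bwa": "doi-b"},
    "Homepage": {"samtools": "url-a"},
}
rows = join_keyword_tables(tables, ["DOI", "Homepage"])
```

Each row starts with the package name, followed by one column per curated
keyword in the order given.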

In parallel, as Olivier suggested, each table could be exported in RDF format.
But I am not sure I understand it. Olivier, could you suggest a Perl module to
use?

As long as we are in a draft phase, I think that we can live with the currently
biggest limitation: the lack of support for packages that are not stored in a
VCS. One possible way to solve the problem is to provide a repository, for
instance in collab-maint on Alioth, where people can drop one yaml file per
source package. We could also unpack source files, as Andreas suggested.

For the UDD import, what would be the most suitable among the two propositions
of Andreas?

> CREATE TABLE upstream-metadata (
>     package text,
>     key1    text,
>     key2    text,
>     ...
>     keyN    text,
>     PRIMARY KEY package
> );
 
> CREATE TABLE upstream-metadata (
>     package text,
>     key     text,
>     value   text,
>     PRIMARY KEY (package,key)
> );

Since the addition of more meta-data to our source packages is a frequent issue
raised on debian-devel, I think that there is a general interest in
standardising ‘field’ names, whichever technical solution is
adopted. I will try to find a proper place on wiki.debian.org to let people document
the fields they create, and if necessary discuss them.

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Charles Plessy-12

Le Thu, Oct 22, 2009 at 09:49:10AM +0200, Andreas Tille a écrit :
>
> BTW, I'm a bit concerned about mixing different database formats: On one
> hand you are using yaml on the other hand BibTeX.  Well, for sure having
> a BibTeX record is very valuable.  But on the other hand the tools that
> are working with this data will need a BibTeX parser.  I have not dived
> into this and for sure it is doable - but I just wanted to raise this
> topic here to hear opinions.

Hi Andreas and all,

since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
transfer this part of the discussion on debian-science@..., where I reopened
an old thread.

http://lists.debian.org/msgid-search/20091026145532.GA6594@...

have a nice day,

--
Charles




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Andreas Tille-5

On Mon, Oct 26, 2009 at 11:05:10PM +0900, Charles Plessy wrote:
> For the moment, one has to access an arbitrary key, but later the best would be
> to have a special key, for instance YAML-UPDATE, that would force the update.

Or rather "upstream-metadata update".  You certainly would not like to update
the YAML standard. ;-)

> If it is possible to have a per-file commit hook, then each time an
> upstream-metadata.yaml is modified, the debian.net site can be updated.

As I said: I'm afraid it is hard to ensure that *every* potential VCS has
a properly configured commit hook.  I'm no VCS expert but it sounds hard
to maintain.

> Next step is to feed the UDD. For the moment, the site produces one table per
> keyword. The rationale is that for many keywords, the data will be too sparse
> to be interesting for the UDD. My current idea is to generate the tables for a
> limited set of curated keywords, assemble them (with the unix join command?),
> and leave this in a public place that the UDD can read.

As I said in my previous mail it is perfectly OK if there is a way to fetch
the original upstream-metadata.yaml files in some reasonable way.  Reading
these is probably much easier than any aggregated format.
 
> For the UDD import, what would be the most suitable among the two propositions
> of Andreas?

Well, I have no idea - it was a question and I gave the pros and cons for both
variants in my mail.
 

> > CREATE TABLE upstream-metadata (
> >     package text,
> >     key1    text,
> >     key2    text,
> >     ...
> >     keyN    text,
> >     PRIMARY KEY package
> > );
>  
> > CREATE TABLE upstream-metadata (
> >     package text,
> >     key     text,
> >     value   text,
> >     PRIMARY KEY (package,key)
> > );
>
> Since the addition of more meta-data to our source packages is a frequent issue
> raised on debian-devel, I think that there is a general interest in
> standardising ‘field’ names, whichever technical solution is
> adopted.

So if we have a really standardised set of keywords, the first method
probably sounds appropriate for the problem.

> I will try to find a proper place on wiki.debian.org to let people document
> the fields they create, and if necessary discuss them.

Sounds good

     Andreas.

--
http://fam-tille.de




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Andreas Tille-5

On Tue, Oct 27, 2009 at 12:00:48AM +0900, Charles Plessy wrote:
> since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
> transfer this part of the discussion on debian-science@..., where I reopened
> an old thread.

Well, the question is not really about BibTeX or not.  The question is
whether it is a good idea to have a database format as a field value.
If you have the field "Publication" and a complete BibTeX record as
value I somehow wonder whether this is useful in the end or whether we
rather should translate the record in a SQL table.

Kind regards

        Andreas.

--
http://fam-tille.de




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Charles Plessy-12

Le Mon, Oct 26, 2009 at 04:05:17PM +0100, Andreas Tille a écrit :

> On Tue, Oct 27, 2009 at 12:00:48AM +0900, Charles Plessy wrote:
> > since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
> > transfer this part of the discussion on debian-science@..., where I reopened
> > an old thread.
>
> Well, the question is not really about BibTeX or not.  The question is
> whether it is a good idea to have a database format as a field value.
> If you have the field "Publication" and a complete BibTeX record as
> value I somehow wonder whether this is useful in the end or whether we
> rather should translate the record in a SQL table.

That is a good question, that I would rephrase: what should be stored, and
should everything be exported?

For the moment the BibTeX stored reference is a rather experimental feature,
and its purpose is also to test the YAML format. As you probably noticed, the
key parts of the BibTeX reference that allow to construct a weblink to the
published article—the digital object identifier (DOI) and the PubMed record
ID—have their own YAML mapping: I do not expect the BibTeX reference to be
extracted and parsed, nor to be exported to SQL. On the other hand, it can be
easily popped out at build time with a Perl oneliner
(‘http://lists.debian.org/msgid-search/20090808073608.GF17276@...’).
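
To show why the separate mappings are enough for building links, here is a
small Python sketch. The key name 'PMID' and the PubMed URL pattern are
assumptions made for the example (only the dx.doi.org resolver is taken from
this thread), and the identifiers below are dummies.

```python
from urllib.parse import quote


def publication_links(metadata):
    """Build web links from the DOI and PubMed mappings, if present."""
    links = {}
    doi = metadata.get("DOI")
    if doi:
        # Percent-encode unusual characters but keep the slash of the DOI.
        links["DOI"] = "http://dx.doi.org/" + quote(doi, safe="/")
    pmid = metadata.get("PMID")  # hypothetical key name
    if pmid:
        # Assumed URL pattern for PubMed records.
        links["PMID"] = "https://pubmed.ncbi.nlm.nih.gov/" + str(pmid) + "/"
    return links


# Dummy identifiers, not real references.
links = publication_links({"DOI": "10.1000/example", "PMID": 12345})
```

No BibTeX parsing is involved: the two identifiers are read directly from
their own mappings.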

[For further discussion about how to make nice links on the Blends web
sentinels, I propose to elaborate on another list.]

There is another volatile meta-data with a much broader scope that could be
included in the upstream-metadata.yaml file (or whichever smarter name we give
to it), the Debian watch file. All the objections you made above apply.

We could either store it raw in a YAML mapping, like:

Watch: |
 version=3
 opts=dversionmangle=s/~dfsg// \
   http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2

Or split the information in multiple mappings:

Watch-Version : 3
Watch-Options : dversionmangle=s/~dfsg//
Watch-Regexp  : http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2

While the last option looks more structured, we should really think twice if it
makes sense to have the ‘Watch’ metadata in a tabular SQL database, or if
simply storing it raw somewhere else is enough. The same conclusion may apply
to similar resources like the BibTeX reference.
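
If the raw form is stored, the split form can still be derived when needed.
The Python sketch below handles only the simple shape of the samtools example
above (a version line, then one entry with unquoted opts and a URL pattern);
it is not a general watch file parser, and the function name is hypothetical.

```python
def split_watch(raw):
    """Split a raw watch entry into version, options and URL pattern."""
    # A trailing backslash continues the entry on the next line.
    text = raw.replace("\\\n", " ")
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("version="):
            fields["Watch-Version"] = line[len("version="):]
            continue
        parts = line.split()
        if parts[0].startswith("opts="):
            fields["Watch-Options"] = parts[0][len("opts="):]
            parts = parts[1:]
        if parts:
            fields["Watch-Regexp"] = parts[0]
    return fields


# The raw value of the 'Watch' mapping from the example above.
raw = "\n".join([
    "version=3",
    "opts=dversionmangle=s/~dfsg// \\",
    r"  http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2",
])
fields = split_watch(raw)
```

Going the other way, from split fields back to a valid watch file, is just as
easy, so the choice mostly depends on whether anybody wants to query the
individual parts.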

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan




Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

by Olivier Berger

Hi.

(Responding a little late after vacation time.)

Le lundi 26 octobre 2009 à 23:05 +0900, Charles Plessy a écrit :
> > On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > > First of all, let's summarise the situation. We want to integrate some metadata
> > > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.
>
> Dear Andreas and Olivier,
>
> thank you for your encouraging comments.

SNIP

> In parallel, as Olivier suggested, each table could be exported in RDF format.
> But I am not sure I understand it.

What exactly don't you understand ? ;) If you look back at the pointers
I provided in http://lists.debian.org/debian-qa/2009/10/msg00050.html
you'll find an example of using the PRISM and CONNOTEA ontologies for
links with DOI and PUBMED IDs (more details in
http://www.prismstandard.org/resources/mod_prism.html maybe).

>  Olivier, could you suggest a Perl module to
> use?
>

I suppose that searching for perl+rdf on your preferred search engine
will retrieve useful code ;)

I'm not a Perl hacker myself, but as RDF is a W3C standard, there
is probably plenty of Perl code to produce RDF.

http://search.cpan.org/~mthurn/RDF-Simple-0.415/lib/RDF/Simple/Serialiser.pm seems to be a valid candidate for first experiments.

Hope this helps.

Best regards,
--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)

