more formally indicating the registration URL

Hello,

following up on Andreas's idea of flagging those software packages that ask their users to register, I skimmed through the tango-icon-theme package and found the following icons, which I thought would fit:

/usr/share/icons/Tango/32x32/emotes/face-angel.png
(also available in a larger size at https://sharesource.org/svn/phoneme/theme/png/256x256/face-angel.png)

I found it to fit rather nicely, since we are requested to be nice and register.

/usr/share/icons/Tango/32x32/devices/stock_mic.png
(larger version at http://municipality.zlatograd.com/tango-icons/stock_mic.png)

Since we are opening a channel to talk back to upstream, I found this microphone rather fitting as well.

I would be prepared to follow Michael's suggestion to flag debian/control files with a separate URL for the registration and to parse that information for the pure-blends' package presentation scripts. Does much speak against such a pilot? Or should it rather be a comment in the description, in analogy to the introduction of the "Homepage:" indication?

Many greetings

Steffen

--
To UNSUBSCRIBE, email to debian-med-REQUEST@... with a subject of "unsubscribe". Trouble? Contact listmaster@...
Re: more formally indicating the registration URL

On Fri, Jul 31, 2009 at 05:12:24PM +0200, Steffen Moeller wrote:
>
> I would be prepared to follow Michael's suggestion to flag debian/control
> files with a separate URL for the registration and to parse that
> information for the pure-blends' package presentation scripts. Does much
> speak against such a pilot? Or should it rather be a comment in the
> description, in analogy to the introduction of the "Homepage:" indication?

Dear all,

I think that the issue of managing package metadata goes beyond homepage and registration. I propose to start a discussion on the subject on debian-devel, and to fall back on using the debian/control file if this discussion is not fruitful.

The key problem is that the metadata can evolve independently of the source code, and that updating the binary packages just to change URLs is not suitable.

Have a nice weekend,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
Re: more formally indicating the registration URL

On Saturday, 01.08.2009, at 23:29 +0900, Charles Plessy wrote:
[..]
> The key problem is that the metadata can evolve independently of the source
> code, and that updating the binary packages just to change URLs is not
> suitable.

Well, if upstream chooses to change a registration URL, they should be smart enough to create a permanent redirection or other workarounds, so the above is definitely not our problem.

Regards,

Daniel
Re: more formally indicating the registration URL

On Fri, Jul 31, 2009 at 05:12:24PM +0200, Steffen Moeller wrote:
> /usr/share/icons/Tango/32x32/emotes/face-angel.png
> (also available in a larger size at
> https://sharesource.org/svn/phoneme/theme/png/256x256/face-angel.png)
>
> I found it to fit rather nicely, since we are requested to be nice
> and register.
>
> /usr/share/icons/Tango/32x32/devices/stock_mic.png
> (larger version at http://municipality.zlatograd.com/tango-icons/stock_mic.png)
>
> Since we are opening a channel to talk back to upstream, I found this
> microphone rather fitting as well.

Well, I admit that I do not really understand where you want to put these icons. I think we wanted to ask for textual information ...

> I would be prepared to follow Michael's suggestion to flag debian/control
> files with a separate URL for the registration and to parse that
> information for the pure-blends' package presentation scripts.

... and I replied that these X?-fields will *not* be propagated to a place where we can parse them for the Blends pages. So sorry, it will not work this way.

> Does much speak against such a pilot? Or should it rather be a comment in
> the description, in analogy to the introduction of the "Homepage:"
> indication?

My suggestion was to use the "Remark" field, which ends up in those grayish additions to a description.

Kind regards

Andreas.

--
http://fam-tille.de
Re: more formally indicating the registration URL

Hello,

Andreas Tille wrote:
> On Fri, Jul 31, 2009 at 05:12:24PM +0200, Steffen Moeller wrote:
>> /usr/share/icons/Tango/32x32/emotes/face-angel.png
>> (also available in a larger size at
>> https://sharesource.org/svn/phoneme/theme/png/256x256/face-angel.png)
>>
>> I found it to fit rather nicely, since we are requested to be nice
>> and register.
>>
>> /usr/share/icons/Tango/32x32/devices/stock_mic.png
>> (larger version at http://municipality.zlatograd.com/tango-icons/stock_mic.png)
>>
>> Since we are opening a channel to talk back to upstream, I found this
>> microphone rather fitting as well.
>
> Well, I admit that I do not really understand where you want to put
> these icons. I think we wanted to ask for textual information ...

My hunch was that we should have some symbol, shown together with the program name, to indicate that a program requests registration. That symbol should appear (in my mind) as consistently and as unobtrusively as possible next to the package names.

I would prefer not to read text whenever that is avoidable.

>> I would be prepared to follow Michael's suggestion to flag debian/control
>> files with a separate URL for the registration and to parse that
>> information for the pure-blends' package presentation scripts.
>
> ... and I replied that these X?-fields will *not* be propagated to a
> place where we can parse them for the Blends pages. So sorry, it will
> not work this way.

Fine. This supports Michael's (?) objections to extending the debian/* files for such non-technical meta-issues.

>> Does much speak against such a pilot? Or should it rather be a comment in
>> the description, in analogy to the introduction of the "Homepage:"
>> indication?
>
> My suggestion was to use the "Remark" field, which ends up in those
> grayish additions to a description.

I am not aware of the Remark field, but maybe this would be compatible with something analogous to

--- bio (Revision 1027)
+++ bio (working copy)
@@ -18,7 +18,7 @@
 Depends: arb, clustalw | clustalw-mpi, clustalx
 Why: Sequence alignments and related programs (Non-free, thus only suggested).
 
-Depends: adun.app, garlic, gdpc, ghemical, gromacs, pymol, rasmol, autodock, autogrid, r-other-bio3d
+Depends: adun.app, garlic, gdpc, ghemical, gromacs, pymol, rasmol, r-other-bio3d
 Why: Molecular modelling and molecular dynamics.
 
 Depends: plasmidomics
@@ -35,6 +35,10 @@
 Depends: glam2
 Why: Motif search
 
+Depends: autodock, autogrid
+Pkg-Registration: http://autodock.scripps.edu/downloads/autodock-registration
+Why: Molecular modelling and molecular dynamics.
+
 Suggests: pdb2pqr

?

And similarly there could be a "Pkg-Reference", for instance, or a "Pkg-Publication". I don't know how close such information needs to be to the package. Both rather belong in debian/copyright than in debian/control, but either could be parsed to present the information to portals like that of Debian-Med.

Cheers,

Steffen
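Steffen's diff treats each blank-line-separated stanza of a tasks file as a small record of `Field: value` lines. As a rough illustration of how a page generator could pick up the proposed `Pkg-Registration` field, here is a minimal Python sketch. It assumes only the stanza layout visible in the diff (plus indented continuation lines), and the function names are hypothetical, not part of the Blends tools.

```python
def parse_stanzas(text):
    """Split a tasks file into a list of {field: value} dicts.

    Stanzas are separated by blank lines; a line starting with whitespace
    continues the value of the previous field.
    """
    stanzas, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
            current, last = {}, None
        elif line[0].isspace():
            # Continuation line: extend the previous field, if any.
            if last is not None:
                current[last] += " " + line.strip()
        elif ":" in line:
            # Split on the first colon only, so URLs in values stay intact.
            field, value = line.split(":", 1)
            last = field.strip()
            current[last] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def registration_urls(text):
    """Map every package named in a stanza's Depends line to its Pkg-Registration URL."""
    urls = {}
    for stanza in parse_stanzas(text):
        url = stanza.get("Pkg-Registration")
        if url is None:
            continue
        # "a, b | c" names a, b and c; alternatives are treated like plain entries.
        for entry in stanza.get("Depends", "").replace("|", ",").split(","):
            if entry.strip():
                urls[entry.strip()] = url
    return urls


TASK = """\
Depends: glam2
Why: Motif search

Depends: autodock, autogrid
Pkg-Registration: http://autodock.scripps.edu/downloads/autodock-registration
Why: Molecular modelling and molecular dynamics.
"""

print(registration_urls(TASK))
```

Treating alternatives such as `clustalw | clustalw-mpi` as separate entries is a simplification made for this sketch; the real tasks-page scripts may handle them differently.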
Re: more formally indicating the registration URL

On Tue, Aug 04, 2009 at 03:32:55PM +0200, Steffen Moeller wrote:
> My hunch was that we should have some symbol, shown together with the
> program name, to indicate that a program requests registration. That
> symbol should appear (in my mind) as consistently and as unobtrusively as
> possible next to the package names.
>
> I would prefer not to read text whenever that is avoidable.

Well, patches for the template file[1] are perfectly welcome.

> I am not aware of the Remark field, but maybe this would be compatible
> with something analogous to

Check out [2], try

  grep "^Remark:" *

and then see what ends up on the resulting tasks page[3]. Unfortunately it is not yet documented in the docs ...

> And similarly there could be a "Pkg-Reference", for instance, or a
> "Pkg-Publication".

In principle doable, but I do not volunteer to maintain this myself. You will notice that editing these tasks files is really easy. Once at least three such fields are set, I'll be happy to publish this information on the tasks pages.

> I don't know how close such information needs to be to the package. Both
> rather belong in debian/copyright than in debian/control, but either could
> be parsed to present the information to portals like that of Debian-Med.

[Side note: Please use "Debian Med" (without the dash).]

Yes, using Pkg-Something would do the trick for the moment, until we might have better means. I would rather use "Reference" or "Publication"; the prefix "Pkg-" was only used because "Description" is already taken. Just adding those fields will not harm at all (unknown fields are ignored, so you can't really break anything). Let's start investigating what might make sense here.

Kind regards

Andreas.

[1] svn://svn.debian.org/svn/blends/blends/trunk/webtools/templates/tasks.xhtml
[2] svn://svn.debian.org/svn/blends/projects/med/trunk/debian-med/tasks
[3] http://debian-med.alioth.debian.org/tasks/

--
http://fam-tille.de
Get ready for change!
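Andreas mentions two properties of the tasks pages: unknown fields are ignored, and a new field becomes worth publishing once it is actually used. Both can be illustrated with a small Python sketch. The whitelist `KNOWN_FIELDS` is hypothetical, and reading "at least three such fields set" as "used in at least three stanzas" is an assumption; this is not the code behind the tasks pages.

```python
from collections import Counter

# Hypothetical whitelist of fields the page template already displays.
KNOWN_FIELDS = ("Depends", "Why", "Remark")


def render(stanza, extra_fields=()):
    """Return display lines for known (plus explicitly enabled) fields.

    Any other field in the stanza is skipped, so adding new fields to a
    tasks file cannot break the generated page.
    """
    shown = KNOWN_FIELDS + tuple(extra_fields)
    return [f"{name}: {stanza[name]}" for name in shown if name in stanza]


def fields_to_publish(stanzas, minimum=3):
    """List the unknown fields that appear in at least `minimum` stanzas."""
    counts = Counter(
        name for stanza in stanzas for name in stanza if name not in KNOWN_FIELDS
    )
    return sorted(name for name, n in counts.items() if n >= minimum)


stanzas = [
    {"Depends": "pkg-a", "Publication": "ref A"},
    {"Depends": "pkg-b", "Publication": "ref B"},
    {"Depends": "pkg-c", "Publication": "ref C",
     "Pkg-Registration": "http://example.org/register"},
]
print(render(stanzas[2]))                              # unknown fields are skipped
print(render(stanzas[2], fields_to_publish(stanzas)))  # "Publication" is now shown
```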
Re: more formally indicating the registration URL

On Mon, Aug 03, 2009 at 08:21:30PM +0200, Andreas Tille wrote:
>
> My suggestion was to use the "Remark" field, which ends up in those
> grayish additions to a description.

Hi all,

I have been thinking a bit about the issue. How about the following workflow:

- Create a new file with a ‘Name: contents’ field syntax in the Debian source packages, for ‘online meta-data’ that typically requires internet access to be useful.

- Write a script that uses this file to feed the Ultimate Debian Database, which would be authoritative. This way, the meta-data does not need to be added to the Packages and Sources files of the Debian mirrors.

- Keep the meta-data file up to date in our version control systems, but do not trigger an upload only for this. When we get tired of updating the UDD by hand, we can perhaps write commit hooks for this.

With this workflow, we get the best of all the systems we were thinking about:

- The blends task files get a central place to find the data.
- The package maintainers have an easy way to update the meta-data.
- Offline users who have access to the source packages have a copy of the meta-data that was up to date at the time of the last upload.

The flaw is that it may be difficult in some cases to push meta-data for packages that we are not maintaining ourselves. But my feeling is that the packages for which registration is really an issue are in our hands.

If you like the idea, I will write up a more detailed proposal, and submit it on -blends and -science, and then on -devel.

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
Re: more formally indicating the registration URL

On Wed, Aug 05, 2009 at 11:38:02AM +0900, Charles Plessy wrote:
> I have been thinking a bit about the issue. How about the following workflow:
>
> - Create a new file with a ‘Name: contents’ field syntax in the Debian
>   source packages, for ‘online meta-data’ that typically requires internet
>   access to be useful.

Sounds reasonable.

> - Write a script that uses this file to feed the Ultimate Debian Database,
>   which would be authoritative. This way, the meta-data does not need to be
>   added to the Packages and Sources files of the Debian mirrors.

While it makes sense to create a UDD table featuring

   packagename, version, release, metadatatag, metadatavalue

or something like this, I'm wondering how to reliably fetch this data. For the moment this seems to end up in browsing all Vcs-* locations for such a file, which does not sound really reliable to me, considering the different ways a repository layout might be built. While it sounds doable, it looks like "not the kind of job I really want to do" for me personally ...

> - Keep the meta-data file up to date in our version control systems, but
>   do not trigger an upload only for this. When we get tired of updating
>   the UDD by hand, we can perhaps write commit hooks for this.

Ahhh, this brings up the idea of pushing data to UDD, or to some intermediate file which might be read later. Hmm, currently UDD gatherers are written to gather information from a certain place (like fetching Packages.gz files) and then read this file into UDD. But at DebConf we thought about alternative methods to handle the Package Entropy Tracker (PET), which is also more like pushing the data in than fetching a large chunk of data at once.

> With this workflow, we get the best of all the systems we were thinking about:
>
> - The blends task files get a central place to find the data.
> - The package maintainers have an easy way to update the meta-data.
> - Offline users who have access to the source packages have a copy of
>   the meta-data that was up to date at the time of the last upload.

Yes to all 3 items.

> The flaw is that it may be difficult in some cases to push meta-data for
> packages that we are not maintaining ourselves. But my feeling is that the
> packages for which registration is really an issue are in our hands.

Well, we might announce this option once it is solved for our packages.

> If you like the idea, I will write up a more detailed proposal, and submit
> it on -blends and -science, and then on -devel.

Sounds promising

Andreas.

--
http://fam-tille.de
Get ready for change!
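Andreas' row layout (packagename, version, release, metadatatag, metadatavalue) can be produced from Charles' ‘Name: contents’ file in a few lines. This Python sketch assumes one `Name: value` pair per line, `#` comments, and no repeated names; the helper names are hypothetical, and the version string in the usage is a placeholder, not real archive data.

```python
from typing import NamedTuple


class MetadataRow(NamedTuple):
    """One row of the proposed UDD table."""
    package: str
    version: str
    release: str
    tag: str
    value: str


def parse_metadata(text):
    """Parse 'Name: contents' lines into a dict, rejecting malformed or repeated names."""
    fields = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"not a 'Name: contents' line: {line!r}")
        name = name.strip()
        if name in fields:
            raise ValueError(f"field {name!r} given twice")
        fields[name] = value.strip()
    return fields


def to_rows(package, version, release, text):
    """Flatten one package's meta-data file into UDD rows."""
    return [
        MetadataRow(package, version, release, tag, value)
        for tag, value in parse_metadata(text).items()
    ]


text = "# comment lines are skipped\nPMID: 19505943\n"
for row in to_rows("samtools", "X.Y-Z", "unstable", text):  # X.Y-Z: placeholder version
    print(row)
```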
Re: more formally indicating the registration URL

Andreas Tille wrote:
> On Wed, Aug 05, 2009 at 11:38:02AM +0900, Charles Plessy wrote:
>> I have been thinking a bit about the issue. How about the following workflow:
>>
>> - Create a new file with a ‘Name: contents’ field syntax in the Debian
>>   source packages, for ‘online meta-data’ that typically requires internet
>>   access to be useful.
>
> Sounds reasonable.

I agree.

Could we somehow prototype what we want to achieve?

>> - Write a script that uses this file to feed the Ultimate Debian Database,
>>   which would be authoritative. This way, the meta-data does not need to be
>>   added to the Packages and Sources files of the Debian mirrors.
>
> While it makes sense to create a UDD table featuring
>
>    packagename, version, release, metadatatag, metadatavalue
>
> or something like this, I'm wondering how to reliably fetch this data.
> For the moment this seems to end up in browsing all Vcs-* locations for
> such a file, which does not sound really reliable to me, considering the
> different ways a repository layout might be built. While it sounds doable,
> it looks like "not the kind of job I really want to do" for me personally ...

It could be some RDF file to store the data.

>> - Keep the meta-data file up to date in our version control systems, but
>>   do not trigger an upload only for this. When we get tired of updating
>>   the UDD by hand, we can perhaps write commit hooks for this.
>
> Ahhh, this brings up the idea of pushing data to UDD, or to some
> intermediate file which might be read later. Hmm, currently UDD gatherers
> are written to gather information from a certain place (like fetching
> Packages.gz files) and then read this file into UDD. But at DebConf we
> thought about alternative methods to handle the Package Entropy Tracker
> (PET), which is also more like pushing the data in than fetching a large
> chunk of data at once.

We need to know "who the boss is". It seems that we are starting to collect data redundantly, because there are different ways to update the info. I personally like the idea of talking back to the online repositories of the packages to get the latest info, but we still need ways to deal with semantic conflicts.

>> With this workflow, we get the best of all the systems we were thinking about:
>>
>> - The blends task files get a central place to find the data.
>> - The package maintainers have an easy way to update the meta-data.
>> - Offline users who have access to the source packages have a copy of
>>   the meta-data that was up to date at the time of the last upload.
>
> Yes to all 3 items.

Fine.

>> The flaw is that it may be difficult in some cases to push meta-data for
>> packages that we are not maintaining ourselves. But my feeling is that the
>> packages for which registration is really an issue are in our hands.
>
> Well, we might announce this option once it is solved for our packages.
>
>> If you like the idea, I will write up a more detailed proposal, and submit
>> it on -blends and -science, and then on -devel.

Could you pair that with an incremental implementation plan? And ask for help where you want help?

Cheers,

Steffen
Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

On Wed, Aug 05, 2009 at 12:05:53PM +0200, Steffen Moeller wrote:
> Andreas Tille wrote:
> > On Wed, Aug 05, 2009 at 11:38:02AM +0900, Charles Plessy wrote:
> >> I have been thinking a bit about the issue. How about the following workflow:
> >>
> >> - Create a new file with a ‘Name: contents’ field syntax in the Debian
> >>   source packages, for ‘online meta-data’ that typically requires internet
> >>   access to be useful.
> >
> > Sounds reasonable.
>
> I agree.
>
> Could we somehow prototype what we want to achieve?

> Could you pair that with an incremental implementation plan? And ask for help where you
> want help?

Dear all,

it took some time, but I now have a more concrete proposal.

First of all, let's summarise the situation. We want to integrate some metadata in our “web sentinels”, like ‘http://debian-med.alioth.debian.org/tasks/bio’. The simplest way to create these pages is to centralise all the information in the Ultimate Debian Database (http://udd.debian.org/). Typical metadata are bibliographic information or a registration URL. The UDD is fed with tables that have to be deposited in a trusted location. The issue is how to prepare the tables with data collected by multiple package maintainers.

What I propose is to have a special file in the source packages for gathering all possibly useful information: debian/upstream-metadata.yaml. Unlike debian/control, this file would not contribute data to the Packages.gz files of the Debian archive. I think that there are enough source packages managed in version control systems that we can use them as the main source of our data. This makes debian/upstream-metadata.yaml available independently of the Debian archive and, more importantly, will allow the metadata to be updated without uploading the package, but in a way that only the maintainers can do the update, which keeps things under control.

The missing piece of the puzzle is then an aggregator that would collect the information from the source packages and prepare tables for the UDD. I am drafting such a program at http://upstream-metadata.debian.net/. Currently, it does not do much:

http://upstream-metadata.debian.net/<package>/ALL gets debian/upstream-metadata.yaml if the package is in a Subversion server that is available to ‘debcheckout’. Luckily, most of our packages are.

http://upstream-metadata.debian.net/<package>/<key> gives the content of the metadata for one key. For instance, http://upstream-metadata.debian.net/samtools/PMID gives the PubMed identification number of the article describing SamTools, 19505943.

This is the proof of principle for data retrieval. Next, we need to construct the tables. I plan to have the program store the results in a BerkeleyDB database, and to make it output tables at constant intervals, for instance daily. The internal database would be updated in two ways.

First, updates could be pushed with commit hooks when package maintainers commit changes to debian/upstream-metadata.yaml. It could be as simple as having a URL that triggers an update, and using wget or curl to activate the aggregator.

Second, normal read access could trigger an update if the record is getting old.

In summary, I propose to store metadata in YAML format in the source packages, retrieve it and store it in a central place using a web agent that goes through the VCS in which the source packages are stored, and periodically output tables for the UDD, which keeps a central role in generating our web sentinel pages.

The proof of principle presented above is only a few lines of code, but I would prefer to discuss the idea further before putting more time into it.

Lastly, I have accumulated a dozen debian/upstream-metadata.yaml files in the packages I maintain, so that meaningful tests of table generation will be possible later. I do not remember the list by heart, but it contains seaview, bwa, clustalw, clustalx, perlprimer, samtools, and most of the packages I have updated recently.

Since I am quite inexperienced in programming, help is of course most welcome.

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
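The lookup behaviour Charles describes for upstream-metadata.debian.net can be modelled in a few lines: `<package>/ALL` and `<package>/<key>` paths, a refresh on read when a record is old, and a forced refresh that a commit hook could trigger. In this Python sketch a plain dict stands in for the BerkeleyDB database and a `fetch` callable stands in for retrieving debian/upstream-metadata.yaml through debcheckout. The class and method names are hypothetical.

```python
import time


class MetadataCache:
    """Sketch of the aggregator's lookup logic (names are hypothetical).

    A dict stands in for the BerkeleyDB database, and `fetch(package)`
    stands in for retrieving debian/upstream-metadata.yaml from the VCS.
    """

    def __init__(self, fetch, max_age=24 * 3600, clock=time.time):
        self._fetch = fetch
        self._max_age = max_age  # seconds before a record counts as old
        self._clock = clock
        self._store = {}         # package -> (fetched_at, metadata dict)

    def _record(self, package, force=False):
        """Return cached metadata, refetching if missing, stale or forced."""
        now = self._clock()
        entry = self._store.get(package)
        if force or entry is None or now - entry[0] > self._max_age:
            entry = (now, self._fetch(package))
            self._store[package] = entry
        return entry[1]

    def get(self, path):
        """Answer '<package>/ALL' with everything, '<package>/<key>' with one value."""
        package, _, key = path.strip("/").partition("/")
        metadata = self._record(package)
        return metadata if key == "ALL" else metadata.get(key)

    def refresh(self, package):
        """Forced update, e.g. triggered by a commit hook calling an update URL."""
        return self._record(package, force=True)


# Usage with a fake fetcher and a manual clock instead of real network access.
now = [0.0]
cache = MetadataCache(lambda pkg: {"PMID": "19505943"}, max_age=60, clock=lambda: now[0])
print(cache.get("samtools/PMID"))
print(cache.get("/samtools/ALL"))
```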
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

[debian-qa in CC because here we are discussing UDD issues.]

On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> First of all, let's summarise the situation. We want to integrate some metadata
> in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.

I would like to add that other use cases for this kind of data will most probably evolve. Keeping this in mind, we might consider moving the topic to debian-devel in the next stage of development.

> What I propose is to have a special file in the source packages for gathering
> all possibly useful information: debian/upstream-metadata.yaml.

I have noticed this and I really like this effort very much (even if I did not actively support it by adding such a file to the packages I touched recently).

> Unlike debian/control, this file would not contribute data to the
> Packages.gz files of the Debian archive. I think that there are enough source
> packages managed in version control systems that we can use them as the main
> source of our data.

I'm not really happy about this "we ignore packages which are not maintained in a VCS" attitude, but it sounds reasonable to assume that in practice all those packages that potentially contain such information are actually maintained in a VCS.

An alternative way to gather the information popped up in my mind: there is some code that checks the translation status of upstream sources by unpacking all source packages and checking for <lang>.po files. So there is actually some code which handles the complete unpacking of Debian source packages, and it might be used to fetch debian/upstream-metadata.yaml as well. The pro is that we get all packages; the con is that it only looks at packages that have already been uploaded.

> This makes debian/upstream-metadata.yaml available independently of the
> Debian archive and, more importantly, will allow the metadata to be updated
> without uploading the package, but in a way that only the maintainers can
> do the update, which keeps things under control.

This has a certain advantage in flexibility over the method I suggested above. I'm not sure which way I would prefer. Implementation-wise, the VCS method is probably much easier, so we should probably stick to your decision. But I wanted to mention an alternative which IMHO might have slightly better chances of being accepted on debian-devel for general purposes, because people there might be interested in completeness.

> The missing piece of the puzzle is then an aggregator that would collect the
> information from the source packages and prepare tables for the UDD. I am
> drafting such a program at http://upstream-metadata.debian.net/. Currently,
> it does not do much:
>
> http://upstream-metadata.debian.net/<package>/ALL gets debian/upstream-metadata.yaml if
> the package is in a Subversion server that is available to ‘debcheckout’.
> Luckily, most of our packages are.
>
> http://upstream-metadata.debian.net/<package>/<key> gives the content of the
> metadata for one key.

This sounds really good.

> For instance, http://upstream-metadata.debian.net/samtools/PMID gives the
> PubMed identification number of the article describing SamTools, 19505943.
>
> This is the proof of principle for data retrieval. Next, we need to construct
> the tables. I plan to have the program store the results in a BerkeleyDB
> database, and to make it output tables at constant intervals, for instance
> daily. The internal database would be updated in two ways.

If you plan to propagate this data to UDD, this might not be an optimal solution. UDD imports are usually a two-step process:

  1. Fetch text data from whatever source in clear text.
  2. Delete the table, read the text data and put it into the table.

If we want to follow this scheme for our specific case, IMHO the best idea would be to just drop a <package>.yaml file in a directory from which rsync or wget can fetch these files. The second step, reading the YAML files, is quite simple.
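Step 2 of the import Andreas describes, emptying the table and refilling it from the fetched `<package>.yaml` files, might look like the following Python sketch. It uses sqlite3 purely as a stand-in for UDD's PostgreSQL database, and a key/value table of the kind Andreas discusses next (named `upstream_metadata`, since a bare hyphen is not valid in an unquoted SQL identifier). The hypothetical `read_flat_yaml` helper only understands top-level `key: value` lines; a real importer would use a proper YAML parser such as PyYAML.

```python
import sqlite3
from pathlib import Path


def read_flat_yaml(path):
    """Read top-level 'key: value' lines; nested or list content is skipped."""
    data = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        # Skip blanks, comments, indented (nested) lines and '---' or list lines.
        if not line.strip() or line.startswith(("#", " ", "\t", "-")):
            continue
        key, sep, value = line.partition(":")
        # A key with an empty value introduces nested content: skip it too.
        if sep and value.strip():
            data[key.strip()] = value.strip()
    return data


def reload_table(conn, directory):
    """UDD-style step 2: delete everything, then reload from <package>.yaml files."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS upstream_metadata (package TEXT, key TEXT, value TEXT)"
    )
    with conn:  # one transaction: other readers never see a half-filled table
        conn.execute("DELETE FROM upstream_metadata")
        for path in sorted(Path(directory).glob("*.yaml")):
            for key, value in read_flat_yaml(path).items():
                conn.execute(
                    "INSERT INTO upstream_metadata VALUES (?, ?, ?)",
                    (path.stem, key, value),
                )
```

Because the whole table is rebuilt on every run, the import stays idempotent: running it twice against the same directory leaves the same rows.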
> First, updates could be pushed with commit hooks when package maintainers
> commit changes to debian/upstream-metadata.yaml. It could be as simple as
> having a URL that triggers an update, and using wget or curl to activate the
> aggregator.
>
> Second, normal read access could trigger an update if the record is getting old.

Currently UDD updates are time-based (via cron job) and not event-based (per commit of some data). If you gather the data by any means at upstream-metadata.debian.net, this is not really relevant for the UDD import. (OK, it makes sense to synchronise the cron jobs to make sure that the upstream-metadata cron job runs before the UDD cron job fetches the data.) So I would vote for the option which is safer to implement. In this respect I would prefer the second method, and run the job once a day. The reason is that, if I'm not completely wrong, the VCS push would require configuring *every* VCS which *potentially* might contain upstream-metadata.yaml files. This is a weak approach, because you do not have control over all VCSes, chances are very high that this will not happen on all of them, and it sounds quite hard to propagate changes to the commit hooks (imagine upstream-metadata.debian.net becomes upstream-metadata.debian.org or whatever). In this sense I would vote for relying on the Vcs fields in the packaging information and fetching the information via a cron job from the VCS specified in debian/control.

> In summary, I propose to store metadata in YAML format in the source packages,
> retrieve it and store it in a central place using a web agent that goes
> through the VCS in which the source packages are stored, and periodically
> output tables for the UDD, which keeps a central role in generating our web
> sentinel pages.

I like this approach. But there is one thing I'm not really sure about: how should we design the UDD table? There are two options:

   CREATE TABLE upstream_metadata (
     package text,
     key1    text,
     key2    text,
     ...
     keyN    text,
     PRIMARY KEY (package)
   );

with a defined set of keys allowed in upstream-metadata.yaml and exactly one row per package. Every unknown key will be ignored. The advantage of this approach is that tools *know* which keys to expect and can rely on how to handle them.

Alternatively we could do

   CREATE TABLE upstream_metadata (
     package text,
     key     text,
     value   text,
     PRIMARY KEY (package, key)
   );

with an arbitrary number of rows per package but no duplicated keys for one package. This is more flexible: if you need some new kind of data, you do not need to touch the UDD table structure. But it allows each key only once per package. The third option is to leave out the PRIMARY KEY constraint altogether, which allows maximum flexibility (for instance, there might be more than one citation record).

BTW, I'm a bit concerned about mixing different data formats: on the one hand you are using YAML, on the other hand BibTeX. For sure, having a BibTeX record is very valuable. But the tools that work with this data will then also need a BibTeX parser. I did not dive into this, and it is certainly doable, but I just wanted to raise this topic here to hear opinions.

> The proof of principle presented above is only a few lines of code, but I would
> prefer to discuss the idea further before putting more time into it.

Thanks for pushing this forward!

> Lastly, I have accumulated a dozen debian/upstream-metadata.yaml files in
> the packages I maintain, so that meaningful tests of table generation will be
> possible later. I do not remember the list by heart, but it contains seaview,
> bwa, clustalw, clustalx, perlprimer, samtools, and most of the packages I have
> updated recently.
>
> Since I am quite inexperienced in programming, help is of course most welcome.

As I said above: IMHO most of the work is done if you can provide a set of <package>.yaml files at a freely accessible place.

Kind regards

Andreas.

--
http://fam-tille.de
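The trade-off between these layouts can be made concrete with a Python/sqlite3 sketch (sqlite3 again standing in for PostgreSQL) of the third option. It uses a key/value table without a primary key, so a package may carry several rows with the same key, plus a pivot query that recovers the one-row-per-package shape of the first option for a chosen set of keys. The package names, keys and values in the usage are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE upstream_metadata (
    package TEXT NOT NULL,
    key     TEXT NOT NULL,
    value   TEXT NOT NULL
);
"""


def wide_view(conn, keys):
    """Pivot key/value rows into one row per package, one column per key.

    `keys` must come from a curated list, since they are spliced into the SQL
    as column aliases. If a key occurs several times, MAX() keeps one value.
    """
    if not keys:
        raise ValueError("need at least one key")
    columns = ", ".join(f'MAX(CASE WHEN key = ? THEN value END) AS "{k}"' for k in keys)
    sql = (f"SELECT package, {columns} FROM upstream_metadata "
           "GROUP BY package ORDER BY package")
    return conn.execute(sql, list(keys)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO upstream_metadata VALUES (?, ?, ?)",
    [
        ("pkg-a", "PMID", "123"),
        ("pkg-a", "Reference", "first citation"),
        ("pkg-a", "Reference", "second citation"),  # allowed: no primary key
        ("pkg-b", "Registration", "http://example.org/register"),
    ],
)
print(wide_view(conn, ["PMID", "Registration"]))
```

Storing the flexible table and deriving the fixed-column view on demand is one way to keep both the "tools know which keys to expect" property and the freedom to add keys without a schema change.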
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

Hi.

(Having read only what was forwarded to -qa.)

May I suggest providing a few more details about this initiative on a wiki page on wiki.debian.org, so that the context is clearer for everybody potentially interested? I think there's probably a lot of interest beyond UDD in such metadata standardization.

My 2 cents,

On Thursday 22 October 2009 at 09:49 +0200, Andreas Tille wrote:
> [debian-qa in CC because here we are discussing UDD issues.]
>
> On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > First of all, let's summarise the situation. We want to integrate some metadata
> > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.
>
> I would like to add that other use cases for this kind of data will most
> probably evolve. Keeping this in mind, we might consider moving the topic
> to debian-devel in the next stage of development.
>
> > What I propose is to have a special file in the source packages for gathering
> > all possibly useful information: debian/upstream-metadata.yaml.
>

--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)
Using RDF and ontologies for such metadata (combined DOAP and other ontologies) Was: Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

Hi.

(Responding with more feedback, as I have taken the time to dig through the debian-med archives.)

If I get it right, you intend to match bibliographic references with software projects / packages?

I'd very much suggest adopting a Semantic Web perspective, so as to provide such links as RDF descriptions that use ontologies already used by other applications, hence contributing to Linked Data [0] (maybe through microformats embedded as RDFa in the current 'web sentinels', or as specific RDF feeds).

For an example of such an application, see http://www.connotea.org/rss/search?q=SAMtools, which mixes RSS 1.0 with other ontologies (and uses, more or less, exactly the same example you provided).

Here, you may then link DOAP [1] with existing bibliographic ontologies like PRISM.

This could of course also be provided from UDD, if UDD were to participate more in the Semantic Web, as I proposed in http://lists.debian.org/debian-qa/2009/02/msg00016.html and further discussions.

Just my 2 cents,

[0]: http://linkeddata.org/
[1]: http://trac.usefulinc.com/doap

Btw:

On Thursday 22 October 2009 at 09:49 +0200, Andreas Tille wrote:
> Alternatively we could do
>
>    CREATE TABLE upstream_metadata (
>      package text,
>      key     text,
>      value   text,
>      PRIMARY KEY (package, key)
>    );

This looks very much like RDF triples, which could store any metadata expressed in any RDF ontology, so that might be really useful ;)

Best regards,

--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)
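Olivier's remark that the key/value rows resemble RDF triples can be taken literally: each (package, key, value) row maps onto one N-Triples statement. In this Python sketch both URI prefixes are hypothetical placeholders; a real export would map keys onto existing vocabularies such as DOAP or PRISM instead of a made-up namespace.

```python
# Hypothetical URI prefixes; a real export would reuse DOAP/PRISM properties.
SUBJECT_BASE = "http://upstream-metadata.debian.net/"
PREDICATE_BASE = "http://upstream-metadata.debian.net/ns#"


def escape_literal(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(rows):
    """Turn (package, key, value) rows into N-Triples lines with plain literals.

    Package names and keys are assumed to be safe for use inside IRIs.
    """
    return "\n".join(
        f'<{SUBJECT_BASE}{package}> <{PREDICATE_BASE}{key}> "{escape_literal(value)}" .'
        for package, key, value in rows
    )


print(to_ntriples([("samtools", "PMID", "19505943")]))
```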
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)
> On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > First of all, let's summarise the situation. We want to integrate some metadata
> > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.

Dear Andreas and Olivier,

thank you for your encouraging comments. I have taken one more step forward: upstream-metadata.debian.net now stores its information in a Berkeley DB database, and each entry, when accessed, is refreshed only if it is older than a given age.

For the moment, only 17 source packages have an upstream-metadata.yaml file in their debian directory that is accessible through a public VCS. Nevertheless, I think that this is enough for a proof of principle. After resetting the database, I ‘loaded’ the data by accessing it:

  for package in bioperl clustalx mummer seaview perlprimer samtools \
                 dicomscope clustalw r-cran-combinat r-cran-haplo.stats \
                 r-cran-qvalue r-cran-randomforest r-cran-rocr r-other-bio3d \
                 mira bwa infernal ; do
      wget -q -O - "http://upstream-metadata.debian.net/$package/DOI"
  done

After loading, the resulting table is available here: http://upstream-metadata.debian.net/table/DOI

Obviously, not all packages contain programs that have been described in an academic article (http://dx.doi.org/)…

For the moment, a refresh is triggered by accessing an arbitrary key, but later it would be best to have a special key, for instance YAML-UPDATE, that forces the update. If it is possible to have a per-file commit hook, then the debian.net site can be updated each time an upstream-metadata.yaml file is modified.

The next step is to feed the UDD. For the moment, the site produces one table per keyword. The rationale is that for many keywords the data will be too sparse to be interesting for the UDD. My current idea is to generate the tables for a limited set of curated keywords, assemble them (with the Unix join command?), and leave the result in a public place that the UDD can read.

In parallel, as Olivier suggested, each table could be exported in RDF format.
But I am not sure I understand it. Olivier, could you suggest a Perl module to use?

As long as we are in a draft phase, I think that we can live with the biggest current limitation: the lack of support for packages that are not stored in a VCS. One possible way to solve the problem is to provide a repository, for instance in collab-maint on Alioth, where people can drop one YAML file per source package. We could also unpack the source files, as Andreas suggested.

For the UDD import, which of Andreas's two proposals would be most suitable?

> CREATE TABLE upstream-metadata (
>   package text,
>   key1 text,
>   key2 text,
>   ...
>   keyN text,
>   PRIMARY KEY package
> );

> CREATE TABLE upstream-metadata (
>   package text,
>   key text,
>   value text,
>   PRIMARY KEY (package,key)
> );

Since the addition of more metadata to our source packages is an issue frequently raised on debian-devel, I think that there is a general interest in standardising ‘field’ names, whichever technical solution is adopted. I will try to find a proper place on wiki.debian.org to let people document the fields they create and, if necessary, discuss them.

Have a nice day,

--
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
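To compare the two layouts quoted above, here is a small sketch using Python's sqlite3 module (SQLite rather than the PostgreSQL used by the UDD, only so that it runs standalone). As written, the quoted statements would need small fixes: a hyphenated name like upstream-metadata is not a valid unquoted identifier, so the sketch uses upstream_metadata, and a table-level PRIMARY KEY needs parentheses. The keys and values are illustrative placeholders. The sketch stores the key/value layout and pivots it into the one-column-per-key layout for a fixed, curated list of keys.

```python
import sqlite3


def build_key_value_table(conn, rows):
    """Create the (package, key, value) layout and load the given rows."""
    conn.execute(
        """CREATE TABLE upstream_metadata (
               package TEXT,
               key     TEXT,
               value   TEXT,
               PRIMARY KEY (package, key)
           )"""
    )
    conn.executemany(
        "INSERT INTO upstream_metadata (package, key, value) VALUES (?, ?, ?)",
        rows,
    )


def pivot(conn, keys):
    """Return one row per package with one column per requested key.

    Column names come from a curated key list and are quoted as identifiers;
    the key values themselves are passed as query parameters.
    """
    columns = ", ".join(
        f'MAX(CASE WHEN key = ? THEN value END) AS "{k}"' for k in keys
    )
    query = (
        f"SELECT package, {columns} FROM upstream_metadata "
        "GROUP BY package ORDER BY package"
    )
    return conn.execute(query, list(keys)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_key_value_table(conn, [
        ("samtools", "DOI", "10.1000/xyz123"),   # placeholder values
        ("samtools", "PMID", "12345"),
        ("bwa", "DOI", "10.1000/abc456"),
    ])
    for row in pivot(conn, ["DOI", "PMID"]):
        print(row)
```

Missing keys come out as NULL (None in Python), which illustrates the sparsity concern: the key/value table accepts any new field without a schema change, while the wide layout only makes sense once the set of keys is standardised.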
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)
On Thu, Oct 22, 2009 at 09:49:10AM +0200, Andreas Tille wrote:
>
> BTW, I'm a bit concerned about mixing different database formats: On one
> hand you are using yaml on the other hand BibTeX. Well, for sure having
> a BibTeX record is very valuable. But on the other hand the tools who
> are working with this data will need a BibTeX parser. I did not dived
> into this and for sure it is doable - but I just wanted to raise this
> topic here to hear opinions.

Hi Andreas and all,

since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to move this part of the discussion to debian-science@..., where I reopened an old thread:

http://lists.debian.org/msgid-search/20091026145532.GA6594@...

Have a nice day,

--
Charles
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)
On Mon, Oct 26, 2009 at 11:05:10PM +0900, Charles Plessy wrote:
> For the moment, one has to access an arbitrary key, but later the best would be
> to have a special key, for instance YAML-UPDATE, that would force the update.

Or rather "upstream-metadata update". You certainly would not want to update the YAML standard. ;-)

> If it is possible to have a per-file commit hook, then each time a
> upstream-metadata.yaml is modified, the debian.net site can updated.

As I said: I'm afraid it is hard to ensure that *every* potential VCS has a properly configured commit hook. I'm no VCS expert, but it sounds hard to maintain.

> Next step is to feed the UDD. For the moment, the site produces one table per
> keyword. The rationale is that for many keywords, the data will be too sparse
> to be interesting for the UDD. My current idea is to generate the tables for a
> limited set of curated keywords, assemble them (with the unix join command?),
> and give leave this in a public place that the UDD can read.

As I said in my previous mail, it is perfectly OK if there is a reasonable way to fetch the original upstream-metadata.yaml files. Reading these is probably much easier than reading any aggregated format.

> For the UDD import, what would be the most suitable among the two propositions
> of Andreas?

Well, I have no idea; it was a question, and I gave the pros and cons of both variants in my mail.

> > CREATE TABLE upstream-metadata (
> >   package text,
> >   key1 text,
> >   key2 text,
> >   ...
> >   keyN text,
> >   PRIMARY KEY package
> > );
>
> > CREATE TABLE upstream-metadata (
> >   package text,
> >   key text,
> >   value text,
> >   PRIMARY KEY (package,key)
> > );
>
> Since the addition of more meta-data to our source packages is a frequent issue
> raised on debian-devel, I think that there is a general interst for
> standardising ‘field’ names, whichever the technical solution that will be
> adopted.

So if we have a really standardised set of keywords, the first method probably sounds appropriate for the problem.
> I will try to find a proper place on wiki.debian.org to let pepole document
> the fields they create, and if necessary discuss them.

Sounds good.

Andreas.

--
http://fam-tille.de
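The refresh policy discussed in this exchange (serve the stored record, but re-read a package's upstream-metadata.yaml on access once the stored copy is older than a given age, instead of relying on commit hooks in every VCS) can be sketched as follows. This is not the code running on upstream-metadata.debian.net: a plain dict stands in for its Berkeley DB, and `fetch` is a hypothetical callable standing in for retrieving and parsing the YAML file from the package's VCS.

```python
import time


class MetadataCache:
    """Refresh-on-access store for per-package upstream metadata (sketch).

    A plain dict stands in for the Berkeley DB, and `fetch` is a hypothetical
    callable returning the parsed upstream-metadata.yaml of a package as a dict.
    """

    def __init__(self, fetch, max_age_seconds, clock=time.time):
        self.fetch = fetch
        self.max_age = max_age_seconds
        self.clock = clock
        self.store = {}  # package -> (fetch timestamp, metadata dict)

    def get(self, package, key):
        """Return one field, re-fetching the package first if its copy is stale."""
        now = self.clock()
        entry = self.store.get(package)
        if entry is None or now - entry[0] > self.max_age:
            # Unknown package, or stored copy older than max_age: fetch again.
            entry = (now, self.fetch(package))
            self.store[package] = entry
        return entry[1].get(key)


if __name__ == "__main__":
    # Hypothetical fetcher returning placeholder data instead of querying a VCS.
    def fake_fetch(package):
        print("fetching", package)
        return {"DOI": "10.1000/xyz123"}

    cache = MetadataCache(fake_fetch, max_age_seconds=24 * 3600)
    print(cache.get("samtools", "DOI"))  # first access triggers a fetch
    print(cache.get("samtools", "DOI"))  # second access is served from the store
```

The trade-off of refreshing lazily is that no hook has to be installed anywhere, at the cost of serving data that may be up to `max_age_seconds` old; a special key that forces the update would simply bypass the age check.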
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)
On Tue, Oct 27, 2009 at 12:00:48AM +0900, Charles Plessy wrote:
> since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
> transfer this part of the discussion on debian-science@..., where I reopened
> an old thread.

Well, the question is not really about BibTeX or not. The question is whether it is a good idea to have a database format as a field value. If you have the field "Publication" with a complete BibTeX record as its value, I somewhat wonder whether this is useful in the end, or whether we should rather translate the record into an SQL table.

Kind regards

Andreas.

--
http://fam-tille.de
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)
On Mon, Oct 26, 2009 at 04:05:17PM +0100, Andreas Tille wrote:
> On Tue, Oct 27, 2009 at 12:00:48AM +0900, Charles Plessy wrote:
> > since BibTeX issues are perhaps a bit specialised for debian-qa, I propose to
> > transfer this part of the discussion on debian-science@..., where I reopened
> > an old thread.
>
> Well, the question is not really about BibTeX or not. The question is
> whether it is a good idea to have a database format as a field value.
> If you have the field "Publication" and a complete BibTeX record as
> value I somehow wonder whether this is useful in the end or whether we
> rather should translate the record in a SQL table.

That is a good question, which I would rephrase: what should be stored, and should everything be exported?

For the moment, the stored BibTeX reference is a rather experimental feature, and its purpose is also to test the YAML format. As you probably noticed, the key parts of the BibTeX reference that allow constructing a web link to the published article, namely the digital object identifier (DOI) and the PubMed record ID, have their own YAML mappings: I do not expect the BibTeX reference to be extracted and parsed, nor to be exported to SQL. On the other hand, it can easily be extracted at build time with a Perl one-liner (‘http://lists.debian.org/msgid-search/20090808073608.GF17276@...’).

[For further discussion about how to make nice links on the Blends web sentinels, I propose to continue on another list.]

There is another piece of volatile metadata, with a much broader scope, that could be included in the upstream-metadata.yaml file (or whatever smarter name we give it): the Debian watch file. All the objections you made above apply.
We could either store it raw in a YAML mapping, like:

  Watch: |
    version=3
    opts=dversionmangle=s/~dfsg// \
    http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2

Or split the information into multiple mappings:

  Watch-Version : 3
  Watch-Options : dversionmangle=s/~dfsg//
  Watch-Regexp : http://sf.net/samtools/samtools-([\d\.]*)\.tar\.bz2

While the latter option looks more structured, we should really think twice about whether it makes sense to have the ‘Watch’ metadata in a tabular SQL database, or whether simply storing it raw somewhere else is enough. The same conclusion may apply to similar resources like the BibTeX reference.

Have a nice day,

--
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
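The split layout can be derived mechanically from the raw one, so storing the watch data raw does not rule out producing the Watch-* fields later. A rough Python sketch, assuming a watch file with a single `version=` line and a single entry that may start with `opts=...` (without spaces inside the options) and may continue over lines ending in a backslash, as in the samtools example. Real watch files can contain several entries and richer options, which this does not handle. The Watch-* names are the ones proposed in this message.

```python
def split_watch(raw):
    """Split raw watch-file text into Watch-Version/Options/Regexp fields.

    Simplifying assumptions: at most one 'version=' line and one entry, which
    may start with 'opts=...' (no spaces inside) and may continue over several
    lines ending in a backslash. Comment lines starting with '#' are skipped.
    """
    # 1. Join backslash-continued physical lines into logical lines.
    logical, current = [], ""
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped.endswith("\\"):
            current += stripped[:-1].strip() + " "
            continue
        current += stripped
        if current:
            logical.append(current)
        current = ""
    if current.strip():
        logical.append(current.strip())

    # 2. Map each logical line onto one of the proposed fields.
    fields = {}
    for line in logical:
        if line.startswith("#"):
            continue
        if line.startswith("version="):
            fields["Watch-Version"] = line[len("version="):]
            continue
        if line.startswith("opts="):
            opts, _, line = line.partition(" ")
            fields["Watch-Options"] = opts[len("opts="):]
        fields["Watch-Regexp"] = line.strip()
    return fields


if __name__ == "__main__":
    raw = (
        "version=3\n"
        "opts=dversionmangle=s/~dfsg// \\\n"
        "http://sf.net/samtools/samtools-([\\d\\.]*)\\.tar\\.bz2\n"
    )
    for name, value in split_watch(raw).items():
        print(f"{name} : {value}")
```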
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)
Hi.
(Responding a little late after vacation time.)

On Monday, 26 October 2009 at 23:05 +0900, Charles Plessy wrote:
> > On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > > First of all, let's summarise the situation. We want to integrate some metadata
> > > in our 'web sentinels', like 'http://debian-med.alioth.debian.org/tasks/bio'.
>
> Dear Andreas and Olivier,
>
> thank you for your encouraging comments.
SNIP
> In parallel, as Olivier suggested, each table could be exprorted in RDF format.
> But I am not sure I undersand it.

What exactly don't you understand? ;)

If you look back at the pointers I provided in http://lists.debian.org/debian-qa/2009/10/msg00050.html, you'll find an example of using the PRISM and Connotea ontologies for links with DOIs and PubMed IDs (more details, perhaps, in http://www.prismstandard.org/resources/mod_prism.html).

> Olivier, could you suggest a Perl module to
> use?

I suppose that searching for perl+rdf in your preferred search engine will retrieve useful code ;) I'm not a Perl hacker myself, but as RDF is a W3C standard, there is probably plenty of Perl code to produce RDF. http://search.cpan.org/~mthurn/RDF-Simple-0.415/lib/RDF/Simple/Serialiser.pm seems to be a valid candidate for first experiments.

Hope this helps.

Best regards,

--
Olivier BERGER <olivier.berger@...>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)
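Since, as noted earlier in the thread, the DOI and the PubMed record ID have their own YAML mappings, a web sentinel can build article links from them without ever parsing the BibTeX record. A minimal sketch, assuming the YAML has already been loaded into a dict (for instance with PyYAML's `yaml.safe_load`) and assuming the key names `DOI` and `PMID`; the PubMed URL pattern is also an assumption, while the dx.doi.org resolver is the one cited in the thread. The values below are placeholders.

```python
from urllib.parse import quote

# Assumed YAML key names; the real upstream-metadata.yaml may use different ones.
DOI_KEY = "DOI"
PMID_KEY = "PMID"


def publication_links(metadata):
    """Return web links built from the DOI and PubMed ID of a parsed metadata dict."""
    links = []
    doi = metadata.get(DOI_KEY)
    if doi:
        # Percent-encode characters such as '#' that would break the URL,
        # but keep the '/' separating the DOI prefix from its suffix.
        links.append("http://dx.doi.org/" + quote(str(doi), safe="/"))
    pmid = metadata.get(PMID_KEY)
    if pmid:
        # Assumed PubMed URL pattern; the ID may have been loaded as an int.
        links.append("http://www.ncbi.nlm.nih.gov/pubmed/" + str(pmid))
    return links


if __name__ == "__main__":
    # Placeholder values standing in for a parsed upstream-metadata.yaml.
    print(publication_links({"DOI": "10.1000/xyz123", "PMID": 12345}))
```

The BibTeX record can stay in the file for build-time use; only the two identifier fields are needed to render links.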