PubMed records (was: MeSH terms)

by RobertBradbury

I'm not sure if this is related to the MeSH question or not, but
I've googled the documentation several times and never managed to find
"robust" examples of how to manipulate PubMed records.

It would seem that there ought to be code lying around which does:
  Given a GenBank ID,
     Fetch all PubMed records linked to that ID
         Fetch all related records (via NCBI's "related" record IDs)

     Purge the list of duplicates, then do things like fetch all of the
abstracts or all of the MeSH headings, etc. for all of those records.

Another example would be fetching all related records out to some depth
(i.e. a PubMed tree of depth N, or a cloud of some maximum size N).
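
A minimal sketch of the "tree of depth N" idea, with duplicates purged as it goes. It is only an illustration: related_cloud and the $get_related callback are hypothetical names, the record IDs are made up, and the callback stands in for whatever actually asks NCBI for related records.

```perl
use strict;
use warnings;

# Collect every record reachable from the seed IDs within $max_depth hops,
# visiting each ID only once. $get_related is a caller-supplied code ref
# (hypothetical here) that returns the related IDs for one ID.
sub related_cloud {
    my ($seeds, $max_depth, $get_related) = @_;

    my %depth_of;                       # id => depth at which it was first seen
    my @queue;
    for my $id (@$seeds) {
        next if exists $depth_of{$id};  # purge duplicate seeds
        $depth_of{$id} = 0;
        push @queue, $id;
    }

    while (@queue) {
        my $id = shift @queue;
        next if $depth_of{$id} >= $max_depth;   # do not expand past depth N
        for my $rel ($get_related->($id)) {
            next if exists $depth_of{$rel};     # already seen: skip duplicate
            $depth_of{$rel} = $depth_of{$id} + 1;
            push @queue, $rel;
        }
    }
    return \%depth_of;
}

# Stand-in link table with made-up IDs; in practice the callback would
# query NCBI instead of reading a hash.
my %fake_links = (
    100 => [101, 102],
    101 => [100, 103],
    102 => [103],
    103 => [104],
);
my $cloud = related_cloud([100, 100], 2, sub { @{ $fake_links{$_[0]} || [] } });
print "$_ (depth $cloud->{$_})\n" for sort { $a <=> $b } keys %$cloud;
```

Because the walk is breadth-first, each ID is recorded at its shortest distance from the seeds, and the keys of the returned hash are already the de-duplicated list.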

I think that one can use NCBI's fetch interface to do this (one could do it
by having NCBI email you all of the PubMed results and having an email
harvester collect those results, parse them and set up a new set of
queries).  Of course this seems like an overhead-intensive way to do it.
Given that an increasing amount of information is becoming open to
the public, one could even consider parsing the published papers and
supplemental files (e.g. XLS tables) for genes of interest (as it seems the
authors of most work, as well as the PubMed record processors, fail to provide
or research the gene name information that is supposed to be in the PubMed
records).

Now it may simply be that, because I lack sufficient experience with the
BioPerl documentation, I am unaware of the functions/tools which do this
type of thing.  So if anyone has any hints/pointers, they would be
appreciated.

Thanks,
Robert Bradbury
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: PubMed records (was: MeSH terms)

by Brian Osborne-2

Robert,

Not sure what "robust" means - would "working" suffice? Also, you
suggested starting with a GenBank ID, but what I'm about to show you
starts with PubMed IDs, at the other end. What I will do is take some
of this and make a little script for BioPerl's examples/ directory. In
the meantime, here is some code:

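#!/usr/bin/perl
use strict;
use warnings;

use Bio::Biblio;

# PubMed ID of the record to fetch
my $pmid = 52;

# use NCBI's E-utilities as the back end
my $biblio = Bio::Biblio->new(-access => "eutils");

my $ref = $biblio->get_by_id($pmid);

# $ref contains raw XML
print $ref,"\n";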

And what it prints is below.

Brian O.

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2009//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_090101.dtd">
<PubmedArticleSet>
<PubmedArticle>
     <MedlineCitation Owner="NLM" Status="MEDLINE">
         <PMID>52</PMID>
         <DateCreated>
             <Year>1976</Year>
             <Month>02</Month>
             <Day>09</Day>
         </DateCreated>
         <DateCompleted>
             <Year>1976</Year>
             <Month>02</Month>
             <Day>09</Day>
         </DateCompleted>
         <DateRevised>
             <Year>2006</Year>
             <Month>11</Month>
             <Day>15</Day>
         </DateRevised>
         <Article PubModel="Print">
             <Journal>
                 <ISSN IssnType="Print">0006-2960</ISSN>
                 <JournalIssue CitedMedium="Print">
                     <Volume>14</Volume>
                     <Issue>24</Issue>
                     <PubDate>
                         <Year>1975</Year>
                         <Month>Dec</Month>
                         <Day>2</Day>
                     </PubDate>
                 </JournalIssue>
                 <Title>Biochemistry</Title>
                 <ISOAbbreviation>Biochemistry</ISOAbbreviation>
             </Journal>
             <ArticleTitle>Evidence of the involvement of a 50S ribosomal protein in several active sites.</ArticleTitle>
             <Pagination>
                 <MedlinePgn>5321-7</MedlinePgn>
             </Pagination>
             <Abstract>
                 <AbstractText>The functional role of the Bacillus  
stearothermophilus 50S ribosomal protein B-L3 (probably homologous to  
the Escherichia coli protein L2) was examined by chemical  
modification. The complex [B-L3-23S RNA] was photooxidized in the  
presence of rose bengal and the modified protein incorporated by  
reconstitution into 50S ribosomal subunits containing all other  
unmodified components. Particles containing photooxidized B-L3 are  
defective in several functional assays, including (1) poly(U)-directed  
poly(Phe) synthesis, (2) peptidyltransferase activity, (3) ability to  
associate with a [30S-poly(U)-Phe-tRNA] complex, and (4) binding of  
elongation factor G and GTP. The rates of loss of the partial  
functional activities during photooxidation of B-L3 indicate that at  
least two independent inactivating events are occurring, a faster one,  
involving oxidation of one or more histidine residues, affecting  
peptidyltransferase and subunit association activities and a slower  
one affecting EF-G binding. Therefore the protein B-L3 has one or more  
histidine residues which are essential for peptidyltransferase and  
subunit association, and another residue which is essential for
EF-G-GTP binding. B-L3 may be the ribosomal peptidyltransferase protein, or
a part of the active site, and may contribute functional groups to the  
other active sites as well.</AbstractText>
             </Abstract>
             <AuthorList CompleteYN="Y">
                 <Author ValidYN="Y">
                     <LastName>Fahnestock</LastName>
                     <ForeName>S R</ForeName>
                     <Initials>SR</Initials>
                 </Author>
             </AuthorList>
             <Language>eng</Language>
             <PublicationTypeList>
                 <PublicationType>Journal Article</PublicationType>
                 <PublicationType>Research Support, U.S. Gov't, P.H.S.</PublicationType>
             </PublicationTypeList>
         </Article>
         <MedlineJournalInfo>
             <Country>UNITED STATES</Country>
             <MedlineTA>Biochemistry</MedlineTA>
             <NlmUniqueID>0370623</NlmUniqueID>
         </MedlineJournalInfo>
         <ChemicalList>
             <Chemical>
                 <RegistryNumber>0</RegistryNumber>
                 <NameOfSubstance>Macromolecular Substances</NameOfSubstance>
             </Chemical>
             <Chemical>
                 <RegistryNumber>0</RegistryNumber>
                 <NameOfSubstance>Ribosomal Proteins</NameOfSubstance>
             </Chemical>
         </ChemicalList>
         <CitationSubset>IM</CitationSubset>
         <MeshHeadingList>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Bacillus stearothermophilus</DescriptorName>
                 <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Binding Sites</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Hydrogen-Ion Concentration</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Kinetics</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Macromolecular Substances</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Oxidation-Reduction</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Photochemistry</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Protein Binding</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Ribosomal Proteins</DescriptorName>
                 <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Ribosomes</DescriptorName>
                 <QualifierName MajorTopicYN="N">metabolism</QualifierName>
             </MeshHeading>
         </MeshHeadingList>
     </MedlineCitation>
     <PubmedData>
         <History>
             <PubMedPubDate PubStatus="pubmed">
                 <Year>1975</Year>
                 <Month>12</Month>
                 <Day>2</Day>
             </PubMedPubDate>
             <PubMedPubDate PubStatus="medline">
                 <Year>1975</Year>
                 <Month>12</Month>
                 <Day>2</Day>
                 <Hour>0</Hour>
                 <Minute>1</Minute>
             </PubMedPubDate>
             <PubMedPubDate PubStatus="entrez">
                 <Year>1975</Year>
                 <Month>12</Month>
                 <Day>2</Day>
                 <Hour>0</Hour>
                 <Minute>0</Minute>
             </PubMedPubDate>
         </History>
         <PublicationStatus>ppublish</PublicationStatus>
         <ArticleIdList>
             <ArticleId IdType="pubmed">52</ArticleId>
         </ArticleIdList>
     </PubmedData>
</PubmedArticle>


</PubmedArticleSet>
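
Since the record comes back as raw XML, the MeSH headings still have to be pulled out of it. Here is a rough sketch of one way to do that, assuming the CPAN module XML::LibXML is installed (it is not part of BioPerl). The helper names mesh_headings and _tidy are made up for illustration, and the sample document is a cut-down stand-in for the output above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Return a list of hashrefs, one per <MeshHeading>, from PubMed XML text.
sub mesh_headings {
    my ($xml) = @_;
    # Skip the external DTD named in the DOCTYPE so parsing stays offline.
    my $doc = XML::LibXML->load_xml(
        string       => $xml,
        load_ext_dtd => 0,
        no_network   => 1,
    );
    my @headings;
    for my $heading ($doc->findnodes('//MeshHeading')) {
        my ($descriptor) = $heading->findnodes('./DescriptorName');
        next unless $descriptor;
        push @headings, {
            descriptor => _tidy($descriptor->textContent),
            qualifiers => [ map { _tidy($_->textContent) }
                            $heading->findnodes('./QualifierName') ],
        };
    }
    return @headings;
}

# Collapse the line wrapping and padding found inside element text.
sub _tidy {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

# Cut-down stand-in for a real PubMed record.
my $sample = <<'XML';
<PubmedArticleSet><PubmedArticle><MedlineCitation>
  <MeshHeadingList>
    <MeshHeading>
      <DescriptorName MajorTopicYN="N">Ribosomal Proteins</DescriptorName>
      <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName MajorTopicYN="N">Kinetics</DescriptorName>
    </MeshHeading>
  </MeshHeadingList>
</MedlineCitation></PubmedArticle></PubmedArticleSet>
XML

for my $h (mesh_headings($sample)) {
    print join('/', $h->{descriptor}, @{ $h->{qualifiers} }), "\n";
}
```

With the script above, the same sub could presumably be called as mesh_headings($ref), since $ref holds the XML as a string.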



On Oct 24, 2009, at 2:45 PM, Robert Bradbury wrote:

> (snip)

Re: PubMed records (was: MeSH terms)

by Peter-329

On Sat, Oct 24, 2009 at 6:45 PM, Robert Bradbury
<robert.bradbury@...> wrote:
> I'm not sure if this is related to the MeSH question or not, but
> I've googled the documentation several times and never managed to find
> "robust" examples for how to manipulate PubMed records.
>
> It would seem that there ought to be code lying around which does:
>  Given Genbank ID,
>     Fetch all Pubmed records from that ID
>         Fetch all related records (via NCBI's "related" record IDs)

Isn't this exactly what the NCBI's ELink is for?

http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/elink_help.html
http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/entrezlinks.html

You'd need to work out which of the Entrez databases you are
starting from (probably protein or genome), and then the relevant
ELink link name (genome_pubmed, protein_pubmed or
protein_pubmed_weighted look possible).
Then for related PubMed articles, the ELink link name is just
pubmed_pubmed.
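
To make the two steps concrete, here is a small sketch that only builds the ELink request URLs and does not send anything. The elink.fcgi address and the dbfrom, db, linkname and id parameters are ELink's own; the elink_url and _escape helpers and the protein ID 12345 are placeholders made up for illustration.

```perl
use strict;
use warnings;

my $ELINK = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi';

# Build an ELink request URL. The keys mirror ELink's parameter names,
# except 'ids', which takes an array ref and is sent as a comma-separated id.
sub elink_url {
    my (%arg) = @_;
    my @pairs = (
        [ dbfrom   => $arg{dbfrom} ],
        [ db       => $arg{db} ],
        [ linkname => $arg{linkname} ],
        [ id       => join(',', @{ $arg{ids} }) ],
    );
    return "$ELINK?" . join '&', map { "$_->[0]=" . _escape($_->[1]) } @pairs;
}

# Percent-encode anything outside the URL-safe set (commas are left alone).
sub _escape {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9_.~,-])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Step 1: protein record -> PubMed records (12345 is a placeholder ID)
print elink_url(dbfrom => 'protein', db => 'pubmed',
                linkname => 'protein_pubmed', ids => [12345]), "\n";

# Step 2: PubMed record -> related PubMed records
print elink_url(dbfrom => 'pubmed', db => 'pubmed',
                linkname => 'pubmed_pubmed', ids => [52]), "\n";
```

Passing several IDs in one comma-separated id parameter returns the links for the whole set together, so keeping track of which source ID produced which link would need one request per ID (or repeated id parameters).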

Peter

Re: PubMed records (was: MeSH terms)

by RobertBradbury

On Sun, Oct 25, 2009 at 7:12 AM, Peter <biopython@...> wrote:

>
> Isn't this exactly what the NCBI's ELink is for?
> (snip)
>

Interesting.  I knew they had some of this but I didn't know they had
extended it so far (obviously the cloud of computers normally running blast
searches is keeping itself busy when there is nothing better to do).

[Aside: Does anyone know exactly what NCBI's computing capacity is?  Or how
this compares with the other major sites (Ensembl, Sanger, Broad, UCSC)
capacity?]

Now, extending Brian's helpful comment, is there anything within BioPerl or
CPAN in general which can take the "link cloud" and extend it into a "degree
of connectivity" cloud?  If anyone has ever used the Linux utility
"etherape" they will have an idea of what I'm talking about.  Etherape
displays the local network traffic in a graphical format, with nodes for the
machines that are communicating with each other, and with the colors, sizes
and brightness of the links between them representing the traffic type,
amount and age of the communications.  I'm sure that agencies which shall
not be named have similar display programs which deal with the quantity and
type of "chatter" between "persons of interest".  This is all pretty
standard stuff from a "network graph" standpoint.

This type of graphical information can be highly useful from a "systems
biology" standpoint as well as an educational standpoint when one wants to
understand something like the components of a protein complex, the time
domain of gene expression, the activity of research in an area, etc.
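
As a rough illustration of the "degree of connectivity" idea, the sketch below counts the distinct neighbours of each record and writes the links as Graphviz DOT text, which a layout tool could then draw. The link pairs are made up, and the sub names degrees and to_dot are hypothetical; nothing here comes from BioPerl.

```perl
use strict;
use warnings;

# Count how many distinct neighbours each node has in an undirected graph
# given as a list of [from, to] pairs.
sub degrees {
    my (@edges) = @_;
    my %neighbours;
    for my $edge (@edges) {
        my ($from, $to) = @$edge;
        next if $from eq $to;            # ignore self-links
        $neighbours{$from}{$to} = 1;
        $neighbours{$to}{$from} = 1;
    }
    return map { ($_ => scalar keys %{ $neighbours{$_} }) } keys %neighbours;
}

# Emit the same edges as Graphviz DOT text, each undirected edge once.
sub to_dot {
    my (@edges) = @_;
    my %seen;
    my @lines = ('graph links {');
    for my $edge (@edges) {
        my ($from, $to) = sort @$edge;
        next if $from eq $to || $seen{"$from--$to"}++;
        push @lines, qq{    "$from" -- "$to";};
    }
    return join("\n", @lines, '}') . "\n";
}

# Made-up link pairs standing in for ELink output.
my @edges = ([52, 60], [52, 61], [60, 52], [61, 70]);
my %degree = degrees(@edges);
print "$_: $degree{$_}\n"
    for sort { $degree{$b} <=> $degree{$a} || $a <=> $b } keys %degree;
print to_dot(@edges);
```

Edge weights (for example link scores) could be carried along in the same structures to drive line thickness or colour, in the spirit of the etherape display.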

So if the information is available, I'm wondering what tools exist to make
it useful.  With the amount of computing power now available to home users,
one could begin to think of very creative ways of using this type of
information from a data mining standpoint.

Robert

Re: PubMed records (was: MeSH terms)

by Chris Fields-5

On Oct 25, 2009, at 6:12 AM, Peter wrote:

> (snip)

Agreed.  This should be possible through BioPerl's  
Bio::DB::EUtilities.  I'll try to post an example up.

chris

Re: PubMed records (was: MeSH terms)

by Chris Fields-5


On Oct 25, 2009, at 9:16 AM, Chris Fields wrote:

> (snip)
>
> Agreed.  This should be possible through BioPerl's  
> Bio::DB::EUtilities.  I'll try to post an example up.
>
> chris

As promised...

chris

PS. I've been thinking about a couple of small additions:

1) normalizing how one indicates whether or not to use eutil
cookies/history (doing so currently requires one to be more explicit),
2) adding a pipeline-like utility where one could pass in a series of
hashrefs with databases, and it would link them all together using eutil
history/cookies.

If there is a need for this I can start working on it over the next  
few weeks, after I finally get the next set of alphas out.

--------------------------------------

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::EUtilities;

my $term = 'Notch3 AND "Mus musculus"[ORGN]';

# search the protein database, keeping the hits on NCBI's history server
my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'protein',
                                     -term       => $term,
                                     -usehistory => 'y');

my $hist = $eutil->next_History || die "No queue history returned";

# link the protein hits to pubmed
# can stipulate the actual linkname as well using -linkname
$eutil->reset_parameters(-eutil   => 'elink',
                         -db      => 'pubmed',
                         -dbfrom  => 'protein',
                         -history => $hist,
                         -cmd     => 'neighbor_history');

$hist = $eutil->next_History || die "No queue history returned";

# the history now holds PubMed IDs, so summarize against pubmed
# adjust -retstart/-retmax to get more results
$eutil->reset_parameters(-eutil   => 'esummary',
                         -db      => 'pubmed',
                         -history => $hist);

$eutil->print_all;

