PubMed records (was: MeSH terms)
<alsaplayer-devel@...>
I'm not sure if this is related to the MeSH question or not, but I've googled the documentation several times and never managed to find "robust" examples of how to manipulate PubMed records.

It would seem that there ought to be code lying around which does:
  Given a GenBank ID,
  Fetch all PubMed records linked to that ID
  Fetch all related records (via NCBI's "related" record IDs)

Purge the list of duplicates, then do things like fetch all of the abstracts or all of the MeSH headings, etc., for all of those records.

Another example would be fetching all records by relatedness (i.e. a PubMed tree of depth N, or a cloud of some maximum N).

I think that one can use NCBI's fetch interface to do this (one could have NCBI email you all of the PubMed results and have an email harvester collect those results, parse them and set up a new set of queries). Of course this seems like an overhead-intensive way to do it. Given that increasing amounts of information are becoming open to the public, one could even consider parsing the published papers and supplemental files (e.g. XLS tables) for genes of interest (as it seems that the authors of most work, as well as the PubMed record processors, fail to provide or research the gene name information that is supposed to be in the PubMed records).

Now it may simply be that I lack sufficient experience with the BioPerl documentation and so am unaware of the functions/tools which do this type of thing. So if anyone has any hints/pointers they would be appreciated.

Thanks,
Robert Bradbury
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l
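The "PubMed tree of depth N" described above amounts to a breadth-first search that records each ID the first time it is seen, which purges duplicates as a side effect. A minimal sketch, assuming a hypothetical `$related` callback that returns the related PubMed IDs for a single ID (in practice it might wrap an ELink pubmed_pubmed request); the link table and IDs below are made-up toy data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Breadth-first crawl of "related record" links up to a maximum depth.
# $related is a code ref (hypothetical): given one ID it returns the IDs
# related to it. Each ID is kept only the first time it is seen, so the
# result contains no duplicates.
sub related_cloud {
    my ($related, $max_depth, @seeds) = @_;
    my %depth_of;                       # ID => depth at which first seen
    my @queue;
    for my $id (@seeds) {
        next if exists $depth_of{$id};
        $depth_of{$id} = 0;
        push @queue, $id;
    }
    while (@queue) {
        my $id = shift @queue;
        my $d  = $depth_of{$id};
        next if $d >= $max_depth;       # do not expand past depth N
        for my $next ($related->($id)) {
            next if exists $depth_of{$next};    # purge duplicates
            $depth_of{$next} = $d + 1;
            push @queue, $next;
        }
    }
    return \%depth_of;
}

# Toy link table standing in for NCBI's related-article links.
my %links = (
    1 => [2, 3],
    2 => [1, 4],
    3 => [4, 5],
    4 => [6],
    5 => [],
    6 => [],
);
my $cloud = related_cloud(sub { @{ $links{ $_[0] } || [] } }, 2, 1);

for my $id (sort { $cloud->{$a} <=> $cloud->{$b} || $a <=> $b } keys %$cloud) {
    print "$id\tdepth $cloud->{$id}\n";
}
```

With a maximum depth of 2, record 6 (three links away from the seed) is never reached, and record 4 is listed once even though two paths lead to it.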
Re: PubMed records (was: MeSH terms)

Robert,

Not sure what "robust" means - would "working" suffice? Also, you suggested starting with a GenBank ID, but what I'm about to show you starts at the other end, with PubMed IDs. What I will do is take some of this and make a little script for BioPerl's examples/ directory. In the meantime, here is some code:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Biblio;

my $pmid   = 52;
my $biblio = Bio::Biblio->new(-access => "eutils");
my $ref    = $biblio->get_by_id($pmid);    # $ref contains raw XML
print $ref, "\n";

And what it prints is below.

Brian O.

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2009//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_090101.dtd">
<PubmedArticleSet>
<PubmedArticle>
  <MedlineCitation Owner="NLM" Status="MEDLINE">
    <PMID>52</PMID>
    <DateCreated>
      <Year>1976</Year>
      <Month>02</Month>
      <Day>09</Day>
    </DateCreated>
    <DateCompleted>
      <Year>1976</Year>
      <Month>02</Month>
      <Day>09</Day>
    </DateCompleted>
    <DateRevised>
      <Year>2006</Year>
      <Month>11</Month>
      <Day>15</Day>
    </DateRevised>
    <Article PubModel="Print">
      <Journal>
        <ISSN IssnType="Print">0006-2960</ISSN>
        <JournalIssue CitedMedium="Print">
          <Volume>14</Volume>
          <Issue>24</Issue>
          <PubDate>
            <Year>1975</Year>
            <Month>Dec</Month>
            <Day>2</Day>
          </PubDate>
        </JournalIssue>
        <Title>Biochemistry</Title>
        <ISOAbbreviation>Biochemistry</ISOAbbreviation>
      </Journal>
      <ArticleTitle>Evidence of the involvement of a 50S ribosomal protein in several active sites.</ArticleTitle>
      <Pagination>
        <MedlinePgn>5321-7</MedlinePgn>
      </Pagination>
      <Abstract>
        <AbstractText>The functional role of the Bacillus stearothermophilus 50S ribosomal protein B-L3 (probably homologous to the Escherichia coli protein L2) was examined by chemical modification. The complex [B-L3-23S RNA] was photooxidized in the presence of rose bengal and the modified protein incorporated by reconstitution into 50S ribosomal subunits containing all other unmodified components. Particles containing photooxidized B-L3 are defective in several functional assays, including (1) poly(U)-directed poly(Phe) synthesis, (2) peptidyltransferase activity, (3) ability to associate with a [30S-poly(U)-Phe-tRNA] complex, and (4) binding of elongation factor G and GTP. The rates of loss of the partial functional activities during photooxidation of B-L3 indicate that at least two independent inactivating events are occurring, a faster one, involving oxidation of one or more histidine residues, affecting peptidyltransferase and subunit association activities and a slower one affecting EF-G binding. Therefore the protein B-L3 has one or more histidine residues which are essential for peptidyltransferase and subunit association, and another residue which is essential for EF-G-GTP binding. B-L3 may be the ribosomal peptidyltransferase protein, or a part of the active site, and may contribute functional groups to the other active sites as well.</AbstractText>
      </Abstract>
      <AuthorList CompleteYN="Y">
        <Author ValidYN="Y">
          <LastName>Fahnestock</LastName>
          <ForeName>S R</ForeName>
          <Initials>SR</Initials>
        </Author>
      </AuthorList>
      <Language>eng</Language>
      <PublicationTypeList>
        <PublicationType>Journal Article</PublicationType>
        <PublicationType>Research Support, U.S. Gov't, P.H.S.</PublicationType>
      </PublicationTypeList>
    </Article>
    <MedlineJournalInfo>
      <Country>UNITED STATES</Country>
      <MedlineTA>Biochemistry</MedlineTA>
      <NlmUniqueID>0370623</NlmUniqueID>
    </MedlineJournalInfo>
    <ChemicalList>
      <Chemical>
        <RegistryNumber>0</RegistryNumber>
        <NameOfSubstance>Macromolecular Substances</NameOfSubstance>
      </Chemical>
      <Chemical>
        <RegistryNumber>0</RegistryNumber>
        <NameOfSubstance>Ribosomal Proteins</NameOfSubstance>
      </Chemical>
    </ChemicalList>
    <CitationSubset>IM</CitationSubset>
    <MeshHeadingList>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Bacillus stearothermophilus</DescriptorName>
        <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Binding Sites</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Hydrogen-Ion Concentration</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Kinetics</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Macromolecular Substances</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Oxidation-Reduction</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Photochemistry</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Protein Binding</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Ribosomal Proteins</DescriptorName>
        <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Ribosomes</DescriptorName>
        <QualifierName MajorTopicYN="N">metabolism</QualifierName>
      </MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
  <PubmedData>
    <History>
      <PubMedPubDate PubStatus="pubmed">
        <Year>1975</Year>
        <Month>12</Month>
        <Day>2</Day>
      </PubMedPubDate>
      <PubMedPubDate PubStatus="medline">
        <Year>1975</Year>
        <Month>12</Month>
        <Day>2</Day>
        <Hour>0</Hour>
        <Minute>1</Minute>
      </PubMedPubDate>
      <PubMedPubDate PubStatus="entrez">
        <Year>1975</Year>
        <Month>12</Month>
        <Day>2</Day>
        <Hour>0</Hour>
        <Minute>0</Minute>
      </PubMedPubDate>
    </History>
    <PublicationStatus>ppublish</PublicationStatus>
    <ArticleIdList>
      <ArticleId IdType="pubmed">52</ArticleId>
    </ArticleIdList>
  </PubmedData>
</PubmedArticle>
</PubmedArticleSet>

On Oct 24, 2009, at 2:45 PM, Robert Bradbury wrote:
> (snip)
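Pulling the MeSH headings out of XML like the record printed above can be sketched with core Perl regexes. This is a quick sketch that assumes the well-formed layout shown in that record; `mesh_headings` is a hypothetical helper, and a real XML parser (for example XML::LibXML or XML::Twig from CPAN) would be more robust. The sample input is an excerpt of the PMID 52 record:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract MeSH descriptors and their qualifiers from PubMed XML.
# Regex-based, so it only copes with well-formed output like the
# dump above; it is not a general XML parser.
sub mesh_headings {
    my ($xml) = @_;
    my @headings;
    # <MeshHeading> (with the closing '>') does not match <MeshHeadingList>
    while ($xml =~ m{<MeshHeading>(.*?)</MeshHeading>}gs) {
        my $block = $1;
        my ($descriptor) = $block =~ m{<DescriptorName[^>]*>\s*(.*?)\s*</DescriptorName>}s;
        my @qualifiers   = $block =~ m{<QualifierName[^>]*>\s*(.*?)\s*</QualifierName>}gs;
        push @headings, { descriptor => $descriptor, qualifiers => \@qualifiers }
            if defined $descriptor;
    }
    return @headings;
}

# Excerpt of the PMID 52 record.
my $xml = <<'XML';
<MeshHeadingList>
  <MeshHeading>
    <DescriptorName MajorTopicYN="N">Bacillus stearothermophilus</DescriptorName>
    <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
  </MeshHeading>
  <MeshHeading>
    <DescriptorName MajorTopicYN="N">Binding Sites</DescriptorName>
  </MeshHeading>
</MeshHeadingList>
XML

my @mesh = mesh_headings($xml);
for my $h (@mesh) {
    my @q = @{ $h->{qualifiers} };
    print $h->{descriptor}, (@q ? ' / ' . join(', ', @q) : ''), "\n";
}
```

Fed the full string returned by `get_by_id`, the same loop would list all ten headings of that record.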
Re: PubMed records (was: MeSH terms)

On Sat, Oct 24, 2009 at 6:45 PM, Robert Bradbury <robert.bradbury@...> wrote:
> It would seem that there ought to be code lying around which does:
> Given Genbank ID,
> Fetch all Pubmed records from that ID
> Fetch all related records (via NCBI's "related" record IDs)

Isn't this exactly what NCBI's ELink is for?

http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/elink_help.html
http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/entrezlinks.html

You'd need to work out which of the Entrez databases you are starting from (probably protein or genome), and then the relevant ELink link name (genome_pubmed, protein_pubmed or protein_pubmed_weighted look possible). Then for related PubMed articles, the link name is just pubmed_pubmed.

Peter
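For reference, the raw ELink request Peter describes is an HTTP GET carrying `dbfrom`, `db`, `id` and, optionally, `linkname` parameters. A minimal sketch of building such a URL in core Perl, assuming NCBI's current endpoint `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi` (the help pages linked above use the older URL layout); `elink_url` and `uri_escape_simple` are hypothetical helper names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed current E-utilities ELink endpoint.
my $ELINK = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi';

# Percent-encode everything except RFC 3986 unreserved characters.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build an ELink URL from named parameters; undefined ones are skipped.
# IDs are sent comma-separated as a single id= parameter.
sub elink_url {
    my (%p) = @_;
    my @pairs;
    for my $key (qw(dbfrom db linkname cmd)) {
        push @pairs, "$key=" . uri_escape_simple($p{$key}) if defined $p{$key};
    }
    push @pairs, 'id=' . join ',', map { uri_escape_simple($_) } @{ $p{ids} || [] };
    return "$ELINK?" . join '&', @pairs;
}

# PubMed record 52 -> related PubMed records
my $url = elink_url(dbfrom   => 'pubmed',
                    db       => 'pubmed',
                    linkname => 'pubmed_pubmed',
                    ids      => [52]);
print "$url\n";
```

The URL could then be fetched with, for instance, `LWP::Simple`'s `get`, although Bio::DB::EUtilities builds and sends these requests for you.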
Re: PubMed records (was: MeSH terms)

On Sun, Oct 25, 2009 at 7:12 AM, Peter <biopython@...> wrote:
> Isn't this exactly what NCBI's ELink is for?
> (snip)

Interesting. I knew they had some of this but I didn't know they had extended it so far (obviously the cloud of computers normally running BLAST searches is keeping itself busy when there is nothing better to do).

[Aside: Does anyone know exactly what NCBI's computing capacity is? Or how it compares with the capacity of the other major sites (Ensembl, Sanger, Broad, UCSC)?]

Now, extending Brian's helpful comment: is there anything within BioPerl, or CPAN in general, which can take the "link cloud" and extend it into a "degree of connectivity cloud"? If anyone has ever used the Linux utility "etherape" they will have an idea of what I'm talking about. Etherape displays local network traffic graphically, with nodes for the machines that are communicating with each other, and with the colors, sizes and brightness of the links between them representing the traffic type, amount and age of the communications. I'm sure that agencies which shall not be named have similar display programs which deal with the quantity and type of "chatter" between "persons of interest". This is all pretty standard stuff from a "network graph" standpoint.

This type of graphical information can be highly useful from a "systems biology" standpoint as well as an educational one, when one wants to understand something like the components of a protein complex, the time domain of gene expression, the activity of research in an area, etc. So if the information is available, what are the tools to make it useful? With the amount of computing power now available to home users, one could begin to think of very creative ways of using this type of information for data mining.

Robert
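One way to get from a "link cloud" to a "degree of connectivity cloud" is to count each record's distinct neighbours (its degree) and write the graph in Graphviz's DOT format, which tools such as `dot` or `neato` can render. A minimal sketch under those assumptions; the edge list is made-up toy data standing in for ELink results, `degree_graph` and `to_dot` are hypothetical helpers, and the CPAN Graph module would be a fuller alternative for the graph itself:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collapse a list of links (pairs of record IDs) into a simple undirected
# graph and count how many distinct neighbours each record has.
sub degree_graph {
    my (@links) = @_;
    my (%seen, %degree, @edges);
    for my $link (@links) {
        my ($u, $v) = sort @$link;      # undirected: normalise pair order
        next if $u eq $v;               # ignore self-links
        next if $seen{"$u\t$v"}++;      # drop duplicate links
        $degree{$_}++ for $u, $v;
        push @edges, [$u, $v];
    }
    return (\%degree, \@edges);
}

# Emit Graphviz DOT; each node's minimum width/height grows with its degree.
sub to_dot {
    my ($degree, $edges) = @_;
    my $dot = "graph related {\n";
    for my $node (sort keys %$degree) {
        my $size = sprintf '%.2f', 0.3 + 0.2 * $degree->{$node};
        $dot .= qq{  "$node" [width=$size, height=$size];\n};
    }
    $dot .= qq{  "$_->[0]" -- "$_->[1]";\n} for @$edges;
    return $dot . "}\n";
}

# Made-up record IDs; [2, 1] duplicates [1, 2] and is dropped.
my ($degree, $edges) = degree_graph([1, 2], [1, 3], [2, 1], [3, 4], [1, 4]);
print to_dot($degree, $edges);
```

Piping the output through `neato -Tpng -o related.png` would draw the cloud, with more highly connected records drawn as larger nodes.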
Re: PubMed records (was: MeSH terms)

On Oct 25, 2009, at 6:12 AM, Peter wrote:
> Isn't this exactly what NCBI's ELink is for?
> (snip)

Agreed. This should be possible through BioPerl's Bio::DB::EUtilities. I'll try to post an example up.

chris
Re: PubMed records (was: MeSH terms)

On Oct 25, 2009, at 9:16 AM, Chris Fields wrote:
> Agreed. This should be possible through BioPerl's
> Bio::DB::EUtilities. I'll try to post an example up.

As promised...

chris

PS. I've been thinking about a couple of small additions:

1) normalizing how one indicates whether or not to use eutil cookies/history (doing so currently requires one to be more explicit),
2) adding a pipeline-like utility where one could pass in a series of hashrefs with databases, and it would link them all together using eutil history/cookies.

If there is a need for this I can start working on it over the next few weeks, after I finally get the next set of alphas out.
--------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::EUtilities;

my $term = 'Notch3 AND "Mus musculus"[ORGN]';

# search the protein database, keeping the hits on the server (history)
my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'protein',
                                     -term       => $term,
                                     -usehistory => 'y');

my $hist = $eutil->next_History || die "No queue history returned";

# link the protein hits to PubMed, again storing the result as history;
# can stipulate the actual linkname as well using -linkname
$eutil->reset_parameters(-eutil   => 'elink',
                         -db      => 'pubmed',
                         -dbfrom  => 'protein',
                         -history => $hist,
                         -cmd     => 'neighbor_history');

$hist = $eutil->next_History || die "No queue history returned";

# the history now points at PubMed IDs, so summarize from pubmed;
# adjust -retstart/-retmax to get more results
$eutil->reset_parameters(-eutil   => 'esummary',
                         -db      => 'pubmed',
                         -history => $hist);

$eutil->print_all;