Bio::Index::GenBank - by organism?

by Jay Hannah

Many thanks to Ewan Birney et al. for Bio::Index::*

I can throw away my awful grep-based index-by-accession stuff.   :)

Any chance someone has also written an organism-based index mechanism?
Something like...

while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) {
    print $seq->display_id . "\n";
}

Thanks,

j




_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Bio::Index::GenBank - by organism?

by Chris Fields

On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote:

> Many thanks to Ewan Birney et al. for Bio::Index::*
>
> I can throw away my awful grep-based index-by-accession stuff.   :)
>
> Any chance someone has also written an organism-based index
> mechanism? Something like...
>
> while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) {
>   print $seq->display_id . "\n";
> }
>
> Thanks,
>
> j

It should work via id_parser(); from Bio::Index::GenBank:

    $inx->id_parser(\&get_id);
    # make the index
    $inx->make_index($file_name);

    # here is where the retrieval key is specified
    sub get_id {
       my $line = shift;
       $line =~ /clone="(\S+)"/;
       $1;
    }

Change the code ref to deal with the line you want and parse the name
out.  Caveat: this may not be absolutely perfect (it only passes in a
line at a time, and some species lines will wrap).  Also, I'm not sure
how this would work in cases where multiple sequences from the same
species are present.
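
As a rough illustration of the suggestion above, here is a minimal sketch of a callback that keys on the ORGANISM line instead of the clone qualifier. The name get_organism is hypothetical, and the sketch assumes the usual flat-file layout in which the organism name sits on a single indented ORGANISM line; it does not handle names that wrap onto a second line, and it does not address what the index does when several records share one organism.

```perl
use strict;
use warnings;

# Hypothetical id_parser callback (not part of BioPerl): returns the
# organism name when given a GenBank ORGANISM line, nothing otherwise.
# Assumes the name fits on one line; wrapped names are not handled.
sub get_organism {
    my $line = shift;
    return unless $line =~ /^\s+ORGANISM\s+(\S.*?)\s*$/;
    return $1;
}

# Try it on a few sample lines; no BioPerl is needed for this check.
my @lines = (
    "LOCUS       XX000001\n",
    "  ORGANISM  Xanthomonas campestris\n",
    "            Bacteria; Proteobacteria.\n",
);
for my $line (@lines) {
    my ($organism) = get_organism($line);
    print "$organism\n" if defined $organism;
}
```

It would be passed in the same way as the snippet above, i.e. $inx->id_parser(\&get_organism) before make_index. Returning early on a failed match avoids handing back a stale $1.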

The other option is to preparse everything and tie a hash to store a  
species->UID map, then use that along with your Bio::Index index to  
grab what you need.
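
The preparse option might look something like the sketch below. The names build_species_map and accessions_for are made up for illustration, as are the two sample records. Accessions are stored as one space-separated string per organism so that the hash could be a plain hash or a tied DBM hash; which DBM module to tie to, and whether its record-size limits are acceptable, is left as an assumption to check.

```perl
use strict;
use warnings;

# Scan GenBank flat-file text and record, for each organism name, the
# primary accessions seen for it. Values are space-separated strings so
# that %$map may be a plain hash or a tied DBM hash.
sub build_species_map {
    my ($fh, $map) = @_;
    my $acc;
    while (my $line = <$fh>) {
        if ($line =~ /^ACCESSION\s+(\S+)/) {
            $acc = $1;
        }
        elsif (defined $acc && $line =~ /^\s+ORGANISM\s+(\S.*?)\s*$/) {
            my $org = $1;
            $map->{$org} = exists $map->{$org} ? "$map->{$org} $acc" : $acc;
        }
        elsif ($line =~ m{^//}) {
            undef $acc;    # end of record
        }
    }
    return $map;
}

# Accessions of every organism whose name matches $pattern (case-insensitive).
sub accessions_for {
    my ($map, $pattern) = @_;
    my @accs;
    for my $org (sort grep { /$pattern/i } keys %$map) {
        push @accs, split ' ', $map->{$org};
    }
    return @accs;
}

# Demo on two made-up records read from an in-memory filehandle.
my $text = <<'END';
LOCUS       XX000001
ACCESSION   XX000001
  ORGANISM  Xanthomonas campestris
//
LOCUS       XX000002
ACCESSION   XX000002
  ORGANISM  Escherichia coli
//
END
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my %map;    # could instead be tied to a DBM file to persist the map
build_species_map($fh, \%map);
print "$_\n" for accessions_for(\%map, 'Xanthomonas');
```

Each accession returned could then be handed to $inx->fetch($acc) on the accession-keyed Bio::Index::GenBank index.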

chris

Re: Bio::Index::GenBank - by organism?

by Jason Stajich

You might also look at what mygenbank does:
http://homepage.mac.com/iankorf/mygenbank.html

On Nov 9, 2009, at 7:55 PM, Chris Fields wrote:

> On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote:
>
>> Many thanks to Ewan Birney et al. for Bio::Index::*
>>
>> I can throw away my awful grep-based index-by-accession stuff.   :)
>>
>> Any chance someone has also written an organism-based index
>> mechanism? Something like...
>>
>> while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) {
>>  print $seq->display_id . "\n";
>> }
>>
>> Thanks,
>>
>> j
>
> It should work via id_parser(); from Bio::Index::GenBank:
>
>   $inx->id_parser(\&get_id);
>   # make the index
>   $inx->make_index($file_name);
>
>   # here is where the retrieval key is specified
>   sub get_id {
>      my $line = shift;
>      $line =~ /clone="(\S+)"/;
>      $1;
>   }
>
> Change the code ref to deal with the line you want and parse the name
> out.  Caveat: this may not be absolutely perfect (it only passes in
> a line at a time, and some species lines will wrap).  Also, I'm not
> sure how this would work in cases where multiple sequences from the
> same species are present.
>
> The other option is to preparse everything and tie a hash to store a  
> species->UID map, then use that along with your Bio::Index index to  
> grab what you need.
>
> chris

--
Jason Stajich
jason.stajich@...
jason@...



Re: Bio::Index::GenBank - by organism?

by Jay Hannah

On Nov 9, 2009, at 9:55 PM, Chris Fields wrote:

> It should work via id_parser(); from Bio::Index::GenBank:
>
>   $inx->id_parser(\&get_id);
>   # make the index
>   $inx->make_index($file_name);
>
>   # here is where the retrieval key is specified
>   sub get_id {
>      my $line = shift;
>      $line =~ /clone="(\S+)"/;
>      $1;
>   }

This worked great for me today (tackling a different problem than the original).  Thanks!!

j




Re: Bio::Index::GenBank - by organism?

by Jay Hannah

On Nov 10, 2009, at 12:50 PM, Jason Stajich wrote:
> You might also look at what mygenbank does:
> http://homepage.mac.com/iankorf/mygenbank.html

It appears, perhaps, that BioSQL can provide *foo* searching like so:

http://www.biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME

 SELECT DISTINCT include.ncbi_taxon_id FROM taxon
    INNER JOIN taxon AS include ON
      (include.left_value BETWEEN taxon.left_value
        AND taxon.right_value)
 WHERE taxon.taxon_id IN
   (SELECT taxon_id FROM taxon_name
    WHERE name LIKE '%fungi%')
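
To make the nested-set idea behind that query concrete, here is a small in-memory Perl sketch. It is only an illustration of the left_value/right_value containment test, not BioSQL code: the rows, ids, and left/right numbers are invented, and taxa_under is a hypothetical helper.

```perl
use strict;
use warnings;

# Toy stand-in for the BioSQL taxon/taxon_name tables. All ids and
# left/right values below are invented for illustration.
my @taxa = (
    { id => 1, name => 'cellular organisms', left => 1, right => 10 },
    { id => 2, name => 'Fungi',              left => 2, right => 7  },
    { id => 3, name => 'Ascomycota',         left => 3, right => 4  },
    { id => 4, name => 'Basidiomycota',      left => 5, right => 6  },
    { id => 5, name => 'Bacteria',           left => 8, right => 9  },
);

# Ids of every taxon whose name matches $pattern, plus its descendants:
# a descendant's left value lies between the ancestor's left and right.
sub taxa_under {
    my ($pattern, @rows) = @_;
    my %hit;
    for my $parent (grep { $_->{name} =~ /$pattern/i } @rows) {
        for my $taxon (@rows) {
            $hit{ $taxon->{id} } = 1
                if $taxon->{left} >= $parent->{left}
                && $taxon->{left} <= $parent->{right};
        }
    }
    return sort { $a <=> $b } keys %hit;
}

print join(',', taxa_under('fungi', @taxa)), "\n";    # prints 2,3,4
```

The name match plays the role of the LIKE '%fungi%' subquery, and the inner loop plays the role of the BETWEEN join.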

So I think we're going to chase that for a while.

I didn't see a *foo* search in MyGenBank?

Thanks,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah