While loop - SearchIO for BioPerl

by Rytsareva, Inna (I)

Hello,

I have the following script to parse a BLAST report:

my $in = Bio::SearchIO->new(-file   => $out_file,
                            -format => 'blast') or die $!;

while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            $qhit  = $hit->name;
            $start = $hsp->hit->start;
            $end   = $hsp->hit->end;
        }
    }
    print "Hit= ", $qhit,
          ",Start = ", $start,
          ",End = ", $end, "\n";
}

Usually the report has several similar HSPs for each hit.
With the print statement above I get one hit name with its start and end
positions for each hit, except for the last one. For the last one it prints
all of the HSPs, something like this:

Hit= gnl|DAS|22386,Start = 7578,End = 7601
Hit= gnl|DAS|25627,Start = 2824,End = 2863
Hit= gnl|DAS|25328,Start = 8864,End = 8887
Hit= gnl|DAS|4890,Start = 1896,End = 1919
Hit= gnl|DAS|12191,Start = 1898,End = 1921
Hit= gnl|DAS|4276,Start = 557,End = 580
Hit= gnl|DAS|12959,Start = 801,End = 824
Hit= gnl|DAS|4092,Start = 2266,End = 2304
Hit= gnl|DAS|19740,Start = 13572,End = 13610
Hit= gnl|DAS|12393,Start = 3901,End = 3924
Hit= gnl|DAS|25687,Start = 10415,End = 10438
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
Hit= gnl|DAS|12277,Start = 7410,End = 7433
 
Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
I don't need these duplicates.
How can I fix that?

Thanks,
Inna Rytsareva
Discovery Information Management
Dow AgroSciences
Indianapolis, IN
317-337-4716


_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: While loop - SearchIO for BioPerl

by Torsten Seemann

Inna,

> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
> I don't need these duplicates.
> How can I fix that?

>                        $start = $hsp->hit->start;
>                        $end = $hsp->hit->end;

Are you sure you mean $hsp->hit->start ?
Perhaps you mean $hsp->start() or $hsp->start('hit') ?


--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
University, AUSTRALIA

Re: While loop - SearchIO for BioPerl

by Jason Stajich

Both work (TMTOWTDI):
$hsp->query->start and $hsp->start('query') are equivalent,
as are $hsp->hit->start and $hsp->start('hit').
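
The equivalence described above can be checked on a real report. The following is a minimal sketch, not part of the original message: hit_coords_both_ways is a hypothetical helper name, and the report path is assumed to be passed as the first command-line argument. Bio::SearchIO is loaded only when a path is given.

```perl
use strict;
use warnings;

# Hypothetical helper: read the hit (subject) coordinates of one HSP
# through both accessor styles mentioned in the thread.
sub hit_coords_both_ways {
    my ($hsp) = @_;
    my @via_object   = ($hsp->hit->start,   $hsp->hit->end);
    my @via_argument = ($hsp->start('hit'), $hsp->end('hit'));
    return (\@via_object, \@via_argument);
}

# Runs only when a BLAST report path is supplied (an assumption of this sketch).
if (my $report = shift @ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-file => $report, -format => 'blast');
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                my ($obj, $arg) = hit_coords_both_ways($hsp);
                # start() without an argument refers to the query, not the hit
                printf "%s: hit->start/end = %d..%d, start/end('hit') = %d..%d, start() = %d\n",
                    $hit->name, @$obj, @$arg, $hsp->start;
            }
        }
    }
}
```

Note that $hsp->start() with no argument returns the query coordinate, so it is not a drop-in replacement for $hsp->hit->start.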

On Jul 8, 2009, at 5:25 PM, Torsten Seemann wrote:

> Are you sure you mean $hsp->hit->start ?
> Perhaps you mean $hsp->start() or $hsp->start('hit') ?

--
Jason Stajich
jason@...

Re: While loop - SearchIO for BioPerl

by Mark A. Jensen

My guess would be you have multiple query sequences (27, to be exact)
that hit the same subject, viz. 12277
MAJ
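
One way to test this guess is to print the query name next to each hit. In the script as posted, the print statement sits after the hit loop, so it runs once per result (query) and shows only the last HSP of the last hit. $qhit, $start and $end are not lexically scoped, so a result with no hits would reprint the previous values. The sketch below is an illustration under those observations, not the poster's code. hsp_records and unique_by_hit_region are hypothetical helper names, and the report path is assumed to come from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: one record per HSP, tagged with the query it came from.
sub hsp_records {
    my ($in) = @_;
    my @records;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                push @records, {
                    query => $result->query_name,
                    hit   => $hit->name,
                    start => $hsp->start('hit'),
                    end   => $hsp->end('hit'),
                };
            }
        }
    }
    return @records;
}

# Hypothetical helper: keep the first record for each (hit, start, end) triple.
sub unique_by_hit_region {
    my %seen;
    return grep { !$seen{ join("\t", @{$_}{qw(hit start end)}) }++ } @_;
}

# Runs only when a BLAST report path is supplied (an assumption of this sketch).
if (my $report = shift @ARGV) {
    require Bio::SearchIO;
    my $in      = Bio::SearchIO->new(-file => $report, -format => 'blast');
    my @records = hsp_records($in);

    # Every HSP with its query name, to see where repeated lines come from
    print "Query= $_->{query},Hit= $_->{hit},Start = $_->{start},End = $_->{end}\n"
        for @records;

    # The same list with repeated hit regions collapsed
    print "Hit= $_->{hit},Start = $_->{start},End = $_->{end}\n"
        for unique_by_hit_region(@records);
}
```

If the repeated lines carry different query names, the duplicates come from several queries matching the same subject region. Whether collapsing them is appropriate depends on what the downstream analysis needs.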

----- Original Message -----
From: "Rytsareva, Inna (I)" <IRytsareva@...>
To: <bioperl-l@...>
Sent: Wednesday, July 08, 2009 3:42 PM
Subject: [Bioperl-l] While loop - SearchIO for BioPerl


> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
> I don't need these duplicates.
> How can I fix that?

Re: While loop - SearchIO for BioPerl

by Chris Fields

I'm curious as to what this report looks like. The example report you
posted to the gbrowse list had serious issues (a different problem, a 'No
midline' error which I replicated); mainly there were no blank lines,
making it pretty much invalid, so the parser had issues with it.
Example lines from one hit:

 > gnl|DAS|24699 pDAB101580
           Length = 12942
  Score = 50.1 bits (25), Expect = 5e-06
  Identities = 37/41 (90%)
  Strand = Plus / Plus
Query: 10   ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50
             ||||||||||||||| ||| |||||||| ||||| ||||||
Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
  Score = 46.1 bits (23), Expect = 8e-05
  Identities = 35/39 (89%)
  Strand = Plus / Plus
Query: 13   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51
             ||||||||||||| ||| |||||||| ||||| ||||||
Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
  Score = 46.1 bits (23), Expect = 8e-05
  Identities = 35/39 (89%)
  Strand = Plus / Plus
Query: 14   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52
             ||||||||||||| ||| |||||||| ||||| ||||||
Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659

...

chris





Re: While loop - SearchIO for BioPerl

by Mark A. Jensen

A lack of low-complexity filtering (as seems apparent from this report snippet,
if I understand that concept correctly) could explain the multiple query hits
on a short (24bp) region of the same subject...

----- Original Message -----
From: "Chris Fields" <cjfields@...>
To: "Rytsareva, Inna (I)" <IRytsareva@...>
Cc: <bioperl-l@...>
Sent: Wednesday, July 08, 2009 9:08 PM
Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl


> I'm curious as to what this report looks like.  The example report you  posted
> to the gbrowse list had serious issues (different problem, 'No  midline' error
> which I replicated); mainly there were no blank lines  making it pretty much
> invalid, so the parser had issues with it.   Example lines from one HSP:

Re: While loop - SearchIO for BioPerl

by Chris Fields

Yep, that's what I was thinking. The fragment in question is fairly
short.

Inna, if you want the best HSP you could just grab the one that best
fits what you expect (best e-value, score, whatever).

chris
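
A minimal sketch of that suggestion, assuming "best" means the lowest e-value with the bit score as a tie-breaker (other criteria are equally valid). best_hsp is a hypothetical helper name, and the report path is assumed to come from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: lowest e-value wins; ties go to the higher bit score.
# Returns undef when the hit has no HSPs.
sub best_hsp {
    my ($hit) = @_;
    my ($best) = sort { $a->evalue <=> $b->evalue || $b->bits <=> $a->bits }
                 $hit->hsps;
    return $best;
}

# Runs only when a BLAST report path is supplied (an assumption of this sketch).
if (my $report = shift @ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-file => $report, -format => 'blast');
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            my $hsp = best_hsp($hit) or next;
            # One line per hit, printed inside the hit loop
            print "Hit= ", $hit->name,
                  ",Start = ", $hsp->start('hit'),
                  ",End = ", $hsp->end('hit'), "\n";
        }
    }
}
```

Because the print happens inside the hit loop and uses lexical variables, each hit produces exactly one line and nothing carries over from an earlier result.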

On Jul 8, 2009, at 8:31 PM, Mark A. Jensen wrote:

> A lack of low-complexity filtering  (as seems apparent from this  
> report snippet, if
> I understand that concept correctly) could explain the multiple  
> query hits on a
> short (24bp) region of the same subject...

Re: While loop - SearchIO for BioPerl

by Mark A. Jensen

Allow me to shamelessly plug the following:

http://www.bioperl.org/wiki/HOWTO:Tiling#Quick_and_Dirty_.22Tiling.22

MAJ
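
As a rough illustration of the idea of collapsing overlapping HSPs into non-redundant subject ranges, here is a plain-Perl sketch. It is neither the HOWTO's method nor BioPerl's tiling code. merge_ranges is a hypothetical helper, and the sample coordinates are the Sbjct ranges from the report excerpt earlier in the thread plus the repeated 7410..7433 region.

```perl
use strict;
use warnings;

# Hypothetical helper: merge overlapping [start, end] ranges (inclusive).
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1]) {
            # Overlaps the previous range: extend it if this one reaches further
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$r];    # copy so the caller's data is untouched
        }
    }
    return @merged;
}

my @ranges = ([4619, 4659], [4621, 4659], [4621, 4659], [7410, 7433]);
print join("..", @$_), "\n" for merge_ranges(@ranges);
# prints:
# 4619..4659
# 7410..7433
```

In practice the ranges would be collected per hit from $hsp->start('hit') and $hsp->end('hit') before merging.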
----- Original Message -----
From: "Chris Fields" <cjfields@...>
To: "Mark A. Jensen" <maj@...>
Cc: "Rytsareva, Inna (I)" <IRytsareva@...>; <bioperl-l@...>
Sent: Wednesday, July 08, 2009 9:41 PM
Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl


> Yep, that's what I was thinking.  The fragment in question is fairly  
> short.
>
> Inna, if you want the best HSP you could just grab the one that best  
> fits what you expect (best eval, score, whatever).