Yep, that's what I was thinking. The fragment in question is fairly
On Jul 8, 2009, at 8:31 PM, Mark A. Jensen wrote:
> A lack of low-complexity filtering (as seems apparent from this
> report snippet, if
> I understand that concept correctly) could explain the multiple
> query hits on a
> short (24bp) region of the same subject...
> ----- Original Message ----- From: "Chris Fields" <
cjfields@...
> >
> To: "Rytsareva, Inna (I)" <
IRytsareva@...>
> Cc: <
bioperl-l@...>
> Sent: Wednesday, July 08, 2009 9:08 PM
> Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl
>
>
>> I'm curious as to what this report looks like. The example report
>> you posted to the gbrowse list had serious issues (different
>> problem, 'No midline' error which I replicated); mainly there were
>> no blank lines making it pretty much invalid, so the parser had
>> issues with it. Example lines from one HSP:
>>
>> > gnl|DAS|24699 pDAB101580
>> Length = 12942
>> Score = 50.1 bits (25), Expect = 5e-06
>> Identities = 37/41 (90%)
>> Strand = Plus / Plus
>> Query: 10 ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50
>> ||||||||||||||| ||| |||||||| ||||| ||||||
>> Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>> Score = 46.1 bits (23), Expect = 8e-05
>> Identities = 35/39 (89%)
>> Strand = Plus / Plus
>> Query: 13 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51
>> ||||||||||||| ||| |||||||| ||||| ||||||
>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>> Score = 46.1 bits (23), Expect = 8e-05
>> Identities = 35/39 (89%)
>> Strand = Plus / Plus
>> Query: 14 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52
>> ||||||||||||| ||| |||||||| ||||| ||||||
>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>>
>> ...
>>
>> chris
>>
>>
>>
>>
>> On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote:
>>
>>> Hello,
>>>
>>> I have a follow script to parse the BLAST report:
>>>
>>> my $in = Bio::SearchIO->new ( -file =>$out_file,
>>> -format =>'blast') or die $!;
>>>
>>> while (my $result = $in->next_result) {
>>> while (my $hit = $result->next_hit)
>>> {
>>> while (my $hsp = $hit->next_hsp) {
>>> $qhit = $hit->name;
>>> $start = $hsp->hit->start;
>>> $end = $hsp->hit->end;
>>> }
>>>
>>>
>>> } print "Hit= ", $qhit,
>>> ",Start = ", $start,
>>> ",End = ", $end,"\n"; }
>>>
>>> Usually, the report has a number of the same hsp for each hit.
>>> Using "print" command it gives me a hit name, start and end
>>> positions
>>> for each hit, except last on. For last one it prints all the hsps.
>>> Something like this:
>>>
>>> Hit= gnl|DAS|22386,Start = 7578,End = 7601
>>> Hit= gnl|DAS|25627,Start = 2824,End = 2863
>>> Hit= gnl|DAS|25328,Start = 8864,End = 8887
>>> Hit= gnl|DAS|4890,Start = 1896,End = 1919
>>> Hit= gnl|DAS|12191,Start = 1898,End = 1921
>>> Hit= gnl|DAS|4276,Start = 557,End = 580
>>> Hit= gnl|DAS|12959,Start = 801,End = 824
>>> Hit= gnl|DAS|4092,Start = 2266,End = 2304
>>> Hit= gnl|DAS|19740,Start = 13572,End = 13610
>>> Hit= gnl|DAS|12393,Start = 3901,End = 3924
>>> Hit= gnl|DAS|25687,Start = 10415,End = 10438
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>
>>> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
>>> I don't need these duplicates.
>>> How can I fix that?
>>>
>>> Thanks,
>>> Inna Rytsareva
>>> Discovery Information Management
>>> Dow AgroSciences
>>> Indianapolis, IN
>>> 317-337-4716
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>>
Bioperl-l@...
>>>
http://lists.open-bio.org/mailman/listinfo/bioperl-l>>
>> _______________________________________________
>> Bioperl-l mailing list
>>
Bioperl-l@...
>>
http://lists.open-bio.org/mailman/listinfo/bioperl-l>>
>