While loop - SearchIO for BioPerl

Hello,

I have the following script to parse the BLAST report:

    my $in = Bio::SearchIO->new( -file   => $out_file,
                                 -format => 'blast' ) or die $!;

    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                $qhit  = $hit->name;
                $start = $hsp->hit->start;
                $end   = $hsp->hit->end;
            }
        }
        print "Hit= ", $qhit,
              ",Start = ", $start,
              ",End = ", $end, "\n";
    }

Usually the report has several identical HSPs for each hit. The print command gives me the hit name and the start and end positions for each hit, except the last one; for the last one it prints all the HSPs. Something like this:

    Hit= gnl|DAS|22386,Start = 7578,End = 7601
    Hit= gnl|DAS|25627,Start = 2824,End = 2863
    Hit= gnl|DAS|25328,Start = 8864,End = 8887
    Hit= gnl|DAS|4890,Start = 1896,End = 1919
    Hit= gnl|DAS|12191,Start = 1898,End = 1921
    Hit= gnl|DAS|4276,Start = 557,End = 580
    Hit= gnl|DAS|12959,Start = 801,End = 824
    Hit= gnl|DAS|4092,Start = 2266,End = 2304
    Hit= gnl|DAS|19740,Start = 13572,End = 13610
    Hit= gnl|DAS|12393,Start = 3901,End = 3924
    Hit= gnl|DAS|25687,Start = 10415,End = 10438
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433
    Hit= gnl|DAS|12277,Start = 7410,End = 7433

Here "Hit= gnl|DAS|12277,Start = 7410,End = 7433" is the last one. I don't need these duplicates. How can I fix that?

Thanks,
Inna Rytsareva
Discovery Information Management
Dow AgroSciences
Indianapolis, IN
317-337-4716

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l
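In the script above, the `print` sits inside the `next_result` loop but after the `next_hit` loop has closed, so it runs once per query (result) and shows only the last HSP of the last hit. Because `$qhit`, `$start` and `$end` are never reset, a query with no hits also re-prints whatever the previous query left behind, so each repeated line is one query, either one that genuinely ended on the same hit or one that had no hits at all. A minimal sketch that prints inside the HSP loop and tags every line with the query name, assuming a plain-text BLAST report path is given on the command line; `hsp_lines` is a hypothetical helper, while `query_name`, `name` and `start('hit')`/`end('hit')` are standard SearchIO accessors:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collect one line per HSP, tagged with the query it belongs to, so that
# every printed line can be traced back to the result that produced it.
# Works with any object offering the Bio::SearchIO iterator methods.
sub hsp_lines {
    my ($in) = @_;
    my @lines;
    while ( my $result = $in->next_result ) {
        # Everything is lexical and built inside the loops, so values
        # cannot leak from one query into the next.
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                push @lines, sprintf 'Query=%s,Hit=%s,Start=%d,End=%d',
                  $result->query_name, $hit->name,
                  $hsp->start('hit'), $hsp->end('hit');
            }
        }
    }
    return @lines;
}

# Parse the report named on the command line (BioPerl is loaded only here).
sub main {
    my ($out_file) = @_;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -file => $out_file, -format => 'blast' );
    print "$_\n" for hsp_lines($in);
}

main(@ARGV) if @ARGV;
```

With this layout a query with no hits prints nothing instead of repeating the previous line. If exact repeats are still unwanted, a `%seen` hash keyed on the formatted line is enough to drop them.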
Re: While loop - SearchIO for BioPerl

Inna,

> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
> I don't need these duplicates.
> How can I fix that?

> $start = $hsp->hit->start;
> $end = $hsp->hit->end;

Are you sure you mean $hsp->hit->start? Perhaps you mean $hsp->start() or $hsp->start('hit')?

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA
Re: While loop - SearchIO for BioPerl

Both work... TMTOWTDI.

$hsp->query->start and $hsp->start('query') are equivalent, as are $hsp->hit->start and $hsp->start('hit').

On Jul 8, 2009, at 5:25 PM, Torsten Seemann wrote:
> Are you sure you mean $hsp->hit->start ?
> Perhaps you mean $hsp->start() or $hsp->start('hit') ?

--
Jason Stajich
jason@...
Re: While loop - SearchIO for BioPerl

My guess would be that you have multiple query sequences (27, to be exact) that hit the same subject, viz. 12277.

MAJ

----- Original Message -----
From: "Rytsareva, Inna (I)" <IRytsareva@...>
To: <bioperl-l@...>
Sent: Wednesday, July 08, 2009 3:42 PM
Subject: [Bioperl-l] While loop - SearchIO for BioPerl
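This guess can be checked directly by counting, for each subject, how many distinct query names report it. A minimal sketch, assuming the report parses cleanly with `Bio::SearchIO` and its path is passed on the command line; `queries_per_hit` is a hypothetical helper, not part of BioPerl:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Map each hit (subject) name to the number of distinct queries that hit it.
sub queries_per_hit {
    my ($in) = @_;
    my %queries;    # hit name => { query name => 1 }
    while ( my $result = $in->next_result ) {
        my $query = $result->query_name;
        while ( my $hit = $result->next_hit ) {
            $queries{ $hit->name }{$query} = 1;
        }
    }
    # Collapse each inner hash to its size (the number of distinct queries).
    return map { ( $_ => scalar keys %{ $queries{$_} } ) } keys %queries;
}

sub main {
    my ($out_file) = @_;
    require Bio::SearchIO;    # loaded only when a report is actually parsed
    my $in = Bio::SearchIO->new( -file => $out_file, -format => 'blast' );
    my %count = queries_per_hit($in);
    # Subjects hit by the most queries come first.
    for my $name ( sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count ) {
        print "$name\t$count{$name}\n";
    }
}

main(@ARGV) if @ARGV;
```

If the guess is right, gnl|DAS|12277 should appear near the top of the output with a count in line with the repeated lines.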
Re: While loop - SearchIO for BioPerl

I'm curious as to what this report looks like. The example report you posted to the gbrowse list had serious issues (a different problem, the 'No midline' error, which I replicated). Mainly, there were no blank lines, which made it pretty much invalid, so the parser had trouble with it. Example lines from one HSP:

    > gnl|DAS|24699 pDAB101580
    Length = 12942
    Score = 50.1 bits (25), Expect = 5e-06
    Identities = 37/41 (90%)
    Strand = Plus / Plus
    Query: 10   ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50
                ||||||||||||||| ||| |||||||| ||||| ||||||
    Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
    Score = 46.1 bits (23), Expect = 8e-05
    Identities = 35/39 (89%)
    Strand = Plus / Plus
    Query: 13   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51
                ||||||||||||| ||| |||||||| ||||| ||||||
    Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
    Score = 46.1 bits (23), Expect = 8e-05
    Identities = 35/39 (89%)
    Strand = Plus / Plus
    Query: 14   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52
                ||||||||||||| ||| |||||||| ||||| ||||||
    Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
    ...

chris

On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote:
> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
> I don't need these duplicates.
Re: While loop - SearchIO for BioPerl

A lack of low-complexity filtering (as seems apparent from this report snippet, if I understand that concept correctly) could explain the multiple query hits on a short (24 bp) region of the same subject...

----- Original Message -----
From: "Chris Fields" <cjfields@...>
To: "Rytsareva, Inna (I)" <IRytsareva@...>
Cc: <bioperl-l@...>
Sent: Wednesday, July 08, 2009 9:08 PM
Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl
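The proper remedy is to keep BLAST's own low-complexity filter switched on when searching. If re-running the search is not an option, a crude post-filter can skip HSPs whose aligned query is dominated by a single base, like the poly-A alignments in the snippet above. This is a swapped-in heuristic, not what BLAST's filter does; the 0.8 threshold and the helper names `dominant_fraction` and `is_low_complexity` are assumptions, and only `query_string`, `start('hit')` and `end('hit')` come from the SearchIO HSP interface:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Fraction of a sequence made up by its single most common residue
# (gaps and other non-letters ignored). Values near 1 mean runs like poly-A.
sub dominant_fraction {
    my ($seq) = @_;
    ( my $s = lc $seq ) =~ s/[^a-z]//g;
    return 0 unless length $s;
    my %n;
    $n{$_}++ for split //, $s;
    my ($max) = sort { $b <=> $a } values %n;
    return $max / length $s;
}

# Heuristic: treat an HSP as low complexity when one base dominates its
# aligned query string. The 0.8 default is an arbitrary assumption.
sub is_low_complexity {
    my ( $hsp, $threshold ) = @_;
    $threshold = 0.8 unless defined $threshold;
    return dominant_fraction( $hsp->query_string ) >= $threshold;
}

sub main {
    my ($out_file) = @_;
    require Bio::SearchIO;    # loaded only when a report is actually parsed
    my $in = Bio::SearchIO->new( -file => $out_file, -format => 'blast' );
    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                next if is_low_complexity($hsp);    # drop poly-A style HSPs
                printf "%s\t%s\t%d\t%d\n", $result->query_name, $hit->name,
                  $hsp->start('hit'), $hsp->end('hit');
            }
        }
    }
}

main(@ARGV) if @ARGV;
```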
Re: While loop - SearchIO for BioPerl

Yep, that's what I was thinking. The fragment in question is fairly short.

Inna, if you want the best HSP, you could just grab the one that best fits what you expect (best E-value, score, whatever).

chris

On Jul 8, 2009, at 8:31 PM, Mark A. Jensen wrote:
> A lack of low-complexity filtering (as seems apparent from this report snippet, if
> I understand that concept correctly) could explain the multiple query hits on a
> short (24bp) region of the same subject...
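That suggestion can be sketched as a helper that walks a hit's HSPs and keeps the one with the lowest E-value, breaking ties on bit score. The ranking is an assumption ("best" could equally mean percent identity or alignment length), `best_hsp` is a hypothetical name, and it relies only on the `evalue` and `bits` accessors of SearchIO HSP objects, assuming they return values Perl can compare numerically:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the "best" HSP of a hit: lowest E-value, ties broken by the
# higher bit score. Returns undef if the hit has no HSPs.
sub best_hsp {
    my ($hit) = @_;
    my $best;
    while ( my $hsp = $hit->next_hsp ) {
        if (   !defined $best
            || $hsp->evalue < $best->evalue
            || ( $hsp->evalue == $best->evalue && $hsp->bits > $best->bits ) )
        {
            $best = $hsp;
        }
    }
    return $best;
}

# One line per query/hit pair, using only that hit's best HSP.
sub main {
    my ($out_file) = @_;
    require Bio::SearchIO;    # loaded only when a report is actually parsed
    my $in = Bio::SearchIO->new( -file => $out_file, -format => 'blast' );
    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            my $hsp = best_hsp($hit) or next;
            printf "%s\t%s\t%d\t%d\t%s\n", $result->query_name, $hit->name,
              $hsp->start('hit'), $hsp->end('hit'), $hsp->evalue;
        }
    }
}

main(@ARGV) if @ARGV;
```

Note that `best_hsp` consumes the hit's HSP iterator, so any other per-HSP work on that hit should happen in the same pass.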
Re: While loop - SearchIO for BioPerl

Allow me to shamelessly plug the following:
http://www.bioperl.org/wiki/HOWTO:Tiling#Quick_and_Dirty_.22Tiling.22

MAJ

----- Original Message -----
From: "Chris Fields" <cjfields@...>
To: "Mark A. Jensen" <maj@...>
Cc: "Rytsareva, Inna (I)" <IRytsareva@...>; <bioperl-l@...>
Sent: Wednesday, July 08, 2009 9:41 PM
Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl
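The linked HOWTO is the place to look for tiling proper. What follows is not code from that page but an independent, minimal sketch of the underlying idea: collapse the subject ranges of a hit's HSPs into non-overlapping intervals, so a stack of near-identical HSPs (such as the 4619-4659 and 4621-4659 ones in Chris's snippet) is reported as one region. `merge_ranges` and `hit_coverage` are hypothetical names, and merging ranges that merely touch (end + 1) is a design choice:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Merge overlapping or touching [start, end] ranges into disjoint ones.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $r (@sorted) {
        if ( @merged && $r->[0] <= $merged[-1][1] + 1 ) {
            # Overlaps or abuts the previous range: extend it if needed.
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$r];    # copy so the caller's arrays stay untouched
        }
    }
    return @merged;
}

# Subject coordinates covered by all HSPs of one hit.
sub hit_coverage {
    my ($hit) = @_;
    my @ranges;
    while ( my $hsp = $hit->next_hsp ) {
        # Order the pair defensively in case start > end.
        my ( $s, $e ) = sort { $a <=> $b } ( $hsp->start('hit'), $hsp->end('hit') );
        push @ranges, [ $s, $e ];
    }
    return merge_ranges(@ranges);
}

sub main {
    my ($out_file) = @_;
    require Bio::SearchIO;    # loaded only when a report is actually parsed
    my $in = Bio::SearchIO->new( -file => $out_file, -format => 'blast' );
    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            my @cov = hit_coverage($hit);
            print join( "\t", $result->query_name, $hit->name,
                map { $_->[0] . '-' . $_->[1] } @cov ), "\n";
        }
    }
}

main(@ARGV) if @ARGV;
```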