|
View:
New views
17 Messages
—
Rating Filter:
Alert me
|
|
|
different results with remote-blast skriptHi again :)
please, I only have this little question: why do I get different results with my remote::blast perl skript then on the ncbi blast homepage? I am using blastp, the query is an amino-sequence (different results with any sequence, differences not only in number of hits but even in e-values, scores etc...), the database is 'nr'. PLEASE help me, thank you in advance, Jonas ps: my skript: ################################################################################ use Bio::Seq::SeqFactory; use Bio::Tools::Run::RemoteBlast; use strict; my @blast_report; my $prog = 'blastp'; my $db = 'nr'; my $e_val= '1e-10'; #my $e_val= '10'; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO' ); my $factory = Bio::Tools::Run::RemoteBlast->new(@params); $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1'; my $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE'; #$v is just to turn on and off the messages my $v = 1; my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq"); my $filename='temp2.out'; my $r = $factory->submit_blast($seq); print STDERR "waiting..." if( $v > 0 ); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." if ( $v > 0 ); } else { my $result = $rc->next_result(); $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { next unless ( $v > 0); print "\thit name is ", $hit->name, "\n"; while( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } } } } @blast_report = get_file_data ($filename); return @blast_report; ################################################################################## _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: different results with remote-blast skriptI'd guess it's a difference in the parameters used.
Interesting that both have the number of letters in the db as "-1,125,070,205", I assume that's a bug :-) Stats from your remote_blast: 'stats' => { 'S1' => '42', 'S1_bits' => '20.8', 'lambda' => '0.309', 'entropy' => '0.345', 'kappa_gapped' => '0.0410', 'T' => '11', 'kappa' => '0.122', 'X3_bits' => '24.7', 'X1' => '16', 'lambda_gapped' => '0.267', 'X2' => '38', 'S2' => '74', 'seqs_better_than_cutoff' => '0', 'posted_date' => 'Jul 4, 2009 4:41 AM', 'Hits_to_DB' => '60102303', 'dbletters' => '-1125070205', 'A' => '40', 'num_successful_extensions' => '2004', 'num_extensions' => '1436892', 'X1_bits' => '7.1', 'X3' => '64', 'entropy_gapped' => '0.140', 'dbentries' => '9252258', 'X2_bits' => '14.6', 'S2_bits' => '33.1' } Stats from a blast done on the NCBI webpage: Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects Posted date: Jul 4, 2009 4:41 AM Number of letters in database: -1,125,070,205 Number of sequences in database: 9,252,258 Lambda K H 0.309 0.124 0.340 Gapped Lambda K H 0.267 0.0410 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Sequences: 9252258 Number of Hits to DB: 86493230 Number of extensions: 3101413 Number of successful extensions: 9001 Number of sequences better than 100: 65 Number of HSP's better than 100 without gapping: 0 Number of HSP's gapped: 9000 Number of HSP's successfully gapped: 66 Length of query: 150 Length of database: 3169897087 Length adjustment: 113 Effective length of query: 37 Effective length of database: 2124391933 Effective search space: 78602501521 Effective search space used: 78602501521 T: 11 A: 40 X1: 16 (7.1 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 42 (20.8 bits) S2: 65 (29.6 bits) > -----Original Message----- > From: bioperl-l-bounces@... [mailto:bioperl-l- > bounces@...] On Behalf Of Jonas Schaer > Sent: Sunday, 28 June 2009 10:15 p.m. > To: BioPerl List > Subject: [Bioperl-l] different results with remote-blast skript > > Hi again :) > please, I only have this little question: > why do I get different results with my remote::blast perl skript then on the > ncbi blast homepage? > I am using blastp, the query is an amino-sequence (different results with any > sequence, differences not only in number of hits but even in e-values, scores > etc...), the database is 'nr'. > PLEASE help me, > thank you in advance, > Jonas > > ps: my skript: > ############################################################################## > ## > use Bio::Seq::SeqFactory; > use Bio::Tools::Run::RemoteBlast; > use strict; > my @blast_report; > my $prog = 'blastp'; > my $db = 'nr'; > my $e_val= '1e-10'; > #my $e_val= '10'; > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1'; > > my > $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD > PDDEYE'; > #$v is just to turn on and off the messages > my $v = 1; > my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq'); > my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => > "$blast_seq"); > my $filename='temp2.out'; > my $r = $factory->submit_blast($seq); > print STDERR "waiting..." if( $v > 0 ); > while ( my @rids = $factory->each_rid ) > { > foreach my $rid ( @rids ) > { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > } > else > { > my $result = $rc->next_result(); > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) > { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) > { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > > > } > @blast_report = get_file_data ($filename); > return @blast_report; > ############################################################################## > #### > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: different results with remote-blast skriptinteger overflow in blast....
On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > I'd guess it's a difference in the parameters used. > Interesting that both have the number of letters in the db as > "-1,125,070,205", I assume that's a bug :-) > > Stats from your remote_blast: > > 'stats' => { > 'S1' => '42', > 'S1_bits' => '20.8', > 'lambda' => '0.309', > 'entropy' => '0.345', > 'kappa_gapped' => '0.0410', > 'T' => '11', > 'kappa' => '0.122', > 'X3_bits' => '24.7', > 'X1' => '16', > 'lambda_gapped' => '0.267', > 'X2' => '38', > 'S2' => '74', > 'seqs_better_than_cutoff' => '0', > 'posted_date' => 'Jul 4, 2009 4:41 AM', > 'Hits_to_DB' => '60102303', > 'dbletters' => '-1125070205', > 'A' => '40', > 'num_successful_extensions' => '2004', > 'num_extensions' => '1436892', > 'X1_bits' => '7.1', > 'X3' => '64', > 'entropy_gapped' => '0.140', > 'dbentries' => '9252258', > 'X2_bits' => '14.6', > 'S2_bits' => '33.1' > } > > > Stats from a blast done on the NCBI webpage: > > Database: All non-redundant GenBank CDS translations+PDB+SwissProt > +PIR+PRF > excluding environmental samples from WGS projects > Posted date: Jul 4, 2009 4:41 AM > Number of letters in database: -1,125,070,205 > Number of sequences in database: 9,252,258 > > Lambda K H > 0.309 0.124 0.340 > Gapped > Lambda K H > 0.267 0.0410 0.140 > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Sequences: 9252258 > Number of Hits to DB: 86493230 > Number of extensions: 3101413 > Number of successful extensions: 9001 > Number of sequences better than 100: 65 > Number of HSP's better than 100 without gapping: 0 > Number of HSP's gapped: 9000 > Number of HSP's successfully gapped: 66 > Length of query: 150 > Length of database: 3169897087 > Length adjustment: 113 > Effective length of query: 37 > Effective length of database: 2124391933 > Effective search space: 78602501521 > Effective search space used: 78602501521 > T: 11 > A: 40 > X1: 16 (7.1 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 42 (20.8 bits) > S2: 65 (29.6 bits) > > >> -----Original Message----- >> From: bioperl-l-bounces@... [mailto:bioperl-l- >> bounces@...] On Behalf Of Jonas Schaer >> Sent: Sunday, 28 June 2009 10:15 p.m. >> To: BioPerl List >> Subject: [Bioperl-l] different results with remote-blast skript >> >> Hi again :) >> please, I only have this little question: >> why do I get different results with my remote::blast perl skript >> then on the >> ncbi blast homepage? >> I am using blastp, the query is an amino-sequence (different >> results with any >> sequence, differences not only in number of hits but even in e- >> values, scores >> etc...), the database is 'nr'. >> PLEASE help me, >> thank you in advance, >> Jonas >> >> ps: my skript: >> ############################################################################## >> ## >> use Bio::Seq::SeqFactory; >> use Bio::Tools::Run::RemoteBlast; >> use strict; >> my @blast_report; >> my $prog = 'blastp'; >> my $db = 'nr'; >> my $e_val= '1e-10'; >> #my $e_val= '10'; >> my @params = ( '-prog' => $prog, >> '-data' => $db, >> '-expect' => $e_val, >> '-readmethod' => 'SearchIO' ); >> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >> $ >> Bio >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = >> '1'; >> >> my >> $ >> blast_seq >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD >> PDDEYE'; >> #$v is just to turn on and off the messages >> my $v = 1; >> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >> 'Bio::PrimarySeq'); >> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => >> "$blast_seq"); >> my $filename='temp2.out'; >> my $r = $factory->submit_blast($seq); >> print STDERR "waiting..." if( $v > 0 ); >> while ( my @rids = $factory->each_rid ) >> { >> foreach my $rid ( @rids ) >> { >> my $rc = $factory->retrieve_blast($rid); >> if( !ref($rc) ) >> { >> if( $rc < 0 ) >> { >> $factory->remove_rid($rid); >> } >> print STDERR "." if ( $v > 0 ); >> } >> else >> { >> my $result = $rc->next_result(); >> $factory->save_output($filename); >> $factory->remove_rid($rid); >> print "\nQuery Name: ", $result->query_name(), >> "\n"; >> while ( my $hit = $result->next_hit ) >> { >> next unless ( $v > 0); >> print "\thit name is ", $hit->name, "\n"; >> while( my $hsp = $hit->next_hsp ) >> { >> print "\t\tscore is ", $hsp->score, "\n"; >> } >> } >> } >> } >> >> >> } >> @blast_report = get_file_data ($filename); >> return @blast_report; >> ############################################################################## >> #### >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@... >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason@... _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: different results with remote-blast skriptThat inspires confidence ;>
chris On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > integer overflow in blast.... > > On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >> I'd guess it's a difference in the parameters used. >> Interesting that both have the number of letters in the db as >> "-1,125,070,205", I assume that's a bug :-) >> >> Stats from your remote_blast: >> >> 'stats' => { >> 'S1' => '42', >> 'S1_bits' => '20.8', >> 'lambda' => '0.309', >> 'entropy' => '0.345', >> 'kappa_gapped' => '0.0410', >> 'T' => '11', >> 'kappa' => '0.122', >> 'X3_bits' => '24.7', >> 'X1' => '16', >> 'lambda_gapped' => '0.267', >> 'X2' => '38', >> 'S2' => '74', >> 'seqs_better_than_cutoff' => '0', >> 'posted_date' => 'Jul 4, 2009 4:41 AM', >> 'Hits_to_DB' => '60102303', >> 'dbletters' => '-1125070205', >> 'A' => '40', >> 'num_successful_extensions' => '2004', >> 'num_extensions' => '1436892', >> 'X1_bits' => '7.1', >> 'X3' => '64', >> 'entropy_gapped' => '0.140', >> 'dbentries' => '9252258', >> 'X2_bits' => '14.6', >> 'S2_bits' => '33.1' >> } >> >> >> Stats from a blast done on the NCBI webpage: >> >> Database: All non-redundant GenBank CDS translations+PDB+SwissProt >> +PIR+PRF >> excluding environmental samples from WGS projects >> Posted date: Jul 4, 2009 4:41 AM >> Number of letters in database: -1,125,070,205 >> Number of sequences in database: 9,252,258 >> >> Lambda K H >> 0.309 0.124 0.340 >> Gapped >> Lambda K H >> 0.267 0.0410 0.140 >> Matrix: BLOSUM62 >> Gap Penalties: Existence: 11, Extension: 1 >> Number of Sequences: 9252258 >> Number of Hits to DB: 86493230 >> Number of extensions: 3101413 >> Number of successful extensions: 9001 >> Number of sequences better than 100: 65 >> Number of HSP's better than 100 without gapping: 0 >> Number of HSP's gapped: 9000 >> Number of HSP's successfully gapped: 66 >> Length of query: 150 >> Length of database: 3169897087 >> Length adjustment: 113 >> Effective length of query: 37 >> Effective length of database: 2124391933 >> Effective search space: 78602501521 >> Effective search space used: 78602501521 >> T: 11 >> A: 40 >> X1: 16 (7.1 bits) >> X2: 38 (14.6 bits) >> X3: 64 (24.7 bits) >> S1: 42 (20.8 bits) >> S2: 65 (29.6 bits) >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>> bounces@...] On Behalf Of Jonas Schaer >>> Sent: Sunday, 28 June 2009 10:15 p.m. >>> To: BioPerl List >>> Subject: [Bioperl-l] different results with remote-blast skript >>> >>> Hi again :) >>> please, I only have this little question: >>> why do I get different results with my remote::blast perl skript >>> then on the >>> ncbi blast homepage? >>> I am using blastp, the query is an amino-sequence (different >>> results with any >>> sequence, differences not only in number of hits but even in e- >>> values, scores >>> etc...), the database is 'nr'. >>> PLEASE help me, >>> thank you in advance, >>> Jonas >>> >>> ps: my skript: >>> ############################################################################## >>> ## >>> use Bio::Seq::SeqFactory; >>> use Bio::Tools::Run::RemoteBlast; >>> use strict; >>> my @blast_report; >>> my $prog = 'blastp'; >>> my $db = 'nr'; >>> my $e_val= '1e-10'; >>> #my $e_val= '10'; >>> my @params = ( '-prog' => $prog, >>> '-data' => $db, >>> '-expect' => $e_val, >>> '-readmethod' => 'SearchIO' ); >>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>> $ >>> Bio >>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>> = '1'; >>> >>> my >>> $ >>> blast_seq >>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD >>> PDDEYE'; >>> #$v is just to turn on and off the messages >>> my $v = 1; >>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>> 'Bio::PrimarySeq'); >>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => >>> "$blast_seq"); >>> my $filename='temp2.out'; >>> my $r = $factory->submit_blast($seq); >>> print STDERR "waiting..." if( $v > 0 ); >>> while ( my @rids = $factory->each_rid ) >>> { >>> foreach my $rid ( @rids ) >>> { >>> my $rc = $factory->retrieve_blast($rid); >>> if( !ref($rc) ) >>> { >>> if( $rc < 0 ) >>> { >>> $factory->remove_rid($rid); >>> } >>> print STDERR "." if ( $v > 0 ); >>> } >>> else >>> { >>> my $result = $rc->next_result(); >>> $factory->save_output($filename); >>> $factory->remove_rid($rid); >>> print "\nQuery Name: ", $result->query_name(), >>> "\n"; >>> while ( my $hit = $result->next_hit ) >>> { >>> next unless ( $v > 0); >>> print "\thit name is ", $hit->name, "\n"; >>> while( my $hsp = $hit->next_hsp ) >>> { >>> print "\t\tscore is ", $hsp->score, "\n"; >>> } >>> } >>> } >>> } >>> >>> >>> } >>> @blast_report = get_file_data ($filename); >>> return @blast_report; >>> ############################################################################## >>> #### >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@... >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. >> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@... >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason@... > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: different results with remote-blast skriptHi guys, thanks for your answers so far.
@jason: integer overflow in blast.... sorry, but what do you mean by that? how can I fix it...? Since I never really changed any parameters I thought them all to be default. whatever, I tried to get "better" results with my prog by changing these: $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1'; with no effect...I guess these were default values anyway. So please maybe you can tell me all the other parameters I can change with my perl-skript AND how to do that? Unfortunately both, perl and the blast-algorithm are pretty much new to me, maybe thats why I just cannot find out how to do that on my own... :/ Here is the output I get with my remote-blast skript: ################################################################################################################# Query Name: MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL L hit name is ref|XP_001702807.1| score is 442 BLASTP 2.2.21+ Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Reference for composition-based statistics: Alejandro A. Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005. RID: 53STX5G2013 Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects 9,252,587 sequences; 3,169,972,781 total letters Query= MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM ATGPDPDDEYE Length=150 Score E Sequences producing significant alignments: (Bits) Value ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 174 2e-42 ALIGNMENTS >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] Length=303 Score = 174 bits (442), Expect = 2e-42, Method: Composition-based stats. Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%) Query 1 MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds 60 MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS Sbjct 154 MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS 213 Query 61 dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR 120 DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR Sbjct 214 DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR 273 Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects Posted date: Jul 5, 2009 4:41 AM Number of letters in database: -1,124,994,511 Number of sequences in database: 9,252,587 Lambda K H 0.309 0.122 0.345 Gapped Lambda K H 0.267 0.0410 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Sequences: 9252587 Number of Hits to DB: 60273703 Number of extensions: 1448367 Number of successful extensions: 2103 Number of sequences better than 10: 0 Number of HSP's better than 10 without gapping: 0 Number of HSP's gapped: 2113 Number of HSP's successfully gapped: 0 Length of query: 150 Length of database: 3169972781 Length adjustment: 113 Effective length of query: 37 Effective length of database: 2124430450 Effective search space: 78603926650 Effective search space used: 78603926650 T: 11 A: 40 X1: 16 (7.1 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 42 (20.8 bits) S2: 74 (33.1 bits) ################################################################################################################# and here are the hits (?) of the blast-algorithm on the ncbi-homepage with the same query of course: ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 300 3e-80 ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... 36.2 1.1 ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... 35.4 1.8 ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... 34.3 4.2 ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... 33.5 6.0 ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... 33.5 7.0 #################################################################################################################at least the first hit is the same, but even there there is a different score and e-value. thanks so much for any help :) regards, jonas ----- Original Message ----- From: "Chris Fields" <cjfields@...> To: "Jason Stajich" <jason@...> Cc: "Smithies, Russell" <Russell.Smithies@...>; "'BioPerl List'" <bioperl-l@...>; "'Jonas Schaer'" <Jonas_Schaer@...> Sent: Monday, July 06, 2009 12:51 AM Subject: Re: [Bioperl-l] different results with remote-blast skript > That inspires confidence ;> > > chris > > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > >> integer overflow in blast.... >> >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >> >>> I'd guess it's a difference in the parameters used. >>> Interesting that both have the number of letters in the db as >>> "-1,125,070,205", I assume that's a bug :-) >>> >>> Stats from your remote_blast: >>> >>> 'stats' => { >>> 'S1' => '42', >>> 'S1_bits' => '20.8', >>> 'lambda' => '0.309', >>> 'entropy' => '0.345', >>> 'kappa_gapped' => '0.0410', >>> 'T' => '11', >>> 'kappa' => '0.122', >>> 'X3_bits' => '24.7', >>> 'X1' => '16', >>> 'lambda_gapped' => '0.267', >>> 'X2' => '38', >>> 'S2' => '74', >>> 'seqs_better_than_cutoff' => '0', >>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>> 'Hits_to_DB' => '60102303', >>> 'dbletters' => '-1125070205', >>> 'A' => '40', >>> 'num_successful_extensions' => '2004', >>> 'num_extensions' => '1436892', >>> 'X1_bits' => '7.1', >>> 'X3' => '64', >>> 'entropy_gapped' => '0.140', >>> 'dbentries' => '9252258', >>> 'X2_bits' => '14.6', >>> 'S2_bits' => '33.1' >>> } >>> >>> >>> Stats from a blast done on the NCBI webpage: >>> >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt >>> +PIR+PRF >>> excluding environmental samples from WGS projects >>> Posted date: Jul 4, 2009 4:41 AM >>> Number of letters in database: -1,125,070,205 >>> Number of sequences in database: 9,252,258 >>> >>> Lambda K H >>> 0.309 0.124 0.340 >>> Gapped >>> Lambda K H >>> 0.267 0.0410 0.140 >>> Matrix: BLOSUM62 >>> Gap Penalties: Existence: 11, Extension: 1 >>> Number of Sequences: 9252258 >>> Number of Hits to DB: 86493230 >>> Number of extensions: 3101413 >>> Number of successful extensions: 9001 >>> Number of sequences better than 100: 65 >>> Number of HSP's better than 100 without gapping: 0 >>> Number of HSP's gapped: 9000 >>> Number of HSP's successfully gapped: 66 >>> Length of query: 150 >>> Length of database: 3169897087 >>> Length adjustment: 113 >>> Effective length of query: 37 >>> Effective length of database: 2124391933 >>> Effective search space: 78602501521 >>> Effective search space used: 78602501521 >>> T: 11 >>> A: 40 >>> X1: 16 (7.1 bits) >>> X2: 38 (14.6 bits) >>> X3: 64 (24.7 bits) >>> S1: 42 (20.8 bits) >>> S2: 65 (29.6 bits) >>> >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>>> bounces@...] On Behalf Of Jonas Schaer >>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>> To: BioPerl List >>>> Subject: [Bioperl-l] different results with remote-blast skript >>>> >>>> Hi again :) >>>> please, I only have this little question: >>>> why do I get different results with my remote::blast perl skript >>>> then on the >>>> ncbi blast homepage? >>>> I am using blastp, the query is an amino-sequence (different >>>> results with any >>>> sequence, differences not only in number of hits but even in e- >>>> values, scores >>>> etc...), the database is 'nr'. >>>> PLEASE help me, >>>> thank you in advance, >>>> Jonas >>>> >>>> ps: my skript: >>>> ############################################################################## >>>> ## >>>> use Bio::Seq::SeqFactory; >>>> use Bio::Tools::Run::RemoteBlast; >>>> use strict; >>>> my @blast_report; >>>> my $prog = 'blastp'; >>>> my $db = 'nr'; >>>> my $e_val= '1e-10'; >>>> #my $e_val= '10'; >>>> my @params = ( '-prog' => $prog, >>>> '-data' => $db, >>>> '-expect' => $e_val, >>>> '-readmethod' => 'SearchIO' ); >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>> $ >>>> Bio >>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>> = '1'; >>>> >>>> my >>>> $ >>>> blast_seq >>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD >>>> PDDEYE'; >>>> #$v is just to turn on and off the messages >>>> my $v = 1; >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>> 'Bio::PrimarySeq'); >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => >>>> "$blast_seq"); >>>> my $filename='temp2.out'; >>>> my $r = $factory->submit_blast($seq); >>>> print STDERR "waiting..." if( $v > 0 ); >>>> while ( my @rids = $factory->each_rid ) >>>> { >>>> foreach my $rid ( @rids ) >>>> { >>>> my $rc = $factory->retrieve_blast($rid); >>>> if( !ref($rc) ) >>>> { >>>> if( $rc < 0 ) >>>> { >>>> $factory->remove_rid($rid); >>>> } >>>> print STDERR "." if ( $v > 0 ); >>>> } >>>> else >>>> { >>>> my $result = $rc->next_result(); >>>> $factory->save_output($filename); >>>> $factory->remove_rid($rid); >>>> print "\nQuery Name: ", $result->query_name(), >>>> "\n"; >>>> while ( my $hit = $result->next_hit ) >>>> { >>>> next unless ( $v > 0); >>>> print "\thit name is ", $hit->name, "\n"; >>>> while( my $hsp = $hit->next_hsp ) >>>> { >>>> print "\t\tscore is ", $hsp->score, "\n"; >>>> } >>>> } >>>> } >>>> } >>>> >>>> >>>> } >>>> @blast_report = get_file_data ($filename); >>>> return @blast_report; >>>> ############################################################################## >>>> #### >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@... >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> = >>> = >>> ===================================================================== >>> Attention: The information contained in this message and/or >>> attachments >>> from AgResearch Limited is intended only for the persons or entities >>> to which it is addressed and may contain confidential and/or >>> privileged >>> material. Any review, retransmission, dissemination or other use >>> of, or >>> taking of any action in reliance upon, this information by persons or >>> entities other than the intended recipients is prohibited by >>> AgResearch >>> Limited. If you have received this message in error, please notify >>> the >>> sender immediately. >>> = >>> = >>> ===================================================================== >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@... >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason@... >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@... >> http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------------------------------------------------------------------------- No virus found in this incoming message. Checked by AVG - www.avg.com Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: 07/05/09 05:53:00 _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: different results with remote-blast skriptHi Jonas,
You can't just play with the BLAST parameters and hope for a "better" result. I'd suggest that if you aren't sure what they do, you should leave them alone as small changes can make huge differences in the output - it's quite possible to miss finding what you're looking for by using the wrong parameters. If all else fails, read the blast manual: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html http://www.ncbi.nlm.nih.gov/blast/tutorial/ Or Read Ian Korfs' excellent book: http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 Don't worry about the integer overflow bug as there's nothing you can do about it. If you're interested, Google and Wikipedia are your friends: http://en.wikipedia.org/wiki/Integer_overflow Russell > -----Original Message----- > From: bioperl-l-bounces@... [mailto:bioperl-l- > bounces@...] On Behalf Of Jonas Schaer > Sent: Tuesday, 7 July 2009 12:14 a.m. > To: BioPerl List; Chris Fields > Subject: Re: [Bioperl-l] different results with remote-blast skript > > Hi guys, thanks for your answers so far. > @jason: integer overflow in blast.... sorry, but what do you mean by that? > how can I fix it...? > > Since I never really changed any parameters I thought them all to be default. > whatever, I tried to get "better" results with my prog by changing > these: > $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1'; > with no effect...I guess these were default values anyway. > > So please maybe you can tell me all the other parameters I can change with my > perl-skript AND how to do that? > Unfortunately both, perl and the blast-algorithm are pretty much new to me, > maybe thats why I just cannot find out how to do that on my own... :/ > > Here is the output I get with my remote-blast skript: > ############################################################################## > ################################### > Query Name: > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > L > hit name is ref|XP_001702807.1| > score is 442 > BLASTP 2.2.21+ > Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped > BLAST and PSI-BLAST: a new generation of protein database search programs", > Nucleic Acids Res. 25:3389-3402. > > > Reference for composition-based statistics: Alejandro A. > Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri > I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the > accuracy of PSI-BLAST protein database searches with composition-based > statistics and other refinements", Nucleic Acids Res. 29:2994-3005. > > > RID: 53STX5G2013 > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > 9,252,587 sequences; 3,169,972,781 total letters Query= > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM > ATGPDPDDEYE > Length=150 > > > Score > E > Sequences producing significant alignments: (Bits) > Value > > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 174 > 2e-42 > > > ALIGNMENTS > >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] > gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > Length=303 > > Score = 174 bits (442), Expect = 2e-42, Method: Composition-based stats. > Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%) > > Query 1 MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds 60 > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > Sbjct 154 MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > 213 > > Query 61 dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > 120 > DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > Sbjct 214 DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > 273 > > Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > AWHERDDNAFRQAHQNTAMATGPDPDDEYE > Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > > > > Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF > excluding environmental samples from WGS projects > Posted date: Jul 5, 2009 4:41 AM > Number of letters in database: -1,124,994,511 > Number of sequences in database: 9,252,587 > > Lambda K H > 0.309 0.122 0.345 > Gapped > Lambda K H > 0.267 0.0410 0.140 > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Sequences: 9252587 > Number of Hits to DB: 60273703 > Number of extensions: 1448367 > Number of successful extensions: 2103 > Number of sequences better than 10: 0 > Number of HSP's better than 10 without gapping: 0 > Number of HSP's gapped: 2113 > Number of HSP's successfully gapped: 0 > Length of query: 150 > Length of database: 3169972781 > Length adjustment: 113 > Effective length of query: 37 > Effective length of database: 2124430450 > Effective search space: 78603926650 > Effective search space used: 78603926650 > T: 11 > A: 40 > X1: 16 (7.1 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 42 (20.8 bits) > S2: 74 (33.1 bits) > > ############################################################################## > ################################### > and here are the hits (?) of the blast-algorithm on the ncbi-homepage with > the same query of course: > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 300 > 3e-80 > ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... 36.2 > 1.1 > ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... 35.4 > 1.8 > ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... 34.3 > 4.2 > ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... 33.5 > 6.0 > ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... 33.5 > 7.0 > ############################################################################## > ###################################at > least the first hit is the same, but even there there is a different score > and e-value. > > thanks so much for any help :) > regards, jonas > > > ----- Original Message ----- > From: "Chris Fields" <cjfields@...> > To: "Jason Stajich" <jason@...> > Cc: "Smithies, Russell" <Russell.Smithies@...>; "'BioPerl > List'" <bioperl-l@...>; "'Jonas Schaer'" > <Jonas_Schaer@...> > Sent: Monday, July 06, 2009 12:51 AM > Subject: Re: [Bioperl-l] different results with remote-blast skript > > > > That inspires confidence ;> > > > > chris > > > > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > > > >> integer overflow in blast.... > >> > >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >> > >>> I'd guess it's a difference in the parameters used. > >>> Interesting that both have the number of letters in the db as > >>> "-1,125,070,205", I assume that's a bug :-) > >>> > >>> Stats from your remote_blast: > >>> > >>> 'stats' => { > >>> 'S1' => '42', > >>> 'S1_bits' => '20.8', > >>> 'lambda' => '0.309', > >>> 'entropy' => '0.345', > >>> 'kappa_gapped' => '0.0410', > >>> 'T' => '11', > >>> 'kappa' => '0.122', > >>> 'X3_bits' => '24.7', > >>> 'X1' => '16', > >>> 'lambda_gapped' => '0.267', > >>> 'X2' => '38', > >>> 'S2' => '74', > >>> 'seqs_better_than_cutoff' => '0', > >>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>> 'Hits_to_DB' => '60102303', > >>> 'dbletters' => '-1125070205', > >>> 'A' => '40', > >>> 'num_successful_extensions' => '2004', > >>> 'num_extensions' => '1436892', > >>> 'X1_bits' => '7.1', > >>> 'X3' => '64', > >>> 'entropy_gapped' => '0.140', > >>> 'dbentries' => '9252258', > >>> 'X2_bits' => '14.6', > >>> 'S2_bits' => '33.1' > >>> } > >>> > >>> > >>> Stats from a blast done on the NCBI webpage: > >>> > >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt > >>> +PIR+PRF > >>> excluding environmental samples from WGS projects > >>> Posted date: Jul 4, 2009 4:41 AM > >>> Number of letters in database: -1,125,070,205 > >>> Number of sequences in database: 9,252,258 > >>> > >>> Lambda K H > >>> 0.309 0.124 0.340 > >>> Gapped > >>> Lambda K H > >>> 0.267 0.0410 0.140 > >>> Matrix: BLOSUM62 > >>> Gap Penalties: Existence: 11, Extension: 1 > >>> Number of Sequences: 9252258 > >>> Number of Hits to DB: 86493230 > >>> Number of extensions: 3101413 > >>> Number of successful extensions: 9001 > >>> Number of sequences better than 100: 65 > >>> Number of HSP's better than 100 without gapping: 0 > >>> Number of HSP's gapped: 9000 > >>> Number of HSP's successfully gapped: 66 > >>> Length of query: 150 > >>> Length of database: 3169897087 > >>> Length adjustment: 113 > >>> Effective length of query: 37 > >>> Effective length of database: 2124391933 > >>> Effective search space: 78602501521 > >>> Effective search space used: 78602501521 > >>> T: 11 > >>> A: 40 > >>> X1: 16 (7.1 bits) > >>> X2: 38 (14.6 bits) > >>> X3: 64 (24.7 bits) > >>> S1: 42 (20.8 bits) > >>> S2: 65 (29.6 bits) > >>> > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces@... [mailto:bioperl-l- > >>>> bounces@...] On Behalf Of Jonas Schaer > >>>> Sent: Sunday, 28 June 2009 10:15 p.m. > >>>> To: BioPerl List > >>>> Subject: [Bioperl-l] different results with remote-blast skript > >>>> > >>>> Hi again :) > >>>> please, I only have this little question: > >>>> why do I get different results with my remote::blast perl skript > >>>> then on the > >>>> ncbi blast homepage? > >>>> I am using blastp, the query is an amino-sequence (different > >>>> results with any > >>>> sequence, differences not only in number of hits but even in e- > >>>> values, scores > >>>> etc...), the database is 'nr'. > >>>> PLEASE help me, > >>>> thank you in advance, > >>>> Jonas > >>>> > >>>> ps: my skript: > >>>> > ############################################################################## > >>>> ## > >>>> use Bio::Seq::SeqFactory; > >>>> use Bio::Tools::Run::RemoteBlast; > >>>> use strict; > >>>> my @blast_report; > >>>> my $prog = 'blastp'; > >>>> my $db = 'nr'; > >>>> my $e_val= '1e-10'; > >>>> #my $e_val= '10'; > >>>> my @params = ( '-prog' => $prog, > >>>> '-data' => $db, > >>>> '-expect' => $e_val, > >>>> '-readmethod' => 'SearchIO' ); > >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>> $ > >>>> Bio > >>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} > >>>> = '1'; > >>>> > >>>> my > >>>> $ > >>>> blast_seq > >>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > >>>> > SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD > >>>> PDDEYE'; > >>>> #$v is just to turn on and off the messages > >>>> my $v = 1; > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > >>>> 'Bio::PrimarySeq'); > >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => > >>>> "$blast_seq"); > >>>> my $filename='temp2.out'; > >>>> my $r = $factory->submit_blast($seq); > >>>> print STDERR "waiting..." if( $v > 0 ); > >>>> while ( my @rids = $factory->each_rid ) > >>>> { > >>>> foreach my $rid ( @rids ) > >>>> { > >>>> my $rc = $factory->retrieve_blast($rid); > >>>> if( !ref($rc) ) > >>>> { > >>>> if( $rc < 0 ) > >>>> { > >>>> $factory->remove_rid($rid); > >>>> } > >>>> print STDERR "." if ( $v > 0 ); > >>>> } > >>>> else > >>>> { > >>>> my $result = $rc->next_result(); > >>>> $factory->save_output($filename); > >>>> $factory->remove_rid($rid); > >>>> print "\nQuery Name: ", $result->query_name(), > >>>> "\n"; > >>>> while ( my $hit = $result->next_hit ) > >>>> { > >>>> next unless ( $v > 0); > >>>> print "\thit name is ", $hit->name, "\n"; > >>>> while( my $hsp = $hit->next_hsp ) > >>>> { > >>>> print "\t\tscore is ", $hsp->score, "\n"; > >>>> } > >>>> } > >>>> } > >>>> } > >>>> > >>>> > >>>> } > >>>> @blast_report = get_file_data ($filename); > >>>> return @blast_report; > >>>> > ############################################################################## > >>>> #### > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l@... > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> = > >>> = > >>> ===================================================================== > >>> Attention: The information contained in this message and/or > >>> attachments > >>> from AgResearch Limited is intended only for the persons or entities > >>> to which it is addressed and may contain confidential and/or > >>> privileged > >>> material. Any review, retransmission, dissemination or other use > >>> of, or > >>> taking of any action in reliance upon, this information by persons or > >>> entities other than the intended recipients is prohibited by > >>> AgResearch > >>> Limited. If you have received this message in error, please notify > >>> the > >>> sender immediately. > >>> = > >>> = > >>> ===================================================================== > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@... > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Jason Stajich > >> jason@... > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@... > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > ------------------------------------------------------------------------------ > -- > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: 07/05/09 > 05:53:00 _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: cdd-search with remoteblast?Hi guys,
Thank you all so much for your help and patience :). Of course you were right and I finaly found the right put-parameter to get exactly the same hits as on the homepage. I do have an other question though :)... I now want to include a search for conserved domains, but when I try to use the CDD_SEARCH-parameter (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub:CDD_SEARCH) like the other put-parameters the way chris once told me(works fine with the other params): my %put = ( WORD_SIZE => 3, HITLIST_SIZE => 100, THRESHOLD => 11, FILTER => 'R', GENETIC_CODE => 1, CDD_SEARCH => 'on' ###I tried it with 'true' and '1', too. ); for my $putName (keys %put) { $factory->submit_parameter($putName,$put{$putName}); } ...an exception is thrown: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: CDD_SEARCH is not a valid PUT parameter. STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools /Run/RemoteBlast.pm:325 STACK: main::blast_a_sequence firsteval0.8.pm:383 STACK: main::blast_it firsteval0.8.pm:288 STACK: firsteval0.8.pm:35 ----------------------------------------------------------- . I guess somehow this could be the solution to my problem: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID-for-Simultaneous , but unfortunately I don't understand what to do. I'm so sorry to bother you with this but please help me once more...:) Best regards and thanks in advance, Jonas ----- Original Message ----- From: "Smithies, Russell" <Russell.Smithies@...> To: "'Jonas Schaer'" <Brotelzwieb@...> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" <bioperl-l@...> Sent: Monday, July 06, 2009 10:56 PM Subject: RE: [Bioperl-l] different results with remote-blast skript Hi Jonas, You can't just play with the BLAST parameters and hope for a "better" result. I'd suggest that if you aren't sure what they do, you should leave them alone as small changes can make huge differences in the output - it's quite possible to miss finding what you're looking for by using the wrong parameters. If all else fails, read the blast manual: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html http://www.ncbi.nlm.nih.gov/blast/tutorial/ Or Read Ian Korfs' excellent book: http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 Don't worry about the integer overflow bug as there's nothing you can do about it. If you're interested, Google and Wikipedia are your friends: http://en.wikipedia.org/wiki/Integer_overflow Russell > -----Original Message----- > From: bioperl-l-bounces@... [mailto:bioperl-l- > bounces@...] On Behalf Of Jonas Schaer > Sent: Tuesday, 7 July 2009 12:14 a.m. > To: BioPerl List; Chris Fields > Subject: Re: [Bioperl-l] different results with remote-blast skript > > Hi guys, thanks for your answers so far. > @jason: integer overflow in blast.... sorry, but what do you mean by that? > how can I fix it...? > > Since I never really changed any parameters I thought them all to be > default. > whatever, I tried to get "better" results with my prog by changing > these: > $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = > '1'; > with no effect...I guess these were default values anyway. > > So please maybe you can tell me all the other parameters I can change with > my > perl-skript AND how to do that? > Unfortunately both, perl and the blast-algorithm are pretty much new to > me, > maybe thats why I just cannot find out how to do that on my own... :/ > > Here is the output I get with my remote-blast skript: > ############################################################################## > ################################### > Query Name: > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > L > hit name is ref|XP_001702807.1| > score is 442 > BLASTP 2.2.21+ > Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped > BLAST and PSI-BLAST: a new generation of protein database search > programs", > Nucleic Acids Res. 25:3389-3402. > > > Reference for composition-based statistics: Alejandro A. > Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, > Yuri > I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the > accuracy of PSI-BLAST protein database searches with composition-based > statistics and other refinements", Nucleic Acids Res. 29:2994-3005. > > > RID: 53STX5G2013 > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > 9,252,587 sequences; 3,169,972,781 total letters Query= > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM > ATGPDPDDEYE > Length=150 > > > Score > E > Sequences producing significant alignments: (Bits) > Value > > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 174 > 2e-42 > > > ALIGNMENTS > >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] > gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > Length=303 > > Score = 174 bits (442), Expect = 2e-42, Method: Composition-based > stats. > Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 > (0%) > > Query 1 MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds > 60 > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > Sbjct 154 MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > 213 > > Query 61 dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > 120 > DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > Sbjct 214 DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > 273 > > Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > AWHERDDNAFRQAHQNTAMATGPDPDDEYE > Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF > excluding environmental samples from WGS projects > Posted date: Jul 5, 2009 4:41 AM > Number of letters in database: -1,124,994,511 > Number of sequences in database: 9,252,587 > > Lambda K H > 0.309 0.122 0.345 > Gapped > Lambda K H > 0.267 0.0410 0.140 > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Sequences: 9252587 > Number of Hits to DB: 60273703 > Number of extensions: 1448367 > Number of successful extensions: 2103 > Number of sequences better than 10: 0 > Number of HSP's better than 10 without gapping: 0 > Number of HSP's gapped: 2113 > Number of HSP's successfully gapped: 0 > Length of query: 150 > Length of database: 3169972781 > Length adjustment: 113 > Effective length of query: 37 > Effective length of database: 2124430450 > Effective search space: 78603926650 > Effective search space used: 78603926650 > T: 11 > A: 40 > X1: 16 (7.1 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 42 (20.8 bits) > S2: 74 (33.1 bits) > > ############################################################################## > ################################### > and here are the hits (?) of the blast-algorithm on the ncbi-homepage with > the same query of course: > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 300 > 3e-80 > ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... 36.2 > 1.1 > ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... 35.4 > 1.8 > ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... 34.3 > 4.2 > ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... 33.5 > 6.0 > ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... 33.5 > 7.0 > ############################################################################## > ###################################at > least the first hit is the same, but even there there is a different score > and e-value. > > thanks so much for any help :) > regards, jonas > > > ----- Original Message ----- > From: "Chris Fields" <cjfields@...> > To: "Jason Stajich" <jason@...> > Cc: "Smithies, Russell" <Russell.Smithies@...>; "'BioPerl > List'" <bioperl-l@...>; "'Jonas Schaer'" > <Jonas_Schaer@...> > Sent: Monday, July 06, 2009 12:51 AM > Subject: Re: [Bioperl-l] different results with remote-blast skript > > > > That inspires confidence ;> > > > > chris > > > > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > > > >> integer overflow in blast.... > >> > >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >> > >>> I'd guess it's a difference in the parameters used. > >>> Interesting that both have the number of letters in the db as > >>> "-1,125,070,205", I assume that's a bug :-) > >>> > >>> Stats from your remote_blast: > >>> > >>> 'stats' => { > >>> 'S1' => '42', > >>> 'S1_bits' => '20.8', > >>> 'lambda' => '0.309', > >>> 'entropy' => '0.345', > >>> 'kappa_gapped' => '0.0410', > >>> 'T' => '11', > >>> 'kappa' => '0.122', > >>> 'X3_bits' => '24.7', > >>> 'X1' => '16', > >>> 'lambda_gapped' => '0.267', > >>> 'X2' => '38', > >>> 'S2' => '74', > >>> 'seqs_better_than_cutoff' => '0', > >>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>> 'Hits_to_DB' => '60102303', > >>> 'dbletters' => '-1125070205', > >>> 'A' => '40', > >>> 'num_successful_extensions' => '2004', > >>> 'num_extensions' => '1436892', > >>> 'X1_bits' => '7.1', > >>> 'X3' => '64', > >>> 'entropy_gapped' => '0.140', > >>> 'dbentries' => '9252258', > >>> 'X2_bits' => '14.6', > >>> 'S2_bits' => '33.1' > >>> } > >>> > >>> > >>> Stats from a blast done on the NCBI webpage: > >>> > >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt > >>> +PIR+PRF > >>> excluding environmental samples from WGS projects > >>> Posted date: Jul 4, 2009 4:41 AM > >>> Number of letters in database: -1,125,070,205 > >>> Number of sequences in database: 9,252,258 > >>> > >>> Lambda K H > >>> 0.309 0.124 0.340 > >>> Gapped > >>> Lambda K H > >>> 0.267 0.0410 0.140 > >>> Matrix: BLOSUM62 > >>> Gap Penalties: Existence: 11, Extension: 1 > >>> Number of Sequences: 9252258 > >>> Number of Hits to DB: 86493230 > >>> Number of extensions: 3101413 > >>> Number of successful extensions: 9001 > >>> Number of sequences better than 100: 65 > >>> Number of HSP's better than 100 without gapping: 0 > >>> Number of HSP's gapped: 9000 > >>> Number of HSP's successfully gapped: 66 > >>> Length of query: 150 > >>> Length of database: 3169897087 > >>> Length adjustment: 113 > >>> Effective length of query: 37 > >>> Effective length of database: 2124391933 > >>> Effective search space: 78602501521 > >>> Effective search space used: 78602501521 > >>> T: 11 > >>> A: 40 > >>> X1: 16 (7.1 bits) > >>> X2: 38 (14.6 bits) > >>> X3: 64 (24.7 bits) > >>> S1: 42 (20.8 bits) > >>> S2: 65 (29.6 bits) > >>> > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces@... [mailto:bioperl-l- > >>>> bounces@...] On Behalf Of Jonas Schaer > >>>> Sent: Sunday, 28 June 2009 10:15 p.m. > >>>> To: BioPerl List > >>>> Subject: [Bioperl-l] different results with remote-blast skript > >>>> > >>>> Hi again :) > >>>> please, I only have this little question: > >>>> why do I get different results with my remote::blast perl skript > >>>> then on the > >>>> ncbi blast homepage? > >>>> I am using blastp, the query is an amino-sequence (different > >>>> results with any > >>>> sequence, differences not only in number of hits but even in e- > >>>> values, scores > >>>> etc...), the database is 'nr'. > >>>> PLEASE help me, > >>>> thank you in advance, > >>>> Jonas > >>>> > >>>> ps: my skript: > >>>> > ############################################################################## > >>>> ## > >>>> use Bio::Seq::SeqFactory; > >>>> use Bio::Tools::Run::RemoteBlast; > >>>> use strict; > >>>> my @blast_report; > >>>> my $prog = 'blastp'; > >>>> my $db = 'nr'; > >>>> my $e_val= '1e-10'; > >>>> #my $e_val= '10'; > >>>> my @params = ( '-prog' => $prog, > >>>> '-data' => $db, > >>>> '-expect' => $e_val, > >>>> '-readmethod' => 'SearchIO' ); > >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>> $ > >>>> Bio > >>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} > >>>> = '1'; > >>>> > >>>> my > >>>> $ > >>>> blast_seq > >>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > >>>> > SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD > >>>> PDDEYE'; > >>>> #$v is just to turn on and off the messages > >>>> my $v = 1; > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > >>>> 'Bio::PrimarySeq'); > >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => > >>>> "$blast_seq"); > >>>> my $filename='temp2.out'; > >>>> my $r = $factory->submit_blast($seq); > >>>> print STDERR "waiting..." if( $v > 0 ); > >>>> while ( my @rids = $factory->each_rid ) > >>>> { > >>>> foreach my $rid ( @rids ) > >>>> { > >>>> my $rc = $factory->retrieve_blast($rid); > >>>> if( !ref($rc) ) > >>>> { > >>>> if( $rc < 0 ) > >>>> { > >>>> $factory->remove_rid($rid); > >>>> } > >>>> print STDERR "." if ( $v > 0 ); > >>>> } > >>>> else > >>>> { > >>>> my $result = $rc->next_result(); > >>>> $factory->save_output($filename); > >>>> $factory->remove_rid($rid); > >>>> print "\nQuery Name: ", $result->query_name(), > >>>> "\n"; > >>>> while ( my $hit = $result->next_hit ) > >>>> { > >>>> next unless ( $v > 0); > >>>> print "\thit name is ", $hit->name, "\n"; > >>>> while( my $hsp = $hit->next_hsp ) > >>>> { > >>>> print "\t\tscore is ", $hsp->score, "\n"; > >>>> } > >>>> } > >>>> } > >>>> } > >>>> > >>>> > >>>> } > >>>> @blast_report = get_file_data ($filename); > >>>> return @blast_report; > >>>> > ############################################################################## > >>>> #### > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l@... > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> = > >>> = > >>> ===================================================================== > >>> Attention: The information contained in this message and/or > >>> attachments > >>> from AgResearch Limited is intended only for the persons or entities > >>> to which it is addressed and may contain confidential and/or > >>> privileged > >>> material. Any review, retransmission, dissemination or other use > >>> of, or > >>> taking of any action in reliance upon, this information by persons or > >>> entities other than the intended recipients is prohibited by > >>> AgResearch > >>> Limited. If you have received this message in error, please notify > >>> the > >>> sender immediately. > >>> = > >>> = > >>> ===================================================================== > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@... > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Jason Stajich > >> jason@... > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@... > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > ------------------------------------------------------------------------------ > -- > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: 07/05/09 > 05:53:00 -------------------------------------------------------------------------------- No virus found in this incoming message. Checked by AVG - www.avg.com Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release Date: 07/05/09 17:54:00 _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: cdd-search with remoteblast?I'm not sure, but I think adding this in will take a little work
(we'll need to catch the RID returned, which I'm fairly sure will require some modifications to checking the returned output). I would also have to look at the RemoteBlast API to see how this would fit in (I'm assuming we could either lump it in with other returned RIDs or create a new method for that). You are more than welcome to add this in as an enhancement request to bugzilla for BioPerl: http://bugzilla.open-bio.org/ chris On Jul 9, 2009, at 5:16 AM, Jonas Schaer wrote: > Hi guys, > Thank you all so much for your help and patience :). Of course you > were right and I finaly found the right put-parameter to get exactly > the same hits as on the homepage. > I do have an other question though :)... > I now want to include a search for conserved domains, but when I try > to use the CDD_SEARCH-parameter (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub > :CDD_SEARCH) like the other put-parameters the way chris once told > me(works fine with the other params): > > my %put = ( > WORD_SIZE => 3, > HITLIST_SIZE => 100, > THRESHOLD => 11, > FILTER => 'R', > GENETIC_CODE => 1, > CDD_SEARCH => 'on' ###I tried > it with 'true' and '1', too. > > ); > > for my $putName (keys %put) { > $factory->submit_parameter($putName,$put{$putName}); > } > > > ...an exception is thrown: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: CDD_SEARCH is not a valid PUT parameter. > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/ > lib/Bio/Tools > /Run/RemoteBlast.pm:325 > STACK: main::blast_a_sequence firsteval0.8.pm:383 > STACK: main::blast_it firsteval0.8.pm:288 > STACK: firsteval0.8.pm:35 > ----------------------------------------------------------- . > I guess somehow this could be the solution to my problem: > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID- > for-Simultaneous > , but unfortunately I don't understand what to do. > I'm so sorry to bother you with this but please help me once more...:) > > Best regards and thanks in advance, > Jonas > > ----- Original Message ----- From: "Smithies, Russell" <Russell.Smithies@... > > > To: "'Jonas Schaer'" <Brotelzwieb@...> > Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" <bioperl-l@... > > > Sent: Monday, July 06, 2009 10:56 PM > Subject: RE: [Bioperl-l] different results with remote-blast skript > > > Hi Jonas, > You can't just play with the BLAST parameters and hope for a > "better" result. > I'd suggest that if you aren't sure what they do, you should leave > them alone as small changes can make huge differences in the output > - it's quite possible to miss finding what you're looking for by > using the wrong parameters. > If all else fails, read the blast manual: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html > http://www.ncbi.nlm.nih.gov/blast/tutorial/ > Or Read Ian Korfs' excellent book: http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 > > Don't worry about the integer overflow bug as there's nothing you > can do about it. If you're interested, Google and Wikipedia are your > friends: http://en.wikipedia.org/wiki/Integer_overflow > > > Russell > >> -----Original Message----- >> From: bioperl-l-bounces@... [mailto:bioperl-l- >> bounces@...] On Behalf Of Jonas Schaer >> Sent: Tuesday, 7 July 2009 12:14 a.m. >> To: BioPerl List; Chris Fields >> Subject: Re: [Bioperl-l] different results with remote-blast skript >> >> Hi guys, thanks for your answers so far. >> @jason: integer overflow in blast.... sorry, but what do you mean >> by that? >> how can I fix it...? >> >> Since I never really changed any parameters I thought them all to >> be default. >> whatever, I tried to get "better" results with my prog by changing >> these: >> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >> $ >> Bio >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = >> '1'; >> with no effect...I guess these were default values anyway. >> >> So please maybe you can tell me all the other parameters I can >> change with my >> perl-skript AND how to do that? >> Unfortunately both, perl and the blast-algorithm are pretty much >> new to me, >> maybe thats why I just cannot find out how to do that on my own... :/ >> >> Here is the output I get with my remote-blast skript: >> ############################################################################## >> ################################### >> Query Name: >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >> L >> hit name is ref|XP_001702807.1| >> score is 442 >> BLASTP 2.2.21+ >> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. >> Schaffer, >> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman >> (1997), "Gapped >> BLAST and PSI-BLAST: a new generation of protein database search >> programs", >> Nucleic Acids Res. 25:3389-3402. >> >> >> Reference for composition-based statistics: Alejandro A. >> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. >> Spouge, Yuri >> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >> "Improving the >> accuracy of PSI-BLAST protein database searches with composition- >> based >> statistics and other refinements", Nucleic Acids Res. 29:2994-3005. >> >> >> RID: 53STX5G2013 >> >> >> Database: All non-redundant GenBank CDS >> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >> from WGS projects >> 9,252,587 sequences; 3,169,972,781 total letters Query= >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >> ATGPDPDDEYE >> Length=150 >> >> >> >> Score >> E >> Sequences producing significant alignments: >> (Bits) >> Value >> >> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... >> 174 >> 2e-42 >> >> >> ALIGNMENTS >> >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] >> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >> Length=303 >> >> Score = 174 bits (442), Expect = 2e-42, Method: Composition-based >> stats. >> Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = >> 0/150 (0%) >> >> Query 1 >> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds 60 >> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >> Sbjct 154 >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >> 213 >> >> Query 61 >> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >> 120 >> >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >> Sbjct 214 >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >> 273 >> >> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >> >> >> >> Database: All non-redundant GenBank CDS translations+PDB+SwissProt >> +PIR+PRF >> excluding environmental samples from WGS projects >> Posted date: Jul 5, 2009 4:41 AM >> Number of letters in database: -1,124,994,511 >> Number of sequences in database: 9,252,587 >> >> Lambda K H >> 0.309 0.122 0.345 >> Gapped >> Lambda K H >> 0.267 0.0410 0.140 >> Matrix: BLOSUM62 >> Gap Penalties: Existence: 11, Extension: 1 >> Number of Sequences: 9252587 >> Number of Hits to DB: 60273703 >> Number of extensions: 1448367 >> Number of successful extensions: 2103 >> Number of sequences better than 10: 0 >> Number of HSP's better than 10 without gapping: 0 >> Number of HSP's gapped: 2113 >> Number of HSP's successfully gapped: 0 >> Length of query: 150 >> Length of database: 3169972781 >> Length adjustment: 113 >> Effective length of query: 37 >> Effective length of database: 2124430450 >> Effective search space: 78603926650 >> Effective search space used: 78603926650 >> T: 11 >> A: 40 >> X1: 16 (7.1 bits) >> X2: 38 (14.6 bits) >> X3: 64 (24.7 bits) >> S1: 42 (20.8 bits) >> S2: 74 (33.1 bits) >> >> ############################################################################## >> ################################### >> and here are the hits (?) of the blast-algorithm on the ncbi- >> homepage with >> the same query of course: >> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... >> 300 >> 3e-80 >> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... >> 36.2 >> 1.1 >> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... >> 35.4 >> 1.8 >> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... >> 34.3 >> 4.2 >> ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... >> 33.5 >> 6.0 >> ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... >> 33.5 >> 7.0 >> ############################################################################## >> ###################################at >> least the first hit is the same, but even there there is a >> different score >> and e-value. >> >> thanks so much for any help :) >> regards, jonas >> >> >> ----- Original Message ----- >> From: "Chris Fields" <cjfields@...> >> To: "Jason Stajich" <jason@...> >> Cc: "Smithies, Russell" <Russell.Smithies@...>; >> "'BioPerl >> List'" <bioperl-l@...>; "'Jonas Schaer'" >> <Jonas_Schaer@...> >> Sent: Monday, July 06, 2009 12:51 AM >> Subject: Re: [Bioperl-l] different results with remote-blast skript >> >> >> > That inspires confidence ;> >> > >> > chris >> > >> > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >> > >> >> integer overflow in blast.... >> >> >> >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >> >> >> >>> I'd guess it's a difference in the parameters used. >> >>> Interesting that both have the number of letters in the db as >> >>> "-1,125,070,205", I assume that's a bug :-) >> >>> >> >>> Stats from your remote_blast: >> >>> >> >>> 'stats' => { >> >>> 'S1' => '42', >> >>> 'S1_bits' => '20.8', >> >>> 'lambda' => '0.309', >> >>> 'entropy' => '0.345', >> >>> 'kappa_gapped' => '0.0410', >> >>> 'T' => '11', >> >>> 'kappa' => '0.122', >> >>> 'X3_bits' => '24.7', >> >>> 'X1' => '16', >> >>> 'lambda_gapped' => '0.267', >> >>> 'X2' => '38', >> >>> 'S2' => '74', >> >>> 'seqs_better_than_cutoff' => '0', >> >>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >> >>> 'Hits_to_DB' => '60102303', >> >>> 'dbletters' => '-1125070205', >> >>> 'A' => '40', >> >>> 'num_successful_extensions' => '2004', >> >>> 'num_extensions' => '1436892', >> >>> 'X1_bits' => '7.1', >> >>> 'X3' => '64', >> >>> 'entropy_gapped' => '0.140', >> >>> 'dbentries' => '9252258', >> >>> 'X2_bits' => '14.6', >> >>> 'S2_bits' => '33.1' >> >>> } >> >>> >> >>> >> >>> Stats from a blast done on the NCBI webpage: >> >>> >> >>> Database: All non-redundant GenBank CDS translations+PDB >> +SwissProt >> >>> +PIR+PRF >> >>> excluding environmental samples from WGS projects >> >>> Posted date: Jul 4, 2009 4:41 AM >> >>> Number of letters in database: -1,125,070,205 >> >>> Number of sequences in database: 9,252,258 >> >>> >> >>> Lambda K H >> >>> 0.309 0.124 0.340 >> >>> Gapped >> >>> Lambda K H >> >>> 0.267 0.0410 0.140 >> >>> Matrix: BLOSUM62 >> >>> Gap Penalties: Existence: 11, Extension: 1 >> >>> Number of Sequences: 9252258 >> >>> Number of Hits to DB: 86493230 >> >>> Number of extensions: 3101413 >> >>> Number of successful extensions: 9001 >> >>> Number of sequences better than 100: 65 >> >>> Number of HSP's better than 100 without gapping: 0 >> >>> Number of HSP's gapped: 9000 >> >>> Number of HSP's successfully gapped: 66 >> >>> Length of query: 150 >> >>> Length of database: 3169897087 >> >>> Length adjustment: 113 >> >>> Effective length of query: 37 >> >>> Effective length of database: 2124391933 >> >>> Effective search space: 78602501521 >> >>> Effective search space used: 78602501521 >> >>> T: 11 >> >>> A: 40 >> >>> X1: 16 (7.1 bits) >> >>> X2: 38 (14.6 bits) >> >>> X3: 64 (24.7 bits) >> >>> S1: 42 (20.8 bits) >> >>> S2: 65 (29.6 bits) >> >>> >> >>> >> >>>> -----Original Message----- >> >>>> From: bioperl-l-bounces@... [mailto:bioperl-l- >> >>>> bounces@...] On Behalf Of Jonas Schaer >> >>>> Sent: Sunday, 28 June 2009 10:15 p.m. >> >>>> To: BioPerl List >> >>>> Subject: [Bioperl-l] different results with remote-blast skript >> >>>> >> >>>> Hi again :) >> >>>> please, I only have this little question: >> >>>> why do I get different results with my remote::blast perl skript >> >>>> then on the >> >>>> ncbi blast homepage? >> >>>> I am using blastp, the query is an amino-sequence (different >> >>>> results with any >> >>>> sequence, differences not only in number of hits but even in e- >> >>>> values, scores >> >>>> etc...), the database is 'nr'. >> >>>> PLEASE help me, >> >>>> thank you in advance, >> >>>> Jonas >> >>>> >> >>>> ps: my skript: >> >>>> >> ############################################################################## >> >>>> ## >> >>>> use Bio::Seq::SeqFactory; >> >>>> use Bio::Tools::Run::RemoteBlast; >> >>>> use strict; >> >>>> my @blast_report; >> >>>> my $prog = 'blastp'; >> >>>> my $db = 'nr'; >> >>>> my $e_val= '1e-10'; >> >>>> #my $e_val= '10'; >> >>>> my @params = ( '-prog' => $prog, >> >>>> '-data' => $db, >> >>>> '-expect' => $e_val, >> >>>> '-readmethod' => 'SearchIO' ); >> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >> >>>> $ >> >>>> Bio >> > >> >>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >> >>>> = '1'; >> >>>> >> >>>> my >> >>>> $ >> >>>> blast_seq >> >>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >> >>>> >> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD >> >>>> PDDEYE'; >> >>>> #$v is just to turn on and off the messages >> >>>> my $v = 1; >> >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >> >>>> 'Bio::PrimarySeq'); >> >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => >> >>>> "$blast_seq"); >> >>>> my $filename='temp2.out'; >> >>>> my $r = $factory->submit_blast($seq); >> >>>> print STDERR "waiting..." if( $v > 0 ); >> >>>> while ( my @rids = $factory->each_rid ) >> >>>> { >> >>>> foreach my $rid ( @rids ) >> >>>> { >> >>>> my $rc = $factory->retrieve_blast($rid); >> >>>> if( !ref($rc) ) >> >>>> { >> >>>> if( $rc < 0 ) >> >>>> { >> >>>> $factory->remove_rid($rid); >> >>>> } >> >>>> print STDERR "." if ( $v > 0 ); >> >>>> } >> >>>> else >> >>>> { >> >>>> my $result = $rc->next_result(); >> >>>> $factory->save_output($filename); >> >>>> $factory->remove_rid($rid); >> >>>> print "\nQuery Name: ", $result->query_name(), >> >>>> "\n"; >> >>>> while ( my $hit = $result->next_hit ) >> >>>> { >> >>>> next unless ( $v > 0); >> >>>> print "\thit name is ", $hit->name, "\n"; >> >>>> while( my $hsp = $hit->next_hsp ) >> >>>> { >> >>>> print "\t\tscore is ", $hsp->score, >> "\n"; >> >>>> } >> >>>> } >> >>>> } >> >>>> } >> >>>> >> >>>> >> >>>> } >> >>>> @blast_report = get_file_data ($filename); >> >>>> return @blast_report; >> >>>> >> ############################################################################## >> >>>> #### >> >>>> _______________________________________________ >> >>>> Bioperl-l mailing list >> >>>> Bioperl-l@... >> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> = >> >>> = >> >>> >> ===================================================================== >> >>> Attention: The information contained in this message and/or >> >>> attachments >> >>> from AgResearch Limited is intended only for the persons or >> entities >> >>> to which it is addressed and may contain confidential and/or >> >>> privileged >> >>> material. Any review, retransmission, dissemination or other use >> >>> of, or >> >>> taking of any action in reliance upon, this information by >> persons or >> >>> entities other than the intended recipients is prohibited by >> >>> AgResearch >> >>> Limited. If you have received this message in error, please >> notify >> >>> the >> >>> sender immediately. >> >>> = >> >>> = >> >>> >> ===================================================================== >> >>> >> >>> _______________________________________________ >> >>> Bioperl-l mailing list >> >>> Bioperl-l@... >> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> >> Jason Stajich >> >> jason@... >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l@... >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> ------------------------------------------------------------------------------ >> -- >> >> >> >> No virus found in this incoming message. >> Checked by AVG - www.avg.com >> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: >> 07/05/09 >> 05:53:00 > > > -------------------------------------------------------------------------------- > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release Date: > 07/05/09 17:54:00 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: cdd-search with remoteblast?Jonas,
If you want to continue to use the bioperl remoteblast interface, probably what you should do is simply call it twice. Once, as you already know how to do, which will return without CDD results. Secondly, to get the CDD results, call remoteblast a second time. This time, using -database => 'CDD' -program => 'rpsblast' However, the wrapper may object to the 'rpsblast' program. It is not listed in the POD - http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm) If so, my guess is that changing the perl wrapper to allow rpsblast will "just work" (tm). I've cc:ed cjfields@... for his opinion on this. Also, you might want to perform the CDD search first, especially if you are streaming results to eyeball that might like something to look at while the second (presumably longer) search is running. Cheers, Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: bioperl-l-bounces@... > [mailto:bioperl-l-bounces@...] On Behalf Of > Jonas Schaer > Sent: Thursday, July 09, 2009 5:16 AM > To: BioPerl List; Smithies, Russell > Subject: Re: [Bioperl-l] cdd-search with remoteblast? > > Hi guys, > Thank you all so much for your help and patience :). Of > course you were right and I finaly found the right > put-parameter to get exactly the same hits as on the homepage. > I do have an other question though :)... > I now want to include a search for conserved domains, but > when I try to use the CDD_SEARCH-parameter > (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# > sub:CDD_SEARCH) > like the other put-parameters the way chris once told > me(works fine with the other params): > > my %put = ( > WORD_SIZE => 3, > HITLIST_SIZE => 100, > THRESHOLD => 11, > FILTER => 'R', > GENETIC_CODE => 1, > CDD_SEARCH => 'on' > ###I tried it > with 'true' and '1', too. > > ); > > for my $putName (keys %put) { > $factory->submit_parameter($putName,$put{$putName}); > } > > > ...an exception is thrown: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: CDD_SEARCH is not a valid PUT parameter. > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > C:/Perl/site/lib/Bio/Tools > /Run/RemoteBlast.pm:325 > STACK: main::blast_a_sequence firsteval0.8.pm:383 > STACK: main::blast_it firsteval0.8.pm:288 > STACK: firsteval0.8.pm:35 > ----------------------------------------------------------- . > I guess somehow this could be the solution to my problem: > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s > ub:RID-for-Simultaneous > , but unfortunately I don't understand what to do. > I'm so sorry to bother you with this but please help me once more...:) > > Best regards and thanks in advance, > Jonas > > ----- Original Message ----- > From: "Smithies, Russell" <Russell.Smithies@...> > To: "'Jonas Schaer'" <Brotelzwieb@...> > Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" > <bioperl-l@...> > Sent: Monday, July 06, 2009 10:56 PM > Subject: RE: [Bioperl-l] different results with remote-blast skript > > > Hi Jonas, > You can't just play with the BLAST parameters and hope for a "better" > result. > I'd suggest that if you aren't sure what they do, you should > leave them > alone as small changes can make huge differences in the > output - it's quite > possible to miss finding what you're looking for by using the wrong > parameters. > If all else fails, read the blast manual: > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall > _all.html > http://www.ncbi.nlm.nih.gov/blast/tutorial/ > Or Read Ian Korfs' excellent book: > http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp > > Don't worry about the integer overflow bug as there's nothing > you can do > about it. If you're interested, Google and Wikipedia are your > friends: > http://en.wikipedia.org/wiki/Integer_overflow > > > Russell > > > -----Original Message----- > > From: bioperl-l-bounces@... [mailto:bioperl-l- > > bounces@...] On Behalf Of Jonas Schaer > > Sent: Tuesday, 7 July 2009 12:14 a.m. > > To: BioPerl List; Chris Fields > > Subject: Re: [Bioperl-l] different results with remote-blast skript > > > > Hi guys, thanks for your answers so far. > > @jason: integer overflow in blast.... sorry, but what do > you mean by that? > > how can I fix it...? > > > > Since I never really changed any parameters I thought them > all to be > > default. > > whatever, I tried to get "better" results with my prog by changing > > these: > > $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > > $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > > > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI > STICS'} = > > '1'; > > with no effect...I guess these were default values anyway. > > > > So please maybe you can tell me all the other parameters I > can change with > > my > > perl-skript AND how to do that? > > Unfortunately both, perl and the blast-algorithm are pretty > much new to > > me, > > maybe thats why I just cannot find out how to do that on my > own... :/ > > > > Here is the output I get with my remote-blast skript: > > > ############################################################## > ################ > > ################################### > > Query Name: > > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > > L > > hit name is ref|XP_001702807.1| > > score is 442 > > BLASTP 2.2.21+ > > Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro > A. Schaffer, > > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. > Lipman (1997), > > "Gapped > > BLAST and PSI-BLAST: a new generation of protein database search > > programs", > > Nucleic Acids Res. 25:3389-3402. > > > > > > Reference for composition-based statistics: Alejandro A. > > Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, > John L. Spouge, > > Yuri > > I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), > "Improving the > > accuracy of PSI-BLAST protein database searches with > composition-based > > statistics and other refinements", Nucleic Acids Res. 29:2994-3005. > > > > > > RID: 53STX5G2013 > > > > > > Database: All non-redundant GenBank CDS > > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > > from WGS projects > > 9,252,587 sequences; 3,169,972,781 total letters Query= > > > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > > > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM > > ATGPDPDDEYE > > Length=150 > > > > > > > Score > > E > > Sequences producing significant alignments: > (Bits) > > Value > > > > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > reinhard... 174 > > 2e-42 > > > > > > ALIGNMENTS > > >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] > > gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > > Length=303 > > > > Score = 174 bits (442), Expect = 2e-42, Method: > Composition-based > > stats. > > Identities = 150/150 (100%), Positives = 150/150 (100%), > Gaps = 0/150 > > (0%) > > > > Query 1 > MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds > > 60 > > > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > > Sbjct 154 > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > > 213 > > > > Query 61 > dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > > 120 > > > DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > > Sbjct 214 > DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > > 273 > > > > Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > > AWHERDDNAFRQAHQNTAMATGPDPDDEYE > > Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > > > > > > > > Database: All non-redundant GenBank CDS > > translations+PDB+SwissProt+PIR+PRF > > excluding environmental samples from WGS projects > > Posted date: Jul 5, 2009 4:41 AM > > Number of letters in database: -1,124,994,511 > > Number of sequences in database: 9,252,587 > > > > Lambda K H > > 0.309 0.122 0.345 > > Gapped > > Lambda K H > > 0.267 0.0410 0.140 > > Matrix: BLOSUM62 > > Gap Penalties: Existence: 11, Extension: 1 > > Number of Sequences: 9252587 > > Number of Hits to DB: 60273703 > > Number of extensions: 1448367 > > Number of successful extensions: 2103 > > Number of sequences better than 10: 0 > > Number of HSP's better than 10 without gapping: 0 > > Number of HSP's gapped: 2113 > > Number of HSP's successfully gapped: 0 > > Length of query: 150 > > Length of database: 3169972781 > > Length adjustment: 113 > > Effective length of query: 37 > > Effective length of database: 2124430450 > > Effective search space: 78603926650 > > Effective search space used: 78603926650 > > T: 11 > > A: 40 > > X1: 16 (7.1 bits) > > X2: 38 (14.6 bits) > > X3: 64 (24.7 bits) > > S1: 42 (20.8 bits) > > S2: 74 (33.1 bits) > > > > > ############################################################## > ################ > > ################################### > > and here are the hits (?) of the blast-algorithm on the > ncbi-homepage with > > the same query of course: > > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > reinhard... 300 > > 3e-80 > > ref|XP_001942719.1| PREDICTED: similar to GA16705-PA > [Acyrtho... 36.2 > > 1.1 > > ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 > [Blautia... 35.4 > > 1.8 > > ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania > brazil... 34.3 > > 4.2 > > ref|XP_680841.1| hypothetical protein AN7572.2 > [Aspergillus n... 33.5 > > 6.0 > > ref|YP_001768110.1| hypothetical protein M446_1150 > [Methyloba... 33.5 > > 7.0 > > > ############################################################## > ################ > > ###################################at > > least the first hit is the same, but even there there is a > different score > > and e-value. > > > > thanks so much for any help :) > > regards, jonas > > > > > > ----- Original Message ----- > > From: "Chris Fields" <cjfields@...> > > To: "Jason Stajich" <jason@...> > > Cc: "Smithies, Russell" > <Russell.Smithies@...>; "'BioPerl > > List'" <bioperl-l@...>; "'Jonas Schaer'" > > <Jonas_Schaer@...> > > Sent: Monday, July 06, 2009 12:51 AM > > Subject: Re: [Bioperl-l] different results with remote-blast skript > > > > > > > That inspires confidence ;> > > > > > > chris > > > > > > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > > > > > >> integer overflow in blast.... > > >> > > >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > > >> > > >>> I'd guess it's a difference in the parameters used. > > >>> Interesting that both have the number of letters in the db as > > >>> "-1,125,070,205", I assume that's a bug :-) > > >>> > > >>> Stats from your remote_blast: > > >>> > > >>> 'stats' => { > > >>> 'S1' => '42', > > >>> 'S1_bits' => '20.8', > > >>> 'lambda' => '0.309', > > >>> 'entropy' => '0.345', > > >>> 'kappa_gapped' => '0.0410', > > >>> 'T' => '11', > > >>> 'kappa' => '0.122', > > >>> 'X3_bits' => '24.7', > > >>> 'X1' => '16', > > >>> 'lambda_gapped' => '0.267', > > >>> 'X2' => '38', > > >>> 'S2' => '74', > > >>> 'seqs_better_than_cutoff' => '0', > > >>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > > >>> 'Hits_to_DB' => '60102303', > > >>> 'dbletters' => '-1125070205', > > >>> 'A' => '40', > > >>> 'num_successful_extensions' => '2004', > > >>> 'num_extensions' => '1436892', > > >>> 'X1_bits' => '7.1', > > >>> 'X3' => '64', > > >>> 'entropy_gapped' => '0.140', > > >>> 'dbentries' => '9252258', > > >>> 'X2_bits' => '14.6', > > >>> 'S2_bits' => '33.1' > > >>> } > > >>> > > >>> > > >>> Stats from a blast done on the NCBI webpage: > > >>> > > >>> Database: All non-redundant GenBank CDS > translations+PDB+SwissProt > > >>> +PIR+PRF > > >>> excluding environmental samples from WGS projects > > >>> Posted date: Jul 4, 2009 4:41 AM > > >>> Number of letters in database: -1,125,070,205 > > >>> Number of sequences in database: 9,252,258 > > >>> > > >>> Lambda K H > > >>> 0.309 0.124 0.340 > > >>> Gapped > > >>> Lambda K H > > >>> 0.267 0.0410 0.140 > > >>> Matrix: BLOSUM62 > > >>> Gap Penalties: Existence: 11, Extension: 1 > > >>> Number of Sequences: 9252258 > > >>> Number of Hits to DB: 86493230 > > >>> Number of extensions: 3101413 > > >>> Number of successful extensions: 9001 > > >>> Number of sequences better than 100: 65 > > >>> Number of HSP's better than 100 without gapping: 0 > > >>> Number of HSP's gapped: 9000 > > >>> Number of HSP's successfully gapped: 66 > > >>> Length of query: 150 > > >>> Length of database: 3169897087 > > >>> Length adjustment: 113 > > >>> Effective length of query: 37 > > >>> Effective length of database: 2124391933 > > >>> Effective search space: 78602501521 > > >>> Effective search space used: 78602501521 > > >>> T: 11 > > >>> A: 40 > > >>> X1: 16 (7.1 bits) > > >>> X2: 38 (14.6 bits) > > >>> X3: 64 (24.7 bits) > > >>> S1: 42 (20.8 bits) > > >>> S2: 65 (29.6 bits) > > >>> > > >>> > > >>>> -----Original Message----- > > >>>> From: bioperl-l-bounces@... [mailto:bioperl-l- > > >>>> bounces@...] On Behalf Of Jonas Schaer > > >>>> Sent: Sunday, 28 June 2009 10:15 p.m. > > >>>> To: BioPerl List > > >>>> Subject: [Bioperl-l] different results with remote-blast skript > > >>>> > > >>>> Hi again :) > > >>>> please, I only have this little question: > > >>>> why do I get different results with my remote::blast > perl skript > > >>>> then on the > > >>>> ncbi blast homepage? > > >>>> I am using blastp, the query is an amino-sequence (different > > >>>> results with any > > >>>> sequence, differences not only in number of hits but even in e- > > >>>> values, scores > > >>>> etc...), the database is 'nr'. > > >>>> PLEASE help me, > > >>>> thank you in advance, > > >>>> Jonas > > >>>> > > >>>> ps: my skript: > > >>>> > > > ############################################################## > ################ > > >>>> ## > > >>>> use Bio::Seq::SeqFactory; > > >>>> use Bio::Tools::Run::RemoteBlast; > > >>>> use strict; > > >>>> my @blast_report; > > >>>> my $prog = 'blastp'; > > >>>> my $db = 'nr'; > > >>>> my $e_val= '1e-10'; > > >>>> #my $e_val= '10'; > > >>>> my @params = ( '-prog' => $prog, > > >>>> '-data' => $db, > > >>>> '-expect' => $e_val, > > >>>> '-readmethod' => 'SearchIO' ); > > >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > > >>>> $ > > >>>> Bio > > >>>> > ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} > > >>>> = '1'; > > >>>> > > >>>> my > > >>>> $ > > >>>> blast_seq > > >>>> > ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > > >>>> > > > SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN > AFRQAHQNTAMATGPD > > >>>> PDDEYE'; > > >>>> #$v is just to turn on and off the messages > > >>>> my $v = 1; > > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > > >>>> 'Bio::PrimarySeq'); > > >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => > > >>>> "$blast_seq"); > > >>>> my $filename='temp2.out'; > > >>>> my $r = $factory->submit_blast($seq); > > >>>> print STDERR "waiting..." if( $v > 0 ); > > >>>> while ( my @rids = $factory->each_rid ) > > >>>> { > > >>>> foreach my $rid ( @rids ) > > >>>> { > > >>>> my $rc = $factory->retrieve_blast($rid); > > >>>> if( !ref($rc) ) > > >>>> { > > >>>> if( $rc < 0 ) > > >>>> { > > >>>> $factory->remove_rid($rid); > > >>>> } > > >>>> print STDERR "." if ( $v > 0 ); > > >>>> } > > >>>> else > > >>>> { > > >>>> my $result = $rc->next_result(); > > >>>> $factory->save_output($filename); > > >>>> $factory->remove_rid($rid); > > >>>> print "\nQuery Name: ", > $result->query_name(), > > >>>> "\n"; > > >>>> while ( my $hit = $result->next_hit ) > > >>>> { > > >>>> next unless ( $v > 0); > > >>>> print "\thit name is ", $hit->name, "\n"; > > >>>> while( my $hsp = $hit->next_hsp ) > > >>>> { > > >>>> print "\t\tscore is ", > $hsp->score, "\n"; > > >>>> } > > >>>> } > > >>>> } > > >>>> } > > >>>> > > >>>> > > >>>> } > > >>>> @blast_report = get_file_data ($filename); > > >>>> return @blast_report; > > >>>> > > > ############################################################## > ################ > > >>>> #### > > >>>> _______________________________________________ > > >>>> Bioperl-l mailing list > > >>>> Bioperl-l@... > > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>> = > > >>> = > > >>> > ===================================================================== > > >>> Attention: The information contained in this message and/or > > >>> attachments > > >>> from AgResearch Limited is intended only for the > persons or entities > > >>> to which it is addressed and may contain confidential and/or > > >>> privileged > > >>> material. Any review, retransmission, dissemination or other use > > >>> of, or > > >>> taking of any action in reliance upon, this information > by persons or > > >>> entities other than the intended recipients is prohibited by > > >>> AgResearch > > >>> Limited. If you have received this message in error, > please notify > > >>> the > > >>> sender immediately. > > >>> = > > >>> = > > >>> > ===================================================================== > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l@... > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > >> -- > > >> Jason Stajich > > >> jason@... > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l@... > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -------------------------------------------------------------- > ---------------- > > -- > > > > > > > > No virus found in this incoming message. > > Checked by AVG - www.avg.com > > Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release > Date: 07/05/09 > > 05:53:00 > > > -------------------------------------------------------------- > ------------------ > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release > Date: 07/05/09 > 17:54:00 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: cdd-search with remoteblast?I've scheduled this tentatively for the 1.6 release series (just not
sure when yet). It may work as is, but I haven't tried it out yet (and am hazarding to guess it only retrieves the single main RID at the moment). chris On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: > Jonas, > > If you want to continue to use the bioperl remoteblast interface, > probably what you should do is simply call it twice. > > Once, as you already know how to do, which will return without CDD > results. > > Secondly, to get the CDD results, call remoteblast a second time. > This time, using > -database => 'CDD' > -program => 'rpsblast' > > However, the wrapper may object to the 'rpsblast' program. It is > not listed in the POD - http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm) > If so, my guess is that changing the perl wrapper to allow > rpsblast will "just work" (tm). I've cc:ed cjfields@... for > his opinion on this. > > Also, you might want to perform the CDD search first, especially if > you are streaming results to eyeball that might like something to > look at while the second (presumably longer) search is running. > > Cheers, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: bioperl-l-bounces@... >> [mailto:bioperl-l-bounces@...] On Behalf Of >> Jonas Schaer >> Sent: Thursday, July 09, 2009 5:16 AM >> To: BioPerl List; Smithies, Russell >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >> >> Hi guys, >> Thank you all so much for your help and patience :). Of >> course you were right and I finaly found the right >> put-parameter to get exactly the same hits as on the homepage. >> I do have an other question though :)... >> I now want to include a search for conserved domains, but >> when I try to use the CDD_SEARCH-parameter >> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >> sub:CDD_SEARCH) >> like the other put-parameters the way chris once told >> me(works fine with the other params): >> >> my %put = ( >> WORD_SIZE => 3, >> HITLIST_SIZE => 100, >> THRESHOLD => 11, >> FILTER => 'R', >> GENETIC_CODE => 1, >> CDD_SEARCH => 'on' >> ###I tried it >> with 'true' and '1', too. >> >> ); >> >> for my $putName (keys %put) { >> $factory->submit_parameter($putName,$put{$putName}); >> } >> >> >> ...an exception is thrown: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: CDD_SEARCH is not a valid PUT parameter. >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >> C:/Perl/site/lib/Bio/Tools >> /Run/RemoteBlast.pm:325 >> STACK: main::blast_a_sequence firsteval0.8.pm:383 >> STACK: main::blast_it firsteval0.8.pm:288 >> STACK: firsteval0.8.pm:35 >> ----------------------------------------------------------- . >> I guess somehow this could be the solution to my problem: >> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >> ub:RID-for-Simultaneous >> , but unfortunately I don't understand what to do. >> I'm so sorry to bother you with this but please help me once >> more...:) >> >> Best regards and thanks in advance, >> Jonas >> >> ----- Original Message ----- >> From: "Smithies, Russell" <Russell.Smithies@...> >> To: "'Jonas Schaer'" <Brotelzwieb@...> >> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" >> <bioperl-l@...> >> Sent: Monday, July 06, 2009 10:56 PM >> Subject: RE: [Bioperl-l] different results with remote-blast skript >> >> >> Hi Jonas, >> You can't just play with the BLAST parameters and hope for a "better" >> result. >> I'd suggest that if you aren't sure what they do, you should >> leave them >> alone as small changes can make huge differences in the >> output - it's quite >> possible to miss finding what you're looking for by using the wrong >> parameters. >> If all else fails, read the blast manual: >> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >> _all.html >> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >> Or Read Ian Korfs' excellent book: >> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp > fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >> >> Don't worry about the integer overflow bug as there's nothing >> you can do >> about it. If you're interested, Google and Wikipedia are your >> friends: >> http://en.wikipedia.org/wiki/Integer_overflow >> >> >> Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>> bounces@...] On Behalf Of Jonas Schaer >>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>> To: BioPerl List; Chris Fields >>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>> >>> Hi guys, thanks for your answers so far. >>> @jason: integer overflow in blast.... sorry, but what do >> you mean by that? >>> how can I fix it...? >>> >>> Since I never really changed any parameters I thought them >> all to be >>> default. >>> whatever, I tried to get "better" results with my prog by changing >>> these: >>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>> >> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >> STICS'} = >>> '1'; >>> with no effect...I guess these were default values anyway. >>> >>> So please maybe you can tell me all the other parameters I >> can change with >>> my >>> perl-skript AND how to do that? >>> Unfortunately both, perl and the blast-algorithm are pretty >> much new to >>> me, >>> maybe thats why I just cannot find out how to do that on my >> own... :/ >>> >>> Here is the output I get with my remote-blast skript: >>> >> ############################################################## >> ################ >>> ################################### >>> Query Name: >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>> L >>> hit name is ref|XP_001702807.1| >>> score is 442 >>> BLASTP 2.2.21+ >>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >> A. Schaffer, >>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >> Lipman (1997), >>> "Gapped >>> BLAST and PSI-BLAST: a new generation of protein database search >>> programs", >>> Nucleic Acids Res. 25:3389-3402. >>> >>> >>> Reference for composition-based statistics: Alejandro A. >>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >> John L. Spouge, >>> Yuri >>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >> "Improving the >>> accuracy of PSI-BLAST protein database searches with >> composition-based >>> statistics and other refinements", Nucleic Acids Res. 29:2994-3005. >>> >>> >>> RID: 53STX5G2013 >>> >>> >>> Database: All non-redundant GenBank CDS >>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>> from WGS projects >>> 9,252,587 sequences; 3,169,972,781 total letters Query= >>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >>> ATGPDPDDEYE >>> Length=150 >>> >>> >>> >> Score >>> E >>> Sequences producing significant alignments: >> (Bits) >>> Value >>> >>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >> reinhard... 174 >>> 2e-42 >>> >>> >>> ALIGNMENTS >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] >>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>> Length=303 >>> >>> Score = 174 bits (442), Expect = 2e-42, Method: >> Composition-based >>> stats. >>> Identities = 150/150 (100%), Positives = 150/150 (100%), >> Gaps = 0/150 >>> (0%) >>> >>> Query 1 >> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>> 60 >>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>> Sbjct 154 >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>> 213 >>> >>> Query 61 >> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>> 120 >>> >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>> Sbjct 214 >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>> 273 >>> >>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>> >>> >>> >>> Database: All non-redundant GenBank CDS >>> translations+PDB+SwissProt+PIR+PRF >>> excluding environmental samples from WGS projects >>> Posted date: Jul 5, 2009 4:41 AM >>> Number of letters in database: -1,124,994,511 >>> Number of sequences in database: 9,252,587 >>> >>> Lambda K H >>> 0.309 0.122 0.345 >>> Gapped >>> Lambda K H >>> 0.267 0.0410 0.140 >>> Matrix: BLOSUM62 >>> Gap Penalties: Existence: 11, Extension: 1 >>> Number of Sequences: 9252587 >>> Number of Hits to DB: 60273703 >>> Number of extensions: 1448367 >>> Number of successful extensions: 2103 >>> Number of sequences better than 10: 0 >>> Number of HSP's better than 10 without gapping: 0 >>> Number of HSP's gapped: 2113 >>> Number of HSP's successfully gapped: 0 >>> Length of query: 150 >>> Length of database: 3169972781 >>> Length adjustment: 113 >>> Effective length of query: 37 >>> Effective length of database: 2124430450 >>> Effective search space: 78603926650 >>> Effective search space used: 78603926650 >>> T: 11 >>> A: 40 >>> X1: 16 (7.1 bits) >>> X2: 38 (14.6 bits) >>> X3: 64 (24.7 bits) >>> S1: 42 (20.8 bits) >>> S2: 74 (33.1 bits) >>> >>> >> ############################################################## >> ################ >>> ################################### >>> and here are the hits (?) of the blast-algorithm on the >> ncbi-homepage with >>> the same query of course: >>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >> reinhard... 300 >>> 3e-80 >>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >> [Acyrtho... 36.2 >>> 1.1 >>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >> [Blautia... 35.4 >>> 1.8 >>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >> brazil... 34.3 >>> 4.2 >>> ref|XP_680841.1| hypothetical protein AN7572.2 >> [Aspergillus n... 33.5 >>> 6.0 >>> ref|YP_001768110.1| hypothetical protein M446_1150 >> [Methyloba... 33.5 >>> 7.0 >>> >> ############################################################## >> ################ >>> ###################################at >>> least the first hit is the same, but even there there is a >> different score >>> and e-value. >>> >>> thanks so much for any help :) >>> regards, jonas >>> >>> >>> ----- Original Message ----- >>> From: "Chris Fields" <cjfields@...> >>> To: "Jason Stajich" <jason@...> >>> Cc: "Smithies, Russell" >> <Russell.Smithies@...>; "'BioPerl >>> List'" <bioperl-l@...>; "'Jonas Schaer'" >>> <Jonas_Schaer@...> >>> Sent: Monday, July 06, 2009 12:51 AM >>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>> >>> >>>> That inspires confidence ;> >>>> >>>> chris >>>> >>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>> >>>>> integer overflow in blast.... >>>>> >>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>> >>>>>> I'd guess it's a difference in the parameters used. >>>>>> Interesting that both have the number of letters in the db as >>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>> >>>>>> Stats from your remote_blast: >>>>>> >>>>>> 'stats' => { >>>>>> 'S1' => '42', >>>>>> 'S1_bits' => '20.8', >>>>>> 'lambda' => '0.309', >>>>>> 'entropy' => '0.345', >>>>>> 'kappa_gapped' => '0.0410', >>>>>> 'T' => '11', >>>>>> 'kappa' => '0.122', >>>>>> 'X3_bits' => '24.7', >>>>>> 'X1' => '16', >>>>>> 'lambda_gapped' => '0.267', >>>>>> 'X2' => '38', >>>>>> 'S2' => '74', >>>>>> 'seqs_better_than_cutoff' => '0', >>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>> 'Hits_to_DB' => '60102303', >>>>>> 'dbletters' => '-1125070205', >>>>>> 'A' => '40', >>>>>> 'num_successful_extensions' => '2004', >>>>>> 'num_extensions' => '1436892', >>>>>> 'X1_bits' => '7.1', >>>>>> 'X3' => '64', >>>>>> 'entropy_gapped' => '0.140', >>>>>> 'dbentries' => '9252258', >>>>>> 'X2_bits' => '14.6', >>>>>> 'S2_bits' => '33.1' >>>>>> } >>>>>> >>>>>> >>>>>> Stats from a blast done on the NCBI webpage: >>>>>> >>>>>> Database: All non-redundant GenBank CDS >> translations+PDB+SwissProt >>>>>> +PIR+PRF >>>>>> excluding environmental samples from WGS projects >>>>>> Posted date: Jul 4, 2009 4:41 AM >>>>>> Number of letters in database: -1,125,070,205 >>>>>> Number of sequences in database: 9,252,258 >>>>>> >>>>>> Lambda K H >>>>>> 0.309 0.124 0.340 >>>>>> Gapped >>>>>> Lambda K H >>>>>> 0.267 0.0410 0.140 >>>>>> Matrix: BLOSUM62 >>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>> Number of Sequences: 9252258 >>>>>> Number of Hits to DB: 86493230 >>>>>> Number of extensions: 3101413 >>>>>> Number of successful extensions: 9001 >>>>>> Number of sequences better than 100: 65 >>>>>> Number of HSP's better than 100 without gapping: 0 >>>>>> Number of HSP's gapped: 9000 >>>>>> Number of HSP's successfully gapped: 66 >>>>>> Length of query: 150 >>>>>> Length of database: 3169897087 >>>>>> Length adjustment: 113 >>>>>> Effective length of query: 37 >>>>>> Effective length of database: 2124391933 >>>>>> Effective search space: 78602501521 >>>>>> Effective search space used: 78602501521 >>>>>> T: 11 >>>>>> A: 40 >>>>>> X1: 16 (7.1 bits) >>>>>> X2: 38 (14.6 bits) >>>>>> X3: 64 (24.7 bits) >>>>>> S1: 42 (20.8 bits) >>>>>> S2: 65 (29.6 bits) >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>>>>>> bounces@...] On Behalf Of Jonas Schaer >>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>> To: BioPerl List >>>>>>> Subject: [Bioperl-l] different results with remote-blast skript >>>>>>> >>>>>>> Hi again :) >>>>>>> please, I only have this little question: >>>>>>> why do I get different results with my remote::blast >> perl skript >>>>>>> then on the >>>>>>> ncbi blast homepage? >>>>>>> I am using blastp, the query is an amino-sequence (different >>>>>>> results with any >>>>>>> sequence, differences not only in number of hits but even in e- >>>>>>> values, scores >>>>>>> etc...), the database is 'nr'. >>>>>>> PLEASE help me, >>>>>>> thank you in advance, >>>>>>> Jonas >>>>>>> >>>>>>> ps: my skript: >>>>>>> >>> >> ############################################################## >> ################ >>>>>>> ## >>>>>>> use Bio::Seq::SeqFactory; >>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>> use strict; >>>>>>> my @blast_report; >>>>>>> my $prog = 'blastp'; >>>>>>> my $db = 'nr'; >>>>>>> my $e_val= '1e-10'; >>>>>>> #my $e_val= '10'; >>>>>>> my @params = ( '-prog' => $prog, >>>>>>> '-data' => $db, >>>>>>> '-expect' => $e_val, >>>>>>> '-readmethod' => 'SearchIO' ); >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>>> $ >>>>>>> Bio >>>>>>> >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>>>>> = '1'; >>>>>>> >>>>>>> my >>>>>>> $ >>>>>>> blast_seq >>>>>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>>>>> >>> >> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN >> AFRQAHQNTAMATGPD >>>>>>> PDDEYE'; >>>>>>> #$v is just to turn on and off the messages >>>>>>> my $v = 1; >>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>>>>> 'Bio::PrimarySeq'); >>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => >>>>>>> "$blast_seq"); >>>>>>> my $filename='temp2.out'; >>>>>>> my $r = $factory->submit_blast($seq); >>>>>>> print STDERR "waiting..." if( $v > 0 ); >>>>>>> while ( my @rids = $factory->each_rid ) >>>>>>> { >>>>>>> foreach my $rid ( @rids ) >>>>>>> { >>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>> if( !ref($rc) ) >>>>>>> { >>>>>>> if( $rc < 0 ) >>>>>>> { >>>>>>> $factory->remove_rid($rid); >>>>>>> } >>>>>>> print STDERR "." if ( $v > 0 ); >>>>>>> } >>>>>>> else >>>>>>> { >>>>>>> my $result = $rc->next_result(); >>>>>>> $factory->save_output($filename); >>>>>>> $factory->remove_rid($rid); >>>>>>> print "\nQuery Name: ", >> $result->query_name(), >>>>>>> "\n"; >>>>>>> while ( my $hit = $result->next_hit ) >>>>>>> { >>>>>>> next unless ( $v > 0); >>>>>>> print "\thit name is ", $hit->name, "\n"; >>>>>>> while( my $hsp = $hit->next_hsp ) >>>>>>> { >>>>>>> print "\t\tscore is ", >> $hsp->score, "\n"; >>>>>>> } >>>>>>> } >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> >>>>>>> } >>>>>>> @blast_report = get_file_data ($filename); >>>>>>> return @blast_report; >>>>>>> >>> >> ############################################################## >> ################ >>>>>>> #### >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l@... >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> = >>>>>> = >>>>>> >> ===================================================================== >>>>>> Attention: The information contained in this message and/or >>>>>> attachments >>>>>> from AgResearch Limited is intended only for the >> persons or entities >>>>>> to which it is addressed and may contain confidential and/or >>>>>> privileged >>>>>> material. Any review, retransmission, dissemination or other use >>>>>> of, or >>>>>> taking of any action in reliance upon, this information >> by persons or >>>>>> entities other than the intended recipients is prohibited by >>>>>> AgResearch >>>>>> Limited. If you have received this message in error, >> please notify >>>>>> the >>>>>> sender immediately. >>>>>> = >>>>>> = >>>>>> >> ===================================================================== >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l@... >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> -- >>>>> Jason Stajich >>>>> jason@... >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@... >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> -------------------------------------------------------------- >> ---------------- >>> -- >>> >>> >>> >>> No virus found in this incoming message. >>> Checked by AVG - www.avg.com >>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release >> Date: 07/05/09 >>> 05:53:00 >> >> >> -------------------------------------------------------------- >> ------------------ >> >> >> >> No virus found in this incoming message. >> Checked by AVG - www.avg.com >> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release >> Date: 07/05/09 >> 17:54:00 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@... >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: cdd-search with remoteblast?Hi,
I tried to do what Malcom proposed my ($prog = 'rpsblast'; my $db = 'CDD';) but that didn't work. ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Value rpsblast for PUT parameter PROGRAM does not match expression t?blast[ pnx]. Rejecting. STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools /Run/RemoteBlast.pm:329 STACK: Bio::Tools::Run::RemoteBlast::new C:/Perl/site/lib/Bio/Tools/Run/RemoteBl ast.pm:257 STACK: blast_a_seq2.pm:14 ----------------------------------------------------------- So I should try to "change the wrapper to allow 'rpsblast'", right? Could You tell me how to do that, please? So sorry but I have no idea yet...:) If that doesn't work, is there any other way to run cdd-searches with perl? Thank you so much! Regards, Jonas ----- Original Message ----- From: "Chris Fields" <cjfields1@...> To: "Cook, Malcolm" <MEC@...> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'" <bioperl-l@...>; "'Smithies, Russell'" <Russell.Smithies@...>; <cjfields@...> Sent: Thursday, July 09, 2009 9:19 PM Subject: Re: [Bioperl-l] cdd-search with remoteblast? > I've scheduled this tentatively for the 1.6 release series (just not > sure when yet). It may work as is, but I haven't tried it out yet > (and am hazarding to guess it only retrieves the single main RID at > the moment). > > chris > > On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: > >> Jonas, >> >> If you want to continue to use the bioperl remoteblast interface, >> probably what you should do is simply call it twice. >> >> Once, as you already know how to do, which will return without CDD >> results. >> >> Secondly, to get the CDD results, call remoteblast a second time. >> This time, using >> -database => 'CDD' >> -program => 'rpsblast' >> >> However, the wrapper may object to the 'rpsblast' program. It is >> not listed in the POD - >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm) >> If so, my guess is that changing the perl wrapper to allow >> rpsblast will "just work" (tm). I've cc:ed cjfields@... for >> his opinion on this. >> >> Also, you might want to perform the CDD search first, especially if >> you are streaming results to eyeball that might like something to >> look at while the second (presumably longer) search is running. >> >> Cheers, >> >> Malcolm Cook >> Stowers Institute for Medical Research - Kansas City, Missouri >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces@... >>> [mailto:bioperl-l-bounces@...] On Behalf Of >>> Jonas Schaer >>> Sent: Thursday, July 09, 2009 5:16 AM >>> To: BioPerl List; Smithies, Russell >>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>> >>> Hi guys, >>> Thank you all so much for your help and patience :). Of >>> course you were right and I finaly found the right >>> put-parameter to get exactly the same hits as on the homepage. >>> I do have an other question though :)... >>> I now want to include a search for conserved domains, but >>> when I try to use the CDD_SEARCH-parameter >>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >>> sub:CDD_SEARCH) >>> like the other put-parameters the way chris once told >>> me(works fine with the other params): >>> >>> my %put = ( >>> WORD_SIZE => 3, >>> HITLIST_SIZE => 100, >>> THRESHOLD => 11, >>> FILTER => 'R', >>> GENETIC_CODE => 1, >>> CDD_SEARCH => 'on' >>> ###I tried it >>> with 'true' and '1', too. >>> >>> ); >>> >>> for my $putName (keys %put) { >>> $factory->submit_parameter($putName,$put{$putName}); >>> } >>> >>> >>> ...an exception is thrown: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: CDD_SEARCH is not a valid PUT parameter. >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>> C:/Perl/site/lib/Bio/Tools >>> /Run/RemoteBlast.pm:325 >>> STACK: main::blast_a_sequence firsteval0.8.pm:383 >>> STACK: main::blast_it firsteval0.8.pm:288 >>> STACK: firsteval0.8.pm:35 >>> ----------------------------------------------------------- . >>> I guess somehow this could be the solution to my problem: >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >>> ub:RID-for-Simultaneous >>> , but unfortunately I don't understand what to do. >>> I'm so sorry to bother you with this but please help me once >>> more...:) >>> >>> Best regards and thanks in advance, >>> Jonas >>> >>> ----- Original Message ----- >>> From: "Smithies, Russell" <Russell.Smithies@...> >>> To: "'Jonas Schaer'" <Brotelzwieb@...> >>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" >>> <bioperl-l@...> >>> Sent: Monday, July 06, 2009 10:56 PM >>> Subject: RE: [Bioperl-l] different results with remote-blast skript >>> >>> >>> Hi Jonas, >>> You can't just play with the BLAST parameters and hope for a "better" >>> result. >>> I'd suggest that if you aren't sure what they do, you should >>> leave them >>> alone as small changes can make huge differences in the >>> output - it's quite >>> possible to miss finding what you're looking for by using the wrong >>> parameters. >>> If all else fails, read the blast manual: >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >>> _all.html >>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >>> Or Read Ian Korfs' excellent book: >>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp >> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >>> >>> Don't worry about the integer overflow bug as there's nothing >>> you can do >>> about it. If you're interested, Google and Wikipedia are your >>> friends: >>> http://en.wikipedia.org/wiki/Integer_overflow >>> >>> >>> Russell >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>>> bounces@...] On Behalf Of Jonas Schaer >>>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>>> To: BioPerl List; Chris Fields >>>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>>> >>>> Hi guys, thanks for your answers so far. >>>> @jason: integer overflow in blast.... sorry, but what do >>> you mean by that? >>>> how can I fix it...? >>>> >>>> Since I never really changed any parameters I thought them >>> all to be >>>> default. >>>> whatever, I tried to get "better" results with my prog by changing >>>> these: >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>> >>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>> STICS'} = >>>> '1'; >>>> with no effect...I guess these were default values anyway. >>>> >>>> So please maybe you can tell me all the other parameters I >>> can change with >>>> my >>>> perl-skript AND how to do that? >>>> Unfortunately both, perl and the blast-algorithm are pretty >>> much new to >>>> me, >>>> maybe thats why I just cannot find out how to do that on my >>> own... :/ >>>> >>>> Here is the output I get with my remote-blast skript: >>>> >>> ############################################################## >>> ################ >>>> ################################### >>>> Query Name: >>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>> L >>>> hit name is ref|XP_001702807.1| >>>> score is 442 >>>> BLASTP 2.2.21+ >>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>> A. Schaffer, >>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>> Lipman (1997), >>>> "Gapped >>>> BLAST and PSI-BLAST: a new generation of protein database search >>>> programs", >>>> Nucleic Acids Res. 25:3389-3402. >>>> >>>> >>>> Reference for composition-based statistics: Alejandro A. >>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>> John L. Spouge, >>>> Yuri >>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >>> "Improving the >>>> accuracy of PSI-BLAST protein database searches with >>> composition-based >>>> statistics and other refinements", Nucleic Acids Res. 29:2994-3005. >>>> >>>> >>>> RID: 53STX5G2013 >>>> >>>> >>>> Database: All non-redundant GenBank CDS >>>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>> from WGS projects >>>> 9,252,587 sequences; 3,169,972,781 total letters Query= >>>> >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>> >>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >>>> ATGPDPDDEYE >>>> Length=150 >>>> >>>> >>>> >>> Score >>>> E >>>> Sequences producing significant alignments: >>> (Bits) >>>> Value >>>> >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>> reinhard... 174 >>>> 2e-42 >>>> >>>> >>>> ALIGNMENTS >>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>> Length=303 >>>> >>>> Score = 174 bits (442), Expect = 2e-42, Method: >>> Composition-based >>>> stats. >>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>> Gaps = 0/150 >>>> (0%) >>>> >>>> Query 1 >>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>> 60 >>>> >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>> Sbjct 154 >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>> 213 >>>> >>>> Query 61 >>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>> 120 >>>> >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>> Sbjct 214 >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>> 273 >>>> >>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >>>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>> >>>> >>>> >>>> Database: All non-redundant GenBank CDS >>>> translations+PDB+SwissProt+PIR+PRF >>>> excluding environmental samples from WGS projects >>>> Posted date: Jul 5, 2009 4:41 AM >>>> Number of letters in database: -1,124,994,511 >>>> Number of sequences in database: 9,252,587 >>>> >>>> Lambda K H >>>> 0.309 0.122 0.345 >>>> Gapped >>>> Lambda K H >>>> 0.267 0.0410 0.140 >>>> Matrix: BLOSUM62 >>>> Gap Penalties: Existence: 11, Extension: 1 >>>> Number of Sequences: 9252587 >>>> Number of Hits to DB: 60273703 >>>> Number of extensions: 1448367 >>>> Number of successful extensions: 2103 >>>> Number of sequences better than 10: 0 >>>> Number of HSP's better than 10 without gapping: 0 >>>> Number of HSP's gapped: 2113 >>>> Number of HSP's successfully gapped: 0 >>>> Length of query: 150 >>>> Length of database: 3169972781 >>>> Length adjustment: 113 >>>> Effective length of query: 37 >>>> Effective length of database: 2124430450 >>>> Effective search space: 78603926650 >>>> Effective search space used: 78603926650 >>>> T: 11 >>>> A: 40 >>>> X1: 16 (7.1 bits) >>>> X2: 38 (14.6 bits) >>>> X3: 64 (24.7 bits) >>>> S1: 42 (20.8 bits) >>>> S2: 74 (33.1 bits) >>>> >>>> >>> ############################################################## >>> ################ >>>> ################################### >>>> and here are the hits (?) of the blast-algorithm on the >>> ncbi-homepage with >>>> the same query of course: >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>> reinhard... 300 >>>> 3e-80 >>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >>> [Acyrtho... 36.2 >>>> 1.1 >>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >>> [Blautia... 35.4 >>>> 1.8 >>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >>> brazil... 34.3 >>>> 4.2 >>>> ref|XP_680841.1| hypothetical protein AN7572.2 >>> [Aspergillus n... 33.5 >>>> 6.0 >>>> ref|YP_001768110.1| hypothetical protein M446_1150 >>> [Methyloba... 33.5 >>>> 7.0 >>>> >>> ############################################################## >>> ################ >>>> ###################################at >>>> least the first hit is the same, but even there there is a >>> different score >>>> and e-value. >>>> >>>> thanks so much for any help :) >>>> regards, jonas >>>> >>>> >>>> ----- Original Message ----- >>>> From: "Chris Fields" <cjfields@...> >>>> To: "Jason Stajich" <jason@...> >>>> Cc: "Smithies, Russell" >>> <Russell.Smithies@...>; "'BioPerl >>>> List'" <bioperl-l@...>; "'Jonas Schaer'" >>>> <Jonas_Schaer@...> >>>> Sent: Monday, July 06, 2009 12:51 AM >>>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>>> >>>> >>>>> That inspires confidence ;> >>>>> >>>>> chris >>>>> >>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>>> >>>>>> integer overflow in blast.... >>>>>> >>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>>> >>>>>>> I'd guess it's a difference in the parameters used. >>>>>>> Interesting that both have the number of letters in the db as >>>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>>> >>>>>>> Stats from your remote_blast: >>>>>>> >>>>>>> 'stats' => { >>>>>>> 'S1' => '42', >>>>>>> 'S1_bits' => '20.8', >>>>>>> 'lambda' => '0.309', >>>>>>> 'entropy' => '0.345', >>>>>>> 'kappa_gapped' => '0.0410', >>>>>>> 'T' => '11', >>>>>>> 'kappa' => '0.122', >>>>>>> 'X3_bits' => '24.7', >>>>>>> 'X1' => '16', >>>>>>> 'lambda_gapped' => '0.267', >>>>>>> 'X2' => '38', >>>>>>> 'S2' => '74', >>>>>>> 'seqs_better_than_cutoff' => '0', >>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>>> 'Hits_to_DB' => '60102303', >>>>>>> 'dbletters' => '-1125070205', >>>>>>> 'A' => '40', >>>>>>> 'num_successful_extensions' => '2004', >>>>>>> 'num_extensions' => '1436892', >>>>>>> 'X1_bits' => '7.1', >>>>>>> 'X3' => '64', >>>>>>> 'entropy_gapped' => '0.140', >>>>>>> 'dbentries' => '9252258', >>>>>>> 'X2_bits' => '14.6', >>>>>>> 'S2_bits' => '33.1' >>>>>>> } >>>>>>> >>>>>>> >>>>>>> Stats from a blast done on the NCBI webpage: >>>>>>> >>>>>>> Database: All non-redundant GenBank CDS >>> translations+PDB+SwissProt >>>>>>> +PIR+PRF >>>>>>> excluding environmental samples from WGS projects >>>>>>> Posted date: Jul 4, 2009 4:41 AM >>>>>>> Number of letters in database: -1,125,070,205 >>>>>>> Number of sequences in database: 9,252,258 >>>>>>> >>>>>>> Lambda K H >>>>>>> 0.309 0.124 0.340 >>>>>>> Gapped >>>>>>> Lambda K H >>>>>>> 0.267 0.0410 0.140 >>>>>>> Matrix: BLOSUM62 >>>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>>> Number of Sequences: 9252258 >>>>>>> Number of Hits to DB: 86493230 >>>>>>> Number of extensions: 3101413 >>>>>>> Number of successful extensions: 9001 >>>>>>> Number of sequences better than 100: 65 >>>>>>> Number of HSP's better than 100 without gapping: 0 >>>>>>> Number of HSP's gapped: 9000 >>>>>>> Number of HSP's successfully gapped: 66 >>>>>>> Length of query: 150 >>>>>>> Length of database: 3169897087 >>>>>>> Length adjustment: 113 >>>>>>> Effective length of query: 37 >>>>>>> Effective length of database: 2124391933 >>>>>>> Effective search space: 78602501521 >>>>>>> Effective search space used: 78602501521 >>>>>>> T: 11 >>>>>>> A: 40 >>>>>>> X1: 16 (7.1 bits) >>>>>>> X2: 38 (14.6 bits) >>>>>>> X3: 64 (24.7 bits) >>>>>>> S1: 42 (20.8 bits) >>>>>>> S2: 65 (29.6 bits) >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>>>>>>> bounces@...] On Behalf Of Jonas Schaer >>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>>> To: BioPerl List >>>>>>>> Subject: [Bioperl-l] different results with remote-blast skript >>>>>>>> >>>>>>>> Hi again :) >>>>>>>> please, I only have this little question: >>>>>>>> why do I get different results with my remote::blast >>> perl skript >>>>>>>> then on the >>>>>>>> ncbi blast homepage? >>>>>>>> I am using blastp, the query is an amino-sequence (different >>>>>>>> results with any >>>>>>>> sequence, differences not only in number of hits but even in e- >>>>>>>> values, scores >>>>>>>> etc...), the database is 'nr'. >>>>>>>> PLEASE help me, >>>>>>>> thank you in advance, >>>>>>>> Jonas >>>>>>>> >>>>>>>> ps: my skript: >>>>>>>> >>>> >>> ############################################################## >>> ################ >>>>>>>> ## >>>>>>>> use Bio::Seq::SeqFactory; >>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>> use strict; >>>>>>>> my @blast_report; >>>>>>>> my $prog = 'blastp'; >>>>>>>> my $db = 'nr'; >>>>>>>> my $e_val= '1e-10'; >>>>>>>> #my $e_val= '10'; >>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>> '-data' => $db, >>>>>>>> '-expect' => $e_val, >>>>>>>> '-readmethod' => 'SearchIO' ); >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>>>> $ >>>>>>>> Bio >>>>>>>> >>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>>>>>> = '1'; >>>>>>>> >>>>>>>> my >>>>>>>> $ >>>>>>>> blast_seq >>>>>>>> >>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>>>>>> >>>> >>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN >>> AFRQAHQNTAMATGPD >>>>>>>> PDDEYE'; >>>>>>>> #$v is just to turn on and off the messages >>>>>>>> my $v = 1; >>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>>>>>> 'Bio::PrimarySeq'); >>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => >>>>>>>> "$blast_seq"); >>>>>>>> my $filename='temp2.out'; >>>>>>>> my $r = $factory->submit_blast($seq); >>>>>>>> print STDERR "waiting..." if( $v > 0 ); >>>>>>>> while ( my @rids = $factory->each_rid ) >>>>>>>> { >>>>>>>> foreach my $rid ( @rids ) >>>>>>>> { >>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>> if( !ref($rc) ) >>>>>>>> { >>>>>>>> if( $rc < 0 ) >>>>>>>> { >>>>>>>> $factory->remove_rid($rid); >>>>>>>> } >>>>>>>> print STDERR "." if ( $v > 0 ); >>>>>>>> } >>>>>>>> else >>>>>>>> { >>>>>>>> my $result = $rc->next_result(); >>>>>>>> $factory->save_output($filename); >>>>>>>> $factory->remove_rid($rid); >>>>>>>> print "\nQuery Name: ", >>> $result->query_name(), >>>>>>>> "\n"; >>>>>>>> while ( my $hit = $result->next_hit ) >>>>>>>> { >>>>>>>> next unless ( $v > 0); >>>>>>>> print "\thit name is ", $hit->name, "\n"; >>>>>>>> while( my $hsp = $hit->next_hsp ) >>>>>>>> { >>>>>>>> print "\t\tscore is ", >>> $hsp->score, "\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> } >>>>>>>> @blast_report = get_file_data ($filename); >>>>>>>> return @blast_report; >>>>>>>> >>>> >>> ############################################################## >>> ################ >>>>>>>> #### >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l@... >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> = >>>>>>> = >>>>>>> >>> ===================================================================== >>>>>>> Attention: The information contained in this message and/or >>>>>>> attachments >>>>>>> from AgResearch Limited is intended only for the >>> persons or entities >>>>>>> to which it is addressed and may contain confidential and/or >>>>>>> privileged >>>>>>> material. Any review, retransmission, dissemination or other use >>>>>>> of, or >>>>>>> taking of any action in reliance upon, this information >>> by persons or >>>>>>> entities other than the intended recipients is prohibited by >>>>>>> AgResearch >>>>>>> Limited. If you have received this message in error, >>> please notify >>>>>>> the >>>>>>> sender immediately. >>>>>>> = >>>>>>> = >>>>>>> >>> ===================================================================== >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l@... >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> -- >>>>>> Jason Stajich >>>>>> jason@... >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l@... >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> -------------------------------------------------------------- >>> ---------------- >>>> -- >>>> >>>> >>>> >>>> No virus found in this incoming message. >>>> Checked by AVG - www.avg.com >>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release >>> Date: 07/05/09 >>>> 05:53:00 >>> >>> >>> -------------------------------------------------------------- >>> ------------------ >>> >>> >>> >>> No virus found in this incoming message. >>> Checked by AVG - www.avg.com >>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release >>> Date: 07/05/09 >>> 17:54:00 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@... >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -------------------------------------------------------------------------------- No virus found in this incoming message. Checked by AVG - www.avg.com Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release Date: 07/09/09 05:55:00 _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: cdd-search with remoteblast?Chris, I've added a test to bioperl RemoteBlast.t that demonstrates the following. Is it appropriate to submit it?
Jonas, OK, I was a little quick on the gun... but I've got it now. You don't need to change the wrapper. Here is what you need to do: # 1) set your database like this: -database => 'cdsearch/cdd', # c.f. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html for other cdd database options # 2) add this line before submitting the job: $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast'; You're in - No other changes needed. Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: Jonas Schaer [mailto:Brotelzwieb@...] > Sent: Friday, July 10, 2009 4:18 AM > To: BioPerl List; Cook, Malcolm; Chris Fields > Subject: Re: [Bioperl-l] cdd-search with remoteblast? > > Hi, > I tried to do what Malcom proposed my ($prog = 'rpsblast'; > my $db = > 'CDD';) but that didn't work. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Value rpsblast for PUT parameter PROGRAM does not match > expression t?blast[ pnx]. Rejecting. > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > C:/Perl/site/lib/Bio/Tools > /Run/RemoteBlast.pm:329 > STACK: Bio::Tools::Run::RemoteBlast::new > C:/Perl/site/lib/Bio/Tools/Run/RemoteBl > ast.pm:257 > STACK: blast_a_seq2.pm:14 > ----------------------------------------------------------- > So I should try to "change the wrapper to allow 'rpsblast'", > right? Could You tell me how to do that, please? So sorry but > I have no idea yet...:) If that doesn't work, is there any > other way to run cdd-searches with perl? > Thank you so much! > Regards, Jonas > > ----- Original Message ----- > From: "Chris Fields" <cjfields1@...> > To: "Cook, Malcolm" <MEC@...> > Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'" > <bioperl-l@...>; "'Smithies, Russell'" > <Russell.Smithies@...>; <cjfields@...> > Sent: Thursday, July 09, 2009 9:19 PM > Subject: Re: [Bioperl-l] cdd-search with remoteblast? > > > > I've scheduled this tentatively for the 1.6 release series (just not > > sure when yet). It may work as is, but I haven't tried it out yet > > (and am hazarding to guess it only retrieves the single main RID at > > the moment). > > > > chris > > > > On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: > > > >> Jonas, > >> > >> If you want to continue to use the bioperl remoteblast interface, > >> probably what you should do is simply call it twice. > >> > >> Once, as you already know how to do, which will return without CDD > >> results. > >> > >> Secondly, to get the CDD results, call remoteblast a second time. > >> This time, using > >> -database => 'CDD' > >> -program => 'rpsblast' > >> > >> However, the wrapper may object to the 'rpsblast' program. It is > >> not listed in the POD - > >> > http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R > emoteBlast.pm) > >> If so, my guess is that changing the perl wrapper to allow > >> rpsblast will "just work" (tm). I've cc:ed > cjfields@... for > >> his opinion on this. > >> > >> Also, you might want to perform the CDD search first, especially if > >> you are streaming results to eyeball that might like something to > >> look at while the second (presumably longer) search is running. > >> > >> Cheers, > >> > >> Malcolm Cook > >> Stowers Institute for Medical Research - Kansas City, Missouri > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces@... > >>> [mailto:bioperl-l-bounces@...] On Behalf Of > >>> Jonas Schaer > >>> Sent: Thursday, July 09, 2009 5:16 AM > >>> To: BioPerl List; Smithies, Russell > >>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >>> > >>> Hi guys, > >>> Thank you all so much for your help and patience :). Of > >>> course you were right and I finaly found the right > >>> put-parameter to get exactly the same hits as on the homepage. > >>> I do have an other question though :)... > >>> I now want to include a search for conserved domains, but > >>> when I try to use the CDD_SEARCH-parameter > >>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# > >>> sub:CDD_SEARCH) > >>> like the other put-parameters the way chris once told > >>> me(works fine with the other params): > >>> > >>> my %put = ( > >>> WORD_SIZE => 3, > >>> HITLIST_SIZE => 100, > >>> THRESHOLD => 11, > >>> FILTER => 'R', > >>> GENETIC_CODE => 1, > >>> CDD_SEARCH => 'on' > >>> ###I tried it > >>> with 'true' and '1', too. > >>> > >>> ); > >>> > >>> for my $putName (keys %put) { > >>> $factory->submit_parameter($putName,$put{$putName}); > >>> } > >>> > >>> > >>> ...an exception is thrown: > >>> > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>> MSG: CDD_SEARCH is not a valid PUT parameter. > >>> STACK: Error::throw > >>> STACK: Bio::Root::Root::throw > C:/Perl/site/lib/Bio/Root/Root.pm:359 > >>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > >>> C:/Perl/site/lib/Bio/Tools > >>> /Run/RemoteBlast.pm:325 > >>> STACK: main::blast_a_sequence firsteval0.8.pm:383 > >>> STACK: main::blast_it firsteval0.8.pm:288 > >>> STACK: firsteval0.8.pm:35 > >>> ----------------------------------------------------------- . > >>> I guess somehow this could be the solution to my problem: > >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s > >>> ub:RID-for-Simultaneous > >>> , but unfortunately I don't understand what to do. > >>> I'm so sorry to bother you with this but please help me once > >>> more...:) > >>> > >>> Best regards and thanks in advance, > >>> Jonas > >>> > >>> ----- Original Message ----- > >>> From: "Smithies, Russell" <Russell.Smithies@...> > >>> To: "'Jonas Schaer'" <Brotelzwieb@...> > >>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" > >>> <bioperl-l@...> > >>> Sent: Monday, July 06, 2009 10:56 PM > >>> Subject: RE: [Bioperl-l] different results with > remote-blast skript > >>> > >>> > >>> Hi Jonas, > >>> You can't just play with the BLAST parameters and hope > for a "better" > >>> result. > >>> I'd suggest that if you aren't sure what they do, you should > >>> leave them > >>> alone as small changes can make huge differences in the > >>> output - it's quite > >>> possible to miss finding what you're looking for by using > the wrong > >>> parameters. > >>> If all else fails, read the blast manual: > >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall > >>> _all.html > >>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ > >>> Or Read Ian Korfs' excellent book: > >>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp > >> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 > >>> > >>> Don't worry about the integer overflow bug as there's nothing > >>> you can do > >>> about it. If you're interested, Google and Wikipedia are your > >>> friends: > >>> http://en.wikipedia.org/wiki/Integer_overflow > >>> > >>> > >>> Russell > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces@... [mailto:bioperl-l- > >>>> bounces@...] On Behalf Of Jonas Schaer > >>>> Sent: Tuesday, 7 July 2009 12:14 a.m. > >>>> To: BioPerl List; Chris Fields > >>>> Subject: Re: [Bioperl-l] different results with > remote-blast skript > >>>> > >>>> Hi guys, thanks for your answers so far. > >>>> @jason: integer overflow in blast.... sorry, but what do > >>> you mean by that? > >>>> how can I fix it...? > >>>> > >>>> Since I never really changed any parameters I thought them > >>> all to be > >>>> default. > >>>> whatever, I tried to get "better" results with my prog > by changing > >>>> these: > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>> > >>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI > >>> STICS'} = > >>>> '1'; > >>>> with no effect...I guess these were default values anyway. > >>>> > >>>> So please maybe you can tell me all the other parameters I > >>> can change with > >>>> my > >>>> perl-skript AND how to do that? > >>>> Unfortunately both, perl and the blast-algorithm are pretty > >>> much new to > >>>> me, > >>>> maybe thats why I just cannot find out how to do that on my > >>> own... :/ > >>>> > >>>> Here is the output I get with my remote-blast skript: > >>>> > >>> ############################################################## > >>> ################ > >>>> ################################### > >>>> Query Name: > >>>> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > >>>> L > >>>> hit name is ref|XP_001702807.1| > >>>> score is 442 > >>>> BLASTP 2.2.21+ > >>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro > >>> A. Schaffer, > >>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. > >>> Lipman (1997), > >>>> "Gapped > >>>> BLAST and PSI-BLAST: a new generation of protein database search > >>>> programs", > >>>> Nucleic Acids Res. 25:3389-3402. > >>>> > >>>> > >>>> Reference for composition-based statistics: Alejandro A. > >>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, > >>> John L. Spouge, > >>>> Yuri > >>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), > >>> "Improving the > >>>> accuracy of PSI-BLAST protein database searches with > >>> composition-based > >>>> statistics and other refinements", Nucleic Acids Res. > 29:2994-3005. > >>>> > >>>> > >>>> RID: 53STX5G2013 > >>>> > >>>> > >>>> Database: All non-redundant GenBank CDS > >>>> translations+PDB+SwissProt+PIR+PRF excluding > environmental samples > >>>> from WGS projects > >>>> 9,252,587 sequences; 3,169,972,781 total letters Query= > >>>> > >>> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > >>>> > >>> > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM > >>>> ATGPDPDDEYE > >>>> Length=150 > >>>> > >>>> > >>>> > >>> Score > >>>> E > >>>> Sequences producing significant alignments: > >>> (Bits) > >>>> Value > >>>> > >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>> reinhard... 174 > >>>> 2e-42 > >>>> > >>>> > >>>> ALIGNMENTS > >>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > reinhardtii] > >>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > >>>> Length=303 > >>>> > >>>> Score = 174 bits (442), Expect = 2e-42, Method: > >>> Composition-based > >>>> stats. > >>>> Identities = 150/150 (100%), Positives = 150/150 (100%), > >>> Gaps = 0/150 > >>>> (0%) > >>>> > >>>> Query 1 > >>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds > >>>> 60 > >>>> > >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>> Sbjct 154 > >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>> 213 > >>>> > >>>> Query 61 > >>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>> 120 > >>>> > >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>> Sbjct 214 > >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>> 273 > >>>> > >>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > >>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE > >>>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > >>>> > >>>> > >>>> > >>>> Database: All non-redundant GenBank CDS > >>>> translations+PDB+SwissProt+PIR+PRF > >>>> excluding environmental samples from WGS projects > >>>> Posted date: Jul 5, 2009 4:41 AM > >>>> Number of letters in database: -1,124,994,511 > >>>> Number of sequences in database: 9,252,587 > >>>> > >>>> Lambda K H > >>>> 0.309 0.122 0.345 > >>>> Gapped > >>>> Lambda K H > >>>> 0.267 0.0410 0.140 > >>>> Matrix: BLOSUM62 > >>>> Gap Penalties: Existence: 11, Extension: 1 > >>>> Number of Sequences: 9252587 > >>>> Number of Hits to DB: 60273703 > >>>> Number of extensions: 1448367 > >>>> Number of successful extensions: 2103 > >>>> Number of sequences better than 10: 0 > >>>> Number of HSP's better than 10 without gapping: 0 > >>>> Number of HSP's gapped: 2113 > >>>> Number of HSP's successfully gapped: 0 > >>>> Length of query: 150 > >>>> Length of database: 3169972781 > >>>> Length adjustment: 113 > >>>> Effective length of query: 37 > >>>> Effective length of database: 2124430450 > >>>> Effective search space: 78603926650 > >>>> Effective search space used: 78603926650 > >>>> T: 11 > >>>> A: 40 > >>>> X1: 16 (7.1 bits) > >>>> X2: 38 (14.6 bits) > >>>> X3: 64 (24.7 bits) > >>>> S1: 42 (20.8 bits) > >>>> S2: 74 (33.1 bits) > >>>> > >>>> > >>> ############################################################## > >>> ################ > >>>> ################################### > >>>> and here are the hits (?) of the blast-algorithm on the > >>> ncbi-homepage with > >>>> the same query of course: > >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>> reinhard... 300 > >>>> 3e-80 > >>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA > >>> [Acyrtho... 36.2 > >>>> 1.1 > >>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 > >>> [Blautia... 35.4 > >>>> 1.8 > >>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania > >>> brazil... 34.3 > >>>> 4.2 > >>>> ref|XP_680841.1| hypothetical protein AN7572.2 > >>> [Aspergillus n... 33.5 > >>>> 6.0 > >>>> ref|YP_001768110.1| hypothetical protein M446_1150 > >>> [Methyloba... 33.5 > >>>> 7.0 > >>>> > >>> ############################################################## > >>> ################ > >>>> ###################################at > >>>> least the first hit is the same, but even there there is a > >>> different score > >>>> and e-value. > >>>> > >>>> thanks so much for any help :) > >>>> regards, jonas > >>>> > >>>> > >>>> ----- Original Message ----- > >>>> From: "Chris Fields" <cjfields@...> > >>>> To: "Jason Stajich" <jason@...> > >>>> Cc: "Smithies, Russell" > >>> <Russell.Smithies@...>; "'BioPerl > >>>> List'" <bioperl-l@...>; "'Jonas Schaer'" > >>>> <Jonas_Schaer@...> > >>>> Sent: Monday, July 06, 2009 12:51 AM > >>>> Subject: Re: [Bioperl-l] different results with > remote-blast skript > >>>> > >>>> > >>>>> That inspires confidence ;> > >>>>> > >>>>> chris > >>>>> > >>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > >>>>> > >>>>>> integer overflow in blast.... > >>>>>> > >>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >>>>>> > >>>>>>> I'd guess it's a difference in the parameters used. > >>>>>>> Interesting that both have the number of letters in the db as > >>>>>>> "-1,125,070,205", I assume that's a bug :-) > >>>>>>> > >>>>>>> Stats from your remote_blast: > >>>>>>> > >>>>>>> 'stats' => { > >>>>>>> 'S1' => '42', > >>>>>>> 'S1_bits' => '20.8', > >>>>>>> 'lambda' => '0.309', > >>>>>>> 'entropy' => '0.345', > >>>>>>> 'kappa_gapped' => '0.0410', > >>>>>>> 'T' => '11', > >>>>>>> 'kappa' => '0.122', > >>>>>>> 'X3_bits' => '24.7', > >>>>>>> 'X1' => '16', > >>>>>>> 'lambda_gapped' => '0.267', > >>>>>>> 'X2' => '38', > >>>>>>> 'S2' => '74', > >>>>>>> 'seqs_better_than_cutoff' => '0', > >>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>>>>>> 'Hits_to_DB' => '60102303', > >>>>>>> 'dbletters' => '-1125070205', > >>>>>>> 'A' => '40', > >>>>>>> 'num_successful_extensions' => '2004', > >>>>>>> 'num_extensions' => '1436892', > >>>>>>> 'X1_bits' => '7.1', > >>>>>>> 'X3' => '64', > >>>>>>> 'entropy_gapped' => '0.140', > >>>>>>> 'dbentries' => '9252258', > >>>>>>> 'X2_bits' => '14.6', > >>>>>>> 'S2_bits' => '33.1' > >>>>>>> } > >>>>>>> > >>>>>>> > >>>>>>> Stats from a blast done on the NCBI webpage: > >>>>>>> > >>>>>>> Database: All non-redundant GenBank CDS > >>> translations+PDB+SwissProt > >>>>>>> +PIR+PRF > >>>>>>> excluding environmental samples from WGS projects > >>>>>>> Posted date: Jul 4, 2009 4:41 AM > >>>>>>> Number of letters in database: -1,125,070,205 > >>>>>>> Number of sequences in database: 9,252,258 > >>>>>>> > >>>>>>> Lambda K H > >>>>>>> 0.309 0.124 0.340 > >>>>>>> Gapped > >>>>>>> Lambda K H > >>>>>>> 0.267 0.0410 0.140 > >>>>>>> Matrix: BLOSUM62 > >>>>>>> Gap Penalties: Existence: 11, Extension: 1 > >>>>>>> Number of Sequences: 9252258 > >>>>>>> Number of Hits to DB: 86493230 > >>>>>>> Number of extensions: 3101413 > >>>>>>> Number of successful extensions: 9001 > >>>>>>> Number of sequences better than 100: 65 > >>>>>>> Number of HSP's better than 100 without gapping: 0 > >>>>>>> Number of HSP's gapped: 9000 > >>>>>>> Number of HSP's successfully gapped: 66 > >>>>>>> Length of query: 150 > >>>>>>> Length of database: 3169897087 > >>>>>>> Length adjustment: 113 > >>>>>>> Effective length of query: 37 > >>>>>>> Effective length of database: 2124391933 > >>>>>>> Effective search space: 78602501521 > >>>>>>> Effective search space used: 78602501521 > >>>>>>> T: 11 > >>>>>>> A: 40 > >>>>>>> X1: 16 (7.1 bits) > >>>>>>> X2: 38 (14.6 bits) > >>>>>>> X3: 64 (24.7 bits) > >>>>>>> S1: 42 (20.8 bits) > >>>>>>> S2: 65 (29.6 bits) > >>>>>>> > >>>>>>> > >>>>>>>> -----Original Message----- > >>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l- > >>>>>>>> bounces@...] On Behalf Of Jonas Schaer > >>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. > >>>>>>>> To: BioPerl List > >>>>>>>> Subject: [Bioperl-l] different results with > remote-blast skript > >>>>>>>> > >>>>>>>> Hi again :) > >>>>>>>> please, I only have this little question: > >>>>>>>> why do I get different results with my remote::blast > >>> perl skript > >>>>>>>> then on the > >>>>>>>> ncbi blast homepage? > >>>>>>>> I am using blastp, the query is an amino-sequence (different > >>>>>>>> results with any > >>>>>>>> sequence, differences not only in number of hits but > even in e- > >>>>>>>> values, scores > >>>>>>>> etc...), the database is 'nr'. > >>>>>>>> PLEASE help me, > >>>>>>>> thank you in advance, > >>>>>>>> Jonas > >>>>>>>> > >>>>>>>> ps: my skript: > >>>>>>>> > >>>> > >>> ############################################################## > >>> ################ > >>>>>>>> ## > >>>>>>>> use Bio::Seq::SeqFactory; > >>>>>>>> use Bio::Tools::Run::RemoteBlast; > >>>>>>>> use strict; > >>>>>>>> my @blast_report; > >>>>>>>> my $prog = 'blastp'; > >>>>>>>> my $db = 'nr'; > >>>>>>>> my $e_val= '1e-10'; > >>>>>>>> #my $e_val= '10'; > >>>>>>>> my @params = ( '-prog' => $prog, > >>>>>>>> '-data' => $db, > >>>>>>>> '-expect' => $e_val, > >>>>>>>> '-readmethod' => 'SearchIO' ); > >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>>>>>> $ > >>>>>>>> Bio > >>>>>>>> > >>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} > >>>>>>>> = '1'; > >>>>>>>> > >>>>>>>> my > >>>>>>>> $ > >>>>>>>> blast_seq > >>>>>>>> > >>> > ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > >>>>>>>> > >>>> > >>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN > >>> AFRQAHQNTAMATGPD > >>>>>>>> PDDEYE'; > >>>>>>>> #$v is just to turn on and off the messages > >>>>>>>> my $v = 1; > >>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > >>>>>>>> 'Bio::PrimarySeq'); > >>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, > -display_id => > >>>>>>>> "$blast_seq"); > >>>>>>>> my $filename='temp2.out'; > >>>>>>>> my $r = $factory->submit_blast($seq); > >>>>>>>> print STDERR "waiting..." if( $v > 0 ); > >>>>>>>> while ( my @rids = $factory->each_rid ) > >>>>>>>> { > >>>>>>>> foreach my $rid ( @rids ) > >>>>>>>> { > >>>>>>>> my $rc = $factory->retrieve_blast($rid); > >>>>>>>> if( !ref($rc) ) > >>>>>>>> { > >>>>>>>> if( $rc < 0 ) > >>>>>>>> { > >>>>>>>> $factory->remove_rid($rid); > >>>>>>>> } > >>>>>>>> print STDERR "." if ( $v > 0 ); > >>>>>>>> } > >>>>>>>> else > >>>>>>>> { > >>>>>>>> my $result = $rc->next_result(); > >>>>>>>> $factory->save_output($filename); > >>>>>>>> $factory->remove_rid($rid); > >>>>>>>> print "\nQuery Name: ", > >>> $result->query_name(), > >>>>>>>> "\n"; > >>>>>>>> while ( my $hit = $result->next_hit ) > >>>>>>>> { > >>>>>>>> next unless ( $v > 0); > >>>>>>>> print "\thit name is ", > $hit->name, "\n"; > >>>>>>>> while( my $hsp = $hit->next_hsp ) > >>>>>>>> { > >>>>>>>> print "\t\tscore is ", > >>> $hsp->score, "\n"; > >>>>>>>> } > >>>>>>>> } > >>>>>>>> } > >>>>>>>> } > >>>>>>>> > >>>>>>>> > >>>>>>>> } > >>>>>>>> @blast_report = get_file_data ($filename); > >>>>>>>> return @blast_report; > >>>>>>>> > >>>> > >>> ############################################################## > >>> ################ > >>>>>>>> #### > >>>>>>>> _______________________________________________ > >>>>>>>> Bioperl-l mailing list > >>>>>>>> Bioperl-l@... > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> = > >>>>>>> = > >>>>>>> > >>> > ===================================================================== > >>>>>>> Attention: The information contained in this message and/or > >>>>>>> attachments > >>>>>>> from AgResearch Limited is intended only for the > >>> persons or entities > >>>>>>> to which it is addressed and may contain confidential and/or > >>>>>>> privileged > >>>>>>> material. Any review, retransmission, dissemination > or other use > >>>>>>> of, or > >>>>>>> taking of any action in reliance upon, this information > >>> by persons or > >>>>>>> entities other than the intended recipients is prohibited by > >>>>>>> AgResearch > >>>>>>> Limited. If you have received this message in error, > >>> please notify > >>>>>>> the > >>>>>>> sender immediately. > >>>>>>> = > >>>>>>> = > >>>>>>> > >>> > ===================================================================== > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l@... > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>>> -- > >>>>>> Jason Stajich > >>>>>> jason@... > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l@... > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >>>> > >>> -------------------------------------------------------------- > >>> ---------------- > >>>> -- > >>>> > >>>> > >>>> > >>>> No virus found in this incoming message. > >>>> Checked by AVG - www.avg.com > >>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release > >>> Date: 07/05/09 > >>>> 05:53:00 > >>> > >>> > >>> -------------------------------------------------------------- > >>> ------------------ > >>> > >>> > >>> > >>> No virus found in this incoming message. > >>> Checked by AVG - www.avg.com > >>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release > >>> Date: 07/05/09 > >>> 17:54:00 > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@... > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > > > -------------------------------------------------------------- > ------------------ > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release > Date: 07/09/09 > 05:55:00 > > _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: cdd-search with remoteblast?Malcolm,
Nice! Go ahead and add the test in; we can look at trying to get CDD_SEARCH working at some point but this is a nice workaround. chris On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote: > Chris, I've added a test to bioperl RemoteBlast.t that demonstrates > the following. Is it appropriate to submit it? > > Jonas, OK, I was a little quick on the gun... but I've got it now. > > You don't need to change the wrapper. Here is what you need to do: > > # 1) set your database like this: > > -database => 'cdsearch/cdd', # c.f. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html > for other cdd database options > > # 2) add this line before submitting the job: > $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast'; > > You're in - No other changes needed. > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: Jonas Schaer [mailto:Brotelzwieb@...] >> Sent: Friday, July 10, 2009 4:18 AM >> To: BioPerl List; Cook, Malcolm; Chris Fields >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >> >> Hi, >> I tried to do what Malcom proposed my ($prog = 'rpsblast'; >> my $db = >> 'CDD';) but that didn't work. >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Value rpsblast for PUT parameter PROGRAM does not match >> expression t?blast[ pnx]. Rejecting. >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >> C:/Perl/site/lib/Bio/Tools >> /Run/RemoteBlast.pm:329 >> STACK: Bio::Tools::Run::RemoteBlast::new >> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl >> ast.pm:257 >> STACK: blast_a_seq2.pm:14 >> ----------------------------------------------------------- >> So I should try to "change the wrapper to allow 'rpsblast'", >> right? Could You tell me how to do that, please? So sorry but >> I have no idea yet...:) If that doesn't work, is there any >> other way to run cdd-searches with perl? >> Thank you so much! >> Regards, Jonas >> >> ----- Original Message ----- >> From: "Chris Fields" <cjfields1@...> >> To: "Cook, Malcolm" <MEC@...> >> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'" >> <bioperl-l@...>; "'Smithies, Russell'" >> <Russell.Smithies@...>; <cjfields@...> >> Sent: Thursday, July 09, 2009 9:19 PM >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >> >> >>> I've scheduled this tentatively for the 1.6 release series (just not >>> sure when yet). It may work as is, but I haven't tried it out yet >>> (and am hazarding to guess it only retrieves the single main RID at >>> the moment). >>> >>> chris >>> >>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: >>> >>>> Jonas, >>>> >>>> If you want to continue to use the bioperl remoteblast interface, >>>> probably what you should do is simply call it twice. >>>> >>>> Once, as you already know how to do, which will return without CDD >>>> results. >>>> >>>> Secondly, to get the CDD results, call remoteblast a second time. >>>> This time, using >>>> -database => 'CDD' >>>> -program => 'rpsblast' >>>> >>>> However, the wrapper may object to the 'rpsblast' program. It is >>>> not listed in the POD - >>>> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R >> emoteBlast.pm) >>>> If so, my guess is that changing the perl wrapper to allow >>>> rpsblast will "just work" (tm). I've cc:ed >> cjfields@... for >>>> his opinion on this. >>>> >>>> Also, you might want to perform the CDD search first, especially if >>>> you are streaming results to eyeball that might like something to >>>> look at while the second (presumably longer) search is running. >>>> >>>> Cheers, >>>> >>>> Malcolm Cook >>>> Stowers Institute for Medical Research - Kansas City, Missouri >>>> >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces@... >>>>> [mailto:bioperl-l-bounces@...] On Behalf Of >>>>> Jonas Schaer >>>>> Sent: Thursday, July 09, 2009 5:16 AM >>>>> To: BioPerl List; Smithies, Russell >>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>>> >>>>> Hi guys, >>>>> Thank you all so much for your help and patience :). Of >>>>> course you were right and I finaly found the right >>>>> put-parameter to get exactly the same hits as on the homepage. >>>>> I do have an other question though :)... >>>>> I now want to include a search for conserved domains, but >>>>> when I try to use the CDD_SEARCH-parameter >>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >>>>> sub:CDD_SEARCH) >>>>> like the other put-parameters the way chris once told >>>>> me(works fine with the other params): >>>>> >>>>> my %put = ( >>>>> WORD_SIZE => 3, >>>>> HITLIST_SIZE => 100, >>>>> THRESHOLD => 11, >>>>> FILTER => 'R', >>>>> GENETIC_CODE => 1, >>>>> CDD_SEARCH => 'on' >>>>> ###I tried it >>>>> with 'true' and '1', too. >>>>> >>>>> ); >>>>> >>>>> for my $putName (keys %put) { >>>>> $factory->submit_parameter($putName,$put{$putName}); >>>>> } >>>>> >>>>> >>>>> ...an exception is thrown: >>>>> >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> MSG: CDD_SEARCH is not a valid PUT parameter. >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw >> C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>>> C:/Perl/site/lib/Bio/Tools >>>>> /Run/RemoteBlast.pm:325 >>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383 >>>>> STACK: main::blast_it firsteval0.8.pm:288 >>>>> STACK: firsteval0.8.pm:35 >>>>> ----------------------------------------------------------- . >>>>> I guess somehow this could be the solution to my problem: >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >>>>> ub:RID-for-Simultaneous >>>>> , but unfortunately I don't understand what to do. >>>>> I'm so sorry to bother you with this but please help me once >>>>> more...:) >>>>> >>>>> Best regards and thanks in advance, >>>>> Jonas >>>>> >>>>> ----- Original Message ----- >>>>> From: "Smithies, Russell" <Russell.Smithies@...> >>>>> To: "'Jonas Schaer'" <Brotelzwieb@...> >>>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" >>>>> <bioperl-l@...> >>>>> Sent: Monday, July 06, 2009 10:56 PM >>>>> Subject: RE: [Bioperl-l] different results with >> remote-blast skript >>>>> >>>>> >>>>> Hi Jonas, >>>>> You can't just play with the BLAST parameters and hope >> for a "better" >>>>> result. >>>>> I'd suggest that if you aren't sure what they do, you should >>>>> leave them >>>>> alone as small changes can make huge differences in the >>>>> output - it's quite >>>>> possible to miss finding what you're looking for by using >> the wrong >>>>> parameters. >>>>> If all else fails, read the blast manual: >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >>>>> _all.html >>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >>>>> Or Read Ian Korfs' excellent book: >>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp >>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >>>>> >>>>> Don't worry about the integer overflow bug as there's nothing >>>>> you can do >>>>> about it. If you're interested, Google and Wikipedia are your >>>>> friends: >>>>> http://en.wikipedia.org/wiki/Integer_overflow >>>>> >>>>> >>>>> Russell >>>>> >>>>>> -----Original Message----- >>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>>>>> bounces@...] On Behalf Of Jonas Schaer >>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>>>>> To: BioPerl List; Chris Fields >>>>>> Subject: Re: [Bioperl-l] different results with >> remote-blast skript >>>>>> >>>>>> Hi guys, thanks for your answers so far. >>>>>> @jason: integer overflow in blast.... sorry, but what do >>>>> you mean by that? >>>>>> how can I fix it...? >>>>>> >>>>>> Since I never really changed any parameters I thought them >>>>> all to be >>>>>> default. >>>>>> whatever, I tried to get "better" results with my prog >> by changing >>>>>> these: >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>> >>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>>>> STICS'} = >>>>>> '1'; >>>>>> with no effect...I guess these were default values anyway. >>>>>> >>>>>> So please maybe you can tell me all the other parameters I >>>>> can change with >>>>>> my >>>>>> perl-skript AND how to do that? >>>>>> Unfortunately both, perl and the blast-algorithm are pretty >>>>> much new to >>>>>> me, >>>>>> maybe thats why I just cannot find out how to do that on my >>>>> own... :/ >>>>>> >>>>>> Here is the output I get with my remote-blast skript: >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>> ################################### >>>>>> Query Name: >>>>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>>>> L >>>>>> hit name is ref|XP_001702807.1| >>>>>> score is 442 >>>>>> BLASTP 2.2.21+ >>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>>>> A. Schaffer, >>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>> Lipman (1997), >>>>>> "Gapped >>>>>> BLAST and PSI-BLAST: a new generation of protein database search >>>>>> programs", >>>>>> Nucleic Acids Res. 25:3389-3402. >>>>>> >>>>>> >>>>>> Reference for composition-based statistics: Alejandro A. >>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>>>> John L. Spouge, >>>>>> Yuri >>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >>>>> "Improving the >>>>>> accuracy of PSI-BLAST protein database searches with >>>>> composition-based >>>>>> statistics and other refinements", Nucleic Acids Res. >> 29:2994-3005. >>>>>> >>>>>> >>>>>> RID: 53STX5G2013 >>>>>> >>>>>> >>>>>> Database: All non-redundant GenBank CDS >>>>>> translations+PDB+SwissProt+PIR+PRF excluding >> environmental samples >>>>>> from WGS projects >>>>>> 9,252,587 sequences; 3,169,972,781 total letters Query= >>>>>> >>>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>>>> >>>>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >>>>>> ATGPDPDDEYE >>>>>> Length=150 >>>>>> >>>>>> >>>>>> >>>>> Score >>>>>> E >>>>>> Sequences producing significant alignments: >>>>> (Bits) >>>>>> Value >>>>>> >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>> reinhard... 174 >>>>>> 2e-42 >>>>>> >>>>>> >>>>>> ALIGNMENTS >>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >> reinhardtii] >>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>>>> Length=303 >>>>>> >>>>>> Score = 174 bits (442), Expect = 2e-42, Method: >>>>> Composition-based >>>>>> stats. >>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>>>> Gaps = 0/150 >>>>>> (0%) >>>>>> >>>>>> Query 1 >>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>>>> 60 >>>>>> >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>> Sbjct 154 >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>> 213 >>>>>> >>>>>> Query 61 >>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>> 120 >>>>>> >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>> Sbjct 214 >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>> 273 >>>>>> >>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >>>>>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>>>> >>>>>> >>>>>> >>>>>> Database: All non-redundant GenBank CDS >>>>>> translations+PDB+SwissProt+PIR+PRF >>>>>> excluding environmental samples from WGS projects >>>>>> Posted date: Jul 5, 2009 4:41 AM >>>>>> Number of letters in database: -1,124,994,511 >>>>>> Number of sequences in database: 9,252,587 >>>>>> >>>>>> Lambda K H >>>>>> 0.309 0.122 0.345 >>>>>> Gapped >>>>>> Lambda K H >>>>>> 0.267 0.0410 0.140 >>>>>> Matrix: BLOSUM62 >>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>> Number of Sequences: 9252587 >>>>>> Number of Hits to DB: 60273703 >>>>>> Number of extensions: 1448367 >>>>>> Number of successful extensions: 2103 >>>>>> Number of sequences better than 10: 0 >>>>>> Number of HSP's better than 10 without gapping: 0 >>>>>> Number of HSP's gapped: 2113 >>>>>> Number of HSP's successfully gapped: 0 >>>>>> Length of query: 150 >>>>>> Length of database: 3169972781 >>>>>> Length adjustment: 113 >>>>>> Effective length of query: 37 >>>>>> Effective length of database: 2124430450 >>>>>> Effective search space: 78603926650 >>>>>> Effective search space used: 78603926650 >>>>>> T: 11 >>>>>> A: 40 >>>>>> X1: 16 (7.1 bits) >>>>>> X2: 38 (14.6 bits) >>>>>> X3: 64 (24.7 bits) >>>>>> S1: 42 (20.8 bits) >>>>>> S2: 74 (33.1 bits) >>>>>> >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>> ################################### >>>>>> and here are the hits (?) of the blast-algorithm on the >>>>> ncbi-homepage with >>>>>> the same query of course: >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>> reinhard... 300 >>>>>> 3e-80 >>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >>>>> [Acyrtho... 36.2 >>>>>> 1.1 >>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >>>>> [Blautia... 35.4 >>>>>> 1.8 >>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >>>>> brazil... 34.3 >>>>>> 4.2 >>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 >>>>> [Aspergillus n... 33.5 >>>>>> 6.0 >>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 >>>>> [Methyloba... 33.5 >>>>>> 7.0 >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>> ###################################at >>>>>> least the first hit is the same, but even there there is a >>>>> different score >>>>>> and e-value. >>>>>> >>>>>> thanks so much for any help :) >>>>>> regards, jonas >>>>>> >>>>>> >>>>>> ----- Original Message ----- >>>>>> From: "Chris Fields" <cjfields@...> >>>>>> To: "Jason Stajich" <jason@...> >>>>>> Cc: "Smithies, Russell" >>>>> <Russell.Smithies@...>; "'BioPerl >>>>>> List'" <bioperl-l@...>; "'Jonas Schaer'" >>>>>> <Jonas_Schaer@...> >>>>>> Sent: Monday, July 06, 2009 12:51 AM >>>>>> Subject: Re: [Bioperl-l] different results with >> remote-blast skript >>>>>> >>>>>> >>>>>>> That inspires confidence ;> >>>>>>> >>>>>>> chris >>>>>>> >>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>>>>> >>>>>>>> integer overflow in blast.... >>>>>>>> >>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>>>>> >>>>>>>>> I'd guess it's a difference in the parameters used. >>>>>>>>> Interesting that both have the number of letters in the db as >>>>>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>>>>> >>>>>>>>> Stats from your remote_blast: >>>>>>>>> >>>>>>>>> 'stats' => { >>>>>>>>> 'S1' => '42', >>>>>>>>> 'S1_bits' => '20.8', >>>>>>>>> 'lambda' => '0.309', >>>>>>>>> 'entropy' => '0.345', >>>>>>>>> 'kappa_gapped' => '0.0410', >>>>>>>>> 'T' => '11', >>>>>>>>> 'kappa' => '0.122', >>>>>>>>> 'X3_bits' => '24.7', >>>>>>>>> 'X1' => '16', >>>>>>>>> 'lambda_gapped' => '0.267', >>>>>>>>> 'X2' => '38', >>>>>>>>> 'S2' => '74', >>>>>>>>> 'seqs_better_than_cutoff' => '0', >>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>>>>> 'Hits_to_DB' => '60102303', >>>>>>>>> 'dbletters' => '-1125070205', >>>>>>>>> 'A' => '40', >>>>>>>>> 'num_successful_extensions' => '2004', >>>>>>>>> 'num_extensions' => '1436892', >>>>>>>>> 'X1_bits' => '7.1', >>>>>>>>> 'X3' => '64', >>>>>>>>> 'entropy_gapped' => '0.140', >>>>>>>>> 'dbentries' => '9252258', >>>>>>>>> 'X2_bits' => '14.6', >>>>>>>>> 'S2_bits' => '33.1' >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> Stats from a blast done on the NCBI webpage: >>>>>>>>> >>>>>>>>> Database: All non-redundant GenBank CDS >>>>> translations+PDB+SwissProt >>>>>>>>> +PIR+PRF >>>>>>>>> excluding environmental samples from WGS projects >>>>>>>>> Posted date: Jul 4, 2009 4:41 AM >>>>>>>>> Number of letters in database: -1,125,070,205 >>>>>>>>> Number of sequences in database: 9,252,258 >>>>>>>>> >>>>>>>>> Lambda K H >>>>>>>>> 0.309 0.124 0.340 >>>>>>>>> Gapped >>>>>>>>> Lambda K H >>>>>>>>> 0.267 0.0410 0.140 >>>>>>>>> Matrix: BLOSUM62 >>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>>>>> Number of Sequences: 9252258 >>>>>>>>> Number of Hits to DB: 86493230 >>>>>>>>> Number of extensions: 3101413 >>>>>>>>> Number of successful extensions: 9001 >>>>>>>>> Number of sequences better than 100: 65 >>>>>>>>> Number of HSP's better than 100 without gapping: 0 >>>>>>>>> Number of HSP's gapped: 9000 >>>>>>>>> Number of HSP's successfully gapped: 66 >>>>>>>>> Length of query: 150 >>>>>>>>> Length of database: 3169897087 >>>>>>>>> Length adjustment: 113 >>>>>>>>> Effective length of query: 37 >>>>>>>>> Effective length of database: 2124391933 >>>>>>>>> Effective search space: 78602501521 >>>>>>>>> Effective search space used: 78602501521 >>>>>>>>> T: 11 >>>>>>>>> A: 40 >>>>>>>>> X1: 16 (7.1 bits) >>>>>>>>> X2: 38 (14.6 bits) >>>>>>>>> X3: 64 (24.7 bits) >>>>>>>>> S1: 42 (20.8 bits) >>>>>>>>> S2: 65 (29.6 bits) >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>>>>>>>>> bounces@...] On Behalf Of Jonas Schaer >>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>>>>> To: BioPerl List >>>>>>>>>> Subject: [Bioperl-l] different results with >> remote-blast skript >>>>>>>>>> >>>>>>>>>> Hi again :) >>>>>>>>>> please, I only have this little question: >>>>>>>>>> why do I get different results with my remote::blast >>>>> perl skript >>>>>>>>>> then on the >>>>>>>>>> ncbi blast homepage? >>>>>>>>>> I am using blastp, the query is an amino-sequence (different >>>>>>>>>> results with any >>>>>>>>>> sequence, differences not only in number of hits but >> even in e- >>>>>>>>>> values, scores >>>>>>>>>> etc...), the database is 'nr'. >>>>>>>>>> PLEASE help me, >>>>>>>>>> thank you in advance, >>>>>>>>>> Jonas >>>>>>>>>> >>>>>>>>>> ps: my skript: >>>>>>>>>> >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>>>>>> ## >>>>>>>>>> use Bio::Seq::SeqFactory; >>>>>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>>>>> use strict; >>>>>>>>>> my @blast_report; >>>>>>>>>> my $prog = 'blastp'; >>>>>>>>>> my $db = 'nr'; >>>>>>>>>> my $e_val= '1e-10'; >>>>>>>>>> #my $e_val= '10'; >>>>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>>>> '-data' => $db, >>>>>>>>>> '-expect' => $e_val, >>>>>>>>>> '-readmethod' => 'SearchIO' ); >>>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>>>>>> $ >>>>>>>>>> Bio >>>>>>>>>> >>>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>>>>>>>> = '1'; >>>>>>>>>> >>>>>>>>>> my >>>>>>>>>> $ >>>>>>>>>> blast_seq >>>>>>>>>> >>>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>>>>>>>> >>>>>> >>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN >>>>> AFRQAHQNTAMATGPD >>>>>>>>>> PDDEYE'; >>>>>>>>>> #$v is just to turn on and off the messages >>>>>>>>>> my $v = 1; >>>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>>>>>>>> 'Bio::PrimarySeq'); >>>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, >> -display_id => >>>>>>>>>> "$blast_seq"); >>>>>>>>>> my $filename='temp2.out'; >>>>>>>>>> my $r = $factory->submit_blast($seq); >>>>>>>>>> print STDERR "waiting..." if( $v > 0 ); >>>>>>>>>> while ( my @rids = $factory->each_rid ) >>>>>>>>>> { >>>>>>>>>> foreach my $rid ( @rids ) >>>>>>>>>> { >>>>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>>>> if( !ref($rc) ) >>>>>>>>>> { >>>>>>>>>> if( $rc < 0 ) >>>>>>>>>> { >>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>> } >>>>>>>>>> print STDERR "." if ( $v > 0 ); >>>>>>>>>> } >>>>>>>>>> else >>>>>>>>>> { >>>>>>>>>> my $result = $rc->next_result(); >>>>>>>>>> $factory->save_output($filename); >>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>> print "\nQuery Name: ", >>>>> $result->query_name(), >>>>>>>>>> "\n"; >>>>>>>>>> while ( my $hit = $result->next_hit ) >>>>>>>>>> { >>>>>>>>>> next unless ( $v > 0); >>>>>>>>>> print "\thit name is ", >> $hit->name, "\n"; >>>>>>>>>> while( my $hsp = $hit->next_hsp ) >>>>>>>>>> { >>>>>>>>>> print "\t\tscore is ", >>>>> $hsp->score, "\n"; >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> } >>>>>>>>>> @blast_report = get_file_data ($filename); >>>>>>>>>> return @blast_report; >>>>>>>>>> >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>>>>>> #### >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l@... >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> = >>>>>>>>> = >>>>>>>>> >>>>> >> ===================================================================== >>>>>>>>> Attention: The information contained in this message and/or >>>>>>>>> attachments >>>>>>>>> from AgResearch Limited is intended only for the >>>>> persons or entities >>>>>>>>> to which it is addressed and may contain confidential and/or >>>>>>>>> privileged >>>>>>>>> material. Any review, retransmission, dissemination >> or other use >>>>>>>>> of, or >>>>>>>>> taking of any action in reliance upon, this information >>>>> by persons or >>>>>>>>> entities other than the intended recipients is prohibited by >>>>>>>>> AgResearch >>>>>>>>> Limited. If you have received this message in error, >>>>> please notify >>>>>>>>> the >>>>>>>>> sender immediately. >>>>>>>>> = >>>>>>>>> = >>>>>>>>> >>>>> >> ===================================================================== >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l@... >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> -- >>>>>>>> Jason Stajich >>>>>>>> jason@... >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l@... >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>> -------------------------------------------------------------- >>>>> ---------------- >>>>>> -- >>>>>> >>>>>> >>>>>> >>>>>> No virus found in this incoming message. >>>>>> Checked by AVG - www.avg.com >>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release >>>>> Date: 07/05/09 >>>>>> 05:53:00 >>>>> >>>>> >>>>> -------------------------------------------------------------- >>>>> ------------------ >>>>> >>>>> >>>>> >>>>> No virus found in this incoming message. >>>>> Checked by AVG - www.avg.com >>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release >>>>> Date: 07/05/09 >>>>> 17:54:00 >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@... >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >> >> >> -------------------------------------------------------------- >> ------------------ >> >> >> >> No virus found in this incoming message. >> Checked by AVG - www.avg.com >> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release >> Date: 07/09/09 >> 05:55:00 >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
|
|
|
Re: cdd-search with remoteblast?Chris,
I wound up adding a new test # $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $ with the comment : # malcolm_cook@...: this test is in a separate file from # RemoteBlast.t (on which it is modelled) since there is some sort of # side-effecting between the multiple remote blasts that is causing # this test to fail, if it comes last, or the other test to fail, if # this one comes first. THIS IS A BUG EITHER IN REMOTE BLAST OR MY # UNDERSTANDING, i.e. of how to initialize it. In any case, the test passes and demos rpsblast usage. Cheers, Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: Chris Fields [mailto:cjfields1@...] > Sent: Friday, July 10, 2009 1:05 PM > To: Cook, Malcolm > Cc: 'Jonas Schaer'; 'BioPerl List' > Subject: Re: [Bioperl-l] cdd-search with remoteblast? > > Malcolm, > > Nice! Go ahead and add the test in; we can look at trying to > get CDD_SEARCH working at some point but this is a nice workaround. > > chris > > On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote: > > > Chris, I've added a test to bioperl RemoteBlast.t that demonstrates > > the following. Is it appropriate to submit it? > > > > Jonas, OK, I was a little quick on the gun... but I've got it now. > > > > You don't need to change the wrapper. Here is what you need to do: > > > > # 1) set your database like this: > > > > -database => 'cdsearch/cdd', # c.f. > > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html > > for other cdd database options > > > > # 2) add this line before submitting the job: > > $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast'; > > > > You're in - No other changes needed. > > > > Malcolm Cook > > Stowers Institute for Medical Research - Kansas City, Missouri > > > > > >> -----Original Message----- > >> From: Jonas Schaer [mailto:Brotelzwieb@...] > >> Sent: Friday, July 10, 2009 4:18 AM > >> To: BioPerl List; Cook, Malcolm; Chris Fields > >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >> > >> Hi, > >> I tried to do what Malcom proposed my ($prog = 'rpsblast'; > >> my $db = > >> 'CDD';) but that didn't work. > >> > >> ------------- EXCEPTION: Bio::Root::Exception ------------- > >> MSG: Value rpsblast for PUT parameter PROGRAM does not match > >> expression t?blast[ pnx]. Rejecting. > >> STACK: Error::throw > >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > >> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > >> C:/Perl/site/lib/Bio/Tools > >> /Run/RemoteBlast.pm:329 > >> STACK: Bio::Tools::Run::RemoteBlast::new > >> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl > >> ast.pm:257 > >> STACK: blast_a_seq2.pm:14 > >> ----------------------------------------------------------- > >> So I should try to "change the wrapper to allow > 'rpsblast'", right? > >> Could You tell me how to do that, please? So sorry but I > have no idea > >> yet...:) If that doesn't work, is there any other way to run > >> cdd-searches with perl? > >> Thank you so much! > >> Regards, Jonas > >> > >> ----- Original Message ----- > >> From: "Chris Fields" <cjfields1@...> > >> To: "Cook, Malcolm" <MEC@...> > >> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'" > >> <bioperl-l@...>; "'Smithies, Russell'" > >> <Russell.Smithies@...>; <cjfields@...> > >> Sent: Thursday, July 09, 2009 9:19 PM > >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >> > >> > >>> I've scheduled this tentatively for the 1.6 release > series (just not > >>> sure when yet). It may work as is, but I haven't tried > it out yet > >>> (and am hazarding to guess it only retrieves the single > main RID at > >>> the moment). > >>> > >>> chris > >>> > >>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: > >>> > >>>> Jonas, > >>>> > >>>> If you want to continue to use the bioperl remoteblast > interface, > >>>> probably what you should do is simply call it twice. > >>>> > >>>> Once, as you already know how to do, which will return > without CDD > >>>> results. > >>>> > >>>> Secondly, to get the CDD results, call remoteblast a second time. > >>>> This time, using > >>>> -database => 'CDD' > >>>> -program => 'rpsblast' > >>>> > >>>> However, the wrapper may object to the 'rpsblast' > program. It is > >>>> not listed in the POD - > >>>> > >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R > >> emoteBlast.pm) > >>>> If so, my guess is that changing the perl wrapper to allow > >>>> rpsblast will "just work" (tm). I've cc:ed > >> cjfields@... for > >>>> his opinion on this. > >>>> > >>>> Also, you might want to perform the CDD search first, > especially if > >>>> you are streaming results to eyeball that might like > something to > >>>> look at while the second (presumably longer) search is running. > >>>> > >>>> Cheers, > >>>> > >>>> Malcolm Cook > >>>> Stowers Institute for Medical Research - Kansas City, Missouri > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: bioperl-l-bounces@... > >>>>> [mailto:bioperl-l-bounces@...] On Behalf > Of Jonas > >>>>> Schaer > >>>>> Sent: Thursday, July 09, 2009 5:16 AM > >>>>> To: BioPerl List; Smithies, Russell > >>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >>>>> > >>>>> Hi guys, > >>>>> Thank you all so much for your help and patience :). Of > course you > >>>>> were right and I finaly found the right put-parameter to get > >>>>> exactly the same hits as on the homepage. > >>>>> I do have an other question though :)... > >>>>> I now want to include a search for conserved domains, > but when I > >>>>> try to use the CDD_SEARCH-parameter > >>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# > >>>>> sub:CDD_SEARCH) > >>>>> like the other put-parameters the way chris once told me(works > >>>>> fine with the other params): > >>>>> > >>>>> my %put = ( > >>>>> WORD_SIZE => 3, > >>>>> HITLIST_SIZE => 100, > >>>>> THRESHOLD => 11, > >>>>> FILTER => 'R', > >>>>> GENETIC_CODE => 1, > >>>>> CDD_SEARCH => 'on' > >>>>> ###I tried it > >>>>> with 'true' and '1', too. > >>>>> > >>>>> ); > >>>>> > >>>>> for my $putName (keys %put) { > >>>>> $factory->submit_parameter($putName,$put{$putName}); > >>>>> } > >>>>> > >>>>> > >>>>> ...an exception is thrown: > >>>>> > >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>>>> MSG: CDD_SEARCH is not a valid PUT parameter. > >>>>> STACK: Error::throw > >>>>> STACK: Bio::Root::Root::throw > >> C:/Perl/site/lib/Bio/Root/Root.pm:359 > >>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > >>>>> C:/Perl/site/lib/Bio/Tools > >>>>> /Run/RemoteBlast.pm:325 > >>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383 > >>>>> STACK: main::blast_it firsteval0.8.pm:288 > >>>>> STACK: firsteval0.8.pm:35 > >>>>> ----------------------------------------------------------- . > >>>>> I guess somehow this could be the solution to my problem: > >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s > >>>>> ub:RID-for-Simultaneous > >>>>> , but unfortunately I don't understand what to do. > >>>>> I'm so sorry to bother you with this but please help me once > >>>>> more...:) > >>>>> > >>>>> Best regards and thanks in advance, Jonas > >>>>> > >>>>> ----- Original Message ----- > >>>>> From: "Smithies, Russell" <Russell.Smithies@...> > >>>>> To: "'Jonas Schaer'" <Brotelzwieb@...> > >>>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" > >>>>> <bioperl-l@...> > >>>>> Sent: Monday, July 06, 2009 10:56 PM > >>>>> Subject: RE: [Bioperl-l] different results with > >> remote-blast skript > >>>>> > >>>>> > >>>>> Hi Jonas, > >>>>> You can't just play with the BLAST parameters and hope > >> for a "better" > >>>>> result. > >>>>> I'd suggest that if you aren't sure what they do, you > should leave > >>>>> them alone as small changes can make huge differences in the > >>>>> output - it's quite possible to miss finding what > you're looking > >>>>> for by using > >> the wrong > >>>>> parameters. > >>>>> If all else fails, read the blast manual: > >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall > >>>>> _all.html > >>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ > >>>>> Or Read Ian Korfs' excellent book: > >>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp > >>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 > >>>>> > >>>>> Don't worry about the integer overflow bug as there's > nothing you > >>>>> can do about it. If you're interested, Google and Wikipedia are > >>>>> your > >>>>> friends: > >>>>> http://en.wikipedia.org/wiki/Integer_overflow > >>>>> > >>>>> > >>>>> Russell > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l- > >>>>>> bounces@...] On Behalf Of Jonas Schaer > >>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m. > >>>>>> To: BioPerl List; Chris Fields > >>>>>> Subject: Re: [Bioperl-l] different results with > >> remote-blast skript > >>>>>> > >>>>>> Hi guys, thanks for your answers so far. > >>>>>> @jason: integer overflow in blast.... sorry, but what do > >>>>> you mean by that? > >>>>>> how can I fix it...? > >>>>>> > >>>>>> Since I never really changed any parameters I thought them > >>>>> all to be > >>>>>> default. > >>>>>> whatever, I tried to get "better" results with my prog > >> by changing > >>>>>> these: > >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>>>> > >>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI > >>>>> STICS'} = > >>>>>> '1'; > >>>>>> with no effect...I guess these were default values anyway. > >>>>>> > >>>>>> So please maybe you can tell me all the other parameters I > >>>>> can change with > >>>>>> my > >>>>>> perl-skript AND how to do that? > >>>>>> Unfortunately both, perl and the blast-algorithm are pretty > >>>>> much new to > >>>>>> me, > >>>>>> maybe thats why I just cannot find out how to do that on my > >>>>> own... :/ > >>>>>> > >>>>>> Here is the output I get with my remote-blast skript: > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>> ################################### > >>>>>> Query Name: > >>>>>> > >> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > >>>>>> L > >>>>>> hit name is ref|XP_001702807.1| > >>>>>> score is 442 > >>>>>> BLASTP 2.2.21+ > >>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro > >>>>> A. Schaffer, > >>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. > >>>>> Lipman (1997), > >>>>>> "Gapped > >>>>>> BLAST and PSI-BLAST: a new generation of protein > database search > >>>>>> programs", Nucleic Acids Res. 25:3389-3402. > >>>>>> > >>>>>> > >>>>>> Reference for composition-based statistics: Alejandro A. > >>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, > >>>>> John L. Spouge, > >>>>>> Yuri > >>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), > >>>>> "Improving the > >>>>>> accuracy of PSI-BLAST protein database searches with > >>>>> composition-based > >>>>>> statistics and other refinements", Nucleic Acids Res. > >> 29:2994-3005. > >>>>>> > >>>>>> > >>>>>> RID: 53STX5G2013 > >>>>>> > >>>>>> > >>>>>> Database: All non-redundant GenBank CDS > >>>>>> translations+PDB+SwissProt+PIR+PRF excluding > >> environmental samples > >>>>>> from WGS projects > >>>>>> 9,252,587 sequences; 3,169,972,781 total > letters Query= > >>>>>> > >>>>> > >> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > >>>>>> > >>>>> > >> > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA > >> M > >>>>>> ATGPDPDDEYE > >>>>>> Length=150 > >>>>>> > >>>>>> > >>>>>> > >>>>> Score > >>>>>> E > >>>>>> Sequences producing significant alignments: > >>>>> (Bits) > >>>>>> Value > >>>>>> > >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>>>> reinhard... 174 > >>>>>> 2e-42 > >>>>>> > >>>>>> > >>>>>> ALIGNMENTS > >>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >> reinhardtii] > >>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > >>>>>> Length=303 > >>>>>> > >>>>>> Score = 174 bits (442), Expect = 2e-42, Method: > >>>>> Composition-based > >>>>>> stats. > >>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), > >>>>> Gaps = 0/150 > >>>>>> (0%) > >>>>>> > >>>>>> Query 1 > >>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds > >>>>>> 60 > >>>>>> > >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>>>> Sbjct 154 > >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>>>> 213 > >>>>>> > >>>>>> Query 61 > >>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>>>> 120 > >>>>>> > >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>>>> Sbjct 214 > >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>>>> 273 > >>>>>> > >>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > >>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct 274 > >>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > >>>>>> > >>>>>> > >>>>>> > >>>>>> Database: All non-redundant GenBank CDS > >>>>>> translations+PDB+SwissProt+PIR+PRF > >>>>>> excluding environmental samples from WGS projects > >>>>>> Posted date: Jul 5, 2009 4:41 AM Number of letters in > >>>>>> database: -1,124,994,511 Number of sequences in database: > >>>>>> 9,252,587 > >>>>>> > >>>>>> Lambda K H > >>>>>> 0.309 0.122 0.345 > >>>>>> Gapped > >>>>>> Lambda K H > >>>>>> 0.267 0.0410 0.140 > >>>>>> Matrix: BLOSUM62 > >>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of > Sequences: > >>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions: > >>>>>> 1448367 Number of successful extensions: 2103 Number > of sequences > >>>>>> better than 10: 0 Number of HSP's better than 10 > without gapping: > >>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully > >>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781 > >>>>>> Length adjustment: 113 Effective length of query: 37 Effective > >>>>>> length of database: 2124430450 Effective search space: > >>>>>> 78603926650 Effective search space used: 78603926650 > >>>>>> T: 11 > >>>>>> A: 40 > >>>>>> X1: 16 (7.1 bits) > >>>>>> X2: 38 (14.6 bits) > >>>>>> X3: 64 (24.7 bits) > >>>>>> S1: 42 (20.8 bits) > >>>>>> S2: 74 (33.1 bits) > >>>>>> > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>> ################################### > >>>>>> and here are the hits (?) of the blast-algorithm on the > >>>>> ncbi-homepage with > >>>>>> the same query of course: > >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>>>> reinhard... 300 > >>>>>> 3e-80 > >>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA > >>>>> [Acyrtho... 36.2 > >>>>>> 1.1 > >>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 > >>>>> [Blautia... 35.4 > >>>>>> 1.8 > >>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania > >>>>> brazil... 34.3 > >>>>>> 4.2 > >>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 > >>>>> [Aspergillus n... 33.5 > >>>>>> 6.0 > >>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 > >>>>> [Methyloba... 33.5 > >>>>>> 7.0 > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>> ###################################at > >>>>>> least the first hit is the same, but even there there is a > >>>>> different score > >>>>>> and e-value. > >>>>>> > >>>>>> thanks so much for any help :) > >>>>>> regards, jonas > >>>>>> > >>>>>> > >>>>>> ----- Original Message ----- > >>>>>> From: "Chris Fields" <cjfields@...> > >>>>>> To: "Jason Stajich" <jason@...> > >>>>>> Cc: "Smithies, Russell" > >>>>> <Russell.Smithies@...>; "'BioPerl > >>>>>> List'" <bioperl-l@...>; "'Jonas Schaer'" > >>>>>> <Jonas_Schaer@...> > >>>>>> Sent: Monday, July 06, 2009 12:51 AM > >>>>>> Subject: Re: [Bioperl-l] different results with > >> remote-blast skript > >>>>>> > >>>>>> > >>>>>>> That inspires confidence ;> > >>>>>>> > >>>>>>> chris > >>>>>>> > >>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > >>>>>>> > >>>>>>>> integer overflow in blast.... > >>>>>>>> > >>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >>>>>>>> > >>>>>>>>> I'd guess it's a difference in the parameters used. > >>>>>>>>> Interesting that both have the number of letters in > the db as > >>>>>>>>> "-1,125,070,205", I assume that's a bug :-) > >>>>>>>>> > >>>>>>>>> Stats from your remote_blast: > >>>>>>>>> > >>>>>>>>> 'stats' => { > >>>>>>>>> 'S1' => '42', > >>>>>>>>> 'S1_bits' => '20.8', > >>>>>>>>> 'lambda' => '0.309', > >>>>>>>>> 'entropy' => '0.345', > >>>>>>>>> 'kappa_gapped' => '0.0410', > >>>>>>>>> 'T' => '11', > >>>>>>>>> 'kappa' => '0.122', > >>>>>>>>> 'X3_bits' => '24.7', > >>>>>>>>> 'X1' => '16', > >>>>>>>>> 'lambda_gapped' => '0.267', > >>>>>>>>> 'X2' => '38', > >>>>>>>>> 'S2' => '74', > >>>>>>>>> 'seqs_better_than_cutoff' => '0', > >>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>>>>>>>> 'Hits_to_DB' => '60102303', > >>>>>>>>> 'dbletters' => '-1125070205', > >>>>>>>>> 'A' => '40', > >>>>>>>>> 'num_successful_extensions' => '2004', > >>>>>>>>> 'num_extensions' => '1436892', > >>>>>>>>> 'X1_bits' => '7.1', > >>>>>>>>> 'X3' => '64', > >>>>>>>>> 'entropy_gapped' => '0.140', > >>>>>>>>> 'dbentries' => '9252258', > >>>>>>>>> 'X2_bits' => '14.6', > >>>>>>>>> 'S2_bits' => '33.1' > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Stats from a blast done on the NCBI webpage: > >>>>>>>>> > >>>>>>>>> Database: All non-redundant GenBank CDS > >>>>> translations+PDB+SwissProt > >>>>>>>>> +PIR+PRF > >>>>>>>>> excluding environmental samples from WGS projects > Posted date: > >>>>>>>>> Jul 4, 2009 4:41 AM Number of letters in database: > >>>>>>>>> -1,125,070,205 Number of sequences in database: 9,252,258 > >>>>>>>>> > >>>>>>>>> Lambda K H > >>>>>>>>> 0.309 0.124 0.340 > >>>>>>>>> Gapped > >>>>>>>>> Lambda K H > >>>>>>>>> 0.267 0.0410 0.140 > >>>>>>>>> Matrix: BLOSUM62 > >>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of > >>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of > >>>>>>>>> extensions: 3101413 Number of successful extensions: 9001 > >>>>>>>>> Number of sequences better than 100: 65 Number of > HSP's better > >>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000 > >>>>>>>>> Number of HSP's successfully gapped: 66 Length of > query: 150 > >>>>>>>>> Length of database: 3169897087 Length adjustment: 113 > >>>>>>>>> Effective length of query: 37 Effective length of database: > >>>>>>>>> 2124391933 Effective search space: 78602501521 Effective > >>>>>>>>> search space used: 78602501521 > >>>>>>>>> T: 11 > >>>>>>>>> A: 40 > >>>>>>>>> X1: 16 (7.1 bits) > >>>>>>>>> X2: 38 (14.6 bits) > >>>>>>>>> X3: 64 (24.7 bits) > >>>>>>>>> S1: 42 (20.8 bits) > >>>>>>>>> S2: 65 (29.6 bits) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> -----Original Message----- > >>>>>>>>>> From: bioperl-l-bounces@... > [mailto:bioperl-l- > >>>>>>>>>> bounces@...] On Behalf Of Jonas Schaer > >>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. > >>>>>>>>>> To: BioPerl List > >>>>>>>>>> Subject: [Bioperl-l] different results with > >> remote-blast skript > >>>>>>>>>> > >>>>>>>>>> Hi again :) > >>>>>>>>>> please, I only have this little question: > >>>>>>>>>> why do I get different results with my remote::blast > >>>>> perl skript > >>>>>>>>>> then on the > >>>>>>>>>> ncbi blast homepage? > >>>>>>>>>> I am using blastp, the query is an amino-sequence > (different > >>>>>>>>>> results with any sequence, differences not only in > number of > >>>>>>>>>> hits but > >> even in e- > >>>>>>>>>> values, scores > >>>>>>>>>> etc...), the database is 'nr'. > >>>>>>>>>> PLEASE help me, > >>>>>>>>>> thank you in advance, > >>>>>>>>>> Jonas > >>>>>>>>>> > >>>>>>>>>> ps: my skript: > >>>>>>>>>> > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>>>>>> ## > >>>>>>>>>> use Bio::Seq::SeqFactory; > >>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my > >>>>>>>>>> @blast_report; my $prog = 'blastp'; > >>>>>>>>>> my $db = 'nr'; > >>>>>>>>>> my $e_val= '1e-10'; > >>>>>>>>>> #my $e_val= '10'; > >>>>>>>>>> my @params = ( '-prog' => $prog, > >>>>>>>>>> '-data' => $db, > >>>>>>>>>> '-expect' => $e_val, > >>>>>>>>>> '-readmethod' => 'SearchIO' ); my $factory = > >>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params); > >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} > = '11 1'; > >>>>>>>>>> > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = > '10'; $ Bio > >>>>>>>>>> > >>>>> > ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} > >>>>>>>>>> = '1'; > >>>>>>>>>> > >>>>>>>>>> my > >>>>>>>>>> $ > >>>>>>>>>> blast_seq > >>>>>>>>>> > >>>>> > >> > ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > >>>>>>>>>> > >>>>>> > >>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN > >>>>> AFRQAHQNTAMATGPD > >>>>>>>>>> PDDEYE'; > >>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my > >>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > >>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq > >>>>>>>>>> =>$blast_seq, > >> -display_id => > >>>>>>>>>> "$blast_seq"); > >>>>>>>>>> my $filename='temp2.out'; > >>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR > >>>>>>>>>> "waiting..." if( $v > 0 ); while ( my @rids = > >>>>>>>>>> $factory->each_rid ) { > >>>>>>>>>> foreach my $rid ( @rids ) > >>>>>>>>>> { > >>>>>>>>>> my $rc = $factory->retrieve_blast($rid); > >>>>>>>>>> if( !ref($rc) ) > >>>>>>>>>> { > >>>>>>>>>> if( $rc < 0 ) > >>>>>>>>>> { > >>>>>>>>>> $factory->remove_rid($rid); > >>>>>>>>>> } > >>>>>>>>>> print STDERR "." if ( $v > 0 ); > >>>>>>>>>> } > >>>>>>>>>> else > >>>>>>>>>> { > >>>>>>>>>> my $result = $rc->next_result(); > >>>>>>>>>> $factory->save_output($filename); > >>>>>>>>>> $factory->remove_rid($rid); > >>>>>>>>>> print "\nQuery Name: ", > >>>>> $result->query_name(), > >>>>>>>>>> "\n"; > >>>>>>>>>> while ( my $hit = $result->next_hit ) > >>>>>>>>>> { > >>>>>>>>>> next unless ( $v > 0); > >>>>>>>>>> print "\thit name is ", > >> $hit->name, "\n"; > >>>>>>>>>> while( my $hsp = $hit->next_hsp ) > >>>>>>>>>> { > >>>>>>>>>> print "\t\tscore is ", > >>>>> $hsp->score, "\n"; > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> } > >>>>>>>>>> @blast_report = get_file_data ($filename); return > >>>>>>>>>> @blast_report; > >>>>>>>>>> > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>>>>>> #### > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> Bioperl-l mailing list > >>>>>>>>>> Bioperl-l@... > >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>>> = > >>>>>>>>> = > >>>>>>>>> > >>>>> > >> > ===================================================================== > >>>>>>>>> Attention: The information contained in this message and/or > >>>>>>>>> attachments from AgResearch Limited is intended only for the > >>>>> persons or entities > >>>>>>>>> to which it is addressed and may contain > confidential and/or > >>>>>>>>> privileged material. Any review, retransmission, > dissemination > >> or other use > >>>>>>>>> of, or > >>>>>>>>> taking of any action in reliance upon, this information > >>>>> by persons or > >>>>>>>>> entities other than the intended recipients is > prohibited by > >>>>>>>>> AgResearch Limited. If you have received this message in > >>>>>>>>> error, > >>>>> please notify > >>>>>>>>> the > >>>>>>>>> sender immediately. > >>>>>>>>> = > >>>>>>>>> = > >>>>>>>>> > >>>>> > >> > ===================================================================== > >>>>>>>>> > >>>>>>>>> _______________________________________________ > >>>>>>>>> Bioperl-l mailing list > >>>>>>>>> Bioperl-l@... > >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Jason Stajich > >>>>>>>> jason@... > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Bioperl-l mailing list > >>>>>>>> Bioperl-l@... > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>>> > >>>>>> > >>>>> -------------------------------------------------------------- > >>>>> ---------------- > >>>>>> -- > >>>>>> > >>>>>> > >>>>>> > >>>>>> No virus found in this incoming message. > >>>>>> Checked by AVG - www.avg.com > >>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release > >>>>> Date: 07/05/09 > >>>>>> 05:53:00 > >>>>> > >>>>> > >>>>> -------------------------------------------------------------- > >>>>> ------------------ > >>>>> > >>>>> > >>>>> > >>>>> No virus found in this incoming message. > >>>>> Checked by AVG - www.avg.com > >>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release > >>>>> Date: 07/05/09 > >>>>> 17:54:00 > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l@... > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >> > >> > >> -------------------------------------------------------------- > >> ------------------ > >> > >> > >> > >> No virus found in this incoming message. > >> Checked by AVG - www.avg.com > >> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release > >> Date: 07/09/09 > >> 05:55:00 > >> > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@... > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: cdd-search with remoteblast?Malcolm, it's probably not you. Looks like the get/put parameters are
set as globals, so there may be cross-contamination of instances (worth checking JIC). You can probably work around that to an extent by encompassing any calls in blocks to localize changes. chris On Jul 21, 2009, at 11:59 AM, Cook, Malcolm wrote: > Chris, > > I wound up adding a new test > > # $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $ > > with the comment : > > # malcolm_cook@...: this test is in a separate file from > # RemoteBlast.t (on which it is modelled) since there is some sort of > # side-effecting between the multiple remote blasts that is causing > # this test to fail, if it comes last, or the other test to fail, if > # this one comes first. THIS IS A BUG EITHER IN REMOTE BLAST OR MY > # UNDERSTANDING, i.e. of how to initialize it. > > In any case, the test passes and demos rpsblast usage. > > Cheers, > > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: Chris Fields [mailto:cjfields1@...] >> Sent: Friday, July 10, 2009 1:05 PM >> To: Cook, Malcolm >> Cc: 'Jonas Schaer'; 'BioPerl List' >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >> >> Malcolm, >> >> Nice! Go ahead and add the test in; we can look at trying to >> get CDD_SEARCH working at some point but this is a nice workaround. >> >> chris >> >> On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote: >> >>> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates >>> the following. Is it appropriate to submit it? >>> >>> Jonas, OK, I was a little quick on the gun... but I've got it now. >>> >>> You don't need to change the wrapper. Here is what you need to do: >>> >>> # 1) set your database like this: >>> >>> -database => 'cdsearch/cdd', # c.f. >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html >>> for other cdd database options >>> >>> # 2) add this line before submitting the job: >>> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast'; >>> >>> You're in - No other changes needed. >>> >>> Malcolm Cook >>> Stowers Institute for Medical Research - Kansas City, Missouri >>> >>> >>>> -----Original Message----- >>>> From: Jonas Schaer [mailto:Brotelzwieb@...] >>>> Sent: Friday, July 10, 2009 4:18 AM >>>> To: BioPerl List; Cook, Malcolm; Chris Fields >>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>> >>>> Hi, >>>> I tried to do what Malcom proposed my ($prog = 'rpsblast'; >>>> my $db = >>>> 'CDD';) but that didn't work. >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: Value rpsblast for PUT parameter PROGRAM does not match >>>> expression t?blast[ pnx]. Rejecting. >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>> C:/Perl/site/lib/Bio/Tools >>>> /Run/RemoteBlast.pm:329 >>>> STACK: Bio::Tools::Run::RemoteBlast::new >>>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl >>>> ast.pm:257 >>>> STACK: blast_a_seq2.pm:14 >>>> ----------------------------------------------------------- >>>> So I should try to "change the wrapper to allow >> 'rpsblast'", right? >>>> Could You tell me how to do that, please? So sorry but I >> have no idea >>>> yet...:) If that doesn't work, is there any other way to run >>>> cdd-searches with perl? >>>> Thank you so much! >>>> Regards, Jonas >>>> >>>> ----- Original Message ----- >>>> From: "Chris Fields" <cjfields1@...> >>>> To: "Cook, Malcolm" <MEC@...> >>>> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'" >>>> <bioperl-l@...>; "'Smithies, Russell'" >>>> <Russell.Smithies@...>; <cjfields@...> >>>> Sent: Thursday, July 09, 2009 9:19 PM >>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>> >>>> >>>>> I've scheduled this tentatively for the 1.6 release >> series (just not >>>>> sure when yet). It may work as is, but I haven't tried >> it out yet >>>>> (and am hazarding to guess it only retrieves the single >> main RID at >>>>> the moment). >>>>> >>>>> chris >>>>> >>>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: >>>>> >>>>>> Jonas, >>>>>> >>>>>> If you want to continue to use the bioperl remoteblast >> interface, >>>>>> probably what you should do is simply call it twice. >>>>>> >>>>>> Once, as you already know how to do, which will return >> without CDD >>>>>> results. >>>>>> >>>>>> Secondly, to get the CDD results, call remoteblast a second time. >>>>>> This time, using >>>>>> -database => 'CDD' >>>>>> -program => 'rpsblast' >>>>>> >>>>>> However, the wrapper may object to the 'rpsblast' >> program. It is >>>>>> not listed in the POD - >>>>>> >>>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R >>>> emoteBlast.pm) >>>>>> If so, my guess is that changing the perl wrapper to allow >>>>>> rpsblast will "just work" (tm). I've cc:ed >>>> cjfields@... for >>>>>> his opinion on this. >>>>>> >>>>>> Also, you might want to perform the CDD search first, >> especially if >>>>>> you are streaming results to eyeball that might like >> something to >>>>>> look at while the second (presumably longer) search is running. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Malcolm Cook >>>>>> Stowers Institute for Medical Research - Kansas City, Missouri >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces@... >>>>>>> [mailto:bioperl-l-bounces@...] On Behalf >> Of Jonas >>>>>>> Schaer >>>>>>> Sent: Thursday, July 09, 2009 5:16 AM >>>>>>> To: BioPerl List; Smithies, Russell >>>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>>>>> >>>>>>> Hi guys, >>>>>>> Thank you all so much for your help and patience :). Of >> course you >>>>>>> were right and I finaly found the right put-parameter to get >>>>>>> exactly the same hits as on the homepage. >>>>>>> I do have an other question though :)... >>>>>>> I now want to include a search for conserved domains, >> but when I >>>>>>> try to use the CDD_SEARCH-parameter >>>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >>>>>>> sub:CDD_SEARCH) >>>>>>> like the other put-parameters the way chris once told me(works >>>>>>> fine with the other params): >>>>>>> >>>>>>> my %put = ( >>>>>>> WORD_SIZE => 3, >>>>>>> HITLIST_SIZE => 100, >>>>>>> THRESHOLD => 11, >>>>>>> FILTER => 'R', >>>>>>> GENETIC_CODE => 1, >>>>>>> CDD_SEARCH => 'on' >>>>>>> ###I tried it >>>>>>> with 'true' and '1', too. >>>>>>> >>>>>>> ); >>>>>>> >>>>>>> for my $putName (keys %put) { >>>>>>> $factory->submit_parameter($putName,$put{$putName}); >>>>>>> } >>>>>>> >>>>>>> >>>>>>> ...an exception is thrown: >>>>>>> >>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>> MSG: CDD_SEARCH is not a valid PUT parameter. >>>>>>> STACK: Error::throw >>>>>>> STACK: Bio::Root::Root::throw >>>> C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>>>>> C:/Perl/site/lib/Bio/Tools >>>>>>> /Run/RemoteBlast.pm:325 >>>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383 >>>>>>> STACK: main::blast_it firsteval0.8.pm:288 >>>>>>> STACK: firsteval0.8.pm:35 >>>>>>> ----------------------------------------------------------- . >>>>>>> I guess somehow this could be the solution to my problem: >>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >>>>>>> ub:RID-for-Simultaneous >>>>>>> , but unfortunately I don't understand what to do. >>>>>>> I'm so sorry to bother you with this but please help me once >>>>>>> more...:) >>>>>>> >>>>>>> Best regards and thanks in advance, Jonas >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>> From: "Smithies, Russell" <Russell.Smithies@...> >>>>>>> To: "'Jonas Schaer'" <Brotelzwieb@...> >>>>>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" >>>>>>> <bioperl-l@...> >>>>>>> Sent: Monday, July 06, 2009 10:56 PM >>>>>>> Subject: RE: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>> >>>>>>> >>>>>>> Hi Jonas, >>>>>>> You can't just play with the BLAST parameters and hope >>>> for a "better" >>>>>>> result. >>>>>>> I'd suggest that if you aren't sure what they do, you >> should leave >>>>>>> them alone as small changes can make huge differences in the >>>>>>> output - it's quite possible to miss finding what >> you're looking >>>>>>> for by using >>>> the wrong >>>>>>> parameters. >>>>>>> If all else fails, read the blast manual: >>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >>>>>>> _all.html >>>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >>>>>>> Or Read Ian Korfs' excellent book: >>>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp >>>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >>>>>>> >>>>>>> Don't worry about the integer overflow bug as there's >> nothing you >>>>>>> can do about it. If you're interested, Google and Wikipedia are >>>>>>> your >>>>>>> friends: >>>>>>> http://en.wikipedia.org/wiki/Integer_overflow >>>>>>> >>>>>>> >>>>>>> Russell >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>>>>>>> bounces@...] On Behalf Of Jonas Schaer >>>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>>>>>>> To: BioPerl List; Chris Fields >>>>>>>> Subject: Re: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>> >>>>>>>> Hi guys, thanks for your answers so far. >>>>>>>> @jason: integer overflow in blast.... sorry, but what do >>>>>>> you mean by that? >>>>>>>> how can I fix it...? >>>>>>>> >>>>>>>> Since I never really changed any parameters I thought them >>>>>>> all to be >>>>>>>> default. >>>>>>>> whatever, I tried to get "better" results with my prog >>>> by changing >>>>>>>> these: >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>>>> >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>>>>>> STICS'} = >>>>>>>> '1'; >>>>>>>> with no effect...I guess these were default values anyway. >>>>>>>> >>>>>>>> So please maybe you can tell me all the other parameters I >>>>>>> can change with >>>>>>>> my >>>>>>>> perl-skript AND how to do that? >>>>>>>> Unfortunately both, perl and the blast-algorithm are pretty >>>>>>> much new to >>>>>>>> me, >>>>>>>> maybe thats why I just cannot find out how to do that on my >>>>>>> own... :/ >>>>>>>> >>>>>>>> Here is the output I get with my remote-blast skript: >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ################################### >>>>>>>> Query Name: >>>>>>>> >>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>>>>>> L >>>>>>>> hit name is ref|XP_001702807.1| >>>>>>>> score is 442 >>>>>>>> BLASTP 2.2.21+ >>>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>>>>>> A. Schaffer, >>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>>> Lipman (1997), >>>>>>>> "Gapped >>>>>>>> BLAST and PSI-BLAST: a new generation of protein >> database search >>>>>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>>>>> >>>>>>>> >>>>>>>> Reference for composition-based statistics: Alejandro A. >>>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>>>>>> John L. Spouge, >>>>>>>> Yuri >>>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >>>>>>> "Improving the >>>>>>>> accuracy of PSI-BLAST protein database searches with >>>>>>> composition-based >>>>>>>> statistics and other refinements", Nucleic Acids Res. >>>> 29:2994-3005. >>>>>>>> >>>>>>>> >>>>>>>> RID: 53STX5G2013 >>>>>>>> >>>>>>>> >>>>>>>> Database: All non-redundant GenBank CDS >>>>>>>> translations+PDB+SwissProt+PIR+PRF excluding >>>> environmental samples >>>>>>>> from WGS projects >>>>>>>> 9,252,587 sequences; 3,169,972,781 total >> letters Query= >>>>>>>> >>>>>>> >>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>>>>>> >>>>>>> >>>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA >>>> M >>>>>>>> ATGPDPDDEYE >>>>>>>> Length=150 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> Score >>>>>>>> E >>>>>>>> Sequences producing significant alignments: >>>>>>> (Bits) >>>>>>>> Value >>>>>>>> >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>>>> reinhard... 174 >>>>>>>> 2e-42 >>>>>>>> >>>>>>>> >>>>>>>> ALIGNMENTS >>>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>> reinhardtii] >>>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>>>>>> Length=303 >>>>>>>> >>>>>>>> Score = 174 bits (442), Expect = 2e-42, Method: >>>>>>> Composition-based >>>>>>>> stats. >>>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>>>>>> Gaps = 0/150 >>>>>>>> (0%) >>>>>>>> >>>>>>>> Query 1 >>>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>>>>>> 60 >>>>>>>> >>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>>>> Sbjct 154 >>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>>>> 213 >>>>>>>> >>>>>>>> Query 61 >>>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> 120 >>>>>>>> >>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> Sbjct 214 >>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> 273 >>>>>>>> >>>>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct 274 >>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Database: All non-redundant GenBank CDS >>>>>>>> translations+PDB+SwissProt+PIR+PRF >>>>>>>> excluding environmental samples from WGS projects >>>>>>>> Posted date: Jul 5, 2009 4:41 AM Number of letters in >>>>>>>> database: -1,124,994,511 Number of sequences in database: >>>>>>>> 9,252,587 >>>>>>>> >>>>>>>> Lambda K H >>>>>>>> 0.309 0.122 0.345 >>>>>>>> Gapped >>>>>>>> Lambda K H >>>>>>>> 0.267 0.0410 0.140 >>>>>>>> Matrix: BLOSUM62 >>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of >> Sequences: >>>>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions: >>>>>>>> 1448367 Number of successful extensions: 2103 Number >> of sequences >>>>>>>> better than 10: 0 Number of HSP's better than 10 >> without gapping: >>>>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully >>>>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781 >>>>>>>> Length adjustment: 113 Effective length of query: 37 Effective >>>>>>>> length of database: 2124430450 Effective search space: >>>>>>>> 78603926650 Effective search space used: 78603926650 >>>>>>>> T: 11 >>>>>>>> A: 40 >>>>>>>> X1: 16 (7.1 bits) >>>>>>>> X2: 38 (14.6 bits) >>>>>>>> X3: 64 (24.7 bits) >>>>>>>> S1: 42 (20.8 bits) >>>>>>>> S2: 74 (33.1 bits) >>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ################################### >>>>>>>> and here are the hits (?) of the blast-algorithm on the >>>>>>> ncbi-homepage with >>>>>>>> the same query of course: >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>>>> reinhard... 300 >>>>>>>> 3e-80 >>>>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >>>>>>> [Acyrtho... 36.2 >>>>>>>> 1.1 >>>>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >>>>>>> [Blautia... 35.4 >>>>>>>> 1.8 >>>>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >>>>>>> brazil... 34.3 >>>>>>>> 4.2 >>>>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 >>>>>>> [Aspergillus n... 33.5 >>>>>>>> 6.0 >>>>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 >>>>>>> [Methyloba... 33.5 >>>>>>>> 7.0 >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ###################################at >>>>>>>> least the first hit is the same, but even there there is a >>>>>>> different score >>>>>>>> and e-value. >>>>>>>> >>>>>>>> thanks so much for any help :) >>>>>>>> regards, jonas >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>> From: "Chris Fields" <cjfields@...> >>>>>>>> To: "Jason Stajich" <jason@...> >>>>>>>> Cc: "Smithies, Russell" >>>>>>> <Russell.Smithies@...>; "'BioPerl >>>>>>>> List'" <bioperl-l@...>; "'Jonas Schaer'" >>>>>>>> <Jonas_Schaer@...> >>>>>>>> Sent: Monday, July 06, 2009 12:51 AM >>>>>>>> Subject: Re: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>> >>>>>>>> >>>>>>>>> That inspires confidence ;> >>>>>>>>> >>>>>>>>> chris >>>>>>>>> >>>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>>>>>>> >>>>>>>>>> integer overflow in blast.... >>>>>>>>>> >>>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>>>>>>> >>>>>>>>>>> I'd guess it's a difference in the parameters used. >>>>>>>>>>> Interesting that both have the number of letters in >> the db as >>>>>>>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>>>>>>> >>>>>>>>>>> Stats from your remote_blast: >>>>>>>>>>> >>>>>>>>>>> 'stats' => { >>>>>>>>>>> 'S1' => '42', >>>>>>>>>>> 'S1_bits' => '20.8', >>>>>>>>>>> 'lambda' => '0.309', >>>>>>>>>>> 'entropy' => '0.345', >>>>>>>>>>> 'kappa_gapped' => '0.0410', >>>>>>>>>>> 'T' => '11', >>>>>>>>>>> 'kappa' => '0.122', >>>>>>>>>>> 'X3_bits' => '24.7', >>>>>>>>>>> 'X1' => '16', >>>>>>>>>>> 'lambda_gapped' => '0.267', >>>>>>>>>>> 'X2' => '38', >>>>>>>>>>> 'S2' => '74', >>>>>>>>>>> 'seqs_better_than_cutoff' => '0', >>>>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>>>>>>> 'Hits_to_DB' => '60102303', >>>>>>>>>>> 'dbletters' => '-1125070205', >>>>>>>>>>> 'A' => '40', >>>>>>>>>>> 'num_successful_extensions' => '2004', >>>>>>>>>>> 'num_extensions' => '1436892', >>>>>>>>>>> 'X1_bits' => '7.1', >>>>>>>>>>> 'X3' => '64', >>>>>>>>>>> 'entropy_gapped' => '0.140', >>>>>>>>>>> 'dbentries' => '9252258', >>>>>>>>>>> 'X2_bits' => '14.6', >>>>>>>>>>> 'S2_bits' => '33.1' >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Stats from a blast done on the NCBI webpage: >>>>>>>>>>> >>>>>>>>>>> Database: All non-redundant GenBank CDS >>>>>>> translations+PDB+SwissProt >>>>>>>>>>> +PIR+PRF >>>>>>>>>>> excluding environmental samples from WGS projects >> Posted date: >>>>>>>>>>> Jul 4, 2009 4:41 AM Number of letters in database: >>>>>>>>>>> -1,125,070,205 Number of sequences in database: 9,252,258 >>>>>>>>>>> >>>>>>>>>>> Lambda K H >>>>>>>>>>> 0.309 0.124 0.340 >>>>>>>>>>> Gapped >>>>>>>>>>> Lambda K H >>>>>>>>>>> 0.267 0.0410 0.140 >>>>>>>>>>> Matrix: BLOSUM62 >>>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of >>>>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of >>>>>>>>>>> extensions: 3101413 Number of successful extensions: 9001 >>>>>>>>>>> Number of sequences better than 100: 65 Number of >> HSP's better >>>>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000 >>>>>>>>>>> Number of HSP's successfully gapped: 66 Length of >> query: 150 >>>>>>>>>>> Length of database: 3169897087 Length adjustment: 113 >>>>>>>>>>> Effective length of query: 37 Effective length of database: >>>>>>>>>>> 2124391933 Effective search space: 78602501521 Effective >>>>>>>>>>> search space used: 78602501521 >>>>>>>>>>> T: 11 >>>>>>>>>>> A: 40 >>>>>>>>>>> X1: 16 (7.1 bits) >>>>>>>>>>> X2: 38 (14.6 bits) >>>>>>>>>>> X3: 64 (24.7 bits) >>>>>>>>>>> S1: 42 (20.8 bits) >>>>>>>>>>> S2: 65 (29.6 bits) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: bioperl-l-bounces@... >> [mailto:bioperl-l- >>>>>>>>>>>> bounces@...] On Behalf Of Jonas Schaer >>>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>>>>>>> To: BioPerl List >>>>>>>>>>>> Subject: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>>>>>> >>>>>>>>>>>> Hi again :) >>>>>>>>>>>> please, I only have this little question: >>>>>>>>>>>> why do I get different results with my remote::blast >>>>>>> perl skript >>>>>>>>>>>> then on the >>>>>>>>>>>> ncbi blast homepage? >>>>>>>>>>>> I am using blastp, the query is an amino-sequence >> (different >>>>>>>>>>>> results with any sequence, differences not only in >> number of >>>>>>>>>>>> hits but >>>> even in e- >>>>>>>>>>>> values, scores >>>>>>>>>>>> etc...), the database is 'nr'. >>>>>>>>>>>> PLEASE help me, >>>>>>>>>>>> thank you in advance, >>>>>>>>>>>> Jonas >>>>>>>>>>>> >>>>>>>>>>>> ps: my skript: >>>>>>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>>>>>> ## >>>>>>>>>>>> use Bio::Seq::SeqFactory; >>>>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my >>>>>>>>>>>> @blast_report; my $prog = 'blastp'; >>>>>>>>>>>> my $db = 'nr'; >>>>>>>>>>>> my $e_val= '1e-10'; >>>>>>>>>>>> #my $e_val= '10'; >>>>>>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>>>>>> '-data' => $db, >>>>>>>>>>>> '-expect' => $e_val, >>>>>>>>>>>> '-readmethod' => 'SearchIO' ); my $factory = >>>>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} >> = '11 1'; >>>>>>>>>>>> >> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = >> '10'; $ Bio >>>>>>>>>>>> >>>>>>> >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>>>>>>>>>> = '1'; >>>>>>>>>>>> >>>>>>>>>>>> my >>>>>>>>>>>> $ >>>>>>>>>>>> blast_seq >>>>>>>>>>>> >>>>>>> >>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>>>>>>>>>> >>>>>>>> >>>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN >>>>>>> AFRQAHQNTAMATGPD >>>>>>>>>>>> PDDEYE'; >>>>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my >>>>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq >>>>>>>>>>>> =>$blast_seq, >>>> -display_id => >>>>>>>>>>>> "$blast_seq"); >>>>>>>>>>>> my $filename='temp2.out'; >>>>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR >>>>>>>>>>>> "waiting..." if( $v > 0 ); while ( my @rids = >>>>>>>>>>>> $factory->each_rid ) { >>>>>>>>>>>> foreach my $rid ( @rids ) >>>>>>>>>>>> { >>>>>>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>>>>>> if( !ref($rc) ) >>>>>>>>>>>> { >>>>>>>>>>>> if( $rc < 0 ) >>>>>>>>>>>> { >>>>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>>>> } >>>>>>>>>>>> print STDERR "." if ( $v > 0 ); >>>>>>>>>>>> } >>>>>>>>>>>> else >>>>>>>>>>>> { >>>>>>>>>>>> my $result = $rc->next_result(); >>>>>>>>>>>> $factory->save_output($filename); >>>>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>>>> print "\nQuery Name: ", >>>>>>> $result->query_name(), >>>>>>>>>>>> "\n"; >>>>>>>>>>>> while ( my $hit = $result->next_hit ) >>>>>>>>>>>> { >>>>>>>>>>>> next unless ( $v > 0); >>>>>>>>>>>> print "\thit name is ", >>>> $hit->name, "\n"; >>>>>>>>>>>> while( my $hsp = $hit->next_hsp ) >>>>>>>>>>>> { >>>>>>>>>>>> print "\t\tscore is ", >>>>>>> $hsp->score, "\n"; >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> } >>>>>>>>>>>> @blast_report = get_file_data ($filename); return >>>>>>>>>>>> @blast_report; >>>>>>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>>>>>> #### >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>>> Bioperl-l@... >>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>>> = >>>>>>>>>>> = >>>>>>>>>>> >>>>>>> >>>> >> ===================================================================== >>>>>>>>>>> Attention: The information contained in this message and/or >>>>>>>>>>> attachments from AgResearch Limited is intended only for the >>>>>>> persons or entities >>>>>>>>>>> to which it is addressed and may contain >> confidential and/or >>>>>>>>>>> privileged material. Any review, retransmission, >> dissemination >>>> or other use >>>>>>>>>>> of, or >>>>>>>>>>> taking of any action in reliance upon, this information >>>>>>> by persons or >>>>>>>>>>> entities other than the intended recipients is >> prohibited by >>>>>>>>>>> AgResearch Limited. If you have received this message in >>>>>>>>>>> error, >>>>>>> please notify >>>>>>>>>>> the >>>>>>>>>>> sender immediately. >>>>>>>>>>> = >>>>>>>>>>> = >>>>>>>>>>> >>>>>>> >>>> >> ===================================================================== >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>> Bioperl-l@... >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Jason Stajich >>>>>>>>>> jason@... >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l@... >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> -------------------------------------------------------------- >>>>>>> ---------------- >>>>>>>> -- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> No virus found in this incoming message. >>>>>>>> Checked by AVG - www.avg.com >>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release >>>>>>> Date: 07/05/09 >>>>>>>> 05:53:00 >>>>>>> >>>>>>> >>>>>>> -------------------------------------------------------------- >>>>>>> ------------------ >>>>>>> >>>>>>> >>>>>>> >>>>>>> No virus found in this incoming message. >>>>>>> Checked by AVG - www.avg.com >>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release >>>>>>> Date: 07/05/09 >>>>>>> 17:54:00 >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l@... >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>> >>>> >>>> -------------------------------------------------------------- >>>> ------------------ >>>> >>>> >>>> >>>> No virus found in this incoming message. >>>> Checked by AVG - www.avg.com >>>> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release >>>> Date: 07/09/09 >>>> 05:55:00 >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@... >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: cdd-search with remoteblast?Chris,
I figured it was something like that. Isolating the test in a file worked for me to decouple whatever cross-contamination was going on. So, I'll just leave it at that for now unless any objections. I suppose the correct follow on would be to file a bug.... I'm poised to go on vacation.... perhaps when I get back.... I don't use RemoteBlast myself at all in fact.... just wrote the rps thing to make sure I knew how the ncbi service worked... Cheers, Malcolm ________________________________________ From: Chris Fields [cjfields1@...] Sent: Wednesday, July 22, 2009 6:32 PM To: Cook, Malcolm Cc: 'BioPerl List'; 'Jonas Schaer' Subject: Re: [Bioperl-l] cdd-search with remoteblast? Malcolm, it's probably not you. Looks like the get/put parameters are set as globals, so there may be cross-contamination of instances (worth checking JIC). You can probably work around that to an extent by encompassing any calls in blocks to localize changes. chris On Jul 21, 2009, at 11:59 AM, Cook, Malcolm wrote: > Chris, > > I wound up adding a new test > > # $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $ > > with the comment : > > # malcolm_cook@...: this test is in a separate file from > # RemoteBlast.t (on which it is modelled) since there is some sort of > # side-effecting between the multiple remote blasts that is causing > # this test to fail, if it comes last, or the other test to fail, if > # this one comes first. THIS IS A BUG EITHER IN REMOTE BLAST OR MY > # UNDERSTANDING, i.e. of how to initialize it. > > In any case, the test passes and demos rpsblast usage. > > Cheers, > > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: Chris Fields [mailto:cjfields1@...] >> Sent: Friday, July 10, 2009 1:05 PM >> To: Cook, Malcolm >> Cc: 'Jonas Schaer'; 'BioPerl List' >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >> >> Malcolm, >> >> Nice! Go ahead and add the test in; we can look at trying to >> get CDD_SEARCH working at some point but this is a nice workaround. >> >> chris >> >> On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote: >> >>> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates >>> the following. Is it appropriate to submit it? >>> >>> Jonas, OK, I was a little quick on the gun... but I've got it now. >>> >>> You don't need to change the wrapper. Here is what you need to do: >>> >>> # 1) set your database like this: >>> >>> -database => 'cdsearch/cdd', # c.f. >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html >>> for other cdd database options >>> >>> # 2) add this line before submitting the job: >>> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast'; >>> >>> You're in - No other changes needed. >>> >>> Malcolm Cook >>> Stowers Institute for Medical Research - Kansas City, Missouri >>> >>> >>>> -----Original Message----- >>>> From: Jonas Schaer [mailto:Brotelzwieb@...] >>>> Sent: Friday, July 10, 2009 4:18 AM >>>> To: BioPerl List; Cook, Malcolm; Chris Fields >>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>> >>>> Hi, >>>> I tried to do what Malcom proposed my ($prog = 'rpsblast'; >>>> my $db = >>>> 'CDD';) but that didn't work. >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: Value rpsblast for PUT parameter PROGRAM does not match >>>> expression t?blast[ pnx]. Rejecting. >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>> C:/Perl/site/lib/Bio/Tools >>>> /Run/RemoteBlast.pm:329 >>>> STACK: Bio::Tools::Run::RemoteBlast::new >>>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl >>>> ast.pm:257 >>>> STACK: blast_a_seq2.pm:14 >>>> ----------------------------------------------------------- >>>> So I should try to "change the wrapper to allow >> 'rpsblast'", right? >>>> Could You tell me how to do that, please? So sorry but I >> have no idea >>>> yet...:) If that doesn't work, is there any other way to run >>>> cdd-searches with perl? >>>> Thank you so much! >>>> Regards, Jonas >>>> >>>> ----- Original Message ----- >>>> From: "Chris Fields" <cjfields1@...> >>>> To: "Cook, Malcolm" <MEC@...> >>>> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'" >>>> <bioperl-l@...>; "'Smithies, Russell'" >>>> <Russell.Smithies@...>; <cjfields@...> >>>> Sent: Thursday, July 09, 2009 9:19 PM >>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>> >>>> >>>>> I've scheduled this tentatively for the 1.6 release >> series (just not >>>>> sure when yet). It may work as is, but I haven't tried >> it out yet >>>>> (and am hazarding to guess it only retrieves the single >> main RID at >>>>> the moment). >>>>> >>>>> chris >>>>> >>>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: >>>>> >>>>>> Jonas, >>>>>> >>>>>> If you want to continue to use the bioperl remoteblast >> interface, >>>>>> probably what you should do is simply call it twice. >>>>>> >>>>>> Once, as you already know how to do, which will return >> without CDD >>>>>> results. >>>>>> >>>>>> Secondly, to get the CDD results, call remoteblast a second time. >>>>>> This time, using >>>>>> -database => 'CDD' >>>>>> -program => 'rpsblast' >>>>>> >>>>>> However, the wrapper may object to the 'rpsblast' >> program. It is >>>>>> not listed in the POD - >>>>>> >>>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R >>>> emoteBlast.pm) >>>>>> If so, my guess is that changing the perl wrapper to allow >>>>>> rpsblast will "just work" (tm). I've cc:ed >>>> cjfields@... for >>>>>> his opinion on this. >>>>>> >>>>>> Also, you might want to perform the CDD search first, >> especially if >>>>>> you are streaming results to eyeball that might like >> something to >>>>>> look at while the second (presumably longer) search is running. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Malcolm Cook >>>>>> Stowers Institute for Medical Research - Kansas City, Missouri >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces@... >>>>>>> [mailto:bioperl-l-bounces@...] On Behalf >> Of Jonas >>>>>>> Schaer >>>>>>> Sent: Thursday, July 09, 2009 5:16 AM >>>>>>> To: BioPerl List; Smithies, Russell >>>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>>>>> >>>>>>> Hi guys, >>>>>>> Thank you all so much for your help and patience :). Of >> course you >>>>>>> were right and I finaly found the right put-parameter to get >>>>>>> exactly the same hits as on the homepage. >>>>>>> I do have an other question though :)... >>>>>>> I now want to include a search for conserved domains, >> but when I >>>>>>> try to use the CDD_SEARCH-parameter >>>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >>>>>>> sub:CDD_SEARCH) >>>>>>> like the other put-parameters the way chris once told me(works >>>>>>> fine with the other params): >>>>>>> >>>>>>> my %put = ( >>>>>>> WORD_SIZE => 3, >>>>>>> HITLIST_SIZE => 100, >>>>>>> THRESHOLD => 11, >>>>>>> FILTER => 'R', >>>>>>> GENETIC_CODE => 1, >>>>>>> CDD_SEARCH => 'on' >>>>>>> ###I tried it >>>>>>> with 'true' and '1', too. >>>>>>> >>>>>>> ); >>>>>>> >>>>>>> for my $putName (keys %put) { >>>>>>> $factory->submit_parameter($putName,$put{$putName}); >>>>>>> } >>>>>>> >>>>>>> >>>>>>> ...an exception is thrown: >>>>>>> >>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>> MSG: CDD_SEARCH is not a valid PUT parameter. >>>>>>> STACK: Error::throw >>>>>>> STACK: Bio::Root::Root::throw >>>> C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>>>>> C:/Perl/site/lib/Bio/Tools >>>>>>> /Run/RemoteBlast.pm:325 >>>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383 >>>>>>> STACK: main::blast_it firsteval0.8.pm:288 >>>>>>> STACK: firsteval0.8.pm:35 >>>>>>> ----------------------------------------------------------- . >>>>>>> I guess somehow this could be the solution to my problem: >>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >>>>>>> ub:RID-for-Simultaneous >>>>>>> , but unfortunately I don't understand what to do. >>>>>>> I'm so sorry to bother you with this but please help me once >>>>>>> more...:) >>>>>>> >>>>>>> Best regards and thanks in advance, Jonas >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>> From: "Smithies, Russell" <Russell.Smithies@...> >>>>>>> To: "'Jonas Schaer'" <Brotelzwieb@...> >>>>>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" >>>>>>> <bioperl-l@...> >>>>>>> Sent: Monday, July 06, 2009 10:56 PM >>>>>>> Subject: RE: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>> >>>>>>> >>>>>>> Hi Jonas, >>>>>>> You can't just play with the BLAST parameters and hope >>>> for a "better" >>>>>>> result. >>>>>>> I'd suggest that if you aren't sure what they do, you >> should leave >>>>>>> them alone as small changes can make huge differences in the >>>>>>> output - it's quite possible to miss finding what >> you're looking >>>>>>> for by using >>>> the wrong >>>>>>> parameters. >>>>>>> If all else fails, read the blast manual: >>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >>>>>>> _all.html >>>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >>>>>>> Or Read Ian Korfs' excellent book: >>>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp >>>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >>>>>>> >>>>>>> Don't worry about the integer overflow bug as there's >> nothing you >>>>>>> can do about it. If you're interested, Google and Wikipedia are >>>>>>> your >>>>>>> friends: >>>>>>> http://en.wikipedia.org/wiki/Integer_overflow >>>>>>> >>>>>>> >>>>>>> Russell >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l- >>>>>>>> bounces@...] On Behalf Of Jonas Schaer >>>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>>>>>>> To: BioPerl List; Chris Fields >>>>>>>> Subject: Re: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>> >>>>>>>> Hi guys, thanks for your answers so far. >>>>>>>> @jason: integer overflow in blast.... sorry, but what do >>>>>>> you mean by that? >>>>>>>> how can I fix it...? >>>>>>>> >>>>>>>> Since I never really changed any parameters I thought them >>>>>>> all to be >>>>>>>> default. >>>>>>>> whatever, I tried to get "better" results with my prog >>>> by changing >>>>>>>> these: >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>>>> >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>>>>>> STICS'} = >>>>>>>> '1'; >>>>>>>> with no effect...I guess these were default values anyway. >>>>>>>> >>>>>>>> So please maybe you can tell me all the other parameters I >>>>>>> can change with >>>>>>>> my >>>>>>>> perl-skript AND how to do that? >>>>>>>> Unfortunately both, perl and the blast-algorithm are pretty >>>>>>> much new to >>>>>>>> me, >>>>>>>> maybe thats why I just cannot find out how to do that on my >>>>>>> own... :/ >>>>>>>> >>>>>>>> Here is the output I get with my remote-blast skript: >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ################################### >>>>>>>> Query Name: >>>>>>>> >>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>>>>>> L >>>>>>>> hit name is ref|XP_001702807.1| >>>>>>>> score is 442 >>>>>>>> BLASTP 2.2.21+ >>>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>>>>>> A. Schaffer, >>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>>> Lipman (1997), >>>>>>>> "Gapped >>>>>>>> BLAST and PSI-BLAST: a new generation of protein >> database search >>>>>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>>>>> >>>>>>>> >>>>>>>> Reference for composition-based statistics: Alejandro A. >>>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>>>>>> John L. Spouge, >>>>>>>> Yuri >>>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >>>>>>> "Improving the >>>>>>>> accuracy of PSI-BLAST protein database searches with >>>>>>> composition-based >>>>>>>> statistics and other refinements", Nucleic Acids Res. >>>> 29:2994-3005. >>>>>>>> >>>>>>>> >>>>>>>> RID: 53STX5G2013 >>>>>>>> >>>>>>>> >>>>>>>> Database: All non-redundant GenBank CDS >>>>>>>> translations+PDB+SwissProt+PIR+PRF excluding >>>> environmental samples >>>>>>>> from WGS projects >>>>>>>> 9,252,587 sequences; 3,169,972,781 total >> letters Query= >>>>>>>> >>>>>>> >>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>>>>>> >>>>>>> >>>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA >>>> M >>>>>>>> ATGPDPDDEYE >>>>>>>> Length=150 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> Score >>>>>>>> E >>>>>>>> Sequences producing significant alignments: >>>>>>> (Bits) >>>>>>>> Value >>>>>>>> >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>>>> reinhard... 174 >>>>>>>> 2e-42 >>>>>>>> >>>>>>>> >>>>>>>> ALIGNMENTS >>>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>> reinhardtii] >>>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>>>>>> Length=303 >>>>>>>> >>>>>>>> Score = 174 bits (442), Expect = 2e-42, Method: >>>>>>> Composition-based >>>>>>>> stats. >>>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>>>>>> Gaps = 0/150 >>>>>>>> (0%) >>>>>>>> >>>>>>>> Query 1 >>>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>>>>>> 60 >>>>>>>> >>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>>>> Sbjct 154 >>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>>>> 213 >>>>>>>> >>>>>>>> Query 61 >>>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> 120 >>>>>>>> >>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> Sbjct 214 >>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> 273 >>>>>>>> >>>>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct 274 >>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Database: All non-redundant GenBank CDS >>>>>>>> translations+PDB+SwissProt+PIR+PRF >>>>>>>> excluding environmental samples from WGS projects >>>>>>>> Posted date: Jul 5, 2009 4:41 AM Number of letters in >>>>>>>> database: -1,124,994,511 Number of sequences in database: >>>>>>>> 9,252,587 >>>>>>>> >>>>>>>> Lambda K H >>>>>>>> 0.309 0.122 0.345 >>>>>>>> Gapped >>>>>>>> Lambda K H >>>>>>>> 0.267 0.0410 0.140 >>>>>>>> Matrix: BLOSUM62 >>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of >> Sequences: >>>>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions: >>>>>>>> 1448367 Number of successful extensions: 2103 Number >> of sequences >>>>>>>> better than 10: 0 Number of HSP's better than 10 >> without gapping: >>>>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully >>>>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781 >>>>>>>> Length adjustment: 113 Effective length of query: 37 Effective >>>>>>>> length of database: 2124430450 Effective search space: >>>>>>>> 78603926650 Effective search space used: 78603926650 >>>>>>>> T: 11 >>>>>>>> A: 40 >>>>>>>> X1: 16 (7.1 bits) >>>>>>>> X2: 38 (14.6 bits) >>>>>>>> X3: 64 (24.7 bits) >>>>>>>> S1: 42 (20.8 bits) >>>>>>>> S2: 74 (33.1 bits) >>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ################################### >>>>>>>> and here are the hits (?) of the blast-algorithm on the >>>>>>> ncbi-homepage with >>>>>>>> the same query of course: >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>>>> reinhard... 300 >>>>>>>> 3e-80 >>>>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >>>>>>> [Acyrtho... 36.2 >>>>>>>> 1.1 >>>>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >>>>>>> [Blautia... 35.4 >>>>>>>> 1.8 >>>>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >>>>>>> brazil... 34.3 >>>>>>>> 4.2 >>>>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 >>>>>>> [Aspergillus n... 33.5 >>>>>>>> 6.0 >>>>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 >>>>>>> [Methyloba... 33.5 >>>>>>>> 7.0 >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ###################################at >>>>>>>> least the first hit is the same, but even there there is a >>>>>>> different score >>>>>>>> and e-value. >>>>>>>> >>>>>>>> thanks so much for any help :) >>>>>>>> regards, jonas >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>> From: "Chris Fields" <cjfields@...> >>>>>>>> To: "Jason Stajich" <jason@...> >>>>>>>> Cc: "Smithies, Russell" >>>>>>> <Russell.Smithies@...>; "'BioPerl >>>>>>>> List'" <bioperl-l@...>; "'Jonas Schaer'" >>>>>>>> <Jonas_Schaer@...> >>>>>>>> Sent: Monday, July 06, 2009 12:51 AM >>>>>>>> Subject: Re: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>> >>>>>>>> >>>>>>>>> That inspires confidence ;> >>>>>>>>> >>>>>>>>> chris >>>>>>>>> >>>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>>>>>>> >>>>>>>>>> integer overflow in blast.... >>>>>>>>>> >>>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>>>>>>> >>>>>>>>>>> I'd guess it's a difference in the parameters used. >>>>>>>>>>> Interesting that both have the number of letters in >> the db as >>>>>>>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>>>>>>> >>>>>>>>>>> Stats from your remote_blast: >>>>>>>>>>> >>>>>>>>>>> 'stats' => { >>>>>>>>>>> 'S1' => '42', >>>>>>>>>>> 'S1_bits' => '20.8', >>>>>>>>>>> 'lambda' => '0.309', >>>>>>>>>>> 'entropy' => '0.345', >>>>>>>>>>> 'kappa_gapped' => '0.0410', >>>>>>>>>>> 'T' => '11', >>>>>>>>>>> 'kappa' => '0.122', >>>>>>>>>>> 'X3_bits' => '24.7', >>>>>>>>>>> 'X1' => '16', >>>>>>>>>>> 'lambda_gapped' => '0.267', >>>>>>>>>>> 'X2' => '38', >>>>>>>>>>> 'S2' => '74', >>>>>>>>>>> 'seqs_better_than_cutoff' => '0', >>>>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>>>>>>> 'Hits_to_DB' => '60102303', >>>>>>>>>>> 'dbletters' => '-1125070205', >>>>>>>>>>> 'A' => '40', >>>>>>>>>>> 'num_successful_extensions' => '2004', >>>>>>>>>>> 'num_extensions' => '1436892', >>>>>>>>>>> 'X1_bits' => '7.1', >>>>>>>>>>> 'X3' => '64', >>>>>>>>>>> 'entropy_gapped' => '0.140', >>>>>>>>>>> 'dbentries' => '9252258', >>>>>>>>>>> 'X2_bits' => '14.6', >>>>>>>>>>> 'S2_bits' => '33.1' >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Stats from a blast done on the NCBI webpage: >>>>>>>>>>> >>>>>>>>>>> Database: All non-redundant GenBank CDS >>>>>>> translations+PDB+SwissProt >>>>>>>>>>> +PIR+PRF >>>>>>>>>>> excluding environmental samples from WGS projects >> Posted date: >>>>>>>>>>> Jul 4, 2009 4:41 AM Number of letters in database: >>>>>>>>>>> -1,125,070,205 Number of sequences in database: 9,252,258 >>>>>>>>>>> >>>>>>>>>>> Lambda K H >>>>>>>>>>> 0.309 0.124 0.340 >>>>>>>>>>> Gapped >>>>>>>>>>> Lambda K H >>>>>>>>>>> 0.267 0.0410 0.140 >>>>>>>>>>> Matrix: BLOSUM62 >>>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of >>>>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of >>>>>>>>>>> extensions: 3101413 Number of successful extensions: 9001 >>>>>>>>>>> Number of sequences better than 100: 65 Number of >> HSP's better >>>>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000 >>>>>>>>>>> Number of HSP's successfully gapped: 66 Length of >> query: 150 >>>>>>>>>>> Length of database: 3169897087 Length adjustment: 113 >>>>>>>>>>> Effective length of query: 37 Effective length of database: >>>>>>>>>>> 2124391933 Effective search space: 78602501521 Effective >>>>>>>>>>> search space used: 78602501521 >>>>>>>>>>> T: 11 >>>>>>>>>>> A: 40 >>>>>>>>>>> X1: 16 (7.1 bits) >>>>>>>>>>> X2: 38 (14.6 bits) >>>>>>>>>>> X3: 64 (24.7 bits) >>>>>>>>>>> S1: 42 (20.8 bits) >>>>>>>>>>> S2: 65 (29.6 bits) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: bioperl-l-bounces@... >> [mailto:bioperl-l- >>>>>>>>>>>> bounces@...] On Behalf Of Jonas Schaer >>>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>>>>>>> To: BioPerl List >>>>>>>>>>>> Subject: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>>>>>> >>>>>>>>>>>> Hi again :) >>>>>>>>>>>> please, I only have this little question: >>>>>>>>>>>> why do I get different results with my remote::blast >>>>>>> perl skript >>>>>>>>>>>> then on the >>>>>>>>>>>> ncbi blast homepage? >>>>>>>>>>>> I am using blastp, the query is an amino-sequence >> (different >>>>>>>>>>>> results with any sequence, differences not only in >> number of >>>>>>>>>>>> hits but >>>> even in e- >>>>>>>>>>>> values, scores >>>>>>>>>>>> etc...), the database is 'nr'. >>>>>>>>>>>> PLEASE help me, >>>>>>>>>>>> thank you in advance, >>>>>>>>>>>> Jonas >>>>>>>>>>>> >>>>>>>>>>>> ps: my skript: >>>>>>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>>>>>> ## >>>>>>>>>>>> use Bio::Seq::SeqFactory; >>>>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my >>>>>>>>>>>> @blast_report; my $prog = 'blastp'; >>>>>>>>>>>> my $db = 'nr'; >>>>>>>>>>>> my $e_val= '1e-10'; >>>>>>>>>>>> #my $e_val= '10'; >>>>>>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>>>>>> '-data' => $db, >>>>>>>>>>>> '-expect' => $e_val, >>>>>>>>>>>> '-readmethod' => 'SearchIO' ); my $factory = >>>>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} >> = '11 1'; >>>>>>>>>>>> >> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = >> '10'; $ Bio >>>>>>>>>>>> >>>>>>> >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>>>>>>>>>> = '1'; >>>>>>>>>>>> >>>>>>>>>>>> my >>>>>>>>>>>> $ >>>>>>>>>>>> blast_seq >>>>>>>>>>>> >>>>>>> >>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>>>>>>>>>> >>>>>>>> >>>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN >>>>>>> AFRQAHQNTAMATGPD >>>>>>>>>>>> PDDEYE'; >>>>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my >>>>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq >>>>>>>>>>>> =>$blast_seq, >>>> -display_id => >>>>>>>>>>>> "$blast_seq"); >>>>>>>>>>>> my $filename='temp2.out'; >>>>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR >>>>>>>>>>>> "waiting..." if( $v > 0 ); while ( my @rids = >>>>>>>>>>>> $factory->each_rid ) { >>>>>>>>>>>> foreach my $rid ( @rids ) >>>>>>>>>>>> { >>>>>>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>>>>>> if( !ref($rc) ) >>>>>>>>>>>> { >>>>>>>>>>>> if( $rc < 0 ) >>>>>>>>>>>> { >>>>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>>>> } >>>>>>>>>>>> print STDERR "." if ( $v > 0 ); >>>>>>>>>>>> } >>>>>>>>>>>> else >>>>>>>>>>>> { >>>>>>>>>>>> my $result = $rc->next_result(); >>>>>>>>>>>> $factory->save_output($filename); >>>>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>>>> print "\nQuery Name: ", >>>>>>> $result->query_name(), >>>>>>>>>>>> "\n"; >>>>>>>>>>>> while ( my $hit = $result->next_hit ) >>>>>>>>>>>> { >>>>>>>>>>>> next unless ( $v > 0); >>>>>>>>>>>> print "\thit name is ", >>>> $hit->name, "\n"; >>>>>>>>>>>> while( my $hsp = $hit->next_hsp ) >>>>>>>>>>>> { >>>>>>>>>>>> print "\t\tscore is ", >>>>>>> $hsp->score, "\n"; >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> } >>>>>>>>>>>> @blast_report = get_file_data ($filename); return >>>>>>>>>>>> @blast_report; >>>>>>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>>>>>> #### >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>>> Bioperl-l@... >>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>>> = >>>>>>>>>>> = >>>>>>>>>>> >>>>>>> >>>> >> ===================================================================== >>>>>>>>>>> Attention: The information contained in this message and/or >>>>>>>>>>> attachments from AgResearch Limited is intended only for the >>>>>>> persons or entities >>>>>>>>>>> to which it is addressed and may contain >> confidential and/or >>>>>>>>>>> privileged material. Any review, retransmission, >> dissemination >>>> or other use >>>>>>>>>>> of, or >>>>>>>>>>> taking of any action in reliance upon, this information >>>>>>> by persons or >>>>>>>>>>> entities other than the intended recipients is >> prohibited by >>>>>>>>>>> AgResearch Limited. If you have received this message in >>>>>>>>>>> error, >>>>>>> please notify >>>>>>>>>>> the >>>>>>>>>>> sender immediately. >>>>>>>>>>> = >>>>>>>>>>> = >>>>>>>>>>> >>>>>>> >>>> >> ===================================================================== >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>> Bioperl-l@... >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Jason Stajich >>>>>>>>>> jason@... >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l@... >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> -------------------------------------------------------------- >>>>>>> ---------------- >>>>>>>> -- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> No virus found in this incoming message. >>>>>>>> Checked by AVG - www.avg.com >>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release >>>>>>> Date: 07/05/09 >>>>>>>> 05:53:00 >>>>>>> >>>>>>> >>>>>>> -------------------------------------------------------------- >>>>>>> ------------------ >>>>>>> >>>>>>> >>>>>>> >>>>>>> No virus found in this incoming message. >>>>>>> Checked by AVG - www.avg.com >>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release >>>>>>> Date: 07/05/09 >>>>>>> 17:54:00 >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l@... >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>> >>>> >>>> -------------------------------------------------------------- >>>> ------------------ >>>> >>>> >>>> >>>> No virus found in this incoming message. >>>> Checked by AVG - www.avg.com >>>> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release >>>> Date: 07/09/09 >>>> 05:55:00 >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@... >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
| Free embeddable forum powered by Nabble | Forum Help |