I'd suggest that if you aren't sure what they do, you should leave them alone as small changes can make huge differences in the output - it's quite possible to miss finding what you're looking for by using the wrong parameters.
Don't worry about the integer overflow bug as there's nothing you can do about it. If you're interested, Google and Wikipedia are your friends:
> -----Original Message-----
> From:
bioperl-l-bounces@... [mailto:bioperl-l-
>
bounces@...] On Behalf Of Jonas Schaer
> Sent: Tuesday, 7 July 2009 12:14 a.m.
> To: BioPerl List; Chris Fields
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
> Hi guys, thanks for your answers so far.
> @jason: integer overflow in blast.... sorry, but what do you mean by that?
> how can I fix it...?
>
> Since I never really changed any parameters I thought them all to be default.
> whatever, I tried to get "better" results with my prog by changing
> these:
> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
> with no effect...I guess these were default values anyway.
>
> So please maybe you can tell me all the other parameters I can change with my
> perl-skript AND how to do that?
> Unfortunately both, perl and the blast-algorithm are pretty much new to me,
> maybe thats why I just cannot find out how to do that on my own... :/
>
> Here is the output I get with my remote-blast skript:
> ##############################################################################
> ###################################
> Query Name:
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
> L
> hit name is ref|XP_001702807.1|
> score is 442
> BLASTP 2.2.21+
> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped
> BLAST and PSI-BLAST: a new generation of protein database search programs",
> Nucleic Acids Res. 25:3389-3402.
>
>
> Reference for composition-based statistics: Alejandro A.
> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the
> accuracy of PSI-BLAST protein database searches with composition-based
> statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
>
>
> RID: 53STX5G2013
>
>
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> from WGS projects
> 9,252,587 sequences; 3,169,972,781 total letters Query=
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
> ATGPDPDDEYE
> Length=150
>
>
> Score
> E
> Sequences producing significant alignments: (Bits)
> Value
>
> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 174
> 2e-42
>
>
> ALIGNMENTS
> >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
> Length=303
>
> Score = 174 bits (442), Expect = 2e-42, Method: Composition-based stats.
> Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%)
>
> Query 1 MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds 60
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> Sbjct 154 MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> 213
>
> Query 61 dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
> 120
> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> Sbjct 214 DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> 273
>
> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150
> AWHERDDNAFRQAHQNTAMATGPDPDDEYE
> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303
>
>
>
> Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
> excluding environmental samples from WGS projects
> Posted date: Jul 5, 2009 4:41 AM
> Number of letters in database: -1,124,994,511
> Number of sequences in database: 9,252,587
>
> Lambda K H
> 0.309 0.122 0.345
> Gapped
> Lambda K H
> 0.267 0.0410 0.140
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Sequences: 9252587
> Number of Hits to DB: 60273703
> Number of extensions: 1448367
> Number of successful extensions: 2103
> Number of sequences better than 10: 0
> Number of HSP's better than 10 without gapping: 0
> Number of HSP's gapped: 2113
> Number of HSP's successfully gapped: 0
> Length of query: 150
> Length of database: 3169972781
> Length adjustment: 113
> Effective length of query: 37
> Effective length of database: 2124430450
> Effective search space: 78603926650
> Effective search space used: 78603926650
> T: 11
> A: 40
> X1: 16 (7.1 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 42 (20.8 bits)
> S2: 74 (33.1 bits)
>
> ##############################################################################
> ###################################
> and here are the hits (?) of the blast-algorithm on the ncbi-homepage with
> the same query of course:
> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 300
> 3e-80
> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... 36.2
> 1.1
> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... 35.4
> 1.8
> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... 34.3
> 4.2
> ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... 33.5
> 6.0
> ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... 33.5
> 7.0
> ##############################################################################
> ###################################at
> least the first hit is the same, but even there there is a different score
> and e-value.
>
> thanks so much for any help :)
> regards, jonas
>
>
> ----- Original Message -----
> From: "Chris Fields" <
cjfields@...>
> To: "Jason Stajich" <
jason@...>
> Cc: "Smithies, Russell" <
Russell.Smithies@...>; "'BioPerl
> List'" <
bioperl-l@...>; "'Jonas Schaer'"
> <
Jonas_Schaer@...>
> Sent: Monday, July 06, 2009 12:51 AM
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
>
> > That inspires confidence ;>
> >
> > chris
> >
> > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> >
> >> integer overflow in blast....
> >>
> >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> >>
> >>> I'd guess it's a difference in the parameters used.
> >>> Interesting that both have the number of letters in the db as
> >>> "-1,125,070,205", I assume that's a bug :-)
> >>>
> >>> Stats from your remote_blast:
> >>>
> >>> 'stats' => {
> >>> 'S1' => '42',
> >>> 'S1_bits' => '20.8',
> >>> 'lambda' => '0.309',
> >>> 'entropy' => '0.345',
> >>> 'kappa_gapped' => '0.0410',
> >>> 'T' => '11',
> >>> 'kappa' => '0.122',
> >>> 'X3_bits' => '24.7',
> >>> 'X1' => '16',
> >>> 'lambda_gapped' => '0.267',
> >>> 'X2' => '38',
> >>> 'S2' => '74',
> >>> 'seqs_better_than_cutoff' => '0',
> >>> 'posted_date' => 'Jul 4, 2009 4:41 AM',
> >>> 'Hits_to_DB' => '60102303',
> >>> 'dbletters' => '-1125070205',
> >>> 'A' => '40',
> >>> 'num_successful_extensions' => '2004',
> >>> 'num_extensions' => '1436892',
> >>> 'X1_bits' => '7.1',
> >>> 'X3' => '64',
> >>> 'entropy_gapped' => '0.140',
> >>> 'dbentries' => '9252258',
> >>> 'X2_bits' => '14.6',
> >>> 'S2_bits' => '33.1'
> >>> }
> >>>
> >>>
> >>> Stats from a blast done on the NCBI webpage:
> >>>
> >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt
> >>> +PIR+PRF
> >>> excluding environmental samples from WGS projects
> >>> Posted date: Jul 4, 2009 4:41 AM
> >>> Number of letters in database: -1,125,070,205
> >>> Number of sequences in database: 9,252,258
> >>>
> >>> Lambda K H
> >>> 0.309 0.124 0.340
> >>> Gapped
> >>> Lambda K H
> >>> 0.267 0.0410 0.140
> >>> Matrix: BLOSUM62
> >>> Gap Penalties: Existence: 11, Extension: 1
> >>> Number of Sequences: 9252258
> >>> Number of Hits to DB: 86493230
> >>> Number of extensions: 3101413
> >>> Number of successful extensions: 9001
> >>> Number of sequences better than 100: 65
> >>> Number of HSP's better than 100 without gapping: 0
> >>> Number of HSP's gapped: 9000
> >>> Number of HSP's successfully gapped: 66
> >>> Length of query: 150
> >>> Length of database: 3169897087
> >>> Length adjustment: 113
> >>> Effective length of query: 37
> >>> Effective length of database: 2124391933
> >>> Effective search space: 78602501521
> >>> Effective search space used: 78602501521
> >>> T: 11
> >>> A: 40
> >>> X1: 16 (7.1 bits)
> >>> X2: 38 (14.6 bits)
> >>> X3: 64 (24.7 bits)
> >>> S1: 42 (20.8 bits)
> >>> S2: 65 (29.6 bits)
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From:
bioperl-l-bounces@... [mailto:bioperl-l-
> >>>>
bounces@...] On Behalf Of Jonas Schaer
> >>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> >>>> To: BioPerl List
> >>>> Subject: [Bioperl-l] different results with remote-blast skript
> >>>>
> >>>> Hi again :)
> >>>> please, I only have this little question:
> >>>> why do I get different results with my remote::blast perl skript
> >>>> then on the
> >>>> ncbi blast homepage?
> >>>> I am using blastp, the query is an amino-sequence (different
> >>>> results with any
> >>>> sequence, differences not only in number of hits but even in e-
> >>>> values, scores
> >>>> etc...), the database is 'nr'.
> >>>> PLEASE help me,
> >>>> thank you in advance,
> >>>> Jonas
> >>>>
> >>>> ps: my skript:
> >>>>
> ##############################################################################
> >>>> ##
> >>>> use Bio::Seq::SeqFactory;
> >>>> use Bio::Tools::Run::RemoteBlast;
> >>>> use strict;
> >>>> my @blast_report;
> >>>> my $prog = 'blastp';
> >>>> my $db = 'nr';
> >>>> my $e_val= '1e-10';
> >>>> #my $e_val= '10';
> >>>> my @params = ( '-prog' => $prog,
> >>>> '-data' => $db,
> >>>> '-expect' => $e_val,
> >>>> '-readmethod' => 'SearchIO' );
> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>> $
> >>>> Bio
> >>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
> >>>> = '1';
> >>>>
> >>>> my
> >>>> $
> >>>> blast_seq
> >>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
> >>>>
> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
> >>>> PDDEYE';
> >>>> #$v is just to turn on and off the messages
> >>>> my $v = 1;
> >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
> >>>> 'Bio::PrimarySeq');
> >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
> >>>> "$blast_seq");
> >>>> my $filename='temp2.out';
> >>>> my $r = $factory->submit_blast($seq);
> >>>> print STDERR "waiting..." if( $v > 0 );
> >>>> while ( my @rids = $factory->each_rid )
> >>>> {
> >>>> foreach my $rid ( @rids )
> >>>> {
> >>>> my $rc = $factory->retrieve_blast($rid);
> >>>> if( !ref($rc) )
> >>>> {
> >>>> if( $rc < 0 )
> >>>> {
> >>>> $factory->remove_rid($rid);
> >>>> }
> >>>> print STDERR "." if ( $v > 0 );
> >>>> }
> >>>> else
> >>>> {
> >>>> my $result = $rc->next_result();
> >>>> $factory->save_output($filename);
> >>>> $factory->remove_rid($rid);
> >>>> print "\nQuery Name: ", $result->query_name(),
> >>>> "\n";
> >>>> while ( my $hit = $result->next_hit )
> >>>> {
> >>>> next unless ( $v > 0);
> >>>> print "\thit name is ", $hit->name, "\n";
> >>>> while( my $hsp = $hit->next_hsp )
> >>>> {
> >>>> print "\t\tscore is ", $hsp->score, "\n";
> >>>> }
> >>>> }
> >>>> }
> >>>> }
> >>>>
> >>>>
> >>>> }
> >>>> @blast_report = get_file_data ($filename);
> >>>> return @blast_report;
> >>>>
> ##############################################################################
> >>>> ####
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>>
Bioperl-l@...
> >>>>
http://lists.open-bio.org/mailman/listinfo/bioperl-l> >>> =
> >>> =
> >>> =====================================================================
> >>> Attention: The information contained in this message and/or
> >>> attachments
> >>> from AgResearch Limited is intended only for the persons or entities
> >>> to which it is addressed and may contain confidential and/or
> >>> privileged
> >>> material. Any review, retransmission, dissemination or other use
> >>> of, or
> >>> taking of any action in reliance upon, this information by persons or
> >>> entities other than the intended recipients is prohibited by
> >>> AgResearch
> >>> Limited. If you have received this message in error, please notify
> >>> the
> >>> sender immediately.
> >>> =
> >>> =
> >>> =====================================================================
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>>
Bioperl-l@...
> >>>
http://lists.open-bio.org/mailman/listinfo/bioperl-l> >>
> >> --
> >> Jason Stajich
> >>
jason@...
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >>
Bioperl-l@...
> >>
http://lists.open-bio.org/mailman/listinfo/bioperl-l>
>
> ------------------------------------------------------------------------------
> --
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: 07/05/09
> 05:53:00