different results with remote-blast skript

View: New views
17 Messages — Rating Filter:   Alert me  

different results with remote-blast skript

by Jonas Schaer :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi again :)
please, I only have this little question:
why do I get different results with my remote::blast perl skript then on the ncbi blast homepage?
I am using blastp, the query is an amino-sequence (different results with any sequence, differences not only in number of hits but even in e-values, scores etc...), the database is 'nr'.
PLEASE help me,
thank you in advance,
Jonas

ps: my skript:
################################################################################
use Bio::Seq::SeqFactory;
  use Bio::Tools::Run::RemoteBlast;
  use strict;
  my @blast_report;
  my $prog = 'blastp';
  my $db   = 'nr';
  my $e_val= '1e-10';
  #my $e_val= '10';
  my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO' );
  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
   $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
   $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
 $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
 
  my $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
  #$v is just to turn on and off the messages
  my $v = 1;
  my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');  
  my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
  my $filename='temp2.out';
  my $r = $factory->submit_blast($seq);
  print STDERR "waiting..." if( $v > 0 );
    while ( my @rids = $factory->each_rid )
    {
        foreach my $rid ( @rids )
        {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) )
            {
                if( $rc < 0 )
                {
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ( $v > 0 );
            }
                else    
                {
                    my $result = $rc->next_result();
                    $factory->save_output($filename);
                    $factory->remove_rid($rid);
                    print "\nQuery Name: ", $result->query_name(), "\n";
                    while ( my $hit = $result->next_hit )
                    {
                        next unless ( $v > 0);
                        print "\thit name is ", $hit->name, "\n";
                        while( my $hsp = $hit->next_hsp )
                        {
                            print "\t\tscore is ", $hsp->score, "\n";
                        }
                    }
                }
        }
   
   
    }
@blast_report = get_file_data ($filename);
return @blast_report;
##################################################################################
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: different results with remote-blast skript

by Smithies, Russell :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'd guess it's a difference in the parameters used.
Interesting that both have the number of letters in the db as "-1,125,070,205", I assume that's a bug  :-)

Stats from your remote_blast:

'stats' => {
             'S1' => '42',
             'S1_bits' => '20.8',
             'lambda' => '0.309',
             'entropy' => '0.345',
             'kappa_gapped' => '0.0410',
             'T' => '11',
             'kappa' => '0.122',
             'X3_bits' => '24.7',
             'X1' => '16',
             'lambda_gapped' => '0.267',
             'X2' => '38',
             'S2' => '74',
             'seqs_better_than_cutoff' => '0',
             'posted_date' => 'Jul 4, 2009  4:41 AM',
             'Hits_to_DB' => '60102303',
             'dbletters' => '-1125070205',
             'A' => '40',
             'num_successful_extensions' => '2004',
             'num_extensions' => '1436892',
             'X1_bits' => '7.1',
             'X3' => '64',
             'entropy_gapped' => '0.140',
             'dbentries' => '9252258',
             'X2_bits' => '14.6',
             'S2_bits' => '33.1'
           }


Stats from a blast done on the NCBI webpage:

  Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
    Posted date:  Jul 4, 2009  4:41 AM
  Number of letters in database: -1,125,070,205
  Number of sequences in database:  9,252,258

Lambda     K      H
   0.309    0.124    0.340
Gapped
Lambda     K      H
   0.267   0.0410    0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 9252258
Number of Hits to DB: 86493230
Number of extensions: 3101413
Number of successful extensions: 9001
Number of sequences better than 100: 65
Number of HSP's better than 100 without gapping: 0
Number of HSP's gapped: 9000
Number of HSP's successfully gapped: 66
Length of query: 150
Length of database: 3169897087
Length adjustment: 113
Effective length of query: 37
Effective length of database: 2124391933
Effective search space: 78602501521
Effective search space used: 78602501521
T: 11
A: 40
X1: 16 (7.1 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (20.8 bits)
S2: 65 (29.6 bits)


> -----Original Message-----
> From: bioperl-l-bounces@... [mailto:bioperl-l-
> bounces@...] On Behalf Of Jonas Schaer
> Sent: Sunday, 28 June 2009 10:15 p.m.
> To: BioPerl List
> Subject: [Bioperl-l] different results with remote-blast skript
>
> Hi again :)
> please, I only have this little question:
> why do I get different results with my remote::blast perl skript then on the
> ncbi blast homepage?
> I am using blastp, the query is an amino-sequence (different results with any
> sequence, differences not only in number of hits but even in e-values, scores
> etc...), the database is 'nr'.
> PLEASE help me,
> thank you in advance,
> Jonas
>
> ps: my skript:
> ##############################################################################
> ##
> use Bio::Seq::SeqFactory;
>   use Bio::Tools::Run::RemoteBlast;
>   use strict;
>   my @blast_report;
>   my $prog = 'blastp';
>   my $db   = 'nr';
>   my $e_val= '1e-10';
>   #my $e_val= '10';
>   my @params = ( '-prog' => $prog,
>          '-data' => $db,
>          '-expect' => $e_val,
>          '-readmethod' => 'SearchIO' );
>   my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>    $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>    $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>  $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
>
>   my
> $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
> PDDEYE';
>   #$v is just to turn on and off the messages
>   my $v = 1;
>   my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
>   my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
> "$blast_seq");
>   my $filename='temp2.out';
>   my $r = $factory->submit_blast($seq);
>   print STDERR "waiting..." if( $v > 0 );
>     while ( my @rids = $factory->each_rid )
>     {
>         foreach my $rid ( @rids )
>         {
>             my $rc = $factory->retrieve_blast($rid);
>             if( !ref($rc) )
>             {
>                 if( $rc < 0 )
>                 {
>                     $factory->remove_rid($rid);
>                 }
>                 print STDERR "." if ( $v > 0 );
>             }
>                 else
>                 {
>                     my $result = $rc->next_result();
>                     $factory->save_output($filename);
>                     $factory->remove_rid($rid);
>                     print "\nQuery Name: ", $result->query_name(), "\n";
>                     while ( my $hit = $result->next_hit )
>                     {
>                         next unless ( $v > 0);
>                         print "\thit name is ", $hit->name, "\n";
>                         while( my $hsp = $hit->next_hsp )
>                         {
>                             print "\t\tscore is ", $hsp->score, "\n";
>                         }
>                     }
>                 }
>         }
>
>
>     }
> @blast_report = get_file_data ($filename);
> return @blast_report;
> ##############################################################################
> ####
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: different results with remote-blast skript

by Jason Stajich-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

integer overflow in blast....

On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:

> I'd guess it's a difference in the parameters used.
> Interesting that both have the number of letters in the db as  
> "-1,125,070,205", I assume that's a bug  :-)
>
> Stats from your remote_blast:
>
> 'stats' => {
>             'S1' => '42',
>             'S1_bits' => '20.8',
>             'lambda' => '0.309',
>             'entropy' => '0.345',
>             'kappa_gapped' => '0.0410',
>             'T' => '11',
>             'kappa' => '0.122',
>             'X3_bits' => '24.7',
>             'X1' => '16',
>             'lambda_gapped' => '0.267',
>             'X2' => '38',
>             'S2' => '74',
>             'seqs_better_than_cutoff' => '0',
>             'posted_date' => 'Jul 4, 2009  4:41 AM',
>             'Hits_to_DB' => '60102303',
>             'dbletters' => '-1125070205',
>             'A' => '40',
>             'num_successful_extensions' => '2004',
>             'num_extensions' => '1436892',
>             'X1_bits' => '7.1',
>             'X3' => '64',
>             'entropy_gapped' => '0.140',
>             'dbentries' => '9252258',
>             'X2_bits' => '14.6',
>             'S2_bits' => '33.1'
>           }
>
>
> Stats from a blast done on the NCBI webpage:
>
>  Database: All non-redundant GenBank CDS translations+PDB+SwissProt
> +PIR+PRF
> excluding environmental samples from WGS projects
>    Posted date:  Jul 4, 2009  4:41 AM
>  Number of letters in database: -1,125,070,205
>  Number of sequences in database:  9,252,258
>
> Lambda     K      H
>   0.309    0.124    0.340
> Gapped
> Lambda     K      H
>   0.267   0.0410    0.140
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Sequences: 9252258
> Number of Hits to DB: 86493230
> Number of extensions: 3101413
> Number of successful extensions: 9001
> Number of sequences better than 100: 65
> Number of HSP's better than 100 without gapping: 0
> Number of HSP's gapped: 9000
> Number of HSP's successfully gapped: 66
> Length of query: 150
> Length of database: 3169897087
> Length adjustment: 113
> Effective length of query: 37
> Effective length of database: 2124391933
> Effective search space: 78602501521
> Effective search space used: 78602501521
> T: 11
> A: 40
> X1: 16 (7.1 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 42 (20.8 bits)
> S2: 65 (29.6 bits)
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>> bounces@...] On Behalf Of Jonas Schaer
>> Sent: Sunday, 28 June 2009 10:15 p.m.
>> To: BioPerl List
>> Subject: [Bioperl-l] different results with remote-blast skript
>>
>> Hi again :)
>> please, I only have this little question:
>> why do I get different results with my remote::blast perl skript  
>> then on the
>> ncbi blast homepage?
>> I am using blastp, the query is an amino-sequence (different  
>> results with any
>> sequence, differences not only in number of hits but even in e-
>> values, scores
>> etc...), the database is 'nr'.
>> PLEASE help me,
>> thank you in advance,
>> Jonas
>>
>> ps: my skript:
>> ##############################################################################
>> ##
>> use Bio::Seq::SeqFactory;
>>  use Bio::Tools::Run::RemoteBlast;
>>  use strict;
>>  my @blast_report;
>>  my $prog = 'blastp';
>>  my $db   = 'nr';
>>  my $e_val= '1e-10';
>>  #my $e_val= '10';
>>  my @params = ( '-prog' => $prog,
>>         '-data' => $db,
>>         '-expect' => $e_val,
>>         '-readmethod' => 'SearchIO' );
>>  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>   $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>   $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>> $
>> Bio
>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} =  
>> '1';
>>
>>  my
>> $
>> blast_seq
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
>> PDDEYE';
>>  #$v is just to turn on and off the messages
>>  my $v = 1;
>>  my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>  
>> 'Bio::PrimarySeq');
>>  my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>> "$blast_seq");
>>  my $filename='temp2.out';
>>  my $r = $factory->submit_blast($seq);
>>  print STDERR "waiting..." if( $v > 0 );
>>    while ( my @rids = $factory->each_rid )
>>    {
>>        foreach my $rid ( @rids )
>>        {
>>            my $rc = $factory->retrieve_blast($rid);
>>            if( !ref($rc) )
>>            {
>>                if( $rc < 0 )
>>                {
>>                    $factory->remove_rid($rid);
>>                }
>>                print STDERR "." if ( $v > 0 );
>>            }
>>                else
>>                {
>>                    my $result = $rc->next_result();
>>                    $factory->save_output($filename);
>>                    $factory->remove_rid($rid);
>>                    print "\nQuery Name: ", $result->query_name(),  
>> "\n";
>>                    while ( my $hit = $result->next_hit )
>>                    {
>>                        next unless ( $v > 0);
>>                        print "\thit name is ", $hit->name, "\n";
>>                        while( my $hsp = $hit->next_hsp )
>>                        {
>>                            print "\t\tscore is ", $hsp->score, "\n";
>>                        }
>>                    }
>>                }
>>        }
>>
>>
>>    }
>> @blast_report = get_file_data ($filename);
>> return @blast_report;
>> ##############################################################################
>> ####
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@...
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =
> ======================================================================
> Attention: The information contained in this message and/or  
> attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or  
> privileged
> material. Any review, retransmission, dissemination or other use of,  
> or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by  
> AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =
> ======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason@...

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: different results with remote-blast skript

by Chris Fields-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

That inspires confidence ;>

chris

On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:

> integer overflow in blast....
>
> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>
>> I'd guess it's a difference in the parameters used.
>> Interesting that both have the number of letters in the db as  
>> "-1,125,070,205", I assume that's a bug  :-)
>>
>> Stats from your remote_blast:
>>
>> 'stats' => {
>>            'S1' => '42',
>>            'S1_bits' => '20.8',
>>            'lambda' => '0.309',
>>            'entropy' => '0.345',
>>            'kappa_gapped' => '0.0410',
>>            'T' => '11',
>>            'kappa' => '0.122',
>>            'X3_bits' => '24.7',
>>            'X1' => '16',
>>            'lambda_gapped' => '0.267',
>>            'X2' => '38',
>>            'S2' => '74',
>>            'seqs_better_than_cutoff' => '0',
>>            'posted_date' => 'Jul 4, 2009  4:41 AM',
>>            'Hits_to_DB' => '60102303',
>>            'dbletters' => '-1125070205',
>>            'A' => '40',
>>            'num_successful_extensions' => '2004',
>>            'num_extensions' => '1436892',
>>            'X1_bits' => '7.1',
>>            'X3' => '64',
>>            'entropy_gapped' => '0.140',
>>            'dbentries' => '9252258',
>>            'X2_bits' => '14.6',
>>            'S2_bits' => '33.1'
>>          }
>>
>>
>> Stats from a blast done on the NCBI webpage:
>>
>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt
>> +PIR+PRF
>> excluding environmental samples from WGS projects
>>   Posted date:  Jul 4, 2009  4:41 AM
>> Number of letters in database: -1,125,070,205
>> Number of sequences in database:  9,252,258
>>
>> Lambda     K      H
>>  0.309    0.124    0.340
>> Gapped
>> Lambda     K      H
>>  0.267   0.0410    0.140
>> Matrix: BLOSUM62
>> Gap Penalties: Existence: 11, Extension: 1
>> Number of Sequences: 9252258
>> Number of Hits to DB: 86493230
>> Number of extensions: 3101413
>> Number of successful extensions: 9001
>> Number of sequences better than 100: 65
>> Number of HSP's better than 100 without gapping: 0
>> Number of HSP's gapped: 9000
>> Number of HSP's successfully gapped: 66
>> Length of query: 150
>> Length of database: 3169897087
>> Length adjustment: 113
>> Effective length of query: 37
>> Effective length of database: 2124391933
>> Effective search space: 78602501521
>> Effective search space used: 78602501521
>> T: 11
>> A: 40
>> X1: 16 (7.1 bits)
>> X2: 38 (14.6 bits)
>> X3: 64 (24.7 bits)
>> S1: 42 (20.8 bits)
>> S2: 65 (29.6 bits)
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>> bounces@...] On Behalf Of Jonas Schaer
>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>> To: BioPerl List
>>> Subject: [Bioperl-l] different results with remote-blast skript
>>>
>>> Hi again :)
>>> please, I only have this little question:
>>> why do I get different results with my remote::blast perl skript  
>>> then on the
>>> ncbi blast homepage?
>>> I am using blastp, the query is an amino-sequence (different  
>>> results with any
>>> sequence, differences not only in number of hits but even in e-
>>> values, scores
>>> etc...), the database is 'nr'.
>>> PLEASE help me,
>>> thank you in advance,
>>> Jonas
>>>
>>> ps: my skript:
>>> ##############################################################################
>>> ##
>>> use Bio::Seq::SeqFactory;
>>> use Bio::Tools::Run::RemoteBlast;
>>> use strict;
>>> my @blast_report;
>>> my $prog = 'blastp';
>>> my $db   = 'nr';
>>> my $e_val= '1e-10';
>>> #my $e_val= '10';
>>> my @params = ( '-prog' => $prog,
>>>        '-data' => $db,
>>>        '-expect' => $e_val,
>>>        '-readmethod' => 'SearchIO' );
>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>> $
>>> Bio
>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}  
>>> = '1';
>>>
>>> my
>>> $
>>> blast_seq
>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
>>> PDDEYE';
>>> #$v is just to turn on and off the messages
>>> my $v = 1;
>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>  
>>> 'Bio::PrimarySeq');
>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>>> "$blast_seq");
>>> my $filename='temp2.out';
>>> my $r = $factory->submit_blast($seq);
>>> print STDERR "waiting..." if( $v > 0 );
>>>   while ( my @rids = $factory->each_rid )
>>>   {
>>>       foreach my $rid ( @rids )
>>>       {
>>>           my $rc = $factory->retrieve_blast($rid);
>>>           if( !ref($rc) )
>>>           {
>>>               if( $rc < 0 )
>>>               {
>>>                   $factory->remove_rid($rid);
>>>               }
>>>               print STDERR "." if ( $v > 0 );
>>>           }
>>>               else
>>>               {
>>>                   my $result = $rc->next_result();
>>>                   $factory->save_output($filename);
>>>                   $factory->remove_rid($rid);
>>>                   print "\nQuery Name: ", $result->query_name(),  
>>> "\n";
>>>                   while ( my $hit = $result->next_hit )
>>>                   {
>>>                       next unless ( $v > 0);
>>>                       print "\thit name is ", $hit->name, "\n";
>>>                       while( my $hsp = $hit->next_hsp )
>>>                       {
>>>                           print "\t\tscore is ", $hsp->score, "\n";
>>>                       }
>>>                   }
>>>               }
>>>       }
>>>
>>>
>>>   }
>>> @blast_report = get_file_data ($filename);
>>> return @blast_report;
>>> ##############################################################################
>>> ####
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@...
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> =
>> =
>> =====================================================================
>> Attention: The information contained in this message and/or  
>> attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or  
>> privileged
>> material. Any review, retransmission, dissemination or other use  
>> of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by  
>> AgResearch
>> Limited. If you have received this message in error, please notify  
>> the
>> sender immediately.
>> =
>> =
>> =====================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@...
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason@...
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: different results with remote-blast skript

by Jonas Schaer-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi guys, thanks for your answers so far.
@jason: integer overflow in blast.... sorry, but what do you mean by that?
how can I fix it...?

Since I never really changed any parameters I thought them all to be
default. whatever, I tried to get "better" results with my prog by changing
these:
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
$Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
$Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
with no effect...I guess these were default values anyway.

So please maybe you can tell me all the other parameters I can change with
my perl-skript AND how to do that?
Unfortunately both, perl and the blast-algorithm are pretty much new to me,
maybe thats why I just cannot find out how to do that on my own... :/

Here is the output I get with my remote-blast skript:
#################################################################################################################
 Query Name:
MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
L
        hit name is ref|XP_001702807.1|
                score is 442
BLASTP 2.2.21+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and
David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs", Nucleic
Acids Res. 25:3389-3402.


Reference for composition-based statistics: Alejandro A.
Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen
F. Altschul (2001), "Improving the accuracy of PSI-BLAST
protein database searches with composition-based statistics
and other refinements", Nucleic Acids Res. 29:2994-3005.


RID: 53STX5G2013


Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects
           9,252,587 sequences; 3,169,972,781 total letters
Query=
MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
ATGPDPDDEYE
Length=150


                                                                   Score
E
Sequences producing significant alignments:                       (Bits)
Value

ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...   174
2e-42


ALIGNMENTS
>ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
 gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
Length=303

 Score =  174 bits (442),  Expect = 2e-42, Method: Composition-based stats.
 Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%)

Query  1    MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds  60
            MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
Sbjct  154  MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
213

Query  61   dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
120
            DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
Sbjct  214  DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
273

Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
            AWHERDDNAFRQAHQNTAMATGPDPDDEYE
Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303



  Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
    Posted date:  Jul 5, 2009  4:41 AM
  Number of letters in database: -1,124,994,511
  Number of sequences in database:  9,252,587

Lambda     K      H
   0.309    0.122    0.345
Gapped
Lambda     K      H
   0.267   0.0410    0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 9252587
Number of Hits to DB: 60273703
Number of extensions: 1448367
Number of successful extensions: 2103
Number of sequences better than 10: 0
Number of HSP's better than 10 without gapping: 0
Number of HSP's gapped: 2113
Number of HSP's successfully gapped: 0
Length of query: 150
Length of database: 3169972781
Length adjustment: 113
Effective length of query: 37
Effective length of database: 2124430450
Effective search space: 78603926650
Effective search space used: 78603926650
T: 11
A: 40
X1: 16 (7.1 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (20.8 bits)
S2: 74 (33.1 bits)

#################################################################################################################
and here are the hits (?) of the blast-algorithm on the ncbi-homepage with
the same query of course:
ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...   300
3e-80
ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA [Acyrtho...  36.2
1.1
ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880 [Blautia...  35.4
1.8
ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania brazil...  34.3
4.2
ref|XP_680841.1|  hypothetical protein AN7572.2 [Aspergillus n...  33.5
6.0
ref|YP_001768110.1|  hypothetical protein M446_1150 [Methyloba...  33.5
7.0
#################################################################################################################at
least the first hit is the same, but even there there is a different score
and e-value.

thanks so much for any help :)
regards, jonas


----- Original Message -----
From: "Chris Fields" <cjfields@...>
To: "Jason Stajich" <jason@...>
Cc: "Smithies, Russell" <Russell.Smithies@...>; "'BioPerl
List'" <bioperl-l@...>; "'Jonas Schaer'"
<Jonas_Schaer@...>
Sent: Monday, July 06, 2009 12:51 AM
Subject: Re: [Bioperl-l] different results with remote-blast skript


> That inspires confidence ;>
>
> chris
>
> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>
>> integer overflow in blast....
>>
>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>
>>> I'd guess it's a difference in the parameters used.
>>> Interesting that both have the number of letters in the db as
>>> "-1,125,070,205", I assume that's a bug  :-)
>>>
>>> Stats from your remote_blast:
>>>
>>> 'stats' => {
>>>            'S1' => '42',
>>>            'S1_bits' => '20.8',
>>>            'lambda' => '0.309',
>>>            'entropy' => '0.345',
>>>            'kappa_gapped' => '0.0410',
>>>            'T' => '11',
>>>            'kappa' => '0.122',
>>>            'X3_bits' => '24.7',
>>>            'X1' => '16',
>>>            'lambda_gapped' => '0.267',
>>>            'X2' => '38',
>>>            'S2' => '74',
>>>            'seqs_better_than_cutoff' => '0',
>>>            'posted_date' => 'Jul 4, 2009  4:41 AM',
>>>            'Hits_to_DB' => '60102303',
>>>            'dbletters' => '-1125070205',
>>>            'A' => '40',
>>>            'num_successful_extensions' => '2004',
>>>            'num_extensions' => '1436892',
>>>            'X1_bits' => '7.1',
>>>            'X3' => '64',
>>>            'entropy_gapped' => '0.140',
>>>            'dbentries' => '9252258',
>>>            'X2_bits' => '14.6',
>>>            'S2_bits' => '33.1'
>>>          }
>>>
>>>
>>> Stats from a blast done on the NCBI webpage:
>>>
>>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt
>>> +PIR+PRF
>>> excluding environmental samples from WGS projects
>>>   Posted date:  Jul 4, 2009  4:41 AM
>>> Number of letters in database: -1,125,070,205
>>> Number of sequences in database:  9,252,258
>>>
>>> Lambda     K      H
>>>  0.309    0.124    0.340
>>> Gapped
>>> Lambda     K      H
>>>  0.267   0.0410    0.140
>>> Matrix: BLOSUM62
>>> Gap Penalties: Existence: 11, Extension: 1
>>> Number of Sequences: 9252258
>>> Number of Hits to DB: 86493230
>>> Number of extensions: 3101413
>>> Number of successful extensions: 9001
>>> Number of sequences better than 100: 65
>>> Number of HSP's better than 100 without gapping: 0
>>> Number of HSP's gapped: 9000
>>> Number of HSP's successfully gapped: 66
>>> Length of query: 150
>>> Length of database: 3169897087
>>> Length adjustment: 113
>>> Effective length of query: 37
>>> Effective length of database: 2124391933
>>> Effective search space: 78602501521
>>> Effective search space used: 78602501521
>>> T: 11
>>> A: 40
>>> X1: 16 (7.1 bits)
>>> X2: 38 (14.6 bits)
>>> X3: 64 (24.7 bits)
>>> S1: 42 (20.8 bits)
>>> S2: 65 (29.6 bits)
>>>
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>>> bounces@...] On Behalf Of Jonas Schaer
>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>> To: BioPerl List
>>>> Subject: [Bioperl-l] different results with remote-blast skript
>>>>
>>>> Hi again :)
>>>> please, I only have this little question:
>>>> why do I get different results with my remote::blast perl skript
>>>> then on the
>>>> ncbi blast homepage?
>>>> I am using blastp, the query is an amino-sequence (different
>>>> results with any
>>>> sequence, differences not only in number of hits but even in e-
>>>> values, scores
>>>> etc...), the database is 'nr'.
>>>> PLEASE help me,
>>>> thank you in advance,
>>>> Jonas
>>>>
>>>> ps: my skript:
>>>> ##############################################################################
>>>> ##
>>>> use Bio::Seq::SeqFactory;
>>>> use Bio::Tools::Run::RemoteBlast;
>>>> use strict;
>>>> my @blast_report;
>>>> my $prog = 'blastp';
>>>> my $db   = 'nr';
>>>> my $e_val= '1e-10';
>>>> #my $e_val= '10';
>>>> my @params = ( '-prog' => $prog,
>>>>        '-data' => $db,
>>>>        '-expect' => $e_val,
>>>>        '-readmethod' => 'SearchIO' );
>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>> $
>>>> Bio
>>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>>> = '1';
>>>>
>>>> my
>>>> $
>>>> blast_seq
>>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
>>>> PDDEYE';
>>>> #$v is just to turn on and off the messages
>>>> my $v = 1;
>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>>> 'Bio::PrimarySeq');
>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>>>> "$blast_seq");
>>>> my $filename='temp2.out';
>>>> my $r = $factory->submit_blast($seq);
>>>> print STDERR "waiting..." if( $v > 0 );
>>>>   while ( my @rids = $factory->each_rid )
>>>>   {
>>>>       foreach my $rid ( @rids )
>>>>       {
>>>>           my $rc = $factory->retrieve_blast($rid);
>>>>           if( !ref($rc) )
>>>>           {
>>>>               if( $rc < 0 )
>>>>               {
>>>>                   $factory->remove_rid($rid);
>>>>               }
>>>>               print STDERR "." if ( $v > 0 );
>>>>           }
>>>>               else
>>>>               {
>>>>                   my $result = $rc->next_result();
>>>>                   $factory->save_output($filename);
>>>>                   $factory->remove_rid($rid);
>>>>                   print "\nQuery Name: ", $result->query_name(),
>>>> "\n";
>>>>                   while ( my $hit = $result->next_hit )
>>>>                   {
>>>>                       next unless ( $v > 0);
>>>>                       print "\thit name is ", $hit->name, "\n";
>>>>                       while( my $hsp = $hit->next_hsp )
>>>>                       {
>>>>                           print "\t\tscore is ", $hsp->score, "\n";
>>>>                       }
>>>>                   }
>>>>               }
>>>>       }
>>>>
>>>>
>>>>   }
>>>> @blast_report = get_file_data ($filename);
>>>> return @blast_report;
>>>> ##############################################################################
>>>> ####
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@...
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> =
>>> =
>>> =====================================================================
>>> Attention: The information contained in this message and/or
>>> attachments
>>> from AgResearch Limited is intended only for the persons or entities
>>> to which it is addressed and may contain confidential and/or
>>> privileged
>>> material. Any review, retransmission, dissemination or other use
>>> of, or
>>> taking of any action in reliance upon, this information by persons or
>>> entities other than the intended recipients is prohibited by
>>> AgResearch
>>> Limited. If you have received this message in error, please notify
>>> the
>>> sender immediately.
>>> =
>>> =
>>> =====================================================================
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@...
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason@...
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@...
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--------------------------------------------------------------------------------



No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: 07/05/09
05:53:00



_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

U.gif (318 bytes) Download Attachment
G.gif (494 bytes) Download Attachment

Re: different results with remote-blast skript

by Smithies, Russell :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Jonas,
You can't just play with the BLAST parameters and hope for a "better" result.
I'd suggest that if you aren't sure what they do, you should leave them alone as small changes can make huge differences in the output - it's quite possible to miss finding what you're looking for by using the wrong parameters.
If all else fails, read the blast manual: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html
http://www.ncbi.nlm.nih.gov/blast/tutorial/
Or Read Ian Korfs' excellent book: http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3

Don't worry about the integer overflow bug as there's nothing you can do about it. If you're interested, Google and Wikipedia are your friends: http://en.wikipedia.org/wiki/Integer_overflow


Russell

> -----Original Message-----
> From: bioperl-l-bounces@... [mailto:bioperl-l-
> bounces@...] On Behalf Of Jonas Schaer
> Sent: Tuesday, 7 July 2009 12:14 a.m.
> To: BioPerl List; Chris Fields
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
> Hi guys, thanks for your answers so far.
> @jason: integer overflow in blast.... sorry, but what do you mean by that?
> how can I fix it...?
>
> Since I never really changed any parameters I thought them all to be default.
> whatever, I tried to get "better" results with my prog by changing
> these:
> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
> with no effect...I guess these were default values anyway.
>
> So please maybe you can tell me all the other parameters I can change with my
> perl-skript AND how to do that?
> Unfortunately both, perl and the blast-algorithm are pretty much new to me,
> maybe thats why I just cannot find out how to do that on my own... :/
>
> Here is the output I get with my remote-blast skript:
> ##############################################################################
> ###################################
>  Query Name:
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
> L
>         hit name is ref|XP_001702807.1|
>                 score is 442
> BLASTP 2.2.21+
> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped
> BLAST and PSI-BLAST: a new generation of protein database search programs",
> Nucleic Acids Res. 25:3389-3402.
>
>
> Reference for composition-based statistics: Alejandro A.
> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the
> accuracy of PSI-BLAST protein database searches with composition-based
> statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
>
>
> RID: 53STX5G2013
>
>
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> from WGS projects
>            9,252,587 sequences; 3,169,972,781 total letters Query=
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
> ATGPDPDDEYE
> Length=150
>
>
>                                                                    Score
> E
> Sequences producing significant alignments:                       (Bits)
> Value
>
> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...   174
> 2e-42
>
>
> ALIGNMENTS
> >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
>  gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
> Length=303
>
>  Score =  174 bits (442),  Expect = 2e-42, Method: Composition-based stats.
>  Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%)
>
> Query  1    MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds  60
>             MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> Sbjct  154  MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> 213
>
> Query  61   dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
> 120
>             DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> Sbjct  214  DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> 273
>
> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>             AWHERDDNAFRQAHQNTAMATGPDPDDEYE
> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>
>
>
>   Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
> excluding environmental samples from WGS projects
>     Posted date:  Jul 5, 2009  4:41 AM
>   Number of letters in database: -1,124,994,511
>   Number of sequences in database:  9,252,587
>
> Lambda     K      H
>    0.309    0.122    0.345
> Gapped
> Lambda     K      H
>    0.267   0.0410    0.140
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Sequences: 9252587
> Number of Hits to DB: 60273703
> Number of extensions: 1448367
> Number of successful extensions: 2103
> Number of sequences better than 10: 0
> Number of HSP's better than 10 without gapping: 0
> Number of HSP's gapped: 2113
> Number of HSP's successfully gapped: 0
> Length of query: 150
> Length of database: 3169972781
> Length adjustment: 113
> Effective length of query: 37
> Effective length of database: 2124430450
> Effective search space: 78603926650
> Effective search space used: 78603926650
> T: 11
> A: 40
> X1: 16 (7.1 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 42 (20.8 bits)
> S2: 74 (33.1 bits)
>
> ##############################################################################
> ###################################
> and here are the hits (?) of the blast-algorithm on the ncbi-homepage with
> the same query of course:
> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...   300
> 3e-80
> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA [Acyrtho...  36.2
> 1.1
> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880 [Blautia...  35.4
> 1.8
> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania brazil...  34.3
> 4.2
> ref|XP_680841.1|  hypothetical protein AN7572.2 [Aspergillus n...  33.5
> 6.0
> ref|YP_001768110.1|  hypothetical protein M446_1150 [Methyloba...  33.5
> 7.0
> ##############################################################################
> ###################################at
> least the first hit is the same, but even there there is a different score
> and e-value.
>
> thanks so much for any help :)
> regards, jonas
>
>
> ----- Original Message -----
> From: "Chris Fields" <cjfields@...>
> To: "Jason Stajich" <jason@...>
> Cc: "Smithies, Russell" <Russell.Smithies@...>; "'BioPerl
> List'" <bioperl-l@...>; "'Jonas Schaer'"
> <Jonas_Schaer@...>
> Sent: Monday, July 06, 2009 12:51 AM
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
>
> > That inspires confidence ;>
> >
> > chris
> >
> > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> >
> >> integer overflow in blast....
> >>
> >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> >>
> >>> I'd guess it's a difference in the parameters used.
> >>> Interesting that both have the number of letters in the db as
> >>> "-1,125,070,205", I assume that's a bug  :-)
> >>>
> >>> Stats from your remote_blast:
> >>>
> >>> 'stats' => {
> >>>            'S1' => '42',
> >>>            'S1_bits' => '20.8',
> >>>            'lambda' => '0.309',
> >>>            'entropy' => '0.345',
> >>>            'kappa_gapped' => '0.0410',
> >>>            'T' => '11',
> >>>            'kappa' => '0.122',
> >>>            'X3_bits' => '24.7',
> >>>            'X1' => '16',
> >>>            'lambda_gapped' => '0.267',
> >>>            'X2' => '38',
> >>>            'S2' => '74',
> >>>            'seqs_better_than_cutoff' => '0',
> >>>            'posted_date' => 'Jul 4, 2009  4:41 AM',
> >>>            'Hits_to_DB' => '60102303',
> >>>            'dbletters' => '-1125070205',
> >>>            'A' => '40',
> >>>            'num_successful_extensions' => '2004',
> >>>            'num_extensions' => '1436892',
> >>>            'X1_bits' => '7.1',
> >>>            'X3' => '64',
> >>>            'entropy_gapped' => '0.140',
> >>>            'dbentries' => '9252258',
> >>>            'X2_bits' => '14.6',
> >>>            'S2_bits' => '33.1'
> >>>          }
> >>>
> >>>
> >>> Stats from a blast done on the NCBI webpage:
> >>>
> >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt
> >>> +PIR+PRF
> >>> excluding environmental samples from WGS projects
> >>>   Posted date:  Jul 4, 2009  4:41 AM
> >>> Number of letters in database: -1,125,070,205
> >>> Number of sequences in database:  9,252,258
> >>>
> >>> Lambda     K      H
> >>>  0.309    0.124    0.340
> >>> Gapped
> >>> Lambda     K      H
> >>>  0.267   0.0410    0.140
> >>> Matrix: BLOSUM62
> >>> Gap Penalties: Existence: 11, Extension: 1
> >>> Number of Sequences: 9252258
> >>> Number of Hits to DB: 86493230
> >>> Number of extensions: 3101413
> >>> Number of successful extensions: 9001
> >>> Number of sequences better than 100: 65
> >>> Number of HSP's better than 100 without gapping: 0
> >>> Number of HSP's gapped: 9000
> >>> Number of HSP's successfully gapped: 66
> >>> Length of query: 150
> >>> Length of database: 3169897087
> >>> Length adjustment: 113
> >>> Effective length of query: 37
> >>> Effective length of database: 2124391933
> >>> Effective search space: 78602501521
> >>> Effective search space used: 78602501521
> >>> T: 11
> >>> A: 40
> >>> X1: 16 (7.1 bits)
> >>> X2: 38 (14.6 bits)
> >>> X3: 64 (24.7 bits)
> >>> S1: 42 (20.8 bits)
> >>> S2: 65 (29.6 bits)
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
> >>>> bounces@...] On Behalf Of Jonas Schaer
> >>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> >>>> To: BioPerl List
> >>>> Subject: [Bioperl-l] different results with remote-blast skript
> >>>>
> >>>> Hi again :)
> >>>> please, I only have this little question:
> >>>> why do I get different results with my remote::blast perl skript
> >>>> then on the
> >>>> ncbi blast homepage?
> >>>> I am using blastp, the query is an amino-sequence (different
> >>>> results with any
> >>>> sequence, differences not only in number of hits but even in e-
> >>>> values, scores
> >>>> etc...), the database is 'nr'.
> >>>> PLEASE help me,
> >>>> thank you in advance,
> >>>> Jonas
> >>>>
> >>>> ps: my skript:
> >>>>
> ##############################################################################
> >>>> ##
> >>>> use Bio::Seq::SeqFactory;
> >>>> use Bio::Tools::Run::RemoteBlast;
> >>>> use strict;
> >>>> my @blast_report;
> >>>> my $prog = 'blastp';
> >>>> my $db   = 'nr';
> >>>> my $e_val= '1e-10';
> >>>> #my $e_val= '10';
> >>>> my @params = ( '-prog' => $prog,
> >>>>        '-data' => $db,
> >>>>        '-expect' => $e_val,
> >>>>        '-readmethod' => 'SearchIO' );
> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>> $
> >>>> Bio
> >>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
> >>>> = '1';
> >>>>
> >>>> my
> >>>> $
> >>>> blast_seq
> >>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
> >>>>
> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
> >>>> PDDEYE';
> >>>> #$v is just to turn on and off the messages
> >>>> my $v = 1;
> >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
> >>>> 'Bio::PrimarySeq');
> >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
> >>>> "$blast_seq");
> >>>> my $filename='temp2.out';
> >>>> my $r = $factory->submit_blast($seq);
> >>>> print STDERR "waiting..." if( $v > 0 );
> >>>>   while ( my @rids = $factory->each_rid )
> >>>>   {
> >>>>       foreach my $rid ( @rids )
> >>>>       {
> >>>>           my $rc = $factory->retrieve_blast($rid);
> >>>>           if( !ref($rc) )
> >>>>           {
> >>>>               if( $rc < 0 )
> >>>>               {
> >>>>                   $factory->remove_rid($rid);
> >>>>               }
> >>>>               print STDERR "." if ( $v > 0 );
> >>>>           }
> >>>>               else
> >>>>               {
> >>>>                   my $result = $rc->next_result();
> >>>>                   $factory->save_output($filename);
> >>>>                   $factory->remove_rid($rid);
> >>>>                   print "\nQuery Name: ", $result->query_name(),
> >>>> "\n";
> >>>>                   while ( my $hit = $result->next_hit )
> >>>>                   {
> >>>>                       next unless ( $v > 0);
> >>>>                       print "\thit name is ", $hit->name, "\n";
> >>>>                       while( my $hsp = $hit->next_hsp )
> >>>>                       {
> >>>>                           print "\t\tscore is ", $hsp->score, "\n";
> >>>>                       }
> >>>>                   }
> >>>>               }
> >>>>       }
> >>>>
> >>>>
> >>>>   }
> >>>> @blast_report = get_file_data ($filename);
> >>>> return @blast_report;
> >>>>
> ##############################################################################
> >>>> ####
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l@...
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>> =
> >>> =
> >>> =====================================================================
> >>> Attention: The information contained in this message and/or
> >>> attachments
> >>> from AgResearch Limited is intended only for the persons or entities
> >>> to which it is addressed and may contain confidential and/or
> >>> privileged
> >>> material. Any review, retransmission, dissemination or other use
> >>> of, or
> >>> taking of any action in reliance upon, this information by persons or
> >>> entities other than the intended recipients is prohibited by
> >>> AgResearch
> >>> Limited. If you have received this message in error, please notify
> >>> the
> >>> sender immediately.
> >>> =
> >>> =
> >>> =====================================================================
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l@...
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> jason@...
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l@...
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> ------------------------------------------------------------------------------
> --
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: 07/05/09
> 05:53:00

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Jonas Schaer-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi guys,
Thank you all so much for your help and patience :). Of course you were
right and I finaly found the right put-parameter to get exactly the same
hits as on the homepage.
I do have an other question though :)...
I now want to include a search for conserved domains, but when I try to use
the CDD_SEARCH-parameter
(http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub:CDD_SEARCH)
like the other put-parameters the way chris once told me(works fine with the
other params):

my %put = (
    WORD_SIZE => 3,
    HITLIST_SIZE => 100,
    THRESHOLD => 11,
    FILTER => 'R',
    GENETIC_CODE => 1,
    CDD_SEARCH => 'on'                                      ###I tried it
with 'true' and '1', too.

);

for my $putName (keys %put) {
    $factory->submit_parameter($putName,$put{$putName});
}


...an exception is thrown:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: CDD_SEARCH is not a valid PUT parameter.
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
C:/Perl/site/lib/Bio/Tools
/Run/RemoteBlast.pm:325
STACK: main::blast_a_sequence firsteval0.8.pm:383
STACK: main::blast_it firsteval0.8.pm:288
STACK: firsteval0.8.pm:35
----------------------------------------------------------- .
I guess somehow this could be the solution to my problem:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID-for-Simultaneous
, but unfortunately I don't understand what to do.
I'm so sorry to bother you with this but please help me once more...:)

Best regards and thanks in advance,
Jonas

----- Original Message -----
From: "Smithies, Russell" <Russell.Smithies@...>
To: "'Jonas Schaer'" <Brotelzwieb@...>
Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
<bioperl-l@...>
Sent: Monday, July 06, 2009 10:56 PM
Subject: RE: [Bioperl-l] different results with remote-blast skript


Hi Jonas,
You can't just play with the BLAST parameters and hope for a "better"
result.
I'd suggest that if you aren't sure what they do, you should leave them
alone as small changes can make huge differences in the output - it's quite
possible to miss finding what you're looking for by using the wrong
parameters.
If all else fails, read the blast manual:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html
http://www.ncbi.nlm.nih.gov/blast/tutorial/
Or Read Ian Korfs' excellent book:
http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3

Don't worry about the integer overflow bug as there's nothing you can do
about it. If you're interested, Google and Wikipedia are your friends:
http://en.wikipedia.org/wiki/Integer_overflow


Russell

> -----Original Message-----
> From: bioperl-l-bounces@... [mailto:bioperl-l-
> bounces@...] On Behalf Of Jonas Schaer
> Sent: Tuesday, 7 July 2009 12:14 a.m.
> To: BioPerl List; Chris Fields
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
> Hi guys, thanks for your answers so far.
> @jason: integer overflow in blast.... sorry, but what do you mean by that?
> how can I fix it...?
>
> Since I never really changed any parameters I thought them all to be
> default.
> whatever, I tried to get "better" results with my prog by changing
> these:
> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} =
> '1';
> with no effect...I guess these were default values anyway.
>
> So please maybe you can tell me all the other parameters I can change with
> my
> perl-skript AND how to do that?
> Unfortunately both, perl and the blast-algorithm are pretty much new to
> me,
> maybe thats why I just cannot find out how to do that on my own... :/
>
> Here is the output I get with my remote-blast skript:
> ##############################################################################
> ###################################
>  Query Name:
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
> L
>         hit name is ref|XP_001702807.1|
>                 score is 442
> BLASTP 2.2.21+
> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped
> BLAST and PSI-BLAST: a new generation of protein database search
> programs",
> Nucleic Acids Res. 25:3389-3402.
>
>
> Reference for composition-based statistics: Alejandro A.
> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge,
> Yuri
> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the
> accuracy of PSI-BLAST protein database searches with composition-based
> statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
>
>
> RID: 53STX5G2013
>
>
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> from WGS projects
>            9,252,587 sequences; 3,169,972,781 total letters Query=
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
> ATGPDPDDEYE
> Length=150
>
>
>                                                                    Score
> E
> Sequences producing significant alignments:                       (Bits)
> Value
>
> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...   174
> 2e-42
>
>
> ALIGNMENTS
> >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
>  gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
> Length=303
>
>  Score =  174 bits (442),  Expect = 2e-42, Method: Composition-based
> stats.
>  Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150
> (0%)
>
> Query  1    MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
> 60
>             MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> Sbjct  154  MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> 213
>
> Query  61   dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
> 120
>             DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> Sbjct  214  DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> 273
>
> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>             AWHERDDNAFRQAHQNTAMATGPDPDDEYE
> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>
>
>
>   Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF
> excluding environmental samples from WGS projects
>     Posted date:  Jul 5, 2009  4:41 AM
>   Number of letters in database: -1,124,994,511
>   Number of sequences in database:  9,252,587
>
> Lambda     K      H
>    0.309    0.122    0.345
> Gapped
> Lambda     K      H
>    0.267   0.0410    0.140
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Sequences: 9252587
> Number of Hits to DB: 60273703
> Number of extensions: 1448367
> Number of successful extensions: 2103
> Number of sequences better than 10: 0
> Number of HSP's better than 10 without gapping: 0
> Number of HSP's gapped: 2113
> Number of HSP's successfully gapped: 0
> Length of query: 150
> Length of database: 3169972781
> Length adjustment: 113
> Effective length of query: 37
> Effective length of database: 2124430450
> Effective search space: 78603926650
> Effective search space used: 78603926650
> T: 11
> A: 40
> X1: 16 (7.1 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 42 (20.8 bits)
> S2: 74 (33.1 bits)
>
> ##############################################################################
> ###################################
> and here are the hits (?) of the blast-algorithm on the ncbi-homepage with
> the same query of course:
> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...   300
> 3e-80
> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA [Acyrtho...  36.2
> 1.1
> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880 [Blautia...  35.4
> 1.8
> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania brazil...  34.3
> 4.2
> ref|XP_680841.1|  hypothetical protein AN7572.2 [Aspergillus n...  33.5
> 6.0
> ref|YP_001768110.1|  hypothetical protein M446_1150 [Methyloba...  33.5
> 7.0
> ##############################################################################
> ###################################at
> least the first hit is the same, but even there there is a different score
> and e-value.
>
> thanks so much for any help :)
> regards, jonas
>
>
> ----- Original Message -----
> From: "Chris Fields" <cjfields@...>
> To: "Jason Stajich" <jason@...>
> Cc: "Smithies, Russell" <Russell.Smithies@...>; "'BioPerl
> List'" <bioperl-l@...>; "'Jonas Schaer'"
> <Jonas_Schaer@...>
> Sent: Monday, July 06, 2009 12:51 AM
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
>
> > That inspires confidence ;>
> >
> > chris
> >
> > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> >
> >> integer overflow in blast....
> >>
> >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> >>
> >>> I'd guess it's a difference in the parameters used.
> >>> Interesting that both have the number of letters in the db as
> >>> "-1,125,070,205", I assume that's a bug  :-)
> >>>
> >>> Stats from your remote_blast:
> >>>
> >>> 'stats' => {
> >>>            'S1' => '42',
> >>>            'S1_bits' => '20.8',
> >>>            'lambda' => '0.309',
> >>>            'entropy' => '0.345',
> >>>            'kappa_gapped' => '0.0410',
> >>>            'T' => '11',
> >>>            'kappa' => '0.122',
> >>>            'X3_bits' => '24.7',
> >>>            'X1' => '16',
> >>>            'lambda_gapped' => '0.267',
> >>>            'X2' => '38',
> >>>            'S2' => '74',
> >>>            'seqs_better_than_cutoff' => '0',
> >>>            'posted_date' => 'Jul 4, 2009  4:41 AM',
> >>>            'Hits_to_DB' => '60102303',
> >>>            'dbletters' => '-1125070205',
> >>>            'A' => '40',
> >>>            'num_successful_extensions' => '2004',
> >>>            'num_extensions' => '1436892',
> >>>            'X1_bits' => '7.1',
> >>>            'X3' => '64',
> >>>            'entropy_gapped' => '0.140',
> >>>            'dbentries' => '9252258',
> >>>            'X2_bits' => '14.6',
> >>>            'S2_bits' => '33.1'
> >>>          }
> >>>
> >>>
> >>> Stats from a blast done on the NCBI webpage:
> >>>
> >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt
> >>> +PIR+PRF
> >>> excluding environmental samples from WGS projects
> >>>   Posted date:  Jul 4, 2009  4:41 AM
> >>> Number of letters in database: -1,125,070,205
> >>> Number of sequences in database:  9,252,258
> >>>
> >>> Lambda     K      H
> >>>  0.309    0.124    0.340
> >>> Gapped
> >>> Lambda     K      H
> >>>  0.267   0.0410    0.140
> >>> Matrix: BLOSUM62
> >>> Gap Penalties: Existence: 11, Extension: 1
> >>> Number of Sequences: 9252258
> >>> Number of Hits to DB: 86493230
> >>> Number of extensions: 3101413
> >>> Number of successful extensions: 9001
> >>> Number of sequences better than 100: 65
> >>> Number of HSP's better than 100 without gapping: 0
> >>> Number of HSP's gapped: 9000
> >>> Number of HSP's successfully gapped: 66
> >>> Length of query: 150
> >>> Length of database: 3169897087
> >>> Length adjustment: 113
> >>> Effective length of query: 37
> >>> Effective length of database: 2124391933
> >>> Effective search space: 78602501521
> >>> Effective search space used: 78602501521
> >>> T: 11
> >>> A: 40
> >>> X1: 16 (7.1 bits)
> >>> X2: 38 (14.6 bits)
> >>> X3: 64 (24.7 bits)
> >>> S1: 42 (20.8 bits)
> >>> S2: 65 (29.6 bits)
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
> >>>> bounces@...] On Behalf Of Jonas Schaer
> >>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> >>>> To: BioPerl List
> >>>> Subject: [Bioperl-l] different results with remote-blast skript
> >>>>
> >>>> Hi again :)
> >>>> please, I only have this little question:
> >>>> why do I get different results with my remote::blast perl skript
> >>>> then on the
> >>>> ncbi blast homepage?
> >>>> I am using blastp, the query is an amino-sequence (different
> >>>> results with any
> >>>> sequence, differences not only in number of hits but even in e-
> >>>> values, scores
> >>>> etc...), the database is 'nr'.
> >>>> PLEASE help me,
> >>>> thank you in advance,
> >>>> Jonas
> >>>>
> >>>> ps: my skript:
> >>>>
> ##############################################################################
> >>>> ##
> >>>> use Bio::Seq::SeqFactory;
> >>>> use Bio::Tools::Run::RemoteBlast;
> >>>> use strict;
> >>>> my @blast_report;
> >>>> my $prog = 'blastp';
> >>>> my $db   = 'nr';
> >>>> my $e_val= '1e-10';
> >>>> #my $e_val= '10';
> >>>> my @params = ( '-prog' => $prog,
> >>>>        '-data' => $db,
> >>>>        '-expect' => $e_val,
> >>>>        '-readmethod' => 'SearchIO' );
> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>> $
> >>>> Bio
> >>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
> >>>> = '1';
> >>>>
> >>>> my
> >>>> $
> >>>> blast_seq
> >>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
> >>>>
> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
> >>>> PDDEYE';
> >>>> #$v is just to turn on and off the messages
> >>>> my $v = 1;
> >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
> >>>> 'Bio::PrimarySeq');
> >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
> >>>> "$blast_seq");
> >>>> my $filename='temp2.out';
> >>>> my $r = $factory->submit_blast($seq);
> >>>> print STDERR "waiting..." if( $v > 0 );
> >>>>   while ( my @rids = $factory->each_rid )
> >>>>   {
> >>>>       foreach my $rid ( @rids )
> >>>>       {
> >>>>           my $rc = $factory->retrieve_blast($rid);
> >>>>           if( !ref($rc) )
> >>>>           {
> >>>>               if( $rc < 0 )
> >>>>               {
> >>>>                   $factory->remove_rid($rid);
> >>>>               }
> >>>>               print STDERR "." if ( $v > 0 );
> >>>>           }
> >>>>               else
> >>>>               {
> >>>>                   my $result = $rc->next_result();
> >>>>                   $factory->save_output($filename);
> >>>>                   $factory->remove_rid($rid);
> >>>>                   print "\nQuery Name: ", $result->query_name(),
> >>>> "\n";
> >>>>                   while ( my $hit = $result->next_hit )
> >>>>                   {
> >>>>                       next unless ( $v > 0);
> >>>>                       print "\thit name is ", $hit->name, "\n";
> >>>>                       while( my $hsp = $hit->next_hsp )
> >>>>                       {
> >>>>                           print "\t\tscore is ", $hsp->score, "\n";
> >>>>                       }
> >>>>                   }
> >>>>               }
> >>>>       }
> >>>>
> >>>>
> >>>>   }
> >>>> @blast_report = get_file_data ($filename);
> >>>> return @blast_report;
> >>>>
> ##############################################################################
> >>>> ####
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l@...
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>> =
> >>> =
> >>> =====================================================================
> >>> Attention: The information contained in this message and/or
> >>> attachments
> >>> from AgResearch Limited is intended only for the persons or entities
> >>> to which it is addressed and may contain confidential and/or
> >>> privileged
> >>> material. Any review, retransmission, dissemination or other use
> >>> of, or
> >>> taking of any action in reliance upon, this information by persons or
> >>> entities other than the intended recipients is prohibited by
> >>> AgResearch
> >>> Limited. If you have received this message in error, please notify
> >>> the
> >>> sender immediately.
> >>> =
> >>> =
> >>> =====================================================================
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l@...
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> jason@...
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l@...
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> ------------------------------------------------------------------------------
> --
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: 07/05/09
> 05:53:00


--------------------------------------------------------------------------------



No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release Date: 07/05/09
17:54:00

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Chris Fields-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'm not sure, but I think adding this in will take a little work  
(we'll need to catch the RID returned, which I'm fairly sure will  
require some modifications to checking the returned output).  I would  
also have to look at the RemoteBlast API to see how this would fit in  
(I'm assuming we could either lump it in with other returned RIDs or  
create a new method for that).

You are more than welcome to add this in as an enhancement request to  
bugzilla for BioPerl:

http://bugzilla.open-bio.org/

chris

On Jul 9, 2009, at 5:16 AM, Jonas Schaer wrote:

> Hi guys,
> Thank you all so much for your help and patience :). Of course you  
> were right and I finaly found the right put-parameter to get exactly  
> the same hits as on the homepage.
> I do have an other question though :)...
> I now want to include a search for conserved domains, but when I try  
> to use the CDD_SEARCH-parameter (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub 
> :CDD_SEARCH) like the other put-parameters the way chris once told  
> me(works fine with the other params):
>
> my %put = (
>   WORD_SIZE => 3,
>   HITLIST_SIZE => 100,
>   THRESHOLD => 11,
>   FILTER => 'R',
>   GENETIC_CODE => 1,
>   CDD_SEARCH => 'on'                                      ###I tried  
> it with 'true' and '1', too.
>
> );
>
> for my $putName (keys %put) {
>   $factory->submit_parameter($putName,$put{$putName});
> }
>
>
> ...an exception is thrown:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: CDD_SEARCH is not a valid PUT parameter.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/
> lib/Bio/Tools
> /Run/RemoteBlast.pm:325
> STACK: main::blast_a_sequence firsteval0.8.pm:383
> STACK: main::blast_it firsteval0.8.pm:288
> STACK: firsteval0.8.pm:35
> ----------------------------------------------------------- .
> I guess somehow this could be the solution to my problem:
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID- 
> for-Simultaneous
> , but unfortunately I don't understand what to do.
> I'm so sorry to bother you with this but please help me once more...:)
>
> Best regards and thanks in advance,
> Jonas
>
> ----- Original Message ----- From: "Smithies, Russell" <Russell.Smithies@...
> >
> To: "'Jonas Schaer'" <Brotelzwieb@...>
> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'" <bioperl-l@...
> >
> Sent: Monday, July 06, 2009 10:56 PM
> Subject: RE: [Bioperl-l] different results with remote-blast skript
>
>
> Hi Jonas,
> You can't just play with the BLAST parameters and hope for a  
> "better" result.
> I'd suggest that if you aren't sure what they do, you should leave  
> them alone as small changes can make huge differences in the output  
> - it's quite possible to miss finding what you're looking for by  
> using the wrong parameters.
> If all else fails, read the blast manual: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html
> http://www.ncbi.nlm.nih.gov/blast/tutorial/
> Or Read Ian Korfs' excellent book: http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
>
> Don't worry about the integer overflow bug as there's nothing you  
> can do about it. If you're interested, Google and Wikipedia are your  
> friends: http://en.wikipedia.org/wiki/Integer_overflow
>
>
> Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>> bounces@...] On Behalf Of Jonas Schaer
>> Sent: Tuesday, 7 July 2009 12:14 a.m.
>> To: BioPerl List; Chris Fields
>> Subject: Re: [Bioperl-l] different results with remote-blast skript
>>
>> Hi guys, thanks for your answers so far.
>> @jason: integer overflow in blast.... sorry, but what do you mean  
>> by that?
>> how can I fix it...?
>>
>> Since I never really changed any parameters I thought them all to  
>> be default.
>> whatever, I tried to get "better" results with my prog by changing
>> these:
>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>> $
>> Bio
>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} =  
>> '1';
>> with no effect...I guess these were default values anyway.
>>
>> So please maybe you can tell me all the other parameters I can  
>> change with my
>> perl-skript AND how to do that?
>> Unfortunately both, perl and the blast-algorithm are pretty much  
>> new to me,
>> maybe thats why I just cannot find out how to do that on my own... :/
>>
>> Here is the output I get with my remote-blast skript:
>> ##############################################################################
>> ###################################
>> Query Name:
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
>> L
>>        hit name is ref|XP_001702807.1|
>>                score is 442
>> BLASTP 2.2.21+
>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.  
>> Schaffer,
>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman  
>> (1997), "Gapped
>> BLAST and PSI-BLAST: a new generation of protein database search  
>> programs",
>> Nucleic Acids Res. 25:3389-3402.
>>
>>
>> Reference for composition-based statistics: Alejandro A.
>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L.  
>> Spouge, Yuri
>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),  
>> "Improving the
>> accuracy of PSI-BLAST protein database searches with composition-
>> based
>> statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
>>
>>
>> RID: 53STX5G2013
>>
>>
>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>> from WGS projects
>>           9,252,587 sequences; 3,169,972,781 total letters Query=
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
>> ATGPDPDDEYE
>> Length=150
>>
>>
>>                                                                    
>> Score
>> E
>> Sequences producing significant alignments:                        
>> (Bits)
>> Value
>>
>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...    
>> 174
>> 2e-42
>>
>>
>> ALIGNMENTS
>> >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
>> Length=303
>>
>> Score =  174 bits (442),  Expect = 2e-42, Method: Composition-based  
>> stats.
>> Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps =  
>> 0/150 (0%)
>>
>> Query  1    
>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds 60
>>            
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>> Sbjct  154  
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>> 213
>>
>> Query  61    
>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
>> 120
>>            
>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>> Sbjct  214  
>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>> 273
>>
>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>>            AWHERDDNAFRQAHQNTAMATGPDPDDEYE
>> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>>
>>
>>
>>  Database: All non-redundant GenBank CDS translations+PDB+SwissProt
>> +PIR+PRF
>> excluding environmental samples from WGS projects
>>    Posted date:  Jul 5, 2009  4:41 AM
>>  Number of letters in database: -1,124,994,511
>>  Number of sequences in database:  9,252,587
>>
>> Lambda     K      H
>>   0.309    0.122    0.345
>> Gapped
>> Lambda     K      H
>>   0.267   0.0410    0.140
>> Matrix: BLOSUM62
>> Gap Penalties: Existence: 11, Extension: 1
>> Number of Sequences: 9252587
>> Number of Hits to DB: 60273703
>> Number of extensions: 1448367
>> Number of successful extensions: 2103
>> Number of sequences better than 10: 0
>> Number of HSP's better than 10 without gapping: 0
>> Number of HSP's gapped: 2113
>> Number of HSP's successfully gapped: 0
>> Length of query: 150
>> Length of database: 3169972781
>> Length adjustment: 113
>> Effective length of query: 37
>> Effective length of database: 2124430450
>> Effective search space: 78603926650
>> Effective search space used: 78603926650
>> T: 11
>> A: 40
>> X1: 16 (7.1 bits)
>> X2: 38 (14.6 bits)
>> X3: 64 (24.7 bits)
>> S1: 42 (20.8 bits)
>> S2: 74 (33.1 bits)
>>
>> ##############################################################################
>> ###################################
>> and here are the hits (?) of the blast-algorithm on the ncbi-
>> homepage with
>> the same query of course:
>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...    
>> 300
>> 3e-80
>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA [Acyrtho...  
>> 36.2
>> 1.1
>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880 [Blautia...  
>> 35.4
>> 1.8
>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania brazil...  
>> 34.3
>> 4.2
>> ref|XP_680841.1|  hypothetical protein AN7572.2 [Aspergillus n...  
>> 33.5
>> 6.0
>> ref|YP_001768110.1|  hypothetical protein M446_1150 [Methyloba...  
>> 33.5
>> 7.0
>> ##############################################################################
>> ###################################at
>> least the first hit is the same, but even there there is a  
>> different score
>> and e-value.
>>
>> thanks so much for any help :)
>> regards, jonas
>>
>>
>> ----- Original Message -----
>> From: "Chris Fields" <cjfields@...>
>> To: "Jason Stajich" <jason@...>
>> Cc: "Smithies, Russell" <Russell.Smithies@...>;  
>> "'BioPerl
>> List'" <bioperl-l@...>; "'Jonas Schaer'"
>> <Jonas_Schaer@...>
>> Sent: Monday, July 06, 2009 12:51 AM
>> Subject: Re: [Bioperl-l] different results with remote-blast skript
>>
>>
>> > That inspires confidence ;>
>> >
>> > chris
>> >
>> > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>> >
>> >> integer overflow in blast....
>> >>
>> >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>> >>
>> >>> I'd guess it's a difference in the parameters used.
>> >>> Interesting that both have the number of letters in the db as
>> >>> "-1,125,070,205", I assume that's a bug  :-)
>> >>>
>> >>> Stats from your remote_blast:
>> >>>
>> >>> 'stats' => {
>> >>>            'S1' => '42',
>> >>>            'S1_bits' => '20.8',
>> >>>            'lambda' => '0.309',
>> >>>            'entropy' => '0.345',
>> >>>            'kappa_gapped' => '0.0410',
>> >>>            'T' => '11',
>> >>>            'kappa' => '0.122',
>> >>>            'X3_bits' => '24.7',
>> >>>            'X1' => '16',
>> >>>            'lambda_gapped' => '0.267',
>> >>>            'X2' => '38',
>> >>>            'S2' => '74',
>> >>>            'seqs_better_than_cutoff' => '0',
>> >>>            'posted_date' => 'Jul 4, 2009  4:41 AM',
>> >>>            'Hits_to_DB' => '60102303',
>> >>>            'dbletters' => '-1125070205',
>> >>>            'A' => '40',
>> >>>            'num_successful_extensions' => '2004',
>> >>>            'num_extensions' => '1436892',
>> >>>            'X1_bits' => '7.1',
>> >>>            'X3' => '64',
>> >>>            'entropy_gapped' => '0.140',
>> >>>            'dbentries' => '9252258',
>> >>>            'X2_bits' => '14.6',
>> >>>            'S2_bits' => '33.1'
>> >>>          }
>> >>>
>> >>>
>> >>> Stats from a blast done on the NCBI webpage:
>> >>>
>> >>> Database: All non-redundant GenBank CDS translations+PDB
>> +SwissProt
>> >>> +PIR+PRF
>> >>> excluding environmental samples from WGS projects
>> >>>   Posted date:  Jul 4, 2009  4:41 AM
>> >>> Number of letters in database: -1,125,070,205
>> >>> Number of sequences in database:  9,252,258
>> >>>
>> >>> Lambda     K      H
>> >>>  0.309    0.124    0.340
>> >>> Gapped
>> >>> Lambda     K      H
>> >>>  0.267   0.0410    0.140
>> >>> Matrix: BLOSUM62
>> >>> Gap Penalties: Existence: 11, Extension: 1
>> >>> Number of Sequences: 9252258
>> >>> Number of Hits to DB: 86493230
>> >>> Number of extensions: 3101413
>> >>> Number of successful extensions: 9001
>> >>> Number of sequences better than 100: 65
>> >>> Number of HSP's better than 100 without gapping: 0
>> >>> Number of HSP's gapped: 9000
>> >>> Number of HSP's successfully gapped: 66
>> >>> Length of query: 150
>> >>> Length of database: 3169897087
>> >>> Length adjustment: 113
>> >>> Effective length of query: 37
>> >>> Effective length of database: 2124391933
>> >>> Effective search space: 78602501521
>> >>> Effective search space used: 78602501521
>> >>> T: 11
>> >>> A: 40
>> >>> X1: 16 (7.1 bits)
>> >>> X2: 38 (14.6 bits)
>> >>> X3: 64 (24.7 bits)
>> >>> S1: 42 (20.8 bits)
>> >>> S2: 65 (29.6 bits)
>> >>>
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>> >>>> bounces@...] On Behalf Of Jonas Schaer
>> >>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>> >>>> To: BioPerl List
>> >>>> Subject: [Bioperl-l] different results with remote-blast skript
>> >>>>
>> >>>> Hi again :)
>> >>>> please, I only have this little question:
>> >>>> why do I get different results with my remote::blast perl skript
>> >>>> then on the
>> >>>> ncbi blast homepage?
>> >>>> I am using blastp, the query is an amino-sequence (different
>> >>>> results with any
>> >>>> sequence, differences not only in number of hits but even in e-
>> >>>> values, scores
>> >>>> etc...), the database is 'nr'.
>> >>>> PLEASE help me,
>> >>>> thank you in advance,
>> >>>> Jonas
>> >>>>
>> >>>> ps: my skript:
>> >>>>
>> ##############################################################################
>> >>>> ##
>> >>>> use Bio::Seq::SeqFactory;
>> >>>> use Bio::Tools::Run::RemoteBlast;
>> >>>> use strict;
>> >>>> my @blast_report;
>> >>>> my $prog = 'blastp';
>> >>>> my $db   = 'nr';
>> >>>> my $e_val= '1e-10';
>> >>>> #my $e_val= '10';
>> >>>> my @params = ( '-prog' => $prog,
>> >>>>        '-data' => $db,
>> >>>>        '-expect' => $e_val,
>> >>>>        '-readmethod' => 'SearchIO' );
>> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>> >>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>> >>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>> >>>> $
>> >>>> Bio
>> >
>> >>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>> >>>> = '1';
>> >>>>
>> >>>> my
>> >>>> $
>> >>>> blast_seq
>> >>>>  
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>> >>>>
>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
>> >>>> PDDEYE';
>> >>>> #$v is just to turn on and off the messages
>> >>>> my $v = 1;
>> >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>> >>>> 'Bio::PrimarySeq');
>> >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>> >>>> "$blast_seq");
>> >>>> my $filename='temp2.out';
>> >>>> my $r = $factory->submit_blast($seq);
>> >>>> print STDERR "waiting..." if( $v > 0 );
>> >>>>   while ( my @rids = $factory->each_rid )
>> >>>>   {
>> >>>>       foreach my $rid ( @rids )
>> >>>>       {
>> >>>>           my $rc = $factory->retrieve_blast($rid);
>> >>>>           if( !ref($rc) )
>> >>>>           {
>> >>>>               if( $rc < 0 )
>> >>>>               {
>> >>>>                   $factory->remove_rid($rid);
>> >>>>               }
>> >>>>               print STDERR "." if ( $v > 0 );
>> >>>>           }
>> >>>>               else
>> >>>>               {
>> >>>>                   my $result = $rc->next_result();
>> >>>>                   $factory->save_output($filename);
>> >>>>                   $factory->remove_rid($rid);
>> >>>>                   print "\nQuery Name: ", $result->query_name(),
>> >>>> "\n";
>> >>>>                   while ( my $hit = $result->next_hit )
>> >>>>                   {
>> >>>>                       next unless ( $v > 0);
>> >>>>                       print "\thit name is ", $hit->name, "\n";
>> >>>>                       while( my $hsp = $hit->next_hsp )
>> >>>>                       {
>> >>>>                           print "\t\tscore is ", $hsp->score,  
>> "\n";
>> >>>>                       }
>> >>>>                   }
>> >>>>               }
>> >>>>       }
>> >>>>
>> >>>>
>> >>>>   }
>> >>>> @blast_report = get_file_data ($filename);
>> >>>> return @blast_report;
>> >>>>
>> ##############################################################################
>> >>>> ####
>> >>>> _______________________________________________
>> >>>> Bioperl-l mailing list
>> >>>> Bioperl-l@...
>> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>> =
>> >>> =
>> >>>  
>> =====================================================================
>> >>> Attention: The information contained in this message and/or
>> >>> attachments
>> >>> from AgResearch Limited is intended only for the persons or  
>> entities
>> >>> to which it is addressed and may contain confidential and/or
>> >>> privileged
>> >>> material. Any review, retransmission, dissemination or other use
>> >>> of, or
>> >>> taking of any action in reliance upon, this information by  
>> persons or
>> >>> entities other than the intended recipients is prohibited by
>> >>> AgResearch
>> >>> Limited. If you have received this message in error, please  
>> notify
>> >>> the
>> >>> sender immediately.
>> >>> =
>> >>> =
>> >>>  
>> =====================================================================
>> >>>
>> >>> _______________________________________________
>> >>> Bioperl-l mailing list
>> >>> Bioperl-l@...
>> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>
>> >> --
>> >> Jason Stajich
>> >> jason@...
>> >>
>> >> _______________________________________________
>> >> Bioperl-l mailing list
>> >> Bioperl-l@...
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> ------------------------------------------------------------------------------
>> --
>>
>>
>>
>> No virus found in this incoming message.
>> Checked by AVG - www.avg.com
>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date:  
>> 07/05/09
>> 05:53:00
>
>
> --------------------------------------------------------------------------------
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release Date:  
> 07/05/09 17:54:00
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Cook, Malcolm-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Jonas,

If you want to continue to use the bioperl remoteblast interface, probably what you should do is simply call it twice.

Once, as you already know how to do, which will return without CDD results.

Secondly, to get the CDD results, call remoteblast a second time.  This time, using
-database => 'CDD'
-program => 'rpsblast'

However, the wrapper may object to the 'rpsblast' program.  It is not listed in the POD - http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm)   If so, my guess is that changing the perl wrapper to allow rpsblast will "just work" (tm).  I've cc:ed cjfields@... for his opinion on this.

Also, you might want to perform the CDD search first, especially if you are streaming results to eyeball that might like something to look at while the second (presumably longer) search is running.

Cheers,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri


> -----Original Message-----
> From: bioperl-l-bounces@...
> [mailto:bioperl-l-bounces@...] On Behalf Of
> Jonas Schaer
> Sent: Thursday, July 09, 2009 5:16 AM
> To: BioPerl List; Smithies, Russell
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Hi guys,
> Thank you all so much for your help and patience :). Of
> course you were right and I finaly found the right
> put-parameter to get exactly the same hits as on the homepage.
> I do have an other question though :)...
> I now want to include a search for conserved domains, but
> when I try to use the CDD_SEARCH-parameter
> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
> sub:CDD_SEARCH)
> like the other put-parameters the way chris once told
> me(works fine with the other params):
>
> my %put = (
>     WORD_SIZE => 3,
>     HITLIST_SIZE => 100,
>     THRESHOLD => 11,
>     FILTER => 'R',
>     GENETIC_CODE => 1,
>     CDD_SEARCH => 'on'
> ###I tried it
> with 'true' and '1', too.
>
> );
>
> for my $putName (keys %put) {
>     $factory->submit_parameter($putName,$put{$putName});
> }
>
>
> ...an exception is thrown:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: CDD_SEARCH is not a valid PUT parameter.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> C:/Perl/site/lib/Bio/Tools
> /Run/RemoteBlast.pm:325
> STACK: main::blast_a_sequence firsteval0.8.pm:383
> STACK: main::blast_it firsteval0.8.pm:288
> STACK: firsteval0.8.pm:35
> ----------------------------------------------------------- .
> I guess somehow this could be the solution to my problem:
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
> ub:RID-for-Simultaneous
> , but unfortunately I don't understand what to do.
> I'm so sorry to bother you with this but please help me once more...:)
>
> Best regards and thanks in advance,
> Jonas
>
> ----- Original Message -----
> From: "Smithies, Russell" <Russell.Smithies@...>
> To: "'Jonas Schaer'" <Brotelzwieb@...>
> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
> <bioperl-l@...>
> Sent: Monday, July 06, 2009 10:56 PM
> Subject: RE: [Bioperl-l] different results with remote-blast skript
>
>
> Hi Jonas,
> You can't just play with the BLAST parameters and hope for a "better"
> result.
> I'd suggest that if you aren't sure what they do, you should
> leave them
> alone as small changes can make huge differences in the
> output - it's quite
> possible to miss finding what you're looking for by using the wrong
> parameters.
> If all else fails, read the blast manual:
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
> _all.html
> http://www.ncbi.nlm.nih.gov/blast/tutorial/
> Or Read Ian Korfs' excellent book:
> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3

>
> Don't worry about the integer overflow bug as there's nothing
> you can do
> about it. If you're interested, Google and Wikipedia are your
> friends:
> http://en.wikipedia.org/wiki/Integer_overflow
>
>
> Russell
>
> > -----Original Message-----
> > From: bioperl-l-bounces@... [mailto:bioperl-l-
> > bounces@...] On Behalf Of Jonas Schaer
> > Sent: Tuesday, 7 July 2009 12:14 a.m.
> > To: BioPerl List; Chris Fields
> > Subject: Re: [Bioperl-l] different results with remote-blast skript
> >
> > Hi guys, thanks for your answers so far.
> > @jason: integer overflow in blast.... sorry, but what do
> you mean by that?
> > how can I fix it...?
> >
> > Since I never really changed any parameters I thought them
> all to be
> > default.
> > whatever, I tried to get "better" results with my prog by changing
> > these:
> > $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> > $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >
> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
> STICS'} =
> > '1';
> > with no effect...I guess these were default values anyway.
> >
> > So please maybe you can tell me all the other parameters I
> can change with
> > my
> > perl-skript AND how to do that?
> > Unfortunately both, perl and the blast-algorithm are pretty
> much new to
> > me,
> > maybe thats why I just cannot find out how to do that on my
> own... :/
> >
> > Here is the output I get with my remote-blast skript:
> >
> ##############################################################
> ################
> > ###################################
> >  Query Name:
> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
> > L
> >         hit name is ref|XP_001702807.1|
> >                 score is 442
> > BLASTP 2.2.21+
> > Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
> A. Schaffer,
> > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> Lipman (1997),
> > "Gapped
> > BLAST and PSI-BLAST: a new generation of protein database search
> > programs",
> > Nucleic Acids Res. 25:3389-3402.
> >
> >
> > Reference for composition-based statistics: Alejandro A.
> > Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
> John L. Spouge,
> > Yuri
> > I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
> "Improving the
> > accuracy of PSI-BLAST protein database searches with
> composition-based
> > statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
> >
> >
> > RID: 53STX5G2013
> >
> >
> > Database: All non-redundant GenBank CDS
> > translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> > from WGS projects
> >            9,252,587 sequences; 3,169,972,781 total letters Query=
> >
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> >
> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
> > ATGPDPDDEYE
> > Length=150
> >
> >
> >
>        Score
> > E
> > Sequences producing significant alignments:
>       (Bits)
> > Value
> >
> > ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> reinhard...   174
> > 2e-42
> >
> >
> > ALIGNMENTS
> > >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
> >  gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
> > Length=303
> >
> >  Score =  174 bits (442),  Expect = 2e-42, Method:
> Composition-based
> > stats.
> >  Identities = 150/150 (100%), Positives = 150/150 (100%),
> Gaps = 0/150
> > (0%)
> >
> > Query  1
> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
> > 60
> >
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> > Sbjct  154
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> > 213
> >
> > Query  61
> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
> > 120
> >
> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> > Sbjct  214
> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> > 273
> >
> > Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
> >             AWHERDDNAFRQAHQNTAMATGPDPDDEYE
> > Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
> >
> >
> >
> >   Database: All non-redundant GenBank CDS
> > translations+PDB+SwissProt+PIR+PRF
> > excluding environmental samples from WGS projects
> >     Posted date:  Jul 5, 2009  4:41 AM
> >   Number of letters in database: -1,124,994,511
> >   Number of sequences in database:  9,252,587
> >
> > Lambda     K      H
> >    0.309    0.122    0.345
> > Gapped
> > Lambda     K      H
> >    0.267   0.0410    0.140
> > Matrix: BLOSUM62
> > Gap Penalties: Existence: 11, Extension: 1
> > Number of Sequences: 9252587
> > Number of Hits to DB: 60273703
> > Number of extensions: 1448367
> > Number of successful extensions: 2103
> > Number of sequences better than 10: 0
> > Number of HSP's better than 10 without gapping: 0
> > Number of HSP's gapped: 2113
> > Number of HSP's successfully gapped: 0
> > Length of query: 150
> > Length of database: 3169972781
> > Length adjustment: 113
> > Effective length of query: 37
> > Effective length of database: 2124430450
> > Effective search space: 78603926650
> > Effective search space used: 78603926650
> > T: 11
> > A: 40
> > X1: 16 (7.1 bits)
> > X2: 38 (14.6 bits)
> > X3: 64 (24.7 bits)
> > S1: 42 (20.8 bits)
> > S2: 74 (33.1 bits)
> >
> >
> ##############################################################
> ################
> > ###################################
> > and here are the hits (?) of the blast-algorithm on the
> ncbi-homepage with
> > the same query of course:
> > ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> reinhard...   300
> > 3e-80
> > ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
> [Acyrtho...  36.2
> > 1.1
> > ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
> [Blautia...  35.4
> > 1.8
> > ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
> brazil...  34.3
> > 4.2
> > ref|XP_680841.1|  hypothetical protein AN7572.2
> [Aspergillus n...  33.5
> > 6.0
> > ref|YP_001768110.1|  hypothetical protein M446_1150
> [Methyloba...  33.5
> > 7.0
> >
> ##############################################################
> ################
> > ###################################at
> > least the first hit is the same, but even there there is a
> different score
> > and e-value.
> >
> > thanks so much for any help :)
> > regards, jonas
> >
> >
> > ----- Original Message -----
> > From: "Chris Fields" <cjfields@...>
> > To: "Jason Stajich" <jason@...>
> > Cc: "Smithies, Russell"
> <Russell.Smithies@...>; "'BioPerl
> > List'" <bioperl-l@...>; "'Jonas Schaer'"
> > <Jonas_Schaer@...>
> > Sent: Monday, July 06, 2009 12:51 AM
> > Subject: Re: [Bioperl-l] different results with remote-blast skript
> >
> >
> > > That inspires confidence ;>
> > >
> > > chris
> > >
> > > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> > >
> > >> integer overflow in blast....
> > >>
> > >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> > >>
> > >>> I'd guess it's a difference in the parameters used.
> > >>> Interesting that both have the number of letters in the db as
> > >>> "-1,125,070,205", I assume that's a bug  :-)
> > >>>
> > >>> Stats from your remote_blast:
> > >>>
> > >>> 'stats' => {
> > >>>            'S1' => '42',
> > >>>            'S1_bits' => '20.8',
> > >>>            'lambda' => '0.309',
> > >>>            'entropy' => '0.345',
> > >>>            'kappa_gapped' => '0.0410',
> > >>>            'T' => '11',
> > >>>            'kappa' => '0.122',
> > >>>            'X3_bits' => '24.7',
> > >>>            'X1' => '16',
> > >>>            'lambda_gapped' => '0.267',
> > >>>            'X2' => '38',
> > >>>            'S2' => '74',
> > >>>            'seqs_better_than_cutoff' => '0',
> > >>>            'posted_date' => 'Jul 4, 2009  4:41 AM',
> > >>>            'Hits_to_DB' => '60102303',
> > >>>            'dbletters' => '-1125070205',
> > >>>            'A' => '40',
> > >>>            'num_successful_extensions' => '2004',
> > >>>            'num_extensions' => '1436892',
> > >>>            'X1_bits' => '7.1',
> > >>>            'X3' => '64',
> > >>>            'entropy_gapped' => '0.140',
> > >>>            'dbentries' => '9252258',
> > >>>            'X2_bits' => '14.6',
> > >>>            'S2_bits' => '33.1'
> > >>>          }
> > >>>
> > >>>
> > >>> Stats from a blast done on the NCBI webpage:
> > >>>
> > >>> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt
> > >>> +PIR+PRF
> > >>> excluding environmental samples from WGS projects
> > >>>   Posted date:  Jul 4, 2009  4:41 AM
> > >>> Number of letters in database: -1,125,070,205
> > >>> Number of sequences in database:  9,252,258
> > >>>
> > >>> Lambda     K      H
> > >>>  0.309    0.124    0.340
> > >>> Gapped
> > >>> Lambda     K      H
> > >>>  0.267   0.0410    0.140
> > >>> Matrix: BLOSUM62
> > >>> Gap Penalties: Existence: 11, Extension: 1
> > >>> Number of Sequences: 9252258
> > >>> Number of Hits to DB: 86493230
> > >>> Number of extensions: 3101413
> > >>> Number of successful extensions: 9001
> > >>> Number of sequences better than 100: 65
> > >>> Number of HSP's better than 100 without gapping: 0
> > >>> Number of HSP's gapped: 9000
> > >>> Number of HSP's successfully gapped: 66
> > >>> Length of query: 150
> > >>> Length of database: 3169897087
> > >>> Length adjustment: 113
> > >>> Effective length of query: 37
> > >>> Effective length of database: 2124391933
> > >>> Effective search space: 78602501521
> > >>> Effective search space used: 78602501521
> > >>> T: 11
> > >>> A: 40
> > >>> X1: 16 (7.1 bits)
> > >>> X2: 38 (14.6 bits)
> > >>> X3: 64 (24.7 bits)
> > >>> S1: 42 (20.8 bits)
> > >>> S2: 65 (29.6 bits)
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
> > >>>> bounces@...] On Behalf Of Jonas Schaer
> > >>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> > >>>> To: BioPerl List
> > >>>> Subject: [Bioperl-l] different results with remote-blast skript
> > >>>>
> > >>>> Hi again :)
> > >>>> please, I only have this little question:
> > >>>> why do I get different results with my remote::blast
> perl skript
> > >>>> then on the
> > >>>> ncbi blast homepage?
> > >>>> I am using blastp, the query is an amino-sequence (different
> > >>>> results with any
> > >>>> sequence, differences not only in number of hits but even in e-
> > >>>> values, scores
> > >>>> etc...), the database is 'nr'.
> > >>>> PLEASE help me,
> > >>>> thank you in advance,
> > >>>> Jonas
> > >>>>
> > >>>> ps: my skript:
> > >>>>
> >
> ##############################################################
> ################
> > >>>> ##
> > >>>> use Bio::Seq::SeqFactory;
> > >>>> use Bio::Tools::Run::RemoteBlast;
> > >>>> use strict;
> > >>>> my @blast_report;
> > >>>> my $prog = 'blastp';
> > >>>> my $db   = 'nr';
> > >>>> my $e_val= '1e-10';
> > >>>> #my $e_val= '10';
> > >>>> my @params = ( '-prog' => $prog,
> > >>>>        '-data' => $db,
> > >>>>        '-expect' => $e_val,
> > >>>>        '-readmethod' => 'SearchIO' );
> > >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> > >>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> > >>>> $
> > >>>> Bio
> > >>>>
> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
> > >>>> = '1';
> > >>>>
> > >>>> my
> > >>>> $
> > >>>> blast_seq
> > >>>>
> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
> > >>>>
> >
> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
> AFRQAHQNTAMATGPD
> > >>>> PDDEYE';
> > >>>> #$v is just to turn on and off the messages
> > >>>> my $v = 1;
> > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
> > >>>> 'Bio::PrimarySeq');
> > >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
> > >>>> "$blast_seq");
> > >>>> my $filename='temp2.out';
> > >>>> my $r = $factory->submit_blast($seq);
> > >>>> print STDERR "waiting..." if( $v > 0 );
> > >>>>   while ( my @rids = $factory->each_rid )
> > >>>>   {
> > >>>>       foreach my $rid ( @rids )
> > >>>>       {
> > >>>>           my $rc = $factory->retrieve_blast($rid);
> > >>>>           if( !ref($rc) )
> > >>>>           {
> > >>>>               if( $rc < 0 )
> > >>>>               {
> > >>>>                   $factory->remove_rid($rid);
> > >>>>               }
> > >>>>               print STDERR "." if ( $v > 0 );
> > >>>>           }
> > >>>>               else
> > >>>>               {
> > >>>>                   my $result = $rc->next_result();
> > >>>>                   $factory->save_output($filename);
> > >>>>                   $factory->remove_rid($rid);
> > >>>>                   print "\nQuery Name: ",
> $result->query_name(),
> > >>>> "\n";
> > >>>>                   while ( my $hit = $result->next_hit )
> > >>>>                   {
> > >>>>                       next unless ( $v > 0);
> > >>>>                       print "\thit name is ", $hit->name, "\n";
> > >>>>                       while( my $hsp = $hit->next_hsp )
> > >>>>                       {
> > >>>>                           print "\t\tscore is ",
> $hsp->score, "\n";
> > >>>>                       }
> > >>>>                   }
> > >>>>               }
> > >>>>       }
> > >>>>
> > >>>>
> > >>>>   }
> > >>>> @blast_report = get_file_data ($filename);
> > >>>> return @blast_report;
> > >>>>
> >
> ##############################################################
> ################
> > >>>> ####
> > >>>> _______________________________________________
> > >>>> Bioperl-l mailing list
> > >>>> Bioperl-l@...
> > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>> =
> > >>> =
> > >>>
> =====================================================================
> > >>> Attention: The information contained in this message and/or
> > >>> attachments
> > >>> from AgResearch Limited is intended only for the
> persons or entities
> > >>> to which it is addressed and may contain confidential and/or
> > >>> privileged
> > >>> material. Any review, retransmission, dissemination or other use
> > >>> of, or
> > >>> taking of any action in reliance upon, this information
> by persons or
> > >>> entities other than the intended recipients is prohibited by
> > >>> AgResearch
> > >>> Limited. If you have received this message in error,
> please notify
> > >>> the
> > >>> sender immediately.
> > >>> =
> > >>> =
> > >>>
> =====================================================================
> > >>>
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l@...
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>
> > >> --
> > >> Jason Stajich
> > >> jason@...
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l@...
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> --------------------------------------------------------------
> ----------------
> > --
> >
> >
> >
> > No virus found in this incoming message.
> > Checked by AVG - www.avg.com
> > Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
> Date: 07/05/09
> > 05:53:00
>
>
> --------------------------------------------------------------
> ------------------
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
> Date: 07/05/09
> 17:54:00
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Chris Fields-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I've scheduled this tentatively for the 1.6 release series (just not  
sure when yet).  It may work as is, but I haven't tried it out yet  
(and am hazarding to guess it only retrieves the single main RID at  
the moment).

chris

On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:

> Jonas,
>
> If you want to continue to use the bioperl remoteblast interface,  
> probably what you should do is simply call it twice.
>
> Once, as you already know how to do, which will return without CDD  
> results.
>
> Secondly, to get the CDD results, call remoteblast a second time.  
> This time, using
> -database => 'CDD'
> -program => 'rpsblast'
>
> However, the wrapper may object to the 'rpsblast' program.  It is  
> not listed in the POD - http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm)
>    If so, my guess is that changing the perl wrapper to allow  
> rpsblast will "just work" (tm).  I've cc:ed cjfields@... for  
> his opinion on this.
>
> Also, you might want to perform the CDD search first, especially if  
> you are streaming results to eyeball that might like something to  
> look at while the second (presumably longer) search is running.
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces@...
>> [mailto:bioperl-l-bounces@...] On Behalf Of
>> Jonas Schaer
>> Sent: Thursday, July 09, 2009 5:16 AM
>> To: BioPerl List; Smithies, Russell
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>> Hi guys,
>> Thank you all so much for your help and patience :). Of
>> course you were right and I finaly found the right
>> put-parameter to get exactly the same hits as on the homepage.
>> I do have an other question though :)...
>> I now want to include a search for conserved domains, but
>> when I try to use the CDD_SEARCH-parameter
>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
>> sub:CDD_SEARCH)
>> like the other put-parameters the way chris once told
>> me(works fine with the other params):
>>
>> my %put = (
>>    WORD_SIZE => 3,
>>    HITLIST_SIZE => 100,
>>    THRESHOLD => 11,
>>    FILTER => 'R',
>>    GENETIC_CODE => 1,
>>    CDD_SEARCH => 'on'
>> ###I tried it
>> with 'true' and '1', too.
>>
>> );
>>
>> for my $putName (keys %put) {
>>    $factory->submit_parameter($putName,$put{$putName});
>> }
>>
>>
>> ...an exception is thrown:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: CDD_SEARCH is not a valid PUT parameter.
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>> C:/Perl/site/lib/Bio/Tools
>> /Run/RemoteBlast.pm:325
>> STACK: main::blast_a_sequence firsteval0.8.pm:383
>> STACK: main::blast_it firsteval0.8.pm:288
>> STACK: firsteval0.8.pm:35
>> ----------------------------------------------------------- .
>> I guess somehow this could be the solution to my problem:
>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
>> ub:RID-for-Simultaneous
>> , but unfortunately I don't understand what to do.
>> I'm so sorry to bother you with this but please help me once  
>> more...:)
>>
>> Best regards and thanks in advance,
>> Jonas
>>
>> ----- Original Message -----
>> From: "Smithies, Russell" <Russell.Smithies@...>
>> To: "'Jonas Schaer'" <Brotelzwieb@...>
>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
>> <bioperl-l@...>
>> Sent: Monday, July 06, 2009 10:56 PM
>> Subject: RE: [Bioperl-l] different results with remote-blast skript
>>
>>
>> Hi Jonas,
>> You can't just play with the BLAST parameters and hope for a "better"
>> result.
>> I'd suggest that if you aren't sure what they do, you should
>> leave them
>> alone as small changes can make huge differences in the
>> output - it's quite
>> possible to miss finding what you're looking for by using the wrong
>> parameters.
>> If all else fails, read the blast manual:
>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
>> _all.html
>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
>> Or Read Ian Korfs' excellent book:
>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
>>
>> Don't worry about the integer overflow bug as there's nothing
>> you can do
>> about it. If you're interested, Google and Wikipedia are your
>> friends:
>> http://en.wikipedia.org/wiki/Integer_overflow
>>
>>
>> Russell
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>> bounces@...] On Behalf Of Jonas Schaer
>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
>>> To: BioPerl List; Chris Fields
>>> Subject: Re: [Bioperl-l] different results with remote-blast skript
>>>
>>> Hi guys, thanks for your answers so far.
>>> @jason: integer overflow in blast.... sorry, but what do
>> you mean by that?
>>> how can I fix it...?
>>>
>>> Since I never really changed any parameters I thought them
>> all to be
>>> default.
>>> whatever, I tried to get "better" results with my prog by changing
>>> these:
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>
>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
>> STICS'} =
>>> '1';
>>> with no effect...I guess these were default values anyway.
>>>
>>> So please maybe you can tell me all the other parameters I
>> can change with
>>> my
>>> perl-skript AND how to do that?
>>> Unfortunately both, perl and the blast-algorithm are pretty
>> much new to
>>> me,
>>> maybe thats why I just cannot find out how to do that on my
>> own... :/
>>>
>>> Here is the output I get with my remote-blast skript:
>>>
>> ##############################################################
>> ################
>>> ###################################
>>> Query Name:
>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
>>> L
>>>        hit name is ref|XP_001702807.1|
>>>                score is 442
>>> BLASTP 2.2.21+
>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
>> A. Schaffer,
>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>> Lipman (1997),
>>> "Gapped
>>> BLAST and PSI-BLAST: a new generation of protein database search
>>> programs",
>>> Nucleic Acids Res. 25:3389-3402.
>>>
>>>
>>> Reference for composition-based statistics: Alejandro A.
>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
>> John L. Spouge,
>>> Yuri
>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
>> "Improving the
>>> accuracy of PSI-BLAST protein database searches with
>> composition-based
>>> statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
>>>
>>>
>>> RID: 53STX5G2013
>>>
>>>
>>> Database: All non-redundant GenBank CDS
>>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>> from WGS projects
>>>           9,252,587 sequences; 3,169,972,781 total letters Query=
>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
>>>
>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
>>> ATGPDPDDEYE
>>> Length=150
>>>
>>>
>>>
>>       Score
>>> E
>>> Sequences producing significant alignments:
>>      (Bits)
>>> Value
>>>
>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>> reinhard...   174
>>> 2e-42
>>>
>>>
>>> ALIGNMENTS
>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
>>> Length=303
>>>
>>> Score =  174 bits (442),  Expect = 2e-42, Method:
>> Composition-based
>>> stats.
>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
>> Gaps = 0/150
>>> (0%)
>>>
>>> Query  1
>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
>>> 60
>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>> Sbjct  154
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>> 213
>>>
>>> Query  61
>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
>>> 120
>>>
>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>> Sbjct  214
>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>> 273
>>>
>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>>>            AWHERDDNAFRQAHQNTAMATGPDPDDEYE
>>> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>>>
>>>
>>>
>>>  Database: All non-redundant GenBank CDS
>>> translations+PDB+SwissProt+PIR+PRF
>>> excluding environmental samples from WGS projects
>>>    Posted date:  Jul 5, 2009  4:41 AM
>>>  Number of letters in database: -1,124,994,511
>>>  Number of sequences in database:  9,252,587
>>>
>>> Lambda     K      H
>>>   0.309    0.122    0.345
>>> Gapped
>>> Lambda     K      H
>>>   0.267   0.0410    0.140
>>> Matrix: BLOSUM62
>>> Gap Penalties: Existence: 11, Extension: 1
>>> Number of Sequences: 9252587
>>> Number of Hits to DB: 60273703
>>> Number of extensions: 1448367
>>> Number of successful extensions: 2103
>>> Number of sequences better than 10: 0
>>> Number of HSP's better than 10 without gapping: 0
>>> Number of HSP's gapped: 2113
>>> Number of HSP's successfully gapped: 0
>>> Length of query: 150
>>> Length of database: 3169972781
>>> Length adjustment: 113
>>> Effective length of query: 37
>>> Effective length of database: 2124430450
>>> Effective search space: 78603926650
>>> Effective search space used: 78603926650
>>> T: 11
>>> A: 40
>>> X1: 16 (7.1 bits)
>>> X2: 38 (14.6 bits)
>>> X3: 64 (24.7 bits)
>>> S1: 42 (20.8 bits)
>>> S2: 74 (33.1 bits)
>>>
>>>
>> ##############################################################
>> ################
>>> ###################################
>>> and here are the hits (?) of the blast-algorithm on the
>> ncbi-homepage with
>>> the same query of course:
>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>> reinhard...   300
>>> 3e-80
>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
>> [Acyrtho...  36.2
>>> 1.1
>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
>> [Blautia...  35.4
>>> 1.8
>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
>> brazil...  34.3
>>> 4.2
>>> ref|XP_680841.1|  hypothetical protein AN7572.2
>> [Aspergillus n...  33.5
>>> 6.0
>>> ref|YP_001768110.1|  hypothetical protein M446_1150
>> [Methyloba...  33.5
>>> 7.0
>>>
>> ##############################################################
>> ################
>>> ###################################at
>>> least the first hit is the same, but even there there is a
>> different score
>>> and e-value.
>>>
>>> thanks so much for any help :)
>>> regards, jonas
>>>
>>>
>>> ----- Original Message -----
>>> From: "Chris Fields" <cjfields@...>
>>> To: "Jason Stajich" <jason@...>
>>> Cc: "Smithies, Russell"
>> <Russell.Smithies@...>; "'BioPerl
>>> List'" <bioperl-l@...>; "'Jonas Schaer'"
>>> <Jonas_Schaer@...>
>>> Sent: Monday, July 06, 2009 12:51 AM
>>> Subject: Re: [Bioperl-l] different results with remote-blast skript
>>>
>>>
>>>> That inspires confidence ;>
>>>>
>>>> chris
>>>>
>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>>>>
>>>>> integer overflow in blast....
>>>>>
>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>>>>
>>>>>> I'd guess it's a difference in the parameters used.
>>>>>> Interesting that both have the number of letters in the db as
>>>>>> "-1,125,070,205", I assume that's a bug  :-)
>>>>>>
>>>>>> Stats from your remote_blast:
>>>>>>
>>>>>> 'stats' => {
>>>>>>           'S1' => '42',
>>>>>>           'S1_bits' => '20.8',
>>>>>>           'lambda' => '0.309',
>>>>>>           'entropy' => '0.345',
>>>>>>           'kappa_gapped' => '0.0410',
>>>>>>           'T' => '11',
>>>>>>           'kappa' => '0.122',
>>>>>>           'X3_bits' => '24.7',
>>>>>>           'X1' => '16',
>>>>>>           'lambda_gapped' => '0.267',
>>>>>>           'X2' => '38',
>>>>>>           'S2' => '74',
>>>>>>           'seqs_better_than_cutoff' => '0',
>>>>>>           'posted_date' => 'Jul 4, 2009  4:41 AM',
>>>>>>           'Hits_to_DB' => '60102303',
>>>>>>           'dbletters' => '-1125070205',
>>>>>>           'A' => '40',
>>>>>>           'num_successful_extensions' => '2004',
>>>>>>           'num_extensions' => '1436892',
>>>>>>           'X1_bits' => '7.1',
>>>>>>           'X3' => '64',
>>>>>>           'entropy_gapped' => '0.140',
>>>>>>           'dbentries' => '9252258',
>>>>>>           'X2_bits' => '14.6',
>>>>>>           'S2_bits' => '33.1'
>>>>>>         }
>>>>>>
>>>>>>
>>>>>> Stats from a blast done on the NCBI webpage:
>>>>>>
>>>>>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt
>>>>>> +PIR+PRF
>>>>>> excluding environmental samples from WGS projects
>>>>>>  Posted date:  Jul 4, 2009  4:41 AM
>>>>>> Number of letters in database: -1,125,070,205
>>>>>> Number of sequences in database:  9,252,258
>>>>>>
>>>>>> Lambda     K      H
>>>>>> 0.309    0.124    0.340
>>>>>> Gapped
>>>>>> Lambda     K      H
>>>>>> 0.267   0.0410    0.140
>>>>>> Matrix: BLOSUM62
>>>>>> Gap Penalties: Existence: 11, Extension: 1
>>>>>> Number of Sequences: 9252258
>>>>>> Number of Hits to DB: 86493230
>>>>>> Number of extensions: 3101413
>>>>>> Number of successful extensions: 9001
>>>>>> Number of sequences better than 100: 65
>>>>>> Number of HSP's better than 100 without gapping: 0
>>>>>> Number of HSP's gapped: 9000
>>>>>> Number of HSP's successfully gapped: 66
>>>>>> Length of query: 150
>>>>>> Length of database: 3169897087
>>>>>> Length adjustment: 113
>>>>>> Effective length of query: 37
>>>>>> Effective length of database: 2124391933
>>>>>> Effective search space: 78602501521
>>>>>> Effective search space used: 78602501521
>>>>>> T: 11
>>>>>> A: 40
>>>>>> X1: 16 (7.1 bits)
>>>>>> X2: 38 (14.6 bits)
>>>>>> X3: 64 (24.7 bits)
>>>>>> S1: 42 (20.8 bits)
>>>>>> S2: 65 (29.6 bits)
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>>>>>> bounces@...] On Behalf Of Jonas Schaer
>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>>>>> To: BioPerl List
>>>>>>> Subject: [Bioperl-l] different results with remote-blast skript
>>>>>>>
>>>>>>> Hi again :)
>>>>>>> please, I only have this little question:
>>>>>>> why do I get different results with my remote::blast
>> perl skript
>>>>>>> then on the
>>>>>>> ncbi blast homepage?
>>>>>>> I am using blastp, the query is an amino-sequence (different
>>>>>>> results with any
>>>>>>> sequence, differences not only in number of hits but even in e-
>>>>>>> values, scores
>>>>>>> etc...), the database is 'nr'.
>>>>>>> PLEASE help me,
>>>>>>> thank you in advance,
>>>>>>> Jonas
>>>>>>>
>>>>>>> ps: my skript:
>>>>>>>
>>>
>> ##############################################################
>> ################
>>>>>>> ##
>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>> use strict;
>>>>>>> my @blast_report;
>>>>>>> my $prog = 'blastp';
>>>>>>> my $db   = 'nr';
>>>>>>> my $e_val= '1e-10';
>>>>>>> #my $e_val= '10';
>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>       '-data' => $db,
>>>>>>>       '-expect' => $e_val,
>>>>>>>       '-readmethod' => 'SearchIO' );
>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>> $
>>>>>>> Bio
>>>>>>>
>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>>>>>> = '1';
>>>>>>>
>>>>>>> my
>>>>>>> $
>>>>>>> blast_seq
>>>>>>>
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>>>>>>
>>>
>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
>> AFRQAHQNTAMATGPD
>>>>>>> PDDEYE';
>>>>>>> #$v is just to turn on and off the messages
>>>>>>> my $v = 1;
>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>>>>>> 'Bio::PrimarySeq');
>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>>>>>>> "$blast_seq");
>>>>>>> my $filename='temp2.out';
>>>>>>> my $r = $factory->submit_blast($seq);
>>>>>>> print STDERR "waiting..." if( $v > 0 );
>>>>>>>  while ( my @rids = $factory->each_rid )
>>>>>>>  {
>>>>>>>      foreach my $rid ( @rids )
>>>>>>>      {
>>>>>>>          my $rc = $factory->retrieve_blast($rid);
>>>>>>>          if( !ref($rc) )
>>>>>>>          {
>>>>>>>              if( $rc < 0 )
>>>>>>>              {
>>>>>>>                  $factory->remove_rid($rid);
>>>>>>>              }
>>>>>>>              print STDERR "." if ( $v > 0 );
>>>>>>>          }
>>>>>>>              else
>>>>>>>              {
>>>>>>>                  my $result = $rc->next_result();
>>>>>>>                  $factory->save_output($filename);
>>>>>>>                  $factory->remove_rid($rid);
>>>>>>>                  print "\nQuery Name: ",
>> $result->query_name(),
>>>>>>> "\n";
>>>>>>>                  while ( my $hit = $result->next_hit )
>>>>>>>                  {
>>>>>>>                      next unless ( $v > 0);
>>>>>>>                      print "\thit name is ", $hit->name, "\n";
>>>>>>>                      while( my $hsp = $hit->next_hsp )
>>>>>>>                      {
>>>>>>>                          print "\t\tscore is ",
>> $hsp->score, "\n";
>>>>>>>                      }
>>>>>>>                  }
>>>>>>>              }
>>>>>>>      }
>>>>>>>
>>>>>>>
>>>>>>>  }
>>>>>>> @blast_report = get_file_data ($filename);
>>>>>>> return @blast_report;
>>>>>>>
>>>
>> ##############################################################
>> ################
>>>>>>> ####
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l@...
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>> =
>>>>>> =
>>>>>>
>> =====================================================================
>>>>>> Attention: The information contained in this message and/or
>>>>>> attachments
>>>>>> from AgResearch Limited is intended only for the
>> persons or entities
>>>>>> to which it is addressed and may contain confidential and/or
>>>>>> privileged
>>>>>> material. Any review, retransmission, dissemination or other use
>>>>>> of, or
>>>>>> taking of any action in reliance upon, this information
>> by persons or
>>>>>> entities other than the intended recipients is prohibited by
>>>>>> AgResearch
>>>>>> Limited. If you have received this message in error,
>> please notify
>>>>>> the
>>>>>> sender immediately.
>>>>>> =
>>>>>> =
>>>>>>
>> =====================================================================
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l@...
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>> --
>>>>> Jason Stajich
>>>>> jason@...
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l@...
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>> --------------------------------------------------------------
>> ----------------
>>> --
>>>
>>>
>>>
>>> No virus found in this incoming message.
>>> Checked by AVG - www.avg.com
>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
>> Date: 07/05/09
>>> 05:53:00
>>
>>
>> --------------------------------------------------------------
>> ------------------
>>
>>
>>
>> No virus found in this incoming message.
>> Checked by AVG - www.avg.com
>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
>> Date: 07/05/09
>> 17:54:00
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@...
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Jonas Schaer-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,
I tried to do what Malcom proposed my ($prog = 'rpsblast';   my $db   =
'CDD';)  but that didn't work.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Value rpsblast for PUT parameter PROGRAM does not match expression
t?blast[
pnx]. Rejecting.
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
C:/Perl/site/lib/Bio/Tools
/Run/RemoteBlast.pm:329
STACK: Bio::Tools::Run::RemoteBlast::new
C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
ast.pm:257
STACK: blast_a_seq2.pm:14
-----------------------------------------------------------
So I should try to "change the wrapper to allow 'rpsblast'", right? Could
You tell me how to do that, please? So sorry but I have no idea yet...:)
If that doesn't work, is there any other way to run cdd-searches with perl?
Thank you so much!
Regards, Jonas

----- Original Message -----
From: "Chris Fields" <cjfields1@...>
To: "Cook, Malcolm" <MEC@...>
Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'"
<bioperl-l@...>; "'Smithies, Russell'"
<Russell.Smithies@...>; <cjfields@...>
Sent: Thursday, July 09, 2009 9:19 PM
Subject: Re: [Bioperl-l] cdd-search with remoteblast?


> I've scheduled this tentatively for the 1.6 release series (just not
> sure when yet).  It may work as is, but I haven't tried it out yet
> (and am hazarding to guess it only retrieves the single main RID at
> the moment).
>
> chris
>
> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
>
>> Jonas,
>>
>> If you want to continue to use the bioperl remoteblast interface,
>> probably what you should do is simply call it twice.
>>
>> Once, as you already know how to do, which will return without CDD
>> results.
>>
>> Secondly, to get the CDD results, call remoteblast a second time.
>> This time, using
>> -database => 'CDD'
>> -program => 'rpsblast'
>>
>> However, the wrapper may object to the 'rpsblast' program.  It is
>> not listed in the POD -
>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm)
>>    If so, my guess is that changing the perl wrapper to allow
>> rpsblast will "just work" (tm).  I've cc:ed cjfields@... for
>> his opinion on this.
>>
>> Also, you might want to perform the CDD search first, especially if
>> you are streaming results to eyeball that might like something to
>> look at while the second (presumably longer) search is running.
>>
>> Cheers,
>>
>> Malcolm Cook
>> Stowers Institute for Medical Research - Kansas City, Missouri
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces@...
>>> [mailto:bioperl-l-bounces@...] On Behalf Of
>>> Jonas Schaer
>>> Sent: Thursday, July 09, 2009 5:16 AM
>>> To: BioPerl List; Smithies, Russell
>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>
>>> Hi guys,
>>> Thank you all so much for your help and patience :). Of
>>> course you were right and I finaly found the right
>>> put-parameter to get exactly the same hits as on the homepage.
>>> I do have an other question though :)...
>>> I now want to include a search for conserved domains, but
>>> when I try to use the CDD_SEARCH-parameter
>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
>>> sub:CDD_SEARCH)
>>> like the other put-parameters the way chris once told
>>> me(works fine with the other params):
>>>
>>> my %put = (
>>>    WORD_SIZE => 3,
>>>    HITLIST_SIZE => 100,
>>>    THRESHOLD => 11,
>>>    FILTER => 'R',
>>>    GENETIC_CODE => 1,
>>>    CDD_SEARCH => 'on'
>>> ###I tried it
>>> with 'true' and '1', too.
>>>
>>> );
>>>
>>> for my $putName (keys %put) {
>>>    $factory->submit_parameter($putName,$put{$putName});
>>> }
>>>
>>>
>>> ...an exception is thrown:
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: CDD_SEARCH is not a valid PUT parameter.
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>> C:/Perl/site/lib/Bio/Tools
>>> /Run/RemoteBlast.pm:325
>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
>>> STACK: main::blast_it firsteval0.8.pm:288
>>> STACK: firsteval0.8.pm:35
>>> ----------------------------------------------------------- .
>>> I guess somehow this could be the solution to my problem:
>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
>>> ub:RID-for-Simultaneous
>>> , but unfortunately I don't understand what to do.
>>> I'm so sorry to bother you with this but please help me once
>>> more...:)
>>>
>>> Best regards and thanks in advance,
>>> Jonas
>>>
>>> ----- Original Message -----
>>> From: "Smithies, Russell" <Russell.Smithies@...>
>>> To: "'Jonas Schaer'" <Brotelzwieb@...>
>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
>>> <bioperl-l@...>
>>> Sent: Monday, July 06, 2009 10:56 PM
>>> Subject: RE: [Bioperl-l] different results with remote-blast skript
>>>
>>>
>>> Hi Jonas,
>>> You can't just play with the BLAST parameters and hope for a "better"
>>> result.
>>> I'd suggest that if you aren't sure what they do, you should
>>> leave them
>>> alone as small changes can make huge differences in the
>>> output - it's quite
>>> possible to miss finding what you're looking for by using the wrong
>>> parameters.
>>> If all else fails, read the blast manual:
>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
>>> _all.html
>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
>>> Or Read Ian Korfs' excellent book:
>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
>>>
>>> Don't worry about the integer overflow bug as there's nothing
>>> you can do
>>> about it. If you're interested, Google and Wikipedia are your
>>> friends:
>>> http://en.wikipedia.org/wiki/Integer_overflow
>>>
>>>
>>> Russell
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>>> bounces@...] On Behalf Of Jonas Schaer
>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
>>>> To: BioPerl List; Chris Fields
>>>> Subject: Re: [Bioperl-l] different results with remote-blast skript
>>>>
>>>> Hi guys, thanks for your answers so far.
>>>> @jason: integer overflow in blast.... sorry, but what do
>>> you mean by that?
>>>> how can I fix it...?
>>>>
>>>> Since I never really changed any parameters I thought them
>>> all to be
>>>> default.
>>>> whatever, I tried to get "better" results with my prog by changing
>>>> these:
>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
>>> STICS'} =
>>>> '1';
>>>> with no effect...I guess these were default values anyway.
>>>>
>>>> So please maybe you can tell me all the other parameters I
>>> can change with
>>>> my
>>>> perl-skript AND how to do that?
>>>> Unfortunately both, perl and the blast-algorithm are pretty
>>> much new to
>>>> me,
>>>> maybe thats why I just cannot find out how to do that on my
>>> own... :/
>>>>
>>>> Here is the output I get with my remote-blast skript:
>>>>
>>> ##############################################################
>>> ################
>>>> ###################################
>>>> Query Name:
>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
>>>> L
>>>>        hit name is ref|XP_001702807.1|
>>>>                score is 442
>>>> BLASTP 2.2.21+
>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
>>> A. Schaffer,
>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>> Lipman (1997),
>>>> "Gapped
>>>> BLAST and PSI-BLAST: a new generation of protein database search
>>>> programs",
>>>> Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>
>>>> Reference for composition-based statistics: Alejandro A.
>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
>>> John L. Spouge,
>>>> Yuri
>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
>>> "Improving the
>>>> accuracy of PSI-BLAST protein database searches with
>>> composition-based
>>>> statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
>>>>
>>>>
>>>> RID: 53STX5G2013
>>>>
>>>>
>>>> Database: All non-redundant GenBank CDS
>>>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>> from WGS projects
>>>>           9,252,587 sequences; 3,169,972,781 total letters Query=
>>>>
>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
>>>>
>>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
>>>> ATGPDPDDEYE
>>>> Length=150
>>>>
>>>>
>>>>
>>>       Score
>>>> E
>>>> Sequences producing significant alignments:
>>>      (Bits)
>>>> Value
>>>>
>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>> reinhard...   174
>>>> 2e-42
>>>>
>>>>
>>>> ALIGNMENTS
>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
>>>> Length=303
>>>>
>>>> Score =  174 bits (442),  Expect = 2e-42, Method:
>>> Composition-based
>>>> stats.
>>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
>>> Gaps = 0/150
>>>> (0%)
>>>>
>>>> Query  1
>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
>>>> 60
>>>>
>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>> Sbjct  154
>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>> 213
>>>>
>>>> Query  61
>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>> 120
>>>>
>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>> Sbjct  214
>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>> 273
>>>>
>>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>>>>            AWHERDDNAFRQAHQNTAMATGPDPDDEYE
>>>> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>>>>
>>>>
>>>>
>>>>  Database: All non-redundant GenBank CDS
>>>> translations+PDB+SwissProt+PIR+PRF
>>>> excluding environmental samples from WGS projects
>>>>    Posted date:  Jul 5, 2009  4:41 AM
>>>>  Number of letters in database: -1,124,994,511
>>>>  Number of sequences in database:  9,252,587
>>>>
>>>> Lambda     K      H
>>>>   0.309    0.122    0.345
>>>> Gapped
>>>> Lambda     K      H
>>>>   0.267   0.0410    0.140
>>>> Matrix: BLOSUM62
>>>> Gap Penalties: Existence: 11, Extension: 1
>>>> Number of Sequences: 9252587
>>>> Number of Hits to DB: 60273703
>>>> Number of extensions: 1448367
>>>> Number of successful extensions: 2103
>>>> Number of sequences better than 10: 0
>>>> Number of HSP's better than 10 without gapping: 0
>>>> Number of HSP's gapped: 2113
>>>> Number of HSP's successfully gapped: 0
>>>> Length of query: 150
>>>> Length of database: 3169972781
>>>> Length adjustment: 113
>>>> Effective length of query: 37
>>>> Effective length of database: 2124430450
>>>> Effective search space: 78603926650
>>>> Effective search space used: 78603926650
>>>> T: 11
>>>> A: 40
>>>> X1: 16 (7.1 bits)
>>>> X2: 38 (14.6 bits)
>>>> X3: 64 (24.7 bits)
>>>> S1: 42 (20.8 bits)
>>>> S2: 74 (33.1 bits)
>>>>
>>>>
>>> ##############################################################
>>> ################
>>>> ###################################
>>>> and here are the hits (?) of the blast-algorithm on the
>>> ncbi-homepage with
>>>> the same query of course:
>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>> reinhard...   300
>>>> 3e-80
>>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
>>> [Acyrtho...  36.2
>>>> 1.1
>>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
>>> [Blautia...  35.4
>>>> 1.8
>>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
>>> brazil...  34.3
>>>> 4.2
>>>> ref|XP_680841.1|  hypothetical protein AN7572.2
>>> [Aspergillus n...  33.5
>>>> 6.0
>>>> ref|YP_001768110.1|  hypothetical protein M446_1150
>>> [Methyloba...  33.5
>>>> 7.0
>>>>
>>> ##############################################################
>>> ################
>>>> ###################################at
>>>> least the first hit is the same, but even there there is a
>>> different score
>>>> and e-value.
>>>>
>>>> thanks so much for any help :)
>>>> regards, jonas
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Chris Fields" <cjfields@...>
>>>> To: "Jason Stajich" <jason@...>
>>>> Cc: "Smithies, Russell"
>>> <Russell.Smithies@...>; "'BioPerl
>>>> List'" <bioperl-l@...>; "'Jonas Schaer'"
>>>> <Jonas_Schaer@...>
>>>> Sent: Monday, July 06, 2009 12:51 AM
>>>> Subject: Re: [Bioperl-l] different results with remote-blast skript
>>>>
>>>>
>>>>> That inspires confidence ;>
>>>>>
>>>>> chris
>>>>>
>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>>>>>
>>>>>> integer overflow in blast....
>>>>>>
>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>>>>>
>>>>>>> I'd guess it's a difference in the parameters used.
>>>>>>> Interesting that both have the number of letters in the db as
>>>>>>> "-1,125,070,205", I assume that's a bug  :-)
>>>>>>>
>>>>>>> Stats from your remote_blast:
>>>>>>>
>>>>>>> 'stats' => {
>>>>>>>           'S1' => '42',
>>>>>>>           'S1_bits' => '20.8',
>>>>>>>           'lambda' => '0.309',
>>>>>>>           'entropy' => '0.345',
>>>>>>>           'kappa_gapped' => '0.0410',
>>>>>>>           'T' => '11',
>>>>>>>           'kappa' => '0.122',
>>>>>>>           'X3_bits' => '24.7',
>>>>>>>           'X1' => '16',
>>>>>>>           'lambda_gapped' => '0.267',
>>>>>>>           'X2' => '38',
>>>>>>>           'S2' => '74',
>>>>>>>           'seqs_better_than_cutoff' => '0',
>>>>>>>           'posted_date' => 'Jul 4, 2009  4:41 AM',
>>>>>>>           'Hits_to_DB' => '60102303',
>>>>>>>           'dbletters' => '-1125070205',
>>>>>>>           'A' => '40',
>>>>>>>           'num_successful_extensions' => '2004',
>>>>>>>           'num_extensions' => '1436892',
>>>>>>>           'X1_bits' => '7.1',
>>>>>>>           'X3' => '64',
>>>>>>>           'entropy_gapped' => '0.140',
>>>>>>>           'dbentries' => '9252258',
>>>>>>>           'X2_bits' => '14.6',
>>>>>>>           'S2_bits' => '33.1'
>>>>>>>         }
>>>>>>>
>>>>>>>
>>>>>>> Stats from a blast done on the NCBI webpage:
>>>>>>>
>>>>>>> Database: All non-redundant GenBank CDS
>>> translations+PDB+SwissProt
>>>>>>> +PIR+PRF
>>>>>>> excluding environmental samples from WGS projects
>>>>>>>  Posted date:  Jul 4, 2009  4:41 AM
>>>>>>> Number of letters in database: -1,125,070,205
>>>>>>> Number of sequences in database:  9,252,258
>>>>>>>
>>>>>>> Lambda     K      H
>>>>>>> 0.309    0.124    0.340
>>>>>>> Gapped
>>>>>>> Lambda     K      H
>>>>>>> 0.267   0.0410    0.140
>>>>>>> Matrix: BLOSUM62
>>>>>>> Gap Penalties: Existence: 11, Extension: 1
>>>>>>> Number of Sequences: 9252258
>>>>>>> Number of Hits to DB: 86493230
>>>>>>> Number of extensions: 3101413
>>>>>>> Number of successful extensions: 9001
>>>>>>> Number of sequences better than 100: 65
>>>>>>> Number of HSP's better than 100 without gapping: 0
>>>>>>> Number of HSP's gapped: 9000
>>>>>>> Number of HSP's successfully gapped: 66
>>>>>>> Length of query: 150
>>>>>>> Length of database: 3169897087
>>>>>>> Length adjustment: 113
>>>>>>> Effective length of query: 37
>>>>>>> Effective length of database: 2124391933
>>>>>>> Effective search space: 78602501521
>>>>>>> Effective search space used: 78602501521
>>>>>>> T: 11
>>>>>>> A: 40
>>>>>>> X1: 16 (7.1 bits)
>>>>>>> X2: 38 (14.6 bits)
>>>>>>> X3: 64 (24.7 bits)
>>>>>>> S1: 42 (20.8 bits)
>>>>>>> S2: 65 (29.6 bits)
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>>>>>>> bounces@...] On Behalf Of Jonas Schaer
>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>>>>>> To: BioPerl List
>>>>>>>> Subject: [Bioperl-l] different results with remote-blast skript
>>>>>>>>
>>>>>>>> Hi again :)
>>>>>>>> please, I only have this little question:
>>>>>>>> why do I get different results with my remote::blast
>>> perl skript
>>>>>>>> then on the
>>>>>>>> ncbi blast homepage?
>>>>>>>> I am using blastp, the query is an amino-sequence (different
>>>>>>>> results with any
>>>>>>>> sequence, differences not only in number of hits but even in e-
>>>>>>>> values, scores
>>>>>>>> etc...), the database is 'nr'.
>>>>>>>> PLEASE help me,
>>>>>>>> thank you in advance,
>>>>>>>> Jonas
>>>>>>>>
>>>>>>>> ps: my skript:
>>>>>>>>
>>>>
>>> ##############################################################
>>> ################
>>>>>>>> ##
>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use strict;
>>>>>>>> my @blast_report;
>>>>>>>> my $prog = 'blastp';
>>>>>>>> my $db   = 'nr';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> #my $e_val= '10';
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>       '-data' => $db,
>>>>>>>>       '-expect' => $e_val,
>>>>>>>>       '-readmethod' => 'SearchIO' );
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>> $
>>>>>>>> Bio
>>>>>>>>
>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>>>>>>> = '1';
>>>>>>>>
>>>>>>>> my
>>>>>>>> $
>>>>>>>> blast_seq
>>>>>>>>
>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>>>>>>>
>>>>
>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
>>> AFRQAHQNTAMATGPD
>>>>>>>> PDDEYE';
>>>>>>>> #$v is just to turn on and off the messages
>>>>>>>> my $v = 1;
>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>>>>>>> 'Bio::PrimarySeq');
>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>>>>>>>> "$blast_seq");
>>>>>>>> my $filename='temp2.out';
>>>>>>>> my $r = $factory->submit_blast($seq);
>>>>>>>> print STDERR "waiting..." if( $v > 0 );
>>>>>>>>  while ( my @rids = $factory->each_rid )
>>>>>>>>  {
>>>>>>>>      foreach my $rid ( @rids )
>>>>>>>>      {
>>>>>>>>          my $rc = $factory->retrieve_blast($rid);
>>>>>>>>          if( !ref($rc) )
>>>>>>>>          {
>>>>>>>>              if( $rc < 0 )
>>>>>>>>              {
>>>>>>>>                  $factory->remove_rid($rid);
>>>>>>>>              }
>>>>>>>>              print STDERR "." if ( $v > 0 );
>>>>>>>>          }
>>>>>>>>              else
>>>>>>>>              {
>>>>>>>>                  my $result = $rc->next_result();
>>>>>>>>                  $factory->save_output($filename);
>>>>>>>>                  $factory->remove_rid($rid);
>>>>>>>>                  print "\nQuery Name: ",
>>> $result->query_name(),
>>>>>>>> "\n";
>>>>>>>>                  while ( my $hit = $result->next_hit )
>>>>>>>>                  {
>>>>>>>>                      next unless ( $v > 0);
>>>>>>>>                      print "\thit name is ", $hit->name, "\n";
>>>>>>>>                      while( my $hsp = $hit->next_hsp )
>>>>>>>>                      {
>>>>>>>>                          print "\t\tscore is ",
>>> $hsp->score, "\n";
>>>>>>>>                      }
>>>>>>>>                  }
>>>>>>>>              }
>>>>>>>>      }
>>>>>>>>
>>>>>>>>
>>>>>>>>  }
>>>>>>>> @blast_report = get_file_data ($filename);
>>>>>>>> return @blast_report;
>>>>>>>>
>>>>
>>> ##############################################################
>>> ################
>>>>>>>> ####
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l@...
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>> =
>>>>>>> =
>>>>>>>
>>> =====================================================================
>>>>>>> Attention: The information contained in this message and/or
>>>>>>> attachments
>>>>>>> from AgResearch Limited is intended only for the
>>> persons or entities
>>>>>>> to which it is addressed and may contain confidential and/or
>>>>>>> privileged
>>>>>>> material. Any review, retransmission, dissemination or other use
>>>>>>> of, or
>>>>>>> taking of any action in reliance upon, this information
>>> by persons or
>>>>>>> entities other than the intended recipients is prohibited by
>>>>>>> AgResearch
>>>>>>> Limited. If you have received this message in error,
>>> please notify
>>>>>>> the
>>>>>>> sender immediately.
>>>>>>> =
>>>>>>> =
>>>>>>>
>>> =====================================================================
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l@...
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>> --
>>>>>> Jason Stajich
>>>>>> jason@...
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l@...
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>> --------------------------------------------------------------
>>> ----------------
>>>> --
>>>>
>>>>
>>>>
>>>> No virus found in this incoming message.
>>>> Checked by AVG - www.avg.com
>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
>>> Date: 07/05/09
>>>> 05:53:00
>>>
>>>
>>> --------------------------------------------------------------
>>> ------------------
>>>
>>>
>>>
>>> No virus found in this incoming message.
>>> Checked by AVG - www.avg.com
>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
>>> Date: 07/05/09
>>> 17:54:00
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@...
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>


--------------------------------------------------------------------------------



No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release Date: 07/09/09
05:55:00

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Cook, Malcolm-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Chris, I've added a test to bioperl RemoteBlast.t that demonstrates the following.  Is it appropriate to submit it?

Jonas, OK, I was a little quick on the gun... but I've got it now.

You don't need to change the wrapper.  Here is what you need to do:

# 1) set your database like this:

 -database => 'cdsearch/cdd', # c.f. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html for other cdd database options

# 2) add this line before submitting the job:
$Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';

You're in - No other changes needed.

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri


> -----Original Message-----
> From: Jonas Schaer [mailto:Brotelzwieb@...]
> Sent: Friday, July 10, 2009 4:18 AM
> To: BioPerl List; Cook, Malcolm; Chris Fields
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Hi,
> I tried to do what Malcom proposed my ($prog = 'rpsblast';
> my $db   =
> 'CDD';)  but that didn't work.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Value rpsblast for PUT parameter PROGRAM does not match
> expression t?blast[ pnx]. Rejecting.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> C:/Perl/site/lib/Bio/Tools
> /Run/RemoteBlast.pm:329
> STACK: Bio::Tools::Run::RemoteBlast::new
> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
> ast.pm:257
> STACK: blast_a_seq2.pm:14
> -----------------------------------------------------------
> So I should try to "change the wrapper to allow 'rpsblast'",
> right? Could You tell me how to do that, please? So sorry but
> I have no idea yet...:) If that doesn't work, is there any
> other way to run cdd-searches with perl?
> Thank you so much!
> Regards, Jonas
>
> ----- Original Message -----
> From: "Chris Fields" <cjfields1@...>
> To: "Cook, Malcolm" <MEC@...>
> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'"
> <bioperl-l@...>; "'Smithies, Russell'"
> <Russell.Smithies@...>; <cjfields@...>
> Sent: Thursday, July 09, 2009 9:19 PM
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
>
> > I've scheduled this tentatively for the 1.6 release series (just not
> > sure when yet).  It may work as is, but I haven't tried it out yet
> > (and am hazarding to guess it only retrieves the single main RID at
> > the moment).
> >
> > chris
> >
> > On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
> >
> >> Jonas,
> >>
> >> If you want to continue to use the bioperl remoteblast interface,
> >> probably what you should do is simply call it twice.
> >>
> >> Once, as you already know how to do, which will return without CDD
> >> results.
> >>
> >> Secondly, to get the CDD results, call remoteblast a second time.
> >> This time, using
> >> -database => 'CDD'
> >> -program => 'rpsblast'
> >>
> >> However, the wrapper may object to the 'rpsblast' program.  It is
> >> not listed in the POD -
> >>
> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R
> emoteBlast.pm)
> >>    If so, my guess is that changing the perl wrapper to allow
> >> rpsblast will "just work" (tm).  I've cc:ed
> cjfields@... for
> >> his opinion on this.
> >>
> >> Also, you might want to perform the CDD search first, especially if
> >> you are streaming results to eyeball that might like something to
> >> look at while the second (presumably longer) search is running.
> >>
> >> Cheers,
> >>
> >> Malcolm Cook
> >> Stowers Institute for Medical Research - Kansas City, Missouri
> >>
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces@...
> >>> [mailto:bioperl-l-bounces@...] On Behalf Of
> >>> Jonas Schaer
> >>> Sent: Thursday, July 09, 2009 5:16 AM
> >>> To: BioPerl List; Smithies, Russell
> >>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>>
> >>> Hi guys,
> >>> Thank you all so much for your help and patience :). Of
> >>> course you were right and I finaly found the right
> >>> put-parameter to get exactly the same hits as on the homepage.
> >>> I do have an other question though :)...
> >>> I now want to include a search for conserved domains, but
> >>> when I try to use the CDD_SEARCH-parameter
> >>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
> >>> sub:CDD_SEARCH)
> >>> like the other put-parameters the way chris once told
> >>> me(works fine with the other params):
> >>>
> >>> my %put = (
> >>>    WORD_SIZE => 3,
> >>>    HITLIST_SIZE => 100,
> >>>    THRESHOLD => 11,
> >>>    FILTER => 'R',
> >>>    GENETIC_CODE => 1,
> >>>    CDD_SEARCH => 'on'
> >>> ###I tried it
> >>> with 'true' and '1', too.
> >>>
> >>> );
> >>>
> >>> for my $putName (keys %put) {
> >>>    $factory->submit_parameter($putName,$put{$putName});
> >>> }
> >>>
> >>>
> >>> ...an exception is thrown:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: CDD_SEARCH is not a valid PUT parameter.
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw
> C:/Perl/site/lib/Bio/Root/Root.pm:359
> >>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> >>> C:/Perl/site/lib/Bio/Tools
> >>> /Run/RemoteBlast.pm:325
> >>> STACK: main::blast_a_sequence firsteval0.8.pm:383
> >>> STACK: main::blast_it firsteval0.8.pm:288
> >>> STACK: firsteval0.8.pm:35
> >>> ----------------------------------------------------------- .
> >>> I guess somehow this could be the solution to my problem:
> >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
> >>> ub:RID-for-Simultaneous
> >>> , but unfortunately I don't understand what to do.
> >>> I'm so sorry to bother you with this but please help me once
> >>> more...:)
> >>>
> >>> Best regards and thanks in advance,
> >>> Jonas
> >>>
> >>> ----- Original Message -----
> >>> From: "Smithies, Russell" <Russell.Smithies@...>
> >>> To: "'Jonas Schaer'" <Brotelzwieb@...>
> >>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
> >>> <bioperl-l@...>
> >>> Sent: Monday, July 06, 2009 10:56 PM
> >>> Subject: RE: [Bioperl-l] different results with
> remote-blast skript
> >>>
> >>>
> >>> Hi Jonas,
> >>> You can't just play with the BLAST parameters and hope
> for a "better"
> >>> result.
> >>> I'd suggest that if you aren't sure what they do, you should
> >>> leave them
> >>> alone as small changes can make huge differences in the
> >>> output - it's quite
> >>> possible to miss finding what you're looking for by using
> the wrong
> >>> parameters.
> >>> If all else fails, read the blast manual:
> >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
> >>> _all.html
> >>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
> >>> Or Read Ian Korfs' excellent book:
> >>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
> >> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
> >>>
> >>> Don't worry about the integer overflow bug as there's nothing
> >>> you can do
> >>> about it. If you're interested, Google and Wikipedia are your
> >>> friends:
> >>> http://en.wikipedia.org/wiki/Integer_overflow
> >>>
> >>>
> >>> Russell
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
> >>>> bounces@...] On Behalf Of Jonas Schaer
> >>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
> >>>> To: BioPerl List; Chris Fields
> >>>> Subject: Re: [Bioperl-l] different results with
> remote-blast skript
> >>>>
> >>>> Hi guys, thanks for your answers so far.
> >>>> @jason: integer overflow in blast.... sorry, but what do
> >>> you mean by that?
> >>>> how can I fix it...?
> >>>>
> >>>> Since I never really changed any parameters I thought them
> >>> all to be
> >>>> default.
> >>>> whatever, I tried to get "better" results with my prog
> by changing
> >>>> these:
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>>
> >>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
> >>> STICS'} =
> >>>> '1';
> >>>> with no effect...I guess these were default values anyway.
> >>>>
> >>>> So please maybe you can tell me all the other parameters I
> >>> can change with
> >>>> my
> >>>> perl-skript AND how to do that?
> >>>> Unfortunately both, perl and the blast-algorithm are pretty
> >>> much new to
> >>>> me,
> >>>> maybe thats why I just cannot find out how to do that on my
> >>> own... :/
> >>>>
> >>>> Here is the output I get with my remote-blast skript:
> >>>>
> >>> ##############################################################
> >>> ################
> >>>> ###################################
> >>>> Query Name:
> >>>>
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
> >>>> L
> >>>>        hit name is ref|XP_001702807.1|
> >>>>                score is 442
> >>>> BLASTP 2.2.21+
> >>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
> >>> A. Schaffer,
> >>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>> Lipman (1997),
> >>>> "Gapped
> >>>> BLAST and PSI-BLAST: a new generation of protein database search
> >>>> programs",
> >>>> Nucleic Acids Res. 25:3389-3402.
> >>>>
> >>>>
> >>>> Reference for composition-based statistics: Alejandro A.
> >>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
> >>> John L. Spouge,
> >>>> Yuri
> >>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
> >>> "Improving the
> >>>> accuracy of PSI-BLAST protein database searches with
> >>> composition-based
> >>>> statistics and other refinements", Nucleic Acids Res.
> 29:2994-3005.
> >>>>
> >>>>
> >>>> RID: 53STX5G2013
> >>>>
> >>>>
> >>>> Database: All non-redundant GenBank CDS
> >>>> translations+PDB+SwissProt+PIR+PRF excluding
> environmental samples
> >>>> from WGS projects
> >>>>           9,252,587 sequences; 3,169,972,781 total letters Query=
> >>>>
> >>>
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> >>>>
> >>>
> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
> >>>> ATGPDPDDEYE
> >>>> Length=150
> >>>>
> >>>>
> >>>>
> >>>       Score
> >>>> E
> >>>> Sequences producing significant alignments:
> >>>      (Bits)
> >>>> Value
> >>>>
> >>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> >>> reinhard...   174
> >>>> 2e-42
> >>>>
> >>>>
> >>>> ALIGNMENTS
> >>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas
> reinhardtii]
> >>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
> >>>> Length=303
> >>>>
> >>>> Score =  174 bits (442),  Expect = 2e-42, Method:
> >>> Composition-based
> >>>> stats.
> >>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
> >>> Gaps = 0/150
> >>>> (0%)
> >>>>
> >>>> Query  1
> >>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
> >>>> 60
> >>>>
> >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> >>>> Sbjct  154
> >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> >>>> 213
> >>>>
> >>>> Query  61
> >>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>> 120
> >>>>
> >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>> Sbjct  214
> >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>> 273
> >>>>
> >>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
> >>>>            AWHERDDNAFRQAHQNTAMATGPDPDDEYE
> >>>> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
> >>>>
> >>>>
> >>>>
> >>>>  Database: All non-redundant GenBank CDS
> >>>> translations+PDB+SwissProt+PIR+PRF
> >>>> excluding environmental samples from WGS projects
> >>>>    Posted date:  Jul 5, 2009  4:41 AM
> >>>>  Number of letters in database: -1,124,994,511
> >>>>  Number of sequences in database:  9,252,587
> >>>>
> >>>> Lambda     K      H
> >>>>   0.309    0.122    0.345
> >>>> Gapped
> >>>> Lambda     K      H
> >>>>   0.267   0.0410    0.140
> >>>> Matrix: BLOSUM62
> >>>> Gap Penalties: Existence: 11, Extension: 1
> >>>> Number of Sequences: 9252587
> >>>> Number of Hits to DB: 60273703
> >>>> Number of extensions: 1448367
> >>>> Number of successful extensions: 2103
> >>>> Number of sequences better than 10: 0
> >>>> Number of HSP's better than 10 without gapping: 0
> >>>> Number of HSP's gapped: 2113
> >>>> Number of HSP's successfully gapped: 0
> >>>> Length of query: 150
> >>>> Length of database: 3169972781
> >>>> Length adjustment: 113
> >>>> Effective length of query: 37
> >>>> Effective length of database: 2124430450
> >>>> Effective search space: 78603926650
> >>>> Effective search space used: 78603926650
> >>>> T: 11
> >>>> A: 40
> >>>> X1: 16 (7.1 bits)
> >>>> X2: 38 (14.6 bits)
> >>>> X3: 64 (24.7 bits)
> >>>> S1: 42 (20.8 bits)
> >>>> S2: 74 (33.1 bits)
> >>>>
> >>>>
> >>> ##############################################################
> >>> ################
> >>>> ###################################
> >>>> and here are the hits (?) of the blast-algorithm on the
> >>> ncbi-homepage with
> >>>> the same query of course:
> >>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> >>> reinhard...   300
> >>>> 3e-80
> >>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
> >>> [Acyrtho...  36.2
> >>>> 1.1
> >>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
> >>> [Blautia...  35.4
> >>>> 1.8
> >>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
> >>> brazil...  34.3
> >>>> 4.2
> >>>> ref|XP_680841.1|  hypothetical protein AN7572.2
> >>> [Aspergillus n...  33.5
> >>>> 6.0
> >>>> ref|YP_001768110.1|  hypothetical protein M446_1150
> >>> [Methyloba...  33.5
> >>>> 7.0
> >>>>
> >>> ##############################################################
> >>> ################
> >>>> ###################################at
> >>>> least the first hit is the same, but even there there is a
> >>> different score
> >>>> and e-value.
> >>>>
> >>>> thanks so much for any help :)
> >>>> regards, jonas
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>> From: "Chris Fields" <cjfields@...>
> >>>> To: "Jason Stajich" <jason@...>
> >>>> Cc: "Smithies, Russell"
> >>> <Russell.Smithies@...>; "'BioPerl
> >>>> List'" <bioperl-l@...>; "'Jonas Schaer'"
> >>>> <Jonas_Schaer@...>
> >>>> Sent: Monday, July 06, 2009 12:51 AM
> >>>> Subject: Re: [Bioperl-l] different results with
> remote-blast skript
> >>>>
> >>>>
> >>>>> That inspires confidence ;>
> >>>>>
> >>>>> chris
> >>>>>
> >>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> >>>>>
> >>>>>> integer overflow in blast....
> >>>>>>
> >>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> >>>>>>
> >>>>>>> I'd guess it's a difference in the parameters used.
> >>>>>>> Interesting that both have the number of letters in the db as
> >>>>>>> "-1,125,070,205", I assume that's a bug  :-)
> >>>>>>>
> >>>>>>> Stats from your remote_blast:
> >>>>>>>
> >>>>>>> 'stats' => {
> >>>>>>>           'S1' => '42',
> >>>>>>>           'S1_bits' => '20.8',
> >>>>>>>           'lambda' => '0.309',
> >>>>>>>           'entropy' => '0.345',
> >>>>>>>           'kappa_gapped' => '0.0410',
> >>>>>>>           'T' => '11',
> >>>>>>>           'kappa' => '0.122',
> >>>>>>>           'X3_bits' => '24.7',
> >>>>>>>           'X1' => '16',
> >>>>>>>           'lambda_gapped' => '0.267',
> >>>>>>>           'X2' => '38',
> >>>>>>>           'S2' => '74',
> >>>>>>>           'seqs_better_than_cutoff' => '0',
> >>>>>>>           'posted_date' => 'Jul 4, 2009  4:41 AM',
> >>>>>>>           'Hits_to_DB' => '60102303',
> >>>>>>>           'dbletters' => '-1125070205',
> >>>>>>>           'A' => '40',
> >>>>>>>           'num_successful_extensions' => '2004',
> >>>>>>>           'num_extensions' => '1436892',
> >>>>>>>           'X1_bits' => '7.1',
> >>>>>>>           'X3' => '64',
> >>>>>>>           'entropy_gapped' => '0.140',
> >>>>>>>           'dbentries' => '9252258',
> >>>>>>>           'X2_bits' => '14.6',
> >>>>>>>           'S2_bits' => '33.1'
> >>>>>>>         }
> >>>>>>>
> >>>>>>>
> >>>>>>> Stats from a blast done on the NCBI webpage:
> >>>>>>>
> >>>>>>> Database: All non-redundant GenBank CDS
> >>> translations+PDB+SwissProt
> >>>>>>> +PIR+PRF
> >>>>>>> excluding environmental samples from WGS projects
> >>>>>>>  Posted date:  Jul 4, 2009  4:41 AM
> >>>>>>> Number of letters in database: -1,125,070,205
> >>>>>>> Number of sequences in database:  9,252,258
> >>>>>>>
> >>>>>>> Lambda     K      H
> >>>>>>> 0.309    0.124    0.340
> >>>>>>> Gapped
> >>>>>>> Lambda     K      H
> >>>>>>> 0.267   0.0410    0.140
> >>>>>>> Matrix: BLOSUM62
> >>>>>>> Gap Penalties: Existence: 11, Extension: 1
> >>>>>>> Number of Sequences: 9252258
> >>>>>>> Number of Hits to DB: 86493230
> >>>>>>> Number of extensions: 3101413
> >>>>>>> Number of successful extensions: 9001
> >>>>>>> Number of sequences better than 100: 65
> >>>>>>> Number of HSP's better than 100 without gapping: 0
> >>>>>>> Number of HSP's gapped: 9000
> >>>>>>> Number of HSP's successfully gapped: 66
> >>>>>>> Length of query: 150
> >>>>>>> Length of database: 3169897087
> >>>>>>> Length adjustment: 113
> >>>>>>> Effective length of query: 37
> >>>>>>> Effective length of database: 2124391933
> >>>>>>> Effective search space: 78602501521
> >>>>>>> Effective search space used: 78602501521
> >>>>>>> T: 11
> >>>>>>> A: 40
> >>>>>>> X1: 16 (7.1 bits)
> >>>>>>> X2: 38 (14.6 bits)
> >>>>>>> X3: 64 (24.7 bits)
> >>>>>>> S1: 42 (20.8 bits)
> >>>>>>> S2: 65 (29.6 bits)
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
> >>>>>>>> bounces@...] On Behalf Of Jonas Schaer
> >>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> >>>>>>>> To: BioPerl List
> >>>>>>>> Subject: [Bioperl-l] different results with
> remote-blast skript
> >>>>>>>>
> >>>>>>>> Hi again :)
> >>>>>>>> please, I only have this little question:
> >>>>>>>> why do I get different results with my remote::blast
> >>> perl skript
> >>>>>>>> then on the
> >>>>>>>> ncbi blast homepage?
> >>>>>>>> I am using blastp, the query is an amino-sequence (different
> >>>>>>>> results with any
> >>>>>>>> sequence, differences not only in number of hits but
> even in e-
> >>>>>>>> values, scores
> >>>>>>>> etc...), the database is 'nr'.
> >>>>>>>> PLEASE help me,
> >>>>>>>> thank you in advance,
> >>>>>>>> Jonas
> >>>>>>>>
> >>>>>>>> ps: my skript:
> >>>>>>>>
> >>>>
> >>> ##############################################################
> >>> ################
> >>>>>>>> ##
> >>>>>>>> use Bio::Seq::SeqFactory;
> >>>>>>>> use Bio::Tools::Run::RemoteBlast;
> >>>>>>>> use strict;
> >>>>>>>> my @blast_report;
> >>>>>>>> my $prog = 'blastp';
> >>>>>>>> my $db   = 'nr';
> >>>>>>>> my $e_val= '1e-10';
> >>>>>>>> #my $e_val= '10';
> >>>>>>>> my @params = ( '-prog' => $prog,
> >>>>>>>>       '-data' => $db,
> >>>>>>>>       '-expect' => $e_val,
> >>>>>>>>       '-readmethod' => 'SearchIO' );
> >>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>>>>>> $
> >>>>>>>> Bio
> >>>>>>>>
> >>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
> >>>>>>>> = '1';
> >>>>>>>>
> >>>>>>>> my
> >>>>>>>> $
> >>>>>>>> blast_seq
> >>>>>>>>
> >>>
> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
> >>>>>>>>
> >>>>
> >>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
> >>> AFRQAHQNTAMATGPD
> >>>>>>>> PDDEYE';
> >>>>>>>> #$v is just to turn on and off the messages
> >>>>>>>> my $v = 1;
> >>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
> >>>>>>>> 'Bio::PrimarySeq');
> >>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq,
> -display_id =>
> >>>>>>>> "$blast_seq");
> >>>>>>>> my $filename='temp2.out';
> >>>>>>>> my $r = $factory->submit_blast($seq);
> >>>>>>>> print STDERR "waiting..." if( $v > 0 );
> >>>>>>>>  while ( my @rids = $factory->each_rid )
> >>>>>>>>  {
> >>>>>>>>      foreach my $rid ( @rids )
> >>>>>>>>      {
> >>>>>>>>          my $rc = $factory->retrieve_blast($rid);
> >>>>>>>>          if( !ref($rc) )
> >>>>>>>>          {
> >>>>>>>>              if( $rc < 0 )
> >>>>>>>>              {
> >>>>>>>>                  $factory->remove_rid($rid);
> >>>>>>>>              }
> >>>>>>>>              print STDERR "." if ( $v > 0 );
> >>>>>>>>          }
> >>>>>>>>              else
> >>>>>>>>              {
> >>>>>>>>                  my $result = $rc->next_result();
> >>>>>>>>                  $factory->save_output($filename);
> >>>>>>>>                  $factory->remove_rid($rid);
> >>>>>>>>                  print "\nQuery Name: ",
> >>> $result->query_name(),
> >>>>>>>> "\n";
> >>>>>>>>                  while ( my $hit = $result->next_hit )
> >>>>>>>>                  {
> >>>>>>>>                      next unless ( $v > 0);
> >>>>>>>>                      print "\thit name is ",
> $hit->name, "\n";
> >>>>>>>>                      while( my $hsp = $hit->next_hsp )
> >>>>>>>>                      {
> >>>>>>>>                          print "\t\tscore is ",
> >>> $hsp->score, "\n";
> >>>>>>>>                      }
> >>>>>>>>                  }
> >>>>>>>>              }
> >>>>>>>>      }
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>  }
> >>>>>>>> @blast_report = get_file_data ($filename);
> >>>>>>>> return @blast_report;
> >>>>>>>>
> >>>>
> >>> ##############################################################
> >>> ################
> >>>>>>>> ####
> >>>>>>>> _______________________________________________
> >>>>>>>> Bioperl-l mailing list
> >>>>>>>> Bioperl-l@...
> >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>> =
> >>>>>>> =
> >>>>>>>
> >>>
> =====================================================================
> >>>>>>> Attention: The information contained in this message and/or
> >>>>>>> attachments
> >>>>>>> from AgResearch Limited is intended only for the
> >>> persons or entities
> >>>>>>> to which it is addressed and may contain confidential and/or
> >>>>>>> privileged
> >>>>>>> material. Any review, retransmission, dissemination
> or other use
> >>>>>>> of, or
> >>>>>>> taking of any action in reliance upon, this information
> >>> by persons or
> >>>>>>> entities other than the intended recipients is prohibited by
> >>>>>>> AgResearch
> >>>>>>> Limited. If you have received this message in error,
> >>> please notify
> >>>>>>> the
> >>>>>>> sender immediately.
> >>>>>>> =
> >>>>>>> =
> >>>>>>>
> >>>
> =====================================================================
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Bioperl-l mailing list
> >>>>>>> Bioperl-l@...
> >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>
> >>>>>> --
> >>>>>> Jason Stajich
> >>>>>> jason@...
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Bioperl-l mailing list
> >>>>>> Bioperl-l@...
> >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>>
> >>>>
> >>> --------------------------------------------------------------
> >>> ----------------
> >>>> --
> >>>>
> >>>>
> >>>>
> >>>> No virus found in this incoming message.
> >>>> Checked by AVG - www.avg.com
> >>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
> >>> Date: 07/05/09
> >>>> 05:53:00
> >>>
> >>>
> >>> --------------------------------------------------------------
> >>> ------------------
> >>>
> >>>
> >>>
> >>> No virus found in this incoming message.
> >>> Checked by AVG - www.avg.com
> >>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
> >>> Date: 07/05/09
> >>> 17:54:00
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l@...
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
>
>
> --------------------------------------------------------------
> ------------------
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release
> Date: 07/09/09
> 05:55:00
>
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Chris Fields-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Malcolm,

Nice!  Go ahead and add the test in; we can look at trying to get  
CDD_SEARCH working at some point but this is a nice workaround.

chris

On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:

> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates  
> the following.  Is it appropriate to submit it?
>
> Jonas, OK, I was a little quick on the gun... but I've got it now.
>
> You don't need to change the wrapper.  Here is what you need to do:
>
> # 1) set your database like this:
>
> -database => 'cdsearch/cdd', # c.f. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html 
>  for other cdd database options
>
> # 2) add this line before submitting the job:
> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
>
> You're in - No other changes needed.
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: Jonas Schaer [mailto:Brotelzwieb@...]
>> Sent: Friday, July 10, 2009 4:18 AM
>> To: BioPerl List; Cook, Malcolm; Chris Fields
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>> Hi,
>> I tried to do what Malcom proposed my ($prog = 'rpsblast';
>> my $db   =
>> 'CDD';)  but that didn't work.
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Value rpsblast for PUT parameter PROGRAM does not match
>> expression t?blast[ pnx]. Rejecting.
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>> C:/Perl/site/lib/Bio/Tools
>> /Run/RemoteBlast.pm:329
>> STACK: Bio::Tools::Run::RemoteBlast::new
>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
>> ast.pm:257
>> STACK: blast_a_seq2.pm:14
>> -----------------------------------------------------------
>> So I should try to "change the wrapper to allow 'rpsblast'",
>> right? Could You tell me how to do that, please? So sorry but
>> I have no idea yet...:) If that doesn't work, is there any
>> other way to run cdd-searches with perl?
>> Thank you so much!
>> Regards, Jonas
>>
>> ----- Original Message -----
>> From: "Chris Fields" <cjfields1@...>
>> To: "Cook, Malcolm" <MEC@...>
>> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'"
>> <bioperl-l@...>; "'Smithies, Russell'"
>> <Russell.Smithies@...>; <cjfields@...>
>> Sent: Thursday, July 09, 2009 9:19 PM
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>>
>>> I've scheduled this tentatively for the 1.6 release series (just not
>>> sure when yet).  It may work as is, but I haven't tried it out yet
>>> (and am hazarding to guess it only retrieves the single main RID at
>>> the moment).
>>>
>>> chris
>>>
>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
>>>
>>>> Jonas,
>>>>
>>>> If you want to continue to use the bioperl remoteblast interface,
>>>> probably what you should do is simply call it twice.
>>>>
>>>> Once, as you already know how to do, which will return without CDD
>>>> results.
>>>>
>>>> Secondly, to get the CDD results, call remoteblast a second time.
>>>> This time, using
>>>> -database => 'CDD'
>>>> -program => 'rpsblast'
>>>>
>>>> However, the wrapper may object to the 'rpsblast' program.  It is
>>>> not listed in the POD -
>>>>
>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R
>> emoteBlast.pm)
>>>>   If so, my guess is that changing the perl wrapper to allow
>>>> rpsblast will "just work" (tm).  I've cc:ed
>> cjfields@... for
>>>> his opinion on this.
>>>>
>>>> Also, you might want to perform the CDD search first, especially if
>>>> you are streaming results to eyeball that might like something to
>>>> look at while the second (presumably longer) search is running.
>>>>
>>>> Cheers,
>>>>
>>>> Malcolm Cook
>>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces@...
>>>>> [mailto:bioperl-l-bounces@...] On Behalf Of
>>>>> Jonas Schaer
>>>>> Sent: Thursday, July 09, 2009 5:16 AM
>>>>> To: BioPerl List; Smithies, Russell
>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>>
>>>>> Hi guys,
>>>>> Thank you all so much for your help and patience :). Of
>>>>> course you were right and I finaly found the right
>>>>> put-parameter to get exactly the same hits as on the homepage.
>>>>> I do have an other question though :)...
>>>>> I now want to include a search for conserved domains, but
>>>>> when I try to use the CDD_SEARCH-parameter
>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
>>>>> sub:CDD_SEARCH)
>>>>> like the other put-parameters the way chris once told
>>>>> me(works fine with the other params):
>>>>>
>>>>> my %put = (
>>>>>   WORD_SIZE => 3,
>>>>>   HITLIST_SIZE => 100,
>>>>>   THRESHOLD => 11,
>>>>>   FILTER => 'R',
>>>>>   GENETIC_CODE => 1,
>>>>>   CDD_SEARCH => 'on'
>>>>> ###I tried it
>>>>> with 'true' and '1', too.
>>>>>
>>>>> );
>>>>>
>>>>> for my $putName (keys %put) {
>>>>>   $factory->submit_parameter($putName,$put{$putName});
>>>>> }
>>>>>
>>>>>
>>>>> ...an exception is thrown:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: CDD_SEARCH is not a valid PUT parameter.
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw
>> C:/Perl/site/lib/Bio/Root/Root.pm:359
>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>>> C:/Perl/site/lib/Bio/Tools
>>>>> /Run/RemoteBlast.pm:325
>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
>>>>> STACK: main::blast_it firsteval0.8.pm:288
>>>>> STACK: firsteval0.8.pm:35
>>>>> ----------------------------------------------------------- .
>>>>> I guess somehow this could be the solution to my problem:
>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
>>>>> ub:RID-for-Simultaneous
>>>>> , but unfortunately I don't understand what to do.
>>>>> I'm so sorry to bother you with this but please help me once
>>>>> more...:)
>>>>>
>>>>> Best regards and thanks in advance,
>>>>> Jonas
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Smithies, Russell" <Russell.Smithies@...>
>>>>> To: "'Jonas Schaer'" <Brotelzwieb@...>
>>>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
>>>>> <bioperl-l@...>
>>>>> Sent: Monday, July 06, 2009 10:56 PM
>>>>> Subject: RE: [Bioperl-l] different results with
>> remote-blast skript
>>>>>
>>>>>
>>>>> Hi Jonas,
>>>>> You can't just play with the BLAST parameters and hope
>> for a "better"
>>>>> result.
>>>>> I'd suggest that if you aren't sure what they do, you should
>>>>> leave them
>>>>> alone as small changes can make huge differences in the
>>>>> output - it's quite
>>>>> possible to miss finding what you're looking for by using
>> the wrong
>>>>> parameters.
>>>>> If all else fails, read the blast manual:
>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
>>>>> _all.html
>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
>>>>> Or Read Ian Korfs' excellent book:
>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
>>>>>
>>>>> Don't worry about the integer overflow bug as there's nothing
>>>>> you can do
>>>>> about it. If you're interested, Google and Wikipedia are your
>>>>> friends:
>>>>> http://en.wikipedia.org/wiki/Integer_overflow
>>>>>
>>>>>
>>>>> Russell
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>>>>> bounces@...] On Behalf Of Jonas Schaer
>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
>>>>>> To: BioPerl List; Chris Fields
>>>>>> Subject: Re: [Bioperl-l] different results with
>> remote-blast skript
>>>>>>
>>>>>> Hi guys, thanks for your answers so far.
>>>>>> @jason: integer overflow in blast.... sorry, but what do
>>>>> you mean by that?
>>>>>> how can I fix it...?
>>>>>>
>>>>>> Since I never really changed any parameters I thought them
>>>>> all to be
>>>>>> default.
>>>>>> whatever, I tried to get "better" results with my prog
>> by changing
>>>>>> these:
>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>
>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
>>>>> STICS'} =
>>>>>> '1';
>>>>>> with no effect...I guess these were default values anyway.
>>>>>>
>>>>>> So please maybe you can tell me all the other parameters I
>>>>> can change with
>>>>>> my
>>>>>> perl-skript AND how to do that?
>>>>>> Unfortunately both, perl and the blast-algorithm are pretty
>>>>> much new to
>>>>>> me,
>>>>>> maybe thats why I just cannot find out how to do that on my
>>>>> own... :/
>>>>>>
>>>>>> Here is the output I get with my remote-blast skript:
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>> ###################################
>>>>>> Query Name:
>>>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
>>>>>> L
>>>>>>       hit name is ref|XP_001702807.1|
>>>>>>               score is 442
>>>>>> BLASTP 2.2.21+
>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
>>>>> A. Schaffer,
>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>> Lipman (1997),
>>>>>> "Gapped
>>>>>> BLAST and PSI-BLAST: a new generation of protein database search
>>>>>> programs",
>>>>>> Nucleic Acids Res. 25:3389-3402.
>>>>>>
>>>>>>
>>>>>> Reference for composition-based statistics: Alejandro A.
>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
>>>>> John L. Spouge,
>>>>>> Yuri
>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
>>>>> "Improving the
>>>>>> accuracy of PSI-BLAST protein database searches with
>>>>> composition-based
>>>>>> statistics and other refinements", Nucleic Acids Res.
>> 29:2994-3005.
>>>>>>
>>>>>>
>>>>>> RID: 53STX5G2013
>>>>>>
>>>>>>
>>>>>> Database: All non-redundant GenBank CDS
>>>>>> translations+PDB+SwissProt+PIR+PRF excluding
>> environmental samples
>>>>>> from WGS projects
>>>>>>          9,252,587 sequences; 3,169,972,781 total letters Query=
>>>>>>
>>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
>>>>>>
>>>>>
>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
>>>>>> ATGPDPDDEYE
>>>>>> Length=150
>>>>>>
>>>>>>
>>>>>>
>>>>>      Score
>>>>>> E
>>>>>> Sequences producing significant alignments:
>>>>>     (Bits)
>>>>>> Value
>>>>>>
>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>> reinhard...   174
>>>>>> 2e-42
>>>>>>
>>>>>>
>>>>>> ALIGNMENTS
>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas
>> reinhardtii]
>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
>>>>>> Length=303
>>>>>>
>>>>>> Score =  174 bits (442),  Expect = 2e-42, Method:
>>>>> Composition-based
>>>>>> stats.
>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
>>>>> Gaps = 0/150
>>>>>> (0%)
>>>>>>
>>>>>> Query  1
>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
>>>>>> 60
>>>>>>
>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>> Sbjct  154
>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>> 213
>>>>>>
>>>>>> Query  61
>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>> 120
>>>>>>
>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>> Sbjct  214
>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>> 273
>>>>>>
>>>>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>>>>>>           AWHERDDNAFRQAHQNTAMATGPDPDDEYE
>>>>>> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>>>>>>
>>>>>>
>>>>>>
>>>>>> Database: All non-redundant GenBank CDS
>>>>>> translations+PDB+SwissProt+PIR+PRF
>>>>>> excluding environmental samples from WGS projects
>>>>>>   Posted date:  Jul 5, 2009  4:41 AM
>>>>>> Number of letters in database: -1,124,994,511
>>>>>> Number of sequences in database:  9,252,587
>>>>>>
>>>>>> Lambda     K      H
>>>>>>  0.309    0.122    0.345
>>>>>> Gapped
>>>>>> Lambda     K      H
>>>>>>  0.267   0.0410    0.140
>>>>>> Matrix: BLOSUM62
>>>>>> Gap Penalties: Existence: 11, Extension: 1
>>>>>> Number of Sequences: 9252587
>>>>>> Number of Hits to DB: 60273703
>>>>>> Number of extensions: 1448367
>>>>>> Number of successful extensions: 2103
>>>>>> Number of sequences better than 10: 0
>>>>>> Number of HSP's better than 10 without gapping: 0
>>>>>> Number of HSP's gapped: 2113
>>>>>> Number of HSP's successfully gapped: 0
>>>>>> Length of query: 150
>>>>>> Length of database: 3169972781
>>>>>> Length adjustment: 113
>>>>>> Effective length of query: 37
>>>>>> Effective length of database: 2124430450
>>>>>> Effective search space: 78603926650
>>>>>> Effective search space used: 78603926650
>>>>>> T: 11
>>>>>> A: 40
>>>>>> X1: 16 (7.1 bits)
>>>>>> X2: 38 (14.6 bits)
>>>>>> X3: 64 (24.7 bits)
>>>>>> S1: 42 (20.8 bits)
>>>>>> S2: 74 (33.1 bits)
>>>>>>
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>> ###################################
>>>>>> and here are the hits (?) of the blast-algorithm on the
>>>>> ncbi-homepage with
>>>>>> the same query of course:
>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>> reinhard...   300
>>>>>> 3e-80
>>>>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
>>>>> [Acyrtho...  36.2
>>>>>> 1.1
>>>>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
>>>>> [Blautia...  35.4
>>>>>> 1.8
>>>>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
>>>>> brazil...  34.3
>>>>>> 4.2
>>>>>> ref|XP_680841.1|  hypothetical protein AN7572.2
>>>>> [Aspergillus n...  33.5
>>>>>> 6.0
>>>>>> ref|YP_001768110.1|  hypothetical protein M446_1150
>>>>> [Methyloba...  33.5
>>>>>> 7.0
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>> ###################################at
>>>>>> least the first hit is the same, but even there there is a
>>>>> different score
>>>>>> and e-value.
>>>>>>
>>>>>> thanks so much for any help :)
>>>>>> regards, jonas
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Chris Fields" <cjfields@...>
>>>>>> To: "Jason Stajich" <jason@...>
>>>>>> Cc: "Smithies, Russell"
>>>>> <Russell.Smithies@...>; "'BioPerl
>>>>>> List'" <bioperl-l@...>; "'Jonas Schaer'"
>>>>>> <Jonas_Schaer@...>
>>>>>> Sent: Monday, July 06, 2009 12:51 AM
>>>>>> Subject: Re: [Bioperl-l] different results with
>> remote-blast skript
>>>>>>
>>>>>>
>>>>>>> That inspires confidence ;>
>>>>>>>
>>>>>>> chris
>>>>>>>
>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>>>>>>>
>>>>>>>> integer overflow in blast....
>>>>>>>>
>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>>>>>>>
>>>>>>>>> I'd guess it's a difference in the parameters used.
>>>>>>>>> Interesting that both have the number of letters in the db as
>>>>>>>>> "-1,125,070,205", I assume that's a bug  :-)
>>>>>>>>>
>>>>>>>>> Stats from your remote_blast:
>>>>>>>>>
>>>>>>>>> 'stats' => {
>>>>>>>>>          'S1' => '42',
>>>>>>>>>          'S1_bits' => '20.8',
>>>>>>>>>          'lambda' => '0.309',
>>>>>>>>>          'entropy' => '0.345',
>>>>>>>>>          'kappa_gapped' => '0.0410',
>>>>>>>>>          'T' => '11',
>>>>>>>>>          'kappa' => '0.122',
>>>>>>>>>          'X3_bits' => '24.7',
>>>>>>>>>          'X1' => '16',
>>>>>>>>>          'lambda_gapped' => '0.267',
>>>>>>>>>          'X2' => '38',
>>>>>>>>>          'S2' => '74',
>>>>>>>>>          'seqs_better_than_cutoff' => '0',
>>>>>>>>>          'posted_date' => 'Jul 4, 2009  4:41 AM',
>>>>>>>>>          'Hits_to_DB' => '60102303',
>>>>>>>>>          'dbletters' => '-1125070205',
>>>>>>>>>          'A' => '40',
>>>>>>>>>          'num_successful_extensions' => '2004',
>>>>>>>>>          'num_extensions' => '1436892',
>>>>>>>>>          'X1_bits' => '7.1',
>>>>>>>>>          'X3' => '64',
>>>>>>>>>          'entropy_gapped' => '0.140',
>>>>>>>>>          'dbentries' => '9252258',
>>>>>>>>>          'X2_bits' => '14.6',
>>>>>>>>>          'S2_bits' => '33.1'
>>>>>>>>>        }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Stats from a blast done on the NCBI webpage:
>>>>>>>>>
>>>>>>>>> Database: All non-redundant GenBank CDS
>>>>> translations+PDB+SwissProt
>>>>>>>>> +PIR+PRF
>>>>>>>>> excluding environmental samples from WGS projects
>>>>>>>>> Posted date:  Jul 4, 2009  4:41 AM
>>>>>>>>> Number of letters in database: -1,125,070,205
>>>>>>>>> Number of sequences in database:  9,252,258
>>>>>>>>>
>>>>>>>>> Lambda     K      H
>>>>>>>>> 0.309    0.124    0.340
>>>>>>>>> Gapped
>>>>>>>>> Lambda     K      H
>>>>>>>>> 0.267   0.0410    0.140
>>>>>>>>> Matrix: BLOSUM62
>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1
>>>>>>>>> Number of Sequences: 9252258
>>>>>>>>> Number of Hits to DB: 86493230
>>>>>>>>> Number of extensions: 3101413
>>>>>>>>> Number of successful extensions: 9001
>>>>>>>>> Number of sequences better than 100: 65
>>>>>>>>> Number of HSP's better than 100 without gapping: 0
>>>>>>>>> Number of HSP's gapped: 9000
>>>>>>>>> Number of HSP's successfully gapped: 66
>>>>>>>>> Length of query: 150
>>>>>>>>> Length of database: 3169897087
>>>>>>>>> Length adjustment: 113
>>>>>>>>> Effective length of query: 37
>>>>>>>>> Effective length of database: 2124391933
>>>>>>>>> Effective search space: 78602501521
>>>>>>>>> Effective search space used: 78602501521
>>>>>>>>> T: 11
>>>>>>>>> A: 40
>>>>>>>>> X1: 16 (7.1 bits)
>>>>>>>>> X2: 38 (14.6 bits)
>>>>>>>>> X3: 64 (24.7 bits)
>>>>>>>>> S1: 42 (20.8 bits)
>>>>>>>>> S2: 65 (29.6 bits)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>>>>>>>>> bounces@...] On Behalf Of Jonas Schaer
>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>>>>>>>> To: BioPerl List
>>>>>>>>>> Subject: [Bioperl-l] different results with
>> remote-blast skript
>>>>>>>>>>
>>>>>>>>>> Hi again :)
>>>>>>>>>> please, I only have this little question:
>>>>>>>>>> why do I get different results with my remote::blast
>>>>> perl skript
>>>>>>>>>> then on the
>>>>>>>>>> ncbi blast homepage?
>>>>>>>>>> I am using blastp, the query is an amino-sequence (different
>>>>>>>>>> results with any
>>>>>>>>>> sequence, differences not only in number of hits but
>> even in e-
>>>>>>>>>> values, scores
>>>>>>>>>> etc...), the database is 'nr'.
>>>>>>>>>> PLEASE help me,
>>>>>>>>>> thank you in advance,
>>>>>>>>>> Jonas
>>>>>>>>>>
>>>>>>>>>> ps: my skript:
>>>>>>>>>>
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>>>>>> ##
>>>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>>>> use strict;
>>>>>>>>>> my @blast_report;
>>>>>>>>>> my $prog = 'blastp';
>>>>>>>>>> my $db   = 'nr';
>>>>>>>>>> my $e_val= '1e-10';
>>>>>>>>>> #my $e_val= '10';
>>>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>>>      '-data' => $db,
>>>>>>>>>>      '-expect' => $e_val,
>>>>>>>>>>      '-readmethod' => 'SearchIO' );
>>>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>>>> $
>>>>>>>>>> Bio
>>>>>>>>>>
>>>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>>>>>>>>> = '1';
>>>>>>>>>>
>>>>>>>>>> my
>>>>>>>>>> $
>>>>>>>>>> blast_seq
>>>>>>>>>>
>>>>>
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>>>>>>>>>
>>>>>>
>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
>>>>> AFRQAHQNTAMATGPD
>>>>>>>>>> PDDEYE';
>>>>>>>>>> #$v is just to turn on and off the messages
>>>>>>>>>> my $v = 1;
>>>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>>>>>>>>> 'Bio::PrimarySeq');
>>>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq,
>> -display_id =>
>>>>>>>>>> "$blast_seq");
>>>>>>>>>> my $filename='temp2.out';
>>>>>>>>>> my $r = $factory->submit_blast($seq);
>>>>>>>>>> print STDERR "waiting..." if( $v > 0 );
>>>>>>>>>> while ( my @rids = $factory->each_rid )
>>>>>>>>>> {
>>>>>>>>>>     foreach my $rid ( @rids )
>>>>>>>>>>     {
>>>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>         if( !ref($rc) )
>>>>>>>>>>         {
>>>>>>>>>>             if( $rc < 0 )
>>>>>>>>>>             {
>>>>>>>>>>                 $factory->remove_rid($rid);
>>>>>>>>>>             }
>>>>>>>>>>             print STDERR "." if ( $v > 0 );
>>>>>>>>>>         }
>>>>>>>>>>             else
>>>>>>>>>>             {
>>>>>>>>>>                 my $result = $rc->next_result();
>>>>>>>>>>                 $factory->save_output($filename);
>>>>>>>>>>                 $factory->remove_rid($rid);
>>>>>>>>>>                 print "\nQuery Name: ",
>>>>> $result->query_name(),
>>>>>>>>>> "\n";
>>>>>>>>>>                 while ( my $hit = $result->next_hit )
>>>>>>>>>>                 {
>>>>>>>>>>                     next unless ( $v > 0);
>>>>>>>>>>                     print "\thit name is ",
>> $hit->name, "\n";
>>>>>>>>>>                     while( my $hsp = $hit->next_hsp )
>>>>>>>>>>                     {
>>>>>>>>>>                         print "\t\tscore is ",
>>>>> $hsp->score, "\n";
>>>>>>>>>>                     }
>>>>>>>>>>                 }
>>>>>>>>>>             }
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>> @blast_report = get_file_data ($filename);
>>>>>>>>>> return @blast_report;
>>>>>>>>>>
>>>>>>
>>>>> ##############################################################
>>>>> ################
>>>>>>>>>> ####
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l@...
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>> =
>>>>>>>>> =
>>>>>>>>>
>>>>>
>> =====================================================================
>>>>>>>>> Attention: The information contained in this message and/or
>>>>>>>>> attachments
>>>>>>>>> from AgResearch Limited is intended only for the
>>>>> persons or entities
>>>>>>>>> to which it is addressed and may contain confidential and/or
>>>>>>>>> privileged
>>>>>>>>> material. Any review, retransmission, dissemination
>> or other use
>>>>>>>>> of, or
>>>>>>>>> taking of any action in reliance upon, this information
>>>>> by persons or
>>>>>>>>> entities other than the intended recipients is prohibited by
>>>>>>>>> AgResearch
>>>>>>>>> Limited. If you have received this message in error,
>>>>> please notify
>>>>>>>>> the
>>>>>>>>> sender immediately.
>>>>>>>>> =
>>>>>>>>> =
>>>>>>>>>
>>>>>
>> =====================================================================
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l@...
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jason Stajich
>>>>>>>> jason@...
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l@...
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>> --------------------------------------------------------------
>>>>> ----------------
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>> No virus found in this incoming message.
>>>>>> Checked by AVG - www.avg.com
>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
>>>>> Date: 07/05/09
>>>>>> 05:53:00
>>>>>
>>>>>
>>>>> --------------------------------------------------------------
>>>>> ------------------
>>>>>
>>>>>
>>>>>
>>>>> No virus found in this incoming message.
>>>>> Checked by AVG - www.avg.com
>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
>>>>> Date: 07/05/09
>>>>> 17:54:00
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l@...
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>
>>
>> --------------------------------------------------------------
>> ------------------
>>
>>
>>
>> No virus found in this incoming message.
>> Checked by AVG - www.avg.com
>> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release
>> Date: 07/09/09
>> 05:55:00
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Parent Message unknown Re: cdd-search with remoteblast?

by Cook, Malcolm-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Jonas,

I'm glad to hear it works for you too.

I noted the recipe for success here: http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast#Status

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri


> -----Original Message-----
> From: Jonas Schaer [mailto:Brotelzwieb@...]
> Sent: Tuesday, July 14, 2009 12:05 PM
> To: Cook, Malcolm; Chris Fields
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Hi chris and malcom,
> seems to work here, too. thank u both a lot. great job :)
> best regards, jonas
> ----- Original Message -----
> From: "Chris Fields" <cjfields1@...>
> To: "Cook, Malcolm" <MEC@...>
> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'"
> <bioperl-l@...>
> Sent: Friday, July 10, 2009 8:04 PM
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
>
> > Malcolm,
> >
> > Nice!  Go ahead and add the test in; we can look at trying to get
> > CDD_SEARCH working at some point but this is a nice workaround.
> >
> > chris
> >
> > On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:
> >
> >> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
> >> the following.  Is it appropriate to submit it?
> >>
> >> Jonas, OK, I was a little quick on the gun... but I've got it now.
> >>
> >> You don't need to change the wrapper.  Here is what you need to do:
> >>
> >> # 1) set your database like this:
> >>
> >> -database => 'cdsearch/cdd', # c.f.
> >>
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
> >>  for other cdd database options
> >>
> >> # 2) add this line before submitting the job:
> >> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
> >>
> >> You're in - No other changes needed.
> >>
> >> Malcolm Cook
> >> Stowers Institute for Medical Research - Kansas City, Missouri
> >>
> >>
> >>> -----Original Message-----
> >>> From: Jonas Schaer [mailto:Brotelzwieb@...]
> >>> Sent: Friday, July 10, 2009 4:18 AM
> >>> To: BioPerl List; Cook, Malcolm; Chris Fields
> >>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>>
> >>> Hi,
> >>> I tried to do what Malcom proposed my ($prog = 'rpsblast';
> >>> my $db   =
> >>> 'CDD';)  but that didn't work.
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: Value rpsblast for PUT parameter PROGRAM does not match
> >>> expression t?blast[ pnx]. Rejecting.
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw
> C:/Perl/site/lib/Bio/Root/Root.pm:359
> >>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> >>> C:/Perl/site/lib/Bio/Tools
> >>> /Run/RemoteBlast.pm:329
> >>> STACK: Bio::Tools::Run::RemoteBlast::new
> >>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
> >>> ast.pm:257
> >>> STACK: blast_a_seq2.pm:14
> >>> -----------------------------------------------------------
> >>> So I should try to "change the wrapper to allow 'rpsblast'",
> >>> right? Could You tell me how to do that, please? So sorry but
> >>> I have no idea yet...:) If that doesn't work, is there any
> >>> other way to run cdd-searches with perl?
> >>> Thank you so much!
> >>> Regards, Jonas
> >>>
> >>> ----- Original Message -----
> >>> From: "Chris Fields" <cjfields1@...>
> >>> To: "Cook, Malcolm" <MEC@...>
> >>> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'"
> >>> <bioperl-l@...>; "'Smithies, Russell'"
> >>> <Russell.Smithies@...>; <cjfields@...>
> >>> Sent: Thursday, July 09, 2009 9:19 PM
> >>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>>
> >>>
> >>>> I've scheduled this tentatively for the 1.6 release
> series (just not
> >>>> sure when yet).  It may work as is, but I haven't tried
> it out yet
> >>>> (and am hazarding to guess it only retrieves the single
> main RID at
> >>>> the moment).
> >>>>
> >>>> chris
> >>>>
> >>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
> >>>>
> >>>>> Jonas,
> >>>>>
> >>>>> If you want to continue to use the bioperl remoteblast
> interface,
> >>>>> probably what you should do is simply call it twice.
> >>>>>
> >>>>> Once, as you already know how to do, which will return
> without CDD
> >>>>> results.
> >>>>>
> >>>>> Secondly, to get the CDD results, call remoteblast a
> second time.
> >>>>> This time, using
> >>>>> -database => 'CDD'
> >>>>> -program => 'rpsblast'
> >>>>>
> >>>>> However, the wrapper may object to the 'rpsblast'
> program.  It is
> >>>>> not listed in the POD -
> >>>>>
> >>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R
> >>> emoteBlast.pm)
> >>>>>   If so, my guess is that changing the perl wrapper to allow
> >>>>> rpsblast will "just work" (tm).  I've cc:ed
> >>> cjfields@... for
> >>>>> his opinion on this.
> >>>>>
> >>>>> Also, you might want to perform the CDD search first,
> especially if
> >>>>> you are streaming results to eyeball that might like
> something to
> >>>>> look at while the second (presumably longer) search is running.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Malcolm Cook
> >>>>> Stowers Institute for Medical Research - Kansas City, Missouri
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: bioperl-l-bounces@...
> >>>>>> [mailto:bioperl-l-bounces@...] On Behalf Of
> >>>>>> Jonas Schaer
> >>>>>> Sent: Thursday, July 09, 2009 5:16 AM
> >>>>>> To: BioPerl List; Smithies, Russell
> >>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>>>>>
> >>>>>> Hi guys,
> >>>>>> Thank you all so much for your help and patience :). Of
> >>>>>> course you were right and I finaly found the right
> >>>>>> put-parameter to get exactly the same hits as on the homepage.
> >>>>>> I do have an other question though :)...
> >>>>>> I now want to include a search for conserved domains, but
> >>>>>> when I try to use the CDD_SEARCH-parameter
> >>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
> >>>>>> sub:CDD_SEARCH)
> >>>>>> like the other put-parameters the way chris once told
> >>>>>> me(works fine with the other params):
> >>>>>>
> >>>>>> my %put = (
> >>>>>>   WORD_SIZE => 3,
> >>>>>>   HITLIST_SIZE => 100,
> >>>>>>   THRESHOLD => 11,
> >>>>>>   FILTER => 'R',
> >>>>>>   GENETIC_CODE => 1,
> >>>>>>   CDD_SEARCH => 'on'
> >>>>>> ###I tried it
> >>>>>> with 'true' and '1', too.
> >>>>>>
> >>>>>> );
> >>>>>>
> >>>>>> for my $putName (keys %put) {
> >>>>>>   $factory->submit_parameter($putName,$put{$putName});
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> ...an exception is thrown:
> >>>>>>
> >>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>>>>> MSG: CDD_SEARCH is not a valid PUT parameter.
> >>>>>> STACK: Error::throw
> >>>>>> STACK: Bio::Root::Root::throw
> >>> C:/Perl/site/lib/Bio/Root/Root.pm:359
> >>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> >>>>>> C:/Perl/site/lib/Bio/Tools
> >>>>>> /Run/RemoteBlast.pm:325
> >>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
> >>>>>> STACK: main::blast_it firsteval0.8.pm:288
> >>>>>> STACK: firsteval0.8.pm:35
> >>>>>> ----------------------------------------------------------- .
> >>>>>> I guess somehow this could be the solution to my problem:
> >>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
> >>>>>> ub:RID-for-Simultaneous
> >>>>>> , but unfortunately I don't understand what to do.
> >>>>>> I'm so sorry to bother you with this but please help me once
> >>>>>> more...:)
> >>>>>>
> >>>>>> Best regards and thanks in advance,
> >>>>>> Jonas
> >>>>>>
> >>>>>> ----- Original Message -----
> >>>>>> From: "Smithies, Russell" <Russell.Smithies@...>
> >>>>>> To: "'Jonas Schaer'" <Brotelzwieb@...>
> >>>>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
> >>>>>> <bioperl-l@...>
> >>>>>> Sent: Monday, July 06, 2009 10:56 PM
> >>>>>> Subject: RE: [Bioperl-l] different results with
> >>> remote-blast skript
> >>>>>>
> >>>>>>
> >>>>>> Hi Jonas,
> >>>>>> You can't just play with the BLAST parameters and hope
> >>> for a "better"
> >>>>>> result.
> >>>>>> I'd suggest that if you aren't sure what they do, you should
> >>>>>> leave them
> >>>>>> alone as small changes can make huge differences in the
> >>>>>> output - it's quite
> >>>>>> possible to miss finding what you're looking for by using
> >>> the wrong
> >>>>>> parameters.
> >>>>>> If all else fails, read the blast manual:
> >>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
> >>>>>> _all.html
> >>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
> >>>>>> Or Read Ian Korfs' excellent book:
> >>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
> >>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
> >>>>>>
> >>>>>> Don't worry about the integer overflow bug as there's nothing
> >>>>>> you can do
> >>>>>> about it. If you're interested, Google and Wikipedia are your
> >>>>>> friends:
> >>>>>> http://en.wikipedia.org/wiki/Integer_overflow
> >>>>>>
> >>>>>>
> >>>>>> Russell
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
> >>>>>>> bounces@...] On Behalf Of Jonas Schaer
> >>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
> >>>>>>> To: BioPerl List; Chris Fields
> >>>>>>> Subject: Re: [Bioperl-l] different results with
> >>> remote-blast skript
> >>>>>>>
> >>>>>>> Hi guys, thanks for your answers so far.
> >>>>>>> @jason: integer overflow in blast.... sorry, but what do
> >>>>>> you mean by that?
> >>>>>>> how can I fix it...?
> >>>>>>>
> >>>>>>> Since I never really changed any parameters I thought them
> >>>>>> all to be
> >>>>>>> default.
> >>>>>>> whatever, I tried to get "better" results with my prog
> >>> by changing
> >>>>>>> these:
> >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>>>>>
> >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
> >>>>>> STICS'} =
> >>>>>>> '1';
> >>>>>>> with no effect...I guess these were default values anyway.
> >>>>>>>
> >>>>>>> So please maybe you can tell me all the other parameters I
> >>>>>> can change with
> >>>>>>> my
> >>>>>>> perl-skript AND how to do that?
> >>>>>>> Unfortunately both, perl and the blast-algorithm are pretty
> >>>>>> much new to
> >>>>>>> me,
> >>>>>>> maybe thats why I just cannot find out how to do that on my
> >>>>>> own... :/
> >>>>>>>
> >>>>>>> Here is the output I get with my remote-blast skript:
> >>>>>>>
> >>>>>> ##############################################################
> >>>>>> ################
> >>>>>>> ###################################
> >>>>>>> Query Name:
> >>>>>>>
> >>>
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
> >>>>>>> L
> >>>>>>>       hit name is ref|XP_001702807.1|
> >>>>>>>               score is 442
> >>>>>>> BLASTP 2.2.21+
> >>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
> >>>>>> A. Schaffer,
> >>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>>> Lipman (1997),
> >>>>>>> "Gapped
> >>>>>>> BLAST and PSI-BLAST: a new generation of protein
> database search
> >>>>>>> programs",
> >>>>>>> Nucleic Acids Res. 25:3389-3402.
> >>>>>>>
> >>>>>>>
> >>>>>>> Reference for composition-based statistics: Alejandro A.
> >>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
> >>>>>> John L. Spouge,
> >>>>>>> Yuri
> >>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
> >>>>>> "Improving the
> >>>>>>> accuracy of PSI-BLAST protein database searches with
> >>>>>> composition-based
> >>>>>>> statistics and other refinements", Nucleic Acids Res.
> >>> 29:2994-3005.
> >>>>>>>
> >>>>>>>
> >>>>>>> RID: 53STX5G2013
> >>>>>>>
> >>>>>>>
> >>>>>>> Database: All non-redundant GenBank CDS
> >>>>>>> translations+PDB+SwissProt+PIR+PRF excluding
> >>> environmental samples
> >>>>>>> from WGS projects
> >>>>>>>          9,252,587 sequences; 3,169,972,781 total
> letters Query=
> >>>>>>>
> >>>>>>
> >>>
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> >>>>>>>
> >>>>>>
> >>>
> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
> >>>>>>> ATGPDPDDEYE
> >>>>>>> Length=150
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>      Score
> >>>>>>> E
> >>>>>>> Sequences producing significant alignments:
> >>>>>>     (Bits)
> >>>>>>> Value
> >>>>>>>
> >>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> >>>>>> reinhard...   174
> >>>>>>> 2e-42
> >>>>>>>
> >>>>>>>
> >>>>>>> ALIGNMENTS
> >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas
> >>> reinhardtii]
> >>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
> >>>>>>> Length=303
> >>>>>>>
> >>>>>>> Score =  174 bits (442),  Expect = 2e-42, Method:
> >>>>>> Composition-based
> >>>>>>> stats.
> >>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
> >>>>>> Gaps = 0/150
> >>>>>>> (0%)
> >>>>>>>
> >>>>>>> Query  1
> >>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
> >>>>>>> 60
> >>>>>>>
> >>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> >>>>>>> Sbjct  154
> >>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> >>>>>>> 213
> >>>>>>>
> >>>>>>> Query  61
> >>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>>>>> 120
> >>>>>>>
> >>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>>>>> Sbjct  214
> >>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>>>>> 273
> >>>>>>>
> >>>>>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
> >>>>>>>           AWHERDDNAFRQAHQNTAMATGPDPDDEYE
> >>>>>>> Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Database: All non-redundant GenBank CDS
> >>>>>>> translations+PDB+SwissProt+PIR+PRF
> >>>>>>> excluding environmental samples from WGS projects
> >>>>>>>   Posted date:  Jul 5, 2009  4:41 AM
> >>>>>>> Number of letters in database: -1,124,994,511
> >>>>>>> Number of sequences in database:  9,252,587
> >>>>>>>
> >>>>>>> Lambda     K      H
> >>>>>>>  0.309    0.122    0.345
> >>>>>>> Gapped
> >>>>>>> Lambda     K      H
> >>>>>>>  0.267   0.0410    0.140
> >>>>>>> Matrix: BLOSUM62
> >>>>>>> Gap Penalties: Existence: 11, Extension: 1
> >>>>>>> Number of Sequences: 9252587
> >>>>>>> Number of Hits to DB: 60273703
> >>>>>>> Number of extensions: 1448367
> >>>>>>> Number of successful extensions: 2103
> >>>>>>> Number of sequences better than 10: 0
> >>>>>>> Number of HSP's better than 10 without gapping: 0
> >>>>>>> Number of HSP's gapped: 2113
> >>>>>>> Number of HSP's successfully gapped: 0
> >>>>>>> Length of query: 150
> >>>>>>> Length of database: 3169972781
> >>>>>>> Length adjustment: 113
> >>>>>>> Effective length of query: 37
> >>>>>>> Effective length of database: 2124430450
> >>>>>>> Effective search space: 78603926650
> >>>>>>> Effective search space used: 78603926650
> >>>>>>> T: 11
> >>>>>>> A: 40
> >>>>>>> X1: 16 (7.1 bits)
> >>>>>>> X2: 38 (14.6 bits)
> >>>>>>> X3: 64 (24.7 bits)
> >>>>>>> S1: 42 (20.8 bits)
> >>>>>>> S2: 74 (33.1 bits)
> >>>>>>>
> >>>>>>>
> >>>>>> ##############################################################
> >>>>>> ################
> >>>>>>> ###################################
> >>>>>>> and here are the hits (?) of the blast-algorithm on the
> >>>>>> ncbi-homepage with
> >>>>>>> the same query of course:
> >>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> >>>>>> reinhard...   300
> >>>>>>> 3e-80
> >>>>>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
> >>>>>> [Acyrtho...  36.2
> >>>>>>> 1.1
> >>>>>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
> >>>>>> [Blautia...  35.4
> >>>>>>> 1.8
> >>>>>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
> >>>>>> brazil...  34.3
> >>>>>>> 4.2
> >>>>>>> ref|XP_680841.1|  hypothetical protein AN7572.2
> >>>>>> [Aspergillus n...  33.5
> >>>>>>> 6.0
> >>>>>>> ref|YP_001768110.1|  hypothetical protein M446_1150
> >>>>>> [Methyloba...  33.5
> >>>>>>> 7.0
> >>>>>>>
> >>>>>> ##############################################################
> >>>>>> ################
> >>>>>>> ###################################at
> >>>>>>> least the first hit is the same, but even there there is a
> >>>>>> different score
> >>>>>>> and e-value.
> >>>>>>>
> >>>>>>> thanks so much for any help :)
> >>>>>>> regards, jonas
> >>>>>>>
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>> From: "Chris Fields" <cjfields@...>
> >>>>>>> To: "Jason Stajich" <jason@...>
> >>>>>>> Cc: "Smithies, Russell"
> >>>>>> <Russell.Smithies@...>; "'BioPerl
> >>>>>>> List'" <bioperl-l@...>; "'Jonas Schaer'"
> >>>>>>> <Jonas_Schaer@...>
> >>>>>>> Sent: Monday, July 06, 2009 12:51 AM
> >>>>>>> Subject: Re: [Bioperl-l] different results with
> >>> remote-blast skript
> >>>>>>>
> >>>>>>>
> >>>>>>>> That inspires confidence ;>
> >>>>>>>>
> >>>>>>>> chris
> >>>>>>>>
> >>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> >>>>>>>>
> >>>>>>>>> integer overflow in blast....
> >>>>>>>>>
> >>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> >>>>>>>>>
> >>>>>>>>>> I'd guess it's a difference in the parameters used.
> >>>>>>>>>> Interesting that both have the number of letters
> in the db as
> >>>>>>>>>> "-1,125,070,205", I assume that's a bug  :-)
> >>>>>>>>>>
> >>>>>>>>>> Stats from your remote_blast:
> >>>>>>>>>>
> >>>>>>>>>> 'stats' => {
> >>>>>>>>>>          'S1' => '42',
> >>>>>>>>>>          'S1_bits' => '20.8',
> >>>>>>>>>>          'lambda' => '0.309',
> >>>>>>>>>>          'entropy' => '0.345',
> >>>>>>>>>>          'kappa_gapped' => '0.0410',
> >>>>>>>>>>          'T' => '11',
> >>>>>>>>>>          'kappa' => '0.122',
> >>>>>>>>>>          'X3_bits' => '24.7',
> >>>>>>>>>>          'X1' => '16',
> >>>>>>>>>>          'lambda_gapped' => '0.267',
> >>>>>>>>>>          'X2' => '38',
> >>>>>>>>>>          'S2' => '74',
> >>>>>>>>>>          'seqs_better_than_cutoff' => '0',
> >>>>>>>>>>          'posted_date' => 'Jul 4, 2009  4:41 AM',
> >>>>>>>>>>          'Hits_to_DB' => '60102303',
> >>>>>>>>>>          'dbletters' => '-1125070205',
> >>>>>>>>>>          'A' => '40',
> >>>>>>>>>>          'num_successful_extensions' => '2004',
> >>>>>>>>>>          'num_extensions' => '1436892',
> >>>>>>>>>>          'X1_bits' => '7.1',
> >>>>>>>>>>          'X3' => '64',
> >>>>>>>>>>          'entropy_gapped' => '0.140',
> >>>>>>>>>>          'dbentries' => '9252258',
> >>>>>>>>>>          'X2_bits' => '14.6',
> >>>>>>>>>>          'S2_bits' => '33.1'
> >>>>>>>>>>        }
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Stats from a blast done on the NCBI webpage:
> >>>>>>>>>>
> >>>>>>>>>> Database: All non-redundant GenBank CDS
> >>>>>> translations+PDB+SwissProt
> >>>>>>>>>> +PIR+PRF
> >>>>>>>>>> excluding environmental samples from WGS projects
> >>>>>>>>>> Posted date:  Jul 4, 2009  4:41 AM
> >>>>>>>>>> Number of letters in database: -1,125,070,205
> >>>>>>>>>> Number of sequences in database:  9,252,258
> >>>>>>>>>>
> >>>>>>>>>> Lambda     K      H
> >>>>>>>>>> 0.309    0.124    0.340
> >>>>>>>>>> Gapped
> >>>>>>>>>> Lambda     K      H
> >>>>>>>>>> 0.267   0.0410    0.140
> >>>>>>>>>> Matrix: BLOSUM62
> >>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1
> >>>>>>>>>> Number of Sequences: 9252258
> >>>>>>>>>> Number of Hits to DB: 86493230
> >>>>>>>>>> Number of extensions: 3101413
> >>>>>>>>>> Number of successful extensions: 9001
> >>>>>>>>>> Number of sequences better than 100: 65
> >>>>>>>>>> Number of HSP's better than 100 without gapping: 0
> >>>>>>>>>> Number of HSP's gapped: 9000
> >>>>>>>>>> Number of HSP's successfully gapped: 66
> >>>>>>>>>> Length of query: 150
> >>>>>>>>>> Length of database: 3169897087
> >>>>>>>>>> Length adjustment: 113
> >>>>>>>>>> Effective length of query: 37
> >>>>>>>>>> Effective length of database: 2124391933
> >>>>>>>>>> Effective search space: 78602501521
> >>>>>>>>>> Effective search space used: 78602501521
> >>>>>>>>>> T: 11
> >>>>>>>>>> A: 40
> >>>>>>>>>> X1: 16 (7.1 bits)
> >>>>>>>>>> X2: 38 (14.6 bits)
> >>>>>>>>>> X3: 64 (24.7 bits)
> >>>>>>>>>> S1: 42 (20.8 bits)
> >>>>>>>>>> S2: 65 (29.6 bits)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: bioperl-l-bounces@...
> [mailto:bioperl-l-
> >>>>>>>>>>> bounces@...] On Behalf Of Jonas Schaer
> >>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> >>>>>>>>>>> To: BioPerl List
> >>>>>>>>>>> Subject: [Bioperl-l] different results with
> >>> remote-blast skript
> >>>>>>>>>>>
> >>>>>>>>>>> Hi again :)
> >>>>>>>>>>> please, I only have this little question:
> >>>>>>>>>>> why do I get different results with my remote::blast
> >>>>>> perl skript
> >>>>>>>>>>> then on the
> >>>>>>>>>>> ncbi blast homepage?
> >>>>>>>>>>> I am using blastp, the query is an amino-sequence
> (different
> >>>>>>>>>>> results with any
> >>>>>>>>>>> sequence, differences not only in number of hits but
> >>> even in e-
> >>>>>>>>>>> values, scores
> >>>>>>>>>>> etc...), the database is 'nr'.
> >>>>>>>>>>> PLEASE help me,
> >>>>>>>>>>> thank you in advance,
> >>>>>>>>>>> Jonas
> >>>>>>>>>>>
> >>>>>>>>>>> ps: my skript:
> >>>>>>>>>>>
> >>>>>>>
> >>>>>> ##############################################################
> >>>>>> ################
> >>>>>>>>>>> ##
> >>>>>>>>>>> use Bio::Seq::SeqFactory;
> >>>>>>>>>>> use Bio::Tools::Run::RemoteBlast;
> >>>>>>>>>>> use strict;
> >>>>>>>>>>> my @blast_report;
> >>>>>>>>>>> my $prog = 'blastp';
> >>>>>>>>>>> my $db   = 'nr';
> >>>>>>>>>>> my $e_val= '1e-10';
> >>>>>>>>>>> #my $e_val= '10';
> >>>>>>>>>>> my @params = ( '-prog' => $prog,
> >>>>>>>>>>>      '-data' => $db,
> >>>>>>>>>>>      '-expect' => $e_val,
> >>>>>>>>>>>      '-readmethod' => 'SearchIO' );
> >>>>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}
> = '11 1';
> >>>>>>>>>>>
> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>>>>>>>>> $
> >>>>>>>>>>> Bio
> >>>>>>>>>>>
> >>>>>>
> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
> >>>>>>>>>>> = '1';
> >>>>>>>>>>>
> >>>>>>>>>>> my
> >>>>>>>>>>> $
> >>>>>>>>>>> blast_seq
> >>>>>>>>>>>
> >>>>>>
> >>>
> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
> >>>>>>>>>>>
> >>>>>>>
> >>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
> >>>>>> AFRQAHQNTAMATGPD
> >>>>>>>>>>> PDDEYE';
> >>>>>>>>>>> #$v is just to turn on and off the messages
> >>>>>>>>>>> my $v = 1;
> >>>>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
> >>>>>>>>>>> 'Bio::PrimarySeq');
> >>>>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq,
> >>> -display_id =>
> >>>>>>>>>>> "$blast_seq");
> >>>>>>>>>>> my $filename='temp2.out';
> >>>>>>>>>>> my $r = $factory->submit_blast($seq);
> >>>>>>>>>>> print STDERR "waiting..." if( $v > 0 );
> >>>>>>>>>>> while ( my @rids = $factory->each_rid )
> >>>>>>>>>>> {
> >>>>>>>>>>>     foreach my $rid ( @rids )
> >>>>>>>>>>>     {
> >>>>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
> >>>>>>>>>>>         if( !ref($rc) )
> >>>>>>>>>>>         {
> >>>>>>>>>>>             if( $rc < 0 )
> >>>>>>>>>>>             {
> >>>>>>>>>>>                 $factory->remove_rid($rid);
> >>>>>>>>>>>             }
> >>>>>>>>>>>             print STDERR "." if ( $v > 0 );
> >>>>>>>>>>>         }
> >>>>>>>>>>>             else
> >>>>>>>>>>>             {
> >>>>>>>>>>>                 my $result = $rc->next_result();
> >>>>>>>>>>>                 $factory->save_output($filename);
> >>>>>>>>>>>                 $factory->remove_rid($rid);
> >>>>>>>>>>>                 print "\nQuery Name: ",
> >>>>>> $result->query_name(),
> >>>>>>>>>>> "\n";
> >>>>>>>>>>>                 while ( my $hit = $result->next_hit )
> >>>>>>>>>>>                 {
> >>>>>>>>>>>                     next unless ( $v > 0);
> >>>>>>>>>>>                     print "\thit name is ",
> >>> $hit->name, "\n";
> >>>>>>>>>>>                     while( my $hsp = $hit->next_hsp )
> >>>>>>>>>>>                     {
> >>>>>>>>>>>                         print "\t\tscore is ",
> >>>>>> $hsp->score, "\n";
> >>>>>>>>>>>                     }
> >>>>>>>>>>>                 }
> >>>>>>>>>>>             }
> >>>>>>>>>>>     }
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> }
> >>>>>>>>>>> @blast_report = get_file_data ($filename);
> >>>>>>>>>>> return @blast_report;
> >>>>>>>>>>>
> >>>>>>>
> >>>>>> ##############################################################
> >>>>>> ################
> >>>>>>>>>>> ####
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> Bioperl-l mailing list
> >>>>>>>>>>> Bioperl-l@...
> >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>>> =
> >>>>>>>>>> =
> >>>>>>>>>>
> >>>>>>
> >>>
> =====================================================================
> >>>>>>>>>> Attention: The information contained in this message and/or
> >>>>>>>>>> attachments
> >>>>>>>>>> from AgResearch Limited is intended only for the
> >>>>>> persons or entities
> >>>>>>>>>> to which it is addressed and may contain
> confidential and/or
> >>>>>>>>>> privileged
> >>>>>>>>>> material. Any review, retransmission, dissemination
> >>> or other use
> >>>>>>>>>> of, or
> >>>>>>>>>> taking of any action in reliance upon, this information
> >>>>>> by persons or
> >>>>>>>>>> entities other than the intended recipients is
> prohibited by
> >>>>>>>>>> AgResearch
> >>>>>>>>>> Limited. If you have received this message in error,
> >>>>>> please notify
> >>>>>>>>>> the
> >>>>>>>>>> sender immediately.
> >>>>>>>>>> =
> >>>>>>>>>> =
> >>>>>>>>>>
> >>>>>>
> >>>
> =====================================================================
> >>>>>>>>>>
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Bioperl-l mailing list
> >>>>>>>>>> Bioperl-l@...
> >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Jason Stajich
> >>>>>>>>> jason@...
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Bioperl-l mailing list
> >>>>>>>>> Bioperl-l@...
> >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> --------------------------------------------------------------
> >>>>>> ----------------
> >>>>>>> --
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> No virus found in this incoming message.
> >>>>>>> Checked by AVG - www.avg.com
> >>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
> >>>>>> Date: 07/05/09
> >>>>>>> 05:53:00
> >>>>>>
> >>>>>>
> >>>>>> --------------------------------------------------------------
> >>>>>> ------------------
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> No virus found in this incoming message.
> >>>>>> Checked by AVG - www.avg.com
> >>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
> >>>>>> Date: 07/05/09
> >>>>>> 17:54:00
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Bioperl-l mailing list
> >>>>>> Bioperl-l@...
> >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>
> >>>
> >>>
> >>> --------------------------------------------------------------
> >>> ------------------
> >>>
> >>>
> >>>
> >>> No virus found in this incoming message.
> >>> Checked by AVG - www.avg.com
> >>> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release
> >>> Date: 07/09/09
> >>> 05:55:00
> >>>
> >>>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l@...
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --------------------------------------------------------------
> ------------------
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.375 / Virus Database: 270.13.9/2229 - Release
> Date: 07/10/09
> 07:05:00
>
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Cook, Malcolm-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Chris,

I wound up adding a new test

#       $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $

with the comment :

# malcolm_cook@...: this test is in a separate file from
# RemoteBlast.t (on which it is modelled) since there is some sort of
# side-effecting between the multiple remote blasts that is causing
# this test to fail, if it comes last, or the other test to fail, if
# this one comes first.  THIS IS A BUG EITHER IN REMOTE BLAST OR MY
# UNDERSTANDING, i.e. of how to initialize it.

In any case, the test passes and demos rpsblast usage.

Cheers,


Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri


> -----Original Message-----
> From: Chris Fields [mailto:cjfields1@...]
> Sent: Friday, July 10, 2009 1:05 PM
> To: Cook, Malcolm
> Cc: 'Jonas Schaer'; 'BioPerl List'
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Malcolm,
>
> Nice!  Go ahead and add the test in; we can look at trying to
> get CDD_SEARCH working at some point but this is a nice workaround.
>
> chris
>
> On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:
>
> > Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
> > the following.  Is it appropriate to submit it?
> >
> > Jonas, OK, I was a little quick on the gun... but I've got it now.
> >
> > You don't need to change the wrapper.  Here is what you need to do:
> >
> > # 1) set your database like this:
> >
> > -database => 'cdsearch/cdd', # c.f.
> > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
> >  for other cdd database options
> >
> > # 2) add this line before submitting the job:
> > $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
> >
> > You're in - No other changes needed.
> >
> > Malcolm Cook
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >
> >
> >> -----Original Message-----
> >> From: Jonas Schaer [mailto:Brotelzwieb@...]
> >> Sent: Friday, July 10, 2009 4:18 AM
> >> To: BioPerl List; Cook, Malcolm; Chris Fields
> >> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>
> >> Hi,
> >> I tried to do what Malcom proposed my ($prog = 'rpsblast';
> >> my $db   =
> >> 'CDD';)  but that didn't work.
> >>
> >> ------------- EXCEPTION: Bio::Root::Exception -------------
> >> MSG: Value rpsblast for PUT parameter PROGRAM does not match
> >> expression t?blast[ pnx]. Rejecting.
> >> STACK: Error::throw
> >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> >> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> >> C:/Perl/site/lib/Bio/Tools
> >> /Run/RemoteBlast.pm:329
> >> STACK: Bio::Tools::Run::RemoteBlast::new
> >> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
> >> ast.pm:257
> >> STACK: blast_a_seq2.pm:14
> >> -----------------------------------------------------------
> >> So I should try to "change the wrapper to allow
> 'rpsblast'", right?
> >> Could You tell me how to do that, please? So sorry but I
> have no idea
> >> yet...:) If that doesn't work, is there any other way to run
> >> cdd-searches with perl?
> >> Thank you so much!
> >> Regards, Jonas
> >>
> >> ----- Original Message -----
> >> From: "Chris Fields" <cjfields1@...>
> >> To: "Cook, Malcolm" <MEC@...>
> >> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'"
> >> <bioperl-l@...>; "'Smithies, Russell'"
> >> <Russell.Smithies@...>; <cjfields@...>
> >> Sent: Thursday, July 09, 2009 9:19 PM
> >> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>
> >>
> >>> I've scheduled this tentatively for the 1.6 release
> series (just not
> >>> sure when yet).  It may work as is, but I haven't tried
> it out yet
> >>> (and am hazarding to guess it only retrieves the single
> main RID at
> >>> the moment).
> >>>
> >>> chris
> >>>
> >>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
> >>>
> >>>> Jonas,
> >>>>
> >>>> If you want to continue to use the bioperl remoteblast
> interface,
> >>>> probably what you should do is simply call it twice.
> >>>>
> >>>> Once, as you already know how to do, which will return
> without CDD
> >>>> results.
> >>>>
> >>>> Secondly, to get the CDD results, call remoteblast a second time.
> >>>> This time, using
> >>>> -database => 'CDD'
> >>>> -program => 'rpsblast'
> >>>>
> >>>> However, the wrapper may object to the 'rpsblast'
> program.  It is
> >>>> not listed in the POD -
> >>>>
> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R
> >> emoteBlast.pm)
> >>>>   If so, my guess is that changing the perl wrapper to allow
> >>>> rpsblast will "just work" (tm).  I've cc:ed
> >> cjfields@... for
> >>>> his opinion on this.
> >>>>
> >>>> Also, you might want to perform the CDD search first,
> especially if
> >>>> you are streaming results to eyeball that might like
> something to
> >>>> look at while the second (presumably longer) search is running.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Malcolm Cook
> >>>> Stowers Institute for Medical Research - Kansas City, Missouri
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: bioperl-l-bounces@...
> >>>>> [mailto:bioperl-l-bounces@...] On Behalf
> Of Jonas
> >>>>> Schaer
> >>>>> Sent: Thursday, July 09, 2009 5:16 AM
> >>>>> To: BioPerl List; Smithies, Russell
> >>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>>>>
> >>>>> Hi guys,
> >>>>> Thank you all so much for your help and patience :). Of
> course you
> >>>>> were right and I finaly found the right put-parameter to get
> >>>>> exactly the same hits as on the homepage.
> >>>>> I do have an other question though :)...
> >>>>> I now want to include a search for conserved domains,
> but when I
> >>>>> try to use the CDD_SEARCH-parameter
> >>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
> >>>>> sub:CDD_SEARCH)
> >>>>> like the other put-parameters the way chris once told me(works
> >>>>> fine with the other params):
> >>>>>
> >>>>> my %put = (
> >>>>>   WORD_SIZE => 3,
> >>>>>   HITLIST_SIZE => 100,
> >>>>>   THRESHOLD => 11,
> >>>>>   FILTER => 'R',
> >>>>>   GENETIC_CODE => 1,
> >>>>>   CDD_SEARCH => 'on'
> >>>>> ###I tried it
> >>>>> with 'true' and '1', too.
> >>>>>
> >>>>> );
> >>>>>
> >>>>> for my $putName (keys %put) {
> >>>>>   $factory->submit_parameter($putName,$put{$putName});
> >>>>> }
> >>>>>
> >>>>>
> >>>>> ...an exception is thrown:
> >>>>>
> >>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>>>> MSG: CDD_SEARCH is not a valid PUT parameter.
> >>>>> STACK: Error::throw
> >>>>> STACK: Bio::Root::Root::throw
> >> C:/Perl/site/lib/Bio/Root/Root.pm:359
> >>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> >>>>> C:/Perl/site/lib/Bio/Tools
> >>>>> /Run/RemoteBlast.pm:325
> >>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
> >>>>> STACK: main::blast_it firsteval0.8.pm:288
> >>>>> STACK: firsteval0.8.pm:35
> >>>>> ----------------------------------------------------------- .
> >>>>> I guess somehow this could be the solution to my problem:
> >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
> >>>>> ub:RID-for-Simultaneous
> >>>>> , but unfortunately I don't understand what to do.
> >>>>> I'm so sorry to bother you with this but please help me once
> >>>>> more...:)
> >>>>>
> >>>>> Best regards and thanks in advance, Jonas
> >>>>>
> >>>>> ----- Original Message -----
> >>>>> From: "Smithies, Russell" <Russell.Smithies@...>
> >>>>> To: "'Jonas Schaer'" <Brotelzwieb@...>
> >>>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
> >>>>> <bioperl-l@...>
> >>>>> Sent: Monday, July 06, 2009 10:56 PM
> >>>>> Subject: RE: [Bioperl-l] different results with
> >> remote-blast skript
> >>>>>
> >>>>>
> >>>>> Hi Jonas,
> >>>>> You can't just play with the BLAST parameters and hope
> >> for a "better"
> >>>>> result.
> >>>>> I'd suggest that if you aren't sure what they do, you
> should leave
> >>>>> them alone as small changes can make huge differences in the
> >>>>> output - it's quite possible to miss finding what
> you're looking
> >>>>> for by using
> >> the wrong
> >>>>> parameters.
> >>>>> If all else fails, read the blast manual:
> >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
> >>>>> _all.html
> >>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
> >>>>> Or Read Ian Korfs' excellent book:
> >>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
> >>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
> >>>>>
> >>>>> Don't worry about the integer overflow bug as there's
> nothing you
> >>>>> can do about it. If you're interested, Google and Wikipedia are
> >>>>> your
> >>>>> friends:
> >>>>> http://en.wikipedia.org/wiki/Integer_overflow
> >>>>>
> >>>>>
> >>>>> Russell
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
> >>>>>> bounces@...] On Behalf Of Jonas Schaer
> >>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
> >>>>>> To: BioPerl List; Chris Fields
> >>>>>> Subject: Re: [Bioperl-l] different results with
> >> remote-blast skript
> >>>>>>
> >>>>>> Hi guys, thanks for your answers so far.
> >>>>>> @jason: integer overflow in blast.... sorry, but what do
> >>>>> you mean by that?
> >>>>>> how can I fix it...?
> >>>>>>
> >>>>>> Since I never really changed any parameters I thought them
> >>>>> all to be
> >>>>>> default.
> >>>>>> whatever, I tried to get "better" results with my prog
> >> by changing
> >>>>>> these:
> >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>>>>
> >>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
> >>>>> STICS'} =
> >>>>>> '1';
> >>>>>> with no effect...I guess these were default values anyway.
> >>>>>>
> >>>>>> So please maybe you can tell me all the other parameters I
> >>>>> can change with
> >>>>>> my
> >>>>>> perl-skript AND how to do that?
> >>>>>> Unfortunately both, perl and the blast-algorithm are pretty
> >>>>> much new to
> >>>>>> me,
> >>>>>> maybe thats why I just cannot find out how to do that on my
> >>>>> own... :/
> >>>>>>
> >>>>>> Here is the output I get with my remote-blast skript:
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>> ###################################
> >>>>>> Query Name:
> >>>>>>
> >>
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
> >>>>>> L
> >>>>>>       hit name is ref|XP_001702807.1|
> >>>>>>               score is 442
> >>>>>> BLASTP 2.2.21+
> >>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
> >>>>> A. Schaffer,
> >>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>> Lipman (1997),
> >>>>>> "Gapped
> >>>>>> BLAST and PSI-BLAST: a new generation of protein
> database search
> >>>>>> programs", Nucleic Acids Res. 25:3389-3402.
> >>>>>>
> >>>>>>
> >>>>>> Reference for composition-based statistics: Alejandro A.
> >>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
> >>>>> John L. Spouge,
> >>>>>> Yuri
> >>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
> >>>>> "Improving the
> >>>>>> accuracy of PSI-BLAST protein database searches with
> >>>>> composition-based
> >>>>>> statistics and other refinements", Nucleic Acids Res.
> >> 29:2994-3005.
> >>>>>>
> >>>>>>
> >>>>>> RID: 53STX5G2013
> >>>>>>
> >>>>>>
> >>>>>> Database: All non-redundant GenBank CDS
> >>>>>> translations+PDB+SwissProt+PIR+PRF excluding
> >> environmental samples
> >>>>>> from WGS projects
> >>>>>>          9,252,587 sequences; 3,169,972,781 total
> letters Query=
> >>>>>>
> >>>>>
> >>
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> >>>>>>
> >>>>>
> >>
> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA
> >> M
> >>>>>> ATGPDPDDEYE
> >>>>>> Length=150
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>      Score
> >>>>>> E
> >>>>>> Sequences producing significant alignments:
> >>>>>     (Bits)
> >>>>>> Value
> >>>>>>
> >>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> >>>>> reinhard...   174
> >>>>>> 2e-42
> >>>>>>
> >>>>>>
> >>>>>> ALIGNMENTS
> >>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas
> >> reinhardtii]
> >>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
> >>>>>> Length=303
> >>>>>>
> >>>>>> Score =  174 bits (442),  Expect = 2e-42, Method:
> >>>>> Composition-based
> >>>>>> stats.
> >>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
> >>>>> Gaps = 0/150
> >>>>>> (0%)
> >>>>>>
> >>>>>> Query  1
> >>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
> >>>>>> 60
> >>>>>>
> >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> >>>>>> Sbjct  154
> >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> >>>>>> 213
> >>>>>>
> >>>>>> Query  61
> >>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>>>> 120
> >>>>>>
> >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>>>> Sbjct  214
> >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>>>> 273
> >>>>>>
> >>>>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
> >>>>>>           AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct  274
> >>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Database: All non-redundant GenBank CDS
> >>>>>> translations+PDB+SwissProt+PIR+PRF
> >>>>>> excluding environmental samples from WGS projects
> >>>>>>   Posted date:  Jul 5, 2009  4:41 AM Number of letters in
> >>>>>> database: -1,124,994,511 Number of sequences in database:
> >>>>>> 9,252,587
> >>>>>>
> >>>>>> Lambda     K      H
> >>>>>>  0.309    0.122    0.345
> >>>>>> Gapped
> >>>>>> Lambda     K      H
> >>>>>>  0.267   0.0410    0.140
> >>>>>> Matrix: BLOSUM62
> >>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
> Sequences:
> >>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions:
> >>>>>> 1448367 Number of successful extensions: 2103 Number
> of sequences
> >>>>>> better than 10: 0 Number of HSP's better than 10
> without gapping:
> >>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully
> >>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781
> >>>>>> Length adjustment: 113 Effective length of query: 37 Effective
> >>>>>> length of database: 2124430450 Effective search space:
> >>>>>> 78603926650 Effective search space used: 78603926650
> >>>>>> T: 11
> >>>>>> A: 40
> >>>>>> X1: 16 (7.1 bits)
> >>>>>> X2: 38 (14.6 bits)
> >>>>>> X3: 64 (24.7 bits)
> >>>>>> S1: 42 (20.8 bits)
> >>>>>> S2: 74 (33.1 bits)
> >>>>>>
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>> ###################################
> >>>>>> and here are the hits (?) of the blast-algorithm on the
> >>>>> ncbi-homepage with
> >>>>>> the same query of course:
> >>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> >>>>> reinhard...   300
> >>>>>> 3e-80
> >>>>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
> >>>>> [Acyrtho...  36.2
> >>>>>> 1.1
> >>>>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
> >>>>> [Blautia...  35.4
> >>>>>> 1.8
> >>>>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
> >>>>> brazil...  34.3
> >>>>>> 4.2
> >>>>>> ref|XP_680841.1|  hypothetical protein AN7572.2
> >>>>> [Aspergillus n...  33.5
> >>>>>> 6.0
> >>>>>> ref|YP_001768110.1|  hypothetical protein M446_1150
> >>>>> [Methyloba...  33.5
> >>>>>> 7.0
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>> ###################################at
> >>>>>> least the first hit is the same, but even there there is a
> >>>>> different score
> >>>>>> and e-value.
> >>>>>>
> >>>>>> thanks so much for any help :)
> >>>>>> regards, jonas
> >>>>>>
> >>>>>>
> >>>>>> ----- Original Message -----
> >>>>>> From: "Chris Fields" <cjfields@...>
> >>>>>> To: "Jason Stajich" <jason@...>
> >>>>>> Cc: "Smithies, Russell"
> >>>>> <Russell.Smithies@...>; "'BioPerl
> >>>>>> List'" <bioperl-l@...>; "'Jonas Schaer'"
> >>>>>> <Jonas_Schaer@...>
> >>>>>> Sent: Monday, July 06, 2009 12:51 AM
> >>>>>> Subject: Re: [Bioperl-l] different results with
> >> remote-blast skript
> >>>>>>
> >>>>>>
> >>>>>>> That inspires confidence ;>
> >>>>>>>
> >>>>>>> chris
> >>>>>>>
> >>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> >>>>>>>
> >>>>>>>> integer overflow in blast....
> >>>>>>>>
> >>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> >>>>>>>>
> >>>>>>>>> I'd guess it's a difference in the parameters used.
> >>>>>>>>> Interesting that both have the number of letters in
> the db as
> >>>>>>>>> "-1,125,070,205", I assume that's a bug  :-)
> >>>>>>>>>
> >>>>>>>>> Stats from your remote_blast:
> >>>>>>>>>
> >>>>>>>>> 'stats' => {
> >>>>>>>>>          'S1' => '42',
> >>>>>>>>>          'S1_bits' => '20.8',
> >>>>>>>>>          'lambda' => '0.309',
> >>>>>>>>>          'entropy' => '0.345',
> >>>>>>>>>          'kappa_gapped' => '0.0410',
> >>>>>>>>>          'T' => '11',
> >>>>>>>>>          'kappa' => '0.122',
> >>>>>>>>>          'X3_bits' => '24.7',
> >>>>>>>>>          'X1' => '16',
> >>>>>>>>>          'lambda_gapped' => '0.267',
> >>>>>>>>>          'X2' => '38',
> >>>>>>>>>          'S2' => '74',
> >>>>>>>>>          'seqs_better_than_cutoff' => '0',
> >>>>>>>>>          'posted_date' => 'Jul 4, 2009  4:41 AM',
> >>>>>>>>>          'Hits_to_DB' => '60102303',
> >>>>>>>>>          'dbletters' => '-1125070205',
> >>>>>>>>>          'A' => '40',
> >>>>>>>>>          'num_successful_extensions' => '2004',
> >>>>>>>>>          'num_extensions' => '1436892',
> >>>>>>>>>          'X1_bits' => '7.1',
> >>>>>>>>>          'X3' => '64',
> >>>>>>>>>          'entropy_gapped' => '0.140',
> >>>>>>>>>          'dbentries' => '9252258',
> >>>>>>>>>          'X2_bits' => '14.6',
> >>>>>>>>>          'S2_bits' => '33.1'
> >>>>>>>>>        }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Stats from a blast done on the NCBI webpage:
> >>>>>>>>>
> >>>>>>>>> Database: All non-redundant GenBank CDS
> >>>>> translations+PDB+SwissProt
> >>>>>>>>> +PIR+PRF
> >>>>>>>>> excluding environmental samples from WGS projects
> Posted date:
> >>>>>>>>> Jul 4, 2009  4:41 AM Number of letters in database:
> >>>>>>>>> -1,125,070,205 Number of sequences in database:  9,252,258
> >>>>>>>>>
> >>>>>>>>> Lambda     K      H
> >>>>>>>>> 0.309    0.124    0.340
> >>>>>>>>> Gapped
> >>>>>>>>> Lambda     K      H
> >>>>>>>>> 0.267   0.0410    0.140
> >>>>>>>>> Matrix: BLOSUM62
> >>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
> >>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of
> >>>>>>>>> extensions: 3101413 Number of successful extensions: 9001
> >>>>>>>>> Number of sequences better than 100: 65 Number of
> HSP's better
> >>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000
> >>>>>>>>> Number of HSP's successfully gapped: 66 Length of
> query: 150
> >>>>>>>>> Length of database: 3169897087 Length adjustment: 113
> >>>>>>>>> Effective length of query: 37 Effective length of database:
> >>>>>>>>> 2124391933 Effective search space: 78602501521 Effective
> >>>>>>>>> search space used: 78602501521
> >>>>>>>>> T: 11
> >>>>>>>>> A: 40
> >>>>>>>>> X1: 16 (7.1 bits)
> >>>>>>>>> X2: 38 (14.6 bits)
> >>>>>>>>> X3: 64 (24.7 bits)
> >>>>>>>>> S1: 42 (20.8 bits)
> >>>>>>>>> S2: 65 (29.6 bits)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: bioperl-l-bounces@...
> [mailto:bioperl-l-
> >>>>>>>>>> bounces@...] On Behalf Of Jonas Schaer
> >>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> >>>>>>>>>> To: BioPerl List
> >>>>>>>>>> Subject: [Bioperl-l] different results with
> >> remote-blast skript
> >>>>>>>>>>
> >>>>>>>>>> Hi again :)
> >>>>>>>>>> please, I only have this little question:
> >>>>>>>>>> why do I get different results with my remote::blast
> >>>>> perl skript
> >>>>>>>>>> then on the
> >>>>>>>>>> ncbi blast homepage?
> >>>>>>>>>> I am using blastp, the query is an amino-sequence
> (different
> >>>>>>>>>> results with any sequence, differences not only in
> number of
> >>>>>>>>>> hits but
> >> even in e-
> >>>>>>>>>> values, scores
> >>>>>>>>>> etc...), the database is 'nr'.
> >>>>>>>>>> PLEASE help me,
> >>>>>>>>>> thank you in advance,
> >>>>>>>>>> Jonas
> >>>>>>>>>>
> >>>>>>>>>> ps: my skript:
> >>>>>>>>>>
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>>>>>> ##
> >>>>>>>>>> use Bio::Seq::SeqFactory;
> >>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my
> >>>>>>>>>> @blast_report; my $prog = 'blastp';
> >>>>>>>>>> my $db   = 'nr';
> >>>>>>>>>> my $e_val= '1e-10';
> >>>>>>>>>> #my $e_val= '10';
> >>>>>>>>>> my @params = ( '-prog' => $prog,
> >>>>>>>>>>      '-data' => $db,
> >>>>>>>>>>      '-expect' => $e_val,
> >>>>>>>>>>      '-readmethod' => 'SearchIO' ); my $factory =
> >>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}
> = '11 1';
> >>>>>>>>>>
> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} =
> '10'; $ Bio
> >>>>>>>>>>
> >>>>>
> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
> >>>>>>>>>> = '1';
> >>>>>>>>>>
> >>>>>>>>>> my
> >>>>>>>>>> $
> >>>>>>>>>> blast_seq
> >>>>>>>>>>
> >>>>>
> >>
> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
> >>>>>>>>>>
> >>>>>>
> >>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
> >>>>> AFRQAHQNTAMATGPD
> >>>>>>>>>> PDDEYE';
> >>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my
> >>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
> >>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq
> >>>>>>>>>> =>$blast_seq,
> >> -display_id =>
> >>>>>>>>>> "$blast_seq");
> >>>>>>>>>> my $filename='temp2.out';
> >>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR
> >>>>>>>>>> "waiting..." if( $v > 0 ); while ( my @rids =
> >>>>>>>>>> $factory->each_rid ) {
> >>>>>>>>>>     foreach my $rid ( @rids )
> >>>>>>>>>>     {
> >>>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
> >>>>>>>>>>         if( !ref($rc) )
> >>>>>>>>>>         {
> >>>>>>>>>>             if( $rc < 0 )
> >>>>>>>>>>             {
> >>>>>>>>>>                 $factory->remove_rid($rid);
> >>>>>>>>>>             }
> >>>>>>>>>>             print STDERR "." if ( $v > 0 );
> >>>>>>>>>>         }
> >>>>>>>>>>             else
> >>>>>>>>>>             {
> >>>>>>>>>>                 my $result = $rc->next_result();
> >>>>>>>>>>                 $factory->save_output($filename);
> >>>>>>>>>>                 $factory->remove_rid($rid);
> >>>>>>>>>>                 print "\nQuery Name: ",
> >>>>> $result->query_name(),
> >>>>>>>>>> "\n";
> >>>>>>>>>>                 while ( my $hit = $result->next_hit )
> >>>>>>>>>>                 {
> >>>>>>>>>>                     next unless ( $v > 0);
> >>>>>>>>>>                     print "\thit name is ",
> >> $hit->name, "\n";
> >>>>>>>>>>                     while( my $hsp = $hit->next_hsp )
> >>>>>>>>>>                     {
> >>>>>>>>>>                         print "\t\tscore is ",
> >>>>> $hsp->score, "\n";
> >>>>>>>>>>                     }
> >>>>>>>>>>                 }
> >>>>>>>>>>             }
> >>>>>>>>>>     }
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> }
> >>>>>>>>>> @blast_report = get_file_data ($filename); return
> >>>>>>>>>> @blast_report;
> >>>>>>>>>>
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>>>>>> ####
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Bioperl-l mailing list
> >>>>>>>>>> Bioperl-l@...
> >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>> =
> >>>>>>>>> =
> >>>>>>>>>
> >>>>>
> >>
> =====================================================================
> >>>>>>>>> Attention: The information contained in this message and/or
> >>>>>>>>> attachments from AgResearch Limited is intended only for the
> >>>>> persons or entities
> >>>>>>>>> to which it is addressed and may contain
> confidential and/or
> >>>>>>>>> privileged material. Any review, retransmission,
> dissemination
> >> or other use
> >>>>>>>>> of, or
> >>>>>>>>> taking of any action in reliance upon, this information
> >>>>> by persons or
> >>>>>>>>> entities other than the intended recipients is
> prohibited by
> >>>>>>>>> AgResearch Limited. If you have received this message in
> >>>>>>>>> error,
> >>>>> please notify
> >>>>>>>>> the
> >>>>>>>>> sender immediately.
> >>>>>>>>> =
> >>>>>>>>> =
> >>>>>>>>>
> >>>>>
> >>
> =====================================================================
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Bioperl-l mailing list
> >>>>>>>>> Bioperl-l@...
> >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Jason Stajich
> >>>>>>>> jason@...
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Bioperl-l mailing list
> >>>>>>>> Bioperl-l@...
> >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> --------------------------------------------------------------
> >>>>> ----------------
> >>>>>> --
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> No virus found in this incoming message.
> >>>>>> Checked by AVG - www.avg.com
> >>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
> >>>>> Date: 07/05/09
> >>>>>> 05:53:00
> >>>>>
> >>>>>
> >>>>> --------------------------------------------------------------
> >>>>> ------------------
> >>>>>
> >>>>>
> >>>>>
> >>>>> No virus found in this incoming message.
> >>>>> Checked by AVG - www.avg.com
> >>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
> >>>>> Date: 07/05/09
> >>>>> 17:54:00
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l@...
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>
> >>
> >> --------------------------------------------------------------
> >> ------------------
> >>
> >>
> >>
> >> No virus found in this incoming message.
> >> Checked by AVG - www.avg.com
> >> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release
> >> Date: 07/09/09
> >> 05:55:00
> >>
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@...
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Chris Fields-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Malcolm, it's probably not you.  Looks like the get/put parameters are  
set as globals, so there may be cross-contamination of instances  
(worth checking JIC).

You can probably work around that to an extent by encompassing any  
calls in blocks to localize changes.

chris

On Jul 21, 2009, at 11:59 AM, Cook, Malcolm wrote:

> Chris,
>
> I wound up adding a new test
>
> #       $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $
>
> with the comment :
>
> # malcolm_cook@...: this test is in a separate file from
> # RemoteBlast.t (on which it is modelled) since there is some sort of
> # side-effecting between the multiple remote blasts that is causing
> # this test to fail, if it comes last, or the other test to fail, if
> # this one comes first.  THIS IS A BUG EITHER IN REMOTE BLAST OR MY
> # UNDERSTANDING, i.e. of how to initialize it.
>
> In any case, the test passes and demos rpsblast usage.
>
> Cheers,
>
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields1@...]
>> Sent: Friday, July 10, 2009 1:05 PM
>> To: Cook, Malcolm
>> Cc: 'Jonas Schaer'; 'BioPerl List'
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>> Malcolm,
>>
>> Nice!  Go ahead and add the test in; we can look at trying to
>> get CDD_SEARCH working at some point but this is a nice workaround.
>>
>> chris
>>
>> On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:
>>
>>> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
>>> the following.  Is it appropriate to submit it?
>>>
>>> Jonas, OK, I was a little quick on the gun... but I've got it now.
>>>
>>> You don't need to change the wrapper.  Here is what you need to do:
>>>
>>> # 1) set your database like this:
>>>
>>> -database => 'cdsearch/cdd', # c.f.
>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
>>> for other cdd database options
>>>
>>> # 2) add this line before submitting the job:
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
>>>
>>> You're in - No other changes needed.
>>>
>>> Malcolm Cook
>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jonas Schaer [mailto:Brotelzwieb@...]
>>>> Sent: Friday, July 10, 2009 4:18 AM
>>>> To: BioPerl List; Cook, Malcolm; Chris Fields
>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>
>>>> Hi,
>>>> I tried to do what Malcom proposed my ($prog = 'rpsblast';
>>>> my $db   =
>>>> 'CDD';)  but that didn't work.
>>>>
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: Value rpsblast for PUT parameter PROGRAM does not match
>>>> expression t?blast[ pnx]. Rejecting.
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>> C:/Perl/site/lib/Bio/Tools
>>>> /Run/RemoteBlast.pm:329
>>>> STACK: Bio::Tools::Run::RemoteBlast::new
>>>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
>>>> ast.pm:257
>>>> STACK: blast_a_seq2.pm:14
>>>> -----------------------------------------------------------
>>>> So I should try to "change the wrapper to allow
>> 'rpsblast'", right?
>>>> Could You tell me how to do that, please? So sorry but I
>> have no idea
>>>> yet...:) If that doesn't work, is there any other way to run
>>>> cdd-searches with perl?
>>>> Thank you so much!
>>>> Regards, Jonas
>>>>
>>>> ----- Original Message -----
>>>> From: "Chris Fields" <cjfields1@...>
>>>> To: "Cook, Malcolm" <MEC@...>
>>>> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'"
>>>> <bioperl-l@...>; "'Smithies, Russell'"
>>>> <Russell.Smithies@...>; <cjfields@...>
>>>> Sent: Thursday, July 09, 2009 9:19 PM
>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>
>>>>
>>>>> I've scheduled this tentatively for the 1.6 release
>> series (just not
>>>>> sure when yet).  It may work as is, but I haven't tried
>> it out yet
>>>>> (and am hazarding to guess it only retrieves the single
>> main RID at
>>>>> the moment).
>>>>>
>>>>> chris
>>>>>
>>>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
>>>>>
>>>>>> Jonas,
>>>>>>
>>>>>> If you want to continue to use the bioperl remoteblast
>> interface,
>>>>>> probably what you should do is simply call it twice.
>>>>>>
>>>>>> Once, as you already know how to do, which will return
>> without CDD
>>>>>> results.
>>>>>>
>>>>>> Secondly, to get the CDD results, call remoteblast a second time.
>>>>>> This time, using
>>>>>> -database => 'CDD'
>>>>>> -program => 'rpsblast'
>>>>>>
>>>>>> However, the wrapper may object to the 'rpsblast'
>> program.  It is
>>>>>> not listed in the POD -
>>>>>>
>>>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R
>>>> emoteBlast.pm)
>>>>>>  If so, my guess is that changing the perl wrapper to allow
>>>>>> rpsblast will "just work" (tm).  I've cc:ed
>>>> cjfields@... for
>>>>>> his opinion on this.
>>>>>>
>>>>>> Also, you might want to perform the CDD search first,
>> especially if
>>>>>> you are streaming results to eyeball that might like
>> something to
>>>>>> look at while the second (presumably longer) search is running.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Malcolm Cook
>>>>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces@...
>>>>>>> [mailto:bioperl-l-bounces@...] On Behalf
>> Of Jonas
>>>>>>> Schaer
>>>>>>> Sent: Thursday, July 09, 2009 5:16 AM
>>>>>>> To: BioPerl List; Smithies, Russell
>>>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>>>>
>>>>>>> Hi guys,
>>>>>>> Thank you all so much for your help and patience :). Of
>> course you
>>>>>>> were right and I finaly found the right put-parameter to get
>>>>>>> exactly the same hits as on the homepage.
>>>>>>> I do have an other question though :)...
>>>>>>> I now want to include a search for conserved domains,
>> but when I
>>>>>>> try to use the CDD_SEARCH-parameter
>>>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
>>>>>>> sub:CDD_SEARCH)
>>>>>>> like the other put-parameters the way chris once told me(works
>>>>>>> fine with the other params):
>>>>>>>
>>>>>>> my %put = (
>>>>>>>  WORD_SIZE => 3,
>>>>>>>  HITLIST_SIZE => 100,
>>>>>>>  THRESHOLD => 11,
>>>>>>>  FILTER => 'R',
>>>>>>>  GENETIC_CODE => 1,
>>>>>>>  CDD_SEARCH => 'on'
>>>>>>> ###I tried it
>>>>>>> with 'true' and '1', too.
>>>>>>>
>>>>>>> );
>>>>>>>
>>>>>>> for my $putName (keys %put) {
>>>>>>>  $factory->submit_parameter($putName,$put{$putName});
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ...an exception is thrown:
>>>>>>>
>>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>>> MSG: CDD_SEARCH is not a valid PUT parameter.
>>>>>>> STACK: Error::throw
>>>>>>> STACK: Bio::Root::Root::throw
>>>> C:/Perl/site/lib/Bio/Root/Root.pm:359
>>>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>>>>> C:/Perl/site/lib/Bio/Tools
>>>>>>> /Run/RemoteBlast.pm:325
>>>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
>>>>>>> STACK: main::blast_it firsteval0.8.pm:288
>>>>>>> STACK: firsteval0.8.pm:35
>>>>>>> ----------------------------------------------------------- .
>>>>>>> I guess somehow this could be the solution to my problem:
>>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
>>>>>>> ub:RID-for-Simultaneous
>>>>>>> , but unfortunately I don't understand what to do.
>>>>>>> I'm so sorry to bother you with this but please help me once
>>>>>>> more...:)
>>>>>>>
>>>>>>> Best regards and thanks in advance, Jonas
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "Smithies, Russell" <Russell.Smithies@...>
>>>>>>> To: "'Jonas Schaer'" <Brotelzwieb@...>
>>>>>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
>>>>>>> <bioperl-l@...>
>>>>>>> Sent: Monday, July 06, 2009 10:56 PM
>>>>>>> Subject: RE: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>
>>>>>>>
>>>>>>> Hi Jonas,
>>>>>>> You can't just play with the BLAST parameters and hope
>>>> for a "better"
>>>>>>> result.
>>>>>>> I'd suggest that if you aren't sure what they do, you
>> should leave
>>>>>>> them alone as small changes can make huge differences in the
>>>>>>> output - it's quite possible to miss finding what
>> you're looking
>>>>>>> for by using
>>>> the wrong
>>>>>>> parameters.
>>>>>>> If all else fails, read the blast manual:
>>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
>>>>>>> _all.html
>>>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
>>>>>>> Or Read Ian Korfs' excellent book:
>>>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
>>>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
>>>>>>>
>>>>>>> Don't worry about the integer overflow bug as there's
>> nothing you
>>>>>>> can do about it. If you're interested, Google and Wikipedia are
>>>>>>> your
>>>>>>> friends:
>>>>>>> http://en.wikipedia.org/wiki/Integer_overflow
>>>>>>>
>>>>>>>
>>>>>>> Russell
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>>>>>>> bounces@...] On Behalf Of Jonas Schaer
>>>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
>>>>>>>> To: BioPerl List; Chris Fields
>>>>>>>> Subject: Re: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>>
>>>>>>>> Hi guys, thanks for your answers so far.
>>>>>>>> @jason: integer overflow in blast.... sorry, but what do
>>>>>>> you mean by that?
>>>>>>>> how can I fix it...?
>>>>>>>>
>>>>>>>> Since I never really changed any parameters I thought them
>>>>>>> all to be
>>>>>>>> default.
>>>>>>>> whatever, I tried to get "better" results with my prog
>>>> by changing
>>>>>>>> these:
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>>
>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
>>>>>>> STICS'} =
>>>>>>>> '1';
>>>>>>>> with no effect...I guess these were default values anyway.
>>>>>>>>
>>>>>>>> So please maybe you can tell me all the other parameters I
>>>>>>> can change with
>>>>>>>> my
>>>>>>>> perl-skript AND how to do that?
>>>>>>>> Unfortunately both, perl and the blast-algorithm are pretty
>>>>>>> much new to
>>>>>>>> me,
>>>>>>>> maybe thats why I just cannot find out how to do that on my
>>>>>>> own... :/
>>>>>>>>
>>>>>>>> Here is the output I get with my remote-blast skript:
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>> ###################################
>>>>>>>> Query Name:
>>>>>>>>
>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
>>>>>>>> L
>>>>>>>>      hit name is ref|XP_001702807.1|
>>>>>>>>              score is 442
>>>>>>>> BLASTP 2.2.21+
>>>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
>>>>>>> A. Schaffer,
>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>>>> Lipman (1997),
>>>>>>>> "Gapped
>>>>>>>> BLAST and PSI-BLAST: a new generation of protein
>> database search
>>>>>>>> programs", Nucleic Acids Res. 25:3389-3402.
>>>>>>>>
>>>>>>>>
>>>>>>>> Reference for composition-based statistics: Alejandro A.
>>>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
>>>>>>> John L. Spouge,
>>>>>>>> Yuri
>>>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
>>>>>>> "Improving the
>>>>>>>> accuracy of PSI-BLAST protein database searches with
>>>>>>> composition-based
>>>>>>>> statistics and other refinements", Nucleic Acids Res.
>>>> 29:2994-3005.
>>>>>>>>
>>>>>>>>
>>>>>>>> RID: 53STX5G2013
>>>>>>>>
>>>>>>>>
>>>>>>>> Database: All non-redundant GenBank CDS
>>>>>>>> translations+PDB+SwissProt+PIR+PRF excluding
>>>> environmental samples
>>>>>>>> from WGS projects
>>>>>>>>         9,252,587 sequences; 3,169,972,781 total
>> letters Query=
>>>>>>>>
>>>>>>>
>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
>>>>>>>>
>>>>>>>
>>>>
>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA
>>>> M
>>>>>>>> ATGPDPDDEYE
>>>>>>>> Length=150
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>     Score
>>>>>>>> E
>>>>>>>> Sequences producing significant alignments:
>>>>>>>    (Bits)
>>>>>>>> Value
>>>>>>>>
>>>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>>>> reinhard...   174
>>>>>>>> 2e-42
>>>>>>>>
>>>>>>>>
>>>>>>>> ALIGNMENTS
>>>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas
>>>> reinhardtii]
>>>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
>>>>>>>> Length=303
>>>>>>>>
>>>>>>>> Score =  174 bits (442),  Expect = 2e-42, Method:
>>>>>>> Composition-based
>>>>>>>> stats.
>>>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
>>>>>>> Gaps = 0/150
>>>>>>>> (0%)
>>>>>>>>
>>>>>>>> Query  1
>>>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
>>>>>>>> 60
>>>>>>>>
>>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>>>> Sbjct  154
>>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>>>> 213
>>>>>>>>
>>>>>>>> Query  61
>>>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>>>> 120
>>>>>>>>
>>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>>>> Sbjct  214
>>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>>>> 273
>>>>>>>>
>>>>>>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>>>>>>>>          AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct  274
>>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Database: All non-redundant GenBank CDS
>>>>>>>> translations+PDB+SwissProt+PIR+PRF
>>>>>>>> excluding environmental samples from WGS projects
>>>>>>>>  Posted date:  Jul 5, 2009  4:41 AM Number of letters in
>>>>>>>> database: -1,124,994,511 Number of sequences in database:
>>>>>>>> 9,252,587
>>>>>>>>
>>>>>>>> Lambda     K      H
>>>>>>>> 0.309    0.122    0.345
>>>>>>>> Gapped
>>>>>>>> Lambda     K      H
>>>>>>>> 0.267   0.0410    0.140
>>>>>>>> Matrix: BLOSUM62
>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
>> Sequences:
>>>>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions:
>>>>>>>> 1448367 Number of successful extensions: 2103 Number
>> of sequences
>>>>>>>> better than 10: 0 Number of HSP's better than 10
>> without gapping:
>>>>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully
>>>>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781
>>>>>>>> Length adjustment: 113 Effective length of query: 37 Effective
>>>>>>>> length of database: 2124430450 Effective search space:
>>>>>>>> 78603926650 Effective search space used: 78603926650
>>>>>>>> T: 11
>>>>>>>> A: 40
>>>>>>>> X1: 16 (7.1 bits)
>>>>>>>> X2: 38 (14.6 bits)
>>>>>>>> X3: 64 (24.7 bits)
>>>>>>>> S1: 42 (20.8 bits)
>>>>>>>> S2: 74 (33.1 bits)
>>>>>>>>
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>> ###################################
>>>>>>>> and here are the hits (?) of the blast-algorithm on the
>>>>>>> ncbi-homepage with
>>>>>>>> the same query of course:
>>>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>>>> reinhard...   300
>>>>>>>> 3e-80
>>>>>>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
>>>>>>> [Acyrtho...  36.2
>>>>>>>> 1.1
>>>>>>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
>>>>>>> [Blautia...  35.4
>>>>>>>> 1.8
>>>>>>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
>>>>>>> brazil...  34.3
>>>>>>>> 4.2
>>>>>>>> ref|XP_680841.1|  hypothetical protein AN7572.2
>>>>>>> [Aspergillus n...  33.5
>>>>>>>> 6.0
>>>>>>>> ref|YP_001768110.1|  hypothetical protein M446_1150
>>>>>>> [Methyloba...  33.5
>>>>>>>> 7.0
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>> ###################################at
>>>>>>>> least the first hit is the same, but even there there is a
>>>>>>> different score
>>>>>>>> and e-value.
>>>>>>>>
>>>>>>>> thanks so much for any help :)
>>>>>>>> regards, jonas
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Chris Fields" <cjfields@...>
>>>>>>>> To: "Jason Stajich" <jason@...>
>>>>>>>> Cc: "Smithies, Russell"
>>>>>>> <Russell.Smithies@...>; "'BioPerl
>>>>>>>> List'" <bioperl-l@...>; "'Jonas Schaer'"
>>>>>>>> <Jonas_Schaer@...>
>>>>>>>> Sent: Monday, July 06, 2009 12:51 AM
>>>>>>>> Subject: Re: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>>
>>>>>>>>
>>>>>>>>> That inspires confidence ;>
>>>>>>>>>
>>>>>>>>> chris
>>>>>>>>>
>>>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>>>>>>>>>
>>>>>>>>>> integer overflow in blast....
>>>>>>>>>>
>>>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>>>>>>>>>
>>>>>>>>>>> I'd guess it's a difference in the parameters used.
>>>>>>>>>>> Interesting that both have the number of letters in
>> the db as
>>>>>>>>>>> "-1,125,070,205", I assume that's a bug  :-)
>>>>>>>>>>>
>>>>>>>>>>> Stats from your remote_blast:
>>>>>>>>>>>
>>>>>>>>>>> 'stats' => {
>>>>>>>>>>>         'S1' => '42',
>>>>>>>>>>>         'S1_bits' => '20.8',
>>>>>>>>>>>         'lambda' => '0.309',
>>>>>>>>>>>         'entropy' => '0.345',
>>>>>>>>>>>         'kappa_gapped' => '0.0410',
>>>>>>>>>>>         'T' => '11',
>>>>>>>>>>>         'kappa' => '0.122',
>>>>>>>>>>>         'X3_bits' => '24.7',
>>>>>>>>>>>         'X1' => '16',
>>>>>>>>>>>         'lambda_gapped' => '0.267',
>>>>>>>>>>>         'X2' => '38',
>>>>>>>>>>>         'S2' => '74',
>>>>>>>>>>>         'seqs_better_than_cutoff' => '0',
>>>>>>>>>>>         'posted_date' => 'Jul 4, 2009  4:41 AM',
>>>>>>>>>>>         'Hits_to_DB' => '60102303',
>>>>>>>>>>>         'dbletters' => '-1125070205',
>>>>>>>>>>>         'A' => '40',
>>>>>>>>>>>         'num_successful_extensions' => '2004',
>>>>>>>>>>>         'num_extensions' => '1436892',
>>>>>>>>>>>         'X1_bits' => '7.1',
>>>>>>>>>>>         'X3' => '64',
>>>>>>>>>>>         'entropy_gapped' => '0.140',
>>>>>>>>>>>         'dbentries' => '9252258',
>>>>>>>>>>>         'X2_bits' => '14.6',
>>>>>>>>>>>         'S2_bits' => '33.1'
>>>>>>>>>>>       }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Stats from a blast done on the NCBI webpage:
>>>>>>>>>>>
>>>>>>>>>>> Database: All non-redundant GenBank CDS
>>>>>>> translations+PDB+SwissProt
>>>>>>>>>>> +PIR+PRF
>>>>>>>>>>> excluding environmental samples from WGS projects
>> Posted date:
>>>>>>>>>>> Jul 4, 2009  4:41 AM Number of letters in database:
>>>>>>>>>>> -1,125,070,205 Number of sequences in database:  9,252,258
>>>>>>>>>>>
>>>>>>>>>>> Lambda     K      H
>>>>>>>>>>> 0.309    0.124    0.340
>>>>>>>>>>> Gapped
>>>>>>>>>>> Lambda     K      H
>>>>>>>>>>> 0.267   0.0410    0.140
>>>>>>>>>>> Matrix: BLOSUM62
>>>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
>>>>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of
>>>>>>>>>>> extensions: 3101413 Number of successful extensions: 9001
>>>>>>>>>>> Number of sequences better than 100: 65 Number of
>> HSP's better
>>>>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000
>>>>>>>>>>> Number of HSP's successfully gapped: 66 Length of
>> query: 150
>>>>>>>>>>> Length of database: 3169897087 Length adjustment: 113
>>>>>>>>>>> Effective length of query: 37 Effective length of database:
>>>>>>>>>>> 2124391933 Effective search space: 78602501521 Effective
>>>>>>>>>>> search space used: 78602501521
>>>>>>>>>>> T: 11
>>>>>>>>>>> A: 40
>>>>>>>>>>> X1: 16 (7.1 bits)
>>>>>>>>>>> X2: 38 (14.6 bits)
>>>>>>>>>>> X3: 64 (24.7 bits)
>>>>>>>>>>> S1: 42 (20.8 bits)
>>>>>>>>>>> S2: 65 (29.6 bits)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: bioperl-l-bounces@...
>> [mailto:bioperl-l-
>>>>>>>>>>>> bounces@...] On Behalf Of Jonas Schaer
>>>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>>>>>>>>>> To: BioPerl List
>>>>>>>>>>>> Subject: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>>>>>>
>>>>>>>>>>>> Hi again :)
>>>>>>>>>>>> please, I only have this little question:
>>>>>>>>>>>> why do I get different results with my remote::blast
>>>>>>> perl skript
>>>>>>>>>>>> then on the
>>>>>>>>>>>> ncbi blast homepage?
>>>>>>>>>>>> I am using blastp, the query is an amino-sequence
>> (different
>>>>>>>>>>>> results with any sequence, differences not only in
>> number of
>>>>>>>>>>>> hits but
>>>> even in e-
>>>>>>>>>>>> values, scores
>>>>>>>>>>>> etc...), the database is 'nr'.
>>>>>>>>>>>> PLEASE help me,
>>>>>>>>>>>> thank you in advance,
>>>>>>>>>>>> Jonas
>>>>>>>>>>>>
>>>>>>>>>>>> ps: my skript:
>>>>>>>>>>>>
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>>>>>> ##
>>>>>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my
>>>>>>>>>>>> @blast_report; my $prog = 'blastp';
>>>>>>>>>>>> my $db   = 'nr';
>>>>>>>>>>>> my $e_val= '1e-10';
>>>>>>>>>>>> #my $e_val= '10';
>>>>>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>>>>>     '-data' => $db,
>>>>>>>>>>>>     '-expect' => $e_val,
>>>>>>>>>>>>     '-readmethod' => 'SearchIO' ); my $factory =
>>>>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}
>> = '11 1';
>>>>>>>>>>>>
>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} =
>> '10'; $ Bio
>>>>>>>>>>>>
>>>>>>>
>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>>>>>>>>>>> = '1';
>>>>>>>>>>>>
>>>>>>>>>>>> my
>>>>>>>>>>>> $
>>>>>>>>>>>> blast_seq
>>>>>>>>>>>>
>>>>>>>
>>>>
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>>>>>>>>>>>
>>>>>>>>
>>>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
>>>>>>> AFRQAHQNTAMATGPD
>>>>>>>>>>>> PDDEYE';
>>>>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my
>>>>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>>>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq
>>>>>>>>>>>> =>$blast_seq,
>>>> -display_id =>
>>>>>>>>>>>> "$blast_seq");
>>>>>>>>>>>> my $filename='temp2.out';
>>>>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR
>>>>>>>>>>>> "waiting..." if( $v > 0 ); while ( my @rids =
>>>>>>>>>>>> $factory->each_rid ) {
>>>>>>>>>>>>    foreach my $rid ( @rids )
>>>>>>>>>>>>    {
>>>>>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>>>        if( !ref($rc) )
>>>>>>>>>>>>        {
>>>>>>>>>>>>            if( $rc < 0 )
>>>>>>>>>>>>            {
>>>>>>>>>>>>                $factory->remove_rid($rid);
>>>>>>>>>>>>            }
>>>>>>>>>>>>            print STDERR "." if ( $v > 0 );
>>>>>>>>>>>>        }
>>>>>>>>>>>>            else
>>>>>>>>>>>>            {
>>>>>>>>>>>>                my $result = $rc->next_result();
>>>>>>>>>>>>                $factory->save_output($filename);
>>>>>>>>>>>>                $factory->remove_rid($rid);
>>>>>>>>>>>>                print "\nQuery Name: ",
>>>>>>> $result->query_name(),
>>>>>>>>>>>> "\n";
>>>>>>>>>>>>                while ( my $hit = $result->next_hit )
>>>>>>>>>>>>                {
>>>>>>>>>>>>                    next unless ( $v > 0);
>>>>>>>>>>>>                    print "\thit name is ",
>>>> $hit->name, "\n";
>>>>>>>>>>>>                    while( my $hsp = $hit->next_hsp )
>>>>>>>>>>>>                    {
>>>>>>>>>>>>                        print "\t\tscore is ",
>>>>>>> $hsp->score, "\n";
>>>>>>>>>>>>                    }
>>>>>>>>>>>>                }
>>>>>>>>>>>>            }
>>>>>>>>>>>>    }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> }
>>>>>>>>>>>> @blast_report = get_file_data ($filename); return
>>>>>>>>>>>> @blast_report;
>>>>>>>>>>>>
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>>>>>> ####
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>>> Bioperl-l@...
>>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>> =
>>>>>>>>>>> =
>>>>>>>>>>>
>>>>>>>
>>>>
>> =====================================================================
>>>>>>>>>>> Attention: The information contained in this message and/or
>>>>>>>>>>> attachments from AgResearch Limited is intended only for the
>>>>>>> persons or entities
>>>>>>>>>>> to which it is addressed and may contain
>> confidential and/or
>>>>>>>>>>> privileged material. Any review, retransmission,
>> dissemination
>>>> or other use
>>>>>>>>>>> of, or
>>>>>>>>>>> taking of any action in reliance upon, this information
>>>>>>> by persons or
>>>>>>>>>>> entities other than the intended recipients is
>> prohibited by
>>>>>>>>>>> AgResearch Limited. If you have received this message in
>>>>>>>>>>> error,
>>>>>>> please notify
>>>>>>>>>>> the
>>>>>>>>>>> sender immediately.
>>>>>>>>>>> =
>>>>>>>>>>> =
>>>>>>>>>>>
>>>>>>>
>>>>
>> =====================================================================
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>> Bioperl-l@...
>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jason Stajich
>>>>>>>>>> jason@...
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l@...
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> ----------------
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> No virus found in this incoming message.
>>>>>>>> Checked by AVG - www.avg.com
>>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
>>>>>>> Date: 07/05/09
>>>>>>>> 05:53:00
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> ------------------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> No virus found in this incoming message.
>>>>>>> Checked by AVG - www.avg.com
>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
>>>>>>> Date: 07/05/09
>>>>>>> 17:54:00
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l@...
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>
>>>>
>>>> --------------------------------------------------------------
>>>> ------------------
>>>>
>>>>
>>>>
>>>> No virus found in this incoming message.
>>>> Checked by AVG - www.avg.com
>>>> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release
>>>> Date: 07/09/09
>>>> 05:55:00
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@...
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: cdd-search with remoteblast?

by Cook, Malcolm-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Chris,

I figured it was something like that.  Isolating the test in a file worked for me to decouple whatever cross-contamination was going on.  So, I'll just leave it at that for now unless any objections.  I suppose the correct follow on would be to file a bug....  I'm poised to go on vacation.... perhaps when I get back....

I don't use RemoteBlast myself at all in fact.... just wrote the rps thing to make sure I knew how the ncbi service worked...

Cheers,

Malcolm
________________________________________
From: Chris Fields [cjfields1@...]
Sent: Wednesday, July 22, 2009 6:32 PM
To: Cook, Malcolm
Cc: 'BioPerl List'; 'Jonas Schaer'
Subject: Re: [Bioperl-l] cdd-search with remoteblast?

Malcolm, it's probably not you.  Looks like the get/put parameters are
set as globals, so there may be cross-contamination of instances
(worth checking JIC).

You can probably work around that to an extent by encompassing any
calls in blocks to localize changes.

chris

On Jul 21, 2009, at 11:59 AM, Cook, Malcolm wrote:

> Chris,
>
> I wound up adding a new test
>
> #       $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $
>
> with the comment :
>
> # malcolm_cook@...: this test is in a separate file from
> # RemoteBlast.t (on which it is modelled) since there is some sort of
> # side-effecting between the multiple remote blasts that is causing
> # this test to fail, if it comes last, or the other test to fail, if
> # this one comes first.  THIS IS A BUG EITHER IN REMOTE BLAST OR MY
> # UNDERSTANDING, i.e. of how to initialize it.
>
> In any case, the test passes and demos rpsblast usage.
>
> Cheers,
>
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields1@...]
>> Sent: Friday, July 10, 2009 1:05 PM
>> To: Cook, Malcolm
>> Cc: 'Jonas Schaer'; 'BioPerl List'
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>> Malcolm,
>>
>> Nice!  Go ahead and add the test in; we can look at trying to
>> get CDD_SEARCH working at some point but this is a nice workaround.
>>
>> chris
>>
>> On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:
>>
>>> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
>>> the following.  Is it appropriate to submit it?
>>>
>>> Jonas, OK, I was a little quick on the gun... but I've got it now.
>>>
>>> You don't need to change the wrapper.  Here is what you need to do:
>>>
>>> # 1) set your database like this:
>>>
>>> -database => 'cdsearch/cdd', # c.f.
>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
>>> for other cdd database options
>>>
>>> # 2) add this line before submitting the job:
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
>>>
>>> You're in - No other changes needed.
>>>
>>> Malcolm Cook
>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jonas Schaer [mailto:Brotelzwieb@...]
>>>> Sent: Friday, July 10, 2009 4:18 AM
>>>> To: BioPerl List; Cook, Malcolm; Chris Fields
>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>
>>>> Hi,
>>>> I tried to do what Malcom proposed my ($prog = 'rpsblast';
>>>> my $db   =
>>>> 'CDD';)  but that didn't work.
>>>>
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: Value rpsblast for PUT parameter PROGRAM does not match
>>>> expression t?blast[ pnx]. Rejecting.
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>> C:/Perl/site/lib/Bio/Tools
>>>> /Run/RemoteBlast.pm:329
>>>> STACK: Bio::Tools::Run::RemoteBlast::new
>>>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
>>>> ast.pm:257
>>>> STACK: blast_a_seq2.pm:14
>>>> -----------------------------------------------------------
>>>> So I should try to "change the wrapper to allow
>> 'rpsblast'", right?
>>>> Could You tell me how to do that, please? So sorry but I
>> have no idea
>>>> yet...:) If that doesn't work, is there any other way to run
>>>> cdd-searches with perl?
>>>> Thank you so much!
>>>> Regards, Jonas
>>>>
>>>> ----- Original Message -----
>>>> From: "Chris Fields" <cjfields1@...>
>>>> To: "Cook, Malcolm" <MEC@...>
>>>> Cc: "'Jonas Schaer'" <Brotelzwieb@...>; "'BioPerl List'"
>>>> <bioperl-l@...>; "'Smithies, Russell'"
>>>> <Russell.Smithies@...>; <cjfields@...>
>>>> Sent: Thursday, July 09, 2009 9:19 PM
>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>
>>>>
>>>>> I've scheduled this tentatively for the 1.6 release
>> series (just not
>>>>> sure when yet).  It may work as is, but I haven't tried
>> it out yet
>>>>> (and am hazarding to guess it only retrieves the single
>> main RID at
>>>>> the moment).
>>>>>
>>>>> chris
>>>>>
>>>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
>>>>>
>>>>>> Jonas,
>>>>>>
>>>>>> If you want to continue to use the bioperl remoteblast
>> interface,
>>>>>> probably what you should do is simply call it twice.
>>>>>>
>>>>>> Once, as you already know how to do, which will return
>> without CDD
>>>>>> results.
>>>>>>
>>>>>> Secondly, to get the CDD results, call remoteblast a second time.
>>>>>> This time, using
>>>>>> -database => 'CDD'
>>>>>> -program => 'rpsblast'
>>>>>>
>>>>>> However, the wrapper may object to the 'rpsblast'
>> program.  It is
>>>>>> not listed in the POD -
>>>>>>
>>>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R
>>>> emoteBlast.pm)
>>>>>>  If so, my guess is that changing the perl wrapper to allow
>>>>>> rpsblast will "just work" (tm).  I've cc:ed
>>>> cjfields@... for
>>>>>> his opinion on this.
>>>>>>
>>>>>> Also, you might want to perform the CDD search first,
>> especially if
>>>>>> you are streaming results to eyeball that might like
>> something to
>>>>>> look at while the second (presumably longer) search is running.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Malcolm Cook
>>>>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces@...
>>>>>>> [mailto:bioperl-l-bounces@...] On Behalf
>> Of Jonas
>>>>>>> Schaer
>>>>>>> Sent: Thursday, July 09, 2009 5:16 AM
>>>>>>> To: BioPerl List; Smithies, Russell
>>>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>>>>
>>>>>>> Hi guys,
>>>>>>> Thank you all so much for your help and patience :). Of
>> course you
>>>>>>> were right and I finaly found the right put-parameter to get
>>>>>>> exactly the same hits as on the homepage.
>>>>>>> I do have an other question though :)...
>>>>>>> I now want to include a search for conserved domains,
>> but when I
>>>>>>> try to use the CDD_SEARCH-parameter
>>>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
>>>>>>> sub:CDD_SEARCH)
>>>>>>> like the other put-parameters the way chris once told me(works
>>>>>>> fine with the other params):
>>>>>>>
>>>>>>> my %put = (
>>>>>>>  WORD_SIZE => 3,
>>>>>>>  HITLIST_SIZE => 100,
>>>>>>>  THRESHOLD => 11,
>>>>>>>  FILTER => 'R',
>>>>>>>  GENETIC_CODE => 1,
>>>>>>>  CDD_SEARCH => 'on'
>>>>>>> ###I tried it
>>>>>>> with 'true' and '1', too.
>>>>>>>
>>>>>>> );
>>>>>>>
>>>>>>> for my $putName (keys %put) {
>>>>>>>  $factory->submit_parameter($putName,$put{$putName});
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ...an exception is thrown:
>>>>>>>
>>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>>> MSG: CDD_SEARCH is not a valid PUT parameter.
>>>>>>> STACK: Error::throw
>>>>>>> STACK: Bio::Root::Root::throw
>>>> C:/Perl/site/lib/Bio/Root/Root.pm:359
>>>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>>>>> C:/Perl/site/lib/Bio/Tools
>>>>>>> /Run/RemoteBlast.pm:325
>>>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
>>>>>>> STACK: main::blast_it firsteval0.8.pm:288
>>>>>>> STACK: firsteval0.8.pm:35
>>>>>>> ----------------------------------------------------------- .
>>>>>>> I guess somehow this could be the solution to my problem:
>>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
>>>>>>> ub:RID-for-Simultaneous
>>>>>>> , but unfortunately I don't understand what to do.
>>>>>>> I'm so sorry to bother you with this but please help me once
>>>>>>> more...:)
>>>>>>>
>>>>>>> Best regards and thanks in advance, Jonas
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "Smithies, Russell" <Russell.Smithies@...>
>>>>>>> To: "'Jonas Schaer'" <Brotelzwieb@...>
>>>>>>> Cc: "'Chris Fields'" <cjfields@...>; "'BioPerl List'"
>>>>>>> <bioperl-l@...>
>>>>>>> Sent: Monday, July 06, 2009 10:56 PM
>>>>>>> Subject: RE: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>
>>>>>>>
>>>>>>> Hi Jonas,
>>>>>>> You can't just play with the BLAST parameters and hope
>>>> for a "better"
>>>>>>> result.
>>>>>>> I'd suggest that if you aren't sure what they do, you
>> should leave
>>>>>>> them alone as small changes can make huge differences in the
>>>>>>> output - it's quite possible to miss finding what
>> you're looking
>>>>>>> for by using
>>>> the wrong
>>>>>>> parameters.
>>>>>>> If all else fails, read the blast manual:
>>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
>>>>>>> _all.html
>>>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
>>>>>>> Or Read Ian Korfs' excellent book:
>>>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
>>>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
>>>>>>>
>>>>>>> Don't worry about the integer overflow bug as there's
>> nothing you
>>>>>>> can do about it. If you're interested, Google and Wikipedia are
>>>>>>> your
>>>>>>> friends:
>>>>>>> http://en.wikipedia.org/wiki/Integer_overflow
>>>>>>>
>>>>>>>
>>>>>>> Russell
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces@... [mailto:bioperl-l-
>>>>>>>> bounces@...] On Behalf Of Jonas Schaer
>>>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
>>>>>>>> To: BioPerl List; Chris Fields
>>>>>>>> Subject: Re: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>>
>>>>>>>> Hi guys, thanks for your answers so far.
>>>>>>>> @jason: integer overflow in blast.... sorry, but what do
>>>>>>> you mean by that?
>>>>>>>> how can I fix it...?
>>>>>>>>
>>>>>>>> Since I never really changed any parameters I thought them
>>>>>>> all to be
>>>>>>>> default.
>>>>>>>> whatever, I tried to get "better" results with my prog
>>>> by changing
>>>>>>>> these:
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>>
>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
>>>>>>> STICS'} =
>>>>>>>> '1';
>>>>>>>> with no effect...I guess these were default values anyway.
>>>>>>>>
>>>>>>>> So please maybe you can tell me all the other parameters I
>>>>>>> can change with
>>>>>>>> my
>>>>>>>> perl-skript AND how to do that?
>>>>>>>> Unfortunately both, perl and the blast-algorithm are pretty
>>>>>>> much new to
>>>>>>>> me,
>>>>>>>> maybe thats why I just cannot find out how to do that on my
>>>>>>> own... :/
>>>>>>>>
>>>>>>>> Here is the output I get with my remote-blast skript:
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>> ###################################
>>>>>>>> Query Name:
>>>>>>>>
>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
>>>>>>>> L
>>>>>>>>      hit name is ref|XP_001702807.1|
>>>>>>>>              score is 442
>>>>>>>> BLASTP 2.2.21+
>>>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
>>>>>>> A. Schaffer,
>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>>>> Lipman (1997),
>>>>>>>> "Gapped
>>>>>>>> BLAST and PSI-BLAST: a new generation of protein
>> database search
>>>>>>>> programs", Nucleic Acids Res. 25:3389-3402.
>>>>>>>>
>>>>>>>>
>>>>>>>> Reference for composition-based statistics: Alejandro A.
>>>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
>>>>>>> John L. Spouge,
>>>>>>>> Yuri
>>>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
>>>>>>> "Improving the
>>>>>>>> accuracy of PSI-BLAST protein database searches with
>>>>>>> composition-based
>>>>>>>> statistics and other refinements", Nucleic Acids Res.
>>>> 29:2994-3005.
>>>>>>>>
>>>>>>>>
>>>>>>>> RID: 53STX5G2013
>>>>>>>>
>>>>>>>>
>>>>>>>> Database: All non-redundant GenBank CDS
>>>>>>>> translations+PDB+SwissProt+PIR+PRF excluding
>>>> environmental samples
>>>>>>>> from WGS projects
>>>>>>>>         9,252,587 sequences; 3,169,972,781 total
>> letters Query=
>>>>>>>>
>>>>>>>
>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
>>>>>>>>
>>>>>>>
>>>>
>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA
>>>> M
>>>>>>>> ATGPDPDDEYE
>>>>>>>> Length=150
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>     Score
>>>>>>>> E
>>>>>>>> Sequences producing significant alignments:
>>>>>>>    (Bits)
>>>>>>>> Value
>>>>>>>>
>>>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>>>> reinhard...   174
>>>>>>>> 2e-42
>>>>>>>>
>>>>>>>>
>>>>>>>> ALIGNMENTS
>>>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas
>>>> reinhardtii]
>>>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
>>>>>>>> Length=303
>>>>>>>>
>>>>>>>> Score =  174 bits (442),  Expect = 2e-42, Method:
>>>>>>> Composition-based
>>>>>>>> stats.
>>>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
>>>>>>> Gaps = 0/150
>>>>>>>> (0%)
>>>>>>>>
>>>>>>>> Query  1
>>>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
>>>>>>>> 60
>>>>>>>>
>>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>>>> Sbjct  154
>>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>>>> 213
>>>>>>>>
>>>>>>>> Query  61
>>>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>>>> 120
>>>>>>>>
>>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>>>> Sbjct  214
>>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>>>> 273
>>>>>>>>
>>>>>>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>>>>>>>>          AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct  274
>>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Database: All non-redundant GenBank CDS
>>>>>>>> translations+PDB+SwissProt+PIR+PRF
>>>>>>>> excluding environmental samples from WGS projects
>>>>>>>>  Posted date:  Jul 5, 2009  4:41 AM Number of letters in
>>>>>>>> database: -1,124,994,511 Number of sequences in database:
>>>>>>>> 9,252,587
>>>>>>>>
>>>>>>>> Lambda     K      H
>>>>>>>> 0.309    0.122    0.345
>>>>>>>> Gapped
>>>>>>>> Lambda     K      H
>>>>>>>> 0.267   0.0410    0.140
>>>>>>>> Matrix: BLOSUM62
>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
>> Sequences:
>>>>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions:
>>>>>>>> 1448367 Number of successful extensions: 2103 Number
>> of sequences
>>>>>>>> better than 10: 0 Number of HSP's better than 10
>> without gapping:
>>>>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully
>>>>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781
>>>>>>>> Length adjustment: 113 Effective length of query: 37 Effective
>>>>>>>> length of database: 2124430450 Effective search space:
>>>>>>>> 78603926650 Effective search space used: 78603926650
>>>>>>>> T: 11
>>>>>>>> A: 40
>>>>>>>> X1: 16 (7.1 bits)
>>>>>>>> X2: 38 (14.6 bits)
>>>>>>>> X3: 64 (24.7 bits)
>>>>>>>> S1: 42 (20.8 bits)
>>>>>>>> S2: 74 (33.1 bits)
>>>>>>>>
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>> ###################################
>>>>>>>> and here are the hits (?) of the blast-algorithm on the
>>>>>>> ncbi-homepage with
>>>>>>>> the same query of course:
>>>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>>>> reinhard...   300
>>>>>>>> 3e-80
>>>>>>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
>>>>>>> [Acyrtho...  36.2
>>>>>>>> 1.1
>>>>>>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
>>>>>>> [Blautia...  35.4
>>>>>>>> 1.8
>>>>>>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
>>>>>>> brazil...  34.3
>>>>>>>> 4.2
>>>>>>>> ref|XP_680841.1|  hypothetical protein AN7572.2
>>>>>>> [Aspergillus n...  33.5
>>>>>>>> 6.0
>>>>>>>> ref|YP_001768110.1|  hypothetical protein M446_1150
>>>>>>> [Methyloba...  33.5
>>>>>>>> 7.0
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>> ###################################at
>>>>>>>> least the first hit is the same, but even there there is a
>>>>>>> different score
>>>>>>>> and e-value.
>>>>>>>>
>>>>>>>> thanks so much for any help :)
>>>>>>>> regards, jonas
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Chris Fields" <cjfields@...>
>>>>>>>> To: "Jason Stajich" <jason@...>
>>>>>>>> Cc: "Smithies, Russell"
>>>>>>> <Russell.Smithies@...>; "'BioPerl
>>>>>>>> List'" <bioperl-l@...>; "'Jonas Schaer'"
>>>>>>>> <Jonas_Schaer@...>
>>>>>>>> Sent: Monday, July 06, 2009 12:51 AM
>>>>>>>> Subject: Re: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>>
>>>>>>>>
>>>>>>>>> That inspires confidence ;>
>>>>>>>>>
>>>>>>>>> chris
>>>>>>>>>
>>>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>>>>>>>>>
>>>>>>>>>> integer overflow in blast....
>>>>>>>>>>
>>>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>>>>>>>>>
>>>>>>>>>>> I'd guess it's a difference in the parameters used.
>>>>>>>>>>> Interesting that both have the number of letters in
>> the db as
>>>>>>>>>>> "-1,125,070,205", I assume that's a bug  :-)
>>>>>>>>>>>
>>>>>>>>>>> Stats from your remote_blast:
>>>>>>>>>>>
>>>>>>>>>>> 'stats' => {
>>>>>>>>>>>         'S1' => '42',
>>>>>>>>>>>         'S1_bits' => '20.8',
>>>>>>>>>>>         'lambda' => '0.309',
>>>>>>>>>>>         'entropy' => '0.345',
>>>>>>>>>>>         'kappa_gapped' => '0.0410',
>>>>>>>>>>>         'T' => '11',
>>>>>>>>>>>         'kappa' => '0.122',
>>>>>>>>>>>         'X3_bits' => '24.7',
>>>>>>>>>>>         'X1' => '16',
>>>>>>>>>>>         'lambda_gapped' => '0.267',
>>>>>>>>>>>         'X2' => '38',
>>>>>>>>>>>         'S2' => '74',
>>>>>>>>>>>         'seqs_better_than_cutoff' => '0',
>>>>>>>>>>>         'posted_date' => 'Jul 4, 2009  4:41 AM',
>>>>>>>>>>>         'Hits_to_DB' => '60102303',
>>>>>>>>>>>         'dbletters' => '-1125070205',
>>>>>>>>>>>         'A' => '40',
>>>>>>>>>>>         'num_successful_extensions' => '2004',
>>>>>>>>>>>         'num_extensions' => '1436892',
>>>>>>>>>>>         'X1_bits' => '7.1',
>>>>>>>>>>>         'X3' => '64',
>>>>>>>>>>>         'entropy_gapped' => '0.140',
>>>>>>>>>>>         'dbentries' => '9252258',
>>>>>>>>>>>         'X2_bits' => '14.6',
>>>>>>>>>>>         'S2_bits' => '33.1'
>>>>>>>>>>>       }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Stats from a blast done on the NCBI webpage:
>>>>>>>>>>>
>>>>>>>>>>> Database: All non-redundant GenBank CDS
>>>>>>> translations+PDB+SwissProt
>>>>>>>>>>> +PIR+PRF
>>>>>>>>>>> excluding environmental samples from WGS projects
>> Posted date:
>>>>>>>>>>> Jul 4, 2009  4:41 AM Number of letters in database:
>>>>>>>>>>> -1,125,070,205 Number of sequences in database:  9,252,258
>>>>>>>>>>>
>>>>>>>>>>> Lambda     K      H
>>>>>>>>>>> 0.309    0.124    0.340
>>>>>>>>>>> Gapped
>>>>>>>>>>> Lambda     K      H
>>>>>>>>>>> 0.267   0.0410    0.140
>>>>>>>>>>> Matrix: BLOSUM62
>>>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
>>>>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of
>>>>>>>>>>> extensions: 3101413 Number of successful extensions: 9001
>>>>>>>>>>> Number of sequences better than 100: 65 Number of
>> HSP's better
>>>>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000
>>>>>>>>>>> Number of HSP's successfully gapped: 66 Length of
>> query: 150
>>>>>>>>>>> Length of database: 3169897087 Length adjustment: 113
>>>>>>>>>>> Effective length of query: 37 Effective length of database:
>>>>>>>>>>> 2124391933 Effective search space: 78602501521 Effective
>>>>>>>>>>> search space used: 78602501521
>>>>>>>>>>> T: 11
>>>>>>>>>>> A: 40
>>>>>>>>>>> X1: 16 (7.1 bits)
>>>>>>>>>>> X2: 38 (14.6 bits)
>>>>>>>>>>> X3: 64 (24.7 bits)
>>>>>>>>>>> S1: 42 (20.8 bits)
>>>>>>>>>>> S2: 65 (29.6 bits)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: bioperl-l-bounces@...
>> [mailto:bioperl-l-
>>>>>>>>>>>> bounces@...] On Behalf Of Jonas Schaer
>>>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>>>>>>>>>> To: BioPerl List
>>>>>>>>>>>> Subject: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>>>>>>
>>>>>>>>>>>> Hi again :)
>>>>>>>>>>>> please, I only have this little question:
>>>>>>>>>>>> why do I get different results with my remote::blast
>>>>>>> perl skript
>>>>>>>>>>>> then on the
>>>>>>>>>>>> ncbi blast homepage?
>>>>>>>>>>>> I am using blastp, the query is an amino-sequence
>> (different
>>>>>>>>>>>> results with any sequence, differences not only in
>> number of
>>>>>>>>>>>> hits but
>>>> even in e-
>>>>>>>>>>>> values, scores
>>>>>>>>>>>> etc...), the database is 'nr'.
>>>>>>>>>>>> PLEASE help me,
>>>>>>>>>>>> thank you in advance,
>>>>>>>>>>>> Jonas
>>>>>>>>>>>>
>>>>>>>>>>>> ps: my skript:
>>>>>>>>>>>>
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>>>>>> ##
>>>>>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my
>>>>>>>>>>>> @blast_report; my $prog = 'blastp';
>>>>>>>>>>>> my $db   = 'nr';
>>>>>>>>>>>> my $e_val= '1e-10';
>>>>>>>>>>>> #my $e_val= '10';
>>>>>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>>>>>     '-data' => $db,
>>>>>>>>>>>>     '-expect' => $e_val,
>>>>>>>>>>>>     '-readmethod' => 'SearchIO' ); my $factory =
>>>>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}
>> = '11 1';
>>>>>>>>>>>>
>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} =
>> '10'; $ Bio
>>>>>>>>>>>>
>>>>>>>
>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>>>>>>>>>>> = '1';
>>>>>>>>>>>>
>>>>>>>>>>>> my
>>>>>>>>>>>> $
>>>>>>>>>>>> blast_seq
>>>>>>>>>>>>
>>>>>>>
>>>>
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>>>>>>>>>>>
>>>>>>>>
>>>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
>>>>>>> AFRQAHQNTAMATGPD
>>>>>>>>>>>> PDDEYE';
>>>>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my
>>>>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>>>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq
>>>>>>>>>>>> =>$blast_seq,
>>>> -display_id =>
>>>>>>>>>>>> "$blast_seq");
>>>>>>>>>>>> my $filename='temp2.out';
>>>>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR
>>>>>>>>>>>> "waiting..." if( $v > 0 ); while ( my @rids =
>>>>>>>>>>>> $factory->each_rid ) {
>>>>>>>>>>>>    foreach my $rid ( @rids )
>>>>>>>>>>>>    {
>>>>>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>>>        if( !ref($rc) )
>>>>>>>>>>>>        {
>>>>>>>>>>>>            if( $rc < 0 )
>>>>>>>>>>>>            {
>>>>>>>>>>>>                $factory->remove_rid($rid);
>>>>>>>>>>>>            }
>>>>>>>>>>>>            print STDERR "." if ( $v > 0 );
>>>>>>>>>>>>        }
>>>>>>>>>>>>            else
>>>>>>>>>>>>            {
>>>>>>>>>>>>                my $result = $rc->next_result();
>>>>>>>>>>>>                $factory->save_output($filename);
>>>>>>>>>>>>                $factory->remove_rid($rid);
>>>>>>>>>>>>                print "\nQuery Name: ",
>>>>>>> $result->query_name(),
>>>>>>>>>>>> "\n";
>>>>>>>>>>>>                while ( my $hit = $result->next_hit )
>>>>>>>>>>>>                {
>>>>>>>>>>>>                    next unless ( $v > 0);
>>>>>>>>>>>>                    print "\thit name is ",
>>>> $hit->name, "\n";
>>>>>>>>>>>>                    while( my $hsp = $hit->next_hsp )
>>>>>>>>>>>>                    {
>>>>>>>>>>>>                        print "\t\tscore is ",
>>>>>>> $hsp->score, "\n";
>>>>>>>>>>>>                    }
>>>>>>>>>>>>                }
>>>>>>>>>>>>            }
>>>>>>>>>>>>    }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> }
>>>>>>>>>>>> @blast_report = get_file_data ($filename); return
>>>>>>>>>>>> @blast_report;
>>>>>>>>>>>>
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>>>>>> ####
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>>> Bioperl-l@...
>>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>> =
>>>>>>>>>>> =
>>>>>>>>>>>
>>>>>>>
>>>>
>> =====================================================================
>>>>>>>>>>> Attention: The information contained in this message and/or
>>>>>>>>>>> attachments from AgResearch Limited is intended only for the
>>>>>>> persons or entities
>>>>>>>>>>> to which it is addressed and may contain
>> confidential and/or
>>>>>>>>>>> privileged material. Any review, retransmission,
>> dissemination
>>>> or other use
>>>>>>>>>>> of, or
>>>>>>>>>>> taking of any action in reliance upon, this information
>>>>>>> by persons or
>>>>>>>>>>> entities other than the intended recipients is
>> prohibited by
>>>>>>>>>>> AgResearch Limited. If you have received this message in
>>>>>>>>>>> error,
>>>>>>> please notify
>>>>>>>>>>> the
>>>>>>>>>>> sender immediately.
>>>>>>>>>>> =
>>>>>>>>>>> =
>>>>>>>>>>>
>>>>>>>
>>>>
>> =====================================================================
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>> Bioperl-l@...
>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jason Stajich
>>>>>>>>>> jason@...
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l@...
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> ----------------
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> No virus found in this incoming message.
>>>>>>>> Checked by AVG - www.avg.com
>>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
>>>>>>> Date: 07/05/09
>>>>>>>> 05:53:00
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> ------------------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> No virus found in this incoming message.
>>>>>>> Checked by AVG - www.avg.com
>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
>>>>>>> Date: 07/05/09
>>>>>>> 17:54:00
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l@...
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>
>>>>
>>>> --------------------------------------------------------------
>>>> ------------------
>>>>
>>>>
>>>>
>>>> No virus found in this incoming message.
>>>> Checked by AVG - www.avg.com
>>>> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release
>>>> Date: 07/09/09
>>>> 05:55:00
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@...
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l