updating a reciprocal blast file

View: New views
3 Messages — Rating Filter:   Alert me  

updating a reciprocal blast file

by JustinV :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I have a large reciprocal blast file that contains 3 proteomes.  I'd like to integrate another proteome for downstream clustering.    I imagine a command-line script that takes as input the new proteome in fasta format, the directory of the the old proteomes in fasta format, and the pre-existing reciprocal blast file, and then performs the proper blasts and updates the pre-existing reciprocal blast file accordingly.  I am using blast locally and the downstream processing is done by OrthoMCL.  I assume this has been handled before, but I can't track down the code.  If anyone is familiar with a pre-exisiting script or has pertinent advice, I'd be much obliged.

Justin

Re: updating a reciprocal blast file

by Jason Stajich-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Are you keeping each pairwise in a separate file and then combining  
it all?
  http://fungalgenomes.org/~stajich/scripts/pairwise_blast_jobs_big.pl

Are you fixing E-values so they are scaled across different sized  
databases? You will probably want to add a Z= parameter to insure  
values are useable.

I also had to hack ORTHOMCL locally to cache things in DB_Files as it  
was too memory intensive the way it runs on my big datasets.

-jason
On Jun 5, 2008, at 8:53 AM, JustinV wrote:

>
> I have a large reciprocal blast file that contains 3 proteomes.  
> I'd like to
> integrate another proteome for downstream clustering.    I imagine a
> command-line script that takes as input the new proteome in fasta  
> format,
> the directory of the the old proteomes in fasta format, and the pre-
> existing
> reciprocal blast file, and then performs the proper blasts and  
> updates the
> pre-existing reciprocal blast file accordingly.  I am using blast  
> locally
> and the downstream processing is done by OrthoMCL.  I assume this  
> has been
> handled before, but I can't track down the code.  If anyone is  
> familiar with
> a pre-exisiting script or has pertinent advice, I'd be much obliged.
>
> Justin
> --
> View this message in context: http://www.nabble.com/updating-a- 
> reciprocal-blast-file-tp17673277p17673277.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: updating a reciprocal blast file

by JustinV :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Jason,

Thanks for the suggestions.

The blast report is a single file containing each query in the 3 proteomes against a database of the 3 proteomes.  As you probably remember, OrthoMCL offers many modes of running that head-off various levels of processing.  In any case, the reciprocal blast is the bottle-neck.  Mode -3 forgoes this step, but it requires a single, exhaustive blast report.  Perhaps, I could run the reciprocal blasts separately as you suggested and then integrate them with your pairwise_blast_jobs_big.pl.  I have to say, this code looks a little spooky.  Wouldn't it be possible to just resort (by normalized e-value, discussed below) and reprint each set of query hits based on a hash or index of the results of that query against the new proteome?  And then tack on the results of new proteome against the updated database to the end of the total blast report.

In terms of normalizing the e-value, since I am using a consistent scoring matrix, can't I just recalculate the scores based on new database size:

(new e-value) = (new database size) * (old e-value) / (old database size)

as in http://www.springerlink.com/content/55m318wwqdgtw85h/ (methods) and elsewhere.

As of yet, I've been satisfied with the run time downstream of the reciprocal blast, but, as I've said, I'm currently only using three plant (dicot) proteomes.  By "hack ORTHOMCL locally to cache things in DB_Files" do you mean serializing the blastSeq objects from early blasts in the blast_parse subroutine using bioperl-db or something?  Maybe this is dumb assumption.  In any case, I'd be curious to see your modified version of the OrthoMCL script.

Justin

Jason Stajich-3 wrote:
Are you keeping each pairwise in a separate file and then combining  
it all?
  http://fungalgenomes.org/~stajich/scripts/pairwise_blast_jobs_big.pl

Are you fixing E-values so they are scaled across different sized  
databases? You will probably want to add a Z= parameter to insure  
values are useable.

I also had to hack ORTHOMCL locally to cache things in DB_Files as it  
was too memory intensive the way it runs on my big datasets.

-jason
On Jun 5, 2008, at 8:53 AM, JustinV wrote:

>
> I have a large reciprocal blast file that contains 3 proteomes.  
> I'd like to
> integrate another proteome for downstream clustering.    I imagine a
> command-line script that takes as input the new proteome in fasta  
> format,
> the directory of the the old proteomes in fasta format, and the pre-
> existing
> reciprocal blast file, and then performs the proper blasts and  
> updates the
> pre-existing reciprocal blast file accordingly.  I am using blast  
> locally
> and the downstream processing is done by OrthoMCL.  I assume this  
> has been
> handled before, but I can't track down the code.  If anyone is  
> familiar with
> a pre-exisiting script or has pertinent advice, I'd be much obliged.
>
> Justin
> --
> View this message in context: http://www.nabble.com/updating-a- 
> reciprocal-blast-file-tp17673277p17673277.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l