hacking BlastTable.pm to support blast -m 8

by Tristan Lefebure

Hello,
As far as I understand, Bio::Index::BlastTable only supports
the -m 9 BLAST format. Another popular and more compact
format is -m 8. The main difference is that the BLAST
program, the query and database, and the field names are
not repeated before each search, so you get a much cleaner
table (which looks much easier to parse).

Looking at BlastTable.pm, the main change would be in
sub _index_file. Right now it is:

sub _index_file {
    my( $self,
        $file, # File name
        $i,    # Index-number of file being indexed
      ) = @_;

    my( $begin,  # Offset from start of file of the start
                 # of the last found record.
      );

    open(my $BLAST, '<', $file) or $self->throw("cannot open file $file\n");
    my $indexpoint = 0;
    my $lastline = 0;
    while( <$BLAST> ) {
        if(m{^#\s+T?BLAST[PNX]} ) {
            my $len = length $_;
            $indexpoint = tell($BLAST)-$len;
        }
        if(m{^#\s+Query:\s+([^\n]+)}) {
            foreach my $id ($self->id_parser()->($1)) {
                $self->debug("id is $id, begin is $indexpoint\n");
                $self->add_record($id, $i, $indexpoint);
            }
        }
    }
}

With the -m 8 format, couldn't this be done by reading the
query name from the first row of each query's hits, finding
where the hits for that query start and stop, and passing
that to add_record()?

I'm not sure I understand all the details of $i and
$indexpoint, so if an expert eye could give me some advice
or hack the code, that would be nice ;)
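
A minimal sketch of that idea, for illustration only. It assumes tab-separated -m 8 rows with the query id in the first column and all rows for a query stored contiguously. The function name m8_query_offsets is hypothetical and is not part of Bio::Index::BlastTable. In the code above, $i is the index number of the file being indexed and $indexpoint is a byte offset: tell() minus the length of the line just read gives the start of that line. The sketch records the same kind of offset for the first row of each query.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Index::BlastTable): scan a
# BLAST -m 8 table and return { query_id => byte offset of its first row }.
sub m8_query_offsets {
    my ($fh) = @_;
    my %offset;
    my $last_query = '';
    my $pos = tell($fh);    # offset of the line about to be read
    while ( my $line = <$fh> ) {
        # Only data rows count: skip blank lines and '#' comment lines.
        if ( $line =~ /\S/ && $line !~ /^#/ ) {
            my ($query) = split /\t/, $line, 2;    # first column = query id
            if ( $query ne $last_query ) {
                # Assumption: if a query id reappears later, keep its first block.
                $offset{$query} = $pos unless exists $offset{$query};
                $last_query = $query;
            }
        }
        $pos = tell($fh);    # start of the next line
    }
    return \%offset;
}
```

Inside _index_file, each offset found this way would take the place of $indexpoint in the existing add_record($id, $i, $indexpoint) call, with the query id passed through id_parser() as before.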

--Tristan


_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: hacking BlastTable.pm to support blast -m 8

by Chris Fields


On Oct 27, 2009, at 2:31 PM, Tristan Lefebure wrote:

> With the -m 8 format, couldn't this be done by reading the
> query name from the first row of each query's hits, finding
> where the hits for that query start and stop, and passing
> that to add_record()?
>
> I'm not sure I understand all the details of $i and
> $indexpoint, so if an expert eye could give me some advice
> or hack the code, that would be nice ;)
>
> --Tristan

That should be feasible, yes, and you are correct.  The main thing is
to keep matching the '#' lines for -m 9, so the parser still catches
the BLAST executable and other info.
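
One way the two formats could share a single pass, as a rough sketch only and not the code that was later committed. The function name blast_table_index_points is hypothetical. The header regex is the one from the existing _index_file. The sketch assumes that a query with no data rows does not need an index entry.

```perl
use strict;
use warnings;

# Hypothetical sketch: index points for both -m 8 and -m 9 tables.
# For -m 9 the offset of the '# BLASTN ...' header line is recorded, so a
# parser seeking to that offset still sees the executable and other info.
# For -m 8 (no header) the offset of the query's first data row is used.
sub blast_table_index_points {
    my ($fh) = @_;
    my ( %offset, $header_pos );
    my $last_query = '';
    my $pos = tell($fh);    # offset of the line about to be read
    while ( my $line = <$fh> ) {
        if ( $line =~ m{^#\s+T?BLAST[PNX]} ) {
            $header_pos = $pos;    # -m 9: a new report starts here
        }
        elsif ( $line !~ /^#/ && $line =~ /\S/ ) {
            my ($query) = split /\t/, $line, 2;    # first column = query id
            if ( $query ne $last_query ) {
                my $start = defined $header_pos ? $header_pos : $pos;
                $offset{$query} = $start unless exists $offset{$query};
                $last_query = $query;
            }
            undef $header_pos;    # this header now belongs to a report
        }
        $pos = tell($fh);
    }
    return \%offset;
}
```

The -m 9 branch differs from the existing code in one respect: the id comes from the first column of the data rows rather than from the '# Query:' line, so a report without any hits is skipped.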

I'll go ahead and do this based on your suggestion, unless you have a  
patch ready.  Also, it looks like the module is missing tests, so I  
can work on adding those for both -m8 and -m9 output.

chris

Re: hacking BlastTable.pm to support blast -m 8

by Tristan Lefebure

On Tuesday 27 October 2009 15:50:24 Chris Fields wrote:
> I'll go ahead and do this based on your suggestion,
> unless you have a patch ready.  Also, it looks like
> the module is missing tests, so I can work on adding
> those for both -m8 and -m9 output.

Great! I did nothing, you can go ahead. I bet you can do
this 100x faster than me...

Re: hacking BlastTable.pm to support blast -m 8

by Chris Fields

On Oct 27, 2009, at 2:59 PM, Tristan Lefebure wrote:

> On Tuesday 27 October 2009 15:50:24 Chris Fields wrote:
>> I'll go ahead and do this based on your suggestion,
>> unless you have a patch ready.  Also, it looks like
>> the module is missing tests, so I can work on adding
>> those for both -m8 and -m9 output.
>
> Great! I did nothing, you can go ahead. I bet you can do
> this 100x faster than me...

Committed to svn in r16301, along with some tests.  Let me know if  
this doesn't work.

chris

Re: hacking BlastTable.pm to support blast -m 8

by Tristan Lefebure

Works perfectly, thanks Chris.

On Tuesday 27 October 2009 19:39:26 Chris Fields wrote:

> On Oct 27, 2009, at 2:59 PM, Tristan Lefebure wrote:
> > On Tuesday 27 October 2009 15:50:24 Chris Fields wrote:
> >> I'll go ahead and do this based on your suggestion,
> >> unless you have a patch ready.  Also, it looks like
> >> the module is missing tests, so I can work on adding
> >> those for both -m8 and -m9 output.
> >
> > Great! I did nothing, you can go ahead. I bet you can
> > do this 100x faster than me...
>
> Committed to svn in r16301, along with some tests.  Let
> me know if this doesn't work.
>
> chris