Other than that I myself don't have a strong preference either way. I
> When I can I could try generating a method which accepts a regex/
> Bio::Tools::SeqPattern and returns an AlignIO stream or array of
> SimpleAlign instances (the former could be attached to a temp file
> for iteration). Any preference?
>
> chris
>
> ---- Original message ----
>> Date: Sat, 9 Aug 2008 12:07:30 -0400
>> From: Hilmar Lapp <
hlapp@...>
>> Subject: Re: [Bioperl-l] Finding possible primers regex
>> To: Chris Fields <
cjfields@...>
>> Cc: Benbo <
btemperton@...>,
Bioperl-l@...
>>
>> This looks like a neat trick. Do you think it's worth including as a
>> SimpleAlign method (obviously w/o the printing to STDOUT)? I can
>> imagine that a lot of people might appreciate it.
>>
>> -hilmar
>>
>> On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:
>>
>>> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>>>
>>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple entry
>>>> fasta
>>>> file and find possible primers. So far I've produced a string which
>>>> contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char blocks,
>>>> rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>
>>> There is a trick to this which is discussed more extensively in
>>> 'Mastering Regular Expressions'. Essentially you have to embed code
>>> into the regex and trick the parser into backtracking using a
>>> negative lookahead. The match itself fails (i.e. no match is
>>> returned), but the embedded code is executed for each match attempt,
>>>
>>> The following script is a slight modification of one I used which
>>> checks the consensus string from the input alignment (in aligned
>>> FASTA format here), extracts the alignment slice using that match,
>>> then spit the alignment out to STDOUT in clustalw format. This
>>> should work for perl 5.8 and up, but it's only been tested on perl
>>> 5.10. You should be able to use this to fit what you want.
>>>
>>> my $in = Bio::AlignIO->new(-file => $file,
>>> -format => 'fasta');
>>> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>>> -format => 'clustalw');
>>>
>>> while (my $aln = $in->next_aln) {
>>> my $c = $aln->consensus_string(100);
>>> my @matches;
>>> $c =~ m/
>>> ([GTAC?]{18,21})
>>> (?{my $match = check_match($1);
>>> push @matches, [$match,
>>> pos(),
>>> length($match)]
>>> if defined $match;})
>>> (?!)
>>> /xig;
>>> for my $match (@matches) {
>>> my ($hit, $st, $end) = ($match->[0],
>>> $match->[1] - $match->[2] + 1,
>>> $match->[1]);
>>> my $newaln = $aln->slice($st, $end);
>>> $out->write_aln($newaln);
>>> }
>>> }
>>>
>>> sub check_match {
>>> my $match = shift;
>>> return unless $match;
>>> my $ct = $match =~ tr/?/?/;
>>> return $match if $ct <= 4;
>>> }
>>>
>>>
>>> chris
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>>
Bioperl-l@...
>>>
http://lists.open-bio.org/mailman/listinfo/bioperl-l>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>