Finding possible primers regex

by Benbo

Hi there,
I'm trying to write a Perl script that scans an aligned multi-entry FASTA file and finds possible primers. So far I've produced a string that contains the base wherever all sequences agree and a * wherever they don't, e.g.
1) TTAGCCTAA
2) TTAGCAGAA
3) TTACCCTAA

would give TTA*C**AA.
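
For reference, the masking step described above could be sketched roughly like this. It assumes all sequences have already been aligned to the same length; `consensus_mask` is a made-up name for illustration, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the shared base at each column of the
# alignment, or '*' where the sequences disagree.
# Assumes every sequence has the same length as the first one.
sub consensus_mask {
    my @seqs = @_;
    my $mask = '';
    for my $i (0 .. length($seqs[0]) - 1) {
        my $base     = substr($seqs[0], $i, 1);
        my $all_same = 1;
        for my $seq (@seqs) {
            if (substr($seq, $i, 1) ne $base) {
                $all_same = 0;
                last;
            }
        }
        $mask .= $all_same ? $base : '*';
    }
    return $mask;
}

# The three example sequences from above
print consensus_mask(qw(TTAGCCTAA TTAGCAGAA TTACCCTAA)), "\n";   # TTA*C**AA
```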

I want to parse this string and pull out all sequences which are 18-21 bp in length and have no more than 4 * in them.

So far, I've got this:

while ($fragment_match =~ /([GTAC*]{18,21})/g) {
    print "$1\n";
}

I hoped this would match every fragment of 18-21 characters. However, even that doesn't work: it essentially chunks the string into consecutive 21-character blocks, rather than the overlapping windows I was hoping for:
0-18
0-19
0-20
0-21
1-19
1-20
1-21
1-22

etc.
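
The chunking happens because, with /g, each match resumes where the previous one ended, and the greedy {18,21} always takes 21 characters when it can. A zero-width lookahead such as /(?=([GTAC*]{18,21}))/g would allow overlapping starts, but still yields only the longest match at each position. One possible way to get every start/length combination is to skip the regex for the windowing and use nested loops with substr, counting the * characters with tr. This is only a sketch: `candidate_windows` and its parameters are made-up names, and the example string is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [start, window] pairs for every window of
# $min_len..$max_len characters that contains at most $max_stars '*'.
sub candidate_windows {
    my ($mask, $min_len, $max_len, $max_stars) = @_;
    my @hits;
    for my $start (0 .. length($mask) - $min_len) {
        for my $len ($min_len .. $max_len) {
            # Stop once the window would run past the end of the string
            last if $start + $len > length $mask;
            my $window = substr($mask, $start, $len);
            # Skip windows containing anything other than bases or '*'
            next unless $window =~ /\A[GTAC*]+\z/;
            # tr with an empty replacement list counts without modifying
            my $stars = ($window =~ tr/*//);
            push @hits, [$start, $window] if $stars <= $max_stars;
        }
    }
    return @hits;
}

# Invented example string, standing in for the real consensus
my $fragment_match = 'TTA*C**AAGGCCTTAAGGCCTTAA';
for my $hit (candidate_windows($fragment_match, 18, 21, 4)) {
    my ($start, $window) = @$hit;
    printf "%d-%d %s\n", $start, $start + length($window), $window;
}
```

The start-end labels printed here follow the 0-18, 0-19 style of the list above, i.e. the end is the start plus the window length.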

Can anyone let me know whether this is already possible in BioPerl, or how one would go about it with a regex? Sadly, I'm fairly new to Perl and still getting to grips with BioPerl, so please treat me gently :).

Many thanks,

Ben
