|
View:
New views
2 Messages
—
Rating Filter:
Alert me
|
|
|
Help in Programming a substitution routineHi there,
i need a little help in programming. I am still stucking in a problem and have no idea how to get out a solution. I have a fasta alignment file (with two aligned sequences) and I want to optimize the genetic code for a better alignment score. Everything is fine up to the line where I'd like to compare the codons of each sequence. If both codons are equal i push them into a new array, if not I'd like to lookup all possible codons for the respective amino acid and compare all codons from one amino acid of one site with the corresponding on the other: lets say in sequence 1 is a "A" (GCC, GCA, GCG,GCT) and in sequence 2 is a "E" (GAA,GAG). So the best hit should be the combination GCA and GAA or GCG and GAG. But I don't have clue how to get this working - as you might see in the code... I've probably made a lot of mistakes. But I would feel better if someone could suggest some code... #the code starts here use Bio::SeqIO; use Bio::AlignIO; my ($i, $j, $seq1, $seq2, $seqName1, $seqName2); #the complete hash is not shown for the sake of clarity my(%cod2aa) = ( 'TCA' => 'S', # Serine 'TCC' => 'S', # Serine 'TCG' => 'S', # Serine 'TCT' => 'S', # Serine 'TTC' => 'F', # Phenylalanine 'TTT' => 'F', # Phenylalanine 'TTA' => 'L', # Leucine ... ... ); my %aa2cod = reverse %cod2aa; my $input = Bio::AlignIO->new(-file => 'align1.fas' , '-format' => 'fasta'); while (my $aln = $input->next_aln()) { push @nuc_seqs, $aln; $seq1 = $aln->get_seq_by_pos(1)->seq(); $seq2 = $aln->get_seq_by_pos(2)->seq(); } #create an array for the sequences push @sequences, $seq1; push @sequences, $seq2; #separate whole sequence into single triplets (subroutine is not shown) foreach (@sequences) { push @tmpArray, CreateTripletArray($_); } #count number of triplets in the array $count = @tmpArray; #compare pairwise and substitute if possible (assuming the length of both genes is equal) foreach $i(0 .. (($count/2)+1)) { $value1=@tmpArray[$i]; $value2=@tmpArray[($count/2)+$i]; if ($value1 eq $value2) { #push equal triplets in new arrays push @newArray1, $value1; push @newArray2, $value2; } else { #split codons to compare each base in a triplet @zwvalue1 = split(//, $value1); @zwvalue2 = split(//, $value2); #I want to get back all codons for one amino acid $aminoacid1 = $cod2aa{$value1}; $aminoacid2 = $cod2aa{$value2}; push @codon1, $aa2cod{$aminoacid1}; push @codon2, $aa2cod{$aminoacid2}; #then something like go through all bases of both codons a every position to find the best fitting partners if (exists $cod2aa{$value1} and $cod2aa{$value2} ) { if (@zwvalue1[0] eq @zwvalue2[0]){ push @newArray1, @zwvalue1[0]; push @newArray2, @zwvalue2[0]; } } } } |
|
|
Re: Help in Programming a substitution routineMarkus
> #the complete hash is not shown for the sake of clarity > my(%cod2aa) = ( > 'TCA' => 'S', # Serine > 'TCC' => 'S', # Serine > 'TCG' => 'S', # Serine > 'TCT' => 'S', # Serine > 'TTC' => 'F', # Phenylalanine > 'TTT' => 'F', # Phenylalanine > 'TTA' => 'L', # Leucine > ); > my %aa2cod = reverse %cod2aa; Your hash %cod2aa is many-to-one - it has many keys (codons) which map to the same value (amino acid). When you reverse this hash, it will NOT become one-to-many; it will become one-to-one. $aa2cod{'S'} will only return ONE value (which one is random and depends on how Perl orders them). Is this what you intended? -- --Torsten Seemann --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
| Free embeddable forum powered by Nabble | Forum Help |