Help in Programming a substitution routine

View: New views
2 Messages — Rating Filter:   Alert me  

Help in Programming a substitution routine

by manni122 :: Rate this Message:

| View Threaded | Show Only this Message

Hi there,
i need a little help in programming. I am still stucking in a problem and have no idea how to get out a solution. I have a fasta alignment file (with two aligned sequences) and I want to optimize the genetic code for a better alignment score. Everything is fine up to the line where I'd like to compare the codons of each sequence.
If both codons are equal i push them into a new array, if not I'd like to lookup all possible codons for the respective amino acid and compare all codons from one amino acid of one site with the corresponding on the other: lets say in sequence 1 is a "A" (GCC, GCA, GCG,GCT) and in sequence 2 is a "E" (GAA,GAG). So the best hit should be the combination GCA and GAA or GCG and GAG. But I don't have clue how to get this working - as you might see in the code...
I've probably made a lot of mistakes. But I would feel better if someone could suggest some code...

#the code starts here
use Bio::SeqIO;
use Bio::AlignIO;

my ($i, $j, $seq1, $seq2, $seqName1, $seqName2);

#the complete hash is not shown for the sake of clarity
my(%cod2aa) = (
   
    'TCA' => 'S',    # Serine
    'TCC' => 'S',    # Serine
    'TCG' => 'S',    # Serine
    'TCT' => 'S',    # Serine
    'TTC' => 'F',    # Phenylalanine
    'TTT' => 'F',    # Phenylalanine
    'TTA' => 'L',    # Leucine
   ...
   ...
);

my %aa2cod = reverse %cod2aa;

my $input  = Bio::AlignIO->new(-file => 'align1.fas' , '-format' => 'fasta');

  while (my $aln = $input->next_aln()) {
  push @nuc_seqs, $aln;
  $seq1 = $aln->get_seq_by_pos(1)->seq();
  $seq2 = $aln->get_seq_by_pos(2)->seq();
}

#create an array for the sequences
push @sequences, $seq1;
push @sequences, $seq2;

#separate whole sequence into single triplets (subroutine is not shown)
foreach (@sequences) {
push @tmpArray, CreateTripletArray($_);
}

#count number of triplets in the array
$count = @tmpArray;

#compare pairwise and substitute if possible (assuming the length of both genes is equal)
foreach $i(0 .. (($count/2)+1)) {

$value1=@tmpArray[$i];
$value2=@tmpArray[($count/2)+$i];
 
  if ($value1 eq $value2) {
        #push equal triplets in new arrays
                push @newArray1, $value1;
                push @newArray2, $value2;
        }
       
   
  else {
        #split codons to compare each base in a triplet
                @zwvalue1 = split(//, $value1);
               @zwvalue2 = split(//, $value2);
                 
                #I want to get back all codons for one amino acid
               
                $aminoacid1 = $cod2aa{$value1};
                $aminoacid2 = $cod2aa{$value2};
               
                push @codon1, $aa2cod{$aminoacid1};
                push @codon2, $aa2cod{$aminoacid2};
               
                #then something like go through all bases of both codons a every position to find the best fitting partners
                if (exists $cod2aa{$value1} and $cod2aa{$value2} ) {
               
                        if (@zwvalue1[0] eq @zwvalue2[0]){
                       
                        push @newArray1, @zwvalue1[0];
                        push @newArray2, @zwvalue2[0];
                        }
                       
                        }
        }
}

Re: Help in Programming a substitution routine

by Torsten Seemann :: Rate this Message:

| View Threaded | Show Only this Message

Markus

> #the complete hash is not shown for the sake of clarity
> my(%cod2aa) = (
>    'TCA' => 'S',    # Serine
>    'TCC' => 'S',    # Serine
>    'TCG' => 'S',    # Serine
>    'TCT' => 'S',    # Serine
>    'TTC' => 'F',    # Phenylalanine
>    'TTT' => 'F',    # Phenylalanine
>    'TTA' => 'L',    # Leucine
> );
> my %aa2cod = reverse %cod2aa;

Your hash %cod2aa is many-to-one - it has many keys (codons) which map
to the same value (amino acid).
When you reverse this hash, it will NOT become one-to-many; it will
become one-to-one.
$aa2cod{'S'} will only return ONE value (which one is random and
depends on how Perl orders them).

Is this what you intended?

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
University, AUSTRALIA

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l