Classifying SNPs

View: New views
13 Messages — Rating Filter:   Alert me  

Classifying SNPs

by Abhishek Pratap :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi All

This might seem to be an old track question. However I was not able to
find a good answer in the many diff mailing list archives.

For all our SNP predictions we would like to know whether they are
synonymous / non-synonymous. If Non-synonymous/Exonic  then find the
position on the gene where amino acid is getting changed and to what
 ...Also info about indels will help.

I am not sure if something like this already exists. If not even some
pointers on how to move forward will help.

Thanks,
-Abhi

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Mark A. Jensen :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hey Abhishek-
You might root around in Bio::PopGen. Here's a script to get stuff from
raw fasta data--see comments within.
cheers
Mark

use Bio::AlignIO;
use Bio::PopGen::Utilities;

$file = "your_raw_file.fas";


my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;
# get the alignment into a Bio::PopGen::Population format, with codons
# as the marker sites
my $pop =
Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln, -site_model=>'cod');
# here are your variable codons...
my @cdnpos = $pop->get_marker_names;
# here are your individuals represented in the alignment
my @inds = $pop->get_Individuals;
# which have names like "Codon-3-9", "Codon-4-12", etc
foreach my $cdn (@cdnpos) {
    # calculate the unique codons represented at this codon position
    my (%ucdns, @ucdns);
    @genos = $pop->get_Genotypes(-marker=>$cdn);
    $ucdns{$_->get_Alleles}++ for @genos;
    @ucdns = sort keys %ucdns;
    #
    # here, use translate or something faster to identify syn/non-syn
    # check out code in Bio::Align::DNAStatistics for various methods

}
# relate back to individuals with this
foreach my $ind (@inds) {
    print "Individual ".$ind->unique_id."\n";
    print "Site\tAllele\n";
    foreach my $cdn (@cdnpos) {
 print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
    }
}


1;

----- Original Message -----
From: "Abhishek Pratap" <abhishek.vit@...>
To: <bioperl-l@...>
Sent: Wednesday, July 08, 2009 10:24 AM
Subject: [Bioperl-l] Classifying SNPs


Hi All

This might seem to be an old track question. However I was not able to
find a good answer in the many diff mailing list archives.

For all our SNP predictions we would like to know whether they are
synonymous / non-synonymous. If Non-synonymous/Exonic then find the
position on the gene where amino acid is getting changed and to what
...Also info about indels will help.

I am not sure if something like this already exists. If not even some
pointers on how to move forward will help.

Thanks,
-Abhi

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l


_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Abhishek Pratap :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Dear Mark
Sorry I was not able to reply earlier. Many Thanks for your detailed
explanation. However this is not exactly what I am looking for. May be my
initial mail was not well articulated or I am not able to infer your reply
fully. My bad.

Well as an input what we have is the just the genomic coordinates for SNP's
predicted by Illumina propriety software CASAVA. What we would like to do is
to further classify these predicted SNP's  . If they fall into Coding region
then whether they are synonymous/non-syn SNPs.

So I guess something which translates
1. SNP genomic coordinate into mRNA offset
2. Then identify the ORF and target codon and check whether the SNP
substitution will be syn/non-syn.

Thanks,
-Abhi

On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj@...> wrote:

> Hey Abhishek-
> You might root around in Bio::PopGen. Here's a script to get stuff from
> raw fasta data--see comments within.
> cheers
> Mark
>
> use Bio::AlignIO;
> use Bio::PopGen::Utilities;
>
> $file = "your_raw_file.fas";
>
>
> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;
> # get the alignment into a Bio::PopGen::Population format, with codons
> # as the marker sites
> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
> -site_model=>'cod');
> # here are your variable codons...
> my @cdnpos = $pop->get_marker_names;
> # here are your individuals represented in the alignment
> my @inds = $pop->get_Individuals;
> # which have names like "Codon-3-9", "Codon-4-12", etc
> foreach my $cdn (@cdnpos) {
>   # calculate the unique codons represented at this codon position
>   my (%ucdns, @ucdns);
>   @genos = $pop->get_Genotypes(-marker=>$cdn);
>   $ucdns{$_->get_Alleles}++ for @genos;
>   @ucdns = sort keys %ucdns;
>   #
>   # here, use translate or something faster to identify syn/non-syn
>   # check out code in Bio::Align::DNAStatistics for various methods
>
> }
> # relate back to individuals with this
> foreach my $ind (@inds) {
>   print "Individual ".$ind->unique_id."\n";
>   print "Site\tAllele\n";
>   foreach my $cdn (@cdnpos) {
> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>   }
> }
>
>
> 1;
>
> ----- Original Message ----- From: "Abhishek Pratap" <
> abhishek.vit@...>
> To: <bioperl-l@...>
> Sent: Wednesday, July 08, 2009 10:24 AM
> Subject: [Bioperl-l] Classifying SNPs
>
>
>
> Hi All
>
> This might seem to be an old track question. However I was not able to
> find a good answer in the many diff mailing list archives.
>
> For all our SNP predictions we would like to know whether they are
> synonymous / non-synonymous. If Non-synonymous/Exonic then find the
> position on the gene where amino acid is getting changed and to what
> ...Also info about indels will help.
>
> I am not sure if something like this already exists. If not even some
> pointers on how to move forward will help.
>
> Thanks,
> -Abhi
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Mark A. Jensen :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks Abhi-- I had a feeling there was more (or "less") to it-- this would be a
nice feature to have, don't think it exists. Will think about it-- cheers
----- Original Message -----
From: "Abhishek Pratap" <abhishek.vit@...>
To: "Mark A. Jensen" <maj@...>
Cc: <bioperl-l@...>
Sent: Monday, July 13, 2009 11:10 AM
Subject: Re: [Bioperl-l] Classifying SNPs


> Dear Mark
> Sorry I was not able to reply earlier. Many Thanks for your detailed
> explanation. However this is not exactly what I am looking for. May be my
> initial mail was not well articulated or I am not able to infer your reply
> fully. My bad.
>
> Well as an input what we have is the just the genomic coordinates for SNP's
> predicted by Illumina propriety software CASAVA. What we would like to do is
> to further classify these predicted SNP's  . If they fall into Coding region
> then whether they are synonymous/non-syn SNPs.
>
> So I guess something which translates
> 1. SNP genomic coordinate into mRNA offset
> 2. Then identify the ORF and target codon and check whether the SNP
> substitution will be syn/non-syn.
>
> Thanks,
> -Abhi
>
> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj@...> wrote:
>
>> Hey Abhishek-
>> You might root around in Bio::PopGen. Here's a script to get stuff from
>> raw fasta data--see comments within.
>> cheers
>> Mark
>>
>> use Bio::AlignIO;
>> use Bio::PopGen::Utilities;
>>
>> $file = "your_raw_file.fas";
>>
>>
>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;
>> # get the alignment into a Bio::PopGen::Population format, with codons
>> # as the marker sites
>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
>> -site_model=>'cod');
>> # here are your variable codons...
>> my @cdnpos = $pop->get_marker_names;
>> # here are your individuals represented in the alignment
>> my @inds = $pop->get_Individuals;
>> # which have names like "Codon-3-9", "Codon-4-12", etc
>> foreach my $cdn (@cdnpos) {
>>   # calculate the unique codons represented at this codon position
>>   my (%ucdns, @ucdns);
>>   @genos = $pop->get_Genotypes(-marker=>$cdn);
>>   $ucdns{$_->get_Alleles}++ for @genos;
>>   @ucdns = sort keys %ucdns;
>>   #
>>   # here, use translate or something faster to identify syn/non-syn
>>   # check out code in Bio::Align::DNAStatistics for various methods
>>
>> }
>> # relate back to individuals with this
>> foreach my $ind (@inds) {
>>   print "Individual ".$ind->unique_id."\n";
>>   print "Site\tAllele\n";
>>   foreach my $cdn (@cdnpos) {
>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>>   }
>> }
>>
>>
>> 1;
>>
>> ----- Original Message ----- From: "Abhishek Pratap" <
>> abhishek.vit@...>
>> To: <bioperl-l@...>
>> Sent: Wednesday, July 08, 2009 10:24 AM
>> Subject: [Bioperl-l] Classifying SNPs
>>
>>
>>
>> Hi All
>>
>> This might seem to be an old track question. However I was not able to
>> find a good answer in the many diff mailing list archives.
>>
>> For all our SNP predictions we would like to know whether they are
>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the
>> position on the gene where amino acid is getting changed and to what
>> ...Also info about indels will help.
>>
>> I am not sure if something like this already exists. If not even some
>> pointers on how to move forward will help.
>>
>> Thanks,
>> -Abhi
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@...
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Chris Fields-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Bio::Coordinate might help with coordinate conversion.  However, much  
of this sounds very Ensembl-like.  Have you looked at the Ensembl perl  
API?  It can do #1 (coordinate conversion), and I'm sure something  
could be written up to do the second.

chris

On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:

> Thanks Abhi-- I had a feeling there was more (or "less") to it--  
> this would be a nice feature to have, don't think it exists. Will  
> think about it-- cheers
> ----- Original Message ----- From: "Abhishek Pratap" <abhishek.vit@...
> >
> To: "Mark A. Jensen" <maj@...>
> Cc: <bioperl-l@...>
> Sent: Monday, July 13, 2009 11:10 AM
> Subject: Re: [Bioperl-l] Classifying SNPs
>
>
>> Dear Mark
>> Sorry I was not able to reply earlier. Many Thanks for your detailed
>> explanation. However this is not exactly what I am looking for. May  
>> be my
>> initial mail was not well articulated or I am not able to infer  
>> your reply
>> fully. My bad.
>>
>> Well as an input what we have is the just the genomic coordinates  
>> for SNP's
>> predicted by Illumina propriety software CASAVA. What we would like  
>> to do is
>> to further classify these predicted SNP's  . If they fall into  
>> Coding region
>> then whether they are synonymous/non-syn SNPs.
>>
>> So I guess something which translates
>> 1. SNP genomic coordinate into mRNA offset
>> 2. Then identify the ORF and target codon and check whether the SNP
>> substitution will be syn/non-syn.
>>
>> Thanks,
>> -Abhi
>>
>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj@...>  
>> wrote:
>>
>>> Hey Abhishek-
>>> You might root around in Bio::PopGen. Here's a script to get stuff  
>>> from
>>> raw fasta data--see comments within.
>>> cheers
>>> Mark
>>>
>>> use Bio::AlignIO;
>>> use Bio::PopGen::Utilities;
>>>
>>> $file = "your_raw_file.fas";
>>>
>>>
>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)-
>>> >next_aln;
>>> # get the alignment into a Bio::PopGen::Population format, with  
>>> codons
>>> # as the marker sites
>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>
>>> $aln,
>>> -site_model=>'cod');
>>> # here are your variable codons...
>>> my @cdnpos = $pop->get_marker_names;
>>> # here are your individuals represented in the alignment
>>> my @inds = $pop->get_Individuals;
>>> # which have names like "Codon-3-9", "Codon-4-12", etc
>>> foreach my $cdn (@cdnpos) {
>>>  # calculate the unique codons represented at this codon position
>>>  my (%ucdns, @ucdns);
>>>  @genos = $pop->get_Genotypes(-marker=>$cdn);
>>>  $ucdns{$_->get_Alleles}++ for @genos;
>>>  @ucdns = sort keys %ucdns;
>>>  #
>>>  # here, use translate or something faster to identify syn/non-syn
>>>  # check out code in Bio::Align::DNAStatistics for various methods
>>>
>>> }
>>> # relate back to individuals with this
>>> foreach my $ind (@inds) {
>>>  print "Individual ".$ind->unique_id."\n";
>>>  print "Site\tAllele\n";
>>>  foreach my $cdn (@cdnpos) {
>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>>>  }
>>> }
>>>
>>>
>>> 1;
>>>
>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>> abhishek.vit@...>
>>> To: <bioperl-l@...>
>>> Sent: Wednesday, July 08, 2009 10:24 AM
>>> Subject: [Bioperl-l] Classifying SNPs
>>>
>>>
>>>
>>> Hi All
>>>
>>> This might seem to be an old track question. However I was not  
>>> able to
>>> find a good answer in the many diff mailing list archives.
>>>
>>> For all our SNP predictions we would like to know whether they are
>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the
>>> position on the gene where amino acid is getting changed and to what
>>> ...Also info about indels will help.
>>>
>>> I am not sure if something like this already exists. If not even  
>>> some
>>> pointers on how to move forward will help.
>>>
>>> Thanks,
>>> -Abhi
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@...
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@...
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Jason Stajich-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ensembl would be best place to go if you are working with human SNPs  
but for those who aren't so data lucky...

Aspects of this also relates to the dn/dS code in the  
Bio::Align::DNAStatistics -- thought it does the classification and  
comparison all at once so you'd have to dig code out.

And the mcdonald_kreitman code in Bio::PopGen::Statistics which  
computes a synonymous or nonsynonymous via lookup table that is stored  
in Bio::MolEvol::CodonModel which compares the edit path which is  
encoded as the two codons concatenated together -- i.ee

use Bio::MolEvol::CodonModel;
  my $codon_path = Bio::MolEvol::CodonModel->codon_path;
  my ($ns, $syn) = $codon_path->{'AATAAC'};
  print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n";


It all kind of depends on how you have the data organized, if it is  
just SNPs and you are trying to figure out if they are syn or non-syn  
then you kind of need a good database to do this since you'll have to  
know what gene they are in, CDS of the gene, etc.  It is possible to  
do with something as basic as GFF3 for your genome and the SNP  
locations and Bio::DB::SeqFeature::Store.  While I can think of a way  
to code it up from those bare-bones - maybe you should report back if  
you can just use the Ensembl classification of the SNPs?

-jason

On Jul 13, 2009, at 9:33 AM, Chris Fields wrote:

> Bio::Coordinate might help with coordinate conversion.  However,  
> much of this sounds very Ensembl-like.  Have you looked at the  
> Ensembl perl API?  It can do #1 (coordinate conversion), and I'm  
> sure something could be written up to do the second.
>
> chris
>
> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:
>
>> Thanks Abhi-- I had a feeling there was more (or "less") to it--  
>> this would be a nice feature to have, don't think it exists. Will  
>> think about it-- cheers
>> ----- Original Message ----- From: "Abhishek Pratap" <abhishek.vit@...
>> >
>> To: "Mark A. Jensen" <maj@...>
>> Cc: <bioperl-l@...>
>> Sent: Monday, July 13, 2009 11:10 AM
>> Subject: Re: [Bioperl-l] Classifying SNPs
>>
>>
>>> Dear Mark
>>> Sorry I was not able to reply earlier. Many Thanks for your detailed
>>> explanation. However this is not exactly what I am looking for.  
>>> May be my
>>> initial mail was not well articulated or I am not able to infer  
>>> your reply
>>> fully. My bad.
>>>
>>> Well as an input what we have is the just the genomic coordinates  
>>> for SNP's
>>> predicted by Illumina propriety software CASAVA. What we would  
>>> like to do is
>>> to further classify these predicted SNP's  . If they fall into  
>>> Coding region
>>> then whether they are synonymous/non-syn SNPs.
>>>
>>> So I guess something which translates
>>> 1. SNP genomic coordinate into mRNA offset
>>> 2. Then identify the ORF and target codon and check whether the SNP
>>> substitution will be syn/non-syn.
>>>
>>> Thanks,
>>> -Abhi
>>>
>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen  
>>> <maj@...> wrote:
>>>
>>>> Hey Abhishek-
>>>> You might root around in Bio::PopGen. Here's a script to get  
>>>> stuff from
>>>> raw fasta data--see comments within.
>>>> cheers
>>>> Mark
>>>>
>>>> use Bio::AlignIO;
>>>> use Bio::PopGen::Utilities;
>>>>
>>>> $file = "your_raw_file.fas";
>>>>
>>>>
>>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)-
>>>> >next_aln;
>>>> # get the alignment into a Bio::PopGen::Population format, with  
>>>> codons
>>>> # as the marker sites
>>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>
>>>> $aln,
>>>> -site_model=>'cod');
>>>> # here are your variable codons...
>>>> my @cdnpos = $pop->get_marker_names;
>>>> # here are your individuals represented in the alignment
>>>> my @inds = $pop->get_Individuals;
>>>> # which have names like "Codon-3-9", "Codon-4-12", etc
>>>> foreach my $cdn (@cdnpos) {
>>>> # calculate the unique codons represented at this codon position
>>>> my (%ucdns, @ucdns);
>>>> @genos = $pop->get_Genotypes(-marker=>$cdn);
>>>> $ucdns{$_->get_Alleles}++ for @genos;
>>>> @ucdns = sort keys %ucdns;
>>>> #
>>>> # here, use translate or something faster to identify syn/non-syn
>>>> # check out code in Bio::Align::DNAStatistics for various methods
>>>>
>>>> }
>>>> # relate back to individuals with this
>>>> foreach my $ind (@inds) {
>>>> print "Individual ".$ind->unique_id."\n";
>>>> print "Site\tAllele\n";
>>>> foreach my $cdn (@cdnpos) {
>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>>>> }
>>>> }
>>>>
>>>>
>>>> 1;
>>>>
>>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>>> abhishek.vit@...>
>>>> To: <bioperl-l@...>
>>>> Sent: Wednesday, July 08, 2009 10:24 AM
>>>> Subject: [Bioperl-l] Classifying SNPs
>>>>
>>>>
>>>>
>>>> Hi All
>>>>
>>>> This might seem to be an old track question. However I was not  
>>>> able to
>>>> find a good answer in the many diff mailing list archives.
>>>>
>>>> For all our SNP predictions we would like to know whether they are
>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the
>>>> position on the gene where amino acid is getting changed and to  
>>>> what
>>>> ...Also info about indels will help.
>>>>
>>>> I am not sure if something like this already exists. If not even  
>>>> some
>>>> pointers on how to move forward will help.
>>>>
>>>> Thanks,
>>>> -Abhi
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@...
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@...
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@...
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason@...

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Chris Fields-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Jul 13, 2009, at 11:54 AM, Jason Stajich wrote:

> Ensembl would be best place to go if you are working with human SNPs  
> but for those who aren't so data lucky...

My mouse bias is showing ;>

chris
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Mark A. Jensen :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I hate meeces to pieces.
----- Original Message -----
From: "Chris Fields" <cjfields@...>
To: "Jason Stajich" <jason@...>
Cc: "BioPerl List" <bioperl-l@...>; "Abhishek Pratap"
<abhishek.vit@...>
Sent: Monday, July 13, 2009 1:02 PM
Subject: Re: [Bioperl-l] Classifying SNPs


> On Jul 13, 2009, at 11:54 AM, Jason Stajich wrote:
>
>> Ensembl would be best place to go if you are working with human SNPs  but for
>> those who aren't so data lucky...
>
> My mouse bias is showing ;>
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Abhishek Pratap :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Jason

Thanks for a detailed insight. I would definitely go the ensembl way first
and try to see if it can do exactly what we want.

 In case it does/'nt I will report back on this same thread. I think having
something like this in the Bioperl will help the NGS community. Lot of
people are predicting SNPs from NGS.oops(next generation sequencing ) data
and looking for ways to better annotate/classify their predictions.

Thanks guys .. It is a pleasure to interact with you all. Just overwhelmed
to see the responses.

best,
-Abhi



On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich <jason@...> wrote:

> Ensembl would be best place to go if you are working with human SNPs but
> for those who aren't so data lucky...
>
> Aspects of this also relates to the dn/dS code in the
> Bio::Align::DNAStatistics -- thought it does the classification and
> comparison all at once so you'd have to dig code out.
>
> And the mcdonald_kreitman code in Bio::PopGen::Statistics which computes a
> synonymous or nonsynonymous via lookup table that is stored in
> Bio::MolEvol::CodonModel which compares the edit path which is encoded as
> the two codons concatenated together -- i.ee
>
> use Bio::MolEvol::CodonModel;
>  my $codon_path = Bio::MolEvol::CodonModel->codon_path;
>  my ($ns, $syn) = $codon_path->{'AATAAC'};
>  print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n";
>
>
> It all kind of depends on how you have the data organized, if it is just
> SNPs and you are trying to figure out if they are syn or non-syn then you
> kind of need a good database to do this since you'll have to know what gene
> they are in, CDS of the gene, etc.  It is possible to do with something as
> basic as GFF3 for your genome and the SNP locations and
> Bio::DB::SeqFeature::Store.  While I can think of a way to code it up from
> those bare-bones - maybe you should report back if you can just use the
> Ensembl classification of the SNPs?
>
> -jason
>
>
> On Jul 13, 2009, at 9:33 AM, Chris Fields wrote:
>
>  Bio::Coordinate might help with coordinate conversion.  However, much of
>> this sounds very Ensembl-like.  Have you looked at the Ensembl perl API?  It
>> can do #1 (coordinate conversion), and I'm sure something could be written
>> up to do the second.
>>
>> chris
>>
>> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:
>>
>>  Thanks Abhi-- I had a feeling there was more (or "less") to it-- this
>>> would be a nice feature to have, don't think it exists. Will think about
>>> it-- cheers
>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>> abhishek.vit@...>
>>> To: "Mark A. Jensen" <maj@...>
>>> Cc: <bioperl-l@...>
>>> Sent: Monday, July 13, 2009 11:10 AM
>>> Subject: Re: [Bioperl-l] Classifying SNPs
>>>
>>>
>>>  Dear Mark
>>>> Sorry I was not able to reply earlier. Many Thanks for your detailed
>>>> explanation. However this is not exactly what I am looking for. May be
>>>> my
>>>> initial mail was not well articulated or I am not able to infer your
>>>> reply
>>>> fully. My bad.
>>>>
>>>> Well as an input what we have is the just the genomic coordinates for
>>>> SNP's
>>>> predicted by Illumina propriety software CASAVA. What we would like to
>>>> do is
>>>> to further classify these predicted SNP's  . If they fall into Coding
>>>> region
>>>> then whether they are synonymous/non-syn SNPs.
>>>>
>>>> So I guess something which translates
>>>> 1. SNP genomic coordinate into mRNA offset
>>>> 2. Then identify the ORF and target codon and check whether the SNP
>>>> substitution will be syn/non-syn.
>>>>
>>>> Thanks,
>>>> -Abhi
>>>>
>>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj@...>
>>>> wrote:
>>>>
>>>>  Hey Abhishek-
>>>>> You might root around in Bio::PopGen. Here's a script to get stuff from
>>>>> raw fasta data--see comments within.
>>>>> cheers
>>>>> Mark
>>>>>
>>>>> use Bio::AlignIO;
>>>>> use Bio::PopGen::Utilities;
>>>>>
>>>>> $file = "your_raw_file.fas";
>>>>>
>>>>>
>>>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;
>>>>> # get the alignment into a Bio::PopGen::Population format, with codons
>>>>> # as the marker sites
>>>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
>>>>> -site_model=>'cod');
>>>>> # here are your variable codons...
>>>>> my @cdnpos = $pop->get_marker_names;
>>>>> # here are your individuals represented in the alignment
>>>>> my @inds = $pop->get_Individuals;
>>>>> # which have names like "Codon-3-9", "Codon-4-12", etc
>>>>> foreach my $cdn (@cdnpos) {
>>>>> # calculate the unique codons represented at this codon position
>>>>> my (%ucdns, @ucdns);
>>>>> @genos = $pop->get_Genotypes(-marker=>$cdn);
>>>>> $ucdns{$_->get_Alleles}++ for @genos;
>>>>> @ucdns = sort keys %ucdns;
>>>>> #
>>>>> # here, use translate or something faster to identify syn/non-syn
>>>>> # check out code in Bio::Align::DNAStatistics for various methods
>>>>>
>>>>> }
>>>>> # relate back to individuals with this
>>>>> foreach my $ind (@inds) {
>>>>> print "Individual ".$ind->unique_id."\n";
>>>>> print "Site\tAllele\n";
>>>>> foreach my $cdn (@cdnpos) {
>>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>>>>> }
>>>>> }
>>>>>
>>>>>
>>>>> 1;
>>>>>
>>>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>>>> abhishek.vit@...>
>>>>> To: <bioperl-l@...>
>>>>> Sent: Wednesday, July 08, 2009 10:24 AM
>>>>> Subject: [Bioperl-l] Classifying SNPs
>>>>>
>>>>>
>>>>>
>>>>> Hi All
>>>>>
>>>>> This might seem to be an old track question. However I was not able to
>>>>> find a good answer in the many diff mailing list archives.
>>>>>
>>>>> For all our SNP predictions we would like to know whether they are
>>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the
>>>>> position on the gene where amino acid is getting changed and to what
>>>>> ...Also info about indels will help.
>>>>>
>>>>> I am not sure if something like this already exists. If not even some
>>>>> pointers on how to move forward will help.
>>>>>
>>>>> Thanks,
>>>>> -Abhi
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l@...
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>>>  _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@...
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@...
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@...
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --
> Jason Stajich
> jason@...
>
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Jason Stajich-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Having a lightweight system for this could be helpful if it isn't  
replicated somewhere else.  I don't think the NGS is really changing  
per se -- really it is just that more people have SNPs called in more  
research systems.

It would be a simple little project I think coding up some basics if  
you think you'd actually take it over and run with it and/or push it  
into a semi-organized set of scripts?

-jason
On Jul 13, 2009, at 10:13 AM, Abhishek Pratap wrote:

> Hi Jason
>
> Thanks for a detailed insight. I would definitely go the ensembl way  
> first
> and try to see if it can do exactly what we want.
>
> In case it does/'nt I will report back on this same thread. I think  
> having
> something like this in the Bioperl will help the NGS community. Lot of
> people are predicting SNPs from NGS.oops(next generation  
> sequencing ) data
> and looking for ways to better annotate/classify their predictions.
>
> Thanks guys .. It is a pleasure to interact with you all. Just  
> overwhelmed
> to see the responses.
>
> best,
> -Abhi
>
>
>
> On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich <jason@...>  
> wrote:
>
>> Ensembl would be best place to go if you are working with human  
>> SNPs but
>> for those who aren't so data lucky...
>>
>> Aspects of this also relates to the dn/dS code in the
>> Bio::Align::DNAStatistics -- thought it does the classification and
>> comparison all at once so you'd have to dig code out.
>>
>> And the mcdonald_kreitman code in Bio::PopGen::Statistics which  
>> computes a
>> synonymous or nonsynonymous via lookup table that is stored in
>> Bio::MolEvol::CodonModel which compares the edit path which is  
>> encoded as
>> the two codons concatenated together -- i.ee
>>
>> use Bio::MolEvol::CodonModel;
>> my $codon_path = Bio::MolEvol::CodonModel->codon_path;
>> my ($ns, $syn) = $codon_path->{'AATAAC'};
>> print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n";
>>
>>
>> It all kind of depends on how you have the data organized, if it is  
>> just
>> SNPs and you are trying to figure out if they are syn or non-syn  
>> then you
>> kind of need a good database to do this since you'll have to know  
>> what gene
>> they are in, CDS of the gene, etc.  It is possible to do with  
>> something as
>> basic as GFF3 for your genome and the SNP locations and
>> Bio::DB::SeqFeature::Store.  While I can think of a way to code it  
>> up from
>> those bare-bones - maybe you should report back if you can just use  
>> the
>> Ensembl classification of the SNPs?
>>
>> -jason
>>
>>
>> On Jul 13, 2009, at 9:33 AM, Chris Fields wrote:
>>
>> Bio::Coordinate might help with coordinate conversion.  However,  
>> much of
>>> this sounds very Ensembl-like.  Have you looked at the Ensembl  
>>> perl API?  It
>>> can do #1 (coordinate conversion), and I'm sure something could be  
>>> written
>>> up to do the second.
>>>
>>> chris
>>>
>>> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:
>>>
>>> Thanks Abhi-- I had a feeling there was more (or "less") to it--  
>>> this
>>>> would be a nice feature to have, don't think it exists. Will  
>>>> think about
>>>> it-- cheers
>>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>>> abhishek.vit@...>
>>>> To: "Mark A. Jensen" <maj@...>
>>>> Cc: <bioperl-l@...>
>>>> Sent: Monday, July 13, 2009 11:10 AM
>>>> Subject: Re: [Bioperl-l] Classifying SNPs
>>>>
>>>>
>>>> Dear Mark
>>>>> Sorry I was not able to reply earlier. Many Thanks for your  
>>>>> detailed
>>>>> explanation. However this is not exactly what I am looking for.  
>>>>> May be
>>>>> my
>>>>> initial mail was not well articulated or I am not able to infer  
>>>>> your
>>>>> reply
>>>>> fully. My bad.
>>>>>
>>>>> Well as an input what we have is the just the genomic  
>>>>> coordinates for
>>>>> SNP's
>>>>> predicted by Illumina propriety software CASAVA. What we would  
>>>>> like to
>>>>> do is
>>>>> to further classify these predicted SNP's  . If they fall into  
>>>>> Coding
>>>>> region
>>>>> then whether they are synonymous/non-syn SNPs.
>>>>>
>>>>> So I guess something which translates
>>>>> 1. SNP genomic coordinate into mRNA offset
>>>>> 2. Then identify the ORF and target codon and check whether the  
>>>>> SNP
>>>>> substitution will be syn/non-syn.
>>>>>
>>>>> Thanks,
>>>>> -Abhi
>>>>>
>>>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen  
>>>>> <maj@...>
>>>>> wrote:
>>>>>
>>>>> Hey Abhishek-
>>>>>> You might root around in Bio::PopGen. Here's a script to get  
>>>>>> stuff from
>>>>>> raw fasta data--see comments within.
>>>>>> cheers
>>>>>> Mark
>>>>>>
>>>>>> use Bio::AlignIO;
>>>>>> use Bio::PopGen::Utilities;
>>>>>>
>>>>>> $file = "your_raw_file.fas";
>>>>>>
>>>>>>
>>>>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)-
>>>>>> >next_aln;
>>>>>> # get the alignment into a Bio::PopGen::Population format, with  
>>>>>> codons
>>>>>> # as the marker sites
>>>>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>
>>>>>> $aln,
>>>>>> -site_model=>'cod');
>>>>>> # here are your variable codons...
>>>>>> my @cdnpos = $pop->get_marker_names;
>>>>>> # here are your individuals represented in the alignment
>>>>>> my @inds = $pop->get_Individuals;
>>>>>> # which have names like "Codon-3-9", "Codon-4-12", etc
>>>>>> foreach my $cdn (@cdnpos) {
>>>>>> # calculate the unique codons represented at this codon position
>>>>>> my (%ucdns, @ucdns);
>>>>>> @genos = $pop->get_Genotypes(-marker=>$cdn);
>>>>>> $ucdns{$_->get_Alleles}++ for @genos;
>>>>>> @ucdns = sort keys %ucdns;
>>>>>> #
>>>>>> # here, use translate or something faster to identify syn/non-syn
>>>>>> # check out code in Bio::Align::DNAStatistics for various methods
>>>>>>
>>>>>> }
>>>>>> # relate back to individuals with this
>>>>>> foreach my $ind (@inds) {
>>>>>> print "Individual ".$ind->unique_id."\n";
>>>>>> print "Site\tAllele\n";
>>>>>> foreach my $cdn (@cdnpos) {
>>>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>>>>>> }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> 1;
>>>>>>
>>>>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>>>>> abhishek.vit@...>
>>>>>> To: <bioperl-l@...>
>>>>>> Sent: Wednesday, July 08, 2009 10:24 AM
>>>>>> Subject: [Bioperl-l] Classifying SNPs
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi All
>>>>>>
>>>>>> This might seem to be an old track question. However I was not  
>>>>>> able to
>>>>>> find a good answer in the many diff mailing list archives.
>>>>>>
>>>>>> For all our SNP predictions we would like to know whether they  
>>>>>> are
>>>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find  
>>>>>> the
>>>>>> position on the gene where amino acid is getting changed and to  
>>>>>> what
>>>>>> ...Also info about indels will help.
>>>>>>
>>>>>> I am not sure if something like this already exists. If not  
>>>>>> even some
>>>>>> pointers on how to move forward will help.
>>>>>>
>>>>>> Thanks,
>>>>>> -Abhi
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l@...
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l@...
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@...
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@...
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> Jason Stajich
>> jason@...
>>
>>

--
Jason Stajich
jason@...

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Parent Message unknown Re: Classifying SNPs

by Pablo Marin-Garcia :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Hello Abhishek

Ensembl has a module for calculate SNP consequences in a transcript.

The script that they use to create their consequences is located in:

ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl

The important bit is to  convert your snp
coordenates and the variation_allele into a ConsequenceType object

  $consequence_type =
Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles);

and pass this and a transcript to the  type_variation
Bio::EnsEMBL::Utils::TranscriptAlleles exported method

   $consequences = type_variation($tr, "", $consequence_type);

in the module

ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm

The other important bit in this script is that now the functional_genomics
consequences are calculated in this script instead in the type_variation()

The only drawback is that it return only the ensembl classes of
consequences , but you can extend that later if you need more specific
consequences (I have done that in the past for different projects).

This ensembl aproach will save you a lot of problems with the mapping from
gene to protein and with multiple snps in a codon.

If you have experience with ensembl then is easy to follow the code. If
not you can always ask for help in the ensembl-dev mailing list
(ensembl-dev@...)


If you want to read the code without checking out the whole api:


http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup
http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log


hope this helps


   - Pablo




Abhishek Pratap <abhishek.vit <at> gmail.com> writes:

>
> Hi Jason
>
> Thanks for a detailed insight. I would definitely go the ensembl way
first
> and try to see if it can do exactly what we want.
>
>  In case it does/'nt I will report back on this same thread. I think
having
> something like this in the Bioperl will help the NGS community. Lot of
> people are predicting SNPs from NGS.oops(next generation sequencing )
data
> and looking for ways to better annotate/classify their predictions.
>
> Thanks guys .. It is a pleasure to interact with you all. Just
overwhelmed
> to see the responses.
>
> best,
> -Abhi
>
> On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich <jason <at> bioperl.org>
wrote:
>
> > Ensembl would be best place to go if you are working with human SNPs
but
> > for those who aren't so data lucky...
> >
> > Aspects of this also relates to the dn/dS code in the
> > Bio::Align::DNAStatistics -- thought it does the classification and
> > comparison all at once so you'd have to dig code out.
> >
> > And the mcdonald_kreitman code in Bio::PopGen::Statistics which
computes a
> > synonymous or nonsynonymous via lookup table that is stored in
> > Bio::MolEvol::CodonModel which compares the edit path which is encoded
as
> > the two codons concatenated together -- i.ee
> >
> > use Bio::MolEvol::CodonModel;
> >  my $codon_path = Bio::MolEvol::CodonModel->codon_path;
> >  my ($ns, $syn) = $codon_path->{'AATAAC'};
> >  print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n";
> >
> >
> > It all kind of depends on how you have the data organized, if it is
just
> > SNPs and you are trying to figure out if they are syn or non-syn then
you
> > kind of need a good database to do this since you'll have to know what
gene
> > they are in, CDS of the gene, etc.  It is possible to do with
something as
> > basic as GFF3 for your genome and the SNP locations and
> > Bio::DB::SeqFeature::Store.  While I can think of a way to code it up
from
> > those bare-bones - maybe you should report back if you can just use
the
> > Ensembl classification of the SNPs?
> >
> > -jason
> >
> >
> > On Jul 13, 2009, at 9:33 AM, Chris Fields wrote:
> >
> >  Bio::Coordinate might help with coordinate conversion.  However, much
of
> >> this sounds very Ensembl-like.  Have you looked at the Ensembl perl
API?  It
> >> can do #1 (coordinate conversion), and I'm sure something could be
written
> >> up to do the second.
> >>
> >> chris
> >>
> >> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:
> >>
> >>  Thanks Abhi-- I had a feeling there was more (or "less") to it--
this
> >>> would be a nice feature to have, don't think it exists. Will think
about

> >>> it-- cheers
> >>> ----- Original Message ----- From: "Abhishek Pratap" <
> >>> abhishek.vit <at> gmail.com>
> >>> To: "Mark A. Jensen" <maj <at> fortinbras.us>
> >>> Cc: <bioperl-l <at> lists.open-bio.org>
> >>> Sent: Monday, July 13, 2009 11:10 AM
> >>> Subject: Re: [Bioperl-l] Classifying SNPs
> >>>
> >>>
> >>>  Dear Mark
> >>>> Sorry I was not able to reply earlier. Many Thanks for your
detailed
> >>>> explanation. However this is not exactly what I am looking for. May
be
> >>>> my
> >>>> initial mail was not well articulated or I am not able to infer
your
> >>>> reply
> >>>> fully. My bad.
> >>>>
> >>>> Well as an input what we have is the just the genomic coordinates
for
> >>>> SNP's
> >>>> predicted by Illumina propriety software CASAVA. What we would like
to
> >>>> do is
> >>>> to further classify these predicted SNP's  . If they fall into
Coding

> >>>> region
> >>>> then whether they are synonymous/non-syn SNPs.
> >>>>
> >>>> So I guess something which translates
> >>>> 1. SNP genomic coordinate into mRNA offset
> >>>> 2. Then identify the ORF and target codon and check whether the SNP
> >>>> substitution will be syn/non-syn.
> >>>>
> >>>> Thanks,
> >>>> -Abhi
> >>>>
> >>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj <at>
fortinbras.us>
> >>>> wrote:
> >>>>
> >>>>  Hey Abhishek-
> >>>>> You might root around in Bio::PopGen. Here's a script to get stuff
from

> >>>>> raw fasta data--see comments within.
> >>>>> cheers
> >>>>> Mark
> >>>>>
> >>>>> use Bio::AlignIO;
> >>>>> use Bio::PopGen::Utilities;
> >>>>>
> >>>>> $file = "your_raw_file.fas";
> >>>>>
> >>>>>
> >>>>> my $aln = Bio::AlignIO->new(-format=>'fasta',
-file=>$file)->next_aln;
> >>>>> # get the alignment into a Bio::PopGen::Population format, with
codons
> >>>>> # as the marker sites
> >>>>> my $pop =
Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,

> >>>>> -site_model=>'cod');
> >>>>> # here are your variable codons...
> >>>>> my @cdnpos = $pop->get_marker_names;
> >>>>> # here are your individuals represented in the alignment
> >>>>> my @inds = $pop->get_Individuals;
> >>>>> # which have names like "Codon-3-9", "Codon-4-12", etc
> >>>>> foreach my $cdn (@cdnpos) {
> >>>>> # calculate the unique codons represented at this codon position
> >>>>> my (%ucdns, @ucdns);
> >>>>> @genos = $pop->get_Genotypes(-marker=>$cdn);
> >>>>> $ucdns{$_->get_Alleles}++ for @genos;
> >>>>> @ucdns = sort keys %ucdns;
> >>>>> #
> >>>>> # here, use translate or something faster to identify syn/non-syn
> >>>>> # check out code in Bio::Align::DNAStatistics for various methods
> >>>>>
> >>>>> }
> >>>>> # relate back to individuals with this
> >>>>> foreach my $ind (@inds) {
> >>>>> print "Individual ".$ind->unique_id."\n";
> >>>>> print "Site\tAllele\n";
> >>>>> foreach my $cdn (@cdnpos) {
> >>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
> >>>>> }
> >>>>> }
> >>>>>
> >>>>>
> >>>>> 1;
> >>>>>
> >>>>> ----- Original Message ----- From: "Abhishek Pratap" <
> >>>>> abhishek.vit <at> gmail.com>
> >>>>> To: <bioperl-l <at> lists.open-bio.org>
> >>>>> Sent: Wednesday, July 08, 2009 10:24 AM
> >>>>> Subject: [Bioperl-l] Classifying SNPs
> >>>>>
> >>>>>
> >>>>>
> >>>>> Hi All
> >>>>>
> >>>>> This might seem to be an old track question. However I was not
able to
> >>>>> find a good answer in the many diff mailing list archives.
> >>>>>
> >>>>> For all our SNP predictions we would like to know whether they are
> >>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find
the
> >>>>> position on the gene where amino acid is getting changed and to
what
> >>>>> ...Also info about indels will help.
> >>>>>
> >>>>> I am not sure if something like this already exists. If not even
some

> >>>>> pointers on how to move forward will help.
> >>>>>
> >>>>> Thanks,
> >>>>> -Abhi
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l <at> lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>>
> >>>>>
> >>>>>  _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l <at> lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l <at> lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l <at> lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> > --
> > Jason Stajich
> > jason <at> bioperl.org
> >
> >


=====================================================================
                      Pablo Marin-Garcia, PhD

                     \\//          (Argiope bruennichi
                \/\/`(||>O:'\/\/   with stabilimentum)
                     //\\

Sanger Institute                |  PostDoc / Computer Biologist
Wellcome Trust Genome Campus    |  team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH     |  room : N333
United Kingdom                  |  email: pablo.marin@...
====================================================================










--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Pablo Marin-Garcia :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


fixing a typo and explaining a gotcha

On Tue, 14 Jul 2009, Pablo Marin-Garcia wrote:

>
> Hello Abhishek
>
> Ensembl has a module for calculate SNP consequences in a transcript.
>
> The script that they use to create their consequences is located in:
>
> ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl
>
> The important bit is to  convert your snp coordenates and the
> variation_allele into a ConsequenceType object
>
> $consequence_type =
> Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles);
>

fixing typo: (instead $chr it would be a $variation_id)

Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$var_id,$var_start,$var_end,$var_strand,\@alleles);

warning:

The transcript_id and the variation_id are not important if you are not
building a ensembl database.

BUT the gotcha part is that the start and end of the variation should
refer to the same slice start than the transcript used in the next step
(type_variation). Be careful because depending how you select the gene or slice to retrieve
your transcripts your transcript start and end would be the chromosome
coordinates or a relative start/end from the slice start.

You should work with chr positions for the variations and the transcripts
(where start/end == seq_region_start/seq_region_end) to avoid problems.

> and pass this and a transcript to the  type_variation
> Bio::EnsEMBL::Utils::TranscriptAlleles exported method
>
>  $consequences = type_variation($tr, $gene, $consequence_type);
>

The $gene is optional


> in the module
>
> ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm
>
> The other important bit in this script is that now the functional_genomics
> consequences are calculated in this script instead in the type_variation()
>
> The only drawback is that it return only the ensembl classes of consequences
> , but you can extend that later if you need more specific consequences (I
> have done that in the past for different projects).
>
> This ensembl aproach will save you a lot of problems with the mapping from
> gene to protein and with multiple snps in a codon.
>
> If you have experience with ensembl then is easy to follow the code. If not
> you can always ask for help in the ensembl-dev mailing list
> (ensembl-dev@...)
>
>
> If you want to read the code without checking out the whole api:
>
>
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log
>
>
> hope this helps
>
>
>  - Pablo
>
>
>
>

=====================================================================
                      Pablo Marin-Garcia, PhD

                     \\//          (Argiope bruennichi
                \/\/`(||>O:'\/\/   with stabilimentum)
                     //\\

Sanger Institute                |  PostDoc / Computer Biologist
Wellcome Trust Genome Campus    |  team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH     |  room : N333
United Kingdom                  |  email: pablo.marin@...
====================================================================










--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Classifying SNPs

by Abhishek Pratap :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Pablo
Many thanks for for your reply. I am currently attending a workshop so might
not get time to check out your suggestions. Once I am back I will get back
to you in case I have any questions.

Thanks,
-Abhi

On Tue, Jul 14, 2009 at 7:57 PM, Pablo Marin-Garcia <pg4@...>wrote:

>
> fixing a typo and explaining a gotcha
>
> On Tue, 14 Jul 2009, Pablo Marin-Garcia wrote:
>
>
>> Hello Abhishek
>>
>> Ensembl has a module for calculate SNP consequences in a transcript.
>>
>> The script that they use to create their consequences is located in:
>>
>>
>> ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl
>>
>> The important bit is to  convert your snp coordenates and the
>> variation_allele into a ConsequenceType object
>>
>> $consequence_type =
>> Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles);
>>
>>
> fixing typo: (instead $chr it would be a $variation_id)
>
>
> Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$var_id,$var_start,$var_end,$var_strand,\@alleles);
>
> warning:
>
> The transcript_id and the variation_id are not important if you are not
> building a ensembl database.
>
> BUT the gotcha part is that the start and end of the variation should refer
> to the same slice start than the transcript used in the next step
> (type_variation). Be careful because depending how you select the gene or
> slice to retrieve your transcripts your transcript start and end would be
> the chromosome coordinates or a relative start/end from the slice start.
>
> You should work with chr positions for the variations and the transcripts
> (where start/end == seq_region_start/seq_region_end) to avoid problems.
>
>  and pass this and a transcript to the  type_variation
>> Bio::EnsEMBL::Utils::TranscriptAlleles exported method
>>
>>  $consequences = type_variation($tr, $gene, $consequence_type);
>>
>>
> The $gene is optional
>
>
>  in the module
>>
>> ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm
>>
>> The other important bit in this script is that now the functional_genomics
>> consequences are calculated in this script instead in the type_variation()
>>
>> The only drawback is that it return only the ensembl classes of
>> consequences , but you can extend that later if you need more specific
>> consequences (I have done that in the past for different projects).
>>
>> This ensembl aproach will save you a lot of problems with the mapping from
>> gene to protein and with multiple snps in a codon.
>>
>> If you have experience with ensembl then is easy to follow the code. If
>> not you can always ask for help in the ensembl-dev mailing list (
>> ensembl-dev@...)
>>
>>
>> If you want to read the code without checking out the whole api:
>>
>>
>>
>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup
>>
>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log
>>
>>
>> hope this helps
>>
>>
>>  - Pablo
>>
>>
>>
>>
>>
> =====================================================================
>                     Pablo Marin-Garcia, PhD
>
>                    \\//          (Argiope bruennichi
>               \/\/`(||>O:'\/\/   with stabilimentum)
>                    //\\
>
> Sanger Institute                |  PostDoc / Computer Biologist
> Wellcome Trust Genome Campus    |  team : 128/108 (Human Genetics)
> Hinxton, Cambridge CB10 1HH     |  room : N333
> United Kingdom                  |  email: pablo.marin@...
> ====================================================================
>
>
>
>
>
>
>
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
> a charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE. _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l