|
View:
New views
13 Messages
—
Rating Filter:
Alert me
|
|
|
Classifying SNPsHi All
This might seem to be an old track question. However I was not able to find a good answer in the many diff mailing list archives. For all our SNP predictions we would like to know whether they are synonymous / non-synonymous. If Non-synonymous/Exonic then find the position on the gene where amino acid is getting changed and to what ...Also info about indels will help. I am not sure if something like this already exists. If not even some pointers on how to move forward will help. Thanks, -Abhi _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsHey Abhishek-
You might root around in Bio::PopGen. Here's a script to get stuff from raw fasta data--see comments within. cheers Mark use Bio::AlignIO; use Bio::PopGen::Utilities; $file = "your_raw_file.fas"; my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln; # get the alignment into a Bio::PopGen::Population format, with codons # as the marker sites my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln, -site_model=>'cod'); # here are your variable codons... my @cdnpos = $pop->get_marker_names; # here are your individuals represented in the alignment my @inds = $pop->get_Individuals; # which have names like "Codon-3-9", "Codon-4-12", etc foreach my $cdn (@cdnpos) { # calculate the unique codons represented at this codon position my (%ucdns, @ucdns); @genos = $pop->get_Genotypes(-marker=>$cdn); $ucdns{$_->get_Alleles}++ for @genos; @ucdns = sort keys %ucdns; # # here, use translate or something faster to identify syn/non-syn # check out code in Bio::Align::DNAStatistics for various methods } # relate back to individuals with this foreach my $ind (@inds) { print "Individual ".$ind->unique_id."\n"; print "Site\tAllele\n"; foreach my $cdn (@cdnpos) { print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n"; } } 1; ----- Original Message ----- From: "Abhishek Pratap" <abhishek.vit@...> To: <bioperl-l@...> Sent: Wednesday, July 08, 2009 10:24 AM Subject: [Bioperl-l] Classifying SNPs Hi All This might seem to be an old track question. However I was not able to find a good answer in the many diff mailing list archives. For all our SNP predictions we would like to know whether they are synonymous / non-synonymous. If Non-synonymous/Exonic then find the position on the gene where amino acid is getting changed and to what ...Also info about indels will help. I am not sure if something like this already exists. If not even some pointers on how to move forward will help. Thanks, -Abhi _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsDear Mark
Sorry I was not able to reply earlier. Many Thanks for your detailed explanation. However this is not exactly what I am looking for. May be my initial mail was not well articulated or I am not able to infer your reply fully. My bad. Well as an input what we have is the just the genomic coordinates for SNP's predicted by Illumina propriety software CASAVA. What we would like to do is to further classify these predicted SNP's . If they fall into Coding region then whether they are synonymous/non-syn SNPs. So I guess something which translates 1. SNP genomic coordinate into mRNA offset 2. Then identify the ORF and target codon and check whether the SNP substitution will be syn/non-syn. Thanks, -Abhi On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj@...> wrote: > Hey Abhishek- > You might root around in Bio::PopGen. Here's a script to get stuff from > raw fasta data--see comments within. > cheers > Mark > > use Bio::AlignIO; > use Bio::PopGen::Utilities; > > $file = "your_raw_file.fas"; > > > my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln; > # get the alignment into a Bio::PopGen::Population format, with codons > # as the marker sites > my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln, > -site_model=>'cod'); > # here are your variable codons... > my @cdnpos = $pop->get_marker_names; > # here are your individuals represented in the alignment > my @inds = $pop->get_Individuals; > # which have names like "Codon-3-9", "Codon-4-12", etc > foreach my $cdn (@cdnpos) { > # calculate the unique codons represented at this codon position > my (%ucdns, @ucdns); > @genos = $pop->get_Genotypes(-marker=>$cdn); > $ucdns{$_->get_Alleles}++ for @genos; > @ucdns = sort keys %ucdns; > # > # here, use translate or something faster to identify syn/non-syn > # check out code in Bio::Align::DNAStatistics for various methods > > } > # relate back to individuals with this > foreach my $ind (@inds) { > print "Individual ".$ind->unique_id."\n"; > print "Site\tAllele\n"; > foreach my $cdn (@cdnpos) { > print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n"; > } > } > > > 1; > > ----- Original Message ----- From: "Abhishek Pratap" < > abhishek.vit@...> > To: <bioperl-l@...> > Sent: Wednesday, July 08, 2009 10:24 AM > Subject: [Bioperl-l] Classifying SNPs > > > > Hi All > > This might seem to be an old track question. However I was not able to > find a good answer in the many diff mailing list archives. > > For all our SNP predictions we would like to know whether they are > synonymous / non-synonymous. If Non-synonymous/Exonic then find the > position on the gene where amino acid is getting changed and to what > ...Also info about indels will help. > > I am not sure if something like this already exists. If not even some > pointers on how to move forward will help. > > Thanks, > -Abhi > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsThanks Abhi-- I had a feeling there was more (or "less") to it-- this would be a
nice feature to have, don't think it exists. Will think about it-- cheers ----- Original Message ----- From: "Abhishek Pratap" <abhishek.vit@...> To: "Mark A. Jensen" <maj@...> Cc: <bioperl-l@...> Sent: Monday, July 13, 2009 11:10 AM Subject: Re: [Bioperl-l] Classifying SNPs > Dear Mark > Sorry I was not able to reply earlier. Many Thanks for your detailed > explanation. However this is not exactly what I am looking for. May be my > initial mail was not well articulated or I am not able to infer your reply > fully. My bad. > > Well as an input what we have is the just the genomic coordinates for SNP's > predicted by Illumina propriety software CASAVA. What we would like to do is > to further classify these predicted SNP's . If they fall into Coding region > then whether they are synonymous/non-syn SNPs. > > So I guess something which translates > 1. SNP genomic coordinate into mRNA offset > 2. Then identify the ORF and target codon and check whether the SNP > substitution will be syn/non-syn. > > Thanks, > -Abhi > > On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj@...> wrote: > >> Hey Abhishek- >> You might root around in Bio::PopGen. Here's a script to get stuff from >> raw fasta data--see comments within. >> cheers >> Mark >> >> use Bio::AlignIO; >> use Bio::PopGen::Utilities; >> >> $file = "your_raw_file.fas"; >> >> >> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln; >> # get the alignment into a Bio::PopGen::Population format, with codons >> # as the marker sites >> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln, >> -site_model=>'cod'); >> # here are your variable codons... >> my @cdnpos = $pop->get_marker_names; >> # here are your individuals represented in the alignment >> my @inds = $pop->get_Individuals; >> # which have names like "Codon-3-9", "Codon-4-12", etc >> foreach my $cdn (@cdnpos) { >> # calculate the unique codons represented at this codon position >> my (%ucdns, @ucdns); >> @genos = $pop->get_Genotypes(-marker=>$cdn); >> $ucdns{$_->get_Alleles}++ for @genos; >> @ucdns = sort keys %ucdns; >> # >> # here, use translate or something faster to identify syn/non-syn >> # check out code in Bio::Align::DNAStatistics for various methods >> >> } >> # relate back to individuals with this >> foreach my $ind (@inds) { >> print "Individual ".$ind->unique_id."\n"; >> print "Site\tAllele\n"; >> foreach my $cdn (@cdnpos) { >> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n"; >> } >> } >> >> >> 1; >> >> ----- Original Message ----- From: "Abhishek Pratap" < >> abhishek.vit@...> >> To: <bioperl-l@...> >> Sent: Wednesday, July 08, 2009 10:24 AM >> Subject: [Bioperl-l] Classifying SNPs >> >> >> >> Hi All >> >> This might seem to be an old track question. However I was not able to >> find a good answer in the many diff mailing list archives. >> >> For all our SNP predictions we would like to know whether they are >> synonymous / non-synonymous. If Non-synonymous/Exonic then find the >> position on the gene where amino acid is getting changed and to what >> ...Also info about indels will help. >> >> I am not sure if something like this already exists. If not even some >> pointers on how to move forward will help. >> >> Thanks, >> -Abhi >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@... >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsBio::Coordinate might help with coordinate conversion. However, much
of this sounds very Ensembl-like. Have you looked at the Ensembl perl API? It can do #1 (coordinate conversion), and I'm sure something could be written up to do the second. chris On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote: > Thanks Abhi-- I had a feeling there was more (or "less") to it-- > this would be a nice feature to have, don't think it exists. Will > think about it-- cheers > ----- Original Message ----- From: "Abhishek Pratap" <abhishek.vit@... > > > To: "Mark A. Jensen" <maj@...> > Cc: <bioperl-l@...> > Sent: Monday, July 13, 2009 11:10 AM > Subject: Re: [Bioperl-l] Classifying SNPs > > >> Dear Mark >> Sorry I was not able to reply earlier. Many Thanks for your detailed >> explanation. However this is not exactly what I am looking for. May >> be my >> initial mail was not well articulated or I am not able to infer >> your reply >> fully. My bad. >> >> Well as an input what we have is the just the genomic coordinates >> for SNP's >> predicted by Illumina propriety software CASAVA. What we would like >> to do is >> to further classify these predicted SNP's . If they fall into >> Coding region >> then whether they are synonymous/non-syn SNPs. >> >> So I guess something which translates >> 1. SNP genomic coordinate into mRNA offset >> 2. Then identify the ORF and target codon and check whether the SNP >> substitution will be syn/non-syn. >> >> Thanks, >> -Abhi >> >> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj@...> >> wrote: >> >>> Hey Abhishek- >>> You might root around in Bio::PopGen. Here's a script to get stuff >>> from >>> raw fasta data--see comments within. >>> cheers >>> Mark >>> >>> use Bio::AlignIO; >>> use Bio::PopGen::Utilities; >>> >>> $file = "your_raw_file.fas"; >>> >>> >>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)- >>> >next_aln; >>> # get the alignment into a Bio::PopGen::Population format, with >>> codons >>> # as the marker sites >>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=> >>> $aln, >>> -site_model=>'cod'); >>> # here are your variable codons... >>> my @cdnpos = $pop->get_marker_names; >>> # here are your individuals represented in the alignment >>> my @inds = $pop->get_Individuals; >>> # which have names like "Codon-3-9", "Codon-4-12", etc >>> foreach my $cdn (@cdnpos) { >>> # calculate the unique codons represented at this codon position >>> my (%ucdns, @ucdns); >>> @genos = $pop->get_Genotypes(-marker=>$cdn); >>> $ucdns{$_->get_Alleles}++ for @genos; >>> @ucdns = sort keys %ucdns; >>> # >>> # here, use translate or something faster to identify syn/non-syn >>> # check out code in Bio::Align::DNAStatistics for various methods >>> >>> } >>> # relate back to individuals with this >>> foreach my $ind (@inds) { >>> print "Individual ".$ind->unique_id."\n"; >>> print "Site\tAllele\n"; >>> foreach my $cdn (@cdnpos) { >>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n"; >>> } >>> } >>> >>> >>> 1; >>> >>> ----- Original Message ----- From: "Abhishek Pratap" < >>> abhishek.vit@...> >>> To: <bioperl-l@...> >>> Sent: Wednesday, July 08, 2009 10:24 AM >>> Subject: [Bioperl-l] Classifying SNPs >>> >>> >>> >>> Hi All >>> >>> This might seem to be an old track question. However I was not >>> able to >>> find a good answer in the many diff mailing list archives. >>> >>> For all our SNP predictions we would like to know whether they are >>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the >>> position on the gene where amino acid is getting changed and to what >>> ...Also info about indels will help. >>> >>> I am not sure if something like this already exists. If not even >>> some >>> pointers on how to move forward will help. >>> >>> Thanks, >>> -Abhi >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@... >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@... >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsEnsembl would be best place to go if you are working with human SNPs
but for those who aren't so data lucky... Aspects of this also relates to the dn/dS code in the Bio::Align::DNAStatistics -- thought it does the classification and comparison all at once so you'd have to dig code out. And the mcdonald_kreitman code in Bio::PopGen::Statistics which computes a synonymous or nonsynonymous via lookup table that is stored in Bio::MolEvol::CodonModel which compares the edit path which is encoded as the two codons concatenated together -- i.ee use Bio::MolEvol::CodonModel; my $codon_path = Bio::MolEvol::CodonModel->codon_path; my ($ns, $syn) = $codon_path->{'AATAAC'}; print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n"; It all kind of depends on how you have the data organized, if it is just SNPs and you are trying to figure out if they are syn or non-syn then you kind of need a good database to do this since you'll have to know what gene they are in, CDS of the gene, etc. It is possible to do with something as basic as GFF3 for your genome and the SNP locations and Bio::DB::SeqFeature::Store. While I can think of a way to code it up from those bare-bones - maybe you should report back if you can just use the Ensembl classification of the SNPs? -jason On Jul 13, 2009, at 9:33 AM, Chris Fields wrote: > Bio::Coordinate might help with coordinate conversion. However, > much of this sounds very Ensembl-like. Have you looked at the > Ensembl perl API? It can do #1 (coordinate conversion), and I'm > sure something could be written up to do the second. > > chris > > On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote: > >> Thanks Abhi-- I had a feeling there was more (or "less") to it-- >> this would be a nice feature to have, don't think it exists. Will >> think about it-- cheers >> ----- Original Message ----- From: "Abhishek Pratap" <abhishek.vit@... >> > >> To: "Mark A. Jensen" <maj@...> >> Cc: <bioperl-l@...> >> Sent: Monday, July 13, 2009 11:10 AM >> Subject: Re: [Bioperl-l] Classifying SNPs >> >> >>> Dear Mark >>> Sorry I was not able to reply earlier. Many Thanks for your detailed >>> explanation. However this is not exactly what I am looking for. >>> May be my >>> initial mail was not well articulated or I am not able to infer >>> your reply >>> fully. My bad. >>> >>> Well as an input what we have is the just the genomic coordinates >>> for SNP's >>> predicted by Illumina propriety software CASAVA. What we would >>> like to do is >>> to further classify these predicted SNP's . If they fall into >>> Coding region >>> then whether they are synonymous/non-syn SNPs. >>> >>> So I guess something which translates >>> 1. SNP genomic coordinate into mRNA offset >>> 2. Then identify the ORF and target codon and check whether the SNP >>> substitution will be syn/non-syn. >>> >>> Thanks, >>> -Abhi >>> >>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen >>> <maj@...> wrote: >>> >>>> Hey Abhishek- >>>> You might root around in Bio::PopGen. Here's a script to get >>>> stuff from >>>> raw fasta data--see comments within. >>>> cheers >>>> Mark >>>> >>>> use Bio::AlignIO; >>>> use Bio::PopGen::Utilities; >>>> >>>> $file = "your_raw_file.fas"; >>>> >>>> >>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)- >>>> >next_aln; >>>> # get the alignment into a Bio::PopGen::Population format, with >>>> codons >>>> # as the marker sites >>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=> >>>> $aln, >>>> -site_model=>'cod'); >>>> # here are your variable codons... >>>> my @cdnpos = $pop->get_marker_names; >>>> # here are your individuals represented in the alignment >>>> my @inds = $pop->get_Individuals; >>>> # which have names like "Codon-3-9", "Codon-4-12", etc >>>> foreach my $cdn (@cdnpos) { >>>> # calculate the unique codons represented at this codon position >>>> my (%ucdns, @ucdns); >>>> @genos = $pop->get_Genotypes(-marker=>$cdn); >>>> $ucdns{$_->get_Alleles}++ for @genos; >>>> @ucdns = sort keys %ucdns; >>>> # >>>> # here, use translate or something faster to identify syn/non-syn >>>> # check out code in Bio::Align::DNAStatistics for various methods >>>> >>>> } >>>> # relate back to individuals with this >>>> foreach my $ind (@inds) { >>>> print "Individual ".$ind->unique_id."\n"; >>>> print "Site\tAllele\n"; >>>> foreach my $cdn (@cdnpos) { >>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n"; >>>> } >>>> } >>>> >>>> >>>> 1; >>>> >>>> ----- Original Message ----- From: "Abhishek Pratap" < >>>> abhishek.vit@...> >>>> To: <bioperl-l@...> >>>> Sent: Wednesday, July 08, 2009 10:24 AM >>>> Subject: [Bioperl-l] Classifying SNPs >>>> >>>> >>>> >>>> Hi All >>>> >>>> This might seem to be an old track question. However I was not >>>> able to >>>> find a good answer in the many diff mailing list archives. >>>> >>>> For all our SNP predictions we would like to know whether they are >>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the >>>> position on the gene where amino acid is getting changed and to >>>> what >>>> ...Also info about indels will help. >>>> >>>> I am not sure if something like this already exists. If not even >>>> some >>>> pointers on how to move forward will help. >>>> >>>> Thanks, >>>> -Abhi >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@... >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@... >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@... >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason@... _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsOn Jul 13, 2009, at 11:54 AM, Jason Stajich wrote:
> Ensembl would be best place to go if you are working with human SNPs > but for those who aren't so data lucky... My mouse bias is showing ;> chris _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsI hate meeces to pieces.
----- Original Message ----- From: "Chris Fields" <cjfields@...> To: "Jason Stajich" <jason@...> Cc: "BioPerl List" <bioperl-l@...>; "Abhishek Pratap" <abhishek.vit@...> Sent: Monday, July 13, 2009 1:02 PM Subject: Re: [Bioperl-l] Classifying SNPs > On Jul 13, 2009, at 11:54 AM, Jason Stajich wrote: > >> Ensembl would be best place to go if you are working with human SNPs but for >> those who aren't so data lucky... > > My mouse bias is showing ;> > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsHi Jason
Thanks for a detailed insight. I would definitely go the ensembl way first and try to see if it can do exactly what we want. In case it does/'nt I will report back on this same thread. I think having something like this in the Bioperl will help the NGS community. Lot of people are predicting SNPs from NGS.oops(next generation sequencing ) data and looking for ways to better annotate/classify their predictions. Thanks guys .. It is a pleasure to interact with you all. Just overwhelmed to see the responses. best, -Abhi On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich <jason@...> wrote: > Ensembl would be best place to go if you are working with human SNPs but > for those who aren't so data lucky... > > Aspects of this also relates to the dn/dS code in the > Bio::Align::DNAStatistics -- thought it does the classification and > comparison all at once so you'd have to dig code out. > > And the mcdonald_kreitman code in Bio::PopGen::Statistics which computes a > synonymous or nonsynonymous via lookup table that is stored in > Bio::MolEvol::CodonModel which compares the edit path which is encoded as > the two codons concatenated together -- i.ee > > use Bio::MolEvol::CodonModel; > my $codon_path = Bio::MolEvol::CodonModel->codon_path; > my ($ns, $syn) = $codon_path->{'AATAAC'}; > print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n"; > > > It all kind of depends on how you have the data organized, if it is just > SNPs and you are trying to figure out if they are syn or non-syn then you > kind of need a good database to do this since you'll have to know what gene > they are in, CDS of the gene, etc. It is possible to do with something as > basic as GFF3 for your genome and the SNP locations and > Bio::DB::SeqFeature::Store. While I can think of a way to code it up from > those bare-bones - maybe you should report back if you can just use the > Ensembl classification of the SNPs? > > -jason > > > On Jul 13, 2009, at 9:33 AM, Chris Fields wrote: > > Bio::Coordinate might help with coordinate conversion. However, much of >> this sounds very Ensembl-like. Have you looked at the Ensembl perl API? It >> can do #1 (coordinate conversion), and I'm sure something could be written >> up to do the second. >> >> chris >> >> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote: >> >> Thanks Abhi-- I had a feeling there was more (or "less") to it-- this >>> would be a nice feature to have, don't think it exists. Will think about >>> it-- cheers >>> ----- Original Message ----- From: "Abhishek Pratap" < >>> abhishek.vit@...> >>> To: "Mark A. Jensen" <maj@...> >>> Cc: <bioperl-l@...> >>> Sent: Monday, July 13, 2009 11:10 AM >>> Subject: Re: [Bioperl-l] Classifying SNPs >>> >>> >>> Dear Mark >>>> Sorry I was not able to reply earlier. Many Thanks for your detailed >>>> explanation. However this is not exactly what I am looking for. May be >>>> my >>>> initial mail was not well articulated or I am not able to infer your >>>> reply >>>> fully. My bad. >>>> >>>> Well as an input what we have is the just the genomic coordinates for >>>> SNP's >>>> predicted by Illumina propriety software CASAVA. What we would like to >>>> do is >>>> to further classify these predicted SNP's . If they fall into Coding >>>> region >>>> then whether they are synonymous/non-syn SNPs. >>>> >>>> So I guess something which translates >>>> 1. SNP genomic coordinate into mRNA offset >>>> 2. Then identify the ORF and target codon and check whether the SNP >>>> substitution will be syn/non-syn. >>>> >>>> Thanks, >>>> -Abhi >>>> >>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj@...> >>>> wrote: >>>> >>>> Hey Abhishek- >>>>> You might root around in Bio::PopGen. Here's a script to get stuff from >>>>> raw fasta data--see comments within. >>>>> cheers >>>>> Mark >>>>> >>>>> use Bio::AlignIO; >>>>> use Bio::PopGen::Utilities; >>>>> >>>>> $file = "your_raw_file.fas"; >>>>> >>>>> >>>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln; >>>>> # get the alignment into a Bio::PopGen::Population format, with codons >>>>> # as the marker sites >>>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln, >>>>> -site_model=>'cod'); >>>>> # here are your variable codons... >>>>> my @cdnpos = $pop->get_marker_names; >>>>> # here are your individuals represented in the alignment >>>>> my @inds = $pop->get_Individuals; >>>>> # which have names like "Codon-3-9", "Codon-4-12", etc >>>>> foreach my $cdn (@cdnpos) { >>>>> # calculate the unique codons represented at this codon position >>>>> my (%ucdns, @ucdns); >>>>> @genos = $pop->get_Genotypes(-marker=>$cdn); >>>>> $ucdns{$_->get_Alleles}++ for @genos; >>>>> @ucdns = sort keys %ucdns; >>>>> # >>>>> # here, use translate or something faster to identify syn/non-syn >>>>> # check out code in Bio::Align::DNAStatistics for various methods >>>>> >>>>> } >>>>> # relate back to individuals with this >>>>> foreach my $ind (@inds) { >>>>> print "Individual ".$ind->unique_id."\n"; >>>>> print "Site\tAllele\n"; >>>>> foreach my $cdn (@cdnpos) { >>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n"; >>>>> } >>>>> } >>>>> >>>>> >>>>> 1; >>>>> >>>>> ----- Original Message ----- From: "Abhishek Pratap" < >>>>> abhishek.vit@...> >>>>> To: <bioperl-l@...> >>>>> Sent: Wednesday, July 08, 2009 10:24 AM >>>>> Subject: [Bioperl-l] Classifying SNPs >>>>> >>>>> >>>>> >>>>> Hi All >>>>> >>>>> This might seem to be an old track question. However I was not able to >>>>> find a good answer in the many diff mailing list archives. >>>>> >>>>> For all our SNP predictions we would like to know whether they are >>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the >>>>> position on the gene where amino acid is getting changed and to what >>>>> ...Also info about indels will help. >>>>> >>>>> I am not sure if something like this already exists. If not even some >>>>> pointers on how to move forward will help. >>>>> >>>>> Thanks, >>>>> -Abhi >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@... >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@... >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@... >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@... >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > jason@... > > Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsHaving a lightweight system for this could be helpful if it isn't
replicated somewhere else. I don't think the NGS is really changing per se -- really it is just that more people have SNPs called in more research systems. It would be a simple little project I think coding up some basics if you think you'd actually take it over and run with it and/or push it into a semi-organized set of scripts? -jason On Jul 13, 2009, at 10:13 AM, Abhishek Pratap wrote: > Hi Jason > > Thanks for a detailed insight. I would definitely go the ensembl way > first > and try to see if it can do exactly what we want. > > In case it does/'nt I will report back on this same thread. I think > having > something like this in the Bioperl will help the NGS community. Lot of > people are predicting SNPs from NGS.oops(next generation > sequencing ) data > and looking for ways to better annotate/classify their predictions. > > Thanks guys .. It is a pleasure to interact with you all. Just > overwhelmed > to see the responses. > > best, > -Abhi > > > > On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich <jason@...> > wrote: > >> Ensembl would be best place to go if you are working with human >> SNPs but >> for those who aren't so data lucky... >> >> Aspects of this also relates to the dn/dS code in the >> Bio::Align::DNAStatistics -- thought it does the classification and >> comparison all at once so you'd have to dig code out. >> >> And the mcdonald_kreitman code in Bio::PopGen::Statistics which >> computes a >> synonymous or nonsynonymous via lookup table that is stored in >> Bio::MolEvol::CodonModel which compares the edit path which is >> encoded as >> the two codons concatenated together -- i.ee >> >> use Bio::MolEvol::CodonModel; >> my $codon_path = Bio::MolEvol::CodonModel->codon_path; >> my ($ns, $syn) = $codon_path->{'AATAAC'}; >> print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n"; >> >> >> It all kind of depends on how you have the data organized, if it is >> just >> SNPs and you are trying to figure out if they are syn or non-syn >> then you >> kind of need a good database to do this since you'll have to know >> what gene >> they are in, CDS of the gene, etc. It is possible to do with >> something as >> basic as GFF3 for your genome and the SNP locations and >> Bio::DB::SeqFeature::Store. While I can think of a way to code it >> up from >> those bare-bones - maybe you should report back if you can just use >> the >> Ensembl classification of the SNPs? >> >> -jason >> >> >> On Jul 13, 2009, at 9:33 AM, Chris Fields wrote: >> >> Bio::Coordinate might help with coordinate conversion. However, >> much of >>> this sounds very Ensembl-like. Have you looked at the Ensembl >>> perl API? It >>> can do #1 (coordinate conversion), and I'm sure something could be >>> written >>> up to do the second. >>> >>> chris >>> >>> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote: >>> >>> Thanks Abhi-- I had a feeling there was more (or "less") to it-- >>> this >>>> would be a nice feature to have, don't think it exists. Will >>>> think about >>>> it-- cheers >>>> ----- Original Message ----- From: "Abhishek Pratap" < >>>> abhishek.vit@...> >>>> To: "Mark A. Jensen" <maj@...> >>>> Cc: <bioperl-l@...> >>>> Sent: Monday, July 13, 2009 11:10 AM >>>> Subject: Re: [Bioperl-l] Classifying SNPs >>>> >>>> >>>> Dear Mark >>>>> Sorry I was not able to reply earlier. Many Thanks for your >>>>> detailed >>>>> explanation. However this is not exactly what I am looking for. >>>>> May be >>>>> my >>>>> initial mail was not well articulated or I am not able to infer >>>>> your >>>>> reply >>>>> fully. My bad. >>>>> >>>>> Well as an input what we have is the just the genomic >>>>> coordinates for >>>>> SNP's >>>>> predicted by Illumina propriety software CASAVA. What we would >>>>> like to >>>>> do is >>>>> to further classify these predicted SNP's . If they fall into >>>>> Coding >>>>> region >>>>> then whether they are synonymous/non-syn SNPs. >>>>> >>>>> So I guess something which translates >>>>> 1. SNP genomic coordinate into mRNA offset >>>>> 2. Then identify the ORF and target codon and check whether the >>>>> SNP >>>>> substitution will be syn/non-syn. >>>>> >>>>> Thanks, >>>>> -Abhi >>>>> >>>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen >>>>> <maj@...> >>>>> wrote: >>>>> >>>>> Hey Abhishek- >>>>>> You might root around in Bio::PopGen. Here's a script to get >>>>>> stuff from >>>>>> raw fasta data--see comments within. >>>>>> cheers >>>>>> Mark >>>>>> >>>>>> use Bio::AlignIO; >>>>>> use Bio::PopGen::Utilities; >>>>>> >>>>>> $file = "your_raw_file.fas"; >>>>>> >>>>>> >>>>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)- >>>>>> >next_aln; >>>>>> # get the alignment into a Bio::PopGen::Population format, with >>>>>> codons >>>>>> # as the marker sites >>>>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=> >>>>>> $aln, >>>>>> -site_model=>'cod'); >>>>>> # here are your variable codons... >>>>>> my @cdnpos = $pop->get_marker_names; >>>>>> # here are your individuals represented in the alignment >>>>>> my @inds = $pop->get_Individuals; >>>>>> # which have names like "Codon-3-9", "Codon-4-12", etc >>>>>> foreach my $cdn (@cdnpos) { >>>>>> # calculate the unique codons represented at this codon position >>>>>> my (%ucdns, @ucdns); >>>>>> @genos = $pop->get_Genotypes(-marker=>$cdn); >>>>>> $ucdns{$_->get_Alleles}++ for @genos; >>>>>> @ucdns = sort keys %ucdns; >>>>>> # >>>>>> # here, use translate or something faster to identify syn/non-syn >>>>>> # check out code in Bio::Align::DNAStatistics for various methods >>>>>> >>>>>> } >>>>>> # relate back to individuals with this >>>>>> foreach my $ind (@inds) { >>>>>> print "Individual ".$ind->unique_id."\n"; >>>>>> print "Site\tAllele\n"; >>>>>> foreach my $cdn (@cdnpos) { >>>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n"; >>>>>> } >>>>>> } >>>>>> >>>>>> >>>>>> 1; >>>>>> >>>>>> ----- Original Message ----- From: "Abhishek Pratap" < >>>>>> abhishek.vit@...> >>>>>> To: <bioperl-l@...> >>>>>> Sent: Wednesday, July 08, 2009 10:24 AM >>>>>> Subject: [Bioperl-l] Classifying SNPs >>>>>> >>>>>> >>>>>> >>>>>> Hi All >>>>>> >>>>>> This might seem to be an old track question. However I was not >>>>>> able to >>>>>> find a good answer in the many diff mailing list archives. >>>>>> >>>>>> For all our SNP predictions we would like to know whether they >>>>>> are >>>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find >>>>>> the >>>>>> position on the gene where amino acid is getting changed and to >>>>>> what >>>>>> ...Also info about indels will help. >>>>>> >>>>>> I am not sure if something like this already exists. If not >>>>>> even some >>>>>> pointers on how to move forward will help. >>>>>> >>>>>> Thanks, >>>>>> -Abhi >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l@... >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@... >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@... >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@... >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> Jason Stajich >> jason@... >> >> -- Jason Stajich jason@... _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
|
|
|
Re: Classifying SNPsfixing a typo and explaining a gotcha On Tue, 14 Jul 2009, Pablo Marin-Garcia wrote: > > Hello Abhishek > > Ensembl has a module for calculate SNP consequences in a transcript. > > The script that they use to create their consequences is located in: > > ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl > > The important bit is to convert your snp coordenates and the > variation_allele into a ConsequenceType object > > $consequence_type = > Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles); > fixing typo: (instead $chr it would be a $variation_id) Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$var_id,$var_start,$var_end,$var_strand,\@alleles); warning: The transcript_id and the variation_id are not important if you are not building a ensembl database. BUT the gotcha part is that the start and end of the variation should refer to the same slice start than the transcript used in the next step (type_variation). Be careful because depending how you select the gene or slice to retrieve your transcripts your transcript start and end would be the chromosome coordinates or a relative start/end from the slice start. You should work with chr positions for the variations and the transcripts (where start/end == seq_region_start/seq_region_end) to avoid problems. > and pass this and a transcript to the type_variation > Bio::EnsEMBL::Utils::TranscriptAlleles exported method > > $consequences = type_variation($tr, $gene, $consequence_type); > The $gene is optional > in the module > > ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm > > The other important bit in this script is that now the functional_genomics > consequences are calculated in this script instead in the type_variation() > > The only drawback is that it return only the ensembl classes of consequences > , but you can extend that later if you need more specific consequences (I > have done that in the past for different projects). > > This ensembl aproach will save you a lot of problems with the mapping from > gene to protein and with multiple snps in a codon. > > If you have experience with ensembl then is easy to follow the code. If not > you can always ask for help in the ensembl-dev mailing list > (ensembl-dev@...) > > > If you want to read the code without checking out the whole api: > > > http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup > http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log > > > hope this helps > > > - Pablo > > > > ===================================================================== Pablo Marin-Garcia, PhD \\// (Argiope bruennichi \/\/`(||>O:'\/\/ with stabilimentum) //\\ Sanger Institute | PostDoc / Computer Biologist Wellcome Trust Genome Campus | team : 128/108 (Human Genetics) Hinxton, Cambridge CB10 1HH | room : N333 United Kingdom | email: pablo.marin@... ==================================================================== -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Classifying SNPsHi Pablo
Many thanks for for your reply. I am currently attending a workshop so might not get time to check out your suggestions. Once I am back I will get back to you in case I have any questions. Thanks, -Abhi On Tue, Jul 14, 2009 at 7:57 PM, Pablo Marin-Garcia <pg4@...>wrote: > > fixing a typo and explaining a gotcha > > On Tue, 14 Jul 2009, Pablo Marin-Garcia wrote: > > >> Hello Abhishek >> >> Ensembl has a module for calculate SNP consequences in a transcript. >> >> The script that they use to create their consequences is located in: >> >> >> ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl >> >> The important bit is to convert your snp coordenates and the >> variation_allele into a ConsequenceType object >> >> $consequence_type = >> Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles); >> >> > fixing typo: (instead $chr it would be a $variation_id) > > > Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$var_id,$var_start,$var_end,$var_strand,\@alleles); > > warning: > > The transcript_id and the variation_id are not important if you are not > building a ensembl database. > > BUT the gotcha part is that the start and end of the variation should refer > to the same slice start than the transcript used in the next step > (type_variation). Be careful because depending how you select the gene or > slice to retrieve your transcripts your transcript start and end would be > the chromosome coordinates or a relative start/end from the slice start. > > You should work with chr positions for the variations and the transcripts > (where start/end == seq_region_start/seq_region_end) to avoid problems. > > and pass this and a transcript to the type_variation >> Bio::EnsEMBL::Utils::TranscriptAlleles exported method >> >> $consequences = type_variation($tr, $gene, $consequence_type); >> >> > The $gene is optional > > > in the module >> >> ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm >> >> The other important bit in this script is that now the functional_genomics >> consequences are calculated in this script instead in the type_variation() >> >> The only drawback is that it return only the ensembl classes of >> consequences , but you can extend that later if you need more specific >> consequences (I have done that in the past for different projects). >> >> This ensembl aproach will save you a lot of problems with the mapping from >> gene to protein and with multiple snps in a codon. >> >> If you have experience with ensembl then is easy to follow the code. If >> not you can always ask for help in the ensembl-dev mailing list ( >> ensembl-dev@...) >> >> >> If you want to read the code without checking out the whole api: >> >> >> >> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup >> >> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log >> >> >> hope this helps >> >> >> - Pablo >> >> >> >> >> > ===================================================================== > Pablo Marin-Garcia, PhD > > \\// (Argiope bruennichi > \/\/`(||>O:'\/\/ with stabilimentum) > //\\ > > Sanger Institute | PostDoc / Computer Biologist > Wellcome Trust Genome Campus | team : 128/108 (Human Genetics) > Hinxton, Cambridge CB10 1HH | room : N333 > United Kingdom | email: pablo.marin@... > ==================================================================== > > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, > a charity registered in England with number 1021457 and a company registered > in England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE. _______________________________________________ > Bioperl-l mailing list > Bioperl-l@... > http://lists.open-bio.org/mailman/listinfo/bioperl-l > Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
| Free embeddable forum powered by Nabble | Forum Help |