how to get the protein sequences from DNA sequences around novel SNPs?

by Guangchun Song

Hello,

I'm a new BioPerl user.  I'm working on a project to determine the
status of all putative SNPs (non-synonymous vs. synonymous) and to
predict the translational effect of non-synonymous mutations as benign
or deleterious.  I'm trying to use BioPerl to get the DNA sequence and
translate it to protein sequence for the SNPs that are in a gene's coding
region.  Could someone tell me how to do it?

Thanks,

-Guangchun Song
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: how to get the protein sequences from DNA sequences around novel SNPs?

by Robert Bradbury

On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song <gc11song@...> wrote:
>
> I'm a new BioPerl user.  I'm working on a project to determine the
> status of all putative SNPs (non-synonymous vs. synonymous) and to
> predict the translational effect of non-synonymous mutations as benign
> or deleterious.  I'm trying to use BioPerl to get the DNA sequence and
> translate it to protein sequence for the SNPs that are in a gene's coding
> region.  Could someone tell me how to do it?
>
>
I too would like to know if this information is available.  I've recently
been working with the dbSNP results from NCBI, but they display the results
in a graphical format rather than as data that one can play with and ask
questions of, like "What is the most disease-causing gene in the human
genome?" or "What are the critical proteins damaged by gene defects in the
human genome?" ... "In terms of premature deaths, extended health care
requirements, loss of quality of life, etc.?"

The same types of questions can be applied to the dog and cat genomes, where
there is emotional value, or the cow, horse, pig, etc. genomes, where there is
economic value.

The value of BioPerl would increase significantly if there were
functionality that would allow easy access to "these mutations may have
negative/positive impact" (which means you need a function that qualifies
mutations by degree) and allow for impact to be subjectively determined
(implying there must be some callback function to provide a user
quality/impact rating).

For example:
   my @differences = protein_compare($mygene, $refseq_gene, \@critical_aa,
                                     \@critical_domain, $callback);
where $callback could "rate" differences by protein, position, and
the "type of interest" (e.g. metal-binding amino acids, structure-changing
amino acids, critical catalytic amino acids, etc.).

A default callback would be based on some evolving definition of "critical"
changes which result in human disease for example.
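
A minimal sketch of how such a callback interface might look, in plain Perl.
Nothing here is part of BioPerl: protein_compare, its argument order, and the
toy rating rule are all hypothetical, and the two proteins are assumed to be
already aligned and of equal length.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: compare two equal-length
# protein strings and let a caller-supplied callback rate each difference.
sub protein_compare {
    my ($query, $reference, $critical_pos, $callback) = @_;
    die "sequences must have equal length\n"
        unless length($query) == length($reference);

    my %critical = map { $_ => 1 } @$critical_pos;    # 1-based positions
    my @differences;
    for my $i (0 .. length($reference) - 1) {
        my $ref_aa = substr($reference, $i, 1);
        my $new_aa = substr($query,     $i, 1);
        next if $ref_aa eq $new_aa;

        my $pos         = $i + 1;
        my $is_critical = $critical{$pos} ? 1 : 0;
        push @differences, {
            position => $pos,
            ref      => $ref_aa,
            alt      => $new_aa,
            rating   => $callback->($pos, $ref_aa, $new_aa, $is_critical),
        };
    }
    return @differences;
}

# Toy rating rule (an assumption): a change at a critical position
# scores 2, any other change scores 1.
my $rate = sub {
    my ($pos, $ref_aa, $new_aa, $is_critical) = @_;
    return $is_critical ? 2 : 1;
};

my @diffs = protein_compare('MKTAHC', 'MKTGHC', [4], $rate);
printf "%s%d%s rating=%d\n", @{$_}{qw(ref position alt rating)} for @diffs;
# prints "G4A rating=2"
```

A real rating function would need domain knowledge (binding sites, catalytic
residues, conservation), which is exactly the hard part discussed in this
thread; the sketch only shows where such knowledge would plug in.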

This is a "required" capability to be able to determine things like the
"adaptability" of a species -- those with the fewest critical mutation points
may have better adaptability to mutation-increasing circumstances.

Please pardon any errors in Perl syntax/usage; it's been a while since I've
written Perl and I'd really rather be coding in C.

Robert

Re: how to get the protein sequences from DNA sequences around novel SNPs?

by Mark A. Jensen

I agree that BioPerl would significantly increase in value with
such a module; in fact, the BioTeam would probably buy us out.
My opinion is that the entire GWAS enterprise is the search for
such a callback function, for humans anyway. For those engaged
in this quest, if BioPerl doesn't provide a Maserati, it at least provides
good Italian-made (among others) parts.
MAJ
----- Original Message -----
From: "Robert Bradbury" <robert.bradbury@...>
To: "Guangchun Song" <gc11song@...>
Cc: <bioperl-l@...>
Sent: Monday, November 09, 2009 4:15 PM
Subject: Re: [Bioperl-l] how to get the protein sequences from DNA
sequences around novel SNPs?



Re: how to get the protein sequences from DNA sequences around novel SNPs?

by Chris Fields-5

On Nov 9, 2009, at 3:15 PM, Robert Bradbury wrote:

> I too would like to know if this information is available.  I've recently
> been working with the dbSNP results from NCBI but they display the results
> in a graphical format rather than data that one can play with [...]

I will say that most of the information from the SNP database is  
available in various formats (see following link under 'Retrieval  
Types'):

http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html

You can access this information, as well as the full XML, using  
something like the following script.

chris

------------------------------------------------

#!/usr/bin/perl

use 5.010;
use strict;
use warnings;
use Bio::DB::EUtilities;

# search term for the NCBI SNP database, taken from the command line
my $term = shift or die "Usage: $0 <search term>\n";

# run the search and keep the hits on the NCBI history server
my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'snp',
                                     -term       => $term,
                                     -usehistory => 'y',
                                     -retmax     => 100);

my $hist = $eutil->next_History || die "No history returned";

# reuse the same object to fetch the stored hits as flat-file text;
# for SNP XML, change retmode to 'xml'
$eutil->set_parameters(-eutil   => 'efetch',
                       -history => $hist,
                       -retmode => 'text',
                       -rettype => 'flt');

# dumps to STDOUT
say $eutil->get_Response->content;
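
Returning to the original question: once the coding sequence around a SNP is
in hand, deciding synonymous vs. non-synonymous comes down to translating the
affected codon with and without the substitution. Below is a minimal sketch in
plain Perl, assuming the input is a spliced, in-frame CDS on the coding strand
and the standard genetic code. classify_snp is a hypothetical helper, not a
BioPerl function; in BioPerl itself, the translate method of Bio::Seq and the
Bio::Tools::CodonTable module cover the translation step.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
my @bases = qw(T C A G);
my @aas   = split //,
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_to_aa;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_to_aa{"$b1$b2$b3"} = shift @aas;
        }
    }
}

# Hypothetical helper: classify a single-base substitution in an in-frame
# coding sequence. $cds_pos is 1-based, counted from the first base of the
# start codon.
sub classify_snp {
    my ($cds, $cds_pos, $alt_base) = @_;
    $cds = uc $cds;
    die "position $cds_pos is outside the sequence\n"
        if $cds_pos < 1 || $cds_pos > length $cds;

    my $codon_start = 3 * int(($cds_pos - 1) / 3);    # 0-based offset
    my $ref_codon   = substr($cds, $codon_start, 3);
    die "position $cds_pos is not inside a complete codon\n"
        unless length($ref_codon) == 3;

    # substitute the alternate base at its offset within the codon
    my $alt_codon = $ref_codon;
    substr($alt_codon, ($cds_pos - 1) % 3, 1, uc $alt_base);

    my ($ref_aa, $alt_aa) = @codon_to_aa{$ref_codon, $alt_codon};
    die "unrecognized codon\n" unless defined $ref_aa && defined $alt_aa;

    my $effect = $ref_aa eq $alt_aa ? 'synonymous'
               : $alt_aa eq '*'     ? 'nonsense'
               :                      'non-synonymous';
    return ($effect, $ref_aa, $alt_aa);
}

# ATG GAA TTC encodes M E F.
print join(' ', classify_snp('ATGGAATTC', 6, 'G')), "\n";  # GAA -> GAG: synonymous E E
print join(' ', classify_snp('ATGGAATTC', 4, 'A')), "\n";  # GAA -> AAA: non-synonymous E K
```

SNPs on the reverse strand or given in genomic coordinates would first need
to be mapped onto the CDS, which this sketch does not attempt. Predicting
whether a non-synonymous change is benign or damaging is a separate problem
that this classification does not address.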

