|
View:
New views
2 Messages
—
Rating Filter:
Alert me
|
|
|
Bio::SeqFeature::Generic unexpected behaviorHi,
I am building Bio::SeqFeature::Gene::Transcript objects to facilitate easy parsing of cds and intron sequences. I ran across an unexpected problem when creating two or more Transcript objects and attaching independent Bio::Seq objects to them. In the following code, 2 transcript objects are constructed independently (not cloned). However, extracting the cds for one transcript gives an errant (and unexpected result). Curiously, extracting the introns from the same transcript objects gives the correct result. I have highlighted the unexpected outcome below. Finally, I noticed that reversing the order of the attach_seq commands determines which cds is printed. The following example illustrates the problem. Can someone explain this? Jonathan #! usr/bin/env perl use strict; use warnings; use Bio::FeatureIO::gff; use Bio::SeqFeature::Annotated; use Bio::SeqFeature::Gene::Transcript; use Bio::SeqFeature::Gene::Exon; #use Bio::SeqFeature::Gene::UTR; #use Bio::SeqFeature::Gene::Intron; #Build the objects... my $seq1 = Bio::Seq->new(-seq => 'AAAAA'); my $seq2 = Bio::Seq->new(-seq => 'TTTTT'); my $tscript1 = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 5, -strand => +1); my $tscript2 = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 5, -strand => +1); my $exon1 = Bio::SeqFeature::Gene::Exon->new(-start => 1 , -end => 2, -strand => +1); my $exon2 = Bio::SeqFeature::Gene::Exon->new(-start => 4 , -end => 5, -strand => +1); $tscript1->add_exon($exon1); $tscript1->add_exon($exon2); $tscript2->add_exon($exon1); $tscript2->add_exon($exon2); $tscript1->attach_seq($seq1); $tscript2->attach_seq($seq2); #now extract the desired features and print print $tscript1->cds->seq,"\n"; #prints 'TTTT' <- unexpected !!! print $tscript2->cds->seq,"\n"; #prints 'TTTT' <- as expected foreach($tscript1->introns){ print $_->seq->seq,"\n"; #prints 'A' <- as expected } foreach($tscript2->introns){ print $_->seq->seq,"\n"; #prints 'T' <- as expected } _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
|
|
Re: Bio::SeqFeature::Generic unexpected behaviorOn Oct 7, 2009, at 5:13 PM, Jonathan Flowers wrote:
> Hi, > > I am building Bio::SeqFeature::Gene::Transcript objects to > facilitate easy > parsing of cds and intron sequences. I ran across an unexpected > problem > when creating two or more Transcript objects and attaching independent > Bio::Seq objects to them. In the following code, 2 transcript > objects are > constructed independently (not cloned). However, extracting the cds > for one > transcript gives an errant (and unexpected result). Curiously, > extracting > the introns from the same transcript objects gives the correct > result. I > have highlighted the unexpected outcome below. Finally, I noticed > that > reversing the order of the attach_seq commands determines which cds is > printed. So, is this a problem with Bio::SeqFeature::Generic, or Bio::SeqFeature::Gene? Just a bit confused by the subject line. > The following example illustrates the problem. > > Can someone explain this? > > Jonathan > > #! usr/bin/env perl > use strict; > use warnings; > use Bio::FeatureIO::gff; > use Bio::SeqFeature::Annotated; > use Bio::SeqFeature::Gene::Transcript; > use Bio::SeqFeature::Gene::Exon; > #use Bio::SeqFeature::Gene::UTR; > #use Bio::SeqFeature::Gene::Intron; > > #Build the objects... > my $seq1 = Bio::Seq->new(-seq => 'AAAAA'); > my $seq2 = Bio::Seq->new(-seq => 'TTTTT'); > > my $tscript1 = Bio::SeqFeature::Gene::Transcript->new(-start => 1, - > end => > 5, -strand => +1); > my $tscript2 = Bio::SeqFeature::Gene::Transcript->new(-start => 1, - > end => > 5, -strand => +1); > > my $exon1 = Bio::SeqFeature::Gene::Exon->new(-start => 1 , -end => 2, > -strand => +1); > my $exon2 = Bio::SeqFeature::Gene::Exon->new(-start => 4 , -end => 5, > -strand => +1); > > $tscript1->add_exon($exon1); > $tscript1->add_exon($exon2); > > $tscript2->add_exon($exon1); > $tscript2->add_exon($exon2); > > $tscript1->attach_seq($seq1); > $tscript2->attach_seq($seq2); So, my guess is Bio::SF::Gene::Transcript is attaching the sequences to the contained exons (you are using the same two exon instances for both transcripts). I don't think this is a bug; you should create a different set of exons for the second transcript, not use the same exon instances for both. This behavior, to me, makes sense using the following logic: if two (or more) transcripts share the same exons (such as in alternatively spliced transcripts), underlying sequence changes should affect both transcripts and their shared exons. Otherwise they shouldn't share exons (i.e. require different instances). If you change the above to the following: my $exon3 = Bio::SeqFeature::Gene::Exon->new(-start => 1 , -end => 2, -strand => +1); my $exon4 = Bio::SeqFeature::Gene::Exon->new(-start => 4 , -end => 5, -strand => +1); $tscript2->add_exon($exon3); $tscript2->add_exon($exon4); it works; the first statement prints 'AAAA' and the second 'TTTT'. There is a subtle bug pointed with your example, though. The the two transcripts have two different sequences but the exons for both now point to the same sequence (your first transcript prints 'A', but everything else has T's). I don't think there is an easy workaround for that, unless there is a way to call the parent transcript feature from the exon features. > #now extract the desired features and print > print $tscript1->cds->seq,"\n"; #prints 'TTTT' <- unexpected !!! > print $tscript2->cds->seq,"\n"; #prints 'TTTT' <- as expected > > foreach($tscript1->introns){ > print $_->seq->seq,"\n"; #prints 'A' <- as expected > } > > foreach($tscript2->introns){ > print $_->seq->seq,"\n"; #prints 'T' <- as expected > } Hope that explanation helps! chris _______________________________________________ Bioperl-l mailing list Bioperl-l@... http://lists.open-bio.org/mailman/listinfo/bioperl-l |
| Free embeddable forum powered by Nabble | Forum Help |