Long /labels are wrapped, but can't be read

Hi.

I am wondering whether this is a buglet or just a case of "Don't do that":

If I set a very long /label on a feature and output the sequence in EMBL format, the qualifier value gets wrapped, but not quoted. When BioPerl reads such a file, an exception is thrown.

I probably shouldn't be setting very long labels... But oughtn't BioPerl either throw an exception when a too-long label is set, automatically quote the value when it is long enough to be wrapped, or know how to read a wrapped yet unquoted value?

I will be happy to try and provide a patch for whichever solution is preferred.

Here is an example script:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use IO::String;

    use Bio::Seq;
    use Bio::SeqFeature::Generic;
    use Bio::SeqIO;

    print 'BioPerl ' . $Bio::Root::Version::VERSION . "\n";

    my $seq=Bio::Seq->new(-seq=>'ATG');
    my $feature=Bio::SeqFeature::Generic->new(-primary=>'misc_feature', -start=>1, -end=>3);
    $feature->add_tag_value(label=>'averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink');
    $seq->add_SeqFeature($feature);

    my $out_string=out($seq);
    print $out_string;

    my $fh=IO::String->new($out_string);
    my $in=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
    my $in_seq=$in->next_seq;  # throws on the wrapped, unquoted /label

    print "Done\n";

    # Serialize $seq to a string in EMBL format.
    sub out {
        my ($seq)=@_;

        my $string='';
        my $fh=IO::String->new($string);
        my $out=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
        $out->write_seq($seq);

        return $string;
    }

This gives the following output when run:

    BioPerl 1.0069
    ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
    XX
    AC   unknown;
    XX
    XX
    FH   Key             Location/Qualifiers
    FH
    FT   misc_feature    1..3
    FT                   /label=averylonglabelthisisindeedbutitoughttoworkanywaydont
    FT                   youthink
    XX
    SQ   Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
         atg                                                                       3
    //

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: Can't see new qualifier in: youthink
    from:
    /label=averylonglabelthisisindeedbutitoughttoworkanywaydont
    youthink

    STACK: Error::throw
    STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368
    STACK: Bio::SeqIO::embl::_read_FTHelper_EMBL Bio/SeqIO/embl.pm:1294
    STACK: Bio::SeqIO::embl::next_seq Bio/SeqIO/embl.pm:392
    STACK: /z/home/adsj/bugs/bioperl/embl/embl.pl:24
    -----------------------------------------------------------

If I change the value to include literal double quotes (simulating embl.pm quoting the value), BioPerl can read the EMBL string it produces just fine:

    adsj@ala:~/work/bioperl/bioperl-live$ perl -I. ~/bugs/bioperl/embl/embl.pl
    BioPerl 1.0069
    ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
    XX
    AC   unknown;
    XX
    XX
    FH   Key             Location/Qualifiers
    FH
    FT   misc_feature    1..3
    FT                   /label=""averylonglabelthisisindeedbutitoughttoworkanywaydo
    FT                   ntyouthink""
    XX
    SQ   Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
         atg                                                                       3
    //
    Done

Best regards,

  Adam

-- 
Adam Sjøgren
adsj@...

_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l
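Of the three remedies Adam lists, the third (teaching the reader to accept a wrapped but unquoted value) can be illustrated without BioPerl at all. The following is a minimal, hypothetical Perl sketch, not code from Bio::SeqIO::embl: `parse_qualifiers` and `open_quote` are invented names, and the joining rule is an assumption. Continuation lines of an unquoted value are appended with no separator (the label above was split mid-word), while quoted free text is joined with a single space. A value only counts as "still open" if it starts with a quote and contains an odd number of quote characters, because a literal quote inside a quoted value is written as `""`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's embl.pm): turn the qualifier lines
# of one EMBL feature into [tag, value] pairs, accepting an unquoted
# value that was wrapped onto a continuation line.
sub parse_qualifiers {
    my @lines = @_;
    my @quals;            # [tag, value] pairs in input order
    my $in_quotes = 0;    # true while a "..." value is still open

    for my $line (@lines) {
        # Drop the "FT" line code and the indentation before column 22.
        (my $text = $line) =~ s/^FT\s+//;

        if (!$in_quotes && $text =~ m{^/(\w+)(?:=(.*))?$}) {
            # A new qualifier starts on this line.
            my ($tag, $value) = ($1, $2);
            push @quals, [ $tag, $value ];
            $in_quotes = defined $value && open_quote($value);
        }
        elsif (@quals) {
            # Continuation line. Assumption: quoted free text is joined
            # with a space, an unquoted value (such as a label split
            # mid-word) with nothing.
            my $sep = $in_quotes ? ' ' : '';
            $quals[-1][1] .= $sep . $text;
            $in_quotes = open_quote($quals[-1][1]);
        }
        else {
            die "Continuation line before any qualifier: $line\n";
        }
    }

    # Strip surrounding quotes and undo the "" escaping of literal quotes.
    for my $q (@quals) {
        if (defined $q->[1] && $q->[1] =~ /^"(.*)"$/s) {
            ($q->[1] = $1) =~ s/""/"/g;
        }
    }
    return @quals;
}

# A value is still open if it starts with a quote and contains an odd
# number of quote characters (a literal quote inside is written as "").
sub open_quote {
    my ($value) = @_;
    return 0 unless $value =~ /^"/;
    my $count = () = $value =~ /"/g;
    return $count % 2;
}

my @ft = (
    'FT                   /label=averylonglabelthisisindeedbutitoughttoworkanywaydont',
    'FT                   youthink',
);
# prints: label => averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink
printf "%s => %s\n", @$_ for parse_qualifiers(@ft);
```

Whether joining with nothing is right for every unquoted qualifier is exactly the open question in this thread; the sketch only shows that a reader can recover the label from the output above instead of throwing.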
Re: Long /labels are wrapped, but can't be read

Adam,
Not sure, but this could be a case of 'both'. Qualifiers whose values are quoted and those whose values aren't are currently distinguished via a global hash lookup (%FTQUAL_NO_QUOTE) due to the way the parser works; there is some logic behind this, I just can't quite recall at the moment why it is this way.

You could set a hash key for the label in cases where it isn't quoted; that should work. You can also test out the Bio::SeqIO::embldriver version (-format => 'embldriver').

If the above doesn't work out, it's worth filing a bug for this behavior, though I'm not sure how easy it will be to fix.

chris

On Sep 28, 2009, at 2:51 AM, Adam Sjøgren wrote:

> Hi.
>
> I am wondering whether this is a buglet or just a case of "Don't do
> that":
> [...]
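To make the global hash lookup Chris mentions concrete, here is a minimal, hypothetical Perl sketch of the writer side, combined with the second option from the original report (quote a value anyway when it is too long for one line). `%NO_QUOTE` and `qualifier_text` are invented for illustration; they are not BioPerl's `%FTQUAL_NO_QUOTE` or its writer code, and the qualifier names in the hash are examples only. The 59-character limit comes from the output in the report, where qualifier text starts in column 22 of an 80-column FT line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the %FTQUAL_NO_QUOTE lookup in embl.pm.
# The names listed are illustrative only, not BioPerl's actual list.
my %NO_QUOTE = map { $_ => 1 } qw(codon_start label transl_table);

# Qualifier text starts in column 22 of an 80-column FT line, so 59
# characters fit before the writer has to wrap.
my $AVAIL = 80 - 21;

# Render one qualifier as "/tag=value". Names in %NO_QUOTE are written
# bare unless the result would not fit on one line; then the value is
# quoted anyway, so a reader can tell where a wrapped value ends.
sub qualifier_text {
    my ($tag, $value) = @_;
    my $bare = "/$tag=$value";
    return $bare if $NO_QUOTE{$tag} && length($bare) <= $AVAIL;

    (my $escaped = $value) =~ s/"/""/g;   # a literal " is written as ""
    return qq{/$tag="$escaped"};
}

print qualifier_text(label => 'short_label'), "\n";   # /label=short_label
print qualifier_text(note  => 'free text'),   "\n";   # /note="free text"
print qualifier_text(label =>
    'averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink'), "\n";
```

The design choice here is that the hash still controls the common case (short labels stay unquoted), and quoting is only forced when wrapping would otherwise make the value ambiguous to a strict reader.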
Re: Long /labels are wrapped, but can't be read

On Tue, 29 Sep 2009 22:54:04 -0500, Chris wrote:
> Not sure, but this could be a case of 'both'. Labels that are quoted
> and aren't are currently distinguished via a global hash lookup
> (%FTQUAL_NO_QUOTE) due to the way the parser works; there is some
> logic behind this, just can't quite recall at the moment why it is
> this way.

Yes, I saw that there are a number of qualifiers that aren't quoted automatically.

The very easy "fix" for me would be to simply remove "label" from %FTQUAL_NO_QUOTE, but I'm not really sure what the reason for not quoting all values is, so I was hesitant to just propose that.

> You could set a hash key for the label in cases where it isn't quoted,
> that should work. You can also test out the Bio::SeqIO::embldriver
> version (-format => 'embldriver').

Ah, embldriver reads the wrapped qualifier without problems, even when it isn't quoted. Nice! I hadn't noticed embldriver.

I wonder which one is correct in this case?

And should I switch to using embldriver for reading, or does it make sense to try to concoct a patch that changes embl?

Thanks for the feedback!

  Adam

-- 
Adam Sjøgren
adsj@...
Re: Long /labels are wrapped, but can't be read

On Sep 30, 2009, at 4:50 AM, Adam Sjøgren wrote:
> On Tue, 29 Sep 2009 22:54:04 -0500, Chris wrote:
>
>> Not sure, but this could be a case of 'both'. Labels that are quoted
>> and aren't are currently distinguished via a global hash lookup
>> (%FTQUAL_NO_QUOTE) due to the way the parser works; there is some
>> logic behind this, just can't quite recall at the moment why it is
>> this way.
>
> Yes, I saw that there is a number of qualifiers that aren't quoted
> automatically.
>
> The very easy "fix" for me would be to simply remove "label" from
> %FTQUAL_NO_QUOTE, but I'm not really sure what the reason for not
> quoting all values is, so I was hesitant to just propose that.

It's basically there for more control over the output format, IIRC. It appears to play a role only in output (via write_seq).

>> You could set a hash key for the label in cases where it isn't quoted,
>> that should work. You can also test out the Bio::SeqIO::embldriver
>> version (-format => 'embldriver').
>
> Ah, embldriver reads the wrapped qualifier when it isn't quoted without
> problem. Nice! I hadn't noticed embldriver.
>
> I wonder which one is correct in this case?
>
> And should I switch to using embldriver to read, or does it make sense
> to try and concoct a patch that changes embl?

Bio::SeqIO::embldriver is an attempt to coalesce the parsers into a generic driver/handler framework. The various parsers (the drivers) simply parse the incoming data stream into chunks of naturally related data, basically hash refs, and pass them on to a handler object, which has methods designed to handle the chunks passed in. It's basically like a souped-up XML parser, except that the data arrives grouped in larger, meaningful chunks (an entire seqfeature, for instance).

For the moment they're still experimental, but I put them out with the release so they can be tested. The main problem at the moment is that there is no specification for how a data chunk is defined and labeled; I'm thinking of using something like JSON for that.

> Thanks for the feedback!

np.

chris
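The driver/handler split Chris describes can be illustrated with a toy example. This is a hypothetical Perl sketch, not the Bio::SeqIO::embldriver code or its handler API: `ToyDriver`, `ToyHandler`, the `handle_<type>` method-naming convention and the chunk layout are all invented here. The driver groups FT lines into one hash ref per feature and hands each complete chunk to the handler; JSON::PP (a core Perl module) then serializes one chunk to suggest what a JSON-based chunk specification might look like.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical handler: one handle_<type> method per kind of chunk. In a
# real framework this is presumably where Bio::Seq objects would be built.
package ToyHandler;

sub new {
    my ($class) = @_;
    return bless({ features => [] }, $class);
}

sub handle_feature {
    my ($self, $chunk) = @_;
    push @{ $self->{features} }, $chunk;
}

# Hypothetical driver: groups raw FT lines into one hash ref per feature
# and passes each complete chunk to the matching handler method.
package ToyDriver;

sub new {
    my ($class, %args) = @_;
    return bless({ handler => $args{handler} }, $class);
}

sub parse {
    my ($self, @lines) = @_;
    my $current;
    for my $line (@lines) {
        if ($line =~ /^FT   (\S+)\s+(\S+)$/) {
            # Feature key and location: the previous chunk is complete.
            my ($key, $location) = ($1, $2);
            $self->emit($current) if $current;
            $current = { type => 'feature', key => $key,
                         location => $location, qualifiers => {} };
        }
        elsif ($current && $line =~ m{^FT\s+/(\w+)=(.*)$}) {
            $current->{qualifiers}{$1} = $2;
        }
        # Anything else (including continuation lines) is ignored here.
    }
    $self->emit($current) if $current;
}

sub emit {
    my ($self, $chunk) = @_;
    my $method  = "handle_$chunk->{type}";
    my $handler = $self->{handler};
    $handler->$method($chunk) if $handler->can($method);
}

package main;

my $handler = ToyHandler->new;
ToyDriver->new(handler => $handler)->parse(
    'FT   misc_feature    1..3',
    'FT                   /label=short_label',
);

# One possible JSON rendering of a chunk (keys sorted for stable output).
print JSON::PP->new->canonical->encode($handler->{features}[0]), "\n";
```

Dispatching on the chunk's type keeps the driver ignorant of what the handler builds, which is the point of the split. Note that this toy driver skips continuation lines entirely, so it would not cope with the wrapped /label from the original report.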