How do I read a FASTA file containing protein sequences in lowercase?

I'm using RichSequenceIterator to read FASTA files containing
proteins. Somehow it doesn't work when the protein sequences are in
lowercase, which they sometimes are when downloaded from e.g. Uniprot.
My code fails to recognize the following file as containing a protein
sequence:

>OPSD_FELCA
mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
cmlttlccgknplgddeasttgsktetsqvapa

What am I missing? Here's the code I'm using to read in sequences:

    private List<ISequence> sequencesFromInputStream(InputStream stream) {

        BufferedInputStream bufferedStream = new BufferedInputStream(stream);
        Namespace ns = RichObjectFactory.getDefaultNamespace();
        RichSequenceIterator seqit = null;

        try {
            seqit = RichSequence.IOTools.readStream(bufferedStream, ns);
        } catch (IOException e) {
            logger.error("Couldn't read sequences from file", e);
            return Collections.emptyList();
        }

        List<ISequence> sequences = new ArrayList<ISequence>();
        try {
            while ( seqit.hasNext() ) {
                RichSequence rseq;
                rseq = seqit.nextRichSequence(); // *error occurs here*
                if (rseq == null)
                    continue;
                String alphabet = rseq.getAlphabet().getName();
                sequences.add(
                    "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
                  : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
                  :                          new BiojavaProtein(rseq) );
            }
        } catch (NoSuchElementException e) {
            logger.error("Read past last sequence", e);
        } catch (BioException e) {
            logger.error(e); // *ends up here*
        }

        return sequences;
    }

Grateful for any pointers you might have.

Regards,
// Carl Mäsak
_______________________________________________
Biojava-l mailing list - Biojava-l@...
http://lists.open-bio.org/mailman/listinfo/biojava-l

Re: How do I read a FASTA file containing protein sequences in lowercase?

Could you post the output from the exception stack that it generates?

thanks,
Richard

On 6 Nov 2009, at 16:25, Carl Mäsak wrote:
> I'm using RichSequenceIterator to read FASTA files containing
> proteins. Somehow it doesn't work when the protein sequences are in
> lowercase, which they sometimes are when downloaded from e.g. Uniprot.
> [...]

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland@...
http://www.eaglegenomics.com/

Re: How do I read a FASTA file containing protein sequences in lowercase?

Richard (>), Carl (>>):
>> Grateful for any pointers you might have.
>
> Could you post the output from the exception stack that it generates?

org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at net.bioclipse.biojava.business.BiojavaManager.sequencesFromInputStream(BiojavaManager.java:314)
    at net.bioclipse.biojava.business.BiojavaManager.sequencesFromFile(BiojavaManager.java:291)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.doInvoke(AbstractManagerMethodDispatcher.java:243)
    at net.bioclipse.managers.business.JavaManagerMethodDispatcher.doInvokeInSameThread(JavaManagerMethodDispatcher.java:248)
    at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.invoke(AbstractManagerMethodDispatcher.java:130)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at net.bioclipse.recording.WrapInProxyAdvice.invoke(WrapInProxyAdvice.java:22)
    at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.doInvoke(ServiceInvoker.java:59)
    at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.invoke(ServiceInvoker.java:67)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.osgi.service.importer.internal.aop.ServiceTCCLInterceptor.invoke(ServiceTCCLInterceptor.java:34)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.osgi.service.importer.support.LocalBundleContextAdvice.invoke(LocalBundleContextAdvice.java:59)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at $Proxy18.invoke(Unknown Source)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke(AfterReturningAdviceInterceptor.java:50)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at $Proxy20.sequencesFromFile(Unknown Source)
    at net.bioclipse.biojava.ui.editors.Aligner.setInput(Aligner.java:152)
    at net.bioclipse.biojava.ui.editors.Aligner.init(Aligner.java:138)
    at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:238)
    at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:212)
    at net.bioclipse.biojava.ui.editors.SequenceEditor.createPages(SequenceEditor.java:47)
    at org.eclipse.ui.part.MultiPageEditorPart.createPartControl(MultiPageEditorPart.java:357)
    at org.eclipse.ui.internal.EditorReference.createPartHelper(EditorReference.java:662)
    at org.eclipse.ui.internal.EditorReference.createPart(EditorReference.java:462)
    at org.eclipse.ui.internal.WorkbenchPartReference.getPart(WorkbenchPartReference.java:595)
    at org.eclipse.ui.internal.PartPane.setVisible(PartPane.java:313)
    at org.eclipse.ui.internal.presentations.PresentablePart.setVisible(PresentablePart.java:180)
    at org.eclipse.ui.internal.presentations.util.PresentablePartFolder.select(PresentablePartFolder.java:270)
    at org.eclipse.ui.internal.presentations.util.LeftToRightTabOrder.select(LeftToRightTabOrder.java:65)
    at org.eclipse.ui.internal.presentations.util.TabbedStackPresentation.selectPart(TabbedStackPresentation.java:473)
    at org.eclipse.ui.internal.PartStack.refreshPresentationSelection(PartStack.java:1256)
    at org.eclipse.ui.internal.PartStack.setSelection(PartStack.java:1209)
    at org.eclipse.ui.internal.PartStack.showPart(PartStack.java:1608)
    at org.eclipse.ui.internal.PartStack.add(PartStack.java:499)
    at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:103)
    at org.eclipse.ui.internal.PartStack.add(PartStack.java:485)
    at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:112)
    at org.eclipse.ui.internal.EditorSashContainer.addEditor(EditorSashContainer.java:63)
    at org.eclipse.ui.internal.EditorAreaHelper.addToLayout(EditorAreaHelper.java:225)
    at org.eclipse.ui.internal.EditorAreaHelper.addEditor(EditorAreaHelper.java:213)
    at org.eclipse.ui.internal.EditorManager.createEditorTab(EditorManager.java:778)
    at org.eclipse.ui.internal.EditorManager.openEditorFromDescriptor(EditorManager.java:677)
    at org.eclipse.ui.internal.EditorManager.openEditor(EditorManager.java:638)
    at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditorBatched(WorkbenchPage.java:2854)
    at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditor(WorkbenchPage.java:2762)
    at org.eclipse.ui.internal.WorkbenchPage.access$11(WorkbenchPage.java:2754)
    at org.eclipse.ui.internal.WorkbenchPage$10.run(WorkbenchPage.java:2705)
    at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
    at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2701)
    at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2685)
    at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2676)
    at org.eclipse.ui.ide.IDE.openEditor(IDE.java:651)
    at org.eclipse.ui.ide.IDE.openEditor(IDE.java:610)
    at org.eclipse.ui.actions.OpenFileAction.openFile(OpenFileAction.java:99)
    at org.eclipse.ui.actions.OpenSystemEditorAction.run(OpenSystemEditorAction.java:99)
    at org.eclipse.ui.actions.RetargetAction.run(RetargetAction.java:221)
    at org.eclipse.ui.navigator.CommonNavigatorManager$3.open(CommonNavigatorManager.java:202)
    at org.eclipse.ui.OpenAndLinkWithEditorHelper$InternalListener.open(OpenAndLinkWithEditorHelper.java:48)
    at org.eclipse.jface.viewers.StructuredViewer$2.run(StructuredViewer.java:842)
    at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
    at org.eclipse.core.runtime.Platform.run(Platform.java:888)
    at org.eclipse.ui.internal.JFaceUtil$1.run(JFaceUtil.java:48)
    at org.eclipse.jface.util.SafeRunnable.run(SafeRunnable.java:175)
    at org.eclipse.jface.viewers.StructuredViewer.fireOpen(StructuredViewer.java:840)
    at org.eclipse.jface.viewers.StructuredViewer.handleOpen(StructuredViewer.java:1101)
    at org.eclipse.ui.navigator.CommonViewer.handleOpen(CommonViewer.java:467)
    at org.eclipse.jface.viewers.StructuredViewer$6.handleOpen(StructuredViewer.java:1205)
    at org.eclipse.jface.util.OpenStrategy.fireOpenEvent(OpenStrategy.java:264)
    at org.eclipse.jface.util.OpenStrategy.access$2(OpenStrategy.java:258)
    at org.eclipse.jface.util.OpenStrategy$1.handleEvent(OpenStrategy.java:298)
    at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
    at org.eclipse.swt.widgets.Display.sendEvent(Display.java:3543)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1250)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1273)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)
    at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1079)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3441)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3100)
    at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2405)
    at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2369)
    at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2221)
    at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:500)
    at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
    at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:493)
    at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
    at net.bioclipse.ui.Application.start(Application.java:36)
    at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:194)
    at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
    at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
    at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:368)
    at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559)
    at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514)
    at org.eclipse.equinox.launcher.Main.run(Main.java:1311)
    at org.eclipse.equinox.launcher.Main.main(Main.java:1287)
Caused by: org.biojava.bio.seq.io.ParseException:

A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l@... or post
a bug report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.FastaFormat
Accession=OPSD_FELCA
Id=null
Comments=problem parsing symbols
Parse_block=mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyillnlavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgvaftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaqqqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrncmlttlccgknplgddeasttgsktetsqvapa
Stack trace follows ....

    at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:244)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 114 more
Caused by: org.biojava.bio.symbol.IllegalSymbolException: This tokenization doesn't contain character: 'e'
    at org.biojava.bio.seq.io.CharacterTokenization.parseTokenChar(CharacterTokenization.java:175)
    at org.biojava.bio.seq.io.CharacterTokenization$TPStreamParser.characters(CharacterTokenization.java:246)
    at org.biojava.bio.symbol.SimpleSymbolList.<init>(SimpleSymbolList.java:178)
    at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:237)
    ... 115 more

// Carl

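The informative part of a trace like this is the innermost "Caused by" entry, which a plain logger.error(e) on the outer BioException can bury. As a minimal sketch of pulling it out, the helper below walks the cause chain to its end. The class name RootCause is hypothetical, and the standard JDK exceptions in the usage note are stand-ins for the BioJava exception types named in the trace.

```java
// Hypothetical helper, not part of BioJava: finds the innermost cause
// of an exception chain so it can be logged on its own.
final class RootCause {
    private RootCause() {}

    // Follows getCause() until there is no further cause.
    static Throwable of(Throwable t) {
        Throwable current = t;
        // The second condition guards against an exception that is
        // (incorrectly) reported as its own cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins for BioException -> ParseException -> IllegalSymbolException.
        Exception inner = new IllegalArgumentException(
                "This tokenization doesn't contain character: 'e'");
        Exception middle = new RuntimeException("problem parsing symbols", inner);
        Exception outer = new Exception("Could not read sequence", middle);

        System.out.println(RootCause.of(outer).getMessage());
    }
}
```

In a catch block this could be used as logger.error(RootCause.of(e).getMessage(), e), which keeps the full trace while putting the root message first.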
Re: How do I read a FASTA file containing protein sequences in lowercase?

Ah OK I see what's going on.

The convenience method you're using, RichSequence.IOTools.readStream(), uses FastaFormat to try and guess the alphabet to use based on the first line of the input sequence.

In FastaFormat, it does this by searching for matching non-DNA symbols. The search is case-sensitive:

    protected static final Pattern aminoAcids = Pattern.compile(".*[FLIPQE].*");

FastaFormat needs patching to make this pattern non-case-sensitive.

Still, if the sequence is such that none of the above symbols appears until the second or subsequent lines, the guessing will not work and it'll assume it's DNA, and give you the same error as before.

In the circumstances where you know what alphabet the sequence is in advance, it's best to avoid the guessing algorithms and instead use the methods such as readFastaDNA that explicitly specify the alphabet you want to read.

However, there's still one thing that you definitely can't do, and that's parse different types of sequence from the same input without inserting some kind of additional code to detect what alphabet each individual sequence is using before parsing it using the appropriate BioJava parser. Your code appears to be expecting mixed input, but this won't work unless they all happen to be the same alphabet.

cheers,
Richard

On 6 Nov 2009, at 16:54, Carl Mäsak wrote:
> [...]
> Caused by: org.biojava.bio.symbol.IllegalSymbolException: This
> tokenization doesn't contain character: 'e'
> [...]

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland@...
http://www.eaglegenomics.com/

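To make the guessing problem concrete, here is a minimal, self-contained sketch that does not use BioJava at all. The class AlphabetGuess and its methods are hypothetical names invented for illustration; only the [FLIPQE] character class is taken from the thread. It shows the case-sensitive pattern missing the first line of OPSD_FELCA, a case-insensitive pattern catching it, and one possible form of the "additional code" for mixed input: guessing per record over the whole sequence rather than over the first line only. It only distinguishes protein from DNA and makes no attempt to recognise RNA.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for an alphabet guess; not BioJava's implementation.
final class AlphabetGuess {
    // The character class from the thread, matched case-sensitively.
    static final Pattern CASE_SENSITIVE = Pattern.compile("[FLIPQE]");
    // The same character class, ignoring case.
    static final Pattern CASE_INSENSITIVE =
            Pattern.compile("[FLIPQE]", Pattern.CASE_INSENSITIVE);

    private AlphabetGuess() {}

    // True if the residues contain at least one symbol from the class.
    static boolean looksLikeProtein(Pattern pattern, String residues) {
        return pattern.matcher(residues).find();
    }

    // Splits FASTA text into records and guesses each record's alphabet
    // from its complete sequence, not just its first line.
    static Map<String, String> guessPerRecord(String fasta) {
        Map<String, StringBuilder> records = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String rawLine : fasta.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // Header line: start a new record keyed by its description.
                current = new StringBuilder();
                records.put(line.substring(1).trim(), current);
            } else if (current != null) {
                current.append(line);
            }
        }
        Map<String, String> guesses = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> entry : records.entrySet()) {
            boolean protein =
                    looksLikeProtein(CASE_INSENSITIVE, entry.getValue().toString());
            guesses.put(entry.getKey(), protein ? "PROTEIN" : "DNA");
        }
        return guesses;
    }
}
```

With a per-record guess like this, each record could then be handed to whichever alphabet-specific reader is appropriate, instead of relying on a single guess for the whole stream.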
Re: How do I read a FASTA file containing protein sequences in lowercase?

Richard (>):
> Ah OK I see what's going on.
>
> The convenience method you're using, RichSequence.IOTools.readStream(), uses
> FastaFormat to try and guess the alphabet to use based on the first line of
> the input sequence.
>
> In FastaFormat, it does this by searching for matching non-DNA symbols. The
> search is case-sensitive:
>
> protected static final Pattern aminoAcids =
>     Pattern.compile(".*[FLIPQE].*");
>
> FastaFormat needs patching to make this pattern non-case-sensitive.

Patch attached.

I also took the opportunity to remove the occurrences of .* in the Pattern above. Generally, one should use Matcher.find() when one is interested in matching a part of a string. This is more efficient than using Matcher.matches() and surrounding the desired regular expression with .*, since the latter will cause a lot of unnecessary backtracking and make the search quadratic.

This effect only shows up for very long strings, but long strings can and do happen in bioinformatics. The measurements below show the quadratic behaviour of the former approach.

$ for length in 100 1000 10000 100000 1000000; do (time java WithDotStar $length) 2>&1 | grep real; done
real 0m0.371s
real 0m0.367s
real 0m0.577s
real 0m2.735s
real 0m25.275s

$ for length in 100 1000 10000 100000 1000000; do (time java WithoutDotStar $length) 2>&1 | grep real; done
real 0m0.309s
real 0m0.361s
real 0m0.468s
real 0m1.184s
real 0m9.703s

Kindly,
// Carl
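To make the failure and the fix concrete, here is a minimal sketch of the alphabet guess in isolation. It assumes the patch amounts to dropping the .* wrappers, adding Pattern.CASE_INSENSITIVE, and calling find() instead of matches(); the class and method names are hypothetical and are not BioJava's own.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone illustration; not part of BioJava.
class AminoAcidGuess {
    // The pattern quoted in the thread: case-sensitive, wrapped in .*
    static final Pattern ORIGINAL = Pattern.compile(".*[FLIPQE].*");

    // Assumed shape of the fix: no .* wrappers, case-insensitive, used with find()
    static final Pattern PATCHED =
            Pattern.compile("[FLIPQE]", Pattern.CASE_INSENSITIVE);

    // True if the whole line matches the original pattern
    static boolean looksLikeProteinOriginal(String line) {
        return ORIGINAL.matcher(line).matches();
    }

    // True if any protein-only letter occurs anywhere in the line, in either case
    static boolean looksLikeProteinPatched(String line) {
        return PATCHED.matcher(line).find();
    }

    public static void main(String[] args) {
        // First sequence line of the OPSD_FELCA file from the original question
        String lower = "mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln";
        System.out.println("original, lowercase: " + looksLikeProteinOriginal(lower));
        System.out.println("patched,  lowercase: " + looksLikeProteinPatched(lower));
        System.out.println("patched,  DNA:       " + looksLikeProteinPatched("acgtacgtacgt"));
    }
}
```

With the original pattern the lowercase line is not recognised as protein, which is consistent with the IllegalSymbolException for 'e' in the stack trace; the case-insensitive version recognises it while still rejecting a plain acgt line.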
Re: How do I read a FASTA file containing protein sequences in lowercase?

I've applied the patch to the trunk of biojava-live. Thanks!

Richard

On 9 Nov 2009, at 16:26, Carl Mäsak wrote:
> Richard (>):
>> Ah OK I see what's going on.
>>
>> The convenience method you're using, RichSequence.IOTools.readStream(), uses
>> FastaFormat to try and guess the alphabet to use based on the first line of
>> the input sequence.
>>
>> In FastaFormat, it does this by searching for matching non-DNA symbols. The
>> search is case-sensitive:
>>
>> protected static final Pattern aminoAcids =
>>     Pattern.compile(".*[FLIPQE].*");
>>
>> FastaFormat needs patching to make this pattern non-case-sensitive.
>
> Patch attached.
>
> I also took the opportunity to remove the occurrences of .* in the
> Pattern above. Generally, one should use Matcher.find() when one
> is interested in matching a part of a string. This is more efficient
> than using Matcher.matches() and surrounding the desired regular
> expression with .*, since the latter will cause a lot of unnecessary
> backtracking and make the search quadratic.
>
> This effect only shows up for very long strings, but long strings can
> and do happen in bioinformatics. The measurements below show the
> quadratic behaviour of the former approach.
>
> $ for length in 100 1000 10000 100000 1000000; do (time java WithDotStar $length) 2>&1 | grep real; done
> real 0m0.371s
> real 0m0.367s
> real 0m0.577s
> real 0m2.735s
> real 0m25.275s
>
> $ for length in 100 1000 10000 100000 1000000; do (time java WithoutDotStar $length) 2>&1 | grep real; done
> real 0m0.309s
> real 0m0.361s
> real 0m0.468s
> real 0m1.184s
> real 0m9.703s
>
> Kindly,
> // Carl
> <aminoAcids.patch><WithDotStar.java><WithoutDotStar.java>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland@...
http://www.eaglegenomics.com/
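The attached WithDotStar.java and WithoutDotStar.java are not reproduced in this archive. For readers who want to repeat the comparison, the following is a rough stand-in, not the original programs: it assumes the input is a DNA-like string of the requested length with no protein-only letters, so neither pattern finds a match and the whole string has to be scanned. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the attached test programs; the originals may differ.
class DotStarTiming {
    static final Pattern WITH_DOT_STAR = Pattern.compile(".*[FLIPQE].*");
    static final Pattern WITHOUT_DOT_STAR = Pattern.compile("[FLIPQE]");

    // Builds an ACGT repeat of the given length (assumed input; contains no F, L, I, P, Q or E)
    static String dnaLike(int length) {
        StringBuilder sb = new StringBuilder(length);
        String bases = "ACGT";
        for (int i = 0; i < length; i++) {
            sb.append(bases.charAt(i % bases.length()));
        }
        return sb.toString();
    }

    // Whole-string match with the .* wrappers
    static boolean withDotStar(String s) {
        return WITH_DOT_STAR.matcher(s).matches();
    }

    // Substring search with the bare character class
    static boolean withoutDotStar(String s) {
        return WITHOUT_DOT_STAR.matcher(s).find();
    }

    public static void main(String[] args) {
        int length = args.length > 0 ? Integer.parseInt(args[0]) : 100000;
        String s = dnaLike(length);

        long t0 = System.nanoTime();
        boolean a = withDotStar(s);
        long t1 = System.nanoTime();
        boolean b = withoutDotStar(s);
        long t2 = System.nanoTime();

        System.out.printf("matches() with .*  : %b in %d us%n", a, (t1 - t0) / 1000);
        System.out.printf("find() without .*  : %b in %d us%n", b, (t2 - t1) / 1000);
    }
}
```

A single timed call like this includes JIT warm-up and is only a rough indication; the figures quoted in the thread were taken with time around a whole JVM run, so absolute numbers will not be comparable.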