issue with reference name

by Prachi Shah

Hi,

I have a dataset that I am trying to set up GBrowse for, but the problem
is that the contig names contain periods and parentheses, e.g.
"SC020.(contig_18.1)".

I think it's the parentheses that are causing trouble, because the GFF3
spec (http://gmod.org/wiki/GFF3) requires them to be escaped in column 1:
Column 1: "seqid"
    The ID of the landmark used to establish the coordinate system for
    the current feature. IDs may contain any characters, but must escape
    any characters not in the set [a-zA-Z0-9.:^*$@!+_?-|]. In particular,
    IDs may not contain unescaped whitespace and must not begin with an
    unescaped ">".
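
The column 1 rule quoted above can be sketched as a small Python function. This is an illustration only, not code from GBrowse or BioPerl: `escape_seqid` and `SEQID_SAFE` are hypothetical names, and the sketch reads the bracketed set as a list of literal characters (so `-` is a plain hyphen). Because ">" is not in the set, a leading ">" is escaped as well. Applied to the contig name, it reproduces the escaping used in the sample below.

```python
import string

# Characters the GFF3 spec lets appear unescaped in column 1 (seqid),
# reading "[a-zA-Z0-9.:^*$@!+_?-|]" as a list of literal characters.
SEQID_SAFE = set(string.ascii_letters + string.digits + ".:^*$@!+_?-|")


def escape_seqid(name: str) -> str:
    """Percent-encode every character of a landmark name outside SEQID_SAFE."""
    out = []
    for ch in name:
        if ch in SEQID_SAFE:
            out.append(ch)
        else:
            # Encode via UTF-8 so non-ASCII characters become valid %XX runs.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_seqid("SC020.(contig_18.1)"))  # SC020.%28contig_18.1%29
```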

I escaped the parentheses with URL encoding, e.g. "SC020.%28contig_18.1%29".
Do the parentheses in the ID and Name attributes also need to be URL-encoded?
And what about the sequence header in the FASTA section?

Thanks,
Prachi


Here are some sample GFF lines:

SC020.%28contig_18.1%29	A_oryzae_RIB40_INSERTASSEMBLYFROMGENBANK_1	contig	1	1824958	.	.	.	ID=SC020.(contig_18.1);Name=SC020.(contig_18.1)
SC020.%28contig_18.1%29	A_oryzae_RIB40_INSERTASSEMBLYFROMGENBANK_1	mRNA	978712	981210	.	-	.	ID=7000000516956357;Parent=7000000516956351
SC020.%28contig_18.1%29	A_oryzae_RIB40_INSERTASSEMBLYFROMGENBANK_1	gene	978712	981210	.	-	.	ID=7000000516956351;Name=Unknown
SC020.%28contig_18.1%29	A_oryzae_RIB40_INSERTASSEMBLYFROMGENBANK_1	CDS	978712	981210	.	-	0	ID=AO090020000391;Parent=7000000516956357
SC020.%28contig_18.1%29	A_oryzae_RIB40_INSERTASSEMBLYFROMGENBANK_1	exon	978712	981210	.	-	.	ID=7000000516956361;Parent=7000000516956357
##FASTA
>SC020.(contig_18.1)
aattttttaatttattaaattagatattttaaatatatttttataatatttaaatattat
aaactattataatctattattattataataataatattatttttaatatagtatttttat
atttgaattatttttttaattataaataattttcttttatattaaataattttcttttat
............
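
One way to check whether the three spellings of the contig name in the sample agree is to decode the escaped seqid and compare it with the contig's ID attribute and the FASTA header. This Python sketch assumes the columns are tab-separated, as GFF3 requires, and that a loader decodes percent-escapes before matching names; whether GBrowse's loader does exactly this is an assumption. `parse_attributes` is a hypothetical helper.

```python
from urllib.parse import unquote

# The contig line and FASTA header copied from the sample above.
gff_line = ("SC020.%28contig_18.1%29\tA_oryzae_RIB40_INSERTASSEMBLYFROMGENBANK_1\t"
            "contig\t1\t1824958\t.\t.\t.\tID=SC020.(contig_18.1);Name=SC020.(contig_18.1)")
fasta_header = ">SC020.(contig_18.1)"


def parse_attributes(col9: str) -> dict:
    """Split column 9 into tag -> decoded value (comma-separated lists not split)."""
    attrs = {}
    for pair in col9.split(";"):
        if pair:
            tag, _, value = pair.partition("=")
            attrs[unquote(tag)] = unquote(value)
    return attrs


cols = gff_line.rstrip("\n").split("\t")
seqid = unquote(cols[0])                # decode %28/%29 back to parentheses
attrs = parse_attributes(cols[8])
fasta_id = fasta_header[1:].split()[0]  # first word after ">"

print(seqid)                            # SC020.(contig_18.1)
print(seqid == attrs["ID"] == fasta_id)  # True
```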

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Gmod-gbrowse mailing list
Gmod-gbrowse@...
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: issue with reference name

by Maureen J. Donlin

Have you considered renaming the contigs with a simpler nomenclature?
A Perl script could do that fairly quickly.
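
Maureen suggests scripting the renaming in Perl; the same idea is sketched here in Python. The renaming rule in `simplify` (drop every character outside `[A-Za-z0-9._-]`) and the function names are hypothetical. The sketch rewrites column 1, any column 9 value that spells the old landmark name, and FASTA headers, and returns the old-to-new mapping so the change can be recorded or undone. Directives such as `##sequence-region` are passed through unchanged and would need the same treatment if present.

```python
import re
from urllib.parse import unquote


def simplify(name: str) -> str:
    """Hypothetical renaming rule: drop every character outside [A-Za-z0-9._-]."""
    return re.sub(r"[^A-Za-z0-9._-]", "", name)


def rename_landmarks(lines):
    """Rename landmarks in GFF3 lines, including an optional ##FASTA section.

    Returns (new_lines, mapping); keeping the old -> new mapping makes the
    renaming documented and reversible.
    """
    mapping = {}

    def new_name(old):
        return mapping.setdefault(old, simplify(old))

    out, in_fasta = [], False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            in_fasta = True
            out.append(line)
        elif in_fasta and line.startswith(">"):
            # Rename only the first word of the FASTA header.
            name, _, rest = line[1:].partition(" ")
            out.append(">" + new_name(name) + (" " + rest if rest else ""))
        elif in_fasta or line.startswith("#") or not line.strip():
            # Sequence, directive, comment and blank lines pass through unchanged.
            out.append(line)
        else:
            cols = line.split("\t")
            old = unquote(cols[0])  # column 1 is percent-escaped
            cols[0] = new_name(old)
            # Rewrite attribute values that spell the old landmark name.
            pairs = [p.partition("=") for p in cols[8].split(";") if p]
            cols[8] = ";".join(
                tag + "=" + (new_name(old) if unquote(val) == old else val)
                for tag, _, val in pairs
            )
            out.append("\t".join(cols))
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("renaming rule gave two landmarks the same name")
    return out, mapping


sample = [
    "SC020.%28contig_18.1%29\tA_oryzae_RIB40_INSERTASSEMBLYFROMGENBANK_1\tcontig\t"
    "1\t1824958\t.\t.\t.\tID=SC020.(contig_18.1);Name=SC020.(contig_18.1)",
    "##FASTA",
    ">SC020.(contig_18.1)",
    "aattttttaatttattaaattag",
]
new_lines, mapping = rename_landmarks(sample)
print(mapping)  # {'SC020.(contig_18.1)': 'SC020.contig_18.1'}
```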

Maureen

On Wed, Nov 4, 2009 at 2:06 PM, Prachi Shah <prachi@...> wrote:

--
Maureen J. Donlin, Ph.D.

Research Associate Professor
Dept. of Biochemistry & Molecular Biology
Dept. of Molecular Microbiology & Immunology
Saint Louis University School of Medicine
507 Doisy Research Center
314-977-8858

Re: issue with reference name

by Prachi Shah

Hi Maureen,

Yes, I have considered that. But I am on the receiving end of this
data and would prefer not to modify it if I can avoid it.

Thanks,
Prachi

On Wed, Nov 4, 2009 at 12:17 PM, Maureen Donlin <donlinmj@...> wrote:

> Have you considered renaming the contigs with a simpler nomenclature?
> A perl script could do that fairly quickly.
>
> Maureen

Re: issue with reference name

by Scott Cain

Hi Prachi,

I can't say for sure at the moment, but I bet the parentheses are
going to cause a problem. First try URI escaping in column 9, but
failing that, you'll probably need to remove the parens. You might
also want to ask the organization creating the data not to use special
characters.
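
Scott's first suggestion, URI-escaping the parentheses in column 9, can be sketched in Python. `escape_value`, `escape_column9` and `COL9_ESCAPE` are hypothetical names, not part of GBrowse or BioPerl. The escaped set is an assumption built from the characters GFF3 reserves in column 9 (; = & ,) and those it requires escaping everywhere (%, tab, newline, carriage return, control characters), plus the parentheses, which the spec does not list as reserved in column 9. Existing escapes are decoded first, so running the function twice gives the same result.

```python
from urllib.parse import unquote

# GFF3-reserved characters for column 9, plus the parentheses (included only
# to follow the URI-escaping suggestion, not because the spec reserves them).
COL9_ESCAPE = set(";=&,%()\t\n\r")


def escape_value(raw: str) -> str:
    """Percent-encode characters in COL9_ESCAPE and ASCII control codes."""
    return "".join(
        f"%{ord(c):02X}" if c in COL9_ESCAPE or ord(c) < 0x20 or ord(c) == 0x7F else c
        for c in raw
    )


def escape_column9(line: str) -> str:
    """Re-escape every attribute value in column 9 of one GFF3 feature line."""
    cols = line.rstrip("\n").split("\t")
    pairs = []
    for pair in cols[8].split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # Decode existing escapes first to avoid double-encoding, then escape
        # each comma-separated value on its own.
        escaped = ",".join(escape_value(unquote(v)) for v in value.split(","))
        pairs.append(tag + "=" + escaped)
    cols[8] = ";".join(pairs)
    return "\t".join(cols)


line = ("SC020.%28contig_18.1%29\tA_oryzae_RIB40_INSERTASSEMBLYFROMGENBANK_1\tcontig\t"
        "1\t1824958\t.\t.\t.\tID=SC020.(contig_18.1);Name=SC020.(contig_18.1)")
print(escape_column9(line).split("\t")[8])
# ID=SC020.%28contig_18.1%29;Name=SC020.%28contig_18.1%29
```

If GBrowse still fails to match the escaped names, the fallback Scott mentions is removing the parentheses from the names altogether.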

Scott


On Nov 4, 2009, at 3:06 PM, Prachi Shah wrote:

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research




