[SOLVED] Chado-GBrowse2 multiple segments error -- resolved or data issue?

by James M. Ward

It's been a few days without GMOD Email feeds... I miss reading them with coffee.

I tracked down a nagging error in the Chado/GBrowse2 data accessor, and I have either (1) resolved it or (2) found something I did wrong when loading the data into Chado.
My question is whether anyone recognizes this issue as one caused by faulty data (which I need to clean up sooner rather than later), or whether this fix can be applied as a patch as-is?

The issue:
-- I loaded clone mappings with chromosomal coordinates and Target coordinates, referring to their own FASTA sequences previously loaded into Chado.
-- GBrowse reported an error in the apache error_log:
" STACK Bio::DB::Das::Chado::Segment::new /home/jarvislab/Build/Perl/lib/perl5/site_perl/5.10.0/Bio/DB/Das/Chado/Segment.pm:269"
-- The issue is apparently that the SQL to return segments by uniquename returned two rows instead of the expected one row.
-- The database had two entries in featureloc for this feature_id, one with chromosomal coordinates, one with clone coordinates.
-- My "fix" was to add a condition to the SQL requiring that the feature's name differ from its source feature's name (i.e., that its coordinates aren't relative to itself).

The join on "sf" and the two conditions that reference it are the new additions:

    # Fetch a feature's location by feature_id, skipping any featureloc row
    # whose source feature (sf) has the same name as the feature itself.
    my $fetch_uniquename_query = $factory->dbh->prepare( "
       select f.name,fl.fmin,fl.fmax,f.uniquename,f.is_obsolete,fl.srcfeature_id,fl.strand
       from feature f, featureloc fl, feature sf
       where f.feature_id = ? and
             f.feature_id = fl.feature_id and
             fl.srcfeature_id = sf.feature_id and
             sf.name != f.name
         ");

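For anyone who wants to reproduce the two-row behaviour without a Chado instance, here is a minimal sketch. It is an illustration under stated assumptions, not the adaptor's code: it uses Python's sqlite3 with a cut-down feature/featureloc schema (real Chado has more columns and runs on PostgreSQL), the feature_ids are made up, and the "original" query is assumed to be the statement above without the "sf" join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down stand-ins for the Chado tables (assumption: only the columns
# that the query touches are modelled here).
conn.executescript("""
    CREATE TABLE feature (
        feature_id  INTEGER PRIMARY KEY,
        name        TEXT,
        uniquename  TEXT,
        is_obsolete INTEGER DEFAULT 0
    );
    CREATE TABLE featureloc (
        feature_id    INTEGER,
        srcfeature_id INTEGER,
        fmin          INTEGER,
        fmax          INTEGER,
        strand        INTEGER
    );
""")

conn.executemany(
    "INSERT INTO feature (feature_id, name, uniquename) VALUES (?, ?, ?)",
    [
        (1, "chr2", "chr2"),                              # chromosome
        (2, "0205P0028M03", "0205P0028M03"),              # clone sequence
        (3, "0205P0028M03", "exonerateMay2009_107776"),   # alignment, named after the clone
    ],
)

# The alignment has two locations: one on the chromosome, one on the clone.
conn.executemany(
    "INSERT INTO featureloc VALUES (?, ?, ?, ?, ?)",
    [
        (3, 1, 18674306, 18674385, 1),  # chromosomal coordinates
        (3, 2, 2, 81, 1),               # clone (Target) coordinates
    ],
)

# Assumed pre-patch query: the same statement without the sf join.
ORIGINAL = """
    SELECT f.name, fl.fmin, fl.fmax, f.uniquename, f.is_obsolete,
           fl.srcfeature_id, fl.strand
    FROM feature f, featureloc fl
    WHERE f.feature_id = ? AND f.feature_id = fl.feature_id
"""

# Patched query from the message above.
PATCHED = """
    SELECT f.name, fl.fmin, fl.fmax, f.uniquename, f.is_obsolete,
           fl.srcfeature_id, fl.strand
    FROM feature f, featureloc fl, feature sf
    WHERE f.feature_id = ? AND f.feature_id = fl.feature_id
      AND fl.srcfeature_id = sf.feature_id AND sf.name != f.name
"""

original_rows = conn.execute(ORIGINAL, (3,)).fetchall()
patched_rows = conn.execute(PATCHED, (3,)).fetchall()

print(len(original_rows))  # one row per featureloc entry
print(patched_rows)        # only the row whose source is not the clone itself
```

Note that the filter only works because the alignment and the clone share a name; a target location stored under a different name would still come back as a second row.
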
Lincoln, I got swamped the past two weeks, but am ready to remind you to tell me about GBrowse2 Finders.  :-)  Thank you sir!

James M. Ward
Bioinformatics and Computational Biology
Department of Neurobiology
Duke University Medical Center
james.m.ward@...
jmw86069@...
(919) 423-1107

_______________________________________________
Gmod-gbrowse mailing list
Gmod-gbrowse@...
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: [SOLVED] Chado-GBrowse2 multiple segments error -- resolved or data issue?

by Scott Cain-4

Hi James,

My guess is that there is an inconsistency between what the GBrowse  
Chado adaptor is expecting and what is in your data.  Could you send a  
sample of it so I can play with it?  It seems to me that the fix you  
are describing shouldn't be necessary, but if you are using a  
reasonable data representation, then maybe it is.

Also, putting an unrelated reminder at the bottom of an email is a  
good way for it to get missed :-)

Scott


On Nov 2, 2009, at 1:16 PM, James M. Ward wrote:

> [...]

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research






Re: [UN-SOLVED] Chado-GBrowse2 multiple segments error -- resolved or data issue?

by James M. Ward-2

Scott,

For some reason, I tend to try things out rather than asking the experts...  I think the problem may be related to how I'm loading alignments into Chado in GFF3 format.  I followed the example at http://gmod.org/wiki/GFF#Alignments, which shows the "Name" attribute set to the name of the sequence being aligned, the same value that appears in the "Target" attribute right after it.

When I load data this way, the loader keeps the name and creates a uniquename based on the ID I used.  Somehow it seems to confuse the name of the sequence with the name of the alignment that has that sequence as its "Target."  But maybe it's because the clones often have more than one mapped location, so a lookup by name returns multiple segments??  I can't follow it.

I hope you'll recognize an obvious error by eye, and I'll correct it and be on my way. :-)

Here is an example of my original GFF3, followed by an updated version. I'm using a clone with two mapped locations.  I can also find one with two locations on the same chromosome.

Original gff3:
(with Name set to the clone name; the clone's DNA sequence was pre-loaded under that name.)

chr2	exonerate_May2009	cDNA_match	18674307	18674385	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 3 81
chr2	exonerate_May2009	cDNA_match	18674481	18674636	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 82 237
chr2	exonerate_May2009	cDNA_match	18674744	18674822	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 238 316
chr2	exonerate_May2009	cDNA_match	18675258	18675331	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 317 390
chr2	exonerate_May2009	cDNA_match	18675417	18675506	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 391 480
chr2	exonerate_May2009	cDNA_match	18677446	18677465	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 481 500
chrUn	exonerate_May2009	cDNA_match	80342142	80342194	1088	-	.	ID=exonerateMay2009_107778;Name=0205P0028M03;Target=0205P0028M03 695 747
chrUn	exonerate_May2009	cDNA_match	80342194	80342394	1088	-	.	ID=exonerateMay2009_107778;Name=0205P0028M03;Target=0205P0028M03 492 692

The interesting thing here is that the "ID" seems to be lost: the uniquename is the ID plus a number sequence, so they all differ.

Updated gff3:
(I removed "Name" altogether. It now loads and displays the alignment properly, BUT searching for 0205P0028M03 doesn't work.)

chr2	exonerate_May2009	cDNA_match	18674307	18674385	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 3 81
chr2	exonerate_May2009	cDNA_match	18674481	18674636	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 82 237
chr2	exonerate_May2009	cDNA_match	18674744	18674822	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 238 316
chr2	exonerate_May2009	cDNA_match	18675258	18675331	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 317 390
chr2	exonerate_May2009	cDNA_match	18675417	18675506	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 391 480
chr2	exonerate_May2009	cDNA_match	18677446	18677465	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 481 500
chrUn	exonerate_May2009	cDNA_match	80342142	80342194	1088	-	.	ID=exonerateMay2009_107778;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 695 747
chrUn	exonerate_May2009	cDNA_match	80342194	80342394	1088	-	.	ID=exonerateMay2009_107778;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 492 692

When I search by name, it returns only the clone sequence itself, not the position of the clone on the chromosome.  Can I force it to use the synonym or description for the query instead of the name of the cDNA_clone feature?

I appreciate your help!

James

On Mon, 2009-11-02 at 13:27 -0500, Scott Cain wrote:

> [...]

Re: [UN-SOLVED] Chado-GBrowse2 multiple segments error -- resolved or data issue?

by James M. Ward-2

Scott,

Wow, thank you sir.  I keep getting confused among the HOWTOs and Admin Tutorials and perldocs (which are great, I just haven't assimilated all of it yet).

Your SQL would have returned a series of uniquenames (each with a number sequence appended to the end), all with the same name.  Most of the time the name was the clone name, but sometimes (for some reason) it was the "exonerate_May2009_11111" name.  It's too late to check now though; I just blew them away!  :-)

Now what I need is for users to be able to find these parent matches by name... do I put the clone name as the "Name" of the parent match, or will that put me back at the problem from my first email?

And in general, if we want someone to be able to query and find a feature, do we put those aliases on each feature (or parent feature), or is there a hook for storing them in a side table?  I like the CGI for adding annotations to the GBrowse balloons, etc., but that doesn't help with queries.  I could easily have missed this point in the docs; I was just expecting to override the search when that time comes...

Thank you!

James

On Wed, 2009-11-04 at 15:45 -0500, Scott Cain wrote:
Hi James,

The Chado loading script, gmod_bulk_load_gff3.pl, does not support
grouping features by giving them the same ID.  Instead, you need to
create a single match feature that spans the length of the hit (i.e.,
from the min start to the max end), and then a set of match_part
features, one for each segment of the hit, that have the match feature
as their Parent.  The parent feature can be a match or any of its is_a
children, like cDNA_match or cross_genome_match.  I thought this was
described in the documentation for the loader, but now I can't find it
anywhere.  I'll add it after finishing this email.
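
As an illustration of the layout described above, here is a minimal sketch (not part of the loader) that regroups GFF3 lines sharing an ID into one parent feature plus match_part children. The function names are hypothetical, the sample lines are taken from the GFF3 posted earlier in this thread, and two choices are assumptions rather than anything confirmed here: the parent keeps the original type, score and Name, and it gets a Target spanning the smallest to largest target coordinate.

```python
from collections import OrderedDict


def parse_attributes(column9):
    """Split a GFF3 attribute column into an ordered key/value mapping."""
    attrs = OrderedDict()
    for pair in column9.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def regroup(lines):
    """Turn same-ID segments into a parent match plus match_part children."""
    groups = OrderedDict()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        groups.setdefault(attrs["ID"], []).append((cols, attrs))

    out = []
    for match_id, rows in groups.items():
        first_cols, first_attrs = rows[0]
        # The parent spans from the smallest start to the largest end.
        start = min(int(cols[3]) for cols, _ in rows)
        end = max(int(cols[4]) for cols, _ in rows)
        targets = [attrs["Target"].split() for _, attrs in rows]
        target_start = min(int(t[1]) for t in targets)
        target_end = max(int(t[2]) for t in targets)

        parent_attrs = ["ID=" + match_id]
        if "Name" in first_attrs:
            parent_attrs.append("Name=" + first_attrs["Name"])
        parent_attrs.append(
            "Target=%s %d %d" % (targets[0][0], target_start, target_end)
        )
        out.append("\t".join(
            first_cols[:3] + [str(start), str(end)] + first_cols[5:8]
            + [";".join(parent_attrs)]
        ))

        # Each original segment becomes a match_part pointing at the parent.
        for cols, attrs in rows:
            child_attrs = "Parent=%s;Target=%s" % (match_id, attrs["Target"])
            out.append("\t".join(
                cols[:2] + ["match_part"] + cols[3:8] + [child_attrs]
            ))
    return out


def gff_line(seqid, start, end, score, strand, attributes):
    """Build one tab-separated sample line (source and type fixed for brevity)."""
    return "\t".join([seqid, "exonerate_May2009", "cDNA_match",
                      str(start), str(end), str(score), strand, ".", attributes])


SAMPLE = [
    gff_line("chr2", 18674307, 18674385, 2362, "+",
             "ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 3 81"),
    gff_line("chr2", 18674481, 18674636, 2362, "+",
             "ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 82 237"),
    gff_line("chrUn", 80342142, 80342194, 1088, "-",
             "ID=exonerateMay2009_107778;Name=0205P0028M03;Target=0205P0028M03 695 747"),
    gff_line("chrUn", 80342194, 80342394, 1088, "-",
             "ID=exonerateMay2009_107778;Name=0205P0028M03;Target=0205P0028M03 492 692"),
]

for converted in regroup(SAMPLE):
    print(converted)
```

The children carry no ID or Name of their own, so each hit has exactly one named feature.
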

I'm a little surprised that the loader didn't complain about the IDs
being the same.  I'm curious what happened to the features when they
went into the database.  For instance, what do you find if you run
this query on the database:

   SELECT name, uniquename FROM feature WHERE uniquename LIKE 'exonerateMay2009_107776%';

Scott



On Nov 4, 2009, at 8:42 AM, James M. Ward wrote:

> [...]

Re: [UN-SOLVED] Chado-GBrowse2 multiple segments error -- resolved or data issue?

by James M. Ward-2

Scott,

Great, I'm slowly understanding more... If "Target" gets put into the Alias table, it would make the little parts searchable by name (which, like you said, is not useful, and it would likely be annoying to get 37 tiny hit results back on the karyotype display).

So I'll "Name" the parent but not the children, and will use the "--no_target_syn" switch to turn off default indexing of "Target" and see how it all goes.
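
A small sketch of how that load step might be scripted follows. The script name and the "--no_target_syn" switch come from this thread; the helper function, the GFF3 file name, and the use of "--gfffile" to pass the input are assumptions for illustration, so check the loader's perldoc before relying on them. Database connection options are left out.

```python
import shlex


def loader_command(gff_path, index_targets=False):
    """Build the argument list for the bulk loader (hypothetical helper)."""
    command = ["gmod_bulk_load_gff3.pl", "--gfffile", gff_path]
    if not index_targets:
        # Skip the default alias that the loader creates for each Target.
        command.append("--no_target_syn")
    return command


# "clone_matches.gff3" is a placeholder file name.
command = loader_command("clone_matches.gff3")
print(" ".join(shlex.quote(part) for part in command))
```

The list form can be handed to subprocess.run directly once the connection options for the local database have been added.
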

Thank you for pointing out both options!

James

On Wed, 2009-11-04 at 16:21 -0500, Scott Cain wrote:
Hi James,

Yes, putting the Name in the parent feature will make it so that
people can find it by name; the default for the loader is to also make
an alias for the Target of the hit, so that users searching by the
name of the target sequence will also find the hit that way (that can
be turned off at load time though).  I don't think it makes much sense
to put Aliases in the child features: who is going to be looking for
the third HSP of a BLAST hit, right?  Or did I not understand your
question?
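
To make the name-versus-alias distinction concrete, here is a rough sketch of a lookup that matches either the feature name or a synonym. It is only an illustration under assumptions: it uses Python's sqlite3 with heavily trimmed versions of Chado's feature, synonym and feature_synonym tables, the rows are invented, and it is not the query the GBrowse adaptor actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed stand-ins for the Chado tables (assumption: the real tables
# have additional columns such as type_id and pub_id).
conn.executescript("""
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        name       TEXT,
        uniquename TEXT
    );
    CREATE TABLE synonym (
        synonym_id INTEGER PRIMARY KEY,
        name       TEXT
    );
    CREATE TABLE feature_synonym (
        synonym_id INTEGER,
        feature_id INTEGER
    );
""")

conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?)",
    [
        (10, None, "exonerateMay2009_107776"),            # parent match, no Name
        (11, "0205P0028M03", "exonerateMay2009_107778"),  # parent match with Name
    ],
)
conn.execute("INSERT INTO synonym VALUES (1, '0205P0028M03')")
# Feature 10 is reachable only through its alias.
conn.execute("INSERT INTO feature_synonym VALUES (1, 10)")


def find_by_name_or_synonym(connection, query):
    """Return uniquenames of features whose name or synonym equals query."""
    sql = """
        SELECT f.uniquename FROM feature f WHERE f.name = :q
        UNION
        SELECT f.uniquename
        FROM feature f
        JOIN feature_synonym fs ON fs.feature_id = f.feature_id
        JOIN synonym s ON s.synonym_id = fs.synonym_id
        WHERE s.name = :q
    """
    return sorted(row[0] for row in connection.execute(sql, {"q": query}))


hits = find_by_name_or_synonym(conn, "0205P0028M03")
print(hits)
```

Both parent matches come back here, one through its name and one through its synonym, which is why a Name on the parent plus the default Target alias would be redundant for the same clone.
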

I figured what you described was what was in the feature table--so  
those things wouldn't end up being grouped together in GBrowse.

Scott



On Nov 4, 2009, at 1:09 PM, James M. Ward wrote:

> Scott,
>
> Wow thank you sir.  I keep getting confused among the HOWTO's and  
> Admin Tutorials and perldocs (which are great, I just haven't  
> assimilated all of it yet.)
>
> Your SQL would produce a series of uniquenames (a number sequence  
> appended to the end) but all would get the same name.  Most of the  
> time it would get the clone name, but sometimes (for some reason) it  
> would get the "exonerate_May2009_11111" name.  Too late to check now  
> though, I just blew them away!  :-)
>
> Now what I need is for users to be able to find these parent matches  
> by name... do I put the clone name as the "Name" of the parent  
> match, or will that put me back up to my first Email?
>
> And in general, if we want someone to be able to query and find a  
> feature, are we putting those aliases onto each feature (or parent  
> feature), or is there a hook for storing it in a side-table?  I like  
> the CGI for adding annotations to the GBrowse balloons, etc -- but  
> that doesn't help with queries.  I could've easily missed this point  
> in the docs, I was just expecting to override the search when that  
> time comes...
>
> Thank you!
>
> James
>
> On Wed, 2009-11-04 at 15:45 -0500, Scott Cain wrote:
>>
>> Hi James,
>>
>> The Chado loading script, gmod_bulk_load_gff3.pl, does not support
>> grouping features by giving them the same ID.  Instead, you need to
>> create a single match feature that spans the length of the hit (ie,
>> from the min start to max end), and then a set of match_part features
>> for each of the segments of the hit that have the match feature as
>> Parent.  The parent feature can be a match or any of its is_a
>> children, like cDNA_match or cross_genome_match.  I thought this was
>> described in the documentation for the loader, but now I can't find  
>> it
>> anywhere.  I'll add it after finishing this email.
>>
>> I'm a little surprised that the loader didn't complain about the IDs
>> being the same.  I'm curious what happened to the features when they
>> went into the database.  For instance, what do you find if you did
>> this query on the database:
>>
>>    SELECT name,uniquename FROM feature WHERE uniquename like
>> 'exonerateMay2009_107776%';
>>
>> Scott
>>
>>
>>
>> On Nov 4, 2009, at 8:42 AM, James M. Ward wrote:
>>
>> > Scott,
>> >
>> > For some reason, I keep tending to try things out rather than  
>> asking
>> > the experts...  I think the problem may be related to how I'm
>> > loading in alignments to Chado with the gff3 format.  I followed  
>> the
>> > example here: http://gmod.org/wiki/GFF#Alignments which shows the
>> > "Name" field equal to the name of the sequence being aligned, also
>> > same as in the "Target" field just afterwards.
>> >
>> > When I load data this way, it keeps the name, and creates a unique
>> > name based upon the ID I used.  Somehow it seems to cross up the
>> > Name of the sequence with the name of the alignment which has that
>> > sequence as the "Target."  But maybe it's because the clones often
>> > have more than one mapped location, so the name may be returning
>> > multiple segments because of that??  I can't follow it.
>> >
>> > I hope you'll recognize an obvious error by eye, and I'll correct  
>> it
>> > and be on my way. :-)
>> >
>> > Here is an example of my original GFF3, and then a version
>> > afterwards. I'm using a clone with two mapped locations.  I can  
>> find
>> > one with two locations on the same chromosome too.
>> >
>> > Original gff3:
>> > (with Name set to the clone name, also which has DNA sequence
>> > pre-loaded by that name.)
>> > chr2	exonerate_May2009	cDNA_match	18674307	18674385	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 3 81
>> > chr2	exonerate_May2009	cDNA_match	18674481	18674636	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 82 237
>> > chr2	exonerate_May2009	cDNA_match	18674744	18674822	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 238 316
>> > chr2	exonerate_May2009	cDNA_match	18675258	18675331	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 317 390
>> > chr2	exonerate_May2009	cDNA_match	18675417	18675506	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 391 480
>> > chr2	exonerate_May2009	cDNA_match	18677446	18677465	2362	+	.	ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 481 500
>> > chrUn	exonerate_May2009	cDNA_match	80342142	80342194	1088	-	.	ID=exonerateMay2009_107778;Name=0205P0028M03;Target=0205P0028M03 695 747
>> > chrUn	exonerate_May2009	cDNA_match	80342194	80342394	1088	-	.	ID=exonerateMay2009_107778;Name=0205P0028M03;Target=0205P0028M03 492 692
>> > Interesting thing here is that the "ID" seems lost -- the
>> > unique_name is the ID plus some number sequence so they all differ.
>> >
>> > Updated gff3:
>> > (I removed "Name" altogether -- and now it can load and display the
>> > alignment properly, BUT search by 0205P0028M03 doesn't work.)
>> > chr2	exonerate_May2009	cDNA_match	18674307	18674385	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 3 81
>> > chr2	exonerate_May2009	cDNA_match	18674481	18674636	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 82 237
>> > chr2	exonerate_May2009	cDNA_match	18674744	18674822	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 238 316
>> > chr2	exonerate_May2009	cDNA_match	18675258	18675331	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 317 390
>> > chr2	exonerate_May2009	cDNA_match	18675417	18675506	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 391 480
>> > chr2	exonerate_May2009	cDNA_match	18677446	18677465	2362	+	.	ID=exonerateMay2009_107776;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 481 500
>> > chrUn	exonerate_May2009	cDNA_match	80342142	80342194	1088	-	.	ID=exonerateMay2009_107778;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 695 747
>> > chrUn	exonerate_May2009	cDNA_match	80342194	80342394	1088	-	.	ID=exonerateMay2009_107778;Alias=0205P0028M03;Note=0205P0028M03;Target=0205P0028M03 492 692
>> > When I search by name, it returns only the clone sequence itself,
>> > not the position of the clone on the chromosome (can I force it to
>> > use the synonym or description for the query instead of the name of
>> > type cDNA_clone?)
>> >
>> > I appreciate your help!
>> >
>> > James
>> >
>> > On Mon, 2009-11-02 at 13:27 -0500, Scott Cain wrote:
>> >>
>> >> Hi James,
>> >>
>> >> My guess is that there is an inconsistency between what the GBrowse
>> >> Chado adaptor is expecting and what is in your data.  Could you
>> >> send a sample of it so I can play with it?  It seems to me that the
>> >> fix you are describing shouldn't be necessary, but if you are using
>> >> a reasonable data representation, then maybe it is.
>> >>
>> >> Also, putting an unrelated reminder at the bottom of an email is a
>> >> good way for it to get missed :-)
>> >>
>> >> Scott
>> >>
>> >>
James M. Ward
Bioinformatics and Computational Biology
Department of Neurobiology
Duke University Medical Center
(919) 423-1107

_______________________________________________
Gmod-gbrowse mailing list
Gmod-gbrowse@...
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: [UN-SOLVED] Chado-GBrowse2 multiple segments error -- resolved or data issue?

by Scott Cain-4

Hi James,

The Chado loading script, gmod_bulk_load_gff3.pl, does not support
grouping features by giving them the same ID.  Instead, you need to
create a single match feature that spans the length of the hit (i.e.,
from the min start to max end), and then a set of match_part features,
one for each segment of the hit, that have the match feature as
Parent.  The parent feature can be a match or any of its is_a
children, like cDNA_match or cross_genome_match.  I thought this was
described in the documentation for the loader, but now I can't find it
anywhere.  I'll add it after finishing this email.
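
The restructuring described above can be sketched in a few lines of
Python.  This is only an illustration, not part of the loader: the
function names (parse_attributes, regroup) are made up here, and
leaving Target on the match_part children rather than on the parent
is an assumption that should be checked against the loader's
documentation.  Rows that share an ID are collapsed into one parent
feature spanning min start to max end, and each original row becomes
a match_part child pointing at it with Parent.

```python
from collections import OrderedDict


def parse_attributes(field):
    """Split a GFF3 column-9 string into an ordered key -> value mapping."""
    attrs = OrderedDict()
    for pair in field.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def regroup(lines):
    """Turn rows that share an ID into one parent row plus match_part rows."""
    groups = OrderedDict()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attributes(cols[8])
        groups.setdefault(attrs["ID"], []).append((cols, attrs))

    out = []
    for match_id, rows in groups.items():
        first_cols, first_attrs = rows[0]
        # The parent spans the whole hit: min start to max end.
        start = min(int(cols[3]) for cols, _ in rows)
        end = max(int(cols[4]) for cols, _ in rows)
        parent_attrs = "ID=" + match_id
        if "Name" in first_attrs:
            parent_attrs += ";Name=" + first_attrs["Name"]
        out.append("\t".join(
            first_cols[:3] + [str(start), str(end)] + first_cols[5:8] + [parent_attrs]
        ))
        # Each original segment becomes a match_part child of that parent.
        for cols, attrs in rows:
            child_attrs = "Parent=" + match_id
            if "Target" in attrs:  # assumption: Target stays on the children
                child_attrs += ";Target=" + attrs["Target"]
            out.append("\t".join(cols[:2] + ["match_part"] + cols[3:8] + [child_attrs]))
    return out


rows = [
    "chrUn\texonerate_May2009\tcDNA_match\t80342142\t80342194\t1088\t-\t.\t"
    "ID=exonerateMay2009_107778;Name=0205P0028M03;Target=0205P0028M03 695 747",
    "chrUn\texonerate_May2009\tcDNA_match\t80342194\t80342394\t1088\t-\t.\t"
    "ID=exonerateMay2009_107778;Name=0205P0028M03;Target=0205P0028M03 492 692",
]
for row in regroup(rows):
    print(row)
```

For the two chrUn rows this prints one cDNA_match parent followed by
two match_part rows.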

I'm a little surprised that the loader didn't complain about the IDs
being the same.  I'm curious what happened to the features when they
went into the database.  For instance, what do you get if you run
this query on the database:

   SELECT name, uniquename FROM feature WHERE uniquename LIKE 'exonerateMay2009_107776%';
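
Since the loader may not complain, a check run before loading can
flag IDs that appear on more than one line.  The sketch below is a
hypothetical helper (repeated_ids is not a GMOD function).  Note that
GFF3 itself allows one ID on several lines for a discontinuous
feature, so a hit here only means the file needs regrouping for this
loader, not that it is invalid GFF3.

```python
from collections import Counter


def repeated_ids(lines):
    """Return {ID: count} for every ID that occurs on more than one GFF3 line."""
    counts = Counter()
    for line in lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        for pair in cols[8].split(";"):
            key, _, value = pair.partition("=")
            if key == "ID":
                counts[value] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}


sample = [
    "##gff-version 3",
    "chr2\texonerate_May2009\tcDNA_match\t18674307\t18674385\t2362\t+\t.\t"
    "ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 3 81",
    "chr2\texonerate_May2009\tcDNA_match\t18674481\t18674636\t2362\t+\t.\t"
    "ID=exonerateMay2009_107776;Name=0205P0028M03;Target=0205P0028M03 82 237",
    "chrUn\texonerate_May2009\tcDNA_match\t80342142\t80342194\t1088\t-\t.\t"
    "ID=exonerateMay2009_107778;Name=0205P0028M03;Target=0205P0028M03 695 747",
]
print(repeated_ids(sample))
```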

Scott




-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research






Re: [UN-SOLVED] Chado-GBrowse2 multiple segments error -- resolved or data issue?

by Scott Cain-4

Hi James,

Yes, putting the Name in the parent feature will make it so that
people can find it by name; the default for the loader is to also make
an alias for the Target of the hit, so that users searching by the
name of the target sequence will also find the hit that way (that can
be turned off at load time, though).  I don't think it makes much sense
to put Aliases in the child features: who is going to be looking for
the third HSP of a BLAST hit, right?  Or did I not understand your
question?

I figured what you described was what was in the feature table--so  
those things wouldn't end up being grouped together in GBrowse.

Scott



On Nov 4, 2009, at 1:09 PM, James M. Ward wrote:

> Scott,
>
> Wow thank you sir.  I keep getting confused among the HOWTO's and  
> Admin Tutorials and perldocs (which are great, I just haven't  
> assimilated all of it yet.)
>
> Your SQL would produce a series of uniquenames (a number sequence  
> appended to the end) but all would get the same name.  Most of the  
> time it would get the clone name, but sometimes (for some reason) it  
> would get the "exonerate_May2009_11111" name.  Too late to check now  
> though, I just blew them away!  :-)
>
> Now what I need is for users to be able to find these parent matches  
> by name... do I put the clone name as the "Name" of the parent  
> match, or will that put me back up to my first Email?
>
> And in general, if we want someone to be able to query and find a  
> feature, are we putting those aliases onto each feature (or parent  
> feature), or is there a hook for storing it in a side-table?  I like  
> the CGI for adding annotations to the GBrowse balloons, etc -- but  
> that doesn't help with queries.  I could've easily missed this point  
> in the docs, I was just expecting to override the search when that  
> time comes...
>
> Thank you!
>
> James
>

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research




