Re: gbrowse_syn display question

Hi Sheldon,

Can you please also give me some information on clustal2hit? What are the different columns in the resulting hits file? What should the final format of the file be?

Nikhat

Sheldon McKay wrote:
> Hi Chris,
>
> Clustalw is not suitable for the use case you describe. Jason's email
> describes a pipeline similar to the one wormbase uses, except they
> convert their alignment data to clustalw format. The format is a
> commonly used alignment data format and is merely a convenience. It
> does not mean the program clustalw should be used to actually make the
> alignments.
>
> There are a variety of structured and ad hoc ways to get there but
> what you need to end up with is a gbrowse_syn database loading file
> that uses this specification:
> http://gmod.org/wiki/GBrowse_syn_Database#alignment_data_loading_format
>
> Sheldon
>
> On Thu, May 21, 2009 at 2:29 PM, Town, Christopher D. <cdtown@...> wrote:
>> Hi
>>
>> We're trying to get gbrowse_syn to display synteny blocks across regions of
>> Brassica rapa, Brassica oleracea and some Arabidopsis species.
>>
>> However, when we feed clustalw a set of ~100-150 kb sequences that are known
>> to be more or less syntenic, we get a single clustalw alignment that
>> basically goes from end to end of each sequence and thus describes and
>> displays it as a single synteny block.
>>
>> What we had been expecting (and hoping) to see was that clustalw would make
>> separate multiple sequence alignments for each conserved region (genes and
>> CNS) and skip over regions where there was little or no conservation
>> (something similar to VISTA).
>>
>> In the example file pecan.aln, there are a number of alignment blocks,
>> presumably generated by a single clustalw run using those 5 nematode
>> sequences. I'm wondering if you can tell me exactly what clustalw parameters
>> were used to generate this file. Perhaps we can tweak the parameters for our
>> Brassica data to force clustalw to generate a set of good and separate
>> alignment blocks rather than one wimpy one that simply aligns the sequences
>> more or less end-to-end and then flags the conserved bases.
>>
>> Any and all comments welcome.
>>
>> Thanks
>>
>> Chris Town

_______________________________________________
Gmod-gbrowse mailing list
Gmod-gbrowse@...
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: gbrowse_syn display question

Hi Nikhat,

The alignment database schema, loading format, etc. are described here:
http://gmod.org/wiki/GBrowse_syn_Database

Help with other aspects of gbrowse_syn can be reached from here:
http://gmod.org/wiki/GBrowse_syn

Regarding your previous email with the screenshot, I have not seen what you actually loaded into your alignment database, but your MFA file does not appear to match your display, so I speculate that your loading file was not formatted correctly.

See below for more details.

Regards,
Sheldon

The clustal2hit.pl script takes clustal format (as described here: http://gmod.org/wiki/GBrowse_syn_Database#Clustal_alignment_format) and produces a file in the format needed by load_alignment_database.pl. I made a more generic version of this script, called aln2hit.pl, wherein you can specify other MSA formats.

In your case the usage would be:

    perl aln2hit.pl -i vista_alignment.mfa -f fasta >hits.txt

then

    perl load_alignment_database.pl dbname hits.txt

If your database loads correctly, it should look something like this:

mysql> select * from alignments; select * from map limit 10;
+--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
| hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 | seq1 | bin            | src2   | ref2 | start2 | end2   | strand2 | seq2 |
+--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
|      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa111 | 85   |      1 | 107914 | +       |      |
|      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | +       |      |
|      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
|      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | -       |      |
|      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
|      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa111 | 85   |      1 | 107914 | -       |      |
+--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
6 rows in set (0.00 sec)

+--------+----------+------+------+------+
| map_id | hit_name | src1 | pos1 | pos2 |
+--------+----------+------+------+------+
|      1 | H000001  | boa1 |  300 |    1 |
|      2 | H000001  | boa1 |  400 |    1 |
|      3 | H000001  | boa1 |  500 |    1 |
|      4 | H000001  | boa1 |  600 |    1 |
|      5 | H000001  | boa1 |  700 |    1 |
|      6 | H000001  | boa1 |  800 |    1 |
|      7 | H000001  | boa1 |  900 |    1 |
|      8 | H000001  | boa1 | 1000 |    5 |
|      9 | H000001  | boa1 | 1100 |    5 |
|     10 | H000001  | boa1 | 1200 |   23 |
+--------+----------+------+------+------+
10 rows in set (0.00 sec)

On Tue, Jun 2, 2009 at 2:06 PM, Nikhat Zafar <nzafar@...> wrote:
> Hi Sheldon,

--
Sheldon McKay, PhD
Cold Spring Harbor Laboratory
Office/Mobile: 516-367-6998 / 631-651-9728

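As an illustration of what the map table above holds, here is a minimal Python sketch of how (pos1, pos2) pairs could be sampled from one gapped pairwise alignment. It is not the code of aln2hit.pl or clustal2hit.pl; the function name `coordinate_map`, the `step` parameter and the rule of clamping to `start2` when the second sequence begins with a gap are all assumptions made for the example. Under those assumptions, a long leading gap in the second sequence would give a run of identical pos2 values like the repeated 1s above, though that reading of the data is a guess.

```python
def coordinate_map(aln1, aln2, start1, start2, step=100):
    """Sample (pos1, pos2) pairs every `step` bases of sequence 1.

    aln1 and aln2 are two rows of a gapped alignment ('-' marks a gap);
    start1 and start2 are the 1-based coordinates of their first bases.
    Hypothetical helper, not part of gbrowse_syn.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned rows must have the same length")
    pairs = []
    # Coordinates of the last base consumed so far in each sequence.
    pos1, pos2 = start1 - 1, start2 - 1
    for c1, c2 in zip(aln1, aln2):
        if c1 != "-":
            pos1 += 1
        if c2 != "-":
            pos2 += 1
        # Record a point when sequence 1 has a base on a multiple of `step`.
        if c1 != "-" and pos1 % step == 0:
            # Before sequence 2 has contributed any base, clamp to its start.
            pairs.append((pos1, max(pos2, start2)))
    return pairs


if __name__ == "__main__":
    # Two bases of sequence 1 (columns 3 and 4) have no partner in sequence 2.
    print(coordinate_map("ACGTACGTAC", "AC--ACGTAC", 1, 1, step=2))
```

In this toy case pos2 stalls at 2 while pos1 advances across the gap, which is the same kind of pattern as the flat stretches of pos2 in the table.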
Re: gbrowse_syn display question

Hi Sheldon,

Re: gbrowse_syn display question

Hi Nikhat,

Actually, I disagree. The file you sent has only one contiguous alignment; this is the correct behavior for the parser. The insertions/deletions in the alignment are reflected in the pairwise coordinate maps. There was a thread a while back that discussed slicing the alignment into chunks, but that is optional.

Sheldon

On Tue, Jun 2, 2009 at 5:35 PM, Zafar, Nikhat <nzafar@...> wrote:

Re: gbrowse_syn display question

Re: gbrowse_syn display question

Hi Chris,

From my communication with Sheldon, what I understand is that the cigar lines, i.e. the red and green lines (the color can be changed), just represent the beginning and end of the sequence that was used in the alignment, irrespective of whether matches were found. The grid lines are the regions of homology.

Sheldon, please correct me if my understanding is wrong. The problem is that we were expecting the cigar lines to represent the regions of homology, so we expected them to break where there is a gap in the alignment. If I can change clustal2hit so that an alignment is stored only where it finds a region of homology, rather than by the beginning and end of the sequence, that will generate the image the way we want it.

Nikhat

Sheldon McKay wrote:
> Hi Nikhat,
>
> Actually, I disagree. The file you sent has only one contiguous
> alignment; this is the correct behavior for the parser. The
> insertions/deletions in the alignment are reflected in the pairwise
> coordinate maps. There was a thread a while back that discussed
> slicing the alignment into chunks but that is optional.
>
> Sheldon
>
> On Tue, Jun 2, 2009 at 5:35 PM, Zafar, Nikhat <nzafar@...> wrote:
>> Hi Sheldon,
>>
>> I tried what you suggested in your previous email. But now the
>> block is one continuous block without any breaks, which is not true
>> if we look at the alignment.
>>
>> Here is the data in the alignments table:
>>
>> select * from alignments;
>> +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>> | hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 | seq1 | bin            | src2   | ref2 | start2 | end2   | strand2 | seq2 |
>> +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>> |      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa111 | 85   |      1 | 107914 | +       |      |
>> |      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | +       |      |
>> |      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
>> |      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | -       |      |
>> |      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
>> |      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa111 | 85   |      1 | 107914 | -       |      |
>> +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>
>> select * from map where map_id < 20;
>> +--------+----------+------+------+------+
>> | map_id | hit_name | src1 | pos1 | pos2 |
>> +--------+----------+------+------+------+
>> |      1 | H000001  | boa1 |  300 |    1 |
>> |      2 | H000001  | boa1 |  400 |    1 |
>> |      3 | H000001  | boa1 |  500 |    1 |
>> |      4 | H000001  | boa1 |  600 |    1 |
>> |      5 | H000001  | boa1 |  700 |    1 |
>> |      6 | H000001  | boa1 |  800 |    1 |
>> |      7 | H000001  | boa1 |  900 |    1 |
>> |      8 | H000001  | boa1 | 1000 |    5 |
>> |      9 | H000001  | boa1 | 1100 |    5 |
>> |     10 | H000001  | boa1 | 1200 |   23 |
>> |     11 | H000001  | boa1 | 1300 |   26 |
>> |     12 | H000001  | boa1 | 1400 |   38 |
>> |     13 | H000001  | boa1 | 1500 |   48 |
>> |     14 | H000001  | boa1 | 1600 |   52 |
>> |     15 | H000001  | boa1 | 1700 |   61 |
>> |     16 | H000001  | boa1 | 1800 |   61 |
>> |     17 | H000001  | boa1 | 1900 |   61 |
>> |     18 | H000001  | boa1 | 2000 |   86 |
>> |     19 | H000001  | boa1 | 2100 |   86 |
>> +--------+----------+------+------+------+
>> 19 rows in set (0.00 sec)
>>
>> I used aln2hit with my mfa file as the input.
>>
>> Nikhat

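The idea discussed above, storing separate hits for conserved stretches instead of one end-to-end hit, can be sketched as follows. This is only a hedged illustration in Python, not a patch to clustal2hit.pl: the function name `split_blocks`, the `max_gap` threshold and the choice to break a block at any gap run longer than that threshold are assumptions. It handles plus-strand, pairwise rows only and says nothing about whether the resulting blocks are homologous.

```python
def split_blocks(aln1, aln2, start1=1, start2=1, max_gap=50):
    """Split one gapped pairwise alignment into blocks at long gap runs.

    Returns a list of (start1, end1, start2, end2) tuples in 1-based
    coordinates. A new block begins whenever more than `max_gap`
    consecutive columns contain a gap in either row.
    Hypothetical helper, not part of gbrowse_syn.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned rows must have the same length")
    blocks = []
    pos1, pos2 = start1 - 1, start2 - 1   # last base consumed in each row
    block = None                          # open block: [s1, e1, s2, e2]
    gap_run = 0                           # gapped columns since last aligned column
    for c1, c2 in zip(aln1, aln2):
        if c1 != "-":
            pos1 += 1
        if c2 != "-":
            pos2 += 1
        if c1 == "-" or c2 == "-":
            gap_run += 1
            continue
        # Aligned column: close the open block if the preceding gap was long.
        if block is not None and gap_run > max_gap:
            blocks.append(tuple(block))
            block = None
        if block is None:
            block = [pos1, pos1, pos2, pos2]
        else:
            block[1], block[3] = pos1, pos2
        gap_run = 0
    if block is not None:
        blocks.append(tuple(block))
    return blocks


if __name__ == "__main__":
    row1 = "ACGT" + "AAAAA" + "TTGA"
    row2 = "ACGT" + "-----" + "TTGA"
    print(split_blocks(row1, row2, max_gap=3))   # two blocks
    print(split_blocks(row1, row2, max_gap=10))  # one block
```

Each returned tuple could then be written out as its own hit, so the display would show several shorter features in place of one spanning feature. Choosing `max_gap` is a judgment call that depends on the data.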
Re: gbrowse_syn display question

Hi Sheldon,

In the attached file I can see two types of grid lines, purple and black. What do they represent?

Thanks,
Nikhat

|
|
Re: gbrowse_syn display question
Sorry for the delay in responding to this; I have been traveling today.
> From my communication with Sheldon what I could understand is that the cigar
> lines, i.e. the red and green (color can be changed) lines, just represent the
> beginning and end of the sequence that was used in the alignment irrespective

Yes, I think so, but these are not cigar lines, just the beginning and end of the 'hit' features corresponding to the start and end of the aligned regions. Cigar lines (strings) are a compact representation of alignment data; they are not currently used in gbrowse_syn but will be in a future version. In any case, they have no relationship to the graphical rendering at all.

> of the matches found or not. The grid lines are the region of homology.

No, the grid lines do not indicate homology; they indicate pair-wise coordinate maps between the aligned sequences. The spacing between the grid lines in the target sequence reflects gapped regions (aka indels): closer together = deletion, farther apart = insertion.

I should point out that gbrowse_syn is for visualizing alignment data superimposed on genome annotations. It takes no position on questions of homology or orthology of aligned sequences. Ensuring that appropriate alignments are done for homologous and/or orthologous sequences is entirely the responsibility of the investigator who produces the alignment data.

> You can correct me if my understanding is wrong. Problem is that we are
> expecting the cigar lines to represent the region of homology. So we were
> expecting the cigar lines to break when there is a gap in the alignment. If
> I can change the clustal2hit in a way that alignment is stored only when it
> finds a region of homology, not by the begin and end of the seq that will
> generate the image the way we want it.

I think you may need to revisit the parameters you used in generating the alignment data. The alignment as presented is a single gapped alignment spanning the whole region. Gaps are not breaks in the context of a whole alignment; rather, they are insertions or deletions introduced to optimize the alignment, and they can be taken to represent insertion and deletion events in the evolution of the homologous sequences as they diverged from one another. Not knowing the particulars of your alignment protocol, I think it might help to use a higher gap extension penalty or something similar, so that you get a series of more similar alignment blocks rather than coercing the whole chromosome into a monolithic alignment block from one end to the other.

Sheldon
|
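For readers following this thread: besides re-running the aligner with different gap penalties, one way to get hit features that break at long gaps is to post-process an existing gapped pairwise alignment. The sketch below is only an illustration of that idea and is not part of clustal2hits or gbrowse_syn. The names `Block`, `split_at_long_gaps` and `max_gap` are hypothetical, the threshold is arbitrary, and the resulting coordinates would still have to be written out in the gbrowse_syn loading format described on the GMOD wiki page linked earlier in the thread.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """1-based, inclusive coordinates of one alignment block (hypothetical type)."""
    ref_start: int
    ref_end: int
    tgt_start: int
    tgt_end: int


def split_at_long_gaps(ref_aln: str, tgt_aln: str, max_gap: int = 20,
                       ref_offset: int = 1, tgt_offset: int = 1) -> List[Block]:
    """Split one gapped pairwise alignment into blocks separated by long gaps.

    ref_aln and tgt_aln are the two aligned rows ('-' marks a gap column).
    A run of max_gap or more consecutive gap columns closes the current block;
    shorter runs are kept inside the block as ordinary indels.
    """
    if len(ref_aln) != len(tgt_aln):
        raise ValueError("aligned rows must have equal length")

    blocks: List[Block] = []
    ref_pos, tgt_pos = ref_offset, tgt_offset  # coordinate of the next base in each row
    current = None                             # [ref_start, ref_end, tgt_start, tgt_end]
    gap_run = 0                                # length of the current run of gap columns

    for r, t in zip(ref_aln, tgt_aln):
        r_gap, t_gap = r == "-", t == "-"
        if r_gap or t_gap:
            gap_run += 1
            if gap_run >= max_gap and current is not None:
                blocks.append(Block(*current))  # long gap: close the open block
                current = None
        else:
            gap_run = 0
            if current is None:
                current = [ref_pos, ref_pos, tgt_pos, tgt_pos]
            else:
                current[1], current[3] = ref_pos, tgt_pos
        # Advance each coordinate only where that row has a base.
        if not r_gap:
            ref_pos += 1
        if not t_gap:
            tgt_pos += 1

    if current is not None:
        blocks.append(Block(*current))
    return blocks


# Toy example: a 1-column indel stays inside a block, a 6-column gap splits it.
ref = "ACGT-ACGTTTTTTACGT"
tgt = "ACGTAACG------ACGT"
for block in split_at_long_gaps(ref, tgt, max_gap=3):
    print(block)
```

With `max_gap=3` the toy alignment yields two blocks, one on each side of the six-column gap; with a larger threshold such as `max_gap=10` it stays a single block, which corresponds to the end-to-end display discussed above. Choosing the threshold is a judgment call that depends on the data.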