gbrowse_syn display question

Re: gbrowse_syn display question

by Nikhat Zafar

Hi Sheldon,

Can you please also give me some information on clustal2hits? What
are the different columns in the resulting hits file, and what should
the final format of the file be?

Nikhat


Sheldon McKay wrote:

>Hi Chris,
>
>Clustalw is not suitable for the use case you describe.  Jason's email
>describes a pipeline similar to the one wormbase uses, except they
>convert their alignment data to clustalw format.  The format is a
>commonly used alignment data format and is merely a convenience.  It
>does not mean the program clustalw should be used to actually make the
>alignments.
>
>There are a variety of structured and ad hoc ways to get there but
>what you need to end up with is a gbrowse_syn database loading file
>that uses this specification:
>http://gmod.org/wiki/GBrowse_syn_Database#alignment_data_loading_format
>
>Sheldon
>
>
>On Thu, May 21, 2009 at 2:29 PM, Town, Christopher D. <cdtown@...> wrote:
>  
>
>>Hi
>>
>>
>>
>>We’re trying to get gbrowse_syn to display synteny blocks across regions of
>>Brassica rapa, Brassica oleracea and some Arabidopsis species.
>>
>>However, when we feed clustalw a set of ~100-150 kb sequences that are known
>>to be more or less syntenic, we get a single clustalw alignment that
>>basically goes from end to end of each sequence and thus describes and
>>displays it as a single synteny block.
>>
>>What we had been expecting (and hoping) to see was that clustalw would make
>>separate multiple sequence alignments for each conserved region (genes and
>>CNS) and skip over regions where there was little or no conservation
>>(something similar to VISTA).
>>
>>
>>
>>In the example file pecan.aln, there are a number of alignment blocks,
>>presumably generated by a single clustalw run using those 5 nematode
>>sequences. I’m wondering if you can tell me exactly what clustalw parameters
>>were used to generate this file. Perhaps we can tweak the parameters for our
>>Brassica data to force clustalw to generate a set of good and separate
>>alignment blocks rather than one wimpy alignment that simply aligns the sequences more
>>or less end-to-end and then flags the conserved bases.
>>
>>
>>
>>Any and all comments welcome.
>>
>>
>>
>>Thanks
>>
>>
>>
>>Chris Town
>>
_______________________________________________
Gmod-gbrowse mailing list
Gmod-gbrowse@...
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: gbrowse_syn display question

by Sheldon McKay-3

Hi Nikhat,

The alignment database schema, loading format, etc are described here: http://gmod.org/wiki/GBrowse_syn_Database

Help with other aspects of gbrowse_syn can be reached from here:
http://gmod.org/wiki/GBrowse_syn

Regarding your previous email with the screenshot: I have not seen what you actually loaded into your alignment database, but your MFA file does not appear to match your display, so I speculate that your loading file was not formatted correctly.

See below for more details.

Regards,
Sheldon


The clustal2hit.pl script takes Clustal format (as described here: http://gmod.org/wiki/GBrowse_syn_Database#Clustal_alignment_format) and produces a file in the format needed by load_alignment_database.pl.
I made a more generic version of this script, called aln2hit.pl, in which you can specify other MSA formats.

In your case the usage would be:
perl aln2hit.pl -i vista_alignment.mfa -f fasta >hits.txt
then
perl load_alignment_database.pl dbname hits.txt

If your database loads correctly, it should look something like this:

mysql> select * from alignments; select * from map limit 10;
+--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
| hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 | seq1 | bin            | src2   | ref2 | start2 | end2   | strand2 | seq2 |
+--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
|      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa111 | 85   |      1 | 107914 | +       |      |
|      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | +       |      |
|      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
|      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | -       |      |
|      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
|      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa111 | 85   |      1 | 107914 | -       |      |
+--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
6 rows in set (0.00 sec)

+--------+----------+------+------+------+
| map_id | hit_name | src1 | pos1 | pos2 |
+--------+----------+------+------+------+
|      1 | H000001  | boa1 |  300 |    1 |
|      2 | H000001  | boa1 |  400 |    1 |
|      3 | H000001  | boa1 |  500 |    1 |
|      4 | H000001  | boa1 |  600 |    1 |
|      5 | H000001  | boa1 |  700 |    1 |
|      6 | H000001  | boa1 |  800 |    1 |
|      7 | H000001  | boa1 |  900 |    1 |
|      8 | H000001  | boa1 | 1000 |    5 |
|      9 | H000001  | boa1 | 1100 |    5 |
|     10 | H000001  | boa1 | 1200 |   23 |
+--------+----------+------+------+------+
10 rows in set (0.00 sec)
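
For readers following along, here is a minimal Python sketch of one way to read the map rows above. It assumes (an interpretation, not something stated in this thread) that each row pairs a coordinate in src1 (pos1) with the aligned coordinate in the other sequence (pos2). Under that assumption, a run of rows where pos1 advances while pos2 stays fixed would correspond to a stretch where the second sequence contributes no new bases, i.e. a gap. The name `stalled_runs` is made up for illustration and is not part of gbrowse_syn.

```python
# Rows copied from the `map` table above: (pos1, pos2) for hit H000001.
MAP_ROWS = [
    (300, 1), (400, 1), (500, 1), (600, 1), (700, 1),
    (800, 1), (900, 1), (1000, 5), (1100, 5), (1200, 23),
]


def stalled_runs(rows):
    """Return (first_pos1, last_pos1, pos2) for each run of two or more
    consecutive rows in which pos2 does not change while pos1 advances."""
    runs = []
    start = 0
    for i in range(1, len(rows) + 1):
        # Close the current run at the end of the data or when pos2 changes.
        if i == len(rows) or rows[i][1] != rows[start][1]:
            if i - start >= 2:
                runs.append((rows[start][0], rows[i - 1][0], rows[start][1]))
            start = i
    return runs


if __name__ == "__main__":
    for first, last, pos2 in stalled_runs(MAP_ROWS):
        print(f"pos1 {first}-{last} all map to pos2 {pos2}")
```

On the ten rows shown, this reports pos1 300-900 against pos2 1 and pos1 1000-1100 against pos2 5, which would be consistent with gaps in the second sequence near the start of the alignment.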



On Tue, Jun 2, 2009 at 2:06 PM, Nikhat Zafar <nzafar@...> wrote:

--
Sheldon McKay, PhD
Cold Spring Harbor Laboratory
Office/Mobile:  516-367-6998 / 631-651-9728


Re: gbrowse_syn display question

by Nikhat Zafar

Hi Sheldon,

I tried what you suggested in your previous email, but now the display shows one continuous block without any breaks, which is not true if we look at the alignment.

Here is the data in the alignments table:

select * from alignments;
+--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
| hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 | seq1 | bin            | src2   | ref2 | start2 | end2   | strand2 | seq2 |
+--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
|      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa111 | 85   |      1 | 107914 | +       |      |
|      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | +       |      |
|      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
|      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | -       |      |
|      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
|      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa111 | 85   |      1 | 107914 | -       |      |
+--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+


select * from map where map_id < 20;
+--------+----------+------+------+------+
| map_id | hit_name | src1 | pos1 | pos2 |
+--------+----------+------+------+------+
|      1 | H000001  | boa1 |  300 |    1 |
|      2 | H000001  | boa1 |  400 |    1 |
|      3 | H000001  | boa1 |  500 |    1 |
|      4 | H000001  | boa1 |  600 |    1 |
|      5 | H000001  | boa1 |  700 |    1 |
|      6 | H000001  | boa1 |  800 |    1 |
|      7 | H000001  | boa1 |  900 |    1 |
|      8 | H000001  | boa1 | 1000 |    5 |
|      9 | H000001  | boa1 | 1100 |    5 |
|     10 | H000001  | boa1 | 1200 |   23 |
|     11 | H000001  | boa1 | 1300 |   26 |
|     12 | H000001  | boa1 | 1400 |   38 |
|     13 | H000001  | boa1 | 1500 |   48 |
|     14 | H000001  | boa1 | 1600 |   52 |
|     15 | H000001  | boa1 | 1700 |   61 |
|     16 | H000001  | boa1 | 1800 |   61 |
|     17 | H000001  | boa1 | 1900 |   61 | 
|     18 | H000001  | boa1 | 2000 |   86 |
|     19 | H000001  | boa1 | 2100 |   86 |
+--------+----------+------+------+------+
19 rows in set (0.00 sec)


I used aln2hit.pl with my MFA file as the input.


Nikhat


-----Original Message-----
From: Sheldon McKay [sheldon.mckay@...]
Sent: Tue 6/2/2009 3:35 PM
To: Zafar, Nikhat
Cc: gmod-gbrowse@...
Subject: Re: [Gmod-gbrowse] gbrowse_syn display question


Re: gbrowse_syn display question

by Sheldon McKay-3

Hi Nikhat,

Actually, I disagree.  The file you sent has only one contiguous alignment; this is the correct behavior for the parser.   The insertions/deletions in the alignment are reflected in the pairwise coordinate maps.  There was a thread a while back that discussed slicing the alignment into chunks but that is optional.

Sheldon
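
To illustrate how insertions and deletions can end up in a pairwise coordinate map rather than as separate blocks, here is a minimal Python sketch. It is not the aln2hit.pl implementation. The function name `coordinate_map`, the `step` parameter, and the choice to report the last aligned base of the second sequence inside a gap are all assumptions made for the example.

```python
def coordinate_map(aln1, aln2, start1=1, start2=1, step=100):
    """Walk two equal-length gapped strings ('-' marks a gap) and return
    (pos1, pos2) pairs at every `step`-th coordinate of the first sequence."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    pairs = []
    pos1 = start1 - 1  # coordinate of the last base consumed in sequence 1
    pos2 = start2 - 1  # coordinate of the last base consumed in sequence 2
    for c1, c2 in zip(aln1, aln2):
        if c2 != "-":
            pos2 += 1
        if c1 != "-":
            pos1 += 1
            if pos1 % step == 0:
                # Assumption: inside a gap in sequence 2, report the last
                # aligned base (or start2 if nothing has been aligned yet).
                pairs.append((pos1, max(pos2, start2)))
    return pairs


if __name__ == "__main__":
    # Toy alignment with a 4-column deletion in the second sequence.
    print(coordinate_map("ACGTACGTAC", "AC----GTAC", step=2))
```

In the toy alignment the second coordinate stalls while the first sequence runs through the deleted region, so the indel is visible in the map even though the whole alignment is a single contiguous block.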



On Tue, Jun 2, 2009 at 5:35 PM, Zafar, Nikhat <nzafar@...> wrote:


Re: gbrowse_syn display question

by Nikhat Zafar

 

But there are a lot of gaps in the alignment. They showed up when I used clustal2hits to generate the hits file, but they are not showing up now.

Nikhat
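
If the goal is for long gaps to appear as breaks between blocks, one option is the optional slicing mentioned earlier in the thread: cut the alignment wherever a long gap occurs and treat each piece as its own block. The Python sketch below shows the idea on a pairwise alignment. It is only an illustration under assumed names (`split_at_gaps`, `min_gap`), not code from gbrowse_syn or its helper scripts, and the threshold of 50 columns is arbitrary.

```python
from itertools import groupby


def split_at_gaps(aln1, aln2, min_gap=50):
    """Split a pairwise gapped alignment into half-open column ranges
    (start_col, end_col), cutting at every run of at least `min_gap`
    consecutive columns in which either sequence has a gap ('-')."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    # True for every column in which at least one sequence has a gap.
    gapped = [a == "-" or b == "-" for a, b in zip(aln1, aln2)]
    blocks = []
    col = 0
    block_start = None  # first column of the block currently being built
    for is_gap, run in groupby(gapped):
        length = sum(1 for _ in run)
        if is_gap and length >= min_gap:
            # A long gap closes the current block, if there is one.
            if block_start is not None:
                blocks.append((block_start, col))
                block_start = None
        elif block_start is None:
            block_start = col
        col += length
    if block_start is not None:
        blocks.append((block_start, col))
    return blocks


if __name__ == "__main__":
    print(split_at_gaps("ACGTACGTACGG", "ACGT------GG", min_gap=5))
```

Each returned column range could then be written out as a separate alignment before conversion, so that the stretches between ranges show up as breaks. Short gaps below the threshold stay inside a block.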


-----Original Message-----
From: Sheldon McKay [sheldon.mckay@...]
Sent: Tue 6/2/2009 7:35 PM
To: Zafar, Nikhat
Cc: gmod-gbrowse@...
Subject: Re: [Gmod-gbrowse] gbrowse_syn display question

Hi Nikhat,

Actually, I disagree.  The file you sent has only one contiguous alignment;
this is the correct behavior for the parser.   The insertions/deletions in
the alignment are reflected in the pairwise coordinate maps.  There was a
thread a while back that discussed slicing the alignment into chunks but
that is optional.
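
To illustrate how gaps can end up in a pairwise coordinate map rather than as breaks in the block, here is a minimal sketch. It is not the code used by aln2hit.pl; the function name, the sampling step, and the clamping of pos2 are assumptions made for this example only.

```python
def coordinate_map(aln1, aln2, start1=1, start2=1, step=100):
    """Sample (pos1, pos2) pairs from two gapped alignment rows.

    Hypothetical helper: walks the alignment column by column and records
    the position in sequence 2 that lines up with every `step`-th base of
    sequence 1.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned rows must have equal length")

    pos1, pos2 = start1 - 1, start2 - 1
    pairs = []
    for a, b in zip(aln1, aln2):
        if a != "-":
            pos1 += 1
        if b != "-":
            pos2 += 1
        # Record only on columns where sequence 1 has a base that falls on
        # a multiple of `step`. pos2 is clamped to start2 so that leading
        # gaps do not yield position 0 (a choice made for this sketch).
        if a != "-" and pos1 % step == 0:
            pairs.append((pos1, max(pos2, start2)))
    return pairs


if __name__ == "__main__":
    # Sequence 2 has a 4-base deletion; pos2 stalls while pos1 advances.
    print(coordinate_map("ACGTACGTAC", "A----CGTAC", step=2))
```

In this toy input, pos2 stays at 1 for the first two samples because sequence 2 has a gap there, then starts advancing again. A run of repeated pos2 values in the map table can be read the same way.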

Sheldon



On Tue, Jun 2, 2009 at 5:35 PM, Zafar, Nikhat <nzafar@...> wrote:

>
>  Hi Sheldon,
>
>  I tried what you suggested in your previous email, but now the block is
> one continuous block without any breaks, which is not true if we look at
> the alignment.
>
> Here is the data in the alignments table:
>
>  select * from alignments;
>
> +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
> | hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 | seq1 | bin            | src2   | ref2 | start2 | end2   | strand2 | seq2 |
> +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
> |      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa111 | 85   |      1 | 107914 | +       |      |
> |      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | +       |      |
> |      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
> |      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | -       |      |
> |      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
> |      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa111 | 85   |      1 | 107914 | -       |      |
> +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>
>
> select * from map where map_id < 20;
> +--------+----------+------+------+------+
> | map_id | hit_name | src1 | pos1 | pos2 |
> +--------+----------+------+------+------+
> |      1 | H000001  | boa1 |  300 |    1 |
> |      2 | H000001  | boa1 |  400 |    1 |
> |      3 | H000001  | boa1 |  500 |    1 |
> |      4 | H000001  | boa1 |  600 |    1 |
> |      5 | H000001  | boa1 |  700 |    1 |
> |      6 | H000001  | boa1 |  800 |    1 |
> |      7 | H000001  | boa1 |  900 |    1 |
> |      8 | H000001  | boa1 | 1000 |    5 |
> |      9 | H000001  | boa1 | 1100 |    5 |
> |     10 | H000001  | boa1 | 1200 |   23 |
> |     11 | H000001  | boa1 | 1300 |   26 |
> |     12 | H000001  | boa1 | 1400 |   38 |
> |     13 | H000001  | boa1 | 1500 |   48 |
> |     14 | H000001  | boa1 | 1600 |   52 |
> |     15 | H000001  | boa1 | 1700 |   61 |
> |     16 | H000001  | boa1 | 1800 |   61 |
> |     17 | H000001  | boa1 | 1900 |   61 |
> |     18 | H000001  | boa1 | 2000 |   86 |
> |     19 | H000001  | boa1 | 2100 |   86 |
> +--------+----------+------+------+------+
> 19 rows in set (0.00 sec)
>
>
> I used aln2hit with my MFA file as the input.
>
>
> Nikhat
>
>
> -----Original Message-----
> From: Sheldon McKay [sheldon.mckay@...]
> Sent: Tue 6/2/2009 3:35 PM
> To: Zafar, Nikhat
> Cc: gmod-gbrowse@...
> Subject: Re: [Gmod-gbrowse] gbrowse_syn display question
>
> Hi Nikhat,
>
> The alignment database schema, loading format, etc are described here:
> http://gmod.org/wiki/GBrowse_syn_Database
>
> Help with other aspects of gbrowse_syn can be reached from here:
> http://gmod.org/wiki/GBrowse_syn
>
> Regarding your previous email with the screenshot, I have not seen what you
> actually loaded into your alignment database, but your MFA file does not
> appear to match your display, so I speculate that your loading file was not
> formatted correctly.
>
> See below for more details.
>
> Regards,
> Sheldon
>
>
> The clustal2hit.pl script takes clustal format (as described here:
> http://gmod.org/wiki/GBrowse_syn_Database#Clustal_alignment_format) and
> produces a file in the format needed by load_alignment_database.pl.
> I made a more generic version of this script, called aln2hit.pl, in which
> you can specify other MSA formats.
>
> In your case the usage would be:
> perl aln2hit.pl -i vista_alignment.mfa -f fasta >hits.txt
> then
> perl load_alignment_database.pl dbname hits.txt
>
> If your database loads correctly, it should look something like this:
>
> mysql> select * from alignments; select * from map limit 10;
>
> +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
> | hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 | seq1 | bin            | src2   | ref2 | start2 | end2   | strand2 | seq2 |
> +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
> |      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa111 | 85   |      1 | 107914 | +       |      |
> |      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | +       |      |
> |      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
> |      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | -       |      |
> |      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
> |      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa111 | 85   |      1 | 107914 | -       |      |
> +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
> 6 rows in set (0.00 sec)
>
> +--------+----------+------+------+------+
> | map_id | hit_name | src1 | pos1 | pos2 |
> +--------+----------+------+------+------+
> |      1 | H000001  | boa1 |  300 |    1 |
> |      2 | H000001  | boa1 |  400 |    1 |
> |      3 | H000001  | boa1 |  500 |    1 |
> |      4 | H000001  | boa1 |  600 |    1 |
> |      5 | H000001  | boa1 |  700 |    1 |
> |      6 | H000001  | boa1 |  800 |    1 |
> |      7 | H000001  | boa1 |  900 |    1 |
> |      8 | H000001  | boa1 | 1000 |    5 |
> |      9 | H000001  | boa1 | 1100 |    5 |
> |     10 | H000001  | boa1 | 1200 |   23 |
> +--------+----------+------+------+------+
> 10 rows in set (0.00 sec)
>
>
>


--
Sheldon McKay, PhD
Cold Spring Harbor Laboratory
Office/Mobile:  516-367-6998 / 631-651-9728



Re: gbrowse_syn display question

by Nikhat Zafar

Hi Chris,

From my communication with Sheldon, what I understand is that the
cigar lines, i.e. the red and green lines (the color can be changed),
just represent the beginning and end of the sequence that was used in
the alignment, irrespective of whether matches were found. The grid
lines are the regions of homology.

Sheldon,
You can correct me if my understanding is wrong. The problem is that we
are expecting the cigar lines to represent the regions of homology, so
we were expecting the cigar lines to break where there is a gap in the
alignment. If I can change clustal2hit so that an alignment is stored
only when it finds a region of homology, rather than by the begin and
end of the sequence, that will generate the image the way we want it.
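
The idea of storing one hit per region of homology could be sketched roughly as follows. This is only an illustration under stated assumptions, not a patch to clustal2hit.pl: the function name, the `min_gap` threshold, and the tuple layout are all hypothetical, and "homology" is reduced here to "columns where both rows have a base".

```python
def split_into_blocks(aln1, aln2, start1=1, start2=1, min_gap=3):
    """Split a pairwise gapped alignment into ungapped-ish blocks.

    Hypothetical helper: a block is closed whenever a run of at least
    `min_gap` consecutive gap columns (in either row) is seen. Shorter
    gap runs are tolerated inside a block. Returns a list of
    (start1, end1, start2, end2) tuples in sequence coordinates.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned rows must have equal length")

    blocks = []
    pos1, pos2 = start1 - 1, start2 - 1
    current = None  # [start1, end1, start2, end2] of the open block
    gap_run = 0
    for a, b in zip(aln1, aln2):
        if a != "-":
            pos1 += 1
        if b != "-":
            pos2 += 1
        if a == "-" or b == "-":
            gap_run += 1
            # A long enough gap run ends the open block, if any.
            if gap_run >= min_gap and current is not None:
                blocks.append(tuple(current))
                current = None
            continue
        gap_run = 0
        if current is None:
            current = [pos1, pos1, pos2, pos2]
        else:
            current[1], current[3] = pos1, pos2
    if current is not None:
        blocks.append(tuple(current))
    return blocks


if __name__ == "__main__":
    # One 5-column gap splits the alignment; the single-column gap does not.
    print(split_into_blocks("ACGTTTTTACGA", "ACG-----AC-A"))
```

Each tuple could then be written out as its own hit instead of one hit spanning the whole alignment. A real threshold would be much larger than 3 and would likely also consider sequence identity, not just gaps.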


Nikhat







Re: gbrowse_syn display question

by Nikhat Zafar

Hi Sheldon,

In the attached file I can see two types of grid lines, purple and
black. What do they represent?

Thanks
Nikhat


Nikhat Zafar wrote:

>Hi Chris,
>
> From my communication with Sheldon what I could understand is that the
>cigar lines i.e the red and green (color can be changed) lines just
>represent the begining and end of the sequence that was used in the
>alignment irrespective of the matches found or not. The grid lines are
>the region of homology.
>
>Sheldon ,
>You can correct me if  my understanding is wrong. Problem is that we are
>expecting the cigar lines to represent the region of homology. So we
>were expecting the cigar lines to break when there is a gap in the
>alignment. If I can change the clustal2hit in a way that alignment is
>stored only when it finds a region of homology, not by the begin and end
>of the seq that will generate the image the way we want it.
>
>
>Nikhat
>
>
>
>
>
>Sheldon McKay wrote:
>
>  
>
>>Hi Nikhat,
>>
>>Actually, I disagree.  The file you sent has only one contiguous
>>alignment; this is the correct behavior for the parser.   The
>>insertions/deletions in the alignment are reflected in the pairwise
>>coordinate maps.  There was a thread a while back that discussed
>>slicing the alignment into chunks but that is optional.
>>
>>Sheldon
>>
>>
>>
>>On Tue, Jun 2, 2009 at 5:35 PM, Zafar, Nikhat <nzafar@...
>><mailto:nzafar@...>> wrote:
>>
>>
>>     Hi Sheldon,
>>
>>     I tried what you suggested in your previous email. But now the
>>    block is one continous block without any breaks which is not ture
>>    if we look at the alignment.
>>
>>    here is the data in alignment table ;
>>
>>     select * from alignments;
>>    +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>    | hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 |
>>    seq1 | bin            | src2   | ref2 | start2 | end2   | strand2
>>    | seq2 |
>>    +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>    |      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |
>>         |  100000.000000 | boa111 | 85   |      1 | 107914 | +      
>>    |      |
>>    |      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |
>>         | 1000000.000000 | boa1   | 84   |    234 |  85462 | +      
>>    |      |
>>    |      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |
>>         |  100000.000000 | boa11  | 83   |      1 | 134393 | -      
>>    |      |
>>    |      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |
>>         | 1000000.000000 | boa1   | 84   |    234 |  85462 | -      
>>    |      |
>>    |      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |
>>         | 1000000.000000 | boa11  | 83   |      1 | 134393 | -      
>>    |      |
>>    |      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |
>>         | 1000000.000000 | boa111 | 85   |      1 | 107914 | -      
>>    |      |
>>    +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>
>>
>>    select * from map where map_id < 20;
>>    +--------+----------+------+------+------+
>>    | map_id | hit_name | src1 | pos1 | pos2 |
>>    +--------+----------+------+------+------+
>>    |      1 | H000001  | boa1 |  300 |    1 |
>>    |      2 | H000001  | boa1 |  400 |    1 |
>>    |      3 | H000001  | boa1 |  500 |    1 |
>>    |      4 | H000001  | boa1 |  600 |    1 |
>>    |      5 | H000001  | boa1 |  700 |    1 |
>>    |      6 | H000001  | boa1 |  800 |    1 |
>>    |      7 | H000001  | boa1 |  900 |    1 |
>>    |      8 | H000001  | boa1 | 1000 |    5 |
>>    |      9 | H000001  | boa1 | 1100 |    5 |
>>    |     10 | H000001  | boa1 | 1200 |   23 |
>>    |     11 | H000001  | boa1 | 1300 |   26 |
>>    |     12 | H000001  | boa1 | 1400 |   38 |
>>    |     13 | H000001  | boa1 | 1500 |   48 |
>>    |     14 | H000001  | boa1 | 1600 |   52 |
>>    |     15 | H000001  | boa1 | 1700 |   61 |
>>    |     16 | H000001  | boa1 | 1800 |   61 |
>>    |     17 | H000001  | boa1 | 1900 |   61 |
>>    |     18 | H000001  | boa1 | 2000 |   86 |
>>    |     19 | H000001  | boa1 | 2100 |   86 |
>>    +--------+----------+------+------+------+
>>    19 rows in set (0.00 sec)
>>
>>
>>    I use aln2hit and used my mfa file as the input.
>>
>>
>>    Nikhat
>>
>>
>>    -----Original Message-----
>>    From: Sheldon McKay [mailto:sheldon.mckay@...
>>    <mailto:sheldon.mckay@...>]
>>    Sent: Tue 6/2/2009 3:35 PM
>>    To: Zafar, Nikhat
>>    Cc: gmod-gbrowse@...
>>    <mailto:gmod-gbrowse@...>
>>    Subject: Re: [Gmod-gbrowse] gbrowse_syn display question
>>
>>    Hi Nikhat,
>>
>>    The alignment database schema, loading format, etc are described here:
>>    http://gmod.org/wiki/GBrowse_syn_Database
>>
>>    Help with other aspects of gbrowse_syn can be reached from here:
>>    http://gmod.org/wiki/GBrowse_syn
>>
>>    Regarding your previous email with the screenshot, I have not seen
>>    what you
>>    actually loaded into your alignment database but, but your MFA
>>    file does not
>>    appear to match your display, so I speculate that your loading
>>    file was not
>>    formatted correctly.
>>
>>    See below for more details.
>>
>>    Regards,
>>    Sheldon
>>
>>
>>    The clustal2hit.pl script takes clustal format (as describe here:
>>    http://gmod.org/wiki/GBrowse_syn_Database#Clustal_alignment_format)
>>    and
>>    produces a file in the format needed by load_alignment_database.pl
>>    I made a more generic version of this script, called aln2hit.pl,
>>    wherein you
>>    can specify other MSA formats.
>>
>>    In your case the usage would be:
>>    perl aln2hit.pl -i vista_alignment.mfa -f fasta >hits.txt
>>    then
>>    perl load_alignment_database.pl dbname hits.txt
>>
>>    If your database loads correctly, it should look something like this:
>>
>>    mysql> select * from alignments; select * from map limit 10;
>>    +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>    | hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 |
>>    seq1 |
>>    bin            | src2   | ref2 | start2 | end2   | strand2 | seq2 |
>>    +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>    |      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |
>>         |
>>    100000.000000 | boa111 | 85   |      1 | 107914 | +       |      |
>>    |      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |
>>         |
>>    1000000.000000 | boa1   | 84   |    234 |  85462 | +       |      |
>>    |      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |
>>         |
>>    100000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
>>    |      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |
>>         |
>>    1000000.000000 | boa1   | 84   |    234 |  85462 | -       |      |
>>    |      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |
>>         |
>>    1000000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
>>    |      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |
>>         |
>>    1000000.000000 | boa111 | 85   |      1 | 107914 | -       |      |
>>    +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>    6 rows in set (0.00 sec)
>>
>>    +--------+----------+------+------+------+
>>    | map_id | hit_name | src1 | pos1 | pos2 |
>>    +--------+----------+------+------+------+
>>    |      1 | H000001  | boa1 |  300 |    1 |
>>    |      2 | H000001  | boa1 |  400 |    1 |
>>    |      3 | H000001  | boa1 |  500 |    1 |
>>    |      4 | H000001  | boa1 |  600 |    1 |
>>    |      5 | H000001  | boa1 |  700 |    1 |
>>    |      6 | H000001  | boa1 |  800 |    1 |
>>    |      7 | H000001  | boa1 |  900 |    1 |
>>    |      8 | H000001  | boa1 | 1000 |    5 |
>>    |      9 | H000001  | boa1 | 1100 |    5 |
>>    |     10 | H000001  | boa1 | 1200 |   23 |
>>    +--------+----------+------+------+------+
>>    10 rows in set (0.00 sec)
>>
>>
>>
>>
>>
>>    --
>>    Sheldon McKay, PhD
>>    Cold Spring Harbor Laboratory
>>    Office/Mobile:  516-367-6998 / 631-651-9728
>>
>>    Sent from Milford, Connecticut, United States
>>
>>
>>
>>
>>--
>>Sheldon McKay, PhD
>>Cold Spring Harbor Laboratory
>>Office/Mobile:  516-367-6998 / 631-651-9728
>>
>>    
>>
>
>
>  
>


_______________________________________________
Gmod-gbrowse mailing list
Gmod-gbrowse@...
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

slide_gbrowse_syn.jpg (319K) Download Attachment

Re: gbrowse_syn display question

by Sheldon McKay-3

Sorry for the delay in responding to this; I have been traveling today.

> From my communication with Sheldon, what I could understand is that the cigar
> lines, i.e. the red and green (color can be changed) lines, just represent the
> beginning and end of the sequence that was used in the alignment, irrespective

Yes, I think so, but these are not cigar lines, just the beginning and
end of the 'hit' features corresponding to the start and end of the
aligned regions.  Cigar lines (strings) are a compact representation
of alignment data; they are not currently used in gbrowse_syn but will
be in a future version.  In any case, they have no relationship to the
graphical rendering at all.


> of the matches found or not. The grid lines are the region of homology.

No, the grid lines do not indicate homology; they indicate pair-wise
coordinate maps between the aligned sequences.  The spacing between
the grid lines in the target sequence reflects gapped regions (aka
indels) -- closer together = deletion, farther apart = insertion.
I should point out that gbrowse_syn is for visualizing alignment data
superimposed on genome annotations.  It takes no position on questions
of homology or orthology of aligned sequences.  Ensuring appropriate
alignments are done for homologous and/or orthologous sequences is
entirely the responsibility of the investigator who produces the
alignment data.
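
To make the coordinate-map idea concrete, here is a minimal Python sketch,
assuming two rows of a gapped pairwise alignment of equal length.  The
function name coordinate_map and its parameters are made up for
illustration; this is not the code used by clustal2hit.pl or aln2hit.pl.
It records, at every step-th reference base, the target coordinate
reached so far:

```python
def coordinate_map(aln_ref, aln_tgt, ref_start=1, tgt_start=1, step=100):
    """Return (ref_pos, tgt_pos) pairs at every `step`-th reference coordinate.

    aln_ref and aln_tgt are two rows of a gapped alignment ('-' marks a gap).
    Hypothetical helper for illustration only.
    """
    if len(aln_ref) != len(aln_tgt):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    ref_pos = ref_start - 1  # last reference coordinate consumed so far
    tgt_pos = tgt_start - 1  # last target coordinate consumed so far
    for r, t in zip(aln_ref, aln_tgt):
        if t != "-":
            tgt_pos += 1
        if r != "-":
            ref_pos += 1
            if ref_pos % step == 0:
                # Before the target has started, report its first coordinate.
                pairs.append((ref_pos, max(tgt_pos, tgt_start)))
    return pairs


# Toy example: the target lacks four bases present in the reference.
toy_map = coordinate_map("ACGTACGTAC", "AC----GTAC", step=2)
print(toy_map)  # [(2, 2), (4, 2), (6, 2), (8, 4), (10, 6)]
```

In the toy example three consecutive reference positions map to the same
target position, which is how a deletion in the target would show up as
grid lines drawn closer together.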

> You can correct me if my understanding is wrong. The problem is that we are
> expecting the cigar lines to represent the region of homology. So we were
> expecting the cigar lines to break when there is a gap in the alignment. If
> I can change clustal2hit so that an alignment is stored only when it
> finds a region of homology, not by the begin and end of the seq, that will
> generate the image the way we want it.

I think you may need to revisit the parameters you used in generating
the alignment data.  The alignment as presented is a single gapped
alignment spanning the whole region.  Gaps are not breaks in the
context of a whole alignment; rather, they are insertions or deletions
introduced to optimize the alignment and can be taken to represent
insertion and deletion events in the evolution of the homologous
sequences as they diverged from one another.  Not knowing the
particulars of your alignment protocol, I think it might help to add a
higher gap extension penalty or something so that you get a series of
more similar alignment blocks rather than coercing the whole
chromosome into a monolithic alignment block from one end to the
other.
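
If re-running the aligner is not an option, one alternative is to
post-process the single gapped alignment into separate blocks before
converting it.  The following is only a rough Python sketch of that idea,
assuming a pairwise alignment given as two equal-length gapped strings.
The function split_on_long_gaps and the max_gap threshold are hypothetical
and are not part of clustal2hit.pl or aln2hit.pl:

```python
import re


def split_on_long_gaps(aln_ref, aln_tgt, max_gap=50):
    """Return (start_col, end_col) column ranges (0-based, half-open) of
    alignment blocks that contain no gap run longer than max_gap in
    either sequence.  Hypothetical helper for illustration only."""
    if len(aln_ref) != len(aln_tgt):
        raise ValueError("aligned sequences must have equal length")
    long_gap = re.compile(r"-{%d,}" % (max_gap + 1))
    # Column spans of every over-long gap run, from either sequence.
    cuts = sorted(
        m.span() for seq in (aln_ref, aln_tgt) for m in long_gap.finditer(seq)
    )
    blocks = []
    col = 0
    for gap_start, gap_end in cuts:
        if gap_start > col:
            blocks.append((col, gap_start))
        col = max(col, gap_end)
    if col < len(aln_ref):
        blocks.append((col, len(aln_ref)))
    return blocks


# Toy example: a 6-column gap in the target splits the alignment in two.
ref = "ACGT" + "AAAAAA" + "ACGT"
tgt = "ACGT" + "------" + "ACGT"
print(split_on_long_gaps(ref, tgt, max_gap=3))  # [(0, 4), (10, 14)]
```

Each returned column range could then be written out as its own alignment
block, so that each block becomes a separate hit when loaded.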

Sheldon


Re: gbrowse_syn display question

by Sheldon McKay-3

Hi Nikhat,

The grid lines represent the mapping of coordinates between one
sequence and another in the alignment.  If there are no indels, the
maps will have a more or less 1:1 relationship, but indels affect the
relative position.  The purpose of this map is so that gain and loss
of sequence within aligned regions in gbrowse_syn is visually
represented by the spacing of the lines in the target sequence relative
to the reference sequence.

The lines do not necessarily represent a match, in the sense of two
nucleotide residues being the same at that position.
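
As an illustration of how the spacing can be read off the map table, here
is a small Python sketch that uses the first ten (pos1, pos2) rows of the
map table quoted in this thread.  The function name map_spacing is made up
for this example; gbrowse_syn does its own drawing and does not use this
code:

```python
def map_spacing(rows):
    """For consecutive (pos1, pos2) map rows, return tuples of
    (pos1_from, pos1_to, reference_step, target_step)."""
    return [
        (a1, b1, b1 - a1, b2 - a2)
        for (a1, a2), (b1, b2) in zip(rows, rows[1:])
    ]


# (pos1, pos2) pairs for hit H000001 as shown in the map table.
rows = [(300, 1), (400, 1), (500, 1), (600, 1), (700, 1),
        (800, 1), (900, 1), (1000, 5), (1100, 5), (1200, 23)]

for pos1_from, pos1_to, ref_step, tgt_step in map_spacing(rows):
    print(pos1_from, pos1_to, ref_step, tgt_step)
```

Every interval covers 100 bases of the reference, while the target
advances by 0, 4 or 18 bases, so the grid lines on the target side are
drawn much closer together than on the reference side.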

Sheldon

On Wed, Jun 3, 2009 at 1:55 PM, Nikhat Zafar<nzafar@...> wrote:

> Hi Sheldon,
>
> What do the grid lines represent? We were thinking they represent a
> match.
>
> Nikhat
>
> Sheldon McKay wrote:
>
>> Every fifth line is blue to provide a visual reference point.  When there
>> are a lot of indels, following the blue lines helps in identifying aligned
>> orthologous features.
>>
>> Sent from my iPhone -- sorry for the typos
>>
>> Sheldon McKay, PhD
>> Cold Spring Harbor Laboratory
>>
>>
>> On Jun 3, 2009, at 10:31 AM, Nikhat Zafar <nzafar@...> wrote:
>>
>>> Hi Sheldon,
>>>
>>> In the file attached I can see two types of grid lines, purple and
>>> black. What do they represent?
>>>
>>> Thanks
>>> Nikhat
>>>
>>>
>>> Nikhat Zafar wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> From my communication with Sheldon, what I could understand is that the
>>>> cigar lines, i.e. the red and green (color can be changed) lines, just
>>>> represent the beginning and end of the sequence that was used in the
>>>> alignment, irrespective of whether matches were found or not. The grid
>>>> lines are the region of homology.
>>>>
>>>> Sheldon,
>>>> You can correct me if my understanding is wrong. The problem is that we
>>>> are expecting the cigar lines to represent the region of homology. So we
>>>> were expecting the cigar lines to break when there is a gap in the
>>>> alignment. If I can change clustal2hit so that an alignment is
>>>> stored only when it finds a region of homology, not by the begin and end of
>>>> the seq, that will generate the image the way we want it.
>>>>
>>>>
>>>> Nikhat
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Sheldon McKay wrote:
>>>>
>>>>
>>>>> Hi Nikhat,
>>>>>
>>>>> Actually, I disagree.  The file you sent has only one contiguous
>>>>>  alignment; this is the correct behavior for the parser.   The
>>>>>  insertions/deletions in the alignment are reflected in the  pairwise
>>>>> coordinate maps.  There was a thread a while back that  discussed slicing
>>>>> the alignment into chunks but that is optional.
>>>>>
>>>>> Sheldon
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 2, 2009 at 5:35 PM, Zafar, Nikhat <nzafar@...
>>>>> <mailto:nzafar@... >> wrote:
>>>>>
>>>>>
>>>>>   Hi Sheldon,
>>>>>
>>>>>   I tried what you suggested in your previous email. But now the
>>>>>  block is one continuous block without any breaks, which is not true
>>>>>  if we look at the alignment.
>>>>>
>>>>>  Here is the data in the alignments table:
>>>>>
>>>>>   select * from alignments;
>>>>>  +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>>>>  | hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 | seq1 | bin            | src2   | ref2 | start2 | end2   | strand2 | seq2 |
>>>>>  +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>>>>  |      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa111 | 85   |      1 | 107914 | +       |      |
>>>>>  |      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | +       |      |
>>>>>  |      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
>>>>>  |      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | -       |      |
>>>>>  |      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
>>>>>  |      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa111 | 85   |      1 | 107914 | -       |      |
>>>>>  +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>>>>
>>>>>
>>>>>  select * from map where map_id < 20;
>>>>>  +--------+----------+------+------+------+
>>>>>  | map_id | hit_name | src1 | pos1 | pos2 |
>>>>>  +--------+----------+------+------+------+
>>>>>  |      1 | H000001  | boa1 |  300 |    1 |
>>>>>  |      2 | H000001  | boa1 |  400 |    1 |
>>>>>  |      3 | H000001  | boa1 |  500 |    1 |
>>>>>  |      4 | H000001  | boa1 |  600 |    1 |
>>>>>  |      5 | H000001  | boa1 |  700 |    1 |
>>>>>  |      6 | H000001  | boa1 |  800 |    1 |
>>>>>  |      7 | H000001  | boa1 |  900 |    1 |
>>>>>  |      8 | H000001  | boa1 | 1000 |    5 |
>>>>>  |      9 | H000001  | boa1 | 1100 |    5 |
>>>>>  |     10 | H000001  | boa1 | 1200 |   23 |
>>>>>  |     11 | H000001  | boa1 | 1300 |   26 |
>>>>>  |     12 | H000001  | boa1 | 1400 |   38 |
>>>>>  |     13 | H000001  | boa1 | 1500 |   48 |
>>>>>  |     14 | H000001  | boa1 | 1600 |   52 |
>>>>>  |     15 | H000001  | boa1 | 1700 |   61 |
>>>>>  |     16 | H000001  | boa1 | 1800 |   61 |
>>>>>  |     17 | H000001  | boa1 | 1900 |   61 |
>>>>>  |     18 | H000001  | boa1 | 2000 |   86 |
>>>>>  |     19 | H000001  | boa1 | 2100 |   86 |
>>>>>  +--------+----------+------+------+------+
>>>>>  19 rows in set (0.00 sec)
>>>>>
>>>>>
>>>>>  I used aln2hit with my mfa file as the input.
>>>>>
>>>>>
>>>>>  Nikhat
>>>>>
>>>>>
>>>>>  -----Original Message-----
>>>>>  From: Sheldon McKay [mailto:sheldon.mckay@...
>>>>>  <mailto:sheldon.mckay@...>]
>>>>>  Sent: Tue 6/2/2009 3:35 PM
>>>>>  To: Zafar, Nikhat
>>>>>  Cc: gmod-gbrowse@...
>>>>>  <mailto:gmod-gbrowse@...>
>>>>>  Subject: Re: [Gmod-gbrowse] gbrowse_syn display question
>>>>>
>>>>>  Hi Nikhat,
>>>>>
>>>>>  The alignment database schema, loading format, etc are described
>>>>>  here:
>>>>>  http://gmod.org/wiki/GBrowse_syn_Database
>>>>>
>>>>>  Help with other aspects of gbrowse_syn can be reached from here:
>>>>>  http://gmod.org/wiki/GBrowse_syn
>>>>>
>>>>>  Regarding your previous email with the screenshot, I have not seen
>>>>>  what you actually loaded into your alignment database, but your MFA
>>>>>  file does not appear to match your display, so I speculate that your
>>>>>  loading file was not formatted correctly.
>>>>>
>>>>>  See below for more details.
>>>>>
>>>>>  Regards,
>>>>>  Sheldon
>>>>>
>>>>>
>>>>>  The clustal2hit.pl script takes clustal format (as described here:
>>>>>  http://gmod.org/wiki/GBrowse_syn_Database#Clustal_alignment_format)
>>>>>  and produces a file in the format needed by load_alignment_database.pl.
>>>>>  I made a more generic version of this script, called aln2hit.pl,
>>>>>  wherein you can specify other MSA formats.
>>>>>
>>>>>  In your case the usage would be:
>>>>>  perl aln2hit.pl -i vista_alignment.mfa -f fasta >hits.txt
>>>>>  then
>>>>>  perl load_alignment_database.pl dbname hits.txt
>>>>>
>>>>>  If your database loads correctly, it should look something like  this:
>>>>>
>>>>>  mysql> select * from alignments; select * from map limit 10;
>>>>>  +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>>>>  | hit_id | hit_name | src1   | ref1 | start1 | end1   | strand1 | seq1 | bin            | src2   | ref2 | start2 | end2   | strand2 | seq2 |
>>>>>  +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>>>>  |      1 | H000001  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa111 | 85   |      1 | 107914 | +       |      |
>>>>>  |      2 | H000001r | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | +       |      |
>>>>>  |      3 | H000002  | boa1   | 84   |    234 |  85462 | +       |      |  100000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
>>>>>  |      4 | H000002r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa1   | 84   |    234 |  85462 | -       |      |
>>>>>  |      5 | H000003  | boa111 | 85   |      1 | 107914 | +       |      | 1000000.000000 | boa11  | 83   |      1 | 134393 | -       |      |
>>>>>  |      6 | H000003r | boa11  | 83   |      1 | 134393 | +       |      | 1000000.000000 | boa111 | 85   |      1 | 107914 | -       |      |
>>>>>  +--------+----------+--------+------+--------+--------+---------+------+----------------+--------+------+--------+--------+---------+------+
>>>>>  6 rows in set (0.00 sec)
>>>>>
>>>>>  +--------+----------+------+------+------+
>>>>>  | map_id | hit_name | src1 | pos1 | pos2 |
>>>>>  +--------+----------+------+------+------+
>>>>>  |      1 | H000001  | boa1 |  300 |    1 |
>>>>>  |      2 | H000001  | boa1 |  400 |    1 |
>>>>>  |      3 | H000001  | boa1 |  500 |    1 |
>>>>>  |      4 | H000001  | boa1 |  600 |    1 |
>>>>>  |      5 | H000001  | boa1 |  700 |    1 |
>>>>>  |      6 | H000001  | boa1 |  800 |    1 |
>>>>>  |      7 | H000001  | boa1 |  900 |    1 |
>>>>>  |      8 | H000001  | boa1 | 1000 |    5 |
>>>>>  |      9 | H000001  | boa1 | 1100 |    5 |
>>>>>  |     10 | H000001  | boa1 | 1200 |   23 |
>>>>>  +--------+----------+------+------+------+
>>>>>  10 rows in set (0.00 sec)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>>  Sheldon McKay, PhD
>>>>>  Cold Spring Harbor Laboratory
>>>>>  Office/Mobile:  516-367-6998 / 631-651-9728
>>>>>
>>>>>  Sent from Milford, Connecticut, United States
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sheldon McKay, PhD
>>>>> Cold Spring Harbor Laboratory
>>>>> Office/Mobile:  516-367-6998 / 631-651-9728
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>> <slide_gbrowse_syn.jpg>
>>
>
>



--
Sheldon McKay, PhD
Cold Spring Harbor Laboratory
Office/Mobile:  516-367-6998 / 631-651-9728

Sent from Washington, District of Columbia, United States
