display errors when insert into reference for SAM data


by Nicole Washington (Nov 6, 2009)


Hello,

I found some errors in the display of short-read data (SAM format) in gbrowse. The display is fine in the samtools tview viewer, but is wrong in gbrowse. I believe the errors are caused by insertions into the reference sequence, which tview displays correctly (with an *) but gbrowse does not display at all. As a result, many of the bases show up as mismatches.

I have attached screenshots of the same region of the genome, showing the same SAM file in tview and in gbrowse.

The data file is available at:

http://submit.modencode.org/submit/public/get_file/2329/extracted/L3.ws190.sam

(these alignments are on WS190).

Thanks,

Nicole
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Gmod-gbrowse mailing list
Gmod-gbrowse@...
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Attachments: gbrowse.png (425K), tview.png (90K)

Re: display errors when insert into reference for SAM data

by Lincoln Stein (Nov 6, 2009)


Could you send me the CIGAR strings for the reads that are displaying incorrectly? Gaps are supposed to be handled correctly (drawn with a dash).
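As a rough illustration of what "handled with a dash" means, the Python sketch below lays a read out in reference coordinates from its CIGAR string. It is not GBrowse or Bio::Graphics code; the function name `gapped_read` and the toy sequences are hypothetical. Deleted reference bases (D) become '-', while inserted (I) and soft-clipped (S) read bases are dropped because they have no reference column.

```python
import re

def gapped_read(read, cigar):
    """Lay out a read in reference coordinates from its CIGAR string.

    Deleted reference bases (D) are drawn as '-'; inserted (I) and
    soft-clipped (S) read bases are skipped because they have no reference
    column; hard clips (H) consume nothing. N and P are not handled here.
    """
    out = []
    pos = 0  # current index into the read sequence
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X":
            out.append(read[pos:pos + n])
            pos += n
        elif op == "D":
            out.append("-" * n)
        elif op in "IS":
            pos += n
        elif op == "H":
            continue
        else:
            raise ValueError(f"CIGAR op {op!r} not handled in this sketch")
    return "".join(out)

print(gapped_read("ACGTTGCA", "4M2D4M"))  # ACGT--TGCA
print(gapped_read("ACGTAAGG", "3M2I3M"))  # ACGAGG
```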

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa@...>


Re: display errors when insert into reference for SAM data

by Nicole Washington (Nov 6, 2009)


Here are the read IDs and CIGAR strings for a few of the reads in the screenshot:

E7CXM7J01EW30F 21M1I71M
E7CXM7J02GJ9HZ 42M1D49M
E7CXM7J01CJ58D 7M1D61M1D53M
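The operations in these three CIGAR strings change how many reference bases each read covers compared with its own length. Following the SAM specification, M, =, X, D and N consume reference bases, while M, =, X, I and S consume read bases. The sketch below (the helper name `cigar_spans` is hypothetical) tallies both for the reads listed above:

```python
import re

def cigar_spans(cigar):
    """Return (reference_bases, read_bases) consumed by a CIGAR string."""
    ref = qry = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=XDN":   # operations that consume the reference
            ref += n
        if op in "M=XIS":   # operations that consume the read
            qry += n
    return ref, qry

reads = [
    ("E7CXM7J01EW30F", "21M1I71M"),
    ("E7CXM7J02GJ9HZ", "42M1D49M"),
    ("E7CXM7J01CJ58D", "7M1D61M1D53M"),
]
for read_id, cigar in reads:
    ref, qry = cigar_spans(cigar)
    print(f"{read_id} {cigar}: {qry} read bases over {ref} reference bases")
```

For example, 21M1I71M puts 93 read bases over 92 reference bases, so a renderer that places the read base by base without accounting for the 1I would be off by one for everything after the first 21 bases.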

I think the biggest problem is that when any read in the display has an insertion relative to the reference, it affects the drawing of all the reads.
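One plausible way a single insertion turns the rest of a read into apparent mismatches is a renderer that treats the inserted base as if it occupied a reference column, so every later base is compared against the wrong position. This is an assumption for illustration, not a diagnosis of the GBrowse bug. The sketch below (hypothetical helper names, toy sequences) counts mismatches with the insertion handled correctly and with it treated as reference-consuming:

```python
import re

def placed_bases(cigar, insertion_consumes_ref=False):
    """Return (read_index, ref_offset) pairs for read bases drawn on the reference.

    insertion_consumes_ref=True reproduces a hypothetical renderer bug in
    which 'I' bases are laid down as if they occupied reference columns.
    """
    pairs = []
    q = r = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X" or (op == "I" and insertion_consumes_ref):
            pairs.extend((q + k, r + k) for k in range(n))
            q += n
            r += n
        elif op in "IS":
            q += n  # read bases with no reference column
        elif op in "DN":
            r += n  # reference bases with no read base
    return pairs

def count_mismatches(ref, start, read, cigar, insertion_consumes_ref=False):
    """Count drawn read bases that differ from the reference (start is 0-based)."""
    return sum(
        1
        for qi, ro in placed_bases(cigar, insertion_consumes_ref)
        if start + ro < len(ref) and read[qi] != ref[start + ro]
    )

ref = "AACCGGTTAACCGGTT"
read = "AACCG" + "T" + "GTTAA"  # one extra T inserted after 5 matched bases
print(count_mismatches(ref, 0, read, "5M1I5M"))                               # 0
print(count_mismatches(ref, 0, read, "5M1I5M", insertion_consumes_ref=True))  # 4
```

The sketch models only one read. Nicole's description of tview (insertions shown with an *) suggests that a pileup view also has to make room for an insertion column across every read at that position, which is one reason an insertion in a single read can change how the other reads are drawn.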

Nicole


Re: display errors when insert into reference for SAM data

by Lincoln Stein (Nov 6, 2009)


Hi Nicole,

In the gbrowse database definition, try adding "-split_splices 1" to the db_args. If this doesn't fix the problem, I'll have to go bug hunting.
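For reference, a GBrowse 2 database stanza with the suggested option might look like the fragment below. This is a sketch under assumptions: the stanza name `modencode_bam` and both paths are placeholders, and Bio::DB::Sam is normally pointed at a sorted, indexed BAM file rather than the raw SAM file.

```
[modencode_bam:database]
db_adaptor    = Bio::DB::Sam
db_args       = -bam   /path/to/L3.ws190.bam
                -fasta /path/to/ws190.fa
                -split_splices 1
```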

Lincoln


Re: display errors when insert into reference for SAM data

by Nicole Washington (Nov 6, 2009)


The view doesn't look any different, sorry :(

Here's the gbrowse stanza, for your reference:
http://submit.modencode.org/submit/public/get_gbrowse_stanzas/2329

nicole


Re: display errors when insert into reference for SAM data

by Scott Perry


I also ran into the issue where any insertion or deletion in a read
resulted in a huge number of base mismatches. I was able to
correct it by adding 'realign = 1' to the read track configuration. See
http://search.cpan.org/~lds/Bio-Graphics-1.982/lib/Bio/Graphics/Glyph/segments.pm
for details.
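A minimal track stanza with Scott's setting might look like the fragment below. Only `realign = 1` comes from this thread; the track name `[reads]`, `database = modencode_bam`, `feature = match`, and the `draw_target`/`show_mismatch` lines are assumptions about a typical read-track setup, not copied from the stanza linked earlier.

```
[reads]
database      = modencode_bam
feature       = match
glyph         = segments
draw_target   = 1
show_mismatch = 1
realign       = 1
```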

-Scott


Re: display errors when insert into reference for SAM data

by Lincoln Stein


Hi Nicole, Scott,

realign=1 will paper over the problem. It is intended for cases where the CIGAR line is wrong, and it turns on a local Smith-Waterman aligner. Assuming the CIGAR line is correct, I will have to find and fix this bug.
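To show why a local aligner can hide a CIGAR-handling bug, here is a textbook Smith-Waterman with a linear gap penalty in Python. It is not the Bio::Graphics implementation; the scoring values (match +2, mismatch -1, gap -2) and the toy sequences are assumptions. Because it derives gaps from the sequences themselves, it recovers the insertion without looking at the CIGAR at all.

```python
def smith_waterman(ref, read, match=2, mismatch=-1, gap=-2):
    """Local alignment of read against ref with a linear gap penalty.

    Returns (score, aligned_ref, aligned_read), using '-' for gaps.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + s,  # read base aligned to ref base
                H[i - 1][j] + gap,    # read base against a gap (insertion)
                H[i][j - 1] + gap,    # ref base against a gap (deletion)
            )
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)

    # Trace back from the best-scoring cell until the score drops to zero.
    i, j = best_cell
    ref_out, read_out = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ref_out.append(ref[j - 1])
            read_out.append(read[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            ref_out.append("-")
            read_out.append(read[i - 1])
            i -= 1
        else:
            ref_out.append(ref[j - 1])
            read_out.append("-")
            j -= 1
    return best, "".join(reversed(ref_out)), "".join(reversed(read_out))

score, ref_row, read_row = smith_waterman("AACCGGTTAACCGGTT", "AACCGTGTTAA")
print(score)
print(ref_row)
print(read_row)
```

For this toy read the best local score is 18, with the reference row `AACCG-GTTAA` showing the inserted T against a gap. That is the "paper over" effect: the picture looks right even if the code that interprets the CIGAR is still wrong.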

Lincoln
