Assert in xcb_io.c:542

by Graeme Gill

Hi,
        a user of one of my applications is reporting an assert
from xcb:

% dispcal -v -q l -y l test
dispcal: ../../src/xcb_io.c:542: _XRead: Assertion
`dpy->xcb->reply_data != ((void *)0)' failed.
zsh: abort (core dumped)  dispcal -v -q l -y l test

This application has been running fine for a long time
with the non xcb based client libraries.

Is this likely to be a bug in xcb, or some sort of
latent bug in the corresponding application X11 call
that the original X11 client libraries didn't trip
up on? What further information is needed to track
this down?

thanks,
        Graeme Gill.
_______________________________________________
Xcb mailing list
Xcb@...
http://lists.freedesktop.org/mailman/listinfo/xcb

Re: Assert in xcb_io.c:542

by Jamey Sharp

On Tue, Nov 3, 2009 at 9:08 PM, Graeme Gill <graeme2@...> wrote:
> % dispcal -v -q l -y l test
> dispcal: ../../src/xcb_io.c:542: _XRead: Assertion
> `dpy->xcb->reply_data != ((void *)0)' failed.
> zsh: abort (core dumped)  dispcal -v -q l -y l test

This is an assert in Xlib, and is another case of new libX11 checking
for caller errors that old libX11 didn't explicitly check.

When _XReply is called, Xlib knows (by looking at the length field)
how much data is in the reply. XCB-based libX11 verifies that the
caller doesn't try to read more out of the reply than the server
claimed to provide. That's the assertion you're failing here.
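
As a rough illustration of the kind of check described above, here is a
minimal sketch of a bounded reply reader. The struct and function names
(reply_buf, reply_extra_bytes, bounded_read) are hypothetical and are not
libX11's internals; only the idea that the reply's length field caps what
the caller may read comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for per-display reply state; not libX11's struct. */
struct reply_buf {
    const uint8_t *data;   /* extra reply bytes not yet handed to the caller */
    size_t remaining;      /* how many of those bytes are left */
};

/* An X11 reply's length field counts extra data in 4-byte units. */
static size_t reply_extra_bytes(uint32_t length_field)
{
    return (size_t)length_field * 4;
}

/* Copy `size` bytes out of the reply. Returns 0 on success, or -1 if the
 * caller asks for more than the server provided. This sketch reports the
 * condition instead of aborting, which is where an assert would fire. */
static int bounded_read(struct reply_buf *r, void *out, size_t size)
{
    assert(r != NULL);
    if (r->data == NULL || size > r->remaining)
        return -1;
    memcpy(out, r->data, size);
    r->data += size;
    r->remaining -= size;
    if (r->remaining == 0)
        r->data = NULL;    /* reply fully consumed: further reads are errors */
    return 0;
}
```

With a length field of 2 the caller may read 8 bytes in total; a ninth
byte, or any read after the reply is used up, is rejected.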

As far as I know, old libX11 should have started producing wrong
answers after that point, unless the buggy client code was matched by
buggy server code, or the client exits immediately after getting that
reply.

The first important thing to do, if that description didn't already
point you in the right direction, is to get a stack trace, with
debugging symbols for libX11 and for whatever code is calling it when
the assertion fires.

Bonus points for also getting a network trace to see what protocol
went over the wire there; I like using wireshark for that. A network
trace from a successful run (with a non-XCB-based libX11, perhaps)
might also help.

If there turns out to be an X server bug here, it'd also be good to
know what server version is running, such as from xdpyinfo.

Jamey

Re: Assert in xcb_io.c:542

by Graeme Gill

Jamey Sharp wrote:

> On Tue, Nov 3, 2009 at 9:08 PM, Graeme Gill <graeme2@...> wrote:
>> % dispcal -v -q l -y l test
>> dispcal: ../../src/xcb_io.c:542: _XRead: Assertion
>> `dpy->xcb->reply_data != ((void *)0)' failed.
>> zsh: abort (core dumped)  dispcal -v -q l -y l test
>
> This is an assert in Xlib, and is another case of new libX11 checking
> for caller errors that old libX11 didn't explicitly check.
>
> When _XReply is called, Xlib knows (by looking at the length field)
> how much data is in the reply. XCB-based libX11 verifies that the
> caller doesn't try to read more out of the reply than the server
> claimed to provide. That's the assertion you're failing here.

Hi,
        thanks for responding. I suspect this assert is happening in
response to a call to XRRGetCrtcGamma(display, crtc) under the
circumstance that the X server says it implements XRandR V 1.2,
but the underlying graphics card driver does not.
I think that what happens is this triggers an X protocol error,
which I catch through an error handler (XSetErrorHandler()) that sets
a (global) flag and returns 0. The call to XRRGetCrtcGamma() is
then able to return, and my code can continue by switching to an older
extension to access the VideoLUTs.
(The underlying problem was raised a while ago on the Xorg list -
see <http://lists.freedesktop.org/archives/xorg/2008-July/036797.html>)

Does this scenario sound like it would trigger the assert ?

> As far as I know, old libX11 should have started producing wrong
> answers after that point, unless the buggy client code was matched by
> buggy server code, or the client exits immediately after getting that
> reply.

> The first important thing to do, if that description didn't already
> point you in the right direction, is to get a stack trace, with
> debugging symbols for libX11 and for whatever code is calling it when
> the assertion fires.

Not easy, since I'm not in a position to reproduce it, and the user
who reported it doesn't seem to be hugely technical.

> Bonus points for also getting a network trace to see what protocol
> went over the wire there; I like using wireshark for that. A network
> trace from a successful run (with a non-XCB-based libX11, perhaps)
> might also help.

Hmm - it seems a bit cryptic to use. While it was easy enough to
capture remote server traffic, the necessary options to capture a local
server and client escape me, and Google and the wireshark manual/wiki
give no hints (i.e. how do you capture DISPLAY=:0.0 traffic?). Setting
wireshark to capture "local" (127.0.0.1) or "any" devices captured nothing.
The remote X11 server doesn't support XRandR so it's no help.

> If there turns out to be an X server bug here, it'd also be good to
> know what server version is running, such as from xdpyinfo.

> name of display:    :0.0
> version number:    11.0
> vendor string:    The X.Org Foundation
> vendor release number:    10604000
> X.Org version: 1.6.4
> maximum request size:  16777212 bytes
> motion buffer size:  256
> bitmap unit, bit order, padding:    32, LSBFirst, 32
> image byte order:    LSBFirst
> number of supported pixmap formats:    7
> supported pixmap formats:
>     depth 1, bits_per_pixel 1, scanline_pad 32
>     depth 4, bits_per_pixel 8, scanline_pad 32
>     depth 8, bits_per_pixel 8, scanline_pad 32
>     depth 15, bits_per_pixel 16, scanline_pad 32
>     depth 16, bits_per_pixel 16, scanline_pad 32
>     depth 24, bits_per_pixel 32, scanline_pad 32
>     depth 32, bits_per_pixel 32, scanline_pad 32
> keycode range:    minimum 8, maximum 255
> focus:  window 0x2e00095, revert to PointerRoot
> number of extensions:    27
>     BIG-REQUESTS
>     Composite
>     DAMAGE
>     DOUBLE-BUFFER
>     DPMS
>     DRI2
>     GLX
>     Generic Event Extension
>     MIT-SCREEN-SAVER
>     MIT-SHM
>     RANDR
>     RECORD
>     RENDER
>     SECURITY
>     SGI-GLX
>     SHAPE
>     SYNC
>     X-Resource
>     XC-MISC
>     XFIXES
>     XFree86-DGA
>     XFree86-VidModeExtension
>     XINERAMA
>     XInputExtension
>     XKEYBOARD
>     XTEST
>     XVideo
> default screen number:    0
> number of screens:    1
>
> screen #0:
>   dimensions:    1440x900 pixels (381x238 millimeters)
>   resolution:    96x96 dots per inch
>   depths (7):    24, 1, 4, 8, 15, 16, 32
>   root window id:    0x119
>   depth of root window:    24 planes
>   number of colormaps:    minimum 1, maximum 1
>   default colormap:    0x20
>   default number of colormap cells:    256
>   preallocated pixels:    black 0, white 16777215
>   options:    backing-store NO, save-unders NO
>   largest cursor:    64x64
>   current input event mask:    0x7a803f
>     KeyPressMask             KeyReleaseMask           ButtonPressMask          
>     ButtonReleaseMask        EnterWindowMask          LeaveWindowMask          
>     ExposureMask             StructureNotifyMask      SubstructureNotifyMask  
>     SubstructureRedirectMask FocusChangeMask          PropertyChangeMask      
>   number of visuals:    72
>   default visual id:  0x21

Graeme Gill.

Re: Assert in xcb_io.c:542

by Jamey Sharp

On Wed, Nov 4, 2009 at 5:29 AM, Graeme Gill <graeme2@...> wrote:

> Jamey Sharp wrote:
>> When _XReply is called, Xlib knows (by looking at the length field)
>> how much data is in the reply. XCB-based libX11 verifies that the
>> caller doesn't try to read more out of the reply than the server
>> claimed to provide. That's the assertion you're failing here.
>
>        thanks for responding. I suspect this assert is happening in
> response to a call to XRRGetCrtcGamma(display, crtc) under the
> circumstance that the X server says it implements XRandR V 1.2,
> but the underlying graphics card driver does not.
> I think that what happens is this triggers an X protocol error,
> which I catch through an error handler (XSetErrorHandler()) that sets
> a (global) flag and returns 0. The call to XRRGetCrtcGamma() is
> then able to return, and my code can continue by switching to an older
> extension to access the VideoLUTs.
> (The underlying problem was raised a while ago on the Xorg list -
> see <http://lists.freedesktop.org/archives/xorg/2008-July/036797.html>)
>
> Does this scenario sound like it would trigger the assert ?

I was thinking "no, that's crazy" ;-) but it looks to me like
XRRGetCrtcGamma is really buggy in its handling of X errors. If the
response is an X error, all the code does is set a flag that it then
ignores; it reads the resource ID field of the error as "length", and
the minor opcode as "size", and tries to read more list contents. That
last would definitely trigger this assert.
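
To make the overlap concrete, here is a small sketch of the two 32-byte
wire layouts involved. The struct and field names are my own, and
naive_gamma_bytes() is a hypothetical stand-in for a caller that never
checks the type byte; it is not the actual libXrandr code. The point is
only that bytes 4-7 and 8-9 mean different things in an error than in a
RRGetCrtcGamma reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Core X11 error packet as it appears on the wire (32 bytes). */
struct x_error {
    uint8_t  type;          /* 0 marks an error */
    uint8_t  error_code;
    uint16_t sequence;
    uint32_t resource_id;   /* the offending value or resource */
    uint16_t minor_opcode;
    uint8_t  major_opcode;
    uint8_t  pad[21];
};

/* RRGetCrtcGamma reply header (also 32 bytes); field names are mine. */
struct gamma_reply {
    uint8_t  type;          /* 1 marks a reply */
    uint8_t  unused;
    uint16_t sequence;
    uint32_t length;        /* extra data, in 4-byte units */
    uint16_t size;          /* entries in each gamma ramp */
    uint8_t  pad[22];
};

static_assert(sizeof(struct x_error) == 32, "error packet is 32 bytes");
static_assert(sizeof(struct gamma_reply) == 32, "reply header is 32 bytes");

/* The check that is effectively missing: errors have type 0. */
static int is_error(const uint8_t pkt[32])
{
    return pkt[0] == 0;
}

/* Bytes a careless caller would try to read after the header if it treated
 * `pkt` as a gamma reply: three ramps (red, green, blue) of 16-bit entries. */
static size_t naive_gamma_bytes(const uint8_t pkt[32])
{
    struct gamma_reply rep;
    memcpy(&rep, pkt, sizeof rep);
    return (size_t)rep.size * 3 * sizeof(uint16_t);
}
```

Feeding an error packet to naive_gamma_bytes() yields a byte count derived
from the minor opcode, even though an error carries no extra data at all.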

Keithp, your commit message for those crtc bits said "More testing
seems indicated," which is apparently true. :-) Would you fix this
please?

Graeme, I'm obligated to point out that if you use XCB for this
request (see http://xcb.freedesktop.org/MixingCalls/) then you get a
substantially better error-handling interface, as well as avoiding
Keith's buggy code. ;-) You could throw away your global flag and
error-handling callback. That would tie you to versions of Xlib built
on XCB, but I have trouble being sad about that. ;-)

>> Bonus points for also getting a network trace to see what protocol
>> went over the wire there; I like using wireshark for that. A network
>> trace from a successful run (with a non-XCB-based libX11, perhaps)
>> might also help.
>
> Hmm - it seems a bit cryptic to use. While it was easy enough to
> capture remote server traffic, the necessary options to capture a local
> server and client escape me, and Google and the wireshark manual/wiki
> give no hints (i.e. how do you capture DISPLAY=:0.0 traffic?). Setting
> wireshark to capture "local" (127.0.0.1) or "any" devices captured nothing.
> The remote X11 server doesn't support XRandR so it's no help.

If you set DISPLAY=localhost:0 then the client will use TCP on the
loopback interface and wireshark can capture that. Of course your X
server can't be running with "-nolisten tcp", which is default in many
installations. I wish wireshark could capture from Unix domain
sockets... Anyway, we probably don't need a network trace after all.
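
For reference, here is a simplified sketch of how the display name decides
the transport. classify_display() and its types are hypothetical, and the
parsing is deliberately naive: real Xlib/XCB parsing also handles forms
such as "unix:0" and protocol prefixes, which this ignores. It assumes the
conventional TCP port of 6000 plus the display number.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum transport { TRANSPORT_INVALID, TRANSPORT_UNIX, TRANSPORT_TCP };

struct display_target {
    enum transport kind;
    int display;    /* display number after the colon */
    int tcp_port;   /* 6000 + display for TCP, 0 otherwise */
};

/* Classify a DISPLAY string: an empty host part (":0", ":0.0") means a
 * local Unix domain socket, a non-empty host ("localhost:0") means TCP. */
static struct display_target classify_display(const char *name)
{
    struct display_target t = { TRANSPORT_INVALID, 0, 0 };
    assert(name != NULL);

    const char *colon = strrchr(name, ':');
    if (colon == NULL)
        return t;

    char *end;
    long n = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || n < 0 || n > 59535)
        return t;               /* no digits, or port would exceed 65535 */

    t.display = (int)n;
    if (colon == name) {
        t.kind = TRANSPORT_UNIX;   /* invisible to a network capture */
    } else {
        t.kind = TRANSPORT_TCP;
        t.tcp_port = 6000 + t.display;
    }
    return t;
}
```

Under these assumptions ":0.0" never touches a network interface, while
"localhost:0" goes to TCP port 6000 on loopback, which is the traffic a
capture filter would need to match.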

Jamey

Re: Assert in xcb_io.c:542

by Graeme Gill

Jamey Sharp wrote:
> I was thinking "no, that's crazy" ;-) but it looks to me like
> XRRGetCrtcGamma is really buggy in its handling of X errors. If the
> response is an X error, all the code does is set a flag that it then
> ignores; it reads the resource ID field of the error as "length", and
> the minor opcode as "size", and tries to read more list contents. That
> last would definitely trigger this assert.

OK, so there is an explanation for the assert. It turns out that
my guess at the particular trigger was incorrect, although I would expect
it is still an X error that gets generated and causes the assert.

Apparently what happens is the user hotplugs a new display but doesn't
configure it:

1. Start with only laptop panel connected.  dispwin works.
2. Plug in VGA cable.  dispwin fails.
3. xrandr --output VGA1 --right-of LVDS1 --auto.  dispwin works again.

(I'm not clear on what exactly is meant by "dispwin fails", although I
suspect it means the xcb assert gets triggered.)

I'm also not clear on whether the above is expected XRandR behavior
or not (my Intel display is built into the motherboard and doesn't
allow for an external display, so I don't think I can test it myself.)

> Graeme, I'm obligated to point out that if you use XCB for this
> request (see http://xcb.freedesktop.org/MixingCalls/) then you get a
> substantially better error-handling interface, as well as avoiding
> Keith's buggy code. ;-) You could throw away your global flag and
> error-handling callback. That would tie you to versions of Xlib built
> on XCB, but I have trouble being sad about that. ;-)

A better error handler is welcome, but I'm not sure if I'm going
to re-write using xcb just yet, sorry :-)

> If you set DISPLAY=localhost:0 then the client will use TCP on the
> loopback interface and wireshark can capture that. Of course your X
> server can't be running with "-nolisten tcp", which is default in many
> installations. I wish wireshark could capture from Unix domain
> sockets... Anyway, we probably don't need a network trace after all.

OK, thanks to your hint I got it working via localhost and got a capture
of a normally working situation. Is it worth asking the user who reported
the problem to do the same?

cheers,
        Graeme Gill.

Re: Assert in xcb_io.c:542

by Jamey Sharp

On Wed, Nov 4, 2009 at 6:55 PM, Graeme Gill <graeme@...> wrote:
> Apparently what happens is the user hotplugs a new display but doesn't
> configure it:
>
> 1. Start with only laptop panel connected.  dispwin works.
> 2. Plug in VGA cable.  dispwin fails.
> 3. xrandr --output VGA1 --right-of LVDS1 --auto.  dispwin works again.
>
> (I'm not clear on what exactly is meant by "dispwin fails", although I
> suspect it means the xcb assert gets triggered.)

The single most important piece of information we could get is a stack
trace. We'd need to know what called _XRead when the assertion fired.

> OK, thanks to your hint I got it working via localhost and got a capture
> of a normally working situation. Is it worth asking the user who reported
> the problem to do the same?

If for some reason it's easier to get a network trace than a stack
trace, then sure, it would help. But a stack trace is probably all
that's really necessary for this bug.

> A better error handler is welcome, but I'm not sure if I'm going
> to re-write using xcb just yet, sorry :-)

Just to be clear: You don't have to re-write all your code using XCB.
You can replace just one Xlib call with a call into XCB, if you want.
But I didn't really expect you to go for it. :-)

Jamey

Re: Assert in xcb_io.c:542

by Jamey Sharp

Hi Graeme,

In case you didn't see the thread, ajax believes he's identified the
root cause of this problem with dispwin. He's fixed it in X.org git, and
I reviewed his fix; and I guess he's provided updated packages for
Fedora 12. See
        https://bugzilla.redhat.com/show_bug.cgi?id=498931

I'm hoping you can confirm with your bug-reporter that this fixes the
problem.

Jamey