GCJ 3.4.3 and 3.3 classloading problem

Hello everyone:

We use and maintain our own "exotic" ports of gcj 3.3 and 3.4.3 for arm-wince-pe (Windows CE 5). For some years we have been struggling with a classloading problem that has recently become chronic.

As a disclaimer, I first want to say that we have carefully thought through which version of gcj we should port and use. We went with 3.3 and more recently 3.4.3 for primarily one reason: libgcj in the 4 series is prohibitively large for the embedded applications we have to field. So we are currently using 3.4.3 and binutils-2.17.50, and we are sticking with that. Please try not to dismiss this inquiry because we're using an "older version".

The application exhibiting the classloading trouble is a port of the Jetty application server to arm-wince-pe-gcj. Jetty and most of the infrastructure it uses (e.g. Spring, Velocity) are gcj-compiled (to .o) and statically linked, but the webapp application code consists of .class files that are classloaded by gcj from a .war file by Jetty.

SOME of the time everything works, and there is no issue. But sometimes segmentation faults (C0000005) and alignment faults occur during classloading. The faults usually occur somewhere in string processing, typically in java::lang::String::getChars.

Oddly, the occurrence of these faults seems to depend on the particular details of a given Jetty image link and the sizes of the linked object files. For example, a two-line change to one of the statically linked modules can make a "working version" of our Jetty cease to work once the modified Java sources are recompiled/relinked.

I have never seen anything like this, and have not seen anything so baffling in quite a while. I am not even sure where to start. Otherwise, our port of gcj 3.4.3 to arm-wince-pe is working perfectly. PERFECTLY.

We are in a position where I have to fix this at any cost. If I have to climb Mount Everest, I will do that. Any ideas on where to start or what we could look at would be greatly appreciated.

Thanks in advance!

Craig Vanderborgh
Voxware Incorporated
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

>>>>> "Craig" == Craig Vanderborgh <craigvanderborgh@...> writes:

Craig> Please try not to dismiss this inquiry because we're using an
Craig> "older version".

Ok.

Craig> The application exhibiting the classloading trouble is a port of the
Craig> Jetty application server to arm-wince-pe-gcj. Jetty and most of the
Craig> infrastructure it uses (e.g. Spring, Velocity) are gcj-compiled (to
Craig> .o) and statically linked, but the webapp application code consists
Craig> .class files that are classloaded by gcj from a .war file by Jetty.

Nice.

Craig> The faults usually occur somewhere in string processing,
Craig> typically in java::lang::String::getChars.

I don't recall seeing any problems like this. Of course, since these releases were so long ago, I wouldn't really expect to remember...

Did you search bugzilla for closed bugs along these lines? That might yield something.

The String thing is interesting. We've had various bugs involving String.intern and also java.lang.ref that might cause inappropriate collection.

I don't have a theory that covers why changing the executable helps.

Craig> We are in a position where I have to fix this at any cost. If I have
Craig> to climb Mount Everest, I will do that. Any ideas on where to start
Craig> or what we could look at would be greatly appreciated.

This is a tough sort of problem. I hesitate to suggest any approaches without knowing what you've tried. (Also my libgcj debugging expertise is not completely fresh...)

Tom
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

Tom Tromey wrote:
> >>>>> "Craig" == Craig Vanderborgh <craigvanderborgh@...> writes:
>
> Craig> Please try not to dismiss this inquiry because we're using an
> Craig> "older version".
>
> Ok.
>
> Craig> The application exhibiting the classloading trouble is a port of the
> Craig> Jetty application server to arm-wince-pe-gcj. Jetty and most of the
> Craig> infrastructure it uses (e.g. Spring, Velocity) are gcj-compiled (to
> Craig> .o) and statically linked, but the webapp application code consists
> Craig> .class files that are classloaded by gcj from a .war file by Jetty.
>
> Nice.
>
> Craig> The faults usually occur somewhere in string processing,
> Craig> typically in java::lang::String::getChars.
>
> I don't recall seeing any problems like this. Of course, since these
> releases were so long ago, I wouldn't really expect to remember...
>
> Did you search bugzilla for closed bugs along these lines?
> That might yield something.
>
> The String thing is interesting. We've had various bugs involving
> String.intern and also java.lang.ref that might cause inappropriate
> collection.
>
> I don't have a theory that covers why changing the executable helps.
>
> Craig> We are in a position where I have to fix this at any cost. If I have
> Craig> to climb Mount Everest, I will do that. Any ideas on where to start
> Craig> or what we could look at would be greatly appreciated.
>
> This is a tough sort of problem. I hesitate to suggest any approaches
> without knowing what you've tried. (Also my libgcj debugging expertise
> is not completely fresh...)

Getting a test case that can run under GDB is invaluable.

Also the files dumped out from gnu.gcj.util.GCInfo may be of use. It basically gives you a 'core file' of all live objects. You can use it to manually follow all the references in the system. We added it to the trunk after the versions you are using, but it was originally developed on 3.3 and 3.4 and shouldn't be difficult to backport again.

David Daney
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

On Tue, Sep 8, 2009 at 10:50 PM, Tom Tromey <tromey@...> wrote:
> >>>>> "Craig" == Craig Vanderborgh <craigvanderborgh@...> writes:
>
> Craig> The faults usually occur somewhere in string processing,
> Craig> typically in java::lang::String::getChars.
>
> I don't recall seeing any problems like this. Of course, since these
> releases were so long ago, I wouldn't really expect to remember...
>
> Did you search bugzilla for closed bugs along these lines?
> That might yield something.
>
> The String thing is interesting. We've had various bugs involving
> String.intern and also java.lang.ref that might cause inappropriate
> collection.

Until you suggested that GC might be involved I did not really think about that possibility. Well, further testing reveals that it is. The crash happens not when GC is invoked generally, but the first time that GC is invoked AND the heap is expanded during GC. And this is absolutely consistent.

Is this what "inappropriate collection" could look like? Is it possible that the classloaded objects are different in some (incorrect) way, such that objects or parts of objects might be getting garbage collected when they are still needed? Is this what you're suggesting?

If so, any suggestions on how I could constructively proceed from here?

Thanks again, guys!

Craig Vanderborgh

> I don't have a theory that covers why changing the executable helps.
>
> Craig> We are in a position where I have to fix this at any cost. If I have
> Craig> to climb Mount Everest, I will do that. Any ideas on where to start
> Craig> or what we could look at would be greatly appreciated.
>
> This is a tough sort of problem. I hesitate to suggest any approaches
> without knowing what you've tried. (Also my libgcj debugging expertise
> is not completely fresh...)
>
> Tom
|
|
RE: GCJ 3.4.3 and 3.3 classloading problem

> -----Original Message-----
> From: java-owner@... [mailto:java-owner@...] On Behalf Of Craig Vanderborgh
> Sent: Wednesday, September 09, 2009 8:53 PM
> To: java@...
> Subject: Re: GCJ 3.4.3 and 3.3 classloading problem
>
> On Tue, Sep 8, 2009 at 10:50 PM, Tom Tromey <tromey@...> wrote:
> > >>>>> "Craig" == Craig Vanderborgh <craigvanderborgh@...> writes:
> >
> > Craig> The faults usually occur somewhere in string processing,
> > Craig> typically in java::lang::String::getChars.
> >
> > I don't recall seeing any problems like this. Of course, since these
> > releases were so long ago, I wouldn't really expect to remember...
> >
> > Did you search bugzilla for closed bugs along these lines?
> > That might yield something.
> >
> > The String thing is interesting. We've had various bugs involving
> > String.intern and also java.lang.ref that might cause inappropriate
> > collection.
>
> Until you suggested that GC might be involved I did not really think
> about that possibility. Well, further testing reveals that it is.
> The crash happens not when GC is invoked generally, but the first time
> that GC is invoked AND the heap is expanded during GC. And this is
> absolutely consistent. Is this what "inappropriate collection" could
> look like? Is it possible that the classloaded objects are different
> in some (incorrect) way, such that objects or parts of objects might
> be getting garbage collected when they are still needed? Is this what
> you're suggesting?
>
> If so, any suggestions on how I could constructively proceed from here?

What happens if you set the GC_IGNORE_GCJ_INFO environment variable? Can you run far enough with GC_DONT_GC?

In the end, this may require brute force debugging. Find the object that was corrupted/collected early, and then follow the chain of objects from a root, checking which ones are marked, so that you can identify where a link wasn't followed correctly.

This is unfortunately much easier if you can get the process to loop at the point of failure, so that you can call GC_is_marked() from the debugger.

Hans
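[Editor's illustration] The chain-following procedure Hans describes can be pictured with a toy model. This is only a sketch, not libgcj or Boehm GC code: `Obj`, `first_unmarked` and the `marked` flag are hypothetical stand-ins for heap objects and for whatever GC_is_marked() would report in the debugger. The function walks the references breadth-first from a root and reports the first edge that leads from a marked object to an unmarked one, which is the place where a link was apparently not followed.

```python
from collections import deque


class Obj:
    """Toy heap object: a name, a mark bit, and outgoing references."""

    def __init__(self, name, marked=True):
        self.name = name
        self.marked = marked  # stand-in for what GC_is_marked() would report
        self.refs = []


def first_unmarked(root):
    """Return (parent, child) names for the first marked -> unmarked edge, else None."""
    seen = {id(root)}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in parent.refs:
            if parent.marked and not child.marked:
                return parent.name, child.name
            if id(child) not in seen:
                seen.add(id(child))
                queue.append(child)
    return None


# Hypothetical chain: root -> loader -> klass -> name_string
root = Obj("root")
loader = Obj("loader")
klass = Obj("klass")
name_string = Obj("name_string", marked=False)  # the object collected too early
root.refs.append(loader)
loader.refs.append(klass)
klass.refs.append(name_string)

print(first_unmarked(root))
```

In a real session the mark bits would come from the collector itself and the references from a heap dump or from the debugger; the sketch only shows the shape of the search.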
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

On Thu, Sep 10, 2009 at 4:52 AM, Craig Vanderborgh <craigvanderborgh@...> wrote:
> Until you suggested that GC might be involved I did not really think
> about that possibility. Well, further testing reveals that it is.
> The crash happens not when GC is invoked generally, but the first time
> that GC is invoked AND the heap is expanded during GC. And this is
> absolutely consistent. Is this what "inappropriate collection" could
> look like? Is it possible that the classloaded objects are different
> in some (incorrect) way, such that objects or parts of objects might
> be getting garbage collected when they are still needed? Is this what
> you're suggesting?

When you say "Classloaded objects", are you loading interpreted classes? Or classes loaded at runtime from DLLs?

java.lang.Class objects are a special case for the garbage collector. They are marked via the custom mark function _Jv_MarkObj in boehm.cc. If for some reason this mark function is not working or not being called correctly, compiled class objects in the base application would likely work anyway, because they are present in the application's static data area, which is conservatively scanned. On the other hand, interpreted classes and code built with -findirect-classes are allocated at runtime and depend on the mark function.

> If so, any suggestions on how I could constructively proceed from here?

It's hard to say without more information, but I'd look carefully at which objects are being prematurely collected. Look at where they ought to be reachable from, and see if any obvious patterns emerge. As David suggests, getting a heap dump and stack traces of the crash would be invaluable.

Bryce
|
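[Editor's illustration] Bryce's distinction between statically allocated and runtime-allocated class objects can be sketched with a toy mark phase. Everything here is hypothetical (`Obj`, `mark`, `working_marker`, `broken_marker`, and the object names); it is a simplified model under the assumption that static data is scanned conservatively while heap-allocated Class objects are traced only through a custom mark function, as described above. It is not how boehm.cc is actually written.

```python
class Obj:
    """Toy object: a name, outgoing references, and an is-a-Class flag."""

    def __init__(self, name, refs=(), is_class=False):
        self.name = name
        self.refs = list(refs)
        self.is_class = is_class


def mark(static_area, roots, class_marker):
    """Return the names of heap objects that a toy mark phase keeps alive."""
    live = set()
    # Static data is scanned conservatively: every pointer stored in it is
    # followed, so fields of static Class objects need no mark function.
    pending = [ref for obj in static_area for ref in obj.refs]
    pending.extend(roots)
    while pending:
        obj = pending.pop()
        if obj.name in live:
            continue
        live.add(obj.name)
        # Heap-allocated Class objects are traced only via the custom marker.
        pending.extend(class_marker(obj) if obj.is_class else obj.refs)
    return live


def working_marker(cls):
    return cls.refs


def broken_marker(cls):
    return []  # models a mark function that skips the Class's fields


compiled_name = Obj("compiled_name")
compiled_class = Obj("CompiledClass", [compiled_name], is_class=True)  # static data
loaded_name = Obj("loaded_name")
loaded_class = Obj("LoadedClass", [loaded_name], is_class=True)  # heap
loader = Obj("loader", [loaded_class])

ok = mark([compiled_class], [loader], working_marker)
bad = mark([compiled_class], [loader], broken_marker)
print(sorted(ok - bad))
```

With the broken marker, only the object reachable solely through the runtime-loaded class goes missing, while the one hanging off the statically linked class survives. That is consistent with a build that is reliable until classes are loaded at runtime.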
|
Re: GCJ 3.4.3 and 3.3 classloading problem

Hello Hans, thanks for helping us out with this.

> You mean the crash happens in the same GC cycle in which the heap is grown? Or in the next cycle after that? Either way it sounds strange to me. It may be that there is some large object allocation that causes the heap expansion and occurs near the failure. But I'm not sure.

Yes, this is what I'm trying to say.

> What happens if you set the GC_IGNORE_GCJ_INFO environment variable? Can you run far enough with GC_DONT_GC?

With GC_IGNORE_GCJ_INFO set, the application runs until about 3 garbage collection cycles occur, then it crashes with a STATUS_ILLEGAL_INSTRUCTION exception (0xc000001d). Running with GC_DONT_GC, the application runs for a long time without any trouble at all, until it can't expand the heap anymore.

I should mention that I retrofitted boehm-gc 6.2.6 from our port of gcj-3.3 to our port of gcj 3.4.3. Recall that gc 6.3.1 is the version used "out of the box" for libgcj 3.4.1, 3.4.3, 3.4.4, 3.4.5, 3.4.6, and for the 3.5 snapshots. The reason I did this was it seemed like a lot less work if I could reuse our gc 6.2.6 port. But maybe this was not the best choice, since both our gcj 3.3 and 3.4.3 ports exhibit this problem (although to a much lesser degree in 3.3). And maybe something was "lost in translation" when I made the needed changes to put gc 6.2.6 into gcj 3.4.3. Regardless, it is clear that gc 6.3.1 is probably the "most tested" version in the gcj 3.x series.

It seems unlikely to me that this problem could have existed in the more vanilla gcj 3.4.x series, so I am tempted to restore a port of gc 6.3.1 to my libgcj build and start there. Might this be the best way to proceed? Can you remember if there are significant differences between gc 6.2.6 and 6.3.1?

Best Regards,
Craig Vanderborgh
Voxware Incorporated

> In the end, this may require brute force debugging. Find the object that was corrupted/collected early, and then follow the chain of objects from a root checking which ones are marked, so that you can identify where a link wasn't followed correctly. This is unfortunately much easier if you can get the process to loop at the point of failure, so that you can call GC_is_marked() from the debugger.
>
> Hans
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

>>>>> "Craig" == Craig Vanderborgh <craigvanderborgh@...> writes:

Craig> Until you suggested that GC might be involved I did not really think
Craig> about that possibility. Well, further testing reveals that it is.
Craig> The crash happens not when GC is invoked generally, but the first time
Craig> that GC is invoked AND the heap is expanded during GC. And this is
Craig> absolutely consistent.

Craig> If so, any suggestions on how I could constructively proceed from here?

In addition to what everybody else said, I would suggest starting with the GC test suite, to make sure your port of the GC is working properly.

Tom
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

Craig Vanderborgh wrote:
> Hello Hans, thanks for helping us out with this.
>
>> You mean the crash happens in the same GC cycle in which the heap is grown? Or in the next cycle after that? Either way it sounds strange to me. It may be that there is some large object allocation that causes the heap expansion and occurs near the failure. But I'm not sure.
>
> Yes, this is what I'm trying to say.
>
>> What happens if you set the GC_IGNORE_GCJ_INFO environment variable? Can you run far enough with GC_DONT_GC?
>
> With GC_IGNORE_GCJ_INFO set, the application runs until about 3
> garbage collection cycles occur, then it crashes with a
> STATUS_ILLEGAL_INSTRUCTION exception (0xc000001d). Running with
> GC_DONT_GC, the application runs for a long time without any trouble
> at all, until it can't expand the heap anymore.
>
> I should mention that I retrofitted boehm-gc 6.2.6 from our port of
> gcj-3.3 to our port of gcj 3.4.3. Recall that gc 6.3.1 is the version
> used "out of the box" for libgcj 3.4.1, 3.4.3, 3.4.4, 3.4.5, 3.4.6,
> and for the 3.5 snapshots. The reason I did this was it seemed like a
> lot less work if I could reuse our gc 6.2.6 port. But maybe this was
> not the best choice, since both our gcj 3.3 and 3.4.3 ports exhibit
> this problem (although to a much lesser degree in 3.3). And maybe
> something was "lost in translation" when I made the needed changes to
> put gc 6.2.6 into gcj 3.4.3. Regardless, it is clear that gc 6.3.1 is
> probably the "most tested" version in gcj 3.x series.

Are you absolutely sure that you're marking instances of class Class correctly? An error there would cause exactly this behaviour.

Andrew.
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

Andrew Haley wrote:
> Craig Vanderborgh wrote:
>> Hello Hans, thanks for helping us out with this.
>>
>>> You mean the crash happens in the same GC cycle in which the heap is grown? Or in the next cycle after that? Either way it sounds strange to me. It may be that there is some large object allocation that causes the heap expansion and occurs near the failure. But I'm not sure.
>>
>> Yes, this is what I'm trying to say.
>>
>>> What happens if you set the GC_IGNORE_GCJ_INFO environment variable? Can you run far enough with GC_DONT_GC?
>>
>> With GC_IGNORE_GCJ_INFO set, the application runs until about 3
>> garbage collection cycles occur, then it crashes with a
>> STATUS_ILLEGAL_INSTRUCTION exception (0xc000001d). Running with
>> GC_DONT_GC, the application runs for a long time without any trouble
>> at all, until it can't expand the heap anymore.
>>
>> I should mention that I retrofitted boehm-gc 6.2.6 from our port of
>> gcj-3.3 to our port of gcj 3.4.3. Recall that gc 6.3.1 is the version
>> used "out of the box" for libgcj 3.4.1, 3.4.3, 3.4.4, 3.4.5, 3.4.6,
>> and for the 3.5 snapshots. The reason I did this was it seemed like a
>> lot less work if I could reuse our gc 6.2.6 port. But maybe this was
>> not the best choice, since both our gcj 3.3 and 3.4.3 ports exhibit
>> this problem (although to a much lesser degree in 3.3). And maybe
>> something was "lost in translation" when I made the needed changes to
>> put gc 6.2.6 into gcj 3.4.3. Regardless, it is clear that gc 6.3.1 is
>> probably the "most tested" version in gcj 3.x series.
>
> Are you absolutely sure that you're marking instances of class Class
> correctly? An error there would cause exactly this behaviour.

On the off chance that you are using WeakReference, there were bugs in 3.3 that we fixed that can lead to trying to access objects that have already been GCed and had their memory reused. When this happens you can end up dispatching a method call through a bogus vtable and end up executing garbage.

David Daney
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

> Are you absolutely sure that you're marking instances of class Class
> correctly? An error there would cause exactly this behaviour.

No, I am not sure. What would be the easiest way to look for this (where, how to look)? This is what I had been thinking also, because when we do NO classloading (in a different, very large gcj application built using the same libgcj) there is NO PROBLEM WHATSOEVER - that app is completely reliable.

Craig

> Andrew.
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

Craig Vanderborgh wrote:
>> Are you absolutely sure that you're marking instances of class Class
>> correctly? An error there would cause exactly this behaviour.
>
> No, I am not sure. What would be the easiest way to look for this
> (where, how to look)? This is what I had been thinking also, because
> when we do NO classloading (in a different, very large gcj application
> built using the same libgcj) there is NO PROBLEM WHATSOEVER - that app
> is completely reliable.

This is a real smoking gun, then. You need a perfect match between the fields of Class in libjava/boehm.cc, libjava/java/lang/Class.h, and gcc/java/decl.c. Your best bet is to look for any changes between gcj 3.3 and 3.4.3 in this area.

Andrew.
|
|
Re: GCJ 3.4.3 and 3.3 classloading problem

Craig> No, I am not sure. What would be the easiest way to look for this
Craig> (where, how to look)? This is what I had been thinking also, because
Craig> when we do NO classloading (in a different, very large gcj application
Craig> built using the same libgcj) there is NO PROBLEM WHATSOEVER - that app
Craig> is completely reliable.

Classes are marked by boehm.cc:_Jv_MarkObj. What you would do is examine the Class-marking code and make sure it corresponds to the markable fields in Class.

There was at least one bug here in the past:
http://gcc.gnu.org/ml/java-patches/2002-q4/msg00491.html

Tom
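[Editor's illustration] The comparison Tom and Andrew recommend is a manual read of the sources, but the bookkeeping can be sketched as a set difference. The helper `compare_fields` and both field lists below are hypothetical; the names are placeholders typed in by hand and are not an accurate list of Class's fields in any gcj release.

```python
def compare_fields(declared, traced):
    """Return (missed, stale).

    missed: pointer fields declared in the class but never traced by the marker.
    stale:  fields the marker traces that the class no longer declares.
    """
    declared, traced = set(declared), set(traced)
    return sorted(declared - traced), sorted(traced - declared)


# Placeholder lists, as one might note them down while reading the class
# declaration and the Class-marking code side by side.
declared_in_class = ["name", "superclass", "methods", "fields", "loader", "new_field"]
traced_by_marker = ["name", "superclass", "methods", "fields", "loader"]

missed, stale = compare_fields(declared_in_class, traced_by_marker)
print("declared but not traced:", missed)
print("traced but not declared:", stale)
```

Any entry in either result would be a candidate for the kind of mismatch discussed in this thread; an empty result on both sides only says the two hand-written lists agree, not that the marking code itself is correct.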