A suspicious GC bug

View: New views
5 Messages — Rating Filter:   Alert me  

A suspicious GC bug

by Sakur.deagod :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

hi, all :
   I've been working on AVR32 java extension module for jamvm , a project initiated by Guennadi who has made release announcement in this ML sometime before,  and recently I found a weird crash after successfully doing gc once  (we are basing on jamvm 1.5.0). Then I tried to track the cause , and  here's what I found  (see the text in bold style) :

Chunk @0x2ab2d764 size: 16
Chunk @0x2ab2db94 size: 64
Chunk @0x2ab2e7d4 size: 112
Chunk @0x2ab2efbc size: 208
Chunk @0x2ab3025c size: 72
Chunk @0x2ab3038c size: 88
Chunk @0x2ab30e64 size: 24
Chunk @0x2ab30f14 size: 40
Chunk @0x2ab3159c size: 32
Chunk @0x2ab316a4 size: 72
Chunk @0x2ab3180c size: 716379672
<GC: Allocated objects: 405>
<GC: Freed 10 object(s) using 488 bytes>
<GC: Largest block is 716379672 total free is 716380400 out of 1048568 (68319%)>
<GC: Mark took 0.044015 seconds, scan took 0.047880 seconds>
<GC: running 9 finalisers>
GC : running finalizer [0x2ab31d38][0x2ab31ca0]
GC : running next finalizer
GC : running finalizer [0x2ab32f38][0x2ab31df8]
GC : running next finalizer
GC : running finalizer [0x2ab333d8][0x2ab33340]
GC : running next finalizer
GC : running finalizer [0x2ab33c90][0x2ab31df8]
Segmentation fault

   See, it is much larger than 1048568 (the default max heap size). The segfault was happening on a Class object (java/lang/System) which was freed incidentally. I add some "printf" to track memory malloc history for 0x2ab3180c. Here's the output :
  
  found block 0x2ab3180c 1021936 min_object_size 16
  allocate ptr 0x2ab31810 (size 24) ,next chunkpp  0x2ab31824  len 1021912
  .......
 found block 0x2ab3180c 96 min_object_size 16 -----> chunk 0x2ab3180c has 96 bytes space ,
 allocate 0x2ab31810 (88) ,next chunkpp  0x2ab325cc  len136

  The problem is the latter allocation.  We allocate one object from chunk 0x2ab3180c with 88 bytes , the remaining  bytes for  this chunk is 8 , which is less than min_object_size ,thus ,this chunk will not be allocated from any more . However, during "doSweep" phase ,when gc reaches "0x2ab3180c"in heap, it will recognize the object the address points to (allocate bit), but gc will also scan the succeeding chunks, the nearest one is 0x2ab3180c + 0x58 = 0x2ab31864 . Note that 0x2ab31864 is not valid chunk since it is actually pointing to the 8 bytes remaining space in chunk 0x2ab3180c. So you can see the following log to confirm this :

FREE: Freeing ob @0x2ab31810 class [C - start of block
GC : current chunk size 88 88 (this is what I added)
FREE: unalloced block @0x2ab31864 size 716379584 - merging onto block @0x2ab3180c

  Besides , I think gc compact wasn't running at that time ,so I guess this might be  a gc bug ?  

Nevo

------------------------------------------------------------------------------

_______________________________________________
Jamvm-general mailing list
Jamvm-general@...
https://lists.sourceforge.net/lists/listinfo/jamvm-general

Re: A suspicious GC bug

by Robert Lougher :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Nevo,

2009/7/1 Nevo <sakur.deagod@...>:
> hi, all :
>    I've been working on AVR32 java extension module for jamvm , a project
> initiated by Guennadi who has made release announcement in this ML sometime
> before,  and recently I found a weird crash after successfully doing gc once
>  (we are basing on jamvm 1.5.0). Then I tried to track the cause , and
>  here's what I found  (see the text in bold style) :

Just to be clear.  Have you successfully been able to run your port
with object allocation/GC cycles over a reasonable period of time, but
you've seen this crash once?  Or have you only been able to run GC
once, in total, at which point this problem occurred?

> Chunk @0x2ab2d764 size: 16
> Chunk @0x2ab2db94 size: 64
> Chunk @0x2ab2e7d4 size: 112
> Chunk @0x2ab2efbc size: 208
> Chunk @0x2ab3025c size: 72
> Chunk @0x2ab3038c size: 88
> Chunk @0x2ab30e64 size: 24
> Chunk @0x2ab30f14 size: 40
> Chunk @0x2ab3159c size: 32
> Chunk @0x2ab316a4 size: 72
> Chunk @0x2ab3180c size: 716379672
> <GC: Allocated objects: 405>
> <GC: Freed 10 object(s) using 488 bytes>
> <GC: Largest block is 716379672 total free is 716380400 out of 1048568
> (68319%)>
> <GC: Mark took 0.044015 seconds, scan took 0.047880 seconds>
> <GC: running 9 finalisers>
> GC : running finalizer [0x2ab31d38][0x2ab31ca0]
> GC : running next finalizer
> GC : running finalizer [0x2ab32f38][0x2ab31df8]
> GC : running next finalizer
> GC : running finalizer [0x2ab333d8][0x2ab33340]
> GC : running next finalizer
> GC : running finalizer [0x2ab33c90][0x2ab31df8]
> Segmentation fault
>
>    See, it is much larger than 1048568 (the default max heap size). The
> segfault was happening on a Class object (java/lang/System) which was freed
> incidentally. I add some "printf" to track memory malloc history for
> 0x2ab3180c. Here's the output :
>
>   found block 0x2ab3180c 1021936 min_object_size 16
>   allocate ptr 0x2ab31810 (size 24) ,next chunkpp  0x2ab31824  len 1021912
>   .......
>  found block 0x2ab3180c 96 min_object_size 16 -----> chunk 0x2ab3180c has 96
> bytes space ,
>  allocate 0x2ab31810 (88) ,next chunkpp  0x2ab325cc  len136
>
>   The problem is the latter allocation.  We allocate one object from chunk
> 0x2ab3180c with 88 bytes , the remaining  bytes for  this chunk is 8 , which
> is less than min_object_size ,thus ,this chunk will not be allocated from
> any more . However, during "doSweep" phase ,when gc reaches "0x2ab3180c"in
> heap, it will recognize the object the address points to (allocate bit), but
> gc will also scan the succeeding chunks, the nearest one is 0x2ab3180c +
> 0x58 = 0x2ab31864 . Note that 0x2ab31864 is not valid chunk since it is
> actually pointing to the 8 bytes remaining space in chunk 0x2ab3180c. So you
> can see the following log to confirm this :
>

I'm sorry, but your understanding is incorrect.  When a chunk is split
during allocation the remainder always has a valid chunk header.
Allocation is done in multiples of the "object grain" (grain as in a
piece of wood), which means the remainder must be at least the object
grain in size.  The object grain is 8 bytes, the chunk header is word
sized, so the remainder will always be large enough for the chunk
header.

If the remainder is smaller than the MIN_OBJECT_SIZE it is not added
to the free list (there's no point).  Therefore, there are unallocated
chunks which are not on the free list, but the chunks remain valid and
the heap can be walked correctly (as in doSweep).

> FREE: Freeing ob @0x2ab31810 class [C - start of block
> GC : current chunk size 88 88 (this is what I added)
> FREE: unalloced block @0x2ab31864 size 716379584 - merging onto block
> @0x2ab3180c
>
>   Besides , I think gc compact wasn't running at that time ,so I guess this
> might be  a gc bug ?
>

Such a bug as you describe would be triggered almost constantly.  The
core allocation and sweep routines have been in use virtually
unchanged for 7 years, without any reports of problems of this nature.
 I'm not saying there's no bugs lurking in there, but any that are
there will not be reproduced this easily.

My feeling is that heap corruption is occurring elsewhere, perhaps
invalid field offsets or similar?

Rob.

> Nevo
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Jamvm-general mailing list
> Jamvm-general@...
> https://lists.sourceforge.net/lists/listinfo/jamvm-general
>
>

------------------------------------------------------------------------------
_______________________________________________
Jamvm-general mailing list
Jamvm-general@...
https://lists.sourceforge.net/lists/listinfo/jamvm-general

Re: A suspicious GC bug

by Robert Lougher :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Nevo,

2009/7/1 Robert Lougher <rob.lougher@...>:
> Hi Nevo,
>
> 2009/7/1 Nevo <sakur.deagod@...>:
>
> My feeling is that heap corruption is occurring elsewhere, perhaps
> invalid field offsets or similar?
>

What I meant to say is that heap corruption can be horribly difficult
and time consuming to track down.  Often the heap corruption will not
be apparent until the next or even several GC cycles have occurred.
But the fact that the GC blows up doesn't indicate the problem
occurred there.  By its nature, GC scans the contents of all
chunks/objects, so any inconsistency shows up here often before the
Java program being ran touches the objects again.

Rob.

> Rob.
>
>> Nevo
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Jamvm-general mailing list
>> Jamvm-general@...
>> https://lists.sourceforge.net/lists/listinfo/jamvm-general
>>
>>
>

------------------------------------------------------------------------------
_______________________________________________
Jamvm-general mailing list
Jamvm-general@...
https://lists.sourceforge.net/lists/listinfo/jamvm-general

Re: A suspicious GC bug

by Sakur.deagod :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

hi, Rob:
   Thanks for your reply.

2009/7/1 Robert Lougher <rob.lougher@...>
Hi Nevo,


Just to be clear.  Have you successfully been able to run your port
with object allocation/GC cycles over a reasonable period of time, but
you've seen this crash once?  Or have you only been able to run GC
once, in total, at which point this problem occurred?
 Yup, I've been successfully survived from GC once ,then it immediately crashed on after second collection course. Here's the output for the first time gc was triggered:
 .....
<GC: Allocated objects: 1874>
<GC: Freed 20778 object(s) using 913184 bytes>
<GC: Largest block is 885456 total free is 913208 out of 1048568 (87%)>
<GC: Mark took 0.060697 seconds, scan took 6.880759 seconds>
<GC: running 3 finalisers>


 

I'm sorry, but your understanding is incorrect.  When a chunk is split
during allocation the remainder always has a valid chunk header.
Allocation is done in multiples of the "object grain" (grain as in a
piece of wood), which means the remainder must be at least the object
grain in size.  The object grain is 8 bytes, the chunk header is word
sized, so the remainder will always be large enough for the chunk
header.

If the remainder is smaller than the MIN_OBJECT_SIZE it is not added
to the free list (there's no point).  Therefore, there are unallocated
chunks which are not on the free list, but the chunks remain valid and
the heap can be walked correctly (as in doSweep).
   Yeah, I knew the allocation is "object grain" aligned ,and the remainder will definitely be large enough to hold a chunk header (one word), maybe my understanding is quite limited to small portion of the code , and I have been not able to do one native  test to confirm my thought, but could you kindly point out where the header is set for the remainder which is smaller than MIN_OBJECT_SIZE ,that would be helpful for me to understand thoroughly ? 
   My question here is just when allocating 2 object initially in the heap, the heap is divided into 3  parts, object A , object B, and the remaining unallocated space, then assuming GC is triggered sometime later , and object B is marked . That means object A will be freed ,and the space it takes will be added to freelist as a separate chunk. If the size for that chunk is 96B (like in my case here), upon finishing gc  , another allocation request (object C) is received for 88B, and will be immediately allocated from the chunk which was used by object A, thus leave a 8B's fragment for that chunk. Then how to deal with this 8B unallocated space ?
   I don't know if my question is really silly ,but I would be much appreciated if you tell what Im missing here , thanks .
 


> FREE: Freeing ob @0x2ab31810 class [C - start of block
> GC : current chunk size 88 88 (this is what I added)
> FREE: unalloced block @0x2ab31864 size 716379584 - merging onto block
> @0x2ab3180c
>
>   Besides , I think gc compact wasn't running at that time ,so I guess this
> might be  a gc bug ?
>

Such a bug as you describe would be triggered almost constantly.  The
core allocation and sweep routines have been in use virtually
unchanged for 7 years, without any reports of problems of this nature.
 I'm not saying there's no bugs lurking in there, but any that are
there will not be reproduced this easily.

My feeling is that heap corruption is occurring elsewhere, perhaps
invalid field offsets or similar?

 Ha, you might know it is really hard to track down the heap corruption ,even the heap is 1M size, especially we are also dealing with JEM type of object reference, but I made some attempt to add size header for the remainder when it was small than MIN_OBJECT_SIZE, guess what ,, the test is going all the way  and finished normally. I test 3 times , all looks good.  But you know ,this is really a quite simple test ,will try to call System.gc directly later on to see any differences (Still, some hidden JEM type of object reference will be likely to corrupt heap , which I m unaware atm :( ) 


------------------------------------------------------------------------------

_______________________________________________
Jamvm-general mailing list
Jamvm-general@...
https://lists.sourceforge.net/lists/listinfo/jamvm-general

Re: A suspicious GC bug

by Sakur.deagod :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

hi, Rob:

2009/7/1 Robert Lougher <rob.lougher@...>
Hi Nevo,

2009/7/1 Robert Lougher <rob.lougher@...>:
> Hi Nevo,
>
> 2009/7/1 Nevo <sakur.deagod@...>:
>
> My feeling is that heap corruption is occurring elsewhere, perhaps
> invalid field offsets or similar?
>

What I meant to say is that heap corruption can be horribly difficult
and time consuming to track down.  Often the heap corruption will not
be apparent until the next or even several GC cycles have occurred.
But the fact that the GC blows up doesn't indicate the problem
occurred there.  By its nature, GC scans the contents of all
chunks/objects, so any inconsistency shows up here often before the
Java program being ran touches the objects again.

  looking at current gc implementation,  the heap will be walked twice , doMark ,and doSweep. That means , once there's something wrong in marking phase (such as a JEM reference is taken as jamvm reference ), the sweep phase will never go through the path.  Thanks for your kind suggest, I will be more cautious to avoid JEM reference corrupt the heap .

Nevo
 


Rob.

> Rob.
>
>> Nevo
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Jamvm-general mailing list
>> Jamvm-general@...
>> https://lists.sourceforge.net/lists/listinfo/jamvm-general
>>
>>
>


------------------------------------------------------------------------------

_______________________________________________
Jamvm-general mailing list
Jamvm-general@...
https://lists.sourceforge.net/lists/listinfo/jamvm-general