Strange deadlock issue during multi-threaded data load

View: New views
14 Messages — Rating Filter:   Alert me  

Strange deadlock issue during multi-threaded data load

by Peter Ciuffetti-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,
   
    I'm building a large database based on eXist 1.4rc on a 4-core Linux
cloud server w/8GB ram.  I'm having what I think are multi-threading
issues when more than one client thread tries to read from and write to
the same XML resource.  The file I'm building, when complete, will
contain about 1.5 million smallish XML fragments, maybe 100-200MB of
total text.  My load program is multi-threaded and uses the XML:DB api
with XQuery update extensions.  I started with three load threads and
for the most part things seemed to be going OK.  Then progress appeared
to stop after about a day of processing.  Inspecting the exist.log and
xmldb.log it appeared that the three client threads were each waiting
for an xquery they submitted to complete (the compile step was reported
in the log, but not the execution result).  It remained this way for 83
minutes and then, remarkably, resumed processing.  Meanwhile I was
inspecting the situation, looking at the running queries via the amdin
web client.  The server was responsive to the webclient's browse
requests but not the xml:db requests of the load program.  I couldn't
tell what the server was doing for the three threads that were hung,
though CPU usage was evident via 'top'.  And I don't know what
eventually jarred it loose.
   
    The forensic analysis of the logs is below.  Other possibly relevant
config settings: the server JVM is allocated 4GB of ram and 1.5GB of
this is allocated to cache. I have increased the 'allowed-fragmentation'
to 100000.  The pool size is 20 and the query-pool size is 5. I have
turned off fulltext indexing for my collections and I have a range index
on the element I am searching.  The watchdog query timeout is -1.  The
collection I am querying has only one other file in it.  I believe most
other settings are at their defaults.
   
    The load action perform a search on each fragment to see if one like
it is already there and then inserts the fragment when needed.  The
first xquery to not return was a search.  The other two threads followed
soon after were attempting to do inserts.
   
    This behavior happened three times during the course of the day, and
on the last one, I decided to abort the load and try some adjustments.

    So my question is what could have been happening during the 83
minute interval?  Whatever was happening does not produce any exist.log
output at the debug level.
   
    I restarted the load with a single thread and have not run into any
noticeable multi-minute delays, however I would really like to take
advantage of the multi-core server I am using to speed this job up.  I
am also worried whether I might run into other deadlock situations after
the database is built.  I noticed too what appared to have been a
massive accumulation of cached results that were purged en mass after
the 83 minute pause.  My client does a ResourceSet.clear() after every
query in the hope that this would eliminate dome of the temp files
accumulating in /tmp.
   
    Any advice for whether the query pool should be increased?  Or some
other way to avoid this deadlock?  I suppose I could set a query timeout
and then code the client to retry.  But this seems to be configuring
around the issue - the queries that hung were all completing in well
under a second when things were moving.
   
    Thanks,
    Pete Ciuffetti
       
Here is the exist.log analysis

At 11:31:05, thread 2 does a search of the file being built...

2009-11-03 11:31:05,892 [P1-2] DEBUG (XQuery.java [compile]:160) - Query
diagnostics:
element {"h:headings"} {
    attribute {xml:id} {
        heading_by_canonical
    }
    {
        collection("/db/staging-meta/uniform")
        (# exist:optimize #)
        {
            descendant-or-self::h:heading[child::h:canonical =
"barbaresco_docg"]
        }

    }
}
2009-11-03 11:31:05,892 [P1-2] DEBUG (XQuery.java [compile]:166) -
Compilation took 1 ms

...about one second later, thread one does an insert or update...

2009-11-03 11:31:06,652 [P1-6] DEBUG (XQuery.java [compile]:160) - Query
diagnostics:
let  <1>
    $doc := doc("headings"),
    let  <1>
        $node := $doc
        id("uh_ed97fd2ca11b3a65b04030d025bb4b32"),
        let  <1>
            $newnode := element {"h:heading"} {
                attribute {xml:id} {
                    uh_ed97fd2ca11b3a65b04030d025bb4b32
                }
                element {"title"} {
                    text {
                        Aesculus x carnea
                    }
                }
                element {"h:plain"} {
                    text {
                        Aesculus x carnea
                    }
                }
                element {"h:canonical"} {
                    text {
                        aesculus_x_carnea
                    }
                }
            }
        return
            if (
                exists($node)
            ) then
                update replace

                    $node
                    with

                    $newnode

            else
                update insert

                    $newnodeinto
                    $doc
                    id("headings")

2009-11-03 11:31:06,653 [P1-6] DEBUG (XQuery.java [compile]:166) -
Compilation took 2 ms
2009-11-03 11:31:06,654 [P1-6] DEBUG (Insert.java [eval]:131) - Found: 1
nodes

..at about the same time as thread 6, thread 1 does a very similar insert...

2009-11-03 11:31:06,761 [P1-1] DEBUG (XQuery.java [compile]:160) - Query
diagnostics:
let  <1>
    $doc := doc("headings"),
    let  <1>
        $node := $doc
        id("uh_bb2f3d32c85733ca997a28d0d579983f"),
        let  <1>
            $newnode := element {"h:heading"} {
                attribute {xml:id} {
                    uh_bb2f3d32c85733ca997a28d0d579983f
                }
                element {"title"} {
                    text {
                        Balda
                    }
                    element {"indexterm"} {
                        attribute {zone} {

                        }
                        element {"primary"} {
                            text {
                                Balda
                            }
                        }
                    }
                    text {


                    }
                }
                element {"h:plain"} {
                    text {
                        Balda
                    }
                }
                element {"h:canonical"} {
                    text {
                        balda
                    }
                }
            }
        return
            if (
                exists($node)
            ) then
                update replace

                    $node
                    with

                    $newnode

            else
                update insert

                    $newnodeinto
                    $doc
                    id("headings")

2009-11-03 11:31:06,761 [P1-1] DEBUG (XQuery.java [compile]:166) -
Compilation took 2 ms

...At this point, there is no further ouput from thread 1,2 or 6 until,
almost 83 minutes later, thread 6's insert finishes
In the interrim I was able to browse the server with the web UI and look
at running queries (these actions
employed two other threads).  Thread 6 says...

2009-11-03 12:54:03,182 [P1-6] DEBUG (XQuery.java [execute]:229) -
Execution took 4,976,529 ms
2009-11-03 12:54:03,183 [P1-6] DEBUG (HTTPUtils.java
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
2009-11-03 12:54:03,183 [P1-6] DEBUG (HTTPUtils.java
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
2009-11-03 12:54:03,183 [P1-6] INFO  (RpcConnection.java [doQuery]:299)
- query took 4976530ms.
2009-11-03 12:54:03,184 [P1-6] DEBUG (RpcConnection.java [queryP]:2409)
- found 0

...immediately after that, thread 1's insert finishes...

2009-11-03 12:54:03,185 [P1-1] DEBUG (Insert.java [eval]:131) - Found: 1
nodes
2009-11-03 12:54:03,189 [P1-1] DEBUG (XQuery.java [execute]:229) -
Execution took 4,976,428 ms
2009-11-03 12:54:03,189 [P1-1] DEBUG (HTTPUtils.java
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
2009-11-03 12:54:03,189 [P1-1] DEBUG (HTTPUtils.java
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
2009-11-03 12:54:03,189 [P1-1] INFO  (RpcConnection.java [doQuery]:299)
- query took 4976428ms.
2009-11-03 12:54:03,190 [P1-1] DEBUG (RpcConnection.java [queryP]:2409)
- found 0

...Next I get 5,452 rows in the log that look like this...
2009-11-03 12:54:04,547 [P1-6] DEBUG (QueryResultCache.java
[checkTimestamps]:65) - Removing result set Tue Nov 03 11:28:39 CST 2009

...a couple seconds later, thread 2's query finally finishes....

2009-11-03 12:54:06,864 [P1-2] DEBUG (XQuery.java [execute]:229) -
Execution took 4,980,971 ms


...a half hour later, the following appeared in the scheduler.log...

2009-11-03 01:22:45,370 [exist_QuartzScheduler_Worker-4] ERROR
(JobRunShell.java [run]:211) - Job eXist.internal.FileLockHeartBeat:
/credo/apps/eXist/webapp/WEB-INF/data/journal.lck threw an unhandled
Exception:
java.lang.IllegalArgumentException
        at java.nio.Buffer.position(Buffer.java:218)
        at sun.nio.ch.IOUtil.write(IOUtil.java:78)
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:203)
        at org.exist.storage.lock.FileLock.save(FileLock.java:231)
        at
org.exist.storage.lock.FileLockHeartBeat.execute(FileLockHeartBeat.java:71)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
2009-11-03 01:22:45,370 [exist_QuartzScheduler_Worker-4] ERROR
(QuartzScheduler.java [schedulerError]:2185) - Job
(eXist.internal.FileLockHeartBeat:
/credo/apps/eXist/webapp/WEB-INF/data/journal.lck threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See
nested exception: java.lang.IllegalArgumentException]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
        at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
Caused by: java.lang.IllegalArgumentException
        at java.nio.Buffer.position(Buffer.java:218)
        at sun.nio.ch.IOUtil.write(IOUtil.java:78)
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:203)
        at org.exist.storage.lock.FileLock.save(FileLock.java:231)
        at
org.exist.storage.lock.FileLockHeartBeat.execute(FileLockHeartBeat.java:71)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        ... 1 more

(four similar errors from separate times on the previous day also showed
the same exception)



2009-11-03 11:04:56,032 [P1-5] DEBUG (LocalXPathQueryService.java
[execute]:191) - query took 296 ms.
2009-11-03 11:05:14,068 [P1-5] DEBUG (LocalXPathQueryService.java
[execute]:191) - query took 24 ms.
2009-11-03 11:05:16,828 [P1-5] DEBUG (LocalXPathQueryService.java
[execute]:191) - query took 15 ms.
2009-11-03 11:05:25,514 [P1-5] DEBUG (LocalXPathQueryService.java
[execute]:191) - query took 22 ms.
[xmldb.log shows a mysterious gap of 83 minutes where nothing is logged]
2009-11-03 12:28:43,202 [P1-5] DEBUG (LocalXPathQueryService.java
[execute]:191) - query took 266 ms.
2009-11-03 12:29:00,442 [P1-5] DEBUG (LocalXPathQueryService.java
[execute]:191) - query took 12 ms.
2009-11-03 12:29:09,398 [P1-5] DEBUG (LocalXPathQueryService.java
[execute]:191) - query took 11 ms.
2009-11-03 12:29:12,887 [P1-5] DEBUG (LocalXPathQueryService.java
[execute]:191) - query took 30 ms.
2009-11-03 12:29:12,935 [P1-3] DEBUG (LocalXPathQueryService.java
[execute]:191) - query took 1 ms.


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threaded data load

by Pieter Deelen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Peter,

>    So my question is what could have been happening during the 83
> minute interval?  Whatever was happening does not produce any exist.log
> output at the debug level.

Have you tried the trace level? It might give some more information about
what is happening.

> ...Next I get 5,452 rows in the log that look like this...
> 2009-11-03 12:54:04,547 [P1-6] DEBUG (QueryResultCache.java
> [checkTimestamps]:65) - Removing result set Tue Nov 03 11:28:39 CST 2009

Is this the first one? What does the last one look like?

Regards,
Pieter

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threaded data load

by Peter Ciuffetti-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



Pieter Deelen wrote:
Hi Peter,

  
   So my question is what could have been happening during the 83
minute interval?  Whatever was happening does not produce any exist.log
output at the debug level.
    

Have you tried the trace level? It might give some more information about
what is happening.
  
I haven't - for a job that runs for hours with multiple threads I'm not sure this would be practical to do for the whole package tree.  Is there some package or class you think I could focus on who's trace output might be relevant?
  
...Next I get 5,452 rows in the log that look like this...
2009-11-03 12:54:04,547 [P1-6] DEBUG (QueryResultCache.java
[checkTimestamps]:65) - Removing result set Tue Nov 03 11:28:39 CST 2009
    

Is this the first one? What does the last one look like?
  
The line shown above was the last one in the sequence.  The first one looked like

2009-11-03 12:54:03,192 [P1-6] DEBUG (QueryResultCache.java [checkTimestamps]:65) - Removing result set Tue Nov 03 11:31:00 CST 2009

The timestamps at the end of these 5,452 log lines were not in a strict time order, but all were within the range of 11:28:05 - 11:31:06,  The latter is, perhaps not coincidentally, the time stamp of the insert query in thread 6 that started at the beginning of the problem

..and all were attributed to either thread 1 or thread 6.

Thanks,
Pete

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threaded data load

by Pieter Deelen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>    So my question is what could have been happening during the 83
> minute interval?  Whatever was happening does not produce any exist.log
> output at the debug level.
>
>
> Have you tried the trace level? It might give some more information about
> what is happening.
>
>
> I haven't - for a job that runs for hours with multiple threads I'm not sure
> this would be practical to do for the whole package tree.  Is there some
> package or class you think I could focus on who's trace output might be
> relevant?
>

I have used the trace level on the ROOT logger before and noticed a slowdown
of about 25%, but YMMV.

>
> ...Next I get 5,452 rows in the log that look like this...
> 2009-11-03 12:54:04,547 [P1-6] DEBUG (QueryResultCache.java
> [checkTimestamps]:65) - Removing result set Tue Nov 03 11:28:39 CST 2009
>
>
> Is this the first one? What does the last one look like?
>
>
> The line shown above was the last one in the sequence.  The first one looked
> like
>
> 2009-11-03 12:54:03,192 [P1-6] DEBUG (QueryResultCache.java
> [checkTimestamps]:65) - Removing result set Tue Nov 03 11:31:00 CST 2009
>
> The timestamps at the end of these 5,452 log lines were not in a strict time
> order, but all were within the range of 11:28:05 - 11:31:06,  The latter is,
> perhaps not coincidentally, the time stamp of the insert query in thread 6
> that started at the beginning of the problem
>
> ..and all were attributed to either thread 1 or thread 6.

5,452 / 181 =~ 30.1 queries / second. Is that a typical rate for your
application?
If so, these messages might not indicate a problem.

I thought these results might have been generated while the threads hung, i.e.,
eXist redoing the queries over and over again. But apparently this is
not the case.

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threaded data load

by Peter Ciuffetti-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



Pieter Deelen wrote:
...Next I get 5,452 rows in the log that look like this...
2009-11-03 12:54:04,547 [P1-6] DEBUG (QueryResultCache.java
[checkTimestamps]:65) - Removing result set Tue Nov 03 11:28:39 CST 2009


Is this the first one? What does the last one look like?


The line shown above was the last one in the sequence.  The first one looked
like

2009-11-03 12:54:03,192 [P1-6] DEBUG (QueryResultCache.java
[checkTimestamps]:65) - Removing result set Tue Nov 03 11:31:00 CST 2009

The timestamps at the end of these 5,452 log lines were not in a strict time
order, but all were within the range of 11:28:05 - 11:31:06,  The latter is,
perhaps not coincidentally, the time stamp of the insert query in thread 6
that started at the beginning of the problem

..and all were attributed to either thread 1 or thread 6.
    

5,452 / 181 =~ 30.1 queries / second. Is that a typical rate for your
application?
If so, these messages might not indicate a problem.

I thought these results might have been generated while the threads hung, i.e.,
eXist redoing the queries over and over again. But apparently this is
not the case.

  
I think these cached results being purged are legitimate queries that occurred in the 3 minutes just prior to the deadlock.  The application I've built is new and the server I'm using is new, so I don't know whether 30 queries per second is to be expected, but I can say that many log lines show query execution times in the 1-2 ms range (about 41% of all the queries reported in my currently active log).

In any event they were not purged until 83 minutes after the last of them was cached, so I suspect that this is just more evidence that whatever was preventing query response to the hung threads was also preventing the result cache purging.

Thanks,
Pete

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threaded data load

by Pieter Deelen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Nov 5, 2009 at 5:20 PM, Peter Ciuffetti <pete@...> wrote:
> I have increased the 'allowed-fragmentation' to 100000.
Mmm, this is pretty big. Are you trying to avoid defragmentation
altogether or trying to postpone it? Defragmentation of a large
fragmented document might take quite a while (though 83 minutes seems
a little excessive). Are you seeing "Defragmenting document ...."
messages?

On Fri, Nov 6, 2009 at 2:45 PM, Peter Ciuffetti <pete@...> wrote:
> I think these cached results being purged are legitimate queries that
> occurred in the 3 minutes just prior to the deadlock.
...
> In any event they were not purged until 83 minutes after the last of them
> was cached, so I suspect that this is just more evidence that whatever was
> preventing query response to the hung threads was also preventing the result
> cache purging.

Query results older than 3 minutes are purged each time an XML-RPC
call comes in, and since no new calls were coming in when the threads
were hung, this did not happen for a long time. So indeed, the big
purge seems to be caused by the hanging threads and not to be their
cause.

Besides trying a run with trace level, perhaps you could dump the
threads a few times when they hang. See:
http://article.gmane.org/gmane.text.xml.exist/23317.

Regards,
Pieter

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threaded data load

by Peter Ciuffetti-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



Pieter Deelen wrote:
On Thu, Nov 5, 2009 at 5:20 PM, Peter Ciuffetti pete@... wrote:
  
I have increased the 'allowed-fragmentation' to 100000.
    
Mmm, this is pretty big. Are you trying to avoid defragmentation
altogether or trying to postpone it? Defragmentation of a large
fragmented document might take quite a while (though 83 minutes seems
a little excessive). Are you seeing "Defragmenting document ...."
messages?
  
My intention was to avoid a defrag occuring too frequently.  Since I'm inserting hundreds of thousands of rows to a file that will be large, defraging it every 100 inserts seemed inefficient, although for the most parts, inserts will occur at the end of the file.   I was influenced by a post where someone tried a large value for this also (http://markmail.org/message/2xys6cizius3tzfd)

I do not have any "Defragmenting document" messages in the log.

On Fri, Nov 6, 2009 at 2:45 PM, Peter Ciuffetti pete@... wrote:
  
I think these cached results being purged are legitimate queries that
occurred in the 3 minutes just prior to the deadlock.
    
...
  
In any event they were not purged until 83 minutes after the last of them
was cached, so I suspect that this is just more evidence that whatever was
preventing query response to the hung threads was also preventing the result
cache purging.
    

Query results older than 3 minutes are purged each time an XML-RPC
call comes in, and since no new calls were coming in when the threads
were hung, this did not happen for a long time. So indeed, the big
purge seems to be caused by the hanging threads and not to be their
cause.

Besides trying a run with trace level, perhaps you could dump the
threads a few times when they hang. See:
http://article.gmane.org/gmane.text.xml.exist/23317.

  

I will have to wait a few days to try this.  I will report back.

Thanks for the help
Pete

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threaded data load

by Dmitriy Shabanov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote:
> So my question is what could have been happening during the 83
> minute interval?  Whatever was happening does not produce any
> exist.log
> output at the debug level.

My production server did face same strange issue. The eXist freeze for
several minutes (> 5 minutes), no response from server. After it
continue as normal, but restart bring data corruption (recovering
failed). (Wolfgang: I think that is my recovering problem bug.)

What can it be? I don't think that it concurrence issue on xquery engine
site, because it happening ones a 1-3 days (not frequently, concurrence
issue on lucene happened ones an 1-3 hours), so it some background
process.

--
Cheers,

Dmitriy Shabanov

PS analyzing log ...


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threaded data load

by Wolfgang Meier-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

The JMX interface into eXist's lock manager can provide you a list of
all threads currently waiting to lock a resource or collection, the
locks they are waiting for, and the thread currently holding the lock.
Just call the JMXClient with parameter -l. This information can then
be combined with a Java stack trace of all running threads. You need
to enable JMX on the server as well. See http://exist-db.org/jmx.html.

Dmitriy, which SVN revision are you running?

Wolfgang

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threaded data load

by Dmitriy Shabanov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sun, 2009-11-08 at 10:12 +0100, Wolfgang Meier wrote:
> Dmitriy, which SVN revision are you running?

SVN Version 9426
Build Date 2009 09 26 (that is wrong)
__________________________

exist.jar modify date 28.10.09 (that is build date)

I going to log threads list ones a minute.
 
--
Cheers,

Dmitriy Shabanov


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threadeddata load

by Michael Sokolov-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I had a similar (?) issue: during an intensive loading operation, the
database locked up: the system seemed idle, but eXIst was not responding at
all, and it would not shutdown normally.  Killing the process and starting
again, it failed to recover from the transaction log (or journal - am I
confusing these?).  Then I deleted the log (*.log in the data folder) and
restarted: the database came up, but one of the collections was corrupt: I
would get exceptions when trying to access documents in that collection.
This has happened twice now: I suspect I could reproduce if I had to...

-Mike

> -----Original Message-----
> From: Dmitriy Shabanov [mailto:shabanovd@...]
> Sent: Sunday, November 08, 2009 2:02 AM
> To: Peter Ciuffetti
> Cc: exist-open@...
> Subject: Re: [Exist-open] Strange deadlock issue during
> multi-threadeddata load
>
> On Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote:
> > So my question is what could have been happening during the
> 83 minute
> > interval?  Whatever was happening does not produce any exist.log
> > output at the debug level.
>
> My production server did face same strange issue. The eXist
> freeze for several minutes (> 5 minutes), no response from
> server. After it continue as normal, but restart bring data
> corruption (recovering failed). (Wolfgang: I think that is my
> recovering problem bug.)
>
> What can it be? I don't think that it concurrence issue on
> xquery engine site, because it happening ones a 1-3 days (not
> frequently, concurrence issue on lucene happened ones an 1-3
> hours), so it some background process.
>
> --
> Cheers,
>
> Dmitriy Shabanov
>
> PS analyzing log ...
>
>
> --------------------------------------------------------------
> ----------------
> Let Crystal Reports handle the reporting - Free Crystal
> Reports 2008 30-Day trial. Simplify your report design,
> integration and deployment - and focus on what you do best,
> core application coding. Discover what's new with Crystal
> Reports now.  http://p.sf.net/sfu/bobj-july 
> _______________________________________________
> Exist-open mailing list
> Exist-open@...
> https://lists.sourceforge.net/lists/listinfo/exist-open
>


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue during multi-threadeddata load

by Dmitriy Shabanov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Mike, do you use webdav (enable and/or heavy use) ? & what is revision?

--
Cheers,

Dmitriy Shabanov

On Sun, 2009-11-08 at 12:22 -0500, Michael Sokolov wrote:

> I had a similar (?) issue: during an intensive loading operation, the
> database locked up: the system seemed idle, but eXIst was not responding at
> all, and it would not shutdown normally.  Killing the process and starting
> again, it failed to recover from the transaction log (or journal - am I
> confusing these?).  Then I deleted the log (*.log in the data folder) and
> restarted: the database came up, but one of the collections was corrupt: I
> would get exceptions when trying to access documents in that collection.
> This has happened twice now: I suspect I could reproduce if I had to...
>
> -Mike
>
> > -----Original Message-----
> > From: Dmitriy Shabanov [mailto:shabanovd@...]
> > Sent: Sunday, November 08, 2009 2:02 AM
> > To: Peter Ciuffetti
> > Cc: exist-open@...
> > Subject: Re: [Exist-open] Strange deadlock issue during
> > multi-threadeddata load
> >
> > On Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote:
> > > So my question is what could have been happening during the
> > 83 minute
> > > interval?  Whatever was happening does not produce any exist.log
> > > output at the debug level.
> >
> > My production server did face same strange issue. The eXist
> > freeze for several minutes (> 5 minutes), no response from
> > server. After it continue as normal, but restart bring data
> > corruption (recovering failed). (Wolfgang: I think that is my
> > recovering problem bug.)
> >
> > What can it be? I don't think that it concurrence issue on
> > xquery engine site, because it happening ones a 1-3 days (not
> > frequently, concurrence issue on lucene happened ones an 1-3
> > hours), so it some background process.



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue duringmulti-threadeddata load

by Michael Sokolov-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I don't use webdav - my updates are all via REST.  I was updating a large
number of images (binary documents) and some xml in a large folder structure
(lots of collections), so it's possible the collection-creation was the
source of this problem given the collection cache size issue we discovered
last week.

I don't know whether webdav is enabled - how could I tell?  The revision
again I'm not sure how to tell, but I updated from trunk revision a week ago
on October 30.  But I think the first time this happened was prior to that:
I think with the 1.4rc release.

You asked me to reproduce, but what info would you be looking for?  Just the
idea that it's repeatable?  Anything else to look for especially?

-Mike

> -----Original Message-----
> From: Dmitriy Shabanov [mailto:shabanovd@...]
> Sent: Sunday, November 08, 2009 1:13 PM
> To: Michael Sokolov
> Cc: 'Peter Ciuffetti'; exist-open@...
> Subject: RE: [Exist-open] Strange deadlock issue
> duringmulti-threadeddata load
>
> Mike, do you use webdav (enable and/or heavy use) ? & what is
> revision?
>
> --
> Cheers,
>
> Dmitriy Shabanov
>
> On Sun, 2009-11-08 at 12:22 -0500, Michael Sokolov wrote:
> > I had a similar (?) issue: during an intensive loading
> operation, the
> > database locked up: the system seemed idle, but eXIst was not
> > responding at all, and it would not shutdown normally.  Killing the
> > process and starting again, it failed to recover from the
> transaction
> > log (or journal - am I confusing these?).  Then I deleted the log
> > (*.log in the data folder) and
> > restarted: the database came up, but one of the collections was
> > corrupt: I would get exceptions when trying to access
> documents in that collection.
> > This has happened twice now: I suspect I could reproduce if
> I had to...
> >
> > -Mike
> >
> > > -----Original Message-----
> > > From: Dmitriy Shabanov [mailto:shabanovd@...]
> > > Sent: Sunday, November 08, 2009 2:02 AM
> > > To: Peter Ciuffetti
> > > Cc: exist-open@...
> > > Subject: Re: [Exist-open] Strange deadlock issue during
> > > multi-threadeddata load
> > >
> > > On Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote:
> > > > So my question is what could have been happening during the
> > > 83 minute
> > > > interval?  Whatever was happening does not produce any
> exist.log
> > > > output at the debug level.
> > >
> > > My production server did face same strange issue. The
> eXist freeze
> > > for several minutes (> 5 minutes), no response from
> server. After it
> > > continue as normal, but restart bring data corruption (recovering
> > > failed). (Wolfgang: I think that is my recovering problem bug.)
> > >
> > > What can it be? I don't think that it concurrence issue on xquery
> > > engine site, because it happening ones a 1-3 days (not
> frequently,
> > > concurrence issue on lucene happened ones an 1-3 hours),
> so it some
> > > background process.
>
>
>


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Strange deadlock issue duringmulti-threadeddata load

by Dmitriy Shabanov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sun, 2009-11-08 at 15:01 -0500, Michael Sokolov wrote:
> I don't use webdav - my updates are all via REST.  I was updating a large
> number of images (binary documents) and some xml in a large folder structure
> (lots of collections), so it's possible the collection-creation was the
> source of this problem given the collection cache size issue we discovered
> last week.

I don't have  lot of collections, there are around 100 collection in
different parent-collections (~30 @ same level).

By the way, a lot of images in my db too (well, not much 100-200 same
collection). That give idea to check code related to binary resources
(journal IMHO).

> I don't know whether webdav is enabled - how could I tell?  The revision
> again I'm not sure how to tell, but I updated from trunk revision a week ago
> on October 30.  But I think the first time this happened was prior to that:
> I think with the 1.4rc release.
>
> You asked me to reproduce, but what info would you be looking for?  Just the
> idea that it's repeatable?  Anything else to look for especially?

Steps to get to that stage. Reproducible: same steps give same (buggy)
result.

> -Mike
>
> > -----Original Message-----
> > From: Dmitriy Shabanov [mailto:shabanovd@...]
> > Sent: Sunday, November 08, 2009 1:13 PM
> > To: Michael Sokolov
> > Cc: 'Peter Ciuffetti'; exist-open@...
> > Subject: RE: [Exist-open] Strange deadlock issue
> > duringmulti-threadeddata load
> >
> > Mike, do you use webdav (enable and/or heavy use) ? & what is
> > revision?
> >
> > --
> > Cheers,
> >
> > Dmitriy Shabanov
> >
> > On Sun, 2009-11-08 at 12:22 -0500, Michael Sokolov wrote:
> > > I had a similar (?) issue: during an intensive loading
> > operation, the
> > > database locked up: the system seemed idle, but eXIst was not
> > > responding at all, and it would not shutdown normally.  Killing the
> > > process and starting again, it failed to recover from the
> > transaction
> > > log (or journal - am I confusing these?).  Then I deleted the log
> > > (*.log in the data folder) and
> > > restarted: the database came up, but one of the collections was
> > > corrupt: I would get exceptions when trying to access
> > documents in that collection.
> > > This has happened twice now: I suspect I could reproduce if
> > I had to...
> > >
> > > -Mike
> > >
> > > > -----Original Message-----
> > > > From: Dmitriy Shabanov [mailto:shabanovd@...]
> > > > Sent: Sunday, November 08, 2009 2:02 AM
> > > > To: Peter Ciuffetti
> > > > Cc: exist-open@...
> > > > Subject: Re: [Exist-open] Strange deadlock issue during
> > > > multi-threadeddata load
> > > >
> > > > On Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote:
> > > > > So my question is what could have been happening during the
> > > > 83 minute
> > > > > interval?  Whatever was happening does not produce any
> > exist.log
> > > > > output at the debug level.
> > > >
> > > > My production server did face same strange issue. The
> > eXist freeze
> > > > for several minutes (> 5 minutes), no response from
> > server. After it
> > > > continue as normal, but restart bring data corruption (recovering
> > > > failed). (Wolfgang: I think that is my recovering problem bug.)
> > > >
> > > > What can it be? I don't think that it concurrence issue on xquery
> > > > engine site, because it happening ones a 1-3 days (not
> > frequently,
> > > > concurrence issue on lucene happened ones an 1-3 hours),
> > so it some
> > > > background process.
> >
> >
> >
>
--
Cheers,

Dmitriy Shabanov


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open