|
View:
New views
14 Messages
—
Rating Filter:
Alert me
|
|
|
Strange deadlock issue during multi-threaded data loadHi,
I'm building a large database based on eXist 1.4rc on a 4-core Linux cloud server w/8GB ram. I'm having what I think are multi-threading issues when more than one client thread tries to read from and write to the same XML resource. The file I'm building, when complete, will contain about 1.5 million smallish XML fragments, maybe 100-200MB of total text. My load program is multi-threaded and uses the XML:DB api with XQuery update extensions. I started with three load threads and for the most part things seemed to be going OK. Then progress appeared to stop after about a day of processing. Inspecting the exist.log and xmldb.log it appeared that the three client threads were each waiting for an xquery they submitted to complete (the compile step was reported in the log, but not the execution result). It remained this way for 83 minutes and then, remarkably, resumed processing. Meanwhile I was inspecting the situation, looking at the running queries via the amdin web client. The server was responsive to the webclient's browse requests but not the xml:db requests of the load program. I couldn't tell what the server was doing for the three threads that were hung, though CPU usage was evident via 'top'. And I don't know what eventually jarred it loose. The forensic analysis of the logs is below. Other possibly relevant config settings: the server JVM is allocated 4GB of ram and 1.5GB of this is allocated to cache. I have increased the 'allowed-fragmentation' to 100000. The pool size is 20 and the query-pool size is 5. I have turned off fulltext indexing for my collections and I have a range index on the element I am searching. The watchdog query timeout is -1. The collection I am querying has only one other file in it. I believe most other settings are at their defaults. The load action perform a search on each fragment to see if one like it is already there and then inserts the fragment when needed. The first xquery to not return was a search. The other two threads followed soon after were attempting to do inserts. This behavior happened three times during the course of the day, and on the last one, I decided to abort the load and try some adjustments. So my question is what could have been happening during the 83 minute interval? Whatever was happening does not produce any exist.log output at the debug level. I restarted the load with a single thread and have not run into any noticeable multi-minute delays, however I would really like to take advantage of the multi-core server I am using to speed this job up. I am also worried whether I might run into other deadlock situations after the database is built. I noticed too what appared to have been a massive accumulation of cached results that were purged en mass after the 83 minute pause. My client does a ResourceSet.clear() after every query in the hope that this would eliminate dome of the temp files accumulating in /tmp. Any advice for whether the query pool should be increased? Or some other way to avoid this deadlock? I suppose I could set a query timeout and then code the client to retry. But this seems to be configuring around the issue - the queries that hung were all completing in well under a second when things were moving. Thanks, Pete Ciuffetti Here is the exist.log analysis At 11:31:05, thread 2 does a search of the file being built... 2009-11-03 11:31:05,892 [P1-2] DEBUG (XQuery.java [compile]:160) - Query diagnostics: element {"h:headings"} { attribute {xml:id} { heading_by_canonical } { collection("/db/staging-meta/uniform") (# exist:optimize #) { descendant-or-self::h:heading[child::h:canonical = "barbaresco_docg"] } } } 2009-11-03 11:31:05,892 [P1-2] DEBUG (XQuery.java [compile]:166) - Compilation took 1 ms ...about one second later, thread one does an insert or update... 2009-11-03 11:31:06,652 [P1-6] DEBUG (XQuery.java [compile]:160) - Query diagnostics: let <1> $doc := doc("headings"), let <1> $node := $doc id("uh_ed97fd2ca11b3a65b04030d025bb4b32"), let <1> $newnode := element {"h:heading"} { attribute {xml:id} { uh_ed97fd2ca11b3a65b04030d025bb4b32 } element {"title"} { text { Aesculus x carnea } } element {"h:plain"} { text { Aesculus x carnea } } element {"h:canonical"} { text { aesculus_x_carnea } } } return if ( exists($node) ) then update replace $node with $newnode else update insert $newnodeinto $doc id("headings") 2009-11-03 11:31:06,653 [P1-6] DEBUG (XQuery.java [compile]:166) - Compilation took 2 ms 2009-11-03 11:31:06,654 [P1-6] DEBUG (Insert.java [eval]:131) - Found: 1 nodes ..at about the same time as thread 6, thread 1 does a very similar insert... 2009-11-03 11:31:06,761 [P1-1] DEBUG (XQuery.java [compile]:160) - Query diagnostics: let <1> $doc := doc("headings"), let <1> $node := $doc id("uh_bb2f3d32c85733ca997a28d0d579983f"), let <1> $newnode := element {"h:heading"} { attribute {xml:id} { uh_bb2f3d32c85733ca997a28d0d579983f } element {"title"} { text { Balda } element {"indexterm"} { attribute {zone} { } element {"primary"} { text { Balda } } } text { } } element {"h:plain"} { text { Balda } } element {"h:canonical"} { text { balda } } } return if ( exists($node) ) then update replace $node with $newnode else update insert $newnodeinto $doc id("headings") 2009-11-03 11:31:06,761 [P1-1] DEBUG (XQuery.java [compile]:166) - Compilation took 2 ms ...At this point, there is no further ouput from thread 1,2 or 6 until, almost 83 minutes later, thread 6's insert finishes In the interrim I was able to browse the server with the web UI and look at running queries (these actions employed two other threads). Thread 6 says... 2009-11-03 12:54:03,182 [P1-6] DEBUG (XQuery.java [execute]:229) - Execution took 4,976,529 ms 2009-11-03 12:54:03,183 [P1-6] DEBUG (HTTPUtils.java [addLastModifiedHeader]:61) - mostRecentDocumentTime: 0 2009-11-03 12:54:03,183 [P1-6] DEBUG (HTTPUtils.java [addLastModifiedHeader]:61) - mostRecentDocumentTime: 0 2009-11-03 12:54:03,183 [P1-6] INFO (RpcConnection.java [doQuery]:299) - query took 4976530ms. 2009-11-03 12:54:03,184 [P1-6] DEBUG (RpcConnection.java [queryP]:2409) - found 0 ...immediately after that, thread 1's insert finishes... 2009-11-03 12:54:03,185 [P1-1] DEBUG (Insert.java [eval]:131) - Found: 1 nodes 2009-11-03 12:54:03,189 [P1-1] DEBUG (XQuery.java [execute]:229) - Execution took 4,976,428 ms 2009-11-03 12:54:03,189 [P1-1] DEBUG (HTTPUtils.java [addLastModifiedHeader]:61) - mostRecentDocumentTime: 0 2009-11-03 12:54:03,189 [P1-1] DEBUG (HTTPUtils.java [addLastModifiedHeader]:61) - mostRecentDocumentTime: 0 2009-11-03 12:54:03,189 [P1-1] INFO (RpcConnection.java [doQuery]:299) - query took 4976428ms. 2009-11-03 12:54:03,190 [P1-1] DEBUG (RpcConnection.java [queryP]:2409) - found 0 ...Next I get 5,452 rows in the log that look like this... 2009-11-03 12:54:04,547 [P1-6] DEBUG (QueryResultCache.java [checkTimestamps]:65) - Removing result set Tue Nov 03 11:28:39 CST 2009 ...a couple seconds later, thread 2's query finally finishes.... 2009-11-03 12:54:06,864 [P1-2] DEBUG (XQuery.java [execute]:229) - Execution took 4,980,971 ms ...a half hour later, the following appeared in the scheduler.log... 2009-11-03 01:22:45,370 [exist_QuartzScheduler_Worker-4] ERROR (JobRunShell.java [run]:211) - Job eXist.internal.FileLockHeartBeat: /credo/apps/eXist/webapp/WEB-INF/data/journal.lck threw an unhandled Exception: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at sun.nio.ch.IOUtil.write(IOUtil.java:78) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:203) at org.exist.storage.lock.FileLock.save(FileLock.java:231) at org.exist.storage.lock.FileLockHeartBeat.execute(FileLockHeartBeat.java:71) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525) 2009-11-03 01:22:45,370 [exist_QuartzScheduler_Worker-4] ERROR (QuartzScheduler.java [schedulerError]:2185) - Job (eXist.internal.FileLockHeartBeat: /credo/apps/eXist/webapp/WEB-INF/data/journal.lck threw an exception. org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.lang.IllegalArgumentException] at org.quartz.core.JobRunShell.run(JobRunShell.java:213) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525) Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at sun.nio.ch.IOUtil.write(IOUtil.java:78) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:203) at org.exist.storage.lock.FileLock.save(FileLock.java:231) at org.exist.storage.lock.FileLockHeartBeat.execute(FileLockHeartBeat.java:71) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) ... 1 more (four similar errors from separate times on the previous day also showed the same exception) 2009-11-03 11:04:56,032 [P1-5] DEBUG (LocalXPathQueryService.java [execute]:191) - query took 296 ms. 2009-11-03 11:05:14,068 [P1-5] DEBUG (LocalXPathQueryService.java [execute]:191) - query took 24 ms. 2009-11-03 11:05:16,828 [P1-5] DEBUG (LocalXPathQueryService.java [execute]:191) - query took 15 ms. 2009-11-03 11:05:25,514 [P1-5] DEBUG (LocalXPathQueryService.java [execute]:191) - query took 22 ms. [xmldb.log shows a mysterious gap of 83 minutes where nothing is logged] 2009-11-03 12:28:43,202 [P1-5] DEBUG (LocalXPathQueryService.java [execute]:191) - query took 266 ms. 2009-11-03 12:29:00,442 [P1-5] DEBUG (LocalXPathQueryService.java [execute]:191) - query took 12 ms. 2009-11-03 12:29:09,398 [P1-5] DEBUG (LocalXPathQueryService.java [execute]:191) - query took 11 ms. 2009-11-03 12:29:12,887 [P1-5] DEBUG (LocalXPathQueryService.java [execute]:191) - query took 30 ms. 2009-11-03 12:29:12,935 [P1-3] DEBUG (LocalXPathQueryService.java [execute]:191) - query took 1 ms. ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threaded data loadHi Peter,
> So my question is what could have been happening during the 83 > minute interval? Whatever was happening does not produce any exist.log > output at the debug level. Have you tried the trace level? It might give some more information about what is happening. > ...Next I get 5,452 rows in the log that look like this... > 2009-11-03 12:54:04,547 [P1-6] DEBUG (QueryResultCache.java > [checkTimestamps]:65) - Removing result set Tue Nov 03 11:28:39 CST 2009 Is this the first one? What does the last one look like? Regards, Pieter ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threaded data loadPieter Deelen wrote: I haven't - for a job that runs for hours with multiple threads I'm not sure this would be practical to do for the whole package tree. Is there some package or class you think I could focus on who's trace output might be relevant?Hi Peter, The line shown above was the last one in the sequence. The first one looked like 2009-11-03 12:54:03,192 [P1-6] DEBUG (QueryResultCache.java [checkTimestamps]:65) - Removing result set Tue Nov 03 11:31:00 CST 2009 The timestamps at the end of these 5,452 log lines were not in a strict time order, but all were within the range of 11:28:05 - 11:31:06, The latter is, perhaps not coincidentally, the time stamp of the insert query in thread 6 that started at the beginning of the problem ..and all were attributed to either thread 1 or thread 6. Thanks, Pete ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threaded data load> So my question is what could have been happening during the 83
> minute interval? Whatever was happening does not produce any exist.log > output at the debug level. > > > Have you tried the trace level? It might give some more information about > what is happening. > > > I haven't - for a job that runs for hours with multiple threads I'm not sure > this would be practical to do for the whole package tree. Is there some > package or class you think I could focus on who's trace output might be > relevant? > I have used the trace level on the ROOT logger before and noticed a slowdown of about 25%, but YMMV. > > ...Next I get 5,452 rows in the log that look like this... > 2009-11-03 12:54:04,547 [P1-6] DEBUG (QueryResultCache.java > [checkTimestamps]:65) - Removing result set Tue Nov 03 11:28:39 CST 2009 > > > Is this the first one? What does the last one look like? > > > The line shown above was the last one in the sequence. The first one looked > like > > 2009-11-03 12:54:03,192 [P1-6] DEBUG (QueryResultCache.java > [checkTimestamps]:65) - Removing result set Tue Nov 03 11:31:00 CST 2009 > > The timestamps at the end of these 5,452 log lines were not in a strict time > order, but all were within the range of 11:28:05 - 11:31:06, The latter is, > perhaps not coincidentally, the time stamp of the insert query in thread 6 > that started at the beginning of the problem > > ..and all were attributed to either thread 1 or thread 6. 5,452 / 181 =~ 30.1 queries / second. Is that a typical rate for your application? If so, these messages might not indicate a problem. I thought these results might have been generated while the threads hung, i.e., eXist redoing the queries over and over again. But apparently this is not the case. ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threaded data loadPieter Deelen wrote: I think these cached results being purged are legitimate queries that occurred in the 3 minutes just prior to the deadlock. The application I've built is new and the server I'm using is new, so I don't know whether 30 queries per second is to be expected, but I can say that many log lines show query execution times in the 1-2 ms range (about 41% of all the queries reported in my currently active log). In any event they were not purged until 83 minutes after the last of them was cached, so I suspect that this is just more evidence that whatever was preventing query response to the hung threads was also preventing the result cache purging. Thanks, Pete ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threaded data loadOn Thu, Nov 5, 2009 at 5:20 PM, Peter Ciuffetti <pete@...> wrote:
> I have increased the 'allowed-fragmentation' to 100000. Mmm, this is pretty big. Are you trying to avoid defragmentation altogether or trying to postpone it? Defragmentation of a large fragmented document might take quite a while (though 83 minutes seems a little excessive). Are you seeing "Defragmenting document ...." messages? On Fri, Nov 6, 2009 at 2:45 PM, Peter Ciuffetti <pete@...> wrote: > I think these cached results being purged are legitimate queries that > occurred in the 3 minutes just prior to the deadlock. ... > In any event they were not purged until 83 minutes after the last of them > was cached, so I suspect that this is just more evidence that whatever was > preventing query response to the hung threads was also preventing the result > cache purging. Query results older than 3 minutes are purged each time an XML-RPC call comes in, and since no new calls were coming in when the threads were hung, this did not happen for a long time. So indeed, the big purge seems to be caused by the hanging threads and not to be their cause. Besides trying a run with trace level, perhaps you could dump the threads a few times when they hang. See: http://article.gmane.org/gmane.text.xml.exist/23317. Regards, Pieter ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threaded data loadPieter Deelen wrote: My intention was to avoid a defrag occuring too frequently. Since I'm inserting hundreds of thousands of rows to a file that will be large, defraging it every 100 inserts seemed inefficient, although for the most parts, inserts will occur at the end of the file. I was influenced by a post where someone tried a large value for this also (http://markmail.org/message/2xys6cizius3tzfd)On Thu, Nov 5, 2009 at 5:20 PM, Peter Ciuffetti pete@... wrote: I do not have any "Defragmenting document" messages in the log. On Fri, Nov 6, 2009 at 2:45 PM, Peter Ciuffetti pete@... wrote: I will have to wait a few days to try this. I will report back. Thanks for the help Pete ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threaded data loadOn Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote:
> So my question is what could have been happening during the 83 > minute interval? Whatever was happening does not produce any > exist.log > output at the debug level. My production server did face same strange issue. The eXist freeze for several minutes (> 5 minutes), no response from server. After it continue as normal, but restart bring data corruption (recovering failed). (Wolfgang: I think that is my recovering problem bug.) What can it be? I don't think that it concurrence issue on xquery engine site, because it happening ones a 1-3 days (not frequently, concurrence issue on lucene happened ones an 1-3 hours), so it some background process. -- Cheers, Dmitriy Shabanov PS analyzing log ... ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threaded data loadThe JMX interface into eXist's lock manager can provide you a list of
all threads currently waiting to lock a resource or collection, the locks they are waiting for, and the thread currently holding the lock. Just call the JMXClient with parameter -l. This information can then be combined with a Java stack trace of all running threads. You need to enable JMX on the server as well. See http://exist-db.org/jmx.html. Dmitriy, which SVN revision are you running? Wolfgang ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threaded data loadOn Sun, 2009-11-08 at 10:12 +0100, Wolfgang Meier wrote:
> Dmitriy, which SVN revision are you running? SVN Version 9426 Build Date 2009 09 26 (that is wrong) __________________________ exist.jar modify date 28.10.09 (that is build date) I going to log threads list ones a minute. -- Cheers, Dmitriy Shabanov ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threadeddata loadI had a similar (?) issue: during an intensive loading operation, the
database locked up: the system seemed idle, but eXIst was not responding at all, and it would not shutdown normally. Killing the process and starting again, it failed to recover from the transaction log (or journal - am I confusing these?). Then I deleted the log (*.log in the data folder) and restarted: the database came up, but one of the collections was corrupt: I would get exceptions when trying to access documents in that collection. This has happened twice now: I suspect I could reproduce if I had to... -Mike > -----Original Message----- > From: Dmitriy Shabanov [mailto:shabanovd@...] > Sent: Sunday, November 08, 2009 2:02 AM > To: Peter Ciuffetti > Cc: exist-open@... > Subject: Re: [Exist-open] Strange deadlock issue during > multi-threadeddata load > > On Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote: > > So my question is what could have been happening during the > 83 minute > > interval? Whatever was happening does not produce any exist.log > > output at the debug level. > > My production server did face same strange issue. The eXist > freeze for several minutes (> 5 minutes), no response from > server. After it continue as normal, but restart bring data > corruption (recovering failed). (Wolfgang: I think that is my > recovering problem bug.) > > What can it be? I don't think that it concurrence issue on > xquery engine site, because it happening ones a 1-3 days (not > frequently, concurrence issue on lucene happened ones an 1-3 > hours), so it some background process. > > -- > Cheers, > > Dmitriy Shabanov > > PS analyzing log ... > > > -------------------------------------------------------------- > ---------------- > Let Crystal Reports handle the reporting - Free Crystal > Reports 2008 30-Day trial. Simplify your report design, > integration and deployment - and focus on what you do best, > core application coding. Discover what's new with Crystal > Reports now. http://p.sf.net/sfu/bobj-july > _______________________________________________ > Exist-open mailing list > Exist-open@... > https://lists.sourceforge.net/lists/listinfo/exist-open > ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue during multi-threadeddata loadMike, do you use webdav (enable and/or heavy use) ? & what is revision?
-- Cheers, Dmitriy Shabanov On Sun, 2009-11-08 at 12:22 -0500, Michael Sokolov wrote: > I had a similar (?) issue: during an intensive loading operation, the > database locked up: the system seemed idle, but eXIst was not responding at > all, and it would not shutdown normally. Killing the process and starting > again, it failed to recover from the transaction log (or journal - am I > confusing these?). Then I deleted the log (*.log in the data folder) and > restarted: the database came up, but one of the collections was corrupt: I > would get exceptions when trying to access documents in that collection. > This has happened twice now: I suspect I could reproduce if I had to... > > -Mike > > > -----Original Message----- > > From: Dmitriy Shabanov [mailto:shabanovd@...] > > Sent: Sunday, November 08, 2009 2:02 AM > > To: Peter Ciuffetti > > Cc: exist-open@... > > Subject: Re: [Exist-open] Strange deadlock issue during > > multi-threadeddata load > > > > On Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote: > > > So my question is what could have been happening during the > > 83 minute > > > interval? Whatever was happening does not produce any exist.log > > > output at the debug level. > > > > My production server did face same strange issue. The eXist > > freeze for several minutes (> 5 minutes), no response from > > server. After it continue as normal, but restart bring data > > corruption (recovering failed). (Wolfgang: I think that is my > > recovering problem bug.) > > > > What can it be? I don't think that it concurrence issue on > > xquery engine site, because it happening ones a 1-3 days (not > > frequently, concurrence issue on lucene happened ones an 1-3 > > hours), so it some background process. ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue duringmulti-threadeddata loadI don't use webdav - my updates are all via REST. I was updating a large
number of images (binary documents) and some xml in a large folder structure (lots of collections), so it's possible the collection-creation was the source of this problem given the collection cache size issue we discovered last week. I don't know whether webdav is enabled - how could I tell? The revision again I'm not sure how to tell, but I updated from trunk revision a week ago on October 30. But I think the first time this happened was prior to that: I think with the 1.4rc release. You asked me to reproduce, but what info would you be looking for? Just the idea that it's repeatable? Anything else to look for especially? -Mike > -----Original Message----- > From: Dmitriy Shabanov [mailto:shabanovd@...] > Sent: Sunday, November 08, 2009 1:13 PM > To: Michael Sokolov > Cc: 'Peter Ciuffetti'; exist-open@... > Subject: RE: [Exist-open] Strange deadlock issue > duringmulti-threadeddata load > > Mike, do you use webdav (enable and/or heavy use) ? & what is > revision? > > -- > Cheers, > > Dmitriy Shabanov > > On Sun, 2009-11-08 at 12:22 -0500, Michael Sokolov wrote: > > I had a similar (?) issue: during an intensive loading > operation, the > > database locked up: the system seemed idle, but eXIst was not > > responding at all, and it would not shutdown normally. Killing the > > process and starting again, it failed to recover from the > transaction > > log (or journal - am I confusing these?). Then I deleted the log > > (*.log in the data folder) and > > restarted: the database came up, but one of the collections was > > corrupt: I would get exceptions when trying to access > documents in that collection. > > This has happened twice now: I suspect I could reproduce if > I had to... > > > > -Mike > > > > > -----Original Message----- > > > From: Dmitriy Shabanov [mailto:shabanovd@...] > > > Sent: Sunday, November 08, 2009 2:02 AM > > > To: Peter Ciuffetti > > > Cc: exist-open@... > > > Subject: Re: [Exist-open] Strange deadlock issue during > > > multi-threadeddata load > > > > > > On Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote: > > > > So my question is what could have been happening during the > > > 83 minute > > > > interval? Whatever was happening does not produce any > exist.log > > > > output at the debug level. > > > > > > My production server did face same strange issue. The > eXist freeze > > > for several minutes (> 5 minutes), no response from > server. After it > > > continue as normal, but restart bring data corruption (recovering > > > failed). (Wolfgang: I think that is my recovering problem bug.) > > > > > > What can it be? I don't think that it concurrence issue on xquery > > > engine site, because it happening ones a 1-3 days (not > frequently, > > > concurrence issue on lucene happened ones an 1-3 hours), > so it some > > > background process. > > > ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
|
|
Re: Strange deadlock issue duringmulti-threadeddata loadOn Sun, 2009-11-08 at 15:01 -0500, Michael Sokolov wrote:
> I don't use webdav - my updates are all via REST. I was updating a large > number of images (binary documents) and some xml in a large folder structure > (lots of collections), so it's possible the collection-creation was the > source of this problem given the collection cache size issue we discovered > last week. I don't have lot of collections, there are around 100 collection in different parent-collections (~30 @ same level). By the way, a lot of images in my db too (well, not much 100-200 same collection). That give idea to check code related to binary resources (journal IMHO). > I don't know whether webdav is enabled - how could I tell? The revision > again I'm not sure how to tell, but I updated from trunk revision a week ago > on October 30. But I think the first time this happened was prior to that: > I think with the 1.4rc release. > > You asked me to reproduce, but what info would you be looking for? Just the > idea that it's repeatable? Anything else to look for especially? Steps to get to that stage. Reproducible: same steps give same (buggy) result. > -Mike > > > -----Original Message----- > > From: Dmitriy Shabanov [mailto:shabanovd@...] > > Sent: Sunday, November 08, 2009 1:13 PM > > To: Michael Sokolov > > Cc: 'Peter Ciuffetti'; exist-open@... > > Subject: RE: [Exist-open] Strange deadlock issue > > duringmulti-threadeddata load > > > > Mike, do you use webdav (enable and/or heavy use) ? & what is > > revision? > > > > -- > > Cheers, > > > > Dmitriy Shabanov > > > > On Sun, 2009-11-08 at 12:22 -0500, Michael Sokolov wrote: > > > I had a similar (?) issue: during an intensive loading > > operation, the > > > database locked up: the system seemed idle, but eXIst was not > > > responding at all, and it would not shutdown normally. Killing the > > > process and starting again, it failed to recover from the > > transaction > > > log (or journal - am I confusing these?). Then I deleted the log > > > (*.log in the data folder) and > > > restarted: the database came up, but one of the collections was > > > corrupt: I would get exceptions when trying to access > > documents in that collection. > > > This has happened twice now: I suspect I could reproduce if > > I had to... > > > > > > -Mike > > > > > > > -----Original Message----- > > > > From: Dmitriy Shabanov [mailto:shabanovd@...] > > > > Sent: Sunday, November 08, 2009 2:02 AM > > > > To: Peter Ciuffetti > > > > Cc: exist-open@... > > > > Subject: Re: [Exist-open] Strange deadlock issue during > > > > multi-threadeddata load > > > > > > > > On Thu, 2009-11-05 at 11:20 -0500, Peter Ciuffetti wrote: > > > > > So my question is what could have been happening during the > > > > 83 minute > > > > > interval? Whatever was happening does not produce any > > exist.log > > > > > output at the debug level. > > > > > > > > My production server did face same strange issue. The > > eXist freeze > > > > for several minutes (> 5 minutes), no response from > > server. After it > > > > continue as normal, but restart bring data corruption (recovering > > > > failed). (Wolfgang: I think that is my recovering problem bug.) > > > > > > > > What can it be? I don't think that it concurrence issue on xquery > > > > engine site, because it happening ones a 1-3 days (not > > frequently, > > > > concurrence issue on lucene happened ones an 1-3 hours), > > so it some > > > > background process. > > > > > > > Cheers, Dmitriy Shabanov ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ Exist-open mailing list Exist-open@... https://lists.sourceforge.net/lists/listinfo/exist-open |
| Free embeddable forum powered by Nabble | Forum Help |