There are two threads accessing the same connection, but this is because of the connection pool. When you configure a connection pool in Glassfish, you can also configure error handling for the pool. One option makes any error on any connection in the pool cause all of the connections in the pool to be destroyed and recreated. Management of the pool is under the control of the application server.
Here is how my application got into this state: one connection experienced an error. This was NOT the connection that was committing the transaction. Glassfish's pool management code detected the connection error and, because it was configured to destroy all connections, went through the connection list in a separate thread and called close on each connection. Because the synchronization is incomplete, with unlucky timing the physical connection becomes null between the time LogicalConnection checks it for null and the time it uses it.
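To make that window concrete, here is a minimal, deterministic Java sketch of the interleaving. It is a simplified, hypothetical model, not Derby's actual LogicalConnection: the classes UnsyncRaceDemo, PhysicalConnection and LogicalConnection, the CountDownLatch hooks, and the exception message are assumptions I added only so the unlucky timing happens on every run instead of occasionally. A second thread stands in for the Glassfish pool-management thread.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical, simplified model of the unsynchronized check-then-use pattern.
// Not Derby's code; the latches are test-only hooks that force the bad timing.
class UnsyncRaceDemo {
    static class PhysicalConnection {
        boolean getAutoCommit() { return true; }
    }

    static class LogicalConnection {
        private PhysicalConnection physicalConnection = new PhysicalConnection();
        final CountDownLatch checked = new CountDownLatch(1);
        final CountDownLatch cleared = new CountDownLatch(1);

        private void checkForNullPhysicalConnection() {
            if (physicalConnection == null) {
                throw new IllegalStateException("No current connection.");
            }
        }

        // Not synchronized: the field can change between the check and the use.
        boolean getAutoCommit() {
            checkForNullPhysicalConnection();
            checked.countDown();  // the check passed; let the "pool" thread run
            await(cleared);       // wait until the "pool" thread has nulled the field
            return physicalConnection.getAutoCommit(); // field is null here: NPE
        }

        // Synchronized, like the close paths, but that cannot help because
        // getAutoCommit() never acquires the same lock.
        synchronized void nullPhysicalConnection() {
            physicalConnection = null;
        }
    }

    // Test-only helper: await a latch without a checked exception.
    static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns the exception seen by the application thread, or null if none.
    static Throwable run() {
        LogicalConnection conn = new LogicalConnection();
        // Plays the role of the pool-management thread destroying connections.
        Thread pool = new Thread(() -> {
            await(conn.checked);
            conn.nullPhysicalConnection();
            conn.cleared.countDown();
        });
        pool.start();
        Throwable observed = null;
        try {
            conn.getAutoCommit();
        } catch (RuntimeException e) {
            observed = e;
        }
        try {
            pool.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed;
    }

    public static void main(String[] args) {
        // Prints a NullPointerException, not the "No current connection." error.
        System.out.println(run());
    }
}
```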
Note that many of the methods are synchronized and some are not. They should all be synchronized, because otherwise there is a window between checking for a null physical connection and using it. With synchronization, one of two things would have happened. Either the Glassfish pool management code that is closing the connections would have blocked when trying to null out the physical connection until the real transaction finished (that thread holds the lock), or the physical connection would have been nulled out first and checkForNullPhysicalConnection would have detected this and thrown its own exception. Either outcome is correct and consistent. Instead, checkForNullPhysicalConnection sees a non-null physical connection, the connection becomes null before it is used, and an NPE is thrown.
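Under the same simplified assumptions (hypothetical classes, not Derby's code), this sketch applies the proposed fix by making the method that both checks and uses the field synchronized. The afterCheck hook is a test-only assumption that keeps the lock held until the pool thread is blocked on it, and the exception message is illustrative. It shows both outcomes described above: the in-flight call finishes normally while the closing thread waits, and the next call gets the clean error from the null check instead of an NPE.

```java
import java.sql.SQLException;
import java.util.concurrent.CountDownLatch;

// Hypothetical, simplified model with the proposed fix applied.
// Not Derby's code; afterCheck is a test-only hook.
class SyncRaceDemo {
    static class PhysicalConnection {
        boolean getAutoCommit() { return true; }
    }

    static class LogicalConnection {
        private PhysicalConnection physicalConnection = new PhysicalConnection();
        Runnable afterCheck = () -> { };  // test-only hook, runs while the lock is held

        private void checkForNullPhysicalConnection() throws SQLException {
            if (physicalConnection == null) {
                throw new SQLException("No current connection.");
            }
        }

        // Synchronized: nullPhysicalConnection() cannot run between check and use.
        synchronized boolean getAutoCommit() throws SQLException {
            checkForNullPhysicalConnection();
            afterCheck.run();
            return physicalConnection.getAutoCommit();
        }

        synchronized void nullPhysicalConnection() {
            physicalConnection = null;
        }
    }

    // Returns the call's result, or the exception message if it failed.
    static String call(LogicalConnection conn) {
        try {
            return String.valueOf(conn.getAutoCommit());
        } catch (SQLException e) {
            return e.getMessage();
        }
    }

    // Outcome of a call made while the "pool" closes the connection, then of one made after.
    static String[] run() {
        LogicalConnection conn = new LogicalConnection();
        CountDownLatch checked = new CountDownLatch(1);
        Thread pool = new Thread(() -> {
            try {
                checked.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            conn.nullPhysicalConnection();  // blocks until getAutoCommit() releases the lock
        });
        conn.afterCheck = () -> {
            checked.countDown();
            // Keep holding the lock until the pool thread is blocked on it.
            while (pool.isAlive() && pool.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
        };
        pool.start();
        String during = call(conn);  // in-flight call completes normally
        try {
            pool.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        String after = call(conn);   // null is detected and reported cleanly
        return new String[] { during, after };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" / ", run())); // prints "true / No current connection."
    }
}
```

What matters is that the check and the use happen under one lock. Synchronizing checkForNullPhysicalConnection alone would not close the window, because the lock would be released before the field is used.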
From: Kathey Marsden (Commented) (JIRA) [mailto:jira@...]
Sent: Wednesday, December 28, 2011 12:59 PM
To: derby-dev@...
Subject: [jira] [Commented] (DERBY-5561) Race conditions in LogicalConnection checking for a null physical connection
Kathey Marsden commented on DERBY-5561:
Do you have multiple threads accessing the same connection at the same time? If so, is it intentional? Typically when that is done it is not intentional, because even if the various methods were synchronized, the two threads would share transaction and other state that would be difficult to coordinate.
> Race conditions in LogicalConnection checking for a null physical connection
> Key: DERBY-5561
> URL: https://issues.apache.org/jira/browse/DERBY-5561
> Project: Derby
> Issue Type: Bug
> Components: Network Client
> Affects Versions: 10.8.2.2
> Environment: Solaris 10
> Glassfish V2.1.1
> ClientXADataSource connection pool
> Reporter: Brett Bergquist
> There are race conditions around the checkForNullPhysicalConnection calls in LogicalConnection. checkForNullPhysicalConnection is not synchronized, and it checks the member "physicalConnection", which can be cleared by "nullPhysicalConnection" (synchronized), "close" (synchronized), and "closeWithoutRecyclingToPool" (synchronized).
> This affects "nativeSQL", "getAutoCommit", "getTransactionIsolation", "getWarnings", "isReadOnly", "getCatalog", "getTypeMap", "createStatement", "prepareCall", "prepareStatement", "setHoldability", "getHoldability", "setSavepoint", "rollback", "releaseSavepoint", "getSchema", and "setSchema".
> All of these call "checkForNullPhysicalConnection" and then use the member "physicalConnection" after that call returns. Because these methods are not synchronized, between the time "checkForNullPhysicalConnection" returns and the time "physicalConnection" is used, the "physicalConnection" member could be set to null, and then an NPE occurs.
> All of these methods should probably be made synchronized.