[bug #27823] SQLClient drops connections without sending notifications

View: New views
6 Messages — Rating Filter:   Alert me  

[bug #27823] SQLClient drops connections without sending notifications

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


URL:
  <http://savannah.gnu.org/bugs/?27823>

                 Summary: SQLClient drops connections without sending
notifications
                 Project: GNUstep
            Submitted by: kaicats
            Submitted on: Di 27 Okt 2009 12:04:04 GMT
                Category: Libraries
                Severity: 3 - Normal
              Item Group: Bug
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

Scenario:

You have a connection to a PostgreSQL server.
The server issues an error.

Postgres.m will then close the connection and raise an exception, but will
not send out a SQLClientDidDisconnectNotification. For that matter, I've never
seen a SQLClientDidConnectNotification either. Presumably that never happens
with automatic connects.

This is rather important when there's a need to start every connection with
stuff like SET client_encoding = 'UTF8'.

Anyway, it seems rather unfriendly (and performance-hostile) to drop the
connection just because there was an error. You cannot always avoid, say,
causing a duplicate key error, in a situation where more than one connection
is updating.




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27823>

_______________________________________________
  Nachricht geschickt von/durch Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-gnustep mailing list
Bug-gnustep@...
http://lists.gnu.org/mailman/listinfo/bug-gnustep

[bug #27823] SQLClient drops connections without sending notifications

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Update of bug #27823 (project gnustep):

                  Status:                    None => Fixed                  
             Open/Closed:                    Open => In Test                

    _______________________________________________________

Follow-up Comment #1:

I went through and checked that backends don't call the internal
connect/disconnect methods when they should be calling the public versions
(which are responsible for sending notifications).
This should be working in svn trunk now.

> Anyway, it seems rather unfriendly (and performance-hostile) to drop
> the connection just because there was an error.

I'm not sure I can agree with that ... if you are causing errors frequently
enough for dropping the connection to be a big performance issue then there's
probably something wrong with the way you are doing things.

Dropping the connection is a safety feature, to ensure that there is no
chance of a connection being left in any sort of confused state by some
unanticipated error ...  but I wouldn't mind a patch to allow a connection to
attempt to recover gracefully as an option instead.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27823>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-gnustep mailing list
Bug-gnustep@...
http://lists.gnu.org/mailman/listinfo/bug-gnustep

Re: [bug #27823] SQLClient drops connections without sending notifications

by Robert J. Slover :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Richard,

At work, one part of our application is used to 'scan' data gathered  
from various devices, normalizes it, and inserts it into the database  
as measurements.  This is very performance-critical, and for a very  
large percentage of the cases, we are only ever inserting rows (new  
measurements).  However, on occasion we need to 'reprocess' the data  
that was originally gathered (for instance, if the model used to  
normalize the data has been modified to correct an error).  In this  
case, the same measurement rows will be inserted, albeit they may  
contain slightly different information.  This component of the  
application has no knowledge of whether it is processing data for the  
first time or reprocessing it (and no need to know, either).  It first  
attempts to insert a row and if that fails due to a duplicate key  
constraint violation, it will do an update instead.  The overhead of  
querying the database first to see if the record is already there  
would in most cases be completely wasted, and in the rarer case would  
not save anything over simply attempting the insert in the first  
place.  This of course uses straight C and ODBC, but the principle is  
the same…if the ODBC drivers forced a disconnect on every constraint  
violation, we would have significantly worse performance, and would  
have to opt for the generally slower approach of querying first, since  
we can only commit a group of measurements for an interval on success  
of the entire scan (it either all goes in or none of it does).

In summary, I would agree with the OP that disconnecting due to an  
error is generally the wrong thing to do.  If SQL*Plus did that, I  
would refuse to use it.  This behavior is akin to library code calling  
exit(1), etc., instead of returning an error condition that can  
possibly be handled.

--Robert

On Oct 27, 2009, at 8:59 AM, Richard Frith-Macdonald wrote:

>
> Update of bug #27823 (project gnustep):
>
>                  Status:                    None => Fixed
>             Open/Closed:                    Open => In Test
>
>    _______________________________________________________
>
> Follow-up Comment #1:
>
> I went through and checked that backends don't call the internal
> connect/disconnect methods when they should be calling the public  
> versions
> (which are responsible for sending notifications).
> This should be working in svn trunk now.
>
>> Anyway, it seems rather unfriendly (and performance-hostile) to drop
>> the connection just because there was an error.
>
> I'm not sure I can agree with that ... if you are causing errors  
> frequently
> enough for dropping the connection to be a big performance issue  
> then there's
> probably something wrong with the way you are doing things.
>
> Dropping the connection is a safety feature, to ensure that there is  
> no
> chance of a connection being left in any sort of confused state by  
> some
> unanticipated error ...  but I wouldn't mind a patch to allow a  
> connection to
> attempt to recover gracefully as an option instead.
>
>    _______________________________________________________
>
> Reply to this item at:
>
>  <http://savannah.gnu.org/bugs/?27823>
>
> _______________________________________________
>  Message sent via/by Savannah
>  http://savannah.gnu.org/
>
>
>
> _______________________________________________
> Bug-gnustep mailing list
> Bug-gnustep@...
> http://lists.gnu.org/mailman/listinfo/bug-gnustep



_______________________________________________
Bug-gnustep mailing list
Bug-gnustep@...
http://lists.gnu.org/mailman/listinfo/bug-gnustep

Re: [bug #27823] SQLClient drops connections without sending notifications

by Richard Frith-Macdonald-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On 27 Oct 2009, at 19:42, Robert J. Slover wrote:

> Richard,
>
> At work, one part of our application is used to 'scan' data gathered  
> from various devices, normalizes it, and inserts it into the  
> database as measurements.  This is very performance-critical, and  
> for a very large percentage of the cases, we are only ever inserting  
> rows (new measurements).  However, on occasion we need to  
> 'reprocess' the data that was originally gathered (for instance, if  
> the model used to normalize the data has been modified to correct an  
> error).  In this case, the same measurement rows will be inserted,  
> albeit they may contain slightly different information.  This  
> component of the application has no knowledge of whether it is  
> processing data for the first time or reprocessing it (and no need  
> to know, either).  It first attempts to insert a row and if that  
> fails due to a duplicate key constraint violation, it will do an  
> update instead.  The overhead of querying the database first to see  
> if the record is already there would in most cases be completely  
> wasted, and in the rarer case would not save anything over simply  
> attempting the insert in the first place.  This of course uses  
> straight C and ODBC, but the principle is the same…if the ODBC  
> drivers forced a disconnect on every constraint violation, we would  
> have significantly worse performance, and would have to opt for the  
> generally slower approach of querying first, since we can only  
> commit a group of measurements for an interval on success of the  
> entire scan (it either all goes in or none of it does).

As I said I don't mind accepting a patch to allow things to not  
disconnect on error, but your example really just re-enforces my  
assertion that it should not be an issue.

You say that your code does not 'need to know' whether it's adding new  
records or replacing existing ones, yet it certainly does since it  
must handle the errors which will occur if it tries to insert a  
duplicate value.  This means that your code is more complex than it  
would be if it really didn't need to know.

Your code will be performing inconsistently ... sometimes (usually) it  
will be fast, but other times is will be slow because of the error  
handling.  When it's slow, it is presumably still 'good enough' for  
your current system, but is unlikely to scale well if you start having  
to deal with bigger datasets.

If, instead it was structured as a transaction which first deletes any  
existing records and then inserts the new ones, it might be very  
slightly slower in the common case, but more consistent and simpler.  
It would never be anything like as slow as the case where you try an  
insert, catch the error, and then update ... and if that performance  
is acceptable then the performance of the simpler, more consistent way  
of doing things must be acceptable too.  In fact the delete and insert  
model is very efficient ... when no deletion is actually needed, the  
delete has the effect of reading index information into memory so it's  
available for the insertion and was not wasted effort.  When a  
deletion is needed, the database server is able to optimise it ...  
postgres implements an update as a delete and insert anyway, so the  
performance in this case is about the same as in the case where no  
deletion is needed, which is also about the same as when you just do  
an update!

The SQLClient library was developed specifically for performance  
critical database coding (specifically pushing huge numbers of  
messages to mobile phones) ...
My idea of performance critical code is software which runs  
consistently fast, and error generation/handling is something you take  
great effort to avoid as it is fundamentally opposed to consistency  
(when an error occurs performance changes) and speed (error handling  
is slow because of the additional client-server messages and  
transaction overheads).  In fact, avoiding error generation would come  
about number three on the list of essentials for high performance  
database programming (after use of indexes and batching of inserts/
updates).

The only times I use the design pattern of attempting an operation,  
catching errors, and handling the errors separately are:
1. rarely, when performance is truly not an issue (in which case loss  
of connection is irrelevant)
2. inside a stored procedure ... so the error handling is all done  
within the database server and is therefore much faster as it's all in  
a single transaction
The second is not really what we were talking about though ... server-
side error handling is a legitimate tool and means that the client  
side doesn't receive an error.

So as I see it, the only case where this matters is where existing  
code catching errors happens to be fast enough with the error  
catching, but not fast enough if reconnects are required ... a fairly  
rare situation, in which I'd see the ability to change the disconnect  
behavior as a stop-gap to allow you to keep a system running while  
rewriting and testing the critical section to handle heavier loads.



_______________________________________________
Bug-gnustep mailing list
Bug-gnustep@...
http://lists.gnu.org/mailman/listinfo/bug-gnustep

Re: [bug #27823] SQLClient drops connections without sending notifications

by Nicola Pero-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


>> At work, one part of our application is used to 'scan' data  
>> gathered from various devices, normalizes it, and inserts it into  
>> the database as measurements.  This is very performance-critical,  
>> and for a very large percentage of the cases, we are only ever  
>> inserting rows (new measurements).  However, on occasion we need to  
>> 'reprocess' the data that was originally gathered (for instance, if  
>> the model used to normalize the data has been modified to correct  
>> an error).  In this case, the same measurement rows will be  
>> inserted, albeit they may contain slightly different information.  
>> This component of the application has no knowledge of whether it is  
>> processing data for the first time or reprocessing it (and no need  
>> to know, either).  It first attempts to insert a row and if that  
>> fails due to a duplicate key constraint violation, it will do an  
>> update instead.  The overhead of querying the database first to see  
>> if the record is already there would in most cases be completely  
>> wasted, and in the rarer case would not save anything over simply  
>> attempting the insert in the first place.  This of course uses  
>> straight C and ODBC, but the principle is the same…if the ODBC  
>> drivers forced a disconnect on every constraint violation, we would  
>> have significantly worse performance, and would have to opt for the  
>> generally slower approach of querying first, since we can only  
>> commit a group of measurements for an interval on success of the  
>> entire scan (it either all goes in or none of it does).
>
> As I said I don't mind accepting a patch to allow things to not  
> disconnect on error, but your example really just re-enforces my  
> assertion that it should not be an issue.

What about simply having an option to choose between disconnect (or  
not) on error ?

I agree that disconnect on error is safer and should be the default,  
on the other hand it should be
trivial to have an option not to disconnect on error (and just do a  
quick cleanup of the connection). :-)

SQL does support errors, and sometimes they could be useful - for  
example, you wouldn't be able to
implement an SQL terminal using SQLClient if you can't have an option  
to not disconnect on error - if you
mistype a command in an SQL terminal, you'll get an error back, but  
the connection stays open.

It's just part of the standard features of SQL - if we make it an  
option, it would be really easy and non-intrusive.
If you need to use it, you turn it on, if not, you don't. ;-)

Thanks

_______________________________________________
Bug-gnustep mailing list
Bug-gnustep@...
http://lists.gnu.org/mailman/listinfo/bug-gnustep

Re: [bug #27823] SQLClient drops connections without sending notifications

by Richard Frith-Macdonald-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On 28 Oct 2009, at 08:33, Nicola Pero wrote:

>
>>> At work, one part of our application is used to 'scan' data  
>>> gathered from various devices, normalizes it, and inserts it into  
>>> the database as measurements.  This is very performance-critical,  
>>> and for a very large percentage of the cases, we are only ever  
>>> inserting rows (new measurements).  However, on occasion we need  
>>> to 'reprocess' the data that was originally gathered (for  
>>> instance, if the model used to normalize the data has been  
>>> modified to correct an error).  In this case, the same measurement  
>>> rows will be inserted, albeit they may contain slightly different  
>>> information.  This component of the application has no knowledge  
>>> of whether it is processing data for the first time or  
>>> reprocessing it (and no need to know, either).  It first attempts  
>>> to insert a row and if that fails due to a duplicate key  
>>> constraint violation, it will do an update instead.  The overhead  
>>> of querying the database first to see if the record is already  
>>> there would in most cases be completely wasted, and in the rarer  
>>> case would not save anything over simply attempting the insert in  
>>> the first place.  This of course uses straight C and ODBC, but the  
>>> principle is the same…if the ODBC drivers forced a disconnect on  
>>> every constraint violation, we would have significantly worse  
>>> performance, and would have to opt for the generally slower  
>>> approach of querying first, since we can only commit a group of  
>>> measurements for an interval on success of the entire scan (it  
>>> either all goes in or none of it does).
>>
>> As I said I don't mind accepting a patch to allow things to not  
>> disconnect on error, but your example really just re-enforces my  
>> assertion that it should not be an issue.
>
> What about simply having an option to choose between disconnect (or  
> not) on error ?

That's what I meant by the line above about accepting a patch ...  
something to permit you to choose not to disconnect.

> I agree that disconnect on error is safer and should be the default,  
> on the other hand it should be
> trivial to have an option not to disconnect on error (and just do a  
> quick cleanup of the connection). :-)

I would say fairly easy rather than trivial ... it would need to be  
implemented for all backends, and it would probably need to  
differentiate between fatal errors and recoverable ones (it's no good  
keeping the connection 'open' if the server won't talk to us).

_______________________________________________
Bug-gnustep mailing list
Bug-gnustep@...
http://lists.gnu.org/mailman/listinfo/bug-gnustep