[RFC] Remove pendingLimit from OSyncQueue

by Michael Bell

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I would like to remove pendingLimit from OSyncQueue to fix the deadlock
reported in ticket #1078. The pendingLimit of the OSyncQueue, combined
with messages that are not fully independent, can produce deadlocks.
Today the queue already causes deadlocks.

What happens?

The OSyncQueue stops dispatching once pendingLimit messages are
waiting for an answer. This is correct IPC behaviour if the messages are
independent, but in OpenSync they are not.

Some protocols (e.g. SyncML) can only flush once for an object type
(SyncML datastore). This means changes are collected and sent only when
the maximum message size of the protocol is reached or all OpenSync
messages are present. So the only guaranteed send operation happens when
the OpenSync message committed_all is handled.

A first workaround was to commit all changes immediately and abort the
complete synchronization if an error happens by signalling the error to
the committed_all context. This does not work because the mapping (of
SyncML) is potentially sent after the changes are received. OpenSync can
handle mappings only for changes that are not yet committed.

So the choice is simply either to add a new mechanism for mapping IDs or
to remove the pendingLimit, which creates a deadlock between two
originally independent queues.
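
The deadlock described above can be reproduced with a small model. This is only a sketch under stated assumptions, not OpenSync code: `BatchingPlugin` and `dispatch` are hypothetical names, and the plugin is assumed to answer nothing until committed_all arrives (the maximum-message-size flush is left out for brevity).

```python
from collections import deque


class BatchingPlugin:
    """Hypothetical SyncML-like plugin: buffers commits, answers on committed_all."""

    def __init__(self):
        self.buffered = []

    def handle(self, message):
        """Return the list of messages that get a reply as a result of this one."""
        if message == "committed_all":
            replies = self.buffered + [message]
            self.buffered = []
            return replies          # the batch is flushed and everything is answered
        self.buffered.append(message)
        return []                   # commit is only collected, no reply yet


def dispatch(messages, plugin, pending_limit=None):
    """Dispatch in order, refusing to dispatch while pending_limit replies are open.

    Returns (answered, stuck); stuck holds the messages that were never dispatched.
    """
    outgoing = deque(messages)
    pending = set()
    answered = []
    while outgoing:
        if pending_limit is not None and len(pending) >= pending_limit:
            break                   # nothing in flight can ever be answered: deadlock
        msg = outgoing.popleft()
        pending.add(msg)
        for reply in plugin.handle(msg):
            pending.discard(reply)
            answered.append(reply)
    return answered, list(outgoing)


msgs = [f"commit_{i}" for i in range(5)] + ["committed_all"]
answered, stuck = dispatch(msgs, BatchingPlugin(), pending_limit=3)
print("answered:", answered)
print("stuck:", stuck)
```

With a limit of 3, committed_all never leaves the queue, so the three commits in flight are never answered. Calling `dispatch` with `pending_limit=None` lets every message through and all of them get a reply.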

Best regards

Michael
- --
___________________________________________________________________

Michael Bell                        Humboldt-Universitaet zu Berlin

Tel.: +49 (0)30-2093 2482           ZE Computer- und Medienservice
Fax:  +49 (0)30-2093 2704           Unter den Linden 6
michael.bell@...       D-10099 Berlin
___________________________________________________________________

PGP Fingerprint: 09E4 3D29 4156 2774 0F2C  C643 D8BD 1918 2030 5AAB
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ3bGV2L0ZGCAwWqsRAnrFAKCA56uxFW4qLJCKvC9lbSpQdFI7fwCggz0R
QcKCjxg+huA1bn9yDj2Njf4=
=Q2wY
-----END PGP SIGNATURE-----

------------------------------------------------------------------------------
This SF.net email is sponsored by:
High Quality Requirements in a Collaborative Environment.
Download a free trial of Rational Requirements Composer Now!
http://p.sf.net/sfu/www-ibm-com
_______________________________________________
Opensync-devel mailing list
Opensync-devel@...
https://lists.sourceforge.net/lists/listinfo/opensync-devel

Re: [RFC] Remove pendingLimit from OSyncQueue

by Henrik /KaarPoSoft

Hi,

I believe the OpenSync API must support the way the SyncML protocol works.

This means:

1) The pendingLimit should be removed.
As Michael writes, the SyncML protocol groups updates, so there may be
any number of pending updates before a commit can be completed.
(Maybe the plugin could define the pendingLimit itself, including
setting it to unlimited?)

2) The mapping issue needs to be handled.
The SyncML protocol defines that the mappings are sent after all
changes, so the OpenSync API needs to be augmented with a way to handle
this.

/Henrik


Re: [RFC] Remove pendingLimit from OSyncQueue

by Daniel Gollub-3

On Thursday 09 April 2009 10:28:05 am Michael Bell wrote:

> Hi,
>
> I would like to remove pendingLimit from OSyncQueue to fix the deadlock
> reported in ticket #1078. The pendingLimit of the OSyncQueue, combined
> with messages that are not fully independent, can produce deadlocks.
> Today the queue already causes deadlocks.
>
> What happens?
>
> The OSyncQueue stops dispatching once pendingLimit messages are
> waiting for an answer. This is correct IPC behaviour if the messages are
> independent, but in OpenSync they are not.
>
> Some protocols (e.g. SyncML) can only flush once for an object type
> (SyncML datastore). This means changes are collected and sent only when
> the maximum message size of the protocol is reached or all OpenSync
> messages are present. So the only guaranteed send operation happens when
> the OpenSync message committed_all is handled.
>
> A first workaround was to commit all changes immediately and abort the
> complete synchronization if an error happens by signalling the error to
> the committed_all context. This does not work because the mapping (of
> SyncML) is potentially sent after the changes are received. OpenSync can
> handle mappings only for changes that are not yet committed.

That's wrong. See my mail to your batch_commit RFC thread.

> So the choice is simply either to add a new mechanism for mapping IDs or
> to remove the pendingLimit, which creates a deadlock between two
> originally independent queues.

I don't see the need for a new mapping ID mechanism. We just need to adapt when
the UID can be updated by the plugin. Right now this can only happen in the
"commit (change) context" - which I thought would be perfectly fine.

If you write an entry in SyncML - when do you get the peer's mapping ID for
the entry you just changed?

I wonder if it's really impossible to update the change UID before replying to
the "commit (change) context". (Independent of the pendingLimit issue!)

Maybe it would help to introduce a new osync_context_ interface instead.

We change the definition of osync_context_report_success() within the commit
context. Instead of (additionally) "updating" the UID of a change within
this commit context, this function just ACKs that the commit got
handled/scheduled within the plugin. Internally this would just free up the
pending queue - no commit report to the engine, and no write event signalled
to the OpenSync frontend.

An additional osync_context_ interface could later report the changed UID
after the write. For that, the plugin would need to take a reference on the
OSyncContext* and unref it once the change or error has been reported through
an osync_context_ interface. We could actually reuse
osync_context_update_change() here, which is also used in get_changes().

A plugin which uses the commit sink function in an async way must register a
committed_all() sink function to signal when all changes have finally been
committed.

It's similar to your osync_mapping_alter_interface() proposal, with the
difference that there would be only internal changes - no public changes.
Actually my long-term goal is that plugins only use the osync_context_
interface to report changes to the engine ;)
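
The two-step reporting proposed above could look roughly like this. It is a sketch only: `Engine`, `AsyncPlugin` and their methods are hypothetical stand-ins for the engine side and the osync_context_ calls, not the real OpenSync API, and the ref/unref of the context is not modelled.

```python
class Engine:
    """Hypothetical engine side: tracks open replies and the UIDs reported later."""

    def __init__(self, pending_limit):
        self.pending_limit = pending_limit
        self.pending = set()        # commits dispatched but not yet ACKed
        self.uids = {}              # change id -> UID reported after the write
        self.all_committed = False

    def dispatch_commit(self, change_id, plugin):
        if len(self.pending) >= self.pending_limit:
            raise RuntimeError("pendingLimit reached, queue would stall")
        self.pending.add(change_id)
        plugin.commit(change_id, self)

    def ack(self, change_id):
        """Step 1: the ACK only frees the pending slot; no write is reported."""
        self.pending.discard(change_id)

    def update_change(self, change_id, uid):
        """Step 2: reported later, once the peer has actually written the entry."""
        self.uids[change_id] = uid

    def committed_all_done(self):
        self.all_committed = True


class AsyncPlugin:
    """Hypothetical async plugin: schedules commits, writes them on committed_all."""

    def __init__(self):
        self.scheduled = []

    def commit(self, change_id, engine):
        self.scheduled.append(change_id)
        engine.ack(change_id)       # scheduled, not written yet

    def committed_all(self, engine):
        for n, change_id in enumerate(self.scheduled):
            engine.update_change(change_id, f"peer-uid-{n}")
        self.scheduled.clear()
        engine.committed_all_done()


engine = Engine(pending_limit=2)
plugin = AsyncPlugin()
for change_id in ["a", "b", "c", "d"]:
    engine.dispatch_commit(change_id, plugin)   # never stalls, each commit is ACKed
plugin.committed_all(engine)
print(engine.uids)
```

Because every commit is ACKed as soon as it is scheduled, four commits pass through a limit of two without stalling, and the UIDs only reach the engine in the committed_all step.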

Would this solve the pendingLimit issue?

Best Regards,
Daniel

--
Daniel Gollub                        Geschaeftsfuehrer: Ralph Dehner
FOSS Developer                       Unternehmenssitz:  Vohburg
B1 Systems GmbH                      Amtsgericht:       Ingolstadt
Mobil: +49-(0)-160 47 73 970         Handelsregister:   HRB 3537
EMail: gollub@...          http://www.b1-systems.de

Adresse: B1 Systems GmbH, Osterfeldstraße 7, 85088 Vohburg
http://pgpkeys.pca.dfn.de/pks/lookup?op=get&search=0xED14B95C2F8CA78D

Re: [RFC] Remove pendingLimit from OSyncQueue

by Michael Bell

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Daniel Gollub wrote:

> On Thursday 09 April 2009 10:28:05 am Michael Bell wrote:
>>
>> So the choice is simply either to add a new mechanism for mapping IDs or
>> to remove the pendingLimit, which creates a deadlock between two
>> originally independent queues.
>
> I don't see the need for a new mapping ID mechanism. We just need to adapt when
> the UID can be updated by the plugin. Right now this can only happen in the
> "commit (change) context" - which I thought would be perfectly fine.
>
> If you write an entry in SyncML - when do you get the peer's mapping ID for
> the entry you just changed?

Potentially after all changes are available.

> I wonder if it's really impossible to update the change UID before replying to
> the "commit (change) context". (Independent of the pendingLimit issue!)
>
> Maybe it would help to introduce a new osync_context_ interface instead.
>
> We change the definition of osync_context_report_success() within the commit
> context. Instead of (additionally) "updating" the UID of a change within
> this commit context, this function just ACKs that the commit got
> handled/scheduled within the plugin. Internally this would just free up the
> pending queue - no commit report to the engine, and no write event signalled
> to the OpenSync frontend.

This would work but it is a hack.

> An additional osync_context_ interface could later report the changed UID
> after the write. For that, the plugin would need to take a reference on the
> OSyncContext* and unref it once the change or error has been reported through
> an osync_context_ interface. We could actually reuse
> osync_context_update_change() here, which is also used in get_changes().

If the former call only affects the pendingLimit, then we don't need
additional interfaces, because the change has not been written yet. This
means we can still use osync_change_set_uid.

> A plugin which uses the commit sink function in an async way must register a
> committed_all() sink function to signal when all changes have finally been
> committed.

No problem, this is what the SyncML plugin does.

> Would this solve the pendingLimit issue?

I think so. I'm just not sure about introducing an API function to hack
the IPC stuff. Perhaps we should use a function name like
osync_context_set_async. The problem is that the IPC stuff is already
asynchronous and we just make it really async. Are there any other
really asynchronous plugins?

I don't think that a dependency between two pipes is a good idea but I
understand that IPC has limits. I also used IBM MQ series in the past
which has nothing to do with IPC. So is our message queue implementation
a real IPC implementation which requires limits?

Best regards

Michael
- --
___________________________________________________________________

Michael Bell                        Humboldt-Universitaet zu Berlin

Tel.: +49 (0)30-2093 2482           ZE Computer- und Medienservice
Fax:  +49 (0)30-2093 2704           Unter den Linden 6
michael.bell@...       D-10099 Berlin
___________________________________________________________________

PGP Fingerprint: 09E4 3D29 4156 2774 0F2C  C643 D8BD 1918 2030 5AAB
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ5IC52L0ZGCAwWqsRAiB9AKCrD9WkHLA6ckFxbBi01JEWOxsgAACgo8Ib
nvRJ5J2Rqw/EJeJADuvXdVs=
=qWzv
-----END PGP SIGNATURE-----

Re: [RFC] Remove pendingLimit from OSyncQueue

by Daniel Gollub-3

On Tuesday 14 April 2009 02:25:29 pm Michael Bell wrote:

> Daniel Gollub wrote:
> > On Thursday 09 April 2009 10:28:05 am Michael Bell wrote:
> >> So the choice is simply either to add a new mechanism for mapping IDs or
> >> to remove the pendingLimit, which creates a deadlock between two
> >> originally independent queues.
> >
> > I don't see the need for a new mapping ID mechanism. We just need to adapt
> > when the UID can be updated by the plugin. Right now this can only
> > happen in the "commit (change) context" - which I thought would be
> > perfectly fine.
> >
> > If you write an entry in SyncML - when do you get the peer's mapping ID
> > for the entry you just changed?
>
> Potentially after all changes are available.
>
> > I wonder if it's really impossible to update the change UID before
> > replying to the "commit (change) context". (Independent of the
> > pendingLimit issue!)
> >
> > Maybe it would help to introduce a new osync_context_ interface instead.
> >
> > We change the definition of osync_context_report_success() within the
> > commit context. Instead of (additionally) "updating" the UID of a change
> > within this commit context, this function just ACKs that the commit
> > got handled/scheduled within the plugin. Internally this would just free
> > up the pending queue - no commit report to the engine, and no write event
> > signalled to the OpenSync frontend.
>
> This would work but it is a hack.

OK - this "hack" already exists in the get_changes() context with
osync_context_report_change() (though also for other reasons, such as mixed
objtype syncing).

>
> > An additional osync_context_ interface could later report the changed UID
> > after the write. For that, the plugin would need to take a reference on
> > the OSyncContext* and unref it once the change or error has been reported
> > through an osync_context_ interface. We could actually reuse
> > osync_context_update_change() here, which is also used in get_changes().
>
> If the former call only affects the pendingLimit, then we don't need
> additional interfaces, because the change has not been written yet. This
> means we can still use osync_change_set_uid.

No - the plugin process space and the engine process space are different, so
we still need to handle modifications of an OSyncChange through some context
interface. The reason is that the OSyncChange pointer you have in the plugin
is not the same pointer you have in the engine.

So if you don't change or add a context interface, then after the
osync_context_report_success() call there would be no way to change the UID.

For that reason we need one context call to ACK the message - to remove the
message from the pending reply list - and one context interface which reports
the (UID) change after the entry has been written.
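
The pointer argument can be illustrated with a toy model. This is a sketch under assumptions: a change is modelled as a plain dict, and `send_over_ipc` is a hypothetical stand-in that uses a JSON round-trip in place of OpenSync's own marshalling.

```python
import json


def send_over_ipc(change):
    """Hypothetical IPC hop: the receiver gets a copy, never the sender's object."""
    return json.loads(json.dumps(change))


engine_change = {"uid": "local-1"}
plugin_change = send_over_ipc(engine_change)    # what the plugin process sees

# Setting the UID on the plugin-side object does not reach the engine ...
plugin_change["uid"] = "peer-42"
print(engine_change["uid"])

# ... so the new UID has to be sent back explicitly through some interface.
reported = send_over_ipc({"uid": plugin_change["uid"]})
engine_change["uid"] = reported["uid"]
print(engine_change["uid"])
```

The first print still shows the old UID. Only the explicit report back to the engine changes it, which is the role the additional context interface would play.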

>
> > A plugin which uses the commit sink function in an async way must register
> > a committed_all() sink function to signal when all changes have finally
> > been committed.
>
> No problem, this is what the SyncML plugin does.
>
> > Would this solve the pendingLimit issue?
>
> I think so. I'm just not sure about introducing an API function to hack
> the IPC stuff.

It's not a hack - we do a similar thing inside the get_changes() context. The
difference is that get_changes() only gets called once.

> Perhaps we should use a function name like
> osync_context_set_async. The problem is that the IPC stuff is already
> asynchronous and we just make it really async. Are there any other
> really asynchronous plugins?

Don't know any - maybe the qtopia-sync one. But we should introduce some
example async plugin.

>
> I don't think that a dependency between two pipes is a good idea but I
> understand that IPC has limits. I also used IBM MQ series in the past
> which has nothing to do with IPC. So is our message queue implementation
> a real IPC implementation which requires limits?

No idea - maybe Graham can answer this.

With the new osync_context_ interface there would be no dependency - afaik.
The commit function would just become async.


Best Regards,
Daniel

--
Daniel Gollub                        Geschaeftsfuehrer: Ralph Dehner
FOSS Developer                       Unternehmenssitz:  Vohburg
B1 Systems GmbH                      Amtsgericht:       Ingolstadt
Mobil: +49-(0)-160 47 73 970         Handelsregister:   HRB 3537
EMail: gollub@...          http://www.b1-systems.de

Adresse: B1 Systems GmbH, Osterfeldstraße 7, 85088 Vohburg
http://pgpkeys.pca.dfn.de/pks/lookup?op=get&search=0xED14B95C2F8CA78D

Re: [RFC] Remove pendingLimit from OSyncQueue

by Graham Cobb-4

On Tuesday 14 April 2009 13:39:43 Daniel Gollub wrote:
> On Tuesday 14 April 2009 02:25:29 pm Michael Bell wrote:
> > I don't think that a dependency between two pipes is a good idea but I
> > understand that IPC has limits. I also used IBM MQ series in the past
> > which has nothing to do with IPC. So is our message queue implementation
> > a real IPC implementation which requires limits?
>
> No idea - maybe Graham can answer this.

Sorry about the delay -- I have been away and have not had a chance to spend
any time on this until today.

I understand the problem, and the current timeout/limit mechanism definitely
deadlocks with the way the async plugins work today (I am ignoring any
changes suggested in this thread as I am not sure I understand exactly what
has been proposed).

The pendingLimit is there to allow timeouts to work properly.  If the
pendingLimit is just removed, the timeouts will break as they did before (the
timeout starts counting at the wrong time and so if there are a large number
of transactions queued up the timeout fires too early).  But let's review
what timeouts are for and how we would **like** them to behave.

As I understand it, the main purpose of the timeouts is to deal with cases
where the remote device (or some intermediary) has got stuck and is no longer
proceeding with transactions (but not returning errors).  It also helps with
cases where the plugin tries to send a message but does not notice that there
is an error (e.g. a socket has been disconnected) and it will never get a
reply.  This is, of course, a plugin bug but it is useful that the timeout
mechanism also protects against that problem.

There is a secondary use for timeouts and that is to protect against problems
in the IPC mechanism itself -- e.g. a process has stopped and is no longer
reading the pipe.  This is a smaller consideration and can be handled by
mechanisms within the IPC itself if necessary, so let's ignore it for now.

There seem to be three plugin architectures which are relevant (I thought,
when I was rewriting the timeout code, that there were only the first two but
I now realise there is a third):

1) Synchronous plugin (most plugins are like this, I believe): when a
transaction is received by the plugin (e.g. Connect or Get Changes or Commit)
it does synchronous writes to send messages to the device and synchronous
reads to get messages from the device.  If the device stops responding, the
thread executing the plugin will just wait.  No other plugin messages will be
handled while it is waiting.

2) Asynchronous but transaction-at-a-time plugin (maybe there are none like
this): when the transaction is received, the plugin sends the message to the
device and then returns.  The thread polls the socket and resumes when the
response is received. However, other plugin messages can be handled while it
is waiting -- so further updates will cause further messages to be sent to
the device.  If the device stops responding, the engine will keep sending
updates which the plugin will send, although it is not seeing any responses.

3) Asynchronous, multiple transaction plugin (like SyncML): when the
transaction is received, it is stored internally to the plugin.  Nothing is
sent until a message fills up or the last transaction is received. Then all
updates are sent and, when responses are received, the updates are completed.  
If the device stops responding then all or some updates will not receive a
response.

For 1 and 2, the timeout is protecting each single commit and the value should
be set based on the time needed for that transaction.  In the case of 2, this
means the pendingLimit is needed -- it limits the number of updates that
might already be queued ahead of this one and so allows the timeout value to
be calculated (i.e. pendingLimit * maximum time for one update).
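
As a worked example of that calculation (a sketch only: the function names and the numbers are illustrative assumptions, not values taken from OpenSync):

```python
def commit_timeout(pending_limit, max_seconds_per_update):
    """Timeout for one commit: pendingLimit * maximum time for one update."""
    if pending_limit < 1:
        raise ValueError("pending_limit must be at least 1")
    return pending_limit * max_seconds_per_update


def worst_case_completion(queued_ahead, max_seconds_per_update):
    """Time until an update completes if everything queued ahead runs at the maximum."""
    return (queued_ahead + 1) * max_seconds_per_update


limit, per_update = 10, 30.0        # assumed: 10 pending updates, 30 s each at most
print(commit_timeout(limit, per_update))

# With the limit in place, at most limit - 1 updates are queued ahead of this one.
print(worst_case_completion(limit - 1, per_update) <= commit_timeout(limit, per_update))

# Without a limit, 99 updates might be queued ahead and the same timeout is too short.
print(worst_case_completion(99, per_update) <= commit_timeout(limit, per_update))
```

The second comparison holds because the limit bounds the queue depth. The third fails, which is the "timeout fires too early" case that removing the pendingLimit would bring back.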

For 3, however, it is much harder.  One option would be to set the timeouts
for each commit (and the commit_all) based on the time the device needs to
complete a maximum sized message of updates.  On the other hand, that doesn't
allow for the fact that the OpenSync engine itself might take some time (in
complex cases) to even provide enough updates to fill a message: timeouts
were not intended to have to take into account OpenSync engine processing.

Another option for 3 is to set no timeouts at all on the individual commits,
but set a timeout on the commit_all.  The problem with that is that the
timeout value for the commit_all is potentially unlimited (it is not limited
to a single message of commits as the commit_all will start as soon as the
commits have all been queued and hundreds of messages may have been sent to
the device).

On the other hand, the plugin itself knows what is going on.  So, I think the
best option for case 3 is for the plugin itself to control the timeouts.  I
suggest that in this case, there are no timeouts on the commit or commit_all,
but that the plugin itself sets a timeout when it has assembled a message and
sent it to the device.  I.e. add some sort of OSyncStartTimeout(int timeout)
and OSyncStopTimeout() calls.  The plugin would start the timeout when the
first message was sent to the device.  Whenever a response is received, the
timeout would be stopped and, if there were one or more messages still
waiting for responses, it would be started again.  If the timeout actually
fires, then the OSyncQueue code completes all pending operations with a
timeout error (just as in the existing timeout processing).
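
The start/stop behaviour suggested above can be modelled with an explicit clock. This is a sketch, not a proposed API: `PluginTimeout` and its methods are hypothetical, time is passed in as a plain number, and completing the pending operations with an error is reduced to a flag.

```python
class PluginTimeout:
    """Hypothetical model of a plugin-controlled timeout with start/stop calls."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = None        # None means no timeout is running
        self.outstanding = 0        # messages sent to the device without a response
        self.timed_out = False

    def start(self, now):
        self.deadline = now + self.timeout

    def stop(self):
        self.deadline = None

    def message_sent(self, now):
        self.outstanding += 1
        if self.deadline is None:   # only the first outstanding message starts it
            self.start(now)

    def response_received(self, now):
        if self.outstanding == 0:
            raise ValueError("response without an outstanding message")
        self.outstanding -= 1
        self.stop()
        if self.outstanding > 0:    # more responses expected: start counting again
            self.start(now)

    def tick(self, now):
        """Return True once the timeout has fired (pending operations would fail)."""
        if self.deadline is not None and now >= self.deadline:
            self.timed_out = True
            self.stop()
        return self.timed_out


timer = PluginTimeout(timeout=10)
timer.message_sent(now=0)
timer.message_sent(now=2)
timer.response_received(now=8)      # one message still open, deadline moves to 18
print(timer.tick(now=12))
timer.response_received(now=15)     # nothing open any more, timeout stopped
print(timer.deadline)
```

In this model the timeout measures only the time the device takes to answer, so neither engine processing time nor the number of queued commits enters the calculation.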

This does mean that plugins using this third architecture have to carry
some extra complexity.  But I can't see an alternative, if we want to
keep the timeout protection.  Of course, we could decide that, for this
release of OpenSync, we disable timeout processing altogether -- and add
it back in later (with additions to the API at that time).

Does anyone have an alternative suggestion?  If not, I will spec up a
suggested API for the timeout operations.

Graham

