Cluster Events update

Cluster Events update

by Taylor Gautier

For the upcoming release of Terracotta, we are considering changing the eventing mechanism.  So far, our design discussions have identified some core use cases, and we think we have identified the general strategy.  I'd love to hear any comments from our users on this new direction.

============ DRAFT REQUIREMENTS/DESIGN FOR CLUSTER EVENTS ===================
MOTIVATION 
=================================================== 
1) The current eventing mechanism has race conditions 

2) The implementation is slightly confusing - a disconnected event reaches a node on two occasions:
  a) if the node itself is disconnected, it gets a this.disconnected event. This event can never have perfect knowledge, and the node may reconnect at some time.
  b) if another node is quarantined (never to rejoin), it gets a nodeX disconnected event.

Since these two events are named similarly, they overload the meaning of "disconnected". In the first case the connection has been severed, but may be restored, while the second is an absolute measure of the membership of the cluster - the server sent the message so it is definitive. 

USE CASES 
=================================================== 
1) Clients need to change behavior when TC is no longer able to service operations - e.g. kill themselves

2) Map evictor use case 
  - needs to know when a node has left the system. 
  - needs to query the system to know from a list of objects what objects are not faulted into any client (it is accepted that this query is async and the response is guaranteed to be out of date) 

3) Clustered async use case 
  - needs to know if a node has left the system. 
  - needs to query the system to know from a list of objects which ones are "orphaned" - e.g. are no longer accepted (this may be identical to 2a) 

4) Master/Worker 
  - needs to know when a node has joined the system to re-balance work across all nodes (although this can easily be coded with wait/notify) 
  - needs to know when a node (and which node) has left the system to re-balance work from that node across remaining nodes 

5) Location Aware Cache 
  - needs to execute work on the node where an object is faulted
  - needs to query the system about where an object, or a list of objects, is faulted

NOTE: 
The use case of switching to local data could be construed as #1, but it is much more involved. So while someone could use the solution for Use Case #1 for it, we aren't specifically targeting that capability.

SUGGESTED SOLUTIONS 
=================================================== 
Roughly the following "things" seem to solve the use cases: 

1) Topology Change Events 
  - node joined ? - no "real" use case for it - regular code techniques can be used 
  - node left 

2) Cluster Operational Events 
  - tc operations are enabled 
  - tc operations are disabled 

3) Data Aware Information 
  - a list of nodes where an object is faulted 
  - a list of lists of nodes where a list of objects is faulted
  - out of this list of objects, which ones are not faulted anywhere 
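
As a rough illustration only, the three groups above could map onto a listener interface plus a query interface along these lines. Every name here (ClusterListener, DataLocality, InMemoryLocality, the String node ids) is hypothetical and is not an existing Terracotta API; the in-memory class exists solely to make the sketch runnable and keys objects by equals(), which a real implementation would probably not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical listener for groups 1 and 2 (not a real Terracotta interface).
interface ClusterListener {
    void nodeJoined(String nodeId);   // topology change; may turn out to be unnecessary
    void nodeLeft(String nodeId);     // topology change; definitive, reported by the server
    void operationsEnabled();         // TC can service operations for this node again
    void operationsDisabled();        // TC cannot service operations; may be temporary
}

// Hypothetical query interface for group 3. Answers may already be out of date.
interface DataLocality {
    Set<String> nodesWithObject(Object object);
    Map<Object, Set<String>> nodesWithObjects(List<?> objects);
    List<Object> notFaultedAnywhere(List<?> objects);
}

// In-memory stand-in so the sketch runs; a real cluster would ask the server.
class InMemoryLocality implements DataLocality {
    private final Map<Object, Set<String>> faulted = new HashMap<>();

    // Records that the given object is faulted into the given node.
    void fault(Object object, String nodeId) {
        faulted.computeIfAbsent(object, k -> new HashSet<>()).add(nodeId);
    }

    @Override
    public Set<String> nodesWithObject(Object object) {
        Set<String> nodes = faulted.get(object);
        return nodes == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(nodes);
    }

    @Override
    public Map<Object, Set<String>> nodesWithObjects(List<?> objects) {
        Map<Object, Set<String>> result = new LinkedHashMap<>();
        for (Object o : objects) {
            result.put(o, nodesWithObject(o));
        }
        return result;
    }

    @Override
    public List<Object> notFaultedAnywhere(List<?> objects) {
        List<Object> orphans = new ArrayList<>();
        for (Object o : objects) {
            if (nodesWithObject(o).isEmpty()) {
                orphans.add(o);
            }
        }
        return orphans;
    }
}

class LocalityDemo {
    public static void main(String[] args) {
        InMemoryLocality locality = new InMemoryLocality();
        locality.fault("order-1", "node-A");
        locality.fault("order-1", "node-B");
        locality.fault("order-2", "node-B");

        List<String> keys = Arrays.asList("order-1", "order-2", "order-3");
        System.out.println(locality.nodesWithObjects(keys));
        System.out.println(locality.notFaultedAnywhere(keys)); // prints [order-3]
    }
}
```

Keeping the "operations enabled/disabled" callbacks separate from "node left" mirrors the motivation above: the first pair describes this node's own connection, while "node left" is a statement about cluster membership.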


============ DRAFT IMPLEMENTATION THOUGHTS ===================

At today's meeting we decided that the use cases, events and data aware operations are sufficient and appropriate.

We also decided to focus on a non-JMX API, since JMX has several drawbacks:
* it artificially introduces a whole infrastructure that is not needed and that is leaky
* it makes it more difficult for the developer to integrate
* it is not a compile-time API, which means less safety

We're going to design a POJO API with compile-time safety that is based on dependency injection. The first idea is to annotate a field as being a 'DSOClusterUtil' (or whatever name). This would then cause DSO to inject a local instance of that utility class, providing developers with API methods to perform listener registration and data locality inspection.
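
To make the injection idea concrete, here is a minimal sketch of how such a field annotation might look from the application's side. Only the name 'DSOClusterUtil' comes from the text above; the annotation @InjectClusterUtil, the NodeLeftListener interface, the LocalClusterUtil stand-in and the reflection-based Injector are all assumptions made up for this example, and the Injector merely simulates what DSO would do at load time.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation for the field to be injected.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface InjectClusterUtil {}

// Hypothetical typed listener; a compile-time contract instead of JMX notifications.
interface NodeLeftListener {
    void nodeLeft(String nodeId);
}

// Hypothetical utility interface; only the name comes from the proposal.
interface DSOClusterUtil {
    void addNodeLeftListener(NodeLeftListener listener);
}

// Local stand-in that lets the example fire events by hand.
class LocalClusterUtil implements DSOClusterUtil {
    private final List<NodeLeftListener> listeners = new ArrayList<>();

    @Override
    public void addNodeLeftListener(NodeLeftListener listener) {
        listeners.add(listener);
    }

    void fireNodeLeft(String nodeId) {
        for (NodeLeftListener listener : listeners) {
            listener.nodeLeft(nodeId);
        }
    }
}

// Simulates the injection step with plain reflection.
class Injector {
    static void inject(Object target, DSOClusterUtil util) {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(InjectClusterUtil.class)
                    && field.getType().isAssignableFrom(util.getClass())) {
                try {
                    field.setAccessible(true);
                    field.set(target, util);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot inject " + field, e);
                }
            }
        }
    }
}

// Application POJO: declares the dependency, then registers a listener.
class WorkBalancer {
    @InjectClusterUtil
    private DSOClusterUtil cluster;

    final List<String> departed = new ArrayList<>();

    void start() {
        cluster.addNodeLeftListener(departed::add);
    }
}

class InjectionDemo {
    public static void main(String[] args) {
        LocalClusterUtil util = new LocalClusterUtil();
        WorkBalancer balancer = new WorkBalancer();
        Injector.inject(balancer, util);
        balancer.start();
        util.fireNodeLeft("node-7");
        System.out.println(balancer.departed); // prints [node-7]
    }
}
```

The application class never looks anything up by name: a misspelled listener method or a wrong field type fails at compile time, which is the safety argument made against JMX above.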


_______________________________________________
tc-users mailing list
tc-users@...
http://lists.terracotta.org/mailman/listinfo/tc-users

Re: Cluster Events update

by Kunal Bhasin

I would also recommend adding notifications for the L2 processes starting and stopping. At least broadcasting the status of the L2 based on L2 group comm would be a good start. 

Sending the IP/hostname of the machine along with the client ID would also be useful.

Kunal
--Sent while mobile--

On Jan 16, 2009, at 9:04 AM, Taylor Gautier <tgautier@...> wrote:

For the upcoming release of Terracotta, we are considering changing the eventing mechanism.  So far, our design discussions have identified some core use cases, and we think we have identified the general strategy.  I'd love to hear any comments from our users on this new direction.

Re: [tc-forge-dev] Cluster Events update

by sharrissf

For which use case?


On Jan 16, 2009, at 10:57 AM, kbhasin@... wrote:

I would also recommend adding notifications for the L2 processes starting and stopping. At least broadcasting the status of the L2 based on L2 group comm would be a good start.

Re: [tc-forge-dev] Cluster Events update

by Kunal Bhasin

This would be generic monitoring and would benefit all use cases to ease monitoring in production, IMHO.

Kunal.

From: Steven Harris <steve@...>
Date: Fri, 16 Jan 2009 11:00:37 -0800
To: Kunal Bhasin <kbhasin@...>
Cc: Untitled 5 <tc-users@...>, tc-forge-dev <tc-forge-dev@...>, tc-dev <tc-dev@...>
Subject: Re: [tc-forge-dev] [tc-users] Cluster Events update

For which usecase?

 
Cheers,
Steve Harris
"Terracotta.  It's ten pounds of awesome in a five pound sack. <http://www.miketec.org/serendipity/index.php?/archives/7-Oracle-and-Postgres-Redux.html>"

Re: [tc-forge-dev] Cluster Events update

by Sergio Bossa

On Fri, Jan 16, 2009 at 6:04 PM, Taylor Gautier
<tgautier@...> wrote:

> 4) Master/Worker
>   - needs to know when a node has joined the system to re-balance work
> across all nodes (although this can easily be coded with wait/notify)
>   - needs to know when a node (and which node) has left the system to
> re-balance work from that node across remaining nodes

I'd like to add some thoughts about the Master/Worker use case...
Having notifications for when a node joins and leaves the cluster is
not enough: what if the node joins, but the worker process is not
started? Or if it starts both a worker and a master process? How can
we know whether the joining/leaving node is a master or a worker?
What I mean is that we need to know both when a node joins/leaves,
and when something interesting happens in that node.
That is, it would be great to be able to send and receive
notifications of user-defined events during the node lifecycle, e.g.
sending a user-defined event when a node leaves the cluster.
What do you think?

> We also decided to focus on a non JMX API for several reasons:
> * artificially introduces a whole infrastructure that is not needed and that
> is leaky
> * makes it more difficult for the developer to integrate
> * is not a compile-time API, which means less safety
>
> We're going to design a POJO API with compile-time safety that is based on
> dependency injection.

Absolutely agree.
Let us know as your POJO APIs take shape, so that we can help with
our feedback.

Cheers,

Sergio B.

--
Sergio Bossa
Software Passionate, Java Technologies Specialist and Open Source Enthusiast.
Blog : http://sbtourist.blogspot.com
Sourcesense - making sense of Open Source : http://www.sourcesense.com
Pro-netics s.p.a. : http://www.pronetics.it

Re: [tc-dev] [tc-forge-dev] Cluster Events update

by Geert Bevin-2

Hi Sergio,

>
> I'd like to add some thoughts about the Master/Worker use case...
> Having notification for when a node joins and leaves the cluster is
> not enough: what if the node joins, but the worker process is not
> started? Or if it starts both a worker and a master process? How we
> may know if the joining/leaving node is a master or worker?
> What I mean to say is that we need to know both when a node
> joins/leaves, and when something interesting happens in that node.

Isn't that something that you can handle by keeping track of the
different states yourself? It seems to me that with the events and node
identifiers, you're able to deduce what you need to know. We were also
thinking of always sending the entire cluster topology data
along with each event. This would make it easy to know the cluster
state at the moment that the event was sent, without the risk of basing
logic on an outdated cluster state.
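
A small sketch of what "topology data along with each event" could mean in practice: each event carries an immutable copy of the membership as it was when the event was created. The names TopologyEvent and TopologyTracker and the String node ids are invented for this example and do not describe an actual Terracotta class.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical event type: the change itself plus a frozen view of the cluster.
final class TopologyEvent {
    enum Kind { NODE_JOINED, NODE_LEFT }

    final Kind kind;
    final String nodeId;
    final Set<String> topology; // membership right after this change, never updated later

    TopologyEvent(Kind kind, String nodeId, Set<String> members) {
        this.kind = kind;
        this.nodeId = nodeId;
        // Defensive copy: later changes to the live set must not leak into this event.
        this.topology = Collections.unmodifiableSet(new LinkedHashSet<>(members));
    }
}

// Stand-in for whatever component owns the live membership list.
class TopologyTracker {
    private final Set<String> members = new LinkedHashSet<>();

    synchronized TopologyEvent join(String nodeId) {
        members.add(nodeId);
        return new TopologyEvent(TopologyEvent.Kind.NODE_JOINED, nodeId, members);
    }

    synchronized TopologyEvent leave(String nodeId) {
        members.remove(nodeId);
        return new TopologyEvent(TopologyEvent.Kind.NODE_LEFT, nodeId, members);
    }
}

class TopologyDemo {
    public static void main(String[] args) {
        TopologyTracker tracker = new TopologyTracker();
        tracker.join("node-A");
        TopologyEvent joined = tracker.join("node-B");
        TopologyEvent left = tracker.leave("node-A");

        // The earlier event still shows the state it was sent with.
        System.out.println(joined.topology); // prints [node-A, node-B]
        System.out.println(left.topology);   // prints [node-B]
    }
}
```

A listener that processes the first event late still sees the membership that belonged to it, rather than whatever the cluster looks like by the time the handler runs.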

> That is, it would be great to be able to send and receive
> notifications of user defined events during node lifecycle, i.e.:
> sending a user defined event when a node leaves the cluster.
> What do you think?

It might be interesting to design something like that, but I'm
wondering if this should be part of the core cluster events. It seems
to me that this could be a forge project that consumes the cluster
events and sets up a messaging channel for the user events. I
personally prefer to keep the core functionality as focused as
possible, as long as, of course, all other required features can be
implemented as a TIM.

>> We also decided to focus on a non JMX API for several reasons:
>> * artificially introduces a whole infrastructure that is not needed
>> and that is leaky
>> * makes it more difficult for the developer to integrate
>> * is not a compile-time API, which means less safety
>>
>> We're going to design a POJO API with compile-time safety that is
>> based on dependency injection.
>
> Absolutely agree.
> Let us know as your POJO APIs get shaped up, so that we can help with
> our feedback.

Thanks, we'll send out the API suggestion when it starts taking shape.

Take care,

Geert

--
Geert Bevin
Terracotta - http://www.terracotta.org
Uwyn "Use what you need" - http://uwyn.com
RIFE Java application framework - http://rifers.org
Flytecase Band - http://flytecase.be
Music and words - http://gbevin.com

Re: [tc-forge-dev] Cluster Events update

by Geert Bevin-2

We sort of started moving away from monitoring during our latest
discussion about cluster events, since there's already an overlap in
functionality between the JMX API that the admin console uses and the
cluster events that are currently being sent out. We were thinking of
gearing the new cluster events towards the programmer of the
application, i.e. something that is used through a Java API and
annotations from within a DSO application that is part of the cluster.
We can then independently clean up the admin console JMX API for
public consumption and have two clearly distinct approaches without
overlap.

On 16 Jan 2009, at 20:31, Kunal wrote:

> This would be generic monitoring and would benefit all use cases to
> ease monitoring in production, IMHO.

Re: [tc-forge-dev] [tc-dev] Cluster Events update

by Sergio Bossa

Hi Geert,

> Isn't that something that you can handle by keeping track of the
> different states yourself?

Not so easily.
Take the dynamic master/worker implementation as an example: masters
need to know when a worker disconnects from the cluster due to a node
failure. How can they do that without a proper event managed by the TC
runtime?
Right now, given that I recently removed JMX events from the
implementation, I'm using a queue-based keep-alive mechanism, but
using a TC event would make the whole thing easier and probably more
efficient.
And I have to say I can't wait to add the master/worker implementation
based on the new TC events, as soon as they are available ;)
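
For readers unfamiliar with the keep-alive workaround, here is a generic sketch of timeout-based failure detection. It is not the actual master/worker code: the class KeepAliveMonitor, its method names and the explicit timestamps are assumptions chosen to keep the example deterministic, and the shared queue that would carry the heartbeats is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical master-side bookkeeping for worker heartbeats.
class KeepAliveMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new LinkedHashMap<>();

    KeepAliveMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a heartbeat from the given worker is received.
    synchronized void heartbeat(String workerId, long nowMillis) {
        lastSeen.put(workerId, nowMillis);
    }

    // Returns the workers whose last heartbeat is older than the timeout and forgets them.
    synchronized List<String> reapExpired(long nowMillis) {
        List<String> presumedDead = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                presumedDead.add(entry.getKey());
                it.remove();
            }
        }
        return presumedDead;
    }
}

class KeepAliveDemo {
    public static void main(String[] args) {
        KeepAliveMonitor monitor = new KeepAliveMonitor(3000);
        monitor.heartbeat("worker-1", 0);
        monitor.heartbeat("worker-2", 0);
        monitor.heartbeat("worker-1", 2500); // worker-2 stays silent

        System.out.println(monitor.reapExpired(4000)); // prints [worker-2]
        System.out.println(monitor.reapExpired(6000)); // prints [worker-1]
    }
}
```

The weakness is visible in the second call: a timeout can only presume that a worker is gone, and a slow worker is indistinguishable from a dead one. A server-sent "node left" event would replace that guess with a definitive answer.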

> We were also
> thinking of always sending around the entire cluster topology data
> along with each event. This would make it easy to know the cluster
> state at the moment that the event was sent, without risking to base
> logic on an outdated cluster state.

A better approach may be to use some kind of annotation-based
dependency injection: that is, inject into TC-managed objects a
"ClusterTopology" object containing information about the current
cluster state: what do you think?
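
To make the injection idea concrete, here is a minimal sketch of how such a field could be filled in. Everything in it is an assumption made for illustration: the `@InjectTopology` annotation, the `ClusterTopology` class and the `TopologyInjector` helper are hypothetical names, and plain reflection stands in for whatever the TC runtime would actually do.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical marker annotation; not an existing Terracotta annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface InjectTopology {}

// Hypothetical read-only view of the cluster membership.
final class ClusterTopology {
    private final Set<String> nodeIds;

    ClusterTopology(Set<String> nodeIds) {
        // Copy the input so the view cannot change behind the caller's back.
        this.nodeIds = Collections.unmodifiableSet(new TreeSet<>(nodeIds));
    }

    Set<String> nodeIds() {
        return nodeIds;
    }
}

// Application object that asks for the topology through the annotation.
class Master {
    @InjectTopology
    private ClusterTopology topology;

    int knownNodeCount() {
        return topology == null ? 0 : topology.nodeIds().size();
    }
}

// Stand-in for the runtime: fills every annotated ClusterTopology field.
final class TopologyInjector {
    static void inject(Object target, ClusterTopology topology) {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(InjectTopology.class)
                    && field.getType() == ClusterTopology.class) {
                try {
                    field.setAccessible(true);
                    field.set(target, topology);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot inject " + field, e);
                }
            }
        }
    }
}
```

Calling `TopologyInjector.inject(master, topology)` once gives the `Master` a snapshot; keeping that field current as nodes come and go is the part the runtime would have to own.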

>> That is, it would be great to be able to send and receive
>> notifications of user defined events during node lifecycle, i.e.:
>> sending a user defined event when a node leaves the cluster.
>> What do you think?
>
> It might be interesting to design something like that but I'm
> wondering if this should be part of the core cluster events. It seems
> to me that this could be a forge project that consumed the cluster
> events and sets up a messaging channel for the user events. I
> personally prefer to keep the core functionalities as focused as
> possible, as long as of course all other required features can be
> implemented as a TIM.

I agree with you: the smaller the core functionality is, the better.
However, I think that the messaging channel you are talking about
should be managed by the TC runtime: that is, you should be able to
set-up user defined events even for the node lifecycle bounds, i.e.
node connection and disconnection.

> Thanks, we'll send out the API suggestion when it starts taking shape.

Great, I can't wait for it ;)
Cheers,

Sergio B.

--
Sergio Bossa
Software Passionate, Java Technologies Specialist and Open Source Enthusiast.
Blog : http://sbtourist.blogspot.com
Sourcesense - making sense of Open Source : http://www.sourcesense.com
Pro-netics s.p.a. : http://www.pronetics.it

Re: [tc-forge-dev] [tc-dev] Cluster Events update

by Geert Bevin-2

Hi Sergio,

Thanks for your comments.

>> Isn't that something that you can handle by keeping track of the
>> different states yourself?
>
> Not so easily.
> Take the dynamic master/worker implementation as an example: masters
> need to know when a worker disconnects from the cluster due to a node
> failure, how to do that without a proper event managed by the TC
> runtime?

You would get a "topology changed" event with the current cluster  
topology as a data element. You then compare that with a master/worker-
specific clustered data structure that keeps track of which nodes are  
masters and which ones are workers.

If the node simply disconnects because a connection is temporarily  
severed, you would get an "operations disabled" event. There's thus a  
clear difference between both now.
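
As a rough illustration of that comparison, the sketch below keeps a node-to-role table and prunes it whenever a topology snapshot arrives. The `ClusterListener` interface, its method names and the `WorkerTracker` class are hypothetical, since no API had been proposed at this point in the thread, and a local `HashMap` stands in for the clustered data structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical listener shape for the two kinds of event described above.
interface ClusterListener {
    void topologyChanged(Set<String> currentNodeIds); // definitive membership
    void operationsDisabled();                        // connection severed, may recover
}

// Master-side bookkeeping of which node plays which role.
class WorkerTracker implements ClusterListener {
    private final Map<String, String> roleByNodeId = new HashMap<>();
    private final List<String> lostWorkers = new ArrayList<>();
    private boolean operationsEnabled = true;

    synchronized void register(String nodeId, String role) {
        roleByNodeId.put(nodeId, role);
    }

    @Override
    public synchronized void topologyChanged(Set<String> currentNodeIds) {
        // Any registered node missing from the snapshot has left for good.
        Iterator<Map.Entry<String, String>> it = roleByNodeId.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            if (!currentNodeIds.contains(entry.getKey())) {
                if ("worker".equals(entry.getValue())) {
                    lostWorkers.add(entry.getKey());
                }
                it.remove();
            }
        }
    }

    @Override
    public synchronized void operationsDisabled() {
        // Only this node's own connectivity changed; membership is untouched.
        operationsEnabled = false;
    }

    synchronized List<String> lostWorkers() {
        return new ArrayList<>(lostWorkers);
    }

    synchronized boolean operationsEnabled() {
        return operationsEnabled;
    }
}
```

The two callbacks never touch each other's state, which mirrors the distinction drawn above: one reports membership, the other reports local connectivity.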

Maybe I'm missing something though, but it seems to me that with a
small amount of coding you could get what you need. If not, can you
please tell me what the problems would be, since I don't know the
internals of the master/worker implementation?

> Right now, given that I recently removed JMX events from the
> implementation, I'm using a queue-based keep-alive mechanism, but
> using a TC event would make the whole stuff easier and probably more
> efficient.
> And I have to say I can't wait to add the master/worker implementation
> based on the new TC events, as soon as they will be available ;)

Cool! We need use cases :-)

>> We were also
>> thinking of always sending around the entire cluster topology data
>> along with each event. This would make it easy to know the cluster
>> state at the moment that the event was sent, without risking to base
>> logic on an outdated cluster state.
>
> A better approach may be to use some kind of annotation-based
> dependency injection: that is, inject into TC-managed objects a
> "ClusterTopology" object containing information about the current
> cluster state: what do you think?

We discussed that. The problem is that the injected field is then
dissociated from the event, so there's always a possibility of that
field not corresponding to the event that's being processed. Since
topology change events are supposed to be rare, the additional
overhead of sending the node IDs in the cluster along with the node
types would be negligible. This seemed like the simplest approach.
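
A minimal sketch of what "topology travels with the event" could mean in code, assuming a hypothetical `TopologyChangedEvent` class and illustrative node type strings: the event takes its own copy of the node IDs and types when it is created, so a listener that processes it late still sees the state the event was raised for.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical event type that carries its own copy of the topology.
final class TopologyChangedEvent {
    private final Map<String, String> nodeTypesById;

    TopologyChangedEvent(Map<String, String> liveTopology) {
        // Defensive copy: later cluster changes cannot alter this event.
        this.nodeTypesById =
                Collections.unmodifiableMap(new LinkedHashMap<>(liveTopology));
    }

    // Node ID -> node type, exactly as they were when the event was created.
    Map<String, String> nodeTypesById() {
        return nodeTypesById;
    }
}
```

An injected, continuously updated field would offer no such guarantee, because it could already reflect a later change by the time the listener runs.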

>>> That is, it would be great to be able to send and receive
>>> notifications of user defined events during node lifecycle, i.e.:
>>> sending a user defined event when a node leaves the cluster.
>>> What do you think?
>>
>> It might be interesting to design something like that but I'm
>> wondering if this should be part of the core cluster events. It seems
>> to me that this could be a forge project that consumed the cluster
>> events and sets up a messaging channel for the user events. I
>> personally prefer to keep the core functionalities as focused as
>> possible, as long as of course all other required features can be
>> implemented as a TIM.
>
> I agree with you: the smaller the core functionality is, the
> better.
> However, I think that the messaging channel you are talking about
> should be managed by the TC runtime: that is, you should be able to
> set-up user defined events even for the node lifecycle bounds, i.e.
> node connection and disconnection.

Would you mind providing a bit more detail about how you would see  
this work? Maybe some snippets of pseudo code to illustrate the usage  
for some of your specific problems. Thanks!

Take care,

Geert

--
Geert Bevin
Terracotta - http://www.terracotta.org
Uwyn "Use what you need" - http://uwyn.com
RIFE Java application framework - http://rifers.org
Flytecase Band - http://flytecase.be
Music and words - http://gbevin.com


Re: [tc-forge-dev] [tc-dev] Cluster Events update

by Sergio Bossa

On Mon, Jan 19, 2009 at 2:32 PM, Geert Bevin <gbevin@...> wrote:

> You would get a "topology changed" event with the current cluster topology
> as a data element. You then compare that with a master/worker-specific
> clustered data structure that keeps track of which nodes are masters and
> which ones are workers.
>
> If the node simply disconnects because a connection is temporarily severed,
> you would get an "operations disabled" event. There's thus a clear
> difference between both now.

I think I am missing something.
I was talking about JMX events, but it seems you are talking about
something different: are you talking about another kind of TC events
already available with Terracotta, which I don't know of?

> Maybe I'm missing something though, but it seems to me that with a small
> amount of coding you could get what you need. If not, can you please tell me
> what the problems would be since I don't know the internals of the
> master/worker implementation?

That's exactly my point: with proper TC events, I'd be able to do that
with little code.
Current JMX events, however, have too many limitations.

That's because I would need to send an event both at user-defined
times (which leads me to my other point) and on node disconnection.
Consider the following timeline:
1 - A node connects.
2 - After some time, a worker starts on that node.
3 - Then, the worker is stopped.
4 - After some time, the worker starts again.
5 - Then, the worker disconnects due to a node failure.
Now some considerations:
* At 1, I don't want the master (on whatever node) to consider the
just connected node as an available worker, because there's no worker
actually started.
* At 2, 3 and 4, the worker must notify the master of its
connection/disconnection: this can be addressed through custom code.
* 5 is the same as 3, but right now the former has to be implemented
with JMX events and the latter with custom code, which is a mess.

This is what I mean by user-defined events managed by the TC runtime.
It's just a way to handle both "core" TC events (e.g. node
disconnection) and "user" TC events (e.g. worker stopping) with the
same code.
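
The timeline above can be replayed against a small sketch like the following, in which a programmatic stop (step 3) and a node failure (step 5) end up in the same removal path. The `MasterView` class and its method names are hypothetical, and how the two notifications would actually reach the master is left open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Master-side view of available workers. A node that merely connects
// (step 1) never appears here; only started workers do.
class MasterView {
    private final Map<String, String> nodeIdByWorker = new HashMap<>();
    private final List<String> removals = new ArrayList<>();

    // Steps 2 and 4: a worker announces itself once it has started.
    void workerStarted(String workerId, String nodeId) {
        nodeIdByWorker.put(workerId, nodeId);
    }

    // Step 3: "user" event, the worker was stopped programmatically.
    void workerStopped(String workerId) {
        removeWorker(workerId, "stopped");
    }

    // Step 5: "core" event, the node hosting the worker left the cluster.
    void nodeLeft(String nodeId) {
        List<String> affected = new ArrayList<>();
        for (Map.Entry<String, String> entry : nodeIdByWorker.entrySet()) {
            if (entry.getValue().equals(nodeId)) {
                affected.add(entry.getKey());
            }
        }
        for (String workerId : affected) {
            removeWorker(workerId, "node left");
        }
    }

    // Single code path shared by both kinds of event.
    private void removeWorker(String workerId, String reason) {
        if (nodeIdByWorker.remove(workerId) != null) {
            removals.add(workerId + ": " + reason);
        }
    }

    boolean isAvailable(String workerId) {
        return nodeIdByWorker.containsKey(workerId);
    }

    List<String> removals() {
        return new ArrayList<>(removals);
    }
}
```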

>> However, I think that the messaging channel you are talking about
>> should be managed by the TC runtime: that is, you should be able to
>> set-up user defined events even for the node lifecycle bounds, i.e.
>> node connection and disconnection.
>
> Would you mind providing a bit more detail about how you would see this
> work? Maybe some snippets of pseudo code to illustrate the usage for some of
> your specific problems.

I understand I may have been a bit unclear :)

Let me restate with a simple example, again based on master/worker.
A master (on whatever node) needs to know when a worker (on whatever
node, even on the same node as the master) is disconnected.
The worker may disconnect because it's programmatically stopped, or
because of a node disconnection.
In my mind, the master should be able to listen to a TC event sent in
both cases above: that is, when a worker is stopped, and when a
worker dies because of node disconnection. I don't really care if it's
the same event or two different events: the most important
thing is that both events should be sent by the TC server to the TC
client where the master is.

Is it clearer now?
What do you think?
As a side note, I'm almost always online on skype during standard
working hours, my nick is sergio_bossa: feel free to ping me if you
want to chat (by writing, I cannot talk here in the office) about
that.

Cheers,

Sergio B.

--
Sergio Bossa
Software Passionate, Java Technologies Specialist and Open Source Enthusiast.
Blog : http://sbtourist.blogspot.com
Sourcesense - making sense of Open Source : http://www.sourcesense.com
Pro-netics s.p.a. : http://www.pronetics.it

Re: [tc-forge-dev] [tc-dev] Cluster Events update

by Geert Bevin-2

> I think I am missing something.
> I was talking about JMX events, but it seems you are talking about
> something different: are you talking about another kind of TC events
> already available with Terracotta, which I don't know of?

I'm talking about the clustered events we're (re)designing and that  
Taylor sent out the draft for.

> That's exactly my point: with proper TC events, I'd be able to do that
> with little code.
> Current JMX events, however, have too many limitations.
>
> That's because I would need to send an event both at user-defined
> times (which leads me to my other point) and on node disconnection.
> Consider the following timeline:
> 1 - A node connects.
> 2 - After some time, a worker starts on that node.
> 3 - Then, the worker is stopped.
> 4 - After some time, the worker starts again.
> 5 - Then, the worker disconnects due to a node failure.
> Now some considerations:
> * At 1, I don't want the master (on whatever node) to consider the
> just connected node as an available worker, because there's no worker
> actually started.
> * At 2, 3 and 4, the worker must notify the master of its
> connection/disconnection: this can be addressed through custom code.
> * 5 is the same as 3, but right now the former has to be implemented
> with JMX events and the latter with custom code, which is a mess.
>
> This is what I mean by user-defined events managed by the TC runtime.
> It's just a way to handle both "core" TC events (e.g. node
> disconnection) and "user" TC events (e.g. worker stopping) with the
> same code.

I still see this as a custom notification/messaging scheme in your  
application, certainly if you want to abstract actual cluster events  
and process-based events. If you get the cluster event that tells you  
that the topology changed, you can consume it and send out an event  
that is specific for your application.

Since the POJO based API will be based on interfaces that you  
implement and register with a TC-injected utility object, you can use  
the typical Java approach of implementing several interfaces by the  
same listener class. This would allow you to bind both together.
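
One possible shape for the "several interfaces, one listener class" idea is sketched below. All names are hypothetical: `NodeListener` stands in for a future cluster event interface, `WorkerListener` for an application-defined one, and `ClusterEventsStub` for the TC-injected utility object, whose real API had not been specified yet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cluster-level interface (would come from Terracotta).
interface NodeListener {
    void nodeLeft(String nodeId);
}

// Application-level interface (owned by the master/worker code).
interface WorkerListener {
    void workerStopped(String workerId);
}

// Stand-in for the TC-injected utility object that accepts listeners.
class ClusterEventsStub {
    private final List<NodeListener> listeners = new ArrayList<>();

    void addNodeListener(NodeListener listener) {
        listeners.add(listener);
    }

    void fireNodeLeft(String nodeId) {
        for (NodeListener listener : listeners) {
            listener.nodeLeft(nodeId);
        }
    }
}

// The application's own notification channel for worker lifecycle events.
class WorkerEvents {
    private final List<WorkerListener> listeners = new ArrayList<>();

    void addWorkerListener(WorkerListener listener) {
        listeners.add(listener);
    }

    void fireWorkerStopped(String workerId) {
        for (WorkerListener listener : listeners) {
            listener.workerStopped(workerId);
        }
    }
}

// One class implements both interfaces, so both sources share one handler.
class MasterListener implements NodeListener, WorkerListener {
    private final List<String> received = new ArrayList<>();

    @Override
    public void nodeLeft(String nodeId) {
        received.add("node left: " + nodeId);
    }

    @Override
    public void workerStopped(String workerId) {
        received.add("worker stopped: " + workerId);
    }

    List<String> received() {
        return new ArrayList<>(received);
    }
}
```

The same `MasterListener` instance is registered once with each source, so the core stays unaware of worker events while the application still handles both in one place.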

The cluster events we're thinking of in the light of this redesign are
meant to provide information about the state of the cluster that is
not possible to obtain otherwise. Extending this towards allowing
people to register and trigger their own events through the server
would imho go beyond the purpose of Terracotta. DSO gives you the core
tools to make things possible, not necessarily with a ready-cooked
implementation or interface for all use cases. We extend that
functionality through TIMs that are much higher level. So the way I
see it, as long as the cluster events redesign allows a TIM to be
written with higher level features, its scope is correct.

There are other reasons for keeping this out of the core too. For
instance, we don't want to keep state about the routing and the queue
of your custom messages in the server. Also, applications that don't
need or want this shouldn't have to pay any penalty for it. Finally,
since these features don't seem to be tied directly into the core of
DSO itself, they don't need to be tied to its release schedule, and we
can push out newer versions much quicker. That's why we try to extract
these things out of the core and into TIMs as much as possible.

>>> However, I think that the messaging channel you are talking about
>>> should be managed by the TC runtime: that is, you should be able to
>>> set-up user defined events even for the node lifecycle bounds, i.e.
>>> node connection and disconnection.
>>
>> Would you mind providing a bit more detail about how you would see  
>> this
>> work? Maybe some snippets of pseudo code to illustrate the usage  
>> for some of
>> your specific problems.
>
> I understand I may have been a bit unclear :)
>
> Let me restate with a simple example, again based on master/worker.
> A master (on whatever node) needs to know when a worker (on whatever
> node, even on the same node as the master) is disconnected.
> The worker may disconnect because it's programmatically stopped, or
> because of a node disconnection.
> In my mind, the master should be able to listen to a TC event sent in
> both cases above: that is, when a worker is stopped, and when a
> worker dies because of node disconnection. I don't really care if it's
> the same event or two different events: the most important
> thing is that both events should be sent by the TC server to the TC
> client where the master is.
>
> Is it clearer now?
> What do you think?

Think I answered this already above :-)

I suggest that I write out the detailed spec and API proposal first.
If you have time, it would be awesome if you could check that the
low-level features you needed for master/worker are there, so that it
is at least possible to implement your needs. We can then later see
about maybe creating a TIM with a higher level scope.

How does that sound?

Take care,

Geert

--
Geert Bevin
Terracotta - http://www.terracotta.org
Uwyn "Use what you need" - http://uwyn.com
RIFE Java application framework - http://rifers.org
Flytecase Band - http://flytecase.be
Music and words - http://gbevin.com


Re: [tc-dev] [tc-forge-dev] Cluster Events update

by Taylor Gautier

Yes, agreed, this effort is entirely geared toward application usage, not monitoring.

----- Original Message -----
From: "Geert Bevin" <gbevin@...>
To: tc-users@...
Cc: "tc-forge-dev" <tc-forge-dev@...>, "tc-dev" <tc-dev@...>
Sent: Monday, January 19, 2009 4:45:22 AM GMT -08:00 US/Canada Pacific
Subject: Re: [tc-dev] [tc-users] [tc-forge-dev]  Cluster Events update

We sort of started migrating away from monitoring during our latest  
discussion about cluster events since there's already an overlap in  
functionality between the JMX API that the admin console uses and the  
cluster events that are currently being sent out. We were thinking of  
gearing the new cluster events towards the programmer of the  
application, ie. something that is used through a Java API and  
annotations from within a DSO application that is part of the cluster.  
We can then independently clean up the admin console JMX API for  
public consumption and have two clearly distinct approaches without  
overlap.

On 16 Jan 2009, at 20:31, Kunal wrote:

> This would be generic monitoring and would benefit all use cases to  
> ease monitoring in production, IMHO.
>
> Kunal.
>
>
>
> From: Steven Harris <steve@...>
> Date: Fri, 16 Jan 2009 11:00:37 -0800
> To: Kunal Bhasin <kbhasin@...>
> Cc: Untitled 5 <tc-users@...>, tc-forge-dev <tc-forge-dev@...>, tc-dev <tc-dev@...>
> Subject: Re: [tc-forge-dev] [tc-users] Cluster Events update
>
> For which usecase?
>
>
> Cheers,
> Steve Harris
> "Terracotta.  It's ten pounds of awesome in a five pound sack. <http://www.miketec.org/serendipity/index.php?/archives/7-Oracle-and-Postgres-Redux.html>"
>
>
>
>
>
> On Jan 16, 2009, at 10:57 AM, kbhasin@... wrote:
>
>> I would also recommend adding notifications for the L2 processes  
>> starting and stopping. At least broadcasting the status of the L2  
>> based on L2 group comm would be a good start.
>>
>> Also, sending the ip/hostname of the machine along with the  
>> clientid would also be useful.
>>
>> Kunal
>> --Sent while mobile--
>>
>> On Jan 16, 2009, at 9:04 AM, Taylor Gautier <tgautier@...> wrote:
>>
>>> For the upcoming release of Terracotta, we are considering  
>>> changing the eventing mechanism.  So far, our design discussions  
>>> have identified some core use cases, and we think we have  
>>> identified the general strategy.  I'd love to hear any comments  
>>> from our users on this new direction.
>>>
>>> ============ DRAFT REQUIREMENTS/DESIGN FOR CLUSTER EVENTS  
>>> ===================
>>> MOTIVATION
>>> ===================================================
>>> 1) The current eventing mechanism has race conditions
>>>
>>> 2) The implementation is slightly confusing - a disconnected event
>>> comes to a node on two occasions - a) if a node is disconnected
>>> then the node gets a this.disconnected event. This event can never
>>> have perfect knowledge, and the node may reconnect at some time.
>>> b) if another node is quarantined (never to rejoin) it gets a
>>> nodeX disconnected.
>>>
>>> Since these two events are named similarly, they overload the  
>>> meaning of "disconnected". In the first case the connection has  
>>> been severed, but may be restored, while the second is an absolute  
>>> measure of the membership of the cluster - the server sent the  
>>> message so it is definitive.
>>>
>>> USE CASES
>>> ===================================================
>>> 1) Client needs to change behavior when TC is no longer able to  
>>> service operations - e.g. kill themselves
>>>
>>> 2) Map evictor use case
>>>   - needs to know when a node has left the system.
>>>   - needs to query the system to know from a list of objects what  
>>> objects are not faulted into any client (it is accepted that this  
>>> query is async and the response is guaranteed to be out of date)
>>>
>>> 3) Clustered async use case
>>>   - needs to know if a node has left the system.
>>>   - needs to query the system to know from a list of objects which  
>>> ones are "orphaned" - e.g. are no longer accepted (this may be  
>>> identical to 2a)
>>>
>>> 4) Master/Worker
>>>   - needs to know when a node has joined the system to re-balance  
>>> work across all nodes (although this can easily be coded with wait/
>>> notify)
>>>   - needs to know when a node (and which node) has left the system  
>>> to re-balance work from that node across remaining nodes
>>>
>>> 5) Location Aware Cache
>>>   - need to execute work on where an object is faulted
>>>   - need to query the system about where an object, or a list of  
>>> objects, is faulted
>>>
>>> NOTE:
>>> Use case of switching to local data could be construed as #1, but  
>>> is a much more involved use case, so while someone could use the  
>>> solution for Use Case #1, we aren't specifically targeting that  
>>> capability.
>>>
>>> SUGGESTED SOLUTIONS
>>> ===================================================
>>> Roughly the following "things" seem to solve the use cases:
>>>
>>> 1) Topology Change Events
>>>   - node joined ? - no "real" use case for it - regular code  
>>> techniques can be used
>>>   - node left
>>>
>>> 2) Cluster Operational Events
>>>   - tc operations are enabled
>>>   - tc operations are disabled
>>>
>>> 3) Data Aware Information
>>>   - a list of nodes where an object is faulted
>>>   - a list of list of nodes where a list of objects is faulted
>>>   - out of this list of objects, which ones are not faulted anywhere
>>>
>>> ============ DRAFT IMPLEMENTATION THOUGHTS ===================
>>>
>>> At the meeting of today we decided that the use cases, events and  
>>> data aware operations are sufficient and appropriate.
>>>
>>> We also decided to focus on a non JMX API for several reasons:
>>> * artificially introduces a whole infrastructure that is not  
>>> needed and that is leaky
>>> * makes it more difficult for the developer to integrate
>>> * is not a compile-time API, which means less safety
>>>
>>> We're going to design a POJO API with compile-time safety that is
>>> based on dependency injection. The first idea is to annotate a
>>> field as being a 'DSOClusterUtil' (or whatever name). This would
>>> then cause DSO to inject a local instance of that utility class,
>>> providing the developers with API methods to perform listener
>>> registration and data locality inspection.
>>>

--
Geert Bevin
Terracotta - http://www.terracotta.org
Uwyn "Use what you need" - http://uwyn.com
RIFE Java application framework - http://rifers.org
Flytecase Band - http://flytecase.be
Music and words - http://gbevin.com


Re: [tc-forge-dev] [tc-dev] Cluster Events update

by Sergio Bossa

On Mon, Jan 19, 2009 at 4:58 PM, Geert Bevin <gbevin@...> wrote:

> The cluster events we're thinking of in the light of this redesign are
> meant to provide information about the state of the cluster that is not
> possible to obtain otherwise. Extending this towards allowing people to
> register and trigger their own events through the server would imho go
> beyond the purpose of Terracotta. DSO gives you the core tools to make things
> possible, not necessarily with a ready-cooked implementation or interface
> for all use cases. We extend that functionality through TIMs that are much
> higher level. So the way I see it is that as long as the cluster events
> redesign allows a TIM to be written with higher level features, its scope is
> correct.

Absolutely agree: if the new TC events API allows developers to
extend it in order to support custom events, that will be the
perfect solution.

> I suggest that I write out the detailed spec and API proposal first. If you
> have time, it would be awesome if you could check that the low-level
> features you needed for master/worker are now there to at least make it
> possible to implement your needs.

Sure: let me know as soon as your proposal draft is ready.

> We can then later see about maybe creating
> a TIM with a higher level scope.

Then please consider me as a volunteer for such a TIM ;)

Cheers,

Sergio B.

--
Sergio Bossa
Software Passionate, Java Technologies Specialist and Open Source Enthusiast.
Blog : http://sbtourist.blogspot.com
Sourcesense - making sense of Open Source : http://www.sourcesense.com
Pro-netics s.p.a. : http://www.pronetics.it