More odd behaviour


by Rawlings, Bill A

Ok guys, I’ve been working on an application to start, monitor and stop the River core services (LUS, TM, Space, Class Server).

I’m using an event-driven model built on LookupCache and ServiceDiscoveryListener.

A ServiceDiscoveryManager is started to set up the LookupCaches:

      mgr = new LookupDiscoveryManager(DiscoveryGroupManagement.ALL_GROUPS,
                                       null,    // unicast locators
                                       null);   // DiscoveryListener
      sdm = new ServiceDiscoveryManager(mgr, new LeaseRenewalManager());

I was having trouble stopping the LUS.  I had originally used the SDM directly with a DiscoveryListener, but kept getting uncatchable exceptions from the SDM when I killed the LUS.  I can’t terminate the SDM in this app (doing so does stop the exceptions), because I want to monitor the system as long as the app is running.

So, I tried to use a LookupCache for the ServiceRegistrar…

      classes = new Class[] {ServiceRegistrar.class};
      template = new ServiceTemplate(null,      // any service ID
                                     classes,   // must implement ServiceRegistrar
                                     null);     // any attributes

      lusCache = sdm.createLookupCache(template, null, lusMonitor);

I use one for the space and one for the TM as well.

In the lusMonitor code I have this…

  public void serviceAdded(ServiceDiscoveryEvent evt)
  {
    ServiceItem si = evt.getPostEventServiceItem();

    Object service = si.service;
    if(service instanceof ServiceRegistrar)
    {
      sp.setStatusRunning();
    }
  }

  public void serviceRemoved(ServiceDiscoveryEvent serviceDiscoveryEvent)
  {
    sp.setStatusStopped();
  }

And the code that kills the LUS is in another class…

  public void stopLUS()
  {
    LookupCache cache = csm.getLUSCache();
    ServiceItem si = cache.lookup(null);
    Object lusProxy = si.service;
    if(lusProxy instanceof Administrable)
    {
      try
      {
        Object admin = ((Administrable)lusProxy).getAdmin();
        DestroyAdmin da = (DestroyAdmin)admin;
        cache.discard(si);
        da.destroy();
      }
      catch(Exception ex)
      {
        System.out.println("Error getting LUS DestroyAdmin");
        ex.printStackTrace();
      }
    }
  }

Ok, so, when the LUS starts, the serviceAdded method in the ServiceDiscoveryListener (lusMonitor) is quickly invoked.

When the LUS is killed, it takes about 10 minutes for the serviceRemoved event to be fired.  The LUS is dead, dead, dead, and I assume it has been discarded from the LookupCache.

This works almost instantly for the space and TM, but it looks like a lease expiration or something is holding up the discard event for the LUS.

Any ideas on how to get around this?

BAR

--------------------------------------------------------------------------
Getting Started:     http://www.jini.org/wiki/Category:Getting_Started
Community Web Site:  http://jini.org
jini-users Archive:  http://archives.java.sun.com/archives/jini-users.html
Unsubscribing:       email "signoff JINI-USERS"  to listserv@...

Re: More odd behaviour

by Niclas Hedhman

On Mon, Jan 19, 2009 at 8:16 PM, Rawlings, Bill A
<bill.a.rawlings@...> wrote:

> When the LUS is killed, it takes about 10 minutes for that event to be
> fired.  The LUS is dead, dead, dead, I assume it has been discarded from the
> LookupCache.

Sounds like RMI/JERI expiry of remote reference.

When you say "killed", how is that done? A nice and clean shutdown or
some abrupt process termination?

Cheers
Niclas
--
http://www.qi4j.org - New Energy for Java


Re: More odd behaviour

by Gregg Wonderly

Rawlings, Bill A wrote:
> Ok guys, I’ve been working on an application to start, monitor and stop
> the River core services (LUS, TM, Space,  Class Server).

> I was having trouble stopping the LUS.  I had originally used the SDM
> directly with a DiscoveryListener, but kept getting uncatchable
> exceptions from the SDM when I killed the LUS.  I can’t terminate the
> SDM in this app (that stops the exception if you do), because I want to
> monitor the system as long as the app is running.

This is an old "feature request."  At issue is that reggie does not use
Runtime.addShutdownHook() to cause it to send out appropriate events at
termination.  Thus, you don't see it disappear until the notify() leases expire.
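
For readers who have not used the hook mechanism, here is a minimal sketch of what Runtime.addShutdownHook() looks like in plain Java. It is not reggie's code: the class name ShutdownNotice and the announceDeparture callback are hypothetical, standing in for whatever a service would do to tell its listeners it is going away.

```java
public class ShutdownNotice {
    /**
     * Registers a JVM shutdown hook that runs the given callback.
     * The callback is a hypothetical placeholder for "send out the
     * appropriate events before terminating".
     */
    public static Thread register(Runnable announceDeparture) {
        Thread hook = new Thread(announceDeparture, "announce-departure");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook; // returned so the caller can remove it again if needed
    }

    public static void main(String[] args) {
        register(() -> System.out.println("going away, notifying listeners"));
        System.out.println("service running; exiting normally now");
    }
}
```

Hooks run on a normal exit and on signals such as SIGTERM, but not after kill -9 or Runtime.halt(), so a hook of this kind would shorten the delay only for orderly shutdowns.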

Gregg Wonderly


Re: More odd behaviour

by Greg Trasuk

Hi Bill:

Look in the ServiceDiscoveryManager javadocs and specification for the
'discardWait' configuration parameter and the "service discard
problem".  Essentially, the fact that a lookup service has disappeared
(and remember, there may very well be more than one) doesn't tell SDM
anything about the availability of a service.  If a service is
un-registered with all the LUS's, then they will notify the interested
SDM's that the service is gone.  However if SDM loses contact with one
or more LUS's, or if one or more LUS's still have a service
registration, SDM can't say for sure that the service is unregistered,
so it waits until the 'discardWait' period (default 10 minutes) expires
before it sends out notifications that the service is gone from the
lookup cache.  You can configure the wait time.
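
To make the waiting behaviour described above concrete, here is a toy model in plain Java. It is emphatically not the ServiceDiscoveryManager implementation, and every name in it (DiscardWaitModel, contactLost, confirmed, tick) is made up for illustration. It only captures the idea that a removal is announced once the wait has expired without the registration being confirmed again. Time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy model of a delayed discard; NOT how SDM is actually implemented. */
public class DiscardWaitModel {
    private final long discardWaitMillis;
    // service name -> earliest time (ms) at which its removal may be announced
    private final Map<String, Long> pending = new HashMap<>();

    public DiscardWaitModel(long discardWaitMillis) {
        this.discardWaitMillis = discardWaitMillis;
    }

    /** Contact with a lookup service holding this registration was lost. */
    public void contactLost(String service, long nowMillis) {
        pending.putIfAbsent(service, nowMillis + discardWaitMillis);
    }

    /** Some lookup service confirmed that the registration still exists. */
    public void confirmed(String service) {
        pending.remove(service);
    }

    /** Returns the services whose wait has expired by the given time. */
    public List<String> tick(long nowMillis) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis >= entry.getValue()) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }
}
```

With a wait of 600000 ms (the 10 minute default mentioned above), contactLost("lus", 0) followed by tick(599999) announces nothing, while tick(600000) announces "lus". A call to confirmed("lus") in between cancels the pending removal.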

        However, there's a bigger concept here, that Jini newcomers often
miss.  I don't know if you're making this mistake, but for the benefit
of posterity, let me state it again:

- You don't know if a service has failed until you try to use it, and
you can't.
- Conversely, the fact that there is a service registration, or that you
can renew a lease with the service, tells you nothing about whether a
service is "up".
- And in an odd twist, the fact that a service registration disappears
from your lookup cache in no way indicates that the service is "down".
The LUS might be down, or the service may have unregistered itself for
some reason, but might still be open for business with its current
clients.  Perhaps the LUS could come back, or another LUS might take its
place, and the service might re-register with it.

I'll say it again for emphasis:
- You don't know a service has failed until you try to use it, and you
can't.
- You don't know a service is operational until you try to use it, and you
can.

Also, the instant after you use it, it might be gone.  So the best you
can do is put a time bound on how long it is until you know a service
has failed.  You would do this by actually accessing the service at some
interval.  Please don't make the mistake of thinking that renewing your
lease with a service proves that the service is operational.  It just
means whatever service is renewing leases is operational.
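
As a rough sketch of "try to use it, with a time bound", the helper below runs one real call against a service and reports whether it completed in time. The class name LivenessProbe and the method isUsable are hypothetical, and realCall stands in for whatever genuine operation the service offers (a space read, a transaction create, and so on). It assumes the call is safe to abandon if it hangs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LivenessProbe {
    /**
     * Returns true only if realCall finished without throwing within
     * timeoutMillis. A false result means "could not use it just now",
     * which is all a client can ever know.
     */
    public static boolean isUsable(Callable<?> realCall, long timeoutMillis) {
        // Daemon thread so a hung call cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "liveness-probe");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<?> result = pool.submit(realCall);
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false; // too slow, or the call itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt
            return false;
        } finally {
            pool.shutdownNow(); // interrupts the probe thread if still running
        }
    }
}
```

Calling this at a fixed interval, for example from a ScheduledExecutorService, gives the time bound mentioned above: a failure is noticed at most one interval plus one timeout after it happens.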

By the way, fully embracing this concept of partial failure and limited
knowledge of the overall system state is an important step on the road
from "Wow, Jini is complex" to "Wow, Jini is a work of genius".

Cheers,

Greg.

--
Greg Trasuk, President
StratusCom Manufacturing Systems Inc. - We use information technology to
solve business problems on your plant floor.
http://stratuscom.com


Re: More odd behaviour

by Rawlings, Bill A

Thanks for the responses guys.  I figured it was a lease waiting to expire.  I guess I can get around it by setting the LUS status to be “down” right after the destroy() call, because that does indeed kill the LUS.
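
A minimal sketch of that workaround, using stand-in types so it compiles on its own: Destroyable is a hypothetical substitute for the admin object that exposes destroy(), and StatusPanel for the "sp" object in the earlier listener code. The status is flipped only when destroy() returns normally.

```java
public class StopAndMark {
    /** Hypothetical stand-in for the admin object that exposes destroy(). */
    public interface Destroyable {
        void destroy() throws Exception;
    }

    /** Hypothetical stand-in for the status panel ("sp" in the listener). */
    public interface StatusPanel {
        void setStatusStopped();
    }

    /**
     * Calls destroy() and marks the status stopped only if the call
     * returned normally. Returns false, leaving the status untouched,
     * if destroy() threw.
     */
    public static boolean stopAndMark(Destroyable admin, StatusPanel sp) {
        try {
            admin.destroy();
        } catch (Exception ex) {
            return false;
        }
        sp.setStatusStopped();
        return true;
    }
}
```

The later serviceRemoved callback can still call setStatusStopped() when it eventually arrives; setting the same status twice is harmless in this sketch.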

 

I’d like to use this problem to raise something I think is important for River’s future.  This is an app I should not have to be writing.

There was a nice, really simple UI in Jini 1 that you could start and stop the core services with.  This type of app really needs to come back into River.

We are finding our Systems Admin types do not want to deal with things like configuration and start scripts.  They want a nice clean UI they can use.

We got complaints about them having to use “ps” commands to kill the River core services.

BAR


Re: More odd behaviour

by Gregg Wonderly

Rawlings, Bill A wrote:
> Thanks for the responses guys.  I figured it was a lease waiting to
> expire.  I guess I can get around it by setting the LUS status to be
> “down” right after the destroy() call, because that does indeed kill the
> LUS.
>
> I’d like to use this problem as something I think is important for
> future River use.  This is an app I should not have to be writing.

The com.sun.jini.start.ServiceStarter class illustrates how a container can be
created which uses the ServiceDescriptor mechanisms to manage service
instance lifecycle.  In that application it is configuration based, but it could
be done with a GUI as well.  I started to do something along these lines a while
back, but put it aside when some other work intervened.

What little I got done is out at: http://pescade.dev.java.net/

Gregg Wonderly
