Discussion:
More odd behaviour
Rawlings, Bill A
2009-01-19 19:16:46 UTC
Ok guys, I've been working on an application to start, monitor and stop
the River core services (LUS, TM, Space, Class Server).

I'm using an event-driven model based on LookupCache and
ServiceDiscoveryListener.

There is a ServiceDiscoveryManager that is started to set up the
LookupCaches:

mgr = new LookupDiscoveryManager(DiscoveryGroupManagement.ALL_GROUPS,
                                 null,  // unicast locators
                                 null); // DiscoveryListener
sdm = new ServiceDiscoveryManager(mgr, new LeaseRenewalManager());

I was having trouble stopping the LUS. I had originally used the SDM
directly with a DiscoveryListener, but kept getting uncatchable
exceptions from the SDM when I killed the LUS. I can't terminate the
SDM in this app (terminating it does stop the exceptions), because I
want to monitor the system as long as the app is running.

So, I tried to use a LookupCache for the ServiceRegistrar...

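classes = new Class[] {ServiceRegistrar.class};
template = new ServiceTemplate(null,     // any service ID
                               classes,  // must implement ServiceRegistrar
                               null);    // any attributes
lusCache = sdm.createLookupCache(template,
                                 null,        // no ServiceItemFilter
                                 lusMonitor); // ServiceDiscoveryListener
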
I use one for the space and one for the TM as well.
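
The type part of the ServiceTemplate decides what ends up in each cache. Below is a simplified, JDK-only model of that rule, not River code. Registrar, Space and TxnManager are hypothetical stand-ins for ServiceRegistrar, JavaSpace and TransactionManager, and the model ignores service IDs and attribute templates. A proxy matches when it is an instance of every class in the array, and a null array matches any service:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of ServiceTemplate type matching (IDs and attributes ignored).
 * The nested interfaces are hypothetical stand-ins for the real Jini types.
 */
final class TemplateMatchSketch {
    interface Registrar {}
    interface Space {}
    interface TxnManager {}

    /** True if service is an instance of every type; null means "any type". */
    static boolean matches(Object service, Class<?>[] serviceTypes) {
        if (serviceTypes == null) {
            return true; // no type constraint
        }
        for (Class<?> type : serviceTypes) {
            if (!type.isInstance(service)) {
                return false;
            }
        }
        return true;
    }

    /** The proxies a cache built from serviceTypes would hold. */
    static List<Object> select(List<Object> registered, Class<?>[] serviceTypes) {
        List<Object> hits = new ArrayList<>();
        for (Object proxy : registered) {
            if (matches(proxy, serviceTypes)) {
                hits.add(proxy);
            }
        }
        return hits;
    }
}
```

With one cache per type, each cache only ever sees its own kind of proxy. That is why the LUS, space and TM status can be tracked independently.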

In the lusMonitor code I have this...

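public void serviceAdded(ServiceDiscoveryEvent evt)
{
    ServiceItem si = evt.getPostEventServiceItem();
    Object service = si.service;
    if(service instanceof ServiceRegistrar)
    {
        sp.setStatusRunning();
    }
}

public void serviceRemoved(ServiceDiscoveryEvent serviceDiscoveryEvent)
{
    sp.setStatusStopped();
}

// ServiceDiscoveryListener also declares serviceChanged(); attribute
// changes don't affect the running/stopped status, so it does nothing.
public void serviceChanged(ServiceDiscoveryEvent serviceDiscoveryEvent)
{
}
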
And the code that kills the LUS is in another class...

public void stopLUS()
{
    LookupCache cache = csm.getLUSCache();
    ServiceItem si = cache.lookup(null);
    if(si == null)
    {
        // Nothing in the cache: no LUS is currently known.
        System.out.println("No LUS found in the cache");
        return;
    }
    Object lusProxy = si.service;
    if(lusProxy instanceof Administrable)
    {
        try
        {
            Object admin = ((Administrable)lusProxy).getAdmin();
            if(admin instanceof DestroyAdmin)
            {
                cache.discard(si);
                ((DestroyAdmin)admin).destroy();
            }
        }
        catch(Exception ex)
        {
            System.out.println("Error stopping LUS via DestroyAdmin");
            ex.printStackTrace();
        }
    }
}

Ok, so, when the LUS starts, the serviceAdded method in the
ServiceDiscoveryListener (lusMonitor) is quickly invoked.

When the LUS is killed, it takes about 10 minutes for the serviceRemoved
event to be fired. The LUS is dead, dead, dead; I assume it has been
discarded from the LookupCache.

This works almost instantly for the space and TM, but it looks like a
lease expiration or something is holding up the discard event for the
LUS.

Any ideas on how to get around this?

BAR


--------------------------------------------------------------------------
Getting Started: http://www.jini.org/wiki/Category:Getting_Started
Community Web Site: http://jini.org
jini-users Archive: http://archives.java.sun.com/archives/jini-users.html
Unsubscribing: email "signoff JINI-USERS" to ***@java.sun.com
Niclas Hedhman
2009-01-19 20:05:27 UTC
On Mon, Jan 19, 2009 at 8:16 PM, Rawlings, Bill A wrote:
When the LUS is killed, it takes about 10 minutes for that event to be
fired. The LUS is dead, dead, dead, I assume it has been discarded from the
LookupCache.

Sounds like RMI/JERI expiry of a remote reference.

When you say "killed", how is that done? A nice and clean shutdown or
some abrupt process termination?

Cheers
Niclas
--
http://www.qi4j.org - New Energy for Java

Greg Trasuk
2009-01-19 20:38:02 UTC
Hi Bill:

Look in the ServiceDiscoveryManager javadocs and specification for the
'discardWait' configuration parameter and the "service discard
problem". Essentially, the fact that a lookup service has disappeared
(and remember, there may very well be more than one) doesn't tell SDM
anything about the availability of a service. If a service is
unregistered from all the LUSs, then they will notify the interested
SDMs that the service is gone. However, if SDM loses contact with one
or more LUSs, or if one or more LUSs still have a service
registration, SDM can't say for sure that the service is unregistered,
so it waits until the 'discardWait' period (default 10 minutes) expires
before it sends out notifications that the service is gone from the
lookup cache. You can configure the wait time.
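
A toy model of the rule just described may help. It sketches the reasoning only and is not the SDM's actual algorithm. The counts stand in for what the cache has heard from each lookup service, and the 10-minute default is the one quoted above. In the real SDM, discardWait appears to be a long in milliseconds, read from the Configuration passed to the three-argument ServiceDiscoveryManager constructor under the component name net.jini.lookup.ServiceDiscoveryManager. That reading is based on the 2.x javadocs; check the docs for your release before relying on it.

```java
import java.time.Duration;

/**
 * Toy model of the discard rule described above, not SDM code: positive
 * removal reports from every LUS that held the registration allow an
 * immediate serviceRemoved; silence from any of them means waiting.
 */
final class DiscardDelaySketch {
    // Default discard wait quoted above: 10 minutes.
    static final Duration DEFAULT_DISCARD_WAIT = Duration.ofMinutes(10);

    /**
     * @param lusHolding         lookup services that held the registration
     * @param lusReportedRemoval how many of them reported it removed
     * @param discardWait        the configured wait
     * @return how long until listeners see serviceRemoved
     */
    static Duration delayBeforeRemoved(int lusHolding, int lusReportedRemoval,
                                       Duration discardWait) {
        if (lusHolding <= 0 || lusReportedRemoval < 0 || lusReportedRemoval > lusHolding) {
            throw new IllegalArgumentException("invalid lookup service counts");
        }
        if (lusReportedRemoval == lusHolding) {
            return Duration.ZERO; // every LUS confirmed it: no need to wait
        }
        return discardWait; // lost contact is not proof the service is gone
    }
}
```

In Bill's setup, the only lookup service holding the registration is the one that was destroyed, so it can never report the removal. In this model, `delayBeforeRemoved(1, 0, DEFAULT_DISCARD_WAIT)` returns ten minutes, which is consistent with what he observed. Shortening discardWait gives faster removal events but causes more churn when a lookup service is only briefly unreachable.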

However, there's a bigger concept here, that Jini newcomers often
miss. I don't know if you're making this mistake, but for the benefit
of posterity, let me state it again:

- You don't know if a service has failed until you try to use it, and
you can't.
- Conversely, the fact that there is a service registration, or that you
can renew a lease with the service, tells you nothing about whether a
service is "up".
- And in an odd twist, the fact that a service registration disappears
from your lookup cache in no way indicates that the service is "down".
The LUS might be down, or the service may have unregistered itself for
some reason, but might still be open for business with its current
clients. Perhaps the LUS could come back, or another LUS might take its
place, and the service might re-register with it.

I'll say it again for emphasis:
- You don't know a service has failed until you try to use it, and you
can't.
- You don't know a service is operational until you try to use it, and
you can.

Also, the instant after you use it, it might be gone. So the best you
can do is put a time bound on how long it is until you know a service
has failed. You would do this by actually accessing the service at some
interval. Please don't make the mistake of thinking that renewing your
lease with a service proves that the service is operational. It just
means whatever service is renewing leases is operational.
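
One way to set that time bound is to probe the service itself on a schedule. The sketch below uses only the JDK and is hypothetical throughout. The probe can be any Callable that performs a real, cheap operation against the service, such as a space read with a short timeout. Status is kept as "last confirmed at" rather than a yes/no flag. If probes run every interval and each is allowed up to timeout, a failure shows up as a stale timestamp within roughly interval + timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Calls the service every intervalMillis, gives each call timeoutMillis,
 * and remembers when it last answered. The probe is hypothetical.
 */
final class ServiceProber implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    // Separate pool so a hung call can be abandoned after the timeout.
    private final ExecutorService caller = Executors.newCachedThreadPool();
    private volatile long lastSuccessMillis = -1; // -1 means never confirmed

    ServiceProber(Callable<?> probe, long intervalMillis, long timeoutMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            Future<?> call = caller.submit(probe);
            try {
                call.get(timeoutMillis, TimeUnit.MILLISECONDS);
                lastSuccessMillis = System.currentTimeMillis(); // it answered
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // prober is shutting down
            } catch (ExecutionException | TimeoutException e) {
                call.cancel(true); // failed or too slow: keep the old timestamp
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** True if the service answered a probe within the last maxAgeMillis. */
    boolean confirmedWithin(long maxAgeMillis) {
        long last = lastSuccessMillis;
        return last >= 0 && System.currentTimeMillis() - last <= maxAgeMillis;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
        caller.shutdownNow();
    }
}
```

Set maxAgeMillis somewhat larger than interval + timeout so that a single slow call does not make the status flap.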

By the way, fully embracing this concept of partial failure and limited
knowledge of the overall system state is an important step on the road
from "Wow, Jini is complex" to "Wow, Jini is a work of genius".

Cheers,

Greg.
--
Greg Trasuk, President
StratusCom Manufacturing Systems Inc. - We use information technology to
solve business problems on your plant floor.
http://stratuscom.com

Gregg Wonderly
2009-01-19 20:34:52 UTC
Post by Rawlings, Bill A
I was having trouble stopping the LUS. I had originally used the SDM
directly with a DiscoveryListener, but kept getting uncatchable
exceptions from the SDM when I killed the LUS. I can’t terminate the
SDM in this app (that stops the exception if you do), because I want to
monitor the system as long as the app is running.

This is an old "feature request." At issue is that reggie does not use
Runtime.addShutdownHook() to cause it to send out appropriate events at
termination. Thus, you don't see it disappear until the notify() leases expire.
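
A service that does want to announce an orderly exit could wire up the JDK hook roughly as below. This sketches the idea only and is not reggie code. GoingAwayListener is a hypothetical callback; a real service might instead cancel its registrations so that lookup services can send removal events. Hooks run on a normal exit or on an interrupt such as Ctrl-C. They do not run after kill -9 or Runtime.halt(), so lease expiry remains the backstop.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Announces "going away" from a JVM shutdown hook. The listener type is
 * hypothetical; hooks do not run on kill -9 or Runtime.halt().
 */
final class ShutdownAnnouncer {
    interface GoingAwayListener {
        void goingAway(String serviceName);
    }

    private final String serviceName;
    private final List<GoingAwayListener> listeners = new CopyOnWriteArrayList<>();
    private final Thread hook;

    ShutdownAnnouncer(String serviceName) {
        this.serviceName = serviceName;
        this.hook = new Thread(this::announce, serviceName + "-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
    }

    void addListener(GoingAwayListener listener) {
        listeners.add(listener);
    }

    /** Runs in the hook, so keep it short: the JVM is exiting. */
    void announce() {
        for (GoingAwayListener listener : listeners) {
            try {
                listener.goingAway(serviceName);
            } catch (RuntimeException e) {
                // one failing listener must not stop the others being told
            }
        }
    }

    /** Orderly stop (e.g. from destroy()) without exiting the JVM. */
    void stopWithoutExit() {
        Runtime.getRuntime().removeShutdownHook(hook); // avoid announcing twice
        announce();
    }
}
```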

Gregg Wonderly

Rawlings, Bill A
2009-01-20 15:19:01 UTC
Thanks for the responses guys. I figured it was a lease waiting to
expire. I guess I can get around it by setting the LUS status to be
"down" right after the destroy() call, because that does indeed kill the
LUS.
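
The workaround can be sketched as follows. Destroyer and StatusPanel are hypothetical stand-ins for DestroyAdmin (whose destroy() throws RemoteException) and for the app's status display. The status is updated as soon as destroy() returns, and the delayed serviceRemoved that arrives later just repeats it. If destroy() throws, the sketch leaves the status alone, because the LUS may or may not have gone away.

```java
/**
 * Marks the LUS stopped as soon as destroy() returns instead of waiting for
 * serviceRemoved. Destroyer and StatusPanel are hypothetical stand-ins.
 */
final class StopThenMark {
    interface Destroyer {
        void destroy() throws Exception; // DestroyAdmin.destroy() throws RemoteException
    }

    static final class StatusPanel {
        private String status = "RUNNING";
        synchronized void setStatusStopped() { status = "STOPPED"; }
        synchronized String status() { return status; }
    }

    /** Returns true if destroy() succeeded and the status was updated. */
    static boolean stop(Destroyer admin, StatusPanel panel) {
        try {
            admin.destroy();
        } catch (Exception e) {
            return false; // outcome unknown: don't claim the LUS is down
        }
        panel.setStatusStopped(); // don't wait for discardWait or lease expiry
        return true;
    }
}
```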

I'd like to use this problem to point out something I think is important
for future River use. This is an app I should not have to be writing.

There was a nice, really simple UI in Jini 1 that you could start and
stop the core services with; this type of app really needs to come
back into River.

We are finding our Systems Admin types do not want to deal with things
like configuration and start scripts. They want a nice clean UI they
can use.

We got complaints about them having to use "ps" commands to kill the
River core services.

BAR


Gregg Wonderly
2009-01-20 20:19:18 UTC
Post by Rawlings, Bill A
Thanks for the responses guys. I figured it was a lease waiting to
expire. I guess I can get around it by setting the LUS status to be
“down” right after the destroy() call, because that does indeed kill the
LUS.
I’d like to use this problem as something I think is important for
future River use. This is an app I should not have to be writing.
The com.sun.jini.start.ServiceStarter class illustrates how a container can be
created which uses the ServiceDescriptor mechanisms to manage service
instance lifecycle. It is configuration-based in that application, but could be
done with a GUI as well. I started to do something along these lines a while
back, but put it aside when some other work intervened.

What little I got done is out at: http://pescade.dev.java.net/
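
A GUI-driven container along those lines might look like the toy sketch below. ManagedService is a hypothetical interface and not River's com.sun.jini.start API. Services are described up front, started in order (for instance, the class server before the services whose code it serves), and stopped in reverse order. That gives an admin UI simple start-all and stop-all buttons.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy lifecycle container: start in description order, stop in reverse.
 * ManagedService is hypothetical; a GUI could call these methods.
 */
final class LifecycleContainer {
    interface ManagedService {
        void start() throws Exception;
        void stop() throws Exception;
    }

    private final Map<String, ManagedService> described = new LinkedHashMap<>();
    private final Deque<String> running = new ArrayDeque<>(); // last started on top

    void describe(String name, ManagedService service) {
        described.put(name, service);
    }

    /** Starts every described service not yet running, in description order.
     *  If one fails, those already started stay recorded so stopAll() can undo them. */
    void startAll() throws Exception {
        for (Map.Entry<String, ManagedService> e : described.entrySet()) {
            if (!running.contains(e.getKey())) {
                e.getValue().start();
                running.push(e.getKey());
            }
        }
    }

    /** Stops in reverse start order and keeps going past individual failures. */
    void stopAll() {
        while (!running.isEmpty()) {
            String name = running.pop();
            try {
                described.get(name).stop();
            } catch (Exception ex) {
                System.err.println("Failed to stop " + name + ": " + ex);
            }
        }
    }

    boolean isRunning(String name) {
        return running.contains(name);
    }
}
```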

Gregg Wonderly
