Agent starts, sends autodiscovery info, then just sits idle with no metrics

I've been running Hyperic HQ since the 3.2.3 release, with about 10 Linux and Windows servers. I recently upgraded from 4.0.3 to 4.1.2. All my agents worked fine in 4.0.3, but now some of them are behaving strangely. The agent installation went well on each server: I associated each agent with the server it handshaked with, and everything seemed fine. I've done this several times, so I'm confident in the installation procedure. All my agents on Red Hat/CentOS work very well; the problem agents are one on an Ubuntu box and two on Windows boxes. Before the upgrade, they worked well too.

Here is what happens: when I start the agent, it loads its plugins, sends the autodiscovery report to the server, and runs runtime autodiscovery, as shown below:
2009-06-01 04:16:17,774 INFO [Thread-1] [RuntimeAutodiscoverer] run runtime autodiscovery for Agent HQ
2009-06-01 04:16:17,787 INFO [Thread-1] [RuntimeAutodiscoverer] discovered HQ Agent took 0

Unless I do something to force output into the log, a line like this is generally the last one I see in agent.log, no matter how long I wait. If anything has changed, I see the changes appear in the auto-discovery area of the HQ dashboard. The problem is that, at this point, the agent just idles and never does anything else. It does not die; it is clearly still running. If I do something to the platform in HQ that causes the server to communicate with the agent, I can see something in the log. For example, if I define a Script service but give it a nonexistent file name, a file-not-found exception shows up in the log, so I know the server is talking to the agent.

Simply put, no metric data is collected, and there is no other activity at all in agent.log after the initial startup or other forced activity. HQ shows the platform as down, with all red indicators. What could possibly cause the agent to be brain-dead like that?

Does anyone have advice on how to start troubleshooting this? I also looked through server.log, but I have no idea where to start, and I don't see any errors there that jump out at me. What kinds of entries in server.log would be relevant to this problem?
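One possible starting point for the server.log question is to filter the log down to warning- and error-level entries that mention the agent. The Python sketch below assumes server.log uses the same layout as the agent.log lines quoted above (timestamp, level, [thread], [class], message); that layout, the level names WARN/ERROR/FATAL, the keyword "agent", and the helper name interesting_lines are all assumptions for illustration, not anything Hyperic documents. Continuation lines such as stack traces do not match the pattern, so only the first line of each multi-line entry is reported.

```python
import re
import sys
from typing import Iterable, List, Tuple

# ASSUMED layout, modeled on the agent.log lines quoted in the post:
# "2009-06-01 04:16:17,774 INFO [Thread-1] [RuntimeAutodiscoverer] message"
# The optional space after the comma tolerates the "17, 774" form as quoted.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\s?\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]*)\]\s+\[(?P<source>[^\]]*)\]\s?"
    r"(?P<msg>.*)$"
)


def interesting_lines(
    lines: Iterable[str],
    levels: Tuple[str, ...] = ("WARN", "ERROR", "FATAL"),
    keyword: str = "agent",
) -> List[Tuple[str, str, str, str]]:
    """Return (timestamp, level, source, message) for parsable lines whose
    level is in `levels` and whose source or message contains `keyword`
    (case-insensitive). Lines that do not match the assumed layout are skipped."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if not m or m.group("level") not in levels:
            continue
        text = (m.group("source") + " " + m.group("msg")).lower()
        if keyword.lower() in text:
            hits.append((m.group("ts"), m.group("level"),
                         m.group("source"), m.group("msg")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_log.py /path/to/server.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for ts, level, source, msg in interesting_lines(f):
            print(f"{ts} {level} [{source}] {msg}")
```

Running the same filter over agent.log with levels=("INFO",) would also show the timestamp of the agent's last logged activity, which can then be compared against what the server logged around that time.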

Lee

Deleting the data directory does not change much on the server side. I think removing the platform from the server and then re-instantiating the agent would do the trick. But you may not want to do that if you want to keep the metrics already collected.

I wonder if this is a new bug introduced by the upgrade. A look through JIRA turned up nothing similar.

I hope someone from the developers picks this up. I don't know of any direct way to update these tables. Perhaps with Groovy it would be possible to collect the IDs of the failing platforms' metrics and call a specific method on them, but that's a long path to follow and sounds like a rather odd thing to do.
