DBAzine.com

Chris Foot's Oracle10g Blog

Completing the Puzzle - Analyzing Agent to Management Server Communications

This is the final blog of a three-part series on troubleshooting 10G Grid Control agent to management server communication issues. I’ll start the discussion by summarizing a few key points from the previous two blogs.

I’ll also provide a few hints that will help you determine where to start the debugging process (management server or agent). I’ll complete this series by showing you how to activate detailed tracing on the agent and management server components. If you can’t identify the problem by analyzing error messages normally generated by the agents and management server, you’ll have to activate more detailed traces to gather additional diagnostic data.

Target Agent or Management Server?
In my previous two blogs, I discussed troubleshooting the 10G Enterprise Management agents and management server. My intent was to provide you with a head start up the problem determination and analysis learning curve. Now that we have an understanding of the agent and management server environments, we need to determine which component should be analyzed first. Should we start our investigation on the management server or the target agents?

The information below should help you to determine the scope of the problem and where to start your analysis:

  • If the problem is happening on all of the monitored hosts, start your problem determination on the management server and repository.
  • If the problem is happening on a single host, check the status of the agent and then continue your problem determination on the individual targets before reviewing diagnostic information on the management server.
  • If the problem is occurring on an individual target (e.g. unable to communicate with a database or listener) and not the entire agent, the problem could be a permission issue with the agent. Start your problem determination on the targets.xml file to determine the accounts and passwords being used.
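The three rules above can be written down as a small decision helper. This is only a sketch of the triage logic: the function name, its arguments and the host names are hypothetical, and treating "several but not all hosts" like the single-host case is an assumption the rules above do not spell out.

```python
def where_to_start(failing_hosts, all_hosts, whole_agent_affected=True):
    """Suggest where to begin problem determination, based on failure scope."""
    failing = set(failing_hosts)
    if not failing:
        return "nothing to investigate"
    # Every monitored host is affected: look at the central components first.
    if len(all_hosts) > 1 and failing == set(all_hosts):
        return "management server and repository"
    # Only some hosts are affected and the whole agent is misbehaving.
    if whole_agent_affected:
        return "agent status on the affected host, then its individual targets"
    # A single target (database, listener) fails while the agent itself is fine.
    return "targets.xml on the affected host (accounts and passwords in use)"


# Hypothetical host names, for illustration only.
hosts = ["hosta", "hostb", "hostc"]
print(where_to_start(hosts, hosts))
print(where_to_start(["hostb"], hosts))
print(where_to_start(["hostb"], hosts, whole_agent_affected=False))
```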

10G Enterprise Manager Information
Another way to determine the problem's scope is to review the information displayed in 10G Grid Control's agent administration and management repository services panels:

  • Management Services and Repository Overview panel. During normal processing, the Loader Backlog chart (the upper right-hand chart) will show a series of spikes. Notice that the blue line on our Loader Backlog chart shows a single spike. The spike means that a number of files were uploaded by the agent(s) and processed by the management server. If your blue line ever looks like the red line I have drawn as an example, your management server is not processing the uploaded files and you need to perform 10G Grid Control management server problem determination.

  • Management Services and Repository panel. The first block of information is the name of our management service (name removed for security reasons), the service's current status (Up, Down, Pending) and the last error that was generated. The block of information to the right shows the number of files waiting to be loaded and the directory that contains them. If the management service isn't processing files being uploaded by the agents, you'll see a high number in the "Files Pending Load" column.

  • Agent Administration panel. The agent administration screen lists all of the agents currently active in the 10G Grid Control environment. Each line displays the agent software version, status (up, down, problem), number of targets that are using the agent and the number of targets that aren't using the agent. Although I had to remove some of the information from this screen for security reasons, most of the dates show a Last Successful Load date of Sept. 12 while one shows a Last Successful Load date of August 30. That's a good indication that we are having problems with that agent. Each agent's name is a link that allows the user to view more detailed configuration information about that agent.

  • Agent Drill Down Panel. This panel provides detailed information on the agent's configuration, status, resource utilization, targets monitored and upload information. The most important piece of information on the agent administration panel is the column titled 'Last Successful Load'.
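The 'Last Successful Load' comparison can also be done mechanically once the dates are copied out of the panel. The sketch below is not a Grid Control API: the agent names are hypothetical, the dates mirror the Sept. 12 / August 30 example above (the year 2005 is assumed), and the one-day tolerance is an arbitrary choice.

```python
from datetime import date, timedelta


def stale_agents(last_loads, max_lag=timedelta(days=1)):
    """Return agents whose last successful load trails the newest one by more than max_lag."""
    if not last_loads:
        return []
    newest = max(last_loads.values())
    return sorted(
        name for name, loaded in last_loads.items() if newest - loaded > max_lag
    )


# Hypothetical agent names; dates taken from the example in the text, year assumed.
loads = {
    "agent_a": date(2005, 9, 12),
    "agent_b": date(2005, 9, 12),
    "agent_c": date(2005, 8, 30),
}
print(stale_agents(loads))  # ['agent_c']
```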

Management Server and Agent Logs
We continue our analysis by logging on to the hardware servers that are hosting the management server and agent processes. I don't want to rehash the information I provided in the agent and management server troubleshooting blogs. The instructions contained in these blogs should help you to identify the problems that are preventing agent to management server communications from occurring.

The files and directories listed below will be used during the analysis process:

Agent

  • Agent $OH/sysman/config/emd.properties - Agent configuration file.
  • Agent $OH/sysman/log/emagent.log - Agent log information.
  • Agent $OH/sysman/log/emagent.trc - Agent trace information.

Management Server

  • OMS $OH/sysman/config/emoms.properties - Management server configuration file.
  • OMS $OH/sysman/config/emomslogging.properties - Management server trace activation and configuration file.
  • OMS $OH/sysman/log/emoms.log - Management server log information.
  • OMS $OH/sysman/log/emoms.trc - Management server trace information.
  • OMS $OH/sysman/recv/errors/*.* - Directory containing error messages pertaining to agent files that could not be processed.
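A quick way to see whether the management server has been rejecting agent uploads is to list what is sitting in the errors directory. The following is a minimal sketch that assumes only the directory layout listed above; the function name is hypothetical and the OMS Oracle home must be passed in by the caller.

```python
from pathlib import Path


def pending_error_files(oms_home):
    """List the files under <oms_home>/sysman/recv/errors, oldest first."""
    errors_dir = Path(oms_home) / "sysman" / "recv" / "errors"
    if not errors_dir.is_dir():
        return []
    files = [p for p in errors_dir.iterdir() if p.is_file()]
    # Sort by modification time so the earliest failures appear first.
    return sorted(files, key=lambda p: p.stat().st_mtime)
```

Calling `pending_error_files(os.environ["OH"])` would work if the `OH` environment variable points at the OMS home; that variable name follows the `$OH` shorthand used above and may differ on your system.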

Activating Detailed Tracing
If the information in the agent and management server log and trace files doesn't give you enough to identify the problem, you may need to activate more detailed traces to gather additional diagnostic data. The 10G EM agent and management server components provide configuration files that allow administrators to activate more detailed tracing.

The possible logging levels for both the agents and management server components are:

  • ERROR - Reports only critical errors.
  • WARN - Reports critical errors and warnings.
  • INFO - Includes informational messages.
  • DEBUG - Full debug trace.
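To make the ordering concrete, here is a small sketch that treats the four levels as cumulative, so that each level also reports everything the levels above it in the list report. That cumulative reading is an assumption based on the descriptions above and on how log4j-style levels usually behave; the function name is hypothetical.

```python
# Levels ordered from least to most verbose.
LEVELS = ["ERROR", "WARN", "INFO", "DEBUG"]


def is_logged(message_level, configured_level):
    """Return True if a message at message_level is written at configured_level."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)


for level in LEVELS:
    visible = [m for m in LEVELS if is_logged(m, level)]
    print(f"{level:5} -> {', '.join(visible)}")
```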

Agent Tracing
The agent's $OH/sysman/config/emd.properties file provides parameters that control tracing and logging file sizes and rotation limits. By default, 10G Grid Control lets trace and log files grow to a maximum of 4096 KB before renaming them and creating a new current file. The LogFileMaxRolls, LogFileMaxSize, TrcFileMaxRolls and TrcFileMaxSize parameters are used to tailor the file size and number of backups for tracing and logging files.

Logging is performed in a hierarchical manner, with tracelevel.main at the top of the hierarchy. All other components inherit the logging level from the components above them in the hierarchy. The default logging level for tracelevel.main is WARN, meaning that all agent modules use this setting as their default. I have provided a subset of the emd.properties file containing the modules and their default trace settings.

In the sample I provided, tracelevel.fetchlets would be a parent in the hierarchy and tracelevel.fetchlets.os, tracelevel.fetchlets.osline, tracelevel.fetchlets.oslinetok, etc. would be children of that parent component. If we change tracelevel.fetchlets' trace setting to DEBUG, all children components would inherit that level of tracing.
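The inheritance rule can be modeled by walking up the dotted parameter name until an explicit setting is found. This sketch illustrates the behavior as described here, not the agent's actual implementation; the function name and the `tracelevel.othermodule` entry are hypothetical.

```python
def effective_level(component, settings, root="tracelevel.main", default="WARN"):
    """Resolve a component's trace level by inheriting from its nearest configured parent."""
    name = component
    while True:
        if name in settings:
            return settings[name]
        if "." not in name:
            break
        # Drop the last dotted segment to move one step up the hierarchy.
        name = name.rsplit(".", 1)[0]
    # Nothing configured on the way up: fall back to the top-level setting.
    return settings.get(root, default)


settings = {"tracelevel.main": "WARN", "tracelevel.fetchlets": "DEBUG"}
print(effective_level("tracelevel.fetchlets.os", settings))  # DEBUG
print(effective_level("tracelevel.othermodule", settings))   # WARN
```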

To activate more detailed tracing, change the component's associated trace parameter in the emd.properties file, and recycle the agent using the "emctl stop agent" and "emctl start agent" or "emctl reload agent" commands. Please note that the value supplied in the tracing parameter must be entered in uppercase letters.
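If you prefer to script the edit rather than open the file by hand, something along these lines would do it. It is a sketch that assumes emd.properties uses plain `key=value` lines; the function name is hypothetical, and the agent still has to be recycled or reloaded with the emctl commands mentioned above after the file is written.

```python
import re

VALID_LEVELS = ("ERROR", "WARN", "INFO", "DEBUG")


def set_trace_level(properties_text, key, level):
    """Return properties_text with key set to level, appending the line if it is absent."""
    if level not in VALID_LEVELS:
        # The value must be uppercase, so "debug" is rejected rather than silently fixed.
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key}={level}"
    if pattern.search(properties_text):
        return pattern.sub(lambda _match: new_line, properties_text, count=1)
    separator = "" if not properties_text or properties_text.endswith("\n") else "\n"
    return f"{properties_text}{separator}{new_line}\n"


sample = "tracelevel.main=WARN\ntracelevel.fetchlets=WARN\n"
print(set_trace_level(sample, "tracelevel.fetchlets", "DEBUG"))
```

Working on the file contents as a string keeps the function easy to test; reading and writing the real file, and taking a backup copy first, is left to the caller.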

Management Server Tracing
The steps to activate management server tracing are similar to the steps required to activate tracing for the agents. 10G Grid Control provides a tracing configuration file $OH/sysman/config/emomslogging.properties that allows administrators to activate and configure tracing for the management server and repository services.

To activate more detailed tracing on the management server, change the level in the "log4j.rootCategory=WARN, emlogAppender, emtrcAppender" parameter in the emomslogging.properties file and recycle the management server using the "emctl stop oms" and "emctl start oms" commands. Please note that once again, the value supplied in the tracing parameter must be entered in uppercase letters.
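On the management server side only the first value of the log4j.rootCategory line changes; the appender list after it must stay intact. A minimal sketch of that substitution follows. The function name is hypothetical and the appender names in the sample are placeholders, so use whatever your own emomslogging.properties contains.

```python
VALID_LEVELS = ("ERROR", "WARN", "INFO", "DEBUG")


def set_root_level(line, level):
    """Swap the level in a 'log4j.rootCategory=LEVEL, appender, ...' line, keeping the appenders."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    key, equals, value = line.partition("=")
    if not equals or key.strip() != "log4j.rootCategory":
        raise ValueError("not a log4j.rootCategory line")
    # Everything after the first comma is the appender list and is preserved as-is.
    _old_level, comma, appenders = value.partition(",")
    return f"{key.strip()}={level}{comma}{appenders}"


# Placeholder appender names, for illustration only.
print(set_root_level("log4j.rootCategory=WARN, appenderA, appenderB", "DEBUG"))
```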

Wrapup
I hope you enjoyed this mini-series on debugging agent to management server communication failures. The intention of this series was to create a foundation of knowledge that would assist you in the analysis process. Oracle's Metalink website provides a wealth of information on the 10G Enterprise Manager environment. We currently have a 100% success rate using Metalink documents to solve our agent to management server communication problems. I highly recommend that you leverage the information in Metalink early and often during problem analysis.


Monday, October 03, 2005