Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » Blogs » Chris Foot Blog » Chris Foot's Oracle10g Blog » Using 10G OEM Grid Control's Host Performance Monitoring and Tuning Features
Seeking new owner for this high-traffic DBAzine.com site.
Tap into the potential of this DBA community to expand your business! Interested? Contact us today.
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 3626
 

Using 10G OEM Grid Control's Host Performance Monitoring and Tuning Features Using 10G OEM Grid Control's Host Performance Monitoring and Tuning Features

We took a quick detour from 10G in my last blog. In this blog, we get back on topic by discussing 10G Enterprise Manager's host performance monitoring and tuning features. In the final installment of this two-part series, I'll show you how to use 10G Grid Control's drilldown capabilities to identify performance problems.

Let's continue our discussion on 10G Grid Control host performance monitoring and tuning features by reviewing what currently happens in many shops. I'll use UNIX in this example but the same events can be applied to any operating system.

A performance issue arises and the hapless DBA is notified by either:

  1. The UNIX support person who is more than happy to verbally berate the DBA and his database for dragging down his perfectly configured and administered hardware platform, or
  2. An application developer who was just called by an irate user stating that every time they run their program nothing comes back, or
  3. The irate user decides to eliminate the middle man (see number "2" above) and contacts (yells) at the DBA directly.

Wouldn't it be better if the events occurred in this manner?

A performance issue arises and the DBA is paged by a 10G Enterprise Manager CPU Utilization Alert that is automatically generated by the database. The DBA brings up 10G Grid Control's host performance home page and immediately determines that CPU utilization has suddenly spiked to 100%. The DBA scrolls down and reviews the Top Processes by CPU Report and finds that all of them are parallel processes running on database DWPROD3. Our intrepid DBA navigates to the DWPROD3 performance home page, locates the offending SQL and finds that the developer has overridden the parallel setting by using the /*+ parallel degree(38) */ hint.

The DBA then does the following:

  1. Calls the application developer to tell them that they shouldn't be overriding the parallel setting in their SQL and the statement needs to be killed and resubmitted. The DBA then makes a mental note to activate Database Resource Manager as soon as possible.
  2. Sends an E-Mail to the UNIX support person telling them that there is a performance issue occurring on one of their platforms but the cause has already been identified and the problem will be fixed shortly. The DBA finishes by stating that he doesn't need to use any of their archaic performance monitoring tools like NMON and TOP anymore because Oracle 10G provides some nifty new performance reports.
  3. Meets with business unit folks after work for drinks and dinner.

Let's take a look at the above story a little more closely (and seriously).

Alert Notifications
Oracle10G provides early warning mechanisms to alert administrators of possible error conditions or performance problems. The database itself collects numerous metrics that were previously collected by Oracle Enterprise Manager. The host metric page allows the DBA to activate thresholds and edit the threshold values for up to a dozen different alerts.

Although, there are dozens of host monitoring statistics available, they all boil down to monitoring the utilization of three primary resources: CPU, Memory and Disk. 10G Grid Control immediately begins monitoring all targets at default settings when they are added to the monitoring framework. Administrators are then able to fine-tune the threshold values for individual targets when necessary. Monitored metrics are stored and aggregated in the repository to provide trend analysis capabilities.

Here is a screen print of the manage metrics page showing the resource metrics that can be activated for a host server. The percentages shown are Oracle's default threshold values but the values can be modified to customize the metrics to a specific target's monitoring requirements. We'll learn more about creating and modifying performance alerts in upcoming blogs. It is a two-part process that is relatively simple but I don't want to make this blog any longer than it already is. The key to being a proactive DBA instead of a reactive one is to SET THESE ALERTS. As I said, I'll have a whole set of blogs on creating and editing alert notifications.

But, lets get back to the topic at hand. One way for us to categorize performance problems is by the scope of their impact. We can have enterprise-wide, host server, application, program and individual SQL statement performance issues. Enterprise-wide performance problems are easy, just blame your network guys. But host, application, program and SQL statement performance issues are often much harder to solve.

The Drilldown Process
Good tuners understand that the tuning process starts with an understanding of the problem then continues with the DBA collecting statistical information. Information collection begins at a global level and then narrows in scope until the problem is pinpointed. 10G Grid Control's performance drilldown capabilities facilitate this style of problem solving by allowing administrators to start their detective work at the host level and then providing the information necessary to allow them to quickly narrow the scope of possible causes until the performance issue is found.

If you want to begin your problem determination process at the host level, 10G Grid Control's host performance home page is the natural place to start. You will want to review the utilization of the three primary resources: CPU, Memory and Disk. Take a look at the CPU chart, you'll notice that our server is currently experiencing a high level of CPU usage. We can verify this by comparing it against the host server's NMON performance monitoring output screen.

Historical Analysis
We can then use the CPU drilldown link to take a look at CPU utilization historically. I chose to review statistics for the last 7 days. Notice that CPU utilization spikes on a regular basis. We have known for some time that these spikes occur and intend to increase the number of CPUs to alleviate the problem. We have already identified and tuned all high resource consuming SQL running on that platform but were unable to reduce their resource utilization by a noticeable amount. Until we add the additional CPUs, we intend to reduce the number of parallel processes by setting the parallel process value lower in Database Resource Manager.

So we learned that we can use 10G Grid Control's drilldown capabilities to determine what resource is being over-utilized. In addition, we can use EM's historical reporting features to determine if the problem is due to a gradual increase in the utilization of the resource in question or caused by a resource utilization spike. If it is a gradual resource increase, you will have to tune the SQL running on that server to reduce the amount of resources they consume or add more horsepower. If it is a utilization spike, you will need to determine what SQL is causing the problem to occur.

Problem Process Identification
In either case, we can use EM's process reporting chart to determine what the top resource consuming processes are. The tool allows you to order the top resource consuming processes by CPU or memory. Please note that the sample EM Process reporting chart is from another server and is not related to the CPU problem we are experiencing in the previous discussion. In addition, I had to blank out some of the information for security reasons. The information I removed allows administrators to determine the process's owner and database it is executing in.

Once we know the database the process (or processes) are executing in, we can navigate to that database's performance page and continue our performance problem determination drill-down process. 10G Grid Control has some new SQL tuning features that will help us to determine exactly what the problem is.

Next Up
In my next blog, we'll continue our discussion by reviewing 10G Enterprise Manager's database performance monitoring and tuning features.


Wednesday, May 04, 2005  |  Permalink |  Comments (1)
trackback URL:   http://www.dbazine.com/blogs/blog-cf/chrisfoot/blogentry.2005-05-03.4782910010/sbtrackback

oem on 10g

Posted by shivabn at 2005-05-12 06:01 PM
wonder full doc
 

Powered by Plone