System Triage Part II – Host Performance Analysis Using Grid Control and Host Commands

In my previous blog, we discussed the investigative process used to determine exactly what application component is causing the performance problem. This blog will cover the various Grid Control and Host Monitoring Tools we can use to further drill down into the database ecosystem (database, O/S, hardware server) to provide us with additional diagnostic information. In future blogs, we’ll continue our investigation using Grid Control’s database performance monitoring and analysis tools.

Introduction
If you haven't read my previous blog titled Using Deductive Reasoning and Communication Skills to Identify and Solve Performance Problems, I would highly suggest that you do so before continuing. The blog will provide you with recommendations that can be used to determine what architectural component is causing the problem. This blog assumes that you have determined that the performance issue is somewhere in the database ecosystem (database, operating system, hardware).

Determining the Scope of the Problem
We'll begin with the premise that the entire application is running slowly. If the problem is localized to a specific transaction, you won't need to perform all of the steps I am providing. The next blog will be more pertinent for transaction-specific performance problems. But you will be able to use a subset of the investigative activities in this blog, because the problem determination activities are very similar regardless of whether you are facing a specific transaction or an application-wide performance problem. The remainder of this blog will focus on Host Performance Analysis. We'll continue the discussion on System Triage in subsequent blogs when we use the 10G Grid Control R2 toolset to identify the specific transactions and SQL statements that are causing the performance problem.

Host Performance Analysis
We need to determine the health of the server that the database is running on. We are able to use 10G Grid Control's host performance analysis capabilities in conjunction with O/S commands to determine the current system load. In this blog we'll be using UNIX as an example. Although the commands may be a little different in LINUX, the output we will be evaluating will be very close. I'll cover Windows system performance in an upcoming blog.

Let's begin by activating 10G Grid Control R2 and navigating to the Host's Home page. We accomplish that by selecting the 'Targets' tab at the top of 10G Grid Control R2's Home Page. 10G Grid Control R2 displays the Hosts Home Page.

10G Grid Control R2 will provide you with a listing of all hosts that it is monitoring. If you would like to learn how new database ecosystems are added into Grid Control, please refer to one of my previous blogs. Although the blog doesn't provide detailed installation guidelines, it does provide a high level overview of the Grid Control Management System.

Your next step will be to select the host that runs the database having the performance problem. Once the host's home page is displayed, you will want to click on the performance tab at the top of the screen to display the Host Performance Home Page.

You will want to review the utilization of the three primary resources: CPU, Memory and Disk. Take a look at the CPU chart, you'll notice that our server was experiencing a high level of CPU usage a short time ago. Here's another performance home page showing some definite CPU utilization problems.

Each of the primary resources displayed on the page provides drill-down capabilities that allow you to further investigate the problem. The drill-down panels allow you to review past performance of CPU, memory and disk.

For more information on Using 10G Grid Control to evaluate Host Performance, please refer to the following blogs: Using 10G OEM Grid Control's Host Performance Monitoring and Tuning Features and Host Performance Monitoring Using 10G Enterprise Manager Grid Control. The information they provide will be pertinent to both 10G Grid Control R1 and R2 users.

Operating System Monitoring Tools
There is a host of UNIX performance monitoring tools that display performance information. Two of my favorites are NMON and TOP. Here's an NMON display verifying that we are indeed experiencing CPU problems.

NMON
Take a look at the CPU utilization display at the top of the NMON page. The letter 'U' designates the CPU being consumed by user processes, while the letter 'S' designates the CPU consumed by the system. If you see a high number of S characters on the display and very few U characters, it is time to contact your friendly system administrators and ask them why their system is consuming such a high level of CPU. If you see a lot of U characters and Oracle processes are the top resource consumers, your database is probably the culprit.
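The U versus S reasoning can be written down as a small Python helper. This is only a sketch: the function name and the 80 percent cut-off are my own assumptions for illustration, not something NMON defines, so adjust them to suit your environment.

```python
def classify_cpu(user_pct, system_pct, busy_threshold=80.0):
    """Give a rough verdict from user and system CPU percentages.

    busy_threshold is an assumed cut-off, not an NMON setting.
    """
    total = user_pct + system_pct
    if total < busy_threshold:
        return "cpu not saturated"
    if system_pct > user_pct:
        # Mostly 'S' characters on the NMON display.
        return "mostly system time: talk to the system administrators"
    # Mostly 'U' characters on the NMON display.
    return "mostly user time: check the top processes for Oracle"


print(classify_cpu(user_pct=85.0, system_pct=10.0))
```

Feed it the user and system percentages you read from the display. The verdict is only a starting point for the conversation, not a diagnosis.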

For more information on NMON (including how to show the top processes), please refer to the IBM NMON home page. IBM's NMON is a free download that is available for most UNIX and LINUX systems. I personally prefer it over the Top command.

Top
Let's take a look at a Top screenshot. The important indicators are CPU States, Memory Utilization and Top processes. Like NMON, Top also provides settings that display disk performance information. On both NMON and Top, you can use the Process IDs to confirm the information displayed on the 10G Grid Control R2 Host Performance Home Page.
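If you want the top CPU consumers and their Process IDs in a form you can compare against Grid Control, one option is to parse the output of ps. The Python sketch below is built on some assumptions: the sample text is made up for illustration, the helper names are hypothetical, and the ps -eo pid,pcpu,comm form works on Linux and many UNIX flavors but may need adjusting on yours.

```python
import subprocess

# Hypothetical sample of "ps -eo pid,pcpu,comm" output, for illustration only.
SAMPLE = """\
  PID %CPU COMMAND
    1  0.0 init
 4321 62.5 oracle
  812  3.0 sshd
 4400 20.1 oracle
"""


def parse_ps(text):
    """Turn ps output into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) == 3:
            pid, pcpu, comm = parts
            rows.append((int(pid), float(pcpu), comm.strip()))
    return rows


def top_consumers(rows, count=3):
    """Return the processes using the most CPU, highest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:count]


def read_ps():
    """Run ps on the local host (flags are an assumption; they vary by platform)."""
    result = subprocess.run(["ps", "-eo", "pid,pcpu,comm"],
                            capture_output=True, text=True, check=True)
    return result.stdout


for pid, pcpu, comm in top_consumers(parse_ps(SAMPLE)):
    print(pid, pcpu, comm)
```

On a live server, pass read_ps() to parse_ps in place of SAMPLE.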

Jonathan Lewis provides some helpful hints on the Top command in Appendix B of his book titled Practical Oracle 8i - Building Efficient Databases. Here's an addendum to the appendix that describes the output of the Top command that you may find useful. Although the book is on Oracle8i, the Top information is still pertinent.

Here's some additional information on the Top command that will help you understand Top. It is a little hard to read but the information it provides will be very useful. Lastly, if your system administrators have configured the MAN command, you can use the operating system manuals to retrieve information about most of the system commands we are discussing. If they don't have MAN configured, the first question you need to ask them is "why not?"

VMSTAT
The VMSTAT command is also available on many flavors of UNIX and Linux. VMSTAT provides you with much of the same information that TOP and NMON do, but it also provides you with disk performance and system queueing information. Instead of me regurgitating information on VMSTAT, here is an excellent description of the VMSTAT utility. The author also shows you how to interpret VMSTAT output. Although the author's discussion pertains to LINUX, like other commands, the majority of information he provides will also pertain to the various UNIX flavors.

Two of the columns I review frequently when I am performing host performance analysis are the "r" and "b" columns in the output. The numbers under the "r" column in VMSTAT designate how many processes are runnable, that is, queued up and waiting for CPU (on Linux the count also includes the processes that are currently running). The higher the number, the bigger the problem. A value that stays above the number of CPUs in the server is a good sign that the CPUs are saturated. The numbers under the "b" column designate how many processes are blocked, usually because they are waiting for an I/O to complete.
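Here's a minimal Python sketch of how the run queue and blocked columns could be pulled out of VMSTAT output and compared to the number of CPUs. The sample text mimics the Linux layout and is made up for illustration, the function names are hypothetical, and the "more runnable processes than CPUs" rule is a rule of thumb rather than a hard limit.

```python
# Hypothetical vmstat output in the Linux layout, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 9  2      0 812344 102400 2048000    0    0    12    40  900 1500 88 10  1  1  0
 7  0      0 810000 102400 2048000    0    0     8    32  850 1400 85 12  2  1  0
 2  0      0 815000 102400 2048000    0    0     4    16  300  500 20  5 74  1  0
"""


def parse_vmstat(text):
    """Return one dict per sample line, keyed by the column names."""
    header = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["r", "b"]:
            header = fields  # vmstat repeats this line periodically
        elif (header and len(fields) == len(header)
              and all(field.isdigit() for field in fields)):
            samples.append(dict(zip(header, map(int, fields))))
    return samples


def overloaded(samples, cpu_count):
    """Keep the samples whose run queue exceeds the number of CPUs."""
    return [sample for sample in samples if sample["r"] > cpu_count]


samples = parse_vmstat(SAMPLE)
for sample in overloaded(samples, cpu_count=4):
    print("run queue", sample["r"], "blocked", sample["b"])
```

One busy sample proves very little. It is a run of samples over the CPU count that deserves your attention.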

IOSTAT
Linux and many of the UNIX variations provide the IOSTAT command to allow users to analyze disk performance. IOSTAT displays kernel I/O statistics on terminal, disk and CPU operations. By default, IOSTAT displays one line of statistics averaged over the machine's run time. On BSD-derived systems, the -w option sets a wait period and -c sets how many successive lines, each averaged over that wait period, are displayed. On Linux, the interval and count are given as plain arguments instead, and -c selects the CPU utilization report. Here's an excellent article that shows you how to use IOSTAT and interpret its output.
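To give you an idea of how IOSTAT output can be put to work, here's a Python sketch that reads a device report and picks out the busiest disk. The sample text imitates an older Linux layout, and the host name, device names and numbers are all invented. The column names differ between platforms and versions, so the code reads them from the header line instead of assuming fixed positions.

```python
# Hypothetical "iostat -d" output in an older Linux layout, for illustration only.
SAMPLE = """\
Linux 2.6.9 (dbhost01) 	11/06/2006

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              45.20       812.40       410.10    9120000    4600000
sdb             310.75      9200.00      6100.50  103000000   68000000
"""


def parse_iostat(text):
    """Return {device: {column: value}}; a later report overwrites an earlier one."""
    header = None
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            header = fields[1:]
        elif header and len(fields) == len(header) + 1:
            try:
                values = [float(field) for field in fields[1:]]
            except ValueError:
                continue  # not a device line
            devices[fields[0]] = dict(zip(header, values))
    return devices


def busiest_device(devices, column="tps"):
    """Name the device with the highest value in the chosen column."""
    return max(devices, key=lambda name: devices[name][column])


devices = parse_iostat(SAMPLE)
print(busiest_device(devices), devices[busiest_device(devices)]["tps"])
```

Keep in mind that the first report IOSTAT prints is the average since boot. For current activity, look at the later reports.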

SAR
SAR is another command that can be used to evaluate host performance. One of the benefits that SAR provides is that it allows you to run it via CRON on a regular basis and spool the performance data to a SAR output file. Users are then able to use the SAR command to retrieve historical performance statistics. Once again, here's an article on how to use SAR to create historical performance reports. The article also provides information on how to interpret its output.
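Once you have historical SAR data, a short script can point you at the intervals worth a closer look. The Python sketch below reads a CPU report and lists the intervals where idle time dropped below a cut-off. The sample text, the host name and the 10 percent cut-off are my own assumptions for illustration, and the report layout varies by platform, so the columns are located from the header line.

```python
# Hypothetical "sar -u" output in a Linux layout, for illustration only.
SAMPLE = """\
Linux 2.6.9 (dbhost01) 	11/06/2006

12:00:01 AM       CPU     %user     %nice   %system   %iowait     %idle
12:10:01 AM       all     12.40      0.00      3.10      1.20     83.30
12:20:01 AM       all     78.90      0.00     15.60      2.50      3.00
12:30:01 AM       all     91.20      0.00      6.30      1.10      1.40
Average:          all     60.83      0.00      8.33      1.60     29.23
"""


def parse_sar_cpu(text):
    """Return one dict per interval with the CPU columns plus a 'time' key."""
    cpu_index = None
    names = []
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("Average"):
            continue
        if "CPU" in fields:
            # Header line: the metric names follow the CPU column.
            cpu_index = fields.index("CPU")
            names = fields[cpu_index + 1:]
        elif (cpu_index is not None and len(fields) > cpu_index
              and fields[cpu_index] == "all"):
            row = dict(zip(names, (float(v) for v in fields[cpu_index + 1:])))
            row["time"] = " ".join(fields[:cpu_index])
            rows.append(row)
    return rows


def busy_intervals(rows, idle_below=10.0):
    """List the timestamps where idle CPU fell below the assumed cut-off."""
    return [row["time"] for row in rows if row["%idle"] < idle_below]


print(busy_intervals(parse_sar_cpu(SAMPLE)))
```

Those timestamps give you something concrete to line up against the Grid Control history and against what the users report.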

Using 10G Grid Control in Conjunction with UNIX Monitoring Commands
10G Grid Control's Host Performance Home Page provides you with a quick snapshot of the host's key performance indicators (CPU, Memory and Disk). Oracle's intent was to provide you with this information so that you would not be forced to log on to the operating system to evaluate CPU, memory and disk performance indicators. Ninety percent of the time, I will use the key performance indicators provided by Grid Control. They provide me with just the right amount of diagnostic information I need to continue my investigation.

There are times when the system is so locked up that 10G Grid Control will "act up" when you attempt to access the host system's performance panels. That's a good indication that you are definitely experiencing some server resource problems.

In that case, your only choice is to fall back on the tried-and-true O/S commands that provide server performance information. I will also use O/S commands when I want to retrieve more detailed information than 10G Grid Control is able to provide.

Not all of the performance monitoring commands I have provided will be available on all flavors of UNIX and LINUX. Even if they are available, some of the commands must be specifically installed. You need to determine what tools are available for your platform, have your O/S admins install the ones that are available but not yet installed, and use the information in the links I have provided to understand them.
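A quick way to find out which of these tools are already installed is to look for them on your PATH. Here's a small Python sketch that does so. The list of command names is an assumption based on the tools covered in this blog, and the names on your platform may differ.

```python
import shutil

# Command names as discussed in this blog; adjust for your platform.
TOOLS = ["nmon", "top", "vmstat", "iostat", "sar"]


def find_tools(names, which=shutil.which):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


for name, path in find_tools(TOOLS).items():
    print(f"{name:8} {path or 'not found on PATH'}")
```

A command reported as not found may still be installed in a directory that isn't on your PATH, so check with your O/S admins before asking for an install.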

In my next blog, we'll continue our performance analysis using 10G Grid Control's database performance analysis features.

Thanks for Reading,

Chris Foot
Oracle Ace

Monday, November 06, 2006
