Access Path Scientific Analysis Part I

If you want to become an access path guru, you’ll need to spend some time learning how optimization parameters, statistics and hints affect SQL access paths and statement performance. This blog will provide you with a few hints and tips to help you begin your scientific analysis of the Oracle optimization process. In upcoming blogs, I’ll provide you with some sample test cases.
Introduction
The objective we are trying to accomplish is to identify the effects that startup parameters, statistics and hints have on access paths, and ultimately, SQL performance. There really is no substitute for spending time "in the seat" learning how different environmental settings influence the optimization process. You also need to spend some time changing a statement's access path and noting the impact that the different access path has on performance.

I have found that most good tuners share a common set of traits. They are inquisitive by nature, understand that there is no substitute for experience and dedicate a lot of time to the scientific analysis of SQL statement and database performance.

One of my favorite tuning gurus is Jonathan Lewis. Jonathan used terms like "scientific analysis" in his DBAZine podcast. It's really a very appropriate description of his activities. After reading many of his works, I would describe him as a database performance scientist. He identifies something he wants to learn more about, creates a test environment, executes a series of tests, analyzes the data and provides documented results to reinforce his theories.

That's what we all MUST DO to expand our knowledge of the Oracle optimization process. We need to become database performance scientists. We do that by creating a test environment, running and documenting performance baselines, changing the environment to influence the statement's access path and documenting the results.

We'll begin this series by learning how to select a test environment and document it. In upcoming blogs, we'll discuss the tools we will use to measure and compare our results, select a set of SQL statements to use for our testing, create our test cases, execute the tests and document the results.

Setting up a Test Environment
Running test cases to identify the effects that different database environmental settings have on a statement's access path is not as monumental as it may seem.

Most production databases have a test counterpart. Select an environment that is actively used by your application developers. Don't worry: if one of our tests causes a statement to "run longer than anticipated", we can use the trusty ALTER SYSTEM KILL SESSION command to kill our session. We can also run our workloads during lull times. Lastly, the majority of changes we will make to influence the statement's access path will be made in our own session. We won't be making changes that affect the entire database environment.
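
As a quick sketch of that safety net (assuming we have already pulled the runaway session's SID and SERIAL# from v$session; the &app_user substitution variable is just a placeholder), killing the session looks like this:

    -- identify the runaway session (filter as appropriate for your environment)
    select sid, serial#, username, status
    from v$session
    where username = '&app_user';

    -- substitute the SID and SERIAL# returned above
    alter system kill session '&sid,&serial#';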

I prefer to use an active test environment because it allows me to easily select SQL statements to use as test cases, the data is usually more representative of what will be found in production, and the developers will most likely have a firm understanding of the active workload being executed.

You need to talk to the developers who are responsible for running workloads on that test environment to ensure that the changes being made to the test data don't skew your results from one test execution to the next. You can't run a test, have a load insert another 100 thousand rows into the table, run another test and expect to have a good comparison. You want your test bed to be active but not so active that it complicates your testing process or causes your tests to generate incorrect results. You'll need to use common sense when selecting the test environment as well as determining the best time to run your test cases.

We also want to choose a test environment that is providing adequate performance for the workloads being executed upon it. We really don't want to use an environment that isn't performing well to begin with.

There are myriad options available to you. If you don't want to impact any of your test environments, create a test environment of your own. Clone one of your test databases to a sandbox environment if you can.

You'll also want to make sure that statistics are up to date on the database you will be using as a test bed. If you are running 10g databases, the database will run statistics jobs for you automatically (isn't 10g great?). If you don't have statistics gathered automatically, it will be up to you to analyze the data objects to ensure that the statistics optimally represent what is stored in the data structures.
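
If you do need to refresh statistics yourself, a minimal sketch using DBMS_STATS might look like the following (the schema and table names are placeholders, not part of any particular environment):

    begin
      dbms_stats.gather_table_stats(
        ownname => 'APP_OWNER',
        tabname => 'ORDERS',
        cascade => true);   -- gather statistics on the table's indexes as well
    end;
    /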

Documenting the Parameters That Affect the Optimization Process
The next step is to document the environment. There are a couple of dozen parameters that affect optimization and SQL statement performance. To begin, we are going to choose the basic parameters that are easy to change and have the biggest impact on optimization. These are not all of the parameters that can influence the optimization process, just the ones that are easy to change and provide the best chance of successfully achieving an access path change.

It is important that we first read the documentation for these parameters for the specific Oracle release we are using as our test environment. We know that each Oracle release may contain enhancements to these parameters that change the effect they have on the optimization process and how we alter them to different values.

We'll want to document the following parameters to begin our scientific analysis:

  • cursor_sharing - For our initial tests, we'll hardcode values in our selection criteria to ensure that our statements aren't affected by cursor sharing. To learn more about cursor sharing and the impact it has on statements using bind variables, please turn to my blog titled "System Triage IV - Access Path Identification Part I".

  • db_file_multiblock_read_count - The number of blocks that Oracle will read in one I/O when performing a sequential scan of data (i.e., a full table scan). We also need to note that if we are using Oracle 10.2 as our test environment, the database itself may adjust this parameter dynamically if it identifies that the buffer cache is being flooded by blocks retrieved using table scans.

  • optimizer_features_enable - This parameter allows you to make the optimizer behave as it would for a specific release of the Oracle database. You set the value to a release identifier (the listing of the optimizer releases that you can set this parameter to is provided in the Reference Manual) and the optimizer will act as it would if the database were at that release.

  • optimizer_index_caching - Administrators will often set this parameter in conjunction with optimizer_index_cost_adj to influence the optimizer to use more indexes and Nested Loop joins. Setting this parameter makes Nested Loop joins look less expensive when compared to Hash or Sort-Merge joins.

  • optimizer_index_cost_adj - Allows the administrator to make index access more, or less, costly to the optimizer. The default value of 100 tells the optimizer to evaluate the index using the regular cost. The lower you set this value, the less costly the indexes will look to the optimizer. Conversely, the higher you set this parameter, the more expensive the indexes will look.

  • optimizer_mode - Sets the approach the optimizer will use when analyzing queries. Since there have been a few changes made between Oracle 9i and Oracle10g, I'll provide information on both sets of settings.

    Oracle 9i provides the following settings:

  • Rule - The optimizer will use the rule based approach during analysis. The rule based optimizer does not use data statistics as input when generating the access path. Each access path is assigned a numerical ranking. The rule based optimizer chooses the access path that has the most favorable numerical ranking. The rule based optimizer has been superseded by the cost based approach. There are a few cases where I have seen the rule based optimizer choose a better access path than the cost based method - but not many.

  • Choose - If optimizer_mode is set to CHOOSE, the optimizer is able to switch between rule and cost-based optimizations. When optimizer_mode is set to CHOOSE, the optimizer uses the all_rows cost-based approach for a SQL statement if there are statistics in the dictionary for at least one table accessed in the statement. If you generate statistics on one table, every query that accesses that table will use the cost-based optimizer. What happens if other tables in the query do not have statistics collected? The optimizer will make an educated guess on the statistics for those tables. The problem is that Oracle isn't always a good statistics guesser and the end result is a "less than optimal" access path. To learn more about the effect that statistics have on cost based optimization, please refer to my blog titled "System Triage IV - Access Path Identification Part I".

  • first_rows - Influences the optimizer to choose an access path that minimizes response time. Most often used for online transaction processing systems that return small result sets. The optimizer favors Nested Loop joins and index access.

  • all_rows - Influences the optimizer to choose an access path that minimizes total execution time. Most often used for decision support and data warehouse environments. The optimizer tends to favor full table scans, Hash and Merge-Scan joins.

    Oracle10g settings for optimizer_mode:

  • first_rows - The same influence on the optimizer as it did in Oracle9i.

  • first_rows_n - Where "n" = 1, 10, 100 or 1000. Influences the optimizer to optimize queries to provide the fastest response when returning the first "n" rows. Acts as a throttle, allowing you to balance optimizing for initial response time against total throughput.

  • all_rows - The same influence on the optimizer as it did in Oracle9i.

We will use the ALTER SESSION SQL statement to alter these parameters during our scientific analysis of the effects they have on the Oracle optimization process.
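
Here is a minimal sketch of both steps: documenting the current settings and then overriding a couple of them at the session level. The values in the ALTER SESSION statements are examples only, not recommendations:

    -- document the current settings for the parameters listed above
    select name, value, isdefault
    from v$parameter
    where name in ('cursor_sharing', 'db_file_multiblock_read_count',
                   'optimizer_features_enable', 'optimizer_index_caching',
                   'optimizer_index_cost_adj', 'optimizer_mode');

    -- override parameters for the current session only (example values)
    alter session set optimizer_index_cost_adj = 25;
    alter session set optimizer_mode = first_rows_10;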

Documenting Our Test Tables
After we document some of the parameters that affect optimization, let's turn our attention to documenting the tables we will be accessing. In my next blog, I'll provide you with a few hints and tips on how to select or create SQL statements to use in your test cases, but for now let's continue our discussion on documentation.

The following items will provide you with a good base of information on the data objects our statements will be accessing. Since we are just beginning our scientific analysis, we'll use basic storage objects (tables and b-tree indexes). We'll discuss some of the more complex objects (bitmap indexes, partitioning, etc.) in later blogs.

  • Row counts for all tables that are accessed by our test queries. Can be found in the num_rows column in dba_tables if we have statistics generated for our tables.

  • Number of blocks the table is using. Can be found in the blocks column in dba_tables if we have statistics generated for our tables (a sample query that pulls both the row and block counts appears after this list).

  • Index listing for all indexes on our tables. The query below will provide you with a listing of indexes for a given table:

    select b.index_name, b.column_name, a.uniqueness, b.column_position
    from sys.dba_indexes a, sys.dba_ind_columns b
    where a.owner = b.index_owner
    and a.index_name = b.index_name
    and a.table_owner = '&table_owner'
    and a.table_name = '&table_name'
    order by b.index_name, b.column_position
    /

  • Once we find all of the indexes and the columns in those indexes, let's check them for both selectivity and data skew. Selectivity refers to the number of distinct values in a column: the more distinct values, the more selective the column. Skew means that some values occur only a few times in a column while other values occur many, many times. Since data skew will affect optimization, that information will also be important to us.

    We can easily find the selectivity for a single or multi-column index by accessing the distinct_keys column in the dba_indexes view if we have statistics generated. For multi-column indexes, we will want to check the individual selectivity of each column in the multi-column index. We can do this with the following query:

    select count (distinct index_colname) from owner.table_name;

    Where index_colname is one of the columns in our multi-column index, owner is the table owner and table_name is the name of our table. We'll need to do this for all columns in our multi-column indexes.

    We'll use the following queries to identify data skew and determine if any histograms are on our tables.

    select index_colname, count(*) from owner.table_name
    group by index_colname;

    Where index_colname is the column name, owner is the table owner and table_name is the name of our table. We'll need to do this for all columns in our indexes. We'll do this for both single column and multi-column indexes.

    select * from dba_tab_histograms where owner='&owner' and table_name='&table_name';

    Where owner is the table owner and table_name is the name of our table. We'll need to do this for all tables that our queries will be accessing.
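
To round out the documentation, here is a minimal sketch of the queries referenced above that pull the row and block counts from dba_tables and the index selectivity (distinct_keys) from dba_indexes; the substitution variables are just placeholders:

    -- row and block counts for one of our test tables
    select owner, table_name, num_rows, blocks, last_analyzed
    from sys.dba_tables
    where owner = '&table_owner'
    and table_name = '&table_name'
    /

    -- index selectivity: distinct_keys compared to the table's num_rows
    select index_name, distinct_keys, num_rows
    from sys.dba_indexes
    where table_owner = '&table_owner'
    and table_name = '&table_name'
    /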

Coming Up Next
We'll continue our discussion in the next blog when we review the hints we will be using to influence the optimizer's choice of access paths. In addition, we'll discuss the various tools we will use to measure and compare our results.

Thanks for Reading,

Chris Foot
Oracle Ace


Thursday, February 15, 2007