Access Path Scientific Analysis
Introduction
This information is intended to help you begin your quest to become a database
performance scientist. There's a difference between tuning and learning how
to tune. Tuning is what happens when you get that call at 3:00 on a Friday afternoon.
You have a concerned customer stating that something is running longer (usually
way longer) than they would like. Your first response is to consider your choice
of career paths, dreaming how nice it would be to sell wood carvings at a roadside
stand in Montana. You then perform your diagnostic analysis, come to some sort
of conclusion and implement your solution.
Learning how to tune requires that you spend time with the optimizer experimenting with different environmental settings and database objects that affect access paths. It also requires that you have a general understanding of the basics of optimization.
I dedicated an entire blog on educational resources that will help you build a firm foundation of Oracle optimization knowledge. It's critical that we understand as much as we can about optimization before we begin our experimentation. But the key to success is to spend dedicated time experimenting. I'm not professing that you spend the rest of your career becoming "at one" with the optimizer. But if you don't want to be at a total loss each time something runs long, you need to experiment to learn how to tune.
Environmental
Documentation and SQL Statement Selection
Before we begin, we'll need to document some of the key parameters that affect
optimization and gather some information on the data objects we will be using
in our experimentation. Please refer to my blog titled "Access
Path Scientific Analysis Part I" for more information on the importance
of documentation as well as the documentation process itself. In a previous
blog, I provided a few hints and tips on identifying what types of statements
and access paths would be most beneficial to analyze.
Access Path
Identification Tools
Here are the tools that I most often use to review Oracle access paths:
- SQL Scratchpad
- I like to see graphical representations of access paths. Even though I have
lots of experience reading SQL Trace output and the multitude of variations
of access path displays, I like seeing a graphical representation. The problem
is that Scratchpad must be installed on your client (comes embedded with 9I
OEM) and it doesn't like accessing 10G databases. If I am tuning a 9I database
and I have access to SQL Scratchpad, that is the tool I am going to use to
display access path information. For more information on SQL Scratchpad and
other 9I OEM tools, please turn to my blog titled "Access
Path Identification Part V".
- SQL*PLUS Autotrace
- If I want to quickly display the statement's access path and associated
run-time performance statistics, I will use SQL*PLUS Autotrace. Here's a
blog containing a few hints and tips on Autotrace.
- Explain Plan
- If I don't want to run the query, and I am not concerned about the access
path changing during SQL statement execution (read next bullet), I will used
the tried and true "explain plan into" clause to ask Oracle to dump
the access path information into a table. I'll run a statement to retrieve
the data from the plan table and format the output. Please read my blogs titled
"Access
Path Identification Part II" and "Access
Path Identification Part III" for more information.
- SQL Trace -
If I want to be absolutely sure that the access path the other tools are predicting
the statement will take matches what access path is chosen during runtime,
I'll run SQL Trace. It is one of the only tools you can use to determine the
path chosen during execution. How can the access path change during execution?
Read this
blog to find out! The blog contains an example of a production problem
that was caused by an access path changing during runtime. For lots of information
on activating SQL Trace, please turn to my blog titled "Access
Path Identification Part IV". SQL Trace is the tool I will turn to
if I want to analyze a set of statements that belong to a specific online
transaction or batch job. The output quickly tells me what the top resource
consuming statements are.
- V$SQLPLAN -
V$SQLPLAN contains access path information the SQL statement takes during
execution. I'll run the statement, review the access path taken during execution,
make my tuning change, alter the text of the statement look different to the
optimizer, retrieve the access path information again and compare the before
and after results. My blog titled "Access
Path Identification Part II" provides all of the information you
need when you are using the V$SQLPLAN table to retrieve information on access
paths taken during execution.
- Regardless of which tool I select, I will use the output to graphically display the statement's access path. Although it is time consuming, graphically displaying a statement's access path allows me to more clearly understand the access path and simplifies the tuning process. This blog will show you how to graphically display a statement's access path.
Another Graphical
Display I find Useful
During my career as an Oracle instructor, I was often asked in the SQL tuning
classes how I personally documented a query I was attempting to tune. This is
the
display that I showed them. Let's take a look at the contents, starting
from the top of the page:
- SQL statement
text - the full text of the SQL statement.
- Columns in the
SELECT clause - a listing of the columns that are in the SELECT clause are
displayed under the table they belong to.
- Columns in the
WHERE clause - a listing of the columns used in the WHERE clause are displayed
under the table they belong to. If the columns are tables used in a join operation,
an arrow is used to designate which of the other table's columns they are
joined to. If I am interested in cardinality and data skew, I would document
that information directly below the column name. For more information on determining
column cardinality and data skew, please turn to this
blog.
- Indexed columns - The bottom of the page contains all of the indexes that are on columns in the WHERE and SELECT clauses. This allows me to easily determine if I need to create additional indexes are begin analyzing why an index isn't being used.
Performing the
Scientific Analysis
Before we begin, we need to understand the effect that preloading the buffers
has on SQL statement performance. If you run the query and return the results,
you will be loading the buffer cache with data. As a result, subsequent runs
of the same statement could be faster. This is because your statement is retrieving
data from memory while the initial runs were forced to retrieve the data from
disk. You'll need to run the statement twice, make your change and run the statement
twice again to ensure that buffers don't impact your timings. You can then compare
the output of the second execution of each test. 10G provides a SQL statement
to flush the buffer cache.
If you are building indexes during your testing, don't forget to generate statistics on the new objects. You don't want the lack of object information to affect the optimization process.
In the discussion that follows, I'll be providing you with some examples of changes to make during your scientific analysis. You will run the SQL statement, record the execution time and statistics and document the access path. You will then make the suggested alteration, review the access path to see if it has changed and run the statement again to record the new performance statistics and execution time. Please note that in the examples below, I'm not going to remind you each time to run the query to evaluate the performance statistics and execution times.
Index Testing
Let's use the same sample query I used in my graphical display in this discussion:
select a.employee_id, a.last_name, b.department_id, b.department_name,
c.street_address, c.postal_code, c.city, c.state_province
from hr.employees a, hr.departments b, hr.locations c
where a.department_id=b.department_id
and b.location_id=c.location_id
and a.employee_id = 174
order by a.last_name;
There is a myriad of different changes we can make. Here's a couple of examples to get you started:
- Drop all of
the indexes on the tables and review the access path. We know it will use
table scans, but what join method did it choose, how about the join order?
- Build an index
on one of the join columns (i.e. dept.department_id) and review the access
path. Determine if it chose the index, what join method was used and the order
in which the tables were joined.
- Drop that index
and build an index on the other column in the join. Identify if it used the
index, join method and join order.
- Build an index
on employee_id, department_id and last_name. Identify if it used the index,
join method and join order. Did you get an index only access path?
- Drop the index
you just created and change the order putting employee_id between department_id
and last_name. What was the access path?
- Change the
WHERE clause to look for rows based on the last_name and first_name columns.
Remove the a.employee_id = 174 clause. Build an index on the first_name and
last_name columns. What happened?
- Drop the index
built on the first_name and last_name columns and build two indexes, one on
first_name and the other on last_name. Did the optimizer use both indexes?
Which one did it choose?
- Add local predicates
to the other tables and identify the access path changes.
- Identify a
column in one of the tables that has low cardinality. Change the WHERE clause
to access that low cardinality column. If you don't have a low cardinality
column available, its pretty simple to add a column and update it with a few
repeating values to create a low cardinality column. Build a bitmap index
on the low cardinality column. Did the optimizer choose the bitmap index?
What happens if you use a hint?
- Build a bitmap
index on a column that has high cardinality (i.e. employee_id). If it doesn't
use the bitmap index, use a hint to influence the optimizer to choose it.
Drop the bitmap index and build a B-tree index. Does the optimizer use that
index without the hint? Compare before and after execution performance statistics
and timing.
- Identify columns that have skewed data. Build an index on the skewed column. Change the query to search for a value that occurs many times. Compare that execution to a search on a value that appears few times. Analyze the table and build a histogram. Re-execute the tests and compare the results.
Parameters that
Affect Optimization
We also need to learn how different parameter settings affect optimization.
We can start by adjusting a few of the basic parameters that have the best chance
of changing the access path for a given statement. Most of the parameters can
be changed in your session by using the ALTER SESSION statement. Using the ALTER
SESSION statement will allow you to change the parameter for your session only.
You won't have to worry about affecting other folks executing statements in
the same database that you are using for your testing.
Here's a blog that provides a listing of parameters we can use during our initial testing. Please note that this is not an all inclusive list. It is a listing of parameters that I feel have the greatest chance to influence the optimizer to choose a different access path.
You need to execute the statement, change a parameter and note its affects on the statement's access path, performance statistics and runtimes. If you are going to test changing values for different parameters, don't forget to change the one you just tested back to its original setting. That way you won't have multiple changes impacting your test results. You can also test combinations of changes. Just be aware of the changes you are making.
Hints
Administrators embed hints in a SQL statement to influence the optimizer to
choose a particular access path. Using hints will allow us to evaluate the affect
that different access paths have on SQL statement performance. We will run the
statement without any modification, review the access path and performance statistics,
use a hint to (hopefully) change the access path, run the statement again and
compare the before and after results. Since hints can be embedded in virtually
any SQL statement, they will provide us with an easy mechanism to learn more
about access paths. I have devoted a
previous blog to hints. The blog also contains a demo document that will
show you how to embed hints and the affects the hints have on access paths.
One of the key tests is to determine the impact that operation cardinality has
on SQL statement performance. You can test this by using the ORDERED hint to
ask the optimizer to change the join order for statements that join more than
two tables together.
Conclusion
The tests above will hopefully stimulate your creativity and allow you to generate
your own test cases. As we become more experienced we'll need to review subselects,
views, complex selectivity issues (and the list goes on and on
) Oracle
performance tuning is truly a wonderfully complex subject. The important thing
is that we start the process. That's the only way we'll become database performance
scientists!
Thanks for Reading,
Chris Foot
Oracle Ace
FYI.
Replies to this comment