Query Tuning Using DBMS_STATS
From a paper previously presented by presented by Dave Ensor at Oracle OpenWorld 2001, San Francisco, December 2001.
Introduction
Increasingly enterprises are purchasing their mission-critical applications, whether these use Oracle or other data management software. Typically the licensing and support agreements for such applications seek to prevent the customer from reverse engineering or modifying the application in any way. Although such restrictions may be extremely sensible from a supplier's point of view, they can prevent an individual site from making changes that would result in valuable performance benefits. This paper describes the work performed to overcome a specific performance problem in a purchased application without having to resort to the obvious (but impossible) solution of modifying the application code.
Test Environment
The experiments conducted during the preparation of this paper were performed on the author's laptop, a Compaq Armada M700 with 448 Mb of memory and a single 20 Gb hard disk partitioned into a FAT device for the operating system and an NTFS device for both the Oracle installation and the database files. The single processor is a Pentium III with a reputed clock speed of 833MHz; it certainly executed queries from the Oracle9i cache at impressive speed.
The machine was running Microsoft Windows/2000 Professional with Service Pack 2, and Oracle9i Enterprise Edition release 9.0.1.0.1 with Oracle's pre-packaged "general purpose" database, although this was customized in a number of ways that did not affect the issues discussed in this paper.
Background
BMC Software, Inc. uses Oracle as the dataserver for the majority of its administrative applications, and in most cases the applications themselves are proprietary and run under license. These applications are supported by their vendors but their administration is performed by BMC's internal IS department, which includes a specialist team of Oracle DBAs. As part of their role, this team monitors the resource consumption of the applications and the SQL statements being run on the data servers, and in the summer of 2001 one of the team identified that a very frequently executed statement form in one application was using excessive amounts of CPU time.
The general form of this statement is
select all ...
from dm_qual_comp_sp dm_qual_comp,
dm_qual_comp_rp dm_repeating
where (dm_qual_comp.r_object_id in
('080f449c80009d10', '080f449c80009d13', ...))
and (dm_qual_comp.i_has_folder = 1
and dm_qual_comp.i_is_deleted = 0)
and dm_repeating.r_object_id=dm_qual_comp.r_object_id;
It is worth noting that the SELECT list is not exceptional and is always the same. It has been omitted simply because of its length. The two objects in the FROM list are both join views, and any number of hexadecimal strings can appear in the IN list though typical forms of the statement contain between 6 and 20 items in the list. The use of the all qualifier and the placement of the parentheses in the WHERE clause may tell us something about the level of Oracle experience of the person writing the statement, but neither is relevant to its performance.
Because this statement was deemed to be using excessive CPU (around 250 msec per execution with several thousand executions per hour under peak load) the DBA used BMC's SQL Explorer for Oracle to capture an example of the statement from the shared pool, and started to experiment with various optimizer hints in an effort to improve performance. He quickly discovered that the query executed much more efficiently if it could be persuaded to use hash joins, and the TKPROF output from his experiments is summarized below.
Original Statement
call count cpu elapsed disk query current rows
------- ----- ----- ------- ---- ----- ------- ----
Execute 1 0.00 0.01 0 0 0 0
Fetch 8 0.25 0.24 9 17675 0 92
With Hash Join Hints
call count cpu elapsed disk query current rows
------- ----- ----- ------- ---- ----- ------- ----
Execute 1 0.01 0.01 0 0 0 0
Fetch 8 0.03 0.09 9 331 8 92
It should be noted that the TKPROF output has been slightly reformatted to fit on both the page in this paper and on the PowerPoint slides within the associated conference presentation. It is also worth noting that the production application runs on an 8 processor server, and therefore is entirely capable of handling well in excess of the 4,000 executions per hour that might at first sight appear to be the maximum possible service rate.
The statements are clearly being built dynamically for each execution in order to include the literal values in the in list. BMC Software does not have access to the source code and was therefore unable to include the required hints directly. At this point, the author was contacted and asked whether or not he could think of a strategy that could allow IS to force this particular query form to perform hash joins without needing to change either the application code or the application schema. As discussed earlier, BMC Software does not have the ability to change the code. Schema changes, although technically feasible, would run the risk of being in violation of the relevant support agreement.
Oracle's Cost-based Optimizer
This paper assumes the use of Oracle's Cost Based Optimizer (CBO). This is is used by the application in question rather than the older Rule Based Optimizer (RBO). The latter is still available under Oracle but is best viewed as being present solely for older applications that have been tuned to use it. All new development should assume the use of CBO. However until Oracle9i CBO does not use CPU resource as part of its cost equation, estimating instead what the Oracle documentation tends to refer to as I/O cost but which in reality equates more to “block visits for read.” The difference between (nominal) disk reads and block visits is accounted for by Oracle's block caching mechanism, and is sometimes magnified by operating system or device controller caching.
CPU Cost
Given that well-tuned Oracle applications are more likely to bottleneck on disk activity than on CPU, it might seem that basing CBO solely on I/O was an inspired decision. Unfortunately by factoring in the I/O savings of Oracle's multiblock reads, CBO has a marked tendency to prefer full table scans over indexed access for retrieval of more than about 5 percent of the rows in a table and this can lead to the execution of query predicate clauses for every row in the table. Depending on the complexity of these predicates, and the SQL features and functions used, the result can cause a surprising CPU load.
select count(membname) records from million;
ran in just over 2 seconds and the query
select count(membname) records from million where memb# > 0;
took exactly the same elapsed time because the additional CPU required was overlapped with disk reading. However, the unlikely variant
select count(*) records
from million
where upper(substr(membname,1,1)) between 'A' and 'Z'
or substr(membname,1,1) in ('_', '/index.html');
took over 4 seconds, and the equivalent form using an in list
select count(*) records
from million
where upper(substr(membname,1,1))
in ('A','B','C','D','E','/index.html','F','G','H','I','J','K','L','M'
,'N','O','P','Q','R','_','S','T','U','V','W','X','Y','Z');
took 12 seconds. As an aside, it may be worth noting that about 50 percent of the names in this test started with '_' and the query execution time can be varied between about 4 and 16 seconds by moving the position of the '_' within the list. The earlier it appears, the faster the query runs.
Key Statistics
Once statistics have been gathered on a table and its indexes, CBO has available to it a number of values that it can use in estimating the number of database block visits that will be required. For tables these include the number of rows in the table, the number of blocks in the table and the average row length, and for indexes the number of keys in the index, the number of distinct keys, the number of blocks in the index leaf set, and the average number of both leaf blocks and data blocks per distinct key value. In addition, column value histograms can be captured, and these can be used to estimate the selectivity of specific key values.
These statistics are not maintained in real time, but captured on demand using either the SQL command ANALYZE (now deprecated) or the supplied package DBMS_STATS, about which more later. We can see intuitively that with this kind of information available CBO should be able to make fairly accurate estimates of the number of block visits required to execute each discrete execution step, and at its simplest CBO works out for each of the possible approaches to the query which one generates the lowest total.
Other Factors
Life is not, in reality, quite that simple as there are a number of other factors that influence how CBO works out the notional I/O cost. The simplest to understand is optimizer_mode, which is set at instance level but which can be over-ridden at session level.
At a more complex level, a series of additional parameters, again settable at both index and session level, allow even more variation in how the notional I/O cost is computed. For example the larger the value of the parameter SORT_AREA_SIZE, the more likely CBO is to elect to use a sort because large work areas reduce the disk traffic caused by sorting. Indeed, if the sort area is large enough then the sort can be performed entirely in memory with no I/O cost whatever. There is even a parameter OPTIMIZER_INDEX_COST_ADJ that takes a value in the range 1 to 10,000 (default 100) and allows index usage to be made more or less attractive (low values make indexes more attractive).
In some simple cases, we can get the changes that we need to the query plan simply by changing these parameters, though in many cases we will also need to change the statistics themselves.
Cursor Sharing
The problem query that started the author's investigation contains a number of literal values supplied at run time and therefore the statement is almost certain to be slightly different each time that it is used. Oracle allows DBAs to get around this problem by setting the parameter CURSOR_SHARING = FORCE (the default value is exact).
With force set, any literal value in a SQL statement is replaced with a "bind variable" and thus each of
select ORD# from ORDS where OTYPE = 'STD';
select ORD# from ORDS where OTYPE = 'GOV';
select ORD# from ORDS where OTYPE = 'STAFF';
will be executed as
select ORD# from ORDS where OTYPE = :"SYS_B_0";
This has both advantages and disadvantages. The main advantage is that the statement need only be parsed and optimized once, and for a simple query parsing can take longer than executing the query so the parse saving is significant. The main disadvantage is that if the selectivity of the key varies the optimizer will not be able to use a different approach for key values of different selectivity (in the example above, 'STD' might select several hundred thousand rows, whereas 'STAFF' might only select a few hundred). This problem is partly overcome in Oracle9i by having the optimizer to look at the “bind value” for the first execution, and to use that value to determine the execution path.
Package DBMS_STATS
In Oracle9i, this Oracle supplied package contains about 35 procedures. Simple usage examples include:
dbms_stats.gather_schema_stats ('SCOTT'); dbms_stats.delete_index_stats ('SCOTT', 'EMPPK'); dbms_stats.set_table_stats ('SCOTT', 'EMP' , numrows => 14 , numblks => 1 , avrglen => 48);
The procedures not only allow statistics to be gathered (and deleted) on tables, indexes, and column values but also allow these statistics to be inspected, changed, saved, and restored. Statistics can also be moved from one object to another and from an object in one database to an object in another database.
Plan Stability
This is a relatively recent feature of Oracle and sets out to ensure that the same plan is always used for a particular SQL statement even if the optimizer parameters or the object statistics have been changed in the interim. The basic mechanism is to use the SQL statement CREATE OUTLINE to create a “stored outline” that contains both the SQL statement itself and a series of Oracle-generated hints that should always generate the same query plan as was generated when the outline was created. Creating outlines is quite simple, but managing them can quickly become a major problem if a large number are present within a single database.
Because of the time constraints under which this paper was written, it has not been possible to include a full discussion of Plan Stability and the stored outline mechanism, but three observations may worth noting. First, the author's discussions with DBAs at user sites has persuaded him that the facility is little used and has yet to really impact on the consciousness of the Oracle community. Second, although its use adds a considerable CPU overhead to 'hard parses' this is rarely a significant problem because the feature is intended for high throughput transaction-based applications in which hard parses should be rare except in the period immediately after instance startup.
Last, and rather surprisingly, the application of stored outlines to statements being parsed can only be enabled dynamically through alter system statements. It cannot be set in an INIT.ORA or SPFILE. This is apparently deliberate but the reasoning behind it is unknown to the author.
Getting CBO to the Required Plan
The Oracle manual Oracle8i Designing and Tuning for Performance tells us that CBO uses hash joins when the two “row sources” to be joined together are large and of about the same order of magnitude in size. Forcing the sample query to use hash joins was a relatively simple matter of adjusting the statistics on the four tables that underlie the two views in the query so that CBO felt that the conditions for using hash join were achieved.
The table sizes in the production system (or at least in the statistics on the production system) were:
Table Rows Blocks
-------------------- -------- -------
DM_QUAL_COMP_R 250 1
DM_QUAL_COMP_S 125 1
DM_SYSOBJECT_R 398,790 1,392
DM_SYSOBJECT_S 271,966 9,313
It was realized early in the process that making small changes was invariably ineffective. However, by changing the table sizes to
Table Rows Blocks
-------------------- -------- -------
DM_QUAL_COMP_R 20,000 1,600
DM_QUAL_COMP_S 10,000 800
DM_SYSOBJECT_R 398,790 1,392
DM_SYSOBJECT_S 10,000 800
(and also making some changes to index selectivity to avoid index use) it was relatively easy to derive the required execution plan of three hash joins. However, this had the obvious drawback that every query that accessed any one of these tables might now use a new execution plan because the statistics had been changed.
Localizing the Impact
The easiest solution to preventing the changed statistics from affecting other queries was to create an outline for the query as soon as it had been tuned, and then to revert the statistics back to their previous values. This still had the effect of destabilizing a production system during the trial and error process of deriving the required plan.
The method adopted was to export both the production schema and the production statistics to a test instance using a combination of the DBMS_STATS package and Oracle's EXPort and IMPort utilities. Not a single row of application data was transferred, just the schema definition (using EXP with Table Data set to No) and the table and index statistics, by separately exporting the table STATISTICS_TABLE used by DBMS_STATS as both a destination to which to save object statistics and a source from which to restore them.
With a production schema and production statistics on the test instance, and having taken some care to ensure that the optimizer environment was the same, we were able to conduct the trial and error process without affecting the production system. We already knew from hint-based tuning what plan we wanted, and once we achieved this an outline could be built for the statement and this outline transferred to the production system. The query being tuned was never actually executed on the test system.
It was suggested to the author that under Oracle9i the supplied package DBMS_OUTLN_EDT could be used. In this scenario all that would be necessary, once the desired plan was known, would be to create an outline with the existing plan and then edit it to achieve the desired plan. Unfortunately a brief examination of the package showed that it was nowhere near powerful enough to achieve the required transformations.
Ensuring Outline Use
In order to ensure that the outline was used in production it was necessary not only to enable outlines, but also to set CURSOR_SHARING = FORCE so that whatever the literal values that appeared in the statement, it would match the statement text stored in the outline table. Sadly, despite the success of the approach, it has not yet been deployed on the production system because of the need for this change, which has been deemed to require a performance test under production load using a test instance. This test has not yet been performed, and as the production server has sufficient CPU to cope with the inefficient query plan at current load volumes, IS have decided not to implement the solution until the load rises to the point at which the inefficiency starts to impact throughput.
Postscript
It is worth noting that Jonathan Lewis has proposed a sneaky alternative approach to the method outlined above for creating the “preferred” stored outline for a query. In this simplified approach, an outline is first created for the original query and following tuning using hints, a second outline is created for the hinted version of the query. Once both outlines have been created, their hint entries in the table OUTLN.OL$HINTS then (assuming sufficient privilege) these entries can be swapped over using simple update command of the general form
update OUTLN.OL$HINTS
set OL_NAME = decode(OL_NAME, 'VERSION1', 'VERSION2', 'VERSION1')
where OL_NAME in ('VERSION1', 'VERSION2');
Once the referential links have been reversed in this way, the stored outline for the hinted query will be used whenever the original query is parsed. However neither Jonathan Lewis nor the author feel able to recommend the use of this approach in a production instance unless or until it has been approved by Oracle Corporation, and in particular Oracle Support. It currently seems unlikely that such approval will be forthcoming.
Conclusions
It is clear from the exercise described in this paper that careful modification of object statistics can be an effective way of persuading Oracle to use a better query plan. The technique is not powerful enough to overcome the potential impact of every “query from hell” but in those cases where it is effective it has low implementation cost and is also totally non-intrusive in that no changes are required to any code or script that has been supplied by the application vendor.
However, despite being able to support the claim of being non-intrusive, full deployment of the technique may require a number of changes to be made to the instance environment and in particular it will often be necessary both to enable the use of stored outlines and to cause Oracle to replace literal values with bind variables.
---
Dave Ensor is a Product Developer with BMC Software where his mission is to produce software solutions that automate Oracle performance tuning. He has been tuning Oracle for 13 years, and in total he has more than 30 years' active programming and design experience.
As an Oracle design and tuning specialist Dave built a global reputation both for finding cost-effective solutions to Oracle performance problems and for his ability to explain performance issues to technical audiences. He is co-author of the O'Reilly & Associates books Oracle Design and Oracle8 Design Tips.
Contributors : Dave Ensor
Last modified 2005-02-23 10:53 PM