Inside Oracle Indexing

by Donald K. Burleson

There are many compelling reasons to manage indexes within Oracle. In a OLTP system, index space is often greater than the space allocated for tables, and fast row data access is critical for sub-second response time. As we may know, Oracle offers a wealth of index structures:

B-tree indexes –This is the standard tree index that Oracle has been using since the earliest releases.
Bitmap indexes – Bitmap indexes are used where an index column has a relatively small number of distinct values (low cardinality). These are super-fast for read-only databases, but are not suitable for systems with frequent updates.
Bitmap join indexes – This is a index structure whereby data columns from other tables appear in a multi-column index of a junction table. This is the only create index syntax to employ an SQL-like from clause and where clause.

        create bitmap index
           part_suppliers_state
        on
           inventory( parts.part_type, supplier.state )
from
           inventory i,
           parts     p,
           supplier  s
where
           i.part_id=p.part_id
        and
           i.supplier_id=p.supplier_id;

In addition to these index structures we also see interesting use of indexes at runtime. Here is a sample of index-based access plans:

Nested loop joins – This row access method scans an index to collect a series of ROWID’s.
Index fast-full-scans – This is a “multi-block read” access where the index blocks are accessed via a “db file scattered read” to load index blocks into the buffers. Please note that this method does not read the index nodes.
Star joins – The star index has changed in structure several times, originally being a single-concatenated index and then changing to a bitmap index implementation. STAR indexes are super-fast when joining large read-only data warehouse tables.
Index combine access – This is an example of the use of the index_combine hint. This execution plan combines bitmap indexes to quickly resolve a low-cardinality Boolean expression:

        select /*+ index_combine(emp, dept_bit, job_bit) */
           ename,
           job,
           deptno,
           mgr
        from
           emp
        where
           job = 'SALESMAN'
        and
           deptno = 30
        ;

Here is the execution plan that shows the index combine process:

        OPERATION
        ----------------------------------------------------------------------
        OPTIONS                        OBJECT_NAME                    POSITION
        ------------------------------ ---------------------------- ----------
         SELECT STATEMENT
                                                                             2
          TABLE ACCESS
        BY INDEX ROWID                 EMP                                   1
            BITMAP CONVERSION
        TO ROWIDS                                                            1
              BITMAP AND
                BITMAP INDEX
        SINGLE VALUE                   DEPT_BIT                              1
                BITMAP INDEX
        SINGLE VALUE                   JOB_BIT                               2

While the internals of Oracle indexing are very complex and open to debate, there are some things that you can do to explore the internal structures of your indexes. Let’s take a closer look at the method that I use to reveal index structures.

Inside Oracle b-tree indexes

There are many myths and legends surrounding the use of Oracle indexes, especially the ongoing passionate debate about rebuilding of indexes for improving performance. Some experts claim that periodic rebuilding of Oracle b-tree indexes greatly improves space usage and access speed, while other experts maintain that Oracle indexes should “rarely” be rebuilt. Interestingly, Oracle reports that the new Oracle10g Automatic Maintenance Tasks (AMT) will automatically detect indexes that are in need of re-building. Here are the Pros and Cons of this highly emotional issue:

Arguments for Index Rebuilding – Many Oracle shops schedule periodic index rebuilding, and report measurable speed improvements after they rebuild their Oracle b-tree indexes. In an OracleWorld 2003 presentation titled Oracle Database 10g: The Self-Managing Database by Sushil Kumar of Oracle Corporation, Kumar states that the Automatic Maintenance Tasks (AMT) Oracle10g feature will automatically detect and re-build sub-optimal indexes. “AWR provides the Oracle Database 10g a very good "knowledge" of how it is being used. By analyzing the information stored in AWR, the database can identify the need of performing routine maintenance tasks, such as optimizer statistics refresh, rebuilding indexes, etc. The Automated Maintenance Tasks infrastructure enables the Oracle Database to automatically perform those operations.”
Arguments against Index Rebuilding – Some Oracle in-house experts maintain that Oracle indexes are super-efficient at space re-use and access speed and that a b-tree index rarely needs rebuilding. They claim that a reduction in Logical I/O (LIO) should be measurable, and if there were any benefit to index rebuilding, someone would have come up with “provable” rules.

So who is right? I suspect that they both are correct. There is a huge body of evidence that index rebuilding makes the end users report faster response time, and I have no wonder if this is only a placebo effect, with no scientific basis. Some experts suspect a placebo effect may be at work here, and the end users, knowing that they have new index trees, report a performance gain when none exists.

Because of their extreme flexibility, Oracle b-tree indexes are quite complex, and to really gather scientific evidence we must examine all of the index metrics. Getting a meaningful measure of the benefit of an index rebuild in a production environment is very difficult because the system is under heavy load and usage patterns change constantly. Plus, many IT managers require periodic index re-building because it is a low-risk job and it curries favor from the end-user community.

Where are the index details?

Most Oracle professionals are aware of the dba_indexes view that is populated with index statistics when indexes are analyzed. The dba_indexes view contains a great deal of important information for the SQL optimizer, but there is still mote to see. Oracle provides an analyze index xxx validate structure command that provides additional statistics into a temporary tables called index_stats, which, sadly, is overlaid after each command.

To get the full picture, we must devise a table structure that will collect data from both sources. Here is a method that will do the job:

1. Create a temporary table to hold data from dba_indexes and index_stats

2. Verify quality of dbms_stats analysis

3. Populate temporary table from dba_indexes

4. Validate indexes & send output into temp table

Let’s start by creating a table to hold our index data. I call this table index_details:

drop table index_details;

Create table index_details
(
-- ********* The following is from dba_indexes ******************
OWNER_NAME                                         VARCHAR2(30),
INDEX_NAME                                         VARCHAR2(30),
INDEX_TYPE                                         VARCHAR2(27),
TABLE_OWNER                                        VARCHAR2(30),
TABLE_NAME                                         VARCHAR2(30),
TABLE_TYPE                                         VARCHAR2(11),
UNIQUENESS                                         VARCHAR2(9),
COMPRESSION                                        VARCHAR2(8),
PREFIX_LENGTH                                      NUMBER,
TABLESPACE_NAME                                    VARCHAR2(30),
INI_TRANS                                          NUMBER,
MAX_TRANS                                          NUMBER,
INITIAL_EXTENT                                     NUMBER,
NEXT_EXTENT                                        NUMBER,
MIN_EXTENTS                                        NUMBER,
MAX_EXTENTS                                        NUMBER,
PCT_INCREASE                                       NUMBER,
PCT_THRESHOLD                                      NUMBER,
INCLUDE_COLUMN                                     NUMBER,
FREELISTS                                          NUMBER,
FREELIST_GROUPS                                    NUMBER,
PCT_FREE                                           NUMBER,
LOGGING                                            VARCHAR2(3),
BLEVEL                                             NUMBER,
LEAF_BLOCKS                                        NUMBER,
DISTINCT_KEYS                                      NUMBER,
AVG_LEAF_BLOCKS_PER_KEY                            NUMBER,
AVG_DATA_BLOCKS_PER_KEY                            NUMBER,
CLUSTERING_FACTOR                                  NUMBER,
STATUS                                             VARCHAR2(8),
NUM_ROWS                                           NUMBER,
SAMPLE_SIZE                                        NUMBER,
LAST_ANALYZED                                      DATE,
DEGREE                                             VARCHAR2(40),
INSTANCES                                          VARCHAR2(40),
PARTITIONED                                        VARCHAR2(3),
TEMPORARY                                          VARCHAR2(1),
GENERATED                                          VARCHAR2(1),
SECONDARY                                          VARCHAR2(1),
BUFFER_POOL                                        VARCHAR2(7),
USER_STATS                                         VARCHAR2(3),
DURATION                                           VARCHAR2(15),
PCT_DIRECT_ACCESS                                  NUMBER,
ITYP_OWNER                                         VARCHAR2(30),
ITYP_NAME                                          VARCHAR2(30),
PARAMETERS                                         VARCHAR2(1000),
GLOBAL_STATS                                       VARCHAR2(3),
DOMIDX_STATUS                                      VARCHAR2(12),
DOMIDX_OPSTATUS                                    VARCHAR2(6),
FUNCIDX_STATUS                                     VARCHAR2(8),
JOIN_INDEX                                         VARCHAR2(3),
-- ********* The following is from index_stats ******************
HEIGHT                                             NUMBER,
BLOCKS                                             NUMBER,
NAMEx                                              VARCHAR2(30),
PARTITION_NAME                                     VARCHAR2(30),
LF_ROWS                                            NUMBER,
LF_BLKS                                            NUMBER,
LF_ROWS_LEN                                        NUMBER,
LF_BLK_LEN                                         NUMBER,
BR_ROWS                                            NUMBER,
BR_BLKS                                            NUMBER,
BR_ROWS_LEN                                        NUMBER,
BR_BLK_LEN                                         NUMBER,
DEL_LF_ROWS                                        NUMBER,
DEL_LF_ROWS_LEN                                    NUMBER,
DISTINCT_KEYSx                                     NUMBER,
MOST_REPEATED_KEY                                  NUMBER,
BTREE_SPACE                                        NUMBER,
USED_SPACE                                         NUMBER,
PCT_USED                                           NUMBER,
ROWS_PER_KEY                                       NUMBER,
BLKS_GETS_PER_ACCESS                               NUMBER,
PRE_ROWS                                           NUMBER,
PRE_ROWS_LEN                                       NUMBER,
OPT_CMPR_COUNT                                     NUMBER,
OPT_CMPR_PCTSAVE                                   NUMBER
)
tablespace tools
storage (initial 5k next 5k maxextents unlimited);

(Note: the index_stats table has a column named PCT_USED even though Oracle indexes do not allow changes to this value.)

Now that we have a table that will hold all of the index details, the next step is to populate the table with data from freshly-analyzed indexes. Remember, you should always run dbms_stats to get current index statistics. Here is the script.

insert into index_details
(
OWNER_NAME,
INDEX_NAME,
INDEX_TYPE,
TABLE_OWNER,
TABLE_NAME,
TABLE_TYPE,
UNIQUENESS,
COMPRESSION,
PREFIX_LENGTH,
TABLESPACE_NAME,
INI_TRANS,
MAX_TRANS,
INITIAL_EXTENT,
NEXT_EXTENT,
MIN_EXTENTS,
MAX_EXTENTS,
PCT_INCREASE,
PCT_THRESHOLD,
INCLUDE_COLUMN,
FREELISTS,
FREELIST_GROUPS,
PCT_FREE,
LOGGING,
BLEVEL,
LEAF_BLOCKS,
DISTINCT_KEYS,
AVG_LEAF_BLOCKS_PER_KEY,
AVG_DATA_BLOCKS_PER_KEY,
CLUSTERING_FACTOR,
STATUS,
NUM_ROWS,
SAMPLE_SIZE,
LAST_ANALYZED,
DEGREE,
INSTANCES,
PARTITIONED,
TEMPORARY,
GENERATED,
SECONDARY,
BUFFER_POOL,
USER_STATS,
DURATION,
PCT_DIRECT_ACCESS,
ITYP_OWNER,
ITYP_NAME,
PARAMETERS,
GLOBAL_STATS,
DOMIDX_STATUS,
DOMIDX_OPSTATUS,
FUNCIDX_STATUS,
JOIN_INDEX
)
        select * from dba_indexes
        where owner not like 'SYS%'
;

Now that we have gathered the index details from dba_indexes, we must loop through iterations of the analyze index xxx validate structure command to populate our table with other statistics. Here is the script that I use to get all index details.

/* INDEX.STATS contains 1 row from last execution */
/* of ANALYZE INDEX ... VALIDATE STRUCTURE         */
/* We need to loop through validates for each      */
/* index and populate the table.                   */

DECLARE
v_dynam varchar2(100);
cursor idx_cursor is
select owner_name, index_name from index_details;

BEGIN
for c_row in idx_cursor loop
   v_dynam := 'analyze index '||c_row.owner_name||'.'||c_row.index_name||
             ' validate structure';
   execute immediate v_dynam;
   update index_details set
              (HEIGHT, BLOCKS, NAMEx, PARTITION_NAME, LF_ROWS, LF_BLKS,
               LF_ROWS_LEN, LF_BLK_LEN, BR_ROWS, BR_BLKS, BR_ROWS_LEN,
               BR_BLK_LEN, DEL_LF_ROWS, DEL_LF_ROWS_LEN, DISTINCT_KEYSx,
               MOST_REPEATED_KEY, BTREE_SPACE, USED_SPACE, PCT_USED,
               ROWS_PER_KEY, BLKS_GETS_PER_ACCESS, PRE_ROWS, PRE_ROWS_LEN,
               OPT_CMPR_COUNT, OPT_CMPR_PCTSAVE)
= (select * from index_stats)
     where index_details.owner_name = c_row.owner_name
       and index_details.index_name = c_row.index_name;
   if mod(idx_cursor%rowcount,50)=0 then
     commit;
   end if;
end loop;
commit;

END;
/

update
   index_details a
set
   num_keys =
   (select
      count(*)
    from
      dba_ind_columns b
    where
       a.owner_name = b.table_owner
    and
       a.index_name = b.index_name
   )
;

After running the script from listing 3, we should now have complete index details for any index that we desire. However, with 70 different metrics for each index, it can be quite confusing about which columns are the most important. To make queries easy, I create a view that displays only the columns that I find the most interesting. Here is my view:

drop view indx_stats;

Create view idx_stats
as
select
OWNER_NAME                 ,
INDEX_NAME                 ,
INDEX_TYPE                ,
UNIQUENESS                ,
PREFIX_LENGTH             ,
BLEVEL                    ,
LEAF_BLOCKS               ,
DISTINCT_KEYS             ,
AVG_LEAF_BLOCKS_PER_KEY    ,
AVG_DATA_BLOCKS_PER_KEY    ,
CLUSTERING_FACTOR          ,
NUM_ROWS                  ,
PCT_DIRECT_ACCESS         ,
HEIGHT                    ,
BLOCKS                    ,
NAMEx                     ,
PARTITION_NAME            ,
LF_ROWS                   ,
LF_BLKS                   ,
LF_ROWS_LEN               ,
LF_BLK_LEN                ,
BR_ROWS                   ,
BR_BLKS                   ,
BR_ROWS_LEN               ,
BR_BLK_LEN                ,
DEL_LF_ROWS               ,
DEL_LF_ROWS_LEN           ,
DISTINCT_KEYSx            ,
MOST_REPEATED_KEY         ,
BTREE_SPACE               ,
USED_SPACE                ,
PCT_USED                  ,
ROWS_PER_KEY              ,
BLKS_GETS_PER_ACCESS      ,
PRE_ROWS                  ,
PRE_ROWS_LEN              ,
num_keys,
sum_key_len
from
Index_details;

While most of these column descriptions are self-evident, there are some that are especially important:

CLUSTERING_FACTOR – This is one of the most important index statistics because it indicates how well sequenced the index columns are to the table rows. If clustering_factor is low (about the same as the number of dba_segments.blocks in the table segment) then the index key is in the same order as the table rows and index range scan will be very efficient, with minimal disk I/O. As clustering_factor increases (up to dba_tables.num_rows), the index key is increasingly out of sequence with the table rows. Oracle’s cost-based SQL optimizer relies heavily upon clustering_factor to decide whether to use the index to access the table.
HEIGHT - As an index accepts new rows, the index blocks split. Once the index nodes have split to a pre-determined maximum level the index will “spawn” into a new level.
BLOCKS – This is the number of blocks consumed by the index. This is dependent on the db_block_size. In Oracle9i and beyond, many DBAs create b-tree indexes in very large blocksizes (db_32k_block_size) because the index will spawn less. Robin Schumacher has noted in his book Oracle Performance Troubleshooting: “As you can see, the amount of logical reads has been reduced in half simply by using the new 16K tablespace and accompanying 16K data cache. Clearly, the benefits of properly using the new data caches and multi-block tablespace feature of Oracle9i and above are worth your investigation and trials in your own database.”
PCT_USED – This metric is very misleading because it looks identical to the dba_indexes pct_used column, but has a different meaning. Normally, the pct_used threshold is the freelist un-link threshold, while in index_stats pct_used is the percentage of space allocated in the b-tree that is being used.

Is there a criterion for index rebuilding?

If we believe the antidotal reports that index rebuilding improved end-user-reported performance, how can we analyze this data and see what the criteria (if any) might be for an index rebuild? For example, here are the criteria used by a fellow Oracle DBA who swears that rebuilding indexes with these criteria has a positive effect on his system performance:

-- ***  Only consider when space used is more than 1 block    ***
   btree_space > 8192
   and
-- ***  The number of index levels is > 3  ***
   (height > 3
-- ***  The % being used is < 75%          ***
    or pct_used < 75
-- ***  Deleted > 20% of total             ***
    or (del_lf_rows/(decode(lf_rows,0,1,lf_rows)) *100) > 20)
;

In reality, I suspect that the rules are far more complicated than this simple formula. To see the commonality between indexes of similar nature you can use the data from your new index_details table to write summary queries (refer to listing 5). Here we see the average number index blocks, leaf rows and leaf blocks for indexes of different heights:

This gives us a high-level idea of Oracle threshold for spawning an index onto new levels. We can take this same approach and attempt to answer the following auctions:

At what point does an index spawn to another level (height)? It should be a function of blocksize, key length and the number of keys.
The number of deleted leaf nodes may not be enough to trigger an index rebuild. This is because if clustering_factor is low (dba_indexes.clustering_factor ~= dba_segments.blocks), then the rows are added in order, and the index is likely to re-use the deleted leaf nodes. On the other hand, if clustering_factor is high (dba_indexes.clustering_factor ~= dba_tables.num_rows), then a rebuild may be beneficial. To illustrate, assume I have an index on the last_name column of a 1,000,000 row table and the clustering_factor is 1,000,000 indicating that the rows are in the same sequence as the index. In this case, a bulk delete of all people whose last_name begins with the letter “K,” would leave a dense cluster of deleted leaf nodes on adjacent data blocks within the index tablespace. This large section of space is more likely to be re-used than many tiny chunks.

We can also use the data from our detail table to compute our own metrics. In the example query below, we create a meta-rule for indexes:

Dense Full Block Space - This is the index key space (number of table rows * index key length) as a function of the blocksize and free index space.
Percent of Free Blocks - This is the estimated number of free blocks within the index.

Using these metrics, we can analyze the system and produce some very interesting reports of index internals:

select
   height           c1,
   avg(blocks)      c2,
   avg(lf_rows)     c3,
   avg(lf_rows_len) c4,
   avg(lf_blks)     c5,
   avg(lf_blk_len) c6
from
   index_details
group by
   height
;

                        Average      Average                Average
Average Average           Leaf     Leaf Row     Average Leaf Block
Height   Blocks           Rows       Length Leaf Blocks     Length
------- -------- -------------- ------------ ----------- ----------
      1      236             12          234           1      7,996
      2      317         33,804      691,504         106      7,915
      3    8,207      1,706,685   41,498,749       7,901      7,583
      4 114,613     12,506,040 538,468,239     113,628      7,988

As we see, we can compute and spin this data in an almost infinite variety of ways.

The Debate Continues

Today, a battle is raging between the “academics” who do not believe that indexes should be re-built without expensive studies and the “pragmatists” who rebuild indexes on a schedule because their end users report faster response times.

To date, none of the world’s Oracle experts has determined a reliable rule for index rebuilding, and no expert has proven that index re-builds “rarely” help. Getting statistically valid “proof” from a volatile production system would be a phenomenal challenge. In a large production system, it would be a massive effort to trace LIO from specific queries to specific index, before and after the rebuild.

Academic approach - Many Oracle experts claim that indexes rarely benefit from rebuilding, yet none have ever proffered empirical evidence that this is the case, or what logical I/O conditions arise in those “rare” cases” where indexes benefit from re-building.
Pragmatic approach – Many IT managers force their Oracle DBAs to periodically re-build indexes because the end-user community reports faster response times following the re-build. The pragmatists are not interested in “proving” anything; they are just happy that the end users are happy. Even if index re-building were to be proven as a useless activity, the placebo effect on the end users is enough to justify the task.

It is clear that all 70 of the index metrics interact together in a predictable way, and some scientist should be able to take this data and reverse-engineer the internal rules for index rebuilding, if any actually exist. For now, the most any Oracle professional can do is to explore their indexes and learn how the software manages to b-tree structures.

Donald K. Burleson is one of the world's top Oracle Database experts with more than 20 years of full-time DBA experience. He specializes in creating database architectures for very large online databases and he has worked with some of the world's most powerful and complex systems. A former Adjunct Professor, Don Burleson has written 15 books, published more than 100 articles in national magazines, serves as Editor-in-Chief of Oracle Internals and edits for Rampant TechPress. Don is a popular lecturer and teacher and is a frequent speaker at Oracle Openworld and other international database conferences. Don's Web sites include DBA-Oracle, Remote-DBA, Oracle-training, remote support and remote DBA.

Contributors : Donald K. Burleson
Last modified 2005-06-22 12:17 AM

DBAzine.com

Sections

Personal tools

Menu

Who Are You?

Inside Oracle Indexing

Inside Oracle b-tree indexes

Where are the index details?

Is there a criterion for index rebuilding?

The Debate Continues