When Should You Rebuild an Index?
In an earlier article about B-tree indexes, I ended with the comment, “Resist the argument that you need to rebuild Indexes regularly because ‘they become unbalanced.’ It isn’t a valid argument.” Some time later, I received an email message pointing out that there were other, valid, reasons for rebuilding indexes. This is true, and rebuilding B-tree indexes does sometimes have benefits, so I thought I’d write a short article investigating the reasons for rebuilding B-tree indexes.
Why Rebuild an Index?
There are only two reasons for rebuilding indexes. One is driven by the question, “Will the performance gain be worth the cost of rebuilding this index?” The other is driven by a similar question: “Is the administrative benefit worth the cost?”
One convenient aspect of phrasing the questions like this is that if the cost is effectively zero, then even in the case in which rebuilding an index is a pointless exercise, I’m not going to insist that you to stop doing it. Of course, the time and effort may not really be zero, and just occasionally, an index rebuild can cause subsequent short-term performance problems, so do carry on reading the rest of the article.
The standard argument for rebuilding indexes is based on the fact that B-tree indexes tend to “degenerate” (i.e., become less space-efficient) as they are populated over time. And the argument is true — but only because the standard theory predicts that B-tree indexes will tend to operate at 75 percent packing, and Oracle creates them, by default, at 90 percent packing. (In fact, Oracle does not coalesce adjacent, low-usage, leaf blocks automatically, so its packing will often be a few percent lower than the theoretical value). So the question of the rebuild generally comes down to the following implicit question:
If I think the index is running at X percent packing, should I rebuild it to 90 percent?
Let’s start by considering the significance of the change from X percent, and before we worry about values for the magic number X.
Potential Benefits
When you rebuild an index, the physical effect is to reduce the size of the index (especially the leaf block count). This may result in three possible benefits:
- The optimiser calculations may produce a smaller value for the cost of using the index, so the optimiser may use the index in more execution plans.
- Typical queries that use the index may have to visit fewer index blocks, and therefore run more efficiently.
- Because the index is smaller, it may become a better survivor in the buffer cache LRU chain, so that blocks from the index are found in the buffer more frequently, and fewer blocks from other objects are flushed to allow index blocks to be reloaded. If this occurs, system-level I/O is reduced, resulting in a possible performance gain for everyone.
So let’s check each effect in turn, and see what the different benefits look like in different circumstances. We’ll start with the optimiser cost calculations and a simple worked example.
Optimiser Costing
If you’ve read the article, “Why Isn't Oracle Using My Index?!” you will be familiar with the formula originally published by Wolfgang Breitling at IOUG-2002 which gives the general cost of accessing a table through an index as:
blevel + selectivity * leaf_blocks + selectivitity * clustering_factor.
Consider an index on 10,000,000 rows, with an average index entry size of 40 bytes. This would give us about 200 entries per leaf block if we use an 8K block size. Assume that the index (according to some statistics in the viewindex_stats) is running at 50 percent efficiency and is therefore, according to common dictate, a good candidate for rebuilding (to the default 90 percent — but we will use 100 percent in our example). In other words, the index is currently running with about 100 entries per block.
Let’s do the arithmetic, first before the rebuild:
10,000,000 rows at 100 rows per block | => 100,000 leaf blocks |
100,000 leaf blocks (at 50 percent packing) | => 1,000 level 1 branch blocks |
1,000 level 1 branch blocks (at 50 percent packing) | => 10 level 2 branch blocks |
10 level 2 branch blocks (at 50 percent packing) | => 1 level 3 branch block |
Now after the rebuild (with pctfree = 0):
10,000,000 rows at 200 rows per block | => 50,000 leaf blocks |
50,000 leaf blocks (at 100 percent packing) | => 250 level 1 branch blocks |
250 level 1 branch blocks (at 100 percent packing) | => 2 level 2 branch blocks |
2 level 2 branch blocks (at 100 percent packing) | => 1 level 3 branch block |
So the index has blevel = 3 before and after rebuilding, but 100,000 leaf_blocks before the rebuild, dropping to 50,000 afterwards. So does this tell us anything about the new cost estimate? Not really, although it is an important little detail that the blevel, which is one component of the cost calculation, rarely changes on rebuilding an index. The exponential relationship between the blevel and the maximum row count makes this almost inevitable.
We need to examine not just the leaf_blocks, but the entire cost component: selectivity * leaf_blocks. How much does this change? Well, that depends on how many rows you have for each index value. Let’s try a couple of numbers, one for a high-precision index (five rows per value), and one for a lower quality index (50 per value).
Rows per value = 5 => selectivity = 1/2,000,000 Selectivity * leaf_blocks (old) = 100,000 / 2,000,000 = 0.05 Selectivity * leaf_blocks (new) = 50,000 / 2,000,000 = 0.025
Because of rounding (I believe Oracle rounds up at this point in the formula), this component of the cost estimate does not change.
Rows per key = 50 => selectivity = 1/200,000 Selectivity * leaf_blocks (old) = 100,000 / 200,000 = 0.5 Selectivity * leaf_blocks (new) = 50,000 / 200,000 = 0.25
Again, this component of the formula shows no change. In fact, it is not until each key value represents 100 rows in the table that the cost estimate is reduced as a result of our rebuilding the index to pack it from 50 percent to 100 percent efficiency
So, there are cases where rebuilding an index will change the optimiserââ¬â¢s opinion of the usefulness of that index — but those cases may be fairly rare in your system.
Typical Queries
So what if the optimiser isn’t convinced that your index has improved; maybe the end-user queries that use that index already will run better. As ever, it is important to know the data and know the application. When you start to think about rebuilding an index “because index_stats says it is running at 50 percent,” the first thing you should do is to ask yourself if (for example) that 50 percent means that half the index is nearly packed and the other half is nearly empty — one magic number for the whole index may not be giving you any useful information. (In the absence of knowledge, you might try a rather expensive treedump to check the details).
Then, when you have decided what the number means, you have to decide if rebuilding would actually have any perceptible effect for your users. Depending on the way the table and index are used, the answer may be no.
Consider this example. The table is addressed by an “Object Oriented” system, which always uses meaningless numeric keys. Every query uses a key value to collect data. One trip down the index fetches one rowid from a leaf block, and then one row from the table. Rebuilding the index to pack 200 rowids per leaf_block doesn’t make any difference if you want only one row.
How about an example closer to the other end of the spectrum: a ââ¬Åchildâ⬠table in which one foreign key value picks up 100 rows? Packing the index means that you only visit one leaf_block instead of two for those rowids. Terrific ââ¬â except maybe those 100 table rows come from 100 table blocks, so the effort of rebuilding the index makes a difference of slightly less than one percent to the end-user. You have to check out the figures when trying to work out whether or not itââ¬â¢s going to be worth rebuilding an index, especially if thereââ¬â¢s some reason that make it awkward to do the rebuild.
Buffering Benefits
Clearly, if you have a well-packed index, it is likely to cause less “pollution” in the buffer — there are fewer index blocks, so the whole index may get into the buffer, and because there are fewer blocks to knock other data blocks out of the buffer, there may be a general reduction in I/O. On the down side, a well-packed index may be subject to extra contention on inserts, updates, and deletes. It’s harder to measure a case like this, but it’s probably a good enough reason to rebuild an index (partition) that’s about to be made read-only.
Three thoughts to consider, though. If it’s a really popular index, then the LRU and touch count could be keeping it buffered all the time anyway, so whilst it may be “wasting space” in the buffer, it may not be causing any extra I/O. Secondly, for many queries the significant cost of the query is the cost of visiting table blocks which have a low revisit rate and cause much more aggressive buffer flushing, so worrying about indexes may be distracting you from the problem you should be addressing. Finally, it is possible that by rebuilding indexes too frequently, you may be exaggerating the problem rather than fixing the problem — you might like to try the following (fairly long-running, expect it to take a few minutes) test on a system with an 8K block size and not in an ASSM tablespace if you are using Oracle9:
drop table t1; create table t1(n1 number(38)); create index i1 on t1(n1); execute dbms_random.seed(0) begin for i in 1..400000 loop insert into t1 values( trunc(power(10,14) * dbms_random.value) ); commit; end loop; end; / -- default pctfree for rebuild alter index i1 rebuild pctfree 10; begin for i in 1..100000 loop insert into t1 values( trunc(power(10,14) * dbms_random.value) ); commit; end loop; end; / analyze index i1 validate structure; select lf_blks from index_stats;
Check the number of leaf blocks in the index when you do the rebuild part of the way through the test, then the repeat the experiment without the rebuild. You may be surprised at the difference in results. (In my case, the count with the “performance-enhancing” rebuild was 2227, but when I didn’t do the rebuild, the count stayed down at 1778 blocks).
Please note, this test does not prove (and is not intended to prove) that you should not rebuild indexes. It shows only that it is possible for a rebuild to have a counter-intuitive effect. There are scenarios, based on data patterns, where this type of effect is likely to appear in critical subsections of an index shortly after a rebuild. The point I am making is that you should always think carefully before deciding to rebuild an index.
Reminder
Throughout this article, I have been discussing B-tree indexes. There is a completely different argument about bitmap indexes that I have already covered in various articles about bitmap indexes previously published on www.dbazine.com
I have also ignored the possibility that you could choose to rebuild indexes (or at least index partitions) to 100 percent packing (pctfree = 0) just before you make a tablespace read-only. Even then, it might not be worth the effort if it causes problems whilst you are doing the rebuild (remember, there was a fairly serious bug in online index rebuilds which may still be present in be recent versions of Oracle).
Finally, there are always some indexes which, for application-dependent reasons, behave in a completely catastrophic manner — there are always special cases where a regular rebuild is likely to be a good idea. Even then, you should check whether a regular coalesce might be a better idea in the short term, with a strategic redesign involving function-based indexes as a longer term solution.
Conclusion
There is only one sound argument for rebuilding an index:
Will the total cost of rebuilding the index be a reasonable price to pay for the resulting benefit to the system?
The answer to this question is frequently a resounding NO. In fact, sometimes the overall impact of rebuilding an active index will be detrimental to the system. However, there are still plenty of misconceptions about indexes that result in DBAs the world over wasting valuable time and effort rebuilding indexes unnecessarily.
BUT, if you have a regular window when the system is not in use, have a simple batch job that runs in that window without putting pressure on other batch tasks, and that batch job can never fail — then you might as well rebuild all the “safe” indexes you want to; there is often a slight performance gain to be had, and if it costs you nothing, then you might as well take it.
--
Jonathan Lewis is a freelance consultant with more than 18 years experience of Oracle. He specialises in physical database design and the strategic use of the Oracle database engine, is author of Practical Oracle 8i - Building Efficient Databases published by Addison-Wesley, and is one of the best-known speakers on the UK Oracle circuit. Further details of his published papers, presentations and seminars and tutorials can be found at http://www.jlcomp.demon.co.uk, which also hosts The Co-operative Oracle Users’ FAQ for the Oracle-related Usenet newsgroups.
Jonathan Lewis
Last modified 2005-04-15 09:32 PM