Yesterday, Scott Hayes described when it’s useful to mark a DB2 table as volatile. It’s worth considering when table data changes so frequently and wildly that it’s hard to keep statistics current and applicable, such as with those “flush and fill” tables often present in OLTP systems.
Because the DB2 optimizer looks only at statistics (not current table data) when generating an access plan, outdated statistics can cause it to choose a costly tablescan even when there’s a perfectly good index available to cover one or more sargs. It’s one of those things that’s easy to explain but hard to justify. Sometimes we developers bring this on ourselves through excessive use of literal values rather than parameter markers and bound parameters. Marking the table volatile sidesteps the problem: it tells the optimizer to favor index access regardless of what the statistics say.
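For reference, the property is set with a simple ALTER statement; the table name here is hypothetical:

```sql
-- Mark a hypothetical flush-and-fill table as volatile, so the
-- optimizer prefers index access over a tablescan regardless of
-- the current statistics.
ALTER TABLE app.work_queue VOLATILE CARDINALITY;
```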
Yet volatile can send you back into the weeds when there are multiple indexes on the table, especially multi-column indexes. When several candidate indexes can satisfy a query, the optimizer really does need the statistics to choose wisely between them; I’ve seen some really bizarre and costly access plans when it doesn’t. That’s why I’ve un-marked (not volatile) about as many tables as I’ve marked volatile. Volatile is great for single-index tables, but often not for others.
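Un-marking is the mirror image of the ALTER above (again, a hypothetical table name):

```sql
-- Reverse the setting so the optimizer goes back to costing
-- candidate indexes from statistics.
ALTER TABLE app.order_detail NOT VOLATILE CARDINALITY;
```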
So I usually prefer good old-fashioned runstats scripts instead. By using multiple scripts tailored to the workload and data flow, one can significantly reduce the runstats overhead. Besides, cron jobs and scheduled tasks are often more fruitful than betting against the optimizer.
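A tailored script can be as simple as a CLP file run from cron. This is a minimal sketch with hypothetical table names: full distribution statistics for a heavily filtered table, and cheaper index-only statistics for a table whose data barely shifts.

```sql
-- runstats_nightly.sql: run via, e.g., db2 -tvf runstats_nightly.sql
CONNECT TO mydb;

-- Hot, heavily filtered table: collect distribution and index stats.
RUNSTATS ON TABLE app.orders WITH DISTRIBUTION AND DETAILED INDEXES ALL;

-- Mostly static lookup table: refreshing index stats alone is enough.
RUNSTATS ON TABLE app.lookup_codes FOR INDEXES ALL;

CONNECT RESET;
```

Splitting tables across scripts like this, each on its own schedule, is what keeps the overhead down.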
I expect that soon enough DB2’s automatic runstats will be rich enough to eliminate scheduled scripts altogether, but it’s just not there yet. Among other things, the two-hour polling interval is an eternity for some systems. I’d like to see timing/scheduling settings added to runstats profiles, along with more frequent polling. Here’s hoping that’ll be in a future DB2 release.
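Profiles themselves are already available today: you record the options once and later runs (manual or automatic) reuse them. A small sketch, again with a hypothetical table:

```sql
-- Record a statistics profile while collecting stats...
RUNSTATS ON TABLE app.orders
  WITH DISTRIBUTION AND INDEXES ALL SET PROFILE;

-- ...then subsequent runs just replay the recorded options.
RUNSTATS ON TABLE app.orders USE PROFILE;
```

What’s missing is the scheduling half: there’s no way to attach “when” to the profile’s “how.”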