Daily Archives: May 10, 2010

Sorting it Out

In prior posts, I described some of the benefits of DB2’s Self-Tuning Memory Manager (STMM), along with a few caveats.  One caution was around automated sort heaps, since sort requirements can often be unpredictable and confusing.

High-volume OLTP systems can, at times, experience heavy loads of small sorts or sort-like activity (hash joins, grpby’s, etc.).  It’s not uncommon for many of these to overflow (spill), or for a sub-optimal access plan to arise, even when more than enough memory is available for sorts.  Like the proverbial bad check writer, you’re left thinking, “I can’t be overflowing, I still have memory left!”

Sort overflow symptoms include high I/O and CPU utilization, poor response times, high sort overflow snapshot counts, long sort times, and large numbers of rows written on read-only SQLs like SELECTs.  Sort heap tuning is often one of the “last frontiers” in performance work (after fixing low-hanging fruit like SQLs, indexes, and bufferpools), and can be a stubborn one.

From my experience, most sort overflow problems in DB2 9.x fall into these categories:

  1. SQL.  In high-volume OLTP systems, it’s important to avoid sorting as much as possible.  Developers usually recognize that order by clauses require sorts, but can miss more subtle uses of the sort heap.  Count, distinct, and group by clauses also often require sort heap space, as do hash joins.  Statement snapshots and explains will reveal these.  The first step should be to try to rewrite problem SQLs to eliminate the sorts.  But wherever that isn’t possible, try step 2:
  2. DDL.  Adding or changing indexes and clustering may eliminate sorts.  If an existing index is used but sorting still occurs, check to see if adding index or include columns will help, and verify that allow reverse scans is specified when needed.  Again, the only way to know for sure is to measure by snapshots and explains.  In some cases, MDCs can help, but check it first by “doing the math” or running db2advis.  Sometimes the problem is as simple as outdated statistics (among other things, outdated statistics can cause DB2 to request too small a sort heap), so make sure your runstats is occurring as frequently as needed.
  3. Configuration.  And here we come to the STMM benefits and caveats.  You should “go automatic” with sort heap sizes, but be prepared to monitor and adjust as needed.

STMM only tunes shared sort memory, so to enable self-tuning sort memory, you must:

  • Set the instance (DBM)-level sheapthres to zero (0).  You can verify this with: db2 get dbm cfg  | grep -i sheap
  • Set the database-level sheapthres_shr (total amount of sort memory) and sortheap (memory available for each sort) to automatic.  You can verify this with: db2 get db cfg for <database> | grep -i sort

In some cases (such as when you’re experimenting and don’t want to wait on STMM’s tuning cycles), you may want to set an initial value for sortheap.  To do this, specify it alongside the automatic parameter, like so: db2 update db cfg using sortheap <size> automatic immediate.   But the initial value matters little after STMM “warms up” and then saves its settings.

When monitoring a database with sort overflow problems, keep an eye on the percentage of overflows (under 3% is a good target) and on the current sortheap and sheapthres_shr sizes.  You can view these with the snapshot monitors (db2 get snapshot for all on <database> | grep -i sort) and database configuration (db2 get db cfg for <database> show detail | grep -i sort).   But you may find, as I often have, that frequent, small overflows are occurring even when the sort memory areas have stretched to more than enough space.

Why this happens is a mystery to me.  A parallel symptom I often see is that the sort heap size reported in explain (db2exfmt) outputs is often far smaller than the current sortheap size.  At this point, the only choices are to leave sort heaps at automatic and accept some overflows, or abandon automatic and set a fixed size.  When setting large fixed sizes, remember that an overflowed sort writes the entire sort heap to tempspace.  So, a very large sort heap may coax the optimizer into relying more on sorts, only to have some of these very large sorts spill everything.

IBM is considering allowing ceiling and/or floor values for certain automatic parameters.  I do hope this gets added in a future DB2, and that sort heaps are among the first that can be configured this way.  Until then, DBAs will be left to sort this out on their own.