Monthly Archives: February 2010

Spring sports, winter weather

“No winter lasts forever; no spring skips its turn.”
Hal Borland

It’s that time of year when my family’s calendar gets ahead of the climate.  Luke has been at sub-freezing soccer for weeks now with NASA and Crown, but his first “spring” tournament (the RYSA Spring Kickoff Classic, a blustery affair) made it all the more real.  Lydia and I joined the soccer fray with our first team practice last night.  And I continued my new tradition of frigid races today by running the Guns & Hoses 5K: a benefit sponsored by the Cherokee Recreation & Parks Agency, the Cherokee County Fire Department (hoses) and the Cherokee County Sheriff’s Office (guns).  The 27-degree temps and 17-degree wind chill slowed me below my goal pace, but I did end up with a second place age group finish.  It was a well-organized race with about 400 runners (more than expected), and one that I’ll look for again next year.

I’ll settle down for some more Winter Olympics viewing, to warm up after the soccer games and remind myself that it is still winter.  And I’ll patiently accept that bout of snow and freezing rain predicted for early next week.  But it’s getting time: out with the cold, in with the new!

Academic Pursuits

Like any obedient grad student, I wrote a lot of papers while recently working on my Master’s degree.  While most were admittedly specialized and pedantic (and probably read like they were written by SCIgen), a few may accidentally have some real-world relevance.  Just last week, I handed out my XTEA paper to a co-worker who was foolish enough to ask.

At the risk that others might be interested, I posted a couple of the less obscure ones where I was the sole author; they are:

The Tiny Encryption Algorithm (TEA)
The Tiny Encryption Algorithm (TEA) was designed by Wheeler and Needham to be indeed “tiny” (small and simple), yet fast and cryptographically strong.  In my research and experiments, I sought to gain firsthand experience to assess its relative simplicity, performance, and effectiveness.  This paper reports my findings, affirms the inventors’ claims, identifies problems with incorrect implementations and cryptanalysis, and recommends some solutions.

Practical Wireless Network Security
Security measures are available to protect data communication over wireless networks in general, and IEEE 802.11 (Wi-Fi) in particular.  Unfortunately, these measures are not widely used, and many of them are easily circumvented.  While Wi-Fi security risks are often reported in the technical media, these are largely ignored in practice.  This report explores reasons why.

Click on a title to access a PDF.

Thinking Inductively

It’s hard to argue with the massive scalability one can achieve with functional programming.  I’ve done just enough in Erlang, Scala, and Lisp/Scheme to appreciate the benefits.  Much of that “freedom to run” comes with stateless flows and stackless recursion.  But more than a change of language, it really requires a change of thinking.

The classic examples in functional programming languages often employ inductive techniques, since that’s where they shine and demonstrate elegance.  If you want to compute an exponent, prime, or Fibonacci series; stack a Hanoi tower; or even walk a tree (like an XML DOM), you can usually do it in a few lines of clever inductive code.  And with a functional language, poor recursive performance and stack overflows are typically not a problem (Scala being one exception).  If you’re not in a functional language and want to use the same algorithms without these risks, you’re left with the exercise of converting tail recursion to iteration.  That’s much like watching movies on your phone: it’s not hard, just much less interesting.

That’s why I dabble periodically in functional languages: to try to keep my waning inductive brain cells alive.  It sometimes pays off at work, too: I used induction just yesterday to simplify (and recurse) a clumsy and confusing method.  Inductive techniques are not the golden hammer that some would believe, but they do have their “best fit” use cases.

Given our current software crisis of how to go truly parallel (and keep busy all those cores we’re now getting), some would argue that stateless functional programming is where we should start.  Stateless is fine for tasks that can live in their own sandboxes and play with their own toys, but most solutions have to share with others.  We need a unified blend of stateless operation, classic resource control, and higher-level concurrency semantics (no threads or synchronized keywords please; that’s the assembly language of the 21st century).  More on this later.

Nike++

There are plenty of good GPS running tools for the Android; my favorites are SportyPal and the new Google MyTracks.  Both measure pace, distance, and route, and display results on Google Maps.  With them, I get the benefits of a GPS running watch, but with more features and no extra cost.

But I’ve had mixed results with GPS tracking.  Sometimes I can’t get reliable signals, even in open areas with no overhead obstructions (MyTracks can revert to determining location by cell and Wi-Fi signals, but that’s a very weak approximation).  Sometimes inaccurate map data and other things can throw distance off; that’s most noticeable when I run places where I know the exact distance of the course.  And sometimes, the truly weird happens, like the run shown at right.  The red (for “fastest”) lines depict how I left the course for a half-mile sprint through fences, houses, and rough terrain at 27.3 miles per hour (then quickly returning to the track, of course).  MyTracks reported a recent 5-mile street run as 785.16 miles in just over 39 minutes (I was ready to give up at the 784-mile mark).  And tight loops on small tracks are almost never right with GPS tracking.  Obviously, such failures get in the way of maintaining a good read on my pace.

So I decided to go with the Nike+ Sportband.  Its accelerometer technology is not dependent on GPS signals for measurement: it’s nice and simple.  That’s helpful because I don’t always like to carry my phone with me, nor fiddle with it while running.  But I do miss the automatic mapping, especially when exploring new areas.  So when running new routes, I carry both: Nike+ for accurate pace tracking, and my GPS Droid for route tracking.  Call it Nike++.

Once the new wears off, I’ll probably settle into a “just one at a time” mode: Nike+ for running and Droid GPS for walking, hiking, and kayaking.  But, for now, it’s belt and suspenders, and that often allows for some interesting comparisons.

admin_cmd can

The question came up again today: “how do I run a DB2 export through my ODBC connection?”  Before recent versions of DB2, the answer was, “you can’t.”  If you tried just running the command, DB2 would give you the classic SQL0104 “duh” error message: “…an unexpected token was found…”

That’s because administrative commands and utilities require special handling.  And before DB2 8.2, they could only be run through a command line processor session.  Programs could not use their normal database connections for things such as export, runstats, reorg, or “update db cfg.”  The alternatives were often inelegant, such as calling cumbersome utility functions like db2Export/sqluexpr or shelling out to a script.

Fortunately, the new admin_cmd stored procedure lets you run several of these commands through a normal CLI or ODBC connection, much like any SQL statement.  You just pass the command as a parameter; for example:

call admin_cmd('export to sales.ixf of ixf select * from sales')

Even if you’re not writing code, admin_cmd is useful for doing maintenance and data movement directly from your favorite tools.  Since so many programs and tools use ODBC connections, it’s a convenient and portable way of handling your DB2 administrivia.
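
Other administrative commands follow the same pattern; for instance, a runstats (the schema and table names here are just placeholders):

call admin_cmd('runstats on table myschema.sales with distribution and detailed indexes all')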

The Hacker Crackdown

It seems like only yesterday, but it’s been 20 years now since a simple bug in a C program brought down much of AT&T’s long distance network and brought on a national phreak hunt.  It was January 15, 1990: a day I’ll never forget because it was my 25th birthday, and the outage made for a rough work day.  But, in retrospect, it offers a great story, full of important lessons.

The first lesson was realized quickly and is perhaps summed up by Occam’s razor: the simplest explanation is often the most likely one.  This outage wasn’t the result of a mass conspiracy by phone phreaks, but rather the result of recent code changes: System 7 upgrades to 4ESS switches.

There are obvious lessons to be learned about testing.  Automated unit testing was largely unknown back then, and it could be argued that this wouldn’t have happened had today’s unit testing best practices been in place.

This taught us a lot about code complexity and factoring, since a break statement could be so easily misaligned in such a long, cumbersome function.  The subsequent 1991 outage caused by a misplaced curly brace in the same system provided yet another reminder.

Finally, this massive chain reaction of failures reminded us of the systemic risk from so many interconnected systems.  That situation hasn’t improved; rather, it’s much worse today.  We don’t call it the internet and the web for nothing.

I was reminded of 1990 when Google recently deployed Buzz: yet another player in our tangled web of feeds and aggregators.  These things are convenient; for example, I rarely log in to Plaxo, but it looks as though I’m active there because I have it update automatically from this blog and other feeds.  It makes me wonder if someone could set off a feedback loop with one of these chains.  Aggregators can easily prevent this (some simple pattern matching will do), but there may be someone out there trying to make it happen.  After all, there’s probably a misplaced curly brace in all that Java code.

Bruce Sterling’s book, The Hacker Crackdown, provides an interesting account of the 1990 failure, and he has put it in the public domain on the MIT web site.  If you want a quick partial read, I recommend Part I.

You’ve Got Questions…

“Who is this that darkens my counsel with words without knowledge?  Brace yourself like a man; I will question you, and you shall answer me.”
Job 38:2-3

Sometimes problems come in groups, and that seems to be the case lately among many friends and family members.  Oftentimes, our immediate reaction is to ask, “why God?”  If asked in anger, we need a perspective check.  Dave’s latest sermons on Job 38 and 39 provide that, and tell us how, comparatively, we have ostrich brains.  Give them a listen.

Haven’t I-RD Seen This?

In today’s sports action, we learned that Bank A has been returning Bank B’s re-presented IRDs (image replacement documents) as duplicates.  Bank B asked us to referee a bit, since both are customers.

It’s quite common for a collecting bank to re-present a returned item to the paying bank to “try again.”  This second presentment looks just like the first one, but with additional prior return data like return reasons and endorsements. If you ignore the additional data, it does in fact look like a duplicate transaction.  True duplicate transactions are a bad thing, but can happen with all the conversion that goes on between paper, image, IRD, and ACH.  Duplicates must be stopped, but separating potential duplicates from real ones can be tricky business.

Exactly where this prior return data shows up depends on when the item was truncated (converted from paper to electronic or image).  In this case, a paper check was converted to an electronic image (X9) for collection and return, and then converted to an IRD for re-presentment.  Because of the early conversion, there were no return reason stamps on the face of the image.  And, since it was a forward presentment IRD, there was no Region 7F to house the return reason on the front.  From looking at the IRD, the only clue that it was once returned is on the back: the return reason appearing in the (small and sideways) subsequent endorsements, after “RR”.  That’s how the standards go.

Bank A had kicked in some new aggressive duplicate detection rules that flagged re-presents as suspects.  It’s tough enough that operators had to review the images to check for returns, but they were only looking at the front image, and missing the fact that all these transactions were legitimate.

One option is to put the return reason on the face, but that would be bending the standards.  That’s because, in theory, things like this should never happen.  And so it goes when standards are involved: in theory, there is no difference between theory and practice, but, in practice, there often is.

Don’t Get CLOBbered

The subject of direct I/Os came up in an unexpected phone call today.  This was from a friend who had long ago recovered from CLOB abuse and was concerned that he had somehow fallen off the wagon again.

Often new OO developers are tempted to treat the database as little more than a file system.  With wonderful things like CLOBs, BLOBs, and long varchars, who would actually parse their objects into columns?  Why bother with ORM frameworks?  Why take the car apart to put it in the garage?

The answer, of course, lies in the system requirements.  If all you need are LOBs, you probably don’t need a relational database.  And if you need a relational database, you probably shouldn’t use LOBs.

It’s not only an important design issue, but a big performance issue as well.  In DB2, LOBs and other large objects do not go in the bufferpools.  This means every read and write is physical – DB2 must go directly to disk.  Enabling file system caching on large object tablespaces helps some.  But, even with that, heavy use of LOBs is one of the quickest ways to kill OLTP performance.
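
If you’re stuck with heavy LOB use, that caching tweak is at least a one-line change; a sketch, with a made-up tablespace name:

alter tablespace mydata_tbsp file system caching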

There is a place in this world for LOBs and other large objects, used carefully and sparingly.  But far too often they’re used for object storage shortcuts.  I’ve had to tune a few systems where CLOBs factored heavily into the design and were among the most frequently-accessed data in the system.  LOB abuse has become so common that the snapshot counters for direct reads and direct writes are among the first I check when measuring a new system.  Sometimes the fix is simple: convert the CLOB types to varchars, with for bit data, if needed. Sometimes, though, the data is so large or the process so entrenched that deeper surgery is required.

This post-op friend of mine had long ago eradicated direct I/Os in the billions, but was now seeing counts in the few hundred thousand range after several hours of heavy system activity.  A quick check of the tablespace snapshot for syscatspace found the expected smoking gun: nearly all of these were in the system catalog.
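
If you’d rather query than read snapshot output, the SYSIBMADM administrative views in DB2 9.x expose the same counters; for example:

select tbsp_name, direct_reads, direct_writes from sysibmadm.snaptbsp order by direct_reads + direct_writes desc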

Several of the system catalog tables (such as syscat.tables, columns, and indexes) do use LOBs.  They are strategically accessed, but can (especially after changes) cause some direct I/O spikes.  There are some tricks you can play to minimize this, but these direct I/Os are typically unimportant in the grand scheme of things.  Just subtract the syscatspace numbers from the DB-level totals and see if what’s left demands a closer look.  This will help you distinguish true LOB abuse symptoms from flashbacks.

Minum Data Redaction

WriteStreams.com is pleased to announce its new Minum Data Redaction (MDR) product.  MDR provides physical data security for sensitive bank and credit card information, complementing the electronic data security covered by IBM’s just-announced Optim Data Redaction product.  Used together, these products can help you achieve PCI DSS compliance with little or no coding.

IBM’s new Optim Data Redaction automatically removes account data from documents and forms.  You can get that wondrous XXXXXXXXXXX1234 credit card number formatting with little or no effort on your part (apart from buying software, of course).

Our new Minum Data Redaction product extends account number protection to the physical world, protecting the bank cards you carry.  Its super-strong rear adhesive and front opaque covering ensures that your sensitive credit card information stays protected.  It comes in a variety of colors (including duct silver and black), and our Premium version provides extra thickness to cover embossing.

But seriously now, we go to great lengths to protect electronic card information by encrypting it in stored files (56-bit DES isn’t good enough); redacting it on printed receipts, reports, and statements; and setting disclosure requirements that publicly embarrass companies who slip up.  Yet our simple payment process requires that we hand all this information over to any clerk or waiter, who usually goes off with it for a while: certainly long enough to copy it all down.  PCI DSS offers the classic false sense of security.

I was recently a victim of fraud against my Visa card.  A series of small (mostly $5) fraudulent charges hit my account over several days until I closed the account.  From what I learned, the charges were only authorized by account number and expiration date; there was no zip code verification.  I don’t know how the perps got my credit card number, but I doubt they grabbed data from a financial institution in the dark of night, nor devoted the $250,000 and 56 hours required to run an EFF DES crack against it.  It probably came from a clerk or waiter who handled my card.  My cards, like everyone else’s, have account number, expiration date, and CVV printed right on them.  Zip code isn’t there, but anyone who wants it can just ask to see my driver’s license for verification.  It’s a gaping hole.

Until credit cards gain better physical security, there is no “silver bullet.”  But banks and card companies could enlist help from their own customers.  For example, let me specify which types of charges I would allow/authorize.  It would spare consumers the hassles of disputing charges, and would save issuers the dispute processing fees and write-offs.

The Right Retool for the Job

Facebook’s publication of the HipHop transformer and runtime raised the question of whether PHP is really the right language for a web site that has scaled to massive volumes.   The same was asked of Twitter when parts of its Ruby code were rewritten in Scala.

Of course it is.

Proper language choice is based on “the best tool for the job.”  In the case of Facebook, Twitter, and many others like them, “the job” was to get a web site up fast and very quickly evolve it.  Without PHP, Facebook probably wouldn’t have been created.  Productivity, web-orientation, innovation, and even fun trumped raw performance, so PHP and Ruby won the day for those two sites.  No one would code a web site in C++, just as no one would code a real-time system in PHP.

If productivity and agility are your primary concerns, pick a language and platform that makes it fast and fun.  I recommend a dynamic, object-oriented language.  If it’s a web app, consider Seaside.  If productivity isn’t important but commodity skill sets and ubiquity are your primary concerns, you might choose Java: the COBOL of the 21st century.  Just don’t be a lemming, letting TPCI ranking become a deciding factor.

And don’t make the decision based on hearsay and perceptions of performance.  Today’s performance tuning and scaling tools (profilers, monitors, load balancers, and even static analyzers) make it easy to identify the true bottlenecks and fix those, leaving the non-time-critical majority of the system as is.  Speculative performance tuning can be counter-productive, so if it ain’t broke, don’t fix it.

HipHop is a welcome addition to the PHP universe: the ease and productivity of PHP with the speed of C++.  Hopefully it will mean that PHP developers no longer have the specter of scalability issues hanging over them.  Now, if we could just get some overdue cleanup and deprecation (eliminating redundancy in favor of cleaner object-oriented design), life would be grand.  But that’s another story.

I’d like to try out HipHop, but frankly, I don’t need to.  With the possible exception of a CiviCRM site that I pushed to the limit, my PHP sites just don’t need additional horsepower, and I certainly don’t have to worry about reducing CPU utilization on a few thousand servers.  Obscurity isn’t always a bad thing.

Reduce, Reuse, Recycle

“To innovate is a mistake ’cause there’s nothing new under the sun.”
Wit’s All Been Done Before, Relient K

I threw out a lot of code today.  It felt good.

I recently started coding some new components for a new (to me) system.  Part of the work required generating XML with well-defined XSDs.  I started with the available DOM framework which, unfortunately, did not support XSDs, only DTDs.  So I had cobbled together something of my own to try to drive the XML generation from the XSDs.  It just didn’t feel right.

Shortly afterward, I taught myself how to use the system’s very flexible and dynamic export mapping framework.  It was built primarily for fixed record formats, not XML, but could be coaxed into XML structures.  For example, the mapping mechanism supported hierarchies (tree structures) and higher-level unmapped tags, which were perfect for XML envelopes and the like.  So, with some metadata and a wee bit of recursive code, I got exactly the XML I needed.  And I tossed aside a lot of really klunky code I no longer needed.

And so it goes with programming: read twice, write once.  It often takes longer to first learn the lay of the land  before blazing your own trail, and it may appear less productive.  But significantly less code to do the same task is always a good thing.

Semi-automatic

I recently wrote about using DB2 9.x’s Self Tuning Memory Manager (STMM) to automatically configure memory sizes.  If you’re running DB2 9.5 or 9.7, automatic configuration is available for a long list of parameters.  But what if you have to support 9.1, before several settings went automatic?

This question came to me yesterday from a co-worker who was getting database heap “out of memory” errors.  This was under DB2 9.1, where automatic wasn’t yet allowed for dbheap.  As I wrote earlier, several memory areas (like the log buffers) come out of dbheap, so if you increase one of these or set one to automatic, the dbheap needs to grow to accommodate it.  If you have to support 9.1, an automatic dbheap is not an option, so you may have to find a “high enough” fixed ceiling the hard way: trial and error.

You can get DB2 to help you come up with a number in a couple of ways:

  • Run autoconfigure.  Use “apply none”; for example, “db2 autoconfigure apply none”.  See what it recommends.
  • Run the system for a while (under volume) on a DB2 9.7 install, with STMM enabled and dbheap set to automatic.  Then run “db2 get db cfg show detail” and see what it computed; it’ll be in parentheses after the word automatic.

These same approaches can also be used to determine good optional initial sizes to use along with automatic settings.  STMM remembers and stores the results of prior self-tuning activities, so once your database has been running awhile under typical volumes (for at least a few 3-minute tuning cycles), the initial sizes don’t matter much.  But, in some cases, you may want to provide initial values so that a new database install is ready to bolt into action.
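
On 9.5 and later, an initial size and automatic can be set together in one command; for example (the database name and starting value here are just placeholders):

db2 update db cfg for mydb using dbheap 2400 automatic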

Make an Object for That

“I wanted nothing else than to make the object as perfect as possible.”
Erno Rubik

I recently enhanced one of my search functions to support additional criteria.  It was originally a very modest search, accepting only an account number and amount.  But, over time, search fields and flags were added and the primary method grew to accept a very long parameter list, and then to receive a tightly-coupled dictionary of search keys and their values.  I looked at the method and thought, “what idiot wrote this thing?”  That’d be me.

It’s an easy trap to fall into.  Layering new functions atop an existing O-O system requires ongoing diligence to refactor and maintain clean design patterns.  The temptations are many; for example, standard objects like those workhorse collections can lure one away from crafting valuable new custom classes.  But there are defenses, such as a good library of xUnit test cases to support refactoring.  Good design and test-driven development pay dividends long after the original code is written.

In this case, a command object did the trick.  Have a findPersons* method that takes far too many parameters?  Create a PersonSearch or PersonQuery command/value object to house them.
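
As a minimal usage sketch (PersonSearch, its accessors, and the finder method are hypothetical names, not from any particular framework):

"Build the command object once, then hand it to a single search method."
search := PersonSearch new.
search accountNumber: '123456789'; amount: 25.00s2; includeClosed: false.
results := personFinder findPersons: search.

Adding a new criterion later means adding an accessor, not yet another method signature.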

There are many benefits to this besides cleaner code.  xUnit test cases usually come easier and are less brittle.  You can better factor behaviors such as validation and conversion onto the command object itself.  This can improve reuse across the client and server.

A basic security requirement for any rich web or client/server system is that validation occurs both at the client and at the server.  If the command object is commutable, this means you write the validation code once, to be used on both sides.  This works well with Google Web Toolkit (GWT), Server Smalltalk (SST), and similar frameworks.

This sent me on a brief witch hunt; for example, I searched for related methods with too many parameters:

MyClass methodDictionary do: [ :ea | ea parameterCount > 3 ifTrue: [ Transcript cr; show: ea printString ]].

I found and fixed a few, and toyed with a Smalllint rule for it.  You never quite end up with the perfect object, but often much closer.

The Mute Button

Today, Craig Mullins and others took Larry Ellison to task for his sweeping, unfounded claims about Oracle vs. DB2.  Ellison’s comments often invite easy parody (remember the “Fake Larry Ellisons”?).  Apart from that, such remarks aren’t valuable and can safely be ignored.

Both Oracle and DB2 are solid databases and the ongoing TPC-C competition is good for both products.  But stick to facts and evidence; this is not a topic for uninformed, biased opinions from 50,000 feet.  See Craig’s post for some facts.  And Craig refers to our friend, the Viewpointe national check image archive.

Larry did comment on something that he knows and does have direct control over.  Recent history and guidance aside, Ellison said that, rather than laying off any Sun employees, he will be hiring 2,000 more in the next few months.  Great news!  We’ll be watching.

That’s Us

Carly Simon aside, many folks often think a particular Dilbert strip “must be about them.”  Thanks to Scott Adams’ policy of freely inviting ideas from readers, sometimes it truly is.  Today’s strip is about our prior CEO’s email offer of employee discounts on Quivira wines.

I have no idea who sent it in; someone probably simply forwarded the email.

Taking Inventory

Some questions came in today from a customer’s conversion programmer who was exploring uses of various related coded columns in the system’s DB2 tables.  In this case, the quickest way was to “dive in” and run some simple count and group by queries.  For example,

select mycodecolumn1, mycodecolumn2, count(*) from mytable group by mycodecolumn1, mycodecolumn2 with ur

Sometimes it’s helpful to rollup or cube these to get sub-totals:

select mycodecolumn1, mycodecolumn2, count(*) from mytable group by rollup(mycodecolumn1, mycodecolumn2) with ur

A common problem with DB2’s standard cube implementation is that it’s hard to distinguish null values from the summary (“all”) values.  If you don’t want to see null values, simply add that to the where clause:

select mycodecolumn1, mycodecolumn2, count(*) from mytable where mycodecolumn1 is not null and mycodecolumn2 is not null group by rollup(mycodecolumn1, mycodecolumn2) with ur

If you do want to see null values, use the grouping function to distinguish:

select mycodecolumn1, mycodecolumn2, grouping(mycodecolumn1), grouping(mycodecolumn2), count(*) from mytable group by rollup(mycodecolumn1, mycodecolumn2) with ur

Zero (0) means it’s the actual column value (null or otherwise); one (1) means it’s a summary row.

Translating those coded values into meaningful descriptions is often helpful.  This can be done by joining to lookup tables (those tables that map codes to descriptions), or by using case statements.  But there are other, quicker ways.  More on that later.
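
In the meantime, here’s a sketch of those two approaches; the lookup table, code values, and descriptions below are all made up:

select d.description, count(*) from mytable t join mycodelookup d on d.code = t.mycodecolumn1 group by d.description with ur

select case mycodecolumn1 when 'A' then 'Active' when 'C' then 'Closed' else 'Other' end, count(*) from mytable group by mycodecolumn1 with ur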