
Market Games

Recent record highs have focused a lot of attention on the stock market.  The broad market rise is largely due to Fed actions (quantitative easing and near-zero interest rates), which have created lots of excess cash with nowhere else good to put it. It’s a risky solution that props up markets while inflation is delayed.

But what about individual stocks? In this rising tide market that can gloss over things, how do you better discern individual winners? Of course, company metrics (fundamentals, earnings, balance sheets, etc.) and movers (news and innovation) are the mainstays. Technical analysis can be helpful, but that tends to focus on surface effects. Can big data look behind the scenes?

Just as tons of consumer market data now drive product marketing decisions, the wealth of available corporate stats increasingly influences stock buy and sell decisions, sometimes to a fault.  In this data mining era, we’re much better at correlation than causation, but that’s often good enough.

The individual investor is perhaps the only truly random walk (or uncertain walk) left in the stock market. Since prices are most influenced by large holders and program trades, movements can be partly predicted by comprehensive mathematical models of the big players and their trading strategies. With enough data and processing power, it’s possible to run rich behavior models in predictive mixed-strategy games to forecast prices and actions. There’s been some interesting research in this area, and I think we’ll see more. At least while the current bubble continues to grow.

Guilt By Association

Anyone who has done a little data mining knows that simple association rules (a.k.a. market basket analysis) and decision trees can reveal some of the strangest and most wondrous things.  Often the results are intuitive, which builds confidence in the techniques.  But let them run loose and you’ll usually find some (strongly correlated) wild surprises.
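For the curious, here is a minimal sketch of how a single association rule gets scored.  The survey "baskets" and trait names are made up for illustration (they are not real data from any site), and the three measures shown (support, confidence, and lift) are the standard ones, not necessarily what any particular tool reports.

```python
# Hypothetical survey answers: each set is one respondent's "basket" of traits.
baskets = [
    {"folds_underwear", "makes_bed"},
    {"folds_underwear", "makes_bed", "likes_the_count"},
    {"makes_bed"},
    {"folds_underwear", "makes_bed"},
    {"likes_the_count"},
    {"folds_underwear"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_stats(antecedent, consequent, baskets):
    """Score the rule 'antecedent -> consequent'.

    Returns (support, confidence, lift). Assumes both items appear in
    at least one basket, so neither denominator is zero.
    """
    both = support({antecedent, consequent}, baskets)
    confidence = both / support({antecedent}, baskets)
    # Lift above 1 means the two traits co-occur more often than chance.
    lift = confidence / support({consequent}, baskets)
    return both, confidence, lift


sup, conf, lift = rule_stats("folds_underwear", "makes_bed", baskets)
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.3f}")
```

Note that nothing in the arithmetic knows or cares why the two traits go together; it only counts how often they do.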

Folks who fold their underwear tend to make their bed daily.  I’ll buy that.  But people who like The Count on Sesame Street tend to support legalizing marijuana – are you kidding?

Those are some of the conclusions reached at hunch.com.  This site will happily make recommendations for you on all your life decisions, big or small.  There’s no real wisdom here – it just collects data and mines it to build decision trees.  So, as with most data mining, the results are based on pragmatics and association, and they never answer the question, “why?”  Yet “just because” is usually good enough for things like marketing, politics, and all your important life decisions.

In school they made me work through many of these data mining algorithms by hand: classifiers, associations, clusters, and nets using Apriori, OneR, Bayes, PRISM, k-means, and the like.  When it got too rote, we could use tools like Weka and DMX SQL extensions.  It was, of course, somewhat time-consuming and pedantic, but it made me realize that most of these “complex data mining techniques” that seem to mystify folks are actually quite simple.  The real value is in the data itself, and having it stored in such a way that it can be easily sliced and diced into countless permutations.  (NoSQL fans: that typically means a relational database.  Oh the horror.)
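To show just how simple, here is a rough sketch of OneR, the "one rule" classifier: for each attribute, predict the most common outcome for each of its values, then keep whichever attribute makes the fewest mistakes.  The rows and column names below are hypothetical, and this is my own bare-bones version, not the Weka implementation.

```python
from collections import Counter, defaultdict

# Hypothetical contact records; "contributed" is the outcome to predict.
rows = [
    {"past_donor": "yes", "region": "north", "contributed": "yes"},
    {"past_donor": "yes", "region": "south", "contributed": "yes"},
    {"past_donor": "no", "region": "north", "contributed": "no"},
    {"past_donor": "no", "region": "south", "contributed": "no"},
    {"past_donor": "yes", "region": "north", "contributed": "no"},
    {"past_donor": "no", "region": "north", "contributed": "yes"},
]


def one_r(rows, target):
    """Return (attribute, rule, errors) for the best single-attribute rule."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attr in attributes:
        # Count outcomes for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # The rule maps each attribute value to its majority outcome.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Every row outside the majority for its value is an error.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


attr, rule, errors = one_r(rows, "contributed")
print(attr, rule, errors)
```

That is the whole algorithm: a couple of counting loops.  Ties between attributes go to whichever was checked first, which is one of several reasonable choices.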

Yet simple associations can be valuable and entertaining.  I’ve run enough DMX and SQL queries against large database tables (housing contact management, payment, and contribution data) to find some surprising ways to “predict” things like risk and likely contributors.  But since “past performance is no guarantee of future results”, these outputs must be used carefully.  It’s one thing to use them to lay out products in a store, quite another to deny credit or insurance coverage.

American Express, Visa, and others have caught some flak lately for their overuse of these results.  “OK, so I bought something from PleaseRipMeOff.com and you’ve found that other cardholders who shop there have trouble paying their bills.  But that doesn’t mean I won’t pay my bill!  Don’t associate me with those guys!”  Well, associate is what data mining does best.  And, like actuarial science, it’s surprisingly accurate: numbers don’t lie.  But companies must acknowledge and accommodate exceptions to the rules.

Meanwhile, data mining will continue to turn the wheels of business, so get used to it.  Just don’t let anyone know that you like The Count.