Tag Archives: concurrency

Threadspeed

Among all the measures used to compare languages and platforms, parallel processing doesn’t often top the list.  True, in most programming tasks, multiprocessing is either handled under the hood or is unnecessary.  Machines are usually fast enough that the user isn’t waiting, or the thing being waited on (such as I/O) is external, so asynchronous call-outs will do just fine.

But occasionally you really have to beat the clock, and when that happens, good multi-thread support becomes a must-have.  Often in this case, green threads don’t go far enough: you really need native OS threads to make the most of those CPU cores just sitting there.

Such was the case last night with a data load task.  This code of mine does its share of I/O, but also significant data munging over some large data sets.  Running it against a much smaller subset of the data required nearly 279 seconds: not a good thing.  Fortunately, I had coded it in Ruby, so splitting up the work across multiple worker threads was fairly easy.  Elapsed run time of the threaded version: 294 seconds.  Oops, wrong direction.

The standard Ruby runtime uses green threads, and I could see from Task Manager that only one CPU was busy. So my guess is all threading did was add internal task switch latency to a single-threaded, CPU-intensive process.

But JRuby now uses native threads, so I downloaded the latest version and ran my code (unmodified!) against it.  Elapsed time: 20 seconds, keeping all 4 CPUs busy during that time.  Just wow. Since that’s far better than a 4x improvement, I suspect JRuby is generally faster across the board, not just for threading.

I believe dynamic languages are superior to static ones, but too many dynamic language environments lack robust native thread support.  Fortunately, JRuby has native threads, giving a quick path to parallel programming and threadspeed when it’s needed.

Go For It

I’ve been exploring new concurrent programming approaches lately, and decided to give Google’s new Go programming language a try.  It’s fast, straightforward, and a good fit wherever you might use C or C++,  such as systems programming.  It borrows a lot from C (including syntax, pointers, and the gcc compiler), but adds significant improvements, including garbage collection and its own flavor of coroutines called goroutines.  “Goroutines” is certainly clever, but I’ll never understand why Google gave its creation the decidedly ungoogleable name of “go” (hence the alternate name, golang: so you can find it).

To get started, I had to check out sources from mercurial and build them, but the instructions were clear and complete (“cookbook”, as we say).  They were written for Linux (complete with sudo apt-get installs for gcc, bison, and other pre-reqs), so I used Ubuntu.  I suppose it’s possible to develop under Windows with Cygwin, but I didn’t try it.

The emphasis on clean, lightweight, and terse code is clear throughout.  For example, it takes a simple approach to common OO mainstays: it uses structs, types, and interfaces instead of classes and inheritance (like C#, structs can have methods); new and make allocators with automatic reference counting; and a built-in map syntax instead of dictionary or hash table classes.  It provides syntactic sugar for common shorthands, such as := to declare and initialize a variable in one step, permission to omit semicolons and parentheses, and code-reducing conveniences like returning multiple values from a function.  It uses Java-style packages and imports for organizing code and supports Python-inspired slices for array references.

Goroutines are surprisingly easy to use.  Any function call can be invoked as a goroutine (forked) simply by prefixing the call with the keyword go.  The function itself can be inner, like a block or anonymous function.  Channels provide the communication and rendezvous vehicle (like a future and queue) via a unique and simple syntax.

There’s a lot missing from the language (much of that deliberate, like classes and inheritance), and it lacks maturity and production readiness.  But it’s easy to appreciate how its primary inventions (goroutines, channels, maps, slices, return lists, and implied interfaces) work together to form a clean, cohesive toolbox for fast concurrent programming.  While I don’t foresee using go for “real work”, I hope to see it influence other environments which often make parallel programming harder than it has to be.

Maybe It’s Just a Fool’s Errand

I’ve been whining for some time about how we need “a unified blend of stateless operation, classic resource control, and higher-level concurrency semantics.”  That is, we need some way of making all those parallel programming building blocks easily fit in programmers’ heads: functional programming, lambdas, closures, futures, async callouts, delegates, TBB, and, if we must, semaphores and threads.  That would go a long way toward more widespread utilization of all those cores we’re now getting.

Larry O’Brien has answered the call, at least for futures. And it follows the metaphor of our highly-leveraged financial system, so even I can understand it (understand: yes, trust: no).  Bummer that scarcity really does exist in economics and CPU cycles, and this is just an April Fools’ spoof.

Oh well, at least we can continue to hope, while laughing (or whining) about it.

Thinking Inductively

It’s hard to argue with the massive scalability one can achieve with functional programming.  I’ve done just enough in Erlang, Scala, and Lisp/Scheme to appreciate the benefits.  Much of that “freedom to run” comes with stateless flows and stackless recursion.  But more than a change of language, it really requires a change of thinking.

The classic examples in functional programming languages often employ inductive techniques, since that’s where they shine and demonstrate elegance.  If you want to compute an exponent, prime, or Fibonacci series; stack a Hanoi tower; or even walk a tree (like an XML DOM), you can usually do it in a few lines of clever inductive code.  And with a functional language, poor recursive performance and stack overflows are typically not a problem (Scala being one exception).  If you’re not in a functional language and want to use the same algorithms without these risks, you’re left with the exercise of converting tail recursion to iteration.  That’s much like watching movies on your phone: it’s not hard, just much less interesting.

That’s why I dabble periodically in functional languages: to try to keep my waning inductive brain cells alive.  It sometimes pays off at work, too: I used induction just yesterday to simplify (and recurse) a clumsy and confusing method.  Inductive techniques are not the golden hammer that some would believe, but they do have their “best fit” use cases.

Given our current software crisis of how to go truly parallel (and keep all those cores we’re now getting busy), some would argue that stateless functional programming is where we should start.  Stateless is fine for tasks that can live in their own sandboxes and play with their own toys, but most solutions have to share with others.  We need a unified blend of stateless operation, classic resource control, and higher-level concurrency semantics (no threads or synchronized keywords please, that’s the assembly language of the 21st century).  More on this later.