Tag Archives: Parallel Programming

Node.js

A lingering sense of abandonment dating back to Netscape’s LiveWire leaves me leery of server-side JavaScript (SSJS).  Not that it’s a bad idea; it just never gained traction.  JavaScript as a language is great but, by design, it lacks built-in I/O support and other things required for writing servers.  And with no standard platform to fill these holes, JavaScript remained relegated to sandboxes, mainly browsers and system extensions.

That’s all been changing as SSJS proponents have rallied around Node.js, a toolkit for easily building scalable event-driven servers running atop Google’s V8 JavaScript Engine.  It was easy to ignore Node’s initial hype, but when I found myself needing to serve up some new REST APIs, I thought I’d check it out.  Besides, my AJAX calls can point anywhere, so there was really no risk in trying.  If it didn’t work out, I could always abandon Node and fall back to something more conventional, maybe even EventMachine.

Getting started was quick and easy.  Cloning and building takes only a few minutes; even faster, binaries are just a sudo apt-get install away.  There are many online tutorials, so I worked through the obligatory web server and other examples.  There’s not a lot going on in my services, so I couldn’t factor them into many event-driven/parallel fragments.  But I quickly got a feel for doing things the Node way: spinning off handlers, writing callbacks, and even running multiple node processes.  With help from Node’s non-blocking libraries, that single-threaded event loop wasn’t a bottleneck.
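
For flavor, the obligatory web server is little more than a callback handed to the standard http module.  This is my own minimal sketch, not code from my services (the port and payload are arbitrary):

     // Minimal event-driven server: one callback, no threads to manage.
     var http = require('http');

     http.createServer(function (req, res) {
       // This callback runs on the single event loop for every request;
       // as long as nothing here blocks, that loop stays fast.
       res.writeHead(200, {'Content-Type': 'application/json'});
       res.end(JSON.stringify({hello: 'world'}));
     }).listen(8124);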

It didn’t take long to complete my work, so I quickly got to the question of deployment.  Sure, I have VMs out there where I can run whatever I want, but I needed this running under my jailed hosting service.  The awesome Heroku service now supports Node, but I stopped short of rolling it out there.  So I’m keeping it local for now as I build out the other pieces.

Node has come a long way fast and appears to have a bright future.  It has a rich and growing ecosystem of basic modules and frameworks for web apps/MVC, content management, blogs, comet, etc.  Time will tell how pervasive Node becomes, but for shops wanting JavaScript all the way down, it’s a very nice platform choice.

The Friday Fragment

It’s Friday, and time again for the Friday Fragment: our weekly programming-related puzzle.

This Week’s Fragment

There’s no new fragment for this week.  The Friday Fragment will take a break while a couple of interesting projects keep me busy.  But somehow, some way, some Friday, it’ll be back to remind us all what just a fragment of code can accomplish.

Last Week’s Fragment – Solution

Last week’s challenge was to outsmart a call center math PhD who was just trying to get rid of me for a while:

Write code to do ten threaded runs of the basic Leibniz formula to calculate pi to nine correct digits (one billion iterations) in significantly less time than ten successive runs.

I offered my single-threaded Ruby implementation as a starting point.

Threading this in Ruby was quick: just change the single-threaded (successive) calls to this:

     # Fork ten billion-iteration runs as threads; the joins at the end
     # wait for all ten to finish.
     threads = []
     10.times do
       threads << Thread.new do
         puts Pi.new(1000000000).calculate
       end
     end
     threads.each { |thr| thr.join }

The updated Pi class is here.
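
Since that link doesn’t carry the class into this page, here’s a minimal sketch consistent with the Pi.new(1000000000).calculate calls above (mine, reconstructed from the Leibniz series, not necessarily the original’s exact code):

     # Leibniz series: pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...)
     class Pi
       def initialize(iterations)
         @iterations = iterations
       end

       def calculate
         sum  = 0.0
         sign = 1.0
         @iterations.times do |k|
           sum += sign / (2 * k + 1)
           sign = -sign
         end
         4.0 * sum
       end
     end

After a billion terms, the alternating-series error is on the order of 10^-9, which is where the nine-correct-digits requirement comes from.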

I added some timing code and compared this to the single-threaded version, both running on my quad-core machine.  A typical single-threaded run under the standard VM took over 166 minutes.

As I wrote before, multi-threaded code doesn’t buy much when run under the standard (green threads) VM.  It’s easy to see why – it doesn’t keep my four CPUs busy.

And so it actually ran slower than the single-threaded version: 185.9 minutes.

To get the real benefits of threading, we have to run in a native threads environment.  The threaded version running under JRuby kept all four CPUs busy until complete.

This completed in 23.4 minutes, leaving me time to call back before the end of my lunch break.

This simple fragment does a nice job of demonstrating the benefits of native threads and multi-core machines in general, and JRuby in particular.

Threadspeed

Among all the measures used to compare languages and platforms, parallel processing doesn’t often top the list.  True, in most programming tasks, multiprocessing is either handled under the hood or is unnecessary.  Machines are usually fast enough that the user isn’t waiting, or the thing being waited on (such as I/O) is external, so asynchronous call-outs will do just fine.

But occasionally you really have to beat the clock, and when that happens, good multi-thread support becomes a must-have.  Often in this case, green threads don’t go far enough: you really need native OS threads to make the most of those CPU cores just sitting there.

Such was the case last night with a data load task.  This code of mine does its share of I/O, but also significant data munging over some large data sets.  Running it against a much smaller subset of the data required nearly 279 seconds: not a good thing.  Fortunately, I had coded it in Ruby, so splitting up the work across multiple worker threads was fairly easy.  Elapsed run time of the threaded version: 294 seconds.  Oops, wrong direction.

The standard Ruby runtime uses green threads, and I could see from Task Manager that only one CPU was busy.  So my guess is that all the threading did was add internal task-switching latency to a single-threaded, CPU-intensive process.
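
The fan-out itself really was the easy part.  Stripped of the application details (records and munge here are placeholder names, not my real code), the pattern was essentially:

     # Split the data set into one slice per worker, then join them all.
     WORKERS = 4
     slice_size = (records.size / WORKERS.to_f).ceil

     threads = records.each_slice(slice_size).map do |slice|
       Thread.new { slice.each { |row| munge(row) } }
     end
     threads.each(&:join)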

But JRuby now uses native threads, so I downloaded the latest version and ran my code (unmodified!) against it.  Elapsed time: 20 seconds, keeping all 4 CPUs busy during that time.  Just wow. Since that’s far better than a 4x improvement, I suspect JRuby is generally faster across the board, not just for threading.

I believe dynamic languages are superior to static ones, but too many dynamic language environments lack robust native thread support.  Fortunately, JRuby has native threads, giving a quick path to parallel programming and threadspeed when it’s needed.

Go For It

I’ve been exploring new concurrent programming approaches lately, and decided to give Google’s new Go programming language a try.  It’s fast, straightforward, and a good fit wherever you might use C or C++, such as systems programming.  It borrows a lot from C (including syntax, pointers, and the gcc compiler), but adds significant improvements, including garbage collection and its own flavor of coroutines called goroutines.  “Goroutines” is certainly clever, but I’ll never understand why Google gave its creation the decidedly ungoogleable name of “go” (hence the alternate name, golang: so you can find it).

To get started, I had to check out sources from Mercurial and build them, but the instructions were clear and complete (“cookbook”, as we say).  They were written for Linux (complete with sudo apt-get installs for gcc, bison, and other pre-reqs), so I used Ubuntu.  I suppose it’s possible to develop under Windows with Cygwin, but I didn’t try it.

The emphasis on clean, lightweight, and terse code is clear throughout.  For example, it takes a simple approach to common OO mainstays: it uses structs, types, and interfaces instead of classes and inheritance (as in C#, structs can have methods); new and make allocators with garbage collection underneath; and a built-in map syntax instead of dictionary or hash table classes.  It provides syntactic sugar for common shorthands, such as := to declare and initialize a variable in one step, permission to omit semicolons and parentheses, and code-reducing conveniences like returning multiple values from a function.  It uses Java-style packages and imports for organizing code and supports Python-inspired slices for array references.
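
A few lines of Go (my own toy, not from any official sample) show most of these in one place:

     package main

     import "fmt"

     // Structs carry methods directly; no class declaration in sight.
     type point struct{ x, y int }

     func (p point) sum() int { return p.x + p.y }

     // Multiple return values, declared right in the signature.
     func divmod(a, b int) (int, int) { return a / b, a % b }

     func main() {
         p := point{3, 4}                // := declares and initializes
         ages := map[string]int{"go": 1} // built-in map, no class needed
         firstTwo := []int{5, 2, 9}[0:2] // Python-inspired slice
         q, r := divmod(p.sum(), 2)
         fmt.Println(p, ages, firstTwo, q, r)
     }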

Goroutines are surprisingly easy to use.  Any function call can be invoked as a goroutine (forked) simply by prefixing the call with the keyword go.  The forked function can even be an anonymous inner function, much like a block.  Channels provide the communication and rendezvous vehicle (like a future and a queue) via a unique and simple syntax.
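
A tiny sketch (again mine, with arbitrary numbers) shows the whole idea: prefix a call with go to fork it, then rendezvous on a channel:

     package main

     import "fmt"

     func main() {
         results := make(chan int)

         for i := 1; i <= 3; i++ {
             n := i // capture a per-iteration copy for the goroutine
             go func() {          // forked with the go keyword
                 results <- n * n // the channel send is the rendezvous
             }()
         }

         for i := 0; i < 3; i++ {
             fmt.Println(<-results) // receive blocks until a worker sends
         }
     }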

There’s a lot missing from the language (much of that deliberate, like classes and inheritance), and it lacks maturity and production readiness.  But it’s easy to appreciate how its primary inventions (goroutines, channels, maps, slices, return lists, and implied interfaces) work together to form a clean, cohesive toolbox for fast concurrent programming.  While I don’t foresee using go for “real work”, I hope to see it influence other environments which often make parallel programming harder than it has to be.

Maybe It’s Just a Fool’s Errand

I’ve been whining for some time about how we need “a unified blend of stateless operation, classic resource control, and higher-level concurrency semantics.”  That is, we need some way of making all those parallel programming building blocks easily fit in programmers’ heads: functional programming, lambdas, closures, futures, async callouts, delegates, TBB, and, if we must, semaphores and threads.  That would go a long way toward more widespread utilization of all those cores we’re now getting.

Larry O’Brien has answered the call, at least for futures. And it follows the metaphor of our highly-leveraged financial system, so even I can understand it (understand: yes, trust: no).  Bummer that scarcity really does exist in economics and CPU cycles, and this is just an April Fools’ spoof.

Oh well, at least we can continue to hope, while laughing (or whining) about it.