Monthly Archives: April 2010

The Friday Fragment

It’s Friday, and time again for a new Fragment: my weekly programming-related puzzle.

This Week’s Fragment

This week’s fragment is programming-related (by a stretch), and is borrowed from a recent Car Talk episode.  A listener there wrote how he didn’t get a programming job partly because he failed to solve the following interview puzzle within the allotted five minutes:

You have a four ounce glass, a nine ounce glass and an unlimited supply of water, which can be poured and dumped as much as needed.  Can you measure exactly six ounces of water in as few steps as possible?

If you want to “play along”, post the solution as a comment or send it via email.   To avoid “spoilers”, simply don’t expand comments for this post.  It’s your chance to demonstrate your qualifications for a job as a programmer.  Or at least a soda jerk.

Last Week’s Fragment – Solution

Last week was another SQL challenge, where IRS programmer Owen Moore had trouble joining his Employee table to the EmployeeAddress table.  When he ran his SQL, he found that it dropped some of the employees: those who did not have addresses on file.  He doesn’t mind having multiple rows for one employee (whenever there are multiple addresses), but the IRS never wants to overlook a taxpayer.  His SQL was:

select companyid, e.employeeid, hiredate, address
from employee e, employeeaddress ea
where e.employeeid = ea.employeeid
order by companyid, hiredate

Owen’s bane was that, by default, SQL joins are inner joins, meaning that results are included only if there are matching rows in both tables.  Owen needs an outer join, so that it includes all the rows from the Employee table, even if there aren’t matching rows in the EmployeeAddress table (the EmployeeAddress column values will be null were the rows are missing).  Outer joins can be left joins or right joins, depending on the order you list the tables.  Owen reads and thinks left-to-right, so he’ll list the Employee table first and use a left join, like so:

select companyid, e.employeeid, hiredate, address
from employee e
left join employeeaddress ea
on e.employeeid = ea.employeeid

Congratulations to Spencer for quickly spotting the problem and proposing the solution.  For that, he gets an extra hour to file his quarterly estimated tax payments.

If you’ve been following along, you know this was part of a larger SQL for Owen’s “pink slip pick list” report which shows IDs and addresses for all but the 49 most senior employees in each company.  The full SQL with the left join is now:

select companyid, e.employeeid, hiredate, address
from employee e
left join employeeaddress ea
on e.employeeid = ea.employeeid
where
not exists
 (select *
 from employee ei
 where ei.companyid = e.companyid
 and ei.employeeid = e.employeeid
 and 49 >
    (select count(*) from employee eii
    where eii.companyid = ei.companyid and
    eii.hiredate < ei.hiredate))

Simon Says

There was a bit more dialog today about impersonating the DB2 instance owner.  It’s a quick way to get around controls that newer versions of DB2 and tighter Windows and network security have brought us.  The extra step is annoying, but trying to convince the system you don’t need it is often worse.

Impersonation and elevation have become the “new normal” these days.  I’ve grown so accustomed to opening “run as administrator” shells in UAC Windows (7/Vista/2008), typing runas commands in XP, and using sudo in Ubuntu that these have become second nature.  And that level of user acceptance usually translates into approval to expand the practice, rather than a mandate to remove the inconvenience.  Enhancing security usually includes putting up new barriers.

A former co-worker has often said that what we really need is software that determines whether a user’s intentions are honorable.  Perhaps then security would become seamless.  But it’s more likely that its implementation would also test our manners and fading patience.

The Friday Fragment

It’s Friday, and time again for a new Fragment: my weekly programming-related puzzle.

This Week’s Fragment

Owen Moore at the IRS needs our help again.  He wants to add mailing addresses to his “pink slip pick list” report.  Easy enough: he just added in the EmployeeAddress table to get these fields.  But when he ran it, he found that it dropped some of the employees.  He doesn’t mind having multiple rows for one employee (whenever there are multiple addresses), but the IRS never wants to overlook a taxpayer.

Upon investigation, he discovered that the dropped ones didn’t have addresses on file. Here is his SQL:

select companyid, e.employeeid, hiredate, address
from employee e, employeeaddress ea
where e.employeeid = ea.employeeid
and ...
order by companyid, hiredate

Can you help Owen correct his mistake?  The “…” is the rest of his where clause (see solution below).  Since it’s not important to the solution, it’s omitted here.

If you want to “play along”, post the solution as a comment or send it via email.  You can use the attached file for sample data.  To avoid “spoilers”, simply don’t expand comments for this post.  Owen promises to add you to the “do not audit” list if you can help.

Last Week’s Fragment – Solution

Last week’s fragment was missing from a SQL statement.  IRS programmer Owen Moore needed a report of all but the 49 most senior employees of each company.  That is, fill in the dot, dot, dots (ellipsis) here:

select  companyid, employeeid, hiredate
from employee e
where ...
order by companyid, hiredate

Fortunately, Owen had a couple strokes of luck.  First, during his lunch break, the fast food drive-through attendant asked, “would you like a correlated subquery with that?”  Not knowing what such a thing was, he said “yes”, and it turned out to be just the thing he needed.  Second, upon arriving back at work, he was greeted with a couple of emails (including one from Spencer) suggesting a SQL like the following:

select companyid, employeeid, hiredate
from employee e
where 49 >
  (select count(*) from employee ei
   where ei.companyid = e.companyid and
   ei.hiredate > e.hiredate)
order by companyid, hiredate

That got him in the ballpark, but, alas, there was a gap:  it revealed the 49 newest employees (a good thing), but not all but the 49 oldest.  Well, Owen pulled up Google Translate and found that “all but” in English loosely translates to “not exists” in SQL.  So he wrapped an outer select around it, flipped the greater than sign (to exclude older ones), and came up with the following:

select companyid, employeeid, hiredate
from employee e
where not exists
  (select *
   from employee ei
   where ei.companyid = e.companyid
   and ei.employeeid = e.employeeid
   and 49 >
     (select count(*) from employee eii
      where eii.companyid = ei.companyid and
      eii.hiredate < ei.hiredate))

By the way, the “i” suffixes on table aliases mean “inner” (so “ei” is “employee inner” and “eii” is “employee inner-inner”), just a convention.

Owen has a “make it right, then make it fast” mentality, so he’ll consider tuning/rewriting later if performance is bad.  But if you’re interested in tuning it, he attached a script to create the table, load some sample data, and run the SQL.  This script also has SQLs and data to work on this week’s fragment.

Guilt By Association

Anyone who has done a little data mining knows that simple association rules (a.k.a., market basket analysis) and decision trees can reveal some of the most strange and wondrous things.  Often the results are intuitive, which builds confidence in the techniques.  But then let it run loose and you’ll usually find some (strongly correlated) wild surprises.

Folks who fold their underwear tend to make their bed daily.  I’ll buy that.  But people who like The Count on Sesame Street tend to support legalizing marijuana – are you kidding?

Those are some of the conclusions reached at hunch.com.  This site will happily make recommendations for you on all your life decisions, big or small.  There’s no real wisdom here – it just collects data and mines it to build decision trees.  So, as with most data mining, the results are based on pragmatics and association, and they never answer the question, “why?”  Yet “just because” is usually good enough for things like marketing, politics, and all your important life decisions.

In school they made me work through many of these data mining algorithms by hand: classifiers, associations, clusters, and nets using Apriori, OneR, Bayes, PRISM, k-means, and the like.  When it got too rote, we could use tools like Weka and DMX SQL extensions.  It was, of course, somewhat time-consuming and pedantic, but it made me realize that most of these “complex data mining techniques” that seem to mystify folks are actually quite simple.  The real value is in the data itself, and having it stored in such a way that it can be easily sliced and diced into countless permutations.  (NoSQL fans: that typically means a relational database.  Oh the horror.)

Yet simple associations can be valuable and entertaining.  I’ve run enough DMX and SQLs against large database tables (housing contact management, payment, and contribution data) to find some surprising ways to “predict” things like risk and likely contributors.  But since “past performance is no guarantee of future results”, these outputs must be used carefully.  It’s one thing to use them to lay out products in a store, quite another to deny credit or insurance coverage.

American Express, Visa, and others have caught a some flack lately for their overuse of these results.  “OK, so I bought something from PleaseRipMeOff.com and you’ve found that other cardholders who shop there have trouble paying their bills.  But that doesn’t mean I won’t pay my bill!  Don’t associate me with those guys!”  Well, associate is what data mining does best.  And, like actuarial science, it’s surprisingly accurate: numbers don’t lie.  But companies must acknowledge and accommodate exceptions to the rules.

Meanwhile, data mining will continue to turn wheels of business, so get used to it.  Just don’t let anyone know that you like The Count.

The Friday Fragment

It’s Friday, and time again for a new Fragment: my weekly programming-related puzzle.

This Week’s Fragment

This week’s puzzle is a new type of challenge: SQL coding.  And since taxes and health care reform are all the buzz this April 16, that will be our backdrop.

Owen Moore, fearless programmer for the IRS, would like to help small businesses cope with certain health care reform provisions.  In particular, he’ll provide free reports to companies showing who to fire to get below the 50-employee limit by 2014.  Fortunately, he has an Employee database table populated from recent tax filings.  It has, among other fields, EmployeeID (SSN, char(9)), CompanyID (EIN, char(9)), and HireDate (a date).  Since last-in-first-out seems reasonable, his “pink slip pick list” will show all but the 49 most senior employees in each company.  That is,

select companyid, employeeid, hiredate
from employee e
where …
order by companyid, hiredate

Can you provide the missing fragment (fill in the “…”) to complete Owen’s SQL?

If you want to “play along”, post the SQL as a comment or send it via email.  To avoid “spoilers”, simply don’t expand comments for this post.  Owen will be owing you a big favor.

Last Week’s Fragment – Solution

Last week’s puzzle required helping Mark Duke recover his forgotten Linux password.  We had his /etc/shadow entry and knew that he was a “roller”: someone whose password is simply his user ID followed by a number (two digits in this case).

The /etc/shadow password format is well documented; even Wikipedia covers it.  The “$6” at the beginning indicates that it uses SHA-512 hashing with the salt following.  The unix crypt(3) function makes easy work of this, especially since there are wrappers in just about every language imaginable.  I chose to code it in PHP:

for ($i=1; $i<99; $i++) {
   $attempt = sprintf("%s%02d", $userid, $i);
   if (crypt($attempt, $salt) == $password) {
      echo "Password is: " . $attempt;
      break;
   }
}

The password is markdupe42.  42: what else?

My son, Spencer, coded in Python, and posted his solution (see comments in last week’s post) shortly after he read the fragment.  Now you see what I’m up against: this guy can dump your LM hashes and dictionary-crack your Windows passwords in no time.

I commend readers for not just posting “sudo apt-get install john”.  Yes, there are tons of programs like John the Ripper for quickly cracking passwords, with no coding required.   This reinforces the need to choose strong passwords (likely not in any password dictionary) and use different passwords for different sites.  Frequently changing passwords is really no help, as we learned this week.

Authenticate This

I had a good lunch today with a friend who wanted to quickly set up simple (yet strong) authentication on a Tomcat web server using his own login page.  Since forms authentication is built into all J2EE web servers (and ASP .NET servers, for that matter), it’s quite easy.

In summary, the steps for Tomcat are:

  1. Add security-constraints to WEB-INF/web.xml to specify protected resources / folders.   Also include auth-constraints and security-roles for access.
  2. Create user and user role tables and configure the JDBC realm in server.xml.  Or, simply start with defining users directly in tomcat-users.xml; you can always add the database later.
  3. Create the login and login error JSPs, pointing to them from the login-config section of web.xml.  Remember to include the required form element names in the JSP (j_security_check, j_username, j_password, etc.).  Also note that these pages can’t use style sheets and other external files, so you have to (redundantly) embed style information directly into the JSP.

By default, all traffic (including login passwords) isn’t encrypted, so this should only be used with SSL/TLS encryption in place.  That means installing a digital certificate, which is also fairly easy.  That is:

  1. Purchase an SSL certificate.  For initial testing, you can create a self-signed cert using keytool, included with JSSE.
  2. Edit server.xml and enable the SSL (port 8443) connector.  The commands are already present, just un-comment them.
  3. If the server is local, re-start Tomcat, open your browser, and access your site using the https://localhost:8443 URL.  Look for the browser cues for a secure site: padlock icon, green or yellow address bar, etc.

You may eventually switch to more sophisticated methods, like integrating with external security systems for single sign on (e.g., using SAML).  But the simple steps above will get you going quickly with basic, unbreakable authentication.

Isolation Level Levels

I got a question today about whether CS (cursor stability) really was the default isolation level for dynamic SQLs from a DB2 CLI application.  The short answer is, “yes”, but that can be overridden at many different levels.  So many that I thought of Shrek: onions have layers, ogres have layers, DB2 has layers.  Here are just a few:

  • In db2cli.ini, using the TXNISOLATION keyword.  It does require work to get it to apply to your connections.
  • At the connection level, by sending “set current isolation”, or setting the SQL_TXN_ISOLATION connection attribute (SetConnectAttr API)
  • For static SQLs, in the access plan (the bind).  For the DB2 CLI packages, the package naming convention includes the expected isolation level, and you can verify with: select pkgname, isolation from syscat.packages where pkgname like ‘SYS%’
  • At the statement level, using the with keyword.

There are further settings to tweak the semantics of some isolation levels.  For example, DB2 9.7 offers things like the cur_commit db config parameter and db2_evaluncommitted registry variable to change CS behaviors.

So with all these knobs and overrides, it’d be nice if an application could query its actual effective isolation level.  Running “values(current isolation)” only returns a value if “set current isolation” is used.  Also, it would be nice if dynamic SQL snapshots included the isolation level, or if db2exfmt consistently showed the correct level.  Lacking this, you have to combine these outputs with db2pd results.  That’s a lot of layers to peel back, which could make even an ogre cry.

Back In Time

A common problem with CVS is that it doesn’t preserve file modification timestamps.  For normal source control, preserving or relying on these timestamps is neither needed nor recommended.  But there are plenty of cases where this is a legitimate requirement, such as storing non-source files used directly by customers.  Rough partial workarounds are possible in CVS (such as “import -d” and Entries file hacking), but these are incomplete, kludgy, and problematic.

So, a simple solution I used today is to generate a script that can put back the original timestamps:

find . -printf ‘touch -c -t %TY%Tm%Td%TH%TM “%p”\n’

I checked the generated script into CVS alongside the files so it can be re-run and updated as needed.

The Friday Fragment

Since it’s spring break (and I’ll be kayaking tomorrow), this week’s Friday Fragment comes to you a day early.

This Week’s Fragment

Mark Dupe has forgotten his Linux password.  But he has done two (insecure) things that should make it easy to hack his way back in: 1) his password is always his user name, followed by a two-digit number (he “rolls” it), and 2) he has a readable backup of his /etc/shadow file.  Here is the entry for his account:

markdupe:$6$mLo3uqlc$EM4O2X5zaUo.zWzZW2Ksz.IZyxKL6cz0DIjd.wJB3s.6uuKQ7K7/
3cSz3eEL3D0VXzgbmtxSlkeoZLGYMIp89/:14706:0:99999:7:::

Can you write code or a script to determine his password?

If you want to “play along”, post the password and/or code as a comment or send it via email.  To avoid “spoilers”, simply don’t expand comments for this post.  Mark will thank you, at least until someone else hacks into his account.

Last Week’s Fragment – Solution

Last week’s puzzle required correcting a check routing transit number (374088048) and/or credit card number (4754 2700 7476 257x) from a torn slip of paper handwritten by Pierre Lefebvre.  I received some excellent responses on this; see the comments for last week’s post.

As Joel Odom pointed out, you can plug in possible digits until you get a valid checksum.  Credit card numbers use a mod 10 checksum (the Luhn formula), where you double alternate digits, add the digits of the products, and then add these to the other digits.  The total should be evenly divisible by 10.

For 4754 2700 7476 257x (where x is the missing digit), the alternate “doubled” digits are:

(4*2), (5*2), (2*2), (0*2), (7*2), (7*2), (2*2), (7*2)
or:
8, 10, 4, 0, 14, 14, 4, 14

adding the digits of the products:
8 + 1+0 + 4 + 0 + 1+4 + 1+4 + 4 + 1+4 = 32
adding in the alternate (“un-doubled”) digits gives:
32 + 7 + 4 + 7 + 0 + 4 + 6 + 5 = 65

To make this an even multiple of 10, that last missing digit must be 5 (65 + 5 = 70), so the credit card number is 4754 2700 7476 2575.

The routing transit number was a bit trickier because there wasn’t a missing digit, and I didn’t specify exactly what was wrong.  But routing transit numbers follow strict rules; for example, the first two digits have to be within certain ranges that identify the Fed district, credit unions/thrifts, special types of checks, etc.  As Joe Richardson pointed out, Wikipedia is right on this one, and from their article you can determine that a valid RT can start with 3, but cannot start with 37.

To find the correct second digit (3x4088048), we use a routing/transit mod check.  It’s similar to the credit card algorithm, except that it uses weights: 3, 7, 1, repeating (many check old-timers know 37137137 by heart).  Wikipedia and IRS publication 1346 document this, among other places.  Here, too, when running a mod check against all the digits, the remainder must be evenly divisible by 10. So, we have:

((3*3) + (x*7) + (4*1) + (0*3) + (8*7) + (8*1) + (0*3) + (4*7) + (8*1)) % 10 = 0
or
((9 + 7x + 4 + 0 + 56 + 8 + 0 + 28 + 8 ) % 10 = 0
or
((7x + 113) % 10 = 0

Plugging in values for x, we see that

(((7 * 1) + 113) % 10 = 0

So the second digit should be 1, and the correct routing/transit is 314088048.

Kudos to Joe Richardson for correcting the routing/transit, and further determining that this belongs to the memorable Alamo Federal Credit Union.  His Rescue 5 product can correct these and other account problems automagically.  Joe also determined that the credit card number is a rebate card issued by MetaBank.  Right again: it’s an old rebate card of mine that I’ve used up.

As Joe pointed out, there are good implementations of both check digit algorithms readily available on the net, in almost every language imaginable.  Even Wikipedia has several, but be careful: their routing/transit code is wrong.  It seems Wikipedia is continuing their tradition of posting bad code; see another example I pointed out in an encryption research paper.

One big question remains: how did Inspector Lestrade solve this so quickly in his head?  Frankly, I think he was bluffing about the Visa number.  That calculation is simple, but usually does require pencil and paper to get it right.  We would all be very impressed, however, if he memorized all the routing/transit rules and could do weighted mod checks in his head.

Well, Lestrade had an advantage: he recognized from the name (Pierre Lefebvre) and handwriting that the second digit was a 1, not a 7.  In handwritten French, ones (1s) look like sevens (7s), which is why the French put an extra line through sevens to distinguish them.  From the Sherlock Holmes stories, it’s questionable whether Lestrade even understands the French language, but he certainly would have recognized the handwriting.

The Last Step

I finally finished my bridge over the creek (which, lumber-wise, was closer to a 28′ x 5′ deck).  As promised, here’s the final photo.

Since I started it New Year’s weekend, this project wasn’t exactly a model of speed.  The day job, weather, soccer, family, and other activities often chopped up my work into short, sporadic intervals.  But it was a nice project.

BTW, it passed Tina’s “level test,” although a few other things bug me.  Oh well, “it’s a bridge in the woods.”

Maybe It’s Just a Fool’s Errand

I’ve been whining for some time about how we need “a unified blend of stateless operation, classic resource control, and higher-level concurrency semantics.”  That is, we need some way of making all those parallel programming building blocks easily fit in programmers’ heads: functional programming, lambdas, closures, futures, async callouts, delegates, TBB, and, if we must, semaphores and threads.  That would go a long way toward more widespread utilization of all those cores we’re now getting.

Larry O’Brien has answered the call, at least for futures. And it follows the metaphor of our highly-leveraged financial system, so even I can understand it (understand: yes, trust: no).  Bummer that scarcity really does exist in economics and CPU cycles, and this is just an April Fools’ spoof.

Oh well, at least we can continue to hope, while laughing (or whining) about it.

The Friday Fragment

It’s Friday, and time again for a new Fragment: my weekly programming-related puzzle.

This Week’s Fragment

“I knew I couldn’t trust that guy!”, exclaimed Joe the Plumber, having just returned from selling left-handed monkey wrenches at the pipe fitters’ convention.  “I sold a wrench to this guy, Pierre Lefebvre, who didn’t have enough cash, nor have his checkbook or credit card with him.  So I had him write his account information for both on a sheet of paper and give it to me.  The checking account information is bad (my bank says that the routing number, 374088048, is invalid), and the paper is torn such that I don’t have the last digit of the credit card number, just 4754 2700 7476 257.”

Inspector Lestrade took one look at the paper and responded, “I can tell you the correct checking number or credit card number, or both.  Which would you like?”

How did Lestrade do it?  Can you determine the correct numbers?

If you want to “play along”, post a solution as a comment or send it via email.  To avoid “spoilers”, simply don’t expand comments for this post.  For an extra challenge, try writing code to correct the numbers or at least prove that the numbers you chose were valid.  Maybe one day a plumber will thank you.

Last Week’s Fragment – Solution

Last week’s puzzle was “borrowed” from a recent Car Talk puzzler; here’s the gist:

You have 13 bottles of wine, and have been tipped off that one of them is tainted with a deadly poison: so deadly that one drop can kill a person or large animal within 24 hours.  With the help of four lab rats, how can you determine which one contains the fatal potion within one day?

The solution requires giving some rats wine from multiple bottles, and since the poison is so deadly, a little dab’ll do ya’ (or do in a rat).  Joel proposed a “divide and conquer” or “binary search” approach, which is nice and elegant and does the job (it’s also the solution I first thought of when I heard this on Car Talk).  But there’s only time for one round of testing, so you’ll have to throw all four rats at the problem at once.

Thinking in binary is the right approach.  Number the bottles 1-13 and represent each number in binary: 0001 through 1101.  Assign each of the four rats to one of the binary “positions”; let’s call them “A” in the one’s position, “B” in the two’s position, “C” in the four’s position, and “D” in the eight’s position.  And dole out the wine based on the binary representation of each bottle number.  For example, bottle number 1 goes to rat A, bottle 2 to rat B, bottle 3 (binary 0011) to rats A and B, and so on up to bottle 13 (binary 1101) given to rats A, C, and D.  The rats that kill over will identify the bottle number.

Wayne posted this solution and should probably get a bottle of poisoned Quivira wine as a prize.  Both Wayne and Joel noted that we could take on up to 16 bottles of wine with this solution, but I went with Car Talk’s lucky thirteen to keep it interesting.

That No HLL Thing

Probably the worst thing about high level languages (HLL) is that they are so good in what they are doing. Good enough to conquer the entire market on programming and hold it for decades.

Wait! That is a bad thing? How?

It is a bad thing because high level languages are appropriate for a wide range of tasks, but not for every task. Yet it is exactly that that caused them to be used in contexts where they are not appropriate. In the last month alone, my strong recommendation for two different clients was that they need to switch to assembly language because it would greatly simplify the work that they need to do.

That met with some (justified) resistance, predictably. People think that high level languages are the way to write software. I decided to write a series of blog posts about the topic, trying to explain why you might want to work with a low level language.

High level languages have the following properties:

* Standardization
* Simplification
* Optimization
* Ease of trainification

Just about any of the No HLL approaches give up on some of those properties., usually, it gives up on all of those properties. But think about how useful a HLL is, how flexible it can be. Why give it up?

Indeed, the most common reason to want to move from a high level language is running into the HLL limitations. In short, high level languages don’t scale. Actually, let me phrase that a little more strongly, high level languages cannot be made to scale.

The problem is inherit into the basic requirements of a high level language, it must be compiled, to handle things like converting text to machine language, etc. The problem, however, is that trying to scale a high level language system over a set of coding styles. At that point, you run head on into the Abstraction Penalty, which state that if performance is your absolute requirement, you need to give up on either optimization or ease of coding.

In most high scaling environments, it is not feasible to give up on either option, so high level languages are out. That leaves you with the No HLL options, I am going to dedicate a post for each of the following, let me know if I missed something:

* MASM
* Gas
* Typing machine code into a hex editor
* Typing machine code into notepad or vi

Other languages, namely PL/S and inline assembly, exists. PL/S suffers from the same problem regarding the Abstraction Penalty as high level languages, and inline assembly is basically a special case of No HLL.

But seriously now… this post is, of course, an April Fools Day joke: a spoof on (and transliteration of) That No SQL Thing at Ayende.com.  (You can follow the link to the post, or just Topeka it).  Oren and other NoSQL advocates do make some good points and offer very good advice.  But be careful out there, especially when it comes to such architecture decisions, so that you pick the best tool for the job: SQL or NoSQL.  And don’t get fooled again.