Easy multi-threading with Groovy

I finished writing an ETL process today. I know, you’re so jealous. Actually it was pretty fun, since it pulls in some cool data. Although it was satisfying to have it working, some quick calculations showed that at about one second per item it was going to take, um, way too long (58 hours!).

List stuffToProcess = Stuff.findAllByProcessedIsNull()

stuffToProcess.each {stuff ->
    try {
        Map data = someRestServiceClient.fetchDataThatTakesOneSecond(stuff)
        importService.storeTheStuff(data, stuff)
    } catch(Exception e) {...}
}

Enter GPars, Groovy’s parallel processing library, to the rescue.

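import groovyx.gpars.GParsPool

List stuffToProcess = Stuff.findAllByProcessedIsNull()

// Run the closure for each item on a pool of 64 threads instead of one at a time
GParsPool.withPool(64) {
    stuffToProcess.eachParallel { stuff ->
        try {
            // Each call mostly waits on the network, so the threads overlap nicely
            Map data = someRestServiceClient.fetchDataThatTakesOneSecond(stuff)
            importService.storeTheStuff(data, stuff)
        } catch(Exception e) {...}
    }
}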

Two lines of code and it’s 10 times faster!
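
To see why a thread pool helps so much with work that mostly waits on a slow call, here is a minimal sketch you can run without Grails or GPars. It swaps in the JDK's ExecutorService for GParsPool so there are no dependencies, and fetchSlowly is a made-up stand-in that sleeps for 50 ms instead of calling a real REST service. All names here are assumptions for illustration, not the code from the actual ETL process.

```groovy
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical stand-in for the slow REST call: sleeps 50 ms instead of one second.
int fetchSlowly(int id) {
    Thread.sleep(50)
    return id * 2
}

// One call after another, like the plain each loop.
List<Integer> runSerially(List<Integer> items) {
    return items.collect { fetchSlowly(it) }
}

// The same calls submitted to a fixed-size thread pool; results come back in input order.
List<Integer> runInPool(List<Integer> items, int threads) {
    def pool = Executors.newFixedThreadPool(threads)
    try {
        List<Callable<Integer>> tasks = items.collect { item ->
            ({ fetchSlowly(item) } as Callable<Integer>)
        }
        return pool.invokeAll(tasks).collect { it.get() }
    } finally {
        pool.shutdown()
    }
}

// Wall-clock time of a block of work, in milliseconds.
long timeMillis(Closure work) {
    long start = System.currentTimeMillis()
    work()
    return System.currentTimeMillis() - start
}

def items = (1..32).toList()
long serialMs = timeMillis { runSerially(items) }
long pooledMs = timeMillis { runInPool(items, 16) }
println "serial: ${serialMs} ms, pooled: ${pooledMs} ms"
```

The pooled run should finish in a fraction of the serial time because the threads spend most of their life sleeping (or, in the real case, waiting on the network). The actual speed-up depends on the remote service and on whatever the closure writes to being safe to call from several threads at once.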


Less is more for Machine Learning inputs

I came across some good advice on maintenance of Machine Learning algorithms. The short version: less is more when you are deciding what data to feed into your algorithm.

It also reminded me of some systems issues Jaron Lanier pointed out in a recent interview.

  • Over time will your model be primarily processing its own predictions? The Netflix movie recommendation engine has narrowed its available inputs because most users only choose between the top predictions.
  • Will your prediction model cannibalize its source of data? Machine learning powered language translation is automating the work of many human translators, whose previous work provides the input needed by the machine learning algorithms.

Never, ever do anything important in Windows

Just learned a great Microsoft Windows lesson.

There is a database program I like that I use in a Windows VM. I was in the middle of executing some commands when: BAM, Windows Update and restart.

What brilliant design, Microsoft. Of course the update should just suddenly take control mid-keystroke.

Those commands I was in the middle of could have been something critical. For instance when switching over from one system to another, the last step is often to quickly run a couple of commands to execute the changeover so that there is just a second or two of downtime.

“Ok, tell the request router to stop sending millions of users to the old system.”
“Now, tell it to point at the new… <WINDOWS UPDATE>”

So, never, ever do anything important in Windows. And change your settings to not automatically apply updates.

Brooks’ law, why software engineering is not programming

Part 2 of Managing Software Projects

Do you know the difference between programming and software engineering?

It’s not a Jeopardy question; the difference affects tens of millions of people who work directly on software. Programming is writing instructions for a computer, something everyone should learn. Software engineering occurs when two or more people work on a project, which introduces a major difference: communication. Communication is also something everyone should learn. Writing good instructions is clearly important, but the communication between two or ten or a hundred programmers quickly becomes the critical factor in producing a good product.


The important thing to know here is Brooks’ law. Fred Brooks managed the development of IBM’s System/360 and its OS/360 operating system in the 1960s: thousands of smart people writing millions of lines of code. For large projects like operating systems he tried to scale up by adding more programmers, just as a construction project would add more workers or a factory would add more assembly lines. But it didn’t work. He concluded that the more programmers he added, the slower things went.

Brooks formulated his law as “adding manpower to a late software project makes it later.”

Here is the problem. Every programmer has a communication link to every other programmer working on the project, which works out to n(n-1)/2 links for n programmers. For two programmers you have one communication link; for three, three links. So far, not so bad: sit those three people in the same room and they might avoid too many misunderstandings. Keep going up. With seven programmers you’ve got 21 links, and about half of everyone’s time is spent coordinating with others. A sixteen-programmer team? 120 links!
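
A few lines of Groovy make the n(n-1)/2 growth concrete. This is just the formula from the paragraph above turned into a function; the name communicationLinks is my own.

```groovy
// Pairwise communication links in a team of n people: n(n-1)/2.
int communicationLinks(int n) {
    return (n * (n - 1)).intdiv(2)
}

// Team sizes mentioned in the text.
[2, 3, 7, 16].each { n ->
    println "${n} programmers -> ${communicationLinks(n)} links"
}
// prints:
// 2 programmers -> 1 links
// 3 programmers -> 3 links
// 7 programmers -> 21 links
// 16 programmers -> 120 links
```

Going from 7 to 16 people is a bit more than double the head count but almost six times the links.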

Clearly the project cannot be built in half the time by doubling the number of workers.

Very quickly the number of communication paths increases faster than the number of programmers. Worse, it’s not just programmers, it’s everyone who needs to intimately know the software: QA staff, the development manager, the product manager. This limits the number of people who can work on a single project.

But wait, plenty of companies employ thousands of programmers. How does Lockheed Martin build something like an artillery targeting system for the army? One piece at a time, with vast amounts of planning, hundreds of people dedicated to project communication, working for decades and probably 100% over budget. Much of this effort goes into splitting the project into pieces that a small team can handle and defining detailed interfaces to minimize the communication the teams need to do with each other.

So that’s the difference: communication is the key skill of software engineering. Software engineers are often not known for their communication skills (or maybe we should say programmers are not known for their communication skills), but a well-functioning software engineering team is one that communicates well.

Age Adjusted Median Income

The other day my wife mentioned that median income in the US had gone down over the past few years, which led to a random thought while I was driving home. Perhaps when median income is plotted over time it should be weighted by the relative earning power of the population’s current age mix. When a large portion of the population is at the age of optimum earning power (around 50) you would expect median income to be higher than when a large portion is very young or very old.

http://www.census.gov/hhes/www/income/data/historical/household/

At least it’ll make a good pair programming scenario next time I’m interviewing a candidate.
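
Here is one rough way the adjustment could be sketched, as a starting point for that pairing session. Everything in it is an assumption: the age bands, the relative earning power numbers, and the choice to divide the observed median by a population-weighted earning index are all made up for illustration, not taken from the census data.

```groovy
// Population-weighted earning index: sum over age bands of (share of population * relative earning power).
// A value of 1.0 would mean everyone is at peak earning age.
def earningIndex(Map ageShares, Map earningPower) {
    return ageShares.collect { band, share -> share * earningPower[band] }.sum()
}

// Scale the observed median by the index so years with different age mixes can be compared.
def ageAdjustedMedian(Number observedMedian, Map ageShares, Map earningPower) {
    return observedMedian / earningIndex(ageShares, earningPower)
}

// Hypothetical numbers, NOT real data.
def earningPower = ['18-29': 0.6, '30-44': 0.9, '45-59': 1.0, '60+': 0.5]
def youngYear    = ['18-29': 0.4, '30-44': 0.3, '45-59': 0.2, '60+': 0.1]
def peakYear     = ['18-29': 0.2, '30-44': 0.3, '45-59': 0.4, '60+': 0.1]

println "young-heavy year: ${ageAdjustedMedian(45000, youngYear, earningPower)}"
println "peak-heavy year:  ${ageAdjustedMedian(50000, peakYear, earningPower)}"
```

A real version would need earning power by age estimated from the data itself, which is where it gets interesting.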

Today’s Challenge: Build a better system during the training meeting

Had a mandatory training meeting today on our time tracking software. Since we already had to figure out the 33-step process a few weeks ago so we could get paid… today we had an impromptu hackathon.

This software is so bad that surely I can build something better during the training meeting for said software.  -Me

Now, how to beat my coworker Mikkel who is wicked good? We both have to vaguely pay attention to the webinar so that’s an equal handicap. Aha, don’t acknowledge that I’m serious until about 15 minutes in. Especially since my computer is creakily working toward four years of service so a few tasks that should take 20 seconds instead drag on for several minutes. It’s about five minutes into the hour when I start, in theory that leaves 55 minutes to build something cool.

Ok, grails create-app timetrack.
Create a user domain class to sub in for a real authentication plugin.

Now the biggest problem with the real time-tracking software is that you have to enter hours for each day.  This is dumb when the only useful purpose it serves is to track vacation time and try to avoid everyone being on vacation at once during a big release.

Don’t want an hours-worked-for-the-day object, so let’s do the opposite and create-domain-class TimeOff. Next create some controllers, set scaffold=true and voilà, we have the world’s simplest app for entering time off.

One of those commands takes much longer than it should, and after firing up IntelliJ as well I’m at the 15 minute mark. Mikkel realizes I’m serious and starts cranking out a Rails app.
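
For anyone who hasn’t used Grails scaffolding, the classes look roughly like the sketch below. The field names are my guesses for illustration, not the ones from the actual hackathon app, and in a real Grails project each class lives in its own file under grails-app/domain or grails-app/controllers. Newer Grails versions expect the domain class instead of true for the scaffold property.

```groovy
// grails-app/domain/timetrack/User.groovy
// Placeholder user, standing in for a real authentication plugin. Fields are hypothetical.
class User {
    String name
}

// grails-app/domain/timetrack/TimeOff.groovy
// One stretch of time off rather than hours worked per day. Fields are hypothetical.
class TimeOff {
    User user
    Date startDate
    Date endDate
    String reason

    static constraints = {
        reason nullable: true
    }
}

// grails-app/controllers/timetrack/TimeOffController.groovy
// With scaffolding on, Grails generates the list/create/edit/delete pages at runtime.
class TimeOffController {
    static scaffold = true
}

// Plain Groovy usage, outside Grails, just to show the shape of the data.
def vacation = new TimeOff(user: new User(name: 'someone'),
                           startDate: new Date(), endDate: new Date(), reason: 'vacation')
println "${vacation.user.name}: ${vacation.reason}"
```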

No problem, time for a secret weapon: Dojo.

I know Dojo has some great calendar widgets and a calendar sounds like a good interface for something concerned with days and time. Start looking through docs; not that calendar; this looks right; nope, wasn’t that either. Ok, finally got the right code, still getting a weird error though…

And the webinar is over, done after 40 minutes.

What? I’ve only been coding for 35 minutes and Mikkel for 20. Well, time to show our results. Mikkel’s lets you enter hours worked one day after another. With about 15 steps to fill out a two week pay period it’s a more than 100% improvement over the same part of the real app. And mine? Giant JavaScript error, typical Dojo. Damn.


Couldn’t let it end there though. Put another 15 minutes in to wire up JSON output in Grails and get the JavaScript error fixed and… Bam, a decent prototype for the interface.

[Screenshot: timetrack prototype]

For comparison, here’s the real app. Now I just need to turn my prototype into a real app…

[Screenshot: Submit Time Sheet Express Page]