All Opinions are…

All Opinions expressed in this blog are my own … the OFFICIAL POSITION OF THE UNITED STATES OF AMERICA!

And furthermore, the official position of YOU. Yes that’s right. Didn’t get the memo? Well I sent you an email. And in the footer there was a bunch of legalese that by receipt of the email included the right to officially represent you in this blog, oh and your eternal soul. Nothing to worry about, just standard, “if you are or are not the intended recipient” stuff. Could be in your spam folder, you might want to check there. Still applies of course because, “even if this is in your spam folder”.

You have hereby been notified.

Easy multi-threading with Groovy

I finished writing an ETL process today. I know, you’re so jealous. Actually it was pretty fun, it pulls in some cool data. Although it was satisfying to have it working, some quick calculations showed it was going to take, “um, way too long” (58 hours!).

List stuffToProcess = Stuff.findAllByProcessedIsNull()

// One item at a time: each REST call blocks for about a second
stuffToProcess.each { stuff ->
    try {
        Map data = someRestServiceClient.fetchDataThatTakesOneSecond(stuff)
        importService.storeTheStuff(data, stuff)
    } catch (Exception e) {...}
}

Groovy’s parallel features (GPars) to the rescue.

import groovyx.gpars.GParsPool

List stuffToProcess = Stuff.findAllByProcessedIsNull()

// Same loop body, spread across a pool of 64 threads
GParsPool.withPool(64) {
    stuffToProcess.eachParallel { stuff ->
        try {
            Map data = someRestServiceClient.fetchDataThatTakesOneSecond(stuff)
            importService.storeTheStuff(data, stuff)
        } catch (Exception e) {...}
    }
}

Two lines of code and it’s 10 times faster!

Less is more for Machine Learning inputs

I came across some good advice on maintaining Machine Learning algorithms. The short version: less is more when you are deciding what data to feed into your algorithm.

It also reminded me of some systems issues Jaron Lanier pointed out in a recent interview.

  • Over time will your model be primarily processing its own predictions? The Netflix movie recommendation engine has narrowed its available inputs because most users only choose between the top predictions.
  • Will your prediction model cannibalize its source of data? Machine learning powered language translation is automating the work of many human translators, whose previous work provides the input needed by the machine learning algorithms.

Never, ever do anything important in Windows

Just learned a great Microsoft Windows lesson.

There is a database program I like that I use in a Windows VM. I was in the middle of executing some commands when: BAM, Windows Update and restart.

What brilliant design, Microsoft. Of course the update should just suddenly take control mid-keystroke.

Those commands I was in the middle of could have been something critical. For instance when switching over from one system to another, the last step is often to quickly run a couple of commands to execute the changeover so that there is just a second or two of downtime.

“Ok, tell the request router to stop sending millions of users to the old system.”
“Now, tell it to point at the new… <WINDOWS UPDATE>”

So, never, ever do anything important in Windows.  And change your settings to not automatically apply updates.

Brooks’ law: why software engineering is not programming

Part 2 of Managing Software Projects

Do you know the difference between programming and software engineering?

It’s not a Jeopardy question; the difference affects tens of millions of people who work directly on software. Programming is writing instructions for a computer, something everyone should learn. Software engineering occurs when two or more people work on a project, which introduces a major difference: communication. Communication is also something everyone should learn. Writing good instructions is clearly important, but the communication between two or ten or a hundred programmers quickly becomes the critical factor in producing a good product.

The important thing to know here is Brooks’ law. Fred Brooks ran software development for IBM’s System/360 during the 1960s: thousands of smart people writing millions of lines of code. For large projects like operating systems he tried to scale up by adding more programmers, just as a construction project would add more workers or a factory would add more assembly lines. But it didn’t work. He concluded that the more programmers he added, the slower things went.

Brooks formulated his law as “Adding manpower to a late software project makes it later.”

Here is the problem. Every programmer has a communication link to every other programmer working on the project, for a total of n(n-1)/2 links. For two programmers you have one communication link; for three, three links. So far, not so bad: sit those three people in the same room and they might avoid too many misunderstandings. Keep going up. With seven programmers you’ve got 21 links, and about half of everyone’s time is spent coordinating with others. A sixteen-programmer team? 120 links!
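
To make the n(n-1)/2 arithmetic concrete, here is a minimal Groovy sketch. The method name links and the team sizes are mine, picked only for illustration; the last size simply doubles the seven-person team.

```groovy
// Pairwise communication links in a team of n people: n(n-1)/2
int links(int n) {
    (n * (n - 1)).intdiv(2)
}

// Illustrative team sizes; 14 is the seven-person team doubled
[2, 3, 7, 14].each { n ->
    println "${n} programmers -> ${links(n)} links"
}
```

Doubling the team from seven to fourteen takes the link count from 21 to 91, more than four times as many conversations to keep straight.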

Clearly the project cannot be built in half the time by doubling the number of workers.

Very quickly the number of communication paths increases faster than the number of programmers. Worse, it’s not just programmers; it’s everyone who needs to intimately know the software: QA staff, the development manager, the product manager. This limits the number of people who can work on a single project.

But wait, plenty of companies employ thousands of programmers. How does Lockheed Martin build something like an artillery targeting system for the army? One piece at a time, with vast amounts of planning, hundreds of people dedicated to project communication, working for decades and probably 100% over budget. Much of this effort goes into splitting the project into pieces that a small team can handle and defining detailed interfaces to minimize the communication they need to do with each other.

So that’s the difference: communication is the key skill of software engineering. Software engineers are often not known for their communication skills (or maybe we should say programmers are not known for them), but a well-functioning software engineering team is one that communicates well.

Age Adjusted Median Income

The other day my wife mentioned that median income in the US had gone down over the past few years, which led to a random thought while I was driving home. Perhaps when median income is plotted over time it should be weighted by the relative earning power of the population’s current age mix. When a large portion of the population is at the age of peak earning power (around 50) you would expect median income to be higher than when a large portion is very young or very old.

At least it’ll make a good pair programming scenario next time I’m interviewing a candidate.
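
A rough starting point for that exercise might look like the Groovy sketch below. Everything in it is an assumption for illustration: the earningPower factors and the sample people are made up, not real statistics, and dividing each income by its age factor is only one possible reading of “age adjusted”.

```groovy
// Hypothetical relative earning power by age (1.0 = peak, around 50).
// These factors are invented for illustration, not real data.
def earningPower = { int age ->
    if (age < 25) return 0.5
    if (age < 35) return 0.8
    if (age < 45) return 0.95
    if (age < 55) return 1.0
    if (age < 65) return 0.9
    return 0.6
}

// Median of a list of numbers (leaves the input list untouched)
def median = { List values ->
    def sorted = values.sort(false)
    int mid = sorted.size().intdiv(2)
    sorted.size() % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

// Made-up sample population
def people = [
    [age: 22, income: 20000],
    [age: 30, income: 40000],
    [age: 50, income: 60000],
    [age: 52, income: 50000],
    [age: 70, income: 30000],
]

// Raw median versus median after scaling each income to "peak age" terms
def raw = median(people.collect { it.income })
def adjusted = median(people.collect { it.income / earningPower(it.age) })

println "raw median: ${raw}, age adjusted median: ${adjusted}"
```

With this made-up sample the adjusted median comes out higher than the raw one, because the young and old earners are scaled up toward their assumed peak-age equivalent.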