Machine Learning – The Grand Canyon of Jello

Let’s say you have a block of Jello. We’re going to call that Jello our ‘Machine Learning Algorithm’. We want to teach it the difference between cubes and spheres.

Step 1. Drop a cube on the left side of the Jello. Did it leave a mark? Great. Drop it several more times.
Step 2. Drop a sphere on the right side. Same deal.

Ok, now we bake our Jello so it’s nice and hard.

Knowing the difference between cubes and spheres
Pick up a shape. Does it fit a spot on the right side?
* Then it’s a sphere.
No, it fits a spot on the left side?
* Ok, then it’s a cube.

More Data
A machine learning algorithm is an impressionable material that picks up the shapes of the data that pass through it.

Like water flowing through a canyon, data flows in: cube-shaped data repeatedly hits a spot until a cube-shaped impression is formed, and sphere-shaped data hits another spot until a sphere-shaped impression is left. The more different shapes of data flow through, the more shapes the algorithm can recognize. The more often the same and similar shapes flow in, the better the impression of that shape.

After training, we harden the material so that it acts as a strainer. Now when cube-shaped data comes in, it fits the cube-shaped channel and is sorted into the cube side; our strainer “knows” it’s a cube.
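For the curious, the Jello can be sketched in a few lines of Groovy. This is only a toy nearest-centroid classifier, one simple technique that fits the analogy, not how any particular ML system works. The `Jello` class and the two features (corner count and roundness) are made up for illustration.

```groovy
// Toy "Jello" classifier: dropping shapes leaves impressions,
// baking fixes them, and classifying finds the closest impression.
// Hypothetical features per shape: [number of corners, roundness from 0 to 1].
class Jello {
    // label -> every example dropped on that spot
    Map<String, List<List<Number>>> impressions = [:].withDefault { [] }
    // label -> the single averaged impression left after baking
    Map<String, List<Number>> hardened = [:]

    // Training step: drop one example onto the spot for its label
    void drop(String label, List<Number> shape) {
        impressions[label] << shape
    }

    // "Bake" the Jello: average each spot's drops into one fixed impression
    void bake() {
        impressions.each { label, shapes ->
            int dims = shapes[0].size()
            hardened[label] = (0..<dims).collect { d -> shapes.sum { it[d] } / shapes.size() }
        }
    }

    // Return the label whose impression is closest (squared distance) to the shape
    String classify(List<Number> shape) {
        hardened.min { entry ->
            (0..<shape.size()).sum { d -> (shape[d] - entry.value[d]) ** 2 }
        }.key
    }
}

def jello = new Jello()
[[8, 0.0], [8, 0.1], [7, 0.05]].each { jello.drop('cube', it) }
[[0, 1.0], [0, 0.9], [1, 0.95]].each { jello.drop('sphere', it) }
jello.bake()

println jello.classify([8, 0.02])   // a boxy shape
println jello.classify([0, 0.97])   // a round shape
```

Dropping more examples on a spot sharpens its averaged impression, which is the "more data" point above in miniature.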

If, instead of cubes and spheres, we have dog- and cat-shaped data, our Jello or our canyon can learn the difference between dogs and cats. Feed in words and it can sort the difference between words. Feed in collections of words that make up ideas and it can sort different ideas.

Why is that so powerful? Well, our brains are the same sort of Jello. We drink in ideas until we understand them, sorting them into ‘things that apply right now’ and ‘things that don’t’.

Photo Credits:
Photo by Isaiah-Phillips Akintola on Unsplash

AI Assistants and Smart Contracts

Smart contract blockchains are clearly a powerful technology. Digital signatures, ledger databases, and public markets are all significant on their own; in combination they form a whole much greater than the parts. This whole has clear advantages in several areas: micropayments, loyalty points, virtual goods and other applications all benefit from the transparency and trust of blockchains. In other cases sober decisions have to be made about whether the whole is needed or whether just one of the components provides the real value. Private ledgers or centralized markets can solve many problems.

A similar evaluation has to be made about business models. For example, a social network built on a blockchain smart contract platform (let’s call that Web3 for short) gives users and developers more control and allows new models of sharing the value created. However, the dominant business model for centralized social networks is advertising, while in a Web3 social network users often pay for the services they are using. For heavy users those penny transaction costs will add up, and they face a choice: pay and deliberately choose the benefits of the Web3 app, or continue on the path they are used to.

Everyone likes things that seem free, so at first glance it appears that Web3 apps would stay a niche for those users who really care about portability of their data or other benefits, or who contribute enough to the system to offset the value they receive.

AI assistants will change this calculus. In the next year you will start using an AI assistant to do more and more tasks for you. That work will get done through programmatic interfaces rather than user interfaces. Even more impactful: AI assistants don’t watch ads, at least not in the same way we care about now.

Ok, there will be a transitional period where your assistant watches ads, an arms race between content providers and users, and a switch to a new paradigm. In this new paradigm the tradeoffs will be much clearer and Web3 more competitive. In fact the aspects that remain advertising oriented are likely to be tracked on a blockchain: prove you have engaged with this ad in order to earn.

Big changes are coming to how we interact with the digital world. It’s important to still be clear-eyed about what will and won’t work, but we should expect major shifts in some large economic systems.

Frontier of Knowledge

With one of the first optical telescopes on earth, Galileo could observe novel phenomena everywhere he looked. A few hundred years later, new astronomical discoveries require instruments like the $10 billion James Webb Space Telescope. The more we as a species know, the harder it becomes to discover new information. Each new discovery requires more previous knowledge to build upon and often requires more advanced instruments to plumb unknown depths. That, at least, is the productivity problem known as the “burden of knowledge”.

The idea that “there is nothing new under the sun” traces at least as far back as the ancient Hebrew book of Ecclesiastes. In modern times anyone starting a company will quickly find that the same idea has been tried a few times before. Though learning from those failures is one of the best places to start, it can be a depressing way to start one’s day.

Fortunately there are a few countervailing forces. Much knowledge acquisition resembles the hardest problems in what computer science calls the class NP: finding an answer can take an enormous search, but checking a proposed answer is quick. It takes a lot of work to discover (running years-long experiments, trying many avenues of research), but once discovered it is easy to verify or put into practice. Learning to use CRISPR to edit genes is much faster than discovering it as a mechanism.
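That gap between discovering and verifying can be sketched with subset sum, a classic NP-complete problem: given a list of numbers, find some that add up to a target. The helper names `discover` and `verify` are made up for this illustration, and the brute-force search is just the simplest possible approach.

```groovy
// Discovering: try every non-empty subset until one sums to the target.
// The number of subsets doubles with every number added to the list.
List<Integer> discover(List<Integer> numbers, int target) {
    numbers.subsequences().find { it.sum() == target } as List<Integer>
}

// Verifying: add up the proposed answer and check that its numbers come from the list.
// (A simplified check: it does not count how often each number is used.)
boolean verify(List<Integer> numbers, List<Integer> candidate, int target) {
    candidate.sum() == target && numbers.containsAll(candidate)
}

def numbers = [3, 34, 4, 12, 5, 2]
def found = discover(numbers, 9)
println "Found ${found}, verified: ${verify(numbers, found, 9)}"
```

With six numbers the search is instant. Every extra number doubles the work of `discover`, while `verify` barely notices.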

Next, old ideas are supplanted with new and our working model of what is important to know is constantly refined and updated. We spend a lot of time on this. Is somebody wrong on the internet? I better help them update their model. This process of compaction and encoding allows us to carry around and more quickly find efficient representations of how the universe works.

Ideally those improved representations then make it into school curricula where we can apply improvements in pedagogy. The better we can teach both raw knowledge and the general skills of how to learn and how to reason, the more tractable endless facts become.

We can’t ever read all the books, and that of which we are ignorant will always be expanding; however, we still have good tools to get us to that frontier and on to exploring. We should expect to discover great and wondrous new things.

Cognitive Decline or Generalization

It’s funny reading even expert reports that wrap up with statements like “That will require us to tap into a superpower that can’t be programmed into a robot: imagination.” which are already proving false.

One only has to look at any image generation program to see AI imagination at work. Yes, it’s currently human-directed, but with advances in internal AI monologues we can see that AI will soon be deliberately daydreaming to find novel solutions.

An ML generated image of 'neural network generalization'

More and more we are able to transfer the evolved mechanisms of brains directly to advances in machine learning. Using sleep to consolidate weights from one task before learning the next is helping with generalization across tasks.

And the reverse is true as well. Seeing and experimenting with ML models gives us insight into the daily machinations of the wet neural networks we each carry around. That led to the following question.

How much of age related cognitive decline is due to the normally beneficial process of generalization?

When your Mom goes through your siblings’ names before getting to your own, those similar concepts have been grouped together and are less differentiated. Commonality becomes generalization. Names in particular are highly specific, high resolution information that map to things like “that tall blond guy I met at that neighbor’s party”.

Perhaps you visit the Botanic Gardens. How many petals did that flower have? What was the Latin name written next to it? Neither fact is as useful as the vague memory that it was purple with a yellow center and pretty, so the petal count and Latin name may not even make it to long term memory. As time passes, the specific high resolution information about each flower on the trip is consolidated into mosaics of areas you passed through; that flower is now just a purple dot. “Yes, it was beautiful, lots in bloom. You should stop by.” Eventually the whole trip is mostly a dot in the timeline of memory: “Yes, I’ve been to the botanic gardens many times”. You probably remember where it’s located, the general layout, some highlights of visits, and have some very general emotional impressions of how you feel about the place.

All of which is a useful process. Knowing the exact petal counts of thousands of flowers is fairly useless compared to “it’s pretty there in June and makes me feel relaxed”. The process of learning is powered by that consolidation. All these things are flowers whether they have 3, 5, or 10 petals.

What expectations then should we have of memory? The more one lives, the greater the surface area grows of subjects we expect to know and remember. Are most memories stored with 5 or 10 years of detail before being compacted into more general knowledge or erased to make room for more? The existence of Highly Superior Autobiographical Memory indicates there could be space for all the detail, but that forgetting, or at least generalizing, is useful for getting through life.

It is theorized that memory storage is similar to holographic storage. A scratch across the surface of digital holographic storage will introduce noise rather than destroy specific information, reducing the fidelity and detail of the whole. Whether due to cell death or age related decrease in neuron performance, we expect similar effects on our memory as we age.

With two mechanisms, generalization and aging, that have similar results, how do we tell the difference? Your mom eventually retrieves your name but that may not be the case for more obscure information. If aging is the predominant force, perhaps in our knowledge driven economy we should focus more on understanding the cellular mechanisms and developing treatments. If generalization is stronger we could adapt in other ways: changing our cultural expectations of memory or spending time reviewing memories to curate the level of specificity vs generalization we want in areas of our memory.

In an age where we’re able to store more knowledge and detailed photographic memories outside of our brain perhaps generalization is ok and we should remember the bare minimum.

Education is Strategy

Ok, you’ve had a great strategy retreat, identified the major challenges and opportunities you see coming in the next five years. There will be many specific initiatives and roadmaps to build but the fundamental question is “how do we equip our workforce to deal with that future?”

The current wave of change coming at us makes this need abundantly clear. Machine learning services and AI assistants will sweep through the economy: aiding, changing, or eliminating every job. Computers are becoming programmable in English (and every other language). Information, predictions, optimization paths, and a dozen suggestions of how to do every task are coming to every worker’s information sphere, whether that revolves around a phone, cash register, augmented reality, or ambient computing.

In the next five years this force will sweep through innovators. In the next 10 it will be a constant pressure on companies to adapt or lose out. In 20 years AI assistants will be pervasive in the economy. Walmart needs to adapt just as much as Google will (and is). 

Companies are still catching up on the skills needed for remote work and the changed technology expectations of customers. How are they going to handle this next shift? By equipping their workforce with skills so that workforce can do the adapting themselves.

In 6th grade, an insightful school librarian drilled into my class that we were growing up in the information age. “You need to know some things but most importantly you have to be able to find information and to learn”. The shift for everyone now is similar: how to apply and guide machine intelligence to accomplish goals.

I know the bear market is top of mind right now for most executives and boards. However, the companies that will lead their markets in the coming years are the ones thinking right now about their talent pipelines and how to maximize success for that talent. We don’t know what tactics will be needed or what key decisions our businesses will need to make in the next five years. Experience tells us that our best planning is still going to encounter an ocean of uncertainty and changing circumstances.

My strategy for the future is to give my colleagues every opportunity to learn and upskill so they can handle the coming challenges and opportunities better than anyone.

Lament of the full stack engineer

Like many kids, I loved reading. I was probably lucky to grow up before the constant digital gamification of our world; a quiet corner and the pages of another world made for a happy afternoon. So it was a crushing realization that I could never read all the books in the world. Probably not even all the books in our small local library.

Being a “full stack” engineer often feels that way.

It’s even more daunting now that I’ve been away from it for a while. I’m doing a technical rotation this year, renewing my sense of assembling code and being on the front lines of tech. In many ways it’s a refreshing feeling; learning tons of new things is exciting.

On the other hand, it’s an embarrassment of choices. In 2006 I became highly proficient in MySQL. That barely seems useful in the age of global causal plus consistency. In 2008 I was writing JavaScript unit tests and running layered JavaScript builds. Thanks to the Dojo framework I was five years ahead of most of the industry. But it means nothing now; a dozen years later it’s a full-time job to keep up with the changes in React.

Oh for simpler times!

Just kidding. Better to be awash in the great tech firehose than standing around thirsty. Pass the quantum homomorphic encryption please.

Prompt Engineering

Prompt engineering of humans continues to be a popular pastime, and career.

I was reminded of this while reading an article on (eventual) chip shortages (if there is conflict around Taiwan). Given that prompt I thought, “I should buy a new GPU”. Though I then dismissed it, I could feel the generated idea still being added to my training set of world data.

Of course this is the domain of marketing, sales… management, parenting… human interaction. It was a reminder, though, of why negative advertising works and why Facebook posts could influence an election.

Thoughts become actions become deeds. Even our small choices, deliberate or not, adjust the multi-level dataset of humanity and that of our technological progeny.

All those Electric Sheep

A friend pointed me at this paper, “The Overfitted Brain: Dreams evolved to assist generalization”, which made me think of some machine learning concepts that still stand out five years after I learned them.

I used to talk a lot with two engineers building GAN (Generative Adversarial Network) frameworks. One day walking back from lunch a few things came up that have stood out since. Context here is that the output of a Neural Network, the probability distribution function encoded in the network, is a manifold over the search space. In other words, if the real terrain is the problem we’re looking at then the network we trained is a map. Maybe a map in crayon, maybe a high res printout but in either case a representation.

Ok so the two interesting things were:

  1. what we think of as Intelligence is ‘how well does the map represent reality?’
  2. what we think of as Creativity is ‘how diverse are the generated manifolds?’
    “Oh cool, you made a crayon map that lets me understand the high level layout of the city” “This black and white map makes the major highways really stand out”.

The latter relates to that paper on dreaming. By generating all kinds of different (often crazy) representations of the world we (or algorithms) are able to develop a more complex understanding of how it all really works and fits together. Dreaming is a great way to generate those representations. Turns out so is lunch with friends!

So remember, “Lunch is the most important meeting of the day”. And creative pursuits are key to improving our understanding of the world. Keep dreaming.

Photo by Trinity Kubassek
Photo by Nishant Meena

All Opinions are…

All Opinions expressed in this blog are my own … the OFFICIAL POSITION OF THE UNITED STATES OF AMERICA!

And furthermore, the official position of YOU. Yes that’s right. Didn’t get the memo? Well I sent you an email. And in the footer there was a bunch of legalese that by receipt of the email included the right to officially represent you in this blog, oh and your eternal soul. Nothing to worry about, just standard, “if you are or are not the intended recipient” stuff. Could be in your spam folder, you might want to check there. Still applies of course because, “even if this is in your spam folder”.

You have hereby been notified.

Easy multi-threading with Groovy

I finished writing an ETL process today. I know, you’re so jealous. Actually it was pretty fun, it pulls in some cool data. Although it was satisfying to have it working, some quick calculations showed it was going to take, “um, way too long” (58 hours!).

List stuffToProcess = Stuff.findAllByProcessedIsNull()

// Sequential version: each item waits on a one-second REST call
stuffToProcess.each { stuff ->
    try {
        Map data = someRestServiceClient.fetchDataThatTakesOneSecond(stuff)
        importService.storeTheStuff(data, stuff)
    } catch(Exception e) {...}
}

Enter the Groovy parallel features (GPars) to the rescue.

import groovyx.gpars.GParsPool

List stuffToProcess = Stuff.findAllByProcessedIsNull()

// Parallel version: the same closure runs on a pool of 64 threads
GParsPool.withPool(64) {
   stuffToProcess.eachParallel { stuff ->
      try {
         Map data = someRestServiceClient.fetchDataThatTakesOneSecond(stuff)
         importService.storeTheStuff(data, stuff)
      } catch(Exception e) {...}
   }
}

Two lines of code and it’s 10 times faster!
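If the one-second fetch dominates, 58 hours works out to roughly 208,800 items (58 × 3,600 seconds), and a tenfold speedup brings the run down to under six hours. To try the pattern without my services, here is a minimal standalone sketch. It assumes GPars 1.2.1 can be pulled in through Grape; `fetchSlowly` is a made-up stand-in for the REST call that sleeps 100 ms instead of a full second, and the pool size of 8 is arbitrary.

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.GParsPool
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the slow REST call (100 ms instead of 1 s)
def fetchSlowly = { int id ->
    Thread.sleep(100)
    [id: id, value: id * 2]
}

List<Integer> ids = (1..32).toList()
// Thread-safe map, since many pool threads write results at once
def results = new ConcurrentHashMap<Integer, Map>()

long start = System.currentTimeMillis()
GParsPool.withPool(8) {
    ids.eachParallel { id ->
        try {
            results[id] = fetchSlowly(id)
        } catch (Exception e) {
            println "Failed on ${id}: ${e.message}"
        }
    }
}
long elapsed = System.currentTimeMillis() - start

// Run one at a time, 32 calls of 100 ms would need at least 3,200 ms
println "Fetched ${results.size()} items in ${elapsed} ms"
```

This works well here because the threads spend nearly all their time waiting on the network, so a pool much larger than the number of CPU cores still pays off. Anything shared between threads, like the results map, needs to be thread-safe.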
