14 Comments

'For our part, we must demand that our leaders think more about lasting stability than short-term sugar highs.'

Yes, but they are a reflection of the societies that elect them. Most of us are collectively high on the sugar of cheaply-purchased convenience.

Better politicians are much needed, but might be hastened or found to be a consequence of addressing our own individual shortcomings.

We cannot outsource change to people who are no better than we are.


Yesterday I was reading Fluke, thinking of all the people whose life's trajectories had been changed by one tiny mistake. I like having my worldview expanded.


The irony of this is that the original intent of the internet was to decentralize command operations, but now it’s dominated by massive centralized platforms and services.


It seems (and this is off the cuff) that the CrowdStrike engineers forgot that Murphy was an optimist. At one time, airlines, banks, and operators of other critical computer systems were fond of a company called Tandem Computers, which essentially built everything in redundant pairs so that if one unit went down the other would take over seamlessly. I don’t know what happened to Tandem, but I doubt that the concept has fallen to tech arrogance. It seems that one system could have been upgraded and tested while the backup was running, and the backup updated once confidence was established in the upgraded system. My guess is that the redundancy concept is still in place but CrowdStrike has just become sloppy.
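
The upgrade-one-half-first idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Node`, `staged_upgrade`, and the `health_check` callback are hypothetical names invented for this sketch, and nothing here reflects how CrowdStrike or Tandem actually deployed updates.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    """One half of a redundant pair (hypothetical model for illustration)."""
    name: str
    version: str


def staged_upgrade(
    primary: Node,
    backup: Node,
    new_version: str,
    health_check: Callable[[Node], bool],
) -> bool:
    """Upgrade the primary first while the backup keeps running the old version.

    The backup is upgraded only if the primary passes the health check.
    Otherwise the primary is rolled back and the backup is left untouched.
    Returns True if both nodes end up on the new version.
    """
    old_version = primary.version
    primary.version = new_version

    if not health_check(primary):
        # The new version misbehaves: restore the primary. The backup
        # never saw the update, so service continues on the old version.
        primary.version = old_version
        return False

    # Confidence established: bring the backup up to the same version.
    backup.version = new_version
    return True
```

The point of the design is simply that a bad update can only ever take down one half of the pair at a time, so the untouched half remains available while the failure is investigated.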

What this does illustrate, as Brian suggests, is that there is a place in the world for oversight by experts. It is a vivid rebuke to the Chevron decision and highlights with fireworks that lawyers should not be trusted to make decisions in engineering and science.


Or judges who have agendas, either!


Poor Homo sapiens! Our hubristic quest to optimize profits for a small number of our species has joined the pantheon of those “Too Much of a Good Thing” gifts from the gods. Once more, we have failed to manage the exquisite balance between temporal human cleverness and the timeless cosmic wisdom of Mother Nature. We are like a spoiled 5-year-old child boasting about having started a fire with a magnifying glass in a pile of oily rags in the garage, oblivious to the potential life-altering outcome.☺️


The chance of a Bad Actor accomplishing the same thing as this software glitch is scarier than hell, and there are Bad Actors out there honing their skills as I type. Since my version of Windows didn't get the upgrade, since I had no immediate need of bank or health or travel services or a work environment, this one passed me by in the distance: I still had access to the websites I use (including Substack). It so easily could have been so much worse. The question is HOW we get that redundancy and slack into our systems. I hope those in charge of those systems are heeding the wake-up call.

Imagine the Bad Actor hitting on election day.


The backup is called paper!


I’m still drowning in the paper I accumulated before the internet existed. I keep trying to digitize it. I do occasionally back up all the files I keep online (Dropbox, OneDrive, Evernote) onto an external hard drive that I connect to the net only when doing the backup. With luck, even if the net goes down I’ll still have my basic data on the computer, just as I did before everything went online.

So when Win Home gets back up, I can start running. Without paper, even if the net is still down.


Although CrowdStrike’s bad update file caused Windows machines to crash (the blue screen of death, or BSOD), the actual chaos that ensued was because Windows could not recover automatically from this particular BSOD. Often Windows will recover from a BSOD with a restart, but not here, because CrowdStrike software runs with highly privileged access to the Windows operating system (it needs this to do its cybersecurity thing), and the Windows architecture, an old one at its core, is not designed to recover from a BSOD automatically if it is caused by a bad third-party file installed with such highly privileged access. Instead, recovery required an IT engineer (with system admin privileges) to visit each machine, boot Windows into a recovery mode, and manually replace the bad file. If the machine’s disk was encrypted, and most would have been, the IT engineer would also need its unique recovery decryption key, and you won’t be surprised to know that some IT departments keep better records of each machine’s recovery decryption key than others. More than a year ago Microsoft started rewriting significant parts of the Windows operating system in a programming language called Rust, which, among other things, can help prevent this type of incident in future. CrowdStrike has sped this rewrite up a tad (you would hope).


Back in the 1950s, Dad would ask, “What would you do if such and such stopped working? You’d have no idea how to fix it.”


I was a victim of the CrowdStrike fiasco on Friday morning, stuck at a gate at MSP waiting 5 hours for a Delta flight (as were so many others at other gates). It was absolute chaos. Neither gate agents nor airline representatives had any clue about what was happening (all of their computers were down). I can’t even imagine how anyone would know how to respond to what happened if this hadn’t been just a “computer glitch.” Pretty scary.


Brian, this is an example of the law of conservation of risk that I have written about in comments before. In this case, the risk associated with inefficiency and its costs has been transformed into event risk, just like CrowdStrike! And this risk was transferred unknowingly to those who were using Microsoft systems and had never heard of CrowdStrike.

As for the hyper-efficiency we are striving for, it is about “reliability,” as we would say in the power industry. I have seen this firsthand on multiple occasions: people trying to be hyper-efficient and forgetting about the reliability/resilience needs of the system. Again, we are transforming financial risks into reliability risks.


The term "Resilience" is possibly too limited a quality in this case as it seems to denote elasticity of slack, not necessarily major reconstruction or redirection when events (esp. unknowable unknowns) consume all the allowed slack and then some. What would be the term for the alacrity to dust yourself off and go back to the drawing board to replan or even reset goals when plans and outcomes fall to dust?

This is the real stuff of hero myths.
