The Wisdom of Brown M&M's and Leaky Fire Hydrants
From M&M's to leaky fire hydrants, there are lessons in how to test systems to expose weaknesses and build resilience. But we rarely apply them.
Thank you for reading The Garden of Forking Paths. If you’d like to support my work, please consider upgrading to a paid subscription or buy my new book, FLUKE.
I: Van Halen and the Brown M&M’s
Van Halen can, perhaps a bit unexpectedly, teach us how to make our societies more resilient.
Fame can make you weird, demanding, and eccentric. But for rock stars, those traits often get enshrined in documents, in the form of contract riders: contractual clauses spelling out exactly what stars want before a concert, tailored to their precise specifications.
Some would expect a precise variety of chilled oranges; others would demand a gallon of milk, served at an exact temperature. One concert was almost called off because the Schlitz beer was presented chilled in a sink full of ice, rather than in a wine bucket, as had been requested. In the 1970s, Crosby, Stills, Nash, and Young were notorious for their contract riders, demanding six different kinds of cigarettes and an exact kind of sheet music paper with an exact kind of pen to write on it. Neil Young required Baby Duck wine; Crosby needed Canadian Club rye; Nash drank light white rum. In one clause that I hope has gone extinct, they wrote that they expected “shapely women” to serve them dinner.
But the strangest of all was Van Halen, fronted by David Lee Roth, with the late Eddie Van Halen on guitar. And the band had a very specific request: M&M’s were to be available before they went on stage, but with a twist: “WARNING: ABSOLUTELY NO BROWN ONES.”
Initially, when asked why they included this bizarre demand, the drummer, Alex Van Halen, told Rolling Stone magazine: “Why not?”
Organizers took the demands seriously. In the 1980s, the El Paso Times reported that kitchen workers Andrea Pena and Maria Gonzales had to put on gloves and sift through 24 one-pound bags of M&M’s, removing the brown ones “gingerly.” (The article, by the way, inadvertently produced one of the great all-time graphics accompanying a news story. I love the caption: “Outcast, left, and approved model”).
Rumors suggest that Van Halen would trash venues if the brown M&M request was not followed, causing enormous damage. All of this sounds incredibly gross—and in many ways, it was, with low-paid kitchen staff spending their work day plucking candies out of bags for millionaire rockstars.
But later, Van Halen claimed that the M&M clause served a specific, reasonable purpose.1 At their concerts, they would often perform with scaffolding, adorned with weighty speakers, pyrotechnics, or other theatrics that, if it all went wrong, could prove deadly to the musicians or the crowd.
The brown M&M clause was deliberately placed, Van Halen claimed, to determine whether the event staff had carefully read the specifications that would ensure a safe event. It was, by design, buried deep in the fine print of the contract.
If there were brown M&M’s backstage, it was a red flag: event staff were careless, and carelessness could cost lives. The M&M’s were a test—and Van Halen wanted to be sure that the organizers passed it before they went on stage and fired up their amps.
II: On Fake Explosives and Red Teams
In 2015, fourteen years after security failures at America’s airports had allowed 9/11 to happen, a series of fake passengers tried to smuggle dangerous items through various US airport security checkpoints. These were known as Homeland Security “Red Teams” and their task was simple: test whether the systems in place would successfully detect cleverly disguised weapons and explosives—or if they’d be able to get past the screeners.
The results weren’t comforting. In 67 of the 70 attempts, the Transportation Security Administration screening failed to detect fake explosive devices or other banned weapons. It was a 95 percent failure rate.
A review found that an additional investment of $540 million in security technology had produced no meaningful improvements in the ability to detect dangerous items. In 2019, a passenger somehow forgot that he had a loaded gun in his carry-on bag and made it through Atlanta’s airport security undetected. (He realized his rather egregious error on the flight and notified authorities). None of this is exactly chicken soup for the national security soul.
But it’s good that authorities are trying to test these systems to expose weaknesses. There are certain realms of the world where we’ve learned from Van Halen and have introduced Brown M&M-style tests. In the 1960s, the RAND Corporation developed the idea of “red teams,” named after the Soviets, or “reds,” to see how an attack might play out and determine whether American systems were ready for it.
Red teams are commonly used for critical infrastructure to see if fake attackers can physically gain access to a highly sensitive area. In cyberspace, a similar approach is often referred to as “penetration testing,” in which a “white hat hacker” will try to breach a system just to expose its vulnerabilities. (In my “Power Corrupts” podcast, I produced an episode called “Click Here to Kill Everybody,” which includes a story of a penetration tester who managed to easily hack into a container ship and could have caused the ship to sink by just clacking away on his keyboard).
These harrowing tests make systems more resilient. So, why don’t we perform them in other realms?
III: Ripton, Massachusetts and Gotcha Questions
Government budgets are unfathomably large. In the United States, the government spends roughly $6 trillion per year—a figure so large that if you stacked six trillion $1 bills on top of each other, the pile would be 407,000 miles high (for context, the moon is about 238,000 miles from Earth).
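That stacked-bills arithmetic is easy to verify. Here’s a quick back-of-the-envelope check, assuming the commonly cited thickness of a US bill, roughly 0.0043 inches:

```python
# Back-of-the-envelope check: how tall is a stack of six trillion $1 bills?
# Assumes a bill thickness of ~0.0043 inches (a commonly cited figure).
BILL_THICKNESS_INCHES = 0.0043
BILLS = 6_000_000_000_000        # $6 trillion in $1 bills
INCHES_PER_MILE = 63_360         # 12 inches x 5,280 feet

stack_miles = BILLS * BILL_THICKNESS_INCHES / INCHES_PER_MILE
print(f"{stack_miles:,.0f} miles")  # roughly 407,000 miles; the moon is ~238,000 away
```

The exact figure depends on the thickness you assume, but any reasonable value puts the stack well past the moon.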
Trying to account for every dollar is therefore notoriously difficult. But we should still strive to spend wisely, so that the maximum public benefit can be derived from the minimum amount of money necessary to achieve it.
In 1985, a Massachusetts university professor and a co-conspirator invented a fake town. They called it Ripton and claimed it had been founded in 1767. Then, they applied for government grants, filling out all the right forms and making the case that their little town in rural Massachusetts warranted state funding.
They got it. The state budget allocated money to Ripton and sent it to a fake escrow account. The professors returned the money, but the point was made: the state bureaucracy knew so little about its own rural areas that it hadn’t bothered to check that the town was real before spending taxpayer cash on it. The stunt produced a wake-up call: it was astonishingly easy to get the government to waste money, so long as something sounded plausible on paper.
This story stands out precisely because it is rare. I’ve previously written about how we need sting operations for politicians, not just police, but we should also have sting operations for any system in which abuse or misuse can create needless harm, produce avoidable waste, or unnecessarily mislead people about the world. But most of the time, we simply don’t do that. The Brown M&M-style tests are relegated to realms of national security and cybersecurity, not to other systems that make our societies function. And it’s a serious missed opportunity.
Part of the explanation for our inaction lies in an uncomfortable truth: many of the systems that our world relies upon are vulnerable to manipulation, fraud, and abuse—or even to simple but significant errors. If we tested them rigorously, the results would often be dismal.
In 2008, researchers decided to test the resilience of academic peer review, a critical system for vetting the knowledge that we use to make decisions on everything from economics to medical best practices. They sent a series of research papers to 607 peer reviewers (academics who volunteer their time to evaluate whether research should be published based on its merits).
But there was a twist: the papers sent to the reviewers had nine major methodological errors inserted into them. These were problems that, if published without correction, could undercut the basic veracity of the research.
What happened? On average, reviewers detected just 2.38 of the nine major errors. Sixteen percent of the reviewers found none at all. The lesson was clear: peer review fails to detect the majority of significant methodological mistakes. Some have argued that this is part of the case against peer review. Regardless of the relative merits of various proposals for reform, this study was important: by using a brown M&M-style test, we can recalibrate our expectations of this system of knowledge production. Peer review can help vet research, but it’s a flawed system.
Too often, these tests end up as mere curiosities, blips that expose the weaknesses of a system and lead to no meaningful change. No TSA Red Team data has been published since 2015, though one would hope that the failures prompted changes that have led to better detection rates. With peer review or Massachusetts budget allocation, it appears that little has changed. A new incarnation of Ripton, Massachusetts could probably get funding in plenty of jurisdictions in 2024.2
Corporate malfeasance is also an area that’s ripe for internal sting operations. Enrons just shouldn’t happen. We should be testing people to see if they’ll behave in blatantly dishonest and fraudulent ways that could bring down a major company. There should be Red Teams trying to take over the board room, not just the war room.
Admittedly, we would live in a dystopian world if every system operated under constant testing and verification. Some aspects of culture are better off with the principle of trusting in the virtues of our fellow human rather than instituting a pervasive feeling of paranoia. But as the power of a system—or an individual—rises, it becomes more important to use the brown M&M principle. Such tests are necessary if our societies are to follow the two wise words from my grandfather on how to live a successful life:
And when it comes to the locus of catastrophes in modern life, many of them emanate in swarms from our failing politicians, who are often ignorant, but rarely questioned about basic facts. So much of our journalism today asks powerful people for their opinions. Opinions are easy. You can dodge an opinion question without much effort. Facts are harder.
Consider these two questions:
“Mr. Trump, what do you think about the Biden administration’s decision to withdraw from Afghanistan?”
“Mr. Trump, can you point to Afghanistan on this map of the world?”
The former will produce blather. The latter will likely produce an illuminating data point about Trump’s ignorance about our world. As I previously highlighted in an article for The Washington Post, it’s particularly important to test basic knowledge about facts rather than opinions when a powerful figure has shown intense ignorance. In an interview with The New York Times in 2017, here’s what Trump said about pre-existing conditions coverage within health insurance policy:
“Because you are basically saying from the moment the insurance, you’re 21 years old, you start working and you’re paying $12 a year for insurance, and by the time you’re 70, you get a nice plan.”
What is he talking about? Voters should know, definitively, whether a person running to be president has a basic working knowledge of the systems they are about to oversee. We should know whether Trump knows where Afghanistan is, or whether he thinks that actual Americans pay just $12 a year for health insurance. The “gotcha question” is a good thing. There should be more of them, inspired by the wisdom of brown M&M’s.
IV: The Leaky Fire Hydrant Principle
In the TV show The Wire, fictional Baltimore Mayor Tommy Carcetti goes around to various public works offices, making vague demands about problems he’s witnessed. In one instance, he complains about a leaky fire hydrant.
“But Mr. Mayor, we’ve got more than 9,100 hydrants,” the official protests.
Without skipping a beat, Carcetti shoots back: “And one of them’s leaking.” The next shot shows a series of leaking fire hydrants being repaired.
The genius of this approach—which I will call The Leaky Fire Hydrant principle—is that it demands a solution to an individual problem within general systemic dysfunction while exploiting uncertainty about which exact problem needs to be solved. Normally, information gaps are bad, and it’s wise to coordinate with the fullest available information. But in this instance, the only way for the city officials to get the mayor off their backs is to fix all of the problems, lest they miss the one that he noticed personally.
Of course, it’s not always possible to fix every leaky fire hydrant in all human systems, but the principle provides a clever way of producing action in which uncertainty is deliberately deployed as a driver of far more aggressive reform. Just as the brown M&M test wasn’t really about the brown M&M’s, Carcetti’s diatribe about one leaky fire hydrant wasn’t really about a specific fire hydrant at all; it was a proxy to ensure system-wide reform.
Taken together, these two ideas—of brown M&M’s and leaky fire hydrants—offer frameworks for thinking about how deception and uncertainty can, at times, be used as part of good faith efforts to spark systemic reform that tackles avoidable risks that are undetected or ignored within our social systems. We should use them more widely.
But they also point to a wider lesson, and one that I highlight in Fluke. We too often focus on efficiency and optimization within systems, happily chugging along so long as nothing disastrous jolts us out of complacency. Until it does.
We race toward the edge, hoping to slay every last bit of slack within our social systems, prostrating ourselves before the God of Efficiency. In recent years, we’ve fallen off the cliff repeatedly with human-made calamities, amplified by fully optimized systems with no room for error, yet we stick to the same gospel, no matter the toll.
As a result, the world—which was already an uncertain jamboree of accidents and flukes—has become even more uncertain. That kind of uncertainty, in which lives and livelihoods teeter on a knife’s edge of our own making, laces catastrophic risk into our societies. We should learn our lesson, build more slack into our systems, and trade perfect efficiency for better resilience. It’s a better, sturdier way to live.
There are some uncertainties and perils of social worlds that are unavoidable. We can’t prevent every pandemic or terrorist attack, nor can we tame much of the uncertainty we face. But we can do better by thinking more carefully about how to expose weakness and build resilience through deceptive, but well-intentioned, testing. And one of the guiding principles for how to better test our systems can be derived from a clever trick used by some boisterous rockstars with an apparent disdain for a certain color of candy.
I only lament that the story originated with Van Halen, rather than with a rockstar more fitting for the argument: Sting.
Thank you for reading The Garden of Forking Paths. As you might imagine, these essays don’t spring fully-formed from my brain in an instantaneous deluge, but rather take a fair bit of time to research and write. It’s freely available to you, but if you want to support my research and writing and make this newsletter sustainable, please consider upgrading to a paid subscription ($4/month annual, or $5/month monthly). Alternatively, you can support my work by buying FLUKE: Chance, Chaos, and Why Everything We Do Matters.
This account in Smithsonian Magazine pokes a few holes in the story and argues that the whole thing was just a publicity stunt.
Digital systems can automatically deal with some of these problems. It would probably produce a computer error in some systems if you tried to allocate money to a fake town in 2024, but I imagine that many systems would still fail this test without anyone ever bothering to check whether the town actually exists.