The Finding Nobody Implemented

The DORA research said culture predicts engineering performance. Nicole Forsgren's most important finding never made it into a single commercial product.

As a computer scientist, I love data. Things feel good, things feel bad, but our biases shape those feelings, and data is what pulls the signal out. I've believed that since my first software engineering job.

At that first job, I was fortunate to work with people who fully believed in measuring things. This was before the age of observability, but the organization was advanced in how it measured the product. I don't mean data analytics. I mean that as engineering teams built features, we made hypotheses about behavior and made sure we measured it. We had anomaly detection in place well before the analytics team did.

For all that, the only things we tracked about the engineering team itself were deployment frequency (we were an early continuous deployment shop) and whether we hit our delivery dates. That was the extent of it.

In 2017, I attended DevOps Days. That was the year they did the Monsters of DevOps tour: Gene Kim, Jez Humble, Kelsey Hightower, Nicole Forsgren, among others. Hightower threw out his prepared talk and just told his story. It was an unexpected diversion into human struggle and vulnerability at a conference about automation. My favorite talk was still Forsgren's. She presented the data on correlations between deployment frequency and high-performing organizations, and then spent the last fifteen or twenty minutes on something I hadn't heard anyone talk about seriously before: how to measure culture using the Westrum typology and the research behind it.

Hightower got there from a completely different direction and landed in the same place. The people who gave a damn were the ones producing the generative cultures Forsgren was measuring.

Of everything I heard that week, it was the ability to measure culture that landed hardest. I spent the next couple of years trying to figure out how to actually do it, because she didn't show the algorithm. The internet had almost nothing on operationalizing Westrum at that point. Over time, as her work gained traction, more information surfaced, and I was eventually able to piece together a measurement approach. I've run it at several organizations since, usually when I first join. If you want to move culture in a direction, you need to know where it is.
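To make the idea of "measuring culture" concrete, here is a minimal sketch of how a Westrum-style survey could be scored. It is an illustration under stated assumptions, not the algorithm from the research: the item wording is paraphrased, the 1-to-7 agreement scale is assumed, and the two cutoff values are hypothetical numbers picked only so the example runs.

```python
from statistics import mean

# Paraphrased Westrum-style statements. Each respondent rates every
# statement from 1 (strongly disagree) to 7 (strongly agree).
ITEMS = (
    "Information is actively sought.",
    "Messengers are not punished for delivering bad news.",
    "Responsibilities are shared.",
    "Cross-functional collaboration is encouraged.",
    "Failure leads to inquiry rather than blame.",
    "New ideas are welcomed.",
)

SCALE_MIN, SCALE_MAX = 1, 7

# Hypothetical cutoffs for illustration only; they do not come from
# the published research and should be calibrated before real use.
BUREAUCRATIC_FLOOR = 3.0
GENERATIVE_FLOOR = 5.0


def respondent_score(answers):
    """Average one person's answers after validating them."""
    if len(answers) != len(ITEMS):
        raise ValueError(f"expected {len(ITEMS)} answers, got {len(answers)}")
    if any(a < SCALE_MIN or a > SCALE_MAX for a in answers):
        raise ValueError("answers must be between 1 and 7")
    return mean(answers)


def team_score(responses):
    """Average the per-respondent scores across a team."""
    if not responses:
        raise ValueError("no responses to score")
    return mean(respondent_score(r) for r in responses)


def classify(score):
    """Map an average score onto a Westrum culture type (assumed cutoffs)."""
    if score >= GENERATIVE_FLOOR:
        return "generative"
    if score >= BUREAUCRATIC_FLOOR:
        return "bureaucratic"
    return "pathological"


if __name__ == "__main__":
    # Three made-up respondents, purely for demonstration.
    sample = [
        [6, 5, 6, 7, 6, 6],
        [3, 4, 2, 4, 3, 3],
        [5, 5, 4, 6, 5, 5],
    ]
    score = team_score(sample)
    print(f"team score: {score:.2f} -> {classify(score)}")
```

A single label hides a lot. In practice the spread between respondents, and between teams, is probably more useful than the average, and the survey needs to be anonymous for the answers to mean anything.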

That was 2017. We are now most of the way through 2026, and the industry still hasn't figured out what to do with that finding.

DORA came out of that same research community, with Forsgren as one of its architects. If you read the DORA reports carefully, culture is there. Westrum shows up. The acknowledgment that organizational culture type is one of the strongest predictors of software delivery performance made it into the text. And then every commercial implementation of DORA metrics that I've ever seen quietly dropped it. Deploy frequency, lead time for changes, change failure rate, mean time to restore. Those made the dashboards. Culture didn't.

That's not an accident, and it's not just that culture is harder to dashboard. It's that a Westrum score is inconvenient in a way that a deployment frequency metric never is. Deployment frequency can't indict you. A culture assessment that shows your organization is pathological or bureaucratic absolutely can. Nobody building a commercial product wanted to be the tool that told a VP their engineering culture was broken, because that VP is also the buyer. So the finding that most directly implicates leadership quietly became an appendix, and every tool in the market followed the same logic.

Last week Cortex released DRIVE, a new framework for measuring engineering organizational health. It covers Delivery, Reliability, Initiatives, Vigilance, and Efficiency. The metrics are deploy frequency, lead time for changes, incident counts, CVE status, cloud spend, token costs. They are, in other words, DORA metrics with some security and infrastructure measurements added.

Cortex is upfront about the philosophy behind it. Their website opens with this: "Software engineering is having its industrial revolution. We've gone from writing code by hand to building software factories." They go further in their OpEx review section, which describes a practice with "roots in manufacturing, where Operational Excellence emerged as a discipline for treating an entire factory as one observable, continuously improving unit." Amazon and Google are cited as exemplars.

Culture does not appear in the framework.

Cortex isn't wrong about the factory framing. AI agents are allowing some engineering work to be done in a more factory-like way. It's fair to introduce new concepts with which to measure these changes and their impacts. But until the engineering organization is composed of nothing but agents, this is just a layer on top of the most important part.

Engineering teams are people. The work is creative, non-deterministic, dependent on psychological safety and trust in ways that deployment pipelines are not. The industry has been trying to apply factory floor logic to that work for decades, and the frameworks keep reflecting that aspiration back at us while leaving Forsgren's most important finding on the floor.

AI is accelerating this. The workforce cuts in favor of agent spend and the framing of engineers as replaceable capacity to be optimized, rather than people worth developing and leading, aren't new instincts. They're old instincts with better cover. I use AI every day. It absolutely increases capacity. But whether your engineers feel psychologically safe, whether they feel ownership over their work, whether the organization is generative or pathological: none of that changes because you gave everyone a coding agent. If anything it gets harder to see, because the output metrics look better while the culture quietly degrades.

Forsgren's research didn't just find that generative culture correlates with good delivery metrics. It found that culture is a precondition for the practices that produce those metrics in the first place. The causality runs culture first, practices second, metrics third. Organizations running DORA dashboards without addressing culture are missing the foundational preconditions for the metrics to mean anything. And when the numbers don't move, the response is to mandate the number. Deploy more frequently. Reduce your lead time. The metric becomes the goal, which is exactly backwards. Deployment frequency goes up when people who give a damn go fix their deployment pipelines. The dashboard didn't do that. The culture did.

Hire the right people and they'll build the culture, which implements the practices, which in turn enable the metrics. The industry keeps trying to shortcut to the last step.

Meta is the clearest current example and also the most complicated one, in both directions. For most of their existence they were revered as a top-tier engineering organization. Not because they stumbled into good culture but because Zuckerberg built it deliberately. The psychological safety, the bottom-up technical decision-making, the model of engineers choosing projects they believed in. That was a product of his leadership, and he understood exactly what it produced.

He's now dismantling it. Keystroke loggers. Forced reconfigurations into data labeling work. Engineers who once had near-full autonomy over what they worked on are being redirected by executive edict. The Westrum survey would show you exactly where that culture score is heading. It wouldn't change anything, because the person who built the culture has decided the frontier AI race is worth the cost, and he owns the company. No board is going to stop him.

That's the real limit of measurement, and it's worth being clear about it. The data tells you where you are. It can't govern what a decision-maker chooses to do next. What it can do is make the cost visible to leaders who are willing to look and to engineers deciding where they want to work.

The finding Forsgren put in that room in 2017 was real. The research was solid. The industry picked up everything around it and left the most important part behind, and the frameworks keep arriving to confirm that choice. DRIVE is just the latest one.

I write about engineering leadership and team health in the Engineering Health newsletter on LinkedIn. Search "Engineering Health" to find it.