The bus factor audit: what if your top engineers quit?
If your three most senior engineers resigned tomorrow, what breaks? The bus factor audit every CEO should run, and what a low score means for company value.
Pick your three most senior engineers and imagine they resign on the same day: they found something better, gave two weeks’ notice, and left. What breaks? What slows to a halt, and what can nobody else in the organization operate, explain, or fix? If those questions don’t have fast answers, you have a bus factor problem, and this article is the audit.
What the bus factor actually measures
The bus factor originated in software engineering as a thought experiment: how many people would need to be “hit by a bus” before a project became unable to continue? The framing is deliberately uncomfortable. A bus factor of one means a single person’s departure would leave the organization unable to maintain a critical system. Most engineering leaders know the concept, but very few have measured it.
The measurement is harder than the concept because knowledge concentration is invisible from a distance. An org chart shows reporting lines without touching knowledge distribution. A sprint board shows who is working on what this week but cannot tell you who understands how the billing integration actually works, or who has ever deployed to the production environment. The knowledge that matters most in a bus factor assessment lives in people’s heads and has never been written down, because it was never urgent enough to document until the person who held it was gone.
Engineering turnover tends to run higher than in many other functions, and senior departures are the ones that hurt most. On a team of ten, it is normal to lose a few people over the course of a year. The question worth asking is whether the organization survives each departure intact.
Why the CEO needs to run this audit
CTOs know their bus factor because they live with it daily: they know that Sarah is the only person who understands the legacy payment integration, that Marcus built the data pipeline and has been meaning to document it for two years, and that the deployment process for the production environment lives in a Notion page that Marcus also wrote and that nobody else has ever tested end to end.
What CTOs often cannot do is surface this risk in terms that reach a board conversation. The bus factor lives in the engineering team’s informal awareness, treated as a known limitation rather than a quantified organizational risk. It gets mentioned in retrospectives, added to the technical debt backlog, and deprioritized in the next sprint planning meeting because the shipping deadline is closer than the risk of someone leaving.
CEOs who ask this question directly, and ask for a specific answer rather than reassurance, are the ones who discover the risk before it becomes an incident. Ask your CTO: if I named three specific senior engineers and told you they were leaving next month, what is your plan? If the answer involves naming other specific people whose departure would also be catastrophic, the bus factor has become a company risk that belongs in the board conversation, not just the engineering backlog.
The five zones where knowledge concentrates
Bus factor risk does not distribute evenly across an engineering organization. It concentrates in five specific zones, and most organizations have exposure in at least three of them.
Legacy system ownership
Every organization that has been building for more than three years has systems built by people who are no longer there, maintained by people who inherited them without full documentation. The maintainers know enough to keep the system running but often not enough to change it without risk, and the people who could answer the hard questions left two years ago. When the current maintainer leaves, the organization discovers how much it never actually knew.
Deployment and infrastructure
The person who set up the production infrastructure, the cloud environment, the CI/CD pipeline, the monitoring configuration, and the incident response runbooks often did so before the team had the documentation habits it has now. The setup works; the knowledge of why specific decisions were made, what would break if a particular configuration changed, and how to recover from a specific failure mode lives with that person. In our experience, losing a senior engineer can stall delivery for weeks while the team rebuilds context, and for infrastructure knowledge the recovery period is often longer.
Vendor and integration relationships
Enterprise integrations with payment processors, identity providers, data warehouses, and third-party APIs accumulate undocumented tribal knowledge: the account contact who answers the phone, the workaround for the API behavior that isn’t in the official docs, the reason a specific configuration was chosen after a week of debugging. That knowledge lives in the engineer’s email history and memory, not in the codebase or any documentation system.
Architecture decision rationale
Current engineers can read the code but rarely understand why it is structured the way it is. Architecture decisions made under constraint (this service was split out because of a specific scaling problem, this data model was designed around a third-party system’s limitations, this API contract was chosen because of a planned integration that was later cancelled) leave behind artifacts that look like choices but were actually reactions. Without the rationale, the next engineer to touch the system cannot tell which parts are load-bearing and which are historical accidents.
Undocumented operational procedures
Every production system has operational procedures that were never written down because the person who developed them did it once and it just became how things were done: the quarterly database maintenance that requires running three scripts in a specific order, the deployment sequence that must happen before a major release, the emergency rollback procedure that exists in one person’s muscle memory. These procedures surface when the person who knows them is unavailable and something needs to happen immediately.
The audit
This is the executive version of a bus factor assessment: the set of questions that reveal whether the risk is real and how concentrated it is. Run it with your CTO or Head of Engineering and expect honest answers to be uncomfortable.
Round 1: name the dependencies
For each of the following, identify the one or two people whose absence would create the most severe operational problem. If you can name them immediately, they are a bus factor dependency.
- Production deployment (who can safely deploy to production without assistance?)
- Incident response (who do you call first when something breaks at 2am?)
- Payment or billing systems (who understands the integration end to end?)
- Data pipeline (who knows how the data gets from the source systems to where it is used?)
- Customer-facing API (who can diagnose and fix an outage in the core API?)
- Security and access management (who knows what has access to what, and why?)
- Legacy systems (what are the oldest systems in production, and who understands them?)
Round 2: test the documentation
Pick three of the named dependencies from the first round. For each one, ask: if the named person were unreachable today, could someone else perform the function using existing documentation? Test this by asking another engineer to find and follow the documentation without assistance from the named person. The result of this test is the actual bus factor.
Round 3: the resignation scenario
Name your three most senior engineers by tenure and domain knowledge rather than seniority title. Now ask: if all three resigned with two weeks’ notice, what would you do in weeks three through six? Be specific. Which systems would slow? Which deployments would pause? Which customer commitments would be at risk? Which incidents would you be unable to diagnose?
If the answer to any of these involves “we would figure it out,” that is an acknowledgment that no plan exists.
Score interpretation:
A healthy engineering organization can answer Round 1 with multiple names for each dependency. Round 2 produces documentation that other engineers can follow successfully without the named person’s assistance. Round 3 produces a specific contingency plan with named backups for each critical function. If your audit produces single names in Round 1, failed documentation tests in Round 2, and vague answers in Round 3, the bus factor is one or two and the organization is operating with concentrated knowledge risk.
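The interpretation above can be written down as a simple checklist so the results of the three rounds are recorded the same way for every critical function. The following is a minimal Python sketch, not a standard scoring model: the `FunctionAudit` record, its field names, and the sample people are all hypothetical, and the rules simply restate the three conditions in the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FunctionAudit:
    """Audit answers for one critical function (hypothetical record layout)."""
    function: str
    can_do_it: List[str]                      # Round 1: who can perform it today
    docs_test_passed: Optional[bool] = None   # Round 2: None means not tested
    named_backup: Optional[str] = None        # Round 3: backup named in the plan


def risk_flags(audit: FunctionAudit) -> List[str]:
    """Return the warning signs described in the score interpretation."""
    flags = []
    if len(audit.can_do_it) < 2:
        flags.append("single name in Round 1")
    if audit.docs_test_passed is False:
        flags.append("documentation test failed in Round 2")
    if not audit.named_backup:
        flags.append("no named backup in Round 3")
    return flags


# Illustrative data only; the names are made up for the example.
audits = [
    FunctionAudit("Production deployment", ["Marcus"], False, None),
    FunctionAudit("Incident response", ["Sarah", "Priya"], True, "Priya"),
]
report = {a.function: risk_flags(a) for a in audits}
```

A function with an empty list of flags meets the healthy description; a function that collects all three flags is where the concentrated knowledge risk sits.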
What a low bus factor costs
The cost of a low bus factor is concrete and paid in four specific ways.
Delivery slowdown. When knowledge is concentrated, the people who hold it become bottlenecks. They cannot take real vacations. They get pulled into reviews, incidents, and architectural discussions that should involve more people. The organization’s delivery velocity becomes capped by their personal bandwidth. Senior engineers in this position spend a large share of their time fielding the same questions and transferring context, time redirected from the actual engineering work they were hired for.
Retention risk compounding. The engineers who carry the most bus factor risk are, by definition, the most valuable and the most overstretched. They cannot easily be replaced, and they are also the hardest to retain: they are pulled into every critical situation, which means they burn out faster than peers the organization depends on less. The organization that most needs them to stay has created the conditions most likely to make them leave.
Acquisition and financing friction. Acquirers probe exactly this in technical due diligence, treating bus factor as a measure of organizational risk. A company whose key systems are understood by one or two people who can leave is a company whose enterprise value is discounted for the risk of that departure. The asset the acquirer is buying runs on knowledge that may walk out the door, and that risk is priced into the valuation.
Incident recovery cost. When a critical system fails and the person who understands it is unavailable, the cost of the incident multiplies. Recovery takes longer, the scope of potential fixes narrows to what the remaining team can confidently attempt, and customers feel the impact for longer. If that person has left for good rather than being temporarily out of reach, their institutional knowledge has left with them, and for engineering knowledge specifically, that loss translates directly into extended incident timelines and increased risk in subsequent system changes.
The three things that actually reduce bus factor
Most bus factor discussions end with a list of practices: documentation, pair programming, code reviews, rotation. These are right in principle, but the organizations that actually distribute knowledge effectively tend to reach it through three specific mechanisms rather than trying to adopt every practice at once.
Forced, disconnected leave. This is the most effective diagnostic and mitigation tool in practice. Requiring key engineers to take real leave of at least two weeks, during which they cannot be contacted, forces knowledge transfer in three stages: the week before they leave forces handoffs that would never otherwise occur, their absence reveals what is still missing, and their return is when those gaps get filled. Organizations that enforce this consistently find and address bus factor dependencies before they become incidents. The ones that allow “I’m on vacation but available if needed” never discover the gaps.
Documentation as a prerequisite for shipping. Documentation written retrospectively is documentation that never gets written. The practice that works is making documentation a prerequisite for a feature being considered shipped. The requirement is narrower than comprehensive documentation: what someone else needs to deploy, monitor, and roll back the feature without the author’s involvement. That is also, not coincidentally, the knowledge that matters most when the author is unavailable.
Rotating incident response. Spreading incidents across the engineering team, rather than routing every incident to the person who built the affected system, builds bus factor resilience as a byproduct of normal operations. Engineers who participate in incidents across systems they did not build develop working knowledge of those systems. The knowledge that transfers in a live incident is more durable than the knowledge that transfers in a documentation session, because it is knowledge exercised under real conditions.
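Of the three mechanisms, the documentation prerequisite is the easiest to enforce mechanically. The sketch below shows one way a release check might verify that a feature's runbook covers deploying, monitoring, and rolling back. It assumes runbooks are Markdown files whose sections start with `#` headings; the function name `missing_sections` and the heading keywords are hypothetical choices for illustration, not an established convention.

```python
import re

# The three things someone else must be able to do without the author.
REQUIRED = ("deploy", "monitor", "rollback")


def _key(text: str) -> str:
    """Lowercase and keep letters only, so 'Roll-back' matches 'rollback'."""
    return re.sub(r"[^a-z]", "", text.lower())


def missing_sections(runbook_text: str, required=REQUIRED):
    """Return the required topics that no heading in the runbook mentions."""
    headings = [
        _key(line)
        for line in runbook_text.splitlines()
        if line.lstrip().startswith("#")  # assumption: Markdown-style headings
    ]
    return [name for name in required if not any(name in h for h in headings)]


# Illustrative runbook text, made up for the example.
draft = "# Deployment\nRun the release job.\n\n## Monitoring\nWatch the error-rate dashboard.\n"
complete = draft + "\n## Roll-back procedure\nRedeploy the previous tag.\n"

gaps_in_draft = missing_sections(draft)        # the rollback section is absent
gaps_in_complete = missing_sections(complete)  # nothing missing
```

A check like this only proves that the sections exist. Whether another engineer can actually follow them is what the Round 2 test in the audit is for.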
WAYF’s engagement pattern with organizations that have estate drift (the accumulation of undocumented knowledge and invisible dependencies) begins with a dependency mapping exercise before any technical work starts. That map reveals the bus factor concentrations, and the technical work can then be sequenced to address the highest-risk dependencies first rather than discovering them when the person who holds the knowledge has already left.
FAQ
What is the bus factor in software engineering?
The bus factor measures how many people in a team or organization would need to become unavailable before a critical project or system could no longer be maintained. A bus factor of one means a single person's departure would create a critical operational gap. The term originated as "hit by a bus," a deliberately uncomfortable thought experiment designed to force honest assessment of knowledge concentration risk.
How do you calculate bus factor?
The technical method involves analyzing code contribution history to identify which engineers have ownership over which system components, then simulating their removal to identify when critical functionality becomes unmaintainable. The executive method is simpler: identify the one or two people whose sudden unavailability would create the most severe operational problem for each critical system, then verify that others could operate those systems using existing documentation. The gap between who holds the knowledge and whether that knowledge is documented is the bus factor risk.
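As a rough illustration of the technical method, here is a minimal Python sketch. It assumes you have already extracted (engineer, component) pairs from version control history. The 20% authorship threshold, the 50% orphan limit, the greedy removal order, and the names in the sample data are all illustrative assumptions rather than a standard definition.

```python
from collections import Counter, defaultdict


def knowledgeable_engineers(commits, min_share=0.2):
    """Map each component to engineers who authored at least min_share of its commits."""
    counts = defaultdict(Counter)
    for engineer, component in commits:
        counts[component][engineer] += 1
    owners = {}
    for component, by_engineer in counts.items():
        total = sum(by_engineer.values())
        owners[component] = {e for e, n in by_engineer.items() if n / total >= min_share}
    return owners


def bus_factor(owners, orphan_limit=0.5):
    """Simulate departures: repeatedly remove the engineer who covers the most
    components until more than orphan_limit of components have nobody left.
    Returns (number of removals, engineers removed in order)."""
    remaining = {c: set(es) for c, es in owners.items()}
    total = len(remaining)
    removed = []
    while total:
        orphaned = sum(1 for es in remaining.values() if not es)
        if orphaned / total > orphan_limit:
            break
        coverage = Counter(e for es in remaining.values() for e in es)
        if not coverage:
            break
        # Ties are broken alphabetically so the result is deterministic.
        top = max(sorted(coverage), key=lambda e: coverage[e])
        removed.append(top)
        for es in remaining.values():
            es.discard(top)
    return len(removed), removed


# Illustrative commit history; engineers and components are made up.
commits = (
    [("sarah", "payments")] * 9 + [("priya", "payments")] * 1
    + [("marcus", "pipeline")] * 5
    + [("marcus", "deploy")] * 9 + [("priya", "deploy")] * 1
    + [("priya", "api")] * 3 + [("sarah", "api")] * 3
)
owners = knowledgeable_engineers(commits)
factor, departed = bus_factor(owners)  # factor == 2, departed == ["marcus", "sarah"]
```

Commit counts are only a proxy for understanding, which is why the executive method still tests whether someone else can actually operate the system.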
Why is bus factor important for acquisitions?
Acquirers and investors assess bus factor during technical due diligence because it represents a specific form of operational risk. A company whose critical systems are understood by one or two people who can leave after an acquisition is a company whose enterprise value depends on those individuals staying. This risk is typically priced into acquisition valuations. Organizations with high bus factor, where knowledge is distributed across multiple engineers and documented for the team, are more attractive acquisition targets at higher valuations.
What is the "Dead Sea Effect" in engineering teams?
The Dead Sea Effect, named in a 2008 essay by Bruce Webster, describes the pattern where mediocre engineers stay and strong engineers leave, concentrating knowledge in the people most likely to depart. Organizations where the best engineers feel most constrained, where bureaucracy and technical debt make the environment unrewarding, tend to retain the engineers least capable of carrying critical knowledge and lose the engineers most capable of doing so. The result is a progressive concentration of important knowledge in engineers who are actively looking for an exit.
How long does it take to recover from a key engineer departure?
A senior developer departure routinely sets a team back by weeks in delivery time. For engineers who are the sole holders of critical knowledge, the recovery period is typically longer, because what is lost goes beyond current project context: it includes accumulated system understanding that a replacement cannot pick up through onboarding. Organizations that have invested in documentation and knowledge distribution before the departure tend to recover within a few weeks; those that have not often keep discovering the gaps for months.
What is the difference between bus factor and documentation?
Documentation is one mitigation for low bus factor, often the first one reached for and rarely the most effective on its own. Bus factor risk includes tacit knowledge: the intuitions, shortcuts, workarounds, and contextual understanding that accumulate in an engineer's working memory and resist transfer to a document. Effective bus factor reduction combines documentation with pair programming, incident response rotation, forced disconnected leave, and the explicit practice of having multiple engineers perform each critical operational procedure rather than delegating it permanently to one person.
WAYF works with organizations to map knowledge dependencies and reduce bus factor risk as part of broader technology estate assessments. If the audit in this article revealed concentrations you were not aware of, the next step is a conversation.