Platform Teams Cut On-Call by Measuring Internal Tool Dead Code

May 21, 2026 By Sara Park

Every platform team builds glue code. A CLI wrapper here, a migration script there, a dashboard widget that once served a purpose. Over months, these small additions accumulate into a sprawling codebase that nobody fully understands. The cost is invisible until the pager goes off at 3 AM for an error in a module that no active tool depends on. Dead code is not just clutter—it is a direct contributor to on-call fatigue.

Platform teams at companies like Spotify, Netflix, and Uber have started treating internal tool dead code as a measurable liability. By applying static analysis and runtime tracing, they identify unused endpoints, orphaned functions, and deprecated integrations. The payoff is not theoretical: fewer code paths mean fewer alarms, simpler debugging, and faster incident response. This article lays out how to measure dead code, tie cleanup to on-call reduction, and sustain a culture of pruning.

The Unspoken Tax of Internal Tools

Platform teams exist to accelerate other developers. They build CI/CD pipelines, deployment dashboards, service catalogs, and internal SDKs. The implicit promise is that these tools reduce cognitive load for the rest of the organization. But the platform team itself carries a growing cognitive load from maintaining its own creations.

Every internal tool starts with a clear purpose. A year later, that purpose may have shifted. A migration script written for a database upgrade stays in the repo even after the migration completes. A service discovery client gets replaced by a newer version, but the old module lingers. The team that built it has moved on. No one is sure whether deleting it will break something downstream.

This accumulation is not malicious—it is the natural result of velocity. Engineers ship features, fix bugs, and move to the next priority. Cleaning up is rarely rewarded. The result is a codebase where unused code can account for 20 to 30 percent of the total, according to internal audits shared at engineering conferences. Spotify's Backstage team, for example, documented that unused plugin code in their internal developer portal contributed to confusing error messages and longer debug cycles.

The tax shows up in on-call rotations. When an alert fires, the engineer must trace through multiple layers of tooling. Dead code creates false positives—alarms from deprecated endpoints that no longer serve real traffic. It also increases mean time to resolution (MTTR) because the investigator wastes time ruling out irrelevant code paths. Over a quarter, those minutes add up to hours of lost productivity and frayed nerves.

Why Dead Code Survives in Platform Repos

Dead code persists for several reasons, and understanding them is the first step to removal. The most common is fear. Platform tools often have unknown dependencies. A shared library might be imported by a dozen services, and no one has a complete map of who uses what. Deleting a function that seems unused could break a deployment pipeline that runs once a quarter.

Another factor is ownership. In a monorepo, modules can become orphaned when their original authors leave the team or the company. No one feels responsible for a piece of code that they did not write and do not understand. The code sits in a gray zone—too risky to delete, too obscure to maintain. Over time, it becomes a source of subtle bugs and confusing documentation.

Legacy migrations also leave behind dead code. When a platform team moves from one infrastructure provider to another, they often write adapters or shims. Once the migration is complete, the old code remains. The team is busy with the next initiative, and the shim is left as a safety net. But safety nets that are never used become trip hazards. A 2022 study by Microsoft Research found that roughly 30 percent of endpoints in large-scale internal APIs were never called in production over a six-month period.

Copy-paste culture in monorepos accelerates the problem. Engineers duplicate code snippets from existing modules rather than abstracting shared logic. Each copy introduces slight variations, and over time the original and the copy diverge. When the original is updated, the copies are not. The result is a forest of near-identical functions, only a few of which are actually invoked. Without systematic measurement, the team cannot distinguish the living from the dead.

Measuring Dead Code with Call-Graph Analysis

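The antidote to guesswork is measurement. Two complementary approaches dominate: static analysis and runtime tracing. Static analysis builds a call graph by scanning the codebase for import statements, function calls, and export declarations. In JavaScript projects, depcheck flags package dependencies that nothing imports, and Madge can list orphaned modules that no other module depends on. Function-level checks usually need a dedicated dead-code finder, such as Vulture for Python, or a custom script that flags functions that are never called.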

For platform SDKs and internal libraries, static analysis works well because the code is self-contained. A function that is not called anywhere in the repo is likely dead. However, static analysis has blind spots. Dynamic imports, reflection, and runtime code generation can hide dependencies. That is where runtime tracing comes in.
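As a rough illustration of the static approach, the sketch below uses Python's standard ast module to list functions that are defined but never referenced by name. The sample source and every function name in it are hypothetical. The example also shows the blind spot mentioned above: a function reached only through a string lookup is reported as unused even though it is live.

```python
import ast


def find_uncalled_functions(source: str) -> set[str]:
    """Return names of functions defined in `source` that are never referenced."""
    tree = ast.parse(source)  # parses only; the sample code is never executed
    defined: set[str] = set()
    referenced: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            referenced.add(node.id)      # plain references such as run()
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)    # attribute references such as obj.run()
    return defined - referenced


# Hypothetical module contents, used only for illustration.
SAMPLE = '''
def migrate_v1():
    return "old"

def migrate_v2():
    return "new"

def run():
    return migrate_v2()

def dynamic():
    return "found via string lookup"

handler = globals()["dynamic"]
run()
'''

unused = find_uncalled_functions(SAMPLE)
print(sorted(unused))  # ['dynamic', 'migrate_v1']
```

Here migrate_v1 is genuinely dead, while dynamic is a false positive. That is why results from a scan like this are better treated as candidates for review than as a deletion list.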

Runtime tracing instruments the application to record which code paths execute under real traffic. For internal tools, staging environments that mirror production traffic are ideal. Teams can use OpenTelemetry to capture spans for API endpoints and other key operations; tracing every individual function call is usually too costly, so function-level data more often comes from a coverage tool. After an observation period, typically a week or more and long enough to cover infrequent scheduled jobs, they compare the trace data against the full set of available code paths. Any path that never appears in traces is a candidate for removal.
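The comparison step can be sketched without any tracing backend. The example below assumes a hypothetical traced decorator that stands in for real instrumentation: it registers each function and records its name when it runs. In practice the observed names would more likely come from exported spans or coverage data. All function names are illustrative.

```python
import functools
import json

REGISTERED: set[str] = set()  # every code path we know about
OBSERVED: set[str] = set()    # code paths seen executing during observation


def traced(func):
    """Register `func` and record its name each time it is called."""
    REGISTERED.add(func.__name__)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        OBSERVED.add(func.__name__)
        return func(*args, **kwargs)

    return wrapper


@traced
def deploy_status():
    return "ok"


@traced
def legacy_rollback():
    return "rolled back"


@traced
def list_services():
    return ["api", "worker"]


def removal_candidates() -> list[str]:
    """Code paths that were registered but never observed running."""
    return sorted(REGISTERED - OBSERVED)


# Simulated traffic for the observation window.
deploy_status()
list_services()
deploy_status()

report = json.dumps(
    {
        "registered": len(REGISTERED),
        "observed": len(OBSERVED),
        "candidates": removal_candidates(),
    },
    indent=2,
)
print(report)
```

The set difference is the whole technique. The quality of the result depends on how representative the observed traffic is.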

Netflix's Vizceral project visualized service dependencies in real time, but the same principle applies at the function level. A coverage report from runtime tracing can be generated as a JSON artifact and posted to a dashboard. GitHub Actions can be configured to run a nightly job that flags unused imports and posts a PR comment. The key is making the data visible and actionable. Without a dashboard, the measurements sit on a hard drive and the dead code persists.

Tying Cleanup Directly to On-Call Reduction

Once dead code is measured, the next step is connecting cleanup to on-call metrics. The hypothesis is simple: fewer code paths mean fewer possible failure modes. Each removed module eliminates a surface area for bugs, misconfigurations, and dependency conflicts. Over time, this should reduce the number of alerts and lower MTTR.

PagerDuty incident data can be correlated with how recently each module was changed or used. A platform team at a mid-size e-commerce company tracked their top ten alert sources over six months. They found that four of the ten originated from modules that had not been touched in over a year. After deprecating those modules, the corresponding alerts disappeared. The team's on-call rotation went from one major incident per week to one every three weeks.
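A correlation like this can be approximated with a few lines of scripting. The sketch below assumes two inputs that a team would have to export itself: a list of alert events tagged with the module that raised them, and a last-modified date per module (from version control, for instance). The module names, dates, and one-year threshold are illustrative, not taken from any real incident data.

```python
from collections import Counter
from datetime import date


def stale_alert_sources(alerts, last_modified, today, top_n=10, stale_days=365):
    """Return the top alert sources whose code has not changed in `stale_days`."""
    counts = Counter(alerts)
    top = [module for module, _ in counts.most_common(top_n)]
    return [
        module
        for module in top
        # Modules with no known modification date are skipped, not guessed at.
        if module in last_modified
        and (today - last_modified[module]).days > stale_days
    ]


# Hypothetical export: one entry per alert, named after the source module.
alerts = (
    ["legacy_shim"] * 5
    + ["deploy_api"] * 4
    + ["old_discovery"] * 3
    + ["ci_runner"] * 2
)
last_modified = {
    "legacy_shim": date(2024, 1, 10),
    "deploy_api": date(2026, 4, 2),
    "old_discovery": date(2024, 11, 5),
    "ci_runner": date(2026, 5, 1),
}

stale = stale_alert_sources(alerts, last_modified, today=date(2026, 5, 21))
print(stale)  # ['legacy_shim', 'old_discovery']
```

Modules that are both noisy and untouched are natural first candidates for a deprecation review.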

Uber's 2019 migration from a monolithic Python service to a modular architecture provides a larger example. By removing unused code paths during the migration, the platform team estimated they saved roughly 2,000 engineer-hours in reduced debugging and rework. Each removed module contributed a small but cumulative improvement to MTTR. The team reported that incident response times dropped by minutes per event, which translated to fewer pages and less burnout.

The causal link is not always linear. Removing dead code can introduce regressions if dependencies are misidentified. But the evidence from multiple organizations suggests that the net effect is positive. The key is to measure before and after, using metrics like alert volume, MTTR, and engineer self-reported fatigue. A platform team that cannot articulate the impact of cleanup in on-call terms will struggle to justify the time investment to leadership.
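A before-and-after comparison needs very little machinery. The sketch below assumes each incident is recorded as minutes to resolution and compares two periods of equal length. The numbers are made up to keep the arithmetic easy to follow.

```python
from statistics import mean


def summarize(resolution_minutes):
    """Alert volume and mean time to resolution for one period."""
    return {
        "alerts": len(resolution_minutes),
        "mttr_minutes": mean(resolution_minutes),
    }


def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    return (after - before) * 100 / before


# Hypothetical minutes-to-resolve for each incident in two equal-length periods.
before_cleanup = [40, 60, 30, 50, 70, 50]
after_cleanup = [35, 45, 40]

before = summarize(before_cleanup)
after = summarize(after_cleanup)

alert_change = percent_change(before["alerts"], after["alerts"])
mttr_change = percent_change(before["mttr_minutes"], after["mttr_minutes"])

print(f"alert volume: {alert_change:+.1f}%")  # alert volume: -50.0%
print(f"MTTR: {mttr_change:+.1f}%")           # MTTR: -20.0%
```

Self-reported fatigue does not reduce to a formula as neatly, but a short recurring survey scored on the same scale each quarter can be compared in the same way.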

Practical Playbook for Dead Code Removal

Removing dead code requires more than a script. It demands a process that balances risk with reward. A workable playbook starts with a deprecation header. Before deleting any code, add a comment or decorator that marks the module as deprecated and logs a warning when it is called. Set a deprecation window of two to four weeks, and extend it for anything that might be used by jobs that run less often, such as a quarterly pipeline. If the warning never fires and no one reports a broken pipeline, the code is probably safe to remove.
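One way to implement the deprecation marker in Python is a small decorator built on the standard warnings module. The decorator name, the removal date, and the wrapped function below are all hypothetical. A real setup would probably also send the warning to whatever logging pipeline the team already monitors.

```python
import functools
import warnings


def deprecated(removal_date: str):
    """Mark a function as deprecated and warn every time it is called."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and scheduled for removal "
                f"on {removal_date}",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(removal_date="2026-07-01")  # hypothetical end of the window
def legacy_migrate(table):
    return f"migrated {table}"


# Capture warnings explicitly so the example behaves the same everywhere.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_migrate("users")

print(result)             # migrated users
print(caught[0].message)
```

The function keeps working during the window, so callers get a signal rather than an outage.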

Feature flags can gate old code paths. Instead of deleting a module immediately, wrap its usage in a flag that defaults to off. Monitor how often the flag is turned on. If no consumer enables it during the observation window, the code is very likely dead. This approach is especially useful for internal APIs where consumers are other teams. The flag gives them time to migrate without a sudden breakage.
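The flag pattern can be sketched with an in-memory store. The FlagStore class, the flag name, and the service-resolution function are hypothetical stand-ins for whatever flag system a team already runs. The point is only that the old path is off by default and that every use of it is counted.

```python
from collections import Counter


class FlagStore:
    """Minimal in-memory feature flags that default to off."""

    def __init__(self):
        self._enabled: dict[str, bool] = {}
        self.activations: Counter = Counter()  # times a gated path actually ran

    def enable(self, name: str) -> None:
        self._enabled[name] = True

    def is_enabled(self, name: str) -> bool:
        on = self._enabled.get(name, False)  # unknown flags are off
        if on:
            self.activations[name] += 1
        return on


flags = FlagStore()


def resolve_service(name: str) -> str:
    # The old discovery client only runs if a consumer opts back in.
    if flags.is_enabled("use_legacy_discovery"):
        return f"legacy://{name}"
    return f"mesh://{name}"


print(resolve_service("billing"))                 # mesh://billing
print(flags.activations["use_legacy_discovery"])  # 0

# A consumer that still needs the old path opts in, and the counter shows it.
flags.enable("use_legacy_discovery")
print(resolve_service("billing"))                 # legacy://billing
print(flags.activations["use_legacy_discovery"])  # 1
```

A counter that stays at zero for the whole window is the evidence that goes with the deletion.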

Automated PRs that delete unused code are surprisingly effective. A bot can scan the repo weekly, identify modules with zero callers, and open a PR with the deletion. The PR includes a link to the coverage report and a suggested reviewer. The reviewer's job is not to re-verify the analysis but to check for edge cases—dynamic imports or runtime registration that the tool missed. Teams that adopt this pattern report that most PRs are merged within a day.
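The scanning half of such a bot might look like the sketch below. It assumes the import graph has already been extracted into a dictionary that maps each module to the modules it imports. The module names, the entry-point set, and the report URL are placeholders. Opening the PR itself depends on the code host's API and is left out.

```python
def orphan_modules(imports: dict[str, set[str]], entry_points: set[str]) -> list[str]:
    """Modules that nothing imports and that are not known entry points."""
    imported = set().union(*imports.values())
    return sorted(
        module
        for module in imports
        if module not in imported and module not in entry_points
    )


def pr_body(orphans: list[str], report_url: str) -> str:
    """Plain-text PR description listing the proposed deletions."""
    lines = ["Automated cleanup: modules with zero importers", ""]
    lines += [f"- {module}" for module in orphans]
    lines += [
        "",
        f"Coverage report: {report_url}",
        "Reviewer: please check for dynamic imports or runtime registration.",
    ]
    return "\n".join(lines)


# Hypothetical import graph: module -> modules it imports.
graph = {
    "cli": {"deploy", "config"},
    "deploy": {"config"},
    "config": set(),
    "old_shim": {"config"},
    "migrate_2023": set(),
}

orphans = orphan_modules(graph, entry_points={"cli"})
print(orphans)  # ['migrate_2023', 'old_shim']
print(pr_body(orphans, report_url="https://example.invalid/coverage"))
```

A single pass only finds modules with no importers at all. Code that is imported solely by an orphan surfaces on a later run, once the orphan is gone.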

Quarterly cleanup sprints institutionalize the habit. The platform team dedicates one week per quarter to removing dead code, updating documentation, and retiring deprecated endpoints. During the sprint, they measure the reduction in code size and the corresponding change in on-call metrics. Rewarding teams for reduction metrics—such as a 10 percent drop in unused functions—reinforces the behavior. Some organizations have created internal leaderboards that show which teams have the cleanest codebases.

Culture Shift: Treating Code as Liability

Technical processes alone are not enough. The underlying belief that code is an asset must shift to a more nuanced view: code is a liability that provides value only as long as it is used. Platform teams, in particular, must own their inventory. If a module has no known consumer, it should be treated like an unmaintained warehouse—expensive to insure and prone to attracting pests.

The 'you build it, you clean it' motto, adapted from the DevOps principle of owning what you run, applies here. When a platform team ships a new tool, they should also plan its sunset. That does not mean predicting the end date, but it does mean committing to periodic reviews. A quarterly inventory check, similar to a security audit, forces the team to justify each module's existence.

Documentation of removal rationale matters. When a module is deleted, the commit message should explain why it was unused and how the team verified that. This creates a historical record that future engineers can consult. It also builds institutional knowledge about the platform's evolution. Without documentation, the same dead code patterns may reappear.

Sharing on-call win stories internally builds momentum. When a team can say, 'We removed three unused services last quarter and our on-call incidents dropped by 40 percent,' other teams take notice. Some organizations have internal tech talks where platform engineers present before-and-after metrics. The stories are more persuasive than any mandate. Making dead code visible in dashboards—showing a live count of unused functions per repo—turns cleanup into a game that engineers want to win.

The Compound Effect of Continuous Pruning

The benefits of dead code removal compound over time. Each deletion reduces cognitive load. New engineers joining the team face a smaller codebase to learn. They can trace through the platform's logic without stumbling over irrelevant branches. Onboarding time shrinks, and the team can ship features faster.

Incident response becomes more predictable. With fewer code paths, the set of possible failure modes is smaller. Engineers can build mental models of the system that are accurate rather than aspirational. When an alert does fire, they know where to look. The reduction in ambiguity directly reduces stress and burnout.

Platform team morale improves measurably. Surveys at companies that adopted systematic dead code removal show higher satisfaction scores on questions about code quality and maintainability. Engineers report feeling more ownership over their tools. The act of cleaning up is intrinsically satisfying—a visible improvement that everyone can see.

Gartner has predicted that by 2025, organizations that actively manage tooling debt will reduce infrastructure and tooling costs by roughly 40 percent. That prediction is not just about cloud spend; it includes the human cost of maintaining unused code. Dead code is a form of technical debt that accrues interest in the form of on-call hours. The earlier a team starts pruning, the less of that interest it pays.

None of this is easy. Removing code that someone else wrote can feel disrespectful. Automated tools produce false positives. A deprecation window may pass without incident, only for a quarterly batch job to fail six months later. The trade-off is real. But the alternative—letting dead code accumulate until the platform becomes unmanageable—is worse. Platform teams that measure, prune, and celebrate cleanup will find their on-call rotations quieter and their tools more trustworthy.
