Dead Code Finder: GitLab Orbit-based static analysis that turned out to be harder than expected

I built a GitLab Duo Agent Platform flow for a hackathon. The goal was simple: find code that nothing actually calls.

Not "what breaks if I delete this." That question already has a dozen tools. I wanted the narrower one: does anything call this at all?

What I built

A flow called Dead Code Finder. It queries GitLab Orbit's knowledge graph for CALLS and IMPORTS edges on every Definition node in the project, then sorts findings into three buckets:

Confident: zero incoming edges, not behind a decorator, not an external entry point
Uncertain: ambiguous cases I can't fully resolve statically (inheritance, MRO dispatch)
Skipped: decorator-based dispatch, test framework reflection, hardware entry points. Explicitly flagged, not silently dropped.
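
The three buckets can be sketched as a small classifier. This is a minimal sketch, not the flow's actual procedure: the Definition fields, the extra ALIVE value, and the order of the checks are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    CONFIDENT = "Confident"
    UNCERTAIN = "Uncertain"
    SKIPPED = "Skipped"
    ALIVE = "Alive"  # hypothetical: has references, so it is not reported


@dataclass
class Definition:
    # Hypothetical fields standing in for whatever the graph query returns.
    name: str
    incoming_edges: int                    # incoming CALLS + IMPORTS edges
    is_decorated: bool = False             # possible decorator-based dispatch
    is_test_method: bool = False           # found by test framework reflection
    is_hardware_entry_point: bool = False  # invoked from outside the code
    is_ambiguous: bool = False             # inheritance or MRO dispatch


def classify(d: Definition) -> Bucket:
    """Sort one definition into a bucket (check order is an assumption)."""
    if d.incoming_edges > 0:
        return Bucket.ALIVE
    if d.is_decorated or d.is_test_method or d.is_hardware_entry_point:
        return Bucket.SKIPPED
    if d.is_ambiguous:
        return Bucket.UNCERTAIN
    return Bucket.CONFIDENT
```

Checking the Skipped conditions before the Uncertain ones means a definition that is both decorated and inherited is reported with the decorator reason. That ordering is a design choice in this sketch, not something the flow is known to do.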

One hard rule: never say "safe to delete." The report only says "no reference found in the static call graph, here is exactly what was checked."

It posts one report and stops. Never modifies files, never opens an MR.

The core logic lives in a SKILL.md that defines the Orbit traversal procedure step by step. The agent flow reads it, runs it, writes the report.

What I expected

Query the graph, check edge counts, flag anything with zero incoming CALLS or IMPORTS. Pretty mechanical.

I expected the hard part to be writing clean classification logic. It wasn't.

What actually happened

1. The platform didn't cooperate — until it did

Two things broke at runtime.

query_graph and get_graph_schema were declared in the flow's YAML config but weren't in the actual toolset when the flow ran. Separately, skill injection was unreliable — sometimes the flow received only the manifest entry for SKILL.md (name and description) instead of the actual procedure body.

Both turned out to be manageable. For skill injection, I inlined the full procedure directly in the system prompt as a guaranteed fallback: the injected skill is preferred when present, but the inline copy is always there. The missing graph tools came down to a per-user account setting. GitLab ships an "Orbit in GitLab Duo" preference (User Settings → Preferences) with separate opt-in toggles, and the "Custom Agents" toggle was off by default. Enabling it gave the flow query_graph/get_graph_schema access immediately.

I only found the toggle after building a file-based fallback mode. I kept both the graph path and the fallback. The toggle can be off, skill injection can still fail independently, and a flow that degrades loudly is more useful than one that degrades silently.

In fallback mode, the flow reads actual repo files using list_repository_tree, get_repository_file, find_files, and blob_search, applies the same heuristics, labels every finding [INFERRED], and opens the report with an explicit banner naming which tools were missing.

No pretending to have graph evidence when I didn't.
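
The mode switch described above might look roughly like this. It is a sketch only: the function names and the banner wording are hypothetical, and the real flow makes this decision inside its prompt rather than in Python.

```python
GRAPH_TOOLS = {"query_graph", "get_graph_schema"}


def choose_mode(available_tools):
    """Return (mode, banner). The banner is empty when graph tools exist."""
    missing = sorted(GRAPH_TOOLS - set(available_tools))
    if not missing:
        return "graph", ""
    # Hypothetical banner text; the point is that it names the missing tools.
    banner = (
        "FALLBACK MODE: graph tools unavailable ("
        + ", ".join(missing)
        + "). All findings are file-based and labeled [INFERRED]."
    )
    return "fallback", banner


def label(finding, mode):
    """Prefix findings that were not backed by graph evidence."""
    return finding if mode == "graph" else "[INFERRED] " + finding
```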

2. Static analysis is messier than it looks

A few cases a naive edge-count check gets wrong:

__init__, __enter__, __exit__, __del__

These almost always look dead in a naive check. When you instantiate a class, Orbit registers a CALLS edge to the class, not to __init__. So __init__ shows zero incoming edges even when the class is instantiated dozens of times. I confirmed this against a ~500-line SDK class (EphemeralAgentExecutor) with a full unittest suite. Correcting the check to look at the enclosing class's incoming edges instead fixed the false positives.
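
A minimal sketch of that correction, assuming the edge counts have already been collected into a dict keyed by qualified name. The dict shape, the function name, and the counts in the usage note are made up for illustration.

```python
LIFECYCLE_DUNDERS = {"__init__", "__enter__", "__exit__", "__del__"}


def effective_incoming(name, enclosing_class, incoming):
    """Incoming edge count for a definition, with the constructor correction.

    incoming maps a qualified name such as "Executor.__init__" to the number
    of incoming CALLS/IMPORTS edges (a hypothetical pre-collected structure).
    """
    qualified = f"{enclosing_class}.{name}" if enclosing_class else name
    direct = incoming.get(qualified, 0)
    if enclosing_class and name in LIFECYCLE_DUNDERS:
        # Instantiation is recorded as an edge to the class, so count the
        # class's incoming edges for its lifecycle methods as well.
        return direct + incoming.get(enclosing_class, 0)
    return direct
```

With counts like {"Executor": 12, "Executor.__init__": 0}, the naive check reports 0 for __init__ while this version reports 12, so the method is no longer flagged.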

Decorator-based dispatch

A function registered into a dict via a decorator and called later by string-key lookup is structurally indistinguishable from dead code in a static call graph. There is no literal call in source. These go in Skipped with the actual reason stated.
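
A small illustration of why this pattern is invisible to a call graph. The registry, the handler name, and the key are hypothetical. The check uses Python's ast module to collect every name that is the direct target of a call expression, which is a rough stand-in for a static CALLS edge, not Orbit's actual extraction.

```python
import ast

SOURCE = '''
HANDLERS = {}

def register(key):
    def wrap(fn):
        HANDLERS[key] = fn
        return fn
    return wrap

@register("resize")
def resize_image(path):
    return "resized " + path

def dispatch(key, path):
    return HANDLERS[key](path)
'''


def literally_called_names(source):
    """Names that appear as the direct target of a call expression."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            names.add(node.func.id)
    return names


called = literally_called_names(SOURCE)

# At runtime the handler is reached through the dict lookup.
namespace = {}
exec(SOURCE, namespace)
result = namespace["dispatch"]("resize", "a.png")
```

Here resize_image never shows up in called, yet dispatch("resize", ...) runs it. The only literal call by name in the source is to register.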

Test framework reflection

unittest.TestCase methods are discovered through reflection over class attributes, not through any literal call anywhere. The test runner finds them at runtime in a way the static graph can't see. Same bucket, same reasoning.
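
The standard library shows the mechanism directly. In this sketch the test class is hypothetical, and unittest.TestLoader finds the test method by inspecting the class's attribute names. No source line ever calls test_addition by name.

```python
import unittest


class ExampleTests(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def helper_not_a_test(self):
        return None


loader = unittest.TestLoader()

# Discovery is by name prefix over class attributes, not by a call site.
discovered = loader.getTestCaseNames(ExampleTests)

# Running the suite invokes the discovered method through reflection.
outcome = unittest.TestResult()
loader.loadTestsFromTestCase(ExampleTests).run(outcome)
```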

Inheritance and MRO

A method that is only reachable through a subclass that doesn't override it might show no direct incoming edges on the base class method itself. These go in Uncertain rather than being flagged dead.
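
A hypothetical pair of classes shows the shape of the problem. The call inside run is written against self, so whether a static graph attributes it to the base class method depends on how well the analyzer resolves the receiver's type. Python itself resolves it through the MRO at runtime.

```python
class Base:
    def greet(self):
        return "hello from " + type(self).__name__


class Child(Base):
    def run(self):
        # The receiver is a Child instance; greet is found on Base via MRO.
        return self.greet()


# Walk the MRO the way attribute lookup does to find who defines greet.
resolved_owner = next(k for k in Child.__mro__ if "greet" in vars(k))
```

Child does not define greet, so the lookup lands on Base. A checker that only matches call sites to definitions in the same class would miss that edge.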

Validation

I built test fixtures specifically to hit the tricky cases: plain unused functions, cross-file import resolution, inheritance/MRO, decorator dispatch, constructor/dunder handling, and unittest reflection.

Two live runs in fallback mode correctly identified every planted dead-code fixture with cited file and line evidence, correctly excluded the decorator and test-discovery cases with reasoning for each, and correctly downgraded the ambiguous inheritance and script entry point cases to Uncertain.

Then I ran the Orbit traversal procedure manually against the live graph via CLI to check whether fallback mode's file-based guesses matched actual graph data.

Results:

totally_unused_helper — zero incoming CALLS/IMPORTS edges. Actually dead.
cross_file_dead_helper — zero incoming edges. Actually dead.
undecorated_dead_function — zero incoming edges. Actually dead.
Base.greet — incoming edge from Child.run. Reachable via inheritance. Fallback mode correctly put this in Uncertain rather than flagging it dead.
summarize_text — alive via script entry point. Correctly Uncertain.
__init__, __exit__, __del__ — zero direct incoming edges, with usage only on the enclosing class. Constructor correction validated.

The graph also surfaced something fallback mode couldn't confirm: add_temp_file and issue_credential show zero incoming CALLS edges. Potential genuine dead code that file-reading alone couldn't settle.

Every fallback finding held up against the graph.

What the first Orbit run found

Once the Custom Agents toggle was on, I re-triggered the flow and, for the first time, it queried the graph from inside the flow itself. It worked, and immediately turned up two bugs in the procedure logic.

Bug 1: standalone driver functions misclassified as Bucket A (Confident).

use_it() is a top-level function that instantiates Child and calls .run() on it to exercise the MRO chain. It has zero incoming CALLS edges, because nothing calls it — it's meant to be run directly. The flow flagged it as confidently dead. It isn't. It's the same shape as a single-definition main file: an entry point that external tooling invokes, just not literally named main. The procedure now explicitly checks whether a zero-edge function's body instantiates a class and chains method calls — if so, it goes to Bucket B (Uncertain), not Bucket A.
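
One way to approximate that check over source text is with Python's ast module. This is a sketch under assumptions, not the flow's procedure: the function name looks_like_driver is hypothetical, and the set of known class names is assumed to come from elsewhere.

```python
import ast


def looks_like_driver(func_source, known_classes):
    """True if the function instantiates a known class and calls a method on it."""
    func = ast.parse(func_source).body[0]

    # Pass 1: variables assigned from a direct instantiation, e.g. c = Child()
    instances = set()
    for node in ast.walk(func):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)
                and node.value.func.id in known_classes):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    instances.add(target.id)

    # Pass 2: method calls on those variables, or chained as Child().run()
    for node in ast.walk(func):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            receiver = node.func.value
            if isinstance(receiver, ast.Name) and receiver.id in instances:
                return True
            if (isinstance(receiver, ast.Call)
                    and isinstance(receiver.func, ast.Name)
                    and receiver.func.id in known_classes):
                return True
    return False
```

A zero-edge function for which this returns True would be downgraded to Uncertain. The heuristic is deliberately loose: it will also match ordinary helpers that build an object and call it, which is acceptable when the cost of a match is only a weaker claim.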

Bug 2: a genuine call edge came back as zero.

cross_file_used_helper is directly called by cross_file_test_caller.run(), a simple one-hop call. The batched query returned zero edges for it anyway. This matches a separately observed issue in this environment: 19-digit Definition IDs sometimes come back truncated by one digit in query responses, and a truncated ID produces a false "no edges" result for a node that is genuinely called. The fix: when a simple, expected edge comes back as zero, re-run that single node ID on its own before trusting the result. The solo re-check resolved correctly: cross_file_used_helper is alive and doesn't appear in any bucket.
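
The re-check can be sketched as a wrapper around two query functions. Both callables are hypothetical stand-ins for however the flow issues batched and single-node queries. As a simplification, this version re-checks every zero result rather than only the ones where an edge was expected.

```python
def incoming_with_recheck(node_ids, query_batch, query_single):
    """Edge counts per node ID, re-querying any zero result on its own.

    query_batch(ids) returns a dict of ID to count and may omit or mangle
    IDs. query_single(id) returns a count for one ID. Both are stand-ins.
    """
    counts = query_batch(node_ids)
    verified = {}
    for node_id in node_ids:
        count = counts.get(node_id, 0)
        if count == 0:
            # Do not trust a batched zero; ask again for this node alone.
            count = query_single(node_id)
        verified[node_id] = count
    return verified
```

If the batched response keys an edge under an ID that is one digit short, the lookup for the full ID misses, the count falls to zero, and the solo query gets a chance to correct it.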

A second run after both fixes handled both cases correctly and independently re-confirmed add_temp_file and issue_credential as Bucket A (Confident), matching the earlier CLI finding.

What I'd do differently

Confirm which tools are actually available in the runtime before writing any logic that depends on them. Building the fallback mode wasn't hard, but designing for both paths from the start would have been cleaner than retrofitting it.

Also: run the flow against the graph earlier in development. The two procedure bugs above only showed up once query_graph calls were running — fallback mode's file-reading was too weak to surface them.

What's still open

Extend validation past Python
Transitive dead code detection (code only referenced by other dead code)
References from non-code files: YAML, CI templates, cron jobs

The thing I actually learned

A report that says "I don't know" in the cases where it genuinely doesn't know is more useful than one that sounds confident everywhere and is occasionally wrong.

That ended up applying to the platform too. Once I confirmed the Orbit tools weren't available, the right call was to say so loudly in the report, not mask it with a fallback that looked like the real thing. And when the tools became available and the first run found bugs, the right call was to fix the procedure and run again rather than rationalize the bad output.

The three-bucket structure came from the same instinct. Most dead-code tools give you a flat list. The flat list trains you to ignore it once it's wrong a few times. Labeling the uncertainty explicitly is what makes the confident findings actually worth acting on.