It’s 2:00 AM. Your phone buzzes. Production is down, the checkout service is throwing 500 errors, and your team is scrambling.
You pull up the incident, and the real pain begins. Which recent merge request broke this? Was it the payment service? The quota logic? A pipeline configuration? You’re manually hopping between GitLab issues, file diffs, CI/CD logs, and deployment histories while revenue bleeds.
This is the hidden cost of production incidents: the hours of manual, forensic archaeology required before anyone even knows what to fix.
For the GitLab Transcend Hackathon, I wanted to solve this exact problem. I built the Incident Root Cause Flow—an open-source, AI-powered agent that natively traverses GitLab's Orbit knowledge graph to find the guilty commit in seconds.
Here is how it works, how it leverages graph traversal, and how you can install it in your own GitLab projects.
The Concept: Traversing the SDLC Graph
The flow is built entirely on the GitLab Duo Agent Platform and powered by GitLab Orbit.
Orbit is a knowledge graph that maps your code structure (definitions, references, call graphs) and connects it to your SDLC objects (issues, merge requests, pipelines, deployments). Because everything is linked, we can trace a failure backward.
When a new Incident is created, the agent wakes up and executes this precise traversal path:
- Find Candidates: Queries the graph for all MRs merged recently.
- Inspect Diffs: For each MR, it finds the changed files.
- Map the Code: It extracts the specific functions/definitions that were altered.
- Trace the Blast Radius: It follows the CALLS edge to find exactly which downstream functions rely on the changed code.
- Cross-Reference CI/CD: It checks if the HAS_HEAD_PIPELINE edge indicates a failed test run before the code was merged.
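To make the traversal concrete, here is a minimal sketch of the same walk over a toy in-memory edge list. It is not Orbit's query API. The node IDs, the tuple shape, and the HAS_DIFF_FILE and DEFINES edge names are assumptions made up for illustration; only CALLS comes from the steps above.

```python
from collections import defaultdict, deque

# Toy stand-in for the knowledge graph: (source, EDGE_TYPE, target) tuples.
# Node IDs and the HAS_DIFF_FILE / DEFINES names are hypothetical.
EDGES = [
    ("mr:101", "HAS_DIFF_FILE", "file:payments/quota.py"),
    ("file:payments/quota.py", "DEFINES", "def:check_quota"),
    ("def:charge_card", "CALLS", "def:check_quota"),
    ("def:checkout", "CALLS", "def:charge_card"),
    ("mr:102", "HAS_DIFF_FILE", "file:docs/README.md"),
]


def build_index(edges):
    """Index edges in both directions so we can walk forward or backward."""
    forward = defaultdict(list)  # (source, type) -> [targets]
    reverse = defaultdict(list)  # (target, type) -> [sources]
    for src, kind, dst in edges:
        forward[(src, kind)].append(dst)
        reverse[(dst, kind)].append(src)
    return forward, reverse


def changed_definitions(mr, forward):
    """MR -> changed files -> definitions that live in those files."""
    defs = []
    for path in forward[(mr, "HAS_DIFF_FILE")]:
        defs.extend(forward[(path, "DEFINES")])
    return defs


def blast_radius(definition, reverse):
    """Breadth-first walk against CALLS edges: direct and transitive callers."""
    seen, order = set(), []
    queue = deque([definition])
    while queue:
        node = queue.popleft()
        for caller in reverse[(node, "CALLS")]:
            if caller not in seen:
                seen.add(caller)
                order.append(caller)
                queue.append(caller)
    return order
```

With this toy data, mr:101 touches check_quota, and walking the CALLS edges backward reaches charge_card first and then checkout, while the docs-only mr:102 touches no definitions at all.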
The agent scores each MR based on this graph evidence and posts a beautifully formatted Markdown comment directly to the incident, complete with a confidence rating and the exact call-chain that broke.
Incident → Project → MergeRequest (failed pipeline?) → MergeRequestDiffFile → File → Definition ←[CALLS]← callers
Instead of digging blindly, the on-call engineer has a confidence-scored analysis and a direct link to the likely culprit within 30 seconds.
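As a rough illustration of what "scoring on graph evidence" could look like, here is a small sketch that turns three signals into a confidence number and a Markdown table. In the actual flow the agent does the scoring, so treat this as an assumption: the MrEvidence fields, the weights (30, 8 per caller, 30), and the table layout are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class MrEvidence:
    iid: int                    # merge request IID
    changed_definitions: int    # definitions touched by the MR
    downstream_callers: int     # callers reached through CALLS edges
    head_pipeline_failed: bool  # signal read from HAS_HEAD_PIPELINE


def confidence(e: MrEvidence) -> int:
    """Return a 0-100 score. The weights are hypothetical, not the flow's."""
    points = 0
    if e.changed_definitions > 0:
        points += 30  # the MR changed real code, not just docs or config
    points += 8 * min(e.downstream_callers, 5)  # capped at 40 points
    if e.head_pipeline_failed:
        points += 30  # tests were already red before the merge
    return points


def to_markdown(evidence):
    """Rank MRs by score (highest first) and render a Markdown table."""
    ranked = sorted(evidence, key=confidence, reverse=True)
    rows = ["| MR | Confidence |", "| --- | --- |"]
    rows += [f"| !{e.iid} | {confidence(e)}% |" for e in ranked]
    return "\n".join(rows)
```

Capping the caller contribution keeps a widely used utility function from drowning out the pipeline signal, which is one reasonable design choice among many.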
How to Install the Flow
Because this is a native GitLab Duo Flow, there are no external servers, webhooks, or Python scripts to host. It runs securely inside your GitLab environment.
- Navigate to your GitLab project.
- Go to AI → Flows (or search for the AI Catalog).
- Click New flow.
- Give it a name like Incident Root Cause Analyzer.
- Grab the config.yaml from the open-source repository and paste it into the YAML configuration editor.
- Click Save.
Note: The YAML dynamically injects your project ID using {{project_id}}, so it is completely portable out of the box.
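If you are curious what that kind of placeholder substitution amounts to, here is a tiny sketch. It only illustrates the {{name}} idea; it is not how GitLab renders flow configs, and the template string and the render helper are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{name}} with its value; fail loudly on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value provided for placeholder {name!r}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical prompt line, not copied from the real config.yaml.
line = "Analyze recent merge requests in project {{project_id}}."
rendered = render(line, {"project_id": 4242})
```

Because the project ID is filled in at run time, the same file can be pasted into any project without editing.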
How to Use It (Attaching the Trigger)
Once the flow is saved, you need to tell GitLab when to run it. We want this agent to fire automatically the moment a production issue is logged.
- Still in the Flows interface, switch to the Managed tab.
- Find your new flow and click Enable.
- Under the Add triggers section, select Work item (Created).
- Ensure your project is selected as the target.
Testing the Automation
To see it in action:
- Go to Plan → Issues → New Issue.
- Change the Type dropdown to Incident.
- Give it a realistic title, like checkout-service returning 500s, and hit submit.
Wait about 30 seconds, refresh the page, and the agent will have posted its complete root cause analysis as a comment, isolating the exact MR and code definitions that caused the outage.
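If you would rather script the test than click through the UI, the same incident can be created through GitLab's REST API, which accepts issue_type=incident when creating an issue. The sketch below only builds the request so you can inspect it before sending. The host name, project ID, and token in the usage comment are placeholders, and build_incident_request is a helper name I made up.

```python
import json
import urllib.parse
import urllib.request


def build_incident_request(base_url, project_id, token, title):
    """Build (but do not send) a POST that creates an incident-type issue."""
    project = urllib.parse.quote(str(project_id), safe="")
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project}/issues"
    body = json.dumps({"title": title, "issue_type": "incident"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )


# Usage (placeholders, replace with your own values before sending):
# req = build_incident_request(
#     "https://gitlab.example.com", 123, "<your-token>",
#     "checkout-service returning 500s",
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Whether an API-created incident fires the same Work item (Created) trigger as one made in the UI is something to verify in your own instance.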
What's Next?
Mean Time to Resolution (MTTR) doesn't have to be bottlenecked by human context-switching. By combining LLM reasoning with strict, deterministic graph traversal via Orbit, we can eliminate the guesswork of incident triage.
You can check out the full source code, the demo application, and the prompt engineering in the GitLab repository here.
Happy hacking, and may your on-call shifts be quiet!