Last Tuesday my OpenClaw agent ran a security audit cron at 11:02 AM. It fired on schedule. The cron dashboard showed ok. No errors. No alerts. No Telegram report.
It also produced nothing.
The agent had crashed mid-turn — a MiniMax overload error — but the outer cron framework didn't catch it. The isolated session returned status: ok even though the sub-agent turn had silently failed. The failure alert never fired because there was no error to detect.
I ran cron list. Everything looked fine. I had no idea anything was wrong until I manually checked the session transcript three days later.
That's when I built the silent crash detector.
The Problem with Framework-Level Error Detection
OpenClaw's cron system detects errors at the framework level — network timeouts, auth failures, unhandled exceptions thrown by the cron runner itself. What it can't detect is what happens inside the agent turn.
When an isolated session spawns a sub-agent and that sub-agent crashes with an overloaded_error from MiniMax, the outer session sees this as a normal assistant message with content like:
[assistant turn failed before producing content]
The message has role: "assistant" and status: "ok". The outer cron runner completed successfully. The framework has no idea anything went wrong.
This is the silent failure mode that's hardest to catch: not an exception, not a timeout, but a zero-output completion that looks identical to a successful run that just had nothing to say.
How session-review.js Detects It
The fix is a 30-line addition to the existing session review script. The core logic:
// Track failed state in the review loop
let stats = { ok: false, failed: false, ... };
for (const entry of assistantEntries) {
// Parse the assistant message...
// DETECT SILENT CRASH: agent produced "[turn failed before producing content]"
if (/\/turn failed before producing content\/i.test(content)) {
// Extract the structured errorMessage if present
const errorMatch = content.match(/errorMessage["']?\s*[:=]\s*["']([^"']+)["']/i);
if (errorMatch) {
stats.errorDetail = errorMatch[1];
}
stats.failed = true;
}
}
// Print warning in the report
if (r.failed) {
console.log(`⚠️ SILENT CRASH DETECTED: ${r.errorDetail || 'unknown cause'}`);
}
The key pattern is /turn failed before producing content — a literal string OpenClaw injects into the transcript when the agent crashes silently. Once you know to look for it, you can detect it anywhere.
The Error Message Extraction
The raw crash message often contains a structured errorMessage field that tells you why it failed. The original script was printing the generic "turn failed" message without extracting it:
BEFORE: "turn failed — overloaded_error: server is busy, please retry later"
AFTER: "turn failed — overloaded_error: server is busy, please retry later"
Wait, those look the same. The difference is that the before was a raw print of the entire transcript block. The after parses the structured JSON error:
// Extract structured errorMessage from the assistant content block
const errorMatch = content.match(/errorMessage["']?\s*[:=]\s*["']([^"']+)["']/i);
if (errorMatch) {
stats.errorMessage = errorMatch[1];
}
This matters because some crashes produce opaque assistant messages that look like normal text. The errorMessage field gives you the provider-level cause: overloaded_error, rate_limit_exceeded, context_length_exceeded, etc.
The Root Cause Chain
Once I could see the crash details, I found a pattern: all silent crashes were MiniMax overloaded_error events. The fix wasn't in the review script — it was upstream.
I changed the cron model configuration from a fallback chain:
model: "minimax-portal/MiniMax-M2.7"
fallbacks: ["openai/gpt-oss-120b:free"]
To a single pinned model:
model: "minimax-portal/MiniMax-M2.7"
The free fallback (gpt-oss-120b:free) was over-refusing tasks and causing cascading failures. Removing it didn't just fix the silent crashes — it made the crons faster and more reliable overall.
The Nightly Integration
The review script runs as part of a nightly self-improvement cron. Each morning, it checks the previous day's cron session transcripts and flags any silent crashes. The output looks like:
Cron Session Review — Nightly SI
================================
Drafter (147ea423): ok, 3.2k tokens, 4 turns
Security Audit (744883c3): ⚠️ SILENT CRASH DETECTED — overloaded_error: server is busy
Morning Brief (9f3a12): ok, 1.8k tokens, 3 turns
The alert goes to Telegram so I see it first thing in the morning. Before this, I'd find out about silent crashes days later when I happened to manually check a transcript.
What the Dashboard Misses
The cron list output shows ok for sessions that produced nothing. This is a known limitation — the framework reports its own status, not the agent's. From the framework's perspective, a session that crashes and produces an error message is still a completed turn.
The session review script fills this gap by looking one level deeper: at the actual transcript content, not just the framework status code.
The One-Line Takeaway
If you run OpenClaw crons and rely on cron list for health monitoring, add a transcript-level review step. Framework status and agent output are two different things — and the silent failures hide in the gap between them.
What I learned: Dashboard green lights don't mean the agent did anything. Check the transcript, or build something that checks it for you. Silent failures are the ones that hurt you most.









