The Limit of a Single Skill
A single Skill is designed to do one thing. A "technical writer" Skill can't search for competitor data, analyze it, and produce a formatted report — that requires multiple steps, multiple specializations, chained together.
Workflow chaining composes single Skills into pipelines that handle what no single Skill can.
Four Chaining Patterns
Pattern 1: Sequential Chain
A → B → C
each Skill's output is the next Skill's input
Simplest structure, appropriate for linear tasks. Critical constraint: if one step fails, the whole chain stops.
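A minimal sketch of the sequential pattern in plain Python, assuming each Skill can be modeled as a function from string to string. The three step functions are hypothetical stand-ins for the demo's keywords → outline → write nodes (the demo used a LangGraph graph, which this does not reproduce), and `ChainError` is likewise illustrative. It shows the constraint above: the first failing step stops everything after it.

```python
from typing import Callable

Skill = Callable[[str], str]


class ChainError(RuntimeError):
    """Hypothetical error type: records which step broke the chain."""

    def __init__(self, step: str, cause: Exception):
        super().__init__(f"step '{step}' failed: {cause}")
        self.step = step


def run_sequential(steps: list[tuple[str, Skill]], initial_input: str) -> str:
    """Feed each Skill's output into the next; stop at the first failure."""
    data = initial_input
    for name, skill in steps:
        try:
            data = skill(data)
        except Exception as exc:  # any failure stops the whole chain
            raise ChainError(name, exc) from exc
    return data


# Hypothetical stand-ins for the demo's three LLM-backed steps.
def extract_keywords(topic: str) -> str:
    return f"keywords({topic})"


def build_outline(keywords: str) -> str:
    return f"outline({keywords})"


def write_article(outline: str) -> str:
    return f"article({outline})"


pipeline = [
    ("keywords", extract_keywords),
    ("outline", build_outline),
    ("write", write_article),
]
print(run_sequential(pipeline, "async/await"))
# prints: article(outline(keywords(async/await)))
```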
Pattern 2: Parallel Fan-out
          → B1 →
A → split → B2 → merge → C
          → B3 →
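Multiple Skills run concurrently and their results are merged. The theoretical speedup of the fan-out phase equals the number of branches, assuming the branches take similar time. End-to-end speedup also depends on how long the serial merge step takes.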
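A minimal sketch of the fan-out with `concurrent.futures.ThreadPoolExecutor`, which the demo also used. The analyzers are hypothetical stubs that sleep to mimic LLM latency, and `merge` is a placeholder for the merge Skill. A failed branch is recorded rather than raised, so the merge can annotate the gap instead of failing the pipeline.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fan_out(
    branches: dict[str, Callable[[str], str]], data: str
) -> tuple[dict[str, str], dict[str, str]]:
    """Run every branch concurrently on the same input.

    Returns (results, errors); a failed branch lands in `errors`
    instead of failing the whole fan-out.
    """
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(fn, data) for name, fn in branches.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                errors[name] = repr(exc)
    return results, errors


def merge(results: dict[str, str], errors: dict[str, str]) -> str:
    """Hypothetical merge step: join what arrived, annotate what didn't."""
    parts = [f"{name}: {text}" for name, text in results.items()]
    parts += [f"{name}: [missing: {err}]" for name, err in errors.items()]
    return "\n".join(parts)


def slow_analyzer(label: str, seconds: float) -> Callable[[str], str]:
    """Hypothetical stub analyzer that sleeps to mimic LLM latency."""
    def run(company: str) -> str:
        time.sleep(seconds)
        return f"{label} analysis of {company}"
    return run


branches = {
    "product": slow_analyzer("product", 0.2),
    "market": slow_analyzer("market", 0.2),
    "tech": slow_analyzer("tech", 0.2),
}
start = time.perf_counter()
results, errors = fan_out(branches, "Notion")
fan_out_time = time.perf_counter() - start
print(merge(results, errors))
print(f"fan-out {fan_out_time:.2f}s vs ~0.60s if run sequentially")
```

Because the branches wait on I/O (here, `sleep`), threads are enough; the measured fan-out time tracks the slowest branch, not the sum.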
Pattern 3: Conditional Routing
A → Router → [technical] → tech-writer
             [marketing] → marketing-writer
             [default]   → general-writer
A router Skill outputs an enum value; the workflow branches based on the result. The router must return a specific enumerated type — not free text.
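A sketch of an enum-typed router, assuming the classifier is any callable that returns the model's raw reply. `Route`, `parse_route`, and the writer lambdas are illustrative names, not the demo's code. Anything outside the enum falls through to the default branch; a stricter design could re-ask the classifier instead.

```python
from enum import Enum
from typing import Callable


class Route(Enum):
    TECHNICAL = "technical"
    MARKETING = "marketing"
    GENERAL = "general"


def parse_route(raw: str) -> Route:
    """Map the classifier's raw reply onto the enum; anything else -> GENERAL."""
    label = raw.strip().strip('."\'').lower()
    try:
        return Route(label)
    except ValueError:
        return Route.GENERAL  # default branch for unclassified output


# Hypothetical writers; in the demo these are separate LLM-backed Skills.
WRITERS: dict[Route, Callable[[str], str]] = {
    Route.TECHNICAL: lambda req: f"[tech-writer] {req}",
    Route.MARKETING: lambda req: f"[marketing-writer] {req}",
    Route.GENERAL: lambda req: f"[general-writer] {req}",
}


def route_request(request: str, classify: Callable[[str], str]) -> str:
    route = parse_route(classify(request))
    return WRITERS[route](request)


print(route_request("Describe our new AI tool", lambda r: "Marketing."))
# prints: [marketing-writer] Describe our new AI tool
```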
Pattern 4: Feedback Loop
A → Evaluator → [score ≥ 7] → output
        ↓
   [score < 7] → feedback → A (retry, max 3)
Quality gate: if the output doesn't meet the threshold, rewrite it using the evaluator's feedback. Always set a max retry count to prevent infinite loops.
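A sketch of the quality gate, assuming `write` takes the previous round's feedback (or `None`) and `evaluate` returns a `(score, feedback)` pair; both are hypothetical callables standing in for LLM calls. When no round clears the threshold, it returns the best attempt flagged as not passed rather than raising.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    text: str
    score: int
    iterations: int
    passed: bool  # False: best attempt returned, caller should annotate quality


def feedback_loop(
    write: Callable[[Optional[str]], str],       # hypothetical writer call
    evaluate: Callable[[str], tuple[int, str]],  # hypothetical evaluator call
    threshold: int = 7,
    max_rounds: int = 3,
) -> LoopResult:
    assert max_rounds >= 1
    best_text, best_score = "", -1
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        text = write(feedback)  # first round gets no feedback
        score, feedback = evaluate(text)
        if score > best_score:
            best_text, best_score = text, score
        if score >= threshold:
            return LoopResult(text, score, round_no, passed=True)
    # Max rounds reached: return the best attempt, flagged, instead of an error.
    return LoopResult(best_text, best_score, max_rounds, passed=False)


scores = iter([5, 6, 8])
result = feedback_loop(
    write=lambda fb: f"draft (feedback: {fb})",
    evaluate=lambda text: (next(scores), "add a code example"),
)
print(result.iterations, result.score, result.passed)  # prints: 3 8 True
```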
Demo Design
All four patterns use real LLM calls:
| Pattern | Implementation | What's measured |
|---|---|---|
| Sequential | LangGraph 3-node graph: keywords → outline → write | End-to-end latency |
| Parallel fan-out | ThreadPoolExecutor × 3 → merge | Fan-out time, total time, speedup ratio |
| Conditional routing | LLM classifies input type → routes to 3 writers | Routing accuracy, output style comparison |
| Feedback loop | Write → score (1-10) → rewrite with feedback, max 3 rounds | Iteration count, score per round |
Run Results
Pattern 1: Sequential Chain
Topic: Python async/await: from coroutines to production-ready patterns
Keywords: async programming, coroutines, await, production-ready patterns
Outline: - Introduction to Async Programming in Python
- Understanding Coroutines and the `async` Keyword
- Implementing `await` ...
Article: ### Introduction to Async Programming in Python
Async programming in Python has revolutionized the way we handle I/O...
Time: 35.1s (3 sequential LLM calls)
Pattern 2: Parallel Fan-out
Company: Notion
Product: Notion stands out with its comprehensive suite...
Market: Notion's market positioning is as a versatile productivity platform...
Tech: Notion's technology stack is notable for its robust collaboration...
Merged: Notion's competitive edge lies in its versatile productivity suite...
Fan-out time: 12.4s | Total (incl. merge): 24.5s
Sequential equiv: ~37.2s | Speedup: ~1.5x
Pattern 3: Conditional Routing
Input: "Explain how Kubernetes pod scheduling works with a code example"
Route: technical (18.9s)
Output: Kubernetes pod scheduling is the process of assigning a pod to a node...
Input: "Write a compelling product description for our new AI writing tool"
Route: marketing (10.7s)
Output: Unleash Your Words with Unmatched Precision! Transform your writing...
Input: "What is machine learning and why does it matter?"
Route: technical (23.5s)
Output: Machine learning (ML) is a subset of artificial intelligence...
Pattern 4: Feedback Loop
Topic: Write a technical article about Redis Cluster sharding strategy
Iteration 1: score=8/10 ✓ PASS
feedback: The article provides a clear explanation of Redis Cluster
sharding, but could benefit from...
Final score: 8/10 | Iterations: 1/3 | Time: 44.8s
Three Findings
Finding 1: Parallel Speedup 1.5x, Not the Theoretical 3x
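Three analyzers ran concurrently. Fan-out phase: 12.4s. Sequential equivalent of the three analyzers: ~37.2s (12.4 × 3, an upper-bound estimate, since the fan-out time is set by the slowest branch). Fan-out phase alone: ~3x speedup. Total pipeline (fan-out + merge): 24.5s. Reported total speedup: 37.2 / 24.5 ≈ 1.5x. That ratio leaves the merge out of the sequential side; a fully sequential pipeline would also pay the 12.1s merge, so the like-for-like figure is (37.2 + 12.1) / 24.5 ≈ 2.0x. Either way, it falls well short of 3x.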
Amdahl's Law at work:
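Speedup = 1 / (serial fraction + parallel fraction / N)
(fractions are shares of the baseline run time; N = number of parallel workers)
This run, taking the parallel pipeline itself as the baseline:
Parallel portion (fan-out): 12.4s / 24.5s ≈ 51%
Serial portion (merge): 12.1s / 24.5s ≈ 49%
When 49% of the pipeline is serial, further parallelizing the fan-out
can make it at most ≈2x faster (24.5s / 12.1s),
regardless of how many concurrent branches you add.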
The optimization lever is the merge step, not more concurrent branches. Cut the merge prompt's token load, and the serial bottleneck shrinks — raising the effective speedup ceiling.
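The arithmetic, using this run's measured times. `amdahl_speedup` is a helper written for this example; Amdahl's fractions are shares of whichever run you treat as the baseline. The 3 × 12.4s sequential figure is an upper-bound estimate, since the fan-out time is set by the slowest branch.

```python
def amdahl_speedup(serial_fraction: float, n: float) -> float:
    """Amdahl's Law; fractions are shares of the *baseline* run time."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


fan_out_s = 12.4                  # measured fan-out wall time (s)
merge_s = 12.1                    # measured merge time (s)
total_s = fan_out_s + merge_s     # 24.5 s parallel pipeline
sequential_branches_s = 3 * fan_out_s  # ~37.2 s, upper-bound estimate

print(f"reported speedup: {sequential_branches_s / total_s:.2f}x")              # 1.52x
print(f"like-for-like:    {(sequential_branches_s + merge_s) / total_s:.2f}x")  # 2.01x

# Headroom left in the current pipeline: even an instant fan-out
# (n -> infinity) cannot beat the 12.1 s merge.
serial = merge_s / total_s
print(f"serial share: {serial:.0%}, ceiling: {amdahl_speedup(serial, float('inf')):.2f}x")
# serial share: 49%, ceiling: 2.02x
```

Halving the merge time in this model (12.1s → ~6s) would lift the ceiling to roughly 3x, which is why the merge prompt, not the branch count, is the lever.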
Finding 2: The Third Routing Result Was "Technical," Not "General"
"What is machine learning and why does it matter?" routed to the technical writer, not general.
The classification is reasonable but audience-dependent. For an engineering team, ML is a technical topic. For a product manager, it's general. The classifier saw only the question text, not who was asking.
Production fix: include audience information in the classifier input:
classifier_input = f"Request: {request}\nTarget audience: {workflow_input.audience}"
Without it, the router makes an implicit assumption about who's asking. That assumption is invisible and wrong for at least one audience type.
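The one-liner above, extended into a small helper. `WorkflowInput` is a hypothetical dataclass carrying the audience alongside the request; the label list and the closing instruction line are also illustrative, not the demo's actual prompt.

```python
from dataclasses import dataclass

ALLOWED_LABELS = ("technical", "marketing", "general")  # hypothetical label set


@dataclass
class WorkflowInput:
    request: str
    audience: str  # e.g. "engineering team", "product managers"


def build_classifier_input(workflow_input: WorkflowInput) -> str:
    """Give the router the audience as well as the request text."""
    return (
        f"Request: {workflow_input.request}\n"
        f"Target audience: {workflow_input.audience}\n"
        f"Answer with exactly one of: {', '.join(ALLOWED_LABELS)}"
    )


print(build_classifier_input(WorkflowInput(
    request="What is machine learning and why does it matter?",
    audience="product managers",
)))
```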
Finding 3: Feedback Loop Passed First Iteration — That's the Point
Score 8/10 on iteration 1, no retries needed. Time: 44.8s.
The gate let the first attempt through because 8/10 is good enough. That's the design working correctly — quality gates filter genuinely poor outputs, not every output.
Threshold calibration:
- 5/10 is too low: almost nothing triggers; the gate becomes decorative
- 9/10 is too high: almost everything triggers at least one retry; token cost roughly doubles or worse
- 7/10 is a useful starting point: blocks real quality gaps, allows a solid first draft through
Feedback quality determines retry effectiveness:
Effective: "Missing code examples for write-behind pattern; clarify TTL vs eviction policy"
Ineffective: "Make it better and more comprehensive"
The second feedback gives the writer nothing to act on. The loop runs, tokens are spent, and the output barely changes.
Error Handling
Chained workflows encounter four error types, each requiring a different response:
Transient (LLM timeout, rate limit)
→ Retry 3x with exponential backoff: 1s, 2s, 4s
Quality gate failure
→ Retry with feedback, max 3 rounds
→ After max retries: return best result + quality annotation
Fatal (permission denied, malformed input)
→ Abort immediately, surface clear error to user
Partial completion (one parallel branch failed)
→ Merge available results, annotate missing branch
→ Don't fail the whole pipeline for one missing piece
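A sketch of the transient/fatal split, assuming hypothetical `TransientError` and `FatalError` classes that you would map real SDK exceptions onto. The `sleep` parameter is injectable so the backoff can be tested without waiting; the default schedule reproduces 1s, 2s, 4s.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: timeouts, rate limits; worth retrying."""


class FatalError(Exception):
    """Hypothetical: permission denied, malformed input; abort immediately."""


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with exponential backoff (1s, 2s, 4s by default)."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except TransientError:  # FatalError and other exceptions propagate unretried
            if attempt == retries:
                raise  # out of retries: surface the last transient error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable when retries >= 0")


delays: list[float] = []
calls = iter([TransientError("429"), TransientError("timeout"), "ok"])


def flaky() -> str:
    outcome = next(calls)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(call_with_retry(flaky, sleep=delays.append), delays)  # prints: ok [1.0, 2.0]
```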
State Schema
In a chain, the upstream Skill's output is the downstream Skill's input. An inconsistent schema breaks the chain the first time an upstream output changes shape. A shared envelope keeps every step parseable:
{
"status": "success",
"output": {
"main_content": "...",
"metadata": { "word_count": 2500, "confidence": 0.92 }
},
"trace_id": "skill-abc-123"
}
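One way to enforce the envelope in Python, sketched with a hypothetical `SkillOutput` dataclass that mirrors the JSON above. Validating at the boundary means a missing field fails at the step that produced it, with a clear error, instead of several steps later.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SkillOutput:
    """Hypothetical envelope every Skill returns; mirrors the JSON shape."""
    status: str           # "success" or an error status
    main_content: str
    trace_id: str
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps({
            "status": self.status,
            "output": {"main_content": self.main_content, "metadata": self.metadata},
            "trace_id": self.trace_id,
        })

    @classmethod
    def from_json(cls, raw: str) -> "SkillOutput":
        """Parse and validate; a missing key fails loudly here."""
        data = json.loads(raw)
        try:
            output = data["output"]
            return cls(
                status=data["status"],
                main_content=output["main_content"],
                trace_id=data["trace_id"],
                metadata=output.get("metadata", {}),
            )
        except (KeyError, TypeError) as exc:
            raise ValueError(f"malformed Skill output: missing {exc}") from exc


raw = ('{"status": "success", "output": {"main_content": "...", '
       '"metadata": {"word_count": 2500, "confidence": 0.92}}, '
       '"trace_id": "skill-abc-123"}')
parsed = SkillOutput.from_json(raw)
print(parsed.metadata["word_count"])  # prints: 2500
```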
Context compression for long pipelines: when upstream output exceeds ~5000 tokens, downstream Skills rarely need the full content. Three options:
- Insert a summarizer Skill to compress before passing downstream
- Pass only the fields the downstream Skill needs (`output.metadata`, not `output.main_content`)
- Store large intermediate artifacts externally; downstream retrieves on demand
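A sketch combining the first two options: forward only the requested fields, and summarize `main_content` if it is requested but exceeds the budget. `estimate_tokens` uses a rough 4-characters-per-token heuristic (an assumption; use the model's tokenizer in practice), and `summarize` is a hypothetical stand-in for a summarizer Skill.

```python
from typing import Any, Callable

TOKEN_BUDGET = 5000  # threshold from the text; tune per model


def estimate_tokens(text: str) -> int:
    """Rough heuristic (~4 characters per token for English); not a real tokenizer."""
    return len(text) // 4


def prepare_downstream(
    output: dict[str, Any],
    needed_fields: list[str],
    summarize: Callable[[str], str],  # hypothetical summarizer Skill
) -> dict[str, Any]:
    """Forward only what the next Skill needs; compress main_content if too long."""
    payload = {name: output[name] for name in needed_fields if name in output}
    content = payload.get("main_content")
    if isinstance(content, str) and estimate_tokens(content) > TOKEN_BUDGET:
        payload["main_content"] = summarize(content)
    return payload


upstream = {"main_content": "word " * 30000, "metadata": {"word_count": 30000}}
print(prepare_downstream(upstream, ["metadata"], summarize=lambda t: t[:200]))
# prints: {'metadata': {'word_count': 30000}}
```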
Design Checklist
Sequential chain
- [ ] Each step's output format is defined (downstream can parse it)
- [ ] Critical steps have fallback logic (not just abort-on-failure)
Parallel fan-out
- [ ] Merge Skill handles partial branch failure (annotate, don't fail)
- [ ] Measure merge step latency before deciding how many branches to add
Conditional routing
- [ ] Router outputs an enumerated type, not free text
- [ ] Default branch covers unclassified inputs
- [ ] Routing input includes audience or context metadata
Feedback loop
- [ ] Max retry count is set (3 is a reasonable default)
- [ ] Feedback targets specific issues, not general "be better"
- [ ] After max retries: return best result + annotation, not an error
Summary
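- Parallel speedup was 1.5x, not 3x: the fan-out phase ran 3x faster, but the merge step took about as long as the fan-out, so the serial merge, not the branch count, bounds the total (counting the merge on both sides, the like-for-like figure is ≈2.0x); the fix is a lighter merge step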
- Conditional routing needs audience context: topic alone is ambiguous; the same question should route differently for technical vs general audiences
- Feedback loop efficiency depends on threshold design: a first-attempt pass is consistent with a well-calibrated threshold; the gate's job is filtering real quality gaps, not forcing retries
References
- LangGraph StateGraph documentation
- Full demo code: skill-05-workflow