A user reports the dashboard is slow. The spec says "improve dashboard load time." Twenty minutes later your agent has built a caching layer, a new read-optimized table, and a background job system. Tests pass, PR looks good. Nobody asked if it's just one unindexed query.
You're sitting alone with your agent
You're sitting there with Cursor or whatever, building fast, and there's nobody pushing back. No PM going "wait, that's not what the customer asked for." No teammate saying "didn't we agree on something different last week?" It's just you and the agent, and the agent does what you tell it to. The specs live in Jira or someone's head. You're not checking the spec while you code. Features drift. Nobody notices until it ships.
Code review doesn't catch this either. A reviewer sees a diff, maybe a PR description. By the time code is in a PR you've already kind of agreed the thing should exist, so nobody asks "are we building the right thing?"
Specs next to your code
Keeping specs close to the code isn't a new idea. Executable specs and living documentation have been around forever. Spec-driven development is getting more attention now with AI agents. Whether you vibe code or work more structured, specs first or code first, it doesn't matter that much. The point is to have something that says what you're building and why.
I built a simple version of this. A few markdown files per feature in a specs/ folder: who it's for, why it exists, what counts as done. Your agent maintains them as part of normal coding. It reads the spec before making changes and keeps the spec in sync when you change direction. Works in Cursor, Claude Code, Windsurf, Codex.
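For concreteness, a spec file might look something like this. The filename, headings, and numbers are illustrative, not a required format:

```markdown
<!-- specs/dashboard-load-time.md (illustrative example) -->
# Dashboard load time

## Who it's for
Account managers who open the dashboard dozens of times a day.

## Why it exists
Users report the dashboard feels slow. We committed to making it fast again.

## What counts as done
The reported page loads in under 2 seconds at p95, and the fix targets
the measured bottleneck, not an assumed one.
```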
npx skills add bjornno/skills --skill storyline
Getting people to confront the hard questions
But specs nobody challenges are just documentation. What I actually wanted to solve is how to get people to confront the hard questions before you've shipped.
The skill has a review mode. Say "start review" and the agent reads your specs and your diff. It finds the most important risk and confronts you with it: Yes, No, or Unsure. Then it walks you through the discussion.
This works best when it's not just you. Get the team around a screen, pull in the PM, designer, whoever. Start the review, share your screen, and let it drive. The agent asks, the team decides.
Here's how it could look:
Premature implementation
The spec says "improve load time" but there's no evidence the root cause was identified. A caching layer, a new table, and a job system were built before anyone checked if it's just one unindexed query.
Are you comfortable investing in this architecture without profiling the actual bottleneck first?
Yes / No / Unsure
That question hits different when the senior dev is in the room going "wait, did anyone run EXPLAIN on that query?" Or when the PM says "the user only complained about one page, not the whole dashboard." It keeps going from there, with follow-up questions based on what you answered, until there's an actual decision. And it's on record in specs/reviews/ instead of a Slack thread nobody will find again.
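A record like this ends up in specs/reviews/. This is a sketch; the exact shape isn't the point:

```markdown
<!-- specs/reviews/2025-01-dashboard-load-time.md (illustrative) -->
# Review: dashboard load time

Risk: premature implementation. Caching layer, new table, and job system
built before anyone profiled the slow page.
Question: comfortable investing in this architecture without profiling first?
Answer: No.
Decision: run EXPLAIN on the dashboard queries first, re-scope the spec to
the one slow page, park the caching work on a branch until we know.
```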
Hosted reviews
Sometimes you can't get everyone around a screen. Different timezones, or you just want a review on every PR automatically. I built a hosted version for that. Same review logic but as a shared platform. Hook it up as a GitHub Action, everyone gets a join code, done. Closed beta for now, since every review costs me AI tokens, but I'll try to give access to everyone who requests it.
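The wiring is roughly this. The action name and input below are placeholders for illustration, not the exact API:

```yaml
# .github/workflows/storyline-review.yml
# Placeholder names: the action and its inputs here are illustrative.
name: Storyline review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bjornno/storyline-review-action@v1  # placeholder name
        with:
          join-code: ${{ secrets.STORYLINE_JOIN_CODE }}
```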
Tell me what's missing
If you know of something that does this better, I want to hear about it. I've looked around and haven't found this combination of "agent maintains specs, then uses them to challenge decisions," but I might have missed it. Try it and tell me if it actually helps or if it's just more AI-wrapping-AI noise. The point is getting people to think and talk, not to use more tokens.
Install: npx skills add bjornno/skills --skill storyline | GitHub
Hosted Reviews: npx storyline-review create | storyline-review.vercel.app