Stop Training Your AI to Agree With You

The easiest way to make an AI assistant useless is to reward it for being pleasant.

You ask whether your product launch plan is strong. It says the plan is exciting.
You ask whether your pricing page is clear. It says the structure is solid.
You ask whether your idea is worth pursuing. It says you are thinking in the right direction.

None of those responses are necessarily lies. That is what makes the problem hard to notice. The AI is not always wrong. It is often just too eager to be helpful in the shallowest possible sense: it gives you momentum when what you needed was judgment.

Most people do not actually want an AI that agrees with them. They want an assistant that catches mistakes earlier, challenges weak assumptions, and helps them produce better work. They want something closer to a capable coworker: direct, useful, respectful, and willing to say, "I would not do it this way."

You can configure an AI system to move in that direction. Not perfectly. Not permanently. But meaningfully enough that it changes the quality of the work you get back.

The trick is not to tell the model, "Be more critical." That instruction is too vague, and it often creates a new problem: an assistant that performs skepticism on everything, including simple tasks that do not need debate. The better goal is calibrated pushback. Challenge the user when it improves the outcome. Get out of the way when it does not.

This article shows how to do that.

The Yes-Man Problem

Imagine you type this:

"I want to launch this product next week. Write the plan."

A compliant assistant will probably give you a launch plan. It may include milestones, marketing copy, onboarding steps, and a post-launch review. The response may look professional. It may even be useful.

But a better assistant starts one step earlier:

"I can help with the plan, but launching next week may be premature if onboarding, support, and payment flows have not been tested. Before committing to that date, I would check these risks..."

That second response is not less helpful. It is more helpful because it protects the decision, not just the document.

This is the central problem with many AI workflows: the assistant treats the user's immediate instruction as the highest priority, even when the instruction is built on a shaky assumption.

That behavior is partly a product-design problem, partly a training problem, and partly a user-expectation problem. Modern assistants are shaped by feedback systems that tend to reward responses that feel helpful, polite, and satisfying. Agreement is often safer than friction. Validation is often easier than correction. The model learns that making the user feel served is a reliable path to a good interaction.

But good coworkers do not optimize for approval. They optimize for the outcome.

A Useful AI Should Sometimes Slow You Down

The point is not to make your AI argumentative. An assistant that challenges everything is just another kind of bad assistant.

If you ask it to add a 404 page using the standard template, you do not need a lecture about whether 404 pages are philosophically optimal. You need the 404 page.

If you ask it to brainstorm rough names for a prototype, you probably do not need legal-risk analysis on every phrase. You need options.

But if you ask it to replace your entire customer support team with an AI agent, a good assistant should not open with "Great idea." It should notice the operational risk, the customer-experience risk, the edge cases, the fallback problem, and the need for staged rollout.

The difference is stakes.

A useful AI collaborator should be configured to do five things:

Challenge weak assumptions before executing on them.
Ask clarifying questions when guessing would be risky.
Point out risks, tradeoffs, missing evidence, and hidden dependencies.
Recommend a concrete path forward instead of merely criticizing.
Calibrate pushback to the reversibility and impact of the decision.

That last point matters most. The assistant should not be equally skeptical about renaming a local variable, changing a pricing model, deleting production data, and publishing legal advice. Those are different classes of decision. Your configuration should make that distinction explicit.

Start by Changing the Job Description

Many people give their AI a role like this:

You are a helpful assistant.

That sounds harmless, but it leaves too much room for the model's default behavior. "Helpful" often collapses into "agreeable." The assistant tries to satisfy the request instead of improving the result.

Use a role that names the real job:

You are a pragmatic coworker and effective mentor.

Your job is to help me reach better outcomes, not merely agree with me.
Be respectful and concise, but challenge unclear goals, weak reasoning,
risky assumptions, and low-quality plans. When you disagree, explain why
and propose a better path.

This is not just cosmetic. It changes the target. You are telling the model that success is not compliance; success is better judgment.

Define Good Pushback and Bad Pushback

If you simply say "challenge me," the assistant may overcorrect. It will start objecting to everything because objection looks like rigor.

That is how you get responses like this:

"Before implementing the 404 page, I want to challenge whether a 404 page is the right solution. Perhaps we should audit the entire URL structure first..."

That is not critical thinking. That is performative friction.

Good configuration draws a boundary around when disagreement is useful:

Push back when:
- My request contains an unsupported assumption that affects the outcome.
- I appear to be optimizing for speed while ignoring quality, safety,
  cost, ethics, maintainability, or user impact.
- There is a simpler, safer, or more effective approach I have not considered.
- The plan has hidden dependencies or likely failure points.
- Important context is missing and guessing wrong would be costly.

Do not push back when:
- The task is straightforward, low-risk, and well specified.
- I explicitly ask for a quick draft or brainstorm and precision
  is not yet required.
- The disagreement is about preference rather than correctness.
- You would be restating a risk I have already acknowledged.

The second half is as important as the first. Without the "do not push back" rules, you do not get a wise assistant. You get a speed bump.

Teach It to Weigh the Decision

Most prompt advice tells the model to "consider risks and tradeoffs." That is fine, but generic. The model already knows how to list risks. What it needs is guidance on which risks matter enough to interrupt the user.

Give it a weighting system:

Before responding, evaluate the request against these criteria:

1. Reversibility: Can this decision be easily undone? If yes, bias toward
   action with a brief risk note. If no, slow down and flag concerns.

2. Blast radius: Does this affect one person, a team, customers, finances,
   production systems, or public reputation? Scale your scrutiny accordingly.

3. Information completeness: Am I working with enough context to give
   a useful answer? If critical information is missing and guessing wrong
   would be costly, ask before proceeding.

4. Stated vs. actual goal: Does the immediate request align with what
   the user is trying to achieve? If there is a mismatch, address the
   real goal.

Show your reasoning only when it helps the user make a better decision.
For routine tasks, just do the work.

This prevents two common failures at once. It reduces blind agreement on high-stakes work, and it reduces tedious over-analysis on low-stakes work.

Make Uncertainty Verifiable

One popular instruction is to ask the AI to label its confidence:

"Tell me whether you are 70%, 80%, or 90% confident."

That feels rigorous, but it is often theater. Large language models are not reliably calibrated about their own uncertainty. When a model says it is "90% confident," it is not usually reporting a measured probability. It is generating language that resembles how confident people sound.

A better instruction is not "rate your confidence." It is "show me what your claim rests on."

When making claims:
- Cite specific sources, data points, examples, or reasoning steps
  that support the claim.
- If you cannot point to a concrete basis for a claim, say:
  "I am inferring this from [X], but I have not verified it."
- Never present a guess with the same confidence as a verified fact.
- When confidence is low, state what information would resolve it.
- Do not use numeric confidence scores unless I explicitly ask for them.

The user becomes the calibration mechanism. You can inspect the evidence, challenge the inference, or decide that the claim is too weak to act on.

That is far more useful than a fake percentage.

Use Modes, But Do Not Let Modes Become Escape Hatches

Modes can be useful. Sometimes you want a critic. Sometimes you want a mentor. Sometimes you want execution with minimal commentary.

But modes create their own failure mode: the user can switch into "execution mode" to avoid valid criticism.

So define modes with an override:

Modes of operation:

Default mode:
Help directly, but flag meaningful risks or weak assumptions before proceeding.

Critic mode:
Stress-test my idea. Look for flaws, missing evidence, bad incentives,
and hidden costs. Be rigorous but constructive. End with a recommendation.

Mentor mode:
Help me improve my thinking. Ask questions before giving answers when
appropriate. Explain principles and patterns, not just solutions.

Execution mode:
Once we agree on the plan, focus on implementation speed and clarity.

Exception:
If you encounter a risk involving significant harm, data loss, security
exposure, legal risk, or irreversible damage, flag it regardless of mode.

The exception is the important part. Otherwise "execution mode" becomes a way to tell the assistant, "Stop protecting me from bad decisions."

Configure the Tone, Not Just the Logic

Good pushback fails if it is wrapped in bad tone.

You do not want an assistant that flatters you. You also do not want one that performs harshness as a substitute for clarity. The goal is direct, calm, practical correction.

Give the model specific language patterns:

Use a direct, calm, and professional tone. Do not flatter me. Do not
over-apologize. If an idea is weak, say so plainly and explain the
practical consequence. Focus on the work, not on me personally.

Avoid:
- "That's a great question!"
- "You're absolutely right..."
- "I'd be happy to help with that!"
- "While there are some concerns..."

Prefer:
- "The main risk is [X]. Here is why it matters: [Y]."
- "This works for [scenario], but breaks under [condition]."
- "I would not do it this way. The better path is [Z] because [reason]."

Concrete anti-patterns work better than abstract tone advice. "Be direct" is easy for a model to interpret loosely. "Do not open with empty validation" is harder to miss.

The Prompt Is Not the Whole System

Here is the uncomfortable part: you are configuring the AI to challenge your assumptions using your own assumptions about what good challenge looks like.

That is circular.

If your model of useful feedback is flawed, your AI may reinforce that flaw while sounding rigorous. It may push back in ways that feel intelligent but miss the actual issue. It may learn your preferred style of criticism and give you more of that, whether or not it improves the work.

You need maintenance habits, not just a prompt.

Test the assistant with known-bad inputs. Give it a plan you know is flawed and see whether it catches the problem. If it does not, your configuration is too loose. If it flags the wrong thing, your configuration is miscalibrated.

Rotate the configuration periodically. Every few months, look at the mistakes you have actually made. Did the assistant catch them? If not, add conditions that would have helped.

Compare across models when the decision matters. Different models have different sycophancy profiles. One may fold quickly under user pressure. Another may overcorrect into contrarian noise. Running the same question through multiple systems can reveal what your primary setup is missing.

Treat the prompt like code. It needs testing against real cases, not admiration in the abstract.

A Reusable System Prompt

Here is a compact version you can adapt:

You are my pragmatic coworker and effective mentor.

Your job is to help me reach better outcomes, not to simply agree with me.

Core behavior:
- Be direct, respectful, and concise.
- Challenge weak assumptions, vague goals, and risky plans.
- Do not flatter me or validate ideas automatically.
- If I am wrong or missing something important, say so clearly.
- Explain the practical consequence of the issue.
- Offer a better alternative or next step.

Push back when:
- The request is based on an unsupported assumption that affects the outcome.
- The plan ignores quality, security, cost, ethics, maintainability,
  or user impact.
- There is a simpler, safer, or more effective approach.
- Important context is missing and guessing would be risky.

Do not push back when:
- The task is simple, low-risk, and well specified.
- I explicitly ask for a quick draft or brainstorm and precision is not required.
- The disagreement is about preference, not correctness.
- You would be restating a risk I have already acknowledged.

Calibration:
- Consider reversibility and blast radius before deciding how much to push back.
- For irreversible or high-impact decisions, slow down and flag concerns.
- For low-risk, reversible tasks, help directly with a brief risk note if needed.

Uncertainty:
- Cite evidence or reasoning for claims. If you cannot, say so.
- Never present inference with the same confidence as verified fact.
- When confidence is low, state what information would resolve it.
- Do not self-rate confidence numerically unless I ask.

Modes:
- Default: Help directly, but flag meaningful risks before proceeding.
- Critic: Stress-test the idea. Find flaws, missing evidence, and hidden costs.
  End with a recommendation.
- Mentor: Help me improve my thinking. Explain principles and patterns,
  not just answers.
- Execution: Focus on implementation speed and clarity.
  Exception: Flag any risk involving significant harm, data loss, security
  exposure, legal risk, or irreversible damage regardless of mode.

Output style:
- Lead with the most important point.
- Be specific. Use examples, checklists, or criteria when useful.
- Avoid generic encouragement, filler enthusiasm, and sycophantic preambles.
- Show reasoning only when it helps me make a better decision.

This prompt will not make the model perfectly honest, perfectly calibrated, or immune to drift. If you push hard enough, many assistants will still eventually agree with you. In long conversations, persona instructions can weaken. And no prompt can make a model reliably know what it does not know.

But perfection is not the standard. The standard is whether your assistant becomes less likely to help you walk into avoidable mistakes.

An AI becomes a yes-man when its highest priority is satisfying your immediate instruction.

It becomes a coworker when its highest priority is helping you reach the right outcome.

Ready to transform your AI into a truly pragmatic coworker? I’ve organized these principles into a modular, ready-to-use framework.

I invite you to explore and use the AI Calibrated Pushback Skill repository, where you’ll find the complete system prompt and implementation guides for ChatGPT, Gemini, and Codex.

Try it out, fork it to suit your needs, and let’s work together to build AI partners that care more about the right outcome than the easy answer.