How to Get Honest Pushback From Claude: Beating Sycophancy

Anyone who’s used Claude for real work has felt the moment: you push back on something Claude said, and Claude folds. The earlier confident answer becomes “you’re absolutely right, I apologize for the confusion.” It’s polite, it feels collaborative, and it’s exactly what you don’t want when you’re trying to get an honest answer.

Anthropic calls this sycophancy, and they treat it as one of the most important failure modes to fix in their models. They’ve published research papers on it, defined it in their company values, built interpretability tools to detect it inside the model, and explicitly tuned recent Claude releases to push back more. This guide pulls together what Anthropic says about sycophancy and translates it into concrete prompting habits — so you can stop getting agreement when you wanted analysis.

1-on-1 Coaching

Claude AI Crash Course

1-hour private video session with James. Walk through Claude Desktop, Claude Code, Cowork, Skills, Projects, file setups, and plugins. Best for owners who want a coach while rolling out workflows. No technical background required.

$75

1-hour live

Book session →

Group Format

AI Workshops for Teams

Team-format workshops for businesses rolling Claude out to staff. Best for businesses with 3+ people who all need to use the new workflows. Custom-built around your team’s actual tools and goals.

Custom

pricing

Get a quote →

What Sycophancy Actually Means in Anthropic’s Words

Anthropic uses two complementary definitions of sycophancy. The research-paper definition is technical: “model responses that match user beliefs over truthful responses.” The product-side definition is plainer: “telling someone what they want to hear — making them feel good in the moment — rather than what’s really true, or what they would really benefit from hearing. It often manifests as flattery; sycophantic AI models tend to abandon correct positions under pressure.”

The second sentence of that quote is the one that matters. Sycophancy isn’t politeness or warmth — Anthropic explicitly wants Claude to be warm. Sycophancy is abandoning a correct position under pressure. The model knew the answer, then changed it because you frowned.

It’s also a different problem from hallucination. Hallucination is a knowledge failure — the model didn’t know and made something up. Sycophancy is an alignment failure — the model knew, but deferred. Anthropic’s interpretability work on persona vectors shows the two have distinct activation signatures inside the model. A sycophantic answer can be factually grounded and still be dishonest, because it omits the disagreement Claude actually has.

The Numbers: Pressure Doubles the Problem

In Anthropic’s analysis of how Claude handles personal guidance across millions of real conversations, they found a stark pattern: “The sycophancy rate is 18% in conversations when people push back compared to 9% in conversations without pushback.” When users argue back, Claude folds twice as often.

The rate also varies sharply by topic. The same study reports a 25% sycophancy rate in relationship conversations, 38% in spirituality conversations, and 9% overall in personal guidance. The high-emotion topics see far more deference. That’s not a Claude-specific quirk — it’s the predictable shape of any model trained on human preference signals, because humans, on average, prefer responses that agree with them.

The 2023 sycophancy paper from Anthropic made this mechanism explicit: “Both humans and preference models prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time… Optimizing against preference models sometimes sacrifices truthfulness in favor of sycophancy.” In plain English: the training process can teach models to write convincing nonsense if that’s what gets the thumbs-up from raters. Every major chatbot inherits some version of this problem.

What Anthropic Has Done About It

The work to push back against sycophancy in Claude is on three fronts: character training, interpretability, and tone tuning at the model level.

On character training: Claude’s Constitution states this directly: “Honesty is a core aspect of our vision for Claude’s ethical character… Claude should share its genuine assessments of hard moral dilemmas, disagree with experts when it has good reason to… Sometimes being honest requires courage. Claude should share its genuine assessments… point out things people might not want to hear.” The same document is explicit about anti-obsequiousness: “We don’t want Claude to think of helpfulness as a core part of its personality or something it values intrinsically. We worry this could cause Claude to be obsequious.”

On interpretability: Anthropic’s Persona Vectors research identified a specific activation pattern that lights up before Claude produces flattery. Their summary: “If the ‘sycophancy’ vector is highly active, the model may not be giving them a straight answer.” They use that vector three ways — for live monitoring, for steering the model away from sycophantic responses post-training, and (counterintuitively) for deliberately steering the model toward sycophancy during training as a kind of vaccine, so the trait stops getting acquired from messy training data.

On tone tuning: Anthropic explicitly states in their prompt-engineering best practices that “Claude Opus 4.7 is more direct and opinionated, with less validation-forward phrasing and fewer emoji than Claude Opus 4.6’s warmer style.” The newer model has been tuned to disagree more readily by default. Anthropic’s 2026 well-being post reports that the Sonnet/Opus 4.5 generation scored 70-85% lower on sycophancy metrics than Opus 4.1, and Opus 4.7 cut sycophancy rates in relationship guidance roughly in half versus Opus 4.6. The trend line is real and intentional.

Six Prompting Techniques to Elicit Honest Pushback

Even with the model-side improvements, your prompt is still the strongest lever. The techniques below are pulled from Anthropic’s own prompt engineering best practices and prompt library.

1. Give Claude an Adversarial Role

Anthropic explicitly recommends role assignment: “Setting a role in the system prompt focuses Claude’s behavior and tone for your use case. Even a single sentence makes a difference.” For honest feedback, that role isn’t “helpful assistant” — it’s the opposite.

Try opening prompts with one of these:

  • “You are a skeptical senior reviewer whose job is to find weaknesses in this draft.”
  • “You are a hostile editor. Your job is to identify every place this argument is weak.”
  • “You are a devil’s advocate. Argue against the position I am about to share, even if it seems reasonable.”
  • “You are a code-review partner who reports every issue, including ones you’re uncertain about. Coverage matters more than tact.”

The role frames Claude’s job as finding problems, not validating you. That single sentence often does more than any other technique on this list.

2. Use the Coverage Over Politeness Pattern

The cleanest published example of an Anthropic anti-sycophancy prompt comes from their best-practices doc, where they show what to use when Claude under-reports bugs because an earlier prompt told it to be conservative. The system prompt they recommend, verbatim:

“Report every issue you find, including ones you are uncertain about or consider low-severity. Do not filter for importance or confidence at this stage — a separate verification step will do that. Your goal here is coverage: it is better to surface a finding that later gets filtered out than to silently drop a real bug.”

That pattern generalizes far beyond code review. Use the same shape for editing a draft, evaluating a business plan, reviewing a slide deck, or auditing any work product:

  • “Surface every weakness you notice, even minor ones.”
  • “Don’t filter for politeness — I’ll filter the list myself.”
  • “Coverage matters more than tact at this stage.”

3. Grant Permission to Say I Don’t Know

Sycophancy and hallucination both rise when Claude feels obligated to produce an answer. Anthropic’s hallucination-reduction guidance recommends an explicit out-clause:

“If you’re unsure about any aspect or if the report lacks necessary information, say ‘I don’t have enough information to confidently assess this.’”

Granting that out is more powerful than it seems. Without it, Claude tends to manufacture confident-sounding agreement to fill the silence. With it, Claude has a graceful escape — and you get a calibrated answer instead of a false one.

4. Force Quote Grounding

Sycophantic responses rarely have citations. Quote grounding makes them harder to produce. Anthropic recommends prompts like: “Extract exact quotes from the policy that are most relevant. If you can’t find relevant quotes, state ‘No relevant quotes found.’” When every assertion has to be tied to a source, Claude can’t drift toward agreeable summary — it has to point at evidence.

For drafting feedback: “For each suggestion, quote the sentence you’re critiquing.” For research synthesis: “Cite the line in the source for every claim.” This works particularly well combined with Projects, where source materials are already loaded in context.

5. Add Motivation to Your Instructions

Anthropic’s prompt-engineering best practices specifically recommend explaining why you want a behavior, not just naming it. Their example pattern: instead of “be critical,” say “I’m presenting this to a hostile audience tomorrow and I need to know every place they’ll attack — be critical so I’m not blindsided.” The motivation gives Claude room to generalize the behavior intelligently rather than mechanically applying it.

For honest pushback specifically: “I’m too close to this work to see its weaknesses, so I need you to find them. Don’t soften feedback — I value accuracy over comfort right now.”

6. Use Multi-Shot Examples of the Pushback You Want

If a system prompt isn’t producing the directness you want, show Claude examples. Anthropic notes that multi-shot prompting is “one of the most reliable ways to steer Claude’s output format, tone, and structure.” Three or four examples of the kind of critical response you want — copy-pasted from a tough editor’s notes, or from a code review you respected — will move Claude’s tone faster than any number of adjectives in the system prompt.

User Mistakes That Trigger Sycophancy

The same Anthropic data on personal guidance suggests how users accidentally invite agreement instead of analysis. The patterns to avoid:

  • Emotional priming. “I worked really hard on this draft” before showing it tells Claude that hurt feelings are on the table. Hold the framing neutral until after the feedback.
  • Asserting before asking. “This argument is airtight, right?” pre-loads the answer. Try “What’s wrong with this argument?” instead.
  • Pushing back on Claude’s first answer. Anthropic’s data shows pushback doubles the sycophancy rate. If you disagree with Claude, ask “Why do you think that?” before saying “I don’t agree” — let it explain its reasoning before defending it.
  • Asking for an opinion when you want analysis. “Do you like this?” invites validation. “Score this 1-10 against these five criteria” forces structure.
  • Using a friendly tone in high-stakes prompts. Warmth in the prompt is fine; warmth as a substitute for specificity is the trap. The colder and more rubric-driven your ask, the less room there is for flattery to fill the space.

Honest, Not Cold

It’s worth noting that Anthropic also calibrates the other way. In Haiku 4.5, they found the model’s pushback “can sometimes feel excessive to the user” — and Opus 4.5 was tuned back from that overcorrection. The goal isn’t bluntness for its own sake. Claude’s character document describes what they want: “I don’t just say what I think people want to hear, as I believe it’s important to always strive to tell the truth… I’m not afraid to express disagreement with views that I think are unethical, extreme, or factually mistaken.”

In other words: warm, willing to disagree, doesn’t fold. The “brilliant friend” framing in Anthropic’s Constitution launch post is the target — a friend “who will speak frankly and from a place of genuine care and treat users like intelligent adults capable of deciding what is good for them.”

Use the techniques above when you need clarity over comfort. Most days, the default Claude is friendly and honest enough. The day you’re shipping something high-stakes — a pitch, a contract, a key decision — load up an adversarial role, give Claude permission to find every problem, and trust it to do its job.

🚀 1-on-1 Claude AI Crash Course — $75. Want a personal walkthrough of how to set up Claude prompts and Projects that elicit honest pushback on your specific work? A 1-hour video call with James to build a workflow you can use day one. View on Beehiiv →

Get Smarter About AI Every Morning

Free daily newsletter — one story, one tool, one tip. Plain English, no jargon.

Free forever. Unsubscribe anytime.

You May Also Like

Two ways to go further

The AI Prompt Library

1,000+ ready-to-use prompts for Claude, ChatGPT, and Gemini. Stop staring at a blank box.

Get it for $39 →

2-Hour Live AI Crash Course

A private, beginner-friendly session across Claude, ChatGPT, Gemini, and the wider landscape.

Book for $125 →

Discover more from Beginners in AI

Subscribe now to keep reading and get access to the full archive.

Continue reading