0 min read

Teach Claude.ai Critical Thinking

General Users

Claude Skill

TL;DR:

AI models are trained to agree with you, and a peer-reviewed study in Science put hard numbers on how often that goes wrong. The FORCE framework by Enovair is a five-step system that overrides this default. I turned it into a Claude skill so you don't have to set it up manually every time.

A study published in Science, led by Stanford researchers and testing 11 AI models, found that AI agrees with user decisions 49% more than a human adviser would. When tested specifically on scenarios where the user was clearly in the wrong, AI still validated them 51% of the time. This isn't a model-specific quirk. It's a structural outcome of how these models are trained: agreement drives engagement, and engagement is what gets rewarded.

The practical consequence shows up in high-stakes decisions. You put together a restaurant expansion plan. You ask Claude or Gemini what they think. You get back a response like this:

"The proposal is a well-supported plan that leverages the strong momentum of its original location."

That response sounds useful. It isn't. The same proposal, evaluated without the agreeable default, surfaces things like: the renovation budget is a rough estimate not a formal bid; the working capital reserve will be exhausted before the location reaches profitability; and the entire operational plan depends on a chef who hasn't committed yet. The AI knew all of this. It just didn't lead with it.

This behavior is called sycophancy. If you use AI for anything that carries real consequences, it is worth understanding and working around.

The FORCE Framework

Enovair developed a five-step framework specifically to address this. The full explanation and prompts are available in their video and free guide. Here is a summary of what each step does:

F - Fresh Session.
The longer a conversation runs, the more the model calibrates to your preferences and communication style. Starting a new session with memory disabled removes that accumulated bias before it influences the evaluation.

O - Outside View.
First-person framing ("here's my strategy, what do you think?") triggers ego-protective responses. Reframing the same content as a colleague's proposal ("a colleague is proposing the following, what's wrong with this thinking?") removes personal attachment from the AI's calculation.

R - Rules.
Setting a specific behavioral standard at the start of the conversation - not a vague request for honesty, but a precise instruction with a concrete example of what direct criticism looks like - gives the model something it can actually follow consistently.

C - Cast the Critic.
Assigning a persona with professional stakes on the downside ("a CFO whose bonus depends on this not failing") produces more useful friction than a generic "be critical" request. The persona needs a reason to find what's wrong.

E - Expose the Weakness.
After getting the critique, a follow-up prompt that raises the stakes ("I'm about to make a decision based on what you just told me - what should I not trust without verifying, and what did you leave out?") surfaces the layer of uncertainty the model held back in the initial response.

F and O are actions and framing choices. R, C, and E are the three prompts you write. Together, the five steps target the biggest drivers of AI agreeableness that the research identified.

The Skill

Running the FORCE framework manually means setting up the right framing, writing three prompts in the correct order, choosing the right persona for the task type, and maintaining that framing throughout the conversation. That is friction, and friction means it doesn't get used consistently.

I built this Claude skill to remove that friction. When you invoke it, it runs the full framework automatically: applies the behavioral rules, assigns a critic persona matched to the task type (code review, business strategy, financial decision, hiring call, and others), evaluates the content, and closes with the verification step. One invocation replaces the manual setup.

The skill has been tested across multiple task types and Claude model versions.

When and How To Use It

The skill is designed for situations where an agreeable response is actively unhelpful: evaluating a plan before committing resources, reviewing a document before it goes out, stress-testing an assumption before acting on it. It is not the right tool for exploratory thinking, creative work in progress, or conversations where you want discussion rather than judgment.

For best results, invoke it at the start of a new conversation.

Trigger phrases:

use critical thinking

apply critical thinking

evaluate this with critical thinking

Then paste the thing you want evaluated. The skill identifies the task type, selects the appropriate critic persona, and proceeds.

Phrases like "review this," "be harsh," or "don't hold back" do not activate the skill. They are treated as normal requests.

A Note on What This Skill Can and Cannot Do

This skill is a structured prompt. It instructs Claude to evaluate without softening criticism, but it does not guarantee accuracy. Claude can still miss things, misread context, or produce confident-sounding findings that turn out to be wrong. The output is a starting point for your own judgment, not a substitute for it. Treat any finding that would drive a real decision as something to verify independently before acting on it.

This skill is a structured prompt. It instructs Claude to evaluate without softening criticism, but it does not guarantee accuracy. Claude can still miss things, misread context, or produce confident-sounding findings that turn out to be wrong. The output is a starting point for your own judgment, not a substitute for it.
Treat any finding that would drive a real decision as something to verify independently before acting on it.

Get the Skill

The skill is available free on GitHub, along with a worked example showing the full output against a real evaluation scenario.

Download from GitHub

Let's Talk