Claude Extended Thinking: How It Works and When to Use It (2026)

There is a specific frustration with AI assistants that anyone who has used them seriously has encountered. You ask a complex analytical question. The answer comes back in seconds — well-formatted, confident, plausible. And it is wrong. Not obviously wrong, but subtly wrong in a way you only catch if you think carefully about the reasoning behind it.

The problem is not that the AI lacks the knowledge to answer correctly. Often it has all the relevant information. The problem is that it went from question to answer without adequate reasoning time. It produced the most plausible-looking response rather than the most correct one.

Extended Thinking is Claude’s solution to this problem. It gives Claude explicit time and cognitive space to reason through a problem before producing a final answer — showing you the reasoning process as it happens. The result is a qualitatively different kind of AI output: slower, more expensive, and genuinely more reliable for the tasks that need it most.

This guide covers everything: what Extended Thinking actually is, how it works mechanically, when to use it, how to enable it in both claude.ai and the API, the specific task types where it produces meaningful improvements, and honest discussion of where it does not help and what it costs.

🔗 This is Post #3 in the Claude Unlocked series. Extended Thinking is most powerful when combined with the right model — primarily Claude Sonnet 4.5 and Opus 4.5. See Claude’s Model Family Explained for model selection context. For a foundation, start with Claude AI Masterclass.

What Is Extended Thinking? Plain English Explanation

When Claude operates in standard mode, it processes your prompt and generates a response directly. The response is produced through the model’s normal inference process — drawing on training, context, and pattern matching to produce the most likely high-quality answer.

Extended Thinking changes this process fundamentally. When you enable it, Claude first generates a thinking block — an internal scratchpad where it reasons through the problem step by step before writing its final answer. You can see this thinking process displayed (in a collapsible section) before the final response.

Think of the difference like this:

Standard mode: A student reads the exam question and immediately starts writing their answer.

Extended Thinking mode: The student reads the question, takes out scratch paper, works through the problem explicitly, checks their work, considers alternative approaches, and then writes the final answer based on that preparatory reasoning.

The scratch paper work is the thinking block. The final written answer is the response. Both are visible to you.

What the Thinking Block Contains

The thinking block is Claude’s unfiltered reasoning — exploratory, sometimes self-correcting, occasionally wrong before arriving at right. You might see Claude:

Trying an approach and then abandoning it when it realizes the direction is wrong
Listing considerations before deciding which are most important
Working through a calculation step by step and catching an error before the final answer
Considering multiple interpretations of an ambiguous question and choosing the most plausible
Explicitly checking its own logic before committing to a conclusion

This reasoning is meant to be exploratory and honest, not polished. It reads differently from Claude’s usual prose — more like watching someone think than reading their conclusions.

How Extended Thinking Works: The Technical Reality

Thinking Tokens

Extended Thinking uses what Anthropic calls thinking tokens — additional computational budget allocated to the reasoning process. You set a maximum number of thinking tokens when enabling Extended Thinking, which determines how much reasoning space Claude has before producing its final answer.

Thinking token ranges:

Minimum: 1,024 tokens (brief reasoning for moderately complex problems)
Standard: 5,000–10,000 tokens (appropriate for most Extended Thinking use cases)
Maximum: 32,000+ tokens (for the most complex, multi-faceted problems)

More thinking tokens allow deeper, more thorough reasoning — at additional cost. The right budget depends on problem complexity.
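As a rough illustration of matching budget to complexity, the sketch below maps three complexity tiers to budgets taken from the ranges above. The tier names and the `pick_thinking_budget` helper are hypothetical, invented for this example, and are not part of any Anthropic SDK.

```python
# Minimal sketch: choose a thinking budget from a named complexity tier.
# Tier names and this helper are illustrative assumptions, not SDK features.
MIN_BUDGET = 1_024  # smallest budget listed above

BUDGET_BY_TIER = {
    "moderate": 1_024,   # brief reasoning
    "standard": 8_000,   # inside the 5,000-10,000 range
    "complex": 32_000,   # the largest figure listed above
}


def pick_thinking_budget(tier: str) -> int:
    """Return the thinking budget (in tokens) for a complexity tier."""
    if tier not in BUDGET_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    # Never return less than the documented minimum.
    return max(BUDGET_BY_TIER[tier], MIN_BUDGET)
```

Starting at the "standard" tier and moving up only when the reasoning looks incomplete keeps costs predictable.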

Pricing for Thinking Tokens

Thinking tokens are billed as output tokens, at the same rate as the tokens in the final response. If Claude uses 8,000 thinking tokens to reason through a problem before producing a 500-token response:

You pay for 8,000 thinking tokens at the output rate
You pay for 500 response tokens at the output rate
Plus your input prompt tokens at the input rate

For extended reasoning on complex tasks, this adds up meaningfully. However, the cost comparison should be: “does the quality improvement justify this cost for this task?” — not absolute cost in isolation.
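To make the arithmetic concrete, here is a small cost estimator. It assumes thinking tokens are charged at the output rate, which is how Anthropic's pricing documentation describes them. The `Rates` class, the `estimate_cost` function, and the per-million-token prices are placeholders for illustration; substitute the current published prices for your model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices in dollars per million tokens (placeholder values below)."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_cost(prompt_tokens: int, thinking_tokens: int,
                  response_tokens: int, rates: Rates) -> float:
    """Estimate the dollar cost of one request with Extended Thinking."""
    input_cost = prompt_tokens * rates.input_per_mtok / 1_000_000
    # Assumption: thinking tokens are billed together with response tokens
    # at the output rate.
    output_tokens = thinking_tokens + response_tokens
    output_cost = output_tokens * rates.output_per_mtok / 1_000_000
    return input_cost + output_cost


# Placeholder prices; check the current pricing page before relying on them.
example_rates = Rates(input_per_mtok=3.00, output_per_mtok=15.00)
cost = estimate_cost(prompt_tokens=1_000, thinking_tokens=8_000,
                     response_tokens=500, rates=example_rates)
print(f"Estimated cost: ${cost:.4f}")
```

With these placeholder prices the thinking tokens account for most of the total, which is why the budget is the main cost lever.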

Which Models Support Extended Thinking

Extended Thinking is available on:

Claude Sonnet 4.5: Strong Extended Thinking capability, more cost-efficient than Opus
Claude Opus 4.5: Maximum Extended Thinking depth, appropriate for the most complex problems

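Claude Haiku 4.5 also supports Extended Thinking, although earlier Haiku models such as Claude Haiku 3.5 do not. Haiku is optimized for speed and efficiency, so Sonnet and Opus remain the usual choices for deep reasoning.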

Enabling Extended Thinking in Claude.ai

Via the Interface (Claude Pro and Above)

Open a new conversation in claude.ai
Look for the “Extended thinking” toggle or button — it appears in the conversation controls area (typically near the model selector or below the text input)
Toggle it on
Optionally adjust the thinking budget if the option is available
Type your prompt and send

When Extended Thinking is enabled, you will see the response take longer to generate — this is the reasoning phase happening. When it arrives, you will see a collapsed “Thinking” section above the final response. Click to expand it and read Claude’s reasoning process.

When Claude.ai Engages Extended Thinking Automatically

For some particularly complex questions — especially those that benefit from careful reasoning — Claude may engage Extended Thinking automatically, even without explicit toggling. You will recognize this by the appearance of a “Thinking” section in the response.

Enabling Extended Thinking via the API

For developers and technical users, Extended Thinking is activated through the thinking parameter in API calls:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # Must be higher than thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Thinking token budget
    },
    messages=[{
        "role": "user",
        # Adjacent string literals are joined, so the prompt carries
        # no stray indentation from the source layout
        "content": (
            "Analyze the strategic trade-offs between building "
            "versus buying a core technology component for a "
            "startup with $2M in funding, 8 engineers, and an "
            "18-month runway. Consider both short-term and "
            "long-term implications."
        )
    }]
)

# Response contains both thinking and final answer
for block in response.content:
    if block.type == "thinking":
        print("THINKING PROCESS:")
        print(block.thinking)
        print("\n---\n")
    elif block.type == "text":
        print("FINAL ANSWER:")
        print(block.text)

Important technical notes:

max_tokens must be set higher than budget_tokens — Claude needs room for both thinking and the final response
The thinking block is returned as a separate content block with type: "thinking"
Streaming is supported for Extended Thinking — you can stream the thinking and final response in real time
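Thinking blocks can be passed back as part of conversation history. When Extended Thinking is combined with tool use, the thinking blocks from the assistant turn that made the tool call must be passed back unmodified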
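The first note above is easy to get wrong in practice, so it can help to validate the two numbers before sending a request. The sketch below is one way to do that. `build_thinking_params` is a hypothetical helper, not part of the Anthropic SDK; it only assembles the `max_tokens` and `thinking` arguments used in the call above.

```python
MIN_BUDGET_TOKENS = 1_024  # minimum thinking budget mentioned earlier


def build_thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Return validated keyword arguments for a request with thinking enabled.

    Hypothetical helper: checks the two constraints described above
    before any request is sent.
    """
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(
            f"budget_tokens must be at least {MIN_BUDGET_TOKENS}, got {budget_tokens}"
        )
    if max_tokens <= budget_tokens:
        raise ValueError(
            "max_tokens must be higher than budget_tokens so the final "
            "answer has room after the reasoning"
        )
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = build_thinking_params(max_tokens=16_000, budget_tokens=10_000)
```

The result can be unpacked into the call shown earlier, for example `client.messages.create(model=..., messages=..., **params)`.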

When Extended Thinking Actually Makes a Difference

Extended Thinking does not improve all tasks equally. Understanding where it genuinely helps versus where it is unnecessary is critical for using it efficiently.

Where Extended Thinking Provides Meaningful Improvement

Complex multi-step mathematical and logical reasoning:

This is Extended Thinking’s strongest domain. Problems that require carrying values through calculations, tracking logical dependencies, or applying formal reasoning rules benefit dramatically from explicit step-by-step working.

Example: A compound interest calculation involving multiple rates, partial year periods, and tax implications. Standard mode may produce a plausible but incorrect figure. Extended Thinking works through each step explicitly, checking intermediate results, and is significantly less likely to produce a confident error.

Strategic analysis with many competing variables:

Decisions involving many factors that push in different directions — where weighting the factors, considering interactions between them, and thinking through second-order effects matters — benefit from Extended Thinking’s ability to work through the analysis methodically before concluding.

Example: “Should we expand to a new market now, in 6 months, or not at all? Consider our current runway, competitive dynamics, team capacity, and the strategic importance of the market.”

Extended Thinking surfaces the dependencies between variables and the implications of different assumptions in ways that direct answering tends to compress or skip.

Code debugging with complex failure modes:

When a bug involves multiple interacting systems, unexpected state, or subtle logic errors that are hard to trace, Extended Thinking allows Claude to work through the execution path explicitly — following the code’s logic step by step — rather than pattern-matching to the most common bug type.

Ambiguous problems requiring interpretation:

Some questions have multiple reasonable interpretations, and which interpretation you take changes the answer significantly. Extended Thinking allows Claude to explicitly consider the interpretations, choose the most plausible, and reason from there — rather than implicitly picking one without acknowledging the choice.

Formal logical and mathematical proofs:

For problems where correctness must be demonstrated through a valid chain of reasoning — proving a mathematical claim, validating a logical argument, checking the internal consistency of a framework — Extended Thinking produces substantially more reliable outputs.

Complex ethical or values-based analysis:

Questions involving competing values, edge cases, and genuine moral complexity benefit from Extended Thinking’s ability to consider multiple ethical frameworks, identify the tensions between them, and reason carefully about what considerations carry most weight in a specific situation.

Where Extended Thinking Does NOT Provide Meaningful Improvement

Straightforward factual questions: “What is the capital of Finland?” does not benefit from extended reasoning. There is no reasoning process that improves a factual lookup. Standard mode is faster and costs less.

Standard writing tasks: Drafting a professional email, writing a blog post, or producing standard business writing does not require deep reasoning chains. Extended Thinking adds cost and latency without meaningfully improving the output. Use standard Sonnet for writing.

Simple extraction and classification: Pulling specific fields from a document, categorizing items, or labeling content are tasks that standard mode handles correctly. Extended Thinking is unnecessary overhead here.

Creative tasks where exploration is the process: For creative writing, brainstorming, or generative ideation, the value comes from breadth and variety — not from carefully reasoned correctness. Extended Thinking optimizes for the latter and may actually produce more constrained creative output by over-reasoning creative choices.

Tasks with well-known correct procedures: If a task has a standard methodology that Claude already knows — a specific type of statistical test, a coding design pattern, a common analytical framework — Extended Thinking adds little over correctly applying the known procedure.

Real Before-and-After Examples

Example 1: Mathematical Reasoning

The prompt: “A company’s revenue grew 23% in Year 1, contracted 8% in Year 2, grew 31% in Year 3, and contracted 12% in Year 4. Starting with $1M in Year 0, what is the revenue at the end of Year 4? Also calculate the compound annual growth rate (CAGR) across the four-year period.”

Standard mode response: Produces a revenue figure and CAGR, but may contain small calculation errors, such as applying a growth rate to the wrong base or rounding intermediate values instead of multiplying exactly.

Extended Thinking response: The thinking block shows Claude calculating each year explicitly:

Year 0: $1,000,000
Year 1: $1,000,000 × 1.23 = $1,230,000
Year 2: $1,230,000 × 0.92 = $1,131,600
Year 3: $1,131,600 × 1.31 = $1,482,396
Year 4: $1,482,396 × 0.88 = $1,304,508.48

Then explicitly applying the CAGR formula, checking the intermediate result, and producing the final answer with the reasoning visible. The explicit working catches the kind of calculation errors standard mode is prone to.
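The year-by-year figures above, and the CAGR step the example describes, can be checked independently with a few lines of Python. The function names here are illustrative; the CAGR formula is the standard one, (ending value / starting value) raised to 1/years, minus 1.

```python
def compound(start: float, growth_rates: list) -> list:
    """Return the revenue at the end of each year, given yearly growth rates."""
    values = []
    current = start
    for rate in growth_rates:
        current *= 1 + rate  # a contraction is a negative rate
        values.append(current)
    return values


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate over the given number of years."""
    return (end / start) ** (1 / years) - 1


rates = [0.23, -0.08, 0.31, -0.12]
yearly = compound(1_000_000.0, rates)
growth = cagr(1_000_000.0, yearly[-1], len(rates))

for year, value in enumerate(yearly, start=1):
    print(f"Year {year}: ${value:,.2f}")
print(f"CAGR: {growth:.2%}")
```

The final revenue matches the Year 4 figure above, and the CAGR works out to about 6.87% per year.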

Example 2: Strategic Decision Analysis

The prompt: “We are a 12-person B2B SaaS startup with $800K ARR growing 8% month-over-month and 14 months of runway. A competitor just raised $15M. Should we raise now, continue organic growth, or explore strategic options? We have interest from two VCs but terms are not finalized.”

Standard mode response: Produces a structured analysis covering the options — reasonable, but tends to list considerations without fully reasoning through the interdependencies between them.

Extended Thinking response: The thinking block shows Claude working through:

What the competitor’s fundraise actually signals about the market
The relationship between 8% MoM growth and what investors will offer
The 14-month runway as a timing constraint on the decision
The “strategic options” consideration and what that implies about valuation expectations
The risk profile of the VC interest with unfinalized terms
How these factors interact — specifically, that raising now with unfinalized terms creates negotiating disadvantage, but that 14 months of runway gives less urgency than founders often feel

The final recommendation draws explicitly on these interactions rather than just weighing them independently.

Example 3: Code Debugging

The prompt: [Includes 80 lines of Python code with a subtle bug causing intermittent failures]

Standard mode response: Identifies the most commonly associated error pattern and suggests a fix — which may or may not address the actual bug.

Extended Thinking response: The thinking block shows Claude tracing the execution path of the relevant functions, noticing the state variable that is only initialized in one branch of a conditional, following through the logic of the scenario that causes the intermittent failure, and identifying the specific line where the uninitialized state gets read. The fix is precisely targeted.

Practical Workflow: Integrating Extended Thinking

The “Try Standard First” Rule

Unless you know a task requires deep reasoning, start with standard Sonnet. If the output has reasoning errors, misses important considerations, or produces a confidently wrong answer, enable Extended Thinking and try again. This approach uses the enhanced mode where it is needed rather than reflexively everywhere.
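In code, this rule amounts to a two-step escalation. The sketch below keeps the model call and the quality check as plain callables so it stays independent of any SDK. `answer_with_escalation`, `ask`, and `looks_wrong` are hypothetical names, and deciding what counts as a bad answer is left to your application.

```python
from typing import Callable, Optional, Tuple


def answer_with_escalation(
    prompt: str,
    ask: Callable[[str, Optional[int]], str],
    looks_wrong: Callable[[str], bool],
    budget_tokens: int = 5_000,
) -> Tuple[str, bool]:
    """Try standard mode first; retry with a thinking budget only if needed.

    ask(prompt, budget) should call the model, with budget=None meaning
    standard mode. looks_wrong(answer) is your own quality check.
    Returns the answer and whether escalation happened.
    """
    first = ask(prompt, None)
    if not looks_wrong(first):
        return first, False
    # The standard answer failed the check, so pay for reasoning this time.
    return ask(prompt, budget_tokens), True
```

The quality check can be as simple as a human reviewer pressing a "try harder" button, or as involved as an automated test suite for generated code.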

The Prompt Adjustment for Extended Thinking

Extended Thinking responds well to prompts that explicitly frame the reasoning task:

Standard prompt: “What is the best pricing strategy for our SaaS product?”

Extended Thinking prompt: “Reason through the trade-offs between the following three pricing models for our SaaS product [describe product]. Work through: how each model affects customer acquisition, long-term revenue predictability, competitive positioning, and operational complexity. Then give me a recommendation with your reasoning clearly stated.”

Extended Thinking itself is switched on by the toggle or the API parameter, not by the prompt. What the explicit framing does is give the reasoning a clearer structure to work through than an open-ended question provides.

Building Extended Thinking Into API Workflows

For developers using Extended Thinking in production:

Use it selectively: Build model selection logic that routes simple tasks to standard Sonnet and complex analytical tasks to Extended Thinking Sonnet or Opus.

Display thinking transparently: If you are building user-facing applications, consider showing users a condensed version of the thinking process (“Claude reasoned through 3 scenarios before answering”) — this builds appropriate calibration of AI outputs.

Log thinking blocks: Store thinking blocks alongside final responses for quality review and debugging. Thinking blocks reveal reasoning failures that would otherwise be invisible in the final answer.
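The routing and logging points above might look like the following in a small service. This is a sketch under stated assumptions: the task labels, the token figures in `route`, and the log record shape are all hypothetical, and the content blocks are assumed to be plain dictionaries with the same `type`, `thinking`, and `text` fields shown in the API example earlier.

```python
import json

# Hypothetical task labels; a real application would define its own.
REASONING_TASKS = {"math", "strategy", "debugging", "proof"}


def route(task_type: str) -> dict:
    """Return request settings, enabling thinking only for reasoning-heavy tasks."""
    if task_type in REASONING_TASKS:
        return {
            "max_tokens": 16_000,
            "thinking": {"type": "enabled", "budget_tokens": 10_000},
        }
    return {"max_tokens": 4_000}


def to_log_record(request_id: str, content: list) -> str:
    """Serialize thinking and answer text side by side for later review."""
    thinking_parts = []
    answer_parts = []
    for block in content:
        if block["type"] == "thinking":
            thinking_parts.append(block["thinking"])
        elif block["type"] == "text":
            answer_parts.append(block["text"])
    return json.dumps({
        "request_id": request_id,
        "thinking": "\n".join(thinking_parts),
        "answer": "\n".join(answer_parts),
    })
```

Keeping the thinking text in the same record as the answer makes it straightforward to pull both up together when a response is flagged for review.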

Free Tier Optimization for Extended Thinking

Extended Thinking is primarily a Claude Pro and API feature. The standard free tier has limited or no access to Extended Thinking mode.

For free tier users wanting deeper reasoning: The free tier equivalent of Extended Thinking is explicit in your prompt:

I need you to work through this problem step by step before 
giving me your final answer. Show your reasoning explicitly 
before concluding. Do not jump to the answer — work up to it.

[Your problem or question]

This prompt-based approach activates chain-of-thought reasoning without Extended Thinking mode. It is less powerful than native Extended Thinking (Claude is not given explicit thinking budget, and the reasoning is in the main response rather than a dedicated block) but produces meaningfully better results than direct answers on complex problems.
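If you reuse this instruction often, a tiny helper keeps the wording consistent. The function name is hypothetical, and the preamble is adapted from the prompt above with lightly simplified punctuation.

```python
# Preamble adapted from the step-by-step prompt shown above.
COT_PREAMBLE = (
    "I need you to work through this problem step by step before "
    "giving me your final answer. Show your reasoning explicitly "
    "before concluding. Do not jump to the answer; work up to it."
)


def with_step_by_step(question: str) -> str:
    """Prepend the step-by-step instruction to a question."""
    return f"{COT_PREAMBLE}\n\n{question.strip()}"


prompt = with_step_by_step("Which of these three pricing models fits us best?")
```

The resulting string can be pasted into any chat interface or sent as a normal user message.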

For API users managing costs: Set thinking budgets conservatively and increase them when the problem genuinely warrants deeper reasoning. A 5,000-token thinking budget handles most analytical tasks well. Reserve 15,000–32,000 token budgets for the most complex problems.

Common Mistakes With Extended Thinking

Mistake 1: Using Extended Thinking for Everything

Extended Thinking is not a universal quality booster. For writing, simple factual questions, and standard task execution, it adds cost and latency without improving output. Use it selectively for problems that require genuine reasoning depth.

Mistake 2: Setting the Thinking Budget Too Low

If you enable Extended Thinking with a very small budget (1,024 tokens), Claude may not have enough reasoning space to work through a complex problem, and the reasoning may stop short of a complete analysis. Start with at least 5,000 tokens for meaningful complex analysis.

Mistake 3: Treating Thinking Blocks as Facts

The thinking block is Claude’s exploratory reasoning — it may contain wrong turns, abandoned approaches, and intermediate errors. Do not extract specific claims from the thinking block and treat them as outputs. The final response is what Claude concluded; the thinking block is how it got there.

Mistake 4: Not Prompting Toward Reasoning

Extended Thinking works best with prompts that frame the task as a reasoning problem. “What is the answer to X?” prompts an answer. “Work through the considerations for X and give me a reasoned recommendation” prompts reasoning. The latter gives Claude a clearer structure to reason through once Extended Thinking is enabled.

Mistake 5: Ignoring the Thinking Block Entirely

The thinking block contains valuable information beyond the final answer — alternative approaches Claude considered and rejected, assumptions it made, and confidence levels on different aspects of the analysis. Reading the thinking block helps you calibrate whether the conclusion is reliable and where to probe further.

Conclusion

Extended Thinking is one of those features that, once you have used it for a problem that genuinely warrants it, makes you recalibrate what AI-assisted reasoning can do. The difference between standard mode and Extended Thinking on a complex strategic decision or a subtle logical problem is not marginal — it is qualitatively different.

The discipline is knowing when to use it. Not every question needs a reasoning process. Many do not. But for the specific class of problems where the quality of reasoning is the output — complex analysis, careful mathematical work, multi-variable decisions, subtle logical problems — Extended Thinking is the tool that makes Claude genuinely competitive with the deepest human analytical thought.

Your next step: Take a problem that you have asked Claude before and gotten an unsatisfying answer to — something analytical or logical where the answer felt superficial. Enable Extended Thinking and ask the same question. Read the thinking block. Notice the difference.

That experience will tell you, more precisely than this guide can, exactly where Extended Thinking belongs in your workflow.

📚 Continue the Series:

← Previous: Claude’s Model Family: Haiku vs. Sonnet vs. Opus

Next →: Claude Projects: Your Personal AI Memory System

For technical implementation: The Claude API for Non-Developers and Claude for Developers: Advanced Techniques

For complex analytical work: Claude for Research and Analysis: Deep Dives Without the Grind

For the AI comparison context: Claude vs. ChatGPT vs. Gemini: The Honest 2026 Comparison

Last updated: April 2026. Extended Thinking feature availability, thinking token limits, and API parameters are updated by Anthropic regularly. Verify current specifications at docs.anthropic.com/extended-thinking.

⚠️ Extended Thinking is billed at standard token rates, and thinking tokens count against your usage at output pricing. Monitor thinking token usage in production applications to avoid unexpected costs. Always set max_tokens higher than budget_tokens in API calls or the request will fail.

Frequently Asked Questions (FAQ)

Is Extended Thinking available on the free tier?

Extended Thinking is primarily available on Claude Pro and via the Claude API. Free tier users can approximate it with explicit chain-of-thought prompting instructions, as described in the optimization section.

How long does Extended Thinking take compared to standard responses?

Significantly longer. A standard Sonnet response to a complex question might take 10–30 seconds. The same question with Extended Thinking and a 10,000-token budget may take 60–180 seconds. The additional time is the cost of better reasoning.

Can I turn off the visible thinking block if I don't want to see it?

In claude.ai, the thinking block is collapsible — you can keep it closed while still benefiting from the improved reasoning. Via the API, you can receive only the final text response and not surface the thinking block in your application.

Does Extended Thinking always produce better results?

No. For tasks that do not require deep reasoning chains, Extended Thinking produces comparable results to standard mode at higher cost and latency. The improvement is specific to tasks with genuine analytical complexity.

Can I use Extended Thinking with Claude Haiku?

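It depends on the version. Claude Haiku 4.5 supports Extended Thinking, while earlier Haiku models such as Claude Haiku 3.5 do not. Haiku is optimized for speed and efficiency, so Sonnet and Opus remain the usual choices for deep reasoning.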

Is the thinking block used in subsequent conversation turns?

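They can be. The API accepts thinking blocks passed back as part of conversation history, and when Extended Thinking is combined with tool use, the thinking blocks from the assistant turn that made the tool call must be passed back unmodified. In ordinary chat, their main value is as a transparency and reasoning quality tool rather than as persistent context.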