There are two fundamentally different ways an AI model can respond to a question.
The first is direct generation: the model reads your input and produces output, token by token, based on pattern-matching against its training. This is fast and works well for most tasks.
The second is reasoning: the model works through a problem step by step — generating a hidden chain of thought, checking its own logic, exploring multiple approaches, self-correcting errors — before producing the final response. This is slower and costs more tokens, but on genuinely hard problems, it produces qualitatively better results.
OpenAI’s reasoning models — now GPT-5.5 Thinking, GPT-5.4 Thinking, and their variants — are built around the second approach. Understanding when this matters, when it does not, and how to control the depth of reasoning is one of the highest-leverage skills for serious ChatGPT users in 2026.
🔗 This is Post #12 in the ChatGPT Unlocked series. The thinking effort controls described here were added to the ChatGPT model picker on April 28, 2026. See GPT-5.5 vs GPT-5.4 vs GPT-5.3 (Post #2) for the full model family context.
What Reasoning Models Actually Do
When you send a message to a standard model, there is no hidden stage. The model generates tokens in order, and those tokens are exactly what you read.
When you send a message to a thinking model, a separate reasoning process runs first, hidden from you, before the visible response begins. This thinking trace:
- Decomposes the problem into subproblems or required steps
- Generates and evaluates multiple approaches before committing to one
- Checks its own logic for consistency and catches errors mid-reasoning
- Backtracks when a line of reasoning fails and tries a different approach
- Integrates multiple considerations that standard generation would collapse too quickly
The visible response you receive is produced after this internal reasoning completes. The model is not just responding — it is thinking, then responding.
The Evolution from o1 to GPT-5.5 Thinking
Understanding where reasoning models came from explains how to use them well.
The o1 Era (2024)
OpenAI launched o1 in September 2024, initially as o1-preview. It was the first broadly available model with extended chain-of-thought reasoning. It was a separate model from GPT-4o, accessed through a different interface, with different prompting conventions and significantly higher latency. o1 produced results on hard reasoning tasks that GPT-4o could not match, but it was slow, expensive, and somewhat awkward to use alongside standard models.
Integration Into the Unified Model Family (2025–2026)
The architectural shift OpenAI executed between 2025 and 2026 was integrating reasoning capability into the main model family rather than maintaining separate models. o1, o1-pro, o3, and related models have been retired as standalone products. Their reasoning capability now lives inside GPT-5.4 Thinking and GPT-5.5 Thinking — accessible not through a separate model selector but through the thinking effort controls on the main model picker.
The result: reasoning is no longer a separate tool you switch to. It is a dial you turn up when you need it and turn down when you do not.
What GPT-5.5 Thinking Adds
GPT-5.5 Thinking is a significant improvement over o1 and o3 on most reasoning benchmarks and in real-world use. The base capabilities it brings to reasoning tasks:
- Stronger factual grounding (less hallucination on specific claims during reasoning)
- Better multi-step logical consistency (less likely to make a valid-seeming error early that invalidates later conclusions)
- Improved handling of ambiguous or underspecified problems
- Better calibrated uncertainty (knows what it does not know)
- Integration with tools during reasoning (can search, calculate, and reason simultaneously)
The Thinking Effort Controls
As of April 28, 2026, when you select GPT-5.5 Thinking or GPT-5.4 Thinking in the ChatGPT model picker, three effort levels appear:
Standard
A lightweight reasoning pass. The model does brief internal checking before responding — more than a direct generation model, less than full extended thinking. Response time is moderately longer than GPT-5.3 but significantly faster than Thinking or Extended.
Use Standard when: The task benefits from some reasoning discipline but does not require extended exploration. Most professional analysis, well-specified problems with clear criteria, complex writing tasks.
Thinking
Balanced extended reasoning. The model engages a full reasoning process — generating, evaluating, and self-correcting across multiple approaches before committing. This is the level that produces results comparable to the best o1 performance.
Use Thinking when: The problem has multiple plausible approaches, some hidden gotchas, or requires holding several considerations in balance. Mathematical word problems, multi-variable strategic decisions, complex debugging.
Extended
Maximum reasoning depth. The longest thinking traces, the most exhaustive exploration of approaches, the highest self-correction rate. Noticeably slower — responses can take 30–90 seconds for genuinely hard problems.
Use Extended when: You have tried Thinking and the result is still missing something important. Frontier mathematics, research synthesis requiring maximum rigor, security analysis of complex codebases, decisions where errors have irreversible consequences.
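The API section of this post says these three levels correspond to the API's `reasoning_effort` values low, medium, and high. The "Use ... when" rules above can then be written as a small decision function. This is a rough sketch. `TaskTraits` and its fields are hypothetical names for the criteria described here, not anything ChatGPT or the API exposes.

```python
from dataclasses import dataclass

# ChatGPT UI level -> API reasoning_effort value, as described in this post
EFFORT_FOR_LEVEL = {"Standard": "low", "Thinking": "medium", "Extended": "high"}


@dataclass
class TaskTraits:
    # Hypothetical flags mirroring the "Use ... when" criteria above
    multiple_approaches: bool = False   # several plausible solution paths
    hidden_gotchas: bool = False        # likely traps or edge cases
    irreversible_stakes: bool = False   # errors would be costly to undo
    thinking_fell_short: bool = False   # Thinking was tried and still missed something


def choose_effort(task: TaskTraits) -> tuple[str, str]:
    """Return (ChatGPT level, API reasoning_effort) for a task."""
    if task.thinking_fell_short or task.irreversible_stakes:
        level = "Extended"
    elif task.multiple_approaches or task.hidden_gotchas:
        level = "Thinking"
    else:
        level = "Standard"
    return level, EFFORT_FOR_LEVEL[level]


print(choose_effort(TaskTraits(hidden_gotchas=True)))  # ('Thinking', 'medium')
```

The checks run from most demanding to least. A task that meets any Extended criterion never falls through to a cheaper level.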
The Tasks Where Thinking Models Genuinely Excel
Category 1: Mathematics and Formal Reasoning
This is where the gap between thinking and non-thinking models is largest and most consistent. Standard models make algebraic errors that compound through multi-step problems. Thinking models catch their own errors mid-reasoning.
The test: Give GPT-5.3 and GPT-5.5 Thinking (Extended) the same multi-step word problem involving percentages, rates, and conversions. The difference in accuracy is dramatic and immediate.
Practical use cases:
- Financial modeling and compound calculations
- Scientific data analysis with derived quantities
- Optimization problems with multiple constraints
- Probability and statistical reasoning
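A fair comparison needs an answer you know is correct. Here is a minimal sketch: compute the ground truth exactly with Python's `fractions.Fraction`, then score each model's answer against it. The word problem, the 1.25 USD-per-EUR rate, and both "model answers" are made up for illustration.

```python
from fractions import Fraction

# Hypothetical word problem: an $80 item is marked up 25%, then discounted 20%.
# A customer buys 3 units, pays 10% sales tax, and converts the total to EUR
# at an assumed rate of 1.25 USD per EUR. What is the total in EUR?


def ground_truth_eur() -> Fraction:
    price = Fraction(80)
    price *= Fraction(125, 100)   # +25% markup -> 100
    price *= Fraction(80, 100)    # -20% discount -> 80 (not 84: the trap)
    subtotal = price * 3          # 3 units -> 240
    total_usd = subtotal * Fraction(110, 100)   # +10% tax -> 264
    return total_usd / Fraction(125, 100)       # USD -> EUR -> 211.2


def is_correct(model_answer: float, truth: Fraction, tol: float = 0.005) -> bool:
    """True if a model's numeric answer is within tol of the exact result."""
    return abs(Fraction(model_answer) - truth) <= Fraction(tol)


truth = ground_truth_eur()
print(float(truth))  # 211.2
# Hypothetical answers: model_b applied a naive "+25% - 20% = +5%" shortcut
for name, answer in {"model_a": 211.2, "model_b": 221.76}.items():
    print(name, is_correct(answer, truth))
```

Using exact fractions keeps rounding noise out of the ground truth. The tolerance then decides how much rounding you accept in the model's final number.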
Category 2: Multi-Variable Strategic Analysis
When a decision depends on many interacting variables — where changing one assumption affects multiple others, where second-order effects matter, where edge cases could invalidate the main conclusion — extended thinking produces analysis that standard generation cannot.
I'm deciding whether to expand to the European market
in Q3. Analyze this decision considering:
- Current burn rate and runway
- Cost of EU entity setup and compliance (GDPR, local employment law)
- Competitive landscape in our target EU markets
- Currency risk given current EUR/USD volatility
- Operational complexity of multi-timezone customer support
- The opportunity cost of management bandwidth on EU vs. doubling down in US
Use Extended thinking. I want you to identify the
assumption I'm making that I haven't stated, and
tell me which one, if wrong, would most change your recommendation.
[Provide your actual business context]
Category 3: Complex Debugging and Code Analysis
For debugging where the error is subtle, intermittent, or involves unexpected interactions between systems — extended thinking produces analysis that methodically works through all possible causes rather than jumping to the most pattern-matched answer.
[Bug description with complete context]
Use Extended thinking. Before giving me any fix:
1. List every possible root cause you can identify
2. For each cause, explain what evidence supports
or contradicts it
3. Rank them by probability
4. Only then recommend the fix you are most
confident in — with the test that would
confirm your diagnosis
Category 4: Research Synthesis and Literature Analysis
When synthesizing across multiple sources with competing claims, methodological differences, and contested evidence — extended thinking produces more careful analysis of what the evidence actually establishes versus what it merely suggests.
I've uploaded 8 research papers on [topic].
Using Extended thinking:
1. What is the strongest claim the combined evidence
supports? What is the confidence level?
2. Where do the studies contradict each other —
and what explains the contradiction?
3. What methodological weaknesses appear across
multiple studies?
4. What would a rigorous meta-analysis conclude
that no single paper states?
Category 5: Security Analysis
For security-critical code review, threat modeling, or penetration testing analysis — extended thinking is more likely to identify non-obvious attack vectors that pattern-matching against known vulnerabilities would miss.
When NOT to Use Thinking Models
Simple factual questions: “What is the capital of France?” does not benefit from extended reasoning. Fast answers or GPT-5.3 handle this instantly and correctly.
Standard writing tasks: Drafting a professional email, writing a blog section, summarizing a document — standard generation quality is equivalent and much faster.
High-volume batch processing: Using GPT-5.5 Thinking for 500 email classifications wastes budget and time. GPT-5.4 mini handles classification at 1/50th the cost.
When you are in the middle of an iterative creative session: The latency of Extended thinking breaks creative flow. For brainstorming, drafting, and iterating, use GPT-5.4 or GPT-5.5 (standard).
When the question is genuinely unanswerable: Extended reasoning does not conjure information that does not exist in training data. If a question requires knowledge beyond the model’s cutoff or specific private data you have not provided, Extended thinking will not help.
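The five cases above can serve as a pre-flight check before you reach for a thinking model. This is a rough sketch. The `Request` fields and the batch threshold of 100 items are illustrative assumptions, not figures from OpenAI.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical description of a request; field names are illustrative
    simple_fact: bool = False         # e.g. "What is the capital of France?"
    routine_writing: bool = False     # email, blog section, summary
    batch_size: int = 1               # number of near-identical items to process
    iterative_creative: bool = False  # brainstorming or drafting loop
    info_available: bool = True       # answer is in training data or provided context


def use_thinking_model(req: Request, batch_threshold: int = 100) -> bool:
    """Return False when a thinking model would add cost without adding quality."""
    if not req.info_available:
        return False  # more reasoning cannot supply missing information
    if req.simple_fact or req.routine_writing or req.iterative_creative:
        return False  # standard generation is as good and much faster
    if req.batch_size >= batch_threshold:
        return False  # route high-volume work to a small, cheap model instead
    return True


print(use_thinking_model(Request(batch_size=500)))  # False
```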
Prompting for Thinking Models: What Changes
Reasoning models respond differently to prompts than standard models. Key adjustments:
Give the full problem, not just the question
Thinking models benefit from complete context. With standard models, concise prompts often work best. Reasoning models, by contrast, improve with comprehensive problem descriptions. Include all relevant constraints, goals, and background.
Do NOT tell it how to solve the problem
With standard models, walking through the approach helps. With thinking models, prescribing the method can actually constrain the reasoning and prevent it from finding a better approach. State the problem and desired output, not the solution method.
Ask for reasoning transparency when it matters
Explain your reasoning step by step so I can
identify any assumption I disagree with.
Use explicit uncertainty requests
For each major conclusion, tell me your confidence
level and what evidence would change it.
Request steelmanning before conclusions
Before giving me your recommendation, present
the strongest argument against it.
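If you call a reasoning model from code, you can fold these adjustments into a small prompt builder. It puts the full context first and states the desired output without prescribing a method. It then appends any of the three optional requests, worded as in the examples above. This is a sketch: `build_reasoning_prompt` and its parameters are hypothetical helpers, not part of any SDK, and the sample problem is invented.

```python
def build_reasoning_prompt(
    problem: str,
    constraints: list[str],
    background: str,
    desired_output: str,
    show_reasoning: bool = False,
    ask_confidence: bool = False,
    steelman: bool = False,
) -> str:
    """Assemble a full-context prompt that states the goal, not the method."""
    parts = [
        f"Problem: {problem}",
        f"Background: {background}",
        "Constraints:\n" + "\n".join(f"- {c}" for c in constraints),
        f"Desired output: {desired_output}",
    ]
    # Optional requests, worded as in the prompting examples
    if show_reasoning:
        parts.append("Explain your reasoning step by step so I can identify "
                     "any assumption I disagree with.")
    if ask_confidence:
        parts.append("For each major conclusion, tell me your confidence level "
                     "and what evidence would change it.")
    if steelman:
        parts.append("Before giving me your recommendation, present the "
                     "strongest argument against it.")
    return "\n\n".join(parts)


prompt = build_reasoning_prompt(
    problem="Should we migrate our billing service to a new database this quarter?",
    constraints=["No more than 2 hours of downtime", "Team of 3 engineers"],
    background="The current database is near its storage limit.",
    desired_output="A go/no-go recommendation with the key risks.",
    steelman=True,
)
print(prompt)
```

Note that nothing in the template tells the model how to solve the problem. That omission is deliberate, following the "do not prescribe the method" advice.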
API Usage: Reasoning in Production
For developers using reasoning models via the API:
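```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.5-thinking",
    messages=[
        {
            "role": "user",
            "content": "Your complex reasoning problem here",
        }
    ],
    # Control reasoning depth: "low" | "medium" | "high"
    # Mirrors Standard | Thinking | Extended in the ChatGPT UI
    reasoning_effort="high",
    # Upper bound on thinking tokens plus visible output tokens combined
    max_completion_tokens=8000,
)

choice = response.choices[0]
usage = response.usage

# If reasoning consumes the whole budget, the visible answer can come back empty
if choice.finish_reason == "length" and not choice.message.content:
    print("Token budget ran out during reasoning; raise max_completion_tokens.")
else:
    print(f"Response: {choice.message.content}")

# The usage data breaks out thinking tokens (already counted in completion_tokens)
details = usage.completion_tokens_details
reasoning_tokens = (details.reasoning_tokens or 0) if details else 0
print(f"Thinking tokens used: {reasoning_tokens}")
print(f"Total tokens: {usage.total_tokens}")
```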
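Important: Thinking tokens are charged at output token rates. Extended reasoning on complex problems can consume 2,000–10,000+ thinking tokens per request. Set max_completion_tokens to manage costs, but leave headroom. The limit covers hidden thinking tokens and the visible answer together, so a budget that is too tight can end the request before any visible output is produced.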
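Because thinking tokens are billed as output, you can estimate a request's cost from its usage data. In the Chat Completions usage object, `completion_tokens` already includes the hidden `reasoning_tokens`, so do not add them a second time. This is a sketch: the per-million-token prices below are placeholders, not OpenAI's actual rates.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimated USD cost of one request.

    completion_tokens already counts hidden reasoning tokens, so they are
    billed here at the output rate without being added a second time.
    """
    input_cost = prompt_tokens * input_price_per_m / 1_000_000
    output_cost = completion_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Placeholder prices; substitute the current rates for your model
cost = estimate_cost(prompt_tokens=1_000, completion_tokens=9_000,
                     input_price_per_m=1.00, output_price_per_m=10.00)
print(f"${cost:.3f}")  # $0.091
```

With a real response, pass `response.usage.prompt_tokens` and `response.usage.completion_tokens`. In this example, 8,000 of the 9,000 completion tokens could be thinking tokens, and they would still dominate the bill.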
Cost-Benefit Analysis
| Task | Standard Model | Thinking (Balanced) | Extended | Recommendation |
|---|---|---|---|---|
| Email draft | $0.002 | $0.015 | $0.08+ | Standard — no benefit |
| Multi-step math | $0.003 | $0.02 | $0.05 | Extended — accuracy critical |
| Strategic analysis | $0.01 | $0.04 | $0.15 | Thinking — balance quality/cost |
| Code security review | $0.05 | $0.15 | $0.40 | Extended — stakes justify cost |
| Document summary | $0.008 | $0.02 | $0.08 | Standard — marginal benefit |
| Research synthesis | $0.02 | $0.06 | $0.20 | Thinking or Extended |
The pattern: use Extended only when errors are consequential and you have already validated that Standard and Thinking are insufficient.
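In code, you can automate the same escalate-only-when-needed pattern. Try the cheapest effort level, check the answer, and step up only if the check fails. This sketch assumes you supply two hypothetical callables. `ask(prompt, effort)` would wrap an API call like the one in the previous section. `good_enough(answer)` is your own validation, such as a unit test, a known answer, or a reviewer's sign-off.

```python
from typing import Callable, Optional

EFFORT_LADDER = ["low", "medium", "high"]  # Standard -> Thinking -> Extended


def answer_with_escalation(
    prompt: str,
    ask: Callable[[str, str], str],
    good_enough: Callable[[str], bool],
) -> tuple[Optional[str], Optional[str]]:
    """Return (answer, effort) from the lowest effort level that passes the check.

    Returns (None, None) if even "high" fails, so a human can step in.
    """
    for effort in EFFORT_LADDER:
        answer = ask(prompt, effort)
        if good_enough(answer):
            return answer, effort
    return None, None


# Offline demo: a stand-in model that only gets it right at "high" effort
calls: list[str] = []


def fake_ask(prompt: str, effort: str) -> str:
    calls.append(effort)
    return "right" if effort == "high" else "wrong"


result = answer_with_escalation("hard problem", fake_ask, lambda a: a == "right")
print(result)  # ('right', 'high')
print(calls)   # ['low', 'medium', 'high']
```

The loop stops at the first level that passes, so easy problems never pay for Extended. Hard problems pay for every level tried on the way up, which is the price of not knowing the difficulty in advance.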
Conclusion
Reasoning models are not a replacement for standard generation — they are a more expensive, more capable tool for a specific set of tasks where the quality difference justifies the cost. The thinking effort controls introduced in April 2026 make this tradeoff explicit and adjustable per conversation.
The users who get the most from reasoning models are those who have calibrated their intuition for which tasks genuinely benefit. That intuition develops quickly: try Standard, try Thinking, try Extended on the same hard problem. The output differences are immediately visible, and after a few experiments you will reliably know when Extended effort is worth it.
Your next step: Take the hardest analytical problem you have been working on — a decision with multiple variables, a debugging challenge that has resisted easy solutions, a research question with contradictory sources. Select GPT-5.5 Thinking with Extended effort. Compare the result to what GPT-5.4 produced. The difference will calibrate your sense of when reasoning models are worth the cost.
Last updated: May 2026. Reasoning model capabilities and effort control behavior are updated with each OpenAI model release.