OpenAI’s reasoning models — GPT-5.5 Thinking, the o-series before it — represent one of the most significant capability advances in AI: models that think through problems step-by-step before answering, catching their own errors, exploring multiple approaches, and producing qualitatively better results on hard problems.
Until recently, this capability required a paid cloud subscription. DeepSeek-R1 changed that.
Released by Chinese AI lab DeepSeek in January 2026, R1 is a reasoning model — it generates a visible chain-of-thought thinking trace before producing its final answer, exactly like OpenAI’s thinking models. R1 runs locally via Ollama, meaning your math problems, complex analyses, and logical reasoning tasks can benefit from extended chain-of-thought with full privacy and no per-token cost.
R1’s performance is genuine. On MATH benchmarks, DeepSeek-R1 32B scores comparably to GPT-4 with reasoning. On AIME and competition mathematics, R1 70B approaches GPT-4o performance. On software engineering benchmarks (SWE-bench), R1 outperforms several reasoning model alternatives.
This guide covers every R1 variant, how to use the thinking trace effectively, the tasks where R1 genuinely excels, and where it is overkill.
🔗 This is Post #6 in the Ollama Unlocked series. For the general model guide including R1’s position in the model landscape, see The Local LLM Model Guide (Post #2). For coding models, see Qwen3 and Kimi K2.6 (Post #7).
How DeepSeek-R1 Works
The Chain-of-Thought Architecture
Standard language models generate tokens sequentially — each token directly builds toward the response. R1 uses a different process:
- Thinking phase: The model generates a
<think>...</think>block containing its internal reasoning — exploring approaches, checking logic, self-correcting - Answer phase: After thinking completes, the model generates the final response based on its reasoning
The thinking trace is visible to you — you can watch R1 work through a problem, see where it corrects itself, and understand the reasoning path to the conclusion.
This matters because:
- The model can backtrack when it realizes an approach is wrong
- Multi-step problems can be broken into verifiable sub-steps
- Mathematical errors are more likely to be caught during thinking
- You can identify exactly where reasoning fails if the answer is wrong
R1 vs. Standard Models: When It Matters
The chain-of-thought thinking adds latency — R1 is slower than standard models for the same apparent output length because of the hidden thinking phase. This cost is worth it specifically when:
- The task requires multi-step logical reasoning
- Mathematical accuracy matters
- The problem has multiple approaches and you need the best one
- Errors in the chain of reasoning cascade to a wrong answer
It is NOT worth it for:
- Simple factual questions
- Writing and creative tasks
- Conversational exchanges
- Tasks where the reasoning is not complex
The R1 Size Variants
DeepSeek-R1 is available in multiple sizes. Choose based on your hardware:
# 7B — Entry level reasoning (6GB VRAM)
ollama pull deepseek-r1:7b
# 8B — Slightly better (6GB VRAM)
ollama pull deepseek-r1:8b
# 14B — Solid reasoning (10GB VRAM)
ollama pull deepseek-r1:14b
# 32B — Strong reasoning (20GB VRAM)
ollama pull deepseek-r1:32b
# 70B — Best local reasoning (40GB VRAM)
ollama pull deepseek-r1:70b
# 671B — Cloud-scale via Ollama cloud offloading
ollama pull deepseek-r1:671b-cloud
Performance vs. Hardware Trade-offs
| Variant | VRAM | Math Performance | Speed | Recommendation |
|---|---|---|---|---|
| 7B | 6 GB | Basic reasoning improvement | 25–40 t/s | Entry point only |
| 14B | 10 GB | Good — noticeable improvement | 15–25 t/s | Best value |
| 32B | 20 GB | Strong — competitive with paid models | 8–15 t/s | If hardware allows |
| 70B | 40 GB | Excellent — approaches GPT-4 level | 4–8 t/s | Workstation only |
The 14B recommendation: DeepSeek-R1 14B provides a genuine, noticeable improvement over standard 7B models on reasoning tasks while fitting comfortably in 10GB VRAM. It is the practical sweet spot for most users.
Installation and Basic Usage
# Pull your preferred size
ollama pull deepseek-r1:14b
# Run interactively
ollama run deepseek-r1:14b
# Important: give it a reasoning problem, not a simple question
# Bad: "What is the capital of France?"
# Good: "A train travels 150km in 2 hours. It then travels
# another 100km. If it takes 30 minutes less for the
# second leg than the first, what is the average speed
# of the entire journey?"
What you see when running R1:
<think>
Let me work through this step by step.
First leg: 150km in 2 hours → 75 km/h
Time for first leg: 2 hours
For the second leg: 100km, takes 30 minutes less than first leg
Time for second leg: 2 - 0.5 = 1.5 hours
Speed for second leg: 100/1.5 = 66.67 km/h
Total distance: 150 + 100 = 250 km
Total time: 2 + 1.5 = 3.5 hours
Average speed = 250 / 3.5 = 71.43 km/h
Wait, let me re-read the problem. "30 minutes less for the second
leg than the first" — the first leg took 2 hours, so second leg
takes 1.5 hours. That's what I calculated. ✓
</think>
The average speed for the entire journey is **71.43 km/h**.
Working:
- First leg: 150km ÷ 2h = 75 km/h, time = 2 hours
- Second leg: 100km in 1.5 hours (30 min less than first leg)
- Total: 250km in 3.5 hours = **71.43 km/h**
The thinking trace shows the model’s work — including the self-check at the end.
Where DeepSeek-R1 Dramatically Outperforms Standard Models
Mathematics
This is R1’s clearest domain. Multi-step math problems where one error propagates through subsequent calculations — R1’s self-checking significantly reduces error rates.
Test it yourself — compare R1:14B and llama4:scout on:
A company has 3 factories. Factory A produces 200 units/day,
Factory B produces 350 units/day, Factory C produces 150 units/day.
Production increases: A by 15%, B decreases by 10%, C increases by 25%.
If the company needs to produce 850 units/day total, by what
percentage would they need to further increase combined production?
R1 will typically show its calculation steps and catch arithmetic errors. A standard model will often calculate confidently but make errors.
Logic and Constraint Problems
Problems where multiple constraints must all be satisfied simultaneously:
Five people — Alice, Bob, Carlos, Diana, and Eve —
need to be seated at a round table.
Constraints:
- Alice and Bob cannot sit adjacent
- Carlos must sit directly opposite Diana (across the table)
- Eve must sit next to either Alice or Carlos
- Bob cannot sit next to Eve
Find a valid seating arrangement and verify it satisfies
all constraints.
R1 systematically explores possibilities and verifies constraints — standard models often miss one constraint or give inconsistent answers.
Code Analysis and Debugging
For debugging complex code where the bug involves subtle logic:
This function is supposed to find the longest palindromic
subsequence in a string, but it gives wrong answers for some inputs.
[paste code]
Find the bug. Think through the algorithm step by step before
identifying what is wrong.
R1’s systematic approach to tracing execution paths catches errors that pattern-matching approaches miss.
Multi-Step Reasoning Chains
Questions where the answer requires building on intermediate conclusions:
A startup has these metrics:
- Monthly burn rate: $85,000
- Current ARR: $240,000
- MoM ARR growth: 15%
- Current cash: $510,000
At current growth rate, when will monthly revenue exceed monthly
burn rate, and how much cash will they have left at that point?
Assume burn rate stays constant.
Prompting R1 Effectively
Let It Think — Don’t Interrupt
The thinking phase produces better final answers. Do not add instructions that cut it short:
# Bad — cuts off reasoning
"Answer briefly: [complex math problem]"
# Good — lets R1 think fully
"[complex math problem]"
# Or explicitly invite thinking:
"Think through this carefully: [problem]"
Ask for Verification
R1 benefits from explicit verification requests:
After solving, verify your answer by checking it satisfies
all the conditions in the problem.
This triggers an additional self-check loop in the thinking trace.
Format for Reasoning Tasks
Problem: [state the problem clearly with all given information]
Work through this step by step. Show your reasoning.
After reaching an answer, verify it against the original conditions.
When to Hide the Thinking Trace
In production applications using R1 via API, you may want to suppress the <think> tags from the final output:
import ollama
import re
response = ollama.chat(
model='deepseek-r1:14b',
messages=[{'role': 'user', 'content': your_problem}]
)
# Remove thinking trace, keep only final answer
answer = re.sub(r'<think>.*?</think>', '',
response['message']['content'],
flags=re.DOTALL).strip()
print(answer)
DeepSeek-R1 vs. GPT-5.5 Thinking: Honest Comparison
This is the comparison most people want:
GPT-5.5 Thinking (Extended) advantages:
- Higher absolute performance ceiling — especially on competition math
- Better natural language understanding in reasoning tasks
- More reliable on very long reasoning chains
- Integrated with web search for current information
DeepSeek-R1 advantages:
- Free to run locally after initial setup
- Complete privacy — reasoning process stays on your machine
- No per-token cost — run as many reasoning tasks as needed
- 671B parameter version available (via cloud offloading) for research-grade tasks
- MIT licensed distilled versions (1.5B–70B) for commercial use
Honest performance gap: R1 32B and R1 70B are competitive with GPT-4-class reasoning on most practical tasks. They fall behind GPT-5.5 Thinking (Extended) on frontier mathematical olympiad problems and highly nuanced analysis. For everyday professional reasoning — business analysis, debugging, systematic problem-solving — the gap is smaller than the cost difference suggests.
R1 Distilled Variants
DeepSeek released distilled versions of R1 — smaller models trained to mimic R1’s reasoning using knowledge distillation. These use standard model architectures (Qwen, Llama) with R1’s reasoning capability transferred:
# Distilled on Qwen2.5 architecture
ollama pull deepseek-r1:1.5b-qwen # Tiny, surprisingly capable
ollama pull deepseek-r1:7b-qwen # Good for constrained hardware
ollama pull deepseek-r1:14b-qwen # Best distilled option
# Distilled on Llama architecture
ollama pull deepseek-r1:8b-llama # Llama 3 architecture
ollama pull deepseek-r1:70b-llama # Strongest distilled
Key point: Distilled models are MIT licensed — fully permissive for commercial use. The full R1 model uses the DeepSeek License. If you are building commercial applications, distilled R1 models eliminate licensing uncertainty.
Common R1 Mistakes
Mistake 1: Using R1 for everything R1’s thinking phase adds latency that is wasted on simple tasks. “What is photosynthesis?” does not need chain-of-thought. Use standard models for conversational and simple tasks; reach for R1 when the problem genuinely requires reasoning.
Mistake 2: Expecting instant responses R1 14B with Extended context generates the thinking trace before answering — on a complex problem, this can take 30–90 seconds before the final answer appears. This is not a malfunction. The thinking is happening.
Mistake 3: Not reading the thinking trace The thinking trace is not just scaffolding — it shows you exactly where the model is reasoning, where it catches errors, and where it is uncertain. Reading it is the most efficient way to identify if the final answer should be trusted.
Mistake 4: Small models for hard problems R1 7B has reasoning capability but limited knowledge base. For competitive mathematics or advanced technical problems, you need R1 32B or 70B. The thinking architecture alone does not compensate for limited model capacity.
Conclusion
DeepSeek-R1 is the first local model that genuinely earns the description “reasoning model” rather than just a fast response generator. On mathematics, logic puzzles, complex debugging, and systematic analysis — R1 with its visible chain-of-thought produces qualitatively better results than standard local models.
The 14B variant is the practical sweet spot: fits comfortably in 10GB VRAM, generates responses at usable speed, and delivers reasoning capability that was cloud-only twelve months ago.
Your next step: ollama pull deepseek-r1:14b. Then give it a math problem or logic puzzle you would normally need to work through carefully yourself. Watch the thinking trace. See what the reasoning looks like from the inside.
📚 Continue the Series:
- ← Previous Llama 4 Scout: Meta’s Best Local Model
- Next → Qwen3 and Kimi K2.6: Best Coding Models Locally
- For general use Llama 4 Scout Guide
- For building Building AI Apps With Ollama and Python
Last updated: May 2026. DeepSeek releases model updates periodically. Check ollama.com/library/deepseek-r1 for the latest versions.
⚠️ R1’s thinking phase generates many tokens before the final answer — this is expected behavior. Total response time is longer than standard models on complex tasks. Budget accordingly when using in applications.