
DeepSeek-R1 Locally: The Best Reasoning Model You Can Run Yourself



OpenAI’s reasoning models — GPT-5.5 Thinking, the o-series before it — represent one of the most significant capability advances in AI: models that think through problems step-by-step before answering, catching their own errors, exploring multiple approaches, and producing qualitatively better results on hard problems.

Until recently, this capability required a paid cloud subscription. DeepSeek-R1 changed that.

Released by Chinese AI lab DeepSeek in January 2025, R1 is a reasoning model: it generates a visible chain-of-thought thinking trace before producing its final answer, much as OpenAI’s thinking models reason before they respond. R1 runs locally via Ollama, meaning your math problems, complex analyses, and logical reasoning tasks can benefit from extended chain-of-thought with full privacy and no per-token cost.

R1’s performance is genuine. On MATH benchmarks, DeepSeek-R1 32B scores comparably to GPT-4 with reasoning. On AIME and competition mathematics, R1 70B approaches GPT-4o performance. On software engineering benchmarks (SWE-bench), R1 outperforms several reasoning model alternatives.

This guide covers every R1 variant, how to use the thinking trace effectively, the tasks where R1 genuinely excels, and where it is overkill.

🔗 This is Post #6 in the Ollama Unlocked series. For the general model guide including R1’s position in the model landscape, see The Local LLM Model Guide (Post #2). For coding models, see Qwen3 and Kimi K2.6 (Post #7).


How DeepSeek-R1 Works

The Chain-of-Thought Architecture

Standard language models begin writing the response immediately, so every token goes straight into the answer. R1 still generates tokens sequentially, but it splits its output into two phases:

  1. Thinking phase: The model generates a <think>...</think> block containing its internal reasoning — exploring approaches, checking logic, self-correcting
  2. Answer phase: After thinking completes, the model generates the final response based on its reasoning

The thinking trace is visible to you — you can watch R1 work through a problem, see where it corrects itself, and understand the reasoning path to the conclusion.
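To inspect the two phases programmatically, you can split raw output into its trace and its answer. The sketch below assumes the trace arrives inline between `<think>` and `</think>` tags; the helper name `split_reasoning` and the sample string are illustrative, not part of any library.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (thinking, answer) from raw model output."""
    match = THINK_BLOCK.search(text)
    if match is None:
        # No trace found: treat the whole output as the answer
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer

# Hypothetical output, shaped like an R1 response
raw = "<think>\n2 + 2 = 4. Check: 4 - 2 = 2.\n</think>\n\nThe answer is 4."
thinking, answer = split_reasoning(raw)
print("TRACE: ", thinking)
print("ANSWER:", answer)
```

Keeping the trace instead of discarding it lets you log it next to the answer and review it when a result looks wrong.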

This matters because:

  • The model can backtrack when it realizes an approach is wrong
  • Multi-step problems can be broken into verifiable sub-steps
  • Mathematical errors are more likely to be caught during thinking
  • You can identify exactly where reasoning fails if the answer is wrong

R1 vs. Standard Models: When It Matters

Chain-of-thought thinking adds latency: R1 is slower than a standard model for the same final answer length because it generates the thinking tokens first. This cost is worth it specifically when:

  • The task requires multi-step logical reasoning
  • Mathematical accuracy matters
  • The problem has multiple approaches and you need the best one
  • Errors in the chain of reasoning cascade to a wrong answer

It is NOT worth it for:

  • Simple factual questions
  • Writing and creative tasks
  • Conversational exchanges
  • Tasks where the reasoning is not complex

The R1 Size Variants

DeepSeek-R1 is available in multiple sizes. Choose based on your hardware:

# 7B — Entry level reasoning (6GB VRAM)
ollama pull deepseek-r1:7b

# 8B — Slightly better (6GB VRAM)
ollama pull deepseek-r1:8b

# 14B — Solid reasoning (10GB VRAM)
ollama pull deepseek-r1:14b

# 32B — Strong reasoning (20GB VRAM)
ollama pull deepseek-r1:32b

# 70B — Best local reasoning (40GB VRAM)
ollama pull deepseek-r1:70b

# 671B — Cloud-scale via Ollama cloud offloading
ollama pull deepseek-r1:671b-cloud

Performance vs. Hardware Trade-offs

| Variant | VRAM | Math Performance | Speed | Recommendation |
|---|---|---|---|---|
| 7B | 6 GB | Basic reasoning improvement | 25–40 t/s | Entry point only |
| 14B | 10 GB | Good, noticeable improvement | 15–25 t/s | Best value |
| 32B | 20 GB | Strong, competitive with paid models | 8–15 t/s | If hardware allows |
| 70B | 40 GB | Excellent, approaches GPT-4 level | 4–8 t/s | Workstation only |

The 14B recommendation: DeepSeek-R1 14B provides a genuine, noticeable improvement over standard 7B models on reasoning tasks while fitting comfortably in 10GB VRAM. It is the practical sweet spot for most users.
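If you script your setup, the VRAM figures above translate into a simple lookup. This is a minimal sketch: the thresholds are the approximate numbers from this guide, not measured requirements, and `pick_variant` is a hypothetical helper.

```python
from typing import Optional

# (tag, approximate VRAM in GB) from this guide, largest first
VARIANTS = [
    ("deepseek-r1:70b", 40),
    ("deepseek-r1:32b", 20),
    ("deepseek-r1:14b", 10),
    ("deepseek-r1:7b", 6),
]

def pick_variant(vram_gb: float) -> Optional[str]:
    """Return the largest variant whose VRAM figure fits, or None."""
    for tag, needed_gb in VARIANTS:
        if vram_gb >= needed_gb:
            return tag
    return None

print(pick_variant(12))  # deepseek-r1:14b
print(pick_variant(4))   # None
```

Leave yourself some headroom in practice: long thinking traces grow the context, and context uses memory too.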


Installation and Basic Usage

# Pull your preferred size
ollama pull deepseek-r1:14b

# Run interactively
ollama run deepseek-r1:14b

# Important: give it a reasoning problem, not a simple question
# Bad: "What is the capital of France?"
# Good: "A train travels 150km in 2 hours. It then travels 
#         another 100km. If it takes 30 minutes less for the 
#         second leg than the first, what is the average speed 
#         of the entire journey?"

What you see when running R1:

<think>
Let me work through this step by step.

First leg: 150km in 2 hours → 75 km/h
Time for first leg: 2 hours

For the second leg: 100km, takes 30 minutes less than first leg
Time for second leg: 2 - 0.5 = 1.5 hours
Speed for second leg: 100/1.5 = 66.67 km/h

Total distance: 150 + 100 = 250 km
Total time: 2 + 1.5 = 3.5 hours

Average speed = 250 / 3.5 = 71.43 km/h

Wait, let me re-read the problem. "30 minutes less for the second 
leg than the first" — the first leg took 2 hours, so second leg 
takes 1.5 hours. That's what I calculated. ✓
</think>

The average speed for the entire journey is **71.43 km/h**.

Working:
- First leg: 150km ÷ 2h = 75 km/h, time = 2 hours
- Second leg: 100km in 1.5 hours (30 min less than first leg)
- Total: 250km in 3.5 hours = **71.43 km/h**

The thinking trace shows the model’s work — including the self-check at the end.
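You can confirm the model's arithmetic independently in a few lines. The sketch below recomputes the train problem from the values in the prompt; `average_speed` is an illustrative helper.

```python
def average_speed(legs):
    """Average speed over (distance_km, time_h) legs: total distance / total time."""
    total_distance = sum(distance for distance, _ in legs)
    total_time = sum(time for _, time in legs)
    return total_distance / total_time

first_leg = (150, 2.0)         # 150 km in 2 hours
second_leg = (100, 2.0 - 0.5)  # 100 km, 30 minutes less than the first leg

result = average_speed([first_leg, second_leg])
print(round(result, 2))  # 71.43
```

Note that the answer is total distance over total time, not the mean of the two leg speeds (75 and 66.67 km/h), which is exactly the kind of shortcut a self-check should catch.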


Where DeepSeek-R1 Dramatically Outperforms Standard Models

Mathematics

This is R1’s clearest domain. Multi-step math problems where one error propagates through subsequent calculations — R1’s self-checking significantly reduces error rates.

Test it yourself by comparing deepseek-r1:14b and llama4:scout on:

A company has 3 factories. Factory A produces 200 units/day, 
Factory B produces 350 units/day, Factory C produces 150 units/day.

Production increases: A by 15%, B decreases by 10%, C increases by 25%.

If the company needs to produce 850 units/day total, by what 
percentage would they need to further increase combined production?

R1 will typically show its calculation steps and catch arithmetic errors. A standard model will often calculate confidently but make errors.
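When comparing models, it helps to have the correct answer in hand first. Here is a short sketch that computes it from the numbers in the prompt; the function name and data layout are illustrative.

```python
# units/day and fractional change for each factory, from the prompt
factories = {
    "A": (200, 0.15),
    "B": (350, -0.10),
    "C": (150, 0.25),
}

def required_increase(factories, target):
    """Return (new_total, percent increase needed to reach target)."""
    new_total = sum(units * (1 + change) for units, change in factories.values())
    percent = (target - new_total) / new_total * 100
    return new_total, percent

new_total, percent = required_increase(factories, target=850)
print(round(new_total, 1))  # 732.5
print(round(percent, 2))    # 16.04
```

A model that reports roughly 16% has done the work; one that divides the shortfall by 850 instead of 732.5 will land near 13.8% and has used the wrong base.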

Logic and Constraint Problems

Problems where multiple constraints must all be satisfied simultaneously:

Five people — Alice, Bob, Carlos, Diana, and Eve — 
need to be seated at a round table.

Constraints:
- Alice and Bob cannot sit adjacent
- Carlos and Diana cannot sit adjacent
- Eve must sit next to either Alice or Carlos
- Bob cannot sit next to Eve

Find a valid seating arrangement and verify it satisfies 
all constraints.

R1 systematically explores possibilities and verifies constraints — standard models often miss one constraint or give inconsistent answers.
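A brute-force search gives you a ground truth to check the model's answer against. This sketch fixes Alice's seat to remove rotations and reads the Carlos and Diana constraint as "not adjacent", because a five-seat round table has no directly opposite seats. That reading and the helper names are assumptions.

```python
from itertools import permutations

PEOPLE = ["Alice", "Bob", "Carlos", "Diana", "Eve"]

def adjacent(seating, a, b):
    """True if a and b are neighbours on a round table."""
    gap = abs(seating.index(a) - seating.index(b))
    return gap in (1, len(seating) - 1)

def is_valid(seating):
    return (
        not adjacent(seating, "Alice", "Bob")
        and not adjacent(seating, "Carlos", "Diana")  # assumed reading
        and (adjacent(seating, "Eve", "Alice") or adjacent(seating, "Eve", "Carlos"))
        and not adjacent(seating, "Bob", "Eve")
    )

def valid_seatings():
    """All valid clockwise orders with Alice fixed in the first seat."""
    candidates = (("Alice",) + rest for rest in permutations(PEOPLE[1:]))
    return [seating for seating in candidates if is_valid(seating)]

for seating in valid_seatings():
    print(" -> ".join(seating))
```

Any arrangement the model proposes can be passed to `is_valid` directly, which is quicker than re-checking four constraints by hand.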

Code Analysis and Debugging

For debugging complex code where the bug involves subtle logic:

This function is supposed to find the longest palindromic 
subsequence in a string, but it gives wrong answers for some inputs.

[paste code]

Find the bug. Think through the algorithm step by step before 
identifying what is wrong.

R1’s systematic approach to tracing execution paths catches errors that pattern-matching approaches miss.

Multi-Step Reasoning Chains

Questions where the answer requires building on intermediate conclusions:

A startup has these metrics:
- Monthly burn rate: $85,000
- Current ARR: $240,000  
- MoM ARR growth: 15%
- Current cash: $510,000

At current growth rate, when will monthly revenue exceed monthly 
burn rate, and how much cash will they have left at that point?
Assume burn rate stays constant.
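The prompt leaves some conventions open, so a reference calculation has to pick them. This sketch assumes monthly revenue is ARR divided by 12, growth compounds once per month, the burn rate is gross spending, and "cash left" means cash at the start of the first month in which revenue exceeds burn. The function name is illustrative.

```python
def breakeven(arr, growth, burn, cash, max_months=600):
    """Return (month, cash) when monthly revenue first exceeds burn.

    Returns (None, cash) if cash runs out or max_months passes first.
    """
    revenue = arr / 12  # assumption: monthly revenue = ARR / 12
    for month in range(1, max_months + 1):
        revenue *= 1 + growth
        if revenue > burn:
            return month, cash
        cash += revenue - burn  # shortfall is paid out of cash
        if cash < 0:
            return None, cash
    return None, cash

month, cash_left = breakeven(arr=240_000, growth=0.15, burn=85_000, cash=510_000)
print(month)             # 11
print(round(cash_left))  # 126986
```

If a model gives a different figure, check its trace for which convention it chose before deciding it made an error.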

Prompting R1 Effectively

Let It Think — Don’t Interrupt

The thinking phase produces better final answers. Do not add instructions that cut it short:

# Bad — cuts off reasoning
"Answer briefly: [complex math problem]"

# Good — lets R1 think fully
"[complex math problem]"
# Or explicitly invite thinking:
"Think through this carefully: [problem]"

Ask for Verification

R1 benefits from explicit verification requests:

After solving, verify your answer by checking it satisfies 
all the conditions in the problem.

This usually prompts an additional self-check pass in the thinking trace.

Format for Reasoning Tasks

Problem: [state the problem clearly with all given information]

Work through this step by step. Show your reasoning.
After reaching an answer, verify it against the original conditions.
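If you send many problems through the same template, a small builder keeps the wording consistent. This is a minimal sketch; `build_reasoning_prompt` and its `verify` flag are hypothetical names.

```python
def build_reasoning_prompt(problem: str, verify: bool = True) -> str:
    """Wrap a problem statement in the reasoning template from this guide."""
    parts = [
        f"Problem: {problem.strip()}",
        "",
        "Work through this step by step. Show your reasoning.",
    ]
    if verify:
        parts.append(
            "After reaching an answer, verify it against the original conditions."
        )
    return "\n".join(parts)

print(build_reasoning_prompt("If 3x + 7 = 25, what is x?"))
```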

When to Hide the Thinking Trace

In production applications using R1 via API, you may want to suppress the <think> tags from the final output:

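import re

import ollama

# Placeholder prompt; replace with your own reasoning problem
your_problem = "If 3x + 7 = 25, what is x? Verify your answer."

response = ollama.chat(
    model='deepseek-r1:14b',
    messages=[{'role': 'user', 'content': your_problem}]
)

# Remove the thinking trace, keep only the final answer.
# Assumes the trace arrives inline as <think>...</think> in the content.
# Newer Ollama releases may return it in a separate field instead, in
# which case this substitution simply leaves the content unchanged.
answer = re.sub(r'<think>.*?</think>', '',
                response['message']['content'],
                flags=re.DOTALL).strip()
print(answer)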

DeepSeek-R1 vs. GPT-5.5 Thinking: Honest Comparison

This is the comparison most people want:

GPT-5.5 Thinking (Extended) advantages:

  • Higher absolute performance ceiling — especially on competition math
  • Better natural language understanding in reasoning tasks
  • More reliable on very long reasoning chains
  • Integrated with web search for current information

DeepSeek-R1 advantages:

  • Free to run locally after initial setup
  • Complete privacy — reasoning process stays on your machine
  • No per-token cost — run as many reasoning tasks as needed
  • 671B parameter version available (via cloud offloading) for research-grade tasks
  • MIT licensed distilled versions (1.5B–70B) for commercial use

Honest performance gap: R1 32B and R1 70B are competitive with GPT-4-class reasoning on most practical tasks. They fall behind GPT-5.5 Thinking (Extended) on frontier mathematical olympiad problems and highly nuanced analysis. For everyday professional reasoning — business analysis, debugging, systematic problem-solving — the gap is smaller than the cost difference suggests.


R1 Distilled Variants

DeepSeek released distilled versions of R1: smaller models fine-tuned on reasoning data generated by R1. These use standard model architectures (Qwen, Llama) with R1’s reasoning capability transferred. Only the 671B model is the full R1; the 1.5B to 70B sizes are these distilled models.

# Tag names change between releases; confirm the exact tags at
# ollama.com/library/deepseek-r1 before pulling

# Distilled on Qwen2.5 architecture
ollama pull deepseek-r1:1.5b-qwen  # Tiny, surprisingly capable
ollama pull deepseek-r1:7b-qwen    # Good for constrained hardware
ollama pull deepseek-r1:14b-qwen   # Best distilled option

# Distilled on Llama architecture
ollama pull deepseek-r1:8b-llama   # Llama 3 architecture
ollama pull deepseek-r1:70b-llama  # Strongest distilled

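Key point: DeepSeek publishes the R1 weights, including the distilled models, under the MIT license, which permits commercial use. The distilled models also inherit terms from their base models: the Qwen-based variants derive from Apache 2.0 models, and the Llama-based variants remain subject to the Llama license. If you are building commercial applications, check the license listed on the model page for the exact variant you ship.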


Common R1 Mistakes

Mistake 1: Using R1 for everything
R1’s thinking phase adds latency that is wasted on simple tasks. “What is photosynthesis?” does not need chain-of-thought. Use standard models for conversational and simple tasks; reach for R1 when the problem genuinely requires reasoning.

Mistake 2: Expecting instant responses
R1 14B generates its full thinking trace before answering. On a complex problem, 30–90 seconds can pass before the final answer appears. This is not a malfunction; the model is still thinking.

Mistake 3: Not reading the thinking trace
The thinking trace is not just scaffolding. It shows you how the model reasons, where it catches errors, and where it is uncertain. Reading it is the most efficient way to judge whether the final answer should be trusted.

Mistake 4: Small models for hard problems
R1 7B can reason but has a limited knowledge base. For competition mathematics or advanced technical problems, you need R1 32B or 70B. The thinking process alone does not compensate for limited model capacity.


Conclusion

DeepSeek-R1 is the first local model that genuinely earns the description “reasoning model” rather than just a fast response generator. On mathematics, logic puzzles, complex debugging, and systematic analysis — R1 with its visible chain-of-thought produces qualitatively better results than standard local models.

The 14B variant is the practical sweet spot: it fits comfortably in 10GB VRAM, generates responses at usable speed, and delivers reasoning capability that previously required a cloud subscription.

Your next step: ollama pull deepseek-r1:14b. Then give it a math problem or logic puzzle you would normally need to work through carefully yourself. Watch the thinking trace. See what the reasoning looks like from the inside.




Last updated: May 2026. DeepSeek releases model updates periodically. Check ollama.com/library/deepseek-r1 for the latest versions.

⚠️ R1’s thinking phase generates many tokens before the final answer — this is expected behavior. Total response time is longer than standard models on complex tasks. Budget accordingly when using in applications.

Frequently Asked Questions (FAQ)

Is DeepSeek-R1 safe to run locally given it is from a Chinese lab?
For local Ollama inference, the model weights run entirely on your hardware. No data leaves your machine during inference. The weights are published openly, so anyone can inspect and test them. The "Chinese lab" concern relates to cloud services sending data to Chinese servers, which is irrelevant when running the weights locally.
What is the difference between R1 and R1-Zero?
R1-Zero was trained purely via reinforcement learning, with no supervised fine-tuning stage. It was impressive but sometimes produced messy or inconsistent reasoning traces. R1 adds supervised fine-tuning to the training pipeline, producing cleaner and more reliable reasoning. R1 is the recommended version.
When should I use R1 over Llama 4 Scout?
Use R1 when the task is primarily a reasoning, mathematics, or logic problem. Use Scout for writing, general conversation, and document analysis. Many users keep both: Scout as the daily driver, R1 for the problems that require careful reasoning.
Can I use R1 in production applications?
Yes, via the Ollama API. DeepSeek releases the R1 weights under the MIT license, which permits commercial use; the Llama-based distilled variants also carry the Llama license terms. Review the license listed on the model page for the exact variant you deploy.
