Every model in the Ollama library is a general-purpose foundation. Llama 4 Scout can write emails, debug code, analyze legal documents, answer history questions, and draft marketing copy. It handles all of these because it was trained to be generally capable.
But “generally capable” is not the same as “optimized for your specific task.” A customer support bot that knows your actual product and enforces your specific response style. A code reviewer that checks for your team’s exact coding standards. A writing assistant calibrated to your brand voice. These require customization — and Modelfiles are how you do it.
A Modelfile is a plain text configuration file that tells Ollama how to build a custom model: which base model to use, what system prompt to apply, and what generation parameters to set. You choose the model's name when you build it. Building a custom model from a Modelfile takes one command and a few seconds.
🔗 This is Post #17 in the Ollama Unlocked series. Custom models created in this guide can be used with Continue.dev (covered in Ollama for Developers, Post #13) and served via the Ollama API (Post #9).
Modelfile Basics
Structure
# Modelfile: syntax similar to a Dockerfile
# Required: which model to build on
FROM base_model
# Optional: generation parameters
PARAMETER key value
# Optional: default system prompt
SYSTEM """system prompt"""
# Optional: custom prompt template
TEMPLATE """template"""
# Optional: few-shot examples
MESSAGE role content
Creating Your First Custom Model
# 1. Create a Modelfile
cat > MyAssistant.Modelfile << 'EOF'
FROM llama4:scout
SYSTEM """You are a helpful assistant for a software company.
Be direct, technical, and concise.
When writing code, always include error handling."""
PARAMETER temperature 0.3
PARAMETER num_ctx 16384
EOF
# 2. Build the custom model
ollama create MyAssistant -f MyAssistant.Modelfile
# 3. Use it exactly like any other model
ollama run MyAssistant
The custom model now appears in ollama list alongside standard models.
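If you script your builds, you can run the check in the same step. The sketch below is a hypothetical helper, build_and_verify, not an Ollama command. It assumes ollama list prints a header row followed by one model per line with NAME:TAG in the first column, and it matches the name case-insensitively in case the listing shows it in lowercase.

```bash
# Hypothetical helper (build_and_verify is not part of Ollama): build a model
# from a Modelfile, then confirm the name shows up in `ollama list`.
# Assumes `ollama list` prints a header row with NAME:TAG in the first column.
build_and_verify() {
  model_name=$1
  modelfile=$2

  ollama create "$model_name" -f "$modelfile" || return 1

  # Skip the header, keep the NAME column, and accept "name" or "name:latest",
  # ignoring case.
  ollama list | awk 'NR > 1 { print $1 }' | grep -Eiq "^${model_name}(:latest)?$"
}
```

For example, build_and_verify MyAssistant MyAssistant.Modelfile && echo "ready" returns a non-zero status if either the build or the lookup fails.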
Core Modelfile Instructions
FROM (Required)
Specifies the base model. This is the foundation — all your customizations layer on top.
# Use a model from the Ollama library
FROM llama4:scout
FROM qwen3.6:27b
FROM deepseek-r1:14b
# Use a specific quantization (tag names vary by model; check the
# model's Tags list on ollama.com for the exact name)
FROM llama4:scout-q8_0
# Use a local GGUF file (for models not in the Ollama library)
FROM /path/to/model.gguf
# Build on an existing custom model
FROM MyPreviousCustomModel
PARAMETER
Controls generation behavior. These override the model’s defaults:
# Temperature: randomness/creativity (0.0 = deterministic, 2.0 = very random)
# For code and structured tasks: 0.0–0.2
# For general use: 0.5–0.7
# For creative writing: 0.8–1.2
PARAMETER temperature 0.1
# Context window size (tokens)
# Larger = more memory for long documents, more VRAM required
PARAMETER num_ctx 32768
# Maximum tokens to generate in response
# Set lower for concise tools, higher for long-form writing
PARAMETER num_predict 2048
# Stop sequences — model stops generating when it hits these strings
PARAMETER stop "Human:"
PARAMETER stop "User:"
PARAMETER stop "</answer>"
# Top-P nucleus sampling (0.1 = very focused, 0.95 = diverse)
PARAMETER top_p 0.9
# Top-K sampling: only the K most likely next tokens are considered
PARAMETER top_k 40
# Penalty for repeating recent tokens (1.0 = no penalty, 1.2 = moderate)
PARAMETER repeat_penalty 1.1
# Number of tokens to look back for repeat penalty
PARAMETER repeat_last_n 64
# Thread count for CPU inference
PARAMETER num_thread 8
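# Keeping a model loaded in memory is not a Modelfile PARAMETER: keep_alive is
# not one of the Modelfile parameters. Set it per request with the API's
# keep_alive field, or for the whole server with the OLLAMA_KEEP_ALIVE
# environment variable (-1 = forever, 0 = unload immediately, "30m" = 30 minutes).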
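A misspelled parameter name such as temprature is easy to miss, so a small check can catch it before you run ollama create, for example in CI on a machine without Ollama installed. This is a sketch with a hypothetical function name, lint_parameters. The allow-list is an assumption drawn from the parameter table in Ollama's Modelfile documentation plus num_thread, which this article uses; compare it with the docs for your Ollama version before relying on it.

```bash
# Hypothetical helper (not part of Ollama): print any PARAMETER names that are
# not on an allow-list. The list is an assumption based on Ollama's Modelfile
# docs; verify it against the documentation for the version you run.
KNOWN_PARAMS="mirostat mirostat_eta mirostat_tau num_ctx num_predict num_thread
repeat_last_n repeat_penalty seed stop temperature top_k top_p min_p"

lint_parameters() {
  modelfile=$1
  status=0
  # Instructions are case-insensitive; field 2 of a PARAMETER line is the name.
  for name in $(awk 'toupper($1) == "PARAMETER" { print $2 }' "$modelfile"); do
    case " $(echo $KNOWN_PARAMS) " in
      *" $name "*) ;;                                   # known name: fine
      *) echo "unknown parameter: $name"; status=1 ;;   # report and keep going
    esac
  done
  return $status
}
```

Running lint_parameters MyAssistant.Modelfile prints one line per unrecognized name and returns a non-zero status if it found any.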
SYSTEM
The system prompt — sets the model’s role, personality, knowledge, and constraints:
SYSTEM """You are a [ROLE] for [CONTEXT].
[KNOWLEDGE/CONSTRAINTS]
[BEHAVIOR RULES]
[OUTPUT FORMAT REQUIREMENTS]"""
System prompt best practices:
- Be specific about what the model should and should not do
- Include examples of ideal response style
- State the output format explicitly
- Add “NEVER do X” rules for important constraints
- Keep it under about 1,500 words; very long system prompts reduce the context available for the actual conversation
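To keep an eye on length as a system prompt grows, you can count its words before each rebuild. The sketch below uses two hypothetical helpers, system_prompt_words and check_system_prompt. It assumes the prompt is written either as SYSTEM "..." on one line or as a SYSTEM """...""" block, and it treats word count as a rough proxy for tokens. The default limit of 1,500 words follows this article's Common Modelfile Mistakes section.

```bash
# Hypothetical helpers (not part of Ollama): count the words in a Modelfile's
# SYSTEM prompt and warn when it is over a limit. Word count is only a rough
# stand-in for tokens, which is what actually consumes num_ctx.
system_prompt_words() {
  awk '
    function emit(s) { gsub(/"""/, " ", s); print s }
    # Inside a multi-line """ block: print lines until the closing """.
    capturing { emit($0); if (index($0, "\"\"\"")) capturing = 0; next }
    toupper($1) == "SYSTEM" {
      line = $0
      sub(/^[ \t]*[A-Za-z]+[ \t]*/, "", line)   # drop the SYSTEM keyword
      # An opening """ with no closing """ on the same line starts a block.
      if (substr(line, 1, 3) == "\"\"\"" && index(substr(line, 4), "\"\"\"") == 0) capturing = 1
      emit(line)
    }
  ' "$1" | wc -w | tr -d ' '
}

check_system_prompt() {
  limit=${2:-1500}   # default from this article's "Common Modelfile Mistakes"
  words=$(system_prompt_words "$1")
  if [ "$words" -gt "$limit" ]; then
    echo "SYSTEM prompt is $words words (limit: $limit)"
    return 1
  fi
}
```

check_system_prompt MyAssistant.Modelfile prints a warning and returns a non-zero status when the prompt exceeds the limit; pass a second argument to use a different limit.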
TEMPLATE
The prompt template defines how user messages and system prompts are formatted before being sent to the model. In most cases, you do not need to set this — Ollama uses the correct template for each model automatically.
Set this only when using a raw GGUF file that does not include template information:
FROM /path/to/custom-model.gguf
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
{{ end }}{{ .Response }}<|end|>"""
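When you do need a TEMPLATE, adapting the template of a model that uses the same chat format is usually easier than writing one from scratch. ollama show --template prints a model's template; save_template below is a hypothetical wrapper that writes that output to a file and fails if it comes back empty.

```bash
# Hypothetical helper (not part of Ollama): save a model's prompt template to a
# file as a starting point for a custom TEMPLATE. By default the file is named
# after the model, with "/" and ":" replaced by "_".
save_template() {
  model=$1
  if [ -n "$2" ]; then
    out=$2
  else
    out=$(printf '%s' "$model" | tr '/:' '__').template
  fi
  ollama show --template "$model" > "$out" || return 1
  [ -s "$out" ]   # fail if the model reported an empty template
}
```

For example, save_template llama4:scout writes llama4_scout.template in the current directory.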
MESSAGE
Add few-shot examples to the model’s context — conversations that demonstrate the desired behavior:
FROM llama4:scout
SYSTEM "You are a SQL expert."
# Show the model what good answers look like
MESSAGE user "How do I find duplicate emails in a users table?"
MESSAGE assistant """SELECT email, COUNT(*) as count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY count DESC;
This returns all emails that appear more than once, sorted by frequency."""
MESSAGE user "How do I add an index to speed up that query?"
MESSAGE assistant """CREATE INDEX idx_users_email ON users(email);
This creates a B-tree index on the email column. For the duplicate-finding
query, the index speeds up the GROUP BY operation significantly on large tables."""
Few-shot examples in the Modelfile are “baked in” — they appear in every conversation before the user’s first message, establishing the response pattern.
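If you already have good question-and-answer pairs, you can generate MESSAGE lines instead of typing them. tsv_to_messages is a hypothetical helper that assumes a tab-separated file with the question in the first column and the answer in the second, where the two characters \n mark a line break inside an answer.

```bash
# Hypothetical helper (not part of Ollama): turn a tab-separated file of
# question<TAB>answer pairs into MESSAGE lines you can paste into a Modelfile.
# A literal \n in an answer becomes a real line break inside the """ block.
# Assumes neither field contains """ or a tab.
tsv_to_messages() {
  awk 'BEGIN { FS = "\t" }
    NF >= 2 {
      answer = $2
      gsub(/\\n/, "\n", answer)
      printf "MESSAGE user \"\"\"%s\"\"\"\n", $1
      printf "MESSAGE assistant \"\"\"%s\"\"\"\n", answer
    }' "$1"
}
```

With placeholder file names, tsv_to_messages examples.tsv >> SQLExpert.Modelfile appends the generated examples after the SYSTEM prompt.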
Complete Modelfile Templates
General Professional Assistant
FROM llama4:scout
SYSTEM """You are a professional assistant.
COMMUNICATION STYLE:
- Direct and clear — no filler phrases or padding
- Structured responses for complex topics (use headers and lists when helpful)
- Concise for simple questions — one or two paragraphs maximum
- Ask for clarification when the request is genuinely ambiguous
CAPABILITIES:
- Writing, editing, and proofreading
- Research synthesis and summarization
- Analysis and problem solving
- Planning and organization
LIMITS:
- For medical, legal, and financial questions: provide general information
and note that professional consultation is recommended
- Do not fabricate specific facts, statistics, or citations"""
PARAMETER temperature 0.5
PARAMETER num_ctx 32768
PARAMETER num_predict 2048
Senior Code Reviewer
FROM qwen3.6:27b
SYSTEM """You are a senior software engineer conducting code reviews.
REVIEW FRAMEWORK:
Grade every issue by severity:
[Critical] — Security vulnerability, data corruption risk, incorrect logic in critical path
[High] — Performance under load, important missing error handling, breaking change
[Medium] — Maintainability problem, suboptimal pattern, missing tests
[Low] — Minor improvement opportunity
[Style] — Formatting/naming only if significantly inconsistent
FOR EACH ISSUE:
1. File and approximate line reference
2. What is wrong (specific, not vague)
3. Why it matters (impact, not just "bad practice")
4. Specific fix (code when helpful)
ALSO PROVIDE:
- 2-3 specific strengths (what was done well)
- Overall rating 1-10 with one-sentence justification
TONE: Mentor, not critic. The goal is improvement, not judgment.
NEVER: Say "good job" without being specific about what was good."""
PARAMETER temperature 0.0
PARAMETER num_ctx 32768
PARAMETER num_predict 4096
Technical Documentation Writer
FROM llama4:scout
SYSTEM """You write technical documentation for software products.
DOCUMENTATION STANDARDS:
- Start with what the feature does (not what it is)
- Second paragraph: when to use it and when not to
- Code examples for every non-trivial concept
- Code examples are complete and runnable, not fragments
- Parameter tables for any function with 3+ parameters
STYLE:
- Active voice: "The function returns X" not "X is returned by the function"
- Present tense: "The API accepts" not "The API will accept"
- Concrete examples over abstract explanations
- Step-by-step for procedures, always numbered
NEVER:
- Use jargon without defining it on first use
- Write paragraphs over 4 sentences
- Skip error handling in code examples
- Use "simply" or "just" — if it were simple, you wouldn't need documentation"""
PARAMETER temperature 0.3
PARAMETER num_ctx 16384
PARAMETER num_predict 3000
Customer Support Bot
FROM llama4:scout
SYSTEM """You are the customer support AI for [Company Name].
PRODUCT INFORMATION:
[Product name]: [Brief description]
Key features: [List main features]
Pricing: [Pricing tiers]
Support email: support@company.com
Response time: 24 hours for email, live chat available 9-5 EST
COMMON ISSUES AND RESOLUTIONS:
Issue 1: [Description] → Resolution: [Steps]
Issue 2: [Description] → Resolution: [Steps]
Issue 3: [Description] → Resolution: [Steps]
ESCALATION RULES:
- Billing disputes → Collect details, send to billing@company.com
- Data loss concerns → Immediate escalation to engineering team
- Enterprise client issues → Notify account manager
RESPONSE GUIDELINES:
- Acknowledge the issue first ("I understand this is frustrating")
- Provide the solution or next steps
- Under 150 words unless the fix requires detailed steps
- End with a specific next step or offer
STRICT RULES:
- Never make up product features or capabilities
- Never promise timelines you cannot guarantee
- Never share other customers' information
- If unsure: "I want to make sure I give you accurate information —
let me connect you with a specialist at [support email]" """
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
PARAMETER num_predict 500
Research Analyst
FROM llama4:scout
SYSTEM """You are a research analyst providing analytical support.
ANALYTICAL FRAMEWORK:
For any question requiring analysis:
1. State your key finding in the first sentence
2. Provide 3-5 supporting points with evidence
3. Note the strongest counterargument or limitation
4. Conclude with specific implication or recommendation
INTELLECTUAL STANDARDS:
- Distinguish clearly between facts, expert consensus, and your analysis
- State confidence level when uncertain: "The evidence strongly suggests..."
vs "It appears that..." vs "One interpretation is..."
- Never present contested claims as settled facts
- Cite the type of source even without a specific URL
("According to academic research in this area..." rather than fabricating citations)
SCOPE:
- Analyze information provided to you
- For current events after your training: flag uncertainty explicitly
- For quantitative analysis: show calculations, not just results
TONE: Think-tank analyst briefing a senior executive.
Intelligent but accessible. Evidence-based but readable."""
PARAMETER temperature 0.3
PARAMETER num_ctx 32768
PARAMETER num_predict 3000
Minimalist (Speed-Optimized)
For cases where you want fast responses and extreme conciseness:
FROM qwen3:8b
SYSTEM """Be extremely concise.
- Answer in 1-3 sentences when possible
- Use bullet points for multiple items
- No preamble, no summary, no "I hope this helps"
- Just the answer"""
PARAMETER temperature 0.1
PARAMETER num_ctx 4096
PARAMETER num_predict 300
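# keep_alive is not a Modelfile PARAMETER; to keep this model loaded between
# requests, send keep_alive -1 with each API request or set OLLAMA_KEEP_ALIVE=-1.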
Building on Existing Custom Models
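You can layer customizations by building a specialized model on top of another custom model. Parameters you do not set again are inherited from the base, but a new SYSTEM prompt replaces the inherited one instead of adding to it, so restate anything from the base prompt that still matters: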
# Base: a general coding assistant
# CodingBase.Modelfile
FROM qwen3.6:27b
PARAMETER temperature 0.1
PARAMETER num_ctx 32768
SYSTEM "You are a professional software engineer..."
# Build the base first:
ollama create CodingBase -f CodingBase.Modelfile
# SecurityReviewer.Modelfile: a security-focused reviewer built ON TOP of CodingBase
FROM CodingBase
SYSTEM """You are a security-focused code reviewer specializing in
web application security.
In addition to standard code review, always check for:
- SQL injection vulnerabilities
- XSS (Cross-Site Scripting) attack vectors
- Authentication bypass possibilities
- Insecure direct object references (IDOR)
- CSRF vulnerabilities
- Sensitive data exposure in logs or responses
- Use of deprecated/insecure cryptographic functions
ALWAYS include a [Security Summary] section in every review."""
# Then build the derived model:
ollama create SecurityReviewer -f SecurityReviewer.Modelfile
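Since a derived model's FROM must already exist when you build it, the order of the ollama create calls matters. build_chain is a hypothetical helper that builds name=file pairs in the order given and stops at the first failure, so a broken base never leaves you with a stale derived model.

```bash
# Hypothetical helper (not part of Ollama): build each "name=modelfile" pair in
# the order given. List bases before the models built on them; the loop stops
# at the first failed build.
build_chain() {
  for pair in "$@"; do
    name=${pair%%=*}    # text before the first "="
    file=${pair#*=}     # text after the first "="
    echo "building $name from $file"
    ollama create "$name" -f "$file" || return 1
  done
}
```

For the example above: build_chain CodingBase=CodingBase.Modelfile SecurityReviewer=SecurityReviewer.Modelfile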
Sharing and Distributing Custom Models
For Teams (Modelfile Distribution)
The most practical approach for teams — share the Modelfile, not the model:
# Anyone on the team can recreate the model with:
ollama create CompanyAssistant -f CompanyAssistant.Modelfile
# Check into version control
git add CompanyAssistant.Modelfile
git commit -m "Add company assistant configuration"
This is better than distributing model files (which are multi-GB) because:
- The Modelfile is tiny (a few KB)
- Team members usually already have the base model pulled
- Updates are one Modelfile commit plus an ollama create re-run
For Distribution via Ollama Registry
Push to Ollama’s public registry:
# Requires an ollama.com account with this machine's Ollama public key
# (id_ed25519.pub in your .ollama directory) added in your account settings
# Tag the model with your username
ollama cp MyAssistant myusername/myassistant
# Push to the registry
ollama push myusername/myassistant
# Others can pull it
ollama pull myusername/myassistant
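The copy-and-push pair above can be wrapped so the namespace is checked before anything is uploaded. publish_model is a hypothetical helper, and it assumes your ollama.com account is already set up to accept pushes from this machine.

```bash
# Hypothetical helper (not part of Ollama): copy a local model to a
# username/model name and push it to the registry.
publish_model() {
  local_name=$1
  remote_name=$2            # e.g. myusername/myassistant
  case "$remote_name" in
    */*) ;;                 # has a namespace: OK
    *) echo "remote name must look like username/model" >&2; return 1 ;;
  esac
  ollama cp "$local_name" "$remote_name" && ollama push "$remote_name"
}
```

For example, publish_model MyAssistant myusername/myassistant runs the same two commands shown above.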
Iterating on System Prompts
The fastest workflow for improving a custom model:
# Edit Modelfile
nano MyAssistant.Modelfile
# Rebuild (fast: the base model's layers are reused, so nothing is re-downloaded)
ollama create MyAssistant -f MyAssistant.Modelfile
# Test immediately
ollama run MyAssistant "Test prompt here"
The rebuild time for changing only SYSTEM or PARAMETER is under 5 seconds — iterate quickly.
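To shorten the loop further, the rebuild and a few spot-check prompts can run as one command. rebuild_and_try is a hypothetical helper; it relies only on ollama create and on ollama run accepting a prompt argument, as in the commands above.

```bash
# Hypothetical helper (not part of Ollama): rebuild the model, then send each
# prompt given on the command line and print the reply under a header.
rebuild_and_try() {
  name=$1
  modelfile=$2
  shift 2
  ollama create "$name" -f "$modelfile" > /dev/null || return 1
  for prompt in "$@"; do
    printf '=== %s\n' "$prompt"
    ollama run "$name" "$prompt" || return 1
  done
}
```

For example: rebuild_and_try MyAssistant MyAssistant.Modelfile "Review this function for bugs" "What can you help me with?"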
Testing Methodology
Before deploying a custom model:
- Test your 10 most common use cases
- Test 3 edge cases where you expect it might fail
- Verify it correctly declines out-of-scope requests
- Test that stop sequences work as expected
- Confirm tone and format match requirements
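The checklist above is easier to repeat if the test prompts live in a file. run_prompt_suite is a hypothetical helper that assumes one prompt per line (blank lines and lines starting with # are skipped) and saves each reply to its own numbered file, so replies from two Modelfile versions can be compared side by side.

```bash
# Hypothetical helper (not part of Ollama): run every prompt in a text file
# and save each reply, prefixed by its prompt, to outdir/001.txt, 002.txt, ...
run_prompt_suite() {
  model=$1
  prompts=$2
  outdir=${3:-results-$model}
  mkdir -p "$outdir" || return 1
  n=0
  # "|| [ -n ... ]" also handles a last line with no trailing newline.
  while IFS= read -r prompt || [ -n "$prompt" ]; do
    case "$prompt" in ''|'#'*) continue ;; esac
    n=$((n + 1))
    {
      printf 'PROMPT: %s\n\n' "$prompt"
      # </dev/null keeps ollama from consuming the prompt file on stdin.
      ollama run "$model" "$prompt" < /dev/null
    } > "$outdir/$(printf '%03d' "$n").txt"
  done < "$prompts"
  echo "$n prompts run; replies in $outdir"
}
```

Running run_prompt_suite MyAssistant prompts.txt results-v2 and then diff -r results-v1 results-v2 shows what changed between two versions; expect some noise in the diff unless temperature is close to 0.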
Common Modelfile Mistakes
Mistake 1: Vague system prompts
“Be helpful and professional” is so generic it changes nothing. Be specific: “When reviewing code, always grade issues by severity using [Critical/High/Medium/Low] tags and provide specific line references.”
Mistake 2: System prompt too long
A 5,000-word system prompt can fill most of a smaller context window such as num_ctx 8192, leaving little room for the actual conversation. Keep system prompts under 1,500 words for most use cases. Prioritize the most important behavioral rules; trim the rest.
Mistake 3: Wrong temperature for the task
Code review at temperature 0.8 produces inconsistent, unpredictable output. Creative writing at temperature 0.0 produces formulaic, repetitive text. Match temperature to task type: low for deterministic tasks, higher for creative ones.
Mistake 4: Not testing before deployment
Custom models can behave unexpectedly at edge cases. Always test with real inputs from your actual use case before giving the model to users.
Mistake 5: Not keeping the Modelfile in version control
If you build a custom model and do not save the Modelfile, you cannot easily recreate it if the model is deleted or needs to run on a different machine.
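If a model already exists but its Modelfile was never saved, ollama show --modelfile prints a Modelfile for it that you can use as a starting point. export_modelfiles is a hypothetical helper that does this for every model in ollama list, assuming the model name is the first column after a header row. The exported FROM line may point at a local blob file rather than the library name you originally used, so review each file before committing it.

```bash
# Hypothetical recovery helper (not part of Ollama): save `ollama show
# --modelfile` output for every installed model. Review each file's FROM line
# before committing; it may reference a local blob instead of a library name.
export_modelfiles() {
  outdir=${1:-exported-modelfiles}
  mkdir -p "$outdir" || return 1
  ollama list | awk 'NR > 1 { print $1 }' | while IFS= read -r model; do
    safe=$(printf '%s' "$model" | tr '/:' '__')   # usable as a file name
    ollama show --modelfile "$model" > "$outdir/$safe.Modelfile" < /dev/null
  done
}
```

export_modelfiles recovered-modelfiles writes one file per model, for example myassistant_latest.Modelfile.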
Conclusion
Modelfiles transform generic local models into specialized tools built precisely for your workflow. The investment is small (a well-written system prompt and an ollama create command), and the result is a model that consistently behaves the way your specific use case requires.
The most impactful Modelfiles are the ones that encode real expertise: the specific code review standards your team actually enforces, the actual product information your support bot needs to know, the precise tone your brand guidelines specify. Generic system prompts produce generic improvements; specific, expert-encoded prompts produce tools that feel purpose-built.
Your next step: Pick the task you use local AI for most frequently. Write a Modelfile that encodes exactly how that task should be handled — the format, the tone, the rules, the constraints. Build it with ollama create. Test it with 10 real examples. Notice how much better a purpose-built model is than the generic base.
📚 Continue the Series:
- ← Previous Local LLMs vs Cloud AI: The Honest 2026 Comparison
- Next → AI Agents With Ollama: n8n, LangChain, and Autonomous Local Workflows
- Using these in production: Building AI Apps With Ollama and Python
- Team deployment: Ollama for Business
Last updated: June 2026. Modelfile syntax is documented at github.com/ollama/ollama/blob/main/docs/modelfile.mdx.