The Modelfile: Customize, Brand, and Build With Any Local Model

Every model in the Ollama library is a general-purpose foundation. Llama 4 Scout can write emails, debug code, analyze legal documents, answer history questions, and draft marketing copy. It is capable at all of these because it was trained to be generally capable.

But “generally capable” is not the same as “optimized for your specific task.” A customer support bot that knows your actual product and enforces your specific response style. A code reviewer that checks for your team’s exact coding standards. A writing assistant calibrated to your brand voice. These require customization — and Modelfiles are how you do it.

A Modelfile is a plain text configuration file that tells Ollama how to build a custom model: which base model to use, what system prompt to apply, what generation parameters to set, and what the model should be called. Building a custom model from a Modelfile takes one command and a few seconds.

🔗 This is Post #17 in the Ollama Unlocked series. Custom models created in this guide can be used with Continue.dev (covered in Ollama for Developers, Post #13) and served via the Ollama API (Post #9).

Modelfile Basics

Structure

# Modelfile — similar syntax to Dockerfile
FROM base_model           # Required: which model to build on
PARAMETER key value       # Optional: generation parameters
SYSTEM """system prompt"""  # Optional: default system prompt
TEMPLATE """template"""   # Optional: custom prompt template
MESSAGE role content      # Optional: few-shot examples

Creating Your First Custom Model

# 1. Create a Modelfile
cat > MyAssistant.Modelfile << 'EOF'
FROM llama4:scout

SYSTEM """You are a helpful assistant for a software company.
Be direct, technical, and concise.
When writing code, always include error handling."""

PARAMETER temperature 0.3
PARAMETER num_ctx 16384
EOF

# 2. Build the custom model
ollama create MyAssistant -f MyAssistant.Modelfile

# 3. Use it exactly like any other model
ollama run MyAssistant

The custom model now appears in ollama list alongside standard models.
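
If you generate Modelfiles from a script rather than writing them by hand, the structure above is simple enough to assemble as text. The following is a minimal sketch, not an official Ollama tool; the render_modelfile helper and its arguments are hypothetical names chosen for illustration.

```python
from typing import Optional


def render_modelfile(base: str, system: str = "", parameters: Optional[dict] = None) -> str:
    """Return Modelfile text: FROM first, then SYSTEM, then one PARAMETER line per setting."""
    lines = [f"FROM {base}"]
    if system:
        # Triple quotes let the system prompt span several lines.
        lines.append(f'SYSTEM """{system}"""')
    for key, value in (parameters or {}).items():
        lines.append(f"PARAMETER {key} {value}")
    return "\n".join(lines) + "\n"


# Reproduces the shape of MyAssistant.Modelfile from the steps above.
print(render_modelfile(
    "llama4:scout",
    "Be direct, technical, and concise.",
    {"temperature": 0.3, "num_ctx": 16384},
))
```

Write the returned text to a file and pass it to ollama create with -f, exactly as in step 2.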

Every Modelfile Instruction

FROM (Required)

Specifies the base model. This is the foundation — all your customizations layer on top.

# Use a model from the Ollama library
FROM llama4:scout
FROM qwen3.6:27b
FROM deepseek-r1:14b

# Use a specific quantization
FROM llama4:scout-q8_0

# Use a local GGUF file (for models not in the Ollama library)
FROM /path/to/model.gguf

# Build on an existing custom model
FROM MyPreviousCustomModel

PARAMETER

Controls generation behavior. These override the model’s defaults:

# Temperature: randomness/creativity (0.0 = deterministic, 2.0 = very random)
# For code and structured tasks: 0.0–0.2
# For general use: 0.5–0.7
# For creative writing: 0.8–1.2
PARAMETER temperature 0.1

# Context window size (tokens)
# Larger = more memory for long documents, more VRAM required
PARAMETER num_ctx 32768

# Maximum tokens to generate in response
# Set lower for concise tools, higher for long-form writing
PARAMETER num_predict 2048

# Stop sequences — model stops generating when it hits these strings
PARAMETER stop "Human:"
PARAMETER stop "User:"
PARAMETER stop "</answer>"

# Top-P nucleus sampling (0.1 = very focused, 0.95 = diverse)
PARAMETER top_p 0.9

# Top-K sampling — limits vocabulary to top K tokens
PARAMETER top_k 40

# Penalty for repeating recent tokens (1.0 = no penalty, 1.2 = moderate)
PARAMETER repeat_penalty 1.1

# Number of tokens to look back for repeat penalty
PARAMETER repeat_last_n 64

# Thread count for CPU inference
PARAMETER num_thread 8

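# Note: keep_alive is not a Modelfile PARAMETER.
# To control how long a model stays loaded (-1 = forever, 0 = unload immediately,
# "30m" = 30 minutes), pass keep_alive in the API request or set the
# OLLAMA_KEEP_ALIVE environment variable on the server.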
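
Values set with PARAMETER act as defaults for the custom model, and a single API request can override them. Here is a minimal sketch of such a request, assuming a local Ollama server on its default port (11434) and the MyAssistant model built earlier. The helper names are illustrative; the options and keep_alive fields belong to the request body of Ollama's /api/generate endpoint.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, options=None, keep_alive=None):
    """Build the JSON body for a non-streaming generate call."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        # Per-request overrides of Modelfile PARAMETER defaults.
        payload["options"] = dict(options)
    if keep_alive is not None:
        # How long the model stays loaded after this request, e.g. "30m" or -1.
        payload["keep_alive"] = keep_alive
    return payload


def send(payload):
    """POST the payload to the local server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

For example, send(build_generate_request("MyAssistant", "Explain mutexes", {"temperature": 0.0}, keep_alive="30m")) would run one deterministic-leaning request without rebuilding the model.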

SYSTEM

The system prompt — sets the model’s role, personality, knowledge, and constraints:

SYSTEM """You are a [ROLE] for [CONTEXT].

[KNOWLEDGE/CONSTRAINTS]

[BEHAVIOR RULES]

[OUTPUT FORMAT REQUIREMENTS]"""

System prompt best practices:

Be specific about what the model should and should not do
Include examples of ideal response style
State the output format explicitly
Add “NEVER do X” rules for important constraints
Keep it under 1,500 words; very long system prompts reduce the context available for the actual conversation
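
A length budget is easy to enforce before you build. The sketch below counts words only, which is a rough stand-in for tokens; check_system_prompt is a hypothetical helper, and the limit is whatever budget you decide on.

```python
def check_system_prompt(prompt: str, max_words: int) -> dict:
    """Report the word count of a system prompt against a chosen budget."""
    word_count = len(prompt.split())
    return {
        "words": word_count,
        "limit": max_words,
        # 0 means the prompt fits; anything else is how many words to trim.
        "over_by": max(0, word_count - max_words),
    }
```

Running it in a pre-commit hook or CI step keeps a shared Modelfile from growing past the budget unnoticed.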

TEMPLATE

The prompt template defines how user messages and system prompts are formatted before being sent to the model. In most cases, you do not need to set this — Ollama uses the correct template for each model automatically.

Set this only when using a raw GGUF file that does not include template information:

FROM /path/to/custom-model.gguf

TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
{{ end }}{{ .Response }}<|end|>"""
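
To see what a template like this feeds the model, it helps to render it by hand. The sketch below is a plain-Python approximation of the template above, not Ollama's actual template engine, and it stops where the model's response would begin.

```python
def render_prompt(system: str, prompt: str) -> str:
    """Approximate the text the example TEMPLATE produces before generation starts."""
    parts = []
    if system:
        # Mirrors: {{ if .System }}<|system|> ... <|end|>{{ end }}
        parts.append(f"<|system|>\n{system}<|end|>\n")
    if prompt:
        # Mirrors: {{ if .Prompt }}<|user|> ... <|end|> <|assistant|>{{ end }}
        parts.append(f"<|user|>\n{prompt}<|end|>\n<|assistant|>\n")
    return "".join(parts)


print(render_prompt("You are terse.", "What is a mutex?"))
```

If the markers in the rendered text do not match the ones the model was trained with, output quality usually drops, which is why the default template is the safer choice whenever one exists.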

MESSAGE

Add few-shot examples to the model’s context — conversations that demonstrate the desired behavior:

FROM llama4:scout

SYSTEM "You are a SQL expert."

# Show the model what good answers look like
MESSAGE user "How do I find duplicate emails in a users table?"
MESSAGE assistant """SELECT email, COUNT(*) as count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY count DESC;

This returns all emails that appear more than once, sorted by frequency."""

MESSAGE user "How do I add an index to speed up that query?"
MESSAGE assistant """CREATE INDEX idx_users_email ON users(email);

This creates a B-tree index on the email column. For the duplicate-finding
query, the index speeds up the GROUP BY operation significantly on large tables."""

Few-shot examples in the Modelfile are “baked in” — they appear in every conversation before the user’s first message, establishing the response pattern.
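
Conceptually, the model sees the baked-in examples as earlier turns of the same conversation. Here is a sketch of that ordering, using a hypothetical build_conversation helper and the chat-style role/content message shape:

```python
def build_conversation(system, few_shot, user_message):
    """Order messages the way baked-in MESSAGE examples precede the live turn."""
    messages = [{"role": "system", "content": system}] if system else []
    for question, answer in few_shot:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    # The real user message always comes last.
    messages.append({"role": "user", "content": user_message})
    return messages


conversation = build_conversation(
    "You are a SQL expert.",
    [("How do I find duplicate emails in a users table?",
      "SELECT email, COUNT(*) AS count FROM users GROUP BY email HAVING COUNT(*) > 1;")],
    "How do I delete the duplicates?",
)
print([message["role"] for message in conversation])
```

Every example pair consumes context on every request, so a few short, representative pairs usually serve better than many long ones.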

Complete Modelfile Templates

General Professional Assistant

FROM llama4:scout

SYSTEM """You are a professional assistant.

COMMUNICATION STYLE:
- Direct and clear — no filler phrases or padding
- Structured responses for complex topics (use headers and lists when helpful)
- Concise for simple questions — one or two paragraphs maximum
- Ask for clarification when the request is genuinely ambiguous

CAPABILITIES:
- Writing, editing, and proofreading
- Research synthesis and summarization
- Analysis and problem solving
- Planning and organization

LIMITS:
- For medical, legal, and financial questions: provide general information 
  and note that professional consultation is recommended
- Do not fabricate specific facts, statistics, or citations"""

PARAMETER temperature 0.5
PARAMETER num_ctx 32768
PARAMETER num_predict 2048

Senior Code Reviewer

FROM qwen3.6:27b

SYSTEM """You are a senior software engineer conducting code reviews.

REVIEW FRAMEWORK:
Grade every issue by severity:
[Critical] — Security vulnerability, data corruption risk, incorrect logic in critical path
[High] — Performance under load, important missing error handling, breaking change
[Medium] — Maintainability problem, suboptimal pattern, missing tests
[Low] — Minor improvement opportunity
[Style] — Formatting/naming only if significantly inconsistent

FOR EACH ISSUE:
1. File and approximate line reference
2. What is wrong (specific, not vague)
3. Why it matters (impact, not just "bad practice")
4. Specific fix (code when helpful)

ALSO PROVIDE:
- 2-3 specific strengths (what was done well)
- Overall rating 1-10 with one-sentence justification

TONE: Mentor, not critic. The goal is improvement, not judgment.
NEVER: Say "good job" without being specific about what was good."""

PARAMETER temperature 0.0
PARAMETER num_ctx 32768
PARAMETER num_predict 4096

Technical Documentation Writer

FROM llama4:scout

SYSTEM """You write technical documentation for software products.

DOCUMENTATION STANDARDS:
- Start with what the feature does (not what it is)
- Second paragraph: when to use it and when not to
- Code examples for every non-trivial concept
- Code examples are complete and runnable, not fragments
- Parameter tables for any function with 3+ parameters

STYLE:
- Active voice: "The function returns X" not "X is returned by the function"
- Present tense: "The API accepts" not "The API will accept"
- Concrete examples over abstract explanations
- Step-by-step for procedures, always numbered

NEVER:
- Use jargon without defining it on first use
- Write paragraphs over 4 sentences
- Skip error handling in code examples
- Use "simply" or "just" — if it were simple, you wouldn't need documentation"""

PARAMETER temperature 0.3
PARAMETER num_ctx 16384
PARAMETER num_predict 3000

Customer Support Bot

FROM llama4:scout

SYSTEM """You are the customer support AI for [Company Name].

PRODUCT INFORMATION:
[Product name]: [Brief description]
Key features: [List main features]
Pricing: [Pricing tiers]
Support email: support@company.com
Response time: 24 hours for email, live chat available 9-5 EST

COMMON ISSUES AND RESOLUTIONS:
Issue 1: [Description] → Resolution: [Steps]
Issue 2: [Description] → Resolution: [Steps]
Issue 3: [Description] → Resolution: [Steps]

ESCALATION RULES:
- Billing disputes → Collect details, send to billing@company.com
- Data loss concerns → Immediate escalation to engineering team
- Enterprise client issues → Notify account manager

RESPONSE GUIDELINES:
- Acknowledge the issue first ("I understand this is frustrating")
- Provide the solution or next steps
- Under 150 words unless the fix requires detailed steps
- End with a specific next step or offer

STRICT RULES:
- Never make up product features or capabilities
- Never promise timelines you cannot guarantee
- Never share other customers' information
- If unsure: "I want to make sure I give you accurate information — 
  let me connect you with a specialist at [support email]" """

PARAMETER temperature 0.2
PARAMETER num_ctx 8192
PARAMETER num_predict 500

Research Analyst

FROM llama4:scout

SYSTEM """You are a research analyst providing analytical support.

ANALYTICAL FRAMEWORK:
For any question requiring analysis:
1. State your key finding in the first sentence
2. Provide 3-5 supporting points with evidence
3. Note the strongest counterargument or limitation
4. Conclude with specific implication or recommendation

INTELLECTUAL STANDARDS:
- Distinguish clearly between facts, expert consensus, and your analysis
- State confidence level when uncertain: "The evidence strongly suggests..." 
  vs "It appears that..." vs "One interpretation is..."
- Never present contested claims as settled facts
- Cite the type of source even without a specific URL 
  ("According to academic research in this area..." rather than fabricating citations)

SCOPE:
- Analyze information provided to you
- For current events after your training: flag uncertainty explicitly
- For quantitative analysis: show calculations, not just results

TONE: Think-tank analyst briefing a senior executive.
Intelligent but accessible. Evidence-based but readable."""

PARAMETER temperature 0.3
PARAMETER num_ctx 32768
PARAMETER num_predict 3000

Minimalist (Speed-Optimized)

For cases where you want fast responses and extreme conciseness:

FROM qwen3:8b

SYSTEM """Be extremely concise. 
- Answer in 1-3 sentences when possible
- Use bullet points for multiple items
- No preamble, no summary, no "I hope this helps"
- Just the answer"""

PARAMETER temperature 0.1
PARAMETER num_ctx 4096
PARAMETER num_predict 300
# keep_alive is not a Modelfile parameter; set it per API request or via the OLLAMA_KEEP_ALIVE environment variable

Building on Existing Custom Models

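You can layer customizations by building a specialized model on top of a custom model. The derived model inherits the base model's parameters, and a new SYSTEM instruction replaces the inherited system prompt rather than being appended to it, so restate any base rules you still want: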

# Base: a general coding assistant
# custom_model_base/CodingBase.Modelfile
FROM qwen3.6:27b
PARAMETER temperature 0.1
PARAMETER num_ctx 32768
SYSTEM "You are a professional software engineer..."

# Build the base model (shell)
ollama create CodingBase -f CodingBase.Modelfile

# SecurityReviewer.Modelfile: a security-focused reviewer built ON TOP of CodingBase
FROM CodingBase
SYSTEM """You are a security-focused code reviewer specializing in 
web application security.

In addition to standard code review, always check for:
- SQL injection vulnerabilities
- XSS (Cross-Site Scripting) attack vectors
- Authentication bypass possibilities
- Insecure direct object references (IDOR)
- CSRF vulnerabilities
- Sensitive data exposure in logs or responses
- Use of deprecated/insecure cryptographic functions

ALWAYS include a [Security Summary] section in every review."""

ollama create SecurityReviewer -f SecurityReviewer.Modelfile
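
Layered models have to be built base-first, because SecurityReviewer cannot be created until CodingBase exists. The sketch below derives a safe build order from the FROM lines. It assumes FROM is the first instruction in each file, and base_of and build_order are hypothetical helpers.

```python
def base_of(modelfile_text: str) -> str:
    """Return the model named on the first FROM line."""
    for line in modelfile_text.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("FROM "):
            return stripped.split(None, 1)[1].strip()
    raise ValueError("Modelfile has no FROM line")


def build_order(modelfiles: dict) -> list:
    """Order custom model names so that every base is built before its dependents."""
    ordered, visiting = [], set()

    def visit(name):
        if name in ordered or name not in modelfiles:
            return  # already scheduled, or a library model that needs no build
        if name in visiting:
            raise ValueError(f"circular FROM chain at {name}")
        visiting.add(name)
        visit(base_of(modelfiles[name]))
        visiting.discard(name)
        ordered.append(name)

    for name in modelfiles:
        visit(name)
    return ordered


print(build_order({
    "SecurityReviewer": "FROM CodingBase\nSYSTEM \"...\"",
    "CodingBase": "FROM qwen3.6:27b\nPARAMETER temperature 0.1",
}))
```

Run ollama create for each name in the returned order.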

For Teams (Modelfile Distribution)

The most practical approach for teams — share the Modelfile, not the model:

# Anyone on the team can recreate the model with:
ollama create CompanyAssistant -f CompanyAssistant.Modelfile

# Check into version control
git add CompanyAssistant.Modelfile
git commit -m "Add company assistant configuration"

This is better than distributing model files (which are multi-GB) because:

The Modelfile is tiny (a few KB)
Team members already have the base model pulled
Updates are one Modelfile commit + ollama create re-run
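
After pulling the latest commit, a teammate needs one ollama create per Modelfile. A small script can derive those commands from the file names. This is a sketch that assumes each file is named ModelName.Modelfile, as in the examples here; create_commands is a hypothetical helper.

```python
import subprocess
from pathlib import Path


def create_commands(modelfile_paths):
    """Map each ModelName.Modelfile path to its `ollama create` command."""
    commands = []
    for path in sorted(Path(p) for p in modelfile_paths):
        # path.stem drops the .Modelfile suffix, leaving the model name.
        commands.append(["ollama", "create", path.stem, "-f", str(path)])
    return commands


def rebuild_all(directory="."):
    """Rebuild every custom model whose Modelfile lives in the directory."""
    for command in create_commands(Path(directory).glob("*.Modelfile")):
        subprocess.run(command, check=True)
```

Calling rebuild_all() from the repository root recreates every shared model with the current definitions.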

For Distribution via Ollama Registry

Push to Ollama’s public registry:

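# Requires an ollama.com account, with this machine signed in to it.
# Recent Ollama releases provide `ollama signin`; on older releases, add your
# Ollama public key to your account on ollama.com instead.
ollama signin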

# Tag the model with your username
ollama cp MyAssistant myusername/myassistant

# Push to the registry
ollama push myusername/myassistant

# Others can pull it
ollama pull myusername/myassistant

Iterating on System Prompts

The fastest workflow for improving a custom model:

# Edit Modelfile
nano MyAssistant.Modelfile

# Rebuild (fast — just reprocesses the system prompt, no re-download)
ollama create MyAssistant -f MyAssistant.Modelfile

# Test immediately
ollama run MyAssistant "Test prompt here"

Rebuilding after changing only SYSTEM or PARAMETER usually takes a few seconds because the base model weights are reused, so you can iterate quickly.

Testing Methodology

Before deploying a custom model:

Test your 10 most common use cases
Test 3 edge cases where you expect it might fail
Verify it correctly declines out-of-scope requests
Test that stop sequences work as expected
Confirm tone and format match requirements
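
This checklist is easy to turn into a repeatable script. The sketch below takes the text-generation function as an argument, so the same checks can run against a real model or a stub. run_checks and ollama_generate are hypothetical helpers, and the real path assumes the ollama CLI is on your PATH.

```python
import subprocess


def run_checks(generate, cases):
    """Run (prompt, check) pairs and return the prompts whose reply failed its check."""
    failures = []
    for prompt, check in cases:
        reply = generate(prompt)
        if not check(reply):
            failures.append(prompt)
    return failures


def ollama_generate(model):
    """Return a generate(prompt) function backed by `ollama run <model> <prompt>`."""
    def generate(prompt):
        result = subprocess.run(
            ["ollama", "run", model, prompt],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    return generate
```

A case might look like ("How do I find duplicate emails?", lambda reply: "GROUP BY" in reply). Pass ollama_generate("MyAssistant") as the first argument to test the real model.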

Common Modelfile Mistakes

Mistake 1: Vague system prompts

“Be helpful and professional” is so generic it changes nothing. Be specific: “When reviewing code, always grade issues by severity using [Critical/High/Medium/Low] tags and provide specific line references.”

Mistake 2: System prompt too long

A 5,000-word system prompt leaves little context for the actual conversation. Keep system prompts under 1,500 words for most use cases. Prioritize the most important behavioral rules; trim the rest.

Mistake 3: Wrong temperature for the task

Code review at temperature 0.8 produces inconsistent, unpredictable output. Creative writing at temperature 0.0 produces formulaic, repetitive text. Match temperature to task type: low for deterministic tasks, higher for creative ones.

Mistake 4: Not testing before deployment

Custom models can behave unexpectedly at edge cases. Always test with real inputs from your actual use case before giving it to users.

Mistake 5: Forgetting to update the Modelfile in version control

If you build a custom model and do not save the Modelfile, you cannot recreate it if the model is deleted or needs to run on a different machine.
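
Mistake 3 is the easiest of these to catch automatically. The sketch below encodes the temperature ranges suggested in the PARAMETER section as a lookup table. The task labels and the temperature_warning helper are illustrative, and the ranges are this guide's rules of thumb, not limits enforced by Ollama.

```python
# Ranges taken from the PARAMETER section of this guide.
RECOMMENDED_TEMPERATURE = {
    "code": (0.0, 0.2),
    "general": (0.5, 0.7),
    "creative": (0.8, 1.2),
}


def temperature_warning(task, temperature):
    """Return None if the temperature suits the task, otherwise a warning string."""
    low, high = RECOMMENDED_TEMPERATURE[task]
    if low <= temperature <= high:
        return None
    return f"temperature {temperature} is outside the {low}-{high} range suggested for {task} tasks"


print(temperature_warning("code", 0.8))
```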

Conclusion

Modelfiles transform generic local models into specialized tools built precisely for your workflow. The investment is small (a well-written system prompt and an ollama create command), and the result is a model that consistently behaves the way your specific use case requires.

The most impactful Modelfiles are the ones that encode real expertise: the specific code review standards your team actually enforces, the actual product information your support bot needs to know, the precise tone your brand guidelines specify. Generic system prompts produce generic improvements; specific, expert-encoded prompts produce tools that feel purpose-built.

Your next step: Pick the task you use local AI for most frequently. Write a Modelfile that encodes exactly how that task should be handled — the format, the tone, the rules, the constraints. Build it with ollama create. Test it with 10 real examples. Notice how much better a purpose-built model is than the generic base.


Last updated: June 2026. Modelfile syntax is documented at github.com/ollama/ollama/blob/main/docs/modelfile.mdx.

Frequently Asked Questions (FAQ)

Do I lose access to the base model's capabilities when I create a custom model?

No — a custom model has all the capabilities of its base model. The Modelfile adds constraints and defaults; it does not remove capabilities. A user can still ask your code review model to write poetry.

Can I use a Modelfile to make a model refuse certain topics?

You can instruct it to decline out-of-scope requests via the system prompt. This is not a hard technical block — a determined user can still elicit responses by rephrasing. For true content blocking, you need application-level filtering on top of the model.
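
Application-level filtering can be as simple as checking text on the way in and on the way out. The sketch below uses naive substring matching. The guarded_reply helper, the blocked-term list, and the refusal text are placeholders, and a production filter would need something more robust than keyword checks.

```python
def guarded_reply(generate, user_message, blocked_terms, refusal):
    """Refuse if the request or the model's reply mentions a blocked term."""
    if any(term in user_message.lower() for term in blocked_terms):
        return refusal  # blocked before the model is ever called
    reply = generate(user_message)
    if any(term in reply.lower() for term in blocked_terms):
        return refusal  # blocked after generation, before the user sees it
    return reply
```

Here generate is any function that takes a prompt and returns the model's text, so the filter stays independent of how you call Ollama.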

How does a Modelfile system prompt interact with system prompts set in the API?

When you call the Ollama API with a system message in the messages array, that message generally takes the place of the Modelfile system prompt for that request rather than being added to it. Behavior can vary with the model's template, so test the specific model. Use the Modelfile system prompt for baked-in behavior and reserve API system prompts for per-conversation overrides.

Can I use the same Modelfile for different base models?

Yes. Change the FROM line and rebuild. This is useful for testing: build the same custom model on `qwen3:8b` for fast iteration and `qwen3.6:27b` for deployment quality.
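
Swapping the base can be scripted so that both variants come from one source file. This is a sketch that assumes FROM is the first instruction in the Modelfile; with_base is a hypothetical helper.

```python
def with_base(modelfile_text: str, new_base: str) -> str:
    """Return a copy of the Modelfile text with its first FROM line pointing at new_base."""
    lines = modelfile_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip().upper().startswith("FROM "):
            lines[index] = f"FROM {new_base}"
            return "\n".join(lines) + "\n"
    raise ValueError("Modelfile has no FROM line")


original = 'FROM qwen3.6:27b\nPARAMETER temperature 0.1\nSYSTEM "You are a SQL expert."\n'
print(with_base(original, "llama4:scout"))
```

Write each variant to its own file and give it its own name in ollama create, so both models can sit side by side in ollama list.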