Most developers who use AI coding assistants send their code to cloud servers. Every snippet you paste into ChatGPT, every function that gets sent to GitHub Copilot, every debugging session with Claude — all of it leaves your machine and is processed on remote infrastructure. For most code, this is fine. For proprietary code, client projects, and security-sensitive codebases, it is a real concern.
A complete local AI development environment changes this. With Ollama running Qwen 3.6 27B or Kimi K2.6, you get coding assistance that is genuinely competitive with cloud tools — without any code leaving your machine. Continue.dev provides in-editor autocomplete and chat. Aider handles autonomous multi-file coding tasks. Shell tools give you AI from the terminal. The entire setup is private, free to use at any volume, and optimized for the specific languages and frameworks you work with.
This guide builds the complete local AI dev environment from scratch.
🔗 This is Post #13 in the Ollama Unlocked series. For the coding model comparison (Qwen3 vs Kimi K2.6), see Post #7. For the Ollama API that powers these integrations, see Post #9.
The Local AI Dev Stack
The complete environment uses these components:
| Tool | Purpose | Models |
|---|---|---|
| Continue.dev | IDE integration (VS Code / JetBrains) | Qwen3.6:27b (chat), Qwen3:7b (autocomplete) |
| Aider | Autonomous multi-file coding | Kimi-k2.6 or Qwen3.6:27b |
| Shell AI | Terminal-based AI for commands | Llama4:scout |
| Custom Modelfiles | Specialized models for your stack | Any base model |
| Open WebUI | Chat interface for longer sessions | All models |
Continue.dev: AI Inside Your Editor
Continue is an open-source VS Code and JetBrains extension that connects to Ollama for in-editor AI assistance. It replaces GitHub Copilot with a fully local equivalent.
Installation
VS Code:
- Open Extensions (
Ctrl+Shift+X) - Search “Continue”
- Install the “Continue - Codestral, Claude, and more” extension
- Click the Continue icon in the left sidebar
JetBrains (IntelliJ, PyCharm, etc.):
- Settings → Plugins → Marketplace
- Search “Continue”
- Install and restart
Configuration
Edit ~/.continue/config.json:
{
"models": [
{
"title": "Qwen Coder (Primary)",
"provider": "ollama",
"model": "qwen3.6:27b",
"apiBase": "http://localhost:11434",
"contextLength": 32768,
"completionOptions": {
"temperature": 0.1,
"maxTokens": 4096
}
},
{
"title": "Llama 4 Scout (General)",
"provider": "ollama",
"model": "llama4:scout",
"apiBase": "http://localhost:11434",
"contextLength": 32768
},
{
"title": "DeepSeek R1 (Reasoning)",
"provider": "ollama",
"model": "deepseek-r1:14b",
"apiBase": "http://localhost:11434",
"contextLength": 16384
}
],
"tabAutocompleteModel": {
"title": "Qwen 3 7B (Fast Autocomplete)",
"provider": "ollama",
"model": "qwen3:7b",
"apiBase": "http://localhost:11434"
},
"embeddingsProvider": {
"provider": "ollama",
"model": "nomic-embed-text",
"apiBase": "http://localhost:11434"
},
"contextProviders": [
{"name": "code"},
{"name": "docs"},
{"name": "diff"},
{"name": "terminal"},
{"name": "problems"},
{"name": "folder"},
{"name": "codebase"}
],
"slashCommands": [
{
"name": "edit",
"description": "Edit highlighted code"
},
{
"name": "comment",
"description": "Add comments to code"
},
{
"name": "share",
"description": "Export conversation"
},
{
"name": "cmd",
"description": "Generate terminal command"
}
]
}
Key Continue.dev Workflows
Inline code editing (Ctrl+I / Cmd+I):
Select code → press Ctrl+I → type instruction → Continue rewrites the selection
# Examples of inline edit instructions:
"Add error handling for network timeouts"
"Refactor to use async/await instead of callbacks"
"Add type annotations"
"Optimize this for performance"
Chat with code context (Ctrl+L / Cmd+L):
Opens the chat panel. Use @ to reference files, symbols, or the codebase:
@auth.py What security issues do you see in this file?
@codebase How does our authentication flow work end-to-end?
@terminal The last command failed. What happened?
/edit make this function handle the case where the database is unavailable
Autocomplete: Tab autocomplete activates automatically as you type. The 7B model generates suggestions fast enough to feel responsive while the 27B model provides deeper chat assistance.
Codebase indexing: Continue indexes your codebase using nomic-embed-text embeddings for semantic search:
- Automatically updates as you edit files
- Enables
@codebasequeries across the entire project - All indexing stays local in
~/.continue/index
Aider: Autonomous Multi-File Coding
Aider is a terminal-based AI coding assistant that understands your git repository structure, makes coordinated changes across multiple files, and commits with appropriate messages.
Installation
pip install aider-chat
Connecting to Ollama
# Use OpenAI-compatible endpoint
aider \
--model ollama/qwen3.6:27b \
--openai-api-base http://localhost:11434/v1 \
--openai-api-key ollama
# Or set as environment variables
export OPENAI_API_BASE=http://localhost:11434/v1
export OPENAI_API_KEY=ollama
aider --model ollama/qwen3.6:27b
Create an Aider config file for your project
# .aider.conf.yml (in your project root)
model: ollama/qwen3.6:27b
openai-api-base: http://localhost:11434/v1
openai-api-key: ollama
auto-commits: true
dirty-commits: true
show-diffs: true
Aider Workflow Examples
# Start Aider in your project directory
cd my-project
aider
# Aider opens with the file tree context
# Type natural language instructions:
> Add a rate limiting decorator to all endpoints in api/routes.py
# Aider reads the file, plans changes, implements across files, commits
> The tests in tests/test_auth.py are failing because of the new JWT format.
> Fix the auth module to handle both old and new token formats.
# Aider reads both files, understands the relationship, fixes the issue
> Add comprehensive docstrings to all public methods in src/
# Aider iterates through files systematically
# Add specific files to Aider's context
/add src/database.py src/models.py
# Show which files Aider is currently tracking
/ls
# Run tests and let Aider fix failures
/test pytest tests/
# View the diff before committing
/diff
Shell AI: Terminal Intelligence
Shell-GPT With Ollama
pip install shell-gpt
# Configure to use Ollama
export OPENAI_API_HOST=http://localhost:11434/v1
export OPENAI_API_KEY=ollama
export DEFAULT_MODEL=llama4:scout
# Use it
sgpt "find all Python files modified in the last 7 days"
# Output: find . -name "*.py" -mtime -7
sgpt --shell "compress all log files older than 30 days"
# Generates and optionally runs: find /var/log -name "*.log" -mtime +30 -exec gzip {} \;
# Explain a complex command
sgpt --explain "awk '{sum += $1} END {print sum}' data.txt"
# Explains what the awk command does
Simple Shell AI Function (No Extra Packages)
Add to your .bashrc or .zshrc:
# AI helper function using Ollama directly
ai() {
local prompt="$*"
curl -s http://localhost:11434/api/generate \
-d "{\"model\":\"llama4:scout\",\"prompt\":\"$prompt\",\"stream\":false}" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}
# Bash command suggester
suggest() {
local task="$*"
curl -s http://localhost:11434/api/generate \
-d "{
\"model\":\"llama4:scout\",
\"prompt\":\"Give me a bash command to: $task. Return only the command, no explanation.\",
\"stream\":false
}" | python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}
# Usage:
# ai "What does the -r flag do in rm?"
# suggest "find all files larger than 100MB"
Custom Modelfiles for Your Stack
Generic coding models give generic answers. A Modelfile configured for your specific stack gives much better results.
Python / Django Developer
cat > PythonDev.Modelfile << 'EOF'
FROM qwen3.6:27b
SYSTEM """You are an expert Python developer specializing in Django and FastAPI.
CODING STANDARDS:
- Python 3.11+ with full type hints
- Pydantic v2 for data validation
- Async/await for I/O-bound operations
- Follow PEP 8 strictly
- Prefer composition over inheritance
DJANGO PATTERNS:
- Use class-based views for complex logic, function-based for simple views
- QuerySet optimization: select_related, prefetch_related, only/defer
- Always use Django's ORM — never raw SQL unless performance-critical
- Signals only for truly decoupled concerns
FASTAPI PATTERNS:
- Dependency injection for shared resources
- Pydantic models for all request/response schemas
- Background tasks via BackgroundTasks or Celery for long operations
CODE REVIEW STANDARDS:
Grade issues: [Critical] [High] [Medium] [Low]
For Critical/High: always provide the fixed code
Mention Django/FastAPI specific anti-patterns explicitly"""
PARAMETER temperature 0.1
PARAMETER num_ctx 32768
EOF
ollama create PythonDev -f PythonDev.Modelfile
TypeScript / React Developer
cat > TypeScriptDev.Modelfile << 'EOF'
FROM qwen3.6:27b
SYSTEM """You are an expert TypeScript and React developer.
CODING STANDARDS:
- TypeScript strict mode — no implicit any
- Functional components with hooks only
- Zod for runtime type validation
- React Query for server state, Zustand for client state
- Tailwind CSS for styling
PATTERNS TO ENFORCE:
- Custom hooks for reusable logic
- Proper error boundaries
- Memoization only when measured (useMemo, useCallback)
- Component composition over prop drilling
CODE STYLE:
- Named exports only (no default exports except pages)
- Descriptive variable names (no abbreviations)
- Co-locate types with their usage
When reviewing: always check for accessibility (a11y) issues.
Flag: missing aria labels, poor color contrast, keyboard navigation gaps."""
PARAMETER temperature 0.1
PARAMETER num_ctx 32768
EOF
ollama create TypeScriptDev -f TypeScriptDev.Modelfile
Infrastructure / DevOps
cat > DevOpsDev.Modelfile << 'EOF'
FROM llama4:scout
SYSTEM """You are an expert DevOps and infrastructure engineer.
SPECIALIZATIONS:
- Kubernetes (Helm, operators, RBAC)
- Terraform and infrastructure as code
- Docker multi-stage builds
- CI/CD (GitHub Actions, GitLab CI)
- Observability (Prometheus, Grafana, OpenTelemetry)
SECURITY FIRST:
- Always flag security implications
- Principle of least privilege in all IAM/RBAC configs
- Never expose secrets in manifests — use Secrets management
- Container security: non-root users, read-only filesystems
WHEN WRITING CONFIGS:
- Add comments for non-obvious decisions
- Include health checks in all deployments
- Resource limits are always required, never optional
- Label everything with standard labels (app, version, component)"""
PARAMETER temperature 0.05
PARAMETER num_ctx 16384
EOF
ollama create DevOpsDev -f DevOpsDev.Modelfile
Use your specialized models:
ollama run PythonDev
ollama run TypeScriptDev
aider --model ollama/PythonDev ...
Git Integration Workflows
AI-Generated Commit Messages
# .git/hooks/prepare-commit-msg
#!/bin/bash
# Only for non-merge, non-squash commits with empty message
if [ -n "$2" ]; then exit 0; fi
if [ -s "$1" ]; then exit 0; fi
DIFF=$(git diff --cached --stat 2>/dev/null)
if [ -z "$DIFF" ]; then exit 0; fi
COMMIT_MSG=$(curl -s http://localhost:11434/api/generate \
-d "{
\"model\": \"qwen3:7b\",
\"prompt\": \"Write a concise git commit message (under 72 chars, imperative mood) for these changes:\\n$DIFF\\n\\nReturn only the commit message, no explanation.\",
\"stream\": false
}" | python3 -c "import json,sys; print(json.load(sys.stdin)['response'].strip())")
echo "$COMMIT_MSG" > "$1"
chmod +x .git/hooks/prepare-commit-msg
# Now git commit opens with an AI-generated message you can edit
PR Description Generator
#!/bin/bash
# generate-pr-description.sh
BASE_BRANCH=${1:-main}
DIFF=$(git diff "$BASE_BRANCH"...HEAD --stat)
COMMITS=$(git log "$BASE_BRANCH"...HEAD --oneline)
curl -s http://localhost:11434/api/generate \
-d "{
\"model\": \"llama4:scout\",
\"prompt\": \"Generate a GitHub pull request description for these changes.\\n\\nCommits:\\n$COMMITS\\n\\nFiles changed:\\n$DIFF\\n\\nInclude: Summary, Changes, Testing approach. Use markdown.\",
\"stream\": false
}" | python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
Performance Optimization for Developer Workflows
Model Routing Strategy
Use faster, smaller models for instant tasks; slower, larger models for complex ones:
# Fast model (7B) — autocomplete, quick questions, command generation
# Target: <500ms response for autocomplete
# Medium model (Scout/14B) — code review, explanation, documentation
# Target: <5s for typical functions
# Large model (27B+) — complex debugging, architecture, security review
# Target: <30s acceptable for deep analysis
Context Length Optimization
# For autocomplete (fast, small context)
PARAMETER num_ctx 4096
# For file-level coding (standard context)
PARAMETER num_ctx 16384
# For multi-file or long document tasks
PARAMETER num_ctx 32768
Preloading Your Primary Model
Add to your shell profile to warm the model on terminal startup:
# ~/.zshrc or ~/.bashrc
# Silently warm the coding model in the background
(sleep 2 && curl -s http://localhost:11434/api/generate \
-d '{"model":"qwen3.6:27b","prompt":"ready","stream":false,"keep_alive":-1}' > /dev/null &)
Security Considerations for Local Coding AI
Running AI locally is inherently more private than cloud tools, but some practices still matter:
What local AI DOES protect:
- Your code never leaves your machine during inference
- No training on your proprietary code
- No API key exposure risk
- Works fully offline
What to still be careful about:
- Model weights are downloaded from ollama.com (pull only over trusted networks)
- Open WebUI and other interfaces store conversation history locally
- If you expose Ollama on your network, protect it with authentication
- Aider writes to your git history — review auto-commits before pushing
Conclusion
A local AI development environment in 2026 is not a compromise from cloud tools — it is a genuine alternative for most professional coding tasks. Continue.dev with Qwen 3.6 27B gives you in-editor AI assistance without any code leaving your machine. Aider with Kimi K2.6 handles autonomous multi-file tasks. Custom Modelfiles optimize the experience for your specific stack.
The productivity gain is real, the privacy benefit is complete, and the cost is zero beyond the hardware you already own.
Your next step: Install Continue.dev in VS Code. Copy the config file from this guide. Select qwen3.6:27b as the chat model and qwen3:7b as the autocomplete model. Use it for your next coding task. The experience will tell you within 30 minutes whether local coding AI has reached your threshold.
📚 Continue the Series:
- ← Previous Ollama on Docker and Production Deployment
- Next → Ollama for Business: Private AI for Teams
- For agentic coding AI Agents With Ollama: n8n, LangChain
- For coding models Qwen3 and Kimi K2.6: Best Local Coding Models
Last updated: May 2026. Continue.dev, Aider, and related tools update frequently. Verify current configuration formats at docs.continue.dev and aider.chat.