Most guides to local AI coding stop at “install Ollama, install Continue.dev, done.” That setup gives you a local autocomplete. It does not give you a professional-grade AI development environment.
A professional local AI coding environment is different in degree and in kind. It does not just suggest the next line — it reviews the diff before every commit, generates test suites from your specs, debugs with a reasoning model that shows its work, maintains a semantic index of your entire codebase so every AI interaction is codebase-aware, runs autonomous multi-file refactoring sessions overnight, and adapts to your specific language stack through purpose-built Modelfiles.
This guide builds that environment step by step. It assumes you are already a professional developer — comfortable with the terminal, familiar with Python and shell scripting, and done with toy examples. Every step is production-oriented, everything stays on your hardware, and by the end you will have an AI-augmented development workflow that replaces the most repetitive parts of your professional work without sending a single line of code to an external server.
🔗 Between the ChatGPT Unlocked series and the Ollama Unlocked series. For Ollama installation, see Ollama Masterclass 2026. For the Ollama API reference, see The Ollama API.
Prerequisites and Model Selection
Models to Pull Before Starting
This guide uses five models with distinct roles: four chat models and one embeddings model. Pull all of them:
# Primary coding model — review, generation, architecture
ollama pull qwen3.6:27b
# Fast autocomplete — sub-200ms suggestions in your editor
ollama pull qwen3:7b
# Reasoning and debugging — shows thinking trace for hard problems
ollama pull deepseek-r1:14b
# General assistant — documentation, explanation, planning
ollama pull llama4:scout
# Embeddings — codebase semantic search
ollama pull nomic-embed-text
Hardware minimum for this guide: 16GB VRAM. The RTX 3090 and RTX 4090 (24GB each) and an M3 Pro with 36GB of unified memory all clear that bar. The guide works at 12GB VRAM by substituting qwen3:14b for qwen3.6:27b.
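As a rough sanity check on those numbers, the weights of a quantized model take about (parameters × bits per weight ÷ 8) bytes. The sketch below applies that arithmetic. The function names and the 2GB overhead allowance are illustrative assumptions, real quantization formats use somewhat more than 4 bits per weight, and the KV cache grows with context length, so treat the result as a lower bound.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM.

    overhead_gb is a placeholder for KV cache and runtime buffers, not a
    measured value.
    """
    return estimate_weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for size in (27, 14):
        print(f"{size}B at 4 bits: ~{estimate_weights_gb(size, 4):.1f} GB of weights")
```

Under these assumptions a 27B model at 4 bits fits in 16GB but not in 12GB, which matches the substitution suggested above.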
Verify Everything Is Running
# Ollama API health check
curl -s http://localhost:11434/api/tags | python3 -c \
"import json,sys; models=[m['name'] for m in json.load(sys.stdin)['models']]; \
[print(f' ✓ {m}') for m in models]"
# Expected output:
# ✓ qwen3.6:27b
# ✓ qwen3:7b
# ✓ deepseek-r1:14b
# ✓ llama4:scout
# ✓ nomic-embed-text:latest
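The one-liner above only lists what is installed. A small helper can also report which required models are missing. This is a sketch: it assumes Ollama reports a model pulled without a tag under a `:latest` suffix, and the function names are hypothetical.

```python
def normalize(name: str) -> str:
    """Append ':latest' when a model name carries no explicit tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(required: list[str], installed: list[str]) -> list[str]:
    """Return the required models that do not appear in the installed list."""
    have = {normalize(n) for n in installed}
    return [m for m in required if normalize(m) not in have]


if __name__ == "__main__":
    required = ["qwen3.6:27b", "qwen3:7b", "deepseek-r1:14b",
                "llama4:scout", "nomic-embed-text"]
    # In practice `installed` would come from the 'models' list in /api/tags
    installed = ["qwen3:7b", "nomic-embed-text:latest"]
    print(missing_models(required, installed))
```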
Part 1: Continue.dev — Advanced Configuration
Installation
VS Code:
Extensions (Ctrl+Shift+X) → search "Continue" → Install
JetBrains (IntelliJ, PyCharm, GoLand, etc.):
Settings → Plugins → Marketplace → "Continue" → Install → Restart IDE
Advanced Config: ~/.continue/config.json
This configuration goes well beyond the defaults — it sets up per-task model routing, codebase indexing, custom slash commands, and tight context controls:
{
"models": [
{
"title": "Qwen Coder 27B",
"provider": "ollama",
"model": "qwen3.6:27b",
"apiBase": "http://localhost:11434",
"contextLength": 32768,
"completionOptions": {
"temperature": 0.05,
"maxTokens": 8192,
"top_p": 0.9
},
"systemMessage": "You are an expert software engineer. Write production-ready code with error handling, type annotations, and concise comments on non-obvious logic only. Never write placeholder or TODO code unless explicitly asked."
},
{
"title": "DeepSeek R1 Debugger",
"provider": "ollama",
"model": "deepseek-r1:14b",
"apiBase": "http://localhost:11434",
"contextLength": 16384,
"completionOptions": {
"temperature": 0.0,
"maxTokens": 4096
},
"systemMessage": "You are a systematic debugger. Think through problems step by step before proposing a fix. Identify all plausible root causes before recommending one. Show your reasoning."
},
{
"title": "Llama 4 Scout (Docs)",
"provider": "ollama",
"model": "llama4:scout",
"apiBase": "http://localhost:11434",
"contextLength": 32768,
"completionOptions": {
"temperature": 0.3,
"maxTokens": 4096
}
}
],
"tabAutocompleteModel": {
"title": "Qwen 3 7B (Autocomplete)",
"provider": "ollama",
"model": "qwen3:7b",
"apiBase": "http://localhost:11434",
"completionOptions": {
"temperature": 0.0,
"maxTokens": 256,
"stop": ["\n\n", "def ", "class ", "# ---"]
}
},
"embeddingsProvider": {
"provider": "ollama",
"model": "nomic-embed-text",
"apiBase": "http://localhost:11434"
},
"reranker": {
"name": "llm",
"params": {
"modelTitle": "Qwen Coder 27B"
}
},
"contextProviders": [
{ "name": "code" },
{ "name": "docs" },
{ "name": "diff" },
{ "name": "terminal" },
{ "name": "problems" },
{ "name": "folder" },
{ "name": "codebase" },
{ "name": "open" },
{ "name": "os" }
],
"slashCommands": [
{
"name": "review",
"description": "Review selected code for issues",
"prompt": "Review this code. Grade issues as [Critical/High/Medium/Low]. For each issue: location, problem, why it matters, specific fix. Also list 2-3 strengths. Be direct."
},
{
"name": "test",
"description": "Generate comprehensive tests",
"prompt": "Write a comprehensive test suite for this code. Cover: happy path, edge cases (empty, null, boundary values), error cases, any concurrency issues. Use the project's existing test framework. Tests must be independent — no shared mutable state."
},
{
"name": "doc",
"description": "Write documentation",
"prompt": "Write complete documentation for this code. Include: what it does (not what it is), parameters with types and constraints, return values, exceptions raised, one complete usage example. Active voice, present tense."
},
{
"name": "perf",
"description": "Analyze performance",
"prompt": "Analyze this code for performance. Identify: algorithmic complexity (Big O), memory allocation patterns, I/O bottlenecks, and any N+1 query problems. Prioritize issues by impact. Suggest specific optimizations with expected improvements."
},
{
"name": "secure",
"description": "Security review",
"prompt": "Review this code for security vulnerabilities. Check for: injection vulnerabilities (SQL, command, LDAP), authentication/authorization flaws, sensitive data exposure, insecure deserialization, SSRF, path traversal, cryptographic issues. Rate each finding: Critical/High/Medium/Low."
},
{
"name": "refactor",
"description": "Suggest refactoring",
"prompt": "Identify refactoring opportunities in this code. Focus on: extracting reusable functions, eliminating code duplication, improving naming clarity, reducing function complexity. For each suggestion, show the before and after."
},
{
"name": "explain",
"description": "Explain code in depth",
"prompt": "Explain this code in depth. Cover: what problem it solves, the algorithm/approach used, non-obvious implementation decisions, potential failure modes, and what I should know to modify it safely."
},
{
"name": "commit",
"description": "Generate commit message",
"prompt": "Write a git commit message for this diff. Format: imperative mood, under 72 characters, present tense. Add a blank line then 2-3 sentences explaining WHY these changes were made (not just what changed). Follow Conventional Commits format if the project uses it."
}
],
"tabAutocompleteOptions": {
"disable": false,
"useCopyBuffer": true,
"useFileSuffix": true,
"maxPromptTokens": 2048,
"debounceDelay": 250,
"maxSuffixPercentage": 0.25,
"prefixPercentage": 0.85,
"multilineCompletions": "auto"
},
"ui": {
"codeBlockToolbarPosition": "top",
"fontSize": 14,
"displayRawMarkdown": false
}
}
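A config this size is easy to break with a typo, and the reranker refers to a model by its title rather than its model name. The following sketch runs a few consistency checks on a parsed config. The checks are illustrative choices, not rules enforced by Continue.dev, and `validate_continue_config` is a hypothetical helper.

```python
import json
from pathlib import Path


def validate_continue_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems found in a parsed config."""
    problems: list[str] = []
    models = cfg.get("models", [])
    titles = [m["title"] for m in models if "title" in m]

    # Titles are used as references, so duplicates are ambiguous
    for title in sorted(set(titles)):
        if titles.count(title) > 1:
            problems.append(f"duplicate model title: {title}")

    # The reranker must point at a title that exists in the models array
    rerank_title = cfg.get("reranker", {}).get("params", {}).get("modelTitle")
    if rerank_title is not None and rerank_title not in titles:
        problems.append(f"reranker points at unknown model title: {rerank_title}")

    # A completion budget at or above the context length leaves no room for the prompt
    for m in models:
        ctx = m.get("contextLength")
        max_tokens = m.get("completionOptions", {}).get("maxTokens")
        if ctx is not None and max_tokens is not None and max_tokens >= ctx:
            problems.append(f"maxTokens >= contextLength for {m.get('title')}")
    return problems


if __name__ == "__main__":
    path = Path.home() / ".continue" / "config.json"
    if path.exists():
        for problem in validate_continue_config(json.loads(path.read_text())):
            print(problem)
```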
Codebase Indexing: Making Every Query Context-Aware
Continue.dev indexes your entire codebase using the nomic-embed-text embeddings model. This happens automatically when you open a folder. To trigger or refresh manually:
Ctrl+Shift+P → "Continue: Index codebase"
After indexing, @codebase queries search across all your code semantically:
@codebase Where is user authentication handled?
@codebase Which functions write to the database?
@codebase Show me how we handle rate limiting across the API
@codebase What tests exist for the payment module?
The index is stored in ~/.continue/index — it persists between sessions and updates incrementally as you edit files.
Part 2: Per-Stack Modelfiles
Generic models give generic results. These Modelfiles encode the specific conventions, patterns, and review criteria for each major stack.
Python / FastAPI / SQLAlchemy Stack
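# Create the Modelfile directory first; the redirect below fails if it does not exist
mkdir -p ~/.config/ollama/modelfiles
cat > ~/.config/ollama/modelfiles/PythonPro.Modelfile << 'EOF'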
FROM qwen3.6:27b
SYSTEM """You are a senior Python engineer specializing in FastAPI, SQLAlchemy 2.0, and Pydantic v2.
CODING STANDARDS:
- Python 3.11+ with strict type hints everywhere
- Async/await for all I/O-bound operations — never sync in async context
- Pydantic v2 models for ALL request/response schemas, no raw dicts
- SQLAlchemy 2.0 style: select() statements, not Query API
- Repository pattern for database access — no raw SQL in route handlers
- Dependency injection for all shared resources (DB sessions, services)
SQLALCHEMY PATTERNS:
- Always use select() with explicit columns when performance matters
- eager load with selectinload/joinedload — never lazy load in async context
- Use db.scalar() for single values, db.scalars() for lists
- Explicit transactions: async with db.begin(): for write operations
FASTAPI PATTERNS:
- HTTPException with specific status codes, never generic 500
- Response models on all endpoints — no Any return types
- Background tasks for operations not needed in response
- Lifespan context manager for startup/shutdown, not @app.on_event
ERROR HANDLING:
- Custom exception classes that map to HTTP responses
- Log errors with structlog, include request_id in every log
- Never expose internal errors to clients
REVIEW CHECKLIST (always apply):
□ N+1 queries (missing eager loading)
□ Missing input validation (beyond Pydantic)
□ SQL injection risk (raw queries with user input)
□ Missing authentication/authorization checks
□ Unhandled edge cases (empty results, None values)
□ Missing index on filtered/joined columns"""
PARAMETER temperature 0.05
PARAMETER num_ctx 32768
PARAMETER num_predict 6000
EOF
ollama create PythonPro -f ~/.config/ollama/modelfiles/PythonPro.Modelfile
TypeScript / Node / React Stack
cat > ~/.config/ollama/modelfiles/TypeScriptPro.Modelfile << 'EOF'
FROM qwen3.6:27b
SYSTEM """You are a senior TypeScript engineer working across Node.js backend and React frontend.
TYPESCRIPT STANDARDS:
- Strict mode always: "strict": true in tsconfig
- No implicit any — every variable has an explicit or inferred type
- Discriminated unions over optional chains for state modeling
- Zod for runtime validation of all external data (API responses, env vars, user input)
- Exhaustiveness checking with never type for switch statements
REACT STANDARDS:
- Functional components only, no class components
- Custom hooks for ALL reusable stateful logic
- React Query for server state (never useState + useEffect for data fetching)
- Zustand for client state that crosses component boundaries
- useMemo/useCallback only when profiler shows a problem — not preemptively
- Error boundaries at route and major component boundaries
- Accessibility: aria-label on interactive elements, keyboard navigation
NODE.JS BACKEND:
- Express or Fastify with typed request/response
- Middleware for cross-cutting concerns (auth, logging, rate limiting)
- Environment variables validated with Zod at startup — fail fast
- Structured logging with pino, never console.log in production
- Graceful shutdown handling for SIGTERM/SIGINT
NAMING CONVENTIONS:
- PascalCase: Components, Types, Interfaces, Enums
- camelCase: variables, functions, methods
- SCREAMING_SNAKE: constants
- Named exports everywhere (no default exports except Next.js pages)
- Boolean variables: is/has/can/should prefix
REVIEW PRIORITIES:
□ Type safety gaps (any, unknown without narrowing)
□ Missing error boundaries
□ React hook dependency array issues
□ N+1 API calls (missing parallel fetching with Promise.all)
□ XSS risks (dangerouslySetInnerHTML)
□ Missing loading/error states"""
PARAMETER temperature 0.05
PARAMETER num_ctx 32768
PARAMETER num_predict 6000
EOF
ollama create TypeScriptPro -f ~/.config/ollama/modelfiles/TypeScriptPro.Modelfile
Go Stack
cat > ~/.config/ollama/modelfiles/GoPro.Modelfile << 'EOF'
FROM qwen3.6:27b
SYSTEM """You are a senior Go engineer.
GO STANDARDS:
- Follow effective Go and the Google Go style guide
- Errors are values — handle every error, never ignore with _
- Error wrapping: fmt.Errorf("context: %w", err) for error chains
- Context propagation: context.Context is always the first parameter
- Interfaces defined at the point of use (consumer side), not implementation side
- Goroutine leaks: always cancel context or close channel to stop goroutines
- sync.WaitGroup or errgroup for goroutine coordination
CONCURRENCY PATTERNS:
- Channels for communication, mutexes for shared state — not both
- Select with default for non-blocking channel operations
- Always defer cancel() immediately after context.WithCancel/Timeout
- Use sync.Once for expensive one-time initialization
- atomic package for simple counters, not mutex
PERFORMANCE:
- Avoid allocations in hot paths: reuse buffers with sync.Pool
- strings.Builder over += concatenation in loops
- Benchmark before optimizing: go test -bench=.
- Profile before claiming something is slow
TESTING:
- Table-driven tests for all non-trivial functions
- testify/assert for assertions
- httptest.NewRecorder for HTTP handler tests
- Interfaces for external dependencies — mock with testify/mock
- go test -race on every CI run
ERROR HANDLING (Go-specific):
- Sentinel errors: var ErrNotFound = errors.New("not found")
- Error types for errors with data: type ValidationError struct{}
- errors.Is() and errors.As() for checking wrapped errors"""
PARAMETER temperature 0.05
PARAMETER num_ctx 32768
PARAMETER num_predict 6000
EOF
ollama create GoPro -f ~/.config/ollama/modelfiles/GoPro.Modelfile
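The three Modelfiles share one shape: a `FROM` line, a triple-quoted `SYSTEM` block, and a few `PARAMETER` lines. If you maintain several stacks, generating them from data keeps the parameters consistent. A minimal sketch, where `render_modelfile` is a hypothetical helper and not part of Ollama:

```python
def render_modelfile(base: str, system: str, params: dict[str, object]) -> str:
    """Render a Modelfile with FROM, SYSTEM and PARAMETER instructions."""
    if '"""' in system:
        # A triple quote inside the prompt would end the SYSTEM block early
        raise ValueError('system prompt must not contain """')
    lines = [f"FROM {base}", f'SYSTEM """{system}"""']
    lines += [f"PARAMETER {key} {value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_modelfile(
        "qwen3.6:27b",
        "You are a senior Go engineer.",
        {"temperature": 0.05, "num_ctx": 32768, "num_predict": 6000},
    ))
```

Write the result to a file and pass it to `ollama create` exactly as in the commands above.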
Add Stack Models to Continue.dev
// Add to models array in ~/.continue/config.json
{
"title": "Python Pro",
"provider": "ollama",
"model": "PythonPro",
"apiBase": "http://localhost:11434",
"contextLength": 32768,
"completionOptions": { "temperature": 0.05, "maxTokens": 6000 }
},
{
"title": "TypeScript Pro",
"provider": "ollama",
"model": "TypeScriptPro",
"apiBase": "http://localhost:11434",
"contextLength": 32768,
"completionOptions": { "temperature": 0.05, "maxTokens": 6000 }
},
{
"title": "Go Pro",
"provider": "ollama",
"model": "GoPro",
"apiBase": "http://localhost:11434",
"contextLength": 32768,
"completionOptions": { "temperature": 0.05, "maxTokens": 6000 }
}
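Pasting entries into the models array by hand tends to leave duplicates when you re-run the setup. The sketch below merges entries by title instead, replacing an existing entry with the same title and appending new ones. `add_models` is a hypothetical helper, and it assumes every entry carries a `title` key as in the snippets above.

```python
import json


def add_models(config: dict, new_models: list[dict]) -> dict:
    """Return a copy of config with new_models merged into 'models' by title."""
    # dict preserves insertion order, so replaced entries keep their position
    by_title = {m["title"]: m for m in config.get("models", [])}
    for model in new_models:
        by_title[model["title"]] = model
    merged = dict(config)
    merged["models"] = list(by_title.values())
    return merged


if __name__ == "__main__":
    config = {"models": [{"title": "Qwen Coder 27B", "model": "qwen3.6:27b"}]}
    stack = [{"title": "Go Pro", "provider": "ollama", "model": "GoPro"}]
    print(json.dumps(add_models(config, stack), indent=2))
```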
Part 3: Aider — Autonomous Multi-File Coding
Aider is a terminal coding assistant that reads your git history, plans coordinated changes across multiple files, and commits with appropriate messages. It is the closest local equivalent to cloud-based autonomous coding agents.
Installation and Project Configuration
pip install aider-chat
# Per-project config (commit this to your repo)
cat > .aider.conf.yml << 'EOF'
model: ollama/qwen3.6:27b
# Aider's Ollama integration reads the endpoint from OLLAMA_API_BASE
set-env:
  - OLLAMA_API_BASE=http://localhost:11434
# Aider behavior
auto-commits: true
dirty-commits: true
show-diffs: true
stream: true
pretty: true
# Editor (opens diffs)
editor: code --wait
# Files Aider reads for context (always in memory)
read:
- README.md
- ARCHITECTURE.md
- docs/conventions.md
EOF
The Aider Workflow for Professional Tasks
# Start Aider — it reads your git history and file tree
cd my-project
aider
# --- TASK 1: Targeted refactoring ---
> The connection pool in src/database/pool.py has a memory leak
> when connections fail to close on exception. Fix it.
# Aider: reads pool.py, identifies the issue, implements the fix,
# runs your tests (if test-cmd and auto-test are configured), commits with a descriptive message
# --- TASK 2: Feature implementation ---
> Add pagination to the GET /users endpoint in api/routes/users.py
> Use cursor-based pagination matching the pattern in api/routes/products.py
# Aider: reads both files, implements consistent pagination, updates tests
# --- TASK 3: Multi-file refactoring ---
> Rename UserService to AccountService throughout the codebase
> Update all imports, tests, and documentation references
# Aider: finds all references, renames systematically, commits
# --- TASK 4: Test generation ---
> Generate comprehensive tests for src/services/payment_processor.py
> Focus on: all payment provider error states, concurrent payment requests,
> idempotency key handling, and partial failure scenarios
# Aider: reads the service, writes tests, verifies they pass
# --- Useful Aider commands ---
/add src/models/user.py src/services/auth.py # Add files to context
/drop src/old_module.py # Remove from context
/ls # Show files in context
/diff # Show current diff
/undo # Undo last commit
/run pytest tests/unit/ # Run tests from within Aider
/ask How does the authentication flow work? # Ask without making changes
Using DeepSeek-R1 for Hard Debugging With Aider
Swap to the reasoning model for bugs that require systematic analysis:
# Start Aider with the reasoning model
OLLAMA_API_BASE=http://localhost:11434 aider \
--model ollama/deepseek-r1:14b
# DeepSeek-R1 shows its thinking trace before making changes
> The race condition in src/cache/distributed_lock.py appears
> intermittently under high concurrent load. Analyze systematically
> and fix the root cause, not the symptom.
# R1 will show <think>...</think> reasoning through the concurrency
# model before proposing and implementing the fix
Part 4: Git Hooks for AI-Powered Quality Gates
These git hooks run automatically — turning AI review into a mandatory step in your commit workflow.
Pre-Commit AI Review Hook
cat > .git/hooks/pre-commit << 'HOOK'
#!/bin/bash
# AI-powered pre-commit review
# Fails the commit if Critical issues are found
set -e
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACMR | \
grep -E '\.(py|js|ts|go|rs|java|rb|cpp|c)$' || true)
if [ -z "$STAGED_FILES" ]; then
exit 0
fi
echo "🤖 Running AI code review..."
DIFF=$(git diff --cached -- $STAGED_FILES)
if [ ${#DIFF} -gt 15000 ]; then
echo " ℹ️ Diff too large for pre-commit review — skipping"
exit 0
fi
# Skip the review instead of blocking the commit when Ollama is unreachable
if ! curl -s --max-time 5 http://localhost:11434/api/tags > /dev/null; then
  echo " ℹ️ Ollama is not reachable, skipping AI review"
  exit 0
fi
# Pass the diff on stdin so quotes and backslashes in it cannot break the Python snippet
REVIEW=$(printf '%s' "$DIFF" | python3 -c "
import json, sys
diff = sys.stdin.read()
payload = {
    'model': 'qwen3.6:27b',
    'prompt': '''Review this git diff for critical issues only.
REPORT ONLY:
[Critical] - Security vulnerabilities, data corruption risk, auth bypass
[High] - Crashes under load, missing error handling in critical paths
DO NOT REPORT: style issues, minor improvements, medium/low priority items.
If no Critical or High issues found, respond ONLY with: APPROVED
Diff:
''' + diff,
    'stream': False,
    'options': {'temperature': 0, 'num_predict': 800}
}
print(json.dumps(payload))
" | curl -s http://localhost:11434/api/generate --data-binary @- \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['response'])")
echo "$REVIEW"
if echo "$REVIEW" | grep -q "\[Critical\]"; then
echo ""
echo "❌ Commit blocked: Critical issues found."
echo " Fix the issues above or use 'git commit --no-verify' to bypass."
exit 1
fi
if echo "$REVIEW" | grep -q "APPROVED"; then
echo " ✅ AI review passed"
else
echo " ⚠️ High priority issues found — review before proceeding"
echo " Use 'git commit --no-verify' to bypass if issues are acceptable"
# Git hooks do not get the terminal on stdin, so read the answer from /dev/tty
read -p " Continue with commit? [y/N] " yn < /dev/tty
case $yn in
[Yy]*) exit 0;;
*) exit 1;;
esac
fi
HOOK
chmod +x .git/hooks/pre-commit
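The hook's decision logic is small enough to state on its own: a `[Critical]` marker blocks, `APPROVED` passes, and anything else asks for confirmation. The sketch below mirrors that order in Python so it can be unit-tested apart from the shell script. The function name and the three return labels are hypothetical.

```python
def classify_review(review: str) -> str:
    """Map the model's review text to the hook's three outcomes."""
    # Checked first, so a response containing both markers still blocks
    if "[Critical]" in review:
        return "block"
    if "APPROVED" in review:
        return "approve"
    # High-priority findings, or any unexpected response, need a human decision
    return "confirm"


if __name__ == "__main__":
    print(classify_review("[High] Missing error handling in retry loop"))
```

Note that an empty or malformed response lands in the confirmation branch rather than passing silently.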
AI Commit Message Generator
cat > .git/hooks/prepare-commit-msg << 'HOOK'
#!/bin/bash
# Auto-generate commit message using local AI
# Only activates for blank commit messages (not amends, merges, etc.)
COMMIT_MSG_FILE="$1"
COMMIT_SOURCE="$2"
# Only auto-generate for fresh commits with empty message
if [ -n "$COMMIT_SOURCE" ]; then
exit 0
fi
if [ -s "$COMMIT_MSG_FILE" ]; then
CONTENT=$(cat "$COMMIT_MSG_FILE" | grep -v '^#' | tr -d '[:space:]')
if [ -n "$CONTENT" ]; then
exit 0
fi
fi
DIFF=$(git diff --cached --stat 2>/dev/null)
FULL_DIFF=$(git diff --cached 2>/dev/null | head -c 8000)
if [ -z "$DIFF" ]; then
exit 0
fi
echo "🤖 Generating commit message..."
# Pass the diffs through the environment so their contents cannot break the Python snippet
MSG=$(DIFF_STAT="$DIFF" FULL_DIFF="$FULL_DIFF" python3 -c "
import json, os
payload = {
    'model': 'llama4:scout',
    'prompt': '''Generate a git commit message for these changes.
FORMAT:
Line 1: <type>(<scope>): <imperative verb> <what changed> (max 72 chars)
Line 2: blank
Lines 3+: WHY these changes were made (not what, the diff shows that)
Types: feat, fix, refactor, test, docs, chore, perf, style, ci
Changed files:
''' + os.environ['DIFF_STAT'] + '''
Full diff (excerpt):
''' + os.environ['FULL_DIFF'] + '''
Return ONLY the commit message. No explanation, no quotes.''',
    'stream': False,
    'options': {'temperature': 0.3, 'num_predict': 200}
}
print(json.dumps(payload))
" | curl -s http://localhost:11434/api/generate --data-binary @- \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['response'].strip())" 2>/dev/null)
# Leave the message file untouched if generation failed (for example, Ollama is not running)
if [ -z "$MSG" ]; then
  exit 0
fi
echo "$MSG" > "$COMMIT_MSG_FILE"
echo "" >> "$COMMIT_MSG_FILE"
echo "# AI-generated message — edit as needed" >> "$COMMIT_MSG_FILE"
echo "# Delete this line and above to use original empty message" >> "$COMMIT_MSG_FILE"
HOOK
chmod +x .git/hooks/prepare-commit-msg
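Generated messages do not always follow the requested format, so it can help to check the result before accepting it. The sketch below validates the subject line against the format given in the prompt. It treats the scope as optional, which is an assumption borrowed from Conventional Commits rather than something the hook requires, and `check_subject` is a hypothetical helper.

```python
import re

TYPES = ("feat", "fix", "refactor", "test", "docs", "chore", "perf", "style", "ci")
_TYPE_ALTERNATION = "|".join(TYPES)
# <type>, optional (<scope>), then ': ' and a non-empty summary
SUBJECT_RE = re.compile(rf"^(?:{_TYPE_ALTERNATION})(?:\([^()\s]+\))?: \S.*$")


def check_subject(message: str, max_len: int = 72) -> list[str]:
    """Return format problems found in a commit message (empty list if none)."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not SUBJECT_RE.match(subject):
        problems.append("subject does not match <type>(<scope>): <summary>")
    if len(subject) > max_len:
        problems.append(f"subject longer than {max_len} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line must be blank")
    return problems


if __name__ == "__main__":
    print(check_subject("feat(api): add cursor pagination\n\nClients need stable paging."))
```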
Post-Merge Architecture Check
cat > .git/hooks/post-merge << 'HOOK'
#!/bin/bash
# Analyze what changed in a merge and flag architecture concerns
# ORIG_HEAD is where the branch pointed before the merge, so this also covers fast-forwards
MERGE_DIFF=$(git diff --stat ORIG_HEAD HEAD 2>/dev/null | head -30)
if [ -z "$MERGE_DIFF" ]; then
exit 0
fi
echo "🏗️ Checking merge for architecture concerns..."
# Pass the file list through the environment so its contents cannot break the Python snippet
ANALYSIS=$(MERGE_DIFF="$MERGE_DIFF" python3 -c "
import json, os
payload = {
    'model': 'llama4:scout',
    'prompt': '''Briefly analyze this merge. Flag ONLY if you see:
- Circular dependency risks
- Large files that should be split (>500 lines added to one file)
- Migration files without corresponding model changes
- Test coverage gaps (significant new code with no test files changed)
If nothing concerning: respond only with OK
Changed files:
''' + os.environ['MERGE_DIFF'],
    'stream': False,
    'options': {'temperature': 0, 'num_predict': 300}
}
print(json.dumps(payload))
" | curl -s http://localhost:11434/api/generate --data-binary @- \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['response'].strip())" 2>/dev/null)
# Stay silent when the model says OK or when no analysis came back
if [ -n "$ANALYSIS" ] && [ "$ANALYSIS" != "OK" ]; then
  echo "📋 Post-merge notes:"
  echo "$ANALYSIS"
fi
HOOK
chmod +x .git/hooks/post-merge
Part 5: Shell Integration
Full Shell AI Toolkit
Add to ~/.zshrc or ~/.bashrc:
# ─── LOCAL AI SHELL TOOLKIT ───────────────────────────────────────
OLLAMA_API="http://localhost:11434/api/generate"
# Generic AI query
ai() {
local prompt="$*"
curl -s "$OLLAMA_API" \
-d "{\"model\":\"llama4:scout\",\"prompt\":$(echo "$prompt" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0.3,\"num_predict\":800}}" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}
# Shell command suggester
cmd() {
local task="$*"
curl -s "$OLLAMA_API" \
-d "{\"model\":\"qwen3:7b\",\"prompt\":$(printf 'Give me a bash/zsh command to: %s\nReturn ONLY the command, no explanation. OS: %s' "$task" "$(uname -s)" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0,\"num_predict\":150}}" \
| python3 -c "import json,sys; r=json.load(sys.stdin)['response'].strip(); print(r)"
}
# Explain a command
explain() {
ai "Explain this command in detail, including each flag: $*"
}
# Explain last error
oops() {
local last_cmd=$(fc -ln -1 2>/dev/null || history 1 | awk '{$1=""; print $0}')
ai "$(printf 'This command failed: %s\nExplain what went wrong and how to fix it.' "$last_cmd")"
}
# Review a file
review_file() {
local file="$1"
if [ ! -f "$file" ]; then
echo "File not found: $file"
return 1
fi
local content=$(cat "$file")
curl -s "$OLLAMA_API" \
-d "{\"model\":\"qwen3.6:27b\",\"prompt\":$(printf 'Code review this file. Grade: [Critical/High/Medium/Low]. Be specific.\n\n%s' "$content" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0,\"num_predict\":1500}}" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}
# Summarize current git diff
diff_summary() {
local diff=$(git diff 2>/dev/null)
if [ -z "$diff" ]; then
diff=$(git diff --cached 2>/dev/null)
fi
if [ -z "$diff" ]; then
echo "No diff found"
return 0
fi
# ai reads its prompt from its arguments, not stdin, so build the prompt first
ai "$(printf 'Summarize what changed in this diff in plain English. List the main changes as bullet points:\n\n%s' "$(echo "$diff" | head -c 8000)")"
}
# Generate test for a function (paste function, get tests)
gen_tests() {
echo "Paste your function, then press Ctrl+D:"
local code=$(cat)
curl -s "$OLLAMA_API" \
-d "{\"model\":\"qwen3.6:27b\",\"prompt\":$(printf 'Write comprehensive tests for this function. Cover: happy path, edge cases, error cases. Use the appropriate test framework for the language.\n\n%s' "$code" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0.1,\"num_predict\":3000}}" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}
# Debug with reasoning model
debug() {
echo "Describe the bug (Ctrl+D when done):"
local description=$(cat)
curl -s "$OLLAMA_API" \
-d "{\"model\":\"deepseek-r1:14b\",\"prompt\":$(printf 'Debug this problem systematically. Show your reasoning.\n\n%s' "$description" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0,\"num_predict\":3000}}" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}
# Aliases
alias aii='ai'
# Quote the name so the shell does not treat ? as a glob pattern
alias '?'='cmd'
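Every function above pipes the prompt through `json.dumps` because prompts routinely contain quotes and newlines that would break hand-built JSON. The same request and response handling can be sketched in Python. The helper names are hypothetical, and the `error` key check assumes Ollama reports failures as a JSON object with that field.

```python
import json


def build_generate_payload(model: str, prompt: str,
                           temperature: float = 0.3,
                           num_predict: int = 800) -> str:
    """Serialize a non-streaming /api/generate request body."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "num_predict": num_predict},
    })


def extract_response(body: str) -> str:
    """Pull the generated text out of a response body, or raise on an error."""
    data = json.loads(body)
    if "error" in data:
        raise RuntimeError(f"Ollama returned an error: {data['error']}")
    return data["response"]


if __name__ == "__main__":
    print(build_generate_payload("qwen3:7b", 'List files named "a b".\nOne line only.'))
```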
Part 6: Codebase-Aware RAG for Large Projects
For codebases too large to fit in any context window, build a semantic search index:
#!/usr/bin/env python3
# codebase_rag.py — Semantic codebase search using local embeddings
import ollama
import chromadb
from pathlib import Path
import hashlib
import json
SUPPORTED_EXTENSIONS = {
'.py', '.js', '.ts', '.tsx', '.jsx', '.go', '.rs',
'.java', '.rb', '.cpp', '.c', '.h', '.swift', '.kt',
'.md', '.yml', '.yaml', '.toml', '.sql'
}
def chunk_code_file(content: str, filepath: str,
                    chunk_size: int = 60) -> list[dict]:
    """Split code into overlapping line-based chunks."""
    lines = content.splitlines()
    chunks = []
    # Step by chunk_size - 10 so consecutive chunks share 10 lines
    for i in range(0, len(lines), chunk_size - 10):
        chunk_lines = lines[i:i + chunk_size]
        chunk_text = '\n'.join(chunk_lines)
        if chunk_text.strip():
            chunks.append({
                'text': chunk_text,
                'filepath': filepath,
                'start_line': i + 1,
                'end_line': min(i + chunk_size, len(lines))
            })
    return chunks
def index_codebase(project_root: str,
                   collection_name: str = "codebase"):
    """Index an entire codebase for semantic search."""
    client = chromadb.PersistentClient(path=f"{project_root}/.codebase_index")
    collection = client.get_or_create_collection(
        name=collection_name,
        metadata={"hnsw:space": "cosine"}
    )
    root = Path(project_root)
    ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv',
                   'venv', 'dist', 'build', '.next', 'target',
                   '.codebase_index'}
    files_indexed = 0
    chunks_indexed = 0
    for filepath in root.rglob('*'):
        # Skip ignored directories. Check only the path below the project
        # root, so a parent directory named e.g. "build" does not hide everything.
        if any(part in ignore_dirs
               for part in filepath.relative_to(root).parts):
            continue
        if filepath.suffix not in SUPPORTED_EXTENSIONS:
            continue
        if not filepath.is_file():
            continue
        try:
            content = filepath.read_text(encoding='utf-8', errors='ignore')
        except Exception:
            continue
        if len(content) < 50:  # Skip very small files
            continue
        relative_path = str(filepath.relative_to(root))
        chunks = chunk_code_file(content, relative_path)
        for chunk in chunks:
            # Stable ID, so re-indexing overwrites a chunk instead of duplicating it
            chunk_id = hashlib.md5(
                f"{relative_path}:{chunk['start_line']}:{chunk['text'][:50]}".encode()
            ).hexdigest()
            try:
                embedding = ollama.embeddings(
                    model='nomic-embed-text',
                    prompt=chunk['text']
                )['embedding']
                collection.upsert(
                    ids=[chunk_id],
                    documents=[chunk['text']],
                    embeddings=[embedding],
                    metadatas=[{
                        'filepath': relative_path,
                        'start_line': chunk['start_line'],
                        'end_line': chunk['end_line']
                    }]
                )
                chunks_indexed += 1
            except Exception as e:
                print(f" Error embedding {relative_path}: {e}")
        files_indexed += 1
        if files_indexed % 20 == 0:
            print(f" Indexed {files_indexed} files, {chunks_indexed} chunks...")
    print(f"\n✓ Indexed {files_indexed} files, {chunks_indexed} chunks")
    return collection
def search_codebase(query: str, project_root: str,
                    top_k: int = 7) -> list[dict]:
    """Search the codebase index semantically."""
    client = chromadb.PersistentClient(path=f"{project_root}/.codebase_index")
    collection = client.get_collection("codebase")
    embedding = ollama.embeddings(
        model='nomic-embed-text',
        prompt=query
    )['embedding']
    results = collection.query(
        query_embeddings=[embedding],
        n_results=top_k
    )
    return [
        {
            'code': doc,
            'filepath': meta['filepath'],
            'lines': f"{meta['start_line']}-{meta['end_line']}",
            # The collection uses cosine distance, so 1 - distance is the similarity
            'relevance': 1 - dist
        }
        for doc, meta, dist in zip(
            results['documents'][0],
            results['metadatas'][0],
            results['distances'][0]
        )
    ]
def ask_codebase(question: str, project_root: str) -> str:
    """Answer a question about the codebase using RAG."""
    results = search_codebase(question, project_root)
    context_parts = []
    for r in results:
        context_parts.append(
            f"// {r['filepath']} (lines {r['lines']})\n{r['code']}"
        )
    context = "\n\n---\n\n".join(context_parts)
    response = ollama.generate(
        model='llama4:scout',
        prompt=f"""Answer this question about the codebase using the provided code excerpts.
Cite specific files and line numbers in your answer.
If the code excerpts don't contain enough information, say so.
Question: {question}
Relevant code:
{context}
Answer:""",
        options={'temperature': 0.1, 'num_ctx': 32768, 'num_predict': 2000}
    )
    return response['response']
if __name__ == '__main__':
    import sys

    def usage():
        print("Usage:")
        print(" python codebase_rag.py index [project_root]")
        print(" python codebase_rag.py ask 'your question' [project_root]")
        sys.exit(1)

    if len(sys.argv) < 2:
        usage()
    command = sys.argv[1]
    if command == 'index':
        # For index, the project root is the first argument after the command
        project_root = sys.argv[2] if len(sys.argv) > 2 else '.'
        index_codebase(project_root)
    elif command == 'ask' and len(sys.argv) > 2:
        question = sys.argv[2]
        project_root = sys.argv[3] if len(sys.argv) > 3 else '.'
        print(ask_codebase(question, project_root))
    else:
        usage()
# Index your codebase (run once, then update incrementally)
python codebase_rag.py index /path/to/project
# Ask questions about the codebase
python codebase_rag.py ask "How does the authentication middleware work?"
python codebase_rag.py ask "Where are database connections opened and closed?"
python codebase_rag.py ask "What happens when a payment fails?"
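The script above re-embeds every file on each run. One way to make the "update incrementally" step concrete is to keep a manifest of content hashes and re-index only files whose hash changed. The sketch below shows the comparison. The function names and the idea of storing the manifest as JSON next to the index are assumptions, not part of codebase_rag.py.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path, extensions: set[str]) -> dict[str, str]:
    """Map each matching file (relative path) to the SHA-256 of its contents."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in extensions:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result


def diff_manifest(previous: dict[str, str],
                  current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (new or modified files, removed files) between two snapshots."""
    changed = sorted(p for p, d in current.items() if previous.get(p) != d)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed


if __name__ == "__main__":
    current = snapshot(Path("."), {".py", ".md"})
    changed, removed = diff_manifest({}, current)
    print(f"{len(changed)} files to (re)index, {len(removed)} to drop")
```

Files in the first list would be chunked and upserted again. Chunks belonging to changed or removed files would also need deleting from the collection first, since a file that shrinks leaves stale chunks behind.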
Part 7: Measuring the Impact
A professional setup deserves measurement. Track these to prove the value:
#!/usr/bin/env python3
# ai_metrics.py — Track AI coding tool effectiveness
import json
import time
from pathlib import Path
from datetime import datetime, timedelta
from collections import defaultdict
METRICS_FILE = Path.home() / '.ai_coding_metrics.json'
def log_ai_interaction(task_type: str, time_saved_minutes: int,
                       quality_improvement: str,
                       model_used: str):
    """Log an AI-assisted coding interaction."""
    metrics = []
    if METRICS_FILE.exists():
        metrics = json.loads(METRICS_FILE.read_text())
    metrics.append({
        'timestamp': datetime.now().isoformat(),
        'task_type': task_type,
        'time_saved_minutes': time_saved_minutes,
        'quality_improvement': quality_improvement,
        'model_used': model_used
    })
    METRICS_FILE.write_text(json.dumps(metrics, indent=2))
    print(f"✓ Logged: {task_type} - {time_saved_minutes} min saved")
def generate_weekly_report():
    """Generate a weekly productivity report."""
    if not METRICS_FILE.exists():
        print("No metrics recorded yet.")
        return
    metrics = json.loads(METRICS_FILE.read_text())
    one_week_ago = datetime.now() - timedelta(days=7)
    weekly = [m for m in metrics
              if datetime.fromisoformat(m['timestamp']) > one_week_ago]
    if not weekly:
        print("No metrics in the last 7 days.")
        return
    total_time = sum(m['time_saved_minutes'] for m in weekly)
    by_task = defaultdict(lambda: {'count': 0, 'time': 0})
    by_model = defaultdict(int)
    for m in weekly:
        by_task[m['task_type']]['count'] += 1
        by_task[m['task_type']]['time'] += m['time_saved_minutes']
        by_model[m['model_used']] += 1
    print("\n=== WEEKLY AI CODING METRICS ===")
    print(f"Period: {one_week_ago.date()} to {datetime.now().date()}")
    print(f"\nTotal time saved: {total_time} minutes ({total_time/60:.1f} hours)")
    print(f"Total interactions: {len(weekly)}")
    print("\nBy task type:")
    # Largest time savings first
    for task, data in sorted(by_task.items(),
                             key=lambda x: x[1]['time'], reverse=True):
        print(f" {task}: {data['count']} interactions, "
              f"{data['time']} min saved")
    print("\nBy model:")
    for model, count in sorted(by_model.items(),
                               key=lambda x: x[1], reverse=True):
        print(f" {model}: {count} interactions")
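if __name__ == '__main__':
    import sys
    command = sys.argv[1] if len(sys.argv) > 1 else ''
    if command == 'log' and len(sys.argv) > 3:
        log_ai_interaction(
            task_type=sys.argv[2],
            time_saved_minutes=int(sys.argv[3]),
            quality_improvement=sys.argv[4] if len(sys.argv) > 4 else 'unknown',
            model_used=sys.argv[5] if len(sys.argv) > 5 else 'unknown'
        )
    elif command == 'report':
        generate_weekly_report()
    else:
        # Missing or unknown arguments: show usage instead of raising IndexError
        print("Usage:")
        print(" python ai_metrics.py log <task_type> <minutes_saved> [quality_note] [model]")
        print(" python ai_metrics.py report")
        sys.exit(1)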
# Log interactions throughout your day
python ai_metrics.py log "code_review" 20 "caught 2 security issues" "qwen3.6:27b"
python ai_metrics.py log "test_generation" 30 "full coverage added" "qwen3.6:27b"
python ai_metrics.py log "debugging" 45 "root cause found faster" "deepseek-r1:14b"
python ai_metrics.py log "commit_message" 5 "conventional format" "llama4:scout"
# Weekly summary
python ai_metrics.py report
Quick Reference: Which Model for Which Task
| Task | Model | Command |
|---|---|---|
| Autocomplete | qwen3:7b | Auto via Continue.dev |
| Code review | PythonPro / TypeScriptPro / GoPro | /review in Continue |
| Debugging (hard) | deepseek-r1:14b | debug shell function |
| Multi-file refactor | qwen3.6:27b via Aider | aider in project root |
| Test generation | PythonPro or stack model | /test in Continue |
| Documentation | llama4:scout | /doc in Continue |
| Security review | qwen3.6:27b | /secure in Continue |
| Shell commands | qwen3:7b | cmd "task description" |
| Codebase Q&A | llama4:scout + RAG | python codebase_rag.py ask |
| Commit message | llama4:scout | Auto via git hook |
Conclusion
This setup takes about two hours to fully configure. What you get in return:
- Every code change reviewed by a model that knows your stack’s specific standards
- Autocomplete that understands your codebase semantically via embeddings
- A reasoning model available for hard debugging without any cloud call
- Autonomous multi-file coding assistance via Aider for complex refactoring
- Git hooks that enforce quality automatically without manual intervention
- Shell tools that make the terminal AI-native
- Zero code leaving your machine, zero API costs, zero usage limits
The models are good enough that the quality difference from cloud AI tools is marginal for most professional coding tasks. The privacy, cost, and control advantages are absolute.
Your next step: Install Continue.dev, copy the config.json from Part 1, and install your stack-specific Modelfile. Use the /review slash command on the next file you write. The gap between that experience and what you get from generic base models is the immediate value of this setup.
📚 Related Posts in This Blog:
- Ollama Masterclass 2026 — Install Ollama and run your first model
- Qwen3 and Kimi K2.6: Best Local Coding Models — Deep dive on the coding models
- DeepSeek-R1 Locally: Best Reasoning Model — Debugging with chain-of-thought
- The Modelfile: Customize Any Local Model — Full Modelfile reference
- Building AI Apps With Ollama and Python — Production applications
Last updated: May 2026. Continue.dev, Aider, and Ollama all release updates frequently. Verify current configuration syntax at docs.continue.dev and aider.chat/docs.
⚠️ The pre-commit hook blocks commits containing Critical issues. Test it on a safe branch before enabling in a production repository. Use git commit --no-verify to bypass if the hook blocks a legitimate commit.