Local AI Coding: The Advanced Setup Guide for Professional Developers in 2026

Most guides to local AI coding stop at “install Ollama, install Continue.dev, done.” That setup gives you a local autocomplete. It does not give you a professional-grade AI development environment.

A professional local AI coding environment is different in degree and in kind. It does not just suggest the next line — it reviews the diff before every commit, generates test suites from your specs, debugs with a reasoning model that shows its work, maintains a semantic index of your entire codebase so every AI interaction is codebase-aware, runs autonomous multi-file refactoring sessions overnight, and adapts to your specific language stack through purpose-built Modelfiles.

This guide builds that environment step by step. It assumes you are already a professional developer — comfortable with the terminal, familiar with Python and shell scripting, and done with toy examples. Every step is production-oriented, everything stays on your hardware, and by the end you will have an AI-augmented development workflow that replaces the most repetitive parts of your professional work without sending a single line of code to an external server.

🔗 For Ollama installation, see Ollama Masterclass 2026. For the Ollama API reference, see The Ollama API.

Prerequisites and Model Selection

Models to Pull Before Starting

This guide uses five models with distinct roles. Pull all of them:

# Primary coding model — review, generation, architecture
ollama pull qwen3.6:27b

# Fast autocomplete — sub-200ms suggestions in your editor
ollama pull qwen3:7b

# Reasoning and debugging — shows thinking trace for hard problems
ollama pull deepseek-r1:14b

# General assistant — documentation, explanation, planning
ollama pull llama4:scout

# Embeddings — codebase semantic search
ollama pull nomic-embed-text

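Hardware minimum for this guide: 24GB of VRAM (RTX 3090 or RTX 4090) or an Apple Silicon Mac with 36GB of unified memory (M3 Pro 36GB). The guide works at 12GB VRAM by substituting qwen3:14b for qwen3.6:27b. llama4:scout is the exception: it is a mixture-of-experts model with 109B total parameters (17B active), so its download is far larger than any of these memory budgets and Ollama will run much of it from system RAM, noticeably slower than the other models.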
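A quick way to sanity-check hardware numbers is to estimate how much memory a model's weights need at a given quantization. The sketch below assumes about 4.5 bits per weight, a rough figure for common 4-bit quantizations (the real average varies by quantization type), and ignores the KV cache, which grows with num_ctx and comes on top of the weights:

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: ~4.5 bits per weight for a typical 4-bit quantization;
# the KV cache for long contexts (e.g. num_ctx 32768) is NOT included.


def weight_memory_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Return the approximate weight size in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for label, params in [("27B", 27), ("14B", 14), ("7B", 7)]:
        print(f"{label}: ~{weight_memory_gb(params):.1f} GB of weights")
```

For the 27B model this works out to roughly 15GB for the weights alone, before any context is allocated.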

Verify Everything Is Running

# Ollama API health check
curl -s http://localhost:11434/api/tags | python3 -c \
  "import json,sys; models=[m['name'] for m in json.load(sys.stdin)['models']]; \
  [print(f'  ✓ {m}') for m in models]"

# Expected output (order may differ; untagged models are listed as :latest):
#   ✓ qwen3.6:27b
#   ✓ qwen3:7b
#   ✓ deepseek-r1:14b
#   ✓ llama4:scout
#   ✓ nomic-embed-text:latest
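The same check can be scripted so it reports exactly which of the five models are missing. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and it assumes the /api/tags response shape used above (a models list whose entries carry a name field). Untagged pulls such as nomic-embed-text are reported with a :latest suffix, so names are normalized before comparing:

```python
# Report which of the guide's models are missing from the local Ollama install.
import json
import urllib.request

REQUIRED = ["qwen3.6:27b", "qwen3:7b", "deepseek-r1:14b",
            "llama4:scout", "nomic-embed-text"]


def normalize(name: str) -> str:
    """Ollama lists untagged pulls as '<name>:latest'."""
    return name if ":" in name else f"{name}:latest"


def missing_models(installed: list[str], required: list[str]) -> list[str]:
    """Return the required models that are not installed, in order."""
    have = {normalize(n) for n in installed}
    return [m for m in required if normalize(m) not in have]


def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Fetch model names from GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return [m["name"] for m in json.load(resp)["models"]]


if __name__ == "__main__":
    try:
        missing = missing_models(installed_models(), REQUIRED)
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"Ollama not reachable: {exc}")
    else:
        if missing:
            print("Missing: " + ", ".join(missing))
            print("Pull with: " + " && ".join(f"ollama pull {m}" for m in missing))
        else:
            print("All required models are installed.")
```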

Part 1: Continue.dev — Advanced Configuration

Installation

VS Code:

Extensions (Ctrl+Shift+X) → search "Continue" → Install

JetBrains (IntelliJ, PyCharm, GoLand, etc.):

Settings → Plugins → Marketplace → "Continue" → Install → Restart IDE

Advanced Config: `~/.continue/config.json`

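This configuration goes well beyond the defaults: it sets up per-task model routing, codebase indexing, custom slash commands, and tight context controls. It uses the JSON format; recent Continue releases default to ~/.continue/config.yaml and treat config.json as deprecated, so check which file your installed version reads before copying it: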

{
  "models": [
    {
      "title": "Qwen Coder 27B",
      "provider": "ollama",
      "model": "qwen3.6:27b",
      "apiBase": "http://localhost:11434",
      "contextLength": 32768,
      "completionOptions": {
        "temperature": 0.05,
        "maxTokens": 8192,
        "topP": 0.9
      },
      "systemMessage": "You are an expert software engineer. Write production-ready code with error handling, type annotations, and concise comments on non-obvious logic only. Never write placeholder or TODO code unless explicitly asked."
    },
    {
      "title": "DeepSeek R1 Debugger",
      "provider": "ollama",
      "model": "deepseek-r1:14b",
      "apiBase": "http://localhost:11434",
      "contextLength": 16384,
      "completionOptions": {
        "temperature": 0.0,
        "maxTokens": 4096
      },
      "systemMessage": "You are a systematic debugger. Think through problems step by step before proposing a fix. Identify all plausible root causes before recommending one. Show your reasoning."
    },
    {
      "title": "Llama 4 Scout (Docs)",
      "provider": "ollama",
      "model": "llama4:scout",
      "apiBase": "http://localhost:11434",
      "contextLength": 32768,
      "completionOptions": {
        "temperature": 0.3,
        "maxTokens": 4096
      }
    }
  ],

  "tabAutocompleteModel": {
    "title": "Qwen 3 7B (Autocomplete)",
    "provider": "ollama",
    "model": "qwen3:7b",
    "apiBase": "http://localhost:11434",
    "completionOptions": {
      "temperature": 0.0,
      "maxTokens": 256,
      "stop": ["\n\n", "def ", "class ", "# ---"]
    }
  },

  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://localhost:11434"
  },

  "reranker": {
    "name": "llm",
    "params": {
      "modelTitle": "Qwen Coder 27B"
    }
  },

  "contextProviders": [
    { "name": "code" },
    { "name": "docs" },
    { "name": "diff" },
    { "name": "terminal" },
    { "name": "problems" },
    { "name": "folder" },
    { "name": "codebase" },
    { "name": "open" },
    { "name": "os" }
  ],

  "customCommands": [
    {
      "name": "review",
      "description": "Review selected code for issues",
      "prompt": "Review this code. Grade issues as [Critical/High/Medium/Low]. For each issue: location, problem, why it matters, specific fix. Also list 2-3 strengths. Be direct."
    },
    {
      "name": "test",
      "description": "Generate comprehensive tests",
      "prompt": "Write a comprehensive test suite for this code. Cover: happy path, edge cases (empty, null, boundary values), error cases, any concurrency issues. Use the project's existing test framework. Tests must be independent — no shared mutable state."
    },
    {
      "name": "doc",
      "description": "Write documentation",
      "prompt": "Write complete documentation for this code. Include: what it does (not what it is), parameters with types and constraints, return values, exceptions raised, one complete usage example. Active voice, present tense."
    },
    {
      "name": "perf",
      "description": "Analyze performance",
      "prompt": "Analyze this code for performance. Identify: algorithmic complexity (Big O), memory allocation patterns, I/O bottlenecks, and any N+1 query problems. Prioritize issues by impact. Suggest specific optimizations with expected improvements."
    },
    {
      "name": "secure",
      "description": "Security review",
      "prompt": "Review this code for security vulnerabilities. Check for: injection vulnerabilities (SQL, command, LDAP), authentication/authorization flaws, sensitive data exposure, insecure deserialization, SSRF, path traversal, cryptographic issues. Rate each finding: Critical/High/Medium/Low."
    },
    {
      "name": "refactor",
      "description": "Suggest refactoring",
      "prompt": "Identify refactoring opportunities in this code. Focus on: extracting reusable functions, eliminating code duplication, improving naming clarity, reducing function complexity. For each suggestion, show the before and after."
    },
    {
      "name": "explain",
      "description": "Explain code in depth",
      "prompt": "Explain this code in depth. Cover: what problem it solves, the algorithm/approach used, non-obvious implementation decisions, potential failure modes, and what I should know to modify it safely."
    },
    {
      "name": "commit",
      "description": "Generate commit message",
      "prompt": "Write a git commit message for this diff. Subject line: imperative mood, under 72 characters. Then a blank line, then 2-3 sentences explaining WHY these changes were made (not just what changed). Follow Conventional Commits format if the project uses it."
    }
  ],

  "tabAutocompleteOptions": {
    "disable": false,
    "useCopyBuffer": true,
    "useFileSuffix": true,
    "maxPromptTokens": 2048,
    "debounceDelay": 250,
    "maxSuffixPercentage": 0.25,
    "prefixPercentage": 0.85,
    "multilineCompletions": "auto"
  },

  "ui": {
    "codeBlockToolbarPosition": "top",
    "fontSize": 14,
    "displayRawMarkdown": false
  }
}
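Whether the autocomplete model actually stays near the sub-200ms target depends on your hardware. One way to check, sketched below with hypothetical helper names, is to time a short request directly against Ollama: the non-streaming /api/generate response includes total_duration, load_duration, eval_count, and eval_duration, with durations in nanoseconds. This approximates, but is not identical to, the fill-in-the-middle requests Continue sends:

```python
# Time a short, autocomplete-sized request against Ollama.
import json
import urllib.request


def summarize_timings(resp: dict) -> dict:
    """Convert Ollama's nanosecond timing fields to ms and tokens/second."""
    eval_s = resp.get("eval_duration", 0) / 1e9
    tokens = resp.get("eval_count", 0)
    return {
        "total_ms": resp.get("total_duration", 0) / 1e6,
        "load_ms": resp.get("load_duration", 0) / 1e6,
        "tokens": tokens,
        "tokens_per_s": tokens / eval_s if eval_s else 0.0,
    }


def time_request(model: str, prompt: str,
                 url: str = "http://localhost:11434/api/generate") -> dict:
    payload = {"model": model, "prompt": prompt, "stream": False,
               "options": {"temperature": 0, "num_predict": 64}}
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return summarize_timings(json.load(resp))


if __name__ == "__main__":
    try:
        for attempt in (1, 2):  # the first call may include model load time
            print(attempt, time_request("qwen3:7b", "def fibonacci(n):"))
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"Ollama not reachable: {exc}")
```

Look at the second result: the first call usually includes a large load_ms while the model is loaded into memory.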

Codebase Indexing: Making Every Query Context-Aware

Continue.dev indexes your entire codebase using the nomic-embed-text embeddings model. This happens automatically when you open a folder. To trigger or refresh manually:

Ctrl+Shift+P → "Continue: Index codebase"

After indexing, @codebase queries search across all your code semantically:

@codebase Where is user authentication handled?
@codebase Which functions write to the database?
@codebase Show me how we handle rate limiting across the API
@codebase What tests exist for the payment module?

The index is stored in ~/.continue/index — it persists between sessions and updates incrementally as you edit files.
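Conceptually, an @codebase query embeds your question with the same model used for the index and returns the code chunks whose vectors are closest to it. The toy sketch below shows that core idea with cosine similarity over hand-made 3-dimensional vectors (real nomic-embed-text vectors have 768 dimensions). Continue's actual retrieval pipeline may combine additional signals and the reranker configured above, so treat this as an illustration, not its implementation:

```python
# Toy illustration of embedding search: rank chunks by cosine similarity.
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: dict[str, list[float]],
          k: int = 3) -> list[tuple[str, float]]:
    """Return (chunk_id, score) pairs, best match first."""
    scored = [(cid, cosine(query, vec)) for cid, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hand-made 3-dimensional vectors standing in for real embeddings
    chunks = {
        "auth/login.py:1-60": [0.9, 0.1, 0.0],
        "db/pool.py:1-60": [0.1, 0.9, 0.2],
        "api/rate_limit.py:1-60": [0.2, 0.3, 0.9],
    }
    query = [0.8, 0.2, 0.1]  # pretend embedding of "where is login handled?"
    for chunk_id, score in top_k(query, chunks, k=2):
        print(f"{score:.3f}  {chunk_id}")
```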

Part 2: Per-Stack Modelfiles

Generic models give generic results. These Modelfiles encode the specific conventions, patterns, and review criteria for each major stack.

Python / FastAPI / SQLAlchemy Stack

mkdir -p ~/.config/ollama/modelfiles
cat > ~/.config/ollama/modelfiles/PythonPro.Modelfile << 'EOF'
FROM qwen3.6:27b

SYSTEM """You are a senior Python engineer specializing in FastAPI, SQLAlchemy 2.0, and Pydantic v2.

CODING STANDARDS:
- Python 3.11+ with strict type hints everywhere
- Async/await for all I/O-bound operations — never sync in async context
- Pydantic v2 models for ALL request/response schemas, no raw dicts
- SQLAlchemy 2.0 style: select() statements, not Query API
- Repository pattern for database access — no raw SQL in route handlers
- Dependency injection for all shared resources (DB sessions, services)

SQLALCHEMY PATTERNS:
- Always use select() with explicit columns when performance matters
- Eager-load with selectinload/joinedload; never lazy-load in an async context
- Use db.scalar() for single values, db.scalars() for lists
- Explicit transactions: async with db.begin(): for write operations

FASTAPI PATTERNS:
- HTTPException with specific status codes, never generic 500
- Response models on all endpoints — no Any return types
- Background tasks for operations not needed in response
- Lifespan context manager for startup/shutdown, not @app.on_event

ERROR HANDLING:
- Custom exception classes that map to HTTP responses
- Log errors with structlog, include request_id in every log
- Never expose internal errors to clients

REVIEW CHECKLIST (always apply):
□ N+1 queries (missing eager loading)
□ Missing input validation (beyond Pydantic)
□ SQL injection risk (raw queries with user input)
□ Missing authentication/authorization checks
□ Unhandled edge cases (empty results, None values)
□ Missing index on filtered/joined columns"""

PARAMETER temperature 0.05
PARAMETER num_ctx 32768
PARAMETER num_predict 6000
EOF

ollama create PythonPro -f ~/.config/ollama/modelfiles/PythonPro.Modelfile

TypeScript / Node / React Stack

cat > ~/.config/ollama/modelfiles/TypeScriptPro.Modelfile << 'EOF'
FROM qwen3.6:27b

SYSTEM """You are a senior TypeScript engineer working across Node.js backend and React frontend.

TYPESCRIPT STANDARDS:
- Strict mode always: "strict": true in tsconfig
- No implicit any — every variable has an explicit or inferred type
- Discriminated unions over optional properties for state modeling
- Zod for runtime validation of all external data (API responses, env vars, user input)
- Exhaustiveness checking with never type for switch statements

REACT STANDARDS:
- Functional components only, no class components
- Custom hooks for ALL reusable stateful logic
- React Query for server state (never useState + useEffect for data fetching)
- Zustand for client state that crosses component boundaries
- useMemo/useCallback only when profiler shows a problem — not preemptively
- Error boundaries at route and major component boundaries
- Accessibility: aria-label on interactive elements, keyboard navigation

NODE.JS BACKEND:
- Express or Fastify with typed request/response
- Middleware for cross-cutting concerns (auth, logging, rate limiting)
- Environment variables validated with Zod at startup — fail fast
- Structured logging with pino, never console.log in production
- Graceful shutdown handling for SIGTERM/SIGINT

NAMING CONVENTIONS:
- PascalCase: Components, Types, Interfaces, Enums
- camelCase: variables, functions, methods
- SCREAMING_SNAKE: constants
- Named exports everywhere (no default exports except Next.js pages)
- Boolean variables: is/has/can/should prefix

REVIEW PRIORITIES:
□ Type safety gaps (any, unknown without narrowing)
□ Missing error boundaries
□ React hook dependency array issues
□ N+1 API calls (missing parallel fetching with Promise.all)
□ XSS risks (dangerouslySetInnerHTML)
□ Missing loading/error states"""

PARAMETER temperature 0.05
PARAMETER num_ctx 32768
PARAMETER num_predict 6000
EOF

ollama create TypeScriptPro -f ~/.config/ollama/modelfiles/TypeScriptPro.Modelfile

Go Stack

cat > ~/.config/ollama/modelfiles/GoPro.Modelfile << 'EOF'
FROM qwen3.6:27b

SYSTEM """You are a senior Go engineer.

GO STANDARDS:
- Follow Effective Go and the Google Go style guide
- Errors are values — handle every error, never ignore with _
- Error wrapping: fmt.Errorf("context: %w", err) for error chains
- Context propagation: context.Context is always the first parameter
- Interfaces defined at the point of use (consumer side), not implementation side
- Goroutine leaks: always cancel context or close channel to stop goroutines
- sync.WaitGroup or errgroup for goroutine coordination

CONCURRENCY PATTERNS:
- Channels for communication, mutexes for shared state — not both
- Select with default for non-blocking channel operations
- Always defer cancel() immediately after context.WithCancel/Timeout
- Use sync.Once for expensive one-time initialization
- atomic package for simple counters, not mutex

PERFORMANCE:
- Avoid allocations in hot paths: reuse buffers with sync.Pool
- strings.Builder over += concatenation in loops
- Benchmark before optimizing: go test -bench=.
- Profile before claiming something is slow

TESTING:
- Table-driven tests for all non-trivial functions
- testify/assert for assertions
- httptest.NewRecorder for HTTP handler tests
- Interfaces for external dependencies — mock with testify/mock
- go test -race on every CI run

ERROR HANDLING (Go-specific):
- Sentinel errors: var ErrNotFound = errors.New("not found")
- Error types for errors with data: type ValidationError struct{}
- errors.Is() and errors.As() for checking wrapped errors"""

PARAMETER temperature 0.05
PARAMETER num_ctx 32768
PARAMETER num_predict 6000
EOF

ollama create GoPro -f ~/.config/ollama/modelfiles/GoPro.Modelfile
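The three Modelfiles repeat the same FROM and PARAMETER lines. If you maintain more stacks, a small generator keeps them consistent. This is a hypothetical helper (render_modelfile and write_modelfile are illustrative names) that emits the same directives used above: FROM, a triple-quoted SYSTEM block, and one PARAMETER line per setting:

```python
# Hypothetical generator for stack Modelfiles that share base and parameters.
from pathlib import Path

SHARED_PARAMS = {"temperature": 0.05, "num_ctx": 32768, "num_predict": 6000}


def render_modelfile(base: str, system: str, params: dict) -> str:
    """Build Modelfile text: FROM, a triple-quoted SYSTEM block, PARAMETERs."""
    if '"""' in system:
        raise ValueError('system prompt must not contain """')
    lines = [f"FROM {base}", "", f'SYSTEM """{system}"""', ""]
    lines += [f"PARAMETER {key} {value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


def write_modelfile(name: str, text: str, directory: Path) -> Path:
    """Write <directory>/<name>.Modelfile, creating the directory if needed."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.Modelfile"
    path.write_text(text)
    return path


if __name__ == "__main__":
    print(render_modelfile("qwen3.6:27b", "You are a senior engineer.", SHARED_PARAMS))
```

After writing a file, register it as before, for example with subprocess.run(["ollama", "create", "PythonPro", "-f", str(path)], check=True).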

Add Stack Models to Continue.dev

// Add to models array in ~/.continue/config.json
{
  "title": "Python Pro",
  "provider": "ollama",
  "model": "PythonPro",
  "apiBase": "http://localhost:11434",
  "contextLength": 32768,
  "completionOptions": { "temperature": 0.05, "maxTokens": 6000 }
},
{
  "title": "TypeScript Pro",
  "provider": "ollama",
  "model": "TypeScriptPro",
  "apiBase": "http://localhost:11434",
  "contextLength": 32768,
  "completionOptions": { "temperature": 0.05, "maxTokens": 6000 }
},
{
  "title": "Go Pro",
  "provider": "ollama",
  "model": "GoPro",
  "apiBase": "http://localhost:11434",
  "contextLength": 32768,
  "completionOptions": { "temperature": 0.05, "maxTokens": 6000 }
}
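Rather than pasting the fragment above by hand (it is not valid JSON on its own), a short script can merge the entries into the models array. This sketch assumes config.json is plain JSON without comments and matches entries by title, so re-running it replaces the stack models instead of duplicating them; the helper names are hypothetical:

```python
# Merge the stack model entries into Continue's config.json by title.
import json
from pathlib import Path

STACK_MODELS = ["PythonPro", "TypeScriptPro", "GoPro"]


def stack_entry(model: str) -> dict:
    """Build an entry like the fragment above ("PythonPro" -> "Python Pro")."""
    return {
        "title": model.replace("Pro", " Pro"),
        "provider": "ollama",
        "model": model,
        "apiBase": "http://localhost:11434",
        "contextLength": 32768,
        "completionOptions": {"temperature": 0.05, "maxTokens": 6000},
    }


def merge_models(config: dict, entries: list[dict]) -> dict:
    """Drop entries whose title matches a new entry, then append the new ones."""
    titles = {e["title"] for e in entries}
    kept = [m for m in config.get("models", []) if m.get("title") not in titles]
    config["models"] = kept + entries
    return config


def update_config(path: Path) -> None:
    config = json.loads(path.read_text())
    merge_models(config, [stack_entry(m) for m in STACK_MODELS])
    path.write_text(json.dumps(config, indent=2))
```

Call it as update_config(Path.home() / ".continue" / "config.json"), ideally after backing up the file.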

Part 3: Aider — Autonomous Multi-File Coding

Aider is a terminal coding assistant that works inside your git repository: it builds a map of your codebase, plans coordinated changes across multiple files, and commits each change with a descriptive message. It is the closest local equivalent to cloud-based autonomous coding agents.

Installation and Project Configuration

pip install aider-chat

# Per-project config (commit this to your repo)
cat > .aider.conf.yml << 'EOF'
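model: ollama_chat/qwen3.6:27b
# ollama_chat/ models reach Ollama via OLLAMA_API_BASE; the openai-api-*
# settings only apply to openai/ models
set-env:
  - OLLAMA_API_BASE=http://localhost:11434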

# Aider behavior
auto-commits: true
dirty-commits: true
show-diffs: true
stream: true
pretty: true

# Editor (opens diffs)
editor: code --wait

# Files Aider reads for context (always in memory)
read:
  - README.md
  - ARCHITECTURE.md
  - docs/conventions.md
EOF

The Aider Workflow for Professional Tasks

# Start Aider — it reads your git history and file tree
cd my-project
aider

# --- TASK 1: Targeted bug fix ---
> The connection pool in src/database/pool.py has a memory leak
> when connections fail to close on exception. Fix it.
# Aider: reads pool.py, identifies the issue, implements the fix,
# runs your tests (when test-cmd and auto-test are set), commits with a descriptive message

# --- TASK 2: Feature implementation ---
> Add pagination to the GET /users endpoint in api/routes/users.py
> Use cursor-based pagination matching the pattern in api/routes/products.py
# Aider: reads both files, implements consistent pagination, updates tests

# --- TASK 3: Multi-file refactoring ---
> Rename UserService to AccountService throughout the codebase
> Update all imports, tests, and documentation references
# Aider: finds all references, renames systematically, commits

# --- TASK 4: Test generation ---
> Generate comprehensive tests for src/services/payment_processor.py
> Focus on: all payment provider error states, concurrent payment requests,
> idempotency key handling, and partial failure scenarios
# Aider: reads the service and writes tests; with test-cmd and auto-test set, it also runs them and tries to fix failures

# --- Useful Aider commands ---
/add src/models/user.py src/services/auth.py  # Add files to context
/drop src/old_module.py                        # Remove from context
/ls                                            # Show files in context
/diff                                          # Show current diff
/undo                                          # Undo last commit
/run pytest tests/unit/                        # Run tests from within Aider
/ask How does the authentication flow work?    # Ask without making changes

Using DeepSeek-R1 for Hard Debugging With Aider

Swap to the reasoning model for bugs that require systematic analysis:

# Start Aider with the reasoning model
OLLAMA_API_BASE=http://localhost:11434 aider \
  --model ollama_chat/deepseek-r1:14b

# DeepSeek-R1 shows its thinking trace before making changes
> The race condition in src/cache/distributed_lock.py appears
> intermittently under high concurrent load. Analyze systematically
> and fix the root cause, not the symptom.
# R1 will show <think>...</think> reasoning through the concurrency
# model before proposing and implementing the fix
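Outside Aider, for example when calling R1 through the raw API as the debug() shell helper in Part 5 does, you may want to separate the reasoning trace from the final answer. This sketch assumes the trace arrives inline as <think>...</think> in the response text; newer Ollama versions can instead return it in a separate thinking field when thinking is requested explicitly. split_reasoning is a hypothetical helper name:

```python
# Split an R1 response into its reasoning trace and final answer.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is '' when no trace is present."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>Two writers can both see the lock as free...</think>\nAcquire the lock atomically."
    reasoning, answer = split_reasoning(raw)
    print("REASONING:", reasoning)
    print("ANSWER:", answer)
```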

Part 4: Git Hooks for AI-Powered Quality Gates

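These git hooks run automatically on every commit or merge, turning AI review into a default step in your commit workflow. Hooks live in .git/hooks, which git does not version, so each clone needs its own copy (or a shared hooks directory configured with git config core.hooksPath).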

Pre-Commit AI Review Hook

cat > .git/hooks/pre-commit << 'HOOK'
#!/bin/bash
# AI-powered pre-commit review
# Fails the commit if Critical issues are found

set -e

STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACMR | \
  grep -E '\.(py|js|ts|go|rs|java|rb|cpp|c)$' || true)

if [ -z "$STAGED_FILES" ]; then
  exit 0
fi

echo "🤖 Running AI code review..."

DIFF=$(git diff --cached -- $STAGED_FILES)

if [ ${#DIFF} -gt 15000 ]; then
  echo "  ℹ️  Diff too large for pre-commit review — skipping"
  exit 0
fi

# Build the JSON payload in Python from stdin, so quotes, backslashes and
# dollar signs in the diff cannot break the request; if Ollama is not
# reachable, skip the review instead of blocking the commit
REVIEW=$(printf '%s' "$DIFF" | python3 -c '
import json, sys
diff = sys.stdin.buffer.read().decode("utf-8", "replace")
prompt = """Review this git diff for critical issues only.

REPORT ONLY:
[Critical] - Security vulnerabilities, data corruption risk, auth bypass
[High] - Crashes under load, missing error handling in critical paths

DO NOT REPORT: style issues, minor improvements, medium/low priority items.
If no Critical or High issues found, respond ONLY with: APPROVED

Diff:
""" + diff
print(json.dumps({"model": "qwen3.6:27b", "prompt": prompt, "stream": False,
                  "options": {"temperature": 0, "num_predict": 800}}))
' | curl -sf http://localhost:11434/api/generate --data-binary @- \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"])') || {
  echo "  ⚠️  Ollama not reachable, skipping AI review"
  exit 0
}

echo "$REVIEW"

if echo "$REVIEW" | grep -q "\[Critical\]"; then
  echo ""
  echo "❌ Commit blocked: Critical issues found."
  echo "   Fix the issues above or use 'git commit --no-verify' to bypass."
  exit 1
fi

if echo "$REVIEW" | grep -q "APPROVED"; then
  echo "  ✅ AI review passed"
else
  echo "  ⚠️  High priority issues found — review before proceeding"
  echo "  Use 'git commit --no-verify' to bypass if issues are acceptable"
  # Git runs hooks without a terminal on stdin, so read the answer from /dev/tty
  read -p "  Continue with commit? [y/N] " yn < /dev/tty
  case $yn in
    [Yy]*) exit 0;;
    *) exit 1;;
  esac
fi
HOOK
chmod +x .git/hooks/pre-commit
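The hook's pass/block logic is easy to get subtly wrong, so it can help to keep a testable copy of it. The sketch below mirrors the decision rules of the hook above ("[Critical]" blocks, "APPROVED" passes, anything else asks for confirmation) and builds the request body with json.dumps, which escapes any diff content safely; build_review_payload and review_verdict are hypothetical names:

```python
# Testable copy of the pre-commit gate: payload construction and verdict.
import json

REVIEW_PROMPT = (
    "Review this git diff for critical issues only.\n\n"
    "REPORT ONLY:\n"
    "[Critical] - Security vulnerabilities, data corruption risk, auth bypass\n"
    "[High] - Crashes under load, missing error handling in critical paths\n\n"
    "DO NOT REPORT: style issues, minor improvements, medium/low priority items.\n"
    "If no Critical or High issues found, respond ONLY with: APPROVED\n\n"
    "Diff:\n"
)


def build_review_payload(diff: str, model: str = "qwen3.6:27b") -> str:
    """Serialize the request; json.dumps escapes quotes and backslashes."""
    return json.dumps({"model": model, "prompt": REVIEW_PROMPT + diff,
                       "stream": False,
                       "options": {"temperature": 0, "num_predict": 800}})


def review_verdict(review: str) -> str:
    """Return 'block', 'pass', or 'confirm', using the hook's rules."""
    if "[Critical]" in review:
        return "block"
    if "APPROVED" in review:
        return "pass"
    return "confirm"
```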

AI Commit Message Generator

cat > .git/hooks/prepare-commit-msg << 'HOOK'
#!/bin/bash
# Auto-generate commit message using local AI
# Only activates for blank commit messages (not amends, merges, etc.)

COMMIT_MSG_FILE="$1"
COMMIT_SOURCE="$2"

# Only auto-generate for fresh commits with empty message
if [ -n "$COMMIT_SOURCE" ]; then
  exit 0
fi

if [ -s "$COMMIT_MSG_FILE" ]; then
  CONTENT=$(cat "$COMMIT_MSG_FILE" | grep -v '^#' | tr -d '[:space:]')
  if [ -n "$CONTENT" ]; then
    exit 0
  fi
fi

DIFF=$(git diff --cached --stat 2>/dev/null)
FULL_DIFF=$(git diff --cached 2>/dev/null | head -c 8000)

if [ -z "$DIFF" ]; then
  exit 0
fi

echo "🤖 Generating commit message..."

MSG=$(DIFF_STAT="$DIFF" DIFF_BODY="$FULL_DIFF" python3 -c '
import json, os
# Diff text arrives through environment variables, so quotes and
# backslashes in the code cannot break the Python source
prompt = """Generate a git commit message for these changes.

FORMAT:
Line 1: <type>(<scope>): <imperative verb> <what changed> (max 72 chars)
Line 2: blank
Lines 3+: WHY these changes were made (not what; the diff shows that)

Types: feat, fix, refactor, test, docs, chore, perf, style, ci

Changed files:
""" + os.environ["DIFF_STAT"] + """

Full diff (excerpt):
""" + os.environ["DIFF_BODY"] + """

Return ONLY the commit message. No explanation, no quotes."""
print(json.dumps({"model": "llama4:scout", "prompt": prompt, "stream": False,
                  "options": {"temperature": 0.3, "num_predict": 200}}))
' | curl -sf http://localhost:11434/api/generate --data-binary @- \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"].strip())') || exit 0

echo "$MSG" > "$COMMIT_MSG_FILE"
echo "" >> "$COMMIT_MSG_FILE"
echo "# AI-generated message: edit as needed" >> "$COMMIT_MSG_FILE"
echo "# To abort the commit, delete all non-comment lines and save" >> "$COMMIT_MSG_FILE"
HOOK
chmod +x .git/hooks/prepare-commit-msg
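Local models do not always follow the requested format, so it is worth checking the generated message before relying on it. This sketch validates the shape the prompt asks for (a <type>(<scope>): subject of at most 72 characters, then a blank line) with the same type list; it is an illustrative checker with a hypothetical name, not a full Conventional Commits parser (for example, it does not accept the ! breaking-change marker):

```python
# Check an AI-generated commit message against the requested format.
import re

TYPES = ("feat", "fix", "refactor", "test", "docs", "chore", "perf", "style", "ci")
TYPE_ALT = "|".join(TYPES)
# <type>, optional (<scope>), colon, space, then a non-empty summary
SUBJECT_RE = re.compile(rf"^({TYPE_ALT})(\([\w.\-/]+\))?: \S.*$")


def commit_message_problems(message: str) -> list[str]:
    """Return format problems; an empty list means the message looks fine."""
    lines = message.strip("\n").splitlines()
    if not lines:
        return ["empty message"]
    problems = []
    subject = lines[0]
    if len(subject) > 72:
        problems.append(f"subject is {len(subject)} chars (max 72)")
    if not SUBJECT_RE.match(subject):
        problems.append("subject is not '<type>(<scope>): <summary>'")
    if len(lines) > 1 and lines[1].strip():
        problems.append("line 2 must be blank")
    return problems
```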

Post-Merge Architecture Check

cat > .git/hooks/post-merge << 'HOOK'
#!/bin/bash
# Analyze what changed in a merge and flag architecture concerns

# ORIG_HEAD is the pre-merge HEAD, so this also covers fast-forward merges
MERGE_DIFF=$(git diff ORIG_HEAD HEAD --stat 2>/dev/null | head -30)

if [ -z "$MERGE_DIFF" ]; then
  exit 0
fi

echo "🏗️  Checking merge for architecture concerns..."

# Pass the file list through the environment so it is never parsed as code;
# exit quietly if Ollama is not reachable
ANALYSIS=$(MERGE_DIFF="$MERGE_DIFF" python3 -c '
import json, os
prompt = """Briefly analyze this merge. Flag ONLY if you see:
- Circular dependency risks
- Large files that should be split (>500 lines added to one file)
- Migration files without corresponding model changes
- Test coverage gaps (significant new code with no test files changed)

If nothing concerning: respond only with OK

Changed files:
""" + os.environ["MERGE_DIFF"]
print(json.dumps({"model": "llama4:scout", "prompt": prompt, "stream": False,
                  "options": {"temperature": 0, "num_predict": 300}}))
' | curl -sf http://localhost:11434/api/generate --data-binary @- \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"].strip())') || exit 0

if [ "$ANALYSIS" != "OK" ]; then
  echo "📋 Post-merge notes:"
  echo "$ANALYSIS"
fi
HOOK
chmod +x .git/hooks/post-merge
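Some of the post-merge rules do not need a model at all. The >500-lines-added check, for instance, can be computed exactly from git diff --numstat, whose lines have the form added, deleted, path separated by tabs (binary files show - for both counts). A minimal sketch with hypothetical function names:

```python
# Deterministic version of the ">500 lines added to one file" check.
import subprocess


def large_additions(numstat: str, threshold: int = 500) -> list[tuple[str, int]]:
    """Return (path, lines_added) for files above the threshold."""
    hits = []
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":  # "-" marks binary files
            continue
        added, path = int(parts[0]), parts[2]
        if added > threshold:
            hits.append((path, added))
    return hits


def merge_numstat() -> str:
    """Numstat for the last merge; ORIG_HEAD is the pre-merge HEAD."""
    result = subprocess.run(["git", "diff", "--numstat", "ORIG_HEAD", "HEAD"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Inside a repository right after a merge, large_additions(merge_numstat()) lists the oversized files, leaving the model to judge the fuzzier items such as circular dependency risks.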

Part 5: Shell Integration

Full Shell AI Toolkit

Add to ~/.zshrc or ~/.bashrc:

# ─── LOCAL AI SHELL TOOLKIT ───────────────────────────────────────

OLLAMA_API="http://localhost:11434/api/generate"

# Generic AI query
ai() {
  local prompt="$*"
  curl -s "$OLLAMA_API" \
    -d "{\"model\":\"llama4:scout\",\"prompt\":$(echo "$prompt" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0.3,\"num_predict\":800}}" \
    | python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}

# Shell command suggester
cmd() {
  local task="$*"
  curl -s "$OLLAMA_API" \
    -d "{\"model\":\"qwen3:7b\",\"prompt\":$(echo "Give me a bash/zsh command to: $task\nReturn ONLY the command, no explanation. OS: $(uname -s)" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0,\"num_predict\":150}}" \
    | python3 -c "import json,sys; r=json.load(sys.stdin)['response'].strip(); print(r)"
}

# Explain a command
explain() {
  ai "Explain this command in detail, including each flag: $*"
}

# Explain last error
oops() {
  local last_cmd=$(fc -ln -1 2>/dev/null || history 1 | awk '{$1=""; print $0}')
  ai "This command failed: $last_cmd\nExplain what went wrong and how to fix it."
}

# Review a file
review_file() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "File not found: $file"
    return 1
  fi
  local content=$(cat "$file")
  curl -s "$OLLAMA_API" \
    -d "{\"model\":\"qwen3.6:27b\",\"prompt\":$(printf 'Code review this file. Grade: [Critical/High/Medium/Low]. Be specific.\n\n%s' "$content" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0,\"num_predict\":1500}}" \
    | python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}

# Summarize current git diff
diff_summary() {
  local diff=$(git diff 2>/dev/null)
  if [ -z "$diff" ]; then
    diff=$(git diff --cached 2>/dev/null)
  fi
  if [ -z "$diff" ]; then
    echo "No diff found"
    return 0
  fi
  # ai() reads its prompt from its arguments, not stdin, so pass the diff inline
  ai "Summarize what changed in this diff in plain English. List the main changes as bullet points:

$(printf '%s' "$diff" | head -c 8000)"
}

# Generate test for a function (paste function, get tests)
gen_tests() {
  echo "Paste your function, then press Ctrl+D:"
  local code=$(cat)
  curl -s "$OLLAMA_API" \
    -d "{\"model\":\"qwen3.6:27b\",\"prompt\":$(echo "Write comprehensive tests for this function. Cover: happy path, edge cases, error cases. Use the appropriate test framework for the language.\n\n$code" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0.1,\"num_predict\":3000}}" \
    | python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}

# Debug with reasoning model
debug() {
  echo "Describe the bug (Ctrl+D when done):"
  local description=$(cat)
  curl -s "$OLLAMA_API" \
    -d "{\"model\":\"deepseek-r1:14b\",\"prompt\":$(echo "Debug this problem systematically. Show your reasoning.\n\n$description" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),\"stream\":false,\"options\":{\"temperature\":0,\"num_predict\":3000}}" \
    | python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
}

# Aliases
alias aii='ai'
alias '?'='cmd'  # quoted so zsh does not expand ? as a glob pattern
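All of these helpers set "stream": false, so nothing appears until the whole answer is ready. With streaming enabled, Ollama's /api/generate returns newline-delimited JSON: each object carries a response fragment and the final one has "done": true. A Python sketch of a streaming variant of ai(), using only the standard library (stream_ai and iter_fragments are hypothetical names):

```python
# Streaming variant of the ai() shell helper.
import json
import urllib.request
from typing import Iterable, Iterator


def iter_fragments(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield response text fragments from newline-delimited JSON lines."""
    for raw in lines:
        if not raw.strip():
            continue
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break


def stream_ai(prompt: str, model: str = "llama4:scout",
              url: str = "http://localhost:11434/api/generate") -> None:
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # iterating yields one line per chunk
        for fragment in iter_fragments(resp):
            print(fragment, end="", flush=True)
    print()
```

For example, stream_ai("Explain git rebase --onto") prints the answer as it is generated.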

Part 6: Codebase-Aware RAG for Large Projects

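For codebases too large to fit in any context window, build a semantic search index with the script below. It needs the ollama and chromadb Python packages (pip install ollama chromadb):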

#!/usr/bin/env python3
# codebase_rag.py — Semantic codebase search using local embeddings

import hashlib
from pathlib import Path

import chromadb
import ollama

SUPPORTED_EXTENSIONS = {
    '.py', '.js', '.ts', '.tsx', '.jsx', '.go', '.rs',
    '.java', '.rb', '.cpp', '.c', '.h', '.swift', '.kt',
    '.md', '.yml', '.yaml', '.toml', '.sql'
}

def chunk_code_file(content: str, filepath: str, 
                    chunk_size: int = 60) -> list[dict]:
    """Split code into overlapping line-based chunks."""
    lines = content.splitlines()
    chunks = []

    for i in range(0, len(lines), chunk_size - 10):  # 10-line overlap
        chunk_lines = lines[i:i + chunk_size]
        chunk_text = '\n'.join(chunk_lines)
        if chunk_text.strip():
            chunks.append({
                'text': chunk_text,
                'filepath': filepath,
                'start_line': i + 1,
                'end_line': min(i + chunk_size, len(lines))
            })
        if i + chunk_size >= len(lines):
            break  # this chunk reached the end; later ones would only repeat its tail
    return chunks

def index_codebase(project_root: str, 
                   collection_name: str = "codebase"):
    """Index an entire codebase for semantic search."""

    client = chromadb.PersistentClient(path=f"{project_root}/.codebase_index")
    collection = client.get_or_create_collection(
        name=collection_name,
        metadata={"hnsw:space": "cosine"}
    )

    root = Path(project_root)
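    ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv',
                   'venv', 'dist', 'build', '.next', 'target',
                   '.codebase_index'}

    files_indexed = 0
    chunks_indexed = 0

    for filepath in root.rglob('*'):
        # Skip ignored directories (checked relative to the project root,
        # so a parent directory named e.g. "build" does not hide everything)
        if any(part in ignore_dirs for part in filepath.relative_to(root).parts):
            continue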
        if filepath.suffix not in SUPPORTED_EXTENSIONS:
            continue
        if not filepath.is_file():
            continue

        try:
            content = filepath.read_text(encoding='utf-8', errors='ignore')
        except Exception:
            continue

        if len(content) < 50:  # Skip very small files
            continue

        relative_path = str(filepath.relative_to(root))
        chunks = chunk_code_file(content, relative_path)

        for chunk in chunks:
            chunk_id = hashlib.md5(
                f"{relative_path}:{chunk['start_line']}:{chunk['text'][:50]}".encode()
            ).hexdigest()

            try:
                embedding = ollama.embeddings(
                    model='nomic-embed-text',
                    prompt=chunk['text']
                )['embedding']

                collection.upsert(
                    ids=[chunk_id],
                    documents=[chunk['text']],
                    embeddings=[embedding],
                    metadatas=[{
                        'filepath': relative_path,
                        'start_line': chunk['start_line'],
                        'end_line': chunk['end_line']
                    }]
                )
                chunks_indexed += 1
            except Exception as e:
                print(f"  Error embedding {relative_path}: {e}")

        files_indexed += 1
        if files_indexed % 20 == 0:
            print(f"  Indexed {files_indexed} files, {chunks_indexed} chunks...")

    print(f"\n✓ Indexed {files_indexed} files, {chunks_indexed} chunks")
    return collection

def search_codebase(query: str, project_root: str, 
                    top_k: int = 7) -> list[dict]:
    """Search the codebase index semantically."""
    client = chromadb.PersistentClient(path=f"{project_root}/.codebase_index")
    collection = client.get_collection("codebase")

    embedding = ollama.embeddings(
        model='nomic-embed-text',
        prompt=query
    )['embedding']

    results = collection.query(
        query_embeddings=[embedding],
        n_results=top_k
    )

    return [
        {
            'code': doc,
            'filepath': meta['filepath'],
            'lines': f"{meta['start_line']}-{meta['end_line']}",
            'relevance': 1 - dist
        }
        for doc, meta, dist in zip(
            results['documents'][0],
            results['metadatas'][0],
            results['distances'][0]
        )
    ]

def ask_codebase(question: str, project_root: str) -> str:
    """Answer a question about the codebase using RAG."""
    results = search_codebase(question, project_root)

    context_parts = []
    for r in results:
        context_parts.append(
            f"// {r['filepath']} (lines {r['lines']})\n{r['code']}"
        )
    context = "\n\n---\n\n".join(context_parts)

    response = ollama.generate(
        model='llama4:scout',
        prompt=f"""Answer this question about the codebase using the provided code excerpts.
Cite specific files and line numbers in your answer.
If the code excerpts don't contain enough information, say so.

Question: {question}

Relevant code:
{context}

Answer:""",
        options={'temperature': 0.1, 'num_ctx': 32768, 'num_predict': 2000}
    )

    return response['response']

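if __name__ == '__main__':
    import sys
    if len(sys.argv) < 2:
        print("Usage:")
        print("  python codebase_rag.py index [project_root]")
        print("  python codebase_rag.py ask 'your question' [project_root]")
        sys.exit(1)

    command = sys.argv[1]

    if command == 'index':
        # index [project_root]: the project root is the 2nd argument
        project_root = sys.argv[2] if len(sys.argv) > 2 else '.'
        index_codebase(project_root)
    elif command == 'ask' and len(sys.argv) > 2:
        # ask 'question' [project_root]: the project root is the 3rd argument
        question = sys.argv[2]
        project_root = sys.argv[3] if len(sys.argv) > 3 else '.'
        print(ask_codebase(question, project_root))
    else:
        print(f"Unknown command or missing question: {command}")
        sys.exit(1)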
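The `relevance` field in `search_codebase` (`1 - dist`) reads as a similarity only if the collection measures cosine distance. Chroma's default space is squared L2, where distances are unbounded, so `1 - dist` can fall well below zero; creating the collection with cosine space (for example `metadata={"hnsw:space": "cosine"}`, depending on your Chroma version) avoids that. The pure-Python sketch below, which needs neither Chroma nor Ollama, shows the difference on toy vectors; the helper names are illustrative and not part of Chroma's API.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a: list[float], b: list[float]) -> float:
    """What a cosine-space index reports: 1 - cosine similarity."""
    return 1 - cosine_similarity(a, b)

def squared_l2_distance(a: list[float], b: list[float]) -> float:
    """What an L2-space index reports: squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy 3-dimensional "embeddings" (real nomic-embed-text vectors are much longer).
query = [0.2, 0.9, 0.1]
close_match = [0.25, 0.85, 0.05]
unrelated = [3.0, -1.0, 2.0]

for label, vec in [("close", close_match), ("unrelated", unrelated)]:
    cos_rel = 1 - cosine_distance(query, vec)      # equals cosine similarity
    l2_rel = 1 - squared_l2_distance(query, vec)   # unbounded below
    print(f"{label:>9}: cosine relevance {cos_rel:.3f}, L2 '1 - dist' {l2_rel:.3f}")
```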

# Index your codebase (run once, then update incrementally)
python codebase_rag.py index /path/to/project

# Ask questions about the codebase (run from the project root, or pass it as a final argument)
python codebase_rag.py ask "How does the authentication middleware work?"
python codebase_rag.py ask "Where are database connections opened and closed?"
python codebase_rag.py ask "What happens when a payment fails?"
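Seven chunks usually fit comfortably in `num_ctx: 32768`, but a few very long functions can overflow the window, and Ollama may then silently truncate the prompt. A minimal guard, sketched under the assumption of roughly four characters per token (a heuristic, not the model's real tokenizer), keeps the most relevant chunks that fit a budget. `trim_to_budget` is a hypothetical helper, not part of `codebase_rag.py`.

```python
def trim_to_budget(results: list[dict], max_tokens: int = 24_000,
                   chars_per_token: int = 4) -> list[dict]:
    """Keep the most relevant chunks whose combined size fits the token budget.

    Sizes are estimated as characters / chars_per_token, a rough heuristic
    rather than the model's actual tokenizer.
    """
    budget_chars = max_tokens * chars_per_token
    kept: list[dict] = []
    used = 0
    # Spend the budget on the best matches first.
    for r in sorted(results, key=lambda item: item['relevance'], reverse=True):
        # Chunk text plus an allowance for its "// filepath (lines ...)" header.
        size = len(r['code']) + len(r['filepath']) + 32
        if used + size > budget_chars:
            continue  # too big for what is left; a smaller chunk may still fit
        kept.append(r)
        used += size
    return kept
```

In `ask_codebase`, this would mean `results = trim_to_budget(search_codebase(question, project_root))`. The default budget of 24,000 estimated tokens leaves headroom under 32,768 for the instructions and the 2,000-token answer.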

Part 7: Measuring the Impact

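A professional setup deserves measurement. Log each AI-assisted task with your own estimate of the time it saved, then review the weekly totals to see which tasks and models actually pay off: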

#!/usr/bin/env python3
# ai_metrics.py — Track AI coding tool effectiveness

import json
from pathlib import Path
from datetime import datetime, timedelta
from collections import defaultdict

METRICS_FILE = Path.home() / '.ai_coding_metrics.json'

def log_ai_interaction(task_type: str, time_saved_minutes: int,
                       quality_improvement: str,
                       model_used: str):
    """Log an AI-assisted coding interaction."""
    metrics = []
    if METRICS_FILE.exists():
        metrics = json.loads(METRICS_FILE.read_text())

    metrics.append({
        'timestamp': datetime.now().isoformat(),
        'task_type': task_type,
        'time_saved_minutes': time_saved_minutes,
        'quality_improvement': quality_improvement,
        'model_used': model_used
    })

    METRICS_FILE.write_text(json.dumps(metrics, indent=2))
    print(f"✓ Logged: {task_type} — {time_saved_minutes} min saved")

def generate_weekly_report():
    """Generate a weekly productivity report."""
    if not METRICS_FILE.exists():
        print("No metrics recorded yet.")
        return

    metrics = json.loads(METRICS_FILE.read_text())
    one_week_ago = datetime.now() - timedelta(days=7)

    weekly = [m for m in metrics
              if datetime.fromisoformat(m['timestamp']) > one_week_ago]

    if not weekly:
        print("No metrics in the last 7 days.")
        return

    total_time = sum(m['time_saved_minutes'] for m in weekly)
    by_task = defaultdict(lambda: {'count': 0, 'time': 0})
    by_model = defaultdict(int)

    for m in weekly:
        by_task[m['task_type']]['count'] += 1
        by_task[m['task_type']]['time'] += m['time_saved_minutes']
        by_model[m['model_used']] += 1

    print("\n=== WEEKLY AI CODING METRICS ===")
    print(f"Period: {one_week_ago.date()} to {datetime.now().date()}")
    print(f"\nTotal time saved: {total_time} minutes ({total_time/60:.1f} hours)")
    print(f"Total interactions: {len(weekly)}")
    print(f"\nBy task type:")
    for task, data in sorted(by_task.items(),
                             key=lambda x: x[1]['time'], reverse=True):
        print(f"  {task}: {data['count']} interactions, "
              f"{data['time']} min saved")
    print(f"\nBy model:")
    for model, count in sorted(by_model.items(),
                               key=lambda x: x[1], reverse=True):
        print(f"  {model}: {count} interactions")

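if __name__ == '__main__':
    import sys
    # log <task_type> <minutes_saved> [quality_note] [model]  |  report
    if len(sys.argv) >= 4 and sys.argv[1] == 'log':
        log_ai_interaction(
            task_type=sys.argv[2],
            time_saved_minutes=int(sys.argv[3]),
            quality_improvement=sys.argv[4] if len(sys.argv) > 4 else 'unknown',
            model_used=sys.argv[5] if len(sys.argv) > 5 else 'unknown'
        )
    elif len(sys.argv) >= 2 and sys.argv[1] == 'report':
        generate_weekly_report()
    else:
        print("Usage:")
        print("  python ai_metrics.py log <task_type> <minutes_saved> [quality_note] [model]")
        print("  python ai_metrics.py report")
        sys.exit(1)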
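`log_ai_interaction` rewrites the whole JSON file on every call, so an interrupted write (Ctrl-C, a crash, a full disk) can leave `~/.ai_coding_metrics.json` truncated and unreadable, losing the entire history. A common remedy is to write to a temporary file in the same directory and rename it into place. A minimal sketch; `write_json_atomic` is an illustrative name, not part of the original script:

```python
import json
import os
import tempfile
from pathlib import Path

def write_json_atomic(path: Path, data) -> None:
    """Write JSON to a temp file next to `path`, then rename it over `path`.

    On POSIX filesystems os.replace is atomic when source and target are on
    the same filesystem, so readers see either the old file or the new one.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit the disk first
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # never leave a half-written temp file behind
        raise
```

To use it, replace `METRICS_FILE.write_text(json.dumps(metrics, indent=2))` with `write_json_atomic(METRICS_FILE, metrics)`.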

# Log interactions throughout your day
python ai_metrics.py log "code_review" 20 "caught 2 security issues" "qwen3.6:27b"
python ai_metrics.py log "test_generation" 30 "full coverage added" "qwen3.6:27b"
python ai_metrics.py log "debugging" 45 "root cause found faster" "deepseek-r1:14b"
python ai_metrics.py log "commit_message" 5 "conventional format" "llama4:scout"

# Weekly summary
python ai_metrics.py report
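`generate_weekly_report` loads, filters, aggregates, and prints in one function, which makes it hard to test or to reuse for, say, a monthly view. One option is to pull the aggregation out as a pure function that takes the loaded records and an explicit `now`. A sketch; `summarize` and its return shape are illustrative, not part of the original script:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def summarize(metrics: list[dict], now: datetime, days: int = 7) -> dict:
    """Aggregate logged interactions from the last `days` days.

    Pure function: it takes already-loaded records and an explicit `now`,
    so it can be tested without touching ~/.ai_coding_metrics.json.
    """
    cutoff = now - timedelta(days=days)
    recent = [m for m in metrics
              if datetime.fromisoformat(m['timestamp']) > cutoff]
    by_task = defaultdict(lambda: {'count': 0, 'time': 0})
    for m in recent:
        by_task[m['task_type']]['count'] += 1
        by_task[m['task_type']]['time'] += m['time_saved_minutes']
    return {
        'interactions': len(recent),
        'minutes_saved': sum(m['time_saved_minutes'] for m in recent),
        'by_task': dict(by_task),
    }
```

`generate_weekly_report` would then load the file, call `summarize(metrics, datetime.now())`, and only format the result.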

Quick Reference: Which Model for Which Task

| Task | Model | Command |
|---|---|---|
| Autocomplete | `qwen3:7b` | Auto via Continue.dev |
| Code review | `PythonPro` / `TypeScriptPro` / `GoPro` | `/review` in Continue |
| Debugging (hard) | `deepseek-r1:14b` | `debug` shell function |
| Multi-file refactor | `qwen3.6:27b` via Aider | `aider` in project root |
| Test generation | `PythonPro` or your stack's model | `/test` in Continue |
| Documentation | `llama4:scout` | `/doc` in Continue |
| Security review | `qwen3.6:27b` | `/secure` in Continue |
| Shell commands | `qwen3:7b` | `cmd "task description"` |
| Codebase Q&A | `llama4:scout` + RAG | `python codebase_rag.py ask` |
| Commit message | `llama4:scout` | Auto via git hook |

Conclusion

This setup takes about two hours to fully configure. What you get in return:

Every code change reviewed by a model that knows your stack’s specific standards
Autocomplete that understands your codebase semantically via embeddings
A reasoning model available for hard debugging without any cloud call
Autonomous multi-file coding assistance via Aider for complex refactoring
Git hooks that enforce quality automatically without manual intervention
Shell tools that make the terminal AI-native
Zero code leaving your machine, zero API costs, zero usage limits

The models are good enough that the quality difference from cloud AI tools is marginal for most professional coding tasks. The privacy, cost, and control advantages are absolute.

Your next step: Install Continue.dev, copy the config.json from Part 1, and install your stack-specific Modelfile. Use the /review slash command on the next file you write. The gap between that experience and what you get from generic base models is the immediate value of this setup.

📚 Related Posts in This Blog:

Ollama Masterclass 2026 — Install Ollama and run your first model

Qwen3 and Kimi K2.6: Best Local Coding Models — Deep dive on the coding models

DeepSeek-R1 Locally: Best Reasoning Model — Debugging with chain-of-thought

The Modelfile: Customize Any Local Model — Full Modelfile reference

Building AI Apps With Ollama and Python — Production applications

Last updated: May 2026. Continue.dev, Aider, and Ollama all release updates frequently. Verify current configuration syntax at docs.continue.dev and aider.chat/docs.

⚠️ The pre-commit hook blocks commits containing Critical issues. Test it on a safe branch before enabling in a production repository. Use git commit --no-verify to bypass if the hook blocks a legitimate commit.

Frequently Asked Questions (FAQ)

How do I handle projects where different services use different languages?

Create a Modelfile for each language stack and switch models in Continue.dev using the model picker at the top of the chat panel. Keep your codebase RAG index at the monorepo root so cross-service questions still work.
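Continue's model picker covers interactive chat. For scripted steps, such as a pre-commit review of a diff that spans several services, a small lookup can send each file to its stack's model. A minimal sketch; the extension mapping, the fallback to `qwen3.6:27b`, and the function name are assumptions to adapt to your repository:

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the stack-specific Modelfile models.
MODEL_BY_EXTENSION = {
    '.py': 'PythonPro',
    '.ts': 'TypeScriptPro',
    '.tsx': 'TypeScriptPro',
    '.go': 'GoPro',
}
# Assumed fallback for files no stack model covers (configs, docs, other languages).
FALLBACK_MODEL = 'qwen3.6:27b'

def group_files_by_model(paths: list[str]) -> dict[str, list[str]]:
    """Group changed files so each batch goes to its stack's review model."""
    groups = defaultdict(list)
    for path in paths:
        # Git reports paths with forward slashes, so parse them as POSIX paths.
        suffix = PurePosixPath(path).suffix.lower()
        groups[MODEL_BY_EXTENSION.get(suffix, FALLBACK_MODEL)].append(path)
    return dict(groups)
```

Feed it the output of `git diff --cached --name-only` and run one review per model group.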

The pre-commit hook is slow — how do I speed it up?

Route the pre-commit hook to `qwen3:7b` for speed, at the cost of some review depth. Alternatively, skip the review when the staged diff is very small, for example under 10 lines: `if [ $(echo "$DIFF" | wc -l) -lt 10 ]; then exit 0; fi`.
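A caveat on that one-liner: `wc -l` counts diff headers and context lines, not just changed lines. If your hook is (or can call) a Python script, `git diff --cached --numstat` reports exact added and deleted counts per file. A minimal sketch; the threshold of 5 changed lines and the function names are assumptions:

```python
import subprocess

def changed_line_count(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each line looks like "<added><TAB><deleted><TAB><path>"; binary files
    report "-" for both counts and are skipped here.
    """
    total = 0
    for line in numstat.splitlines():
        parts = line.split('\t')
        if len(parts) >= 3 and parts[0].isdigit() and parts[1].isdigit():
            total += int(parts[0]) + int(parts[1])
    return total

def should_skip_review(threshold: int = 5) -> bool:
    """True when the staged change is too small to be worth an LLM review."""
    numstat = subprocess.run(
        ['git', 'diff', '--cached', '--numstat'],
        capture_output=True, text=True, check=True,
    ).stdout
    return changed_line_count(numstat) < threshold
```

A Python hook could then begin with `if should_skip_review(): sys.exit(0)`.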

Can I use this setup with GitHub Copilot enabled at the same time?

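Yes. Continue.dev and GitHub Copilot are separate extensions and can run side by side, but if both have inline autocomplete enabled their suggestions compete, so turn it off in one of them. A common pattern: Copilot for fast inline autocomplete, Continue for deeper chat, code review, and codebase queries. Keep in mind that Copilot sends code to GitHub's cloud service, so this hybrid gives up the fully local setup's guarantee that no code leaves your machine.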

How do I keep the codebase RAG index updated?

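Run `python codebase_rag.py index` after significant changes, or add it to a post-merge git hook. Because the indexer writes chunks with `upsert`, re-running it updates existing entries instead of creating duplicates. `upsert` alone does not skip unchanged files, though: every chunk passed to it is embedded again, and chunks from deleted files stay in the index. For fast incremental runs, skip files whose content has not changed since the last index and delete chunks that belong to removed files.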
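A minimal sketch of that change detection, assuming a hypothetical JSON sidecar (for example `.codebase_index/file_hashes.json`) that maps each path to a SHA-256 digest. `plan_reindex` and `save_state` are illustrative names, not functions in `codebase_rag.py`. The idea: re-embed only `changed`, remove chunks for `deleted` (with Chroma, something like `collection.delete(where={"filepath": path})`, assuming the stored `filepath` metadata matches the path string), and save the new state only after indexing succeeds.

```python
import hashlib
import json
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of the file contents; changes whenever the content changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def plan_reindex(paths: list[Path], state_file: Path):
    """Compare current files with the digests saved after the last run.

    Returns (changed, deleted, new_state): files to re-embed, paths whose
    chunks should be removed from the index, and the state to save once
    indexing has succeeded.
    """
    old_state = json.loads(state_file.read_text()) if state_file.exists() else {}
    new_state = {str(p): file_digest(p) for p in paths}
    changed = [p for p in paths if old_state.get(str(p)) != new_state[str(p)]]
    deleted = sorted(set(old_state) - set(new_state))
    return changed, deleted, new_state

def save_state(state_file: Path, state: dict[str, str]) -> None:
    """Persist digests; call only after the index update succeeded."""
    state_file.write_text(json.dumps(state, indent=2, sort_keys=True))
```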