The tool is what you interact with.
The model is the brain behind it.
Same tool, different model → different results.
| Provider | Models | Known for |
|---|---|---|
| Anthropic (Claude) | Opus 4.6 · Sonnet 4.6 · Haiku 4.5 | Deep reasoning, long context, complex refactors |
| OpenAI (GPT) | GPT-5.4 · mini · nano | Versatile, fast, strong polyglot coding |
| Google (Gemini) | Gemini 3 Pro · Flash · Deep Think | Massive context windows, speed, multimodal |
| Open-source | DeepSeek V3.2 · Llama 4 · Qwen3-Coder | Free, private, self-hostable, rapidly improving |
| Tier | Models | Use when | Trade-off |
|---|---|---|---|
| Frontier “Architects” | Opus 4.6, GPT-5.4, Gemini Pro | Architecture, hard bugs, security reviews | Slow, expensive |
| Mid-tier “Workhorses” | Sonnet 4.6, GPT-5.4-mini, Gemini Flash | Daily coding, features, tests, code review | Best balance |
| Small “Sprinters” | Haiku 4.5, GPT-5.4-nano, local models | Completions, simple Q&A, high-volume | Fast & cheap, limited reasoning |
| Model | Speed | Cost ($/M tokens, input/output) |
|---|---|---|
| Opus 4.6 | Slower | $5 / $25 |
| Sonnet 4.6 | Fast | $3 / $15 |
| Haiku 4.5 | Fastest | $1 / $5 |
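To see what per-million-token rates mean for a single request, here is a quick calculator (the token counts are made-up but typical; rates come from the table above):

```python
def request_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Dollar cost of one request, given per-million-token rates."""
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# A typical coding request: 8,000 tokens in, 2,000 tokens out.
print(f"Opus 4.6:   ${request_cost(8_000, 2_000, 5, 25):.4f}")   # $0.0900
print(f"Sonnet 4.6: ${request_cost(8_000, 2_000, 3, 15):.4f}")   # $0.0540
print(f"Haiku 4.5:  ${request_cost(8_000, 2_000, 1, 5):.4f}")    # $0.0180
```

At these prices, one Opus call costs about five Haiku calls, which is why tier choice matters at volume.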
Opus → deep reasoning, architecture, hard bugs
Sonnet → your daily default — features, refactors, reviews
Haiku → autocomplete, simple edits, high-volume
Strength: best at understanding complex intent and maintaining coherence across large contexts
| Model | Speed | Cost ($/M tokens, input/output) |
|---|---|---|
| GPT-5.4 | Medium | $2.50 / $15 |
| GPT-5.4-mini | Fast | ~$0.40 / $1.60 |
| GPT-5.4-nano | Fastest | ~$0.10 / $0.40 |
GPT-5.4 → strong general-purpose default
mini → everyday coding, cost-efficient
nano → completions, classification
Strength: polyglot projects, broadest tool integration
| Model | Speed | Cost ($/M tokens, input/output) |
|---|---|---|
| Gemini 3 Pro | Medium | $2–4 / $8–16 |
| Gemini 3 Flash | Fast | $0.50 / $2 |
| Deep Think | Slow | Premium |
Pro → 2M token context — feed in entire repos, massive logs
Flash → surprisingly beats Pro on coding (78.0% vs 76.2% on SWE-bench); the best value pick in 2026
Deep Think → hard reasoning, mathematical proofs
Strength: massive context windows, multimodal, fast iteration
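A rough way to check whether a codebase fits in a 2M-token window (the 4-characters-per-token ratio is a common rule of thumb, not an exact tokenizer; the extension list is just an example):

```python
import os

def estimate_repo_tokens(root: str, exts=(".py", ".ts", ".md")) -> int:
    """Very rough token estimate: total source bytes / ~4 chars per token."""
    total_bytes = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return total_bytes // 4

# e.g. ~6 MB of source ≈ 1.5M tokens -- fits a 2M window with room for prompts
```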
| Model | Highlight |
|---|---|
| DeepSeek V3.2 | Near-frontier, free, 685B params |
| Llama 4 (Meta) | Enterprise, fine-tunable |
| Qwen3-Coder | Outperforms models 10x its size |
| Mistral | European compliance |
Pick open-source when:
- Privacy or compliance requires code to stay on your own infrastructure
- You need offline or self-hosted deployment
- Budget matters more than frontier-level reasoning
Many teams split the difference: local models for autocomplete, cloud models for complex reasoning.
What are you doing?
│
├─ Quick completion while typing
│ → Small: Haiku, nano, local model
│
├─ Implementing a feature / writing tests
│ → Mid-tier: Sonnet, GPT-5.4-mini, Gemini Flash
│
├─ Complex refactor / architecture / hard bug
│ → Frontier: Opus, GPT-5.4, Gemini Pro
│
├─ Analyzing huge codebase or logs
│     → Gemini 3 Pro (2M context) or Opus 4.6 (1M context)
│
├─ Maximum correctness needed
│ → Opus 4.6 or GPT-5.4
│
└─ Privacy / offline / self-hosted
→ DeepSeek, Llama 4, Qwen3
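The decision tree above can be sketched as a simple routing function (the task categories and model names are just labels for illustration, not a real API):

```python
def pick_model(task: str) -> str:
    """Map a task category to a suggested model tier, per the decision tree."""
    routes = {
        "completion":      "Haiku 4.5 / GPT-5.4-nano / local model",
        "feature":         "Sonnet 4.6 / GPT-5.4-mini / Gemini 3 Flash",
        "architecture":    "Opus 4.6 / GPT-5.4 / Gemini 3 Pro",
        "huge-context":    "Gemini 3 Pro (2M) / Opus 4.6 (1M)",
        "max-correctness": "Opus 4.6 / GPT-5.4",
        "private":         "DeepSeek V3.2 / Llama 4 / Qwen3-Coder",
    }
    # Unknown tasks default to the mid-tier workhorses.
    return routes.get(task, routes["feature"])
```

In a real setup this lookup would live in your editor or gateway config rather than application code, but the shape is the same: classify the task, then route.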
| Model | SWE-bench score |
|---|---|
| Claude Opus 4.6 | 80.8% |
| Claude Sonnet 4.6 | 79.6% |
| Gemini 3 Flash | 78.0% |
| Gemini 3 Pro | 76.2% |
| GPT-5.4 | ~75% |
| DeepSeek V3.2-Speciale | ~74% |
Benchmarks ≠ real-world. Test models on your code before deciding.
For most developers: a mid-tier workhorse (Sonnet 4.6, GPT-5.4-mini, or Gemini 3 Flash) as the daily default.
For compliance teams: self-hosted open-source (DeepSeek, Llama 4, Qwen3-Coder) so code never leaves your infrastructure.
For budget-conscious: Gemini 3 Flash or a local model, escalating to a frontier model only for hard problems.
Use the smallest model that gets the job done.
Upgrade only when you need to.
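One way to apply that rule in practice is a cheapest-first cascade: try the small model and escalate only when its answer fails a check. A minimal sketch, where `call_model` and `passes_check` are hypothetical stand-ins for your provider SDK and validation logic:

```python
def call_model(model: str, prompt: str) -> str:
    """Hypothetical client call -- replace with your provider's SDK."""
    return f"[{model}] answer to: {prompt}"

def passes_check(answer: str) -> bool:
    """Hypothetical validation -- e.g. do the generated tests pass?
    Here we pretend the smallest model's answer always fails."""
    return "Sonnet" in answer or "Opus" in answer

def cascade(prompt, models=("Haiku 4.5", "Sonnet 4.6", "Opus 4.6")):
    """Try models cheapest-first; escalate only on failed validation."""
    answer = ""
    for model in models:
        answer = call_model(model, prompt)
        if passes_check(answer):
            return model, answer
    return models[-1], answer  # strongest model's answer as a last resort
```

With real validation (tests passing, output parsing, a reviewer accepting the diff), most requests stop at the cheap tier and only the hard ones ever pay frontier prices.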