AI-Assisted Development

Choosing the Right AI Model

Workshop — Part 3b: The brain behind the tool

3b / 12 — Models

The tool is what you interact with.

The model is the brain behind it.

Same tool, different model → different results.

Four Model Families

Anthropic (Claude)
Opus 4.6 · Sonnet 4.6 · Haiku 4.5
Deep reasoning, long-context, complex refactors

OpenAI (GPT)
GPT-5.4 · mini · nano
Versatile, fast, strong polyglot coding

Google (Gemini)
Gemini 3 Pro · Flash · Deep Think
Massive context windows, speed, multimodal

Open-source
DeepSeek V3.2 · Llama 4 · Qwen3-Coder
Free, private, self-hostable, rapidly improving

Three Tiers — Not Every Task Needs the Biggest Model

Tier                   Models                                  Use when                                    Trade-off
Frontier “Architects”  Opus 4.6, GPT-5.4, Gemini Pro           Architecture, hard bugs, security reviews   Slow, expensive
Mid-tier “Workhorses”  Sonnet 4.6, GPT-5.4-mini, Gemini Flash  Daily coding, features, tests, code review  Best balance
Small “Sprinters”      Haiku 4.5, GPT-5.4-nano, local models   Completions, simple Q&A, high-volume        Fast & cheap, limited reasoning

Anthropic — Claude

Model       Speed    Cost
Opus 4.6    Slower   $5 / $25
Sonnet 4.6  Fast     $3 / $15
Haiku 4.5   Fastest  $1 / $5

(per million tokens: input / output)

Opus → deep reasoning, architecture, hard bugs
Sonnet → your daily default — features, refactors, reviews
Haiku → autocomplete, simple edits, high-volume

Strength: best at understanding complex intent and maintaining coherence across large contexts
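
To see how per-million-token prices turn into a per-request bill, here is a minimal sketch. It reads the table's figures as USD per million tokens (input / output). The function name and the token counts are hypothetical and chosen only for illustration.

```python
# Prices from the table above, read as USD per million tokens: (input, output).
PRICES_USD_PER_MTOK = {
    "Opus 4.6": (5.00, 25.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Haiku 4.5": (1.00, 5.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    input_price, output_price = PRICES_USD_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 20,000 input tokens and 2,000 output tokens.
    for model in PRICES_USD_PER_MTOK:
        print(f"{model}: ${request_cost(model, 20_000, 2_000):.2f} per request")
```

Under these assumptions the same request costs about $0.15 on Opus, $0.09 on Sonnet, and $0.03 on Haiku, which is why the tier choice matters most for high-volume work.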

OpenAI — GPT

Model         Speed    Cost per MTok (input / output)
GPT-5.4       Medium   $2.50 / $15
GPT-5.4-mini  Fast     ~$0.40 / $1.60
GPT-5.4-nano  Fastest  ~$0.10 / $0.40

GPT-5.4 → strong general-purpose default
mini → everyday coding, cost-efficient
nano → completions, classification

Strength: polyglot projects, broadest tool integration

Google — Gemini

Model           Speed   Cost per MTok (input / output)
Gemini 3 Pro    Medium  $2–4 / $8–16
Gemini 3 Flash  Fast    $0.50 / $2
Deep Think      Slow    Premium

Pro → 2M token context — feed in entire repos, massive logs
Flash → scores higher than Pro on SWE-bench Verified (78.0% vs 76.2%) despite the lower price. Best value in 2026.
Deep Think → hard reasoning, mathematical proofs

Strength: massive context windows, multimodal, fast iteration

Open-Source & Local

Model           Highlight
DeepSeek V3.2   Near-frontier, free, 685B params
Llama 4 (Meta)  Enterprise, fine-tunable
Qwen3-Coder     Outperforms models 10x its size
Mistral         European compliance

Pick open-source when:

  • Data must stay on your infrastructure
  • You need to fine-tune on your codebase
  • You want zero per-token costs
  • You need offline/air-gapped operation

Many teams: local models for autocomplete, cloud models for complex reasoning.

Decision Flowchart

What are you doing?
│
├─ Quick completion while typing
│  → Small: Haiku, nano, local model
│
├─ Implementing a feature / writing tests
│  → Mid-tier: Sonnet, GPT-5.4-mini, Gemini Flash
│
├─ Complex refactor / architecture / hard bug
│  → Frontier: Opus, GPT-5.4, Gemini Pro
│
├─ Analyzing huge codebase or logs
│  → Gemini Pro (2M context) or Opus (1M context)
│
├─ Maximum correctness needed
│  → Opus 4.6 or GPT-5.4
│
└─ Privacy / offline / self-hosted
   → DeepSeek, Llama 4, Qwen3
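
The flowchart can also be written down as a lookup table, which is roughly what a routing layer in a tool or script would do. The sketch below is one possible encoding. The `Task` enum and the `recommend` function are hypothetical names, and the model labels are the flowchart's own shorthand, not API identifiers.

```python
from enum import Enum


class Task(Enum):
    COMPLETION = "quick completion while typing"
    FEATURE = "implementing a feature / writing tests"
    COMPLEX = "complex refactor / architecture / hard bug"
    HUGE_INPUT = "analyzing huge codebase or logs"
    MAX_CORRECTNESS = "maximum correctness needed"
    PRIVATE = "privacy / offline / self-hosted"


# One entry per branch of the flowchart above.
RECOMMENDATIONS = {
    Task.COMPLETION: ["Haiku", "nano", "local model"],
    Task.FEATURE: ["Sonnet", "GPT-5.4-mini", "Gemini Flash"],
    Task.COMPLEX: ["Opus", "GPT-5.4", "Gemini Pro"],
    Task.HUGE_INPUT: ["Gemini Pro", "Opus"],
    Task.MAX_CORRECTNESS: ["Opus 4.6", "GPT-5.4"],
    Task.PRIVATE: ["DeepSeek", "Llama 4", "Qwen3"],
}


def recommend(task: Task) -> list[str]:
    """Return the flowchart's candidate models for a task."""
    return RECOMMENDATIONS[task]


if __name__ == "__main__":
    for task in Task:
        print(f"{task.value}: {', '.join(recommend(task))}")
```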

Benchmarks (March 2026) — SWE-bench Verified

Model                   Score
Claude Opus 4.6         80.8%
Claude Sonnet 4.6       79.6%
Gemini 3 Flash          78.0%
Gemini 3 Pro            76.2%
GPT-5.4                 ~75%
DeepSeek V3.2-Speciale  ~74%

Benchmarks ≠ real-world. Test models on your code before deciding.
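
One lightweight way to test models on your own code is a small pass-rate harness. The sketch below treats a model as any function from prompt text to response text, so it runs here with stand-in stubs. In real use each stub would wrap a call to the provider SDK you use, which is not shown. All names and the two sample cases are hypothetical.

```python
from typing import Callable

# A "model" is any function from prompt text to response text.
Model = Callable[[str], str]
# A case pairs a prompt with a checker that accepts or rejects the response.
Case = tuple[str, Callable[[str], bool]]


def pass_rate(model: Model, cases: list[Case]) -> float:
    """Return the fraction of cases whose response passes its checker."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def compare(models: dict[str, Model], cases: list[Case]) -> dict[str, float]:
    """Return the pass rate of every candidate model on the same cases."""
    return {name: pass_rate(model, cases) for name, model in models.items()}


if __name__ == "__main__":
    # Stand-in models; replace with wrappers around real API calls.
    def stub_a(prompt: str) -> str:
        return "def add(a, b):\n    return a + b"

    def stub_b(prompt: str) -> str:
        return "TODO"

    cases: list[Case] = [
        ("Write add(a, b) in Python.", lambda out: "return a + b" in out),
        ("Write a function named add.", lambda out: "def add" in out),
    ]
    print(compare({"stub_a": stub_a, "stub_b": stub_b}, cases))
```

Checkers that run your existing test suite against the generated code tend to say more than string matching, but the structure stays the same.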

Practical Recommendations

For most developers:

  • Default: Sonnet 4.6 or GPT-5.4-mini
  • Complex tasks: Opus 4.6 or GPT-5.4
  • Completions: Haiku 4.5 or nano
  • Try Gemini Flash — fast, cheap, capable

For compliance teams:

  • DeepSeek / Llama 4 on-premise via Ollama
  • Ollama works with almost any IDE — no special plugin needed

For budget-conscious:

  • Gemini Flash ($0.50/MTok) — best value
  • Batch API — 50% discount on all providers
  • Local models for completions (free)
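
To get a feel for the batch discount, the arithmetic can be sketched as follows. The 50% discount and the $0.50/MTok price are the figures from the list above. The price is applied to all tokens as a simplification, and the function name, the monthly token volume, and the share of traffic sent through batching are hypothetical.

```python
def monthly_cost(
    tokens_per_month: int,
    usd_per_mtok: float,
    batch_fraction: float = 0.0,
    batch_discount: float = 0.5,
) -> float:
    """Estimate monthly USD spend when a share of traffic uses a batch API."""
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    full_price = tokens_per_month / 1_000_000 * usd_per_mtok
    # Only the batched share of the traffic receives the discount.
    return full_price * (1 - batch_fraction * batch_discount)


if __name__ == "__main__":
    # Hypothetical workload: 200M tokens per month at $0.50/MTok.
    print(f"No batching:  ${monthly_cost(200_000_000, 0.50):.2f}")
    print(f"60% batched:  ${monthly_cost(200_000_000, 0.50, batch_fraction=0.6):.2f}")
```

With these numbers the bill drops from about $100 to about $70 per month. Batching only suits work that can wait for results, such as bulk test generation or nightly reviews.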

Use the smallest model that gets the job done.

Upgrade only when you need to.
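
One way to put this rule into practice is an escalation ladder: try the smallest model first and move up only when its answer fails a check. The sketch below assumes a model is any function from prompt to response and that you can supply an acceptance check (for example, "the generated code passes the tests"). All names are hypothetical, and the stubs stand in for real API calls.

```python
from typing import Callable, Optional

Model = Callable[[str], str]


def solve_with_escalation(
    prompt: str,
    ladder: list[tuple[str, Model]],
    is_good_enough: Callable[[str], bool],
) -> tuple[Optional[str], Optional[str]]:
    """Try models from smallest to largest.

    Returns (model_name, answer) for the first acceptable answer,
    or (None, None) if every model on the ladder fails the check.
    """
    for name, model in ladder:
        answer = model(prompt)
        if is_good_enough(answer):
            return name, answer
    return None, None


if __name__ == "__main__":
    # Stand-in models ordered from cheapest to most capable.
    ladder = [
        ("small", lambda p: "I am not sure."),
        ("mid", lambda p: "def is_even(n):\n    return n % 2 == 0"),
        ("frontier", lambda p: "def is_even(n):\n    return n % 2 == 0"),
    ]
    name, answer = solve_with_escalation(
        "Write is_even(n) in Python.",
        ladder,
        is_good_enough=lambda out: "def is_even" in out,
    )
    print(f"Answered by: {name}")
```

The frontier model is never called in this run, because the mid-tier answer already passes the check.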

Summary

  • Four families: Anthropic, OpenAI, Google, open-source
  • Three tiers: frontier (accuracy), mid (daily default), small (speed & cost)
  • Sonnet / GPT-5.4-mini / Gemini Flash cover 80% of tasks
  • Reach for Opus 4.6 / GPT-5.4 when mistakes are expensive
  • Open-source is viable for privacy, compliance, and cost at scale
  • Configure your tools to use different models for different tasks