Local Models · Production-Ready

Frontier model designs.
Local model codes.

zap runs a frontier model for design and a local small model for the actual coding — and it proposes which model fits a task, then waits for you to confirm. No silent downgrades, no cloud round-trip for the bulk of the edits.

100%

structured plan success rate

294s

wall-clock per task

5 / 6

models score 80%+ on eval

Validated on qwen3-coder-30b, devstral-small-2, gemma-4-e4b, glm-4.7-flash-reap, and more — through the real zap TUI, objective verification, 191 passing tests. As of v0.15.18.

Why run code generation on your own machine?

Frontier models are the right tool for deciding what to change. They're an expensive, slow, privacy-leaking tool for mechanically applying a change you've already specified. That mechanical majority of agentic coding can run on a 14B model on your laptop.

💸

Near-Zero Cost

The frontier model is only paid for the thinking — add a field, rename a symbol, wire a route. The bulk of the edits run locally with no per-token billing.

🔒

Full Privacy

The bulk of your code never leaves your laptop. Only the task plan (a few hundred tokens) goes to the cloud. The actual code changes happen locally.

⚡

Low Latency

No network hop for the execution half. Local inference on scoped tasks is fast — a 4-step structured plan completes in under 5 minutes end-to-end.

✈️

Works Offline

Once the plan is written, execution works on a plane or air-gapped network. The local model handles the entire execution phase without internet access.

Propose, don't switch. Confirm, don't assume.

zap never silently changes models. It proposes and waits — the frontier model writes a step-by-step plan with exact code snippets, then recommends which local model should execute it. Nothing runs until you press a key.

1. Declare your models

# ~/.agent.toml
[providers.anthropic]
model = "claude-sonnet-4-6"
role  = "frontier"

[providers.lm_studio]
model = "qwen3-coder-30b"
role  = "local-coder"
base_url = "http://localhost:1234/v1/chat/completions"

2. The model proposes

◈ Task 3 — Add created_at field (+4 callers)
  Recommended: qwen3-coder-30b (local)
  reason: mechanical, well-scoped

  [Enter] run on qwen
  [o] keep frontier model
  [m] pick another model
  [s] skip this task

Structured plan execution — 100% success rate

The optimal SLM workflow: a frontier model pre-writes a step-by-step plan with exact code snippets, and the SLM executes it mechanically — one step, one verification, one result. No improvisation, no dead ends.

	Open-ended goal	Structured plan
Success rate	50% per attempt	100% first attempt
Wall-clock	305–905s	294s
Model turns	4–8	2
Failure mode	Wrong conditional → 13 min debugging	None — plan is unambiguous

Test 6: Add a GET /todos REST endpoint to a Node.js Express app. The plan had 4 numbered steps with exact code in fenced blocks and a verification command per step. The model followed every step, made two correct edits in one turn, and verification passed.

# Example plan (frontier model writes this, SLM executes it)
Step 1 — Read the current code (code_map app.js)
Step 2 — Add todo data array (exact code provided)
Step 3 — Add GET /todos route (exact code provided)
Step 4 — Run `node test.js` and confirm "ok"

Purpose-built for small model success

Four features that make the difference between an SLM that spins for 20 minutes and one that ships correct code in 2 turns.

🗺️

Pre-Indexed Code Graph

zap --index-only builds an AST index (tree-sitter + SQLite) so the SLM navigates with find_definition and find_references instead of reading dozens of files. For a 300-file project, this is the difference between a 2-turn execution and a 12-turn goose chase.

🔧

Core Tool Profile

AGENT_TOOL_PROFILE=core sends only 6 tool schemas (file ops, shell, search). Small models struggle with large tool schemas — this profile makes tool-calling tractable without sacrificing capability.

⏳

Streaming Watchdog

Local models prefill big prompts for minutes. zap tolerates that: 1-hour streaming cap, idle detection at 10 minutes, and "waiting for first token" progress notices every 30 seconds. Backoff on stream drops prevents stacked prefills from grinding your machine.

🛡️

Verify-Aware Watchdog

If the SLM gets stuck in a verification loop (3 consecutive failed shell runs), zap injects a rethink nudge. At 6 failures, tools are withdrawn and the model must produce a structured handoff summary. No other local-first agent has this — OpenHands' StuckDetector only catches identical repeated actions.

Recommended local models

Backed by a full multi-turn agentic eval — 3 realistic tasks, objective verification, through the real zap TUI.

Model	Size	Score	Verdict
`qwen3-coder-30b`	30B (MoE, ~3.3B active)	3/3	Default executor pick — RL-tuned for the agent loop
`devstral-small-2`	24B	3/3	Co-built with OpenHands scaffold; great alt
`gemma-4-e4b`	4B	3/3	Punches far above its size on scoped tasks
`glm-4.7-flash-reap`	23B (MoE)	2/3	Solid on fixes/renames, weaker on multi-step
`qwen2.5-coder-14b`	14B	1/3	⚠️ Completion model — avoid for executor role

Key lesson: model class beats model size. Pick an agentic-tuned model (Devstral, Qwen3-Coder) for the executor role — not a generic "coder" completion model. Scoped single-file tasks passed on every model tested.

Validated — through the real TUI

All tests run through the real zap TUI with qwen3-coder-30b via LM Studio on a 32 GB Apple M-series machine. Full evidence, scripts, and raw logs in the research repo.

Test	Scenario	Result	Time
Test 3	Frontier-decomposed task (tool validation)	✅ PASS	214s
Test 4 r1	Goal-level task (8 behaviors)	❌ 7/8	905s
Test 4 r2	Same goal, retry	✅ 8/8	305s
Test 5	Impossible task — escalation drill	⚠️ partial	10–45s
Test 6	Structured plan execution	✅ PASS	294s

Bottom line: Give an SLM a scoped, pre-written plan with exact verification steps → it executes reliably and fast. Give it an open-ended goal with ambiguous constraints → success is model and run dependent, but zap's watchdog bounds the failure cost to minutes. The recommended production workflow is the structured plan pattern (Test 6).

Get started in 3 steps

Install zap

curl -fsSL https://raw.githubusercontent.com/zap-coding-agent/zap-coding-agent/main/install.sh | bash

Single binary, no runtime. Cold start in milliseconds.

Configure your local model

# ~/.agent.toml
[providers.lm_studio]
kind     = "openai"
model    = "qwen3-coder-30b"
role     = "local-coder"
base_url = "http://localhost:1234/v1/chat/completions"

Works with LM Studio, Ollama, or any OpenAI-compatible endpoint. Set role = "local-coder" to enable propose-and-confirm routing.

Pre-index and run

cd your-project
zap --index-only          # builds .zap/code.db
zap                        # start with index available

The index gives the SLM find_definition, find_references, code_map, and who_calls — it navigates directly to symbols instead of reading dozens of files.

Full docs: SLM Support Guide ↗ · Research: SLM Coding Eval ↗

Frontier model designs.Local model codes.