zap runs a frontier model for design and a local small model for the actual coding — and it proposes which model fits a task, then waits for you to confirm. No silent downgrades, no cloud round-trip for the bulk of the edits.
Validated on qwen3-coder-30b, devstral-small-2, gemma-4-e4b, glm-4.7-flash-reap, and more — through the real zap TUI, objective verification, 191 passing tests. As of v0.15.18.
Frontier models are the right tool for deciding what to change. They're an expensive, slow, privacy-leaking tool for mechanically applying a change you've already specified. That mechanical majority of agentic coding can run on a 14B model on your laptop.
You pay the frontier model only for the thinking. The mechanical edits (add a field, rename a symbol, wire a route) run locally with no per-token billing.
The bulk of your code never leaves your laptop. Only the task plan (a few hundred tokens) goes to the cloud. The actual code changes happen locally.
No network hop for the execution half. Local inference on scoped tasks is fast — a 4-step structured plan completes in under 5 minutes end-to-end.
Once the plan is written, execution works on a plane or air-gapped network. The local model handles the entire execution phase without internet access.
zap never silently changes models. It proposes and waits — the frontier model writes a step-by-step plan with exact code snippets, then recommends which local model should execute it. Nothing runs until you press a key.
The optimal SLM workflow: a frontier model pre-writes a step-by-step plan with exact code snippets, and the SLM executes it mechanically — one step, one verification, one result. No improvisation, no dead ends.
| | Open-ended goal | Structured plan |
|---|---|---|
| Success rate | 50% per attempt | 100% first attempt |
| Wall-clock | 305–905s | 294s |
| Model turns | 4–8 | 2 |
| Failure mode | Wrong conditional → 13 min debugging | None — plan is unambiguous |
Test 6: Add a GET /todos REST endpoint to a Node.js Express app. The plan had 4 numbered steps with exact code in fenced blocks and a verification command per step. The model followed every step, made two correct edits in one turn, and verification passed.
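The "one step, one verification, one result" loop can be sketched in a few lines. This is a minimal illustration, not zap's implementation: the `Step` type, its `apply` callback, and `run_plan` are hypothetical names, and the sketch assumes each verification command signals success with exit code 0.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str            # what this step changes
    apply: Callable[[], None]   # hypothetical hook that performs the edit
    verify_cmd: List[str]       # command that must exit 0 for the step to pass


def run_plan(steps: List[Step]) -> int:
    """Apply steps in order and stop at the first failed verification.

    Returns the number of steps that were applied and verified.
    """
    completed = 0
    for step in steps:
        step.apply()
        result = subprocess.run(step.verify_cmd, capture_output=True, text=True)
        if result.returncode != 0:
            break  # do not improvise: surface the failure instead of continuing
        completed += 1
    return completed


if __name__ == "__main__":
    always_ok = [sys.executable, "-c", "raise SystemExit(0)"]
    demo = [Step("no-op edit", lambda: None, always_ok)]
    print(run_plan(demo))
```

Stopping at the first failed check keeps a bad edit from compounding across later steps, which is the property the structured-plan pattern relies on.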
Four features that make the difference between an SLM that spins for 20 minutes and one that ships correct code in 2 turns.
zap --index-only builds an AST index (tree-sitter + SQLite) so the SLM navigates with find_definition and find_references instead of reading dozens of files. For a 300-file project, this is the difference between a 2-turn execution and a 12-turn goose chase.
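To show why a symbol index saves turns, here is a rough sketch of the idea. It is not zap's index: it swaps tree-sitter for Python's built-in `ast` module so the example stays self-contained, and the table layout and helper signatures are assumptions made for illustration.

```python
import ast
import sqlite3


def build_index(sources):
    """Index definitions and name references for {path: source_text}."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE defs (name TEXT, path TEXT, line INTEGER)")
    db.execute("CREATE TABLE refs (name TEXT, path TEXT, line INTEGER)")
    for path, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                db.execute("INSERT INTO defs VALUES (?, ?, ?)",
                           (node.name, path, node.lineno))
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                # A name being read, e.g. the callee in add(1, 2)
                db.execute("INSERT INTO refs VALUES (?, ?, ?)",
                           (node.id, path, node.lineno))
    return db


def find_definition(db, name):
    """Return (path, line) pairs where `name` is defined."""
    return db.execute(
        "SELECT path, line FROM defs WHERE name = ? ORDER BY path, line", (name,)
    ).fetchall()


def find_references(db, name):
    """Return (path, line) pairs where `name` is read."""
    return db.execute(
        "SELECT path, line FROM refs WHERE name = ? ORDER BY path, line", (name,)
    ).fetchall()
```

With an index like this, a lookup is a single query that returns a file and line number, so the model can open one location instead of scanning files to find a symbol.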
AGENT_TOOL_PROFILE=core sends only 6 tool schemas (file ops, shell, search). Small models struggle with large tool schemas — this profile makes tool-calling tractable without sacrificing capability.
Local models prefill big prompts for minutes. zap tolerates that: 1-hour streaming cap, idle detection at 10 minutes, and "waiting for first token" progress notices every 30 seconds. Backoff on stream drops prevents stacked prefills from grinding your machine.
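The timing rules above can be expressed as two small pure functions. This is a sketch under assumptions, not zap's code: the function names are hypothetical, and it assumes the idle clock starts when the stream opens and resets whenever data arrives. The three constants come straight from the numbers quoted above.

```python
STREAM_CAP_S = 60 * 60   # 1-hour cap on a single stream
IDLE_LIMIT_S = 10 * 60   # give up after 10 minutes without new data
NOTICE_EVERY_S = 30      # "waiting for first token" notice cadence


def stream_status(started_at, last_activity_at, now):
    """Classify a stream as 'cap', 'idle', or 'ok' (all times in seconds)."""
    if now - started_at >= STREAM_CAP_S:
        return "cap"
    if now - last_activity_at >= IDLE_LIMIT_S:
        return "idle"
    return "ok"


def notices_due(started_at, now):
    """How many progress notices should have been shown before the first token."""
    return int((now - started_at) // NOTICE_EVERY_S)
```

Keeping the policy free of real clocks makes it easy to test: a caller passes in timestamps and acts on the returned status.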
If the SLM gets stuck in a verification loop (3 consecutive failed shell runs), zap injects a rethink nudge. At 6 failures, tools are withdrawn and the model must produce a structured handoff summary. No other local-first agent has this — OpenHands' StuckDetector only catches identical repeated actions.
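The escalation ladder can be pictured as a small counter. The class below is a hypothetical sketch, not zap's watchdog: the class name and action labels are made up for illustration, and it assumes a successful shell run resets the count, which is what "consecutive" implies.

```python
NUDGE_AT = 3      # consecutive failed shell runs before a rethink nudge
WITHDRAW_AT = 6   # consecutive failures before tools are withdrawn


class Watchdog:
    """Track consecutive failed shell runs and decide what to do next."""

    def __init__(self):
        self.failures = 0

    def record(self, exit_code):
        """Record one shell run and return 'continue', 'nudge', or 'handoff'."""
        if exit_code == 0:
            self.failures = 0  # a passing run breaks the streak
            return "continue"
        self.failures += 1
        if self.failures >= WITHDRAW_AT:
            return "handoff"   # tools withdrawn, model must summarize
        if self.failures == NUDGE_AT:
            return "nudge"     # inject a rethink prompt once
        return "continue"
```

Because the counter keys on failed runs rather than on identical actions, it also catches a model that keeps trying different fixes that all fail.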
Backed by a full multi-turn agentic eval — 3 realistic tasks, objective verification, through the real zap TUI.
| Model | Size | Score | Verdict |
|---|---|---|---|
| qwen3-coder-30b | 30B (MoE, ~3.3B active) | 3/3 | Default executor pick — RL-tuned for the agent loop |
| devstral-small-2 | 24B | 3/3 | Co-built with OpenHands scaffold; great alt |
| gemma-4-e4b | 4B | 3/3 | Punches far above its size on scoped tasks |
| glm-4.7-flash-reap | 23B (MoE) | 2/3 | Solid on fixes/renames, weaker on multi-step |
| qwen2.5-coder-14b | 14B | 1/3 | ⚠️ Completion model — avoid for executor role |
Key lesson: model class beats model size. Pick an agentic-tuned model (Devstral, Qwen3-Coder) for the executor role — not a generic "coder" completion model. Scoped single-file tasks passed on every model tested.
All tests run through the real zap TUI with qwen3-coder-30b via LM Studio on a 32 GB Apple M-series machine. Full evidence, scripts, and raw logs in the research repo.
| Test | Scenario | Result | Time |
|---|---|---|---|
| Test 3 | Frontier-decomposed task (tool validation) | ✅ PASS | 214s |
| Test 4 r1 | Goal-level task (8 behaviors) | ❌ 7/8 | 905s |
| Test 4 r2 | Same goal, retry | ✅ 8/8 | 305s |
| Test 5 | Impossible task — escalation drill | ⚠️ partial | 10–45s |
| Test 6 | Structured plan execution | ✅ PASS | 294s |
Bottom line: Give an SLM a scoped, pre-written plan with exact verification steps → it executes reliably and fast. Give it an open-ended goal with ambiguous constraints → success is model- and run-dependent, but zap's watchdog bounds the failure cost to minutes. The recommended production workflow is the structured plan pattern (Test 6).
Single binary, no runtime. Cold start in milliseconds.
Works with LM Studio, Ollama, or any OpenAI-compatible endpoint. Set role = "local-coder" to enable propose-and-confirm routing.
The index gives the SLM find_definition, find_references, code_map, and who_calls — it navigates directly to symbols instead of reading dozens of files.
Full docs: SLM Support Guide ↗ · Research: SLM Coding Eval ↗
Open source, MIT licensed. Zero telemetry. No cloud for the bulk of your edits.