Zap — Independent Engineering Review (Claude Opus 4.8)

The verdict

zap is a genuinely well-engineered coding agent: clean architecture, the right abstractions, a green build, and real test discipline. On harness quality it now belongs in the same conversation as the major agents, and it ships capabilities the majors do not.

Why this matters for adoption: the review was deliberately barred from trusting any of the project's own marketing documents — it scored the source code only. The two structural gaps from the first review (no automated tests on the agent loop; no way to measure quality) were closed and re-verified. This is an evidence-based assessment, suitable for technical due diligence.

Capabilities unique to zap

These are the capabilities the review found in zap's source that set it apart from Claude Code, Gemini CLI, and OpenCode. This is where zap is ahead: not on a feature checklist, but on architecture.

Persistent code graph

A tree-sitter + SQLite index of the whole repo — symbols, function call-sites, imports, and a type hierarchy, with PageRank-ranked files. The model answers "who calls X" or "what breaks if I rename Y" in one query, not ten greps.

No other major agent maintains a queryable code graph. Claude Code and Gemini CLI re-grep every time.
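The review says only that files are "PageRank-ranked". As a hedged illustration of that idea, the sketch below runs textbook PageRank (power iteration, damping factor 0.85) over a hypothetical file-import graph, where an edge from file A to file B means "A imports B", so widely imported files rise to the top. The `pagerank` function and its adjacency-list input are assumptions for illustration, not zap's code; a real index would presumably build the graph from the import data it stores in SQLite.

```rust
/// Textbook PageRank by power iteration over a directed graph.
/// `edges[i]` lists the nodes that node `i` links to (here: the files that file `i` imports).
/// Hypothetical helper for illustration only; not taken from zap's source.
pub fn pagerank(edges: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = edges.len();
    if n == 0 {
        return Vec::new();
    }
    // Start from a uniform distribution over all files.
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Teleport term: every node receives the same baseline share.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (src, targets) in edges.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node (imports nothing): spread its rank evenly over all nodes.
                let share = damping * rank[src] / n as f64;
                for r in next.iter_mut() {
                    *r += share;
                }
            } else {
                // Split this node's rank equally among the files it imports.
                let share = damping * rank[src] / targets.len() as f64;
                for &dst in targets {
                    next[dst] += share;
                }
            }
        }
        rank = next;
    }
    rank
}
```

Treating files with no imports as linking to every file keeps the ranks summing to 1. For example, with `edges = vec![vec![1, 2], vec![2], vec![], vec![2]]`, file 2, which the other three all import, ends up ranked highest.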

Skill-first context injection

On each turn, zap injects only the skills your message triggers, ranked and capped to a token budget. A Rust task gets Rust conventions; a greeting costs ~31 tokens instead of a 2,000-token wall sent on every turn.

Others send the same static system prompt regardless of task. This is zap's signature design.
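The per-turn selection described above (trigger, rank, cap to a budget) can be sketched as a greedy pick. Everything here is an assumption for illustration: the hypothetical `Skill` struct, keyword triggers, scoring by how many of a skill's trigger words appear in the message, and precomputed token counts. The review does not describe how zap actually triggers or ranks skills.

```rust
/// A hypothetical skill record: trigger keywords plus a precomputed token cost.
pub struct Skill {
    pub name: &'static str,
    pub triggers: &'static [&'static str],
    pub tokens: usize,
}

/// Pick the skills a message triggers, most relevant first, without exceeding `budget` tokens.
/// Relevance is simply the number of the skill's trigger words found in the message.
pub fn select_skills<'a>(message: &str, skills: &'a [Skill], budget: usize) -> Vec<&'a Skill> {
    let lower = message.to_lowercase();
    let words: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect();

    // Score each skill; a skill with no matching trigger is never injected.
    let mut scored: Vec<(usize, &Skill)> = skills
        .iter()
        .map(|s| (s.triggers.iter().filter(|t| words.contains(*t)).count(), s))
        .filter(|(score, _)| *score > 0)
        .collect();

    // Highest score first; the cheaper skill wins ties so more can fit in the budget.
    scored.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.tokens.cmp(&b.1.tokens)));

    // Greedily take skills while they still fit in the remaining budget.
    let mut used = 0;
    let mut chosen = Vec::new();
    for (_, skill) in scored {
        if used + skill.tokens <= budget {
            used += skill.tokens;
            chosen.push(skill);
        }
    }
    chosen
}
```

With a 500-token budget, a message mentioning "cargo" and "Rust" selects only a Rust skill, while a greeting that triggers nothing selects no skills, so it adds no skill text at all.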

Token-smart MCP loading

MCP servers stay pending at startup. Instead of dumping every server's tool schemas into every request, zap injects one lightweight connector and loads a server's tools only when the model decides it needs them.

A genuinely different, more economical MCP integration than the eager-load model others use.
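The lazy-loading pattern can be sketched as a small state machine per server. This sketch uses hypothetical names throughout (`LazyMcp`, `ServerState`, `connect`, and a `fetch_tools` callback standing in for the real MCP connection and tool listing). Each request carries only a one-line connector description, and a server's tools are fetched the first time the model asks for that server, then cached.

```rust
use std::collections::HashMap;

/// Lifecycle of one MCP server in this sketch: nothing is fetched until first use.
pub enum ServerState {
    Pending,
    Loaded(Vec<String>), // tool names exposed by the server once connected
}

/// Hypothetical registry; `fetch_tools` stands in for connecting and listing a server's tools.
pub struct LazyMcp<F: FnMut(&str) -> Vec<String>> {
    servers: HashMap<String, ServerState>,
    fetch_tools: F,
}

impl<F: FnMut(&str) -> Vec<String>> LazyMcp<F> {
    /// Register servers by name only; all start out pending.
    pub fn new(names: &[&str], fetch_tools: F) -> Self {
        let servers: HashMap<String, ServerState> = names
            .iter()
            .map(|n| (n.to_string(), ServerState::Pending))
            .collect();
        LazyMcp { servers, fetch_tools }
    }

    /// What goes into every request: one connector entry naming the servers, no tool schemas.
    pub fn connector_description(&self) -> String {
        let mut names: Vec<&str> = self.servers.keys().map(|s| s.as_str()).collect();
        names.sort();
        format!("connect(server): available servers = {}", names.join(", "))
    }

    /// Called when the model invokes the connector: load the server's tools on first use only.
    /// Returns None for a server that was never registered.
    pub fn connect(&mut self, name: &str) -> Option<&[String]> {
        let state = self.servers.get_mut(name)?;
        if matches!(*state, ServerState::Pending) {
            *state = ServerState::Loaded((self.fetch_tools)(name));
        }
        match state {
            ServerState::Loaded(tools) => Some(tools.as_slice()),
            ServerState::Pending => None,
        }
    }
}
```

Because `fetch_tools` runs at most once per server, repeated `connect` calls are cheap, and servers the model never asks about are never contacted.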

Secret pre-flight + sandbox

Content is scanned for ~25 credential patterns before it ever leaves for a cloud LLM, and shell execution has real isolation modes (off / workdir / container) with a documented threat model.

An enterprise-grade data-egress + execution posture built into the agent, not bolted on.
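The pre-flight check amounts to pattern matching on outgoing text. The sketch below covers just three well-known formats (AWS access key IDs, GitHub classic personal access tokens, and PEM private-key headers) out of the ~25 patterns the review mentions. The rule names and the `scan_secrets` function are hypothetical, and a production scanner would more likely use the `regex` crate than hand-written matching; this version sticks to the standard library so it compiles on its own.

```rust
/// One finding: which rule matched and at what byte offset.
#[derive(Debug, PartialEq)]
pub struct Finding {
    pub rule: &'static str,
    pub offset: usize,
}

/// True if the first `len` characters of `s` exist and all satisfy `ok`.
fn run_of(s: &str, len: usize, ok: fn(char) -> bool) -> bool {
    s.chars().take(len).filter(|&c| ok(c)).count() == len
}

/// Scan text for a few well-known credential formats before it is sent anywhere.
pub fn scan_secrets(text: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    for (i, _) in text.match_indices("AKIA") {
        if run_of(&text[i + 4..], 16, |c: char| c.is_ascii_uppercase() || c.is_ascii_digit()) {
            findings.push(Finding { rule: "aws-access-key-id", offset: i });
        }
    }
    // GitHub classic personal access tokens: "ghp_" followed by 36 alphanumerics.
    for (i, _) in text.match_indices("ghp_") {
        if run_of(&text[i + 4..], 36, |c: char| c.is_ascii_alphanumeric()) {
            findings.push(Finding { rule: "github-pat", offset: i });
        }
    }
    // PEM private keys: RSA, EC, OpenSSH and PKCS#8 headers all end in "PRIVATE KEY-----".
    for (i, _) in text.match_indices("-----BEGIN ") {
        let line = text[i..].lines().next().unwrap_or("");
        if line.ends_with("PRIVATE KEY-----") {
            findings.push(Finding { rule: "private-key", offset: i });
        }
    }
    // Report findings in the order they appear in the text.
    findings.sort_by_key(|f| f.offset);
    findings
}
```

In this sketch's assumed flow, a caller would run `scan_secrets` on the outgoing prompt and block or redact the request whenever the result is non-empty.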

Any provider · local · air-gapped

Anthropic, any OpenAI-compatible endpoint, Gemini, DeepSeek, local models (LM Studio / Ollama), corporate gateways, and Google Cloud via gcloud Application Default Credentials (ADC), all switchable mid-session. Runs fully offline against a local model.

Unlike Claude Code (Anthropic-centric) or Gemini CLI (Gemini-only), zap fits a regulated, gateway-bound enterprise.

Single Rust binary

One 26 MB static binary. No Node.js, no Python venv, no Docker to install or audit. Cold-starts in milliseconds and dramatically shrinks the supply-chain surface compared to npm-based agents.

Smaller attack surface and zero runtime dependencies — a real security and IT-approval advantage.

How zap compares

Harness capabilities side by side. zap's edge is concentrated in code understanding, context efficiency, and enterprise/security fit.

Capability	Claude Code	Gemini CLI	OpenCode	zap
Persistent queryable code graph	❌	❌	❌	✅ tree-sitter + SQLite
Per-task skill injection	❌ static prompt	❌ static prompt	❌ static prompt	✅ token-budgeted
Token-smart / lazy MCP loading	eager	eager	eager	✅ on-demand
Any provider (incl. local / air-gapped)	Anthropic-centric	Gemini only	✅	✅
Secret pre-flight scan before cloud send	❌	❌	❌	✅
Shell sandbox modes	permission-based	partial	partial	✅ off/workdir/container
Single binary, no runtime	needs Node	needs Node	✅ (Go)	✅ (Rust)
Full prompt caching	✅	N/A	provider-dependent	✅

Comparison reflects the review's findings as of June 2026. Claude Code remains the most battle-tested harness overall; zap's advantage is architectural — capabilities the others do not implement.

Scorecard

How the review rated each dimension — and how it improved after the gap-closing work was completed and re-verified.

Dimension	First pass (out of 10)	Current (out of 10)
Architecture & abstractions	9	9
Code understanding (graph)	9	9
Context efficiency / caching	6	9
Testing & verifiability	4	8
Reliability (crash-safety)	5	7
Safety / sandboxing	5	8
Correctness (edit tools)	7	8
Overall	7.0	8.5

Verified, not claimed. At review time, cargo build --release completed cleanly with zero warnings, and cargo test reported 168 passed, 0 failed, including new deterministic tests on the core agent loop that did not exist in the first pass.

The remaining 15% is not about coding ability. Architecture and code understanding already score 9/10, and zap's actual coding output is bounded by whichever model you point it at; the harness is built to make the most of that model, not waste it. The gap to a perfect score is purely a matter of operational maturity: recording an eval baseline and accumulating real-world mileage. Nothing in it reflects a weakness in how zap reads code, edits files, navigates the repo, or reasons about a task.

Security review

Beyond the engineering review, zap is undergoing an independent security assessment.

Independent security review by Mythos (in progress)

A dedicated security assessment of zap's data-egress controls, shell sandboxing, supply-chain surface, and credential handling is being conducted by Mythos. The report will be linked here when complete.

Read zap's security model and threat model.

The full reports

Everything above is drawn from these source-verified documents.

Engineering Review (v2)
Engineering Review (v1)
Security Model

Independently reviewed.
8.5 / 10.
