Independent Engineering Review

Independently reviewed.
8.5 / 10.

A full source-code review of zap by Claude Opus 4.8, not a marketing claim. Every score is backed by code that was actually read and by a build and test suite that was actually run.

8.5/10 · overall engineering score
168 tests passing · 0 failing
0 compiler warnings

Reviewer: Claude Opus 4.8 · Method: full read of src/** (~29k LOC) plus live runs of cargo build --release and cargo test · June 2026

The verdict

zap is a genuinely well-engineered coding agent, with clean architecture, the right abstractions, a green build, and real test discipline. On harness quality it now stands alongside the major agents, and it ships three capabilities the majors do not.

Why this matters for adoption: the review was deliberately barred from trusting any of the project's own marketing documents and scored the source code only. The two structural gaps from the first review (no automated tests on the agent loop, and no way to measure quality) were closed and re-verified. The result is an evidence-based assessment suitable for technical due diligence.

Capabilities unique to zap

These are capabilities the review found in zap's source that Claude Code, Gemini CLI, and OpenCode do not provide. This is where zap is ahead: not on a feature checklist, but in its architecture.

🕸️

Persistent code graph

A tree-sitter + SQLite index of the whole repo: symbols, function call sites, imports, and a type hierarchy, with files ranked by PageRank. The model answers "who calls X" or "what breaks if I rename Y" with one query instead of ten greps.

None of the other major agents maintains a queryable code graph; Claude Code and Gemini CLI re-grep the repo every time.
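To make "files ranked by PageRank" concrete, here is a minimal sketch of PageRank over a file graph in which an edge means "file A imports file B". The edge definition, the damping factor, the iteration count, and the `rank_files` name are assumptions for illustration, not zap's implementation; a real index would read its edges from the SQLite tables rather than from a slice.

```rust
use std::collections::HashMap;

/// Rank files with PageRank over an import graph.
/// Assumption (hypothetical): each edge `(a, b)` means "file `a` imports file `b`".
pub fn rank_files(edges: &[(&str, &str)], damping: f64, iterations: usize) -> Vec<(String, f64)> {
    // Give every file that appears in an edge a dense index.
    let mut index: HashMap<&str, usize> = HashMap::new();
    let mut names: Vec<&str> = Vec::new();
    for &(a, b) in edges {
        for f in [a, b] {
            if !index.contains_key(f) {
                index.insert(f, names.len());
                names.push(f);
            }
        }
    }
    let n = names.len();
    if n == 0 {
        return Vec::new();
    }

    // Outgoing edges per file.
    let mut out: Vec<Vec<usize>> = vec![Vec::new(); n];
    for &(a, b) in edges {
        out[index[a]].push(index[b]);
    }

    // Power iteration, starting from a uniform distribution.
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Rank held by files that import nothing is spread evenly over all files.
        let dangling: f64 = (0..n).filter(|&i| out[i].is_empty()).map(|i| rank[i]).sum();
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        // Every other file passes its damped rank to the files it imports.
        for (i, targets) in out.iter().enumerate() {
            if !targets.is_empty() {
                let share = damping * rank[i] / targets.len() as f64;
                for &j in targets {
                    next[j] += share;
                }
            }
        }
        rank = next;
    }

    // Highest-ranked files first.
    let mut ranked: Vec<(String, f64)> = names.iter().map(|s| s.to_string()).zip(rank).collect();
    ranked.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
    ranked
}
```

Files that many others depend on, directly or transitively, accumulate rank, which makes a ranking like this a reasonable way to decide which files to surface to the model first.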

🎯

Skill-first context injection

On each turn, zap injects only the skills your message triggers, ranked and capped to a token budget. A Rust task gets Rust conventions; a greeting costs ~31 tokens instead of a 2,000-token prompt wall sent on every turn.

Others send the same static system prompt regardless of task. This is zap's signature design.
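A minimal sketch of per-turn, budget-capped skill selection, assuming each skill declares lowercase trigger keywords and an estimated token cost. The `Skill` struct, the keyword-count scoring, and the greedy fill are hypothetical stand-ins; zap's actual trigger matching and ranking may differ.

```rust
/// Hypothetical skill record: lowercase trigger keywords and an estimated prompt cost.
pub struct Skill {
    pub name: &'static str,
    pub triggers: &'static [&'static str],
    pub tokens: usize,
}

/// Select the skills a message triggers, best matches first, within `budget` tokens.
pub fn select_skills<'a>(message: &str, skills: &'a [Skill], budget: usize) -> Vec<&'a Skill> {
    let text = message.to_lowercase();

    // Score = number of this skill's triggers found in the message; drop non-matches.
    let mut scored: Vec<(usize, &'a Skill)> = skills
        .iter()
        .map(|s| (s.triggers.iter().filter(|t| text.contains(**t)).count(), s))
        .filter(|(score, _)| *score > 0)
        .collect();

    // Highest score first; cheaper skills win ties.
    scored.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.tokens.cmp(&b.1.tokens)));

    // Greedy fill: take each skill in rank order if it still fits the budget.
    let mut used = 0;
    let mut chosen = Vec::new();
    for (_, skill) in scored {
        if used + skill.tokens <= budget {
            used += skill.tokens;
            chosen.push(skill);
        }
    }
    chosen
}
```

A message with no triggers selects nothing, which is how a greeting can stay at a few dozen tokens. Plain substring matching is deliberately naive here ("rust" would also match "trust"), and the greedy fill can skip a large high-ranked skill while still fitting smaller ones below it.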

🔌

Token-smart MCP loading

MCP servers stay pending at startup. Instead of dumping every server's tool schemas into every request, zap injects one lightweight connector and loads a server's tools only when the model decides it needs them.

A genuinely different, more economical MCP integration than the eager-load model others use.
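The lazy-loading idea can be sketched as a registry in which every configured server starts `Pending`, each request advertises a single connector tool, and a server's tool list is fetched the first time the model asks for it. `McpRegistry`, the `mcp_connect` tool name, and the `fetch_tools` callback are hypothetical; the callback stands in for the real MCP connect-and-list-tools round trip, which is not shown.

```rust
use std::collections::HashMap;

/// A tool schema as it would be advertised to the model (fields trimmed for the sketch).
#[derive(Clone, Debug, PartialEq)]
pub struct ToolSchema {
    pub name: String,
}

/// Each configured MCP server is either still pending or has had its tools listed.
pub enum ServerState {
    Pending,
    Loaded(Vec<ToolSchema>),
}

pub struct McpRegistry {
    servers: HashMap<String, ServerState>,
    /// Hypothetical stand-in for the real MCP "connect and list tools" round trip.
    fetch_tools: fn(&str) -> Vec<ToolSchema>,
}

impl McpRegistry {
    /// Every server starts pending: nothing is connected at startup.
    pub fn new(names: &[&str], fetch_tools: fn(&str) -> Vec<ToolSchema>) -> Self {
        let servers = names.iter().map(|n| (n.to_string(), ServerState::Pending)).collect();
        McpRegistry { servers, fetch_tools }
    }

    /// Tools sent with a request: one lightweight connector plus tools of already-loaded servers.
    pub fn advertised_tools(&self) -> Vec<ToolSchema> {
        let mut tools = vec![ToolSchema { name: "mcp_connect".to_string() }];
        for state in self.servers.values() {
            if let ServerState::Loaded(list) = state {
                tools.extend(list.iter().cloned());
            }
        }
        tools
    }

    /// Handle the model calling the connector for `server`: fetch its tools once, then reuse them.
    pub fn connect(&mut self, server: &str) -> Option<&[ToolSchema]> {
        let fetch = self.fetch_tools;
        let state = self.servers.get_mut(server)?; // unknown server -> None
        if matches!(state, ServerState::Pending) {
            *state = ServerState::Loaded(fetch(server));
        }
        match &*state {
            ServerState::Loaded(list) => Some(list.as_slice()),
            ServerState::Pending => None,
        }
    }
}
```

Until the model calls the connector, each request carries one tool schema regardless of how many servers are configured; the cost of a server's schemas is paid only after it is actually used.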

🔒

Secret pre-flight + sandbox

Content is scanned for ~25 credential patterns before it ever leaves for a cloud LLM, and shell execution has real isolation modes (off / workdir / container) with a documented threat model.

An enterprise-grade data-egress + execution posture built into the agent, not bolted on.
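A minimal sketch of a pre-flight credential scan using only the standard library. It checks three illustrative patterns (an AWS access key ID, a classic GitHub personal access token, and a PEM private-key header) rather than zap's ~25, and the rule format, `scan`, and `preflight` are assumptions for illustration.

```rust
/// One credential-pattern hit found by the pre-flight scan.
#[derive(Debug, PartialEq)]
pub struct Finding {
    pub kind: &'static str,
    pub offset: usize, // byte offset of the match in the scanned text
}

/// Illustrative rule: a fixed prefix followed by `len` bytes accepted by `allowed`.
struct PrefixRule {
    kind: &'static str,
    prefix: &'static str,
    len: usize,
    allowed: fn(u8) -> bool,
}

fn upper_alnum(b: u8) -> bool {
    b.is_ascii_uppercase() || b.is_ascii_digit()
}

fn alnum(b: u8) -> bool {
    b.is_ascii_alphanumeric()
}

const RULES: &[PrefixRule] = &[
    PrefixRule { kind: "aws-access-key-id", prefix: "AKIA", len: 16, allowed: upper_alnum },
    PrefixRule { kind: "github-token", prefix: "ghp_", len: 36, allowed: alnum },
];

/// Scan `text` for credential patterns; findings are sorted by offset.
pub fn scan(text: &str) -> Vec<Finding> {
    let bytes = text.as_bytes();
    let mut findings = Vec::new();
    for rule in RULES {
        for (at, _) in text.match_indices(rule.prefix) {
            let body = &bytes[at + rule.prefix.len()..];
            if body.len() >= rule.len && body[..rule.len].iter().all(|&b| (rule.allowed)(b)) {
                findings.push(Finding { kind: rule.kind, offset: at });
            }
        }
    }
    // PEM private keys: a "-----BEGIN ..." line that ends in "PRIVATE KEY-----".
    for (at, _) in text.match_indices("-----BEGIN") {
        let line_end = text[at..].find('\n').map_or(text.len(), |i| at + i);
        if text[at..line_end].trim_end().ends_with("PRIVATE KEY-----") {
            findings.push(Finding { kind: "private-key", offset: at });
        }
    }
    findings.sort_by_key(|f| f.offset);
    findings
}

/// Refuse to send content to a cloud model if any pattern matched.
pub fn preflight(text: &str) -> Result<&str, Vec<Finding>> {
    let findings = scan(text);
    if findings.is_empty() { Ok(text) } else { Err(findings) }
}
```

Blocking on any hit, as `preflight` does here, is the conservative choice; redacting the matched span and sending the rest is a common alternative.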

🏢

Any provider · local · air-gapped

Anthropic, any OpenAI-compatible endpoint, Gemini, DeepSeek, local models (LM Studio / Ollama), corporate gateways, and gcloud ADC, all switchable mid-session. zap also runs fully offline against a local model.

Unlike Claude Code (Anthropic-centric) or Gemini CLI (Gemini-only), zap fits a regulated, gateway-bound enterprise.
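Mid-session switching is straightforward if every backend sits behind one trait and the conversation history lives in the session rather than in the provider. The sketch below assumes a deliberately tiny, hypothetical `Provider` trait; `LocalModel` and `OpenAiCompatible` return canned strings instead of making HTTP calls, and none of these names are taken from zap's source.

```rust
/// Hypothetical, deliberately tiny provider interface; a real one would stream and carry tool calls.
pub trait Provider {
    fn name(&self) -> String;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// A model served locally, for example by LM Studio or Ollama (stubbed: no HTTP call is made).
pub struct LocalModel {
    pub base_url: String,
    pub model: String,
}

/// Any OpenAI-compatible endpoint, such as a corporate gateway (also stubbed).
pub struct OpenAiCompatible {
    pub base_url: String,
    pub model: String,
}

impl Provider for LocalModel {
    fn name(&self) -> String {
        format!("local:{}", self.model)
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        // Stub: report how many history lines the request carried.
        Ok(format!("[{}] saw {} message(s)", self.name(), prompt.lines().count()))
    }
}

impl Provider for OpenAiCompatible {
    fn name(&self) -> String {
        format!("openai-compatible:{}", self.model)
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[{}] saw {} message(s)", self.name(), prompt.lines().count()))
    }
}

/// The session owns the history, so swapping the provider keeps the conversation.
pub struct Session {
    provider: Box<dyn Provider>,
    history: Vec<String>,
}

impl Session {
    pub fn new(provider: Box<dyn Provider>) -> Self {
        Session { provider, history: Vec::new() }
    }

    pub fn switch_provider(&mut self, provider: Box<dyn Provider>) {
        self.provider = provider;
    }

    pub fn send(&mut self, message: &str) -> Result<String, String> {
        self.history.push(format!("user: {message}"));
        let reply = self.provider.complete(&self.history.join("\n"))?;
        self.history.push(format!("assistant: {reply}"));
        Ok(reply)
    }
}
```

Because `switch_provider` replaces only the boxed provider, the next request to the new backend still carries the full history.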

Single Rust binary

One 26 MB static binary. No Node.js, no Python venv, no Docker to install or audit. It cold-starts in milliseconds and has a far smaller supply-chain surface than npm-based agents.

Smaller attack surface and zero runtime dependencies — a real security and IT-approval advantage.

How zap compares

Harness capabilities side by side. zap's edge is concentrated in code understanding, context efficiency, and enterprise/security fit.

Capability | Claude Code | Gemini CLI | OpenCode | zap
--- | --- | --- | --- | ---
Persistent queryable code graph | ❌ | ❌ | ❌ | ✅ tree-sitter + SQLite
Per-task skill injection | ❌ static prompt | ❌ static prompt | ❌ static prompt | ✅ token-budgeted
Token-smart / lazy MCP loading | eager | eager | eager | ✅ on-demand
Any provider (incl. local / air-gapped) | Anthropic-centric | Gemini only | | ✅
Secret pre-flight scan before cloud send | ❌ | ❌ | ❌ | ✅
Shell sandbox modes | permission-based | partial | partial | ✅ off/workdir/container
Single binary, no runtime | needs Node | needs Node | ✅ (Go) | ✅ (Rust)
Full prompt caching | | N/A | provider-dependent |

Comparison reflects the review's findings as of June 2026. Claude Code remains the most battle-tested harness overall; zap's advantage is architectural, in capabilities the others do not implement.

Scorecard

How the review rated each dimension, and how each rating changed after the gap-closing work was completed and re-verified.

Dimension | First pass | Current
--- | --- | ---
Architecture & abstractions | 9 | 9
Code understanding (graph) | 9 | 9
Context efficiency / caching | 6 | 9
Testing & verifiability | 4 | 8
Reliability (crash-safety) | 5 | 7
Safety / sandboxing | 5 | 8
Correctness (edit tools) | 7 | 8
Overall | 7.0 | 8.5
Verified, not claimed. At review time, cargo build --release completed cleanly with zero warnings, and cargo test reported 168 passed, 0 failed, including new deterministic tests on the core agent loop that did not exist in the first pass.
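A test on an agent loop becomes deterministic when the model is replaced by a scripted fake that replays fixed steps. The sketch below illustrates that general pattern only; `ModelStep`, `ScriptedModel`, and `run_agent` are hypothetical names, and zap's own tests are not reproduced here.

```rust
use std::collections::VecDeque;

/// What the (hypothetical) model can ask the harness to do on each step.
#[derive(Clone, Debug, PartialEq)]
pub enum ModelStep {
    CallTool { name: String, input: String },
    Finish(String),
}

pub trait Model {
    fn next_step(&mut self, transcript: &[String]) -> ModelStep;
}

/// A scripted fake: replays a fixed list of steps, so every run behaves identically.
pub struct ScriptedModel {
    steps: VecDeque<ModelStep>,
}

impl ScriptedModel {
    pub fn new(steps: Vec<ModelStep>) -> Self {
        ScriptedModel { steps: steps.into() }
    }
}

impl Model for ScriptedModel {
    fn next_step(&mut self, _transcript: &[String]) -> ModelStep {
        self.steps.pop_front().unwrap_or_else(|| ModelStep::Finish("script exhausted".into()))
    }
}

/// Minimal loop: ask the model, run the requested tool, record the result,
/// and stop on `Finish` or after `max_steps` (returning `None` as the answer).
pub fn run_agent(
    model: &mut dyn Model,
    run_tool: impl Fn(&str, &str) -> String,
    max_steps: usize,
) -> (Option<String>, Vec<String>) {
    let mut transcript = Vec::new();
    for _ in 0..max_steps {
        match model.next_step(&transcript) {
            ModelStep::CallTool { name, input } => {
                let output = run_tool(&name, &input);
                transcript.push(format!("{name}({input}) -> {output}"));
            }
            ModelStep::Finish(answer) => return (Some(answer), transcript),
        }
    }
    (None, transcript) // step budget exhausted without an answer
}
```

With the model scripted and the tool runner a pure closure, assertions can check the exact transcript and the step-cap behaviour without any network access.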
The remaining 15% (1.5 points) is not about coding ability. Architecture and code understanding already score 9/10, and zap's actual coding output is bounded by whichever model you point it at; the harness is built to make the most of that model, not waste it. The gap to a perfect score is purely operational maturity: recording an eval baseline and accumulating real-world mileage. Nothing in it reflects a weakness in how zap reads code, edits files, navigates the repo, or reasons about a task.

Security review

Beyond the engineering review, zap is undergoing an independent security assessment.

🛡️

Independent security review by Mythos (in progress)

A dedicated security assessment of zap's data-egress controls, shell sandboxing, supply-chain surface, and credential handling is being conducted by Mythos. The report will be linked here when complete.

Read zap's security model & threat model →

The full reports

Everything above is drawn from these source-verified documents.

Try the agent that earned the marks

Open source, MIT licensed. Zero telemetry. Single binary.