Built on Claude Code
The open-source autonomous delivery agent for Claude Code.
Two ways to run it
One system. A plugin and a cloud agent.
The same autonomous coding system, two ways to put it to work — drive it yourself from Claude Code, or deploy it to your cloud and let it ship on its own.
The plugin
In Claude Code
Drive it at your prompt
Install it into Claude Code and run the full loop — spec → plan → build → review → deploy — on any repository. It writes the feature and its tests, opens the PR, and ships, autonomously, with you in the loop where it counts.
$ /shipwright:dev-task
→ build · test · review · ship
✓ PR #128 opened
- End to end — planning through deployment
- Repo-agnostic — Node, Go, Rust, Python, Ruby
- Driven by /shipwright:* commands
The cloud agent
In your cloud
Talk to it in Slack. It ships on a schedule.
Deploy the same system to your cloud as an agent you reach in Slack — DM it, @mention it in a thread, drop a voice note. It runs on a pool of scheduled jobs, picking up ready work and shipping PRs on its own — coding while you're away, at the same review and test bar.
@shipwright take the next ready task
✓ Shipped PR #128 — reviewed, tests green, deployed.
- Slack-native — DMs, mentions, voice notes, reactions
- Cron-driven — two jobs on by default, ten in the pool
- Deploys on Docker and Kubernetes
Ships on a schedule
Ten cron jobs, ready out of the box.
Every cloud agent is seeded with ten scheduled jobs. Two run from day one; the rest are a single toggle away. Each guards itself with a pre-check, so it only spends a turn when there's real work to do.
dev-task Picks the next ready task, builds it with tests, opens a PR.
every 30 min
review-patch Reviews open PRs and patches the ones failing CI or review.
every 30 min
review Review-only pass over open PRs.
every 30 min
patch Fixes failing CI and unresolved review findings.
every 30 min
deploy Merges approved PRs and deploys them.
every 30 min
test-readiness Audits test coverage and publishes the report.
daily · 6am
docs-freshness Refreshes docs that drifted from the code.
daily · 7am
learn-dream Mines merged PRs for durable learnings.
daily · 3am
dependabot-triage Reviews and triages Dependabot PRs.
daily · 8am
entropy-patrol Scans for code entropy and fixes what's PR-worthy.
weekly · Mon
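The pre-check guard described above can be sketched roughly as follows. This is a minimal illustration, not Shipwright's actual scheduler: the `Job` class, the `pre_check` callable, and `run_guarded` are hypothetical names, and the enabled flags in the usage example are made up. Deciding when a job is due is left to cron itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    name: str
    schedule: str                    # human-readable cadence, e.g. "every 30 min"
    enabled: bool                    # the toggle; which jobs start enabled is not assumed here
    pre_check: Callable[[], bool]    # hypothetical: returns True only when there is work


def run_guarded(jobs: List[Job], run: Callable[[str], None]) -> List[str]:
    """Run each enabled job whose pre-check reports work; return the names that ran."""
    ran = []
    for job in jobs:
        if not job.enabled:
            continue                 # toggled off
        if not job.pre_check():
            continue                 # nothing to do, so no agent turn is spent
        run(job.name)
        ran.append(job.name)
    return ran


if __name__ == "__main__":
    # Illustrative values only.
    jobs = [
        Job("dev-task", "every 30 min", True, lambda: True),    # a ready task exists
        Job("review", "every 30 min", True, lambda: False),     # no open PRs
        Job("deploy", "every 30 min", False, lambda: True),     # toggled off
    ]
    print(run_guarded(jobs, lambda name: None))
```

With these made-up inputs only dev-task runs: review is skipped by its pre-check and deploy by its toggle.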
Measured, not vibes
A dashboard that grades the work.
Every task emits telemetry — first-time-quality, estimation accuracy, review verdicts, cycle time, and token cost — to a dashboard you host yourself. PostHog, Postgres, or a local SQLite store; no data leaves your control.
87%
First-time quality
merged without rework
1.4d
Avg cycle time
task → shipped
92%
Estimation accuracy
planned vs. actual
48
Tasks shipped
last 30 days
Weekly throughput
▲ trending up
Illustrative figures. Run task api to see the live dashboard on your own data.
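As a rough sketch of how the headline numbers could be derived from per-task records: the `TaskRecord` fields below are hypothetical, and the estimation-accuracy formula (one minus the relative error between planned and actual effort, floored at zero) is an assumption, since the page only describes it as "planned vs. actual".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TaskRecord:
    started: datetime
    shipped: datetime
    rework_rounds: int      # 0 means merged without rework
    planned_hours: float    # assumed to be greater than zero
    actual_hours: float


def first_time_quality(tasks: List[TaskRecord]) -> float:
    """Share of tasks merged without rework (expects a non-empty list)."""
    return sum(t.rework_rounds == 0 for t in tasks) / len(tasks)


def avg_cycle_days(tasks: List[TaskRecord]) -> float:
    """Mean time from task start to shipped, in days (expects a non-empty list)."""
    total_seconds = sum((t.shipped - t.started).total_seconds() for t in tasks)
    return total_seconds / len(tasks) / 86400


def estimation_accuracy(tasks: List[TaskRecord]) -> float:
    """Assumed formula: mean of 1 - |actual - planned| / planned, floored at 0."""
    scores = [
        max(0.0, 1 - abs(t.actual_hours - t.planned_hours) / t.planned_hours)
        for t in tasks
    ]
    return sum(scores) / len(scores)
```

Token cost and review verdicts would be further fields on the same record; they are left out to keep the sketch short.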
A run, end to end
Watch it ship — from your terminal or from Slack.
The plugin runs the loop from the command line. The cloud agent does the same from a Slack thread — you ask, it builds, reviews, and ships.
From the command line
$ /plugin install shipwright@app-vitals/shipwright
✓ shipwright installed
$ /shipwright:dev-task
→ picking next ready task … SWW-2.2 (frontend)
branch feat/sww-2-2-body-sections
writing tests → 7 specs (e2e)
implementing … 5 sections
✓ playwright 15 passed
✓ PR #102 opened
$ /shipwright:review
deep single-pass review … 0 blocking findings
verdict: APPROVE
$ /shipwright:deploy
merge --squash … ✓ merged
deploy … ✓ live
metrics → forwarded
From Slack
you 9:14
@shipwright take the next ready task
Shipwright 9:14
On it — picking up SWW-2.4. Branching, writing tests…
Shipwright 9:16
Tests green · review approved · 0 blocking findings
Shipwright 9:17
✓ Shipped PR #128 — merged & deployed. Metrics forwarded.
👍 1
Inside /shipwright:dev-task
From a ready task to a green PR.
One command runs the whole sequence — pick a task, write the tests, build, simplify, verify, and open a reviewed, green pull request.
Enforced order: tests before code · simplify after green · spec verified before the PR · CI must pass.
- 01
Detect the toolchain
Scan the repo's build config and extract the test, lint, typecheck, and validate commands to use later.
- 02
Pick the next task
Resume an in-progress task, or pull the next ready item from the queue and validate its fields.
- 03
Mark in-progress
Clean up any orphaned branches or PRs from a prior run, then flip the task's status label.
- 04
Build the brief
Assemble a spec prompt from the task's title, description, acceptance criteria, and layer.
- 05
Set up a worktree
Create or reuse an isolated git worktree on the task's branch — never on main.
- 06
Tests first, then code
Write failing tests, make them pass, then refactor. No production code before a failing test exists.
- 07
Simplify
Review the diff for duplication, dead code, naming, and needless complexity — and fix it.
- 08
Verify the spec
An independent subagent checks every acceptance criterion against the diff and auto-fixes gaps.
- 09
Grade requirements
Score each criterion met / partial / not-met, and block the task if anything is unmet after fixes.
- 10
Pre-ship checks
Run validate, lint, test, and typecheck, and report the coverage delta against the threshold.
- 11
Refresh the docs
A docs agent updates any docs the change made stale, in a separate commit on the branch.
- 12
Push & open the PR
Push the branch and open a pull request — or add commits to the existing one.
- 13
Watch CI, fix failures
Poll CI, collect failure logs, and retry fixes up to six times before blocking the task.
- 14
Record metrics & hand off
Move the task to pr_open, append the metrics line, and print a handoff summary.
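Step 13 amounts to a bounded retry loop. The sketch below is one way to read it, not the plugin's actual code: `check_ci` and `apply_fix` are hypothetical callables standing in for the CI poll and the fixing agent, and a real poller would also wait between checks.

```python
from typing import Callable, Tuple

MAX_FIX_ATTEMPTS = 6  # step 13: "up to six times"


def watch_ci(
    check_ci: Callable[[], Tuple[bool, str]],   # hypothetical: returns (passed, failure_logs)
    apply_fix: Callable[[str], None],           # hypothetical: attempts a fix from the logs
    max_attempts: int = MAX_FIX_ATTEMPTS,
) -> str:
    """Return "green" once CI passes, or "blocked" after max_attempts failed fixes."""
    attempts = 0
    while True:
        passed, logs = check_ci()
        if passed:
            return "green"
        if attempts == max_attempts:
            return "blocked"      # fix budget exhausted, so the task is blocked
        apply_fix(logs)           # hand the collected failure logs to the fixer
        attempts += 1
```

Under this reading CI is checked at most seven times: once for the initial run and once after each of the six fix attempts.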
How it works
The same four stages, however you run it.
Plugin or cloud agent, the work runs through the same four stages. The deployable agent picks up each ready task and drives it end to end — with you in the loop only where it counts.
Plan
Read the spec, explore the codebase, and emit a sequenced task queue with estimates.
Build
Pick the next ready task, branch, and write the feature and its tests at the correct layer.
Review
Deep single-pass review with inline findings and a recorded verdict before anything merges.
Ship
Merge the green PR, deploy, and forward the metrics that close the feedback loop.
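The four stages behave like an ordered pipeline in which a failed stage stops everything after it, so nothing ships without passing review. A minimal sketch of that idea, assuming each stage can be modeled as a callable that reports success (the `run_stages` helper and its handlers are hypothetical):

```python
from typing import Callable, Dict

STAGES = ("plan", "build", "review", "ship")  # stage order from the text above


def run_stages(handlers: Dict[str, Callable[[], bool]]) -> str:
    """Run the four stages in order.

    Returns "shipped" if every stage succeeds, otherwise the name of the
    stage that stopped the run (for example "review" when the verdict is
    not an approval).
    """
    for stage in STAGES:
        if not handlers[stage]():
            return stage          # later stages never run
    return "shipped"
```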
Why Shipwright Harness
Built on Claude Code. Owned by you.
It runs on Claude Code — the platform it's built for — and stays entirely in your hands: open-source, self-hosted, and measured.
Free & open-source (MIT)
No tiers, no seats, no data leaving your control. Clone it, run it in your own cloud, fork it if you want to.
Tests land with the code
Every task ships its tests at the correct layer — unit, integration, smoke, or e2e — in the same PR. No 'tests later'.
Measured, not vibes
First-time-quality, estimation accuracy, and review verdicts are tracked per task so the pipeline gets honestly better.
Runs on your stack
Node, Rust, Go, Python, Ruby, Make — the plugin drives any repository it's pointed at, not a blessed template.
Built on Claude Code, free and open-source under the MIT license — you own it and run it in your own cloud.
Work with us
Work with the people who built it.
Shipwright Harness is yours to run — free, open-source, and self-hosted, always. If you'd rather have a hand standing it up on your own codebase, the people who build it can help you get there faster. No commitment — just a conversation about your pipeline.
Get started
Star it, install it, ship with it.
Read the source, file an issue, or star the repo to follow along as it grows.
Star on GitHub
Drop it into Claude Code and point it at any repository.
Want a walkthrough on your own codebase? Book a discovery call.