Built on Claude Code

The open-source autonomous delivery agent for Claude Code.

Two ways to run it

One system. A plugin and a cloud agent.

The same autonomous coding system, two ways to put it to work — drive it yourself from Claude Code, or deploy it to your cloud and let it ship on its own.

The plugin

In Claude Code

Drive it at your prompt

Install it into Claude Code and run the full loop — spec → plan → build → review → deploy — on any repository. It writes the feature and its tests, opens the PR, and ships, autonomously, with you in the loop where it counts.

$ /shipwright:dev-task
→ build · test · review · ship
✓ PR #128 opened

End to end — planning through deployment
Repo-agnostic — Node, Go, Rust, Python, Ruby
Driven by /shipwright:* commands

The cloud agent

In your cloud

Talk to it in Slack. It ships on a schedule.

Deploy the same system to your cloud as an agent you reach in Slack — DM it, @mention it in a thread, drop a voice note. It runs on a pool of scheduled jobs, picking up ready work and shipping PRs on its own — coding while you're away, at the same review and test bar.

you

@shipwright take the next ready task

✓ Shipped PR #128 — reviewed, tests green, deployed.

Slack-native — DMs, mentions, voice notes, reactions
Cron-driven — two jobs on by default, ten in the pool
Deploys on Docker and Kubernetes

Either way, the same four-stage engine ships it →

Ships on a schedule

Ten cron jobs, ready out of the box.

Every cloud agent is seeded with ten scheduled jobs. Two run from day one; the rest are a single toggle away. Each guards itself with a pre-check, so it only spends a turn when there's real work to do.
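As a rough illustration of that pre-check guard, here is a minimal Python sketch. It is not Shipwright's implementation: `GuardedJob`, `precheck`, `run`, and `tick` are hypothetical names, and the in-memory `ready_tasks` list stands in for a real task queue.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardedJob:
    """A scheduled job that only runs when its pre-check finds work (hypothetical shape)."""

    name: str
    precheck: Callable[[], bool]  # cheap check: is there anything to do?
    run: Callable[[], str]        # the expensive agent turn

    def tick(self) -> str:
        # Skip the expensive turn entirely when the pre-check finds nothing.
        if not self.precheck():
            return "skipped"
        return self.run()


ready_tasks = ["SWW-2.4"]  # stand-in for a real task queue

job = GuardedJob(
    name="dev-task",
    precheck=lambda: len(ready_tasks) > 0,
    run=lambda: f"built {ready_tasks.pop(0)}",
)

first = job.tick()   # the queue holds one task, so the job runs
second = job.tick()  # the queue is now empty, so the job is skipped
```

The point of the split is cost: the pre-check is cheap and runs on every tick, while the agent turn runs only when the pre-check reports work.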

dev-task · every 30 min · on by default
Picks the next ready task, builds it with tests, opens a PR.

review-patch · every 30 min · on by default
Reviews open PRs and patches the ones failing CI or review.

review · every 30 min · opt-in
Review-only pass over open PRs.

patch · every 30 min · opt-in
Fixes failing CI and unresolved review findings.

deploy · every 30 min · opt-in
Merges approved PRs and deploys them.

test-readiness · daily at 6am · opt-in
Audits test coverage and publishes the report.

docs-freshness · daily at 7am · opt-in
Refreshes docs that drifted from the code.

learn-dream · daily at 3am · opt-in
Mines merged PRs for durable learnings.

dependabot-triage · daily at 8am · opt-in
Reviews and triages Dependabot PRs.

entropy-patrol · weekly on Mon · opt-in
Scans for code entropy and fixes what's PR-worthy.
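For readers who think in crontab syntax, the schedule above can be written as standard five-field cron expressions. This is a sketch under assumptions, not the agent's actual configuration format: `Job` and `JOBS` are hypothetical names, `*/30` assumes the 30-minute jobs fire on the hour and half hour, and the run time of the weekly job is not stated, so midnight is a guess.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    cron: str      # minute hour day-of-month month day-of-week
    enabled: bool  # True = runs from day one, False = opt-in


JOBS = [
    Job("dev-task",          "*/30 * * * *", True),
    Job("review-patch",      "*/30 * * * *", True),
    Job("review",            "*/30 * * * *", False),
    Job("patch",             "*/30 * * * *", False),
    Job("deploy",            "*/30 * * * *", False),
    Job("test-readiness",    "0 6 * * *",    False),
    Job("docs-freshness",    "0 7 * * *",    False),
    Job("learn-dream",       "0 3 * * *",    False),
    Job("dependabot-triage", "0 8 * * *",    False),
    # Only "weekly, Mon" is given; the midnight run time is an assumption.
    Job("entropy-patrol",    "0 0 * * 1",    False),
]

on_by_default = [job.name for job in JOBS if job.enabled]
```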

Measured, not vibes

A dashboard that grades the work.

Every task emits telemetry (first-time quality, estimation accuracy, review verdicts, cycle time, and token cost) to a dashboard you host yourself. Store it in PostHog, Postgres, or a local SQLite database; no data leaves your control.
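To make the local SQLite option concrete, here is a minimal sketch of per-task telemetry rolled up into two dashboard figures. The table name, columns, and `summarize` helper are hypothetical, and the four rows are made-up sample data, not real Shipwright output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would give a persistent local store
conn.execute(
    """
    CREATE TABLE task_metrics (
        task_id    TEXT PRIMARY KEY,
        reworked   INTEGER NOT NULL,
        cycle_days REAL NOT NULL,
        tokens     INTEGER NOT NULL
    )
    """
)

# Made-up sample rows: (task_id, reworked, cycle_days, tokens)
rows = [
    ("T-1", 0, 1.0, 120_000),
    ("T-2", 0, 2.0, 95_000),
    ("T-3", 1, 1.5, 210_000),
    ("T-4", 0, 1.5, 80_000),
]
conn.executemany("INSERT INTO task_metrics VALUES (?, ?, ?, ?)", rows)


def summarize(db: sqlite3.Connection) -> dict:
    """First-time quality = share of tasks merged without rework."""
    total, clean, avg_cycle = db.execute(
        "SELECT COUNT(*), SUM(reworked = 0), AVG(cycle_days) FROM task_metrics"
    ).fetchone()
    return {"first_time_quality": clean / total, "avg_cycle_days": avg_cycle}


summary = summarize(conn)
```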

shipwright metrics · last 30 days

87% · First-time quality · merged without rework
1.4d · Avg cycle time · task → shipped
92% · Estimation accuracy · planned vs. actual

[Charts: tasks shipped over the last 30 days; weekly throughput, trending up]

Illustrative figures. Run task api to see the live dashboard on your own data.

A run, end to end

Watch it ship — from your terminal or from Slack.

The plugin runs the loop from the command line. The cloud agent does the same from a Slack thread — you ask, it builds, reviews, and ships.

From the command line

Shipwright Harness in a Claude Code terminal: install, then plan, build, review, and ship a task end to end.

$ /plugin install shipwright@app-vitals/shipwright
✓ shipwright installed

$ /shipwright:dev-task
→ picking next ready task … SWW-2.2 (frontend)
  branch feat/sww-2-2-body-sections
  writing tests → 7 specs (e2e)
  implementing … 5 sections
  ✓ playwright   15 passed
  ✓ PR #102 opened

$ /shipwright:review
  deep single-pass review … 0 blocking findings
  verdict: APPROVE

$ /shipwright:deploy
  merge --squash … ✓ merged
  deploy … ✓ live   metrics → forwarded

From Slack

shipwright APP #engineering

you 9:14

@shipwright take the next ready task

Shipwright 9:14

On it — picking up SWW-2.4. Branching, writing tests…

Shipwright 9:16

Tests green · review approved · 0 blocking findings

Shipwright 9:17

✓ Shipped PR #128 — merged & deployed. Metrics forwarded.

👍 1

Inside /shipwright:dev-task

From a ready task to a green PR.

One command runs the whole sequence — pick a task, write the tests, build, simplify, verify, and open a reviewed, green pull request.

Enforced order: tests before code · simplify after green · spec verified before the PR · CI must pass.
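One way to picture that enforced order is as a set of precedence rules checked against a run's event log. The sketch below is illustrative only: the event names, `REQUIRED_ORDER`, and `check_order` are hypothetical, not part of the plugin.

```python
# Each pair (before, after) means `before` must already have happened
# by the time `after` occurs.
REQUIRED_ORDER = [
    ("tests_written", "code_written"),  # tests before code
    ("tests_green", "simplified"),      # simplify after green
    ("spec_verified", "pr_opened"),     # spec verified before the PR
    ("ci_green", "handed_off"),         # CI must pass
]


def check_order(events: list[str]) -> list[str]:
    """Return one violation message per rule broken by the event sequence."""
    seen: set[str] = set()
    violations: list[str] = []
    for event in events:
        for before, after in REQUIRED_ORDER:
            if event == after and before not in seen:
                violations.append(f"{after} before {before}")
        seen.add(event)
    return violations


good_run = [
    "tests_written", "code_written", "tests_green", "simplified",
    "spec_verified", "pr_opened", "ci_green", "handed_off",
]
bad_run = ["code_written", "tests_written"]

good_result = check_order(good_run)
bad_result = check_order(bad_run)
```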

01

Detect the toolchain

Scan the repo's build config and extract the test, lint, typecheck, and validate commands to use later.

02

Pick the next task

Resume an in-progress task, or pull the next ready item from the queue and validate its fields.

03

Mark in-progress

Clean up any orphaned branches or PRs from a prior run, then flip the task's status label.

04

Build the brief

Assemble a spec prompt from the task's title, description, acceptance criteria, and layer.

05

Set up a worktree

Create or reuse an isolated git worktree on the task's branch, never on main.

06

Tests first, then code

Write failing tests, make them pass, then refactor. No production code before a failing test exists.

07

Simplify

Review the diff for duplication, dead code, naming, and needless complexity, and fix it.

08

Verify the spec

An independent subagent checks every acceptance criterion against the diff and auto-fixes gaps.

09

Grade requirements

Score each criterion met / partial / not-met, and block the task if anything is unmet after fixes.

10

Pre-ship checks

Run validate, lint, test, and typecheck, and report the coverage delta against the threshold.

11

Refresh the docs

A docs agent updates any docs the change made stale, in a separate commit on the branch.

12

Push & open the PR

Push the branch and open a pull request, or add commits to the existing one.

13

Watch CI, fix failures

Poll CI, collect failure logs, and retry fixes up to six times before blocking the task.

14

Record metrics & hand off

Move the task to pr_open, append the metrics line, and print a handoff summary.
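Steps 13 and 14 describe a bounded fix-and-retry loop. A minimal sketch of that control flow follows, assuming hypothetical `ci_passes` and `attempt_fix` callables. A real loop would poll a CI provider and wait between checks, which is left out here.

```python
from typing import Callable

MAX_FIX_ATTEMPTS = 6  # "up to six times" from step 13


def watch_ci(ci_passes: Callable[[], bool], attempt_fix: Callable[[], None]) -> str:
    """Return "pr_open" once CI is green, or "blocked" once the fix budget is spent."""
    attempts = 0
    while not ci_passes():
        if attempts == MAX_FIX_ATTEMPTS:
            return "blocked"
        attempt_fix()
        attempts += 1
    return "pr_open"


# Simulated CI: red twice, then green.
results = iter([False, False, True])
fixes: list[str] = []
status = watch_ci(lambda: next(results), lambda: fixes.append("fix"))

# Simulated CI that never goes green: six fixes are tried, then the task blocks.
wasted: list[str] = []
stuck = watch_ci(lambda: False, lambda: wasted.append("fix"))
```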

How it works

The same four stages, however you run it.

Plugin or cloud agent, the work runs through the same four stages. The deployable agent picks up each ready task and drives it end to end — with you in the loop only where it counts.

01 Stage

Plan

Read the spec, explore the codebase, and emit a sequenced task queue with estimates.

02 Stage

Build

Pick the next ready task, branch, and write the feature and its tests at the correct layer.

03 Stage

Review

Deep single-pass review with inline findings and a recorded verdict before anything merges.

04 Stage

Ship

Merge the green PR, deploy, and forward the metrics that close the feedback loop.
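The hand-off between Review and Ship amounts to a gate: nothing merges without green CI and a recorded verdict. The sketch below shows that gate under assumptions. `PullRequest` and `can_ship` are hypothetical names, and `APPROVE` is the only verdict value taken from the demo output on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    number: int
    ci_green: bool
    verdict: Optional[str] = None  # None until a review verdict is recorded


def can_ship(pr: PullRequest) -> bool:
    """Ship only a green PR whose recorded verdict is APPROVE."""
    return pr.ci_green and pr.verdict == "APPROVE"


approved = can_ship(PullRequest(102, ci_green=True, verdict="APPROVE"))
unreviewed = can_ship(PullRequest(102, ci_green=True))
red_ci = can_ship(PullRequest(102, ci_green=False, verdict="APPROVE"))
```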

Why Shipwright Harness

Built on Claude Code. Owned by you.

It runs on Claude Code — the platform it's built for — and stays entirely in your hands: open-source, self-hosted, and measured.

Own it

Free & open-source (MIT)

No tiers, no seats, no data leaving your control. Clone it, run it in your own cloud, fork it if you want to.

Test-readiness

Tests land with the code

Every task ships its tests at the correct layer — unit, integration, smoke, or e2e — in the same PR. No 'tests later'.

Metric-first

Measured, not vibes

First-time quality, estimation accuracy, and review verdicts are tracked per task, so you can see whether the pipeline is actually improving.

Repo-agnostic

Runs on your stack

Node, Rust, Go, Python, Ruby, Make — the plugin drives any repository it's pointed at, not a blessed template.

Built on Claude Code, free and open-source under the MIT license — you own it and run it in your own cloud.

Get started

Star it, install it, ship with it.

On GitHub

Read the source, file an issue, or star the repo to follow along as it grows.

Star on GitHub

Install

Drop it into Claude Code and point it at any repository.

/plugin install shipwright@app-vitals/shipwright

Want a walkthrough on your own codebase? Book a discovery call.

Work with us

Work with the people who built it.

Shipwright Harness is yours to run — free, open-source, and self-hosted, always. If you'd rather have a hand standing it up on your own codebase, the people who build it can help you get there faster. No commitment — just a conversation about your pipeline.

Book a discovery call
