Shipwright Harness

Built on Claude Code

The open-source autonomous delivery agent for Claude Code.

Two ways to run it

One system. A plugin and a cloud agent.

The same autonomous coding system, two ways to put it to work — drive it yourself from Claude Code, or deploy it to your cloud and let it ship on its own.

The plugin

In Claude Code

01

Drive it at your prompt

Install it into Claude Code and run the full loop (spec → plan → build → review → deploy) on any repository. It writes the feature and its tests, opens the PR, and ships it autonomously, keeping you in the loop where it counts.

$ /shipwright:dev-task
→ build · test · review · ship
✓ PR #128 opened
  • End to end — planning through deployment
  • Repo-agnostic — Node, Go, Rust, Python, Ruby
  • Driven by /shipwright:* commands

The cloud agent

In your cloud

02

Talk to it in Slack. It ships on a schedule.

Deploy the same system to your cloud as an agent you reach in Slack — DM it, @mention it in a thread, drop a voice note. It runs on a pool of scheduled jobs, picking up ready work and shipping PRs on its own — coding while you're away, at the same review and test bar.

you

@shipwright take the next ready task

Shipped PR #128 — reviewed, tests green, deployed.

  • Slack-native — DMs, mentions, voice notes, reactions
  • Cron-driven — two jobs on by default, ten in the pool
  • Deploys on Docker and Kubernetes

Either way, the same four-stage engine ships it →

Ships on a schedule

Ten cron jobs, ready out of the box.

Every cloud agent is seeded with ten scheduled jobs. Two run from day one; the rest are a single toggle away. Each guards itself with a pre-check, so it only spends a turn when there's real work to do.

  • dev-task (every 30 min, on by default): Picks the next ready task, builds it with tests, opens a PR.
  • review-patch (every 30 min, on by default): Reviews open PRs and patches the ones failing CI or review.
  • review (every 30 min, opt-in): Review-only pass over open PRs.
  • patch (every 30 min, opt-in): Fixes failing CI and unresolved review findings.
  • deploy (every 30 min, opt-in): Merges approved PRs and deploys them.
  • test-readiness (daily at 6am, opt-in): Audits test coverage and publishes the report.
  • docs-freshness (daily at 7am, opt-in): Refreshes docs that drifted from the code.
  • learn-dream (daily at 3am, opt-in): Mines merged PRs for durable learnings.
  • dependabot-triage (daily at 8am, opt-in): Reviews and triages Dependabot PRs.
  • entropy-patrol (weekly on Mondays, opt-in): Scans for code entropy and fixes what's PR-worthy.
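The pre-check guard described above amounts to a small pattern: on each firing, a disabled job does nothing, an enabled job runs a cheap check first, and only a positive check spends an agent turn. Here is a minimal Python sketch of that pattern. It assumes a hypothetical `Job` record with `pre_check` and `run` callables, plus in-memory stand-ins for the task queue and PR state. Shipwright's actual job definitions and scheduler are not shown on this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    """Hypothetical model of one scheduled job (not Shipwright's real schema)."""
    name: str
    schedule: str                  # human-readable, e.g. "every 30 min"
    enabled: bool                  # True for "On", False for "Opt-in"
    pre_check: Callable[[], bool]  # cheap guard: is there real work to do?
    run: Callable[[], str]         # the expensive agent turn


def tick(job: Job) -> str:
    """Handle one scheduled firing: skip if disabled or if the guard finds no work."""
    if not job.enabled:
        return f"{job.name}: disabled"
    if not job.pre_check():
        return f"{job.name}: no work, skipped"
    return job.run()


# Hypothetical stand-ins for the task queue and open-PR state.
ready_tasks: list[str] = []
failing_prs: list[str] = ["#131"]

jobs = [
    Job("dev-task", "every 30 min", True,
        pre_check=lambda: bool(ready_tasks),
        run=lambda: f"dev-task: building {ready_tasks[0]}"),
    Job("review-patch", "every 30 min", True,
        pre_check=lambda: bool(failing_prs),
        run=lambda: f"review-patch: patching {failing_prs[0]}"),
    Job("deploy", "every 30 min", False,
        pre_check=lambda: True,
        run=lambda: "deploy: merging approved PRs"),
]

for job in jobs:
    print(tick(job))
# dev-task: no work, skipped
# review-patch: patching #131
# deploy: disabled
```

Keeping the guard separate from the run lets the two default-on jobs fire every 30 minutes without spending a turn on an empty queue.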

Measured, not vibes

A dashboard that grades the work.

Every task emits telemetry (first-time-quality, estimation accuracy, review verdicts, cycle time, and token cost) to a dashboard you host yourself, backed by PostHog, Postgres, or a local SQLite store. No data leaves your control.

Illustrative figures. Run task api to see the live dashboard on your own data.
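To make the tracked numbers concrete, here is a minimal sketch of per-task telemetry in a local SQLite store, using Python's built-in `sqlite3` module. The table layout is hypothetical, and so are both metric definitions. First-time-quality is assumed to mean "approved with zero patch rounds." Estimation accuracy is assumed to be the smaller of estimate and actual divided by the larger. The page does not define either metric.

```python
import sqlite3

# Hypothetical schema: one row per task. Column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task_metrics (
        task_id        TEXT PRIMARY KEY,
        estimate_hours REAL,
        actual_hours   REAL,
        review_verdict TEXT,     -- e.g. 'APPROVE' or 'REQUEST_CHANGES'
        patch_rounds   INTEGER,  -- fix cycles needed after the first review
        token_cost_usd REAL
    )
""")
rows = [
    ("T-1", 2.0, 2.5, "APPROVE", 0, 0.42),
    ("T-2", 4.0, 3.0, "APPROVE", 1, 0.90),
    ("T-3", 1.0, 1.0, "REQUEST_CHANGES", 2, 0.30),
]
conn.executemany("INSERT INTO task_metrics VALUES (?, ?, ?, ?, ?, ?)", rows)

# Assumed definition: share of tasks approved with zero patch rounds.
ftq = conn.execute("""
    SELECT AVG(CASE WHEN review_verdict = 'APPROVE' AND patch_rounds = 0
                    THEN 1.0 ELSE 0.0 END)
    FROM task_metrics
""").fetchone()[0]

# Assumed definition: mean of min(est, actual) / max(est, actual),
# so 1.0 is a perfect estimate whether the task ran over or under.
accuracy = conn.execute("""
    SELECT AVG(MIN(estimate_hours, actual_hours) / MAX(estimate_hours, actual_hours))
    FROM task_metrics
""").fetchone()[0]

print(f"first-time-quality: {ftq:.2f}")        # first-time-quality: 0.33
print(f"estimation accuracy: {accuracy:.2f}")  # estimation accuracy: 0.85
```

The symmetric ratio keeps over- and under-estimates from cancelling out, which a plain mean of actual/estimate would allow.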

A run, end to end

Watch it ship — from your terminal or from Slack.

The plugin runs the loop from the command line. The cloud agent does the same from a Slack thread — you ask, it builds, reviews, and ships.

From the command line

Shipwright Harness in a Claude Code terminal: install, then plan, build, review, and ship a task end to end.
$ /plugin install shipwright@app-vitals/shipwright
✓ shipwright installed

$ /shipwright:dev-task
→ picking next ready task … SWW-2.2 (frontend)
  branch feat/sww-2-2-body-sections
  writing tests → 7 specs (e2e)
  implementing … 5 sections
  ✓ playwright   15 passed
  ✓ PR #102 opened

$ /shipwright:review
  deep single-pass review … 0 blocking findings
  verdict: APPROVE

$ /shipwright:deploy
  merge --squash … ✓ merged
  deploy … ✓ live   metrics → forwarded


Inside /shipwright:dev-task

From a ready task to a green PR.

One command runs the whole sequence — pick a task, write the tests, build, simplify, verify, and open a reviewed, green pull request.

Enforced order: tests before code · simplify after green · spec verified before the PR · CI must pass.

  1. Detect the toolchain
     Scan the repo's build config and extract the test, lint, typecheck, and validate commands to use later.

  2. Pick the next task
     Resume an in-progress task, or pull the next ready item from the queue and validate its fields.

  3. Mark in-progress
     Clean up any orphaned branches or PRs from a prior run, then flip the task's status label.

  4. Build the brief
     Assemble a spec prompt from the task's title, description, acceptance criteria, and layer.

  5. Set up a worktree
     Create or reuse an isolated git worktree on the task's branch, never on main.

  6. Tests first, then code
     Write failing tests, make them pass, then refactor. No production code is written before a failing test exists.

  7. Simplify
     Review the diff for duplication, dead code, naming, and needless complexity, and fix what it finds.

  8. Verify the spec
     An independent subagent checks every acceptance criterion against the diff and auto-fixes gaps.

  9. Grade requirements
     Score each criterion as met, partial, or not-met, and block the task if any criterion is still unmet after fixes.

  10. Pre-ship checks
      Run validate, lint, test, and typecheck, and report the coverage delta against the threshold.

  11. Refresh the docs
      A docs agent updates any docs the change made stale, in a separate commit on the branch.

  12. Push & open the PR
      Push the branch and open a pull request, or add commits to the existing one.

  13. Watch CI, fix failures
      Poll CI, collect failure logs, and retry fixes up to six times before blocking the task.

  14. Record metrics & hand off
      Move the task to pr_open, append the metrics line, and print a handoff summary.
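The grading step and the CI-watch step are the two hard gates in this sequence. Here is a minimal Python sketch of both, under stated assumptions. `Grade`, `requirements_gate`, and `watch_ci` are hypothetical names, and the CI run and the fix action are passed in as plain callables rather than real CI or GitHub calls. Only a not-met grade blocks, so a partial grade passes, matching the rule that unmet criteria block the task. "Up to six times" is read as six fix attempts after the initial red CI run.

```python
from enum import Enum
from typing import Callable


class Grade(Enum):
    MET = "met"
    PARTIAL = "partial"
    NOT_MET = "not-met"


MAX_FIX_ATTEMPTS = 6  # assumed reading of "retry fixes up to six times"


def requirements_gate(grades: dict[str, Grade]) -> bool:
    """Grading gate (assumed rule): pass unless a criterion is still NOT_MET."""
    return all(grade is not Grade.NOT_MET for grade in grades.values())


def watch_ci(run_ci: Callable[[], bool], apply_fix: Callable[[int], None]) -> str:
    """Run CI; while it is red, apply a fix and re-run, at most MAX_FIX_ATTEMPTS times."""
    if run_ci():
        return "green"
    for attempt in range(1, MAX_FIX_ATTEMPTS + 1):
        apply_fix(attempt)  # stand-in for: read failure logs, push a fix commit
        if run_ci():
            return "green"
    return "blocked"


# Usage with hypothetical criteria and fake CI results.
grades = {"renders five sections": Grade.MET, "e2e specs pass": Grade.PARTIAL}
print(requirements_gate(grades))                                   # True
print(requirements_gate({**grades, "a11y audit": Grade.NOT_MET}))  # False

ci_results = iter([False, False, True])  # red, red, then green
print(watch_ci(lambda: next(ci_results), lambda n: None))  # green

fixes: list[int] = []
print(watch_ci(lambda: False, fixes.append), len(fixes))   # blocked 6
```

Bounding the fix loop turns a flaky or genuinely broken build into an explicit "blocked" state instead of an agent that retries forever.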

How it works

The same four stages, however you run it.

Plugin or cloud agent, the work runs through the same four stages. The deployable agent picks up each ready task and drives it end to end — with you in the loop only where it counts.

01 Stage

Plan

Read the spec, explore the codebase, and emit a sequenced task queue with estimates.

02 Stage

Build

Pick the next ready task, branch, write the feature and its tests at the correct layer.

03 Stage

Review

Deep single-pass review with inline findings and a recorded verdict before anything merges.

04 Stage

Ship

Merge the green PR, deploy, and forward the metrics that close the feedback loop.
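The hand-off between Plan and Build is a queue. Plan emits tasks in sequence with estimates, and Build resumes any in-progress task before pulling the earliest ready one, as the "Pick the next task" step of /shipwright:dev-task describes. Here is a minimal Python sketch of that selection rule. The `Task` record, its status names, and the task IDs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical queue entry as emitted by the Plan stage."""
    id: str
    sequence: int          # position in the planned order
    estimate_hours: float  # the Plan stage's estimate
    status: str = "ready"  # assumed statuses: ready, in_progress, pr_open, blocked


def next_task(queue: list[Task]) -> Optional[Task]:
    """Resume an in-progress task first; otherwise take the earliest ready one."""
    for task in queue:
        if task.status == "in_progress":
            return task
    ready = [t for t in queue if t.status == "ready"]
    return min(ready, key=lambda t: t.sequence, default=None)


queue = [
    Task("T-3", sequence=3, estimate_hours=1.5),
    Task("T-1", sequence=1, estimate_hours=2.0, status="pr_open"),
    Task("T-2", sequence=2, estimate_hours=3.0),
]
print(next_task(queue).id)  # T-2 (T-1 already has a PR open)
queue[0].status = "in_progress"
print(next_task(queue).id)  # T-3 (an in-progress task is resumed first)
```

Resuming before picking means an interrupted run, for example a cron firing that timed out, finishes its task instead of starting a second one in parallel.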

Why Shipwright Harness

Built on Claude Code. Owned by you.

It runs on Claude Code — the platform it's built for — and stays entirely in your hands: open-source, self-hosted, and measured.

Own it

Free & open-source (MIT)

No tiers, no seats, no data leaving your control. Clone it, run it in your own cloud, fork it if you want to.

Test-readiness

Tests land with the code

Every task ships its tests at the correct layer — unit, integration, smoke, or e2e — in the same PR. No 'tests later'.

Metric-first

Measured, not vibes

First-time-quality, estimation accuracy, and review verdicts are tracked per task so the pipeline gets honestly better.

Repo-agnostic

Runs on your stack

Node, Rust, Go, Python, Ruby, Make — the plugin drives any repository it's pointed at, not a blessed template.
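One way a repo-agnostic runner can find the right commands is to look for each ecosystem's marker file, in the spirit of the "Detect the toolchain" step of /shipwright:dev-task. The sketch below is an assumption, not Shipwright's detector. The marker order and the default commands (for example `ruff check .` for Python and `bundle exec rspec` for Ruby) are illustrative choices, and it covers only test and lint, not typecheck or validate.

```python
import tempfile
from pathlib import Path

# Hypothetical marker-file -> command mapping, checked in order. The commands
# are common defaults for each ecosystem, not what Shipwright actually extracts.
TOOLCHAINS: list[tuple[str, dict[str, str]]] = [
    ("package.json",   {"test": "npm test",          "lint": "npm run lint"}),
    ("Cargo.toml",     {"test": "cargo test",        "lint": "cargo clippy"}),
    ("go.mod",         {"test": "go test ./...",     "lint": "go vet ./..."}),
    ("pyproject.toml", {"test": "pytest",            "lint": "ruff check ."}),
    ("Gemfile",        {"test": "bundle exec rspec", "lint": "bundle exec rubocop"}),
    ("Makefile",       {"test": "make test",         "lint": "make lint"}),
]


def detect_toolchain(repo: Path) -> dict[str, str]:
    """Return the commands for the first marker file found in the repo root."""
    for marker, commands in TOOLCHAINS:
        if (repo / marker).is_file():
            return {"marker": marker, **commands}
    return {}  # no known marker found


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "go.mod").write_text("module example.com/demo\n")
    print(detect_toolchain(repo))
    # {'marker': 'go.mod', 'test': 'go test ./...', 'lint': 'go vet ./...'}
```

Checking markers in a fixed order keeps the result deterministic for polyglot repos. With this ordering, a repo containing both package.json and a Makefile resolves to npm.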

Built on Claude Code, free and open-source under the MIT license — you own it and run it in your own cloud.

Get started

Star it, install it, ship with it.

On GitHub

Read the source, file an issue, or star the repo to follow along as it grows.

Install

Drop it into Claude Code and point it at any repository.

/plugin install shipwright@app-vitals/shipwright

Want a walkthrough on your own codebase? Book a discovery call.

Work with us

Work with the people who built it.

Shipwright Harness is yours to run — free, open-source, and self-hosted, always. If you'd rather have a hand standing it up on your own codebase, the people who build it can help you get there faster. No commitment — just a conversation about your pipeline.