Capsules
Read when:
- turning a failed GitHub Actions run into a repeatable Crabbox debug case;
- deciding whether data belongs in a capsule, checkpoint, image, or cache;
- reviewing the
repo-build-replaycapsule contract.
Capsules are local-first failure replay manifests. A capsule records what failed, how to run the replay, what outcome counts as a reproduction, and the bounded evidence needed to inspect the original failure.
Capsules do not preserve a machine. Environment state belongs to the existing Crabbox primitives:
crabbox image: trusted base runner image for future leases.crabbox checkpoint: explicit prepared machine or workspace state that can becrabbox cache: package/build cache state on a lease.crabbox actions hydrate: repository-owned CI setup on a live lease.crabbox capsule: failure recipe, source evidence, replay oracle, replay
forked later.
history.
The first delivery is intentionally narrow: GitHub Actions failures become small repo-build-replay bundles that Crabbox reruns through crabbox run. There is no coordinator registry, remote storage, automatic workflow parser, emulator class, or training loop in this version.
#Basic Flow
Capture a failed run:
crabbox capsule from-actions https://github.com/example-org/my-app/actions/runs/123 \
--replay 'go test ./...'
Replay it on a normal lease:
crabbox capsule replay capsules/example-org-my-app-actions-123/capsule.yaml --keep
crabbox ssh --id <printed-lease-or-slug>
Mark the local manifest as a regression replay after the failure is useful:
crabbox capsule promote capsules/example-org-my-app-actions-123/capsule.yaml --regression
from-actions records:
- repository, run URL, run id, attempt, workflow name/path, commit SHA, branch,
- failed job and failed step when GitHub exposes them;
- the explicit replay command supplied with
--replay; - bounded failed-step logs when
gh run view --log-failedcan fetch them; - GitHub artifact download references.
event, status, and conclusion;
The explicit --replay command is the v1 contract. Crabbox does not try to reconstruct arbitrary workflow YAML or infer shell snippets from logs.
#With Actions Hydration
Use hydration when the failing CI job depends on repository-owned setup such as service containers, dependency installation, or toolchain bootstrap:
crabbox warmup
crabbox actions hydrate --id blue-lobster
crabbox capsule replay capsules/example-org-my-app-actions-123/capsule.yaml \
--id blue-lobster \
--keep
The hydrate workflow owns CI setup. The capsule owns the replay command and oracle. crabbox run syncs local edits into the hydrated workspace before running the replay.
#With Checkpoints
Use a checkpoint when setup is expensive and should be reused across many replays:
crabbox warmup --provider aws --class beast
crabbox actions hydrate --id blue-lobster
crabbox checkpoint create --id blue-lobster --name ci-go-ready
crabbox checkpoint fork chk_123 --class beast
crabbox capsule replay capsules/example-org-my-app-actions-123/capsule.yaml \
--id purple-whale
The checkpoint preserves the prepared environment. The capsule remains portable: it can be replayed against the original lease, a forked checkpoint, or a fresh lease if the command has enough setup built in.
#Manifest Contract
The manifest keeps the durable parts small and versioned:
capsule_version: 1
class: repo-build-replay
class_version: 0.1.0
source:
kind: github_actions
replay:
command: go test ./...
command_mode: shell
required_quality: semantically_identical
oracle:
type: deterministic_rerun
success_condition: The replay command exits non-zero with the same failure signature.
safety:
action_profile: build_debug_v1
network: repo_default
secrets: denied
extensions:
repo-build-replay:
schema_version: 1
source: github_actions
replay_mode: explicit_command
The core contract is deliberately small. Future replay classes can add their own data under extensions without changing the base manifest.
#Replay Semantics
capsule replay delegates to the existing crabbox run path with --shell. If the replay command exits non-zero and the manifest has no oracle.failure_signature, Crabbox records outcome: fail_reproduced. When a signature is present, the bounded replay output must contain that signature to count as fail_reproduced. A nonzero replay with a different signature records outcome: fail_new and returns nonzero because it is a new failure, not an honest reproduction. If the command exits zero, Crabbox records outcome: pass and returns nonzero because the original failure was not reproduced.
Use --keep when the goal is human or agent debugging. Crabbox keeps the lease alive and the user can attach with crabbox ssh --id <id-or-slug> using the id or slug printed by the underlying run.
Each replay appends a local record with outcome, replay_quality, exit_code, duration_ms, and whether the lease was kept. That is the simple measurement surface for the first gate: did the same failure reproduce?
#Secret And Evidence Boundary
Capsules store local YAML, bounded logs, and GitHub artifact references. They do not store secrets intentionally, but CI logs and artifacts can still contain sensitive data if the source workflow wrote it. Treat capsule directories as debug artifacts:
- keep
--max-log-bytesbounded; - use
--no-logsfor sensitive runs; - do not commit capsule directories unless the logs were reviewed;
- delete local capsules when they stop being useful.
#Non-Goals
- No RL training or reward loop.
- No emulator or hardware-in-loop implementation.
- No coordinator registry or worker storage.
- No automatic workflow command extraction.
- No machine snapshotting. Use checkpoints or images for environment state.
- No secret capture. Capsules store bounded logs and references, not raw runtime
state.
The strategic point is to make real CI failures replayable first. That creates a useful debug product immediately and leaves a clean path toward richer replay catalogues once the failure catalogue is trustworthy.