Troubleshooting
Use this guide when a lease, run, or deployment misbehaves. Each section lists the symptoms you will see, the commands that confirm the cause, and the fixes that resolve it. Read it when:
- a lease fails to create or is rejected before provisioning;
- SSH never becomes ready on a new box;
- tailnet reachability behaves unexpectedly;
- sync transfers the wrong files, stalls, or aborts;
- Actions hydration times out;
- the docs site fails to publish.
#First steps
Most problems show up in one of four places: local config, broker readiness, the lease list, or usage/cost state. Start here before digging deeper.
crabbox doctor
crabbox config show
crabbox list --json
crabbox usage --scope all --json
doctor reports local and broker/provider readiness; pass --provider <name> to scope its checks to one provider. See doctor and cost & usage for the data behind these commands.
#Broker auth fails
The CLI talks to the coordinator over HTTP for every lease lifecycle call. Auth failures stop a lease before it reaches a provider.
Symptoms
401or403responses;missing broker token;- a GitHub
Invalid redirect_urierror during browser login; - a Cloudflare Access challenge page returned instead of JSON.
Checks
crabbox config show
printenv CRABBOX_COORDINATOR
printenv CRABBOX_COORDINATOR_TOKEN
printenv CRABBOX_PUBLIC_URL
Fixes
- Configure the broker with
crabbox config set-broker --url <coordinator-url>(or - Point the CLI at the coordinator URL, or at the Access-protected route only when
- Ensure
CRABBOX_COORDINATOR_TOKENmatches the coordinator'sCRABBOX_SHARED_TOKEN. - For self-hosted GitHub browser login, create a GitHub OAuth app and set its
- Ensure the coordinator's
CRABBOX_PUBLIC_URLuses the same public origin as that
log in with crabbox login).
that is intended.
callback URL to https://<your-coordinator-host>/v1/auth/github/callback.
GitHub OAuth callback.
See broker auth & routing for the full token precedence and Access integration.
#SSH host key or control socket fails
Symptoms
- SSH warns that the host identification changed after a provider reused an IP;
- a reused warm lease connects to the wrong
ControlMastersocket; - config paths under
~/Library/Application Supportappear split at the space.
Checks
crabbox inspect --id swift-crab --json
crabbox ssh --id swift-crab
Fixes
- Upgrade to a build that quotes SSH config values containing spaces.
- Keep per-lease keys under the Crabbox config
testboxes/<lease>directory - Do not manually override
UserKnownHostsFileorControlPathunless you are
(see SSH keys).
debugging SSH itself.
#Lease rejected by cost control
The broker enforces active-lease counts and monthly reserved-USD budgets before provisioning. Over-limit requests are rejected with HTTP 429.
Symptoms
cost_limit_exceeded;- the lease request fails before any provider machine is created.
Checks
crabbox usage --scope user --user "$(git config user.email)"
crabbox usage --scope org --org example-org
Fixes
- Raise the relevant monthly or active-lease limit on the broker.
- Shorten
--idle-timeout(or--ttl) to lower the reserved estimate. - Choose a smaller
--class. - Stop kept leases you no longer need with
crabbox stop <id-or-slug>.
#Provider not configured, or capacity/quota fails
Symptoms
provider_not_configured(HTTP 424 from the broker);crabbox doctor --provider azurereportsmissing=AZURE_TENANT_ID,...;- the class falls back from a dedicated machine to a smaller one;
- an AWS Spot request cannot be fulfilled;
- AWS reports
VcpuLimitExceededfor large On-Demand instances; - server creation fails before SSH is reachable.
Checks
crabbox doctor --provider aws
crabbox list --json
crabbox usage --scope all
CRABBOX_CAPACITY_REGIONS=eu-west-1,eu-west-2,eu-central-1,us-east-1,us-west-2 \
crabbox warmup --provider aws --class standard --market on-demand --timing-json
Fixes
- Set the named coordinator provider secrets before retrying brokered leases.
- Choose a smaller
--class. - Override the AWS capacity market for a one-off launch with
--market on-demand - Set
CRABBOX_CAPACITY_REGIONSso brokered and direct AWS launches can try - Set
CRABBOX_CAPACITY_AVAILABILITY_ZONESonly when you intentionally want - Set
CRABBOX_CAPACITY_STRATEGY=most-availableto prefer regions with capacity. - Set
CRABBOX_CAPACITY_LARGE_CLASSESwhen your installation wants warnings for - If
doctorprintswarning capacity, use its recommended class/type or - Raise the AWS
Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances - Raise the Hetzner dedicated-core quota when dedicated classes are required.
or --market spot.
several regions.
specific zones within those regions.
classes beyond beast.
request the printed AWS quota code.
quota for the C/M/R/T/Z families, or the matching Spot quota when using Spot.
Brokered AWS launches record provisioning attempts and use AWS Service Quotas when available, reporting the quota code, applied vCPU limit, requested type, and required vCPUs before trying the next candidate. Brokered responses include capacityHints so callers can surface the selected region/market and the next operator action instead of parsing raw provider errors. crabbox doctor --provider aws surfaces the same quota pressure before the first warmup.
If AWS reports InvalidInstanceID.NotFound during coordinator-backed lease creation, the backing instance record was stale by the time Crabbox tried to use it. Crabbox discards that lease record best-effort and retries once with a fresh lease. See capacity fallback for the full algorithm.
#Provider machine looks orphaned
Symptoms
crabbox listshowsorphan=no-active-leaseororphan=missing-lease-label;- the provider console has a
crabbox-cbx-...machine, butcrabbox inspect
returns not found.
Checks
crabbox list --provider hetzner
crabbox list --provider aws
crabbox admin leases --state active
Fixes
- Do not delete
keep=truemachines automatically. - Stop or delete a machine only after confirming no active coordinator lease
- Use
crabbox stop <id-or-slug>for active leases; reserve provider/admin
references it.
cleanup (crabbox cleanup, crabbox admin delete) for confirmed orphans.
The broker also sweeps untracked AWS instances and idle Mac dedicated hosts on a schedule. See lifecycle & cleanup.
#SSH never becomes ready
A box exists but bootstrap has not finished writing the readiness marker, or the SSH port is unreachable. The default SSH user is crabbox, the primary port is 2222, and fallback port 22 is tried unless disabled.
Symptoms
- the lease exists, but
crabbox runwaits until the SSH timeout; - the primary port (
2222) and all fallback ports are unreachable; - the
crabbox-readymarker is missing.
Checks
crabbox inspect --id cbx_... --json
ssh -p 2222 crabbox@HOST crabbox-ready
ssh -p 2222 crabbox@HOST test -f /var/lib/crabbox/bootstrapped
ssh -p 22 crabbox@HOST crabbox-ready
Fixes
- Wait for cloud-init to finish on freshly created machines.
- Verify the security group or firewall allows the primary SSH port and the
- Set
CRABBOX_SSH_FALLBACK_PORTS=nonewhen fallback port22should not be - Inspect the provider console output for cloud-init failures.
- Retry the lease if bootstrap failed before creating the ready marker.
configured fallback ports.
opened or tried.
See runner bootstrap for the readiness contract.
#Tailscale path fails
Symptoms
--tailscalelease creation fails withtailscale_unavailable,--network tailscalereports the lease has no tailnet address;--network tailscalereports the tailnet host is unreachable over SSH;--network autofalls back topublic;tailscale exit node ... joined but remote internet egress failed.
tailscale_disabled, or invalid_tailscale_tags;
Checks
crabbox config show
crabbox inspect --id swift-crab --json
crabbox ssh --id swift-crab --network tailscale
tailscale status
tailscale ping <tailscale-fqdn-or-100.x-address>
Fixes
- For brokered leases, configure the coordinator secrets
- Keep
CRABBOX_TAILSCALE_ENABLEDunset or1; set it to0only to disable - Ensure requested tags are in the coordinator's
CRABBOX_TAILSCALE_TAGSallowlist. - If
invalid_tailscale_tagsincludes Tailscale's raw `requested tags ... are - Ensure the local client is joined to the same tailnet and that ACLs allow SSH
- For exit nodes, ensure the node is approved and that tailnet grants or ACLs
- If the exit node is a personal Mac, verify Tailscale still advertises it as an
- Use
--network publicto prove the provider SSH path independently. - Use
--network autowhen fallback to public is acceptable. - Use
--network tailscalewhen a missing or unreachable tailnet path should
CRABBOX_TAILSCALE_CLIENT_ID and CRABBOX_TAILSCALE_CLIENT_SECRET.
brokered Tailscale intentionally.
invalid or not permitted response, make the requested set exactly match the OAuth client's tags or update tagOwners` so an OAuth client tag owns every requested tag. Multi-tag OAuth clients need explicit self-ownership for subset requests; a dedicated deployment-owner tag is usually simpler and narrower.
to the tagged node.
allow the lease tag (for example tag:crabbox) to reach autogroup:internet.
exit node and that the Mac can forward internet traffic for clients.
fail the command.
Crabbox uses OpenSSH with per-lease SSH keys over the selected host. Tailscale SSH, Serve, Funnel, and direct VNC binding are not part of managed lease support. See tailscale and network.
#Sync looks wrong
run syncs your dirty checkout to the box via a git manifest plus rsync, with fingerprint short-circuiting and guardrails. Wrong-base or unexpected-deletion problems usually trace to git state or excludes.
Symptoms
- changed-test detection picks the wrong base;
- deleted files unexpectedly reappear on the remote;
- sync aborts on a mass tracked-file deletion;
- sync warns or fails before rsync because the candidate tree is too large.
Checks
git status --short
git ls-files --cached --others --exclude-standard | wc -l
crabbox run --id cbx_... -- git status --short
crabbox run --id cbx_... --sync-only --debug
Fixes
- Commit, stash, or intentionally keep local deletions before syncing.
- Add generated directories to
.gitignoreor tosync.excludein - Keep
.git, build caches, and package caches out of the repo source tree. - Use
--force-sync-largeonly after verifying the candidate file count and - Check the repo-local
.crabbox.yamlsync excludes. - Rerun without relying on the sync fingerprint after large tree changes.
- Verify base-ref hydration in repo config.
.crabbox.yaml.
byte total are expected.
#Sync stalls or times out
Symptoms
- rsync prints little output for a long time;
rsync timed out after ...;- a local cache directory made the first sync unexpectedly huge.
Checks
crabbox config show
crabbox sync-plan
crabbox run --id cbx_... --sync-only --debug
Fixes
- Inspect the printed sync candidate estimate (
crabbox sync-plan) before - Lower
sync.timeoutfor quick failure in agent loops, or raise it for - Tune
sync.warnFiles,sync.warnBytes,sync.failFiles, andsync.failBytes - Stop and warm a fresh lease if the remote workspace looks corrupted.
retrying.
intentionally large source transfers.
in repo config.
See sync for the full manifest and hydration flow.
#Actions hydration times out
crabbox actions hydrate populates a lease's workspace by driving the repo's configured GitHub Actions workflow. It waits for the workflow to write a ready marker before later run calls reuse that workspace.
Symptoms
crabbox actions hydratedispatches a run but never sees the ready marker;- a later
crabbox run --iddoes not enter the expected Actions workspace.
Checks
crabbox actions hydrate --id swift-crab
crabbox inspect --id swift-crab --json
Fixes
- Open the workflow run URL and find the failed setup step.
- Ensure the generated workflow writes the ready marker.
- Confirm the workflow has permission to register or use the runner.
- Keep secrets inside the workflow and write only non-secret handoff data.
See Actions hydration.
#Docs site fails to publish
Symptoms
- the Pages workflow fails during Pages setup;
- the local docs build succeeds.
Checks
scripts/check-docs.sh
gh run list --workflow pages.yml
Fixes
- Enable GitHub Pages for the repository or organization.
- Rerun the Pages workflow after Pages is allowed.
- Keep Markdown links relative so the static builder can rewrite them.
- Fix broken internal Markdown links before assuming Pages itself is down.