Sync
Read this when you are:
- changing rsync behavior or the remote sync flow;
- debugging missing, stale, or unexpectedly deleted files on a runner;
- tuning Git seeding, fingerprints, excludes, or large-sync guardrails.
Before running a command, crabbox run syncs your current checkout to the leased runner. Sync only applies to SSH-lease providers; delegated-run providers own their own file transfer and reject the local sync options. Native Windows targets use the same file list but ship it as a tar archive over OpenSSH instead of rsync.
#What gets synced
Sync transfers the Git-managed working set, not the whole directory tree. The file list comes from git ls-files --cached --others --exclude-standard -z, which returns:
- tracked files in the index;
- nonignored untracked files (new files Git would not ignore).
That list is then filtered by the active excludes:
- Crabbox's built-in cache and generated-output excludes;
- repo-local sync.exclude (config) patterns;
- root .crabboxignore patterns.
Git-ignored output, dependency folders, .git, and common local caches stay out of the transfer. This keeps a first sync close to what CI would see while still letting you test uncommitted local edits.
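As a rough illustration of how the candidate list above could be collected, the sketch below runs the same git ls-files command and splits its NUL-delimited output. The function names parse_nul_list and git_candidate_files are hypothetical and are not part of Crabbox; the excludes are not applied here.

```python
import subprocess
from pathlib import Path


def parse_nul_list(raw: bytes) -> list[str]:
    """Split NUL-delimited `git ls-files -z` output into relative paths."""
    # surrogateescape keeps non-UTF-8 file names round-trippable.
    return [p.decode("utf-8", "surrogateescape") for p in raw.split(b"\0") if p]


def git_candidate_files(repo_root: Path) -> list[str]:
    """Tracked files plus nonignored untracked files, before any excludes."""
    raw = subprocess.run(
        ["git", "ls-files", "--cached", "--others", "--exclude-standard", "-z"],
        cwd=repo_root,
        check=True,
        capture_output=True,
    ).stdout
    # De-duplicate defensively and sort for a stable order (an assumption,
    # not something the Crabbox docs specify).
    return sorted(set(parse_nul_list(raw)))
```

Using -z avoids quoting problems with file names that contain spaces or newlines, which is why the manifest is described as NUL-delimited.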
The built-in excludes are intentionally conservative. They cover common churn such as node_modules, .git, dist, coverage, playwright-report, test-results, .next, .vite, .turbo, target, .venv, __pycache__, .gradle, and the local .crabbox/logs, .crabbox/captures, and .crabbox/runs directories. Crabbox does not globally drop tracked source files just because a path segment happens to be named build or out. Put project-specific generated directories in .crabboxignore or sync.exclude.
#Excludes
Patterns match against POSIX-style relative paths. A pattern with no / matches any path segment by name or by glob (for example, node_modules or *.log); patterns with a / match a path prefix or a glob over the full relative path.
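The matching rules can be approximated as follows. This is a minimal sketch, not Crabbox's matcher: is_excluded is a hypothetical name, and it assumes Python fnmatch glob semantics, where * also matches across /, which may differ from the real implementation.

```python
from fnmatch import fnmatchcase


def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Return True if a POSIX-style relative path matches any exclude pattern."""
    segments = rel_path.split("/")
    for pattern in patterns:
        if "/" not in pattern:
            # No slash: match any single path segment by name or by glob.
            if any(fnmatchcase(segment, pattern) for segment in segments):
                return True
            continue
        # With a slash: match as a path prefix...
        prefix = pattern.rstrip("/")
        if rel_path == prefix or rel_path.startswith(prefix + "/"):
            return True
        # ...or as a glob over the full relative path.
        if fnmatchcase(rel_path, pattern):
            return True
    return False
```

For example, node_modules excludes web/node_modules/x/index.js, *.log excludes logs/app.log, and docs/generated excludes docs/generated/api.md but not docs/generated2/api.md.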
Use .crabboxignore when you only need repo-local sync exclusions. The file is read from the repository root. Blank lines and lines starting with # are ignored; the remaining lines are appended to sync.exclude and use the same matcher as config excludes. Crabbox supports only the exact .crabboxignore name; there is no short alias.
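The parsing rule for .crabboxignore is simple enough to sketch. The helper names below are hypothetical, and trimming surrounding whitespace before checking for # is an assumption that the text above does not confirm.

```python
def parse_crabboxignore(text: str) -> list[str]:
    """Return the exclude patterns from the contents of a .crabboxignore file."""
    patterns = []
    for line in text.splitlines():
        stripped = line.strip()  # assumption: surrounding whitespace is ignored
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments carry no pattern
        patterns.append(stripped)
    return patterns


def effective_excludes(config_excludes: list[str], ignore_text: str) -> list[str]:
    """Append .crabboxignore patterns to the sync.exclude patterns from config."""
    return list(config_excludes) + parse_crabboxignore(ignore_text)
```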
Repo-local config should hold project-specific excludes and env allowlists. Secrets must never be passed as command-line arguments or via broad env globs.
#Sync flow
For an SSH-lease run, sync runs these steps:
1. Resolve the local repository root.
2. Build the sync manifest (the NUL-delimited file list) and a parallel list of tracked paths that were deleted locally.
3. Print a candidate estimate and, when the checkout is dirty, a dirty-delta estimate; then enforce the large-sync guardrails (see below).
4. When fingerprinting is enabled, compute a local fingerprint and compare it to the remote one. If they match, print No changes detected, skipping sync and skip the rest.
5. On --full-resync/--fresh-sync, reset the remote workdir first.
6. Seed the remote Git tree from origin at the local HEAD when that commit is reachable from a remote ref, so rsync only ships the diff.
7. Write the manifest (and the deletion list) to the remote workdir.
8. When delete-sync is enabled, prune previously synced remote files that are no longer in the manifest.
9. rsync the working set with --files-from=- --from0 (the manifest drives the transfer).
10. Finalize: git-hydrate the worktree against the configured base ref, run the mass-deletion sanity check, and record the new fingerprint.
The remote prune in step 8 only removes paths Crabbox previously synced. It does not touch workflow-created state, package caches, .git, or any other runner file outside the managed list. The mass-deletion guard in step 10 aborts a sync that would delete an unexpectedly large fraction of tracked files; set CRABBOX_ALLOW_MASS_DELETIONS=1 to override it (this is also implied during Actions hydration).
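The prune and the mass-deletion guard both reduce to small set and ratio checks. The sketch below is illustrative only: the function names are hypothetical, and max_fraction is a made-up parameter because this page does not state the actual cutoff.

```python
def paths_to_prune(previously_synced: set[str], manifest: set[str]) -> set[str]:
    """Remote paths Crabbox synced earlier that are no longer in the manifest.

    Anything that was never in previously_synced (workflow state, caches,
    .git) can never appear in the result, so it is never deleted.
    """
    return previously_synced - manifest


def mass_deletion_blocked(
    deleted_tracked: int,
    total_tracked: int,
    max_fraction: float,  # hypothetical cutoff; the real value is not documented here
    allow_mass_deletions: bool = False,  # e.g. CRABBOX_ALLOW_MASS_DELETIONS=1
) -> bool:
    """Return True when the sync should abort because too much would be deleted."""
    if allow_mass_deletions or total_tracked == 0:
        return False
    return deleted_tracked / total_tracked > max_fraction
```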
On the remote box, sync metadata (including the fingerprint) is stored under .git/crabbox when .git is a directory, and under .crabbox otherwise. The .crabbox/ directory in your repository remains available for repository-owned files and config; Crabbox does not delete files there.
#Fingerprints and Git seeding
When sync.fingerprint is enabled (the default), Crabbox derives a fingerprint from HEAD, the delete/checksum settings, the manifest, the deletion list, the excludes, and the content of every changed file. If the remote workdir already carries that fingerprint, the sync is skipped entirely. --full-resync ignores the remote fingerprint and forces a clean transfer.
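One way to picture such a fingerprint is a single hash over all of the listed inputs. The sketch below assumes SHA-256 and a length-prefixed encoding; neither is confirmed by this page, and sync_fingerprint is a hypothetical name rather than Crabbox's function.

```python
import hashlib


def sync_fingerprint(
    head: str,
    delete: bool,
    checksum: bool,
    manifest: list[str],
    deletions: list[str],
    excludes: list[str],
    changed_contents: dict[str, bytes],
) -> str:
    """Hash every input that can change what a sync would transfer."""
    digest = hashlib.sha256()

    def feed(label: str, value: bytes) -> None:
        # Label and length-prefix each field so adjacent fields cannot blur together.
        digest.update(label.encode("utf-8"))
        digest.update(len(value).to_bytes(8, "big"))
        digest.update(value)

    def encode_paths(paths: list[str]) -> bytes:
        return "\0".join(sorted(paths)).encode("utf-8", "surrogateescape")

    feed("head", head.encode("utf-8"))
    feed("settings", f"delete={delete};checksum={checksum}".encode("utf-8"))
    feed("manifest", encode_paths(manifest))
    feed("deletions", encode_paths(deletions))
    feed("excludes", "\0".join(excludes).encode("utf-8"))
    for path in sorted(changed_contents):
        feed("path", path.encode("utf-8", "surrogateescape"))
        feed("content", changed_contents[path])
    return digest.hexdigest()
```

Because file contents are part of the hash, editing an already modified file changes the fingerprint even when the manifest itself is unchanged.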
Git seeding (sync.gitSeed, default on) clones or fetches the base tree on the runner before rsync, so only your diff travels over the wire. It activates only when the local HEAD commit is reachable from a remote ref.
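The reachability condition can be checked locally with git branch -r --contains HEAD, which lists the remote-tracking branches that contain the commit. The sketch below is one possible check, not necessarily the one Crabbox performs; both function names are hypothetical, and it only sees remote refs that have already been fetched.

```python
import subprocess
from pathlib import Path


def remote_refs_containing(branch_output: str) -> list[str]:
    """Parse `git branch -r --contains <commit>` output into ref names."""
    refs = []
    for line in branch_output.splitlines():
        name = line.strip()
        if not name:
            continue
        # "origin/HEAD -> origin/main" is a symbolic alias; keep the left side.
        refs.append(name.split(" -> ")[0])
    return refs


def can_git_seed(repo_root: Path) -> bool:
    """True when the local HEAD is contained in at least one remote-tracking branch."""
    out = subprocess.run(
        ["git", "branch", "-r", "--contains", "HEAD"],
        cwd=repo_root,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return bool(remote_refs_containing(out))
```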
#Large-sync guardrails
crabbox run prints a one-line size estimate before transferring. When the checkout is clean, the candidate counts the full file set. When the checkout is dirty, the guardrails count the dirty delta (changed plus new files) instead, but the line still shows the full candidate size so first-sync cost stays visible:
sync candidate: 299 files, 14.2 MiB dirty_delta=7 files, 92.4 KiB
The guardrail scope (candidate or dirty delta) is compared against the warn and fail thresholds. Crossing a warn threshold prints a warning plus the top source directories by file count, so accidental dependency repair or generated churn is easy to spot. Crossing a fail threshold aborts the run.
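The scope selection and threshold comparison can be sketched as below, using the default limits from the Configuration section. The names are hypothetical, treating "crossing" as strictly greater than is an assumption, and so is reporting a bypassed failure as a warning.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Estimate:
    files: int
    size_bytes: int


@dataclass(frozen=True)
class Thresholds:
    warn_files: int = 50_000
    warn_bytes: int = 5 * 1024**3   # 5 GiB
    fail_files: int = 150_000
    fail_bytes: int = 20 * 1024**3  # 20 GiB


def guardrail_verdict(
    candidate: Estimate,
    dirty_delta: Optional[Estimate],
    limits: Thresholds = Thresholds(),
    force_large: bool = False,  # corresponds to --force-sync-large
) -> str:
    """Return "ok", "warn", or "fail" for the estimate the guardrails apply to."""
    # A dirty checkout is judged by its delta; a clean one by the full candidate.
    scope = dirty_delta if dirty_delta is not None else candidate
    if scope.files > limits.fail_files or scope.size_bytes > limits.fail_bytes:
        return "warn" if force_large else "fail"
    if scope.files > limits.warn_files or scope.size_bytes > limits.warn_bytes:
        return "warn"
    return "ok"
```

With the sample line above (299 candidate files, a dirty delta of 7 files), the verdict is "ok" because only the 7-file delta is compared against the thresholds.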
crabbox run --force-sync-large bypasses the fail thresholds for one run. --debug adds rsync progress and stat output; quiet syncs still print a heartbeat when rsync goes silent for a while.
#Alternatives to syncing the whole checkout
For noisy worktrees, crabbox run --fresh-pr example-org/my-app#123 is often faster and clearer than syncing the local checkout. The runner starts from the PR head; add --apply-local-patch to layer your local git diff on top. The --fresh-pr path replaces rsync and cannot be combined with --no-sync, --sync-only, or --full-resync.
Use crabbox sync-plan to inspect the manifest before leasing a box. It prints the candidate file count, total bytes, the count of deleted tracked paths, and the largest files and directories, using the same excludes as run. Use --limit to change how many top files and directories are listed (default 20).
$ crabbox sync-plan
sync candidate: 299 files, 14.2 MiB
top files:
3.1 MiB docs/assets/demo.gif
...
top dirs:
6.4 MiB docs/assets
...
#Configuration
Sync defaults (override per repo in config or via env):
sync:
  delete: true
  checksum: false
  gitSeed: true
  fingerprint: true
  baseRef: "" # defaults to the repo's origin HEAD / current branch
  timeout: 15m
  warnFiles: 50000
  warnBytes: 5368709120 # 5 GiB
  failFiles: 150000
  failBytes: 21474836480 # 20 GiB
  allowLarge: false
  exclude: []
Environment overrides:
CRABBOX_SYNC_CHECKSUM
CRABBOX_SYNC_DELETE
CRABBOX_SYNC_GIT_SEED
CRABBOX_SYNC_FINGERPRINT
CRABBOX_SYNC_BASE_REF
CRABBOX_SYNC_TIMEOUT
CRABBOX_SYNC_WARN_FILES
CRABBOX_SYNC_WARN_BYTES
CRABBOX_SYNC_FAIL_FILES
CRABBOX_SYNC_FAIL_BYTES
CRABBOX_SYNC_ALLOW_LARGE
CRABBOX_ALLOW_MASS_DELETIONS
CRABBOX_ENV_ALLOW