runtime: checkpoint/restore (CheckpointRuntime) + Podman backend#98
Conversation
Docker's checkpoint/restore is broken on current engines (the netns bind-mount on restore — open upstream containerd#12141 / moby#37344). Podman does the full round trip (`checkpoint --export` / `restore --import`: process + memory + writable rootfs in a portable, node-independent archive). So expose an optional CheckpointRuntime sub-interface and add a Podman backend that implements it. Contract (backend-agnostic): - runtime.CheckpointRuntime + CheckpointSpec/RestoreSpec/CheckpointRef (export/import model), Capabilities.Checkpoint, and typed errors (ErrCheckpointUnsupported, CheckpointFailedError, RestoreFailedError). - Engine.Checkpoint / Engine.Restore wrappers (checkpoint.go) that type-assert the sub-interface and gate on the capability bit. Restore returns a fully reattached *Workspace (re-inspect, rebuild config) via a shared reattachWorkspace helper factored out of Attach. - Engine.CheckpointProject / RestoreProject (checkpoint_project.go): a thin orchestrator over the per-container primitive for multi-service compose projects — enumerate by com.docker.compose.project label, checkpoint each service to its own archive + a manifest, restore each and reattach the devcontainer service as the Primary workspace. Decoupled from the compose-go / `docker compose` machinery (label-based only). Backend (runtime/podman, Linux): - Embeds *docker.Runtime over Podman's docker-compatible socket for the standard surface; drives checkpoint/restore + buildah build through the libpod REST API on the same socket via a thin stdlib net/http client. No CLI shell-out, no pkg/bindings (that spike took deps 76->384 + cgo/gpgme), zero new modules, cross-compiles clean. - Capabilities().Checkpoint gates on libpod reachability plus an optional Options.CheckpointProbe: the REST transport has no `criu check` equivalent, so the deployer (who runs the service) asserts CRIU. Tests: - Unit: httptest-based libpod request/response shapes, fake runtimes for the engine wrappers + project orchestrator (incl. partial-failure → no manifest, RestoreFailedError propagation), criu-probe gating, and the reattach image-metadata merge. - Integration (PODMAN_SOCKET-gated; skip without a live Podman+CRIU host): single-container reattach, multi-service project round trip, and a two-phase cross-node test (checkpoint on one store, restore on a fresh store that never pulled the image — proving the archive is self-contained; it also pins the workspace-bind-source-must-exist-on-the-destination requirement). - CI: a test-integration-podman job installs podman+criu+crun+iptables, starts the socket, and runs the gated tests for real (continue-on-error until the hosted-runner CRIU capability is confirmed). Validated end-to-end on real Podman+CRIU (all integration tests pass). Design records: design/checkpoint-restore.md (primitive + project orchestrator §3.3 + empirical runtime matrix) and design/podman-backend.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughIntroduces a ChangesPodman Checkpoint/Restore Feature
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Engine
participant podman_Runtime as runtime/podman.Runtime
participant libpodClient
participant Filesystem
Note over Caller,Filesystem: Checkpoint flow
Caller->>Engine: Checkpoint(ctx, ws, CheckpointOptions)
Engine->>Engine: validate + type-assert CheckpointRuntime
Engine->>podman_Runtime: Checkpoint(ctx, containerID, CheckpointSpec)
podman_Runtime->>libpodClient: POST /containers/{id}/checkpoint?export=true&tcpestablished=...
libpodClient-->>podman_Runtime: HTTP 200 (archive stream)
podman_Runtime->>Filesystem: write archive to ArchivePath
podman_Runtime-->>Engine: CheckpointRef{ArchivePath, Size}
Engine-->>Caller: CheckpointRef
Note over Caller,Filesystem: Restore flow
Caller->>Engine: Restore(ctx, RestoreOptions)
Engine->>Engine: validate + type-assert CheckpointRuntime
Engine->>podman_Runtime: Restore(ctx, RestoreSpec)
podman_Runtime->>Filesystem: open archive from ArchivePath
podman_Runtime->>libpodClient: POST /containers/import/restore (archive body)
libpodClient-->>podman_Runtime: HTTP 200 {Id: newContainerID}
podman_Runtime-->>Engine: *Container{State: Running}
Engine->>Engine: inspectStable → extract WorkspaceID from label
Engine->>Engine: reattachWorkspace(ctx, details, id, LocalEnv)
Engine-->>Caller: *Workspace
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (5)
runtime/podman/integration_test.go (2)
68-69: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winReplace fixed sleeps with deadline-based polling to reduce flakes.
The hard sleeps make this test timing-sensitive on slower runners; polling for observable state (counter file available/increasing) is more stable.
Also applies to: 87-88, 94-95
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@runtime/podman/integration_test.go` around lines 68 - 69, Replace the fixed time.Sleep calls at multiple locations in the test (including the time.Sleep(6 * time.Second) on line 68-69, and the additional sleep calls at lines 87-88 and 94-95) with deadline-based polling loops that check for observable state changes, such as verifying that the counter file exists or that its value has increased. This approach should use a timeout context and repeatedly check the actual condition with small intervals rather than waiting for a fixed duration, making the test more reliable across different execution environments.
36-37: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winUse bounded contexts for live Podman integration calls.
Both tests run external socket operations with
context.Background(), so a stuck Podman API can hang until the global test timeout instead of failing at the operation boundary.Suggested patch
- ctx := context.Background() + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) + defer cancel() rt, err := New(ctx, Options{Socket: socket})- ctx := context.Background() + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) + defer cancel() rt, err := New(ctx, Options{Socket: socket})Also applies to: 130-131
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@runtime/podman/integration_test.go` around lines 36 - 37, Replace the use of context.Background() with a bounded context that has a timeout to prevent indefinite hangs during Podman API operations. In the integration tests where context.Background() is passed to the New function (both at the initial creation and in the other test location), use context.WithTimeout or context.WithDeadline to create a context with an appropriate timeout duration. This ensures that if the Podman socket operations get stuck, they will fail at the operation boundary rather than hanging until the global test timeout.test/integration/podman_project_checkpoint_test.go (1)
184-193: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
readCounthides the failure mode by collapsing all errors to zero.Returning
0for exec and parse failures makes diagnostics ambiguous when the test fails; surfacing the error will speed triage.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/integration/podman_project_checkpoint_test.go` around lines 184 - 193, The readCount function currently silences all errors by returning 0 when ExecContainer fails or when parsing the stdout fails. Instead of returning 0 on errors, use the testing.T helper (t.Fatalf) to fail the test with the actual error details. Specifically, when ExecContainer returns an error or non-zero exit code, report that error using t.Fatalf with the error message. Similarly, when strconv.Atoi fails to parse the count, report the parse error using t.Fatalf. This way, test failures will show the actual underlying problem rather than just silently returning 0, making debugging much faster.design/checkpoint-restore.md (2)
116-116: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueMinor language polish: "exactly" is redundant.
Line 116: "That is exactly why it works end-to-end..." could become "That is why it works end-to-end..." The emphasis is already strong from the preceding sentence.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@design/checkpoint-restore.md` at line 116, The word "exactly" in the sentence "That is exactly why it works end-to-end" is redundant and weakens the emphasis already conveyed by the preceding sentence. Remove the word "exactly" from this sentence so it reads "That is why it works end-to-end..." to improve the language flow and eliminate the unnecessary emphasis.
300-300: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueMinor language polish: "very" is a weak intensifier.
Line 300: "pulls in the very large
containers/podmanmodule..." Consider "pulls in the massive / bulkycontainers/podmanmodule (cgo, storage build tags, ~300 added modules)" to make the cost concrete rather than vague.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@design/checkpoint-restore.md` at line 300, The phrase "pulls in the very large" uses "very" as a weak intensifier that lacks concrete information about why this is a problem. Replace "very" with a more descriptive term like "massive" or "bulky" and add specific details about the actual cost of including the containers/podman module, such as mentioning cgo requirements, storage build tags, and the approximate number of added modules (around 300) to make the impact concrete and easier to understand.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 145-146: The GitHub Actions references for actions/checkout and
actions/setup-go on lines 145-146 are using version tags (`@v6`) without being
pinned to specific commit hashes, which is a security risk. Replace the `@v6`
references with exact commit hashes (e.g., the full SHA like
`@a05bd36ab36d3e7f7a1506cbb51ccad719569de1`) for both actions/checkout@v6 and
actions/setup-go@v6 to pin them to their current release commits and ensure
supply-chain integrity.
In `@checkpoint_project.go`:
- Around line 140-153: The code currently allows multiple containers to resolve
to the same service name through serviceName(c), which causes archive file
overwrites since archive := svc + ".tar" will be identical for duplicates.
Before the checkpoint loop iterates through containers, add validation to detect
and reject duplicate service names. Track seen service names in a map or set as
you iterate through the containers, and return an error immediately if a
duplicate service name is encountered, preventing silent data loss during the
checkpoint operation.
- Around line 236-249: The readProjectManifest function does not validate the
archive paths in the services array before using them, creating a path traversal
vulnerability. After unmarshaling the JSON into the ref variable, add validation
logic to ensure each services[].archive path does not contain path traversal
sequences like ../ and is not an absolute path. Check these constraints for all
archive entries and return an error if any archive path violates these security
requirements before assigning ref.ArchiveDir and returning the ref.
- Around line 228-234: The writeProjectManifest function uses os.WriteFile
directly which is not atomic and can leave a truncated file if interrupted,
breaking the completion-marker guarantee. Modify the function to write the JSON
manifest data to a temporary file first (using a temp file pattern), then
atomically rename that temporary file to the final destination using os.Rename,
ensuring that the project manifest file only appears complete or not at all.
In `@checkpoint.go`:
- Around line 134-135: The Engine.Restore method currently allows restoration
without validating that the required devcontainer ID label exists in
details.Labels, which can result in an empty WorkspaceID being returned. Before
calling reattachWorkspace, add a check to verify that details.Labels contains
the LabelDevcontainerID key. If the label is missing, return an error instead of
proceeding with the reattach operation. This ensures the method fails explicitly
rather than returning a workspace with an empty ID that violates the fully
reattached contract.
In `@design/podman-backend.md`:
- Around line 132-141: The design document contains conflicting descriptions of
how the container build and runtime operations are implemented. The sections
discussing podman build behavior (around the BuildImage implementation) claim a
CLI-based approach that shells out to podman, while other sections in the same
document describe a REST-based implementation. Review the sections describing
the build and container runtime behavior, identify which approach (CLI-based or
REST-based) actually matches the current code implementation, and update all
conflicting sections to consistently describe the same implementation path.
Ensure sections around line 132-141 and 192-203 align with each other and with
the actual implementation.
In `@runtime/podman/build.go`:
- Around line 105-111: The tarDir goroutine spawned to write to the pipe writer
(pw) can become blocked if the r.lp.do() request call fails before consuming the
full request body. When an error occurs on line 108, the function returns early
without consuming data from the pipe reader (pr), causing the goroutine to hang
indefinitely. Close the pipe reader (pr) on error by adding a defer statement or
explicit close in the error handling path to signal EOF to the tarDir goroutine
and allow it to finish gracefully and avoid goroutine leaks.
In `@runtime/podman/checkpoint.go`:
- Around line 84-89: The code in the Decode block does not validate that the
restored container ID is populated before returning success. After the
json.NewDecoder successfully decodes the response into the rep variable (which
is of type restoreReport), add a validation check to ensure rep.ID is not empty.
If rep.ID is empty, return an appropriate error (similar to the existing
RestoreFailedError) to prevent returning a Container with an empty ID that
causes issues downstream.
- Around line 35-38: The os.Create call used to create the checkpoint archive
file at spec.ArchivePath may result in files with overly permissive access
permissions depending on the system umask. Replace os.Create with os.OpenFile,
specifying explicit flags (O_CREATE, O_WRONLY, O_TRUNC) and restrictive file
permissions (0600) to ensure the checkpoint archive file is only readable and
writable by the owner, protecting the sensitive process memory contents.
- Around line 39-46: When either copyErr or closeErr is encountered in the
checkpoint archive writing process, a partial or truncated file is left at
ArchivePath that could be mistaken for valid data. In both error handling
branches (the if copyErr != nil block and the if closeErr != nil block), add a
file deletion operation to remove the archive file at ArchivePath before
returning the CheckpointFailedError, ensuring no partial checkpoint artifacts
remain that could cause issues during subsequent operations.
In `@test/integration/podman_project_checkpoint_test.go`:
- Around line 58-64: The CreateNetwork call is not idempotent across interrupted
reruns because if a network with the same netName already exists from a previous
failed run, the creation will fail. To fix this, before calling CreateNetwork
with the netName, first attempt to remove any existing network with that name
using rt.RemoveNetwork (or similar method), ignoring errors if the network
doesn't exist, so that each test run starts with a clean state and the network
creation succeeds regardless of prior run failures.
---
Nitpick comments:
In `@design/checkpoint-restore.md`:
- Line 116: The word "exactly" in the sentence "That is exactly why it works
end-to-end" is redundant and weakens the emphasis already conveyed by the
preceding sentence. Remove the word "exactly" from this sentence so it reads
"That is why it works end-to-end..." to improve the language flow and eliminate
the unnecessary emphasis.
- Line 300: The phrase "pulls in the very large" uses "very" as a weak
intensifier that lacks concrete information about why this is a problem. Replace
"very" with a more descriptive term like "massive" or "bulky" and add specific
details about the actual cost of including the containers/podman module, such as
mentioning cgo requirements, storage build tags, and the approximate number of
added modules (around 300) to make the impact concrete and easier to understand.
In `@runtime/podman/integration_test.go`:
- Around line 68-69: Replace the fixed time.Sleep calls at multiple locations in
the test (including the time.Sleep(6 * time.Second) on line 68-69, and the
additional sleep calls at lines 87-88 and 94-95) with deadline-based polling
loops that check for observable state changes, such as verifying that the
counter file exists or that its value has increased. This approach should use a
timeout context and repeatedly check the actual condition with small intervals
rather than waiting for a fixed duration, making the test more reliable across
different execution environments.
- Around line 36-37: Replace the use of context.Background() with a bounded
context that has a timeout to prevent indefinite hangs during Podman API
operations. In the integration tests where context.Background() is passed to the
New function (both at the initial creation and in the other test location), use
context.WithTimeout or context.WithDeadline to create a context with an
appropriate timeout duration. This ensures that if the Podman socket operations
get stuck, they will fail at the operation boundary rather than hanging until
the global test timeout.
In `@test/integration/podman_project_checkpoint_test.go`:
- Around line 184-193: The readCount function currently silences all errors by
returning 0 when ExecContainer fails or when parsing the stdout fails. Instead
of returning 0 on errors, use the testing.T helper (t.Fatalf) to fail the test
with the actual error details. Specifically, when ExecContainer returns an error
or non-zero exit code, report that error using t.Fatalf with the error message.
Similarly, when strconv.Atoi fails to parse the count, report the parse error
using t.Fatalf. This way, test failures will show the actual underlying problem
rather than just silently returning 0, making debugging much faster.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 83a254fe-5d7e-4146-8e93-79120c50801b
📒 Files selected for processing (23)
.github/workflows/ci.yml.gitignoreREADME.mdattach.gocheckpoint.gocheckpoint_project.gocheckpoint_project_test.gocheckpoint_test.godesign/README.mddesign/checkpoint-restore.mddesign/podman-backend.mdruntime/compose_primitives.goruntime/errors.goruntime/podman/build.goruntime/podman/checkpoint.goruntime/podman/integration_test.goruntime/podman/libpod.goruntime/podman/podman.goruntime/podman/podman_test.goruntime/runtime.gotest/integration/podman_checkpoint_restore_test.gotest/integration/podman_crossnode_test.gotest/integration/podman_project_checkpoint_test.go
CodeRabbit review on #98. Substantive fixes: - runtime/podman/checkpoint.go: write the archive 0600 (it carries process memory) instead of os.Create's umask default; delete a partial archive if the copy/close fails; reject an empty container id from libpod restore. - runtime/podman/build.go: CloseWithError the pipe reader when the /build request errors, so the tarDir goroutine doesn't leak parked on the pipe. - checkpoint.go: Engine.Restore now errors if the restored container has no dev.containers.id label (was silently returning an empty Workspace.ID). - checkpoint_project.go: reject duplicate service names (would overwrite archives); write project.json atomically via temp+rename so a present manifest is always complete; validate manifest archive entries are plain basenames (block path traversal on restore). - ci.yml: criu was dropped from Ubuntu 24.04 (the install failed with "no installation candidate"); pin ubuntu-22.04 (jammy ships criu in universe) and have the setup step skip the job green when criu can't be installed or `criu check` fails, so it's a ready harness rather than a red X. - design/podman-backend.md: §5.1/§5.2 still presented the CLI plan as final; marked superseded by the libpod-REST implementation. - multi-service integration test: clear a leftover network before CreateNetwork so reruns after an interrupted run are idempotent. Skipped: the unpinned-actions flag — matches the repo-wide @vn convention (every existing job uses it); out of scope for this PR. Re-validated on real Podman+CRIU (OrbStack): all integration tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… runners The test-integration-podman job failed for real reasons, not flakiness: Ubuntu 24.04 dropped criu; 22.04 ships podman 3.4.4 whose OCI runtime reports "does not support checkpoint/restore" (and 3.x predates the libpod v5 API this backend targets). criu check passing wasn't a sufficient gate. Now the job always compiles the gated tests (build step), and only runs them when the host is genuinely capable (criu + criu check + podman>=5), skipping green with a warning otherwise. Dropped continue-on-error so a failure on a capable (self-hosted) runner is honest. Real C/R is validated locally on podman 5.x + criu (OrbStack). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The hosted runner's apt podman can't do C/R (24.04 has no criu; 22.04's podman 3.4.4 can't checkpoint and predates the libpod v5 API), so the previous job could only skip. Instead bring a modern stack in a container: build the gated tests on the runner (static, CGO_ENABLED=0) for compile coverage, then run them inside quay.io/podman/stable (podman 5.x + crun + criu) launched --privileged --cgroupns=host so CRIU can drive the runner's kernel. .github/scripts/podman-cr.sh smoke-tests an actual checkpoint and skips green with the real reason if the runner can't (e.g. cgroup-freezer perms), so a capable runner runs for real and an incapable one stays green. Validated in the equivalent topology (Linux VM → Docker → modern-podman container → CRIU) on a criu-capable kernel: all four integration tests pass (checkpoint/restore, buildah build, engine reattach, multi-service project). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)
143-166: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAdd an explicit timeout to the Podman integration job.
This job runs privileged containerized tests and can hang on host/kernel edge cases; set
timeout-minutesto bound CI resource consumption and improve failure recovery.Suggested diff
test-integration-podman: runs-on: ubuntu-latest + timeout-minutes: 30 needs: [lint, test-linux] steps:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 143 - 166, The test-integration-podman job lacks an explicit timeout which can cause the CI pipeline to hang when the privileged containerized tests encounter kernel edge cases. Add a timeout-minutes property to the test-integration-podman job at the same indentation level as the runs-on, needs, and steps properties to set a reasonable timeout limit that bounds resource consumption and improves failure recovery.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 143-166: The test-integration-podman job lacks an explicit timeout
which can cause the CI pipeline to hang when the privileged containerized tests
encounter kernel edge cases. Add a timeout-minutes property to the
test-integration-podman job at the same indentation level as the runs-on, needs,
and steps properties to set a reasonable timeout limit that bounds resource
consumption and improves failure recovery.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 51e096f4-f990-4c68-8ce3-6fa94f03404c
📒 Files selected for processing (2)
.github/scripts/podman-cr.sh.github/workflows/ci.yml
✅ Files skipped from review due to trivial changes (1)
- .github/scripts/podman-cr.sh
Document everything merged since v0.3.0: Podman backend + checkpoint/restore (#98), compose feature security metadata + entrypoint chaining (#103), Podman compose health probing (#102), deps bump (#101), and the prebuild dev-environment / CI tooling (#88–#93). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Why
Docker's checkpoint/restore is broken on current engines —
start --checkpointfails on the netns bind-mount (open upstream containerd#12141 / moby#37344). Podman does the full round trip (checkpoint --export/restore --import: process + memory + writable rootfs in a portable, node-independent archive). So this adds an optionalCheckpointRuntimesub-interface and a Podman backend that implements it — for migrating spot-evicted devcontainers without losing in-memory work.See
design/checkpoint-restore.md(primitive + empirical runtime matrix + why-not-docker) anddesign/podman-backend.md(backend impl).What's in it
Contract (backend-agnostic)
runtime.CheckpointRuntime+CheckpointSpec/RestoreSpec/CheckpointRef,Capabilities.Checkpoint, and typed errors (ErrCheckpointUnsupported,CheckpointFailedError,RestoreFailedError).Engine.Checkpoint/Engine.Restore— gate on the capability bit;Restorereturns a fully reattached*Workspace(re-inspect + rebuild config via areattachWorkspacehelper shared withAttach).Engine.CheckpointProject/RestoreProject— a thin orchestrator over the per-container primitive for multi-service projects: enumerate bycom.docker.compose.projectlabel, one archive per service + a manifest, reattach the devcontainer service asPrimary. Decoupled from the compose machinery (label-based only).Backend (
runtime/podman, Linux)*docker.Runtimeover Podman's docker-compatible socket for the standard surface; drives checkpoint/restore + buildah build through the libpod REST API on the same socket via a thin stdlibnet/httpclient. No CLI shell-out, nopkg/bindings(rejected: deps 76→384 + cgo/gpgme). Zero new modules; cross-compiles clean.Capabilities().Checkpointgates on libpod reachability plus an optionalOptions.CheckpointProbe— the REST transport has nocriu checkequivalent, so the deployer (who runs the service) asserts CRIU.Testing
RestoreFailedErrorpropagation), capability-probe gating, reattach metadata merge.PODMAN_SOCKET-gated; skip without a live Podman+CRIU host): single-container reattach, multi-service project round trip, and a two-phase cross-node test (restore on a fresh store that never pulled the image — proves the archive is self-contained, and pins the requirement that the workspace bind source must exist on the destination).test-integration-podmanjob installs podman+criu+crun+iptables and runs the gated tests for real (continue-on-erroruntil the hosted-runner CRIU capability is confirmed).🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes