Skip to content

Podman: fix compose-devcontainer health + multi-service checkpoint/restore#102

Merged
bilby91 merged 3 commits into
mainfrom
podman/dap-health-and-checkpoint-restore
Jun 23, 2026
Merged

Podman: fix compose-devcontainer health + multi-service checkpoint/restore#102
bilby91 merged 3 commits into
mainfrom
podman/dap-health-and-checkpoint-restore

Conversation

@bilby91

@bilby91 bilby91 commented Jun 23, 2026

Copy link
Copy Markdown
Member

Validates the Podman backend against a real multi-service compose devcontainer (8 services) and fixes the issues that blocked it. Two independent fixes, one per commit — happy to split into two PRs if preferred.

1. Orchestrator-driven health probing on Podman

Problem: Podman runs a container's HEALTHCHECK as root and fires the first probe immediately at start (it ignores start_period for the initial run and exposes no per-restore flag to defer it — confirmed on 5.7.0). For privilege-dropping images this breaks startup: rabbitmq's rabbitmq-diagnostics probe, run as root, creates a root-owned /var/lib/rabbitmq/.erlang.cookie that the gosu-dropped uid-999 server then can't read (eacces), so rabbitmq exits 1. Docker is unaffected because it defers the first probe past the interval. Reproduces on bare podman run --health-cmd …, so it's a Podman behavior, not our translation.

Fix: The compose orchestrator probes health itself on backends that opt in via a new selfHealthProber interface (Podman returns true; Docker/Apple unchanged). When set: omit RunSpec.HealthCheck (no native eager-root probe), and evaluate service_healthy gates by exec'ing the compose test command in waitFor — which defers the first probe until after the service initializes, matching Docker.

See design/compose-native-health.md (incl. the deliberate two-path divergence and the out-of-scope rootless-gosu issue).

2. Multi-service project checkpoint/restore

Validating CheckpointProject/RestoreProject against a real 8-service compose project surfaced restore failures the single-container alpine spike never hit. Design decisions folded into design/checkpoint-restore.md (§3.3 / §7):

  • Recreate the project network before restoring — a cross-node restore lands on a fresh store with no <project>_default network and libpod fails with network not found. Idempotent; same-node unaffected.
  • Clear a stale same-named containerCheckpointProject(StopAfter) stops but doesn't remove, so same-node restore-in-place collides (ID already in use). Removes the deterministic <project>-<service>-1 name only when not running (state is in the archive; volumes preserved); refuses to clobber a running container (replace a dead husk, never duplicate).
  • IgnoreVolumes option (ProjectRestoreOptions/RestoreOptionsruntime.RestoreSpec → Podman ignoreVolumes=true) so same-node restore-in-place reuses existing volumes instead of colliding (volume already exists). Cross-node leaves it false (content comes from the archive).

Verified end-to-end including true cross-node migration with state preserved (a file in the app container and a Postgres row written pre-checkpoint were present post-restore on a separate, never-ran-this Podman store).

Two node prerequisites documented (node-provisioning, not library code):

  • CRIU skip-file-rwx-check in /etc/criu/default.conf (Postgres data-dir is 0700 at checkpoint vs the 0755 restored mountpoint — trips CRIU's file-mode check; Podman has no per-restore flag and the engine talks to a remote socket).
  • Base images present on the destination node (restore pulls docker://… if absent).

Timings (8-service project, ~580 MB)

  • Checkpoint: ~9–17s (local disk vs shared mount).
  • Restore: ~6–7s cross-node (warm images); ~5s same-node --ignore-volumes; up to ~20–52s when re-extracting volume content off slow storage.

Testing

  • Unit tests added (health probe translation + self-probe gate; project restore network step) and the affected suites pass.
  • Validated manually end-to-end on rootful Podman 5.7 / CRIU 4.2 / crun 1.21. Pre-existing runtime/applecontainer test failures are environmental (Apple container JSON decode) and unrelated.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added IgnoreVolumes option to skip restoring volume contents and reuse existing volumes.
    • Enhanced project restore: recreates the default compose network when needed and handles stale container name collisions safely.
    • Added optional orchestrator-driven Compose health probing using container exec for service_healthy gating.
  • Bug Fixes
    • Prevents restore failures by detecting and resolving conflicting container states during project restore.
  • Documentation
    • Updated checkpoint/restore and health probing design notes, including validation updates.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a8ef8a85-c6bb-4d83-b0ad-168b2f72bb68

📥 Commits

Reviewing files that changed from the base of the PR and between 0c62dbe and 49eca6d.

📒 Files selected for processing (3)
  • checkpoint_project.go
  • compose/orchestrator.go
  • compose/orchestrator_health_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • compose/orchestrator_health_test.go
  • compose/orchestrator.go
  • checkpoint_project.go

📝 Walkthrough

Walkthrough

Two features are added: (1) an IgnoreVolumes option threaded from RestoreSpec through Engine.Restore/RestoreProject to Podman's restore query parameter, plus pre-restore network recreation and container-collision handling in RestoreProject; (2) an orchestrator-driven self-probed healthcheck path for Podman that disables native HEALTHCHECK and probes service_healthy gates via ExecContainer, with unit tests and design documents for both features.

Changes

Checkpoint / Restore fixes

Layer / File(s) Summary
IgnoreVolumes contract and Podman wiring
runtime/runtime.go, checkpoint.go, checkpoint_project.go, runtime/podman/checkpoint.go
IgnoreVolumes bool added to RestoreSpec, RestoreOptions, and ProjectRestoreOptions; Engine.Restore and Engine.RestoreProject forward the flag; Podman backend appends ignoreVolumes=true to the restore query when set.
RestoreProject network recreation and collision handling
checkpoint_project.go, checkpoint_project_test.go
Engine.RestoreProject recreates <project>_default network before restoring, detects existing containers by deterministic name, returns an error for running conflicts, removes stale stopped containers, and passes IgnoreVolumes into each per-service RestoreSpec. Fake runtime adds CreateNetwork.
Checkpoint/restore design document updates
design/checkpoint-restore.md
Updates §3.3 with network recreation and collision semantics, adds node prerequisites, records a real 8-service cross-node validation result, and revises open items.

Compose orchestrator self-probed health (Podman)

Layer / File(s) Summary
Podman PreferSelfProbedHealth capability
runtime/podman/podman.go
Adds (*Runtime).PreferSelfProbedHealth() bool returning true, documenting that the orchestrator should exec probes instead of relying on Podman's native HEALTHCHECK.
Orchestrator self-probe detection and RunSpec wiring
compose/orchestrator.go
Adds selfProbe bool to Orchestrator, defines the optional PreferSelfProbedHealth interface, initializes the flag in NewOrchestrator, and sets HealthCheck = {Disable: true} in the per-service RunSpec when selfProbe is active.
waitFor self-probe path and helpers
compose/orchestrator.go
gateLevel passes *runtime.HealthCheckSpec into waitFor; waitFor adds a selfProbe branch for service_healthy that calls probeHealthy instead of reading native health status. probeHealthy honors start_period grace and optional timeout; healthProbeCmd converts NONE/CMD/CMD-SHELL test specs into exec slices.
Orchestrator self-probe unit tests
compose/orchestrator_test.go, compose/orchestrator_health_test.go
mockRuntime gains an OnExec hook; new tests cover healthProbeCmd translation, selfProbe detection, waitFor success after retries, waitFor timeout yielding *HealthTimeoutError, and nil-healthcheck satisfied-when-running.
Compose native health design document
design/compose-native-health.md
New design doc covering the RabbitMQ root-cookie Podman failure, the self-probe opt-in decision, interface and probing semantics, implementation scope, native-vs-self-probe divergence, and out-of-scope rootless issues.

Sequence Diagram(s)

sequenceDiagram
  actor Caller
  participant Engine
  participant Orchestrator
  participant PodmanRuntime

  rect rgba(135, 206, 235, 0.5)
    Note over Caller,PodmanRuntime: Restore flow with IgnoreVolumes
    Caller->>Engine: RestoreProject(opts.IgnoreVolumes=true)
    Engine->>PodmanRuntime: CreateNetwork(<project>_default)
    PodmanRuntime-->>Engine: networkID
    Engine->>PodmanRuntime: InspectContainer(<project>-<svc>-1)
    alt stale stopped container exists
      Engine->>PodmanRuntime: RemoveContainer(force=true)
    end
    Engine->>PodmanRuntime: Restore(RestoreSpec{IgnoreVolumes: true})
    PodmanRuntime-->>Engine: restored Container
  end

  rect rgba(144, 238, 144, 0.5)
    Note over Caller,PodmanRuntime: Self-probe health gate flow
    Caller->>Orchestrator: up (service_healthy dependency)
    Orchestrator->>PodmanRuntime: RunContainer(RunSpec{HealthCheck: {Disable: true}})
    Orchestrator->>Orchestrator: gateLevel → waitFor(healthCheckSpec)
    loop poll until exit 0 or timeout
      Orchestrator->>PodmanRuntime: ExecContainer(healthProbeCmd)
      PodmanRuntime-->>Orchestrator: ExecResult
    end
    Orchestrator-->>Caller: gate satisfied (or HealthTimeoutError)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • crunchloop/devcontainer#98: Adds the core checkpoint/restore engine and Podman backend; this PR extends that restore path by introducing IgnoreVolumes option and refining RestoreProject behavior for same-node and cross-node scenarios.

Poem

🐇 Hoppity-hop, the volumes say "stay!"
Cross-node we jump, networks freshly made each day.
The rabbit probes health with an exec, not a prayer,
No root-owned cookies left lurking in there.
IgnoreVolumes=true when we're hopping in place—
Our checkpoints restored with impeccable grace! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the two main changes: Podman compose health probing and multi-service checkpoint/restore functionality, matching the PR's core contributions.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch podman/dap-health-and-checkpoint-restore

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
design/checkpoint-restore-fixes.md (1)

66-68: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add language identifier to fenced code block.

The code block is missing a language specifier. Mark it as shell/bash output to pass linting and clarify intent.

📝 Proposed fix
-```
+```sh
 criu/files-reg.c: File var/lib/postgresql/data has bad mode 040755 (expect 040700)
-```
+```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design/checkpoint-restore-fixes.md` around lines 66 - 68, The fenced code
block containing the criu error message starting with "criu/files-reg.c: File
var/lib/postgresql/data has bad mode 040755" is missing a language identifier in
the opening fence. Add the language specifier "sh" to the opening triple
backticks (change ``` to ```sh) to properly mark this as shell output, which
will satisfy linting requirements and make the intent clear.

Source: Linters/SAST tools

design/compose-native-health.md (2)

20-22: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add language specifier to code block.

The fenced code block should declare a language for proper syntax highlighting and linting compliance.

📝 Proposed fix
-```
+```shell
 Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces
-```
+```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design/compose-native-health.md` around lines 20 - 22, The fenced code block
containing the error message "Error when reading
/var/lib/rabbitmq/.erlang.cookie: eacces" is missing a language specifier. Add
the language identifier "shell" immediately after the opening triple backticks
(```shell) to enable proper syntax highlighting and meet linting compliance
requirements.

92-104: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Use more formal phrasing in probe semantics section.

The prose on lines 104 ("Exit 0 → healthy → gate satisfied. Non-zero → keep polling until deadline") uses informal arrow notation. Consider a more structured phrasing for consistency with the rest of the technical specification.

Suggested revision:

Exit code 0 indicates the service is healthy and the gate is satisfied. Non-zero exit codes trigger continued polling until the deadline is reached.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design/compose-native-health.md` around lines 92 - 104, Replace the informal
arrow notation in the probe semantics section (the final bullet point starting
with "Exit 0") with more formal phrasing. Instead of using arrows to connect
concepts, rephrase this to use complete sentences with explicit structure, such
as stating that exit code 0 indicates the service is healthy and satisfies the
gate, while non-zero exit codes result in continued polling until the deadline
is reached. This will maintain consistency with the technical specification
style used throughout the document.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@checkpoint_project.go`:
- Around line 240-247: The InspectContainer call in the checkpoint restore logic
currently ignores errors other than nil. When InspectContainer fails with a real
error (daemon/API failure, permission issue), the code should explicitly handle
it instead of silently proceeding. After calling e.runtime.InspectContainer, add
a check for when ierr is not nil to determine if the error is a "not found"
error (which allows restore to proceed) or a genuine failure that should be
returned immediately with the actual error context. Only proceed with the
container removal logic in the RemoveContainer call when either InspectContainer
succeeds or the error is specifically a "not found" type error.

In `@compose/orchestrator.go`:
- Around line 506-511: In the orchestrator.go file within the self-probe mode
conditional block (if o.selfProbe), setting spec.HealthCheck to nil does not
actually disable the native healthcheck but instead inherits the image's
baked-in HEALTHCHECK. To fix this, replace the nil assignment with an explicit
disabled healthcheck configuration that prevents native healthchecks from
running, rather than relying on nil inheritance semantics.

---

Nitpick comments:
In `@design/checkpoint-restore-fixes.md`:
- Around line 66-68: The fenced code block containing the criu error message
starting with "criu/files-reg.c: File var/lib/postgresql/data has bad mode
040755" is missing a language identifier in the opening fence. Add the language
specifier "sh" to the opening triple backticks (change ``` to ```sh) to properly
mark this as shell output, which will satisfy linting requirements and make the
intent clear.

In `@design/compose-native-health.md`:
- Around line 20-22: The fenced code block containing the error message "Error
when reading /var/lib/rabbitmq/.erlang.cookie: eacces" is missing a language
specifier. Add the language identifier "shell" immediately after the opening
triple backticks (```shell) to enable proper syntax highlighting and meet
linting compliance requirements.
- Around line 92-104: Replace the informal arrow notation in the probe semantics
section (the final bullet point starting with "Exit 0") with more formal
phrasing. Instead of using arrows to connect concepts, rephrase this to use
complete sentences with explicit structure, such as stating that exit code 0
indicates the service is healthy and satisfies the gate, while non-zero exit
codes result in continued polling until the deadline is reached. This will
maintain consistency with the technical specification style used throughout the
document.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 813a4d00-2246-44b2-8a51-37ddca545343

📥 Commits

Reviewing files that changed from the base of the PR and between 2c0c14c and 83aad2f.

📒 Files selected for processing (11)
  • checkpoint.go
  • checkpoint_project.go
  • checkpoint_project_test.go
  • compose/orchestrator.go
  • compose/orchestrator_health_test.go
  • compose/orchestrator_test.go
  • design/checkpoint-restore-fixes.md
  • design/compose-native-health.md
  • runtime/podman/checkpoint.go
  • runtime/podman/podman.go
  • runtime/runtime.go

Comment thread checkpoint_project.go Outdated
Comment thread compose/orchestrator.go
bilby91 and others added 2 commits June 23, 2026 14:02
Podman runs a container's HEALTHCHECK as root and fires the first probe
immediately at start (it ignores start_period for the initial run, and
exposes no per-restore flag to defer it — confirmed on 5.7.0). For
privilege-dropping images this breaks the service: rabbitmq's
`rabbitmq-diagnostics` probe, run as root, creates a root-owned
/var/lib/rabbitmq/.erlang.cookie that the gosu-dropped uid-999 server
then can't read (eacces), so rabbitmq exits 1. Docker is unaffected
because it defers the first probe past the interval.

Have the compose orchestrator probe health itself on backends that opt
in via the new selfHealthProber interface (Podman returns true; Docker
and Apple unchanged). When set: omit RunSpec.HealthCheck so the backend
runs no native healthcheck, and evaluate service_healthy gates by
exec'ing the compose test command in waitFor — which defers the first
probe until after the service has initialized, matching Docker.

See design/compose-native-health.md (incl. the two-path divergence and
the out-of-scope rootless-gosu issue).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Validating CheckpointProject/RestoreProject against a real multi-service
compose devcontainer (8 services) surfaced restore failures the
single-container alpine spike never hit. Fixes:

- RestoreProject recreates the <project>_default network before
  restoring containers. A cross-node restore lands on a fresh store with
  no network, and libpod restore fails with "network not found". Mirrors
  how compose.Orchestrator.Up names + labels it; idempotent, so same-node
  is unaffected. (Custom/extra networks aren't recorded yet — follow-up.)

- RestoreProject clears a stale same-named container before restore.
  CheckpointProject(StopAfter) stops but does not remove containers, so a
  same-node restore-in-place collides ("ID already in use"). Inspect the
  deterministic <project>-<service>-1 name and remove it only when NOT
  running (state is in the archive; RemoveVolumes stays false so the data
  volume survives); refuse to clobber a running container.

- Add IgnoreVolumes to ProjectRestoreOptions/RestoreOptions ->
  runtime.RestoreSpec -> Podman restore ignoreVolumes=true, so a
  same-node restore-in-place reuses existing volumes instead of colliding
  ("volume already exists"). Cross-node leaves it false (content comes
  from the archive).

Folded the design decisions + two node prerequisites (CRIU
skip-file-rwx-check; base images on the destination) into
design/checkpoint-restore.md §3.3/§7. Verified end-to-end incl. true
cross-node migration with state preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bilby91 bilby91 force-pushed the podman/dap-health-and-checkpoint-restore branch from 83aad2f to 0c62dbe Compare June 23, 2026 17:02
@bilby91 bilby91 changed the title Podman: fix DAP devcontainer health + multi-service checkpoint/restore Podman: fix compose-devcontainer health + multi-service checkpoint/restore Jun 23, 2026
Two fixes from PR review:

- compose: in self-probe mode set RunSpec.HealthCheck to {Disable:true}
  rather than nil. nil means "inherit from image" (toHealthcheck), so an
  image-baked HEALTHCHECK would still run natively on Podman and could
  reintroduce the eager-root-probe failure. Disable emits the NONE
  sentinel that actually suppresses it. Adds a regression test.

- checkpoint: RestoreProject now distinguishes a not-found container
  (normal fresh/cross-node case → proceed) from any other InspectContainer
  error (daemon/API/permission), which is surfaced instead of masked
  behind a downstream restore error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bilby91 bilby91 merged commit dcde75c into main Jun 23, 2026
19 checks passed
bilby91 added a commit that referenced this pull request Jun 24, 2026
Document everything merged since v0.3.0: Podman backend +
checkpoint/restore (#98), compose feature security metadata + entrypoint
chaining (#103), Podman compose health probing (#102), deps bump (#101),
and the prebuild dev-environment / CI tooling (#88#93).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant