Skip to content

worker: stop receiving new messages on SIGTERM#1165

Open
arpitjain099 wants to merge 1 commit into
ossf:mainfrom
arpitjain099:feat/worker-sigterm-stop-receiving
Open

worker: stop receiving new messages on SIGTERM#1165
arpitjain099 wants to merge 1 commit into
ossf:mainfrom
arpitjain099:feat/worker-sigterm-stop-receiving

Conversation

@arpitjain099

Copy link
Copy Markdown

Summary

When the analysis worker runs under Kubernetes it receives a SIGTERM and then a SIGKILL roughly 30 seconds later. Today the worker installs no signal handler, so it can pull a fresh pubsub message right before the SIGKILL and leave that analysis half-finished.

This is part 1 of #376: stop grabbing new work once a SIGTERM has been received.

What this changes

  • main now derives its context from signal.NotifyContext(..., syscall.SIGTERM, syscall.SIGINT), so the context is cancelled when the worker is asked to shut down (defer stop() restores default signal handling on exit). SIGINT is included so a local Ctrl-C behaves the same way as a Kubernetes SIGTERM.
  • The receive loop checks for a requested shutdown before calling sub.Receive, and returns cleanly instead of pulling another message. If the shutdown arrives while the loop is already blocked in Receive, the resulting cancelled-context error is now treated as a clean stop rather than a failure (a genuine receive error is still returned as before).
  • The "should we stop?" decision is factored into a small shutdownRequested(ctx) helper so it can be unit tested without standing up pubsub.

By the time the loop reaches the top, any message that was already being handled has finished, so this gives an in-flight analysis the chance to complete within the grace period before the loop exits.

Out of scope (deliberate follow-up)

Per @calebbrown's breakdown on the issue, this PR only covers item 1 (do not start new work after SIGTERM). Threading context through internal/analysis and wiring up a grace-period timer that cancels an in-flight analysis (items 2 and 3) is intentionally left for a separate change so this slice stays small and reviewable.

Testing

Added cmd/worker/main_test.go covering shutdownRequested for a live context, a cancelled context, and a real SIGTERM delivered to the process via signal.NotifyContext.

internal/packetcapture does not build natively on macOS (libpcap is Linux-only), so I ran the authoritative checks for Linux:

  • GOOS=linux go build ./cmd/worker/ and GOOS=linux go vet ./cmd/worker/: both clean.
  • go test ./cmd/worker/ run inside a golang:1.26 Linux container: all three tests pass.
  • gofmt clean.

Part of #376.

Kubernetes sends the analysis worker a SIGTERM and then a SIGKILL about
30 seconds later. Previously the worker had no signal handling, so it
could pull a fresh pubsub message right before being killed and leave
that analysis unfinished.

Install a SIGTERM/SIGINT handler via signal.NotifyContext in main and
have the receive loop check for a requested shutdown before pulling the
next message, returning cleanly instead of treating the cancelled
context as an error. The decision is factored into a small testable
shutdownRequested helper.

This is part 1 of ossf#376 (stop grabbing new work). Cancelling an
in-flight analysis through context after a grace period is a deliberate
follow-up and is not included here.

Signed-off-by: Arpit Jain <arpitjain099@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant