Skip to content

Emit Events from the three config controllers#5514

Merged
ChrisJBurns merged 16 commits into
mainfrom
cburns/config-controllers-eventrecorder
Jun 16, 2026
Merged

Emit Events from the three config controllers#5514
ChrisJBurns merged 16 commits into
mainfrom
cburns/config-controllers-eventrecorder

Conversation

@ChrisJBurns

Copy link
Copy Markdown
Collaborator

Summary

The MCPOIDCConfig, MCPExternalAuthConfig, and MCPAuthzConfig controllers logged validation failures and deletion-blocked transitions but emitted no Kubernetes Event objects — so an operator running kubectl describe mcpauthzconfig <name> (or the siblings) couldn't see why a config was rejected or which workloads were blocking its deletion. Per the operator Status Condition Parity rule, all three siblings gain an EventRecorder in a single change so the parity is established as one logical step. Flagged as F15 in the multi-agent review of #4777.

Closes #5496

Note: stacked on #5509 (cburns/migrate-sibling-controllers-mutatepatchstatus) — these controllers only exist in their migrated form on that branch. Rebase to main once #5509 merges.

Medium level
  • New shared helper (config_controller_events.go): emitConfigEvent (nil-guarded Eventf wrapper), emitConfigRecoveryEvent (Normal ConfigValid, no-op unless recovering), conditionStatusIs (transition detection), and the shared ConfigInvalid / ConfigValid / DeletionBlocked event reasons.
  • Each reconciler gains Recorder events.EventRecorder, wired in app.go via mgr.GetEventRecorder("<resource>-controller"), plus an events.k8s.io/events RBAC marker (matching the five sibling controllers that already carry it; regenerated manifests show no diff since the permission was already granted).
  • Emission is transition-gated: the prior condition is read before MutateAndPatchStatus (which mutates conditions in place), so a Warning fires only on entering the invalid/blocked state and the recovery Normal fires only on the FalseTrue transition — steady-state reconciles produce no event spam.
  • ExternalAuth coverage: the inline-validation path, the OBO setInvalid path, and both success paths (handleConfigHashChange, updateReferencingWorkloads) are all covered.
  • Security: event messages carry only structural validation errors (err.Error() from Validate()) and logical workload Kind/Name references — never raw policy, secrets, bearer tokens, or fully-qualified internal URLs. A test asserts the ExternalAuth deletion event omits the token URL and secret ref.
Low level
File Change
cmd/thv-operator/controllers/config_controller_events.go New: shared event reasons/actions, emitConfigEvent, emitConfigRecoveryEvent, conditionStatusIs.
cmd/thv-operator/controllers/mcpoidcconfig_controller.go Recorder field + RBAC marker; emit ConfigInvalid, ConfigValid, DeletionBlocked.
cmd/thv-operator/controllers/mcpexternalauthconfig_controller.go Recorder field + RBAC marker; emit on inline + OBO failure, both recovery paths, and deletion-block.
cmd/thv-operator/controllers/mcpauthzconfig_controller.go Recorder field + RBAC marker; emit ConfigInvalid, ConfigValid, DeletionBlocked.
cmd/thv-operator/app/app.go Wire Recorder for all three controllers.
cmd/thv-operator/controllers/config_controller_events_test.go New: per-controller event tests (invalid → recovery, deletion-blocked), steady-state recovery path, nil-recorder safety, and message-sanitization assertions using events.NewFakeRecorder.

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation
  • Other

Test plan

  • task test (operator controllers + app unit suites) passes
  • task build passes
  • gofmt + go vet ./cmd/thv-operator/... clean (local task lint-fix blocked by a known golangci-lint go1.24/1.26 toolchain mismatch; CI lint is unaffected)
  • New unit tests assert: Warning ConfigInvalid on validation failure, Normal ConfigValid on recovery (incl. the steady-state path), Warning DeletionBlocked when gated by a referencing workload, no event spam while staying invalid, nil-recorder safety, and that event messages don't leak the token URL / secret ref

Special notes for reviewers

  • The events.k8s.io RBAC markers were added for parity with the sibling controllers, but task operator-manifests produces no diff — the aggregated role already granted events: create;patch from the existing five controllers.
  • The integration suite (cmd/thv-operator/test-integration/...) was not run locally; it requires envtest control-plane binaries (/usr/local/kubebuilder/bin/etcd) that aren't installed in this environment.

Generated with Claude Code

ChrisJBurns and others added 6 commits June 12, 2026 18:09
Status writes now flow through controllerutil.MutateAndPatchStatus and
finalizer writes through controllerutil.MutateAndPatchSpec, matching the
MCPAuthzConfig controller migration in #4777. The previous r.Status().Update
calls sent full PUT bodies that would clobber conditions written by any
disjoint owner of Status.Conditions on this CRD; the previous r.Update calls
had no optimistic-lock guard around the finalizer array.

Condition and field mutations move inside the helper closures so the
pre-mutate snapshot reflects the live state rather than already containing the
change — a MutateAndPatchStatus prerequisite. A small helper,
setOIDCConfigValidTrueCondition, factors out the Valid=True transition.

Add a PreservesForeignConditions regression test mirroring the MCPAuthzConfig
guard, asserting a foreign-owned condition survives a reconcile.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Status writes now flow through controllerutil.MutateAndPatchStatus and
finalizer writes through controllerutil.MutateAndPatchSpec, matching the
MCPAuthzConfig and MCPOIDCConfig controllers. The previous r.Status().Update
calls sent full PUT bodies that would clobber conditions owned by any disjoint
writer of Status.Conditions on this CRD; the previous r.Update calls had no
optimistic-lock guard around the finalizer array.

The IdentitySynthesized advisory was previously computed once in memory before
validation. Because MutateAndPatchStatus snapshots the object on entry and any
pre-mutate change is dropped from the merge patch, that upfront mutation is
removed; the advisory is now recomputed inside each status-write closure via a
new setValidTrueAndSynthesized helper (and directly on the validation-failure
and setInvalid paths). applyIdentitySynthesizedCondition is idempotent on the
same spec, so this preserves the advisory transition on every path. The
setInvalid doc comment is updated to drop the stale upfront-mutation reference.

Add a PreservesForeignConditions regression test mirroring the sibling guards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The advisory helper's bool return was consumed only by the upfront in-memory
call removed in the MutateAndPatchStatus migration; every remaining caller
ignores it inside a status-write closure. Drop the return value to satisfy the
unparam linter and refresh the doc comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review feedback on the status-write migration:

- Gate the OIDC steady-state reference refresh on a captured findErr == nil
  rather than the referencingWorkloads != nil sentinel, decoupling the guard
  from findReferencingWorkloads' nil-on-error / non-nil-empty-on-success
  contract.
- Assert the migrated DeletionBlocked condition (and reference bookkeeping) is
  persisted in the OIDC blocking-deletion test, which previously only checked
  the requeue and finalizer retention.
- Use meta.FindStatusCondition in the externalauth PreservesForeignConditions
  test for parity with the OIDC and MCPAuthzConfig sibling guards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The MCPOIDCConfig, MCPExternalAuthConfig, and MCPAuthzConfig controllers
logged validation failures and deletion-blocked transitions but emitted no
Kubernetes Events, so `kubectl describe` could not show why a policy-shaped
config was rejected or which workloads blocked its deletion. Per the operator
Status Condition Parity rule, all three siblings gain an EventRecorder in one
change.

Each reconciler now carries `Recorder events.EventRecorder`, wired in app.go,
and emits: a Warning ConfigInvalid on entering the invalid state, a Warning
DeletionBlocked on entering the blocked state, and a Normal ConfigValid on
recovery. Emission is transition-gated (the prior condition is read before
MutateAndPatchStatus mutates it in place) so steady-state reconciles produce
no event spam. Messages carry only structural validation errors and logical
workload names — never raw policy, secrets, tokens, or internal URLs.

Closes #5496

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label Jun 12, 2026
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.01887% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.86%. Comparing base (5310d73) to head (5e9e04a).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
cmd/thv-operator/app/app.go 0.00% 9 Missing ⚠️
...or/controllers/mcpexternalauthconfig_controller.go 87.50% 1 Missing and 3 partials ⚠️
...v-operator/controllers/mcpoidcconfig_controller.go 90.90% 1 Missing and 2 partials ⚠️
...-operator/controllers/mcpauthzconfig_controller.go 90.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5514      +/-   ##
==========================================
+ Coverage   69.84%   69.86%   +0.02%     
==========================================
  Files         647      648       +1     
  Lines       65796    65873      +77     
==========================================
+ Hits        45957    46024      +67     
- Misses      16494    16504      +10     
  Partials     3345     3345              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 12, 2026
jhrozek
jhrozek previously approved these changes Jun 16, 2026

@jhrozek jhrozek left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The transition-gating is the tricky part here and it's done right — the wasInvalid/wasBlocked snapshot is read before MutateAndPatchStatus mutates conditions in place, so Warnings fire only on entering the bad state and the recovery Normal only on False->True. Verified the two ExternalAuth recovery sites are mutually exclusive, sanitization holds (events carry only Kind/Name, no token URL or secret ref), and the events.k8s.io RBAC markers produce no manifest diff since the aggregated role already grants them.

One minor thing, non-blocking: on the validation-failure path the event is emitted regardless of whether the status patch succeeded, so if the status write keeps failing the ConfigInvalid Warning will re-fire each reconcile. Fine to leave as-is given it's the only operator-visible signal in that case.

@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 16, 2026
Base automatically changed from cburns/migrate-sibling-controllers-mutatepatchstatus to main June 16, 2026 14:52
@ChrisJBurns ChrisJBurns dismissed jhrozek’s stale review June 16, 2026 14:52

The base branch was changed.

ChrisJBurns and others added 3 commits June 16, 2026 19:05
The PreservesForeignConditions tests were named for a guarantee they could not
actually fail on: under the fake client there is no concurrent writer between
Get and Patch, so a regression to a full r.Status().Update would still persist
the pre-seeded foreign condition. Rename them to
ReconcileKeepsExistingForeignCondition, which is what they verify (the
controller folds its own Valid=True into an existing condition set rather than
dropping foreign entries, catching the mutate-outside-the-closure bug).

Add TestMCPOIDCConfigReconciler_ConcurrentForeignConditionSurvivesMergePatch,
which uses WithInterceptorFuncs to inject a foreign condition into the store
between the reconciler's Get and its patch, then drives a reference-only
reconcile. The foreign condition survives only because MutateAndPatchStatus
sends a partial JSON merge-patch (omitting the unchanged conditions array); a
full PUT regression fails the test. This guards the shared
ctrlutil.MutateAndPatchStatus mechanism used by all three config controllers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lers-eventrecorder

# Conflicts:
#	cmd/thv-operator/controllers/mcpexternalauthconfig_controller.go
#	cmd/thv-operator/controllers/mcpexternalauthconfig_controller_test.go
#	cmd/thv-operator/controllers/mcpoidcconfig_controller.go
#	cmd/thv-operator/controllers/mcpoidcconfig_controller_test.go
Emitting the ConfigInvalid Warning event pushed MCPOIDCConfigReconciler.Reconcile
to cyclomatic complexity 16, over the gocyclo threshold of 15. Move the
validation-failure status write and event emission into a handleValidationFailure
helper. Pure refactor — no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 16, 2026
ChrisJBurns and others added 2 commits June 16, 2026 20:19
Addresses review feedback: the ConfigInvalid / DeletionBlocked Warning events
were emitted regardless of whether the MutateAndPatchStatus write succeeded.
Because the transition gate reads the prior condition before the patch, a
status write that keeps failing would never persist the new condition, so the
gate stays open and the Warning re-fires on every reconcile.

Capture the patch error and emit the Warning only when it is nil, so the event
fires once on a real, persisted transition. Applied to the validation-failure
and deletion-blocked paths across all three config controllers for parity; the
OBO setInvalid path already returned on patch error before emitting, and the
recovery Normal events already run only after a successful patch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
codespell flagged "PUTing" as a misspelling of "putting". Reword to "sending a
full PUT of" — same meaning, no false positive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ChrisJBurns

Copy link
Copy Markdown
Collaborator Author

Thanks for the review @jhrozek. Addressed the non-blocking note about the validation-failure event re-firing in 38b69ff3:

on the validation-failure path the event is emitted regardless of whether the status patch succeeded, so if the status write keeps failing the ConfigInvalid Warning will re-fire each reconcile.

The transition gate (!wasInvalid / !wasBlocked) now also requires the MutateAndPatchStatus write to have succeeded (updateErr == nil) before emitting. If the status write keeps failing, the condition never persists, so previously the gate stayed open and the Warning re-fired every reconcile; it now fires once on a real, persisted transition. Applied to both the validation-failure and deletion-blocked paths across all three config controllers for parity. The OBO setInvalid path already returned on patch error before emitting, and the recovery Normal events already run only after a successful patch.

Also rebased onto latest main (resolved the controller conflicts against the squash-merged #5509), fixed the gocyclo regression the events introduced on MCPOIDCConfigReconciler.Reconcile, and reworded a comment that tripped codespell (PUTing).

@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 16, 2026
@ChrisJBurns ChrisJBurns merged commit 141f9a4 into main Jun 16, 2026
45 checks passed
@ChrisJBurns ChrisJBurns deleted the cburns/config-controllers-eventrecorder branch June 16, 2026 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large PR: 600-999 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add EventRecorder to MCPOIDCConfig, MCPExternalAuthConfig, and MCPAuthzConfig controllers

2 participants