Add generic webhook notifier for alerts and reports #9643

Open
dfliess wants to merge 10 commits into rilldata:main from dfliess:feature/webhook-notifier

dfliess commented Jul 2, 2026

Alerts and scheduled reports can currently notify via email and Slack only. Integrating
with anything else — PagerDuty, Opsgenie, n8n/Zapier/Make flows, or an in-house system —
requires abusing Slack incoming webhooks or polling the API.

This PR adds a generic webhook notifier driver, completing the connector-agnostic
Notifiers {connector, properties} design introduced in #4371. Because that groundwork
exists, the change is contained: no runtime proto changes and no new Go dependencies
(delivery uses hashicorp/go-retryablehttp, already a direct dependency; signing is stdlib
crypto/hmac). The only proto change is two additive webhook_urls fields on the admin
API's AlertOptions/ReportOptions for the UI dialogs.

YAML surface (alerts and reports)

notify:
  webhook:
    urls:
      - https://example.com/rill-hook
    connector: my_hook   # optional — named connector instance; defaults to "webhook"

This mirrors the slack block: each webhook block appends one entry to Notifiers, and the
URL list lives inside that entry's properties. The optional connector: key selects a
named connector instance (the reconciler already resolves notifiers by connector name),
which enables per-receiver secrets and headers without new machinery.

Connector config (via connector.<name>.* variables or a connector YAML, never in project
files): signing_secret (secret) and headers (map, e.g. an Authorization header for
receivers behind an API gateway — same pattern as the https driver).

Payload

Versioned envelope, two event types mirroring the Notifier interface:

{
  "id": "unique-per-delivery",
  "type": "alert.status",            // or "report.scheduled"
  "version": 1,
  "timestamp": "2026-07-02T15:04:05Z",
  "data": {
    "display_name": "",
    "execution_time": "",
    "status": "PASS | FAIL | ERROR",
    "is_recover": false,
    "fail_row": { "…": "" },
    "execution_error": "",
    "open_link": "",
    "edit_link": ""
  }
}

report.scheduled carries display_name, report_time, download_format, summary,
open_link and download_link. fail_row is included for parity with the email/Slack
notifiers; ToEmail/ToName are excluded (email-specific, per the #4371 review), and no
unsubscribe link is sent (webhook receivers are anonymous).
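
For receiver authors, the envelope above could be decoded roughly as follows. This is a minimal sketch, not the driver's code: the Go type and function names (Envelope, AlertStatusData, DecodeAlert) are hypothetical, the sample values in main are made up, and only the JSON keys come from the payload shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Envelope mirrors the versioned payload. The Go names are illustrative;
// only the JSON keys are taken from the PR description.
type Envelope struct {
	ID        string          `json:"id"`
	Type      string          `json:"type"` // "alert.status" or "report.scheduled"
	Version   int             `json:"version"`
	Timestamp time.Time       `json:"timestamp"`
	Data      json.RawMessage `json:"data"` // decoded later, based on Type
}

// AlertStatusData holds the "data" object of an alert.status event.
type AlertStatusData struct {
	DisplayName    string         `json:"display_name"`
	ExecutionTime  string         `json:"execution_time"`
	Status         string         `json:"status"` // PASS, FAIL or ERROR
	IsRecover      bool           `json:"is_recover"`
	FailRow        map[string]any `json:"fail_row"`
	ExecutionError string         `json:"execution_error"`
	OpenLink       string         `json:"open_link"`
	EditLink       string         `json:"edit_link"`
}

// DecodeAlert parses the envelope and, for alert.status events, its data.
// For any other event type it returns the envelope with nil data.
func DecodeAlert(body []byte) (*Envelope, *AlertStatusData, error) {
	var env Envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, nil, err
	}
	if env.Version != 1 {
		return nil, nil, fmt.Errorf("unsupported envelope version %d", env.Version)
	}
	if env.Type != "alert.status" {
		return &env, nil, nil
	}
	var data AlertStatusData
	if err := json.Unmarshal(env.Data, &data); err != nil {
		return nil, nil, err
	}
	return &env, &data, nil
}

func main() {
	// Sample body with made-up values.
	body := []byte(`{"id":"abc","type":"alert.status","version":1,` +
		`"timestamp":"2026-07-02T15:04:05Z",` +
		`"data":{"display_name":"Revenue drop","status":"FAIL","is_recover":false}}`)
	env, data, err := DecodeAlert(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(env.ID, env.Type, data.Status, data.DisplayName)
}
```

Keeping data as json.RawMessage lets a receiver branch on type before committing to a schema, which fits an envelope that carries two event types.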

Signing: Standard Webhooks

Signatures follow the Standard Webhooks spec exactly
(webhook-id / webhook-timestamp / webhook-signature headers, HMAC-SHA256 over
{id}.{timestamp}.{body}, whsec_ secret format), so receivers can verify with the
existing libraries in 11+ languages. Implemented with ~15 lines of stdlib code and
unit-tested against the spec's official test vectors. Without a signing_secret the
request is sent unsigned (capability-URL receivers like Zapier/n8n — same spirit as Slack
incoming webhooks working without a bot token). The connector's Ping validates that the
secret is well-formed so misconfigurations surface early.

Delivery semantics

  • go-retryablehttp: 3 attempts, exponential backoff (1s → 4s), 10s per-attempt timeout;
    retries on 5xx/429/network errors, hard-fails on other 4xx. Success = any 2xx.
  • All URLs are always attempted; errors are aggregated with errors.Join and surface as
    the execution's ERROR result. Since executions are the only delivery-visibility
    surface, each per-URL error is self-sufficient (URL + status + attempts).
  • No persistent queue / deferred redelivery — that is a delivery-platform concern, out of
    scope for a notifier driver. The stable id lets receivers deduplicate.
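
The retry and aggregation rules above can be illustrated without any external dependency. The sketch below is a standard-library stand-in, not the go-retryablehttp configuration the PR uses: shouldRetry, post and deliverAll are hypothetical names, backoff and timeouts are omitted, and treating 3xx as a permanent failure is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// shouldRetry mirrors the stated policy: retry on network errors, 429 and
// 5xx; every other non-2xx status is a permanent failure.
func shouldRetry(status int, err error) bool {
	if err != nil {
		return true
	}
	return status == http.StatusTooManyRequests || status >= 500
}

// post is a hypothetical single-attempt sender that returns the HTTP status.
type post func(url string) (int, error)

// deliverAll attempts every URL, never stopping at the first failure, and
// joins the per-URL errors so each one names its URL, outcome and attempts.
func deliverAll(urls []string, send post, maxAttempts int) error {
	var errs []error
	for _, u := range urls {
		var status int
		var err error
		attempts := 0
		for attempts < maxAttempts {
			attempts++
			status, err = send(u)
			if err == nil && status >= 200 && status < 300 {
				break // success: any 2xx
			}
			if !shouldRetry(status, err) {
				break // permanent failure, do not retry
			}
			// A real driver would sleep with exponential backoff here.
		}
		if err != nil {
			errs = append(errs, fmt.Errorf("webhook %s: %w after %d attempt(s)", u, err, attempts))
		} else if status < 200 || status >= 300 {
			errs = append(errs, fmt.Errorf("webhook %s: status %d after %d attempt(s)", u, status, attempts))
		}
	}
	return errors.Join(errs...) // nil when every URL succeeded
}

func main() {
	// Fake sender: the first URL succeeds, the second always returns 503.
	send := func(u string) (int, error) {
		if u == "https://example.com/rill-hook" {
			return http.StatusNoContent, nil
		}
		return http.StatusServiceUnavailable, nil
	}
	err := deliverAll([]string{"https://example.com/rill-hook", "https://example.com/down"}, send, 3)
	fmt.Println(err)
}
```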

UI

Mirrors the Slack pattern: a "Webhook notifications" section (toggle + URL multi-input) in
the alert and report create/edit dialogs, gated on ListNotifierConnectors reporting a
configured webhook connector (with a "not configured" docs hint otherwise), and the
destinations listed on the alert/report detail pages. The connector: key stays YAML-only,
like named Slack instances today. New strings use paraglide messages in en.json; Spanish
translations ship as an inert es.json so this composes cleanly with #9635 in either
merge order.

Drive-by fix

popCurrentExecution in the alert reconciler dereferenced adminMeta unguarded in the
non-email notifier path — with an admin service that doesn't implement alert metadata
(e.g. the test runtime's noop admin, or local development), any non-email notifier
(including Slack today) panics. The email path already guards this; the fix applies the
same guard, sending the notification without open/edit links, matching the documented
intent.
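
The shape of that guard, in a simplified form: the types and the buildNotification function below are hypothetical stand-ins and do not reflect the reconciler's actual structures. They only show the nil check that lets the notification go out without links.

```go
package main

import "fmt"

// AlertMeta stands in for the admin-provided metadata, which may be absent
// (for example with a noop admin service). Hypothetical type.
type AlertMeta struct {
	OpenURL string
	EditURL string
}

// Notification stands in for the message handed to a notifier. Hypothetical type.
type Notification struct {
	DisplayName string
	OpenLink    string
	EditLink    string
}

// buildNotification fills the links only when metadata is available, so a
// nil meta yields a notification without links instead of a panic.
func buildNotification(name string, meta *AlertMeta) Notification {
	n := Notification{DisplayName: name}
	if meta != nil {
		n.OpenLink = meta.OpenURL
		n.EditLink = meta.EditURL
	}
	return n
}

func main() {
	fmt.Printf("%+v\n", buildNotification("my alert", nil))
	fmt.Printf("%+v\n", buildNotification("my alert", &AlertMeta{OpenURL: "https://example.com/open", EditURL: "https://example.com/edit"}))
}
```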

Tested

Unit tests for the driver (signing vectors, retry policy, per-URL error aggregation,
dedupe, unsigned mode, Ping) and the parser (alerts, reports, URL validation, default and
named connectors). Exercised end-to-end on our self-hosted deployment: alert and report
created through the dialogs, alert.status and report.scheduled (ad hoc and cron)
delivered and signature-verified against a local receiver.

Open questions

  1. SSRF hardening — the runtime POSTs to user-supplied URLs. The exposure already
    exists via notify.slack.webhooks, so this is not a regression, but: would you want an
    optional allowed_url_prefixes allowlist (precedent: path_prefixes in the https
    driver) or a private-IP guard in v1?
  2. In-process retries — worst case adds ~45s inside the alert reconcile. Acceptable,
    or would you prefer single-attempt (Slack parity)?
  3. Docs — added as docs/docs/developers/build/connectors/services/webhook.md; should
    the payload JSON schema be published more formally?
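
To make question 1 concrete, here is one possible shape for the two options, as a sketch only. The names isBlockedIP and checkURL are hypothetical, the allowlist is a plain string-prefix match (so prefixes should end in "/" to avoid matching look-alike hosts), and the IP check covers only literal addresses in the URL.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// isBlockedIP reports whether ip is loopback, private, link-local or
// unspecified, i.e. an address a public webhook should not target.
func isBlockedIP(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() || ip.IsUnspecified()
}

// checkURL validates a destination against an optional prefix allowlist and,
// for IP-literal hosts, against the private-IP guard. An empty allowlist
// means "no prefix restriction".
func checkURL(raw string, allowedPrefixes []string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("webhook URL must use http or https")
	}
	if len(allowedPrefixes) > 0 {
		allowed := false
		for _, p := range allowedPrefixes {
			if strings.HasPrefix(raw, p) {
				allowed = true
				break
			}
		}
		if !allowed {
			return fmt.Errorf("webhook URL %q matches no allowed prefix", raw)
		}
	}
	if ip := net.ParseIP(u.Hostname()); ip != nil && isBlockedIP(ip) {
		return fmt.Errorf("webhook URL %q targets a non-public address", raw)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/rill-hook",
		"http://169.254.169.254/latest/meta-data",
		"http://[::1]:8080/hook",
	} {
		fmt.Println(raw, "->", checkURL(raw, nil))
	}
}
```

A parse-time check like this does not cover hostnames that resolve to private addresses. A complete guard would also have to inspect the resolved address at connection time, for example through the Control hook on net.Dialer.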
