
Update BanyanDB self-observability (SWIP-15) for the demo env #271

Merged: wu-sheng merged 1 commit into main from update-banyandb-so11y on Jun 11, 2026

wu-sheng (Member)

What

Adopts the SWIP-15 cluster/node/group self-observability model for BanyanDB in the showcase (both Docker and Kubernetes modes), and refreshes the rendered demo.yaml.

This depends on SWIP-15 having merged into OAP (apache/skywalking#13903).

Changes

Versions (Makefile.in)

  • OAP → c9f816a4de: the SWIP-15 redesign of the BanyanDB so11y rules.
  • BanyanDB → c2d925e4e (#1169): adds the queue batch/message metric families that the new instance/endpoint rules read.

Docker single-node (standalone, no FODC proxy)

  • otel-collector-config-banyandb.yaml: scrape banyandb:2121 and inject the identity labels the new MAL rules key on (cluster / container_name / node_role / node_type / pod_name), replacing the obsolete host_name: root[root]. This mirrors the no-FODC static-label pattern of the SWIP-15 e2e test.
  • docker-compose.single-node.yaml: switch the OAP rule-enable from banyandb to banyandb/*. The rules now live in otel-rules/banyandb/ (3 files); a bare banyandb matches no file and makes OAP fail at startup.
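Why the bare name breaks startup can be sketched with glob matching. This is a simplified model, assuming (not confirmed by this PR) that OAP matches each enabled rule name as a glob against rule-file paths relative to otel-rules/ with the .yaml extension stripped, and rejects a name that matches nothing. The file names below are hypothetical stand-ins for the three SWIP-15 files.

```python
from fnmatch import fnmatchcase

# Hypothetical rule files, as paths relative to otel-rules/ without ".yaml".
# The real SWIP-15 file names may differ; the mysql entry is an unrelated
# rule included only to show the pattern stays scoped to banyandb/.
RULE_FILES = [
    "banyandb/banyandb-service",
    "banyandb/banyandb-instance",
    "banyandb/banyandb-endpoint",
    "mysql/mysql-service",
]


def resolve_enabled_rules(patterns: list[str], files: list[str]) -> list[str]:
    """Return the rule files matched by any enabled pattern.

    Raises ValueError when a pattern matches no file, modelling (by
    assumption) the startup failure described above.
    """
    matched: list[str] = []
    for pattern in patterns:
        hits = [f for f in files if fnmatchcase(f, pattern)]
        if not hits:
            raise ValueError(f"enabled rule {pattern!r} matches no rule file")
        # Keep first-seen order and avoid duplicates across patterns.
        matched.extend(h for h in hits if h not in matched)
    return matched
```

Under this model, `resolve_enabled_rules(["banyandb/*"], RULE_FILES)` returns the three banyandb files, while `["banyandb"]` raises, because a directory name alone is not a rule file.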

Kubernetes (BanyanDB cluster + FODC proxy, the documented production model)

  • feature-banyandb-monitor/opentelemetry-config.yaml: scrape the FODC proxy's aggregated /metrics (component=fodc-proxy, port http) and inject only the cluster label; the proxy stamps the other identity labels. This replaces the old per-pod host_name/service_instance_id model. Adds scrape_interval: 10s so the new .rate('PT15S') write/query/error metrics populate (the default 60s is too coarse).
  • values.yaml: make banyandb.cluster.fodc.enabled explicit and expose the proxy http service as ClusterIP (chart default is LoadBalancer).
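The scrape_interval reasoning can be checked with simple arithmetic: count how many scrapes land inside each 15-second window that `.rate('PT15S')` looks at. This is a simplified model that assumes perfectly periodic scrapes starting at t = 0 and ignores jitter; it does not reproduce how OAP's MAL `rate` is actually computed, which may need more than one sample.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for b > 0."""
    return -(-a // b)


def samples_in_window(interval_s: int, window_s: int, start_s: int) -> int:
    """Count scrapes at t = 0, interval, 2*interval, ... inside [start, start + window)."""
    first = ceil_div(start_s, interval_s)
    last = ceil_div(start_s + window_s, interval_s) - 1
    return max(0, last - first + 1)


def window_stats(interval_s: int, window_s: int) -> tuple[int, int, float]:
    """Min/max samples per window and the fraction of empty windows.

    Window starts are taken at every whole second within one scrape period,
    since the pattern repeats after that.
    """
    counts = [samples_in_window(interval_s, window_s, s) for s in range(interval_s)]
    empty = sum(1 for c in counts if c == 0) / len(counts)
    return min(counts), max(counts), empty
```

In this model `window_stats(60, 15)` gives `(0, 1, 0.75)`: three of every four 15 s windows contain no sample at all. `window_stats(10, 15)` gives `(1, 2, 0.0)`: every window holds at least one sample.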

demo.yaml: regenerated. It now renders banyandb-helm 0.6.0 with the FODC proxy and the new images; the previous snapshot was the pre-FODC 0.5.0-rc0 / etcd chart (a year stale), so the diff is large.

Verification

Deployed the full stack to a local kind cluster (BanyanDB cluster hot/warm/cold + liaison + FODC proxy/agents + OAP + otel-collector) and confirmed end-to-end:

  • The FODC proxy's /metrics emits container_name/node_role/node_type/pod_name labels for all nodes.
  • otel collector scrapes the proxy and injects cluster: showcase-banyandb.
  • OAP loads the SWIP-15 banyandb rules (service/instance/endpoint).
  • listServices(layer:"BANYANDB") returns showcase-banyandb; instances resolve as pod@container with role/type; endpoints resolve as storage groups (sw_metricsMinute, sw_metadata, …).
  • Gauge metrics (total_cpu_cores, total_memory_used, reporting_instances) and rate metrics (cluster_write_rate, cluster_query_rate) all populate; cluster_error_rate is 0 (healthy).
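How the five identity labels appear to map onto the entities verified above can be sketched as follows. This is an illustration only, not the SWIP-15 MAL rules. It assumes the service name comes from `cluster` (consistent with showcase-banyandb above) and the instance name is `pod_name@container_name` with role/type attached. `InstanceEntity`, `resolve_instance`, and the sample label values are hypothetical.

```python
from dataclasses import dataclass

# Identity labels named in this PR. In Docker mode the otel collector injects
# all five as static labels; in Kubernetes it injects only "cluster" and the
# FODC proxy stamps the other four.
IDENTITY_LABELS = ("cluster", "container_name", "node_role", "node_type", "pod_name")


@dataclass(frozen=True)
class InstanceEntity:  # hypothetical shape, for illustration only
    service: str
    instance: str
    node_role: str
    node_type: str


def resolve_instance(labels: dict[str, str]) -> InstanceEntity:
    """Build a service/instance pair from one sample's labels (assumed mapping)."""
    missing = [name for name in IDENTITY_LABELS if name not in labels]
    if missing:
        # A sample without its identity labels cannot be attributed to an entity.
        raise KeyError(f"sample lacks identity labels: {missing}")
    return InstanceEntity(
        service=labels["cluster"],
        instance=f"{labels['pod_name']}@{labels['container_name']}",
        node_role=labels["node_role"],
        node_type=labels["node_type"],
    )
```

The same function accepts labels from either deploy mode, which is the point of stamping an identical label set in both: the rules never need to know whether a proxy was involved.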

🤖 Generated with Claude Code

Adopt the SWIP-15 cluster/node/group self-observability model for BanyanDB
in both deploy modes, and refresh the rendered demo.yaml.

Versions (Makefile.in):
- OAP -> c9f816a4de (SWIP-15 #13903: redesigned BanyanDB so11y rules).
- BanyanDB -> c2d925e4e (#1169: adds the queue batch/message metric
  families the new instance/endpoint rules read).

Docker single-node (standalone, no FODC proxy):
- otel-collector-config-banyandb.yaml: scrape banyandb:2121 and inject the
  identity labels the new MAL rules key on (cluster/container_name/node_role/
  node_type/pod_name) instead of the obsolete host_name.
- docker-compose.single-node.yaml: switch the OAP rule-enable from `banyandb`
  to `banyandb/*` (the rules now live in the otel-rules/banyandb/ subdir;
  bare `banyandb` matches no file and fails OAP startup).

Kubernetes (BanyanDB cluster + FODC proxy):
- feature-banyandb-monitor/opentelemetry-config.yaml: scrape the FODC proxy's
  aggregated /metrics (component=fodc-proxy, port http) and inject only the
  `cluster` label; the proxy stamps container_name/node_role/node_type/
  pod_name. Add scrape_interval: 10s so the new .rate('PT15S') write/query/
  error metrics populate (the default 60s is too coarse).
- values.yaml: make banyandb.cluster.fodc.enabled explicit and expose the
  proxy http service as ClusterIP (chart default is LoadBalancer).

Regenerate demo.yaml (now renders banyandb-helm 0.6.0 with the FODC proxy and
the new images; the previous snapshot was the pre-FODC 0.5.0-rc0 / etcd chart).

Verified end-to-end on a local kind cluster (BanyanDB cluster + FODC proxy +
OAP + otel): the BANYANDB-layer service/instance/endpoint entities resolve and
both gauge (cpu/mem) and rate (write/query) metrics populate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
wu-sheng merged commit 8acea24 into main on Jun 11, 2026
wu-sheng deleted the update-banyandb-so11y branch on June 11, 2026 09:26
