logical,txnmode: add throughput and latency metrics by DarrylWong · Pull Request #171191 · cockroachdb/cockroach

DarrylWong · 2026-05-29T19:12:03Z

The first two commits are part of #169672. Only the first of those is required here, and it could land in this PR as well, so the other PR is not blocking.

This change wires up the following metrics for txn mode:

logical_replication.events_ingested
logical_replication.events_ingested_by_label
logical_replication.logical_bytes
logical_replication.commit_latency
logical_replication.batch_hist_nanos

DLQ metrics are skipped for now as the DLQ path does not exist yet.

Release note: none
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-61283
Informs: #169872

trunk-io · 2026-05-29T19:12:08Z

Merging to master in this repository is managed by Trunk.

To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.


DarrylWong · 2026-05-29T19:37:19Z

We want to start adding metrics to LDR txn mode, including the existing metrics. Txn mode is implemented as individual subsystems, so importing the original logical package from them would create a dependency cycle. This change extracts the existing metrics struct into its own package.

Release note: None

jeffswenson · 2026-06-01T12:29:57Z

 		if waitingTxn.remainingDeps == 0 {
 			if waitingTxn.EventHorizon.LessEq(a.getGlobalFrontierLocked()) {
 				readyBuffer.AddLast(waitingTxn.Transaction)
+				a.metrics.TxnApplierBlockedTxns.Dec(1)


What do you think of using labeled metrics and modeling the different states of a transaction in the applier as different labels? I think the states are broadly:

ready := can be applied, is waiting for an applier to pick it up

applying := was dispatched to an applier

txn-wait := is waiting on a transaction

horizon-wait := is waiting on the horizon
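As an illustration of the idea, here is a minimal sketch of tracking one count per state, where each state name doubles as the label value. Everything in it (`txnState`, `stateGauges`, `enter`, `transition`, `exit`) is a hypothetical stand-in written for this example and is not the CockroachDB metric API.

```go
package main

import (
	"fmt"
	"sync"
)

// txnState is a hypothetical enum for the applier states listed above.
type txnState int

const (
	stateTxnWait     txnState = iota // waiting on another transaction
	stateHorizonWait                 // waiting on the event horizon
	stateReady                       // can be applied, not yet picked up
	stateApplying                    // dispatched to an applier
	numStates
)

// stateLabels maps each state to the label value it would be exported under.
var stateLabels = [numStates]string{"txn-wait", "horizon-wait", "ready", "applying"}

// stateGauges stands in for a labeled gauge: one count per state.
type stateGauges struct {
	mu     sync.Mutex
	counts [numStates]int64
}

// enter records a transaction entering the pipeline in state s.
func (g *stateGauges) enter(s txnState) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.counts[s]++
}

// transition moves one transaction between states in a single step,
// so the total across all labels does not change.
func (g *stateGauges) transition(from, to txnState) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.counts[from]--
	g.counts[to]++
}

// exit records a transaction leaving the pipeline from state s.
func (g *stateGauges) exit(s txnState) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.counts[s]--
}

// snapshot returns a copy of the counts keyed by label value.
func (g *stateGauges) snapshot() map[string]int64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	out := make(map[string]int64, len(g.counts))
	for s, n := range g.counts {
		out[stateLabels[s]] = n
	}
	return out
}

func main() {
	g := &stateGauges{}
	g.enter(stateTxnWait)
	g.enter(stateTxnWait)
	g.transition(stateTxnWait, stateHorizonWait)
	g.transition(stateHorizonWait, stateReady)
	g.transition(stateReady, stateApplying)
	g.exit(stateApplying)
	fmt.Println(g.snapshot())
}
```

Modeling each change as a single `transition` call keeps the paired decrement and increment together, which is the bookkeeping that separate per-state gauges would otherwise have to keep in sync by hand.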

Neat, I didn't know about metric states.

How would you feel if we had just one metric for ready+applying txns? Ready+applying is easy to calculate, but splitting them up would require some nontrivial refactoring: the readyBuffer is a locally scoped var on the aggregator that we'd have to extract to the applier and then guard with a mutex. My current thinking is that this isn't worth the complexity, because I don't think the split-up metrics are that useful a signal.

It would let you observe the case where we have ready txns but for some reason never hand them off to the applier. Given that the handoff is just a select statement over a channel, this seems unlikely. I'm imagining the more realistic stuck boundary is that we can't apply a txn for some reason and that blocks everything else, i.e. if ready+applying is growing or stalled, it seems very likely that applying is the one that's blocked.

Let me know what you think though, happy to take a stab at lifting it out.

jeffswenson · 2026-06-01T12:31:12Z

 			if waitingTxn.EventHorizon.LessEq(a.getGlobalFrontierLocked()) {
 				readyBuffer.AddLast(waitingTxn.Transaction)
+				a.metrics.TxnApplierBlockedTxns.Dec(1)
+				a.metrics.TxnApplierReadyTxns.Inc(1)


consider: instead of trying to have all of these explicit increments and decrements, we could define a set of child metrics and have a goroutine periodically lock the applier stats and calculate the metrics at that time. I think that might be simpler than trying to keep everything perfectly in sync and it keeps the metric logic out of the critical code path.
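A rough sketch of that suggestion, assuming the applier keeps plain counters behind a mutex: a background goroutine wakes up on a ticker, copies the counters under the lock, and publishes them. The names `applierStats`, `gauges`, `sample`, and `runSampler` are hypothetical, and the atomics stand in for whatever child metrics would actually be exported.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// applierStats is a hypothetical stand-in for state the applier already
// maintains under its own mutex.
type applierStats struct {
	mu                          sync.Mutex
	txnWait, horizonWait, ready int
}

// gauges stands in for the exported child metrics.
type gauges struct {
	txnWait, horizonWait, ready atomic.Int64
}

// sample copies the counters while holding the lock, then publishes them
// after releasing it so the lock is held only briefly.
func sample(s *applierStats, g *gauges) {
	s.mu.Lock()
	t, h, r := s.txnWait, s.horizonWait, s.ready
	s.mu.Unlock()

	g.txnWait.Store(int64(t))
	g.horizonWait.Store(int64(h))
	g.ready.Store(int64(r))
}

// runSampler calls sample once per interval until ctx is cancelled.
func runSampler(ctx context.Context, s *applierStats, g *gauges, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sample(s, g)
		}
	}
}

func main() {
	stats := &applierStats{txnWait: 3, horizonWait: 1, ready: 2}
	g := &gauges{}

	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		runSampler(ctx, stats, g, 10*time.Millisecond)
	}()

	time.Sleep(50 * time.Millisecond)
	cancel()
	<-done
	fmt.Println(g.txnWait.Load(), g.horizonWait.Load(), g.ready.Load())
}
```

The trade-off is that the published values lag by up to one interval, in exchange for removing every `Inc` and `Dec` from the apply path.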

jeffswenson · 2026-06-01T13:02:32Z

 		LabeledScanningRanges: metric.NewExportedGaugeVec(metaLabeledScanningRanges, []string{"label"}),
 		LabeledCatchupRanges:  metric.NewExportedGaugeVec(metaLabeledCatchupRanges, []string{"label"}),
+
+		TxnApplierBlockedTxns: metric.NewGauge(metaTxnApplierBlockedTxns),


The new metrics should also accept the scope label.

jeffswenson · 2026-06-01T13:03:46Z

 				}
 			}
+			a.metrics.AppliedRowUpdates.Inc(int64(txn.applyResult.AppliedRows))
+			if a.metricsLabel != "" {


Can we have the txnwriter track the commit latency and throughput metrics? The applier is probably the most complex part of txnldr, so I would like to keep as much complexity out of it as possible.

Done, but I'll call out that we add one extra loop by putting it on this layer, since we now need to process the results of our apply whereas before we just returned them directly. I think this should be negligible overhead though.
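To make the extra loop concrete, here is a minimal sketch of a writer-level wrapper that times the apply call and then walks the results once to record throughput. Only `AppliedRows` appears in the diff above; `applyResult.Bytes`, `writerMetrics`, `applyFn`, and `instrumentedApply` are hypothetical names for this example. It records only the batch duration, since how commit latency is derived is not shown in this thread, and it is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// applyResult is a hypothetical per-transaction result. AppliedRows mirrors
// the field used in the diff; Bytes is an assumption for illustration.
type applyResult struct {
	AppliedRows int
	Bytes       int
}

// writerMetrics is a stand-in for the real counters and histogram.
type writerMetrics struct {
	eventsIngested int64
	logicalBytes   int64
	batchNanos     []int64 // one sample per successfully applied batch
}

// applyFn is a hypothetical signature for the underlying apply call.
type applyFn func(batch []string) ([]applyResult, error)

// instrumentedApply times the apply call, then makes one pass over the
// results to accumulate throughput before returning them unchanged.
func instrumentedApply(m *writerMetrics, apply applyFn, batch []string) ([]applyResult, error) {
	start := time.Now()
	results, err := apply(batch)
	if err != nil {
		return nil, err
	}
	m.batchNanos = append(m.batchNanos, time.Since(start).Nanoseconds())

	// This is the extra loop: previously the results were returned directly.
	for _, r := range results {
		m.eventsIngested += int64(r.AppliedRows)
		m.logicalBytes += int64(r.Bytes)
	}
	return results, nil
}

func main() {
	m := &writerMetrics{}
	fake := func(batch []string) ([]applyResult, error) {
		out := make([]applyResult, len(batch))
		for i, txn := range batch {
			out[i] = applyResult{AppliedRows: 1, Bytes: len(txn)}
		}
		return out, nil
	}
	if _, err := instrumentedApply(m, fake, []string{"txn-a", "txn-bb"}); err != nil {
		fmt.Println("apply failed:", err)
		return
	}
	fmt.Println(m.eventsIngested, m.logicalBytes, len(m.batchNanos))
}
```

The loop is linear in the number of results in the batch, which is consistent with the expectation that the overhead is negligible next to the apply itself.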

This change adds metrics for observing in-flight transactions in the applier pipeline. The count is broken down into txn-blocked, horizon-blocked, and ready txns. Txn blocked means the transaction cannot be committed yet because it is waiting on a peer transaction to resolve first. Horizon blocked means the transaction's dependencies are resolved but we are waiting for the event horizon to pass. Ready txns is the number of txns that are ready to be committed but haven't been yet, i.e. those on the ready buffer.

Release note: None
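The three categories can be read as a classification over two inputs, sketched below. It follows the checks visible in the diff (`remainingDeps == 0`, then the event horizon compared against the global frontier), but the `classify` function, its string results, and the use of plain `int64` timestamps are assumptions made for this example.

```go
package main

import "fmt"

// classify returns which in-flight bucket a transaction falls into.
// remainingDeps is the number of unresolved peer transactions; eventHorizon
// and frontier are simplified to int64 timestamps for illustration.
func classify(remainingDeps int, eventHorizon, frontier int64) string {
	switch {
	case remainingDeps > 0:
		// Still waiting on at least one peer transaction.
		return "txn-blocked"
	case eventHorizon > frontier:
		// Dependencies resolved, but the frontier has not reached the horizon.
		return "horizon-blocked"
	default:
		// Dependencies resolved and horizon <= frontier: goes on the ready buffer.
		return "ready"
	}
}

func main() {
	fmt.Println(classify(1, 5, 10))  // txn-blocked
	fmt.Println(classify(0, 15, 10)) // horizon-blocked
	fmt.Println(classify(0, 10, 10)) // ready
}
```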

This change wires up the following metrics for txn mode:

1. logical_replication.events_ingested
2. logical_replication.events_ingested_by_label
3. logical_replication.logical_bytes
4. logical_replication.commit_latency
5. logical_replication.batch_hist_nanos

DLQ metrics are skipped for now as the DLQ path does not exist yet.

DarrylWong force-pushed the ldr-txn-wire-up-metrics branch 2 times, most recently from c8eac63 to 8e45b11 Compare May 29, 2026 19:43

DarrylWong changed the title ~~Ldr txn wire up metrics~~ logical,txnmode: add throughput and latency metrics May 29, 2026

DarrylWong force-pushed the ldr-txn-wire-up-metrics branch from 8e45b11 to ed4cca5 Compare May 29, 2026 21:03

DarrylWong requested a review from jeffswenson May 29, 2026 21:04

DarrylWong marked this pull request as ready for review May 29, 2026 21:04

DarrylWong requested review from a team as code owners May 29, 2026 21:04

DarrylWong requested review from DrewKimball and removed request for a team May 29, 2026 21:04

jeffswenson reviewed Jun 1, 2026


DarrylWong mentioned this pull request Jun 1, 2026

logical, txnapply: add in flight txn metrics #169672

Closed

DarrylWong force-pushed the ldr-txn-wire-up-metrics branch 2 times, most recently from 770597a to 8c8c113 Compare June 1, 2026 21:45

DarrylWong added 3 commits June 1, 2026 17:57

fixup: move latency+throughput metrics to transaction writer

7ff99c2

DarrylWong force-pushed the ldr-txn-wire-up-metrics branch from 8c8c113 to 7ff99c2 Compare June 1, 2026 21:58

DarrylWong wants to merge 4 commits into cockroachdb:master from DarrylWong:ldr-txn-wire-up-metrics


3 participants
