fix: eliminate race condition in crash_after_dequeue_before_append_completion test by pinodeca · Pull Request #20 · microsoft/duroxide

pinodeca · 2026-05-18T14:29:56Z

This PR eliminates flaky external duplicate event tests by splitting the logic into two distinct, deterministic test cases: one for same-scoop behavior and one for different-scoop-after-terminal behavior.

Problem

Previously, duplicate external events could be fetched in either a single dispatcher scoop or across separate scoops. The old live-runtime enqueue test was non-deterministic because it could observe either 1 or 2 ExternalEvents depending on timing.

Approach

external_duplicate_workitems_same_scoop_materializes_both: This test forces the same scoop by waiting for the subscription to be active, shutting down the runtime, enqueueing both duplicate events while it is down, and then restarting. It expects 2 ExternalEvents to be materialized.
crash_after_dequeue_before_append_completion_different_scoop_after_terminal: This case simulates a race where the first event is fetched/locked but not yet acknowledged. It waits for the lock to expire, restarts the runtime with only the recovered row, waits for completion, enqueues a late duplicate after the terminal state is reached, waits for the queue to drain, and expects only 1 ExternalEvent.

Tests

Verified with:

cargo nt --test reliability_tests

…mpletion test The test had a timing race: the runtime could dequeue and process the first ExternalRaised (completing the orchestration) before the second was enqueued. When the second arrived in a subsequent batch, the orchestration was already terminal, so the dispatcher acked it without materializing — resulting in 1 ExternalEvent instead of the expected 2. Fix: shut down the runtime after the subscription is confirmed, enqueue both duplicates while the dispatcher is stopped, then restart. This guarantees both land in the same batch, faithfully simulating the real crash window (where lock expiry makes both copies visible simultaneously).

pinodeca

Here is GPT-5.5 assessment of the 1st commit/attempt to fix the race condition (by Opus 4.6). Subsequently, GPT-5.5 was used for the second attempt (2nd commit in this PR).

The current fix makes the test deterministic, but it no longer exercises the crash window named by the test. By shutting the runtime down before either duplicate ExternalRaised message is enqueued, the test only verifies that two never-dequeued visible messages are fetched in the same batch. That overlaps with the existing duplicate external work-item coverage and would not catch a regression in recovery of a previously dequeued-but-unacked row after lock expiry.

The test should instead stop the runtime after the subscription is recorded, enqueue one ExternalRaised message, directly fetch it through the provider with a short orchestration lock timeout without acking it, wait for the lock to expire, enqueue the duplicate/new copy, then restart the runtime. That models crash-after-dequeue-before-append: one expired locked row plus one fresh visible row must be fetched together and both materialized before replay delivers only one to the pending subscription.

pinodeca commented May 18, 2026

View reviewed changes

pinodeca added 3 commits May 18, 2026 15:19

test: simulate crash-before-append via lock expiry

0850783

test: clarify crash recovery scenario comments

26c6c67

test: split external event duplicate race cases

a554c31

pinodeca force-pushed the pinodeca/fix-race branch from 2bfd447 to a554c31 Compare May 18, 2026 22:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: eliminate race condition in crash_after_dequeue_before_append_completion test#20

fix: eliminate race condition in crash_after_dequeue_before_append_completion test#20
pinodeca wants to merge 4 commits into
mainfrom
pinodeca/fix-race

pinodeca commented May 18, 2026 •

edited

Loading

Uh oh!

pinodeca left a comment •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pinodeca commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Approach

Tests

Uh oh!

pinodeca left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pinodeca commented May 18, 2026 •

edited

Loading

pinodeca left a comment •

edited

Loading