fix(sdk,core): head-start handover correctness and continuation boot latency#3907
Conversation
…s is registered The turn-0 handover splice only ran on the default accumulation path, so agents registering hydrateMessages lost the warm route's step-1 response: pure-text turns fired onTurnComplete with no assistant message, tool-call turns re-ran step 1 from scratch under a fresh messageId, and the head-start user message never reached the hydrate hook. The first-turn history now reaches hydrateMessages as incoming messages, and the splice runs after both accumulation branches, deduped by the handover messageId.
synthesizeHandoverUIMessage only mapped text and tool-call parts, so an extended-thinking model's step-1 reasoning streamed to the browser but never reached the durable session history: onTurnComplete, chat.history, and reloads all lost it. Reasoning parts now map through with provider metadata so Anthropic thinking signatures survive the UIMessage round trip on hydrate replays.
…on cursor scan The .in resume cursor was found by draining an SSE subscription that only closes after its 5 second inactivity window, and the scan ran twice per continuation boot (once for the replay cursor, once for the subscribe cursor), stalling every continuation around 10 seconds before the first turn. The scan is now a non-blocking records read of the latest turn-complete header, runs at most once per boot, the snapshot and replay reads run concurrently, and chat snapshots carry the cursor so subsequent boots skip the scan entirely.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (38)
WalkthroughThis PR optimizes chat agent boot performance by replacing SSE long-poll cursor discovery with non-blocking record reads and concurrent snapshot/replay operations. It persists the 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install timed out. The project may have too many dependencies for the sandbox. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…nsiently A scan that threw was treated the same as one that found no cursor, so the resume-cursor block skipped its retry and the live subscription could replay from the start. Only a successful lookup (including a genuine no-cursor-yet answer) is shared now; a throw leaves the retry available.
🦋 Changeset detectedLatest commit: 9042df6 The changes in this PR will be included in the next version bump. This PR includes changesets to release 25 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/plugins
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
commit: |
## Summary 7 improvements, 1 bug fix. ## Improvements - `trigger init` now sets up your AI coding assistant as part of project setup: pick the MCP server, the agent skills, or both, then scaffold with the CLI or hand off to your assistant. Adds a new `getting-started` agent skill that teaches assistants how to bootstrap Trigger.dev (install the SDK, write `trigger.config.ts`, create a first task, run `trigger dev`), so the AI-driven setup path works end to end. It ships in the CLI alongside the existing skills, version-matched to your SDK. ([#3872](#3872)) - `dev` and `deploy` now fail with a clear error when two tasks are defined with the same id, including across different task types (e.g. a scheduled task and a regular task sharing an id). Previously the second definition silently overwrote the first, so one of the tasks would vanish with no warning. Task ids are detected as duplicates during indexing (naming each offending id and the files it was found in), and the same rule is enforced server-side when the background worker is registered. ([#3865](#3865)) - `trigger skills` installs Trigger.dev agent skills into your coding agent so it knows how to write tasks, schedules, realtime, and chat.agent code. The skills ship with the CLI and are copied into each tool's native skills directory (Claude Code, Cursor, GitHub Copilot, and Codex / AGENTS.md), and `trigger dev` offers to install them on first run. ([#3868](#3868)) - Reliability fixes for `chat.agent`. A user message sent while the agent is streaming is no longer delivered twice (which could run a duplicate turn), input appends now carry an idempotency key so a retried send can't duplicate a message, stopping a generation clears the streaming state so a page reload doesn't replay the stopped turn, and runs can now carry the full set of dashboard tags instead of being silently truncated. `onTurnComplete` now fires on errored turns (with the thrown error attached) and the failed turn's user message is persisted so it isn't lost on the next run. Custom agents and manual `chat.writeTurnComplete` callers now trim the output stream, sending a custom action no longer leaves a second stream reader running, and a long-lived `watch` subscription no longer grows its dedupe set without bound. ([#3891](#3891)) - Continuation chat boots no longer stall for around 10 seconds before the first turn. The `session.in` resume cursor is now found with a non-blocking records read instead of draining an SSE long-poll (which always waited out its full 5 second inactivity window, twice per boot), the boot reads run concurrently, and chat snapshots carry the cursor so subsequent boots skip the scan entirely. ([#3907](#3907)) - Record client-side dequeue API latency in the supervisor consumer pool as a Prometheus histogram (`queue_consumer_pool_dequeue_duration_seconds`, labelled by `outcome`: success/empty/error). ([#3887](#3887)) - Add `GetProjectEnvironmentsResponseBody` and `ProjectEnvironment` schemas for the new `GET /api/v1/projects/{projectRef}/environments` endpoint, which lists the parent environments (dev, staging, preview, prod) a personal access token can access for a project. Dev is scoped to the token owner and branch (preview child) environments are excluded. ([#3880](#3880)) ## Bug fixes - Fix two `chat.createSession()` bugs: stopping a generation no longer wedges the run (the turn loop raced a `totalUsage` promise that never settles after a stop-abort), and continuation runs now wait for the next message instead of invoking the model with an empty prompt. ([#3920](#3920)) <details> <summary>Raw changeset output</summary>⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ `main` is currently in **pre mode** so this branch has prereleases rather than normal releases. If you want to exit prereleases, run `changeset pre exit` on `main`.⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ # Releases ## @trigger.dev/build@4.5.0-rc.6 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.5.0-rc.6` ## trigger.dev@4.5.0-rc.6 ### Patch Changes - `trigger init` now sets up your AI coding assistant as part of project setup: pick the MCP server, the agent skills, or both, then scaffold with the CLI or hand off to your assistant. Adds a new `getting-started` agent skill that teaches assistants how to bootstrap Trigger.dev (install the SDK, write `trigger.config.ts`, create a first task, run `trigger dev`), so the AI-driven setup path works end to end. It ships in the CLI alongside the existing skills, version-matched to your SDK. ([#3872](#3872)) - `dev` and `deploy` now fail with a clear error when two tasks are defined with the same id, including across different task types (e.g. a scheduled task and a regular task sharing an id). Previously the second definition silently overwrote the first, so one of the tasks would vanish with no warning. Task ids are detected as duplicates during indexing (naming each offending id and the files it was found in), and the same rule is enforced server-side when the background worker is registered. ([#3865](#3865)) - `trigger skills` installs Trigger.dev agent skills into your coding agent so it knows how to write tasks, schedules, realtime, and chat.agent code. The skills ship with the CLI and are copied into each tool's native skills directory (Claude Code, Cursor, GitHub Copilot, and Codex / AGENTS.md), and `trigger dev` offers to install them on first run. ([#3868](#3868)) ```bash trigger skills --target claude-code ``` Replaces the previous `install-rules` command, which stays as an alias. - Updated dependencies: - `@trigger.dev/core@4.5.0-rc.6` - `@trigger.dev/build@4.5.0-rc.6` - `@trigger.dev/schema-to-json@4.5.0-rc.6` ## @trigger.dev/core@4.5.0-rc.6 ### Patch Changes - Reliability fixes for `chat.agent`. A user message sent while the agent is streaming is no longer delivered twice (which could run a duplicate turn), input appends now carry an idempotency key so a retried send can't duplicate a message, stopping a generation clears the streaming state so a page reload doesn't replay the stopped turn, and runs can now carry the full set of dashboard tags instead of being silently truncated. `onTurnComplete` now fires on errored turns (with the thrown error attached) and the failed turn's user message is persisted so it isn't lost on the next run. Custom agents and manual `chat.writeTurnComplete` callers now trim the output stream, sending a custom action no longer leaves a second stream reader running, and a long-lived `watch` subscription no longer grows its dedupe set without bound. ([#3891](#3891)) - Continuation chat boots no longer stall for around 10 seconds before the first turn. The `session.in` resume cursor is now found with a non-blocking records read instead of draining an SSE long-poll (which always waited out its full 5 second inactivity window, twice per boot), the boot reads run concurrently, and chat snapshots carry the cursor so subsequent boots skip the scan entirely. ([#3907](#3907)) - Record client-side dequeue API latency in the supervisor consumer pool as a Prometheus histogram (`queue_consumer_pool_dequeue_duration_seconds`, labelled by `outcome`: success/empty/error). ([#3887](#3887)) - `dev` and `deploy` now fail with a clear error when two tasks are defined with the same id, including across different task types (e.g. a scheduled task and a regular task sharing an id). Previously the second definition silently overwrote the first, so one of the tasks would vanish with no warning. Task ids are detected as duplicates during indexing (naming each offending id and the files it was found in), and the same rule is enforced server-side when the background worker is registered. ([#3865](#3865)) - Add `GetProjectEnvironmentsResponseBody` and `ProjectEnvironment` schemas for the new `GET /api/v1/projects/{projectRef}/environments` endpoint, which lists the parent environments (dev, staging, preview, prod) a personal access token can access for a project. Dev is scoped to the token owner and branch (preview child) environments are excluded. ([#3880](#3880)) ## @trigger.dev/python@4.5.0-rc.6 ### Patch Changes - Updated dependencies: - `@trigger.dev/sdk@4.5.0-rc.6` - `@trigger.dev/core@4.5.0-rc.6` - `@trigger.dev/build@4.5.0-rc.6` ## @trigger.dev/react-hooks@4.5.0-rc.6 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.5.0-rc.6` ## @trigger.dev/redis-worker@4.5.0-rc.6 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.5.0-rc.6` ## @trigger.dev/rsc@4.5.0-rc.6 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.5.0-rc.6` ## @trigger.dev/schema-to-json@4.5.0-rc.6 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.5.0-rc.6` ## @trigger.dev/sdk@4.5.0-rc.6 ### Patch Changes - Reliability fixes for `chat.agent`. A user message sent while the agent is streaming is no longer delivered twice (which could run a duplicate turn), input appends now carry an idempotency key so a retried send can't duplicate a message, stopping a generation clears the streaming state so a page reload doesn't replay the stopped turn, and runs can now carry the full set of dashboard tags instead of being silently truncated. `onTurnComplete` now fires on errored turns (with the thrown error attached) and the failed turn's user message is persisted so it isn't lost on the next run. Custom agents and manual `chat.writeTurnComplete` callers now trim the output stream, sending a custom action no longer leaves a second stream reader running, and a long-lived `watch` subscription no longer grows its dedupe set without bound. ([#3891](#3891)) - Continuation chat boots no longer stall for around 10 seconds before the first turn. The `session.in` resume cursor is now found with a non-blocking records read instead of draining an SSE long-poll (which always waited out its full 5 second inactivity window, twice per boot), the boot reads run concurrently, and chat snapshots carry the cursor so subsequent boots skip the scan entirely. ([#3907](#3907)) - Fix `chat.headStart` when `hydrateMessages` is registered. The warm route's step-1 partial now reaches the agent's accumulator on the hydrate path, so `onTurnComplete` carries the full first turn (the head-start user message included), tool-call handovers resume from step 2 instead of re-running step 1, and the assistant `messageId` stays stable across the handover. ([#3907](#3907)) - Preserve reasoning parts across the `chat.headStart` handover. Extended-thinking models' step-1 reasoning now lands in the durable session history (and `onTurnComplete`) under the same assistant `messageId`, with provider metadata intact so Anthropic thinking signatures survive replays. ([#3907](#3907)) - Fix two `chat.createSession()` bugs: stopping a generation no longer wedges the run (the turn loop raced a `totalUsage` promise that never settles after a stop-abort), and continuation runs now wait for the next message instead of invoking the model with an empty prompt. ([#3920](#3920)) - Updated dependencies: - `@trigger.dev/core@4.5.0-rc.6` ## @trigger.dev/plugins@4.5.0-rc.6 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.5.0-rc.6` </details> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
… page (triggerdotdev#3908) ## Summary Two documentation improvements for the AI chat docs. **Head-start persistence contract.** The fast starts page now documents what your hooks can rely on across a head-start handover: one stable assistant `messageId` for the whole turn, `onTurnComplete` as the canonical persistence point, reasoning parts flowing into durable history, and how Head Start composes with `hydrateMessages` (the first-turn history arrives as `incomingMessages`, and the runtime splices the warm partial onto the hydrated chain, deduplicated by id). The hydrate examples on the lifecycle hooks and database persistence pages now upsert their conversation row, since head-start first turns run without a preload to create it. **Sessions page.** The page opened with "a durable, task-bound, bi-directional I/O channel pair", which reads as jargon and omitted run orchestration entirely. It now leads with the plain mental model (a pair of durable streams: input carries user messages, output carries everything the agent produces) plus the Session's role orchestrating runs, a diagram, a minimal runnable example, and a section on the one-session-many-runs lifecycle. Documents behavior shipping in [triggerdotdev#3907](triggerdotdev#3907).
Summary
Three related fixes for
chat.headStartand continuation boots, found while investigating customer reports.1.
chat.headStartnow works withhydrateMessages. The turn-0 handover splice only ran on the default accumulation path, so agents registeringhydrateMessagessilently lost the warm route's step-1 response: pure-text turns firedonTurnCompletewith no assistant message (and an empty durable write), tool-call turns re-ran step 1 from scratch under a freshmessageId, and the head-start user message never reached the hydrate hook at all. The first-turn history now reacheshydrateMessagesasincomingMessages, and the splice runs after both accumulation branches, deduplicated by the handovermessageId.2. Reasoning parts survive the handover. The synthesized partial only mapped text and tool-call parts, so an extended-thinking model's step-1 reasoning streamed to the browser but never reached durable history. Reasoning parts now map through with provider metadata, so Anthropic thinking signatures survive a UIMessage round trip on hydrate replays.
3. Continuation boots no longer stall for ~10 seconds. The
.inresume cursor was found by draining an SSE subscription that only closes after its 5 second inactivity window, and the scan ran twice per boot. It is now a non-blocking records read of the latest turn-complete header, runs at most once per boot, the boot reads run concurrently, and chat snapshots carry the cursor so subsequent boots skip the scan entirely. Measured locally on a cancel-then-continue repro: pre-turn continuation latency dropped from ~11s to ~0.5s.Every fix was verified red-green: new unit tests reproduced each failure before the fix, and end-to-end smoke tests against a live local stack covered both handover legs, reasoning persistence with extended thinking (including a follow-up turn that round-trips the persisted signed reasoning back to the provider), and the boot timing comparison.
Rollout
SDK-only; no server change required. A new SDK against a server that does not serialize record headers degrades to the existing no-cursor fallback. Old SDKs ignore the new snapshot field, and new SDKs fall back to the records scan on snapshots written before it existed.