
fix(sdk,core): head-start handover correctness and continuation boot latency #3907

Merged
ericallam merged 4 commits into main from fix/chat-headstart-hydrate on Jun 11, 2026

Conversation

@ericallam

Member

Summary

Three related fixes for chat.headStart and continuation boots, found while investigating customer reports.

1. chat.headStart now works with hydrateMessages. The turn-0 handover splice only ran on the default accumulation path, so agents registering hydrateMessages silently lost the warm route's step-1 response: pure-text turns fired onTurnComplete with no assistant message (and an empty durable write), tool-call turns re-ran step 1 from scratch under a fresh messageId, and the head-start user message never reached the hydrate hook at all. The first-turn history now reaches hydrateMessages as incomingMessages, and the splice runs after both accumulation branches, deduplicated by the handover messageId.

2. Reasoning parts survive the handover. The synthesized partial only mapped text and tool-call parts, so an extended-thinking model's step-1 reasoning streamed to the browser but never reached durable history. Reasoning parts now map through with provider metadata, so Anthropic thinking signatures survive a UIMessage round trip on hydrate replays.

3. Continuation boots no longer stall for ~10 seconds. The .in resume cursor was found by draining an SSE subscription that only closes after its 5-second inactivity window, and the scan ran twice per boot. The scan is now a non-blocking records read of the latest turn-complete header and runs at most once per boot; the boot reads run concurrently, and chat snapshots carry the cursor so subsequent boots skip the scan entirely. Measured locally on a cancel-then-continue repro: pre-turn continuation latency dropped from ~11s to ~0.5s.
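
The splice described in item 1 can be sketched roughly as follows. This is an illustration only, not the SDK's code: the `Message` shape, `spliceHandover`, and `accumulateFirstTurn` are hypothetical names, and replacing an existing entry in place (rather than skipping the splice) is an assumption, since the PR only states that the splice is deduplicated by the handover messageId.

```typescript
// Hypothetical, simplified message shape; the real UIMessage type has more fields.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; input: unknown };

interface Message {
  id: string;
  role: "user" | "assistant";
  parts: Part[];
}

// Splice the warm route's step-1 partial onto an accumulated chain.
// If a message with the handover id is already present (for example because
// the hydrate hook returned it), replace it in place instead of appending
// a duplicate. Replace-vs-skip is an assumption of this sketch.
function spliceHandover(chain: Message[], partial: Message): Message[] {
  const index = chain.findIndex((m) => m.id === partial.id);
  if (index === -1) {
    return [...chain, partial];
  }
  const next = chain.slice();
  next[index] = partial;
  return next;
}

// Both accumulation branches converge on the same splice: the hydrate hook
// (when registered) receives the first-turn history, then the partial is added.
function accumulateFirstTurn(
  incoming: Message[],
  partial: Message,
  hydrate?: (incomingMessages: Message[]) => Message[]
): Message[] {
  const chain = hydrate ? hydrate(incoming) : incoming;
  return spliceHandover(chain, partial);
}
```

Running the splice after the branch point, rather than inside the default branch, is what keeps the hydrate path from silently dropping the partial.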

Every fix was verified red-green: new unit tests reproduced each failure before the fix, and end-to-end smoke tests against a live local stack covered both handover legs, reasoning persistence with extended thinking (including a follow-up turn that round-trips the persisted signed reasoning back to the provider), and the boot timing comparison.

Rollout

SDK-only; no server change required. A new SDK against a server that does not serialize record headers degrades to the existing no-cursor fallback. Old SDKs ignore the new snapshot field, and new SDKs fall back to the records scan on snapshots written before it existed.
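
The compatibility behaviour above amounts to a three-step fallback chain. A minimal sketch, assuming a hypothetical optional snapshot field named `inCursor` and an injected `scanRecords` function (neither name comes from the SDK):

```typescript
interface ChatSnapshot {
  messages: unknown[];
  // Hypothetical field name; absent on snapshots written before the field existed.
  inCursor?: string;
}

type CursorSource = "snapshot" | "records-scan" | "none";

// Resolve the resume cursor, preferring the cheapest source first.
// `scanRecords` stands in for the non-blocking records read and resolves to
// undefined when no cursor can be determined (for example, a server that does
// not serialize record headers).
async function resolveResumeCursor(
  snapshot: ChatSnapshot | undefined,
  scanRecords: () => Promise<string | undefined>
): Promise<{ cursor: string | undefined; source: CursorSource }> {
  if (snapshot?.inCursor !== undefined) {
    // New snapshot: skip the scan entirely.
    return { cursor: snapshot.inCursor, source: "snapshot" };
  }
  const scanned = await scanRecords();
  return scanned !== undefined
    ? { cursor: scanned, source: "records-scan" }
    : { cursor: undefined, source: "none" }; // existing no-cursor fallback
}
```

Because the field is optional, an old SDK that never reads it and a new SDK that finds it missing both stay on paths that already existed.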

…s is registered

The turn-0 handover splice only ran on the default accumulation path, so
agents registering hydrateMessages lost the warm route's step-1 response:
pure-text turns fired onTurnComplete with no assistant message, tool-call
turns re-ran step 1 from scratch under a fresh messageId, and the
head-start user message never reached the hydrate hook. The first-turn
history now reaches hydrateMessages as incoming messages, and the splice
runs after both accumulation branches, deduped by the handover messageId.

synthesizeHandoverUIMessage only mapped text and tool-call parts, so an
extended-thinking model's step-1 reasoning streamed to the browser but
never reached the durable session history: onTurnComplete, chat.history,
and reloads all lost it. Reasoning parts now map through with provider
metadata so Anthropic thinking signatures survive the UIMessage round
trip on hydrate replays.
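
As an illustration of the reasoning mapping, here is a minimal sketch with simplified, hypothetical part shapes. It is not the body of synthesizeHandoverUIMessage, and the `anthropic.signature` key in the usage note is only an example of provider metadata.

```typescript
// Hypothetical, simplified shapes for illustration.
type ProviderMetadata = Record<string, Record<string, unknown>>;

type StepPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string; providerMetadata?: ProviderMetadata }
  | { type: "tool-call"; toolCallId: string; toolName: string; input: unknown };

type UIPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string; providerMetadata?: ProviderMetadata }
  | { type: "tool"; toolCallId: string; toolName: string; input: unknown };

// Map step-1 parts into UI parts. Before the fix described above, only the
// text and tool-call cases were handled, so reasoning was dropped.
function mapHandoverParts(parts: StepPart[]): UIPart[] {
  return parts.map((part): UIPart => {
    switch (part.type) {
      case "text":
        return { type: "text", text: part.text };
      case "reasoning":
        // Carry provider metadata through untouched so signed reasoning
        // can be replayed to the provider later.
        return part.providerMetadata === undefined
          ? { type: "reasoning", text: part.text }
          : { type: "reasoning", text: part.text, providerMetadata: part.providerMetadata };
      case "tool-call":
        return {
          type: "tool",
          toolCallId: part.toolCallId,
          toolName: part.toolName,
          input: part.input,
        };
    }
  });
}
```

For example, a reasoning part carrying `{ anthropic: { signature: "sig" } }` comes out of `mapHandoverParts` with the same metadata object attached.
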
…on cursor scan

The .in resume cursor was found by draining an SSE subscription that
only closes after its 5-second inactivity window, and the scan ran twice
per continuation boot (once for the replay cursor, once for the
subscribe cursor), stalling every continuation around 10 seconds before
the first turn. The scan is now a non-blocking records read of the
latest turn-complete header and runs at most once per boot; the snapshot
and replay reads run concurrently, and chat snapshots carry the cursor
so subsequent boots skip the scan entirely.
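
The concurrency part of this change can be illustrated with a small sketch. `bootReads` and its two reader parameters are hypothetical stand-ins for the snapshot and replay reads, not SDK functions.

```typescript
interface BootReads<S, R> {
  snapshot: S;
  replay: R;
}

// Start both boot reads before awaiting either, so the boot waits for the
// slower of the two rather than for their sum.
async function bootReads<S, R>(
  readSnapshot: () => Promise<S>,
  readReplay: () => Promise<R>
): Promise<BootReads<S, R>> {
  const [snapshot, replay] = await Promise.all([readSnapshot(), readReplay()]);
  return { snapshot, replay };
}
```

With sequential awaits the two latencies add up; with `Promise.all` both requests are already in flight when the first await happens.
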
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d2a24183-3362-40ae-91f4-df772adeebfe

📥 Commits

Reviewing files that changed from the base of the PR and between 4a88e92 and 9042df6.

📒 Files selected for processing (1)
  • packages/trigger-sdk/src/v3/ai.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/trigger-sdk/src/v3/ai.ts

Walkthrough

This PR optimizes chat agent boot performance by replacing SSE long-poll cursor discovery with non-blocking record reads and concurrent snapshot/replay operations. It persists the .in resume cursor in chat snapshots to eliminate repeated scans on subsequent boots. Additionally, the PR improves head-start handover by preserving reasoning content across extended-thinking model calls and correctly routing messages through hydrateMessages hooks, while ensuring tool-call handovers resume from step 2 rather than re-running step 1.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: three related fixes for head-start handover correctness and continuation boot latency. |
| Description check | ✅ Passed | The description comprehensively covers the three fixes with clear examples, testing approach, and rollout considerations, though it deviates from the template structure by not using the checklist or explicit changelog section. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install timed out. The project may have too many dependencies for the sandbox.


devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

…nsiently

A scan that threw was treated the same as one that found no cursor, so
the resume-cursor block skipped its retry and the live subscription
could replay from the start. Only a successful lookup (including a
genuine no-cursor-yet answer) is shared now; a throw leaves the retry
available.
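
One way to get this behaviour is to share the in-flight lookup and forget it only when it rejects. The sketch below uses hypothetical names (`CursorLookup`, `shareSuccessfulLookup`) and is not the SDK's implementation.

```typescript
// A successful lookup either finds a cursor or genuinely finds none yet.
type CursorLookup = { found: true; cursor: string } | { found: false };

// Wrap a scan so that a successful answer (including "no cursor yet") is
// shared by every later caller, while a thrown error is not cached and the
// next caller triggers a fresh scan.
function shareSuccessfulLookup(
  scan: () => Promise<CursorLookup>
): () => Promise<CursorLookup> {
  let inflight: Promise<CursorLookup> | undefined;
  return () => {
    if (inflight === undefined) {
      inflight = scan().catch((err: unknown) => {
        inflight = undefined; // leave the retry available
        throw err;
      });
    }
    return inflight;
  };
}
```

Keeping "found nothing" distinct from "threw" is the point: the first is an answer worth sharing, the second is not an answer at all.
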
@changeset-bot

changeset-bot Bot commented Jun 11, 2026

🦋 Changeset detected

Latest commit: 9042df6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 25 packages
| Name | Type |
| --- | --- |
| @trigger.dev/sdk | Patch |
| @trigger.dev/core | Patch |
| @trigger.dev/python | Patch |
| @internal/sdk-compat-tests | Patch |
| @trigger.dev/build | Patch |
| trigger.dev | Patch |
| @trigger.dev/plugins | Patch |
| @trigger.dev/redis-worker | Patch |
| @trigger.dev/schema-to-json | Patch |
| @internal/cache | Patch |
| @internal/clickhouse | Patch |
| @internal/llm-model-catalog | Patch |
| @trigger.dev/rbac | Patch |
| @internal/redis | Patch |
| @internal/replication | Patch |
| @internal/run-engine | Patch |
| @internal/schedule-engine | Patch |
| @internal/testcontainers | Patch |
| @internal/tracing | Patch |
| @internal/tsql | Patch |
| @internal/zod-worker | Patch |
| @trigger.dev/react-hooks | Patch |
| @trigger.dev/rsc | Patch |
| @trigger.dev/database | Patch |
| @trigger.dev/otlp-importer | Patch |

@pkg-pr-new

pkg-pr-new Bot commented Jun 11, 2026

@trigger.dev/build

npm i https://pkg.pr.new/@trigger.dev/build@9042df6

trigger.dev

npm i https://pkg.pr.new/trigger.dev@9042df6

@trigger.dev/core

npm i https://pkg.pr.new/@trigger.dev/core@9042df6

@trigger.dev/plugins

npm i https://pkg.pr.new/@trigger.dev/plugins@9042df6

@trigger.dev/python

npm i https://pkg.pr.new/@trigger.dev/python@9042df6

@trigger.dev/react-hooks

npm i https://pkg.pr.new/@trigger.dev/react-hooks@9042df6

@trigger.dev/redis-worker

npm i https://pkg.pr.new/@trigger.dev/redis-worker@9042df6

@trigger.dev/rsc

npm i https://pkg.pr.new/@trigger.dev/rsc@9042df6

@trigger.dev/schema-to-json

npm i https://pkg.pr.new/@trigger.dev/schema-to-json@9042df6

@trigger.dev/sdk

npm i https://pkg.pr.new/@trigger.dev/sdk@9042df6

commit: 9042df6

@ericallam ericallam merged commit 2b6d249 into main Jun 11, 2026
98 of 100 checks passed
@ericallam ericallam deleted the fix/chat-headstart-hydrate branch June 11, 2026 17:48
ericallam pushed a commit that referenced this pull request Jun 12, 2026
## Summary
7 improvements, 1 bug fix.

## Improvements
- `trigger init` now sets up your AI coding assistant as part of project
setup: pick the MCP server, the agent skills, or both, then scaffold
with the CLI or hand off to your assistant. Adds a new `getting-started`
agent skill that teaches assistants how to bootstrap Trigger.dev
(install the SDK, write `trigger.config.ts`, create a first task, run
`trigger dev`), so the AI-driven setup path works end to end. It ships
in the CLI alongside the existing skills, version-matched to your SDK.
([#3872](#3872))
- `dev` and `deploy` now fail with a clear error when two tasks are
defined with the same id, including across different task types (e.g. a
scheduled task and a regular task sharing an id). Previously the second
definition silently overwrote the first, so one of the tasks would
vanish with no warning. Task ids are detected as duplicates during
indexing (naming each offending id and the files it was found in), and
the same rule is enforced server-side when the background worker is
registered.
([#3865](#3865))
- `trigger skills` installs Trigger.dev agent skills into your coding
agent so it knows how to write tasks, schedules, realtime, and
chat.agent code. The skills ship with the CLI and are copied into each
tool's native skills directory (Claude Code, Cursor, GitHub Copilot, and
Codex / AGENTS.md), and `trigger dev` offers to install them on first
run. ([#3868](#3868))
- Reliability fixes for `chat.agent`. A user message sent while the
agent is streaming is no longer delivered twice (which could run a
duplicate turn), input appends now carry an idempotency key so a retried
send can't duplicate a message, stopping a generation clears the
streaming state so a page reload doesn't replay the stopped turn, and
runs can now carry the full set of dashboard tags instead of being
silently truncated. `onTurnComplete` now fires on errored turns (with
the thrown error attached) and the failed turn's user message is
persisted so it isn't lost on the next run. Custom agents and manual
`chat.writeTurnComplete` callers now trim the output stream, sending a
custom action no longer leaves a second stream reader running, and a
long-lived `watch` subscription no longer grows its dedupe set without
bound. ([#3891](#3891))
- Continuation chat boots no longer stall for around 10 seconds before
the first turn. The `session.in` resume cursor is now found with a
non-blocking records read instead of draining an SSE long-poll (which
always waited out its full 5 second inactivity window, twice per boot),
the boot reads run concurrently, and chat snapshots carry the cursor so
subsequent boots skip the scan entirely.
([#3907](#3907))
- Record client-side dequeue API latency in the supervisor consumer pool
as a Prometheus histogram
(`queue_consumer_pool_dequeue_duration_seconds`, labelled by `outcome`:
success/empty/error).
([#3887](#3887))
- Add `GetProjectEnvironmentsResponseBody` and `ProjectEnvironment`
schemas for the new `GET /api/v1/projects/{projectRef}/environments`
endpoint, which lists the parent environments (dev, staging, preview,
prod) a personal access token can access for a project. Dev is scoped to
the token owner and branch (preview child) environments are excluded.
([#3880](#3880))

## Bug fixes
- Fix two `chat.createSession()` bugs: stopping a generation no longer
wedges the run (the turn loop raced a `totalUsage` promise that never
settles after a stop-abort), and continuation runs now wait for the next
message instead of invoking the model with an empty prompt.
([#3920](#3920))

<details>
<summary>Raw changeset output</summary>

⚠️⚠️⚠️⚠️⚠️⚠️

`main` is currently in **pre mode** so this branch has prereleases
rather than normal releases. If you want to exit prereleases, run
`changeset pre exit` on `main`.

⚠️⚠️⚠️⚠️⚠️⚠️

# Releases
## @trigger.dev/build@4.5.0-rc.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.5.0-rc.6`

## trigger.dev@4.5.0-rc.6

### Patch Changes

- `trigger init` now sets up your AI coding assistant as part of project
setup: pick the MCP server, the agent skills, or both, then scaffold
with the CLI or hand off to your assistant. Adds a new `getting-started`
agent skill that teaches assistants how to bootstrap Trigger.dev
(install the SDK, write `trigger.config.ts`, create a first task, run
`trigger dev`), so the AI-driven setup path works end to end. It ships
in the CLI alongside the existing skills, version-matched to your SDK.
([#3872](#3872))

- `dev` and `deploy` now fail with a clear error when two tasks are
defined with the same id, including across different task types (e.g. a
scheduled task and a regular task sharing an id). Previously the second
definition silently overwrote the first, so one of the tasks would
vanish with no warning. Task ids are detected as duplicates during
indexing (naming each offending id and the files it was found in), and
the same rule is enforced server-side when the background worker is
registered.
([#3865](#3865))

- `trigger skills` installs Trigger.dev agent skills into your coding
agent so it knows how to write tasks, schedules, realtime, and
chat.agent code. The skills ship with the CLI and are copied into each
tool's native skills directory (Claude Code, Cursor, GitHub Copilot, and
Codex / AGENTS.md), and `trigger dev` offers to install them on first
run. ([#3868](#3868))

    ```bash
    trigger skills --target claude-code
    ```

Replaces the previous `install-rules` command, which stays as an alias.

-   Updated dependencies:
    -   `@trigger.dev/core@4.5.0-rc.6`
    -   `@trigger.dev/build@4.5.0-rc.6`
    -   `@trigger.dev/schema-to-json@4.5.0-rc.6`

## @trigger.dev/core@4.5.0-rc.6

### Patch Changes

- Reliability fixes for `chat.agent`. A user message sent while the
agent is streaming is no longer delivered twice (which could run a
duplicate turn), input appends now carry an idempotency key so a retried
send can't duplicate a message, stopping a generation clears the
streaming state so a page reload doesn't replay the stopped turn, and
runs can now carry the full set of dashboard tags instead of being
silently truncated. `onTurnComplete` now fires on errored turns (with
the thrown error attached) and the failed turn's user message is
persisted so it isn't lost on the next run. Custom agents and manual
`chat.writeTurnComplete` callers now trim the output stream, sending a
custom action no longer leaves a second stream reader running, and a
long-lived `watch` subscription no longer grows its dedupe set without
bound. ([#3891](#3891))
- Continuation chat boots no longer stall for around 10 seconds before
the first turn. The `session.in` resume cursor is now found with a
non-blocking records read instead of draining an SSE long-poll (which
always waited out its full 5 second inactivity window, twice per boot),
the boot reads run concurrently, and chat snapshots carry the cursor so
subsequent boots skip the scan entirely.
([#3907](#3907))
- Record client-side dequeue API latency in the supervisor consumer pool
as a Prometheus histogram
(`queue_consumer_pool_dequeue_duration_seconds`, labelled by `outcome`:
success/empty/error).
([#3887](#3887))
- `dev` and `deploy` now fail with a clear error when two tasks are
defined with the same id, including across different task types (e.g. a
scheduled task and a regular task sharing an id). Previously the second
definition silently overwrote the first, so one of the tasks would
vanish with no warning. Task ids are detected as duplicates during
indexing (naming each offending id and the files it was found in), and
the same rule is enforced server-side when the background worker is
registered.
([#3865](#3865))
- Add `GetProjectEnvironmentsResponseBody` and `ProjectEnvironment`
schemas for the new `GET /api/v1/projects/{projectRef}/environments`
endpoint, which lists the parent environments (dev, staging, preview,
prod) a personal access token can access for a project. Dev is scoped to
the token owner and branch (preview child) environments are excluded.
([#3880](#3880))

## @trigger.dev/python@4.5.0-rc.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/sdk@4.5.0-rc.6`
    -   `@trigger.dev/core@4.5.0-rc.6`
    -   `@trigger.dev/build@4.5.0-rc.6`

## @trigger.dev/react-hooks@4.5.0-rc.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/redis-worker@4.5.0-rc.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/rsc@4.5.0-rc.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/schema-to-json@4.5.0-rc.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/sdk@4.5.0-rc.6

### Patch Changes

- Reliability fixes for `chat.agent`. A user message sent while the
agent is streaming is no longer delivered twice (which could run a
duplicate turn), input appends now carry an idempotency key so a retried
send can't duplicate a message, stopping a generation clears the
streaming state so a page reload doesn't replay the stopped turn, and
runs can now carry the full set of dashboard tags instead of being
silently truncated. `onTurnComplete` now fires on errored turns (with
the thrown error attached) and the failed turn's user message is
persisted so it isn't lost on the next run. Custom agents and manual
`chat.writeTurnComplete` callers now trim the output stream, sending a
custom action no longer leaves a second stream reader running, and a
long-lived `watch` subscription no longer grows its dedupe set without
bound. ([#3891](#3891))
- Continuation chat boots no longer stall for around 10 seconds before
the first turn. The `session.in` resume cursor is now found with a
non-blocking records read instead of draining an SSE long-poll (which
always waited out its full 5 second inactivity window, twice per boot),
the boot reads run concurrently, and chat snapshots carry the cursor so
subsequent boots skip the scan entirely.
([#3907](#3907))
- Fix `chat.headStart` when `hydrateMessages` is registered. The warm
route's step-1 partial now reaches the agent's accumulator on the
hydrate path, so `onTurnComplete` carries the full first turn (the
head-start user message included), tool-call handovers resume from step
2 instead of re-running step 1, and the assistant `messageId` stays
stable across the handover.
([#3907](#3907))
- Preserve reasoning parts across the `chat.headStart` handover.
Extended-thinking models' step-1 reasoning now lands in the durable
session history (and `onTurnComplete`) under the same assistant
`messageId`, with provider metadata intact so Anthropic thinking
signatures survive replays.
([#3907](#3907))
- Fix two `chat.createSession()` bugs: stopping a generation no longer
wedges the run (the turn loop raced a `totalUsage` promise that never
settles after a stop-abort), and continuation runs now wait for the next
message instead of invoking the model with an empty prompt.
([#3920](#3920))
-   Updated dependencies:
    -   `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/plugins@4.5.0-rc.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.5.0-rc.6`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
pull Bot pushed a commit to Dustin4444/trigger.dev that referenced this pull request Jun 12, 2026
… page (triggerdotdev#3908)

## Summary

Two documentation improvements for the AI chat docs.

**Head-start persistence contract.** The fast starts page now documents
what your hooks can rely on across a head-start handover: one stable
assistant `messageId` for the whole turn, `onTurnComplete` as the
canonical persistence point, reasoning parts flowing into durable
history, and how Head Start composes with `hydrateMessages` (the
first-turn history arrives as `incomingMessages`, and the runtime
splices the warm partial onto the hydrated chain, deduplicated by id).
The hydrate examples on the lifecycle hooks and database persistence
pages now upsert their conversation row, since head-start first turns
run without a preload to create it.

**Sessions page.** The page opened with "a durable, task-bound,
bi-directional I/O channel pair", which reads as jargon and omitted run
orchestration entirely. It now leads with the plain mental model (a pair
of durable streams: input carries user messages, output carries
everything the agent produces) plus the Session's role orchestrating
runs, a diagram, a minimal runnable example, and a section on the
one-session-many-runs lifecycle.

Documents behavior shipping in
[triggerdotdev#3907](triggerdotdev#3907).