Lightspeed is a powerful agent harness built around a deterministic core and data structures designed to run inside durable workflow engines. Temporal is fully supported today; support for others (Restate, Inngest, Hatchet, AWS Step Functions, etc.) is coming soon. The core is written in Rust. The production data backend is Postgres, with optional S3.
Frontier agent harnesses like Claude Code, Codex, OpenCode, and OpenClaw are designed to run inside a guest OS and need an entire OS process for themselves. This makes these agents difficult to scale and secure.
There's an emerging pattern to "separate the harness from compute" for security and, in part, for scale. A related pattern is to run agents inside workflow engines, for durability and easier scaling. This is especially interesting for agents running in enterprise settings.
However, most agent SDKs are not designed for workflow engines: they do not separate the deterministic core from effects such as LLM or tool calls, and they pass too much data between the core workflow logic and the effectful "tasks" or "activities" (e.g. passing the entire chat history back and forth). Because workflow runtimes log and store this traffic, it bloats their state and history.
The goal of Lightspeed is to build an agent as powerful as Claude Code, Codex, or OpenClaw that runs outside an operating system, thus separating the harness from compute, while also making it tenable for workflow engines. This unlocks very scalable agent architectures: think thousands of agents that run for months.
We also acknowledge that the current generation of frontier models is optimized to the hilt (via RL) to accomplish most tasks under the assumption that they fully control a POSIX-compatible OS. Just giving the agent access to some MCP servers or provider-native tools will therefore not yield the same results as giving it full access to an OS. A central goal is to bridge that gap with features that let the agent use or borrow sandboxes, permanent VMs, or other computers.
What constitutes an "agent harness" is a rapidly expanding set of table-stakes features. Here is where we currently stand:
- Broad frontier model support for OpenAI and Anthropic: native compaction, reasoning traces, advanced tool configurations and provider-native tools, MCP, files, images, provider OAuth login, multiple API keys, etc.
- Other model support via the "Completion API" standard (in progress)
- Long-lasting and durable agent runs (weeks to months)
- Sandboxes, including delegating work to standard coding agents inside sandboxes (in progress)
- Virtual file system that allows the agent to use standard file tools (read, glob, patch, etc.) without needing a full operating system attached
- Skills hosted on a virtual file system or inside sandboxes
- Flexible prompt and instruction configuration features
- Hosted MCP, including various authentication methods such as API keys and OAuth flows
- Sub-agents (aka "fleets"), letting agents start or manage other agents (planned)
- Timers, schedules, wake-ups (planned)
- Multi-tenant support (in progress)
- CLI to connect to running agent sessions
- Bridge to various messaging platforms (WhatsApp, Telegram, others coming soon)
At the heart of every agent is a carefully engineered state machine that manages what goes into the context window of the LLM. We start with that core and then layer various systems on top until we have a complete, working agent.
The core engine is implemented as an event-sourced deterministic finite state machine.
Note
The event log discussed here is separate from the event history of Temporal (or any other workflow engine). It refers specifically to the events that constitute an agent's session state. These events are stored in Lightspeed's own Postgres event store.
When a command arrives, it is converted to an event, which is recorded in the event log and then applied to the core state. A "next step decider" then figures out what to do next. If effects need to be issued, the decider outputs a list of effect intents, which are later executed against the LLM providers or tool call surfaces. The results of these effects are sent back to the event log to be recorded and then applied to the FSM, closing the event loop.
flowchart TD
Command["User / API command"] --> Log
subgraph CoreBox["Deterministic Core"]
Log[("Event log")]
Core["Core FSM<br/>replay events -> state<br/>choose next step"]
Intent["Effect intent<br/>LLM, tool, compaction"]
Idle["Idle / complete"]
Log --> Core
Core -->|needs outside work| Intent
Core -->|no work left| Idle
end
Intent --> Runtime["Runtime adapters<br/>perform real I/O"]
Runtime -->|result event| Log
This stack is entirely workflow engine agnostic, and it can be thoroughly tested in isolation by simulating the effect adapters.
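The loop described above can be sketched in a few dozen lines of Rust. This is a minimal illustration, not the actual `crates/engine` API: the type names (`Event`, `Intent`, `State`), the functions (`apply`, `replay`, `decide`, `simulate`, `run_until_idle`), and the scripted adapter behavior are all assumptions made for the example.

```rust
// Hypothetical sketch of an event-sourced core; names are illustrative only.

/// Events recorded in the session log. Payloads are referenced, not embedded.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    UserMessage { blob_ref: String },
    LlmResponded { blob_ref: String, wants_tool: Option<String> },
    ToolReturned { blob_ref: String },
}

/// Effect intents emitted by the decider. The core itself performs no I/O.
#[derive(Debug, Clone, PartialEq)]
enum Intent {
    CallLlm,
    CallTool { name: String },
}

/// Session state, derived purely by folding events.
#[derive(Debug, Default, Clone, PartialEq)]
struct State {
    transcript: Vec<String>, // blob refs in transcript order
    pending_tool: Option<String>,
    awaiting_llm: bool,
}

/// Pure transition function: same state + same event always gives the same result.
fn apply(mut state: State, event: &Event) -> State {
    match event {
        Event::UserMessage { blob_ref } => {
            state.transcript.push(blob_ref.clone());
            state.awaiting_llm = true;
        }
        Event::LlmResponded { blob_ref, wants_tool } => {
            state.transcript.push(blob_ref.clone());
            state.awaiting_llm = false;
            state.pending_tool = wants_tool.clone();
        }
        Event::ToolReturned { blob_ref } => {
            state.transcript.push(blob_ref.clone());
            state.pending_tool = None;
            state.awaiting_llm = true;
        }
    }
    state
}

/// Rebuild the state from scratch by replaying the whole log.
fn replay(log: &[Event]) -> State {
    log.iter().fold(State::default(), apply)
}

/// "Next step decider": an empty list means the session is idle.
fn decide(state: &State) -> Vec<Intent> {
    if let Some(name) = &state.pending_tool {
        vec![Intent::CallTool { name: name.clone() }]
    } else if state.awaiting_llm {
        vec![Intent::CallLlm]
    } else {
        Vec::new()
    }
}

/// Simulated effect adapter: the first LLM turn asks for a tool, later turns do not.
fn simulate(intent: &Intent, turn: usize) -> Event {
    match intent {
        Intent::CallLlm if turn == 0 => Event::LlmResponded {
            blob_ref: "llm-0".to_string(),
            wants_tool: Some("read".to_string()),
        },
        Intent::CallLlm => Event::LlmResponded {
            blob_ref: format!("llm-{turn}"),
            wants_tool: None,
        },
        Intent::CallTool { name } => Event::ToolReturned {
            blob_ref: format!("tool-{name}"),
        },
    }
}

/// Drive the core until the decider reports no work left.
fn run_until_idle(log: &mut Vec<Event>) -> State {
    let mut turn = 0;
    loop {
        let state = replay(log);
        let intents = decide(&state);
        if intents.is_empty() {
            return state;
        }
        for intent in &intents {
            log.push(simulate(intent, turn));
            turn += 1;
        }
    }
}
```

Because `apply` and `decide` are pure, replaying the same log always yields the same state, which is the property that lets a core like this be tested without any real provider. A production core would presumably keep state incrementally rather than replaying on every step; the full replay here only keeps the sketch short.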
The purpose of the deterministic core is to decide what goes into the context window of the next LLM turn, plus the provider API configurations. Anything that does not pertain to this problem needs to live elsewhere. In Lightspeed, we call the history and state of an individual context window a session.
So, what are the things that need to feed into the LLM session?
- Top-level instructions (prompts/system messages)
- Configured tool definitions (including MCP)
- Transcript/message items, which can be split further:
  - Inputs: user messages, business events
  - LLM output items: responses, reasoning traces, tool calls, compaction traces
  - Tool results
  - Actively managed transcript items: skill catalogs, memory subsystem, etc.
- (not in the context window) LLM configurations such as model and reasoning effort
The main challenge is how to balance what goes into the context window each turn, what to retain when compacting the context window (because it is full), and how to do all this with as much LLM caching consistency as possible.
Lightspeed adds the absolute minimum of abstraction over the LLM provider data structures and APIs. Many agent SDKs (e.g. LangChain) convert the provider-specific data into a unified structure and then convert it back when they pass it to the LLM. We, on the other hand, extract only the information that is needed to decide and branch inside the deterministic core. The provider-native data is stored as blobs in content-addressed storage.
Workflow engines differentiate between the deterministic code that expresses the business logic and the code that executes effects such as database calls or API calls, usually called "activities" or "tasks". This introduces an important seam that needs to be carefully managed. Specifically, the data that travels back and forth between workflow and activities needs to be kept to a minimum, because all those transitions are logged and stored (which is part of the magic that makes the workflows "durable").
flowchart TD
Workflow["Durable workflow<br/>records and replays history"]
Workflow -->|small intent<br/>ids + blob refs| Activity["Activity<br/>LLM / tool / I/O"]
Activity -->|small result<br/>status + blob refs| Workflow
Workflow --> History[("Workflow history<br/>must stay small")]
Activity <--> Store[("Blob / CAS store<br/>large context, tool output,<br/>provider-native data")]
Lightspeed solves this by offloading all data that is not directly needed by the workflow logic to a content-addressed storage (CAS) system. The structures that are passed between workflow and activities are extremely thin, keeping workflow state and log size small. So, instead of passing, say, the entire user input message to the LLM activity, we first store it in the CAS and then pass only a reference to the blob. Model outputs travel back the same way.
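To make the "thin reference" idea concrete, here is a small in-memory sketch. Everything in it is hypothetical: `BlobRef`, `MemoryCas`, `LlmIntent`, `LlmResult`, and `llm_activity` are illustrative names, the provider call is replaced by an echo, and the digest uses the standard library's `DefaultHasher` purely as a stand-in for a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Thin reference that crosses the workflow/activity seam instead of the payload.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct BlobRef {
    digest: String,
    len: usize,
}

/// In-memory stand-in for the content-addressed store.
#[derive(Default)]
struct MemoryCas {
    blobs: HashMap<String, Vec<u8>>,
}

impl MemoryCas {
    /// Store bytes under a key derived from their content; identical content dedupes.
    fn put(&mut self, bytes: &[u8]) -> BlobRef {
        // Assumption: DefaultHasher is NOT collision resistant. A real CAS
        // would use a cryptographic digest here.
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        let digest = format!("{:016x}", hasher.finish());
        self.blobs
            .entry(digest.clone())
            .or_insert_with(|| bytes.to_vec());
        BlobRef { digest, len: bytes.len() }
    }

    /// Resolve a reference back to its bytes, if present.
    fn get(&self, blob: &BlobRef) -> Option<&[u8]> {
        self.blobs.get(&blob.digest).map(Vec::as_slice)
    }
}

/// What the workflow hands to the LLM activity: ids and references only.
#[derive(Debug, Clone, PartialEq)]
struct LlmIntent {
    session_id: String,
    input: BlobRef,
}

/// What the activity hands back: again only a reference.
#[derive(Debug, Clone, PartialEq)]
struct LlmResult {
    output: BlobRef,
}

/// Activity side: resolve the input blob, do the work, store the output blob.
fn llm_activity(cas: &mut MemoryCas, intent: &LlmIntent) -> Option<LlmResult> {
    let prompt = cas.get(&intent.input)?.to_vec();
    // Stand-in for a real provider call: echo the prompt back.
    let mut reply = b"echo: ".to_vec();
    reply.extend_from_slice(&prompt);
    Some(LlmResult { output: cas.put(&reply) })
}
```

In this sketch the workflow history would only ever record `LlmIntent` and `LlmResult`, whose size is independent of how large the prompt or the model output is.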
With the above pieces in place, running an agent inside a workflow runtime becomes feasible and pleasant. We just have to put it all together.
flowchart TD
Client["Client / CLI"] --> Gateway["Lightspeed API gateway<br/>temporal-server"]
Gateway --> Temporal["Temporal service<br/>durable workflow engine"]
subgraph Lightspeed["Lightspeed runtime"]
Workflow["Session workflow<br/>temporal-workflow"]
Core["Deterministic core<br/>crates/engine"]
Worker["Worker activities<br/>temporal-server"]
Runtime["Effect adapters<br/>llm-runtime + tools"]
Store[("Session log + CAS<br/>store-pg / store-fs")]
end
Temporal --> Workflow
Workflow --> Core
Core -->|effect intent| Workflow
Workflow --> Worker
Worker --> Runtime
Runtime --> Store
Worker -->|result event refs| Workflow
Workflow --> Store
Runtime --> External["LLM providers<br/>tools / environments"]
The Temporal workflow owns an instance of the deterministic core, aka a "session". It drives the core state machine until it is idle. While the core is not idle, the workflow sends the effect intents via activities to real APIs and services, such as LLM providers. It also logs all events that constitute a session state in a Postgres store (or optionally a file system store, for testing). Small CAS blobs are stored in Postgres; large blobs go to S3 (other blob providers are also supported).
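The small-versus-large blob split can be pictured as a simple size-based routing decision. The sketch below is an assumption-laden illustration: the 64 KiB cut-off, the `BlobBackend` enum, and `choose_backend` are invented for the example, and the real threshold and routing logic are not stated in this document.

```rust
/// Hypothetical cut-off; the actual threshold used by Lightspeed is not documented here.
const INLINE_LIMIT_BYTES: usize = 64 * 1024;

/// Where a CAS blob ends up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlobBackend {
    /// Small blobs live next to the session log in Postgres.
    Postgres,
    /// Large blobs go to S3 or another object store.
    ObjectStore,
}

/// Route a blob by size. Without an object store configured (S3 is optional),
/// everything falls back to Postgres.
fn choose_backend(len: usize, object_store_enabled: bool) -> BlobBackend {
    if object_store_enabled && len > INLINE_LIMIT_BYTES {
        BlobBackend::ObjectStore
    } else {
        BlobBackend::Postgres
    }
}
```

Keeping this decision inside the store layer means neither the workflow nor the deterministic core needs to know where a blob physically lives; both only see the reference.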
Around the main stack, there is also a gateway API and CLI tooling to make interacting with the whole Lightspeed system easier.
Prerequisites:
- Rust toolchain with edition 2024 support (e.g. installed via rustup)
- Docker with Compose for the local Postgres, MinIO, and Temporal stack
- OPENAI_API_KEY for live OpenAI-backed chat and eval runs
- ANTHROPIC_API_KEY for live Anthropic client tests
The easiest approach is to copy .env_example to .env and set provider keys there. The
hosted server worker mode registers real provider adapters and session-mounted
VFS tools; for OpenAI-backed local chat, set OPENAI_API_KEY.
Build and test:
cargo build
cargo test
The hosted path runs three pieces locally:
- Docker infra: Postgres/CAS catalog, MinIO object storage, Temporal.
- temporal-server: registers the Temporal workflow/activities and exposes the public JSON-RPC API over HTTP. Its binary is named server, and it can also run only the worker or only the gateway.
- cli: starts or resumes sessions and submits chat messages through the gateway.
From the repository root:
local/up.sh
This starts Postgres on localhost:15432, MinIO on localhost:29000,
Temporal on localhost:7233, and the Temporal UI on http://localhost:8233.
Each shell that runs Lightspeed commands should load the local environment:
source local/env.sh
Open a first shell:
source local/env.sh
# export OPENAI_API_KEY=... # omit this if it is already in .env
cargo run -p temporal-server
With no subcommand, the server binary runs the gateway and Temporal worker
together in one process. The gateway listens on http://127.0.0.1:18080 by default.
Optional health check:
curl http://127.0.0.1:18080/health
For split deployments, run the two roles separately:
cargo run -p temporal-server -- worker
cargo run -p temporal-server -- gateway
Open another shell:
source local/env.sh
cargo run -p cli -- chat --new
That starts an interactive TUI session. LIGHTSPEED_API_URL is exported by
local/env.sh, so you do not need to pass --api-url.
For OpenAI-backed chat, the CLI sends typed session/run configuration through
the API. Use --model ... on a command, or set LIGHTSPEED_CHAT_MODEL, if you want
a specific model.
To chat with a local directory mounted as a writable CAS-backed VFS workspace:
cargo run -p cli -- chat --new --mount docs/
The CLI snapshots the directory locally, uploads missing blobs, creates a VFS
workspace from that snapshot, mounts it at /workspace, and starts the chat
session with /workspace as the working directory. Use --mount-path to pick
a different VFS mount path.
The cli package builds the lightspeed binary, so installed usage is equivalent:
lightspeed chat --new
To stop the local stack:
local/down.sh
To reset persisted local state while keeping containers available:
local/reset.sh
Default deterministic tests:
cargo test
Ignored live provider tests require API keys and may cost money:
cargo test -p llm-clients -- --ignored
See CONTRIBUTING.md