You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Refreshed 2026-06-05 — the original body below targets the obsolete v1 tiered-memory system. The opportunity survives in v2 form: the ReAct loop is already append-only but no cache_control breakpoints exist in the Anthropic adapter. See the refreshed assessment in the comments for the current plan of record.
Summary
Audit and optimize context assembly order to maximize LLM provider cache hits, potentially achieving up to 4x cost reduction through prompt caching.
Background: State of the Art
From Philipp Schmid's 5 Practical Tips for Context Engineering:
"Context Ordering Matters: Try to use 'append-only' context, adding new information to the end. This maximizes cache hits reducing cost (4x) and latency."
LLM providers (Anthropic, OpenAI) implement prompt caching where repeated prefixes are cached. If your context window looks like:
Important
Refreshed 2026-06-05 — the original body below targets the obsolete v1 tiered-memory system. The opportunity survives in v2 form: the ReAct loop is already append-only but no
cache_controlbreakpoints exist in the Anthropic adapter. See the refreshed assessment in the comments for the current plan of record.Summary
Audit and optimize context assembly order to maximize LLM provider cache hits, potentially achieving up to 4x cost reduction through prompt caching.
Background: State of the Art
From Philipp Schmid's 5 Practical Tips for Context Engineering:
LLM providers (Anthropic, OpenAI) implement prompt caching where repeated prefixes are cached. If your context window looks like:
And only
[Current Request]changes between calls, the prefix can be cached. But if you reorder or modify earlier sections, the cache is invalidated.Key principle: Static content first, dynamic content last.
Current State in CodeFRAME
The tiered memory system assembles context, but the ordering of that assembly is unclear:
With Claude API's prompt caching (available since late 2024), improper ordering directly impacts costs.
Investigation Tasks
Map current context assembly order
Identify cache-breaking patterns
Implement append-only assembly
static_prefix + append_only_dynamicMeasure cache hit rates
Success Criteria
Cost Impact Estimate
If CodeFRAME averages 50 LLM calls per task with 10K tokens of static context:
References