research(nightly): semantic-cache — HNSW-backed query result cache for RAG and agent memory#601

Draft
ruvnet wants to merge 4 commits into main from research/nightly/2026-06-23-semantic-cache

@ruvnet ruvnet commented Jun 23, 2026

Summary

Adds nightly RuVector research for semantic-cache (2026-06-23).

  • 13.49× speedup on near-duplicate query workloads (66.6 µs vs 899.2 µs)
  • 100% hit rate on near-duplicate queries, zero false positives on random queries
  • 103 KB cache memory for 200 entries × 128 dims
  • WASM-compatible, zero external dependencies beyond rand
  • SemanticCache trait: clean API for NoCache / FixedSemanticCache / AdaptiveSemanticCache
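
The PR names a SemanticCache trait with three variants but does not show its signature. As a rough sketch of what such an API could look like, the block below assumes hypothetical `lookup` and `insert` methods, a `Vec<u32>` of document IDs as the cached result, and a brute-force scan standing in for the HNSW key index. `FixedThresholdCache` is an illustrative name, not the crate's FixedSemanticCache.

```rust
/// Hypothetical trait shape: the PR names the trait but not its methods.
pub trait SemanticCache {
    /// Return the cached result of the most similar stored query, if any
    /// stored query is similar enough.
    fn lookup(&mut self, query: &[f32]) -> Option<Vec<u32>>;
    /// Store the result that was computed for `query`.
    fn insert(&mut self, query: &[f32], result: Vec<u32>);
}

/// Baseline variant: never stores anything, so every lookup misses.
pub struct NoCache;

impl SemanticCache for NoCache {
    fn lookup(&mut self, _query: &[f32]) -> Option<Vec<u32>> {
        None
    }
    fn insert(&mut self, _query: &[f32], _result: Vec<u32>) {}
}

/// Illustrative fixed-threshold cache. Keys are unit vectors; a linear scan
/// replaces the HNSW index described in the PR (assumption, for brevity).
pub struct FixedThresholdCache {
    threshold: f32,
    entries: Vec<(Vec<f32>, Vec<u32>)>,
}

/// Scale a vector to unit L2 norm (zero vectors are returned unchanged).
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        v.to_vec()
    } else {
        v.iter().map(|x| x / norm).collect()
    }
}

impl FixedThresholdCache {
    pub fn new(threshold: f32) -> Self {
        Self { threshold, entries: Vec::new() }
    }
}

impl SemanticCache for FixedThresholdCache {
    fn lookup(&mut self, query: &[f32]) -> Option<Vec<u32>> {
        let q = normalize(query);
        let mut best: Option<(f32, &Vec<u32>)> = None;
        for (key, value) in &self.entries {
            // On unit vectors the dot product equals cosine similarity.
            let sim: f32 = key.iter().zip(&q).map(|(a, b)| a * b).sum();
            if sim >= self.threshold && best.map_or(true, |(s, _)| sim > s) {
                best = Some((sim, value));
            }
        }
        best.map(|(_, v)| v.clone())
    }

    fn insert(&mut self, query: &[f32], result: Vec<u32>) {
        self.entries.push((normalize(query), result));
    }
}
```

With a threshold of 0.92, a query such as `[0.99, 0.05]` would hit an entry stored under `[1.0, 0.0]`, while an orthogonal query would miss and fall through to the underlying search.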

Deliverables

  1. Working Rust PoC: crates/ruvector-semantic-cache/ with 4 source files, 7 acceptance tests, and a benchmark binary
  2. ADR-268: docs/adr/ADR-268-semantic-cache.md
  3. Research document: docs/research/nightly/2026-06-23-semantic-cache/README.md
  4. Public gist: docs/research/nightly/2026-06-23-semantic-cache/gist.md

Real Benchmark Numbers (cargo run --release)

| Variant | Workload | Hit rate | Mean latency (µs) | Speedup |
|---|---|---|---|---|
| NoCache | near_dup | 0% | 899.2 | 1.00× |
| FixedSemanticCache | near_dup | 100% | 66.6 | 13.49× |
| AdaptiveSemanticCache | near_dup | 13.2% | 1,032.9 | 0.87× |
| FixedSemanticCache | mixed (50% hit) | 50% | 572.4 | 1.55× |

ACCEPTANCE: ALL PASS. Breakeven: ≥ 23% hit rate.
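
The PR does not state how the breakeven figure was derived. One simple way to reason about a breakeven hit rate is a two-cost model: a hit costs `hit_cost`, a miss costs `miss_cost` (the cache probe plus the full search), and the cache pays off once the expected latency drops below the uncached `baseline`. The function names and the model itself are assumptions for illustration, and the model is not claimed to reproduce the 23% figure.

```rust
/// Expected per-query latency with a cache, under a two-cost model
/// (assumption: every hit costs `hit_cost`, every miss costs `miss_cost`).
pub fn expected_latency(hit_rate: f64, hit_cost: f64, miss_cost: f64) -> f64 {
    hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost
}

/// Smallest hit rate at which the cached path is no slower than `baseline`.
/// Solves p * hit_cost + (1 - p) * miss_cost = baseline for p.
/// Returns None when a hit is not cheaper than a miss, since no hit rate
/// can help in that case.
pub fn breakeven_hit_rate(baseline: f64, hit_cost: f64, miss_cost: f64) -> Option<f64> {
    if miss_cost <= hit_cost {
        return None;
    }
    let p = (miss_cost - baseline) / (miss_cost - hit_cost);
    Some(p.clamp(0.0, 1.0))
}
```

For example, with illustrative (not measured) costs of 100 µs per hit, 1,100 µs per miss, and a 900 µs baseline, the model gives a breakeven of (1100 − 900) / (1100 − 100) = 0.20.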

Research doc: docs/research/nightly/2026-06-23-semantic-cache/README.md
ADR: docs/adr/ADR-268-semantic-cache.md

🤖 Generated with claude-flow
https://claude.ai/code/session_01FW9sGTp6EzHqbyxKhvAG49


Generated by Claude Code

claude and others added 4 commits June 23, 2026 07:28
SOTA discovery: QVCache (EuroMLSys 2025), vCache (arXiv:2502.03771),
GPTCache, and 5 other systems indicate that semantic caching is valuable
in production. All are Python-first; the survey found no Rust-native cache
co-designed with HNSW. Selected topic: HNSW-backed semantic query result cache.

Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01FW9sGTp6EzHqbyxKhvAG49
Implements SemanticCache trait with three variants:
- NoCache: pure baseline (always miss)
- FixedSemanticCache: HNSW key index + fixed cosine threshold 0.92
- AdaptiveSemanticCache: HNSW key index + sliding-window percentile threshold
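
The commit does not spell out how the sliding-window percentile threshold is computed. A minimal sketch of one plausible reading: keep the most recent best-match similarities in a bounded window and use a chosen percentile of that window as the hit threshold. The type name, its fields, the nearest-rank percentile rule, and the fallback value are all assumptions.

```rust
use std::collections::VecDeque;

/// Hypothetical adaptive threshold: a percentile over recent similarities.
pub struct AdaptiveThreshold {
    window: VecDeque<f32>,
    capacity: usize,
    percentile: f32, // fraction in [0, 1]
    fallback: f32,   // threshold used before any observation arrives
}

impl AdaptiveThreshold {
    pub fn new(capacity: usize, percentile: f32, fallback: f32) -> Self {
        let capacity = capacity.max(1); // keep at least one slot
        Self {
            window: VecDeque::with_capacity(capacity),
            capacity,
            percentile: percentile.clamp(0.0, 1.0),
            fallback,
        }
    }

    /// Record the best-match similarity of a query, dropping the oldest
    /// observation once the window is full.
    pub fn observe(&mut self, similarity: f32) {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back(similarity);
    }

    /// Current threshold: nearest-rank percentile of the window.
    pub fn current(&self) -> f32 {
        if self.window.is_empty() {
            return self.fallback;
        }
        let mut sorted: Vec<f32> = self.window.iter().copied().collect();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let rank = ((sorted.len() - 1) as f32 * self.percentile).round() as usize;
        sorted[rank]
    }
}
```

A cache built on this would call `observe` after each nearest-key search and treat the query as a hit when its similarity is at least `current()`.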

Internal HNSW (src/hnsw.rs) stores L2-normalized query vectors as cache
keys. Cosine similarity computed from L2-squared distances on unit vectors.
LRU eviction when max_entries exceeded.
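
The conversion from squared L2 distance to cosine similarity relies on the identity ‖a − b‖² = 2 − 2·cos(a, b) for unit vectors, so cos = 1 − d²/2. A small sketch of that conversion follows; the helper names are illustrative and not taken from src/hnsw.rs.

```rust
/// Scale a vector to unit L2 norm (zero vectors are returned unchanged).
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        v.to_vec()
    } else {
        v.iter().map(|x| x / norm).collect()
    }
}

/// Squared Euclidean distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Cosine similarity recovered from the squared L2 distance.
/// Only valid when both inputs were unit-normalized beforehand.
fn cosine_from_l2_squared(d2: f32) -> f32 {
    1.0 - d2 / 2.0
}
```

Under this identity, the fixed cosine threshold of 0.92 corresponds to a squared distance of at most 2 × (1 − 0.92) = 0.16, so a distance-ordered index can apply the threshold without computing a separate dot product.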

All measurements from cargo run --release; no invented numbers.

Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01FW9sGTp6EzHqbyxKhvAG49
Measured with cargo run --release on x86_64 Linux:
- FixedSemanticCache near_dup: 100% hit rate, 66.6 µs mean (13.49× speedup)
- FixedSemanticCache mixed: 50% hit rate, 572.4 µs mean (1.55× speedup)
- NoCache near_dup: 899.2 µs mean (baseline)
- Cache memory: 103.1 KB for 200 entries × 128 dims
- Warmup: 23.4 ms for 200 entries
- Breakeven: >= 23% hit rate for latency benefit
- ACCEPTANCE: ALL PASS

Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01FW9sGTp6EzHqbyxKhvAG49
ADR-268: proposes ruvector-semantic-cache as first-class RuVector
capability with SemanticCache trait, 3 variants, benchmark evidence
(13.49x speedup), failure modes, security considerations, migration path.

Research doc covers:
- 2026 SOTA survey (QVCache, vCache, GPTCache, CacheRAG, Bifrost)
- Forward-looking 2036-2046 thesis (semantic manifolds for agents)
- ruvnet ecosystem fit (agent-memory, lsm-ann, proof-gate, ruFlo, MCP)
- Real benchmark results
- Memory and performance math
- 8 practical + 8 exotic applications
- Production crate layout proposal
- ADR-268

Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01FW9sGTp6EzHqbyxKhvAG49