SDPA tests: select attn output by shape #20285
Summary: Follow-up to D108625761 (pytorch#20283), which fixed `test_sdpa_config`'s ambiguous attention-output selection; this handles the remaining edge cases in the sibling SDPA test functions.

`sdpa_with_kv_cache` returns `[k_cache, v_cache, attn_output]`, where the attention output is `[1, S, Hq, D]` and each KV cache is `[1, Cmax, Hkv, D]`. Selecting the attention output by flat element count is ambiguous whenever `S*Hq == Cmax*Hkv`, because all three tensors then share a `numel`. This is the deterministic `llama1b_prefill` failure that D108625761 fixed in `test_sdpa_config`.

The three other SDPA test functions (`test_sdpa_replay`, `test_sdpa_dynamic_decode`, and `test_sdpa_incache_decode`) still matched by `numel`. No currently configured sequence triggers the collision, but they carry the same latent ambiguity.

This applies the same shape-based selection: match the attention output as the 4-D tensor `[1, S, Hq, D]` and (for replay and dynamic decode) classify the two caches as `[1, Cmax, Hkv, D]`, keeping the per-step k-vs-v identification by content unchanged.

Test-only; no kernel, runtime, or export change.

Authored with Claude Code.

Differential Revision: D108650388
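To illustrate the collision described in the summary, here is a minimal sketch that works on plain shape tuples rather than real tensors. The helper names `select_by_numel` and `select_by_shape` and the dimension values are hypothetical, chosen only so that `S*Hq == Cmax*Hkv`; they are not the test helpers from this PR and not the actual `llama1b_prefill` configuration.

```python
from math import prod


def select_by_numel(shapes, S, Hq, D):
    """Indices of outputs whose element count equals that of [1, S, Hq, D]."""
    target = 1 * S * Hq * D
    return [i for i, shape in enumerate(shapes) if prod(shape) == target]


def select_by_shape(shapes, S, Hq, D):
    """Indices of outputs whose full 4-D shape equals [1, S, Hq, D]."""
    target = (1, S, Hq, D)
    return [i for i, shape in enumerate(shapes) if tuple(shape) == target]


# Hypothetical dimensions picked so that S*Hq == Cmax*Hkv (4*8 == 16*2).
S, Hq, Cmax, Hkv, D = 4, 8, 16, 2, 64

# Output order assumed from the summary: [k_cache, v_cache, attn_output].
output_shapes = [
    (1, Cmax, Hkv, D),  # k_cache
    (1, Cmax, Hkv, D),  # v_cache
    (1, S, Hq, D),      # attn_output
]

by_numel = select_by_numel(output_shapes, S, Hq, D)  # matches all three outputs
by_shape = select_by_shape(output_shapes, S, Hq, D)  # matches only attn_output
```

With these values every output has the same element count, so the `numel` match returns three candidates while the shape match returns exactly one. In this sketch the shape match would itself become ambiguous only if `S == Cmax` and `Hq == Hkv`.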
Dr. CI: artifacts and rendered test results are at hud.pytorch.org/pr/pytorch/executorch/20285. As of commit 2762587 with merge base 99ca02f: 3 new failures, 4 pending.
@JulianCloudNTH has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108650388.