
fix: isolate anonymous file statistics cache #22950

Open
kumarUjjawal wants to merge 1 commit into apache:main from kumarUjjawal:fix/project_statistics_bounds

@kumarUjjawal

Contributor

Which issue does this PR close?

Rationale for this change

Within a single session, anonymous file reads can scan the same path with different explicit schemas. The shared file statistics cache was keyed by table/path metadata but did not check that the cached statistics matched the schema they were computed with.

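As a result, a later read with a wider schema could reuse statistics cached for a narrower schema, and projecting those statistics then panicked with an index-out-of-bounds error in `ProjectionExprs::project_statistics`.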
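To make the failure mode concrete, here is a std-only Rust sketch of a cache keyed by path alone. `PathKeyedCache`, `ColumnStats`, and `project` are hypothetical stand-ins, not DataFusion types, and the real cache and `ProjectionExprs::project_statistics` differ. The sketch's `project` uses checked `get`, so it reports the mismatch as `None` where direct indexing would panic.

```rust
use std::collections::HashMap;

/// Hypothetical per-column statistics (stand-in, not DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    null_count: usize,
}

/// Hypothetical shared cache keyed only by file path, NOT by schema.
#[derive(Default)]
struct PathKeyedCache {
    entries: HashMap<String, Vec<ColumnStats>>,
}

impl PathKeyedCache {
    /// Returns cached stats for `path`; computes `num_columns` entries only on a miss.
    fn get_or_compute(&mut self, path: &str, num_columns: usize) -> &Vec<ColumnStats> {
        self.entries
            .entry(path.to_string())
            .or_insert_with(|| vec![ColumnStats { null_count: 0 }; num_columns])
    }
}

/// Checked projection: `None` wherever a plain `stats[i]` would be out of bounds.
fn project(stats: &[ColumnStats], indices: &[usize]) -> Option<Vec<ColumnStats>> {
    indices.iter().map(|&i| stats.get(i).cloned()).collect()
}

fn main() {
    let mut cache = PathKeyedCache::default();
    // First read: the file's physical schema has 2 columns.
    let narrow = cache.get_or_compute("data.parquet", 2).clone();
    assert_eq!(narrow.len(), 2);
    // Second read: same path, wider explicit schema with 3 columns.
    // The path-only key hits the entry computed for 2 columns.
    let reused = cache.get_or_compute("data.parquet", 3).clone();
    assert_eq!(reused.len(), 2);
    // Column index 2 of the wider schema has no cached statistics.
    assert_eq!(project(&reused, &[0, 2]), None);
    println!("reused {} stats, null_count[0] = {}", reused.len(), reused[0].null_count);
}
```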

What changes are included in this PR?

This PR routes statistics for anonymous listing tables through a per-table cache instead of the shared session cache, so each anonymous read computes statistics against its own schema.

Named tables still use the shared session cache, since their table reference gives the cache a stable identity.
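A rough sketch of this routing in plain Rust, with no DataFusion dependency. `ListingTableSketch`, `StatsCache`, and keying the shared cache by `(table name, path)` are hypothetical names and assumptions made for illustration; they are not the types or cache keys this PR actually touches.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical statistics: one value per column of the schema used to compute them.
type Stats = Vec<u64>;

/// Hypothetical session-wide cache, keyed by (table name, file path).
type SharedCache = Arc<Mutex<HashMap<(String, String), Stats>>>;

/// Where a (hypothetical) listing table looks up file statistics.
enum StatsCache {
    /// Named table: the table reference gives cache entries a stable identity.
    Shared { table: String, cache: SharedCache },
    /// Anonymous table: a private cache owned by this table instance only.
    PerTable(Mutex<HashMap<String, Stats>>),
}

struct ListingTableSketch {
    num_columns: usize,
    cache: StatsCache,
}

impl ListingTableSketch {
    fn named(table: &str, num_columns: usize, shared: &SharedCache) -> Self {
        let cache = StatsCache::Shared { table: table.to_string(), cache: Arc::clone(shared) };
        Self { num_columns, cache }
    }

    fn anonymous(num_columns: usize) -> Self {
        Self { num_columns, cache: StatsCache::PerTable(Mutex::new(HashMap::new())) }
    }

    /// Returns statistics for `path`, computing them with this table's schema on a miss.
    fn statistics(&self, path: &str) -> Stats {
        let compute = || vec![0u64; self.num_columns];
        match &self.cache {
            StatsCache::Shared { table, cache } => cache
                .lock()
                .unwrap()
                .entry((table.clone(), path.to_string()))
                .or_insert_with(compute)
                .clone(),
            StatsCache::PerTable(cache) => cache
                .lock()
                .unwrap()
                .entry(path.to_string())
                .or_insert_with(compute)
                .clone(),
        }
    }
}

fn main() {
    // Two anonymous reads of the same path with different schema widths.
    let narrow = ListingTableSketch::anonymous(2);
    let wide = ListingTableSketch::anonymous(3);
    assert_eq!(narrow.statistics("data.parquet").len(), 2);
    // The wider table has its own cache, so it never sees the 2-column entry.
    assert_eq!(wide.statistics("data.parquet").len(), 3);

    // Two handles to the same named table share one session-wide entry.
    let shared: SharedCache = Arc::new(Mutex::new(HashMap::new()));
    let t1 = ListingTableSketch::named("t", 2, &shared);
    let t2 = ListingTableSketch::named("t", 2, &shared);
    t1.statistics("data.parquet");
    t2.statistics("data.parquet");
    assert_eq!(shared.lock().unwrap().len(), 1);
    println!("shared entries: {}", shared.lock().unwrap().len());
}
```

The trade-off in this sketch: anonymous tables give up cross-read reuse in exchange for never seeing statistics computed under a different schema, while named tables keep sharing because their name is a stable key.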

It also adds a regression test that first warms the statistics cache using the file's physical schema, then reads the same Parquet file with a wider explicit schema.
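The test's two steps can be modeled in a few lines of std-only Rust. This is not the actual test, which reads a real Parquet file through DataFusion; `AnonymousScan`, its private `HashMap` cache, and treating columns missing from the file as unknown (`None`) statistics are all assumptions of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one anonymous table: its schema width plus a
/// private (per-table) statistics cache keyed by file path.
struct AnonymousScan {
    schema_width: usize,
    cache: HashMap<String, Vec<Option<u64>>>,
}

impl AnonymousScan {
    fn new(schema_width: usize) -> Self {
        Self { schema_width, cache: HashMap::new() }
    }

    /// Computes (or reuses) one statistic per schema column. Columns the file
    /// does not physically contain get `None` (unknown), an assumption here.
    fn statistics(&mut self, path: &str, physical_columns: usize) -> &[Option<u64>] {
        let width = self.schema_width;
        self.cache.entry(path.to_string()).or_insert_with(|| {
            (0..width)
                .map(|i| if i < physical_columns { Some(0) } else { None })
                .collect()
        })
    }

    /// Projects with direct indexing, so a too-short cached entry would panic here.
    fn project(&mut self, path: &str, physical_columns: usize, indices: &[usize]) -> Vec<Option<u64>> {
        let stats = self.statistics(path, physical_columns);
        indices.iter().map(|&i| stats[i]).collect()
    }
}

fn main() {
    let path = "data.parquet";
    // Step 1: warm statistics using the file's 2-column physical schema.
    let mut physical = AnonymousScan::new(2);
    assert_eq!(physical.project(path, 2, &[0, 1]), vec![Some(0), Some(0)]);
    // Step 2: read the same file with a wider 3-column explicit schema.
    // This table has its own cache, so index 2 is in bounds.
    let mut wider = AnonymousScan::new(3);
    assert_eq!(wider.project(path, 2, &[0, 2]), vec![Some(0), None]);
    println!("wider read projected without panicking");
}
```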

Are these changes tested?

Yes

Are there any user-facing changes?

No API changes.

The github-actions bot added the `core` (Core DataFusion crate) and `catalog` (Related to the catalog crate) labels on Jun 15, 2026.


Development

Successfully merging this pull request may close these issues.

panic: ProjectionExprs::project_statistics index out of bounds
