Add unified parse_json runtime type checker (#3285) [draft / proof-of-direction]#4063
Add unified parse_json runtime type checker (#3285) [draft / proof-of-direction]#4063vup903 wants to merge 7 commits into
Conversation
Introduce zarr.core.json_parse.parse_json, a single type-annotation-driven validator that consolidates the scattered per-field parse_* helpers. Handles primitives, Literal, unions/Optional, fixed and variadic tuples, Sequence/list (coerced to tuple), Mapping/dict, and TypedDict, with a bool-vs-int safe primitive check. Adds tests/test_json_parse.py (94 tests) and a changelog fragment. No existing call sites migrated yet; this is the proof-of-direction module. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r-developers#3285) Pilot migration delegating three representative helpers to the unified parse_json validator. Public signatures and return types are unchanged. parse_zarr_format re-wraps parse_json's ValueError/TypeError as MetadataValidationError with the original message to preserve observable behavior. parse_json is imported function-locally to avoid circular imports. Focused suites green: test_common/test_config/test_metadata/test_json_parse = 430 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ns (zarr-developers#3285) Apply ruff check/format to satisfy the Lint CI hook. Keep typing.Union/Optional spellings in tests (with noqa) to cover that origin path alongside X | Y. Rework _parse_typeddict to derive required/optional from get_type_hints(include_extras=True) + __total__ instead of __required_keys__, so class-syntax NotRequired is detected even when 'from __future__ import annotations' stringizes the hints (as in zarr's metadata modules). test_json_parse + test_common + test_metadata = 393 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lopers#3285) get_origin(hint) is Required/NotRequired tripped mypy's comparison-overlap and unreachable checks; typing origin as Any keeps the runtime check while satisfying mypy. Full 'uv run --frozen mypy' is clean (190 files); ruff check/format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Delegate the literal/type check of parse_indexing_order, parse_node_type, parse_node_type_array, parse_separator, two parse_zarr_format variants (group, v2), and parse_name to the unified parse_json. Public signatures and return types unchanged; each preserves its original exception type and message (wrapping parse_json's ValueError/TypeError into MetadataValidationError / NodeTypeValidationError / the original ValueError/TypeError where tests assert on them). parse_json imported function-locally to avoid circular imports. Focused suites + full mypy green (1196 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s#3285) Delegate the int/bool type check of parse_checksum, parse_clevel, parse_blocksize, parse_typesize, parse_gzip_level, and parse_zstd_level to parse_json, keeping each helper's range/bound check and exact error messages. Original exception types are preserved by wrapping parse_json's ValueError/TypeError. Note: parse_json rejects bool where the old isinstance(data, int) accepted it; verified no caller/test passes a bool to these, so this is a deliberate, more-correct strictening. parse_json imported function-locally. Codec suite + full mypy green (794 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4063 +/- ##
==========================================
- Coverage 93.50% 93.49% -0.01%
==========================================
Files 90 91 +1
Lines 11979 12132 +153
==========================================
+ Hits 11201 11343 +142
- Misses 778 789 +11
🚀 New features to boost your workflow:
|
|
Hi @d-v-b, I put up a draft so you can see where I'm heading before I take it further. So far it adds On the de-dup side, I've moved 16 of the scattered Before I migrate the rest, a few things I'd like your read on:
I'm holding off on the One heads up on CI: the failing Test jobs are a pre-existing collection error in Let me know what you think on the four questions and I'll keep going. |
|
@vup903, if we are willing to add |
|
nearly all of the types we need are now defined in zarr-metadata, so we should investigate if msgspec can handle all of those types correctly. zarr-metadata itself uses pydantic for its test suite, and I think i ended up choosing that after tryng msgspec and running into some issues, but I don't recall what they were. Maybe something to do with tuples? If its helpful, we could widen the JSON collection types in zarr-metadata to |
Summary
This is a draft / proof-of-direction PR for #3285, opened to show the intended approach and get early feedback before migrating all call sites — per @d-v-b's request in the issue thread.
It introduces a single unified runtime type checker,
parse_json(value, type_annotation), that consolidates the scattered per-fieldparse_*helpers (there are currently ~42 of them acrosssrc/zarr). The function validates a JSON-decoded value against a type annotation and returns data assignable to that annotation (coercing where sensible, e.g.parse_json([1, 2, 3], tuple[int, int, int]) -> (1, 2, 3)), or raises a clear error.What's in this draft
src/zarr/core/json_parse.py—parse_jsonplus sub-parsers, scoped to the JSON-relevant categories named in the issue:None,str,int,float,bool(with a bool-vs-int-safe check:Trueis not accepted forint, and1is not accepted forbool)Literal[...]OptionaltupleSequence[T]/list[T]→ returned as a tupleMapping[str, T]/dict[str, T]TypedDicttests/test_json_parse.py— 94 tests covering every category, incl. the bool/int edge cases, coercion, unions, nested TypedDict.parse_json, with public signatures and behavior unchanged:core/common.py::parse_order→parse_json(data, Literal["C", "F"])core/common.py::parse_bool→parse_json(data, bool)core/metadata/v3.py::parse_zarr_format→parse_json(data, Literal[3]), re-wrapping the error asMetadataValidationErrorto preserve the original exception type and message.Focused suites pass locally (Python 3.12,
test.py3.12-optionalenv):test_common,test_config,test_metadata, andtest_json_parse→ 430 passed.Open questions for maintainers (direction feedback)
src/zarr/core/json_parse.pyis a placeholder; happy to move it (e.g. into a typing/metadata utility module).parse_jsonraises plainValueError/TypeErrorsince it's field-agnostic; field-specific callers (likeparse_zarr_format) re-wrap into their domain error. Does that split match what you'd want?NotRequired— the current_parse_typeddictreads__required_keys__/__optional_keys__directly. Underfrom __future__ import annotationswith class-syntax TypedDicts,typing_extensionscan't see theNotRequiredwrapper at class-creation time, so an optional key can be mis-classified as required. The robust fix is to resolve hints viatyping_extensions.get_type_hints(..., include_extras=True)and inspect forRequired/NotRequired. I've left this for the follow-up once the overall direction is confirmed, since none of the three pilot helpers are TypedDicts.If the direction looks right, I'll migrate the remaining
parse_*call sites incrementally in follow-up commits.Closes #3285 once fully migrated (draft for now).