Added reader for JSONEachRow format. Updated documentation and examples #2871
chernser wants to merge 9 commits into



Summary
General
Client V2 mainly supports binary formats. However, there is demand and a practical need to also support formats from the JSON family, because JSON is popular and represents complex structured data effectively.
There is no dedicated JSON reader, because any application can make a request via the client and read the input stream with its preferred JSON parser. However, creating such a reader would help bring JSON parsing to JDBC. As the interface is already defined, only type mapping and some glue code are required.
The goal of this PR is to add a harness for text format readers. A new common interface is created to abstract readers. Dedicated interfaces for binary and text formats will carry the format-specific methods.
The client does not intend to include all JSON parsing libraries. All classes are implemented in isolation and are not referenced by default.
The new JSON reader contains an important piece of code that adapts primitive types to Java ones. This conversion is required, for example, in JDBC by ResultSetImpl.
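To illustrate the kind of adaptation meant here, below is a minimal sketch of converting values as a JSON parser might hand them over (a Number, or a quoted String when the server quotes large numbers) into the Java type a caller such as a result set getter asks for. JsonValueAdapter and convert are hypothetical names used only for this illustration; they are not the classes added by this PR, and the real conversion rules may differ.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical helper for illustration only; not the PR's actual class. */
final class JsonValueAdapter {
    private JsonValueAdapter() {
    }

    /**
     * Converts a value produced by a JSON parser into the requested Java type.
     * Numbers may arrive as Number instances or as quoted strings, so every
     * numeric conversion goes through the value's text form.
     */
    static <T> T convert(Object value, Class<T> target) {
        if (value == null) {
            return null;
        }
        if (target.isInstance(value)) {
            return target.cast(value); // already the requested type
        }
        String text = value.toString();
        Object result;
        if (target == Integer.class) {
            // The *Exact methods throw ArithmeticException instead of truncating.
            result = new BigDecimal(text).intValueExact();
        } else if (target == Long.class) {
            result = new BigDecimal(text).longValueExact();
        } else if (target == Double.class) {
            result = Double.parseDouble(text);
        } else if (target == BigInteger.class) {
            result = new BigDecimal(text).toBigIntegerExact();
        } else if (target == BigDecimal.class) {
            result = new BigDecimal(text);
        } else if (target == String.class) {
            result = text;
        } else {
            throw new IllegalArgumentException(
                    "Unsupported conversion: " + value.getClass().getName()
                            + " -> " + target.getName());
        }
        return target.cast(result);
    }
}
```

For example, JsonValueAdapter.convert("9223372036854775807", Long.class) yields Long.MAX_VALUE, while converting 1.5 to Integer fails loudly rather than silently losing the fraction. Whether the actual reader should reject or round such values is a design choice this sketch does not settle.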
Client Support
The client is the main component that implements JSON support. The support should take the form of an extension or plug-in, with no direct references to any JSON library. The user configures a library instance, and the client needs a way to use it. Therefore the following problems should be addressed:
There are two libraries we will support: GSON (https://github.com/google/gson) and Jackson (https://github.com/FasterXML/jackson). Both libraries have a root class that accepts configuration and can be customized for the user's needs, and both root classes create a parser or reader that is bound to an InputStream.
The JSON parser will be used inside the reader, so instantiating it is the application's task. In general, text format support is not a goal for the client, so there is no dedicated method for creating readers for non-ClickHouse formats. This solves the customization problem, because the parser is instantiated by the user.
Because both libraries create stream-bound objects for working with JSON, it is convenient to provide a kind of factory. This class becomes the abstraction used to create a wrapper around a JSON parsing library. Another abstraction is the JsonParser interface, which the reader uses to iterate through rows.
JDBC
The JDBC driver is often used when minimal custom code is expected, and it is the place where we have to offer a choice between JSON processing libraries. This should be implemented by providing the class name of a JsonParserFactory implementation, which solves the instantiation problem. In addition, the user may specify the class name of their own implementation if customization is required. Instantiation is performed when the connection is created.
The JDBC driver will create a JSON reader if a JsonParserFactory is defined.
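The two abstractions described under Client Support, and the selection by class name described under JDBC, can be sketched roughly as follows. This is an illustration under assumptions: SketchJsonParser and SketchJsonParserFactory are stand-ins whose method names are invented here, since the description does not list the real methods of JsonParser and JsonParserFactory. The toy factory splits the stream into lines instead of delegating to Gson or Jackson, so that the example needs no external dependency.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Stand-in for the parser abstraction; the method set is an assumption. */
interface SketchJsonParser extends AutoCloseable {
    /** Returns the next row as raw JSON text, or null at end of stream. */
    String nextRow() throws IOException;

    @Override
    void close() throws IOException;
}

/** Stand-in for the factory abstraction; binds a parser to an InputStream. */
interface SketchJsonParserFactory {
    SketchJsonParser createParser(InputStream in);
}

/**
 * Toy factory: treats each non-blank line as one JSONEachRow record.
 * A real implementation would wrap a user-configured Gson or Jackson instance.
 */
final class LineSplittingParserFactory implements SketchJsonParserFactory {
    @Override
    public SketchJsonParser createParser(InputStream in) {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return new SketchJsonParser() {
            @Override
            public String nextRow() throws IOException {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.trim().isEmpty()) {
                        return line; // skip blank lines between records
                    }
                }
                return null;
            }

            @Override
            public void close() throws IOException {
                reader.close();
            }
        };
    }
}

/** Instantiates a factory from its class name, as a driver could at connect time. */
final class ParserFactoryLoader {
    private ParserFactoryLoader() {
    }

    static SketchJsonParserFactory load(String className)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        // asSubclass fails fast with ClassCastException if the class is not a factory.
        return cls.asSubclass(SketchJsonParserFactory.class)
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

Loading by class name keeps the driver free of compile-time references to any JSON library: only the configured class name is resolved when the connection is created, and a user-supplied factory works the same way as a bundled one as long as it has a no-argument constructor.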
Note
Medium Risk
Large new public API and query-time server setting injection; behavior is opt-in for quoting and binary readers remain the default path, with extensive tests mitigating regression risk.
Overview
Introduces JSONEachRow support in client-v2 via a pluggable text-format stack, while refactoring format readers so binary and text encodings share one API.
Reader model: Typed accessors move to the new ClickHouseFormatReader; ClickHouseBinaryFormatReader and ClickHouseTextFormatReader specialize binary vs text. JSONEachRowFormatReader streams rows through a JsonParser/JsonParserFactory, with bundled Jackson and Gson factories (Jackson/Gson deps are provided-scoped so the core client does not hard-depend on either library).
JSONEachRow behavior: The reader infers TableSchema from the first row (SchemaUtils + public, unmodifiable ClickHouseDataType.DATA_TYPE_TO_CLASS), exposes the same typed getters as binary readers where applicable, and defers parse errors so already-buffered valid rows are not dropped on a bad following line.
Query integration: New json_disable_number_quoting config; when set for JSONEachRow queries, Client#query applies ClickHouse output_format_json_quote_* server settings so large integers, floats, and decimals can be emitted as JSON numbers. Explicit server settings are otherwise left alone.
Coverage includes broad unit/integration tests for the reader, parsers, schema inference, and the opt-in quoting behavior.
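The deferred parse error behavior mentioned above can be pictured with a one-row lookahead. The sketch below is an assumption about the general technique, not the PR's implementation: DeferredErrorRows is a hypothetical class, rows are parsed from an iterator of lines by a caller-supplied function, and a failure hit while reading ahead is remembered and thrown only when the caller advances to the bad row.

```java
import java.util.Iterator;
import java.util.function.Function;

/** Hypothetical lookahead reader; rows are assumed to be non-null. */
final class DeferredErrorRows<T> {
    private final Iterator<String> lines;
    private final Function<String, T> parse;
    private T buffered;               // row already parsed ahead of the caller
    private RuntimeException pending; // failure hit while reading ahead
    private boolean done;

    DeferredErrorRows(Iterator<String> lines, Function<String, T> parse) {
        this.lines = lines;
        this.parse = parse;
        readAhead();
    }

    private void readAhead() {
        buffered = null;
        if (!lines.hasNext()) {
            done = true;
            return;
        }
        String line = lines.next();
        try {
            buffered = parse.apply(line);
        } catch (RuntimeException e) {
            pending = e; // remember the failure, do not throw yet
        }
    }

    /**
     * Returns the next row, or null at end of data. A parse error surfaces
     * only when the caller reaches the bad row; reading stops after it.
     */
    T next() {
        if (pending != null) {
            RuntimeException e = pending;
            pending = null;
            done = true;
            throw e;
        }
        if (done) {
            return null;
        }
        T row = buffered;
        readAhead();
        return row;
    }
}
```

With the lines "1", "x", "3" and Integer::valueOf as the parse function, the first call to next() returns 1 even though the lookahead has already failed on "x"; the NumberFormatException is thrown by the second call. Stopping after the first error is a choice made for this sketch only.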
Reviewed by Cursor Bugbot for commit 9f4287b. Bugbot is set up for automated code reviews on this repo.