ClickHouse-Encoder
==================

Fast XS encoder for ClickHouse Native format. Builds a binary block from
a Perl arrayref of rows; the result is the request body for an
`insert ... format native` operation over HTTP, the native TCP protocol,
or via stdin to clickhouse-client.

INSTALLATION

    perl Makefile.PL
    make
    make test
    make install

REQUIREMENTS

    A 64-bit Perl ($Config{ivsize} >= 8). No external C library is
    required; the encoder is implemented entirely in XS.

SUPPORTED TYPES

    Int8/16/32/64, UInt8/16/32/64, Float32/64, BFloat16,
    String, FixedString(N),
    Date, Date32, DateTime, DateTime('tz'), DateTime64(p),
    Decimal32(s), Decimal64(s), Decimal128(s), Decimal256(s),
    Decimal(P, S),
    Enum8(...), Enum16(...),
    Bool / Boolean, UUID, IPv4, IPv6,
    Map(K, V),
    LowCardinality(String|FixedString|Nullable(...)),
    Variant(T1, T2, ...) (CH 24.1+),
    SimpleAggregateFunction(func, T),
    Tuple(T1, T2, ...) including named: Tuple(a Int32, b String),
    Geo: Point, Ring, LineString, MultiLineString, Polygon, MultiPolygon,
    Array(T), Nullable(T),
    JSON / Object('json') (CH 24.8+): hashref input with nested hashref
        auto-flattening to dotted paths; per-path types inferred from
        Perl SV flags (Int64, Float64, Bool, String) and arrayref leaves
        encoded as Array(T) variants. Symmetric on decode (unflattens).
    Dynamic: standalone Dynamic column, same wire format as one JSON
        path's Dynamic sub-column without the Object wrapper.

    DateTime / DateTime64 strings accept ISO 8601 with timezone offsets
    (Z, +HH:MM, -HH:MM, +HHMM); the offset is applied to convert to UTC.

    See `perldoc ClickHouse::Encoder` for value coercion rules and limits.

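The offset handling described for DateTime / DateTime64 strings can be pictured with a small pure-Perl sketch. This is not the module's XS implementation: `iso8601_to_epoch` is a hypothetical helper written for illustration, it handles whole seconds only, and it assumes that a string without an offset is already in UTC (this README does not say how the encoder treats that case).

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper, NOT part of ClickHouse::Encoder: parse an ISO 8601
# timestamp with an optional offset and return UTC epoch seconds.
sub iso8601_to_epoch {
    my ($s) = @_;
    my ($y, $mo, $d, $h, $mi, $sec, $tz) = $s =~ m{
        \A (\d{4}) - (\d{2}) - (\d{2}) [T ] (\d{2}) : (\d{2}) : (\d{2})
        ( Z | [+-]\d{2}:?\d{2} )? \z
    }x or die "unrecognised timestamp: $s";

    # First treat the wall-clock fields as if they were already UTC.
    my $epoch = timegm($sec, $mi, $h, $d, $mo - 1, $y);

    # Local time = UTC + offset, so UTC = local time - offset.
    # Assumption: a missing offset (or Z) means the value is already UTC.
    if (defined $tz && $tz ne 'Z') {
        my ($sign, $oh, $om) = $tz =~ /\A([+-])(\d{2}):?(\d{2})\z/;
        my $offset = ($oh * 60 + $om) * 60;
        $epoch -= ($sign eq '+') ? $offset : -$offset;
    }
    return $epoch;
}
```

With this sketch, `2024-01-02T03:04:05+02:00` and `2024-01-02T01:04:05Z` map to the same epoch value, which is the behaviour the paragraph above describes.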
OUTPUT APIS

    encode(\@rows)                      return Native bytes for one block
    encode_into(\$buf, \@rows)          append a block to an existing scalar
    encode_columns(\%cols)              column-oriented input (same bytes)
    encode_to_handle($fh, \@rows)       direct write to a filehandle
    stream(\&iter, \&writer, batch_size=>N)
                                        pull rows from iter, emit blocks
    streamer(\&writer, batch_size=>N)   ->push_row($r); ...; ->finish
                                        ->reset / ->buffered_count / ->is_empty
    validate_rows(\@rows)               [{row=>N,error=>...}] for bad rows
    encode_to_command(\@cmd, \@rows)    pipe encoded bytes into a child cmd
    compressed_writer($mode, \&writer)  wrap a writer with gzip/zstd
    flatten_nested(\@cols)              expand Nested(...) -> flat name.field
    encode_row_binary(\@rows)           RowBinary body (row-major format)
    decode_row_binary($bytes)           decode a RowBinary byte string

HTTP INSERT

    ClickHouse::Encoder->insert_http(host=>..., port=>..., table=>..., rows=>...)
        one-shot HTTP insert (POSTs Native bytes, optional zstd/gzip).

    ClickHouse::Encoder->bulk_inserter(host=>..., table=>..., columns=>...)
        long-lived inserter with auto-flush at batch_size, retries on
        transient errors, keep-alive, optional compression. ->summary
        rolls up CH X-ClickHouse-Summary stats across batches;
        ->last_response gives the most recent flush's HTTP response
        with parsed CH metadata attached at
        ->{ch}{query-id,server,summary,...}.

    ClickHouse::Encoder->for_query($select_sql, host=>..., port=>...)
        runs describe ($select_sql) and returns an encoder configured
        for that result shape; useful when the schema isn't a real table.

    ClickHouse::Encoder->ping(host=>..., port=>...)
        liveness probe via /ping; returns 1 or croaks.

    All HTTP entry points accept scheme=>'https' (needs IO::Socket::SSL +
    Net::SSLeay), ssl_options/verify_SSL pass-throughs to HTTP::Tiny,
    settings=>{...} for per-query CH settings, and dedup_token=>$id for
    idempotent inserts.

SCHEMA INTROSPECTION

    ClickHouse::Encoder->for_table($table, via => 'client', ...)
    ClickHouse::Encoder->for_table($table, via => 'http', port => 8123, ...)
    ClickHouse::Encoder->server_version(host => ..., port => ...)
        fetches select version() over HTTP, returns
        {major,minor,patch,...}.
    ClickHouse::Encoder->types
        list of supported type names
    ClickHouse::Encoder->schema_diff(\@a, \@b)
        {added,removed,changed}
    ClickHouse::Encoder->apply_schema_diff($diff, table=>...)
        alter table statements (drops -> modifies -> adds)
    ClickHouse::Encoder->format_create_table(table=>..., columns=>...)
        create table SQL; columns accept codec/ttl/default/...
    ClickHouse::Encoder->parse_create_table($ddl)
        show create table -> hashref {database,table,columns,...}
    ClickHouse::Encoder->parse_wkt($wkt)
        WKT -> Geo arrayref shape
    $enc->estimate_size($nrows)
        byte-size hint for sizing

DECIMAL HELPERS

    ClickHouse::Encoder->decimal128_str($n) / ->decimal256_str($n)
        format a 16- or 32-byte little-endian decimal value as a signed
        base-10 string (avoids host-side bigint arithmetic for large
        precisions).

DECODER

    ClickHouse::Encoder->decode_block($bytes) / ->decode_rows($bytes) are
    the XS-side decoder for select ... format native responses. Supports
    every type encode handles; round-trips are symmetric.

    ->decode_blocks($bytes)
        walks a concatenated multi-block stream (also accepts a callback).
    ->decode_blocks_iter($bytes)
        returns a coderef iterator.
    ->decode_stream($fh, $cb)
        pulls bytes incrementally from a filehandle; memory is bounded
        by one block at a time.
    ->decode_block($bytes, $offset, \%keep)
        skips data for unwanted columns (memory win on wide select *).

DOCUMENTATION

    See `perldoc ClickHouse::Encoder` after install, or the POD in
    lib/ClickHouse/Encoder.pm.

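To illustrate what the DECIMAL HELPERS section describes, here is a pure-Perl sketch of turning a little-endian two's-complement byte string into a signed base-10 string. It is an illustration only: `le_bytes_to_decimal_str` is a hypothetical name, the input is assumed to be the raw 16 or 32 bytes, and the result is the unscaled integer (placing the decimal point for a given scale is left out). It relies on Math::BigInt, which is the kind of host-side bigint work the XS helpers are meant to spare you.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical pure-Perl illustration, NOT the module's XS code.
sub le_bytes_to_decimal_str {
    my ($bytes) = @_;
    my $len = length $bytes;
    die "expected 16 or 32 bytes, got $len" unless $len == 16 || $len == 32;

    # Reverse to big-endian order and read the bytes as an unsigned integer.
    my $hex = unpack('H*', scalar reverse $bytes);
    my $n   = Math::BigInt->new('0x' . $hex);

    # Two's complement: the last byte is the most significant one, so if its
    # top bit is set the value is negative and we subtract 2**(8 * $len).
    if (ord(substr($bytes, -1)) & 0x80) {
        $n->bsub(Math::BigInt->new(2)->bpow(8 * $len));
    }
    return $n->bstr;
}
```

For example, sixteen `0xff` bytes decode to `-1`, and a `0x01` byte followed by fifteen zero bytes decodes to `1`.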
EXAMPLES

    eg/insert_http.pl - end-to-end insert over HTTP::Tiny
    eg/insert_streaming.pl - reuse one encoder across many batches
    eg/for_table.pl - schema discovery via clickhouse-client
    eg/from_csv.pl - read CSV, encode, insert via HTTP
    eg/insert_clickhouse_local.pl - server-less ingest to Parquet/ORC
    eg/etl_dbi.pl - DBI -> Native -> insert pipeline
    eg/insert_compressed.pl - zstd/gzip compression on the wire
    eg/insert_async_ev.pl - non-blocking concurrent inserts via EV
    eg/insert_with_lowcardinality.pl - LC(String) wire-size demo
    eg/json_lines_ingest.pl - NDJSON streaming -> for_table -> insert
    eg/streaming_aggregate.pl - pre-aggregate, flush to SummingMergeTree
    eg/postgres_to_clickhouse.pl - DBD::Pg -> Native -> insert, streaming
    eg/clickhouse_replication.pl - CH -> CH replication via Native pipe
    eg/parallel_loader.pl - fork N workers, parallel partition load
    eg/redis_to_clickhouse.pl - drain a Redis stream/list into a CH table
    eg/syslog_ingest.pl - parse RFC 5424 syslog lines, ingest
    eg/json_streaming.pl - NDJSON -> JSON column via streamer
    eg/json_query.pl - select format native -> decode_blocks -> walk
    eg/json_aggregate.pl - group-by aggregation pipeline over JSON
    eg/migrate_table.pl - copy CH -> CH, schema auto-detected
    eg/replay.pl - replay a captured Native byte stream
    eg/native_to_jsonl.pl - convert Native stream to NDJSON
    eg/select_blocks_streaming.pl - streaming select via select_blocks
    eg/json_path_projection.pl - column projection on JSON select
    eg/csv_export.pl - select Native -> CSV writer
    eg/migrate_with_transform.pl - CH -> CH ETL with row transform
    eg/replay_pcap.pl - off-line dump of captured Native bytes
    eg/tcp_compressed_pipeline.pl - TCP insert with negotiated LZ4 compression
    eg/rowbinary_insert.pl - insert via the RowBinary format
    eg/async_insert.pl - server-side async insert via settings
    eg/geo_from_wkt.pl - WKT geometry -> Geo columns via parse_wkt
    eg/insert_with_settings.pl - per-query settings + dedup token
    eg/ping_healthcheck.pl - wait-for-server readiness gate via ping
    eg/observability.pl - read X-ClickHouse summary/progress stats
    eg/schema_migrate.pl - show create -> diff -> alter table

BENCHMARKS

    See bench/. Native is typically 2-5x faster than TabSeparated for
    insert ingestion.

WIRE FORMAT

    See doc/wire-format.md for a working reference of the subset of the
    ClickHouse Native binary format this module emits.

LICENSE

    This library is free software; you can redistribute it and/or modify
    it under the same terms as Perl itself.