Feature request: streaming Parquet / Arrow IPC support #4

Spliterator currently handles CSV, TSV, and JSONL, three of the most common line-oriented text formats. In data engineering pipelines, however, **Parquet** is the dominant columnar storage format, and Arrow IPC is the standard format for zero-copy data transfer between processes.

It would be valuable to add a `ParquetSpliterator` (or `ArrowSpliterator`) that could:

1. Stream a Parquet file row-by-row (mapping row groups to async iterables)
2. Accept an optional Arrow schema for typed column projection
3. Work with the same Generator/AsyncGenerator interfaces the library already exposes

Since Spliterator already uses a streaming/iterator model, this would fit naturally: Parquet files are split into row groups that are designed to be read incrementally. The Apache Arrow JS bindings ([@apache-arrow/\*](https://www.npmjs.com/package/@apache-arrow)) provide Arrow IPC stream readers that could be wrapped behind Spliterator's existing interfaces. As far as I know they do not ship a Parquet reader themselves, so Parquet decoding would likely need a separate library.
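
To make the proposal more concrete, here is a rough sketch of how row groups could map onto an async generator of rows with optional column projection. Every name in it (`RowGroup`, `RowGroupSource`, `RowStreamOptions`, `rowsOf`, `streamRows`) is hypothetical and made up for illustration; none of it is existing Spliterator or Apache Arrow API, and the in-memory `demo` source stands in for a real Parquet or Arrow decoder.

```typescript
// Hypothetical shapes for illustration only; not Spliterator or Apache Arrow API.
type Row = Record<string, unknown>;

/** One decoded row group in columnar form: column name -> values of equal length. */
interface RowGroup {
  numRows: number;
  columns: Record<string, unknown[]>;
}

/** Hypothetical source that decodes and yields one row group at a time. */
interface RowGroupSource {
  rowGroups(): AsyncIterable<RowGroup>;
}

interface RowStreamOptions {
  /** Optional column projection; when omitted, every column is emitted. */
  columns?: string[];
}

/** Lazily pivot a single columnar row group into row objects. */
function* rowsOf(group: RowGroup, columns?: string[]): Generator<Row> {
  const names = columns ?? Object.keys(group.columns);
  for (const name of names) {
    if (!(name in group.columns)) throw new Error(`Unknown column: ${name}`);
  }
  for (let i = 0; i < group.numRows; i++) {
    const row: Row = {};
    for (const name of names) row[name] = group.columns[name][i];
    yield row;
  }
}

/** Stream rows across all row groups, pulling the next group only when needed. */
async function* streamRows(
  source: RowGroupSource,
  options: RowStreamOptions = {},
): AsyncGenerator<Row> {
  for await (const group of source.rowGroups()) {
    yield* rowsOf(group, options.columns);
  }
}

// In-memory stand-in for a real decoder, with two row groups.
const demo: RowGroupSource = {
  async *rowGroups() {
    yield { numRows: 2, columns: { id: [1, 2], name: ["a", "b"] } };
    yield { numRows: 1, columns: { id: [3], name: ["c"] } };
  },
};
```

A consumer would then write `for await (const row of streamRows(demo, { columns: ["id"] })) { ... }`. The sync `rowsOf` / async `streamRows` split is only one possible design; it mirrors the Generator/AsyncGenerator pairing mentioned in point 3 and keeps the per-group pivot free of I/O.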

Would a PR along these lines be welcome? I'd be happy to contribute an initial implementation.

