Labels
documentation (Improvements or additions to documentation, not KDocs)
Description
About Parquet
Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It provides several advantages:
- Columnar storage: Data is stored column-by-column, which enables efficient compression and encoding schemes
- Schema evolution: Supports adding new columns without breaking existing data readers
- Efficient querying: Optimized for analytics workloads where you typically read a subset of columns
- Cross-platform: Works across different programming languages and data processing frameworks
- Compression: Built-in support for various compression algorithms (GZIP, Snappy, etc.)
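The compression and encoding point above can be illustrated with a small sketch of dictionary encoding, one of the techniques Parquet applies to low-cardinality columns. This is plain Python with no Parquet library required, and the column data is made up for illustration:

```python
import json
import zlib

# A low-cardinality "status" column of the kind Parquet dictionary-encodes.
column = ["active", "inactive", "active", "active", "pending"] * 2000

# Plain row-by-row serialization of the column values.
raw = json.dumps(column).encode()

# Dictionary encoding: store each distinct value once, then one small
# integer code per row (each code fits in a single byte here).
dictionary = sorted(set(column))
index = {value: i for i, value in enumerate(dictionary)}
codes = bytes(index[value] for value in column)
encoded = json.dumps(dictionary).encode() + codes

print("plain:", len(raw), "bytes; dictionary-encoded:", len(encoded), "bytes")
print("compressed plain:", len(zlib.compress(raw)), "bytes")
```

Because the dictionary holds each distinct string once, the encoded column is a fraction of the plain serialization before any general-purpose compressor even runs; storing a column contiguously is what makes this encoding possible.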
Parquet files are commonly used in data lakes, data warehouses, and big data processing pipelines. They're frequently created by tools like Apache Spark, Pandas, Dask, and various cloud data services.
Typical use cases
- Exchanging columnar datasets between Spark and Kotlin/JVM applications.
- Analytical workloads where columnar compression and predicate pushdown matter.
- Reading data exported from data lakes and lakehouse tables (e.g., from Spark, Hive, or Delta/Iceberg exports).
Android Compatibility
The JVM Parquet libraries pull in Hadoop dependencies that do not run on Android, so reading Parquet files directly on-device is generally impractical. If you need to process Parquet files in an Android application, consider:
- Processing files on a server and exposing the data via an API
- Converting Parquet files to a supported format (JSON, CSV) for Android consumption
- Using cloud-based data processing services