[Slides](https://docs.google.com/presentation/d/1JA2JX7q7UUHnKlITNteM68gL-rd0_tSMhT7zX9tMueE/preview?usp=sharing)
Best practices for creating GeoParquet files
See the TL;DR from https://github.com/opengeospatial/geoparquet/blob/main/format-specs/distributing-geoparquet.md#best-practices-for-distributing-geoparquet (a minimal write sketch follows the list):
- Use zstd for compression, and set the compression level to 15.
- Be sure to include the bbox covering, and use GeoParquet version 1.1.
- Spatially order the data within the file.
- Set the maximum row group size to between 50,000 and 150,000 rows.
- If the data is larger than ~2 gigabytes, consider spatially partitioning the file.
- Use STAC metadata to describe the data.
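A minimal write sketch applying those settings. GeoPandas here is my assumption (the slides may use a different tool); it requires GeoPandas ≥ 1.0 for `schema_version` and `write_covering_bbox`, the file names are placeholders, and the centroid sort is only a crude stand-in for a proper Hilbert-curve ordering:

```python
import geopandas as gpd

# Placeholder input; any polygon dataset GeoPandas can read.
gdf = gpd.read_file("buildings.gpkg")

# Crude spatial ordering: group features by a coarse grid cell of their
# centroid so nearby geometries end up in the same row groups.
cent = gdf.geometry.centroid
gdf = (
    gdf.assign(_cell=(cent.y // 0.05).astype(int).astype(str)
                     + "_"
                     + (cent.x // 0.05).astype(int).astype(str))
       .sort_values("_cell")
       .drop(columns="_cell")
)

# GeoParquet 1.1 with a bbox covering, zstd level 15, ~100k-row row groups.
# Extra kwargs are forwarded to pyarrow.parquet.write_table.
gdf.to_parquet(
    "buildings.parquet",
    schema_version="1.1.0",
    write_covering_bbox=True,
    compression="zstd",
    compression_level=15,
    row_group_size=100_000,
)
```

For data above the ~2 GB mark, the same write options would apply per spatial partition rather than to one big file.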
The demo I used is with PyArrow, so that you can read/write the low-level details of a Parquet file.
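Not the demo itself, but the same idea in a few lines: using PyArrow's footer metadata API to inspect row groups, compression, and the `geo` key/value metadata (the file name is a placeholder):

```python
import pyarrow.parquet as pq

# Open the Parquet file without reading the data, only the footer metadata.
pf = pq.ParquetFile("buildings.parquet")

print(pf.schema_arrow)                        # Arrow schema, incl. geometry columns
print(pf.metadata.num_row_groups, "row groups")

# Per-row-group details: row count, byte size, compression of the first column.
for i in range(pf.metadata.num_row_groups):
    rg = pf.metadata.row_group(i)
    col = rg.column(0)
    print(f"row group {i}: {rg.num_rows} rows, "
          f"{col.compression} compression, {rg.total_byte_size} bytes")

# The GeoParquet metadata itself lives in the key/value footer metadata.
geo = pf.metadata.metadata.get(b"geo")
print(geo.decode()[:200] if geo else "no 'geo' metadata found")
```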
Other links
<aside>
🏋🏼♀️
A small (formative) exercise
Construct a GeoParquet file of all the buildings in NL containing only a few attributes, then check how many buildings there are and what their average height and area are.
With DuckDB (one possible query shape is sketched after this aside).
⇒ Why 11.8M buildings, while the BAG contains ~11M?
</aside>
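One possible shape of that query, run through DuckDB's Python API. The file name and the `height` column are placeholders for whatever GeoParquet file you built, and whether `ST_Area` can be applied to the geometry column directly depends on your DuckDB version (older versions expose GeoParquet geometry as WKB and need `ST_GeomFromWKB` first):

```python
import duckdb

con = duckdb.connect()
# Spatial extension for the geometry functions.
con.sql("INSTALL spatial;")
con.sql("LOAD spatial;")

# 'nl_buildings.parquet' and 'height' are placeholders; adjust to your file.
print(con.sql("""
    SELECT
        count(*)               AS n_buildings,
        avg(height)            AS avg_height,
        -- area units depend on the CRS (degrees² for EPSG:4326);
        -- on older DuckDB, wrap geometry in ST_GeomFromWKB(...) first
        avg(ST_Area(geometry)) AS avg_area
    FROM read_parquet('nl_buildings.parquet')
"""))
```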