[Slides](https://docs.google.com/presentation/d/1JA2JX7q7UUHnKlITNteM68gL-rd0_tSMhT7zX9tMueE/preview?usp=sharing)
Best practices for creating GeoParquet files
See the TL;DR from https://github.com/opengeospatial/geoparquet/blob/main/format-specs/distributing-geoparquet.md#best-practices-for-distributing-geoparquet (a minimal write sketch follows the list):
- Use zstd for compression, and set the compression level to 15.
- Be sure to include the bbox covering, and use GeoParquet version 1.1.
- Spatially order the data within the file.
- Set the maximum row group size to between 50,000 and 150,000 rows.
- If the data is larger than ~2 gigabytes, consider spatially partitioning the file.
- Use STAC metadata to describe the data.
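A minimal write sketch applying those settings. GeoPandas here is my assumption (the slides may use a different tool); it requires GeoPandas ≥ 1.0 for `schema_version` and `write_covering_bbox`, the file names are placeholders, and the centroid sort is only a crude stand-in for a proper Hilbert-curve ordering:

```python
import geopandas as gpd

# Placeholder input; any polygon dataset GeoPandas can read.
gdf = gpd.read_file("buildings.gpkg")

# Crude spatial ordering: group features by a coarse grid cell of their
# centroid so nearby geometries end up in the same row groups.
cent = gdf.geometry.centroid
gdf = (
    gdf.assign(_cell=(cent.y // 0.05).astype(int).astype(str)
                     + "_"
                     + (cent.x // 0.05).astype(int).astype(str))
       .sort_values("_cell")
       .drop(columns="_cell")
)

# GeoParquet 1.1 with a bbox covering, zstd level 15, ~100k-row row groups.
# Extra kwargs are forwarded to pyarrow.parquet.write_table.
gdf.to_parquet(
    "buildings.parquet",
    schema_version="1.1.0",
    write_covering_bbox=True,
    compression="zstd",
    compression_level=15,
    row_group_size=100_000,
)
```

For data above the ~2 GB mark, the same write options would apply per spatial partition rather than to one big file.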
The demo I used is with PyArrow, so that you can read/write the low-level details of a Parquet file.
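Not the demo itself, but the same idea in a few lines: using PyArrow's footer metadata API to inspect row groups, compression, and the `geo` key/value metadata (the file name is a placeholder):

```python
import pyarrow.parquet as pq

# Open the Parquet file without reading the data, only the footer metadata.
pf = pq.ParquetFile("buildings.parquet")

print(pf.schema_arrow)                        # Arrow schema, incl. geometry columns
print(pf.metadata.num_row_groups, "row groups")

# Per-row-group details: row count, byte size, compression of the first column.
for i in range(pf.metadata.num_row_groups):
    rg = pf.metadata.row_group(i)
    col = rg.column(0)
    print(f"row group {i}: {rg.num_rows} rows, "
          f"{col.compression} compression, {rg.total_byte_size} bytes")

# The GeoParquet metadata itself lives in the key/value footer metadata.
geo = pf.metadata.metadata.get(b"geo")
print(geo.decode()[:200] if geo else "no 'geo' metadata found")
```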
Other links
<aside>
🏋🏼♀️
A small (formative) exercise
Construct a GeoParquet file of all the buildings in NL containing only a few attributes, then check how many buildings there are and what their average height and area are.
With DuckDB (one possible query shape is sketched after this aside).
⇒ Why 11.8M buildings, while the BAG contains ~11M?
</aside>
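One possible shape of that query, run through DuckDB's Python API. The file name and the `height` column are placeholders for whatever GeoParquet file you built, and whether `ST_Area` can be applied to the geometry column directly depends on your DuckDB version (older versions expose GeoParquet geometry as WKB and need `ST_GeomFromWKB` first):

```python
import duckdb

con = duckdb.connect()
# Spatial extension for the geometry functions.
con.sql("INSTALL spatial;")
con.sql("LOAD spatial;")

# 'nl_buildings.parquet' and 'height' are placeholders; adjust to your file.
print(con.sql("""
    SELECT
        count(*)               AS n_buildings,
        avg(height)            AS avg_height,
        -- area units depend on the CRS (degrees² for EPSG:4326);
        -- on older DuckDB, wrap geometry in ST_GeomFromWKB(...) first
        avg(ST_Area(geometry)) AS avg_area
    FROM read_parquet('nl_buildings.parquet')
"""))
```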