Rasters are, in essence, structured multidimensional arrays. This lecture covers the Cloud Optimized GeoTIFF (COG), Zarr and HDF5/NetCDF file formats.

👨🏼‍🏫 Given by Maarten

Slides

GEO5019_Pronk

Summary

As geospatial data grows in size and availability, downloading or copying everything is no longer feasible, and custom subsetting services become too expensive to scale, configure and maintain. The simplest and most widely adopted technology for subsetting is the HTTP range request (’97). A range request, however, only returns a stream (vector) of bytes, so geospatial file formats need to change to support this use case. Essentially, a file needs a (spatial) index that tells a client which bytes to request; spatial coherence (tiling/chunking) and overviews (pyramids) are also good to have. Some file formats are flexible enough to support this directly (COG), but more generic multidimensional formats such as HDF5/NetCDF are not. For those datasets, you can build external indexes (Kerchunk) or use a similar but exploded file structure (Zarr).
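To make the role of the index concrete, here is a minimal sketch of how a client could turn a pixel location into a byte range. The `TileIndex` type, the `range_header` function and all offsets are hypothetical and made up for illustration. They loosely mirror the per-tile offset and byte-count tables of a tiled TIFF, but this is not how any particular library implements it.

```julia
# Hypothetical in-memory tile index: one (offset, length) entry per tile,
# stored row by row. All names and numbers here are illustrative assumptions.
struct TileIndex
    tile_size::Int          # tile edge length in pixels
    tiles_across::Int       # number of tile columns in the raster
    offsets::Vector{Int}    # 0-based byte offset of each tile in the file
    lengths::Vector{Int}    # byte length of each tile
end

# Build the HTTP Range header that fetches the tile containing pixel (col, row).
# Pixel coordinates are 0-based; HTTP byte ranges are inclusive on both ends.
function range_header(idx::TileIndex, col::Int, row::Int)
    tile_col = div(col, idx.tile_size)
    tile_row = div(row, idx.tile_size)
    i = tile_row * idx.tiles_across + tile_col + 1   # Julia arrays are 1-based
    first_byte = idx.offsets[i]
    last_byte = first_byte + idx.lengths[i] - 1
    return "Range" => "bytes=$(first_byte)-$(last_byte)"
end

# A made-up 2x2 grid of 256-pixel tiles.
idx = TileIndex(256, 2, [1000, 5000, 9000, 13000], [4000, 4000, 4000, 3500])
header = range_header(idx, 300, 10)   # pixel falls in tile column 1, tile row 0
```

The resulting pair has the same shape as the header in the `HTTP.get` call on this page, so it could be passed as `headers=[header]`. Because tiles are spatially coherent, one small request returns a compact window of the raster rather than scattered rows.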

Dog is loading

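using HTTP
resp = HTTP.get(url; headers=["Range" => "bytes=0-300000"])  # first 300,001 bytes (range is inclusive); a server that honours it replies 206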

dog.jpg

Further reading

On magic bytes as file format identifiers: https://en.wikipedia.org/wiki/List_of_file_signatures
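Magic bytes matter for range requests because the first few bytes of a file are often enough to tell what you are dealing with before fetching anything else. The sketch below checks a byte vector against a handful of well-known signatures. The `SIGNATURES` table and the `identify_format` function are hypothetical names, and the table is deliberately incomplete (for example, it leaves out BigTIFF and assumes an HDF5 signature at offset 0).

```julia
# A small, incomplete table of file signatures (illustrative, not exhaustive).
const SIGNATURES = [
    "TIFF (little-endian)" => UInt8[0x49, 0x49, 0x2a, 0x00],   # "II*\0", also used by COG
    "TIFF (big-endian)"    => UInt8[0x4d, 0x4d, 0x00, 0x2a],   # "MM\0*"
    "HDF5 / NetCDF-4"      => UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a],
    "NetCDF classic"       => UInt8[0x43, 0x44, 0x46],         # "CDF"
    "JPEG"                 => UInt8[0xff, 0xd8, 0xff],
]

# Return the name of the first signature that matches the start of `bytes`,
# or "unknown" if none matches.
function identify_format(bytes::AbstractVector{UInt8})
    for (name, sig) in SIGNATURES
        n = length(sig)
        if length(bytes) >= n && bytes[1:n] == sig
            return name
        end
    end
    return "unknown"
end

# The first eight bytes of a little-endian TIFF header (made-up example input).
format = identify_format(UInt8[0x49, 0x49, 0x2a, 0x00, 0x08, 0x00, 0x00, 0x00])
```

In practice the input would be the body of a small range request, for example the first 16 bytes of a remote file. Zarr has no such signature, since it is a directory-like store of many objects rather than a single file.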