Rasters are in essence structured multidimensional data. In this lecture we cover the Cloud Optimized GeoTIFF (COG), Zarr and HDF5/NetCDF file formats.
👨🏼‍🏫 Given by Maarten
As geospatial datasets grow in size and availability, downloading or copying everything is no longer feasible, and custom subsetting services become too expensive to scale, configure and maintain. The simplest and most widely adopted technology for fetching only part of a file is the HTTP range request (’97). However, a range request only gives you a stream (vector) of bytes, so geospatial file formats need to change to support this use case. Essentially, a file needs a (spatial) index that tells a client which bytes to request; spatial coherence (tiling/chunking) and overviews (pyramids) are also good to have. Some file formats are flexible enough to support this directly (COG), but more generic multidimensional formats such as HDF5/NetCDF aren’t. For those datasets, you can build external indexes (Kerchunk) or use a similar but exploded file structure (Zarr).
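using HTTP
resp = HTTP.get(url; headers=["Range" => "bytes=0-300000"])  # first 300,001 bytes (the range is inclusive); a server that honours it replies 206 Partial Content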
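
To make the role of the index concrete, here is a minimal sketch of how a client could turn an index entry into a range request. The offsets and byte counts are made-up numbers, and `tile_range` is a hypothetical helper, not part of any library. In a COG the equivalent information lives in the TIFF `TileOffsets` and `TileByteCounts` tags.

```julia
# Hypothetical index: byte offset and length of each tile in the file.
tile_offsets    = [1024, 66560, 132096]
tile_bytecounts = [65536, 65536, 40000]

# Build the value of an HTTP Range header for tile `i` (1-based).
# Both ends of a byte range are inclusive, hence the `- 1`.
function tile_range(offsets, bytecounts, i)
    first_byte = offsets[i]
    last_byte = offsets[i] + bytecounts[i] - 1
    return "bytes=$(first_byte)-$(last_byte)"
end

tile_range(tile_offsets, tile_bytecounts, 2)  # "bytes=66560-132095"
```

The resulting string can be passed as the `"Range"` header, so only the bytes of that one tile are transferred.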

On magic bytes as file format identifiers, see https://en.wikipedia.org/wiki/List_of_file_signatures
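
As an illustration, the first few bytes of a file (for example the start of a range request response) are often enough to tell these formats apart. The sketch below is a simplified assumption of how that could look; `SIGNATURES` and `sniff_format` are hypothetical names. It only checks offset 0, whereas an HDF5 signature may also sit at offset 512, 1024, 2048 and so on.

```julia
# Magic-byte prefixes for the formats in this lecture.
const SIGNATURES = [
    "HDF5 (also NetCDF-4)"    => UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a],
    "TIFF (little-endian)"    => UInt8[0x49, 0x49, 0x2a, 0x00],  # "II*\0"
    "TIFF (big-endian)"       => UInt8[0x4d, 0x4d, 0x00, 0x2a],  # "MM\0*"
    "BigTIFF (little-endian)" => UInt8[0x49, 0x49, 0x2b, 0x00],
    "BigTIFF (big-endian)"    => UInt8[0x4d, 0x4d, 0x00, 0x2b],
    "NetCDF classic"          => UInt8[0x43, 0x44, 0x46],        # "CDF", then a version byte
]

# Return the name of the first signature that prefixes `bytes`, or "unknown".
function sniff_format(bytes::AbstractVector{UInt8})
    for (name, sig) in SIGNATURES
        n = length(sig)
        if length(bytes) >= n && bytes[1:n] == sig
            return name
        end
    end
    return "unknown"
end

sniff_format(UInt8[0x49, 0x49, 0x2a, 0x00, 0x08, 0x00, 0x00, 0x00])  # "TIFF (little-endian)"
```

A COG is a valid TIFF, so it is reported as TIFF here. Zarr is a directory of files rather than a single file, so it has no signature of its own.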