Contributor
Humaid Kidwai

Cloud-native OGC SensorThings API


Mentors
Benjamin Pross
Organization
52°North Spatial Information Research GmbH
Technologies
java, spring, aws, postgres, GeoParquet, Apache Iceberg
Topics
sensor web, big data, geospatial, data lakehouse
OGC SensorThings API is an international standard that aims to eliminate vendor lock-in in IoT systems and to create an open geospatial ecosystem by defining a standard data model and describing how to retrieve the ingested data. However, the standard itself does not specify how the data is physically stored. Most current server implementations are based on Postgres databases with the PostGIS extension. Unfortunately, relational databases are not a great choice for storing the large volumes of data that IoT applications often produce. As a result, retrieving large volumes of data from any of the open source SensorThings API servers is painfully slow.

Modern data lakehouse standards and cloud-native geospatial file formats offer a scalable, modular, cost-effective and much faster way to store and work with large volumes of geospatial data on the web. Specifically, Apache Iceberg is an open table format that organizes data lakes in object stores and provides ACID guarantees. GeoParquet is a cloud-native geospatial file encoding based on a columnar storage format for tabular data (Apache Parquet) that significantly compresses the data and improves query efficiency.

A cloud-native SensorThings API extension using Apache Iceberg could significantly enhance the standard's ability to ingest and aggregate large heterogeneous streams of sensor data. The proposal therefore puts forth a design architecture that lets any SensorThings API server use Iceberg to store and retrieve sensor data more efficiently, reducing memory overhead and network latency when retrieving such data over the web. Eventually, once GeoParquet is merged into Iceberg, the implementation will offer a much faster alternative to existing SensorThings API implementations for handling sensor data at scale.
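
To illustrate the storage idea described above, here is a minimal sketch, not the project's actual design, of how observations could be grouped into day-based partitions before being written to a table in an object store. The class `ObservationPartitioning`, the simplified `Observation` record, and the `datastream_id=.../phenomenon_time_day=...` path layout are assumptions made for this example. The only Iceberg-specific fact it relies on is that Iceberg's `day` partition transform maps a timestamp to the number of days since 1970-01-01 (UTC). The sketch uses plain Java and does not call the Iceberg library.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper: assigns SensorThings observations to day-based partitions. */
final class ObservationPartitioning {

    /** Simplified stand-in for a SensorThings Observation (only three of its properties). */
    record Observation(long datastreamId, Instant phenomenonTime, double result) {}

    /** Days since 1970-01-01 UTC, the value produced by a day-granularity partition transform. */
    static long dayPartition(Instant phenomenonTime) {
        // floorDiv keeps pre-1970 timestamps in the correct (negative) day.
        return Math.floorDiv(phenomenonTime.getEpochSecond(), 86_400L);
    }

    /** Assumed Hive-style layout: one directory per datastream and per day. */
    static String partitionPath(Observation o) {
        LocalDate day = LocalDate.ofEpochDay(dayPartition(o.phenomenonTime()));
        return "datastream_id=" + o.datastreamId() + "/phenomenon_time_day=" + day;
    }

    /** Groups observations so that each group could be written as one columnar data file. */
    static Map<String, List<Observation>> groupByPartition(List<Observation> observations) {
        return observations.stream().collect(
                Collectors.groupingBy(ObservationPartitioning::partitionPath,
                        TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Observation> batch = List.of(
                new Observation(7L, Instant.parse("2024-03-01T08:15:00Z"), 21.5),
                new Observation(7L, Instant.parse("2024-03-01T23:59:59Z"), 19.8),
                new Observation(7L, Instant.parse("2024-03-02T00:00:00Z"), 19.7));

        // Print each partition path with the number of observations it would hold.
        groupByPartition(batch).forEach((path, rows) ->
                System.out.println(path + " -> " + rows.size() + " observation(s)"));
    }
}
```

Partitioning by datastream and day is one plausible choice because typical SensorThings queries filter on a Datastream and a time range, so a query engine could skip files outside the requested partitions. A real implementation would declare the partitioning in the table's metadata instead of building paths by hand.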