Apache DataFusion
Foundational library for DBs and compute engines
DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format.
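A columnar in-memory format stores each column's values contiguously instead of storing whole rows together. The following is a minimal std-only Rust sketch of that idea. It is not the Arrow or DataFusion API. The types `Int64Column` and `Batch` are hypothetical. Arrow packs NULL information into a validity bitmap, and this sketch simplifies that to a `Vec<bool>`. A kernel such as `sum` then makes one pass over contiguous buffers instead of visiting row objects one at a time.

```rust
/// Hypothetical column type: contiguous values plus per-slot validity.
/// (Arrow uses a packed validity bitmap; Vec<bool> keeps this sketch simple.)
#[derive(Debug, Clone)]
struct Int64Column {
    values: Vec<i64>,
    validity: Vec<bool>, // true = value present, false = NULL
}

impl Int64Column {
    /// Build a column from optional values, storing 0 as a placeholder in NULL slots.
    fn from_options(data: &[Option<i64>]) -> Self {
        let values = data.iter().map(|v| v.unwrap_or(0)).collect();
        let validity = data.iter().map(|v| v.is_some()).collect();
        Int64Column { values, validity }
    }

    /// Column-at-a-time kernel: sum of the non-NULL values in one pass.
    fn sum(&self) -> i64 {
        self.values
            .iter()
            .zip(&self.validity)
            .filter(|(_, valid)| **valid)
            .map(|(v, _)| *v)
            .sum()
    }

    /// Number of NULL slots in the column.
    fn null_count(&self) -> usize {
        self.validity.iter().filter(|v| !**v).count()
    }
}

/// Hypothetical record batch: named columns of equal length.
struct Batch {
    names: Vec<&'static str>,
    columns: Vec<Int64Column>,
}

impl Batch {
    /// Look up a column by name.
    fn column(&self, name: &str) -> Option<&Int64Column> {
        self.names.iter().position(|n| *n == name).map(|i| &self.columns[i])
    }
}

fn main() {
    // Rows (a, b): (1, 10), (2, NULL), (3, 30), stored column by column.
    let batch = Batch {
        names: vec!["a", "b"],
        columns: vec![
            Int64Column::from_options(&[Some(1), Some(2), Some(3)]),
            Int64Column::from_options(&[Some(10), None, Some(30)]),
        ],
    };
    let b = batch.column("b").unwrap();
    println!("sum(b) = {}, nulls = {}", b.sum(), b.null_count()); // sum(b) = 40, nulls = 1
}
```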
“Out of the box,” DataFusion offers SQL and DataFrame APIs, excellent performance, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community. Python bindings are also available.
DataFusion features a full query planner, a columnar, streaming, multi-threaded, vectorized execution engine, and partitioned data sources. You can customize DataFusion at almost every point, including additional data sources, query languages, functions, custom operators, and more.
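To illustrate how partitioned sources, multi-threading, streaming, and vectorized kernels fit together, here is a minimal std-only Rust sketch of `SELECT SUM(x) WHERE x > threshold`. It is a conceptual model, not how DataFusion's operators are implemented: DataFusion works on Arrow record batch streams, while this sketch uses plain `Vec<i64>` batches and `std::thread::scope`. The functions `gt_mask`, `masked_sum`, and `parallel_filtered_sum` are hypothetical. Each partition is processed on its own thread, one batch at a time, using a filter that evaluates the predicate over a whole column to build a mask. A final merge step then combines the per-partition partial sums.

```rust
use std::thread;

/// Hypothetical, simplified batch: a single column of i64 values.
type Batch = Vec<i64>;

/// Vectorized filter kernel: evaluate `value > threshold` over the whole column,
/// producing a boolean selection mask.
fn gt_mask(column: &[i64], threshold: i64) -> Vec<bool> {
    column.iter().map(|v| *v > threshold).collect()
}

/// Apply the mask and sum the selected values.
fn masked_sum(column: &[i64], mask: &[bool]) -> i64 {
    column
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .sum()
}

/// One thread per partition; each thread streams through its batches and keeps
/// only a running partial aggregate, then the partial results are merged.
fn parallel_filtered_sum(partitions: Vec<Vec<Batch>>, threshold: i64) -> i64 {
    thread::scope(|s| {
        let handles: Vec<_> = partitions
            .iter()
            .map(|batches| {
                s.spawn(move || {
                    batches
                        .iter()
                        .map(|batch| masked_sum(batch, &gt_mask(batch, threshold)))
                        .sum::<i64>()
                })
            })
            .collect();
        // Merge step: combine the partial aggregates from every partition.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<i64>()
    })
}

fn main() {
    // Two hypothetical partitions (e.g. two files), each yielding small batches.
    let partitions = vec![
        vec![vec![1, 5, 9], vec![12, 3]],
        vec![vec![7, 2], vec![10, 4, 6]],
    ];
    // Values > 5: 9 + 12 in the first partition, 7 + 10 + 6 in the second.
    println!("{}", parallel_filtered_sum(partitions, 5)); // 44
}
```

Splitting the aggregate into per-partition partial results followed by a merge is what lets the partitions run independently; only the small partial sums cross thread boundaries.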
Contributor Guidance