Contributor
BinRuan

MariaDB Columnstore - Parquet support in cpimport - proposal - Bin Ruan


Mentors
Gagan Goel
Organization
MariaDB
Technologies
c++, git
Topics
database
The task is to make cpimport accept Parquet files as input, or to design a general framework for adding new input formats. Because cpimport does its work in two stages, read and parse, I plan to address the problem in both. For the read stage, I will rely on external libraries such as `parquet-cpp` and `arrow`: with parquet-cpp I can read a Parquet file directly, and with `arrow::Table` I can hold the data in table form. In the parse stage, I plan to take that table-format data and parse it into rows to insert into the target table. Once the task is complete, cpimport should be able to take Parquet files, and potentially other popular formats such as Arrow and Avro, as input and insert their data into the database.
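The two-stage split described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not cpimport's actual code: `ColumnTable`, `InputReader`, `DelimitedReader`, `makeReader`, and `toRows` are hypothetical names, `ColumnTable` is a plain standard-library stand-in for `arrow::Table`, and a toy delimited-text reader takes the place of a Parquet reader so the example has no external dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical column-oriented table, standing in for arrow::Table.
struct ColumnTable {
    std::vector<std::string> names;                 // one name per column
    std::vector<std::vector<std::string>> columns;  // columns[c][r]
    std::size_t numRows() const {
        return columns.empty() ? 0 : columns[0].size();
    }
};

// Read stage: each input format supplies one implementation.
class InputReader {
public:
    virtual ~InputReader() = default;
    virtual ColumnTable read(std::istream& in) = 0;
};

// Toy reader for delimited text (no quoting or escaping). A Parquet
// reader would implement the same interface on top of Arrow.
class DelimitedReader : public InputReader {
public:
    explicit DelimitedReader(char delim) : delim_(delim) {}

    ColumnTable read(std::istream& in) override {
        ColumnTable table;
        std::string line;
        bool isHeader = true;
        while (std::getline(in, line)) {
            if (line.empty()) continue;
            std::vector<std::string> fields = split(line);
            if (isHeader) {
                // First non-empty line names the columns.
                table.names = fields;
                table.columns.resize(fields.size());
                isHeader = false;
                continue;
            }
            if (fields.size() != table.names.size())
                throw std::runtime_error("row width does not match header");
            for (std::size_t c = 0; c < fields.size(); ++c)
                table.columns[c].push_back(fields[c]);
        }
        return table;
    }

private:
    std::vector<std::string> split(const std::string& line) const {
        std::vector<std::string> out;
        std::string field;
        std::istringstream ss(line);
        while (std::getline(ss, field, delim_)) out.push_back(field);
        return out;
    }
    char delim_;
};

// Format dispatch: adding a new format means adding one branch here.
std::unique_ptr<InputReader> makeReader(const std::string& format) {
    if (format == "csv") return std::make_unique<DelimitedReader>(',');
    if (format == "tsv") return std::make_unique<DelimitedReader>('\t');
    throw std::invalid_argument("unsupported input format: " + format);
}

// Parse stage: turn the column-oriented table into rows for insertion.
std::vector<std::vector<std::string>> toRows(const ColumnTable& table) {
    std::vector<std::vector<std::string>> rows(table.numRows());
    for (std::size_t c = 0; c < table.columns.size(); ++c) {
        assert(table.columns[c].size() == rows.size());  // columns must align
        for (std::size_t r = 0; r < rows.size(); ++r)
            rows[r].push_back(table.columns[c][r]);
    }
    return rows;
}
```

With this shape, the parse stage depends only on the table type, so supporting Parquet, Arrow, or Avro would mean adding a new `InputReader` implementation and one branch in `makeReader`, while `toRows` stays unchanged.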