A typical workflow in interactive data analysis consists of:
- Loading data (e.g. a CSV on disk)
- Transforming the data
- Various data processing stages
- Storing the result in some form (e.g. in a database)
The goal of this project is to provide a unified, idiomatic Haskell way of carrying out these tasks. Informally, you can think of “dplyr”/“tidyr” from the R ecosystem, but type-safe. The library should provide the following features:
- An efficient data structure for possibly larger-than-memory tabular data. The Frames library is notable prior work, and this project may build on top of it, for instance by extending its functionality for generating types from stored data.
- A set of functions to “tidy”/clean the data into a form fit for further analysis, e.g. spreading a key-value pair of columns across multiple columns (“spread”) or, conversely, collapsing multiple columns into key-value pairs (“gather”).
- A DSL for performing a representative set of relational operations, such as filtering and aggregation.
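To make the “gather”/“spread” pair concrete, here is a minimal sketch in plain Haskell. It assumes rows are represented as untyped `Map String String` values (column name to cell value); this representation and the `gather`/`spread` signatures are hypothetical illustrations, not the Frames API or a settled design, and a type-safe version would track columns at the type level instead. It uses only `Data.Map.Strict` from the `containers` package and can be loaded in GHCi.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical untyped row: column name -> cell value.
-- (Assumption for illustration only; the project aims for typed columns.)
type Row = M.Map String String

-- Wide to long: each listed column becomes one output row holding the
-- column name under keyCol and its cell under valCol. All other columns
-- are copied unchanged. Columns missing from a row are skipped.
gather :: String -> String -> [String] -> [Row] -> [Row]
gather keyCol valCol cols = concatMap expand
  where
    expand row =
      let rest = foldr M.delete row cols
      in [ M.insert keyCol c (M.insert valCol v rest)
         | c <- cols
         , Just v <- [M.lookup c row] ]

-- Long to wide: the inverse of gather. Rows that agree on every column
-- other than keyCol/valCol are merged into one row, where the value of
-- keyCol becomes a new column name holding the value of valCol.
spread :: String -> String -> [Row] -> [Row]
spread keyCol valCol rows =
  M.elems (M.fromListWith M.union
    [ (rest, M.insert k v rest)
    | row <- rows
    , Just k <- [M.lookup keyCol row]
    , Just v <- [M.lookup valCol row]
    , let rest = M.delete keyCol (M.delete valCol row) ])
```

For a wide row with columns `country`, `1999`, `2000`, calling `gather "year" "cases" ["1999", "2000"]` yields two long rows (one per year), and applying `spread "year" "cases"` to those rows restores the original wide row.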
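As a rough illustration of what a filtering/aggregation DSL could look like, the sketch below builds two small combinators over ordinary lists of typed records. The names `where_` and `groupAggregate`, and the `Sale` record, are hypothetical and chosen for this example only; a real library would likely operate on a columnar or streaming representation rather than Haskell lists.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical example record; field types are checked by the compiler.
data Sale = Sale { region :: String, amount :: Double }
  deriving (Show, Eq)

-- Filtering: keep only rows satisfying the predicate (relational selection).
where_ :: (r -> Bool) -> [r] -> [r]
where_ = filter

-- Grouping + aggregation: bucket rows by a key, then reduce each bucket.
-- flip (++) appends new rows after existing ones, preserving input order.
groupAggregate :: Ord k => (r -> k) -> ([r] -> a) -> [r] -> M.Map k a
groupAggregate key agg rows =
  M.map agg (M.fromListWith (flip (++)) [ (key r, [r]) | r <- rows ])

-- Composed query: total positive sales per region.
totals :: [Sale] -> M.Map String Double
totals = groupAggregate region (sum . map amount) . where_ ((> 0) . amount)
```

Because queries are ordinary functions, they compose with `.`, and referring to a nonexistent field or summing a non-numeric column is rejected at compile time, which is the kind of type safety the project aims for.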