Operator Based Machine Learning Pipeline Construction
- Mentors: Lars Kotthoff, Bernd Bischl
- Organization: R Project for Statistical Computing
The package mlr is a comprehensive machine learning toolkit for R. It provides a standardized interface to over sixty machine learning R packages, together with a wide range of features for visualization, data manipulation, model evaluation and selection, and parameter tuning. Although mlr can perform data preprocessing automatically when a machine learning algorithm is applied, the current implementation is limited in scope and functionality. This project extends mlr's preprocessing capabilities by developing a supplementary package, mlrCPO, whose API gives the user more flexibility and provides access to a wider range of preprocessing methods. The project introduces a first-class CPO ("Composable Preprocessing Operator") object, which represents a particular data transformation procedure and can be organized into pipelines using the composition operator %>>%. CPO classes are provided for many of the most popular and widely used preprocessing methods.
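To make the idea of composable preprocessing operators concrete, here is a minimal, language-neutral sketch written in Python rather than R. It is not mlrCPO's API: the class names CPO, Pipeline, Center and Scale are hypothetical, Python's `>>` operator stands in for %>>%, and the fit/transform split is an assumption made for illustration. Each operator learns its state (column means, column standard deviations) from training data and then applies the same transformation to new data. Composing two operators yields another operator, so pipelines can themselves be composed further.

```python
class CPO:
    """Hypothetical base class for a composable preprocessing operator."""

    def fit(self, data):
        # Stateless operators have nothing to learn.
        return self

    def transform(self, data):
        raise NotImplementedError

    def __rshift__(self, other):
        # `a >> b` plays the role of `a %>>% b`: apply a first, then b.
        return Pipeline([self, other])


class Pipeline(CPO):
    """A composition of operators; it is itself a CPO, so it composes too."""

    def __init__(self, steps):
        # Flatten nested pipelines so (a >> b) >> c holds three plain steps.
        self.steps = []
        for step in steps:
            self.steps.extend(step.steps if isinstance(step, Pipeline) else [step])

    def fit(self, data):
        # Fit each step on the output of the previous, already fitted step.
        for step in self.steps:
            data = step.fit(data).transform(data)
        return self

    def transform(self, data):
        for step in self.steps:
            data = step.transform(data)
        return data


class Center(CPO):
    """Subtract the per-column mean learned during fit."""

    def fit(self, data):
        n = len(data)
        self.means = [sum(col) / n for col in zip(*data)]
        return self

    def transform(self, data):
        return [[x - m for x, m in zip(row, self.means)] for row in data]


class Scale(CPO):
    """Divide by the per-column (population) standard deviation from fit."""

    def fit(self, data):
        n = len(data)
        self.sds = []
        for col in zip(*data):
            mu = sum(col) / n
            sd = (sum((x - mu) ** 2 for x in col) / n) ** 0.5
            self.sds.append(sd if sd > 0 else 1.0)  # avoid division by zero
        return self

    def transform(self, data):
        return [[x / s for x, s in zip(row, self.sds)] for row in data]


if __name__ == "__main__":
    train_data = [[1.0, 10.0], [3.0, 30.0]]
    pipe = Center() >> Scale()
    pipe.fit(train_data)
    print(pipe.transform(train_data))    # [[-1.0, -1.0], [1.0, 1.0]]
    print(pipe.transform([[2.0, 20.0]]))  # [[0.0, 0.0]] (reuses training state)
```

Because a composed pipeline is an operator in its own right, a user can build and reuse preprocessing chains without modifying the learner that consumes their output. That separation is the flexibility the paragraph above describes.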