This GSoC project aims to develop a tool for managing datasets and experimenting with data. The project matters because most of the time spent solving a machine learning problem goes into data manipulation. Better data generally leads to better results. However, real-world data rarely come in a uniform format and tend to be noisy, with mistakes, outliers, and missing values. The project aims to provide a user-friendly, easy-to-use command-line application with functionality to monitor data, convert wrongly labeled features, and reduce noisy variables, among other tasks. Clear documentation will accompany the tool to encourage others to use it.




  • Tham