Categorical data support for Daru, Statsample and Statsample-glm
- Mentors
- zverok, Alexej Gossmann, Rodrigo Botafogo, Sameer Deshmukh
- Organization
- Ruby Science Foundation
In data science, what matters is gaining insight from data, and any good data analysis tool must provide the key functionality needed to extract information from it. Categorical data is ubiquitous in data analysis, yet Daru does not currently support it, and operations on it, such as regression, are not supported in Statsample and Statsample-glm. This shortcoming prevents data from being analyzed easily and to its full extent. This project aims to close that gap.
The project has two broad goals:
- The first is to store and manage categorical data efficiently in a data frame. This will be done with a new data type, CategoricalData, and a dedicated class, CategoricalIndex. It also involves easy visualization of categorical data through plotting functionality.
- The second is to update statistical analyses, such as regression, in Statsample and Statsample-glm so that they take categorical data into account.
With both of these tasks accomplished, one would be able to see data more clearly.
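To make the two goals concrete, here is a minimal sketch in plain Ruby. It is not Daru's or Statsample's API: the class name `CategoricalSketch` and its methods are hypothetical, chosen only for illustration. It assumes that "efficient storage" means keeping one integer code per observation alongside a short list of category labels, and that regression support relies on dummy coding with the first category as the reference level. The project's actual design may differ.

```ruby
# Hypothetical illustration only; not the Daru or Statsample API.
class CategoricalSketch
  attr_reader :categories, :codes

  def initialize(values)
    # Category labels, in order of first appearance
    @categories = values.uniq
    lookup = @categories.each_with_index.to_h
    # One small integer per observation instead of a repeated label
    @codes = values.map { |v| lookup[v] }
  end

  # Number of observations in each category (the data behind a bar plot)
  def frequencies
    @categories.each_with_index.map { |cat, i| [cat, @codes.count(i)] }.to_h
  end

  # Dummy coding: one 0/1 column per category, except the first,
  # which serves as the reference level in a regression
  def dummy_columns
    @categories.each_with_index.drop(1).map do |cat, i|
      [cat, @codes.map { |c| c == i ? 1 : 0 }]
    end.to_h
  end
end

colors = CategoricalSketch.new(%w[red green red blue green red])
p colors.codes
p colors.frequencies
p colors.dummy_columns
```

The 0/1 columns returned by `dummy_columns` are the kind of numeric input a regression routine can consume, which is why a categorical type in the data frame is a prerequisite for the second goal.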