Making improvements in CRRao.jl
- Mentors
- Nosferican, Sourish Das
- Organization
- The Julia Language
- Technologies
- julia, GitHub Actions
- Topics
- Statistical Modeling, Statistical Inference
This proposal adds several missing features to the package: new models, metaprogramming, lazy evaluation, testing, documentation, and linting.
Currently, the package supports only four regression models (Linear, Logistic, Poisson and Negative Binomial regression), so there is ample scope for adding more. To start with, we can add the Gamma and Inverse Gaussian regression models, and the ARIMA model for time series. Further models can be added over time.
There is also scope for adding new macros to the package to make the syntax cleaner. An example of this is in the fitmodel function of the package (and is explained via a code snippet in the proposal).
The documentation also shows that every model has many attributes, all of which are computed when the fit function is called. When working with big data, not all of these attribute values may be needed, so computing every attribute up front is inefficient. Instead, the attributes can be computed lazily, i.e. as and when they are needed.
Rigorous unit testing should be added to the package to ensure code correctness. The package's documentation and code coverage also need considerable work. All of these can be automated using CI/CD.
Finally, in its current state, the package lacks strict code conventions (such as rules for variable names and commit messages). These are essential for maintaining code quality and for making the job easier for future contributors. Linting can be added to the code, along with Git hooks, to enforce these conventions. A well-defined API should also be documented for the models, so that new models can be added with ease.
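The lazy attribute computation described above can be illustrated with a minimal sketch in plain Julia. The type `LazyFit` and the functions `coefficients` and `residuals` are hypothetical names chosen for this illustration and are not part of CRRao.jl's existing API; ordinary least squares stands in for whatever model is actually being fitted. Each attribute is computed on first access and then served from a cache.

```julia
using LinearAlgebra  # standard library; provides the least-squares solve via `\`

# Hypothetical container for a fitted model (an assumption for illustration only).
# It stores the data and an initially empty cache of computed attributes.
struct LazyFit
    X::Matrix{Float64}
    y::Vector{Float64}
    cache::Dict{Symbol,Any}
end

# Convenience constructor: nothing is computed at construction time.
LazyFit(X::Matrix{Float64}, y::Vector{Float64}) = LazyFit(X, y, Dict{Symbol,Any}())

# Least-squares coefficients, computed on first access and cached afterwards.
function coefficients(fit::LazyFit)
    get!(fit.cache, :coefficients) do
        fit.X \ fit.y
    end
end

# Residuals depend on the coefficients, which are fetched (or computed) first.
function residuals(fit::LazyFit)
    beta = coefficients(fit)
    get!(fit.cache, :residuals) do
        fit.y - fit.X * beta
    end
end

X = [1.0 1.0; 1.0 2.0; 1.0 3.0]   # intercept column plus one predictor
y = [2.0, 4.0, 6.0]
fit = LazyFit(X, y)

coefficients(fit)                 # triggers the solve and caches the result
haskey(fit.cache, :residuals)     # false: residuals have not been requested yet
```

In this sketch, a user who only needs the coefficients never pays for the residuals. The same pattern could extend to more expensive attributes, at the cost of keeping the data alive inside the fitted object.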