Machine Learning for Anomaly Detection in Open Source Communities
- Mentors
- Sean Goggins, Carter Landis, Gabe Heim
- Organization
- CHAOSS
Open-source software development is a collaborative effort that relies on decentralized decision making among many developers and maintainers. To measure a project's progress, it is important to quantify code changes over time. CHAOSS provides analytics and metrics that help open source communities measure the impact of developers' work on a project and the impact of the project on its community. Augur is a prototype implementation of the CHAOSS Project's open source software metrics that systematically integrates data from open-source repositories, issue trackers, mailing lists, and similar sources. Anomaly detection is a common data science strategy for finding extreme data points (outliers) whose features differ vastly from those of normal data points. From an open-source software development perspective, it detects unusual surges and drops in development activities such as code commits and pull requests. This project aims to use several machine learning techniques to identify the different types of anomalies present in trace data and to deliver personalized notifications to users.
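To make the idea of surges and drops concrete, here is a minimal sketch of a simple statistical baseline (a modified z-score built on the median and the median absolute deviation). It is not the project's or Augur's actual method, which this description does not specify. The function name `flag_anomalies`, the threshold of 3.5, and the weekly commit counts are all illustrative assumptions.

```python
from statistics import median


def flag_anomalies(counts, threshold=3.5):
    """Flag values far from the median using the modified z-score.

    Hypothetical helper for illustration only; the threshold of 3.5 is a
    commonly used rule of thumb, not a value taken from the project.
    """
    med = median(counts)
    # Median absolute deviation: a robust measure of spread.
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return []
    flagged = []
    for i, c in enumerate(counts):
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * (c - med) / mad
        if abs(score) > threshold:
            flagged.append((i, "surge" if score > 0 else "drop"))
    return flagged


# Made-up weekly commit counts for a single repository (assumption).
weekly_commits = [21, 19, 23, 20, 22, 85, 18, 21, 0, 20]
print(flag_anomalies(weekly_commits))
```

With this sample data, the week with 85 commits is flagged as a surge and the week with 0 commits as a drop. A median-based score is used here instead of the mean and standard deviation because the outliers themselves would otherwise inflate the spread and mask each other.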