Contributor
Vidit Agarwal

Quality Control: Consensus


Mentors
Maxim, Roman Donchenko
Organization
CVAT
Technologies
python, django, typescript
Topics
web, statistics, Data Driven Machine Learning
Problems faced while developing labelled data:
1. Annotations made by a single person can carry that person's biases into the data and, eventually, into the model.
2. Some tasks are subjective and may not have a single clear answer.
3. The dataset might be too large for a few people to annotate.

To resolve this, multiple people can annotate the same data, and their annotations can then be aggregated based on consensus. Crowdsourcing has become an essential paradigm for efficiently labelling large datasets, especially in the era of data-driven ML solutions. However, the challenge lies in aggregating labels from a diverse pool of annotators, each with varying reliability. This project aims to develop solutions to the above problems and integrate them into CVAT. It