Quality Control: Consensus
- Mentors
- Maxim, Roman Donchenko
- Organization
- CVAT
- Technologies
- python, django, typescript
- Topics
- web, statistics, Data Driven Machine Learning
Problems faced when creating labelled data:
1. Annotations made by a single person can introduce that person's biases into the data and, eventually, into the model
2. Some tasks are subjective and may not have a single clear answer
3. The dataset might be too large for a few people to annotate
One solution is to have multiple people annotate the same data and then aggregate their annotations by consensus. Crowdsourcing has become an essential paradigm for efficiently labelling large datasets, especially in the era of data-driven ML solutions. The challenge, however, lies in aggregating labels from a diverse pool of annotators whose reliability varies.
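To make the aggregation step concrete, here is a minimal sketch of consensus by (optionally weighted) majority vote for a single classification label. It is only an illustration, not CVAT's implementation or API: the function name `majority_vote`, the `weights` parameter, and the idea of expressing annotator reliability as a single number are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(labels, weights=None):
    """Aggregate one item's labels into a consensus label.

    labels:  one label per annotator, e.g. ["cat", "cat", "dog"].
    weights: optional per-annotator reliability weights (hypothetical;
             defaults to equal weight for everyone).

    Returns (winning_label, agreement), where agreement is the share of
    the total weight that voted for the winning label.
    """
    if not labels:
        raise ValueError("at least one label is required")
    if weights is None:
        weights = [1.0] * len(labels)
    if len(weights) != len(labels):
        raise ValueError("labels and weights must have the same length")

    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Sum the weight behind each distinct label.
    totals = Counter()
    for label, weight in zip(labels, weights):
        totals[label] += weight

    # On a tie, the label that was seen first wins.
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total_weight


# Equal weights: two of three annotators chose "cat".
label, agreement = majority_vote(["cat", "cat", "dog"])
print(label, round(agreement, 2))

# One reliable annotator can outweigh two less reliable ones.
label, agreement = majority_vote(["cat", "dog", "dog"], [0.9, 0.3, 0.3])
print(label, round(agreement, 2))
```

The agreement value gives a rough signal for the subjective cases mentioned above: a low score suggests the item has no clear answer and may need review. Real annotations such as bounding boxes or polygons would first have to be matched across annotators before any vote like this could be applied.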
This project aims to develop solutions to the problems above and integrate them into CVAT. It