Contributor
Kaushlendra Pratap Singh

Copyright False Positive Detection Using ML


Mentors
HastagAB, Anupam Ghosh, GMishx, Vasudev
Organization
FOSSology

FOSSology's copyright detection agent uses a rule-based approach to detect copyright statements, but it currently produces many false positives. This project proposes using ML techniques to improve its accuracy. The most likely approach is to use NLP for data pre-processing and then build a Knowledge Base Relation corpus, on top of which different ML algorithms can be applied. Applying checks at different levels, and trimming each detected statement down to only the part that contains the actual copyright statement, should further reduce the false positives.
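To make the idea concrete, here is a minimal sketch of the "pre-process, then classify" step in Python, using only the standard library. It is an illustration under stated assumptions, not the project's actual implementation: the `preprocess` function, the `NaiveBayes` class, the `<sym>` and `<year>` placeholder tokens, and the toy training sentences are all hypothetical, and Naive Bayes is used only because it is a simple text classifier, not because the project has selected it.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Normalize a candidate string into coarse tokens (hypothetical scheme)."""
    text = text.lower()
    # Collapse copyright symbols and years into placeholder tokens so the
    # classifier learns the pattern instead of memorizing specific values.
    text = re.sub(r"\(c\)|©", " <sym> ", text)
    text = re.sub(r"\b(19|20)\d{2}\b", " <year> ", text)
    return re.findall(r"<\w+>|[a-z]+", text)


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter()
        for text, label in samples:
            self.class_counts[label] += 1
            self.word_counts[label].update(preprocess(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens, label):
        total = sum(self.word_counts[label].values())
        prior = self.class_counts[label] / sum(self.class_counts.values())
        score = math.log(prior)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                count = self.word_counts[label][tok] + 1
                score += math.log(count / (total + len(self.vocab)))
        return score

    def is_copyright(self, text):
        tokens = preprocess(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Toy, made-up training data: True = real copyright statement,
# False = text a rule-based scanner might wrongly flag.
TRAINING = [
    ("Copyright (c) 2019 Acme Corp", True),
    ("© 2020 Jane Doe. All rights reserved.", True),
    ("Copyright 2015-2018 Example Foundation", True),
    ("if the copyright holder is unknown, skip the file", False),
    ("this function parses copyright strings", False),
    ("see the copyright notice in the LICENSE file", False),
]

model = NaiveBayes().fit(TRAINING)
print(model.is_copyright("Copyright (c) 2021 Some Company"))
print(model.is_copyright("the copyright holder of the file is unknown"))
```

In a setup like this, the classifier would run as a second-level check on the strings the rule-based agent has already flagged, so it only needs to separate genuine statements from false positives rather than scan whole files. A real model would need a much larger labeled corpus than the six sentences used here.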