CuckooML will deliver a mechanism for finding similarities between malware samples by analysing the reports generated for them. The software will also be able to detect new types of malware, that is, reports unlike anything seen before (anomalies), which should make it a valuable tool for security researchers.
Over the course of the project, state-of-the-art data science and machine learning approaches will be developed and integrated into the Cuckoo package, where they will be accessible both as a command-line toolkit and through a web-based interface.
First and foremost, a set of features that represents the diversity of malware reports well will be engineered; it is common knowledge that classification, and clustering in particular, can only be as good as the features it uses.
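To make the idea of engineered features concrete, the sketch below turns reports into binary feature vectors. It is only an illustration: the report layout (the `signatures`, `api_calls` and `network` keys) is a simplified assumption, and the helper names `extract_features` and `vectorise` are hypothetical rather than part of Cuckoo.

```python
def extract_features(report):
    """Flatten one report dict into a set of string features.

    The keys read here are an assumed, simplified report layout.
    """
    features = set()
    for signature in report.get("signatures", []):
        features.add("sig:" + signature["name"])
    for call in report.get("api_calls", []):
        features.add("api:" + call)
    for host in report.get("network", {}).get("hosts", []):
        features.add("host:" + host)
    return features


def vectorise(reports):
    """Build a shared vocabulary and one binary vector per report."""
    feature_sets = [extract_features(r) for r in reports]
    vocabulary = sorted(set().union(*feature_sets))
    vectors = [[1 if f in fs else 0 for f in vocabulary] for fs in feature_sets]
    return vocabulary, vectors


# Two made-up reports, for illustration only.
reports = [
    {"signatures": [{"name": "creates_exe"}],
     "api_calls": ["CreateFileW", "WriteFile"]},
    {"signatures": [{"name": "creates_exe"}],
     "network": {"hosts": ["10.0.0.1"]}},
]
vocabulary, vectors = vectorise(reports)
print(vocabulary)
print(vectors)
```

Each position in a vector records whether the corresponding vocabulary entry occurs in that report, so reports sharing behaviour end up close to each other in feature space.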
Development will focus on fuzzy (soft) clustering approaches that can be easily calibrated to maximise their performance. Additionally, the probabilities returned by the classifier will be adapted to serve as a novelty detection mechanism.
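One possible reading of this plan is sketched below with fuzzy c-means, a standard soft clustering algorithm, written in plain Python (3.8 or later). The choice of algorithm, the fuzziness exponent `m`, the `threshold` value and the toy two-dimensional points are all assumptions made for illustration; the project may settle on a different method. A sample is flagged as novel when no cluster claims it with a high enough membership.

```python
import math


def memberships(point, centres, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (sums to 1)."""
    dists = [math.dist(point, c) for c in centres]
    # A point lying exactly on a centre belongs to that centre fully.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def fuzzy_c_means(points, centres, m=2.0, iterations=50):
    """Alternate membership and centre updates for a fixed number of rounds."""
    centres = [list(c) for c in centres]
    dim = len(points[0])
    for _ in range(iterations):
        u = [memberships(p, centres, m) for p in points]
        for j in range(len(centres)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centres[j] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ]
    return centres


def is_novel(point, centres, threshold=0.6, m=2.0):
    """Flag a point whose strongest membership stays below the threshold."""
    return max(memberships(point, centres, m)) < threshold


# Toy data: two well-separated groups, seeded with one point from each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres = fuzzy_c_means(points, centres=[(0, 0), (10, 10)])

print(is_novel((0.5, 0.5), centres))  # close to the first group
print(is_novel((5.3, 5.4), centres))  # roughly halfway between the groups
```

Because memberships always sum to one, this test mainly catches samples that sit ambiguously between clusters; a sample far from every cluster but clearly nearest to one of them can still receive a high membership, so a distance-based check may be needed alongside it.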
Finally, the developed approach will be versatile in the sense that the user will be able to choose the context on which clustering is based, e.g. platforms at risk or type of threat.
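One simple way such a context switch could work is to restrict the features that reach the clustering step. The sketch below assumes features are prefixed strings; the context names, the prefixes and the `select_context` helper are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical contexts: each name maps to the feature prefixes it keeps.
CONTEXTS = {
    "platform": ("os:", "arch:"),
    "threat": ("sig:", "api:"),
}


def select_context(features, context):
    """Keep only the features relevant to the chosen clustering context."""
    prefixes = CONTEXTS[context]
    return sorted(f for f in features if f.startswith(prefixes))


# Made-up features of a single report, for illustration only.
features = ["os:windows7", "arch:x86", "sig:ransomware", "api:CryptEncrypt"]
print(select_context(features, "platform"))
print(select_context(features, "threat"))
```

Clustering on the filtered features would then group reports by platform or by threat type, depending on the context the user picked.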