The growing volume and diversity of malware attacks have created a difficult situation for security analysts. Now more than ever, automated systems capable of large-scale data analysis and of aggregating multiple data sources have become an industry necessity: malware analytics is no longer the realm of the individual human analyst. The number of newly collected malware samples grows every year, and there is no sign that this trend is nearing its peak. Defending against malicious attacks requires tools and techniques that can recognize their origin, family, actor, purpose, and modus operandi.

In this project, I will develop a system that uses a wide range of malware analysis outputs (source code, DNS/ASN data, peinfo results, YARA rules, IPs, domains, etc.) to automatically detect relationships between malicious objects in a database and assign a score to each detected relationship. The system should accept user requests for analytic jobs and use distributed machine learning techniques and tools to extract knowledge from the available malware-related Big Data. The system should present its results to the analyst as a graphical representation.
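To make the idea of a scored relationship more concrete, here is a minimal single-machine sketch. It assumes each malicious object has already been reduced to sets of extracted indicators per feature type (the feature names `domains`, `ips`, and `yara`, the weights, and the sample data are all hypothetical). It scores a pair of objects with a weighted average of per-feature Jaccard similarities. Jaccard similarity is a simple stand-in chosen for illustration, not necessarily the scoring method this project will use.

```python
from itertools import combinations

# Hypothetical representation: each sample maps a feature type to a set of observed values.
Sample = dict[str, set[str]]


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_pair(x: Sample, y: Sample, weights: dict[str, float]) -> float:
    """Weighted average of per-feature Jaccard similarities (a missing feature counts as an empty set)."""
    total = sum(weights.values())
    return sum(w * jaccard(x.get(f, set()), y.get(f, set())) for f, w in weights.items()) / total


def find_relationships(samples: dict[str, Sample], weights: dict[str, float],
                       threshold: float = 0.0) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair scoring above threshold, highest score first."""
    edges = []
    for (id_a, a), (id_b, b) in combinations(samples.items(), 2):
        s = score_pair(a, b, weights)
        if s > threshold:
            edges.append((id_a, id_b, s))
    return sorted(edges, key=lambda e: e[2], reverse=True)


# Hypothetical sample data (documentation-reserved IPs and .example domains).
samples = {
    "s1": {"domains": {"evil.example", "c2.example"}, "ips": {"203.0.113.5"}, "yara": {"RuleA"}},
    "s2": {"domains": {"c2.example"}, "ips": {"203.0.113.5"}, "yara": {"RuleA", "RuleB"}},
    "s3": {"domains": {"other.example"}, "ips": {"198.51.100.7"}, "yara": set()},
}
weights = {"domains": 1.0, "ips": 1.0, "yara": 2.0}  # assumed: YARA matches weighted higher

edges = find_relationships(samples, weights)
print(edges)  # [('s1', 's2', 0.625)]
```

Here `s1` and `s2` share a C2 domain, an IP, and a YARA rule, so they form the only scored edge; such `(id_a, id_b, score)` triples map naturally onto the weighted edges of a graph visualization. Note that exhaustive pairwise comparison is quadratic in the number of samples, which is one reason a system at Big Data scale would likely need distributed or approximate techniques (for example, MinHash with locality-sensitive hashing) rather than this loop.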



Donika Mirdita


  • webstergd
  • Huang3
  • Zachary Hanif