Contributor
Dongge Liu

Project Proposal: Implement topic creation using machine learning


Mentors
Linas Valiukas
Organization
Berkman Klein Center for Internet and Society at Harvard University

Retrieving articles with simple Boolean queries suffers from a number of problems, and several machine learning approaches can be applied to improve both accuracy and efficiency. In particular, instead of naively checking whether the topic's keywords appear in an article, machine learning techniques can broaden the search by including relevant secondary keywords in addition to the various forms of the given topic term. Meanwhile, methods such as support vector machines (SVM) and nearest neighbor classification can be used to prune the fetched articles, filtering out those that are irrelevant to the topic despite containing its exact keywords. Furthermore, because the vocabulary used to discuss a topic may change over time, time series analysis is needed to track how a topic evolves over a given period. In this project, I plan to design and implement a proof-of-concept unsupervised machine learning approach, supported by mathematical models, to retrieve articles and prune the results.
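To make the three ideas above concrete, here is a minimal, standard-library-only sketch. The proposal does not commit to specific algorithms, so the choices here are illustrative assumptions. Query expansion is done by counting terms that co-occur with the topic in a small set of seed articles. Pruning uses cosine similarity to the centroid of those seed articles, a simple unsupervised stand-in for the SVM or nearest neighbor filtering mentioned above. Vocabulary drift is approximated by the per-period share of articles that use a given term, which is far simpler than a full time series model. All function names (`expand_query`, `prune`, `term_share_by_period`) and the threshold value are hypothetical and are not part of any existing Media Cloud API.

```python
import math
import string
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


def expand_query(topic, seed_docs, k=3, stopwords=frozenset()):
    """Hypothetical query expansion: add the k terms that most often
    co-occur (per document) with `topic` in the seed documents."""
    counts = Counter()
    for doc in seed_docs:
        tokens = set(tokenize(doc))
        if topic in tokens:
            counts.update(t for t in tokens if t != topic and t not in stopwords)
    return [topic] + [term for term, _ in counts.most_common(k)]


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune(candidates, seed_docs, threshold=0.3):
    """Hypothetical pruning step: keep only candidates whose term-frequency
    vector is close enough to the centroid of the seed documents."""
    centroid = Counter()
    for doc in seed_docs:
        centroid.update(tokenize(doc))
    return [doc for doc in candidates if cosine(Counter(tokenize(doc)), centroid) >= threshold]


def term_share_by_period(dated_docs, term):
    """Fraction of documents in each period that mention `term`;
    a crude signal of vocabulary drift over time."""
    totals, hits = Counter(), Counter()
    for period, text in dated_docs:
        totals[period] += 1
        if term in tokenize(text):
            hits[period] += 1
    return {period: hits[period] / totals[period] for period in sorted(totals)}


seed_docs = [
    "Measles outbreak spreads as vaccination rates fall",
    "Health officials urge measles vaccination after outbreak",
    "Measles vaccination campaign launched in schools",
]
candidates = [
    "New measles outbreak prompts vaccination drive",
    "Band named Measles releases new album",
    "Stock markets rally on strong earnings",
]
print(expand_query("measles", seed_docs, k=2))
print(prune(candidates, seed_docs))
```

With these seed articles, the expanded query is `['measles', 'vaccination', 'outbreak']`, and pruning keeps only the first candidate: the album headline contains the exact word "measles" but shares little other vocabulary with the seed set, so its similarity to the centroid stays below the threshold. In a real system the raw counts would likely be replaced by TF-IDF weights or learned embeddings, and the threshold would have to be tuned on actual data.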