For GSoC 2017, I intend to apply the Shogun library to health data and show how machine learning can be useful in applications that could save lives and benefit society. More specifically, I want to focus on analyzing health data for applications such as clinical decision support and mortality prediction. The dataset I will work with is the MIMIC database, which contains information on patients admitted to the ICU at a large hospital. It mainly consists of demographic, administrative, and clinical data from over 45,000 critical care patients. The project will be divided into two parts. In the first part, I plan to clean the data and apply various machine learning algorithms to the MIMIC dataset for tasks such as predicting mortality, predicting the risk of developing certain diseases, and determining the effectiveness of certain drugs. In the second part, I will explore more novel methods, such as LSTMs, to exploit the time-series data. Recent research has shown good results from applying deep learning to electronic health records.


Olivier Nguyen


  • Heiko Strathmann
  • Lea Goetz