The purpose of this GSoC project will be to modernize Pattern, a Python library for machine learning, natural language processing (NLP) and web mining. A substantial part of this undertaking is to port a majority of the code base to Python 3. This involves porting the individual modules and sub–modules piece by piece, where the whole process will be guided by unit tests. In the beginning, I will remove all tests from the pipeline that do not pass for Python 3 and take this pared–down code base as a starting point, porting parts of the code and putting the respective unit tests back in as I go along. Missing unit tests must be added before moving on. Since porting Python 2 code to Python 3 code is a standard problem for the Python community, there are many different tools available that can help in this regard. In addition to that, I'd like to extend this project to a bit of a Hausmeister project (housekeeping for Pattern), and optimize/modernize the code base in terms of execution speed, memory usage and documentation.


Markus Beuckelmann


  • Guy De Pauw
  • Tom De Smedt
  • Walter Daelemans