Creation of an NLP toolkit for Indian languages. I shall start with Hindi and then follow it up with Bengali. The basic features I want to implement for both languages are:

  • A tokenizer for Indian languages
  • A tool for non-contextual normalisation
  • A POS tagger (pre-trained, with both a slow and a fast tagger)
  • A lemmatizer (a backoff lemmatizer based on dictionary lookup)
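The backoff lemmatizer in the last item can be illustrated with a minimal Python sketch. It assumes a chain of lemmatizers where each one either answers or delegates to the next: an exact dictionary lookup first, then a crude suffix-stripping rule, then an identity fallback that returns the word unchanged. The class names (`DictionaryLemmatizer`, `SuffixLemmatizer`, `IdentityLemmatizer`), the helper `build_hindi_lemmatizer`, the toy Hindi entries and the suffix list are all hypothetical and chosen only for illustration. The suffix-stripping stage is one possible backoff, not necessarily the one this project uses. Unicode NFC is applied as a stand-in for one kind of non-contextual normalisation.

```python
import unicodedata
from typing import Dict, Iterable, Optional, Protocol


class Lemmatizer(Protocol):
    def lemmatize(self, word: str) -> str: ...


def _nfc(text: str) -> str:
    # Assumed normalisation step: map canonically equivalent Unicode
    # spellings (e.g. precomposed vs. decomposed nukta letters) to one form.
    return unicodedata.normalize("NFC", text)


class IdentityLemmatizer:
    """Last link in the chain: returns the normalised word unchanged."""

    def lemmatize(self, word: str) -> str:
        return _nfc(word)


class SuffixLemmatizer:
    """Hypothetical crude fallback: strips the longest matching suffix."""

    def __init__(self, suffixes: Iterable[str], backoff: Optional[Lemmatizer] = None):
        # Try longer suffixes first so that the longest match wins.
        self.suffixes = sorted((_nfc(s) for s in suffixes), key=len, reverse=True)
        self.backoff = backoff

    def lemmatize(self, word: str) -> str:
        word = _nfc(word)
        for suffix in self.suffixes:
            # Only strip if a non-empty stem is left over.
            if word.endswith(suffix) and len(word) > len(suffix):
                return word[: -len(suffix)]
        # No rule matched: delegate to the next lemmatizer, if any.
        return self.backoff.lemmatize(word) if self.backoff is not None else word


class DictionaryLemmatizer:
    """First link: exact dictionary lookup, delegating misses to the backoff."""

    def __init__(self, lemmas: Dict[str, str], backoff: Optional[Lemmatizer] = None):
        self.lemmas = {_nfc(form): _nfc(lemma) for form, lemma in lemmas.items()}
        self.backoff = backoff

    def lemmatize(self, word: str) -> str:
        word = _nfc(word)
        if word in self.lemmas:
            return self.lemmas[word]
        return self.backoff.lemmatize(word) if self.backoff is not None else word


def build_hindi_lemmatizer() -> DictionaryLemmatizer:
    # Toy data for illustration only; a real toolkit would load a full lexicon.
    irregular = {"गया": "जाना", "गई": "जाना", "है": "होना", "थे": "होना"}
    suffix_rules = SuffixLemmatizer(["ों", "ें"], backoff=IdentityLemmatizer())
    return DictionaryLemmatizer(irregular, backoff=suffix_rules)


if __name__ == "__main__":
    lemmatizer = build_hindi_lemmatizer()
    print([lemmatizer.lemmatize(tok) for tok in "वह किताबें लेकर गया".split()])
```

Irregular forms such as गया are resolved by the dictionary. Regular plurals such as किताबें fall through to the suffix rule. Anything else passes through unchanged. Keeping each stage behind the same `lemmatize` method lets stages be reordered or swapped without touching the others.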

The proposed timeline is to complete Hindi over the summer; Bengali is a future/additional deliverable.

Student

djokester

Mentors

  • Pawan Goyal

2017