This project creates an NLP toolkit for Indian languages. I will start with Hindi and then follow it up with Bengali. The basic features I want to implement for both languages are:
- A Tokenizer for Indian languages
- A tool for non-contextual normalisation
- A POS Tagger (pre-trained, in a slow and a fast variant)
- A Lemmatizer (a backoff Lemmatizer based on a Dictionary Lookup method)
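As a rough illustration of how three of these pieces could fit together, here is a minimal Python sketch of a Devanagari tokenizer, a context-free normaliser, and a dictionary lemmatizer with a backoff chain. Everything in it is an assumption made for illustration: the names `normalise`, `tokenise`, `BackoffLemmatizer`, `DictionaryLemmatizer`, `SuffixLemmatizer` and `build_demo_lemmatizer` are hypothetical, the lemma dictionary and suffix list are toy data, and the suffix-stripping fallback is just one possible backoff, not a decided design.

```python
import re
import unicodedata

# Devanagari letters, signs and digits (U+0900-U+097F) minus the danda and
# double danda (U+0964, U+0965), which come out as separate punctuation tokens.
TOKEN_RE = re.compile(r"[\u0900-\u0963\u0966-\u097F]+|[A-Za-z0-9]+|\S")


def normalise(text):
    """Context-free clean-up (assumed scope): Unicode NFC plus whitespace collapsing."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def tokenise(text):
    """Split normalised text into word, number and punctuation tokens."""
    return TOKEN_RE.findall(normalise(text))


class BackoffLemmatizer:
    """Try this lemmatizer first; on failure defer to `backoff`, else return the token."""

    def __init__(self, backoff=None):
        self.backoff = backoff

    def choose_lemma(self, token):
        # Subclasses return a lemma, or None when they cannot decide.
        return None

    def lemmatize(self, token):
        lemma = self.choose_lemma(token)
        if lemma is not None:
            return lemma
        if self.backoff is not None:
            return self.backoff.lemmatize(token)
        return token


class DictionaryLemmatizer(BackoffLemmatizer):
    """Exact lookup in a token -> lemma dictionary."""

    def __init__(self, lemma_dict, backoff=None):
        super().__init__(backoff)
        self.lemma_dict = lemma_dict

    def choose_lemma(self, token):
        return self.lemma_dict.get(token)


class SuffixLemmatizer(BackoffLemmatizer):
    """Strip the longest matching suffix, keeping at least `min_stem` code points."""

    def __init__(self, suffixes, min_stem=2, backoff=None):
        super().__init__(backoff)
        self.suffixes = sorted(suffixes, key=len, reverse=True)
        self.min_stem = min_stem

    def choose_lemma(self, token):
        for suffix in self.suffixes:
            if token.endswith(suffix) and len(token) - len(suffix) >= self.min_stem:
                return token[: -len(suffix)]
        return None


def build_demo_lemmatizer():
    """Dictionary lookup first, suffix stripping second (toy data, illustration only)."""
    toy_lemmas = {"गया": "जाना", "गई": "जाना"}
    # Two oblique/plural endings written as escapes: O-matra + anusvara, E-matra + anusvara.
    toy_suffixes = ["\u094b\u0902", "\u0947\u0902"]
    return DictionaryLemmatizer(toy_lemmas, backoff=SuffixLemmatizer(toy_suffixes))


if __name__ == "__main__":
    lemmatizer = build_demo_lemmatizer()
    for token in tokenise("वह किताबों के साथ गया।"):
        print(token, "->", lemmatizer.lemmatize(token))
```

Chaining the lemmatizers this way keeps each strategy small and lets a better fallback (for example a trained model) be slotted in later without touching the dictionary lookup. The `min_stem` guard is there so that short function words which happen to end in a listed suffix are left unchanged.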
The proposed timeline is to complete Hindi over the course of the summer. Bengali is a future/additional deliverable.