Tokenization for spaceless orthographies in Japanese
- Mentors: Kevin Brubeck Unhammer
- Organization: Apertium
- Technologies: python, c++, xml
- Topics: machine learning, nlp
Investigate suitable tokenizers for East and South Asian languages, which usually do not put spaces between words, and implement one. In addition, improve the Japanese-related files.
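To make the problem concrete, here is a minimal sketch of one classic baseline for spaceless text: greedy longest-match segmentation against a word list. The project description does not name a method, so this is only an illustration and not the approach chosen for Apertium; the function name `max_match` and the toy lexicon are hypothetical.

```python
def max_match(text, lexicon, max_len=None):
    """Greedy longest-match segmentation (illustrative baseline only).

    `lexicon` is a set of known word forms. Characters that match no
    lexicon entry are emitted as single-character tokens.
    """
    if max_len is None:
        # Longest entry bounds how far ahead we need to look.
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                break
        else:
            # No lexicon entry starts here: fall back to one character.
            j = i + 1
        tokens.append(text[i:j])
        i = j
    return tokens


# Toy lexicon, assumed for this example only.
toy_lexicon = {"私", "は", "学生", "です"}
print(max_match("私は学生です", toy_lexicon))
```

Greedy matching is easy to implement but makes mistakes whenever a long dictionary entry overlaps the correct word boundary, which is one reason to investigate alternatives such as statistical or lattice-based tokenizers.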