Contributor
Anna Kondratjeva

Improving language pairs by mining MediaWiki Content Translation postedits


Mentors
Mikel Forcada, Francis Tyers
Organization
Apertium

The purpose of this proposal is to create a toolbox for automatically improving the lexical component of a language pair. Such a toolbox could help improve language pairs by filling gaps in their dictionaries while reducing the amount of human work required. Even released Apertium pairs are not perfect: they sometimes make mistakes that can be easily fixed.

The idea is to mine existing machine translation postediting data from MediaWiki Content Translation, extract a set of potential postediting operators, and then study these operators and turn them into information that can be inserted into an Apertium language pair (in the form of monodix/bidix entries, lexical selection rules, transfer rules, and so on).
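The operator-extraction step can be illustrated with a minimal sketch. It is not the project's method: it assumes that pairs of (raw machine translation, final postedited text) have already been obtained from Content Translation, that naive whitespace tokenisation is good enough, and it uses Python's `difflib.SequenceMatcher` as a stand-in alignment technique. The function names `extract_operators` and `count_operators` are hypothetical. Each edited span becomes a candidate operator, and operators are counted across the whole corpus so that recurring corrections stand out.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_operators(mt: str, postedit: str) -> list[tuple[str, str]]:
    """Return (mt_phrase, postedited_phrase) pairs for every edited span.

    Hypothetical helper: tokenisation is plain whitespace splitting (an
    assumption for this sketch). Insertions have an empty mt_phrase and
    deletions have an empty postedited_phrase.
    """
    mt_tokens = mt.split()
    pe_tokens = postedit.split()
    # autojunk=False so frequent tokens are not silently ignored when aligning
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    operators = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged tokens are not postediting operators
        operators.append((" ".join(mt_tokens[i1:i2]), " ".join(pe_tokens[j1:j2])))
    return operators


def count_operators(pairs):
    """Count how often each candidate operator recurs across (mt, postedit) pairs."""
    counts = Counter()
    for mt, pe in pairs:
        counts.update(extract_operators(mt, pe))
    return counts


if __name__ == "__main__":
    # Toy English examples, made up for illustration only
    corpus = [
        ("the cat sat on the carpet", "the cat sat on the rug"),
        ("I saw the carpet yesterday", "I saw the rug yesterday"),
    ]
    for (before, after), n in count_operators(corpus).most_common():
        print(f"{n}x  {before!r} -> {after!r}")  # prints: 2x  'carpet' -> 'rug'
```

Operators that recur across many postedits, like the `carpet -> rug` replacement above, would be the more promising candidates to study further as bidix entries or lexical selection rules, while one-off edits may just reflect a particular editor's stylistic choices.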