A free/open-source machine translation platform

Technologies
python, javascript, c++, xml, bash
Topics
natural language processing, machine translation, less-resourced languages

Apertium is a primarily shallow-transfer machine translation system, which uses finite state transducers for all of its lexical transformations, and hidden Markov models and/or constraint grammars for part-of-speech tagging or word category disambiguation.
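The transducer-based lexical lookup described above can be illustrated with a toy example. This is only a minimal sketch, not Apertium's actual implementation: the function names (`build_transducer`, `analyse`), the trie-shaped structure that emits its output at the final state, and the three dictionary entries are assumptions made for illustration. The tag notation (`<n>`, `<pl>`, and so on) is meant to resemble Apertium-style analyses.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: (surface form, lexical analysis) pairs.
ENTRIES: List[Tuple[str, str]] = [
    ("books", "book<n><pl>"),
    ("book", "book<n><sg>"),
    ("book", "book<vblex><inf>"),
]

Transducer = Tuple[List[Dict[str, int]], Dict[int, List[str]]]


def build_transducer(entries: List[Tuple[str, str]]) -> Transducer:
    """Build a trie-shaped transducer that reads a surface form letter by
    letter and emits its analyses when it reaches a final state."""
    transitions: List[Dict[str, int]] = [{}]  # transitions[state][char] -> next state
    outputs: Dict[int, List[str]] = {}        # final state -> analyses emitted there
    for surface, analysis in entries:
        state = 0
        for ch in surface:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        outputs.setdefault(state, []).append(analysis)
    return transitions, outputs


def analyse(transducer: Transducer, word: str) -> List[str]:
    """Return every analysis of `word`, or an empty list if it is unknown."""
    transitions, outputs = transducer
    state = 0
    for ch in word:
        next_state = transitions[state].get(ch)
        if next_state is None:
            return []  # no arc for this letter: the word is not in the dictionary
        state = next_state
    return outputs.get(state, [])


transducer = build_transducer(ENTRIES)
print(analyse(transducer, "books"))
print(analyse(transducer, "book"))
```

In this sketch, "book" receives both a noun and a verb analysis. Resolving that kind of ambiguity is the job of the part-of-speech tagging step (hidden Markov models and/or constraint grammars) mentioned above.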

Most existing machine translation systems are commercial and use proprietary technologies, which makes them very hard to adapt to new uses. Furthermore, they use different technologies across language pairs, which makes it very difficult, for instance, to integrate them into a single multilingual content management system. Finally, most of them are unavailable for most of the world's languages, as they rely heavily on resources that exist for only a few languages.

Apertium uses language-independent formalisms, which make it easier to contribute to the project, make development more efficient, and support the project's overall growth.

At present, Apertium has released around 50 stable language pairs, delivering fast translation with results that range from reasonably intelligible to excellent, depending on the language pair. As an open-source project, Apertium provides tools for prospective developers to build their own language pairs and contribute them to the project.

2021 Program

Successful Projects

Contributor
Gourab Chakraborty
Mentor
Hèctor Alòs i Font
Organization
Apertium
Adopting the Hindi-Bengali language pair (unreleased language pair).
In this project, I aim to create a hin-ben repository in Apertium that also includes the task of creating/expanding the transfer rules, creating the...
Contributor
Omkar Prabhune
Mentor
Tino Didriksen, Xavi Ivars
Organization
Apertium
Apertium Browser Plugin
My project has been to develop the Apertium Browser Plugin. The previous Geriaoueg plugin is out of date, with the official link given in the wiki...
Contributor
Anuradha Pandey
Mentor
Hèctor Alòs i Font
Organization
Apertium
Adopt an unreleased language pair, Hindi-Bhojpuri
I plan on developing the Bhojpuri-Hindi language pair in both directions, i.e. bho-hin and hin-bho. This will involve building a monolingual...
Contributor
Kamush
Mentor
Sevilay Bayatli, Jonathan W
Organization
Apertium
Implementing new language pair: Kazakh - Uzbek
Having seen the benefits of the open-source rule-based machine translation platform Apertium as an alternative to other free/commercial online...
Contributor
Okonkwo Ifeanyichukwu
Mentor
Nick Howell, Mikel Forcada, Jonathan W
Organization
Apertium
Ideas for Google Summer of Code/Morphological analyser
• Creating a high-accuracy morphological analyser for Ibo by contributing to the currently existing one; • Reducing WER on the eng-ibo pair...
Contributor
naan_dhaan
Mentor
Kevin Brubeck Unhammer
Organization
Apertium
User friendly lexical training
The procedure for lexical selection training is a bit messy, with various scripts involved that require lots of manual tweaking, and many third party...
Contributor
Daniel Swanson
Mentor
Tino Didriksen
Organization
Apertium
Unipertium
Three mostly unrelated smaller projects that all happen to start with "uni": UNIcode, UNIt testing, and UNIversal dependencies transfer (the latter being...
Contributor
Daniil Ignatiev
Mentor
Nick Howell
Organization
Apertium
A morphological analyzer for Bagvalal
Bagvalal is an endangered, typologically rare Caucasian language from the Nakh-Daghestanian family. Its conservation and study are constrained by the...
Contributor
Timo Rantakaulio
Mentor
Jack Rueter, Flammie
Organization
Apertium
Finnish, Olonets-Karelian and Karelian lexicon development
The three languages that this application targets are closely related Balto-Finnic languages spoken in geographical proximity to one another. Finnish...
Contributor
Azamat Akimniyazov
Mentor
Sevilay Bayatli, Jonathan W
Organization
Apertium
Develop a prototype MT system for a strategic language pair uzb->kaa
In this project, I'm going to continue developing the Uzb-Kaa translation pair. In the list of different pairs of Turkic languages, I analyzed...