A free/open-source machine translation platform

Technologies
python, c++, xml, bash
Topics
natural language processing, machine translation, less-resourced languages

Apertium is a shallow-transfer machine translation system that uses finite-state transducers for all of its lexical transformations, and hidden Markov models and/or constraint grammars for part-of-speech tagging (word category disambiguation).
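
To illustrate what hidden-Markov-model part-of-speech disambiguation does, here is a minimal sketch of a generic, textbook Viterbi decoder. It is not Apertium's own tagger. The tag set, the words and all probabilities are hypothetical toy values chosen only for this example, and constraint grammar is not covered.

```python
from typing import Dict, List, Optional, Tuple

Prob = Dict[str, float]


def viterbi(words: List[str], tags: List[str], start_p: Prob,
            trans_p: Dict[str, Prob], emit_p: Dict[str, Prob]) -> List[str]:
    """Return the most probable tag sequence for `words` under a first-order HMM."""
    # best[i][t] = (probability of the best path ending in tag t at word i, previous tag)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [{}]
    for t in tags:
        best[0][t] = (start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0), None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            # Extend every path ending at word i-1 with tag t and keep the best one.
            best[i][t] = max(
                ((best[i - 1][p][0] * trans_p[p].get(t, 0.0)
                  * emit_p[t].get(words[i], 0.0), p) for p in tags),
                key=lambda cand: cand[0],
            )
    # Backtrack from the most probable final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Hypothetical toy model: "can" is ambiguous between NOUN and VERB.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 1.0},
    "NOUN": {"can": 0.3, "rusts": 0.1},
    "VERB": {"can": 0.4, "rusts": 0.3},
}

if __name__ == "__main__":
    print(viterbi(["the", "can", "rusts"], TAGS, START, TRANS, EMIT))  # ['DET', 'NOUN', 'VERB']
```

Although "can" alone is slightly more likely to be a verb in this toy model, the strong DET→NOUN transition makes NOUN the better reading in context, which is the kind of decision a statistical tagger makes. Practical taggers usually work with log-probabilities to avoid numeric underflow on long sentences.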

Most existing machine translation systems are commercial and use proprietary technologies, which makes them very hard to adapt to new uses. Furthermore, they use different technologies for different language pairs, which makes it very difficult, for instance, to integrate them into a single multilingual content management system. Finally, most of them do not cover most of the world's languages, because they rely heavily on resources that exist for only a few languages.

Apertium uses a language-independent specification, which makes it easier to contribute to the project, makes development more efficient, and supports the project's overall growth.

At present, Apertium has released more than 40 stable language pairs, delivering fast translation with results that range from reasonably intelligible to excellent, depending on the language pair. As an open-source project, Apertium provides tools that let developers build their own language pairs and contribute them to the project.

2020 Program

Successful Projects

Contributor
Shashwat Goel
Mentor
Mikel Forcada, Jorge Gracia
Organization
Apertium
Bilingual Dictionary Discovery via Graph Exploration
A crucial step in developing a language pair is writing its bilingual dictionary, which maps a lemma X in language A to a lemma Y in language B if X...
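
The project description above is truncated, so the following is only an illustration of the general idea of discovering dictionary entries by exploring a graph, not the contributor's actual method. It assumes, hypothetically, that every existing bilingual dictionary contributes undirected edges between (language, lemma) nodes, and ranks candidate target lemmas by how many distinct short paths reach them through other languages. The function names and sample entries are invented for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[str, str]  # (language code, lemma)


def build_graph(entries: Iterable[Tuple[Node, Node]]) -> Dict[Node, Set[Node]]:
    """Turn bilingual dictionary entries into an undirected adjacency map."""
    graph: Dict[Node, Set[Node]] = defaultdict(set)
    for a, b in entries:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def candidate_translations(graph: Dict[Node, Set[Node]], source: Node,
                           target_lang: str, max_hops: int = 3) -> Dict[str, int]:
    """Count simple paths of at most `max_hops` edges from `source` to each target-language lemma."""
    counts: Dict[str, int] = defaultdict(int)

    def dfs(node: Node, visited: Set[Node], depth: int) -> None:
        if depth == max_hops:
            return
        for nxt in graph[node]:
            if nxt in visited:
                continue
            if nxt[0] == target_lang:
                counts[nxt[1]] += 1  # reached a target-language lemma; end this path here
                continue
            visited.add(nxt)
            dfs(nxt, visited, depth + 1)
            visited.remove(nxt)

    dfs(source, {source}, 0)
    return dict(counts)


# Hypothetical entries from eng-spa, spa-cat, spa-por, eng-fra and fra-cat dictionaries.
ENTRIES = [
    (("eng", "house"), ("spa", "casa")),
    (("spa", "casa"), ("cat", "casa")),
    (("spa", "casa"), ("por", "casa")),
    (("eng", "house"), ("fra", "maison")),
    (("fra", "maison"), ("cat", "casa")),
    (("fra", "maison"), ("cat", "mas")),
]
GRAPH = build_graph(ENTRIES)

if __name__ == "__main__":
    print(candidate_translations(GRAPH, ("eng", "house"), "cat"))
```

Here "casa" is reached through two pivot languages and "mas" through only one, so counting paths gives a simple confidence signal; a real system would still need to filter candidates that arise from polysemous pivot words.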
Contributor
Priyank Modi
Mentor
Anastasia Kuznetsova, Francis Tyers
Organization
Apertium
Adopt an unreleased language pair: Hindi-Punjabi
I plan on developing the Hindi-Punjabi language pair in both directions, i.e. hin-pan and pan-hin. This will involve improving the monolingual...
Contributor
Elmurod Kuriyozov
Mentor
Jonathan Washington, sevilay bayatli
Organization
Apertium
State-of-the-art Morphological Analyser for the Uzbek Language and Improved Language Pairs uz-kk, uz-ky, uz-tr
Creating a state-of-the-art HFST-based morphological analyser for the Uzbek language, contributing to the Karakalpak and Uyghur morphological...
Contributor
Hèctor Alòs Font
Mentor
Gianfranco Fronteddu, Xavi Ivars
Organization
Apertium
Adopting the French-Arpitan language pair
I propose to create a bidirectional French-Arpitan translator. Arpitan (often called Franco-Provençal) is an endangered and heavily under-resourced...
Contributor
Amirniyaz Mambetniyazov
Mentor
Jonathan Washington, sevilay bayatli
Organization
Apertium
Adopting an unreleased language pair: uzb-kaa
In this project I am going to create a new language pair, uzb-kaa. Last year I helped with translations for a GSoC 2019 student, as I am a native...
Contributor
Khalid Alnajjar
Mentor
Jackrueter, Daniel Swanson
Organization
Apertium
Extending Ve’rdd for Apertium Needs
This proposal targets the task named "A Web Interface to expanding dictionary lemmas integrate with GitLab/GitHub". As I have already...
Contributor
Tanmai Khanna
Mentor
Flammie, Tino Didriksen
Organization
Apertium
Modifying the apertium stream format and solving the markup reordering problem using wordbound blanks
Markup handling has been a problem in Apertium for a long time. It was done using superblanks that encapsulate markup information inside them during...