A free/open-source machine translation platform

Apertium is a primarily shallow-transfer machine translation system, which uses finite state transducers for all of its lexical transformations, and hidden Markov models and/or constraint grammars for part-of-speech tagging or word category disambiguation.
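As a rough illustration of the finite-state idea mentioned above, the following toy sketch builds a small letter transducer that maps surface forms to lexical forms. It is not Apertium's implementation: the class `ToyTransducer`, its methods, and the example entries are hypothetical, and the angle-bracket tags are only meant to resemble the style of tags used in Apertium dictionaries.

```python
from collections import defaultdict


class ToyTransducer:
    """Hypothetical, minimal nondeterministic letter transducer (illustration only)."""

    def __init__(self):
        # transitions[state][input_char] -> list of (output_string, next_state)
        self.transitions = defaultdict(lambda: defaultdict(list))
        self.finals = set()
        self.next_state = 1  # state 0 is the start state

    def _step_or_create(self, state, ch, out):
        # Reuse an existing arc with the same input and output, else create one.
        for existing_out, nxt in self.transitions[state][ch]:
            if existing_out == out:
                return nxt
        nxt = self.next_state
        self.next_state += 1
        self.transitions[state][ch].append((out, nxt))
        return nxt

    def add_pair(self, surface, lexical):
        # Assumes a non-empty surface form. All arcs but the last emit nothing;
        # the last arc emits the whole lexical form and ends in a final state.
        state = 0
        for ch in surface[:-1]:
            state = self._step_or_create(state, ch, "")
        self.finals.add(self._step_or_create(state, surface[-1], lexical))

    def analyse(self, surface):
        # Follow every matching path in parallel, accumulating output strings.
        configs = [(0, "")]
        for ch in surface:
            configs = [
                (nxt, acc + out)
                for state, acc in configs
                for out, nxt in self.transitions[state].get(ch, [])
            ]
        return sorted(acc for state, acc in configs if state in self.finals)


# Made-up dictionary entries for the demonstration.
fst = ToyTransducer()
fst.add_pair("cat", "cat<n><sg>")
fst.add_pair("cats", "cat<n><pl>")
fst.add_pair("saw", "saw<n><sg>")
fst.add_pair("saw", "see<vblex><past>")

if __name__ == "__main__":
    for word in ("cats", "saw", "dog"):
        print(word, fst.analyse(word))
```

An ambiguous form such as "saw" yields more than one analysis; choosing between them is the job of the part-of-speech tagging step (hidden Markov models and/or constraint grammars), which this sketch does not attempt to model.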

Most existing machine translation systems are commercial and use proprietary technologies, which makes them very hard to adapt to new uses. They also use different technologies across language pairs, which makes it very difficult, for instance, to integrate them into a single multilingual content management system. Finally, most of them are unavailable for the majority of the world's languages, as they rely heavily on resources that exist for only a few languages.

Apertium uses language-independent formalisms, which make it easier to contribute to the project, make development more efficient, and support the project's overall growth.

At present, Apertium has released around 50 stable language pairs, delivering fast translation with results that range from reasonably intelligible to excellent, depending on the language pair. Being an open-source project, Apertium provides tools for potential developers to build their own language pairs and contribute to the project.

Technologies

  • c++
  • python
  • bash
  • xml
  • javascript

Topics

  • Other
  • machine translation
  • natural language processing
  • less-resourced languages

Apertium 2021 Projects

  • Daniil Ignatiev
    A morphological analyzer for Bagvalal
    Bagvalal is an endangered typologically rare Caucasian language from the Nakh-Daghestanian family. Its conservation and study are constrained by the...
  • Anuradha Pandey
    Adopt an unreleased language pair, Hindi-Bhojpuri
    I plan on developing the Bhojpuri-Hindi language pair in both directions, i.e. bho-hin and hin-bho. This will involve building a monolingual...
  • Gourab Chakraborty
    Adopting the Hindi-Bengali language pair (unreleased language pair).
    In this project, I aim to create a hin-ben repository in Apertium that also includes the task of creating/expanding the transfer rules, creating the...
  • Omkar Prabhune
    Apertium Browser Plugin
    My project has been to develop the Apertium Browser Plugin. The previous Geriaoueg plugin is out of date, with the official link given in the wiki...
  • Azamat Akimniyazov
    Develop a prototype MT system for a strategic language pair uzb->kaa
    In this project I'm going to continue developing the Uzb-Kaa translation pair. In the list of different pairs of Turkic languages, I analyzed...
  • Timo Rantakaulio
    Finnish, Olonets-Karelian and Karelian lexicon development
    The three languages that this application targets are closely related Balto-Finnic languages spoken in geographical proximity to one another. Finnish...
  • Okonkwo Ifeanyichukwu
    Ideas for Google Summer of Code/Morphological analyser
    Creating a high-accuracy morphological analyser for Ibo by contributing to the currently existing one; increasing WER on the eng-ibo pair...
  • Kamush
    Implementing new language pair: Kazakh - Uzbek
    Having seen the benefits of the open-source rule-based machine translation platform Apertium as an alternative to other free/commercial online...
  • Daniel Swanson
    Unipertium
    3 mostly unrelated smaller projects that all happen to start with "uni": UNIcode, UNIt testing, and UNIversal dependencies transfer (the latter being...
  • naan_dhaan
    User friendly lexical training
    The procedure for lexical selection training is a bit messy, with various scripts involved that require lots of manual tweaking, and many third party...