Contributor
Shashwat Goel

Bilingual Dictionary Discovery via Graph Exploration


Mentors
Mikel Forcada, Jorge Gracia
Organization
Apertium

A crucial step in developing a language pair is writing its bilingual dictionary, which maps a lemma X in language A to a lemma Y in language B if X and Y have the same meaning and lexical analysis.

Apertium’s existing language pairs already contain a substantial amount of translation data, which can be viewed as a graph, as was previously done in the Apertium RDF project (Jorge Gracia et al., 2016). In essence, each word is represented as a ‘vertex’ on a graph, with an ‘edge’ between two vertices whose words are listed as translations of each other on Apertium. By exploring this graph, new translations can be inferred. In particular, bilingual dictionaries can be generated for any two languages that each exist on Apertium but do not yet have a language pair of their own.

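Polysemy limits the usefulness of a simple transitive relation (if X translates to Y and Y translates to Z, then X translates to Z), because a word with several senses can link two words that are unrelated in meaning. Apertium’s existing “Crossdics” tool follows this transitive model.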
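The effect of polysemy on transitive inference can be illustrated with a minimal sketch. The toy data, the (language, lemma) vertex representation, and the function names below are assumptions made for illustration only; they are not taken from Apertium's dictionaries or from the Crossdics tool.

```python
from collections import defaultdict

# Toy translation edges (illustrative only, not real Apertium data).
# Each vertex is a (language, lemma) tuple.
EDGES = [
    (("en", "spring"), ("es", "primavera")),   # the season
    (("en", "spring"), ("es", "muelle")),      # the coiled metal part
    (("es", "primavera"), ("fr", "printemps")),
    (("es", "muelle"), ("fr", "ressort")),
    (("es", "muelle"), ("fr", "quai")),        # "muelle" can also mean a dock
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of translation edges."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def transitive_candidates(graph, source, target_lang):
    """Return target-language words reachable from `source` through one pivot word."""
    direct = graph.get(source, set())
    candidates = set()
    for pivot in direct:
        for word in graph.get(pivot, set()):
            if word[0] == target_lang and word != source and word not in direct:
                candidates.add(word)
    return candidates


graph = build_graph(EDGES)
candidates = transitive_candidates(graph, ("en", "spring"), "fr")
print(sorted(candidates))
# [('fr', 'printemps'), ('fr', 'quai'), ('fr', 'ressort')]
```

In this toy graph the transitive rule proposes "quai" as a translation of "spring", because the pivot word "muelle" carries two unrelated senses. This is the kind of false positive that a confidence measure is meant to filter out.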

My proposal will build upon the cycle density algorithm (M. Villegas et al., 2016), which assigns a confidence score to each inferred edge based on the density of the cycles that contain it. The underlying concept is that when two vertices lie on a common cycle in the graph, the confidence that they form a valid translation pair increases.
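A simplified sketch of a cycle-based confidence score is given below. It is not the published algorithm: it assumes that the density of a cycle is the number of edges among the cycle's vertices divided by the number of edges in a complete graph on those vertices, and that a candidate's score is the highest density over the cycles it shares with the source word. The toy data, the `max_len` bound, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy translation edges (illustrative only, not real Apertium data).
EDGES = [
    (("en", "spring"), ("es", "primavera")),
    (("en", "spring"), ("es", "muelle")),
    (("en", "spring"), ("ca", "primavera")),
    (("en", "spring"), ("ca", "molla")),
    (("es", "primavera"), ("fr", "printemps")),
    (("ca", "primavera"), ("fr", "printemps")),
    (("es", "primavera"), ("ca", "primavera")),
    (("es", "muelle"), ("fr", "ressort")),
    (("ca", "molla"), ("fr", "ressort")),
    (("es", "muelle"), ("fr", "quai")),
]

graph = defaultdict(set)
for u, v in EDGES:
    graph[u].add(v)
    graph[v].add(u)
graph = dict(graph)


def cycles_through(graph, start, max_len=6):
    """Yield simple cycles containing `start` with 3 to `max_len` vertices.

    Each cycle is yielded once per traversal direction, which is harmless
    here because only the maximum density is kept.
    """
    stack = [(start, (start,))]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt == start and len(path) >= 3:
                yield path
            elif nxt not in path and len(path) < max_len:
                stack.append((nxt, path + (nxt,)))


def density(graph, vertices):
    """Edges present among `vertices` divided by edges in a complete graph."""
    vs = set(vertices)
    n = len(vs)
    present = sum(1 for u, v in combinations(vs, 2) if v in graph.get(u, ()))
    return present / (n * (n - 1) / 2)


def cycle_confidence(graph, source, target_lang, max_len=6):
    """Score each new target-language candidate by its densest shared cycle."""
    direct = graph.get(source, set())
    scores = {}
    for cycle in cycles_through(graph, source, max_len):
        d = density(graph, cycle)
        for v in cycle:
            if v[0] == target_lang and v not in direct:
                scores[v] = max(scores.get(v, 0.0), d)
    return scores


scores = cycle_confidence(graph, ("en", "spring"), "fr")
for word, score in sorted(scores.items()):
    print(word, round(score, 3))
# ('fr', 'printemps') 0.833
# ('fr', 'ressort') 0.667
```

In this toy graph "quai" receives no score at all, because it shares no cycle with "spring", while "printemps" and "ressort" are each supported by a cycle through Spanish and Catalan. A threshold on the score could then decide which inferred edges to keep; the choice of threshold is left open here.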