Contributor
Anna Kondratjeva

Implementing a shallow syntactic function labeller


Mentors
Mikel L. Forcada, Francis Tyers
Organization
Apertium

In many language pairs, producing an adequate translation requires knowing a word's syntactic function tags in addition to its morphological tags.

The shallow syntactic function labeller is a tool that takes a string in Apertium stream format, parses it into a sequence of morphological tags and passes that sequence to a classifier. The classifier is a seq2seq model trained on datasets prepared from parsed, syntactically labelled corpora (for instance, UD treebanks).
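As a rough illustration of the parsing step, the sketch below pulls the morphological tags out of each lexical unit in a stream. It assumes already disambiguated input of the form `^lemma<tag1><tag2>$` and ignores escaped characters, superblanks and joined units; the function name `extract_tag_sequences` is hypothetical and not part of Apertium.

```python
import re

# A lexical unit is delimited by '^' and '$' in the stream.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")
# Morphological tags are written in angle brackets.
TAG = re.compile(r"<([^<>]+)>")


def extract_tag_sequences(stream):
    """Return one list of morphological tags per lexical unit."""
    sequences = []
    for unit in LEXICAL_UNIT.findall(stream):
        # Assumption: if several '/'-separated fields are present,
        # the last one is the analysis to use.
        analysis = unit.split("/")[-1]
        sequences.append(TAG.findall(analysis))
    return sequences


example = "^the<det><def><sp>$ ^cat<n><sg>$ ^sleep<vblex><pri><p3><sg>$^.<sent>$"
for tags in extract_tag_sequences(example):
    print(tags)
```

The resulting list of tag lists is the kind of input the classifier would receive for one sentence.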

The encoder dataset contains sequences of morphological tags and the decoder dataset contains sequences of labels; in both cases, one sequence corresponds to one sentence. The classifier analyzes the given sequence of morphological tags and outputs a sequence of labels, which the labeller then applies to the original string.
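One possible way to build such parallel datasets from a UD treebank is sketched below. It reads CoNLL-U text and, as an assumption made only for this example, uses the UPOS and FEATS columns as the morphological tags and the DEPREL column as the label; a real pipeline would probably need to map these onto Apertium tags first. The function name `read_conllu_pairs` is hypothetical.

```python
def read_conllu_pairs(text):
    """Split CoNLL-U text into encoder and decoder sequences.

    Returns two lists of equal length: one sequence of morphological
    tags and one sequence of labels per sentence.
    """
    encoder_data, decoder_data = [], []
    tags, labels = [], []
    # The extra empty line flushes the last sentence.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if not line:
            if tags:
                encoder_data.append(tags)
                decoder_data.append(labels)
                tags, labels = [], []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        upos, feats, deprel = cols[3], cols[5], cols[7]
        tags.append(upos if feats == "_" else upos + "|" + feats)
        labels.append(deprel)
    return encoder_data, decoder_data


sample = (
    "# text = Cats sleep.\n"
    "1\tCats\tcat\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tsleep\tsleep\tVERB\t_\tNumber=Plur|Tense=Pres\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)
encoder_data, decoder_data = read_conllu_pairs(sample)
print(encoder_data)
print(decoder_data)
```

Keeping the two lists aligned by index preserves the one-sequence-per-sentence correspondence between the encoder and decoder data.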

So, at the end of the project, there will be:

  1. The labeller itself, which parses the string, restores the model for the required language from a file, passes a sequence of tags to the model, gets a sequence of labels as output and applies these labels to the original string
  2. Files with trained models, saved in a suitable format (for example, JSON)