In many language pairs, an adequate translation requires knowing a word's syntactic function tags in addition to its morphological tags.
The shallow syntactic function labeller is a tool that takes a string in Apertium stream format, parses it into a sequence of morphological tags, and passes that sequence to a classifier. The classifier is a seq2seq model trained on datasets prepared from parsed, syntactically labelled corpora (for instance, UD treebanks).
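As an illustration of the parsing step, the following minimal sketch pulls the morphological tags out of a stream. It assumes disambiguated input with a single analysis per lexical unit, written as `^lemma<tag1><tag2>...$`. The names `LU_RE` and `extract_tags` are hypothetical and do not belong to any existing Apertium tool.

```python
import re

# One lexical unit: ^lemma<tag1><tag2>...$
# Assumption: tagger output with a single analysis per unit (no "/" ambiguity).
LU_RE = re.compile(r"\^([^<$^/]*)((?:<[^<>]+>)+)\$")


def extract_tags(stream):
    """Return the morphological tags of each lexical unit as a list of lists."""
    return [re.findall(r"<([^<>]+)>", m.group(2)) for m in LU_RE.finditer(stream)]


stream = "^the<det><def><sp>$ ^dog<n><sg>$ ^bark<vblex><pri><p3><sg>$^.<sent>$"
for tags in extract_tags(stream):
    print(tags)
```

Each inner list is the tag sequence of one lexical unit. A real labeller would also have to handle ambiguous units, escaped characters and superblanks, which this sketch leaves out.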
The encoder dataset contains sequences of morphological tags and the decoder dataset contains sequences of labels; in both cases each sequence corresponds to one sentence. The classifier analyzes the given sequence of morphological tags and outputs a sequence of labels, which the labeller then applies to the original string.
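The paired encoder and decoder sequences could be built from a UD treebank roughly as follows. This is a sketch under stated assumptions: it reads CoNLL-U text, uses the UPOS and FEATS columns directly as a stand-in for morphological tags (no conversion to Apertium tags is attempted), and takes DEPREL as the label. The function name `conllu_to_pairs` is hypothetical.

```python
def conllu_to_pairs(text):
    """Yield (tag_sequence, label_sequence) for every sentence in CoNLL-U text."""
    tags, labels = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                    # a blank line ends a sentence
            if tags:
                yield tags, labels
                tags, labels = [], []
            continue
        if line.startswith("#"):        # sentence-level comment
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():       # skip multiword tokens (1-2) and empty nodes (1.1)
            continue
        upos, feats, deprel = cols[3], cols[5], cols[7]
        # Assumption: UPOS plus FEATS stands in for the morphological tags.
        tags.append(upos if feats == "_" else upos + "|" + feats)
        labels.append(deprel)
    if tags:                            # last sentence without a trailing blank line
        yield tags, labels


SAMPLE = "\n".join([
    "# text = The dog barks.",
    "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_",
    "3\tbarks\tbark\tVERB\tVBZ\tNumber=Sing|Person=3|Tense=Pres\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_",
])

pairs = list(conllu_to_pairs(SAMPLE))
encoder_data = [tags for tags, _ in pairs]      # one tag sequence per sentence
decoder_data = [labels for _, labels in pairs]  # one label sequence per sentence
print(encoder_data)
print(decoder_data)
```

Both lists stay aligned sentence by sentence and token by token, which is what a seq2seq model needs for training.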
At the end of the project there will be:
- The labeller itself, which parses the string, restores the model for the required language from a file, passes a sequence of tags to the model, gets a sequence of labels as output and applies these labels to the original string
- Files with trained models, saved in a suitable format (for example, JSON)
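The two deliverables above could fit together as in the sketch below. Everything here is an assumption made for illustration: the file layout `<models_dir>/<lang>.json`, the JSON structure, the `<@label>` output convention, and all function names. In particular, `predict` is a plain table lookup standing in for the seq2seq classifier, not the model itself.

```python
import json
import os
import re
import tempfile

# One lexical unit: ^lemma<tag1><tag2>...$ (single analysis per unit assumed).
LU_RE = re.compile(r"\^([^<$^/]*)((?:<[^<>]+>)+)\$")


def load_model(models_dir, lang):
    """Restore the model for `lang` from <models_dir>/<lang>.json (hypothetical layout)."""
    with open(os.path.join(models_dir, lang + ".json"), encoding="utf-8") as f:
        return json.load(f)


def predict(model, tag_sequence):
    """Stand-in for the seq2seq classifier: one table lookup per lexical unit."""
    return [model["labels"].get(tags, model["default"]) for tags in tag_sequence]


def label_stream(stream, model):
    """Append one <@label> tag to every lexical unit in the stream."""
    units = list(LU_RE.finditer(stream))
    labels = iter(predict(model, [m.group(2) for m in units]))
    # predict() returns exactly one label per matched unit, so next() never runs dry.
    return LU_RE.sub(
        lambda m: "^{}{}<@{}>$".format(m.group(1), m.group(2), next(labels)),
        stream,
    )


# Toy model written to disk and restored, to show the round trip through JSON.
toy_model = {
    "default": "dep",
    "labels": {
        "<det><def><sp>": "det",
        "<n><sg>": "nsubj",
        "<vblex><pri><p3><sg>": "root",
    },
}
with tempfile.TemporaryDirectory() as models_dir:
    with open(os.path.join(models_dir, "eng.json"), "w", encoding="utf-8") as f:
        json.dump(toy_model, f)
    model = load_model(models_dir, "eng")

labelled = label_stream("^the<det><def><sp>$ ^dog<n><sg>$ ^bark<vblex><pri><p3><sg>$", model)
print(labelled)
```

Keeping the model in a plain-data file means the labeller only needs the language code to pick the right file, and swapping the lookup for a trained network would change `predict` alone.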