Leveraging Morphological Data from Linguistic Software Tools for Computational Resource Generation
- Mentors
- Daniel Swanson, Flammie
- Organization
- Apertium
- Technologies
- python
- Topics
- machine translation, language technology, less-resourced languages
This proposal aims to use the language documentation data that linguists compile in popular fieldwork software tools as a source of morphological data for the Apertium platform. Linguists often build elaborate descriptions of a language, including detailed morphological analyses, in tools such as FLEx and TLex, but this data is seldom used to create computational tools for those languages. Both FLEx and TLex can export their data as XML. From these exports we can extract the information needed for an Apertium monolingual dictionary: headwords, parts of speech, morphological segmentation, and more, depending on what the linguists working on the data have recorded.
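As a rough illustration of the extraction step, the sketch below reads headwords and part-of-speech labels from an XML export and emits them as Apertium monolingual dictionary (`.dix`) entries. It assumes a LIFT-style structure (`<entry>`, `<lexical-unit><form lang="…"><text>`, `<sense><grammatical-info value="…"/>`), which is one export option in FLEx; TLex exports would need a different reader. The function names, the `POS_TO_APERTIUM` mapping, the language code `xx`, and the sample words are all hypothetical. Morphological segmentation is not handled here; turning it into Apertium paradigms would be a separate step.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from FLEx-style POS labels to Apertium tags.
# A real project would need a reviewed, language-specific mapping.
POS_TO_APERTIUM = {
    "Noun": "n",
    "Verb": "vblex",
    "Adjective": "adj",
    "Adverb": "adv",
}

# Made-up sample in an assumed LIFT-style layout (lang code "xx" is a placeholder).
SAMPLE_LIFT = """<lift version="0.13">
  <entry id="e1">
    <lexical-unit><form lang="xx"><text>kuta</text></form></lexical-unit>
    <sense><grammatical-info value="Noun"/></sense>
  </entry>
  <entry id="e2">
    <lexical-unit><form lang="xx"><text>lima</text></form></lexical-unit>
    <sense><grammatical-info value="Verb"/></sense>
  </entry>
  <entry id="e3">
    <lexical-unit><form lang="xx"><text>na</text></form></lexical-unit>
    <sense><grammatical-info value="Particle"/></sense>
  </entry>
</lift>"""


def read_lift_entries(lift_xml: str, lang: str) -> list[tuple[str, str]]:
    """Return (headword, POS label) pairs from a LIFT-style XML string."""
    root = ET.fromstring(lift_xml)
    entries = []
    for entry in root.iter("entry"):
        # Headword text for the requested language.
        form = entry.find(f"lexical-unit/form[@lang='{lang}']/text")
        # POS label from the first sense that carries grammatical info.
        gram = entry.find("sense/grammatical-info")
        if form is None or form.text is None or gram is None:
            continue  # skip entries missing a headword or a POS label
        entries.append((form.text.strip(), gram.get("value", "")))
    return entries


def to_dix_section(entries: list[tuple[str, str]]) -> ET.Element:
    """Build a .dix <section> with one <e> per entry whose POS is mapped."""
    section = ET.Element("section", id="main", type="standard")
    for lemma, pos in entries:
        tag = POS_TO_APERTIUM.get(pos)
        if tag is None:
            continue  # unmapped POS: left out for manual review
        e = ET.SubElement(section, "e", lm=lemma)
        p = ET.SubElement(e, "p")
        ET.SubElement(p, "l").text = lemma  # surface form
        r = ET.SubElement(p, "r")
        r.text = lemma  # lemma followed by its tag
        ET.SubElement(r, "s", n=tag)
    return section


if __name__ == "__main__":
    section = to_dix_section(read_lift_entries(SAMPLE_LIFT, "xx"))
    print(ET.tostring(section, encoding="unicode"))
```

Entries whose part of speech has no mapping (here the made-up `Particle`) are skipped rather than guessed. This keeps the generated dictionary conservative and leaves such cases for a human to resolve.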