Multilingual Corpus Pipeline
- Mentors: Peter Uhrig, Mark Turner, Francis Steen
- Organization: Red Hen Lab
This project aims to build a pipeline that produces a searchable corpus in multiple languages. It will use NewsScape data as input, and tools such as SyntaxNet for part-of-speech (PoS) tagging and dependency parsing.
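The description does not fix the formats the pipeline passes between stages. As a minimal sketch only, assume the tagger/parser writes 10-column, tab-separated CoNLL output (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), which is the style SyntaxNet emits. The example reads that output and builds a small searchable index over it. The helpers `read_conll`, `build_index` and `subject_verb_pairs`, and the sample sentence, are hypothetical illustrations and are not part of SyntaxNet or NewsScape.

```python
from collections import defaultdict

# Hypothetical sample of parser output in 10-column CoNLL format (tab-separated).
SAMPLE = (
    "1\tThe\t_\tDET\tDT\t_\t2\tdet\t_\t_\n"
    "2\tnews\t_\tNOUN\tNN\t_\t3\tnsubj\t_\t_\n"
    "3\tbroke\t_\tVERB\tVBD\t_\t0\tROOT\t_\t_\n"
)


def read_conll(text):
    """Yield sentences as lists of token dicts; blank lines separate sentences."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # skip comment lines, if any
            continue
        cols = line.split("\t")
        sentence.append({
            "id": int(cols[0]),
            "form": cols[1],
            "pos": cols[3],        # coarse PoS tag (CPOSTAG column)
            "head": int(cols[6]),  # 0 means the token attaches to the root
            "deprel": cols[7],     # dependency relation label
        })
    if sentence:
        yield sentence


def build_index(sentences):
    """Map (PoS tag, lowercased word form) to a list of (sentence no., token id) hits."""
    index = defaultdict(list)
    for s_num, sentence in enumerate(sentences):
        for tok in sentence:
            index[(tok["pos"], tok["form"].lower())].append((s_num, tok["id"]))
    return index


def subject_verb_pairs(sentences):
    """Return (subject, head word) pairs for every token labelled nsubj."""
    pairs = []
    for sentence in sentences:
        by_id = {tok["id"]: tok for tok in sentence}
        for tok in sentence:
            if tok["deprel"] == "nsubj" and tok["head"] in by_id:
                pairs.append((tok["form"], by_id[tok["head"]]["form"]))
    return pairs


if __name__ == "__main__":
    sents = list(read_conll(SAMPLE))
    print(build_index(sents)[("NOUN", "news")])  # [(0, 2)]
    print(subject_verb_pairs(sents))             # [('news', 'broke')]
```

Keying the index on (tag, word) lets a query such as "news as a noun" skip verbal or adjectival uses of the same string. The dependency query shows how the parser's HEAD and DEPREL columns support searches that plain keyword matching cannot.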