Contributor
Parth Kapadia

Text-Extraction Libraries


Mentors
James Aylett, Bruno Baruffaldi
Organization
Xapian Search Engine Library

Project: Text-Extraction Libraries

Omega, Xapian's search application, currently supports file formats such as .htm, .html, .pdf and .csv. This project will add support for further file formats to Omega.

  • Text will be extracted from these formats using external filter programs or shared libraries.
  • A GSoC 2019 project already defined a way to use external libraries safely.
  • This project builds on that work to support additional file formats.
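As a rough illustration of the external-filter approach, the sketch below runs a filter program as a separate process and reads the extracted text from its standard output. Because the filter runs out of process, a crash or hang affects only the file being processed, not the indexer. The function name `run_filter` and the 30-second default timeout are hypothetical. This is not Omega's actual API (Omega itself is written in C++), only a sketch of the idea under those assumptions.

```python
import subprocess
from typing import Optional, Sequence


def run_filter(command: Sequence[str], path: str, timeout: float = 30.0) -> Optional[str]:
    """Run a hypothetical external filter on `path` and return its stdout as text.

    Returns None if the filter cannot be started, exits with a non-zero
    status (including being killed by a signal), or runs longer than
    `timeout` seconds, so one bad file cannot stop the whole indexing run.
    """
    try:
        result = subprocess.run(
            [*command, path],           # the file to extract is the last argument
            stdout=subprocess.PIPE,     # extracted text is read from stdout
            stderr=subprocess.DEVNULL,  # ignore the filter's diagnostics
            timeout=timeout,            # run() kills the child on timeout
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Filter missing/not executable, or it hung and was killed.
        return None
    if result.returncode != 0:
        # Non-zero exit or crash (negative return code means a signal).
        return None
    # Decode leniently so malformed bytes do not abort indexing.
    return result.stdout.decode("utf-8", errors="replace")
```

For example, `run_filter(["catdoc"], "letter.doc")` would return the text of a Word document if `catdoc`, which writes its output to stdout, is installed, and `None` if extraction fails.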