EDGAR-crawler 2.0: Enhancing Information Extraction of Company Reports
- Mentors
- Lefteris Loukas, Ion Androutsopoulos
- Organization
- Open Technologies Alliance - GFOSS
- Technologies
- python, Transformers, Regex, Gradio
- Topics
- natural language processing, fintech, Large Language Models, Information Extraction
This project aims to expand the information extraction capabilities of EDGAR-crawler, a tool that lets users download different types of company-published reports from EDGAR, the platform managed by the SEC. More specifically, the project will add support for the 10-Q and 8-K report types. It also plans to explore the possibility of using large language models to automatically generate regular expressions, supporting the current workflow in two ways. First, when an already supported extraction mechanism encounters a report with structural errors, an automatic regex generator could fix the problem online. Second, this technology would allow many more report types to be added to the crawler much more quickly than if the work were done manually.
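
The fallback idea described above can be illustrated with a minimal sketch. It is not EDGAR-crawler's actual code: the pattern, the function names (`find_section_start`, `locate_with_fallback`), and the `stub_generator` standing in for a large language model call are all assumptions made for illustration.

```python
import re
from typing import Callable, Optional

# Hypothetical hand-written pattern for the "Item 2" heading of a 10-Q report.
DEFAULT_PATTERN = r"^\s*Item\s+2\.\s*Management"

FLAGS = re.IGNORECASE | re.MULTILINE


def find_section_start(text: str, pattern: str) -> Optional[int]:
    """Return the offset where the pattern first matches, or None."""
    try:
        match = re.search(pattern, text, FLAGS)
    except re.error:
        # A generated pattern may not even compile; treat that as "not found".
        return None
    return match.start() if match else None


def locate_with_fallback(
    text: str,
    default_pattern: str,
    generate_pattern: Callable[[str], str],
) -> Optional[int]:
    """Try the hand-written pattern first, then a generated one."""
    position = find_section_start(text, default_pattern)
    if position is not None:
        return position
    # Only ask the generator when the default pattern fails on this report.
    candidate = generate_pattern(text)
    return find_section_start(text, candidate)


def stub_generator(text: str) -> str:
    """Stand-in for an LLM call; returns a more tolerant, fixed pattern."""
    return r"^\s*Item\s*2\s*[-:.]?\s*Management"


well_formed = "PART I\nItem 2. Management's Discussion and Analysis"
malformed = "PART I\nITEM 2 - MANAGEMENT'S DISCUSSION AND ANALYSIS"

print(locate_with_fallback(well_formed, DEFAULT_PATTERN, stub_generator))
print(locate_with_fallback(malformed, DEFAULT_PATTERN, stub_generator))
```

In this sketch the generator is consulted only after the default pattern fails, so well-formed reports never incur the cost of a model call, and a generated pattern that does not compile is simply treated as a miss.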