Contributor: Qingyue Xu

Parser and Crawler for Biodiversity checklists

Mentors: Narayani Barve, rohitmg, Vijay Barve, Thomas Vattakaven
Organization: R project for statistical computing

Compiling taxonomic checklists from varied data sources is a common task for biodiversity informaticians. Checklist data usually occur in textual formats, and significant manual effort is required to extract taxonomic names from a given text into a tabular format. Textual data in sources such as research publications and websites frequently also contain additional attributes such as synonyms, common names, higher taxonomy, and distribution. This project aims to facilitate quick extraction of textual data into tabular lists from various sources, including files in different formats and biodiversity websites that have an iterative subpage structure. It enables easy aggregation of biodiversity data in a structured format that can be used for further processing, uploaded to data aggregation initiatives, and used to help compile biodiversity data.
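
To make the text-to-table step concrete, here is a minimal sketch in Python. It is not the project's implementation. It assumes a hypothetical one-entry-per-line layout ("Genus species Authority, Year - Common name"), and the names `ENTRY`, `parse_checklist`, `to_csv`, and `SAMPLE` are invented for illustration. Real checklists vary widely, so the pattern would need adapting for each source.

```python
import csv
import io
import re

# Hypothetical line layout, assumed only for this sketch:
#   Genus species [Authority, Year] [- Common name]
# Limitation: authorities containing a hyphen are not handled.
ENTRY = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) (?P<species>[a-z]+)"
    r"(?: (?P<authority>[^-]+?))?"
    r"(?: - (?P<common_name>.+))?$"
)

FIELDS = ["genus", "species", "authority", "common_name"]


def parse_checklist(text):
    """Return one dict per line that fits the assumed layout."""
    rows = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:  # lines that do not fit the pattern are skipped
            rows.append({k: (v or "") for k, v in match.groupdict().items()})
    return rows


def to_csv(rows):
    """Serialize the parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative input only; the all-caps heading and the blank line are skipped.
SAMPLE = """MAMMALS
Panthera tigris (Linnaeus, 1758) - Tiger
Felis chaus - Jungle cat

Corvus splendens Vieillot, 1817
"""

if __name__ == "__main__":
    print(to_csv(parse_checklist(SAMPLE)))
```

A single regular expression keeps the sketch short, but it is fragile: a heading such as "Mammals of India" would be misread as a genus and species. A practical parser would need source-specific rules and validation against a taxonomic name resolver.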