The Data Retriever is a package manager for publicly accessible data. It automatically finds, downloads, and pre-processes publicly available datasets and stores them in a ready-to-analyze state. A number of data providers require an account with an associated login or API key to access data programmatically. The Data Retriever currently supports the Kaggle API, allowing users to securely install datasets hosted by Kaggle. The goal of this project is to find sources of public data that require a login or API key and to integrate them into the Data Retriever. Two APIs, Socrata and CKAN, have already been thoroughly researched and are ready to be added. Users will place the appropriate credentials in a file in their home directory. The Data Retriever will automatically identify the required credential files and handle the login or API request needed to download the dataset.
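The credential lookup described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the Data Retriever's actual implementation: the file layout `~/.<provider>/<provider>.json` is hypothetical (modeled on Kaggle's `~/.kaggle/kaggle.json`), the `"key"` field name is assumed, and `load_credentials` and `build_auth_headers` are made-up helper names. The header names reflect how Socrata (`X-App-Token`) and CKAN (`Authorization`) accept tokens over HTTP.

```python
import json
from pathlib import Path

# HTTP header each provider uses to carry its API key or app token.
AUTH_HEADERS = {
    "socrata": "X-App-Token",
    "ckan": "Authorization",
}


def load_credentials(provider, home=None):
    """Return the provider's credentials as a dict, or None if no file exists.

    Assumed (hypothetical) location: <home>/.<provider>/<provider>.json
    """
    home = Path(home) if home is not None else Path.home()
    path = home / f".{provider}" / f"{provider}.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def build_auth_headers(provider, credentials):
    """Build the request headers that authenticate a download request.

    Assumes the credentials file stores the token under a "key" field.
    """
    header = AUTH_HEADERS[provider]
    return {header: credentials["key"]}
```

A caller would check for `None` from `load_credentials` and tell the user which file to create, rather than attempting an unauthenticated download.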

Organization

Student

Aakash Chaudhary

Mentors

  • Henry Senyondo
  • Ethan White

2021