Contributor
Chanchal Kumar Maji

Improving and extending the TensorFlow Datasets library by adding DatasetBuilders for important research datasets.


Mentors
Vikram Tiwari, Marcin Michalski
Organization
TensorFlow

TensorFlow Datasets (tfds) simplifies the user's work by transforming raw datasets into a standard format that can be fed directly into a machine learning pipeline. The library downloads the data, converts it into that standard format, and exposes it as a tf.data.Dataset, so building input pipelines and working with training and testing splits is straightforward. Users do not need to write dataset-specific download or parsing code. Each dataset is implemented as a subclass of DatasetBuilder, which encapsulates the logic for downloading, preparing, and loading that particular dataset. Many good-quality research datasets still lack a DatasetBuilder implementation.
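In the real library, users typically load a prepared dataset with `tfds.load(...)`, and new datasets usually subclass `tfds.core.GeneratorBasedBuilder`, implementing `_info`, `_split_generators`, and `_generate_examples`. The sketch below does not use tfds. It is a stdlib-only, hypothetical model of that builder pattern, meant only to illustrate the division of labour: the class names `MiniDatasetBuilder` and `ToyDigitsBuilder`, the in-memory "raw" data, and the simplified method signatures are all invented for illustration and are not the tfds API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Example = Dict[str, object]


class MiniDatasetBuilder(ABC):
    """Hypothetical, simplified stand-in for a tfds-style dataset builder.

    Mirrors the pattern: describe the dataset, turn raw records for each
    split into standardized examples, then serve a prepared split on request.
    """

    def __init__(self) -> None:
        self._prepared: Dict[str, List[Example]] = {}

    @property
    def info(self) -> Dict[str, str]:
        return self._info()

    @abstractmethod
    def _info(self) -> Dict[str, str]:
        """Return metadata describing the dataset's features."""

    @abstractmethod
    def _split_generators(self) -> Dict[str, List[str]]:
        """Map each split name to the raw records for that split."""

    @abstractmethod
    def _generate_examples(self, records: List[str]) -> Iterator[Tuple[str, Example]]:
        """Yield (key, example) pairs in a standard format."""

    def download_and_prepare(self) -> None:
        # Convert the raw records of every split into standardized examples once.
        for split, records in self._split_generators().items():
            examples = dict(self._generate_examples(records))
            # Sort by key so the prepared order is deterministic.
            self._prepared[split] = [examples[k] for k in sorted(examples)]

    def as_dataset(self, split: str) -> List[Example]:
        if split not in self._prepared:
            raise KeyError(f"split {split!r} not prepared; call download_and_prepare()")
        return list(self._prepared[split])


class ToyDigitsBuilder(MiniDatasetBuilder):
    """Hypothetical builder over a tiny in-memory CSV-like source (no network)."""

    RAW = "0,zero\n1,one\n2,two\n3,three\n4,four"

    def _info(self) -> Dict[str, str]:
        return {"label": "int", "text": "str"}

    def _split_generators(self) -> Dict[str, List[str]]:
        rows = self.RAW.splitlines()
        # Hold out the last row as the test split; the rest is training data.
        return {"train": rows[:-1], "test": rows[-1:]}

    def _generate_examples(self, records: List[str]) -> Iterator[Tuple[str, Example]]:
        for line in records:
            label, text = line.split(",")
            yield f"id_{label}", {"label": int(label), "text": text}


# Usage: the caller never touches the raw format, only prepared splits.
builder = ToyDigitsBuilder()
builder.download_and_prepare()
train = builder.as_dataset("train")
test = builder.as_dataset("test")
print(len(train), train[0])  # 4 {'label': 0, 'text': 'zero'}
```

The point of the design is that all dataset-specific parsing lives in the subclass, while the base class owns preparation and split access; adding a new dataset then means writing one new subclass rather than changing user code.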
This proposal aims to add more functionality to the TensorFlow Datasets library and to create DatasetBuilders for several commonly used research datasets.