Improving and extending the TensorFlow Datasets library by adding DatasetBuilders for important research datasets.
- Mentors: Vikram Tiwari, Marcin Michalski
- Organization: TensorFlow
TensorFlow Datasets (tfds) makes the user's work easier by transforming a raw dataset into a standard format that can be fed directly into a machine learning pipeline. The library handles downloading the data, transforming it into a standard format, and exposing it as a tf.data.Dataset, so that building data pipelines and dividing records into training and testing splits is straightforward. No format-specific preprocessing is required from the user. Each dataset is implemented as a subclass of DatasetBuilder, which encapsulates the download and preparation logic for that dataset. Many good-quality research datasets still lack a DatasetBuilder.
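The builder pattern described above can be sketched in plain Python. This is a toy illustration only, not the tfds API: `ToyDatasetBuilder`, `ToyReviews`, the in-memory `_RAW` records, and the normalization step are all hypothetical, and the "download" is simulated. In the real library, builders subclass `tfds.core.DatasetBuilder` and the result is a `tf.data.Dataset` rather than a list.

```python
from typing import Dict, Iterator, List, Tuple

Example = Dict[str, object]


class ToyDatasetBuilder:
    """Hypothetical base class: turns raw records into one standard format."""

    SPLITS = ("train", "test")

    def __init__(self) -> None:
        self._prepared: Dict[str, List[Example]] = {}

    def _generate_examples(self, split: str) -> Iterator[Tuple[str, int]]:
        """Subclasses yield raw (text, label) records for the given split."""
        raise NotImplementedError

    def download_and_prepare(self) -> None:
        # A real builder would download and cache files here; this sketch
        # only normalizes the raw records supplied by the subclass.
        for split in self.SPLITS:
            self._prepared[split] = [
                {"text": text.strip().lower(), "label": int(label)}
                for text, label in self._generate_examples(split)
            ]

    def as_dataset(self, split: str) -> List[Example]:
        # Returns the prepared examples for one split.
        return self._prepared[split]


class ToyReviews(ToyDatasetBuilder):
    """Hypothetical dataset: only the dataset-specific part lives here."""

    _RAW = {
        "train": [("  Great movie ", 1), ("Terrible plot", 0)],
        "test": [("Not bad ", 1)],
    }

    def _generate_examples(self, split: str) -> Iterator[Tuple[str, int]]:
        yield from self._RAW[split]


builder = ToyReviews()
builder.download_and_prepare()
train = builder.as_dataset("train")
test = builder.as_dataset("test")
print(len(train), len(test))
print(train[0])
```

The point of the design is that adding a new dataset means writing only the subclass; the preparation and split handling are shared by the base class.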
This proposal aims to add more functionality to the TensorFlow Datasets library and to create DatasetBuilders for some common, important research datasets.