tf.data.Dataset is the de facto way of loading and preprocessing data in TensorFlow for training machine learning models on CPUs, GPUs and TPUs. The tf.data API offers several advantages, including support for distributing input across workers under the various tf.distribute.Strategy implementations. By contrast, the keras_preprocessing framework is widely used to load and preprocess data for Keras models. It comprises a set of classes (inherited from keras.utils.Sequence) built on NumPy and SciPy implementations, which makes it unsuitable for both multi-worker strategies and prefetching.
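As a rough illustration of the tf.data side of this comparison, the sketch below builds a small input pipeline with parallel mapping, batching and prefetching. It assumes TensorFlow 2.4 or later (for tf.data.AUTOTUNE); the random in-memory arrays and the scale function are hypothetical stand-ins for a real image dataset and are not part of the project itself.

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory data standing in for a real image dataset.
images = np.random.randint(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
labels = np.arange(8, dtype=np.int64) % 2


def scale(image, label):
    # Convert uint8 pixels to float32 values in [0, 1].
    return tf.cast(image, tf.float32) / 255.0, label


dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(scale, num_parallel_calls=tf.data.AUTOTUNE)  # preprocess in parallel
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)  # overlap preprocessing with training
)

# Pull one batch to confirm the pipeline produces what we expect.
first_images, first_labels = next(iter(dataset))
```

The prefetch step is what lets input preparation overlap with model execution, which is the capability the NumPy-based generators lack.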

Recently, TensorFlow introduced the Preprocessing Layers API, which allows preprocessing layers to be serialized with the model itself. It is fundamentally aimed at unifying the functionality available in keras.preprocessing. In this project, we will implement the API specifications related to image operations (by implementing the ImagePipeline class) as discussed in the latest Keras RFC, which proposes changes to and a redesign of the Keras Preprocessing API in favour of performance and usability.
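To make "serialized with the model itself" concrete, here is a minimal sketch using a preprocessing layer that already ships with Keras. It assumes TensorFlow 2.6 or later, where tf.keras.layers.Rescaling is available at that path. It does not show the proposed ImagePipeline class, whose interface is defined by the RFC and is not reproduced here; the model architecture is arbitrary and chosen only for illustration.

```python
import numpy as np
import tensorflow as tf

# The preprocessing step (pixel rescaling) is an ordinary layer in the model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2),
])

# Round-trip the architecture through its config: the preprocessing layer
# travels with the model. Weights are re-initialised, not copied.
config = model.get_config()
restored = tf.keras.Sequential.from_config(config)
layer_types = [type(layer).__name__ for layer in restored.layers]

# The restored model accepts raw pixel values directly.
output = restored(np.zeros((1, 32, 32, 3), dtype=np.float32))
```

Because the rescaling lives inside the model, callers of the restored model do not need to repeat the preprocessing in separate code.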

Organization

Student

Swarup

Mentors

  • Paige Bailey
  • Zhenyu Tan

2020