This project aims to design a system that clusters the images and audio from media broadcasts and then re-orders them accordingly in the Red Hen Rapid Annotator. The key challenge is to determine which features, algorithms, and thresholds to use. The project will start with data collection; data from previous Red Hen research will be sufficient for the task. The second step is to pre-process the image and audio data, which is expected to improve the system's performance. The third step is to identify the best features from the audio and image data. Finally, conventional clustering algorithms will be used first to establish a baseline, and deep neural models will then be used to cluster the data.
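The cluster-then-re-order idea can be illustrated with a small sketch. This is only an assumption-laden toy, not the project's actual method: k-means stands in for the unspecified "conventional" baseline algorithm, the 2-D feature vectors are made up, and the names `kmeans`, `reorder_by_cluster`, and the clip identifiers are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length feature vectors.

    Illustrative baseline only; the proposal does not name a specific
    conventional algorithm.
    """
    # Deterministic farthest-first initialisation (an assumption made
    # here so the sketch gives the same result on every run).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def reorder_by_cluster(item_ids, labels):
    """Stable re-ordering so items in the same cluster sit together."""
    order = sorted(range(len(item_ids)), key=lambda i: labels[i])
    return [item_ids[i] for i in order]


# Hypothetical feature vectors, one per media clip (made-up values).
features = [(0.1, 0.2), (5.0, 5.1), (0.2, 0.1), (5.2, 4.9), (0.0, 0.3)]
clip_ids = ["clip_a", "clip_b", "clip_c", "clip_d", "clip_e"]

labels = kmeans(features, k=2)
ordered = reorder_by_cluster(clip_ids, labels)
print(ordered)
```

In this toy setting, clips with similar features end up adjacent in the output list, which is the ordering an annotator would then see. In practice the feature vectors would come from the image and audio feature-extraction step, and the number of clusters and any distance thresholds would need to be tuned.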