Contributor
donghun lee

Multimodal television show segmentation


Mentors
Tim Groeling, Francis Steen, Kai Chan
Organization
Red Hen Lab

I aim to build a general system that detects the natural boundaries between TV shows. This task has long been done manually by skilled workers, but recent developments in machine learning may make it possible for a system to use the multimodal cues in video as a human would. The final product will be a Python-based classifier that takes video data as input and, for each decision unit (1 second of frames), outputs whether the show has changed (from one program to another). The performance of the system will be measured against test data for which the show boundaries have been manually annotated. The baseline for comparison will be the best detector that Red Hen Lab provides (e.g. cc-keyword-spacing). As a corollary to solving the main task, I aim to develop a system that automatically generates metadata (indexing/tagging), the subtask noted in the original problem description.
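To make the evaluation idea concrete, here is a minimal sketch of how per-second boundary decisions could be scored against manually annotated boundaries. It is an illustration under stated assumptions, not the project's actual evaluation code: the function names `boundaries_to_labels` and `score`, the `tolerance` parameter, and the matching rule (a prediction counts as correct if an annotated boundary lies within `tolerance` seconds) are all hypothetical.

```python
from typing import Iterable, List, Tuple


def boundaries_to_labels(boundaries: Iterable[int], duration: int) -> List[int]:
    """Return one 0/1 label per second; 1 marks a second where the show changes."""
    labels = [0] * duration
    for t in boundaries:
        if 0 <= t < duration:
            labels[t] = 1
    return labels


def score(predicted: List[int], annotated: List[int],
          tolerance: int = 0) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for predicted boundary seconds.

    Assumed matching rule (hypothetical): a predicted boundary is correct if
    an annotated boundary lies within `tolerance` seconds of it, and an
    annotated boundary is found if a prediction lies within `tolerance`
    seconds of it.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must cover the same seconds")

    pred_times = [i for i, v in enumerate(predicted) if v]
    true_times = [i for i, v in enumerate(annotated) if v]

    correct_preds = sum(
        any(abs(p - t) <= tolerance for t in true_times) for p in pred_times
    )
    found_truths = sum(
        any(abs(p - t) <= tolerance for p in pred_times) for t in true_times
    )

    precision = correct_preds / len(pred_times) if pred_times else 0.0
    recall = found_truths / len(true_times) if true_times else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up example: a 60-second clip with annotated boundaries at 10 s and 50 s.
    annotated = boundaries_to_labels([10, 50], duration=60)
    predicted = boundaries_to_labels([11, 30], duration=60)
    print(score(predicted, annotated, tolerance=1))
```

The same function could be applied to the output of a baseline detector such as cc-keyword-spacing, so that both systems are compared on identical annotated data. A tolerance window is one possible design choice, since manual annotations and automatic detections rarely agree to the exact second.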