Hardsubx - Extraction of burned in subtitles from videos
- Mentors
  - Anshul
- Organization
  - CCExtractor Development
I want to add to CCExtractor the capability of extracting burned-in (hard) subtitles from videos. Currently, CCExtractor extracts caption data only when it is present in specific structures in the stream, and it skips the actual video data (pixels) completely. However, many videos have hard subtitles burned into them. Extracting these is a computer vision problem, and one that CCExtractor has not previously been able to handle.
I want to create a system that can extract well-formed subtitles from any input video and that is robust to variations in the font, size, color, and language of the subtitles.