Contributor
Shashwat Singh

Porting Hardsubx to Rust


Mentors
Punit Lodha
Organization
CCExtractor Development
Technologies
C, Rust
Topics
OCR, Port
The proposal outlines a high-level plan to port the Hardsubx module to Rust. The port proceeds in stages, rewriting as little code as necessary while maximizing memory safety at each stage. The final version of Hardsubx will have exactly the same external interface as the old one.

The live document is available at this URL for feedback and changes: https://docs.google.com/document/d/1U9RqfbjfVNUkUYOBzT6yn1ANLGo_v58wtepslbQhtV4/edit?usp=sharing

Edit 1:
- Add support for the Tesseract Cube engine for better performance when extracting burnt-in subtitles.
- Update the subtitle files used by the tests, because a different Tesseract engine will generate different subtitles than before.

Edit 2:
- Add promising Rust wrapper options for Tesseract, Leptonica, and FFmpeg.

Edit 3:
- Divide the timeline phases into periods of mostly two weeks each.

Edit 4:
- Add guarantees on subtitle quality to the deliverables.
- Add a contingency plan in case the LSTM engine replaces the current system. The experimentation period now has to determine whether using an LSTM is worth its computational tradeoff.

Edit 5:
- Add deliverables for the first evaluations.
- Add other possible commitments during the GSoC period.
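Keeping Hardsubx's external interface unchanged while moving its internals to Rust is commonly done with a thin `extern "C"` shim. The shim keeps a C-compatible signature, checks the raw pointer it receives, and then hands a bounds-checked slice to ordinary safe Rust. The sketch below only illustrates that pattern. The function names and the bright-pixel counting logic are hypothetical stand-ins, not part of CCExtractor's actual Hardsubx API.

```rust
use std::os::raw::c_int;
use std::slice;

/// Safe core (hypothetical helper): counts luminance samples at or above
/// `threshold`. No raw pointers here, so the compiler enforces bounds.
pub fn count_bright_pixels(luma: &[u8], threshold: u8) -> usize {
    luma.iter().filter(|&&p| p >= threshold).count()
}

/// C-ABI shim (hypothetical name) that existing C callers could keep using.
/// Returns -1 if `luma` is null; saturates at `c_int::MAX` on overflow.
///
/// # Safety
/// `luma` must point to at least `len` readable bytes that stay valid
/// and unmodified for the duration of the call.
pub unsafe extern "C" fn hardsubx_count_bright_pixels(
    luma: *const u8,
    len: usize,
    threshold: u8,
) -> c_int {
    // Reject the one pointer error we can detect before touching memory.
    if luma.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees `luma` points to `len` readable bytes.
    let frame = unsafe { slice::from_raw_parts(luma, len) };
    let count = count_bright_pixels(frame, threshold);
    // Convert back to the C return type without silent truncation.
    if count > c_int::MAX as usize {
        c_int::MAX
    } else {
        count as c_int
    }
}
```

In a real build the shim would also carry `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` on the 2024 edition) so C code can link against it by its unmangled name; it is left out here to keep the snippet edition-neutral. Splitting the code this way confines `unsafe` to the boundary, so later stages can rewrite the safe core freely without changing what C callers see.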