Multimodal Egocentric Perception (with video, audio, and eye-tracking data)
- Mentors
- Mehul Bhatt, skrish13, Jakob Suchan
- Organization
- Red Hen Lab
Hey, I have been in constant touch with Mehul regarding my project on Multimodal Egocentric Perception, and I already had a Skype meeting with him before drafting this final pre-proposal.
Abstract: The idea of the project is to introduce multimodality into the recognition of everyday activities and scenes. To my knowledge, no existing work on egocentric perception takes multimodality (especially audio) into account when determining the activities and scenes a person is involved in. I have already built an audio-based model, following the popular IEEE DCASE challenge setup, which can classify egocentric recordings into scene categories such as "walking in a park" or "driving in a car". As part of GSoC 2018, I plan to extend this work by combining the audio model with video-based models and broadening its scope from scenes alone to scenes plus activities; see the sketch below for the kind of fusion I have in mind. The final breakdown of the steps is given in the pre-proposal attached above. The idea is to build pyscene-detect for egocentric videos, which would be a prominent contribution given the growing research interest in first-person-view video.
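To make the audio+video combination concrete, here is a minimal late-fusion sketch under stated assumptions; it is not the project's actual code. The class names (`AudioBranch`, `VideoBranch`, `fuse_predictions`), the example scene labels, and the choice of librosa plus PyTorch are illustrative assumptions: the audio branch stands in for a DCASE-style log-mel classifier, the video branch for a generic frame-feature classifier, and the two are combined by a weighted average of class probabilities.

```python
# Hypothetical late-fusion sketch (illustrative names, not the project's code):
# an audio branch over log-mel spectrograms and a video branch over pooled
# frame features, fused by averaging their class probabilities.

import numpy as np
import librosa
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative scene labels, not the DCASE taxonomy.
SCENE_CLASSES = ["walking_in_park", "driving_in_car", "indoor_office"]


def log_mel_features(wav_path, sr=22050, n_mels=64):
    """Compute a log-mel spectrogram, as commonly used in DCASE-style baselines."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, n_frames)


class AudioBranch(nn.Module):
    """Small CNN over the log-mel spectrogram (stand-in for the audio scene model)."""
    def __init__(self, n_classes):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(16, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, n_frames)
        return self.fc(self.conv(x).flatten(1))


class VideoBranch(nn.Module):
    """Placeholder linear head over pooled frame features from a video model."""
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_classes)

    def forward(self, x):  # x: (batch, feat_dim)
        return self.fc(x)


def fuse_predictions(audio_logits, video_logits, audio_weight=0.5):
    """Late fusion: weighted average of the two branches' class probabilities."""
    p_audio = F.softmax(audio_logits, dim=-1)
    p_video = F.softmax(video_logits, dim=-1)
    return audio_weight * p_audio + (1 - audio_weight) * p_video


if __name__ == "__main__":
    audio = AudioBranch(len(SCENE_CLASSES))
    video = VideoBranch(feat_dim=512, n_classes=len(SCENE_CLASSES))
    mel = torch.tensor(log_mel_features("clip.wav")).unsqueeze(0).unsqueeze(0).float()
    frame_feats = torch.randn(1, 512)  # stand-in for pooled CNN frame features
    probs = fuse_predictions(audio(mel), video(frame_feats))
    print("predicted scene:", SCENE_CLASSES[int(probs.argmax())])
```

Late fusion is only one possible design; feature-level fusion or a joint audio-visual network are alternatives I would evaluate during the project.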