The proposal aims to identify elements of co-speech gestures in a massive dataset of television news. The first step is to build an imperfect, automatically generated dataset, which can then be manually corrected for blended classic joint attention scenarios. I have two co-speech gestures in mind, one trivial and one complex, both concerned with head movement and gaze direction.
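As an illustration of the build-then-correct workflow, here is a minimal Python sketch. It assumes that each news clip segment receives one automatic gesture label, which a human reviewer may later override. `Annotation`, `apply_corrections`, and the clip ids are hypothetical names and do not come from any existing tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:
    clip_id: str                        # hypothetical identifier for a news clip segment
    auto_label: str                     # label from the (imperfect) automatic detector
    manual_label: Optional[str] = None  # label from a human reviewer, if any

    @property
    def label(self) -> str:
        # A manual correction, when present, overrides the automatic label.
        return self.manual_label if self.manual_label is not None else self.auto_label


def apply_corrections(dataset: List[Annotation], corrections: Dict[str, str]) -> List[str]:
    """Attach manual labels to the dataset; return ids whose automatic label was wrong."""
    disagreements = []
    for ann in dataset:
        if ann.clip_id in corrections:
            ann.manual_label = corrections[ann.clip_id]
            if ann.manual_label != ann.auto_label:
                disagreements.append(ann.clip_id)
    return disagreements


if __name__ == "__main__":
    data = [
        Annotation("clip-001", "nod"),
        Annotation("clip-002", "shake"),
        Annotation("clip-003", "none"),
    ]
    print(apply_corrections(data, {"clip-002": "nod", "clip-003": "none"}))  # ['clip-002']
    print([a.label for a in data])  # ['nod', 'nod', 'none']
```

The list of disagreements can also be used to estimate how flawed the automatic labels were before correction.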

1. A “Yes/No” gesture: shaking the head horizontally or vertically.
2. Fast bobbing of the head combined with slow closing of the eyes, as a gesture of understanding.
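The next sketch shows one way the two gestures might be separated. It assumes that an upstream step, such as a per-frame face detector, already provides the face centre (x, y) in pixels for each frame, along with an eye-openness value between 0 (closed) and 1 (open). All function names and thresholds are hypothetical and untuned. Gesture 1 is treated as repeated direction reversals along the dominant axis of motion. Gesture 2 is approximated as vertical bobbing above a minimum frequency together with a sustained eye closure, which is only a rough proxy for "slow closing".

```python
from typing import List, Sequence, Tuple


def count_reversals(values: Sequence[float], min_step: float) -> int:
    """Count changes of movement direction, ignoring moves smaller than min_step (jitter)."""
    reversals = 0
    last_sign = 0
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if abs(delta) < min_step:
            continue  # treat tiny moves as noise, not as a direction
        sign = 1 if delta > 0 else -1
        if last_sign != 0 and sign != last_sign:
            reversals += 1
        last_sign = sign
    return reversals


def classify_head_gesture(track: List[Tuple[float, float]],
                          min_step: float = 2.0, min_reversals: int = 2) -> str:
    """Label per-frame face-centre positions as 'shake', 'nod' or 'none' (gesture 1)."""
    if len(track) < 3:
        return "none"
    xs = [x for x, _ in track]
    ys = [y for _, y in track]
    x_range, y_range = max(xs) - min(xs), max(ys) - min(ys)
    if count_reversals(xs, min_step) >= min_reversals and x_range > y_range:
        return "shake"  # horizontal oscillation, in many cultures read as "no"
    if count_reversals(ys, min_step) >= min_reversals and y_range > x_range:
        return "nod"    # vertical oscillation, in many cultures read as "yes"
    return "none"


def is_understanding_gesture(ys: Sequence[float], eye_openness: Sequence[float], fps: float,
                             min_bob_hz: float = 2.0, min_closed_s: float = 0.3,
                             closed_below: float = 0.2, min_step: float = 2.0) -> bool:
    """Rough proxy for gesture 2: fast vertical bobbing plus a sustained eye closure."""
    if fps <= 0 or len(ys) < 3:
        return False
    duration_s = len(ys) / fps
    # One full up-and-down bob produces about two direction reversals.
    bob_hz = count_reversals(ys, min_step) / 2 / duration_s
    longest = run = 0
    for openness in eye_openness:
        run = run + 1 if openness < closed_below else 0  # length of current closed run
        longest = max(longest, run)
    return bob_hz >= min_bob_hz and longest / fps >= min_closed_s


if __name__ == "__main__":
    shake = [(100, 50), (110, 50), (100, 50), (110, 50), (100, 50)]
    print(classify_head_gesture(shake))  # shake
```

Requiring the dominant axis to carry the oscillation keeps a nod that drifts slightly sideways from being mislabelled as a shake. The thresholds would have to be calibrated against manually corrected data.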

Beyond identifying the two co-speech gestures above, the project aims to create a feedback mechanism that improves the detection of such gestures. It will also explore better algorithms for detecting emotion, gaze direction, and similar cues. The gestures would first be identified with classifiers I already know, including the Haar cascade classifier. The rest of the project would concentrate on building and training a custom classifier.
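One simple form the feedback mechanism could take is shown below as a Python sketch. It assumes that the detector emits a numeric score per clip, for example a confidence derived from Haar-cascade face detections. It also assumes that human reviewers mark each reviewed detection as correct or incorrect. The technique is plain threshold recalibration, used here only for illustration; it is not the custom classifier the project intends to train. `accuracy_at` and `retune_threshold` are hypothetical names.

```python
from typing import List, Tuple

# Each feedback item: (detector score for a clip, human verdict: is it really the gesture?)
Feedback = List[Tuple[float, bool]]


def accuracy_at(threshold: float, feedback: Feedback) -> float:
    """Fraction of reviewed clips where 'score >= threshold' agrees with the human verdict."""
    correct = sum((score >= threshold) == is_gesture for score, is_gesture in feedback)
    return correct / len(feedback)


def retune_threshold(feedback: Feedback, current: float) -> float:
    """Return the candidate threshold with the best accuracy on human-reviewed clips.

    Candidates are the current threshold and every observed score. On a tie the
    current threshold is kept; otherwise the lowest tied candidate is chosen.
    """
    if not feedback:
        return current
    candidates = sorted({current, *(score for score, _ in feedback)})
    # max() returns the first maximal candidate, i.e. the lowest one in sorted order.
    return max(candidates, key=lambda t: (accuracy_at(t, feedback), t == current))


if __name__ == "__main__":
    reviewed = [(0.9, True), (0.8, True), (0.7, True), (0.6, False), (0.4, False)]
    print(retune_threshold(reviewed, current=0.5))  # 0.7
```

Accuracy is the simplest objective to optimise. If true gestures are rare in news footage, a precision/recall-based measure such as F1 would likely be a better choice.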



Soumitra Agarwal


  • Mark Turner