This proposal concerns two additions. The first is pose data extraction using OpenPose, together with the generation of posture and gesture embeddings, in Red Hen’s pipelines for processing the NewsScape datasets. The second is posture and gesture search by example, along with similarity search, in the vitrivr video information retrieval software. The focus is on reorienting some of the existing work towards similarity search in a pilot production environment. Embeddings for static body and hand poses will be created using Spatial (Non-Temporal) Graph Convolutional Networks trained with triplet loss. Candidates for illustrative gestures will be segmented based on manually engineered features of the OpenPose skeletons and then embedded using Spatial Temporal Graph Convolutional Networks combined with triplet loss. Functionality for querying using a webcam will be added to vitrivr, and the embedding pipeline will be integrated into Cineast.
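
As a rough illustration of the triplet loss objective and the search by example it is meant to support, the following minimal sketch uses plain Python lists as stand-ins for embeddings. The function names, the margin value, and the choice of Euclidean distance are assumptions made for illustration only; they are not taken from the proposal, vitrivr, or Cineast.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` further
    from the anchor than the positive is. The margin is an assumed value.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rank_by_similarity(query, gallery):
    """Return gallery keys ordered from nearest to furthest from the query.

    `gallery` is a hypothetical dict mapping an item name to its embedding.
    """
    return sorted(gallery, key=lambda name: euclidean(query, gallery[name]))
```

With embeddings trained this way, a query pose (for example, one captured from a webcam) could be answered by ranking stored embeddings by distance, as `rank_by_similarity` does on a small scale.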



Frankie Robertson


  • Heiko Schuldt
  • Mahnaz Parian