Understanding videos, constructing plots: learning a visually grounded storyline model from annotated videos
Title | Understanding videos, constructing plots: learning a visually grounded storyline model from annotated videos |
Publication Type | Conference Papers |
Year of Publication | 2009 |
Authors | Gupta A, Srinivasan P, Shi J, Davis LS |
Conference Name | 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009) |
Date Published | 2009/06 |
Keywords | AND-OR graph; encoding; human action recognition; human activity analysis; Integer Programming framework; plots construction; semantic meaning; spatio-temporal constraint; video annotation; video understanding; visually grounded storyline model learning; storyline extraction; graph theory; image representation; integer programming; learning (artificial intelligence); spatiotemporal phenomena; video coding |
Abstract | Analyzing videos of human activities involves not only recognizing actions (typically based on their appearances), but also determining the story/plot of the video. The storyline of a video describes causal relationships between actions. Beyond recognition of individual actions, discovering causal relationships helps to better understand the semantic meaning of the activities. We present an approach to learn a visually grounded storyline model of videos directly from weakly labeled data. The storyline model is represented as an AND-OR graph, a structure that can compactly encode storyline variation across videos. The edges in the AND-OR graph correspond to causal relationships which are represented in terms of spatio-temporal constraints. We formulate an Integer Programming framework for action recognition and storyline extraction using the storyline model and visual groundings learned from training data. |
DOI | 10.1109/CVPR.2009.5206492 |
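The abstract describes the storyline model as an AND-OR graph that compactly encodes storyline variation across videos. The following is a minimal illustrative sketch, not the authors' implementation. It shows how such a graph can generate several action sequences from a few shared nodes: an AND node requires all of its children in the listed order, and an OR node selects exactly one alternative. The classes `Leaf`, `AndNode`, `OrNode`, the function `expand`, and the action labels are hypothetical. The sketch also leaves out the spatio-temporal constraints on edges, the visual groundings, and the Integer Programming inference described in the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Union


@dataclass
class Leaf:
    """Terminal node: a single action (hypothetical label)."""
    action: str


@dataclass
class AndNode:
    """All children occur, in the listed (assumed causal/temporal) order."""
    children: List["Node"]


@dataclass
class OrNode:
    """Exactly one of the child alternatives occurs."""
    children: List["Node"]


Node = Union[Leaf, AndNode, OrNode]


def expand(node: Node) -> List[List[str]]:
    """Return every action sequence (storyline) the graph can generate."""
    if isinstance(node, Leaf):
        return [[node.action]]
    if isinstance(node, OrNode):
        # An OR node contributes the union of its alternatives' storylines.
        result: List[List[str]] = []
        for child in node.children:
            result.extend(expand(child))
        return result
    # An AND node concatenates one storyline from each child, in order.
    combos = product(*(expand(child) for child in node.children))
    return [[a for part in combo for a in part] for combo in combos]


# Hypothetical example: a shared opening action followed by alternative outcomes.
graph = AndNode([
    Leaf("pitch"),
    OrNode([
        Leaf("miss"),
        AndNode([
            Leaf("hit"),
            OrNode([Leaf("catch"), AndNode([Leaf("run"), Leaf("throw")])]),
        ]),
    ]),
])

storylines = expand(graph)
for s in storylines:
    print(" -> ".join(s))
```

Running the example prints three storylines: `pitch -> miss`, `pitch -> hit -> catch`, and `pitch -> hit -> run -> throw`. The shared prefix `pitch` and the shared `hit` subtree are stored only once. This is the sense in which an AND-OR graph encodes variation compactly instead of listing every storyline separately.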