Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video
 

Benjamin Laxton (University of California, San Diego)
Jongwoo Lim (Honda Research Institute, Mountain View)
David Kriegman (University of California, San Diego)

CVPR 2007

 

Here we show a sequence of frames from a cooking activity, as output by our activity recognition system. Below each frame is the current state estimate in the activity graph.

 

Abstract

 

We present a scalable approach to recognizing and describing complex activities in video sequences. We are interested in long-term, sequential activities that may contain several parallel streams of action. Our approach integrates temporal, contextual, and ordering constraints with the output of low-level visual detectors to recognize complex, long-term activities. We argue that a hierarchical, object-oriented design makes our solution scalable: the higher-level reasoning components are independent of any particular low-level detector implementation, and recognition of additional activities and actions can easily be added. Three major components realize this design: a dynamic Bayesian network structure for representing activities composed of partially ordered sub-actions, an object-oriented action hierarchy for building arbitrarily complex action detectors, and an approximate Viterbi-like algorithm for inferring the most likely observed sequence of actions. Additionally, this study proposes the Erlang distribution as a comprehensive model of the idle time between actions and the frequency of observing new actions. We show results for our approach on real video sequences containing complex activities.
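Two of the ingredients named above are easy to illustrate in isolation. First, the Erlang distribution proposed as a model of idle time between actions: the Python sketch below evaluates the standard Erlang density. The shape/rate values and the frame-based scoring are illustrative assumptions, not parameters from the paper.

```python
import math

def erlang_pdf(x, k, lam):
    """Erlang density with integer shape k and rate lam:
    f(x; k, lam) = lam**k * x**(k-1) * exp(-lam*x) / (k-1)!"""
    if x < 0:
        return 0.0
    return (lam ** k) * (x ** (k - 1)) * math.exp(-lam * x) / math.factorial(k - 1)

# Hypothetical use: score how plausible the current idle gap (in frames)
# is, when weighing whether a new action should have been observed by now.
idle_frames = 45
print(erlang_pdf(idle_frames, k=3, lam=0.05))
```

Second, the inference step is described as an approximate, Viterbi-like algorithm over the activity's dynamic Bayesian network. For orientation only, here is the standard exact Viterbi recursion over a discrete state space; the paper's variant is approximate because the joint state space of partially ordered, parallel action streams is too large to enumerate exactly, so this exact version is a conceptual baseline rather than the authors' method.

```python
def viterbi(obs_loglik, trans_logprob, init_logprob):
    """Exact Viterbi decoding over a discrete state space.

    obs_loglik[t][s]    : log-likelihood of the observation at time t in state s
    trans_logprob[p][s] : log-probability of transitioning from state p to s
    init_logprob[s]     : log-probability of starting in state s
    Returns the most likely state sequence.
    """
    T, S = len(obs_loglik), len(init_logprob)
    delta = [init_logprob[s] + obs_loglik[0][s] for s in range(S)]
    backptrs = []
    for t in range(1, T):
        new_delta, ptr = [], []
        for s in range(S):
            best_prev = max(range(S), key=lambda p: delta[p] + trans_logprob[p][s])
            new_delta.append(delta[best_prev] + trans_logprob[best_prev][s]
                             + obs_loglik[t][s])
            ptr.append(best_prev)
        delta = new_delta
        backptrs.append(ptr)
    # Trace back the best path from the best final state.
    path = [max(range(S), key=lambda s: delta[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Toy example with two states and three observations (log domain).
obs = [[-0.1, -2.3], [-2.3, -0.1], [-0.1, -2.3]]
trans = [[-0.2, -1.6], [-1.6, -0.2]]
init = [-0.7, -0.7]
print(viterbi(obs, trans, init))  # -> [0, 0, 0]: switching states costs too much here
```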

 

Citation

Benjamin Laxton, Jongwoo Lim, and David Kriegman. Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video. In the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.

Paper

Paper (PDF).

Overview Video

Microsoft MPEG4 (19.5 MB) with audio