Abstract
Despite recent progress in video understanding, extracting relations among actors in a video remains largely unexplored. In this chapter, we review one of the first studies towards learning such relations from videos using visual and auditory cues. The main contribution is to associate low-level video features with social relations through machine learning. Specifically, support vector regression is used to estimate local grouping cues from low-level visual and auditory features. These locally defined grouping cues are then combined to derive the affinity between actors. Finally, the social network defined by the resulting affinity is analyzed to find communities of actors and identify the leader of each community. As an extension to the basic framework, we also discuss the relationship between visual concepts and social relations. We demonstrate the performance of these approaches on a set of videos.
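The last two stages described above (cues to affinity, then affinity to communities and leaders) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the chapter: the per-scene grouping cues are taken as already estimated (the support vector regression step is omitted), affinity is a plain average of cues over shared scenes, communities are connected components of a thresholded affinity graph, and the leader is the member with the largest summed affinity. The function names, the threshold, and the toy data are all hypothetical stand-ins, not the authors' method.

```python
from collections import defaultdict
from itertools import combinations


def build_affinity(scenes):
    """Average the per-scene grouping cue over every scene a pair of actors shares."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for actors, cue in scenes:
        for a, b in combinations(sorted(actors), 2):
            totals[(a, b)] += cue
            counts[(a, b)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


def find_communities(actors, affinity, threshold=0.5):
    """Connected components of the graph keeping only pairs with affinity >= threshold."""
    neighbours = {a: set() for a in actors}
    for (a, b), value in affinity.items():
        if value >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, communities = set(), []
    for start in sorted(actors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the unvisited actor
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        communities.append(component)
    return communities


def find_leader(community, affinity):
    """Pick the member with the largest summed affinity to the rest of its community."""
    strength = {a: 0.0 for a in community}
    for (a, b), value in affinity.items():
        if a in community and b in community:
            strength[a] += value
            strength[b] += value
    return max(sorted(community), key=lambda a: strength[a])


# Hypothetical toy data: (actors present in the scene, grouping cue in [0, 1]),
# where a higher cue is assumed to mean the scene suggests the actors belong together.
scenes = [
    ({"A", "B"}, 0.9),
    ({"A", "B"}, 0.7),
    ({"A", "C"}, 0.8),
    ({"D", "E"}, 0.9),
    ({"D", "F"}, 0.7),
    ({"A", "D"}, 0.1),
    ({"C", "E"}, 0.2),
]

actors = set().union(*(present for present, _ in scenes))
affinity = build_affinity(scenes)
communities = find_communities(actors, affinity)
leaders = [find_leader(c, affinity) for c in communities]
print(communities, leaders)
```

On this toy input the two low-cue scenes fall below the threshold, so the actors split into two groups, and the actor who co-occurs most strongly with the others in each group is picked as its leader. A real system would replace each stage with the learned or graph-theoretic counterpart the chapter describes.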
Notes
1. The movies in our dataset are (1) G.I. Joe: The Rise of Cobra (2009); (2) Harry Potter and the Half-Blood Prince (2009); (3) Public Enemies (2009); (4) Troy (2004); (5) Braveheart (1995); (6) Year One (2009); (7) Coraline (2009); (8) True Lies (1994); (9) The Chronicles of Narnia: The Lion, the Witch and the Wardrobe (2005); and (10) The Lord of the Rings: The Return of the King (2003).
2. In movie (10), Gollum has a good personality except when he is close to the ring. The ring changes the behavior of all actors except Frodo from good to bad.
3. Ground truth leaders are: (1) Duke and McCullen; (2) Harry and Snape; (3) Dillinger and Purvis; (4) Achilles and Hector; (5) Wallace and Longshanks; (6) Zed and King; (7) Coraline and Other Mother; (8) Harry and Salim; (9) Aslan and Witch; and (10) Frodo and Witch-king.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ding, L., Yilmaz, A. (2014). Learning Social Relations from Videos: Features, Models, and Analytics. In: Fu, Y. (eds) Human-Centered Social Media Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-05491-9_2
DOI: https://doi.org/10.1007/978-3-319-05491-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05490-2
Online ISBN: 978-3-319-05491-9
eBook Packages: Computer Science (R0)