Learning Social Relations from Videos: Features, Models, and Analytics

Ding, Lei; Yilmaz, Alper

doi:10.1007/978-3-319-05491-9_2

Abstract

Despite recent progress in video understanding, extracting relations among actors in a video remains largely unexplored. In this chapter, we review one of the first studies on learning such relations from videos using visual and auditory cues. The main contribution is the association of low-level video features with social relations by means of machine learning. Specifically, support vector regression is used to estimate local grouping cues from low-level visual and auditory features. These locally defined grouping cues are then combined to derive the affinity between actors. Finally, the social network defined by the resulting affinity is analyzed to find communities of actors and to identify the leader of each community. As an extension to the basic framework, we also discuss the relationship between visual concepts and social relations. We demonstrate the performance of these approaches on a set of videos.
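
The final analysis stage described above (communities, then one leader per community) can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not name the algorithms, so a two-way split by the leading eigenvector of the modularity matrix and eigenvector centrality are assumed here as stand-ins. The regression stage is skipped entirely, and the affinity values, like the function names `leading_eigenvector`, `split_communities`, and `find_leader`, are made up for illustration.

```python
def leading_eigenvector(matrix, iters=500):
    """Eigenvector for the largest (algebraic) eigenvalue of a symmetric matrix.

    Runs power iteration on matrix + c*I, where c is a Gershgorin bound, so
    every shifted eigenvalue is non-negative and the largest one dominates.
    """
    n = len(matrix)
    shift = max(sum(abs(x) for x in row) for row in matrix)
    # Non-uniform start: the all-ones vector is an eigenvector of the
    # modularity matrix (eigenvalue 0) and would never leave that direction.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) + shift * vec[i]
               for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


def split_communities(affinity):
    """Two-way split by the sign of the leading modularity eigenvector.

    Assumes a symmetric affinity matrix with at least one non-zero entry.
    """
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    total = sum(degree)  # equals 2m for a symmetric affinity matrix
    modularity = [[affinity[i][j] - degree[i] * degree[j] / total
                   for j in range(n)] for i in range(n)]
    vec = leading_eigenvector(modularity)
    positive = [i for i in range(n) if vec[i] >= 0]
    negative = [i for i in range(n) if vec[i] < 0]
    return [group for group in (positive, negative) if group]


def find_leader(affinity, members):
    """Member with the highest eigenvector centrality inside its community."""
    sub = [[affinity[i][j] for j in members] for i in members]
    vec = leading_eigenvector(sub)
    scores = [abs(x) for x in vec]  # entries of the top eigenvector share a sign
    return members[scores.index(max(scores))]


# Made-up affinities for six actors: two tight triangles joined by one weak tie.
affinity = [
    [0.0, 0.9, 0.8, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.8, 0.5, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.9, 0.8],
    [0.0, 0.0, 0.0, 0.9, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.8, 0.5, 0.0],
]
communities = split_communities(affinity)
leaders = [find_leader(affinity, members) for members in communities]
```

On this toy input the split separates actors 0 to 2 from actors 3 to 5, and the most strongly connected actor in each group (0 and 3) is picked as leader. Real affinities would come from the learned grouping cues, and more than two communities would require applying the split recursively or using a different method.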

Notes

1. The movies in our dataset are (1) G.I. Joe: The Rise of Cobra (2009); (2) Harry Potter and the Half-Blood Prince (2009); (3) Public Enemies (2009); (4) Troy (2004); (5) Braveheart (1995); (6) Year One (2009); (7) Coraline (2009); (8) True Lies (1994); (9) The Chronicles of Narnia: The Lion, the Witch and the Wardrobe (2005); and (10) The Lord of the Rings: The Return of the King (2003).
2. In movie (10), Gollum has a good personality except when he is close to the ring. The ring turns the behavior of every actor except Frodo from good to bad.
3. The ground-truth leaders are: (1) Duke and McCullen; (2) Harry and Snape; (3) Dillinger and Purvis; (4) Achilles and Hector; (5) Wallace and Longshanks; (6) Zed and King; (7) Coraline and Other Mother; (8) Harry and Salim; (9) Aslan and Witch; and (10) Frodo and Witch-king.

Author information

Authors and Affiliations

The Ohio State University, Boston, MA, 02110, USA
Lei Ding
The Ohio State University, Columbus, OH, 43210, USA
Alper Yilmaz

Corresponding author

Correspondence to Lei Ding.

Editor information

Editors and Affiliations

Dept. of ECE, College of Engineering, Northeastern University, Boston, Massachusetts, USA
Yun Fu

About this chapter

Cite this chapter

Ding, L., Yilmaz, A. (2014). Learning Social Relations from Videos: Features, Models, and Analytics. In: Fu, Y. (eds) Human-Centered Social Media Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-05491-9_2

DOI: https://doi.org/10.1007/978-3-319-05491-9_2
Published: 25 March 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05490-2
Online ISBN: 978-3-319-05491-9
eBook Packages: Computer Science, Computer Science (R0)
