
Contrastive disentanglement for self-supervised motion style transfer

Published in: Multimedia Tools and Applications

Abstract

Motion style transfer, which aims to transfer the style of a source motion onto a target motion while preserving the target's content, has recently gained considerable attention. Existing works have shown promising results but require labeled data for supervised training, which limits their applicability. In this paper, we present a novel self-supervised learning method for motion style transfer. Specifically, we cast the problem into a contrastive learning framework that disentangles the human motion representation into a content code and a style code; the result is generated by compositing the style code of the source motion with the content code of the target motion. To encourage better code disentanglement and composition, we investigate the InfoNCE loss and the triplet loss in a self-supervised manner. The framework aims to generate plausible motions while guaranteeing the disentanglement of the latent codes. Comprehensive experiments on benchmark datasets demonstrate that our method outperforms state-of-the-art approaches.
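To make the code-composition idea concrete, the sketch below shows one way a content/style disentanglement with an InfoNCE objective could be wired up; it is a minimal illustration, not the authors' implementation. The convolutional encoders, feature dimensions, temperature value, and the two-crop construction of positive style pairs are all illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch (not the paper's code): content/style disentanglement
# with an InfoNCE objective, assuming PyTorch and motion clips represented as
# tensors of shape (batch, joint_features, frames).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Temporal convolutional encoder that pools a motion clip into one code."""
    def __init__(self, in_channels: int, code_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(128, code_dim, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool1d(1),  # average over the time axis
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # (batch, code_dim)


def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE: pull the anchor toward its positive, push it from negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / temperature
    neg_logits = anchor @ negatives.t() / temperature
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)  # positive at index 0
    return F.cross_entropy(logits, labels)


# Hypothetical usage: transfer the style of `source` onto the content of `target`.
channels, frames, code_dim = 69, 120, 64
content_enc, style_enc = Encoder(channels, code_dim), Encoder(channels, code_dim)
decoder = nn.Sequential(nn.Linear(2 * code_dim, channels * frames))

source, target = torch.randn(8, channels, frames), torch.randn(8, channels, frames)
stylised = decoder(torch.cat([content_enc(target), style_enc(source)], dim=-1))
stylised = stylised.view(8, channels, frames)

# Self-supervised style contrast: two crops of the same clip share a style
# (positive pair); style codes from other clips in the batch act as negatives.
loss = info_nce(style_enc(source[:, :, :60]), style_enc(source[:, :, 60:]),
                style_enc(target))
```

A triplet loss could be swapped in for `info_nce` with the same anchor/positive/negative construction; the point of the sketch is only how composing a content code with a foreign style code yields the stylised output.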



Data Availability

The datasets and source code of Refs. [59], [4], and [44] can be found at the following locations: Ref. [59]: http://mocap.cs.cmu.edu/; Ref. [4]: https://github.com/DeepMotionEditing/deep-motion-editing; Ref. [44]: https://github.com/tianxintao/Online-Motion-Style-Transfer

References

1. Tenenbaum JB, Freeman WT (1996) Separating style and content. In: Mozer M, Jordan MI, Petsche T (eds) NIPS, pp 662–668. MIT Press

2. Holden D, Habibie I, Kusajima I, Komura T (2017) Fast neural style transfer for motion data. IEEE Comput Graph Appl 37(4):42–49

3. Holden D, Saito J, Komura T, Joyce T (2015) Learning motion manifolds with convolutional autoencoders. In: SIGGRAPH Asia, pp 18:1–18:4. ACM

4. Aberman K, Weng Y, Lischinski D, Cohen-Or D, Chen B (2020) Unpaired motion style transfer from video to animation. ACM Trans Graph 39(4):64

5. Pan J, Sun H, Kong Y (2021) Fast human motion transfer based on a meta network. Inf Sci 547:367–383

6. Wang W, Xu J, Zhang L, Wang Y, Liu J (2020) Consistent video style transfer via compound regularization. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI, pp 12233–12240. AAAI Press

7. Park SS, Jang D-K, Lee S-H (2021) Diverse motion stylization for multiple style domains via spatial-temporal graph-based generative model. Proc ACM Comput Graph Interact Tech 4:1–17

8. Jang D-K, Park SS, Lee S-H (2022) Motion puzzle: arbitrary motion style transfer by body part. ACM Trans Graph 41:1–16

9. Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI Conference on Artificial Intelligence

10. Kotovenko D, Sanakoyeu A, Lang S, Ommer B (2019) Content and style disentanglement for artistic style transfer. In: IEEE/CVF International Conference on Computer Vision, ICCV, pp 4421–4430. IEEE

11. Li Y, Li Y, Lu J, Shechtman E, Lee YJ, Singh KK (2022) Contrastive learning for diverse disentangled foreground generation. In: Computer Vision - ECCV. Lecture Notes in Computer Science, vol 13676, pp 334–351. Springer

12. Bengio Y, Courville AC, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828

13. Kovar L, Gleicher M, Pighin FH (2002) Motion graphs. ACM Trans Graph 21(3):473–482

14. Min J, Chai J (2012) Motion graphs++: a compact generative model for semantic motion analysis and synthesis. ACM Trans Graph 31(6):153:1–153:12

15. Safonova A, Hodgins JK (2007) Construction and optimal search of interpolated motion graphs. ACM Trans Graph 26(3):106

16. Shapiro A, Cao Y, Faloutsos P (2006) Style components. In: Gutwin C, Mann S (eds) Graphics Interface, pp 33–39

17. Grochow K, Martin SL, Hertzmann A, Popovic Z (2004) Style-based inverse kinematics. ACM Trans Graph 23(3):522–531

18. Wang JM, Fleet DJ, Hertzmann A (2008) Gaussian process dynamical models for human motion. IEEE Trans Pattern Anal Mach Intell 30(2):283–298

19. Ukita N, Kanade T (2012) Gaussian process motion graph models for smooth transitions among multiple actions. Comput Vis Image Underst 116(4):500–509

20. Zhou L, Shang L, Shum HPH, Leung H (2014) Human motion variation synthesis with multivariate Gaussian processes. Comput Animat Virtual Worlds 25(3–4):303–311

21. Lau M, Bar-Joseph Z, Kuffner J (2009) Modeling spatial and temporal variation in motion data. ACM Trans Graph 28(5):171

22. Young JE, Igarashi T, Sharlin E (2008) Puppet master: designing reactive character behavior by demonstration. In: Gross MH, James DL (eds) Eurographics/ACM SIGGRAPH Symposium on Computer Animation, SCA, pp 183–191. Eurographics Association

23. Levine S, Wang JM, Haraux A, Popovic Z, Koltun V (2012) Continuous character control with low-dimensional embeddings. ACM Trans Graph 31(4):28:1–28:10

24. Ma W, Xia S, Hodgins JK, Yang X, Li C, Wang Z (2010) Modeling style and variation in human motion. In: Popovic Z, Otaduy MA (eds) Eurographics/ACM SIGGRAPH Symposium on Computer Animation, pp 21–30

25. Zheng Q, Wu W, Pan H, Mitra NJ, Cohen-Or D, Huang H (2021) Inferring object properties from human interaction and transferring them to new motions. Comput Vis Media 7(3):375–392

26. Zhou Y, Li Z, Xiao S, He C, Huang Z, Li H (2018) Auto-conditioned recurrent networks for extended complex human motion synthesis. In: International Conference on Learning Representations, ICLR. OpenReview.net

27. Martinez J, Black MJ, Romero J (2017) On human motion prediction using recurrent neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp 4674–4683. IEEE Computer Society

28. Jain A, Zamir AR, Savarese S, Saxena A (2016) Structural-RNN: deep learning on spatio-temporal graphs. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp 5308–5317

29. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville AC, Bengio Y (2014) Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp 2672–2680

30. Sadoughi N, Busso C (2018) Novel realizations of speech-driven head movements with generative adversarial networks. In: ICASSP, pp 6169–6173. IEEE

31. Starke S, Zhao Y, Komura T, Zaman KA (2020) Local motion phases for learning multi-contact character movements. ACM Trans Graph 39(4):54

32. Wang Z, Chai J, Xia S (2021) Combining recurrent neural networks and adversarial training for human motion synthesis and control. IEEE Trans Vis Comput Graph 27(1):14–28

33. Rose C, Cohen MF, Bodenheimer B (1998) Verbs and adverbs: multidimensional motion interpolation. IEEE Comput Graph Appl 18(5):32–40

34. Hoyet L, Ryall K, Zibrek K, Park H, Lee J, Hodgins JK, O'Sullivan C (2013) Evaluating the distinctiveness and attractiveness of human motions on realistic virtual bodies. ACM Trans Graph 32(6):204:1–204:11

35. Kiiski H, Hoyet L, Cullen B, O'Sullivan C, Newell FN (2013) Perception and prediction of social intentions from human body motion. In: ACM Symposium on Applied Perception, p 134. ACM

36. Smith HJ, Neff M (2017) Understanding the impact of animated gesture performance on personality perceptions. ACM Trans Graph 36(4):49:1–49:12

37. Torresani L, Hackney P, Bregler C (2006) Learning motion style synthesis from perceptual observations. In: Schölkopf B, Platt JC, Hofmann T (eds) Neural Information Processing Systems, pp 1393–1400

38. Kim HJ, Lee S (2019) Perceptual characteristics by motion style category. In: Cignoni P, Miguel E (eds) Annual Conference of the European Association for Computer Graphics, pp 1–4

39. Hsu E, Pulli K, Popovic J (2005) Style translation for human motion. ACM Trans Graph 24(3):1082–1089

40. Ikemoto L, Arikan O, Forsyth DA (2009) Generalizing motion edits with Gaussian processes. ACM Trans Graph 28(1):1:1–1:12

41. Jing Y, Yang Y, Feng Z, Ye J, Yu Y, Song M (2020) Neural style transfer: a review. IEEE Trans Vis Comput Graph 26(11):3365–3385

42. Holden D, Saito J, Komura T (2016) A deep learning framework for character motion synthesis and editing. ACM Trans Graph 35(4):138:1–138:11

43. Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp 2414–2423. IEEE Computer Society

44. Smith HJ, Cao C, Neff M, Wang Y (2019) Efficient neural networks for real-time motion style transfer. Proc ACM Comput Graph Interact Tech 2(2):13:1–13:17

45. Xu J, Xu H, Ni B, Yang X, Wang X, Darrell T (2020) Hierarchical style-based networks for motion synthesis. In: ECCV. Lecture Notes in Computer Science, vol 12356, pp 178–194. Springer

46. Tao T, Zhan X, Chen Z, van de Panne M (2022) Style-ERD: responsive and coherent online motion style transfer. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, pp 6583–6593

47. Wen Y-H, Yang Z, Fu H, Gao L, Sun Y, Liu Y-J (2021) Autoregressive stylized motion synthesis with generative flow. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, pp 13607–13607

48. Locatello F, Bauer S, Lucic M, Rätsch G, Gelly S, Schölkopf B, Bachem O (2019) Challenging common assumptions in the unsupervised learning of disentangled representations. In: Chaudhuri K, Salakhutdinov R (eds) ICML. Proceedings of Machine Learning Research, vol 97, pp 4114–4124. PMLR

49. Xue Y, Guo Y, Zhang H, Xu T, Zhang S, Huang X (2022) Deep image synthesis from intuitive user input: a review and perspectives. Comput Vis Media 8(1):3–31

50. Liu Y, Wei F, Shao J, Sheng L, Yan J, Wang X (2018) Exploring disentangled feature representation beyond face identification. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp 2080–2089. IEEE Computer Society

51. Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, Lerchner A (2017) beta-VAE: learning basic visual concepts with a constrained variational framework. In: International Conference on Learning Representations, ICLR. OpenReview.net

52. Kim H, Mnih A (2018) Disentangling by factorising. In: Dy JG, Krause A (eds) ICML, vol 80, pp 2654–2663

53. Kumar A, Sattigeri P, Balakrishnan A (2017) Variational inference of disentangled latent concepts from unlabeled observations. CoRR abs/1711.00848

54. Villegas R, Yang J, Hong S, Lin X, Lee H (2017) Decomposing motion and content for natural video sequence prediction. In: 5th International Conference on Learning Representations, ICLR. OpenReview.net

55. Denton EL, Birodkar V (2017) Unsupervised learning of disentangled representations from video. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in Neural Information Processing Systems, pp 4414–4423

56. van den Oord A, Li Y, Vinyals O (2018) Representation learning with contrastive predictive coding. CoRR abs/1807.03748

57. He K, Fan H, Wu Y, Xie S, Girshick RB (2020) Momentum contrast for unsupervised visual representation learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, pp 9726–9735. IEEE

58. Wang X, Gupta A (2015) Unsupervised learning of visual representations using videos. In: IEEE International Conference on Computer Vision, ICCV, pp 2794–2802. IEEE Computer Society

59. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, pp 539–546

60. Zhang Y, Tang F, Dong W, Huang H, Ma C, Lee T, Xu C (2022) Domain enhanced arbitrary image style transfer via contrastive learning. In: Nandigjav M, Mitra NJ, Hertzmann A (eds) SIGGRAPH '22, pp 12:1–12:8. ACM

61. Hénaff OJ (2020) Data-efficient image recognition with contrastive predictive coding. In: ICML, vol 119, pp 4182–4192. PMLR

62. CMU (2019) CMU graphics lab motion capture database. http://mocap.cs.cmu.edu/

63. Xia S, Wang C, Chai J, Hodgins JK (2015) Realtime style transfer for unlabeled heterogeneous human motion. ACM Trans Graph 34(4):119:1–119:10

64. Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NIPS

65. Binkowski M, Sutherland DJ, Arbel M, Gretton A (2018) Demystifying MMD GANs. In: 6th International Conference on Learning Representations, ICLR


Author information


Contributions

Zizhao Wu conceived the presented idea and developed the theory and algorithm. Siyuan Mao and Cheng Zhang carried out the experiments. Zizhao Wu wrote the manuscript with support from Yigang Wang and Ming Zeng.

Corresponding author

Correspondence to Zizhao Wu.

Ethics declarations

Conflicts of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, Z., Mao, S., Zhang, C. et al. Contrastive disentanglement for self-supervised motion style transfer. Multimed Tools Appl 83, 70523–70544 (2024). https://doi.org/10.1007/s11042-024-18238-4



  • DOI: https://doi.org/10.1007/s11042-024-18238-4
