Abstract
In this paper we propose a new bidirectional invariant motion descriptor for a rigid body. The proposed invariant representation is unaffected by rotations, translations, time, and linear and angular scaling. These invariance properties make it possible to recognize gestures in realistic scenarios with unexpected variations (e.g., changes in the user’s initial pose, execution time, or observation point), whereas Cartesian trajectories are sensitive to such changes. The proposed representation also allows reconstruction of the original motion trajectory, which is useful for human-robot interaction applications where a robot recognizes human actions and executes its own behaviors using the same descriptors. By removing the dependency on the absolute pose and on the scaling factors of the Cartesian trajectories, the proposed descriptor gains the flexibility to generate different motion instances from the same invariant representation. To illustrate its effectiveness in motion recognition and generation, the proposed descriptor is evaluated on three datasets and in experiments on a NAO humanoid robot and a KUKA LWR IV\(+\) manipulator, and it is compared with other existing invariant representations.
Notes
This work is based on our preliminary results presented in Soloperto et al. (2015). Our previous work has been extended in several ways: (i) we provide more theoretical insights including a compact closed form of DHB invariants; (ii) we theoretically compare DHB with existing invariant representations, in order to underline differences and similarities; (iii) we compare the recognition performance of DHB invariants with several state-of-the-art approaches; (iv) we report several experiments to show that DHB invariants can be adopted as flexible motion descriptors to execute complex tasks.
In the discrete time case, the integral \(\int _{t=0}^{t_f} {|\bullet |}\) in (42) is replaced by \(\sum _{t=0}^{t_f} {|\bullet |}\).
For simplicity, the acronym of the author name (DS) is used to refer to the representation in De Schutter (2010).
Time dependencies are omitted to simplify the notation.
The result is obtained from Algorithm 1 by neglecting the summation and subtraction operations.
A smaller sampling time generates more twist samples and more invariant values. Hence, more products have to be computed in (39) to reconstruct the motion, which increases errors due to the finite precision.
Available on-line: creativedistraction.com/downloads/gesture.zip.
www.xsens.com/products/xsens-mvn.
Available on-line: research.microsoft.com/en-us/um/people/zliu/actionrecorsrc.
There exist 3 invariants to represent the translational motions of the MSR Action3D dataset.
www.aldebaran.com/en/cool-robots/nao.
For example, for full-body motions of a human/humanoid, the body height serves as the reference. For hand motions, the length of the arm/manipulator is useful.
References
Billard, A., Calinon, S., Dillmann, R., & Schaal, S. (2008). Robot programming by demonstration. In O. Khatib & B. Siciliano (Eds.), Springer handbook of robotics (pp. 1371–1394). Berlin: Springer.
Bishop, C. M., et al. (2006). Pattern recognition and machine learning. New York: Springer.
Black, M., & Jepson, D. (1998). A probabilistic framework for matching temporal trajectories: Condensation-based recognition of gestures and expressions. European conference on computer vision, Lecture notes in computer science (Vol. 1406, pp. 909–924). Berlin: Springer.
Burger, B., Ferrané, I., Lerasle, F., & Infantes, G. (2011). Two-handed gesture recognition and fusion with speech to command a robot. Autonomous Robots, 32(2), 129–147.
Chartrand, R. (2011). Numerical differentiation of noisy, nonsmooth data. ISRN Applied Mathematics, 2011, 1–12.
De Schutter, J. (2010). Invariant description of rigid body motion trajectories. Journal of Mechanisms and Robotics, 2(1), 1–9.
De Schutter, J., Di Lello, E., De Schutter, J., Matthysen, R., Benoit, T., & De Laet, T. (2011). Recognition of 6 dof rigid body motion trajectories using a coordinate-free representation. In International conference on robotics and automation (pp. 2071–2078).
Denavit, J., & Hartenberg, R. S. (1955). A kinematic notation for lower-pair mechanisms based on matrices. Transactions of the ASME Journal of Applied Mechanics, 22(2), 215–221.
Dieleman, S., De Fauw, J., & Kavukcuoglu, K. (2016). Exploiting cyclic symmetry in convolutional neural networks. In International conference on machine learning.
Hu, K., & Lee, D. (2012). Biped locomotion primitive learning, control and prediction from human data. In 10th International IFAC symposium on robot control (SYROCO).
Hu, K., Ott, C., & Lee, D. (2014). Online human walking imitation in task and joint space based on quadratic programming. In IEEE international conference on robotics and automation (pp. 3458–3464). IEEE.
Isard, M., & Blake, A. (1996). Contour tracking by stochastic propagation of conditional density. In European conference on computer vision (pp. 343–356).
Jaderberg, M., Simonyan, K., Zisserman, A., & Kavukcuoglu, K. (2015). Spatial transformer networks. In Advances in neural information processing systems (pp. 2017–2025).
Koppula, H. S., Gupta, R., & Saxena, A. (2013). Learning human activities and object affordances from rgb-d videos. International Journal of Robotic Research, 32, 951–970.
Kühnel, W. (2006). Differential geometry: Curves-surfaces-manifolds. Providence: American Mathematical Society.
LeCun, Y. (2012). Learning invariant feature hierarchies. In European conference on computer vision (pp. 496–505).
Lee, D., & Nakamura, Y. (2010). Mimesis model from partial observations for a humanoid robot. International Journal of Robotics Research, 29(1), 60–80.
Lee, D., Ott, C., & Nakamura, Y. (2009). Mimetic communication with impedance control for physical human–robot interaction. In IEEE international conference on robotics and automation (pp. 1535–1542).
Li, W., Zhang, Z., & Liu, Z. (2010). Action recognition based on a bag of 3d points. In Conference on computer vision and pattern recognition workshops (pp. 9–14).
Magnanimo, V., Saveriano, M., Rossi, S., & Lee, D. (2014). A Bayesian approach for task recognition and future human activity prediction. In International symposium on robot and human interactive communication (pp. 726–731).
Murray, R. M., Sastry, S. S., & Zexiang, L. (1994). A mathematical introduction to robotic manipulation (1st ed.). Boca Raton: CRC Press.
Perona, P., & Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. Transactions on Pattern Analysis and Machine Intelligence, 12(7), 629–639.
Piao, Y., Hayakawa, K., & Sato, J. (2002). Space-time invariants and video motion extraction from arbitrary viewpoints. In International conference on pattern recognition (pp. 56–59).
Piao, Y., Hayakawa, K., & Sato, J. (2004). Space-time invariants for recognizing 3d motions from arbitrary viewpoints under perspective projection. In International conference on image and graphics (pp. 200–203).
Psarrou, A., Gong, S., & Walter, M. (2002). Recognition of human gestures and behaviour based on motion trajectories. Image and Vision Computing, 20(5–6), 349–358.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Rao, C., Yilmaz, A., & Shah, M. (2002). View-invariant representation and recognition of actions. International Journal of Computer Vision, 50(2), 203–226.
Rao, C., Shah, M., & Syeda-Mahmood, T. (2003). Action recognition based on view invariant spatio-temporal analysis. In ACM multimedia.
Rauch, H. E., Striebel, C. T., & Tung, F. (1965). Maximum likelihood estimates of linear dynamic systems. Journal of the American Institute of Aeronautics and Astronautics, 3(8), 1445–1450.
Sakoe, H., & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43–49.
Sanguansat, P. (2012). Multiple multidimensional sequence alignment using generalized dynamic time warping. WSEAS Transactions on Mathematics, 11(8), 668–678.
Saveriano, M., & Lee, D. (2013). Invariant representation for user independent motion recognition. In International symposium on robot and human interactive communication (pp. 650–655).
Saveriano, M., An, S., & Lee, D. (2015). Incremental kinesthetic teaching of end-effector and null-space motion primitives. In International conference on robotics and automation (pp. 3570–3575).
Schreiber, G., Stemmer, A., & Bischoff, R. (2010). The fast research interface for the kuka lightweight robot. In ICRA workshop on innovative robot control architectures for demanding (Research) applications (pp. 15–21).
Siciliano, B., Sciavicco, L., Villani, L., & Oriolo, G. (2009). Robotics-modelling, planning and control. Berlin: Springer.
Soloperto, R., Saveriano, M., & Lee, D. (2015). A bidirectional invariant representation of motion for gesture recognition and reproduction. In International conference on robotics and automation (pp. 6146–6152).
Vochten, M., De Laet, T., & De Schutter, J. (2015). Comparison of rigid body motion trajectory descriptors for motion representation and recognition. In International conference on robotics and automation (pp. 3010–3017).
Waldherr, S., Romero, R., & Thrun, S. (2000). A gesture based interface for human–robot interaction. Autonomous Robots, 9(2), 151–173.
Wang, J., Liu, Z., Wu, Y., & Yuan, J. (2012). Mining actionlet ensemble for action recognition with depth cameras. In Conference on computer vision and pattern recognition (pp. 1290–1297).
Wang, P., Li, W., Gao, Z., Tang, C., Zhang, J., & Ogunbona, P. (2015). Convnets-based action recognition from depth maps through virtual cameras and pseudocoloring. In Proceedings of the 23rd ACM international conference on Multimedia (pp. 1119–1122).
Weiss, I. (1993). Geometric invariants and object recognition. International Journal of Computer Vision, 10(3), 207–231.
Wu, S., & Li, Y. F. (2008). On signature invariants for effective motion trajectory recognition. International Journal of Robotic Research, 27(8), 895–917.
Wu, S., & Li, Y. F. (2010). Motion trajectory reproduction from generalized signature description. Pattern Recognition, 43(1), 204–221.
Wu, Y., & Huang, T. S. (2001). Vision-based gesture recognition: A review. In Gesture-based communication in human–computer interaction, lecture notes in computer science (pp. 103–115). Berlin: Springer.
Xia, L., Chen, C. C., & Aggarwal, J. K. (2012). View invariant human action recognition using histograms of 3d joints. In Conference on computer vision and pattern recognition workshops (pp. 20–27).
Yan, P., Khan, S. M., & Shah, M. (2008). Learning 4d action feature models for arbitrary view action recognition. In International conference on computer vision and pattern recognition (pp. 1–7).
Zisserman, A., & Maybank, S. (1994). A case against epipolar geometry. In Applications of invariance in computer vision, lecture notes in computer science (Vol. 825, pp. 69–88). Berlin: Springer.
Acknowledgements
This work has been supported by the Technical University of Munich, International Graduate School of Science and Engineering.
Appendices
Appendix A: Rigid Body Motion Representation
To represent rigid body motions it is convenient to attach an orthogonal frame to the rigid body (body frame) and to describe the pose (position and orientation) of the body frame with respect to a fixed frame (world frame). At each time instant the position of the rigid body is represented by the vector \(\mathbf {p}\) from the origin of the world frame to the origin of the body frame. The axes of the body frame can be projected onto the axes of the world frame by means of the direction cosines. Hence, the orientation of the rigid body is described by collecting the direction cosines into a \(3 \times 3\) rotation matrix \(\mathbf {R}\). It can be shown that a minimal representation of the orientation consists of 3 values (Siciliano et al. 2009). In this work, we use the rotation vector to represent the orientation.
The rotation vector \(\mathbf {r} = \theta \hat{\mathbf{r}}\) is computed from \(\mathbf {R}\) as:
$$\theta = \arccos \left( \frac{\mathrm{tr}(\mathbf {R}) - 1}{2}\right) , \qquad \hat{\mathbf{r}} = \frac{1}{2\sin \theta }\begin{bmatrix} R_{32} - R_{23} \\ R_{13} - R_{31} \\ R_{21} - R_{12} \end{bmatrix},$$
where \(R_{ij}\) denotes the \((i,j)\) entry of \(\mathbf {R}\).
The rotation matrix \(\mathbf {R}\) is computed from \(\mathbf {r}\) as:
$$\mathbf {R} = \mathbf {I}_{3} + \frac{\sin \theta }{\theta }\,\mathbf {S}(\mathbf {r}) + \frac{1-\cos \theta }{\theta ^{2}}\,\mathbf {S}(\mathbf {r})^{2},$$
with \(\theta = \Vert \mathbf {r}\Vert \) and \(\mathbf {I}_{3}\) the \(3 \times 3\) identity matrix,
where the skew-symmetric matrix \(\mathbf {S}(\mathbf {r})\) is given by:
$$\mathbf {S}(\mathbf {r}) = \begin{bmatrix} 0 & -r_{z} & r_{y} \\ r_{z} & 0 & -r_{x} \\ -r_{y} & r_{x} & 0 \end{bmatrix}.$$
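For reference, the conversions above can be sketched in a few lines of Python. This is our own illustrative code, not the paper's implementation; the function names and the small-angle handling are assumptions.

```python
import numpy as np

def skew(r):
    """Skew-symmetric matrix S(r) such that S(r) @ v = r x v."""
    return np.array([[0.0, -r[2], r[1]],
                     [r[2], 0.0, -r[0]],
                     [-r[1], r[0], 0.0]])

def rotation_matrix_to_vector(R):
    """Rotation vector r = theta * r_hat extracted from a rotation matrix R."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):          # no rotation: the axis is undefined
        return np.zeros(3)
    r_hat = np.array([R[2, 1] - R[1, 2],
                      R[0, 2] - R[2, 0],
                      R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * r_hat

def rotation_vector_to_matrix(r):
    """Rodrigues' formula: R = I + sin(t)/t S(r) + (1 - cos(t))/t^2 S(r)^2."""
    theta = np.linalg.norm(r)
    if np.isclose(theta, 0.0):
        return np.eye(3)
    S = skew(r)
    return np.eye(3) + (np.sin(theta) / theta) * S \
        + ((1.0 - np.cos(theta)) / theta**2) * (S @ S)

# Round-trip check on a random rotation (theta < pi to avoid the singular case).
rng = np.random.default_rng(0)
r = rng.standard_normal(3)
r = 0.9 * np.pi * r / np.linalg.norm(r)
R = rotation_vector_to_matrix(r)
assert np.allclose(rotation_matrix_to_vector(R), r)
```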
Appendix B: Proofs of the relationships in Sect. 7.1
1. \(m_{\omega } = d_{\omega }^1\) derives from (20) and \(d_{\omega }^1\) in Table 2.
2. \(\theta _{\omega }^{1} \approx d_{\omega }^{2} \Delta t\). For \(\Delta t \longrightarrow 0\), we can neglect the arc tangent in (28). Hence, we can rewrite \(\theta _{\omega }^{1}\) in (23) as:
3. \(\theta _{\omega }^{2} \approx d_{\omega }^{3}\Delta t\). Recall that \(\mathbf {a} \times \mathbf {b} = -\mathbf {b} \times \mathbf {a}\) and that \(\mathbf {a}\cdot (\mathbf {b} \times \mathbf {c}) = \mathbf {c}\cdot (\mathbf {a} \times \mathbf {b})\). \(\theta _{\omega }^{2}\) in (23) can be re-written as:
The denominator of (44) can be re-written as:
Considering that \(\ddot{\mathbf {a}}_t \approx (\mathbf {a}_{t+2} + \mathbf {a}_t)/\Delta t^2\), the numerator of (44) can be re-written as:
Finally, combining (45), (46) and (44), and neglecting the arc tangent, we obtain that \(\theta _{\omega }^{2} \approx d_{\omega }^{3}\Delta t\) for \(\Delta t \longrightarrow 0\).
Appendix C: Proofs of the relationships in Sect. 7.2
1. \(m_{v} = e_{v}^1\) derives from (19) and \(e_{v}^1\) in Table 2. \(m_{\omega } = e_{\omega }^1\) derives from (20) and \(e_{\omega }^1\) in Table 2.
2. \(\theta _{\omega }^{1} \approx e_{\omega }^2 \Delta t\) derives from (43) recalling that \(e_{\omega }^2 = d_{\omega }^2\). \(\theta _{v}^{1} \approx e_{v}^2 \Delta t\) can be proven by following similar steps as in (43) and considering \(e_{v}^2\) in Table 2.
3. \(\theta _{v}^{2} \approx e_{v}^3 \Delta t\) and \(\theta _{\omega }^{2} \approx e_{\omega }^3 \Delta t\). Following similar steps as in (45) and (46), and recalling that \((\mathbf {a} \times \mathbf {b})\times (\mathbf {a} \times \mathbf {c}) = \left[ \mathbf {a}\cdot (\mathbf {b} \times \mathbf {c})\right] \mathbf {a}\), it is possible to prove that \(\theta _{v}^{2} \approx e_{v}^3 \Delta t\) and \(\theta _{\omega }^{2} \approx e_{\omega }^3 \Delta t\).
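As a quick numerical sanity check (our own addition, not part of the original proofs), the vector identities used in Appendices B and C can be verified with the following illustrative Python snippet.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = rng.standard_normal((3, 3))  # three random 3D vectors

# Anticommutativity of the cross product: a x b = -(b x a)
assert np.allclose(np.cross(a, b), -np.cross(b, a))

# Cyclic property of the scalar triple product: a . (b x c) = c . (a x b)
assert np.isclose(np.dot(a, np.cross(b, c)), np.dot(c, np.cross(a, b)))

# Identity used in Appendix C: (a x b) x (a x c) = [a . (b x c)] a
lhs = np.cross(np.cross(a, b), np.cross(a, c))
rhs = np.dot(a, np.cross(b, c)) * a
assert np.allclose(lhs, rhs)
```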