Abstract
In this paper we propose a new bidirectional invariant motion descriptor for a rigid body. The proposed invariant representation is unaffected by rotations, translations, time, and linear and angular scaling. These invariance properties make it possible to recognize gestures in realistic scenarios with unexpected variations (e.g., changes in the user’s initial pose, execution time, or observation point), whereas Cartesian trajectories are sensitive to such changes. The proposed representation also allows reconstruction of the original motion trajectory, which is useful for human-robot interaction applications where a robot recognizes human actions and executes its own behaviors using the same descriptors. By removing the dependency on the absolute pose and on the scaling factors of the Cartesian trajectories, the proposed descriptor gains the flexibility to generate different motion instances from the same invariant representation. To illustrate its effectiveness in motion recognition and generation, the proposed descriptor is evaluated on three datasets and in experiments on a NAO humanoid robot and a KUKA LWR IV\(+\) manipulator, and it is compared with other existing invariant representations.
Notes
This work is based on our preliminary results presented in Soloperto et al. (2015). Our previous work has been extended in several ways: (i) we provide more theoretical insights including a compact closed form of DHB invariants; (ii) we theoretically compare DHB with existing invariant representations, in order to underline differences and similarities; (iii) we compare the recognition performance of DHB invariants with several state-of-the-art approaches; (iv) we report several experiments to show that DHB invariants can be adopted as flexible motion descriptors to execute complex tasks.
In the discrete time case, the integral \(\int _{t=0}^{t_f} {|\bullet |}\) in (42) is replaced by \(\sum _{t=0}^{t_f} {|\bullet |}\).
For simplicity, the acronym of the author name (DS) is used to refer to the representation in De Schutter (2010).
Time dependencies are omitted to simplify the notation.
The result is obtained from Algorithm 1 by neglecting the summation and subtraction operations.
A smaller sampling time generates more twist samples and more invariant values. Hence, more products have to be computed in (39) to reconstruct the motion, which increases errors due to the finite precision.
Available on-line: creativedistraction.com/downloads/gesture.zip.
www.xsens.com/products/xsens-mvn.
Available on-line: research.microsoft.com/en-us/um/people/zliu/actionrecorsrc.
There exist 3 invariants to represent the translational motions of the MSR Action3D dataset.
www.aldebaran.com/en/cool-robots/nao.
For example, for full-body motions of a human/humanoid, the body height serves as the reference. For hand motions, the length of the arm/manipulator is useful.
References
Billard, A., Calinon, S., Dillmann, R., & Schaal, S. (2008). Robot programming by demonstration. In O. Khatib & B. Siciliano (Eds.), Springer handbook of robotics (pp. 1371–1394). Berlin: Springer.
Bishop, C. M., et al. (2006). Pattern recognition and machine learning. New York: Springer.
Black, M., & Jepson, D. (1998). A probabilistic framework for matching temporal trajectories: Condensation-based recognition of gestures and expressions. European conference on computer vision, Lecture notes in computer science (Vol. 1406, pp. 909–924). Berlin: Springer.
Burger, B., Ferrané, I., Lerasle, F., & Infantes, G. (2011). Two-handed gesture recognition and fusion with speech to command a robot. Autonomous Robots, 32(2), 129–147.
Chartrand, R. (2011). Numerical differentiation of noisy, nonsmooth data. ISRN Applied Mathematics, 2011, 1–12.
De Schutter, J. (2010). Invariant description of rigid body motion trajectories. Journal of Mechanisms and Robotics, 2(1), 1–9.
De Schutter, J., Di Lello, E., De Schutter, J., Matthysen, R., Benoit, T., & De Laet, T. (2011). Recognition of 6 dof rigid body motion trajectories using a coordinate-free representation. In International conference on robotics and automation (pp. 2071–2078).
Denavit, J., & Hartenberg, R. S. (1955). A kinematic notation for lower-pair mechanisms based on matrices. Transactions of the ASME Journal of Applied Mechanics, 22(2), 215–221.
Dieleman, S., De Fauw, J., & Kavukcuoglu, K. (2016). Exploiting cyclic symmetry in convolutional neural networks. In International conference on machine learning.
Hu, K., & Lee, D. (2012). Biped locomotion primitive learning, control and prediction from human data. In 10th International IFAC symposium on robot control (SYROCO).
Hu, K., Ott, C., & Lee, D. (2014). Online human walking imitation in task and joint space based on quadratic programming. In IEEE international conference on robotics and automation (pp. 3458–3464). IEEE.
Isard, M., & Blake, A. (1996). Contour tracking by stochastic propagation of conditional density. In European conference on computer vision (pp. 343–356).
Jaderberg, M., Simonyan, K., Zisserman, A., & Kavukcuoglu, K. (2015). Spatial transformer networks. In Advances in neural information processing systems (pp. 2017–2025).
Koppula, H. S., Gupta, R., & Saxena, A. (2013). Learning human activities and object affordances from rgb-d videos. International Journal of Robotic Research, 32, 951–970.
Kühnel, W. (2006). Differential geometry: Curves-surfaces-manifolds. Providence: American Mathematical Society.
LeCun, Y. (2012). Learning invariant feature hierarchies. In European conference on computer vision (pp. 496–505).
Lee, D., & Nakamura, Y. (2010). Mimesis model from partial observations for a humanoid robot. International Journal of Robotics Research, 29(1), 60–80.
Lee, D., Ott, C., & Nakamura, Y. (2009). Mimetic communication with impedance control for physical human–robot interaction. In IEEE international conference on robotics and automation (pp. 1535–1542).
Li, W., Zhang, Z., & Liu, Z. (2010). Action recognition based on a bag of 3d points. In Conference on computer vision and pattern recognition workshops (pp. 9–14).
Magnanimo, V., Saveriano, M., Rossi, S., & Lee, D. (2014). A Bayesian approach for task recognition and future human activity prediction. In International symposium on robot and human interactive communication (pp. 726–731).
Murray, R. M., Sastry, S. S., & Zexiang, L. (1994). A mathematical introduction to robotic manipulation (1st ed.). Boca Raton: CRC Press.
Perona, P., & Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. Transactions on Pattern Analysis and Machine Intelligence, 12(7), 629–639.
Piao, Y., Hayakawa, K., & Sato, J. (2002). Space-time invariants and video motion extraction from arbitrary viewpoints. In International conference on pattern recognition (pp. 56–59).
Piao, Y., Hayakawa, K., & Sato, J. (2004). Space-time invariants for recognizing 3d motions from arbitrary viewpoints under perspective projection. In International conference on image and graphics (pp. 200–203).
Psarrou, A., Gong, S., & Walter, M. (2002). Recognition of human gestures and behaviour based on motion trajectories. Image and Vision Computing, 20(5–6), 349–358.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Rao, C., Yilmaz, A., & Shah, M. (2002). View-invariant representation and recognition of actions. International Journal of Computer Vision, 50(2), 203–226.
Rao, C., Shah, M., & Syeda-Mahmood, T. (2003). Action recognition based on view invariant spatio-temporal analysis. In ACM multimedia.
Rauch, H. E., Striebel, C. T., & Tung, F. (1965). Maximum likelihood estimates of linear dynamic systems. Journal of the American Institute of Aeronautics and Astronautics, 3(8), 1445–1450.
Sakoe, H., & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43–49.
Sanguansat, P. (2012). Multiple multidimensional sequence alignment using generalized dynamic time warping. WSEAS Transactions on Mathematics, 11(8), 668–678.
Saveriano, M., & Lee, D. (2013). Invariant representation for user independent motion recognition. In International symposium on robot and human interactive communication (pp. 650–655).
Saveriano, M., An, S., & Lee, D. (2015). Incremental kinesthetic teaching of end-effector and null-space motion primitives. In International conference on robotics and automation (pp. 3570–3575).
Schreiber, G., Stemmer, A., & Bischoff, R. (2010). The fast research interface for the kuka lightweight robot. In ICRA workshop on innovative robot control architectures for demanding (Research) applications (pp. 15–21).
Siciliano, B., Sciavicco, L., Villani, L., & Oriolo, G. (2009). Robotics-modelling, planning and control. Berlin: Springer.
Soloperto, R., Saveriano, M., & Lee, D. (2015). A bidirectional invariant representation of motion for gesture recognition and reproduction. In International conference on robotics and automation (pp. 6146–6152).
Vochten, M., De Laet, T., & De Schutter, J. (2015). Comparison of rigid body motion trajectory descriptors for motion representation and recognition. In International conference on robotics and automation (pp. 3010–3017).
Waldherr, S., Romero, R., & Thrun, S. (2000). A gesture based interface for human–robot interaction. Autonomous Robots, 9(2), 151–173.
Wang, J., Liu, Z., Wu, Y., & Yuan, J. (2012). Mining actionlet ensemble for action recognition with depth cameras. In Conference on computer vision and pattern recognition (pp. 1290–1297).
Wang, P., Li, W., Gao, Z., Tang, C., Zhang, J., & Ogunbona, P. (2015). Convnets-based action recognition from depth maps through virtual cameras and pseudocoloring. In Proceedings of the 23rd ACM international conference on Multimedia (pp. 1119–1122).
Weiss, I. (1993). Geometric invariants and object recognition. International Journal of Computer Vision, 10(3), 207–231.
Wu, S., & Li, Y. F. (2008). On signature invariants for effective motion trajectory recognition. International Journal of Robotic Research, 27(8), 895–917.
Wu, S., & Li, Y. F. (2010). Motion trajectory reproduction from generalized signature description. Pattern Recognition, 43(1), 204–221.
Wu, Y., & Huang, T. S. (2001). Vision-based gesture recognition: A review. In Gesture-based communication in human–computer interaction, lecture notes in computer science (pp. 103–115). Berlin: Springer.
Xia, L., Chen, C. C., & Aggarwal, J. K. (2012). View invariant human action recognition using histograms of 3d joints. In Conference on computer vision and pattern recognition workshops (pp. 20–27).
Yan, P., Khan, S. M., & Shah, M. (2008). Learning 4d action feature models for arbitrary view action recognition. In International conference on computer vision and pattern recognition (pp. 1–7).
Zisserman, A., & Maybank, S. (1994). A case against epipolar geometry. In Applications of invariance in computer vision, lecture notes in computer science (Vol. 825, pp. 69–88). Berlin: Springer.
Acknowledgements
This work has been supported by the Technical University of Munich, International Graduate School of Science and Engineering.
Appendices
Appendix A: Rigid Body Motion Representation
To represent rigid body motions it is convenient to attach an orthogonal frame to the rigid body (body frame) and to describe the pose (position and orientation) of the body frame with respect to a fixed frame (world frame). At each time instant the position of the rigid body is represented by the vector \(\mathbf {p}\) from the origin of the world frame to the origin of the body frame. The axes of the body frame can be projected onto the axes of the world frame by means of the direction cosines. Hence, the orientation of the rigid body is described by collecting the direction cosines into a \(3 \times 3\) rotation matrix \(\mathbf {R}\). It can be shown that a minimal representation of the orientation consists of 3 values (Siciliano et al. 2009). In this work, we use the rotation vector to represent the orientation.
The rotation vector \(\mathbf {r} = \theta \hat{\mathbf{r}}\) is computed from \(\mathbf {R}\) as:
$$\theta = \arccos \left( \frac{\mathrm{tr}(\mathbf {R}) - 1}{2}\right) , \qquad \hat{\mathbf{r}} = \frac{1}{2\sin \theta }\begin{bmatrix} R_{32} - R_{23} \\ R_{13} - R_{31} \\ R_{21} - R_{12} \end{bmatrix},$$
where \(R_{ij}\) denotes the \((i,j)\) entry of \(\mathbf {R}\).
The rotation matrix \(\mathbf {R}\) is computed from \(\mathbf {r}\) as:
$$\mathbf {R} = \mathbf {I}_{3} + \frac{\sin \theta }{\theta }\,\mathbf {S}(\mathbf {r}) + \frac{1-\cos \theta }{\theta ^{2}}\,\mathbf {S}(\mathbf {r})^{2},$$
with \(\theta = \Vert \mathbf {r}\Vert \) and \(\mathbf {I}_{3}\) the \(3 \times 3\) identity matrix,
where the skew-symmetric matrix \(\mathbf {S}(\mathbf {r})\) is given by:
$$\mathbf {S}(\mathbf {r}) = \begin{bmatrix} 0 & -r_{z} & r_{y} \\ r_{z} & 0 & -r_{x} \\ -r_{y} & r_{x} & 0 \end{bmatrix}.$$
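For reference, the conversions above can be sketched in a few lines of Python. This is our own illustrative code, not the paper's implementation; the function names and the small-angle handling are assumptions.

```python
import numpy as np

def skew(r):
    """Skew-symmetric matrix S(r) such that S(r) @ v = r x v."""
    return np.array([[0.0, -r[2], r[1]],
                     [r[2], 0.0, -r[0]],
                     [-r[1], r[0], 0.0]])

def rotation_matrix_to_vector(R):
    """Rotation vector r = theta * r_hat extracted from a rotation matrix R."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):          # no rotation: the axis is undefined
        return np.zeros(3)
    r_hat = np.array([R[2, 1] - R[1, 2],
                      R[0, 2] - R[2, 0],
                      R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * r_hat

def rotation_vector_to_matrix(r):
    """Rodrigues' formula: R = I + sin(t)/t S(r) + (1 - cos(t))/t^2 S(r)^2."""
    theta = np.linalg.norm(r)
    if np.isclose(theta, 0.0):
        return np.eye(3)
    S = skew(r)
    return np.eye(3) + (np.sin(theta) / theta) * S \
        + ((1.0 - np.cos(theta)) / theta**2) * (S @ S)

# Round-trip check on a random rotation (theta < pi to avoid the singular case).
rng = np.random.default_rng(0)
r = rng.standard_normal(3)
r = 0.9 * np.pi * r / np.linalg.norm(r)
R = rotation_vector_to_matrix(r)
assert np.allclose(rotation_matrix_to_vector(R), r)
```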
Appendix B: Proofs of the relationships in Sect. 7.1
1. \(m_{\omega } = d_{\omega }^1\) derives from (20) and \(d_{\omega }^1\) in Table 2.
2. \(\theta _{\omega }^{1} \approx d_{\omega }^{2} \Delta t\). For \(\Delta t \longrightarrow 0\), we can neglect the arc tangent in (28). Hence, we can rewrite \(\theta _{\omega }^{1}\) in (23) as:
3. \(\theta _{\omega }^{2} \approx d_{\omega }^{3}\Delta t\). Recall that \(\mathbf {a} \times \mathbf {b} = -\mathbf {b} \times \mathbf {a}\) and that \(\mathbf {a}\cdot (\mathbf {b} \times \mathbf {c}) = \mathbf {c}\cdot (\mathbf {a} \times \mathbf {b})\). \(\theta _{\omega }^{2}\) in (23) can be re-written as:
The denominator of (44) can be re-written as:
Considering that \(\ddot{\mathbf {a}}_t \approx (\mathbf {a}_{t+2} + \mathbf {a}_t)/\Delta t^2\), the numerator of (44) can be re-written as:
Finally, combining (45), (46) and (44), and neglecting the arc tangent, we obtain that \(\theta _{\omega }^{2} \approx d_{\omega }^{3}\Delta t\) for \(\Delta t \longrightarrow 0\).
Appendix C: Proofs of the relationships in Sect. 7.2
1. \(m_{v} = e_{v}^1\) derives from (19) and \(e_{v}^1\) in Table 2. \(m_{\omega } = e_{\omega }^1\) derives from (20) and \(e_{\omega }^1\) in Table 2.
2. \(\theta _{\omega }^{1} \approx e_{\omega }^2 \Delta t\) derives from (43) recalling that \(e_{\omega }^2 = d_{\omega }^2\). \(\theta _{v}^{1} \approx e_{v}^2 \Delta t\) can be proven by following similar steps as in (43) and considering \(e_{v}^2\) in Table 2.
3. \(\theta _{v}^{2} \approx e_{v}^3 \Delta t\) and \(\theta _{\omega }^{2} \approx e_{\omega }^3 \Delta t\). Following similar steps as in (45) and (46), and recalling that \((\mathbf {a} \times \mathbf {b})\times (\mathbf {a} \times \mathbf {c}) = \left[ \mathbf {a}\cdot (\mathbf {b} \times \mathbf {c})\right] \mathbf {a}\), it is possible to prove that \(\theta _{v}^{2} \approx e_{v}^3 \Delta t\) and \(\theta _{\omega }^{2} \approx e_{\omega }^3 \Delta t\).
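As a quick numerical sanity check (our own addition, not part of the original proofs), the vector identities used in Appendices B and C can be verified with the following illustrative Python snippet.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = rng.standard_normal((3, 3))  # three random 3D vectors

# Anticommutativity of the cross product: a x b = -(b x a)
assert np.allclose(np.cross(a, b), -np.cross(b, a))

# Cyclic property of the scalar triple product: a . (b x c) = c . (a x b)
assert np.isclose(np.dot(a, np.cross(b, c)), np.dot(c, np.cross(a, b)))

# Identity used in Appendix C: (a x b) x (a x c) = [a . (b x c)] a
lhs = np.cross(np.cross(a, b), np.cross(a, c))
rhs = np.dot(a, np.cross(b, c)) * a
assert np.allclose(lhs, rhs)
```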