http://dx.doi.org/10.7471/ikeee.2019.23.4.1295

Deep Learning-Based Human Motion Denoising  

Kim, Seong Uk (Dept. of Computer Science, Kangwon National University)
Im, Hyeonseung (Dept. of Computer Science, Kangwon National University)
Kim, Jongmin (Dept. of Computer Science, Kangwon National University)
Publication Information
Journal of IKEEE / v.23, no.4, 2019, pp. 1295-1301
Abstract
In this paper, we propose a novel method for denoising human motion using a bidirectional recurrent neural network (BRNN) with an attention mechanism. Corrupted motion captured from a single 3D depth sensor camera is automatically repaired within a well-established smooth motion manifold. Incorporating an attention mechanism into the BRNN yields better optimization results and higher accuracy than other deep learning frameworks, because higher weights are selectively assigned to the more important input poses at specific frames when encoding the input motion. Experimental results show that our approach effectively handles various types of motion and noise, and we believe our method can be used in motion capture applications as a post-processing step after capturing human motion.
Keywords
human motion; motion capture; motion denoising; attention; bidirectional recurrent neural network;
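
As a rough illustration of the approach outlined in the abstract, the following PyTorch sketch shows a bidirectional recurrent encoder with a per-frame attention weighting applied to a noisy pose sequence. The pose dimensionality, hidden size, and the way the attention context is combined with the frame encodings are illustrative assumptions and do not reproduce the authors' exact network.

    # Hypothetical sketch of an attention-based bidirectional RNN denoiser,
    # loosely following the abstract (bidirectional recurrent encoder plus
    # attention over input frames). Layer sizes and the attention/decoding
    # details are assumptions, not the published architecture.
    import torch
    import torch.nn as nn

    class AttentionBRNNDenoiser(nn.Module):
        def __init__(self, pose_dim=63, hidden_dim=256):
            super().__init__()
            # Bidirectional GRU encodes the noisy motion sequence.
            self.brnn = nn.GRU(pose_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
            # Scores how important each frame's encoding is (attention).
            self.attn_score = nn.Linear(2 * hidden_dim, 1)
            # Maps the attention-augmented encoding back to a clean pose.
            self.decode = nn.Linear(2 * hidden_dim, pose_dim)

        def forward(self, noisy_motion):
            # noisy_motion: (batch, frames, pose_dim)
            encoded, _ = self.brnn(noisy_motion)           # (B, T, 2H)
            scores = self.attn_score(encoded)              # (B, T, 1)
            weights = torch.softmax(scores, dim=1)         # weight per frame
            # Emphasize the frames the attention deems more important,
            # then decode every frame to a denoised pose of equal length.
            context = (weights * encoded).sum(dim=1, keepdim=True)  # (B, 1, 2H)
            return self.decode(encoded + context)          # (B, T, pose_dim)

    # Usage: denoise a batch of clips (e.g., 60 frames, 21 joints x 3 coords).
    model = AttentionBRNNDenoiser()
    clean = model(torch.randn(8, 60, 63))

The softmax over the time axis is one simple way to realize the "higher weight to a more important input pose at a specific frame" idea; the paper's formulation of the attention and of the motion-manifold projection may differ.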