http://dx.doi.org/10.17661/jkiiect.2020.13.3.197

Visualization of Korean Speech Based on the Distance of Acoustic Features  

Pok, Gou-Chol (Division of Computer and IT Instruction, PaiChai University)
Publication Information
The Journal of Korea Institute of Information, Electronics, and Communication Technology, vol. 13, no. 3, 2020, pp. 197-205
Abstract
Korean has the characteristic that the pronunciation of phoneme units such as vowels and consonants is fixed, and the pronunciation associated with a written form does not change, so foreign learners can approach the language relatively easily. However, when words, phrases, or sentences are pronounced, the pronunciation varies widely and in complex ways at syllable boundaries, and the correspondence between notation and pronunciation no longer holds. Consequently, it is very difficult for foreign learners to master standard Korean pronunciation. Despite these difficulties, a systematic analysis of pronunciation errors in Korean words is believed to be possible, based on the observation that, unlike other languages including English, the relationship between Korean notation and pronunciation can be described by a set of firm rules without exceptions. In this paper, we propose a visualization framework that displays the differences between standard and erroneous pronunciations as quantitative measures on a computer screen. Previous research has offered only color representations and 3D graphics of speech properties, or animated views of the changing shapes of the lips and oral cavity; moreover, the features used in those analyses were point data, such as the average over a speech interval. In this study, we propose a method that can use time-series data directly instead of summarized or distorted data. This was realized with a deep learning-based technique that combines a self-organizing map, a variational autoencoder, and a Markov model, and we achieved a substantial performance improvement over the method using point-based data.
Keywords
Feature Clustering; Korean Pronunciation; SOM-VAE; Speech Processing; Speech Visualization;
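The combination named in the abstract, a self-organizing map over the latent space of a variational autoencoder, with a Markov model over the resulting discrete states, follows the SOM-VAE model of Fortuin et al. (reference 8). As a rough, hypothetical sketch that omits the autoencoder training entirely (all array shapes and names here are illustrative, not taken from the paper), the SOM quantization and Markov transition-counting steps over a latent time series might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 4x4 SOM grid over 8-dimensional latent vectors.
GRID_H, GRID_W, DIM = 4, 4, 8
K = GRID_H * GRID_W
nodes = rng.normal(size=(K, DIM))  # SOM node embeddings (normally learned)

def quantize(z):
    """Map a latent vector to the index of its nearest SOM node."""
    return int(np.argmin(np.linalg.norm(nodes - z, axis=1)))

# A toy latent time series, standing in for frame-wise acoustic features
# after encoding (in the real model these come from the trained VAE encoder).
latents = rng.normal(size=(50, DIM))
states = [quantize(z) for z in latents]

# Markov transition counts over the discrete states capture the temporal
# structure of the utterance, rather than a single summary point.
trans = np.zeros((K, K))
for a, b in zip(states[:-1], states[1:]):
    trans[a, b] += 1

# Row-normalize to transition probabilities, leaving unvisited rows at zero.
row_sums = trans.sum(axis=1, keepdims=True)
probs = np.divide(trans, row_sums, out=np.zeros_like(trans),
                  where=row_sums > 0)
```

In the full SOM-VAE, the node embeddings and the encoder/decoder are trained jointly, so nearby grid cells come to represent acoustically similar frames; the 2D grid is what makes a direct on-screen visualization of a pronunciation trajectory possible.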
References
1 J. Beskow, O. Engwall, B. Granstrom, P. Nordqvist, and P. Wik, "Visualization of Speech and Audio for Hearing Impaired Persons," Technology and Disability, vol. 20, pp. 97-107, 2008.
2 Y. Ueda, T. Sakada, and A. Watanabe, "Real-time Speech Visualization System for Speech Training and Diagnosis," Audio Engineering Society Convention Paper 8184, November 2010, San Francisco, USA.
3 D. S. Kim, T. H. Lee, and D. M. Lee, "An ambient display for hearing impaired people," Proc. Human Computer Interface Korea (HCI2006), pp.46-51, 2006.
4 D. Silva, "Variation in Voice Onset Time for Korean Stops: A Case for Recent Sound Change", Korean Linguistics, vol. 13, 2006.
5 J. Y. Bae, "Acoustic Characteristics of Korean Stop Sounds According to Phonetic Environment: Focusing on Features on the Time Line," Phonetics and Speech Sciences, vol. 5.2, pp. 139-159, 1999.
6 S. H. Kim, "A Study on Korean Affricates Produced by Vietnamese Speakers," Korean Linguistics, vol. 59, pp. 145-168, 2013.
7 Y. Dissen, J. Goldberg, and J. Keshet, "Formant Estimation and Tracking: A Deep Learning Approach," J. Acoustic Society, vol. 145, no. 2, pp. 1-11, 2019.
8 V. Fortuin, M. Huser, F. Locatello, H. Strathmann, and G. Ratsch, "Deep Self-Organization: Interpretable Discrete Representation Learning on Time Series," arXiv:1806.02199, 2018.
9 H. S. Shin, "Phonological Information in Korean Language", Prunsasang, 2016.
10 D. Davies and D. Bouldin, "A Cluster Separation Measure," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 2, pp. 224-227, 1979.
11 P. J. Rousseeuw, "Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53-65, 1987.
12 A. Watanabe, S. Tomishige, and M. Nakatake, "Speech Visualization by Integrating Features for the Hearing Impaired," IEEE Trans. Speech Audio Proc., vol. 8, no. 4, pp. 454-466, 2000.