• Title/Summary/Keyword: Consonant/Vowel (자음/모음)


Analysis of Acoustic Characteristics of Vowel and Consonants Production Study on Speech Proficiency in Esophageal Speech (식도발성의 숙련 정도에 따른 모음의 음향학적 특징과 자음 산출에 대한 연구)

  • Choi, Seong-Hee;Choi, Hong-Shik;Kim, Han-Soo;Lim, Sung-Eun;Lee, Sung-Eun;Pyo, Hwa-Young
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.7-27
    • /
    • 2003
  • Esophageal speech uses esophageal air for phonation. Fluent esophageal speakers intake air readily during oral communication, whereas unskilled esophageal speakers have difficulty and swallow large amounts of air. The purpose of this study was to investigate how the acoustic characteristics of vowel and consonant production differ with speech proficiency in esophageal speech. Subjects were 13 normal male speakers and 13 male esophageal speakers (5 unskilled, 8 skilled), aged 50 to 70 years. The stimuli were a sustained /a/ vowel and 36 meaningless two-syllable words. The vowel used was /a/, and the 18 consonants were /k, n, t, m, p, s, c, cʰ, kʰ, tʰ, pʰ, h, l, k', t', p', s', c'/. Fundamental frequency (Fx), jitter, shimmer, HNR, and MPT were measured by electroglottography using Lx Speech Studio (Laryngograph Ltd., London, UK). The 36 meaningless words produced by the esophageal speakers were presented to three speech-language pathologists, who phonetically transcribed the responses. Fx, jitter, and HNR differed significantly between skilled and unskilled esophageal speakers (p<.05). Considering manner of articulation, ANOVA showed significant differences between the two proficiency groups: in the unskilled group, glides were confused with other phoneme classes most often and affricates were the most intelligible, whereas in the skilled group fricatives produced the highest number of confusions and nasals were the most intelligible. In place of articulation, the glottal /h/ was the most confused consonant in both groups; bilabials were the most intelligible in skilled esophageal speech and velars in unskilled esophageal speech. In syllable structure, 'CV+V' produced more confusions in the skilled group, while the unskilled group showed similar confusion for both structures. In unskilled esophageal speech, the significantly different Fx, jitter, and HNR values for the vowel and the high confusion rates for liquids and nasals can be attributed to unstable, improper contact of the neoglottis as the vibratory source, insufficient phonatory air supply, and the higher motoric demand on the remaining articulators due to the morphological characteristics of the vocal tract after laryngectomy.
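The perturbation measures compared above follow standard acoustic definitions; the sketch below shows local jitter and shimmer computed from pitch-period and peak-amplitude estimates. This is the textbook formulation, not necessarily Lx Speech Studio's exact implementation, and the sample period values are illustrative only.

```python
def jitter_local(periods):
    """Mean absolute difference of consecutive pitch periods,
    divided by the mean period, expressed as a percentage."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_local(amplitudes):
    """Same ratio, computed over consecutive cycle peak amplitudes."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

# Hypothetical periods (seconds) extracted from a sustained /a/:
periods = [0.0100, 0.0102, 0.0099, 0.0101]
print(round(jitter_local(periods), 2))  # jitter in percent
```

Higher values indicate greater cycle-to-cycle instability of the vibratory source, which is why these measures separate skilled from unskilled esophageal speakers.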


An Efficient Character Image Enhancement and Region Segmentation Using Watershed Transformation (Watershed 변환을 이용한 효율적인 문자 영상 향상 및 영역 분할)

  • Choi, Young-Kyoo;Rhee, Sang-Burm
    • The KIPS Transactions:PartB
    • /
    • v.9B no.4
    • /
    • pp.481-490
    • /
    • 2002
  • Off-line handwritten character recognition suffers from incomplete preprocessing because it lacks dynamic information: handwriting varies widely, consonants and vowels overlap heavily, and strokes contain many errors. Consequently, off-line handwritten character recognition requires study of various preprocessing methods such as binarization and thinning. This paper considers the running time of the watershed algorithm and the quality of the resulting image as preprocessing for off-line handwritten Korean character recognition. It proposes an efficient watershed algorithm for segmenting the character region and the background region in a gray-level character image, together with a segmentation function for binarization based on the extracted watershed image. It also proposes a thinning method that extracts the skeleton through a conditional test mask, considering both running time and skeleton quality, and evaluates existing methods and the proposed one in terms of running time and quality. Average execution time was 2.16 seconds for the previous method and 1.72 seconds for the proposed method. We show that the proposed method removes noise from overlapping strokes more effectively than the previous method.
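The abstract does not specify the authors' conditional test mask, but the general approach of mask-based thinning can be illustrated with the classic Zhang-Suen algorithm, which deletes boundary pixels that pass a set of neighborhood conditions in two alternating sub-passes. This is a generic skeletonizer offered for illustration, not the paper's method.

```python
def neighbours(r, c, img):
    """8-neighbourhood P2..P9, clockwise from the pixel above."""
    return [img[r-1][c], img[r-1][c+1], img[r][c+1], img[r+1][c+1],
            img[r+1][c], img[r+1][c-1], img[r][c-1], img[r-1][c-1]]

def transitions(n):
    """Number of 0->1 transitions in the circular sequence P2..P9."""
    seq = n + n[:1]
    return sum(a == 0 and b == 1 for a, b in zip(seq, seq[1:]))

def zhang_suen(img):
    """Thin a binary image (2-D list of 0/1) to a 1-pixel skeleton."""
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, len(img) - 1):
                for c in range(1, len(img[0]) - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c, img)
                    P2, P3, P4, P5, P6, P7, P8, P9 = n
                    if not (2 <= sum(n) <= 6) or transitions(n) != 1:
                        continue
                    if step == 0:
                        if P2*P4*P6 != 0 or P4*P6*P8 != 0:
                            continue
                    else:
                        if P2*P4*P8 != 0 or P2*P6*P8 != 0:
                            continue
                    to_delete.append((r, c))
            for r, c in to_delete:  # delete in batch after each sub-pass
                img[r][c] = 0
                changed = True
    return img
```

A one-pixel-wide stroke passes through unchanged (each interior pixel has two transitions, so the delete conditions fail), while thick strokes are eroded to their skeleton.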

The Effect of Sensory Integrative Intervention Focused on Proprioceptive-Vestibular Stimuli on the Handwriting and Fine Motor Function in Lower Grade Elementary School Children (고유-전정감각 중심의 감각통합 중재가 초등학교 저학년 아동의 글씨쓰기와 소운동에 미치는 영향)

  • Hwang, Ji-Hye;Kim, Hee-Jung;Jung, Hyerim
    • The Journal of Korean Academy of Sensory Integration
    • /
    • v.15 no.1
    • /
    • pp.10-20
    • /
    • 2017
  • Objective : The purpose of this study was to investigate the effect of proprioceptive-vestibular based sensory integrative intervention on handwriting and fine motor function in elementary school students in grades 1 to 3. Methods : Eight students at an elementary school in Busan were enrolled. The intervention was conducted twice a week from May to October 2016, for a total of 14 sessions. To evaluate handwriting and fine motor ability before and after the intervention, a Korean alphabet writing test and the fine motor items of the Bruininks-Oseretsky Test of Motor Proficiency (BOTMP) were used. Results : There was a statistically significant improvement in the total scores of consonant writing and the Korean alphabet writing assessment after the intervention. Among the BOTMP fine motor items, response speed showed a statistically significant difference. Visual-motor control scores increased during the intervention, but the change was not statistically significant. Conclusion : Sensory integration intervention may have positive effects on elementary school students' handwriting and fine motor function.

Visualization of Korean Speech Based on the Distance of Acoustic Features (음성특징의 거리에 기반한 한국어 발음의 시각화)

  • Pok, Gou-Chol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.13 no.3
    • /
    • pp.197-205
    • /
    • 2020
  • Korean has the characteristic that the pronunciation of phoneme units such as vowels and consonants is fixed and the pronunciation associated with a notation does not change, so foreign learners can approach the language rather easily. However, when words, phrases, or sentences are pronounced, the pronunciation changes widely and in complex ways at syllable boundaries, and the association between notation and pronunciation no longer holds. Consequently, it is very difficult for foreign learners to learn standard Korean pronunciation. Despite this difficulty, systematic analysis of pronunciation errors in Korean words is possible, because, unlike languages such as English, the relationship between Korean notation and pronunciation can be described by a set of firm rules without exceptions. In this paper, we propose a visualization framework that shows the differences between standard and erroneous pronunciations as quantitative measures on the computer screen. Previous research offers only color representations and 3D graphics of speech properties, or animated views of the changing shapes of the lips and mouth cavity; moreover, the features used in analysis are only point data such as the average over a speech range. In this study, we propose a method that can use the time-series data directly instead of summarized or distorted data. This was realized with a deep learning-based technique combining a self-organizing map, a variational autoencoder, and a Markov model, and we achieved a substantial performance improvement over the method using point-based data.
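One of the components named above, the self-organizing map, places high-dimensional feature vectors onto a 2-D grid so that nearby cells hold similar sounds, which is what makes distances between pronunciations visualizable. A minimal generic SOM in NumPy is sketched below; it is an assumption-laden illustration of that one component, not the authors' combined SOM/VAE/Markov pipeline.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small self-organizing map; returns (h, w, dim) weights."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h * w, data.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    n_steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            lr = lr0 * (1 - t / n_steps)            # decaying learning rate
            sigma = sigma0 * (1 - t / n_steps) + 1e-3  # shrinking neighbourhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            g = np.exp(-d2 / (2 * sigma ** 2))      # Gaussian neighbourhood
            weights += lr * g[:, None] * (x - weights)
            t += 1
    return weights.reshape(h, w, -1)
```

After training, each input maps to the grid cell of its best-matching unit; distinct sound classes land on distant cells, giving the quantitative on-screen distances the paper describes.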

Preprocessing Technique for Malicious Comments Detection Considering the Form of Comments Used in the Online Community (온라인 커뮤니티에서 사용되는 댓글의 형태를 고려한 악플 탐지를 위한 전처리 기법)

  • Kim Hae Soo;Kim Mi Hui
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.12 no.3
    • /
    • pp.103-110
    • /
    • 2023
  • With the spread of the Internet, anonymous communities emerged alongside communities for communication between people, and many users exploit anonymity to harm others, for example by posting aggressive posts and comments. In the past, administrators checked posts and comments directly and then deleted or blocked them, but as the number of community users grew, continuous monitoring by administrators became impossible. Initially, word-filtering techniques were used, blocking a post or comment if it contained a specific word, but users evaded filtering in roundabout ways, such as by using similar words. Deep learning was then applied to monitor user posts in real time, but online communities now use words understandable only within the community, or only from a human perspective, rather than standard Korean words, and the variety of character types and forms makes it difficult for an artificial intelligence model to learn them all. Therefore, this paper proposes a preprocessing technique in which each character of a sentence is rendered as an image, and a CNN model trained on images of Korean consonants, vowels, and spacing converts characters that are understandable only from a human perspective into the characters predicted by the CNN model. Experiments confirmed that the proposed preprocessing technique improved the performance of the LSTM, BiLSTM, and CNN-BiLSTM models by 3.2%, 3.3%, and 4.88%, respectively.
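Splitting Hangul syllables into the consonant and vowel jamo that such a model works with is deterministic Unicode arithmetic: every precomposed syllable in U+AC00..U+D7A3 encodes a (lead, vowel, tail) triple. The sketch below shows that decomposition step in isolation; it is an assumption about one plausible preprocessing stage, not the paper's code.

```python
# Compatibility jamo tables: 19 leads, 21 vowels, 27 tails (index 0 = no tail).
LEADS  = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
TAILS  = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose(ch):
    """Split one precomposed Hangul syllable into its jamo;
    non-Hangul characters are returned unchanged."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 11172:       # 19 * 21 * 28 syllables
        return [ch]
    lead, rest = divmod(code, 21 * 28)
    vowel, tail = divmod(rest, 28)
    parts = [LEADS[lead], VOWELS[vowel]]
    if TAILS[tail]:
        parts.append(TAILS[tail])
    return parts

print(decompose("한"))  # ['ㅎ', 'ㅏ', 'ㄴ']
```

Working at the jamo level lets a model recognize disguised spellings whose syllables differ but whose consonants and vowels are nearly identical.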

The Effect of Retrieval Difficulty and Association Strength on Memory Inhibition (자극의 인출난이도와 연합강도가 기억억제에 미치는 효과)

  • Yoonjae Jung
    • Korean Journal of Cognitive Science
    • /
    • v.34 no.1
    • /
    • pp.21-38
    • /
    • 2023
  • The present study investigated the effect of the difficulty of retrieval practice, and of the association strength between categories and within-category stimuli, on memory inhibition. Most previous studies have examined whether inhibition occurs by manipulating the association strength, emotional value, or physical characteristics of non-practiced words within the practiced category; it therefore remained to be studied how inhibition depends on the difficulty of the retrieval stimuli during retrieval practice. Retrieval difficulty was manipulated at three levels (difficult, normal, and easy) through the number of consonants and vowels of each word presented during retrieval practice, and the association strength between categories and within-category words was also manipulated. In previous studies, retrieval-induced forgetting occurred when the association between categories and within-category words was strong, but not when it was weak. The present study explored whether, if the inhibition process differs with retrieval difficulty, results different from previous studies would emerge depending on the strength of association with the category. In the strong-association condition, retrieval-induced forgetting was observed under the normal and difficult retrieval conditions, but not under the easy condition. In the weak-association condition, retrieval-induced forgetting tended to occur under the difficult retrieval condition, but was not observed under the normal and easy conditions. These results suggest that memory inhibition may appear differently depending on the difficulty of retrieval.

The Relationship between the Musical System of the Jeongganbo and Sanskrit·Buddhism (세종·세조 악보와 불전(佛典)·범문(梵文)의 관계)

  • Yoon So-hee
    • 기호학연구
    • /
    • v.61
    • /
    • pp.187-217
    • /
    • 2019
  • In this study I closely examined the relationship between the system of the Jeongganbo, the Buddhist sutras, and Sanskrit grammatical particles. King Sejong invented the Jeongganbo during the Joseon Dynasty; it had 32 cells per line, and the next king, Sejo, reformed it to 16 cells with 4 partitions. From ancient times in Eastern music, the numbers associated with each instrument, piece, tone scale, and mode, even a single tone, were symbolized with some meaning. Consequently, although I was unable to locate the relevant records, I never doubted that the musical systems of Sejong and Sejo carried meaning. Eventually, I discovered the same numbers and symbolic meanings in the Buddhist sutras and in the system of word formation in ancient Indian Sanskrit; moreover, they relate to Hangeul, the Korean alphabet invented by King Sejong. First, according to the Buddhist sutra Dirghagama, a person who has the 32 moral characteristics may become a great Buddha if he enters the priesthood, or a great leader if he lives a secular life. This myth goes back to the two dharmas of Nivrtti-laksha and Pravrtti-laksha in the Veda. In later days, most Buddhist sutras described the person of 32 virtues as embodying the character of Buddha. In particular, the Goanchaljebeobhang Sutra (觀察諸法行經) says that a significant man has 32 virtues, and presents the 16 Sanskrit characters as the door to the enlightened world. Siddham, one kind of ancient Sanskrit script, initially had 12 vowel letters; four more were later added, making 16. In Buddhism, this alphabet was used as a method of ascetic practice, recited or visualized one letter after another, and eventually became a chanted mantra. King Sejong designed Hangeul to be a phonetic script like Sanskrit rather than like Chinese characters. After its creation, several mantra books were written using the newly made Hangeul, Siddham, and Chinese characters together, and Hangeul was used for writing the Buddha's life story and for praising the Buddha and the Bodhisattvas. Notable also is the resemblance in structure and rhythmic pattern between Korean and Indian traditional music. Considering these elements, I postulate that the Jeongganbo, with its 32 and 16 cells, derived from Buddhism and the Siddham characters. What could not be established on record is likely linked to the Joseon Dynasty's suppression of Buddhism: Confucianism was the official policy, but the king and the common people believed in and followed Buddhism, a long-established custom since the Silla era.

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning has developed, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependency between objects entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts must be decomposed into words or morphemes. However, since a training set of sentences generally contains a huge number of words or morphemes, the dictionary is very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors during decomposition (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or consonant, is the smallest unit composing Korean text. We construct the language model using three or four LSTM layers. Each model was trained using stochastic gradient descent as well as more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done with Old Testament texts using the deep learning package Keras, based on Theano. After preprocessing, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed each input vector from 20 consecutive characters and the output from the following 21st character. In total, 1,023,411 input-output pairs were included in the dataset, divided into training, validation, and test sets in a 70:15:15 proportion. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. All the optimization algorithms except stochastic gradient descent showed similar validation loss and perplexity, clearly superior to those of stochastic gradient descent, which also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity were not improved significantly, and even became worse under some conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used in Korean language processing and speech recognition, which are the basis of artificial intelligence systems.
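The dataset construction described above (20 consecutive characters as input, the 21st as target) can be sketched as a simple sliding window. Only the windowing step is shown; the actual model training in Keras is omitted.

```python
def make_windows(text, window=20):
    """Slide a fixed-length window over the text: each sample pairs
    `window` consecutive characters (input) with the next character
    (target), mirroring the 20-in / 1-out setup described above."""
    pairs = []
    for i in range(len(text) - window):
        pairs.append((text[i:i + window], text[i + window]))
    return pairs
```

A text of N characters yields N - 20 pairs, which is how a corpus produces on the order of a million training samples.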