• Title/Abstract/Keyword: Korean vowels


Abusive Sentence Detection using Deep Learning in Online Game (딥러닝을 사용한 온라인 게임에서의 욕설 탐지)

  • Park, Sunghee;Kim, Huy Kang;Woo, Jiyoung
    • Proceedings of the Korean Society of Computer Information Conference / 2019.07a / pp.13-14 / 2019
  • Abusive language is one of the biggest sources of discomfort in online games. Until now, user profanity has been filtered with lists of banned words, but this is not effective because Korean offers many ways to evade such filters, such as altering a word or inserting digits in the middle. In this paper, we therefore build a model that detects abusive language with a convolutional neural network, one of the deep learning techniques, using chat data collected from the online game 'Archeage'. When Hangul syllables were decomposed into their consonant and vowel letters (jamo) for the experiment, the model achieved an accuracy of 87%. Splitting the text syllable by syllable gave slightly better accuracy, but considering that the vocabulary grows more than tenfold compared with jamo decomposition, the jamo-level approach is more efficient.
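The abstract does not include code, but the jamo decomposition it describes follows directly from the Unicode Hangul composition formula (syllable = 0xAC00 + (lead x 21 + vowel) x 28 + tail). A minimal sketch, with non-Hangul characters such as inserted digits kept as their own tokens:

```python
# Decompose precomposed Hangul syllables into jamo tokens for CNN input.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")          # 19 leads
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")     # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 tails + none

def to_jamo(text):
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:               # precomposed Hangul syllable block
            cho, rest = divmod(code, 21 * 28)
            jung, jong = divmod(rest, 28)
            out.append(CHOSEONG[cho])
            out.append(JUNGSEONG[jung])
            if jong:
                out.append(JONGSEONG[jong])
        else:
            out.append(ch)                  # digits, Latin letters, etc. kept as-is
    return out

# to_jamo("안녕1") -> ['ㅇ', 'ㅏ', 'ㄴ', 'ㄴ', 'ㅕ', 'ㅇ', '1']
```

Because evasions like inserted digits survive as tokens in the jamo stream, a character-level CNN over this sequence can still see the surrounding abusive pattern, which is the motivation given in the abstract.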


Consonant/Vowel Segmentation in Monosyllabic Speech Data Using the Fractal Dimension (프랙탈 차원을 이용한 단음절 음성의 자·모음 분리)

  • Choi, Chul-Young;Kim, Hyung-Soon;Kim, Jae-Ho;Son, Kyung-Sik
    • The Journal of the Acoustical Society of Korea / v.13 no.3 / pp.51-62 / 1994
  • In this paper, we performed a series of experiments on segmenting the consonant and the vowel in Korean consonant-vowel (CV) monosyllable data, using the fractal dimension of the speech signal. We chose the Minkowski-Bouligand dimension as the fractal dimension and computed it using the morphological covering method. In order to examine the usefulness of the fractal dimension in speech segmentation, we carried out segmentation experiments using the fractal dimension alone, the short-time energy alone, and both together, and compared the results. A segmentation accuracy of 96.1% was achieved when the product of the slope of the fractal dimension and that of the energy was used, while the slope of the fractal dimension alone and the energy alone gave lower accuracies of 93.6% and 88.0%, respectively. These results indicate that the fractal dimension can serve as a good parameter for speech segmentation.
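The abstract names the morphological covering method for the Minkowski-Bouligand dimension. The sketch below is a generic implementation of that idea (cover area A(eps) of the signal graph scales as eps^(2-D), so D is 2 minus the slope of log A versus log eps), not the authors' code; the scale range is an illustrative assumption.

```python
# Estimate the Minkowski-Bouligand dimension of a 1-D frame via
# morphological covers: dilation minus erosion with growing flat
# structuring elements approximates the cover area at each scale.
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def minkowski_bouligand_dim(x, max_scale=10):
    x = np.asarray(x, dtype=float)
    scales, areas = [], []
    for k in range(1, max_scale + 1):
        size = 2 * k + 1                                   # flat structuring element
        cover = grey_dilation(x, size=size) - grey_erosion(x, size=size)
        scales.append(k)
        areas.append(cover.sum())
    slope, _ = np.polyfit(np.log(scales), np.log(areas), 1)
    return 2.0 - slope
```

Applied frame by frame, the dimension tends to rise for noise-like consonant regions and fall for quasi-periodic vowel regions, which is what makes it usable as a segmentation cue alongside short-time energy.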


A Comparative Study of Speech Parameters for Speech Recognition Neural Network (음성 인식 신경망을 위한 음성 파라미터들의 성능 비교)

  • Kim, Ki-Seok;Im, Eun-Jin;Hwang, Hee-Yung
    • The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.61-66 / 1992
  • There has been much research using neural network models for automatic speech recognition, but the main trend has been to find neural network models and learning rules suited to the task. However, the choice of the input speech parameter for the neural network, as well as the neural network model itself, is a very important factor in improving the performance of a neural-network-based automatic speech recognition system. In this paper we select 6 speech parameters from a survey of speech recognition papers that use neural networks, and analyze their performance on the same data with the same neural network model. We use 8 sets of 9 Korean plosives and 18 sets of 8 Korean vowels. We use a recurrent neural network and compare the performance of the 6 speech parameters while keeping the number of nodes constant. The delta cepstrum of the linear predictive coefficients showed the best results, with recognition rates of 95.1% for the vowels and 100.0% for the plosives.
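As a rough sketch of the winning parameter, the delta cepstrum of linear predictive coefficients can be computed with the standard LPC-to-cepstrum recursion plus a frame-to-frame difference. The frame length, LPC order, hop size, and input file below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import librosa

def lpc_cepstrum(frame, order=12):
    a = librosa.lpc(frame, order=order)        # a[0] == 1
    c = np.zeros(order)
    for n in range(1, order + 1):              # c_n = -a_n - sum_{k<n} (k/n) c_k a_{n-k}
        c[n - 1] = -a[n] - sum(k / n * c[k - 1] * a[n - k] for k in range(1, n))
    return c

y, sr = librosa.load("speech.wav", sr=16000)   # hypothetical input file
frames = librosa.util.frame(y, frame_length=400, hop_length=160).T
cep = np.array([lpc_cepstrum(f * np.hamming(400)) for f in frames])
delta = np.diff(cep, axis=0, prepend=cep[:1])  # simple first-order delta cepstrum
```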


F2 Formant Frequency Characteristics of the Aging Male and Female Speakers (한국어 모음에서 연령증가에 따른 제2음형대의 변화양상)

  • 김찬우;차흥억;장일환;김선태;오승철;석윤식;이영숙
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.10 no.2 / pp.119-123 / 1999
  • Background and Objectives: Conditions such as muscle atrophy, stretching of the strap muscles, and continued craniofacial growth have been cited as contributing to the changes observed in vocal tract structure and function in elderly speakers. The purpose of the present study is to compare F1 and F2 frequency levels in elderly and young adult male and female speakers producing a series of vowels ranging from high-front to low-back placement. Materials and Methods: The subjects were two groups of young adults (10 males, 10 females; mean age 21 years, range 19-24 years) and two groups of elderly speakers (10 males, 10 females; mean age 67 years, range 60-84 years). Each subject was judged by a speech pathologist to be a speaker of unimpaired standard Korean. The microphone was positioned 2 cm from the speaker's lips, and each speaker sustained the five vowels for 5 s. Formant frequency measures were obtained from linear predictive coding analysis on a CSL model 4300B (Kay Co.). Results: Repeated-measures ANOVA procedures were completed on the F1 and F2 data for the male and female speakers. F2 frequency levels proved to be significantly lower for elderly speakers. Conclusions: We presume that the F2 vocal cavity (from the point of tongue constriction to the lips) lengthens in elderly speakers. Research designed to observe dynamic speech production more directly will be needed.
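The formant measures here come from LPC analysis. A common root-solving sketch of that idea (not the authors' CSL setup; the order and windowing are illustrative) is to take the angles of the LPC polynomial roots as formant frequencies:

```python
# Estimate F1/F2 of a sustained-vowel frame from the roots of the LPC polynomial.
import numpy as np
import librosa

def formants(frame, sr, order=12):
    a = librosa.lpc(frame * np.hamming(len(frame)), order=order)
    roots = [r for r in np.roots(a) if r.imag > 0]        # one root per conjugate pair
    freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
    return freqs[:2]                                      # F1, F2 estimates in Hz
```

In practice one would also filter roots by bandwidth (root magnitude) to discard spurious poles before picking F1 and F2.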


The Perception of Vowels Synthesized in Vowel Space by F1 and F2: A Study on the Differences between Vowel Perception of Seoul and Kyungnam Dialectal Speakers (F1-F2 모음공간에서 합성된 한국어 모음 지각)

  • Choi, Yang-Gyu;Shin, Hyun-Jung;Kwon, Oh-Seek
    • Speech Sciences / v.1 / pp.201-211 / 1997
  • Acoustically, a naturally spoken vowel is composed of five formants, but the acoustic quality of a vowel is known to be mostly determined by F1 and F2. The main purpose of this study was to examine how vowels synthesized from F1 and F2 are perceived by native Korean speakers. In addition, we were interested in whether the synthesized vowels are perceived differently by standard Korean speakers and Kyungnam regional dialect speakers. In the experiment, 9 Seoul standard Korean speakers and 9 Kyungnam dialect speakers heard 536 vowels synthesized in the F1-by-F2 vowel space and categorized each into one of 10 Korean vowels. The resulting vowel map showed that each Korean vowel occupies a unique area in the two-dimensional F1-by-F2 vowel space, confirming that F1 and F2 play important roles in vowel perception. The results also showed that the Seoul speakers and the Kyungnam speakers perceive the synthesized vowels differently. For example, the /e/ versus /ɛ/ contrast, /y/, and /ø/ were perceived as distinct by the Seoul speakers, whereas they were perceptually confused by the Kyungnam speakers. These results are likely due to the different vowel systems of standard Korean and the Kyungnam regional dialect: the latter uses a six-vowel system with no /e/ vs /ɛ/ contrast, no /ʌ/ vs /ɯ/ contrast, and no /y/ or /ø/, while the former recognizes all of these as distinct vowels. This suggests that the vowel system of a speaker's dialect constrains the perception of Korean vowels. Unexpectedly, /i/ did not occupy any area in the vowel space, suggesting that /i/ cannot be synthesized without F3.
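A vowel stimulus defined only by F1 and F2 can be sketched as an impulse-train glottal source passed through a cascade of two-pole resonators. This is a generic formant-synthesis sketch, not the study's stimulus generator; f0, bandwidths, and durations are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lfilter

def resonator(x, freq, bw, sr):
    r = np.exp(-np.pi * bw / sr)
    theta = 2 * np.pi * freq / sr
    b0 = 1 - 2 * r * np.cos(theta) + r * r          # unity gain at DC
    return lfilter([b0], [1, -2 * r * np.cos(theta), r * r], x)

def synth_vowel(f1, f2, f0=120, sr=16000, dur=0.5):
    n = int(sr * dur)
    src = np.zeros(n)
    src[:: sr // f0] = 1.0                           # impulse train at ~f0
    out = resonator(resonator(src, f1, 90, sr), f2, 110, sr)
    return out / np.abs(out).max()

# e.g. synth_vowel(750, 1250) gives a roughly /a/-like quality;
# sweeping (f1, f2) over a grid reproduces the kind of stimulus space used here.
```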


Comparative Analysis on Pronunciation Contents in Korean Integrated Textbooks (한국어 통합 교재에 나타난 발음 내용의 비교 분석)

  • Park, Eunha
    • The Journal of the Korea Contents Association / v.18 no.4 / pp.268-278 / 2018
  • The purpose of this study is to compare and analyze phonetic items such as the phonemic system, phonological rules, and pronunciation descriptions and notations incorporated in the textbooks. Based on the analysis results, we point out problems related to pronunciation education and suggest directions for improvement. First, the presentation order of consonants and vowels in the phonological-system sections differed across textbooks. We recommend that a standard for consonant and vowel presentation order be prepared, but that this standard take into consideration the specific purpose of the textbook, the learning strategies and goals, and the feasibility of teaching and learning. Second, as with the phonemic systems, the presentation order of phonological rules differed for each textbook. To create a standard order for phonological rules, we must standardize the order of presentation and determine which rules should be presented. Furthermore, when describing phonological rules, the content should be stated in common, essential terms as far as possible, without jargon. Third, in other matters of pronunciation, there were problems such as inadequate pronunciation examples and a lack of exercises. Regarding this, we propose providing sentences or dialogues as pronunciation examples, and linking them to various activities and other language functions for pronunciation practice.

Speech Animation Synthesis based on a Korean Co-articulation Model (한국어 동시조음 모델에 기반한 스피치 애니메이션 생성)

  • Jang, Minjung;Jung, Sunjin;Noh, Junyong
    • Journal of the Korea Computer Graphics Society / v.26 no.3 / pp.49-59 / 2020
  • In this paper, we propose a speech animation synthesis method specialized for Korean, based on a rule-based co-articulation model. Speech animation is widely used in cultural industries such as movies, animations, and games that require natural and realistic motion. However, because techniques for audio-driven speech animation have mainly been developed for English, the animation results for domestic content are often visually very unnatural. For example, dubbing by a voice actor is played with no mouth motion at all, or at best with an unsynchronized loop of simple mouth shapes. Although language-independent speech animation models exist, they are not specialized for Korean and do not yet ensure the quality needed for domestic content production. Therefore, we propose a natural speech animation synthesis method, driven by input audio and text, that reflects the linguistic characteristics of Korean. Reflecting the fact that vowels mostly determine the mouth shape in Korean, we define a co-articulation model that separates the lips and the tongue, which resolves the previous problems of lip distortion and the occasional loss of phoneme characteristics. Our model also reflects differences in prosodic features for improved dynamics in the speech animation. Through user studies, we verify that the proposed model can synthesize natural speech animation.
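The lips/tongue separation can be illustrated with a toy keyframe generator in which vowels drive the lip channel and consonants drive only the tongue channel, so the vowel's lip shape persists through neighboring consonants. The shape labels and rules below are invented for illustration; the paper's actual co-articulation model is far more elaborate.

```python
# Toy two-channel (lips, tongue) keyframing for a jamo sequence.
LIP_SHAPE = {"ㅏ": "open", "ㅣ": "spread", "ㅜ": "round", "ㅗ": "round",
             "ㅓ": "open", "ㅡ": "spread", "ㅐ": "open", "ㅔ": "spread"}
TONGUE_POSE = {"ㄴ": "tip-up", "ㄷ": "tip-up", "ㄹ": "tip-up",
               "ㄱ": "back-up", "ㅅ": "groove", "ㅁ": None, "ㅂ": None}

def keyframes(jamo_seq):
    """One (lips, tongue) keyframe per jamo; lip shape persists across consonants."""
    frames, lips = [], "neutral"
    for j in jamo_seq:
        if j in LIP_SHAPE:                  # vowels set the lip channel
            lips = LIP_SHAPE[j]
            frames.append((lips, "neutral"))
        else:                               # consonants move the tongue only
            frames.append((lips, TONGUE_POSE.get(j)))
    return frames

# keyframes(["ㅁ", "ㅏ", "ㄴ"]) keeps the /a/ lip opening through the final
# nasal instead of snapping the mouth to a new shape per phoneme.
```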

The Error Pattern Analysis of the HMM-Based Automatic Phoneme Segmentation (HMM기반 자동음소분할기의 음소분할 오류 유형 분석)

  • Kim Min-Je;Lee Jung-Chul;Kim Jong-Jin
    • The Journal of the Acoustical Society of Korea / v.25 no.5 / pp.213-221 / 2006
  • Phone segmentation of the speech waveform is especially important for concatenative text-to-speech synthesis, which constructs synthesis units from segmented corpora, because the quality of the synthesized speech depends critically on the accuracy of the segmentation. In the beginning, phone segmentation was performed manually, which required huge effort and long turnaround times. HMM-based approaches adopted from automatic speech recognition are now most widely used for automatic segmentation in speech synthesis, providing a consistent and accurate phone labeling scheme. Even though the HMM-based approach has been successful, it may locate a phone boundary at a position different from the expected one. In this paper, we categorized adjacent phoneme pairs, analyzed the mismatches between hand-labeled transcriptions and HMM-based labels, and describe the dominant error patterns that must be improved for speech synthesis. For the experiment, the hand-labeled standard Korean speech DB from ETRI was used as the reference. A time difference larger than 20 ms between the hand-labeled phoneme boundary and the auto-aligned boundary is treated as an automatic segmentation error. Our experimental results for a female speaker revealed that plosive-vowel, affricate-vowel, and vowel-liquid pairs showed high accuracies of 99%, 99.5%, and 99%, respectively, but stop-nasal, stop-liquid, and nasal-liquid pairs showed very low accuracies of 45%, 50%, and 55%. The results for a male speaker revealed a similar tendency.
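A minimal sketch of the evaluation described here, assuming the reference and auto-aligned boundaries are already paired one-to-one: a boundary counts as correct within a 20 ms tolerance, tallied per adjacent-phoneme-pair category.

```python
# Per-category boundary accuracy with a fixed time tolerance.
from collections import defaultdict

def boundary_accuracy(ref, auto, tol=0.020):
    """ref/auto: aligned lists of (left_phone, right_phone, boundary_time_s)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for (l, r, t_ref), (_, _, t_auto) in zip(ref, auto):
        pair = f"{l}-{r}"
        totals[pair] += 1
        if abs(t_ref - t_auto) <= tol:      # within 20 ms of the hand label
            hits[pair] += 1
    return {p: hits[p] / totals[p] for p in totals}
```

Run over hand labels and HMM alignments, this would produce the kind of per-category accuracies quoted above (e.g. the low scores at stop-nasal and nasal-liquid boundaries).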

Visualization of Korean Speech Based on the Distance of Acoustic Features (음성특징의 거리에 기반한 한국어 발음의 시각화)

  • Pok, Gou-Chol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.3 / pp.197-205 / 2020
  • Korean has the characteristic that the pronunciation of phoneme units such as vowels and consonants is fixed and the pronunciation associated with a written form does not change, so foreign learners can approach the language rather easily. However, when words, phrases, or sentences are pronounced, the pronunciation changes in widely varying and complex ways at syllable boundaries, and the association between notation and pronunciation no longer holds. Consequently, it is very difficult for foreign learners to master standard Korean pronunciation. Despite these difficulties, systematic analysis of pronunciation errors in Korean words is believed to be possible, based on the advantageous observation that, unlike in other languages including English, the relationship between Korean notation and pronunciation can be described by a set of firm rules without exceptions. In this paper, we propose a visualization framework that shows the differences between standard pronunciations and erroneous ones as quantitative measures on the computer screen. Previous research only shows color representations and 3D graphics of speech properties, or an animated view of the changing shapes of the lips and mouth cavity; moreover, the features used in such analyses are only point data, such as the average over a speech range. In this study, we propose a method that can directly use the time-series data instead of summary or distorted data. This was realized with a deep-learning-based technique that combines a self-organizing map, a variational autoencoder model, and a Markov model, and we achieved a substantial performance improvement over the point-data-based method.
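The paper's SOM + variational autoencoder + Markov pipeline is too large to reproduce here. The sketch below only illustrates the underlying idea of scoring a learner's utterance against a standard one directly on time-series features, using plain DTW over MFCC frames as an admittedly simpler stand-in:

```python
# Length-normalized DTW cost between the MFCC sequences of two utterances.
import numpy as np
import librosa

def pronunciation_distance(wav_learner, wav_standard, sr=16000):
    a = librosa.feature.mfcc(y=wav_learner, sr=sr, n_mfcc=13)
    b = librosa.feature.mfcc(y=wav_standard, sr=sr, n_mfcc=13)
    D, _ = librosa.sequence.dtw(X=a, Y=b, metric="euclidean")
    return D[-1, -1] / (a.shape[1] + b.shape[1])   # normalize by total frames
```

Per-frame costs along the DTW path could then be plotted against time to show the learner where the pronunciation diverges, which is the visualization goal described above.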

The syllable recovery rule-based system and the application of a morphological analysis method for the post-processing of continuous speech recognition (연속음성인식 후처리를 위한 음절 복원 rule-based 시스템과 형태소분석기법의 적용)

  • 박미성;김미진;김계성;최재혁;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.3 / pp.47-56 / 1999
  • Various phonological alterations occur in continuously spoken Korean, and these alterations are one of the major reasons why Korean speech recognition is difficult. This paper presents a rule-based system that converts a speech recognition character string into a text-based character string. The recovery results are morphologically analyzed, and only correct text strings are generated. Recovery is executed according to four kinds of rules: a syllable-boundary final-consonant/initial-consonant recovery rule, a vowel-process recovery rule, a last-syllable final-consonant recovery rule, and a monosyllable processing rule. We use x-clustering information for efficient recovery, and postfix-syllable frequency information to restrict the recovery candidates passed to the morphological analyzer. Because this system is rule-based, it does not require a large pronunciation dictionary or a phoneme dictionary, and it can use an existing text-based morphological analyzer.
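The recover-then-verify flow can be illustrated with a toy sketch: each surface (pronounced) form is mapped back to candidate written forms, and a morphological check keeps only the plausible ones. The rules and the checker below are invented stand-ins for the paper's four rule classes and its analyzer.

```python
# Toy rule-based recovery: surface form -> candidate spellings -> verification.
RECOVERY_RULES = {
    "가치": ["가치", "같이"],   # palatalization: 같이 is pronounced [가치]
    "구지": ["구지", "굳이"],   # 굳이 is pronounced [구지]
}

def recover(surface, is_word):
    candidates = RECOVERY_RULES.get(surface, [surface])
    return [c for c in candidates if is_word(c)]

lexicon = {"같이", "굳이", "가치"}              # stand-in morphological analyzer
print(recover("가치", lexicon.__contains__))    # -> ['가치', '같이']
```

In the actual system, frequency information prunes the candidate set before the morphological analyzer is consulted, keeping the rule-based search tractable.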
