• Title/Abstract/Keyword: Syllable Model

Search results: 77 items

Profane or Not: Improving Korean Profane Detection using Deep Learning

  • Woo, Jiyoung; Park, Sung Hee; Kim, Huy Kang
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 16, No. 1, pp. 305-318, 2022
  • Abusive behavior has become a common issue on many online social media platforms, and profanity is one of its most common forms. Platforms operate filtering systems based on lists of popular profane words, but this method has two drawbacks: it can be bypassed with altered word forms, and it can flag normal sentences as profanity. Korean is especially challenging because a syllable is composed of graphemes and words of multiple syllables: a word can be decomposed into graphemes without impairing the transmission of meaning, and the surface form of a profane word can carry a different meaning in a sentence. This work focuses on the problem of filtering systems misclassifying normal phrases as profane. To address it, we propose a deep learning-based framework combining grapheme- and syllable-separation-based word embedding with an appropriate CNN structure. The proposed model was evaluated on chat contents from one of the most popular online games in South Korea and achieved 90.4% accuracy.
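The grapheme separation the abstract relies on can be illustrated with the standard Unicode decomposition of precomposed Hangul syllables into jamo. This is a generic sketch of that decomposition, not the paper's own preprocessing code:

```python
# Decompose precomposed Hangul syllables (U+AC00-U+D7A3) into graphemes (jamo).
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")                 # 19 initials
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")            # 21 medials
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

def to_graphemes(text: str) -> list[str]:
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code <= 11171:           # inside the precomposed syllable block
            cho, rest = divmod(code, 588)
            jung, jong = divmod(rest, 28)
            out += [CHO[cho], JUNG[jung]] + ([JONG[jong]] if jong else [])
        else:                            # pass non-syllable characters through
            out.append(ch)
    return out

print(to_graphemes("욕설"))  # → ['ㅇ', 'ㅛ', 'ㄱ', 'ㅅ', 'ㅓ', 'ㄹ']
```

Profanity written in altered forms often still matches at the grapheme level, which is why embeddings built over these units can catch what a word-list filter misses.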

Variational autoencoder for prosody-based speaker recognition

  • Starlet Ben Alex; Leena Mary
    • ETRI Journal, Vol. 45, No. 4, pp. 678-689, 2023
  • This paper describes a novel end-to-end deep generative model-based speaker recognition system using prosodic features. The usefulness of variational autoencoders (VAE) in learning the speaker-specific prosody representations for the speaker recognition task is examined herein for the first time. The speech signal is first automatically segmented into syllable-like units using vowel onset points (VOP) and energy valleys. Prosodic features, such as the dynamics of duration, energy, and fundamental frequency (F0), are then extracted at the syllable level and used to train/adapt a speaker-dependent VAE from a universal VAE. The initial comparative studies on VAEs and traditional autoencoders (AE) suggest that the former can efficiently learn speaker representations. Investigations on the impact of gender information in speaker recognition also point out that gender-dependent impostor banks lead to higher accuracies. Finally, the evaluation on the NIST SRE 2010 dataset demonstrates the usefulness of the proposed approach for speaker recognition.
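The VAE objective that underlies this approach combines a reconstruction term with a KL divergence pulling the approximate posterior toward a standard normal prior; for a diagonal Gaussian the KL term has a well-known closed form. A minimal sketch of that objective (not the paper's implementation):

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: squared-error reconstruction term plus KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + gaussian_kl(mu, logvar)

# A posterior that matches the prior exactly contributes zero KL.
assert gaussian_kl([0.0, 0.0], [0.0, 0.0]) == 0.0
```

Adapting a speaker-dependent VAE from a universal one, as the abstract describes, amounts to continuing to minimize this loss on one speaker's syllable-level prosodic features.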

한-일 수화 영상통신을 위한 3차원 모델 (3D Model for Korean-Japanese Sign Language Image Communication)

  • 신성효; 김상운
    • Proceedings of the 1998 Summer Conference of the Institute of Electronics Engineers of Korea (IEEK), pp. 929-932, 1998
  • In this paper we propose a method of representing emotional expressions and lip shapes for sign language communication using a 3-dimensional model. First, we employ the action units (AU) of the facial action coding system (FACS) to display facial expressions. Then we define 11 basic lip shapes and the sounding time of each component in a syllable in order to synthesize lip shapes more precisely for Korean characters. Experimental results show that the proposed method can be used efficiently for sign language image communication between different languages.
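Scheduling lip-shape keyframes from per-component sounding times, as the abstract describes, can be sketched as a simple onset-accumulation loop. The shape names and durations below are hypothetical placeholders, not the paper's 11 shapes or measured times:

```python
# Sketch: turn a syllable's (lip_shape, sounding_time) components into
# display keyframes. Shape ids and durations here are invented examples.
def lip_keyframes(components, start=0.0):
    """components: list of (lip_shape_id, sounding_time_sec) pairs.
    Returns ([(onset_sec, shape_id), ...], total_duration_sec)."""
    t, frames = start, []
    for shape, dur in components:
        frames.append((round(t, 3), shape))   # shape becomes visible at onset t
        t += dur
    return frames, round(t, 3)

frames, total = lip_keyframes([("open", 0.08), ("spread", 0.12), ("closed", 0.10)])
print(frames)  # → [(0.0, 'open'), (0.08, 'spread'), (0.2, 'closed')]
```

A renderer would interpolate the 3D model's mouth vertices between consecutive keyframes.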


경상방언 대학생들이 발음한 국어 한자어 장단음 분석 (An Analysis of Short and Long Syllables of Sino-Korean Words Produced by College Students with Kyungsang Dialect)

  • 양병곤
    • Phonetics and Speech Sciences, Vol. 7, No. 4, pp. 131-138, 2015
  • The initial syllables of a pair of Sino-Korean words are generally differentiated in meaning by either short or long duration. They are realized differently depending on the dialect and generation of the speakers, and recent research has reported that the temporal distinction has gradually faded away. The aim of this study is to examine, using mixed-effects models, whether college students with Kyungsang dialect make the distinction temporally. Thirty students participated in the recording of five pairs of Korean words in clear or casual speaking styles. The author then measured the durations of the initial syllables, made a descriptive analysis of the data, and fitted mixed-effects models with gender, length, and style as fixed effects and subject and syllable as random effects, testing their effects on the initial-syllable durations. Results showed that college students with Kyungsang dialect did not produce the long and short syllables distinctively, with no statistically significant difference between them. Secondly, there was a significant difference in initial-syllable duration between male and female students. Thirdly, there was also a significant difference in initial-syllable duration between the clear and casual styles. The author concluded that college students with Kyungsang dialect do not produce long and short Sino-Korean syllables distinctively, and that any statistical analysis of the temporal aspect should be made carefully, considering both fixed and random effects. Further studies would be desirable to examine production and perception of the initial syllables by speakers of various dialects, generations, and age groups.
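The fixed-effect structure of the design (gender × length × style) can be sketched as grouping measured durations into cells; a real mixed model would additionally fit subject and syllable random effects, which this toy omits. Data values below are invented:

```python
from collections import defaultdict

# Hypothetical measurements: (gender, length, style, subject, duration_ms).
data = [
    ("M", "long",  "clear",  "s01", 182.0),
    ("M", "short", "clear",  "s01", 175.0),
    ("F", "long",  "casual", "s02", 160.0),
    ("F", "short", "casual", "s02", 158.0),
]

def cell_means(rows, keys=("gender", "length", "style")):
    """Mean initial-syllable duration per fixed-effect cell."""
    idx = {"gender": 0, "length": 1, "style": 2}
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[idx[k]] for k in keys)].append(row[4])
    return {cell: sum(v) / len(v) for cell, v in groups.items()}

print(cell_means(data, keys=("length",)))  # → {('long',): 171.0, ('short',): 166.5}
```

The study's caution applies here too: raw cell means can suggest a long/short gap that disappears once between-subject and between-syllable variation is modeled as random effects.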

HMM 부모델을 이용한 단어 인식에 관한 연구 (A Study on Word Recognition using sub-model based Hidden Markov Model)

  • 신원호
    • Proceedings of the 11th Workshop on Speech Communication and Signal Processing, Acoustical Society of Korea (SCAS Vol. 11, No. 1), pp. 395-398, 1994
  • In this paper, word recognition using a sub-model-based hidden Markov model was studied. Phoneme models were composed of 61 phonemes reflecting the pronunciation characteristics of the Korean language. Using these, word models were made by serial concatenation. However, with this kind of phoneme concatenation, the second and third phonemes of a syllable overlap in distribution at the same time. Considering this, a method that combines the second and third phonemes into one model was proposed. To prevent an increase in the number of models, similar phonemes were merged into one, and finally 57 models were created. In the experiments, a proper model structure for the sub-models was searched for and recognition results were compared. Similar recognition results were obtained, and overall recognition rates increased when a parameter-tying method was used.
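The serial concatenation step can be sketched by splicing left-to-right phoneme state chains into a single word-level chain. The phoneme inventory and state labels below are invented; a real system would also splice transition and emission parameters, and tie parameters across merged models:

```python
# Toy phoneme HMMs represented only by their state-label chains.
PHONE_MODELS = {
    "k": ["k1", "k2"],
    "a": ["a1", "a2", "a3"],
    "m": ["m1", "m2"],
}

def word_model(phoneme_seq, models=PHONE_MODELS):
    """Concatenate phoneme state chains into one word-level state chain."""
    states = []
    for p in phoneme_seq:
        states.extend(models[p])
    return states

print(word_model(["k", "a", "m"]))
# → ['k1', 'k2', 'a1', 'a2', 'a3', 'm1', 'm2']
```

The paper's fix for the overlapping second/third phoneme corresponds to replacing two adjacent entries in such a sequence with a single merged model before concatenation.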


회귀신경망을 이용한 음성인식에 관한 연구 (A Study on Speech Recognition using Recurrent Neural Networks)

  • 한학용; 김주성; 허강인
    • The Journal of the Acoustical Society of Korea, Vol. 18, No. 3, pp. 62-67, 1999
  • This paper is a study on speech recognition using recurrent neural networks. Speech is modeled at the syllable level with predictive neural networks, and for an unknown input utterance the model with the minimum prediction error is taken as the recognition result. To absorb the time-varying nature of speech into the network itself, recurrent predictive neural networks with a dynamic, recurrent structure were constructed, and recognition performance was compared between the recurrent structures proposed by Elman and Jordan. The speech database used was ETRI's Saemdori (샘돌이) speech data. To find the optimal network model, we compared recognition rates while varying the prediction order and the number of hidden-layer units, and also examined the case in which an autoregressive coefficient was placed in the context layer so that previous values accumulate there. Experimental results showed that the optimal prediction order, number of hidden units, and autoregressive coefficient differed depending on the network structure; overall, the Jordan network achieved higher recognition rates than the Elman network, and the effect of the autoregressive coefficient varied irregularly with the network structure and coefficient value.
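The recognition rule described above, pick the syllable model with the smallest prediction error, can be sketched with a toy one-tap linear predictor standing in for each trained network (the real models predict frame vectors, and the coefficients here are invented):

```python
def prediction_error(model, frames):
    """Sum of squared errors between each frame and the model's one-step
    prediction of it from the previous frame (toy one-tap linear predictor)."""
    a, b = model                                   # hypothetical coefficients
    return sum((cur - (a * prev + b)) ** 2
               for prev, cur in zip(frames, frames[1:]))

def recognize(models, frames):
    """Return the syllable label whose predictive model fits the input best."""
    return min(models, key=lambda name: prediction_error(models[name], frames))

models = {"ka": (1.0, 0.0), "na": (0.5, 0.2)}      # toy syllable models
print(recognize(models, [0.2, 0.2, 0.2]))  # → ka
```

In the Elman and Jordan variants the predictor is a recurrent network whose context layer (fed from the hidden or output layer, respectively) supplies the temporal memory this toy lacks.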


식도발성 남성 발화의 말 속도 (Speech Rates of Male Esophageal Speech)

  • 박원경; 심희정; 고도흥
    • Phonetics and Speech Sciences, Vol. 4, No. 3, pp. 143-149, 2012
  • The purpose of this study is to investigate the speech rate of an esophageal speech group capable of vocalization after surgery. The subjects were 10 male esophageal speakers and 10 male laryngeal speakers. Each group read a passage that was recorded with a DAT recorder (Roland EDIROL R-09), and the recordings were analyzed using CSL (Computerized Speech Lab, model 4150). The results were as follows: (1) the overall speech rate of esophageal speech was 2.50 syllables per second (SPS), while that of laryngeal speech was 4.23 SPS; (2) the articulatory rate of esophageal speech was 3.14 SPS, while that of laryngeal speech was 4.75 SPS. Both the speech rates and the articulatory rates of esophageal speech were significantly lower than those of laryngeal speech. These differences between the two groups may be due to the reduced efficiency of airflow across the pharyngeal-esophageal segment in esophageal speakers compared with airflow through the glottis in laryngeal speakers. These results would provide a guideline on speech rates for esophageal speakers in clinical settings.
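The two measures differ only in whether pause time stays in the denominator; a minimal sketch with invented numbers:

```python
def speech_rate(n_syllables, total_sec):
    """Overall speech rate in syllables per second (pauses included)."""
    return n_syllables / total_sec

def articulatory_rate(n_syllables, total_sec, pause_sec):
    """Articulatory rate: pause time excluded from the denominator."""
    return n_syllables / (total_sec - pause_sec)

# Hypothetical passage: 100 syllables read in 40 s, of which 8 s are pauses.
print(round(speech_rate(100, 40.0), 2))              # → 2.5
print(round(articulatory_rate(100, 40.0, 8.0), 2))   # → 3.12
```

This is why, in the study, each group's articulatory rate exceeds its overall speech rate: removing pauses shrinks the denominator.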

Language Specific Variations of Domain-initial Strengthening and its Implications on the Phonology-Phonetics Interface: with Particular Reference to English and Hamkyeong Korean

  • Kim, Sung-A
    • Speech Sciences, Vol. 11, No. 3, pp. 7-21, 2004
  • The present study aims to investigate the domain-initial strengthening phenomenon, which refers to the strengthening of articulatory gestures at the initial positions of prosodic domains. More specifically, this paper presents the results of an experimental study of vowels in initial syllables with onset consonants (initial-syllable vowels henceforth) across various prosodic domains in English and Hamkyeong Korean, a pitch-accent dialect spoken in the northern part of North Korea. The durations of initial-syllable vowels are compared with those of second vowels in real-word tokens for both languages, controlling for both stress and segmental environment. Hamkyeong Korean, like English, turned out to strengthen domain-initial consonants. With regard to vowel durations, no significant prosodic effect was found in English; Hamkyeong Korean, on the other hand, showed significant differences between the durations of initial and non-initial vowels in the higher prosodic domains. The theoretical implications of the findings are as follows: the potentially universal phenomenon of initial strengthening is shown to be subject to language-specific variation in its implementation. More importantly, the distinct phonetics-phonology model (Pierrehumbert & Beckman, 1998; Keating, 1990; Cohn, 1993) is better equipped to account for the facts in the present study.


TMS320C30을 이용한 소규모 Voice Dialing 시스템 (The small scale Voice Dialing System using TMS320C30)

  • 이항섭
    • Proceedings of the 1991 Annual Conference of the Acoustical Society of Korea, pp. 58-63, 1991
  • This paper describes the development of a small-scale voice dialing system using the TMS320C30. The recognition vocabulary consists of 50 department names within a university. For vocabularies below medium size, word-unit recognition is more practical than phoneme-unit or syllable-unit recognition. In this paper, we performed recognition and model generation using DMS (Dynamic Multi-Section) and implemented the voice dialing system on the TMS320C30. As a result, we achieved a 98% recognition rate with 22 sections and a weight of 0.6, and recognition took 4 seconds.
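The DMS idea, representing a word as a fixed number of averaged feature sections so that utterances of different lengths become directly comparable, can be sketched as follows (scalar features for brevity; a real system averages cepstral vectors and applies the weighting the abstract mentions):

```python
def dms_model(frames, n_sections=4):
    """Average the frame features within n_sections contiguous sections."""
    n = len(frames)
    model = []
    for s in range(n_sections):
        lo, hi = s * n // n_sections, (s + 1) * n // n_sections
        chunk = frames[lo:hi] or [frames[min(lo, n - 1)]]  # guard short inputs
        model.append(sum(chunk) / len(chunk))
    return model

def dms_distance(model_a, model_b):
    """Section-wise squared distance between two DMS models."""
    return sum((a - b) ** 2 for a, b in zip(model_a, model_b))

m = dms_model([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0], n_sections=4)
print(m)  # → [1.0, 2.0, 3.0, 4.0]
```

Recognition then reduces to computing `dms_distance` between the input's model and each of the 50 stored word models and picking the minimum; the paper's best operating point used 22 sections.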


Brain-Operated Typewriter using the Language Prediction Model

  • Lee, Sae-Byeok; Lim, Heui-Seok
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 5, No. 10, pp. 1770-1782, 2011
  • A brain-computer interface (BCI) is a communication system that translates brain activity into commands for computers or other devices. In other words, BCIs create a new communication channel between the brain and an output device by bypassing conventional motor output pathways consisting of nerves and muscles. This is particularly useful for facilitating communication for people suffering from paralysis. Due to the low bit rate, however, translating brain activity into commands takes considerable time, and entering characters with BCI-based typewriters is especially slow. In this paper, we propose a brain-operated typewriter accelerated by a language prediction model. The proposed system uses three strategies to improve entry speed: word completion, next-syllable prediction, and next-word prediction. In our demonstration, the entry speed of the BCI-based typewriter roughly doubled when the language prediction model was used.
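Two of the three strategies, word completion and next-word prediction, can be sketched with a toy frequency model; the next-syllable step for Korean is analogous at the syllable level. The corpus and counts below are invented, not the paper's language model:

```python
from collections import Counter

# Toy frequency model over a tiny corpus; real systems train on large text.
corpus = "the brain computer interface translates brain activity".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def complete(prefix):
    """Word completion: most frequent known word starting with `prefix`."""
    cands = [w for w in unigrams if w.startswith(prefix)]
    return max(cands, key=lambda w: unigrams[w]) if cands else None

def next_word(prev):
    """Next-word prediction: most likely follower under the bigram counts."""
    cands = {b: c for (a, b), c in bigrams.items() if a == prev}
    return max(cands, key=cands.get) if cands else None

print(complete("br"))  # → brain
```

Each accepted prediction replaces several slow BCI selections with one, which is how the demonstrated entry speed roughly doubled.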