• Title/Summary/Keyword: Phonetics


Phoneme Recognition Using Frequency State Neural Network (주파수 상태 신경 회로망을 이용한 음소 인식)

  • Lee, Jun-Mo;Hwang, Yeong-Soo;Kim, Seong-Jong;Shin, In-Chul
    • The Journal of the Acoustical Society of Korea / v.13 no.4 / pp.12-19 / 1994
  • This paper reports a new structure for a phoneme recognition neural network. The proposed network handles the structure of the frequency bands as well as the temporal structure of phonemic features that is used in the conventional TSNN. We trained the network on the phonemes (아, 이, 오, ㅅ, ㅊ, ㅍ, ㄱ, ㅇ, ㄹ, ㅁ), and its phoneme recognition performance was slightly better than that of the conventional TDNN and TSNN, which use only the temporal structure of phonemic features.


A Phonetics Based Design of PLU Sets for Korean Speech Recognition (한국어 음성인식을 위한 음성학 기반의 유사음소단위 집합 설계)

  • Hong, Hye-Jin;Kim, Sun-Hee;Chung, Min-Hwa
    • MALSORI / no.65 / pp.105-124 / 2008
  • This paper examines the effects of different phone-like-unit (PLU) sets in order to propose an optimal PLU set for improving the performance of Korean automatic speech recognition (ASR) systems. An examination of 9 currently used PLU sets indicates that most of them include a selection of allophones without a sufficient phonetic basis. In this paper, a total of 34 PLU sets are designed based on Korean phonetic characteristics, and the effects of each PLU set are evaluated through experiments. The results show that the accuracy rate of each phone is influenced by the different phonetic constraint(s) that determine(s) the PLU sets, and that an optimal PLU set can be anticipated through phonetic analysis of the given speech data.


The Phonetics and Phonology of English Schwa

  • Ahn, Soo-Woong
    • Korean Journal of English Language and Linguistics / v.1 no.2 / pp.311-329 / 2001
  • This paper tests the reality of English schwa by phonetic and phonological methods. Phonetically, it looks for acoustic evidence of a relationship between full vowels and their reduced vowels in unstressed positions. Phonologically, it aims to show how systematic the schwa sound is within a constraint-based grammar. The schwa phenomenon in English was supported both phonetically and phonologically. In the phonetic analysis, no relationship was found between the F1 and F2 distributions of the full vowels and those of their reduced vowels in the unstressed syllables of the derived words. The reduced vowels tended to converge on a target of F1 516 and F2 1815, which supports the view that schwa sounds have a target. On the phonological side, the constraint-based tableau produced the successful output using FAITH (V), (equation omitted)V, FAITH V[-BACK+HiC], V[-Low, -TNS]#, and REDUCE V[-STR, -TNS] as constraints. No ranking was found: any violation of the constraints eliminated a candidate.


Improvement of convergence speed in FDICA algorithm with weighted inner product constraint of unmixing matrix (분리행렬의 가중 내적 제한조건을 이용한 FDICA 알고리즘의 수렴속도 향상)

  • Quan, Xingri;Bae, Keunsung
    • Phonetics and Speech Sciences / v.7 no.4 / pp.17-25 / 2015
  • For blind source separation of convolutive mixtures, FDICA (Frequency Domain Independent Component Analysis) algorithms are generally used. Since FDICA algorithms such as Sawada FDICA and IVA (Independent Vector Analysis) work on a per-frequency-bin basis with a natural gradient descent method, they take a long time to converge. In this paper, we propose a new method to improve the convergence speed of FDICA algorithms. The proposed method drastically reduces the number of iterations in the natural gradient descent process by applying a weighted inner product constraint to the unmixing matrix. Experimental results show that the proposed method achieves a large improvement in convergence speed without degrading the separation performance of the baseline algorithms.

A Speech Waveform Forgery Detection Algorithm Based on Frequency Distribution Analysis (음성 주파수 분포 분석을 통한 편집 의심 지점 검출 방법)

  • Heo, Hee-Soo;So, Byung-Min;Yang, IL-Ho;Yu, Ha-Jin
    • Phonetics and Speech Sciences / v.7 no.4 / pp.35-40 / 2015
  • We propose a speech waveform forgery detection algorithm based on the flatness of the frequency distribution. We devise a new measure of flatness that emphasizes local changes in the frequency distribution. Our measure calculates the sum of the differences between the energies of neighboring frequency bands. We compare the proposed measure with conventional flatness measures using a large set of test sounds. We also compare the proposed method with conventional detection algorithms based on spectral distances. The results show that the proposed method gives a lower equal error rate on the test set than the conventional methods.
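
The measure described in this abstract, a sum of differences between the energies of neighboring frequency bands, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `local_change_measure`, the use of absolute differences, and the normalization by total energy are all assumptions, since the abstract gives only the general idea.

```python
def local_change_measure(band_energies):
    """Sum of absolute energy differences between neighboring bands.

    Assumptions (not stated in the abstract): differences are taken in
    absolute value, and the sum is divided by the total energy so that
    the result does not depend on the overall signal level.
    """
    total = sum(band_energies)
    if total == 0:
        return 0.0  # silent frame: no spectral change to measure
    diffs = sum(abs(b - a) for a, b in zip(band_energies, band_energies[1:]))
    return diffs / total


# A perfectly flat spectrum has no change between neighboring bands.
flat = local_change_measure([1.0, 1.0, 1.0, 1.0])
# Energy concentrated in a single band gives a large local change.
peaky = local_change_measure([0.0, 4.0, 0.0, 0.0])
print(flat, peaky)
```

Under these assumptions, a lower value indicates a flatter frequency distribution, so an abrupt change of the value over time would be one possible cue for an edited segment.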

Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model (Deep Neural Network 언어모델을 위한 Continuous Word Vector 기반의 입력 차원 감소)

  • Kim, Kwang-Ho;Lee, Donghyun;Lim, Minkyu;Kim, Ji-Hwan
    • Phonetics and Speech Sciences / v.7 no.4 / pp.3-8 / 2015
  • In this paper, we investigate an input dimension reduction method that uses continuous word vectors in a deep neural network language model. In the proposed method, continuous word vectors were generated with Google's Word2Vec from a large training corpus so as to satisfy the distributional hypothesis. Discrete word vectors in 1-of-$|V|$ coding were replaced with their corresponding continuous word vectors. In our implementation, the input dimension was successfully reduced from 20,000 to 600 when a tri-gram language model is used with a vocabulary of 20,000 words. The total training time was reduced from 30 days to 14 days for the Wall Street Journal training corpus (corpus length: 37M words).
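
The replacement of 1-of-$|V|$ vectors by continuous word vectors can be illustrated with a toy example. Everything below is hypothetical: the five-word vocabulary, the 3-dimensional vectors in `EMBEDDINGS`, and the function names are invented for illustration, and real vectors would come from a trained Word2Vec model rather than a hand-written table. The sketch only shows that the input size then depends on the vector dimension instead of the vocabulary size.

```python
# Hypothetical toy vocabulary and 3-dimensional word vectors.
VOCAB = ["the", "cat", "sat", "on", "mat"]
EMBEDDINGS = {
    "the": [0.1, 0.0, 0.2],
    "cat": [0.7, 0.3, 0.1],
    "sat": [0.2, 0.9, 0.4],
    "on": [0.0, 0.1, 0.3],
    "mat": [0.6, 0.2, 0.2],
}


def one_hot_input(history):
    """Concatenate 1-of-|V| vectors for the history words."""
    out = []
    for word in history:
        vec = [0.0] * len(VOCAB)
        vec[VOCAB.index(word)] = 1.0
        out.extend(vec)
    return out


def continuous_input(history):
    """Concatenate continuous word vectors for the same history words."""
    out = []
    for word in history:
        out.extend(EMBEDDINGS[word])
    return out


# A tri-gram model conditions on the two preceding words.
history = ["the", "cat"]
print(len(one_hot_input(history)), len(continuous_input(history)))
```

With two history words, the one-hot input has 2 × 5 entries and the continuous input has 2 × 3, so the saving grows with the vocabulary size.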

Gender difference in the sound change of lexical pitch accents of South Kyungsang Korean

  • Lee, Hyunjung
    • Phonetics and Speech Sciences / v.7 no.4 / pp.123-130 / 2015
  • Given a recent finding that female speakers of South Kyungsang Korean are undergoing a sound change in the lexical pitch accent, this study tested whether the change is also reflected in male speech. This study compared the F0 scaling and timing properties of accent words produced by younger female and male speakers of South Kyungsang Korean. The results indicated clear gender-related differences, with male productions showing more distinct acoustic properties across the accent words than female productions. Despite the better distinction, however, younger male speakers showed peak delay, in which the F0 peaks are located further to the right than in conservative speakers' productions. Therefore, it might be suggested that younger male speakers' accent productions lie between conservative and innovative phonetic forms.

Performance Comparison of Deep Feature Based Speaker Verification Systems (깊은 신경망 특징 기반 화자 검증 시스템의 성능 비교)

  • Kim, Dae Hyun;Seong, Woo Kyeong;Kim, Hong Kook
    • Phonetics and Speech Sciences / v.7 no.4 / pp.9-16 / 2015
  • In this paper, several experiments are performed to compare the performance of speaker verification (SV) systems that use deep neural network (DNN) based features. To this end, input features for a DNN, such as the mel-frequency cepstral coefficient (MFCC), linear-frequency cepstral coefficient (LFCC), and perceptual linear prediction (PLP), are first compared in terms of SV performance. After that, the effect of the DNN training method and the structure of the DNN hidden layers on SV performance is investigated for each type of feature. The performance of an SV system is then evaluated on the basis of the I-vector or probabilistic linear discriminant analysis (PLDA) scoring method. The SV experiments show that a tandem feature combining the DNN bottleneck feature and the MFCC feature gives the best performance when DNNs are configured with rectangular hidden layers and trained with a supervised training method.

A Study on the Vowel Duration of the Buckeye Corpus (벅아이 코퍼스의 모음 길이 연구)

  • Chung, Hyejung;Yoon, Kyuchul
    • Phonetics and Speech Sciences / v.7 no.4 / pp.103-110 / 2015
  • The purpose of this study is to assess vowel properties by examining the duration of the American English vowels found in the Buckeye corpus [6]. The vowel durations were analyzed in terms of various linguistic factors, including the number of syllables in the word containing the vowel, the location of the vowel in the word, the type of stress, function versus content word, the word frequency in the corpus, and the speech rate calculated from three consecutive words. The findings from this work agreed mostly with those from earlier studies, but with some exceptions. The relationship between the speech rate and the vowel duration proved non-linear.

Korean Semantic Similarity Measures for the Vector Space Models

  • Lee, Young-In;Lee, Hyun-jung;Koo, Myoung-Wan;Cho, Sook Whan
    • Phonetics and Speech Sciences / v.7 no.4 / pp.49-55 / 2015
  • It is argued in this paper that, in determining semantic similarity, Korean words should be recategorized with a focus on the semantic relation to ontology in light of cross-linguistic morphological variations. It is proposed, in particular, that Korean semantic similarity should be measured on three tracks: a human judgements track, a relatedness track, and a cross-part-of-speech relations track. As demonstrated in Yang et al. (2015), GloVe, an unsupervised learning model of semantic similarity, is applicable to Korean, with its performance being compared with human judgement results. Based on this compatibility, it was further thought that the model's performance would most likely vary with different kinds of specific relations in different languages. An attempt was made to analyze them in terms of two major Korean-specific categories involved in their lexical and cross-POS relations. It is concluded that languages must be analyzed by varying methods so that semantic components across languages may allow varying semantic distance in the vector space models.
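
For readers unfamiliar with similarity in vector space models, the sketch below shows cosine similarity between two word vectors. The abstract does not name the metric used, so cosine similarity is an assumption here (it is a common choice for GloVe-style vectors), and the function name and the example vectors are purely illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero length, a convention chosen
    for this sketch to avoid division by zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Vectors pointing in the same direction are maximally similar;
# orthogonal vectors have similarity zero.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Scores of this kind are what would typically be compared against human similarity judgements when evaluating a model on a word-pair dataset.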