• Title/Abstract/Keywords: Speech Class

A Time-Domain Parameter Extraction Method for Speech Recognition using the Local Peak-to-Peak Interval Information

  • 임재열;김형일;안수길
    • 전자공학회논문지B
    • /
    • Vol. 31B, No. 2
    • /
    • pp.28-34
    • /
    • 1994
  • In this paper, a new time-domain parameter extraction method for speech recognition is proposed. The method is based on the fact that the local peak-to-peak interval, i.e., the interval between adjacent maxima and minima of the speech waveform, is closely related to the frequency content of the speech signal. The parameterization is achieved by a kind of filter bank technique in the time domain. To test the proposed parameter extraction method, an isolated word recognizer based on Vector Quantization and a Hidden Markov Model was constructed. As test material, 22 words spoken by ten males were used, and a recognition rate of 92.9% was obtained. This result leads to the conclusion that the new parameter extraction method can be used in speech recognition systems. Since the proposed method operates in the time domain, real-time parameter extraction can be implemented on a personal-computer-class machine equipped only with an A/D converter, without any DSP board.
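
The link between peak-to-peak intervals and frequency can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that extrema are found by a sign change in the first difference, that a maximum-to-minimum interval of n samples corresponds to half a period (implied frequency fs / (2n)), and that the "filter bank" is a simple count of intervals per frequency band. The function names and band edges are hypothetical.

```python
import math

def local_extrema(signal):
    """Return indices where the slope changes sign (local maxima and minima)."""
    idx = []
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]
        right = signal[i + 1] - signal[i]
        if left * right < 0:
            idx.append(i)
    return idx

def interval_band_counts(signal, fs, band_edges_hz):
    """Count extremum-to-extremum intervals falling in each frequency band.

    Assumption: an interval of n samples between adjacent extrema is half
    a period, so the implied frequency is fs / (2 * n).
    """
    counts = [0] * (len(band_edges_hz) - 1)
    ext = local_extrema(signal)
    for a, b in zip(ext, ext[1:]):
        freq = fs / (2.0 * (b - a))
        for k in range(len(counts)):
            if band_edges_hz[k] <= freq < band_edges_hz[k + 1]:
                counts[k] += 1
                break
    return counts

# Toy input: a pure 500 Hz tone sampled at 8 kHz (hypothetical values).
fs = 8000
tone = [math.sin(2 * math.pi * 500 * n / fs) for n in range(800)]
edges = [0, 250, 750, 1500, 4000]  # hypothetical band edges in Hz
counts = interval_band_counts(tone, fs, edges)
print(counts)
```

For the pure tone, every interval lands in the 250 to 750 Hz band, which is the behavior a frequency-selective parameter would be expected to show. Real speech would need smoothing or amplitude thresholds to suppress spurious extrema, which this sketch omits.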

Landmark-Guided Segmental Speech Decoding for Continuous Mandarin Speech Recognition

  • Chao, Hao;Song, Cheng
    • Journal of Information Processing Systems
    • /
    • Vol. 12, No. 3
    • /
    • pp.410-421
    • /
    • 2016
  • In this paper, we propose a framework that incorporates landmarks into a segment-based Mandarin speech recognition system. In this method, landmarks provide boundary information and phonetic class information, which are used to direct the decoding process. To test the validity of this method, two kinds of landmarks that can be reliably detected are used to direct the decoding process of a segment model (SM) based Mandarin LVCSR (large vocabulary continuous speech recognition) system. The results of our experiment show that about 30% of the decoding time can be saved without an obvious decrease in recognition accuracy, which demonstrates the potential of the method.

Automatic Pronunciation Diagnosis System of Korean Students' English Using Purification Algorithm

  • 양일호;김민석;유하진;한혜승;이주경
    • 말소리와 음성과학
    • /
    • Vol. 2, No. 2
    • /
    • pp.69-75
    • /
    • 2010
  • We propose an automatic pronunciation diagnosis system that evaluates the pronunciation of a foreign language without the uttered text. We recorded English utterances spoken by native and Korean speakers, and the utterances spoken by Koreans were evaluated by native speakers on three criteria: fluency, accuracy of phones, and intonation. The system evaluates the utterances of test Korean speakers based on the difference in log-likelihood between two models: one trained on English speech uttered by native speakers, and the other trained on English speech uttered by Korean speakers. We also applied a purification algorithm to increase class differentiability. The purification detects and eliminates non-speech frames, such as short pauses and occlusive silences, that do not help to discriminate between utterances. As a result, the proposed system correlates more highly with the human scores than the baseline system.
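
The scoring idea, a log-likelihood difference between a native-speaker model and a Korean-speaker model computed after non-speech frames are dropped, can be sketched as follows. The abstract does not say what kind of models were used or how non-speech frames are detected, so this sketch makes assumptions: each model is a 1-D Gaussian stand-in, and the keep/drop decisions are supplied as a ready-made mask. All names and values are hypothetical.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log density of a 1-D Gaussian; a stand-in for a real acoustic model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def llr_score(frames, native, korean, keep=None):
    """Average per-frame log-likelihood difference (native minus Korean model).

    keep: optional list of booleans; False marks frames removed before
    scoring (for example, frames judged to be non-speech).
    """
    if keep is None:
        keep = [True] * len(frames)
    kept = [x for x, k in zip(frames, keep) if k]
    if not kept:
        raise ValueError("no frames left to score")
    total = sum(
        gaussian_loglik(x, *native) - gaussian_loglik(x, *korean) for x in kept
    )
    return total / len(kept)

native_model = (1.0, 1.0)    # (mean, variance), hypothetical values
korean_model = (-1.0, 1.0)   # (mean, variance), hypothetical values
frames = [0.9, 1.1, 0.0, 0.0, 1.0]               # toy 1-D "features"
speech_mask = [True, True, False, False, True]   # hypothetical decisions

raw = llr_score(frames, native_model, korean_model)
purified = llr_score(frames, native_model, korean_model, keep=speech_mask)
print(raw, purified)
```

In this toy case the two zero-valued frames are equally likely under both models, so they only dilute the average; removing them widens the gap between the scores, which is the kind of effect the abstract attributes to purification.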

The Comparative Study of Effect on Speech before and after Orthognathic Surgery of Patients

  • 권경환;김수남;이동근;조용민;이숙향
    • Maxillofacial Plastic and Reconstructive Surgery
    • /
    • Vol. 22, No. 2
    • /
    • pp.191-205
    • /
    • 2000
  • This study was undertaken to determine the effects of orthognathic surgery on speech. The hypothesis is that functional behaviors of the dentofacial complex, such as speech production, may be adversely affected by structural deviations (especially Class III malocclusion). Twenty adults with Class III malocclusion (13 female and 7 male) were studied using preoperative, immediate postoperative, and either 6- or 12-month postoperative lateral cephalograms. They had mandibular prognathism and had undergone a mandibular setback operation. The position of the tongue, soft palate (uvula), and hyoid bone, the respiratory tract width, and the pharyngeal depth were assessed on lateral cephalograms with 23 cephalometric variables. ANOVA, paired t-tests, and Pearson's product-moment correlation coefficient tests were used to evaluate the operative changes in all cephalometric parameters. Experienced speech and language pathologists performed narrow phonetic transcriptions of tape-recorded words and sentences produced by each of the ninth patients, and the recordings were analyzed with a phonetic computer program (Computerized Speech Lab (CSL) Model 4300BI, U.S.A.). These judges also recorded their ratings of each patient's overall consonants, hypernasality, hyponasality, and articulation proficiency. The results were as follows: 1. The distance from the posterior pharyngeal wall to the tongue (TI-TW2, TS-TW3) changed significantly at 6 months after surgery (p<0.01 and p<0.05, respectively). 2. The posterior tongue points (TI, TS, PPT) moved posteriorly after surgery and remained in the changed position at 6 months postoperatively (p<0.05). The displacement of the tongue was correlated with the amount of mandibular setback (p<0.05). The hyoid bone moved posteriorly and superiorly in the immediate postoperative period; this change was significant (p<0.05), but the hyoid bone returned to its original position during the follow-up period (p>0.05). 3. The soft palate was displaced posteriorly and superiorly in the immediate postoperative period and remained in the changed position at 6 months postoperatively (p<0.05). An increase in the ANS-PNS-SPT angle and a narrowing of the PPU-PPPo distance were observed after surgery and persisted at 6 months postoperatively (p<0.05). 4. There were significant changes in formant values and in the vowel square diagram after orthognathic surgery and the follow-up period. There were significant changes in the /ㅅ/ sound and in posterior tongue sounds. 5. The posterior movement of the tongue and the posterosuperior movement of the soft palate were correlated with the amount of mandibular setback after orthognathic surgery. On the vowel square diagram, the author found that the place of articulation after the operation moved downward, backward, upward. 6. In assessing speech abnormalities, dental occlusion should be considered as a contributing factor. The vast majority of subjects with preoperative misarticulations eliminated or reduced their errors following orthognathic surgery. There was a significant difference in speech improvement between pre- and postoperation.

Pronunciation of the Korean diphthong /jo/: Phonetic realizations and acoustic properties

  • 이향원
    • 말소리와 음성과학
    • /
    • Vol. 15, No. 1
    • /
    • pp.9-17
    • /
    • 2023
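  • The purpose of this study is to identify how the pronunciation of the Korean diphthong /ㅛ/ (/jo/) varies across linguistic environments. In particular, the discussion focuses on the relationship between phonetic variation and distributional range. We analyzed the prosodic position (monosyllabic, eojeol-initial, eojeol-medial, eojeol-final) and the lexical class (content word, function word) of /ㅛ/ in the utterances of 10 female speakers from the Seoul Corpus. The frequency of /ㅛ/ in each environment showed that lexical class and the realized pronunciation variant differ by prosodic position. Acoustic analysis confirmed that phonetic reduction occurs frequently in /ㅛ/ appearing in function words. Lexical class did not change the average phonetic value of /ㅛ/, but differences were found in the distribution of individual tokens. These results show that the linguistic environment affects the phonetic distribution of vowels.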

Emotion recognition from speech using Gammatone auditory filterbank

  • 레바부이;이영구;이승룡
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 2011년도 한국컴퓨터종합학술대회논문집 Vol.38 No.1(A)
    • /
    • pp.255-258
    • /
    • 2011
  • This paper describes an application of the Gammatone auditory filterbank to emotion recognition from speech. A Gammatone filterbank is a bank of Gammatone filters used as a preprocessing stage before feature extraction, in order to obtain the features most relevant to emotion recognition from speech. In the feature extraction step, the energy of the output signal of each filter is computed and combined with those of all other filters to produce a feature vector for the learning step. Each feature vector is estimated over a short time period of the input speech signal to take advantage of its time dependence. Finally, in the learning step, a Hidden Markov Model (HMM) is used to create a model for each emotion class and to recognize a given input of emotional speech. In the experiment, feature extraction based on the Gammatone filterbank (GTF) yields better results than features based on Mel-Frequency Cepstral Coefficients (MFCC), a well-known feature for speech recognition as well as for emotion recognition from speech.
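
The feature extraction step described above (filter a short frame with each Gammatone filter, then take the output energy per filter) can be sketched in plain Python. This is an illustration under assumptions, not the paper's implementation: it uses the textbook 4th-order gammatone impulse response with the Glasberg and Moore ERB bandwidth, scales each impulse response to unit energy, and picks center frequencies, frame length, and sampling rate arbitrarily.

```python
import math

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, length, order=4):
    """Sampled gammatone impulse response, scaled to unit energy."""
    b = 1.019 * erb(fc)
    ir = []
    for n in range(length):
        t = n / fs
        ir.append(
            t ** (order - 1)
            * math.exp(-2 * math.pi * b * t)
            * math.cos(2 * math.pi * fc * t)
        )
    norm = math.sqrt(sum(v * v for v in ir))
    return [v / norm for v in ir]

def band_energies(frame, fs, centers, ir_length=256):
    """Filter one frame with each gammatone filter; return output energies."""
    feats = []
    for fc in centers:
        ir = gammatone_ir(fc, fs, ir_length)
        energy = 0.0
        for n in range(len(frame)):
            acc = 0.0
            # Direct-form convolution, truncated to the frame length.
            for k in range(min(n + 1, ir_length)):
                acc += ir[k] * frame[n - k]
            energy += acc * acc
        feats.append(energy)
    return feats

# Toy frame: 50 ms of a 1 kHz tone at 8 kHz sampling (hypothetical values).
fs = 8000
frame = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(400)]
centers = [300.0, 1000.0, 3000.0]  # hypothetical center frequencies in Hz
features = band_energies(frame, fs, centers)
print(features)
```

The resulting list is one feature vector for one frame; the filter centered at 1 kHz carries most of the energy for this input. A sequence of such vectors over successive frames would be what an HMM is trained on.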

Comparison of Classification Performance Between Adult and Elderly Using Acoustic and Linguistic Features from Spontaneous Speech

  • 한승훈;강병옥;동성희
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • Vol. 12, No. 8
    • /
    • pp.365-370
    • /
    • 2023
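  • As people age, the breathing, articulation, pitch, frequency, and linguistic expression ability of their speech change. This paper compares the performance of classifying speech data into two groups, adults and the elderly, based on the acoustic and linguistic features that arise from these changes. The acoustic features are related to the frequency, amplitude, and spectrum of the speech. The linguistic features are hidden-state vector representations carrying the contextual information of the utterance transcripts, extracted with KoBERT, a model pre-trained on a large Korean corpus that performs well in natural language processing. We evaluated the classification performance of each model trained on acoustic features and on linguistic features. In addition, after resolving the class imbalance through downsampling, we examined each model's F1 score for the adult and elderly classes. The experiments showed that linguistic features gave higher performance than acoustic features in adult/elderly classification, and that classification performance for adults was higher than for the elderly even when the class ratio was equal.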
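
The evaluation procedure mentioned above, downsampling to balance the classes and then reporting an F1 score per class, can be sketched as follows. The abstract does not describe the exact sampling scheme, so random downsampling to the size of the smallest class is an assumption, and the labels and predictions below are made-up toy data rather than results from the paper.

```python
import random

def downsample(samples, labels, seed=0):
    """Randomly drop items from larger classes until all match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n_min = min(len(v) for v in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        for s in rng.sample(items, n_min):
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels

def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy imbalanced data set: 6 adult items, 2 elderly items (hypothetical).
labels = ["adult"] * 6 + ["elderly"] * 2
samples = list(range(len(labels)))
bal_samples, bal_labels = downsample(samples, labels)

# Toy predictions on a balanced test set (hypothetical).
y_true = ["adult", "adult", "elderly", "elderly"]
y_pred = ["adult", "adult", "adult", "elderly"]
f1_adult = f1_for_class(y_true, y_pred, "adult")
f1_elderly = f1_for_class(y_true, y_pred, "elderly")
print(bal_labels, f1_adult, f1_elderly)
```

Reporting F1 separately for each class, rather than a single accuracy figure, is what makes a gap between the two classes visible even after balancing.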

Constructing Ontology based on Korean Parts of Speech and Applying to Vehicle Services

  • 차시호;류민우
    • 디지털산업정보학회논문지
    • /
    • Vol. 17, No. 4
    • /
    • pp.103-108
    • /
    • 2021
  • A knowledge graph is a technology that improves search results by using semantic information drawn from various resources. Because of these advantages, the knowledge graph has recently come to be regarded as one of the core research technologies for providing AI-based services. However, since the knowledge collected from various service domains takes the form of plain text, it is very important to be able to analyze the text and understand its meaning. Various lexical dictionaries have recently been proposed together with knowledge graphs, but most of them are defined in languages other than Korean, so they cannot be used when providing a Korean knowledge service. To solve this problem, this paper proposes an ontology based on the parts of speech of Korean. The proposed ontology uses the 9 parts of speech in Korean to enable the interpretation of words and their meaning through semantic connections between word classes. We also studied various scenarios for applying the proposed ontology to vehicle services.

Measurement and evaluation of speech privacy in university office rooms

  • 임재섭;최영지
    • 한국음향학회지
    • /
    • Vol. 38, No. 4
    • /
    • pp.396-405
    • /
    • 2019
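  • This paper measures and evaluates the SPC (Speech Privacy Class) values of enclosed offices in a university. The room-to-room sound pressure level difference (Level Difference, LD) and the background noise level in the receiving room ($L_b$) were measured in five target spaces located in three buildings on a university campus. All five target spaces adjoin both a neighboring room and a corridor. The LD values needed to derive SPC were measured together with transmission loss (TL), the conventional measure of sound insulation performance, and the two were compared. None of the five target spaces met the minimum SPC criterion of 70. The mean $L_b$ of the five spaces was 29.2 dB, so LD must be at least 41 dB to meet the minimum SPC criterion. To meet the criterion, the acoustic design should achieve a mean TL of at least 40 dB over the 1/3-octave bands from 160 Hz to 5000 Hz. The factor with the greatest influence on LD is whether the partition wall between the source room and the receiving room has an opening. Where the partition has an opening, replacing it with a material of high sound insulation performance can therefore achieve an adequate SPC value.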
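
The arithmetic behind the 41 dB figure can be checked with a short sketch. It assumes that SPC is the sum of the average level difference and the average background noise level, SPC = LD + L_b, which is how ASTM E2638 defines it and which matches the numbers quoted in the abstract. The function names are hypothetical.

```python
def spc(ld_db, lb_db):
    """Speech Privacy Class, assumed here to be LD + L_b (both in dB)."""
    return ld_db + lb_db

def required_ld(spc_target, lb_db):
    """Level difference needed to reach a target SPC at a given L_b."""
    return spc_target - lb_db

SPC_MIN = 70.0   # minimum criterion quoted in the abstract
LB_MEAN = 29.2   # mean background noise level from the abstract, in dB

needed = required_ld(SPC_MIN, LB_MEAN)
print(round(needed, 1))        # level difference required, in dB
print(spc(41.0, LB_MEAN) >= SPC_MIN)
```

With a mean background level of 29.2 dB, the required level difference comes to 40.8 dB, which rounds up to the 41 dB stated in the abstract. The same relation shows why a quieter receiving room needs more sound insulation to reach the same SPC.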

The Phonetic Realization of High Tone in North Kyungsang Korean

  • Chang, Woo-Hyeok
    • 음성과학
    • /
    • Vol. 11, No. 3
    • /
    • pp.37-54
    • /
    • 2004
  • The main goal of this study is to examine the current issue of the deletion of high tone vs. the downstep or upstep of high tone in North Kyungsang Korean (NKK). Five native speakers of North Kyungsang Korean participated in this phonetic experiment, and two categories, compounds and two-word phrases, were included as test material. The experiment shows that when the first word belongs to the nonfinal class, the high tone of the second word is overwhelmingly deleted. When the first word belongs to the final class, its own high tone is also overwhelmingly deleted. It is thus concluded that when two words are combined into a phrase, the peak of one word is retained, whereas the peak of the other is deleted. This confirms that the single high tone prominence in a phonological phrase in NKK is due not to downstep or upstep but to deletion.
