• Title/Summary/Keyword: Speech Class

140 search results

강의실내의 물리지표와 주관적평가와의 상관관계 (The relevancy between physical index and subjective appraisal of class)

  • Lee, Chai-Bong;Kim, Yong-Man
    • 한국소음진동공학회:학술대회논문집 / 한국소음진동공학회 2002년도 추계학술대회논문초록집 / pp.374.1-374 / 2002
  • The eventual purpose of this research is to establish optimum standards for the acoustic environment by using not only physical characteristics but also subjective appraisals. First, the basic physical data necessary to establish acoustic-environment standards for campus buildings were measured; TSP measurements were used to obtain sound levels, reverberation times, clarity indexes, and the speech transmission index. (omitted) (A minimal sketch of reverberation-time estimation from a measured impulse response follows this entry.)

  • PDF
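
The entry above derives reverberation times and related indexes from TSP measurements. As a rough illustration only (not the authors' measurement chain), the following Python sketch estimates RT60 from an already-measured room impulse response via Schroeder backward integration; the impulse response, sampling rate, and the T20 fitting range are assumptions for the example.

```python
import numpy as np

def rt60_from_impulse_response(ir: np.ndarray, fs: int) -> float:
    """Estimate RT60 by Schroeder backward integration of an impulse response."""
    energy = ir.astype(float) ** 2
    # Schroeder decay curve: backward-integrated energy, in dB re its maximum.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc.max() + 1e-12)

    # Fit a line on the -5 dB .. -25 dB range (T20) and extrapolate to -60 dB.
    t = np.arange(len(ir)) / fs
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope  # seconds needed for a 60 dB decay


# Toy check with a synthetic exponentially decaying "impulse response"
# (analytic RT60 of roughly 1.4 s for this decay constant).
fs = 8000
t = np.arange(0, 1.5, 1.0 / fs)
rng = np.random.default_rng(0)
ir = rng.normal(size=t.shape) * np.exp(-t / 0.2)
print(f"estimated RT60: {rt60_from_impulse_response(ir, fs):.2f} s")
```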

음성인식기 구현을 위한 SVM과 독립성분분석 기법의 적용 (Adoption of Support Vector Machine and Independent Component Analysis for Implementation of Speech Recognizer)

  • 박정원;김평환;김창근;허강인
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2003년도 하계종합학술대회 논문집 Ⅳ / pp.2164-2167 / 2003
  • In this paper we propose an effective speech recognizer through recognition experiments on three feature parameters (PCA, ICA, and MFCC) using an SVM (Support Vector Machine) classifier. In general, SVM is a classification method that separates two classes by finding an arbitrary nonlinear boundary in the vector space, and it achieves high classification performance even with a small number of training samples. In this paper we compare the recognition results for each feature parameter and propose the ICA feature as the most effective parameter. (A sketch of this kind of feature comparison follows this entry.)

  • PDF
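
The entry above compares PCA, ICA, and MFCC features under an SVM classifier. A minimal sketch of that kind of comparison using scikit-learn, with synthetic vectors standing in for the paper's speech features (the dimensions, kernel, and hyperparameters here are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for frame-level speech features (the paper uses real MFCCs).
X, y = make_classification(n_samples=600, n_features=39, n_informative=20,
                           n_classes=3, n_clusters_per_class=2, random_state=0)

candidates = {
    "raw": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0)),
    "PCA": make_pipeline(StandardScaler(), PCA(n_components=12),
                         SVC(kernel="rbf", C=10.0)),
    "ICA": make_pipeline(StandardScaler(), FastICA(n_components=12, random_state=0),
                         SVC(kernel="rbf", C=10.0)),
}

# 5-fold cross-validated accuracy for each feature transform.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>3}: mean accuracy {scores.mean():.3f}")
```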

부가 주성분분석을 이용한 미지의 환경에서의 화자식별 (Speaker Identification Using Augmented PCA in Unknown Environments)

  • 유하진
    • 대한음성학회지:말소리 / No. 54 / pp.73-83 / 2005
  • The goal of our research is to build a text-independent speaker identification system that can be used in any condition without any additional adaptation process. The performance of speaker recognition systems can be severely degraded under unknown, mismatched microphone and noise conditions. In this paper, we show that PCA (principal component analysis) can improve performance in such situations. We also propose an augmented PCA process, which augments the original feature vectors with class-discriminative information before the PCA transformation and selects the best direction for each pair of highly confusable speakers. The proposed method reduced the relative recognition error by 21%. (A sketch of the augmentation idea follows this entry.)

  • PDF
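
The augmented-PCA idea above (appending class-discriminative information to the feature vectors before the PCA transform) might look roughly like the sketch below; using distances to class means as the appended information is an illustrative guess, not the paper's exact formulation.

```python
import numpy as np

def augmented_pca(X: np.ndarray, y: np.ndarray, n_components: int) -> np.ndarray:
    """Augment feature vectors with simple class-discriminative columns,
    then project them onto their leading principal components.

    X: (n_samples, n_dims) feature vectors; y: integer speaker labels.
    The appended columns (distances to each class mean) only illustrate the
    'augment before PCA' idea from the abstract.
    """
    class_means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    dists = np.linalg.norm(X[:, None, :] - class_means[None, :, :], axis=-1)
    X_aug = np.hstack([X, dists])                       # original dims + per-class columns

    X_centered = X_aug - X_aug.mean(axis=0)
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T             # (n_samples, n_components)


# Toy usage with random "speaker feature" vectors and 4 hypothetical speakers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 24))
y = rng.integers(0, 4, size=200)
print(augmented_pca(X, y, n_components=10).shape)       # (200, 10)
```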

An Audio-Visual Teaching Aid (AVTA) with Scrolling Display and Speech to Text over the Internet

  • Davood Khalili;Chung, Wan-Young
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2003년도 하계종합학술대회 논문집 V / pp.2649-2652 / 2003
  • In this paper, an Audio-Visual Teaching Aid (AVTA) for use in a classroom and over the Internet is presented. The system, which was designed and tested, consists of a wireless microphone system, text-to-speech conversion software, a noise-filtering circuit, and a computer. An IBM-compatible PC with a sound card, a network interface card, a web browser, and a voice and text messenger service was used to provide slightly delayed text and voice over the Internet for remote learning, while providing scrolling text from a real-time lecture in the classroom. The motivation for designing this system was to aid Korean students who may have difficulty with listening comprehension while having fairly good reading ability. The application of this system is twofold. On one hand, it helps the students in a class to view and listen to a lecture; on the other hand, it serves as a vehicle for remote access (audio and text) to a classroom lecture. The project provides a simple and low-cost solution for remote learning and also allows a student to access the classroom in emergency situations when the student cannot attend a class. In addition, the system allows the student to capture a teacher's lecture in audio and text form without needing to be present in class or having to take many notes. This system will therefore help students in many ways.

  • PDF

음악청취가 중학생의 발표불안에 미치는 영향 (Effect of Listening to Music on Speech Anxiety among Middle-school Female Students)

  • 오윤숙;손진훈;장은혜;석지아;이옥현
    • 감성과학 / Vol. 7, No. 4 / pp.43-49 / 2004
  • This study investigated the effect of sustained music listening on speech anxiety among middle-school students. The participants were 66 female middle-school students from two classes, divided into a control group (n=33) and an experimental group (n=33); the experimental group received the music treatment and the control group received no treatment. Speech anxiety was measured before the treatment and one and two weeks after it, using the self-reported Speech Anxiety Scale (SAS) and the teacher-rated Speech Behavior Evaluation Scale (SBES). The results showed that speech anxiety decreased significantly in the experimental group that received the music treatment. Thus, the emotion elicited by music had a positive effect on the stress experienced in speech situations.

  • PDF

음성인식 기반 응급상황관제 (Emergency dispatching based on automatic speech recognition)

  • 이규환;정지오;신대진;정민화;강경희;장윤희;장경호
    • 말소리와 음성과학 / Vol. 8, No. 2 / pp.31-39 / 2016
  • In emergency dispatching at a 119 Command & Dispatch Center, inconsistencies between the 'standard emergency aid system' and the 'dispatch protocol', both of which are mandatory to follow, cause inefficiency in the dispatcher's performance. If an emergency dispatch system uses automatic speech recognition (ASR) to process the dispatcher's protocol speech during case registration, it can instantly extract and provide the required information specified in the 'standard emergency aid system', making the rescue command more efficient. For this purpose, we developed a Korean large-vocabulary continuous speech recognition system with a 400,000-word vocabulary for the emergency dispatch system. The vocabulary covers the news, SNS, blog, and emergency rescue domains. The acoustic model was built from 1,300 hours of telephone (8 kHz) speech, and the language model from a 13 GB text corpus. From a transcribed corpus of 6,600 real telephone calls, call logs with the emergency rescue command class and the identified major symptom were extracted in connection with the rescue activity log and the National Emergency Department Information System (NEDIS). ASR is applied to the emergency dispatcher's repeated utterances about the patient information, and the emergency patient information is extracted based on the Levenshtein distance between the ASR result and the template information. Experimental results show a speech recognition word error rate of 9.15% and an emergency response detection rate of 95.8% for the emergency dispatch system.
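
A minimal sketch of the Levenshtein-distance matching step described above, with hypothetical English symptom templates standing in for the system's actual Korean protocol fields:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def closest_template(asr_text: str, templates: list[str]) -> tuple[str, int]:
    """Pick the template entry closest to the ASR hypothesis."""
    return min(((t, levenshtein(asr_text, t)) for t in templates),
               key=lambda pair: pair[1])


# Hypothetical symptom templates; the real system uses its own protocol fields.
templates = ["chest pain", "difficulty breathing", "loss of consciousness"]
print(closest_template("difficulty breating", templates))
# ('difficulty breathing', 1)
```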

Utilizing debate techniques in English speaking class

  • Jung, Sook-Kyung
    • 영어어문교육 / Vol. 12, No. 1 / pp.103-129 / 2006
  • This paper presents a case study of the effectiveness of debate classes in promoting the speaking skills of advanced learners. The researcher adopted English debate techniques in an English speaking class during a four-week teacher training program and investigated how teachers responded to the new technique. Forty-five middle and high school teachers participated in the study, and classroom observation, a pre-survey, a post-survey, and focus group interviews were used as the major research methods. The teacher pre-survey results showed that teachers prefer a conversation class where they can directly acquire proper sentence patterns and speaking strategies rather than spend time performing communicative events. The results of the focus group interviews and the post-survey confirmed that a debate class can meet these specific needs. Most teachers responded positively to the debate classes since: 1) debate techniques are relatively new to Korean teachers; and 2) debate techniques require speed and accuracy in speech, so teachers could learn to present their ideas logically and efficiently in a limited time through repeated argument exercises. The results imply that debate techniques can be an effective vehicle in an EFL context for promoting advanced learners' logical thinking skills and logical English sentence structures.

  • PDF

Selective Adaptation of Speaker Characteristics within a Subcluster Neural Network

  • Haskey, S.J.;Datta, S.
    • 대한음성학회:학술대회논문집 / 대한음성학회 1996년도 10월 학술대회지 / pp.464-467 / 1996
  • This paper aims to exploit inter-/intra-speaker phoneme sub-class variations as criteria for adaptation in a phoneme recognition system based on a novel neural network architecture. Using a subcluster neural network design based on One-Class-in-One-Network (OCON) feed-forward subnets, similar to those proposed by Kung (2) and Jou (1), joined by a common front-end layer, the idea is to adapt only the neurons within the common front-end layer of the network, so that adaptation is concentrated primarily on the speaker's vocal characteristics. Since the adaptation occurs in an area common to all classes, convergence on a single class will improve the recognition of the remaining classes in the network. Results show that adaptation towards a phoneme in the vowel sub-class, for speakers MDABO and MWBTO, improves the recognition of the remaining vowel sub-class phonemes from the same speaker. (A sketch of the shared front-end freezing pattern follows this entry.)

  • PDF
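
A rough PyTorch sketch of the freezing pattern described above, in which per-class (OCON-style) subnets stay fixed and only a shared front-end layer is adapted; the layer sizes, activations, and training details are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SubclusterNet(nn.Module):
    """A shared front-end layer feeding one small subnet per phoneme class,
    loosely in the spirit of OCON subnets joined by a common front end."""

    def __init__(self, n_inputs: int, n_hidden: int, n_classes: int):
        super().__init__()
        self.front_end = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
        # One one-class-in-one-network style subnet per class, each with one output.
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(n_hidden, 8), nn.Sigmoid(), nn.Linear(8, 1))
            for _ in range(n_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.front_end(x)
        return torch.cat([net(h) for net in self.subnets], dim=1)  # one score per class


model = SubclusterNet(n_inputs=26, n_hidden=32, n_classes=10)

# Speaker adaptation: update only the shared front end, keep class subnets fixed.
for p in model.subnets.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.SGD(model.front_end.parameters(), lr=0.01)

# One illustrative adaptation step on fake data from a single phoneme class.
x = torch.randn(16, 26)
target = torch.zeros(16, dtype=torch.long)      # all frames belong to class 0
loss = nn.CrossEntropyLoss()(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```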

k-평균 알고리즘을 활용한 음성의 대표 감정 스타일 결정 방법 (Determination of representative emotional style of speech based on k-means algorithm)

  • 오상신;엄세연;장인선;안충현;강홍구
    • 한국음향학회지 / Vol. 38, No. 5 / pp.614-620 / 2019
  • This paper proposes a method for effectively determining the style vector of each emotion in order to improve the performance of an end-to-end emotional speech synthesis system based on Global Style Tokens (GST). The conventional approach uses only a single representative value for each emotion, which severely limits the richness of emotional expression. To address this, we propose a method that extracts multiple representative styles per emotion using the k-means algorithm. Listening tests showed that the representative styles extracted with the proposed method express each emotion better than the conventional method and allow the differences between emotions to be clearly distinguished.
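
A small sketch of the k-means step described above, using scikit-learn and random vectors in place of real GST style embeddings (the embedding dimension and the number of clusters are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def representative_styles(style_vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Return k cluster centroids as representative style vectors for one emotion.

    style_vectors: (n_utterances, style_dim) GST-style embeddings of one emotion.
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(style_vectors)
    return km.cluster_centers_           # (k, style_dim)


# Fake style embeddings for one emotion (real ones would come from the GST encoder).
rng = np.random.default_rng(0)
styles = rng.normal(size=(500, 256))
reps = representative_styles(styles, k=3)
print(reps.shape)                        # (3, 256)

# The single-mean baseline from the abstract, for comparison.
baseline = styles.mean(axis=0, keepdims=True)   # (1, 256)
```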

식도발성의 숙련 정도에 따른 모음의 음향학적 특징과 자음 산출에 대한 연구 (Analysis of Acoustic Characteristics of Vowel and Consonants Production Study on Speech Proficiency in Esophageal Speech)

  • 최성희;최홍식;김한수;임성은;이성은;표화영
    • 음성과학 / Vol. 10, No. 3 / pp.7-27 / 2003
  • Esophageal speech uses esophageal air during phonation. Fluent esophageal speakers frequently intake air during oral communication, whereas unskilled esophageal speakers have difficulty swallowing large amounts of air. The purpose of this study was to investigate the differences in the acoustic characteristics of vowel and consonant production according to the level of speech proficiency in esophageal speech. Thirteen normal male speakers and 13 male esophageal speakers (5 unskilled, 8 skilled), aged 50 to 70 years, participated. The stimuli were the sustained vowel /a/ and 36 meaningless two-syllable words. The vowel used was /a/, and 18 consonants were used: /k, n, t, m, p, s, c, cʰ, kʰ, tʰ, pʰ, h, l, k', t', p', s', c'/. Fundamental frequency (Fx), jitter, shimmer, HNR, and MPT were measured by electroglottography using Lx Speech Studio (Laryngograph Ltd., London, UK). The 36 meaningless words produced by the esophageal speakers were presented to three speech-language pathologists, who phonetically transcribed the responses. The Fx, jitter, and HNR parameters differed significantly between skilled and unskilled esophageal speakers (p<.05). Considering manner of articulation, ANOVA showed that the differences between the two esophageal speech groups by proficiency were significant: in the unskilled group, glides showed the highest number of confusions with other phoneme classes and affricates were the most intelligible, whereas in the skilled group fricatives produced the highest number of confusions and nasals were the most intelligible. For place of articulation, the glottal /h/ was the most confused consonant in both groups; bilabials were the most intelligible in the skilled group and velars in the unskilled group. For syllable structure, 'CV+V' produced more confusions in the skilled group, while the unskilled group showed similar confusion rates for both structures. In unskilled esophageal speech, the significantly different Fx, jitter, and HNR parameters of the vowel and the high confusion rates for liquids and nasals may be attributed to unstable, improper contact of the neoglottis as the vibratory source, an insufficient phonatory air supply, and the higher motoric demand on the remaining articulators due to the morphological characteristics of the vocal tract after laryngectomy. (A sketch of jitter and shimmer computation follows this entry.)

  • PDF
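
As a rough illustration of two of the measures above (not the Lx Speech Studio implementation), the sketch below computes local jitter and shimmer from already-extracted cycle periods and peak amplitudes, following the standard "local" definitions.

```python
import numpy as np

def local_jitter(periods: np.ndarray) -> float:
    """Local jitter (%): mean absolute difference of consecutive periods
    divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / periods.mean()


def local_shimmer(amplitudes: np.ndarray) -> float:
    """Local shimmer (%): mean absolute difference of consecutive peak
    amplitudes divided by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amplitudes))) / amplitudes.mean()


# Toy glottal cycles around 8 ms (125 Hz) with mild perturbation.
rng = np.random.default_rng(0)
periods = 0.008 + rng.normal(0, 0.0001, size=100)   # seconds per cycle
amps = 1.0 + rng.normal(0, 0.03, size=100)          # relative peak amplitudes
print(f"jitter  ~ {local_jitter(periods):.2f} %")
print(f"shimmer ~ {local_shimmer(amps):.2f} %")
```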