• Title/Abstract/Keyword: Speech characteristics

Search results: 967 items (processing time: 0.022 seconds)

Spectrum 강조특성을 이용한 음성신호에서 Voiced-Unvoiced-Silence 분류 (Voiced, Unvoiced, and Silence Classification of Human Speech Signals by Emphasis Characteristics of the Spectrum)

  • 배명수;안수길
    • 한국음향학회지 (The Journal of the Acoustical Society of Korea) / Vol. 4, No. 1 / pp. 9-15 / 1985
  • In this paper, we describe a new algorithm for deciding whether a given segment of a speech signal should be classified as voiced speech, unvoiced speech, or silence, based on measurements made on the signal. The parameter measured for the voiced-unvoiced classification is the area of each zero-crossing interval, given by multiplying the signal magnitude by the inverse zero-crossing rate of the speech signal. The parameter employed for the unvoiced-silence classification is the positive-area summation over four-millisecond intervals of the high-frequency-emphasized speech signal.

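The zero-crossing-interval area parameter described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation; the decision thresholds and the simple three-way rule are hypothetical.

```python
import numpy as np

def zero_crossing_areas(signal):
    """Area (sum of absolute amplitudes) of each interval between
    consecutive zero crossings of the signal."""
    signs = np.sign(signal)
    signs[signs == 0] = 1
    # indices where the sign changes between sample i and i+1
    crossings = np.where(np.diff(signs) != 0)[0] + 1
    bounds = np.concatenate(([0], crossings, [len(signal)]))
    return [np.abs(signal[a:b]).sum() for a, b in zip(bounds[:-1], bounds[1:])]

def classify_segment(signal, voiced_thr=5.0, silence_thr=0.1):
    """Toy V/UV/S decision: voiced speech yields large per-interval areas
    (high amplitude, low zero-crossing rate); unvoiced speech yields many
    small intervals; silence has almost no area.  Thresholds are
    illustrative only."""
    areas = zero_crossing_areas(signal)
    mean_area = float(np.mean(areas))
    if mean_area > voiced_thr:
        return "voiced"
    if np.abs(signal).sum() < silence_thr * len(signal):
        return "silence"
    return "unvoiced"
```

A low-frequency sine behaves like voiced speech (few crossings, large areas), broadband noise like unvoiced speech, and a near-zero signal like silence.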

Multiple Acoustic Cues for Stop Recognition

  • Yun, Weon-Hee
    • 대한음성학회 학술대회논문집 (KSPS Conference Proceedings) / Proceedings of the October 2003 Conference / pp. 3-16 / 2003
  • ㆍ Acoustic characteristics of stops in speech with contextual variability ㆍ Possibility of stop recognition by a post-processing technique ㆍ Further work: speech database, modification of the decoder, automatic segmentation of acoustic parameters


Intra-and Inter-frame Features for Automatic Speech Recognition

  • Lee, Sung Joo;Kang, Byung Ok;Chung, Hoon;Lee, Yunkeun
    • ETRI Journal / Vol. 36, No. 3 / pp. 514-517 / 2014
  • In this paper, alternative dynamic features for speech recognition are proposed. The goal of this work is to improve speech recognition accuracy by deriving a representation of distinctive dynamic characteristics from the speech spectrum. This work was inspired by two temporal dynamics of a speech signal: the highly non-stationary nature of speech, and the inter-frame change of the speech spectrum. We adopt a sub-frame spectrum analyzer to capture very rapid spectral changes within a speech analysis frame. In addition, we attempt to measure spectral fluctuations in a more complex manner than traditional dynamic features such as delta or double-delta. To evaluate the proposed features, speech recognition tests were conducted in smartphone environments. The experimental results show that feature streams simply combined with the proposed features effectively improve the recognition accuracy of a hidden Markov model-based speech recognizer.
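The traditional delta and double-delta features this paper contrasts its proposal with are computed by a least-squares regression over neighboring frames. A minimal HTK-style sketch follows; the window half-width N=2 is a common default, not a value taken from the paper.

```python
import numpy as np

def delta(features, N=2):
    """HTK-style delta: least-squares slope over a 2N+1-frame window.
    `features` is a (num_frames, num_coeffs) array; the edges are handled
    by repeating the first/last frame."""
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features, dtype=float)
    for t in range(features.shape[0]):
        for n in range(1, N + 1):
            out[t] += n * (padded[t + N + n] - padded[t + N - n])
    return out / denom

def add_dynamics(features, N=2):
    """Static + dynamic feature stream: [c, delta(c), delta(delta(c))]."""
    d = delta(features, N)
    dd = delta(d, N)
    return np.hstack([features, d, dd])
```

On a linearly increasing coefficient track the interior delta values are exactly the slope, which is a quick sanity check for the regression weights.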

비인강폐쇄부전 환자의 언어교정을 위해 발음 보조장치를 이용한 증례 (The Use of a Temporary Speech Aid Prosthesis to Treat Speech in Velopharyngeal Insufficiency (VPI))

  • 김은주;고승오;신효근;김현기
    • 음성과학 (Speech Sciences) / Vol. 9, No. 4 / pp. 3-14 / 2002
  • VPI occurs when the velum and the lateral and posterior pharyngeal walls fail to separate the nasal cavity from the oral cavity during deglutition and speech. A number of congenital and acquired conditions result in VPI. Congenital conditions include cleft palate, submucous cleft palate, and congenital palatal insufficiency (CPI); acquired conditions include carcinoma of the palate or pharynx and neurologic disorders. Speech in VPI is characterized by hypernasality, nasal air emission, decreased intraoral air pressure, increased nasal airflow, and decreased intelligibility. VPI can be treated with various methods, including speech therapy, surgical procedures to reduce the velopharyngeal gap, a speech aid prosthesis, and a combination of surgery and prosthesis. This article describes four cases of VPI treated by speech aid prosthesis and speech therapy with satisfactory results.


감정 인식을 위한 음성 특징 도출 (Extraction of Speech Features for Emotion Recognition)

  • 권철홍;송승규;김종열;김근호;장준수
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 4, No. 2 / pp. 73-78 / 2012
  • Emotion recognition is an important technology in the field of human-machine interfaces. To apply speech technology to emotion recognition, this study aims to establish a relationship between emotional groups and their corresponding voice characteristics by investigating various speech features. Speech features related to both the speech source and the vocal tract filter are included. Experimental results show that the statistically significant speech parameters for classifying the emotional groups are mainly related to the speech source, such as jitter, shimmer, F0 (F0_min, F0_max, F0_mean, F0_std), harmonic parameters (H1, H2, HNR05, HNR15, HNR25, HNR35), and SPI.
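The source-related measures reported as significant here (jitter, shimmer, F0 statistics) have standard definitions. A minimal sketch, assuming per-cycle pitch periods and peak amplitudes have already been extracted from a voiced stretch; this is illustrative, not the study's analysis pipeline:

```python
import numpy as np

def jitter_local(periods):
    """Local jitter: mean absolute difference of consecutive pitch
    periods divided by the mean period (often reported in percent)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_local(amplitudes):
    """Local shimmer: the same ratio applied to per-cycle peak amplitudes."""
    amps = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amps))) / np.mean(amps)

def f0_statistics(periods, fs):
    """F0_min / F0_max / F0_mean / F0_std from pitch periods in samples."""
    f0 = fs / np.asarray(periods, dtype=float)
    return {"F0_min": f0.min(), "F0_max": f0.max(),
            "F0_mean": f0.mean(), "F0_std": f0.std()}
```

A perfectly periodic sequence gives zero jitter and shimmer; any cycle-to-cycle irregularity raises both.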

멀티미디어 환경을 위한 정서음성의 모델링 및 합성에 관한 연구 (Modelling and Synthesis of Emotional Speech on Multimedia Environment)

  • 조철우;김대현
    • 음성과학 (Speech Sciences) / Vol. 5, No. 1 / pp. 35-47 / 1999
  • This paper describes procedures for modelling and synthesizing emotional speech in a multimedia environment. First, procedures for modelling the visual representation of emotional speech are proposed. To display image sequences synchronized with speech, the MSF (Multimedia Speech File) format is proposed and display software is implemented. The emotional speech signal is then collected and analysed to obtain the prosodic characteristics of emotional speech in a limited domain. Multi-emotional sentences were spoken by actors. From the emotional speech signals, prosodic structures are compared in terms of pseudo-syntactic structure. Based on the analysis, neutral speech is transformed into a specific emotional state by modifying its prosodic structure.


운율경계에 위치한 어두 모음의 성문 특성: 음향적 상관성을 중심으로 (Glottal Characteristics of Word-initial Vowels in the Prosodic Boundary: Acoustic Correlates)

  • 손형숙
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 2, No. 3 / pp. 47-63 / 2010
  • This study provides a description of the glottal characteristics of the word-initial low vowels /a, æ/ in terms of a set of acoustic parameters and discusses glottal configuration as their acoustic correlates. Furthermore, it examines the effect of prosodic boundary on the glottal properties of the vowels, seeking an account of the possible role of prosodic structure based on prosodic theory. Acoustic parameters reported to indicate glottal characteristics were obtained from measurements made directly from the speech spectrum on recordings of Korean and English collected from 45 speakers. They consist of two separate groups of native Korean and native English speakers, each including both male and female speakers. Based on the three acoustic parameters of open quotient (OQ), first-formant bandwidth (B1), and spectral tilt (ST), comparisons were made between the speech of males and females, between the speech of native Korean and native English speakers, and between Korean and English produced by native Korean speakers. Acoustic analysis of the experimental data indicates that some or all glottal parameters play a crucial role in differentiating the speech groups, despite substantial interspeaker variations. Statistical analysis of the Korean data indicates prosodic strengthening with respect to the acoustic parameters B1 and OQ, suggesting acoustic enhancement in terms of the degree of glottal abduction and the glottal closure during a vibratory cycle.

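Spectrum-based glottal measures of the kind used in this study are commonly derived from harmonic amplitudes. As an illustration (not the author's procedure), H1-H2 in dB, a standard acoustic correlate of the open quotient, can be estimated from an FFT of a voiced frame given its F0:

```python
import numpy as np

def harmonic_amplitude_db(frame, fs, freq, search_hz=20.0):
    """Peak spectral magnitude (dB) within +/- search_hz of `freq`,
    using a Hann-windowed FFT of the frame."""
    n = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= freq - search_hz) & (freqs <= freq + search_hz)
    return 20.0 * np.log10(spec[band].max() + 1e-12)

def h1_minus_h2(frame, fs, f0):
    """H1-H2 (dB): amplitude difference between the first two harmonics,
    a common spectral correlate of the glottal open quotient."""
    return (harmonic_amplitude_db(frame, fs, f0)
            - harmonic_amplitude_db(frame, fs, 2.0 * f0))
```

For a synthetic frame whose second harmonic has half the amplitude of the first, the measure recovers roughly 6 dB, which serves as a sanity check.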

사상체질과 음성특징과의 상관관계 연구 (A Study on Correlation between Sasang Constitution and Speech Features)

  • 권철홍;김종열;김근호;한성만
    • 혜화의학회지 (Journal of Haehwa Medicine) / Vol. 19, No. 2 / pp. 219-228 / 2011
  • Objective: Sasang constitutional medicine uses voice characteristics to diagnose a person's constitution. In this paper we propose methods for analyzing Sasang constitution using speech information technology. That is, this study aims to establish the relationship between the Sasang constitutions and their corresponding voice characteristics by investigating various speech variables. Materials & Methods: Voice recordings of 1,406 speakers were obtained whose constitutions had already been diagnosed by experts in the field. A total of 144 speech features obtained from five vowels and a sentence are used. The features include pitch, intensity, formant, bandwidth, MDVP, and MFCC-related variables for each constitution. We analyze the speech variables and test whether there are statistically significant differences among the three constitutions. Results: The main speech variables classifying the three constitutions are related to pitch and MFCCs for males, and to formants and MFCCs for females. The correct decision rate is 73.7% for male Soeumin, 63.3% for male Soyangin, 57.3% for male Taeumin, 74.0% for female Soeumin, 75.6% for female Soyangin, and 94.3% for female Taeumin, or 73.0% on average. Conclusion: Experimental results show statistically significant correlations between some speech variables and the constitutions.
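The per-constitution correct decision rates reported in this abstract follow directly from a table of true vs. predicted constitutions. A minimal sketch; the example labels and counts below are hypothetical, not the study's data:

```python
from collections import Counter

def per_class_rates(true_labels, pred_labels):
    """Correct decision rate per class: the fraction of samples of each
    true class that the classifier labels correctly, plus the overall
    (sample-weighted) average."""
    totals, correct = Counter(), Counter()
    for t, p in zip(true_labels, pred_labels):
        totals[t] += 1
        if t == p:
            correct[t] += 1
    rates = {c: correct[c] / totals[c] for c in totals}
    overall = sum(correct.values()) / len(true_labels)
    return rates, overall
```

Note that the overall rate is the class rates weighted by class size, so with unbalanced groups it need not equal their plain average.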

점막하구개열 환자 공명장애의 스펙트럼 특성 연구 (Spectral characteristics of resonance disorders in submucosal type cleft palate patients)

  • 김현철;이종석;임대호;백진아;신효근;김현기
    • 대한음성학회 학술대회논문집 (KSPS Conference Proceedings) / Proceedings of the 2007 Joint Conference with 한국음성과학회 / pp. 152-154 / 2007
  • Submucosal cleft palate is a subtype of cleft palate. Because it is detected late, treatment for submucosal cleft palate patients (e.g., surgery or speech therapy) is usually delayed. In this study, we sought objective characteristics of submucosal cleft palate patients by comparison with normal subjects and complete cleft palate patients. The experimental groups were 10 submucosal cleft palate patients who underwent surgery at our hospital and 10 complete cleft palate patients, with 10 normal subjects as a control group. The speech material used in this study consists of five simple vowels. Using the CSL program, we measured formants and bandwidths, and analyzed the spectral characteristics of the speech signals of the three groups before and after the operation.


자동차 주행 환경에서의 음성 전달 명료도와 음성 인식 성능 비교 (Comparison of Speech Intelligibility & Performance of Speech Recognition in Real Driving Environments)

  • 이광현;최대림;김영일;김봉완;이용주
    • 대한음성학회지: 말소리 (MALSORI) / No. 50 / pp. 99-110 / 2004
  • The normal transmission characteristics of sound are hard to obtain in a running car because of various noises and structural factors. These cause channel distortion of the source sound recorded by microphones, which seriously degrades speech recognition performance in real driving environments. In this paper we analyze the degree of intelligibility under various channel-distortion conditions at different driving speeds in terms of the speech transmission index (STI), and compare the STI with speech recognition rates. We examine the correlation between intelligibility measures, which depend on the sound pick-up pattern, and speech recognition performance, and thereby consider the optimal microphone location in a single-channel environment. The experiments show a high correlation between STI and speech recognition rates.

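The STI-vs-recognition-rate correlation this paper reports can be quantified with a Pearson correlation coefficient across test conditions. A minimal sketch; the STI values and recognition rates below are hypothetical, not the paper's measurements:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical STI values and word recognition rates (%) measured at
# several driving speeds / microphone positions:
sti = [0.82, 0.74, 0.66, 0.58, 0.49]
rec_rate = [95.1, 91.3, 84.0, 77.2, 70.5]
```

With monotone data like this, `pearson_r(sti, rec_rate)` is close to 1, i.e., better transmission indices go with higher recognition rates.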