Integrated Search | Korea Science

Speech emotion recognition based on genetic algorithm-decision tree fusion of deep and acoustic features

Sun, Linhui;Li, Qiu;Fu, Sheng;Li, Pingan
- ETRI Journal / Vol. 44, No. 3 / pp. 462-475 / 2022
Although researchers have proposed numerous techniques for speech emotion recognition, its performance remains unsatisfactory in many application scenarios. In this study, we propose a speech emotion recognition model based on a genetic algorithm (GA)-decision tree (DT) fusion of deep and acoustic features. To express speech emotional information more comprehensively, frame-level deep and acoustic features are first extracted from a speech signal. Next, five kinds of statistical variables of these features are calculated to obtain utterance-level features. The Fisher feature selection criterion is employed to select high-performance features and remove redundant information. In the feature fusion stage, the GA is used to adaptively search for the best feature fusion weight. Finally, using the fused feature, the proposed speech emotion recognition model based on a DT support vector machine model is realized. Experimental results on the Berlin speech emotion database and the Chinese emotion speech database indicate that the proposed model outperforms an average weight fusion method.
https://doi.org/10.4218/etrij.2020-0458
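
As a rough illustration of the Fisher selection step described in this abstract, the sketch below scores each utterance-level feature by the ratio of between-class to within-class variance and keeps the top-ranked ones. The function names `fisher_scores` and `select_top_k`, the list-of-rows data layout, and this particular form of the Fisher score are assumptions for illustration; the paper's exact criterion and its GA weight search are not reproduced here.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per feature column (assumed formulation).

    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    X is a list of feature rows, y the matching list of class labels.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

A feature whose class means are far apart relative to its within-class spread receives a high score, so `select_top_k` keeps the features that separate the emotion classes most cleanly under this assumed criterion.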

Automatic Detection of Korean Prosodic Boundaries Using Acoustic and Grammatical Information

김선희;전재훈;홍혜진;정민화
- 대한음성학회지:말소리 / No. 66 / pp. 117-130 / 2008
This paper presents a method for automatically detecting Korean prosodic boundaries using both acoustic and grammatical information, with the aim of improving speech information processing systems. While most previous work relies solely on grammatical information, our method uses both grammatical information from a Maximum-Entropy-based grammar model with 10 grammatical features and acoustic information from a GMM-based acoustic model with 14 acoustic features. Korean prosodic structure has two intonationally defined prosodic units, the intonation phrase (IP) and the accentual phrase (AP). Experimental results show that the detection rate of AP boundaries is 82.6%, which is higher than the labeler agreement rate in hand transcription, and that the detection rate of IP boundaries is 88.7%, which is slightly lower than the labeler agreement rate.

An Experimental Study of Korean Intervocalic Lax and Tense Stop Consonants

김효숙
- 대한음성학회지:말소리 / Nos. 33-34 / pp. 1-10 / 1997
Korean stop consonants are well known for their triple distinction. In word-initial position, lax, tense, and aspirated consonants are all voiceless and are differentiated by the degree of tension, aspiration, and VOT (voice onset time). In intervocalic position, however, lax consonants become voiced. In this study I compare the acoustic features of Korean intervocalic lax and tense stops. The closure duration of lax stops is shorter than that of tense stops, and the preceding vowel is longer before lax stops than before tense stops. I modify these acoustic characteristics experimentally: for example, I shorten the closure duration of intervocalic tense stops in 5 steps, and I do the same with the preceding vowels. I also run auditory tests to measure listeners' reactions to the modified examples. According to the auditory tests, closure duration plays an important role in differentiating Korean intervocalic lax and tense stops, whereas preceding vowel length has almost nothing to do with the distinction. I therefore conclude that acoustic features form a hierarchy: some features have categorical characteristics and others do not.

Acoustic Characteristics of Vowels in Korean Distant-Talking Speech

이숙향;김선희
- 대한음성학회지:말소리 / Vol. 55 / pp. 61-76 / 2005
This paper analyzes the acoustic characteristics of vowels produced in a distant-talking environment. The analysis was performed using a statistical method, and the influence of gender and individual speakers on the variation was also examined. The speech data consist of 500 distant-talking words and 500 normal words from 10 speakers (5 males and 5 females). The acoustic features selected for the analysis were the duration, the formants (F1 and F2), the fundamental frequency (F0), and the total energy. The results showed that the duration, F0, F1, and the total energy increased in distant-talking speech compared to normal speech; female speakers showed a larger increase in all features except the total energy and the fundamental frequency. In addition, speaker differences were observed.

Detection of onset of failure in prestressed strands by cluster analysis of acoustic emissions

Ercolino, Marianna;Farhidzadeh, Alireza;Salamone, Salvatore;Magliulo, Gennaro
- Structural Monitoring and Maintenance / Vol. 2, No. 4 / pp. 339-355 / 2015
Corrosion of prestressed concrete structures is one of the main challenges that engineers face today. In response to this national need, this paper presents the results of a long-term project that aims to develop a structural health monitoring (SHM) technology for the nondestructive evaluation of prestressed structures. The paper proposes permanently installed low-profile piezoelectric transducers (PZT) to record the acoustic emissions (AE) along the length of the strand. The results of an accelerated corrosion test are presented, and k-means clustering is applied via principal component analysis (PCA) of AE features to provide an accurate diagnosis of the strand health. The proposed approach shows good correlation between acoustic emission features and strand failure. Moreover, a clustering technique for the identification of false alarms is proposed.
https://doi.org/10.12989/smm.2015.2.4.339
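
The clustering step mentioned in this abstract can be pictured with a small k-means sketch. It assumes the AE feature vectors have already been projected onto a few principal components (the PCA step is not shown), and it uses a deterministic first-k initialization for reproducibility. The function names `squared_distance` and `kmeans` are hypothetical and are not taken from the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100):
    """Plain k-means on already-projected feature vectors.

    Assumption: the first k points are used as initial centroids so that
    the result is reproducible; real use would pick a better initialization.
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

In a monitoring setting of this kind, the resulting cluster labels would then be inspected against the known stages of the corrosion test; that interpretation step depends on the experiment and is not sketched here.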

Unsupervised Learning-Based Pipe Leak Detection using Deep Auto-Encoder

Yeo, Doyeob;Bae, Ji-Hoon;Lee, Jae-Cheol
- 한국컴퓨터정보학회논문지 / Vol. 24, No. 9 / pp. 21-27 / 2019
In this paper, we propose a deep auto-encoder-based pipe leak detection (PLD) technique from time-series acoustic data collected by microphone sensor nodes. The key idea of the proposed technique is to learn representative features of the leak-free state using leak-free time-series acoustic data and the deep auto-encoder. The proposed technique can be used to create a PLD model that detects leaks in the pipeline in an unsupervised learning manner. This means that we only use leak-free data without labeling while training the deep auto-encoder. In addition, when compared to the previous supervised learning-based PLD method that uses image features, this technique does not require complex preprocessing of time-series acoustic data owing to the unsupervised feature extraction scheme. The experimental results show that the proposed PLD method using the deep auto-encoder can provide reliable PLD accuracy even considering unsupervised learning-based feature extraction.
https://doi.org/10.9708/jksci.2019.24.09.021

A Prosodic Analysis on the Korean Subjective Particles -With Reference to the Establishment of Acoustic Features-

Seong, Cheol-Jae
- The Journal of the Acoustical Society of Korea / Vol. 20, No. 3E / pp. 3-9 / 2001
This study describes the prosodic patterns of Korean subjective particles with respect to their discourse function. Four kinds of Korean subjective particles were investigated with reference to sentential location, the grammatical relations that precede or follow the word containing the particle, and prosodic phrasing. F0 and energy gradually diminished as the particles moved toward sentence-final position. The particle 'ga', which has been regarded as potentially having a grammatical focusing function, appears to show relatively higher F0 in sentence-medial position in discourse. In sentence-medial position, when words containing the particles 'ga', 'eun', and 'neun' were preceded by adverbials, the acoustic variables of the particles tended to diminish by some ratio relative to the mean value. The duration of particles may vary with style; in particular, it tended to diminish successively from 150 basic, to 50 separate, and finally to 50 discourse. In addition, prosodic phrasing occurred relatively easily after the particles 'eun' and 'neun'. Finally, I tried to capture the prosodic characteristics (which could be established as acoustic features) of the inter-word positions where specific subjective particles intervene. These acoustic features can consist of the duration and F0 fluctuation in the 3 successive syllables in which the word (or prosodic) boundary is located.

Investigating an Automatic Method for Summarizing and Presenting a Video Speech Using Acoustic Features

김현희
- 정보관리학회지 / Vol. 29, No. 4 / pp. 191-208 / 2012
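Two important aspects of generating a speech summary are extracting the key content from the speech and presenting the extracted content effectively. To generate speech summaries of lecture materials automatically, this study analyzed whether summaries can be produced from three acoustic features of speech that remain usable even when no transcript is available: speech rate, pitch (how high or low the sound is), and intensity (how loud the sound is). It also investigated which of the three can be used most effectively. The results identified intensity (the difference between the maximum and minimum dB values) as the most effective feature. To examine the effectiveness and characteristics of the intensity-based method, the study compared it with a text keyword method in terms of summary quality and analyzed the relationship between the weights that the two methods assigned to each segment (sentence). It then analyzed, from the user's perspective, the characteristics of presenting the extracted key speech segments in audio or text form, and on that basis proposed a way to extract and present speech summaries efficiently using acoustic features.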
https://doi.org/10.3743/KOSIM.2012.29.4.191
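
The intensity measure reported in this abstract (maximum dB minus minimum dB per segment) is simple enough to sketch. The example below computes that range for each segment and keeps the segments with the largest values. The function names, the dictionary layout, and the top-k selection rule are assumptions for illustration; the abstract does not state how the segment weights are turned into a summary.

```python
def intensity_range(db_values):
    """Intensity of a segment, taken as max dB minus min dB."""
    return max(db_values) - min(db_values)


def top_segments(segments, k):
    """Return the ids of the k segments with the largest intensity range.

    `segments` maps a segment id to its list of dB measurements.
    The ids are returned in their original order so that the selected
    segments still read in sequence (an assumed presentation choice).
    """
    ranked = sorted(
        segments,
        key=lambda seg_id: intensity_range(segments[seg_id]),
        reverse=True,
    )
    chosen = set(ranked[:k])
    return [seg_id for seg_id in segments if seg_id in chosen]
```

Because the measure needs only the audio level of each segment, a ranking of this kind works without a transcript, which is the situation the study targets.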

An Acoustic Study of Prosodic Features of Korean Spoken Language and Korean Folk Song (Minyo)

구희산
- 음성과학 / Vol. 10, No. 3 / pp. 133-144 / 2003
The purpose of this acoustic experimental study was to investigate the interrelation between the prosodic features of Korean spoken language and those of Korean folk songs. For the analysis of spoken language, the words of Changbutaryoung were spoken by three female graduate students; for the musical features, the song was sung by three Kyunggi Minyo singers. Pitch contours were analyzed from sound spectrograms made with Pitch Works. The results showed that the special musical voices (breaking, tinkling, vibrating, etc.) and tunes (rising, falling, level, etc.) of the folk song occurred at the same places where the accents of the spoken language fell. Even though the patterns of the pitch contours differed from each other, there appeared to be a positive interrelation between the prosodic features of Korean spoken language and those of Korean folk songs.

Acoustic Measurement of English read speech by native and nonnative speakers

Choi, Han-Sook
- 말소리와 음성과학 / Vol. 3, No. 3 / pp. 77-88 / 2011
Foreign accent in second language production depends heavily on the transfer of features from the first language. This study examines acoustic variation in segments and suprasegmentals produced by native and nonnative speakers of English, searching for patterns of transfer and plausible indexes of foreign accent in English. The acoustic variation is analyzed in recorded read speech by 20 native English speakers and 50 Korean learners of English, in terms of vowel formants, vowel duration, and syllabic variation induced by stress. The results show that the acoustic measurements of vowel formants and of vowel and syllable durations differ between native and nonnative speakers. The difference is robust in the production of lax vowels, diphthongs, and stressed syllables, namely the English-specific features. L1 transfer on L2 specification is found at both the segmental and the suprasegmental levels. The transfer levels, measured for groups and for individuals, further show a continuum of divergence from the native-like target. Overall, the oldest group, graduate students, shows more native-like patterns, suggesting a weaker foreign accent in English, whereas the high school students tend to deviate more from the native speakers' patterns. Individual results show interdependence between segmental transfer and prosodic transfer, and a correlation with self-reported proficiency levels. Additionally, experience factors such as length of English study and length of residence in English-speaking countries are discussed as factors that may explain the acoustic variation.
