• Title/Summary/Keyword: acoustic features

Speech emotion recognition based on genetic algorithm-decision tree fusion of deep and acoustic features

  • Sun, Linhui;Li, Qiu;Fu, Sheng;Li, Pingan
    • ETRI Journal
    • /
    • v.44 no.3
    • /
    • pp.462-475
    • /
    • 2022
  • Although researchers have proposed numerous techniques for speech emotion recognition, its performance remains unsatisfactory in many application scenarios. In this study, we propose a speech emotion recognition model based on a genetic algorithm (GA)-decision tree (DT) fusion of deep and acoustic features. To express speech emotional information more comprehensively, frame-level deep and acoustic features are first extracted from the speech signal. Next, five kinds of statistics of these features are calculated to obtain utterance-level features. The Fisher feature selection criterion is employed to select high-performance features and remove redundant information. In the feature fusion stage, the GA is used to adaptively search for the best feature fusion weight. Finally, using the fused features, the proposed speech emotion recognition model based on a DT support vector machine (SVM) model is realized. Experimental results on the Berlin speech emotion database and the Chinese emotional speech database indicate that the proposed model outperforms an average-weight fusion method.
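
The pipeline sketched in this abstract (frame-level features → utterance-level statistics → Fisher-criterion selection → weighted fusion) can be illustrated with a short sketch. The code below is only an illustration, not the authors' implementation: the five statistics, the Fisher ratio, and the fusion weight `w` (which stands in for the value the GA would search for) follow common definitions and are assumptions here.

```python
import numpy as np

def utterance_stats(frame_feats):
    """Collapse frame-level features (T x D) into utterance-level
    statistics: mean, std, max, min, and range, flattened to one vector."""
    stats = [frame_feats.mean(0), frame_feats.std(0),
             frame_feats.max(0), frame_feats.min(0),
             frame_feats.max(0) - frame_feats.min(0)]
    return np.concatenate(stats)

def fisher_scores(X, y):
    """Fisher criterion per feature: variance of the class means
    divided by the summed within-class variance (higher is better)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(0) for c in classes])
    variances = np.array([X[y == c].var(0) for c in classes])
    return means.var(0) / (variances.sum(0) + 1e-12)

def fuse(deep_utt, acoustic_utt, w):
    """Weighted concatenation of deep and acoustic utterance vectors;
    w (0 < w < 1) is the fusion weight a GA would optimize."""
    return np.concatenate([w * deep_utt, (1.0 - w) * acoustic_utt])
```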

Automatic Detection of Korean Prosodic Boundaries Using Acoustic and Grammatical Information (음성정보와 문법정보를 이용한 한국어 운율 경계의 자동 추정)

  • Kim, Sun-Hee;Jeon, Je-Hun;Hong, Hye-Jin;Chung, Min-Hwa
    • MALSORI
    • /
    • no.66
    • /
    • pp.117-130
    • /
    • 2008
  • This paper presents a method for automatically detecting Korean prosodic boundaries using both acoustic and grammatical information, with the aim of improving the performance of speech information processing systems. While most previous work relies solely on grammatical information, our method utilizes not only grammatical information from a Maximum-Entropy-based grammar model using 10 grammatical features, but also acoustic information from a GMM-based acoustic model using 14 acoustic features. Given that Korean prosodic structure has two intonationally defined prosodic units, the intonation phrase (IP) and the accentual phrase (AP), experimental results show that the detection rate for AP boundaries is 82.6%, which is higher than the labeler agreement rate in hand transcription, and that the detection rate for IP boundaries is 88.7%, which is slightly lower than the labeler agreement rate.
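
As a rough sketch of the two-model idea (a maximum-entropy model over grammatical features combined with class-conditional GMMs over acoustic features), the following uses scikit-learn stand-ins. The number of mixture components, the product combination of scores, and the feature matrices are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

def train_boundary_models(X_gram, X_acou, y, n_components=4):
    # Maximum-entropy model over grammatical features
    # (multinomial logistic regression is the standard MaxEnt classifier).
    maxent = LogisticRegression(max_iter=1000).fit(X_gram, y)
    # One GMM per class (e.g., boundary vs. no boundary) over acoustic features.
    gmms = {c: GaussianMixture(n_components).fit(X_acou[y == c])
            for c in np.unique(y)}
    return maxent, gmms

def predict_boundary(maxent, gmms, x_gram, x_acou):
    # Grammatical posterior from the MaxEnt model.
    p_gram = maxent.predict_proba(x_gram.reshape(1, -1))[0]
    # Acoustic log-likelihood per class from the GMMs, softmax-normalized.
    ll = np.array([gmms[c].score_samples(x_acou.reshape(1, -1))[0]
                   for c in sorted(gmms)])
    p_acou = np.exp(ll - ll.max())
    p_acou /= p_acou.sum()
    # Simple product combination of the two evidence sources (an assumption).
    return maxent.classes_[np.argmax(p_gram * p_acou)]
```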

An Experimental Study of Korean Intervocalic Lax and Tense Stop Consonants (모음사이의 예사소리와 된소리의 구분에 대한 실험음성학적 연구)

  • Kim Hyo-Suk
    • MALSORI
    • /
    • no.33_34
    • /
    • pp.1-10
    • /
    • 1997
  • Korean stop consonants are well known for their three-way distinction. In word-initial position, lax, tense, and aspirated consonants are all voiceless; they are differentiated by the degree of tension, aspiration, and VOT (voice onset time). In intervocalic position, however, lax consonants become voiced. In this study I compare the acoustic features of Korean intervocalic lax and tense stops. The closure duration of lax stops is shorter than that of tense stops, and the preceding vowel is longer before lax stops than before tense stops. I manipulate these acoustic characteristics experimentally: for example, I shorten the closure duration of intervocalic tense stops in 5 steps and conduct auditory tests that show listeners' reactions to the manipulated stimuli. The same procedure is applied to the preceding vowels. According to the auditory tests, closure duration plays an important role in differentiating Korean intervocalic lax and tense stops, whereas the length of the preceding vowel has almost nothing to do with the distinction. I therefore conclude that acoustic features also form a hierarchy: some features have categorical characteristics and others do not.
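
The manipulation described here (stepwise shortening of the stop closure before auditory testing) can be sketched as a simple waveform edit. The snippet below is a minimal illustration and not the study's procedure: the closure boundaries are assumed to be known (e.g., hand-labelled times in seconds), and no smoothing at the splice point is applied.

```python
import numpy as np

def shorten_closure(signal, sr, closure_start, closure_end, n_steps=5):
    """Return n_steps versions of the signal with the stop closure
    (a near-silent interval, given in seconds) progressively shortened."""
    s0, s1 = int(closure_start * sr), int(closure_end * sr)
    closure_len = s1 - s0
    stimuli = []
    for step in range(1, n_steps + 1):
        keep = closure_len * (n_steps - step) // n_steps  # samples of closure kept
        edited = np.concatenate([signal[:s0 + keep], signal[s1:]])
        stimuli.append(edited)
    return stimuli
```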

Acoustic Characteristics of Vowels in Korean Distant-Talking Speech (한국어 원거리 음성의 모음의 음향적 특성)

  • Lee Sook-hyang;Kim Sunhee
    • MALSORI
    • /
    • v.55
    • /
    • pp.61-76
    • /
    • 2005
  • This paper analyzes the acoustic characteristics of vowels produced in a distant-talking environment. The analysis was performed using a statistical method, and the influence of gender and speaker on the variation was also examined. The speech data consist of 500 distant-talking words and 500 normal words from 10 speakers (5 male and 5 female). The acoustic features selected for the analysis were duration, the first two formants (F1 and F2), the fundamental frequency (F0), and the total energy. The results showed that duration, F0, F1, and the total energy increased in distant-talking speech compared with normal speech; female speakers showed a greater increase in all features except the total energy and F0. In addition, speaker differences were observed.
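
A minimal sketch of extracting the kinds of word-level measures analyzed here (duration, fundamental frequency, total energy) with librosa. Formant measurement (F1/F2) is omitted because it is usually done with a dedicated tool such as Praat; the pitch bounds below are assumptions, not values from the paper.

```python
import numpy as np
import librosa

def word_level_measures(path):
    """Duration, mean F0, and total energy for one recorded word."""
    y, sr = librosa.load(path, sr=None)            # keep the native sample rate
    duration = len(y) / sr                         # word duration in seconds
    # F0 track (pyin returns NaN for unvoiced frames); mean over voiced frames.
    f0, _, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
    mean_f0 = float(np.nanmean(f0))
    total_energy = float(np.sum(y ** 2))           # total energy of the waveform
    return {"duration_s": duration, "mean_f0_hz": mean_f0,
            "total_energy": total_energy}
```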

Detection of onset of failure in prestressed strands by cluster analysis of acoustic emissions

  • Ercolino, Marianna;Farhidzadeh, Alireza;Salamone, Salvatore;Magliulo, Gennaro
    • Structural Monitoring and Maintenance
    • /
    • v.2 no.4
    • /
    • pp.339-355
    • /
    • 2015
  • Corrosion of prestressed concrete structures is one of the main challenges that engineers face today. In response to this national need, this paper presents the results of a long-term project that aims at developing a structural health monitoring (SHM) technology for the nondestructive evaluation of prestressed structures. In this paper, the use of permanently installed low profile piezoelectric transducers (PZT) is proposed in order to record the acoustic emissions (AE) along the length of the strand. The results of an accelerated corrosion test are presented and k-means clustering is applied via principal component analysis (PCA) of AE features to provide an accurate diagnosis of the strand health. The proposed approach shows good correlation between acoustic emissions features and strand failure. Moreover, a clustering technique for the identification of false alarms is proposed.
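
The clustering step described in this abstract (k-means applied to a PCA projection of acoustic-emission features) maps directly onto scikit-learn. The sketch below is illustrative only; the feature matrix, number of principal components, and number of clusters are placeholders, not values from the paper.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_ae_hits(features, n_components=2, n_clusters=3):
    """Cluster acoustic-emission hits: standardize the feature matrix
    (hits x features), project with PCA, then run k-means on the projection."""
    X = StandardScaler().fit_transform(features)
    X_pca = PCA(n_components=n_components).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_pca)
    return X_pca, labels
```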

Unsupervised Learning-Based Pipe Leak Detection using Deep Auto-Encoder

  • Yeo, Doyeob;Bae, Ji-Hoon;Lee, Jae-Cheol
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.9
    • /
    • pp.21-27
    • /
    • 2019
  • In this paper, we propose a deep auto-encoder-based pipe leak detection (PLD) technique from time-series acoustic data collected by microphone sensor nodes. The key idea of the proposed technique is to learn representative features of the leak-free state using leak-free time-series acoustic data and the deep auto-encoder. The proposed technique can be used to create a PLD model that detects leaks in the pipeline in an unsupervised learning manner. This means that we only use leak-free data without labeling while training the deep auto-encoder. In addition, when compared to the previous supervised learning-based PLD method that uses image features, this technique does not require complex preprocessing of time-series acoustic data owing to the unsupervised feature extraction scheme. The experimental results show that the proposed PLD method using the deep auto-encoder can provide reliable PLD accuracy even considering unsupervised learning-based feature extraction.
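
A compact sketch of the unsupervised idea in this abstract: train an auto-encoder on leak-free acoustic frames only, then flag frames whose reconstruction error is large. Layer sizes, the full-batch training loop, and the use of PyTorch are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class AcousticAutoEncoder(nn.Module):
    def __init__(self, n_features, n_hidden=32, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU(),
                                     nn.Linear(n_hidden, n_latent), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(n_latent, n_hidden), nn.ReLU(),
                                     nn.Linear(n_hidden, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_on_leak_free(model, leak_free_frames, epochs=50, lr=1e-3):
    """Train the auto-encoder on leak-free frames only (no labels needed)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(leak_free_frames), leak_free_frames)
        loss.backward()
        opt.step()
    return model

def leak_scores(model, frames):
    """Reconstruction error per frame; large errors suggest a possible leak."""
    with torch.no_grad():
        return ((model(frames) - frames) ** 2).mean(dim=1)
```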

A Prosodic Analysis on the Korean Subjective Particles -With Reference to the Establishment of Acoustic Features-

  • Seong, Cheol-Jae
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3E
    • /
    • pp.3-9
    • /
    • 2001
  • This study describes the prosodic patterns of Korean subjective particles with respect to their discourse function. Four kinds of Korean subjective particles were investigated with reference to sentential location, the grammatical relations of the words preceding and following the particle-bearing word, and prosodic phrasing. F0 and energy gradually diminished as the particles moved toward sentence-final position. The 'ga' particle, which has been regarded as potentially having a grammatical focusing function, appears to show a relatively higher F0 in sentence-medial position in discourse. In sentence-medial position, when the words carrying the 'ga', 'eun', and 'neun' particles were preceded by adverbials, the acoustic variables of the particles tended to be diminished relative to the mean value. The duration of the particles varied with speech style, tending to diminish successively across the 150 basic, 50 separate, and 50 discourse items. Prosodic phrasing also tended to occur relatively easily after the 'eun' and 'neun' particles. Finally, I tried to capture the prosodic characteristics (to be established as acoustic features) of the inter-word position at which specific subjective particles intervene; these acoustic features consist of the duration and F0 fluctuation over the three successive syllables spanning the word (or prosodic) boundary.

Investigating an Automatic Method for Summarizing and Presenting a Video Speech Using Acoustic Features (음향학적 자질을 활용한 비디오 스피치 요약의 자동 추출과 표현에 관한 연구)

  • Kim, Hyun-Hee
    • Journal of the Korean Society for Information Management
    • /
    • v.29 no.4
    • /
    • pp.191-208
    • /
    • 2012
  • Two fundamental aspects of speech summary generation are the extraction of key speech content and the style of presentation of the extracted speech synopses. We first investigated whether acoustic features (speaking rate, pitch pattern, and intensity) are equally important and, if not, which one can be effectively modeled to compute the significance of segments for lecture summarization. We found that intensity (that is, the difference between the maximum and minimum dB within a segment) is the most effective factor for speech summarization. We evaluated the intensity-based method against a keyword-based method, in terms of which method produces better speech summaries and of how similar the weight values the two methods assign to segments are. We then investigated how to present speech summaries to viewers. In sum, for speech summarization, we suggest how to efficiently extract key segments from a speech video using acoustic features and how to present the extracted segments to viewers.
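
The intensity measure used here, the dB range within a segment, is straightforward to compute. The sketch below scores fixed-length segments with librosa and returns the top-ranked ones; the segment length, hop size, and ranking rule are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
import librosa

def rank_segments_by_db_range(path, segment_s=10.0, top_k=5):
    """Score each fixed-length segment by its dB range (max dB - min dB)
    and return the indices of the top-k segments."""
    y, sr = librosa.load(path, sr=None)
    rms = librosa.feature.rms(y=y)[0]                   # frame-level RMS energy
    db = librosa.amplitude_to_db(rms, ref=np.max)       # convert to dB
    frames_per_seg = max(1, int(segment_s * sr / 512))  # default hop_length = 512
    scores = [db[i:i + frames_per_seg].max() - db[i:i + frames_per_seg].min()
              for i in range(0, len(db), frames_per_seg)]
    return np.argsort(scores)[::-1][:top_k]
```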

An Acoustic Study of Prosodic Features of Korean Spoken Language and Korean Folk Song (Minyo) (언어와 민요의 운율 자질에 관한 음향음성학적 연구)

  • Koo, Hee-San
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.133-144
    • /
    • 2003
  • The purpose of this acoustic-experimental study was to investigate the interrelation between the prosodic features of Korean spoken language and those of Korean folk songs. For the spoken-language analysis, the lyrics of Changbutaryoung were read by three female graduate students; for the musical analysis, the song was sung by three Kyunggi Minyo singers. Pitch contours were analyzed from sound spectrograms made with Pitch Works. The results showed that the special musical voices (breaking, tinkling, vibrating, etc.) and tunes (rising, falling, level, etc.) of the folk song occurred at the same places as the accents of the spoken language. Even though the pitch contour patterns differed from each other, there was a positive interrelation between the prosodic features of Korean spoken language and those of Korean folk songs.

Acoustic Measurement of English read speech by native and nonnative speakers

  • Choi, Han-Sook
    • Phonetics and Speech Sciences
    • /
    • v.3 no.3
    • /
    • pp.77-88
    • /
    • 2011
  • Foreign accent in second-language production depends heavily on the transfer of features from the first language. This study examines acoustic variation in segments and suprasegments produced by native and nonnative speakers of English, searching for patterns of transfer and plausible indexes of foreign accent in English. The acoustic variation is analyzed in read speech recorded by 20 native English speakers and 50 Korean learners of English, in terms of vowel formants, vowel duration, and stress-induced syllabic variation. The results show that the measurements of vowel formants and of vowel and syllable durations differ between native and nonnative speakers. The difference is robust in the production of lax vowels, diphthongs, and stressed syllables, namely the English-specific features. L1 transfer in L2 production is found at both the segmental and the suprasegmental level. The transfer levels, measured for groups and for individuals, further show a continuum of divergence from the native-like target. Overall, the oldest group, graduate students, shows more native-like patterns, suggesting a weaker foreign accent in English, whereas the high-school students tend to deviate more from the native speakers' patterns. Individual results show interdependence between segmental and prosodic transfer, and a correlation with self-reported proficiency levels. Experience factors such as length of English study and length of residence in English-speaking countries are further discussed as factors explaining the acoustic variation.
