• Title/Summary/Keyword: Auditory Information

Emotion recognition from speech using Gammatone auditory filterbank

  • Le, Ba-Vui; Lee, Young-Koo; Lee, Sung-Young
    • Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.255-258 / 2011
  • This paper describes an application of a Gammatone auditory filterbank to emotion recognition from speech. The Gammatone filterbank is a bank of Gammatone filters used as a preprocessing stage before feature extraction, in order to obtain the features most relevant to emotion recognition from speech. In the feature extraction step, the energy of each filter's output signal is computed and combined with those of all the other filters to produce a feature vector for the learning step. A feature vector is estimated over a short time period of the input speech signal to exploit its time-domain dependence. Finally, in the learning step, a Hidden Markov Model (HMM) is trained for each emotion class and used to recognize the emotion of an input speech utterance. In the experiments, feature extraction based on the Gammatone filterbank (GTF) yields better results than features based on Mel-Frequency Cepstral Coefficients (MFCC), a well-known feature extraction method for speech recognition as well as for emotion recognition from speech.
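
As a rough illustration of the pipeline this abstract describes, the sketch below computes per-frame gammatone filterbank energy features. The filter count, center-frequency range, frame sizes, and ERB constants are illustrative assumptions, not the paper's settings; the resulting per-frame vectors would feed one HMM per emotion class.

```python
import numpy as np

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """FIR approximation of a gammatone filter via its impulse response."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 + fc / 9.265                # Glasberg & Moore ERB scale (Hz)
    b = 1.019 * erb                        # bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))     # unit-energy normalization

def gtf_features(signal, fs, n_filters=32, frame_len=400, hop=160):
    """Per-frame log energies of the gammatone filter outputs."""
    # Center frequencies log-spaced from 100 Hz to ~0.9 x Nyquist (assumed range).
    fcs = np.geomspace(100.0, 0.9 * fs / 2, n_filters)
    outputs = [np.convolve(signal, gammatone_ir(fc, fs), mode="same") for fc in fcs]
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, n_filters))
    for i in range(n_frames):
        sl = slice(i * hop, i * hop + frame_len)
        feats[i] = [np.log(np.sum(o[sl] ** 2) + 1e-10) for o in outputs]
    return feats  # one feature vector per short-time frame, for HMM training

fs = 16000
speech = np.random.randn(fs)               # stand-in for an emotional utterance
print(gtf_features(speech, fs).shape)      # (frames, n_filters)
```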

Study on Electric Stimulus Pattern in Cochlear Implant Using a Computer Model (신경모델링을 이용한 인공와우 전기자극 패턴 연구)

  • Yang, Hyejin; Woo, Jihwan
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.12 / pp.249-255 / 2012
  • A cochlear implant system uses charge-balanced biphasic pulses, which are known to cause less tissue damage than monophasic pulses. In this study, we investigated the effect of pulse pattern on neural responses using a computer model based on the Hodgkin-Huxley equations. The electric pulse phase, pulse duration, and phase gap were systematically varied to characterize auditory nerve responses. The results show how neural responses, dynamic range, and threshold depend on the stimulus pattern and duration, and could be extended to the development of more efficient cochlear implant stimulation.
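
A minimal sketch of the kind of simulation described here: a standard Hodgkin-Huxley point model driven by a charge-balanced, depolarizing-first biphasic current pulse. The pulse amplitude, phase duration, and phase gap below are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def hh_response(amp_uA, phase_ms=0.1, gap_ms=0.05, t_end=10.0, dt=0.01):
    """Membrane voltage of a standard HH model under a biphasic pulse (Euler)."""
    gNa, gK, gL = 120.0, 36.0, 0.3           # max conductances (mS/cm^2)
    ENa, EK, EL = 50.0, -77.0, -54.387       # reversal potentials (mV)
    Cm = 1.0                                 # membrane capacitance (uF/cm^2)
    V, m, h, n = -65.0, 0.053, 0.596, 0.317  # resting state
    trace = []
    for step in range(int(t_end / dt)):
        t = step * dt
        # Charge-balanced biphasic stimulus: depolarizing phase, gap, reversed phase.
        if t < phase_ms:
            I = amp_uA
        elif t < phase_ms + gap_ms:
            I = 0.0
        elif t < 2 * phase_ms + gap_ms:
            I = -amp_uA
        else:
            I = 0.0
        # HH rate constants (voltage in mV, rates in 1/ms).
        am = 0.1 * (V + 40) / (1 - np.exp(-(V + 40) / 10))
        bm = 4.0 * np.exp(-(V + 65) / 18)
        ah = 0.07 * np.exp(-(V + 65) / 20)
        bh = 1.0 / (1 + np.exp(-(V + 35) / 10))
        an = 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
        bn = 0.125 * np.exp(-(V + 65) / 80)
        m += dt * (am * (1 - m) - bm * m)
        h += dt * (ah * (1 - h) - bh * h)
        n += dt * (an * (1 - n) - bn * n)
        I_ion = gNa * m**3 * h * (V - ENa) + gK * n**4 * (V - EK) + gL * (V - EL)
        V += dt * (I - I_ion) / Cm
        trace.append(V)
    return np.array(trace)

# Crude threshold search: does the pulse elicit a spike (V crossing 0 mV)?
for amp in (100, 300, 600):                  # stimulus amplitudes in uA/cm^2
    print(amp, "spikes:", bool(hh_response(amp).max() > 0.0))
```

Sweeping the phase duration and gap in the same way is how the stimulus-pattern dependence of threshold and dynamic range would be characterized.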

Speech Enhancement in Noisy Speech Using Neural Network (신경회로망을 사용한 잡음이 중첩된 음성 강조)

  • Choi, Jae-Seung
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.5 s.305 / pp.165-172 / 2005
  • Speech recognition in a noisy environment requires a system that reduces the noise and enhances the speech. For this purpose, it is effective to imitate the human auditory system, which has an excellent spectral analysis mechanism for speech enhancement. Accordingly, this paper proposes an adaptive method based on the auditory mechanism called lateral inhibition. The method first estimates the noise intensity with a neural network, and then adaptively adjusts both the lateral-inhibition coefficients and the amplitude-adjustment coefficient according to the noise intensity of each input frame. Spectral distortion measurements confirm that the proposed method is effective for speech degraded by white noise, colored noise, and road noise.
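
A minimal sketch of lateral-inhibition spectral enhancement under stated assumptions: the 5-tap inhibition kernel and its strength adaptation are illustrative choices, and the per-frame noise estimate is a crude spectral-floor proxy standing in for the paper's neural-network estimator.

```python
import numpy as np

def lateral_inhibition(mag, alpha):
    """Center-surround sharpening of one magnitude-spectrum frame."""
    # 5-tap inhibitory kernel; alpha sets inhibition strength, DC gain stays 1.
    kernel = np.array([-alpha, -alpha, 1.0 + 4 * alpha, -alpha, -alpha])
    return np.maximum(np.convolve(mag, kernel, mode="same"), 0.0)

def enhance(noisy, frame_len=256, hop=128):
    window = np.hanning(frame_len)
    out = np.zeros_like(noisy)
    for start in range(0, len(noisy) - frame_len, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Stand-in for the paper's neural-network noise estimate: use the
        # quietest quarter of the bins as a crude per-frame noise floor.
        noise_level = np.sort(mag)[: len(mag) // 4].mean()
        alpha = np.clip(noise_level / (mag.mean() + 1e-10), 0.0, 0.5)
        mag = lateral_inhibition(mag, alpha)
        out[start:start + frame_len] += np.fft.irfft(mag * np.exp(1j * phase))
    return out

fs = 8000
clean = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
noisy = clean + 0.3 * np.random.randn(fs)    # white-noise degradation
print(enhance(noisy).shape)
```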

Isolated-Word Speech Recognition in Telephone Environment Using Perceptual Auditory Characteristic (인지적 청각 특성을 이용한 고립 단어 전화 음성 인식)

  • Choi, Hyung-Ki; Park, Ki-Young; Kim, Chong-Kyo
    • Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.2 / pp.60-65 / 2002
  • In this paper, we propose the GFCC (gammatone filter frequency cepstrum coefficient) parameter, which is based on auditory characteristics, to achieve a better speech recognition rate, and we perform speech recognition experiments on isolated words acquired from the telephone network. To compare the GFCC parameter with other parameters, recognition experiments are also carried out using the MFCC and LPCC parameters. In addition, each parameter is tested with and without CMS (cepstral mean subtraction), which compensates for channel distortion in the telephone network. The experimental results show that the recognition rate with the GFCC parameter is better than with the other parameters.
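
The channel-compensation step is easy to sketch. Assuming a matrix of per-frame gammatone log filterbank energies (random stand-ins below; in practice they would come from a gammatone filterbank such as the sketch earlier in this listing), GFCC-style cepstra are obtained with a DCT, and CMS removes the per-utterance cepstral mean, which is where a fixed telephone channel's convolutive distortion ends up.

```python
import numpy as np
from scipy.fft import dct

def gfcc_with_cms(log_energies, n_ceps=13):
    """DCT of log filterbank energies, then per-utterance mean removal (CMS)."""
    ceps = dct(log_energies, type=2, norm="ortho", axis=1)[:, :n_ceps]
    # A fixed (convolutive) telephone channel adds a near-constant offset to
    # every frame's cepstrum, so subtracting the utterance mean cancels it.
    return ceps - ceps.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
log_energies = rng.standard_normal((200, 32))  # 200 frames x 32 gammatone bands
print(gfcc_with_cms(log_energies).shape)       # (200, 13)
```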

Pattern classification of the synchronized EEG records by an auditory stimulus for human-computer interface (인간-컴퓨터 인터페이스를 위한 청각 동기방식 뇌파신호의 패턴 분류)

  • Lee, Yong-Hee; Choi, Chun-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.12 / pp.2349-2356 / 2008
  • In this paper, we present a method to effectively extract and classify the EEG produced by brain activity alone while a normal subject is engaged in a mental task. We measure the EEG synchronized to an auditory event while the subject thinks of a specific task, then shift the baseline and reduce the effect of biological artifacts on the measured EEG. We extract the mental-task signal by averaging, and recognize the extracted signal by computing AR coefficients. In the experiment, an auditory stimulus was used as the event, and the EEG was recorded from the three channels C3-A1, C4-A2, and Pz-A1. After averaging 16 epochs of each channel's output, we extracted features of specific mental tasks by modeling the output with 12th-order AR coefficients, giving 36 coefficients in total as the input to a neural network; 50 training trials were recorded per task. On data not used for training, the task recognition rate is 34-92 percent for two tasks and 38-54 percent for four tasks.
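
A minimal sketch of this feature pipeline on random stand-in epochs: average the event-synchronized epochs per channel, fit a 12th-order AR model to each averaged signal, and concatenate the three coefficient sets into the 36-element vector fed to the neural network. The AR fit below uses the Yule-Walker method, a common choice; the paper's exact estimator is not specified here.

```python
import numpy as np

def ar_coeffs(x, order=12):
    """AR coefficients via the Yule-Walker (autocorrelation) method."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:][: order + 1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1: order + 1])

def eeg_features(epochs_by_channel):
    """epochs_by_channel maps channel name -> (n_epochs, n_samples) array."""
    feats = []
    for epochs in epochs_by_channel.values():
        averaged = epochs.mean(axis=0)       # averaging suppresses artifacts
        feats.extend(ar_coeffs(averaged))    # 12 coefficients per channel
    return np.array(feats)                   # 36-element vector for the NN

rng = np.random.default_rng(1)
epochs = {ch: rng.standard_normal((16, 512)) for ch in ("C3-A1", "C4-A2", "Pz-A1")}
print(eeg_features(epochs).shape)            # (36,)
```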

NMF-Feature Extraction for Sound Classification (소리 분류를 위한 NMF특징 추출)

  • Yong-Choon Cho; Seungin Choi; Sung-Yang Bang
    • Proceedings of the Korean Information Science Society Conference / 2003.10a / pp.4-6 / 2003
  • Holistic representations, such as sparse coding and independent component analysis (ICA), have been successfully applied to explain early auditory processing and sound classification. In contrast, a parts-based representation is an alternative way of understanding object recognition in the brain. In this paper, we employ non-negative matrix factorization (NMF) [1], which learns a parts-based representation, for sound classification. Methods for extracting features from spectrograms using NMF are explained. Experimental results show that NMF-based features improve sound classification performance over ICA-based features.
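
A minimal sketch of NMF feature extraction from a magnitude spectrogram, with the STFT parameters and component count as illustrative assumptions: NMF factorizes the non-negative spectrogram into spectral basis vectors (the "parts") and per-frame activations, and the activations serve as classification features.

```python
import numpy as np
from sklearn.decomposition import NMF

fs, n_fft, hop = 8000, 256, 128
signal = np.random.randn(fs)  # stand-in for a one-second sound clip
frames = np.stack([signal[s:s + n_fft] * np.hanning(n_fft)
                   for s in range(0, len(signal) - n_fft, hop)])
spectrogram = np.abs(np.fft.rfft(frames, axis=1)).T   # (freq_bins, time_frames)

model = NMF(n_components=16, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(spectrogram)   # spectral "parts" (freq_bins x 16)
H = model.components_                  # activations over time (16 x frames)
features = H.mean(axis=1)              # one pooled feature vector per clip
print(W.shape, H.shape, features.shape)
```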

Development of a Baseline Platform for Spoken Dialog Recognition System (대화음성인식 시스템 구현을 위한 기본 플랫폼 개발)

  • Chung Minhwa; Seo Jungyun; Lee Yong-Jo; Han Myungsoo
    • Proceedings of the KSPS conference / 2003.05a / pp.32-35 / 2003
  • This paper describes our recent work on developing a baseline platform for Korean spoken dialog recognition. We have collected an approximately 65-hour speech corpus with auditory transcriptions. Linguistic information at various levels, such as morphology, syntax, semantics, and discourse, is attached to the speech database using automatic or semi-automatic tagging tools.

Multiple Task Performance and Psychological Refractory Period in Children: Focusing on PRP Paradigm Tasks (유아의 다중과제 수행과 심리적 불응기: PRP 패러다임 과제를 중심으로)

  • Kim, Bokyung; Yi, Soon Hyung
    • Korean Journal of Child Studies / v.38 no.3 / pp.75-90 / 2017
  • Objective: This study aimed to identify children's cognitive processing and performance characteristics during multiple-task performance. It examined whether their multiple-task performance and psychological refractory period (PRP) varied by task condition (stimulus onset asynchrony [SOA] and task difficulty) and stimulus modality. Methods: Seventy 5-year-olds were recruited. Multi-task tools were developed using the E-Prime software. The children were required to respond to two stimuli (visual or auditory) presented with a very short time difference, and their response times (RTs) were recorded. Results: As the SOA increased, the RTs in the first task increased, while the RTs in the second task and the PRP decreased. The RTs of the first and second tasks, and the PRP, were significantly longer for difficult tasks than for easy tasks. Additionally, there was an interaction effect between SOA and task difficulty. Although there was no main effect of stimulus modality, task difficulty moderated the modality effect: in the high-difficulty condition, the RTs of the first and second tasks and the PRP were significantly longer for the visual-visual task than for the auditory-auditory task. Conclusion: These results inform theoretical discussions of children's multi-tasking mechanisms and the costs of multiple-task performance, and they provide practical implications for composing multi-tasks suitable for children in educational environments.
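
For readers unfamiliar with the measure, the sketch below shows one common way the PRP is quantified from dual-task data (synthetic RTs, made-up numbers): the slowing of the second response at the shortest SOA relative to the longest.

```python
import numpy as np

rng = np.random.default_rng(2)
soas_ms = [50, 150, 300, 900]
# Simulated per-trial RT2 samples: a central bottleneck predicts RT2 shrinks
# roughly one-for-one as SOA grows, then levels off (numbers are made up).
rt2_by_soa = {soa: rng.normal(600 + max(0, 400 - soa), 30, size=70)
              for soa in soas_ms}

mean_rt2 = {soa: rts.mean() for soa, rts in rt2_by_soa.items()}
prp = mean_rt2[min(soas_ms)] - mean_rt2[max(soas_ms)]
for soa in soas_ms:
    print(f"SOA {soa:4d} ms: mean RT2 = {mean_rt2[soa]:.0f} ms")
print(f"PRP = {prp:.0f} ms")
```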

Prediction of the Exposure to 1763MHz Radiofrequency Radiation Based on Gene Expression Patterns

  • Lee, Min-Su; Huang, Tai-Qin; Seo, Jeong-Sun; Park, Woong-Yang
    • Genomics & Informatics / v.5 no.3 / pp.102-106 / 2007
  • Radiofrequency (RF) radiation at mobile phone frequencies has not been reported to induce cellular responses in in vitro and in vivo models. We exposed HEI-OC1 cells, a conditionally immortalized mouse auditory cell line, to RF radiation to characterize cellular responses to 1763 MHz RF radiation. Although we could not detect any differences upon RF exposure by other means, whole-genome expression profiling may provide the most sensitive method for finding molecular responses to RF radiation. HEI-OC1 cells were exposed to 1763 MHz RF radiation at an average specific absorption rate (SAR) of 20 W/kg for 24 hr and harvested after 5 hr of recovery (R5), alongside sham-exposed samples (S5). From the whole-genome profiles, we selected 9 genes differentially expressed between the S5 and R5 groups using an information-gain-based recursive feature elimination procedure. Using a support vector machine (SVM), we designed a prediction model based on these 9 genes to discriminate the two groups. Our prediction model predicted the target class without any error. From these results, we developed a biomarker-based prediction model that determines RF radiation exposure in mouse auditory cells with perfect accuracy, which may need validation in in vivo RF-exposure models.
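
A minimal sketch of this classification scheme on synthetic data: select a small gene subset by recursive feature elimination, then train a linear SVM to separate exposed (R5) from sham (S5) samples. Note one substitution: the paper ranks features by information gain, while sklearn's RFE shown here ranks by SVM weights.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 1000))       # 40 samples x 1000 genes (stand-in)
y = np.repeat([0, 1], 20)                 # 0 = sham (S5), 1 = exposed (R5)
X[y == 1, :9] += 1.5                      # make 9 genes weakly informative

selector = RFE(SVC(kernel="linear"), n_features_to_select=9, step=100)
selector.fit(X, y)
X_sel = X[:, selector.support_]           # the 9 selected "genes"

scores = cross_val_score(SVC(kernel="linear"), X_sel, y, cv=5)
print("selected genes:", np.flatnonzero(selector.support_))
print("cross-validated accuracy:", scores.mean())
```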

Sound-Field Speech Evoked Auditory Brainstem Response in Cochlear-Implant Recipients

  • Jarollahi, Farnoush; Valadbeigi, Ayub; Jalaei, Bahram; Maarefvand, Mohammad; Zarandy, Masoud Motasaddi; Haghani, Hamid; Shirzhiyan, Zahra
    • Korean Journal of Audiology / v.24 no.2 / pp.71-78 / 2020
  • Background and Objectives: Limited information is currently available on speech stimulus processing at the subcortical level in cochlear implant (CI) recipients. Speech processing at the brainstem level is measured using the speech-evoked auditory brainstem response (S-ABR). The purpose of the present study was to measure S-ABR components under sound-field presentation in CI recipients and to compare them with normal-hearing (NH) children. Subjects and Methods: In this descriptive-analytical study, participants were divided into two groups: patients with CIs and an NH group. The CI group consisted of 20 children with prelingual hearing impairment (mean age=8.90±0.79 years) with ipsilateral (right-side) CIs. The control group consisted of 20 healthy NH children with comparable age and sex distributions. The S-ABR was evoked by a 40-ms synthesized /da/ syllable presented in the sound field. Results: Sound-field S-ABRs measured in the CI recipients showed significantly delayed latencies compared with the NH group. The frequency-following response peak amplitude was significantly higher in CI recipients than in their NH counterparts (p<0.05), and neural phase locking was significantly lower in CI recipients (p<0.05). Conclusions: The sound-field S-ABR findings demonstrate that CI recipients have neural encoding deficits in the temporal and spectral domains at the brainstem level; the sound-field S-ABR can therefore be considered an efficient clinical procedure for assessing speech processing in CI recipients.
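
A minimal sketch of the group comparison reported above, on synthetic latency values: an independent-samples t-test checks whether S-ABR latencies are delayed in CI recipients relative to NH children. The test choice and the latency values are illustrative; the paper's exact statistical procedure may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
nh_latency_ms = rng.normal(6.8, 0.3, size=20)  # stand-in latencies, NH group
ci_latency_ms = rng.normal(7.6, 0.4, size=20)  # stand-in latencies, CI group

t, p = stats.ttest_ind(ci_latency_ms, nh_latency_ms)
print(f"t = {t:.2f}, p = {p:.4f}")
print("significantly delayed in CI group" if p < 0.05 else "no significant delay")
```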