Search | Korea Science

Kwon, Ho-Min
- Journal of the Institute of Convergence Signal Processing, v.12 no.2, pp.113-118, 2011
In this paper, we develop a speaker gender classification technique using collaborative sensor fusion for use in a wireless sensor network. The distributed sensor nodes remove unwanted input data using BER (Band Energy Ratio) based voice activity detection, process only the relevant data, and transmit hard-labeled decisions to the fusion center, where a global decision fusion is carried out. This is advantageous for power consumption and network resource management. Bayesian sensor fusion and global weighting decision fusion methods are proposed to achieve the gender classification. As the number of sensor nodes varies, the Bayesian sensor fusion yields the best classification accuracy using the optimal operating points of the ROC (Receiver Operating Characteristic) curves. For the weights used in the global decision fusion, the BER and MCL (Mutual Confidence Level) are employed so that the decisions are combined effectively at the fusion center. The simulation results show that as the number of sensor nodes increases, the classification accuracy improves further in the low SNR (Signal-to-Noise Ratio) condition.
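As a rough illustration of fusing hard-labeled decisions at a fusion center, the sketch below combines binary labels by summing per-node log-odds. It is not the paper's method: the function name `fuse_hard_decisions`, the per-node `accuracies`, the symmetric-error model, and the assumption that nodes err independently are all hypothetical choices made for this example.

```python
import math


def fuse_hard_decisions(labels, accuracies, prior_male=0.5):
    """Fuse binary hard labels (1 = male, 0 = female) from several nodes.

    Assumptions (illustrative only): node i is correct with probability
    accuracies[i] whatever the true class is, and nodes err independently.
    Returns the posterior probability that the speaker is male.
    """
    # Start from the prior log-odds of the "male" hypothesis.
    log_odds = math.log(prior_male / (1.0 - prior_male))
    for label, acc in zip(labels, accuracies):
        # More reliable nodes contribute a larger weight to the vote.
        weight = math.log(acc / (1.0 - acc))
        log_odds += weight if label == 1 else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Three nodes of equal reliability, two of which vote "male".
    posterior = fuse_hard_decisions([1, 1, 0], [0.8, 0.8, 0.8])
    print("male" if posterior >= 0.5 else "female", posterior)
```

With equal accuracies this reduces to a majority vote; unequal accuracies turn it into a weighted vote, which is the general idea behind weighting decisions by a confidence measure.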

Kim, Do-Hyung;Kim, Hye-Jin;Bae, Kyung-Sook;Yun, Woo-Han;Ban, Kyu-Dae;Park, Beom-Chul;Yoon, Ho-Sub
- The Journal of Korea Robotics Society, v.3 no.3, pp.165-175, 2008
For advanced intelligent services, the need for HRI (human-robot interaction) technology has recently been increasing, and the technology has also improved. However, HRI components have been evaluated under stable, controlled laboratory environments, and there are no evaluation results of performance in real environments. Therefore, robot service providers and users have not been getting sufficient information on the level of current HRI technology. In this paper, we provide evaluation results for the performance of HRI components on robot platforms providing actual services at pilot service sites. For the evaluation, we select a face detection component, a speaker gender classification component, and a sound localization component as representative HRI components close to commercialization. The goal of this paper is to provide valuable information and reference performance for applying HRI components to real robot environments.

Choi, Jae-Seung
- Journal of the Korea Institute of Information and Communication Engineering, v.17 no.4, pp.775-780, 2013
This paper proposes a speaker-dependent speech recognition algorithm that classifies speaker gender (male or female) in white noise and car noise using a neural network. The proposed algorithm trains the neural network to recognize speaker gender using LPC (Linear Predictive Coding) cepstrum coefficients. In the experimental results, the maximal improvement of the total speech recognition rate is 96% for white noise and 88% for car noise, after training a total of six neural networks. Finally, the proposed speech recognition algorithm is compared with the results of a conventional speech recognition algorithm in a noisy background environment.
https://doi.org/10.6109/jkiice.2013.17.4.775
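For readers unfamiliar with LPC cepstrum features, the sketch below shows the standard recursion that converts LPC coefficients into cepstral coefficients. It is a minimal illustration, not the paper's implementation: the function name `lpc_to_cepstrum` is hypothetical, and it assumes the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p for the prediction-error filter, with the gain term c_0 omitted.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c_1..c_n_ceps.

    `a` holds [a_1, ..., a_p] of A(z) = 1 + sum_k a_k z^-k (assumed sign
    convention). The all-pole model is 1 / A(z); the gain term c_0 is omitted.
    Recursion: c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k}.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for k in range(1, n):
            if n - k <= p:  # a_{n-k} is zero beyond the model order
                acc += k * c[k - 1] * a[n - k - 1]
        a_n = a[n - 1] if n <= p else 0.0
        c.append(-a_n - acc / n)
    return c


if __name__ == "__main__":
    # Single-pole model 1 / (1 - 0.5 z^-1): the cepstrum is 0.5**n / n.
    print(lpc_to_cepstrum([-0.5], 3))
```

The resulting vector is what would typically be fed to a classifier such as a neural network; how the LPC coefficients themselves are estimated (for example by the autocorrelation method) is outside this sketch.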

Jonghwan Na;Bowon Lee
- Phonetics and Speech Sciences, v.15 no.2, pp.43-51, 2023
In this paper, we propose an approach for dialect classification based on the speed and pauses of speech utterances as well as the age and gender of the speakers. Dialect classification is one of the important techniques for speech analysis. For example, an accurate dialect classification model can potentially improve the performance of speaker or speech recognition. According to previous studies, deep learning using Mel-Frequency Cepstral Coefficient (MFCC) features has been the dominant approach. We focus on the acoustic differences between regions and conduct dialect classification based on features derived from those differences. Specifically, we extract underexplored additional features, namely the speed and the pauses of speech utterances, along with metadata including the age and the gender of the speakers. Experimental results show that our proposed approach achieves higher accuracy, especially with the speech rate feature, than the method using only the MFCC features. By incorporating all the proposed features, the accuracy improved from 91.02% to 97.02% compared to the previous method that used only MFCC features.
https://doi.org/10.13064/KSSS.2023.15.2.043
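The abstract does not say how the pause features are computed. One simple possibility, shown purely as a hypothetical sketch, is to threshold per-frame energy and count sufficiently long low-energy runs. The function name `pause_features`, the energy threshold, and the 0.2-second minimum pause length are all assumptions made for this example.

```python
def pause_features(frame_energies, frame_sec, threshold, min_pause_sec=0.2):
    """Return (pause_count, pause_ratio) from per-frame energies.

    A pause is a run of consecutive frames whose energy is below
    `threshold` and that lasts at least `min_pause_sec` (assumed values).
    pause_ratio is the fraction of frames that belong to such pauses.
    """
    min_frames = max(1, round(min_pause_sec / frame_sec))
    pause_count = 0
    pause_frames = 0
    run = 0
    # The trailing infinite-energy sentinel closes a run that ends the signal.
    for energy in list(frame_energies) + [float("inf")]:
        if energy < threshold:
            run += 1
        else:
            if run >= min_frames:
                pause_count += 1
                pause_frames += run
            run = 0
    total = len(frame_energies)
    ratio = pause_frames / total if total else 0.0
    return pause_count, ratio


if __name__ == "__main__":
    energies = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]  # toy values, 0.1 s per frame
    print(pause_features(energies, frame_sec=0.1, threshold=0.5))
```

Features like these could be concatenated with MFCC-based features and speaker metadata before classification; a speech rate feature would additionally need a syllable or phone count, which this sketch does not attempt.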