Search | Korea Science

Park, Jisu;Yun, Young-Sun;Cha, Shin;Park, Jeon Gue
- The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.466-472 / 2021
Speaker Change Detection (SCD) refers to finding the moment at which the main speaker in a conversation changes from one person to another. Speaker change detection is made difficult by overlapping speech, inaccurate labeling, and data imbalance. To address these problems, utterances from the TIMIT corpus, which is widely used in speech recognition, were concatenated artificially to obtain a sufficient amount of training data, and speaker changes were detected after overlapping-speaker regions had been identified. In this paper, we propose a speaker change detection system that takes speaker overlap into account. We evaluated and verified its performance using various approaches. As a result, a detector with a structure similar to the X-Vector architecture was proposed to remove overlapping-speaker regions, and a Bi-LSTM was selected to model speaker changes. The experimental results show relative performance improvements of 4.6 % and 13.8 %, respectively, over the baseline system. Additionally, we conclude that a robust speaker change detection system could be built by extending this work, based on the experimental results, to take text and speaker information into consideration.
https://doi.org/10.7776/ASK.2021.40.5.466
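The abstract says TIMIT utterances were concatenated artificially to create speaker-change training data. The sketch below illustrates one way such data could be labeled. It is an assumption, not the authors' pipeline: each utterance is a hypothetical (speaker_id, frames) pair, a boundary is recorded only where the speaker ID changes, and frames within a hypothetical `tolerance` window of that boundary are labeled 1.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[float]


def concat_with_change_labels(
    utterances: List[Tuple[str, List[Frame]]],
    tolerance: int = 0,
) -> Tuple[List[Frame], List[int]]:
    """Concatenate (speaker_id, frames) pairs and label speaker-change frames.

    Hypothetical helper: a frame is labeled 1 if it lies within `tolerance`
    frames of a boundary where the speaker differs from the previous
    utterance; all other frames are labeled 0.
    """
    frames: List[Frame] = []
    boundaries: List[int] = []
    prev_speaker: Optional[str] = None
    for speaker, utt in utterances:
        # Consecutive utterances by the same speaker do not create a change point.
        if prev_speaker is not None and speaker != prev_speaker:
            boundaries.append(len(frames))  # first frame of the new speaker
        frames.extend(utt)
        prev_speaker = speaker

    labels = [0] * len(frames)
    for b in boundaries:
        # Widen each change point into a symmetric window, clipped to the sequence.
        for i in range(max(0, b - tolerance), min(len(frames), b + tolerance + 1)):
            labels[i] = 1
    return frames, labels
```

For example, three frames of speaker A followed by two frames of speaker B give the labels [0, 0, 0, 1, 0] with tolerance 0. Note that the labels are heavily skewed toward 0, which mirrors the data-imbalance problem the abstract mentions.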

Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
- The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.566-571 / 2009
This paper proposes a noise-robust fast speaker adaptation method based on the eigenvoice framework for various noisy environments. The proposed method focuses on de-noising and environment clustering. Because the de-noised adaptation database still contains residual noise, environment clustering groups the noisy adaptation data into similar environments, using the cepstral mean of non-speech segments as the feature vector. The adaptation data in each cluster are then used to build an environment-clustered speaker-adapted (SA) model. After multiple environment-clustered SA models similar to the test environment are selected, speaker adaptation is performed using an appropriate linear combination of these clustered SA models. In our experiments, the proposed method reduced the error rate by 40% to 59% compared with the speaker-independent baseline model.
https://doi.org/10.7776/ASK.2009.28.6.566
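The two steps described in this abstract (clustering adaptation data by the cepstral mean of non-speech segments, then linearly combining the SA models of the clusters closest to the test environment) can be sketched as follows. This is a simplified illustration under stated assumptions: plain k-means with first-k initialization stands in for the paper's clustering method, each SA model is reduced to a single mean vector, and the combination weights are normalized inverse distances. The abstract does not specify how the paper's weights are obtained, so those weights and all function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cluster_environments(
    env_feats: List[Vector], k: int, iters: int = 20
) -> Tuple[List[int], List[Vector]]:
    """Plain k-means over per-utterance non-speech cepstral means.

    Centroids are initialised from the first k feature vectors (a simplification).
    Returns the cluster index of each utterance and the cluster centroids.
    """
    centroids = [list(f) for f in env_feats[:k]]
    assign = [0] * len(env_feats)
    for _ in range(iters):
        # Assign each utterance to its nearest environment centroid.
        assign = [min(range(k), key=lambda c: dist(f, centroids[c])) for f in env_feats]
        for c in range(k):
            members = [f for f, a in zip(env_feats, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean_vector(members)
    return assign, centroids


def combine_sa_models(
    test_env: Vector, centroids: List[Vector], sa_means: List[Vector], top_n: int = 2
) -> Vector:
    """Mix the SA mean vectors of the top_n clusters nearest the test environment.

    Weights are normalised inverse distances, a hypothetical stand-in for the
    paper's combination weights.
    """
    ranked = sorted(range(len(centroids)), key=lambda c: dist(test_env, centroids[c]))[:top_n]
    inv = [1.0 / (dist(test_env, centroids[c]) + 1e-9) for c in ranked]
    total = sum(inv)
    weights = [w / total for w in inv]
    dim = len(sa_means[0])
    return [sum(w * sa_means[c][d] for w, c in zip(weights, ranked)) for d in range(dim)]
```

A test environment equidistant from two cluster centroids gives each of their SA models a weight of 0.5. A test environment that sits on one centroid puts almost all of the weight on that cluster's model.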

Lee, Won-Tae;Kang, Jang-Mook
- The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.4 / pp.157-163 / 2014
User sensibility is recognized as a very important parameter in communication among companies, governments, and individuals. In many studies, researchers use voice tone, speaking speed, facial expression, the direction and speed of body movement, and gestures to recognize sensibility. Multiple modalities are more precise than a single modality. However, they still have a limited recognition rate, multi-sensing overloads data processing, and an effective algorithm is needed to infer sensibility from the sensed values. Because each modality has different concepts and properties, errors can occur when human sensibility is converted into standard values. To deal with this issue, the modality that best expresses sensibility needs to be extracted from the multiple modalities using techniques such as relational network analysis, context understanding, and digital filtering. When sensibility is recognized in a specific situation, processing the priority modality and the surrounding modalities into implicit values yields a system that is robust relative to the computing resources it consumes. As a result, this paper proposes a method for assigning weights to multiple modalities using implicit data.
https://doi.org/10.7236/JIIBC.2014.14.4.157
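The abstract proposes weighting multiple modalities, with one priority modality, but does not give the weighting rule. The sketch below shows generic weighted late fusion to make the idea concrete. It is an illustrative assumption, not the paper's method: each modality supplies per-class scores, a hypothetical `reliability` value scales each modality, and the priority modality is multiplied by a hypothetical `priority_boost` before the weights are normalized.

```python
from typing import Dict, List


def fuse_modalities(
    scores: Dict[str, List[float]],
    reliability: Dict[str, float],
    priority: str,
    priority_boost: float = 2.0,
) -> List[float]:
    """Weighted late fusion of per-modality class scores (hypothetical scheme).

    Each modality's weight is its reliability, multiplied by priority_boost
    for the priority modality, then normalised so that all weights sum to 1.
    """
    raw = {m: reliability[m] * (priority_boost if m == priority else 1.0) for m in scores}
    total = sum(raw.values())
    weights = {m: w / total for m, w in raw.items()}
    # All modalities are assumed to score the same set of sensibility classes.
    n_classes = len(next(iter(scores.values())))
    return [sum(weights[m] * scores[m][i] for m in scores) for i in range(n_classes)]
```

For example, with equal reliabilities, voice as the priority modality, and a boost of 3.0, voice receives a weight of 0.75 and face a weight of 0.25. Normalizing the weights keeps the fused scores on the same scale as the inputs, whatever boost is chosen.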

Jun-Ho Choi;Seung-Min Park
- The Journal of the Korea institute of electronic communication sciences / v.19 no.3 / pp.583-588 / 2024
With the emergence of Brain-Computer Interface (BCI) technology, analyzing mirror neurons has become more feasible. However, evaluating the accuracy of BCI systems that rely on human thoughts is challenging because of their qualitative nature. To harness the potential of BCI, we propose a new approach to measuring accuracy based on the characteristics of mirror neurons in the human brain, which are influenced by speech speed depending on the ultimate goal of a movement. In Chapter 2 of this paper, we introduce mirror neurons and explain human posture estimation in relation to mirror neurons. In Chapter 3, we present a powerful pose estimation method, based on human posture estimation techniques, that is suitable for real-time dynamic environments. Furthermore, we propose a method for analyzing the accuracy of BCI using this robotic environment.
https://doi.org/10.13067/JKIECS.2024.19.3.583