• Title/Summary/Keyword: robust speech recognition


Proposal of speaker change detection system considering speaker overlap (화자 겹침을 고려한 화자 전환 검출 시스템 제안)

  • Park, Jisu; Yun, Young-Sun; Cha, Shin; Park, Jeon Gue
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.466-472 / 2021
  • Speaker change detection (SCD) is the task of finding the moment when the main speaker changes from one person to the next in a spoken conversation. The task is difficult because of overlapping speakers, inaccurate labeling, and data imbalance. To address these problems, data from the TIMIT corpus, which is widely used in speech recognition, were artificially concatenated to obtain a sufficient amount of training data, and speaker changes were detected after overlapping speakers had been identified. In this paper, we propose a speaker change detection system that takes speaker overlap into account. We evaluated and verified its performance using various approaches. As a result, a detection system similar to the X-Vector structure was adopted to remove speaker-overlap regions, and a Bi-LSTM was selected to model speaker change. The experimental results show relative performance improvements of 4.6% and 13.8%, respectively, compared to the baseline system. We also conclude that a robust speaker change detection system could be built through follow-up studies, based on these experimental results, that take text and speaker information into account.
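
The artificial concatenation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `concatenate_utterances`, the `tolerance` collar around each boundary, and the toy sample values are all assumptions made for illustration, and speaker overlap is not modeled here.

```python
from typing import List, Tuple


def concatenate_utterances(
    utterances: List[Tuple[str, List[float]]],
    tolerance: int = 0,
) -> Tuple[List[float], List[int], List[int]]:
    """Join single-speaker utterances into one artificial conversation.

    utterances: (speaker_id, samples) pairs in playback order.
    tolerance:  hypothetical collar, in samples, marked as "change"
                on both sides of each speaker boundary.
    Returns the joined signal, per-sample 0/1 change labels, and the
    sample indices at which the speaker changes.
    """
    signal: List[float] = []
    change_points: List[int] = []
    previous_speaker = None
    for speaker, samples in utterances:
        # A change point exists only when the speaker identity differs.
        if previous_speaker is not None and speaker != previous_speaker:
            change_points.append(len(signal))
        signal.extend(samples)
        previous_speaker = speaker

    labels = [0] * len(signal)
    for point in change_points:
        start = max(0, point - tolerance)
        stop = min(len(signal), point + tolerance + 1)
        for i in range(start, stop):
            labels[i] = 1
    return signal, labels, change_points


# Toy data: two consecutive utterances by speaker "A", then "B", then "C".
toy = [("A", [0.1] * 4), ("A", [0.2] * 2), ("B", [0.3] * 3), ("C", [0.4] * 2)]
signal, labels, change_points = concatenate_utterances(toy, tolerance=1)
```

Widening the collar raises the share of positive labels, which is one simple way to soften the data imbalance the abstract mentions; whether the authors did this is not stated.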

Simultaneous Speaker and Environment Adaptation by Environment Clustering in Various Noise Environments (다양한 잡음 환경하에서 환경 군집화를 통한 화자 및 환경 동시 적응)

  • Kim, Young-Kuk; Song, Hwa-Jeon; Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.566-571 / 2009
  • This paper proposes a noise-robust fast speaker adaptation method based on the eigenvoice framework for various noisy environments. The proposed method focuses on de-noising and environment clustering. Because the de-noised adaptation DB still contains residual noise, environment clustering divides the noisy adaptation data into similar environments using a clustering method whose feature vector is the cepstral mean of non-speech segments. The adaptation data in each cluster are then used to build an environment-clustered speaker-adapted (SA) model. After selecting multiple environment-clustered SA models that are similar to the test environment, speaker adaptation is performed with an appropriate linear combination of those models. In our experiments, the proposed method reduces the error rate by 40% to 59% over the baseline speaker-independent model.
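
The environment clustering step can be sketched as follows. The abstract only says "a clustering method", so the use of k-means here is an assumption, as are the function names, the deterministic initialization, and the two-dimensional toy "cepstra".

```python
from typing import List, Sequence

Vector = List[float]


def cepstral_mean(frames: Sequence[Sequence[float]]) -> Vector:
    """Average the cepstral vectors of one utterance's non-speech frames."""
    count = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / count for d in range(dim)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means (assumed here; the paper does not name its method).

    Centroids start at the first k points so the result is deterministic.
    Returns the cluster index assigned to each point.
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = cepstral_mean(members)
    return assignments


# Toy non-speech frames for four adaptation utterances (hypothetical values).
non_speech = [
    [[1.0, 0.0], [1.2, 0.2]],
    [[5.0, 5.0], [5.2, 4.8]],
    [[0.9, 0.1], [1.1, -0.1]],
    [[4.8, 5.2], [5.0, 5.0]],
]
features = [cepstral_mean(frames) for frames in non_speech]
environment_ids = kmeans(features, k=2)
```

Each resulting cluster would then supply the adaptation data for one environment-clustered SA model; that training step and the linear combination of models are outside this sketch.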

A Study on the Weight Allocation Method of Humanist Input Value and Multiplex Modality using Tacit Data (암묵 데이터를 활용한 인문학 인풋값과 다중 모달리티의 가중치 할당 방법에 관한 연구)

  • Lee, Won-Tae; Kang, Jang-Mook
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.4 / pp.157-163 / 2014
  • A user's sensibility is recognized as a very important parameter for communication among companies, governments, and individuals. In many studies, researchers use voice tone, speaking rate, facial expression, direction and speed of body movement, and gestures to recognize it. Multiplex modality is more precise than a single modality; however, its recognition rate is limited, multi-sensing overloads data processing, and a good algorithm is needed to infer the sensed values. That is, because each modality has a different concept and different properties, errors can occur when human sensibility is converted to standard values. To address this, the modality that expresses sensibility must be extracted from the multiplex modality using techniques such as relational network analysis, context understanding, and digital filtering. If, in a specific situation, the priority modality and the other surrounding modalities are converted to implicit values, a robust system can be built relative to the computing resources consumed. This paper proposes a method for assigning weights to multiplex modalities using implicit data.

Robust Real-time Pose Estimation to Dynamic Environments for Modeling Mirror Neuron System (거울 신경 체계 모델링을 위한 동적 환경에 강인한 실시간 자세추정)

  • Jun-Ho Choi; Seung-Min Park
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.19 no.3 / pp.583-588 / 2024
  • With the emergence of Brain-Computer Interface (BCI) technology, analyzing mirror neurons has become more feasible. However, evaluating the accuracy of BCI systems that rely on human thoughts poses challenges due to their qualitative nature. To harness the potential of BCI, we propose a new approach to measure accuracy based on the characteristics of mirror neurons in the human brain that are influenced by speech speed, depending on the ultimate goal of movement. In Chapter 2 of this paper, we introduce mirror neurons and provide an explanation of human posture estimation for mirror neurons. In Chapter 3, we present a powerful pose estimation method suitable for real-time dynamic environments using the technique of human posture estimation. Furthermore, we propose a method to analyze the accuracy of BCI using this robotic environment.