• Title/Summary/Keyword: Speaker Detection


Optimization of State-Based Real-Time Speech Endpoint Detection Algorithm (상태변수 기반의 실시간 음성검출 알고리즘의 최적화)

  • Kim, Su-Hwan;Lee, Young-Jae;Kim, Young-Il;Jeong, Sang-Bae
    • Phonetics and Speech Sciences / v.2 no.4 / pp.137-143 / 2010
  • This paper proposes a speech endpoint detection algorithm based on state transitions. To reject short-duration acoustic pulses that are likely to be noise, it uses the duration information of all detected pulses. The parameters governing pulse lengths and the energy threshold for detecting speech intervals are optimized by exhaustive search, with the speech recognition rate as the performance index. Experimental results show that the proposed algorithm outperforms the baseline state-based endpoint detection algorithm. At 5 dB input SNR for the beamforming input, the word recognition accuracies of its outputs were 78.5% for human voice noise and 81.1% for music noise.

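The pulse-duration idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `detect_speech_intervals` and the parameters `threshold`, `min_pulse_frames`, and `max_gap_frames` are hypothetical, and the merging of nearby pulses is an added assumption. A two-state machine collects energy pulses, short pulses are rejected as noise, and surviving pulses separated by short gaps are merged into one speech interval.

```python
from typing import List, Sequence, Tuple


def detect_speech_intervals(
    energies: Sequence[float],
    threshold: float,
    min_pulse_frames: int,
    max_gap_frames: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech intervals, end exclusive."""
    # Step 1: two-state machine (silence / pulse) driven by the energy threshold.
    pulses: List[Tuple[int, int]] = []
    in_pulse = False
    start = 0
    for i, energy in enumerate(energies):
        if not in_pulse and energy >= threshold:
            in_pulse, start = True, i
        elif in_pulse and energy < threshold:
            in_pulse = False
            pulses.append((start, i))
    if in_pulse:
        pulses.append((start, len(energies)))

    # Step 2: reject pulses too short to be speech (hypothetical duration rule).
    pulses = [(s, e) for s, e in pulses if e - s >= min_pulse_frames]

    # Step 3: merge surviving pulses separated by short silent gaps.
    merged: List[Tuple[int, int]] = []
    for s, e in pulses:
        if merged and s - merged[-1][1] <= max_gap_frames:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```

With the frame energies `[0, 0, 5, 0, 0, 0, 0, 6, 7, 8, 0, 9, 9, 9, 0, 0]`, a threshold of 3, a minimum pulse length of 2 frames, and a maximum gap of 2 frames, the one-frame pulse at index 2 is discarded and the remaining two pulses merge into `[(7, 14)]`. In the paper's setting, such parameters would be the ones tuned by exhaustive search against recognition rate.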

RPCA-GMM for Speaker Identification (화자식별을 위한 강인한 주성분 분석 가우시안 혼합 모델)

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.519-527 / 2003
  • Speech is strongly affected by outliers introduced by unexpected events such as additive background noise, changes in the speaker's utterance pattern, and voice detection errors. Such outliers may severely degrade speaker recognition performance. In this paper, we propose a GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to address both outliers and the high dimensionality of training feature vectors in speaker identification. First, a new feature vector with reduced dimension is obtained by robust PCA derived from M-estimation. The robust PCA projects the original feature vector onto the reduced-dimensional linear subspace spanned by the leading eigenvectors of the covariance matrix of the feature vectors. Second, a GMM with diagonal covariance matrices is trained on these transformed feature vectors. We performed speaker identification experiments to show the effectiveness of the proposed method, comparing RPCA-GMM on the transformed feature vectors with PCA and with the conventional GMM with diagonal covariance matrices. For every 2% increase in the proportion of outliers, the proposed method maintains almost the same speaker identification rate, degrading by only 0.03%, while the conventional GMM and the PCA degrade by 0.65% and 0.55%, respectively. This means that our method is more robust to outliers.
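
As a rough illustration of why M-estimation resists outliers, the sketch below computes a Huber M-estimate of a one-dimensional mean by iteratively reweighted averaging. It is a simplified stand-in, not the RPCA-GMM procedure of the paper: the function name `huber_location`, the tuning constant `k = 1.345`, and the MAD-based scale are assumptions chosen for the example. The same down-weighting principle is what a robust PCA would apply when estimating the mean and covariance before extracting eigenvectors.

```python
import statistics
from typing import Sequence


def huber_location(samples: Sequence[float], k: float = 1.345,
                   max_iters: int = 50, tol: float = 1e-9) -> float:
    """Huber M-estimate of location via iteratively reweighted averaging."""
    mu = statistics.median(samples)
    # Robust scale from the median absolute deviation (fallback 1.0 if zero).
    mad = statistics.median([abs(x - mu) for x in samples])
    scale = 1.4826 * mad or 1.0
    cutoff = k * scale
    for _ in range(max_iters):
        # Samples within the cutoff keep full weight; distant ones are down-weighted.
        weights = [1.0 if abs(x - mu) <= cutoff else cutoff / abs(x - mu)
                   for x in samples]
        new_mu = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu
```

For the samples `[1.0, 1.1, 0.9, 1.05, 0.95, 50.0]`, the ordinary mean is pulled above 9 by the single outlier, while the Huber estimate stays close to 1.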

Development of a Cost-Effective Tele-Robot System Delivering Speaker's Affirmative and Negative Intentions (화자의 긍정·부정 의도를 전달하는 실용적 텔레프레즌스 로봇 시스템의 개발)

  • Jin, Yong-Kyu;You, Su-Jeong;Cho, Hye-Kyung
    • The Journal of Korea Robotics Society / v.10 no.3 / pp.171-177 / 2015
  • A telerobot offers more engaging and enjoyable interaction with people at a distance by communicating via audio, video, expressive gestures, body pose and proxemics. To provide these potential benefits at a reasonable cost, this paper presents a telepresence robot system for video communication that can deliver the speaker's head motion through its display stanchion. Head gestures such as nodding and head-shaking can give crucial information during conversation. A speaker's eye gaze, known to be one of the key non-verbal signals for interaction, can also be inferred from his/her head pose. To develop an efficient head tracking method, a 3D cylinder-like head model is employed, and the Harris corner detector is combined with Lucas-Kanade optical flow, which is known to be suitable for extracting the 3D motion information of the model. In particular, a skin color-based face detection algorithm is proposed to achieve robust performance under varying directions while maintaining reasonable computational cost. The performance of the proposed head tracking algorithm is verified through experiments using BU's standard data sets. The design of the robot platform is also described, along with the design of supporting systems such as video transmission and robot control interfaces.

Invisible Messenger: A System to Whisper in a Person's Ear Remotely by Integrating Visual Tracking and Speaker Array

  • Mizoguchi, Hiroshi;Kanamori, Tomohiko;Okabe, Kosuke;Hiraoka, Kazuyuki;Tanaka, Masaru;Shigehara, Takaomi;Mishima, Taketoshi
    • Proceedings of the IEEK Conference / 2002.07c / pp.1897-1900 / 2002
  • This paper proposes a novel computer-human interface named Invisible Messenger. It integrates face detection and tracking with speaker array signal processing. With a speaker array, it is possible to form an acoustic focus at an arbitrary location measured by the face tracking. Thus the proposed system can whisper in a person's ear as if an invisible virtual messenger were standing by the person. Beyond speculative discussion, the authors have implemented a working prototype system based on the proposed idea, which this paper also describes. To confirm the effectiveness of the proposed idea, the authors conducted experiments using the implemented system. Experimental results demonstrate the effectiveness of the proposed idea.


Bidirectional Alarm Equipment for Protection for Trackside Worker using Bone-anchored Speaker

  • Hwang, Jong-Gyu;Jo, Hyun-Jeong
    • International Journal of Safety / v.10 no.1 / pp.36-40 / 2011
  • Personnel who maintain or repair railway tracks or trackside signaling facilities may experience sensory impairment when working at the trackside for long periods. In such cases, workers may collide with a train because they fail to recognize an approaching motor-car even when it comes close to the workplace, owing to the sensory block phenomenon caused by long hours of continuous, monotonous maintenance work. To prevent such accidents, safety alarm equipment was developed in which the approaching motor-car sends radio signals, providing a bidirectional detection mechanism between the approaching train and trackside personnel. Using the same transmitting/receiving configuration, the equipment could potentially be applied to various forms of worker safety equipment, not only the safety helmet worn by maintenance workers. This paper presents the new alarm equipment, a bone-anchored speaker-based safety helmet to be worn by maintenance workers.


Speech Interface with Echo Canceller and Barge-In Functionality for Telematic System (텔레매틱스 시스템을 위한 반향제거 및 Barge-In 기능을 갖는 음성인터페이스)

  • Kim, Jun;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea / v.28 no.5 / pp.483-490 / 2009
  • In this paper, we develop a speech interface with acoustic echo cancelling and barge-in functionality for the car environment. In the echo canceller, a double-talk (DT) detection algorithm that uses the correlation coefficients between the reference and desired signals often makes DT detection errors in background noise. We reduce these errors by using the average power of the noise and echo estimated from the input signal. In addition, to let drivers give speech commands to the system by interrupting the speaker output, barge-in functionality is implemented by combining DT detection with appropriate gain control of the speaker output. In computer simulations of an assumed car environment and in experiments in a real laboratory environment, the implemented speech interface performed well in removing acoustic echo signals in noisy conditions, with proper operation of the barge-in functionality.
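
The correlation-based double-talk test mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' detector: the names `is_double_talk`, `corr_threshold`, and `power_margin` are hypothetical, the echo path is assumed to have no delay (so a zero-lag correlation is enough), and `expected_power` stands in for the noise-plus-echo power that the paper estimates from the input signal. Double-talk is declared only when the microphone frame correlates poorly with the reference and its power clearly exceeds the expected level.

```python
import math
from typing import Sequence


def correlation_coefficient(x: Sequence[float], d: Sequence[float]) -> float:
    """Zero-lag Pearson correlation between reference x and microphone d."""
    n = len(x)
    mean_x, mean_d = sum(x) / n, sum(d) / n
    cov = sum((a - mean_x) * (b - mean_d) for a, b in zip(x, d))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_d = sum((b - mean_d) ** 2 for b in d)
    if var_x == 0.0 or var_d == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_d)


def is_double_talk(reference: Sequence[float], mic: Sequence[float],
                   expected_power: float, corr_threshold: float = 0.8,
                   power_margin: float = 2.0) -> bool:
    """Flag double-talk: low correlation AND power above the expected level."""
    rho = abs(correlation_coefficient(reference, mic))
    mic_power = sum(v * v for v in mic) / len(mic)
    # The power check guards against low correlation caused by noise alone.
    return rho < corr_threshold and mic_power > power_margin * expected_power
```

The power guard is the part that reflects the abstract's point: a frame containing only weak background noise also correlates poorly with the reference, and without the guard it would be misclassified as double-talk.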

A 3-Level Endpoint Detection Algorithm for Isolated Speech Using Time and Frequency-based Features

  • Eng, Goh Kia;Ahmad, Abdul Manan
    • 제어로봇시스템학회:학술대회논문집 / 2004.08a / pp.1291-1295 / 2004
  • This paper proposes a new approach to endpoint detection of isolated speech that significantly improves endpoint detection performance. The proposed algorithm relies on the root mean square (RMS) energy, the zero crossing rate, and the spectral characteristics of the speech signal, with a Euclidean distance measure on cepstral coefficients used to accurately detect the endpoints of isolated speech. The algorithm offers better performance than the traditional energy-based algorithm. The vocabulary for the experiment consists of the English digits from one to nine, and the experiments were conducted on 360 utterances from a male speaker. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computational overhead of the algorithm is low, since the cepstral coefficients are reused later in the feature extraction stage of the speech recognition procedure.

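The frame-level features named in the abstract above are standard and can be sketched in a few lines. The helper names `rms_energy`, `zero_crossing_rate`, and `cepstral_distance` are hypothetical, and how the paper combines the three features into its three decision levels is not reproduced here.

```python
import math
from typing import Sequence


def rms_energy(frame: Sequence[float]) -> float:
    """Root mean square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def cepstral_distance(c1: Sequence[float], c2: Sequence[float]) -> float:
    """Euclidean distance between two cepstral coefficient vectors."""
    return math.dist(c1, c2)
```

A frame alternating between 1 and -1 has an RMS energy of 1.0 and a zero crossing rate of 1.0, while a constant frame has a zero crossing rate of 0. Voiced speech typically shows high energy and a low crossing rate, which is why the two features complement each other.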

Some effects of audio-visual speech in perceiving Korean

  • Kim, Jee-Sun;Davis, Chris
    • Annual Conference on Human and Language Technology / 1999.10e / pp.335-342 / 1999
  • The experiments reported here investigated whether seeing a speaker's face (visible speech) affects the perception and memory of Korean speech sounds. To exclude the possibility of top-down, knowledge-based influences on perception and memory, the experiments tested people with no knowledge of Korean. The first experiment examined whether visible speech (Auditory and Visual - AV) assists native English speakers with no knowledge of Korean in detecting a syllable within a Korean speech phrase. A syllable was more likely to be detected within a phrase when the participants could see the speaker's face. The second experiment investigated whether native English speakers' judgments about the duration of a Korean phrase would be affected by visible speech. In the AV condition, participants' estimates of phrase duration were highly correlated with the actual durations, whereas those in the AO (auditory-only) condition were not. The results are discussed with respect to the benefits of communication with multimodal information and future applications.


Enhancement of Authentication Performance based on Multimodal Biometrics for Android Platform (안드로이드 환경의 다중생체인식 기술을 응용한 인증 성능 개선 연구)

  • Choi, Sungpil;Jeong, Kanghun;Moon, Hyeonjoon
    • Journal of Korea Multimedia Society / v.16 no.3 / pp.302-308 / 2013
  • In this research, we explore a personal authentication system based on multimodal biometrics for the mobile computing environment. We selected face and speaker recognition to implement the multimodal biometrics system. For face recognition, we detect the face with the Modified Census Transform (MCT). The detected face is pre-processed by an eye detection module based on the k-means algorithm, and the face is then recognized with the Principal Component Analysis (PCA) algorithm. For speaker recognition, we extract features using the end-point of the voice and the Mel Frequency Cepstral Coefficient (MFCC), and then verify the speaker with the Dynamic Time Warping (DTW) algorithm. The proposed multimodal biometrics system shows an improved verification rate by combining the two biometrics described above. We implemented the proposed system in the Android environment using the Galaxy S hoppin. The proposed system achieves a reduced false acceptance ratio (FAR) of 1.8%, an improvement over the single-biometric systems using face and voice (4.6% and 6.7%, respectively).
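
The Dynamic Time Warping step used for speaker verification above can be sketched with the classic dynamic-programming recurrence. This is a minimal illustration, not the paper's implementation: the function name `dtw_distance` is hypothetical, and scalar features with an absolute-difference local cost are assumed for brevity, whereas an MFCC-based system would compare feature vectors, for example with a Euclidean distance.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Accumulated cost of the best monotonic alignment between a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because the alignment may repeat frames, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 even though the sequences differ in length, which is what makes DTW tolerant of variations in speaking rate. A verification system would accept a claimed identity when the distance to the enrolled template falls below a threshold.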

Development of a Real-time Voice Recognition Dialing System (실시간 음성인식 다이얼링 시스템 개발)

  • 이세웅;최승호;이미숙;김흥국;오광철;김기철;이황수
    • Information and Communications Magazine / v.10 no.10 / pp.22-29 / 1993
  • This paper describes the development of a real-time voice recognition dialing system that can recognize a vocabulary of around one hundred words in speaker-independent mode. The voice recognition algorithm is implemented on a DSP board with a telephone interface plugged into an IBM PC AT/486. On the DSP board, feature extraction, vector quantization (VQ), and end-point detection are performed simultaneously in every 10 ms frame interval after the word starting point is detected, to satisfy real-time constraints. In addition, we optimize the VQ codebook size and the end-point detection procedure to reduce recognition time and memory requirements. The demonstration system is being displayed in MOBILAB of Korea Mobile Telecom at the Taejon EXPO '93.

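The vector quantization step mentioned in the abstract above maps each frame's feature vector to the index of its nearest codeword. The sketch below shows only that lookup; the function name `quantize` and the toy codebook are hypothetical, and the paper's codebook training and DSP implementation are not reproduced.

```python
import math
from typing import Sequence


def quantize(vector: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest to vector (Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(vector, codebook[i]))
```

The lookup cost grows linearly with the number of codewords, which is one reason the codebook size matters for recognition time and memory in a real-time system.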