• Title/Summary/Keyword: Speaker Verification System (화자확인 시스템)


Real-time Eye Contact System Using a Kinect Depth Camera for Realistic Telepresence (Kinect 깊이 카메라를 이용한 실감 원격 영상회의의 시선 맞춤 시스템)

  • Lee, Sang-Beom;Ho, Yo-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.4C / pp.277-282 / 2012
  • In this paper, we present a real-time eye contact system for realistic telepresence using a Kinect depth camera. To generate the eye contact image, we capture a pair of color and depth videos. Then, the single foreground user is separated from the background. Since the raw depth data contains several types of noise, we apply a joint bilateral filtering method. We then apply a discontinuity-adaptive depth filter to the filtered depth map to reduce the disocclusion area. From the color image and the preprocessed depth map, we construct a user mesh model at the virtual viewpoint. The entire system is implemented with GPU-based parallel programming for real-time processing. Experimental results show that the proposed system is efficient in realizing eye contact and provides realistic telepresence.
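The joint bilateral filtering step described in the abstract can be sketched as follows. This is a minimal 1-D illustration, not the paper's GPU implementation: the depth signal is smoothed with weights that combine spatial closeness and similarity in a guide (color) signal, so depth edges that coincide with color edges are preserved.

```python
import math

def joint_bilateral_filter_1d(depth, guide, radius=2, sigma_s=1.0, sigma_r=10.0):
    """Smooth a 1-D depth signal guided by a color-intensity signal.

    Each output sample is a weighted average of its neighbours; the
    weights combine a spatial Gaussian with a range Gaussian computed
    on the *guide* signal, so noise is suppressed while depth edges
    aligned with color edges are kept sharp.
    """
    out = []
    for i in range(len(depth)):
        num, den = 0.0, 0.0
        for j in range(max(0, i - radius), min(len(depth), i + radius + 1)):
            w_s = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))                 # spatial kernel
            w_r = math.exp(-((guide[i] - guide[j]) ** 2) / (2 * sigma_r ** 2))   # range kernel on guide
            w = w_s * w_r
            num += w * depth[j]
            den += w
        out.append(num / den)
    return out
```

Because the range kernel is evaluated on the color guide rather than on the noisy depth itself, the filter averages only within regions that look alike in color, which is what prevents depth values from bleeding across object boundaries.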

Improvement of User Recognition Rate using Multi-modal Biometrics (다중생체인식 기법을 이용한사용자 인식률 향상)

  • Geum, Myung-Hwan;Lee, Kyu-Won;Lee, Bong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.8 / pp.1456-1462 / 2008
  • In general, a single-biometric personal authentication system is known to have limited recognition rates due to the weaknesses of the individual recognition scheme. The recognition rate of a face recognition system can be reduced by environmental factors such as illumination, while a speaker verification system does not perform well with added surrounding noise. In this paper, a multi-modal biometric system composed of face and voice recognition systems is proposed to improve on the performance of the individual authentication systems. An empirical weighted-sum rule based on the reliability of each individual authentication system is applied to improve the performance of the multi-modal biometrics. Since the proposed system is implemented as a JAVA applet with security functions, it can be used for user authentication on the generic Web.
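The reliability-weighted sum rule in the abstract can be sketched in a few lines. The function name and the idea of normalizing weights from per-modality reliability are illustrative assumptions; the paper's exact empirical weighting is not specified here.

```python
def fuse_scores(face_score, voice_score, face_reliability, voice_reliability):
    """Weighted-sum fusion of two match scores in [0, 1].

    Weights are proportional to each subsystem's reliability
    (e.g. its standalone verification accuracy), so the more
    trustworthy modality dominates the fused decision.
    """
    total = face_reliability + voice_reliability
    w_face = face_reliability / total
    w_voice = voice_reliability / total
    return w_face * face_score + w_voice * voice_score
```

With a highly reliable face subsystem and a noisier voice subsystem, a strong face score can carry a borderline voice score, which is exactly the failure mode (illumination vs. noise) that multi-modal fusion is meant to cover.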

An Avoiding Technique of Utterance Duplication for Voice-activated Chatbot (음성 기반 챗봇을 위한 중복 발화 회피 기법)

  • Jeon, Won-Pyo;Kim, Hark-Soo
    • Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.225-227 / 2011
  • Recently, voice-based chat systems have been used in various fields such as smartphones, games, robots, and other applications, but their performance is not yet satisfactory. To diversify system utterances, this paper compares the current utterance with the immediately preceding one using the sentence's content words, category, utterance time, and speaker information. If the two utterances are identical, the system utters a different sentence from the same category to ensure variety; if no category applies, it uses a retort to steer the conversation toward another topic. Experimental results confirmed that the system produces varied responses to duplicated utterances.
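The duplication-avoidance logic can be sketched as below. This is a simplified stand-in: it compares whole sentences rather than the paper's combination of content words, category, utterance time, and speaker information, and all names are hypothetical.

```python
def choose_response(candidate, last_utterance, category_sentences):
    """Pick a system response while avoiding repeating the previous one.

    candidate: the sentence the chatbot would normally say
    last_utterance: dict with 'text' and 'category' of the previous system turn
    category_sentences: mapping from category to alternative sentences
    """
    if last_utterance and candidate == last_utterance["text"]:
        category = last_utterance.get("category")
        alternatives = [s for s in category_sentences.get(category, []) if s != candidate]
        if alternatives:
            return alternatives[0]                      # same topic, different wording
        return "Let's talk about something else."       # no category match: change topic
    return candidate
```

The two branches mirror the abstract: pick another sentence from the same category when one exists, otherwise redirect the dialogue to a new topic.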

Application of AMDF for Speed Improvement in a Speech Source Estimation System (음원위치 추정 시스템에서 속도향상을 위한 AMDF의 적용)

  • 송도훈
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06d / pp.64-67 / 1998
  • For camera control that tracks the speaker's position in a videoconferencing system between remote sites, fast estimation of the time delay between microphones is required for the speech signals captured by a microphone array. This study applies the AMDF (Average Magnitude Difference Function) to the time-delay computation for sound source localization, with the aim of reducing computation time. To compare the conventional cross-correlation algorithm with the AMDF algorithm applied here, simulations were performed on synthesized sinusoidal signals of 200 Hz, 500 Hz, 1 kHz, and 2 kHz at SNRs of 10 dB and 20 dB, and on monosyllabic speech signals. The simulation results confirmed accurate time-delay estimation by the AMDF algorithm.
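AMDF-based delay estimation can be sketched as follows. The function name is illustrative; the core idea is from the definition D(τ) = (1/N) Σ |x[n] − y[n+τ]|, whose minimum over τ gives the inter-microphone delay, using only additions and subtractions rather than the multiplications of cross-correlation.

```python
def amdf_delay(x, y, max_lag):
    """Estimate the delay of y relative to x with the Average
    Magnitude Difference Function:

        D(tau) = (1/N) * sum_n |x[n] - y[n + tau]|

    The tau minimizing D is the estimated delay. Only additions and
    absolute differences are needed, which is why AMDF is cheaper
    than a multiply-heavy cross-correlation."""
    best_tau, best_d = 0, float("inf")
    for tau in range(max_lag + 1):
        n = len(x) - tau
        d = sum(abs(x[i] - y[i + tau]) for i in range(n)) / n
        if d < best_d:
            best_tau, best_d = tau, d
    return best_tau
```

For two microphones of an array, `x` and `y` would be the same utterance captured at each microphone; the recovered τ, together with the microphone spacing, gives the direction used to steer the camera.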


Depth Video Post-processing for Immersive Teleconference (원격 영상회의 시스템을 위한 깊이 영상 후처리 기술)

  • Lee, Sang-Beom;Yang, Seung-Jun;Ho, Yo-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.6A / pp.497-502 / 2012
  • In this paper, we present an immersive videoconferencing system that enables gaze correction between users in an internet protocol TV (IPTV) environment. The proposed system synthesizes gaze-corrected images using depth estimation and virtual view synthesis algorithms, two of the most important techniques of 3D video systems. The conventional process, however, causes several problems, especially temporal inconsistency of the depth video, which leads to flickering artifacts that discomfort viewers. Therefore, to reduce the temporal inconsistency, we exploit a joint bilateral filter extended to the temporal domain. In addition, we apply an outlier reduction operation in the temporal domain. Experimental results verify that the proposed system generates natural gaze-corrected images and realizes immersive videoconferencing.

Implementation of Speaker Independent Speech Recognition System Using Independent Component Analysis based on DSP (독립성분분석을 이용한 DSP 기반의 화자 독립 음성 인식 시스템의 구현)

  • 김창근;박진영;박정원;이광석;허강인
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.2 / pp.359-364 / 2004
  • In this paper, we implemented a real-time speaker-independent speech recognizer that is robust in noisy environments using a DSP (Digital Signal Processor). The implemented system is composed of a TMS320C32, a floating-point DSP from Texas Instruments Inc., and a CODEC for real-time speech input. As the feature parameter, the recognizer uses a noise-robust representation obtained by transforming the MFCC (mel-frequency cepstral coefficient) feature space with ICA (Independent Component Analysis), instead of MFCC itself. Recognition results in noisy environments show that the performance of the ICA feature parameter is superior to that of MFCC.

Speech Recognition based Message Transmission System for the Hearing Impaired Persons (청각장애인을 위한 음성인식 기반 메시지 전송 시스템)

  • Kim, Sung-jin;Cho, Kyoung-woo;Oh, Chang-heon
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.12 / pp.1604-1610 / 2018
  • The speech recognition service is used as an auxiliary means of communication for hearing impaired persons by converting the speaker's voice into text and visualizing it. However, in open environments such as classrooms and conference rooms, it is difficult to provide the speech recognition service to many hearing impaired persons at once, so a method is needed to provide it efficiently according to the surrounding environment. In this paper, we propose a system that recognizes the speaker's voice and transmits the converted text to many hearing impaired persons as messages. The proposed system uses the MQTT protocol to deliver messages to many users at the same time. The end-to-end delay was measured to confirm the service delay of the proposed system according to the QoS level setting of the MQTT protocol. The measurements show that the delay between the most reliable QoS level 2 and QoS level 0 is 111 ms, confirming that the QoS setting does not have a great influence on conversation recognition.
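The one-to-many delivery pattern the system relies on can be illustrated with a toy in-memory broker. This is a stand-in for MQTT's topic-based fan-out, not the MQTT protocol or any real client API; all names are hypothetical.

```python
class MiniBroker:
    """Toy in-memory publish/subscribe broker illustrating how one
    recognized-text message fans out to every subscriber of a topic
    at once (a conceptual stand-in for an MQTT broker)."""

    def __init__(self):
        self.subscribers = {}   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # deliver the message to every callback registered on the topic
        for callback in self.subscribers.get(topic, []):
            callback(message)

broker = MiniBroker()
received = []
# three hearing-impaired users subscribe to the same classroom topic
for user in range(3):
    broker.subscribe("classroom/1/captions", lambda msg, u=user: received.append((u, msg)))
broker.publish("classroom/1/captions", "Today we cover chapter 3.")
```

In the real system, QoS level 0 is fire-and-forget while level 2 guarantees exactly-once delivery through a handshake; the abstract's 111 ms gap is the price of that handshake.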

A Study on Background Speaker Model Design for Portable Speaker Verification Systems (휴대용 화자확인시스템을 위한 배경화자모델 설계에 관한 연구)

  • Choi, Hong-Sub
    • Speech Sciences / v.10 no.2 / pp.35-43 / 2003
  • General speaker verification systems improve their recognition performance by normalizing the log-likelihood ratio using the model of the speaker to be verified and a background speaker model. These systems therefore rely heavily on the availability of large speaker-independent databases for background speaker model design. This constraint, however, can be a burden in practical portable devices such as palm-top computers or wireless handsets, which place a premium on computation and memory. In this paper, a new approach to GMM-based background model design for portable speaker verification systems is presented for the case where only the enrollment data is available. The approach modifies three parameters of the GMM speaker model, the mixture weights, means, and covariances, along with a reduced mixture order. In an experiment on a 20-speaker population from the YOHO database, we found this method promising for effective use in a portable speaker verification system.
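The "reduced mixture order" part of the approach can be sketched as simple mixture pruning. This is an illustrative assumption about one way to shrink a GMM for a memory-constrained device, not the paper's exact parameter-modification procedure.

```python
def reduce_gmm(weights, means, covariances, target_order):
    """Prune a GMM to its `target_order` heaviest mixtures and
    renormalize the weights; a simple sketch of shrinking a
    background model for a memory-limited portable device."""
    ranked = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
    keep = ranked[:target_order]
    total = sum(weights[k] for k in keep)
    return ([weights[k] / total for k in keep],   # weights renormalized to sum to 1
            [means[k] for k in keep],
            [covariances[k] for k in keep])
```

Renormalizing the surviving weights keeps the pruned model a valid probability density, so the log-likelihood ratio normalization described in the abstract still applies unchanged.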


A Smart doorlock with recognition of facial and speaker (안면 인식과 화자 인식을 이용한 스마트 도어락)

  • Kim, Tae Kyung;Kwon, Yong Guk;Jeong, Jae Eun;Jeon, Gwang-Gil
    • Annual Conference of KIPS / 2017.11a / pp.569-570 / 2017
  • The password-based door lock systems most widely used today carry a high risk of crime because the password can be exposed to others. To address this weakness, we implemented a technique that raises security by combining two technologies, facial recognition and voice recognition. This paper presents the Voyager module, an Arduino-based module that identifies and authenticates a person, together with an Arduino and its EasyVR voice recognition module supporting speech recognition and speaker recognition. Combining the two technologies increases security and helps prevent violent crime.

Multi channel far field speaker verification using teacher student deep neural networks (교사 학생 심층신경망을 활용한 다채널 원거리 화자 인증)

  • Jung, Jee-weon;Heo, Hee-Soo;Shim, Hye-jin;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.37 no.6 / pp.483-488 / 2018
  • Far-field input utterances are one of the major causes of performance degradation in speaker verification systems. In this study, we used a teacher-student learning framework to compensate for the degradation caused by far-field utterances. Teacher-student learning refers to training a student deep neural network under a degrading condition using a teacher deep neural network trained without that condition. Here, we use a teacher network trained with near-distance utterances to train a student network with far-distance utterances. However, experiments showed that performance on near-distance utterances deteriorated. To avoid this, we propose initializing the student network with the trained teacher network and training the student on both near- and far-field utterances. Experiments were conducted with deep neural networks that take as input the raw waveforms of 4-channel utterances recorded at both near and far distances. The equal error rates for near/far-field utterances were 2.55 % / 2.8 % without teacher-student learning, 9.75 % / 1.8 % with conventional teacher-student learning, and 2.5 % / 2.7 % with the proposed techniques.
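The equal error rate (EER) metric these results are reported in can be computed as below. This is a standard threshold-sweep sketch, not code from the paper; it finds the operating point where false-acceptance and false-rejection rates meet.

```python
def equal_error_rate(genuine, impostor):
    """Sweep a decision threshold over all observed scores and return
    the point where the false-acceptance rate (impostors scored above
    threshold) and false-rejection rate (genuine trials scored below
    it) are closest, averaging the two at that threshold (the EER)."""
    best = None
    for t in sorted(set(genuine + impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)   # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)      # genuine trials rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

A single EER number summarizes the whole score distribution, which is why the abstract can compare the three training schemes with one figure per condition.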