• Title/Summary/Keyword: Speaker identification systems

Search Result 28, Processing Time 0.019 seconds

Speaker Identification using Incremental Neural Network and LPCC (Incremental Neural Network 과 LPCC을 이용한 화자인식)

  • 허광승;박창현;이동욱;심귀보
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2002.12a / pp.341-344 / 2002
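  • Speech carries the characteristics of its speaker. This paper introduces a speaker recognition system that uses incremental learning based on a neural network. Sentences recorded on a computer are transformed into the frequency domain with an FFT, and vowels are extracted using formants, which carry the characteristics of vowels. The extracted vowels are processed with LPC to obtain coefficient values that carry the speaker's characteristics. After the LPCC step and vector quantization, 10 feature points are fed in as the input for learning, and speaker recognition is performed by a neural network whose hidden layer and output layer grow with the number of speakers.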
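
As a rough illustration of the LPC and LPCC steps mentioned in the abstract above, the following Python sketch estimates LPC coefficients with the autocorrelation method (Levinson-Durbin recursion) and converts them to cepstral coefficients with the standard recursion. The function names, the use of NumPy, and the predictor convention are assumptions made for illustration; the paper's formant-based vowel extraction, vector quantization, and incremental network are not reproduced here.

```python
import numpy as np

def lpc(frame, order):
    """Estimate predictor coefficients a[1..order] so that
    x[n] is approximated by sum_k a[k] * x[n-k] (autocorrelation method)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)   # a[0] is unused
    err = r[0]                # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c[1..n_ceps]."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

Calling `lpc_to_cepstrum(lpc(frame, 10), 10)` on one vowel frame yields 10 values, the same count as the feature points mentioned in the abstract; whether the authors derived their 10 features in exactly this way is not stated in the source.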

Lie Detection Technique using Video from the Ratio of Change in the Appearance

  • Hossain, S.M. Emdad;Fageeri, Sallam Osman;Soosaimanickam, Arockiasamy;Kausar, Mohammad Abu;Said, Aiman Moyaid
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.165-170 / 2022
  • Lying is a nuisance to all, and all liars know it is a nuisance but still keep on lying. People are sometimes unsure how to escape from, or how to detect, a liar when they lie. In this research we aim to establish a dynamic platform that identifies liars through video analysis, specifically by calculating the ratio of change in their appearance when they lie. The platform will be developed using a machine learning algorithm together with a dynamic classifier to classify the liar. For the experimental analysis, the dataset will be processed in two classes (people lying and people telling the truth). The facial-appearance parameters of both classes will be stored for future identification. Similarly, a standard parameter will be built for truthful speakers and for liars. We hope this standard parameter will make it possible to identify a liar without pre-captured data.

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.17-32 / 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. It plays an important role in automatic voice systems, and its importance is growing as portable devices, voice technology, and audio content continue to expand. Previous speaker recognition studies have aimed to determine automatically who the speaker is from voice files and to improve accuracy. Speech style is an important sociolinguistic subject: it contains very useful information that reveals the speaker's attitude, conversational intention, and personality, and this can be an important clue for speaker recognition. The sentence-final ending used in a speaker's speech determines the type of sentence and carries information such as the speaker's intention, psychological attitude, or relationship to the listener. Because the use of sentence-final endings varies in probability with the characteristics of the speaker, the type and distribution of the endings used by an unidentified speaker can help in recognizing that speaker. However, few studies have considered speech style in existing text-based speaker recognition, and adding speech style information to speech-signal-based speaker recognition could further improve accuracy. Hence, the purpose of this paper is to propose a novel method that uses speech style, expressed as sentence-final endings, to improve the accuracy of Korean speaker recognition. To this end, a method called sentence sequencing is proposed, which generates vector values from the type and frequency of the sentence-final endings appearing in the utterances of a specific person. To evaluate the performance of the proposed method, training and performance evaluation were conducted with an actual drama script. The method proposed in this study can be used as a means to improve the performance of Korean speech recognition services.
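
To make the idea of turning sentence-final endings into a vector more concrete, here is a minimal Python sketch that counts which ending closes each sentence of a speaker and returns the relative frequencies. The ending inventory `ENDINGS`, the suffix-matching rule, and the function names are hypothetical; the abstract does not give the paper's inventory or its exact sentence sequencing procedure.

```python
from collections import Counter

# Hypothetical inventory of sentence-final endings, longest first so that
# "습니다" is matched before the shorter "다". Not taken from the paper.
ENDINGS = ["습니다", "습니까", "어요", "아요", "지요", "네요", "다", "까", "니"]

def final_ending(sentence):
    """Return the first inventory ending that closes the sentence, or None."""
    stripped = sentence.strip().rstrip(".?!")
    for ending in ENDINGS:
        if stripped.endswith(ending):
            return ending
    return None

def ending_vector(sentences):
    """Relative frequency of each inventory ending over a speaker's sentences."""
    counts = Counter(e for e in map(final_ending, sentences) if e is not None)
    total = sum(counts.values())
    return [counts[e] / total if total else 0.0 for e in ENDINGS]
```

Vectors built this way for each character in a script could then be compared or passed to a classifier; that downstream step is left out because the abstract does not describe it.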

Utilization of age information for speaker verification using multi-task learning deep neural networks (멀티태스크 러닝 심층신경망을 이용한 화자인증에서의 나이 정보 활용)

  • Kim, Ju-ho;Heo, Hee-Soo;Jung, Jee-weon;Shim, Hye-jin;Kim, Seung-Bin;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.38 no.5 / pp.593-600 / 2019
  • Similarity in tone between speakers can lower the performance of speaker verification. To improve the performance of speaker verification systems, we propose a multi-task learning technique in which a deep neural network learns both speaker information and age information. Multi-task learning can improve generalization performance because it helps prevent the hidden layers of a deep neural network from overfitting to one task. However, we found in experiments that age information is not learned well while the deep neural network is being trained. To improve the learning, we propose a method that dynamically changes the objective function weights of speaker identification and age estimation during training. On the RSR2015 evaluation data set, the equal error rate was 6.91% for the speaker verification system without age information, 6.77% using age information only, and 4.73% using age information with the weight-change technique applied.
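
The weight-change idea in the abstract above can be sketched as a combined objective whose two task weights move during training. The linear schedule, the start and end values, and the function names below are assumptions chosen for illustration; the abstract does not say which schedule the authors used.

```python
def task_weights(epoch, total_epochs, age_start=0.1, age_end=0.5):
    """Hypothetical linear schedule: the age-task weight moves from
    age_start to age_end over training; the speaker weight is the remainder."""
    progress = epoch / max(1, total_epochs - 1)
    w_age = age_start + progress * (age_end - age_start)
    return 1.0 - w_age, w_age

def multitask_loss(loss_speaker, loss_age, epoch, total_epochs):
    """Weighted sum of the speaker-identification and age-estimation losses."""
    w_speaker, w_age = task_weights(epoch, total_epochs)
    return w_speaker * loss_speaker + w_age * loss_age
```

In a training loop, `multitask_loss` would be evaluated on the two per-batch losses produced by the network's speaker and age output heads, with `epoch` advancing the weights.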

Modified HMM Decoder based on Observation Confidence for Speaker Identification (화자인식을 위한 관측신뢰도 기반 변형된 HMM 디코더)

  • Tariquzzaman, Md.;Min, So-Hui;Kim, Jin-Yeong;Na, Seung-Yu
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.443-446 / 2007
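  • Speech signals are distorted by noise or by the characteristics of the transmission channel, and distorted speech greatly degrades the performance of speech recognition and speaker recognition. To overcome this problem, this paper takes the signal-to-noise ratio (SNR)-based confidence weighting technique applied to Gaussian mixture models (GMM) [1][2] and adapts it to a hidden Markov model (HMM) decoder. The HMM decoder is modified by weighting the observation probability of each HMM state with the confidence proposed in [1]. To verify the performance of the proposed method, context-dependent speaker identification experiments were carried out using a Korean mobile-phone speech database for speaker recognition built by ETRI. The experiments confirm that the proposed method greatly improves the speaker recognition rate compared with the conventional method.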
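
One way to picture a confidence-weighted HMM decoder is a Viterbi pass in which each frame's observation log-probabilities are scaled by a per-frame confidence before they enter the recursion. The Python sketch below takes that form as an assumption; the exact weighting used in [1] and in this paper is not given in the abstract, and the function name and argument layout are illustrative.

```python
import numpy as np

def weighted_viterbi_score(log_pi, log_A, log_B, confidence):
    """Best-path log score with confidence-weighted observations.

    log_pi:     (N,)   initial state log-probabilities
    log_A:      (N, N) transition log-probabilities, log_A[i, j] for i -> j
    log_B:      (T, N) observation log-probabilities per frame and state
    confidence: (T,)   per-frame weights in [0, 1], e.g. derived from SNR
    """
    confidence = np.asarray(confidence, dtype=float)
    # Scaling a log-probability by w is the same as raising b_j(o_t) to the power w.
    weighted = confidence[:, None] * np.asarray(log_B, dtype=float)
    delta = log_pi + weighted[0]
    for t in range(1, len(weighted)):
        # Best predecessor for each state j, then add the weighted observation term.
        delta = np.max(delta[:, None] + log_A, axis=0) + weighted[t]
    return float(np.max(delta))
```

With all confidences equal to 1 this reduces to the ordinary Viterbi score, and frames with confidence near 0 contribute almost nothing, which is the intended effect for heavily distorted frames. For speaker identification, the score would be computed against each speaker's HMM and the highest-scoring speaker selected.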

Performance Comparison by Characteristic Parameter of Speaker Identification System using Neural Networks (신경회로망을 이용한 화자식별 시스템의 특징 파라미터에 따른 성능비교)

  • 정재룡;유재훈;배현;전병희;김성신
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2002.12a / pp.345-348 / 2002
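  • Speech recognition technology is broadly classified into two types: speech recognition and speaker recognition. Speech recognition is currently the more widely studied, but the importance of speaker recognition is steadily growing. This paper studies speaker identification, a branch of speaker recognition that identifies an arbitrary speaker; it presents feature extraction methods for a neural-network-based speaker identification system and compares the resulting performance. In the identification stage, 78 speech samples from 26 speakers are learned with the backpropagation algorithm of the neural network, and a speech sample from one speaker is then used for testing and identified. Linear prediction coefficients, Mel-frequency cepstral coefficients, and wavelet-based cepstral coefficients were used as the feature parameters that form the input variables of the neural network. As a result, using the mixed feature parameters as the input to neural network model 2 of the speaker identification system gave the best performance, by a margin of 8.46 to 21.53% over the other parameters.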

Design of the broadband and compact phase-calibrator for array microphones (어레이 마이크로폰용 광대역 소형 위상교정기의 설계)

  • Ju, Hyeong-Sick;Kim, Yang-Hann
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2004.11a / pp.1032-1035 / 2004
  • Pressure distribution is measured by array microphones to identify noise sources in space. For example, beamforming and acoustic holography use phase information to identify the source. Phase is therefore essential for correctly identifying the source position. However, due to the characteristics of the microphones and the measuring system, measured signals always contain errors, which make identification difficult. Phase calibration of the microphones is therefore needed. Duct and speaker systems are generally used as calibrators. The acoustic characteristics of the calibrator are, of course, functions of many parameters of the system, such as duct size, frequency, and microphone spacing. In this paper, the design parameters that affect the performance and size of the calibrator are considered. These parameters are then applied to the design and to an actual product of the phase calibrator.
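
As a small illustration of what phase calibration has to measure, the Python sketch below estimates the phase mismatch between a reference microphone and a second microphone exposed to the same tone, using the cross-spectrum at the tone's frequency bin. It is not the duct-based calibrator design discussed in the paper; the function name, the Hann window, and the single-tone test signal are assumptions.

```python
import numpy as np

def phase_mismatch_deg(ref, mic, fs, freq):
    """Phase of `mic` relative to `ref` at `freq`, in degrees.

    A negative value means `mic` lags `ref`.
    """
    ref = np.asarray(ref, dtype=float)
    mic = np.asarray(mic, dtype=float)
    n = len(ref)
    window = np.hanning(n)                 # reduce spectral leakage
    ref_spec = np.fft.rfft(ref * window)
    mic_spec = np.fft.rfft(mic * window)
    k = int(round(freq * n / fs))          # FFT bin closest to the tone
    cross = mic_spec[k] * np.conj(ref_spec[k])
    return float(np.degrees(np.angle(cross)))
```

Repeating the measurement over a range of tone frequencies would give a per-frequency correction table for each microphone in the array.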

Study on development of the remote control door lock system including speaker verification function in real time (화자 인증 기능이 포함된 실시간 원격 도어락 제어 시스템 개발에 관한 연구)

  • Kwon, Soon-Ryang
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.6 / pp.714-719 / 2005
  • This paper designs and implements a system that lets a home owner remotely check a visitor's speech or image by mobile phone. The system is designed to identify a visitor through an automatic calling service to the mobile phone, rather than through a short message, even when the home owner is away. Door locks are generally controlled through the home server, but from a real-time point of view it is more effective to control them with DTMF signals. The technology suggested in this paper lets the visitor and the home owner communicate by automatically placing a call to the home owner's mobile phone when the visitor arrives, even if the home owner is away, and, if necessary, lets the home owner control the door lock remotely. Thanks to the system, the home owner is not restricted by time or place when checking the visitor's identity and controlling the door lock. In addition, security is improved by changing the verification procedure required for controlling the door lock and configuring the environment from the existing password-only form to a combination of password and speaker verification, in consideration of the problems that may arise when the mobile phone is lost. Also, existing problems such as having to reconnect to the network to control the door lock are solved by controlling the door lock in real time with DTMF signals during the call.
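
Since the system above relies on DTMF signals sent during a call, a short Python sketch of DTMF key detection may help. It uses the Goertzel algorithm to measure the energy at the eight standard DTMF frequencies and picks the strongest row and column tone. The 8 kHz sampling rate and the function names are assumptions, and how the paper maps keys to door lock commands is not stated in the abstract.

```python
import math

# Standard DTMF row (low) and column (high) frequencies in Hz.
LOW = [697, 770, 852, 941]
HIGH = [1209, 1336, 1477, 1633]
KEYS = ["123A", "456B", "789C", "*0#D"]

def goertzel_power(samples, fs, freq):
    """Signal power at a single frequency, computed with the Goertzel recursion."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_key(samples, fs=8000):
    """Return the DTMF key whose row and column tones carry the most power."""
    low = max(LOW, key=lambda f: goertzel_power(samples, fs, f))
    high = max(HIGH, key=lambda f: goertzel_power(samples, fs, f))
    return KEYS[LOW.index(low)][HIGH.index(high)]
```

A practical detector would also check that both tones exceed a power threshold and last long enough before accepting a key; those checks are omitted here to keep the sketch short.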