• Title/Summary/Keyword: Speaker identification

Design of the broadband and compact phase-calibrator for array microphones (어레이 마이크로폰용 광대역 소형 위상교정기의 설계)

  • Ju, Hyeong-Sick;Kim, Yang-Hann
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 2004.11a
    • /
    • pp.1032-1035
    • /
    • 2004
  • Pressure distributions are measured with array microphones to identify noise sources in space. For example, beamforming and acoustic holography use phase information to identify the source, so phase is essential for correctly locating the source position. However, because of microphone characteristics and the measuring system, measured signals always contain errors, which make identification difficult. Phase calibration of the microphones is therefore needed. Duct-and-speaker systems are generally used as calibrators. The acoustic characteristics of a calibrator depend on many system parameters, such as duct size, frequency, and microphone spacing. This paper considers the design parameters that affect the performance and size of such calibrators and then applies them to the design and fabrication of a phase calibrator.

Lie Detection Technique using Video from the Ratio of Change in the Appearance

  • Hossain, S.M. Emdad;Fageeri, Sallam Osman;Soosaimanickam, Arockiasamy;Kausar, Mohammad Abu;Said, Aiman Moyaid
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.7
    • /
    • pp.165-170
    • /
    • 2022
  • Lying is a nuisance to everyone, and liars know it, yet they keep lying. People are sometimes unsure how to avoid a liar or how to detect a lie. In this research, we aim to establish a dynamic platform that identifies liars through video analysis, specifically by calculating the ratio of change in their appearance when they lie. The platform will be developed using a machine learning algorithm together with a dynamic classifier. For the experimental analysis, the dataset is processed in two classes (people lying and people telling the truth). The facial-appearance parameters for both classes will be stored for future identification. Similarly, standard parameters will be built for truthful speakers and liars. We hope these standard parameters will make it possible to identify a liar without pre-captured data.

Face Recognition and Preprocessing Technique for Speaker Identification in hard of hearing broadcasting (청각장애인용 방송에서 화자 식별을 위한 얼굴 인식 알고리즘 및 전처리 연구)

  • Kim, Nayeon;Cho, Sukhee;Bae, Byungjun;Ahn, ChungHyun
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2020.07a
    • /
    • pp.450-452
    • /
    • 2020
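  • This paper examines deep-learning-based face recognition algorithms and applies them to actor face recognition, with the goal of identifying speakers and displaying emotion-expressing captions in broadcasting for the hearing impaired. As an approach to actor face recognition, we first study the structure of the ResNet-50-based VGGFace2 model, a deep-learning face recognition algorithm based on one-shot learning. Building on this model, we apply various preprocessing methods and measure the resulting accuracy in order to find a practical way to recognize actors' faces in real broadcasting for the hearing impaired.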

Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

  • Lee Ki-Seung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.3
    • /
    • pp.150-159
    • /
    • 2005
  • This paper presents a voice personality transformation algorithm that makes one person's voice sound like another person's. In the proposed method, a person's voice is represented by the LPC cepstrum, pitch period, and speaking rate, and an appropriate transformation rule is constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and a conditional probability is used to model the relationship between the two speakers' LPC cepstra. Maximum Likelihood (ML) estimation is employed to obtain the parameters of each probabilistic model. The transformed LPC cepstra are obtained using a Minimum Mean Square Error (MMSE) criterion. Pitch period and speaking rate are used as the parameters for prosody transformation, which is implemented using the ratio of their average values. The proposed method outperforms the previous VQ-based method in objective measures, including the average cepstrum distance reduction ratio and the likelihood increasing ratio. In subjective tests, we obtained almost the same correct identification ratio as the previous method and also confirmed that high-quality transformed speech is obtained, owing to spectral contours that evolve smoothly over time.
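
The prosody step described in this abstract (scaling by the ratio of average values) can be illustrated with a minimal sketch. The function names, the millisecond units, and the sample values below are hypothetical and are not taken from the paper; only the mean-ratio idea comes from the abstract.

```python
import numpy as np

def mean_ratio(source_values, target_values):
    """Ratio of the target speaker's average to the source speaker's average."""
    return float(np.mean(target_values) / np.mean(source_values))

def transform_pitch_periods(periods, source_train, target_train):
    """Scale source pitch periods by the target/source ratio of average periods."""
    return np.asarray(periods, dtype=float) * mean_ratio(source_train, target_train)

# Hypothetical training pitch periods in milliseconds (illustrative values only).
source_train = np.array([8.0, 8.4, 7.6, 8.0])   # average 8.0 ms
target_train = np.array([5.0, 5.2, 4.8, 5.0])   # average 5.0 ms

# Two source-speaker pitch periods mapped toward the target speaker's range.
converted = transform_pitch_periods([8.0, 7.2], source_train, target_train)
```

The same scaling could presumably be applied to speaking rate, since the abstract treats both prosodic parameters the same way.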

Formant-broadened CMS Using the Log-spectrum Transformed from the Cepstrum (켑스트럼으로부터 변환된 로그 스펙트럼을 이용한 포먼트 평활화 켑스트럴 평균 차감법)

  • 김유진;정혜경;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.361-373
    • /
    • 2002
  • In this paper, we propose a channel normalization method that improves the performance of CMS (cepstral mean subtraction), which is widely adopted to normalize channel variation in speech and speaker recognition. CMS estimates the channel effect by averaging the long-term cepstrum, and its weakness is that the estimated channel is biased by the formants of voiced speech, which carry useful speech information. The proposed Formant-broadened Cepstral Mean Subtraction (FBCMS) is based on two facts: formants can be found easily in the log spectrum obtained from the cepstrum by a Fourier transform, and formants correspond to the dominant poles of the all-pole model that is usually used to model the vocal tract. FBCMS evaluates only the poles to be broadened from the log spectrum, without polynomial factorization, and produces a formant-broadened cepstrum by broadening the bandwidths of the formant poles. The channel cepstrum can then be estimated effectively by averaging the formant-broadened cepstral coefficients. We performed experiments comparing FBCMS with CMS and PFCMS (pole-filtered CMS) on four simulated telephone channels. In the channel estimation experiment, we evaluated the cepstral distance between the real channel and the estimated channel and found that the mean cepstrum moved closer to the channel cepstrum because the bias of the mean cepstrum toward speech was reduced. In the text-independent speaker identification experiment, the proposed method was superior to conventional CMS and comparable to pole-filtered CMS. Consequently, the proposed method efficiently normalizes channel variation on the basis of conventional CMS.
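
For orientation, the conventional CMS baseline mentioned in this abstract can be sketched as follows. This is plain CMS only, not the proposed FBCMS; the array shapes, the random stand-in for speech cepstra, and the synthetic channel vector are assumptions made for illustration.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Plain CMS: subtract the long-term mean cepstrum from every frame.

    cepstra has shape (num_frames, num_coefficients). A stationary
    convolutional channel adds a constant vector in the cepstral domain,
    so the long-term mean serves as the channel estimate.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    channel_estimate = cepstra.mean(axis=0)
    return cepstra - channel_estimate, channel_estimate

# Synthetic example: random "speech" cepstra plus a fixed channel offset.
rng = np.random.default_rng(0)
speech = rng.normal(size=(200, 12))      # stand-in for clean cepstral frames
channel = np.linspace(-1.0, 1.0, 12)     # stand-in for a channel cepstrum
observed = speech + channel

normalized, estimate = cepstral_mean_subtraction(observed)

# The estimate equals the channel plus the mean of the speech cepstra.
# That second term is the speech-induced bias the abstract refers to.
bias = estimate - channel
```

In this sketch the normalized frames no longer depend on the channel offset, while `bias` shows how the speech itself leaks into the channel estimate.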

A Perceptual Study on the Temporal Cues of English Intervocalic Plosives for Various Groups Depending on Background Language, English Listening Ability, and Age (언어별, 연령별, 수준별 집단에 의한 모음간 영어 파열음 유/무성 인지 연구)

  • Kang, Seok-Han
    • Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.133-145
    • /
    • 2006
  • To understand the perceptual patterns of various groups in both VCV trochee and iambus, this study examined identification correctness and cue robustness for the unit intervals in light of background language, age, and English listening ability. Four groups took part in the experiments: native speakers of English, Korean college students with high listening achievement, Korean college students with low listening achievement, and Korean elementary students. Tokens of /dæper, dæper, dæper, dæper, dæper, dæper/ in trochee and of /ðə pæd, ðə bæd, ðə tæd, ðə dæd, ðə kæd, ðə gæd/ in iambus were extracted and modified into experimental signals coded with two digits (voiced = 1, voiceless = 0) over the temporal intervals, which consisted of the preceding vowel, closure, VOT, and post-vowel. In the first experiment, on identification correctness, all groups showed almost 100% correctness in the VCV iambus environment, while in the trochee environment the groups differed (native speakers 87%, college high 74%, college low 70%, elementary 65%). In the second experiment, on cue robustness, all groups showed similar perceptual patterns in both environments. The order of cue robustness in VCV trochee was pre-vowel $\gg$ closure $\gg$ VOT $\gg$ post-vowel, while in VCV iambus it was VOT $\gg$ post-vowel $\gg$ closure $\gg$ pre-vowel. Under some conditions, however, we found moderately different perceptual patterns depending on language, age, and listening level.

Pitch Period Detection Algorithm Using Modified AMDF (변형된 AMDF를 이용한 피치 주기 검출 알고리즘)

  • Seo Hyun-Soo;Bae Sang-Bum;Kim Nam-Ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.1
    • /
    • pp.23-28
    • /
    • 2006
  • The pitch period is an important factor in speech signal processing and is used in applications such as speech recognition, speaker identification, and speech analysis and synthesis. Many pitch detection algorithms have therefore been studied. The AMDF (average magnitude difference function), one such algorithm, takes the time interval from one valley point to the next as the pitch period. Selecting the valley points, however, increases the complexity of the algorithm. In this paper, we propose a simple algorithm that uses a rotation transform of the AMDF to detect the global minimum valley point as the pitch period of the speech signal, and we compare it with existing methods through simulation.
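
As background for this abstract, a basic AMDF pitch estimator might look like the sketch below. It implements only the conventional AMDF with a global-minimum search inside an assumed pitch range; the paper's rotation transform is not described in the abstract and is not reproduced here. The sampling rate, search range, and test signal are illustrative assumptions.

```python
import numpy as np

def amdf(frame, max_lag):
    """Average magnitude difference function for lags 0..max_lag."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    d = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Mean absolute difference between the frame and its lagged copy.
        d[lag] = np.mean(np.abs(frame[lag:] - frame[:n - lag]))
    return d

def pitch_period(frame, fs, f_min=60.0, f_max=400.0):
    """Pitch period in samples: the deepest AMDF valley within the search range."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    d = amdf(frame, lag_max)
    return lag_min + int(np.argmin(d[lag_min:lag_max + 1]))

# Synthetic voiced-like frame: 100 Hz fundamental plus one harmonic, 40 ms at 8 kHz.
fs = 8000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)

period = pitch_period(frame, fs)   # in samples
f0 = fs / period                   # in Hz
```

Restricting the search to a plausible pitch range is what keeps the trivial zero at lag 0 from being picked; on real speech, valleys at multiples of the true period can still compete, which is the valley-selection difficulty the abstract mentions.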

Improving A Text Independent Speaker Identification System By Frame Level Likelihood Normalization (프레임단위유사도정규화를 이용한 문맥독립화자식별시스템의 성능 향상)

  • 김민정;석수영;정현열;정호열
    • Proceedings of the IEEK Conference
    • /
    • 2001.09a
    • /
    • pp.487-490
    • /
    • 2001
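  • This paper reports the implementation of, and recognition experiments with, a system that applies likelihood normalization, a method known to give good results in speaker verification, to speaker identification in order to improve the performance of an existing real-time text-independent speaker recognition system based on the Gaussian Mixture Model. The system consists of a speaker model generation stage and a speaker identification stage. In the model generation stage, speaker models are built with the GMM (Gaussian Mixture Model), which represents the acoustic characteristics of a speaker's utterances well, and MLE (Maximum Likelihood Estimation) is used to optimize the GMM parameters. In the identification stage, likelihoods are computed frame by frame from the trained data and the test data using ML (Maximum Likelihood). The computed likelihoods pass through a likelihood normalization step and are expressed as a score (SC), and the speaker with the highest score is chosen as the identified speaker. Text-independent sentences were used as the utterances. The ETRI445 DB and KLE452 DB were used for the recognition experiments, and only cepstral coefficients and regression coefficients were used as feature parameters. Recognition experiments were run with varying numbers of enrolled speakers, using both the conventional speaker identification method and the frame-level likelihood normalization method. The results show that frame-level likelihood normalization achieves a higher recognition rate than the conventional method as the number of speakers increases.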
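
The identification stage described in this abstract can be sketched roughly as follows. The abstract does not give the normalization formula, so this sketch assumes one common choice: each speaker's frame log-likelihood is reduced by the log of the average likelihood of all other enrolled speakers for that frame. The toy GMM parameters, helper names, and test data are all hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)    # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    return logsumexp(np.log(weights) + log_comp, axis=1)               # (T,)

def normalized_scores(frame_loglik):
    """Sum of frame-level normalized log-likelihoods per speaker.

    frame_loglik: (S, T) with S >= 2 speakers. Assumed normalization:
    subtract the log of the mean likelihood of the other S - 1 speakers.
    """
    num_speakers = frame_loglik.shape[0]
    scores = np.empty(num_speakers)
    for s in range(num_speakers):
        others = np.delete(frame_loglik, s, axis=0)                    # (S-1, T)
        background = logsumexp(others, axis=0) - np.log(num_speakers - 1)
        scores[s] = np.sum(frame_loglik[s] - background)
    return scores

# Three toy speaker models (two mixture components each, 2-D features).
w = np.array([0.5, 0.5])
v = np.ones((2, 2))
models = [
    (w, np.array([[0.0, 0.0], [1.0, 1.0]]), v),
    (w, np.array([[3.0, 3.0], [4.0, 4.0]]), v),
    (w, np.array([[-3.0, 3.0], [-2.0, 4.0]]), v),
]

# Test frames drawn near the second model (index 1).
rng = np.random.default_rng(0)
test_frames = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))

frame_loglik = np.stack([gmm_frame_loglik(test_frames, *m) for m in models])
scores = normalized_scores(frame_loglik)
identified = int(np.argmax(scores))
```

Because the background term excludes the speaker being scored, it differs from speaker to speaker, so the ranking can differ from a plain sum of log-likelihoods. In practice the GMMs would be trained by maximum likelihood on enrollment data rather than set by hand.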

The effect of L2 experience on perception of Korean nasals

  • Yoo, Juyeon;Kang, Seokhan
    • Phonetics and Speech Sciences
    • /
    • v.8 no.4
    • /
    • pp.63-69
    • /
    • 2016
  • Twenty-five native English speakers, divided into two groups by L2 experience, and nineteen native Koreans listened to Korean word-initial nasals (/m/ and /n/) in three vowel contexts (low, mid, and high) produced by a native Korean speaker. The experiment tested the hypothesis that Korean nasals are more likely to be perceived correctly by L2-experienced English learners of Korean than by their inexperienced counterparts. The results showed that the L2-experienced group was more sensitive to the effect of vowel height when judging the Korean nasals, with perception of nasals before high vowels most affected. In addition, place of articulation produced an asymmetry: the bilabial nasal /m/ was more likely than the alveolar nasal /n/ to be perceived as a plosive. The study found that L2 experience plays a somewhat limited role in perceiving word-initial nasals correctly, especially before high vowels, in that even the L2-experienced English subjects had difficulty identifying the Korean nasals in this environment. Nevertheless, low L2 proficiency might account for the difficulty in bilabial nasal identification observed in the L2-experienced group.

A Study on the Symmetric Neural Networks and Their Applications (대칭 신경회로망과 그 응용에 관한 연구)

  • 나희승;박영진
    • Transactions of the Korean Society of Mechanical Engineers
    • /
    • v.16 no.7
    • /
    • pp.1322-1331
    • /
    • 1992
  • Conventional neural networks are built without considering the underlying structure of the problem. Hence, they usually contain redundant weights and require excessive training time. A novel neural network structure is proposed for symmetric problems, which alleviates some of these drawbacks of conventional neural networks. The concept is then extended to a constrained neural network that can be applied to general structured problems. Because these networks cannot be trained with the conventional training algorithm, which destroys their weight structure, a suitable training algorithm is suggested. Illustrative examples demonstrate the applicability of the proposed idea.