Search | Korea Science

Scream Sound Detection Based on Universal Background Model Under Various Sound Environments (다양한 소리 환경에서 UBM 기반의 비명 소리 검출)

Chung, Yong-Joo
- The Journal of the Korea institute of electronic communication sciences
- /
- v.12 no.3
- /
- pp.485-492
- /
- 2017
GMM has been one of the most popular methods for scream sound detection. In the conventional GMM, the whole training data is divided into scream sound and non-scream sound, and the GMM is trained for each of them in the training process. Motivated by the idea that the process of scream sound detection is very similar to that of speaker recognition, the UBM which has been used quite successfully in speaker recognition, is proposed for use in scream sound detection in this study. We could find that UBM shows better performance than the traditional GMM from the experimental results.
https://doi.org/10.13067/JKIECS.2017.12.3.485 인용 PDF KSCI

Impostor Detection in Speaker Recognition Using Confusion-Based Confidence Measures

Kim, Kyu-Hong;Kim, Hoi-Rin;Hahn, Min-Soo
- ETRI Journal
- /
- v.28 no.6
- /
- pp.811-814
- /
- 2006
In this letter, we introduce confusion-based confidence measures for detecting an impostor in speaker recognition, which does not require an alternative hypothesis. Most traditional speaker verification methods are based on a hypothesis test, and their performance depends on the robustness of an alternative hypothesis. Compared with the conventional Gaussian mixture model-universal background model (GMM-UBM) scheme, our confusion-based measures show better performance in noise-corrupted speech. The additional computational requirements for our methods are negligible when used to detect or reject impostors.
PDF

Real-Time Face-Detection Based on Multiple Color-Spaces and Multiple Thresholds for Distance Measurement Between Speaker and Smart-Phone (화자(話者)와 스마트폰의 거리 측정을 위한 다중 색 좌표계와 다중 임계치 기반 실시간 얼굴검출)

Lee, Jae-Won;Kwon, Goo-Rak;Hong, Sung-Hoon
- Journal of Korea Multimedia Society
- /
- v.14 no.4
- /
- pp.481-493
- /
- 2011
As the development of mobile devices, mobile phones are equipped with many features. Video-call feature is one of them. In this paper, we present distance measurement between speaker and smart-phone using multiple color spaces and multiple thresholds. first, detect face based on skin color information. and second, measure distance between speaker and smart-phone using the detected face region. Especially, the first considering point in the development of face area detection is real-time processing and the second point is robustness to solve the problems of face detection errors due to rapid change of object movement, lighting and background between adjacent frames.
https://doi.org/10.9717/kmms.2011.14.4.481 인용 PDF KSCI

Speaker Detection and Recognition for a Welfare Robot

Sugisaka, Masanori;Fan, Xinjian
- 제어로봇시스템학회:학술대회논문집
- /
- 2003.10a
- /
- pp.835-838
- /
- 2003
Computer vision and natural-language dialogue play an important role in friendly human-machine interfaces for service robots. In this paper we describe an integrated face detection and face recognition system for a welfare robot, which has also been combined with the robot's speech interface. Our approach to face detection is to combine neural network (NN) and genetic algorithm (GA): ANN serves as a face filter while GA is used to search the image efficiently. When the face is detected, embedded Hidden Markov Model (EMM) is used to determine its identity. A real-time system has been created by combining the face detection and recognition techniques. When motivated by the speaker's voice commands, it takes an image from the camera, finds the face inside the image and recognizes it. Experiments on an indoor environment with complex backgrounds showed that a recognition rate of more than 88% can be achieved.
PDF

A Study on SVM-Based Speaker Classification Using GMM-supervector (GMM-supervector를 사용한 SVM 기반 화자분류에 대한 연구)

Lee, Kyong-Rok
- Journal of IKEEE
- /
- v.24 no.4
- /
- pp.1022-1027
- /
- 2020
In this paper, SVM-based speaker classification is experimented with GMM-supervector. To create a speaker cluster, conventional speaker change detection is performed with the KL distance using the SNR-based weighting function. SVM-based speaker classification consists of two steps. In the first step, SVM-based classification between UBM and speaker models is performed, speaker information is indexed in each cluster, and then grouped by speaker. In the second step, the SVM-based classification between UBM and speaker models is performed by inputting the speaker cluster group. Linear and RBF are applied as kernel functions for SVM-based classification. As a result, in the first step, the case of applying the linear kernel showed better performance than RBF with 148 speaker clusters, MDR 0, FAR 47.3, and ER 50.7. The second step experiment result also showed the best performance with 109 speaker clusters, MDR 1.3, FAR 28.4, and ER 32.1 when the linear kernel was applied.
https://doi.org/10.7471/ikeee.2020.24.4.1022 인용 PDF KSCI

The Speaker Identification Using Incremental Learning (Incremental Learning을 이용한 화자 인식)

Sim, Kwee-Bo;Heo, Kwang-Seung;Park, Chang-Hyun;Lee, Dong-Wook
- Journal of the Korean Institute of Intelligent Systems
- /
- v.13 no.5
- /
- pp.576-581
- /
- 2003
Speech signal has the features of speakers. In this paper, we propose the speaker identification system which use the incremental learning based on neural network. Recorded speech signal through the Mic is passed the end detection and is divided voiced signal and unvoiced signal. The extracted 12 order cpestrum are used the input data for neural network. Incremental learning is the learning algorithm that the learned weights are remembered and only the new weights, that is created as adding new speaker, are trained. The architecture of neural network is extended with the number of speakers. So, this system can learn without the restricted number of speakers.
https://doi.org/10.5391/JKIIS.2003.13.5.576 인용 PDF KSCI

A Novel Two-Level Pitch Detection Approach for Speaker Tracking in Robot Control

Hejazi, Mahmoud R.;Oh, Han;Kim, Hong-Kook;Ho, Yo-Sung
- 제어로봇시스템학회:학술대회논문집
- /
- 2005.06a
- /
- pp.89-92
- /
- 2005
Using natural speech commands for controlling a human-robot is an interesting topic in the field of robotics. In this paper, our main focus is on the verification of a speaker who gives a command to decide whether he/she is an authorized person for commanding. Among possible dynamic features of natural speech, pitch period is one of the most important ones for characterizing speech signals and it differs usually from person to person. However, current techniques of pitch detection are still not to a desired level of accuracy and robustness. When the signal is noisy or there are multiple pitch streams, the performance of most techniques degrades. In this paper, we propose a two-level approach for pitch detection which in compare with standard pitch detection algorithms, not only increases accuracy, but also makes the performance more robust to noise. In the first level of the proposed approach we discriminate voiced from unvoiced signals based on a neural classifier that utilizes cepstrum sequences of speech as an input feature set. Voiced signals are then further processed in the second level using a modified standard AMDF-based pitch detection algorithm to determine their pitch periods precisely. The experimental results show that the accuracy of the proposed system is better than those of conventional pitch detection algorithms for speech signals in clean and noisy environments.
PDF

A Robust Method for Speech Replay Attack Detection

Lin, Lang;Wang, Rangding;Yan, Diqun;Dong, Li
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.14 no.1
- /
- pp.168-182
- /
- 2020
Spoofing attacks, especially replay attacks, pose great security challenges to automatic speaker verification (ASV) systems. Current works on replay attacks detection primarily focused on either developing new features or improving classifier performance, ignoring the effects of feature variability, e.g., the channel variability. In this paper, we first establish a mathematical model for replay speech and introduce a method for eliminating the negative interference of the channel. Then a novel feature is proposed to detect the replay attacks. To further boost the detection performance, four post-processing methods using normalization techniques are investigated. We evaluate our proposed method on the ASVspoof 2017 dataset. The experimental results show that our approach outperforms the competing methods in terms of detection accuracy. More interestingly, we find that the proposed normalization strategy could also improve the performance of the existing algorithms.
https://doi.org/10.3837/tiis.2020.01.010 인용 PDF KSCI HTML

A Study on the Mixed Model Approach and Symbol Probability Weighting Function for Maximization of Inter-Speaker Variation (화자간 변별력 최대화를 위한 혼합 모델 방식과 심볼 확률 가중함수에 관한 연구)

Chin Se-Hoon;Kang Chul-Ho
- The Journal of the Acoustical Society of Korea
- /
- v.24 no.7
- /
- pp.410-415
- /
- 2005
Recently, most of the speaker verification systems are based on the pattern recognition approach method. And performance of the pattern-classifier depends on how to classify a variety of speakers' feature parameters. In order to classify feature parameters efficiently and effectively, it is of great importance to enlarge variations between speakers and effectively measure distances between feature parameters. Therefore, this paper would suggest the positively mixed model scheme that can enlarge inter-speaker variation by searching the individual model with world model at the same time. During decision procedure, we can maximize inter-speaker variation by using the proposed mixed model scheme. We also make use of a symbol probability weighting function in this system so as to reduce vector quantization errors by measuring symbol probability derived from the distance rate of between the world codebook and individual codebook. As the result of our experiment using this method, we could halve the Detection Cost Function (DCF) of the system from $2.37\%\;to\;1.16\%$.
PDF KSCI

The Study on the Verification of Speaker Change using GMM-UBM based KL distance (GMM-UBM 기반 KL 거리를 활용한 화자변화 검증에 대한 연구)

Cho, Joon-Beom;Lee, Ji-eun;Lee, Kyong-Rok
- Journal of Convergence Society for SMB
- /
- v.6 no.4
- /
- pp.71-77
- /
- 2016
In this paper, we proposed a verification of speaker change utilizing the KL distance based on GMM-UBM to improve the performance of conventional BIC based Speaker Change Detection(SCD). We have verified Conventional BIC-based SCD using KL-distance based SCD which is robust against difference of information volume than BIC-based SCD. And we have applied GMM-UBM to compensate asymmetric information volume. Conventional BIC-based SCD was composed of two steps. Step 1, to detect the Speaker Change Candidate Point(SCCP). SCCP is positive local maximum point of dissimilarity d. Step 2, to determine the Speaker Change Point(SCP). If ${\Delta}BIC$ of SCCP is positive, it decides to SCP. We examined verification of SCP using GMM-UBM based KL distance D. If the value of D on each SCP is higher than threshold, we accepted that point to the final SCP. In the experimental condition MDR(Missed Detection Rate) is 0, FAR(False Alarm Rate) when the threshold value of 0.028 has been improved to 60.7%.
https://doi.org/10.22156/CS4SMB.2016.6.4.071 인용 PDF

Search Result 108, Processing Time 0.027 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)