Search | Korea Science

Estimation of Optimal Mixture Number of GMM for Environmental Sounds Recognition (환경음 인식을 위한 GMM의 혼합모델 개수 추정)

Han, Da-Jeong; Park, Aa-Ron; Baek, Sung-June
- Journal of the Korea Academia-Industrial cooperation Society, v.13 no.2, pp.817-821, 2012
In this paper, we apply an optimal mixture number estimation technique for GMMs (Gaussian mixture models), using BIC (Bayesian information criterion) and MDL (minimum description length) as model selection criteria, to environmental sound recognition. In the experiment, we extracted 12 MFCC (mel-frequency cepstral coefficient) features from 9 kinds of environmental sounds, amounting to 27,747 data samples, and classified them with GMMs. BIC and MDL were applied to estimate the optimal number of mixtures for each environmental sound class. According to the experimental results, recognition performance is maintained while computational complexity decreases by 17.8% with BIC and 31.7% with MDL. This shows that reducing computational complexity with BIC and MDL is effective for environmental sound recognition using GMMs.
https://doi.org/10.5762/KAIS.2012.13.2.817
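
The model selection step in this abstract can be illustrated with a minimal sketch. It assumes the common form BIC = -2 ln L + p ln N and diagonal-covariance GMMs; the abstract states neither, and the MDL variant the authors used is not reproduced here. The function names and the log-likelihood values are hypothetical.

```python
import math

def gmm_param_count(k, dim):
    """Free parameters of a k-component, diagonal-covariance GMM (assumed form)."""
    # k mean vectors + k variance vectors + (k - 1) independent mixture weights
    return k * 2 * dim + (k - 1)

def bic(log_likelihood, k, dim, n_samples):
    """BIC = -2 ln L + p ln N; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + gmm_param_count(k, dim) * math.log(n_samples)

def select_mixture_number(loglik_by_k, dim, n_samples):
    """Return the candidate mixture number with the smallest BIC."""
    return min(loglik_by_k, key=lambda k: bic(loglik_by_k[k], k, dim, n_samples))

# Hypothetical training log-likelihoods for one sound class (not from the paper).
example_logliks = {1: -5000.0, 2: -4700.0, 4: -4600.0, 8: -4590.0}
best_k = select_mixture_number(example_logliks, dim=12, n_samples=1000)
print(best_k)
```

Because the penalty term grows with the number of mixtures, a class whose likelihood stops improving is assigned fewer mixtures, which is where a reduction in computation would come from.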

Realization a Text Independent Speaker Identification System with Frame Level Likelihood Normalization (프레임레벨유사도정규화를 적용한 문맥독립화자식별시스템의 구현)

김민정; 석수영; 김광수; 정현열
- Journal of the Institute of Convergence Signal Processing, v.3 no.1, pp.8-14, 2002
In this paper, we implemented a real-time text-independent speaker recognition system using a Gaussian mixture model (GMM) and applied a frame-level likelihood normalization method, which has been shown to be effective in verification systems. The system has three parts: front-end, training, and recognition. In the front-end, cepstral mean normalization and silence removal were applied to account for variations in a speaker's speech. In training, a GMM was used to model each speaker's acoustic features, and maximum likelihood estimation was used to optimize the GMM parameters. In recognition, likelihood scores were calculated at the frame level from the speaker models and the test data. Text-independent sentences were used as test sentences. The ETRI 445 and KLE 452 databases were used for training and testing, and cepstral coefficients and regression coefficients were used as feature parameters. The experimental results show that the frame-level likelihood method achieves a higher recognition rate than the conventional method, regardless of the number of registered speakers.
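
The abstract does not give the normalization formula. The sketch below assumes one plausible form, in which each frame's log-likelihood for a speaker is normalized by the mean likelihood of the competing speaker models before the frames are summed. The function names and the example scores are hypothetical.

```python
import math

def normalized_frame_scores(frame_logliks):
    """Sum competitor-normalized frame log-likelihoods for every speaker.

    frame_logliks[t][s] is log p(x_t | model of speaker s).
    """
    n_speakers = len(frame_logliks[0])
    scores = [0.0] * n_speakers
    for frame in frame_logliks:
        for s in range(n_speakers):
            others = [frame[j] for j in range(n_speakers) if j != s]
            # Log of the mean competitor likelihood, computed stably.
            peak = max(others)
            log_mean = peak + math.log(
                sum(math.exp(v - peak) for v in others) / len(others))
            scores[s] += frame[s] - log_mean
    return scores

def identify(frame_logliks):
    """Return the index of the speaker with the highest normalized score."""
    scores = normalized_frame_scores(frame_logliks)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical log-likelihoods: 3 frames scored against 3 registered speakers.
example = [[-1.0, -2.0, -3.0], [-1.0, -2.0, -3.0], [-1.5, -1.0, -3.0]]
print(identify(example))
```

Normalizing against the competitors only (rather than all speakers) matters here: a normalizer shared by every speaker would leave the ranking unchanged.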

Adaptive Background Modeling for Crowded Scenes (혼잡한 환경에 적합한 적응적인 배경모델링 방법)

Lee, Gwang-Gook; Song, Su-Han; Ka, Kee-Hwan; Yoon, Ja-Young; Kim, Jae-Jun; Kim, Whoi-Yul
- Journal of Korea Multimedia Society, v.11 no.5, pp.597-609, 2008
Because background models are updated recursively, previous background modeling methods are often disturbed by crowded scenes, where foreground pixels occur more frequently than background pixels. To resolve this problem, an adaptive background modeling method based on the well-known Gaussian mixture background model is proposed. In the proposed method, the learning rate of the background model is adjusted adaptively according to the crowdedness of the scene. Consequently, learning is suppressed in crowded scenes so that a proper background model is maintained. Experiments on a real dataset showed that the proposed method performs background subtraction effectively even in crowded situations, while its performance in normal scenes is almost the same as that of the previous method. In addition, the F-measure increased by 5-10% over previous background modeling methods on videos of crowded situations.
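
The idea of suppressing learning in crowded scenes can be sketched as follows. This is a simplified illustration, not the paper's method: it uses a single running average per pixel instead of a Gaussian mixture, measures crowdedness as the fraction of foreground pixels, and scales the learning rate linearly. All three choices and all names are assumptions.

```python
def crowdedness(foreground_mask):
    """Fraction of pixels currently flagged as foreground (0.0 to 1.0)."""
    return sum(foreground_mask) / len(foreground_mask)

def adaptive_learning_rate(base_rate, crowd):
    """Shrink the learning rate linearly as the scene becomes more crowded."""
    return base_rate * (1.0 - crowd)

def update_background(background, frame, foreground_mask, base_rate=0.05):
    """Running-average update of a per-pixel background estimate."""
    rate = adaptive_learning_rate(base_rate, crowdedness(foreground_mask))
    return [(1.0 - rate) * b + rate * f for b, f in zip(background, frame)]

background = [100.0, 100.0, 100.0, 100.0]
frame = [100.0, 200.0, 200.0, 200.0]
quiet = update_background(background, frame, [0, 1, 0, 0])    # 25% foreground
crowded = update_background(background, frame, [0, 1, 1, 1])  # 75% foreground
print(quiet[1], crowded[1])
```

With the same input frame, the crowded update moves the background estimate less, so foreground objects that linger are absorbed into the background more slowly.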

Unsupervised Change Detection Using Iterative Mixture Density Estimation and Thresholding

Park, No-Wook; Chi, Kwang-Hoon
- Proceedings of the KSRS Conference, 2003.11a, pp.402-404, 2003
We present two methods for automatically selecting threshold values in unsupervised change detection. Both methods consist of the same two steps: 1) determining the parameters of Gaussian mixtures from a difference image or ratio image, and 2) determining threshold values using the Bayesian rule for minimum error. In the first method, the Expectation-Maximization algorithm is applied to estimate the parameters of the Gaussian mixtures. The second method is based on iterative thresholding, which repeatedly applies thresholding followed by estimation of the model parameters. The effectiveness and applicability of the proposed methods are illustrated by an experiment on multi-temporal KOMPSAT-1 EOC images.
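
Step 2 can be illustrated for the two-class case. Under the Bayes rule for minimum error, the threshold is the point where the weighted densities of the "no change" and "change" components are equal; taking logarithms gives a quadratic in x. The sketch below assumes the mixture parameters have already been estimated, and the function names and parameter values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of a univariate Gaussian with standard deviation sigma."""
    return math.exp(-(x - mean) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_threshold(w1, m1, s1, w2, m2, s2):
    """Return the x between m1 and m2 where w1*N(x|m1,s1) == w2*N(x|m2,s2).

    Returns None if no crossing lies between the two means.
    """
    a = 1.0 / (2.0 * s2 ** 2) - 1.0 / (2.0 * s1 ** 2)
    b = m1 / s1 ** 2 - m2 / s2 ** 2
    c = (m2 ** 2 / (2.0 * s2 ** 2) - m1 ** 2 / (2.0 * s1 ** 2)
         + math.log((w1 * s2) / (w2 * s1)))
    if abs(a) < 1e-12:
        # Equal variances: the quadratic collapses to a linear equation.
        roots = [-c / b] if abs(b) > 1e-12 else []
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        roots = [(-b + root) / (2.0 * a), (-b - root) / (2.0 * a)]
    lo, hi = min(m1, m2), max(m1, m2)
    inside = [r for r in roots if lo <= r <= hi]
    return inside[0] if inside else None

# Hypothetical mixture: 80% unchanged pixels around 0, 20% changed pixels around 4.
threshold = bayes_threshold(0.8, 0.0, 1.0, 0.2, 4.0, 1.0)
print(threshold)
```

With equal variances the threshold moves away from the midpoint toward the mean of the less probable class, which reflects the larger prior of the unchanged class.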

IMAGE SEGMENTATION BASED ON THE STATISTICAL VARIATIONAL FORMULATION USING THE LOCAL REGION INFORMATION

Park, Sung Ha; Lee, Chang-Ock; Hahn, Jooyoung
- Journal of the Korean Society for Industrial and Applied Mathematics, v.18 no.2, pp.129-142, 2014
We propose a variational segmentation model based on statistical information about the intensities in an image. The model combines a local region-based energy with a global region-based energy to handle the misclassification that occurs in a typical statistical variational model, which assumes that an image is a mixture of two Gaussian distributions. We find local ambiguous regions where misclassification may occur because the difference between the two Gaussian distributions is small. Based on statistical information restricted to these local ambiguous regions, we design a local region-based energy to reduce the misclassification. We also suggest an algorithm that avoids the difficulty of solving the Euler-Lagrange equations of the proposed variational model.
https://doi.org/10.12941/jksiam.2014.18.129

Speech Recognition in Noise Environments Using SPLICE with Phonetic Information (음성학적인 정보를 포함한 SPLICE를 이용한 잡음환경에서의 음성인식)

Kim Doo Hee; Kim Hyung Soon
- Proceedings of the Acoustical Society of Korea Conference, spring, pp.83-86, 2002
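Mismatches between training and recognition conditions, such as ambient noise and channel characteristics, sharply degrade speech recognition performance. To compensate for this mismatch, various preprocessing methods in the cepstral domain have been tried, and recently the SPLICE method, which derives correction vectors using stereo data and a Gaussian mixture model (GMM) of noisy speech, has shown good results (1). Whereas conventional SPLICE builds its Gaussian models from acoustic information alone over entire utterances, this paper takes the phonetic information of each utterance into account: the whole acoustic space is partitioned and modeled per phoneme, and phoneme-specific correction vectors are trained using only the Gaussian model for each phoneme and the speech data belonging to that phoneme. The correction vectors then represent the effect of noise on each phoneme in more detail. In experiments on the Aurora 2 database, the proposed method outperformed the conventional SPLICE method.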

A Variable Parameter Model based on SSMS for an On-line Speech and Character Combined Recognition System (음성 문자 공용인식기를 위한 SSMS 기반 가변 파라미터 모델)

석수영; 정호열; 정현열
- The Journal of the Acoustical Society of Korea, v.22 no.7, pp.528-538, 2003
An SCCRS (Speech and Character Combined Recognition System) is developed for mobile devices such as PDAs (Personal Digital Assistants). In the SCCRS, feature extraction is carried out separately for speech and for handwritten characters, but recognition is performed in a common engine. The recognition engine is essentially a CHMM (Continuous Hidden Markov Model) with a variable parameter topology, chosen to minimize the number of model parameters and to reduce recognition time. To generate a context-independent variable parameter model, we propose SSMS (Successive State and Mixture Splitting), which yields appropriate numbers of mixtures and states through splitting in the mixture domain and in the time domain. The recognition results show that the proposed SSMS method reduces the total number of GOPDDs (Gaussian Output Probability Density Distributions) by up to 40.0% compared with the conventional fixed parameter model, at the same recognition performance in a speech recognition system.

Comprehensive Performance Analysis and Comparison of various Digital Communication Systems in an Multipath Fading Channel with additive Mixture of Gaussian and Impulsive Noise [Part-1] (가우스성 잡음과 임펄스성 잡음이 혼재하는 다중전파 페이딩 전송로상에서의 제반 디지털 통신 시스템 특성의 종합분석 및 비교에 관한 연구(제 1 부))

김현철; 고봉진; 공병옥; 조성준
- The Journal of Korean Institute of Communications and Information Sciences, v.14 no.3, pp.263-279, 1989
In Part 1 of this paper, the error rate equations of digitally modulated signals transmitted through a Gaussian/impulsive noise channel are derived. Using the derived equations for the error probabilities of ASK, QAM, CPSK, DPSK, FSK, and MSK signals, the error rate performance of digital modulation systems is evaluated as a function of the carrier-to-noise power ratio (CNR), the impulsive index, and the ratio of the Gaussian noise power component to the impulsive noise power component. The results are presented in graphs to show how much more strongly impulsive noise affects digital signals than Gaussian noise does.

Study on Image Processing Techniques Applying Artificial Intelligence-based Gray Scale and RGB scale

Lee, Sang-Hyun; Kim, Hyun-Tae
- International Journal of Advanced Culture Technology, v.10 no.2, pp.252-259, 2022
Artificial intelligence is used in combination with camera-based image processing techniques. Image processing technology processes objects in images received from a camera in real time and is used in fields such as security monitoring and medical image analysis. If image processing lowers recognition accuracy, the incorrect information passed to medical image analysis, security monitoring, and similar applications may cause serious problems. Therefore, this paper combines the YOLOv4-tiny model with image processing algorithms and uses the COCO dataset for training. The image processing algorithm performs five image processing methods: normalization, Gaussian distribution, the Otsu algorithm, equalization, and gradient operation. For RGB images, three image processing methods are performed: equalization, Gaussian blur, and gamma correction. Among the nine algorithms applied in this paper, the equalization and Gaussian blur model showed the highest object detection accuracy, 96%, and the gamma correction (RGB environment) model showed the highest object detection rate outdoors in the daytime, 89%. The image binarization model showed the highest object detection rate outdoors at night, 89%.
https://doi.org/10.17703/IJACT.2022.10.2.252

Multi-Level Segmentation of Infrared Images with Region of Interest Extraction

Yeom, Seokwon
- International Journal of Fuzzy Logic and Intelligent Systems, v.16 no.4, pp.246-253, 2016
Infrared (IR) imaging has been researched for various applications such as surveillance. IR radiation makes it possible to detect the thermal characteristics of objects under low-light conditions. However, automatic segmentation for finding the object of interest is challenging because IR detectors often provide images with low spatial and contrast resolution and without color or texture information. Another hindrance is that the image can be degraded by noise and clutter. This paper proposes multi-level segmentation for extracting regions of interest (ROIs) and objects of interest (OOIs) in the IR scene. Each level of the multi-level segmentation is composed of a k-means clustering algorithm, an expectation-maximization (EM) algorithm, and a decision process. The k-means clustering initializes the parameters of the Gaussian mixture model (GMM), and the EM algorithm estimates those parameters iteratively. During the multi-level segmentation, the area extracted at one level becomes the input to the next level of segmentation. Thus, segmentation is performed consecutively, narrowing the area to be processed. The foreground objects are individually extracted from the final ROI windows. In the experiments, the effectiveness of the proposed method is demonstrated using several IR images in which human subjects are captured at a long distance. The average probability of error is shown to be lower than that obtained from conventional methods such as the Gonzalez, Otsu, k-means, and EM methods.
https://doi.org/10.5391/IJFIS.2016.16.4.246
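
The k-means initialization followed by EM described in this abstract can be sketched for one level of segmentation. This is a simplified illustration on 1-D intensities with two components; the decision process and the multi-level narrowing are omitted, and the function names, the variance floor, and the sample intensities are assumptions.

```python
import math

def kmeans_1d(values, k=2, iters=10):
    """Plain 1-D k-means (k >= 2); centers start evenly spaced from min to max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups

def gaussian(x, mean, var):
    """Density of a univariate Gaussian with variance var."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(values, k=2, iters=20, var_floor=1e-3):
    """EM for a 1-D GMM whose parameters are initialized from k-means."""
    centers, groups = kmeans_1d(values, k)
    n = len(values)
    weights = [max(len(g), 1) / n for g in groups]
    means = list(centers)
    variances = [max(sum((v - m) ** 2 for v in g) / max(len(g), 1), var_floor)
                 for g, m in zip(groups, means)]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in values:
            joint = [w * gaussian(v, m, s) for w, m, s in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and variances.
        for c in range(k):
            mass = sum(r[c] for r in resp) or 1e-300
            weights[c] = mass / n
            means[c] = sum(r[c] * v for r, v in zip(resp, values)) / mass
            variances[c] = max(sum(r[c] * (v - means[c]) ** 2
                                   for r, v in zip(resp, values)) / mass, var_floor)
    return weights, means, variances

# Hypothetical pixel intensities: a cool background and a warm object.
pixels = [0.9, 1.0, 1.1, 1.0, 4.9, 5.0, 5.1]
weights, means, variances = fit_gmm_1d(pixels)
print(weights, means)
```

Starting EM from the k-means clusters gives it a sensible initial partition, so it only has to refine the component parameters rather than discover the clusters from scratch.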

Search Result 507, Processing Time 0.028 seconds
