• Title/Summary/Keyword: Gaussian mixture method

Real Time Abandoned and Removed Objects Detection System (실시간 방치 및 제거 객체 검출 시스템)

  • Jeong, Cheol-Jun;Ahn, Tae-Ki;Park, Jong-Hwa;Park, Goo-Man
    • Journal of Broadcast Engineering / v.16 no.3 / pp.462-470 / 2011
  • We propose a real-time object tracking system that detects abandoned or removed objects. Because these events are caused by humans, we use a tracking-based algorithm. After background subtraction with a Gaussian mixture model, shadow removal is applied for accurate object detection. Each static object is classified as either an abandoned object or a removed object. We assign a monitoring time to each static object to handle situations in which it is occluded by another object. A region growing method further improves detection accuracy. We implemented the algorithm on a DSP processor and obtained excellent results in the experiments.

Text Independent Speaker Verification Using Dominant State Information of HMM-UBM (HMM-UBM의 주 상태 정보를 이용한 음성 기반 문맥 독립 화자 검증)

  • Shon, Suwon;Rho, Jinsang;Kim, Sung Soo;Lee, Jae-Won;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.34 no.2 / pp.171-176 / 2015
  • We present a speaker verification method that extracts i-vectors based on the dominant state information of a Hidden Markov Model (HMM) - Universal Background Model (UBM). An ergodic HMM is used to estimate the UBM so that the various characteristics of individual speakers can be classified effectively. Unlike a Gaussian Mixture Model (GMM)-UBM based speaker verification system, the proposed system obtains an i-vector for each HMM state. Among these, the i-vector used as the feature is the one extracted from the specific state that contains the dominant state information. Experiments validating the performance of the proposed system were conducted on the National Institute of Standards and Technology (NIST) 2008 Speaker Recognition Evaluation (SRE) database. As a result, a 12% improvement in equal error rate was attained.
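
The equal error rate mentioned in this abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how it could be estimated from verification scores. The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Hypothetical scores (higher means "more likely the claimed speaker").
targets = [2.1, 1.7, 0.9, 2.5, 0.2]
impostors = [-1.0, 0.4, -0.3, 1.1, -2.0]
print(equal_error_rate(targets, impostors))
```

With so few trials the sweep only approximates the crossing point; evaluations on large trial lists usually interpolate between neighbouring thresholds.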

New Scheme for Smoker Detection (흡연자 검출을 위한 새로운 방법)

  • Lee, Jong-seok;Lee, Hyun-jae;Lee, Dong-kyu;Oh, Seoung-jun
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.9 / pp.1120-1131 / 2016
  • In this paper, we propose a smoker recognition algorithm that detects smokers in a video sequence in order to prevent fire accidents. We use a description-based method, one of the hierarchical approaches, to recognize the smoker's activity. The algorithm consists of background subtraction, object detection, event search, and event judgement. Background subtraction generates slow-motion and fast-motion foreground images from the input image using a Gaussian mixture model with two different learning rates. Object locations are then extracted from the slow-motion image using chain-rule based contour detection. For each object, the face is detected using Haar-like features, and smoke is detected using the frequency and direction of smoke in the fast-motion foreground. Hand movements are detected by motion estimation. The algorithm examines these features over a certain interval and infers whether the object is a smoker. It robustly detects a smoker among different objects while achieving real-time performance.
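
To illustrate why two learning rates yield two different foregrounds, here is a deliberately simplified sketch. It swaps the Gaussian mixture model for a per-pixel running average with a fixed threshold, and it assumes that the low learning rate corresponds to the slow-motion foreground and the high one to the fast-motion foreground. The class name, threshold, and pixel values are hypothetical.

```python
class RunningAverageBackground:
    """Per-pixel running-average background (a simplified stand-in for a GMM)."""

    def __init__(self, first_frame, learning_rate, threshold=30.0):
        self.mean = [float(p) for p in first_frame]
        self.learning_rate = learning_rate
        self.threshold = threshold

    def apply(self, frame):
        """Return a foreground mask for the frame, then update the background."""
        mask = []
        for i, pixel in enumerate(frame):
            diff = pixel - self.mean[i]
            mask.append(abs(diff) > self.threshold)
            # A larger learning rate absorbs a change into the background faster.
            self.mean[i] += self.learning_rate * diff
        return mask


first = [100, 100]  # two-pixel "image", both pixels start as background
slow = RunningAverageBackground(first, learning_rate=0.01)
fast = RunningAverageBackground(first, learning_rate=0.5)
for _ in range(10):
    frame = [100, 200]  # an object appears at pixel 1 and then stays still
    slow_mask = slow.apply(frame)
    fast_mask = fast.apply(frame)
print(slow_mask, fast_mask)
```

After ten frames the slow model still marks the stationary object as foreground, while the fast model has absorbed it and would respond only to rapid changes.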

A PCA-based MFDWC Feature Parameter for Speaker Verification System (화자 검증 시스템을 위한 PCA 기반 MFDWC 특징 파라미터)

  • Hahm Seong-Jun;Jung Ho-Youl;Chung Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.25 no.1 / pp.36-42 / 2006
  • This paper presents a principal component analysis (PCA)-based Mel-Frequency Discrete Wavelet Coefficients (MFDWC) feature parameter for speaker verification systems. In this method, we use the first eigenvector obtained from PCA to calculate the energy of each node at the levels approximated by the mel scale. This eigenvector satisfies the general weighting-function constraint that the squared components sum to unity, and it is considered to represent the speaker's characteristics closely because the first eigenvector of each speaker differs considerably from those of other speakers. For verification, we use the Universal Background Model (UBM) approach, which compares the claimed speaker's model with the UBM at the frame level. Experiments testing the effectiveness of the PCA-based parameter show that the proposed parameters improve average performance by 0.80% compared to MFCC, 5.14% compared to LPCC, and 6.69% compared to the existing MFDWC.
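
The frame-level comparison between the claimed speaker's model and the UBM is commonly expressed as an average log-likelihood ratio. The sketch below assumes scalar features and small one-dimensional Gaussian mixtures for brevity. The function names and all model parameters are hypothetical, not the paper's.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of a scalar x under a 1-D Gaussian mixture."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def verification_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    ratios = [
        gmm_log_likelihood(x, *speaker_gmm) - gmm_log_likelihood(x, *ubm)
        for x in frames
    ]
    return sum(ratios) / len(ratios)


# Hypothetical models given as (weights, means, variances).
ubm = ([0.5, 0.5], [-2.0, 2.0], [4.0, 4.0])
speaker = ([0.5, 0.5], [0.5, 1.5], [0.5, 0.5])
genuine_frames = [0.4, 1.2, 0.9, 1.6]
impostor_frames = [-3.0, 3.5, -1.8, 2.9]
print(verification_score(genuine_frames, speaker, ubm))
print(verification_score(impostor_frames, speaker, ubm))
```

A claim would be accepted when the score exceeds a threshold tuned on held-out data; here the genuine frames score above zero and the impostor frames below it.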

Skin Region Detection Using Histogram Approximation Based Mean Shift Algorithm (Mean Shift 알고리즘 기반의 히스토그램 근사화를 이용한 피부 영역 검출)

  • Byun, Ki-Won;Joo, Jae-Heum;Nam, Ki-Gon
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.4 / pp.21-29 / 2011
  • Existing skin detection methods that rely on skin color information defined from prior knowledge choose the threshold separating the background from the skin region subjectively, through experiments. The threshold is also selected passively according to the background and illumination conditions. As a result, the performance of these methods depends entirely on a threshold estimated through repeated experiments. To overcome this drawback, this paper proposes a skin region detection method using histogram approximation based on the mean shift algorithm. The proposed method separates the background and skin regions by applying mean shift to the histogram of the skin map of the input image, which is generated by comparing similarity with the standard skin color in the CbCr color space, and by actively finding the maximum to which the brightness levels converge. Because the histogram is a discontinuous function accumulated over pixel brightness values, it is approximated as a Gaussian Mixture Model (GMM) using the Bezier curve method. The proposed method thus detects the skin region by using mean shift to actively find the maximum that becomes the dividing point, rather than relying on a manually selected threshold as existing methods do. Experiments show that the method detects the skin region effectively.
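
The idea of letting mean shift climb to a histogram maximum instead of fixing a threshold by hand can be sketched as follows. This is a flat-kernel mean shift applied directly to raw bin counts. It omits the Bezier/GMM approximation step described in the abstract, and the function name, bandwidth, and histogram values are hypothetical.

```python
def mean_shift_mode(histogram, start, bandwidth=3.0, max_iter=100, tol=1e-6):
    """Move from a starting bin to a nearby histogram mode with a flat kernel."""
    position = float(start)
    for _ in range(max_iter):
        # Bins that fall inside the current window.
        window = [i for i in range(len(histogram)) if abs(i - position) <= bandwidth]
        mass = sum(histogram[i] for i in window)
        if mass == 0:
            break  # empty window: nothing to move toward
        # The count-weighted mean of the bin indices is the new window centre.
        shifted = sum(i * histogram[i] for i in window) / mass
        converged = abs(shifted - position) < tol
        position = shifted
        if converged:
            break
    return position


# Hypothetical bimodal brightness histogram of a skin map.
hist = [0, 1, 4, 9, 4, 1, 0, 0, 0, 2, 6, 12, 6, 2, 0, 0]
low_mode = mean_shift_mode(hist, start=1)
high_mode = mean_shift_mode(hist, start=13)
print(low_mode, high_mode)
```

Each start converges to the peak of its own lobe, so the result depends on the data rather than on a manually chosen cut-off.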

RPCA-GMM for Speaker Identification (화자식별을 위한 강인한 주성분 분석 가우시안 혼합 모델)

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.519-527 / 2003
  • Speech is strongly affected by outliers introduced by unexpected events such as additive background noise, changes in the speaker's utterance pattern, and voice detection errors. Such outliers can severely degrade speaker recognition performance. In this paper, we propose a GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to address both outliers and the high dimensionality of training feature vectors in speaker identification. First, a new feature vector of reduced dimension is obtained by robust PCA derived from M-estimation. The robust PCA projects the original feature vector onto the reduced-dimensional linear subspace spanned by the leading eigenvectors of the covariance matrix of the feature vectors. Second, a GMM with diagonal covariance matrices is trained on these transformed feature vectors. We performed speaker identification experiments to show the effectiveness of the proposed method, comparing RPCA-GMM with transformed feature vectors against PCA and the conventional GMM with diagonal covariance matrices. For every 2% increase in the proportion of outliers, the identification rate of the proposed method degrades by only 0.03%, whereas the conventional GMM and PCA degrade by 0.65% and 0.55%, respectively. This means that our method is more robust to outliers.

Laryngeal Cancer Screening using Cepstral Parameters (켑스트럼 파라미터를 이용한 후두암 검진)

  • 이원범;전경명;권순복;전계록;김수미;김형순;양병곤;조철우;왕수건
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.14 no.2 / pp.110-116 / 2003
  • Background and Objectives: Laryngeal cancer discrimination using voice signals is a non-invasive method that allows rapid and simple examination without discomfort to patients. When appropriate analysis parameters and classifiers are developed, this method can be used effectively in various applications, including telemedicine. This study examines the voice analysis parameters used for laryngeal disease discrimination in order to help discriminate laryngeal diseases by voice signal analysis. It also evaluates the laryngeal cancer discrimination performance of the Gaussian mixture model (GMM) classifier, which is based on statistical modelling of voice analysis parameters. Materials and Methods: The Multi-Dimensional Voice Program (MDVP) parameters, which have been widely used to analyze the voices of laryngeal cancer patients, sometimes fail on voices whose periodicity is seriously damaged. It is therefore necessary to develop a new method that can reliably analyze voice signals that the MDVP cannot handle. For the discrimination experiments, the authors used three types of voices collected at the Department of Otorhinolaryngology, Pusan National University Hospital: 50 voices of normal males, 50 voices of males with benign laryngeal diseases, and 105 voices of males with laryngeal cancer. The experiment also included 11 voices of males with laryngeal cancer that cannot be analyzed by the MDVP. Only the monosyllabic vowel /a/ was used as voice data. Because there were only 11 voices that the MDVP cannot analyze, they were used only for discrimination. The study examined the linear predictive cepstral coefficients (LPCC) and the mel-frequency cepstral coefficients (MFCC), the two major cepstrum analysis methods in acoustic recognition. Results: The mel frequency scaling process was effective in acoustic recognition but not useful for laryngeal cancer discrimination. Accordingly, linear frequency cepstral coefficients (LFCC), which omit the mel frequency scaling from the MFCC, were introduced. The LFCC discriminated laryngeal cancer better than the MFCC. Conclusion: The parameters applied in this study could accurately discriminate even terminal laryngeal cancer in which periodicity is disturbed. Future studies on various classification algorithms and on parameters representing the pathophysiology of the vocal cords are expected to make it possible to discriminate benign laryngeal diseases as well as laryngeal cancer.

A Robust Object Detection and Tracking Method using RGB-D Model (RGB-D 모델을 이용한 강건한 객체 탐지 및 추적 방법)

  • Park, Seohee;Chun, Junchul
    • Journal of Internet Computing and Services / v.18 no.4 / pp.61-67 / 2017
  • Recently, CCTV has been combined with fields such as big data, artificial intelligence, and image analysis to detect various abnormal behaviors and to detect and analyze the overall situation of objects such as people. Image analysis research for this kind of intelligent video surveillance is progressing actively. However, CCTV images based on 2D information generally have limitations, such as object misrecognition caused by the lack of topological information. This problem can be addressed by adding to the image the depth information of the object obtained with two cameras. In this paper, we perform background modeling using the Mixture of Gaussians technique and detect moving objects by segmenting the foreground from the modeled background. To perform depth-based segmentation on top of the RGB-based segmentation results, stereo-based depth maps are generated using two cameras. The RGB-based segmented region is then set as the domain for extracting depth information, and depth-based segmentation is performed within that domain. To detect the center point of the robustly segmented object and to follow its direction, the movement of the object is tracked with the CAMShift technique, one of the most basic object tracking methods. The experiments demonstrate the efficiency of the proposed object detection and tracking method using the RGB-D model.

Noise Robust Speech Recognition Based on Parallel Model Combination Adaptation Using Frequency-Variant (주파수 변이를 이용한 Parallel Model Combination 모델 적응에 기반한 잡음에 강한 음성인식)

  • Choi, Sook-Nam;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.32 no.3 / pp.252-261 / 2013
  • A typical speech recognition system performs well in a quiet environment, but its performance declines sharply in real environments where noise is present. To implement a speech recognizer that is robust across different speech settings, this study proposes Parallel Model Combination adaptation using the frequency variant based on environment awareness (FV-PMC), which uses the frequency variant to acquire environmental data for speech recognition and applies it to update the recognition model and improve its performance. FV-PMC performs speech recognition with a recognition model generated as follows: i) the average frequency variant among the pre-classified noise groups is calculated in advance and set as a threshold value; ii) the frequency variant among the noise groups is recalculated when speech with unknown noise is input; iii) speech exceeding the threshold value of a group is regarded as speech containing the noise of that group; and iv) the speech that includes this noise group is used. When noises were classified with the proposed FV-PMC, the average classification accuracy was 56%, and in the speech recognition experiments the average recognition rates were 79.05% for Set A, 79.43% for Set B, and 83.37% for Set C. The grand mean of the recognition rate was 80.62%, an improvement of 5.69 percentage points over the 74.93% of the existing Parallel Model Combination with a clean model, which indicates that the proposed method is effective.

Railway Track Extraction from Mobile Laser Scanning Data (모바일 레이저 스캐닝 데이터로부터 철도 선로 추출에 관한 연구)

  • Yoonseok, Jwa;Gunho, Sohn;Jong Un, Won;Wonchoon, Lee;Nakhyeon, Song
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.33 no.2 / pp.111-122 / 2015
  • This study introduces a new automated solution for detecting railway tracks and reconstructing track models from mobile laser scanning data. The proposed solution proceeds as follows. First, a potential railway region, called the Region Of Interest (ROI), is detected, and the orientation of the railway track trajectory is approximated from the raw data. Next, knowledge-based detection of railway tracks is performed to localize track candidates in the first strip. Here, a strip, that is, a local track search region, is generated orthogonally to the orientation of the track trajectory. Finally, an initial track model is generated over the candidate points, which are detected by GMM-EM (Gaussian Mixture Model-Expectation & Maximization)-based clustering. The model grows strip-wise to capture all track points of interest and is converted into a geometric track model within a tracking-by-detection framework. The proposed track tracking process has the following key features: it reduces the complexity of detecting track points by using a hypothetical track model, and it improves the efficiency of track modeling by capturing track points and modeling tracks simultaneously, which minimizes data processing time and cost. The proposed method was developed in the C++ programming language and was evaluated on LiDAR data acquired by an MMS over an urban railway track area with a complex railway scene.
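
GMM-EM clustering of candidate points can be sketched in one dimension as below. The abstract does not state which point features are clustered, so this sketch assumes a single hypothetical feature (the lateral offset of a point across a strip) and two components, one per rail. The function name and the sample offsets are illustrative only.

```python
import math


def fit_gmm_1d(data, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances, labels)."""
    k = len(init_means)
    weights, means, variances = [1.0 / k] * k, list(init_means), [1.0] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # variance floor avoids collapse
    # Hard assignment: each point goes to its most responsible component.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Hypothetical lateral offsets (metres) of candidate points in one strip.
offsets = [-0.74, -0.72, -0.70, -0.73, -0.71, 0.70, 0.72, 0.74, 0.71, 0.73]
weights, means, variances, labels = fit_gmm_1d(offsets, init_means=[-1.0, 1.0])
print(means, labels)
```

The two fitted means give one position estimate per rail within the strip, and the labels separate the points belonging to each rail.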