• Title/Summary/Keyword: EM기법 (EM technique)


A Text Categorization Method Improved by Removing Noisy Training Documents (오류 학습 문서 제거를 통한 문서 범주화 기법의 성능 향상)

  • Han, Hyoung-Dong;Ko, Young-Joong;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.32 no.9 / pp.912-919 / 2005
  • When binary classification is applied to multi-class text categorization, the One-Against-All method is generally used. However, this method has a problem: the documents in the negative set are not labeled by humans, so the training data can include many noisy documents. In this paper, we propose applying the Sliding Window technique and the EM algorithm to binary text classification to solve this problem. We improve binary text classification by extracting noisy documents from the training data with the Sliding Window technique and re-assigning the categories of these documents with the EM algorithm.

Accelerating the EM Algorithm through Selective Sampling for Naive Bayes Text Classifier (나이브베이즈 문서분류시스템을 위한 선택적샘플링 기반 EM 가속 알고리즘)

  • Chang Jae-Young;Kim Han-Joon
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.369-376 / 2006
  • This paper presents a new method that significantly improves the conventional Bayesian statistical text classifier by incorporating an accelerated EM (Expectation Maximization) algorithm. The EM algorithm suffers from slow convergence and performance degradation in its iterative process, especially when real online textual documents do not follow EM's assumptions. In this study, we propose a new accelerated EM algorithm with uncertainty-based selective sampling, which is simple yet converges quickly and allows a more accurate classification model to be estimated for the Naive Bayes text classifier. Experiments using the popular Reuters-21578 document collection showed that the proposed algorithm effectively improves classification accuracy.

Accelerated Learning of Latent Topic Models by Incremental EM Algorithm (점진적 EM 알고리즘에 의한 잠재토픽모델의 학습 속도 향상)

  • Chang, Jeong-Ho;Lee, Jong-Woo;Eom, Jae-Hong
    • Journal of KIISE:Software and Applications / v.34 no.12 / pp.1045-1055 / 2007
  • Latent topic models are statistical models that automatically capture, in a probabilistic way, salient patterns or correlations among the features underlying a data collection. They are gaining popularity as an effective tool for automatic semantic feature extraction from text corpora, multimedia data analysis including image data, and bioinformatics. One of the important issues in applying latent topic models to massive data sets is efficient learning of the model. This paper proposes an accelerated learning technique for the PLSA model, one of the popular latent topic models, that uses an incremental EM algorithm instead of the conventional EM algorithm. The incremental EM algorithm is characterized by a series of partial E-steps, each performed on a subset of the data collection, whereas the conventional EM algorithm performs one batch E-step over the whole data set. By replacing a single batch E-step and M-step with a series of partial E-steps and M-steps, the inference result for the previous data subset is directly reflected in the next inference step, which can increase the learning speed for the entire data set. The algorithm also has the advantages that it is guaranteed to converge to a local maximum and that it can be implemented with only a slight modification of an existing implementation based on conventional EM. We present the basic application of the incremental EM algorithm to learning PLSA and empirically evaluate the acceleration with several possible data partitioning methods for practical use. Experimental results on a real-world news data set show that the proposed approach achieves a meaningful improvement in the convergence rate when learning latent topic models. Additionally, we present a result that suggests a possible synergistic effect from combining the incremental EM algorithm with parallel computing.
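
The partial E-step idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy count matrix, the two-block partition, the number of sweeps, and all function names are assumptions made for illustration, and every document is assumed to contain at least one word. Each block keeps its own word-topic statistics, so re-running the E-step on one block only swaps that block's contribution in the running total before the M-step.

```python
import math
import random

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def log_likelihood(counts, p_z_d, p_w_z):
    # Sum over documents d and words w of n(d, w) * log P(w | d).
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                ll += n * math.log(sum(pz * p_w_z[z][w] for z, pz in enumerate(p_z_d[d])))
    return ll

def partial_e_step(counts, docs, p_z_d, p_w_z):
    # E-step restricted to the documents in `docs`.
    k, v = len(p_w_z), len(p_w_z[0])
    word_stats = [[0.0] * v for _ in range(k)]  # expected counts for P(w|z)
    doc_stats = {}                              # expected counts for P(z|d)
    for d in docs:
        acc = [0.0] * k
        for w, n in enumerate(counts[d]):
            if n == 0:
                continue
            post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(k)])
            for z in range(k):
                word_stats[z][w] += n * post[z]
                acc[z] += n * post[z]
        doc_stats[d] = acc
    return word_stats, doc_stats

def incremental_em_plsa(counts, p_z_d, p_w_z, blocks, n_sweeps=20):
    k, v = len(p_w_z), len(p_w_z[0])
    p_z_d = [row[:] for row in p_z_d]
    # Initial pass: statistics of every block under the starting parameters.
    block_stats = [partial_e_step(counts, docs, p_z_d, p_w_z)[0] for docs in blocks]
    total = [[sum(bs[z][w] for bs in block_stats) for w in range(v)] for z in range(k)]
    for _ in range(n_sweeps):
        for b, docs in enumerate(blocks):
            new_stats, doc_stats = partial_e_step(counts, docs, p_z_d, p_w_z)
            # Swap this block's old statistics for the new ones in the total.
            for z in range(k):
                for w in range(v):
                    total[z][w] += new_stats[z][w] - block_stats[b][z][w]
            block_stats[b] = new_stats
            # M-step right after the partial E-step.
            for d, acc in doc_stats.items():
                p_z_d[d] = normalize(acc)
            p_w_z = [normalize([max(x, 0.0) for x in row]) for row in total]
    return p_z_d, p_w_z

# Toy data (assumed): 4 documents, 6 words, 2 topics, 2 blocks of documents.
rng = random.Random(0)
counts = [[4, 3, 2, 0, 0, 1], [3, 4, 1, 1, 0, 0], [0, 1, 0, 4, 3, 2], [1, 0, 0, 3, 4, 3]]
init_p_z_d = [normalize([rng.random() + 0.1 for _ in range(2)]) for _ in counts]
init_p_w_z = [normalize([rng.random() + 0.1 for _ in range(6)]) for _ in range(2)]
ll_before = log_likelihood(counts, init_p_z_d, init_p_w_z)
p_z_d, p_w_z = incremental_em_plsa(counts, init_p_z_d, init_p_w_z, blocks=[[0, 2], [1, 3]])
ll_after = log_likelihood(counts, p_z_d, p_w_z)
print(ll_before, ll_after)
```

Because the parameters are refreshed after every block, later blocks in the same sweep already see the updated topics, which is the source of the speed-up the abstract refers to. How the blocks are chosen is left open here; the abstract reports that several partitioning methods were compared.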

An EM Algorithm-Based Approach for Imputation of Pixel Values in Color Image (색조영상에서 랜덤결측화소값 대체를 위한 EM 알고리즘 기반 기법)

  • Kim, Seung-Gu
    • The Korean Journal of Applied Statistics / v.23 no.2 / pp.305-315 / 2010
  • In this paper, a frequentist approach to imputing the values of the R, G, B components in randomly missing pixels of a color image is provided. Under the assumption that the given image is a realization of a Gaussian Markov random field, the model is designed such that each neighboring pixel value of a given pixel (independently) follows a normal distribution with a covariance matrix scaled by an evaluation of the similarity between the two pixel values, so that the imputation is not affected by neighbors with a different color. An approximate EM-based algorithm that maximizes the underlying likelihood is implemented to estimate the parameters and to impute the missing pixel values. Some experiments are presented to show its effectiveness through a performance comparison with a popular interpolation method.

Low-Voltage EM(Elasto-Magnetic) Sensing Technique for Tensile Force Management of PSC(Prestressed Concrete) Internal Tendon (PSC 내부 텐던의 긴장력 관리를 위한 저전압 EM 센싱 기법)

  • Park, Jihwan;Kim, Junkyeong;Eum, Ki-Young;Park, Seunghee
    • Journal of the Computational Structural Engineering Institute of Korea / v.32 no.2 / pp.87-92 / 2019
  • In this paper, we verify a low-voltage EM (elasto-magnetic) sensing technique for managing the tensile force of PSC (prestressed concrete) internal tendons, so that the technique can be applied at actual construction sites where a stable power supply is difficult to secure. Past domestic and overseas PSC structural accidents show that PS tension is very important for maintaining structural stability. We measure the tensile force from the magnetic hysteresis curve obtained through EM sensors at different voltage values, using the relation between magnetostriction and stress in ferromagnetic material based on elasto-magnetic theory. For this purpose, an EM sensor of the double cylindrical coil type was fabricated, and tensile force test equipment for PS tendons using a hydraulic tensioning device was constructed. The experiment was conducted to confirm the relationship between changes in permeability and tensile force from measurements at the maximum and minimum voltage levels. The change of the magnetic hysteresis curve with the magnitude of tensile force was also measured while reducing the voltage step by step. As a result, the slope of the estimation equation with respect to the magnitude of the magnetic field decreased as the voltage was reduced, but a similar pattern of change in magnetic permeability was confirmed for the magnetic hysteresis loop. We therefore consider it possible to manage the tension of PSC internal tendons using the EM sensing technique in a low-voltage state.

Improving performance of Binary Text Classification Using the EM algorithm (EM 알고리즘을 이용한 이진 분류 문서 범주화의 성능 향상)

  • Han, Hyoung-Dong;Ko, Young-Joong;Seo, Jung-Yun
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.790-792 / 2004
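  • When binary classification is applied to multi-class text categorization, the One-Against-All method is generally used. However, this method has one problem: documents in the positive set were assigned their categories directly by humans, whereas documents in the negative set were not, so the negative set may contain noisy documents. To solve this problem, this paper proposes applying the Sliding Window technique and the EM algorithm to text categorization based on binary classification. Noisy documents are first extracted from the training data using the Sliding Window technique, and their categories are then re-assigned using the EM algorithm, which improves the performance of binary-classification-based text categorization.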


Imputation of Multiple Missing Values by Normal Mixture Model under Markov Random Field: Application to Imputation of Pixel Values of Color Image (마코프 랜덤 필드 하에서 정규혼합모형에 의한 다중 결측값 대체기법: 색조영상 결측 화소값 대체에 응용)

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods / v.16 no.6 / pp.925-936 / 2009
  • There are very many approaches to imputing missing values in the i.i.d. case. However, imputation techniques for the Markov random field (MRF) case are hard to find. In this paper, we show that imputation under MRF amounts to imputing by fitting a normal mixture model (NMM) under several practical assumptions. Our multivariate normal mixture model-based approach under MRF is applied to impute the missing pixel values of a 3-variate (R, G, B) color image, providing a technique to smooth the imputed values.

Retargetable Intermediate Code Optimization System Using Tree Pattern Matching Techniques (트리패턴매칭기법의 재목적 가능한 중간코드 최적화 시스템)

  • Kim, Jeong-Suk;O, Se-Man
    • The Transactions of the Korea Information Processing Society / v.6 no.8 / pp.2253-2261 / 1999
  • ACK generates optimized code using a string pattern matching technique in its pattern table generator and peephole optimizer. However, string pattern matching is inefficient because pattern selection requires many comparisons. We designed and implemented an EM intermediate code optimizer using a tree pattern matching algorithm, composed of an EM tree generator, an optimization pattern table generator, and a tree pattern matcher. The tree pattern matching algorithm traverses the EM tree top-down and performs pattern matching centered on the root node with reference to the pattern table. Compared with ACK's string pattern matching method, the proposed optimizer improved the pattern selection time by about 10.8%.
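
Top-down tree pattern matching against a pattern table, as outlined in this abstract, might look roughly like the sketch below. Everything in it is assumed for illustration: the tuple encoding of trees, the operator names (`add`, `mul`, `const`, `load`), and the three rewrite rules are hypothetical and are not actual EM instructions or the paper's pattern table.

```python
def match(pattern, tree, bindings):
    # A pattern string starting with "?" is a variable that binds any subtree.
    if isinstance(pattern, str) and pattern.startswith("?"):
        bindings[pattern] = tree
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        return (len(pattern) == len(tree)
                and pattern[0] == tree[0]
                and all(match(p, t, bindings) for p, t in zip(pattern[1:], tree[1:])))
    return pattern == tree

def substitute(template, bindings):
    # Build the replacement tree by filling in the bound variables.
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return (template[0],) + tuple(substitute(t, bindings) for t in template[1:])
    return template

# Hypothetical pattern table: (pattern, replacement) pairs.
PATTERN_TABLE = [
    (("add", "?x", ("const", 0)), "?x"),
    (("mul", "?x", ("const", 1)), "?x"),
    (("mul", "?x", ("const", 2)), ("add", "?x", "?x")),
]

def optimize(tree):
    # Rewrite at the root until no pattern applies, then descend into children.
    changed = True
    while changed:
        changed = False
        for pattern, replacement in PATTERN_TABLE:
            bindings = {}
            if match(pattern, tree, bindings):
                tree = substitute(replacement, bindings)
                changed = True
                break
    if isinstance(tree, tuple):
        tree = (tree[0],) + tuple(optimize(child) for child in tree[1:])
    return tree

expr = ("add", ("mul", ("load", "a"), ("const", 1)), ("const", 0))
simplified = optimize(expr)
doubled = optimize(("mul", ("load", "b"), ("const", 2)))
print(simplified, doubled)
```

This sketch makes a single top-down pass, so a rewrite that only becomes possible after a child has been simplified is not picked up. A fuller optimizer would need to revisit the parent in that case.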


High-Reliable Classification of Multiple Induction Motor Faults using Robust Vibration Signatures in Noisy Environments based on a LPC Analysis and an EM Algorithm (LPC 분석 기법 및 EM 알고리즘 기반 잡음 환경에 강인한 진동 특징을 이용한 고 신뢰성 유도 전동기 다중 결함 분류)

  • Kang, Myeongsu;Jang, Won-Chul;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.19 no.2 / pp.21-30 / 2014
  • The use of induction motors has recently been increasing at a variety of industrial sites, where they play a significant role. This has motivated many researchers to develop fault detection and classification systems for induction motors in order to reduce the economic damage caused by their faults. To identify induction motor faults early, this paper estimates the spectral envelope of each induction motor fault using a linear prediction coding (LPC) analysis technique and an expectation maximization (EM) algorithm. It then classifies induction motor faults into their corresponding categories by calculating the Mahalanobis distance using the estimated spectral envelopes and finding the minimum distance. Experimental results show that the proposed approach yields higher classification accuracy than the state-of-the-art conventional approach in both noiseless and noisy environments.
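
The final classification step, choosing the class with the minimum Mahalanobis distance, can be sketched as follows. The two-dimensional features, class labels, means, and covariance matrices are made-up assumptions; the paper's features are LPC and EM-derived spectral envelopes, which are not reproduced here.

```python
import math

def mahalanobis_2d(x, mean, cov):
    # Mahalanobis distance sqrt((x - mean)^T cov^{-1} (x - mean)) for 2-D features.
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

def classify(x, class_stats):
    # Pick the label whose (mean, covariance) gives the smallest distance.
    return min(class_stats, key=lambda label: mahalanobis_2d(x, *class_stats[label]))

# Hypothetical per-class statistics: label -> (mean, covariance).
class_stats = {
    "normal": ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    "bearing_fault": ((3.0, 3.0), ((1.0, 0.5), (0.5, 1.0))),
}

label_a = classify((0.2, -0.1), class_stats)
label_b = classify((2.8, 3.1), class_stats)
print(label_a, label_b)
```

Unlike Euclidean distance, the Mahalanobis distance scales each direction by the class covariance, so a feature with large natural spread contributes less to the decision.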

Bayesian Hierarchical Model using Gibbs Sampler Method: Field Mice Example (깁스 표본 기법을 이용한 베이지안 계층적 모형: 야생쥐의 예)

  • Song, Jae-Kee;Lee, Gun-Hee;Ha, Il-Do
    • Journal of the Korean Data and Information Science Society / v.7 no.2 / pp.247-256 / 1996
  • In this paper, we apply a Bayesian hierarchical model to analyze the field mice example introduced by Dempster et al. (1981). For this example, we use the Gibbs sampler to obtain the posterior mean and compare it with the LSE (Least Squares Estimator) and the MLR (Maximum Likelihood estimator with Random effect) obtained via the EM algorithm.
