Search | Korea Science

Accelerated Learning of Latent Topic Models by Incremental EM Algorithm (점진적 EM 알고리즘에 의한 잠재토픽모델의 학습 속도 향상)

Chang, Jeong-Ho;Lee, Jong-Woo;Eom, Jae-Hong
- Journal of KIISE:Software and Applications / v.34 no.12 / pp.1045-1055 / 2007
Latent topic models are statistical models that automatically capture, in a probabilistic way, salient patterns or correlations among the features underlying a data collection. They are gaining popularity as an effective tool for automatic semantic feature extraction from text corpora, multimedia data analysis including image data, and bioinformatics. An important issue in applying latent topic models to massive data sets is efficient learning of the model. The paper proposes an accelerated learning technique for the PLSA model, one of the popular latent topic models, that uses an incremental EM algorithm instead of the conventional EM algorithm. The incremental EM algorithm is characterized by a series of partial E-steps, each performed on a corresponding subset of the data collection, whereas the conventional EM algorithm performs one batch E-step over the whole data set. Because a single batch E-M step is replaced with a series of partial E-steps and M-steps, the inference result for one data subset is reflected directly in the inference for the next, which can speed up learning over the entire data set. The algorithm also has the advantages that it is guaranteed to converge to a local maximum and that it can be implemented with only a slight modification of an existing implementation based on conventional EM. We present the basic application of the incremental EM algorithm to the learning of PLSA and empirically evaluate the acceleration with several possible data partitioning methods for practical application. Experimental results on a real-world news data set show that the proposed approach achieves a meaningful improvement in the convergence rate when learning a latent topic model. Additionally, we present a result that supports a possible synergistic effect of combining the incremental EM algorithm with parallel computing.
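
The partial E-step scheme described in this abstract can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the data is assumed to be a document-by-word count matrix, documents are split into contiguous blocks (the paper evaluates several partitioning methods that the abstract does not specify), and per-block sufficient statistics are cached in the style of Neal and Hinton's incremental EM. The names `plsa_incremental_em` and `_partial_e_step` and all parameters are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def _partial_e_step(counts, p_z_d, p_w_z, docs):
    """Expected counts n(d,w) * P(z|d,w) for the documents in one block."""
    joint = p_z_d[docs][:, :, None] * p_w_z[None, :, :]   # (docs, topics, words)
    posterior = joint / (joint.sum(axis=1, keepdims=True) + EPS)
    return counts[docs][:, None, :] * posterior


def plsa_incremental_em(counts, n_topics, n_blocks=4, n_sweeps=20, seed=0):
    """Fit P(w|z) and P(z|d) with block-wise (incremental) EM."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words)) + EPS
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics)) + EPS
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    # Assumption: contiguous blocks of documents.
    blocks = np.array_split(np.arange(n_docs), n_blocks)

    # One initial batch pass so that every block has cached statistics.
    block_stats = [_partial_e_step(counts, p_z_d, p_w_z, docs).sum(axis=0)
                   for docs in blocks]
    total_stats = sum(block_stats)

    for _ in range(n_sweeps):
        for b, docs in enumerate(blocks):
            # Partial E-step: only this block's posteriors are recomputed.
            expected = _partial_e_step(counts, p_z_d, p_w_z, docs)

            # M-step for P(z|d): depends only on the documents in the block.
            doc_topic = expected.sum(axis=2) + EPS
            p_z_d[docs] = doc_topic / doc_topic.sum(axis=1, keepdims=True)

            # Swap the block's old contribution for the new one, then
            # re-estimate P(w|z) from the global statistics right away.
            new_stats = expected.sum(axis=0)
            total_stats = np.maximum(total_stats + new_stats - block_stats[b], 0.0)
            block_stats[b] = new_stats
            smoothed = total_stats + EPS
            p_w_z = smoothed / smoothed.sum(axis=1, keepdims=True)

    return p_w_z, p_z_d


# Toy usage with a random count matrix (hypothetical data).
toy_counts = np.random.default_rng(1).integers(0, 5, size=(12, 8)).astype(float)
topic_word, doc_topic = plsa_incremental_em(toy_counts, n_topics=3, n_blocks=3)
```

Setting `n_blocks=1` reduces the loop to conventional batch EM, which makes the two variants easy to compare on the same data.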

Bayesian Network-based Probabilistic Management of Software Metrics for Refactoring (리팩토링을 위한 소프트웨어 메트릭의 베이지안 네트워크 기반 확률적 관리)

Choi, Seunghee;Lee, Goo Yeon
- Journal of KIISE / v.43 no.12 / pp.1334-1341 / 2016
In recent years, the importance of managing software defects in the implementation stage has grown because of the rapid development and widespread use of intelligent smart devices. Although quite a few studies have been conducted on prediction models for software defects, their outcomes have not been widely shared. This paper proposes an efficient probabilistic management model of software metrics based on the Bayesian network, to overcome the limitations of approaches such as binary defect prediction models. We expect the proposed model to configure the Bayesian network by taking advantage of various software metrics, which can help in identifying improvements for refactoring. Once the source code has been improved through refactoring, the measured values of the related metrics also change. The proposed model presents probability values reflecting the effects after defect removal, which can be achieved by improving metrics through refactoring. By using probability values instead of conclusive binary predictions, the model can provide flexibility in decision making.
https://doi.org/10.5626/JOK.2016.43.12.1334

Weighted Local Naive Bayes Link Prediction

Wu, JieHua;Zhang, GuoJi;Ren, YaZhou;Zhang, XiaYan;Yang, Qiao
- Journal of Information Processing Systems / v.13 no.4 / pp.914-927 / 2017
Weighted network link prediction is a challenging issue in complex network analysis. Unsupervised methods based on local structure are widely used to handle the predictive task. However, the results are still far from satisfactory, as most of the literature neglects two important points: common neighbors exert different influence on potential links, and the weights associated with links in the local structure also differ. In this paper, we adapt an effective link prediction model, the local naive Bayes model, to a weighted scenario to address this issue. Correspondingly, we propose a weighted local naive Bayes (WLNB) probabilistic link prediction framework. The main contribution is that a weighted clustering coefficient has been incorporated, allowing our model to infer the weighted contribution in the prediction stage. In addition, WLNB can be applied to several classic similarity metrics. We evaluate WLNB on different kinds of real-world weighted datasets. Experimental results show that our proposed approach performs better (by AUC and Prec) than several alternative methods for link prediction in weighted complex networks.
https://doi.org/10.3745/JIPS.04.0040
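
As an illustration of the local-structure similarity scores this abstract builds on, the sketch below computes a weighted common-neighbors score for a candidate link. It is not the WLNB model, whose formula the abstract does not give; it is the simple baseline that sums the weights of the two edges through each shared neighbor. The helper names and the toy edge list are hypothetical.

```python
from collections import defaultdict


def build_graph(weighted_edges):
    """Undirected weighted graph as a dict of dicts: graph[u][v] = weight."""
    graph = defaultdict(dict)
    for u, v, w in weighted_edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def weighted_common_neighbors(graph, x, y):
    """Sum w(x, z) + w(z, y) over every neighbor z shared by x and y."""
    neighbors_x = graph.get(x, {})
    neighbors_y = graph.get(y, {})
    common = set(neighbors_x) & set(neighbors_y)
    return sum(neighbors_x[z] + neighbors_y[z] for z in common)


# Toy network (hypothetical): a and b share neighbors c and d.
edges = [("a", "c", 2.0), ("b", "c", 1.0),
         ("a", "d", 0.5), ("b", "d", 0.5),
         ("a", "e", 3.0)]
graph = build_graph(edges)
score = weighted_common_neighbors(graph, "a", "b")  # (2.0 + 1.0) + (0.5 + 0.5)
```

A higher score suggests a more likely link; a method such as the one in the abstract would additionally weight each shared neighbor's contribution differently.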

Prediction of compressive strength of GGBS based concrete using RVM

Prasanna, P.K.;Ramachandra Murthy, A.;Srinivasu, K.
- Structural Engineering and Mechanics / v.68 no.6 / pp.691-700 / 2018
Ground granulated blast furnace slag (GGBS) is a by-product of the iron and steel industries that is useful in the design and development of high-quality cement paste/mortar and concrete. This paper investigates the applicability of a relevance vector machine (RVM) based regression model to predict the compressive strength of various GGBS-based concrete mixes. Compressive strength data for various GGBS-based concrete mixes were obtained by considering the effect of water-binder ratio and steel fibres. RVM is a machine learning technique that employs Bayesian inference to obtain parsimonious solutions for regression and classification. It is an extension of the support vector machine that couples probabilistic classification and regression. RVM is based on a Bayesian formulation of a linear model with an appropriate prior that results in a sparse representation. The compressive strength model was developed using MATLAB software for training and prediction. About 70% of the data was used for development of the RVM model and 30% for validation. The predicted compressive strength for GGBS-based concrete mixes is found to be in very good agreement with the corresponding experimental observations.
https://doi.org/10.12989/sem.2018.68.6.691

Context Aware System based on Bayesian Network driven Context Reasoning and Ontology Context Modeling

Ko, Kwang-Eun;Sim, Kwee-Bo
- International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.4 / pp.254-259 / 2008
Uncertainty in the result of context awareness always exists in context-aware computing. This loss of accuracy is mostly caused by the imperfection and incompleteness of sensed data; for this reason, the accuracy of context awareness must be improved. In this article, we propose a novel approach that models uncertain context using an ontology and a context reasoning method based on Bayesian networks. Our context-aware processing is divided into two parts: context modeling and context reasoning. The context modeling is based on an ontology to facilitate knowledge reuse and sharing. The ontology facilitates the sharing and reuse, across similar domains, of not only logical knowledge but also uncertain knowledge. The ontology can also be used for structure learning of the Bayesian network. The context reasoning is based on Bayesian networks, which provide probabilistic inference for reasoning under uncertainty in context-aware processing in a flexible and adaptive way.
https://doi.org/10.5391/IJFIS.2008.8.4.254
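
The kind of probabilistic context reasoning this abstract describes can be illustrated with the smallest possible Bayesian network: one hidden context node and one noisy sensor node. The sketch below applies Bayes' rule to update the belief about the context from a sensor reading. The states, probabilities, and the function name `posterior` are hypothetical and are not taken from the paper.

```python
def posterior(prior, likelihood, evidence):
    """P(state | evidence) for a two-node network: state -> sensor reading.

    prior:      {state: P(state)}
    likelihood: {state: {reading: P(reading | state)}}
    """
    unnormalized = {s: prior[s] * likelihood[s][evidence] for s in prior}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical numbers: is the user in a meeting, given a noisy microphone?
prior = {"meeting": 0.3, "idle": 0.7}
likelihood = {
    "meeting": {"noisy": 0.8, "quiet": 0.2},
    "idle": {"noisy": 0.1, "quiet": 0.9},
}
belief = posterior(prior, likelihood, "noisy")
# belief["meeting"] is 0.24 / 0.31, roughly 0.77, despite the 0.3 prior
```

Because the output is a distribution rather than a hard label, imperfect or incomplete sensor data lowers confidence instead of forcing a wrong decision.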

A Bayesian Inference Model for Landmarks Detection on Mobile Devices (모바일 디바이스 상에서의 특이성 탐지를 위한 베이지안 추론 모델)

Hwang, Keum-Sung;Cho, Sung-Bae;Lea, Jong-Ho
- Journal of KIISE:Computing Practices and Letters / v.13 no.1 / pp.35-45 / 2007
The log data collected from mobile devices contain diverse, meaningful, and practical personal information. However, this information is usually ignored because of the devices' limited memory capacity, computation power, and analysis capability. We propose a novel method that detects landmarks, that is, information meaningful to users, by analyzing the log data in distributed modules to overcome the problems of the mobile environment. The proposed method adopts a Bayesian probabilistic approach to enhance inference accuracy in uncertain environments. The new cooperative modularization technique divides the Bayesian network into modules so that it can be computed efficiently with limited resources. Experiments show a precision of about 84% and a recall of about 76% on artificial data, and a hit rate of about 89% on real data when partial matching is included.

Non-Simultaneous Sampling Deactivation during the Parameter Approximation of a Topic Model

Jeong, Young-Seob;Jin, Sou-Young;Choi, Ho-Jin
- KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.1 / pp.81-98 / 2013
Since Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) were introduced, many revised or extended topic models have appeared. Because the likelihood of these models is intractable, training any topic model requires an approximation algorithm such as variational approximation, Laplace approximation, or Markov chain Monte Carlo (MCMC). Although these approximation algorithms perform well, training a topic model is still computationally expensive given the large amount of data it requires. In this paper, we propose a new method, called non-simultaneous sampling deactivation, for efficient approximation of the parameters of a topic model. In traditional approximation algorithms, every random variable is sampled or obtained under a single predefined burn-in period; our method is instead based on the observation that the random variable nodes in a topic model converge over different periods. During the iterative approximation process, the proposed method allows each random variable node to be terminated, or deactivated, once it has converged. Therefore, compared to traditional approaches in which every node is usually deactivated at the same time, the proposed method makes inference more efficient in terms of time and memory. We do not propose a new approximation algorithm, but a new process applicable to existing approximation algorithms. Through experiments, we show the time and memory efficiency of the method, and discuss the tradeoff between the efficiency of the approximation process and parameter consistency.
https://doi.org/10.3837/tiis.2013.01.006
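
The idea of deactivating each variable as soon as it has converged, instead of stopping all of them together, can be sketched on a toy problem. The code below is not the authors' procedure and does not involve a topic model: it estimates the mean of several independent samplers and freezes each estimate once its running mean has changed by less than `tol` for `patience` consecutive draws. That stopping rule, the function name, and all parameters are assumptions made for illustration.

```python
import random


def estimate_with_deactivation(samplers, tol=1e-3, patience=5, max_iters=10_000):
    """Estimate each sampler's mean, deactivating variables individually."""
    n = len(samplers)
    means = [0.0] * n
    counts = [0] * n          # draws actually spent on each variable
    stable = [0] * n          # consecutive draws with a small change
    active = set(range(n))

    for _ in range(max_iters):
        if not active:
            break
        for i in list(active):
            x = samplers[i]()
            counts[i] += 1
            new_mean = means[i] + (x - means[i]) / counts[i]
            stable[i] = stable[i] + 1 if abs(new_mean - means[i]) < tol else 0
            means[i] = new_mean
            if stable[i] >= patience:
                active.discard(i)   # this variable stops consuming work
    return means, counts


# Hypothetical variables: one constant, one noisy.
rng = random.Random(0)
samplers = [lambda: 2.0, lambda: rng.gauss(0.0, 1.0)]
means, counts = estimate_with_deactivation(samplers)
```

The per-variable `counts` show where work was saved: the constant variable stops after a handful of draws while the noisy one keeps sampling. A looser `tol` saves more work at the cost of less consistent estimates, which mirrors the tradeoff the abstract mentions.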

An Application of Dirichlet Mixture Model for Failure Time Density Estimation to Components of Naval Combat System (디리슈레 혼합모형을 이용한 함정 전투체계 부품의 고장시간 분포 추정)

Lee, Jinwhan;Kim, Jung Hun;Jung, BongJoo;Kim, Kyeongtaek
- Journal of Korean Society of Industrial and Systems Engineering / v.42 no.4 / pp.194-202 / 2019
Reliability analysis of components frequently starts with the data that the manufacturer provides. If enough failure data are collected from field operations, the reliability should be recomputed and updated on the basis of the field failure data. However, when the failure time record for a component contains only a few observations, all statistical methodologies are limited. In this case, if failure records for multiple identical components are available, a valid alternative is to combine the data from all components into one data set of sufficient size and to use the information in the censored data. The ROK Navy has been operating multiple Patrol Killer Guided missiles (PKGs) for several years. The Korea Multi-Function Control Console (KMFCC) is one of the key components of the PKG combat system. The maintenance record for the KMFCC contains fewer than ten failure observations and a censored datum. This paper proposes a Bayesian approach with a Dirichlet mixture model to estimate the failure time density for the KMFCC. A trend test on each component record did not reject the null hypothesis that failure occurrence is a renewal process. Since the KMFCCs have been functioning under different operating environments, the failure time distribution may be a composition of a number of unknown distributions, i.e. a mixture distribution, rather than a single distribution. The Dirichlet mixture model was coded as a probabilistic program in Python using PyMC3. The Markov Chain Monte Carlo (MCMC) sampling technique employed in PyMC3 then estimated the posterior distribution of the parameters through the Dirichlet mixture model. The simulation results revealed that the mixture models fit the combined data set better than single-distribution models.
https://doi.org/10.11627/jkise.2019.42.4.194
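
To make the notion of a mixture failure-time distribution concrete, the sketch below draws mixture weights with a truncated stick-breaking construction and samples failure times from the resulting mixture. It is a NumPy illustration only, not the authors' PyMC3 model: it assumes the Dirichlet mixture is of the stick-breaking (Dirichlet process) kind and that each component is exponential, neither of which the abstract confirms. The function names, the concentration `alpha`, and the component scales are hypothetical.

```python
import numpy as np


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking weights; they sum to 1 by construction."""
    betas = rng.beta(1.0, alpha, size=n_components)
    betas[-1] = 1.0  # truncation: the last component takes the remaining stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def sample_mixture_failure_times(weights, scales, size, rng):
    """Pick a component per draw, then sample an exponential failure time."""
    components = rng.choice(len(weights), size=size, p=weights)
    return rng.exponential(scale=np.asarray(scales)[components])


rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, n_components=5, rng=rng)
scales = [100.0, 400.0, 900.0, 1600.0, 2500.0]  # hypothetical mean lifetimes (hours)
failure_times = sample_mixture_failure_times(weights, scales, size=1000, rng=rng)
```

In a Bayesian fit, the weights and component parameters would be inferred from the pooled failure records by MCMC rather than fixed in advance as they are here.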

Operational Availability Improvement through Online Monitoring and Advice For Emergency Diesel Generator

Lee, Jong-Beom;Kim, Han-Gon;Kim, Byong-Sub;M. Golay;C.W. Kang;Y. Sui
- Proceedings of the Korean Nuclear Society Conference / 1998.05a / pp.264-270 / 1998
This research broadens the prime concern of nuclear power plant operations from safe performance to both economic and safe performance. First, the emergency diesel generator is identified as one of the main contributors to lost plant availability through a review of plant forced outage records. The framework of an integrated architecture for performing modern on-line condition monitoring for operational availability improvement is configured in this work. For the development of comprehensive sensor networks for complex target systems, an integrated methodology incorporating a structural hierarchy, a functional hierarchy, and a fault-system matrix is formulated. The second part of our research is the development of an intelligent diagnosis and maintenance advisory system, which employs Bayesian Belief Networks (BBNs) as a high-level reasoning tool that incorporates inherent uncertainty for use in probabilistic inference. Our prototype diagnosis algorithms are represented explicitly through topological symbols and links between them in a causal direction. In particular, as new evidence from the sensor network is entered into the model, our advisory system provides operational advice concerning both availability and safety, so that the operator is able to determine the likely failure modes, diagnose the system state, locate root causes, and take the most advantageous action. This advice thereby improves operational availability.

Nonstandard Machine Learning Algorithms for Microarray Data Mining

Zhang, Byoung-Tak
- Proceedings of the Korean Society for Bioinformatics Conference / 2001.10a / pp.165-196 / 2001
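A DNA chip, or microarray, is a technology in which a large number of genes or gene fragments (usually thousands to tens of thousands) are fixed on a chip and the DNA hybridization reaction is used to analyze the expression patterns of the genes. Such high-throughput technology can not only provide answers to many problems in molecular biology that were previously out of reach, but also has nearly unlimited application potential, including molecular-level disease diagnosis, new drug development, and solutions to environmental pollution problems. For practical application of this technology, in addition to the hardware/wetware technology for fabricating DNA chips, bioinformatics technology for deriving as much useful and new knowledge as possible from the resulting data is essential. The problem of data mining gene expression patterns can be broadly divided into clustering, classification, and dependency analysis, and these techniques are based on statistics and on machine learning from artificial intelligence. The techniques mainly used are principal component analysis, hierarchical clustering, k-means, self-organizing maps, decision trees, multilayer perceptron neural networks, and association rules. In this seminar, beyond these basic machine learning techniques, we introduce the probabilistic graphical model (PGM) as a recently studied learning technique and review research applying it to DNA chip data analysis. A PGM is a machine learning model formed by combining artificial neural networks, graph theory, and probability theory; it is based on the memory and learning mechanisms of the human brain, and one of its major differences from other machine learning models is that it is a generative model. That is, once a model has been built, it can generate new data, so the model can be validated and new facts can be inferred from it, which makes it suitable for exploratory analysis aimed at discovering new knowledge, as in biological data mining problems. In addition, unlike conventional neural network models, a probabilistic graphical model performs probability-based soft inference rather than deterministic decision making, and it is well suited to analyzing causal relationships or dependencies among related factors from the learned model. As concrete examples of PGMs, we introduce the structure and the learning and inference algorithms of Bayesian networks, nonnegative matrix factorization (NMF), and generative topographic mapping (GTM), and review the results of applying them to the cancer diagnosis problem and the gene-drug dependency analysis problem used in CAMDA-2000 and CAMDA-2001, competitions for evaluating DNA chip data analysis.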

Search Result 48, Processing Time 0.037 seconds
