• Title/Summary/Keyword: statistical learning approach

Accelerated Learning of Latent Topic Models by Incremental EM Algorithm (점진적 EM 알고리즘에 의한 잠재토픽모델의 학습 속도 향상)

  • Chang, Jeong-Ho;Lee, Jong-Woo;Eom, Jae-Hong
    • Journal of KIISE:Software and Applications / v.34 no.12 / pp.1045-1055 / 2007
  • Latent topic models are statistical models that automatically capture, in a probabilistic way, salient patterns or correlations among the features underlying a data collection. They are gaining popularity as an effective tool for automatic semantic feature extraction from text corpora, multimedia data analysis including image data, and bioinformatics. An important issue in applying latent topic models to massive data sets is efficient learning of the model. This paper proposes an accelerated learning technique for the PLSA model, one of the popular latent topic models, that uses an incremental EM algorithm instead of the conventional EM algorithm. The incremental EM algorithm is characterized by a series of partial E-steps, each performed on a subset of the data collection, whereas the conventional EM algorithm performs one batch E-step over the whole data set. Replacing a single batch E-step and M-step with a series of partial E-steps and M-steps lets the inference result for one data subset be reflected directly in the inference for the next, which can speed up learning over the entire data set. The algorithm is also guaranteed to converge to a local maximum and can be implemented with only a slight modification of an existing implementation based on conventional EM. We present the basic application of the incremental EM algorithm to the learning of PLSA and empirically evaluate the acceleration with several possible data partitioning methods. Experimental results on a real-world news data set show that the proposed approach achieves a meaningful improvement in the convergence rate when learning a latent topic model. Additionally, we present a result that supports a possible synergistic effect of combining the incremental EM algorithm with parallel computing.
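
The partial E-step idea in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the asymmetric PLSA form P(w|d) = Σ_z P(z|d) P(w|z), a round-robin partition of documents into blocks, and one full E-step to initialise the stored statistics. The function names, the block scheme, and the toy corpus in the usage note are all hypothetical.

```python
import math
import random


def normalize(values):
    """Scale non-negative values so they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def log_likelihood(docs, p_z_d, p_w_z):
    """Sum of n(d,w) * log P(w|d) over all observed (document, word) pairs."""
    topics = range(len(p_w_z))
    return sum(
        n * math.log(sum(p_z_d[d][z] * p_w_z[z][w] for z in topics))
        for d, doc in enumerate(docs)
        for w, n in doc.items()
    )


def e_step_doc(doc, p_z_of_d, p_w_z):
    """Expected topic counts n(d,w) * P(z|d,w) for a single document."""
    stats = {}
    for w, n in doc.items():
        post = normalize([p_z_of_d[z] * p_w_z[z][w] for z in range(len(p_w_z))])
        stats[w] = [n * q for q in post]
    return stats


def incremental_em_plsa(docs, n_words, n_topics, n_blocks, n_sweeps, seed=0):
    """docs is a list of {word_id: count} dicts; returns (P(z|d), P(w|z), history)."""
    rng = random.Random(seed)
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in docs]
    history = [log_likelihood(docs, p_z_d, p_w_z)]

    # One full E-step initialises the per-document statistics and their sum.
    doc_stats = [e_step_doc(doc, p_z_d[d], p_w_z) for d, doc in enumerate(docs)]
    total = [[0.0] * n_words for _ in range(n_topics)]
    for stats in doc_stats:
        for w, per_topic in stats.items():
            for z, value in enumerate(per_topic):
                total[z][w] += value

    # Assumed partition: document d goes to block d mod n_blocks.
    blocks = [list(range(b, len(docs), n_blocks)) for b in range(n_blocks)]
    for _ in range(n_sweeps):
        for block in blocks:
            for d in block:
                # Partial E-step: swap this document's old statistics for new ones.
                new_stats = e_step_doc(docs[d], p_z_d[d], p_w_z)
                for w, per_topic in new_stats.items():
                    for z, value in enumerate(per_topic):
                        total[z][w] += value - doc_stats[d][w][z]
                doc_stats[d] = new_stats
                # M-step for this document's topic mixture P(z|d).
                p_z_d[d] = normalize(
                    [sum(v[z] for v in new_stats.values()) for z in range(n_topics)]
                )
            # M-step for P(w|z) right after the block, so the next block sees it.
            # Clamp guards against tiny negative values from floating-point rounding.
            p_w_z = [normalize([max(v, 0.0) for v in row]) for row in total]
        history.append(log_likelihood(docs, p_z_d, p_w_z))
    return p_z_d, p_w_z, history
```

A call such as `incremental_em_plsa(docs, n_words=6, n_topics=2, n_blocks=3, n_sweeps=20)` on a small hypothetical corpus returns the two probability tables and the log-likelihood recorded before training and after each sweep. Setting `n_blocks=1` reduces the loop to conventional batch EM, which is the "slight modification" relationship the abstract describes.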

Pipeline Structural Damage Detection Using Self-Sensing Technology and PNN-Based Pattern Recognition (자율 감지 및 확률론적 신경망 기반 패턴 인식을 이용한 배관 구조물 손상 진단 기법)

  • Lee, Chang-Gil;Park, Woong-Ki;Park, Seung-Hee
    • Journal of the Korean Society for Nondestructive Testing / v.31 no.4 / pp.351-359 / 2011
  • In a structure, damage can occur at several scales, from micro-cracking to corrosion or loose bolts. This makes damage difficult to identify with a single mode of sensing. Hence, a multi-mode actuated sensing system is proposed based on a self-sensing circuit using a piezoelectric sensor. In the self-sensing-based multi-mode actuated sensing, one mode provides a wide frequency-band structural response from the self-sensed impedance measurement, and the other mode provides a specific frequency-induced structural wavelet response from the self-sensed guided wave measurement. In this study, an experiment on a pipeline system is carried out to verify the effectiveness and robustness of the proposed structural health monitoring approach. Different types of structural damage are artificially inflicted on the pipeline system. To classify the multiple types of structural damage, supervised learning-based statistical pattern recognition is implemented by composing a two-dimensional space from the damage indices extracted from the impedance and guided wave features. For more systematic damage classification, several control parameters that determine an optimal decision boundary for the supervised learning-based pattern recognition are optimized. Finally, further research issues are discussed for real-world implementation of the proposed approach.
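
As a rough illustration of the classification step, the sketch below implements a basic probabilistic neural network (a Gaussian Parzen-window classifier) over a two-dimensional feature space. It assumes each sample is a pair (impedance damage index, guided wave damage index) and that the smoothing width `sigma` stands in for the control parameters the abstract mentions; the function name and any data passed to it are hypothetical, and the paper's actual network may differ.

```python
import math


def pnn_classify(sample, training, sigma):
    """Return the class whose average Gaussian kernel response is largest.

    sample   -- (impedance_index, guided_wave_index), assumed 2-D features
    training -- dict mapping a class label to a list of 2-D training points
    sigma    -- kernel width; controls how smooth the decision boundary is
    """
    scores = {}
    for label, points in training.items():
        kernel_sum = sum(
            math.exp(-((sample[0] - x) ** 2 + (sample[1] - y) ** 2)
                     / (2.0 * sigma ** 2))
            for x, y in points
        )
        # Pattern layer summed per class, then averaged (summation layer).
        scores[label] = kernel_sum / len(points)
    # Decision layer: pick the class with the highest estimated density.
    return max(scores, key=scores.get)
```

Tuning `sigma` on held-out samples is one simple way to move the decision boundary: small values follow individual training points closely, large values blur the classes together.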

Prediction of Wave Breaking Using Machine Learning Open Source Platform (머신러닝 오픈소스 플랫폼을 활용한 쇄파 예측)

  • Lee, Kwang-Ho;Kim, Tag-Gyeom;Kim, Do-Sam
    • Journal of Korean Society of Coastal and Ocean Engineers / v.32 no.4 / pp.262-272 / 2020
  • A large number of studies on wave breaking have been carried out, and many experimental data have been documented. Moreover, on the basis of various experimental data sets, many empirical or semi-empirical formulas based primarily on regression analysis have been proposed to quantitatively estimate wave breaking for engineering applications. However, wave breaking has an inherent variability, which implies that a linear statistical approach such as linear regression analysis might be inadequate. This study presents an alternative nonlinear method using a neural network, one of the machine learning methods, to estimate breaking wave height and breaking depth. The neural network is modeled using TensorFlow, an open-source machine learning platform distributed by Google. The neural network is trained on randomly selected samples of the collected experimental data, and the trained network is evaluated using data not used in the learning process. The breaking wave height and depth predicted by the fully trained neural network are more accurate than those obtained from existing empirical formulas. These results show that a neural network is a useful tool for the prediction of wave breaking.

Reinforcement Method for Automated Text Classification using Post-processing and Training with Definition Criteria (학습방법개선과 후처리 분석을 이용한 자동문서분류의 성능향상 방법)

  • Choi, Yun-Jeong;Park, Seung-Soo
    • The KIPS Transactions:PartB / v.12B no.7 s.103 / pp.811-822 / 2005
  • Automated text categorization classifies free-text documents into predefined categories automatically; its main goal is to reduce the considerable manual effort the task requires. Recent research on improving text categorization performance (efficiency) has focused on enhancing existing classification models and algorithms themselves, but its scope has been limited by feature-based statistical methodology. In this paper, we propose the RTPost system, which differs from traditional methods in taking a fault-tolerant system approach and a data mining strategy. The two important parts of the RTPost system are reinforcement training and post-processing. The training method deals with the problem of defining the categories to be classified before the training sample documents are selected. The post-processing method deals with the problem of assigning categories rather than with the performance of the classification algorithms. In experiments, we applied our system to documents with low classification accuracy that lie near a decision boundary. The experiments show that our system has high accuracy and stability under realistic conditions. It did not depend on variables that strongly influence classification power, such as the number of training documents, the selection problem, and the performance of the classification algorithms. In addition, we can expect a self-learning effect that decreases the training cost and increases the training power by exploiting the advantages of active learning.

Diagnosis by Rough Set and Information Theory in Reinforcing the Competencies of the Collegiate (러프집합과 정보이론을 이용한 대학생역량강화 진단)

  • Park, In-Kyoo
    • Journal of Digital Convergence / v.12 no.8 / pp.257-264 / 2014
  • This paper presents a core competencies diagnosis system for our college students, intended to identify the core competencies that reinforce learning and employment capabilities. Because today's data exhibit high redundancy and dimensionality along with time complexity, they are more likely to contain spurious relationships, and even the weakest relationships will be highly significant by any statistical test. To address the measurement of uncertainty in the classification of categorical data and the implementation of the corresponding analytic system, an uncertainty measure based on rough entropy and information entropy is defined; with it, an analysis of similar behaviors is carried out and the clustering ability is demonstrated in comparison with the statistical approach. Because the acquired and necessary competencies of college students, i.e. common core competencies and major core competencies, are deduced from the results of the diagnosis, they facilitate not only college life and the reinforcement of employment capability but also the revitalization of employment and adjustment to college life.

Pattern Recognition using Robust Feedforward Neural Networks (로버스트 다층전방향 신경망을 이용한 패턴인식)

  • Hwang, Chang-Ha;Kim, Sang-Min
    • Journal of the Korean Data and Information Science Society / v.9 no.2 / pp.345-355 / 1998
  • The back propagation (BP) algorithm allows multilayer feedforward neural networks to learn input-output mappings from training samples. It iteratively adjusts the network parameters (weights) to minimize the sum of squared approximation errors using a gradient descent technique. However, the mapping acquired through the BP algorithm may be corrupted when erroneous training data are employed. In this paper, two types of robust backpropagation algorithms are discussed, both from a theoretical point of view and in case studies of nonlinear regression function estimation and handwritten Korean character recognition. For future research, we suggest a Bayesian learning approach to neural networks and a comparison of it with the two robust backpropagation algorithms.

Modeling mechanical strength of self-compacting mortar containing nanoparticles using wavelet-based support vector machine

  • Khatibinia, Mohsen;Feizbakhsh, Abdosattar;Mohseni, Ehsan;Ranjbar, Malek Mohammad
    • Computers and Concrete / v.18 no.6 / pp.1065-1082 / 2016
  • The main aim of this study is to predict the compressive and flexural strengths of self-compacting mortar (SCM) containing nano-$SiO_2$, nano-$Fe_2O_3$ and nano-CuO using a wavelet-based weighted least squares-support vector machine (WLS-SVM) approach, called WWLS-SVM. The WWLS-SVM regression model is a relatively new metamodel that has been successfully introduced as a machine learning algorithm for engineering problems and has yielded encouraging results. To achieve the aim of this study, the WLS-SVM and WWLS-SVM models are first developed based on a database. In the database, nine variables which consist of cement, sand, NS, NF, NC, superplasticizer dosage, slump flow diameter and V-funnel flow time are considered as the input parameters of the models. The compressive and flexural strengths of SCM are chosen as the output parameters of the models. Finally, a statistical analysis is performed to demonstrate the generalization performance of the models in predicting the compressive and flexural strengths. The numerical results show that both metamodels achieve good accuracy and applicability. Furthermore, by adopting these predictive metamodels, costly and time-consuming laboratory tests can be eliminated.

Financial Capability and Differences in Age and Ethnicity

  • MOKHTAR, Nuradibah;SABRI, Mohamad Fazli;HO, Catherine Soke Fun
    • The Journal of Asian Finance, Economics and Business / v.7 no.10 / pp.1081-1091 / 2020
  • The objective of this study is to examine the effect of socio-demographic characteristics, namely age and ethnicity (Malay, Chinese, Indian and Others), on four financial capability domains: planning ahead, managing money, choosing products and staying informed. A closed-ended self-administered questionnaire was distributed to a total of 2000 respondents in four groups: FELDA or rural area residents, private sector employees, government sector employees, and youth in institutions of higher learning in Malaysia. These four groups were selected to cover a wide range of the Malaysian population. For each group, 500 respondents were recruited through purposive sampling. Analysis of Variance (ANOVA), carried out with the Statistical Package for Social Science (SPSS), was used in this study. The results revealed that age has a significant effect on planning ahead, managing money, choosing products and staying informed, whereas ethnicity was found to have no effect on financial capability except in the planning ahead domain. It is suggested that more effort be devoted to research and professional training for building respondents' financial capability. Furthermore, government and non-government organizations should develop a comprehensive approach to strengthen financial capability and raise standards of living, especially for financially vulnerable households.
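
For readers unfamiliar with the test named in the abstract above, the one-way ANOVA F statistic can be computed directly from group scores. The sketch below is only an illustration of the formula F = (SS_between / (k - 1)) / (SS_within / (N - k)); the study itself used SPSS, and the function name and any scores passed to it are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score lists."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means)
                    for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

A large F relative to the F distribution with (k - 1, N - k) degrees of freedom indicates that at least one group mean differs, which is the sense in which the abstract reports a "significant effect" of age.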

A Study on the Experimental Application of the Artificial Neural Network for the Process Improvement (공정개선을 위한 인공신경망의 실험적 적용에 관한 연구)

  • 한우철
    • Journal of the Korea Society of Computer and Information / v.7 no.1 / pp.174-183 / 2002
  • In this paper, a control chart pattern recognition methodology based on the back propagation algorithm and the multilayer perceptron, a neural computing theory, is presented. This pattern recognition algorithm, suitable for real-time statistical process control, evaluates observations routinely collected for control charting to determine whether a pattern, such as a cycle, trend or shift, exists in the data. This approach is promising because of its flexible training and high-speed computation on a low-end workstation. The artificial neural network methodology is developed using the delta learning rule and a sigmoid activation function with two hidden layers. In a computer integrated manufacturing environment, the operator need not routinely monitor the control chart but, rather, can be alerted to patterns by a computer signal generated by the proposed system.
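
The delta learning rule with a sigmoid activation, mentioned in the abstract above, can be illustrated for a single output unit. This is a deliberately reduced sketch, not the paper's two-hidden-layer network: it shows one weight update Δw_i = η (t − o) o (1 − o) x_i, where the function names, the learning rate and the input window are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def delta_rule_step(weights, inputs, target, learning_rate):
    """Apply one delta-rule update to a single sigmoid unit.

    inputs could be, for example, a short window of standardized control
    chart observations; target is 1.0 if the pattern is present, else 0.0.
    """
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    # Error scaled by the sigmoid derivative output * (1 - output).
    delta = (target - output) * output * (1.0 - output)
    return [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
```

Back propagation applies the same update layer by layer, passing each unit's delta backwards through the weights to the hidden layers.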

Trend Analysis of Thyroid Cancer Research in Korea with Text Mining Techniques

  • Lee, Tae-Gyeong;Heo, Seong-Min;Shin, Seung-Hyeok;Yang, Ji-Yeon
    • Journal of the Korea Society of Computer and Information / v.23 no.12 / pp.153-161 / 2018
  • In this paper, we propose a text-centered approach to identify research trends in thyroid cancer in Korea. We combine statistical analysis, text mining and machine learning techniques with our clinical insights to find associations between terms and to discover informative clusters of literature. The incidence of thyroid cancer in Korea increased rapidly in the 2000s, which fueled the debate regarding overdiagnosis, but recently the number of patients undergoing surgery has decreased significantly due to conscious reform efforts from various circles. We analyzed the abstracts and keywords of related research papers from DBpia. Most papers in the 1980s were case reports, and some papers in the 1990s discussed the early detection of thyroid cancer by mass screening. While many papers in the 2000s focused on different diagnostic techniques and the detection of small cancers, many in the 2010s placed more emphasis on patients' quality of life. There was an apparent change in the topics of thyroid cancer research over the past decades. The results of this study can serve as a reference guide for current and future research directions.