• Title/Summary/Keyword: statistical learning method


Ensemble variable selection using genetic algorithm

  • Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.629-640 / 2022
  • Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. The best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving the best subset selection via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve the variable selection performance, we propose to run multiple GAs to solve the best subset selection and then synthesize the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially the best subset selection and hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.
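
The idea of running several GAs and synthesizing their results can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the runs are combined, so the selection-frequency vote with a 0.5 threshold, the GA settings, and the names `run_ga`, `ensemble_ga`, and `toy_criterion` are all hypothetical. The toy criterion stands in for a real model selection criterion such as AIC or BIC.

```python
import random

def run_ga(criterion, n_vars, rng, pop_size=30, generations=60):
    """One GA run: return the best 0/1 inclusion mask found (lower is better)."""
    mutation_rate = 1.0 / n_vars
    population = [[rng.randint(0, 1) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    best = min(population, key=criterion)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=criterion)
            p2 = min(rng.sample(population, 3), key=criterion)
            # Uniform crossover followed by bit-flip mutation.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = min(population, key=criterion)
    return best

def ensemble_ga(criterion, n_vars, n_runs=10, threshold=0.5, seed=0):
    """Run several GAs and keep variables chosen in at least `threshold` of runs.

    The frequency vote is an assumed way of combining the runs.
    """
    rng = random.Random(seed)
    masks = [run_ga(criterion, n_vars, rng) for _ in range(n_runs)]
    freq = [sum(m[j] for m in masks) / n_runs for j in range(n_vars)]
    selected = [j for j, f in enumerate(freq) if f >= threshold]
    return selected, freq

# Hypothetical toy criterion: counts disagreements with a known "true" subset.
TRUE_VARS = {0, 3, 5}

def toy_criterion(mask):
    return sum((j in TRUE_VARS) != bool(g) for j, g in enumerate(mask))

selected, freq = ensemble_ga(toy_criterion, n_vars=8)
```

In practice `criterion` would fit the model of interest (linear, Poisson, or Cox regression) on the variables flagged in `mask` and return its selection criterion value.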

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.331-341 / 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed data types in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method in mixed datasets, we propose a new imputation performance measure (IPM) that is a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results with imputation performances and computational times.

A Sparse Data Preprocessing Using Support Vector Regression (Support Vector Regression을 이용한 희소 데이터의 전처리)

  • Jun, Sung-Hae;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.789-792 / 2004
  • Missing values arise in many forms in fields such as web mining, bioinformatics, and statistical data analysis, and they make training data sparse. Most commonly, missing values are replaced by simple predictions such as the mean or the mode. More advanced imputation methods, such as the conditional mean, tree-based methods, and Markov chain Monte Carlo algorithms, can also be used. However, the predictive accuracy of general imputation models decreases as the ratio of missing values in the training data increases. Moreover, the number of available imputations becomes limited as the missing ratio increases. To address this problem, we propose preprocessing missing values using statistical learning theory, specifically the support vector regression of Vapnik. The proposed method can be applied to sparse training data. We verified the performance of our model using data sets from the UCI machine learning repository.
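
For orientation, the simple mean and mode replacement that the abstract treats as the common baseline can be sketched as below. This shows only that baseline, not the paper's support vector regression method. The function name `impute_column`, the `kind` argument, and the use of `None` to mark a missing value are assumptions made for illustration.

```python
from statistics import mean, mode

def impute_column(values, kind):
    """Fill missing entries (None) with the column mean or mode.

    kind: "numeric" uses the mean of the observed values;
          anything else uses the most frequent observed value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mean(observed) if kind == "numeric" else mode(observed)
    return [fill if v is None else v for v in values]

ages = impute_column([1.0, None, 3.0], "numeric")
colors = impute_column(["a", "b", "a", None], "categorical")
```

Because the fill value depends only on the observed entries of one column, it ignores relationships between variables, which is one reason accuracy tends to fall as the missing ratio grows.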

Determination of Optimal Adhesion Conditions for FDM Type 3D Printer Using Machine Learning

  • Woo Young Lee;Jong-Hyeok Yu;Kug Weon Kim
    • Journal of Practical Engineering Education / v.15 no.2 / pp.419-427 / 2023
  • This study uses machine learning to investigate the optimal adhesion conditions for alleviating defects caused by heat shrinkage in FDM type 3D printers. Machine learning is one of the "statistical methods of extracting the law from data" and can be classified into supervised learning, unsupervised learning, and reinforcement learning. Among these, supervised learning specialized for optimization is used to present a function model for the adhesion between the bed and the output. Deriving the conditions for optimum adhesion is expected to reduce output defects in FDM type 3D printers. Machine learning code written in Python generates a function model that predicts the effect of operating variables on adhesion from data obtained through adhesion testing. The adhesion prediction data and the verification data were shown to be very consistent, and the conclusions discuss the potential of this method.

A Study on Performance Evaluation of Clustering Algorithms using Neural and Statistical Method (신경망 및 통계적 방법에 의한 클러스터링 성능평가)

  • 윤석환;민준영;신용백
    • Journal of Korean Society of Industrial and Systems Engineering / v.19 no.37 / pp.41-51 / 1996
  • This paper evaluates the clustering performance of a neural network and a statistical method. The algorithms used are GLVQ (Generalized Learning Vector Quantization) as the neural method and the k-means algorithm as the statistical clustering method. To compare the two methods, we calculate Rand's c statistic. As a result, the mean c value obtained with GLVQ is higher than that obtained with the k-means algorithm, while the standard deviation of the c value is lower. The experimental data sets were Fisher's IRIS data and patterns extracted from handwritten numerals.
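
Rand's statistic compares two partitions of the same items by the fraction of item pairs on which they agree, that is, pairs grouped together in both or separated in both. A minimal sketch, assuming the abstract's "c statistic" is this standard Rand index (the function name `rand_index` and the example labels are illustrative):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:  # together in both, or apart in both
            agree += 1
    return agree / len(pairs)

truth = [0, 0, 0, 1, 1, 1]
found = [1, 1, 0, 0, 0, 0]
score = rand_index(truth, found)  # 10 of the 15 pairs agree
```

The value lies between 0 and 1 and does not depend on how the clusters are numbered, so it can compare a clustering against reference classes directly.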


A Study on the AI Model for Prediction of Demand for Cold Chain Distribution of Drugs (의약품 콜드체인 유통 수요 예측을 위한 AI 모델에 관한 연구)

  • Hee-young Kim;Gi-hwan Ryu;Jin Cai;Hyeon-kon Son
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.763-768 / 2023
  • In this paper, an existing statistical method (ARIMA) and a machine learning method (Informer) were developed and compared for predicting the distribution volume of pharmaceuticals. A machine learning-based model was found to be advantageous for daily data prediction, and it is effective to use ARIMA for monthly prediction and switch to Informer as the data increases. The prediction error (RMSE) was reduced by 26.6% compared to the previous method, and the prediction accuracy improved by 13%, reaching 86.2%. This study finds that the best results are obtained by ensembling statistical methods and machine learning methods. In addition, machine learning-based AI models can derive the best results through deep learning operations even in irregular situations, and after commercialization, performance is expected to improve as the amount of data increases.

Deep Learning-based Delinquent Taxpayer Prediction: A Scientific Administrative Approach

  • YongHyun Lee;Eunchan Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.1 / pp.30-45 / 2024
  • This study introduces an effective method for predicting individual local tax delinquencies using prevalent machine learning and deep learning algorithms. The evaluation of credit risk holds great significance in the financial realm, impacting both companies and individuals. While credit risk prediction has been explored using statistical and machine learning techniques, their application to tax arrears prediction remains underexplored. We forecast individual local tax defaults in the Republic of Korea using machine and deep learning algorithms, including convolutional neural networks (CNN), long short-term memory (LSTM), and sequence-to-sequence (seq2seq). Our model incorporates diverse credit and public information like loan history, delinquency records, credit card usage, and public taxation data, offering richer insights than prior studies. The results highlight the superior predictive accuracy of the CNN model. Anticipating local tax arrears more effectively could lead to efficient allocation of administrative resources. By leveraging advanced machine learning, this research offers a promising avenue for refining tax collection strategies and resource management.

The Effect of Cooperative Learning Method in Home Economics on Students' Interest and Attitude about Subject Matter (가정과 수업의 협동학습이 학생의 교과에 대한 흥미와 태도에 미치는 영향)

  • 양정혜;신상옥
    • Journal of Korean Home Economics Education Association / v.10 no.1 / pp.137-151 / 1998
  • The purpose of this study is (1) to develop a teaching plan based on the Cooperative Learning approach and (2) to investigate its effect on students' interest in the subject matter and teaching method, and on their attitudes to others, in the Foreign food area of Home Economics class. Among the various Cooperative Learning models, this study adopted 'Learning Together', developed by the Johnsons. For these purposes, the subject matter was analyzed and reconstructed for Cooperative Learning. Tests were developed to evaluate students' interest in the subject matter and teaching methods and their attitude to others. 108 female high school students were divided into two groups of 54 students, a traditional learning condition and a Cooperative Learning condition, and attended 5 sessions. The subject of the class was Foreign food, including Western, Chinese, and Japanese food. Students were tested before and after the class. The statistical method used for the study was the t-test. The research findings are as follows: when the students in the Cooperative Learning classes were compared before and after the class, (1) interest in the subject matter improved considerably (p<.001), (2) interest in teaching methods improved considerably (p<.05), and (3) attitude to others improved considerably (p<.001). Therefore, when the teaching-learning model based on Cooperative Learning was used in Home Economics class, students' interest in the subject and teaching methods and their attitude to others improved.


A Study on RN Students′ Education Satisfaction Toward RN-to-BSN Programs (간호학사 편입학과정(RN-BSN)생들의 특성 및 교육만족도 조사)

  • 김현실;이옥자
    • Journal of Korean Academy of Nursing / v.29 no.4 / pp.963-976 / 1999
  • This study was undertaken to investigate the general characteristics of RN students, including their degree of satisfaction, motives for admission, recognition of advantages and disadvantages, opinions on self-directed learning, and plans and anticipated effects after graduation. Data were collected through a questionnaire survey over a period of four months, from May 1997 to August 1997. The subjects were 322 RN students sampled from six RN-to-BSN programs in Korea using the census sampling method. The statistical methods employed included descriptive statistics, MANOVA, and the F-test. The results of the study are as follows. 1. The RN students' motives for admission to RN-to-BSN programs were ‘for personal advancement’, ‘to earn a BSN degree’, and ‘for professional development’, in this order. 2. The advantages of RN-to-BSN programs reported by the RN students were ‘acquisition of new knowledge and a BSN degree’ and ‘to gain professional thinking and a broader view’, while the disadvantages were ‘geographical isolation of institutions’, ‘limitation of information’, and ‘underdeveloped school environments’, in this order. 3. The survey of opinions on self-directed learning showed a need for detailed guidelines for self-directed learning. Most agreed that it was a very effective learning method for an RN student and that the self-directed learning method increases motivation for learning. 4. The students' anticipated effects after graduation were ‘self-achievement’, ‘development of professional skills’, and ‘admission to post-graduate school or programs to study abroad’. 5. The students were very satisfied with the quality of faculty members and satisfied with the quality of lectures and teaching. However, students were unsatisfied with rented lecture rooms and very unsatisfied with self-directed learning methods. 6. School nurses showed a statistically significantly higher need for teaching material and anticipated effect after graduation than RN students working in hospitals and public health agencies. Also, school nurses, public health nurses, and industry nurses showed statistically significantly higher motives for admission than RN students working in hospitals. Furthermore, staff nurses, school nurses, and industry nurses showed higher levels of satisfaction with RN-to-BSN programs than nurses in higher positions, such as administrators or directors of nursing. 7. City residents were more satisfied with RN-to-BSN programs than rural residents. On the other hand, rural residents had higher motives for admission, a greater need for teaching materials, and greater recognition of the disadvantages of RN-to-BSN programs than city residents. Finally, RN students who earned a monthly income below ₩1,000,000 showed higher motivation for admission than those who earned more than ₩1,000,000.


A New Similarity Measure Based on Intraclass Statistics for Biometric Systems

  • Lee, Kwan-Yong;Park, Hye-Young
    • ETRI Journal / v.25 no.5 / pp.401-406 / 2003
  • A biometric system determines the identity of a person by measuring physical features that can distinguish that person from others. Since biometric features have many variations and can be easily corrupted by noise and deformation, it is necessary to apply machine learning techniques to treat the data. When applying conventional machine learning methods in designing a specific biometric system, however, one first runs into the difficulty of collecting sufficient data for each person to be registered in the system. In addition, there can be an almost infinite number of variations of non-registered data. Therefore, it is difficult to analyze and predict the distributional properties of real data that the system must deal with in practical applications. These difficulties require a new framework of identification and verification that is appropriate and efficient for the specific situations of biometric systems. As a preliminary solution, this paper proposes a simple but theoretically well-defined method based on a statistical test theory. Our computational experiments on real-world data show that the proposed method has potential for coping with the actual difficulties in biometrics.
