• Title/Summary/Keyword: Ensemble models

Search Result 365, Processing Time 0.024 seconds

AutoFe-Sel: A Meta-learning based methodology for Recommending Feature Subset Selection Algorithms

  • Irfan Khan;Xianchao Zhang;Ramesh Kumar Ayyasam;Rahman Ali
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.7
    • /
    • pp.1773-1793
    • /
    • 2023
  • Automated machine learning, often referred to as "AutoML," is the process of automating the time-consuming and iterative procedures that are associated with the building of machine learning models. There have been significant contributions in this area across a number of different stages of accomplishing a data-mining task, including model selection, hyper-parameter optimization, and preprocessing method selection. Among them, preprocessing method selection is a relatively new and fast growing research area. The current work is focused on the recommendation of preprocessing methods, i.e., feature subset selection (FSS) algorithms. One limitation in the existing studies regarding FSS algorithm recommendation is the use of a single learner for meta-modeling, which restricts its capabilities in the metamodeling. Moreover, the meta-modeling in the existing studies is typically based on a single group of data characterization measures (DCMs). Nonetheless, there are a number of complementary DCM groups, and their combination will allow them to leverage their diversity, resulting in improved meta-modeling. This study aims to address these limitations by proposing an architecture for preprocess method selection that uses ensemble learning for meta-modeling, namely AutoFE-Sel. To evaluate the proposed method, we performed an extensive experimental evaluation involving 8 FSS algorithms, 3 groups of DCMs, and 125 datasets. Results show that the proposed method achieves better performance compared to three baseline methods. The proposed architecture can also be easily extended to other preprocessing method selections, e.g., noise-filter selection and imbalance handling method selection.

A Study on the AI Model for Prediction of Demand for Cold Chain Distribution of Drugs (의약품 콜드체인 유통 수요 예측을 위한 AI 모델에 관한 연구)

  • Hee-young Kim;Gi-hwan Ryu;Jin Cai ;Hyeon-kon Son
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.3
    • /
    • pp.763-768
    • /
    • 2023
  • In this paper, the existing statistical method (ARIMA) and machine learning method (Informer) were developed and compared to predict the distribution volume of pharmaceuticals. It was found that a machine learning-based model is advantageous for daily data prediction, and it is effective to use ARIMA for monthly prediction and switch to Informer as the data increases. The prediction error rate (RMSE) was reduced by 26.6% compared to the previous method, and the prediction accuracy was improved by 13%, resulting in a result of 86.2%. Through this thesis, we find that there is an advantage of obtaining the best results by ensembleing statistical methods and machine learning methods. In addition, machine learning-based AI models can derive the best results through deep learning operations even in irregular situations, and after commercialization, performance is expected to improve as the amount of data increases.

Prediction of Ship Travel Time in Harbour using 1D-Convolutional Neural Network (1D-CNN을 이용한 항만내 선박 이동시간 예측)

  • Sang-Lok Yoo;Kwang-Il Ki;Cho-Young Jung
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2022.06a
    • /
    • pp.275-276
    • /
    • 2022
  • VTS operators instruct ships to wait for entry and departure to sail in one-way to prevent ship collision accidents in ports with narrow routes. Currently, the instructions are not based on scientific and statistical data. As a result, there is a significant deviation depending on the individual capability of the VTS operators. Accordingly, this study built a 1d-convolutional neural network model by collecting ship and weather data to predict the exact travel time for ship entry/departure waiting for instructions in the port. It was confirmed that the proposed model was improved by more than 4.5% compared to other ensemble machine learning models. Through this study, it is possible to predict the time required to enter and depart a vessel in various situations, so it is expected that the VTS operators will help provide accurate information to the vessel and determine the waiting order.

  • PDF

Predictive Models for the Tourism and Accommodation Industry in the Era of Smart Tourism: Focusing on the COVID-19 Pandemic (스마트관광 시대의 관광숙박업 영업 예측 모형: 코로나19 팬더믹을 중심으로)

  • Yu Jin Jo;Cha Mi Kim;Seung Yeon Son;Mi Jin Noh
    • Smart Media Journal
    • /
    • v.12 no.8
    • /
    • pp.18-25
    • /
    • 2023
  • The COVID-19 outbreak in 2020 caused continuous damage worldwode, especially the smart tourism industry was hit directly by the blockade of sky roads and restriction of going out. At a time when overseas travel and domestic travel have decreased significantly, the number of tourist hotels that are colsed and closed due to the continued deficit is increasing. Therefore, in this study, licensing data from the Ministry of Public Administraion and Security were collected and visualized to understand the operation status of the tourism and lodging industry. The machine learning classification algorithm was applied to implement the business status prediction model of the tourist hotel, the performance of the prediction model was optimized using the ensemble algorithm, and the performance of the model was evaluated through 5-Fold cross-validation. It was predicted that the survival rate of tourist hotels would decrease somewhat, but the actual survival rate was analyzed to be no different from before COVID-19. Through the prediction of the business status of the hotel industry in this paper, it can be used as a basis for grasping the operability and development trends of the entire tourism and lodging industry.

Development of AI-based Smart Agriculture Early Warning System

  • Hyun Sim;Hyunwook Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.12
    • /
    • pp.67-77
    • /
    • 2023
  • This study represents an innovative research conducted in the smart farm environment, developing a deep learning-based disease and pest detection model and applying it to the Intelligent Internet of Things (IoT) platform to explore new possibilities in the implementation of digital agricultural environments. The core of the research was the integration of the latest ImageNet models such as Pseudo-Labeling, RegNet, EfficientNet, and preprocessing methods to detect various diseases and pests in complex agricultural environments with high accuracy. To this end, ensemble learning techniques were applied to maximize the accuracy and stability of the model, and the model was evaluated using various performance indicators such as mean Average Precision (mAP), precision, recall, accuracy, and box loss. Additionally, the SHAP framework was utilized to gain a deeper understanding of the model's prediction criteria, making the decision-making process more transparent. This analysis provided significant insights into how the model considers various variables to detect diseases and pests.

Students' Performance Prediction in Higher Education Using Multi-Agent Framework Based Distributed Data Mining Approach: A Review

  • M.Nazir;A.Noraziah;M.Rahmah
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.10
    • /
    • pp.135-146
    • /
    • 2023
  • An effective educational program warrants the inclusion of an innovative construction which enhances the higher education efficacy in such a way that accelerates the achievement of desired results and reduces the risk of failures. Educational Decision Support System (EDSS) has currently been a hot topic in educational systems, facilitating the pupil result monitoring and evaluation to be performed during their development. Insufficient information systems encounter trouble and hurdles in making the sufficient advantage from EDSS owing to the deficit of accuracy, incorrect analysis study of the characteristic, and inadequate database. DMTs (Data Mining Techniques) provide helpful tools in finding the models or forms of data and are extremely useful in the decision-making process. Several researchers have participated in the research involving distributed data mining with multi-agent technology. The rapid growth of network technology and IT use has led to the widespread use of distributed databases. This article explains the available data mining technology and the distributed data mining system framework. Distributed Data Mining approach is utilized for this work so that a classifier capable of predicting the success of students in the economic domain can be constructed. This research also discusses the Intelligent Knowledge Base Distributed Data Mining framework to assess the performance of the students through a mid-term exam and final-term exam employing Multi-agent system-based educational mining techniques. Using single and ensemble-based classifiers, this study intends to investigate the factors that influence student performance in higher education and construct a classification model that can predict academic achievement. We also discussed the importance of multi-agent systems and comparative machine learning approaches in EDSS development.

A Bi-directional Information Learning Method Using Reverse Playback Video for Fully Supervised Temporal Action Localization (완전지도 시간적 행동 검출에서 역재생 비디오를 이용한 양방향 정보 학습 방법)

  • Huiwon Gwon;Hyejeong Jo;Sunhee Jo;Chanho Jung
    • Journal of IKEEE
    • /
    • v.28 no.2
    • /
    • pp.145-149
    • /
    • 2024
  • Recently, research on temporal action localization has been actively conducted. In this paper, unlike existing methods, we propose two approaches for learning bidirectional information by creating reverse playback videos for fully supervised temporal action localization. One approach involves creating training data by combining reverse playback videos and forward playback videos, while the other approach involves training separate models on videos with different playback directions. Experiments were conducted on the THUMOS-14 dataset using TALLFormer. When using both reverse and forward playback videos as training data, the performance was 5.1% lower than that of the existing method. On the other hand, using a model ensemble shows a 1.9% improvement in performance.

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung;Joo, Jihwan;Han, Ingoo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.83-102
    • /
    • 2021
  • The government recently announced various policies for developing big-data and artificial intelligence fields to provide a great opportunity to the public with respect to disclosure of high-quality data within public institutions. KSURE(Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea, and thus the company is strongly committed to backing export companies with various systems. Nevertheless, there are still fewer cases of realized business model based on big-data analyses. In this situation, this paper aims to develop a new business model which can be applied to an ex-ante prediction for the likelihood of the insurance accident of credit guarantee. We utilize internal data from KSURE which supports export companies in Korea and apply machine learning models. Then, we conduct performance comparison among the predictive models including Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN(Deep Neural Network). For decades, many researchers have tried to find better models which can help to predict bankruptcy since the ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. The development of the prediction for financial distress or bankruptcy was originated from Smith(1930), Fitzpatrick(1932), or Merwin(1942). One of the most famous models is the Altman's Z-score model(Altman, 1968) which was based on the multiple discriminant analysis. This model is widely used in both research and practice by this time. The author suggests the score model that utilizes five key financial ratios to predict the probability of bankruptcy in the next two years. Ohlson(1980) introduces logit model to complement some limitations of previous models. Furthermore, Elmer and Borowski(1988) develop and examine a rule-based, automated system which conducts the financial analysis of savings and loans. Since the 1980s, researchers in Korea have started to examine analyses on the prediction of financial distress or bankruptcy. Kim(1987) analyzes financial ratios and develops the prediction model. Also, Han et al.(1995, 1996, 1997, 2003, 2005, 2006) construct the prediction model using various techniques including artificial neural network. Yang(1996) introduces multiple discriminant analysis and logit model. Besides, Kim and Kim(2001) utilize artificial neural network techniques for ex-ante prediction of insolvent enterprises. After that, many scholars have been trying to predict financial distress or bankruptcy more precisely based on diverse models such as Random Forest or SVM. One major distinction of our research from the previous research is that we focus on examining the predicted probability of default for each sample case, not only on investigating the classification accuracy of each model for the entire sample. Most predictive models in this paper show that the level of the accuracy of classification is about 70% based on the entire sample. To be specific, LightGBM model shows the highest accuracy of 71.1% and Logit model indicates the lowest accuracy of 69%. However, we confirm that there are open to multiple interpretations. In the context of the business, we have to put more emphasis on efforts to minimize type 2 error which causes more harmful operating losses for the guaranty company. Thus, we also compare the classification accuracy by splitting predicted probability of the default into ten equal intervals. When we examine the classification accuracy for each interval, Logit model has the highest accuracy of 100% for 0~10% of the predicted probability of the default, however, Logit model has a relatively lower accuracy of 61.5% for 90~100% of the predicted probability of the default. On the other hand, Random Forest, XGBoost, LightGBM, and DNN indicate more desirable results since they indicate a higher level of accuracy for both 0~10% and 90~100% of the predicted probability of the default but have a lower level of accuracy around 50% of the predicted probability of the default. When it comes to the distribution of samples for each predicted probability of the default, both LightGBM and XGBoost models have a relatively large number of samples for both 0~10% and 90~100% of the predicted probability of the default. Although Random Forest model has an advantage with regard to the perspective of classification accuracy with small number of cases, LightGBM or XGBoost could become a more desirable model since they classify large number of cases into the two extreme intervals of the predicted probability of the default, even allowing for their relatively low classification accuracy. Considering the importance of type 2 error and total prediction accuracy, XGBoost and DNN show superior performance. Next, Random Forest and LightGBM show good results, but logistic regression shows the worst performance. However, each predictive model has a comparative advantage in terms of various evaluation standards. For instance, Random Forest model shows almost 100% accuracy for samples which are expected to have a high level of the probability of default. Collectively, we can construct more comprehensive ensemble models which contain multiple classification machine learning models and conduct majority voting for maximizing its overall performance.

Analysis of Future Bioclimatic Zones Using Multi-climate Models (다중기후모형을 활용한 동북아시아의 미래 생물기후권역 변화분석)

  • Choi, Yuyoung;Lim, Chul-Hee;Ryu, Jieun;Jeon, Seongwoo
    • Journal of Environmental Impact Assessment
    • /
    • v.27 no.5
    • /
    • pp.489-508
    • /
    • 2018
  • As climate changes, it is necessary to predict changes in the habitat environment in order to establish more aggressive adaptation strategies. The bioclimatic classification which clusters of areas with similar habitats can provide a useful ecosystem management framework. Therefore, in this study, biological habitat environment of Northeast Asia was identified through the establishment of the bioclimatic zones, and the impac of climate change on the biological habitat was analyzed. An ISODATA clustering was used to classify Northeast Asia (NEA)into 15 bioclimatic zones, and climate change impacts were predicted by projecting the future spatial distribution of bioclimatic zones based upon an ensemble of 17 GCMs across RCP4.5 and 8.5 scenarios for 2050s, and 2070s. Results demonstrated that significant changes in bioclimatic conditions can be expected throughout the NEA by 2050s and 2070s. The overall zones moved upward, and some zones were predicted to be greatly expanded or shrunk where we suggested as regions requiring intensive management. This analysis provides the basis for understanding potential impacts of climate change on biodiversity and ecosystem. Also, this could be used more effectively to support decision making on climate change adaptation.

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata
    • /
    • v.5 no.2
    • /
    • pp.53-67
    • /
    • 2020
  • This study employees a supervised learning prediction model to detect nonconformity in advance of processed food manufacturing and processing businesses. The study was conducted according to the standard procedure of machine learning, such as definition of objective function, data preprocessing and feature engineering and model selection and evaluation. The dependent variable was set as the number of supervised inspection detections over the past five years from 2014 to 2018, and the objective function was to maximize the probability of detecting the nonconforming companies. The data was preprocessed by reflecting not only basic attributes such as revenues, operating duration, number of employees, but also the inspections track records and extraneous climate data. After applying the feature variable extraction method, the machine learning algorithm was applied to the data by deriving the company's risk, item risk, environmental risk, and past violation history as feature variables that affect the determination of nonconformity. The f1-score of the decision tree, one of ensemble models, was much higher than those of other models. Based on the results of this study, it is expected that the official food control for food safety management will be enhanced and geared into the data-evidence based management as well as scientific administrative system.