• Title/Summary/Keyword: ensemble methods

Search Result 286, Processing Time 0.023 seconds

Hot Keyword Extraction of Sci-tech Periodicals Based on the Improved BERT Model

  • Liu, Bing;Lv, Zhijun;Zhu, Nan;Chang, Dongyu;Lu, Mengxin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1800-1817
    • /
    • 2022
  • With the development of the economy and the improvement of living standards, the hot issues in the subject area have become the main research direction, and the mining of the hot issues in the subject currently has problems such as a large amount of data and a complex algorithm structure. Therefore, in response to this problem, this study proposes a method for extracting hot keywords in scientific journals based on the improved BERT model.It can also provide reference for researchers,and the research method improves the overall similarity measure of the ensemble,introducing compound keyword word density, combining word segmentation, word sense set distance, and density clustering to construct an improved BERT framework, establish a composite keyword heat analysis model based on I-BERT framework.Taking the 14420 articles published in 21 kinds of social science management periodicals collected by CNKI(China National Knowledge Infrastructure) in 2017-2019 as the experimental data, the superiority of the proposed method is verified by the data of word spacing, class spacing, extraction accuracy and recall of hot keywords. In the experimental process of this research, it can be found that the method proposed in this paper has a higher accuracy than other methods in extracting hot keywords, which can ensure the timeliness and accuracy of scientific journals in capturing hot topics in the discipline, and finally pass Use information technology to master popular key words.

Research on Data Tuning Methods to Improve the Anomaly Detection Performance of Industrial Control Systems (산업제어시스템의 이상 탐지 성능 개선을 위한 데이터 보정 방안 연구)

  • JUN, SANGSO;Lee, Kyung-ho
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.32 no.4
    • /
    • pp.691-708
    • /
    • 2022
  • As the technology of machine learning and deep learning became common, it began to be applied to research on anomaly(abnormal) detection of industrial control systems. In Korea, the HAI dataset was developed and published to activate artificial intelligence research for abnormal detection of industrial control systems, and an AI contest for detecting industrial control system security threats is being conducted. Most of the anomaly detection studies have been to create a learning model with improved performance through the ensemble model method, which is applied either by modifying the existing deep learning algorithm or by applying it together with other algorithms. In this study, a study was conducted to improve the performance of anomaly detection with a post-processing method that detects abnormal data and corrects the labeling results, rather than the learning algorithm and data pre-processing process. Results It was confirmed that the results were improved by about 10% or more compared to the anomaly detection performance of the existing model.

Prediction Model of CNC Processing Defects Using Machine Learning (머신러닝을 이용한 CNC 가공 불량 발생 예측 모델)

  • Han, Yong Hee
    • Journal of the Korea Convergence Society
    • /
    • v.13 no.2
    • /
    • pp.249-255
    • /
    • 2022
  • This study proposed an analysis framework for real-time prediction of CNC processing defects using machine learning-based models that are recently attracting attention as processing defect prediction methods, and applied it to CNC machines. Analysis shows that the XGBoost, CatBoost, and LightGBM models have the same best accuracy, precision, recall, F1 score, and AUC, of which the LightGBM model took the shortest execution time. This short run time has practical advantages such as reducing actual system deployment costs, reducing the probability of CNC machine damage due to rapid prediction of defects, and increasing overall CNC machine utilization, confirming that the LightGBM model is the most effective machine learning model for CNC machines with only basic sensors installed. In addition, it was confirmed that classification performance was maximized when an ensemble model consisting of LightGBM, ExtraTrees, k-Nearest Neighbors, and logistic regression models was applied in situations where there are no restrictions on execution time and computing power.

Enhancing Medium-Range Forecast Accuracy of Temperature and Relative Humidity over South Korea using Minimum Continuous Ranked Probability Score (CRPS) Statistical Correction Technique (연속 순위 확률 점수를 활용한 통합 앙상블 모델에 대한 기온 및 습도 후처리 모델 개발)

  • Hyejeong Bok;Junsu Kim;Yeon-Hee Kim;Eunju Cho;Seungbum Kim
    • Atmosphere
    • /
    • v.34 no.1
    • /
    • pp.23-34
    • /
    • 2024
  • The Korea Meteorological Administration has improved medium-range weather forecasts by implementing post-processing methods to minimize numerical model errors. In this study, we employ a statistical correction technique known as the minimum continuous ranked probability score (CRPS) to refine medium-range forecast guidance. This technique quantifies the similarity between the predicted values and the observed cumulative distribution function of the Unified Model Ensemble Prediction System for Global (UM EPSG). We evaluated the performance of the medium-range forecast guidance for surface air temperature and relative humidity, noting significant enhancements in seasonal bias and root mean squared error compared to observations. Notably, compared to the existing the medium-range forecast guidance, temperature forecasts exhibit 17.5% improvement in summer and 21.5% improvement in winter. Humidity forecasts also show 12% improvement in summer and 23% improvement in winter. The results indicate that utilizing the minimum CRPS for medium-range forecast guidance provide more reliable and improved performance than UM EPSG.

Development of AI-based Smart Agriculture Early Warning System

  • Hyun Sim;Hyunwook Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.12
    • /
    • pp.67-77
    • /
    • 2023
  • This study represents an innovative research conducted in the smart farm environment, developing a deep learning-based disease and pest detection model and applying it to the Intelligent Internet of Things (IoT) platform to explore new possibilities in the implementation of digital agricultural environments. The core of the research was the integration of the latest ImageNet models such as Pseudo-Labeling, RegNet, EfficientNet, and preprocessing methods to detect various diseases and pests in complex agricultural environments with high accuracy. To this end, ensemble learning techniques were applied to maximize the accuracy and stability of the model, and the model was evaluated using various performance indicators such as mean Average Precision (mAP), precision, recall, accuracy, and box loss. Additionally, the SHAP framework was utilized to gain a deeper understanding of the model's prediction criteria, making the decision-making process more transparent. This analysis provided significant insights into how the model considers various variables to detect diseases and pests.

Development of Type 2 Prediction Prediction Based on Big Data (빅데이터 기반 2형 당뇨 예측 알고리즘 개발)

  • Hyun Sim;HyunWook Kim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.18 no.5
    • /
    • pp.999-1008
    • /
    • 2023
  • Early prediction of chronic diseases such as diabetes is an important issue, and improving the accuracy of diabetes prediction is especially important. Various machine learning and deep learning-based methodologies are being introduced for diabetes prediction, but these technologies require large amounts of data for better performance than other methodologies, and the learning cost is high due to complex data models. In this study, we aim to verify the claim that DNN using the pima dataset and k-fold cross-validation reduces the efficiency of diabetes diagnosis models. Machine learning classification methods such as decision trees, SVM, random forests, logistic regression, KNN, and various ensemble techniques were used to determine which algorithm produces the best prediction results. After training and testing all classification models, the proposed system provided the best results on XGBoost classifier with ADASYN method, with accuracy of 81%, F1 coefficient of 0.81, and AUC of 0.84. Additionally, a domain adaptation method was implemented to demonstrate the versatility of the proposed system. An explainable AI approach using the LIME and SHAP frameworks was implemented to understand how the model predicts the final outcome.

A Bi-directional Information Learning Method Using Reverse Playback Video for Fully Supervised Temporal Action Localization (완전지도 시간적 행동 검출에서 역재생 비디오를 이용한 양방향 정보 학습 방법)

  • Huiwon Gwon;Hyejeong Jo;Sunhee Jo;Chanho Jung
    • Journal of IKEEE
    • /
    • v.28 no.2
    • /
    • pp.145-149
    • /
    • 2024
  • Recently, research on temporal action localization has been actively conducted. In this paper, unlike existing methods, we propose two approaches for learning bidirectional information by creating reverse playback videos for fully supervised temporal action localization. One approach involves creating training data by combining reverse playback videos and forward playback videos, while the other approach involves training separate models on videos with different playback directions. Experiments were conducted on the THUMOS-14 dataset using TALLFormer. When using both reverse and forward playback videos as training data, the performance was 5.1% lower than that of the existing method. On the other hand, using a model ensemble shows a 1.9% improvement in performance.

Risk Factor Analysis of Cryopreserved Autologous Bone Flap Resorption in Adult Patients Undergoing Cranioplasty with Volumetry Measurement Using Conventional Statistics and Machine-Learning Technique

  • Yohan Son;Jaewoo Chung
    • Journal of Korean Neurosurgical Society
    • /
    • v.67 no.1
    • /
    • pp.103-114
    • /
    • 2024
  • Objective : Decompressive craniectomy (DC) with duroplasty is one of the common surgical treatments for life-threatening increased intracranial pressure (ICP). Once ICP is controlled, cranioplasty (CP) with reinsertion of the cryopreserved autologous bone flap or a synthetic implant is considered for protection and esthetics. Although with the risk of autologous bone flap resorption (BFR), cryopreserved autologous bone flap for CP is one of the important material due to its cost effectiveness. In this article, we performed conventional statistical analysis and the machine learning technique understand the risk factors for BFR. Methods : Patients aged >18 years who underwent autologous bone CP between January 2015 and December 2021 were reviewed. Demographic data, medical records, and volumetric measurements of the autologous bone flap volume from 94 patients were collected. BFR was defined with absolute quantitative method (BFR-A) and relative quantitative method (BFR%). Conventional statistical analysis and random forest with hyper-ensemble approach (RF with HEA) was performed. And overlapped partial dependence plots (PDP) were generated. Results : Conventional statistical analysis showed that only the initial autologous bone flap volume was statistically significant on BFR-A. RF with HEA showed that the initial autologous bone flap volume, interval between DC and CP, and bone quality were the factors with most contribution to BFR-A, while, trauma, bone quality, and initial autologous bone flap volume were the factors with most contribution to BFR%. Overlapped PDPs of the initial autologous bone flap volume on the BRF-A crossed at approximately 60 mL, and a relatively clear separation was found between the non-BFR and BFR groups. Therefore, the initial autologous bone flap of over 60 mL could be a possible risk factor for BFR. Conclusion : From the present study, BFR in patients who underwent CP with autologous bone flap might be inevitable. However, the degree of BFR may differ from one to another. Therefore, considering artificial bone flaps as implants for patients with large DC could be reasonable. Still, the risk factors for BFR are not clearly understood. Therefore, chronological analysis and pathophysiologic studies are needed.

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE
    • /
    • v.43 no.10
    • /
    • pp.1085-1093
    • /
    • 2016
  • Community-based Question Answering system is a system which provides answers for each question from the documents uploaded on web communities. In order to enhance the capacity of question analysis, former methods have developed specific rules suitable for a target region or have applied machine learning to partial processes. However, these methods incur an excessive cost for expanding fields or lead to cases in which system is overfitted for a specific field. This paper proposes a multiple machine learning method which automates the overall process by adapting appropriate machine learning in each procedure for efficient processing of community-based Question Answering system. This system can be divided into question analysis part and answer selection part. The question analysis part consists of the question focus extractor, which analyzes the focused phrases in questions and uses conditional random fields, and the question type classifier, which classifies topics of questions and uses support vector machine. In the answer selection part, the we trains weights that are used by the similarity estimation models through an artificial neural network. Also these are a number of cases in which the results of morphological analysis are not reliable for the data uploaded on web communities. Therefore, we suggest a method that minimizes the impact of morphological analysis by using character features in the stage of question analysis. The proposed system outperforms the former system by showing a Mean Average Precision criteria of 0.765 and R-Precision criteria of 0.872.

Very Short- and Long-Term Prediction Method for Solar Power (초 장단기 통합 태양광 발전량 예측 기법)

  • Mun Seop Yun;Se Ryung Lim;Han Seung Jang
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.18 no.6
    • /
    • pp.1143-1150
    • /
    • 2023
  • The global climate crisis and the implementation of low-carbon policies have led to a growing interest in renewable energy and a growing number of related industries. Among them, solar power is attracting attention as a representative eco-friendly energy that does not deplete and does not emit pollutants or greenhouse gases. As a result, the supplement of solar power facility is increasing all over the world. However, solar power is easily affected by the environment such as geography and weather, so accurate solar power forecast is important for stable operation and efficient management. However, it is very hard to predict the exact amount of solar power using statistical methods. In addition, the conventional prediction methods have focused on only short- or long-term prediction, which causes to take long time to obtain various prediction models with different prediction horizons. Therefore, this study utilizes a many-to-many structure of a recurrent neural network (RNN) to integrate short-term and long-term predictions of solar power generation. We compare various RNN-based very short- and long-term prediction methods for solar power in terms of MSE and R2 values.