• Title/Summary/Keyword: Ensemble model


Artificial Intelligence Algorithms, Model-Based Social Data Collection and Content Exploration (소셜데이터 분석 및 인공지능 알고리즘 기반 범죄 수사 기법 연구)

  • An, Dong-Uk;Leem, Choon Seong
    • The Journal of Bigdata / v.4 no.2 / pp.23-34 / 2019
  • Crimes that exploit digital platforms are increasing steadily: about 140,000 cases occurred in 2015 and about 150,000 in 2016. Conventional investigation techniques are therefore reaching their limits in handling online crime. The manual online searches and cognitive investigation methods that investigators widely use today are not enough to respond proactively to rapidly changing crimes that affect the public. In addition, the fact that content is posted to unspecified social media users makes investigation more difficult. Considering the characteristics of the online media where infringement crimes occur, this study proposes site-based collection and Open API collection from among the available web content collection methods. Because illegal content is published and deleted quickly, and new words and variants emerge rapidly and in many forms, dictionary-based morphological analysis that relies on manually registered entries has difficulty recognizing them in time. To solve this problem, we propose adding tokenization based on WPM (Word Piece Model) to the existing dictionary-based morphological analysis, as a data preprocessing method for quickly recognizing and responding to illegal content posted in online infringement crimes. In the data analysis, supervised classification models are applied to the investigation of illegal content, and the optimal precision is verified through a vote-based ensemble method. The study applies a classification model centered on illegal multilevel business cases to proactively recognize crimes that harm the public economy, and presents an empirical study on dealing effectively with social data collection and content investigation.

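The vote-based ensemble mentioned in the abstract above can be illustrated with a minimal hard-voting sketch. The abstract does not say which classifiers were combined or how ties were handled, so the function name `majority_vote`, the labels, and the tie-breaking rule here are all assumptions made for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equal-length label lists, one list per classifier.
    Ties go to the label voted for by the earliest classifier in the list
    (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    combined = []
    for votes in zip(*predictions):
        # most_common orders equal counts by first occurrence
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three classifiers on three posts
clf_a = ["illegal", "legal", "legal"]
clf_b = ["illegal", "illegal", "legal"]
clf_c = ["legal", "illegal", "legal"]
result = majority_vote([clf_a, clf_b, clf_c])
```

With these toy inputs, the first two posts are labeled "illegal" by two of the three classifiers and the third is labeled "legal" by all of them.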

Corporate Bankruptcy Prediction Model using Explainable AI-based Feature Selection (설명가능 AI 기반의 변수선정을 이용한 기업부실예측모형)

  • Gundoo Moon;Kyoung-jae Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.241-265 / 2023
  • A corporate insolvency prediction model serves as a vital tool for objectively monitoring the financial condition of companies. It enables timely warnings, facilitates responsive actions, and supports the formulation of effective management strategies to mitigate bankruptcy risks and enhance performance. Investors and financial institutions utilize default prediction models to minimize financial losses. As the interest in utilizing artificial intelligence (AI) technology for corporate insolvency prediction grows, extensive research has been conducted in this domain. However, there is an increasing demand for explainable AI models in corporate insolvency prediction, emphasizing interpretability and reliability. The SHAP (SHapley Additive exPlanations) technique has gained significant popularity and has demonstrated strong performance in various applications. Nonetheless, it has limitations such as computational cost, processing time, and scalability concerns based on the number of variables. This study introduces a novel approach to variable selection that reduces the number of variables by averaging SHAP values from bootstrapped data subsets instead of using the entire dataset. This technique aims to improve computational efficiency while maintaining excellent predictive performance. To obtain classification results, we aim to train random forest, XGBoost, and C5.0 models using carefully selected variables with high interpretability. The classification accuracy of the ensemble model, generated through soft voting as the goal of high-performance model design, is compared with the individual models. The study leverages data from 1,698 Korean light industrial companies and employs bootstrapping to create distinct data groups. Logistic Regression is employed to calculate SHAP values for each data group, and their averages are computed to derive the final SHAP values. The proposed model enhances interpretability and aims to achieve superior predictive performance.
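The variable-selection idea in this abstract, averaging SHAP values over bootstrapped data groups and keeping the top-ranked variables, can be sketched roughly as follows. This is not the authors' implementation: `importance_fn` is a hypothetical callback standing in for whatever computes per-feature SHAP magnitudes on one data group, and the toy `linear_importance` uses fixed, unfitted weights together with the closed form for a linear model under an independent-features assumption, where the SHAP value of feature j is w_j (x_j - mean_j). The group size equal to the full data size is also an assumption.

```python
import random

def linear_importance(rows, weights):
    """Mean absolute attribution per feature for a linear model.

    Assumes independent features, so the attribution of feature j for a
    row x is weights[j] * (x[j] - mean of feature j over the rows).
    """
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [
        sum(abs(w * (row[j] - means[j])) for row in rows) / n
        for j, w in enumerate(weights)
    ]

def bootstrap_mean_importance(rows, importance_fn, n_groups=10, seed=0):
    """Average importance_fn over n_groups bootstrap resamples of rows."""
    rng = random.Random(seed)
    totals = None
    for _ in range(n_groups):
        # Resample with replacement, same size as the original data
        group = [rows[rng.randrange(len(rows))] for _ in rows]
        scores = importance_fn(group)
        if totals is None:
            totals = [0.0] * len(scores)
        totals = [t + s for t, s in zip(totals, scores)]
    return [t / n_groups for t in totals]

def top_k(scores, k):
    """Indices of the k features with the largest averaged importance."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Toy data: feature 1 is constant, so its attribution is always zero.
rows = [(0.0, 1.0, 5.0), (10.0, 1.0, 6.0), (20.0, 1.0, 5.0), (30.0, 1.0, 6.0)]
weights = (1.0, 1.0, 1.0)  # fixed illustrative weights, not fitted
scores = bootstrap_mean_importance(rows, lambda g: linear_importance(g, weights))
selected = top_k(scores, 2)
```

In the study, a logistic regression would be fitted on each data group before the SHAP values are computed; the fixed weights here only keep the sketch short.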

Development of decision support system for water resources management using GloSea5 long-term rainfall forecasts and K-DRUM rainfall-runoff model (GloSea5 장기예측 강수량과 K-DRUM 강우-유출모형을 활용한 물관리 의사결정지원시스템 개발)

  • Song, Junghyun;Cho, Younghyun;Kim, Ilseok;Yi, Jonghyuk
    • Journal of Satellite, Information and Communications / v.12 no.3 / pp.22-34 / 2017
  • K-DRUM (K-water hydrologic & hydraulic Distributed RUnoff Model), the distributed rainfall-runoff model of K-water, calculates the predicted runoff and water surface level of a dam from precipitation data. To provide long-term hydrometeorological information, K-DRUM requires long-term weather forecasts. In this study, we built a system that provides long-term hydrometeorological information using the predicted rainfall ensemble of GloSea5 (Global Seasonal Forecast System version 5), the seasonal meteorological forecasting system that KMA introduced in 2014. The system produces K-DRUM input data by automatically pre-processing and bias-correcting GloSea5 data, then derives long-term inflow predictions via K-DRUM. A web-based UI was developed so that users can monitor hydrometeorological information such as rainfall, runoff, and the water surface level of dams. Through this UI, users can also test various dam management scenarios for decision-making by adjusting the discharge amount.

Uncertainty of Hydro-meteorological Predictions Due to Climate Change in the Republic of Korea (기후변화에 따른 우리나라 수문 기상학적 예측의 불확실성)

  • Nkomozepi, Temba;Chung, Sang-Ok
    • Journal of Korea Water Resources Association / v.47 no.3 / pp.257-267 / 2014
  • The combined impact of temperature and rainfall changes due to climate change on surface water resources is an important topic in hydro-meteorological research. In this study, 4 hydro-meteorological (HM) models from the Rainfall Runoff Library in the Catchment Modeling Toolkit were used to model the impact of climate change on stream runoff for 5 river basins in the Republic of Korea. Future projections for 2021 to 2040 (2030s), 2051 to 2070 (2060s) and 2081 to 2099 (2090s) were derived from 12 General Circulation Models (GCMs) and 3 representative concentration pathways (RCPs). GCM outputs were statistically adjusted and downscaled using the Long-Ashton Research Station Weather Generator (LARS-WG), and the HM models were calibrated and verified for the period from 1999 to 2009. The study showed substantial spatial, temporal and HM model uncertainty in future runoff, as indicated by the interquartile range, range and coefficient of variation. In summary, the aggregated runoff will increase in the future by 10~24%, 7~30% and 11~30% of the baseline runoff under RCP2.6, RCP4.5 and RCP8.5, respectively. This study presents a method to model future stream-flow that takes into account both HM model uncertainty and climate-based uncertainty.
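The three spread measures named in this abstract (range, interquartile range, coefficient of variation) can be computed across an ensemble of projections as in the sketch below. The input values are made up for illustration, and the choices of the "inclusive" quartile method and the sample standard deviation are assumptions, since the abstract does not specify them.

```python
import statistics

def spread_summary(values):
    """Range, interquartile range and coefficient of variation of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        # Sample standard deviation divided by the mean
        "cv": statistics.stdev(values) / mean,
    }

# Hypothetical runoff changes (%) from five model combinations
projected_change = [10, 12, 14, 16, 18]
summary = spread_summary(projected_change)
```

A wider range or interquartile range across GCM, RCP and HM model combinations signals larger disagreement among projections; the coefficient of variation expresses that spread relative to the mean.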

A Study on Prediction of EPB shield TBM Advance Rate using Machine Learning Technique and TBM Construction Information (머신러닝 기법과 TBM 시공정보를 활용한 토압식 쉴드TBM 굴진율 예측 연구)

  • Kang, Tae-Ho;Choi, Soon-Wook;Lee, Chulho;Chang, Soo-Ho
    • Tunnel and Underground Space / v.30 no.6 / pp.540-550 / 2020
  • Machine learning has been actively used in the field of automation as AI technology has developed and become established. An important point in applying machine learning is that the appropriate algorithm depends on the characteristics of the data, so the dataset must be analyzed before machine learning techniques are applied. In this study, the advance rate is predicted using geotechnical and machine data from a TBM tunnel section passing through soil ground beneath a stream. The linear regression model raised no problems from a statistical standpoint, but its coefficient of determination was 0.76, whereas the ensemble model and the support vector machine showed prediction performance of 0.88 or higher. This indicates that, for the analyzed dataset, the support vector machine was the model suited to predicting the advance rate of the EPB shield TBM. As a result, a prediction model that uses data including machine data and ground information is judged to be highly suitable. Further research is needed to increase the diversity of ground conditions and the amount of data.
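For reference, the coefficient of determination quoted for the regression model can be computed from observed and predicted values as sketched here. The abstract does not state which exact definition the authors used; this sketch assumes the common form 1 - SS_res / SS_tot, and the sample values are invented.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical advance rates (mm/min): observed vs. model output
observed = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(observed, observed)
mean_only = r_squared(observed, [2.5, 2.5, 2.5, 2.5])
```

A model that reproduces the observations exactly scores 1, and a model that always predicts the observed mean scores 0, which gives a scale for reading values such as 0.76 and 0.88.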

A Study on the Predictability of the Number of Days of Heat and Cold Damages by Growth Stages of Rice Using PNU CGCM-WRF Chain in South Korea (PNU CGCM-WRF Chain을 이용한 남한지역 벼의 생육단계별 고온해 및 저온해 발생일수에 대한 예측성 연구)

  • Kim, Young-Hyun;Choi, Myeong-Ju;Shim, Kyo-Moon;Hur, Jina;Jo, Sera;Ahn, Joong-Bae
    • Atmosphere / v.31 no.5 / pp.577-592 / 2021
  • This study evaluates the predictability of the number of days of heat and cold damage by growth stage of rice in South Korea, using hindcast data (1986~2020) produced by the Pusan National University Coupled General Circulation Model-Weather Research and Forecasting (PNU CGCM-WRF) model chain. Predictability is assessed in terms of Root Mean Square Error (RMSE), Normalized Standardized Deviations (NSD), Hit Rate (HR) and Heidke Skill Score (HSS). For this purpose, the model's ability to reproduce the daily maximum and minimum temperatures, the variables used to define heat and cold damage for rice, is evaluated first. The results show that most of the predictions initialized from January to May (01RUN to 05RUN) have reasonable predictability, although it varies to some extent with the month in which integration starts. In particular, the equally weighted ensemble average of 01RUN to 05RUN (ENS) has more reasonable predictability (RMSE in the range of 1.2~2.6℃ and NSD of about 1.0) than the individual RUNs. Accordingly, the model chain captures the regional patterns and characteristics of the predicted damage to rice from excessively high and low temperatures well when compared with observations, particularly in regions where damage occurs frequently, although the hindcast data somewhat overestimate the number of damage days. In ENS, the HR and HSS for heat (cold) damage in rice are in the ranges of 0.44~0.84 and 0.05~0.13 (0.58~0.81 and -0.01~0.10) by growth stage. Overall, it is concluded that the PNU CGCM-WRF chain of 01RUN~05RUN and ENS has a reasonable capability to predict heat and cold damage to rice in South Korea.
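The equal-weight ensemble average and two of the verification scores named in this abstract can be sketched as below. The Heidke Skill Score uses the standard 2x2 contingency-table form; whether the authors computed it exactly this way is an assumption, and all numbers are invented for illustration.

```python
import math

def ensemble_mean(members):
    """Equal-weight average across ensemble members, value by value."""
    return [sum(vals) / len(vals) for vals in zip(*members)]

def rmse(predicted, observed):
    """Root mean square error between two equal-length series."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """HSS for a 2x2 table: 1 is perfect, 0 is no better than chance."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2 * (a * d - b * c) / denom

# Hypothetical daily maximum temperatures (deg C) from two runs
run_a = [30.0, 32.0]
run_b = [32.0, 34.0]
ens = ensemble_mean([run_a, run_b])
error = rmse(ens, [31.0, 32.0])
```

An HSS near zero, as in the 0.05~0.13 range reported for heat damage, means the damage-day forecasts are only slightly better than chance even when the hit rate looks high.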

A Study on Customer Review Rating Recommendation and Prediction through Online Promotional Activity Analysis - Focusing on "S" Company Wearable Products - (온라인 판매촉진활동 분석을 통한 고객 리뷰평점 추천 및 예측에 관한 연구 : S사 Wearable 상품중심으로)

  • Shin, Ho-cheol
    • The Journal of the Korea Contents Association / v.22 no.4 / pp.118-129 / 2022
  • The purpose of this report is to study a strategic model of promotion activities through various analyses and sales forecasting, by selecting wearable products of domestic online companies and collecting their sales data. Various algorithms are applied to the data, and the one with the best results is selected as the optimal model. The gradient boosting model, selected as the best performer, takes nine independent variables as input (promotion type, price, amount, gender, model, company, grade, sales date, and region) when predicting the dependent variable through supervised learning. The review values set as the dependent variable for each type of sales promotion are examined in more detail with an ensemble analysis technique, with the main aim of analyzing and predicting review grades. The evaluation yields an AUC of 95% and an F1 of about 93%. In the end, it was confirmed that, among the types of sales promotion activities, value-added benefits affected the number of reviews and review grades, and that the major variables affected reviews and review grades.

Water Level Prediction on the Golok River Utilizing Machine Learning Technique to Evaluate Flood Situations

  • Pheeranat Dornpunya;Watanasak Supaking;Hanisah Musor;Oom Thaisawasdi;Wasukree Sae-tia;Theethut Khwankeerati;Watcharaporn Soyjumpa
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.31-31 / 2023
  • During December 2022, the northeast monsoon, which dominates the south and the Gulf of Thailand, brought significant rainfall to the lower southern region, causing flash floods, landslides, blustery winds, and rivers overflowing their banks. The Golok River in Narathiwat, which forms the border between Thailand and Malaysia, was also affected by the rainfall. In flood management, instruments for measuring precipitation and water level have become important for assessing and forecasting how situations develop and which areas are at risk. However, because the region lies on an international border, the installed telemetry system cannot measure rainfall and water level over the entire area. This study aims to predict water levels over a 72-hour horizon and evaluate the situation, providing information that supports the government in making water management decisions, communicating them to relevant agencies, and warning citizens during crisis events. Machine learning (ML) is applied to water level prediction for the Golok River at the Lan Tu Bridge area, Sungai Golok Subdistrict, Su-ngai Golok District, Narathiwat Province, one of the major monitored rivers. The eXtreme Gradient Boosting (XGBoost) algorithm, a tree-based ensemble machine learning algorithm, was used to predict hourly water levels, implemented in the R programming language. Model training and testing used observed hourly rainfall from the STH010 station and hourly water level data from the X.119A station between 2020 and 2022 as the main prediction inputs. Furthermore, the model takes hourly spatial rainfall forecasts from the Weather Research and Forecasting and Regional Ocean Model System models (WRF-ROMs), provided by the Hydro-Informatics Institute (HII), as input, allowing it to predict the hourly water level in the Golok River. Evaluation with statistical performance metrics gave an R-square of 0.96, which supports the robustness of the forecasts. The results show that the predicted water level at the X.119A telemetry station (Golok River) declines steadily, consistent with the decrease in the predicted 72-hour rainfall from WRF-ROMs used as input. In short, the relationship between input and result can be used to evaluate flood situations. The data are provided as operational support to the Special Water Resources Management Operation Center in Southern Thailand for flood preparedness and response, helping it make informed water management decisions during crises and prepare for and prevent loss and harm to citizens.


The KMA Global Seasonal forecasting system (GloSea6) - Part 2: Climatological Mean Bias Characteristics (기상청 기후예측시스템(GloSea6) - Part 2: 기후모의 평균 오차 특성 분석)

  • Hyun, Yu-Kyung;Lee, Johan;Shin, Beomcheol;Choi, Yuna;Kim, Ji-Yeong;Lee, Sang-Min;Ji, Hee-Sook;Boo, Kyung-On;Lim, Somin;Kim, Hyeri;Ryu, Young;Park, Yeon-Hee;Park, Hyeong-Sik;Choo, Sung-Ho;Hyun, Seung-Hwon;Hwang, Seung-On
    • Atmosphere / v.32 no.2 / pp.87-101 / 2022
  • In this paper, the performance improvement of the KMA's new Climate Prediction System (GloSea6), which was built and tested in 2021, is presented by assessing the bias distribution of basic variables from 24 years of GloSea6 hindcasts. With the upgrade from GloSea5 to GloSea6, the performance of GloSea6 can be regarded as notable in many respects, with improvements in (i) the negative bias of geopotential height over the tropical and mid-latitude troposphere and over the polar stratosphere in boreal summer; (ii) the cold bias of tropospheric temperature; (iii) the underestimation of mid-latitude jets; (iv) the dry bias in the lower troposphere; and (v) the cold tongue bias in the equatorial SST and the warm bias of the Southern Ocean, suggesting potential improvements in the major climate variability in GloSea6. The warm surface temperature bias over the northern hemisphere continents in summer is eliminated by using CDF-matched soil-moisture initial conditions. However, the cold bias over high-latitude snow-covered areas in winter still needs to be improved. The intensification of the westerly winds of the summer Asian monsoon and the weakening of the northwest Pacific high, which are considered major errors in the GloSea system, were not significantly improved. However, both the increased number of ensemble members and the use of initial conditions from the closest initial dates show the possibility of reducing these biases. It is also noted that ensemble expansion mainly contributes to improving annual variability over high latitudes and polar regions.

Covalent Organic Frameworks for Extremely High Reversible $CO_2$ and $H_2$ Uptake Capacity : A Multiscale Simulation Approach (우수한 가역적 이산화탄소 및 수소 저장성능을 가지는 공유결합성 유기적 골격구조체에 관한 다중스케일 접근법을 이용한 연구)

  • Choi, Yoon Jeong;Choi, Jung Hoon;Choi, Kyung Min;Kang, Jeung Ku
    • Proceedings of the Korean Society for New and Renewable Energy Conference (한국신재생에너지학회:학술대회논문집) / 2010.11a / pp.113.2-113.2 / 2010
  • We report novel covalent organic frameworks (COFs) capable of reversibly providing an extremely high uptake capacity for carbon dioxide and hydrogen at room temperature. These COFs are designed with a multiscale simulation approach that combines ab initio calculations and force-field calculations. To this end, we explore the adsorption sites of carbon dioxide and hydrogen on COFs, their porosity, and their carbon dioxide adsorption isotherms. We identify the binding sites and energies of $CO_2$ on COFs using ab initio calculations and obtain the carbon dioxide adsorption isotherms using grand canonical ensemble Monte Carlo calculations. Moreover, the calculated adsorption isotherms are compared with experimental values in order to build a reference model for describing the interactions between $CO_2/H_2$ and the COFs and for predicting the $CO_2$ and $H_2$ adsorption isotherms of COFs. Finally, we design three new COFs, 2D COF-05, 3D COF-05 (ctn), and 3D COF-05 (bor), for high-capacity $CO_2/H_2$ and $H_2$ storage.
