• Title/Summary/Keyword: ensemble machine learning (앙상블 기계학습)

Extracting Significant Information from Social Text using Machine Learning (기계학습을 활용한 소셜 텍스트의 주요 정보 추출 기법)

  • Kim, So-Hyeon;Kim, Han-joon
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.742-745 / 2016
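  • In the big-data era, as text mining and opinion mining see growing use, extracting useful data from social network data is very important. This paper proposes a method that applies logistic regression and ensemble techniques to tag features extracted from blog HTML documents, building a model that classifies the tags containing the main text, and then uses the tag-depth feature to locate the main body text. In experiments on data collected by the authors, the tag classification accuracy was 0.990 and the proportion of documents in which the main text was found was 80.5%.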

Development of Machine Learning Ensemble Model using Artificial Intelligence (인공지능을 활용한 기계학습 앙상블 모델 개발)

  • Lee, K.W.;Won, Y.J.;Song, Y.B.;Cho, K.S.
    • Journal of the Korean Society for Heat Treatment / v.34 no.5 / pp.211-217 / 2021
  • To predict the mechanical properties of secondary hardening martensitic steels, a machine learning ensemble model was established. Based on an ANN (Artificial Neural Network) architecture, several methods were considered to optimize the model. In particular, interaction features, which can reflect interactions between the chemical compositions and processing conditions of a real alloy system, were introduced by means of feature engineering, and K-fold cross-validation coupled with a bagging ensemble was then investigated to reduce R2_score and a factor indicating average learning errors owing to the biased experimental database.
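
The combination of K-fold cross-validation and ensemble averaging mentioned in this abstract can be illustrated with a minimal sketch. This is one possible reading, not the authors' procedure: a one-variable least-squares line stands in for the ANN, one model is trained per fold on the remaining folds, and the fold models are averaged. The names `fit_line`, `kfold_ensemble`, and `ensemble_predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in for the ANN)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # degenerate case: all x identical
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def kfold_ensemble(xs, ys, k=5):
    """Train one model per fold on the other k-1 folds; score it on the held-out fold."""
    n = len(xs)
    models, fold_mse = [], []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        models.append((slope, intercept))
        errors = [(slope * xs[i] + intercept - ys[i]) ** 2 for i in held_out]
        fold_mse.append(sum(errors) / len(errors))
    return models, fold_mse


def ensemble_predict(models, x):
    """Average the predictions of all fold models (the ensemble step)."""
    return sum(slope * x + intercept for slope, intercept in models) / len(models)


# Illustrative data only: y = 2x + 1 sampled at ten points.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
models, fold_mse = kfold_ensemble(xs, ys, k=5)
prediction = ensemble_predict(models, 10.0)
```

The per-fold errors give an estimate of how much each model depends on which samples it saw, which is the kind of database bias the abstract refers to.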

Improved Estimation of Hourly Surface Ozone Concentrations using Stacking Ensemble-based Spatial Interpolation (스태킹 앙상블 모델을 이용한 시간별 지상 오존 공간내삽 정확도 향상)

  • KIM, Ye-Jin;KANG, Eun-Jin;CHO, Dong-Jin;LEE, Si-Woo;IM, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.25 no.3 / pp.74-99 / 2022
  • Surface ozone is produced by photochemical reactions of nitrogen oxides (NOx) and volatile organic compounds (VOCs) emitted from vehicles and industrial sites, and it adversely affects vegetation and the human body. In South Korea, ozone is monitored in real time at stations (i.e., point measurements), but it is difficult to monitor and analyze its continuous spatial distribution. In this study, hourly surface ozone concentrations were interpolated to a spatial resolution of 1.5 km using the stacking ensemble technique and evaluated with 5-fold cross-validation. The base models for the stacking ensemble were cokriging, multi-linear regression (MLR), random forest (RF), and support vector regression (SVR), while MLR was used as the meta model, taking all base model results as additional input variables. The stacking ensemble model performed better than the individual base models, with an average R of 0.76 and an RMSE of 0.0065 ppm over the 2020 study period. Compared with the base models, the surface ozone concentration distribution generated by the stacking ensemble model had a wider range, with a spatial pattern similar to those of the terrain and urbanization variables. The proposed model is expected not only to produce the hourly spatial distribution of ozone but also to be highly applicable to calculating daily maximum 8-hour ozone concentrations.

An Improvement Study on the Hydrological Quantitative Precipitation Forecast (HQPF) for Rainfall Impact Forecasting (호우 영향예보를 위한 수문학적 정량강우예측(HQPF) 개선 연구)

  • Yoon Hu Shin;Sung Min Kim;Yong Keun Jee;Young-Mi Lee;Byung-Sik Kim
    • Journal of Korean Society of Disaster and Security / v.15 no.4 / pp.87-98 / 2022
  • In recent years, localized heavy rainfalls, which bring a large amount of rain in a short period of time, have increasingly caused flood damage. To prevent such damage, the Hydrological Quantitative Precipitation Forecast (HQPF) was developed using the Local ENsemble prediction System (LENS) provided by the Korea Meteorological Administration (KMA) together with machine learning and Probability Matching (PM) techniques applied to digital forecast data. HQPF is produced as heavy-rainfall impact information to help prepare for flood damage caused by localized heavy rainfalls, but it tends to overestimate low rainfall intensities. In this study, we improved HQPF by extending the period of the machine learning data, analyzing ensemble techniques, and changing the PM process in order to raise predictive accuracy and reduce the over-prediction tendency. To evaluate the improved HQPF, we verified its predictive performance on heavy rainfall cases caused by the Changma front from August 27, 2021 to September 3, 2021. The improved HQPF showed significantly better prediction accuracy for rainfall below 10 mm and a reduced over-prediction tendency, predicting a likelihood of occurrence and a rainfall area similar to the observations.

Developing a regional fog prediction model using tree-based machine-learning techniques and automated visibility observations (시정계 자료와 기계학습 기법을 이용한 지역 안개예측 모형 개발)

  • Kim, Daeha
    • Journal of Korea Water Resources Association / v.54 no.12 / pp.1255-1263 / 2021
  • While it could become an alternative water resource, fog can undermine traffic safety and the operational performance of infrastructure. To reduce such adverse impacts, spatially continuous fog risk information is necessary. In this work, tree-based machine-learning models were developed to quantify fog risks with routine meteorological observations alone. Extreme Gradient Boosting (XGB), Light Gradient Boosting (LGB), and Random Forests (RF) were chosen for the regional fog models, using operational weather and visibility observations within Jeollabuk-do province. RF appeared to be the most robust at distinguishing fog from non-fog situations during the training and evaluation period of 2017-2019. While LGB performed better than the others in predicting fog occurrences, its false alarm ratio was the highest (0.695) among the three models. The predictability of the three models declined considerably when they were applied to an independent period in 2020, potentially due to the distinctively enhanced air quality that year under the global lockdown. Nonetheless, even in 2020, all three models were able to produce fog risk information consistent with the spatial variation of observed fog occurrences. This work suggests that tree-based machine learning models could be used as tools to find locations with relatively high fog risks.
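
For readers unfamiliar with the false alarm ratio quoted above, the following minimal sketch computes it from binary fog/non-fog labels. It assumes the standard forecast-verification definitions (FAR = false alarms / (hits + false alarms), POD = hits / (hits + misses)); the abstract does not state which definition the paper uses, and the function names and sample labels are hypothetical.

```python
def contingency_counts(observed, predicted):
    """Count hits, false alarms, and misses for binary event labels (1 = fog)."""
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o and p)
    false_alarms = sum(1 for o, p in pairs if not o and p)
    misses = sum(1 for o, p in pairs if o and not p)
    return hits, false_alarms, misses


def false_alarm_ratio(observed, predicted):
    """Fraction of predicted events that did not occur (assumed standard definition)."""
    hits, false_alarms, _ = contingency_counts(observed, predicted)
    forecasts = hits + false_alarms
    return false_alarms / forecasts if forecasts else float("nan")


def probability_of_detection(observed, predicted):
    """Fraction of observed events that were predicted."""
    hits, _, misses = contingency_counts(observed, predicted)
    events = hits + misses
    return hits / events if events else float("nan")


# Illustrative labels only, not data from the paper.
observed = [1, 0, 0, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 0]
far = false_alarm_ratio(observed, predicted)
pod = probability_of_detection(observed, predicted)
```

A model can detect more fog events and still have a high false alarm ratio, which is the trade-off the abstract reports for LGB.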

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE / v.43 no.10 / pp.1085-1093 / 2016
  • A community-based question answering system provides answers to questions from the documents uploaded to web communities. To enhance question analysis, earlier methods developed specific rules suited to a target domain or applied machine learning to parts of the process. However, these methods incur excessive cost when expanding to new fields or lead to systems that are overfitted to a specific field. This paper proposes a multiple machine learning method that automates the overall process by applying an appropriate machine learning technique at each step, for efficient processing in a community-based question answering system. The system is divided into a question analysis part and an answer selection part. The question analysis part consists of the question focus extractor, which analyzes the focused phrases in questions using conditional random fields, and the question type classifier, which classifies question topics using a support vector machine. In the answer selection part, we train the weights used by the similarity estimation models through an artificial neural network. In addition, there are a number of cases in which the results of morphological analysis are not reliable for data uploaded to web communities. We therefore suggest a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the former system, with a Mean Average Precision of 0.765 and an R-Precision of 0.872.
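
The two ranking metrics reported above can be sketched as follows. This is a minimal illustration under stated assumptions: each query is represented as a ranked list of 0/1 relevance flags, average precision is normalized by the number of relevant items that appear in the list, and the function names and sample rankings are hypothetical rather than taken from the paper.

```python
def average_precision(relevance):
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries):
    """Average of per-query average precision."""
    return sum(average_precision(q) for q in queries) / len(queries)


def r_precision(relevance, num_relevant):
    """Precision within the top R results, where R is the number of relevant items."""
    if num_relevant == 0:
        return 0.0
    return sum(relevance[:num_relevant]) / num_relevant


# Illustrative rankings only: 1 marks a relevant answer at that rank.
query_a = [1, 0, 1, 0]
query_b = [0, 1]
map_score = mean_average_precision([query_a, query_b])
rp_score = r_precision(query_a, num_relevant=2)
```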

Comparison of the Machine Learning Models Predicting Lithium-ion Battery Capacity for Remaining Useful Life Estimation (리튬이온 배터리 수명추정을 위한 용량예측 머신러닝 모델의 성능 비교)

  • Yoo, Sangwoo;Shin, Yongbeom;Shin, Dongil
    • Journal of the Korean Institute of Gas / v.24 no.6 / pp.91-97 / 2020
  • Lithium-ion batteries (LIBs) have a longer lifespan, higher energy density, and lower self-discharge rate than other batteries, and are therefore preferred for Energy Storage Systems (ESS). However, 28 ESS fire accidents occurred in Korea during 2017-2019, and accurate capacity estimation of LIBs is essential to ensure safety and reliability during operation. In this study, data-driven models that predict capacity changes over the charging cycles of an LIB were built, and their performance was compared to select the optimal machine learning model among the Decision Tree, Ensemble Learning Method, Support Vector Regression, and Gaussian Process Regression (GPR). Lithium battery test data provided by NASA were used for model training, and GPR showed the best prediction performance. Based on this study, we will develop an enhanced LIB capacity prediction and remaining useful life estimation model through additional data training, and improve the performance of anomaly detection and monitoring during operation, enabling safe and stable ESS operation.

Risk Prediction and Analysis of Building Fires -Based on Property Damage and Occurrence of Fires- (건물별 화재 위험도 예측 및 분석: 재산 피해액과 화재 발생 여부를 바탕으로)

  • Lee, Ina;Oh, Hyung-Rok;Lee, Zoonky
    • The Journal of Bigdata / v.6 no.1 / pp.133-144 / 2021
  • This paper derives the fire risk of buildings in Seoul by predicting property damage and the occurrence of fires. The study differs from prior research in that it uses variables covering not only a building's characteristics but also its administrative area and the accessibility of nearby fire-fighting facilities. We use ensemble voting techniques to merge different machine learning algorithms for predicting property damage and fire occurrence, and extract feature importance to produce the fire risk. Fire risk was predicted for 300 buildings in Seoul using the established model. Buildings at fire risk Level 1 had a high number of households and many factors that could increase the size of a fire, including a lack of nearby fire-fighting facilities and a distant 119 Safety Center. For Level 5 buildings, on the other hand, the number of buildings and businesses is large, but the responsible 119 Safety Center is located closest to the building and can respond properly to a fire.
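
The abstract does not say which voting scheme was used, so the following is only a sketch of one common variant, soft voting, in which the fire-occurrence probabilities predicted by several classifiers are averaged before thresholding. The function name `soft_vote`, the weights, the 0.5 threshold, and the sample probabilities are all assumptions for illustration.

```python
def soft_vote(probabilities, weights=None, threshold=0.5):
    """Average per-model probabilities for each sample, then threshold.

    probabilities: one list per model, each holding P(fire) for every sample.
    weights: optional per-model weights (assumed equal when omitted).
    """
    n_models = len(probabilities)
    if weights is None:
        weights = [1.0] * n_models
    total_weight = sum(weights)
    averaged = [
        sum(w * p for w, p in zip(weights, sample)) / total_weight
        for sample in zip(*probabilities)
    ]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels


# Illustrative probabilities from three hypothetical classifiers for three buildings.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.3]
model_c = [0.8, 0.1, 0.5]
averaged, labels = soft_vote([model_a, model_b, model_c])
```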

Diabetes prediction mechanism using machine learning model based on patient IQR outlier and correlation coefficient (환자 IQR 이상치와 상관계수 기반의 머신러닝 모델을 이용한 당뇨병 예측 메커니즘)

  • Jung, Juho;Lee, Naeun;Kim, Sumin;Seo, Gaeun;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.10 / pp.1296-1301 / 2021
  • With the recent worldwide increase in diabetes incidence, research has been conducted on predicting diabetes with various machine learning and deep learning technologies. In this work, we present a model for predicting diabetes using machine learning techniques with German Frankfurt Hospital data. We apply outlier handling based on the Interquartile Range (IQR) and Pearson correlation, and compare the diabetes prediction performance of Decision Tree, Random Forest, KNN (k-nearest neighbor), SVM (support vector machine), Bayesian Network, and the ensemble techniques XGBoost, Voting, and Stacking. Among the various scenarios, XGBoost showed the best performance, with 97% accuracy. This study is therefore meaningful in that the model can be used to accurately predict and help prevent diabetes, which is prevalent in modern society.
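
IQR-based outlier handling, as named in this abstract, can be sketched in a few lines. The 1.5 × IQR fence used here is the common Tukey convention and is an assumption, since the abstract does not give the multiplier; the function names and sample values are hypothetical.

```python
import statistics


def iqr_bounds(values, multiplier=1.5):
    """Return (lower, upper) fences: Q1 - m * IQR and Q3 + m * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def remove_outliers(values, multiplier=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, multiplier)
    return [v for v in values if lower <= v <= upper]


# Illustrative measurements only; 100 is an obvious outlier.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 100]
bounds = iqr_bounds(readings)
cleaned = remove_outliers(readings)
```

Whether outliers are dropped, as here, or clipped to the fences is a design choice the abstract leaves open.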

Recognition of Indoor and Outdoor Exercising Activities using Smartphone Sensors and Machine Learning (스마트폰 센서와 기계학습을 이용한 실내외 운동 활동의 인식)

  • Kim, Jaekyung;Ju, YeonHo
    • Journal of Creative Information Culture / v.7 no.4 / pp.235-242 / 2021
  • Recently, many studies on human activity recognition (HAR) using smartphone sensor data have been conducted. HAR can be utilized in various fields, such as life pattern analysis, exercise measurement, and dangerous situation detection. However, research has focused on the recognition of basic human behaviors or on efficient battery use. In this paper, exercising activities performed indoors and outdoors are defined and recognized. Data collection and pre-processing are performed to recognize the defined activities with SVM, random forest, and gradient boosting models. In addition, the recognition result is determined by a class-voting approach for accuracy and stable performance. As a result, the proposed activities were recognized with high accuracy, and in particular, similar types of indoor and outdoor exercising activities were correctly classified.
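
The class-voting step described above can be illustrated with a minimal hard-voting sketch: each model predicts one label per sample and the most frequent label wins. The activity labels and the function name `majority_vote` are hypothetical, and ties are resolved in favor of the label seen first, which is an assumption rather than the paper's rule.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists by taking the most common label per sample.

    predictions: one list of labels per model, all of the same length.
    """
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns the top label; ties go to the first one encountered.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Illustrative outputs from three hypothetical classifiers on three samples.
svm_labels = ["run", "walk", "cycle"]
forest_labels = ["run", "run", "cycle"]
boosting_labels = ["walk", "run", "treadmill"]
final_labels = majority_vote([svm_labels, forest_labels, boosting_labels])
```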