• Title/Summary/Keyword: 앙상블 러닝

Search Result 135, Processing Time 0.02 seconds

Estimation of Chlorophyll-a Concentration in Nakdong River Using Machine Learning-Based Satellite Data and Water Quality, Hydrological, and Meteorological Factors (머신러닝 기반 위성영상과 수질·수문·기상 인자를 활용한 낙동강의 Chlorophyll-a 농도 추정)

  • Soryeon Park;Sanghun Son;Jaegu Bae;Doi Lee;Dongju Seo;Jinsoo Kim
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_1
    • /
    • pp.655-667
    • /
    • 2023
  • Algal bloom outbreaks are frequently reported around the world, and serious water pollution problems arise every year in Korea. It is necessary to protect the aquatic ecosystem through continuous management and rapid response. Many studies using satellite images are being conducted to estimate the concentration of chlorophyll-a (Chl-a), an indicator of algal bloom occurrence. However, machine learning models have recently been used because it is difficult to accurately calculate Chl-a due to the spectral characteristics and atmospheric correction errors that change depending on the water system. It is necessary to consider the factors affecting algal bloom as well as the satellite spectral index. Therefore, this study constructed a dataset by considering water quality, hydrological and meteorological factors, and sentinel-2 images in combination. Representative ensemble models random forest and extreme gradient boosting (XGBoost) were used to predict the concentration of Chl-a in eight weirs located on the Nakdong river over the past five years. R-squared score (R2), root mean square errors (RMSE), and mean absolute errors (MAE) were used as model evaluation indicators, and it was confirmed that R2 of XGBoost was 0.80, RMSE was 6.612, and MAE was 4.457. Shapley additive expansion analysis showed that water quality factors, suspended solids, biochemical oxygen demand, dissolved oxygen, and the band ratio using red edge bands were of high importance in both models. Various input data were confirmed to help improve model performance, and it seems that it can be applied to domestic and international algal bloom detection.

Mining Intellectual History Using Unstructured Data Analytics to Classify Thoughts for Digital Humanities (디지털 인문학에서 비정형 데이터 분석을 이용한 사조 분류 방법)

  • Seo, Hansol;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.141-166
    • /
    • 2018
  • Information technology improves the efficiency of humanities research. In humanities research, information technology can be used to analyze a given topic or document automatically, facilitate connections to other ideas, and increase our understanding of intellectual history. We suggest a method to identify and automatically analyze the relationships between arguments contained in unstructured data collected from humanities writings such as books, papers, and articles. Our method, which is called history mining, reveals influential relationships between arguments and the philosophers who present them. We utilize several classification algorithms, including a deep learning method. To verify the performance of the methodology proposed in this paper, empiricists and rationalism - related philosophers were collected from among the philosophical specimens and collected related writings or articles accessible on the internet. The performance of the classification algorithm was measured by Recall, Precision, F-Score and Elapsed Time. DNN, Random Forest, and Ensemble showed better performance than other algorithms. Using the selected classification algorithm, we classified rationalism or empiricism into the writings of specific philosophers, and generated the history map considering the philosopher's year of activity.

Artificial Intelligence Algorithms, Model-Based Social Data Collection and Content Exploration (소셜데이터 분석 및 인공지능 알고리즘 기반 범죄 수사 기법 연구)

  • An, Dong-Uk;Leem, Choon Seong
    • The Journal of Bigdata
    • /
    • v.4 no.2
    • /
    • pp.23-34
    • /
    • 2019
  • Recently, the crime that utilizes the digital platform is continuously increasing. About 140,000 cases occurred in 2015 and about 150,000 cases occurred in 2016. Therefore, it is considered that there is a limit handling those online crimes by old-fashioned investigation techniques. Investigators' manual online search and cognitive investigation methods those are broadly used today are not enough to proactively cope with rapid changing civil crimes. In addition, the characteristics of the content that is posted to unspecified users of social media makes investigations more difficult. This study suggests the site-based collection and the Open API among the content web collection methods considering the characteristics of the online media where the infringement crimes occur. Since illegal content is published and deleted quickly, and new words and alterations are generated quickly and variously, it is difficult to recognize them quickly by dictionary-based morphological analysis registered manually. In order to solve this problem, we propose a tokenizing method in the existing dictionary-based morphological analysis through WPM (Word Piece Model), which is a data preprocessing method for quick recognizing and responding to illegal contents posting online infringement crimes. In the analysis of data, the optimal precision is verified through the Vote-based ensemble method by utilizing a classification learning model based on supervised learning for the investigation of illegal contents. This study utilizes a sorting algorithm model centering on illegal multilevel business cases to proactively recognize crimes invading the public economy, and presents an empirical study to effectively deal with social data collection and content investigation.

  • PDF

Sensor Fault Detection Scheme based on Deep Learning and Support Vector Machine (딥 러닝 및 서포트 벡터 머신기반 센서 고장 검출 기법)

  • Yang, Jae-Wan;Lee, Young-Doo;Koo, In-Soo
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.2
    • /
    • pp.185-195
    • /
    • 2018
  • As machines have been automated in the field of industries in recent years, it is a paramount importance to manage and maintain the automation machines. When a fault occurs in sensors attached to the machine, the machine may malfunction and further, a huge damage will be caused in the process line. To prevent the situation, the fault of sensors should be monitored, diagnosed and classified in a proper way. In the paper, we propose a sensor fault detection scheme based on SVM and CNN to detect and classify typical sensor errors such as erratic, drift, hard-over, spike, and stuck faults. Time-domain statistical features are utilized for the learning and testing in the proposed scheme, and the genetic algorithm is utilized to select the subset of optimal features. To classify multiple sensor faults, a multi-layer SVM is utilized, and ensemble technique is used for CNN. As a result, the SVM that utilizes a subset of features selected by the genetic algorithm provides better performance than the SVM that utilizes all the features. However, the performance of CNN is superior to that of the SVM.

Comparison of Handball Result Predictions Using Bagging and Boosting Algorithms (배깅과 부스팅 알고리즘을 이용한 핸드볼 결과 예측 비교)

  • Kim, Ji-eung;Park, Jong-chul;Kim, Tae-gyu;Lee, Hee-hwa;Ahn, Jee-Hwan
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.8
    • /
    • pp.279-286
    • /
    • 2021
  • The purpose of this study is to compare the predictive power of the Bagging and Boosting algorithm of ensemble method based on the motion information that occurs in woman handball matches and to analyze the availability of motion information. To this end, this study analyzed the predictive power of the result of 15 practice matches based on inertial motion by analyzing the predictive power of Random Forest and Adaboost algorithms. The results of the study are as follows. First, the prediction rate of the Random Forest algorithm was 66.9 ± 0.1%, and the prediction rate of the Adaboost algorithm was 65.6 ± 1.6%. Second, Random Forest predicted all of the winning results, but none of the losing results. On the other hand, the Adaboost algorithm shows 91.4% prediction of winning and 10.4% prediction of losing. Third, in the verification of the suitability of the algorithm, the Random Forest had no overfitting error, but Adaboost showed an overfitting error. Based on the results of this study, the availability of motion information is high when predicting sports events, and it was confirmed that the Random Forest algorithm was superior to the Adaboost algorithm.