• Title/Summary/Keyword: Machine Learning

Search Results: 5,297

Quantitative Estimation Method for ML Model Performance Change, Due to Concept Drift (Concept Drift에 의한 ML 모델 성능 변화의 정량적 추정 방법)

  • Soon-Hong An;Hoon-Suk Lee;Seung-Hoon Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.6 / pp.259-266 / 2023
  • Measuring the performance of a machine learning model once it is deployed in a business service is very difficult, so operational departments cannot manage model performance effectively. Academically, various studies have examined concept drift detection methods for determining whether a model's status is appropriate. Operational departments want to know the performance of the operating model quantitatively, but concept drift detection only reveals the state of the model relative to the data; it cannot estimate the model's performance quantitatively. In this study, we propose a performance prediction model (PPM) that quantitatively estimates precision from concept drift statistics. The proposed model induces artificial drift in samples drawn from the training data, measures the precision on each sample, builds a dataset of drift and precision values, and learns from it. The difference between actual and predicted precision is then measured on test data to correct the error of the PPM. The proposed PPM was applied to two models usable in real business settings, a loan underwriting model and a credit card fraud detection model, and was confirmed to predict precision effectively.
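The PPM workflow described above (induce drift, record drift and precision, learn the mapping, then correct it on test data) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which drift statistic or learner PPM uses, so the sketch assumes a mean-shift drift statistic and a least-squares line, and `deployed_model`, `drifted_sample`, `measure`, and all thresholds are hypothetical stand-ins on synthetic data.

```python
import random


def precision(y_true, y_pred):
    """Share of predicted positives that are true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def deployed_model(x):
    # Hypothetical operating model: flags a case when the observed feature exceeds 0.5.
    return 1 if x > 0.5 else 0


def drifted_sample(n, shift, rng):
    # Hypothetical data: labels depend on a latent value, but the observed feature
    # is shifted, so the relationship the model relies on drifts as `shift` grows.
    latent = [rng.random() for _ in range(n)]
    labels = [1 if v > 0.6 else 0 for v in latent]
    observed = [v + shift for v in latent]
    return observed, labels


def drift_stat(reference, current):
    # Assumed drift statistic: absolute shift of the feature mean.
    return abs(sum(current) / len(current) - sum(reference) / len(reference))


def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def measure(shift, rng, reference):
    # Returns one (drift statistic, measured precision) pair for an induced shift.
    x, y = drifted_sample(2000, shift, rng)
    return drift_stat(reference, x), precision(y, [deployed_model(v) for v in x])


rng = random.Random(0)
reference, _ = drifted_sample(2000, 0.0, rng)

# 1) Induce artificial drift and build a (drift, precision) training set.
train = [measure(s, rng, reference) for s in (0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3)]
drifts, precs = zip(*train)

# 2) Learn the performance prediction model (here: a straight line).
a, b = fit_line(drifts, precs)

# 3) Correct the PPM with its mean error on held-out test drifts.
test = [measure(s, rng, reference) for s in (0.07, 0.18)]
bias = sum(p - (a + b * d) for d, p in test) / len(test)


def predict_precision(drift):
    return a + b * drift + bias
```

`predict_precision(d)` then returns an estimated precision for an observed drift value `d`. A straight line is the simplest learner that captures the falling trend; because precision in this toy setup decays nonlinearly with drift, a richer regressor would likely fit better.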

Study on Water Quality Predictability through Machine Learning Techniques in Non-point Pollutant Management Area (비점오염원관리지역의 머신러닝 기법을 통한 수질 예측 가능성 연구)

  • Yeong Na Yu;Min Hwan Shin;Dong Hyuk Kum;Kyoung Jae Lim;Jong Gun Kim
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.467-467 / 2023
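  • Because water quality data on non-point pollutants generated by rainfall are insufficient, it is difficult to devise measures to improve water quality in watersheds where non-point pollution sources are a problem. The automatic monitoring network operated by the Ministry of Environment accumulates data at one-hour intervals, but it is either not installed in watersheds affected by non-point pollution or measures only field parameters such as water temperature, DO, and pH, so water quality analysis parameters that represent river pollution, such as T-P and SS, are missing. This makes it difficult to assess the status of pollution sources when establishing water quality improvement measures for a watershed. Therefore, this study analyzed the correlations among water quality parameters in the Golji Stream watershed, one of the non-point pollutant management areas, and investigated the predictability of water quality from measured data using machine learning techniques including DT, MLP, SVM, RF, GB, and XGB. The correlation analysis showed that turbidity, an input variable, was clearly correlated with the predicted water quality parameters, whereas the other parameters showed weak or no correlation. In the machine learning predictions, RF performed best at the Geommu Bridge, Taebong No. 2 Bridge, and Yeoryang No. 1 Bridge sites, with coefficients of determination (R²) of 0.57 to 0.86 and RMSE of 16.49 to 175.60. SVM performed best at Gwanmal Bridge (R² 0.65, RMSE 57.69), and XGB performed best at Songgye Bridge (R² 0.74, RMSE 282.86). These results show that water quality can be predicted with machine learning; however, comparing the R² of the best-performing technique at each site, Yeoryang No. 1 Bridge, with a large catchment area, and Gwanmal Bridge, with a small one, had lower values than the other sites, at 0.57 and 0.65, respectively. In terms of RMSE, the error between predicted and measured values was large at Taebong No. 2 Bridge (175.60), where muddy water occurs most frequently due to localized heavy rainfall in the upstream mountainous area, and at Songgye Bridge (282.86), where the priority management area joins the stream. The results suggest that predicting river water quality with machine learning requires additional baseline data on catchment area or other watershed characteristics. In addition, input variables beyond the water quality parameters used in this study should be secured to further examine predictability.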
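The two metrics used above to compare sites can be written out directly. This sketch assumes R² is the coefficient of determination, 1 - SS_res / SS_tot (the abstract does not give its formula), and adds a Pearson correlation of the kind used in the correlation screening; the sample values are hypothetical, not the study's data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the predicted parameter."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation, e.g. between turbidity and a target parameter."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical measured vs. predicted concentrations at one site.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
```

Because RMSE is expressed in the units of the predicted parameter, sites with larger concentrations can show a large RMSE alongside a respectable R², which may be why the two metrics rank the sites differently.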


Development of an Ensemble-Based Multi-Region Integrated Odor Concentration Prediction Model (앙상블 기반의 악취 농도 다지역 통합 예측 모델 개발)

  • Seong-Ju Cho;Woo-seok Choi;Sang-hyun Choi
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.383-400 / 2023
  • Air pollution-related diseases are increasing worldwide, with the World Health Organization (WHO) estimating approximately 7 million deaths per year as of 2022. The rapid expansion of industrial facilities, increased emissions from various sources, and the uncontrolled release of odorous substances have brought air pollution to the forefront of societal concerns. In South Korea, odor is classified as an independent environmental pollutant alongside air and water pollution, and it directly affects the health of local residents by causing discomfort and aversion. However, Korea's current odor management system remains inadequate and needs improvement. This study aims to improve the odor management system by analyzing 1,010,749 data points collected from odor sensors in Osong, Chungcheongbuk-do, with an Ensemble-Based Multi-Region Integrated Odor Concentration Prediction Model. The results show that the XGBoost-based model performed best, with an RMSE of 0.0096, significantly outperforming the single-region model (RMSE 0.0146), with a 51.9% reduction in mean error size. This suggests that standardizing odor concentration data collected from different regions can increase data volume, improve accuracy, and enable odor prediction across diverse regions with a single unified model.
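The abstract attributes the unified model's gains to standardizing odor concentration data collected from different regions. One common way to do that, assumed here because the paper's exact scheme is not stated, is to z-score each region with its own statistics before pooling and to map predictions back with the inverse transform. Region names and readings below are hypothetical.

```python
from statistics import mean, pstdev


def fit_region_stats(readings):
    """Per-region mean and population standard deviation of odor concentration."""
    return {region: (mean(values), pstdev(values)) for region, values in readings.items()}


def pool_standardized(readings, stats):
    """Z-score each region with its own statistics, then pool into one training set."""
    rows = []
    for region, values in readings.items():
        mu, sigma = stats[region]
        sigma = sigma or 1.0  # guard against a constant series
        rows.extend((region, (v - mu) / sigma) for v in values)
    return rows


def to_original_scale(stats, region, z):
    """Map a standardized prediction back to the region's concentration scale."""
    mu, sigma = stats[region]
    return mu + (sigma or 1.0) * z


# Hypothetical readings from two sensor regions with very different baselines.
readings = {"site_a": [1.0, 2.0, 3.0], "site_b": [10.0, 20.0, 30.0]}
stats = fit_region_stats(readings)
pooled = pool_standardized(readings, stats)
```

After standardization both regions contribute values on a shared scale, so a single model can be trained on the pooled rows instead of one model per region.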

A Methodology for Making Military Surveillance System to be Intelligent Applied by AI Model (AI모델을 적용한 군 경계체계 지능화 방안)

  • Changhee Han;Halim Ku;Pokki Park
    • Journal of Internet Computing and Services / v.24 no.4 / pp.57-64 / 2023
  • The ROK military faces a significant challenge in its surveillance mission due to demographic problems, particularly population aging and the population cliff. This study demonstrates the crucial role that Fourth Industrial Revolution technologies, and the artificial intelligence algorithms at their core, can play in maximizing work efficiency in the Command & Control room by automating simple tasks. To build a fully developed military surveillance system, we chose multi-object tracking (MOT) as the essential artificial intelligence component, in line with our goal of an intelligent and automated surveillance system. We also prioritized data visualization and the user interface to keep the system accessible and efficient. Together, these complementary elements form a cohesive software application. The CCTV video data for this study were collected from CCTV cameras installed at the 1st and 2nd main gates of the 00 unit, with the cooperation of its Command & Control room. Experimental results indicate that an intelligent and automated surveillance system can deliver more information to the operators in the room. The study also acknowledges the limitations of the developed software system and, by highlighting them, presents future directions for the development of military surveillance systems.
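Multi-object tracking links per-frame detections into persistent identities. The abstract does not name the tracker used, so the following is a minimal greedy IoU-matching tracker, a deliberately simple stand-in for production MOT methods. Boxes are assumed to be (x1, y1, x2, y2) pixel tuples, and the 0.3 threshold is an arbitrary choice.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Greedy IoU tracker: each detection takes the best-overlapping unclaimed track."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.tracks = {}  # track id -> box from the previous frame
        self.next_id = 0

    def update(self, detections):
        assigned = {}
        unmatched = set(self.tracks)
        for box in detections:
            best_id, best_iou = None, self.threshold
            for tid in unmatched:
                score = iou(self.tracks[tid], box)
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:  # no overlap above threshold: start a new track
                best_id = self.next_id
                self.next_id += 1
            else:
                unmatched.discard(best_id)
            assigned[best_id] = box
        self.tracks = assigned  # tracks without a match this frame are dropped
        return assigned


tracker = IoUTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame2 = tracker.update([(1, 1, 11, 11), (51, 50, 61, 60), (200, 200, 210, 210)])
```

Across the two frames, the slightly moved boxes keep IDs 0 and 1 and the new, distant box receives ID 2. Practical trackers add motion models, appearance features, and tolerance for missed frames, all of which this sketch omits by dropping unmatched tracks immediately.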

Comparison of ANN model's prediction performance according to the level of data uncertainty in water distribution network (상수도관망 내 데이터 불확실성에 따른 절점 압력 예측 ANN 모델 수행 성능 비교)

  • Jang, Hyewoon;Jung, Donghwi;Jun, Sanghoon
    • Journal of Korea Water Resources Association / v.55 no.spc1 / pp.1295-1303 / 2022
  • As the role of water distribution networks (WDNs) becomes more important, abnormal events (e.g., pipe bursts) must be identified rapidly and accurately. Because existing approaches such as field equipment-based detection have several limitations, model-based methods that identify abnormal events using hydraulic simulation models (e.g., machine learning-based detection models) have been developed. However, no previous work has examined how data uncertainty affects their results. This study therefore compares the effects of pressure data uncertainty caused by measurement error in WDNs. An artificial neural network (ANN) is used to predict nodal pressures, and measurement errors are generated by inverse transform sampling from the cumulative distribution function of a Gaussian distribution. A total of nine conditions (3 input datasets × 3 output datasets) are considered in the ANN model to investigate the impact of measurement error size on the prediction results. The results show that higher data uncertainty decreased the ANN model's prediction accuracy. Measurement error in the output data also affected model performance more than error in the input data: for the same measurement error size applied to the input and output data, the prediction accuracy was 72.25% and 38.61%, respectively. Thus, to improve ANN prediction performance, reducing the magnitude of measurement error at the output pressure nodes appears more important than at the input nodes.
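Inverse transform sampling maps uniform random numbers through the inverse CDF of the target distribution. The following is a minimal sketch using Python's standard `statistics.NormalDist.inv_cdf`; the pressure values, the three error levels (sigma), and the helper names are hypothetical.

```python
import random
from statistics import NormalDist


def gaussian_errors(n, sigma, seed=0):
    """Draw n measurement errors from N(0, sigma^2) by inverse transform sampling."""
    rng = random.Random(seed)
    inverse_cdf = NormalDist(mu=0.0, sigma=sigma).inv_cdf
    # Sample u strictly inside (0, 1): inv_cdf is undefined at 0 and 1.
    return [inverse_cdf(rng.uniform(1e-12, 1 - 1e-12)) for _ in range(n)]


def add_measurement_error(pressures, sigma, seed=0):
    """Return a noisy copy of simulated nodal pressures (hypothetical helper)."""
    errors = gaussian_errors(len(pressures), sigma, seed)
    return [p + e for p, e in zip(pressures, errors)]


# Three hypothetical error levels applied to simulated nodal pressures (m).
clean = [32.0, 30.5, 28.7, 35.1]
noisy_by_level = {s: add_measurement_error(clean, s) for s in (0.1, 0.5, 1.0)}
```

Applying the same routine separately to the input-node and output-node pressures, at each of three error levels, yields the kind of 3 × 3 grid of conditions described above.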

Building Sentence Meaning Identification Dataset Based on Social Problem-Solving R&D Reports (사회문제 해결 연구보고서 기반 문장 의미 식별 데이터셋 구축)

  • Hyeonho Shin;Seonki Jeong;Hong-Woo Chun;Lee-Nam Kwon;Jae-Min Lee;Kanghee Park;Sung-Pil Choi
    • KIPS Transactions on Software and Data Engineering / v.12 no.4 / pp.159-172 / 2023
  • In general, social problem-solving research aims to create significant social value by using science and technology to offer meaningful answers to pending social issues. However, despite numerous and extensive nationwide research efforts to alleviate social problems, many important social challenges remain. To facilitate the entire social problem-solving research process and maximize its efficacy, it is vital to clearly identify the important and pressing problems to focus on. The problem discovery step could be drastically improved if current social issues could be identified automatically from existing R&D resources such as technical reports and articles. This paper introduces a comprehensive dataset for building a machine learning model that automatically detects social problems and solutions in national research reports. We first collected 700 research reports on social problems and issues. Through an intensive annotation process, we built a set of 24,022 sentences, each labeled with a category closely related to social problem solving, such as problem, purpose, solution, or effect. We then implemented four sentence classification models based on various neural language models and conducted a series of performance experiments on the dataset. In these experiments, the model fine-tuned from the KLUE-BERT pre-trained language model performed best, with an accuracy of 75.853% and an F1 score of 63.503%.
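Accuracy and F1 can diverge on imbalanced label sets. The abstract does not state how its F1 score is averaged; assuming macro-averaging over sentence categories, the following sketch on hypothetical labels shows how a rarely predicted class pulls macro-F1 below accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of sentences whose predicted category matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted sentence categories.
Y_TRUE = ["problem", "problem", "problem", "solution", "solution", "effect"]
Y_PRED = ["problem", "problem", "problem", "solution", "problem", "solution"]
```

Here accuracy is 4/6 while macro-F1 is 19/42 (about 0.452), because the `effect` class is never predicted and contributes an F1 of 0 to the unweighted average.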

Threat Situation Determination System Through AWS-Based Behavior and Object Recognition (AWS 기반 행위와 객체 인식을 통한 위협 상황 판단 시스템)

  • Ye-Young Kim;Su-Hyun Jeong;So-Hyun Park;Young-Ho Park
    • KIPS Transactions on Software and Data Engineering / v.12 no.4 / pp.189-198 / 2023
  • Because crimes frequently occur on the street, CCTV deployment is increasing. However, the shortcomings of passively operated CCTV have drawn attention to the need for intelligent CCTV. Such intelligent CCTV systems are heavy and require high-performance devices, which makes replacing conventional CCTV expensive. Solving this problem requires an intelligent CCTV system that recognizes low-quality images and runs even on low-performance devices. This paper therefore proposes Saying CCTV, a system that detects threats in real time by using the AWS cloud platform to lighten the system and by converting images into text. Using data extracted with YOLO v4 and OpenPose, the system identifies risk objects, threat behaviors, and threat situations, and calculates the risk level with machine learning. As a result, the system can operate anytime and anywhere with a network connection, even on devices that only need enough performance to record video and upload images. Furthermore, by analyzing the video and using the data stored as text, the system can automatically produce meaningful crime statistics that help prevent crime quickly.
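The abstract describes converting detections to text and scoring risk with machine learning, without specifying the model. As an illustration only, the sketch below serializes hypothetical YOLO-style object labels and pose-derived behavior labels into a compact JSON record and scores it with a logistic function whose weights are invented placeholders, not learned values.

```python
import json
import math

# Placeholder weights: the paper learns its risk model; these values are invented.
OBJECT_WEIGHTS = {"knife": 2.0, "bat": 1.5, "person": 0.1}
BEHAVIOR_WEIGHTS = {"punching": 2.5, "kicking": 2.0, "walking": 0.0}
BIAS = -3.0


def frame_to_text(frame_id, objects, behaviors):
    """Serialize detections into a compact text record instead of uploading pixels."""
    return json.dumps({"frame": frame_id,
                       "objects": sorted(objects),
                       "behaviors": sorted(behaviors)})


def risk_score(record):
    """Logistic risk score in (0, 1) computed from a text record."""
    data = json.loads(record)
    z = (BIAS
         + sum(OBJECT_WEIGHTS.get(o, 0.0) for o in data["objects"])
         + sum(BEHAVIOR_WEIGHTS.get(b, 0.0) for b in data["behaviors"]))
    return 1.0 / (1.0 + math.exp(-z))
```

Because each record is plain text, records can be stored and aggregated cheaply, which is the property the abstract relies on for automated crime statistics.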

Reliability of mortar filling layer void length in in-service ballastless track-bridge system of HSR

  • Binbin He;Sheng Wen;Yulin Feng;Lizhong Jiang;Wangbao Zhou
    • Steel and Composite Structures / v.47 no.1 / pp.91-102 / 2023
  • To study the evaluation standard and control limit of the mortar filling layer void length, this paper developed a train sub-model in MATLAB and, in ANSYS, a track-bridge sub-model that accounts for the mortar filling layer void. The two sub-models were assembled into a train-track-bridge coupled dynamic model through the wheel-rail contact relationship, and the coupled model was validated against a model from the literature. Considering the randomness of fastening stiffness, mortar elastic modulus, mortar filling layer void length, and pier settlement, test points were designed with the Box-Behnken method in Design-Expert software. The coupled dynamic model was solved at these points, and a support vector regression (SVR) nonlinear mapping model of the wheel-rail system was established, trained, used for prediction, and verified. Finally, based on the SVR nonlinear mapping model and Latin hypercube sampling, the reliability probabilities of the amplification coefficients of the train and structure response indices falling within different ranges were obtained, from which the limit of the mortar filling layer void length was derived. The results show that the SVR nonlinear mapping model has a high fitting accuracy of 0.993 and improves computational efficiency by 99.86%, so it can be used to calculate the dynamic response of the wheel-rail system. The mortar filling layer void length significantly affects the wheel-rail vertical force, wheel load reduction ratio, rail vertical displacement, and track slab vertical displacement. The dynamic response of the track structure affects the limit value of the void length more significantly than the dynamic response of the vehicle does, and rail vertical displacement has the most pronounced effect.
At train speeds of 250 to 350 km/h, the grade I, II, and III limit values of the mortar filling layer void length are 3.932 m, 4.337 m, and 4.766 m, respectively. The results can serve as a reference for the long-term service performance reliability of ballastless track-bridge systems in HSR.
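Latin hypercube sampling combined with a cheap surrogate is what makes estimating a reliability probability affordable. The following minimal sketch assumes two input variables with hypothetical ranges and replaces the trained SVR with an invented linear `surrogate_amplification` function; only the sampling scheme and the counting estimator are meant to carry over.

```python
import random


def latin_hypercube(n, bounds, seed=0):
    """Return n points; each variable's range is cut into n equal strata, one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = [(i + rng.random()) / n for i in range(n)]  # one draw per stratum of [0, 1)
        rng.shuffle(strata)  # decouple the variables from each other
        columns.append([low + u * (high - low) for u in strata])
    return list(zip(*columns))


def surrogate_amplification(void_length_m, pier_settlement_mm):
    # Hypothetical stand-in for a trained SVR mapping model; NOT the paper's model.
    return 1.0 + 0.08 * void_length_m + 0.01 * pier_settlement_mm


def reliability(limit, n=10_000, seed=0):
    # Fraction of sampled scenarios whose amplification coefficient stays within the limit.
    points = latin_hypercube(n, [(0.0, 6.0), (0.0, 10.0)], seed)
    return sum(surrogate_amplification(v, s) <= limit for v, s in points) / n
```

Sweeping `limit` yields an estimated reliability curve; a void-length limit would then be read off where the reliability for a response index reaches the chosen grade threshold, a step this sketch does not model.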

Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information (언어 정보가 반영된 문장 점수를 활용하는 삭제 기반 문장 압축)

  • Lee, Jun-Beom;Kim, So-Eon;Park, Seong-Bae
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.125-132 / 2022
  • Sentence compression is a natural language processing task that generates a concise sentence preserving the important meaning of the original sentence. To produce grammatically appropriate compressions, early studies used human-defined linguistic rules. Furthermore, because sequence-to-sequence models perform well on various natural language processing tasks such as machine translation, some studies have applied them to sentence compression. However, rule-based approaches require all rules to be defined by humans, and sequence-to-sequence approaches require a large amount of parallel data for training. To address these challenges, Deleter, a sentence compression model that leverages the pre-trained language model BERT, was proposed. Because Deleter compresses sentences with a perplexity-based score computed by BERT, it requires neither linguistic rules nor a parallel dataset. However, because Deleter considers only perplexity, its compression does not reflect the linguistic information of the words in the sentence. In addition, the data used to pre-train BERT differ greatly from compressed sentences, which can lead to incorrect compression. To address these problems, this paper proposes a method that quantifies the importance of linguistic information and reflects it in perplexity-based sentence scoring. Furthermore, by fine-tuning BERT on a corpus of news articles, which often contain proper nouns and often omit unnecessary modifiers, we enable BERT to measure perplexity appropriately for sentence compression. Evaluations on English and Korean datasets confirm that the proposed method improves the compression performance of sentence-scoring-based models.
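Deletion-based compression can be framed as repeatedly removing the word whose deletion yields the best-scoring sentence. The sketch below combines a pluggable perplexity function with a linguistic-importance term. The part-of-speech weights, the additive combination with weight `alpha`, and the constant perplexity stand-in are all hypothetical, since running BERT is out of scope here.

```python
# Hypothetical importance weights per part-of-speech tag (illustrative only).
POS_IMPORTANCE = {"PROPN": 1.0, "NOUN": 0.9, "VERB": 0.9, "NUM": 0.7,
                  "ADJ": 0.3, "ADV": 0.2, "DET": 0.4}


def sentence_score(tokens, perplexity, alpha=1.0):
    """Higher is better: low perplexity plus retained linguistic importance."""
    words = [w for w, _ in tokens]
    importance = sum(POS_IMPORTANCE.get(tag, 0.5) for _, tag in tokens)
    return -perplexity(words) + alpha * importance


def compress(tokens, target_len, perplexity, alpha=1.0):
    """Greedily delete one (word, tag) token at a time until target_len words remain."""
    tokens = list(tokens)
    while len(tokens) > target_len:
        candidates = [tokens[:i] + tokens[i + 1:] for i in range(len(tokens))]
        tokens = max(candidates, key=lambda c: sentence_score(c, perplexity, alpha))
    return [w for w, _ in tokens]


# Hypothetical POS-tagged input sentence.
SENTENCE = [("Samsung", "PROPN"), ("quietly", "ADV"), ("released", "VERB"),
            ("a", "DET"), ("new", "ADJ"), ("phone", "NOUN")]
```

With a constant stand-in perplexity such as `lambda words: 1.0`, only the importance term matters, so `compress(SENTENCE, 4, ...)` deletes the adverb and then the adjective. Plugging in a real language-model perplexity would let fluency compete with linguistic importance.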

Development of a Water Quality Indicator Prediction Model for the Korean Peninsula Seas using Artificial Intelligence (인공지능 기법을 활용한 한반도 해역의 수질평가지수 예측모델 개발)

  • Seong-Su Kim;Kyuhee Son;Doyoun Kim;Jang-Mu Heo;Seongeun Kim
    • Journal of the Korean Society of Marine Environment & Safety / v.29 no.1 / pp.24-35 / 2023
  • Rapid industrialization and urbanization have led to severe marine pollution. A Water Quality Index (WQI) has been developed to enable effective management of marine pollution. However, the WQI suffers from information loss due to its complex calculations, changes in standards, calculation errors by practitioners, and statistical errors. Consequently, research on using artificial intelligence techniques to predict marine and coastal WQI is being conducted both in Korea and internationally. In this study, six techniques (RF, XGBoost, KNN, Ext, SVM, and LR) were evaluated on marine environmental measurement data (2000-2020) to determine the most appropriate artificial intelligence technique for estimating the WQI of five ecoregions in the Korean seas. Our results show that the random forest (RF) method performed best among the methods studied. Residual analysis of the WQI scores predicted by the random forest model against the actual scores showed excellent temporal and spatial prediction performance in all ecoregions. In conclusion, the RF-based WQI prediction model developed in this study appears applicable to Korean seas with high accuracy.
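A per-ecoregion residual analysis can be summarized by the mean residual (systematic bias) and its spread. The abstract does not describe the exact residual procedure, so this is a minimal sketch with hypothetical records.

```python
from collections import defaultdict
from statistics import mean, pstdev


def residual_summary(records):
    """records: iterable of (ecoregion, actual_wqi, predicted_wqi).

    Returns {ecoregion: (mean residual, population std of residuals)}.
    """
    by_region = defaultdict(list)
    for region, actual, predicted in records:
        by_region[region].append(actual - predicted)
    return {r: (mean(res), pstdev(res)) for r, res in sorted(by_region.items())}


# Hypothetical actual vs. predicted WQI scores.
RECORDS = [("East", 30, 28), ("East", 25, 27), ("West", 40, 41)]
```

A mean residual near zero suggests the model is not systematically over- or under-predicting in that ecoregion, while the spread indicates how tightly predictions track the actual scores.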