• Title/Summary/Keyword: Forecasting error


Building of cyanobacteria forecasting model using transformer (Transformer를 이용한 유해남조 발생 예측 모델 구축)

  • Hankyu Lee;Jin Hwi Kim;Seohyun Byeon;Jae-Ki Shin;Yongeun Park
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.515-515 / 2023
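  • Paldang Lake is formed at the confluence of the North Han River and the South Han River, and it is an important drinking water source that supplies Seoul, the capital, and the eastern part of Gyeonggi Province in the metropolitan area. Because the occurrence of harmful cyanobacteria in Paldang Lake directly affects the use of raw source water, rapid and accurate management and prediction are needed. To support safe use of the water source, this study aimed to build an advance prediction model for harmful cyanobacteria using a deep learning technique. The model input variables were weekly data for the 10 years from 2012 to 2021: Paldang Lake water quality (water temperature, DO, BOD, COD, Chl-a, TN, TP, pH, electrical conductivity, TDN, NH4N, NO3N, TDP, PO4P, suspended solids), hydrology (inflow, total discharge), and meteorology (mean temperature, minimum temperature, maximum temperature, daily precipitation, mean wind speed, mean relative humidity, total sunshine duration), together with cyanobacterial cell counts at the inflow points of the North Han and South Han Rivers. The model output variable was the cyanobacterial cell count in front of the dam one week later, chosen to account for the time cyanobacterial growth takes to appear in response to water quality, hydrological, and meteorological factors. The deep learning technique used was the Temporal Fusion Transformer (TFT), which has recently attracted attention. The data were divided into training and test sets at a ratio of 8:2, and the training set was further divided into training and validation data at a ratio of 6:4. The lookback was set to 5, reflecting the characteristics of a dataset composed of weekly data. Model performance was evaluated by computing R-square and Root Mean Squared Error (RMSE) from the observed and predicted values. Training ran for a total of 154 iterations, and the best performance was reached at the 54th iteration, where the validation loss relative to the training loss was most favorable (training loss: 0.443, validation loss: 0.380). R-square was 0.681 in training, 0.654 in validation, and 0.606 in testing. RMSE was 0.614 (µg/L) in training, 0.617 (µg/L) in validation, and 0.773 (µg/L) in testing. Considering that the dataset consists of weekly data, the performance of the model can be regarded as good even though only a small amount of data was used. If the dataset is augmented and the model updated in future work, model performance is expected to improve further.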
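
The windowing and splitting described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names `make_windows`, `split_by_ratio`, and `chronological_split` are hypothetical, the series is a univariate stand-in for the multi-variable weekly inputs, and a chronological (unshuffled) split is an assumption, since the abstract gives only the 8:2 and 6:4 ratios.

```python
def make_windows(series, lookback=5, horizon=1):
    """Pair each run of `lookback` weekly values with the value `horizon` weeks later."""
    samples = []
    for end in range(lookback, len(series) - horizon + 1):
        window = series[end - lookback:end]   # the 5 most recent weeks
        target = series[end + horizon - 1]    # value one week after the window
        samples.append((window, target))
    return samples


def split_by_ratio(items, first_parts, second_parts):
    """Cut a list in two at the given integer ratio, preserving order."""
    cut = len(items) * first_parts // (first_parts + second_parts)
    return items[:cut], items[cut:]


def chronological_split(samples):
    """8:2 train/test split, then 6:4 train/validation inside the training part."""
    train_all, test = split_by_ratio(samples, 8, 2)
    train, val = split_by_ratio(train_all, 6, 4)
    return train, val, test


# Hypothetical stand-in: about 520 weekly values for 10 years of data.
weekly_values = list(range(520))
samples = make_windows(weekly_values, lookback=5, horizon=1)
train, val, test = chronological_split(samples)
print(len(samples), len(train), len(val), len(test))
```

Keeping the split in time order avoids evaluating on weeks that precede the training data, which is one common choice for forecasting problems.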


Global Ocean Data Assimilation and Prediction System 2 in KMA: Operational System and Improvements (기상청 전지구 해양자료동화시스템 2(GODAPS2): 운영체계 및 개선사항)

  • Hyeong-Sik Park;Johan Lee;Sang-Min Lee;Seung-On Hwang;Kyung-On Boo
    • Atmosphere / v.33 no.4 / pp.423-440 / 2023
  • This technical note introduces the updated version of the Global Ocean Data Assimilation and Prediction System (GODAPS) at NIMS/KMA (National Institute of Meteorological Sciences/Korea Meteorological Administration), which has been in operation since December 2021. It describes the main progress and updates relative to the previous version of GODAPS, a software tool for the operating system, and its improvements. GODAPS2 is based on the Forecasting Ocean Assimilation Model (FOAM) vn14.1 instead of the previous version, FOAM vn13. The southern limit of the model domain has been extended from 77°S to 85°S, allowing the circulation under the Antarctic ice shelves to be modelled. In the ocean model, the adoption of a non-linear free surface and variable volume layers, the updated vertical mixing parameterization, and the adjusted isopycnal diffusion coefficient decrease the model biases. In the sea-ice model, four vertical ice layers with an additional snow layer on top replace the previous single ice and snow layers. The changes to data assimilation include an updated treatment of the background error covariance, a newly added bias scheme combined with observation bias, a new bias correction for sea level anomaly, an extension of the assimilation window from 1 day to 2 days, and separate assimilations for ocean and sea ice. For comparison, we present the differences between GODAPS and GODAPS2. The verification results show that GODAPS2 yields an overall improved simulation compared with GODAPS.

Development of a Dynamic Downscaling Method for Use in Short-Range Atmospheric Dispersion Modeling Near Nuclear Power Plants

  • Sang-Hyun Lee;Su-Bin Oh;Chun-Ji Kim;Chun-Sil Jin;Hyun-Ha Lee
    • Journal of Radiation Protection and Research / v.48 no.1 / pp.28-43 / 2023
  • Background: High-fidelity meteorological data is a prerequisite for the realistic simulation of atmospheric dispersion of radioactive materials near nuclear power plants (NPPs). However, many meteorological models frequently overestimate near-surface wind speeds, failing to represent local meteorological conditions near NPPs. This study presents a new high-resolution (approximately 1 km) meteorological downscaling method for modeling short-range (< 100 km) atmospheric dispersion of accidental NPP plumes. Materials and Methods: Six considerations drawn from literature reviews are suggested for a new dynamic downscaling method. The method is based on the Weather Research and Forecasting (WRF) model version 3.6.1 and applies high-resolution land-use and topography data. In addition, a new subgrid-scale topographic drag parameterization has been implemented for a realistic representation of the atmospheric surface-layer momentum transfer. Finally, a year-long simulation of 2016 for the Kori and Wolsong NPPs, located in southeastern coastal areas, was carried out and evaluated against operational surface meteorological measurements and the NPPs' on-site weather stations. Results and Discussion: The new dynamic downscaling method can represent multiscale atmospheric motions from the synoptic to the boundary-layer scales and produce three-dimensional local meteorological fields near the NPPs with a 1.2 km grid resolution. Comparing the year-long simulation against the measurements showed a salient improvement in simulating near-surface wind fields, reducing the root mean square error by approximately 1 m/s. Furthermore, the improved wind field simulation led to better agreement in the Eulerian estimate of the local atmospheric dispersion. The new subgrid-scale topographic drag parameterization was essential for the improved performance, suggesting the importance of subgrid-scale momentum interactions in the atmospheric surface layer. Conclusion: A new dynamic downscaling method has been developed to produce high-resolution local meteorological fields around the Kori and Wolsong NPPs, which can be used in short-range atmospheric dispersion modeling near the NPPs.

Quantifying the 2022 Extreme Drought Using Global Grid-Based Satellite Rainfall Products (전지구 강수관측위성 기반 격자형 강우자료를 활용한 2022년 국내 가뭄 분석)

  • Mun, Young-Sik;Nam, Won-Ho;Jeon, Min-Gi;Lee, Kwang-Ya;Do, Jong-Won;Isaya Kisekka
    • Journal of The Korean Society of Agricultural Engineers / v.66 no.4 / pp.41-50 / 2024
  • Precipitation is an important component of the hydrological cycle and a key input parameter for many applications in hydrology, climatology, meteorology, and weather forecasting research. Grid-based satellite rainfall products, with their wide spatial coverage and easy accessibility, are well recognized as a supplement to ground-based observations for various hydrological applications. The error properties of satellite rainfall products vary as a function of rainfall intensity, climate region, altitude, and land surface conditions. Therefore, this study evaluates a widely used global grid-based satellite rainfall product, Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), using data collected at different spatial and temporal scales. In addition, grid-based CHIRPS satellite precipitation data were used to evaluate the 2022 extreme drought. CHIRPS provides high-resolution precipitation data at 5 km and offers reliable global data through correction with ground-based observations. A frequency analysis was performed to determine the precipitation deficit in 2022. A comparison of the droughts in 2015, 2017, and 2022 found that May 2022 had a drought frequency of more than 500 years. The 1-month SPI in May 2022 indicated a severe drought with an average value of -1.8, while the 3-month SPI showed a moderate drought with an average value of 0.6. The extreme drought experienced in South Korea in 2022 was evident in the 1-month SPI. CHIRPS precipitation data and observations from weather stations showed similar trends. Based on these results, it is concluded that CHIRPS can be used as fundamental data for drought evaluation and monitoring in areas without precipitation measurements.

The development of four efficient optimal neural network methods in forecasting shallow foundation's bearing capacity

  • Hossein Moayedi;Binh Nguyen Le
    • Computers and Concrete / v.34 no.2 / pp.151-168 / 2024
  • This research aimed to appraise the effectiveness of four optimization approaches, namely the cuckoo optimization algorithm (COA), multi-verse optimization (MVO), particle swarm optimization (PSO), and teaching-learning-based optimization (TLBO), each combined with an artificial neural network (ANN), in predicting the bearing capacity of shallow foundations on cohesionless soils. The study used a database of 97 laboratory experiments, with 68 experiments for the training data set and 29 for the testing data set. The ANN algorithms were optimized by adjusting variables such as population size and the number of neurons in each hidden layer through trial and error. The input parameters were width, depth, geometry, unit weight, and angle of shearing resistance. Sensitivity analysis determined that the optimized architecture for the ANN was 5×5×1. All four models (COA-MLP, MVO-MLP, PSO-MLP, and TLBO-MLP) demonstrated excellent prediction performance, and the MVO-MLP model predicted the measured values more accurately than the other models. For the training data sets, the RMSE and R2 values were (0.07184 and 0.9819), (0.04536 and 0.9928), (0.09194 and 0.9702), and (0.04714 and 0.9923) for the COA-MLP, MVO-MLP, PSO-MLP, and TLBO-MLP methods, respectively. Similarly, for the testing data sets, the RMSE and R2 values were (0.08126 and 0.07218), (0.07218 and 0.9814), (0.10827 and 0.95764), and (0.09886 and 0.96481) for the COA-MLP, MVO-MLP, PSO-MLP, and TLBO-MLP methods, respectively.

A Study on the Optimal Forecasting Model for Cucumber Growth Based on Machine Learning (머신러닝기반 오이 생육 최적 예측 모델에 관한 연구)

  • Ki-Tae Park;Hyun Sim
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.5 / pp.911-918 / 2024
  • This study developed and evaluated the performance of a machine learning-based model for predicting cucumber fruit set using cucumber growth data. In this study, plant height, node number, internode length, stem thickness, leaf length, leaf width, leaf count, and female flower count were used as independent variables, and the fruit set was set as the dependent variable to develop a prediction model. Various machine learning algorithms, including Linear Regression, Random Forest, XGBoost, Support Vector Regression (SVR), and K-Nearest Neighbors (KNN), were applied, and model performance was evaluated based on Mean Squared Error (MSE) and the coefficient of determination (R2). As a result, the Random Forest algorithm demonstrated the best performance, with an MSE of 3.91 and an R2 of 0.828, effectively capturing the non-linear relationships in the cucumber growth data. In particular, the Random Forest model showed robustness against outliers and proved to be highly effective in predicting fruit set.
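
The two evaluation metrics named in this abstract, Mean Squared Error and the coefficient of determination, can be computed as in the sketch below. The function names and the sample values are hypothetical and are not data from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up fruit-set counts, for illustration only.
observed = [10, 12, 14, 16]
predicted = [11, 12, 13, 17]
print(mse(observed, predicted), r2(observed, predicted))
```

MSE is in squared units of the target, so it is only comparable between models fitted to the same variable, whereas R2 is unitless.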

An Intelligent Decision Support System for Selecting Promising Technologies for R&D based on Time-series Patent Analysis (R&D 기술 선정을 위한 시계열 특허 분석 기반 지능형 의사결정지원시스템)

  • Lee, Choongseok;Lee, Suk Joo;Choi, Byounggu
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.79-96 / 2012
  • As the pace of competition accelerates dramatically and the complexity of change grows, a variety of research has been conducted to improve firms' short-term performance and to enhance their long-term survival. In particular, researchers and practitioners have paid attention to identifying promising technologies that give a firm competitive advantage. Discovery of promising technology depends on how a firm evaluates the value of technologies, so many evaluation methods have been proposed. Approaches based on experts' opinions have been widely accepted for predicting the value of technologies. Whereas this approach provides in-depth analysis and ensures the validity of analysis results, it is usually cost- and time-ineffective and is limited to qualitative evaluation. A considerable number of studies attempt to forecast the value of technology by using patent information to overcome the limitations of the expert-opinion approach. Patent-based technology evaluation has served as a valuable assessment approach for technological forecasting because patents contain a full and practical description of technology in a uniform structure. Furthermore, they provide information that is not divulged in any other source. Although the patent-information approach has contributed to our understanding of the prediction of promising technologies, it has some limitations because predictions have been made from past patent information, and the interpretations of patent analyses are not consistent. To fill this gap, this study proposes a technology forecasting methodology that integrates the patent-information approach with an artificial intelligence method. The methodology consists of three modules: evaluation of the promise of technologies, implementation of a technology value prediction model, and recommendation of promising technologies. In the first module, the promise of technologies is evaluated from three different and complementary dimensions: impact, fusion, and diffusion.
The impact of technologies refers to their influence on the development and improvement of future technologies, and is also clearly associated with their monetary value. The fusion of technologies denotes the extent to which a technology fuses different technologies, and represents the breadth of search underlying the technology. Fusion can be calculated per technology or per patent, so this study measures two types of fusion index: fusion index per technology and fusion index per patent. Finally, the diffusion of technologies denotes their degree of applicability across scientific and technological fields. In the same vein, a diffusion index per technology and a diffusion index per patent are considered. In the second module, the technology value prediction model is implemented using an artificial intelligence method. This study uses the values of the five indexes (i.e., impact index, fusion index per technology, fusion index per patent, diffusion index per technology, and diffusion index per patent) at different times (e.g., t-n, t-n-1, t-n-2, ...) as input variables. The output variables are the values of the five indexes at time t, which are used for learning. The learning method adopted in this study is the backpropagation algorithm. In the third module, this study recommends final promising technologies based on the analytic hierarchy process (AHP). AHP provides the relative importance of each index, leading to a final promising index for each technology. The applicability of the proposed methodology is tested using U.S. patents in international patent class G06F (i.e., electronic digital data processing) from 2000 to 2008. The results show that the mean absolute error of the predictions produced by the proposed methodology is lower than that of multiple regression analysis in the cases of the fusion indexes. However, in the other cases, the mean absolute error of the proposed methodology is slightly higher than that of multiple regression analysis.
These unexpected results may be explained, in part, by the small number of patents. Since this study only uses patent data in class G06F, the number of sample patents is relatively small, which leads to incomplete learning for a complex artificial intelligence structure. In addition, the fusion index per technology and the impact index are found to be important criteria for predicting promising technologies. This study attempts to extend existing knowledge by proposing a new methodology for predicting technology value that integrates patent information analysis and an artificial intelligence network. It helps managers who want to plan technology development and policy makers who want to implement technology policy by providing a quantitative prediction methodology. In addition, this study could help other researchers by providing a deeper understanding of the complex technological forecasting field.
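
The third module, in which AHP weights combine several indexes into one promising index, can be sketched as below. This is an assumption-laden illustration: the abstract does not say how the AHP weights were derived, so the sketch uses the row geometric mean approximation, and the pairwise comparison matrix, the index values, and the function names `ahp_weights` and `promising_index` are all hypothetical. Three criteria are used instead of the study's five indexes to keep the example short.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by normalizing row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def promising_index(index_values, weights):
    """Weighted sum of index values using the AHP weights."""
    return sum(v * w for v, w in zip(index_values, weights))


# Hypothetical reciprocal comparison matrix for impact, fusion, diffusion:
# entry [i][j] states how much more important criterion i is than criterion j.
pairwise = [
    [1.0, 2.0, 6.0],
    [1.0 / 2.0, 1.0, 3.0],
    [1.0 / 6.0, 1.0 / 3.0, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical normalized index values for one technology.
print(weights, promising_index([0.5, 0.8, 0.2], weights))
```

For a perfectly consistent matrix such as this one, the geometric mean approximation reproduces the underlying weights; with inconsistent judgments it only approximates the principal eigenvector used in the classical formulation.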

The Sensitivity Analyses of Initial Condition and Data Assimilation for a Fog Event using the Mesoscale Meteorological Model (중규모 기상 모델을 이용한 안개 사례의 초기장 및 자료동화 민감도 분석)

  • Kang, Misun;Lim, Yun-Kyu;Cho, Changbum;Kim, Kyu Rang;Park, Jun Sang;Kim, Baek-Jo
    • Journal of the Korean earth science society / v.36 no.6 / pp.567-579 / 2015
  • Accurately simulating micro-scale weather phenomena such as fog with mesoscale meteorological models is a very complex task. In particular, the uncertainty arising from the initial input data has a decisive effect on the accuracy of numerical models, and data assimilation is required to reduce it. In this study, the limitations of a mesoscale meteorological model were examined using the WRF (Weather Research and Forecasting) model for a summer fog event around the Nakdong River in Korea. Sensitivity analyses of simulation accuracy were conducted using two different sets of initial and boundary conditions: KLAPS (Korea Local Analysis and Prediction System) and LDAPS (Local Data Assimilation and Prediction System) data. In addition, the improvement in model performance from FDDA (Four-Dimensional Data Assimilation) using observational data from AWS (Automatic Weather System) was investigated. The sensitivity analysis showed that the simulated air temperature, dew point temperature, and relative humidity were more accurate with LDAPS data than with KLAPS data, whereas the wind speed was less accurate with LDAPS than with KLAPS. A significant difference was found for relative humidity, for which the RMSE (Root Mean Square Error) was 15.7% for LDAPS and 35.6% for KLAPS. After incorporating FDDA, the RMSE for air temperature, wind speed, and relative humidity improved by approximately 0.3 °C, 0.2 m s⁻¹, and 2.2%, respectively.

The Prediction of Purchase Amount of Customers Using Support Vector Regression with Separated Learning Method (Support Vector Regression에서 분리학습을 이용한 고객의 구매액 예측모형)

  • Hong, Tae-Ho;Kim, Eun-Mi
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.213-225 / 2010
  • With the rapid growth of information technology, data mining has enabled the managers in charge of such tasks in their companies to present personalized and differentiated marketing programs to their customers. Most studies on customer response have focused on predicting whether customers would respond to a marketing promotion, because marketing managers have been eager to identify who would respond. Accordingly, many studies utilizing data mining have tried to solve binary decision problems such as bankruptcy prediction, network intrusion detection, and fraud detection in credit card usage. The prediction of customer response has been studied with similar methods because it is a kind of dichotomous decision problem. In addition, a number of competitive data mining techniques such as neural networks, SVM (support vector machine), decision trees, logit, and genetic algorithms have been applied to the prediction of customer response to marketing promotions. Marketing managers have also tried to classify their customers with quantitative measures such as recency, frequency, and monetary value acquired from their transaction databases. These measures indicate how recently customers purchased, how frequently they purchased in a period, and how much they spent per purchase. Using segmented customers, we propose an approach that makes it possible to differentiate customers within the same rating. Our approach employs support vector regression to forecast the purchase amount of customers in each customer rating. Our study used a sample of 41,924 customers extracted from the DMEF04 Data Set who had purchased at least once in the last two years. We classified customers from the first rating to the fifth rating based on the purchase amount after a marketing promotion.
Here, the first rating contains customers with a large purchase amount and the fifth rating contains non-respondents to the promotion. Our proposed model forecasts the purchase amount of customers within the same rating, so marketing managers can design a differentiated and personalized marketing program for each customer even when customers belong to the same rating. In addition, we propose a more efficient learning method that separates the learning samples. We employed two learning methods to compare the performance of the proposed learning method with a general learning method for SVRs. LMW (Learning Method using Whole data for purchasing customers) is a general learning method for forecasting the purchase amount of customers. LMS (Learning Method using Separated data for classification purchasing customers), the proposed method, builds four different SVR models, one per class of customers. To evaluate the models, we calculated MAE (Mean Absolute Error) and MAPE (Mean Absolute Percent Error) for each model's predictions of customers' purchase amounts. With LMW, the overall performance was 0.670 MAPE and the best performance was 0.327 MAPE. In general, the proposed LMS model outperformed the LMW model. With LMS, the best performance was 0.275 MAPE, and LMS performed better than LMW in each class of customers. Overall, the comparison shows that the proposed LMS method forecasts the purchase amount of customers in each class more accurately than LMW. In addition, our approach will be useful for marketing managers when they need to select customers for their promotions. Even if customers belong to the same class, marketing managers can offer them differentiated and personalized marketing promotions.
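
The per-class evaluation described in this abstract can be sketched as follows. The functions `mae`, `mape`, and `mape_by_class` are hypothetical helpers, the records are made-up values rather than DMEF04 data, and MAPE is expressed as a fraction on the assumption that figures such as 0.670 in the abstract are fractions rather than percentages.

```python
def mae(actual, predicted):
    """Mean absolute error, in the units of the purchase amount."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percent error as a fraction; actual values must be non-zero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def mape_by_class(records):
    """Group (customer_class, actual, predicted) records and score each class."""
    grouped = {}
    for cls, actual, predicted in records:
        actuals, preds = grouped.setdefault(cls, ([], []))
        actuals.append(actual)
        preds.append(predicted)
    return {cls: mape(a, p) for cls, (a, p) in grouped.items()}


# Made-up purchase amounts for two customer classes.
records = [
    (1, 200.0, 150.0),
    (1, 100.0, 110.0),
    (2, 50.0, 40.0),
    (2, 80.0, 100.0),
]
print(mape_by_class(records))
```

Scoring each class separately is what allows a whole-data model and separately trained per-class models to be compared class by class, as the abstract reports.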

A Study on the Data Driven Neural Network Model for the Prediction of Time Series Data: Application of Water Surface Elevation Forecasting in Hangang River Bridge (시계열 자료의 예측을 위한 자료 기반 신경망 모델에 관한 연구: 한강대교 수위예측 적용)

  • Yoo, Hyungju;Lee, Seung Oh;Choi, Seohye;Park, Moonhyung
    • Journal of Korean Society of Disaster and Security / v.12 no.2 / pp.73-82 / 2019
  • Recently, as sudden floods have become more frequent due to climate change, flood damage to riverside social infrastructure has grown, and with it the threat of overflow. Administrators therefore need rapid prediction of potential flooding at riverside social infrastructure. However, most current flood forecasting models, including hydraulic models, produce highly accurate numerical results but require long simulation times. To alleviate this limitation, data-driven models using artificial neural networks have been widely used. However, existing models have the limitation that they cannot consider time-series parameters. In this study, the water surface elevation at the Hangang River bridge was predicted using a NARX model that considers time-series parameters, and the results of ANN and RNN models were compared with those of the NARX model to determine its suitability. Of the 10 years of hydrological data from 2009 to 2018, 70% were used for learning and 15% each for testing and evaluation. In predicting the water surface elevation 3 hours ahead at the Hangang River bridge in 2018, the ANN, RNN, and NARX models yielded RMSEs of 0.20 m, 0.11 m, and 0.09 m, MAEs of 0.12 m, 0.06 m, and 0.05 m, and peak errors of 1.56 m, 0.55 m, and 0.10 m, respectively. Based on the error analysis of the prediction results considering time-series parameters, the NARX model is the most suitable for predicting water surface elevation. This is because the NARX model can learn the trend of the time-series data and can produce accurate predictions even at high water surface elevations by using the hyperbolic tangent and Rectified Linear Unit functions as activation functions. However, the NARX model is limited by vanishing gradients as the sequence length becomes longer. In future work, the accuracy of water surface elevation prediction will be examined using an LSTM model.
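
The vanishing-gradient limitation mentioned at the end of this abstract can be illustrated with a toy example. The sketch below is not the NARX model from the study: it assumes a single scalar recurrent unit h_t = tanh(w * h_{t-1} + x_t), and the function name `gradient_through_time` and the weight 0.9 are hypothetical.

```python
import math


def gradient_through_time(weight, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the recurrence h_t = tanh(weight * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(weight * h + x)
        # Chain rule: d h_t / d h_{t-1} = weight * (1 - tanh(z)^2), with tanh(z) = h.
        grad *= weight * (1.0 - h * h)
    return abs(grad)


# The sensitivity to the initial state shrinks as the sequence gets longer.
for steps in (5, 20, 100):
    print(steps, gradient_through_time(0.9, [0.0] * steps))
```

Each step multiplies the gradient by a factor whose magnitude is at most |w|, so with |w| < 1 the influence of early time steps decays geometrically. Gated architectures such as the LSTM named in the abstract were designed to mitigate this decay.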