• Title/Summary/Keyword: RMSE(Root Mean Squared Error)

Application of Intensity-Duration-Frequency Curve to Korea Derived by Cumulative Distribution Function (누가분포함수를 활용한 강우강도식의 국내 적용성 평가)

  • Kim, Kewtae;Kim, Taesoon;Kim, Sooyoung;Heo, Jun-Haeng
    • KSCE Journal of Civil and Environmental Engineering Research / v.28 no.4B / pp.363-374 / 2008
  • The Intensity-Duration-Frequency (IDF) curve, which is essential for calculating rainfall quantiles when designing hydraulic structures in Korea, is generally formulated by regression analysis. In this study, an IDF curve derived from the cumulative distribution function ("IDF by CDF") of the appropriate probability distribution function (PDF) for each site is proposed, and the corresponding parameters of the IDF curve are computed using a genetic algorithm (GA). For this purpose, IDF by CDF and the conventional IDF derived by regression analysis ("IDF by REG") were computed for 22 Korea Meteorological Administration (KMA) rainfall recording sites. Comparisons of the RMSE (root mean squared error) and RRMSE (relative RMSE) of rainfall intensities computed from IDF by CDF and IDF by REG show that IDF by CDF is more accurate than IDF by REG. To account for the effect of recent intense rainfall in Korea, the rainfall intensities computed by the two IDF curves were compared with those from at-site frequency analysis using rainfall data recorded through 2006, and IDF by CDF again performed better than IDF by REG. The results suggest that the proposed IDF by CDF curve is a more efficient IDF curve than the one computed by regression analysis and can be applied to Korean rainfall data.
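
The two error measures named in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the abstract does not give a formula for RRMSE, so the definition here (each error scaled by the reference value before squaring) is an assumption, and the rainfall intensities are hypothetical numbers chosen only for the example.

```python
import math

def rmse(reference, estimated):
    """Root mean squared error between two equal-length sequences."""
    if len(reference) != len(estimated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimated)]
    return math.sqrt(sum(squared) / len(squared))

def rrmse(reference, estimated):
    """Relative RMSE (assumed definition): errors are divided by the reference value."""
    if len(reference) != len(estimated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    relative = [((r - e) / r) ** 2 for r, e in zip(reference, estimated)]
    return math.sqrt(sum(relative) / len(relative))

# Hypothetical rainfall intensities in mm/hr, for illustration only.
at_site = [100.0, 80.0, 50.0]    # stand-in for at-site frequency analysis
idf_curve = [90.0, 84.0, 50.0]   # stand-in for values read from an IDF curve

print(rmse(at_site, idf_curve))
print(rrmse(at_site, idf_curve))
```

Because RRMSE is dimensionless, it allows errors at short and long durations, where intensities differ by an order of magnitude, to be compared on the same scale.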

Recommender system using BERT sentiment analysis (BERT 기반 감성분석을 이용한 추천시스템)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.1-15 / 2021
  • When we find it difficult to make a decision, we ask friends or people around us for advice. When we decide to buy products online, we read anonymous reviews before purchasing. With the advent of the data-driven era, advances in IT are generating large volumes of data from both individuals and objects. Companies and individuals have accumulated, processed, and analyzed so much data that they can now make and execute decisions directly from data, where they once depended on experts. Today, recommender systems play a vital role in determining users' purchase preferences, and web services (Facebook, Amazon, Netflix, YouTube) use them to induce clicks. For example, YouTube's recommender system, which is used by 1 billion people worldwide every month, draws on the videos that users have liked and the videos they have watched. Recommender system research is closely linked to business practice, so many researchers are interested in building better solutions. Recommender systems generate recommendations from information obtained from their users, because they need information on the items a user is likely to prefer. Through recommender systems, we have come to trust patterns and rules derived from data rather than empirical intuition. The growth in data volume has driven machine learning toward deep learning. However, recommender systems are not a complete solution. They require data that are not sparse and are sufficient in quantity, and they also require detailed information about the individual. Recommender systems work correctly only when these conditions are met. When the interaction log is insufficient, recommendation becomes a complex problem for both consumers and sellers. 
This is because sellers need to make recommendations to consumers at a personal level, while consumers need to receive appropriate recommendations based on reliable data. In this paper, to improve the accuracy of "appropriate recommendations" for consumers, a recommender system combined with context-based deep learning is proposed. This research combines user-based data to create a hybrid recommender system. The hybrid approach developed is not a purely collaborative recommender system but a collaborative extension that integrates user data with deep learning. Customer review data were used as the data set. Consumers buy products in online shopping malls and then write reviews evaluating them. Ratings and reviews come from buyers who have already purchased the product, which gives other users confidence before purchasing. However, recommendation systems mainly use scores or ratings rather than reviews to suggest items purchased by many users. In fact, consumer reviews contain product opinions and user sentiment that inform the evaluation. By incorporating these elements, this paper aims to improve the recommendation system. The proposed algorithm is intended for situations in which individuals have difficulty selecting an item. Consumer reviews and record patterns make it possible to rely on the recommendations appropriately. The algorithm implements a recommendation system through collaborative filtering. Predictive accuracy in this study is measured by Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Netflix strategically uses its recommender system in its programs through competitions that reduce RMSE every year, making good use of predictive accuracy. Research on hybrid recommender systems that combine NLP approaches for personalization with deep learning has been increasing. Among NLP studies, sentiment analysis began to take shape in the mid-2000s as user review data increased. 
Sentiment analysis is a text classification task based on machine learning. Machine learning-based sentiment analysis has the disadvantage that it struggles to capture the information expressed in a review, because it is difficult to account for the characteristics of the text. In this study, we propose a deep learning recommender system that uses BERT's sentiment analysis to minimize these disadvantages of machine learning. The proposed system was compared with recommender systems based on Naive-CF (collaborative filtering), SVD (singular value decomposition)-CF, MF (matrix factorization)-CF, BPR-MF (Bayesian personalized ranking matrix factorization)-CF, LSTM, CNN-LSTM, and GRU (gated recurrent units). In the experiments, the BERT-based recommender system performed best.
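
The abstract evaluates rating predictions with both RMSE and MAE. The sketch below shows how the two differ on the same predictions; the ratings are hypothetical and the function names are chosen for illustration, not taken from the paper.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: squaring gives large errors extra weight."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical 1-to-5 star ratings, for illustration only.
actual_ratings = [5.0, 3.0, 4.0, 1.0]
predicted_ratings = [4.0, 3.0, 2.0, 1.0]

print(mae(actual_ratings, predicted_ratings))
print(rmse(actual_ratings, predicted_ratings))
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few predictions are far off, which is why the two are often reported together.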

Development of groundwater level monitoring and forecasting technique for drought analysis (II) - Groundwater drought forecasting Using SPI, SGI and ANN (가뭄 분석을 위한 지하수위 모니터링 및 예측기법 개발(II) - 표준강수지수, 표준지하수지수 및 인공신경망을 이용한 지하수 가뭄 예측)

  • Lee, Jeongju;Kang, Shinuk;Kim, Taeho;Chun, Gunil
    • Journal of Korea Water Resources Association / v.51 no.11 / pp.1021-1029 / 2018
  • The primary objective of this study is to develop a drought forecasting technique based on groundwater, which can be exploited for water supply under drought stress. For this purpose, we explored the lagged relationships between the regionalized SGI (standardized groundwater level index) and SPI (standardized precipitation index) from the viewpoint of drought propagation. A regional prediction model was constructed using a NARX (nonlinear autoregressive exogenous) artificial neural network, which can effectively capture nonlinear relationships with lagged independent variables. During the training phase, model performance was found to be satisfactory, with a correlation coefficient above 0.7. Model performance was also described by the root mean squared error (RMSE). It can be concluded that the proposed approach is able to provide reliable SGI forecasts along with rainfall forecasts provided by the Korea Meteorological Administration.
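
One simple way to explore a lagged relationship of the kind described here is to correlate the precipitation index at time t - k with the groundwater index at time t for several lags k. The sketch below does only that; it is not the NARX model from the paper, and the index values are hypothetical, constructed so that the response follows the driver by two steps.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def lagged_correlation(driver, response, lag):
    """Correlate driver[t - lag] with response[t] for a non-negative lag."""
    if len(driver) != len(response):
        raise ValueError("series must have equal length")
    if lag < 0 or lag > len(driver) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])

# Hypothetical monthly index values; sgi repeats spi two steps later.
spi = [0.5, -1.0, 1.5, -0.5, 2.0, -1.5, 0.0, 1.0]
sgi = [0.0, 0.0] + spi[:-2]

best_lag = max(range(4), key=lambda k: lagged_correlation(spi, sgi, k))
print(best_lag)
```

In practice the lag with the strongest correlation would suggest which delayed inputs to feed to a forecasting model; here it simply recovers the two-step shift built into the example data.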

Outside Temperature Prediction Based on Artificial Neural Network for Estimating the Heating Load in Greenhouse (인공신경망 기반 온실 외부 온도 예측을 통한 난방부하 추정)

  • Kim, Sang Yeob;Park, Kyoung Sub;Ryu, Keun Ho
    • KIPS Transactions on Software and Data Engineering / v.7 no.4 / pp.129-134 / 2018
  • The artificial neural network (ANN) has recently become a promising technique for prediction, numerical control, robot control, and pattern recognition. We predicted the outside temperature of a greenhouse using an ANN and used the model for greenhouse control. The performance of the ANN model was evaluated and compared with a multiple regression model (MRM) and a support vector machine (SVM) model. Ten-fold cross-validation was used as the evaluation method. To improve prediction performance, the data were reduced by correlation analysis and new factors were extracted from the measured data to improve the reliability of the training data. The ANN was constructed using the backpropagation algorithm, the multiple regression model using the M5 method, and the SVM model using the epsilon-SVM method. The RMSE (root mean squared error) values of the ANN, MRM, and SVM were 0.9256, 1.8503, and 7.5521, respectively. In addition, applying the prediction model to the greenhouse heating load calculation can increase income by reducing energy costs in the greenhouse. The heating load of the experimental greenhouse was 3326.4 kcal/h, and the fuel consumption was estimated to be 453.8 L for a total heating time of $10000^{\circ}C/h$. Therefore, ANN-based data mining technology can be applied to various agricultural fields such as precise greenhouse control, cultivation techniques, and harvest prediction, thereby contributing to the development of smart agriculture.
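
The evaluation protocol mentioned above, 10-fold cross-validation scored by RMSE, can be sketched as follows. To keep the example dependency-free, a one-feature least-squares line stands in for the ANN, MRM, and SVM models of the paper; the fold assignment (index modulo k) and the data are likewise assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope

def kfold_rmse(xs, ys, k=10):
    """RMSE pooled over k held-out folds; sample i belongs to fold i % k."""
    squared_errors = []
    for fold in range(k):
        pairs = list(enumerate(zip(xs, ys)))
        train = [(x, y) for i, (x, y) in pairs if i % k != fold]
        held_out = [(x, y) for i, (x, y) in pairs if i % k == fold]
        intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])
        squared_errors += [(y - (intercept + slope * x)) ** 2 for x, y in held_out]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data following an exact line, so the held-out error is near zero.
xs = [float(i) for i in range(30)]
ys = [2.0 + 0.5 * x for x in xs]

print(kfold_rmse(xs, ys))
```

Every sample is predicted exactly once by a model that never saw it, so the pooled RMSE estimates error on unseen data rather than fit to the training set.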

Investigation on the Key Parameters for the Strengthening Behavior of Biopolymer-based Soil Treatment (BPST) Technology (바이오폴리머-흙 처리(BPST) 기술의 강도 발현 거동에 대한 주요 영향인자 분석에 관한 연구)

  • Lee, Hae-Jin;Cho, Gye-Chum;Chang, Ilhan
    • Land and Housing Review / v.12 no.3 / pp.109-119 / 2021
  • Global warming caused by greenhouse gas emissions has rapidly increased the size and frequency of abnormal climate events and geotechnical engineering hazards. Biopolymer-based soil treatment (BPST) has been implemented in geotechnical engineering in recent years as an alternative that reduces the carbon footprint. Furthermore, thermo-gelating biopolymers, including agar gum, gellan gum, and xanthan gum, are known to strengthen soils noticeably. However, the correlation between the factors that significantly influence the strengthening behavior of BPST has not yet been evaluated in detail. In this study, machine learning regression analysis was performed on UCS (unconfined compressive strength) data for BPST tested in the laboratory to evaluate the factors influencing the strengthening behavior of gellan gum-treated soil mixtures. General linear regression, Ridge, and Lasso were used as linear regression methods; the key factors influencing the behavior of BPST were determined from RMSE (root mean squared error) and regression coefficient values. The analysis showed that biopolymer concentration and clay content have the most significant influence on the strength of BPST.

Accuracy Analysis of GNSS-based Public Surveying and Proposal for Work Processes (GNSS관측 공공측량 정확도 분석 및 업무프로세스 제안)

  • Bae, Tae-Suk
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.36 no.6 / pp.457-467 / 2018
  • Currently, the regulations and rules for public surveying and for UCPs (Unified Control Points) adopt those of triangulated traverse surveying. In addition, these regulations do not take into account the unique characteristics of GNSS (Global Navigation Satellite System) surveying, which causes difficulties in field work and subsequent data processing. A detailed procedure for GNSS processing has not yet been described either, and accuracy verification does not follow generic standards. To propose an appropriate procedure for field surveys, we processed short sessions (30 minutes) based on scenarios similar to actual situations. The reference network in Seoul was used to process the same data span for 3 days. The temporal variation during the day was evaluated as well. We analyzed the accuracy of the estimated coordinates depending on the parameterization of the tropospheric delay, comparing them with 24-hr static processing results. Estimating the tropospheric delay is advantageous for the accuracy and stability of the coordinates, resulting in an RMSE (root mean squared error) of about 5 mm and 10 mm for the horizontal and vertical components, respectively. Based on the test results, we propose a procedure that estimates daily solutions and then combines them into the final solution by applying minimum constraints (the no-net-translation condition). It is necessary to develop a web-based processing system using high-end software. It is also necessary to standardize the IDs of the public control points and the UCPs for automatic GNSS processing.

Monitoring Ground-level SO2 Concentrations Based on a Stacking Ensemble Approach Using Satellite Data and Numerical Models (위성 자료와 수치모델 자료를 활용한 스태킹 앙상블 기반 SO2 지상농도 추정)

  • Choi, Hyunyoung;Kang, Yoojin;Im, Jungho;Shin, Minso;Park, Seohui;Kim, Sang-Min
    • Korean Journal of Remote Sensing / v.36 no.5_3 / pp.1053-1066 / 2020
  • Sulfur dioxide (SO2) is primarily released through industrial, residential, and transportation activities, and creates secondary air pollutants through chemical reactions in the atmosphere. Long-term exposure to SO2 can harm the human body, causing respiratory or cardiovascular disease, which makes effective and continuous monitoring of SO2 crucial. In South Korea, SO2 is monitored at ground stations, but this does not provide spatially continuous information on SO2 concentrations. Thus, this research estimated spatially continuous ground-level SO2 concentrations at 1 km resolution over South Korea through the synergistic use of satellite data and numerical models. A stacking ensemble approach, fusing multiple machine learning algorithms at two levels (i.e., base and meta), was adopted for ground-level SO2 estimation using data from January 2015 to April 2019. Random forest and extreme gradient boosting were used as base models, and multiple linear regression was adopted for the meta-model. The cross-validation results showed that the meta-model improved performance by 25% compared to the base models, resulting in a correlation coefficient of 0.48 and a root mean square error (RMSE) of 0.0032 ppm. In addition, the temporal transferability of the approach was evaluated using one year of data that were not used in model development. The spatial distribution of ground-level SO2 concentrations based on the proposed model agreed with the general seasonality of SO2 and the temporal patterns of emission sources.

Application of Machine Learning Algorithm and Remote-sensed Data to Estimate Forest Gross Primary Production at Multi-sites Level (산림 총일차생산량 예측의 공간적 확장을 위한 인공위성 자료와 기계학습 알고리즘의 활용)

  • Lee, Bora;Kim, Eunsook;Lim, Jong-Hwan;Kang, Minseok;Kim, Joon
    • Korean Journal of Remote Sensing / v.35 no.6_2 / pp.1117-1132 / 2019
  • Forests cover 30% of the Earth's land area and play an important role in the global carbon flux through their ability to store much greater amounts of carbon than other terrestrial ecosystems. Gross Primary Production (GPP) represents the productivity of forest ecosystems in response to climate change and its effect on phenology, health, and the carbon cycle. In this study, we estimated the daily GPP of a forest ecosystem using remote-sensed data from the Moderate Resolution Imaging Spectroradiometer (MODIS) and a machine learning algorithm, the Support Vector Machine (SVM). MODIS products covering 75% to 80% of the total study period were used to train the SVM model, which was validated using eddy covariance (EC) measurement data at six flux tower sites. We also compared the GPP derived from EC and from MODIS (MYD17). The MODIS products were used as two data sets: Processed MODIS, which included variables calculated by combining products (e.g., Vapor Pressure Deficit), and Unprocessed MODIS, which used MODIS products without any combined calculation. Statistical measures, including the Pearson correlation coefficient (R), mean squared error (MSE), and root mean square error (RMSE), were used to evaluate the model outcomes. In general, the SVM model trained on Unprocessed MODIS data from multiple sites (R = 0.77 - 0.94, p < 0.001) outperformed those trained at a single site (R = 0.75 - 0.95, p < 0.001). These results show that training on data that include various events improves performance, and they suggest that remote-sensed data can be used without complex processing to estimate non-stationary ecological processes such as GPP.

Predicting Forest Gross Primary Production Using Machine Learning Algorithms (머신러닝 기법의 산림 총일차생산성 예측 모델 비교)

  • Lee, Bora;Jang, Keunchang;Kim, Eunsook;Kang, Minseok;Chun, Jung-Hwa;Lim, Jong-Hwan
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.1 / pp.29-41 / 2019
  • Terrestrial Gross Primary Production (GPP) is the largest global carbon flux, and forest ecosystems are important because of their ability to store much larger amounts of carbon than other terrestrial ecosystems. There have been several attempts to estimate GPP using mechanism-based models. However, mechanism-based models, which include biological, chemical, and physical processes, are limited by a lack of flexibility in predicting non-stationary ecological processes caused by local and global change. Instead, mechanism-free methods are strongly recommended for estimating nonlinear dynamics that occur in nature, such as GPP. Therefore, we used mechanism-free machine learning techniques to estimate the daily GPP. In this study, support vector machine (SVM), random forest (RF), and artificial neural network (ANN) models were used and compared with the traditional multiple linear regression model (LM). MODIS products and meteorological parameters from eddy covariance data from 2006 to 2013 were used to train the machine learning and LM models. The GPP prediction models were compared with daily GPP from eddy covariance measurements in a deciduous forest in South Korea in 2014 and 2015. Statistical measures, including the correlation coefficient (R), root mean square error (RMSE), and mean squared error (MSE), were used to evaluate model performance. In general, the machine learning models (R = 0.85 - 0.93, MSE = 1.00 - 2.05, p < 0.001) performed better than the linear regression model (R = 0.82 - 0.92, MSE = 1.24 - 2.45, p < 0.001). These results provide insight into the high predictability and potential for expansion offered by mechanism-free machine learning models and remote sensing for predicting non-stationary ecological processes such as seasonal GPP.
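
The three statistics listed in this abstract are related but not interchangeable: RMSE is the square root of MSE, while R measures only how well predictions track observations linearly. The sketch below, with hypothetical values, shows a case where R is perfect even though every prediction is off by one unit, which is one reason an error measure is usually reported next to R.

```python
import math

def mse(observed, predicted):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical daily values; predictions carry a constant bias of +1.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 5.0, 7.0, 9.0]

print(pearson_r(observed, predicted))        # correlation ignores the bias
print(mse(observed, predicted))              # squared-error scale
print(math.sqrt(mse(observed, predicted)))   # RMSE, in the units of the data
```

Taking the square root returns the error to the units of the measured quantity, so an RMSE can be read directly against typical observed values in a way that an MSE cannot.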

Development of Market Growth Pattern Map Based on Growth Model and Self-organizing Map Algorithm: Focusing on ICT products (자기조직화 지도를 활용한 성장모형 기반의 시장 성장패턴 지도 구축: ICT제품을 중심으로)

  • Park, Do-Hyung;Chung, Jaekwon;Chung, Yeo Jin;Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.1-23 / 2014
  • Market forecasting aims to estimate the sales volume of a product or service sold to consumers over a specific selling period. From the perspective of the enterprise, accurate market forecasting assists in determining the timing of new product introduction, product design, and the establishment of production plans and marketing strategies, enabling a more efficient decision-making process. Moreover, accurate market forecasting enables governments to organize the national budget efficiently. This study aims to generate a market growth curve for ICT (information and communication technology) goods using past time series data; categorize products showing similar growth patterns; understand markets in the industry; and forecast the future outlook of such products. This study suggests a useful and meaningful process (or methodology) for identifying market growth patterns with a quantitative growth model and a data mining algorithm. The study employs the following methodology. In the first stage, past time series data are collected for the target products or services of the categorized industry. The data, such as the volume of sales and domestic consumption for a specific product or service, are collected from the relevant government ministry, the National Statistical Office, and other relevant government organizations. Collected data that cannot be analyzed directly because of missing past data or altered code names should be pre-processed. In the second stage, an optimal model for market forecasting should be selected. This model can vary with the characteristics of each categorized industry. As this study focuses on the ICT industry, in which new technologies appear more frequently and change the market structure, the Logistic model, the Gompertz model, and the Bass model are selected. A hybrid model that combines different models can also be considered. 
The hybrid model considered in this study estimates the size of the market potential through the Logistic and Gompertz models and then uses those figures in the Bass model. The third stage is to evaluate which model most accurately explains the data. To do this, the parameters are estimated from the collected past time series data to generate each model's predicted values and calculate the root mean squared error (RMSE). The model that shows the lowest average RMSE across all product types is considered the best model. In the fourth stage, based on the parameter values estimated by the best model, a market growth pattern map is constructed with the self-organizing map algorithm. The self-organizing map is trained with the market pattern parameters of all products or services as input data, and the products or services are organized onto an $N{\times}N$ map. The number of clusters increases from 2 to M, depending on the characteristics of the nodes on the map. The clusters are divided into zones, and the clusters that provide the most meaningful explanation are selected. Based on the final selection of clusters, the boundaries between the nodes are selected and, ultimately, the market growth pattern map is completed. The last step is to determine the final characteristics of the clusters as well as the market growth curve. The average of the market growth pattern parameters in each cluster is taken as a representative figure. Using this figure, a growth curve is drawn for each cluster, and its characteristics are analyzed. The characteristics of each cluster can also be described qualitatively by taking into account the product types it contains. We expect that the process and system suggested in this paper can be used as a tool for forecasting demand in the ICT and other industries.
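
The third stage described above, choosing among growth models by lowest RMSE, can be sketched as follows. The curve formulas are the commonly used cumulative forms of the Logistic, Gompertz, and Bass models; the abstract does not state which parameterization the authors used, so these forms, the fixed parameter values, and the synthetic series are all assumptions for illustration. Parameter estimation, which the paper performs on real time series, is omitted here.

```python
import math

def logistic(t, m, a, b):
    """Logistic curve: m / (1 + a * exp(-b * t)), with market potential m."""
    return m / (1.0 + a * math.exp(-b * t))

def gompertz(t, m, a, b):
    """Gompertz curve: m * exp(-a * exp(-b * t))."""
    return m * math.exp(-a * math.exp(-b * t))

def bass(t, m, p, q):
    """Bass cumulative adoption with innovation (p) and imitation (q) coefficients."""
    decay = math.exp(-(p + q) * t)
    return m * (1.0 - decay) / (1.0 + (q / p) * decay)

def rmse(observed, fitted):
    """Root mean squared error between observed and fitted values."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed))

def best_model(times, observed, candidates):
    """Score each candidate curve by RMSE and return (best name, all scores)."""
    scores = {name: rmse(observed, [curve(t) for t in times])
              for name, curve in candidates.items()}
    return min(scores, key=scores.get), scores

# Synthetic series generated from the Bass curve, so Bass should score lowest.
times = list(range(1, 11))
observed = [bass(t, 1000.0, 0.03, 0.4) for t in times]

# Hypothetical, already-fitted parameter values for each candidate model.
candidates = {
    "logistic": lambda t: logistic(t, 1000.0, 20.0, 0.5),
    "gompertz": lambda t: gompertz(t, 1000.0, 5.0, 0.3),
    "bass": lambda t: bass(t, 1000.0, 0.03, 0.4),
}

name, scores = best_model(times, observed, candidates)
print(name)
```

To follow the selection rule in the abstract more closely, the same scoring would be repeated per product and the RMSE values averaged across product types before picking the model.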