• Title/Summary/Keyword: RMSE

Predicting the Performance of Recommender Systems through Social Network Analysis and Artificial Neural Network (사회연결망분석과 인공신경망을 이용한 추천시스템 성능 예측)

  • Cho, Yoon-Ho;Kim, In-Hwan
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.159-172 / 2010
  • The recommender system is one of the possible solutions to assist customers in finding the items they would like to purchase. To date, a variety of recommendation techniques have been developed. One of the most successful is Collaborative Filtering (CF), which has been used in many applications, such as recommending Web pages, movies, music, articles, and products. CF identifies customers whose tastes are similar to those of a given customer and recommends items those customers have liked in the past. Numerous CF algorithms have been developed to increase the performance of recommender systems. Broadly, there are memory-based CF algorithms, model-based CF algorithms, and hybrid CF algorithms, which combine CF with content-based techniques or other recommender systems. While many researchers have focused their efforts on improving CF performance, the theoretical justification of CF algorithms is lacking; that is, much remains unknown about how CF works. Furthermore, the relative performance of CF algorithms is known to be domain- and data-dependent. Implementing and launching a CF recommender system is time-consuming and expensive, and a system unsuited to the given domain provides customers with poor-quality recommendations that easily annoy them. Therefore, predicting the performance of CF algorithms in advance is of practical importance. In this study, we propose an efficient approach to predicting the performance of CF. Social Network Analysis (SNA) and an Artificial Neural Network (ANN) are applied to develop our prediction model. CF can be modeled as a social network in which customers are nodes and purchase relationships between customers are links. SNA facilitates exploration of the topological properties of the network structure that are implicit in the data used for CF recommendations.
An ANN model is developed through an analysis of network topology measures such as network density, inclusiveness, clustering coefficient, network centralization, and Krackhardt's efficiency. Network density, expressed as a proportion of the maximum possible number of links, captures the density of the whole network, whereas the clustering coefficient captures the degree to which the network contains localized pockets of dense connectivity. Inclusiveness refers to the number of nodes included within the various connected parts of the social network. Centralization reflects the extent to which connections are concentrated in a small number of nodes rather than distributed equally among all nodes. Krackhardt's efficiency characterizes how much denser the social network is than the minimum needed to keep all members of the group at least indirectly connected. We use these social network measures as the input variables of the ANN model and, as the output variable, the recommendation accuracy measured by the F1-measure. To evaluate the effectiveness of the ANN model, we used sales transaction data from H department store, one of the well-known department stores in Korea. A total of 396 experimental samples were gathered, of which 40%, 40%, and 20% were used for training, testing, and validation, respectively. Five-fold cross-validation was also conducted to enhance the reliability of the experiments. The input variables are measured in three steps: analysis of customer similarities, construction of a social network, and analysis of social network patterns. We used Net Miner 3 and UCINET 6.0 for SNA and Clementine 11.1 for ANN modeling. In the experiments, the ANN model achieved an estimated accuracy of 92.61% and an RMSE of 0.0049. These results suggest that our prediction model can help decide whether CF is useful for a given application with particular data characteristics.
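
The network measures listed above have standard textbook definitions. The sketch below computes them for an undirected customer network given as a node list and a list of (u, v) links. It is not the Net Miner/UCINET implementation the authors used; it assumes unique links, no self-loops, and at least three nodes. Two simplifications are assumptions: inclusiveness is returned as the proportion of non-isolated nodes, and `krackhardt_efficiency` uses the single-connected-component form.

```python
from itertools import combinations

def adjacency(nodes, edges):
    """Build an undirected adjacency map from (u, v) link pairs."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def density(nodes, edges):
    """Observed links as a proportion of the n(n-1)/2 possible links."""
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2)

def inclusiveness(nodes, edges):
    """Proportion of nodes that have at least one link (assumed normalization)."""
    adj = adjacency(nodes, edges)
    return sum(1 for v in nodes if adj[v]) / len(nodes)

def avg_clustering(nodes, edges):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    adj = adjacency(nodes, edges)
    total = 0.0
    for v in nodes:
        k = len(adj[v])
        if k < 2:
            continue
        # Count links among v's neighbours (closed triangles through v).
        closed = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        total += closed / (k * (k - 1) / 2)
    return total / len(nodes)

def degree_centralization(nodes, edges):
    """Freeman degree centralization: 1.0 for a star, 0.0 when all degrees are equal."""
    adj = adjacency(nodes, edges)
    degs = [len(adj[v]) for v in nodes]
    n, top = len(nodes), max(degs)
    return sum(top - d for d in degs) / ((n - 1) * (n - 2))

def krackhardt_efficiency(nodes, edges):
    """1 - (links beyond a spanning tree) / (maximum possible such links); connected graph assumed."""
    n, m = len(nodes), len(edges)
    minimum = n - 1              # links needed to keep n nodes connected
    maximum = n * (n - 1) / 2    # links in a complete graph
    return 1 - (m - minimum) / (maximum - minimum)
```

A star network scores 1.0 on both centralization and efficiency (no redundant links), while a complete network has efficiency 0.0 and clustering 1.0.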

The NCAM Land-Atmosphere Modeling Package (LAMP) Version 1: Implementation and Evaluation (국가농림기상센터 지면대기모델링패키지(NCAM-LAMP) 버전 1: 구축 및 평가)

  • Lee, Seung-Jae;Song, Jiae;Kim, Yu-Jung
    • Korean Journal of Agricultural and Forest Meteorology / v.18 no.4 / pp.307-319 / 2016
  • A Land-Atmosphere Modeling Package (LAMP) for supporting agricultural and forest management was developed at the National Center for AgroMeteorology (NCAM). The package comprises two components: the Weather Research and Forecasting modeling system (WRF) coupled with the Noah-Multiparameterization options (Noah-MP) Land Surface Model (LSM), and an offline one-dimensional LSM. The objective of this paper is to briefly describe the two components of the NCAM-LAMP and to evaluate their initial performance. The coupled WRF/Noah-MP system is configured with a parent domain over East Asia and three nested domains, the finest having a horizontal grid size of 810 m. The innermost domain covers the two Gwangneung KoFlux sites, deciduous (GDK) and coniferous (GCK). The model is integrated for about 8 days with initial and boundary conditions taken from the National Centers for Environmental Prediction (NCEP) Final Analysis (FNL) data. The verification variables for the WRF/Noah-MP coupled system are 2-m air temperature, 10-m wind, 2-m humidity, and surface precipitation. Skill scores are calculated for each domain and for two dynamic vegetation options from the differences between observations from the Korea Meteorological Administration (KMA) and the WRF/Noah-MP simulation. The accuracy of the precipitation simulation is examined using the Probability of Detection (POD) and the Equitable Threat Score (ETS), both derived from a contingency table. The standalone LSM simulation is run for one year with the original settings and compared with KoFlux site observations of net radiation, sensible heat flux, latent heat flux, and soil moisture. According to the results, the innermost domain (810 m resolution) showed the lowest root mean square error among all domains for 2-m air temperature, 10-m wind, and 2-m humidity.
Turning on the dynamic vegetation tended to reduce 10-m wind simulation errors in all domains. The first nested domain (7,290 m resolution) showed the highest precipitation score, but using the dynamic vegetation showed little advantage for precipitation. The offline one-dimensional Noah-MP LSM simulation captured the pattern and magnitude of the site-observed radiative fluxes and soil moisture, although there is room for further improvement by supplementing the leaf area index input and finding a proper combination of model physics.
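
The skill scores described above can be sketched as follows: RMSE for continuous variables such as 2-m temperature, and POD and ETS computed from a 2×2 rain/no-rain contingency table. The 0.1 mm rain threshold and the function names are hypothetical choices for illustration; the thresholds actually used in the paper are not stated here.

```python
import math

def rmse(obs, sim):
    """Root mean square error between paired observed and simulated values."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def contingency(obs_rain, sim_rain, threshold=0.1):
    """Count hits, misses, false alarms and correct negatives for rain >= threshold (mm)."""
    hits = misses = false_alarms = correct_neg = 0
    for o, s in zip(obs_rain, sim_rain):
        observed, simulated = o >= threshold, s >= threshold
        if observed and simulated:
            hits += 1
        elif observed:
            misses += 1
        elif simulated:
            false_alarms += 1
        else:
            correct_neg += 1
    return hits, misses, false_alarms, correct_neg

def pod(hits, misses, false_alarms, correct_neg):
    """Probability of Detection: fraction of observed events that were simulated."""
    return hits / (hits + misses)

def ets(hits, misses, false_alarms, correct_neg):
    """Equitable Threat Score: threat score corrected for hits expected by chance."""
    total = hits + misses + false_alarms + correct_neg
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)
```

Both scores take the full table so it can be unpacked directly, e.g. `ets(*contingency(obs, sim))`. POD ignores false alarms, which is why it is usually reported alongside ETS.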

Downscaling of Sunshine Duration for a Complex Terrain Based on the Shaded Relief Image and the Sky Condition (하늘상태와 음영기복도에 근거한 복잡지형의 일조시간 분포 상세화)

  • Kim, Seung-Ho;Yun, Jin I.
    • Korean Journal of Agricultural and Forest Meteorology / v.18 no.4 / pp.233-241 / 2016
  • Experiments were carried out to quantify the topographic effects on the attenuation of sunshine in complex terrain; the results are expected to help convert the coarse-resolution sunshine duration information provided by the Korea Meteorological Administration (KMA) into a detailed map reflecting the terrain characteristics of a mountainous watershed. Hourly shaded relief images for one year, with each pixel holding a brightness value from 0 to 255, were constructed by applying shadow modeling and skyline analysis to a 3 m resolution digital elevation model of an experimental watershed on the southern slope of Mt. Jiri in Korea. Using a bimetal sunshine recorder, sunshine duration was measured at three points with different terrain conditions in the watershed from May 15, 2015 to May 14, 2016. The brightness values of the three corresponding pixels on the shaded relief map were extracted and regressed against the measured sunshine duration, yielding a brightness-sunshine duration response curve for a clear day. We devised a method to calibrate this curve equation according to sky condition, categorized by cloud amount, and used it to derive an empirical model for estimating sunshine duration over complex terrain. Compared with a conventional scheme that estimates sunshine duration over a horizontal plane, this model markedly reduced the estimation bias, and its root mean square error for daily sunshine hours was 1.7 h, a 37% reduction from the conventional method. To apply this model to a given area, the clear-sky sunshine duration of each pixel is first produced at hourly intervals by driving the curve equation with the hourly shaded relief images of the area. Next, the cloud effect is corrected using the 3-hourly 'sky condition' of the KMA digital forecast products. Finally, daily sunshine hours are obtained by accumulating the hourly sunshine duration.
A detailed sunshine duration distribution at 3 m horizontal resolution was obtained by applying this procedure to the experimental watershed.
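
The three-step procedure above (clear-sky curve, cloud correction, daily accumulation) can be sketched for a single pixel as follows. The straight-line fit is only a stand-in for the paper's brightness-sunshine response curve, whose functional form is not given here, and the sky-condition labels and correction factors in `SKY_FACTOR` are hypothetical placeholders, not the calibrated values.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x; a stand-in for the clear-day response curve."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical cloud-correction factors per sky-condition category (not calibrated values).
SKY_FACTOR = {"clear": 1.0, "mostly_cloudy": 0.4, "overcast": 0.1}

def expand_3hourly(sky_3h):
    """Repeat each 3-hourly sky-condition value for the three hours it covers."""
    return [s for s in sky_3h for _ in range(3)]

def daily_sunshine(hourly_brightness, hourly_sky, a, b):
    """Accumulate sunshine hours for one pixel over one day."""
    total = 0.0
    for brightness, sky in zip(hourly_brightness, hourly_sky):
        # Step 1: clear-sky sunshine within this hour, clamped to [0, 1] h.
        clear_sky = min(max(a + b * brightness, 0.0), 1.0)
        # Step 2: scale by the cloud-correction factor for this hour's sky condition.
        total += clear_sky * SKY_FACTOR[sky]
    # Step 3: the running sum is the daily sunshine duration.
    return total
```

Running `daily_sunshine` over every pixel of the hourly shaded relief images would yield a daily map at the resolution of the elevation model.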

Application of Machine Learning Algorithm and Remote-sensed Data to Estimate Forest Gross Primary Production at Multi-sites Level (산림 총일차생산량 예측의 공간적 확장을 위한 인공위성 자료와 기계학습 알고리즘의 활용)

  • Lee, Bora;Kim, Eunsook;Lim, Jong-Hwan;Kang, Minseok;Kim, Joon
    • Korean Journal of Remote Sensing / v.35 no.6_2 / pp.1117-1132 / 2019
  • Forests cover 30% of the Earth's land area and play an important role in the global carbon flux because they can store much greater amounts of carbon than other terrestrial ecosystems. Gross Primary Production (GPP) represents the productivity of forest ecosystems and reflects the effects of climate change on their phenology, health, and carbon cycle. In this study, we estimated the daily GPP of forest ecosystems using remote-sensed data from the Moderate Resolution Imaging Spectroradiometer (MODIS) and a machine learning algorithm, the Support Vector Machine (SVM). MODIS products were used to train the SVM model on 75% to 80% of the data for the total study period, and the model was validated against eddy covariance (EC) measurements at six flux tower sites. We also compared the GPP derived from EC with the MODIS GPP product (MYD17). Two MODIS data sets were used: Processed MODIS, which included variables calculated by combining products (e.g., Vapor Pressure Deficit), and Unprocessed MODIS, which used MODIS products without any such calculation. Statistical measures, including the Pearson correlation coefficient (R), mean squared error (MSE), and root mean square error (RMSE), were used to evaluate the model outcomes. In general, the SVM model trained on Unprocessed MODIS data from multiple sites (R = 0.77 - 0.94, p < 0.001) outperformed those trained at a single site (R = 0.75 - 0.95, p < 0.001). These results show that training on data covering a wider variety of events improves performance, and they suggest that remote-sensed data can be used without complex processing to estimate non-stationary ecological processes such as GPP.
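
The evaluation statistics named above (R, MSE, RMSE) can be sketched as below, together with a split that reserves the first 75% of a time-ordered series for training. The chronological ordering of the split is an assumption for illustration; the abstract states only the 75% to 80% proportion.

```python
import math

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted GPP."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def mse(obs, pred):
    """Mean squared error."""
    return sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error, in the units of GPP."""
    return math.sqrt(mse(obs, pred))

def chronological_split(series, train_fraction=0.75):
    """First train_fraction of a time-ordered series for training, the rest for validation."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]
```

R measures only the linear association, so a model with a constant bias can have R = 1 while its MSE and RMSE remain large; this is why the error measures are reported alongside R.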

Comparative Assessment of Linear Regression and Machine Learning for Analyzing the Spatial Distribution of Ground-level NO2 Concentrations: A Case Study for Seoul, Korea (서울 지역 지상 NO2 농도 공간 분포 분석을 위한 회귀 모델 및 기계학습 기법 비교)

  • Kang, Eunjin;Yoo, Cheolhee;Shin, Yeji;Cho, Dongjin;Im, Jungho
    • Korean Journal of Remote Sensing / v.37 no.6_1 / pp.1739-1756 / 2021
  • Atmospheric nitrogen dioxide (NO2) originates mainly from anthropogenic emissions. It contributes to the formation of secondary pollutants and ozone through chemical reactions and adversely affects human health. Although ground stations monitor NO2 concentrations in real time in Korea, point measurements alone make it difficult to analyze the spatial distribution of NO2 concentrations, especially over areas with no stations. Therefore, this study compared spatial interpolation of NO2 concentrations for the year 2020 using two linear-regression methods (i.e., multiple linear regression (MLR) and regression kriging (RK)) and two machine learning approaches (i.e., random forest (RF) and support vector regression (SVR)). The four approaches were compared using leave-one-out cross-validation (LOOCV). The daily LOOCV results showed that MLR, RK, and SVR produced an average daily index of agreement (IOA) of 0.57, higher than that of RF (0.50). The average daily normalized root mean square error of RK was 0.9483%, slightly lower than those of the other models. MLR, RK, and SVR showed similar seasonal distribution patterns, and the dynamic range of the resulting NO2 concentrations from these three models was similar, whereas that from RF was relatively narrow. The linear regression-based approaches are expected to be promising methods for spatial interpolation of ground-level NO2 concentrations and other parameters in urban areas.
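
The LOOCV evaluation described above can be sketched as follows: each station is predicted by a model fitted to all other stations, and the held-out predictions are scored with Willmott's index of agreement and a normalized RMSE. The one-predictor least-squares model is a hypothetical stand-in for MLR, and normalizing RMSE by the mean observation is an assumption; the paper's exact normalization is not given here.

```python
import math

def index_of_agreement(obs, pred):
    """Willmott's index of agreement d; 1 means perfect agreement."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den

def nrmse_percent(obs, pred):
    """RMSE divided by the mean observation, in percent (normalization is an assumption)."""
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return 100 * rmse / (sum(obs) / len(obs))

def fit_simple(x, y):
    """Least-squares y = a + b*x; a one-predictor stand-in for MLR."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def loocv_predictions(x, y):
    """Predict each station from a model fitted to all the other stations."""
    preds = []
    for i in range(len(x)):
        a, b = fit_simple(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds
```

Unlike R, the index of agreement penalizes both bias and a compressed dynamic range: a prediction that is constant at the observed mean scores d = 0.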

Analysis of Co-registration Performance According to Geometric Processing Level of KOMPSAT-3/3A Reference Image (KOMPSAT-3/3A 기준영상의 기하품질에 따른 상호좌표등록 결과 분석)

  • Yun, Yerin;Kim, Taeheon;Oh, Jaehong;Han, Youkyung
    • Korean Journal of Remote Sensing / v.37 no.2 / pp.221-232 / 2021
  • This study analyzed co-registration results according to the geometric processing level of the reference image, either Level 1R or Level 1G, for KOMPSAT-3 and KOMPSAT-3A images. We performed co-registration using a Level 1R or a Level 1G image as the reference image and a Level 1R image as the sensed image. The experimental dataset consisted of seven Level 1R and 1G images of KOMPSAT-3 and KOMPSAT-3A acquired over Daejeon, South Korea. To coarsely align the geometric positions of the two images, the SURF (Speeded-Up Robust Features) and PC (Phase Correlation) methods were combined and then repeatedly applied to the overlapping region of the images. We then extracted tie-points from the coarsely aligned images using the SURF method and performed fine co-registration through an affine transformation and a piecewise linear transformation, respectively, each constructed from the tie-points. In the experiments, more tie-points were extracted when the Level 1G image was used as the reference image than when the Level 1R image was used. Also, with the Level 1G image as the reference image, the root mean square error of co-registration was on average 5 pixels lower than with the Level 1R image. The experimental results show that co-registration performance can be affected by the geometric processing level, which relates to the initial geometric relationship between the two images. Moreover, we confirmed that better geometric quality of the reference image yields more stable co-registration performance.
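
The affine fine-registration step described above can be sketched as a least-squares fit over tie-points, followed by the RMSE of the residuals in pixels. This is a generic sketch, not the authors' implementation: the piecewise linear variant is not shown, and computing RMSE as sqrt(mean(dx² + dy²)) over the same tie-points is an assumed convention.

```python
import math

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x

def fit_affine(src, dst):
    """Least-squares affine parameters mapping sensed (x, y) to reference (x', y')."""
    ata = [[0.0] * 3 for _ in range(3)]
    atx, aty = [0.0] * 3, [0.0] * 3
    for (x, y), (xr, yr) in zip(src, dst):
        row = (1.0, x, y)
        for i in range(3):
            atx[i] += row[i] * xr
            aty[i] += row[i] * yr
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    # Normal equations solved separately for the x' and y' coefficients.
    return solve3(ata, atx), solve3(ata, aty)

def apply_affine(params, pt):
    (a0, a1, a2), (b0, b1, b2) = params
    x, y = pt
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y

def registration_rmse(params, src, dst):
    """RMSE (pixels) between transformed sensed points and reference tie-points."""
    sq = 0.0
    for p, (xr, yr) in zip(src, dst):
        xp, yp = apply_affine(params, p)
        sq += (xp - xr) ** 2 + (yp - yr) ** 2
    return math.sqrt(sq / len(src))
```

At least three non-collinear tie-points are needed; with more, the least-squares fit averages out individual matching errors, which is one reason a larger number of tie-points tends to give more stable results.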

Predicting Forest Gross Primary Production Using Machine Learning Algorithms (머신러닝 기법의 산림 총일차생산성 예측 모델 비교)

  • Lee, Bora;Jang, Keunchang;Kim, Eunsook;Kang, Minseok;Chun, Jung-Hwa;Lim, Jong-Hwan
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.1 / pp.29-41 / 2019
  • Terrestrial Gross Primary Production (GPP) is the largest global carbon flux, and forest ecosystems are important because they can store significantly more carbon than other terrestrial ecosystems. There have been several attempts to estimate GPP using mechanism-based models. However, mechanism-based models, which include biological, chemical, and physical processes, are limited by a lack of flexibility in predicting non-stationary ecological processes caused by local and global change. Instead, mechanism-free methods are strongly recommended for estimating nonlinear dynamics that occur in nature, such as GPP. We therefore used mechanism-free machine learning techniques to estimate daily GPP. In this study, a support vector machine (SVM), random forest (RF), and artificial neural network (ANN) were used and compared with a traditional multiple linear regression model (LM). MODIS products and meteorological parameters from eddy covariance data for 2006 to 2013 were used to train the machine learning and LM models. The GPP prediction models were compared with daily GPP from eddy covariance measurements in a deciduous forest in South Korea in 2014 and 2015. Statistical measures, including the correlation coefficient (R), root mean square error (RMSE), and mean squared error (MSE), were used to evaluate model performance. In general, the machine-learning models (R = 0.85 - 0.93, MSE = 1.00 - 2.05, p < 0.001) performed better than the linear regression model (R = 0.82 - 0.92, MSE = 1.24 - 2.45, p < 0.001). These results indicate the high predictability of mechanism-free machine-learning models and the potential of combining them with remote sensing to predict non-stationary ecological processes such as seasonal GPP.
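
The traditional baseline (LM) and the year-based train/test split described above can be sketched as follows. The normal-equation solver and the record layout (dicts with a `"year"` key) are hypothetical illustrations; the predictors actually used by the authors are MODIS products and eddy-covariance meteorology, not reproduced here.

```python
def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_lm(X, y):
    """Multiple linear regression via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict_lm(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def split_by_year(records, train_years, test_years):
    """Train on earlier years and test on later ones (e.g. 2006-2013 vs 2014-2015)."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test
```

Testing on years held out in time, rather than on randomly sampled days, checks whether a model generalizes to conditions it has not seen, which is the point of comparing mechanism-free models on non-stationary processes.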

Evaluation of stream flow and water quality behavior by weir operation in Nakdong river basin using SWAT (SWAT을 이용한 낙동강유역의 보 개방에 따른 하천유량 및 수질 거동 분석)

  • Lee, Ji Wan;Jung, Chung Gil;Woo, So Young;Kim, Seong Joon
    • Journal of Korea Water Resources Association / v.52 no.5 / pp.349-360 / 2019
  • The purpose of this study is to evaluate the stream flow and water quality (SS, T-N, and T-P) behavior of the Nakdong river basin ($23{,}609.3\ \mathrm{km}^2$) by simulating dam and weir operation scenarios using SWAT (Soil and Water Assessment Tool). The operation scenarios are simultaneous release from all dams and weirs (scenario 1), simultaneous release from all weirs (scenario 2), and sequential release from the weirs at one-month intervals starting from the upstream weirs (scenario 3). Before the evaluation, SWAT was calibrated and validated using 11 years (2005-2015) of daily multi-purpose dam inflow at 5 locations (ADD, IHD, HCD, MKD, and MYD), multi-function weir inflow at 7 locations (SHW, GMW, CGW, GJW, DSW, HCW, and HAW), and monthly water quality monitoring data at 6 locations (AD-4, SJ-2, EG, HC, MK-4, and MG). For both dam inflow and dam storage, the Nash-Sutcliffe efficiency (NSE) was 0.56~0.79 and the coefficient of determination ($R^2$) was 0.68~0.90. For water quality, the $R^2$ of SS, T-N, and T-P was 0.64~0.79, 0.51~0.74, and 0.53~0.72, respectively. Of the three dam and weir release scenarios suggested by the Ministry of Environment, scenarios 1 and 3 improved stream water quality (T-N and T-P) within 3 months of the release, but showed a negative effect compared with scenario 2 after those 3 months.
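
The two calibration statistics reported above can be computed as follows. The sketch takes $R^2$ as the squared Pearson correlation, a common convention in hydrological model evaluation; whether the paper used this or a regression-based definition is an assumption.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 means no better than the observed mean."""
    mo = sum(obs) / len(obs)
    return 1 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - mo) ** 2 for o in obs)

def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation (assumed convention)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (vo * vs)
```

The two statistics answer different questions: a simulation that tracks the observations with a constant offset, e.g. `obs = [1, 2, 3, 4]` and `sim = [2, 3, 4, 5]`, has $R^2 = 1$ but NSE = 0.2, because NSE penalizes the bias.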

Global Ocean Data Assimilation and Prediction System in KMA: Description and Assessment (기상청 전지구 해양자료동화시스템(GODAPS): 개요 및 검증)

  • Chang, Pil-Hun;Hwang, Seung-On;Choo, Sung-Ho;Lee, Johan;Lee, Sang-Min;Boo, Kyung-On
    • Atmosphere / v.31 no.2 / pp.229-240 / 2021
  • This paper introduces the Global Ocean Data Assimilation and Prediction System (GODAPS) in operation at the KMA (Korea Meteorological Administration). GODAPS consists of an ocean model, a sea-ice model, and a three-dimensional variational (3D-Var) ocean data assimilation system. GODAPS assimilates conventional and satellite observations of sea surface temperature and height, observations of sea-ice concentration, and temperature and salinity profiles of the ocean using a 24-hour data assimilation window. It produces ocean analysis fields on a 0.25° ORCA (tripolar) grid with 75 vertical layers. Besides being used to monitor the global ocean and sea ice, this analysis provides a boundary condition for the atmospheric model of the KMA Global Seasonal Forecasting System version 5 (GloSea5). To evaluate the quality of the ocean analysis produced by GODAPS, a one-year data assimilation experiment was performed. Assimilating the global observing system in GODAPS produces improved analysis and forecast fields, with reduced errors in terms of the RMSE of innovations and analysis increments. In addition, comparison with an experiment without assimilation shows a mostly positive impact, especially over regions with large oceanic variability.
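
The diagnostics mentioned above can be illustrated with the textbook scalar case of variational assimilation, where the observation operator is the identity and the 3D-Var cost has a closed-form minimizer. This is a sketch of the general idea, not the GODAPS system: the variances are hypothetical inputs, and the statistics are evaluated at observation points only.

```python
import math

def scalar_analysis(xb, y, var_b, var_o):
    """Minimizer of J(x) = (x - xb)^2 / var_b + (y - x)^2 / var_o (identity observation operator)."""
    gain = var_b / (var_b + var_o)
    return xb + gain * (y - xb)

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def assimilation_stats(obs, background, analysis):
    """RMS of innovations (O-B), analysis residuals (O-A) and increments (A-B) at observation points."""
    omb = [o - b for o, b in zip(obs, background)]
    oma = [o - a for o, a in zip(obs, analysis)]
    inc = [a - b for a, b in zip(analysis, background)]
    return {"rms_omb": rms(omb), "rms_oma": rms(oma), "rms_increment": rms(inc)}
```

With equal background and observation variances the analysis lands halfway between them (`scalar_analysis(0.0, 2.0, 1.0, 1.0)` gives `1.0`). Over a cycling experiment, shrinking innovations indicate that the background forecast itself is improving.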

Estimation of the Lodging Area in Rice Using Deep Learning (딥러닝을 이용한 벼 도복 면적 추정)

  • Ban, Ho-Young;Baek, Jae-Kyeong;Sang, Wan-Gyu;Kim, Jun-Hwan;Seo, Myung-Chul
    • KOREAN JOURNAL OF CROP SCIENCE / v.66 no.2 / pp.105-111 / 2021
  • Rice lodging occurs annually, caused by typhoons accompanied by strong winds and heavy rainfall, and results in damage related to pre-harvest sprouting during the ripening period. Rapid estimation of the lodged area is therefore necessary to enable timely responses to damage. To this end, we obtained images of rice lodging using a drone in Gimje, Buan, and Gunsan and converted them into 128 × 128 pixel images. A convolutional neural network (CNN), a deep learning model, was trained on these images to classify them into two classes (lodging and non-lodging); the images were split in an 8:2 ratio into a training set and a validation set. The CNN model was built and trained with each of three optimizers (Adam, RMSprop, and SGD). The lodged area was then evaluated for three fields using data not included in the training and validation sets. The images were combined into composite images of the entire fields using Metashape, and these composites were divided into 128 × 128 pixel tiles. Lodging in each tile was predicted using the trained CNN model, and the lodged area was calculated by multiplying the ratio of the number of lodging tiles to the total number of tiles in the field by the area of the entire field. On the training and validation sets, accuracy increased as training progressed and eventually exceeded 0.919. The results for each of the three fields showed high accuracy for all optimizers, with Adam giving the highest accuracy (normalized root mean square error: 2.73%). These findings suggest that the area of lodged rice can be predicted rapidly using deep learning.
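
The area calculation described above can be sketched as follows: tile the field composite into 128 × 128 windows, classify each tile, and scale the lodging fraction by the field area. Dropping partial tiles at the image edges is an assumption (the abstract does not say how edges were handled), and the labels are taken as given rather than produced by a CNN.

```python
def tile_origins(width, height, tile=128):
    """Top-left corners of non-overlapping tile x tile windows; partial edge tiles are dropped (assumption)."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def lodging_area(tile_labels, field_area):
    """Lodged area = (lodging tiles / all tiles) * area of the whole field."""
    n_lodging = sum(1 for label in tile_labels if label == "lodging")
    return n_lodging / len(tile_labels) * field_area
```

For example, if 2 of 4 tiles are classified as lodging in a 1,000 m² field, the estimate is 500 m². The resolution of the estimate is therefore limited by the tile size: each tile counts as entirely lodged or not lodged.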