• Title/Summary/Keyword: Mean square error


How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.219-239 / 2019
  • As providing customized services to individuals becomes more important, research on personalized recommendation systems continues. Collaborative filtering is one of the most popular approaches in academia and industry. However, its recommendations have mostly relied on quantitative information such as users' ratings, which limits accuracy. To address this, many studies have tried to improve recommendation performance by using information beyond quantitative ratings; sentiment analysis of customer review text is a good example. Nevertheless, existing research has not directly combined sentiment-analysis results with quantitative rating scores in the recommendation system. This study therefore reflects the sentiment expressed in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into empirical quantitative information and feeds it directly into the recommendation system. To do this, we needed to quantify users' reviews, which are originally qualitative information. Sentiment scores were calculated with a text-mining sentiment analysis technique. The data were movie reviews, from which a domain-specific sentiment dictionary was constructed using regression analysis: positive and negative dictionaries were built with Lasso regression, Ridge regression, and ElasticNet. The accuracy of each dictionary was verified with a confusion matrix: 70% for the Lasso-based dictionary, 79% for the Ridge-based dictionary, and 83% for ElasticNet ($\alpha=0.3$).
Therefore, the sentiment score of each review was calculated with the ElasticNet-based dictionary and combined with the rating to create a new rating. We show that collaborative filtering that reflects the sentiment scores of user reviews outperforms the traditional method that considers only the existing ratings. To demonstrate this, the proposed ratings were applied to memory-based user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF), and to the model-based matrix factorization methods SVD and SVD++. For each algorithm, the mean absolute error (MAE) and root mean square error (RMSE) were calculated to compare a recommendation system using sentiment-combined ratings with one that considers only ratings. With MAE as the evaluation index, the improvement was 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++. With RMSE, the improvement was 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. Thus, predictions based on ratings that reflect the proposed sentiment score are more accurate than those of the conventional method; that is, collaborative filtering that reflects the sentiment score of user reviews is more accurate than conventional collaborative filtering that considers only quantitative scores. A paired t-test further confirmed that the proposed model is the better approach. To overcome the limitation of previous research that judges a user's sentiment only from the quantitative rating, this study quantified the review so that the user's opinion is considered in finer detail, improving the accuracy of the recommendation system.
The findings have managerial implications for recommendation system developers who need to consider both quantitative and qualitative information. The way the combined system is constructed in this paper may be used directly by developers.
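
The evaluation above compares two rating sources with MAE, RMSE, and a paired t-test. The sketch below shows those three computations on toy numbers; the ratings are invented for illustration, and pairing the per-item absolute errors of the two systems is an assumption, since the abstract does not say exactly which quantity was paired. As in the abstract, "improvement" is taken as the baseline error minus the proposed-method error.

```python
import math
from statistics import mean, stdev

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared error."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))

def paired_t(x, y):
    """Paired t statistic for the mean of the differences x_i - y_i (df = n - 1)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Toy held-out ratings (illustrative only, not the study's data).
actual = [4, 3, 5, 2]
rating_only = [3.5, 3.5, 4.0, 3.0]          # predictions from ratings alone
sentiment_combined = [4.0, 3.2, 4.6, 2.5]   # predictions from sentiment-adjusted ratings

print("MAE improvement:", mae(actual, rating_only) - mae(actual, sentiment_combined))
print("RMSE improvement:", rmse(actual, rating_only) - rmse(actual, sentiment_combined))

# Pair the absolute errors of the two systems item by item (an assumption).
err_base = [abs(a - p) for a, p in zip(actual, rating_only)]
err_new = [abs(a - p) for a, p in zip(actual, sentiment_combined)]
print("paired t:", paired_t(err_base, err_new))
```

A positive t statistic here means the baseline errors are larger on average; the p-value would come from a t distribution with n - 1 degrees of freedom.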

Evaluation of Factors Used in AAPM TG-43 Formalism Using Segmented Sources Integration Method and Monte Carlo Simulation: Implementation of microSelectron HDR Ir-192 Source (미소선원 적분법과 몬테칼로 방법을 이용한 AAPM TG-43 선량계산 인자 평가: microSelectron HDR Ir-192 선원에 대한 적용)

  • Ahn, Woo-Sang;Jang, Won-Woo;Park, Sung-Ho;Jung, Sang-Hoon;Cho, Woon-Kap;Kim, Young-Seok;Ahn, Seung-Do
    • Progress in Medical Physics / v.22 no.4 / pp.190-197 / 2011
  • Currently, the dose distribution calculation used by commercial treatment planning systems (TPSs) for high-dose-rate (HDR) brachytherapy is derived from the point- and line-source approximations recommended by AAPM Task Group 43 (TG-43). However, Monte Carlo (MC) simulation is needed to assess the accuracy of dose calculation around a three-dimensional Ir-192 source. In this study, the geometry factor was calculated with a segmented sources integration method that divides the microSelectron HDR Ir-192 source into smaller parts. The Monte Carlo code MCNPX 2.5.0 was used to calculate the dose rate $\dot{D}(r,\theta)$ at a point $(r,\theta)$ from an HDR Ir-192 source in a spherical water phantom 30 cm in diameter, and the anisotropy function and radial dose function were then calculated from the results. The geometry factor obtained was compared with that from the line-source approximation, and the anisotropy and radial dose functions were compared with those derived from the MCPT results of Williamson. The geometry factors from the segmented sources integration method and the line-source approximation agreed within 0.2% for $r \geq 0.5$ cm and within 1.33% at r = 0.1 cm. The relative root mean square error (R-RMSE) between the anisotropy functions of this study and of Williamson was 2.33% at r = 0.25 cm and within 1% for r > 0.5 cm. The R-RMSE of the radial dose function was 0.46% over radial distances from 0.1 to 14.0 cm. The geometry factors from the segmented sources integration method and the line-source approximation were thus in good agreement for $r \geq 0.1$ cm. Nevertheless, applying the segmented sources integration method appears justified, since its three-dimensional model of the Ir-192 source provides a more realistic geometry factor.
The anisotropy function and radial dose function estimated with MCNPX in this study and with MCPT by Williamson agree within the uncertainty of the Monte Carlo codes, except at r = 0.25 cm. The Monte Carlo code used in this study is expected to be applicable to other brachytherapy sources.
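
To see why segmented integration and the line-source approximation agree closely, the sketch below compares the TG-43 line-source geometry factor $G_L(r,\theta) = \beta / (L\, r \sin\theta)$ with a simplified "segmented" version that averages the point-source term $1/d^2$ over equal segments placed along the active length. This is a one-dimensional simplification (the study segmented a three-dimensional source), the active length L = 0.35 cm is an assumed illustrative value, and the relative RMSE definition (root mean square of relative differences, in percent) is an assumption, since the abstract does not define R-RMSE.

```python
import math

def g_line(r, theta, L):
    """TG-43 line-source geometry factor beta / (L r sin(theta)); valid for 0 < theta < pi."""
    x = r * math.sin(theta)   # perpendicular distance from the source axis
    z = r * math.cos(theta)   # position along the source axis
    # beta: angle subtended by the active length at the point (r, theta)
    beta = math.atan((z + L / 2) / x) - math.atan((z - L / 2) / x)
    return beta / (L * x)

def g_segmented(r, theta, L, n):
    """Average of the point-source term 1/d^2 over n equal segments (midpoint rule)."""
    x = r * math.sin(theta)
    z = r * math.cos(theta)
    total = 0.0
    for i in range(n):
        zi = -L / 2 + (i + 0.5) * L / n   # centre of segment i on the source axis
        total += 1.0 / (x ** 2 + (z - zi) ** 2)
    return total / n

def relative_rmse(values, reference):
    """Root mean square of relative differences, in percent (assumed definition)."""
    sq = [((v - ref) / ref) ** 2 for v, ref in zip(values, reference)]
    return 100 * math.sqrt(sum(sq) / len(sq))

L = 0.35  # cm, assumed active length for illustration
for r in (0.25, 0.5, 1.0, 5.0):
    print(r, g_line(r, math.pi / 2, L), g_segmented(r, math.pi / 2, L, 50))
```

Because the line-source formula is exactly the integral of $1/d^2$ along the line divided by L, the segmented sum converges to it as n grows; differences in a real source model come from its finite radius and encapsulation, which this sketch omits.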

Recent Trends in Blooming Dates of Spring Flowers and the Observed Disturbance in 2014 (최근의 봄꽃 개화 추이와 2014년 개화시기의 혼란)

  • Lee, Ho-Seung;Kim, Jin-Hee;Yun, Jin I.
    • Korean Journal of Agricultural and Forest Meteorology / v.16 no.4 / pp.396-402 / 2014
  • The spring season in Korea features a dynamic landscape in which a variety of flowers, such as magnolias, azaleas, forsythias, cherry blossoms, and royal azaleas, bloom one after another. In 2014, however, the south-north differences in flowering dates, and the differences among flower species, narrowed, taking a toll on the economic and shared communal value of the seasonal landscape. This study was carried out to determine whether the 2014 event is an outlier or part of a megatrend in spring phenology. Flowering dates of forsythias and cherry blossoms, two typical spring flower species, observed over the last 60 years at 6 Korea Meteorological Administration (KMA) weather stations show that the gap between the flowering of forsythias (which bloom earlier in spring) and of cherry blossoms (which bloom later) was at most 30 days and 14 days on average in the 1951-1980 climatological normal, compared with at most 21 days and 11 days on average in the 1981-2010 normal. In 2014 in particular the gap narrowed further to 7 days, so that forsythias and cherry blossoms could be seen blooming at the same time in the same location. The 'cherry blossom front' took 20 days to travel from Busan, the earliest-flowering station, to Incheon, the latest-flowering station, in the 1951-1980 normal, compared with 16 days in the 1981-2010 normal and 6 days in 2014. The corresponding spread in forsythia flowering dates was 20, 17, and 12 days, respectively. It is presumed that the recent climate change pattern on the Korean Peninsula, with rapid temperature rises in late spring in contrast to slow rises in early spring immediately after dormancy release, advanced the flowering of cherry blossoms, which bloom later than forsythias.
Thermal-time-based heating requirements for flowering of the 2 species were estimated from the 60-year data at the 6 locations and used to predict the 2014 flowering dates. The root mean square error of the predictions was within 2 days of the observed flowering dates for both species at all 6 locations, showing the feasibility of thermal time as a prognostic tool.
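
The thermal-time approach can be sketched as a generic growing-degree-day model: accumulate daily mean temperature above a base temperature, take the heating requirement as the mean accumulation at the observed flowering day over training years, and predict flowering as the first day the accumulation reaches that requirement. The base temperature, the accumulation start date (day 0), and the temperature series are assumptions for illustration; the abstract does not give the study's exact formulation.

```python
import math

def thermal_time(daily_mean_temps, base_temp):
    """Cumulative thermal time (degree-days above base_temp), one running total per day."""
    total, series = 0.0, []
    for t in daily_mean_temps:
        total += max(0.0, t - base_temp)
        series.append(total)
    return series

def heating_requirement(training_years, base_temp):
    """Mean thermal time accumulated up to the observed flowering day.

    training_years: list of (daily_mean_temps, flowering_day_index) pairs, with
    day 0 at the (assumed) start of accumulation.
    """
    totals = [thermal_time(temps, base_temp)[day] for temps, day in training_years]
    return sum(totals) / len(totals)

def predict_flowering_day(daily_mean_temps, base_temp, requirement):
    """First day index whose cumulative thermal time reaches the requirement, else None."""
    for day, acc in enumerate(thermal_time(daily_mean_temps, base_temp)):
        if acc >= requirement:
            return day
    return None

def rmse_days(predicted, observed):
    """Root mean square error of predicted flowering days."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

# Hypothetical example: 10 days at 5 deg C, then 30 days at 10 deg C, base temperature 5 deg C.
temps = [5.0] * 10 + [10.0] * 30
req = heating_requirement([(temps, 19)], base_temp=5.0)
print(req, predict_flowering_day(temps, 5.0, req))
```

With several stations and species, the same RMSE would be computed over the predicted and observed flowering days for each species.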

Analysis of Empirical Multiple Linear Regression Models for the Production of PM2.5 Concentrations (PM2.5농도 산출을 위한 경험적 다중선형 모델 분석)

  • Choo, Gyo-Hwang;Lee, Kyu-Tae;Jeong, Myeong-Jae
    • Journal of the Korean earth science society / v.38 no.4 / pp.283-292 / 2017
  • In this study, empirical models were established to estimate surface-level $PM_{2.5}$ concentrations over Seoul, Korea, from 1 January 2012 to 31 December 2013. We used six multiple linear regression models with aerosol optical thickness (AOT) and Ångström exponent (AE) data from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Terra and Aqua satellites, meteorological data, and planetary boundary layer depth (PBLD) data. $M_6$ was the best empirical model; it uses AOT, AE, relative humidity (RH), wind speed, wind direction, PBLD, and air temperature as input data. Between the observed $PM_{2.5}$ concentrations and those estimated with $M_6$, the correlation was R = 0.62 and the root mean square error was $RMSE = 10.70\,\mu g\,m^{-3}$. The relationship also depends strongly on the season because of the seasonal observation characteristics of AOT, with better correlation in spring (R = 0.66) and autumn (R = 0.75) than in summer (R of about 0.38) and winter (R of about 0.56). These results are attributed to cloud contamination in summer and the influence of snow/ice surfaces in winter. The empirical multiple linear regression model showed that satellite-retrieved AOT is a dominant variable, and additional weather variables will be needed to improve the $PM_{2.5}$ estimates. The $PM_{2.5}$ estimates from the empirical multiple linear regression model should also be useful for monitoring the atmospheric environment from satellite and ground meteorological data.
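
A multiple linear regression model like $M_6$ can be fitted by ordinary least squares and scored with the two statistics quoted above, R and RMSE. The sketch below solves the normal equations with a small Gaussian elimination so it needs only the standard library; the two predictor columns and the target are invented stand-ins (not the study's AOT, AE, RH, and other inputs), generated from an exact linear relation so the fit can be checked.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_mlr(rows, y):
    """Ordinary least squares via the normal equations; returns [intercept, b1, ..., bk]."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical predictors and target with y = 2 + 3*x1 - x2 exactly.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [2 + 3 * a - b for a, b in rows]
coef = fit_mlr(rows, y)
preds = [predict(coef, r) for r in rows]
print(coef, pearson_r(y, preds), rmse(y, preds))
```

For real data with seven or more correlated inputs, a numerically sturdier solver (QR decomposition) is usually preferred over the normal equations.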

Development of an Automatic 3D Coregistration Technique of Brain PET and MR Images (뇌 PET과 MR 영상의 자동화된 3차원적 합성기법 개발)

  • Lee, Jae-Sung;Kwark, Cheol-Eun;Lee, Dong-Soo;Chung, June-Key;Lee, Myung-Chul;Park, Kwang-Suk
    • The Korean Journal of Nuclear Medicine / v.32 no.5 / pp.414-424 / 1998
  • Purpose: Cross-modality coregistration of positron emission tomography (PET) and magnetic resonance (MR) images can enhance clinical information. In this study we propose a refined technique that improves the robustness of registration and provides more realistic visualization of the coregistered images. Materials and Methods: Using the sinogram of the PET emission scan, we extracted a robust head boundary and used the boundary-enhanced PET to coregister PET with MR. Pixels at 10% of the maximum pixel value were taken as the boundary of the sinogram, and the boundary pixel values were replaced with the maximum value of the sinogram. From each slice of the MR images, 180 boundary points were extracted at intervals of about 2 degrees with a simple threshold method. The best affine transformation between the two point sets was found by least-squares fitting of the Euclidean distances between the point sets, and calculation time was reduced with a predefined distance map. Finally, we developed an automatic coregistration program using this boundary detection and surface matching technique. We also designed a new weighted normalization technique to display the coregistered PET and MR images simultaneously. Results: With the new method, the head boundary was extracted robustly and spatial registration was performed successfully, with a mean displacement error of less than 2.0 mm. In visualization with the weighted normalization method, structures shown in the MR image were realistically represented. Conclusion: The refined technique can practically improve the performance of automated three-dimensional coregistration.
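
The distance-map idea can be sketched as follows: precompute, for every grid cell, the distance to the nearest reference boundary point once, so that scoring a candidate transformation only needs one lookup per moving point instead of a nearest-neighbour search. This is a deliberately reduced 2D sketch: it searches integer translations exhaustively instead of fitting a full 3D affine transform, and the grid size, penalty value, and rectangular "boundary" are hypothetical.

```python
import math

def distance_map(boundary, width, height):
    """Euclidean distance from every integer grid cell to the nearest boundary point (brute force)."""
    return [[min(math.hypot(x - bx, y - by) for bx, by in boundary)
             for x in range(width)] for y in range(height)]

def match_cost(dmap, points, dx, dy):
    """Sum of squared distance-map values at the translated points (least-squares style cost)."""
    h, w = len(dmap), len(dmap[0])
    cost = 0.0
    for x, y in points:
        tx, ty = x + dx, y + dy
        if 0 <= tx < w and 0 <= ty < h:
            cost += dmap[ty][tx] ** 2
        else:
            cost += 1e6  # hypothetical penalty for points that leave the grid
    return cost

def best_translation(dmap, points, search=5):
    """Exhaustive search over integer shifts in [-search, search] in each axis."""
    shifts = [(dx, dy) for dx in range(-search, search + 1) for dy in range(-search, search + 1)]
    return min(shifts, key=lambda s: match_cost(dmap, points, *s))

# Hypothetical reference boundary (outline of a rectangle) and a displaced copy of it.
boundary = [(x, y) for x in range(5, 13) for y in range(6, 12) if x in (5, 12) or y in (6, 11)]
moving = [(x - 2, y + 3) for x, y in boundary]
dmap = distance_map(boundary, 20, 20)
print(best_translation(dmap, moving))
```

In a full implementation the same lookup-based cost would be minimized over the affine parameters by a continuous optimizer, with interpolation of the distance map between grid cells.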


Development of a Predictive Model Describing the Growth of Listeria Monocytogenes in Fresh Cut Vegetable (샐러드용 신선 채소에서의 Listerio monocytogenes 성장예측모델 개발)

  • Cho, Joon-Il;Lee, Soon-Ho;Lim, Ji-Su;Kwak, Hyo-Sun;Hwang, In-Gyun
    • Journal of Food Hygiene and Safety / v.26 no.1 / pp.25-30 / 2011
  • In this study, predictive mathematical models were developed to describe the growth kinetics of Listeria monocytogenes in mixed fresh-cut vegetables, one of the most popular ready-to-eat foods in the world, as a function of temperature (4, 10, 20, and $30^{\circ}C$). At each storage temperature, the primary growth curves were fitted well ($r^2$ = 0.916 to 0.981) with the Gompertz and Baranyi equations to determine the specific growth rate (SGR). A polynomial model for the natural-logarithm-transformed SGR as a function of temperature was obtained by nonlinear regression (Prism, version 4.0, GraphPad Software). The SGR decreased as the storage temperature decreased from $30^{\circ}C$ to $4^{\circ}C$. The polynomial model was identified as an appropriate secondary model for SGR on the basis of statistical indices such as the mean square error (MSE = 0.002718 for Gompertz, 0.055186 for Baranyi), bias factor (Bf = 1.050084 for Gompertz, 1.931472 for Baranyi), and accuracy factor (Af = 1.160767 for Gompertz, 2.137181 for Baranyi). The results indicate that L. monocytogenes growth was affected mainly by temperature, and that the equation developed with the Gompertz model ($-0.1606 + 0.0574T + 0.0009T^2$, where T is temperature) predicted the specific growth rate of L. monocytogenes in the mixed fresh-cut vegetables more effectively than the equation developed with the Baranyi model ($0.3502 - 0.0496T + 0.0022T^2$).
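
The bias factor and accuracy factor used above are commonly defined in predictive microbiology (following Ross) as $B_f = 10^{\overline{\log_{10}(\text{pred}/\text{obs})}}$ and $A_f = 10^{\overline{|\log_{10}(\text{pred}/\text{obs})|}}$. The sketch below implements them with an MSE helper and evaluates the two reported quadratic secondary models. The abstract does not restate whether the quadratic output is SGR or its natural logarithm, so the values are printed untransformed; the `n_params` option reflects that some studies divide the residual sum of squares by degrees of freedom rather than n.

```python
import math

def mse(observed, predicted, n_params=0):
    """Residual sum of squares divided by (n - n_params); n_params=0 gives the plain mean."""
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return rss / (len(observed) - n_params)

def bias_factor(observed, predicted):
    """Bf = 10 ** mean(log10(pred / obs)); 1 means no systematic over- or under-prediction."""
    logs = [math.log10(p / o) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))

def accuracy_factor(observed, predicted):
    """Af = 10 ** mean(|log10(pred / obs)|); 1 means perfect agreement."""
    logs = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))

def quadratic(temp, coefs):
    """Secondary model a + b*T + c*T^2 with coefs = (a, b, c)."""
    a, b, c = coefs
    return a + b * temp + c * temp ** 2

# Coefficients as reported in the abstract (T in deg C).
GOMPERTZ_COEFS = (-0.1606, 0.0574, 0.0009)
BARANYI_COEFS = (0.3502, -0.0496, 0.0022)

for t in (4, 10, 20, 30):
    print(t, quadratic(t, GOMPERTZ_COEFS), quadratic(t, BARANYI_COEFS))
```

One prediction twice too high and one half too low give Bf = 1 but Af = 2, which is why the two factors are reported together: Bf can hide errors that cancel, while Af cannot.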

A Study on Retrieval of Storage Heat Flux in Urban Area (우리나라 도심지에서의 저장열 산출에 관한 연구)

  • Lee, Darae;Kim, Honghee;Lee, Sang-Hyun;Lee, Doo-Il;Hong, Jinkyu;Hong, Je-Woo;Lee, Keunmin;Lee, Kyeong-sang;Seo, Minji;Han, Kyung-Soo
    • Korean Journal of Remote Sensing / v.34 no.2_1 / pp.301-306 / 2018
  • Urbanization causes urban flooding and urban heat islands in summer, so changes in the thermal environment need to be understood through the urban climate and energy balance. In urban areas, unlike the typical energy balance, the storage heat flux held in buildings and artificial land cover must be considered. Because every city's environment is different, methods for retrieving storage heat flux from previous research are difficult to apply directly. In particular, most previous studies focused on cities outside Korea, so storage heat retrieval suited to the various land cover and building characteristics of Korean urban areas needs to be studied. The objective of this study is therefore to derive a regression formula that can quantitatively retrieve storage heat using data from an area where various surface types exist. To this end, nonlinear regression analysis was performed with net radiation and surface temperature as independent variables and flux-tower-based storage heat estimates as the dependent variable. The fitted regression coefficients were applied to each independent variable to derive the storage heat retrieval formula. Time series analysis against the flux-tower-based estimates showed that the formula simulated the daytime peak and the nighttime values well. Moreover, the method allowed more continuous retrieval of storage heat than the flux-tower-based estimates. Scatter plot analysis gave an accuracy of $50.14\,W\,m^{-2}$ and a bias of $-0.94\,W\,m^{-2}$ for the retrieved storage heat.
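
Bias and RMSE answer different questions, and they are linked by an exact identity: with population variance, $\text{RMSE}^2 = \text{bias}^2 + \sigma_e^2$, where $\sigma_e$ is the standard deviation of the errors. The sketch below checks the identity on toy numbers and applies it to the reported figures under the assumption (not stated in the abstract) that the 50.14 W m-2 "accuracy" is an RMSE.

```python
import math
from statistics import mean, pvariance

def bias(predicted, observed):
    """Mean error (predicted - observed); negative means underestimation on average."""
    return mean(p - o for p, o in zip(predicted, observed))

def rmse(predicted, observed):
    """Root mean square error."""
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))

def error_sd(predicted, observed):
    """Population standard deviation of the errors: the part of RMSE not explained by bias."""
    return math.sqrt(pvariance([p - o for p, o in zip(predicted, observed)]))

# Identity check on toy errors 1, 4, 2.
pred, obs = [1.0, 4.0, 2.0], [0.0, 0.0, 0.0]
print(rmse(pred, obs) ** 2, bias(pred, obs) ** 2 + error_sd(pred, obs) ** 2)

# Applied to the reported figures, assuming 50.14 W m-2 is an RMSE.
print(math.sqrt(50.14 ** 2 - 0.94 ** 2))
```

Under that assumption the random-error part is roughly 50.13 W m-2, so almost all of the error is scatter rather than systematic offset.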

Climate Change Impact on Nonpoint Source Pollution in a Rural Small Watershed (기후변화에 따른 농촌 소유역에서의 비점오염 영향 분석)

  • Hwang, Sye-Woon;Jang, Tae-Il;Park, Seung-Woo
    • Korean Journal of Agricultural and Forest Meteorology / v.8 no.4 / pp.209-221 / 2006
  • The purpose of this study is to analyze the effects of climate change on nonpoint source pollution in a small watershed using a mid-range model. The study area is a 384 ha rural basin that is 50% forest and 19% paddy. Hydrologic and water quality data were monitored from 1996 to 2004, and the feasibility of the GWLF (Generalized Watershed Loading Function) model for this small agricultural watershed was examined with the data from the study area. In a study on climate change, KEI (Korea Environment Institute) presented monthly variation ratios of rainfall in Korea based on a climate change scenario for rainfall and temperature. These ratios and the observed daily rainfall in Suwon for the 41 years from 1964 to 2004 were used to generate daily weather data with the stochastic weather generator model WGEN. Streamflow was calibrated with the 1996-1999 data and verified with the 2002-2004 data, giving coefficients of determination ($R^2$) of 0.70 to 0.91 and root mean square errors (RMSE) of 2.11 to 5.71. Water quality simulations for SS, TN, and TP gave $R^2$ values of 0.58, 0.47, and 0.62, respectively. The climate change results show that if the watershed's characteristics remain as they are at present, TN and TP pollutant loads are expected to increase markedly during the rainy season over the next fifty years.
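
The precipitation component of WGEN-type weather generators models daily rain occurrence with a two-state first-order Markov chain (probability of a wet day given a dry or wet previous day) and wet-day amounts with a gamma distribution. The sketch below shows that structure and a hypothetical way of applying a climate-change variation ratio by scaling the generated amounts. All parameter values are invented, a single ratio stands in for the monthly ratios, and the full WGEN also generates temperature and radiation, which is omitted here.

```python
import random

def generate_rainfall(n_days, p_wet_after_dry, p_wet_after_wet, shape, scale,
                      change_ratio=1.0, seed=None):
    """Daily rainfall from a two-state first-order Markov chain with gamma-distributed amounts.

    change_ratio is a hypothetical multiplicative climate-change adjustment (one value
    for brevity; a monthly scheme would look up a ratio per calendar month).
    """
    rng = random.Random(seed)
    wet = False  # assume the day before the series is dry
    series = []
    for _ in range(n_days):
        p_wet = p_wet_after_wet if wet else p_wet_after_dry
        wet = rng.random() < p_wet
        amount = rng.gammavariate(shape, scale) if wet else 0.0  # mean = shape * scale
        series.append(amount * change_ratio)
    return series

# Hypothetical parameters (probabilities per day, gamma shape, scale in mm).
baseline = generate_rainfall(365, 0.25, 0.55, 0.8, 12.0, seed=42)
scenario = generate_rainfall(365, 0.25, 0.55, 0.8, 12.0, change_ratio=1.15, seed=42)
print(sum(baseline), sum(scenario))
```

Using the same seed keeps the wet/dry sequence identical between the two runs, so the scenario differs from the baseline only through the applied ratio.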

Estimation of Chlorophyll-a Concentrations in the Nakdong River Using High-Resolution Satellite Image (고해상도 위성영상을 이용한 낙동강 유역의 클로로필-a 농도 추정)

  • Choe, Eun-Young;Lee, Jae-Woon;Lee, Jae-Kwan
    • Korean Journal of Remote Sensing / v.27 no.5 / pp.613-623 / 2011
  • This study assessed the feasibility of applying two-band and three-band reflectance models to a high-spatial-resolution image to estimate chlorophyll-a in turbid, productive waters that are smaller and narrower than the ocean. Such band ratio models have been applied successfully to chlorophyll-a concentrations in ocean and coastal waters using the Moderate Resolution Imaging Spectroradiometer (MODIS), the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), the Medium Resolution Imaging Spectrometer (MERIS), and others. Two-band and three-band models based on band ratios, such as those of the red and NIR bands, are generally used for Chl-a in turbid waters. A two-band model using the red and NIR bands of a RapidEye image gave no significant result ($R^2$ = 0.38). To enhance the band ratio between the absorption and reflection peaks, we used the red-edge band (710 nm) of the RapidEye image in the two-band and three-band models. The Red-RE two-band model and the Red-RE-NIR three-band reflectance model (with a cubic equation) performed significantly, with $R^2$ of 0.66 and 0.73, respectively. Their performance was rated 'Approximate Prediction', with RPD of 1.39 and 1.29 and RMSE of 24.8 and 22.4, respectively. Another three-band model with a quadratic equation performed similarly to the Red-RE two-band model. These findings demonstrate that two-band and three-band reflectance models using a red-edge band can approximately estimate chlorophyll-a concentrations in turbid river water from high-resolution satellite imagery. In the distribution map of estimated Chl-a concentrations, the three-band model with the cubic equation gave lower values than the two-band model. In further work, quantifying and correcting the spectral interference caused by suspended sediments and colored dissolved organic matter should improve the accuracy of chlorophyll-a estimation in turbid waters.
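
The two reflectance indices and the RPD statistic can be sketched as below. The index forms follow the usual red-edge formulation of these models, $R(\text{RE})/R(\text{Red})$ for two bands and $[R(\text{Red})^{-1} - R(\text{RE})^{-1}] \cdot R(\text{NIR})$ for three bands; that the study used exactly these forms is an assumption. RPD is taken as the sample standard deviation of the observations divided by the RMSE (conventions vary). The cubic calibration coefficients would come from fitting the index to in-situ Chl-a; none are given in the abstract, so the `cubic` helper is shown without real coefficients.

```python
import math
from statistics import stdev

def two_band_index(r_red, r_re):
    """Red-RE two-band index: R(red-edge) / R(red)."""
    return r_re / r_red

def three_band_index(r_red, r_re, r_nir):
    """Red-RE-NIR three-band index: (1/R(red) - 1/R(red-edge)) * R(NIR)."""
    return (1.0 / r_red - 1.0 / r_re) * r_nir

def cubic(x, coefs):
    """Calibration curve a + b*x + c*x^2 + d*x^3 (coefficients must be fitted to field data)."""
    a, b, c, d = coefs
    return a + b * x + c * x ** 2 + d * x ** 3

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def rpd(obs, pred):
    """Ratio of performance to deviation: sample SD of the observations divided by RMSE."""
    return stdev(obs) / rmse(obs, pred)

# Hypothetical reflectances and Chl-a values, for illustration only.
print(two_band_index(0.02, 0.03), three_band_index(0.02, 0.025, 0.03))
print(rpd([10, 20, 30, 40], [11, 19, 31, 39]))
```

An RPD near 1 means the model error is about as large as the natural spread of the data, which is why values around 1.3 to 1.4 are treated only as approximate prediction.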

The Comparison of Existing Synthetic Unit Hydrograph Method in Korea (국내 기존 합성단위도 방법의 비교)

  • Jeong, Seong-Won;Mun, Jang-Won
    • Journal of Korea Water Resources Association / v.34 no.6 / pp.659-672 / 2001
  • Generally, the design flood for a hydraulic structure is estimated by statistical analysis of runoff data. However, when runoff data are lacking, the statistical method is difficult to apply. In such cases the synthetic unit hydrograph method is generally used, and methods such as the Nakayasu, Snyder, SCS, and HYMO methods have been widely used in Korea. In this study, these methods and the KICT method, developed in 2000, are compared in 10 study areas. First, the peak flow and peak time of the representative unit hydrograph and of the synthetic unit hydrographs in each study area are compared; second, the shapes of the unit hydrographs are compared using the root mean square error (RMSE). The synthetic unit hydrograph from the Nakayasu method, developed in Japan, differs greatly from the representative unit hydrograph in peak flow, peak time, and shape, whereas the KICT method (2000) is superior to the others. The KICT method (2000) is also superior in its use of hydrologic and topographic data. Therefore, the Nakayasu method is not appropriate for hydrological practice, and the KICT model is considered a better method for estimating design floods. However, if another model, such as the SCS, Nakayasu, or HYMO method, is used, its parameters or regression equations must be adjusted through analysis of real data in Korea.
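
The two comparisons described above (peak flow and peak time, then shape via RMSE) can be sketched by interpolating both hydrographs onto a common time grid before differencing their ordinates. The hydrograph ordinates are hypothetical, both hydrographs are assumed to start at t = 0 and be zero outside their time base, and the grid step is a free choice; the abstract does not say how the study aligned the ordinates.

```python
import math

def interp(times, values, t):
    """Linear interpolation; the unit hydrograph is taken as zero outside its time base."""
    if t < times[0] or t > times[-1]:
        return 0.0
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return values[-1]

def peak(times, values):
    """(peak time, peak flow) of a hydrograph."""
    i = max(range(len(values)), key=values.__getitem__)
    return times[i], values[i]

def shape_rmse(t_ref, q_ref, t_syn, q_syn, step):
    """RMSE between two hydrographs sampled on a common grid from t = 0."""
    end = max(t_ref[-1], t_syn[-1])
    grid = [k * step for k in range(int(round(end / step)) + 1)]
    sq = [(interp(t_ref, q_ref, t) - interp(t_syn, q_syn, t)) ** 2 for t in grid]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical representative and synthetic unit hydrographs (hours, m^3/s).
ref_t, ref_q = [0, 1, 2, 3, 4], [0, 10, 6, 2, 0]
syn_t, syn_q = [0, 2, 4], [0, 8, 0]
print(peak(ref_t, ref_q), peak(syn_t, syn_q), shape_rmse(ref_t, ref_q, syn_t, syn_q, 1.0))
```

If hydrographs from basins of very different size are compared, normalizing ordinates by peak flow and time by peak time first keeps the RMSE focused on shape rather than magnitude.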
