• Title/Summary/Keyword: MACHINE LEARNING

Search Result 5,612, Processing Time 0.033 seconds

A Study on Improvement of Collaborative Filtering Based on Implicit User Feedback Using RFM Multidimensional Analysis (RFM 다차원 분석 기법을 활용한 암시적 사용자 피드백 기반 협업 필터링 개선 연구)

  • Lee, Jae-Seong;Kim, Jaeyoung;Kang, Byeongwook
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.139-161
    • /
    • 2019
  • The utilization of the e-commerce market has become a common life style in today. It has become important part to know where and how to make reasonable purchases of good quality products for customers. This change in purchase psychology tends to make it difficult for customers to make purchasing decisions in vast amounts of information. In this case, the recommendation system has the effect of reducing the cost of information retrieval and improving the satisfaction by analyzing the purchasing behavior of the customer. Amazon and Netflix are considered to be the well-known examples of sales marketing using the recommendation system. In the case of Amazon, 60% of the recommendation is made by purchasing goods, and 35% of the sales increase was achieved. Netflix, on the other hand, found that 75% of movie recommendations were made using services. This personalization technique is considered to be one of the key strategies for one-to-one marketing that can be useful in online markets where salespeople do not exist. Recommendation techniques that are mainly used in recommendation systems today include collaborative filtering and content-based filtering. Furthermore, hybrid techniques and association rules that use these techniques in combination are also being used in various fields. Of these, collaborative filtering recommendation techniques are the most popular today. Collaborative filtering is a method of recommending products preferred by neighbors who have similar preferences or purchasing behavior, based on the assumption that users who have exhibited similar tendencies in purchasing or evaluating products in the past will have a similar tendency to other products. However, most of the existed systems are recommended only within the same category of products such as books and movies. This is because the recommendation system estimates the purchase satisfaction about new item which have never been bought yet using customer's purchase rating points of a similar commodity based on the transaction data. In addition, there is a problem about the reliability of purchase ratings used in the recommendation system. Reliability of customer purchase ratings is causing serious problems. In particular, 'Compensatory Review' refers to the intentional manipulation of a customer purchase rating by a company intervention. In fact, Amazon has been hard-pressed for these "compassionate reviews" since 2016 and has worked hard to reduce false information and increase credibility. The survey showed that the average rating for products with 'Compensated Review' was higher than those without 'Compensation Review'. And it turns out that 'Compensatory Review' is about 12 times less likely to give the lowest rating, and about 4 times less likely to leave a critical opinion. As such, customer purchase ratings are full of various noises. This problem is directly related to the performance of recommendation systems aimed at maximizing profits by attracting highly satisfied customers in most e-commerce transactions. In this study, we propose the possibility of using new indicators that can objectively substitute existing customer 's purchase ratings by using RFM multi-dimensional analysis technique to solve a series of problems. RFM multi-dimensional analysis technique is the most widely used analytical method in customer relationship management marketing(CRM), and is a data analysis method for selecting customers who are likely to purchase goods. As a result of verifying the actual purchase history data using the relevant index, the accuracy was as high as about 55%. This is a result of recommending a total of 4,386 different types of products that have never been bought before, thus the verification result means relatively high accuracy and utilization value. And this study suggests the possibility of general recommendation system that can be applied to various offline product data. If additional data is acquired in the future, the accuracy of the proposed recommendation system can be improved.

A quantitative study on the minimal pair of Korean phonemes: Focused on syllable-initial consonants (한국어 음소 최소대립쌍의 계량언어학적 연구: 초성 자음을 중심으로)

  • Jung, Jieun
    • Phonetics and Speech Sciences
    • /
    • v.11 no.1
    • /
    • pp.29-40
    • /
    • 2019
  • The paper investigates the minimal pair of Korean phonemes quantitatively. To achieve this goal, I calculated the number of consonant minimal pairs in the syllable-initial position as both raw counts and relative counts, and analyzed the part of speech relations of the two words in the minimal pair. "Urimalsaem" was chosen as the object of this study because it was judged that the minimal pair analysis should be done through a dictionary and it is the largest among Korean dictionaries. The results of the study are summarized as follows. First, there were 153 types of minimal pairs out of 337,135 examples. The ranking of phoneme pairs from highest to lowest was 'ㅅ-ㅈ, ㄱ-ㅅ, ㄱ-ㅈ, ㄱ-ㅂ, ㄱ-ㅎ, ${\ldots}$, ㅆ-ㅋ, ㄸ-ㅋ, ㅉ-ㅋ, ㄹ-ㅃ, ㅃ-ㅋ'. The phonemes that played a major role in the formation of the minimal pair were /ㄱ, ㅅ, ㅈ, ㅂ, ㅊ/, in that order, which showed a high proportion of palatals. The correlation between the raw count of minimal pairs and the relative count of minimal pairs was found to be quite high r=0.937. Second, 87.91% of the minimal pairs shared the part of speech (same syntactic category). The most frequently observed type has been 'noun-noun' pair (70.25%), and 'vowel-vowel' pair (14.77%) was the next ranking. It can be indicated that the minimal pair could be grouped into similar categories in terms of semantics. The results of this study can be useful for various research in Korean linguistics, speech-language pathology, language education, language acquisition, speech synthesis, and artificial intelligence-machine learning as basic data related to Korean phonemes.

A Case Study: Improvement of Wind Risk Prediction by Reclassifying the Detection Results (풍해 예측 결과 재분류를 통한 위험 감지확률의 개선 연구)

  • Kim, Soo-ock;Hwang, Kyu-Hong
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.23 no.3
    • /
    • pp.149-155
    • /
    • 2021
  • Early warning systems for weather risk management in the agricultural sector have been developed to predict potential wind damage to crops. These systems take into account the daily maximum wind speed to determine the critical wind speed that causes fruit drops and provide the weather risk information to farmers. In an effort to increase the accuracy of wind risk predictions, an artificial neural network for binary classification was implemented. In the present study, the daily wind speed and other weather data, which were measured at weather stations at sites of interest in Jeollabuk-do and Jeollanam-do as well as Gyeongsangbuk- do and part of Gyeongsangnam- do provinces in 2019, were used for training the neural network. These weather stations include 210 synoptic and automated weather stations operated by the Korean Meteorological Administration (KMA). The wind speed data collected at the same locations between January 1 and December 12, 2020 were used to validate the neural network model. The data collected from December 13, 2020 to February 18, 2021 were used to evaluate the wind risk prediction performance before and after the use of the artificial neural network. The critical wind speed of damage risk was determined to be 11 m/s, which is the wind speed reported to cause fruit drops and damages. Furthermore, the maximum wind speeds were expressed using Weibull distribution probability density function for warning of wind damage. It was found that the accuracy of wind damage risk prediction was improved from 65.36% to 93.62% after re-classification using the artificial neural network. Nevertheless, the error rate also increased from 13.46% to 37.64%, as well. It is likely that the machine learning approach used in the present study would benefit case studies where no prediction by risk warning systems becomes a relatively serious issue.

Spectral Band Selection for Detecting Fire Blight Disease in Pear Trees by Narrowband Hyperspectral Imagery (초분광 이미지를 이용한 배나무 화상병에 대한 최적 분광 밴드 선정)

  • Kang, Ye-Seong;Park, Jun-Woo;Jang, Si-Hyeong;Song, Hye-Young;Kang, Kyung-Suk;Ryu, Chan-Seok;Kim, Seong-Heon;Jun, Sae-Rom;Kang, Tae-Hwan;Kim, Gul-Hwan
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.23 no.1
    • /
    • pp.15-33
    • /
    • 2021
  • In this study, the possibility of discriminating Fire blight (FB) infection tested using the hyperspectral imagery. The reflectance of healthy and infected leaves and branches was acquired with 5 nm of full width at high maximum (FWHM) and then it was standardized to 10 nm, 25 nm, 50 nm, and 80 nm of FWHM. The standardized samples were divided into training and test sets at ratios of 7:3, 5:5 and 3:7 to find the optimal bands of FWHM by the decision tree analysis. Classification accuracy was evaluated using overall accuracy (OA) and kappa coefficient (KC). The hyperspectral reflectance of infected leaves and branches was significantly lower than those of healthy green, red-edge (RE) and near infrared (NIR) regions. The bands selected for the first node were generally 750 and 800 nm; these were used to identify the infection of leaves and branches, respectively. The accuracy of the classifier was higher in the 7:3 ratio. Four bands with 50 nm of FWHM (450, 650, 750, and 950 nm) might be reasonable because the difference in the recalculated accuracy between 8 bands with 10 nm of FWHM (440, 580, 640, 660, 680, 710, 730, and 740 nm) and 4 bands was only 1.8% for OA and 4.1% for KC, respectively. Finally, adding two bands (550 nm and 800 nm with 25 nm of FWHM) in four bands with 50 nm of FWHM have been proposed to improve the usability of multispectral image sensors with performing various roles in agriculture as well as detecting FB with other combinations of spectral bands.

A Comparison between the Reference Evapotranspiration Products for Croplands in Korea: Case Study of 2016-2019 (우리나라 농지의 기준증발산 격자자료 비교평가: 2016-2019년의 사례연구)

  • Kim, Seoyeon;Jeong, Yemin;Cho, Subin;Youn, Youjeong;Kim, Nari;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.6_1
    • /
    • pp.1465-1483
    • /
    • 2020
  • Evapotranspiration is a concept that includes the evaporation from soil and the transpiration from the plant leaf. It is an essential factor for monitoring water balance, drought, crop growth, and climate change. Actual evapotranspiration (AET) corresponds to the consumption of water from the land surface and the necessary amount of water for the land surface. Because the AET is derived from multiplying the crop coefficient by the reference evapotranspiration (ET0), an accurate calculation of the ET0 is required for the AET. To date, many efforts have been made for gridded ET0 to provide multiple products now. This study presents a comparison between the ET0 products such as FAO56-PM, LDAPS, PKNU-NMSC, and MODIS to find out which one is more suitable for the local-scale hydrological and agricultural applications in Korea, where the heterogeneity of the land surface is critical. In the experiment for the period between 2016 and 2019, the daily and 8-day products were compared with the in-situ observations by KMA. The analyses according to the station, year, month, and time-series showed that the PKNU-NMSC product with a successful optimization for Korea was superior to the others, yielding stable accuracy irrespective of space and time. Also, this paper showed the intrinsic characteristics of the FAO56-PM, LDAPS, and MODIS ET0 products that could be informative for other researchers.

On Using Near-surface Remote Sensing Observation for Evaluation Gross Primary Productivity and Net Ecosystem CO2 Partitioning (근거리 원격탐사 기법을 이용한 총일차생산량 추정 및 순생태계 CO2 교환량 배분의 정확도 평가에 관하여)

  • Park, Juhan;Kang, Minseok;Cho, Sungsik;Sohn, Seungwon;Kim, Jongho;Kim, Su-Jin;Lim, Jong-Hwan;Kang, Mingu;Shim, Kyo-Moon
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.23 no.4
    • /
    • pp.251-267
    • /
    • 2021
  • Remotely sensed vegetation indices (VIs) are empirically related with gross primary productivity (GPP) in various spatio-temporal scales. The uncertainties in GPP-VI relationship increase with temporal resolution. Uncertainty also exists in the eddy covariance (EC)-based estimation of GPP, arising from the partitioning of the measured net ecosystem CO2 exchange (NEE) into GPP and ecosystem respiration (RE). For two forests and two agricultural sites, we correlated the EC-derived GPP in various time scales with three different near-surface remotely sensed VIs: (1) normalized difference vegetation index (NDVI), (2) enhanced vegetation index (EVI), and (3) near infrared reflectance from vegetation (NIRv) along with NIRvP (i.e., NIRv multiplied by photosynthetically active radiation, PAR). Among the compared VIs, NIRvP showed highest correlation with half-hourly and monthly GPP at all sites. The NIRvP was used to test the reliability of GPP derived by two different NEE partitioning methods: (1) original KoFlux methods (GPPOri) and (2) machine-learning based method (GPPANN). GPPANN showed higher correlation with NIRvP at half-hourly time scale, but there was no difference at daily time scale. The NIRvP-GPP correlation was lower under clear sky conditions due to co-limitation of GPP by other environmental conditions such as air temperature, vapor pressure deficit and soil moisture. However, under cloudy conditions when photosynthesis is mainly limited by radiation, the use of NIRvP was more promising to test the credibility of NEE partitioning methods. Despite the necessity of further analyses, the results suggest that NIRvP can be used as the proxy of GPP at high temporal-scale. However, for the VIs-based GPP estimation with high temporal resolution to be meaningful, complex systems-based analysis methods (related to systems thinking and self-organization that goes beyond the empirical VIs-GPP relationship) should be developed.

The Dynamics of CO2 Budget in Gwangneung Deciduous Old-growth Forest: Lessons from the 15 years of Monitoring (광릉 낙엽활엽수 노령림의 CO2 수지 역학: 15년 관측으로부터의 교훈)

  • Yang, Hyunyoung;Kang, Minseok;Kim, Joon;Ryu, Daun;Kim, Su-Jin;Chun, Jung-Hwa;Lim, Jong-Hwan;Park, Chan Woo;Yun, Soon Jin
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.23 no.4
    • /
    • pp.198-221
    • /
    • 2021
  • After large-scale reforestation in the 1960s and 1970s, forests in Korea have gradually been aging. Net ecosystem CO2 exchange of old-growth forests is theoretically near zero; however, it can be a CO2 sink or source depending on the intervention of disturbance or management. In this study, we report the CO2 budget dynamics of the Gwangneung deciduous old-growth forest (GDK) in Korea and examined the following two questions: (1) is the preserved GDK indeed CO2 neutral as theoretically known? and (2) can we explain the dynamics of CO2 budget by the common mechanisms reported in the literature? To answer, we analyzed the 15-year long CO2 flux data measured by eddy covariance technique along with other biometeorological data at the KoFlux GDK site from 2006 to 2020. The results showed that (1) GDK switched back-and-forth between sink and source of CO2 but averaged to be a week CO2 source (and turning to a moderate CO2 source for the recent five years) and (2) the interannual variability of solar radiation, growing season length, and leaf area index showed a positive correlation with that of gross primary production (GPP) (R2=0.32~0.45); whereas the interannual variability of both air and surface temperature was not significantly correlated with that of ecosystem respiration (RE). Furthermore, the machine learning-based model trained using the dataset of early monitoring period (first 10 years) failed to reproduce the observed interannual variations of GPP and RE for the recent five years. Biomass data analysis suggests that carbon emissions from coarse woody debris may have contributed partly to the conversion to a moderate CO2 source. To properly understand and interpret the long-term CO2 budget dynamics of GDK, new framework of analysis and modeling based on complex systems science is needed. Also, it is important to maintain the flux monitoring and data quality along with the monitoring of coarse woody debris and disturbances.

Modeling of Vegetation Phenology Using MODIS and ASOS Data (MODIS와 ASOS 자료를 이용한 식물계절 모델링)

  • Kim, Geunah;Youn, Youjeong;Kang, Jonggu;Choi, Soyeon;Park, Ganghyun;Chun, Junghwa;Jang, Keunchang;Won, Myoungsoo;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.5_1
    • /
    • pp.627-646
    • /
    • 2022
  • Recently, the seriousness of climate change-related problems caused by global warming is growing, and the average temperature is also rising. As a result, it is affecting the environment in which various temperature-sensitive creatures and creatures live, and changes in the ecosystem are also being detected. Seasons are one of the important factors influencing the types, distribution, and growth characteristics of creatures living in the area. Among the most popular and easily recognized plant seasonal phenomena among the indicators of the climate change impact evaluation, the blooming day of flower and the peak day of autumn leaves were modeled. The types of plants used in the modeling were forsythia and cherry trees, which can be seen as representative plants of spring, and maple and ginkgo, which can be seen as representative plants of autumn. Weather data used to perform modeling were temperature, precipitation, and solar radiation observed through the ASOS Observatory of the Korea Meteorological Administration. As satellite data, MODIS NDVI was used for modeling, and it has a correlation coefficient of about -0.2 for the flowering date and 0.3 for the autumn leaves peak date. As the model used, the model was established using multiple regression models, which are linear models, and Random Forest, which are nonlinear models. In addition, the predicted values estimated by each model were expressed as isopleth maps using spatial interpolation techniques to express the trend of plant seasonal changes from 2003 to 2020. It is believed that using NDVI with high spatio-temporal resolution in the future will increase the accuracy of plant phenology modeling.

Automated Analyses of Ground-Penetrating Radar Images to Determine Spatial Distribution of Buried Cultural Heritage (매장 문화재 공간 분포 결정을 위한 지하투과레이더 영상 분석 자동화 기법 탐색)

  • Kwon, Moonhee;Kim, Seung-Sep
    • Economic and Environmental Geology
    • /
    • v.55 no.5
    • /
    • pp.551-561
    • /
    • 2022
  • Geophysical exploration methods are very useful for generating high-resolution images of underground structures, and such methods can be applied to investigation of buried cultural properties and for determining their exact locations. In this study, image feature extraction and image segmentation methods were applied to automatically distinguish the structures of buried relics from the high-resolution ground-penetrating radar (GPR) images obtained at the center of Silla Kingdom, Gyeongju, South Korea. The major purpose for image feature extraction analyses is identifying the circular features from building remains and the linear features from ancient roads and fences. Feature extraction is implemented by applying the Canny edge detection and Hough transform algorithms. We applied the Hough transforms to the edge image resulted from the Canny algorithm in order to determine the locations the target features. However, the Hough transform requires different parameter settings for each survey sector. As for image segmentation, we applied the connected element labeling algorithm and object-based image analysis using Orfeo Toolbox (OTB) in QGIS. The connected components labeled image shows the signals associated with the target buried relics are effectively connected and labeled. However, we often find multiple labels are assigned to a single structure on the given GPR data. Object-based image analysis was conducted by using a Large-Scale Mean-Shift (LSMS) image segmentation. In this analysis, a vector layer containing pixel values for each segmented polygon was estimated first and then used to build a train-validation dataset by assigning the polygons to one class associated with the buried relics and another class for the background field. With the Random Forest Classifier, we find that the polygons on the LSMS image segmentation layer can be successfully classified into the polygons of the buried relics and those of the background. Thus, we propose that these automatic classification methods applied to the GPR images of buried cultural heritage in this study can be useful to obtain consistent analyses results for planning excavation processes.

A study on improving the accuracy of machine learning models through the use of non-financial information in predicting the Closure of operator using electronic payment service (전자결제서비스 이용 사업자 폐업 예측에서 비재무정보 활용을 통한 머신러닝 모델의 정확도 향상에 관한 연구)

  • Hyunjeong Gong;Eugene Hwang;Sunghyuk Park
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.361-381
    • /
    • 2023
  • Research on corporate bankruptcy prediction has been focused on financial information. Since the company's financial information is updated quarterly, there is a problem that timeliness is insufficient in predicting the possibility of a company's business closure in real time. Evaluated companies that want to improve this need a method of judging the soundness of a company that uses information other than financial information to judge the soundness of a target company. To this end, as information technology has made it easier to collect non-financial information about companies, research has been conducted to apply additional variables and various methodologies other than financial information to predict corporate bankruptcy. It has become an important research task to determine whether it has an effect. In this study, we examined the impact of electronic payment-related information, which constitutes non-financial information, when predicting the closure of business operators using electronic payment service and examined the difference in closure prediction accuracy according to the combination of financial and non-financial information. Specifically, three research models consisting of a financial information model, a non-financial information model, and a combined model were designed, and the closure prediction accuracy was confirmed with six algorithms including the Multi Layer Perceptron (MLP) algorithm. The model combining financial and non-financial information showed the highest prediction accuracy, followed by the non-financial information model and the financial information model in order. As for the prediction accuracy of business closure by algorithm, XGBoost showed the highest prediction accuracy among the six algorithms. As a result of examining the relative importance of a total of 87 variables used to predict business closure, it was confirmed that more than 70% of the top 20 variables that had a significant impact on the prediction of business closure were non-financial information. Through this, it was confirmed that electronic payment-related information of non-financial information is an important variable in predicting business closure, and the possibility of using non-financial information as an alternative to financial information was also examined. Based on this study, the importance of collecting and utilizing non-financial information as information that can predict business closure is recognized, and a plan to utilize it for corporate decision-making is also proposed.