• Title/Summary/Keyword: High-Frequency Refinement

Search Result 22, Processing Time 0.015 seconds

Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach (온라인 리뷰 분석을 통한 상품 평가 기준 추출: LDA 및 k-최근접 이웃 접근법을 활용하여)

  • Lee, Ji Hyeon;Jung, Sang Hyung;Kim, Jun Ho;Min, Eun Joo;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.97-117
    • /
    • 2020
  • Product evaluation criteria is an indicator describing attributes or values of products, which enable users or manufacturers measure and understand the products. When companies analyze their products or compare them with competitors, appropriate criteria must be selected for objective evaluation. The criteria should show the features of products that consumers considered when they purchased, used and evaluated the products. However, current evaluation criteria do not reflect different consumers' opinion from product to product. Previous studies tried to used online reviews from e-commerce sites that reflect consumer opinions to extract the features and topics of products and use them as evaluation criteria. However, there is still a limit that they produce irrelevant criteria to products due to extracted or improper words are not refined. To overcome this limitation, this research suggests LDA-k-NN model which extracts possible criteria words from online reviews by using LDA and refines them with k-nearest neighbor. Proposed approach starts with preparation phase, which is constructed with 6 steps. At first, it collects review data from e-commerce websites. Most e-commerce websites classify their selling items by high-level, middle-level, and low-level categories. Review data for preparation phase are gathered from each middle-level category and collapsed later, which is to present single high-level category. Next, nouns, adjectives, adverbs, and verbs are extracted from reviews by getting part of speech information using morpheme analysis module. After preprocessing, words per each topic from review are shown with LDA and only nouns in topic words are chosen as potential words for criteria. Then, words are tagged based on possibility of criteria for each middle-level category. Next, every tagged word is vectorized by pre-trained word embedding model. Finally, k-nearest neighbor case-based approach is used to classify each word with tags. After setting up preparation phase, criteria extraction phase is conducted with low-level categories. This phase starts with crawling reviews in the corresponding low-level category. Same preprocessing as preparation phase is conducted using morpheme analysis module and LDA. Possible criteria words are extracted by getting nouns from the data and vectorized by pre-trained word embedding model. Finally, evaluation criteria are extracted by refining possible criteria words using k-nearest neighbor approach and reference proportion of each word in the words set. To evaluate the performance of the proposed model, an experiment was conducted with review on '11st', one of the biggest e-commerce companies in Korea. Review data were from 'Electronics/Digital' section, one of high-level categories in 11st. For performance evaluation of suggested model, three other models were used for comparing with the suggested model; actual criteria of 11st, a model that extracts nouns by morpheme analysis module and refines them according to word frequency, and a model that extracts nouns from LDA topics and refines them by word frequency. The performance evaluation was set to predict evaluation criteria of 10 low-level categories with the suggested model and 3 models above. Criteria words extracted from each model were combined into a single words set and it was used for survey questionnaires. In the survey, respondents chose every item they consider as appropriate criteria for each category. Each model got its score when chosen words were extracted from that model. The suggested model had higher scores than other models in 8 out of 10 low-level categories. By conducting paired t-tests on scores of each model, we confirmed that the suggested model shows better performance in 26 tests out of 30. In addition, the suggested model was the best model in terms of accuracy. This research proposes evaluation criteria extracting method that combines topic extraction using LDA and refinement with k-nearest neighbor approach. This method overcomes the limits of previous dictionary-based models and frequency-based refinement models. This study can contribute to improve review analysis for deriving business insights in e-commerce market.

Detection of Wildfire Burned Areas in California Using Deep Learning and Landsat 8 Images (딥러닝과 Landsat 8 영상을 이용한 캘리포니아 산불 피해지 탐지)

  • Youngmin Seo;Youjeong Youn;Seoyeon Kim;Jonggu Kang;Yemin Jeong;Soyeon Choi;Yungyo Im;Yangwon Lee
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_1
    • /
    • pp.1413-1425
    • /
    • 2023
  • The increasing frequency of wildfires due to climate change is causing extreme loss of life and property. They cause loss of vegetation and affect ecosystem changes depending on their intensity and occurrence. Ecosystem changes, in turn, affect wildfire occurrence, causing secondary damage. Thus, accurate estimation of the areas affected by wildfires is fundamental. Satellite remote sensing is used for forest fire detection because it can rapidly acquire topographic and meteorological information about the affected area after forest fires. In addition, deep learning algorithms such as convolutional neural networks (CNN) and transformer models show high performance for more accurate monitoring of fire-burnt regions. To date, the application of deep learning models has been limited, and there is a scarcity of reports providing quantitative performance evaluations for practical field utilization. Hence, this study emphasizes a comparative analysis, exploring performance enhancements achieved through both model selection and data design. This study examined deep learning models for detecting wildfire-damaged areas using Landsat 8 satellite images in California. Also, we conducted a comprehensive comparison and analysis of the detection performance of multiple models, such as U-Net and High-Resolution Network-Object Contextual Representation (HRNet-OCR). Wildfire-related spectral indices such as normalized difference vegetation index (NDVI) and normalized burn ratio (NBR) were used as input channels for the deep learning models to reflect the degree of vegetation cover and surface moisture content. As a result, the mean intersection over union (mIoU) was 0.831 for U-Net and 0.848 for HRNet-OCR, showing high segmentation performance. The inclusion of spectral indices alongside the base wavelength bands resulted in increased metric values for all combinations, affirming that the augmentation of input data with spectral indices contributes to the refinement of pixels. This study can be applied to other satellite images to build a recovery strategy for fire-burnt areas.