• Title/Summary/Keyword: 온라인 기계학습 모델

Search Result 34, Processing Time 0.033 seconds

Developing a deep learning-based recommendation model using online reviews for predicting consumer preferences: Evidence from the restaurant industry (딥러닝 기반 온라인 리뷰를 활용한 추천 모델 개발: 레스토랑 산업을 중심으로)

  • Dongeon Kim;Dongsoo Jang;Jinzhe Yan;Jiaen Li
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.31-49
    • /
    • 2023
  • With the growth of the food-catering industry, consumer preferences and the number of dine-in restaurants are gradually increasing. Thus, personalized recommendation services are required to select a restaurant suitable for consumer preferences. Previous studies have used questionnaires and star-rating approaches, which do not effectively depict consumer preferences. Online reviews are the most essential sources of information in this regard. However, previous studies have aggregated online reviews into long documents, and traditional machine-learning methods have been applied to these to extract semantic representations; however, such approaches fail to consider the surrounding word or context. Therefore, this study proposes a novel review textual-based restaurant recommendation model (RT-RRM) that uses deep learning to effectively extract consumer preferences from online reviews. The proposed model concatenates consumer-restaurant interactions with the extracted high-level semantic representations and predicts consumer preferences accurately and effectively. Experiments on real-world datasets show that the proposed model exhibits excellent recommendation performance compared with several baseline models.

Social Issue Risk Type Classification based on Social Bigdata (소셜 빅데이터 기반 사회적 이슈 리스크 유형 분류)

  • Oh, Hyo-Jung;An, Seung-Kwon;Kim, Yong
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.8
    • /
    • pp.1-9
    • /
    • 2016
  • In accordance with the increased political and social utilization of social media, demands on online trend analysis and monitoring technologies based on social bigdata are also increasing rapidly. In this paper, we define 'risk' as issues which have probability of turn to negative public opinion among big social issues and classify their types in details. To define risk types, we conduct a complete survey on news documents and analyzed characteristics according to issue domains. We also investigate cross-medias analysis to find out how different public media and personalized social media. At the result, we define 58 risk types for 6 domains and developed automatic classification model based on machine learning algorithm. Based on empirical experiments, we prove the possibility of automatic detection for social issue risk in social media.

Inference of Korean Public Sentiment from Online News (온라인 뉴스에 대한 한국 대중의 감정 예측)

  • Matteson, Andrew Stuart;Choi, Soon-Young;Lim, Heui-Seok
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.7
    • /
    • pp.25-31
    • /
    • 2018
  • Online news has replaced the traditional newspaper and has brought about a profound transformation in the way we access and share information. News websites have had the ability for users to post comments for quite some time, and some have also begun to crowdsource reactions to news articles. The field of sentiment analysis seeks to computationally model the emotions and reactions experienced when presented with text. In this work, we analyze more than 100,000 news articles over ten categories with five user-generated emotional annotations to determine whether or not these reactions have a mathematical correlation to the news body text and propose a simple sentiment analysis algorithm that requires minimal preprocessing and no machine learning. We show that it is effective even for a morphologically complex language like Korean.

A Box Office Type Classification and Prediction Model Based on Automated Machine Learning for Maximizing the Commercial Success of the Korean Film Industry (한국 영화의 산업의 흥행 극대화를 위한 AutoML 기반의 박스오피스 유형 분류 및 예측 모델)

  • Subeen Leem;Jihoon Moon;Seungmin Rho
    • Journal of Platform Technology
    • /
    • v.11 no.3
    • /
    • pp.45-55
    • /
    • 2023
  • This paper presents a model that supports decision-makers in the Korean film industry to maximize the success of online movies. To achieve this, we collected historical box office movies and clustered them into types to propose a model predicting each type's online box office performance. We considered various features to identify factors contributing to movie success and reduced feature dimensionality for computational efficiency. We systematically classified the movies into types and predicted each type's online box office performance while analyzing the contributing factors. We used automated machine learning (AutoML) techniques to automatically propose and select machine learning algorithms optimized for the problem, allowing for easy experimentation and selection of multiple algorithms. This approach is expected to provide a foundation for informed decision-making and contribute to better performance in the film industry.

  • PDF

Robust Sentence Boundary Detection for Korean SNS Documents (한국어 SNS 문서에 적합한 문장 경계 인식)

  • Yeom, Haram;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology
    • /
    • 2021.10a
    • /
    • pp.532-535
    • /
    • 2021
  • 다양한 SNS 플랫폼이 등장하고, 이용자 수가 급증함에 따라 온라인에서 얻을 수 있는 정보의 활용 가치가 높아지고 있다. 문장은 자연어 처리 시스템의 기본적인 단위이므로 주어진 문서로부터 문장의 경계를 인식하는 작업이 필수적이다. 공개된 문장 경계 인식기는 SNS 문서에서 좋은 성능을 보이지 않는다. 본 논문에서는 문어체로 구성된 일반 문서뿐 아니라 SNS 문서에서 사용할 수 있는 문장 경계 인식기를 제안한다. 본 논문에서는 SNS 문서에 적용하기 위해 다음과 같은 두 가지를 개선한다. 1) 학습 말뭉치를 일반문서와 SNS 문서 두 영역으로 확장하고, 2) 이모티콘을 사용하는 SNS 문서의 특징을 반영하는 어절의 유형을 자질로 추가하여 성능을 개선한다. 실험을 통해서 추가된 자질의 기여도를 분석하고, 또한 기존의 한국어 문장 경계 인식기와 제안한 모델의 성능을 비교·분석하였다. 개선된 모델은 일반 문서에서 99.1%의 재현율을 보이며, SNS 문서에서 88.4%의 재현율을 보였다. 두 영역 모두에서 문장 경계 인식이 잘 이루어지는 것을 확인할 수 있었다.

  • PDF

A Study on Representation Difficulty Assessment of Educational Presentation Materials (교육용 슬라이드의 표현적 난이도 측정에 관한 연구)

  • Kim, Seongchan;Song, Sa-Kwang;Yi, Mun Y.
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.04a
    • /
    • pp.1054-1056
    • /
    • 2017
  • 대학과 같은 고등교육 현장에서 슬라이드는 수업 또는 세미나에 많이 활용되는 매체이다. 최근에는 SlideShare 등의 슬라이드 전용 공유 플랫폼까지 등장하며 온라인상에 더 많은 교육용 슬라이드가 축적되고 있다. 이 연구에서는 이러한 슬라이드 형태의 교육 자료에 대해 인식하기 쉽고 어려운 정도인 표현적 난이도를 자동으로 측정하는 기법을 제안한다. 제안하는 60개의 자질을 활용하여 기계학습 모델을 구축하고 표현적으로 고 난이도와 저 난이도의 슬라이드를 효과적으로 구분한다. 정밀하게 파악된 난이도 정보는 콘텐츠 선택에 있어 사용자 편의성을 획기적으로 증대시켜 줄 수 있다.

A Study on the Fraud Detection in an Online Second-hand Market by Using Topic Modeling and Machine Learning (토픽 모델링과 머신 러닝 방법을 이용한 온라인 C2C 중고거래 시장에서의 사기 탐지 연구)

  • Dongwoo Lee;Jinyoung Min
    • Information Systems Review
    • /
    • v.23 no.4
    • /
    • pp.45-67
    • /
    • 2021
  • As the transaction volume of the C2C second-hand market is growing, the number of frauds, which intend to earn unfair gains by sending products different from specified ones or not sending them to buyers, is also increasing. This study explores the model that can identify frauds in the online C2C second-hand market by examining the postings for transactions. For this goal, this study collected 145,536 field data from actual C2C second-hand market. Then, the model is built with the characteristics from postings such as the topic and the linguistic characteristics of the product description, and the characteristics of products, postings, sellers, and transactions. The constructed model is then trained by the machine learning algorithm XGBoost. The final analysis results show that fraudulent postings have less information, which is also less specific, fewer nouns and images, a higher ratio of the number and white space, and a shorter length than genuine postings do. Also, while the genuine postings are focused on the product information for nouns, delivery information for verbs, and actions for adjectives, the fraudulent postings did not show those characteristics. This study shows that the various features can be extracted from postings written in C2C second-hand transactions and be used to construct an effective model for frauds. The proposed model can be also considered and applied for the other C2C platforms. Overall, the model proposed in this study can be expected to have positive effects on suppressing and preventing fraudulent behavior in online C2C markets.

A Design on Informal Big Data Topic Extraction System Based on Spark Framework (Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계)

  • Park, Kiejin
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.11
    • /
    • pp.521-526
    • /
    • 2016
  • As on-line informal text data have massive in its volume and have unstructured characteristics in nature, there are limitations in applying traditional relational data model technologies for data storage and data analysis jobs. Moreover, using dynamically generating massive social data, social user's real-time reaction analysis tasks is hard to accomplish. In the paper, to capture easily the semantics of massive and informal on-line documents with unsupervised learning mechanism, we design and implement automatic topic extraction systems according to the mass of the words that consists a document. The input data set to the proposed system are generated first, using N-gram algorithm to build multiple words to capture the meaning of the sentences precisely, and Hadoop and Spark (In-memory distributed computing framework) are adopted to run topic model. In the experiment phases, TB level input data are processed for data preprocessing and proposed topic extraction steps are applied. We conclude that the proposed system shows good performance in extracting meaningful topics in time as the intermediate results come from main memories directly instead of an HDD reading.

Improving Accuracy of Noise Review Filtering for Places with Insufficient Training Data

  • Hyeon Gyu Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.7
    • /
    • pp.19-27
    • /
    • 2023
  • In the process of collecting social reviews, a number of noise reviews irrelevant to a given search keyword can be included in the search results. To filter out such reviews, machine learning can be used. However, if the number of reviews is insufficient for a target place to be analyzed, filtering accuracy can be degraded due to the lack of training data. To resolve this issue, we propose a supervised learning method to improve accuracy of the noise review filtering for the places with insufficient reviews. In the proposed method, training is not performed by an individual place, but by a group including several places with similar characteristics. The classifier obtained through the training can be used for the noise review filtering of an arbitrary place belonging to the group, so the problem of insufficient training data can be resolved. To verify the proposed method, a noise review filtering model was implemented using LSTM and BERT, and filtering accuracy was checked through experiments using real data collected online. The experimental results show that the accuracy of the proposed method was 92.4% on the average, and it provided 87.5% accuracy when targeting places with less than 100 reviews.

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.71-89
    • /
    • 2016
  • Sentiment analysis is used for identifying emotions or sentiments embedded in the user generated data such as customer reviews from blogs, social network services, and so on. Various research fields such as computer science and business management can take advantage of this feature to analyze customer-generated opinions. In previous studies, the star rating of a review is regarded as the same as sentiment embedded in the text. However, it does not always correspond to the sentiment polarity. Due to this supposition, previous studies have some limitations in their accuracy. To solve this issue, the present study uses a supervised sentiment classification model to measure a more accurate sentiment polarity. This study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, the Support Vector Machines (SVM) and Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier and are analyzed by statistical correlations between movie reviews and box-office success. Movie reviews are collected along with a star-rate. The dataset used in this study consists of 1,258,538 reviews from 175 films gathered from Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms Naive Bayes (NB) classifier as its accuracy is about 6% higher than NB. Furthermore, the results indicate that there are positive correlations between the star-rate and the number of audiences, which can be regarded as the box-office success of a movie. The study also shows that there is the mild, positive correlation between the sentiment scores estimated by the classifier and the number of audiences. To verify the applicability of the sentiment scores, an independent sample t-test was conducted. For this, the movies were divided into two groups using the average of sentiment scores. The two groups are significantly different in terms of the star-rated scores.