• Title/Summary/Keyword: text mining analysis (텍스트마이닝 분석)

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.57-78 / 2020
  • With the development of the Internet, consumers can easily check product information through e-commerce. Product reviews consulted during purchasing are based on user experience, so consumers act as producers of information as well as readers of it. Reviews can make purchasing decisions more efficient for consumers, and they can help sellers develop products and strengthen their competitiveness. However, it takes considerable time and effort to read the vast number of reviews available for the products a consumer wants to compare and to grasp both the overall assessment and the assessment dimensions that the consumer considers important. This is because product reviews are unstructured, which makes it difficult to read off the sentiment and the assessment dimension of a review immediately. For example, a consumer who wants to purchase a laptop would like to compare candidate products on each dimension, such as performance, weight, delivery, speed, and design. This paper therefore proposes a method that automatically generates multi-dimensional assessment scores from the reviews of the products to be compared. The method consists of two phases: a preparation phase and an individual product scoring phase. In the preparation phase, a dimension classification model and a sentiment analysis model are built from reviews of the large-category product group. The dimension classification model combines word embedding with association analysis, which compensates for a limitation of existing studies, where word embedding methods for finding the relevance between dimensions and words consider only the distance between words within sentences. The sentiment analysis model is a CNN trained on data tagged as positive or negative at the phrase level for accurate polarity detection. In the individual product scoring phase, these pre-built models are applied to reviews split into phrases. Each phrase judged to describe a specific dimension is grouped under that dimension, and multi-dimensional assessment scores are obtained by aggregating by assessment dimension according to the proportion of reviews grouped in this way. In the experiment, approximately 260,000 reviews of the large-category product group were collected to build the two models. In addition, reviews of laptops from companies S and L sold through e-commerce were collected as experimental data. The dimension classification model classified individual product reviews, broken down into phrases, into six assessment dimensions, combining the existing word embedding method with an association analysis that captures the frequency between words and dimensions. Combining word embedding and association analysis increased the accuracy of the model by 13.7%. The sentiment analysis model analyzed assessments more closely when trained on phrases rather than on sentences, with accuracy 29.4% higher than the sentence-based model. With this method, both sellers and consumers can expect more efficient decision making in purchasing and product development, because products can be compared across multiple dimensions. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme, and these were analyzed together using word embedding and association analysis to improve the objectivity of the multi-dimensional analysis. This makes the model attractive in that it enables more effective service deployment in an evolving and fiercely competitive e-commerce market while satisfying both sellers and consumers.
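
The aggregation step of the scoring phase can be illustrated with a minimal sketch. It assumes each phrase has already been labelled with a dimension and a polarity by upstream models that are not shown; the function name `dimension_scores`, the sample labels, and the choice of "share of positive phrases" as the score are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def dimension_scores(phrases):
    """Aggregate phrase-level labels into one score per dimension.

    `phrases` is a list of (dimension, polarity) pairs, where polarity is
    1 for a positive phrase and 0 for a negative one. The score used here
    is the share of positive phrases within each dimension (an assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # dimension -> [positives, total]
    for dimension, polarity in phrases:
        counts[dimension][0] += polarity
        counts[dimension][1] += 1
    return {dim: pos / total for dim, (pos, total) in counts.items()}


# Hypothetical phrase labels for one laptop.
labelled = [
    ("performance", 1), ("performance", 1), ("performance", 0),
    ("weight", 0),
    ("design", 1), ("design", 1),
]
scores = dimension_scores(labelled)
print(scores)
```

Computing the same dictionary for two products gives a dimension-by-dimension comparison of the kind the abstract describes.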

The Research Trend and Social Perceptions Related with the Tap Water in South Korea (수돗물 이용에 대한 국내 연구동향과 사회적 인식)

  • Kim, Ji Yoon;Do, Yuno;Joo, Gea-Jae;Kim, Eunhee;Park, Eun-Young;Lee, Sang-Hyup;Baek, Myeong Su
    • Korean Journal of Ecology and Environment / v.49 no.3 / pp.208-214 / 2016
  • We analyzed research trends and public perception related to tap water to identify the major factors behind low tap water consumption. We collected 805 research articles for text mining analysis and 1,000 online questionnaires to find social variables that influence tap water intake. Word network analysis divided the research topics into four major categories: 1) drinking water quality, 2) water fluoridation, 3) residual chlorine, and 4) micro-organism management. Compared with these major topics, scientific studies of drinking behavior or social perception were rather limited. Of all respondents, 22.4% used tap water as a drinking water source, and only 1% drank tap water without further treatment (i.e., boiling or filtering). Experience of a quality control report (B=0.392, p=0.046) and level of policy trust (B=1.002, p<0.0001) were influential factors in tap water drinking behavior. Age (B=0.020, p=0.002) and gender (B=-1.843, p<0.0001) also showed significant differences. To increase how often people drink tap water, more scientific information on tap water quality and on water policy management should be clearly shared with the public.

TF-IDF Based Association Rule Analysis System for Medical Data (의료 정보 추출을 위한 TF-IDF 기반의 연관규칙 분석 시스템)

  • Park, Hosik;Lee, Minsu;Hwang, Sungjin;Oh, Sangyoon
    • KIPS Transactions on Software and Data Engineering / v.5 no.3 / pp.145-154 / 2016
  • With recent interest in u-Health and advances in IT, the need to make use of medical information data has grown. Among previous studies that apply data mining algorithms to medical information data, some use association rule analysis. These studies aim to discover associations between symptoms and specific diseases; however, in most cases they do not consider infrequent terms, which can carry important information for diagnosing a disease. In this paper, we propose a new association rule mining system that uses TF-IDF weights to account for the importance of each term, so that infrequent but important items are considered. In addition, the proposed system can predict candidate diagnoses from medical text records using term similarity analysis based on a medical ontology.
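
To show why TF-IDF weighting favors infrequent terms, here is a minimal sketch using the common formulation tf = count / document length and idf = ln(N / document frequency). The abstract does not state which TF-IDF variant the system uses, and the function name `tf_idf` and the toy symptom records are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(documents)
    # Count each term once per document to get its document frequency.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical tokenized medical records.
records = [
    ["fever", "cough", "fatigue"],
    ["fever", "cough", "rash"],
    ["fever", "headache", "cough"],
]
weights = tf_idf(records)
print(weights[1])
```

In the second record, "fever" and "cough" appear in every record and receive weight zero, while the rare term "rash" receives the highest weight. An association rule miner that ranks items by such weights, rather than by raw support, would keep the rare term in play.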

Antecedent Decision Rules of Personal Pronouns for Coreference Resolution (Coreference Resolution을 위한 3인칭 대명사의 선행사 결정 규칙)

  • Kang, Seung-Shik;Yun, Bo-Hyun;Woo, Chong-Woo
    • The KIPS Transactions:PartB / v.11B no.2 / pp.227-232 / 2004
  • When extracting representative terms from text for an information retrieval system, or specific information for information retrieval and text mining systems, we often need to solve the anaphora resolution problem. Deciding the antecedent of a pronoun is one of the major issues in anaphora resolution. In this paper, we suggest a method for deciding the antecedent of third-person pronouns such as “he/she/they” in order to analyze the contents of documents precisely. Generally, the antecedent of a third-person pronoun tends to be the subject of the current or previous statement, and it occasionally occurs more than twice. Based on these characteristics, we derived rules for deciding an antecedent by investigating which candidates in the current and previous statements serve as the antecedent of a personal pronoun. Since the heuristic rules differ by the case of the third-person pronoun, we describe them separately for the subjective, objective, and possessive cases. We collected 300 sentences that include a pronoun from newspaper articles on political issues. The experimental results show that recall and precision for deciding the antecedent of third-person pronouns are 79.0% and 86.8%, respectively.

Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences (기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구)

  • Kim, Seon-Wu;Ko, Gun-Woo;Choi, Won-Jun;Jeong, Hee-Seok;Yoon, Hwa-Mook;Choi, Sung-Pil
    • Journal of the Korean Society for information Management / v.35 no.4 / pp.141-164 / 2018
  • Recently, as the volume of academic literature has grown rapidly and complex research has been actively conducted, researchers have difficulty analyzing trends in previous research. To solve this problem, information needs to be classified at the level of individual academic papers. However, no academic database in Korea provides such information. In this paper, we propose an automatic classification system that can classify domestic academic literature into multiple classes. To this end, academic documents in the technical science field written in Korean were first collected and mapped to class 600 of the DDC using the K-Means clustering technique, in order to construct a training set that supports multiple classification. The resulting training set contains 63,915 Korean technical science documents, excluding records with missing metadata. Using this training set, we implemented and trained a deep learning-based automatic classification engine for academic documents. Experiments on a hand-built test set showed 78.32% accuracy and 72.45% F1 for multiple classification.

A Study on the Influence of Sentiment and Emotion on Review Helpfulness through Online Reviews of Restaurants (레스토랑의 온라인 리뷰를 통해 감성과 감정이 리뷰 유용성에 미치는 영향에 관한 연구)

  • Yao, Ziyan;Park, Jiyoung;Hong, Taeho
    • Knowledge Management Research / v.22 no.1 / pp.243-267 / 2021
  • Sentiment represents one's own state through the process of change in response to a stimulus, and emotion represents a simple psychological state felt toward a certain phenomenon. The two terms tend to be used interchangeably, but their meaning and usage differ. In this study, we classify the sentiment and emotion expressed in online reviews, written by consumers after purchasing and using various products and services, and examine how they affect review helpfulness. Online reviews have recently become a very important factor for businesses and consumers. Helpful reviews play a key role in the decision-making process of potential customers and can be assessed through review helpfulness. Review helpfulness is becoming increasingly important in practice because it is used in business marketing strategies as well as in consumers' purchasing decisions. Academically, research on the factors that influence review helpfulness is also growing in importance. In this study, we collected restaurant reviews from Yelp.com and examined how the sentiment and emotion of online reviews affect review helpfulness. Based on prior research, we built a research model that includes sentiment and emotion for online reviews, used text mining to analyze how they affect the helpfulness of online reviews, and verified the differences in effects across emotions. The results showed that negative sentiment and emotion had a greater effect on review helpfulness, which is consistent with negativity bias theory.

A Study on How to Set up a Standard Framework for AI Ethics and Regulation (AI 윤리와 규제에 관한 표준 프레임워크 설정 방안 연구)

  • Nam, Mun-Hee
    • Journal of the Korea Convergence Society / v.13 no.4 / pp.7-15 / 2022
  • As we move toward an intelligent world in an age of individual customization driven by the decentralization, sharing, opening, and connection of information and technology, expectations and concerns about artificial intelligence intersect in technological discourse more than ever. Recently, it is easy to find futurists claiming that the AI singularity will arrive around 2045. As part of preparing a paradigm in which humans coexist and prosper with AI in the coming age of artificial intelligence, a standard framework for setting up sounder AI ethics and regulations is required. This is because avoiding the omission of major guidelines, and finding methods for evaluating whether guideline items and evaluation standards are reasonable, are increasingly becoming major research issues. To address these research problems, and to build up continuous experience and learning effects in setting AI ethics and regulations, we collect guideline data on AI ethics and regulation from international organizations, countries, and companies, and we study and suggest ways to set up a standard framework (SF) through a setting research model and exploratory text mining analysis. The results of this study can serve as basic prior research for setting and evaluating more advanced AI ethics and regulatory guideline items in the future.

Developing a New Algorithm for Conversational Agent to Detect Recognition Error and Neologism Meaning: Utilizing Korean Syllable-based Word Similarity (대화형 에이전트 인식오류 및 신조어 탐지를 위한 알고리즘 개발: 한글 음절 분리 기반의 단어 유사도 활용)

  • Jung-Won Lee;Il Im
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.267-286 / 2023
  • Conversational agents such as AI speakers use voice conversation for human-computer interaction. Voice recognition errors often occur in conversational situations. Recognition errors in user utterance records can be categorized into two types. The first type is misrecognition errors, where the agent fails to recognize the user's speech entirely. The second type is misinterpretation errors, where the user's speech is recognized and a service is provided, but the interpretation differs from the user's intention. Misinterpretation errors require separate detection because they are recorded as successful service interactions. In this study, various text separation methods were applied to detect misinterpretation. For each text separation method, the similarity of consecutive speech pairs was measured using word embedding and document embedding techniques, which convert words and documents into vectors. This approach goes beyond simple word-based similarity calculation to explore a new method for detecting misinterpretation errors. Real user utterance records were used to train and develop a detection model by applying patterns of misinterpretation error causes. The most significant result was obtained with initial consonant extraction, for detecting misinterpretation errors caused by the use of unregistered neologisms. Comparison with the other separation methods revealed different error types. This study has two main implications. First, for misinterpretation errors that are difficult to detect due to lack of recognition, the study proposed diverse text separation methods and found a novel method that improved performance remarkably. Second, if this method is applied to conversational agents or voice recognition services requiring neologism detection, patterns of errors arising from the voice recognition stage can be specified. The study proposed and verified that, even for utterances not categorized as errors, services can be provided according to the results users desire.
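
Initial consonant extraction, the separation method singled out above, can be sketched with standard Unicode arithmetic on precomposed Hangul syllables (U+AC00 to U+D7A3), where each initial consonant covers a run of 21 × 28 = 588 code points. The function name `initial_consonants` is hypothetical, and this sketch covers only the text separation step, not the embedding-based similarity or the detection model from the study.

```python
# The 19 initial consonants (choseong) in Unicode syllable order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
HANGUL_BASE = 0xAC00   # first precomposed syllable, '가'
HANGUL_LAST = 0xD7A3   # last precomposed syllable, '힣'
SYLLABLES_PER_INITIAL = 21 * 28  # 21 vowels x 28 finals (incl. none)


def initial_consonants(text):
    """Replace each Hangul syllable with its initial consonant.

    Characters outside the precomposed Hangul block pass through unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = (code - HANGUL_BASE) // SYLLABLES_PER_INITIAL
            out.append(CHOSEONG[index])
        else:
            out.append(ch)
    return "".join(out)


print(initial_consonants("한글"))
```

Two utterances that differ only in vowels or final consonants map to the same initial-consonant string, which is one plausible reason this representation could help match a misheard neologism to a nearby repeated utterance.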

Liaohe National Park based on big data visualization Visitor Perception Study

  • Qi-Wei Jing;Zi-Yang Liu;Cheng-Kang Zheng
    • Journal of the Korea Society of Computer and Information / v.28 no.4 / pp.133-142 / 2023
  • National parks are one of the important types of protected area management systems established by the IUCN and a management model for the effective conservation and sustainable use of natural and cultural heritage in countries around the world. They play important roles in conservation, scientific research, education, recreation, and community development. In the context of big data, this study takes China's Liaohe National Park, a typical representative of global coastal wetlands, as a case study and uses Python to collect tourists' travelogues and reviews from major OTA websites in China as source data. The text spans 2015 to 2022 and contains 2,998 reviews with 166,588 words in total. The results show that wildlife resources, natural landscape, wetland ecology, and the fishing and hunting culture of northern China are fully reflected in the perceptions of visitors to Liaohe National Park. Visitors have strong positive feelings toward the park, but there is still much room for improvement in supporting services and facilities, public education, and visitor experience and participation.

Analysis of major issues in the field of Maritime Autonomous Surface Ships using text mining: focusing on S.Korea news data (텍스트 마이닝을 활용한 자율운항선박 분야 주요 이슈 분석 : 국내 뉴스 데이터를 중심으로)

  • Hyeyeong Lee;Jin Sick Kim;Byung Soo Gu;Moon Ju Nam;Kook Jin Jang;Sung Won Han;Joo Yeoun Lee;Myoung Sug Chung
    • Journal of the Korean Society of Systems Engineering / v.20 no.spc1 / pp.12-29 / 2024
  • The purpose of this study is to identify the social issues discussed in Korea regarding Maritime Autonomous Surface Ships (MASS), the most advanced ICT field in the shipbuilding industry, and to suggest policy implications. In recent years, it has become important to reflect social issues of public interest in the policymaking process. For this reason, an increasing number of studies use media data and social media to identify public opinion. In this study, we collected 2,843 domestic media articles related to MASS from 2017 to 2022, when MASS was officially discussed at the International Maritime Organization, and analyzed them using text mining techniques. Through term frequency-inverse document frequency (TF-IDF) analysis, major keywords such as 'shipbuilding,' 'shipping,' 'US,' and 'HD Hyundai' were derived. For LDA topic modeling, we selected eight topics with the highest coherence score (-2.2) and analyzed the main news for each topic. According to the combined analysis of five years, the topics '1. Technology integration of the shipbuilding industry' and '3. Shipping industry in the post-COVID-19 era' received the most media attention, each accounting for 16%. Conversely, the topic '5. MASS pilotage areas' received the least media attention, accounting for 8 percent. Based on the results of the study, the implications for policy, society, and international security are as follows. First, from a policy perspective, the government should consider the current situation of each industry sector and introduce MASS in stages and carefully, as they will affect the shipbuilding, port, and shipping industries, and a radical introduction may cause various adverse effects. Second, from a social perspective, while the positive aspects of MASS are often reported, there are also negative issues such as cybersecurity issues and the loss of seafarer jobs, which require institutional development and strategic commercialization timing. 
Third, from a security perspective, MASS are expected to change the paradigm of future maritime warfare. South Korea is promoting the construction of a maritime unmanned system-based power, but this study emphasizes the need for a clear plan and military leadership to secure and develop the technology. This study has academic and policy implications in that it sheds light on the multidimensional political and social issues of MASS through news data analysis and suggests implications from national, regional, strategic, and security perspectives beyond legal and institutional discussions.