• Title/Summary/Keyword: 소셜 마이닝

Search Result 217, Processing Time 0.024 seconds

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to apply text-mining techniques to analyze big social media data in a more rigorous manner. Even if social media text analysis algorithms were improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to feature sets for training classification model. The other approach adopts the semantic analysis method to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm which is an extension of the neural network algorithms to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between two approaches. The results show that the distribution related word extracted by Word2Vec algorithm in that the words represent some emotion about the keyword used are three times more than extracted by co-occurrence analysis. The reason of the difference between two results comes from Word2Vec's semantic features vectorization. Therefore, it is possible to say that Word2Vec algorithm is able to catch the hidden related words which have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjective as "emotional word" in Korean. In addition, the emotion words extracted from the text are converted into word vector by the Word2Vec algorithm to find related words. Among these related words, noun words are selected because each word of them would have causal relationship with "emotional word" in the sentence. The process of extracting these trigger factor of emotional word is named "Emotion Trigger" in this study. As a case study, the datasets used in the study are collected by searching using three keywords: professor, prosecutor, and doctor in that these keywords contain rich public emotion and opinion. Advanced data collecting was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data to be used in actual analysis are followed: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate) Prosecutor (lewd behavior, sponsor). The size of the text data is about to 100,000(Professor: 25720, Doctor: 35110, Prosecutor: 43225) and the data are gathered from news, blog, and twitter to reflect various level of public emotion into text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used and every program used in text processing and analysis are java coding. The contributions of this study are as follows: First, different approaches for sentiment analysis are integrated to overcome the limitations of existing approaches. Secondly, finding Emotion Trigger can detect the hidden connections to public emotion which existing method cannot detect. Finally, the approach used in this study could be generalized regardless of types of text data. The limitation of this study is that it is hard to say the word extracted by Emotion Trigger processing has significantly causal relationship with emotional word in a sentence. The future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing with the relationships manually tagged. Furthermore, the text data used in Emotion Trigger are twitter, so the data have a number of distinct features which we did not deal with in this study. These features will be considered in further study.

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.187-204
    • /
    • 2016
  • Document classification based on emotional polarity has become a welcomed emerging task owing to the great explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search reviews from a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide on whether or not to make a trip. Sentiment analysis of customer reviews has become an important research topic as datamining technology is widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques, such as the decision tree, neural networks, and support vector machines (SVMs). is used to determine the attitude, position, and sensibility of people who write articles about various topics that are published on the Web. Regardless of the polarity of customer reviews, emotional reviews are very helpful materials for analyzing the opinions of customers through their reviews. Sentiment analysis helps with understanding what customers really want instantly through the help of automated text mining techniques. Sensitivity analysis utilizes text mining techniques on text on the Web to extract subjective information in the text for text analysis. Sensitivity analysis is utilized to determine the attitudes or positions of the person who wrote the article and presented their opinion about a particular topic. In this study, we developed a model that selects a hot topic from user posts at China's online stock forum by using the k-means algorithm and self-organizing map (SOM). In addition, we developed a detecting model to predict a hot topic by using machine learning techniques such as logit, the decision tree, and SVM. We employed sensitivity analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sensitivity analysis calculates a sentimental value from a document based on contrast and classification according to the polarity sentimental dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movement by analyzing the market according to government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories to utilize sentiment analysis. One hundred forty-four topics were selected among 21 categories at online forums about stock. The posts were crawled to build a positive and negative text database. We ultimately obtained 21,141 posts on 88 topics by preprocessing the text from March 2013 to February 2015. The interest index was defined to select the hot topics, and the k-means algorithm and SOM presented equivalent results with this data. We developed a decision tree model to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel function by a grid search to detect the hot topics. The detection of hot topics by using sentiment analysis provides the latest trends and hot topics in the stock forum for investors so that they no longer need to search the vast amounts of information on the Web. Our proposed model is also helpful to rapidly determine customers' signals or attitudes towards government policy and firms' products and services.

Analysis of News Agenda Using Text mining and Semantic Network Analysis: Focused on COVID-19 Emotions (텍스트 마이닝과 의미 네트워크 분석을 활용한 뉴스 의제 분석: 코로나 19 관련 감정을 중심으로)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.47-64
    • /
    • 2021
  • The global spread of COVID-19 around the world has not only affected many parts of our daily life but also has a huge impact on many areas, including the economy and society. As the number of confirmed cases and deaths increases, medical staff and the public are said to be experiencing psychological problems such as anxiety, depression, and stress. The collective tragedy that accompanies the epidemic raises fear and anxiety, which is known to cause enormous disruptions to the behavior and psychological well-being of many. Long-term negative emotions can reduce people's immunity and destroy their physical balance, so it is essential to understand the psychological state of COVID-19. This study suggests a method of monitoring medial news reflecting current days which requires striving not only for physical but also for psychological quarantine in the prolonged COVID-19 situation. Moreover, it is presented how an easier method of analyzing social media networks applies to those cases. The aim of this study is to assist health policymakers in fast and complex decision-making processes. News plays a major role in setting the policy agenda. Among various major media, news headlines are considered important in the field of communication science as a summary of the core content that the media wants to convey to the audiences who read it. News data used in this study was easily collected using "Bigkinds" that is created by integrating big data technology. With the collected news data, keywords were classified through text mining, and the relationship between words was visualized through semantic network analysis between keywords. Using the KrKwic program, a Korean semantic network analysis tool, text mining was performed and the frequency of words was calculated to easily identify keywords. The frequency of words appearing in keywords of articles related to COVID-19 emotions was checked and visualized in word cloud 'China', 'anxiety', 'situation', 'mind', 'social', and 'health' appeared high in relation to the emotions of COVID-19. In addition, UCINET, a specialized social network analysis program, was used to analyze connection centrality and cluster analysis, and a method of visualizing a graph using Net Draw was performed. As a result of analyzing the connection centrality between each data, it was found that the most central keywords in the keyword-centric network were 'psychology', 'COVID-19', 'blue', and 'anxiety'. The network of frequency of co-occurrence among the keywords appearing in the headlines of the news was visualized as a graph. The thickness of the line on the graph is proportional to the frequency of co-occurrence, and if the frequency of two words appearing at the same time is high, it is indicated by a thick line. It can be seen that the 'COVID-blue' pair is displayed in the boldest, and the 'COVID-emotion' and 'COVID-anxiety' pairs are displayed with a relatively thick line. 'Blue' related to COVID-19 is a word that means depression, and it was confirmed that COVID-19 and depression are keywords that should be of interest now. The research methodology used in this study has the convenience of being able to quickly measure social phenomena and changes while reducing costs. In this study, by analyzing news headlines, we were able to identify people's feelings and perceptions on issues related to COVID-19 depression, and identify the main agendas to be analyzed by deriving important keywords. By presenting and visualizing the subject and important keywords related to the COVID-19 emotion at a time, medical policy managers will be able to be provided a variety of perspectives when identifying and researching the regarding phenomenon. It is expected that it can help to use it as basic data for support, treatment and service development for psychological quarantine issues related to COVID-19.

'Elderly image' Analysis Using Big Data and Social Networking Techniques (빅데이터와 사회연결망 기법을 이용한 '노인 이미지' 분석)

  • Han, Sun-Bo;Lee, Hyun-Sim
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.11
    • /
    • pp.253-263
    • /
    • 2016
  • We analyzed the social issue 'image of the elderly' using Big Data and Social Network Analysis. First, we analyzed the words extracted by the text mining technique by inputting the keyword 'elderly'. As a result of analysis, the image of the elderly viewed through media such as cafes, blogs, etc. Representing the trend of the public was using the word 'Senior' the most. The image of the elderly is expressed using the word having the highest frequency in the top 10, "The elderly are 'Senior' people who are respected by society, they are organized to earn money, to earn their qualifications, to health, and to 'Seniors' who desire to work healthy up to 100 years old". The purpose of this study is to differentiate from the existing analysis method by analyzing the macro-level image of the elderly including the social discourse by collecting vast amount of data and analyzing it with the social networking technique. When the image of the elderly that the public perceives is positively expressed as 'Senior', it can be said that the direction of the current elderly policy is evaluated as a desirable direction. On the other hand, it was able to feel the 'desire' of the public who wanted to be evaluated. Therefore, the policy direction of the elderly to be applied in the future should be the policy that enables the elderly to be perceived as 'Necessary existence' in society by taking on social roles. In addition, we proposed to implement the policy of the elderly that reflects priorities such as job creation, welfare, and alienation that can activity and maintain health.

Discovery of Market Convergence Opportunity Combining Text Mining and Social Network Analysis: Evidence from Large-Scale Product Databases (B2B 전자상거래 정보를 활용한 시장 융합 기회 발굴 방법론)

  • Kim, Ji-Eun;Hyun, Yoonjin;Choi, Yun-Jeong
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.87-107
    • /
    • 2016
  • Understanding market convergence has became essential for small and mid-size enterprises. Identifying convergence items among heterogeneous markets could lead to product innovation and successful market introduction. Previous researches have two limitations. First, traditional researches focusing on patent databases are suitable for detecting technology convergence, however, they have failed to recognize market demands. Second, most researches concentrate on identifying the relationship between existing products or technology. This study presents a platform to identify the opportunity of market convergence by using product databases from a global B2B marketplace. We also attempt to identify convergence opportunity in different industries by applying Structural Hole theory. This paper shows the mechanisms for market convergence: attributes extraction of products and services using text mining and association analysis among attributes, and network analysis based on structural hole. In order to discover market demand, we analyzed 240,002 e-catalog from January 2013 to July 2016.

Reliability Analysis of VOC Data for Opinion Mining (오피니언 마이닝을 위한 VOC 데이타의 신뢰성 분석)

  • Kim, Dongwon;Yu, Song Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.217-245
    • /
    • 2016
  • The purpose of this study is to verify how 7 sentiment domains extracted through sentiment analysis from social media have an influence on business performance. It consists of three phases. In phase I, we constructed the sentiment lexicon after crawling 45,447 pieces of VOC (Voice of the Customer) on 26 auto companies from the car community and extracting the POS information and built a seven-sensitive domains. In phase II, in order to retain the reliability of experimental data, we examined auto-correlation analysis and PCA. In phase III, we investigated how 7 domains impact on the market share of three major (GM, FCA, and VOLKSWAGEN) auto companies by using linear regression analysis. The findings from the auto-correlation analysis proved auto-correlation and the sequence of the sentiments, and the results from PCA reported the 7 sentiments connected with positivity, negativity and neutrality. As a result of linear regression analysis on model 1, we indentified that the sentimental factors have a significant influence on the actual market share. In particular, not only posotive and negative sentiment domains, but neutral sentiment had significantly impacted on auto market share. As we apply the availability of data to the market, and take advantage of auto-correlation of the market-related information and the sentiment, the findings will be a huge contribution to other researches on sentiment analysis as well as actual business performances in various ways.

A Study on the Changes in Perspectives on Unwed Mothers in S.Korea and the Direction of Government Polices: 1995~2020 Social Media Big Data Analysis (한국미혼모에 대한 관점 변화와 정부정책의 방향: 1995년~2020년 소셜미디어 빅데이터 분석)

  • Seo, Donghee;Jun, Boksun
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.12
    • /
    • pp.305-313
    • /
    • 2021
  • This study collected and analyzed big data from 1995 to 2020, focusing on the keywords "unwed mother", "single mother," and "single mom" to present appropriate government support policy directions according to changes in perspectives on unwed mothers. Big data collection platform Textom was used to collect data from portal search sites Naver and Daum and refine data. The final refined data were word frequency analysis, TF-IDF analysis, an N-gram analysis provided by Textom. In addition, Network analysis and CONCOR analysis were conducted through the UCINET6 program. As a result of the study, similar words appeared in word frequency analysis and TF-IDF analysis, but they differed by year. In the N-gram analysis, there were similarities in word appearance, but there were many differences in frequency and form of words appearing in series. As a result of CONCOR analysis, it was found that different clusters were formed by year. This study confirms the change in the perspective of unwed mothers through big data analysis, suggests the need for unwed mothers policies for various options for independent women, and policies that embrace pregnancy, childbirth, and parenting without discrimination within the new family form.

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.4
    • /
    • pp.105-112
    • /
    • 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new data, enabling the realization of the era of big data and artificial intelligence and opening a new chapter in convergence technology. Also, in the past, there are many demands for analysis of data that could not be handled by programs. In this paper, an analysis model was designed and verified for classification of unstructured data, which is often required in the era of big data. Data crawled DBPia's thesis summary, main words, and sub-keyword, and created a database using KoNLP's data dictionary, and tokenized words through morpheme analysis. In addition, nouns were extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining training data and Y values. Finally, The adequacy of classification was measured by applying three analysis algorithms(random forest, SVM, decision tree) to the generated analysis dataset. The classification model technique proposed in this paper can be usefully used in various fields such as civil complaint classification analysis and text-related analysis in addition to thesis classification.

The Analysis of Fashion Trend Cycle using Big Data (패션 트렌드의 주기적 순환성에 관한 빅데이터 융합 분석)

  • Kim, Ki-Hyun;Byun, Hae-Won
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.12
    • /
    • pp.113-123
    • /
    • 2020
  • In this paper, big data analysis was conducted for past and present fashion trends and fashion cycle. We focused on daily look for ordinary people instead of the fashion professionals and fashion show. Using the social matrix tool, Textom, we performed frequency analysis, N-gram analysis, network analysis and structural equivalence analysis on the big data containing fashion trends and cycles. The results are as follows. First, this study extracted the major key words related to fashion trends for the daily look from the past(1980s, 1990s) and the present(2019 and 2020). Second, the frequence analysis and N-gram analysis showed that the fashion cycle has shorten to 30-40 years. Third, the structural equivalence analysis found the four representative clusters. The past four clusters are jean, retro codi, athleisure look, celebrity retro and the present clusters are retro, newtro, lady chic, retro futurism. Fourth, through the network analysis and N-gram analysis, it turned out that the past fashion is reproduced and evolves to the current fashion with certain reasoning.

Exploratory Analysis of Consumer Responses to Korea-China Mobile Payment Service using Keyword Analysis -Focus on Kakao Pay and Alipay- (키워드 분석을 활용한 한·중 모바일 결제 서비스에 대한 소비자 반응 탐색적 분석 -카카오페이와 알리페이를 중심으로-)

  • Ke, Jung;Yoon, Donghwa;Ahn, Jinhyun
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.22 no.6
    • /
    • pp.514-523
    • /
    • 2021
  • Recently, the proliferation of mobile simple payment services has been increasingly affecting people's lives. In addition, the increase in research from both China and Korea shows that the continuous development of simple mobile payment services will be very important in the future. The blog posts mentioning Kakao Pay and Alipay were collected, and keyword analysis was performed to investigate differences in consumers' responses to Kakao Pay and Alipay on social media. The frequency of keywords for each part of speech and the frequency of co-occurred words mentioned in one sentence were analyzed. Specifically, common words that appear in both Kakao Pay and Alipay blogs were extracted. The cooccurred words were analyzed to examine how different reactions were made on the same subject. As a result of the analysis, there were concerns among consumers about the trust of Kakao Pay and Alipay's benefits. For a mobile payment service to become competitive, it is necessary to add various additional services or solve security problems.