• Title/Summary/Keyword: Social Media Mining

Search Result 243, Processing Time 0.023 seconds

Application of Sentiment Analysis and Topic Modeling on Rural Solar PV Issues : Comparison of News Articles and Blog Posts (감성분석과 토픽모델링을 활용한 농촌태양광 관련 이슈 연구 : 언론 기사와 블로그 포스트 비교)

  • Ki, Jaehong;Ahn, Seunghyeok
    • Journal of Digital Convergence
    • /
    • v.18 no.9
    • /
    • pp.17-27
    • /
    • 2020
  • News articles and blog posts have influence on social agenda setting and this study applied text mining on the subject of solar PV in rural area appeared in those media. Texts are gained from online news articles and blog posts with rural solar PV as a keyword by web scrapping, and these are analysed by sentiment analysis and topic modeling technique. Sentiment analysis shows that the proportion of negative texts are significantly lower in blog posts compared to news articles. Result of topic modeling shows that topics related to government policy have the largest loading in positive articles whereas various topics are relatively evenly distributed in negative articles. For blog posts, topics related to rural area installation and environmental damage are have the largest loading in positive and negative texts, respectively. This research reveals issues related to rural solar PV by combining sentiment analysis and topic modeling that were separately applied in previous studies.

Active Senior Contents Trend Analysis using LDA Topic Modeling (LDA 토픽 모델링을 이용한 액티브 시니어 콘텐츠 트렌드 분석)

  • Lee, Dongwoo;Kim, Yoosin;Shin, Eunjung
    • Journal of Internet Computing and Services
    • /
    • v.22 no.5
    • /
    • pp.35-45
    • /
    • 2021
  • The purpose of this study is to understand the characteristics and trends of active senior. As the baby boom generation become the age of the elderly, they are more active than senior. These seniors are called active seniors, a new consumer group. Many countries and companies are also interested in providing relevant policies and services, but there is lack of researches on active senior trends. This study collects the 8,740 posts related to active seniors on social media from January 1st, 2018 to June 31st, 2021, and conducted keyword frequency analysis, TF-IDF analysis and LDA topic modeling. Through LDA topic modeling, topics are classified into 10 categories: lifestyle, benefits, shopping, government business, government education, health, society and economy, care industry, silver housing, leisure. The results of this study can be utilized as fundamental data to help understand the academic and industrial aspects of active senior.

A Study on the Online Perception of Chabak Using Big Data Analysis (빅데이터 분석을 통한 차박의 온라인 인식에 대한 연구)

  • Kim, Sae-Hoon;Lee, Hwan-Soo
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.2
    • /
    • pp.61-81
    • /
    • 2021
  • In the era of untact, the "Chabak" using cars as accommodation spaces is attracting attention as a new form of travel. Due to the advantages, including low costs, convenience, and safety, as well as the characteristics of the vehicle enabling independent travel, the demand for Chabak is continuously increasing. Despite the rapid growth of the market and related industries, little academic has investigated this trend. To establish itself as a new type of travel culture and to sustain the growth of related industries, it is essential to understand the public perception of Chabak. Therefore, based on the marketing mix theory and big data analysis, this study analyzes the public perception of Chabak. The results showed that Chabak has established itself as a consumer-led travel culture, contributing to the aftermarket growth of the automobile industry. Additionally, consumers were found to be increasingly inclined to enjoy travel economically and wisely, and actively share information through social media. This initial study on the new travel trend of Chabak is significant in that it employs big data analysis on a theoretical basis.

Developing the Automated Sentiment Learning Algorithm to Build the Korean Sentiment Lexicon for Finance (재무분야 감성사전 구축을 위한 자동화된 감성학습 알고리즘 개발)

  • Su-Ji Cho;Ki-Kwang Lee;Cheol-Won Yang
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.1
    • /
    • pp.32-41
    • /
    • 2023
  • Recently, many studies are being conducted to extract emotion from text and verify its information power in the field of finance, along with the recent development of big data analysis technology. A number of prior studies use pre-defined sentiment dictionaries or machine learning methods to extract sentiment from the financial documents. However, both methods have the disadvantage of being labor-intensive and subjective because it requires a manual sentiment learning process. In this study, we developed a financial sentiment dictionary that automatically extracts sentiment from the body text of analyst reports by using modified Bayes rule and verified the performance of the model through a binary classification model which predicts actual stock price movements. As a result of the prediction, it was found that the proposed financial dictionary from this research has about 4% better predictive power for actual stock price movements than the representative Loughran and McDonald's (2011) financial dictionary. The sentiment extraction method proposed in this study enables efficient and objective judgment because it automatically learns the sentiment of words using both the change in target price and the cumulative abnormal returns. In addition, the dictionary can be easily updated by re-calculating conditional probabilities. The results of this study are expected to be readily expandable and applicable not only to analyst reports, but also to financial field texts such as performance reports, IR reports, press articles, and social media.

Analysis of Regional Fertility Gap Factors Using Explainable Artificial Intelligence (설명 가능한 인공지능을 이용한 지역별 출산율 차이 요인 분석)

  • Dongwoo Lee;Mi Kyung Kim;Jungyoon Yoon;Dongwon Ryu;Jae Wook Song
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.47 no.1
    • /
    • pp.41-50
    • /
    • 2024
  • Korea is facing a significant problem with historically low fertility rates, which is becoming a major social issue affecting the economy, labor force, and national security. This study analyzes the factors contributing to the regional gap in fertility rates and derives policy implications. The government and local authorities are implementing a range of policies to address the issue of low fertility. To establish an effective strategy, it is essential to identify the primary factors that contribute to regional disparities. This study identifies these factors and explores policy implications through machine learning and explainable artificial intelligence. The study also examines the influence of media and public opinion on childbirth in Korea by incorporating news and online community sentiment, as well as sentiment fear indices, as independent variables. To establish the relationship between regional fertility rates and factors, the study employs four machine learning models: multiple linear regression, XGBoost, Random Forest, and Support Vector Regression. Support Vector Regression, XGBoost, and Random Forest significantly outperform linear regression, highlighting the importance of machine learning models in explaining non-linear relationships with numerous variables. A factor analysis using SHAP is then conducted. The unemployment rate, Regional Gross Domestic Product per Capita, Women's Participation in Economic Activities, Number of Crimes Committed, Average Age of First Marriage, and Private Education Expenses significantly impact regional fertility rates. However, the degree of impact of the factors affecting fertility may vary by region, suggesting the need for policies tailored to the characteristics of each region, not just an overall ranking of factors.

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.25-41
    • /
    • 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which, in turn, is generating more interest in using such data to create added value. For instance, several attempts are being made to analyze the relevant search keywords that are frequently used on new portal sites and the words that are regularly mentioned on various social media in order to identify social issues. The technique of "topic analysis" is employed in order to identify topics and themes from a large amount of text documents. As one of the most prevalent applications of topic analysis, the technique of issue tracking investigates changes in the social issues that are identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by the period of occurrence. However, this traditional issue tracking approach has two limitations. First, when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. This creates practical limitations in the form of significant time and cost burdens. Therefore, this traditional approach is difficult to apply in most applications that need to perform an analysis on the additional period. Second, the issue is not only generated and terminated constantly, but also one issue can sometimes be distributed into several issues or multiple issues can be integrated into one single issue. In other words, each issue is characterized by a life cycle that consists of the stages of creation, transition (merging and segmentation), and termination. The existing issue tracking methods do not address the connection and effect relationship between these issues. The purpose of this study is to overcome the two limitations of the existing issue tracking method, one being the limitation regarding the analysis method and the other being the limitation involving the lack of consideration of the changeability of the issues. Let us assume that we perform multiple topic analysis for each multiple period. Then it is essential to map issues of different periods in order to trace trend of issues. However, it is not easy to discover connection between issues of different periods because the issues derived for each period mutually contain heterogeneity. In this study, to overcome these limitations without having to analyze the entire period's documents simultaneously, the analysis can be performed independently for each period. In addition, we performed issue mapping to link the identified issues of each period. An integrated approach on each details period was presented, and the issue flow of the entire integrated period was depicted in this study. Thus, as the entire process of the issue life cycle, including the stages of creation, transition (merging and segmentation), and extinction, is identified and examined systematically, the changeability of the issues was analyzed in this study. The proposed methodology is highly efficient in terms of time and cost, as it sufficiently considered the changeability of the issues. Further, the results of this study can be used to adapt the methodology to a practical situation. By applying the proposed methodology to actual Internet news, the potential practical applications of the proposed methodology are analyzed. Consequently, the proposed methodology was able to extend the period of the analysis and it could follow the course of progress of each issue's life cycle. Further, this methodology can facilitate a clearer understanding of complex social phenomena using topic analysis.

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.1-18
    • /
    • 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative method for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industries because it can be used to extract various issues from text such as news, (SocialNetworkServices) to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking. Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, written time, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future works.

Investigating Topics of Incivility Related to COVID-19 on Twitter: Analysis of Targets and Keywords of Hate Speech (트위터에서의 COVID-19와 관련된 반시민성 주제 탐색: 혐오 대상 및 키워드 분석)

  • Kim, Kyuli;Oh, Chanhee;Zhu, Yongjun
    • Journal of the Korean Society for information Management
    • /
    • v.39 no.1
    • /
    • pp.331-350
    • /
    • 2022
  • This study aims to understand topics of incivility related to COVID-19 from analyzing Twitter posts including COVID-19-related hate speech. To achieve the goal, a total of 63,802 tweets that were created between December 1st, 2019, and August 31st, 2021, covering three targets of hate speech including region and public facilities, groups of people, and religion were analyzed. Frequency analysis, dynamic topic modeling, and keyword co-occurrence network analysis were used to explore topics and keywords. 1) Results of frequency analysis revealed that hate against regions and public facilities showed a relatively increasing trend while hate against specific groups of people and religion showed a relatively decreasing trend. 2) Results of dynamic topic modeling analysis showed keywords of each of the three targets of hate speech. Keywords of the region and public facilities included "Daegu, Gyeongbuk local hate", "interregional hate", and "public facility hate"; groups of people included "China hate", "virus spreaders", and "outdoor activity sanctions"; and religion included "Shincheonji", "Christianity", "religious infection", "refusal of quarantine", and "places visited by confirmed cases". 3) Similarly, results of keyword co-occurrence network analysis revealed keywords of three targets: region and public facilities (Corona, Daegu, confirmed cases, Shincheonji, Gyeongbuk, region); specific groups of people (Coronavirus, Wuhan pneumonia, Wuhan, China, Chinese, People, Entry, Banned); and religion (Corona, Church, Daegu, confirmed cases, infection). This study attempted to grasp the public's anti-citizenship public opinion related to COVID-19 by identifying domestic COVID-19 hate targets and keywords using social media. In particular, it is meaningful to grasp public opinion on incivility topics and hate emotions expressed on social media using data mining techniques for hate-related to COVID-19, which has not been attempted in previous studies. In addition, the results of this study suggest practical implications in that they can be based on basic data for contributing to the establishment of systems and policies for cultural communication measures in preparation for the post-COVID-19 era.

Urban Landscape Image Study by Text Mining and Factor Analysis - Focused on Lotte World Tower - (텍스트 마이닝과 인자분석에 의한 도시경관이미지 연구 - 롯데월드타워를 대상으로 -)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.45 no.4
    • /
    • pp.104-117
    • /
    • 2017
  • This study compares the results of landscape image analysis using text mining techniques and factor analysis for Lotte World Tower, which is the first atypical skyscraper building in Korea, and identifies landscape images of the site to determine possibilities of use. Lotte World Tower's landscape image has been extracted from text mining analysis focusing on adjectives such as 'new', 'transformational', 'unusual', 'novelty', 'impressive', and 'unique', and phrases such as in the process of change, people's active elements(caliber, outing, project, night view), media(newspaper, blog), and climate(weather, season). As a result of the factor analysis, factors affecting the landscape image of Lotte World Tower were symbolic, aesthetic, and formative. Identification, which is a morphological feature, has characteristics of scale and visibility but it is not statistically significant in preference. Rather, the psychological factors such as the symbolism with characteristics such as poison and specialty, harmony with the characteristics of the surrounding environment, and beautiful aesthetic characteristics were an influence on the landscape image. The common results of the two research methods show that psychological characteristics such as factors that can represent and represent the city affect the landscape image more greatly than the morphological and physical characteristics such as location and location of the building. In addition, the text mining technique can identify nouns and adjectives corresponding to the images that people see and feel, and confirms the relationship between the derived keywords, so that it can focus the process of forming the landscape image and further the image of the city. It would appear to be a suitable method to complement the limitation of landscape research. This study is meaningful in that it confirms the possibility that big data can be utilized in landscape analysis, which is one research field of landscape architecture, and is significant for understanding the information of a big data base and contribute to enlarging the landscape research area.

Perception and Appraisal of Urban Park Users Using Text Mining of Google Maps Review - Cases of Seoul Forest, Boramae Park, Olympic Park - (구글맵리뷰 텍스트마이닝을 활용한 공원 이용자의 인식 및 평가 - 서울숲, 보라매공원, 올림픽공원을 대상으로 -)

  • Lee, Ju-Kyung;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.49 no.4
    • /
    • pp.15-29
    • /
    • 2021
  • The study aims to grasp the perception and appraisal of urban park users through text analysis. This study used Google review data provided by Google Maps. Google Maps Review is an online review platform that provides information evaluating locations through social media and provides an understanding of locations from the perspective of general reviewers and regional guides who are registered as members of Google Maps. The study determined if the Google Maps Reviews were useful for extracting meaningful information about the user perceptions and appraisals for parks management plans. The study chose three urban parks in Seoul, South Korea; Seoul Forest, Boramae Park, and Olympic Park. Review data for each of these three parks were collected via web crawling using Python. Through text analysis, the keywords and network structure characteristics for each park were analyzed. The text was analyzed, as were park ratings, and the analysis compared the reviews of residents and foreign tourists. The common keywords found in the review comments for the three parks were "walking", "bicycle", "rest" and "picnic" for activities, "family", "child" and "dogs" for accompanying types, and "playground" and "walking trail" for park facilities. Looking at the characteristics of each park, Seoul Forest shows many outdoor activities based on nature, while the lack of parking spaces and congestion on weekends negatively impacted users. Boramae Park has the appearance of a city park, with various facilities providing numerous activities, but reviewers often cited the park's complexity and the negative aspects in terms of dog walking groups. At Olympic Park, large-scale complex facilities and cultural events were frequently mentioned, emphasizing its entertainment functions. Google Maps Review can function as useful data to identify parks' overall users' experiences and general feelings. Compared to data from other social media sites, Google Maps Review's data provides ratings and understanding factors, including user satisfaction and dissatisfaction.