• Title/Summary/Keyword: Text mining analysis

Search Result 1,222, Processing Time 0.026 seconds

Analysis of Dog-Related Outdoor Public Space Conflicts Using Complaint Data (민원 자료를 활용한 반려견 관련 옥외 공공공간 갈등 분석)

  • Yoo, Ye-seul;Son, Yong-Hoon;Zoh, Kyung-Jin
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.52 no.1
    • /
    • pp.34-45
    • /
    • 2024
  • Companion animals are increasingly being recognized as members of society in outdoor public spaces. However, the presence of dogs in cities has become a subject of conflict between pet owners and non-pet owners, causing problems in terms of hygiene and noise. This study was conducted to analyze public complaint data using the keywords 'dog,' 'pet,' and 'puppy' through text mining techniques to identify the causes of conflicts in outdoor public spaces related to dogs and to identify key issues. The main findings of the study are as follows. First, the majority of dog-related complaints were related to the use of outdoor public spaces. Second, different types of outdoor public spaces have different spatial issues. Third, there were a total of four topics of dog-related complaints: 'Requesting a dog playground', 'Raising safety issues related to animals', 'Using facilities other than dog-only areas', and 'Requesting increased park management and enforcement related to pet tickets'. This study analyzed the perceptions of citizens surrounding pets at a time when the creation and use of public spaces related to pets are expanding. In particular, it is significant in that it applied a new method of collecting public opinions by adopting complaint data that clearly presents problems and requests.

Online Privacy Protection: An Analysis of Social Media Reactions to Data Breaches (온라인 정보 보호: 소셜 미디어 내 정보 유출 반응 분석)

  • Seungwoo Seo;Youngjoon Go;Hong Joo Lee
    • Knowledge Management Research
    • /
    • v.25 no.1
    • /
    • pp.1-19
    • /
    • 2024
  • This study analyzed the changes in social media reactions of data subjects to major personal data breach incidents in South Korea from January 2014 to October 2022. We collected a total of 1,317 posts written on Naver Blogs within a week immediately following each incident. Applying the LDA topic modeling technique to these posts, five main topics were identified: personal data breaches, hacking, information technology, etc. Analyzing the temporal changes in topic distribution, we found that immediately after a data breach incident, the proportion of topics directly mentioning the incident was the highest. However, as time passed, the proportion of mentions related indirectly to the personal data breach increased. This suggests that the attention of data subjects shifts from the specific incident to related topics over time, and interest in personal data protection also decreases. The findings of this study imply a future need for research on the changes in privacy awareness of data subjects following personal data breach incidents.

UX Methodology Study by Data Analysis Focusing on deriving persona through customer segment classification (데이터 분석을 통한 UX 방법론 연구 고객 세그먼트 분류를 통한 페르소나 도출을 중심으로)

  • Lee, Seul-Yi;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.151-176
    • /
    • 2021
  • As the information technology industry develops, various kinds of data are being created, and it is now essential to process them and use them in the industry. Analyzing and utilizing various digital data collected online and offline is a necessary process to provide an appropriate experience for customers in the industry. In order to create new businesses, products, and services, it is essential to use customer data collected in various ways to deeply understand potential customers' needs and analyze behavior patterns to capture hidden signals of desire. However, it is true that research using data analysis and UX methodology, which should be conducted in parallel for effective service development, is being conducted separately and that there is a lack of examples of use in the industry. In thiswork, we construct a single process by applying data analysis methods and UX methodologies. This study is important in that it is highly likely to be used because it applies methodologies that are actively used in practice. We conducted a survey on the topic to identify and cluster the associations between factors to establish customer classification and target customers. The research methods are as follows. First, we first conduct a factor, regression analysis to determine the association between factors in the happiness data survey. Groups are grouped according to the survey results and identify the relationship between 34 questions of psychological stability, family life, relational satisfaction, health, economic satisfaction, work satisfaction, daily life satisfaction, and residential environment satisfaction. Second, we classify clusters based on factors affecting happiness and extract the optimal number of clusters. Based on the results, we cross-analyzed the characteristics of each cluster. Third, forservice definition, analysis was conducted by correlating with keywords related to happiness. We leverage keyword analysis of the thumb trend to derive ideas based on the interest and associations of the keyword. We also collected approximately 11,000 news articles based on the top three keywords that are highly related to happiness, then derived issues between keywords through text mining analysis in SAS, and utilized them in defining services after ideas were conceived. Fourth, based on the characteristics identified through data analysis, we selected segmentation and targetingappropriate for service discovery. To this end, the characteristics of the factors were grouped and selected into four groups, and the profile was drawn up and the main target customers were selected. Fifth, based on the characteristics of the main target customers, interviewers were selected and the In-depthinterviews were conducted to discover the causes of happiness, causes of unhappiness, and needs for services. Sixth, we derive customer behavior patterns based on segment results and detailed interviews, and specify the objectives associated with the characteristics. Seventh, a typical persona using qualitative surveys and a persona using data were produced to analyze each characteristic and pros and cons by comparing the two personas. Existing market segmentation classifies customers based on purchasing factors, and UX methodology measures users' behavior variables to establish criteria and redefine users' classification. Utilizing these segment classification methods, applying the process of producinguser classification and persona in UX methodology will be able to utilize them as more accurate customer classification schemes. The significance of this study is summarized in two ways: First, the idea of using data to create a variety of services was linked to the UX methodology used to plan IT services by applying it in the hot topic era. Second, we further enhance user classification by applying segment analysis methods that are not currently used well in UX methodologies. To provide a consistent experience in creating a single service, from large to small, it is necessary to define customers with common goals. To this end, it is necessary to derive persona and persuade various stakeholders. Under these circumstances, designing a consistent experience from beginning to end, through fast and concrete user descriptions, would be a very effective way to produce a successful service.

Analysis on Dynamics of Korea Startup Ecosystems Based on Topic Modeling (토픽 모델링을 활용한 한국의 창업생태계 트렌드 변화 분석)

  • Heeyoung Son;Myungjong Lee;Youngjo Byun
    • Knowledge Management Research
    • /
    • v.23 no.4
    • /
    • pp.315-338
    • /
    • 2022
  • In 1986, Korea established legal systems to support small and medium-sized start-ups, which becomes the main pillars of national development. The legal systems have stimulated start-up ecosystems to have more than 1 million new start-up companies founded every year during the past 30 years. To analyze the trend of Korea's start-up ecosystem, in this study, we collected 1.18 million news articles from 1991 to 2020. Then, we extracted news articles that have the keywords "start-up", "venture", and "start-up". We employed network analysis and topic modeling to analyze collected news articles. Our analysis can contribute to analyzing the government policy direction shown in the history of start-up support policy. Specifically, our analysis identifies the dynamic characteristics of government influenced by external environmental factors (e.g., society, economy, and culture). The results of our analysis suggest that the start-up ecosystems in Korea have changed and developed mainly by the government policies for corporation governance, industrial development planning, deregulation, and economic prosperity plan. Our frequency keyword analysis contributes to understanding entrepreneurial productivity attributed to activities among the networked components in industrial ecosystems. Our analyses and results provide practitioners and researchers with practical and academic implications that can help to establish dedicated support policies through forecast tasks of the economic environment surrounding the start-ups. Korean entrepreneurial productivity has been empowered by growing numbers of large companies in the mobile phone industry. The spectrum of large companies incorporates content startups, platform providers, online shopping malls, and youth-oriented start-ups. In addition, economic situational factors contribute to the growth of Korean entrepreneurial productivity the economic, which are related to the global expansions of the mobile industry, and government efforts to foster start-ups. Our research is methodologically implicative. We employ natural language processes for 30 years of media articles, which enables more rigorous analysis compared to the existing studies which only observe changes in government and policy based on a qualitative manner.

The Empirical Study on the Effect of Technology Exchanges in the Fourth Industrial Revolution between Korea and China: Focused on the Firm Social Network Analysis (한중 4차산업혁명 기술교류 및 효과에 대한 실증연구: 기업 소셜 네트워크 분석 중심으로)

  • Zhou, Zhenxin;Sohn, Kwonsang;Hwang, Yoon Min;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies
    • /
    • v.25 no.3
    • /
    • pp.41-61
    • /
    • 2020
  • China's rapid development and commercialization of high-tech technologies in the fourth industrial revolution has led to effective technology exchanges between Korean and Chinese firms becoming more important to Korea's mid-term and long-term industrial development. However, there is still a lack of empirical research on how technology exchanges between Korean and Chinese firms proceed and their effectiveness. In response, this study conducted a social network analysis based on text mining data of Korea-China business technology exchange and cooperation articles introduced in the news from 2018 to March 2020 on the current status and effects of Korea-China technology exchanges related to the fourth industrial revolution, and conducted a regression analysis how network centrality effect on the firm performance. According to the results, most of the Korean major electronic firms are actively networking with Chinese firms and institutions, showing high centrality in the centrality index. Korean telecommunication firms showed high betweenness centrality and subgraph centrality, and Korean Internet service providers and broadcasting contents firms showed high eigenvector centrality. In addition, Chinese firms showed higher betweenness centrality than Korean firms, and Chinese service firms showed higher closeness centrality than manufacturing firms. As a result of regression analysis, this network centrality had a positive effect on firm performance. To the best of our knowledge, this is the first to analyze the impact of the technical cooperation between Korean and Chinese firms under the fourth industrial revolution context. This study has theoretical implications that suggested the direction of social network analysis-based empirical research in global firm cooperation. Also, this study has practical implications that the guidelines for network analysis in setting the direction of technical cooperation between Korea and China by firms or governments.

Measuring the Economic Impact of Item Descriptions on Sales Performance (온라인 상품 판매 성과에 영향을 미치는 상품 소개글 효과 측정 기법)

  • Lee, Dongwon;Park, Sung-Hyuk;Moon, Songchun
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.1-17
    • /
    • 2012
  • Personalized smart devices such as smartphones and smart pads are widely used. Unlike traditional feature phones, theses smart devices allow users to choose a variety of functions, which support not only daily experiences but also business operations. Actually, there exist a huge number of applications accessible by smart device users in online and mobile application markets. Users can choose apps that fit their own tastes and needs, which is impossible for conventional phone users. With the increase in app demand, the tastes and needs of app users are becoming more diverse. To meet these requirements, numerous apps with diverse functions are being released on the market, which leads to fierce competition. Unlike offline markets, online markets have a limitation in that purchasing decisions should be made without experiencing the items. Therefore, online customers rely more on item-related information that can be seen on the item page in which online markets commonly provide details about each item. Customers can feel confident about the quality of an item through the online information and decide whether to purchase it. The same is true of online app markets. To win the sales competition against other apps that perform similar functions, app developers need to focus on writing app descriptions to attract the attention of customers. If we can measure the effect of app descriptions on sales without regard to the app's price and quality, app descriptions that facilitate the sale of apps can be identified. This study intends to provide such a quantitative result for app developers who want to promote the sales of their apps. For this purpose, we collected app details including the descriptions written in Korean from one of the largest app markets in Korea, and then extracted keywords from the descriptions. Next, the impact of the keywords on sales performance was measured through our econometric model. Through this analysis, we were able to analyze the impact of each keyword itself, apart from that of the design or quality. The keywords, comprised of the attribute and evaluation of each app, are extracted by a morpheme analyzer. Our model with the keywords as its input variables was established to analyze their impact on sales performance. A regression analysis was conducted for each category in which apps are included. This analysis was required because we found the keywords, which are emphasized in app descriptions, different category-by-category. The analysis conducted not only for free apps but also for paid apps showed which keywords have more impact on sales performance for each type of app. In the analysis of paid apps in the education category, keywords such as 'search+easy' and 'words+abundant' showed higher effectiveness. In the same category, free apps whose keywords emphasize the quality of apps showed higher sales performance. One interesting fact is that keywords describing not only the app but also the need for the app have asignificant impact. Language learning apps, regardless of whether they are sold free or paid, showed higher sales performance by including the keywords 'foreign language study+important'. This result shows that motivation for the purchase affected sales. While item reviews are widely researched in online markets, item descriptions are not very actively studied. In the case of the mobile app markets, newly introduced apps may not have many item reviews because of the low quantity sold. In such cases, item descriptions can be regarded more important when customers make a decision about purchasing items. This study is the first trial to quantitatively analyze the relationship between an item description and its impact on sales performance. The results show that our research framework successfully provides a list of the most effective sales key terms with the estimates of their effectiveness. Although this study is performed for a specified type of item (i.e., mobile apps), our model can be applied to almost all of the items traded in online markets.

Evaluation of Preference by Bukhansan Dulegil Course Using Sentiment Analysis of Blog Data (블로그 데이터 감성분석을 통한 북한산둘레길 구간별 선호도 평가)

  • Lee, Sung-Hee;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.49 no.3
    • /
    • pp.1-10
    • /
    • 2021
  • This study aimed to evaluate preferences of Bukhansan dulegil using sentiment analysis, a natural language processing technique, to derive preferred and non-preferred factors. Therefore, we collected blog articles written in 2019 and produced sentimental scores by the derivation of positive and negative words in the texts for 21 dulegil courses. Then, content analysis was conducted to determine which factors led visitors to prefer or dislike each course. In blogs written about Bukhansan dulegil, positive words appeared in approximately 73% of the content, and the percentage of positive documents was significantly higher than that of negative documents for each course. Through this, it can be seen that visitors generally had positive sentiments toward Bukhansan dulegil. Nevertheless, according to the sentiment score analysis, all 21 dulegil courses belonged to both the preferred and non-preferred courses. Among courses, visitors preferred less difficult courses, in which they could walk without a burden, and in which various landscape elements (visual, auditory, olfactory, etc.) were harmonious yet distinct. Furthermore, they preferred courses with various landscapes and landscape sequences. Additionally, visitors appreciated the presence of viewpoints, such as observation decks, as a significant factor and preferred courses with excellent accessibility and information provisions, such as information boards. Conversely, the dissatisfaction with the dulegil courses was due to noise caused by adjacent roads, excessive urban areas, and the inequality or difficulty of the course which was primarily attributed to insufficient information on the landscape or section of the course. The results of this study can serve not only serve as a guide in national parks but also in the management of nearby forest green areas to formulate a plan to repair and improve dulegil. Further, the sentiment analysis used in this study is meaningful in that it can continuously monitor actual users' responses towards natural areas. However, since it was evaluated based on a predefined sentiment dictionary, continuous updates are needed. Additionally, since there is a tendency to share positive content rather than negative views due to the nature of social media, it is necessary to compare and review the results of analysis, such as with on-site surveys.

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Service (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS Data is defined as a condition of Big Data where the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety) are satisfied. If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as a new important source for the creation of new values because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) Provide the topic keyword set that corresponds to daily ranking; (2) Visualize the daily time series graph of a topic for the duration of a month; (3) Provide the importance of a topic through a treemap based on the score system and frequency; (4) Visualize the daily time-series graph of keywords by searching the keyword; The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including the removal of stop words, and noun extraction for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process rapidly a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational database. We built TITS based on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. In addition, MongoDB is an open source platform, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational database, there are no schema or tables with MongoDB, and its most important goal is that of data accessibility and data processing performance. In the Age of Big Data, the visualization of Big Data is more attractive to the Big Data community because it helps analysts to examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for the purpose of creating Data Driven Documents that bind document object model (DOM) and any data; the interaction between data is easy and useful for managing real-time data stream with smooth animation. In addition, TITS uses a bootstrap made of pre-configured plug-in style sheets and JavaScript libraries to build a web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system, and make the system available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.1-19
    • /
    • 2018
  • Since stock movements forecasting is an important issue both academically and practically, studies related to stock price prediction have been actively conducted. The stock price forecasting research is classified into structured data and unstructured data, and it is divided into technical analysis, fundamental analysis and media effect analysis in detail. In the big data era, research on stock price prediction combining big data is actively underway. Based on a large number of data, stock prediction research mainly focuses on machine learning techniques. Especially, research methods that combine the effects of media are attracting attention recently, among which researches that analyze online news and utilize online news to forecast stock prices are becoming main. Previous studies predicting stock prices through online news are mostly sentiment analysis of news, making different corpus for each company, and making a dictionary that predicts stock prices by recording responses according to the past stock price. Therefore, existing studies have examined the impact of online news on individual companies. For example, stock movements of Samsung Electronics are predicted with only online news of Samsung Electronics. In addition, a method of considering influences among highly relevant companies has also been studied recently. For example, stock movements of Samsung Electronics are predicted with news of Samsung Electronics and a highly related company like LG Electronics.These previous studies examine the effects of news of industrial sector with homogeneity on the individual company. In the previous studies, homogeneous industries are classified according to the Global Industrial Classification Standard. In other words, the existing studies were analyzed under the assumption that industries divided into Global Industrial Classification Standard have homogeneity. However, existing studies have limitations in that they do not take into account influential companies with high relevance or reflect the existence of heterogeneity within the same Global Industrial Classification Standard sectors. As a result of our examining the various sectors, it can be seen that there are sectors that show the industrial sectors are not a homogeneous group. To overcome these limitations of existing studies that do not reflect heterogeneity, our study suggests a methodology that reflects the heterogeneous effects of the industrial sector that affect the stock price by applying k-means clustering. Multiple Kernel Learning is mainly used to integrate data with various characteristics. Multiple Kernel Learning has several kernels, each of which receives and predicts different data. To incorporate effects of target firm and its relevant firms simultaneously, we used Multiple Kernel Learning. Each kernel was assigned to predict stock prices with variables of financial news of the industrial group divided by the target firm, K-means cluster analysis. In order to prove that the suggested methodology is appropriate, experiments were conducted through three years of online news and stock prices. The results of this study are as follows. (1) We confirmed that the information of the industrial sectors related to target company also contains meaningful information to predict stock movements of target company and confirmed that machine learning algorithm has better predictive power when considering the news of the relevant companies and target company's news together. (2) It is important to predict stock movements with varying number of clusters according to the level of homogeneity in the industrial sector. In other words, when stock prices are homogeneous in industrial sectors, it is important to use relational effect at the level of industry group without analyzing clusters or to use it in small number of clusters. When the stock price is heterogeneous in industry group, it is important to cluster them into groups. This study has a contribution that we testified firms classified as Global Industrial Classification Standard have heterogeneity and suggested it is necessary to define the relevance through machine learning and statistical analysis methodology rather than simply defining it in the Global Industrial Classification Standard. It has also contribution that we proved the efficiency of the prediction model reflecting heterogeneity.

Revisiting the cause of unemployment problem in Korea's labor market: The job seeker's interests-based topic analysis (취업준비생 토픽 분석을 통한 취업난 원인의 재탐색)

  • Kim, Jung-Su;Lee, Suk-Jun
    • Management & Information Systems Review
    • /
    • v.35 no.1
    • /
    • pp.85-116
    • /
    • 2016
  • The present study aims to explore the causes of employment difficulty on the basis of job applicant's interest from P-E (person-environment) fit perspective. Our approach relied on a textual analytic method to reveal insights from their situational interests in a job search during the change of labor market. Thus, to investigate the type of major interests and psychological responses, user-generated texts in a social community were collected for analysis between January 1, 2013 through December 31, 2015 by crawling the online-community in regard to job seeking and sharing information and opinions. The results of topic analysis indicated user's primary interests were divided into four types: perception of vocation expectation, employment pre-preparation behaviors, perception of labor market, and job-seeking stress. Specially, job applicants put mainly concerns of monetary reward and a form of employment, rather than their work values or career exploration, thus youth job applicants expressed their psychological responses using contextualized language (e.g., slang, vulgarisms) for projecting their unstable state under uncertainty in response to environmental changes. Additionally, they have perceived activities in the restricted preparation (e.g., certification, English exam) as determinant factors for success in employment and suffered form job-seeking stress. On the basis of these findings, current unemployment matters are totally attributed to the absence of pursing the value of vocation and job in individuals, organizations, and society. Concretely, job seekers are preoccupied with occupational prestige in social aspect and have undecided vocational value. On the other hand, most companies have no perception of the importance of human resources and have overlooked the needs for proper work environment development in respect of stimulating individual motivation. The attempt in this study to reinterpret the effect of environment as for classifying job applicant's interests in reference to linguistic and psychological theories not only helps conduct a more comprehensive meaning for understanding social matters, but guides new directions for future research on job applicant's psychological factors (e.g., attitudes, motivation) using topic analysis.

  • PDF