• Title/Summary/Keyword: topic mining


Text Mining-Based Emerging Trend Analysis for e-Learning Contents Targeting for CEO (텍스트마이닝을 통한 최고경영자 대상 이러닝 콘텐츠 트렌드 분석)

  • Kyung-Hoon Kim;Myungsin Chae;Byungtae Lee
    • Information Systems Review / v.19 no.2 / pp.1-19 / 2017
  • Original scripts of e-learning lectures for the CEOs of corporation S were analyzed using topic analysis, a text mining method. Twenty-two topics were extracted based on keywords chosen from five years of records, from 2011 to 2015, and various issues were then analyzed. Promising topics were selected through an evaluation and element analysis of the members of each topic. In management and economics, members showed high satisfaction with and interest in marketing strategy, human resource management, and communication. In the humanities, philosophy, the history of war, and history drew high interest and satisfaction, whereas in the lifestyle field, mental health did. Topics were also examined by their share of the content, but topics with a larger share did not increase member satisfaction. In the field of IT, educational content responds sensitively to changing times, but it may not increase members' interest and satisfaction. The study found that content produced for CEOs should draw out deep implications for value innovation through the application of technology, rather than stopping at the technical delivery of information. Previous studies classified content superficially by program name when analyzing the status of content operation. Text mining, by contrast, can derive in-depth content and subject classifications from the unstructured script data itself. Displaying the service content of each theme by year makes it possible to identify current shortages and fields that need more content. This study was based on data obtained from influential e-learning companies in Korea; because the data were not acquired from portal sites or social networking services, obtaining practical results was difficult. The e-learning content trends for CEOs were analyzed.
Data analysis was also conducted on the intellectual interests of CEOs in each field.
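Displaying themes by year, as described above, amounts to averaging each lecture script's topic proportions within a year. The following is a minimal Python sketch, assuming the per-document topic distributions have already been estimated by a topic model; the function name `topic_share_by_year` and the input layout are hypothetical and not taken from the study.

```python
from collections import defaultdict


def topic_share_by_year(doc_years, doc_topics):
    """Average the topic distribution of all documents from each year.

    doc_years:  list of years, one per document (hypothetical input layout)
    doc_topics: list of per-document topic proportions, each summing to 1
    Returns {year: [mean share of topic 0, topic 1, ...]}, ordered by year.
    """
    if len(doc_years) != len(doc_topics):
        raise ValueError("need exactly one year per document")
    num_topics = len(doc_topics[0])
    totals = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for year, dist in zip(doc_years, doc_topics):
        for k, p in enumerate(dist):
            totals[year][k] += p
        counts[year] += 1
    return {year: [s / counts[year] for s in totals[year]] for year in sorted(totals)}


# Toy example: two scripts from 2011 and one from 2012, three topics.
shares = topic_share_by_year(
    [2011, 2011, 2012],
    [[0.8, 0.2, 0.0], [0.4, 0.4, 0.2], [0.1, 0.1, 0.8]],
)
print(shares)
```

Averaging proportions, rather than counting documents by their dominant topic, keeps the contribution of scripts that mix several themes.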

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People now create a tremendous amount of data on social network services (SNS). In particular, the integration of SNS into mobile devices has generated massive amounts of data, greatly influencing society. This phenomenon is unprecedented in history, and we now live in the Age of Big Data. SNS data meets the defining conditions of Big Data: the amount of data (volume), the speed of data input and output (velocity), and the variety of data types (variety). Because SNS data covers society as a whole, discovering the trend of an issue in SNS Big Data can provide an important new source for creating value. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) it provides the topic keyword set corresponding to the daily ranking; (2) it visualizes the daily time-series graph of a topic over one month; (3) it presents the importance of a topic through a treemap based on a scoring system and frequency; and (4) it visualizes the daily time-series graph of a keyword searched by the user. The study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process the many unrefined forms of unstructured data. Such analysis also requires the latest big data technologies, such as the Hadoop distributed system or NoSQL databases (an alternative to relational databases), to rapidly process large amounts of real-time data. We built TITS on Hadoop to optimize big data processing because Hadoop is designed to scale up from a single node to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike an existing relational database, MongoDB has no fixed schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly. TITS therefore uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM); it makes interaction with data easy and is useful for managing real-time data streams with smooth animation. TITS also uses Bootstrap, a collection of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is built with these libraries and makes it possible to detect issues on Twitter easily and intuitively. The study demonstrates the superiority of the proposed issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for real-time topic discovery. Experiments were conducted with nearly 150 million tweets posted in Korea during March 2013.
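Functions (1) and (4) above both rest on per-day keyword counts after stop-word removal. The sketch below is an in-memory Python illustration of that idea, not TITS's Hadoop/MongoDB implementation; it assumes whitespace tokenization and a small hypothetical stop-word list, whereas Korean tweets would need a morphological analyzer for noun extraction.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "rt"}


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,!?#@\"'") for t in text.lower().split())
    return [t for t in tokens if t and t not in stop_words]


def daily_keyword_counts(tweets):
    """tweets: iterable of (day, text) pairs -> {day: Counter of keywords}."""
    counts = defaultdict(Counter)
    for day, text in tweets:
        counts[day].update(tokenize(text))
    return counts


def top_keywords(counts, day, n=3):
    """Daily ranking: the n most frequent keywords on one day."""
    return [word for word, _ in counts.get(day, Counter()).most_common(n)]


def keyword_series(counts, keyword):
    """Daily time series for one searched keyword, ordered by day."""
    return [(day, counts[day][keyword]) for day in sorted(counts)]


tweets = [
    ("2013-03-01", "Election debate tonight"),
    ("2013-03-01", "RT the election debate was long"),
    ("2013-03-02", "Snow and traffic"),
]
counts = daily_keyword_counts(tweets)
print(top_keywords(counts, "2013-03-01", n=2))
print(keyword_series(counts, "election"))
```

Days on which a keyword never appears still show up in the series with a count of 0, which keeps the time axis continuous for plotting.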

The Analysis of Changes in East Coast Tourism using Topic Modeling (토핑 모델링을 활용한 동해안 관광의 변화 분석)

  • Jeong, Eun-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.6 / pp.489-495 / 2020
  • In a hyper-connected society where the 4th industrial revolution is progressing, the amount of data produced by various IT devices keeps increasing, and analyzing that data can create new value. This paper collected a total of 1,526 articles published from 2017 to 2019 in central magazines, economic magazines, regional associations, and major broadcasting companies, using the keyword "(East Coast Tourism or East Coast Travel) and Gangwon-do" through Bigkinds. Topic modeling was performed on the collected articles using the LDA algorithm implemented in R. Keywords were extracted for each year from 2017 to 2019, and the high-frequency keywords of each year were classified and compared. The optimal number of topics was set to 8 using log-likelihood and perplexity, and the 8 topics were then inferred using Gibbs sampling. The inferred topics were Gangneung and Beach, Goseong and Mt. Geumgang, KTX and Donghae-Bukbu line, weekend sea tour, Sokcho and Unification Observatory, Yangyang and Surfing, experience tour, and transportation network infrastructure. Changes in East Coast tourism articles were analyzed using the proportions of the eight inferred topics. In 2018, compared with 2017, the proportions of Unification Observatory and Mt. Geumgang showed no significant change, the proportions of KTX and experience tour increased, and the proportions of the other topics decreased. In 2019, the proportions of KTX and experience tour decreased, while the proportions of the other topics showed no significant change.
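The study ran LDA in R with Gibbs sampling and chose the number of topics using log-likelihood and perplexity. The Python sketch below shows the standard collapsed Gibbs sampler for LDA together with a perplexity computation on a toy corpus. It illustrates the technique only; it is not the study's R code, and the hyperparameters (`alpha=0.1`, `beta=0.01`) and toy documents are assumptions.

```python
import math
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    ndk = [[0] * K for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(K)]   # word counts per topic
    nk = [0] * K                        # total tokens assigned to each topic
    z = []                              # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi


def perplexity(docs, theta, phi):
    """exp(-(total log-likelihood) / (number of tokens)); lower is better."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Toy corpus over a 6-word vocabulary (ids 0-2 "beach" words, 3-5 "rail" words).
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
for k in (2, 3):
    theta, phi = lda_gibbs(docs, num_topics=k, vocab_size=6, iters=100)
    print(k, round(perplexity(docs, theta, phi), 3))
```

Comparing perplexity across candidate topic counts is one way to choose the number of topics; in practice it is usually computed on held-out documents rather than on the training set, as this toy loop does.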

Analysis of domestic and foreign future automobile research trends based on topic modeling (토픽모델링 기반의 국내외 미래 자동차 연구동향 비교 분석: CASE 키워드 중심으로)

  • Jeong, Ho Jeong;Kim, Keun-Wook;Kim, Na-Gyeong;Chang, Won-Jun;Jeong, Won-Oong;Park, Dae-Yeong
    • Journal of Digital Convergence / v.20 no.5 / pp.463-476 / 2022
  • Since industrialization, the automobile industry has grown around internal combustion engines, but it now faces major change with the recent 4th industrial revolution. Most companies are preparing for the transition to electric vehicles and autonomous driving. This study therefore collected 4,002 domestic papers and 68,372 overseas papers containing keywords related to CASE (Connectivity, Autonomous, Sharing, Electrification), which represents future automobile trends, and performed topic modeling based on the LDA algorithm. The analysis found that domestic research focuses mainly on macroscopic aspects such as traffic infrastructure, urban traffic efficiency, and traffic policy. Based on these results, the study argues that government technical support for MaaS (Mobility-as-a-Service) is needed in the domestic car-sharing sector and points to the need to open data for each mode of transportation. These results are expected to serve as basic data for the future automobile industry.
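The corpus above consists of papers containing CASE-related keywords. A simple way to screen a collection this way is whole-phrase keyword matching per CASE dimension. The Python sketch below assumes hypothetical keyword lists, since the abstract does not give the study's actual search terms.

```python
import re

# Hypothetical keyword lists for each CASE dimension (not the study's terms).
CASE_KEYWORDS = {
    "Connectivity": ["connected car", "v2x"],
    "Autonomous": ["autonomous driving", "self-driving"],
    "Sharing": ["car sharing", "mobility-as-a-service", "maas"],
    "Electrification": ["electric vehicle", "ev battery"],
}


def case_categories(text, keywords=CASE_KEYWORDS):
    """Return the CASE dimensions whose keywords occur as whole words in text."""
    lowered = text.lower()
    return [
        dim for dim, terms in keywords.items()
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms)
    ]


def filter_case_papers(papers):
    """Keep only papers that match at least one CASE dimension."""
    return [p for p in papers if case_categories(p)]


papers = [
    "A V2X protocol for connected car safety",
    "Charging demand of electric vehicle fleets under car sharing",
    "Soil mechanics of wheel loader buckets",
]
print(filter_case_papers(papers))
print(case_categories(papers[1]))
```

The word-boundary anchors keep short terms such as "maas" from matching inside unrelated words.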

Analysis of Users' Sentiments and Needs for ChatGPT through Social Media on Reddit (Reddit 소셜미디어를 활용한 ChatGPT에 대한 사용자의 감정 및 요구 분석)

  • Hye-In Na;Byeong-Hee Lee
    • Journal of Internet Computing and Services / v.25 no.2 / pp.79-92 / 2024
  • ChatGPT, a representative chatbot built on generative artificial intelligence, is valued not only in science and technology but also across diverse sectors such as society, economy, industry, and culture. This study conducts an exploratory analysis of user sentiments and needs for ChatGPT by examining global social media discourse on Reddit. We collected 10,796 Reddit comments posted from December 2022 to August 2023 and applied keyword analysis, sentiment analysis, and need-mining-based topic modeling to derive insights. The analysis yields several key findings. The most frequently mentioned term in ChatGPT-related comments is "time," indicating that users emphasize prompt responses, time efficiency, and enhanced productivity. Users express trust and anticipation toward ChatGPT, yet also voice concerns and frustration about its societal impact, including fear and anger. The topic modeling analysis identifies 14 topics, shedding light on potential user needs. Notably, users show keen interest in ChatGPT's educational applications and its societal implications. The analysis also uncovers various user-driven topics related to ChatGPT, including language models, jobs, information retrieval, healthcare applications, services, gaming, regulations, energy, and ethical concerns. In conclusion, this analysis provides insights into user perspectives and emphasizes the importance of understanding and addressing user needs. The identified application directions offer guidance for improving existing products and services or planning new service platforms.
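The keyword and emotion findings above can be approximated with simple term counting and an emotion lexicon lookup. The Python sketch below uses a tiny hypothetical lexicon and made-up comments for illustration; it does not reproduce the study's sentiment tool or its need-mining topic model.

```python
import re
from collections import Counter

# Toy word-emotion lexicon (hypothetical); real lexicons have thousands of entries.
EMOTION_LEXICON = {
    "reliable": {"trust"},
    "accurate": {"trust"},
    "excited": {"anticipation"},
    "hope": {"anticipation"},
    "scary": {"fear"},
    "replace": {"fear"},
    "annoying": {"anger"},
    "hate": {"anger"},
}
STOP_WORDS = {"it", "is", "i", "me", "so", "and", "that", "my"}


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def emotion_profile(comments, lexicon=EMOTION_LEXICON):
    """Count emotion categories evoked by lexicon words across all comments."""
    profile = Counter()
    for comment in comments:
        for token in tokens(comment):
            profile.update(lexicon.get(token, ()))
    return profile


def top_terms(comments, stop_words=STOP_WORDS, n=5):
    """Keyword analysis: the most frequent non-stop-word terms."""
    counts = Counter(t for c in comments for t in tokens(c) if t not in stop_words)
    return counts.most_common(n)


comments = [
    "ChatGPT saves me so much time, and it is reliable",
    "It is scary that AI could replace my job",
    "I hope it gets more accurate over time",
]
print(emotion_profile(comments))
print(top_terms(comments, n=1))
```

A word can map to several emotions in such a lexicon, which is why each entry holds a set rather than a single label.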

Analysis of Social Trends for Electric Scooters Using Dynamic Topic Modeling and Sentiment Analysis (동적 토픽 모델링과 감성 분석을 활용한 전동킥보드에 대한 사회적 동향 분석)

  • Kyoungok, Kim;Yerang, Shin
    • KIPS Transactions on Software and Data Engineering / v.12 no.1 / pp.19-30 / 2023
  • The electric scooter (e-scooter), a popular micro-mobility vehicle, has seen rapidly increasing use in many cities. In South Korea, e-scooter use has grown greatly since several companies launched e-scooter sharing services in a few large cities, starting with Seoul in 2018. However, e-scooter use remains controversial because of issues such as parking and safety. Because perceptions of a means of transportation affect mode choice, tracking the trends around e-scooters is necessary to promote their use. This study therefore analyzed trends related to e-scooters. For this purpose, we analyzed news articles about e-scooters published from 2014 to 2020, using dynamic topic modeling to extract issues and sentiment analysis to investigate how the degree of positive and negative opinion in the articles changed. Topic modeling extracted three topics, related to micro-mobility technologies, shared e-scooter services, and regulations for micro-mobility, and the proportion of the regulation topic increased as shared e-scooter services expanded in recent years. In addition, the top positive words included quick, enjoyable, and easy, whereas the top negative words included threat, complaint, and illegal. This implies that people are satisfied with the convenience of e-scooters and e-scooter sharing services, but safety and parking issues must be addressed for micro-mobility services to become more widely used. In conclusion, this study showed how issues and social trends related to e-scooters have changed and identified the issues that need to be addressed. The research framework combining dynamic topic modeling and sentiment analysis is also expected to help determine social trends in various areas.
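Tracking how positive and negative opinion changes across years can be sketched as a dictionary-based net sentiment score per year. In the Python sketch below, the lexicon is seeded only with the six example words reported in the abstract, and the articles are hypothetical; the study's actual sentiment method is not specified here.

```python
import re
from collections import defaultdict

# Seed lexicon from the example words reported above (illustrative only).
POSITIVE = {"quick", "enjoyable", "easy"}
NEGATIVE = {"threat", "complaint", "illegal"}


def yearly_net_sentiment(articles):
    """articles: iterable of (year, text).

    Returns {year: (pos - neg) / (pos + neg)} for years containing any
    sentiment word; -1 means all negative, +1 means all positive.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for year, text in articles:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in POSITIVE:
                pos[year] += 1
            elif token in NEGATIVE:
                neg[year] += 1
    return {
        year: (pos[year] - neg[year]) / (pos[year] + neg[year])
        for year in sorted(set(pos) | set(neg))
    }


articles = [
    (2019, "Shared e-scooters are quick and easy to use"),
    (2020, "Illegal parking draws a complaint, though the ride is easy"),
]
print(yearly_net_sentiment(articles))
```

Normalizing by the number of sentiment words keeps years with many articles comparable to years with few.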

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media built on highly interactive Web 2.0 applications has provided very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content expressing their opinions and interests on social media such as blogs, forums, chat rooms, and discussion boards, and this content is released on the Internet in real time. For that reason, many researchers and marketers regard social media content as an information source for business analytics, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, techniques to extract, classify, understand, and assess the opinions implicit in text, are frequently applied to social media content analysis because they focus on determining sentiment polarity and extracting authors' opinions. These researchers have presented a number of frameworks, methods, techniques, and tools. However, we found that their methods are often technically complicated and not sufficiently user-friendly to support business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to opinion mining with visual deliverables. First, we describe the entire cycle of practical opinion mining on social media content, from the initial data-gathering stage to the final presentation. Our proposed approach consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts choose the target social media. Each target medium requires a different means of access, such as open APIs, search tools, DB2DB interfaces, or purchased content. The second phase is pre-processing to produce material suitable for meaningful analysis.
If garbage data is not removed, social media analysis will not yield meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase, in which the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also content identification information such as creation date, author name, user ID, content ID, hit counts, reviews or replies, and favorites. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is used for reputation analysis. There are also various other applications, such as stock prediction, product recommendation, and sales forecasting. The last phase is the visualization and presentation of analysis results. The main focus and purpose of this phase is to explain the results and help users comprehend their meaning. Therefore, deliverables from this phase should be as simple, clear, and easy to understand as possible, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company, NS Food, which holds a 66.5% market share and has kept the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content, including blogs, forum posts, and news articles. After collecting the data, we used natural language processing to build language resources specific to the instant noodle business for data manipulation and analysis. We also classified the content into more detailed categories such as marketing features, environment, and reputation.
In these phases, we used freeware packages for the R project, such as tm, KoNLP, ggplot2, and plyr. As a result, we presented several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, valence tree maps, and other vivid, full-colored images produced with open-source R packages. Business actors can detect at a glance which areas are weak, strong, positive, negative, quiet, or loud. The heat map shows how sentiment or volume moves across a category-by-time matrix, using color density for each time period. The valence tree map, one of the most comprehensive and holistic visualization models, should help analysts and decision makers quickly grasp the "big picture" of the business situation in a hierarchical structure, because a tree map can present buzz volume and sentiment for a given period in a single visualization. This case study offers real-world business insights from market sensing that demonstrate to practical-minded business users how such results can support timely decisions in response to ongoing market changes. We believe our approach provides a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
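The heat map described above is a category-by-time matrix of volume or sentiment. The Python sketch below shows only the aggregation step that would feed such a plot; the study itself used R packages, and the category names, periods, and polarity labels here are hypothetical.

```python
from collections import defaultdict


def sentiment_heatmap(records, categories, periods):
    """Aggregate labelled posts into category x period matrices.

    records: iterable of (category, period, polarity), polarity in {+1, 0, -1}.
    Returns (volume, net): lists of rows ordered like `categories`,
    with columns ordered like `periods`.
    """
    volume = defaultdict(int)
    net = defaultdict(int)
    for category, period, polarity in records:
        volume[category, period] += 1
        net[category, period] += polarity
    vol_rows = [[volume[c, p] for p in periods] for c in categories]
    net_rows = [[net[c, p] for p in periods] for c in categories]
    return vol_rows, net_rows


records = [
    ("marketing", "2013-Q1", +1),
    ("marketing", "2013-Q1", -1),
    ("marketing", "2013-Q2", +1),
    ("reputation", "2013-Q2", -1),
    ("reputation", "2013-Q2", -1),
]
volume, net = sentiment_heatmap(records, ["marketing", "reputation"], ["2013-Q1", "2013-Q2"])
print(volume)
print(net)
```

Keeping volume and net sentiment as separate matrices lets one heat map show where discussion is loud or quiet and another show whether it is positive or negative.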

Development of Chatbot Using Q&A Data of SME(Small and Medium Enterprise) (소상공인들의 고객 문의 데이터를 활용한 문의응대 챗봇의 개발 및 도입)

  • Shin, Minchul;Kim, Sungguen;Rhee, Cheul
    • Journal of Information Technology Services / v.17 no.3 / pp.17-36 / 2018
  • In this study, we developed a chatbot (dialogue agent) using a small Q&A dataset and evaluated its performance. The chatbot was developed as an FAQ chatbot that responds promptly to customer inquiries. Development proceeded in three stages: (1) analysis and planning, (2) content creation, and (3) API and messenger interworking. In the analysis and planning stage, we gathered and analyzed customers' question data and extracted the topics and details of their questions. In the content creation stage, we created scenarios for each topic and sub-item, and then wrote specific answers in consultation with the business owners. KakaoTalk was used for API and messenger interworking. The chatbot's performance was measured with quantitative indicators, such as the accuracy with which it grasped and correctly answered customer inquiries, and a questionnaire survey was then conducted with chatbot users. The survey showed that the chatbot not only provided useful information to users but also positively influenced the image of the pension (guesthouse). This study shows that chatbots can be developed from easily obtainable data and commercial APIs regardless of business size. It also presents an explicit process for developing FAQ chatbots and verifies the validity of that process by evaluating the performance of the developed chatbot.
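The abstract does not say how the chatbot matches a question to an FAQ entry, since it relies on a commercial API. As an illustration only, the Python sketch below matches questions to hypothetical FAQ intents with bag-of-words cosine similarity and computes the answer-accuracy indicator; the FAQ content and the `threshold` value are assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical guesthouse FAQ: intent id -> (sample question, answer).
FAQ = {
    "check_in": ("What time is check-in?", "Check-in starts at 3 pm."),
    "parking": ("Is there parking available?", "Free parking is available on site."),
    "pets": ("Can I bring a pet?", "Small pets are allowed with prior notice."),
}


def bag_of_words(text):
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_intent(question, faq=FAQ, threshold=0.3):
    """Return the best-matching FAQ intent id, or None if nothing is close enough."""
    q = bag_of_words(question)
    best_id, best_score = None, 0.0
    for intent, (sample, _) in faq.items():
        score = cosine(q, bag_of_words(sample))
        if score > best_score:
            best_id, best_score = intent, score
    return best_id if best_score >= threshold else None


def accuracy(labelled_questions, faq=FAQ):
    """Share of (question, expected intent) pairs the matcher answers correctly."""
    hits = sum(match_intent(q, faq) == expected for q, expected in labelled_questions)
    return hits / len(labelled_questions)


tests = [("what time can I check in", "check_in"), ("is parking available", "parking")]
print(accuracy(tests))
```

Returning `None` below the threshold lets a deployed bot hand unfamiliar questions to the owner instead of giving a wrong answer.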

Experimental Study for Effective Combination of Opinion Features (효과적인 의견 자질 결합을 위한 실험적 연구)

  • Han, Kyoung-Soo
    • Journal of the Korean Society for information Management / v.27 no.3 / pp.227-239 / 2010
  • Opinion retrieval aims to retrieve items that are topically relevant to the user's information need and that contain an opinion about the topic. This paper seeks a method of representing the user's information need for effective opinion retrieval and analyzes methods for combining opinion features through various experiments. The experiments are carried out in the inference network framework using the Blogs06 collection and 100 TREC test topics. The results show that the proposed representation method, based on a hidden 'opinion' concept, is effective, and that a compact model with a very small opinion lexicon performs comparably to the previous model on the same test data set.
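In an inference network, evidence is combined through belief operators. Classic inference-network retrieval (as in the InQuery system) uses a product for `#and`, a noisy-or for `#or`, and a weighted average for `#wsum`. The Python sketch below implements these three operators and applies them to a hypothetical topic belief and opinion belief for one blog post; it is not the paper's query formulation or feature combination.

```python
import math


def belief_and(beliefs):
    """#and: product of beliefs (all concepts must be supported)."""
    return math.prod(beliefs)


def belief_or(beliefs):
    """#or: noisy-or, 1 - product of (1 - belief)."""
    return 1.0 - math.prod(1.0 - b for b in beliefs)


def belief_wsum(weighted_beliefs):
    """#wsum: weighted average of beliefs; input is (weight, belief) pairs."""
    total_weight = sum(w for w, _ in weighted_beliefs)
    return sum(w * b for w, b in weighted_beliefs) / total_weight


# Hypothetical beliefs for one blog post.
topic_belief = 0.6     # belief that the post is relevant to the query topic
opinion_belief = 0.4   # belief that the post expresses an opinion
print(belief_and([topic_belief, opinion_belief]))
print(belief_or([topic_belief, opinion_belief]))
print(belief_wsum([(0.7, topic_belief), (0.3, opinion_belief)]))
```

The `#and` form strongly penalizes posts that lack either topical or opinion evidence, whereas `#wsum` lets strong topical relevance partly compensate for weak opinion evidence.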

A Study on the Bucket Loading Characteristics for Wheel-loader Loading Automation (휠로더 굴착 자동화를 위한 버킷 부하특성 연구)

  • Seo, Dong-Kwan;Seo, Hyun-Jae;Kang, In-Pil;Kwon, Young-Min;Lee, Sang-Hoon;Hwang, Sung-Ho
    • Transactions of the Korean Society of Mechanical Engineers A / v.33 no.11 / pp.1332-1340 / 2009
  • The front-end wheel loader is widely used to load materials in mining and construction. Its work cycle repeats digging, loading, and dumping. During scooping, the bucket is subjected to a large resistance force from the soil. We considered the characteristics of the soil reaction force during scooping, overload protection, and an automatic scooping-mode algorithm. The main topic of this paper is the analysis of the soil reaction force characteristics. A soil mechanics analysis is carried out, and the resulting soil model is verified against experimental results from simplified experimental equipment. A simplified model of the soil shape and bucket trajectory is used to determine the scooping direction from an estimate of the resistance force applied to the bucket during scooping. In the future, this model will be used to generate an appropriate path for wheel loader automation.
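The abstract does not give the equations of the soil model. For orientation only, a widely used starting point for estimating soil resistance on a blade or bucket edge is Reece's fundamental earthmoving equation, F = (γ g d² N_γ + c d N_c + q d N_q) w. The Python sketch below evaluates this basic three-term form with the dimensionless N factors taken as given inputs; it should not be read as the model developed in this paper, and all numeric values are hypothetical.

```python
def reece_resistance(density, depth, width, cohesion, surcharge,
                     n_gamma, n_c, n_q, g=9.81):
    """Reece's fundamental earthmoving equation (basic three-term form).

    F = (density * g * depth**2 * n_gamma + cohesion * depth * n_c
         + surcharge * depth * n_q) * width

    density    soil bulk density [kg/m^3]
    depth      tool penetration depth [m]
    width      tool width [m]
    cohesion   soil cohesion [Pa]
    surcharge  surcharge pressure on the soil surface [Pa]
    n_*        dimensionless factors that depend on tool geometry and soil
               friction; supplied as inputs here, not computed.
    Returns the resistance force in newtons.
    """
    return (density * g * depth ** 2 * n_gamma
            + cohesion * depth * n_c
            + surcharge * depth * n_q) * width


# Illustrative numbers only (hypothetical soil and bucket values).
force = reece_resistance(density=1600, depth=0.2, width=2.5,
                         cohesion=5_000, surcharge=0,
                         n_gamma=2.0, n_c=3.0, n_q=2.5)
print(round(force))
```

Because the gravity term grows with the square of depth while the cohesion and surcharge terms grow linearly, deeper penetration raises resistance faster in loose, heavy soils, which is one reason scooping-path planning matters for automation.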