• Title/Summary/Keyword: 토픽 추출 (topic extraction)

Search Result 213, Processing Time 0.022 seconds

WV-BTM: A Technique on Improving Accuracy of Topic Model for Short Texts in SNS (WV-BTM: SNS 단문의 주제 분석을 위한 토픽 모델 정확도 개선 기법)

  • Song, Ae-Rin;Park, Young-Ho
    • Journal of Digital Contents Society / v.19 no.1 / pp.51-58 / 2018
  • As the number of SNS users and the volume of SNS data have grown explosively, research based on SNS big data has become active. In social mining, Latent Dirichlet Allocation (LDA), a typical topic modeling technique, is used to identify the similarity between texts in large volumes of unclassified SNS text data and to extract trends from them. However, LDA has difficulty inferring high-level topics from short texts because infrequent word occurrences make the data semantically sparse. The Biterm Topic Model (BTM) addressed this limitation of LDA by modeling combinations of two words. However, BTM has its own limitation: because it is influenced more by the higher-frequency word in each combined pair, it cannot calculate weights that reflect each word's relation to each topic. In this paper, we propose a technique that improves the accuracy of the existing BTM by reflecting the semantic relations between words.
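
The two-word combinations ("biterms") that BTM builds on can be illustrated with a minimal sketch. This is not the authors' WV-BTM implementation: it assumes each short text is already tokenized and treats every unordered pair of words in a text as one biterm. The function name `extract_biterms` and the sample posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short text."""
    # Sort each pair so ("b", "a") and ("a", "b") count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical short SNS posts, already tokenized.
posts = [
    ["apple", "phone", "release"],
    ["apple", "phone", "price"],
]

# Count biterms over the whole corpus rather than per document,
# which is how BTM works around the sparsity of individual short texts.
biterm_counts = Counter()
for post in posts:
    biterm_counts.update(extract_biterms(post))

print(biterm_counts.most_common(1))
```

Counting pairs corpus-wide shows why a frequent word can dominate: any pair containing it accumulates counts quickly, which is the weighting problem the abstract describes.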

Predicting Bug Severity by utilizing Topic Model and Bug Report Meta-Field (토픽 모델과 버그 리포트 메타 필드를 이용한 버그 심각도 예측 방법)

  • Yang, Geunseok;Lee, Byungjeong
    • KIISE Transactions on Computing Practices / v.21 no.9 / pp.616-621 / 2015
  • Recently developed software systems have many components, and their complexity is increasing accordingly. Last year, about 375 bug reports per day were submitted to the software repositories of the Eclipse and Mozilla open source projects. With so many bug reports submitted, developers' time and effort have increased unnecessarily. Because bug severity is determined manually by quality assurance staff, project managers, or other developers in the general bug fixing process, it is subject to their bias. They may also make mistakes in these manual decisions because of the large number of bug reports. Therefore, in this study, we propose a bug severity prediction approach to solve these problems. First, we find topics similar to a new bug report and reduce the candidate reports for that topic by using the meta-fields of the bug report. Next, we train on the reduced set of reports by applying Naive Bayes Multinomial. Finally, we predict the severity of the new bug report. We compare our approach with other prediction algorithms using bug reports from open source projects. The results show that our approach predicts bug severity better than the other algorithms.
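
The training and prediction steps can be sketched as a small multinomial Naive Bayes classifier. This is an illustrative sketch, not the authors' implementation: it assumes the candidate reports have already been reduced by topic and meta-field, uses Laplace (add-one) smoothing, and all function names, reports, and severity labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(tokens, class_docs, word_counts, vocab):
    """Return the class with the highest log posterior, using add-one smoothing."""
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok not in vocab:
                continue  # skip words never seen during training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical candidate reports (already reduced) with their severities.
reports = [
    ["crash", "startup", "null", "pointer"],
    ["crash", "data", "loss"],
    ["typo", "label", "dialog"],
    ["misaligned", "icon", "dialog"],
]
severities = ["critical", "critical", "minor", "minor"]

model = train_multinomial_nb(reports, severities)
print(predict(["crash", "null"], *model))
```

Training only on the reduced candidate set keeps the per-class word statistics focused on reports that resemble the new one, which is the motivation the abstract gives for the reduction step.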

A Study on Search Query Topics and Types using Topic Modeling and Principal Components Analysis (토픽모델링 및 주성분 분석 기반 검색 질의 유형 분류 연구)

  • Kang, Hyun-Ah;Lim, Heui-Seok
    • KIPS Transactions on Software and Data Engineering / v.10 no.6 / pp.223-234 / 2021
  • Recent advances in the Fourth Industrial Revolution have accelerated the shift in shopping behavior from offline to online. In online shopping, search queries express customers' information needs most directly. However, there has been little research on search queries in the search field, and most prior studies have covered limited topics and data and relied on researchers' qualitative judgment. To address this, this study applies machine learning to search query research and defines search query types with a data-driven quantitative methodology: it identifies 15 search query topics by conducting topic modeling on search queries and clicked-document information. Furthermore, we present a new classification system of search query types that represents search behavior characteristics, derived by extracting and analyzing key variables through principal component analysis. The results of this study are expected to contribute to the establishment of effective search services and the development of search systems.

Analyzing Factors of Success of Film Using Big Data : Focusing on the SNS Utilization Index and Topic Keywords of the Film (빅데이터를 활용한 영화흥행 요인 분석: 영화 <기생충>의 SNS 활용지수와 토픽키워드 중심으로)

  • Kim, Jin-Wook
    • Journal of Korea Entertainment Industry Association / v.14 no.4 / pp.145-153 / 2020
  • In the rapidly changing era of the Fourth Industrial Revolution, big data is being used in various fields. In recent years, big data has been rapidly applied across cultural and artistic content, and its use is especially essential in film, a genre that requires large amounts of capital. This study analyzes the film Parasite, which won the Palme d'Or at the 72nd Cannes Film Festival in 2019 and the Best Picture and Best Director awards at the Academy Awards. The analysis predicts the film's performance through opinion mining, which assigns values to the changes and sentiment in each data cycle, and extracts the utilization index and topic keywords of SNS such as Facebook and Twitter to identify the factors that reflect the audience's interest. If film performance can be predicted and models developed through this kind of big data analysis, the efficiency of the film production process can be maximized while the risks related to production cost and film failure are minimized.

Identifying Research Trends in Big data-driven Digital Transformation Using Text Mining (텍스트마이닝을 활용한 빅데이터 기반의 디지털 트랜스포메이션 연구동향 파악)

  • Minjun, Kim
    • Smart Media Journal / v.11 no.10 / pp.54-64 / 2022
  • A big data-driven digital transformation is defined as a process that aims to innovate companies by triggering significant changes to their capabilities and designs through the use of big data and various technologies. For a successful big data-driven digital transformation, reviewing related literature, which enhances the understanding of research statuses and the identification of key research topics and relationships among key topics, is necessary. However, understanding and describing literature is challenging, considering its volume and variety. Establishing a common ground for central concepts is essential for science. To clarify key research topics on the big data-driven digital transformation, we carry out a comprehensive literature review by performing text mining of 439 articles. Text mining is applied to learn and identify specific topics, and the suggested key references are manually reviewed to develop a state-of-the-art overview. A total of 10 key research topics and relationships among the topics are identified. This study contributes to clarifying a systematized view of dispersed studies on big data-driven digital transformation across multiple disciplines and encourages further academic discussions and industrial transformation.

SNS Sentiment Analysis and Needmining for ICT Digital Transformation and Data Convergence Ecosystem Establishment in LEO Satellite Communications (저궤도 위성통신 분야의 ICT 디지털 전환과 데이터 융합 생태계 조성을 위한 SNS 감성분석과 니드마이닝)

  • Byeong-Hee Lee;Tae-Hyun Kim
    • KIPS Transactions on Computer and Communication Systems / v.12 no.12 / pp.347-356 / 2023
  • In the recent war between Ukraine and Russia, low-orbit satellite communication played a major role. Korea laid a foothold for low-orbit satellite communication services with the successful launch of Nuri in May 2023 and entered full-scale competition in the civilian space age. To create an ecosystem for ICT digital transformation and data convergence in the field of low-orbit satellite communication, this paper collects posts from Reddit, a global SNS platform, and conducts user sentiment analysis, extracts need-related sentences through need mining to identify user needs, performs topic modeling to classify topics, and prepares an action plan according to these topics. We hope that this study will be used as a policy resource for the development and innovation of new business models in the field of low-orbit satellite communication, helping to bridge the digital information gap, solve social problems, support sustainable digital transformation, and enhance soft power.

The Trends of Eco-Friendly Textiles Using Big Data from Newspaper Articles (신문기사 빅데이터를 활용한 친환경 섬유의 추이에 관한 연구)

  • Nam Beom Cho;Choong Kwon Lee
    • Smart Media Journal / v.13 no.2 / pp.95-107 / 2024
  • The development of environmentally friendly products and services has become a trend, and the development and utilization of eco-friendly textiles with economic value is gaining attention as a new business model. Analyzing and identifying trends and developments in eco-friendly textiles can provide important information and insights for various stakeholders such as companies, governments, and consumers to help them achieve sustainable growth. For this study, we collected and analyzed data from newspaper articles mainly covering the textile and fashion sector from 2000 to June 2023. A total of 12,331 articles containing the keyword 'eco-friendly textiles' were collected, and after performing morphological analysis on the extracted data, Latent Dirichlet Allocation and Dynamic Topic Modeling analysis were performed to identify topics by year. The results of the study are expected to provide strategic guidance and insights for the sustainable development of the textile industry, thereby helping to promote the research, development, and commercialization of eco-friendly textiles.

Comments Classification System using Support Vector Machines and Topic Signature (지지 벡터 기계와 토픽 시그너처를 이용한 댓글 분류 시스템 언어에 독립적인 댓글 분류 시스템)

  • Bae, Min-Young;En, Ji-Hyun;Jang, Du-Sung;Cha, Jeong-Won
    • 한국HCI학회:학술대회논문집 / 2009.02a / pp.263-266 / 2009
  • Comments are short and use word spacing and commas less than general documents do. We convert 7-grams into 3-grams and select key features using topic signatures. Topic signatures are widely used for feature selection in document classification and summarization. We use a Support Vector Machine (SVM) as the classifier. The experimental results show that the proposed method outperforms previous methods. The proposed system can also be applied to other languages.

Hashtag Analysis Scheme for Topic based Tweet Categorization (토픽 기반의 트윗 분류를 위한 해시태그 분석 기법)

  • Kim, Yongsung;Jun, Sanghoon;Rew, Jehyeok;Hwang, Eenjun
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.737-740 / 2014
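  • With the recent surge in SNS users, a vast and highly diverse volume of posts is being generated across many kinds of SNS. Among them, Twitter is used as a highly useful tool for delivering and spreading information. Tweets by Twitter users appear in various forms, such as news, music, photos, and travel. Twitter also uses user-defined tags called hashtags, which are an effective means of easily expressing the keywords and main point of a tweet. Despite the large number of tweets generated recently, little research has been done on classifying them into various categories. This paper therefore proposes a technique that uses hashtags to identify the main point of tweets and to classify large numbers of tweets by topic. First, we collect tweets containing popular hashtags from various categories and extract keywords for each hashtag from the collected tweets. Next, we measure the content similarity between hashtags using cosine similarity to determine how similar the content of the hashtags within each category is. Finally, when a user tweet is entered, we compare its similarity with every category and recommend the category with the highest similarity. We implement a prototype based on the proposed technique and evaluate its performance through experiments.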
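
The final step, comparing an incoming tweet against every category by cosine similarity, can be sketched as follows. This is a minimal illustration rather than the authors' prototype: it assumes each category is summarized by a keyword-frequency profile, and the category names, keyword counts, and function names are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical keyword profiles, e.g. aggregated from each category's hashtags.
category_keywords = {
    "music": Counter({"album": 3, "concert": 2, "song": 4}),
    "travel": Counter({"flight": 3, "hotel": 2, "beach": 2}),
}


def recommend_category(tweet_tokens):
    """Return the category whose keyword profile is most similar to the tweet."""
    tweet_vec = Counter(tweet_tokens)
    return max(
        category_keywords,
        key=lambda c: cosine_similarity(tweet_vec, category_keywords[c]),
    )


print(recommend_category(["new", "song", "album", "today"]))
```

Cosine similarity normalizes by vector length, so a short tweet can still match a category with a much larger keyword profile.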

An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and demonstrate the effects of data analysis. In the tourism and hospitality industry, many firms and studies have paid attention to online reviews on social media because of their large influence on customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more pronounced than that of any other type of media. However, there are some limitations to the improvements in service quality that can be made based on opinions on social media platforms. Users on social media platforms express their opinions as text, images, and so on, so raw data sets from these reviews are unstructured. Moreover, these data sets are too large for humans to extract new information and hidden knowledge from them unaided. To use them for business intelligence and analytics applications, proper big data techniques such as natural language processing and data mining are needed. This study suggests an analytical approach that directly yields insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining, to extract the topics contained in the reviews, and decision tree modeling, to explain the relationship between topics and ratings. Topic mining refers to a method for finding groups of words that represent documents in a collection. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most widely used. However, LDA alone is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, which is a kind of decision tree technique.
Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. Therefore, this study aims to present an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we identify the strengths and weaknesses of each hotel's services and suggest improvements to aid customer satisfaction. From positive reviews in particular, we find what these hotels should maintain for service quality. For example, positive reviews of one hotel show that, compared with the other hotels, it has a good location and room condition. In contrast, from negative reviews we find what the hotels should change in their services. For example, one hotel should improve the soundproofing of its rooms. These results indicate that our approach is useful for finding insights into the service quality of hotels. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies on improving service quality relied on surveys or interviews of customers. However, these methods are often costly and time-consuming, and the results may be distorted by biased sampling or untrustworthy answers. The proposed approach directly obtains honest feedback from customers' online reviews and draws insights through a type of big data analysis, so it can be a more useful tool that overcomes the limitations of surveys and interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach will improve if other structured and unstructured data sources are added.