• Title/Abstract/Keywords: topic modeling analysis

672 search results (processing time: 0.029 s)

Text Mining Analysis of News Articles Related to 'Space Hazard' ('우주 위험' 관련 뉴스 기사의 텍스트 마이닝 분석 연구)

  • Jo, Hoon;Sohn, Jungjoo
    • Journal of the Korean earth science society / Vol. 43, No. 1 / pp.224-235 / 2022
  • This study examined how the media have reported on space hazards by applying topic modeling to media articles about space hazards from the past 12 years. Latent Dirichlet Allocation (LDA) analysis was performed on more than 1,200 articles about space hazards, published between 2010 and 2021 and collected from the BIGKinds news platform, covering solar storms, artificial space objects, and natural space objects. The articles related to solar storms focused on three topics: the effect of solar explosions on satellites; the effect of solar explosions on radio communication in Korea, centered on the Korean Space Weather Center; and the relationship between aircrew and space radiation. The articles related to artificial space objects focused on three topics: the threat of space garbage to satellites and space stations and the transition of useful objects into space junk; the relationship between space garbage and humanity as shown in movies; and the efforts of developed countries to track, monitor, and dispose of space garbage. The articles related to natural space objects focused on two topics: International Space Agency's tracking and monitoring of near-Earth asteroids and countermeasures against collisions; and the evolution and extinction of dinosaurs and mammals, with a focus on collisions of asteroids or comets. This study thus confirmed that domestic media play a role in conveying the dangers of space hazards and drawing public attention to them through a total of eight themes in various fields such as society and culture, and it derived education methods and policies on space hazards.

Fault Localization for Self-Managing Based on Bayesian Network (베이지안 네트워크 기반에 자가관리를 위한 결함 지역화)

  • Piao, Shun-Shan;Park, Jeong-Min;Lee, Eun-Seok
    • The KIPS Transactions:PartB / Vol. 15B, No. 2 / pp.137-146 / 2008
  • Fault localization plays a significant role in large distributed systems because it can automatically identify the root cause of observed faults. It thereby supports self-management, which remains an open topic in managing and controlling complex distributed systems to improve system reliability. Although many artificial intelligence techniques have been introduced to support fault localization in recent research, especially in increasingly complex ubiquitous environments, the functions they provide, such as diagnosis and prediction, are limited. In this paper, we propose fault localization for self-managing in performance evaluation, in order to improve system reliability by learning from and analyzing real-time streams of system performance events. We use probabilistic reasoning functions based on the basic Bayes' rule to provide an effective mechanism for managing and evaluating system performance parameters automatically, and hence the system reliability is improved. Moreover, because a large number of factors must be considered in diverse and complex fault reasoning domains, we develop an efficient method that extracts the relevant parameters most strongly related to the observed problems and ranks them in order. The selected node ordering lists are used in network modeling, which improves learning efficiency. The approach enables us to diagnose the most probable causal factor responsible for the underlying performance problems and to predict the system's situation so as to avoid potential abnormalities, via post-treatments or pretreatments respectively. An experimental application to system performance analysis and various estimates of efficiency and accuracy show that the proposed approach is promising in the performance evaluation domain.
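
The diagnosis step described in this abstract, finding the most probable cause of observed performance problems with Bayes' rule, can be illustrated with a minimal sketch. This is not the authors' Bayesian network: it is a simpler naive Bayes calculation that assumes the observed events are conditionally independent given the cause. The cause names, event names, and all probabilities are invented for illustration.

```python
def rank_causes(priors, likelihoods, observed):
    """Rank candidate causes by posterior P(cause | observed events).

    priors:      {cause: P(cause)}
    likelihoods: {cause: {event: P(event | cause)}}
    observed:    list of observed event names
    Assumes events are conditionally independent given the cause.
    """
    joint = {}
    for cause, prior in priors.items():
        p = prior
        for event in observed:
            # Small floor so an event missing from the table does not zero out a cause.
            p *= likelihoods[cause].get(event, 1e-6)
        joint[cause] = p
    evidence = sum(joint.values())  # P(observed), the normalizing constant
    posterior = {cause: p / evidence for cause, p in joint.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical numbers, chosen only to make the example concrete.
priors = {"cpu_overload": 0.5, "memory_leak": 0.3, "network_congestion": 0.2}
likelihoods = {
    "cpu_overload": {"slow_response": 0.8, "high_latency": 0.3},
    "memory_leak": {"slow_response": 0.6, "high_latency": 0.2},
    "network_congestion": {"slow_response": 0.7, "high_latency": 0.9},
}
ranking = rank_causes(priors, likelihoods, ["slow_response", "high_latency"])
```

With these numbers, `ranking[0]` is the cause with the highest posterior even though it has the lowest prior, because both observed events are likely under it.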

A study on the method of deriving the cause of social issues based on causal sentences (인과관계문형 기반 사회이슈 발생원인 도출 방법 연구)

  • Lee, Namyeon;Lee, Jae Hyung
    • Journal of Digital Convergence / Vol. 19, No. 3 / pp.167-176 / 2021
  • With the development of big data analysis technology, many studies have been conducted to find social issues using text mining techniques. To derive social issues, previous studies collected a large amount of text data from news or SNS and then analyzed issues using text mining techniques such as topic modeling and term network analysis. Social issues are the results of various social phenomena and factors. However, since previous studies focused on deriving the social issues that result from various causes, they are limited in revealing the causes of those issues. In order to respond effectively to social issues, it is necessary not only to derive social issues but also to identify their causes. In this study, to overcome these limitations, we proposed a method of deriving the factors that cause social issues from texts related to social issues, based on part of the theory of Korean linguistics. To do this, we collected news data related to social issues for the three years from 2017 to 2019 and proposed a methodology for finding causes based on causal sentences using text mining techniques.

Keyword Analysis of Research on Consumption of Children and Adolescents Using Text Mining (텍스트마이닝을 활용한 아동, 청소년 대상 소비관련 연구 키워드 분석)

  • Jin, Hyun-Jeong
    • Journal of Korean Home Economics Education Association / Vol. 33, No. 4 / pp.1-13 / 2021
  • The purpose of this study is to identify trends and potential themes of research on consumption of children and adolescents over 20 years by analyzing keywords. The keywords of 869 studies on consumption of children and adolescents published in journals listed in the Korean Citation Index were analyzed using text mining techniques. The most frequent keywords were, in order, youth, youth consumers, consumer education, conspicuous consumption, consumption behavior, and character. Analyzing keyword frequency by five-year period confirmed that the frequency of consumer education was significantly higher between 2006 and 2010. Research on ethical consumption has been active since 2011, and research during the most recent five-year period has covered various topics without a single prominent keyword. Looking at the keywords based on TF-IDF, keywords related to the environment and the Internet were the main keywords between 2001 and 2005. From 2006 to 2010, the TF-IDF values of media use, advertisement education, and Internet items were high. From 2011 to 2015, fair trade, green growth, green consumption, North Korean defector youths, and social media appeared as important themes, and from 2016 to 2020, text mining, sustainable development education, maker education, and the 2015 revised curriculum did. As a result of topic modeling, eight topics were derived: consumer education, mass media/peer culture, rational consumption, Hallyu/cultural industry, consumer competency, economic education, teaching and learning method, and eco-friendly/ethical consumption. As a result of network analysis, it was found that conspicuous consumption and consumer education are important topics in consumption research on children and adolescents.
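
To show why TF-IDF highlights period-specific keywords rather than keywords that are common everywhere, here is a minimal sketch that treats each five-year period as one document. The keyword lists are toy data (the terms come from the abstract, the counts are invented), and the plain `log(N / df)` weighting is only one of several common IDF variants; the abstract does not say which one the study used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: {name: list of keywords}. Returns {name: {keyword: tf-idf score}}."""
    n_docs = len(docs)
    # Document frequency: in how many periods does each keyword occur?
    df = Counter()
    for words in docs.values():
        df.update(set(words))
    scores = {}
    for name, words in docs.items():
        counts = Counter(words)
        total = len(words)
        scores[name] = {
            w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()
        }
    return scores


# Toy keyword lists per period (illustrative only).
periods = {
    "2001-2005": ["environment", "internet", "consumer education", "internet"],
    "2006-2010": ["media use", "consumer education", "advertisement education"],
    "2011-2015": ["fair trade", "green consumption", "consumer education"],
}
scores = tf_idf(periods)
top_2001 = max(scores["2001-2005"], key=scores["2001-2005"].get)
```

A keyword that occurs in every period, such as "consumer education" here, gets an IDF of `log(1) = 0` and therefore a score of zero, while a keyword concentrated in one period scores highest there.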

2023 Korea Digital Business Trend Study: Listening to Voices from Academia and Industry (2023 대한민국 디지털 비즈니스 트렌드 인식조사: 학계와 산업계의 다양한 목소리를 들어보다)

  • Heedong Yang;Hyunchul Ahn;Jung Lee;Hyunjeong Kang
    • Information Systems Review / Vol. 25, No. 1 / pp.189-212 / 2023
  • This study uses various methods, including media analysis, expert interviews, and large-scale surveys, to derive notable digital business trends in 2023. Most trend studies have yet to deal with digital business trends in Korea. They also often have limitations in the objectivity of the results using unclear methods. On the other hand, this study emphasizes the validity of the results by collecting opinions from Korean digital business experts in various fields. First, Korean IT news articles were collected and analyzed through topic modeling analysis. Then, based on the results, interviews were conducted with 13 academic and industrial experts to derive 16 IT business trend candidates. Then, a survey was conducted on 210 experts to finalize the list of Korean IT business trends. Finally, to compare overseas and domestic views, we conducted an additional survey using the items developed by the Society for Information Management, SIM. This study is meaningful in that it drew prospects for digital business trends in consideration of the domestic business environment by scientifically converging various opinions of Korean digital business leaders. Our study contributes to developing strategies for IT technology and IT service business markets.

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / Vol. 24, No. 2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, the following have been actively studied: document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia following the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization. Most existing studies on the quality evaluation of summarization manually summarized documents, used the results as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text through various techniques, and the result is compared with the reference document, which is an ideal summary, to measure the quality of the automatic summarization. 
Reference documents are provided in two major ways; the most common is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, writing the summary takes a lot of time and cost, and there is a limitation that the evaluation result may differ depending on who the summarizer is. Therefore, to overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more often the frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means condensing a lot of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in the essential sense. To overcome the limitations of this previous work on summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the content is missing from the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness. 
To evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized for each hotel, and the results of experiments evaluating the quality of the summaries according to the proposed methodology are presented. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method for performing optimal summarization by changing the threshold of sentence similarity.
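
The trade-off between completeness and succinctness, and their combination into an F-score, can be made concrete with a small sketch. The abstract does not give the paper's formulas, so everything below is an assumption: sentence similarity is word-level Jaccard overlap, completeness is the share of source sentences matched by some summary sentence, succinctness is the share of summary sentences that do not duplicate an earlier one, and the F-score is their harmonic mean. The review sentences are invented.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two sentences (assumed measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def completeness(source, summary, threshold):
    """Share of source sentences covered by at least one summary sentence."""
    covered = sum(
        1 for s in source if any(jaccard(s, t) >= threshold for t in summary)
    )
    return covered / len(source)


def succinctness(summary, threshold):
    """Share of summary sentences that do not repeat an earlier summary sentence."""
    unique = 0
    for i, t in enumerate(summary):
        if not any(jaccard(t, prev) >= threshold for prev in summary[:i]):
            unique += 1
    return unique / len(summary)


def f_score(c, s):
    """Harmonic mean of completeness and succinctness."""
    return 2 * c * s / (c + s) if c + s else 0.0


# Invented hotel-review sentences, for illustration only.
source = [
    "the room was clean",
    "the room was very clean",
    "staff were friendly",
    "breakfast was cold",
]
summary = ["the room was clean", "the room was very clean", "staff were friendly"]

c = completeness(source, summary, threshold=0.5)
s = succinctness(summary, threshold=0.5)
f = f_score(c, s)
```

In this toy case the summary misses one source sentence and contains one near-duplicate, so both scores fall below 1. Raising or lowering `threshold` changes which sentences count as matches or duplicates, which is the kind of knob the abstract describes tuning.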

A Study on Automatic Classification of Newspaper Articles Based on Unsupervised Learning by Departments (비지도학습 기반의 행정부서별 신문기사 자동분류 연구)

  • Kim, Hyun-Jong;Ryu, Seung-Eui;Lee, Chul-Ho;Nam, Kwang Woo
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 21, No. 9 / pp.345-351 / 2020
  • Administrative agencies today are paying keen attention to big data analysis to improve their policy responsiveness. Of all the big data, news articles can be used to understand public opinion regarding policy and policy issues. The amount of news output has increased rapidly because of the emergence of new online media outlets, which calls for the use of automated bots or automatic document classification tools. There are, however, limits to the automatic collection of news articles related to specific agencies or departments based on the existing news article categories and keyword search queries. Thus, this paper proposes a method to process articles using classification glossaries that take into account each agency's different work features. To this end, classification glossaries were developed by extracting the work features of different departments using Word2Vec and topic modeling techniques from news articles related to different agencies. As a result, the automatic classification of newspaper articles for each department yielded approximately 71% accuracy. This study is meaningful in making academic and practical contributions because it presents a method of extracting the work features for each department, and it is an unsupervised learning-based automatic classification method for automatically classifying news articles relevant to each agency.
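
As a rough illustration of glossary-based classification, the sketch below assigns an article to the department whose glossary terms occur most often in it. It covers only the final matching step and assumes the glossaries already exist; building them with Word2Vec and topic modeling, as the abstract describes, is not shown. The department names, glossary terms, and the simple count-based scoring are all hypothetical.

```python
def classify(article, glossaries):
    """Return the department whose glossary terms appear most often, or None.

    article:    plain-text article
    glossaries: {department: set of single-word terms}
    """
    tokens = article.lower().split()
    scores = {
        dept: sum(tokens.count(term) for term in terms)
        for dept, terms in glossaries.items()
    }
    best = max(scores, key=scores.get)
    # No glossary term matched at all: leave the article unclassified.
    return best if scores[best] > 0 else None


# Hypothetical glossaries for two departments.
glossaries = {
    "transport": {"bus", "subway", "traffic", "road"},
    "environment": {"air", "pollution", "waste", "recycling"},
}
label = classify("City expands subway lines to ease road traffic", glossaries)
unmatched = classify("Mayor attends ceremony", glossaries)
```

Because no labeled training articles are needed once the glossaries are built, this matching step is consistent with the unsupervised setting described in the abstract.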

Movie Recommended System base on Analysis for the User Review utilizing Ontology Visualization (온톨로지 시각화를 활용한 사용자 리뷰 분석 기반 영화 추천 시스템)

  • Mun, Seong Min;Kim, Gi Nam;Choi, Gyeong cheol;Lee, Kyung Won
    • Design Convergence Study / Vol. 15, No. 2 / pp.347-368 / 2016
  • Recent research on word of mouth (WOM) implies that consumers use WOM information about products in their purchase process. This study suggests methods that use opinion mining and visualization to understand consumers' opinions of individual goods and markets. For this study, we developed a domain ontology based on reviews confined to the "movie" category, because people who want to watch a movie increasingly refer to others' movie reviews, and analyzed it through opinion mining and visualization. It differs from other research in that it classifies the attributes of evaluation factors and builds a verbal dictionary of evaluation factors during the ontology process used for the analysis. We seek to verify through the results whether the research method is valid. The results derived from this study can be largely divided into three. First, this research explains methods of developing a domain ontology using keyword extraction and topic modeling. Second, we visualize the reviews of each movie to understand audiences' overall opinion of specific movies. Third, we find clusters consisting of products that received similar assessments according to the evaluation results for each product. The case study largely shows three clusters containing the 130 movies used, grouped according to audiences' opinions.

The Prediction of the Helpfulness of Online Review Based on Review Content Using an Explainable Graph Neural Network (설명가능한 그래프 신경망을 활용한 리뷰 콘텐츠 기반의 유용성 예측모형)

  • Eunmi Kim;Yao Ziyan;Taeho Hong
    • Journal of Intelligence and Information Systems / Vol. 29, No. 4 / pp.309-323 / 2023
  • As the role of online reviews has become increasingly crucial, numerous studies have been conducted to utilize helpful reviews. Helpful reviews, perceived by customers, have been verified in various research studies to be influenced by factors such as ratings, review length, review content, and so on. The determination of a review's helpfulness is generally based on the number of 'helpful' votes from consumers, with more 'helpful' votes considered to have a more significant impact on consumers' purchasing decisions. However, recently written reviews that have not been exposed to many customers may have relatively few 'helpful' votes and may lack 'helpful' votes altogether due to a lack of participation. Therefore, rather than relying on the number of 'helpful' votes to assess the helpfulness of reviews, we aim to classify them based on review content. In addition, the text of the review emerges as the most influential factor in review helpfulness. This study employs text mining techniques, including topic modeling and sentiment analysis, to analyze the diverse impacts of content and emotions embedded in the review text. In this study, we propose a review helpfulness prediction model based on review content, utilizing movie reviews from IMDb, a global movie information site. We construct a review helpfulness prediction model by using an explainable Graph Neural Network (GNN), while addressing the interpretability limitations of the machine learning model. The explainable graph neural network is expected to provide more reliable information about helpful or non-helpful reviews as it can identify connections between reviews.

Analysis of Dog-Related Outdoor Public Space Conflicts Using Complaint Data (민원 자료를 활용한 반려견 관련 옥외 공공공간 갈등 분석)

  • Yoo, Ye-seul;Son, Yong-Hoon;Zoh, Kyung-Jin
    • Journal of the Korean Institute of Landscape Architecture / Vol. 52, No. 1 / pp.34-45 / 2024
  • Companion animals are increasingly being recognized as members of society in outdoor public spaces. However, the presence of dogs in cities has become a subject of conflict between pet owners and non-pet owners, causing problems in terms of hygiene and noise. This study was conducted to analyze public complaint data using the keywords 'dog,' 'pet,' and 'puppy' through text mining techniques to identify the causes of conflicts in outdoor public spaces related to dogs and to identify key issues. The main findings of the study are as follows. First, the majority of dog-related complaints were related to the use of outdoor public spaces. Second, different types of outdoor public spaces have different spatial issues. Third, there were a total of four topics of dog-related complaints: 'Requesting a dog playground', 'Raising safety issues related to animals', 'Using facilities other than dog-only areas', and 'Requesting increased park management and enforcement related to pet etiquette'. This study analyzed the perceptions of citizens surrounding pets at a time when the creation and use of public spaces related to pets are expanding. In particular, it is significant in that it applied a new method of collecting public opinions by adopting complaint data that clearly presents problems and requests.