• 제목/요약/키워드: topic extraction

Search Result 123, Processing Time 0.032 seconds

Analysis System for SNS Issues per Country based on Topic Model (토픽 모델 기반의 국가 별 SNS 관심 이슈 분석 시스템)

  • Kim, Seong Hoon;Yoon, Ji Won
    • Journal of KIISE
    • /
    • v.43 no.11
    • /
    • pp.1201-1209
    • /
    • 2016
  • As the use of SNS continues to increase, various related studies have been conducted. According to the effectiveness of the topic model for existing theme extraction, a huge number of related research studies on topic model based analysis have been introduced. In this research, we suggested an automation system to analyze topics of each country and its distribution in twitter by combining world map visualization and issue matching method. The core system components are the following three modules; 1) collection of tweets and classification by nation, 2) extraction of topics and distribution by country based on topic model algorithm, and 3) visualization of topics and distribution based on Google geochart. In experiments with USA and UK, we could find issues of the two nations and how they changed. Based on these results, we could analyze the differences of each nation's position on ISIS problem.

Dynamic Expansion of Semantic Dictionary for Topic Extraction in Automatic Summarization (자동요약의 주제어 추출을 위한 의미사전의 동적 확장)

  • Choo, Kyo-Nam;Woo, Yo-Seob
    • Journal of IKEEE
    • /
    • v.13 no.2
    • /
    • pp.241-247
    • /
    • 2009
  • This paper suggests the expansion methods of semantic dictionary, taking Korean semantic features account. These methods will be used to extract a practical topic word in the automatic summarization. The first is the method which is constructed the synonym dictionary for improving the performance of semantic-marker analysis. The second is the method which is extracted the probabilistic information from the subcategorization dictionary for resolving the syntactic and semantic ambiguity. The third is the method which is predicted the subcategorization patterns of the unregistered predicate, for the resolution of an affix-derived predicate.

  • PDF

Document Summarization using Topic Phrase Extraction and Query-based Summarization (주제어구 추출과 질의어 기반 요약을 이용한 문서 요약)

  • 한광록;오삼권;임기욱
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.4
    • /
    • pp.488-497
    • /
    • 2004
  • This paper describes the hybrid document summarization using the indicative summarization and the query-based summarization. The learning models are built from teaming documents in order to extract topic phrases. We use Naive Bayesian, Decision Tree and Supported Vector Machine as the machine learning algorithm. The system extracts topic phrases automatically from new document based on these models and outputs the summary of the document using query-based summarization which considers the extracted topic phrases as queries and calculates the locality-based similarity of each topic phrase. We examine how the topic phrases affect the summarization and how many phrases are proper to summarization. Then, we evaluate the extracted summary by comparing with manual summary, and we also compare our summarization system with summarization mettled from MS-Word.

Cross-Domain Text Sentiment Classification Method Based on the CNN-BiLSTM-TE Model

  • Zeng, Yuyang;Zhang, Ruirui;Yang, Liang;Song, Sujuan
    • Journal of Information Processing Systems
    • /
    • v.17 no.4
    • /
    • pp.818-833
    • /
    • 2021
  • To address the problems of low precision rate, insufficient feature extraction, and poor contextual ability in existing text sentiment analysis methods, a mixed model account of a CNN-BiLSTM-TE (convolutional neural network, bidirectional long short-term memory, and topic extraction) model was proposed. First, Chinese text data was converted into vectors through the method of transfer learning by Word2Vec. Second, local features were extracted by the CNN model. Then, contextual information was extracted by the BiLSTM neural network and the emotional tendency was obtained using softmax. Finally, topics were extracted by the term frequency-inverse document frequency and K-means. Compared with the CNN, BiLSTM, and gate recurrent unit (GRU) models, the CNN-BiLSTM-TE model's F1-score was higher than other models by 0.0147, 0.006, and 0.0052, respectively. Then compared with CNN-LSTM, LSTM-CNN, and BiLSTM-CNN models, the F1-score was higher by 0.0071, 0.0038, and 0.0049, respectively. Experimental results showed that the CNN-BiLSTM-TE model can effectively improve various indicators in application. Lastly, performed scalability verification through a takeaway dataset, which has great value in practical applications.

Topic Based Hierarchical Network Analysis for Entrepreneur Using Text Mining (텍스트 마이닝을 이용한 주제기반의 기업인 네트워크 계층 분석)

  • Lee, Donghun;Kim, Yonghwa;Kim, Kwanho
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.3
    • /
    • pp.33-49
    • /
    • 2018
  • The importance of convergence activities among business is increasing due to the necessity of designing and developing new products to satisfy various customers' needs. In particular, decision makers such as CEOs are required to participate in networks between entrepreneurs for being connected with valuable convergence partners. Moreover, it is important for entrepreneurs not only to make a large number of network connections, but also to understand the networking relationship with entrepreneurs with similar topic information. However, there is a difficult limit in collecting the topic information that can show the lack of current status of business and the technology and characteristics of entrepreneur in industry sector. In this paper, we solve these problems through the topic extraction method and analyze the business network in three aspects. Specifically, there are C, S, T-Layer models, and each model analyzes amount of entrepreneurs relationship, network centrality, and topic similarity. As a result of experiments using real data, entrepreneur need to activate network by connecting high centrality entrepreneur when the corporate relationship is low. In addition, we confirmed through experiments that there is a need to activate the topic-based network when topic similarity is low between entrepreneurs.

Topic Analysis of Science and Technology Articles using CiteSeer Corpus (CiteSeer 말뭉치를 이용한 과학기술 문헌의 주제 분석)

  • Jung, Han-Min;Kang, In-Su;Sung, Won-Kyung
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.5
    • /
    • pp.507-511
    • /
    • 2008
  • There have been enormous technological advances in science & technology domain and frequent convergences between its sub-domains. Topic analysis with science & technology corpus is a key process to grasp topic trends and relations between topics. The main objective of this research is to show various analytic approaches with topics extracted from CiteSeer corpus, which is widely used in information technology domain. This paper will also show a case study of Onto-Frame, an R&D support system developed by KISTI, to reveal the role of topics on the system.

A Study on Graph-based Topic Extraction from Microblogs (마이크로블로그를 통한 그래프 기반의 토픽 추출에 관한 연구)

  • Choi, Don-Jung;Lee, Sung-Woo;Kim, Jae-Kwang;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.21 no.5
    • /
    • pp.564-568
    • /
    • 2011
  • Microblogs became popular information delivery ways due to the spread of smart phones. They have the characteristic of reflecting the interests of users more quickly than other medium. Particularly, in case of the subject which attracts many users, microblogs can supply rich information originated from various information sources. Nevertheless, it has been considered as a hard problem to obtain useful information from microblogs because too much noises are in them. So far, various methods are proposed to extract and track some subjects from particular documents, yet these methods do not work effectively in case of microblogs which consist of short phrases. In this paper, we propose a graph-based topic extraction and partitioning method to understand interests of users about a certain keyword. The proposed method contains the process of generating a keyword graph using the co-occurrences of terms in the microblogs, and the process of splitting the graph by using a network partitioning method. When we applied the proposed method on some keywords. our method shows good performance for finding a topic about the keyword and partitioning the topic into sub-topics.

Automatic Extraction Techniques of Topic-relevant Visual Shots Using Realtime Brainwave Responses (실시간 뇌파반응을 이용한 주제관련 영상물 쇼트 자동추출기법 개발연구)

  • Kim, Yong Ho;Kim, Hyun Hee
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.8
    • /
    • pp.1260-1274
    • /
    • 2016
  • To obtain good summarization algorithms, we need first understand how people summarize videos. 'Semantic gap' refers to the gap between semantics implied in video summarization algorithms and what people actually infer from watching videos. We hypothesized that ERP responses to real time videos will show either N400 effects to topic-irrelevant shots in the 300∼500ms time-range after stimulus on-set or P600 effects to topic-relevant shots in the 500∼700ms time range. We recruited 32 participants in the EEG experiment, asking them to focus on the topic of short videos and to memorize relevant shots to the topic of the video. After analysing real time videos based on the participants' rating information, we obtained the following t-test result, showing N400 effects on PF1, F7, F3, C3, Cz, T7, and FT7 positions on the left and central hemisphere, and P600 effects on PF1, C3, Cz, and FCz on the left and central hemisphere and C4, FC4, P8, and TP8 on the right. A further 3-way MANOVA test with repeated measures of topic-relevance, hemisphere, and electrode positions showed significant interaction effects, implying that the left hemisphere at central, frontal, and pre-frontal positions were sensitive in detecting topic-relevant shots while watching real time videos.

Text Categorization using Topic Signature and Co-occurrence Features (Topic Signature와 동시 출현 단어 쌍을 이용한 문서 범주화)

  • Bae, Won-Sik;Han, Yo-Sub;Cha, Jeong-Won
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2008.06c
    • /
    • pp.262-267
    • /
    • 2008
  • 본 논문에서는 문서 내에서 동시에 출현하는 단어 쌍을 자질 추출 단위로 하는 문서 범주화 시스템에 대하여 기술한다. 자질 추출 단위를 단어 쌍으로 정의한 것은 문서에서 빈번하게 동시에 출현하는 단어들은 서로 연관관계가 높으며, 단어 하나보다는 연관관계가 높은 단어들의 쌍이 특정 범주의 문서에서만 나타날 확률이 높아지므로 문서 분류 능력을 높이는데 좋은 요인으로 작용할 수 있을 것이라는 가정 때문이다. 그리고 문서 요약 분야에서 제안된 Log-likelihood Ratio를 기반으로 하는 Topic Signature Term Extraction 방법을 사용하여 자질 추출을 하고, Naive Bayes 분류기를 이용하여 문서를 분류한다. 본 연구는 Reuters-21578 문서 집합을 이용한 성능평가에서 좋은 결과를 보였으며, 이는 앞으로의 연구에도 기여할 수 있을 것이라 기대한다.

  • PDF

Research trends over 10 years (2010-2021) in infant and toddler rearing behavior by family caregivers in South Korea: text network and topic modeling

  • In-Hye Song;Kyung-Ah Kang
    • Child Health Nursing Research
    • /
    • v.29 no.3
    • /
    • pp.182-194
    • /
    • 2023
  • Purpose: This study analyzed research trends in infant and toddler rearing behavior among family caregivers over a 10-year period (2010-2021). Methods: Text network analysis and topic modeling were employed on data collected from relevant papers, following the extraction and refinement of semantic morphemes. A semantic-centered network was constructed by extracting words from 2,613 English-language abstracts. Data analysis was performed using NetMiner 4.5.0. Results: Frequency analysis, degree centrality, and eigenvector centrality all revealed the terms ''scale," ''program," and ''education" among the top 10 keywords associated with infant and toddler rearing behaviors among family caregivers. The keywords extracted from the analysis were divided into two clusters through cohesion analysis. Additionally, they were classified into two topic groups using topic modeling: "program and evaluation" (64.37%) and "caregivers' role and competency in child development" (35.63%). Conclusion: The roles and competencies of family caregivers are essential for the development of infants and toddlers. Intervention programs and evaluations are necessary to improve rearing behaviors. Future research should determine the role of nurses in supporting family caregivers. Additionally, it should facilitate the development of nursing strategies and intervention programs to promote positive rearing practices.