• Title/Summary/Keyword: Topic vector

A Semantic Aspect-Based Vector Space Model to Identify the Event Evolution Relationship within Topics

  • Xi, Yaoyi;Li, Bicheng;Liu, Yang
    • Journal of Computing Science and Engineering
    • /
    • v.9 no.2
    • /
    • pp.73-82
    • /
    • 2015
  • Understanding how a topic evolves is an important and challenging task. A topic usually consists of multiple related events, and the accurate identification of event evolution relationships plays an important role in topic evolution analysis. Existing research has used the traditional vector space model to represent events, which cannot accurately compute the semantic similarity between events. This has led to poor performance in identifying event evolution relationships. This paper suggests constructing a semantic aspect-based vector space model to represent events: first, use a hierarchical Dirichlet process to mine the semantic aspects; then, construct a semantic aspect-based vector space model according to these aspects; finally, represent each event as a point and measure the semantic relatedness between events in the space. According to our evaluation experiments, the performance of the proposed technique is promising and significantly outperforms the baseline methods.
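
The idea of representing each event as a point in an aspect space and measuring relatedness there can be illustrated with a minimal sketch. The abstract does not say which relatedness measure is used, so cosine similarity is an assumption here, and the event names and aspect weights are hypothetical hand-set values rather than output of a hierarchical Dirichlet process.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical events, each given as weights over three semantic aspects.
# In the paper these aspects would be mined from text; here they are made up.
events = {
    "earthquake_strikes": [0.8, 0.1, 0.1],
    "rescue_begins": [0.6, 0.3, 0.1],
    "election_held": [0.0, 0.1, 0.9],
}

def relatedness(name_a, name_b):
    """Relatedness of two named events in the aspect space (assumed: cosine)."""
    return cosine_similarity(events[name_a], events[name_b])
```

Under these made-up weights, `relatedness("earthquake_strikes", "rescue_begins")` comes out higher than `relatedness("earthquake_strikes", "election_held")`, which is the kind of signal an event evolution detector could threshold on.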

Comments Classification System using Support Vector Machines and Topic Signature (지지 벡터 기계와 토픽 시그너처를 이용한 댓글 분류 시스템 언어에 독립적인 댓글 분류 시스템)

  • Bae, Min-Young;En, Ji-Hyun;Jang, Du-Sung;Cha, Jeong-Won
    • 한국HCI학회:학술대회논문집
    • /
    • 2009.02a
    • /
    • pp.263-266
    • /
    • 2009
  • Comments are short and use word spacing and commas less than general documents do. We convert the 7-gram into 3-gram and select key features using topic signature. Topic signature is widely used for selecting features in document classification and summarization. We use an SVM (Support Vector Machine) as the classifier. The experimental results show that the proposed method outperforms the previous methods. The proposed system can also be applied to other languages.

Topic Extraction and Classification Method Based on Comment Sets

  • Tan, Xiaodong
    • Journal of Information Processing Systems
    • /
    • v.16 no.2
    • /
    • pp.329-342
    • /
    • 2020
  • In recent years, emotional text classification has become one of the essential research topics in the field of natural language processing. It has been widely used in the sentiment analysis of commodities such as hotels and of other commentary corpora. This paper proposes an improved W-LDA (weighted latent Dirichlet allocation) topic model to address the shortcomings of traditional LDA topic models. In the Gibbs sampling of topic words and the calculation of the word distribution expectation in the W-LDA topic model, an average weighted value is adopted to keep topic-related words from being submerged by high-frequency words and thus to improve the distinctiveness of topics. The method further integrates a support vector machine classification algorithm based on the extracted high-quality document-topic distribution and topic-word vectors. Finally, an efficient integration method is constructed for the analysis and extraction of emotional words, topic distribution calculations, and sentiment classification. Tests on real teaching evaluation data and on a test set from a public comment set show that the proposed method has distinct advantages over two other typical algorithms in terms of topic differentiation, classification precision, and F1-measure.

Semantic Visualization of Dynamic Topic Modeling (다이내믹 토픽 모델링의 의미적 시각화 방법론)

  • Yeon, Jinwook;Boo, Hyunkyung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.131-154
    • /
    • 2022
  • Recently, research on unstructured data analysis has been actively conducted with the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics from massive text data. In the early stages of topic modeling, most studies focused only on topic discovery. As the topic modeling field matured, studies on how topics change over time began to be carried out. Accordingly, interest in dynamic topic modeling, which handles changes in the keywords constituting a topic, is also increasing. Dynamic topic modeling identifies major topics from the data of the initial period and manages the change and flow of topics by using the topic information of the previous period to derive further topics in subsequent periods. However, it is very difficult to understand and interpret the results of dynamic topic modeling. The results of traditional dynamic topic modeling simply reveal changes in keywords and their rankings, and this information is insufficient to represent how the meaning of the topic has changed. Therefore, in this study, we propose a method to visualize topics by period by reflecting the meaning of keywords in each topic. In addition, we propose a method that can intuitively interpret changes in topics and relationships between or among topics. The detailed method of visualizing topics by period is as follows. In the first step, dynamic topic modeling is implemented to derive the top keywords of each period and their weights from text data. In the second step, we derive vectors of the top keywords of each topic from the pre-trained word embedding model and perform dimension reduction on the extracted vectors. Then, we formulate a semantic vector of each topic by calculating the weighted sum of its keyword vectors, using the topic weight of each keyword. 
In the third step, we visualize the semantic vector of each topic using matplotlib, and analyze the relationship between or among the topics based on the visualized result. The change of a topic can be interpreted in the following manner. From the result of dynamic topic modeling, we identify the top 5 rising keywords and the top 5 descending keywords for each period to show the change of the topic. Many existing topic visualization studies visualize the keywords of each topic, but the approach proposed in this study differs in that it attempts to visualize each topic itself. To evaluate the practical applicability of the proposed methodology, we performed an experiment on 1,847 abstracts of artificial intelligence-related papers, divided into three periods (2016-2017, 2018-2019, 2020-2021). We selected seven topics based on the consistency score, and utilized a pre-trained Word2vec word embedding model trained on Wikipedia, an Internet encyclopedia. Based on the proposed methodology, we generated a semantic vector for each topic. Through this, by reflecting the meaning of keywords, we visualized and interpreted the topics by period. Through these experiments, we confirmed that the rise and fall of the topic weight of a keyword can be used effectively to interpret the semantic change of the corresponding topic and to grasp the relationship among topics. In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by period. The results of this study are meaningful in that they broadened the scope of topic understanding through the visualization of dynamic topic modeling results. 
In addition, the academic contribution can be acknowledged in that it laid the foundation for follow-up studies using various word embeddings and dimensionality reduction techniques to improve the performance of the proposed methodology.
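
The second step described in this abstract, forming a topic's semantic vector from its keyword vectors and topic weights, can be sketched as a plain weighted sum. This is only an illustration under stated assumptions: the function name `topic_semantic_vector`, the 2-D keyword vectors (standing in for embeddings that have already been dimension-reduced), and the weights are all hypothetical, and keywords missing from the embedding table are simply skipped.

```python
def topic_semantic_vector(keyword_weights, embeddings):
    """Weighted sum of keyword vectors, using each keyword's topic weight.

    keyword_weights: dict mapping keyword -> topic weight
    embeddings: non-empty dict mapping keyword -> list of floats (same length)
    """
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    for keyword, weight in keyword_weights.items():
        if keyword not in embeddings:
            continue  # assumption: ignore keywords without a vector
        for i, component in enumerate(embeddings[keyword]):
            vector[i] += weight * component
    return vector

# Hypothetical, already dimension-reduced keyword vectors and topic weights.
embeddings = {"neural": [1.0, 0.0], "network": [0.0, 1.0], "image": [1.0, 1.0]}
topic = {"neural": 0.5, "network": 0.25, "image": 0.25}
vec = topic_semantic_vector(topic, embeddings)
```

Each topic then becomes a single point (`vec`) per period, which could be passed to a plotting library to compare topics across periods.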

Document Summarization using Topic Phrase Extraction and Query-based Summarization (주제어구 추출과 질의어 기반 요약을 이용한 문서 요약)

  • 한광록;오삼권;임기욱
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.4
    • /
    • pp.488-497
    • /
    • 2004
  • This paper describes a hybrid document summarization method that uses indicative summarization and query-based summarization. Learning models are built from training documents in order to extract topic phrases. We use Naive Bayesian, Decision Tree, and Support Vector Machine as the machine learning algorithms. The system extracts topic phrases automatically from a new document based on these models and outputs a summary of the document using query-based summarization, which treats the extracted topic phrases as queries and calculates the locality-based similarity of each topic phrase. We examine how the topic phrases affect the summarization and how many phrases are appropriate for summarization. We then evaluate the extracted summary by comparing it with a manual summary, and we also compare our summarization system with the summarization method of MS-Word.

A Design of Topic-map based Traditional literature's Digital Ontology (토픽맵 기반의 고전문학 디지털 콘텐츠 온톨로지 설계)

  • Kim, Dong-Gun;Jeong, Hwa-Young
    • Journal of Advanced Navigation Technology
    • /
    • v.16 no.4
    • /
    • pp.673-678
    • /
    • 2012
  • Researchers of traditional culture have attempted to make it publicly usable in various ways, for example by designing digital archives and digital contents. Despite this effort, public use of traditional culture remains difficult, because traditional culture is hard to understand and attracts less interest than other areas. In particular, there is no environment in which users can search for and use information on traditional culture, because its data and layers are difficult to search. We propose a design for building an ontology using an information profile for digital contents of traditional culture. We also use a topic map for the relation factors of the ontology and specify their relations using a topic vector.

Mobile Device and Virtual Storage-Based Approach to Automatically and Pervasively Acquire Knowledge in Dialogues (모바일 기기와 가상 스토리지 기술을 적용한 자동적 및 편재적 음성형 지식 획득)

  • Yoo, Kee-Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.1-17
    • /
    • 2012
  • The Smartphone, one of the essential mobile devices in wide use today, can be applied very effectively to capture knowledge on the spot when combined with the pervasive functionality of cloud computing. The process of knowledge capturing can also be effectively automated if the topic of knowledge is automatically identified. Therefore, this paper suggests an interdisciplinary approach to automatically acquire knowledge on the spot by combining text mining-based topic identification with a cloud computing-based Smartphone. The Smartphone is used not only as the recorder of the knowledge possessor's dialogue, which serves as the knowledge source, but also as the sensor that collects the knowledge possessor's context data, which characterize the specific situation surrounding him or her. The support vector machine, a well-known and high-performing text mining algorithm, is applied to extract the topic of knowledge. By relating the topic and context data, a business rule can be formulated, and by aggregating the rule, the topic, the context data, and the dictated dialogue, a set of knowledge is automatically acquired.

The method to Apply User Preference for On-line Shopping Mall: A Topic Map approach (온라인 쇼핑몰에서 사용자 선호도 적용 방법: 토픽맵 적용)

  • Jeong, Hwa-Young;Kim, Yoon-Ho
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.5
    • /
    • pp.925-930
    • /
    • 2011
  • In this paper, we propose a method to apply the purchase preference of a user in an on-line shopping mall. To analyze the preference, we use a topic preference vector. The topic is the purchase count of products. In this structure, we construct an association among four factors: Purchase Hit, meaning the purchase count of a product; Count, meaning the purchase count by other users of the product of interest; Preference, meaning product preference; and Product, meaning information on the product. With this structure and method, we show that the proposed method effectively displays products according to user preference.

Personalized Topic map Ranking Algorithm using the User Profile (사용자 프로파일을 이용한 개인화된 토픽맵 랭킹 알고리즘)

  • Park, Jung-Woo;Lee, Sang-Hoon
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.8
    • /
    • pp.522-528
    • /
    • 2008
  • Topic maps typically provide information to users through the selection of topics, that is, using only the topics, associations, and occurrences of the initial topic map built by a domain expert, without regard to individual interests or context. To compensate for this weakness in providing personalized topic map information, personalization has been studied to support user preferences through customization presets, filtering, scope, and similar mechanisms in topic maps. Nevertheless, personalization in current topic maps is still insufficient for users. In this paper, we propose the design of PTRS (personalized topicmap ranking system) and its algorithm, which use both the user profile (click-through data) and the basic elements of a topic map (topic, association) on the knowledge layer of a domain-specific topic map. With PTRS, users benefit from ranked topic map information that reflects their preferences.

Method of Extracting the Topic Sentence Considering Sentence Importance based on ELMo Embedding (ELMo 임베딩 기반 문장 중요도를 고려한 중심 문장 추출 방법)

  • Kim, Eun Hee;Lim, Myung Jin;Shin, Ju Hyun
    • Smart Media Journal
    • /
    • v.10 no.1
    • /
    • pp.39-46
    • /
    • 2021
  • This study presents a method of extracting a summary from a news article in consideration of the importance of each sentence constituting the article. We propose a method of calculating sentence importance by extracting, as features that affect it, the topic sentence probability, the similarity with the article title and with other sentences, and the sentence position. We hypothesize that a topic sentence has characteristics distinct from those of a general sentence, and train a deep learning-based classification model to obtain a topic sentence probability value for the input sentence. In addition, using the pre-trained ELMo language model, the similarity between sentences is calculated from sentence vectors that reflect context information and is extracted as a sentence feature. The topic sentence classification of the LSTM and BERT models achieved 93% accuracy, 96.22% recall, and 89.5% precision. When the importance of each sentence was calculated by combining the extracted sentence features, the performance of extracting the topic sentence improved by about 10% compared to the existing TextRank algorithm.
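
Combining the sentence features named in this abstract into one importance score could look roughly like the sketch below. The abstract does not give the combination rule, so the linear weighting, the default weights, the linear position score, and all feature values are assumptions made for illustration only; in the paper the topic sentence probability would come from a trained classifier and the similarities from ELMo sentence vectors.

```python
def position_score(index, total):
    """Assumed position feature: 1.0 for the first sentence, decreasing linearly."""
    if total <= 0 or not 0 <= index < total:
        raise ValueError("index must satisfy 0 <= index < total")
    return 1.0 - index / total

def sentence_importance(topic_prob, title_sim, peer_sim, position,
                        weights=(0.4, 0.3, 0.2, 0.1)):
    """Linear combination of the four features (the weights are hypothetical)."""
    features = (topic_prob, title_sim, peer_sim, position)
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical per-sentence feature values for a three-sentence article.
sentences = [
    {"topic_prob": 0.9, "title_sim": 0.8, "peer_sim": 0.6},
    {"topic_prob": 0.2, "title_sim": 0.3, "peer_sim": 0.5},
    {"topic_prob": 0.4, "title_sim": 0.5, "peer_sim": 0.4},
]

scores = [
    sentence_importance(
        s["topic_prob"], s["title_sim"], s["peer_sim"],
        position_score(i, len(sentences)),
    )
    for i, s in enumerate(sentences)
]

# Index of the sentence with the highest importance score.
best_index = max(range(len(scores)), key=scores.__getitem__)
```

With these made-up values the first sentence scores highest, so `best_index` points at it; a different weighting would change the ranking.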