• Title/Summary/Keyword: document topic

Search Result 182, Processing Time 0.033 seconds

Document Clustering Using Reference Titles (인용문헌 표제를 이용한 문헌 클러스터링에 관한 연구)

  • Choi, Sang-Hee
    • Journal of the Korean Society for information Management
    • /
    • v.27 no.2
    • /
    • pp.241-252
    • /
    • 2010
  • Titles have been regarded as having effective clustering features, but they sometimes fail to represent the topic of a document and result in poorly generated document clusters. This study aims to improve the performance of document clustering with titles by suggesting titles in the citation bibliography as a clustering feature. Titles of original literature, titles in the citation bibliography, and an aggregation of both titles were adapted to measure the performance of clustering. Each feature was combined with three hierarchical clustering methods, within group average linkage, complete linkage, and Ward's method in the clustering experiment. The best practice case of this experiment was clustering document with features from both titles by within-groups average method.

A Study on Analysis of Topic Modeling using Customer Reviews based on Sharing Economy: Focusing on Sharing Parking (공유경제 기반의 고객리뷰를 이용한 토픽모델링 분석: 공유주차를 중심으로)

  • Lee, Taewon
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.25 no.3
    • /
    • pp.39-51
    • /
    • 2020
  • This study will examine the social issues and consumer awareness of sharing parking through the method text mining. In this experiment, the topic by keyword was extracted and analyzed using TFIDF (Term frequency inverse document frequency) and LDA (Latent dirichlet allocation) technique. As a result of categorization by topic, citizens' complaints such as local government agreements, parking space negotiations, parking culture improvement, citizen participation, etc., played an important role in implementing shared parking services. The contribution of this study highly differentiated from previous studies that conducted exploratory studies using corporate and regional cases, and can be said to have a high academic contribution. In addition, based on the results obtained by utilizing the LDA analysis in this study, there is a practical contribution that it can be applied or utilized in establishing a sharing economy policy for revitalizing the local economy.

Analysis of Massive Scholarly Keywords using Inverted-Index based Bottom-up Clustering (역인덱스 기반 상향식 군집화 기법을 이용한 대규모 학술 핵심어 분석)

  • Oh, Heung-Seon;Jung, Yuchul
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.11
    • /
    • pp.758-764
    • /
    • 2018
  • Digital documents such as patents, scholarly papers and research reports have author keywords which summarize the topics of documents. Different documents are likely to describe the same topic if they share the same keywords. Document clustering aims at clustering documents to similar topics with an unsupervised learning method. However, it is difficult to apply to a large amount of documents event though the document clustering is utilized to in various data analysis due to computational complexity. In this case, we can cluster and connect massive documents using keywords efficiently. Existing bottom-up hierarchical clustering requires huge computation and time complexity for clustering a large number of keywords. This paper proposes an inverted index based bottom-up clustering for keywords and analyzes the results of clustering with massive keywords extracted from scholarly papers and research reports.

Multi-Document Summarization Method of Reviews Using Word Embedding Clustering (워드 임베딩 클러스터링을 활용한 리뷰 다중문서 요약기법)

  • Lee, Pil Won;Hwang, Yun Young;Choi, Jong Seok;Shin, Young Tae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.11
    • /
    • pp.535-540
    • /
    • 2021
  • Multi-document refers to a document consisting of various topics, not a single topic, and a typical example is online reviews. There have been several attempts to summarize online reviews because of their vast amounts of information. However, collective summarization of reviews through existing summary models creates a problem of losing the various topics that make up the reviews. Therefore, in this paper, we present method to summarize the review with minimal loss of the topic. The proposed method classify reviews through processes such as preprocessing, importance evaluation, embedding substitution using BERT, and embedding clustering. Furthermore, the classified sentences generate the final summary using the trained Transformer summary model. The performance evaluation of the proposed model was compared by evaluating the existing summary model, seq2seq model, and the cosine similarity with the ROUGE score, and performed a high performance summary compared to the existing summary model.

Semantic Topic Selection Method of Document for Classification (문서분류를 위한 의미적 주제선정방법)

  • Ko, kwang-Sup;Kim, Pan-Koo;Lee, Chang-Hoon;Hwang, Myung-Gwon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.1
    • /
    • pp.163-172
    • /
    • 2007
  • The web as global network includes text document, video, sound, etc and connects each distributed information using link Through development of web, it accumulates abundant information and the main is text based documents. Most of user use the web to retrieve information what they want. So, numerous researches have progressed to retrieve the text documents using the many methods, such as probability, statistics, vector similarity, Bayesian, and so on. These researches however, could not consider both the subject and the semantics of documents. As a result user have to find by their hand again. Especially, it is more hard to find the korean document because the researches of korean document classification is insufficient. So, to overcome the previous problems, we propose the korean document classification method for semantic retrieval. This method firstly, extracts TF value and RV value of concepts that is included in document, and maps into U-WIN that is korean vocabulary dictionary to select the topic of document. This method is possible to classify the document semantically and showed the efficiency through experiment.

Extension and Case Analysis of Topic Modeling for Inductive Social Science Research Methodology (귀납적 사회과학연구 방법론을 위한 토픽모델링의 확장 및 사례분석)

  • Kim, Keun Hyung
    • The Journal of Information Systems
    • /
    • v.31 no.4
    • /
    • pp.25-45
    • /
    • 2022
  • Purpose In this paper, we propose the method to extend topic modeling techniques in order to derive data-based research hypotheses when establishing research hypotheses for social sciences, As a concept in contrast to the existing deductive hypothesis establishment methodology for the social science research, the topic modeling technique was expanded to enable the so-called inductive hypothesis establishment methodology, and an analysis case of the Seongsan Ilchulbong online review based on the proposed methodology was presented. Design/methodology/approach In this paper, an extension architecture and extension algorithm in the form of extending the existing topic modeling were proposed. The extended architecture and algorithm include data processing method based on topic ratio in document, correlation analysis and regression analysis of processed data for topics derived by existing topic modeling. In addition, in this paper, an analysis case of the online review of Seongsan Ilchulbong Peak was presented by applying the extended topic modeling algorithm. An exploratory analysis was performed on the Seongsan Ilchulbong online reviews through the basic text analysis. The data was transformed into 5-point scale to enable correlation and regression analysis based on the topic ratio in each online review. A regression analysis was performed using the derived topics as the independent variable and the review rating as the dependent variable, and hypotheses could be derived based on this, which enable the so-called inductive hypothesis establishment. Findings This paper is meaningful in that it confirmed the possibility of deriving a causal model and setting an inductive hypothesis through an extended analysis of topic modeling.

Phrase-based Topic and Sentiment Detection and Tracking Model using Incremental HDP

  • Chen, YongHeng;Lin, YaoJin;Zuo, WanLi
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.12
    • /
    • pp.5905-5926
    • /
    • 2017
  • Sentiments can profoundly affect individual behavior as well as decision-making. Confronted with the ever-increasing amount of review information available online, it is desirable to provide an effective sentiment model to both detect and organize the available information to improve understanding, and to present the information in a more constructive way for consumers. This study developed a unified phrase-based topic and sentiment detection model, combined with a tracking model using incremental hierarchical dirichlet allocation (PTSM_IHDP). This model was proposed to discover the evolutionary trend of topic-based sentiments from online reviews. PTSM_IHDP model firstly assumed that each review document has been composed by a series of independent phrases, which can be represented as both topic information and sentiment information. PTSM_IHDP model secondly depended on an improved time-dependency non-parametric Bayesian model, integrating incremental hierarchical dirichlet allocation, to estimate the optimal number of topics by incrementally building an up-to-date model. To evaluate the effectiveness of our model, we tested our model on a collected dataset, and compared the result with the predictions of traditional models. The results demonstrate the effectiveness and advantages of our model compared to several state-of-the-art methods.

A Study on the Research Trends in Int'l Trade Using Topic modeling (토픽모델링을 활용한 무역분야 연구동향 분석)

  • Jee-Hoon Lee;Jung-Suk Kim
    • Korea Trade Review
    • /
    • v.45 no.3
    • /
    • pp.55-69
    • /
    • 2020
  • This study examines the research trends and knowledge structure of international trade studies using topic modeling method, which is one of the main methodologies of text mining. We collected and analyzed English abstracts of 1,868 papers of three Korean major journals in the area of international trade from 2003 to 2019. We used the Latent Dirichlet Allocation(LDA), an unsupervised machine learning algorithm to extract the latent topics from the large quantity of research abstracts. 20 topics are identified without any prior human judgement. The topics reveal topographical maps of research in international trade and are representative and meaningful in the sense that most of them correspond to previously established sub-topics in trade studies. Then we conducted a regression analysis on the document-topic distributions generated by LDA to identify hot and cold topics. We discovered 2 hot topics(internationalization capacity and performance of export companies, economic effect of trade) and 2 cold topics(exchange rate and current account, trade finance). Trade studies are characterized as a interdisciplinary study of three agendas(i.e. international economy, International Business, trade practice), and 20 topics identified can be grouped into these 3 agendas. From the estimated results of the study, we find that the Korean government's active pursuit of FTA and consequent necessity of capacity building in Korean export firms lie behind the popularity of topic selection by the Korean researchers in the area of int'l trade.

Comparing Social Media and News Articles on Climate Change: Different Viewpoints Revealed

  • Kang Nyeon Lee;Haein Lee;Jang Hyun Kim;Youngsang Kim;Seon Hong Lee
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.11
    • /
    • pp.2966-2986
    • /
    • 2023
  • Climate change is a constant threat to human life, and it is important to understand the public perception of this issue. Previous studies examining climate change have been based on limited survey data. In this study, the authors used big data such as news articles and social media data, within which the authors selected specific keywords related to climate change. Using these natural language data, topic modeling was performed for discourse analysis regarding climate change based on various topics. In addition, before applying topic modeling, sentiment analysis was adjusted to discover the differences between discourses on climate change. Through this approach, discourses of positive and negative tendencies were classified. As a result, it was possible to identify the tendency of each document by extracting key words for the classified discourse. This study aims to prove that topic modeling is a useful methodology for exploring discourse on platforms with big data. Moreover, the reliability of the study was increased by performing topic modeling in consideration of objective indicators (i.e., coherence score, perplexity). Theoretically, based on the social amplification of risk framework (SARF), this study demonstrates that the diffusion of the agenda of climate change in public news media leads to personal anxiety and fear on social media.

Extracting Logical Structure from Web Documents (웹 문서로부터 논리적 구조 추출)

  • Lee Min-Hyung;Lee Kyong-Ho
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.10
    • /
    • pp.1354-1369
    • /
    • 2004
  • This paper presents a logical structure analysis method which transforms Web documents into XML ones. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a logical structure more accurately, the proposed method defines a document model that is able to describe logical structure information of topic-specific document class. Since the proposed method is based on a visual structure from the visual grouping phase as well as a document model that describes logical structure information of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method has performed logical structure analysis successfully compared with previous works. Particularly, the method generates XML documents as the result of structure analysis, so that it enhances the reusability of documents.

  • PDF