• Title/Summary/Keyword: topic

Search Result 4,580, Processing Time 0.042 seconds

Topic Masks for Image Segmentation

  • Jeong, Young-Seob;Lim, Chae-Gyun;Jeong, Byeong-Soo;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.12
    • /
    • pp.3274-3292
    • /
    • 2013
  • Unsupervised methods for image segmentation are recently drawing attention because most images do not have labels or tags. A topic model is such an unsupervised probabilistic method that captures latent aspects of data, where each latent aspect, or a topic, is associated with one homogeneous region. The results of topic models, however, usually have noises, which decreases the overall segmentation performance. In this paper, to improve the performance of image segmentation using topic models, we propose two topic masks applicable to topic assignments of homogeneous regions obtained from topic models. The topic masks capture the noises among the assigned topic assignments or topic labels, and remove the noises by replacements, just like image masks for pixels. However, as the nature of topic assignments is different from image pixels, the topic masks have properties that are different from the existing image masks for pixels. There are two contributions of this paper. First, the topic masks can be used to reduce the noises of topic assignments obtained from topic models for image segmentation tasks. Second, we test the effectiveness of the topic masks by applying them to segmented images obtained from the Latent Dirichlet Allocation model and the Spatial Latent Dirichlet Allocation model upon the MSRC image dataset. The empirical results show that one of the masks successfully reduces the topic noises.

A Study on the Research Topics and Trends in South Korea: Focusing on Particulate Matter (토픽모델링을 이용한 국내 미세먼지 연구 분류 및 연구동향 분석)

  • Park, Hyemin;Kim, Taeyong;Kwon, Daewoong;Heo, Junyong;Lee, Juyeon;Yang, Minjune
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.5_3
    • /
    • pp.873-885
    • /
    • 2022
  • The particulate matter (PM) has emerged as a hot topic around the world as it has been reported that PM is related to an increase in mortality and prevalence rates. In South Korea, the importance of PM has been recognized since the late 1990s, and various studies on PM have been conducted. This study investigated the PM research topics and trends for papers (D=2,764) published in Research Information Sharing Service (RISS) using topic modeling based on Latent Dirichlet Allocation (LDA). As a result, a total of 10 topics were identified in the whole papers, and the PM research topics were classified as 'PM reduction (Topic 1)', 'Government policy and management (Topic 2)', 'Characteristics of PM (Topic 3)', 'PM model (Topic 4)', 'Environmental education (Topic 5)', 'Bio (Topic 6)', 'Traffic (Topic 7)', 'Asian dust (Topic 8)', 'Indoor PM (Topic 9)', 'Human risk (Topic 10)'. In particular, the proportion of papers on topics 'Government policy and management (Topic 2)', 'PM model (Topic 4)', 'Environmental education (Topic 5)', and 'Bio (Topic 6)' to the toal number of papers increased over time (linear slope > 0). The results of this study provide the new literature review methodology related to particulate matter and the history and insight.

An Implementation of FRBR Model by Using Topic Maps (Topic Maps를 이용한 MARC데이터의 FRBR모델 구현에 관한 연구)

  • Lee, Hyun-Sil;Han, Sung-Kook
    • Journal of the Korean Society for information Management
    • /
    • v.22 no.3 s.57
    • /
    • pp.289-306
    • /
    • 2005
  • As FRBR defines structural framework based on ER modeling for bibliographic data elements, an effective tool is required to implement FRBR model. In this paper, we present the implementation of FRBR model based on Topic Maps. To show the effectiveness of Topic Maps as the implantation language of FRBR, we implement FRBR model of MyongSungHwangHu by means of Topic Maps. We can ascertain that topic-association of Topic Maps conceptually harmonize with entity-relation of FRBR, which means that Topic Maps is suitable for the implementation of FRBR model.

A Method of Calculating Topic Keywords for Topic Labeling (토픽 레이블링을 위한 토픽 키워드 산출 방법)

  • Kim, Eunhoe;Suh, Yuhwa
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.16 no.3
    • /
    • pp.25-36
    • /
    • 2020
  • Topics calculated using LDA topic modeling have to be labeled separately. When labeling a topic, we look at the words that represent the topic, and label the topic. Therefore, it is important to first make a good set of words that represent the topic. This paper proposes a method of calculating a set of words representing a topic using TextRank, which extracts the keywords of a document. The proposed method uses Relevance to select words related to the topic with discrimination. It extracts topic keywords using the TextRank algorithm and connects keywords with a high frequency of simultaneous occurrence to express the topic with a higher coverage.

The Influence of Topic Exploration and Topic Relevance On Amplitudes of Endogenous ERP Components in Real-Time Video Watching (실시간 동영상 시청시 주제탐색조건과 주제관련성이 내재적 유발전위 활성에 미치는 영향)

  • Kim, Yong Ho;Kim, Hyun Hee
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.8
    • /
    • pp.874-886
    • /
    • 2019
  • To delve into the semantic gap problem of the automatic video summarization, we focused on an endogenous ERP responses at around 400ms and 600ms after the on-set of audio-visual stimulus. Our experiment included two factors: the topic exploration of experimental conditions (Topic Given vs. Topic Exploring) as a between-subject factor and the topic relevance of the shots (Topic-Relevant vs. Topic-Irrelevant) as a within-subject factor. For the Topic Given condition of 22 subjects, 6 short historical documentaries were shown with their video titles and written summaries, while in the Topic Exploring condition of 25 subjects, they were asked instead to explore topics of the same videos with no given information. EEG data were gathered while they were watching videos in real time. It was hypothesized that the cognitive activities to explore topics of videos while watching individual shots increase the amplitude of endogenous ERP at around 600 ms after the onset of topic relevant shots. The amplitude of endogenous ERP at around 400ms after the onset of topic-irrelevant shots was hypothesized to be lower in the Topic Given condition than that in the Topic Exploring condition. The repeated measure MANOVA test revealed that two hypotheses were acceptable.

A method of Multi-Layer Visualizations for XTM (XTM을 위한 다층적 시각화 방법)

  • 박영조;박호병;조용윤;유재우
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.10b
    • /
    • pp.529-531
    • /
    • 2004
  • 웹 상에는 많은 자원들과 정보들이 존재한다. XML은 이러한 자원들과 정보들을 구조화하기 위해서 개발되었다. XTM(XML Topic Maps)은 XML의 형태로 자원들과 정보들에 의미를 부여할 수 있는 언어이다. XTM은 Topic과 Association을 이용해서 자원들과 정보들이 가진 의미를 표현한다 XTM상에서 나타나는 Topic과 Association은 매우 거대하고 다양하기 때문에 모든 Topic과 Association을 한꺼번에 표현하기 어렵다 또한, 사용자가 수백만개의 Topic과 Association에서 원하는 Topic과 Association을 찾기 어렵다. 따라서 이러한 문제점을 해결하기 위해서 다양한 시각화 방법이 연구되었다. 현재 Topic Maps을 표현할 때 트리, 그래프, 맵 등 하나의 구조를 이용해서 표현한다. 하지만 추상화정도에 따라 시각화 방법은 장ㆍ단점을 지닌다. 본 논문에서는 웹 상의 자원, 정보들과 의미 사이에 여러 계층이 존재하는 다층적 시각화를 제안한다. 각 계층은 독립적인 표현구조로 나타내어 추상화정도에 따라 최적화된 구조를 사용한다. 사용자는 자신이 원하는 Topic과 Association을 점진적 접근을 통해서 원하는 Topic과 Association을 검색할 수 있다. 또한 Topic이 Association의 member처럼 사용되는 경우, 시각적으로 Topic이 표현되면 Topic은 연결된 Association과 직접적인 연결을 갖는다.

  • PDF

A Study on Research Trend Analysis and Topic Class Prediction of Digital Transformation using Text Mining

  • Lee, JeeYoung
    • International journal of advanced smart convergence
    • /
    • v.8 no.2
    • /
    • pp.183-190
    • /
    • 2019
  • In the era of the Fourth Industrial Revolution, digital transformation, which means changes in all industrial structures, politics, economics and society as well as IT technology, is an important issue. It is difficult to know which research topic is being studied because digital transformation is being studied in various fields. Convergence research is possible because a research topic is studied in various fields such as computer science area and Decision science area. However, it is difficult to know the specific research status of the research topic. In this study, eight research topics were derived using the topic modeling technique of text mining for abstract of academic literature and the trend of each topic was analyzed. We also proposed to create a Topic-Word Proportions Table in the LDA based Topic modeling process to predict the topic of new literature. The results of this study are expected to contribute to advanced convergence research on topic of digital transformation. It is expected that the literature related to each research topic will be grasped and contribute to the design of a new convergence research.

Keyword Reorganization Techniques for Improving the Identifiability of Topics (토픽 식별성 향상을 위한 키워드 재구성 기법)

  • Yun, Yeoil;Kim, Namgyu
    • Journal of Information Technology Services
    • /
    • v.18 no.4
    • /
    • pp.135-149
    • /
    • 2019
  • Recently, there are many researches for extracting meaningful information from large amount of text data. Among various applications to extract information from text, topic modeling which express latent topics as a group of keywords is mainly used. Topic modeling presents several topic keywords by term/topic weight and the quality of those keywords are usually evaluated through coherence which implies the similarity of those keywords. However, the topic quality evaluation method based only on the similarity of keywords has its limitations because it is difficult to describe the content of a topic accurately enough with just a set of similar words. In this research, therefore, we propose topic keywords reorganizing method to improve the identifiability of topics. To reorganize topic keywords, each document first needs to be labeled with one representative topic which can be extracted from traditional topic modeling. After that, classification rules for classifying each document into a corresponding label are generated, and new topic keywords are extracted based on the classification rules. To evaluated the performance our method, we performed an experiment on 1,000 news articles. From the experiment, we confirmed that the keywords extracted from our proposed method have better identifiability than traditional topic keywords.

Brand Personality of Global Automakers through Text Mining

  • Kim, Sungkuk
    • Journal of Korea Trade
    • /
    • v.25 no.2
    • /
    • pp.22-45
    • /
    • 2021
  • Purpose - This study aims to identify new attributes by analyzing reviews conducted by global automaker customers and to examine the influence of these attributes on satisfaction ratings in the U.S. automobile sales market. The present study used J.D. Power for customer responses, which is the largest online review site in the USA. Design/methodology - Automobile customer reviews are valid data available to analyze the brand personality of the automaker. This study collected 2,998 survey responses from automobile companies in the U.S. automobile sales market. Keyword analysis, topic modeling, and the multiple regression analysis were used to analyze the data. Findings - Using topic modeling, the author analyzed 2,998 responses of the U.S. automobile brands. As a result, Topic 1 (Competence), Topic 5 (Sincerity), and Topic 6 (Prestige) attributes had positive effects, and Topic 2 (Sophistication) had a negative effect on overall customer responses. Topic 4 (Conspicuousness) did not have any statistical effect on this research. Topic 1, Topic 5, and Topic 6 factors also show the importance of buying factors. This present study has contributed to identifying a new attribute, personality. These findings will help global automakers better understand the impacts of Topic 1, Topic 5, and Topic 6 on purchasing a car. Originality/value - Contrary to a traditional approach to brand analysis using questionnaire survey methods, this study analyzed customer reviews using text mining. This study is timely research since a big data analysis is employed in order to identify direct responses to customers in the future.

Semantic Visualization of Dynamic Topic Modeling (다이내믹 토픽 모델링의 의미적 시각화 방법론)

  • Yeon, Jinwook;Boo, Hyunkyung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.131-154
    • /
    • 2022
  • Recently, researches on unstructured data analysis have been actively conducted with the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics from massive text data. In the early stages of topic modeling, most studies focused only on topic discovery. As the topic modeling field matured, studies on the change of the topic according to the change of time began to be carried out. Accordingly, interest in dynamic topic modeling that handle changes in keywords constituting the topic is also increasing. Dynamic topic modeling identifies major topics from the data of the initial period and manages the change and flow of topics in a way that utilizes topic information of the previous period to derive further topics in subsequent periods. However, it is very difficult to understand and interpret the results of dynamic topic modeling. The results of traditional dynamic topic modeling simply reveal changes in keywords and their rankings. However, this information is insufficient to represent how the meaning of the topic has changed. Therefore, in this study, we propose a method to visualize topics by period by reflecting the meaning of keywords in each topic. In addition, we propose a method that can intuitively interpret changes in topics and relationships between or among topics. The detailed method of visualizing topics by period is as follows. In the first step, dynamic topic modeling is implemented to derive the top keywords of each period and their weight from text data. In the second step, we derive vectors of top keywords of each topic from the pre-trained word embedding model. Then, we perform dimension reduction for the extracted vectors. Then, we formulate a semantic vector of each topic by calculating weight sum of keywords in each vector using topic weight of each keyword. In the third step, we visualize the semantic vector of each topic using matplotlib, and analyze the relationship between or among the topics based on the visualized result. The change of topic can be interpreted in the following manners. From the result of dynamic topic modeling, we identify rising top 5 keywords and descending top 5 keywords for each period to show the change of the topic. Existing many topic visualization studies usually visualize keywords of each topic, but our approach proposed in this study differs from previous studies in that it attempts to visualize each topic itself. To evaluate the practical applicability of the proposed methodology, we performed an experiment on 1,847 abstracts of artificial intelligence-related papers. The experiment was performed by dividing abstracts of artificial intelligence-related papers into three periods (2016-2017, 2018-2019, 2020-2021). We selected seven topics based on the consistency score, and utilized the pre-trained word embedding model of Word2vec trained with 'Wikipedia', an Internet encyclopedia. Based on the proposed methodology, we generated a semantic vector for each topic. Through this, by reflecting the meaning of keywords, we visualized and interpreted the themes by period. Through these experiments, we confirmed that the rising and descending of the topic weight of a keyword can be usefully used to interpret the semantic change of the corresponding topic and to grasp the relationship among topics. In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by era. The results of this study are meaningful in that they broadened the scope of topic understanding through the visualization of dynamic topic modeling results. In addition, the academic contribution can be acknowledged in that it laid the foundation for follow-up studies using various word embeddings and dimensionality reduction techniques to improve the performance of the proposed methodology.