• Title/Summary/Keyword: topic extraction

Search Result 123, Processing Time 0.059 seconds

Recovery of the connection relationship among planar objects

  • Yao, Fenghui;Shao, Guifeng;T amaki, Akikazu;Kato, Kiyoshi
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1996.10a
    • /
    • pp.430-433
    • /
    • 1996
  • The shape of an object plays a very important role in pattern analysis and classification. Roughly, the researches on this topic can be classified into three fields, i.e. (i) edge detection, (ii) dominant points extraction, and (iii) shape recognition and classification. Many works have been done in these three fields. However, it is very seldom to see the research that discusses the connection relationship of objects. This problem is very important in robot assembly systems. Therefore, here we focus on this problem and discuss how to recover the connection relationship of planar objects. Our method is based on the partial curve identification algorithm. The experiment results show the efficiency and validity of this method.

  • PDF

Illumination-Robust Foreground Extraction for Text Area Detection in Outdoor Environment

  • Lee, Jun;Park, Jeong-Sik;Hong, Chung-Pyo;Seo, Yong-Ho
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.1
    • /
    • pp.345-359
    • /
    • 2017
  • Optical Character Recognition (OCR) that has been a main research topic of computer vision and artificial intelligence now extend its applications to detection of text area from video or image contents taken by camera devices and retrieval of text information from the area. This paper aims to implement a binarization algorithm that removes user intervention and provides robust performance to outdoor lights by using TopHat algorithm and channel transformation technique. In this study, we particularly concentrate on text information of outdoor signboards and validate our proposed technique using those data.

A Study on Automatic Analysis System of National Defense Articles (국방 기사 자동 분석 시스템 구축 방안 연구)

  • Kim, Hyunjung;Kim, Wooju
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.21 no.1
    • /
    • pp.86-93
    • /
    • 2018
  • Since media articles, which have a great influence on public opinion, are transmitted to the public through various media, it is very difficult to analyze them manually. There are many discussions on methods that can collect, process, and analyze documents in the academia, but this is mostly done in the areas related to politics and stocks, and national-defense articles are poorly researched. In this study, we will explain how to build an automatic analysis system of national defense articles that can collect information on defense articles automatically, and can process information quickly by using topic modeling with LDA, emotional analysis, and extraction-based text summarization.

Segmentation and Classification of Lidar data

  • Tseng, Yi-Hsing;Wang, Miao
    • Proceedings of the KSRS Conference
    • /
    • 2003.11a
    • /
    • pp.153-155
    • /
    • 2003
  • Laser scanning has become a viable technique for the collection of a large amount of accurate 3D point data densely distributed on the scanned object surface. The inherent 3D nature of the sub-randomly distributed point cloud provides abundant spatial information. To explore valuable spatial information from laser scanned data becomes an active research topic, for instance extracting digital elevation model, building models, and vegetation volumes. The sub-randomly distributed point cloud should be segmented and classified before the extraction of spatial information. This paper investigates some exist segmentation methods, and then proposes an octree-based split-and-merge segmentation method to divide lidar data into clusters belonging to 3D planes. Therefore, the classification of lidar data can be performed based on the derived attributes of extracted 3D planes. The test results of both ground and airborne lidar data show the potential of applying this method to extract spatial features from lidar data.

  • PDF

Walking Features Detection for Human Recognition

  • Viet, Nguyen Anh;Lee, Eung-Joo
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.6
    • /
    • pp.787-795
    • /
    • 2008
  • Human recognition on camera is an interesting topic in computer vision. While fingerprint and face recognition have been become common, gait is considered as a new biometric feature for distance recognition. In this paper, we propose a gait recognition algorithm based on the knee angle, 2 feet distance, walking velocity and head direction of a person who appear in camera view on one gait cycle. The background subtraction method firstly use for binary moving object extraction and then base on it we continue detect the leg region, head region and get gait features (leg angle, leg swing amplitude). Another feature, walking speed, also can be detected after a gait cycle finished. And then, we compute the errors between calculated features and stored features for recognition. This method gives good results when we performed testing using indoor and outdoor landscape in both lateral, oblique view.

  • PDF

Keyword identifications on dimensions for service quality of Healthcare providers (헬스케어 서비스 리뷰를 활용한 서비스 품질 차원 별 중요 단어 파악 방안)

  • Lee, Hong Joo
    • Knowledge Management Research
    • /
    • v.19 no.4
    • /
    • pp.171-185
    • /
    • 2018
  • Studies on online review have carried out analysis of the rating and topic as a whole. However, it is necessary to analyze opinions on various dimensions of service quality. This study classifies reviews of healthcare services into service quality dimensions, and proposes a method to identify words that are mainly referred to in each dimension. Service quality was based on the dimensions provided by SERVQUAL, and patient reviews have collected from NHSChoice. The 2,000 sentences sampled were classified into service quality dimension of SERVQUAL and a method of extracting important keywords from sentences by service quality dimension was suggested. The RAKE algorithm is used to extract key words from a single document and an index is considered to consider frequently used words in various documents. Since we need to identify key words in various reviews, we have considered frequency and discrimination (IDF) at the same time, rather than identifying key words based only on the RAKE score. In SERVQUAL dimension, we identified the words that patients mentioned mainly, and also identified the words that patients mainly refer to by review rating.

Modeling Topic Extraction-based Sentiment Analysis Based on User Reviews

  • Kim, Tae-Yeun
    • Journal of Integrative Natural Science
    • /
    • v.14 no.2
    • /
    • pp.35-40
    • /
    • 2021
  • In this paper, we proposed a multi-subject-level sentiment analysis model for user reviews using the Latent Dirichlet Allocation (LDA) method targeting user-generated content (UGC). Data were collected from users' online reviews of hotels in major tourist cities in the world, and 30 hotel-related topics were extracted using the entire user reviews through the LDA technique. Six major hotel-related themes (Cleanliness, Location, Rooms, Service, Sleep Quality, and Value) were selected from the extracted themes, and emotions were evaluated for sentences corresponding to six themes in each user review in the proposed sentiment analysis model. Sentiment was analyzed using a dictionary. In addition, the performance of the proposed sentiment analysis model was evaluated by comparing the emotional values for each subject in the user reviews and the detailed scores evaluated by the user directly for each hotel attribute. As a result of analyzing the values of accuracy and recall of the proposed sentiment analysis model, it was analyzed that the efficiency was high.

A System for Keyword Extraction and Keyword-based Sentiment Analysis for Topic Analysis in Discussion (토론 대화에서의 토픽 분석을 위한 키워드 추출 및 키워드 기반 감성분석 시스템)

  • Yong-Bin Jeong;Yu-Jin Oh;Jae-Wan Park;Sae-Mi Jang;Young-Gyun Hahm
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.164-169
    • /
    • 2022
  • 토픽 모델링은 비즈니스 분석이나 기술 동향 파악 등 다방면에서 많이 사용되고 있는 기술이다. 하지만 대표적인 방법인 LDA와 같은 비지도학습의 경우, 그 알고리즘 구조상 문서의 수가 많을 때 토픽 모델링이 가능하다. 본 논문에서는 문서의 수가 적은 경우도, 키워드 및 키프레이즈를 이용한 군집화를 통해 토픽 모델링을 하고 감성분석을 통해 토픽에 대한 분석도 제시하였다. 이에 필요한 데이터 제작 및 키워드 추출, 키워드 기반 감성분석, 키워드 임베딩 및 군집화를 구현하였고, 결과를 정성적으로 보았을 때 유의미한 분석이 되는 것을 확인하였다.

  • PDF

Keyword Extraction Technique for Attractions using Online Reviews - Topic Modeling and Markov Chain (온라인 리뷰를 활용한 관광지 키워드 추출 기법 - 토픽 모델링과 Markov Chain)

  • Kim, MyeongSeon;Lee, KangWoo;Lim, JiWon;Hong, Soon-Goo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.521-523
    • /
    • 2021
  • 관광 분야에서 온라인 리뷰의 중요성이 커지고 있다. 온라인 리뷰의 텍스트 데이터는 파악이 어렵다. 이에 본 연구에서는 특정 관광지에 대한 온라인 리뷰 텍스트 데이터가 나타내는 전반적인 의견을 직관적으로 도출하는 방법에 대해 알아보고자, 토픽 모델링과 Markov Chain을 시행했다. '해운대'에 대한 온라인 리뷰를 수집한 후, LDA와 BTM을 활용하여 주제를 도출하고, Markov Chain을 시각화하여 키워드 간의 관계와 전체적인 평가 내용을 확인했다. 사용된 기법은 각자 특징적인 결과를 제시했기 때문에 다양한 기법을 상보적으로 이용하기를 제안하였다.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) have been published. The rapid increase in the number of papers related to COVID-19 is putting time and technical constraints on healthcare professionals and policy makers to quickly find important research. Therefore, in this study, we propose a method of extracting useful information from text data of extensive literature using LDA and Word2vec algorithm. Papers related to keywords to be searched were extracted from papers related to COVID-19, and detailed topics were identified. The data used the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic, updated weekly. The research methods are divided into two main categories. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of publications related to COVID-19 by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. LDA and Word2vec algorithm were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, and a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment' were extracted. did For each collected paper, detailed topics were analyzed using LDA and Word2vec algorithms, and a clustering method through PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy point from the results of this study is that the topics that were not derived from the topics derived for all papers being researched in relation to COVID-19 (

    ) were the topic modeling results for each research topic (
    ) was found to be derived from For example, as a result of topic modeling for papers related to 'vaccine', a new topic titled Topic 05 'neutralizing antibodies' was extracted. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, as a result of extracting topics from papers related to 'treatment', a new topic called Topic 05 'cytokine' was discovered. A cytokine storm is when the immune cells of our body do not defend against attacks, but attack normal cells. Hidden topics that could not be found for the entire thesis were classified according to keywords, and topic modeling was performed to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using the Skip-gram method that predicts the similar words as the central word among the Word2vec models. The combination of the LDA model and the Word2vec model tried to show better performance by identifying the relationship between the document and the LDA subject and the relationship between the Word2vec document. In addition, as a clustering method through PCA dimension reduction, a method for intuitively classifying documents by using the t-SNE technique to classify documents with similar themes and forming groups into a structured organization of documents was presented. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of academic papers related to COVID-19, it will reduce the precious time and effort of healthcare professionals and policy makers, and rapidly gain new insights. We hope to help you get It is also expected to be used as basic data for researchers to explore new research directions.


  • (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.