• Title/Summary/Keyword: science text (과학 텍스트)

Search Result 607

Text Structuring using Centering Theory (중심화 이론을 이용한 텍스트 구조화)

  • Roh, Ji-Eun;Na, Seung-Hoon;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.34 no.6 / pp.572-583 / 2007
  • This paper investigates Centering-based metrics for evaluating the ordering of utterances in text structuring. We point out a problem with the MIN.NOCB metric, which has been regarded as the simplest and best measure of ordering coherence within the Centering framework, and propose a new Centering-based metric, MAX.CPS, as an alternative or supplement. This paper also introduces a framework that pre-estimates how effective a metric will be on a given input ordering and selects an applicable metric according to the pre-estimation result. Using this framework, we propose a new policy that can generate better orderings within the Centering framework. Moreover, we evaluate several Cf-ranking methods in terms of Centering-based metrics and find that simply ranking entities by their linear order is generally the most suitable, owing to the characteristics of Korean.
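To make the MIN.NOCB idea concrete, here is a minimal Python sketch, assuming each utterance is already reduced to a list of entity mentions in linear order and that this order doubles as the Cf ranking (the ranking the abstract reports as most suitable for Korean). A NOCB transition is counted whenever no entity from the previous utterance's Cf list is realized in the current one, and MIN.NOCB is approximated by brute force over all permutations, which only scales to a handful of utterances. The function names are illustrative, and the sketch does not cover the paper's MAX.CPS metric or its pre-estimation framework.

```python
from itertools import permutations


def backward_center(prev_cf, curr_entities):
    """Cb(Un): the highest-ranked entity of Cf(Un-1) that is realized in Un, or None."""
    for entity in prev_cf:  # prev_cf is already ranked (linear order here)
        if entity in curr_entities:
            return entity
    return None


def nocb_count(ordering):
    """Count NOCB transitions: adjacent pairs where Un has no backward-looking center."""
    return sum(
        1
        for prev, curr in zip(ordering, ordering[1:])
        if backward_center(prev, set(curr)) is None
    )


def min_nocb_ordering(utterances):
    """Brute-force MIN.NOCB: return the permutation with the fewest NOCB transitions."""
    return min(permutations(utterances), key=nocb_count)


# Each utterance is a list of entities in linear (mention) order, used directly as its Cf ranking.
utts = [["John", "store"], ["Mary"], ["John", "Mary"]]
print(nocb_count(utts))  # 1
best = min_nocb_ordering(utts)
print(best, nocb_count(best))  # (['John', 'store'], ['John', 'Mary'], ['Mary']) 0
```

Exhaustive search is used here only for clarity; any practical orderer would need a search strategy over the permutation space.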

A Hypertext Categorization Method using Incrementally Computable Class Link Information (점진적으로 계산되는 분류정보와 링크정보를 이용한 하이퍼텍스트 문서 분류 방법)

  • Oh, Hyo-Jung;Myaeng, Sung-Hyoun
    • Journal of KIISE:Software and Applications / v.29 no.7 / pp.498-509 / 2002
  • As the WWW grows at an ever-increasing pace, classifiers targeted at hypertext are in high demand. While document categorization is quite mature, the question of how to exploit hypertext structure and hyperlinks has remained relatively unexplored. In this paper, we propose a practical method that uses hyperlinks to improve both the speed and the quality of hypertext categorization. Compared with a recently proposed technique that appears to be the only one of its kind, our method improved effectiveness by up to 18.5% while reducing processing time dramatically. Through experiments, we attempt to explain which factors contribute to the improvement.
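The abstract does not describe how link information is combined with class information, so the following is only a hypothetical Python sketch of one common pattern for link-aware categorization: blend a page's own text-classifier scores with the class distribution of linked pages that are already labeled, and update those neighbor counts incrementally as each page is labeled. The class name `LinkAwareClassifier`, the mixing weight `alpha`, and the undirected treatment of links are all assumptions, not the authors' design.

```python
from collections import Counter, defaultdict


class LinkAwareClassifier:
    """Hypothetical link-aware hypertext classifier (not the paper's method).

    Final score per class = (1 - alpha) * text score + alpha * share of already
    labeled linked pages that carry that class. Neighbor class counts are updated
    incrementally whenever a page is labeled or a link is added.
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight of link evidence (assumed value)
        self.labels = {}  # page -> assigned class
        self.links = defaultdict(set)  # page -> linked pages (treated as undirected)
        self.neighbor_counts = defaultdict(Counter)  # page -> class counts of labeled neighbors

    def add_link(self, a, b):
        if b in self.links[a]:
            return  # ignore duplicate links so counts are not inflated
        self.links[a].add(b)
        self.links[b].add(a)
        if a in self.labels:
            self.neighbor_counts[b][self.labels[a]] += 1
        if b in self.labels:
            self.neighbor_counts[a][self.labels[b]] += 1

    def classify(self, page, text_scores):
        """Label `page` once, given per-class scores from any text classifier."""
        counts = self.neighbor_counts[page]
        total = sum(counts.values())

        def combined(cls):
            link_score = counts[cls] / total if total else 0.0
            return (1 - self.alpha) * text_scores[cls] + self.alpha * link_score

        label = max(text_scores, key=combined)
        self.labels[page] = label
        for neighbor in self.links[page]:  # incremental update of neighbors
            self.neighbor_counts[neighbor][label] += 1
        return label


clf = LinkAwareClassifier(alpha=0.5)
clf.add_link("a", "b")
clf.add_link("a", "c")
clf.classify("b", {"sports": 0.9, "news": 0.1})
clf.classify("c", {"sports": 0.8, "news": 0.2})
print(clf.classify("a", {"sports": 0.3, "news": 0.7}))  # sports
```

In the usage example, page `a` leans toward `news` on text alone, but both of its already-labeled neighbors are `sports`, so the link evidence tips the decision. Because neighbor counts are kept up to date as labels arrive, classifying a new page never requires revisiting the whole link graph.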

A Study on the Use of Stopword Corpus for Cleansing Unstructured Text Data (비정형 텍스트 데이터 정제를 위한 불용어 코퍼스의 활용에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.891-897 / 2022
  • In big data analysis, raw text data mostly exists in various unstructured forms, and it becomes structured, analyzable data only after heuristic pre-processing and computer-based post-processing cleansing. In this study, unnecessary elements are removed from the collected raw data during pre-processing so that R's wordcloud, one of the text data analysis techniques, can be applied, and stopwords are removed during post-processing. A word cloud case study was then conducted, which counts how often each word occurs and presents high-frequency words as key issues. To address the problems of the existing stopword-handling method for R word clouds, the "nested stopword source code" method, we propose using a "general stopword corpus" and a "user-defined stopword corpus" and conduct a case analysis. We comparatively verify and present the advantages and disadvantages of the proposed "unstructured data cleansing process model", and present a practical application of word cloud visualization analysis using the proposed "external corpus cleansing technique".
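The paper works with R's wordcloud package; purely as an illustration, the same pre-processing and stopword-removal steps can be sketched in Python. This sketch assumes whitespace tokenization and stopword corpora stored as plain-text files with one word per line; `load_stopwords`, `clean`, and `word_frequencies` are hypothetical helper names, and the resulting frequency list is what a word cloud would then visualize.

```python
import re
from collections import Counter


def load_stopwords(path):
    """Read an external stopword corpus stored as one word per line (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def clean(text):
    """Pre-processing: lowercase, then replace punctuation, digits and underscores with spaces.

    Letters of any script (including Hangul) are kept, since Python's \\w is Unicode-aware.
    """
    return re.sub(r"[^\w\s]|[\d_]", " ", text.lower())


def word_frequencies(docs, general_stopwords, user_stopwords, top_n=10):
    """Post-processing: drop words found in either stopword corpus, then count the rest."""
    stopwords = set(general_stopwords) | set(user_stopwords)
    counts = Counter(
        token
        for doc in docs
        for token in clean(doc).split()
        if token not in stopwords
    )
    return counts.most_common(top_n)


docs = ["The data, the DATA and more data!", "Data cleansing: 2022 report."]
general = {"the", "and"}  # stand-in for a general stopword corpus
user = {"more", "report"}  # stand-in for a user-defined stopword corpus
print(word_frequencies(docs, general, user))  # [('data', 4), ('cleansing', 1)]
```

Keeping both corpora outside the code, rather than nesting stopword lists in the script, means they can be edited and reused without touching the analysis logic.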

Variance Recovery in Text Detection using Color Variance Feature (색 분산 특징을 이용한 텍스트 추출에서의 손실된 분산 복원)

  • Choi, Yeong-Woo;Cho, Eun-Sook
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.73-82 / 2009
  • This paper proposes a method for recovering the variance of character strokes that can be missed when the previously proposed color variance approach is applied to text detection in natural scene images. Because the horizontal and vertical variance-detection windows have a fixed length, the previous method can miss the color variance when character strokes are thick or long. This paper therefore recovers the missed variance using geometric information from the bounding boxes of connected components together with heuristic knowledge. We tested the proposed method on various document-style and natural scene images, such as billboards and signboards, captured by digital cameras and mobile-phone cameras, and showed improved text detection accuracy even in images containing large characters.
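The abstract does not give the exact variance formula, window lengths, or recovery heuristics, so the following Python sketch only illustrates the failure mode it describes, under stated assumptions: color variance is taken as the sum of per-channel RGB variances inside a fixed-length horizontal window, and `fill_gaps` is a hypothetical stand-in for the bounding-box-based recovery that re-marks short unmarked runs lying between two detected pixels in a row.

```python
def color_variance(pixels):
    """Sum of per-channel variances over a list of (R, G, B) tuples."""
    n = len(pixels)
    total = 0.0
    for ch in range(3):
        mean = sum(p[ch] for p in pixels) / n
        total += sum((p[ch] - mean) ** 2 for p in pixels) / n
    return total


def horizontal_variance_mask(row, window, threshold):
    """Mark pixel x as text-like if its centered fixed-length window has high color variance."""
    half = window // 2
    mask = []
    for x in range(len(row)):
        lo, hi = max(0, x - half), min(len(row), x + half + 1)
        mask.append(color_variance(row[lo:hi]) > threshold)
    return mask


def fill_gaps(mask, max_gap):
    """Hypothetical recovery: re-mark unmarked runs of at most max_gap pixels bounded on both sides."""
    out = list(mask)
    x = 0
    while x < len(mask):
        if mask[x]:
            x += 1
            continue
        start = x
        while x < len(mask) and not mask[x]:
            x += 1
        # run [start, x) is unmarked; fill it only if marked pixels enclose it and it is short
        if start > 0 and x < len(mask) and x - start <= max_gap:
            for i in range(start, x):
                out[i] = True
    return out


W, B = (255, 255, 255), (0, 0, 0)
row = [W] * 4 + [B] * 8 + [W] * 4  # one image row crossing a thick dark stroke
mask = horizontal_variance_mask(row, window=3, threshold=1000)
print([i for i, m in enumerate(mask) if m])  # [3, 4, 11, 12]
recovered = fill_gaps(mask, max_gap=8)
print([i for i, m in enumerate(recovered) if m])  # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

With a three-pixel window, only the two edges of the eight-pixel-wide stroke show variance, which is the missed-interior problem the abstract describes; the gap filling then restores the stroke interior while leaving the background untouched.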

Descriptive Characteristics of the Label Texts Related to Earth Science: Toward Educationally Meaningful Communication (교육적으로 유의미한 의사소통을 위한 지구과학 관련 전시 라벨의 서술 특징)

  • Kim, Chan-Jong;Park, Eun-Ji;Yoon, Sae-Yeol;Lee, Sun-Kyung
    • Journal of the Korean earth science society / v.33 no.1 / pp.94-109 / 2012
  • The purpose of this study is to analyse the descriptive characteristics of Earth Science label texts at a science museum and a natural history museum in Korea. The data were collected from the Korean National Science Museum and the Seodaemun Natural History Museum. The analysis framework was adapted from Systemic Functional Linguistics. The results show that the labels 1) consist mostly of declarative sentences, 2) contain an appropriate amount of scientific information, and 3) mainly present 'facts'. Moreover, all of the texts belong to the genre of 4) 'logical exposition'. In the Korean National Science Museum in particular, the labels 5) use more scientific words relative to all terminology and 6) have more than half of their subjects omitted or expressed as long nominalizations. These results suggest that the labels may foster one-way rather than two-way communication about the culture of science. This study presents these descriptive characteristics of label texts in order to make educationally meaningful communication possible by building an open structure between visitors' everyday culture and the culture of science.

Library User Education using HyperCard (하이퍼 카드를 응용(應用)한 도서관 이용자(利用者) 교육(敎育))

  • Tak, Hae-Kyung
    • Journal of Information Management / v.25 no.3 / pp.1-27 / 1994
  • HyperCard, which is built on the concept of hypertext, is not only a database management program and a hypermedia-based educational medium but also an environment for developing educational software. In this paper, the concept and characteristics of HyperCard are reviewed, and examples of applying HyperCard programs to library user education are given.

An Ensemble Classification Method of Korean Standard Industry Code for Corporate Business Analysis (기업 비지니스 분석을 위한 한국표준산업코드 앙상블 분류)

  • Kyo-Joong Oh;Ho-Jin Choi;Jinwon Kim;Wonseok Cha;Ilgu Kim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.477-479 / 2022
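  • This paper presents a methodology for building an ensemble classification model that assigns domestic businesses to industry groups based on the Korean Standard Industrial Classification, for the purpose of corporate business analysis. To build an automated system for company evaluation and reporting, the industry group of each company must be determined from the text contained in its financial statements, in filings such as the corporate registry, and in business survey data; this in turn makes it possible to analyze business information, for example by identifying and comparing the status of other companies in the same industry group.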
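The abstract does not say how the ensemble combines its members. Soft voting, i.e. averaging the class probabilities predicted by each member, is one common choice; the Python sketch below assumes it, with one hypothetical member per text source mentioned in the abstract (financial statements, registry filings, business survey data). The labels `industry_A` and `industry_B` are placeholders rather than real KSIC codes, and `soft_vote` is an illustrative helper name.

```python
def soft_vote(member_probs, weights=None):
    """Weighted average of per-class probabilities from several classifiers.

    member_probs: one dict {class_code: probability} per ensemble member.
    Returns (winning class, averaged probabilities).
    """
    if weights is None:
        weights = [1.0] * len(member_probs)
    totals = {}
    for probs, w in zip(member_probs, weights):
        for code, p in probs.items():
            totals[code] = totals.get(code, 0.0) + w * p
    norm = sum(weights)
    averaged = {code: s / norm for code, s in totals.items()}
    return max(averaged, key=averaged.get), averaged


members = [
    {"industry_A": 0.6, "industry_B": 0.4},  # hypothetical model on financial-statement text
    {"industry_A": 0.3, "industry_B": 0.7},  # hypothetical model on registry filings
    {"industry_A": 0.5, "industry_B": 0.5},  # hypothetical model on business-survey text
]
label, avg = soft_vote(members)
print(label)  # industry_B
```

Passing `weights` lets a more trusted source dominate; for example, `soft_vote(members, weights=[3, 1, 1])` favors the first member and returns `industry_A`.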

EmoNSMC: Constructing Korean Emotion Tagging Dataset Using Distant Supervision (EmoNSMC: Distant Supervision 을 이용한 한국어 감정 태깅 데이터셋 구축)

  • Lee, Young-Jun;Choi, Ho-Jin
    • Annual Conference on Human and Language Technology / 2019.10a / pp.519-521 / 2019
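  • As more and more people now communicate through social messengers, identifying emotions in text has become important, and this requires emotion-tagged data. However, the amount of emotion-tagged data in previous work is small, which can degrade performance in recognizing emotions in text. To address this, we built EmoNSMC, a large Korean emotion-tagged dataset, using a word-matching method and a morpheme-matching method. The dataset was weakly labeled by applying distant supervision to the Naver movie review data (NSMC), using the Korean emotion lexicon (KTEA). An analysis of the resulting emotion distributions showed that the dataset built with morpheme matching has a more even emotion distribution. The dataset is publicly available.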
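Distant supervision of this kind can be sketched as lexicon lookup: a review receives the emotion evoked most often by the lexicon entries it contains, and reviews with no match are dropped. In the Python sketch below, the toy lexicon is hypothetical (it is not KTEA), the helper names are illustrative, and `tokenize=str.split` corresponds to word matching; morpheme matching would mean passing a morphological analyzer's tokenizer instead (for instance KoNLPy's `Okt().morphs`), assuming the lexicon entries are stored in the same morpheme form.

```python
from collections import Counter


def weak_label(text, lexicon, tokenize=str.split):
    """Return the emotion evoked most often by lexicon hits in `text`, or None if none match."""
    hits = Counter(lexicon[tok] for tok in tokenize(text) if tok in lexicon)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


def build_dataset(texts, lexicon, tokenize=str.split):
    """Weakly label every text and keep only those that matched at least one lexicon entry."""
    dataset = []
    for text in texts:
        label = weak_label(text, lexicon, tokenize)
        if label is not None:
            dataset.append((text, label))
    return dataset


# Hypothetical toy lexicon mapping surface forms to emotion categories (not KTEA).
lexicon = {"최고": "joy", "슬픈": "sadness"}
reviews = ["정말 최고 영화", "슬픈 결말 이었다", "그냥 그랬다"]
print(build_dataset(reviews, lexicon))  # [('정말 최고 영화', 'joy'), ('슬픈 결말 이었다', 'sadness')]
print(weak_label("정말 최고였다", lexicon))  # None
```

The last line shows why matching granularity matters: the inflected form 최고였다 is a single whitespace token, so word matching misses the lexicon entry 최고, whereas a morpheme-level tokenizer could split it off.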

A Study on the Analysis of ICT R&D using Text Mining Method: Focused on ICT Field and Smart City (텍스트 마이닝을 활용한 국가 R&D과제 동향 분석: ICT 분야와 스마트시티 중심으로)

  • Kim, Seong-soon;Yang, Myung-seok
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.462-465 / 2021
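  • To identify recent R&D trends in the ICT field, this study used text mining techniques to analyze national R&D project information provided by NTIS. Keywords were extracted from project information from 2017 to 2020, and keyword networks were visualized through link (relationship) mining. The results are as follows. First, we observed that the core research topics in each ICT field change as technology advances. Second, keywords acting as hubs in the keyword network revealed the technologies that mediate convergence between fields. Finally, by comparing and analyzing the keyword networks year by year, we showed that emerging keywords, i.e. those that newly appear or whose connections change, can signal promising future technologies and the latest research directions.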
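The keyword-network steps in the abstract can be approximated as follows, assuming each project has already been reduced to a keyword list. In this Python sketch, two keywords are linked when they appear in the same project, hubs are the keywords with the most distinct neighbors (degree), and a keyword counts as emerging if its degree grows from one year's network to the next; these definitions, the function names, and the toy data are illustrative assumptions rather than the study's exact procedure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(projects):
    """Weighted edges: (kw1, kw2) -> number of projects whose keyword lists contain both."""
    edges = Counter()
    for keywords in projects:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct neighbors of each keyword."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def hubs(edges, top_n=3):
    """Keywords with the highest degree, i.e. candidate bridges between fields."""
    return [kw for kw, _ in degree(edges).most_common(top_n)]


def emerging(prev_edges, curr_edges):
    """Keywords that are new in the current network or gained neighbors since the previous one."""
    prev_deg, curr_deg = degree(prev_edges), degree(curr_edges)
    return sorted(kw for kw, d in curr_deg.items() if d > prev_deg.get(kw, 0))


# Toy keyword lists per project (hypothetical data, not NTIS records).
net_2017 = cooccurrence_network([["IoT", "smart city", "sensor"], ["IoT", "cloud"]])
net_2020 = cooccurrence_network(
    [["IoT", "smart city", "digital twin"], ["AI", "smart city"], ["AI", "IoT"]]
)
print(hubs(net_2020, top_n=2))
print(emerging(net_2017, net_2020))  # ['AI', 'digital twin', 'smart city']
```

Degree is the simplest hub measure; a real analysis might instead use weighted degree or betweenness centrality, and would typically drop rare co-occurrences before building the network.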