• Title/Summary/Keyword: 텍스트분석 (text analysis)

Search Result 2,604

A System for the Decomposition of Text Block into Words (텍스트 영역에 대한 단어 단위 분할 시스템)

  • Jeong, Chang-Boo;Kwag, Hee-Kue;Jeong, Seon-Hwa;Kim, Soo-Hyung
    • Proceedings of the Korea Information Processing Society Conference / 2000.10a / pp.293-296 / 2000
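  • This paper proposes a word-level segmentation system for use in a keyword-recognition-based retrieval and indexing system for document images. The proposed system takes as input the text regions extracted through image preprocessing and document structure analysis, and segments them into words with a hierarchical approach: each text region is first split into text lines, and each text line is then split into words. Text-line segmentation finds the split points by applying a horizontal projection profile. Word segmentation extracts connected components, computes the gaps between them, and uses a gap clustering technique to find the word split points. Special symbols that degrade word segmentation performance are detected using heuristic information. When evaluated on 50 text regions, the proposed system achieved an accuracy of 99.83%.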

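The hierarchical segmentation described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the image is assumed to be a binary grid (1 = ink), column runs stand in for connected components, and the gap clustering is assumed to be a simple one-dimensional two-means split. The function names `split_runs`, `segment_lines`, and `segment_words` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open (start, end) index range


def split_runs(profile: List[int]) -> List[Span]:
    """Return the spans of consecutive non-zero values in a projection profile."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: List[List[int]]) -> List[Span]:
    """Horizontal projection profile: count ink pixels per row, split at empty rows."""
    return split_runs([sum(row) for row in image])


def segment_words(line: List[List[int]]) -> List[Span]:
    """Split one text line into words by clustering inter-blob gaps into two groups."""
    columns = [sum(row[x] for row in line) for x in range(len(line[0]))]
    blobs = split_runs(columns)  # assumption: column runs approximate connected components
    gaps = [b[0] - a[1] for a, b in zip(blobs, blobs[1:])]
    if not blobs:
        return []
    if len(set(gaps)) < 2:  # no basis for separating gap sizes: treat as one word
        return [(blobs[0][0], blobs[-1][1])]
    lo, hi = float(min(gaps)), float(max(gaps))
    for _ in range(20):  # 1-D two-means: small gaps (within word) vs large gaps (between words)
        threshold = (lo + hi) / 2
        small = [g for g in gaps if g <= threshold]
        large = [g for g in gaps if g > threshold]
        lo, hi = sum(small) / len(small), sum(large) / len(large)
    threshold = (lo + hi) / 2
    words, (start, end) = [], blobs[0]
    for (s, e), gap in zip(blobs[1:], gaps):
        if gap > threshold:  # a large gap closes the current word
            words.append((start, end))
            start = s
        end = e
    words.append((start, end))
    return words
```

Calling `segment_lines` on a text region and then `segment_words` on each returned row span mirrors the two-level flow. The heuristic handling of special symbols mentioned in the abstract is not covered by this sketch.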

Modeling and Implementation of Intelligent Pen-based Online Editing System (지능형 펜기반 온라인 교정 시스템의 설계 및 구현)

  • 김재경;손원성;정한상;임순범;최윤철
    • Proceedings of the Korean Information Science Society Conference / 2002.10d / pp.178-180 / 2002
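  • As paper documents are increasingly digitized, traditional pen-based proofreading systems are also being rebuilt for online electronic document environments. In a proofreading system that uses pen-based input, it is important to recognize accurately which text region a proofreading mark applies to, and this requires analyzing both the characteristics of the proofreading marks and the text regions. In this study, we model an online proofreading system, define proofreading marks suited to the online environment, and divide the target text region into editable units so that editing operations can be performed efficiently. In addition, considering a web-based structured document (HTML/XML) editing environment, we classify and define text as unstructured text and structure-information text in order to support changes to a document's structural information caused by editing. Based on this model, we define a variable rule model for recognizing the edit text region according to the characteristics of each proofreading mark, which minimizes the ambiguity between proofreading marks and edit text regions, and we implement a system that supports changes to the document's structural information caused by editing. As a result, ambiguous pen-based proofreading marks entered in an online web document environment are interpreted from a cognitive perspective, supporting more accurate proofreading.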


Time Series Analysis of Patent Keywords for Forecasting Emerging Technology (특허 키워드 시계열 분석을 통한 부상 기술 예측)

  • Kim, Jong-Chan;Lee, Joon-Hyuck;Kim, Gab-Jo;Park, Sang-Sung;Jang, Dong-Sick
    • KIPS Transactions on Software and Data Engineering / v.3 no.9 / pp.355-360 / 2014
  • Forecasting emerging technology plays an important role in business strategy and R&D investment. There are various approaches to technology forecasting, including patent analysis. Patent-based technology forecasting has mainly relied on qualitative methods built on experts' evaluations and opinions. However, qualitative methods do not guarantee objective results and are costly and time-consuming. To address these weaknesses, patent data can be analyzed quantitatively and statistically using text mining techniques. In this paper, we propose a new technology forecasting method that uses text mining and ARIMA analysis.
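
To make the combination of text mining and ARIMA concrete, the sketch below counts a keyword per year and produces a one-step forecast. It is only an illustration under stated assumptions: the abstract does not give the model order, so a simplified ARIMA(1,1,0) without intercept is assumed (first difference, then an AR(1) coefficient fitted by least squares), and the functions `yearly_keyword_counts` and `forecast_next` as well as the input records are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def yearly_keyword_counts(patents: Sequence[Tuple[int, str]], keyword: str) -> List[int]:
    """Count occurrences of `keyword` per year, returned in chronological order."""
    counts: Counter = Counter()
    for year, text in patents:
        counts[year] += text.lower().split().count(keyword)
    return [counts[year] for year in sorted(counts)]


def forecast_next(series: Sequence[float]) -> float:
    """One-step forecast from a simplified ARIMA(1,1,0) model without intercept."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    # d = 1: work on first differences to remove the trend
    diffs = [b - a for a, b in zip(series, series[1:])]
    # p = 1: least-squares fit of diff[t] = phi * diff[t - 1]
    prev, curr = diffs[:-1], diffs[1:]
    denom = sum(v * v for v in prev)
    phi = sum(a * b for a, b in zip(prev, curr)) / denom if denom else 0.0
    # Undo the differencing to return to the original scale
    return series[-1] + phi * diffs[-1]
```

A full analysis would normally use a statistics package with order selection and diagnostics; the hand-written fit here only shows the differencing and autoregressive steps.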

A Study on Word Cloud Techniques for Analysis of Unstructured Text Data (비정형 텍스트 테이터 분석을 위한 워드클라우드 기법에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.6 no.4 / pp.715-720 / 2020
  • In big data analysis, text data is mostly unstructured and large in volume, and it has been difficult to analyze because analysis techniques were not well established. This study therefore examines the possibility of commercializing the big data word cloud technique, one of the text data analysis techniques, by verifying its usefulness and its problems when applied. In this paper, the limitations and problems of the technique are derived through a visualization analysis of the "President UN Speech" using the word cloud technique in the R program. In addition, an improved model is proposed to solve these problems, offering an efficient method for applying the word cloud technique in practice.
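
The data behind a word cloud is a table of word frequencies mapped to display sizes. The sketch below shows that step only. The paper works in R; Python is used here purely for illustration, and the stopword list, size range, and function name `word_weights` are assumptions rather than anything taken from the paper.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical minimal stopword list; a real analysis would use a fuller one
STOPWORDS = {"the", "and", "of", "to", "a", "in", "we", "is"}


def word_weights(text: str, top_n: int = 5,
                 min_size: float = 10, max_size: float = 40) -> List[Tuple[str, int, float]]:
    """Return (word, count, font size) for the top_n most frequent words."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return []
    highest, lowest = counts[0][1], counts[-1][1]
    span = (highest - lowest) or 1  # avoid division by zero when all counts are equal
    # Linear scaling: the most frequent word gets max_size, the least frequent gets min_size
    return [(w, c, min_size + (max_size - min_size) * (c - lowest) / span)
            for w, c in counts]
```

Linear scaling lets a few very frequent words dominate the picture, which is one of the commonly noted weaknesses of raw-frequency word clouds.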

Knowledge Structure Analysis on Defense Research Using Text Network Analysis (텍스트 네트워크분석을 활용한 국방분야 연구논문 지식구조 분석)

  • Lee, Yong-Kyu;Yoon, Soung-woong;Lee, Sang-Hoon
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.526-529 / 2018
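  • This study uses text network analysis to analyze the core keywords and research topics of defense research and thereby identify its overall knowledge structure. To this end, we examined the state of defense research and constructed a knowledge structure from the degree-program theses of the Korea National Defense University from 2010 to 2017. We analyzed the abstracts of the 710 theses accumulated over these eight years and extracted a total of 6,883 words. Following the Pareto principle, we then selected 270 words by applying a top-20% criterion to each word's frequency of appearance across theses and its number of links to other words, and derived a final set of 170 core keywords through component analysis. Centrality analysis and cohesive structure analysis on these core keywords yielded a total of six knowledge structure groups for the defense field.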

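The keyword selection pipeline in the abstract above (co-occurrence network, top-20% filter, component analysis) can be sketched as follows. This is an illustrative sketch only: the abstract does not say how the frequency and link criteria were combined, so an intersection of the two top-20% sets is assumed, component analysis is assumed to mean keeping the largest connected component, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def build_network(docs: Iterable[List[str]]):
    """Return (document frequency per word, co-occurrence links per word)."""
    doc_freq: Dict[str, int] = defaultdict(int)
    links: Dict[str, Set[str]] = defaultdict(set)
    for words in docs:
        unique = sorted(set(words))
        for word in unique:
            doc_freq[word] += 1
        for a, b in combinations(unique, 2):  # words in the same abstract are linked
            links[a].add(b)
            links[b].add(a)
    return dict(doc_freq), links


def top_share(scores: Dict[str, int], share: float = 0.2) -> Set[str]:
    """Keep the top `share` of words by score (ties broken alphabetically)."""
    keep = max(1, round(len(scores) * share))
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:keep])


def largest_component(nodes: Set[str], links: Dict[str, Set[str]]) -> Set[str]:
    """Largest connected component of the network restricted to `nodes`."""
    seen: Set[str] = set()
    best: Set[str] = set()
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(m for m in links[node] if m in nodes and m not in component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def core_keywords(docs: List[List[str]], share: float = 0.2) -> Set[str]:
    doc_freq, links = build_network(docs)
    degree = {word: len(links[word]) for word in doc_freq}
    # Assumption: a word must be in the top share by both frequency and link count
    candidates = top_share(doc_freq, share) & top_share(degree, share)
    return largest_component(candidates, links)
```

Centrality and cohesion analysis would then be run on the resulting keyword set; those steps are outside this sketch.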

Analysis of Interrelation between Image and Text as Fusion Relationship -Through Advertising Production Class- (융합적 관계로서의 이미지와 텍스트의 상호관계성 분석 연구 -광고 제작 수업을 통하여-)

  • Seo, Hwa-Jung;Huh, Yoon Jung
    • Journal of the Korea Convergence Society / v.9 no.7 / pp.155-162 / 2018
  • This study explores the relationship between images and texts through an advertising production class that uses both, and analyzes the students' works with the semiotics of Roland Barthes. Since Barthes emphasized the interpreter's interpretation rather than the producer's intention, the works were interpreted from the receiver's standpoint and analyzed in terms of the socio-cultural meaning of what the students produced. A total of 64 classes were held for the first two classes in D high school. The results of analyzing the students' works after the advertising production class are as follows. First, analysis with Barthes's myth structure model shows that advertising images and texts are symbols that carry meaning. Second, advertising images and texts complement each other and have an interrelationship that constitutes meaning. Third, by drawing out the socio-cultural implications inherent in the students' advertisements, their values and interests could be discovered.

Project Failure Main Factors Analysis using Text Mining in Audit Evaluation (감리결과에 텍스트마이닝 기법을 적용한 프로젝트 실패 주요요인 분석)

  • Jang, Kyoungae;Jang, Seong Yong;Kim, Woo-Je
    • Journal of KIISE / v.42 no.4 / pp.468-474 / 2015
  • Corporations need to respond quickly to rapid external changes, so they should recognize the importance of projects, identify their failure factors, prevent risks in advance, and raise success rates. Previous studies have examined project success and failure factors, but most are limited in objectivity and quantitative analysis because they rely on data gathered through surveys, statistical sampling, and analysis. This study analyzes project failure factors using data mining to find project problems in audit reports, which are objective project evaluation reports. To do this, we identified the texts in the paragraphs containing suggestions for improvement. We used three well-performing classification algorithms, NaiveBayes, SMO, and J48, and evaluated them in terms of recall and precision after 10-fold cross-validation. The failure factors found in the identified texts were analyzed so that they can be used in project implementation.
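
The evaluation procedure named in the abstract, k-fold cross-validation scored by precision and recall, can be sketched with a small hand-written multinomial Naive Bayes classifier. This is a hedged illustration, not the study's setup: the classifier is a stand-in written from scratch, SMO and J48 are not shown, and the label name `"fail"` and the function names are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Iterator, List, Tuple

Sample = Tuple[List[str], str]  # (tokens, label)


def train_nb(samples: List[Sample]) -> Callable[[List[str]], str]:
    """Train a multinomial Naive Bayes classifier with add-one smoothing."""
    label_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    def predict(tokens: List[str]) -> str:
        best, best_score = None, -math.inf
        for label in sorted(label_counts):
            total = sum(word_counts[label].values())
            score = math.log(label_counts[label] / len(samples))  # log prior
            for token in tokens:  # log likelihood with Laplace smoothing
                score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
            if score > best_score:
                best, best_score = label, score
        return best

    return predict


def k_fold(samples: List[Sample], k: int) -> Iterator[Tuple[List[Sample], List[Sample]]]:
    """Yield (train, test) splits; fold i takes every k-th sample starting at i."""
    for i in range(k):
        test = samples[i::k]
        train = [s for j, s in enumerate(samples) if j % k != i]
        yield train, test


def cross_validate(samples: List[Sample], k: int = 10,
                   positive: str = "fail") -> Tuple[float, float]:
    """Return (precision, recall) for the positive label, pooled over all folds."""
    tp = fp = fn = 0
    for train, test in k_fold(samples, k):
        predict = train_nb(train)
        for tokens, label in test:
            guess = predict(tokens)
            tp += guess == positive and label == positive
            fp += guess == positive and label != positive
            fn += guess != positive and label == positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Pooling the counts over all folds before computing precision and recall is one common convention; averaging per-fold scores is another, and the abstract does not say which was used.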

Factor Analysis and Content Development of Digital Text Structure for Designing Visual Experience in e-Book Interface (e-Book 인터페이스에서 시각적 경험 설계를 위한 디지털 텍스트 구조의 물리적 요인분석 및 콘텐츠 개발)

  • Sung, Eun-Mo
    • The Journal of the Korea Contents Association / v.11 no.11 / pp.79-90 / 2011
  • The purpose of this study is to explore the physical factors of digital text structure for designing an e-Book interface and to develop a prototype e-Book interface by applying these factors. To this end, exploratory factor analysis and confirmatory factor analysis (CFA) were employed, and 237 university students participated in the study. As a result, 29 items describing the physical features of digital text structure were developed, and nine factors of digital text structure were extracted: volume, depth, density, space, layout, format, signal, size, and length. In addition, CFA was conducted to verify the structure of these nine factors, and the factor structure was supported by all model fit indices.

The Effects of Semantic Mapping as a Science Text Reading Strategy On High School Students' Inferential Comprehension (과학 텍스트 의미지도 읽기 전략이 고등학생의 추론적 이해에 미치는 영향)

  • Sujin Lee;Jihun Park;Jeonghee Nam
    • Journal of the Korean Chemical Society / v.67 no.5 / pp.362-377 / 2023
  • The purpose of this study was to investigate the effect of semantic mapping as a science text reading strategy on high school students' inferential comprehension. For this purpose, eight science text reading classes using a semantic mapping reading strategy were conducted for 46 students in two science-focused classes in the third grade of a high school. To investigate the effects of the semantic mapping reading strategy on students' inferential comprehension, the results of the students' pre- and post-reading ability tests were analyzed. To identify changes in inferential comprehension, its level was analyzed using the analysis framework developed in this study, and the levels were converted into scores for classification. The analysis showed that the semantic mapping reading strategy classes influenced changes in the students' inference, especially bridging inference and elaborative inference among the sub-elements of inferential comprehension.

A Review on Expressive Materials and Approaches to Text Visualization (텍스트 데이터 시각화의 표현 재료와 접근 방식에 관한 고찰)

  • Kim, Hyoyoung;Park, Jin Wan
    • The Journal of the Korea Contents Association / v.13 no.1 / pp.64-72 / 2013
  • In this study, we examine the types, essence, and characteristics of text data, the material for visual expression in text visualization (a part of data visualization research), and analyze the various expressive approaches to it. Studies of text visualization have spread dramatically under the influence of advances in computing, open data, the wide use of visualization tools, and so on. As a result, text visualization works have been created as artworks or research outputs through interdisciplinary, convergent research spanning engineering, art, the humanities, sociology, and other fields. Nevertheless, theoretical studies of text data itself and its visualization, and systematic analyses of approaches to it, are rare. Data is an object of understanding and interpretation, and it holds unlimited information and possibilities depending on how it is processed and approached. Considering the status data may attain in future human society, text visualization, a convergent academic field that begins with understanding and interpreting data, needs further methodological research and theoretical accumulation.