• Title/Summary/Keyword: text research (텍스트 연구)

An Experimental Study on Feature Ranking Schemes for Text Classification (텍스트 분류를 위한 자질 순위화 기법에 관한 연구)

  • Pan Jun Kim
    • Journal of the Korean Society for information Management / v.40 no.1 / pp.1-21 / 2023
  • This study examined the performance of feature ranking schemes as an efficient feature selection method for text classification. Existing feature ranking schemes are mostly based on document frequency, and relatively few use term frequency. The study therefore first examined single ranking metrics that use term frequency or document frequency individually, and then reviewed combination ranking schemes that use both. Classification experiments were conducted on two data sets (Reuters-21578, 20NG) with five classifiers (SVM, NB, ROC, TRA, RNN), and 5-fold cross-validation and t-tests were applied to secure the reliability of the results. Among the single ranking schemes, the document frequency-based metric (chi) performed well overall. In addition, no significant difference was found between the best-performing single ranking scheme and the combination ranking schemes. Therefore, when sufficient training documents are available, it is more efficient to use a single document frequency-based ranking metric (chi) for feature selection.
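
As a rough illustration of a document frequency-based ranking metric of the kind the abstract calls "chi", the sketch below scores each term with the standard chi-square statistic over a 2x2 term/class contingency table. The toy documents, the labels, and the function names `chi_square` and `rank_terms` are assumptions made for this example; the paper's own implementation and preprocessing are not described in the abstract.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def rank_terms(docs, labels, target):
    """Rank terms against one target class using document frequency only."""
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # set() counts each term once per document (document frequency).
        (df_pos if y == target else df_neg).update(set(tokens))
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a, b = df_pos[term], df_neg[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus, not taken from Reuters-21578 or 20NG.
docs = [
    ["oil", "price", "rise"],
    ["oil", "export", "cut"],
    ["wheat", "price", "fall"],
    ["wheat", "harvest", "export"],
]
labels = ["crude", "crude", "grain", "grain"]
ranking = rank_terms(docs, labels, "crude")
```

Feature selection would then keep only the top-ranked terms; a term such as "price", which appears equally often in both classes, scores zero and would be dropped first.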

Application of Deep Learning and Optical Character Recognition Technology to Automate Classification and Database of Borehole Log for Ground Stability Investigation of Abandoned Mines (폐광산 지반안정성 조사용 시추주상도의 분류 및 데이터베이스화를 위한 딥러닝 및 광학문자인식 기술의 적용)

  • Hosang Han;Jangwon Suh
    • Economic and Environmental Geology / v.57 no.5 / pp.473-486 / 2024
  • Boring logs are essential for the evaluation of ground stability in abandoned mine areas, representing geomaterial and subsurface structure information. However, because boring logs are maintained in various analog formats, extracting useful information from them is prone to human error and time-consuming. Therefore, this study develops an algorithm to efficiently manage and analyze boring log data for abandoned mine ground investigation provided in PDF format. For this purpose, the EfficientNet deep learning model was employed to classify the boring logs into five types with a high classification accuracy of 1.00. Then, optical character recognition (OCR) and PDF text extraction techniques were utilized to extract text data from each type of boring log. The OCR technique resulted in many cases of misrecognition of the text data of the boring logs, but the PDF text extraction technique extracted the text with very high accuracy. Subsequently, the structure of the database was established, and the text data of the boring logs were reorganized according to the established schema and written as structured data in the form of a spreadsheet. The results of this study suggest an effective approach for managing boring logs as part of the transition to digital mining, and it is expected that the structured boring log data from legacy data can be readily utilized for machine learning analysis.
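
The final step described above, reorganizing extracted text into a schema and writing it out as spreadsheet-style data, might look roughly like the following sketch. The field names, the line pattern, and the sample text are all hypothetical; the abstract does not specify the actual schema or the layout of the boring logs, and the PDF extraction step itself is assumed to have already produced plain text.

```python
import csv
import io
import re

# Hypothetical schema; the paper's actual database structure is not given.
FIELDS = ["depth_from_m", "depth_to_m", "description"]

# Hypothetical layer line such as "2.5 - 10.0 Weathered shale".
LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s+(.+?)\s*$")


def parse_log(text):
    """Turn already-extracted boring log text into rows matching FIELDS."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:  # lines that do not look like a depth interval are skipped
            rows.append({
                "depth_from_m": float(m.group(1)),
                "depth_to_m": float(m.group(2)),
                "description": m.group(3),
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample standing in for text extracted from one PDF page.
sample = """Borehole BH-1
0.0 - 2.5 Fill soil
2.5 - 10.0 Weathered shale
"""
rows = parse_log(sample)
table = to_csv(rows)
```

In practice each of the five log types would presumably need its own pattern, which is one reason classifying the logs first is useful.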

Application for Predicting Candidate on Election Broadcasting - A Case Study on the 20th Assembly Election - (선거방송을 위한 선거후보 당선자 예측 어플리케이션 - 제 20 대 국회의원 선거에 적용한 연구 -)

  • Yang, Geunseok;Gu, Jinwon;Roh, Minchul;Shin, Yongwoo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2016.06a / pp.95-98 / 2016
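  • The 20th National Assembly election, the flower of democracy, has come to a close. In this election, broadcasters as well as political parties spent enormous amounts of money and effort. For example, in the April 13 general election (the 20th National Assembly election), the three major broadcasters spent more than approximately 6.6 billion won on exit polls, and political parties spent more than approximately 7 billion won on opinion polls. To reduce these large expenses and the effort of the staff involved, this paper proposes an application that predicts winning candidates by applying text mining and sentiment analysis. First, a social graph model is introduced to discover regional structure. Second, text mining techniques are used to process candidate-related data. Third, text sentiment analysis is used to quantify information about the candidates. To evaluate the performance and efficiency of the approach, a case study was conducted on the 20th National Assembly election. The proposed method showed worthwhile effectiveness in terms of accuracy and statistical verification. The introduction of a candidate prediction tool for election broadcasting is expected to help reduce the large costs and effort of future elections (and election broadcasts).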

Automatic Topic Identification Based on the Ontology for Web Documents (온톨로지 기반의 웹 문서 자동 주제 식별)

  • Choi In-Dae;Nam In-Gil;Bu Ki-Dong
    • Journal of Korea Society of Industrial Information Systems / v.9 no.3 / pp.38-45 / 2004
  • The goal of this research is to develop a method for identifying the topic of a given text by examining the relationships among keywords defined in an ontology hierarchy. Keywords extracted from the important sentences of the text are mapped onto their corresponding concepts in the hierarchy. After all the words are mapped, the corresponding concepts are generalized into a single concept, which is most likely the topic of the text. The approach aims at both robustness and accuracy by using ontologies together with word frequency, and is therefore a hybrid approach that combines knowledge-based and statistical methods. Experimental results show that the proposed method outperforms the existing method that uses a knowledge base only.
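
The mapping and generalization steps can be pictured with a small sketch that maps keywords to concepts and then takes the most specific concept shared by all of them in the hierarchy. The ontology, the keyword mapping, and the function names below are invented for illustration, and the sketch leaves out the word-frequency component of the paper's hybrid approach.

```python
# Hypothetical toy ontology: child concept -> parent concept.
PARENT = {
    "sports": "topic",
    "science": "topic",
    "ball_game": "sports",
    "soccer": "ball_game",
    "baseball": "ball_game",
    "physics": "science",
}

# Hypothetical keyword -> concept mapping.
KEYWORD_TO_CONCEPT = {
    "goal": "soccer",
    "striker": "soccer",
    "pitcher": "baseball",
    "inning": "baseball",
    "quantum": "physics",
}


def path_to_root(concept):
    """Return the concept followed by all of its ancestors, nearest first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def generalize(keywords):
    """Map keywords to concepts and return their most specific shared concept."""
    concepts = [KEYWORD_TO_CONCEPT[k] for k in keywords if k in KEYWORD_TO_CONCEPT]
    if not concepts:
        return None  # no keyword could be mapped onto the ontology
    paths = [path_to_root(c) for c in concepts]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking upward from the first concept, the first shared node is the
    # lowest common ancestor, i.e. the candidate topic.
    return next(c for c in paths[0] if c in shared)


topic = generalize(["goal", "pitcher", "inning"])
```

A purely structural generalization like this drifts toward the root as soon as one off-topic keyword is mapped, which suggests why weighting concepts by word frequency could make the result more robust.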

A Study on the Reading Efficiency of Comics (만화의 독서 효용성에 관한 연구)

  • Ryu, Ban-Dee
    • Journal of the Korean BIBLIA Society for library and Information Science / v.22 no.2 / pp.123-139 / 2011
  • This study focuses on how the characteristics of comics make the medium a useful reading material. To highlight how these characteristics enhance the medium's capacity for reading, the study analyzed several patterns of adaptation from comics to other media. The usefulness of comics as a reading text can be summarized as follows. First, comics have alternating phases of immersion and concentration. Second, comics provide a view of the whole as well as speed. These two factors make comics an interesting reading material.

The Effects of the Presentation Mode of Web Contents on the Children's Information Processing Process (웹 콘텐츠의 정보제시유형이 어린이 뉴스정보처리과정에 미치는 영향)

  • Choi E-Jung
    • The Journal of the Korea Contents Association / v.5 no.3 / pp.113-122 / 2005
  • The major purpose of this study is to explore the effect of the presentation mode of web contents, formed by combining four main media (moving image, audio, text, image), on children's information processing. Children were assigned to one of five experimental medium conditions: 'moving image 1 (auditory-visual redundancy)', 'moving image 2 (auditory-visual dissonance)', 'text', 'text-with-image', and 'audio'. The results indicated that the moving image was the most effective transmitter of internet news information for children's recall, and that its recall advantage was particularly pronounced for verbal information supplemented with redundant visuals.

Scaling Documents' Semantic Transparency Spectrum with Semantic Hypernetwork (Semantic Hypernetwork 학습에 의한 자연언어 텍스트의 의미 구분)

  • Lee, Eun-Seok;Kim, Joon-Shik;Shin, Won-Jin;Park, Chan-Hoon;Zhang, Byoung-Tak
    • Proceedings of the Korean Information Science Society Conference / 2008.06c / pp.289-294 / 2008
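  • The meaning that a natural language document tries to convey can be very clear (e.g., a news article) or very unclear (e.g., a poem), depending on the nature of the text. This study assumes that this 'semantic transparency' can be measured quantitatively, and discusses how the probabilistic and statistical properties of word association function in judging it. To this end, focusing on the neighboring frequency and degeneracy that arise when particular words form chains, a Markov chain Monte Carlo scheme was applied to train a semantic network ('Semantic Hypernetwork'), after which the connections between the constituent words of a document and their sets were examined. We analyzed how the model classifies semantic transparency for documents whose semantic representations are clearly distinct (news and poetry). The two properties, neighboring frequency and degeneracy, can be proposed as significant substrates for semantic network memory and for learning and search mechanisms in language structure. The main results of this study are (1) statistical evidence that distinguishes the semantic transparency of texts, (2) the discovery of a new substrate for the semantic structure of documents, and (3) the proposal of a classification scheme that differs from existing category-based document classification.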

Analysis of Signboard Characteristics and Dictionary Construction for Text Recognition in Signboard Images (간판영상의 텍스트 인식을 위한 영상데이터 특성 분석 및 사전 구축)

  • Lee, Myung-Hun;Yang, Hyung-Jeong;Kim, Soo-Hyung;Lee, Guee-Sang;Oh, Sang-Wook;Kim, Sun-Hee
    • The Journal of the Korea Contents Association / v.8 no.11 / pp.10-17 / 2008
  • Sign recognition and translation provide information and support decision making for foreigners and city tourists. Collecting sign images and building a vocabulary of the words that appear in signs are essential for training machine recognizers and evaluating systems. In this paper, we analyze the characteristics of sign images. About 1,000 sign images were collected under different conditions and at different locations. We also build a dictionary of words from 100,000 sign names.

A Study on the Knowledge-Based System for Automatic Abstracting (자동 초록을 위한 지식 기반 시스템 설계에 관한 연구)

  • 최인숙
    • Journal of the Korean Society for information Management / v.6 no.1 / pp.93-117 / 1989
  • The objective of this study is to design an automatic abstracting system through the analysis of natural language texts. For this purpose, a knowledge-based system that operates on domain knowledge was developed. The procedure for generating an abstract consists of three steps: (1) A knowledge base containing the domain knowledge necessary to understand a text is constructed using frame and semantic network structures, and preliminary abstracts are prepared for various cases. (2) The input text is analyzed on the basis of the domain knowledge in order to extract the information that fills the slots of the abstract. (3) The preliminary abstract corresponding to the input text is retrieved and filled with the information, completing the abstract.
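
The three steps can be loosely illustrated with a template-and-slot sketch: a frame holds a preliminary abstract with named slots, patterns pull slot fillers out of the input text, and the template is then completed. The frame contents, the regular expressions, and the sample text are assumptions for this example; the original system relied on frames and semantic networks built from domain knowledge rather than on simple patterns.

```python
import re

# Step 1 (hypothetical): a frame with a preliminary abstract and slot patterns.
FRAME = {
    "template": "This study aims to {purpose}. The method is based on {method}.",
    "slots": {
        "purpose": re.compile(r"aims to ([^.]+)\."),
        "method": re.compile(r"using ([^.]+)\."),
    },
}


def extract_slots(text, frame):
    """Step 2: analyze the input text and collect a filler for each slot."""
    values = {}
    for slot, pattern in frame["slots"].items():
        match = pattern.search(text)
        values[slot] = match.group(1) if match else "[unknown]"
    return values


def fill_abstract(text, frame):
    """Step 3: fill the preliminary abstract with the extracted information."""
    return frame["template"].format(**extract_slots(text, frame))


# Made-up input text for the example.
source_text = (
    "This paper aims to design an abstracting system. "
    "It was built using frames and semantic networks."
)
abstract = fill_abstract(source_text, FRAME)
```

A slot that no pattern can fill is marked `[unknown]` instead of being guessed, so a gap in the analysis stays visible in the generated abstract.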

Development of computational thinking based Coding_Projects using the ARCS model (ARCS 모형을 적용한 컴퓨팅사고력 기반 코딩 프로젝트 개발)

  • Nam, Choong Mo;Kim, Chong Woo
    • Journal of The Korean Association of Information Education / v.23 no.4 / pp.355-362 / 2019
  • Elementary students are receiving software education in which coding is taught using text-based languages such as Python. In general, these higher-level languages, in contrast to block-based programming languages, support learning activities in combination with physical computing kits or various programming languages. In this study, we conducted a coding project based on computational thinking using the ARCS model to overcome the difficulties of text-based languages. The results of the experiment show that students were generally confident and interested in programming. In particular, among the changes in computational thinking ability, understanding of repetition, functions, and objects was high, a trend believed to be due to the use of a text-based language and the Python module.