• Title/Summary/Keyword: text base


Research and Development of Document Recognition System for Utilizing Image Data (이미지데이터 활용을 위한 문서인식시스템 연구 및 개발)

  • Kwag, Hee-Kue
    • The KIPS Transactions:PartB / v.17B no.2 / pp.125-138 / 2010
  • The purpose of this research is to enhance the document recognition system that is essential for developing a full-text retrieval system for the document image data stored in the digital library of a public institution. To achieve this, the main tasks are: 1) analyzing the document image data and developing image preprocessing and document structure analysis technologies for it, and 2) building a specialized knowledge base consisting of document layouts and properties, character models, and a word dictionary. In addition, by developing a management tool for this knowledge base, the document recognition system is able to handle various types of document image data. So far, we have developed a prototype document recognition system that combines the specialized knowledge base with the document structure analysis library, both adapted to the document image data housed in the National Archives of Korea. Based on these results, we plan to build a test bed and evaluate the performance of the document recognition system in order to maximize the utilization of the full-text retrieval system.

Development of a Fashion Design Database (패션디자인 DB 개발)

  • 김정회
    • Proceedings of the Korea Database Society Conference / 1997.10a / pp.358-375 / 1997
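  • A. Collection and analysis of basic fashion design information: obtaining basic data on fashion design information scattered in Korea and abroad; classifying it by designer, collection, and theme; processing it. B. Development of a multimedia database of fashion design information: a combined database system of images, text, and sound; development of data for PC communication network services. C. Building databases of fashion design materials: fashion design theory books; fashion design contest and event information; information on fashion design educational institutions; fashion brand information (national / designer / imported). D. Database supply services: service through PC communication networks (downloadable); images of design works with concept, details, and captions; use of job offer and job search information for design personnel via PC communication; information on studying fashion design abroad. E. Internet service: introducing the works of Korean designers overseas via the Internet (remainder omitted)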


Study of Analyzing Outcome of Building and Introducing System for Preserving Full-Text of e-Journal

  • Kim, Kwang-Young;Kim, Soon-Young;Kim, Hwan-Min
    • International Journal of Knowledge Content Development & Technology / v.2 no.2 / pp.5-16 / 2012
  • Today, most researchers conduct their studies using the full text of e-journals. An important foundation for the domestic development of science and technology is therefore to obtain the full text of quality e-journals written by overseas researchers and to provide it to Korea's researchers. This study aims to build a system based on the National Archiving Center for the full text of e-journals and to create a service that provides it to the public by acquiring the full text of quality overseas e-journals. To this end, the outcome of introducing such a system was analyzed in comparison with the investment. As a result, the number of institutions that had introduced the system rose by 112, from 47 to 159, as of 2012, and the number of downloaded full texts increased at least 2.17 times.

A Study on Fine-Tuning and Transfer Learning to Construct Binary Sentiment Classification Model in Korean Text (한글 텍스트 감정 이진 분류 모델 생성을 위한 미세 조정과 전이학습에 관한 연구)

  • JongSoo Kim
    • Journal of Korea Society of Industrial Information Systems / v.28 no.5 / pp.15-30 / 2023
  • Recently, generative models based on the Transformer architecture, such as ChatGPT, have been gaining significant attention. The Transformer architecture has been applied to various neural network models, including Google's BERT (Bidirectional Encoder Representations from Transformers) sentence generation model. This paper proposes a method for creating a binary text classification model that determines whether a comment in a Korean movie review is positive or negative. To accomplish this, a pre-trained multilingual BERT sentence generation model is fine-tuned and transfer-learned using a new Korean training dataset. The pre-trained model is a multilingual BERT-Base model covering 104 languages, with 12 layers, a hidden size of 768, 12 attention heads, and 110M parameters. To turn the pre-trained BERT-Base model into a text classification model, the input and output layers were fine-tuned, resulting in a new model with 178 million parameters. Using the fine-tuned model, with a maximum word count of 128, a batch size of 16, and 5 epochs, transfer learning is conducted with 10,000 training samples and 5,000 test samples. The result is a binary sentiment classification model for Korean movie reviews with an accuracy of 0.9582, a loss of 0.1177, and an F1 score of 0.81. Performing transfer learning with a dataset five times larger produced a model with an accuracy of 0.9562, a loss of 0.1202, and an F1 score of 0.86.
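The abstract above reports accuracy and F1 for a binary classifier. As a reminder of how these two metrics are conventionally defined, here is a minimal Python sketch. It is not the paper's evaluation code; the function name `binary_metrics` and the sample labels are hypothetical, and label 1 is assumed to mean "positive".

```python
from typing import Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, where 1 is the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    acc, f1 = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
    print(f"accuracy={acc:.4f} f1={f1:.4f}")
```

Because accuracy counts true negatives and F1 does not, the two numbers can diverge considerably on the same predictions.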

Automatic Topic Identification Based on the Ontology for Web Documents (온톨로지 기반의 웹 문서 자동 주제 식별)

  • Choi In-Dae;Nam In-Gil;Bu Ki-Dong
    • Journal of Korea Society of Industrial Information Systems / v.9 no.3 / pp.38-45 / 2004
  • The goal of this research is to develop a method of identifying the topic of a given text by examining the relationships among keywords defined in an ontology hierarchy. The keywords extracted from the important sentences of the text are mapped onto their corresponding concepts in the hierarchy. After all the words are mapped, the corresponding concepts are generalized into a single concept, which is most likely the topic of the text. Our approach aims at both robustness and accuracy by using ontologies together with word frequency, and can therefore be called a hybrid approach that combines knowledge-based and statistical methods. Experimental results show that the proposed method outperforms the existing method that uses a knowledge base only.
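The mapping and generalization steps described above can be sketched in Python as follows. This is an illustrative assumption, not the authors' algorithm: it treats "generalizing into a single concept" as finding the deepest common ancestor in a tree-shaped ontology, and it leaves out the word-frequency component. The toy ontology `PARENT`, the mapping `KEYWORD_TO_CONCEPT`, and all function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy ontology: each concept points to its parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "sport": "entity",
    "ball_game": "sport",
    "soccer": "ball_game",
    "tennis": "ball_game",
    "swimming": "sport",
}

# Hypothetical mapping from extracted keywords to ontology concepts.
KEYWORD_TO_CONCEPT: Dict[str, str] = {
    "goalkeeper": "soccer",
    "racket": "tennis",
    "freestyle": "swimming",
}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by all of its ancestors up to the root."""
    path = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def generalize(concepts: List[str]) -> str:
    """Return the deepest concept that is an ancestor of (or equal to) every input."""
    if not concepts:
        raise ValueError("at least one concept is required")
    common = set(path_to_root(concepts[0]))
    for concept in concepts[1:]:
        common &= set(path_to_root(concept))
    # Common ancestors lie on one chain, so the longest path to the root is the deepest.
    return max(common, key=lambda c: len(path_to_root(c)))


def identify_topic(keywords: List[str]) -> str:
    """Map known keywords to concepts, then generalize them into one topic concept."""
    concepts = [KEYWORD_TO_CONCEPT[k] for k in keywords if k in KEYWORD_TO_CONCEPT]
    return generalize(concepts)


if __name__ == "__main__":
    print(identify_topic(["goalkeeper", "racket"]))
    print(identify_topic(["goalkeeper", "racket", "freestyle"]))
```

With this toy data, keywords from two ball games generalize to `ball_game`, while adding a swimming keyword pushes the topic up to `sport`.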


Text Mining Analysis on the Research Field of the Coastal and Ocean Engineering Based on the SCOPUS Bibliographic Information (해안해양공학 연구 분야의 SCOPUS 서지정보 Text Mining 분석)

  • Lee, Gi Seop;Cho, Hong Yeon;Han, Jae Rim
    • Journal of Korean Society of Coastal and Ocean Engineers / v.30 no.1 / pp.19-28 / 2018
  • Numerous research papers have accumulated owing to the development and computerization of bibliometrics, which has made it difficult to review all the related papers published worldwide when conducting a study. However, with the development of natural language processing techniques, analyzing trends in published research papers has become easier. In this study, text mining analysis using the statistical computing language R was carried out on the bibliographic information of the SCOPUS database in the field of coastal and ocean engineering. As expected, the term 'wave' predominates, and the terms 'numerical model', 'numerical simulation', and 'experimental study' confirmed that numerical analysis and hydraulic experiments are still dominant. In addition, recent use of the term 'wave energy', related to marine energy, was identified. It was also quantitatively confirmed that connections between 'wave' and 'height' or 'energy' are the most frequent, which suggests the possibility of higher-resolution analysis by detailed field and period in the future.
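The term-frequency and term-connection counts mentioned above can be illustrated with a short sketch. The study itself used R; the version below is a Python approximation under simple assumptions (whitespace tokenization, co-occurrence counted once per title), and the sample titles and function names are hypothetical rather than taken from the SCOPUS data.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple

# Hypothetical titles standing in for bibliographic records.
TITLES = [
    "numerical model of wave height",
    "wave energy converter experimental study",
    "wave height prediction",
]


def term_frequencies(docs: Iterable[str]) -> Counter:
    """Count how often each lowercase whitespace-separated term occurs."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def cooccurrences(docs: Iterable[str]) -> Counter:
    """Count, per document, each unordered pair of distinct terms that appear together."""
    pairs: Counter = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        # Sorting makes every pair a consistently ordered tuple, e.g. ("height", "wave").
        pairs.update(combinations(terms, 2))
    return pairs


def pair_count(pairs: Counter, a: str, b: str) -> int:
    """Look up a pair regardless of the order in which the two terms are given."""
    key: Tuple[str, str] = tuple(sorted((a, b)))  # type: ignore[assignment]
    return pairs[key]


if __name__ == "__main__":
    freqs = term_frequencies(TITLES)
    pairs = cooccurrences(TITLES)
    print(freqs.most_common(3))
    print(pair_count(pairs, "wave", "height"), pair_count(pairs, "wave", "energy"))
```

A real analysis would add stop-word removal and multi-word terms such as 'numerical model', which this sketch does not handle.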

Reliable Image-Text Fusion CAPTCHA to Improve User-Friendliness and Efficiency (사용자 편의성과 효율성을 증진하기 위한 신뢰도 높은 이미지-텍스트 융합 CAPTCHA)

  • Moon, Kwang-Ho;Kim, Yoo-Sung
    • The KIPS Transactions:PartC / v.17C no.1 / pp.27-36 / 2010
  • In Web registration pages and online polling applications, CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is used to distinguish human users from automated programs. Text-based CAPTCHAs, which use distorted text, have been widely adopted by many popular Web sites. However, because advanced optical character recognition techniques can recognize the distorted text, their reliability has become low. Image-based CAPTCHAs have been proposed to improve on the reliability of text-based CAPTCHAs, but these systems are also known to have drawbacks. First, image-based CAPTCHA systems with a small number of image files in their image dictionary are not very reliable, since an attacker can learn to recognize the images by repeatedly running machine learning programs. Second, users may feel uncomfortable because they have to retry CAPTCHA tests whenever they fail to enter the correct keyword. Third, some image-based CAPTCHAs incur a high communication cost because they must send several image files for one CAPTCHA. To solve these problems, this paper proposes a new CAPTCHA based on both image and text. In this system, an image and keywords are integrated into one CAPTCHA image to give the user a hint for the answer keyword, which helps users enter the answer keyword easily. The proposed system also reduces communication cost, since it uses only one fused image file per CAPTCHA. To improve the reliability of the image-text fusion CAPTCHA, we also propose a method for dynamically building a large image dictionary by gathering a huge number of images from the Internet, with a filtering phase that preserves the correctness of the CAPTCHA images. Through experiments, we show that the proposed image-text fusion CAPTCHA offers users more convenience and higher reliability than image-based CAPTCHAs.

A Study on the Intelligent Personal Assistant Development Method Base on the Open Source (오픈소스기반의 지능형 개인 도움시스템(IPA) 개발방법 연구)

  • Kim, Kil-hyun;Kim, Young-kil
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.89-92 / 2016
  • Recent services such as Siri recognize and respond to spoken words on smartphones and in web services. Handling such voice input intelligently requires searching big data in the cloud and parsing the given context accurately. This paper proposes a method for developing an intelligent personal assistant based on open source, using ASR (Automatic Speech Recognition), QAS (Question Answering System), and TTS (Text To Speech).


Linguistic and Cognitive Factors that Affect Word Problem Solving (수학 문장제 해결에 영향을 주는 언어적.인지적 요인 -혼합물 문제를 중심으로-)

  • 김선희
    • Journal of Educational Research in Mathematics / v.14 no.3 / pp.267-281 / 2004
  • Many students find word problems very difficult. This study analyzes the linguistic and cognitive factors that affect word problem solving so that we can help students overcome this difficulty. The linguistic aspects comprise a text base, a situation model, and the real world. Students have difficulty in the transition from the text base to the situation model (equation) and make many errors at the situation model. For the cognitive aspects, I investigated problem solving schemes, strategies, and complexity level. Students tend to choose a strategy according to what the teacher taught rather than according to low complexity level, and they confuse the amount of sugar, the amount of sugar water, and the concentration. The results show how complex each type of word problem is to solve, which strategies students mostly choose, and what errors they make during problem solving.
