• Title/Summary/Keyword: Hangul text

Hangeul Stem Extraction Algorithm for Text Mining Based on Natural Language Processing (자연어 처리 기반 텍스트 마이닝을 위한 한글 어간 추출 알고리즘)

  • Choi, Ki-won;Choi, Seong-hun;Jo, Sang-hyeon;Kim, Hee-cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2017.05a
    • /
    • pp.718-721
    • /
    • 2017
  • Natural language processing, the basis of text mining, differs by language. Hangul in particular allows relatively high freedom of expression compared with other languages, so a word takes many forms depending on its ending. The part that does not change across these forms is called the stem. Effective text mining requires extracting words and unifying their various forms. This paper therefore proposes a stem extraction algorithm for Hangul words for effective text mining of Hangul documents.

Hangul-Oullim-Meotjit (한글-어울림-멋짓)

  • Ahn, Sang-Soo
    • Archives of design research
    • /
    • v.20 no.3 s.71
    • /
    • pp.335-344
    • /
    • 2007
  • Hunminjeongeum is the book of Hangul. Its contents are all about the philosophy and concept of Hangul design. It is a design text of worldwide value: a book of design theory, typographic theory, and design philosophy. The Korean word for 'design' is Meotjit. Design is 'doing or making with Meot' in the material, the non-material, and even in thinking. Visual communication design is 'Bom-Meotjit'; fashion design is 'Ot-Meotjit'. The substance of Meot is Oullim, the great harmony. The state of Meot is the identity of the Korean design spirit.

Query Extension of Retrieve System Using Hangul Word Embedding and Apriori (한글 워드임베딩과 아프리오리를 이용한 검색 시스템의 질의어 확장)

  • Shin, Dong-Ha;Kim, Chang-Bok
    • Journal of Advanced Navigation Technology
    • /
    • v.20 no.6
    • /
    • pp.617-624
    • /
    • 2016
  • Hangul word embedding must be preceded by a noun extraction step. Otherwise, unnecessary words are trained and efficient embedding results cannot be obtained. In this paper, we propose a model that retrieves answers more efficiently through query expansion using Hangul word embedding, Apriori, and text mining. The word embedding and Apriori step expands the query by extracting words associated with it by meaning and context. The Hangul text mining step extracts a similar answer and returns it to the user using noun extraction, TF-IDF, and cosine similarity. The proposed model can improve answer accuracy by learning the answers of a specific domain and expanding the query with highly correlated terms. Future research should extract more correlated query terms by analyzing the user queries stored in the database.
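
The retrieval step named in this abstract (TF-IDF weighting followed by cosine similarity) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes nouns have already been extracted into token lists, the smoothed IDF formula is one common choice rather than the one the paper uses, and the function names and sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse {term: weight} TF-IDF vector per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; the abstract does not give a formula).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_answer(query_tokens, answers):
    """Return the index of the most similar answer and all similarity scores."""
    vectors, idf = tf_idf_vectors(answers)
    q_tf = Counter(query_tokens)
    # Query terms that never occur in the answers carry no weight.
    q_vec = {t: (c / len(query_tokens)) * idf[t]
             for t, c in q_tf.items() if t in idf}
    scores = [cosine(q_vec, v) for v in vectors]
    return max(range(len(answers)), key=scores.__getitem__), scores


# Hypothetical noun lists standing in for stored answers and a user query.
answers = [["항로", "변경", "신청"], ["선박", "위치", "조회"], ["항구", "입항", "신청"]]
index, scores = best_answer(["선박", "위치"], answers)
```

In the model the abstract describes, the query tokens would first be expanded with associated words from the word embedding and Apriori step before being scored this way.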

A Study on Word Learning and Error Type for Character Correction in Hangul Character Recognition (한글 문자 인식에서의 오인식 문자 교정을 위한 단어 학습과 오류 형태에 관한 연구)

  • Lee, Byeong-Hui;Kim, Tae-Gyun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.5
    • /
    • pp.1273-1280
    • /
    • 1996
  • To achieve high accuracy in a text recognition system, the recognized text must pass through a post-processing stage that uses contextual information. We present a system that combines multiple knowledge sources to post-process the output of an optical character recognition (OCR) system. The knowledge sources include word characteristics, the types of wrongly recognized Hangul characters, and Hangul word learning. In this paper, the characters wrongly recognized by OCR systems are collected and analyzed. We input a Korean dictionary of approximately 150,000 words and Korean language texts from Korean elementary, middle, and high schools. We found that only 10.7% of the words in the Korean language texts of Korean elementary, middle, and high schools were used in the Korean dictionary. We also classified the error types of Korean character recognition in OCR systems. For Hangul word learning, we used the indexes of the texts. With these multiple knowledge sources, we could predict the proper word from a large set of candidate words.

Implementation of Hangul to TeX conversion software (아래아 한글 파일의 텍 파일로의 변환 소프트웨어 구현)

  • Kim, Sung-Won;Lee, Han-Na;Park, Sang-Hoon;Oh, Chang-Hyuck
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.1
    • /
    • pp.99-107
    • /
    • 2010
  • This research implements software that converts Hangul format files to TeX format files. Hangul is a word processor that has been widely used in Korea. It is known to make typing equations and tables relatively easy when preparing a paper draft. TeX was developed as a computer programming language for preparing and publishing documents. Documents are first typed in a plain text editor with TeX commands and are then compiled and linked. The software implemented in this research converts Hangul format files written in the specific format of a journal to TeX format files with the given style file. It converts special symbols, text, tables, equations, and paragraph formats. For the beta test of the software, we used the Hangul format of the Journal of the Korean Data & Information Science Society (JKDISS) and the TeX style file.

Design of Regional Function Message of AIS for Hangul Text messaging (한글 텍스트 메시징을 위한 AIS 지역 기반 메시지 설계)

  • Yu, Dong-Hui
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.14 no.2
    • /
    • pp.77-81
    • /
    • 2013
  • The international standard AIS, which serves the safety of ship navigation and vessel traffic management, provides 27 messages for exchanging ships' navigational information. Among these 27 messages, message IDs 6 and 8 are defined as binary data formats for exchanging application-specific information and are classified into IFM for international use and RFM for national or regional use. Since the international standards are based on English, there has been a need to exchange data in Hangul text for vessel traffic management in order to correct ships' static and dynamic information. In this paper, I analyze the international standards for providing a Hangul text messaging service based on RFM and propose an RFM message and a simple protocol for correcting a ship's information.

PHDCM : Efficient Compression of Hangul Text in Parallel (PHDCM : 병렬 컴퓨터에서 한글 텍스트의 효율적인 축약)

  • Min, Yong-Sik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.2E
    • /
    • pp.50-56
    • /
    • 1995
  • This paper describes an efficient coding method for Korean characters using a three-state transition graph. To our knowledge, this is the first achievement of its kind. The new method, called the Parallel Hangul Dynamic Coding Method (PHDCM), compresses a Korean character to about 3.5 bits, which is more than 1 bit shorter than the conventional codes introduced thus far for extensive code compression. When we ran the method on a MasPar machine, which is an SIMD SM (EFEW-PRAM) machine, it achieved a 49.314-fold speedup with 64 processors on 10 million Korean characters.

A Study on Effective Processing of Hangul for JBIG2 Coding (JBIG2 부호화에서의 한글의 효율적 처리에 관한 연구)

  • 강병택;김현민;고형화
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.6B
    • /
    • pp.1050-1059
    • /
    • 2000
  • In this paper, we propose a method for improving the JBIG2 compression ratio that can be applied to Hangul text. A Hangul character is composed of a few symbols called JASO, which inevitably increases the position information to be transmitted. To reduce this disadvantage, we propose an algorithm that generates aggregated symbols by combining JASO symbols. The proposed algorithm performs better with Huffman coding than with arithmetic coding. In lossless coding, it showed a 4.5 to 16.7% improvement for Huffman coding and a 2.9 to 10.4% improvement for arithmetic coding. In lossy coding, it showed a 3.7 to 17.0% improvement for Huffman coding and a 2.1 to 10.5% improvement for arithmetic coding.
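
To make the JASO composition mentioned in this abstract concrete, the sketch below splits a precomposed Hangul syllable into its initial, medial, and final jaso using the arithmetic layout of the Unicode Hangul Syllables block. It is only an illustration at the code-point level: the paper operates on bitmap symbols in scanned images, and this is not its aggregation algorithm. The function name `decompose` is hypothetical.

```python
# Jaso tables in Unicode order for the Hangul Syllables block (U+AC00..U+D7A3).
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 medial vowels
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
             "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
             "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]   # 27 finals plus "none"

SYLLABLE_BASE = 0xAC00
SYLLABLE_LAST = 0xD7A3


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final).

    Returns None for anything that is not a precomposed Hangul syllable.
    The final is the empty string when the syllable has no final consonant.
    """
    if len(syllable) != 1:
        return None
    code = ord(syllable)
    if not SYLLABLE_BASE <= code <= SYLLABLE_LAST:
        return None
    offset = code - SYLLABLE_BASE
    # Each initial covers 21 * 28 syllables; each medial covers 28.
    initial, rest = divmod(offset, len(JUNGSEONG) * len(JONGSEONG))
    medial, final = divmod(rest, len(JONGSEONG))
    return CHOSEONG[initial], JUNGSEONG[medial], JONGSEONG[final]


parts = [decompose(ch) for ch in "한글"]
```

Because every syllable is a combination of two or three such parts, a coder that treats each jaso as a separate symbol has to send a position for each part, which is the overhead the abstract says aggregated symbols are meant to reduce.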

Study on Methods of Digitalization of Older Books Using PDF (PDF를 활용한 고문헌의 원문디지털화 방안에 대한 고찰)

  • Lee, Sang-Yong
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.34 no.1
    • /
    • pp.133-153
    • /
    • 2000
  • This article studies methods of digitizing older books using PDF (Portable Document Format) as supported by Acrobat 4.0, which was introduced in April 1999. Acrobat 3.0 caused many problems in supporting the Korean language, or Hangul. However, the revised version 4.0 made conversion of Korean, Japanese, and Chinese possible through its support for multi-language fonts. It is therefore possible to convert and edit the text files of older books written in Hangul. Acrobat Reader, the PDF viewer, can be downloaded for free from its website. The text of older books digitized as PDF still has some problems, but users can easily retrieve the text of older books from the Internet.