• Title/Summary/Keyword: Hangeul string


A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments (제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템)

  • Yoon, Tai-Jin;Cho, Hwan-Gue;Chung, Woo-Keun
    • Journal of KIISE:Software and Applications / v.37 no.10 / pp.788-801 / 2010
  • Mobile devices have advanced remarkably, making research on mobile input methods increasingly important. Many input methods exist, such as numeric keypads, QWERTY keypads, touch screens, and speech recognizers, but none is as convenient as a desktop keyboard, so input strings often contain typing errors. Such errors rarely hinder communication between people, but they are a critical problem when searching a database such as a dictionary or an address book, because the correct results cannot be retrieved. The problem is worse for Hangeul than for English: each Hangeul character is a combination of consonants and vowels, giving more than 10,000 distinct characters, so errors are more frequent. The suffix tree is the most widely used data structure for handling query errors, but it does not cope with this variety of errors. In this paper, we propose a fast approximate Korean word search system that tolerates a wide variety of typing errors. The system includes several algorithms that adapt general approximate string searching to Hangeul. We also present a profanity filter built on the proposed system, which filters more than 90% of coined profanities.
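
The idea of matching at the phoneme level can be illustrated with a minimal sketch. It is not the authors' system: it assumes that "phoneme-based" means splitting each precomposed syllable into its consonant and vowel letters (jamo) and then applying an ordinary Levenshtein distance. The function names `to_jamo` and `jamo_distance` are hypothetical.

```python
# Sketch only: jamo-level edit distance, assuming precomposed Hangul
# syllables in the Unicode block U+AC00..U+D7A3.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 medial vowels
JONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
        "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
        "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 28 finals (first = none)


def to_jamo(text):
    """Split each Hangul syllable into jamo; pass other characters through."""
    out = []
    for ch in text:
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 11172:
            out.append(CHO[offset // 588])
            out.append(JUNG[(offset % 588) // 28])
            out.append(JONG[offset % 28])
        else:
            out.append(ch)
    return "".join(out)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, replace)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def jamo_distance(a, b):
    """Edit distance measured in jamo rather than whole syllables."""
    return levenshtein(to_jamo(a), to_jamo(b))
```

For example, `jamo_distance("한글", "한굴")` counts a single wrong vowel as one edit among six jamo, whereas a syllable-level comparison would treat half of the two-character word as wrong.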

Development of EUC-KR based Locale and Application Program Supporting North Korean Collating Sequence (북한 한글 순서를 지원하는 EUC-KR 기반의 로캘과 응용 프로그램 개발)

  • Jung Il-dong;Lee Jung-hwa;Kim Yong-ho;Kim Kyongsok
    • The KIPS Transactions:PartB / v.11B no.7 s.96 / pp.875-884 / 2004
  • UCS (= ISO/IEC 10646, = Unicode) will become widely used as globalization proceeds. If UCS is adopted for official purposes in both Koreas, it resolves the difference between the hangeul codes of South and North Korea. However, UCS does not solve the problem that the same characters are ordered differently. ISO/IEC 14651:2000 (International String Ordering), an international standard for string ordering, defines a framework for sorting character strings that contain multiple national scripts. Because the Common Template Table in ISO/IEC 14651 defines the order of characters, the order can be changed without changing the character sequences in programs. The ordering problem can therefore be solved without unifying the hangeul order of South and North Korea. Functions related to ISO/IEC 14651 are included in the system libraries of Unix-based operating systems such as Linux, Solaris, and FreeBSD. We implement an EUC-KR-based North Korean locale, which includes the North Korean hangeul order, on Linux so that the North Korean locale can be used in South Korea. We also develop a program that sorts strings in both South and North Korean hangeul order.
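
The point that the order lives in a table rather than in the program can be sketched as follows. This is an illustration, not the authors' locale: it reorders only initial consonants, leaves vowels and finals in Unicode order, and the `NORTH_INITIALS` table is an assumption based on the commonly cited North Korean dictionary order that should be checked against an authoritative source. `make_sort_key` is a hypothetical helper.

```python
# Sketch only: table-driven ordering of Hangul syllables by initial consonant.
HANGUL_BASE = 0xAC00
SYLLABLE_COUNT = 11172
PER_INITIAL = 588  # 21 vowels * 28 finals

# Unicode (South Korean) order of the 19 initial consonants.
SOUTH_INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
# ASSUMPTION: commonly cited North Korean order (tense consonants late, ㅇ last).
NORTH_INITIALS = "ㄱㄴㄷㄹㅁㅂㅅㅈㅊㅋㅌㅍㅎㄲㄸㅃㅆㅉㅇ"


def make_sort_key(initial_order):
    """Build a sort key from an ordering table; the sorting code never changes."""
    rank = [initial_order.index(jamo) for jamo in SOUTH_INITIALS]

    def key(word):
        parts = []
        for ch in word:
            offset = ord(ch) - HANGUL_BASE
            if 0 <= offset < SYLLABLE_COUNT:
                # Hangul: rank of the initial, then vowel/final in Unicode order.
                parts.append((1, rank[offset // PER_INITIAL], offset % PER_INITIAL))
            else:
                # Non-Hangul characters sort before Hangul, by code point.
                parts.append((0, ord(ch), 0))
        return parts

    return key


words = ["까", "나", "가", "아"]
south_sorted = sorted(words, key=make_sort_key(SOUTH_INITIALS))
north_sorted = sorted(words, key=make_sort_key(NORTH_INITIALS))
```

Only the table passed to `make_sort_key` differs between the two calls, which mirrors the role the abstract describes for the Common Template Table.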

A solution for the problems of Collating Hangeul in the framework of ISO/IEC 14651 (ISO/IEC 14651 틀에서 한글 간추리기 문제점에 대한 해결 방안)

  • 옥제영;정일동;김경석
    • Proceedings of the Korea Multimedia Society Conference / 2002.11b / pp.457-460 / 2002
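  • The international standard ISO/IEC 14651 (International String Ordering) defines a framework for establishing the order of characters and collating character strings. With ISO/IEC 14651, the character order used for collation can be changed without modifying the program, simply by editing a table called the Common Template Table. However, collating strings in which Hangeul is mixed with other scripts under ISO/IEC 14651 does not produce correct results. To solve this problem, we propose converting each Hangeul syllable into its initial, medial, and final letters before comparison.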
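
The proposed conversion of a syllable into initial, medial, and final letters corresponds closely to Unicode canonical decomposition, so a rough sketch is possible with Python's standard `unicodedata` module. This only illustrates the decomposition step; it is not the paper's collation scheme, and the helper names `decompose` and `jamo_key` are hypothetical.

```python
# Sketch only: decompose Hangul syllables into conjoining jamo before comparing.
import unicodedata


def decompose(text):
    """NFD splits each precomposed syllable into initial, medial, final jamo."""
    return unicodedata.normalize("NFD", text)


def jamo_key(text):
    """Sort key that compares strings jamo by jamo instead of by syllable."""
    return decompose(text)


# '한' (U+D55C) becomes U+1112 (initial), U+1161 (medial), U+11AB (final).
code_points = [hex(ord(ch)) for ch in decompose("한")]
ordered = sorted(["나", "각", "가"], key=jamo_key)
```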

Hangeul detection method based on histogram and character structure in natural image (다양한 배경에서 히스토그램과 한글의 구조적 특징을 이용한 문자 검출 방법)

  • Pyo, Sung-Kook;Park, Young-Soo;Lee, Gang Seung;Lee, Sang-Hun
    • Journal of the Korea Convergence Society / v.10 no.3 / pp.15-22 / 2019
  • In this paper, we propose a Hangeul detection method that uses histograms and the structural features of consonants and vowels to solve the problem of Hangeul consonants and vowels being detected as separate components. The proposed method removes the background with DoG (Difference of Gaussian) to eliminate unnecessary noise in the Hangeul detection process. The background-removed image is then binarized using a cumulative histogram. Next, a horizontal position histogram is used to find the position of each character string, and characters are combined using a vertical histogram of the character image found. However, characters made up of a consonant and a vowel, such as '가', '라', and '귀', are difficult to combine into one character in this way, so they are combined using the structural characteristics of the characters. In the experiments, we tested images of alphabetic text on various backgrounds, images of Korean characters, and images mixing alphabetic text and Hangeul. The detection rate of the proposed method is about 2% lower than that of the K-means and MSER character detection methods, but about 5% higher than that of a character detection method that covers Hangeul.
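
The histogram step can be sketched on a toy binary image. This assumes the "horizontal" and "vertical" histograms are simple projection profiles (counts of foreground pixels per row and per column); the DoG background removal, the cumulative-histogram binarization, and the structural merging rules from the paper are not reproduced. All function names are hypothetical.

```python
# Sketch only: projection profiles on an already binarized image (1 = ink).
def row_projection(img):
    """Number of foreground pixels in each row."""
    return [sum(row) for row in img]


def column_projection(img):
    """Number of foreground pixels in each column."""
    return [sum(col) for col in zip(*img)]


def find_bands(profile, min_count=1):
    """Return inclusive (start, end) index ranges where the profile is non-empty."""
    bands, start = [], None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i
        elif value < min_count and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
]

# Rows with ink form text-line candidates.
lines = find_bands(row_projection(image))
# Within the first line, columns with ink form character-part candidates.
top, bottom = lines[0]
parts = find_bands(column_projection(image[top:bottom + 1]))
```

In this toy image the two column bands in the first line could be the consonant and vowel of one syllable, which is the kind of split the abstract says must be merged using structural rules.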

A Study on Extension Compression Algorithm of Mixed Text by Hangeul-Alphabet

  • Ji, Kang-yoo;Cho, Mi-nam;Hong, Sung-soo;Park, Soo-bong
    • Proceedings of the IEEK Conference / 2002.07a / pp.446-449 / 2002
  • This paper presents an improved data compression algorithm for mixed text files consisting of 2-byte completion-type Hangeul and 1-byte alphabetic characters. The original LZW algorithm compresses alphabetic text files efficiently but compresses 2-byte completion-type Hangeul text files inefficiently. To solve this problem, a data compression algorithm whose compression table uses a 2-byte prefix field and a 2-byte suffix field was developed, but it has another problem: the compression ratio for alphabetic text files decreases. In this paper, we propose an improved LZW algorithm, the Extended LZW (ELZW) algorithm, whose compression table uses a 2-byte prefix field as a table pointer and a 1-byte suffix field as a repeat counter. The prefix field holds a pointer (index) into the compression table, and the suffix field holds a counter of overlapping or recurring text data in the compression table. To increase the compression ratio, after the compression table is constructed, the table data are packed into different bit strings according to whether they represent an alphabetic character, Hangeul, or a pointer. As a result, the proposed ELZW algorithm outperforms the 1-byte LZW algorithm by 7.0125 percent and the 2-byte LZW algorithm by 11.725 percent.
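
For context, the baseline that ELZW modifies can be sketched as a plain byte-oriented LZW. This is the classic algorithm only, not ELZW: the 2-byte prefix pointer, the 1-byte repeat counter, and the bit packing described in the abstract are not implemented, and codes are returned as a list of integers rather than a packed bit stream.

```python
# Sketch only: classic 1-byte LZW with an unbounded dictionary.
def lzw_compress(data):
    """Compress bytes into a list of integer codes."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate          # keep extending the match
        else:
            codes.append(table[current])
            table[candidate] = next_code  # learn the new phrase
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Phrase defined by the step being decoded (the "KwKwK" case).
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


# Mixed text: each Hangeul syllable is 2 bytes in EUC-KR, each letter 1 byte.
mixed = "한글 ABC 한글 ABC 한글 ABC".encode("euc-kr")
restored = lzw_decompress(lzw_compress(mixed))
```

Because this baseline works one byte at a time, every 2-byte Hangeul character needs at least two dictionary steps before it can be matched as a unit, which is the inefficiency the abstract attributes to the original LZW.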

Development on Improved of LZW Compression Algorithm by Mixed Text File for Embedded System (임베디드시스템을 위한 혼용텍스트 파일의 개선된 LZW 압축 알고리즘 구현)

  • Cho, Mi-Nam;Ji, Yoo-Kang
    • The Journal of the Korea Contents Association / v.10 no.12 / pp.70-76 / 2010
  • In this paper, the extended ELZW (EBCDIC Lempel Ziv Welch) algorithm uses a 2-byte prefix field as a table pointer and a 1-byte suffix field as a repeat counter. The prefix field holds a pointer (index) into the compression table, and the suffix field holds a counter of overlapping or recurring text data in the compression table. To increase the compression ratio, after the compression table is constructed, the table data are packed into different bit strings according to whether they represent an alphabetic character, Hangeul, or a pointer. As a result, the proposed ELZW algorithm outperforms the 1-byte LZW algorithm by 5.22 percent and the 2-byte LZW algorithm by 8.96 percent.

Namnyeong-wie, Yun Eui-Seon's Everyday Clothes included in Wedding Gift List in 1837 (남녕위(南寧尉) 윤의선(尹宜善)의 1837년 「혼수발기」 속 부마 편복(便服) 고찰)

  • LEE, Eunjoo
    • Korean Journal of Heritage: History & Science / v.54 no.3 / pp.68-89 / 2021
  • In August 1837, a list of wedding gifts was given by Queen Sunwon (1789-1857) to her son-in-law, Namnyeong-wie, Yun Eui-Seon (1823-1887) at the wedding of Princess Deok-on (1822-1844). This Honsubalgi is now kept at the National Hangeul Museum. This text was used in the present study to examine the everyday clothes of the royal son-in-law in the early 19th century. First, the everyday clothes were organized into about 36 types. They were classified as tops, bottoms, hats, accessories, belts, pouches, fans and shoes. Second, the most important clothes were the ordinary formal attire, composed of the namgwangsa dopo and namgwangcho changui. As for the bottoms, the pants, the Chinese hemp leggings, two pairs of socks, the green silk belt, and a pair of light blue ankle ties were identified. Third, as for the head and accessories, there were heukrip, with the gemstone string and silk string, the jeong-ja-gwan and dong-pa-gwan, as well as tang-geon and bok-geon. And there were the sangtu-gwan, three types of donggos, and the mang-geon equipped with okgwanja. On the other hand, the jeong-ja-gwan and dong-pa-gwan are peculiar hats whose status has changed over time since the mid-18th century. The fact that the jeong-ja-gwan and dong-pa-gwan were given to Namnyeong-wie showed that the status of these hats improved in the early reign of King Heonjong. The belt was given with the sejodae that is suitable for the dangsang, the coral plates, and the silk bag containing a flint pouch. Fourth, there were the red-colored sejodae, a ssamji silk pouch for flint and the fan decorated with okseonchu, and shoes, such as unhye and danghye.