Search | Korea Science

Hangul Component Decomposition in Outline Fonts (한글 외곽선 폰트의 자소 분할)

Koo, Sang-Ok;Jung, Soon-Ki
- Journal of the Korea Computer Graphics Society / v.17 no.4 / pp.11-21 / 2011
This paper proposes a method for decomposing a Hangul glyph in an outline font into its initial, medial, and final components using statistical-structural information. Within a font family, the positions of components are statistically consistent, and the stroke relationships of a Hangul character reflect its structure. First, we create component histograms that accumulate the shapes and positions of identical components. Second, we form pixel clusters from the character image based on pixel direction probabilities and extract candidate strokes using the position, direction, and size of the clusters and the adjacencies between them. Finally, we find the best structural match between the candidate strokes and a predefined character model by relaxation labeling. The proposed method can be used to study the formative characteristics of Hangul fonts and to build font classification and retrieval systems.

CKFont2: An Improved Few-Shot Hangul Font Generation Model Based on Hangul Composability (CKFont2: 한글 구성요소를 이용한 개선된 퓨샷 한글 폰트 생성 모델)

Jangkyoung, Park;Ammar, Ul Hassan;Jaeyoung, Choi
- KIPS Transactions on Software and Data Engineering / v.11 no.12 / pp.499-508 / 2022
Much research has been carried out on Hangul generation models using deep learning, and recent work focuses on minimizing the number of input characters needed to generate a full set of Hangul (few-shot learning). In this paper, we propose CKFont2, a model that uses only 14 letters, by analyzing and improving the CKFont model (hereafter CKFont1), which uses 28 letters. CKFont2 improves on CKFont1 by generating all Hangul from only 14 characters that contain 24 components (14 consonants and 10 vowels), whereas CKFont1 generates all Hangul by extracting 51 Hangul components from 28 characters. This is the smallest number of input characters among currently known models. From the basic consonants and vowels, 27 further components (5 double consonants, 11 compound consonants, and 11 compound vowels) are learned and generated by deep learning, and these 27 generated components are combined with the 24 basic consonants and vowels. All Hangul characters are then generated automatically from the combined 51 components. The superiority of the model's performance was verified by comparative analysis against the results of the zi2zi, CKFont1, and MX-Font models. It is an efficient and effective model with a simple structure that saves time and resources, and it can be extended to Chinese, Thai, and Japanese.
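The component counts in this abstract (24 basic + 27 derived = 51) can be checked against the Unicode Hangul Compatibility Jamo block. The sketch below is an illustration only and is not taken from the CKFont2 paper; it assumes the 51 components correspond to the 51 letters U+3131..U+3163, and it classifies them by their Unicode character names.

```python
import unicodedata

def jamo_name(ch):
    # "HANGUL LETTER SSANGKIYEOK" -> "SSANGKIYEOK"
    return unicodedata.name(ch).replace("HANGUL LETTER ", "")

# Hangul Compatibility Jamo: 30 consonant letters, then 21 vowel letters.
consonants = [chr(c) for c in range(0x3131, 0x314F)]
vowels = [chr(c) for c in range(0x314F, 0x3164)]

# Double consonants are named SSANG...; clusters are named like KIYEOK-SIOS.
double_consonants = [c for c in consonants if jamo_name(c).startswith("SSANG")]
compound_consonants = [c for c in consonants if "-" in jamo_name(c)]
basic_consonants = [c for c in consonants
                    if c not in double_consonants and c not in compound_consonants]

# The ten basic vowels, identified by their Unicode names.
BASIC_VOWEL_NAMES = {"A", "YA", "EO", "YEO", "O", "YO", "U", "YU", "EU", "I"}
basic_vowels = [v for v in vowels if jamo_name(v) in BASIC_VOWEL_NAMES]
compound_vowels = [v for v in vowels if v not in basic_vowels]

basic_total = len(basic_consonants) + len(basic_vowels)
derived_total = (len(double_consonants) + len(compound_consonants)
                 + len(compound_vowels))
print(basic_total, derived_total, basic_total + derived_total)
```

Under that assumption the three printed totals line up with the 24, 27, and 51 quoted in the abstract.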
https://doi.org/10.3745/KTSDE.2022.11.12.499

Assembling Disjoint Korean Syllables Using Two-Step Rules (2단계 규칙을 이용한 해체된 한글 음절의 결합)

Lee, Joo-Ho;Kim, Hark-Soo
- Korean Journal of Cognitive Science / v.19 no.3 / pp.283-295 / 2008
With the increasing use of messengers and SMS, many young people habitually write in a new style that uses intentionally disjoint Korean syllables. To develop a natural language interface for such environments, we first need a technique that converts a sequence of disjoint Korean syllables into a correct sentence. We therefore propose a method that assembles a sequence of disjoint Korean syllables into a correct sentence using two-step rules. In the first step, the method assembles CVC (consonant-vowel-consonant) forms of simple-disjoint Korean syllables using manual heuristic rules. In the second step, it assembles CCVCC forms of double-disjoint Korean syllables using a mapping table and a transformation-based learning technique. In the experiment, the method achieved a precision of 100% in assembling simple-disjoint Korean syllables and 99.98% in assembling double-disjoint Korean syllables.
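As a rough illustration of the first (CVC) step, the sketch below assembles a flat sequence of compatibility jamo into precomposed syllables using the standard Unicode composition formula. The greedy rule in `assemble` is an assumption made for this example, not the heuristic rules of the paper, and it does not handle the double-disjoint (CCVCC) case.

```python
# Jamo tables built from the Hangul Compatibility Jamo block (U+3131..U+3163).
CLUSTERS = {0x3133, 0x3135, 0x3136, *range(0x313A, 0x3141), 0x3144}
CHO = [chr(c) for c in range(0x3131, 0x314F) if c not in CLUSTERS]  # 19 initials
JUNG = [chr(c) for c in range(0x314F, 0x3164)]                      # 21 medials
JONG = [""] + [chr(c) for c in range(0x3131, 0x314F)
               if c not in (0x3138, 0x3143, 0x3149)]                # none + 27 finals

def compose(cho, jung, jong=""):
    """Unicode formula: 0xAC00 + (initial * 21 + medial) * 28 + final."""
    index = (CHO.index(cho) * 21 + JUNG.index(jung)) * 28 + JONG.index(jong)
    return chr(0xAC00 + index)

def assemble(jamos):
    """Greedy CV(C) assembly (illustrative rule, not the paper's)."""
    out, i, n = [], 0, len(jamos)
    while i < n:
        if i + 1 < n and jamos[i] in CHO and jamos[i + 1] in JUNG:
            cho, jung, jong = jamos[i], jamos[i + 1], ""
            i += 2
            # Take a final consonant only if it does not start the next syllable.
            next_is_onset = i + 1 < n and jamos[i + 1] in JUNG
            if i < n and jamos[i] in JONG and not next_is_onset:
                jong = jamos[i]
                i += 1
            out.append(compose(cho, jung, jong))
        else:
            out.append(jamos[i])  # leave anything else untouched
            i += 1
    return "".join(out)

print(assemble("ㅎㅏㄴㄱㅡㄹ"))
```

A consonant that sits between two vowels is treated as the onset of the following syllable, which is the usual tie-break for CVC input; real chat text needs the additional rules and learned transformations the abstract describes.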

Enhancing Korean Alphabet Unit Speech Recognition with Neural Network-Based Alphabet Merging Methodology (한국어 자모단위 음성인식 결과 후보정을 위한 신경망 기반 자모 병합 방법론)

Solee Im;Wonjun Lee;Gary Geunbae Lee;Yunsu Kim
- Annual Conference on Human and Language Technology / 2023.10a / pp.659-663 / 2023
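To improve Korean speech recognition performance, this paper restructures the conventional recognition process into two stages: a jamo-unit (alphabet-unit) speech recognition model and a neural network-based jamo merging model. Because Korean is written by combining letters, about 2,900 syllable units are needed for speech recognition. This degrades recognition performance on syllables that appear rarely in the training dataset and raises training cost. To address this, recognition can be performed over 51 jamo units (ㄱ-ㅎ, ㅏ-ㅞ) instead of syllables, and the jamo-unit output can then be merged into syllable-unit Hangul [1]. Given the initial, medial, and final positions, jamo-unit output can be merged by rules. However, if the recognition output contains misrecognized jamo, the final merged result will contain errors. To solve this, we present a neural network-based jamo merging model. The model converts separated jamo-unit input into complete Hangul sentences and, in the process, can correct misrecognized jamo in the recognition output so that the resulting sentence is correct. We ran experiments on the Korean speech recognition corpus KsponSpeech, using Wav2Vec2.0 as the speech recognition model. Compared with the existing rule-based jamo merging method, the proposed model showed a relative improvement of 17.2% in syllable-level error rate (Character Error Rate, CER) and 13.1% in word error rate (Word Error Rate, WER).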
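For readers unfamiliar with the metrics, here is a minimal sketch of how CER, WER, and a relative improvement are commonly computed from an edit distance. It assumes that "relative improvement" means the error reduction divided by the baseline error; the abstract does not spell out the exact scoring script, so the function names and conventions here are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character (here: syllable) error rate against a non-empty reference."""
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)

def relative_improvement(baseline, proposed):
    """Error reduction as a fraction of the baseline error (assumed definition)."""
    return (baseline - proposed) / baseline

print(cer("한국어 음성", "한국어 음정"))
```

Whether spaces count as characters in CER varies between evaluation setups; this sketch counts them.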

An Analysis on the Examples of the Analytico-Synthetic Classification Techniques Applied to Practical Life (실생활에 적용된 분석합성식 분류기법의 사례에 관한 심층적 분석)

Oh, Dong-Geun
- Journal of Korean Library and Information Science Society / v.42 no.2 / pp.151-170 / 2011
This article analyzes examples of analytico-synthetic classification techniques found in practical life. For this purpose, it selected and investigated the combination rules of Hangeul, the numbering systems of the city buses of Daegu and Seoul, member information at wedding consulting agencies, information from real estate agents and court auctions, and postal codes and direct distance dialing (DDD) numbers in Korea. It suggests that applying analytico-synthetic classification techniques can improve these systems, especially with regard to their notational systems.

The Internal Structure of Korean Syllable and Kulca (글자와 음절의 내부구조)

Yi, Kwang-Oh
- Annual Conference on Human and Language Technology / 1995.10a / pp.228-232 / 1995
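This paper reviews linguistic and psychological discussions of the internal structure of syllables and kulca (written characters). Studies of English syllable structure support an onset/rime structure. Recent research on the internal structure of Korean syllables and kulca, however, has obtained results that differ from those for English. According to this research, which used a jamo (letter) substitution task, substitution times for final-consonant letters were shorter than those for initial-consonant letters regardless of kulca type. These results support an initial-medial letters / final letter structure as the internal structure of the kulca corresponding to a syllable. Based on the results of previous studies, the paper discusses the homology between phonological and orthographic units and the possibility of language-specific syllable structure.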

A Study on the Pre-Classification of Handwritten Hangeul Characters Using Partial Separation and Recognition of Initial Consonants (초성자소분리 인식에 의한 필기 한글문자의 대분류에 관한 연구)

안석출;김명기
- Journal of the Korean Graphic Arts Communication Society / v.6 no.1 / pp.41-57 / 1988
Recently, the development of OCR (Optical Character Reader) has been required along with the progress of information processing systems for Hangeul. Characters have to be recognized clearly for OCR to be applicable. The structure analysis method and the lump method are used for character recognition, and with them OCR is now available for printed characters and for handwritten alphanumeric characters with simple structure. However, much more study is known to be needed on OCR for handwritten Hangeul. This paper proposes a new method for handwritten Hangeul character recognition. The initial consonant units of Hangeul are separated and then recognized using the position information of Hangeul's units from the normalized patterns using regression line theory. The method extracts the block that lies in the virtual initial consonant region of the normalized input patterns and calculates the maximum likelihood value ($\beta$) by comparing the features of the separated subpattern with the initial consonant dictionary.

A Study on Improvement of Retrieval Algorithm for Audio Response Service (음성정보 서비스의 검색 알고리즘 개선 연구)

Jeong, Yoo-Hyeon;Kim, Soon-Hyop
- The Journal of the Acoustical Society of Korea / v.16 no.5 / pp.92-95 / 1997
Telephone pushbuttons consist only of the digits 0-9, #, and *, so it is difficult for users to enter the various query commands needed for information retrieval in an audio response service. We suggest a new retrieval algorithm for audio response services that uses sequences of Korean initial sounds. Users who do not know the retrieval code can reach the service they want by pushing the telephone digit buttons that correspond to the initial sounds of its name.
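The idea of keying a service by the initial sounds of its name can be sketched as follows. Extracting the initial consonant (choseong) from a precomposed syllable is standard Unicode arithmetic, but the digit layout in `to_digits` is purely hypothetical: the abstract does not say which consonants share which button, so two consecutive consonants per digit are assumed here only to make the example run.

```python
# Initial consonants in Unicode order, taken from the compatibility jamo block.
CLUSTERS = {0x3133, 0x3135, 0x3136, *range(0x313A, 0x3141), 0x3144}
CHO = [chr(c) for c in range(0x3131, 0x314F) if c not in CLUSTERS]  # 19 initials

def initial_sounds(name):
    """Return the initial consonant of every precomposed Hangul syllable."""
    sounds = []
    for ch in name:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            sounds.append(CHO[code // 588])  # 588 = 21 medials * 28 finals
    return "".join(sounds)

def to_digits(sounds):
    """HYPOTHETICAL keypad layout: two consecutive initials per digit 0-9."""
    return "".join(str(CHO.index(s) // 2) for s in sounds)

def build_index(names):
    """Map each digit sequence to the service names that produce it."""
    index = {}
    for name in names:
        index.setdefault(to_digits(initial_sounds(name)), []).append(name)
    return index

services = build_index(["날씨", "뉴스", "증권"])
print(services)
```

Because several consonants share a digit, different names can collide on the same key sequence; a real service would need a follow-up prompt to disambiguate.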

Entropy and Average Mutual Information for a 'Choseong', a 'Jungseong', and a 'Jongseong' of a Korean Syllable (한글 음절의 초성, 중성, 종성 단위의 발생확률, 엔트로피 및 평균상호정보량)

이재홍;오상현
- Journal of the Korean Institute of Telematics and Electronics / v.26 no.9 / pp.1299-1307 / 1989
A Korean syllable is regarded as a random variable according to its probabilistic property of occurrence. A Korean syllable is divided into a 'choseong', a 'jungseong', and a 'jongseong', each of which is regarded as a random variable. From the cumulative frequency of Korean syllables, all possible joint probabilities and conditional probabilities are computed for the three random variables. From the joint and conditional probabilities, all possible joint entropies and conditional entropies are computed for the three random variables. All possible average mutual informations are also calculated for the three random variables. The average mutual information between two random variables has its largest value between choseong and jungseong. The average mutual information between one random variable and the other two has its largest value between jungseong and choseong-jongseong.
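The quantities named in this abstract can be estimated from syllable counts with a few lines of code. The sketch below is an illustration under simple assumptions (maximum-likelihood probabilities from raw counts, base-2 logarithms, and the identity I(X;Y) = H(X) + H(Y) - H(X,Y)); it is not the authors' procedure, and the sample text is a placeholder rather than their corpus.

```python
import math
from collections import Counter

def split_syllable(ch):
    """Return (choseong, jungseong, jongseong) indices, or None if not Hangul."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 11172:
        return None
    return code // 588, (code % 588) // 28, code % 28  # jongseong 0 = none

def entropy(counts):
    """Shannon entropy in bits of the empirical distribution in a Counter."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def mutual_information(pairs):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a list of (x, y) observations."""
    joint = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    return entropy(x_counts) + entropy(y_counts) - entropy(joint)

text = "한글 음절의 초성 중성 종성"  # placeholder sample, not a real corpus
triples = [t for t in map(split_syllable, text) if t is not None]

cho_jung = [(c, v) for c, v, _ in triples]
jung_vs_rest = [(v, (c, f)) for c, v, f in triples]
print(mutual_information(cho_jung), mutual_information(jung_vs_rest))
```

Pairing one variable with a tuple of the other two, as in `jung_vs_rest`, gives the "one variable versus the other two" quantity mentioned in the last sentence of the abstract.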

Korean Word Search App Using Meta-characters (메타문자를 사용한 한국어 사전 탐색 앱)

Kwon, Hong-Seok;Kim, Jae-Hoon
- Annual Conference on Human and Language Technology / 2011.10a / pp.110-113 / 2011
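As smartphones have become widespread, a wide variety of apps are in use, but few of them support efficient dictionary search. Currently available Korean dictionary search apps take a complete word or a substring of a word as the query. As a result, users who cannot remember the complete word cannot search, and the various kinds of phonological information needed for Korean information processing are hard to look up. To address these problems, this paper develops an app that searches for words efficiently using meta-characters. The meta-characters used are '*' and '?', which represent arbitrary syllables, and ':', which represents a final consonant (jongseong); the dictionary is structured as a trie over graphemes (jaso). Queries may consist of graphemes (initial, medial, and final) as well as syllables. Moreover, queries that mix syllables and graphemes are also supported, which greatly improves user convenience.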
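A grapheme-level trie of the kind mentioned here can be sketched as below. This is a minimal illustration, not the app's implementation: the class and function names are made up for the example, only prefix queries are supported, and the '*', '?', and ':' meta-characters are deliberately left out because the abstract does not define their exact matching semantics.

```python
# Jamo tables built from the Hangul Compatibility Jamo block (U+3131..U+3163).
CLUSTERS = {0x3133, 0x3135, 0x3136, *range(0x313A, 0x3141), 0x3144}
CHO = [chr(c) for c in range(0x3131, 0x314F) if c not in CLUSTERS]
JUNG = [chr(c) for c in range(0x314F, 0x3164)]
JONG = [""] + [chr(c) for c in range(0x3131, 0x314F)
               if c not in (0x3138, 0x3143, 0x3149)]

def to_jamo(text):
    """Decompose syllables into jamo; jamo and other characters pass through."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            out.append(CHO[code // 588])
            out.append(JUNG[(code % 588) // 28])
            out.append(JONG[code % 28])  # empty string when there is no final
        else:
            out.append(ch)
    return "".join(out)

END = object()  # sentinel key marking the end of a stored word

class JamoTrie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for jamo in to_jamo(word):
            node = node.setdefault(jamo, {})
        node[END] = word

    def search_prefix(self, query):
        """Return stored words whose jamo sequence starts with the query's."""
        node = self.root
        for jamo in to_jamo(query):
            if jamo not in node:
                return []
            node = node[jamo]
        results, stack = [], [node]
        while stack:
            current = stack.pop()
            for key, value in current.items():
                if key is END:
                    results.append(value)
                else:
                    stack.append(value)
        return sorted(results)

trie = JamoTrie()
for word in ["한글", "한국", "하늘"]:
    trie.insert(word)
print(trie.search_prefix("한ㄱ"))  # mixed syllable + jamo query
```

Because matching happens on the jamo sequence, a query of a single closed syllable also reaches words where its final consonant is the onset of the next syllable; an app that wants strict syllable boundaries would need an explicit boundary marker in the trie keys.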

