• Title/Summary/Keyword: Hangul Input method


Document Classification using Recurrent Neural Network with Word Sense and Contexts (단어의 의미와 문맥을 고려한 순환신경망 기반의 문서 분류)

  • Joo, Jong-Min;Kim, Nam-Hun;Yang, Hyung-Jeong;Park, Hyuck-Ro
    • KIPS Transactions on Software and Data Engineering / v.7 no.7 / pp.259-266 / 2018
  • In this paper, we propose a method for classifying documents with a recurrent neural network (RNN), using features that capture word sense and context. Word2vec is used to represent each word in a document as a vector that reflects the order and meaning of the words. Doc2vec is applied to extract document features that take context into account. An RNN classifier, which feeds the output of the previous node into the next node as input, is used for document classification. Among neural network classifiers, the RNN is well suited to sequence data and therefore performs well on document classification. We apply the GRU (Gated Recurrent Unit) model, which addresses the vanishing gradient problem of the RNN and also reduces computation time. In experiments on one Hangul document set and two English document sets, the GRU-based document classifier improves performance by about 3.5% over a CNN-based document classifier.
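The gating that lets a GRU carry information across a word sequence can be illustrated with a minimal sketch of a single GRU step in plain Python. This is not the authors' implementation: the parameter names (`Wz`, `Uz`, `bz`, and so on) are hypothetical, and the update convention used here (`h = (1 - z) * h_prev + z * h_cand`) is one of two common ones.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def gru_step(x, h_prev, params):
    """One GRU update. `params` holds hypothetical weights W*, U* and biases b*."""
    def gate(name, h, activation):
        wx = matvec(params["W" + name], x)
        uh = matvec(params["U" + name], h)
        return [activation(a + b + c) for a, b, c in zip(wx, uh, params["b" + name])]

    z = gate("z", h_prev, sigmoid)                    # update gate
    r = gate("r", h_prev, sigmoid)                    # reset gate
    h_reset = [ri * hi for ri, hi in zip(r, h_prev)]  # reset applied to old state
    h_cand = gate("h", h_reset, math.tanh)            # candidate state
    # Blend the old state and the candidate, element by element.
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]

def encode(sequence, params, hidden_size):
    """Run the GRU over a sequence of word vectors and return the final state."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(x, h, params)
    return h
```

In a classifier of the kind described, the final state returned by `encode` would be passed to an output layer; the word vectors themselves would come from an embedding model such as Word2vec.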

Animation Generation for Chinese Character Learning on Mobile Devices (모바일 한자 학습 애니메이션 생성)

  • Koo, Sang-Ok;Jang, Hyun-Gyu;Jung, Soon-Ki
    • Journal of KIISE:Computer Systems and Theory / v.33 no.12 / pp.894-906 / 2006
  • Developing mobile content is difficult because of the many constraints of mobile environments. Good mobile content cannot be made simply by visually scaling down existing content from the wired Internet. It is therefore essential to devise a suitable data representation and to develop an authoring tool that meets the needs of the mobile content market. We propose compact mobile content for learning Chinese characters and have developed an authoring tool for it. The animation our system produces looks realistic, as if someone were writing the characters with a pen or brush. Moreover, the authoring tool lets a user generate a Chinese character animation easily and quickly, even with little knowledge of computer graphics, mobile programming, or Chinese characters. The stroke animation is generated as follows. First, we take the basic character shape, represented as several contours, from a TTF (TrueType Font) and obtain the stroke segmentation and stroke order from simple user input. Then we decompose the whole character shape into strokes using a polygonal approximation technique. Next, the animation for each stroke is generated automatically by a scan-line algorithm, with the scan lines ordered along the stroke direction. Finally, the ordered scan lines are compressed into a few integers by reducing coordinate redundancy. As a result, the stroke animation of our system is even smaller than a GIF animation. Our method can be extended to the rendering and animation of Hangul or of general 2D shapes based on vector graphics. We plan to find a way to automate stroke segmentation and ordering without user input.
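The abstract does not say how coordinate redundancy is removed. One plausible reading, shown here purely as an assumption, is delta encoding: consecutive scan lines of a stroke differ only slightly, so storing differences yields small integers. The `(y, x_start, x_end)` layout and both function names are hypothetical.

```python
def encode_scanlines(lines):
    """Delta-encode ordered scan lines given as (y, x_start, x_end) tuples.

    The first triple is stored as a difference from (0, 0, 0), so it keeps
    its absolute values; later triples store the change from the previous one.
    """
    encoded = []
    prev = (0, 0, 0)
    for cur in lines:
        encoded.extend(c - p for c, p in zip(cur, prev))
        prev = cur
    return encoded

def decode_scanlines(data):
    """Rebuild the scan lines, in drawing order, from the flat delta list."""
    lines = []
    prev = (0, 0, 0)
    for i in range(0, len(data), 3):
        cur = tuple(p + d for p, d in zip(prev, data[i:i + 3]))
        lines.append(cur)
        prev = cur
    return lines
```

Replaying the decoded lines one at a time, in order, would reproduce the stroke being drawn along its direction.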

A study on Mapping the Unicode based Hangul-Hanja for prescription names in Korean Medicine (처방명 연계를 위한 유니코드 한자 기반의 한글-한자 매핑정보 구축에 관한 연구)

  • Jeon, Byoung-Uk;Kim, An-Na;Kim, Ji-Young;Oh, Yong-Taek;Kim, Chul;Song, Mi-Young;Jang, Hyun-Chul
    • Korean Journal of Oriental Medicine / v.18 no.3 / pp.133-139 / 2012
  • Objective: UMLS is an ontology that establishes a database of medical terminology by gathering various medical vocabularies that represent the same fundamental concepts. Method: Although Chinese characters are represented in the Chinese-character part of the Korean Unicode system on a computer, the way they are written varies with the Chinese input system and the writer's level of knowledge. As a result, the representation of Chinese writing on a computer can differ considerably from that in an old Chinese document. A meaningful relationship between digital Chinese terminology and its Korean translation is therefore necessary in order to build an ontology of Chinese medical terms from Oriental medical prescriptions in a computer system. Result: This research presents 1:1 mapping information for the Chinese characters used in Oriental medical prescriptions, with an analysis of 'same character, different sound' and 'same meaning, different shape' cases in the Chinese-character part of Unicode. Conclusions: Furthermore, the research provides a top-down menu of the relationships between Chinese and Korean terms in medical prescriptions, on the assumption that each Oriental medical prescription has its own unique meaning.
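One concrete source of 'same character, different sound' duplicates is the CJK Compatibility Ideographs block, where a character can be encoded a second time at a separate code point. The sketch below is not the paper's procedure. It uses Python's standard `unicodedata.normalize` to fold such duplicates onto the unified ideograph and then groups Hangul readings by the result. The function names and the sample readings are illustrative assumptions.

```python
import unicodedata
from collections import defaultdict

def unify_hanja(text):
    # Compatibility ideographs carry canonical decompositions, so NFC
    # normalization replaces them with the unified ideograph.
    return unicodedata.normalize("NFC", text)

def group_readings(pairs):
    """Group Hangul readings under the unified form of each Hanja string.

    `pairs` is an iterable of (hangul, hanja) tuples.
    """
    table = defaultdict(set)
    for hangul, hanja in pairs:
        table[unify_hanja(hanja)].add(hangul)
    return dict(table)

# Illustrative sample: U+F900 is a compatibility duplicate of U+8C48.
sample = [("개", "\uf900"), ("기", "\u8c48")]
mapping = group_readings(sample)
```

A table built this way gives one key per unified character, which is a possible starting point for the 1:1 mapping the abstract describes. Variants with the same meaning but a different shape are generally separate unified code points, so they would still need a curated table.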

Implementation of the Automatic Segmentation and Labeling System (자동 음성분할 및 레이블링 시스템의 구현)

  • Sung, Jong-Mo;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.16 no.5 / pp.50-59 / 1997
  • In this paper, we implement an automatic speech segmentation and labeling system that marks phone boundaries automatically for constructing a Korean speech database. We specify and implement the system based on conventional speech segmentation and labeling techniques, and also develop a graphical user interface (GUI) in the Hangul Motif™ environment so that users can examine the automatically aligned boundaries and refine them easily. The system is applied to speech sampled at 16 kHz, and the labeling units consist of 46 phoneme-like units (PLUs) and silence. The system accepts both phonetic and orthographic transcriptions as input for linguistic information. Hidden Markov models (HMMs) are employed for pattern matching. Each phoneme model is trained on a manually segmented database of 445 phonetically balanced words (PBW). To evaluate the performance of the system, we test it on another database consisting of sentence-type speech. In our experiment, 74.7% of phoneme boundaries are within 20 ms of the true boundary and 92.8% are within 40 ms.
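The reported figures are fractions of automatic boundaries that fall within a tolerance of the manually marked ones. A minimal sketch of that metric follows, assuming the two boundary lists are already paired one-to-one and expressed in milliseconds. The function name and the inclusive comparison are assumptions, and the numbers in the usage lines are made up for illustration.

```python
def boundary_accuracy(auto_ms, reference_ms, tolerance_ms):
    """Fraction of automatic boundaries within tolerance_ms of the reference."""
    if len(auto_ms) != len(reference_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not auto_ms:
        return 0.0
    hits = sum(
        1 for a, r in zip(auto_ms, reference_ms) if abs(a - r) <= tolerance_ms
    )
    return hits / len(auto_ms)

# Illustrative boundaries only, not data from the paper.
auto = [100, 215, 330, 480]
reference = [105, 200, 300, 470]
within_20 = boundary_accuracy(auto, reference, 20)
within_40 = boundary_accuracy(auto, reference, 40)
```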


A Study on the Hangul Recognition Using Hough Transform and Subgraph Pattern (Hough Transform과 부분 그래프 패턴을 이용한 한글 인식에 관한 연구)

  • 구하성;박길철
    • Journal of the Korea Institute of Information and Communication Engineering / v.3 no.1 / pp.185-196 / 1999
  • In this dissertation, a new off-line recognition system is proposed that uses a subgraph pattern and a neural network. After thinning is applied to the input characters, balancing, which includes a location-based noise elimination function, is performed. Then, as the first step of the recognition procedure, circular elements are extracted and recognized. From the subblock HT (Hough transform), spatial feature points such as endpoints, flex points, and bridge points are extracted, and a subgraph pattern is formed by observing the relations among them. A region where a vowel can exist is allocated and a candidate point for the vowel is extracted. Then the vowel is recognized using the subgraph pattern dictionary. The same method is applied to extract horizontal vowels, and the vowel is recognized through a simple structural analysis. To verify the subgraph-based recognition, experiments are carried out on printed characters in the most frequently used fonts, Myungjo and Gothic, and on handwritten characters. The recognition rate was 98.9% for Gothic font characters, 98.2% for Myungjo font characters, and 92.5% for handwritten characters. The overall recognition rate was 94.8% for multi-font recognition with mixed handwritten and printed characters.
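The subblock variant used in the dissertation is not described in the abstract, so the following is only a sketch of the standard Hough transform for straight lines, applied to the pixel coordinates of a thinned stroke. Each pixel votes for every line `rho = x*cos(theta) + y*sin(theta)` passing through it, and long straight strokes appear as accumulator bins with many votes. The function name and the one-degree angular resolution are assumptions.

```python
import math
from collections import Counter

def hough_accumulator(points, n_theta=180):
    """Vote in (rho, theta_index) space for each (x, y) stroke pixel.

    theta_index t corresponds to the angle pi * t / n_theta, and rho is
    rounded to the nearest integer pixel distance.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    return votes

# A vertical stroke at x = 5 and a horizontal stroke at y = 3.
vertical = [(5, y) for y in range(10)]
horizontal = [(x, 3) for x in range(10)]
vertical_votes = hough_accumulator(vertical)
horizontal_votes = hough_accumulator(horizontal)
```

With this parameterization a vertical stroke peaks at `theta_index` 0 and a horizontal stroke at `theta_index` 90, which is the kind of cue that could help separate vertical and horizontal vowel strokes. Neighboring bins can tie on short strokes, so a practical detector would also need peak smoothing or thresholding.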
