• Title/Summary/Keyword: Grapheme Segmentation

Effects of the Orthographic Representation on Speech Sound Segmentation in Children Aged 5-6 Years (5~6세 아동의 철자표상이 말소리분절 과제 수행에 미치는 영향)

  • Maeng, Hyeon-Su;Ha, Ji-Wan
    • Journal of Digital Convergence
    • /
    • v.14 no.6
    • /
    • pp.499-511
    • /
    • 2016
  • This study examined the effect of orthographic representation on speech sound segmentation performance. Children's performance on the orthographic representation task and the speech sound segmentation task was positively correlated for words with phoneme-grapheme correspondence and negatively correlated for words without it. For words with phoneme-grapheme correspondence, performance did not differ between the groups with high and low orthographic representation levels; for words without correspondence, the low-level group performed significantly better than the high-level group. The most frequent errors in both groups were orthographic conversion errors, and these were significantly more frequent in the high-level group. The results suggest that children draw on orthographic knowledge in phonological awareness tasks from the time they begin to acquire it.

A Study on Korean Printed Character Type Classification And Nonlinear Grapheme Segmentation (한글 인쇄체 문자의 형식 분류 및 비선형적 자소 분리에 관한 연구)

  • Park Yong-Min;Kim Do-Hyeon;Cha Eui-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2006.05a
    • /
    • pp.784-787
    • /
    • 2006
  • In this paper, we propose a method for nonlinear grapheme segmentation based on type classification of printed Korean characters. Characters are divided into six types according to character type information. The feature vector consists of mesh features, vertical projection features, and horizontal projection features extracted from gray-level images. We classify characters into the six types using backpropagation. Character segmentation regions are determined from the character type information. An optimal nonlinear grapheme segmentation path is then found using a multi-stage graph search algorithm. The results show that the proposed method is suitable for classifying character types and finding nonlinear character segmentation paths.

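The multi-stage graph search mentioned in the abstract above can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the cost definition (ink pixels are expensive to cut), the one-column-per-row movement limit, and the function name `min_cost_cut_path` are all assumptions made for illustration.

```python
def min_cost_cut_path(cost):
    """Return (total_cost, columns) of the cheapest top-to-bottom path.

    cost[r][c] is the (assumed) penalty for cutting through pixel (r, c).
    Each image row is one stage of the graph; between consecutive stages
    the path may shift by at most one column.
    """
    rows, cols = len(cost), len(cost[0])
    best = [list(cost[0])] + [[0] * cols for _ in range(rows - 1)]
    back = [[0] * cols for _ in range(rows)]
    for r in range(1, rows):
        for c in range(cols):
            # Candidate predecessors: same column or one step left/right.
            prev = [p for p in (c - 1, c, c + 1) if 0 <= p < cols]
            p = min(prev, key=lambda k: best[r - 1][k])
            best[r][c] = best[r - 1][p] + cost[r][c]
            back[r][c] = p
    # Trace back from the cheapest cell in the last row.
    c = min(range(cols), key=lambda k: best[rows - 1][k])
    total = best[rows - 1][c]
    path = [c]
    for r in range(rows - 1, 0, -1):
        c = back[r][c]
        path.append(c)
    path.reverse()
    return total, path


# Toy 4x5 "image" (invented): 1 = ink, 0 = background.
ink = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
print(min_cost_cut_path(ink))  # prints (0, [2, 1, 1, 2])
```

The returned column list bends around the ink instead of following a straight vertical line, which is the sense in which such a cut is nonlinear.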

Graphemes Segmentation for Arabic Online Handwriting Modeling

  • Boubaker, Houcine;Tagougui, Najiba;El Abed, Haikal;Kherallah, Monji;Alimi, Adel M.
    • Journal of Information Processing Systems
    • /
    • v.10 no.4
    • /
    • pp.503-522
    • /
    • 2014
  • In cursive handwriting recognition, script trajectory segmentation and modeling are important tasks for large or open lexicon contexts, and they become more complicated in multi-writer applications. In this paper, we present a system for Arabic online handwriting modeling based on grapheme segmentation and the extraction of geometric features. The main contribution consists of adapting Fourier descriptors to model the open trajectory of the segmented graphemes. To segment the handwriting trajectory, the system first detects its baseline by checking combined geometric and logic conditions. The detected baseline is then used as a topological reference for extracting the particular points that delimit the graphemes' trajectories. Each segmented grapheme is then represented by a set of relevant geometric features that include the vector of Fourier descriptors for trajectory shape modeling, normalized metric parameters that model the grapheme dimensions, its position with respect to the baseline, and codes describing its associated diacritics.
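As a rough illustration of Fourier descriptors applied to an open stroke, the sketch below uses the textbook complex-coordinate formulation. The abstract does not say how the authors adapt the descriptors to open trajectories; closing the stroke by retracing it backwards is an assumption made here, and the function name and parameters are hypothetical.

```python
import cmath


def fourier_descriptors(points, n_coeffs=4, close_by_retracing=True):
    """Return normalized Fourier descriptor magnitudes of a 2-D stroke.

    points: sequence of (x, y) samples along the pen trajectory.
    The DC term is skipped (translation invariance), magnitudes are used
    (rotation invariance), and values are divided by the first harmonic
    (scale invariance).
    """
    pts = list(points)
    if close_by_retracing and len(pts) > 2:
        # Assumed trick: walk the open stroke forward, then back,
        # so that the sampled curve is closed.
        pts = pts + pts[-2:0:-1]
    z = [complex(x, y) for x, y in pts]
    n = len(z)

    def coeff(k):
        # k-th coefficient of the discrete Fourier transform of z.
        return sum(
            z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        ) / n

    scale = abs(coeff(1))
    if scale == 0:
        raise ValueError("degenerate trajectory: first harmonic is zero")
    return [abs(coeff(k)) / scale for k in range(1, n_coeffs + 1)]


# Invented stroke, plus a scaled and shifted copy of it.
stroke = [(0, 0), (1, 0.5), (2, 0), (3, 1), (4, 0)]
moved = [(3 * x + 10, 3 * y - 5) for x, y in stroke]
print(fourier_descriptors(stroke))
print(fourier_descriptors(moved))
```

The two printed vectors should agree up to floating-point error, since the normalization removes position and size.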

A Study on Grapheme and Grapheme Recognition Using Connected Components Grapheme for Machine-Printed Korean Character Recognition

  • Lee, Kyong-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.9
    • /
    • pp.27-36
    • /
    • 2016
  • Grapheme recognition is a very important step in recognizing 'Hangul' (the Korean writing system) letters through phoneme recognition, because success or failure in recognizing phonemes greatly affects the recognition of letters. For this reason, separating phonemes is reported to be the greatest difficulty in phoneme recognition research. This study proposes new phonemes, separated using connected components that help divide phonemes, suggests features for recognizing the proposed phonemes, builds a database of them, and carries out experiments on recognizing phonemes with the suggested features. The phoneme separation and recognition experiments used 350 letters, which contained 1,125 of the proposed phonemes. In the separation experiment, the phonemes were separated at a rate of 100%; in the recognition experiment, the recognition rate was 98%, with only 14 phonemes recognized as different ones.

Ambiguity Types of the Homonymic & Heterographic Units for Improving Korean Voice Recognition System - a Preliminary Research (한국어 음성인식 시스템 향상을 위한 동음이철 단위의 중의성 유형 분류)

  • Yoon, Ae-Sun;Kang, Mi-Young
    • Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.67-81
    • /
    • 2008
  • The accuracy of P2G (phoneme-to-grapheme) conversion is one of the important factors determining the quality of unlimited voice recognition (VR) systems. Few studies, however, have been conducted to reduce the ambiguities of a phoneme string that can be segmented into a variety of different linguistic units (i.e., morphemes, words, eo-jeols) and thus transformed into more than one grapheme string. This paper is a preliminary study toward building a large knowledge base of those homonymic and heterographic units (HHUs), which will provide unlimited Korean VR systems with more accurate P2G information. It analyzes two main factors that generate HHUs: (1) boundary determination of the prosodic unit; (2) its segmentation into linguistic units. The linguistic characteristics that determine the variable boundaries of a prosodic unit are investigated, and the ambiguity types of HHUs are classified according to their morphological and syntactic structures as well as the phonological rules governing them.

Corpus Based Unrestricted vocabulary Mandarin TTS (코퍼스 기반 무제한 단어 중국어 TTS)

  • Yu Zheng;Ha Ju-Hong;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.175-179
    • /
    • 2003
  • To produce high-quality synthesized speech (in intelligibility and naturalness), accurate grapheme-to-phoneme conversion and an accurate prosody model are very important. In this paper, we analyze Chinese text using segmentation, POS tagging, and unknown word recognition. We present grapheme-to-phoneme conversion using dictionary-based and rule-based methods. We construct a prosody model using a probabilistic method and a decision-tree-based error correction method. Based on the results of this analysis, we can successfully select and concatenate the exact syllable synthesis units from the Chinese Synthesis DB.

Pronunciation Variation Patterns of Loanwords Produced by Korean and Grapheme-to-Phoneme Conversion Using Syllable-based Segmentation and Phonological Knowledge (한국인 화자의 외래어 발음 변이 양상과 음절 기반 외래어 자소-음소 변환)

  • Ryu, Hyuksu;Na, Minsu;Chung, Minhwa
    • Phonetics and Speech Sciences
    • /
    • v.7 no.3
    • /
    • pp.139-149
    • /
    • 2015
  • This paper aims to analyze pronunciation variations of loanwords produced by Korean speakers and to improve pronunciation modeling of loanwords in Korean by using syllable-based segmentation and phonological knowledge. The loanword text corpus used for our experiment consists of 14.5k words extracted from frequently used words in the set-top box, music, and point-of-interest (POI) domains. First, pronunciations of loanwords in Korean are obtained by manual transcription and used as target pronunciations. The target pronunciations are compared with the standard pronunciation using confusion matrices to analyze pronunciation variation patterns of loanwords. Based on the confusion matrices, three salient pronunciation variations of loanwords are identified, such as tensification of the fricative [s] and derounding of the rounded vowels [ɥi] and [wɛ]. In addition, a syllable-based segmentation method that takes phonological knowledge into account is proposed for loanword pronunciation modeling. The performance of the baseline and the proposed method is measured using phone error rate (PER), word error rate (WER), and F-score at various context spans. Experimental results show that the proposed method outperforms the baseline. We also observe that performance degrades when the training and test sets come from different domains, which implies that loanword pronunciations are influenced by data domain. It is noteworthy that pronunciation modeling for loanwords is enhanced by reflecting phonological knowledge. The loanword pronunciation modeling proposed in this paper can be used for automatic speech recognition in application interfaces such as navigation systems and set-top boxes, and for computer-assisted pronunciation training for Korean learners of English.
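The PER and WER measures named in the abstract above are commonly computed as follows; this is a minimal sketch under assumptions, not the authors' evaluation script. PER is taken as the Levenshtein distance between predicted and reference phone sequences divided by the number of reference phones, and WER as the fraction of words with at least one phone error. The phone symbols in the toy data are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(pairs):
    """Total phone edits divided by the total number of reference phones."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total


def word_error_rate(pairs):
    """Fraction of words whose predicted phone sequence is not exact."""
    wrong = sum(1 for ref, hyp in pairs if list(ref) != list(hyp))
    return wrong / len(pairs)


# Invented (reference, hypothesis) pairs; "ss" stands for a tensified [s].
pairs = [
    (["s", "a", "n"], ["ss", "a", "n"]),
    (["p", "a"], ["p", "a"]),
]
print(phone_error_rate(pairs))  # prints 0.2
print(word_error_rate(pairs))   # prints 0.5
```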

The Recognition of Vehicle Plate's Korean Character Using Grapheme Segmentation (자소 분리 방법을 이용한 차량번호판의 용도구분 문자 인식)

  • 김성우;강동구;박재현;차의영
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.04b
    • /
    • pp.646-648
    • /
    • 2002
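  • This paper proposes an efficient method for separating the usage-classification characters of vehicle license plates into graphemes and introduces a method for recognizing the graphemes with a neural network. Images of the usage-classification characters (가, 거, 나, 너, ...) contain a great deal of noise because of damage to the actual plate, camera performance, and various other conditions, so separating the Hangul characters of a license plate into graphemes is a difficult task. Experiments confirm that graphemes separated with the proposed binary image processing techniques (morphological operations, connected component labeling, etc.) yield a high recognition rate when fed to the recognition system as input vectors.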

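Connected component labeling, one of the binary image operations named in the abstract above, can be sketched in a few lines. This is a generic breadth-first labeling routine, not the authors' code; the toy bitmap, which loosely imitates the syllable 가 with ㄱ on the left and ㅏ on the right, is invented.

```python
from collections import deque


def label_components(image, connectivity=8):
    """Label foreground pixels (non-zero) of a binary image.

    Returns (labels, count): labels has the same shape as image, with 0
    for background and 1..count for each connected component.
    """
    rows, cols = len(image), len(image[0])
    if connectivity == 8:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not labels[r][c]:
                # Unlabeled foreground pixel: start a new component.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in steps:
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and image[nr][nc] and not labels[nr][nc]):
                            labels[nr][nc] = count
                            queue.append((nr, nc))
    return labels, count


# Invented 5x7 bitmap: left strokes and right strokes do not touch.
glyph = [[int(ch) for ch in row] for row in (
    "1110010",
    "0010010",
    "0010011",
    "0010010",
    "0000010",
)]
labels, count = label_components(glyph)
print(count)  # prints 2
```

Each label can then be cropped to its bounding box and passed on as one candidate grapheme. Touching or broken strokes would need the morphological cleanup the abstract mentions, which this sketch omits.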

Recognition of Hangeul Character Using Grapheme Segmentation and Pixel Distribution (자소분할과 픽셀분포를 이용한 한글문자인식)

  • Cho, Young-Guk;Lee, Dong-Wook
    • Proceedings of the KIEE Conference
    • /
    • 2009.07a
    • /
    • pp.1919-1920
    • /
    • 2009
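  • Various methodologies have been proposed for Hangul character recognition, including statistical methods, structural methods, and neural networks. Compared with English letters or digits, however, Hangul is difficult to recognize because of its vast number of characters and its complex structure. This paper therefore presents a method that separates Hangul into its simplest structures, consonants and vowels, determines the pixel distribution of each component, and, using the structural characteristics of Hangul, recognizes characters by grouping the peak values and pixel distributions along the rows and columns of each grapheme.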

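The row and column pixel distributions described in the abstract above can be illustrated with simple projection profiles. This is a sketch under assumptions: the abstract does not specify how peaks are defined or grouped, so the example only reports the position and height of the largest value in each profile, and the bitmap, loosely imitating the vowel ㅏ, is invented.

```python
def projection_profiles(image):
    """Return (row_profile, col_profile) of a binary image.

    Each profile entry counts the foreground pixels in that row or column.
    """
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    return row_profile, col_profile


def peak(profile):
    """Return (index, value) of the first maximum in a profile."""
    value = max(profile)
    return profile.index(value), value


# Invented 5x3 bitmap: a vertical stroke with a short tick to the right.
vowel = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]
rows, cols = projection_profiles(vowel)
print(rows, cols)              # prints [1, 1, 2, 1, 1] [0, 5, 1]
print(peak(rows), peak(cols))  # prints (2, 2) (1, 5)
```

A tall column peak with a small row peak, as here, is the kind of signature that could separate vertical vowels from horizontal ones, though the grouping rules in the paper itself are not given in the abstract.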

Construction of Linearly Aligned Corpus Using Unsupervised Learning (자율 학습을 이용한 선형 정렬 말뭉치 구축)

  • Lee, Kong-Joo;Kim, Jae-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.11B no.3
    • /
    • pp.387-394
    • /
    • 2004
  • In this paper, we propose a modified unsupervised linear alignment algorithm for building an aligned corpus. The original algorithm inserts null characters into both of the aligned strings (the source string and the target string) because the two strings differ in length. For applications that use the aligned corpus, these null characters can cause difficulties such as search space explosion and the impossibility of applying several machine learning algorithms. To alleviate these difficulties, we modify the algorithm so that the aligned source strings contain no null characters. We show the usability of our approach by applying it to different areas such as Korean-English back-transliteration, English grapheme-phoneme conversion, and Korean morphological analysis.