• Title/Summary/Keyword: word spacing

Segmentation of Words from the Lines of Unconstrained Handwritten Text using Neural Networks (신경회로망을 이용한 제약 없이 쓰여진 필기체 문자열로부터 단어 분리 방법)

  • Kim, Gyeong-Hwan
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.7 / pp.27-35 / 1999
  • Research on the recognition of handwritten script has been conducted under the assumption that isolated recognition units are provided as inputs. However, in practical recognition system designs, providing isolated recognition units is a challenge because writing styles vary. This paper proposes an approach for segmenting words from lines of unconstrained handwritten text without the help of recognition. In contrast to conventional approaches, which are based on physical gaps between connected components, clues that reflect the author's writing style in terms of spacing are extracted and used for segmentation by a simple neural network. The clues are derived from character segments and include the normalized heights and intervals of the segments. The effectiveness of the proposed approach was evaluated experimentally against conventional connected-component-based approaches in terms of word segmentation performance.

  • PDF

A Model for Post-processing of Speech Recognition Using Syntactic Unit of Morphemes (구문형태소 단위를 이용한 음성 인식의 후처리 모델)

  • 양승원;황이규
    • Journal of Korea Society of Industrial Information Systems / v.7 no.3 / pp.74-80 / 2002
  • There has been much research on post-processing methods that use natural language processing techniques to improve Korean continuous speech recognition. It is very difficult to use a formal morphological analyzer to improve speech recognition because the analysis techniques of natural language processing are designed mainly for formal written language. In this paper, we propose a speech recognition enhancement model that uses syntactic units of morphemes. This approach uses a longest match at the functional-word level that does not consider word spacing. We describe a post-processing mechanism for improving speech recognition with the proposed model, which uses phonological structure information on the relationship between predicates and the auxiliary predicates or bound nouns that occur frequently in Korean sentences.

  • PDF

Multi-class Classification System Based on Multi-loss Linear Combination for Word Spacing and Sentence Boundary Detection (띄어쓰기 및 문장 경계 인식을 위한 다중 손실 선형 결합 기반의 다중 클래스 분류 시스템)

  • Kim, GiHwan;Seo, Jisu;Lee, Kyungyeol;Ko, Youngjoong
    • Annual Conference on Human and Language Technology / 2018.10a / pp.185-188 / 2018
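  • Word spacing and sentence boundary detection are regarded as very important problems because, depending on their performance, they can propagate large errors into later stages of natural language analysis. Because the two tasks rely on different features, however, they have been solved sequentially with separate models. Yet word spacing and sentence boundary detection are not entirely different problems, and running two models in sequence not only propagates the errors of the first model to the second but also increases time complexity. In this paper, we treat word spacing and sentence boundary detection as a single problem: a multi-class classification system that handles both at once addresses the time complexity issue, and a linear combination of multiple losses addresses the fact that the two tasks use different features. The final model improved on the baseline word spacing and sentence boundary detection models by 3.98%p and 0.34%p, respectively. In terms of time complexity, its running time was 38.7% lower than that of running the single models sequentially.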

  • PDF

CRFs versus Bi-LSTM/CRFs: Automatic Word Spacing Perspective (CRFs와 Bi-LSTM/CRFs의 비교 분석: 자동 띄어쓰기 관점에서)

  • Yoon, Ho;Kim, Chang-Hyun;Cheon, Min-Ah;Park, Ho-min;Namgoong, Young;Choi, Minseok;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology / 2018.10a / pp.189-192 / 2018
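  • Automatic word spacing is the task of using a computer to insert spaces into sentences written without spacing. In natural language processing it is performed before morphological analysis. Spacing errors affect morphological analysis, syntactic parsing and later steps and increase the ambiguity of their results, so it is one of the most important preprocessing steps. In this paper, we perform automatic word spacing with CRFs (Conditional Random Fields), a machine learning method, and with bidirectional LSTM/CRFs (Bidirectional Long Short Term Memory/CRFs), a deep learning method, and then compare and analyze the performance of the two models. The CRFs model performed slightly better than the bidirectional LSTM/CRFs model. We therefore consider that, in environments such as small devices, applying a model such as CRFs to make the model lighter and to improve time complexity is considerably more effective.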

  • PDF

A note for improving mathematical terms in Korea (수학 용어의 개선 방향에 대한 소고)

  • Her, Min
    • Communications of Mathematical Education / v.27 no.4 / pp.391-406 / 2013
  • Most mathematical terms in Korean are Sino-Korean words. It is necessary to find efficient ways to teach Sino-Korean mathematical terms to mathematics teachers and students who do not know Chinese characters well and use only the Korean alphabet in mathematics. In particular, we have to avoid inappropriate Sino-Korean words that can cause misconceptions, and find ways to distinguish homophones written in the Korean alphabet. Native Korean terms may be used for this purpose, and the national curriculum can play an important role. In this paper, we investigate ways of improving mathematical terms in Korea with concrete examples.

Bayesian Parameter Estimation Considering User-input for Korean Word Spacing Model (한국어 띄어쓰기 모델에서 사용자 입력을 고려한 베이지언 파라미터 추정)

  • Lee, Jeong-Hoon;Hong, Gum-Won;Lee, Do-Gil;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology / 2008.10a / pp.5-11 / 2008
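  • Previous studies that use statistical models for Korean word spacing are based on maximum likelihood estimation. However, maximum likelihood estimation has the drawback of giving inaccurate results when data are sparse. As an alternative, this study proposes Bayesian parameter estimation that takes user input into account. Whereas previous work regarded user input only as the target of correction, the proposed method interprets user input as both the target of correction and a target of learning. In the proposed method, user input serves to prevent the inaccurate parameter estimation caused by data sparseness in the training corpus, and the training corpus serves to compensate for the uncertainty of user input. Experiments show that the proposed method is effective across various kinds of corpora, including a written-language corpus, a colloquial corpus from online communication and web bulletin boards, and across various statistical models.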

  • PDF
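
As a rough illustration of the contrast drawn in the abstract above, the following Python sketch compares a maximum likelihood estimate of a spacing probability with a posterior mean under a Beta prior. It is not the authors' model: the function names, the choice of a Beta prior centered on a corpus estimate, and the `prior_strength` pseudo-count are all assumptions made for illustration.

```python
def mle_space_prob(space_count, total_count):
    """Maximum likelihood estimate of P(space | context).

    Undefined when the context was never observed, and unstable
    when it was observed only a few times.
    """
    if total_count == 0:
        raise ValueError("no observations for this context")
    return space_count / total_count


def bayes_space_prob(user_space, user_total, prior_prob, prior_strength=2.0):
    """Posterior mean of P(space | context) under a Beta prior.

    user_space, user_total -- counts taken from user input (assumed evidence)
    prior_prob             -- estimate from the training corpus (assumed prior mean)
    prior_strength         -- hypothetical pseudo-count weight of the prior
    """
    alpha = prior_prob * prior_strength
    beta = (1.0 - prior_prob) * prior_strength
    return (user_space + alpha) / (user_total + alpha + beta)


if __name__ == "__main__":
    # One user observation with a space, corpus prior of 0.2.
    print(mle_space_prob(1, 1))
    print(bayes_space_prob(1, 1, prior_prob=0.2))
```

With a single user observation, the maximum likelihood estimate jumps to 1.0, whereas the Bayesian estimate stays closer to the corpus prior, and with no user observations it falls back to the prior itself.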

Self-Organizing n-gram Model for Automatic Word Spacing (자기 조직화 n-gram모델을 이용한 자동 띄어쓰기)

  • Tae, Yoon-Shik;Park, Seong-Bae;Lee, Sang-Jo;Park, Se-Young
    • Annual Conference on Human and Language Technology / 2006.10e / pp.125-132 / 2006
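  • Automatic word spacing is a very important problem in Korean natural language processing and information retrieval. Word spacing is often difficult, to the extent that spacing errors can be found even in newspaper articles. In this paper, we propose a method that improves the accuracy of automatic word spacing using a self-organizing n-gram model. The proposed method is based on a variable-length n-gram model whose context length can be changed, and lets the model determine the context length automatically, which yields higher performance than an ordinary n-gram model. To find the optimal context length, the self-organizing n-gram model compares the probability distribution obtained when the context is lengthened with the distribution obtained when it is not. If the difference is large, it lengthens the context, and otherwise it automatically shortens the context. That is, when more information is needed, the dimensionality of the data is increased to raise accuracy, and the resulting increase in computation can be offset by reducing the amount of unnecessary data. Experiments confirmed that the self-organizing structure of the n-gram model outperforms the basic model.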

  • PDF
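
The context-length decision described in the abstract above can be sketched in Python as follows. The abstract does not say how the two probability distributions are compared, so the use of KL divergence, the `threshold` value and the function names here are assumptions for illustration, not the authors' implementation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions given as dicts over the same labels."""
    return sum(
        pv * math.log(pv / max(q.get(label, 0.0), eps))
        for label, pv in p.items()
        if pv > 0
    )


def should_extend_context(dist_short, dist_long, threshold=0.05):
    """Return True when lengthening the context changes the prediction enough.

    dist_short -- P(label | shorter context), e.g. {"space": 0.5, "nospace": 0.5}
    dist_long  -- P(label | longer context)
    threshold  -- hypothetical cut-off on the divergence
    """
    return kl_divergence(dist_long, dist_short) > threshold


if __name__ == "__main__":
    short = {"space": 0.5, "nospace": 0.5}
    long_ctx = {"space": 0.9, "nospace": 0.1}
    print(should_extend_context(short, long_ctx))  # longer context is informative
    print(should_extend_context(short, short))     # identical distributions
```

When the longer context leaves the distribution unchanged, the divergence is zero and the shorter context is kept, which matches the abstract's idea of spending extra computation only where more information is needed.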

On the "Virtual and Real" and Blankness in Chinese Landscape Painting

  • Dongqi, Liu
    • International Journal of Advanced Culture Technology / v.10 no.3 / pp.174-183 / 2022
  • When the issue of "virtual and real" in traditional Chinese painting is raised, the first impression is that it concerns brushstrokes and ink, the layout of the picture, and so on, but it in fact runs through the initial conception of a work, its creation and its aesthetic appreciation. It exists in the whole process of artistic creation and appreciation. In essence, it is a problem of aesthetic and philosophical thinking. Because traditional Chinese painting theory is influenced by Taoism, when the concept of "virtual and real" is implemented in the specific picture of a Chinese painting, it is contained in the specific shape of "physics", that is, in the painting-theory research on "blank space" in the picture. Based on the traditional Taoist philosophy of China, this paper takes the view of "virtual and real" in the thought of Lao Tzu and Zhuangzi as its research object, analyzes and compares its relationship with the "virtual and real" in Chinese landscape painting, and identifies their artistic spirit, essential characteristics and modes of presentation. The paper discusses the internal relationship between Taoist philosophy and "virtual and real" in Chinese landscape painting from the following aspects. The introduction sets out the origin, purpose, significance, innovation and research methods of the topic. The paper then analyzes the ideas about landscape in the philosophical thought represented by Lao Tzu and Zhuangzi. The development of traditional Chinese aesthetic theory is closely related to Taoist philosophy, which has laid the foundation and pointed out the direction for the development of Chinese painting theory since ancient times. The paper also discusses the influence of the Taoist philosophy of "the combination of the virtual and real" on the emergence and development of the artistic conception of landscape painting. Firstly, through an analysis of the artistic conception of landscape painting and its constituent factors, it points out that the artistic conception is affected by personality and by the artistic conception of the painting. Secondly, it examines the Taoist thought of "the combination of the virtual and real" in landscape painting to show that it is the source of the artistic conception of Chinese landscape painting. It is the unique spiritual concept of "Yin and Yang" and "virtual and real" that creates the unique "blank space" aesthetic realm of Chinese painting in the composition of the picture. Finally, the paper focuses on the "nothingness" in Taoist philosophy and the "blank space" in Chinese landscape painting. The connotation of the "blank space" in Chinese painting exceeds its own expressive significance, which makes the picture form the aesthetic principle of emotional blending, the combination of virtual and real, and the integration of the dynamic and the static. The "blank space" deepens the artistic characteristics of the picture and sublimates the expression of "form" in Chinese painting.

Quantitative image processing analysis for handwriting legibility evaluation (글씨쓰기 명료도 평가의 정량적 영상처리 분석)

  • Kim, Eun-Bin;Lee, Cho-Hee;Kim, Eun-Young;Lee, OnSeok
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.7 / pp.158-165 / 2019
  • Although the identification of writing disabilities and timely intervention are required, clinicians use manual scoring methods, which are prone to error because the evaluation is subjective. In this study, the size ratio and position of letters are digitized and quantified through image processing of offline handwritten characters. We sought to evaluate writing performance objectively and accurately by comparison with existing methods. From November 12th to 16th, 2018, 20 adults without neurological injury were selected. Using a pencil and their usual writing habits, they copied stimuli consisting of 10 words and 2 sentences, and we collected the writing test data. The results showed that the height of the words was 1.2 times their width and that the writing tilted to the lower left. The spacing interval was 9 mm on average. In the paired t-test, a high correlation was shown between our system and existing methods for the words and for sentence 2, which demonstrates its potential as a testing tool. This study evaluated the writing performance of offline handwritten characters objectively and precisely through image processing and provided preliminary data for performance standards. In the future, these results may serve as basic data for diagnosing writing across various ages.

A Spelling Error Correction Model in Korean Using a Correction Dictionary and a Newspaper Corpus (교정사전과 신문기사 말뭉치를 이용한 한국어 철자 오류 교정 모델)

  • Lee, Se-Hee;Kim, Hark-Soo
    • The KIPS Transactions:PartB / v.16B no.5 / pp.427-434 / 2009
  • With the rapid evolution of the Internet and mobile environments, text that includes spelling errors such as newly coined words and abbreviated words is widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model that uses a spelling error correction dictionary and a newspaper corpus. The proposed model has the advantage that the cost of data construction is not high because it uses a newspaper corpus, which is easy to obtain, as a training corpus. In addition, the proposed model does not require additional external modules such as a morphological analyzer or a word-spacing error correction system because it uses a simple string matching method based on a correction dictionary. In experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model showed good performance across the evaluation measures (a mis-correction rate of 7.3%, an F1-measure of 97.3%, and a false positive rate of 1.1%).