• Title/Summary/Keyword: words spacing

Word Spacing Consistency Check using Syllable and Morpheme Information (음절 및 형태소 정보를 이용한 띄어쓰기 일관성 검사)

  • Lee, Jae-Sung
    • The Journal of the Korea Contents Association / v.10 no.5 / pp.10-19 / 2010
  • Korean word spacing rules include exceptional cases that permit either spacing or no spacing between certain words. These exceptions, however, do not make it acceptable for the same words or word phrases to be spaced inconsistently within a single document during proofreading. This paper proposes a method for checking word-spacing consistency using syllable and morpheme information and evaluates it experimentally.
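
The abstract does not spell out the checking procedure, so the following is only a minimal sketch of the syllable-level part of the idea, under our own assumptions: it flags any two adjacent eojeols whose concatenation also appears as a single eojeol elsewhere in the same document. The function name `find_inconsistent_spacing` is hypothetical. Because the sketch ignores morpheme information, forms with attached particles (e.g. 정보처리는 vs. 정보 처리) are missed, which is the kind of case morpheme analysis would be needed for.

```python
from collections import defaultdict


def find_inconsistent_spacing(text):
    """Map each unspaced form to the spaced variants also found in `text`.

    Hypothetical, syllable-only sketch: two adjacent eojeols are flagged
    when their concatenation also occurs as one eojeol in the document.
    Morpheme information (e.g. stripping particles) is not used here.
    """
    eojeols = text.split()
    unspaced_forms = set(eojeols)
    variants = defaultdict(set)
    for left, right in zip(eojeols, eojeols[1:]):
        joined = left + right
        if joined in unspaced_forms:
            variants[joined].add(f"{left} {right}")
    return dict(variants)


doc = "자연어 정보 처리 기술과 정보처리 시스템"
print(find_inconsistent_spacing(doc))  # {'정보처리': {'정보 처리'}}
```

Both 정보 처리 and 정보처리 are permitted for this compound noun, so the checker reports the inconsistency rather than an error; deciding which form to standardize on is left to the proofreader.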

Automatic Correction of Word-spacing Errors using by Syllable Bigram (음절 bigram를 이용한 띄어쓰기 오류의 자동 교정)

  • Kang, Seung-Shik
    • Speech Sciences / v.8 no.2 / pp.83-90 / 2001
  • We propose a probabilistic approach to the word-spacing problem based on syllable bigrams. Syllable bigrams are extracted, and their frequencies computed, from a large corpus of 12 million words. Using these bigrams, we performed three experiments: (1) automatic word spacing, (2) detection and correction of word-spacing errors for a spelling checker, and (3) automatic insertion of a space at the end of a line in a character recognition system. The accuracies were 97.7%, 82.1%, and 90.5%, respectively.
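
The abstract does not give the exact decision rule, so the sketch below assumes the simplest one: for every adjacent syllable pair, estimate the fraction of corpus occurrences in which a space separates the two syllables, and insert a space wherever that fraction exceeds 0.5. The function names, the 0.5 threshold, the no-space default for unseen pairs, and the toy two-sentence corpus are illustrative assumptions, not the paper's setup.

```python
from collections import Counter


def train_bigram_spacing(corpus):
    """Estimate P(space | left syllable, right syllable) from spaced sentences."""
    spaced, total = Counter(), Counter()
    for sentence in corpus:
        syllables, space_after = [], []
        for ch in sentence.strip():
            if ch == " ":
                if space_after:
                    space_after[-1] = True  # a space follows the previous syllable
            else:
                syllables.append(ch)
                space_after.append(False)
        for i in range(len(syllables) - 1):
            pair = (syllables[i], syllables[i + 1])
            total[pair] += 1
            if space_after[i]:
                spaced[pair] += 1
    return {pair: spaced[pair] / n for pair, n in total.items()}


def insert_spaces(text, model, threshold=0.5):
    """Re-space `text`; unseen syllable pairs default to no space (an assumption)."""
    syllables = [ch for ch in text if ch != " "]
    out = []
    for i, syl in enumerate(syllables):
        out.append(syl)
        if i + 1 < len(syllables) and model.get((syl, syllables[i + 1]), 0.0) > threshold:
            out.append(" ")
    return "".join(out)


corpus = ["아버지가 방에 들어가신다", "어머니가 방에 계신다"]
model = train_bigram_spacing(corpus)
print(insert_spaces("아버지가방에들어가신다", model))  # 아버지가 방에 들어가신다
```

The same per-gap decision could in principle serve the end-of-line case in experiment (3): the two syllables on either side of a line break form one bigram, and the model decides whether a space belongs there.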

Recognizing Unknown Words and Correcting Spelling errors as Preprocessing for Korean Information Processing System (한국어 정보처리 시스템의 전처리를 위한 미등록어 추정 및 철자 오류의 자동 교정)

  • Park, Bong-Rae;Rim, Hae-Chang
    • The Transactions of the Korea Information Processing Society / v.5 no.10 / pp.2591-2599 / 1998
  • In this paper, we propose a method for recognizing unknown words and correcting spelling errors (including spacing errors) to improve the performance of Korean information processing systems. Unknown words are recognized through comparative analysis of two or more morphologically similar eojeols (spacing units in Korean) that contain the same unknown-word candidate. Spacing errors and spelling errors are corrected using lexicalized rules that are automatically extracted from a very large raw corpus. The rules are extracted based on the morphological and contextual similarity between erroneous eojeols and their corrected forms, which are confirmed to occur in the corpus. Experimental results show that our system recognizes unknown words with an accuracy of 98.9% and corrects spacing errors and spelling errors with accuracies of 98.1% and 97.1%, respectively.

Pullout Resistance by Horizontal Spacing of Geosynthetic Strip (띠형 섬유보강재의 설치간격에 따른 인발저항 특성에 관한 연구)

  • Han, Jung-Geun;Yoon, Won-Il;Hong, Ki-Kwon;Lee, Kwang-Wu;Kim, Ju-Hyong;Cho, Sam-Deok
    • Proceedings of the Korean Geotechnical Society Conference / 2010.09a / pp.315-324 / 2010
  • In this study, pullout tests were conducted to evaluate the pullout resistance of geosynthetic strips with and without a bearing resistance zone. The test results indicate that the pullout resistance of a geosynthetic strip without a bearing resistance zone is not affected by horizontal spacing. For reinforcement with a bearing resistance zone, however, horizontal spacing does affect the bearing resistance: the bearing resistance at a spacing of 210 mm is larger than at a spacing of 260 mm. This means that the pullout strength at a spacing of 210 mm is also larger than at 260 mm. Therefore.

Automatic Word Spacing based on Conditional Random Fields (CRF를 이용한 한국어 자동 띄어쓰기)

  • Shim, Kwang-Seob
    • Korean Journal of Cognitive Science / v.22 no.2 / pp.217-233 / 2011
  • In this paper, we propose an automatic word spacing system that takes sentences with no spaces between words as input and segments them into words. Segmentation is treated as a labeling problem, since it can be performed by attaching an appropriate label to each syllable of the given sentence. The system is based on Conditional Random Fields (CRFs), which have been reported to perform very well on labeling problems. It is trained on a corpus of 1.12 million syllables and evaluated on 2,114 sentences (93 thousand syllables). The best results obtained are 98.84% syllable-based accuracy and 95.99% word-based accuracy.
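
Treating segmentation as labeling means converting between spaced text and per-syllable labels. The sketch below assumes a two-label scheme (B = syllable begins a word, I = syllable continues one), which may differ from the paper's label set, and shows how syllable-based and word-based accuracy can be computed from gold and predicted labels. Word-based accuracy is taken here to be the share of gold words whose exact span is reproduced, one common definition that we assume rather than take from the paper. Training the CRF itself would be delegated to a CRF toolkit and is not shown.

```python
def to_labels(sentence):
    """Split a correctly spaced sentence into syllables and B/I labels."""
    syllables, labels = [], []
    for word in sentence.split():
        for j, syl in enumerate(word):
            syllables.append(syl)
            labels.append("B" if j == 0 else "I")
    return syllables, labels


def from_labels(syllables, labels):
    """Rebuild spaced text: every B after the first starts a new word."""
    words = []
    for syl, label in zip(syllables, labels):
        if label == "B" or not words:
            words.append(syl)
        else:
            words[-1] += syl
    return " ".join(words)


def syllable_accuracy(gold, pred):
    """Share of syllables whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def word_spans(labels):
    """(start, end) syllable offsets of each word implied by the labels."""
    spans, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] == "B":
            spans.append((start, i))
            start = i
    return spans


def word_accuracy(gold, pred):
    """Share of gold words reproduced with exactly the same span (assumed metric)."""
    gold_spans = set(word_spans(gold))
    return len(gold_spans & set(word_spans(pred))) / len(gold_spans)


syllables, gold = to_labels("아버지가 방에 들어가신다")
_, pred = to_labels("아버지 가방에 들어가신다")  # a mis-spaced system output
print(from_labels(syllables, pred))              # 아버지 가방에 들어가신다
print(round(syllable_accuracy(gold, pred), 3))   # 0.818
print(round(word_accuracy(gold, pred), 3))       # 0.333
```

The example also shows why word-based accuracy is lower than syllable-based accuracy in the reported results: a single misplaced boundary flips two syllable labels out of eleven but breaks two of the three words.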

A Stochastic Word-Spacing System Based on Word Category-Pattern (어절 내의 형태소 범주 패턴에 기반한 통계적 자동 띄어쓰기 시스템)

  • Kang, Mi-Young;Jung, Sung-Won;Kwon, Hyuk-Chul
    • Journal of KIISE:Software and Applications / v.33 no.11 / pp.965-978 / 2006
  • This paper implements an automatic Korean word-spacing system based on word recognition using morpheme unigrams and the category pattern that those morphemes form within a candidate word. Although previous Korean word-spacing models are easy to build and time-efficient, they still suffer from problems such as data sparseness and large memory requirements, which arise from the morpho-typological characteristics of Korean. To address both problems, our implementation uses the statistics of morpheme unigrams and their category patterns instead of word unigrams. The probability of a word in a sentence is computed from its morphemes' probabilities, each weighted according to the morpheme's category within the candidate word's category pattern. The category weights are trained to minimize the mean error between the observed word probabilities and the probabilities estimated from the words' individual morphemes, weighted by the power of each morpheme's category in the word's category pattern.

Two Statistical Models for Automatic Word Spacing of Korean Sentences (한글 문장의 자동 띄어쓰기를 위한 두 가지 통계적 모델)

  • 이도길;이상주;임희석;임해창
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.358-371 / 2003
  • Automatic word spacing is the process of determining the correct boundaries between words in a sentence that contains spacing errors. It is important both for readability and for conveying the accurate meaning of a text to the reader. Previous statistical approaches to automatic word spacing do not consider the spacing state of the preceding position and therefore inevitably estimate inaccurate probabilities. In this paper, we propose two statistical word-spacing models that solve this problem. The models are based on the observation that automatic word spacing can be regarded as a classification problem, like POS tagging. By generalizing hidden Markov models, they can consider broader context and estimate more accurate probabilities. We evaluated the proposed models under a wide range of experimental conditions to compare them with the current state of the art, and we provide a detailed error analysis. Experimental results show that the proposed models achieve a syllable-unit accuracy of 98.33% and an eojeol-unit precision of 93.06% under an evaluation method that takes compound nouns into account.
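
Why the previous spacing state matters can be illustrated with a first-order model decoded by the Viterbi algorithm. This is not either of the authors' generalized HMMs: it is a simplified sketch that combines a hypothetical per-gap probability P(space | left syllable, right syllable) with a transition probability P(tag | previous tag), and all tables and numbers are made up for illustration. The function name `viterbi_spacing` is hypothetical, and all start and transition probabilities must be positive because the code takes their logarithms.

```python
import math


def viterbi_spacing(text, p_space, trans, start, default=0.5, eps=1e-6):
    """Choose space (1) / no space (0) after each syllable with Viterbi decoding."""
    chars = [ch for ch in text if ch != " "]
    n = len(chars) - 1  # number of gaps between adjacent syllables
    if n <= 0:
        return "".join(chars)

    def emit(i, t):
        # log P(tag t at gap i | syllable bigram), clamped away from 0 and 1
        p = p_space.get((chars[i], chars[i + 1]), default)
        p = min(max(p, eps), 1 - eps)
        return math.log(p if t == 1 else 1 - p)

    tags = (0, 1)  # 0 = no space after syllable i, 1 = space after it
    score = [{t: math.log(start[t]) + emit(0, t) for t in tags}]
    back = [{}]
    for i in range(1, n):
        score.append({})
        back.append({})
        for t in tags:
            # Best previous tag given that the current tag is t.
            prev = max(tags, key=lambda q: score[i - 1][q] + math.log(trans[q][t]))
            score[i][t] = score[i - 1][prev] + math.log(trans[prev][t]) + emit(i, t)
            back[i][t] = prev

    # Trace the best tag sequence backwards from the last gap.
    path = [max(tags, key=lambda t: score[n - 1][t])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()

    out = []
    for i, ch in enumerate(chars):
        out.append(ch)
        if i < n and path[i] == 1:
            out.append(" ")
    return "".join(out)


p_space = {("가", "나"): 0.6, ("나", "다"): 0.6}
trans = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}
start = {0: 0.5, 1: 0.5}
print(viterbi_spacing("가나다", p_space, trans, start))  # 가 나다
```

With these toy numbers, deciding each gap independently at a 0.5 threshold would space both gaps (가 나 다), whereas the low probability of two consecutive spaces makes Viterbi prefer 가 나다; with uniform transitions the result falls back to 가 나 다.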

An Analysis of Korean Word Spacing Errors Made by Chinese Learners (중국인 한국어 학습자의 글쓰기에 나타난 띄어쓰기 오류 양상 및 지도 방향)

  • Wang, Yuan
    • Korean Educational Research Journal / v.40 no.1 / pp.59-79 / 2019
  • The purpose of this study is to analyze spacing errors in Chinese students' Korean writing through questionnaires and interviews and, by analyzing the causes of the errors, to propose changes to the teaching methods used for Chinese learners. Analysis of the learners' writing samples revealed a total of 148 spacing errors. Errors made by joining words that should be written separately (77.6%) were far more frequent than errors made by inserting a space within a compound word (22.4%). By error type, "noun + noun," "adnominal (form) + dependent noun," and postpositional-particle errors occurred most frequently. We propose directions for teaching the spacing of nouns and particles from both deductive and inductive perspectives.

The Effect of Orthography on Electronic Character Reading and Comprehending Ability in Japanese Education using ICT (ICT를 활용한 일본어 교육에서 문장 표기 형식이 영상문자 낭독 및 내용 파악에 미치는 효과)

  • Kang, Shin-Cheol;Kim, Min-Ki
    • The Journal of Korean Association of Computer Education / v.7 no.6 / pp.85-93 / 2004
  • Using a projection TV and a computer, we investigated the proper display environment for Japanese electronic-character reading lessons. To determine the effect of prior learning activities in the context of authentic Japanese text orthography, which includes dual notation, word spacing, and so on, we also conducted an experiment on comprehending web documents extracted from Japanese websites. From the experimental results, we conclude that two approaches are needed to enhance the ability to comprehend Japanese web documents, a goal newly added in the 7th curriculum revision. In the short term, Japanese web documents should be used as learning materials. In the long term, we must reconsider whether the orthography of current Japanese textbooks is suitable.

Noun and affix extraction using conjunctive information (결합정보를 이용한 명사 및 접사 추출)

  • 서창덕;박인칠
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.5 / pp.71-81 / 1997
  • This paper proposes methods for extracting nouns and affixes using conjunctive information, for use in an automatic indexing system based on morphological and syntactic analysis. Korean has a distinctive word-spacing rule that differs from those of other languages, and the conjunctive information extracted from this rule can reduce part-of-speech ambiguity at minimal cost. The proposed algorithms also solve the problem of a single word being split by a newline character. We demonstrate the efficiency of the proposed algorithms through the process of morphological analysis.
