• Title/Summary/Keyword: Morphological Errors

Search results: 54

Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning (기계학습에 기반한 한국어 미등록 형태소 인식 및 품사 태깅)

  • Choi, Maeng-Sik;Kim, Hark-Soo
    • The KIPS Transactions:PartB / v.18B no.1 / pp.45-50 / 2011
  • Unknown-morpheme errors in Korean morphological analysis fall into two types: errors in which the morphological analyzer fails to return any morpheme sequence at all, and errors in which it returns an incorrect combination of known morphemes. Most previous unknown-morpheme estimation techniques have focused only on the former. This paper proposes an unknown-morpheme estimation method that handles both types. The proposed method uses an SVM (Support Vector Machine) to detect Eojeols (Korean spacing units) that may contain unknown-morpheme errors. Then, using CRFs (Conditional Random Fields), it segments the detected Eojeols into morphemes and annotates the segmented morphemes with new POS tags. In the experiments, the proposed method outperformed the conventional method based on longest matching of functional words. The experimental results indicate that the second type of error must be addressed to improve the performance of Korean morphological analysis.
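The conventional baseline named in this abstract, longest matching of functional words, can be illustrated with a minimal sketch. It is not the proposed SVM and CRF method. The particle list and the function name `split_by_longest_functional_suffix` are assumptions made for illustration; a real system would use a full functional-word dictionary.

```python
# Hypothetical, very small list of Korean functional morphemes (particles).
# A real analyzer would load a complete functional-word dictionary instead.
FUNCTIONAL_WORDS = ["에서", "으로", "에게", "은", "는", "이", "가", "을", "를", "의", "에"]


def split_by_longest_functional_suffix(eojeol, functional_words=FUNCTIONAL_WORDS):
    """Split an Eojeol into (stem, suffix) using the longest matching suffix.

    The stem is the candidate unknown morpheme. The suffix is "" when no
    functional word matches the end of the Eojeol.
    """
    # Try longer functional words first so that "에서" wins over shorter ones.
    for suffix in sorted(functional_words, key=len, reverse=True):
        # Require a non-empty stem so the whole Eojeol is never consumed.
        if len(eojeol) > len(suffix) and eojeol.endswith(suffix):
            return eojeol[: -len(suffix)], suffix
    return eojeol, ""


print(split_by_longest_functional_suffix("학교에서"))  # ('학교', '에서')
```

A baseline of this kind only helps when the analyzer returns nothing for an Eojeol, which is why the abstract argues that incorrect combinations of known morphemes need separate handling.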

Analyzing the Types and Causes of Korean-to-English Machine Translation Errors: Focused on Morphological and Syntactical Errors (한-영 기계번역 결과물의 오류 유형 및 원인 분석: 형태적·구문적 오류를 중심으로)

  • Baek, Ji-Yeon;Goo, Hye-Kyoung
    • The Journal of the Convergence on Culture Technology / v.8 no.4 / pp.199-204 / 2022
  • This study was carried out in an L2 writing class using machine translation. Its aim was to explore which types of errors occur most frequently in Korean-to-English machine translation output and what causes those errors. The participants were seven EFL university students who completed three writing tasks over the semester. The data analysis indicated that the most common errors involved sentence structure and mechanics, and that the errors in the translated texts were caused by errors in the Korean source texts.

Detecting Spelling Errors by Comparison of Words within a Document (문서내 단어간 비교를 통한 철자오류 검출)

  • Kim, Dong-Joo
    • Journal of the Korea Society of Computer and Information / v.16 no.12 / pp.83-92 / 2011
  • Unlike in ordinary publications, typographical errors caused by the author's mistyping occur frequently in documents prepared with word processors. In such documents, the most common orthographical errors are spelling errors that result from striking a key adjacent to the intended one. Typical spelling checkers detect and correct these errors using a morphological analyzer: the morphological analysis module of the speller checks the well-formedness of each input word, and every word rejected by the analyzer is regarded as misspelled. However, if the morphological analyzer accepts a mistyped word, the speller treats it as correctly spelled. In this paper, I propose a simple method capable of detecting and correcting errors that previous methods cannot detect. The proposed method is based on the observation that typographical errors are generally not repeated and therefore tend to have very low frequency. If words generated by applying deletion, exchange, and transposition operations to each phoneme of a low-frequency word appear in the list of high-frequency words, some of them are considered to be the correctly spelled forms. Some heuristic rules are also presented to reduce the number of candidates. The proposed method cannot detect syntactic errors but can detect some semantic errors, and it is also useful for scoring candidates.
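The frequency-based idea in this abstract can be sketched roughly as follows. This is an illustrative simplification, not the author's implementation: it edits characters rather than Korean phonemes, omits the heuristic rules, and the thresholds `low` and `high` and the function names are assumptions.

```python
from collections import Counter


def edit_candidates(word, alphabet):
    """Strings one deletion, exchange (substitution), or transposition away."""
    candidates = set()
    for i in range(len(word)):
        # Deletion: drop the character at position i.
        candidates.add(word[:i] + word[i + 1:])
        # Exchange: replace the character at position i with another one.
        for ch in alphabet:
            candidates.add(word[:i] + ch + word[i + 1:])
        # Transposition: swap the characters at positions i and i + 1.
        if i + 1 < len(word):
            candidates.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    candidates.discard(word)
    return candidates


def suspect_typos(tokens, low=1, high=3):
    """Map each low-frequency word to the high-frequency words one edit away.

    `low` and `high` are hypothetical frequency thresholds.
    """
    counts = Counter(tokens)
    alphabet = set("".join(counts))
    frequent = {w for w, c in counts.items() if c >= high}
    suspects = {}
    for word, count in counts.items():
        if count <= low:
            matches = sorted(edit_candidates(word, alphabet) & frequent)
            if matches:
                suspects[word] = matches
    return suspects


tokens = ["analyzer"] * 3 + ["anlayzer"] + ["corpus"] * 3 + ["model"]
print(suspect_typos(tokens))  # {'anlayzer': ['analyzer']}
```

The rare word "model" is not flagged because no single edit turns it into a frequent word in the same document, which reflects the within-document comparison the abstract describes.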

A Study of Morphological Errors in Aphasic Language

  • Kim, Heui-Beom
    • Speech Sciences / v.1 / pp.227-236 / 1997
  • How do aphasics deal with the inflectional marking that occurs in agglutinative languages like Korean? Korean speech repetition, comprehension, and production were studied in three Korean-speaking Broca's aphasics. As experimental materials, 100 simple sentences were chosen from first-grade Korean elementary school textbooks on reading, writing, and listening, and two pictures were made from each sentence. This study examines the use of three kinds of inflectional markings--past tense, nominative case, and accusative case. The analysis focuses on whether each inflectional marking was performed well in the repetition, comprehension, and production tasks. In addition, morphological errors involving each inflectional marking were analyzed in terms of markedness. In general, the aphasic subjects showed clear preservation of the morphological aspects of their native language, so the view of Broca's aphasics as agrammatic could not be strongly supported. The results suggest that nominative case and accusative case are marked elements in Korean.

Semi-Automatic Construction of Morphological Pattern Dictionary using the Method of Morphological Synthesis (형태소 합성 기법을 이용한 형태소 패턴 사전의 반자동 구축)

  • Park, In-Cheol
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.11 / pp.5278-5283 / 2011
  • One approach to very high-speed Korean morphological analysis is to store pre-built morphological analysis results in a dictionary. Building this morphological pattern dictionary manually is costly, and the resulting dictionary may contain errors. This paper proposes a method that generates morphological patterns automatically using Korean morphological synthesis. The experiment shows that 86% of the morphological patterns needed to analyze Korean sentences can be generated automatically. The morphological system using the patterns takes 52.68 seconds to analyze a 403 MB Korean corpus on a 2.8 GHz Windows system.

Improving the Quality of Filtered Lidar Data by Local Operations

  • Seo, Su-Young
    • Korean Journal of Remote Sensing / v.23 no.3 / pp.189-198 / 2007
  • The introduction of lidar technology has contributed to a wide range of applications by generating high-quality surface models. Because terrain surface models are important in mapping applications, rigorous studies have been performed on extracting ground points from lidar point clouds. Although most filters have been shown to extract ground points when their parameters are tuned, most experiments have revealed that optimizing filter parameters has certain limitations and that correcting the remaining misclassified points is not straightforward. This study therefore proposes a method to improve the quality of filtered lidar data by exploiting surface properties that arise between immediate neighbors. The method comprises a sequence of procedures that reduce commission and omission errors. Commission errors occurring in low-rise objects are reduced by using morphological operations, while omission errors are reduced by adding missing ground points around step edges. Experimental results show that the quality of filtered data can be improved considerably by the proposed method.
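The abstract does not specify which morphological operations are used, so the following is only a generic sketch of one common choice: a grayscale opening (erosion followed by dilation) applied to a 1D height profile, which flattens objects narrower than the window. The function names, the window half-width, and the height tolerance are all assumptions.

```python
def erode(values, half_width):
    """Grayscale erosion: minimum over a sliding window, clipped at the edges."""
    n = len(values)
    return [min(values[max(0, i - half_width): min(n, i + half_width + 1)])
            for i in range(n)]


def dilate(values, half_width):
    """Grayscale dilation: maximum over a sliding window, clipped at the edges."""
    n = len(values)
    return [max(values[max(0, i - half_width): min(n, i + half_width + 1)])
            for i in range(n)]


def opening(values, half_width):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(values, half_width), half_width)


def ground_mask(heights, half_width=2, tolerance=0.5):
    """Flag a point as ground when it lies within `tolerance` of the opened surface.

    `half_width` and `tolerance` are hypothetical parameters for illustration.
    """
    surface = opening(heights, half_width)
    return [h - s <= tolerance for h, s in zip(heights, surface)]


# A flat profile near 10 m with a two-sample object about 2 m high.
profile = [10.0, 10.0, 10.1, 12.0, 12.1, 10.1, 10.0, 10.0, 10.0]
print(ground_mask(profile))
# [True, True, True, False, False, True, True, True, True]
```

The window must be wider than the object to be removed, which is one reason a fixed structuring element struggles with large buildings and steep terrain.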

A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews (강건한 한국어 상품평의 감정 분류를 위한 패턴 기반 자질 추출 방법)

  • Shin, Jun-Soo;Kim, Hark-Soo
    • Journal of KIISE:Software and Applications / v.37 no.12 / pp.946-950 / 2010
  • Many sentiment categorization systems based on machine learning methods use morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally do not perform well in the customer review domain because online customer reviews contain many spacing errors and spelling errors. The poor performance of these underlying analyzers degrades the performance of the sentiment categorization systems. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) and phoneme patterns. The two kinds of patterns are automatically constructed from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols that include content words such as nouns and verbs. Phoneme patterns consist of the leading consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system using an SVM (Support Vector Machine) as the machine learner. In the experiment with Korean customer reviews, the sentiment categorization system using the proposed method outperformed the one using a morphological analyzer as a feature extractor.
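The phoneme patterns described here rely on the leading consonant and vowel of each syllable. A minimal sketch of how such pairs can be extracted uses the arithmetic layout of precomposed Hangul syllables in Unicode (19 leading consonants, 21 vowels, 28 final positions, starting at U+AC00). The function name `lead_vowel_pattern` is an assumption; how the paper builds and matches its patterns is not shown in the abstract.

```python
HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable, "가"
NUM_VOWELS = 21
NUM_FINALS = 28       # includes the "no final consonant" slot
NUM_SYLLABLES = 19 * NUM_VOWELS * NUM_FINALS

LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"


def lead_vowel_pattern(word):
    """Return the leading consonant and vowel pair of each Hangul syllable.

    Final consonants are dropped. Non-Hangul characters pass through unchanged.
    """
    pattern = []
    for ch in word:
        index = ord(ch) - HANGUL_BASE
        if 0 <= index < NUM_SYLLABLES:
            lead = index // (NUM_VOWELS * NUM_FINALS)
            vowel = (index % (NUM_VOWELS * NUM_FINALS)) // NUM_FINALS
            pattern.append(LEADS[lead] + VOWELS[vowel])
        else:
            pattern.append(ch)
    return pattern


# "햇다" is a common misspelling of "했다"; only the final consonant differs.
print(lead_vowel_pattern("했다"))  # ['ㅎㅐ', 'ㄷㅏ']
print(lead_vowel_pattern("햇다"))  # ['ㅎㅐ', 'ㄷㅏ']
```

Because both spellings map to the same pattern, a feature built this way is unaffected by errors confined to final consonants.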

Morphological Feature Parameter Extraction from the Chromosome Image Using Reconstruction Algorithm (염색체 영상의 재구성에 의한 형태학적 특징 파라메타 추출)

  • 장용훈;이권순
    • Journal of Biomedical Engineering Research / v.17 no.4 / pp.545-552 / 1996
  • Research on chromosomes is very significant in cytogenetics, since the genes carried on chromosomes control the expression of hereditary traits. Human chromosome analysis is widely used to diagnose genetic diseases and various congenital anomalies. Much research on automated chromosome karyotype analysis has been carried out, some of which has produced commercial systems. However, there still remains much room for improving the accuracy of chromosome classification. In this paper, we propose an algorithm that reconstructs the chromosome image to improve chromosome classification accuracy. Morphological feature parameters are extracted from the reconstructed chromosome images. The reconstruction method is the 32-direction line algorithm. We extract three morphological feature parameters, the centromeric index (C.I.), relative length ratio (R.L.), and relative area ratio (R.A.), by preprocessing ten human chromosome images. The experimental results show that, in terms of feature parameter errors, the proposed algorithm performs better than those of other researchers.

Sexual Size Dimorphism and Morphological Sex Determination in the Black-billed Magpie in South Korea (Pica pica sericea)

  • Lee, Sang-Im;Jang, Hyun-Joo;Eo, Soo-Hyung;Choe, Jae-Chun
    • Journal of Ecology and Environment / v.30 no.2 / pp.195-199 / 2007
  • Statistical tools for determining sex in the sexually monomorphic black-billed magpie from morphological characters have been developed in studies of European and North American populations. However, since no morphological method has been developed for black-billed magpies in Korea, it has been difficult to conduct field studies that require information about the sex of individuals. We present two discriminant equations for determining the sex of second-year (SY) and after-second-year (ASY) magpies in the northern and mid-western parts of South Korea. Based on morphological measurements of 105 SY (56 females, 49 males) and 72 ASY (36 females, 36 males) individuals, we found body mass, wing chord, and head length to be the most useful features for morphological sex determination. The accuracy of our method was 86.5% for SYs and 93.1% for ASYs, which is similar to values reported previously for American and European magpies. Since the equations contain morphological traits that are only minimally susceptible to seasonal variation and measurement errors, our discriminant equations should be both useful and robust for sex determination of black-billed magpies in the northern and mid-western regions of South Korea.

Morphological and Genetic Characteristics of Pearl-spot Damselfish Chromis notata (Teleostei: Pomacentridae) in Coastal Waters of East Sea (Sea of Japan) and Jejudo (제주도와 동해 근해에 서식하는 자리돔(Chromis notata)의 형태와 유전특성 비교)

  • Shin, Hye Jeong;Kim, Sun Wook;Choi, Young-Ung
    • Ocean and Polar Research / v.36 no.2 / pp.189-197 / 2014
  • The pearl-spot damsel, Chromis notata, is one of the important fishery species in Korea. While C. notata has been commonly harvested in southern Korea, the increasing number of C. notata at higher latitudes has crucial ecological, economic, and evolutionary implications under a rapidly changing climate. Here we examined the morphological and genetic characteristics of C. notata to assess patterns of geographical variation among groups from three different sites. The groups were clearly distinguishable in the analysis of morphological characteristics. On the other hand, the groups were genetically indistinguishable. All individuals fell within a single clade in the neighbor-joining tree but appeared scattered in the haplotype network. Several haplotypes are shared among the sampling sites (Jejudo-Ulleungdo; Hap 9, Wangdolcho-Ulleungdo; Hap 28, Hap 33, Hap 34). Although control region markers did not elucidate the spatial patterns in genetic characteristics, the Wangdolcho and Ulleungdo groups appear to exhibit more robust gene flow with each other than with the Jejudo group. Integrative approaches that combine morphological and genetic analyses minimize potential errors caused by the limited perspective of each analysis and can provide useful information for discovering functional DNA regions attributable to the expression of morphological characteristics.