• Title/Summary/Keyword: Morphological analyzer

A Rule-Based Analysis from Raw Korean Text to Morphologically Annotated Corpora

  • Lee, Ki-Yong;Markus Schulze
    • Language and Information / v.6 no.2 / pp.105-128 / 2002
  • Morphologically annotated corpora are the basis for many tasks in computational linguistics. Most current approaches use statistically driven methods of morphological analysis that provide only POS tags. While this is sufficient for some applications, many others need a rule-based full morphological analysis that also yields lemmatization and segmentation. This work thus aims at 〔1〕 introducing a rule-based Korean morphological analyzer called Kormoran, based on the principle of linearity, which prohibits any combination of left-to-right or right-to-left analysis or backtracking, and then at 〔2〕 showing how it can be used as a POS tagger by adopting an ordinary preprocessing technique and by filtering out irrelevant morpho-syntactic information in the analyzed feature structures. It is shown that, besides providing a basis for subsequent syntactic or semantic processing, full morphological analyzers like Kormoran have greater power to resolve ambiguities than simple POS taggers. The focus of our present analysis is on Korean text.
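
The second aim in the abstract above (using a full analysis as a POS tagger by filtering out other morpho-syntactic information) can be pictured roughly as follows. This is only a sketch: the dictionary layout, the feature names (`form`, `lemma`, `pos`, `tense`) and the tag labels are assumptions made for illustration, not Kormoran's actual output format.

```python
# Hypothetical feature structures for one analyzed word. The keys and the
# tag labels are illustrative assumptions, not Kormoran's real format.
analysis = [
    {"form": "먹", "lemma": "먹다", "pos": "VV", "tense": None},
    {"form": "었", "lemma": "었", "pos": "EP", "tense": "past"},
    {"form": "다", "lemma": "다", "pos": "EF", "tense": None},
]


def to_pos_tags(morphemes):
    """Drop every feature except the surface form and the POS tag."""
    return [(m["form"], m["pos"]) for m in morphemes]


pos_tagged = to_pos_tags(analysis)
```

The full analysis keeps lemma and tense information for later syntactic or semantic processing, while `to_pos_tags` projects it onto the flat form/tag pairs a plain POS tagger would return.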

Information Retrieval Systems: Between Morphological Analyzers and Systemming Algorithms

  • Mohamed, Afaf Abdel Rhman;Ouni, Chafika;Eljack, Sarah Mustafa;Alfayez, Fayez
    • International Journal of Computer Science & Network Security / v.22 no.3 / pp.375-381 / 2022
  • The main objective of an Information Retrieval System (IRS) is to obtain suitable information within a reasonable time to satisfy a user's need. To achieve this, an IRS should have a good indexing system based on natural language processing. In this context, we focus on the Arabic language processing techniques available for an IRS, with the goal of contributing to an improvement in performance. Our contribution consists of integrating morphological analysis into an IRS in order to compare the impact of morphological analysis with that of stemming algorithms.

Implementation of a morphological analyzer and spelling corrector for character recognition post-processing (문자 인식 후처리를 위한 형태소 분석기와 문자 교정기의 구현)

  • 이영화;김규성;김영훈;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.5 / pp.82-92 / 1997
  • In this paper, we propose a post-processing method that corrects characters misrecognized by a character recognizer, using a morphological analyzer and a spelling corrector. The proposed post-processing consists of three phases. First, the input passes through a morphological analyzer that outputs only the information needed for spelling correction, does not analyze a bundle of phrases, and detects the location of the misrecognized character. Second, candidate characters are generated and tagged using the information in a character substitution table and a grapheme substitution/separation table; the analysis is then retried after the misrecognized character has been substituted. Finally we select table, we investigate misrecognized characters in the corpus. The reliability analysis used the frequencies of about 100,000 words randomly selected from the corpus. A Korean character recognizer achieves a 93% correct rate without post-processing. With post-processing, the overall recognition rate of our system exceeds 97%.
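
The substitute-and-retry step described in the abstract above might look roughly like the sketch below. Everything in it is a toy assumption: the dictionary, the confusion pairs in `SUBSTITUTIONS`, and the stand-in `analyze` function (a plain dictionary lookup in place of a real morphological analyzer). Unlike the paper's method, the sketch does not first locate the misrecognized character; it simply tries every position.

```python
# Toy data: every entry is hypothetical and chosen only for illustration.
DICTIONARY = {"학교", "학생", "사람"}
# Characters that the recognizer is assumed to confuse with one another.
SUBSTITUTIONS = {"확": ["학", "화"], "샘": ["생", "셈"]}


def analyze(word):
    """Stand-in for the morphological analyzer: a plain dictionary lookup."""
    return word in DICTIONARY


def correct(word):
    """Return the word if it analyzes, else the first substitution that does."""
    if analyze(word):
        return word
    for i, ch in enumerate(word):
        for candidate in SUBSTITUTIONS.get(ch, []):
            fixed = word[:i] + candidate + word[i + 1:]
            if analyze(fixed):  # retry the analysis after substitution
                return fixed
    return None  # no correction found


result = correct("확교")
```

Here `correct("확교")` replaces the first syllable with its first listed alternative and accepts the result because the retried lookup succeeds.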

Development of Korean Sign Language Generation System using TV Caption Signal (TV 자막 신호를 이용한 한글 수화 발생 시스템의 개발)

  • Kim, Dae-Jin;Kim, Jung-Bae;Jang, Won;Bien, Zeung-Nam
    • Journal of the Institute of Electronics Engineers of Korea CI / v.39 no.5 / pp.32-44 / 2002
  • In this paper, we propose a TV caption-based KSL (Korean Sign Language) generation system. The caption signal is transmitted to a PC through a TV caption decoder. Next, the caption signal is segmented into meaning units by a morphological analyzer that takes into account specific characteristics of Korean sign language. Finally, the 3D KSL generation system renders the transformed morphological information as 3D graphics. Specifically, we propose a morphological analyzer with many pre-processing techniques for real-time capability. The developed system was applied to a real TV caption program. Based on use by deaf users, we conclude that our system is sufficiently usable compared with a conventional TV caption program.

Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning (기계학습에 기반한 한국어 미등록 형태소 인식 및 품사 태깅)

  • Choi, Maeng-Sik;Kim, Hark-Soo
    • The KIPS Transactions:PartB / v.18B no.1 / pp.45-50 / 2011
  • Unknown morpheme errors in Korean morphological analysis fall into two types: errors in which a morphological analyzer fails to return any morpheme sequence at all, and errors in which it returns incorrect combinations of known morphemes. Most previous unknown morpheme estimation techniques have focused only on the former. This paper proposes an unknown morpheme estimation method that can handle both types of error. The proposed method uses an SVM (Support Vector Machine) to detect Eojeols (Korean spacing units) that may include unknown morpheme errors. Then, using CRFs (Conditional Random Fields), it segments the detected Eojeols into morphemes and annotates the segmented morphemes with new POS tags. In the experiments, the proposed method outperformed the conventional method based on the longest matching of functional words. The experimental results show that the second type of error must be dealt with in order to improve the performance of Korean morphological analysis.

High Speed Korean Morphological Analysis based on Adjacency Condition Check (인접 조건 검사에 의한 초고속 한국어 형태소 분석)

  • 심광섭;양재형
    • Journal of KIISE:Software and Applications / v.31 no.1 / pp.89-99 / 2004
  • This paper proposes a morphological analysis method that works by checking conditions between two adjacent morphemes. These conditions are supplied by a dictionary. The method eliminates the code conversion module and the application of transformational rules for candidate generation. We claim that very high-speed morphological analysis is attainable through simple bit operations for the adjacency condition check. MACH, an implementation of the proposed method, is a very high-speed Korean morphological analyzer that can analyze a 1 GB document in 5 minutes on a PC with a 1.13 GHz Pentium III CPU. The analysis accuracy of MACH is 99.2%.
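
To make the idea of an adjacency condition check by bit operations concrete, here is a minimal sketch. The abstract does not give MACH's dictionary encoding, so the category bits, the toy `LEXICON` and the function names below are all assumptions; the only point illustrated is that two adjacent morphemes can be validated with a single bitwise AND.

```python
# Hypothetical category bits; MACH's real encoding is not given in the abstract.
NOUN, VERB_STEM, PARTICLE, ENDING = 1, 2, 4, 8

# Each entry: surface form -> (own category bit, bitmask of categories
# allowed immediately to its left). A mask of 0 means the morpheme may
# only start a word in this toy lexicon.
LEXICON = {
    "학교": (NOUN, 0),
    "에서": (PARTICLE, NOUN),
    "먹": (VERB_STEM, 0),
    "다": (ENDING, VERB_STEM),
}


def can_follow(left, right):
    """True if `right` may directly follow `left` (one bitwise AND)."""
    left_category = LEXICON[left][0]
    allowed_on_left = LEXICON[right][1]
    return (left_category & allowed_on_left) != 0


def is_valid(morphemes):
    """Check every adjacent pair in a candidate morpheme sequence."""
    return all(can_follow(a, b) for a, b in zip(morphemes, morphemes[1:]))


valid = is_valid(["학교", "에서"])
invalid = is_valid(["학교", "다"])
```

Because each check is a table lookup plus an AND, candidate sequences can be accepted or rejected without applying transformational rules, which is the kind of saving the abstract attributes to the method.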

An Efficient Method for Korean Noun Extraction Using Noun Patterns (명사 출현 특성을 이용한 효율적인 한국어 명사 추출 방법)

  • 이도길;이상주;임해창
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.173-183 / 2003
  • Morphological analysis is the most widely used method for extracting nouns from Korean texts. To extract nouns from an Eojeol, a morphological analyzer performs frequent dictionary lookups and applies many morphonological rules, so it requires many operations. Moreover, a morphological analyzer generates all possible morphological interpretations (sequences of morphemes) of a given Eojeol, which may be unnecessary from the point of view of noun extraction. To reduce this unnecessary computation, this paper proposes a method for Korean noun extraction that considers noun occurrence characteristics. Noun patterns denote conditions under which an Eojeol does or does not contain a noun; these serve as positive and negative cues, respectively. Using exclusive information as negative cues reduces the search space of morphological analysis by ignoring Eojeols that contain no nouns. Post-noun syllable sequences (PNSS), used as positive cues, allow nouns to be extracted simply by checking the part of the Eojeol preceding the PNSS, and also allow unknown nouns to be guessed. In addition, morphonological information is used instead of many morphonological rules to recover the lexical form from its altered surface form. Experimental results show that the proposed method is faster than other systems based on morphological analysis, without losing accuracy.
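
The use of negative and positive cues described in the abstract above can be sketched as follows. The two cue lists are toy assumptions rather than the paper's actual noun patterns, and the sketch returns `None` where a real system would presumably fall back to fuller analysis.

```python
# Toy cue lists; both are hypothetical examples, not the paper's patterns.
POST_NOUN_SEQUENCES = ["에서", "은", "는", "을", "를"]  # positive cues (PNSS)
NEGATIVE_ENDINGS = ["었다", "습니다"]  # negative cues: no noun expected


def extract_noun(eojeol):
    """Return the noun candidate in an Eojeol, or None if no cue applies."""
    # Negative cue: skip the Eojeol without any morphological analysis.
    if any(eojeol.endswith(ending) for ending in NEGATIVE_ENDINGS):
        return None
    # Positive cue: try longer sequences first and take what precedes them.
    for pnss in sorted(POST_NOUN_SEQUENCES, key=len, reverse=True):
        if eojeol.endswith(pnss) and len(eojeol) > len(pnss):
            return eojeol[: -len(pnss)]
    return None


nouns = [extract_noun(e) for e in ["학교에서", "사람은", "먹었다"]]
```

Since the positive cue only inspects the end of the Eojeol, the preceding part is returned even when it is not in any dictionary, which mirrors the abstract's remark that PNSS can be used to guess unknown nouns.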

Error-driven Noun-Connection Rule Extraction for Morphological Analysis (오류에 기반한 복합명사 좌우접속규칙 사전 구축)

  • Lee, Kong Joo;Lee, Songwook
    • Journal of Advanced Marine Engineering and Technology / v.36 no.8 / pp.1123-1128 / 2012
  • The goal of this research is to develop error-driven noun-connection rules for splitting compound nouns in a Korean morphological analysis module. We collected compound nouns from Web sites and analyzed them with CnuMa. Whenever we found errors in the analyzer's output, we wrote noun-connection rules to correct them. The noun-connection rules are devised by considering the left and right contexts within compound nouns. The error-driven noun-connection rules improve the precision and recall of the Korean morphological analyzer CnuMa by 2.8% and 10.8%, respectively.
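
For readers unfamiliar with the two measures reported above, the sketch below computes precision and recall using their standard set-based definitions. The abstract does not say how the authors counted correct analyses, so the unit of comparison (individual noun segments) and the example data are assumptions.

```python
def precision_recall(predicted, gold):
    """Standard set-based precision and recall."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical segmentation of one compound noun.
gold = {"정보", "검색", "시스템"}
predicted = {"정보", "검색시스템"}  # the analyzer failed to split the tail

precision, recall = precision_recall(predicted, gold)
```

In this toy case one of the two predicted segments is correct and one of the three gold segments is found, so an under-segmenting analyzer is penalized more heavily on recall than on precision.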

Morphological control and electrostatic deposition of silver nanoparticles produced by condensation-evaporation method (증발-응축법에 의해 발생된 은(silver) 나노입자의 구조제어 및 전기적 부착 특성 연구)

  • Kim, Whidong;Ahn, Ji Young;Kim, Soo Hyung
    • Particle and aerosol research / v.5 no.2 / pp.83-90 / 2009
  • This paper describes a condensation-evaporation method (CEM) for producing size-controlled spherical silver nanoparticles by perturbing coagulation and coalescence processes in the gas phase. Polydisperse silver nanoparticles generated by the CEM were first introduced into a differential mobility analyzer (DMA) to select a group of silver nanoparticles with the same electrical mobility, which also makes it possible to obtain a group of nanoparticles with elongated structures and the same projected area. The silver nanoparticles selected by the DMA were then sintered in situ at ${\sim}600^{\circ}C$ and were observed to turn into spherical nanoparticles through rapid coalescence. With a modified converging-type quartz reactor, we can also produce a number concentration of silver nanoparticles 10 times higher than with a general quartz reactor of uniform diameter. Finally, the 30 nm spherical silver nanoparticles were electrostatically deposited on the surface of a silicon substrate at a coverage rate of ~4%/hr. The method developed in this work for preparing size-controlled monodisperse silver nanoparticles can be applied to various studies characterizing the physical, chemical, optical, and biological properties of nanoparticles as a function of their size.

Characteristic Change of Fiber Depending on the Refining Conditions of Reconstituted Tobacco Process (판상엽 고해조건에 따른 섬유특성 변화 평가)

  • Han young-Rim;Sung Yong-Joo;Kim Sam-Kon;Kim Kun-Soo;Han In-Ho
    • Journal of the Korean Society of Tobacco Science / v.27 no.2 / pp.195-200 / 2005
  • The goal of refining is to treat fibers so that they meet the requirements of the papermaking process. The refining process in papermaking greatly influences the quality of the final product by changing fiber properties such as fiber length, shape, and fines content. In this study, the effect of refining conditions on the morphological change of fibers was investigated using a fiber morphology analyzer. Fiber morphology analyzers are used to determine which pulps are suitable for producing particular products, and they are widely used in paper mills to monitor paper quality. The morphological change of fibers under different refining conditions was evaluated by measuring fibers, shives, and fines. In terms of fiber morphology, the domestic reconstituted tobacco fiber had a larger average fiber length than the foreign reconstituted tobacco.