• Title/Summary/Keyword: Morphological analyzer

Practical Development and Application of a Korean Morphological Analyzer for Automatic Indexing (자동 색인을 위한 한국어 형태소 분석기의 실제적인 구현 및 적용)

  • Choi, Sung-Pil;Seo, Jerry;Chae, Young-Suk
    • The KIPS Transactions: Part B / v.9B no.5 / pp.689-700 / 2002
  • In this paper, we develop a Korean morphological analyzer for automatic indexing, which is essential for information retrieval. Because large document sets must be indexed efficiently, we concentrated on maximizing the speed of word analysis and on modularizing and structuring the system rather than on introducing new concepts or ideas. In this respect, the system is characterized by software engineering concerns for real-world use rather than by theoretical issues. First, a dictionary of words was structured. Then modules that analyze substantive words and inflected words were introduced. Furthermore, a numeral analyzer was developed, and an unknown-word analyzer based on morpheme patterns was introduced. The whole system was integrated into K-2000, an information retrieval system.

A Design of Japanese Analyzer for Japanese to Korean Translation System (일반 번역시스탬을 위한 일본어 해석기 설계)

  • 강석훈;최병욱
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.1 / pp.136-146 / 1995
  • In this paper, a Japanese morphological analyzer for a Japanese-to-Korean machine translation system is designed. The analyzer reconstructs the Japanese input sentence into word phrases that include grammatical and dictionary information. We propose an algorithm that separates morphemes and then connects them by reference to the corresponding Korean word phrases. We also define a connector to control Japanese word phrases; it is used to control the start and end points of each word phrase in a Japanese sentence, which is written without spaces. The proposed analyzer uses an analysis dictionary to perform more efficient analysis than the existing analyzer and decreases the number of dictionary searches. Because the proposed analyzer processes each word phrase in consideration of the corresponding Korean word phrase, it can generate more accurate Korean expressions than the existing one, which places great importance on generating the structure of the entire sentence.

Analysis of Reconstituted Tobacco Products by Characterizing Morphological Properties of Major Structure Materials (국내외산 판상엽 구성물질의 형태적 특성 비교)

  • Sung Yong-Joo;Han Young-Lim;Kim Sam-Gon;Kim Geun-Su;Joo Jeon-Hyun;Song Tae-Won
    • Journal of the Korean Society of Tobacco Science / v.27 no.2 / pp.189-194 / 2005
  • The morphological properties of various structure materials of domestic and foreign reconstituted tobacco products (RTPs) were investigated using the Bauer-McNett classifier and an image analyzer. The fiber classification showed that the fraction of larger structure materials was greater in the domestic RTP than in the two foreign RTPs. The fine fraction was also larger in the domestic RTP than in the two foreign RTPs. Images of each structure material showed that the scrap in the foreign RTPs kept its original shape, which was rare in the domestic RTP fractions. These results suggest that the raw materials in a foreign RTP process might be treated separately depending on their mechanical and morphological properties, which could reduce the generation of fines and increase the efficiency of raw material treatment.

A Study on the Morphological Analysis of Sperm (정자의 형태학적 특성 분석에 관한 연구)

  • Paick, Jae-Seung;Jeon, Seong-Soo;Kim, Soo-Woong;Yi, Won-Jin;Park, Kwang-Suk
    • Clinical and Experimental Reproductive Medicine / v.24 no.2 / pp.153-165 / 1997
  • In male reproductive health, fertility, and IVF (in vitro fertilization), semen analysis has been of central importance. Semen analysis can be divided into concentration, motility, and morphological analysis of sperm. Existing methods developed earlier concentrated on sperm motility analysis. To provide more useful and precise solutions for clinical problems such as infertility, semen analysis must include sperm morphological analysis. However, the traditional tools for semen analysis are subjective, imprecise, inaccurate, difficult to standardize, and difficult to reproduce. Therefore, drawing on advances in microcomputers and image processing techniques, we developed a new sperm morphology analyzer to overcome these problems. In this study, the agreement on percent normal morphology was studied between different observers and a computerized sperm morphology analyzer on a slide-by-slide basis using strict criteria. Slides from 30 different patients from the SNUH andrology laboratory were selected randomly. Microscopic fields and sperm cells were chosen randomly and percent normal morphology was recorded. The ability of the sperm morphology analyzer to repeat the same reading for normal and abnormal cells was also studied. The results showed no significant bias between two experienced observers. The limits of agreement were 4.1% to -3.8%. The Pearson correlation coefficient between readers was 0.79. The same findings were obtained between manual assessment and the sperm morphology analyzer. In these experiments the slides were stained by two different methods, PAP and Diff-Quik. The limits of agreement were 7.2% to -5.7% and 6.0% to -6.3%, respectively. The Pearson correlation coefficients were 0.76 and 0.91, respectively. The limits of agreement were tighter below 20% normal forms. In the repeatability experiments, 52 cells stained by the PAP and Diff-Quik methods were analyzed three times in succession. For pairwise agreement, the kappa statistics for the pairs were 0.76, 0.81, 0.86 and 0.75, 0.88, 0.88, respectively. This study showed good agreement between manual and computerized assessment of normal and abnormal cells. The repeatability and per-slide agreement of the computerized sperm morphology analyzer were excellent. The computer's ability to classify normal morphology per slide is promising. Based on the results obtained, this system can be of clinical value in both andrology laboratories and IVF units.
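The agreement measures named in this abstract can be sketched in a few lines. The abstract does not say how they were computed, so this is only an illustration under two assumptions: that "limits of agreement" means the usual mean difference ± 1.96 standard deviations, and that "kappa" means Cohen's kappa for two sets of labels. The function names and all sample values are hypothetical and are not data from the paper.

```python
import math
from statistics import mean, stdev


def limits_of_agreement(a, b):
    """Assumed definition: mean paired difference +/- 1.96 sample SDs."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias + spread


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cohen_kappa(labels1, labels2):
    """Assumed definition: Cohen's kappa for two label sequences."""
    n = len(labels1)
    categories = set(labels1) | set(labels2)
    observed = sum(x == y for x, y in zip(labels1, labels2)) / n
    expected = sum(
        (labels1.count(c) / n) * (labels2.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Made-up percent-normal readings from two hypothetical observers.
    reader_a = [10, 12, 14, 18, 22]
    reader_b = [9, 12, 15, 17, 23]
    print(limits_of_agreement(reader_a, reader_b))
    print(pearson_r(reader_a, reader_b))
    # Made-up normal/abnormal labels from two hypothetical runs.
    print(cohen_kappa(["n", "n", "a", "a"], ["n", "a", "a", "a"]))
```

Only the standard library is used here, so the formulas stay visible; a real study would more likely rely on an established statistics package.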

moHANA: Morphological Hangul Analyzer using Multi-Dimensional Analysis Dictionary (moHANA: 다차원 해석 사전을 기반으로 한 한국어 형태소 분석기)

  • Seo, SeungHyeon;Kang, In-Ho;Kim, JaeDong
    • Annual Conference on Human and Language Technology / 2007.10a / pp.99-106 / 2007
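  • This study concerns moHANA (Morphological Hangul Analyzer), a morphological analysis system that uses a multi-dimensional analysis dictionary so that all linguistic characteristics of Korean can be described and applied to actual morphological analysis. The analysis dictionary of moHANA consists of a tag-information dictionary, a lexical dictionary, and a grammar dictionary. Unlike the one-dimensional part-of-speech information of existing morphological analyzers, the tag-information dictionary is written as a five-dimensional vector of word-class tag information, morphological information, syntactic information, semantic information, and pragmatic information. The lexical dictionary holds each lexical item together with the tag information it can take, as a sequence ordered by priority, and the grammar dictionary contains a grammar that uses special grammar operators to specify whether the tags defined in the tag-information dictionary can be connected to one another. Defining the tag information of morphemes multi-dimensionally and expressing the grammar rules accordingly yields more detailed morphological analysis and makes it easy to insert and delete morpheme tags.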
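The three dictionaries described in this abstract can be pictured with a small data-structure sketch. Everything below is hypothetical: the field names, the example tag values, and the representation of the grammar as a set of allowed (left, right) tag pairs are assumptions made for illustration. The abstract does not define moHANA's actual formats or its special grammar operators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Five-dimensional tag; field names are illustrative assumptions."""
    word_class: str
    morphological: str
    syntactic: str
    semantic: str
    pragmatic: str


# Tag-information dictionary (toy values, not moHANA's real tag set).
NOUN = Tag("noun", "free", "argument", "entity", "neutral")
SUBJ = Tag("particle", "bound", "subject-marker", "none", "neutral")
VERB = Tag("verb-stem", "bound", "predicate", "action", "neutral")

# Lexical dictionary: each entry lists its possible tags in priority order.
LEXICON = {
    "학교": [NOUN],
    "가": [SUBJ, VERB],
}

# Grammar dictionary, simplified here to allowed (left, right) tag pairs.
GRAMMAR = {(NOUN, SUBJ)}


def can_connect(left: Tag, right: Tag) -> bool:
    """Return True if the grammar allows `right` to follow `left`."""
    return (left, right) in GRAMMAR


def best_reading(left_word: str, right_word: str):
    """Return the first connectable tag pair in priority order, or None."""
    for left_tag in LEXICON.get(left_word, []):
        for right_tag in LEXICON.get(right_word, []):
            if can_connect(left_tag, right_tag):
                return left_tag, right_tag
    return None


if __name__ == "__main__":
    print(best_reading("학교", "가"))
```

Because tags are plain values and the grammar refers to them only through the connection table, adding or removing a tag in this sketch means editing the dictionaries rather than the analysis code, which mirrors the ease of tag insertion and deletion that the abstract claims.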

Analyzer to Identify Phrases and the Functional Roles in Sentences: Its Architectural Aspects

  • Alam, Yukiko Sasaki
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.67-75 / 2007
  • This paper presents the architectural aspects of the phrase analyzer that attempts to recognize phrases and identify the functional roles in the sentences in formal Japanese documents. Since the object of interest is a phrase, the current system, designed in an object-oriented architecture, contains the Phrase class, and makes use of the linguistic generalization about languages with Case markers that a phrase, whether a noun phrase, a verb phrase, a postposition (or preposition) phrase or a clause phrase, can be separated into the content and the function components. Without a dictionary, and drawing on the orthographic information on the words to parse, it also contains a class that identifies the types of characters, a class representing grammar, and a class playing the role of a controller. The system has a simple and intuitive structure, externally and internally, and therefore is easy to modify and extend.
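A dictionary-free analyzer of this kind depends on telling character types apart. The sketch below shows one simple way to do that with Unicode code point ranges and to group a sentence into runs of the same type. The function names are hypothetical, and the paper's own character-type class may work differently.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Classify one character by its Unicode block (basic ranges only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def runs(text: str):
    """Group consecutive characters of the same type into (type, text) runs."""
    return [(kind, "".join(group)) for kind, group in groupby(text, key=char_type)]


if __name__ == "__main__":
    for kind, chunk in runs("東京に行く"):
        print(kind, chunk)
```

For "東京に行く" this yields kanji and hiragana runs in alternation. Treating kanji or katakana runs as content candidates and hiragana runs as function candidates is a common heuristic, offered here as an assumption and not as the paper's rule.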

An Experimental Approach of Keyword Extraction in Korean-Chinese Text (국한문 혼용 텍스트 색인어 추출기법 연구 『시사총보』를 중심으로)

  • Jeong, Yoo Kyung;Ban, Jae-yu
    • Journal of the Korean Society for Information Management / v.36 no.4 / pp.7-19 / 2019
  • The aim of this study is to develop a technique for keyword extraction from Korean-Chinese mixed text of the modern period. We considered a Korean morphological analyzer and a particle in classical Chinese as a possible method for this study. We applied our method to the journal "Sisachongbo," employing proper-noun dictionaries and a list of stop words to extract index terms. The results show that our system achieved better recall and precision than a Chinese morphological analyzer. This study is the first to develop an automatic indexing system for traditional Korean-Chinese mixed text.

Automatic Word Spacing Using Raw Corpus and a Morphological Analyzer (말뭉치와 형태소 분석기를 활용한 한국어 자동 띄어쓰기)

  • Shim, Kwangseob
    • Journal of KIISE / v.42 no.1 / pp.68-75 / 2015
  • This paper proposes a method for the automatic word spacing of unsegmented Korean sentences. In our method, eojeol monograms are used for word spacing, as opposed to the syllable n-grams used in previous studies. The use of a Korean morphological analyzer is limited to the correction of typical word spacing errors. Our method gives 98.06% syllable accuracy and 94.15% eojeol recall under 10-fold cross-validation on the Sejong corpus, after filtering out non-Hangul eojeols. The processing rate is 250K eojeols, or 1.8 MB, per second on a typical personal computer. Because syllable accuracy and eojeol recall are related to the size of the eojeol dictionary, better performance is expected with a bigger corpus.
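One common way to space an unsegmented sentence from single-eojeol counts is dynamic programming over all split points, choosing the split with the highest total log probability. The sketch below illustrates that general technique only; the abstract does not describe the author's algorithm, so this is not presented as it. The function name, the `max_len` limit, and the toy counts are all hypothetical.

```python
import math

# Toy eojeol counts, invented for illustration (not from the Sejong corpus).
TOY_COUNTS = {"나는": 50, "학교에": 30, "간다": 40, "나": 20, "는": 5}


def space_words(text, counts, max_len=10):
    """Insert spaces so the summed log probability of the pieces is maximal.

    Returns None when the text cannot be covered by known eojeols.
    """
    total = sum(counts.values())
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in counts and best[start] > -math.inf:
                score = best[start] + math.log(counts[piece] / total)
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        return None
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return " ".join(reversed(pieces))


if __name__ == "__main__":
    print(space_words("나는학교에간다", TOY_COUNTS))
```

With these toy counts the whole eojeol "나는" scores higher than the split "나" + "는", so the sketch prefers longer known eojeols whenever their counts support it.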

Keyword Retrieval-Based Korean Text Command System Using Morphological Analyzer (형태소 분석기를 이용한 키워드 검색 기반 한국어 텍스트 명령 시스템)

  • Park, Dae-Geun;Lee, Wan-Bok
    • Journal of the Korea Convergence Society / v.10 no.2 / pp.159-165 / 2019
  • Speech recognition based on deep learning has begun to be applied to commercial products, but it is still difficult to use in VR content because there is no easy and efficient way to process the recognized text after the speech recognition module. In this paper, we propose a Korean Language Command System that can efficiently recognize and respond to Korean speech commands. The system consists of two components: a morphological analyzer that analyzes sentence morphemes, and a retrieval-based model of the kind usually used to develop chatbot systems. Experimental results show that the proposed system requires only 16% of the commands to achieve the same level of performance as the conventional string comparison method. Furthermore, when working with the Google Cloud Speech module, it achieved a success rate of 60.1%. Experimental results show that the proposed system is more efficient than the conventional string comparison method.