• Title/Summary/Keyword: Morpheme Analysis


The Acquisition of Korean Grammatical Morphemes in Early Childhood (한국아동이 초기에 획득한 문법적 형태소의 종류 및 획득 시기)

  • Yi, Soon-Hyung
    • Korean Journal of Child Studies / v.21 no.4 / pp.51-68 / 2000
  • To determine when toddlers and children acquire the grammatical morphemes of Korean, this study examined their responses to picture tasks. The participants were 174 children aged 18 to 60 months, selected from two child-care centers in Seoul and Gyeonggi Province. Statistical analysis showed that 2- and 3-year-old children acquire most grammatical morphemes, such as nouns, pronouns, verbs, adverbs, adjectives, and interrogative terms. The acquisition process differed significantly among the six age groups, which supports the hypothesis that grammatical morphemes are acquired gradually.


Syllable-based POS Tagging without Korean Morphological Analysis (형태소 분석기 사용을 배제한 음절 단위의 한국어 품사 태깅)

  • Shim, Kwang-Seob
    • Korean Journal of Cognitive Science / v.22 no.3 / pp.327-345 / 2011
  • This paper proposes a new approach to Korean POS (part-of-speech) tagging. In previous work, a Korean POS tagger was treated as a post-processor of a morphological analyzer: the tagger selected the most likely morpheme/POS sequence from the analyzer's candidate analyses. In the proposed approach, the POS tagger instead generates the most likely sequence of morpheme/POS pairs directly from the input sentence. A POS-tagged corpus of 398,632 eojeols is used for training and a test set of 33,467 eojeols for evaluation. The proposed approach achieves a POS tagging accuracy of 96.31%.
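
As a rough illustration of what tagging at the syllable level can look like, the Python sketch below expands morpheme/POS pairs into one tag per syllable and collapses them back again. The B-/I- encoding, the function names, and the example eojeol are assumptions made for illustration; the abstract does not say which encoding the paper uses. The sketch also assumes NFC-normalized Hangul (one code point per syllable) and ignores irregular forms in which the surface syllables differ from the underlying morphemes.

```python
from typing import List, Tuple


def morphemes_to_syllable_tags(morphemes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand (morpheme, POS) pairs into per-syllable B-/I- tags (hypothetical encoding)."""
    tagged = []
    for surface, pos in morphemes:
        for i, syllable in enumerate(surface):
            # The first syllable of a morpheme begins it (B); the rest continue it (I).
            prefix = "B" if i == 0 else "I"
            tagged.append((syllable, f"{prefix}-{pos}"))
    return tagged


def syllable_tags_to_morphemes(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse per-syllable B-/I- tags back into (morpheme, POS) pairs."""
    morphemes: List[List[str]] = []
    for syllable, tag in tagged:
        prefix, pos = tag.split("-", 1)
        # Start a new morpheme on B, or when an I tag does not continue the previous POS.
        if prefix == "B" or not morphemes or morphemes[-1][1] != pos:
            morphemes.append([syllable, pos])
        else:
            morphemes[-1][0] += syllable
    return [(surface, pos) for surface, pos in morphemes]


# Illustrative eojeol "학교에": 학교 (common noun, NNG) + 에 (adverbial particle, JKB),
# written with Sejong-style tag names.
example = [("학교", "NNG"), ("에", "JKB")]
syllable_tags = morphemes_to_syllable_tags(example)
restored = syllable_tags_to_morphemes(syllable_tags)
```

Under this framing a tagger only has to predict one label per syllable, and the morpheme boundaries fall out of the B- labels, so no separate morphological analyzer is needed at prediction time.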


On the Analysis of Natural Language Processing Morphology for the Specialized Corpus in the Railway Domain

  • Won, Jong Un;Jeon, Hong Kyu;Kim, Min Joong;Kim, Beak Hyun;Kim, Young Min
    • International Journal of Internet, Broadcasting and Communication / v.14 no.4 / pp.189-197 / 2022
  • Today, we are exposed to various text-based media such as newspapers, Internet articles, and SNS, and the amount of text data we encounter has increased exponentially now that Internet access is available on mobile devices such as smartphones. Collecting useful information from large amounts of text is called text analysis, and, with the recent development of artificial intelligence, it is performed using technologies such as Natural Language Processing (NLP). For this purpose, morpheme analyzers based on everyday language have been released and are in use. Pre-trained language models, which acquire natural language knowledge through unsupervised learning on large corpora, have recently become a very common component of natural language processing, but conventional morpheme analyzers are of limited use in specialized fields. In this paper, as preliminary work toward a natural language analysis model specialized in the railway field, a procedure for constructing a corpus specialized in the railway field is presented.

Design and Implementation of Interactive Search Service based on Deep Learning and Morpheme Analysis in NTIS System (NTIS 시스템에서 딥러닝과 형태소 분석 기반의 대화형 검색 서비스 설계 및 구현)

  • Lee, Jong-Won;Kim, Tae-Hyun;Choi, Kwang-Nam
    • Journal of Convergence for Information Technology / v.10 no.12 / pp.9-14 / 2020
  • Currently, NTIS (National Technology Information Service) is building an interactive search service based on artificial intelligence technology. To understand users' search intentions and provide R&D information, the interactive search service is built on deep learning models and morpheme analyzers. The deep learning model is trained on log data accumulated from use of NTIS and the interactive search service, and it infers the user's search intention. The service then provides task information through step-by-step search. Understanding the search intent makes exception handling easier, and step-by-step search makes it easier and faster to obtain the desired information than integrated search. Future research should expand the range of information provided to users.

Korean Head-Tail Tokenization and Part-of-Speech Tagging by using Deep Learning (딥러닝을 이용한 한국어 Head-Tail 토큰화 기법과 품사 태깅)

  • Kim, Jungmin;Kang, Seungshik;Kim, Hyeokman
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.4 / pp.199-208 / 2022
  • Korean is an agglutinative language in which one or more morphemes combine to form a single word. Part-of-speech tagging separates each morpheme from a word and attaches a part-of-speech tag to it. In this study, we propose a new Korean part-of-speech tagging method based on the Head-Tail tokenization technique, which divides a word into a lexical morpheme part and a grammatical morpheme part without decomposing compound words. In this method, the Head and Tail are divided at a syllable boundary, without restoring irregularly inflected or contracted syllables. A Korean part-of-speech tagger was implemented using Head-Tail tokenization and deep learning. Fine-grained tags produce a large number of complex tags and lower the tagging accuracy; to address this, we reduced the tag set to complex tags composed of coarse-grained tags, which improved the tagging accuracy. The Head-Tail part-of-speech tagger was evaluated using BERT, syllable bigram, and subword bigram embeddings, and both the syllable bigram and the subword bigram embeddings improved performance over general BERT. Integrating the Head-Tail tokenization model with the simplified part-of-speech tagging model achieved 98.99% word-unit accuracy and 99.08% token-unit accuracy. The experiments also showed that part-of-speech tagging performance improved when the maximum token length was limited to twice the number of words.
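
To make the Head-Tail idea concrete, the Python sketch below splits an eojeol at a syllable boundary into a lexical head and a grammatical tail. It deliberately swaps the paper's deep learning model for a longest-suffix lookup against a tiny, hypothetical list of particles (`TAILS`), so the list, the function name, and the example words are assumptions made for illustration, not the authors' implementation.

```python
from typing import Tuple

# Hypothetical, deliberately tiny list of grammatical tails (particles).
# A real tagger would learn the boundary from data instead of using a list.
TAILS = ["에서", "에게", "으로", "은", "는", "이", "가", "을", "를", "에", "의", "로"]


def head_tail_split(eojeol: str) -> Tuple[str, str]:
    """Split an eojeol into (head, tail) at a syllable boundary.

    The head is kept whole (compound words are not decomposed), and the
    surface syllables are left as they are (no restoration of irregular forms).
    """
    # Try longer tails first so that "에서" wins over "서"-like shorter matches.
    for tail in sorted(TAILS, key=len, reverse=True):
        # Require a non-empty head so a word is never consumed entirely by the tail.
        if eojeol.endswith(tail) and len(eojeol) > len(tail):
            return eojeol[: -len(tail)], tail
    # No known tail: the whole eojeol is treated as the head.
    return eojeol, ""


split_with_tail = head_tail_split("학교에서")   # "at school"
split_without_tail = head_tail_split("서울")    # "Seoul", no particle attached
```

A pure suffix lookup over-splits words that merely end in a particle-like syllable (for example, a noun ending in 이), which is one reason a learned model is used for this boundary decision in practice.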

Morphology Representation using STT API in Rasbian OS (Rasbian OS에서 STT API를 활용한 형태소 표현에 대한 연구)

  • Woo, Park-jin;Im, Je-Sun;Lee, Sung-jin;Moon, Sang-ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.373-375 / 2021
  • For Korean, tagging based on English-style word tokenization is less promising than it is for English. A corpus tokenized into morpheme units with KoNLPy can be represented as a graph database, but complete separation of voice files and verification of practicality are required when converting the module from graph database to corpus. This paper demonstrates morphology representation using an STT API on a Raspberry Pi. The voice file is converted to a corpus, then analyzed and tagged with KoNLPy. The analysis results are represented in a graph database and can be split into morpheme-level tokens; by assessing practicality and the degree of separation, the authors judge that purpose-specific data mining is feasible.


Alveolar Fricative Sound Errors by the Type of Morpheme in the Spontaneous Speech of 3- and 4-Year-Old Children (자발화에 나타난 형태소 유형에 따른 3-4세 아동의 치경마찰음 오류)

  • Kim, Soo-Jin;Kim, Jung-Mee;Yoon, Mi-Sun;Chang, Moon-Soo;Cha, Jae-Eun
    • Phonetics and Speech Sciences / v.4 no.3 / pp.129-136 / 2012
  • Korean alveolar fricatives are late-developing speech sounds. Most previous research on phonemes elicited sounds using individual words or pseudo-words, but word-level phonological analysis does not always reflect a child's practical articulation ability. Research on articulation development that examines speech production by grammatical morpheme has also been limited, despite the importance of grammatical morphemes in the Korean language. Therefore, this research examines the articulation development and phonological patterns of the /s/ phoneme by morpheme type in children's spontaneous conversational speech. The subjects were twenty-two typically developing 3- and 4-year-old Korean children. All children showed normal levels on three screening tests: hearing, vocabulary, and articulation. Spontaneous conversational samples were recorded at the children's homes. The results are as follows. Error rates decreased with increasing age in all morphological contexts. Within each age group, error percentages were significantly lower in lexical morphemes than in grammatical morphemes. Stopping of fricative sounds was the main error pattern in all morphological contexts and decreased with age. This research shows that articulation performance can differ significantly by morphological context. The present study provides data that can be used to identify difficult contexts for the articulatory evaluation and therapy of alveolar fricative sounds.

Korean Semantic Role Labeling using Stacked Bidirectional LSTM-CRFs (Stacked Bidirectional LSTM-CRFs를 이용한 한국어 의미역 결정)

  • Bae, Jangseong;Lee, Changki
    • Journal of KIISE / v.44 no.1 / pp.36-43 / 2017
  • Syntactic information represents the dependency relations between predicates and arguments, and it helps improve the performance of Semantic Role Labeling (SRL) systems. However, syntactic analysis adds computational overhead, and its errors propagate into the SRL system. To solve this problem, we exclude syntactic information and use only morpheme information to construct the SRL system. In this study, we propose an end-to-end SRL system that uses only morpheme information with a Stacked Bidirectional LSTM-CRFs model, which extends the LSTM RNN, a model well suited to sequence labeling problems. Our experimental results show that the proposed model performs better than other models.

Utilizing Prosodic Information on the Sentence Comprehension in Children with High Functioning Autism

  • Chung, Chan-Hee;Lee, Hee-Ran;Kim, Jin-Dong
    • Biomedical Science Letters / v.23 no.4 / pp.362-371 / 2017
  • The purpose of this study is to investigate difficulties in using prosodic information to identify the meaning of ambiguous sentences in children with high functioning autism (HFA). Fifteen children with HFA and fifteen children matched on chronological age (CA) participated in this study. We compared the performance of the two groups on syntactically and affectively ambiguous sentence comprehension (SASC and AASC) tasks. The results show that, in both tasks, the difference between the two groups was statistically significant in each condition, and the performance of the children with HFA was significantly lower. In a correlation analysis of major variables, the CA-matched children showed a correlation between prosody-only (PO) and AASC, while the children with HFA showed a correlation between PO and MO (morpheme-only). Children with HFA used grammatical morpheme information to understand general sentences. We found that the ability to use prosodic information is significantly lower in children with HFA than in typically developing children. Considering the relevance of prosody to linguistic, non-linguistic, and emotional aspects of communication, improving prosodic perception is thought to be a way to mediate deficits in the comprehension of ambiguous sentences in children with HFA.

Analyzing Morpheme of the Natural Language to Express the Symptoms of Korean Medicine (한의학 증상용어의 형태소 분석을 위한 자연어 표기 분석)

  • Kim, Hye-Eun;Sung, Ho-Kyung;Eom, Dong-Myung;Lee, Choong-Yeol;Lee, Byung-Wook
    • Journal of Society of Preventive Korean Medicine / v.17 no.2 / pp.179-187 / 2013
  • Objectives: In many cases, patients' symptoms are recorded in EMRs in natural language rather than in medical terminology. By analyzing the natural-language expressions that describe the symptoms of Korean Medicine (KM), it is possible to build a database of those expressions. With such a database, when doctors record patients' symptoms in an EMR in natural language, it also becomes possible to extract the KM symptoms from that text. The database will enhance the value of the EMR as medical data. Methods: In this study, we aimed to build a data structure for the terminology that represents the symptoms of KM. The data structure consists of combinations of the smallest units of natural language. We built the database by analyzing the morphemes of the natural-language expressions for the symptoms of KM. Results & Conclusions: By classifying the natural-language expressions into 15 features, we produced a concept structure and data that can be used for morphological analysis.