• Title/Summary/Keyword: English morpheme

A study on the correlation between the introduction order of English morphemes in the English textbook for the 7th graders and the natural order hypothesis (중학교 1학년 영어 교과서의 영어 형태소 도입 순위와 자연적 순서 가설과의 상관관계 연구)

  • Sohng, Hae-Sung
    • English Language & Literature Teaching
    • /
    • v.9 no.1
    • /
    • pp.131-152
    • /
    • 2003
  • The purpose of this study is to investigate the correlation between the introduction order of 9 English morphemes in middle school English textbooks and the order in which those morphemes are learned by 7th graders studying English as a foreign language. The subjects were 139 students in two middle schools who learn English with different textbooks. The introduction order of each morpheme in the two textbooks was examined according to its quantity and frequency. Data on the actual learning order were collected through the written SLOPE test, and each morpheme was ranked by its group score. The introduction order of each morpheme in the textbook and the actual learning order were analyzed by Spearman rank-order correlation. The correlation between the two was shown to be very low, which means that the textbooks do not take the learning order of English morphemes into account. It was also shown that in the earlier stage of learning English the introduction order of each morpheme in the textbook had much influence on its learning order, but that this influence gradually diminished in the later stage. This means that the learning order of English morphemes approaches the natural order as time passes.
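The Spearman rank-order correlation used in this study can be computed directly from the two rankings. A minimal sketch, assuming no tied ranks; the morpheme ranks below are invented for illustration and are not the study's data.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rank-order correlation for two rankings with no ties:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the difference between the two ranks of item i."""
    n = len(rank_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks for 9 morphemes: textbook introduction order vs.
# learning order measured by a SLOPE-style test (illustrative only).
textbook_order = [1, 2, 3, 4, 5, 6, 7, 8, 9]
learning_order = [3, 1, 5, 2, 8, 4, 9, 6, 7]
print(spearman_rho(textbook_order, learning_order))
```

A value near 1 would indicate that the textbook's introduction order closely tracks the learning order; the study's finding was a very low correlation.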

On Subjunctives in Korean: Exploiting a Bilingual Corpus

  • Song, Sanghoun
    • Language and Information
    • /
    • v.18 no.1
    • /
    • pp.1-32
    • /
    • 2014
  • This paper provides a corpus study on subjunctives in Korean from the perspective of comparative semantics. The arguments are bolstered by distributional evidence taken from naturally occurring bitexts (i.e. a bilingual corpus), in which each sentence in one language is aligned with its translation in the other. Since previous studies regard past tense morphology as the main component expressing irrealis and uncertainty, this paper examines whether the past tense morpheme (e/a)ss in Korean is likewise responsible for conveying the meaning of subjunctives. My finding is that the past tense morpheme (e/a)ss is a sufficient condition for forming subjunctives in Korean. The corpus study verifies that the past tense morpheme is not obligatorily used in present conditional counterfactuals in Korean, unlike in English. Yet, if (e/a)ss is used and the antecedent denotes a present situation, the conditional sentence can only be interpreted as conveying counterfactuality. On the other hand, wish constructions in Korean, irrespective of semantic tense, often contain the past tense morpheme. Hence, this work substantiates that Iatridou's (2000) theory of 'fake past tense' is applicable to Korean subjunctives. The corpus study additionally reveals that the conditional marker telamyen is a component for expressing past counterfactuals in Korean.

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep learning sentiment analysis of English texts, natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These have been widely used in studies of sentiment analysis of reviews from fields such as restaurants, movies, laptops, and cameras. Unlike in English, the morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with highly developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for the word '예쁘고', the morphemes are '예쁘' (adjective stem) and '고' (connective ending). Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model?
Is it appropriate to apply a typical word vector model, which relies primarily on the form of words, to Korean, with its high ratio of homonyms? Will text preprocessing such as correcting spelling or spacing errors affect classification accuracy, especially when drawing morpheme vectors from Korean product reviews containing many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarize them as three central research questions. First, which is more effective as the initial input to a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with respect to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we reach a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? To address these research questions, we generate various types of morpheme vectors reflecting them and then compare classification accuracy using a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. As training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 Naver News articles, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria.
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of data preprocessing, namely, only splitting sentences, or additionally correcting spelling and spacing after sentence separation. Third, they vary in the form of input fed into the word vector model: whether the morphemes are entered on their own or with their POS tags attached. The morpheme vectors further vary depending on the range of POS tags considered, the minimum frequency for a morpheme to be included, and the random initialization range. All morpheme vectors are derived with a CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. The results suggest that using same-domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. POS tag attachment, devised for the high proportion of homonyms in Korean, and the minimum frequency standard for a morpheme to be included appear to have no definite influence on classification accuracy.
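The CBOW setup described above predicts each morpheme from its surrounding context window. A minimal sketch of how morpheme-tokenized sentences become (context, target) training pairs; the morphemes and POS tags below are invented for illustration, and a real model (e.g. via a Word2Vec implementation) would then learn vectors from such pairs.

```python
def cbow_pairs(morphemes, window=5):
    """Generate (context, target) pairs as used to train a CBOW model:
    each morpheme is predicted from up to `window` morphemes on each side."""
    pairs = []
    for i, target in enumerate(morphemes):
        context = morphemes[max(0, i - window):i] + morphemes[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

# Illustrative morpheme sequence with POS tags attached, one of the
# input variants compared in the study.
sentence = ["색상/NNG", "이/JKS", "예쁘/VA", "고/EC", "좋/VA", "아요/EF"]
for context, target in cbow_pairs(sentence, window=2):
    print(target, "<-", context)
```

Entering "예쁘/VA" rather than "예쁘" is the POS-tag-attached variant intended to separate homonyms; the study found this had no definite effect on accuracy.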

Temporal Interpretation Rules (시제 해석 규칙)

  • Chung, So-Woo
    • Language and Information
    • /
    • v.3 no.1
    • /
    • pp.1-20
    • /
    • 1999
  • The purpose of this paper is to extend the syntactic analysis of tense in English developed in Stowell (1993, 1995, 1996). Stowell treats Tense as a dyadic predicate of temporal ordering which takes two time-denoting phrases as its arguments. He further argues that the two morphemes 'present' and 'past' are polarity-sensitive elements encoding an LF-scope relation with respect to true PAST tense. This paper proposes that English future 'will' should be treated as a true tense and that its future morpheme is an anti-PAST polarity item. It also provides a syntactic interpretation of a peculiar morphological aspect of English, namely that it has no future form of the verb. To this end, Stowell's analysis is incorporated into the Minimalist Program of Chomsky (1995). It is proposed that, unlike in other languages such as French and Spanish, FUTURE in English is an affix. This provides an intuitively correct account of why English verbs do not have a future form as verbs in other languages do. Last but not least, this paper argues that Ogihara's (1995a) claim that the referential theory of tensed sentences is inadequate is itself untenable.

Morphology Representation using STT API in Rasbian OS (Rasbian OS에서 STT API를 활용한 형태소 표현에 대한 연구)

  • Woo, Park-jin;Im, Je-Sun;Lee, Sung-jin;Moon, Sang-ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.373-375
    • /
    • 2021
  • In the case of Korean, tagging through word-level tokenization, as is done for English, offers less potential for development than it does for English. Although the corpus tokenized into morpheme units via KoNLPy can be represented as a graph database, full separation of the voice files and verification of practicality are required when converting the module from a graph database back to a corpus. In this paper, morphology representation using an STT API is demonstrated on a Raspberry Pi. The voice file converted to a corpus is analyzed with KoNLPy and tagged. The analyzed results are represented as graph databases and can be divided into tokens separated by morpheme, and we judge that data mining extraction for specific purposes is possible by determining the practicality and degree of separation.
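The pipeline described above (STT transcript → morpheme tagging → graph representation) can be sketched minimally as follows. The tagger here is a hard-coded stub standing in for KoNLPy's POS taggers, and plain Python lists stand in for a real graph database; the example sentence and its tags are illustrative only.

```python
def stub_pos_tagger(text):
    """Stand-in for a KoNLPy POS tagger: returns (morpheme, tag) pairs.
    A real setup would call a tagger such as KoNLPy's Okt on STT output."""
    # Illustrative fixed tagging for one hypothetical STT transcript.
    return [("날씨", "Noun"), ("가", "Josa"), ("좋", "Adjective"), ("다", "Eomi")]

def to_graph(tagged):
    """Represent tagged morphemes as a graph: each (morpheme, tag) pair is
    a node, and adjacency within the utterance defines the edges."""
    nodes = list(tagged)
    edges = [(tagged[i], tagged[i + 1]) for i in range(len(tagged) - 1)]
    return nodes, edges

nodes, edges = to_graph(stub_pos_tagger("날씨가 좋다"))
print(len(nodes), len(edges))  # 4 morpheme nodes, 3 adjacency edges
```

In the paper's setup, the nodes and edges would instead be written to a graph database, from which morpheme-level tokens can be extracted for data mining.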

A Study on the Development of English Inflectional Morphemes Based on the CHILDES Corpus (CHILDES 코퍼스를 기반으로 한 아동의 영어 굴절형태소 발달 연구)

  • Min, Myung Sook;Jun, Jongsup;Lee, Sun-Young
    • Korean Journal of Cognitive Science
    • /
    • v.24 no.3
    • /
    • pp.203-235
    • /
    • 2013
  • The goal of this paper is to test findings in the literature about English-speaking children's acquisition of inflectional morphemes using a large-scale database. For this, we obtained a 4.7-million-word corpus from the CHILDES (Child Language Data Exchange System) database and analyzed 1,630 British and American children's uses of English inflectional morphemes up to age 7. We analyzed the type and token frequencies, the type per token ratio (TTR), and the lexical diversity (D) for such inflectional morphemes as the present progressive -ing, the past tense -(e)d, and the comparative and superlative -er/-est with reference to children's nationality and age group. To sum up our findings, the correlation between the D value and children's age varied from morpheme to morpheme; e.g. we found no correlation for -ing, a marginal correlation for -ed, and a strong correlation for -er/-est. Our findings are consistent with Brown's (1973) classical observation that children learn progressive forms earlier than the past tense marker. In addition, overgeneralization errors were frequently found for -ed but rarely for -ing, showing a U-shaped developmental pattern at ages 2-3. Finally, American children showed higher D scores than British children, which indicates that American children used inflectional morphemes over more word types than British children did. The present study has its significance in testing earlier findings in the literature with a well-defined methodology for analyzing the entire CHILDES database.
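Of the measures above, the type per token ratio (TTR) is the simplest: distinct forms divided by total occurrences. A minimal sketch; the token list is invented for illustration (lexical diversity D, which corrects TTR for sample size, is a more involved estimate and is not reproduced here).

```python
def type_token_ratio(tokens):
    """Type per token ratio (TTR): number of distinct forms (types)
    divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens)

# Hypothetical child-utterance tokens carrying the progressive -ing.
ing_tokens = ["going", "playing", "going", "eating", "going"]
print(type_token_ratio(ing_tokens))  # 3 types / 5 tokens = 0.6
```

A higher TTR for a morpheme means the child applies it across more distinct word types rather than repeating it on a few fixed words.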

The Types of Korean As-Parenthetical Constructions

  • Kim, Mija
    • Language and Information
    • /
    • v.19 no.1
    • /
    • pp.37-57
    • /
    • 2015
  • This paper is primarily intended to provide new insight into whether the structural properties of As-Parenthetical constructions shown by Potts (2002) might be regarded as cross-linguistically common. As a first attempt, it introduces the characteristics of Korean As-Parentheticals by carefully investigating the data, focusing on the similarities and differences between the two languages from a constructional theoretical perspective. The paper provides three properties of Korean as-clauses in their morphological and syntactic aspects. First, the morpheme 'as' of the English as-clause is realized in Korean as three different bound morphemes: Korean as-clauses can be introduced by '-tusi', '-chelem', and '-taylo', and unlike their English counterpart, these are bound morphemes which do not stand alone. Even though they attach to different morpho-syntactic stems, they produce no meaning change within this clause. Secondly, two syntactic types of as-clauses can also be found in Korean, similarly to English: the CP-As type and the Predicate-As type, depending on which type of gap they involve. English has one further subtype of the Predicate-As type (the inverted Predicate-As clause), while Korean does not. Thirdly, the various mismatches between the gap and the antecedent come from the constructional restrictions of as-clauses in Korean. In addition, the paper displays various ambiguities arising from as-clauses through disjoint reference or negative sentences in As-Parenthetical constructions.

Development of Tourism Information Named Entity Recognition Datasets for the Fine-tune KoBERT-CRF Model

  • Jwa, Myeong-Cheol;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.14 no.2
    • /
    • pp.55-62
    • /
    • 2022
  • A smart tourism chatbot is needed as a user interface to efficiently provide smart tourism services such as recommended travel products, tourist information, my travel itinerary, and tour guide service to tourists. We have developed a smart tourism app and a smart tourism information system that provide smart tourism services to tourists. We also developed a smart tourism chatbot service consisting of the khaiii morpheme analyzer, rule-based intention classification, and a tourism information knowledge base using the Neo4j graph database. In this paper, we develop the Korean and English smart tourism Named Entity (NE) datasets required for developing an NER model using pre-trained language models (PLMs) for the smart tourism chatbot system. We create the tourism information NER datasets by collecting source data through the smart tourism app, the visitJeju web of the Jeju Tourism Organization (JTO), and web search, and preprocessing them using Korean and English tourism information Named Entity dictionaries. We train the KoBERT-CRF NER model on the developed Korean and English tourism information NER datasets. The weighted-average precision, recall, and F1 scores are 0.94, 0.92, and 0.94 on the Korean and English tourism information NER datasets.
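The weighted-average scores reported above combine per-entity-type results, weighting each type by its support (number of gold instances). A minimal sketch of that aggregation; the entity labels and scores below are hypothetical, not the paper's results.

```python
def weighted_average_scores(per_class):
    """Support-weighted precision, recall, and F1, as commonly reported
    for NER. per_class maps label -> (precision, recall, support);
    F1 per class is the harmonic mean of precision and recall."""
    total = sum(support for _, _, support in per_class.values())
    wp = wr = wf = 0.0
    for precision, recall, support in per_class.values():
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        wp += precision * support / total
        wr += recall * support / total
        wf += f1 * support / total
    return wp, wr, wf

# Hypothetical per-entity-type scores for a tourism NER model.
scores = {"PLACE": (0.95, 0.93, 300), "FOOD": (0.90, 0.88, 100)}
print(weighted_average_scores(scores))
```

Weighting by support means frequent entity types (e.g. place names in tourism text) dominate the reported averages.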

A Constraint on Lexical Transfer: Implications for Computer-Assisted Translation (CAT)

  • Park, Kabyong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.11
    • /
    • pp.9-16
    • /
    • 2016
  • The central goal of the current paper is to investigate lexical transfer between Korean and English, to identify rule-governed behavior, and to provide implications for the development of computer-assisted translation (CAT) software for the two languages. It will be shown that Sankoff and Poplack's Free Morpheme Constraint cannot account for the full range of data. A constraint is proposed whereby a set of case-assigners such as verbs, INFL, prepositions, and the possessive marker may not undergo lexical transfer. The translation software is also expected to incorporate the proposed claim that English verbs are actually borrowed as nouns or as defective verbs to escape the direct attachment of inflectional morphemes.

Fake News Detection Using Deep Learning

  • Lee, Dong-Ho;Kim, Yu-Ri;Kim, Hyeong-Jun;Park, Seung-Myun;Yang, Yu-Jun
    • Journal of Information Processing Systems
    • /
    • v.15 no.5
    • /
    • pp.1119-1130
    • /
    • 2019
  • With the wide spread of Social Network Services (SNS), fake news, a way of disguising false information as legitimate media, has become a big social issue. This paper proposes a deep learning architecture for detecting fake news written in Korean. Previous works proposed appropriate fake news detection models for English, but Korean poses two issues that prevent applying existing models directly: first, Korean can express the same meaning in shorter sentences than English, and the resulting feature scarcity makes it difficult to operate a deep neural network; second, semantic analysis is difficult due to morpheme ambiguity. We worked to resolve these issues by implementing a system using various convolutional neural network-based deep learning architectures and "FastText", a word-embedding model learned at the syllable unit. After training and testing the implementation, we achieved meaningful accuracy for classifying discrepancies between body and context, but accuracy was low for classifying discrepancies between headline and body.
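Learning embeddings at the syllable unit, as described above, means tokenizing Korean text into individual syllable characters (and, in FastText's style, character n-grams) rather than whole words. A minimal sketch of that tokenization step only; the helper names and the example string are illustrative, and the actual embedding training is not reproduced here.

```python
def syllable_units(text):
    """Split text into syllable-level units by dropping whitespace and
    keeping each remaining character (one Korean syllable block each)."""
    return [ch for ch in text if not ch.isspace()]

def syllable_ngrams(word, n=2):
    """Character (syllable) n-grams with boundary markers, in the spirit
    of FastText's subword units."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(syllable_units("가짜 뉴스"))   # ['가', '짜', '뉴', '스']
print(syllable_ngrams("뉴스", 2))   # ['<뉴', '뉴스', '스>']
```

Because Korean packs more meaning into fewer, denser tokens than English, syllable-level units give the model more (sub)features per sentence, which is the paper's remedy for feature scarcity.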