• Title/Summary/Keyword: Learner Corpus

Native Language Identification for Korean Learner Corpus (한국어 학습자 말뭉치의 모어 판별)

  • Hur, Heuijung;Chung, Seung Yeon;Kim, Han-Saem
    • Annual Conference on Human and Language Technology / 2021.10a / pp.300-304 / 2021
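  • Native language identification is the task of automatically identifying learners' first language from the target-language text they produce while acquiring a second language. Many studies have addressed how to perform native language identification successfully, but very few have targeted Korean. In this study, we experimented with a range of machine learning and deep learning document classification models on Korean learner texts, and through these experiments sought to identify the conditions needed to build a model suited to native language identification for Korean learner texts.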

Computer Codes for Korean Sounds: K-SAMPA

  • Kim, Jong-mi
    • The Journal of the Acoustical Society of Korea / v.20 no.4E / pp.3-16 / 2001
  • An ASCII encoding of Korean has been developed for extended phonetic transcription in the Speech Assessment Methods Phonetic Alphabet (SAMPA). SAMPA is a machine-readable phonetic alphabet used for multilingual computing. It has been under development since 1987 and has been extended to more than twenty languages. The motivation for creating Korean SAMPA (K-SAMPA) is to label Korean speech for a multilingual corpus, or to transcribe the native-language (L1) interfered pronunciation of a second language learner for bilingual education. Korean SAMPA represents each Korean allophone with a particular SAMPA symbol. Sounds that closely resemble one another are represented by the same symbol, regardless of the language in which they are uttered. Each symbol represents a speech sound that is spectrally and temporally distinct enough to be perceptually different when the components are heard in isolation. Each type of sound has a separate IPA-like designation. The paper argues that Korean SAMPA is superior to other transcription systems with similar objectives. It describes the cross-linguistic sound quality of Korean better than the official Romanization system, proclaimed by the Korean government in July 2000, because it uses an internationally shared phonetic alphabet. It is also phonetically more accurate than the official Romanization in that it dispenses with orthographic adjustments, and it is more convenient for computing than the International Phonetic Alphabet (IPA) because it consists only of symbols on a standard keyboard. This paper demonstrates how Korean SAMPA can express allophonic details and prosodic features by adopting the transcription conventions of the extended SAMPA (X-SAMPA) and the prosodic SAMPA (SAMPROSA).

Study on the Use of Objectification Strategy in Academic Writing (학술적 글쓰기에서의 객관화 전략 사용 양상 연구 - 한국어 학습자와 한국어 모어 화자 간의 비교를 중심으로 -)

  • Kim, Han-saem;Bae, Mi-yeon
    • Cross-Cultural Studies / v.49 / pp.95-126 / 2017
  • The purpose of this paper is to compare learners' academic texts with those of native speakers and to examine in detail how learners use objectification strategies. Objectivity is treated as a discourse mechanism for describing the results of academic inquiry in a scientific way, with universality and validity. To characterize it, we analyzed related concepts such as intentionality, accuracy, and mitigation, together with the linguistic markers of objectification strategies. The comparison showed that the markers that reveal objectivity partly overlap with the markers of these related mechanisms and partly form a distinct set. Objectification markers can be broadly classified as emphasizing the stativity of research results, separating research subjects from research results, and generalizing research contents. Durative expressions and noun phrases emphasize stativity; impersonal expressions, passive expressions, and self-quotations maintain distance between the claimant and the writer; and pluralization through first-person pronouns and suffixes contributes to generalization. Learners used impersonal expressions of the quotation type far less often than native speakers, which could be due to a lack of familiarity with the citation conventions of Korean academic texts. Likewise, in generalizing research contents, learners used the expression 'we' far less often than native speakers.

A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews (강건한 한국어 상품평의 감정 분류를 위한 패턴 기반 자질 추출 방법)

  • Shin, Jun-Soo;Kim, Hark-Soo
    • Journal of KIISE:Software and Applications / v.37 no.12 / pp.946-950 / 2010
  • Many sentiment categorization systems based on machine learning use morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally do not perform well in the customer review domain because online customer reviews contain many spacing and spelling errors. The poor performance of these underlying analyzers in turn degrades the sentiment categorization systems built on them. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) patterns and phoneme patterns. Both kinds of patterns are automatically constructed from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols that include content words such as nouns and verbs. Phoneme patterns consist of the leading consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system using an SVM (Support Vector Machine) as the machine learner. In an experiment with Korean customer reviews, the sentiment categorization system using the proposed method outperformed one using a morphological analyzer as the feature extractor.
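
The phoneme-pattern idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `lead_vowel_pairs` and `longest_match`, the string encoding of each pair, and the tiny pattern set are all assumptions made for illustration. The only fixed fact it relies on is the standard Unicode arithmetic for precomposed Hangul syllables (base U+AC00, 19 leading consonants, 21 vowels, 28 final-consonant slots).

```python
# Leading consonants and vowels in Unicode Hangul syllable order.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"

HANGUL_BASE = 0xAC00
N_VOWELS = 21
N_TAILS = 28  # 27 final consonants plus "no final consonant"
N_SYLLABLES = len(LEADS) * N_VOWELS * N_TAILS


def lead_vowel_pairs(word):
    """Return the leading consonant + vowel of each Hangul syllable.

    Final consonants are discarded; non-Hangul characters are skipped.
    """
    pairs = []
    for ch in word:
        offset = ord(ch) - HANGUL_BASE
        if 0 <= offset < N_SYLLABLES:
            lead, rest = divmod(offset, N_VOWELS * N_TAILS)
            vowel = rest // N_TAILS
            pairs.append(LEADS[lead] + VOWELS[vowel])
    return pairs


def longest_match(pairs, patterns):
    """Return the longest prefix of `pairs` found in `patterns`, or None."""
    for end in range(len(pairs), 0, -1):
        candidate = tuple(pairs[:end])
        if candidate in patterns:
            return candidate
    return None


# Hypothetical pattern set; a real one would be built from a tagged corpus.
patterns = {("ㅈㅗ",), ("ㅈㅗ", "ㅇㅏ")}

correct = lead_vowel_pairs("좋아요")   # standard spelling
misspelt = lead_vowel_pairs("조아요")  # final consonant dropped
matched = longest_match(misspelt, patterns)
```

Because the final consonant is ignored, the standard spelling and the variant without it reduce to the same pair sequence, so both would hit the same pattern in a longest-match lookup.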

Comparison of vowel lengths of articles and monosyllabic nouns in Korean EFL learners' noun phrase production in relation to their English proficiency (한국인 영어학습자의 명사구 발화에서 영어 능숙도에 따른 관사와 단음절 명사 모음 길이 비교)

  • Park, Woojim;Mo, Ranm;Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.12 no.3 / pp.33-40 / 2020
  • The purpose of this research was to determine the relation between Korean learners' English proficiency and the ratio of the length of the stressed vowel in a monosyllabic noun to that of the unstressed vowel in the article of a noun phrase (e.g., "a cup", "the bus", etc.). Generally, the vowels in monosyllabic content words are phonetically more prominent than those in monosyllabic function words because the former carry phrasal stress, which makes the vowels in content words longer, higher in pitch, and louder. Based on speech samples from the Korean-Spoken English Corpus (K-SEC) and the Rated Korean-Spoken English Corpus (Rated K-SEC), this study examined 879 English noun phrases, each composed of an article and a monosyllabic noun, taken from sentences rated on four levels of proficiency. The vowel lengths in these 879 target NPs were measured, and the ratio of the vowel length in the noun to that in the article was calculated. The higher the proficiency level, the greater the mean ratio of noun vowel length to article vowel length, confirming the research hypothesis. The study thus concluded that, for Korean learners of English, higher English proficiency goes with more conspicuous length differences between stressed and unstressed vowels.
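
The measure described in the abstract above, the ratio of noun vowel length to article vowel length averaged per proficiency level, can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the durations below are invented placeholder values, not data from K-SEC or Rated K-SEC.

```python
from statistics import mean


def vowel_length_ratio(noun_vowel_ms, article_vowel_ms):
    """Ratio of the stressed noun vowel length to the article vowel length."""
    return noun_vowel_ms / article_vowel_ms


def mean_ratio_by_level(samples):
    """Average the ratio over all noun phrases at each proficiency level.

    `samples` holds (level, noun_vowel_ms, article_vowel_ms) tuples.
    """
    by_level = {}
    for level, noun_ms, article_ms in samples:
        ratio = vowel_length_ratio(noun_ms, article_ms)
        by_level.setdefault(level, []).append(ratio)
    return {level: mean(ratios) for level, ratios in sorted(by_level.items())}


# Invented durations in milliseconds, for illustration only.
samples = [
    (1, 90, 60),
    (1, 100, 50),
    (4, 150, 50),
    (4, 120, 60),
]
ratios = mean_ratio_by_level(samples)
```

With these placeholder values the mean ratio is larger at level 4 than at level 1, which is the direction of the trend the study reports for its real measurements.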