• Title/Summary/Keyword: English Word


Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep-learning-based sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the models. In this case, word vectors generally refer to vector representations of words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in sentiment analysis studies of reviews from various fields such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with well-developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Given the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use the 'morpheme vector' as the input to a deep learning model rather than the 'word vector' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (part-of-speech) tags when deriving morpheme vectors to improve the classification accuracy of a deep learning model? 
Is it appropriate to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarize these issues as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? To address these research questions, we generate various types of morpheme vectors reflecting them and then compare the classification accuracy of a non-static CNN (convolutional neural network) model that takes the morpheme vectors as input. For the training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 items of Naver News data, arguably the counterpart of the Google News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. 
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of data preprocessing, namely, sentence splitting only, or sentence splitting followed by additional spelling and spacing corrections. Third, they vary in the form of input fed into the word vector model: the morphemes are entered either on their own or with their POS tags attached. The morpheme vectors further vary depending on the range of POS tags considered, the minimum frequency of the morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (continuous bag-of-words) model, with a context window of 5 and a vector dimension of 300. The results suggest that better classification accuracy is obtained by using text from the same domain even when its grammatical correctness is lower, by performing spelling and spacing corrections in addition to sentence splitting, and by incorporating morphemes of all POS tags, including the incomprehensible category. The POS tag attachment, which was devised to handle the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not seem to have any definite influence on the classification accuracy.
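
The two input forms and the minimum-frequency threshold described in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: the tagged sentences are hypothetical, the Sejong-style tag labels (VA, EC, EF, NNG, JKS) and the `morpheme/TAG` token format are assumptions, and the helper names `to_tokens` and `filter_min_freq` are invented for illustration. The resulting token lists are the kind of input one would hand to a CBOW trainer; the training step itself is not shown.

```python
from collections import Counter

# Hypothetical pre-tagged input: each sentence is a list of (morpheme, POS tag) pairs.
# The tag labels are assumed Sejong-style labels (VA = adjective, EC = connective ending).
tagged_sentences = [
    [("예쁘", "VA"), ("고", "EC"), ("좋", "VA"), ("아요", "EF")],
    [("색", "NNG"), ("이", "JKS"), ("예쁘", "VA"), ("어요", "EF")],
]


def to_tokens(sentence, attach_pos):
    """Return the morphemes alone, or 'morpheme/TAG' tokens when attach_pos is True."""
    if attach_pos:
        return [f"{morpheme}/{tag}" for morpheme, tag in sentence]
    return [morpheme for morpheme, _ in sentence]


def filter_min_freq(corpus, min_count):
    """Drop tokens that occur fewer than min_count times in the whole corpus."""
    counts = Counter(token for sentence in corpus for token in sentence)
    return [[t for t in sentence if counts[t] >= min_count] for sentence in corpus]


plain = [to_tokens(s, attach_pos=False) for s in tagged_sentences]
tagged = [to_tokens(s, attach_pos=True) for s in tagged_sentences]
frequent = filter_min_freq(tagged, min_count=2)

print(plain[0])   # morphemes only
print(tagged[0])  # morphemes with POS tags attached
print(frequent)   # only tokens seen at least twice remain
```

Attaching the tag keeps homonyms with different parts of speech apart as distinct vocabulary entries, which is the motivation the abstract gives for that variant.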

Acoustic and Pronunciation Model Adaptation Based on Context dependency for Korean-English Speech Recognition (한국인의 영어 인식을 위한 문맥 종속성 기반 음향모델/발음모델 적응)

  • Oh, Yoo-Rhee;Kim, Hong-Kook;Lee, Yeon-Woo;Lee, Seong-Ro
    • MALSORI / v.68 / pp.33-47 / 2008
  • In this paper, we propose a hybrid acoustic and pronunciation model adaptation method based on context dependency for Korean-English speech recognition. The proposed method proceeds as follows. First, in order to derive pronunciation variant rules, an n-best phoneme sequence is obtained by phone recognition. Second, we classify each rule as either context independent (CI) or context dependent (CD). To this end, it is assumed that differences in phoneme structure between Korean and English cause CI pronunciation variability, while coarticulation effects are related to CD pronunciation variability. Finally, we perform acoustic model adaptation for CI pronunciation variability and pronunciation model adaptation for CD pronunciation variability. Korean-English speech recognition experiments show that the average word error rate (WER) decreases by 36.0% compared with a baseline that does not include any adaptation. In addition, the proposed method has a lower average WER than either the acoustic model adaptation or the pronunciation model adaptation alone.
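
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed, as word-level edit distance divided by the reference length. The function names and the example sentences are illustrative assumptions, not material from the paper, and the abstract does not state whether its 36.0% figure is a relative or an absolute reduction; the helper below shows the relative reading only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, adapted_wer):
    """Relative WER reduction with respect to the baseline."""
    return (baseline_wer - adapted_wer) / baseline_wer


# Made-up sentences: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))
```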

Bridging the Gap between Grammar and Conversation in Korean College English Conversation Classes

  • Lee, Eun-Ah
    • English Language & Literature Teaching / no.5 / pp.27-48 / 1999
  • College students frequently feel that the grammar knowledge they gained in primary and secondary school is not useful when they are asked to speak in college conversation classes. Because of their frustration at their lack of communicative ability, as well as inappropriate teaching methods and class textbooks that have little to do with their major course of study, students often have low motivation to study. It is not uncommon for students to seek English education outside their college classrooms by going to language institutes or studying abroad. College teachers need to find a way to use students' background in grammar from primary and secondary school. Despite students' sentiment about their grammar education, grammar is an essential key to successful English conversation. Teachers can close the gap between primary and secondary school grammar education and college conversation classes by using a theme-based methodology, cue cards, and modeling. Activities such as Grammar Clinic, Grammar Police, and Show and Tell can be effective ways to bridge this gap. Teachers can use these activities and methods to correct student errors such as incorrect word order, missing or unnecessary be verbs, confusion between be and do verbs, subject-verb agreement errors, and incorrect tense.

EFL College Students' Learning Experiences during Film-based Reading Class: Focused on the Analysis of Students' Reflective Journals

  • Baek, Jiyeon
    • International Journal of Advanced Culture Technology / v.7 no.4 / pp.49-55 / 2019
  • In the age of information, newly produced knowledge is mostly written in English. Therefore, there has been a strong demand for English language learning in the EFL context. However, most EFL learners lack interest and motivation in text-based reading classes. In this educational context, film is one of the most widely used materials in English reading classes, given that modern learners are familiar with various audiovisual materials. The purpose of this study is to investigate what Korean EFL learners experienced in a film-based reading class. Specifically, this study analyzes the EFL students' perceptions of the class and the learning strategies they used during it. In order to interpret the EFL learners' classroom experiences comprehensively, a coding system consisting of five categories was developed: report, emotion, reflection, evaluation, and future plans. The results of the data analysis showed that the use of movies in English reading classes had positive effects on reading comprehension and inference of word meaning. The most frequently used learning strategies were affective strategies, which helped students control their emotions, attitudes, motivations, and values, whereas memorization strategies were rarely used. In this respect, this study suggests that the use of movies in the EFL reading classroom encourages students' attention and helps them obtain and activate schemata that are useful for gaining a better understanding of text-based reading materials.

Segmental Interpretation of Suprasegmental Properties in Non-native Phoneme Perception

  • Kim, Miran
    • Phonetics and Speech Sciences / v.7 no.3 / pp.117-128 / 2015
  • This paper investigates the acoustic-perceptual relation between Korean denti-alveolar fricatives and the English voiceless alveolar fricative /s/ in varied prosodic contexts (e.g., stress, accent, and word-initial position). The denti-alveolar fricatives in Korean show a two-way distinction, which can be referred to as either plain (lenis) /s/ or fortis /s*/. The English voiceless alveolar fricative /s/, which corresponds to the two Korean fricatives, is placed in a one-to-two non-native phoneme mapping situation when Korean listeners hear English /s/. This raises an interesting question of how the single fricative of English perceptually maps onto the two-way distinction in Korean. This paper reports the acoustic-perceptual mapping pattern by investigating spectral properties of the English stimuli that are heard as either /s/ or /s*/ by Korean listeners, in order to answer two questions: first, how prosody influences fricatives acoustically, and second, how the resultant properties drive non-native listeners to interpret them as segmental features instead of as prosodic information. The results indicate that Korean listeners' responses change depending on the prosodic context in which the stimuli are placed. This implies that Korean listeners interpret some of the information provided by prosody as segmental, and that they take advantage of this information in their judgment of non-native phonemes.

A Debate over Translating VS Localizing 'Democracy'

  • A-Kuran, Mohammad Ahmad H.
    • Cross-Cultural Studies / v.24 / pp.147-156 / 2011
  • A brief consultation of English-Arabic dictionaries and encyclopedias shows that there is no single standard Arabic translation of the English concept 'democracy'. Instead, Arab authors use a series of terms that themselves need clarification if the original term is to be clear. In many cases, they tend to localize the term into Arabic using various orthographic forms; at other times, they run a rather lengthy analysis to elucidate the concept, which seems to be an essentially contested term. This paper aims to inquire into the reasons for the confusion and inconsistency in the translation of the concept 'democracy', as well as the underlying arguments for advocating the localization rather than the translation of this political concept. This is followed by a discussion of the implications of this study for lexicographers and translators. Given that the ideology is of non-Arabic origin, English perceptions of this fluid concept might help account for its lack of clarity in Arabic translations, since Arabic is highly influenced by English in various spheres of life. It would thus be wise first to examine how English authors perceive the concept. To better serve the purpose of this study, the author distinguishes here between 'translation' and so-called 'localization'. 'Translation' is concerned with finding an existing term in the target language with a meaning equivalent to that of a foreign word, whereas 'localization' involves taking the foreign term and making it linguistically and culturally appropriate to the target language, by subjecting it to the morphological and syntactic rules of Arabic so that it can be used as if it were originally Arabic.

Formulaic Language Development in Asian Learners of English: A Comparative Study of Phrase-frames in Written and Oral Production

  • Yoon Namkung;Ute Romer
    • Asia Pacific Journal of Corpus Research / v.4 no.2 / pp.1-39 / 2023
  • Recent research in usage-based Second Language Acquisition has provided new insights into second language (L2) learners' development of formulaic language (Wulff, 2019). The current study examines the use of phrase-frames, which are recurring sequences of words including one or more variable slots (e.g., it is * that), in written and oral production data from Asian learners of English across four proficiency levels (beginner, low-intermediate, high-intermediate, advanced) and native English speakers. The variability, predictability, and discourse functions of the most frequent 4-word phrase-frames from the written essay and spoken dialogue sub-corpora of the International Corpus Network of Asian Learners of English (ICNALE) were analyzed and then compared across groups and modes. The results revealed that while learners' phrase-frames in writing became more variable and unpredictable as proficiency increased, no clear developmental patterns were found in speaking, although all learner groups used more fixed and predictable phrase-frames than the reference group. Further, no developmental trajectories in the functions of the most frequent phrase-frames were found in either mode. Additionally, lower-level learners and the reference group used more variable phrase-frames in speaking, whereas advanced-level learners showed more variability in writing. This study contributes to a better understanding of the development of L2 phraseological competence.
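
The notion of a phrase-frame in the abstract above can be made concrete with a short sketch. It is an illustration under stated assumptions, not the study's procedure: the function name `phrase_frames` is invented, the slot is allowed in any of the four positions (studies often restrict it), and the number of distinct slot fillers is used only as a crude stand-in for the variability measure the abstract mentions. The token sequence is a made-up example.

```python
from collections import defaultdict


def phrase_frames(tokens, n=4):
    """Map each n-word frame with one '*' slot to the set of words seen in that slot."""
    frames = defaultdict(set)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for slot in range(n):
            # Replace exactly one position with the wildcard to form the frame.
            frame = tuple("*" if k == slot else word for k, word in enumerate(gram))
            frames[frame].add(gram[slot])
    return frames


# Made-up token sequence containing three variants of "it is * that".
tokens = "it is clear that it is true that it is likely that".split()
frames = phrase_frames(tokens)

fillers = frames[("it", "is", "*", "that")]
print(sorted(fillers))  # words that fill the slot of "it is * that"
print(len(fillers))     # number of distinct fillers for this frame
```

A frame whose slot attracts many different fillers behaves more like an open pattern, while a frame with one or two fillers is closer to a fixed expression.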

The Parallel Corpus Approach to Building the Syntactic Tree Transfer Set in the English-to-Vietnamese Machine Translation

  • Dien Dinh;Ngan Thuy;Quang Xuan;Nam Chi
    • Proceedings of the IEEK Conference / summer / pp.382-386 / 2004
  • Recently, following the machine learning trend, most machine translation systems around the world use two syntactic tree sets, one for each language, to learn syntactic tree transfer rules. However, for the English-Vietnamese language pair, this approach is not possible because there is not yet a Vietnamese syntactic tree set that corresponds to an English one. Building a very large corresponding Vietnamese syntactic tree set (thousands of trees) requires a great deal of work and the investment of specialists in linguistics. To take advantage of our available English-Vietnamese Corpus (EVC), which has been tagged with word alignments, we choose the SITG (Stochastic Inversion Transduction Grammar) model to construct English-Vietnamese syntactic tree sets automatically. This model is used to parse the two languages at the same time and then carry out the syntactic tree transfer. The resulting English-Vietnamese bilingual syntactic tree set is the basic training data for machine learning models that automatically transfer English syntactic trees to Vietnamese ones. We tested the syntactic analysis on over 10,000 of the 500,000 sentences in our English-Vietnamese bilingual corpus, and the first stage gave encouraging results (about 80% analyzed) [5]. We have used the TBL (Transformation-Based Learning) algorithm to carry out automatic transformations from English syntactic trees to Vietnamese ones based on that parallel syntactic tree transfer set [6].

Orthographic Influence in the Perception and Production of English Intervocalic Consonants: A Pilot Study (영어 모음사이 자음의 인지와 발화에서 철자의 영향: 파일럿 연구)

  • Cho, Mi-Hui;Chung, Ju-Yeon
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.459-466 / 2009
  • While Korean allows the same consonant at the coda of the preceding syllable and at the onset of the following syllable, English does not allow geminate consonants in the same intervocalic position. Due to this difference between Korean and English, Korean learners of English tend to incorrectly produce geminate consonants for English doubled graphemes, as in the 'mm' of 'summer'. Based on this observation, a pilot study was designed to investigate how Korean learners of English perceive and produce English doubleton graphemes and singleton graphemes. Twenty Korean college students were asked to perform a forced-choice perception test as well as a production test for 36 real-word stimuli consisting of (near) minimal pairs of singleton and doubleton graphemes. The results showed that the accuracy rates for the words with singleton graphemes were higher than those for the words with doubleton graphemes in both perception and production, because the subjects misperceived and misproduced the doubleton graphemes as geminates due to orthographic influence. In addition, the low error rates of the words with voiced stops were accounted for by Korean language transfer. Further, spectrographic analyses showed more production errors in doubleton grapheme words than in singleton grapheme words. Finally, pedagogical implications are provided.

A Corpus-based Analysis on Primary English Education Research for the Past 20 Years (초등영어교육 연구 논문의 변천: 코퍼스 기반 분석)

  • Choi, Wonkyung
    • The Journal of the Korea Contents Association / v.19 no.2 / pp.11-21 / 2019
  • It has been about 20 years since English was first formally taught as a subject in public elementary schools in Korea. The present research aims to analyze the studies on 'primary English' conducted in Korea during this period. With the help of the corpus programs Utagger and WordSmith Tools, I investigated a total of 6,467 theses and research papers published in Korea. The results show that the overall number of studies has increased since 1997, although it seems to have declined recently. The research scope ranges from 'teaching-learning interaction' to 'curriculum' and 'assessment', which have been steadily investigated for 20 years. Furthermore, researchers sometimes appear to have followed English education policy by conducting particular investigations, such as those on 'immersion programs' or 'native English-speaking teachers', in certain periods. Recently, researchers have begun to take an interest in cutting-edge ICT. In conclusion, the academic field of 'primary English' in Korea has grown in quantity, and the spectrum of research areas has expanded over the past 20 years. It is hoped that the results of this research will help set a new direction for future research.