• Title/Summary/Keyword: learning English words

Search Result 92, Processing Time 0.022 seconds

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep-learning-based sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the models. Here, word vectors generally refer to vector representations of the words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with well-developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Given the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as the input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (Part-Of-Speech) tags when deriving morpheme vectors to improve the classification accuracy of a deep learning model?
Is it proper to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarize these issues as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can a satisfactory level of classification accuracy be achieved when applying deep learning to Korean sentiment analysis? To address these research questions, we generate various types of morpheme vectors reflecting the questions and then compare the classification accuracy of a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. For the training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 items of Naver News data, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria.
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of data preprocessing, namely, sentence splitting only, or sentence splitting followed by additional spelling and spacing corrections. Third, they vary in the form of input fed into the word vector model: the morphemes are entered either by themselves or with their POS tags attached. The morpheme vectors further vary depending on the range of POS tags considered, the minimum frequency of the morphemes included, and the random initialization range. All morpheme vectors are derived through the CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. The results suggest that using same-domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections in addition to sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. The POS tag attachment, which was devised to handle the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not seem to have any definite influence on the classification accuracy.
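
The two input forms (bare morphemes versus morphemes with POS tags attached) and the minimum frequency criterion can be illustrated with a small sketch. This is not the authors' code: the tagged sentences are hand-written stand-ins for the output of a morphological analyzer, the tag names (`VA`, `EC`, `EF`, `NNG`, `JKS`) and the `morpheme/TAG` token format are assumptions, and `to_tokens` and `apply_min_count` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical, hand-tagged reviews: each sentence is a list of
# (morpheme, POS tag) pairs. In practice these would come from a
# Korean morphological analyzer; the tag names are assumptions.
tagged_reviews = [
    [("예쁘", "VA"), ("고", "EC"), ("좋", "VA"), ("아요", "EF")],
    [("색", "NNG"), ("이", "JKS"), ("예쁘", "VA"), ("어요", "EF")],
]


def to_tokens(sentence, attach_pos):
    """Return the token sequence that would be fed to a word vector model."""
    if attach_pos:
        # Attaching the tag keeps homonyms with different tags apart.
        return [f"{morpheme}/{tag}" for morpheme, tag in sentence]
    return [morpheme for morpheme, _ in sentence]


def apply_min_count(sentences, min_count):
    """Drop tokens whose corpus frequency is below min_count."""
    counts = Counter(token for s in sentences for token in s)
    return [[t for t in s if counts[t] >= min_count] for s in sentences]


plain = [to_tokens(s, attach_pos=False) for s in tagged_reviews]
tagged = [to_tokens(s, attach_pos=True) for s in tagged_reviews]
frequent = apply_min_count(tagged, min_count=2)
```

Either token list could then be passed to a CBOW implementation configured with a context window of 5 and a vector dimension of 300, the settings reported in the abstract.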

An ESL Teacher's Perspective on Recasts: A Qualitative Exploration of "When" and "How"?

  • Byun, Ji-Hyun;Kayi-Aydar, Hayriye
    • English Language & Literature Teaching
    • /
    • v.16 no.4
    • /
    • pp.1-18
    • /
    • 2010
  • Recasts, which are defined as implicit types of corrective feedback, have been the focus of numerous SLA researchers for more than a decade. A range of classroom-based observational and experimental research studies have explored how and when language teachers provide recasts to learners' ill-formed utterances and aimed to understand the role of recasts in language acquisition or learning. On the basis of previous studies on recasts, our study investigated when an ESL teacher provided recasts and how recasts were provided in his class. The research questions were as follows: (1) When does an ESL teacher provide recasts? (2) How does the teacher provide recasts? The data came from observations of one ESL classroom as well as consecutive semi-structured interviews with the teacher. The data analysis included transcriptions of teacher-student interactions in the target setting and categories of recasts according to the linguistic phenomena that prompted recasting. Based on the findings, practical suggestions for ESL teachers were provided.

A Study on the Cognitive Learning of Meaning through Frame Semantics (틀 의미론을 통한 인지적 의미학습에 관한 연구)

  • Oh, Ju-Young
    • Cross-Cultural Studies
    • /
    • v.19
    • /
    • pp.295-311
    • /
    • 2010
  • The concept of frame in semantics has implications for our understanding of such problematic terms as "meaning" and "concept". It is conventional to say that a particular word corresponds to a particular "concept" and to assume that concepts are essentially identical across speakers. In contrast, the notion of frame accepts that the frame for a particular word can vary across speakers as a function of their particular life experience. That is, instead of thinking of words as expressing "concepts", we should think of them as tools, like frames, that cause listeners to activate certain areas of their knowledge base, with different areas activated to different degrees in different contexts of use. This notion is Fillmore's most crucial contribution to current cognitive linguistic theories, and his frame semantics is built on it. This paper discusses the basic assumptions and goals of frame semantics, examines the notion of frame, and illustrates various framing words of English and Korean under this notion.

A System for Learning English Words Using Relations between Words (단어간의 관계를 이용한 영어 단어 학습 시스템)

  • Siyeong Bae;Sangchul Kim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.1154-1157
    • /
    • 2008
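  • Today there is an urgent need for practical, everyday English education, and English education focuses above all on developing communicative competence. Vocabulary learning is the first challenge that beginning learners of English face. Existing English word learning systems place a psychological burden on learners by having them learn too many words through simplistic methods. In psycholinguistics, there is a theory that language comprehension is not a process of passively accepting what is presented but a dynamic, active process of finding appropriate relations through spreading activation across a network, based on the experiences and concepts the learner already has. Based on language learning theory, this paper proposes an English word learning system that aids inference and memory by highlighting the relations between words. The system can determine the order in which words are learned around a word relation network that defines the relations between words, and it supports image and game features to spark interest in word learning. When the system was applied to actual word learning, learner satisfaction was high.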
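
One way to picture how a word relation network could determine a learning order is a breadth-first traversal from a starting word, so that closely related words are studied near each other. The abstract does not describe the system's actual algorithm; the traversal strategy, the `relations` table, and the `learning_order` function below are all illustrative assumptions.

```python
from collections import deque

# Hypothetical word relation network: each word maps to related words.
relations = {
    "animal": ["dog", "cat"],
    "dog": ["puppy", "bark"],
    "cat": ["kitten"],
    "puppy": [],
    "bark": [],
    "kitten": [],
}


def learning_order(network, start):
    """Visit words breadth-first so closely related words appear together."""
    order = []
    seen = {start}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        order.append(word)
        for neighbor in network.get(word, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


order = learning_order(relations, "animal")
```

With this toy network, the learner would meet "animal" first, then its direct relatives "dog" and "cat", and only afterwards the words related to those.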

A Comparison of Korean EFL Learners' Oral and Written Productions

  • Lee, Eun-Ha
    • English Language & Literature Teaching
    • /
    • v.12 no.2
    • /
    • pp.61-85
    • /
    • 2006
  • The purpose of the present study is to compare Korean EFL learners' speech corpus (i.e., oral productions) with their composition corpus (i.e., written productions). Four college students participated in the study. The composition corpus was collected through a writing assignment, and the speech corpus was gathered by audio-taping their oral presentations. The results of the data analysis indicate that (i) in terms of error frequency, young adult low-intermediate Korean EFL learners showed high frequency in determiners (mostly indefinite articles), vocabulary (mostly semantic errors), and prepositions, and the frequency order did not differ much between the speech corpus and the composition corpus; and (ii) when the oral productions were compared with the written productions, there were few differences between them in terms of content, style (i.e., colloquial vs. literary), vocabulary selection, and error types and frequency. Therefore, it is assumed that the oral presentation proficiency of EFL learners at this learning stage depends heavily on how much and how well they are able to write. In other words, EFL learners' writing and speaking skills are closely correlated. This implies that the teacher does not need to separate teaching how to speak from teaching how to write. The teacher may use the same methods or strategies to help learners improve their English speaking and writing skills. Furthermore, it will be more effective to teach writing before speaking, since learners have more opportunities to write than to speak in EFL contexts.

Text Classification Using Parallel Word-level and Character-level Embeddings in Convolutional Neural Networks

  • Geonu Kim;Jungyeon Jang;Juwon Lee;Kitae Kim;Woonyoung Yeo;Jong Woo Kim
    • Asia pacific journal of information systems
    • /
    • v.29 no.4
    • /
    • pp.771-788
    • /
    • 2019
  • Deep learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) outperform traditional approaches such as Support Vector Machines (SVMs) and Naïve Bayesian approaches in text classification. When using CNNs for text classification tasks, word embedding or character embedding is a step that transforms words or characters into fixed-size vectors before feeding them into convolutional layers. In this paper, we propose a parallel word-level and character-level embedding approach in CNNs for text classification. The proposed approach can capture word-level and character-level patterns concurrently in CNNs. To show the usefulness of the proposed approach, we perform experiments with two English and three Korean text datasets. The experimental results show that character-level embedding works better in Korean and word-level embedding performs well in English. The experimental results also reveal that the proposed approach provides better performance than traditional CNNs with word-level embedding or character-level embedding in both Korean and English documents. A more detailed investigation shows that the proposed approach tends to perform better than the traditional embedding approaches when the amount of data is relatively small.
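
The input side of a parallel word-level and character-level approach can be sketched as follows: the same text is encoded twice, once as a padded sequence of word ids and once as a padded sequence of character ids, and each sequence would feed its own embedding and convolutional branch. This covers only the input preparation, not the paper's network; the vocabulary scheme, the sequence lengths, and the names `build_vocab`, `encode`, and `parallel_inputs` are assumptions.

```python
def build_vocab(items):
    """Map each distinct item to an integer id (0 = padding, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


def encode(items, vocab, max_len):
    """Convert items to ids, truncate to max_len, and right-pad with 0."""
    ids = [vocab.get(item, 1) for item in items][:max_len]
    return ids + [0] * (max_len - len(ids))


# Tiny illustrative corpus (made up for this sketch).
texts = ["good product", "bad product"]
word_vocab = build_vocab(w for t in texts for w in t.split())
char_vocab = build_vocab(c for t in texts for c in t)


def parallel_inputs(text, max_words=4, max_chars=16):
    """Return the word-level and character-level id sequences for one text."""
    word_ids = encode(text.split(), word_vocab, max_words)
    char_ids = encode(list(text), char_vocab, max_chars)
    return word_ids, char_ids


word_ids, char_ids = parallel_inputs("good product")
```

Keeping the two sequences separate until after the convolutional layers is what would let a model pick up word-level and character-level patterns concurrently.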

Monolingual 2- to 3-Year-Old Children's Understanding of Foreign Words (단일 언어 사용 2-3세 아동의 외국어 단어에 대한 이해)

  • Lee, Hyuna;Kim, Eun Young;Song, Hyun-joo
    • Korean Journal of Child Studies
    • /
    • v.37 no.4
    • /
    • pp.159-168
    • /
    • 2016
  • Objective: This study investigated the age at which monolingual children can understand that different languages are different conventional systems of communication. In particular, we investigated when children can suspend using the mutual exclusivity (ME) assumption that a label solely refers to one category when interpreting novel words from foreign languages. Methods: Two-year-olds (n = 16) and 3-year-olds (n = 16) participated in the procedure, which consisted of three blocks. In the first block, a Korean speaker taught the children a novel word, muppi, referring to a novel object. The children were presented with two objects, muppi and another novel object. The Korean speaker then asked the children to find a referent of either muppi or the other novel Korean label, kkati. In the second block, a foreign language (either English or Spanish) speaker asked children to find the object for a foreign novel word, sefo, presenting two objects: muppi and the third novel object, which had not been presented before. The procedure of the third block was identical to that of the first block. Results: Three-year-olds exploited the ME assumption when interpreting a Korean novel word but not when interpreting a foreign novel word. In contrast, 2-year-old children did not use the ME assumption when interpreting native and foreign words. Conclusion: Children acquire an understanding that native and foreign languages have different words for an object at least by 3 years of age.

An Augmented Reality-Based Digital App as an Educational Tool for Foreign Language Learning and the Evaluation of Its Learning Effect: Towards an Examination of Learning Motivation, Learning Satisfaction, and Learning Engagement (증강현실(Augmented Reality) 기술 기반의 글자교구재 디지털 앱 개발 사례와 교육효과 평가: 학습동기, 학습만족도, 학습몰입도를 중심으로)

  • Sae Roan Kim;Eun Jin Won;Hyung Gi Kim;Pil Jung Yun
    • Journal of Information Technology Services
    • /
    • v.22 no.4
    • /
    • pp.141-157
    • /
    • 2023
  • The present work presents the development of 'Funt', an augmented reality-based digital app designed as an educational tool for foreign language learning. Our work further evaluated the learning efficacy of the tool by assessing three dependent measures: learning motivation, learning satisfaction, and learning involvement. With the 'Funt' learning app, students can use AR to access recognition-based or location-based experiences in which objects, artifacts, or media appear within the app. Students are then able to interact with the digital content by manipulating it to learn more about it. Students' engagement should also increase when they create their own experience in AR to demonstrate their understanding of a particular concept or word. Learning effects were evaluated on survey data collected from a hundred respondents aged six to nine years. A one-group pre-test and post-test design was used to examine differences in learning efficacy by comparing the non-'Funt' group and the 'Funt' group scores. A pairwise t-test was performed for pairwise comparisons between the two learning groups. The results indicate that the 'Funt' group scored significantly higher than the non-'Funt' group in the measures of learning motivation, learning satisfaction, and learning involvement. Overall, our results suggest that 'Funt' attracted the students' attention, provided them with a fun context in which to learn English vocabulary, and helped them develop positive motivation and satisfaction towards vocabulary learning through AR technology.
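
For a one-group pre-test and post-test design, the comparison is commonly made with a paired t-test on the per-student score differences, with t = mean(d) / (sd(d) / sqrt(n)). The sketch below assumes that this is the test meant in the abstract; the scores are invented for illustration and are not the study's data, and `paired_t` is a hypothetical helper name.

```python
import math
import statistics


def paired_t(pre, post):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Made-up motivation scores for five students, before and after using the app.
pre_scores = [3, 4, 2, 5, 4]
post_scores = [4, 5, 4, 6, 5]
t_stat, df = paired_t(pre_scores, post_scores)
```

The resulting statistic would be compared against the t distribution with `df` degrees of freedom to judge significance.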

Case Study of a Dog Vocalizing Human's Words (사람의 말을 발성하는 개의 사례 연구)

  • Kyon, Doo-Heon;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.4
    • /
    • pp.235-243
    • /
    • 2012
  • This paper studies the sound characteristics, causes, and other aspects of cases in which a dog vocalizes human words, distinguishing passive from active vocalization. In the previously reported cases, the dog was able to grasp the characteristics of its owner's voice and imitate the sound using its own vocal organs. These are cases of passive vocalization, involving temporary voice imitation without a communicative function. In contrast, a recently reported case in which a dog vocalizes words such as "Um-ma" and "Nu-na-ya" shows a vocalization pattern clearly distinguished from the prior cases. This dog repeatedly and actively vocalizes the relevant words according to circumstances, and the vocalization serves a basic role in communication and interaction with its owner. The dog's ability to vocalize human words actively is attributed to its high level of intelligence and intimacy with its owner, to people reacting actively to its pronunciation of the words, and so forth. These results can be used in studies that investigate the vocalization potential and language learning feasibility of animal sounds.

Network Analysis between Uncertainty Words based on Word2Vec and WordNet (Word2Vec과 WordNet 기반 불확실성 단어 간의 네트워크 분석에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.53 no.3
    • /
    • pp.247-271
    • /
    • 2019
  • Uncertainty in scientific knowledge means an uncertain state in which propositions are neither true nor false at present. Existing studies have analyzed the propositions written in the academic literature and have evaluated the performance of rule-based and machine-learning-based approaches using corpora. Although they recognized the importance of word construction, there have been insufficient attempts to expand the set of words by analyzing the meaning of uncertainty words. Meanwhile, studies that analyze network structure using bibliometrics and text mining techniques are widely used as methods for understanding intellectual structure and relationships in various disciplines. Therefore, in this study, semantic relations were analyzed by applying Word2Vec to existing uncertainty words. In addition, WordNet, an English vocabulary database and thesaurus, was applied to perform a network analysis based on the hypernym, hyponym, and synonym relations linked to uncertainty words. The semantic and lexical relationships of uncertainty words were structurally identified. As a result, we identified the possibility of automatically expanding uncertainty words.
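
The idea of building a network from lexical relations and using it to expand a seed list can be sketched with a small hand-written relation table. The table below is an invented placeholder, not actual WordNet content, and the seed words, the edge representation, and the names `build_network` and `degree` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical seed uncertainty words and a hand-written relation table.
# The entries are placeholders, not data taken from WordNet.
seed_words = ["suggest", "possible"]
relation_table = {
    "suggest": {"synonym": ["indicate", "propose"], "hypernym": ["express"]},
    "possible": {"synonym": ["potential"], "hypernym": []},
}


def build_network(seeds, table):
    """Return a set of (source, target, relation) edges for the seed words."""
    edges = set()
    for word in seeds:
        for relation, targets in table.get(word, {}).items():
            for target in targets:
                edges.add((word, target, relation))
    return edges


def degree(edges):
    """Count how many edges touch each word."""
    counts = defaultdict(int)
    for source, target, _ in edges:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


edges = build_network(seed_words, relation_table)
degrees = degree(edges)
# Candidate expansions: related words that are not already seeds.
expanded = sorted({target for _, target, _ in edges} - set(seed_words))
```

Words that end up in `expanded` are only candidates; whether they actually express uncertainty would still need to be checked.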