• Title/Summary/Keyword: learning English words

An Analysis of Structural Features, Contents, and Cognitive Levels of Questions of Korea and Secondary Textbooks in the Evolution Unit

  • Park, Sung-Il;Kang, Nam-Ha
    • Journal of The Korean Association For Science Education / v.28 no.7 / pp.697-712 / 2008
  • The purpose of this study was to identify strengths and weaknesses by analyzing Korean and U.S. science textbooks in terms of general structural features, contents, cognitive levels of questions, and the purpose of questions used in science textbooks. The analysis provides insight into improving textbooks so that they can effectively assist teaching and learning. To investigate the organization of a unit in depth, the evolution unit was selected and scrutinized as one example. The results showed that the numbers of pages, activities, vocabulary words, and vocabulary lists differ considerably between Korean and U.S. textbooks. In general, U.S. textbooks were more laden with information and less coherent than Korean textbooks. The findings on the cognitive levels of questions showed that the majority of questions in both nations are concerned with knowledge. However, the two nations differ greatly in the ratios of analysis, synthesis, and evaluation questions. Questions are concentrated in the review sections of the textbooks (45% of Korean and 60.6% of U.S. questions). This suggests that well-planned questions in a review section can provide basic guidance for strength in a science classroom.

Recognition of Korean Implicit Citation Sentences Using Machine Learning with Lexical Features (어휘 자질 기반 기계 학습을 사용한 한국어 암묵 인용문 인식)

  • Kang, In-Su
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.8 / pp.5565-5570 / 2015
  • Implicit citation sentence recognition aims to locate citation sentences that lack explicit citation markers in the full text of articles. State-of-the-art approaches exploit word n-grams, clue words, researchers' surnames, mentions of previous methods, and distance relative to the nearest explicit citation sentences, reaching over 50% performance. However, most previous work has been conducted on English. As for Korean, a rule-based method using positive/negative clue patterns was reported to attain a performance of 42%, which requires further improvement. This study attempted to learn to recognize implicit citation sentences in the full text of Korean literature using Korean lexical features. Different lexical feature units such as Eojeol, morpheme, and Eumjeol were evaluated to determine proper lexical features for Korean implicit citation sentence recognition. In addition, lexical features were combined with position features representing backward/forward proximity to explicit citation sentences, improving the performance to over 50%.
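
The lexical feature units named in this abstract can be illustrated with a minimal sketch. It assumes the usual Korean NLP reading of the terms: an Eojeol is a whitespace-delimited unit and an Eumjeol is a single syllable character. Morpheme units need a morphological analyzer and are left out. The function names and the example sentence are hypothetical and are not taken from the paper.

```python
import unicodedata

def eojeol_units(sentence):
    """Split a sentence into Eojeol (assumed here: whitespace-delimited units)."""
    return unicodedata.normalize("NFC", sentence).split()

def eumjeol_units(sentence):
    """Split a sentence into Eumjeol (assumed here: one precomposed syllable per unit)."""
    normalized = unicodedata.normalize("NFC", sentence)
    return [ch for ch in normalized if not ch.isspace()]

def ngrams(units, n):
    """Return contiguous n-grams over any sequence of lexical units."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

# Made-up example sentence, roughly "This method follows previous research".
sentence = "이 방법은 기존 연구를 따른다"
print(eojeol_units(sentence))
print(ngrams(eumjeol_units(sentence), 2)[:3])
```

Either unit sequence could then be turned into n-gram counts and passed to a classifier. Which unit works best is the question the study evaluates, and this sketch does not answer it.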

Chinese-clinical-record Named Entity Recognition using IDCNN-BiLSTM-Highway Network

  • Tinglong Tang;Yunqiao Guo;Qixin Li;Mate Zhou;Wei Huang;Yirong Wu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1759-1772 / 2023
  • Chinese named entity recognition (NER) is a challenging task that seeks to find, recognize, and classify various types of information elements in unstructured text. Because Chinese text has no natural boundaries like the spaces in English text, Chinese named entity recognition is much more difficult. At present, most deep learning-based NER models are developed using a bidirectional long short-term memory network (BiLSTM), yet their performance still has room for improvement. To further improve performance in Chinese NER tasks, we propose a new NER model, IDCNN-BiLSTM-Highway, which combines the BiLSTM, the iterated dilated convolutional neural network (IDCNN), and the highway network. In our model, the IDCNN is used to achieve multiscale context aggregation from a long sequence of words. The highway network is used to effectively connect different layers of networks, allowing information to pass through network layers smoothly without attenuation. Finally, the globally optimal tag result is obtained by introducing a conditional random field (CRF). The experimental results show that, compared with other popular deep learning-based NER models, our model achieves superior performance on two Chinese NER data sets, Resume and Yidu-S4k, with F1-scores of 94.98 and 77.59, respectively.
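
The role of the highway network described here can be sketched with the standard highway-layer formula, y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid transform gate. This is a minimal pure-Python illustration, not the authors' implementation. The tanh nonlinearity for H, the function names, and the weights are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, bias, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: the gate t mixes the transformed signal h with the raw input x."""
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]   # candidate transform H(x), tanh assumed
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]     # transform gate T(x)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

# Hypothetical weights: identity transform, and a gate bias that keeps the gate closed.
x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(highway_layer(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0]))
```

With a strongly negative gate bias the layer passes its input through almost unchanged, which is the sense in which information can cross layers without attenuation.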

Mathematical Errors of Minority Students from North Korean Defectors and Low-SES in Learning of Mathematical Basic Concepts (교육소외 학생들의 기초학력 신장을 위한 수학학습에서 나타난 수학적 오류: 탈북학생과 저소득층 학생을 대상으로)

  • ChoiKoh, Sang-Sook
    • Journal of Educational Research in Mathematics / v.22 no.2 / pp.203-227 / 2012
  • This study investigated the errors that slow learners from low-SES backgrounds or from North Korean defector families showed in mathematical learning. To conduct the study, two groups for each minority group participated voluntarily during the winter vacation in 2011. Based on preliminary interviews, a total of 15 units were given, focusing on building basic mathematical concepts. As a result, the groups had some errors in common. Both lacked understanding of the terminology and were unable to apply the meanings of definitions and theorems to a problem. Because their basic knowledge of mathematics was uncertain, they easily lost focus and were apt to make mistakes. They also showed clear differences. North Korean defectors were not accustomed to using or understanding the meanings of Chinese or English elements in Korean words when expressing and writing mathematical terminology and reading data in context. Technical errors and misinterpretation errors were found. However, students from low-SES backgrounds were familiar with mathematical words and terminology, but their errors were mostly due to carelessness arising from a lack of mastery of mathematical concepts.

Korean Hedge Detection Using Word Usage Information and Neural Networks (단어 쓰임새 정보와 신경망을 활용한 한국어 Hedge 인식)

  • Ren, Mei-Ying;Kang, Sin-jae
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.9 / pp.317-325 / 2017
  • In this paper, we try to classify Korean hedge sentences, which are regarded as unimportant because they express uncertainties or personal assumptions. From previous research on English, we found that dependency information of words has been one of the important features in hedge classification, but it has not been used in Korean research. Additionally, we found that word embedding vectors include word usage information. We assume that the word usage information could to some extent represent the dependency information. Therefore, we utilized word embedding and neural networks in hedge sentence classification. We used more than one and a half million sentences as the word embedding dataset and also manually constructed a 12,517-sentence hedge classification dataset obtained from online news. We used SVM and CRF as our baseline systems, and the proposed system outperformed SVM by 7.2%p and CRF by 1.2%p. This indicates that word usage information has a positive impact on Korean hedge classification.

Investigating the Function of Backchannel Tokens, uh, um(uhm), and and hm as a Positive Influence in Second Language Learning (백채널 토큰 uh, um(uhm), and, hm 이 제2외국어 학습에서 미치는 순기능의 연구)

  • Kang, SungKwan;Chon, Hyong Joseph
    • The Journal of the Korea Contents Association / v.17 no.1 / pp.25-38 / 2017
  • This study investigates the use of backchannels by non-native speakers (NNS) of English, focusing on beginner-to-intermediate learners' use of 'uh', 'um(uhm)', 'and', and 'hm', and suggests a possible pedagogical implication. The initial aim of this study was to learn about this phenomenon and to observe the learners' conversation patterns for comparison with previous studies. Building on previous findings, the data, analyzed using conventional Conversation Analysis (CA) methods, indicate the possible presence of the L1 topic markers '-un' and '-nun' in the form of L2 backchannel tokens when uttered by beginning and intermediate level speakers of English, and these L2 backchannel tokens appear only in front of noun phrases. Additionally, the words that carry these tokens also require the topic markers '-un' and '-nun' when translated back into Korean. Finally, this study discusses possible pedagogical implications of the initial analysis of backchannel tokens for Korean EFL learners. The ultimate goal of this study is to refine this analysis with follow-up experiments that validate the investigation into a working hypothesis, generating discussion of this backchannel phenomenon so that it is viewed not as a hindrance but as a positive influence that needs to be understood.

Visualization of Korean Speech Based on the Distance of Acoustic Features (음성특징의 거리에 기반한 한국어 발음의 시각화)

  • Pok, Gou-Chol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.3 / pp.197-205 / 2020
  • The Korean language has the characteristic that the pronunciation of phoneme units such as vowels and consonants is fixed and the pronunciation associated with a notation does not change, so foreign learners can approach the Korean language rather easily. However, when one pronounces words, phrases, or sentences, the pronunciation varies widely and in complex ways at syllable boundaries, and the association between notation and pronunciation no longer holds. Consequently, it is very difficult for foreign learners to study standard Korean pronunciation. Despite these difficulties, systematic analysis of pronunciation errors for Korean words is believed to be possible, given the advantageous observation that the relationship between Korean notation and pronunciation can be described as a set of firm rules without exceptions, unlike other languages including English. In this paper, we propose a visualization framework that shows the differences between standard pronunciations and erroneous ones as quantitative measures on the computer screen. Previous research shows only color representations and 3D graphics of speech properties, or an animated view of the changing shapes of the lips and mouth cavity. Moreover, the features used in the analysis are only point data such as the average over a speech range. In this study, we propose a method that can directly use the time-series data instead of summary or distorted data. This was realized using a deep learning-based technique that combines a self-organizing map, a variational autoencoder model, and a Markov model, and we achieved a superior performance enhancement compared to the method using the point-based data.

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led to the application of text-mining techniques to analyze big social media data in a more rigorous manner. Although social media text analysis algorithms have improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to the feature sets used for training classification models. The other approach adopts the semantic analysis method for sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between the two approaches. The results show that the related words extracted by the Word2Vec algorithm that represent some emotion about the keyword used are three times as numerous as those extracted by co-occurrence analysis. The difference between the two results stems from Word2Vec's vectorization of semantic features. Therefore, it is possible to say that the Word2Vec algorithm is able to catch hidden related words that have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjectives as "emotional words" in Korean. The emotion words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence. The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets used in the study were collected by searching with three keywords: professor, prosecutor, and doctor, because these keywords carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor); Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate); Prosecutor (lewd behavior, sponsor). The size of the text data is about 100,000 (Professor: 25720, Doctor: 35110, Prosecutor: 43225), and the data were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding the Emotion Trigger can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that the words extracted by Emotion Trigger processing have a significant causal relationship with the emotional word in a sentence. A future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger are from Twitter, so the data have a number of distinct features that we did not deal with in this study. These features will be considered in further study.
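
The step of finding words related to an emotion word through word vectors can be sketched as a nearest-neighbor lookup. This is a minimal illustration that assumes cosine similarity as the relatedness measure, which is common practice with Word2Vec vectors but is not stated in the abstract. The three-dimensional vectors and the vocabulary are invented toy values, and the noun filtering step is omitted because it would need a POS tagger.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def related_words(query, vectors, top_k=2):
    """Rank all other words by similarity to the query word's vector."""
    q = vectors[query]
    scored = [(word, cosine(q, vec)) for word, vec in vectors.items() if word != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy vectors invented for illustration; real vectors would come from a trained model.
toy_vectors = {
    "angry":   [0.9, 0.1, 0.0],
    "scandal": [0.8, 0.2, 0.1],
    "bribe":   [0.7, 0.3, 0.0],
    "lecture": [0.0, 0.1, 0.9],
}
print(related_words("angry", toy_vectors))
```

In the study's terms, the nouns among such neighbors of an emotion word would be the candidate "Emotion Trigger" words.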

The Blended Approach of Machine Translation and Human Translation (기계번역과 인간번역의 혼합적 접근법)

  • Kim, Yangsoon
    • The Journal of the Convergence on Culture Technology / v.8 no.1 / pp.239-244 / 2022
  • Neural Machine Translation (NMT) is gradually breaking down the boundary between human and machine translation. We look at actual cases of human and machine translation and discuss why machine translation needs a human touch. In this paper, we raise three driving questions: Can humans be replaced by machines? How can human translators remain successful in an NMT-driven world? Is it possible to eliminate the language barrier in the era of NMT and World Englishes? The answers to these questions are all negative. We suggest that machine translation is a useful tool offering speed, accuracy, and low-cost productivity. However, machine translation is limited in the areas of culture, borrowing, ambiguity, new words, and (national) dialects. Machines cannot imitate the emotional and intellectual abilities of human translators, since machines are based on machine learning while humans rely on intuition. Machine translation will be a useful tool that does not cause moral problems when methods such as back translation and human post-editing are used. To conclude, we propose a blended approach in which machine translation cannot be completed without the touch of human translation.

A Mobile Dictionary based on a Prefetching Method (선인출 기반의 모바일 사전)

  • Hong, Soon-Jung;Moon, Yang-Sae;Kim, Hea-Suk;Kim, Jin-Ho;Chung, Young-Jun
    • Journal of KIISE:Software and Applications / v.35 no.3 / pp.197-206 / 2008
  • In the mobile Internet environment, frequent communication between a mobile device and a content server is required for searching or downloading learning materials. In this paper, we propose an efficient prefetching technique to reduce the network cost and to improve the communication efficiency of the mobile dictionary. Our prefetching-based approach can be explained as follows. First, we propose an overall framework for the prefetching-based mobile dictionary. Second, we present a systematic way of determining the amount of prefetching data for each of the packet-based and flat-rate billing cases. Third, by focusing on the English-Korean mobile dictionary for middle or high school students, we propose an intuitive method of determining the words to be prefetched in advance. Fourth, based on these determination methods, we propose an efficient prefetching algorithm. Fifth, through experiments, we show the superiority of our prefetching-based method. The major contributions of this approach can be summarized as follows. First, to the best of our knowledge, this is the first attempt to exploit prefetching techniques in mobile applications. Second, we propose a systematic way of applying prefetching techniques to a mobile dictionary. Third, using prefetching techniques, we improve the overall performance of a network-based mobile dictionary. Experimental results show that, compared with the traditional on-demand approach, our prefetching-based approach improves the average performance by $9.8\%{\sim}33.2\%$. These results indicate that our framework can be widely used not only in the mobile dictionary but also in other mobile Internet applications that require the prefetching technique.