• Title/Summary/Keyword: Syntactic Features

과학교과서의 학년 간 언어적 특성 분석 -텍스트 정합성을 중심으로- (An Analysis of Linguistic Features in Science Textbooks across Grade Levels: Focus on Text Cohesion)

  • 류지수;전문기
    • 한국과학교육학회지 / Vol. 41 No. 2 / pp.71-82 / 2021
  • To maximize the effectiveness of learning from textbooks, the linguistic features of textbook texts should be systematically adjusted to the expected characteristics of learners (i.e., linguistic and cognitive abilities, level of background knowledge). Accordingly, the present study compared the linguistic features of first-, second-, and third-year middle school science textbooks across grade levels to examine whether such a systematic principle is reflected in textbook development. Specifically, using Auto-Kohesion, a Korean text analysis system, we analyzed several cohesion-related measures (e.g., noun overlap, connectives, pronouns) in addition to the surface-level measures widely used in previous text analysis research, such as text surface structure measures, lexical measures, and syntactic complexity measures. The main results showed that superficially salient features such as word (eojeol) and sentence length and lexical frequency were generally adjusted stepwise so that text complexity increased with grade level, whereas many other linguistic features were not systematically adjusted. In particular, the cohesion measures appeared not to have been sufficiently considered in the textbook development process. These results suggest that lower-grade learners may encounter texts that are too difficult for their developmental stage, which can lower learning motivation and efficiency, and that upper-grade textbooks may not be adequate for developing the ability to process more complex texts in preparation for higher education. This study suggests that in future textbook development, a range of linguistic features, including cohesion measures, should be adjusted stepwise according to the expected changes in reader characteristics.
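
The cohesion measures named in this abstract (e.g., noun overlap across adjacent sentences, connective frequency) can be illustrated with a small script. The sketch below is not the Auto-Kohesion system; it is only a minimal Python illustration of two such measures, and the noun extractor and connective list are placeholders (a real analysis would use a Korean morphological analyzer).

```python
# Minimal sketch of two cohesion measures: adjacent-sentence noun overlap and
# connective density. Placeholder noun extraction; not the Auto-Kohesion system.

CONNECTIVES = {"그리고", "그러나", "따라서", "하지만", "또한"}  # illustrative list

def extract_nouns(sentence: str) -> set:
    # Placeholder: treat whitespace-separated tokens as nouns.
    # A real pipeline would use a Korean morphological analyzer here.
    return set(sentence.split())

def adjacent_noun_overlap(sentences: list) -> float:
    """Proportion of adjacent sentence pairs sharing at least one noun."""
    if len(sentences) < 2:
        return 0.0
    pairs = zip(sentences, sentences[1:])
    shared = sum(1 for a, b in pairs if extract_nouns(a) & extract_nouns(b))
    return shared / (len(sentences) - 1)

def connective_density(sentences: list) -> float:
    """Average number of connectives per sentence."""
    count = sum(1 for s in sentences for tok in s.split() if tok in CONNECTIVES)
    return count / max(len(sentences), 1)
```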

한·중 형용사 통사론적 비교 연구 - 형용사의 특징과 기능을 중심으로 (The syntax comparative research of Korean and Chinese Adjectives)

  • 단명결
    • 비교문화연구 / Vol. 25 / pp.483-527 / 2011
  • The main focus of this dissertation is a comparative study of Korean and Chinese adjectives. By comparing and contrasting the concepts, features, and usages of adjectives in the two languages, we identify a number of similarities and differences. The aim is to help Chinese learners of Korean better understand the features of Korean adjectives and use them more easily. Korean belongs to the Altaic language family and is written phonetically, whereas Chinese belongs to the Sino-Tibetan family and expresses meaning with characters. Despite looking completely different, the two languages share many similarities, for example in pronunciation and, to some extent, in grammar; Sino-Korean words in particular are quite similar to their Chinese counterparts. From a detailed grammatical point of view, however, the two languages differ considerably. For instance, auxiliary words in Korean and Chinese work completely differently, and Korean has endings (어미), a concept that does not exist in Chinese at all. With respect to word classes in particular, distinguishing adjectives from verbs is both important and difficult for Chinese learners of Korean. One reason is that some Korean adjectives are categorized as verbs in Chinese: for example, "like", "dislike", and "fear" are psychological adjectives in Korean but psychological verbs in Chinese. Such differences in categorization often mislead learners when reading whole texts and cause further difficulties in learning other grammatical items. On this basis, the dissertation aims to be of help to Chinese learners of Korean. Starting from the most basic concepts, the second chapter analyzes the similarities and differences between Korean and Chinese adjectives, since a correct understanding of the adjective is the basis for learning it accurately. Building on this comparison of concepts, the third chapter analyzes in detail the grammatical and semantic features of Korean and Chinese adjectives and their usage in texts, with particular attention to how adjectives change across tenses and how endings change when adjectives are used with nouns and verbs. The fourth chapter examines the similarities and differences in adjective meaning between Korean and Chinese, providing comparative analyses from six different perspectives that should be helpful for Chinese learners of Korean. To date there have been few comparative studies of Korean and Chinese adjectives, and this dissertation also has its limitations in this area; nevertheless, we hope it can offer some help to Chinese learners of Korean and that more profound research will follow.

자동요약의 주제어 추출을 위한 의미사전의 동적 확장 (Dynamic Expansion of Semantic Dictionary for Topic Extraction in Automatic Summarization)

  • 추교남;우요섭
    • 전기전자학회논문지 / Vol. 13 No. 2 / pp.241-247 / 2009
  • This paper discusses a method for expanding a semantic dictionary that takes the semantic characteristics of Korean into account, in order to extract accurate and practical topic words in an automatic document summarization system. First, a synonym dictionary is used to improve the accuracy of semantic tag analysis. Second, weights are assigned to the subcategorization dictionary so that it can serve as reference information for selecting the most appropriate result among candidate syntactic and semantic analyses. Third, subcategorization patterns of unregistered predicates are predicted so that predicates derived by affixation in Korean can be analyzed semantically without difficulty.
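
One way to picture the weighted subcategorization dictionary described above is as a mapping from predicates to case-frame patterns whose weights rank competing analyses. The sketch below is a hypothetical data structure, not taken from the paper; the class name, pattern strings, and fallback rule for unregistered derived predicates are all assumptions.

```python
# Hypothetical sketch of a weighted subcategorization dictionary: predicates map
# to candidate case-frame patterns with weights that rank competing analyses.

from collections import defaultdict

class SubcatDictionary:
    def __init__(self):
        self.entries = defaultdict(dict)   # predicate -> {pattern: weight}

    def add(self, predicate, pattern, weight):
        self.entries[predicate][pattern] = weight

    def best_pattern(self, predicate, candidates):
        """Pick the candidate pattern with the highest weight for a predicate."""
        weights = self.entries.get(predicate, {})
        scored = [(weights.get(p, 0.0), p) for p in candidates]
        return max(scored)[1] if scored else None

    def patterns_for_derived(self, derived_predicate, stem):
        """For an unregistered affix-derived predicate, fall back to its stem."""
        return dict(self.entries.get(stem, {}))

# Usage (hypothetical entries): d = SubcatDictionary(); d.add("먹다", "NP이 NP을", 0.8)
```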


언어자원 자동 구축을 위한 위키피디아 콘텐츠 활용 방안 연구 (A Study on Utilization of Wikipedia Contents for Automatic Construction of Linguistic Resources)

  • 류철중;김용;윤보현
    • 디지털융복합연구 / Vol. 13 No. 5 / pp.187-194 / 2015
  • For machines to understand rapidly changing natural language, the construction of various linguistic knowledge resources is essential. This paper proposes a continuously extensible method that automatically constructs linguistic knowledge resources by exploiting the characteristics of online content. In particular, it focuses on automatically building and expanding a named entity (NE) dictionary, one of the most widely used resources in language analysis. To this end, Wikipedia was selected as the source of documents for NE dictionary construction, and various statistical analyses were performed to characterize it. Based on these analyses, we propose a method that builds and expands an NE dictionary using the syntactic characteristics of Wikipedia content and metadata such as its structural information.
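
The kind of metadata-driven dictionary construction described above can be sketched against the public MediaWiki API: a page's categories are fetched and matched against keywords to guess a coarse entity type. This is only an illustrative sketch, not the method of the paper; the category-to-type keyword map is an assumption.

```python
# Sketch: harvest a coarse NE type for a Wikipedia page from its category
# metadata via the MediaWiki API. Keyword-to-type mapping is illustrative.

import requests

API = "https://ko.wikipedia.org/w/api.php"
TYPE_KEYWORDS = {"인물": "PERSON", "기업": "ORGANIZATION", "도시": "LOCATION"}

def categories_of(title: str) -> list:
    params = {"action": "query", "titles": title, "prop": "categories",
              "cllimit": "max", "format": "json"}
    pages = requests.get(API, params=params, timeout=10).json()["query"]["pages"]
    page = next(iter(pages.values()))
    return [c["title"] for c in page.get("categories", [])]

def guess_ne_type(title: str):
    """Map a page to a coarse NE type by keyword-matching its category names."""
    for category in categories_of(title):
        for keyword, ne_type in TYPE_KEYWORDS.items():
            if keyword in category:
                return ne_type
    return None
```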

A Study of Efficiency Information Filtering System using One-Hot Long Short-Term Memory

  • Kim, Hee sook;Lee, Min Hi
    • International Journal of Advanced Culture Technology / Vol. 5 No. 1 / pp.83-89 / 2017
  • In this paper, we propose an extended one-hot Long Short-Term Memory (LSTM) method and evaluate its performance on a spam filtering task. Most traditional methods proposed for spam filtering represent spam and non-spam messages by word occurrences, ignoring all syntactic and semantic information. A major issue arises when spam and non-spam messages share many common words and noise words, which makes it challenging for the system to assign the correct label. Unlike previous studies on information filtering, instead of using only word occurrence and word context as in probabilistic models, we apply a neural network-based approach to train the filter for better performance. In addition to the one-hot representation, using term weights with an attention mechanism allows the classifier to focus on the words most likely to appear in the spam and non-spam collections. As a result, we obtain some improvement over the performance of previous methods. We find that using region embeddings and pooling features on top of the LSTM, together with the attention mechanism, allows the system to learn a better document representation for filtering tasks in general.
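
The abstract describes an LSTM classifier with an attention mechanism over term representations. The sketch below is not the authors' exact architecture; it is a minimal PyTorch version of that general idea, and the embedding layer (used in place of raw one-hot vectors), the layer sizes, and the two-class output are assumptions.

```python
# Minimal sketch: LSTM text classifier with attention-weighted pooling over
# token embeddings, in the spirit of the spam-filtering model described above.

import torch
import torch.nn as nn

class AttentionLSTMClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)      # scores each time step
        self.out = nn.Linear(hidden, 2)       # spam / non-spam logits

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.embed(token_ids))       # (batch, seq, hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time
        context = (weights * h).sum(dim=1)            # weighted pooling
        return self.out(context)

# Usage: logits = AttentionLSTMClassifier(vocab_size=30000)(token_id_batch)
```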

한국어 구조적 중의성 문장에 대한 일본인 중급 한국어 학습자들의 발화양상 (Prosodic aspects of structural ambiguous sentences in Korean produced by Japanese intermediate Korean learners)

  • 윤영숙
    • 말소리와 음성과학 / Vol. 7 No. 3 / pp.89-97 / 2015
  • The aim of this study is to investigate the prosodic characteristics of structurally ambiguous Korean sentences produced by Japanese intermediate learners of Korean and the influence of their first-language prosody. Previous studies have reported that the two readings of structurally ambiguous Korean sentences differ mainly in prosodic phrasing. We therefore examined whether Japanese learners of Korean can also distinguish, in production, between the two readings of structurally ambiguous sentences on the basis of prosodic features. For this purpose, 4 Korean native speakers and 8 Japanese learners of Korean participated in a production test. The materials were 6 sentences in which a relative clause modifies either NP1 or NP1+NP2. The results show that Korean native speakers produced the ambiguous sentences with different prosodic structures depending on their semantic and syntactic structure (left-branching or right-branching). Japanese speakers also showed distinct prosodic structures for the two readings in most cases, but they made more errors when producing left-branching sentences than right-branching sentences. In addition, interference from Japanese pitch accent was observed in their production of the Korean ambiguous sentences.
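
The study compares prosodic phrasing across productions; as a purely illustrative aid, the sketch below shows one way an F0 contour could be extracted for such a comparison, using the parselmouth wrapper around Praat. The file name is a placeholder, and this is not the analysis procedure of the paper.

```python
# Sketch: extract an F0 contour from one recorded utterance with parselmouth
# (a Python wrapper around Praat), e.g. to compare the two readings of an
# ambiguous sentence. The file name below is a placeholder.

import parselmouth

def f0_contour(wav_path: str):
    """Return (times in s, F0 in Hz); unvoiced frames come back as 0 Hz."""
    sound = parselmouth.Sound(wav_path)
    pitch = sound.to_pitch(time_step=0.01)
    return pitch.xs(), pitch.selected_array["frequency"]

# times, f0 = f0_contour("left_branching_reading.wav")
```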

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan;Guo, Xuchao;Bai, Zhao;Diao, Lei;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16 No. 3 / pp.771-791 / 2022
  • Protein-protein interaction (PPI) extraction from raw text is important for revealing the molecular mechanisms of biological processes. With the rapid growth of the biomedical literature, manually extracting PPIs has become increasingly time-consuming and laborious, so automatic PPI extraction from the raw literature using natural language processing has attracted the attention of many researchers. We propose a PPI extraction model based on a large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and applies adversarial perturbations to the embedding layer to improve the robustness of the model. Experimental results show that the proposed model achieved the highest F1 scores (83.93% and 90.31%) on the two corpora with large sample sizes, AIMed and BioInfer, respectively, compared with previous methods. It also achieved comparable performance on three corpora with small sample sizes: HPRD50, IEPA, and LLL.
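
The abstract says adversarial perturbations are applied to the embedding layer but does not name the exact method, so the sketch below uses an FGM-style perturbation as one common choice; the BioBERT checkpoint name, epsilon value, and training-loop details are assumptions rather than the paper's setup.

```python
# Sketch: FGM-style adversarial perturbation of the embedding layer while
# fine-tuning a BioBERT sequence classifier (one common way to realize the
# "adversarial perturbations on the embedding layer" idea described above).

import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "dmis-lab/biobert-base-cased-v1.1", num_labels=2)
embeddings = model.get_input_embeddings()

def training_step(batch, optimizer, epsilon=1.0):
    # batch: dict with input_ids, attention_mask, labels
    model(**batch).loss.backward()            # clean-loss gradients

    grad = embeddings.weight.grad
    backup = embeddings.weight.data.clone()
    norm = grad.norm()
    if norm > 0:
        embeddings.weight.data.add_(epsilon * grad / norm)   # FGM step
    model(**batch).loss.backward()            # adversarial-loss gradients
    embeddings.weight.data = backup           # restore clean embeddings

    optimizer.step()
    optimizer.zero_grad()
```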

Using Small Corpora of Critiques to Set Pedagogical Goals in First Year ESP Business English

  • Wang, Yu-Chi;Davis, Richard Hill
    • 아시아태평양코퍼스연구 / Vol. 2 No. 2 / pp.17-29 / 2021
  • The current study explores small corpora of critiques written by Chinese and non-Chinese university students and compares the strategies used by these writers with those of high-rated L1 students. The data comprise three small corpora of student writing: 20 student critiques from 2017, 23 student critiques from 2018, and 23 critiques from the online MICUSP collection at the University of Michigan. The researchers employ Text Inspector and Lexical Complexity to assess university students' vocabulary knowledge and awareness of syntactic complexity, and WMatrix4® to identify and compare lexical and semantic differences among the three corpora. The findings indicate gaps between Chinese and non-Chinese writers in the same university classes in their knowledge of grammatical features and interactional metadiscourse. Chinese writers are more likely to produce shorter clauses and sentences, and the mean number of complex nominals and coordinate phrases is smaller for Chinese students than for non-Chinese and MICUSP writers. Finally, in terms of lexical bundles, Chinese student writers prefer clausal bundles to phrasal bundles, which, according to previous studies, are more often found in the texts of skilled writers. The findings suggest incorporating implicit and explicit instruction through the use of corpora in language classrooms to advance the skills and strategies of all writers, but particularly of Chinese writers of English.
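
The study relies on dedicated tools (Text Inspector, Lexical Complexity, WMatrix4®); the sketch below is not a substitute for them, only a toy illustration of two of the kinds of measures compared across the corpora, namely mean sentence length and type-token ratio.

```python
# Toy proxies for two complexity measures compared across the corpora:
# mean sentence length (words per sentence) and type-token ratio.

import re

def mean_sentence_length(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    return len(words) / max(len(sentences), 1)

def type_token_ratio(text: str) -> float:
    tokens = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    return len(set(tokens)) / max(len(tokens), 1)
```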

Improved Character-Based Neural Network for POS Tagging on Morphologically Rich Languages

  • Samat Ali;Alim Murat
    • Journal of Information Processing Systems / Vol. 19 No. 3 / pp.355-369 / 2023
  • Since the widespread adoption of deep learning and distributed representations, there have been substantial advances in part-of-speech (POS) tagging for many languages. When training word representations, morphology and word shape are typically ignored, as these representations primarily capture the syntactic and semantic aspects of words. However, for tasks like POS tagging, especially in morphologically rich and resource-limited language environments, intra-word information is essential. In this study, we introduce a deep neural network (DNN) for POS tagging that learns character-level word representations and combines them with general word representations. Using the proposed approach and omitting hand-crafted features, we achieve 90.47%, 80.16%, and 79.32% accuracy on our own datasets for three morphologically rich languages: Uyghur, Uzbek, and Kyrgyz. The experimental results show that the presented character-based strategy greatly improves POS tagging performance for morphologically rich languages (MRL) where character information is significant. Furthermore, compared with previously reported state-of-the-art POS tagging results for Turkish on the METU Turkish Treebank dataset, the proposed approach improves slightly on prior work. The results thus indicate that character-based representations outperform word-level representations for MRL. Our technique is also robust to out-of-vocabulary issues and performs better on manually edited text.
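
The combination described above, character-level word representations concatenated with general word embeddings, can be sketched in a few lines of PyTorch. This is not the paper's exact model; the layer sizes, the use of a character BiLSTM, and the plain linear tag classifier are assumptions for illustration.

```python
# Sketch: a POS tagger that summarizes each word with a character BiLSTM and
# concatenates that summary with the word embedding before tag classification.

import torch
import torch.nn as nn

class CharWordTagger(nn.Module):
    def __init__(self, n_words, n_chars, n_tags, wdim=100, cdim=30, chid=25):
        super().__init__()
        self.word_embed = nn.Embedding(n_words, wdim)
        self.char_embed = nn.Embedding(n_chars, cdim)
        self.char_lstm = nn.LSTM(cdim, chid, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(wdim + 2 * chid, n_tags)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq); char_ids: (batch, seq, max_word_len)
        b, s, c = char_ids.shape
        _, (h, _) = self.char_lstm(self.char_embed(char_ids.view(b * s, c)))
        char_repr = h.transpose(0, 1).reshape(b, s, -1)  # both LSTM directions
        feats = torch.cat([self.word_embed(word_ids), char_repr], dim=-1)
        return self.classifier(feats)                    # per-token tag logits
```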

새싹: 초보자를 위한 한글 객체 지향 프로그래밍 언어 (Saesark: A Korean Object-Oriented Programming Language for Beginners)

  • 천준석;우균
    • 한국콘텐츠학회논문지 / Vol. 16 No. 3 / pp.288-295 / 2016
  • As computers have become widely used in everyday life, programming is emerging as an essential skill. To support programming education, Korea has been planning to introduce a regular programming curriculum by 2018. However, because most programming languages are based on English, programming education in Korea is becoming harder and students can easily lose concentration. This paper proposes Saesark (새싹), a Korean programming language designed to be effective for the programming education of Korean students. Saesark was developed on top of Java and supports object-oriented programming and lambda expressions. To evaluate its suitability for education, we compared Saesark with other Korean programming languages in three respects: syntactic features, IDE support, and Korean error messages. The comparison showed that Saesark is more suitable for education than the other Korean programming languages. In particular, its IDE features and error messages printed in Korean are expected to be of great help to programming beginners.