• Title/Summary/Keyword: syntactic model

I-QANet: Improved Machine Reading Comprehension using Graph Convolutional Networks (I-QANet: 그래프 컨볼루션 네트워크를 활용한 향상된 기계독해)

  • Kim, Jeong-Hoon;Kim, Jun-Yeong;Park, Jun;Park, Sung-Wook;Jung, Se-Hoon;Sim, Chun-Bo
    • Journal of Korea Multimedia Society / v.25 no.11 / pp.1643-1652 / 2022
  • Most existing machine reading comprehension research has used Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). RNNs are slow to train, and the Question Answering Network (QANet) was proposed to improve training speed. QANet is a model composed of CNN and self-attention layers. A CNN extracts semantic and syntactic information well from the local corpus, but it is limited in extracting the same information from the global corpus. Graph Convolutional Networks (GCNs) extract semantic and syntactic information relatively well from the global corpus. In this paper, to take advantage of this strength of GCNs, we propose I-QANet, which replaces the CNN in QANet with a GCN. On the Stanford Question Answering Dataset (SQuAD), the proposed model was 1.2 times faster than the baseline and scored 0.2% higher in Exact Match (EM) and 0.7% higher in F1. On the Korean Question Answering Dataset (KorQuAD), which consists only of Korean text, training was 1.1 times faster than the baseline, and EM and F1 were 0.9% and 0.7% higher, respectively.
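
The abstract does not describe the GCN layer itself, so the following is only a minimal sketch of the widely used graph-convolution propagation rule, ReLU(Â H W), where Â is the adjacency matrix with self-loops, symmetrically normalized. The function names, the chain-shaped token graph, and the matrix sizes are assumptions made for illustration and are not taken from I-QANet.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}: add self-loops, then normalize symmetrically."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)              # every degree is >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    return np.maximum(normalize_adjacency(adj) @ features @ weight, 0.0)


# Hypothetical example: three tokens connected in a chain (0 - 1 - 2).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
token_features = np.ones((3, 4))         # 3 tokens, 4-dimensional embeddings (assumed sizes)
weight = np.ones((4, 2))                 # projects each token to 2 dimensions
hidden = gcn_layer(token_features, adj, weight)
```

Each output row mixes a token's own features with those of its graph neighbours, which is how a graph layer can reach tokens that a fixed-width convolution window would not cover.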

Boolean Query Formulation From Korean Natural Language Queries using Syntactic Analysis (구문분석에 기반한 한글 자연어 질의로부터의 불리언 질의 생성)

  • Park, Mi-Hwa;Won, Hyeong-Seok;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications / v.26 no.10 / pp.1219-1229 / 1999
  • It is well established that trained searchers can achieve good search effectiveness with Boolean queries, because a structured query containing operators such as AND, OR, and NOT represents the user's information need more accurately. However, ordinary users are not accustomed to expressing what they want in Boolean form. In this paper, we propose a Boolean query formulation method that automatically transforms a user's natural language query into an extended Boolean query, for both effectiveness and user convenience. First, the natural language query is syntactically analyzed using a KCCG (Korean Combinatory Categorial Grammar) parser, and the resulting syntactic trees are structurally simplified by extracting operator and keyword information in order to capture the logical relationships between keywords. Next, plausible noun phrases are identified in the simplified tree and added to it as additional keywords, and the keywords are assigned weights. Finally, the simplified syntactic tree is automatically converted into a Boolean query using mapping rules and linguistic heuristics, and the query is used for retrieval. We also propose an N-BEST average method, which generates a Boolean query from each of the top N syntactic trees to minimize the loss in retrieval performance caused by parsing errors. In experiments on the KTSET2.0 test collection, the proposed method outperformed a traditional natural language query system based on the vector space model by 23% and, surprisingly, manually constructed Boolean queries by 8%.
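
The last two steps of the pipeline can be sketched as follows. The abstract does not give the paper's mapping rules, heuristics, keyword weights, or exact averaging scheme, so everything here is an assumption: the simplified tree is represented as nested `(operator, children)` tuples, and the N-BEST average is taken to be a plain mean of per-document scores.

```python
from collections import defaultdict


def to_boolean_query(node):
    """Render a simplified tree as a Boolean query string.

    A node is either a keyword (str) or a tuple (operator, [children]).
    This representation is hypothetical, not the paper's.
    """
    if isinstance(node, str):
        return node
    op, children = node
    if op == "NOT":
        return "NOT " + to_boolean_query(children[0])
    return "(" + f" {op} ".join(to_boolean_query(c) for c in children) + ")"


def n_best_average(score_lists):
    """Average per-document scores over the retrieval results of the top-N parses.

    A document missing from one result list contributes 0 for that parse.
    """
    totals = defaultdict(float)
    for scores in score_lists:
        for doc_id, score in scores.items():
            totals[doc_id] += score
    n = len(score_lists)
    return {doc_id: total / n for doc_id, total in totals.items()}


tree = ("AND", ["parser", ("OR", ["Korean", "Hangul"]), ("NOT", ["English"])])
query = to_boolean_query(tree)   # "(parser AND (Korean OR Hangul) AND NOT English)"
merged = n_best_average([{"d1": 1.0, "d2": 0.5}, {"d1": 0.5}])
```

Averaging over several parses means that a document ranked highly by only one, possibly incorrect, tree is pulled down, which is the intuition the abstract gives for the method.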

One-Class Classification Model Based on Lexical Information and Syntactic Patterns (어휘 정보와 구문 패턴에 기반한 단일 클래스 분류 모델)

  • Lee, Hyeon-gu;Choi, Maengsik;Kim, Harksoo
    • Journal of KIISE / v.42 no.6 / pp.817-822 / 2015
  • Relation extraction is an important information extraction technique that can be widely used in areas such as question answering and knowledge population. Previous studies on relation extraction have been based on supervised machine learning models that need a large amount of training data manually annotated with relation categories. Recently, distant supervision methods have been proposed to reduce the manual annotation effort of constructing training data. However, these methods have a drawback: they are difficult to use for collecting the negative training data needed to solve classification problems. To overcome this drawback, we propose a one-class classification model that can be trained without negative data. The proposed model determines whether an input data item belongs to an inner category by using a similarity measure based on lexical information and syntactic patterns in a vector space. In our experiments, the proposed model showed higher performance (an F1-score of 0.6509 and an accuracy of 0.6833) than a representative one-class classification model, the one-class SVM (Support Vector Machine).
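
One simple way to classify with positive examples only is to compare an input vector against the centroid of the positive class and accept it if the cosine similarity clears a threshold. The sketch below illustrates that idea. It is not the paper's similarity measure: the feature vectors, the function names, and the threshold of 0.5 are all assumptions.

```python
import numpy as np


def fit_centroid(positives: np.ndarray) -> np.ndarray:
    """Mean of the L2-normalized positive vectors (no negative data is needed)."""
    unit = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    return unit.mean(axis=0)


def is_inner(x: np.ndarray, centroid: np.ndarray, threshold: float = 0.5) -> bool:
    """Accept x as a member of the inner category if it is similar enough to the centroid."""
    sim = float(x @ centroid / (np.linalg.norm(x) * np.linalg.norm(centroid)))
    return sim >= threshold


# Hypothetical 2-dimensional feature vectors for positive training items.
positives = np.array([[1.0, 0.0],
                      [1.0, 0.2]])
centroid = fit_centroid(positives)
accepted = is_inner(np.array([1.0, 0.1]), centroid)   # close to the positives
rejected = is_inner(np.array([0.0, 1.0]), centroid)   # nearly orthogonal to them
```

In practice the threshold would be tuned on held-out data, since it controls the trade-off between accepting outliers and rejecting true members.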

Automatic Evaluation of Elementary School English Writing Based on Recurrent Neural Network Language Model (순환 신경망 기반 언어 모델을 활용한 초등 영어 글쓰기 자동 평가)

  • Park, Youngki
    • Journal of The Korean Association of Information Education / v.21 no.2 / pp.161-169 / 2017
  • We often use spellcheckers to correct syntactic errors in our documents. However, these programs are not sufficient for elementary school students, because in many cases their sentences are still not smooth even after the syntactic errors are corrected. In this paper, we introduce an automated method for evaluating the relative smoothness of two synonymous sentences. The method uses a recurrent neural network to address the problem of long-term dependencies and exploits subwords to cope with the rare word problem. We trained the recurrent neural network language model on a monolingual corpus of about two million English sentences. In our experiments, the trained model successfully selected the smoother sentence in all nine types of test sets. We expect that our approach will help with elementary school writing once it is implemented as an application for smart devices.
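
The comparison step can be illustrated with any language model that assigns a probability to a sentence: score both candidates and keep the one with the higher average log-probability per token. The sketch below swaps in an add-one smoothed bigram model over whole words as a toy stand-in, so it is not the paper's subword-based recurrent network. The class name, the tiny corpus, and the length normalization are assumptions.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model, a toy stand-in for an RNN language model."""

    def __init__(self, corpus):
        self.context = Counter()   # how often each token appears as a left context
        self.bigrams = Counter()   # how often each adjacent token pair appears
        vocab = set()
        for sentence in corpus:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            vocab.update(tokens)
            self.context.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def avg_log_prob(self, sentence):
        """Average log-probability per transition, so sentence length does not dominate."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        pairs = list(zip(tokens, tokens[1:]))
        total = sum(
            math.log((self.bigrams[p] + 1) / (self.context[p[0]] + self.vocab_size))
            for p in pairs
        )
        return total / len(pairs)


def smoother(lm, a, b):
    """Return whichever of two candidate sentences the model scores higher."""
    return a if lm.avg_log_prob(a) >= lm.avg_log_prob(b) else b


lm = BigramLM([
    "i like to play soccer",
    "i like to read books",
    "she likes to play tennis",
])
best = smoother(lm, "i like to play soccer", "i like play to soccer")
```

A recurrent model would replace `avg_log_prob` with the network's own token probabilities, and the selection logic would stay the same.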

Clustering-based Statistical Machine Translation Using Syntactic Structure and Word Similarity (문장구조 유사도와 단어 유사도를 이용한 클러스터링 기반의 통계기계번역)

  • Kim, Han-Kyong;Na, Hwi-Dong;Li, Jin-Ji;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.37 no.4 / pp.297-304 / 2010
  • Clustering based on sentence type or document genre is a technique used to improve the translation quality of SMT (statistical machine translation) through domain-specific translation. However, no previous research has used sentence type and document genre information simultaneously. In this paper, we propose an integrated clustering method that classifies sentence type by syntactic structure similarity and document genre by word similarity. We interpolated domain-specific models built from the clusters with general models to improve the translation quality of the SMT system. A kernel function and the cosine measure are applied to calculate structural similarity and word similarity, respectively. With these similarities, we used a machine learning algorithm similar to K-means for clustering. On a Japanese-English patent translation corpus, we obtained a 2.5% point relative improvement in translation quality in the best case.
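
Two of the building blocks named in the abstract are easy to show in isolation: the cosine measure over bag-of-words vectors, and linear interpolation of a domain-specific model score with a general one. This is a minimal sketch under stated assumptions. Whitespace tokenization, the function names, and the interpolation weight `lam` are illustrative choices, and the paper's structural kernel and clustering procedure are not reproduced.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the bag-of-words count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def interpolate(p_domain: float, p_general: float, lam: float = 0.5) -> float:
    """Linear interpolation: lam * domain-specific + (1 - lam) * general."""
    return lam * p_domain + (1.0 - lam) * p_general


same = cosine_similarity("patent claim device", "patent claim device")
disjoint = cosine_similarity("patent claim", "weather report")
mixed = interpolate(0.8, 0.2, lam=0.75)
```

The interpolation weight decides how far a cluster-specific model is trusted over the general model, and it would normally be tuned on development data.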

Learning Syntactic Constraints for Improving the Efficiency of Korean Parsing (한국어 구문분석의 효율성을 개선하기 위한 구문제약규칙의 학습)

  • Park, So-Young;Kwak, Yong-Jae;Chung, Hoo-Jung;Hwang, Young-Sook;Rim, Hae-Chang
    • Journal of KIISE:Software and Applications / v.29 no.10 / pp.755-765 / 2002
  • In this paper, we examine various kinds of syntactic information for Korean parsing and propose a method that learns constraints and uses them to improve the efficiency of a parsing model. The proposed method has three characteristics. First, it improves parsing efficiency because the constraints prevent the parser from generating unsuitable candidates. Second, it is robust on Korean sentences because the attributes for the constraints are selected based on the syntactic and lexical idiosyncrasies of Korean. Third, the constraints are easy to acquire automatically from a treebank using a decision tree learning algorithm. The experimental results show that the parser using the acquired constraints reduces the number of overgenerated candidates to between 1/2 and 1/3, and runs 2 to 3 times faster than the parser without constraints.

Determination of Thematic Roles according to Syntactic Relations Using Rules and Statistical Models in Korean Language Processing (한국어 전산처리에서 규칙과 확률을 이용한 구문관계에 따른 의미역 결정)

  • 강신재;박정혜
    • Journal of Korea Society of Industrial Information Systems / v.8 no.1 / pp.33-42 / 2003
  • This paper presents an efficient method for determining thematic roles from syntactic relations using rules and a statistical model in Korean language processing. This process is a core part of semantic analysis and an important problem in natural language processing. Writing rules for determining thematic roles using only general linguistic knowledge and experience is problematic, since the final result may differ according to the subjective views of researchers, and it is impossible to construct rules that cover all cases. Our hybrid method, however, is objective and efficient because it draws on large corpora, which contain practical usage of Korean, and on case frames in the Sejong Electronic Lexicon of Korean, which is being developed by dozens of Korean linguistics researchers. To determine thematic roles more accurately, our system uses syntactic relations, semantic classes, morpheme information, and the position of double subjects. In particular, using semantic classes increases the applicability of our system.

An Abstraction Method for State Minimization based on Syntactic and Semantic Patterns in the Execution Space of Real-Time Systems (실시간 시스템의 실행 공간상에서 구문 및 의미패턴에 기반한 상태 최소화를 위한 추상화 방법)

  • 박지연;조기환;이문근
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.103-116 / 2003
  • State explosion, caused by the composition of the spaces of data, temporal, and locational values, is one of the well-known critical problems that make it difficult to understand and analyze real-time systems specified with state-based formal methods. To overcome this problem, this paper presents an abstraction method for state minimization based on an abstraction in system specification and an abstraction in system execution. The first is named the syntactic abstraction, through which the patterns of unconditionally internalized computation and of the repetition and selection structures are abstracted. The latter is named the semantic abstraction, through which the patterns of the execution space represented with data are abstracted. Through the abstractions, the components of a system in the specification and execution models are hierarchically organized. The system can be analyzed briefly at the upper level in a skeleton manner with low complexity. The system, however, can be abstraction method for the state minimization and the decrease in analysis complexity through the abstraction with examples.

A Simple Syntax for Complex Semantics

  • Lee, Kiyong
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.2-27 / 2002
  • As part of a long-range project that aims at establishing database-theoretic semantics as a model of computational semantics, this presentation focuses on the development of a syntactic component for processing strings of words or sentences to construct semantic data structures. For design and modeling purposes, the present treatment is restricted to the analysis of some problematic constructions of Korean involving semi-free word order, conjunction and temporal anchoring, and adnominal modification and antecedent binding. The present work relies heavily on Hausser's (1999, 2000) SLIM theory of language, which is based on surface compositionality, time-linearity, and two other conditions on natural language processing. Time-linear syntax for natural language has been shown to be conceptually simple and computationally efficient. The associated semantics is complex, however, because it must deal with situated language involving interactive multi-agents. Nevertheless, by processing input word strings in a time-linear mode, the syntax can incrementally construct the necessary semantic structures for relevant queries and valid inferences. The fragment of Korean syntax will be implemented in Malaga, a C-type implementation language that was enriched for both programming and debugging purposes and that was made particularly suitable for implementing Left-Associative Grammar. This presentation will show how the system of syntactic rules with constraining subrules processes Korean sentences in a step-by-step, time-linear manner to incrementally construct semantic data structures that mainly specify relations with their argument, temporal, and binding structures.

Segmentation of Long Chinese Sentences using Comma Classification (쉼표의 자동분류에 따른 중국에 장문분할)

  • Jin Mei-Xun;Kim Mi-Young;Lee Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.33 no.5 / pp.470-480 / 2006
  • The longer the input sentence, the worse the parsing result. To improve parsing performance, many methods for long sentence segmentation have been researched. As an isolating language, Chinese has fewer cues for sentence segmentation. However, the average frequency of comma usage in Chinese is higher than in other languages. The syntactic information that the comma conveys can play an important role in segmenting long Chinese sentences. This paper proposes a method for classifying commas in Chinese sentences according to the context in which the comma occurs. Sentences are then segmented using the classification result. The experimental results show that the accuracy of comma classification reaches 87.1%, and that with our segmentation model the dependency parsing accuracy of our parser improves by 5.6%.
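
Once each comma has a label, the segmentation step itself is mechanical: cut at commas labelled as segment boundaries and keep the others inside their segment. The sketch below shows only that step. The classifier is left out, so the `is_boundary` labels are supplied by hand, and the function name and the binary label scheme are assumptions, not the paper's comma classes.

```python
def segment_by_commas(sentence: str, is_boundary: list[bool], comma: str = "，") -> list[str]:
    """Split a sentence at the commas whose label is True.

    `is_boundary` holds one label per comma, in order of appearance.
    The default separator is the full-width Chinese comma.
    """
    parts = sentence.split(comma)
    if len(is_boundary) != len(parts) - 1:
        raise ValueError("need exactly one label per comma")
    segments = []
    current = parts[0]
    for part, boundary in zip(parts[1:], is_boundary):
        if boundary:
            segments.append(current)         # close the segment at this comma
            current = part
        else:
            current += comma + part          # keep a non-boundary comma in place
    segments.append(current)
    return segments


# Placeholder clauses A, B, C stand in for real Chinese text.
pieces = segment_by_commas("A，B，C", [False, True])   # ["A，B", "C"]
```

Each resulting segment can then be parsed on its own, which is where the shorter input is expected to help the parser.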