• Title/Summary/Keyword: free word-order language


A Morpheme-unit Korean Feature-Based Grammar (KFG) with the X-bar Theoretic Notion of Headedness (X-바 이론의 중심어 개념을 도입한 형태소 단위의 한국어 자질 기반 문법)

  • Park, So-Yeong;Hwang, Yeong-Suk;Im, Hae-Chang
    • Journal of KIISE:Software and Applications / v.26 no.10 / pp.1247-1259 / 1999
  • In this paper, we propose a Korean feature-based grammar (KFG) that adopts the X-bar theoretic notion of headedness so that the principles of Korean sentence formation can be stated concisely. To explain clearly the grammatical phenomena that occur in Korean regardless of eojeol (space-delimited word) boundaries, the grammar uses the morpheme rather than the eojeol as its basic unit. Features represent the semantic information and the functional (syntactic) information of Korean syntactic categories independently, and sentences are analyzed through feature operations based on the combination relationship between two categories. In addition, we restrict feature operations to the binary form of CNF (Chomsky Normal Form), which supports robust analysis of Korean properties such as partially free word order and frequent ellipsis. The resulting KFG states its rules intuitively and simply, and analyzes a wide variety of Korean sentences robustly. In an experiment on 746 sentences extracted from SERI Test Suites 97 and newspaper articles, the grammar showed 94%~99% coverage.

A Hybrid of Rule based Method and Memory based Learning for Korean Text Chunking (한국어 구 단위화를 위한 규칙 기반 방법과 기억 기반 학습의 결합)

  • 박성배;장병탁
    • Journal of KIISE:Software and Applications / v.31 no.3 / pp.369-378 / 2004
  • In partially free word order languages like Korean and Japanese, the rule-based method is effective for text chunking: thanks to well-developed overt postpositions and endings, even a few rules perform as well as machine learning methods. However, rules cannot handle their own exceptions. Exception handling is important in natural language processing, and exceptions can be processed efficiently by memory-based learning. In this paper, we propose a hybrid of a rule-based method and memory-based learning for Korean text chunking. The proposed method is primarily based on rules, and the chunks estimated by the rules are then verified by a memory-based classifier. An evaluation of the proposed method on the Korean STEP 2000 corpus shows an improvement in F-score over the rules alone and over various machine learning methods alone. The final F-score is 94.19, while those of the rules and of SVMs, the best machine learning method for this task, are just 91.87 and 92.54, respectively.
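The "rules first, memory-based verification second" idea in this abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy noun-phrase rule, the three-part feature tuple (previous POS, current POS, next POS), the tag names, the stored exception, and the distance threshold.

```python
# Minimal sketch of a rule-based chunker whose output is verified against
# a memory of stored exceptions. All rules, tags, and data are hypothetical.

def rule_label(prev_pos, pos):
    """Hypothetical toy rule: consecutive nouns form one noun-phrase chunk."""
    if pos == "NOUN":
        return "I-NP" if prev_pos == "NOUN" else "B-NP"
    return "O"


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def hybrid_label(features, memory, threshold=0):
    """Apply the rule, then let a close enough stored exception override it."""
    prev_pos, pos, _next_pos = features
    guess = rule_label(prev_pos, pos)
    if not memory:
        return guess
    nearest_features, nearest_label = min(
        memory, key=lambda item: overlap_distance(features, item[0])
    )
    if overlap_distance(features, nearest_features) <= threshold:
        return nearest_label  # the memory overrides the rule
    return guess  # no similar exception is stored, so trust the rule


# Hypothetical exception: in this context the rule's "I-NP" would be wrong.
EXCEPTION_MEMORY = [(("NOUN", "NOUN", "JOSA"), "B-NP")]
```

With `threshold=0` only exact matches in memory override the rule; raising the threshold would let near matches override it as well, which trades rule precision for broader exception coverage.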

An Investigation on the Periodical Transition of News related to North Korea using Text Mining (텍스트마이닝을 활용한 북한 관련 뉴스의 기간별 변화과정 고찰)

  • Park, Chul-Soo
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.63-88 / 2019
  • This paper investigates changes in North Korea's domestic and foreign policies through automated text analysis of how North Korea is represented in South Korean mass media. Based on that data, we then analyze the status of text mining research, using a text mining technique to find the topics, methods, and trends of text mining research. We also investigate the characteristics and methods of analysis of the text mining techniques, as confirmed by analysis of the data. The R program, free software for statistical computing and graphics, was used to apply the text mining technique. Text mining methods make it possible to highlight the most frequently used keywords in a body of text, for example as a word cloud (also called a text cloud or tag cloud). This study proposes a procedure for finding meaningful tendencies based on a combination of word clouds and co-occurrence networks. It aims to explore more objectively the images of North Korea represented in South Korean newspapers by quantitatively reviewing the patterns of language use related to North Korea in newspaper big data from November 1, 2016 to May 23, 2019. We divided the data into three periods reflecting recent inter-Korean relations. The period before January 1, 2018 was designated the Before Phase of Peace Building. The period from January 1, 2018 to February 24, 2019 was designated the Peace Building Phase; Kim Jong-un's New Year's message and the PyeongChang Olympics formed an atmosphere of peace on the Korean peninsula. The third period, after the Hanoi Peace summit, was marked by silence in the relationship between North Korea and the United States and was therefore called the Depression Phase of Peace Building. This study analyzes news articles related to North Korea from the Korea Press Foundation database (www.bigkinds.or.kr) through text mining to investigate the characteristics of the Kim Jong-un regime's South Korea policy and unification discourse.
The main results of this study show that trends in the North Korean national policy agenda can be discovered with clustering and visualization algorithms. In particular, the study examines changes in international circumstances, domestic conflicts, living conditions in North Korea, the South's aid projects for the North, conflicts between the two Koreas, the North Korean nuclear issue, and the North Korean refugee problem through co-occurrence word analysis. It also offers an analysis of the South Korean mentality toward North Korea in terms of semantic prosody. In the Before Phase of Peace Building, the top keywords were, in order, 'Missiles', 'North Korea Nuclear', 'Diplomacy', 'Unification', and 'South-North Korean'. In the Peace Building Phase, they were 'Panmunjom', 'Unification', 'North Korea Nuclear', 'Diplomacy', and 'Military'. In the Depression Phase of Peace Building, they were 'North Korea Nuclear', 'North and South Korea', 'Missile', 'State Department', and 'International'. There are 16 words adopted in all three periods. The order is as follows: 'missile', 'North Korea Nuclear', 'Diplomacy', 'Unification', 'North and South Korea', 'Military', 'Kaesong Industrial Complex', 'Defense', 'Sanctions', 'Denuclearization', 'Peace', 'Exchange and Cooperation', and 'South Korea'. We expect that the results of this study will contribute to analyzing trends in news content about North Korea associated with North Korea's provocations, and that future research on North Korean trends will build on these results. We will continue to study the development of a model for North Korea risk measurement that can anticipate and respond to North Korea's behavior in advance. We expect that text mining analysis and scientific data analysis techniques will be applied to the field of North Korea and unification research.
Through these academic studies, I hope to see many studies that make important contributions to the nation.
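The keyword-frequency and co-occurrence counting that underlies word clouds and co-occurrence networks can be sketched as follows. The study itself used R; this sketch uses Python instead, and the function name, the per-document counting scheme, and the toy documents are all assumptions made for illustration rather than the study's data or procedure.

```python
from collections import Counter
from itertools import combinations


def keyword_and_cooccurrence_counts(documents):
    """Count, per document, which keywords appear and which pairs co-occur.

    `documents` is a list of token lists. Each keyword and each unordered
    keyword pair is counted at most once per document.
    """
    keyword_counts = Counter()
    pair_counts = Counter()
    for tokens in documents:
        unique = sorted(set(tokens))  # sorting gives each pair one canonical order
        keyword_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return keyword_counts, pair_counts


# Hypothetical toy documents, not data from the study.
toy_documents = [
    ["missile", "nuclear", "diplomacy"],
    ["nuclear", "unification"],
    ["missile", "nuclear"],
]
keyword_counts, pair_counts = keyword_and_cooccurrence_counts(toy_documents)
```

The keyword counts would feed a word cloud (size by frequency), while the pair counts give the edge weights of a co-occurrence network; running the same function separately on each period's articles allows the rankings to be compared across periods.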

Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.161-177 / 2019
  • In this paper, we study how to improve answer extraction in a question-answering system by using sentence dependency parsing results. A question-answering (QA) system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from a document; various studies have been conducted on both. To improve the performance of answer extraction, the grammatical information of sentences must be reflected accurately. Because Korean word order is free and sentence components are frequently omitted, dependency parsing is a good way to analyze Korean syntax. In this study, we therefore improved answer extraction by adding features generated from dependency parsing to the inputs of the answer extraction model (Bidirectional LSTM-CRF). We compared the performance of the model when given only basic word features generated without dependency parsing against its performance when the Eojeol tag feature and the dependency graph embedding feature were added. Since dependency parsing is performed on the Eojeol, a sentence component delimited by spaces, as its basic unit, the tag information of each Eojeol is obtained as a result of dependency parsing. The Eojeol tag feature is this tag information. Generating the dependency graph embedding consists of generating the dependency graph from the dependency parsing result and learning the embedding of the graph.
From the dependency parsing result, a graph is generated with each Eojeol as a node, each dependency between Eojeols as an edge, and each Eojeol tag as a node label. The graph is undirected or directed depending on whether the direction of the dependency relation is considered. To obtain the embedding of the graph, we used Graph2Vec, which finds the embedding of a graph from the subgraphs that constitute it. The maximum path length between nodes can be specified when finding subgraphs. If the maximum path length is 1, the graph embedding is generated only from direct dependencies between Eojeols; as the maximum path length grows, indirect dependencies are included as well. In the experiments, the maximum path length was varied from 1 to 3, both with and without dependency direction, and the performance of answer extraction was measured. The results show that both the Eojeol tag feature and the dependency graph embedding feature improve the performance of answer extraction. In particular, the highest answer extraction performance was obtained when the model input was the embedding of the directed dependency graph generated with a maximum path length of 1 in Graph2Vec's subgraph extraction. From these experiments, we concluded that it is better to take the direction of dependency into account and to consider only direct connections rather than indirect dependencies between words. The significance of this study is as follows. First, we improved answer extraction by adding features derived from dependency parsing results, taking into account the characteristics of Korean, namely free word order and frequent omission of sentence components.
Second, we generated features from the dependency parsing result with a learning-based graph embedding method, without defining patterns of dependency between Eojeols. Future research directions are as follows. In this study, the features generated from dependency parsing were applied only to the answer extraction model in order to grasp the meaning. In the future, if performance is confirmed by applying the features to various natural language processing models such as sentiment analysis or named entity recognition, the validity of the features can be verified more accurately.
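The graph construction step described in this abstract (Eojeol as node, dependency as edge, Eojeol tag as node label, directed or undirected) can be sketched in plain Python. The input format, the tag names, the dependent-to-head edge direction, and the example sentence are assumptions made for illustration; the subsequent Graph2Vec embedding step is not shown.

```python
def build_dependency_graph(eojeols, directed=True):
    """Build node labels and an edge set from a dependency parse.

    `eojeols` is a list of (surface_form, tag, head_index) tuples, where
    head_index is the position of the governing Eojeol, or -1 for the root.
    This input format is hypothetical, not the parser output used in the paper.
    """
    labels = {index: tag for index, (_form, tag, _head) in enumerate(eojeols)}
    edges = set()
    for index, (_form, _tag, head) in enumerate(eojeols):
        if head < 0:
            continue  # the root has no outgoing dependency edge
        if directed:
            edges.add((index, head))  # assumed direction: dependent -> head
        else:
            edges.add((min(index, head), max(index, head)))  # order-free pair
    return labels, edges


# Hypothetical parse of "나는 사과를 먹었다" ("I ate an apple"); tags are illustrative.
parsed = [("나는", "NP_SBJ", 2), ("사과를", "NP_OBJ", 2), ("먹었다", "VP", -1)]
labels, edges = build_dependency_graph(parsed, directed=True)
```

Each edge here is a direct dependency, which corresponds to the maximum path length of 1 that the abstract reports as performing best; the labels and edges could then be loaded into a graph library and passed to a Graph2Vec implementation.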

Korean Semantic Role Labeling Based on Suffix Structure Analysis and Machine Learning (접사 구조 분석과 기계 학습에 기반한 한국어 의미 역 결정)

  • Seok, Miran;Kim, Yu-Seop
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.555-562 / 2016
  • Semantic Role Labeling (SRL) determines the semantic relations between a predicate and its arguments in a sentence. Korean semantic role labeling has faced difficulty because the structure of Korean differs from that of English, which makes it very hard to apply the approaches developed so far. As a result, methods proposed so far have not shown satisfactory performance compared to English and Chinese. To address these problems, we focus on suffix information analysis, such as josa (case suffix) and eomi (verbal ending) analysis. Korean, like Japanese, is an agglutinative language with a well-defined suffix structure in its words. Agglutinative languages can have free word order because of their developed suffix structure. Arguments with a single morpheme are then labeled with statistics. In addition, machine learning algorithms such as Support Vector Machine (SVM) and Conditional Random Fields (CRF) are used to model the SRL problem for arguments that are not labeled in the suffix analysis phase. The proposed method is intended to reduce the range of argument instances to which machine learning approaches, which can result in uncertain and inaccurate role labeling, must be applied. In experiments, we use 15,224 arguments and obtain an F1-score of approximately 83.24%, an increase of about 4.85 percentage points over the state-of-the-art Korean SRL research.

Magritte's drawings and Lacan's Subject theory: Gaze, Encounter with the world (마그리트 회화와 라캉의 주체론 - 응시, 세계와의 조우)

  • Baek, Jin-Hwa
    • The Journal of Art Theory & Practice / no.5 / pp.7-24 / 2007
  • For Lacan, the subject is connected with a structure named "The Symbolic", but he denied that the subject can be explained simply as a fruit of language and the "Other". From his point of view, the deformation and crack passing through the subject are designated as a foundation of generation and creation rather than as our destined defect. The subject of "The Real" should not be understood as a subject that has freed itself from the restraint of "The Symbolic". However, this does not mean he asserts that the "Subject" is something incapable of being controlled by the unknown power. The problem is that this autonomous existence meets, inside itself, something "more than one's own self" by "circulating around itself" like a fixed star. This is the indication of a "stranger in the middle of my privacy", or "extimité", a word coined by Lacan. Perhaps "Subject" is nothing more than the name of the distance from an object that is "too hot" to approach, and of this circulating movement. It is because of this object that the real subject stands against generalization and cannot be restored to any place in the symbolic order, even though that place is empty. What is important in Lacan's structural theory is that his Subject theory is neither suggested nor denied as a manual structure. On the contrary, it is a study of the relationship between the settled symbol included in the "real subject, which is an unconscious one" and the symbolic subject, that is, the metaphysical subject in the general sense. In Lacan's enlarged concept of the subject beyond symbolic reality, it is noticeable that it gives justification to the union of media of different natures in artistic expression. Through the surrealistic works of René Magritte (1898~1967), we can recognize that the unconscious world is a living space that can be a "condition of human being", not something dark under the surface of the water.
In other words, Magritte's art secures a core dimension of human nature through a mysterious gap between conscious and settled space. Magritte's drawings often evoke strange and unsettling feelings in the people who view them. This is because routine objects are found in places "unsuitable" compared with where we usually find them in our everyday lives. "Reality" in Magritte's paintings makes us aware that it is a strained field of concealment and disclosure between truths, and we can learn that his act of overturning in order to paint invisible things is ultimately an effort to restore the "real subject" to the viewer's reality. In other words, such reversion arouses a nostalgic desire for objects existing in their original appearance as they are (the natural condition in which our gaze had not yet been distorted by anamorphic stains) and for the state in which we are conscious of them normally. Such desire offers us an opportunity to get out of mental depression rather than acting on us as an abnormal crack. It is a continuing effort to search for the lost subject and a lost paradise while facing the reality that the human subject, who should be the subject of the world and of life, has been ousted from its place by the structure and authority of culture.
