• Title/Summary/Keyword: 어순 (word order)

Search Results: 179

Development of Subcategorization Dictionary for the Disambiguation Korean Language Analysis (한국어 분석의 중의성 해소를 위한 하위범주화 사전 구축)

  • Lee, Su-Seon; Park, Hyun-Jae; Woo, Yo-Seop
    • Annual Conference on Human and Language Technology / 1999.10e / pp.257-264 / 1999
  • In natural language processing, syntactic analysis that identifies the constituent structure of a sentence produces many ambiguous results. In Korean, not only syntactic properties such as word order but also situation, meaning, and context play an important role in sentence analysis, so approaches based on context-free grammar alone have difficulty resolving ambiguous structures; this in turn increases ambiguity during semantic analysis. To resolve such syntactic and semantic ambiguity, we constructed a predicate-centered subcategorization dictionary. In this paper, we define the subcategorization patterns that each predicate can restrict and build the subcategorization dictionary according to these patterns. In the dictionary, semantic markers are assigned according to the semantic collocational relation between predicates and nouns so that complements can be selectionally restricted by matching against a noun thesaurus. For about 12,000 predicates collected from a corpus, about 25,000 subcategorization patterns were constructed, and the resulting dictionary was linked to a hierarchical thesaurus-based semantic dictionary holding the meanings of about 120,000 nouns. To verify to what extent the constructed dictionary resolves syntactic and lexical ambiguity, validation was carried out on a semi-automatically sense-tagged corpus of about 20,000 sentences; we analyzed how well the subcategorization patterns match a corpus annotated with dependency relations and word senses, and evaluated the cases where only the dependency relations match and the cases where the word senses also match. In this process, frequency and collocation information for the subcategorization patterns is collected into the database, and a method for extracting statistical co-occurrence information between semantic roles and predicates is also presented.
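The abstract above describes matching predicate-centered subcategorization patterns against arguments whose head nouns must satisfy thesaurus-based selectional restrictions. The sketch below illustrates that idea only in outline; the thesaurus entries, semantic markers, and pattern format are invented for illustration and are not the dictionary described in the paper.

```python
# Minimal sketch (hypothetical data and names) of querying a predicate-centered
# subcategorization dictionary with thesaurus-based selectional restrictions.
from dataclasses import dataclass

# Hypothetical thesaurus: maps a noun to its semantic-class path in a hierarchy.
THESAURUS = {
    "사과": ["entity", "concrete", "food"],
    "학교": ["entity", "concrete", "place"],
}

@dataclass
class SubcatPattern:
    predicate: str
    case_frame: dict  # case particle -> required semantic class (semantic marker)

# Hypothetical patterns: "먹다" (to eat) selects a food-class object.
PATTERNS = [
    SubcatPattern("먹다", {"를": "food"}),
    SubcatPattern("가다", {"에": "place"}),
]

def satisfies(noun: str, required_class: str) -> bool:
    """Check the selectional restriction by walking the noun's thesaurus path."""
    return required_class in THESAURUS.get(noun, [])

def match(predicate: str, args: dict) -> list:
    """Return patterns whose case frame is satisfied by the parsed arguments."""
    hits = []
    for p in PATTERNS:
        if p.predicate == predicate and all(
            case in args and satisfies(args[case], cls)
            for case, cls in p.case_frame.items()
        ):
            hits.append(p)
    return hits

print(match("먹다", {"를": "사과"}))  # the 'food' restriction is satisfied
print(match("먹다", {"를": "학교"}))  # rejected: '학교' is not in the food class
```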


Korean Semantic Role Labeling Based on Suffix Structure Analysis and Machine Learning (접사 구조 분석과 기계 학습에 기반한 한국어 의미 역 결정)

  • Seok, Miran; Kim, Yu-Seop
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.555-562 / 2016
  • Semantic Role Labeling (SRL) is the task of determining the semantic relation between a predicate and its arguments in a sentence. Korean semantic role labeling has faced difficulty because the structure of the language differs from English, which makes it hard to apply the approaches developed so far; methods proposed to date have not shown satisfactory performance compared to English and Chinese. To address these problems, we focus on the analysis of suffix information such as josa (case suffixes) and eomi (verbal endings). Korean is an agglutinative language, like Japanese, with a well-defined suffix structure within its words, and such languages allow relatively free word order because of that developed suffix structure. Arguments consisting of a single morpheme are labeled with statistics, and machine learning algorithms such as Support Vector Machines (SVM) and Conditional Random Fields (CRF) are used to model the SRL problem for arguments that are not labeled in the suffix analysis phase. The proposed method is intended to reduce the range of argument instances to which machine learning approaches, which can result in uncertain and inaccurate role labeling, need to be applied. In experiments with 15,224 arguments, we obtain an f1-score of approximately 83.24%, an increase of about 4.85 percentage points over the state-of-the-art Korean SRL research.
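For readers unfamiliar with the CRF stage mentioned above, the following sketch shows how josa/eomi suffix information can be encoded as CRF features using the sklearn-crfsuite library. The toy sentences, feature names, and role labels are illustrative assumptions, not the authors' data or feature set.

```python
# A minimal sketch (not the authors' code) of CRF-based role labeling where
# josa suffix information is encoded as features, using sklearn-crfsuite
# (pip install sklearn-crfsuite). Data and features are illustrative.
import sklearn_crfsuite

def features(tokens, i):
    word, josa = tokens[i]
    return {
        "word": word,
        "josa": josa,                           # case suffix attached to the eojeol
        "is_last": i == len(tokens) - 1,        # predicates are usually sentence-final in Korean
        "prev_josa": tokens[i - 1][1] if i > 0 else "BOS",
    }

# Toy training data: (eojeol stem, josa) pairs with semantic-role labels.
train_sents = [
    [("철수", "가"), ("밥", "을"), ("먹었다", "")],
    [("영희", "가"), ("학교", "에"), ("갔다", "")],
]
train_labels = [["ARG0", "ARG1", "PRED"], ["ARG0", "ARGM-LOC", "PRED"]]

X = [[features(s, i) for i in range(len(s))] for s in train_sents]
y = train_labels

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)

test = [("민수", "가"), ("책", "을"), ("읽었다", "")]
print(crf.predict([[features(test, i) for i in range(len(test))]]))
```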

Thick-Film Fabrication and Piezoelectric Properties of PZT/Polymer Composites for Film Speaker Applications (필름 스피커 적용을 위한 PZT/polymer 복합체의 후막 제조 및 압전 특성 평가)

  • Son, Yong-Ho; Eo, Sun-Cheol; Kim, Seong-Jin; Gwon, Seong-Yeol; Gwon, Sun-Yong
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 2007.11a / pp.346-346 / 2007
  • Piezoelectric ceramic materials are currently used in a wide range of applications, including piezoelectric transformers, actuators, transducers, sensors, and speakers. Among these, speakers made from sintered piezoelectric ceramics are difficult to machine, suffer from breakage when fabricated in large sizes, and have poor low-frequency response, which limits their range of applications. To overcome these drawbacks, recent work has attempted to fabricate film speakers using ceramic/polymer composites. Such 0-3 type ceramic/polymer piezoelectric composites allow lightweight products and are largely insensitive to size and environmental conditions, making them suitable for high-performance speaker applications. In this study, composites of PZT-based ceramics with various polymers, such as PVDF, PVDF-TrFE, polyester, and acrylic resin, were fabricated and their piezoelectric properties evaluated. First, an optimized composition of $(Pb_{1-a-b}Ba_aCd_b)(Zr_xTi_{1-x})_{1-c-d}(Ni_{1/3}Nb_{2/3})_c(Zn_{1/3}Nb_{2/3})_dO_3$ (hereafter PZT) was selected, and powder sintered at $1050^{\circ}C$ was ground by ball milling for 48 hours to a particle size of about $1{\mu}m$. The polymers were dissolved in suitable solvents. The sintered PZT powder and polymer were then mixed at weight ratios of 50:50, 60:40, 65:35, and 70:30, dispersants and defoamers were added, and the mixture was thoroughly dispersed with a three-roll mill to produce a paste. The paste was screen-printed onto ITO-coated PET film and dried at $120^{\circ}C$ for 5 minutes; the coated composite layer was measured to be about $80{\mu}m$ thick. The top electrode was also formed by screen printing with Ag paste. The samples were poled at $120^{\circ}C$ under a DC field of 4 kV/mm, after which their electrical properties were evaluated. Dielectric properties were measured with an LCR meter (EDC-1620), the crystal structure of the specimens was analyzed by XRD (Rigaku; D/MAX-2500H), and the microstructure was examined by scanning electron microscopy (SEM). The piezoelectric charge constant $(d_{33})$ was measured with an APC 8000 instrument. As the PZT fraction increased, electrical properties such as the relative permittivity and the piezoelectric charge constant increased, and among the polymers tested the PVDF-TrFE resin showed the best properties, which was attributed to the piezoelectricity of PVDF-TrFE itself.


Automatic Text Summarization based on Selective Copy mechanism against for Addressing OOV (미등록 어휘에 대한 선택적 복사를 적용한 문서 자동요약)

  • Lee, Tae-Seok; Seon, Choong-Nyoung; Jung, Youngim; Kang, Seung-Shik
    • Smart Media Journal / v.8 no.2 / pp.58-65 / 2019
  • Automatic text summarization is the process of shortening a text document by either extraction or abstraction. Abstractive approaches inspired by deep learning methods that scale to large document collections have been applied in recent work. Abstractive text summarization relies on pre-generated word embedding information, but low-frequency yet salient words such as terminology are seldom included in the vocabulary, the so-called out-of-vocabulary (OOV) problem, which degrades the performance of encoder-decoder neural network models. To handle OOV words in abstractive text summarization, we propose a copy mechanism that copies new words from the source document while generating summary sentences. Unlike previous studies, the proposed approach combines accurate pointing information with a selective copy mechanism based on a bidirectional RNN and a bidirectional LSTM. In addition, a neural network gate model estimates the generation probability, and a loss function optimizes the entire abstraction model. The dataset was constructed from a collection of journal article abstracts and titles. Experimental results demonstrate that both ROUGE-1 (based on word recall) and ROUGE-L (based on longest common subsequence) of the proposed encoder-decoder model improved to 47.01 and 29.55, respectively.
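The gate that mixes the generation probability with the copy (attention) distribution is the core of such copy mechanisms. The following PyTorch sketch shows one common way to combine the two distributions over an extended vocabulary; the tensor names, sizes, and the externally supplied p_gen are assumptions for illustration, not the paper's implementation.

```python
# A minimal sketch (assumptions, not the paper's code) of mixing a vocabulary
# distribution with an attention-based copy distribution over source tokens,
# in the spirit of pointer-generator style copy mechanisms.
import torch
import torch.nn.functional as F

def mix_distributions(vocab_logits, attn_weights, src_token_ids, p_gen, extended_vocab_size):
    """
    vocab_logits : (batch, vocab_size)   decoder output over the fixed vocabulary
    attn_weights : (batch, src_len)      attention over source positions
    src_token_ids: (batch, src_len)      source ids in the extended vocabulary (OOVs get temporary ids)
    p_gen        : (batch, 1)            gate in (0,1): generate vs. copy
    """
    vocab_dist = p_gen * F.softmax(vocab_logits, dim=-1)
    # pad the vocabulary distribution so OOV source words have slots
    batch, vocab_size = vocab_dist.shape
    extra = torch.zeros(batch, extended_vocab_size - vocab_size)
    extended = torch.cat([vocab_dist, extra], dim=-1)
    # scatter-add the copy probabilities onto the positions of the source tokens
    copy_dist = (1.0 - p_gen) * attn_weights
    return extended.scatter_add(1, src_token_ids, copy_dist)

# tiny smoke test with made-up sizes
final = mix_distributions(
    vocab_logits=torch.randn(2, 10),
    attn_weights=F.softmax(torch.randn(2, 4), dim=-1),
    src_token_ids=torch.tensor([[1, 3, 10, 11], [2, 2, 5, 12]]),
    p_gen=torch.sigmoid(torch.randn(2, 1)),
    extended_vocab_size=13,
)
print(final.shape, final.sum(dim=-1))  # each row sums to ~1
```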

Translation of Korean Object Case Markers to Mongolian's Suffixes (한국어 목적격조사의 몽골어 격 어미 번역)

  • Setgelkhuu, Khulan; Shin, Joon Choul; Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering / v.8 no.2 / pp.79-88 / 2019
  • Machine translation (MT) systems, and Korean-Mongolian MT in particular, have recently attracted attention because of the demands of globalization. Korean and Mongolian share the same SOV sentence structure, and changing their word order arbitrarily does not change the meaning of a sentence because of postpositional particles: particles attached after words to indicate their grammatical relationship to the clause or to make their meaning more specific. The particles therefore play an important role in translation between Korean and Mongolian. However, one Korean particle can be translated into several Mongolian particles, which is a major issue for Korean-Mongolian MT systems. In this paper, to address this issue, we propose a method that combines UTagger with a Korean-Mongolian particle table. UTagger is a system that analyzes morphology, tags parts of speech, and disambiguates homographs in Korean text. The Korean-Mongolian particle table was constructed manually to match Korean particles with their Mongolian counterparts. Experiments on a test set extracted from the National Institute of Korean Language's Korean-Mongolian Learner's Dictionary show that our method achieved an accuracy of 88.38%, improving on the result of using UTagger alone by 41.48%.
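The table-lookup step can be pictured as follows. This sketch assumes the morphological analysis (e.g., UTagger's output) is already available as (particle, POS tag) pairs, and uses a hand-made toy particle table with a crude vowel-harmony rule; the table entries and the harmony_class helper are illustrative, not the paper's actual Korean-Mongolian particle table or rules.

```python
# An illustrative sketch of the lookup step only; the mapping values below are
# simplified toy choices, not the authors' particle table.
def harmony_class(mongolian_stem: str) -> str:
    """Very rough vowel-harmony guess used only to pick a toy suffix variant."""
    back = set("аоуя")
    return "back" if any(ch in back for ch in mongolian_stem) else "front"

# Toy table: Korean object case marker -> Mongolian accusative variants by class.
PARTICLE_TABLE = {
    ("을", "JKO"): {"back": "ыг", "front": "ийг"},
    ("를", "JKO"): {"back": "ыг", "front": "ийг"},
}

def translate_particle(korean_particle: str, pos_tag: str, mongolian_stem: str) -> str:
    variants = PARTICLE_TABLE.get((korean_particle, pos_tag))
    if variants is None:
        return ""  # fallback: no entry in the table
    return variants[harmony_class(mongolian_stem)]

# e.g., "책을" with Mongolian stem "ном" (book) -> "номыг" (toy output)
print("ном" + translate_particle("을", "JKO", "ном"))
```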

The linguistic characteristics of Chinese character and Reading for the Analects of Confucius (한자(漢字)의 언어적 특성과 『논어(論語)』 읽기)

  • Kim, Sang-Rae
    • The Journal of Korean Philosophical History / no.30 / pp.191-225 / 2010
  • This paper is the outcome of an attempt to approach the reading of the Analects of Confucius through the polysemy of Chinese characters and the undecidability of the script. For this purpose, I first explain in what sense Chinese characters can serve as a 'philosophical language'. In the 16th century Matteo Ricci tried to find in the ideographic script the possibility of a universal language. Hegel and Heidegger, on the other hand, strictly insisted that Chinese characters are inappropriate for expressing the logical thought of human beings. Their reasons were as follows: first, the script has no prepositions or articles; second, a single word can carry a bisemy that includes opposite meanings; and lastly, the language expresses and communicates only through changes of word order, without inflection. But with scholars such as Cassirer, Saussure, and Derrida we can see, from a different point of view, the possibility of using Chinese characters as a logical and reasoning language. In this writing system, the connection of words in context matters more than the meaning of each individual word, in comparison with the languages of the West. A Chinese character keeps its original meaning hidden until the relationships among events and things are seen from within, together with other characters; in this sense it may be called a 'language of undecidability'. For these reasons, even though Chinese characters lack prepositions, articles, inflection, and the like, the writing system can still serve as a philosophical language capable of handling the complicated problems of human beings.

A Study of Pre-trained Language Models for Korean Language Generation (한국어 자연어생성에 적합한 사전훈련 언어모델 특성 연구)

  • Song, Minchae; Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.309-328 / 2022
  • This study empirically analyzed Korean pre-trained language models (PLMs) designed for natural language generation. The performance of two PLMs, BART and GPT, was compared on the task of abstractive text summarization. To investigate how performance depends on the characteristics of the inference data, ten different document types, comprising informational content (six types) and creative content, were considered. It was found that BART (which can both generate and understand natural language) performed better than GPT (which can only generate). A more detailed examination of the effect of inference data characteristics showed that the performance of GPT was proportional to the length of the input text. However, even for the longest documents (where GPT performed best), BART still outperformed GPT, suggesting that the greatest influence on downstream performance is not the size of the training data or of the PLM's parameters but the structural suitability of the PLM for the applied downstream task. The performance of the PLMs was also compared by analyzing part-of-speech (POS) shares: BART's performance was inversely related to the proportion of prefixes, adjectives, adverbs, and verbs, but positively related to the proportion of nouns. This result emphasizes the importance of taking the characteristics of the inference data into account when fine-tuning a PLM for its intended downstream task.
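As a point of reference for the summarization setup, the sketch below runs a Korean BART-style checkpoint with the Hugging Face transformers API. The checkpoint name is an assumption (any Korean seq2seq summarization model would do), and the decoding settings are generic rather than those used in the study.

```python
# A minimal sketch of abstractive summarization with a Korean seq2seq PLM via
# Hugging Face transformers. The checkpoint name is an assumed public KoBART
# summarization model; the paper's exact models and fine-tuning are not reproduced.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "gogamza/kobart-summarization"  # assumption: a public KoBART checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

document = "요약할 한국어 문서 본문을 여기에 넣는다."
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=1024)

summary_ids = model.generate(
    **inputs,
    num_beams=4,        # beam search, as is typical for summarization
    max_length=128,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```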

Outline History of Corporation Yudohoi(儒道會) via 『Cheongeumrok(晴陰錄)』 by Hong Chan-Yu: "Volume of Materials" (『청음록(晴陰錄)』으로 본 (사(社))유도회(儒道會) 약사(略史))

  • Chaung, Hoo Soo
    • (The)Study of the Eastern Classic / no.55 / pp.265-291 / 2014
  • Cheongeumrok is the journal kept by Gwonwoo(卷宇) Hong Chan-yu (1915-2005) over the period from January 9, 1969 to January 14, 1982. He was personally involved in the foundation of the corporation Yudohoi and in all of its operation, which makes him the most knowledgeable person about its history; his Cheongeumrok therefore seems a worthy source from which to arrange that history. Cheongeumrok consists of 19 books in total, amounting to approximately 3,300 sheets of squared manuscript paper with 200 characters per sheet. He wrote it in Chinese, sometimes following Hangul-style word order, and many parts of the manuscript were written in a cursive hand with many Chinese poems embedded throughout. The major information related to the corporation Yudohoi extracted from the journal is as follows. 1. There was a meeting of promoters for the foundation of the corporation in November 1968, and it was established in January 1969 after obtaining a permit from the Ministry of Culture and Communication (Permit No. of Ministry of Culture and Communication: Da(다)-2-3 (Jongmu(宗務)1732.5)). 2. Its office moved from its original location on the 3rd floor of Wonnam Building, 133-1 Wonnam-dong, Jongro-gu, Seoul (currently Daekhak Pharmacy in front of Seoul National University Hospital) to Room 388 of Gwangjang Company, 4 Yeji-dong, Jongro-gu (office of the Heungsan Social Gathering), then to the second floor of the KyungBo Building, 21 Kyansu-dong, and then to the 3rd floor of the Geongguk Building in Gyeongwoon-dong. 3. Its operational costs were covered by the support of Seong Sang-yeong, the eldest son of Seong Jong-ho, the chairman of the board; later by Kim Won-tae and Gwon Tae-hun, subsequent chairmen of the board; and, from 1979, by Hong Chan-yu, a director. 4. His Confucian activities include participating in the Seonggyungwan Seokjeonje(成均館 釋奠), joining in the erection of the Parijangseo(巴里長書) Monument and the publication of its commemorative poetry book, compiling (though not completing) biographies of Confucian patriotic martyrs for independence, and taking part, as a practice member, in the establishment of family rituals and regulations. 5. Yudohoi had a dispute with Seonggyungwan and lost suits at the High Court in July 1975 and the Supreme Court in February 1976. 6. There were discussions about its unification with the Seonggyungwan Yudohoi, but hardly any progress was made. 7. Yudohoi began to provide full-scale courses on Confucian and Chinese classics under the leadership of Director Hong Chan-yu in 1979, and they continue today; its courses for scholarship students, including those for ordinary citizens, boast a history of 29 years and 220 graduates.

Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung; Kim, Mintae; Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.161-177 / 2019
  • In this paper, we study how to improve answer extraction in a question-answering system by using the results of sentence dependency parsing. A question-answering (QA) system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from documents; various studies have been conducted on both. To improve the performance of answer extraction, the grammatical information of sentences must be reflected accurately. Because Korean has free word order and frequently omits sentence components, dependency parsing is a good way to analyze Korean syntax. Therefore, in this study, we improved answer extraction performance by adding features generated from dependency parsing to the inputs of the answer extraction model (a bidirectional LSTM-CRF). We compared the performance of the model when given only basic word features generated without dependency parsing against its performance when the Eojeol tag feature and the dependency graph embedding feature were added. Since dependency parsing is performed on the Eojeol, the basic unit of a Korean sentence separated by spaces, the tag information of each Eojeol is obtained as a result of parsing; the Eojeol tag feature is this tag information. The dependency graph embedding is produced in two steps: generating a dependency graph from the parsing result and learning an embedding of that graph. From the parsing result, a graph is built with Eojeols as nodes, dependencies between Eojeols as edges, and Eojeol tags as node labels; depending on whether the direction of the dependency relation is considered, either an undirected or a directed graph is generated. To obtain the graph embedding, we used Graph2Vec, a method that finds an embedding of a graph from the subgraphs that constitute it. The maximum path length between nodes can be specified when finding the subgraphs: if it is 1, the graph embedding is generated only from direct dependencies between Eojeols, and as the maximum path length grows, indirect dependencies are also included. In the experiments, the maximum path length between nodes was varied from 1 to 3, with and without the direction of dependency considered, and answer extraction performance was measured. The results show that both the Eojeol tag feature and the dependency graph embedding feature improve answer extraction. In particular, the highest performance was obtained when the direction of the dependency relation was considered and the dependency graph was embedded with a maximum path length of 1 in the subgraph extraction step of Graph2Vec. From these experiments we conclude that it is better to take the direction of dependency into account and to consider only direct connections rather than indirect dependencies between words.
The significance of this study is as follows. First, we improved answer extraction performance by adding features based on dependency parsing results, taking into account the characteristics of Korean, which has free word order and frequent omission of sentence components. Second, we generated the dependency-parse feature with a learning-based graph embedding method, without defining patterns of dependency between Eojeols. Future research directions are as follows: in this study, the features generated from dependency parsing were applied only to the answer extraction model; if their usefulness is confirmed by applying them to other natural language processing models such as sentiment analysis or named entity recognition, their validity can be verified more thoroughly.
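To make the graph-embedding step concrete, the sketch below builds toy dependency graphs with networkx and embeds them with Graph2Vec, assuming the karateclub library's interface. The parses, Eojeol tags, and hyperparameters are illustrative; the study's actual corpus and settings are not reproduced.

```python
# A minimal sketch (not the paper's pipeline) of turning dependency-parse output
# into graphs and getting whole-graph embeddings with Graph2Vec from karateclub
# (pip install karateclub networkx). The node "feature" stands in for the Eojeol tag.
import networkx as nx
from karateclub import Graph2Vec

def parse_to_graph(heads, tags):
    """heads[i] is the index of the head Eojeol of node i (-1 for the root)."""
    g = nx.Graph()  # undirected here; the paper also evaluates directed variants
    for i, tag in enumerate(tags):
        g.add_node(i, feature=tag)
    for i, h in enumerate(heads):
        if h >= 0:
            g.add_edge(i, h)
    return g

# Two toy "sentences": dependency heads and Eojeol tags (illustrative values).
graphs = [
    parse_to_graph([2, 2, -1], ["NP_SBJ", "NP_OBJ", "VP"]),
    parse_to_graph([1, -1, 1], ["NP_SBJ", "VP", "NP_AJT"]),
]

# wl_iterations controls the neighborhood radius of the WL subgraph features,
# playing a role similar to the "maximum path length" discussed in the abstract.
model = Graph2Vec(wl_iterations=1, dimensions=16, attributed=True, min_count=1, epochs=50)
model.fit(graphs)
print(model.get_embedding().shape)  # (num_graphs, dimensions)
```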