• Title/Abstract/Keywords: 영어처리 (English processing)

Search Result 471, Processing Time 0.032 seconds

Open-domain Question Answering Using Lexico-Semantic Patterns (Lexico-Semantic Pattern을 이용한 오픈 도메인 질의 응답 시스템)

  • Lee, Seung-Woo;Jung, Han-Min;Kwak, Byung-Kwan;Kim, Dong-Seok;Cha, Jeong-Won;An, Joo-Hui;Lee, Gary Geun-Bae;Kim, Hark-Soo;Kim, Kyung-Sun;Seo, Jung-Yun
    • Annual Conference on Human and Language Technology / 2001.10d / pp.538-545 / 2001
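  • This study describes the implementation of an open-domain question answering system and the results of its participation in the English-language TREC evaluation. Answer types are classified into a hierarchy with 18 top-level nodes. In question processing, a grammar expressed as LSPs (Lexico-Semantic Patterns) determines the answer type of the question, and a query is generated from three kinds of keywords: lemma forms, WordNet senses, and stem forms. Based on this query, passage selection splits the documents retrieved by the document search engine into sentences, scores each sentence, and merges adjacent sentences, taking lexical chains into account, to form and rank passages. In answer processing, answers are extracted from the top-ranked passages according to the question's answer type, through matching against LSPs written in terms of part-of-speech, lexical, and semantic information and through AAD (Abbreviation-Appositive-Definition) processing; the extracted answers are then scored and ranked. To evaluate the implemented system, we scored it ourselves in the TREC manner on 200 questions from the main task of the TREC-10 QA Track. The resulting MRR (Mean Reciprocal Rank) of 0.341 is comparable to that of the top systems at TREC-9.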

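
The MRR figure quoted in the abstract above can be made concrete with a small sketch. The function names and the toy data are hypothetical, and the default cutoff of five is an assumption modelled on the five ranked responses per question that TREC QA evaluations of that period accepted; the abstract itself does not state the cutoff.

```python
from typing import Hashable, Optional, Sequence, Set, Tuple


def reciprocal_rank(ranked_answers: Sequence[Hashable],
                    correct: Set[Hashable],
                    cutoff: Optional[int] = 5) -> float:
    """Return 1/rank of the first correct answer within the cutoff, else 0.0."""
    candidates = ranked_answers if cutoff is None else ranked_answers[:cutoff]
    for rank, answer in enumerate(candidates, start=1):
        if answer in correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]],
                         cutoff: Optional[int] = 5) -> float:
    """Average the reciprocal rank over all questions (one run per question)."""
    if not runs:
        raise ValueError("need at least one question")
    return sum(reciprocal_rank(r, c, cutoff) for r, c in runs) / len(runs)


# Toy example (made-up data): three questions, answered at rank 1, rank 3, and not at all.
toy_runs = [
    (["a", "b"], {"a"}),
    (["x", "y", "z"], {"z"}),
    (["p"], {"q"}),
]
toy_mrr = mean_reciprocal_rank(toy_runs)  # (1 + 1/3 + 0) / 3
```

A question whose correct answer never appears within the cutoff contributes zero, so MRR rewards systems that place the right answer near the top of the list.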

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires building linguistic information about both languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly coined words are out-of-vocabulary and appear untranslated in the output sentence, which lowers translation quality. Compound nouns also complicate lexical and syntactic analysis, and they are difficult to translate accurately when the transfer dictionary lacks the relevant information. To improve the quality of English-Korean machine translation, the transfer dictionary must therefore be expanded continuously by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary that consists of constructing a corpus from internet newspapers, extracting words that are not in the existing dictionary together with frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports this expansion. Expanding the dictionary is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary, and it is therefore expected to contribute to better translation quality.
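
The extraction step described in the abstract above (finding out-of-vocabulary words and frequently used compound nouns in a news corpus) can be sketched roughly as follows. This is not the authors' tool: `tokenize`, `extract_candidates`, and `min_count` are hypothetical names, and counting adjacent word pairs is only a crude stand-in for compound-noun detection. A real system would presumably filter candidates by part of speech.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic (optionally hyphenated) words."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def extract_candidates(sentences: Iterable[str],
                       dictionary: Set[str],
                       min_count: int = 2) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (OOV word counts, frequent adjacent-pair counts) for review.

    Assumption: `dictionary` holds the source-side entries of the existing
    transfer dictionary as lowercase strings.
    """
    oov: Counter = Counter()
    pairs: Counter = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        oov.update(t for t in tokens if t not in dictionary)
        pairs.update(zip(tokens, tokens[1:]))
    compounds = {
        " ".join(pair): count
        for pair, count in pairs.items()
        if count >= min_count and " ".join(pair) not in dictionary
    }
    return dict(oov), compounds


# Toy corpus and dictionary (made-up data).
toy_sentences = [
    "The blockchain startup raised funds.",
    "A blockchain startup hired engineers.",
]
toy_dictionary = {"the", "a", "startup", "raised", "funds", "hired", "engineers"}
toy_oov, toy_compounds = extract_candidates(toy_sentences, toy_dictionary)
```

The candidates returned here would still need a human (or another component) to attach Korean meanings before they are merged into the transfer dictionary, which matches the manual effort the abstract mentions.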

General Relation Extraction Using Probabilistic Crossover (확률적 교차 연산을 이용한 보편적 관계 추출)

  • Je-Seung Lee;Jae-Hoon Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.8 / pp.371-380 / 2023
  • Relation extraction is the task of extracting relationships between named entities from text. Traditional relation extraction methods only extract relations between predetermined subject and object entities. In end-to-end relation extraction, however, all possible relations must be extracted by considering the positions of the subject and object for every pair of entities, which uses time and resources inefficiently. To alleviate this problem, this paper proposes a method that sets directions based on the positions of the subject and object and extracts relations according to those directions. The proposed method uses existing relation extraction data to generate direction labels indicating the direction in which the subject points to the object in the sentence, adds entity position tokens and entity types to sentences to predict the directions with a pre-trained language model (KLUE-RoBERTa-base, RoBERTa-base), and generates representations of subject and object entities through a probabilistic crossover operation. These representations are then used to extract relations. Experimental results show that the proposed model performs about 3 to 4%p better than a method that predicts integrated labels. In addition, when the proposed model was trained on Korean and English data, performance was 1.7%p higher in English than in Korean due to the amount of data and language disorder, and the parameter values that produced the best performance differed between the two languages. By reducing the number of directional cases that must be considered, the proposed model can reduce the waste of resources in end-to-end relation extraction.
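
The data-preparation idea in the abstract above (deriving a direction label from where the subject and object sit in the sentence, and marking entity positions and types in the input) can be illustrated with a minimal sketch. The label names `forward` and `backward`, the marker format `[E1:TYPE] ... [/E1]`, and the `Entity` class are all assumptions made for illustration; the abstract does not specify the authors' label set or token format, and the probabilistic crossover operation is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    type: str   # entity type tag, e.g. "PER" or "LOC"


def direction_label(subject: Entity, obj: Entity) -> str:
    """'forward' if the subject precedes the object in the sentence, else 'backward'."""
    return "forward" if subject.start < obj.start else "backward"


def mark_entities(sentence: str, first: Entity, second: Entity) -> str:
    """Wrap two non-overlapping entities in hypothetical position/type markers."""
    left, right = sorted((first, second), key=lambda e: e.start)
    if left.end > right.start:
        raise ValueError("entities must not overlap")
    return (
        sentence[:left.start]
        + f"[E1:{left.type}] " + sentence[left.start:left.end] + " [/E1]"
        + sentence[left.end:right.start]
        + f"[E2:{right.type}] " + sentence[right.start:right.end] + " [/E2]"
        + sentence[right.end:]
    )


# Toy example (made-up annotation).
toy_sentence = "Marie Curie was born in Warsaw."
person = Entity(0, 11, "PER")
city = Entity(24, 30, "LOC")
toy_marked = mark_entities(toy_sentence, person, city)
toy_direction = direction_label(subject=person, obj=city)
```

With a predicted direction for each entity pair, a model would only need to classify the relation once per pair instead of once per ordering, which is the resource saving the abstract points to.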

Lexical Status of Main and Supportive Verbs in Mental Lexicon (본용언과 보조용언의 의미 처리에 관한 연구 : 일반인과 실어증 환자를 대상으로)

  • Moon, Young-Sun;Kim, Dong-Hwui;Pyun, Sung-Bum;Hwang, Yu-Mi;Jung, Jae-Bun;Nam, Ki-Chun
    • Annual Conference on Human and Language Technology / 1999.10e / pp.447-454 / 1999
  • This study investigates how main verbs and auxiliary (supportive) verbs in Korean are processed. Unlike in English, Korean auxiliary verbs express the speaker's psychological state or aspect. As a result, the same word differs clearly in meaning depending on whether it is used as a main verb or as an auxiliary verb. In particular, when a word is used as an auxiliary verb, most of its lexical meaning disappears and only an abstract meaning remains, so its relation to the main verb is an important research question. It is also necessary to determine whether main and auxiliary verbs are processed in the same region of the mental lexicon. If they share the same mental lexicon, the lexical meaning of the main verb should be activated even in contexts where the word is used as an auxiliary verb. To test this, we ran experiments with normal participants and aphasic patients. For normal participants, we examined how auxiliary verbs are processed under short and long SOA (stimulus onset asynchrony) conditions; for aphasic patients, we examined how their processing of auxiliary verbs compares with that of normal participants. The results show that, at the short SOA, normal participants processed the meanings of main and auxiliary verbs in the same way: the lexical meaning of the auxiliary verb was activated just as it was for the main verb. At the long SOA, however, the lexical meaning of the auxiliary verb was suppressed by contextual information, and the word was interpreted with a meaning different from that of the main verb. Compared with normal participants, aphasic patients showed two patterns. Patients with anomic aphasia produced results similar to those of normal participants, but the lexical meaning was activated somewhat less stably in auxiliary use than in main-verb use. In patients with receptive aphasia, by contrast, the lexical meaning was not activated at all in auxiliary use, which shows that their language processing differs from that of normal participants.


Processing of syntactic dependency in Korean relative clauses: Evidence from an eye-tracking study (안구이동추적을 통해 살펴본 관계절의 통사처리 과정)

  • Lee, Mi-Seon;Yong, Nam-Seok
    • Korean Journal of Cognitive Science / v.20 no.4 / pp.507-533 / 2009
  • This paper examines the time course and processing patterns of filler-gap dependencies in Korean relative clauses, using an eye-tracking method. Participants listened to a short story while viewing four pictures of entities mentioned in the story. Each story was followed by an auditorily presented question involving a relative clause (subject relative or dative relative). Participants' eye movements in response to the question were recorded. Results showed that the proportion of looks to the picture corresponding to a filler noun significantly increased at the relative verb affixed with a relativizer, and was largest at the filler, where the fixation duration on the filler picture significantly increased. These results suggest that online resolution of the filler-gap dependency only starts at the relative verb marked with a relativizer and is finally completed at the filler position. Accordingly, they partly support the filler-driven parsing strategy for Korean, as for head-initial languages. In addition, the different patterns of eye movements between subject relatives and dative relatives indicate the role of case markers in parsing Korean sentences.


Design of Regional Function Message of AIS for Hangul Text messaging (한글 텍스트 메시징을 위한 AIS 지역 기반 메시지 설계)

  • Yu, Dong-Hui
    • Journal of the Institute of Convergence Signal Processing / v.14 no.2 / pp.77-81 / 2013
  • The international AIS standard, which serves the safety of ship navigation and vessel traffic management, provides 27 messages for exchanging ships' navigational information. Among these, message IDs 6 and 8 are defined as binary data formats for exchanging application-specific information and are classified into IFM (International Function Message) for international use and RFM (Regional Function Message) for national or regional use. Because the international standards are based on English, there has been a need to exchange data in Hangul text so that vessel traffic management can correct ships' static and dynamic information. In this paper, I analyze the international standards in order to provide a Hangul text messaging service based on RFM, and I propose an RFM message and a simple protocol for correcting a ship's information.

E-commerce data based Sentiment Analysis Model Implementation using Natural Language Processing Model (자연어처리 모델을 이용한 이커머스 데이터 기반 감성 분석 모델 구축)

  • Choi, Jun-Young;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.11 no.11 / pp.33-39 / 2020
  • In the field of natural language processing, research on tasks such as translation, POS tagging, Q&A, and sentiment analysis is being carried out worldwide. In sentiment analysis, pretrained sentence embedding models achieve high classification performance on English single-domain datasets. In this paper, classification performance is compared on a Korean e-commerce online dataset with diverse domain attributes, and six neural network models are built: BOW (Bag of Words), LSTM[1], Attention, CNN[2], ELMo[3], and BERT (KoBERT)[4]. The results confirm that pretrained sentence embedding models outperform word embedding models. In addition, a practical neural network model composition is proposed after comparing classification performance on a dataset with 17 categories. Finally, compressing the sentence embedding model is mentioned as future work, considering inference time relative to model capacity in a real-time service.

Implementation of TTS Engine for Natural Voice (자연음 TTS(Text-To-Speech) 엔진 구현)

  • Cho Jung-Ho;Kim Tae-Eun;Lim Jae-Hwan
    • Journal of Digital Contents Society / v.4 no.2 / pp.233-242 / 2003
  • A TTS (Text-To-Speech) system is a computer-based system that should be able to read any text aloud. Producing a natural voice requires general knowledge of the language as well as a great deal of time and effort. Furthermore, the sound pattern of English is variable, involving both phonemic and morphological analysis, so it is very difficult to keep the patterns consistent. To handle these problems, we present a system based on phonemic analysis of vowels and consonants. By analyzing phonological variations frequently found in spoken English, we have derived phonemic contexts that trigger the multilevel application of the corresponding phonological processes, which consist of phonemic and allophonic rules. As a result, we obtain rule data composed of phonemes and an engine that economizes on system resources. The proposed system can be used not only in communication systems but also in office automation and similar applications.


Machine Learning Language Model Implementation Using Literary Texts (문학 텍스트를 활용한 머신러닝 언어모델 구현)

  • Jeon, Hyeongu;Jung, Kichul;Kwon, Kyoungah;Lee, Insung
    • The Journal of the Convergence on Culture Technology / v.7 no.2 / pp.427-436 / 2021
  • The purpose of this study is to implement a machine learning language model that learns from literary texts. An important characteristic of literary texts is that question-and-answer pairs are often not clearly distinguished. Literary texts also contain pronouns, figurative expressions, soliloquies, and similar devices. These features make it difficult for algorithms to learn and have held back machine learning with literary texts. Yet algorithms that learn from literary texts can show more human-friendly interactions than algorithms that learn from general sentences. To this end, this paper proposes three text correction tasks that must precede research using literary texts for machine learning language models: pronoun processing, dialogue pair expansion, and data amplification. Training data for artificial intelligence should have clear meanings to facilitate machine learning and ensure high effectiveness. Introducing special genres of text such as literature into natural language processing research is expected not only to expand the learning domain of machine learning but also to point to a new method of language learning.

Quantifying L2ers' phraseological competence and text quality in L2 English writing (L2 영어 학습자들의 연어 사용 능숙도와 텍스트 질 사이의 수치화)

  • Kwon, Junhyeok;Kim, Jaejun;Kim, Yoolae;Park, Myung-Kwan;Song, Sanghoun
    • Annual Conference on Human and Language Technology / 2017.10a / pp.281-284 / 2017
  • Building on studies of multi-word combinations, that is, the field of phraseology, this study examines the relationship between text quality and phraseological competence in L2 English writing, following Yves Bestgen et al. (2014). Bigrams from L2 writers' texts are scored against a reference corpus, GloWbE (Corpus of Global Web-Based English), using two association scores, t-score and Mutual Information (MI), which measure phraseological competence in opposite ways: the former favors frequent combinations and the latter infrequent ones. Taking a cross-sectional approach, we propose that the quality of the essays correlates with the mean MI score of the bigrams extracted from YELC, the Yonsei English Learner Corpus. The negatively scored bigrams also correlate with essay quality, in that these bigrams are absent from the reference corpus and are mostly ungrammatical. This indicates that an increase in the proportion of negatively scored bigrams lowers the quality of essays. The conclusion discusses essay quality as scored by MI and t-score under the cross-sectional approach, and applications to teaching methods and to the assessment of second language writing proficiency.

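
The two association scores named in the abstract above can be sketched with their commonly used textbook definitions: for a bigram with observed count O and expected count E = f(w1)·f(w2)/N, MI = log2(O/E) and t = (O − E)/√O. The abstract does not give the exact formulas or counting conventions the authors used, so the definitions, the function names, and the toy corpus below are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

Bigram = Tuple[str, str]


def reference_scores(reference_tokens: List[str]) -> Dict[Bigram, Tuple[float, float]]:
    """Map each bigram in the reference corpus to its (MI, t-score) pair."""
    n = len(reference_tokens)
    unigrams = Counter(reference_tokens)
    bigrams = Counter(zip(reference_tokens, reference_tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        expected = unigrams[w1] * unigrams[w2] / n
        mi = math.log2(observed / expected)
        t = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mi, t)
    return scores


def essay_profile(essay_tokens: List[str],
                  scores: Dict[Bigram, Tuple[float, float]]) -> Tuple[Optional[float], float]:
    """Return (mean MI of attested bigrams, proportion of bigrams absent from the reference)."""
    essay_bigrams = list(zip(essay_tokens, essay_tokens[1:]))
    if not essay_bigrams:
        raise ValueError("essay needs at least two tokens")
    attested = [scores[b][0] for b in essay_bigrams if b in scores]
    absent_ratio = 1 - len(attested) / len(essay_bigrams)
    mean_mi = sum(attested) / len(attested) if attested else None
    return mean_mi, absent_ratio


# Toy reference corpus and learner essay (made-up data).
toy_scores = reference_scores("strong tea strong tea weak tea".split())
toy_mean_mi, toy_absent = essay_profile("strong tea weak coffee".split(), toy_scores)
```

In this sketch, bigrams that never occur in the reference corpus are reported as a separate proportion rather than given a score, which is one simple way to track the absent bigrams that the abstract associates with lower essay quality.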