• Title/Summary/Keyword: Paraphrase

Automatic Extraction of Paraphrases from a Parallel Bible Corpus (정렬된 성경 코퍼스로부터 바꿔쓰기표현(paraphrase)의 자동 추출)

  • Lee, Kong-Joo;Yun, Bo-Hyun
    • Korean Journal of Cognitive Science / v.17 no.4 / pp.323-336 / 2006
  • In this paper, we present a pilot system that extracts paraphrases from a parallel corpus using a co-training method. Paraphrases are useful for applications that must generate varied and fluent text, such as machine translation, question-answering systems, and multi-document summarization systems. One difficulty in extracting paraphrases is finding a rich source from which to extract them. The Bible is a good source for extracting paraphrases because it has several Korean versions in which every sentence can be easily aligned by chapter and verse. Using the co-training method, we can extract not only lexical-level paraphrases but also phrasal-level paraphrases from a parallel corpus consisting of these Bible versions.
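The abstract notes that every sentence in the different Korean Bible versions can be aligned by chapter and verse. A minimal sketch of that alignment step, assuming each version is available as a list of records keyed by a hypothetical (book, chapter, verse) triple; it is not the authors' code and does not cover the co-training extraction itself.

```python
from collections import namedtuple

# Hypothetical verse record: (book, chapter, verse) identifies the same verse in any version.
Verse = namedtuple("Verse", ["book", "chapter", "verse", "text"])


def align_versions(version_a, version_b):
    """Pair verses from two Bible versions that share the same (book, chapter, verse) key.

    Verses that appear in only one of the two versions are skipped.
    """
    # Index the second version by its verse key for constant-time lookup
    index_b = {(v.book, v.chapter, v.verse): v.text for v in version_b}
    pairs = []
    for v in version_a:
        key = (v.book, v.chapter, v.verse)
        if key in index_b:
            pairs.append((v.text, index_b[key]))
    return pairs
```

Each resulting pair is a candidate sentence pair from which paraphrases could then be mined.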

The Sequence Labeling Approach for Text Alignment of Plagiarism Detection

  • Kong, Leilei;Han, Zhongyuan;Qi, Haoliang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.9 / pp.4814-4832 / 2019
  • Plagiarism detection increasingly relies on text alignment, which involves extracting the plagiarized passages from a pair consisting of a suspicious document and its source document. Heuristic methods have achieved excellent performance in text alignment. However, further improvements to these heuristics depend mainly on expert experience, which limits their capacity for continuous improvement. Machine learning may be a suitable way to address this problem. Considering the positional relations and the context of text-segment pairs, we formalize the text alignment task as a sequence labeling problem, improving current methods at the model level. Specifically, this paper proposes using a probabilistic graphical model to tag the observed sequence of text-segment pairs, and we therefore present a sequence labeling approach to text alignment in plagiarism detection based on Conditional Random Fields. The proposed approach is evaluated on the PAN@CLEF 2012 artificial high-obfuscation plagiarism corpus and the simulated paraphrase plagiarism corpus, and compared with the methods that achieved the best performance in PAN@CLEF 2012, 2013, and 2014. Experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods.
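The abstract frames text alignment as tagging a sequence of text-segment pairs with a Conditional Random Field. A linear-chain CRF picks the best label sequence with Viterbi decoding; the sketch below shows only that decoding step, assuming the emission and transition scores are already computed (in a trained CRF they come from learned feature weights). The labels "P" (plagiarized pair) and "N" (not plagiarized) are hypothetical, not the paper's tag set.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence for a linear-chain model.

    emissions: list of dicts, emissions[t][y] = score of label y at position t
    transitions: dict, transitions[(y_prev, y)] = score of moving from y_prev to y
    labels: list of all possible labels
    """
    if not emissions:
        return []
    # best[y] = best score of any partial path ending in label y at the current position
    best = {y: emissions[0][y] for y in labels}
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = {}, {}
        for y in labels:
            # Choose the previous label that maximizes path score plus transition score
            prev, score = max(
                ((yp, best[yp] + transitions[(yp, y)]) for yp in labels),
                key=lambda item: item[1],
            )
            new_best[y] = score + emissions[t][y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Transition scores are what let the model use the positional context the abstract mentions: a strong score for staying in "P" encourages contiguous plagiarized passages.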

Generation Paraphrase using Pointer Generation Network (포인터 생성 네트워크를 이용한 패러프레이즈 생성)

  • Park, Da-Sol;Kim, Young-kil;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology / 2020.10a / pp.535-539 / 2020
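  • There has long been a demand in natural language processing for modeling diverse utterances, and automatically identifying and generating content that is semantically equivalent to a given word, phrase, or sentence is an important part of natural language processing. In this paper, we propose a paraphrase generation model using a pointer-generator network. To measure the performance of the proposed model, we used a manually constructed corpus of similar sentences, and the model achieved token-level BLEU-4 of 0.250, ROUGE_L of 0.455, and CIDEr of 2.190. However, because the model sometimes output a sentence identical to the input sentence, we applied beam search and selected the generated sentence by comparing candidates with the input sentence. Evaluated on outputs excluding sentences identical to the input, the model achieved token-level BLEU-4 of 0.234, ROUGE_L of 0.459, and CIDEr of 2.041, while the amount of generated paraphrase data increased substantially. By enabling semantically equivalent information to be extracted accurately across sentences, this work should help with information extraction and ontology construction. These techniques are also expected to serve as useful resources for various areas of natural language processing, such as user intent detection in chatbots and machine reading comprehension (MRC).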
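The abstract says beam search was used and the output was chosen by comparing candidates with the input, to avoid simply copying it. The exact comparison is not given; this minimal sketch assumes an exact match after whitespace normalization and beam candidates given as (sentence, score) pairs. The function name `select_paraphrase` is hypothetical.

```python
def select_paraphrase(source, candidates):
    """Pick the highest-scoring beam candidate that differs from the input.

    source: input sentence (string)
    candidates: list of (sentence, score) pairs returned by beam search
    Returns None if every candidate merely copies the input.
    """
    normalized_source = " ".join(source.split())
    # Walk candidates from best to worst score
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for sentence, _score in ranked:
        if " ".join(sentence.split()) != normalized_source:
            return sentence
    return None
```

A softer criterion, such as rejecting candidates above some token-overlap threshold, would be a drop-in replacement for the equality test.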

A Study on the Construction of keyphrase dataset for paraphrase extraction (패러프레이즈 추출을 위한 키프레이즈 데이터셋 구축 방법론 연구)

  • Kang, Hyerin;Kang, Yejee;Park, Seoyoon;Jang, Yeonji;Kim, Hansaem
    • Annual Conference on Human and Language Technology / 2020.10a / pp.357-362 / 2020
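  • The performance of natural language processing applications differs depending on how accurately they capture paraphrase expressions, so paraphrase expressions are becoming increasingly important across natural language processing applications. Improving system performance requires a sufficient corpus on which to train models, and building such a paraphrase corpus in particular requires accurate paraphrase extraction. In this study, we therefore propose a keyphrase dataset as a language resource for paraphrase extraction and, based on it, extract sentences in a paraphrase relation that convey similar meanings. We show that when the constructed keyphrase dataset is used for paraphrase extraction, sentences in a paraphrase relation can be found with a method as simple as the one used in this study.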

Addressing Low-Resource Problems in Statistical Machine Translation of Manual Signals in Sign Language (말뭉치 자원 희소성에 따른 통계적 수지 신호 번역 문제의 해결)

  • Park, Hancheol;Kim, Jung-Ho;Park, Jong C.
    • Journal of KIISE / v.44 no.2 / pp.163-170 / 2017
  • Despite the rise of studies on spoken-to-sign-language translation, the low-resource problems of sign language corpora have rarely been addressed. As a first step toward translating spoken language into sign language, we addressed the problems arising from resource scarcity when translating spoken language into manual signals with statistical machine translation techniques. More specifically, we proposed three preprocessing methods: 1) paraphrase generation, which increases the size of the corpora; 2) lemmatization, which increases the frequency of each word in the corpora and the translatability of new input words in spoken language; and 3) elimination of function words that are not glossed into manual signals, so that the corresponding constituents of the bilingual sentence pairs match. In our experiments, we used different types of English-American Sign Language parallel corpora. The experimental results showed that each method, and the combination of the methods, improved the quality of manual signal translation regardless of the type of corpus.
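The third preprocessing method removes spoken-language function words that have no manual-sign gloss, so source and target sides line up better. A minimal sketch, assuming whitespace tokenization and a small hand-made function-word list; both the list and the name `drop_function_words` are hypothetical, since the abstract does not give the paper's actual list.

```python
# Hypothetical set of English function words assumed to have no manual-sign gloss.
FUNCTION_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "to"}


def drop_function_words(sentence, function_words=FUNCTION_WORDS):
    """Lowercase, split on whitespace, and drop tokens found in the function-word set."""
    return [tok for tok in sentence.lower().split() if tok not in function_words]
```

For example, `drop_function_words("The boy is at the school")` keeps `["boy", "at", "school"]`, since "at" is not in the assumed list.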

A Crowdsourcing-Based Paraphrased Opinion Spam Dataset and Its Implication on Detection Performance (크라우드소싱 기반 문장재구성 방법을 통한 의견 스팸 데이터셋 구축 및 평가)

  • Lee, Seongwoon;Kim, Seongsoon;Park, Donghyeon;Kang, Jaewoo
    • KIISE Transactions on Computing Practices / v.22 no.7 / pp.338-343 / 2016
  • Today, opinion reviews on the Web are often used as a means of exchanging information. As the importance of opinion reviews continues to grow, so does the problem of opinion spam. Although many studies on detecting spam reviews have been conducted, limitations of the available gold-standard datasets hinder research. We therefore introduce a new dataset called "Paraphrased Opinion Spam (POS)" that contains a new type of review spam that imitates truthful reviews. We observed that spammers refer to existing truthful reviews to fabricate spam reviews. To create such a seemingly truthful review spam dataset, we asked task participants to paraphrase truthful reviews into new deceptive reviews. The experimental results show that classifying our POS dataset is more difficult than classifying existing spam datasets, because the reviews in our dataset are linguistically closer to truthful reviews. Training volume was also found to be an important factor in classification model performance.

Jointly learning class coincidence classification for FAQ classification (FAQ 분류 성능 향상을 위한 클래스 일치 여부 결합 학습 모델)

  • Yang, Dongil;Ham, Jina;Lee, Kangwook;Lee, Jiyeon
    • Annual Conference on Human and Language Technology / 2019.10a / pp.12-17 / 2019
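  • A FAQ (Frequently Asked Questions) question-answering system defines frequently asked questions and their answers and, for a user query, infers and returns the most appropriate of the defined answers. If each defined representative question and its corresponding answer is treated as a class, a FAQ question-answering system can be framed as a classification problem. Conventional FAQ classification learns the task from features shared by paraphrases within the same class. However, this approach cannot distinguish sentences that have similar word composition but differ in meaning because of one or two words, and it is especially prone to classification errors when training sentences belonging to different classes have similar meanings. To address this problem, we propose jointly learning a class coincidence classification task so that the model can recognize that training sentences from different classes belong to different classes. The model is trained on random pairs of training sentences from the same class so that it learns that such pairs belong to the same class, and simultaneously on random pairs of training sentences from different classes so that it learns to recognize that such pairs belong to different classes. For the experiments, we used the text classification model of BERT, which was recently released and shows the best performance in natural language processing, and we compared it with the model using the proposed technique on Korean FAQ data. The results confirmed that the proposed class coincidence joint learning model distinguishes between similar sentences and yields a significant performance improvement over the baseline BERT model trained on the classification task alone.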
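The joint objective relies on random sentence pairs labeled "same class" or "different class". A minimal sketch of that pair construction only (not the BERT model or the joint loss), assuming training data as (sentence, class_id) tuples and an even split between same-class and different-class pairs; the 50/50 ratio and the name `make_coincidence_pairs` are assumptions, not details from the abstract.

```python
import random


def make_coincidence_pairs(examples, num_pairs, seed=0):
    """Build labeled sentence pairs for an auxiliary class-coincidence task.

    examples: list of (sentence, class_id)
    Returns a list of (sentence_a, sentence_b, label), where label is 1 if both
    sentences come from the same class and 0 otherwise. Even-indexed pairs are
    same-class when possible; odd-indexed pairs are different-class.
    """
    rng = random.Random(seed)
    by_class = {}
    for sentence, cls in examples:
        by_class.setdefault(cls, []).append(sentence)
    # Only classes with at least two sentences can supply same-class pairs
    multi = [c for c, sents in by_class.items() if len(sents) >= 2]
    classes = list(by_class)
    pairs = []
    for i in range(num_pairs):
        if i % 2 == 0 and multi:
            cls = rng.choice(multi)
            a, b = rng.sample(by_class[cls], 2)  # two distinct sentences
            pairs.append((a, b, 1))
        elif len(classes) >= 2:
            c1, c2 = rng.sample(classes, 2)  # two distinct classes
            pairs.append((rng.choice(by_class[c1]), rng.choice(by_class[c2]), 0))
    return pairs
```

In a joint setup, these pairs would feed a second classification head alongside the ordinary FAQ class head, with the two losses summed.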

The Communication Repair Strategy Characteristics According to Communication Breakdown of Elderly Man With Alzheimer's Dementia (알츠하이머 치매 노인의 의사소통 단절에 따른 의사소통 회복전략 특성)

  • Kim, Sun-Young;Park, Hee-June
    • Therapeutic Science for Rehabilitation / v.8 no.4 / pp.53-63 / 2019
  • Objective: Successful communication requires the use of various communication repair strategies when communication breakdowns occur; however, in older people with dementia, communication problems increase because such strategies are used inadequately. The purpose of this study was to investigate differences in repair strategies between elderly people with dementia and typical elderly people in conversational discourse. Method: The subjects were 8 elderly people with Alzheimer's dementia (AD) and 10 typical elderly people. Conversational discourse tasks were conducted face-to-face with the subjects, and communication breakdowns and communication repair strategies were analyzed based on 200 utterances collected from the conversational discourse. Result: First, the AD group had more communication breakdowns than the control group, but the repair rate did not differ between the groups. Second, the AD group used the nonspecific repair strategy and the clarification request strategy as expressive strategies. The repair rate after using an expressive strategy exceeded 90% for the explanation, combined, nonspecific repair, and repetition confirmation strategies. As response strategies, the paraphrase strategy and combined strategies were used frequently, and the repair rate after using a response strategy was 100% for the simplification, repetition, and gesture strategies. Conclusion: The AD group showed more breakdowns attributable to both the research subjects and the researchers than the control group, and showed the ability to use various expressive and response strategies, although repair rates differed across communication repair strategies. The AD group used the nonspecific repair strategy most often among expressive strategies and the paraphrase strategy most often among response strategies, a characteristic that differs from typical elderly people. Therefore, these repair strategies should be utilized in the rehabilitation of elderly people with AD.

Plans for Teaching and Learning of Learner-centered Activities in Korean Verse Education (시조교육의 현황과 학습자 활동 중심의 교수·학습 모형 - 고등학교 국어 교과서 수록 작품 <시조>를 중심으로 -)

  • Kang, Myong-Hye
    • Sijohaknonchong / v.20 / pp.141-171 / 2004
  • Although only three sijo appear in the high school textbook, each type can be understood through them, since they represent pyung sijo, sasul sijo, and present sijo respectively. In learning through learner-centered activities, which aim at full knowledge acquisition regarding literary works, students first learn from the teacher what they will learn, as a preparatory stage. Sijo are formed of three chapters and stand for a world that is colorless, scentless, and flavorless, so their themes can be found with ease. Compared with other genres, sijo also make it easy to construct the background of a work. Moreover, because sijo are not long, learners can paraphrase them. Sijo that express private experiences in everyday language can be related to other genres or to everyday language; therefore, sijo are presented last. In the teaching phase, writing or presentation activities are presented at the stages of concretion and gradation. After class, learners keep a reaction journal. In the phase of concretion and gradation, learners can apprehend that the emotions of the poetic speakers of (1)·(2)·(3), which stand for pyung sijo, sasul sijo, and present sijo respectively, differ by type, even though they can be roughly summarized as loneliness, desolateness, and gloominess. Moreover, these typological differences stem from social, political, and cultural settings, namely, differences in context. In this teaching model, learners should prepare content regarding context and text before class, and teachers should act as assistants who help learners pre-understand their subjective experiences and imaginations.