• Title/Summary/Keyword: Seq2seq(Sequence to sequence)

A Reranking Model for Korean Morphological Analysis Based on Sequence-to-Sequence Model (Sequence-to-Sequence 모델 기반으로 한 한국어 형태소 분석의 재순위화 모델)

  • Choi, Yong-Seok;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • Vol. 7, No. 4
    • /
    • pp.121-128
    • /
    • 2018
  • A Korean morphological analyzer can adopt a sequence-to-sequence (seq2seq) model, which can generate an output sequence whose length differs from that of the input. In general, a seq2seq-based Korean morphological analyzer takes a syllable-unit sequence as input and outputs a syllable-unit sequence. Syllable-based morphological analysis has the advantage that unknown words are easily handled, but the disadvantage that morpheme-level information is ignored. In this paper, we propose a reranking model as a post-processor for the seq2seq model to improve the accuracy of morphological analysis. The seq2seq-based morphological analyzer can generate K candidate results using beam search. The reranking model exploits morpheme-unit embedding information as well as morpheme n-grams to reorder the K results. The experimental results show that the reranking model improves the F1 score by 1.17% compared with the original seq2seq model.
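
The reranking idea in this abstract can be illustrated with a minimal sketch. It assumes each of the K beam candidates arrives as a list of morphemes plus a seq2seq log-probability, and it combines that score with an add-one-smoothed bigram score. The function names, the interpolation `weight`, and the bigram scorer are illustrative assumptions, not the authors' model, which also uses morpheme-unit embeddings that are omitted here.

```python
import math
from collections import Counter

def train_bigram_counts(corpus):
    """Count unigram and bigram occurrences over morpheme sequences."""
    unigrams, bigrams = Counter(), Counter()
    for morphemes in corpus:
        padded = ["<s>"] + list(morphemes) + ["</s>"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams

def bigram_logprob(morphemes, unigrams, bigrams, vocab_size):
    """Add-one-smoothed bigram log-probability of a morpheme sequence."""
    padded = ["<s>"] + list(morphemes) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

def rerank(candidates, unigrams, bigrams, vocab_size, weight=0.5):
    """Reorder (morphemes, seq2seq_logprob) pairs by an interpolated score.

    weight=0 keeps the seq2seq order; weight=1 uses only the bigram score.
    """
    def combined(candidate):
        morphemes, seq2seq_logprob = candidate
        lm = bigram_logprob(morphemes, unigrams, bigrams, vocab_size)
        return (1 - weight) * seq2seq_logprob + weight * lm
    return sorted(candidates, key=combined, reverse=True)

# Toy data: placeholder tokens stand in for morphemes.
corpus = [["I", "go"], ["I", "go"], ["I", "run"]]
unigrams, bigrams = train_bigram_counts(corpus)
candidates = [(["go", "I"], -0.9), (["I", "go"], -1.0)]
best = rerank(candidates, unigrams, bigrams, vocab_size=4)[0][0]
```

In this toy case the seq2seq score slightly prefers the wrong order, and the bigram evidence moves the well-formed candidate to the top.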

A Dialogue System using CNN Sequence-to-Sequence (CNN Sequence-to-Sequence를 이용한 대화 시스템 생성)

  • Seong, Su-Jin;Sin, Chang-Uk;Park, Seong-Jae;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology
    • /
    • KIISE Language Engineering Research Group, 30th Conference on Hangul and Korean Language Information Processing (2018)
    • /
    • pp.151-154
    • /
    • 2018
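  • In this paper, we developed a Korean dialogue system using a CNN Seq2Seq architecture. A conventional Seq2Seq model feeds data into an RNN or one of its variants and generates the output sequence from the hidden-layer embedding obtained after the input is complete. We trained a dialogue model that uses CNN Seq2Seq to generate an output utterance for an input utterance, and measured its performance. The CNN model was trained on about 120,000 utterance pairs and tested on 10,000 utterance pairs. In the evaluation, the proposed model outperformed the conventional RNN-based model.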

Automatic Conversion of English Pronunciation Using Sequence-to-Sequence Model (Sequence-to-Sequence Model을 이용한 영어 발음 기호 자동 변환)

  • Lee, Kong Joo;Choi, Yong Seok
    • KIPS Transactions on Software and Data Engineering
    • /
    • Vol. 6, No. 5
    • /
    • pp.267-278
    • /
    • 2017
  • Because the same letter can be pronounced differently depending on word context, one must consult a lexicon to pronounce a word correctly. Both the phonetic alphabets that lexicons adopt and the pronunciations they give for the same word can differ from lexicon to lexicon. In this paper, we use a sequence-to-sequence model, which is widely used in deep learning research, to convert automatically from one pronunciation to another. Twelve seq2seq models are implemented based on pronunciation training data collected from 4 different lexicons. The exact accuracy of the models ranges from 74.5% to 89.6%. This study has two aims. One is to understand the properties of the phonetic alphabets and pronunciations used in various lexicons. The other is to understand the characteristics of seq2seq models through error analysis.

Seq2Seq model-based Prognostics and Health Management of Robot Arm (Seq2Seq 모델 기반의 로봇팔 고장예지 기술)

  • Lee, Yeong-Hyeon;Kim, Kyung-Jun;Lee, Seung-Ik;Kim, Dong-Ju
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • Vol. 12, No. 3
    • /
    • pp.242-250
    • /
    • 2019
  • In this paper, we propose a method for predicting failures of an industrial robot using a Seq2Seq (Sequence-to-Sequence) model, an artificial neural network model for transforming time-series data. The proposed method uses joint current and angle values, which the robot can measure itself, so no additional sensor is needed for fault diagnosis. After the measured data were preprocessed for training, the Seq2Seq model was trained to convert current to angle. The abnormality degree used for fault diagnosis is the RMSE (Root Mean Squared Error) between the predicted and actual angles over a unit time. The method was evaluated on test data measured under different normal and defective conditions of the robot. When the abnormality degree exceeded the threshold, the sample was classified as a fault, and the fault-diagnosis accuracy in the experiment was 96.67%. The proposed method has the merit of performing fault prediction without additional sensors, and the experiment confirmed that it achieves high diagnostic performance and efficiency without requiring deep expert knowledge of the robot.
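
The thresholding step described in this abstract can be sketched as follows. The sketch assumes the predicted and measured joint angles are already available as equal-length lists and that "unit time" corresponds to a fixed, non-overlapping window of samples. The function names, window size, and threshold value are hypothetical, and the Seq2Seq predictor itself is not shown.

```python
import math

def window_rmse(predicted, actual, window):
    """RMSE between predicted and actual angles for each non-overlapping window."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    scores = []
    for start in range(0, len(predicted) - window + 1, window):
        squared_errors = [
            (p - a) ** 2
            for p, a in zip(predicted[start:start + window], actual[start:start + window])
        ]
        scores.append(math.sqrt(sum(squared_errors) / window))
    return scores

def classify_faults(predicted, actual, window, threshold):
    """Flag a window as faulty when its RMSE exceeds the threshold."""
    return [score > threshold for score in window_rmse(predicted, actual, window)]

# Toy data: the second window deviates strongly from the prediction.
predicted_angles = [0.0, 0.0, 0.0, 0.0]
actual_angles = [0.0, 0.0, 3.0, 4.0]
scores = window_rmse(predicted_angles, actual_angles, window=2)
flags = classify_faults(predicted_angles, actual_angles, window=2, threshold=1.0)
```

In practice the threshold would presumably be chosen from data recorded under normal operation; the abstract does not say how it was set.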

A Study for Sequence-to-sequence based Korean Abstract Meaning Representation (AMR) Parsing (Seq2seq 기반 한국어 추상 의미 표상(AMR) 파싱 연구)

  • Hao Huang;Hyejin Park;Hansaem Kim
    • Annual Conference on Human and Language Technology
    • /
    • KIISE Language Engineering Research Group, 34th Conference on Hangul and Korean Language Information Processing (2022)
    • /
    • pp.257-261
    • /
    • 2022
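  • In this study, we applied a seq2seq approach to automatic Korean AMR parsing. The seq2seq approach treats the AMR parsing task as translating a natural-language sentence into the string of a linearized graph. We applied a Transformer as the parsing model and ran experiments on the Korean AMR corpus released in 2020 and on our own Korean AMR data for The Little Prince. The seq2seq-based Korean AMR parser achieved a Smatch F1 score of 0.30.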

Sentence-Chain Based Seq2seq Model for Corpus Expansion

  • Chung, Euisok;Park, Jeon Gue
    • ETRI Journal
    • /
    • Vol. 39, No. 4
    • /
    • pp.455-466
    • /
    • 2017
  • This study focuses on a method for sequential data augmentation in order to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied to language generation; it can generate new sentences from given input sentences. We present a method of corpus expansion using a sentence-chain based seq2seq model. For training the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are used for the encoder of the seq2seq model, while the last sentence becomes the target sequence for the decoder. Using only internal resources, evaluation results show an improvement of approximately 7.6% in relative perplexity over a baseline language model of Korean text. Additionally, in a comparison with a previous study, the sentence-chain approach reduces the size of the training data by 38.4% while generating 1.4 times the number of n-grams, with superior performance on English text.
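
The triple construction described in this abstract can be sketched as below. The abstract does not say how sentence chains are selected, so the sketch assumes, purely for illustration, that every run of three consecutive sentences forms a chain. The function name is hypothetical.

```python
def sentence_chain_triples(sentences):
    """Build ((first, second), target) training examples from consecutive sentences.

    Assumption: a chain is any three consecutive sentences in the input list.
    The first two go to the encoder; the third is the decoder target.
    """
    examples = []
    for i in range(len(sentences) - 2):
        encoder_inputs = (sentences[i], sentences[i + 1])
        decoder_target = sentences[i + 2]
        examples.append((encoder_inputs, decoder_target))
    return examples

# Toy data: four sentences yield two overlapping chains.
examples = sentence_chain_triples(["s1", "s2", "s3", "s4"])
```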

COVID-19 Chat Bot by using Deep Learning (딥러닝을 이용한 코로나 챗봇)

  • Lee, Se-Hoon;Jeong, Ji-Seok;Kim, Young-Jin;Kwon, Hyeon-guen;Seo, Hee-Ju
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • Korean Society of Computer Information, Proceedings of the 62nd Summer Conference (2020), Vol. 28, No. 2
    • /
    • pp.315-316
    • /
    • 2020
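  • In this paper, we present a chatbot that uses Seq2seq technology to provide information people are likely to want in their daily lives about COVID-19, which is currently a major issue.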

Development of Contig Assembly Program for Nucleotide Sequencing (염기서열 해독작업을 위한 핵산 단편 조립 프로그램의 개발)

  • 이동훈
    • Korean Journal of Microbiology
    • /
    • Vol. 35, No. 2
    • /
    • pp.121-127
    • /
    • 1999
  • An effective computer program for assembling fragments in DNA sequencing has been developed. The program, called SeqEditor (Sequence Editor), runs on personal computers under MS-Windows, which is the most popular operating system in Korea. It can read several sequence file formats such as GenBank, FASTA, and ASCII. In the SeqEditor program, a dynamic programming algorithm is applied to compute the maximal-scoring overlapping alignment between each pair of fragments. A novel feature of the program is that SeqEditor implements interactive operation with a graphical user interface. Performance tests of the program on fragment data from 16S and 18S rDNA sequencing projects produced satisfactory results. This program may be useful to anyone working on large-scale DNA sequencing projects.
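
The kind of dynamic programming mentioned in this abstract can be illustrated with a textbook suffix-prefix overlap alignment. This is a generic sketch, not SeqEditor's actual algorithm or scoring scheme. The function name and the match, mismatch, and gap values are assumptions.

```python
def overlap_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best score for aligning a suffix of `a` with a prefix of `b`.

    Returns (score, overlap_length_in_b). Skipping a prefix of `a` is free
    (first column stays 0), and the alignment may end anywhere in `b`
    (maximum over the last row).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap  # b must be aligned from its first character
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    best_j = max(range(cols), key=lambda j: score[len(a)][j])
    return score[len(a)][best_j], best_j

# Toy fragments: the suffix "TGCA" of the first overlaps the prefix of the second.
result = overlap_alignment_score("ACGTTGCA", "TGCAGGT")
```

An assembler would compute this score for each ordered pair of fragments and merge the pairs with the strongest overlaps.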

Big Data Analytics in RNA-sequencing (RNA 시퀀싱 기법으로 생성된 빅데이터 분석)

  • Sung-Hun WOO;Byung Chul JUNG
    • Korean Journal of Clinical Laboratory Science
    • /
    • Vol. 55, No. 4
    • /
    • pp.235-243
    • /
    • 2023
  • As next-generation sequencing has been developed and used widely, RNA-sequencing (RNA-seq) has rapidly emerged as the first choice of tools to validate global transcriptome profiling. With the significant advances in RNA-seq, various types of RNA-seq have evolved in conjunction with the progress in bioinformatic tools. On the other hand, it is difficult to interpret the biological meaning underlying the complex data without a general understanding of the types of RNA-seq and bioinformatic approaches. In this regard, this paper discusses RNA-seq in two main sections. First, two major variants of RNA-seq are described and compared with the standard RNA-seq. This provides insights into which RNA-seq method is most appropriate for a given study. Second, the most widely used RNA-seq data analyses are discussed: (1) exploratory data analysis and (2) pathway enrichment analysis. This paper introduces the most widely used exploratory data analyses for RNA-seq, such as principal component analysis, heatmaps, and volcano plots, which can provide the overall trends in the dataset. The pathway enrichment analysis section introduces three generations of pathway enrichment analysis and how they generate enriched pathways with the RNA-seq dataset.

Identification of Alternative Splicing and Fusion Transcripts in Non-Small Cell Lung Cancer by RNA Sequencing

  • Hong, Yoonki;Kim, Woo Jin;Bang, Chi Young;Lee, Jae Cheol;Oh, Yeon-Mok
    • Tuberculosis and Respiratory Diseases
    • /
    • Vol. 79, No. 2
    • /
    • pp.85-90
    • /
    • 2016
  • Background: Lung cancer is the most common cause of cancer-related death. Alterations in gene sequence, structure, and expression have an important role in the pathogenesis of lung cancer. Fusion genes and alternative splicing of cancer-related genes have the potential to be oncogenic. In the current study, we performed RNA-sequencing (RNA-seq) to investigate potential fusion genes and alternative splicing in non-small cell lung cancer. Methods: RNA was isolated from lung tissues obtained from 86 subjects with lung cancer. The RNA samples from lung cancer and normal tissues were processed with RNA-seq using the HiSeq 2000 system. Fusion genes were evaluated using deFuse and ChimeraScan. Candidate fusion transcripts were validated by Sanger sequencing. Alternative splicing was analyzed using multivariate analysis of transcript sequencing and validated using quantitative real-time polymerase chain reaction. Results: RNA-seq data identified the oncogenic fusion genes EML4-ALK and SLC34A2-ROS1 in three of 86 normal-cancer paired samples. Nine distinct fusion transcripts were selected using deFuse and ChimeraScan, of which four were validated by Sanger sequencing. In 33 squamous cell carcinomas, 29 tumor-specific skipped exon events and six mutually exclusive exon events were identified. ITGB4 and PYCR1 were the top genes that showed significant tumor-specific splice variants. Conclusion: RNA-seq data identified novel potential fusion transcripts and splice variants. Further evaluation of their functional significance in the pathogenesis of lung cancer is required.