• Title/Summary/Keyword: sequence.

Korean phrase structure parsing using sequence-to-sequence learning (Sequence-to-sequence 모델을 이용한 한국어 구구조 구문 분석)

  • Hwang, Hyunsun;Lee, Changki
    • Annual Conference on Human and Language Technology (한국어정보학회:학술대회논문집) / 2016.10a / pp.20-24 / 2016
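  • The sequence-to-sequence model converts an input sequence into an output sequence of a different length and is an end-to-end model built from a single neural network. In this paper, we apply the sequence-to-sequence model to Korean phrase structure parsing. To do so, we convert each phrase structure tree into an output sequence made of brackets, syntactic tags, and eojeols (space-delimited Korean word units), and we replace every eojeol with the single symbol 'XX' to reduce the size of the output vocabulary. We also apply the attention mechanism and input feeding, which were recently developed to improve machine translation performance. In experiments on the phrase structure parsing data of the Sejong corpus, the model achieved an F1 score of 89.03%, higher than previous work.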
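
The tree-to-sequence conversion described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the nested-tuple tree encoding, the '(TAG' / ')TAG' token format, and the Sejong-style tags in the example are hypothetical choices, and the paper's exact output format may differ. The encoder-decoder with attention and input feeding is not shown.

```python
from typing import List, Tuple, Union

# Hypothetical tree encoding: (tag, children); a leaf child is an eojeol string.
Tree = Tuple[str, List[Union["Tree", str]]]


def linearize(tree: Tree) -> List[str]:
    """Flatten a phrase structure tree into '(TAG', 'XX', ')TAG' tokens."""
    tag, children = tree
    tokens = ["(" + tag]
    for child in children:
        if isinstance(child, str):
            tokens.append("XX")  # every eojeol becomes the same output symbol
        else:
            tokens.extend(linearize(child))
    tokens.append(")" + tag)
    return tokens


def restore_eojeols(tokens: List[str], eojeols: List[str]) -> List[str]:
    """Put the input eojeols back into the 'XX' slots, left to right."""
    if tokens.count("XX") != len(eojeols):
        raise ValueError("number of XX slots and eojeols differ")
    it = iter(eojeols)
    return [next(it) if tok == "XX" else tok for tok in tokens]


tree = ("S", [("NP_SBJ", ["나는"]),
              ("VP", [("NP_AJT", ["학교에"]), ("VP", ["간다"])])])
print(" ".join(linearize(tree)))
# (S (NP_SBJ XX )NP_SBJ (VP (NP_AJT XX )NP_AJT (VP XX )VP )VP )S
```

Because only 'XX' stands in for words, the decoder vocabulary shrinks to brackets, tags, and one placeholder; the actual eojeols are copied back from the input after decoding.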

Korean morphological analysis and phrase structure parsing using multi-task sequence-to-sequence learning (Multi-task sequence-to-sequence learning을 이용한 한국어 형태소 분석과 구구조 구문 분석)

  • Hwang, Hyunsun;Lee, Changki
    • Annual Conference on Human and Language Technology (한국어정보학회:학술대회논문집) / 2017.10a / pp.103-107 / 2017
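  • Korean morphological analysis and phrase structure parsing are difficult tasks in Korean natural language processing, and recent work has recast them as output-sequence generation problems solved end to end with sequence-to-sequence models. When both tasks are cast as output-sequence generation, their outputs can be merged into a single sequence. In this paper, we propose a model that performs Korean morphological analysis and phrase structure parsing simultaneously with a sequence-to-sequence model. Experiments show that, when the two tasks are processed jointly, morphological analysis affects phrase structure parsing and phrase structure parsing in turn affects morphological analysis, confirming that the two tasks influence each other.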
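
One way to merge the two outputs into a single target sequence, as this abstract describes, is to fill each 'XX' slot of a linearized parse with that eojeol's morphological analysis. The sketch below assumes a hypothetical token format ('(TAG' / ')TAG' brackets and 'morpheme/TAG+morpheme/TAG' analyses); it is not the paper's published format. A real system would also need an escape for analyses that begin with a bracket character, such as a parenthesis symbol.

```python
from typing import List, Tuple


def is_bracket(tok: str) -> bool:
    """Bracket tokens look like '(TAG' or ')TAG' in this hypothetical format."""
    return tok.startswith("(") or tok.startswith(")")


def merge(parse_tokens: List[str], morph_analyses: List[str]) -> List[str]:
    """Replace each 'XX' slot of a linearized parse with one eojeol's analysis."""
    if parse_tokens.count("XX") != len(morph_analyses):
        raise ValueError("one analysis is needed per XX slot")
    it = iter(morph_analyses)
    return [next(it) if tok == "XX" else tok for tok in parse_tokens]


def split(joint: List[str]) -> Tuple[List[str], List[str]]:
    """Recover the parse (with XX slots) and the analyses from one joint sequence."""
    parse, morphs = [], []
    for tok in joint:
        if is_bracket(tok):
            parse.append(tok)
        else:
            parse.append("XX")
            morphs.append(tok)
    return parse, morphs


parse = ["(S", "(NP_SBJ", "XX", ")NP_SBJ", "(VP", "XX", ")VP", ")S"]
print(merge(parse, ["나/NP+는/JX", "가/VV+ㄴ다/EF"]))
```

Training a single decoder on such joint sequences is what lets errors or evidence in one task influence the other.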

Periodic Binary Sequence Time Offset Calculation Based on Number Theoretic Approach for CDMA System (CDMA 시스템을 위한 정수론 접근 방법에 의한 주기이진부호의 시간 오프셋 계산)

  • 한영열
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.5 / pp.952-958 / 1994
  • In this paper, a method is presented that calculates the time offset between a binary sequence and its shifted version based on a number-theoretic approach. Defining the reference (zero-offset) sequence is important in synchronous code division multiple access (CDMA) systems because all base stations use the same spreading sequence. The time offset of each sequence with respect to the zero-offset sequence is used to distinguish signals received at a mobile station from different base stations. This paper also discusses a method for defining the reference sequence.
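
A minimal sketch of one number-theoretic view of the time offset, assuming the sequence comes from a Galois-configured LFSR with a primitive polynomial: the register state of a sequence shifted by k equals the reference state multiplied by x^k in GF(2^n), so k is a discrete logarithm. Treating state 1 as the zero-offset reference, the brute-force search, and the example polynomial x^4 + x + 1 are all assumptions for illustration; the paper's own calculation may differ, and long periods would call for a faster discrete-logarithm algorithm.

```python
def mul_by_x(state: int, poly: int, n: int) -> int:
    """Multiply a GF(2^n) element (bit i = coefficient of x^i) by x, reduced mod poly."""
    state <<= 1
    if (state >> n) & 1:
        state ^= poly  # poly includes the x^n term, so this also clears bit n
    return state


def sequence(state: int, poly: int, n: int, length: int) -> list:
    """Binary output: the x^(n-1) coefficient of state * x^i for i = 0, 1, ..."""
    out = []
    for _ in range(length):
        out.append((state >> (n - 1)) & 1)
        state = mul_by_x(state, poly, n)
    return out


def time_offset(ref_state: int, shifted_state: int, poly: int, n: int) -> int:
    """Smallest k with shifted_state == ref_state * x^k (a discrete log in GF(2^n))."""
    period = (1 << n) - 1
    s = ref_state
    for k in range(period):
        if s == shifted_state:
            return k
        s = mul_by_x(s, poly, n)
    raise ValueError("states do not lie on the same m-sequence cycle")


POLY = 0b10011  # x^4 + x + 1 (primitive), period 15
print(time_offset(1, 0b0110, POLY, 4))  # 5
```

The sequence started from state x^5 = x^2 + x (0b0110) is exactly the reference sequence advanced by five chips, which is what the offset reports.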

Protein Sequence Search based on N-gram Indexing

  • Hwang, Mi-Nyeong;Kim, Jin-Suk
    • Bioinformatics and Biosystems / v.1 no.1 / pp.46-50 / 2006
  • With advances in experimental techniques in molecular biology, genomic and protein sequence databases are growing exponentially in size, and mean sequence lengths are also increasing. As these databases grow, it becomes harder to search them for sequences with significant homology to a query sequence. In this paper, we present an N-gram indexing method that retrieves similar sequences quickly and accurately. The method treats a protein sequence as a text written in a language of 20 amino acid codes and adopts fixed-length N-gram tokens as its indexing scheme for sequence strings. Once these tokens are indexed for all sequences in the database, sequences can be searched with information retrieval algorithms. Using this method, we developed a protein sequence search system named ProSeS (PROtein Sequence Search). ProSeS is a protein sequence analysis system that provides comprehensive results, such as similar sequences with significant homology, predicted subcellular locations of the query sequence, and major keywords extracted from the annotations of similar sequences. We show experimentally that the N-gram indexing approach reduces retrieval time significantly and is as accurate as the popular search tool BLAST.
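
The indexing idea above can be sketched with an inverted index from fixed-length N-grams to sequence IDs. This is an assumption-laden illustration, not ProSeS: the default N = 3 and the scoring rule (number of distinct N-grams shared with the query) are hypothetical, and a real system would use a proper information retrieval ranking function.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def ngrams(seq: str, n: int) -> List[str]:
    """All overlapping length-n substrings of a protein sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def build_index(db: Dict[str, str], n: int = 3) -> Dict[str, Set[str]]:
    """Inverted index: N-gram -> set of sequence IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for sid, seq in db.items():
        for gram in set(ngrams(seq, n)):
            index[gram].add(sid)
    return index


def search(index: Dict[str, Set[str]], query: str, n: int = 3,
           top_k: int = 5) -> List[Tuple[str, int]]:
    """Rank sequences by how many distinct query N-grams they share."""
    scores: Counter = Counter()
    for gram in set(ngrams(query, n)):
        for sid in index.get(gram, ()):
            scores[sid] += 1
    # Highest score first; ties broken by sequence ID for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


db = {"s1": "MKTAYIAKQR", "s2": "GGGGGGGG", "s3": "MKTAYIAKQL"}
print(search(build_index(db), "MKTAYIAK"))  # [('s1', 6), ('s3', 6)]
```

Only sequences that share at least one N-gram with the query are ever touched, which is where the speed-up over exhaustive alignment comes from.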

Robust Digital Watermarking Using Chaotic Sequence (카오스 시퀀스를 이용한 견고한 디지털 워터마킹)

  • 김현환;정기룡
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.4 / pp.630-637 / 2003
  • This paper proposes a new watermarking algorithm that uses a chaotic sequence instead of the conventional M-sequence to protect the author's copyright. Robustness and security are very important in the watermarking process. To improve robustness, after the wavelet transform we apply multiple threshold values, chosen according to the human visual system, differently to the coefficients of each subband image. We then embed the watermark image into the original image using multiple watermark weights produced by a random sequence generator. The watermark image is detected from difference data computed from each wavelet subband image. We also evaluate performance by simulation under various possible attacks. The chaotic sequence is preferable to the M-sequence because it is very easy to generate and changes readily with its initial value, which gives it better security than the conventional M-sequence.
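
The role of the chaotic sequence can be sketched as follows. The paper does not specify its chaotic map here, so the logistic map (mu = 3.99, threshold 0.5), the one-bit-per-coefficient embedding, and the weighting rule are all assumptions; the plain coefficient list stands in for wavelet subband coefficients, and the human-visual-system thresholds are omitted. The initial value x0 acts as the secret key.

```python
from typing import List


def logistic_sequence(x0: float, length: int, mu: float = 3.99,
                      burn_in: int = 100) -> List[int]:
    """Bipolar (+1/-1) sequence from the logistic map x <- mu * x * (1 - x)."""
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    x = x0
    for _ in range(burn_in):  # discard the initial transient
        x = mu * x * (1.0 - x)
    out = []
    for _ in range(length):
        x = mu * x * (1.0 - x)
        out.append(1 if x >= 0.5 else -1)
    return out


def embed(coeffs: List[float], bits: List[int], key: float,
          strength: float = 0.1) -> List[float]:
    """Add one watermark bit per coefficient, modulated by the chaotic sequence."""
    chaos = logistic_sequence(key, len(coeffs))
    marked = []
    for c, b, p in zip(coeffs, bits, chaos):
        weight = strength * max(abs(c), 1.0)  # larger coefficients take larger changes
        marked.append(c + weight * (1 if b else -1) * p)
    return marked


def detect(marked: List[float], original: List[float], key: float) -> List[int]:
    """Non-blind detection from the difference between marked and original data."""
    chaos = logistic_sequence(key, len(original))
    return [1 if (m - c) * p > 0 else 0 for m, c, p in zip(marked, original, chaos)]


coeffs = [10.0, -5.0, 3.2, 0.0, 7.7, -1.1]
bits = [1, 0, 1, 1, 0, 0]
print(detect(embed(coeffs, bits, key=0.37), coeffs, key=0.37))
```

Because a tiny change in x0 produces a completely different sequence after the burn-in, an attacker without the exact key cannot regenerate the modulation pattern, which is the security argument the abstract makes.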

New Construction Method for Quaternary Aperiodic, Periodic, and Z-Complementary Sequence Sets

  • Zeng, Fanxin;Zeng, Xiaoping;Zhang, Zhenyu;Zeng, Xiangyong;Xuan, Guixin;Xiao, Lingna
    • Journal of Communications and Networks / v.14 no.3 / pp.230-236 / 2012
  • Based on known binary sequence sets and Gray mapping, a new method for constructing quaternary sequence sets is presented, and the properties of the resulting sequence sets are investigated. As three direct applications of the proposed method, choosing binary aperiodic, periodic, and Z-complementary sequence sets as the known binary sequence sets yields quaternary aperiodic, periodic, and Z-complementary sequence sets, respectively. Compared with the method proposed by Jang et al., which is suited only to the periodic case with even sub-sequence lengths, the new method can handle the aperiodic as well as the periodic case, and odd as well as even sub-sequence lengths. Consequently, using our method together with Jang et al.'s, an arbitrary binary aperiodic, periodic, or Z-complementary sequence set can be transformed into a quaternary one regardless of whether its sub-sequence length is odd or even. Finally, a table of existing quaternary periodic complementary sequence sets is also given.
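
The Gray-mapping idea can be illustrated with one simple construction that is not necessarily the paper's. It rests on the identity i^{G(a,b)} = ((1+i)(-1)^a + (1-i)(-1)^b)/2 for the Gray map G: (0,0)→0, (0,1)→1, (1,1)→2, (1,0)→3. Expanding the autocorrelation shows that if (a, b) is a binary complementary pair, then {G(a, b), G(a, b̄)} is a quaternary complementary pair, in both the aperiodic and periodic sense and for any length. The length-4 Golay pair below is only an example input.

```python
from typing import List, Sequence

GRAY = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}  # bit pair -> Z4 symbol
UNIT = [1, 1j, -1, -1j]  # i**c for c in Z4, listed explicitly to stay exact


def gray_combine(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Map two binary (0/1) sequences to one quaternary (Z4) sequence."""
    return [GRAY[(x, y)] for x, y in zip(a, b)]


def aperiodic_acf(q: Sequence[int], tau: int) -> complex:
    """Aperiodic autocorrelation of the complex sequence i**q at shift tau."""
    x = [UNIT[c] for c in q]
    return sum(x[k + tau] * x[k].conjugate() for k in range(len(x) - tau))


def periodic_acf(q: Sequence[int], tau: int) -> complex:
    """Periodic (cyclic) autocorrelation of i**q at shift tau."""
    x = [UNIT[c] for c in q]
    n = len(x)
    return sum(x[(k + tau) % n] * x[k].conjugate() for k in range(n))


def is_complementary(seqs: List[List[int]], periodic: bool = False) -> bool:
    """True if the autocorrelations of the set sum to zero at every nonzero shift."""
    acf = periodic_acf if periodic else aperiodic_acf
    n = len(seqs[0])
    return all(abs(sum(acf(q, t) for q in seqs)) < 1e-9 for t in range(1, n))


a, b = [0, 0, 0, 1], [0, 0, 1, 0]  # a binary Golay pair in 0/1 form
q1 = gray_combine(a, b)
q2 = gray_combine(a, [1 - y for y in b])
print(q1, q2, is_complementary([q1, q2]))  # [0, 0, 1, 3] [1, 1, 0, 2] True
```

Binary inputs can be checked with the same function by writing bit y as the Z4 symbol 2y, since i^2 = -1.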

On Fast M-Gold Hadamard Sequence Transform (고속 M-Gold-Hadamard 시퀀스 트랜스폼)

  • Lee, Mi-Sung;Lee, Moon-Ho;Park, Ju-Yong
    • Journal of the Institute of Electronics Engineers of Korea TC / v.47 no.7 / pp.93-101 / 2010
  • In this paper, we generate Gold sequences from M-sequences produced by two primitive polynomials over GF(2). M-sequences are generally generated by a linear feedback shift register (LFSR) code generator. We show that a suitably permuted matrix has the Hadamard matrix property. Specifically, the matrix built from the Gold sequence of two M-sequences, with one added column, has orthogonality, one of the major properties of a Hadamard matrix, and the product of the matrix and its transpose yields an identity matrix (up to scaling). Likewise, the matrix of an M-sequence generated by an LFSR acquires the Hadamard property described above when one column and one row are added. Fast transformation is possible through the L-matrix and the S-matrix.
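
The M-sequence part of this abstract can be checked numerically with a small sketch: rows made of the cyclic shifts of a ±1 M-sequence, bordered by a row and a column of ones, form a Hadamard matrix H with H·Hᵀ = 2^n·I, because a ±1 M-sequence has periodic autocorrelation -1 at every nonzero shift and sums to -1. The tap convention, seeds, and polynomials (x^3 + x + 1, x^5 + x^2 + 1, x^5 + x^4 + x^3 + x^2 + 1) are assumptions, and the two degree-5 polynomials are assumed to form a preferred pair for the Gold example. The paper's permutation and its L/S-matrix fast transform are not reproduced.

```python
from typing import List


def m_sequence(taps: List[int], n: int, seed: List[int]) -> List[int]:
    """Fibonacci LFSR s[k+n] = XOR of s[k+j] for j in taps; one period of 2**n - 1 bits."""
    s = list(seed)
    period = (1 << n) - 1
    while len(s) < period:
        k = len(s) - n
        bit = 0
        for j in taps:
            bit ^= s[k + j]
        s.append(bit)
    return s[:period]


def gold_sequence(m1: List[int], m2: List[int], shift: int) -> List[int]:
    """XOR of m1 with a cyclic shift of m2 (m1, m2 assumed to be a preferred pair)."""
    n = len(m2)
    return [x ^ m2[(i + shift) % n] for i, x in enumerate(m1)]


def bordered_matrix(seq: List[int]) -> List[List[int]]:
    """Rows are cyclic shifts of the +/-1 sequence, plus a leading row and column of ones."""
    v = [1 - 2 * b for b in seq]  # bit 0 -> +1, bit 1 -> -1
    n = len(v)
    rows = [[1] * (n + 1)]
    for r in range(n):
        rows.append([1] + [v[(i + r) % n] for i in range(n)])
    return rows


def is_hadamard(h: List[List[int]]) -> bool:
    """Check H * H^T == size * I using exact integer dot products."""
    size = len(h)
    return all(
        sum(x * y for x, y in zip(h[i], h[j])) == (size if i == j else 0)
        for i in range(size) for j in range(size)
    )


m3 = m_sequence([0, 1], 3, [1, 0, 0])  # x^3 + x + 1
print(m3, is_hadamard(bordered_matrix(m3)))  # [1, 0, 0, 1, 0, 1, 1] True
```

A Gold sequence has three-valued rather than two-valued correlation, so whether its matrix gains the same property depends on the construction in the paper; the sketch only tests the bordered M-sequence case.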

Cloning of the Adenosine Deaminase Gene from Pseudomonas iodinum IFO 3558

  • Jo, Young-Bae;Baik, Hyung-Suk;Bae, Kyung-Mi;Jun, Hong-Ki
    • Journal of Life Science / v.9 no.2 / pp.9-14 / 1999
  • The adenosine deaminase (ADA) gene of Pseudomonas iodinum IFO 3558 was cloned by the polymerase chain reaction, and the amino acid sequence of the enzyme was deduced. The DNA sequence of the Pseudomonas iodinum IFO 3558 ADA gene was compared with those of the E. coli, human, and mouse ADA genes. Unambiguous sequence from both strands of pM21 was obtained for the region believed to encode ADA. The sequence includes an 804-nucleotide open reading frame, bounded at one end by the sense primer and at the other by two antisense primers. This open reading frame encodes a protein of 268 amino acids with a molecular weight of 29,448. The deduced amino acid sequence shows considerable similarity to those of E. coli, mouse, and human ADA. The Pseudomonas iodinum IFO 3558 nucleotide sequence shows 98.5% homology with the E. coli ADA sequence, 51.7% homology with the mouse ADA sequence, and 52.5% homology with the human ADA sequence. The ADA protein sequence of Pseudomonas iodinum IFO 3558 shows 96.9% homology with that of E. coli, 40.7% with that of mouse, and 41.8% with that of human. The distance between two of the conserved elements, TVHAGE and SL(1)NTDDP, has been conserved at exactly 76 amino acids in all four ADAs. Two of the four conserved sequence elements shared among the four ADAs are also present in yeast, rat, human (M), and human (L) AMP deaminases. The SLSTDDP sequence differs only in the conservative substitution of a serine for an asparagine. A conserved cysteine with conserved spacing between these two regions is also found. Thus, sequence analysis of the four ADAs and four AMP deaminases revealed a highly conserved sequence motif, SLN(S)TDDP, a conserved dipeptide, HA, and a conserved cysteine residue.
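
Two of the comparisons in this abstract, percent identity and conserved motif spacing, can be sketched as simple string operations. This is an illustration only: it assumes the sequences are already aligned, reads SLN(S)TDDP as "N or S at the third position", and measures spacing as the number of residues strictly between TVHAGE and the following motif, a definition the abstract does not state. The test protein is synthetic, and the paper's homology figures came from its own alignment procedure.

```python
import re
from typing import Optional

MOTIF_A = "TVHAGE"
MOTIF_B = re.compile(r"SL[NS]TDDP")  # matches SLNTDDP (ADA) or SLSTDDP (AMP deaminase)


def percent_identity(x: str, y: str) -> float:
    """Percent of aligned columns (gap '-' columns excluded) with identical residues."""
    if len(x) != len(y):
        raise ValueError("sequences must be pre-aligned to equal length")
    cols = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(a == b for a, b in cols) / len(cols)


def motif_spacing(protein: str) -> Optional[int]:
    """Residues between the end of TVHAGE and the start of the next SLN(S)TDDP."""
    start = protein.find(MOTIF_A)
    if start < 0:
        return None
    end_a = start + len(MOTIF_A)
    m = MOTIF_B.search(protein, end_a)
    return m.start() - end_a if m else None


synthetic = "MA" + "TVHAGE" + "G" * 76 + "SLNTDDP" + "K"
print(motif_spacing(synthetic))  # 76
```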

A Reranking Model for Korean Morphological Analysis Based on Sequence-to-Sequence Model (Sequence-to-Sequence 모델 기반으로 한 한국어 형태소 분석의 재순위화 모델)

  • Choi, Yong-Seok;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.7 no.4 / pp.121-128 / 2018
  • Korean morphological analyzers have adopted the sequence-to-sequence (seq2seq) model, which can generate an output sequence whose length differs from that of the input. In general, a seq2seq-based Korean morphological analyzer takes a syllable-unit sequence as input and outputs a syllable-unit sequence. Syllable-based morphological analysis has the advantage of handling unknown words easily, but the disadvantage of ignoring morpheme-level information. In this paper, we propose a reranking model that serves as a post-processor for the seq2seq model and improves the accuracy of morphological analysis. The seq2seq-based morphological analyzer can generate K results using beam search. The reranking model exploits morpheme-unit embedding information as well as morpheme n-grams to reorder the K results. Experimental results show that the reranking model improves the F1 score by 1.17% over the original seq2seq model.
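
The reranking step can be sketched as rescoring a K-best list with a morpheme-level feature. This is a simplified stand-in: the paper also uses morpheme embeddings, while this sketch uses only an add-one smoothed morpheme bigram model interpolated with the seq2seq log-probability (weight 0.5). The tiny corpus, the candidate list, and its scores are hypothetical, not real beam-search output.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Candidate = Tuple[List[str], float]  # (morpheme sequence, seq2seq log-probability)


def train_bigrams(corpus: Sequence[List[str]]):
    """Count morpheme bigrams (with <s>/</s> boundaries) and their context counts."""
    bigrams, contexts, vocab = Counter(), Counter(), {"</s>"}
    for sent in corpus:
        toks = ["<s>"] + list(sent) + ["</s>"]
        contexts.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
        vocab.update(sent)
    return bigrams, contexts, len(vocab)


def bigram_logprob(morphs: List[str], bigrams: Counter, contexts: Counter,
                   vocab_size: int) -> float:
    """Add-one smoothed bigram log-probability of a morpheme sequence."""
    toks = ["<s>"] + list(morphs) + ["</s>"]
    return sum(
        math.log((bigrams[(p, c)] + 1) / (contexts[p] + vocab_size))
        for p, c in zip(toks[:-1], toks[1:])
    )


def rerank(candidates: List[Candidate], bigrams: Counter, contexts: Counter,
           vocab_size: int, weight: float = 0.5) -> List[Candidate]:
    """Reorder K-best candidates by an interpolation of both scores."""
    def score(cand: Candidate) -> float:
        morphs, s2s_lp = cand
        lm = bigram_logprob(morphs, bigrams, contexts, vocab_size)
        return (1 - weight) * s2s_lp + weight * lm
    return sorted(candidates, key=score, reverse=True)


corpus = [["나/NP", "는/JX"]] * 3
bg, ctx, V = train_bigrams(corpus)
cands = [(["날/VV", "는/ETM"], -1.0), (["나/NP", "는/JX"], -1.2)]
print(rerank(cands, bg, ctx, V)[0][0])
```

Here the seq2seq model alone would pick the first candidate, but the morpheme bigram evidence moves the analysis seen in the corpus to the top, which is the kind of morpheme-level correction the abstract targets.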