• Title/Abstract/Keywords: Data Sequence

Search results: 3,093 items; processing time: 0.041 s

A Reranking Model for Korean Morphological Analysis Based on Sequence-to-Sequence Model

  • 최용석;이공주
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 7, No. 4 / pp.121-128 / 2018
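  • The sequence-to-sequence (Seq2seq) model can be applied even when the input and output sequences differ in length, and it is widely used for Korean morphological analysis. In general, Seq2seq-based Korean morphological analysis processes the source text syllable by syllable and outputs morphemes and part-of-speech tags at the syllable level. Syllable-level analysis handles out-of-vocabulary words easily, but it cannot reflect morpheme-level dictionary information. This study proposes adding a reranking model as a post-processing step to the Seq2seq model in order to improve the final performance of morphological analysis. Beam search is applied to the Seq2seq model to generate K candidate analyses, and a reranking model then reorders these candidates. The reranking model uses morpheme-level embedding information and n-gram context information that syllable-level processing could not capture. The proposed reranking model improves the F1 score by about 1.17% over the baseline Seq2seq model.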
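
The reranking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the paper uses morpheme embeddings and n-gram context, whereas this toy version assumes a count-based, add-one smoothed bigram score interpolated with the Seq2seq log-probability. The corpus, candidate scores, tag labels, and the `weight` parameter are all hypothetical.

```python
import math
from collections import Counter

def train_bigrams(corpus):
    """Count morpheme unigrams and bigrams (with a start symbol) in a toy corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + list(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_logprob(morphemes, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a morpheme sequence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + list(morphemes)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )

def rerank(candidates, unigrams, bigrams, weight=0.5):
    """Sort K beam candidates by an interpolation of Seq2seq and bigram scores."""
    def combined(candidate):
        morphemes, seq2seq_logprob = candidate
        lm = bigram_logprob(morphemes, unigrams, bigrams)
        return (1 - weight) * seq2seq_logprob + weight * lm
    return sorted(candidates, key=combined, reverse=True)

# Hypothetical morpheme-level corpus and two beam candidates with made-up scores.
corpus = [("나/NP", "는/JX", "학교/NNG"), ("나/NP", "는/JX", "집/NNG")]
unigrams, bigrams = train_bigrams(corpus)
candidates = [
    (("날/VV", "는/ETM"), -1.0),  # preferred by the syllable-level model alone
    (("나/NP", "는/JX"), -1.2),   # supported by morpheme-level bigram counts
]
best = rerank(candidates, unigrams, bigrams)[0]
print(best[0])
```

With `weight=0` the ordering falls back to the Seq2seq scores alone, which makes the contribution of the morpheme-level signal easy to isolate.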

Whole-genome sequence analysis through online web interfaces: a review

  • Gunasekara, A.W.A.C.W.R.;Rajapaksha, L.G.T.G.;Tung, T.L.
    • Genomics & Informatics / Vol. 20, No. 1 / pp.3.1-3.10 / 2022
  • The recent development of whole-genome sequencing technologies has paved the way for understanding the genomes of microorganisms. Every whole-genome sequencing (WGS) project requires considerable cost and massive effort to address the questions at hand. The final step of WGS is data analysis. The analysis of whole-genome sequences depends on highly sophisticated bioinformatics tools that research personnel have to buy. However, many laboratories and research institutions do not have the bioinformatics capabilities to analyze genomic data and are therefore unable to take maximum advantage of whole-genome sequencing. In this respect, this study provides a guide for research personnel to a set of bioinformatics tools available online that can be used to analyze whole-genome sequence data of bacterial genomes. The web interfaces described here have many advantages and, in most cases, remove the need for costly analysis tools and intensive computing resources.

An Investigation on Expanding Traditional Sequential Analysis Method by Considering the Reversion of Purchase Realization Order

  • 김민석;김남규
    • 한국정보시스템학회지:정보시스템연구 / Vol. 22, No. 3 / pp.25-42 / 2013
  • Recently, various kinds of information technology services have been created and the volume of data has increased rapidly. The data patterns we deal with are also gradually becoming more diverse. As a result, demand for discovering meaningful knowledge through mining analyses such as linkage analysis, sequence analysis, classification, and prediction has been steadily increasing. However, data mining analysis does not always solve business problems successfully; one major cause of this limitation is that the analyzed data may not accurately reflect real-world phenomena. For example, traditional sequence analysis records a clear precedence relationship between two products even when the time gap between their purchases is very short. In the real world, however, the precedence relationship between two purchases made within a very short interval may not be well defined. Worse, the order in which the purchase intentions arose and the order in which the purchases were realized may be reversed. This study therefore proposes an expanded sequence analysis methodology that reflects this situation. The proposed methodology treats the order of purchases made within a very short time interval as potentially unimportant and extends the analysis by including both the original sequence and the reversed sequence. Because what counts as a very short time interval must itself be defined, an experiment on actual data was carried out to examine how the results vary with the interval.
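
The idea of counting both the original and the reversed order for near-simultaneous purchases can be sketched as follows. This is only one possible reading of the abstract, not the authors' procedure: the function name, the `min_gap` threshold, and the sample purchase log are hypothetical.

```python
from collections import Counter

def count_ordered_pairs(purchases, min_gap):
    """Count (earlier, later) product pairs in one customer's purchase log.

    purchases: list of (timestamp, product) tuples.
    min_gap:   purchases separated by at most this gap are treated as
               order-ambiguous, so the reversed pair is counted as well.
    """
    counts = Counter()
    ordered = sorted(purchases)
    for i, (t1, a) in enumerate(ordered):
        for t2, b in ordered[i + 1:]:
            if a == b:
                continue
            counts[(a, b)] += 1          # original sequence
            if t2 - t1 <= min_gap:
                counts[(b, a)] += 1      # reversed sequence for short gaps
    return counts

# Hypothetical log: A and B are bought one time unit apart, C much later.
log = [(0, "A"), (1, "B"), (100, "C")]
pairs = count_ordered_pairs(log, min_gap=5)
print(pairs[("A", "B")], pairs[("B", "A")], pairs[("C", "A")])
```

Varying `min_gap` corresponds to the experiment mentioned in the abstract, where the results are examined as the definition of a "very short" interval changes.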

DSSS MODEM Design and Implementation for a Medium Speed Wireless Link

  • 원희석;김영식
    • 대한전자공학회논문지TC / Vol. 43, No. 1 / pp.121-126 / 2006
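  • This paper presents the design and fabrication of a DSSS CDU modem for 9.6 kbps wireless communication. The modem provides a general-purpose interface so that it can exchange signals with a microprocessor. The interface consists of an 8-bit data bus and chip enable, R/W, and interrupt pins. For transmission, the modem first receives 8-bit parallel data from outside and converts it to serial data, then generates an 8-bit PN code internally and spreads the data to 76.8 kcps using the direct sequence method before transmitting it. An 8-bit training sequence is attached to the head of each data frame to synchronize the transmitter and receiver. The receiver first acquires PN code synchronization from the received 76.8 kcps spread data and then obtains data synchronization using the training sequence. An early-late method is used for this purpose. The modem was implemented and verified on a Xilinx FPGA board and then fabricated as an ASIC chip in a Hynix $0.25{\mu}m$ CMOS process; it uses a DSSS-based multi-user scheme.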
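
The rate figures in this abstract are consistent with 8 chips per bit (9.6 kbps × 8 = 76.8 kcps). The sketch below illustrates direct-sequence spreading and correlation-based despreading under that assumption. The PN code shown is a hypothetical placeholder, since the abstract does not give the actual code, and synchronization (PN acquisition, training sequence, early-late tracking) is not modeled.

```python
BIT_RATE_BPS = 9600
CHIPS_PER_BIT = 8
CHIP_RATE_CPS = BIT_RATE_BPS * CHIPS_PER_BIT  # 76,800 chips per second

# Hypothetical 8-chip PN code in +/-1 form; not taken from the paper.
PN_CODE = [1, -1, 1, 1, -1, -1, 1, -1]

def spread(bits, code=PN_CODE):
    """Map each bit to +/-1 and multiply it by every chip of the PN code."""
    chips = []
    for bit in bits:
        symbol = 1 if bit else -1
        chips.extend(symbol * c for c in code)
    return chips

def despread(chips, code=PN_CODE):
    """Correlate each chip-aligned block with the PN code and decide by sign."""
    n = len(code)
    bits = []
    for start in range(0, len(chips) - n + 1, n):
        corr = sum(x * c for x, c in zip(chips[start:start + n], code))
        bits.append(1 if corr > 0 else 0)
    return bits

data = [1, 0, 1, 1, 0, 0, 1, 0]
tx = spread(data)
rx = list(tx)
rx[3] = -rx[3]  # flip one chip to imitate a channel error
print(CHIP_RATE_CPS, despread(rx) == data)
```

A single flipped chip lowers the correlation magnitude from 8 to 6 but does not change its sign, which is the processing-gain effect that spreading is meant to provide.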

Studies on the Organization and Expression of tRNA Genes in Aspergillus nidulans (V): The Molecular Structure of $tRNA^{Arg}$ in Aspergillus nidulans

  • 이병재;강현삼
    • 미생물학회지 / Vol. 24, No. 2 / pp.79-85 / 1986
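  • The nucleotide sequence of $tRNA^{Arg}$ from A. nidulans was determined by enzymatic cleavage. The resulting sequence was: 5'GGCCGGCUGGCCCAAXUGGCAAGGCXUCUGAXUACGAAXCAGGAGAUUGCAXXXXXGAGCXXUXXGUCGGUCACCA3'. Folding this sequence into a cloverleaf structure identified it as a $tRNA^{Arg}$ with the anticodon ACG, and this result agreed with the aminoacylation (charging) test. The accuracy of the sequence was verified by comparison with the nucleotide sequence of the gene for this tRNA, and the complete sequence was inferred through minor base analysis.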

Exploratory Approach for Fibonacci Numbers and Benford's Law

  • 장대흥
    • 응용통계연구 / Vol. 22, No. 5 / pp.1103-1113 / 2009
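  • It is well known that the sequence of first digits of the Fibonacci sequence follows Benford's law. Extending the Fibonacci sequence, we choose two arbitrary natural numbers, generate the sequence that satisfies the recurrence $a_{n+2}=a_{n+1}+a_n$, check whether its first-digit sequence satisfies Benford's law, and examine the structure of that first-digit sequence from the standpoint of exploratory data analysis.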
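
The check described in this abstract is easy to reproduce numerically. The sketch below compares the leading-digit frequencies of a sequence defined by $a_{n+2}=a_{n+1}+a_n$ with the Benford probabilities $\log_{10}(1+1/d)$. The function names, the number of terms, and the starting values are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter

def first_digit_frequencies(a1, a2, n_terms):
    """Relative frequency of each leading digit 1-9 among the first n_terms
    of the sequence a_{n+2} = a_{n+1} + a_n started from a1, a2."""
    counts = Counter()
    a, b = a1, a2
    for _ in range(n_terms):
        counts[int(str(a)[0])] += 1
        a, b = b, a + b
    return {d: counts[d] / n_terms for d in range(1, 10)}

def benford(d):
    """Benford's law probability of leading digit d."""
    return math.log10(1 + 1 / d)

# Ordinary Fibonacci numbers (a1 = a2 = 1), first 1000 terms.
freqs = first_digit_frequencies(1, 1, 1000)
for d in range(1, 10):
    print(d, round(freqs[d], 3), round(benford(d), 3))
```

Calling `first_digit_frequencies` with other natural-number seeds gives the generalized sequences studied in the paper.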

Molecular Cloning and Sequencing of Cell Wall Hydrolase Gene of an Alkalophilic Bacillus subtilis BL-29

  • Kim, Tae-Ho;Hong, Soon-Duck
    • Journal of Microbiology and Biotechnology / Vol. 7, No. 4 / pp.223-228 / 1997
  • A DNA fragment containing the gene for cell wall hydrolase of alkalophilic Bacillus subtilis BL-29 was cloned into E. coli JM109 using pUC18 as a vector. Southern hybridization analysis showed that a recombinant plasmid, designated pCWL45B, contained the fragment originating from the alkalophilic B. subtilis BL-29 chromosomal DNA. The nucleotide sequence of a 1.6-kb HindIII fragment containing a cell wall hydrolase-encoding gene was determined. The nucleotide sequence revealed an open reading frame (ORF) of 900 bp with a consensus ribosome-binding site located 6 nucleotides upstream from the ATG start codon. The primary amino acid sequence deduced from the nucleotide sequence revealed a putative protein of 299 amino acid residues with an M.W. of 33,206. A comparison of the amino acid sequence of the ORF with amino acid sequences in GenBank showed significant homology to the sequence of cell wall amidase of the PBSX bacteriophage of B. subtilis.
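
The figures quoted in this abstract can be cross-checked with simple arithmetic: a 900 bp ORF holds 300 codons, and 299 residues follow if the ORF length is assumed to include the stop codon. The helper below is a hypothetical illustration of that check, not part of the paper.

```python
def residues_from_orf(orf_bp, includes_stop=True):
    """Number of amino acid residues encoded by an ORF of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons

residues = residues_from_orf(900)
mean_residue_mass = 33206 / residues  # average mass per residue, in Da
print(residues, round(mean_residue_mass, 1))
```

The mean of about 111 Da per residue is in the usual range for proteins, so the reported length and molecular weight are mutually consistent.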

Could Decimal-binary Vector be a Representative of DNA Sequence for Classification?

  • Sanjaya, Prima;Kang, Dae-Ki
    • International journal of advanced smart convergence / Vol. 5, No. 3 / pp.8-15 / 2016
  • In recent years, the Deep Belief Network (DBN), a deep learning model formed by stacking restricted Boltzmann machines in a greedy fashion, has been widely used for classification and recognition. With its ability to extract features at a high level of abstraction and to deal with high-dimensional data, this model has achieved outstanding results in image and speech recognition. In this research, we assess the applicability of deep learning to DNA classification. Since the training phase of a DBN is expensive, especially when it deals with DNA sequences with thousands of variables, we introduce a new encoding method that uses a decimal-binary vector to represent the sequence as input to the model, and compare it with one-hot-vector encoding on two datasets. We evaluated our proposed model with different contrastive algorithms, which achieved a significant improvement in training speed with comparable classification results. This result shows the potential of using decimal-binary vectors with a DBN on DNA sequences to solve other sequence problems in bioinformatics.
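
The abstract does not spell out how the decimal-binary vector is built. The sketch below contrasts standard one-hot encoding with one plausible reading, assumed here purely for illustration: each base is given a decimal code 0 to 3, which is then written as two binary digits. The base order in `BASES` and both function names are hypothetical.

```python
BASES = "ACGT"  # assumed ordering; the paper's mapping may differ

def one_hot(seq):
    """Standard one-hot encoding: four inputs per nucleotide."""
    vec = []
    for base in seq:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec

def decimal_binary(seq):
    """Assumed decimal-binary encoding: decimal code 0-3 written as two bits."""
    vec = []
    for base in seq:
        code = BASES.index(base)               # decimal code of the base
        vec.extend([(code >> 1) & 1, code & 1])  # its 2-bit binary form
    return vec

seq = "ACGT"
print(len(one_hot(seq)), len(decimal_binary(seq)))
```

Under this reading the input layer is half as wide as with one-hot encoding, which would be in line with the training-speed improvement the abstract reports.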

Initial Timing Acquisition for Binary Phase-Shift Keying Direct Sequence Ultra-wideband Transmission

  • Kang, Kyu-Min;Choi, Sang-Sung
    • ETRI Journal / Vol. 30, No. 4 / pp.495-505 / 2008
  • This paper presents a parallel processing searcher structure for the initial synchronization of a direct sequence ultra-wideband (DS-UWB) system, which is suitable for the digital implementation of baseband functionalities with a 1.32 Gsample/s chip rate analog-to-digital converter. An initial timing acquisition algorithm and a data demodulation method are also studied. The proposed searcher effectively acquires initial symbol and frame timing during the preamble transmission period. A hardware-efficient receiver structure using 24 parallel digital correlators for binary phase-shift keying DS-UWB transmission is presented. The proposed correlator structure operating at 55 MHz is shared for correlation operations in a searcher, a channel estimator, and the demodulator of a RAKE receiver. We also present a pseudo-random noise sequence generated with a primitive polynomial, $1+x^2+x^5$, for packet detection, automatic gain control, and initial timing acquisition. Simulation results show that the proposed parallel processing searcher employing the presented pseudo-random noise sequence outperforms one employing a preamble sequence in the IEEE 802.15.3a DS-UWB proposal.
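
A degree-5 primitive polynomial such as $1+x^2+x^5$ generates a maximal-length sequence of period $2^5-1=31$. The sketch below uses a Fibonacci-style shift register with the recurrence $s_{n+5}=s_{n+2}\oplus s_n$. The tap convention and the seed are assumptions, since the abstract does not state how the polynomial is mapped to hardware; the reciprocal convention would give a different sequence with the same period.

```python
def pn_sequence(seed=(1, 0, 0, 0, 0), length=31):
    """Generate bits from the recurrence s[n+5] = s[n+2] XOR s[n]."""
    state = list(seed)
    if not any(state):
        raise ValueError("seed must not be all zeros")
    out = []
    for _ in range(length):
        out.append(state[0])
        feedback = state[0] ^ state[2]
        state = state[1:] + [feedback]
    return out

def periodic_autocorrelation(bits, shift):
    """Periodic autocorrelation of the sequence mapped to +/-1 chips."""
    chips = [1 - 2 * b for b in bits]
    n = len(chips)
    return sum(chips[i] * chips[(i + shift) % n] for i in range(n))

seq = pn_sequence()
print(sum(seq), periodic_autocorrelation(seq, 0), periodic_autocorrelation(seq, 1))
```

The sharp autocorrelation peak (31 at zero shift, -1 at every other shift) is the property that makes such sequences useful for packet detection and timing acquisition.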

Efficient Accessing and Searching in a Sequence of Numbers

  • Seo, Jungjoo;Han, Myoungji;Park, Kunsoo
    • Journal of Computing Science and Engineering / Vol. 9, No. 1 / pp.1-8 / 2015
  • Accessing and searching in a sequence of numbers are fundamental operations in computing that are encountered in a wide range of applications. One of the applications of the problem is cryptanalytic time-memory tradeoff which is aimed at a one-way function. A rainbow table, which is a common method for the time-memory tradeoff, contains elements from an input domain of a hash function that are normally sorted integers. In this paper, we present a practical indexing method for a monotonically increasing static sequence of numbers where the access and search queries can be addressed efficiently in terms of both time and space complexity. For a sequence of n numbers from a universe $U=\{0,{\ldots},m-1\}$, our data structure requires n lg(m/n) + O(n) bits with constant average running time for both access and search queries. We also give an analysis of the time and space complexities of the data structure, supported by experiments with rainbow tables.
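
The space bound n lg(m/n) + O(n) bits quoted here is the same as that of the classic Elias-Fano representation of a monotone sequence. The sketch below implements that classic scheme as an illustration; it is an assumption that the paper's structure is related, and this is not the authors' method. For clarity the select operation is a plain Python list and `search` is a binary search, so it does not reproduce the constant average query time reported in the abstract.

```python
import math

class EliasFano:
    """Toy Elias-Fano encoding of a non-decreasing sequence over {0, ..., m-1}."""

    def __init__(self, values, universe):
        self.n = len(values)
        ratio = universe / self.n if self.n else 1
        self.low_bits = max(0, math.floor(math.log2(ratio)))
        mask = (1 << self.low_bits) - 1
        # Lower bits are stored verbatim: n * low_bits bits in total.
        self.lows = [v & mask for v in values]
        # Upper bits are stored in unary: the i-th value sets bit (high + i).
        self.high = [0] * (self.n + (universe >> self.low_bits) + 1)
        for i, v in enumerate(values):
            self.high[(v >> self.low_bits) + i] = 1
        # Positions of the 1 bits; a real structure would use a succinct select.
        self.ones = [pos for pos, bit in enumerate(self.high) if bit]

    def access(self, i):
        """Return the i-th value of the sequence."""
        high = self.ones[i] - i
        return (high << self.low_bits) | self.lows[i]

    def search(self, x):
        """Return the index of the first value >= x (n if there is none)."""
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.access(mid) < x:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def size_in_bits(self):
        """Bits used by the two bit arrays (select support excluded)."""
        return self.n * self.low_bits + len(self.high)

values = [3, 4, 7, 13, 14, 15, 21, 43]
ef = EliasFano(values, universe=64)
print([ef.access(i) for i in range(len(values))], ef.search(16))
```

Here n = 8 and m = 64, so each value keeps lg(m/n) = 3 low bits and the unary part adds roughly 2n bits, which matches the shape of the stated bound.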