• Title/Summary/Keyword: Data Sequence


Sequence driven features for prediction of subcellular localization of proteins (단백질의 세포내 소 기관별 분포 예측을 위한 서열 기반의 특징 추출 방법)

  • Kim, Jong-Kyoung;Choi, Seung-Jin
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.07b
    • /
    • pp.226-228
    • /
    • 2005
  • Predicting the cellular location of an unknown protein gives valuable information for inferring the possible function of the protein. For a more accurate prediction system, we need a good feature extraction method that transforms the raw sequence data into a numerical feature vector while minimizing information loss. In this paper, we propose new methods of extracting underlying features from the sequence data alone by computing pairwise sequence alignment scores. In addition, we use composition-based features to improve prediction accuracy. To construct an SVM ensemble from separately trained SVM classifiers, we propose specificity-based weighted majority voting. The overall prediction accuracy evaluated by 5-fold cross-validation reached $88.53\%$ for the eukaryotic animal data set. By comparing the prediction accuracy of various feature extraction methods, we gained biological insight into the location of targeting information. Our numerical experiments confirm that our new feature extraction methods are very useful for predicting subcellular localization of proteins.
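As a rough illustration of weighted majority voting over separately trained classifiers, the sketch below assumes that each classifier casts one vote for its predicted class and that the vote is weighted by that classifier's specificity for the class (for example, estimated on validation data). The abstract does not give the exact weighting rule, so the function name, the data layout, and the numbers are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, specificity):
    """Combine class votes from several classifiers.

    predictions: list of predicted labels, one per classifier.
    specificity: list of dicts; specificity[i][label] is the weight given to
                 classifier i when it predicts `label` (assumed weighting rule).
    """
    scores = defaultdict(float)
    for clf_index, label in enumerate(predictions):
        scores[label] += specificity[clf_index].get(label, 0.0)
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical example: two weak votes for "cytoplasmic" are outweighed by
# one vote from a classifier that is highly specific for "nuclear".
example_predictions = ["nuclear", "cytoplasmic", "cytoplasmic"]
example_specificity = [
    {"nuclear": 0.95, "cytoplasmic": 0.60},
    {"nuclear": 0.70, "cytoplasmic": 0.40},
    {"nuclear": 0.80, "cytoplasmic": 0.50},
]
winner = weighted_majority_vote(example_predictions, example_specificity)
```

With these made-up weights, plain majority voting would pick "cytoplasmic", while the weighted vote picks "nuclear".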


PAPR-minimized Sequence Mapping with Data Space Reduction by Partial Data Side Information in OFDM System (OFDM 시스템에서 부분 데이터 추가정보를 이용한 데이터 공간 감소를 갖는 최대 전력 대 평균 전력 비 최소화 시퀀스 사상 기법)

  • Jin Jiyu;Ryu Kwan Woongn;Park Yong wan
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.12A
    • /
    • pp.1340-1348
    • /
    • 2004
  • In this paper, we propose a PAPR-minimized sequence mapping scheme that achieves the minimum Peak-to-Average Power Ratio (PAPR) and the minimum amount of computation for the OFDM system. To reduce the PAPR, a mapping table is created with information about the block index and the symbol patterns of lower signal power. When the input data sequence arrives, it is divided by the block length to find the quotient and remainder. The symbol pattern of lower signal power is looked up in the mapping table using the quotient as the block index, and it is transmitted with the remainder as side information so that the receiver can distinguish and recover the original data sequence. Two methods using the proposed mapping scheme are presented in this paper. One keeps the mapping table in both the transmitter and the receiver to recover the OFDM signal. The other keeps the mapping table only in the transmitter to reduce the load and complexity in the mobile system. We show that this algorithm provides PAPR reduction, simple processing, and low computational complexity for implementation in multi-carrier systems.
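The quotient and remainder step can be sketched as follows. This is a minimal illustration, assuming the data word is treated as an unsigned integer and the receiver also holds the table (the first of the two variants). The names `encode`, `decode`, and `MAPPING_TABLE`, and the table contents, are hypothetical placeholders rather than the low-power patterns used in the paper.

```python
# Hypothetical mapping table: block index -> symbol pattern.
# Real entries would be the patterns with the lowest signal power.
MAPPING_TABLE = [
    (1, 1, -1, 1),
    (1, -1, 1, 1),
    (-1, 1, 1, 1),
    (1, 1, 1, -1),
]
BLOCK_LENGTH = 4  # assumed block length


def encode(data_value, block_length=BLOCK_LENGTH, table=MAPPING_TABLE):
    """Return (symbol pattern, remainder); the remainder is the side information."""
    block_index, remainder = divmod(data_value, block_length)
    return table[block_index], remainder


def decode(pattern, remainder, block_length=BLOCK_LENGTH, table=MAPPING_TABLE):
    """Recover the data value from the received pattern and the side information."""
    block_index = table.index(pattern)
    return block_index * block_length + remainder


pattern, side_info = encode(13)        # 13 = 3 * 4 + 1
recovered = decode(pattern, side_info)
```

With four table entries and a block length of four, this toy setup covers the data values 0 through 15.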

An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching (집합 유사 시퀀스 매칭의 성능 향상을 위한 인덱스 기반 검색 방법)

  • Lee, Juwon;Lim, Hyo-Sang
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.11
    • /
    • pp.507-520
    • /
    • 2017
  • The set-based similar sequence matching method measures similarity not for an individual data item but for a set that groups multiple data items. In this method, the similarity of two sets is represented as the size of the intersection between them. However, the method has a critical performance issue that is twofold: 1) calculating the intersection size is a time-consuming process, and 2) the number of set pairs for which the intersection size must be calculated is quite large. In this paper, we propose an index-based search method that improves the performance of set-based similar sequence matching in order to solve these issues. Our method consists of two parts. First, we convert the set similarity problem into an intersection size comparison problem and provide an index structure that accelerates the intersection size calculation. Second, we propose an efficient set-based similar sequence matching method that exploits the proposed index structure. Through experiments, we show that the proposed method reduces the execution time by a factor of 30 to 50 compared with the existing methods. We also show that the proposed method is scalable, since the performance gap becomes larger as the number of data sequences increases.
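To make the idea of index-accelerated intersection counting concrete, here is a minimal sketch that uses a generic inverted index (item to set IDs). The abstract does not describe the paper's own index structure, so this is a stand-in technique and all names are hypothetical.

```python
from collections import Counter, defaultdict


def build_inverted_index(sets):
    """Map each item to the IDs of the sets that contain it."""
    index = defaultdict(list)
    for set_id, items in enumerate(sets):
        for item in set(items):
            index[item].append(set_id)
    return index


def intersection_sizes(query, index):
    """Return {set_id: |query ∩ set|} for every set sharing at least one item."""
    counts = Counter()
    for item in set(query):
        # Only sets that actually contain the item are touched.
        counts.update(index.get(item, ()))
    return counts


stored_sets = [{1, 2, 3}, {3, 4}, {5}]
index = build_inverted_index(stored_sets)
sizes = intersection_sizes({2, 3, 4}, index)
```

The point of the sketch is that sets with an empty intersection are never visited, which avoids comparing the query against every stored set pair by pair.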

An Efficient Receiver Structure Based on PN Performance in Underwater Acoustic Communications (수중음향통신에서 PN 성능 기반의 효율적인 수신 구조)

  • Baek, Chang-Uk;Jung, Ji-Won
    • Journal of Navigation and Port Research
    • /
    • v.41 no.4
    • /
    • pp.173-180
    • /
    • 2017
  • Underwater communications are degraded by intersymbol interference in multipath channels. Therefore, a channel coding scheme is essential for underwater communications. Packets consist of a PN sequence and a data field, and the uncoded PN sequence is used to estimate the frequency and phase offset using a Doppler and phase estimation algorithm. The estimated frequency and phase offset are fed to the coded data field to compensate for the Doppler and phase offset. The PN sequence is generally used to acquire synchronization information, and the bit error rate of the uncoded PN sequence predicts the performance of the coded data field. To ensure few errors, we use the powerful BCJR decoding algorithm for convolutional codes with rates of 1/2, 2/3, and 3/4. Using this channel coding algorithm, we present an efficient receiver structure based on the relation between the bit errors of the uncoded PN sequence and those of the coded data field, evaluated in computer simulations and lake experiments.

Some Considerations on the Problems of PSA(Pulse Sequence Analysis) as a Partial Discharge Analysis Method (부분방전 해석 방법으로 PSA(Pulse Sequence Analysis)의 문제점에 대한 고찰)

  • Kim, Jeong-Tae;Lee, Ho-Keun
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference
    • /
    • 2004.11a
    • /
    • pp.327-330
    • /
    • 2004
  • Because of its effectiveness for PD (partial discharge) pattern recognition, PSA (Pulse Sequence Analysis) has been considered as a new analytic method to replace conventional PRPDA (Phase Resolved Partial Discharge Analysis). However, because PSA analyses the correlation between sequential pulses, it has a serious problem: it can misanalyze patterns when data are missing as a result of poor sensitivity, which makes practitioners hesitant to apply it on-site. Therefore, in this paper, the problems of PSA in data-missing and noise-adding cases were investigated. For this purpose, PD data obtained from various defects, including data with added noise, were used and analysed. The results showed that both cases can cause fatal errors in recognizing PD patterns. In the case of missing data, the error depends on the kind of defect and the degree of degradation. It was also found that the error due to added noise was larger than that due to missing data.


Some Considerations on the On-site Applicability of PSA(Pulse Sequence Analysis) as a Partial Discharge Analysis Method (부분방전 해석 방법으로 PSA(Pulse Sequence Analysis)의 현장 적용성에 대한 고찰)

  • Kim, Jeong-Tae;Lee, Ho-Keun
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers
    • /
    • v.18 no.5
    • /
    • pp.484-489
    • /
    • 2005
  • Because of its effectiveness for PD (Partial Discharge) pattern recognition, PSA (Pulse Sequence Analysis) has been considered as a new analytic method to replace conventional PRPDA (Phase Resolved Partial Discharge Analysis). However, because PSA analyses the correlation between sequential pulses, it is generally thought that it may misjudge patterns when data are missing as a result of poor sensitivity, which makes practitioners hesitant to apply it on-site. Therefore, in this paper, the problems of PSA in data-missing and noise-adding cases were investigated. For this purpose, PD data obtained from various defects, including data with added noise, were used and analyzed. The results showed that both cases could cause fatal errors in recognizing PD patterns. In the case of missing data, the error was dependent on the kind of defect and the degree of degradation. It was also found that the error due to added noise was larger than that due to missing data.

A Pattern Matching Extended Compression Algorithm for DNA Sequences

  • Murugan, A.;Punitha, K.
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.196-202
    • /
    • 2021
  • DNA sequencing provides fundamental data in genomics, bioinformatics, biology and many other research areas. With the rapid evolution of DNA sequencing technology, a massive amount of genomic data, mainly DNA sequences, is produced every day, demanding more storage and bandwidth. Managing, analyzing and especially storing these large amounts of data has become a major scientific challenge for bioinformatics. Such large volumes of data also require fast transmission, effective storage, superior functionality and quick access to any record. Data storage accounts for a considerable proportion of the total cost of generating and analyzing DNA sequences. In particular, the disk storage capacity used for DNA sequences needs to be tightly controlled, but standard compression techniques fail to compress these sequences. Several specialized techniques have been introduced for this purpose. To overcome these challenges, lossless compression techniques have become necessary. This paper describes a new DNA compression mechanism, the pattern matching extended compression algorithm, which reads the input sequence as segments, finds matching patterns, and stores them in a permanent or temporary table based on the number of bases. The remaining unmatched sequence is converted into binary form, grouped into blocks of seven bits, and these blocks are then converted into ASCII form. Finally, the proposed algorithm dynamically calculates the compression ratio. The results show that the pattern matching extended compression algorithm outperforms cutting-edge compressors and proves its efficiency in terms of compression ratio regardless of the file size of the data.
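The final stage described above (binary conversion, grouping into seven bits, conversion to ASCII) might look roughly like the sketch below. It assumes a fixed two-bit code per base (A=00, C=01, G=10, T=11) and zero padding of the last group; the abstract does not state the code table or the padding rule, and the pattern-matching tables are omitted entirely. Function names are hypothetical.

```python
# Assumed 2-bit code per base; the paper's actual table is not given.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(sequence):
    """Encode bases as bits, then pack each group of 7 bits into one ASCII character."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    padded = bits + "0" * (-len(bits) % 7)  # pad the last group to 7 bits
    chars = "".join(chr(int(padded[i:i + 7], 2)) for i in range(0, len(padded), 7))
    return chars, len(sequence)  # the base count is needed to undo the padding


def unpack(chars, n_bases):
    """Invert pack(): expand each character to 7 bits and decode 2 bits per base."""
    bits = "".join(format(ord(c), "07b") for c in chars)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, 2 * n_bases, 2))


packed, count = pack("ACGTACGTTGCA")   # 12 bases -> 24 bits -> 4 characters
restored = unpack(packed, count)
```

Under these assumptions, each 7-bit ASCII character carries 3.5 bases, compared with one base per character in plain text.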

Transposition of IntAs into the Conserved Regions of IS3 Family Elements

  • Han, Chang-Gyun
    • Journal of Microbiology
    • /
    • v.42 no.1
    • /
    • pp.56-59
    • /
    • 2004
  • Together with the previous reports, my computer survey revealed that several bacteria contain six copies of the type group II intron IntA. Sequence analysis of the IntAs showed a high level of homology in the nucleotide sequence (91.9-99.8%). The consensus sequence, 2,270 base pairs long, was derived from the nucleotide sequences of all IntA members. The open reading frame intA was 502 amino acids long and is homologous to reverse transcriptase-like proteins encoded within group II introns. It was reported that EPEC.IntA and Sf.IntA were inserted into IS911 and IS629, respectively. The sequences of the regions flanking IntA were analyzed here. The data show the insertion of EC.IntA into IS629, the insertion of EHEC.IntA into IS3, the insertion of Yp.IntA into an IS904-like sequence, and the insertion of EK12.IntA into IS911. Interestingly, the IS elements hosting the IntAs were all members of the IS3 family. The sequences of the IS3 members correspond to the OrfB with the DDE motif conserved in retroviral integrases. Alignment of the flanking sequences of the IntAs revealed that the flanking regions -25 to +10 of the insertion sites, which are generally believed to be required for retrohoming, were not strongly conserved. The data presented here suggest that the retrohoming pathway of IntA differs from those of other group II introns.

A New PAPR Reduction Method in the OFDM System using GD and Radix-2 DIF IFFT (OFDM 시스템에서의 GD방식과 Radix-2 DIF IFFT를 이용한 효과적인 PAPR 감소 방식)

  • Lee, Sun-Ho;Lee, Hae-Kie;Kim, Sung-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.3
    • /
    • pp.41-46
    • /
    • 2008
  • Many methods have been developed to overcome the PAPR (peak-to-average power ratio) problem. Selective mapping (SLM), partial transmit sequence (PTS), subblock phase weighting (SPW) and gradient descent (GD) are widely used to reduce the PAPR. In this paper, we present an effective PAPR reduction method that decreases the number of calculations by combining the Radix-2 DIF IFFT procedure with a GD method that transmits a selected data sequence. The data sequence is constructed by choosing elements that satisfy a threshold value as one part of the sequence, while the remaining elements of each sequence are chosen to yield a lower PAPR, which improves performance.
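For reference, the PAPR metric that all of these methods try to reduce can be computed as in the sketch below. It only evaluates the ratio of peak power to mean power of one OFDM symbol and does not implement the GD or Radix-2 DIF IFFT procedure from the paper. A naive inverse DFT without oversampling is assumed for simplicity, and the function names are hypothetical.

```python
import cmath
import math


def idft(symbols):
    """Naive inverse DFT: frequency-domain symbols -> time-domain samples."""
    n = len(symbols)
    return [
        sum(s * cmath.exp(2j * cmath.pi * k * t / n) for k, s in enumerate(symbols)) / n
        for t in range(n)
    ]


def papr_db(samples):
    """Peak-to-average power ratio of a block of samples, in dB."""
    powers = [abs(x) ** 2 for x in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


# Identical symbols on all 8 subcarriers add coherently at t = 0 (worst case).
worst_case = papr_db(idft([1] * 8))
# A single active subcarrier gives a constant-envelope signal.
best_case = papr_db(idft([0, 1, 0, 0, 0, 0, 0, 0]))
```

For N subcarriers carrying identical symbols, the PAPR reaches its maximum of 10·log10(N) dB, while a single active subcarrier gives 0 dB.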

A New Frame Synchronization Scheme for Underwater Ultrasonic Image Burst Transmission System (초음파를 이용한 수중 영상 버스트 전송 시스템을 위한 새로운 프레임 동기 방안)

  • Kim, Seung-Geun;Choi, Young-Chol;Park, Jong-Won;Kim, Sea-Moon;Lim, Yong-Gon;Kim, Sang-Tae
    • Proceedings of the Korea Committee for Ocean Resources and Engineering Conference
    • /
    • 2003.05a
    • /
    • pp.336-340
    • /
    • 2003
  • Frame synchronization should be acquired before performing other data-aided receiving algorithms, such as data-aided channel equalization, beam-forming, and phase, symbol timing, and frequency synchronization, since all of them use a preamble or training sequence to estimate the amount of error in the received signal. In this paper, we present a new frame synchronization scheme for an underwater ultrasonic image burst transmission system. The scheme computes the correlation between the received symbol sequence and one CAZAC sequence, composed of the latter half of the first CAZAC sequence of the preamble and the first half of the second CAZAC sequence of the preamble, and then compares the result with a threshold value. If the correlation value is larger than the threshold value, the frame detector determines that frame synchronization is achieved at that sample.
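A toy version of this correlate-and-threshold detector is sketched below. It assumes the preamble is the same Zadoff-Chu sequence (one family of CAZAC sequences) sent twice, a noise-free channel, and a normalized threshold of 0.8. None of these choices come from the abstract, and the function names are hypothetical.

```python
import cmath


def zadoff_chu(root, length):
    """Zadoff-Chu sequence for an even length, with root coprime to length."""
    return [cmath.exp(-1j * cmath.pi * root * n * n / length) for n in range(length)]


def detect_frame(received, reference, threshold):
    """Return the first sample index where the normalized correlation exceeds threshold."""
    m = len(reference)
    for start in range(len(received) - m + 1):
        window = received[start:start + m]
        corr = sum(r * ref.conjugate() for r, ref in zip(window, reference))
        if abs(corr) / m > threshold:
            return start
    return None


cazac = zadoff_chu(root=1, length=16)
preamble = cazac + cazac                      # assumed: same sequence sent twice
half = len(cazac) // 2
# Reference = latter half of the first sequence + first half of the second.
reference = cazac[half:] + cazac[:half]
sync_index = detect_frame(preamble, reference, threshold=0.8)
```

Because a CAZAC sequence has zero periodic autocorrelation at every nonzero shift, the correlation inside the preamble stays near zero until the window lines up with the composite reference, half a sequence length after the preamble starts.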
