• Title/Summary/Keyword: Data Sequence

TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data

  • Lim, Jae Hyun;Lee, Soo Youn;Kim, Ju Han
    • Genomics & Informatics / v.15 no.1 / pp.51-53 / 2017
  • High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline and are difficult to integrate with one another because of their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

New low-complexity segmentation scheme for the partial transmit sequence technique for reducing the high PAPR value in OFDM systems

  • Jawhar, Yasir Amer;Ramli, Khairun Nidzam;Taher, Montadar Abas;Shah, Nor Shahida Mohd;Audah, Lukman;Ahmed, Mustafa Sami;Abbas, Thamer
    • ETRI Journal / v.40 no.6 / pp.699-713 / 2018
  • Orthogonal frequency division multiplexing (OFDM) has been the overwhelmingly prevalent choice for high-data-rate systems because of its advantages over other modulation techniques. However, a high peak-to-average-power ratio (PAPR) is considered the fundamental obstacle in OFDM systems, since it causes the system to suffer from in-band distortion and out-of-band radiation. The partial transmit sequence (PTS) technique is one of several strategies that have been suggested to reduce the high PAPR. PTS relies on dividing an input data sequence into a number of subblocks. Three common subblock segmentation methods have been adopted: interleaving (IL-PTS), adjacent (Ad-PTS), and pseudorandom (PR-PTS). In this study, a new subblock division scheme is proposed to improve the PAPR reduction capacity with low computational complexity. The results indicate that the proposed scheme achieves better PAPR reduction performance than the IL-PTS and Ad-PTS schemes. Additionally, the computational complexity of the proposed scheme is lower than that of the PR-PTS and Ad-PTS schemes.
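
The three conventional segmentation methods named in the abstract, and the PTS phase search built on them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme the paper proposes (the abstract does not describe it). The function names, the naive inverse DFT, the phase set {1, -1}, and the fixed first phase factor are all choices made here for illustration.

```python
import cmath
import itertools
import math
import random


def idft(freq):
    """Naive inverse DFT (O(N^2)); adequate for a small illustration."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def papr_db(time_signal):
    """Peak-to-average power ratio of a time-domain signal, in dB."""
    powers = [abs(x) ** 2 for x in time_signal]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def partition_indices(n, v, scheme, seed=0):
    """Split subcarrier indices 0..n-1 into v disjoint, equally sized subblocks."""
    if n % v != 0:
        raise ValueError("n must be divisible by v")
    size = n // v
    if scheme == "adjacent":        # consecutive runs of subcarriers
        return [list(range(i * size, (i + 1) * size)) for i in range(v)]
    if scheme == "interleaved":     # every v-th subcarrier
        return [list(range(i, n, v)) for i in range(v)]
    if scheme == "pseudorandom":    # random assignment (seed is an assumption)
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        return [sorted(idx[i * size:(i + 1) * size]) for i in range(v)]
    raise ValueError(f"unknown scheme: {scheme}")


def pts_min_papr(symbols, v, scheme, phases=(1, -1)):
    """Exhaustive PTS search; returns (best PAPR in dB, phase factors)."""
    n = len(symbols)
    partial = []
    for idx in partition_indices(n, v, scheme):
        block = [0j] * n            # zero everywhere except this subblock
        for k in idx:
            block[k] = symbols[k]
        partial.append(idft(block))
    best = None
    # The first phase factor is fixed to 1, leaving len(phases)**(v-1) candidates.
    for combo in itertools.product(phases, repeat=v - 1):
        weights = (1,) + combo
        combined = [sum(w * p[t] for w, p in zip(weights, partial)) for t in range(n)]
        value = papr_db(combined)
        if best is None or value < best[0]:
            best = (value, weights)
    return best
```

Because the all-ones phase vector reproduces the original OFDM symbol, the value returned by `pts_min_papr` can never exceed the PAPR of the unmodified signal (up to rounding). Comparing the three `scheme` values on the same random QPSK symbols gives a rough feel for how the segmentation choice affects the result.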

IMPLEMENTATION OF SUBSEQUENCE MAPPING METHOD FOR SEQUENTIAL PATTERN MINING

  • Trang, Nguyen Thu;Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.627-630 / 2006
  • Sequential pattern mining is the mining approach that addresses the problem of discovering the maximal frequent sequences in a given database. In daily and scientific life, sequential data are available and used everywhere in forms such as text, weather data, satellite data streams, business transactions, telecommunications records, experimental runs, DNA sequences, and histories of medical records. Discovering sequential patterns can help users and scientists predict upcoming activities, interpret recurring phenomena, or extract similarities. To that end, the core of sequential pattern mining is finding the sequences that occur frequently across the data sequences. Besides the discovery of frequent itemsets, sequential pattern mining requires arranging those itemsets into sequences and discovering which of those sequences are frequent. So before mining sequences, the main task is checking whether one sequence is a subsequence of another sequence in the database. In this paper, we implement the subsequence matching method as the preprocessing step for sequential pattern mining. The matched sequences in our implementation are normalized sequences in the form of number chains. The method outputs a summary of the matching information between the input mapped sequences.

Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine

  • Kim, Jong-Kyoung;Raghava, G. P. S.;Kim, Kwang-S.;Bang, Sung-Yang;Choi, Seung-Jin
    • Proceedings of the Korean Society for Bioinformatics Conference / 2004.11a / pp.158-166 / 2004
  • Predicting the destination of a protein in a cell gives valuable information for annotating the function of the protein. Recent technological breakthroughs have led us to develop more accurate methods for predicting the subcellular localization of proteins. The most important factor in determining the accuracy of these methods is the way useful features are extracted from protein sequences. We propose a new method for extracting appropriate features from the sequence data alone by computing pairwise sequence alignment scores. A support vector machine (SVM) is used as the classifier. The overall prediction accuracy evaluated by the jackknife validation technique reaches 94.70% for the eukaryotic non-plant data set and 92.10% for the eukaryotic plant data set, which is the highest prediction accuracy among methods reported so far on these data sets. Our numerical experimental results confirm that our feature extraction method based on pairwise sequence alignment is useful for this classification problem.
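
The idea of using pairwise alignment scores as features can be sketched as below. This is a hedged illustration only: the abstract does not say which alignment algorithm or scoring scheme the authors used, so the global (Needleman-Wunsch style) alignment, the match/mismatch/gap values, and the function names are assumptions. The classifier step is omitted; the resulting vectors would be passed to an SVM implementation of the reader's choice.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two sequences with a linear gap penalty.

    The scoring values are illustrative defaults, not the paper's settings.
    """
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def alignment_features(query, references):
    """Represent a sequence by its alignment scores against a reference set."""
    return [alignment_score(query, ref) for ref in references]
```

Each protein is thus mapped to a fixed-length numeric vector with one entry per reference sequence, which is the form a kernel classifier expects.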

Implementation of Subsequence Mapping Method for Sequential Pattern Mining

  • Trang Nguyen Thu;Lee Bum-Ju;Lee Heon-Gyu;Park Jeong-Seok;Ryu Keun-Ho
    • Korean Journal of Remote Sensing / v.22 no.5 / pp.457-462 / 2006
  • Sequential pattern mining is the mining approach that addresses the problem of discovering the maximal frequent sequences in a given database. In daily and scientific life, sequential data are available and used everywhere in forms such as text, weather data, satellite data streams, business transactions, telecommunications records, experimental runs, DNA sequences, and histories of medical records. Discovering sequential patterns can help users and scientists predict upcoming activities, interpret recurring phenomena, or extract similarities. To that end, the core of sequential pattern mining is finding the sequences that occur frequently across the data sequences. Besides the discovery of frequent itemsets, sequential pattern mining requires arranging those itemsets into sequences and discovering which of those sequences are frequent. So before mining sequences, the main task is checking whether one sequence is a subsequence of another sequence in the database. In this paper, we implement the subsequence matching method as the preprocessing step for sequential pattern mining. The matched sequences in our implementation are normalized sequences in the form of number chains. The method outputs a summary of the matching information between the input mapped sequences.
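
The subsequence check that the abstract identifies as the main preprocessing task can be sketched as follows. This uses the usual definition from sequential pattern mining (each itemset of the candidate must be contained in a distinct, later itemset of the data sequence) and is not the authors' implementation. The function names and the integer encoding of items, which is one possible reading of "number chain", are assumptions.

```python
def encode_items(database):
    """Map each distinct item to an integer (a hypothetical normalization step)."""
    codes = {}
    encoded = []
    for sequence in database:
        encoded.append(
            [{codes.setdefault(item, len(codes)) for item in itemset} for itemset in sequence]
        )
    return encoded, codes


def is_subsequence(candidate, sequence):
    """True if candidate's itemsets are contained, in order, in distinct itemsets of sequence."""
    pos = 0
    for itemset in candidate:
        # Greedily advance to the earliest itemset that contains this one.
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next candidate itemset must match strictly later
    return True


def support(candidate, database):
    """Number of data sequences that contain the candidate as a subsequence."""
    return sum(is_subsequence(candidate, seq) for seq in database)
```

Matching each candidate itemset at the earliest possible position is sufficient here, because an earlier match never rules out a later one. A candidate would count as frequent when `support` meets a user-chosen minimum threshold.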

Bioinformatics for the Korean Functional Genomics Project

  • Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.45-52 / 2000
  • The genomic approach produces massive amounts of data within a short time period. New high-throughput automatic sequencers can generate over a million nucleotides of sequence information overnight. A typical DNA chip experiment produces tens of thousands of expression measurements, not to mention tens of megabytes of image files. These data must be handled automatically by computer and stored in an electronic database. Thus there is a need for a systematic approach to data collection, processing, and analysis. DNA sequence information is translated into amino acid sequence and analyzed for key motifs related to its biological and/or biochemical function. Functional genomics will play a significant role in identifying novel drug targets and diagnostic markers for serious diseases. As an enabling technology for functional genomics, bioinformatics is in great need worldwide. In Korea, a new functional genomics project has recently been launched, and it focuses on identifying genes associated with cancers prevalent in Korea, namely gastric and hepatic cancers. This involves gene discovery by high-throughput sequencing of cancer cDNA libraries, gene expression profiling by DNA microarray and proteomics, and SNP profiling in the Korean patient population. Our bioinformatics team will support all these activities by collecting, processing, and analyzing these data.

A New SLM Method using Dummy Sequence Insertion for the PAPR Reduction of the OFDM Communication System (OFDM통신 시스템의 PAPR저감을 위한 Dummy Sequence를 삽입하는 새로운 SLM 기법)

  • 이재은;허근재;김상우;유흥균
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.15 no.4 / pp.379-386 / 2004
  • An OFDM (orthogonal frequency division multiplexing) communication system is very attractive for high-data-rate transmission in frequency-selective fading channels. Because OFDM has a high PAPR (peak-to-average power ratio), the OFDM signal may be distorted by the nonlinear HPA (high power amplifier). In this paper, we propose an improved dummy sequence scheme for reducing the PAPR in OFDM communication systems. This method inserts a different dummy sequence at predefined subcarriers for PAPR reduction. After the IFFT, the OFDM data signal with the lowest PAPR is selected for transmission. A complementary sequence is used as the dummy sequence. This cuts down the computation time and quantity because, unlike the PTS method, it does not require peak value optimization to find the phase rotation factor or the transmission of side information about the rotation factor.

A Study on the Optimal Signal Timing for Area Traffic Control (지역 교통망 관리를 위한 최적 신호순서에 관한 연구)

    • Journal of the Korean Operations Research and Management Science Society / v.24 no.2 / pp.69-80 / 1999
  • A genetic algorithm to determine the optimal signal sequence and double cycle pattern is described. The signal sequence and double cycle pattern are used as the input for TRANSYT to find the optimal signal timing at each junction in area traffic networks. In the genetic process, the partially matched crossover and simple crossover operators are used for the evolution of the signal sequence and the double cycle pattern, respectively. A special conversion algorithm is devised to convert the signal sequence into the link-stage assignment for TRANSYT. Results from tests using data from an area traffic network in Leicester region R are given.
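
The partially matched crossover (PMX) operator mentioned above can be sketched as follows. This is the textbook PMX on permutations, shown under the assumption that a signal sequence is encoded as a permutation of stage labels. The abstract does not give the encoding, so the function name, the explicit cut points, and the example data are illustrative.

```python
def pmx(parent1, parent2, a, b):
    """Partially matched crossover: the child keeps parent1[a:b] and
    inherits the remaining genes from parent2 without duplicates."""
    n = len(parent1)
    child = [None] * n
    child[a:b] = parent1[a:b]
    pos_in_p2 = {gene: i for i, gene in enumerate(parent2)}
    segment = set(parent1[a:b])
    # Place parent2's segment genes that the copied segment displaced.
    for i in range(a, b):
        gene = parent2[i]
        if gene in segment:
            continue
        j = i
        # Follow the mapping until it leaves the copied segment.
        while a <= j < b:
            j = pos_in_p2[parent1[j]]
        child[j] = gene
    # Everything still empty is copied straight from parent2.
    for i in range(n):
        if child[i] is None:
            child[i] = parent2[i]
    return child


if __name__ == "__main__":
    p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
    print(pmx(p1, p2, 3, 7))  # prints [9, 3, 2, 4, 5, 6, 7, 1, 8]
```

PMX suits order-based encodings because the child is always a valid permutation, whereas a simple one-point crossover could repeat or drop a stage. In a genetic algorithm the cut points `a` and `b` would normally be drawn at random for each mating.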

Discriminative Training of Sequence Taggers via Local Feature Matching

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.3 / pp.209-215 / 2014
  • Sequence tagging is the task of predicting frame-wise labels for a given input sequence and has important applications in diverse domains. Conventional methods such as maximum likelihood (ML) learning match global features in empirical and model distributions, rather than local features, which directly translates into frame-wise prediction errors. Recent probabilistic sequence models such as conditional random fields (CRFs) have achieved great success in a variety of situations. In this paper, we introduce a novel discriminative CRF learning algorithm to minimize local feature mismatches. Unlike overall data fitting originating from global feature matching in ML learning, our approach reduces the total error over all frames in a sequence. We also provide an efficient gradient-based learning method via gradient forward-backward recursion, which requires the same computational complexity as ML learning. For several real-world sequence tagging problems, we empirically demonstrate that the proposed learning algorithm achieves significantly more accurate prediction performance than standard estimators.

Finding Weighted Sequential Patterns over Data Streams via a Gap-based Weighting Approach (발생 간격 기반 가중치 부여 기법을 활용한 데이터 스트림에서 가중치 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.55-75 / 2010
  • Sequential pattern mining aims to discover interesting sequential patterns in a sequence database, and it is one of the essential data mining tasks, widely used in application fields such as Web access pattern analysis, customer purchase pattern analysis, and DNA sequence analysis. General sequential pattern mining considers only the generation order of data elements in a sequence, so it can easily find simple sequential patterns but is limited in finding the more interesting sequential patterns that real-world applications need. One essential research topic that addresses this limitation is weighted sequential pattern mining, which considers not only the generation order of data elements but also their weights. Recently, as data has increasingly taken the form of continuous data streams rather than finite stored data sets in various application fields, the database research community has begun focusing its attention on processing over data streams. A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. In data stream processing, each data element should be examined at most once, and the memory used for the analysis should be bounded even though new data elements are continuously generated. Moreover, newly generated data elements should be processed as fast as possible to produce an up-to-date analysis result that can be used instantly upon request. To satisfy these requirements, data stream processing sacrifices the correctness of its analysis result by allowing some error. Considering the changes in the form of data generated in real-world application fields, much research has been actively carried out to find various kinds of knowledge embedded in data streams. This research mainly focuses on efficient mining of frequent itemsets and sequential patterns over data streams, which have proven useful in conventional data mining for finite data sets. In addition, mining algorithms have been proposed to efficiently reflect the changes of data streams over time in their mining results. However, they target naively interesting patterns such as frequent patterns and simple sequential patterns, which are found intuitively, and take no interest in mining novel interesting patterns that better express the characteristics of the target data streams. Therefore, defining novel interesting patterns and developing a mining method to find them is a valuable research topic in the field of mining data streams. This paper proposes a gap-based weighting approach for sequential patterns and a mining method for weighted sequential patterns over sequence data streams based on that approach. A gap-based weight of a sequential pattern can be computed from the gaps between data elements in the sequential pattern without any predefined weight information. That is, the gaps between data elements in each sequential pattern, as well as their generation order, are used to obtain the weight of the sequential pattern, which helps to find more interesting and useful sequential patterns. Because most computer application fields now generate data in the form of data streams rather than finite data sets, the proposed method mainly focuses on sequence data streams.