• Title/Summary/Keyword: sequence data

Investigation on Structure and Properties of a Novel Designed Peptide with Half-Sequence Ionic Complement

  • Ruan, Li-Ping;Luo, Han-Lin;Zhang, Hang-Yu;Zhao, Xiaojun
    • Macromolecular Research / v.17 no.8 / pp.597-602 / 2009
  • Although the existing design principle of full-sequence ionic complement is convenient for the development of peptides, it greatly constrains the exploration of peptides with other possible assembly mechanisms and different yet essential functions. Herein, a newly designed half-sequence ionic complementary peptide (referred to as P9), Ac-Pro-Ser-Phe-Asn-Phe-Lys-Phe-Glu-Pro-$NH_2$, is reported. When transferred from pure water to sodium chloride solution, P9 underwent a dramatic morphological transformation from globular aggregates to nanofibers. Moreover, rheological experiments showed that P9 could form a hydrogel with a storage modulus of about 30 Pa even at a very low peptide concentration (0.5% (wt/vol)). The P9 hydrogel formed in salt solution could recover in about 1,800 sec, which is faster than in pure water. The data suggested that the half-sequence ionic complementary peptide may be worthy of further research for its special properties.

Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression

  • Abass, Yusuf Aleshinloye;Adeshina, Steve A.
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.526-538 / 2021
  • Machine and deep learning-based models are emerging techniques for addressing prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine and deep learning-based models have been shown to provide more accurate results than conventional regression-based models. Predicting the gene sequences that lead to cancerous diseases, such as prostate cancer, is crucial, and identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that can provide insight into the types of mutation in the gene is of great importance, as it will lead to effective drug design and promote the new concept of personalised medicine. In this work, we extracted the exons in the prostate gene sequences that were used in the experiment. We built a Deep Neural Network (DNN) model and a Bi-directional Long Short-Term Memory (Bi-LSTM) model using k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that the DNN model offers a training accuracy of 99 percent and a validation accuracy of 96 percent. The Bi-LSTM model has a training accuracy of 95 percent and a validation accuracy of 91 percent.
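The k-mer and one-hot encodings named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the value k = 3, the stride of 1, the reserved index 0 for unknown k-mers, and all function names are assumptions made here.

```python
from itertools import product


def kmers(seq, k=3):
    """Split a DNA string into overlapping k-mers (stride 1, an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer to an integer index; 0 is reserved for unknowns."""
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, vocab, k=3):
    """Turn a DNA string into a list of integer k-mer indices."""
    return [vocab.get(km, 0) for km in kmers(seq, k)]


def one_hot(label, n_classes):
    """One-hot encode an integer class label."""
    vec = [0] * n_classes
    vec[label] = 1
    return vec


vocab = build_vocab(k=3)
features = encode("ATGCA", vocab)   # one integer per k-mer: ATG, TGC, GCA
target = one_hot(1, n_classes=2)    # e.g. class 1 of 2 (hypothetical labels)
```

The integer sequence would typically be padded to a fixed length and fed to an embedding layer before a DNN or Bi-LSTM; that step depends on the framework and is left out here.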

Improving transformer-based acoustic model performance using sequence discriminative training (Sequence discriminative training 기법을 사용한 트랜스포머 기반 음향 모델 성능 향상)

  • Lee, Chae-Won;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.335-341 / 2022
  • In this paper, we adopt the transformer, which shows remarkable performance in natural language processing, as the acoustic model (AM) of a hybrid speech recognition system. The transformer acoustic model uses attention structures to process sequential data and shows high performance with low computational cost. This paper proposes a method to improve the performance of the transformer AM by applying each of four sequence discriminative training algorithms, a weighted finite-state transducer (wFST)-based learning approach used in existing DNN-HMM models. Compared to the Cross Entropy (CE) learning method, the sequence discriminative method shows a relative Word Error Rate (WER) improvement of 5 %.
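For readers unfamiliar with the metric, the sketch below shows how WER and a relative WER change are commonly computed. It is a generic illustration with hypothetical function names; the example figures are invented and are not results from the paper.

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))          # distances for the empty reference
    for i, rw in enumerate(ref, 1):
        cur = [i]
        for j, hw in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (rw != hw))) # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(wer_baseline, wer_new):
    """Relative WER reduction of a new system with respect to a baseline."""
    return (wer_baseline - wer_new) / wer_baseline


# Invented numbers for illustration only: 20.0 % -> 19.0 % WER is a 5 % relative reduction.
example = relative_reduction(0.20, 0.19)
```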

Nonparametric Nonlinear Model Predictive Control

  • Kashiwagi, Hiroshi;Li, Yun
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings / 2003.10a / pp.1443-1448 / 2003
  • Model Predictive Control (MPC) has recently found wide acceptance in industrial applications, but its potential has been much impeded by its reliance on linear models, owing to the lack of a similarly accepted nonlinear modelling or data-based technique. The authors have recently developed a new method for obtaining Volterra kernels of up to third order by use of a pseudorandom M-sequence. Using this method, nonparametric NMPC is derived in discrete time using multi-dimensional convolution between plant data and Volterra kernel measurements. This approach is applied to an industrial polymerisation process using Volterra kernels of up to the third order. Results show that the nonparametric approach is very efficient and effective and considerably outperforms existing methods, while retaining the original data-based spirit and characteristics of linear MPC.
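A pseudorandom M-sequence (maximal-length sequence) of the kind mentioned here can be generated with a linear feedback shift register. The sketch below is illustrative only: the register length of 4 and the recurrence a[t+4] = a[t] XOR a[t+1] (primitive polynomial x^4 + x + 1) are assumptions chosen for brevity, not the sequence used by the authors, and the kernel identification step itself is not shown.

```python
def m_sequence(n_bits=4, taps=(0, 1), seed=None):
    """Return one period (2**n_bits - 1 bits) of the binary sequence defined by
    a[t + n_bits] = XOR of a[t + j] for j in taps.

    The default taps give the recurrence for x^4 + x + 1, which is primitive,
    so the period is maximal (15). Other tap sets are the caller's responsibility.
    """
    state = list(seed) if seed is not None else [1] + [0] * (n_bits - 1)
    if len(state) != n_bits or not any(state):
        raise ValueError("seed must have n_bits entries and not be all zeros")
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[0])
        new_bit = 0
        for j in taps:
            new_bit ^= state[j]
        state = state[1:] + [new_bit]
    return out


def periodic_autocorrelation(bits, lag):
    """Periodic autocorrelation of the +/-1 version of a binary sequence."""
    x = [1 - 2 * b for b in bits]             # map 0 -> +1, 1 -> -1
    n = len(x)
    return sum(x[t] * x[(t + lag) % n] for t in range(n))


seq = m_sequence()
```

The near-impulsive autocorrelation (N at lag 0, -1 elsewhere) is the property that makes such sequences useful as test inputs for correlation-based identification.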

Doubly-Selective Channel Estimation for OFDM Systems Using a Pilot-Embedded Training Scheme

  • Wang, Li-Dong;Lim, Dong-Min
    • Journal of Electromagnetic Engineering and Science / v.6 no.4 / pp.203-208 / 2006
  • Channel estimation and data detection for OFDM systems over time- and frequency-selective channels are investigated. Relying on the complex exponential basis expansion channel model, a pilot-embedded channel estimation scheme with low computational complexity and high spectral efficiency is proposed. A periodic pilot sequence is superimposed at low power on the information-bearing sequence at the transmitter before modulation and transmission. The channel state information (CSI) can be estimated using the first-order statistics of the received data. To enhance the performance of channel estimation, we recover the transmitted data, which can then be exploited to estimate the CSI iteratively. Simulation results show that the proposed method is suitable for doubly-selective channel estimation in OFDM systems and that its performance can be better than that of the Wiener filter method under some conditions. Through simulations, we also analyze the factors that can affect system performance.

Patome: Database of Patented Bio-sequences

  • Kim, SeonKyu;Lee, ByungWook
    • Genomics & Informatics / v.3 no.3 / pp.94-97 / 2005
  • We have built a database server called Patome, which contains annotation information for patented bio-sequences from the Korean Intellectual Property Office (KIPO). The aims of Patome are to annotate Korean patent bio-sequences and to provide information on the patent relationships of public database entries. The patent sequences were annotated with the Reference Sequence (RefSeq) or NCBI's nr database. The raw patent data and the annotated data were stored in the database. Annotation information can be used to determine whether a particular RefSeq ID or NCBI nr ID is related to a Korean patent. The Patome infrastructure consists of three components: the database itself, a sequence data loader, and an online database query interface. The database can be queried using submission number, organism, title, applicant name, or accession number. Patome can be accessed at http://www.patome.net. The information will be updated every two months.

Development of Workbench for Analysis and Visualization of Whole Genome Sequence (전유전체(Whole genome) 서열 분석과 가시화를 위한 워크벤치 개발)

  • Choe, Jeong-Hyeon;Jin, Hui-Jeong;Kim, Cheol-Min;Jang, Cheol-Hun;Jo, Hwan-Gyu
    • The KIPS Transactions: Part A / v.9A no.3 / pp.387-398 / 2002
  • As whole genome sequences of many organisms have been revealed by small-scale genome projects, intensive research on individual genes and their functions has been performed. However, in-memory algorithms are inefficient for the analysis of whole genome sequences, since the size of an individual whole genome ranges from several million base pairs to hundreds of billions of base pairs. To manipulate such huge sequence data effectively, it is necessary to use an indexed data structure for external memory. In this paper, we introduce a workbench system for the analysis and visualization of whole genome sequences using a string B-tree, which is suitable for the analysis of huge data. The system consists of two parts: an analysis query part and a visualization part. The query system supports various operations such as sequence search, k-occurrence, and k-mer analysis. The visualization system helps biological scientists easily understand whole-genome structure and specificity through many kinds of visualization, such as whole genome sequence, annotation, CGR (Chaos Game Representation), k-mer, and RWP (Random Walk Plot). Using our workbench, one can find relations among organisms, predict the genes in a genome, and study the function of junk DNA.
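One of the visualizations listed, the Chaos Game Representation, can be sketched in a few lines. This is an in-memory illustration only and does not reflect the string B-tree index the workbench uses; the corner assignment (A, C, G, T on the unit square) is a common convention assumed here, and skipping ambiguous bases is a choice made for this sketch.

```python
# Assumed corner layout of the unit square; other layouts are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR coordinates of a DNA string.

    Starting from the centre of the unit square, each base moves the current
    point halfway toward that base's corner. Characters other than A, C, G, T
    are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


points = cgr_points("AGNT")  # the ambiguous base N is skipped
```

Plotting the points for a long sequence gives the characteristic fractal-like density pattern, in which each sub-square corresponds to a particular k-mer suffix.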

An Efficient Sequence Matching Method for XML Query Processing (XML 질의 처리를 위한 효율적인 시퀀스 매칭 기법)

  • Seo, Dong-Min;Song, Seok-Il;Yoo, Jae-Soo
    • Journal of KIISE: Databases / v.35 no.4 / pp.356-367 / 2008
  • As XML gains widespread adoption as a universal data representation and exchange format, particularly on the World Wide Web, the problem of querying XML documents poses interesting challenges to database researchers. In past years, several structural XML query processing methods, including XISS and XR-tree, have been proposed for fast query processing. However, structural XML query processing incurs expensive join costs for twig path queries. Recently, sequence matching based XML query processing methods, including ViST and PRIX, have been proposed to solve this problem. Sequence matching based methods match structured queries against structured data as a whole, without breaking the queries down into sub-queries of paths or nodes and relying on join operations to combine their results. However, ViST determines structural relationships incorrectly because its numbering scheme is not optimized, and PRIX requires much processing time to match the LPS and NPS of XML data trees and queries. Therefore, in this paper, we propose an efficient sequence matching method using bottom-up query processing for efficient XML query processing. To verify the superiority of our index structure, we compare our sequence matching method with ViST and PRIX in terms of query processing with linear paths or twig paths including wild-cards ('*' and '//').

Genetic Variations of Candida glabrata Clinical Isolates from Korea using Multi-locus Sequence Typing (Multi-locus sequence typing을 이용한 한국에서 분리한 Candida glabrata 임상균주의 유전자 유형 분석)

  • Kang, Min Ji;Lee, Kyung Eun;Jin, Hyunwoo
    • Journal of Life Science / v.30 no.2 / pp.122-128 / 2020
  • Although Candida albicans is the major fungal pathogen of candidemia, severe infections by non-albicans Candida (NAC) spp. have been increasing in recent years. Among NAC spp., C. glabrata has emerged as the second most common pathogen. However, few studies have been conducted to investigate its structure, epidemiology, and basic biology. In the present study, multi-locus sequence typing (MLST) was performed on a total of 102 C. glabrata clinical isolates obtained from various types of clinical specimens. For MLST, six housekeeping genes (FKS, LEU2, NMT1, TRP1, UGP1, and URA3) were amplified and sequenced. The results were analyzed using the C. glabrata database. Out of a total of 3,345 base-pair DNA sequences, 49 variable nucleotide sites were found, and 12 different sequence types (ST) were identified from the 102 clinical isolates. The data also demonstrated that the undetermined ST1 was the most predominant ST in Korea. Further, seven undetermined STs (UST), UST2-8, were classified at specific loci. The data from this study may provide a fundamental database for further studies on C. glabrata, including its epidemiology and evolution. The data may also contribute to the development of novel antifungal agents and diagnostic tests.

VDCluster : A Video Segmentation and Clustering Algorithm for Large Video Sequences (VDCluster : 대용량 비디오 시퀀스를 위한 비디오 세그멘테이션 및 클러스터링 알고리즘)

  • Lee, Seok-Ryong;Lee, Ju-Hong;Kim, Deok-Hwan;Jeong, Jin-Wan
    • Journal of KIISE: Databases / v.29 no.3 / pp.168-179 / 2002
  • In this paper, we investigate video representation techniques that form the foundation for subsequent video processing such as video storage and retrieval. A video data set is a collection of video clips, each of which is a sequence of video frames and is represented by a multidimensional data sequence (MDS). An MDS is partitioned into video segments by considering the temporal relationship among frames, and similar segments of the clip are then grouped into video clusters. Thus, the video clip is represented by a small number of video clusters. The video segmentation and clustering algorithm proposed in this paper, VDCluster, guarantees clustering quality to such an extent that it satisfies predefined conditions. The experiments show that our algorithm performs very effectively with respect to various video data sets.