• Title/Summary/Keyword: Sequences Analysis

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents
    • /
    • v.3 no.2
    • /
    • pp.18-24
    • /
    • 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of hundreds of frequent items or more. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed the efficient mining of sequential patterns. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands the sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. In recent years, the MacosVSpan algorithm was proposed, based on the idea of the prefixSpan algorithm, to significantly reduce its recursive process. However, the algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
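
To make the notion of a maximal frequent contiguous sequence concrete, here is a minimal Python sketch. It is a naive level-wise baseline, not the fixed-length spanning tree method proposed in the paper, and it assumes that support means the number of input sequences containing the substring. The function names and the `min_support` parameter are hypothetical.

```python
from collections import defaultdict


def frequent_contiguous(sequences, length, min_support):
    """Return substrings of the given length found in at least min_support sequences."""
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - length + 1):
            support[seq[start:start + length]].add(idx)
    return {s for s, ids in support.items() if len(ids) >= min_support}


def maximal_frequent_contiguous(sequences, min_support):
    """Collect frequent substrings that have no frequent one-character extension."""
    maximal = set()
    length = 1
    current = frequent_contiguous(sequences, length, min_support)
    while current:
        longer = frequent_contiguous(sequences, length + 1, min_support)
        # Every substring of a frequent string is frequent, so checking
        # extensions by one character (left or right) is sufficient.
        covered = {s[:-1] for s in longer} | {s[1:] for s in longer}
        maximal |= current - covered
        current = longer
        length += 1
    return maximal
```

For example, `maximal_frequent_contiguous(["ACGTAC", "TACGTT", "GACGTA"], 3)` keeps only the substrings shared by all three inputs that cannot be extended without losing support. The sketch rescans the data once per length, which is exactly the cost that becomes prohibitive for long biological sequences.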

Functional Annotation and Analysis of Korean Patented Biological Sequences Using Bioinformatics

  • Lee, Byung Wook;Kim, Tae Hyung;Kim, Seon Kyu;Kim, Sang Soo;Ryu, Gee Chan;Bhak, Jong
    • Molecules and Cells
    • /
    • v.21 no.2
    • /
    • pp.269-275
    • /
    • 2006
  • A recent report of the Korean Intellectual Property Office (KIPO) showed that the number of biological sequence-based patents is rapidly increasing in Korea. We present biological features of Korean patented sequences through bioinformatic analysis. The analysis is divided into two steps. The first is an annotation step in which the patented sequences were annotated with the Reference Sequence (RefSeq) database. The second is an association step in which the patented sequences were linked to genes, diseases, pathways, and biological functions. We used the Entrez Gene, Online Mendelian Inheritance in Man (OMIM), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases. Through the association analysis, we found that nearly 2.6% of human genes were associated with Korean patenting, compared to 20% of human genes in U.S. patents. The association between the biological functions and the patented sequences indicated that genes whose products act as hormones on defense responses in extracellular environments were the most highly targeted for patenting. The analysis data are available at http://www.patome.net

Characterization of earthquake ground motion of multiple sequences

  • Moustafa, Abbas;Takewaki, Izuru
    • Earthquakes and Structures
    • /
    • v.3 no.5
    • /
    • pp.629-647
    • /
    • 2012
  • Multiple acceleration sequences of earthquake ground motions have been observed in many regions of the world. Such ground motions can cause severe damage to structures due to the accumulation of inelastic deformation from the repeated sequences. The dynamic analysis of inelastic structures under repeated acceleration sequences generated from simulated and recorded accelerograms without sequences has recently been studied. However, the characteristics of recorded earthquake ground motions of multiple sequences have not yet been studied. This paper investigates the gross characteristics of earthquake records of multiple sequences from an engineering perspective. The definition of the effective number of acceleration sequences of the ground shaking is introduced. The implication of the acceleration sequences for the structural response and damage of inelastic structures is also studied. A set of sixty accelerograms is used to demonstrate the general properties of repeated acceleration sequences and to investigate the associated structural inelastic response.

A Statistical Analysis of SNPs, In-Dels, and Their Flanking Sequences in Human Genomic Regions

  • Shin, Seung-Wook;Kim, Young-Joo;Kim, Byung-Dong
    • Genomics & Informatics
    • /
    • v.5 no.2
    • /
    • pp.68-76
    • /
    • 2007
  • Due to the increasing interest in SNPs and mutational hot spots for disease traits, it is becoming more important to define and understand the relationship between SNPs and their flanking sequences. To study the effects of flanking sequences on SNPs, statistical approaches are necessary to assess bias in SNP data. In this study we mainly applied Markov chains for SNP sequences, particularly those located in intronic regions, and for analysis of in-del data. All of the pertaining sequences showed a significant tendency to generate particular SNP types. Most sequences flanking SNPs had lower complexities than average sequences, and some of them were associated with microsatellites. Moreover, many Alu repeats were found in the flanking sequences. We observed an elevated frequency of single-base-pair repeat-like sequences, mirror repeats, and palindromes in the SNP flanking sequence data. Alu repeats are hypothesized to be associated with C-to-T transition mutations or A-to-I RNA editing. In particular, the in-del data revealed an association between particular changes such as palindromes or mirror repeats. Results indicate that the mechanism of induction of in-del transitions is probably very different from that which is responsible for other SNPs. From a statistical perspective, frequent DNA lesions in some regions probably have effects on the occurrence of SNPs.

ON THE COMPUTATION OF THE NON-PERIODIC AUTOCORRELATION FUNCTION OF TWO TERNARY SEQUENCES AND ITS RELATED COMPLEXITY ANALYSIS

  • Koukouvinos, Christos;Simos, Dimitris E.
    • Journal of applied mathematics & informatics
    • /
    • v.29 no.3_4
    • /
    • pp.547-562
    • /
    • 2011
  • We establish a new formalism of the non-periodic autocorrelation function (NPAF) of two sequences, which is suitable for the computation of the NPAF of any two sequences. It is shown that this encoding of the NPAF is efficient for sequences of small weight. In particular, the check for two sequences of length $n$ having weight $w$ to have zero NPAF can be decided in $O(n+w^2\log w)$. For $n > w^2\log w$, the complexity is $O(n)$; thus we cannot expect asymptotically faster algorithms.
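
The weight-based idea can be illustrated with a short Python sketch. It is not the paper's formalism: it assumes the usual definition $N_A(s)=\sum_{i=0}^{n-1-s} a_i a_{i+s}$, takes "zero NPAF" to mean $N_A(s)+N_B(s)=0$ for every shift $s \ge 1$, and accumulates contributions in a hash map instead of sorting. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def npaf_sparse(seq):
    """Map each shift s >= 1 to the non-periodic autocorrelation of seq at s.

    Only nonzero entries are visited, so the pair loop costs O(w^2)
    after an O(n) scan, where w is the number of nonzero entries.
    """
    support = [(i, v) for i, v in enumerate(seq) if v != 0]
    acc = defaultdict(int)
    for (i, a), (j, b) in combinations(support, 2):  # pairs with i < j
        acc[j - i] += a * b
    return acc


def have_zero_npaf(a, b):
    """Check whether N_a(s) + N_b(s) == 0 for every shift s >= 1."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    total = npaf_sparse(a)
    for shift, value in npaf_sparse(b).items():
        total[shift] += value
    return all(value == 0 for value in total.values())
```

Shifts that never appear in the map have autocorrelation zero by construction, which is why only the stored values need to be checked. For instance, `have_zero_npaf([1, 0, 1], [1, 0, -1])` compares a ternary pair of weight 2 each without touching the zero entries in the pair loop.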

Categorizing accident sequences in the external radiotherapy for risk analysis

  • Kim, Jonghyun
    • Radiation Oncology Journal
    • /
    • v.31 no.2
    • /
    • pp.88-96
    • /
    • 2013
  • Purpose: This study identifies accident sequences from past accidents in order to support the application of risk analysis to external radiotherapy. Materials and Methods: This study reviews 59 accident cases in two retrospective safety analyses that have extensively collected incidents in external radiotherapy. Two accident analysis reports that accumulated past incidents are investigated to identify accident sequences, including initiating events, failures of safety measures, and consequences. This study classifies the accidents by the treatment stages and sources of errors for initiating events, the types of failures in the safety measures, and the types of undesirable consequences and the number of affected patients. Then, the accident sequences are grouped into several categories on the basis of similarity of progression. As a result, these cases can be categorized into 14 groups of accident sequences. Results: The results indicate that risk analysis needs to pay attention not only to the planning stage, but also to the calibration stage that is carried out prior to the main treatment process. They also show that human error is the largest contributor to initiating events as well as to the failure of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated in the calibration. Conclusion: This study is expected to provide insights into accident sequences for prospective risk analysis through the review of experiences.

Analysis of binary sequences generated by GMW sequences and No sequences (GMW 수열과 No 수열에 의해서 생성된 이진 수열 분석)

  • Cho, Sung-Jin;Yim, Ji-Mi
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.10
    • /
    • pp.2181-2187
    • /
    • 2011
  • In this paper, a family of binary sequences generated by GMW sequences and No sequences is introduced and analyzed. Each sequence within a family has period $N=2^n-1$, $n=2m$, and there are $2^m$ sequences within that family. We obtain the auto- and cross-correlation values and the linear span of the synthesized sequence.

Characterization and Phylogenetic Analysis of Chitin Synthase Genes from the Genera Sporobolomyces and Bensingtonia subrosea

  • Nam, Jin-Sik
    • Korean Journal of Environmental Biology
    • /
    • v.23 no.4
    • /
    • pp.335-342
    • /
    • 2005
  • We cloned seven genes encoding chitin synthases (CHSs) by PCR amplification from genomic DNAs of four strains of the genus Sporobolomyces and of Bensingtonia subrosea using degenerate primers based on conserved regions of the CHS genes. Although the amino acid sequences of these genes were similar in length, at 176 to 189 amino acids except for SgCHS2, the DNA sequences differed in size because of the various introns present in the seven fragments. Alignment and phylogenetic analysis of their deduced amino acid sequences together with the reported CHS genes of basidiomycetes separated the sequences into classes I, II and III. This analysis also permitted the classification of the isolated CHSs: SgCHS1 belongs to class I; BsCHS1, SaCHS1, SgCHS2, SpgCHS1, and SsCHS1 belong to class II; and BsCHS2 belongs to class III. The deduced amino acid sequences belonging to class II, which were found in five strains, were also compared with those of other basidiomycetes using the CLUSTAL X program. The bootstrap analysis and the phylogenetic tree built by the neighbor-joining method revealed the taxonomic and evolutionary positions of the four strains of the genus Sporobolomyces and of Bensingtonia subrosea, which agreed with the previous classification. The results clearly showed that CHS fragments could be used as a valuable key for molecular taxonomic and phylogenetic studies of basidiomycetes.

Analysis of Cross-Correlation of m-sequences and Equation on Finite Fields (유한체상의 방정식과 m-수열의 상호상관관계 분석)

  • Choi, Un-Sook;Cho, Sung-Jin
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.7 no.4
    • /
    • pp.821-826
    • /
    • 2012
  • p-ary sequences of period $N=2^k-1$ are widely used in many areas of engineering and science. Some well-known applications include coding theory, code-division multiple-access (CDMA) communications, and stream cipher systems. The analysis of the cross-correlations of these sequences is a very important problem in research on p-ary sequences. In this paper, we analyze the cross-correlations of p-ary sequences which are associated with the equation $(x+1)^d=x^d+1$ over finite fields.
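
As a small illustration of the equation $(x+1)^d=x^d+1$, the following Python sketch enumerates its solutions by brute force. It is restricted to a prime field GF($p$), which is an assumption made here for simplicity (the abstract does not specify the field, and extension fields would need polynomial arithmetic). The function name is hypothetical.

```python
def solutions_mod_p(d, p):
    """List every x in GF(p), p prime, with (x + 1)^d == x^d + 1 (mod p)."""
    return [
        x for x in range(p)
        if pow(x + 1, d, p) == (pow(x, d, p) + 1) % p
    ]
```

With $d = p$ every element is a solution, since $x^p = x$ in GF($p$) by Fermat's little theorem, whereas for $d = 2$ and odd $p$ the equation reduces to $2x = 0$ and only $x = 0$ remains.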

18S Ribosomal DNA Sequences Provide Insight into the Phylogeny of Patellogastropod Limpets (Mollusca: Gastropoda)

  • Yoon, Sook Hee;Kim, Won
    • Molecules and Cells
    • /
    • v.23 no.1
    • /
    • pp.64-71
    • /
    • 2007
  • To investigate the phylogeny of Patellogastropoda, the complete 18S rDNA sequences of nine patellogastropod limpets Cymbula canescens (Gmelin, 1791), Helcion dunkeri (Krauss, 1848), Patella rustica Linnaeus, 1758, Cellana toreuma (Reeve, 1855), Cellana nigrolineata (Reeve, 1854), Nacella magellanica Gmelin, 1791, Nipponacmea concinna (Lischke, 1870), Niveotectura pallida (Gould, 1859), and Lottia dorsuosa Gould, 1859 were determined. These sequences were then analyzed along with the published 18S rDNA sequences of 35 gastropods, one bivalve, and one chiton species. Phylogenetic trees were constructed by maximum parsimony, maximum likelihood, and Bayesian inference. The results of our 18S rDNA sequence analysis strongly support the monophyly of Patellogastropoda and the existence of three subgroups. Of these, two subgroups, the Patelloidea and Acmaeoidea, are closely related, with branching patterns that can be summarized as [(Cymbula + Helcion) + Patella] and [(Nipponacmea + Lottia) + Niveotectura]. The remaining subgroup, Nacelloidea, emerges as basal and paraphyletic, while its genus Cellana is monophyletic. Our analysis also indicates that the Patellogastropoda have a sister relationship with the order Cocculiniformia within the Gastropoda.