• Title/Summary/Keyword: Sequence database

Search Result 567, Processing Time 0.023 seconds

A Concurrency Control Technique Using Optimistic Atomic Broadcast In Replicated Database Systems (중복 데이터베이스 시스템에서 낙관적인 원자적 방송을 이용한 동시성제어 기법)

  • Choe, Hui-Yeong;Hwang, Bu-Hyeon
    • The KIPS Transactions:PartD / v.8D no.5 / pp.543-552 / 2001
  • To process transactions in a fully replicated database, atomic broadcast is commonly used. With atomic broadcast, however, transactions can be delayed by the coordination step among servers that precedes transaction processing. In this paper, we propose an algorithm to resolve this transaction delay problem. In the proposed algorithm, transactions are processed optimistically. The operations of a transaction are performed at the site where it is submitted, and its updates are applied atomically at all replicated sites. The serializability of transactions is ensured by checking the sequence numbers of transactions in the completion-inspection step.
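The abstract does not spell out the completion-inspection step, so the following is only a minimal sketch of a generic optimistic certification check, assuming each item carries the sequence number of its last committed writer. The class `Replica` and its methods are hypothetical names, not the authors' algorithm.

```python
class Replica:
    """Hypothetical replica that certifies transactions in delivery order."""

    def __init__(self):
        self.values = {}    # item -> committed value
        self.versions = {}  # item -> sequence number of the last committed write
        self.next_seq = 1   # sequence number for the next delivered transaction

    def read(self, item):
        # Return the value together with the version the transaction observed.
        return self.values.get(item), self.versions.get(item, 0)

    def certify_and_apply(self, read_versions, writes):
        # Assumption: every replica calls this in the same (atomic broadcast) order.
        seq = self.next_seq
        self.next_seq += 1
        # Abort if any item read optimistically was overwritten in the meantime.
        for item, seen in read_versions.items():
            if self.versions.get(item, 0) != seen:
                return False
        # Otherwise commit: install the writes and stamp them with this sequence number.
        for item, value in writes.items():
            self.values[item] = value
            self.versions[item] = seq
        return True
```

Because every replica sees the same delivery order and applies the same deterministic check, all replicas reach the same commit or abort decision in this sketch.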


A Review of Extended STR Loci and DNA Database

  • Cho, Yoonjung;Lee, Min Ho;Kim, Su Jin;Park, Ji Hwan;Jung, Ju Yeon
    • Biomedical Science Letters / v.28 no.3 / pp.157-169 / 2022
  • DNA typing is a standard technology in forensic science and plays a significant role in the personal identification of victims and suspects. A short tandem repeat (STR) is a tandemly repeated DNA sequence consisting of 2~7 bp units at a specific locus. STRs are distributed across the human genome and are polymorphic among individuals. Because polymorphism is the key feature that DNA typing relies on, STR analysis has become the standard technology in forensics. The DNA database (DNA-DB) was first introduced with 4 essential STR markers for forensic applications; the number of STR markers was later expanded from 4 to 13 and then from 13 to 20 to cope with the continuously increasing number of DNA profiles and other needs. After expanded STR markers were applied to the South Korean DNA-DB system, they benefited low copy number analysis, which has a high probability of yielding partial DNA profiles, and contributed especially to theft cases, in which touch DNA accounts for a large portion of the evidence. Furthermore, STR marker expansion not only contributed to the resolution of cold cases but also increased the kinship index, indicating the potential for improved kinship test accuracy using extended STR markers. Collectively, the expansion of STR loci was considered necessary to keep pace with the continuously increasing number of DNA profiles and to improve the data integrity of the DNA-DB.

An Efficient Approach for Single-Pass Mining of Web Traversal Sequences (단일 스캔을 통한 웹 방문 패턴의 탐색 기법)

  • Kim, Nak-Min;Jeong, Byeong-Soo;Ahmed, Chowdhury Farhan
    • Journal of KIISE:Databases / v.37 no.5 / pp.221-227 / 2010
  • Web access sequence mining can discover the web pages frequently accessed by users. Utility-based web access sequence mining handles non-binary occurrences of web pages and extracts more useful knowledge from web logs. However, the existing utility-based web access sequence mining approach considers web access sequences from the very beginning of the web logs, so it is not suitable for mining data streams, where the volume of data is huge and unbounded. Nor can it adaptively find recent changes of knowledge in data streams. The existing approach also has other limitations: it considers only forward references of web access sequences, relies on a level-wise candidate generation-and-test methodology, and needs several database scans. In this paper, we propose a new approach for high utility web access sequence mining over data streams with a sliding window method. Our approach can not only handle large-scale data but also efficiently discover recently generated information from data streams. Moreover, it overcomes the other limitations of the existing algorithm. Extensive performance analyses show that our approach is very efficient and outperforms the existing algorithm.
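The sliding-window idea can be illustrated with a much simpler problem than the one the paper solves. The sketch below, with the hypothetical class `SlidingWindowUtility`, only maintains per-page utility totals over the most recent sessions in a single pass; it does not mine sequences and is not the authors' algorithm.

```python
from collections import Counter, deque


class SlidingWindowUtility:
    """Keep per-page utility totals over the most recent `size` sessions."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("window size must be at least 1")
        self.size = size
        self.window = deque()    # sessions currently inside the window
        self.totals = Counter()  # page -> summed utility inside the window

    def add(self, session):
        # A session is a list of (page, utility) pairs, e.g. time spent on a page.
        self.window.append(session)
        for page, utility in session:
            self.totals[page] += utility
        # Evict the oldest session once the window is full and undo its contribution.
        if len(self.window) > self.size:
            for page, utility in self.window.popleft():
                self.totals[page] -= utility
                if self.totals[page] == 0:
                    del self.totals[page]

    def high_utility_pages(self, threshold):
        return {page for page, total in self.totals.items() if total >= threshold}
```

Each session is touched once when it enters the window and once when it leaves, so old data never has to be rescanned.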

Confirming Single Nucleotide Polymorphisms from Expressed Sequence Tag Datasets Derived from Three Cattle cDNA Libraries

  • Lee, Seung-Hwan;Park, Eung-Woo;Cho, Yong-Min;Lee, Ji-Woong;Kim, Hyoung-Yong;Lee, Jun-Heon;Oh, Sung-Jong;Cheong, Il-Cheong;Yoon, Du-Hak
    • BMB Reports / v.39 no.2 / pp.183-188 / 2006
  • Using the Phred/Phrap/Polyphred/Consed pipeline established in the National Livestock Research Institute of Korea, we predicted candidate coding single nucleotide polymorphisms (cSNPs) from 7,600 expressed sequence tags (ESTs) derived from three cDNA libraries (liver, M. longissimus dorsi, and intermuscular fat) of Hanwoo (Korean native cattle) steers. From the 7,600 ESTs, 829 contigs comprising more than two EST reads were assembled using the Phrap assembler. Based on the contig analysis, 201 candidate cSNPs were identified in 129 contigs, in which transitions (69%) outnumbered transversions (31%). To verify whether the predicted cSNPs are real, 17 SNPs involved in lipid and energy metabolism were selected from the ESTs. Twelve of these were confirmed to be real while five were identified as artifacts, possibly due to expressed sequence tag sequence error. Further analysis of the 12 verified cSNPs was performed using the program BLASTX. Five were identified as nonsynonymous cSNPs, five were synonymous cSNPs, and two SNPs were located in 3'-UTRs. Our data indicated that a relatively high SNP prediction rate (71%) from a large EST database could produce abundant cSNPs rapidly, which can be used as valuable genetic markers in cattle.
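The transition/transversion split and the reported 71% rate can be reproduced with a few lines. This is a minimal sketch, not part of the Phred/Phrap/Polyphred/Consed pipeline; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Label a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def validation_rate(confirmed, tested):
    """Fraction of tested candidate SNPs that were confirmed."""
    return confirmed / tested
```

With the figures from the abstract, `validation_rate(12, 17)` is about 0.706, which rounds to the 71% quoted.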

Intron sequence diversity of the Asian cavity-nesting honey bee, Apis cerana (Hymenoptera: Apidae)

  • Wang, Ah Rha;Jeong, Su Yeon;Jeong, Jun Seong;Kim, Seong Ryul;Choi, Yong Soo;Kim, Iksoo
    • International Journal of Industrial Entomology and Biomaterials / v.31 no.2 / pp.62-69 / 2015
  • The Asian cavity-nesting honeybee, Apis cerana (Hymenoptera: Apidae), has been extensively studied for its biogeography and genetic diversity, but the molecules utilized in past studies were mainly ~90 bp long mitochondrial non-coding sequences, located between $tRNA^{Leu}$ and COII. Thus, additional molecular markers may enrich our understanding of the biogeography and genetic diversity of this valuable bee species. In this study, we reviewed the public genome database to find introns of cDNA sequences, with the assumption that these introns may have fewer evolutionary constraints. The six introns selected were subjected to preliminary tests. Thereafter, two introns, titled White gene and MRJP9 gene, were selected. Sequencing of 552 clones from 184 individual bees showed a total of 222 and 141 sequence types in the White gene and MRJP9 gene introns, respectively. The sequence divergence ranged from 0.6% to 7.9% and from 0.26% to 17.6% in the White gene and the MRJP9 introns, respectively, indicating higher sequence divergence in both introns. Analysis of population genetic diversity for 16 populations originating from Korea, China, Vietnam, and Thailand showed that nucleotide diversity (π) ranged from 0.003117 to 0.025837 and from 0.016541 to 0.052468 in the White gene and MRJP9 introns, respectively. The highest π was found in a Vietnamese population for both intron sequences, whereas the nine Korean populations showed moderate to low sequence divergence. Considering the variability and diversity, these intron sequences can be useful as non-mitochondrial DNA-based molecular markers for future studies of population genetics.
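Nucleotide diversity (π) is commonly computed as the average number of differing sites per site over all pairs of sampled sequences. The sketch below assumes equal-length, already aligned sequences and counts every mismatch, gaps included. The abstract does not say which software or gap handling the authors used, and the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise difference per site across aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)
```

For three aligned 4 bp sequences in which one sequence differs from the other two at a single site, the value is 2 / (3 × 4) ≈ 0.167.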

Cloning and Molecular Characterization of Epoxide Hydrolase from Aspergillus niger LK (Apergillus niger LK 유래의 Epoxide Hydrolase 클로닝 및 특성 분석)

  • 이은열;김희숙
    • KSBB Journal / v.16 no.6 / pp.562-567 / 2001
  • Aspergillus niger LK harboring enantioselective epoxide hydrolase (EHase) activity was isolated, and the enantioselectivity of the EHase was tested for various racemic aromatic epoxides. The gene encoding the epoxide hydrolase was cloned from a cDNA library generated by reverse transcriptase-polymerase chain reaction of the isolated total mRNA. Sequence analysis showed that the cloned gene encodes 398 amino acids with a deduced molecular mass of 44.5 kDa. Database comparison of the amino acid sequence reveals that it is similar to fungal EHases, whereas the sequence identity with bacterial EHases is very low. Recombinant expression of the cloned EHase in Escherichia coli BL21 yielded an active EHase, which offers a potential biocatalyst for the production of chiral epoxides.


Discovering Sequence Association Rules for Protein Structure Prediction (단백질 구조 예측을 위한 서열 연관 규칙 탐사)

  • Kim, Jeong-Ja;Lee, Do-Heon;Baek, Yun-Ju
    • The KIPS Transactions:PartD / v.8D no.5 / pp.553-560 / 2001
  • Bioinformatics is a discipline that supports biological experiment projects by storing and managing data arising from genome research. It can also guide experimental design for genome function prediction and regulation. Among the various approaches to genome research, proteomics has been drawing increasing attention because it deals directly with the final products of genomes, i.e., proteins. This paper proposes a data mining technique to predict the structural characteristics of a given protein group, which are among the dominant factors determining its functions. After explaining the associations among amino acid subsequences in the primary structures of proteins, which can provide important clues for determining their secondary or tertiary structures, it defines a sequence association rule to represent the relationships between subsequences. It also provides support and confidence measures, newly designed to evaluate the usefulness of sequence association rules. After proposing a method to discover useful sequence association rules from a given protein group, it evaluates the performance of the proposed method with protein sequence data from the SWISS-PROT protein database.
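The abstract does not define its newly designed support and confidence measures. For orientation only, here is a minimal sketch of the classical association-rule versions applied to subsequences, where a rule X → Y holds for a protein when both X and Y occur as contiguous substrings. The function name and the toy sequences are hypothetical.

```python
def support_confidence(proteins, antecedent, consequent):
    """Classical support and confidence of the rule antecedent -> consequent."""
    # Proteins that contain the antecedent subsequence.
    with_x = [p for p in proteins if antecedent in p]
    # Of those, proteins that also contain the consequent subsequence.
    with_both = [p for p in with_x if consequent in p]
    support = len(with_both) / len(proteins) if proteins else 0.0
    confidence = len(with_both) / len(with_x) if with_x else 0.0
    return support, confidence
```

For the toy group `["MKVLA", "MKTTA", "GGVLA", "MKVLG"]`, the rule `MK → VL` has support 2/4 and confidence 2/3.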


Pattern Similarity Retrieval of Data Sequences for Video Retrieval System (비디오 검색 시스템을 위한 데이터 시퀀스 패턴 유사성 검색)

  • Lee Seok-Lyong
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.347-356 / 2006
  • A video stream can be represented by a sequence of data points in a multidimensional space. In this paper, we introduce a trend vector that approximates the values of data points in a sequence and represents the moving trend of points in the sequence, and present a pattern similarity matching method for data sequences using the trend vector. A sequence is partitioned into multiple segments, each of which is represented by a trend vector. Query processing is based on the comparison of these vectors instead of scanning the data elements of entire sequences. Using the trend vector, our method is designed to filter out irrelevant sequences from a database and to find sequences similar to a query. We have performed extensive experiments on synthetic sequences as well as video streams. Experimental results show that, compared with an existing method, the precision of our method is up to 2.1 times higher and the processing time is reduced by up to 45%.

Efficient Indexing for Large DNA Sequence Databases (대용량 DNA 시퀀스 데이타베이스를 위한 효율적인 인덱싱)

  • Won Jung-Im;Yoon Jee-Hee;Park Sang-Hyun;Kim Sang-Wook
    • Journal of KIISE:Databases / v.31 no.6 / pp.650-663 / 2004
  • In molecular biology, DNA sequence searching is one of the most crucial operations. Since DNA databases contain a huge volume of sequences, a fast indexing mechanism is essential for efficient processing of DNA sequence searches. In this paper, we first identify the problems of the suffix tree in terms of storage overhead, search performance, and integration with DBMSs. Then, we propose a new index structure that solves those problems. The proposed index consists of two parts: the primary part represents the trie as bit strings without any pointers, and the secondary part enables fast access to the leaf nodes of the trie that need to be accessed for post-processing. We also suggest an efficient algorithm based on that index for DNA sequence searching. To verify the superiority of the proposed approach, we conducted a performance evaluation via a series of experiments. The results revealed that the proposed approach, which requires less storage space, achieves a 13- to 29-fold performance improvement over the suffix tree.

Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules (BSML 기반 능동 트리거 규칙을 이용한 염기서열정보관리시스템의 구현)

  • Park Sung Hee;Jung Kwang Su;Ryu Keun Ho
    • Journal of KIISE:Databases / v.32 no.1 / pp.24-42 / 2005
  • Biological data, including genome sequences, are heterogeneous and varied. Although the need for sequence management systems that reflect these biological characteristics has been raised, most current biological databases provide only restricted functions as repositories for biological data. Therefore, this paper describes a management system for nucleotide sequences at the level of biological laboratories. It supports format transformation, editing, storage, and retrieval of nucleotide sequences collected from public databases, and also handles sequences produced by experiments. It uses BSML, which is based on XML, as a common format in order to extract data fields and convert between heterogeneous sequence formats. To manage sequences and their changes, a version management system for the original DNA is required so as to detect the appearance of newly changed sequences and trigger database updates. Our experimental results show that applying active trigger rules to manage sequence changes can automatically store those changes in databases.
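The idea of an active trigger rule that records sequence changes automatically can be sketched with an ordinary SQL trigger. The example below uses Python's built-in `sqlite3` module. The schema, the table names, and the trigger name are hypothetical and are not taken from the paper's BSML-based system.

```python
import sqlite3


def make_db():
    """Create an in-memory database whose trigger logs every sequence change."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sequences (
            accession TEXT PRIMARY KEY,
            residues  TEXT NOT NULL
        );
        CREATE TABLE sequence_history (
            accession    TEXT NOT NULL,
            old_residues TEXT NOT NULL,
            new_residues TEXT NOT NULL
        );
        -- Fires only when the stored residues actually change.
        CREATE TRIGGER log_sequence_change
        AFTER UPDATE OF residues ON sequences
        WHEN OLD.residues <> NEW.residues
        BEGIN
            INSERT INTO sequence_history
            VALUES (OLD.accession, OLD.residues, NEW.residues);
        END;
        """
    )
    return conn
```

After an `UPDATE` that changes `residues`, the old and new versions appear in `sequence_history` without any extra application code. An update that writes the same value leaves the history untouched.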