• Title/Summary/Keyword: DNA Computing

Search Results: 69

Workflow for Building a Draft Genome Assembly using Public-domain Tools: Toxocara canis as a Case Study (개 회충 게놈 응용 사례에서 공개용 분석 툴을 사용한 드래프트 게놈 어셈블리 생성)

  • Won, JungIm;Kong, JinHwa;Huh, Sun;Yoon, JeeHee
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.513-518 / 2014
  • It has become possible for small-scale laboratories to interpret large-scale genomic DNA, thanks to the reduction in sequencing cost brought by the development of next-generation sequencing (NGS). De novo assembly is a method that creates a putative original sequence by reconstructing reads without using a reference sequence. There have been various studies on de novo assembly; however, it is still difficult to obtain the desired results even when using the same assembly procedures and analysis tools suggested in the reported studies. This is mainly because there are no specific guidelines for the assembly procedures or know-how for the use of such analysis tools. In this study, to resolve these problems, we introduce the steps for finding the whole genome of an unknown DNA via NGS technology and de novo assembly, while providing the pros and cons of the various analysis tools used in each step. We used 350 Mbp of Toxocara canis DNA as an application case for a detailed explanation of each step. We also extend our work to the prediction of protein-coding genes and their functions from the draft genome sequence by comparing its homology with reference sequences of other nematodes.
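At its core, the de novo assembly step this workflow describes is a graph problem. As an illustrative sketch only (not the pipeline or parameters used in the paper), a k-mer de Bruijn graph of the kind underlying assemblers such as SOAPdenovo can be built and walked like this; the reads and k value are toy assumptions:

```python
from collections import defaultdict

def build_debruijn(reads, k):
    """Nodes are (k-1)-mers; add an edge for every k-mer seen in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unambiguous(graph, start):
    """Follow edges while there is exactly one successor (a simple contig)."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        node = next(iter(graph[node]))
        if node in seen:               # guard against cycles
            break
        seen.add(node)
        contig += node[-1]
    return contig

# Toy reads sampled from the sequence ATGGCGTGCA
reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]
graph = build_debruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # reconstructs ATGGCGTGCA
```

Real assemblers add error correction, coverage-based pruning, and scaffolding on top of this basic graph traversal.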

Draft Genome of Toxocara canis, a Pathogen Responsible for Visceral Larva Migrans

  • Kong, Jinhwa;Won, Jungim;Yoon, Jeehee;Lee, UnJoo;Kim, Jong-Il;Huh, Sun
    • Parasites, Hosts and Diseases / v.54 no.6 / pp.751-758 / 2016
  • This study aimed to construct a draft genome of the adult female worm Toxocara canis using next-generation sequencing (NGS) and de novo assembly, and to find new genes after annotation using functional genomics tools. Using an NGS machine, we produced DNA read data of T. canis. De novo assembly of the read data was performed using SOAPdenovo. RNA read data were assembled using Trinity. Structural annotation, homology search, functional annotation, classification of protein domains, and KEGG pathway analysis were carried out. In addition, recently developed tools such as MAKER, PASA, Evidence Modeler, and Blast2GO were used. The scaffold DNA was obtained; the N50 was 108,950 bp, and the overall length was 341,776,187 bp. The N50 of the transcriptome was 940 bp, and its length was 53,046,952 bp. The GC content of the entire genome was 39.3%. The total number of genes was 20,178, and the total number of protein sequences was 22,358. Of the 22,358 protein sequences, 4,992 were newly observed in T. canis. The following previously unknown proteins were found: E3 ubiquitin-protein ligase cbl-b and antigen T-cell receptor, zeta chain, for T-cell and B-cell regulation; endoprotease bli-4 for cuticle metabolism; mucin 12Ea and polymorphic mucin variant C6/1/40r2.1 for mucin production; and tropomodulin-family protein and ryanodine receptor calcium release channels for muscle movement. We were able to find new hypothetical polypeptide sequences unique to T. canis, and the findings of this study can serve as a basis for extending our biological understanding of T. canis.
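The N50 statistic reported above (108,950 bp for the scaffolds) summarizes assembly contiguity: it is the length L such that scaffolds of length at least L cover at least half of the total assembly. A minimal sketch, using made-up scaffold lengths rather than the paper's data:

```python
def n50(lengths):
    """N50: length of the scaffold at which the descending cumulative
    sum first reaches half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

def gc_content(seq):
    """Fraction of G/C bases (the paper reports 39.3% genome-wide)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# Hypothetical scaffold lengths in bp
print(n50([100, 200, 300, 400, 500]))        # -> 400
print(gc_content("ATGGCGTGCA"))              # -> 0.6
```

Half of the 1,500 bp toy assembly is reached once the 500 bp and 400 bp scaffolds are counted, so the N50 is 400.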

Next-generation Sequencing for Environmental Biology - Full-fledged Environmental Genomics around the Corner (차세대 유전체 기술과 환경생물학 - 환경유전체학 시대를 맞이하여)

  • Song, Ju Yeon;Kim, Byung Kwon;Kwon, Soon-Kyeong;Kwak, Min-Jung;Kim, Jihyun F.
    • Korean Journal of Environmental Biology / v.30 no.2 / pp.77-89 / 2012
  • With the advent of the genomics era powered by DNA sequencing technologies, life science is being transformed significantly, and biological research and development have accelerated. Environmental biology concerns the relationships among living organisms and their natural environment, which constitute the global biogeochemical cycle. As the sustainability of ecosystems depends on biodiversity, examining the structure and dynamics of the biotic constituents and fully grasping their genetic and metabolic capabilities are pivotal. High-speed, high-throughput next-generation sequencing can be applied to barcoding organisms, whether thriving or endangered, and to decoding whole genome information. Furthermore, the diversity and full gene complement of a microbial community can be elucidated and monitored through metagenomic approaches. With regard to human welfare, microbiomes of various human habitats such as the gut, skin, mouth, stomach, and vagina have been and are being scrutinized. To keep pace with the rapid increase in sequencing capacity, various bioinformatic algorithms and software tools, some of which utilize supercomputers and cloud computing, are being developed for processing and storing massive data sets. Environmental genomics will be a major force in understanding the structure and function of ecosystems in nature, as well as in preserving, remediating, and bioprospecting them.

A Comprehensive Review of Emerging Computational Methods for Gene Identification

  • Yu, Ning;Yu, Zeng;Li, Bing;Gu, Feng;Pan, Yi
    • Journal of Information Processing Systems / v.12 no.1 / pp.1-34 / 2016
  • Gene identification is at the center of genomic studies. Although the first phase of the Encyclopedia of DNA Elements (ENCODE) project has been claimed to be complete, the annotation of functional elements is far from being so. Computational methods for gene identification continue to play important roles in this area and in other relevant issues. So far, a great deal of work has been performed in this area, and a plethora of computational methods and avenues have been developed. Many review papers have summarized these methods and other related work; however, most of them focus on methodologies from a particular aspect or perspective. Different from these existing bodies of research, this paper aims to comprehensively summarize the mainstream computational methods in gene identification and to provide a concise technical reference for future studies. Moreover, this review sheds light on the emerging trends and cutting-edge techniques that are believed to be capable of leading research in this field in the future.
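One of the oldest computational signals used in gene identification is the open reading frame. As a minimal ab initio sketch (far simpler than the methods this review surveys), the following scans the forward strand of a sequence for ORFs; the test sequence and minimum length are hypothetical:

```python
START, STOPS = "ATG", {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of ORFs on the forward strand:
    an ATG followed, in the same frame, by a stop codon."""
    orfs = []
    for frame in range(3):                      # three reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START and start is None:
                start = i                       # open a candidate ORF
            elif codon in STOPS and start is not None:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3)) # include the stop codon
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAGCC"))   # -> [(2, 14)]
```

Real gene finders layer statistical models (codon usage, splice-site signals, HMMs) and homology evidence on top of this kind of frame scan, especially for eukaryotic genes with introns.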

Genetic Risk Prediction for Normal-Karyotype Acute Myeloid Leukemia Using Whole-Exome Sequencing

  • Heo, Seong Gu;Hong, Eun Pyo;Park, Ji Wan
    • Genomics & Informatics / v.11 no.1 / pp.46-51 / 2013
  • Normal-karyotype acute myeloid leukemia (NK-AML) is a highly malignant and cytogenetically heterogeneous hematologic cancer. We searched for somatic mutations in 10 pairs of tumor and normal cells using a highly efficient and reliable analysis workflow for whole-exome sequencing data and performed association tests between NK-AML and the somatic mutations. We identified 21 nonsynonymous single nucleotide variants (SNVs) located in the coding regions of 18 genes. Among them, the SNVs of three leukemia-related genes (MUC4, CNTNAP2, and GNAS) reported in previous studies were replicated in this study. We constructed stepwise genetic risk score (GRS) models composed of the NK-AML-susceptible variants and evaluated the prediction accuracy of each GRS model by computing the area under the receiver operating characteristic curve (AUC). The GRS model composed of five SNVs (rs75156964, rs56213454, rs6604516, rs10888338, and rs2443878) showed 100% prediction accuracy, and the combined effect of the three reported genes was validated in the current study (AUC, 0.98; 95% confidence interval, 0.92 to 1.00). Further study with larger sample sizes is warranted to validate the combined effect of these somatic point mutations, and the discovery of novel markers may provide an opportunity to develop novel diagnostic and therapeutic targets for NK-AML.
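A genetic risk score of the kind used above is essentially a weighted count of risk alleles per subject, and the AUC can be computed from pairwise rank comparisons. A self-contained sketch with toy genotype data; the per-SNV weights and subjects below are hypothetical, not the paper's estimates:

```python
def genetic_risk_score(genotypes, weights):
    """GRS = sum over SNVs of (risk-allele count 0/1/2) * effect weight."""
    return sum(g * w for g, w in zip(genotypes, weights))

def auc(scores, labels):
    """Mann-Whitney AUC: probability a random case outscores a random control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

weights = [0.8, 0.5, 0.3, 0.3, 0.2]           # hypothetical effects for 5 SNVs
cases = [[2, 1, 1, 2, 0], [1, 2, 2, 1, 1]]    # risk-allele counts per case
controls = [[0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
scores = [genetic_risk_score(g, weights) for g in cases + controls]
labels = [1, 1, 0, 0]
print(auc(scores, labels))                    # -> 1.0 on this toy data
```

An AUC of 1.0 on four subjects illustrates why the paper's call for larger samples matters: perfect separation on a small cohort is easy to overfit.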

Design and Implementation of Firmware for Low-cost Small PCR Devices (저가의 소형 PCR 장치를 위한 펌웨어 설계 및 구현)

  • Lee, Wan Yeon;Kim, Jong Dae
    • Journal of the Korea Society of Computer and Information / v.18 no.6 / pp.1-8 / 2013
  • In this paper, we design and implement firmware for low-cost small PCR devices. To minimize machine-code size, the proposed firmware controls real-time tasks simultaneously using only hardware interrupts, without the support of an operating system. The firmware has a host-local structure in which it receives operation commands from a PC and sends operation results back to the PC through USB communication. We implement a low-cost small PCR device with the proposed firmware loaded on a Microchip PIC18F4550 chip, and verify that the implemented device significantly reduces the cost and size of existing commercial PCR devices while providing similar performance.
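The firmware's central real-time task is stepping through the PCR temperature profile on timer interrupts. A hypothetical host-side sketch of such a cycle schedule; the temperatures, hold times, and cycle count are typical textbook values, not taken from the paper:

```python
# A standard 3-step PCR profile: (step name, target temperature °C, hold seconds)
PROFILE = [("denaturation", 95, 30), ("annealing", 55, 30), ("extension", 72, 60)]

def cycle_schedule(profile, cycles):
    """Expand the per-cycle profile into the full ordered step list,
    the sequence a firmware state machine would walk through."""
    return [(c, name, temp, hold)
            for c in range(1, cycles + 1)
            for name, temp, hold in profile]

steps = cycle_schedule(PROFILE, cycles=30)
print(len(steps))          # 3 steps x 30 cycles = 90
print(steps[0])            # (1, 'denaturation', 95, 30)
total_hold = sum(hold for *_, hold in steps)
print(total_hold / 60)     # total hold time in minutes: 60.0
```

On the actual device this schedule would drive a heater/fan control loop; ramp times between temperatures, ignored here, dominate much of the real run time.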

A Position-Based Block Similarity Computing Method for Similar Transcript Model Search (유사 전사체 모델 탐색을 위한 위치 기반 블록 간의 유사도 비교 기법)

  • Kim, Sora;Park, TaeWon;Hwang, HyeRyeon;Cho, Hwan-Gue
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.1326-1329 / 2012
  • A transcript is the sequence code transcribed from a gene's DNA. Since the form of the protein produced depends on the expressed form of the transcript, the shape of the transcript model carries important meaning; if the transcript model at a particular position deviates from the normal form, severe cases can lead to genetic disease. Transcript models for an experimental sample can currently be obtained using established tools such as SpliceGrapher and Cufflinks. However, no methodology is currently known for comparing the similarity between the results of such tools, or between annotation information and those results. Instead, transcript models are compared visually, one by one, or by simple arithmetic on transcript positions. In this paper, we propose a methodology for comparing the similarity between transcript models, and we analyze the results of measuring, with the proposed method, the similarity between the Homo sapiens grch37 annotation file and the SRR387514 experimental data.
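The position-based comparison proposed here can be illustrated with a Jaccard-style score over exon blocks: two transcript models are similar to the degree that the genomic positions their blocks cover overlap. A simplified sketch, not the paper's exact method, with hypothetical intervals:

```python
def covered_positions(blocks):
    """Set of genomic positions covered by a transcript's exon blocks."""
    positions = set()
    for start, end in blocks:          # half-open intervals [start, end)
        positions.update(range(start, end))
    return positions

def block_similarity(model_a, model_b):
    """Jaccard similarity of the position sets covered by two models."""
    a, b = covered_positions(model_a), covered_positions(model_b)
    return len(a & b) / len(a | b)

annotation = [(100, 200), (300, 400)]  # exon blocks from an annotation
prediction = [(100, 200), (320, 400)]  # blocks assembled from read data
print(block_similarity(annotation, prediction))   # -> 0.9
```

The predicted model misses 20 of the 200 annotated positions, giving 180/200 = 0.9; a score like this replaces the eyeball comparison the abstract criticizes.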

A Study on the Architectural Application of Biological Patterns (생물학적 패턴의 건축적 적용에 관한 연구)

  • Kim, Won Gaff
    • Korean Institute of Interior Design Journal / v.21 no.2 / pp.35-45 / 2012
  • The development of digital media has shifted the architectural paradigm from tectonics to surface and pattern. This means a transition to a new kind of materiality and the resurrection of ornament. This study began with the aim of applying biological patterns to architectural design from this new perception of pattern. Architectural patterns in earlier eras appeared as ladders, steps, chains, trees, and vortices. Since the 21st century, however, we find patterns drawn from nature: atoms and molecular structures, fluid forms from dynamics, new geometrical patterns such as fractals, and above all biological patterns such as viruses and micro-organisms, Voronoi cells, DNA structure, rhizomes, and various hybrids and permutations of these. Pattern has become one of the most important elements and themes of contemporary architecture through the change of materiality and the resurrection of ornament, along with the new perception of surface in architecture. Among the patterns that offer new creative possibilities to architectural design is the biological pattern, which is self-organized into an optimum form through interaction with the environment. Biological patterns emerge mostly as self-replicating patterns through morphogenesis, often as particular geometrical patterns (triangles, pentagons, hexagons, and spirals). The architectural application methods of biological patterns include direct figural patterns of organisms, circle patterns, polygon patterns, energy-material control patterns, differentiation patterns, parametric patterns, growth-principle patterns, and evolutionary-ecological patterns. These patterns can be utilized as practical architectural patterns through morphogenetic programs such as the L-system and the MoSS program, and genetic algorithm programs such as Grasshopper and Generative Components, with the help of computing techniques like mapping and scripting.
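The L-systems mentioned above as morphogenetic programs generate growth patterns by parallel string rewriting. A minimal sketch of Lindenmayer's classic algae system; in architectural tools the resulting strings are interpreted as drawing or growth instructions:

```python
def lsystem(axiom, rules, generations):
    """Rewrite every symbol in parallel, once per generation."""
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Lindenmayer's algae system: A -> AB, B -> A
rules = {"A": "AB", "B": "A"}
for n in range(5):
    print(lsystem("A", rules, n))
# A, AB, ABA, ABAAB, ABAABABA — string lengths follow the Fibonacci numbers
```

Richer alphabets with bracketed branching symbols, plus a turtle-graphics interpreter, turn the same rewriting engine into the plant-like geometries used in morphogenetic design.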


Analysis and Evaluation of Frequent Pattern Mining Technique based on Landmark Window (랜드마크 윈도우 기반의 빈발 패턴 마이닝 기법의 분석 및 성능평가)

  • Pyun, Gwangbum;Yun, Unil
    • Journal of Internet Computing and Services / v.15 no.3 / pp.101-107 / 2014
  • With the development of online services, recent databases have shifted from static structures to dynamic stream structures. Previous data mining techniques have served as decision-making tools for tasks such as establishing marketing strategies and DNA analysis. However, the ability to analyze real-time data more quickly is necessary in areas of recent interest such as sensor networks, robotics, and artificial intelligence. Landmark window-based frequent pattern mining, one of the stream mining approaches, performs mining operations on parts of a database, or on each transaction, instead of on all the data. In this paper, we analyze and evaluate two well-known landmark window-based frequent pattern mining algorithms, Lossy counting and hMiner. When Lossy counting mines frequent patterns from a set of new transactions, it performs union operations between the previous and current mining results. hMiner, a state-of-the-art algorithm based on the landmark window model, conducts mining operations whenever a new transaction occurs. Since hMiner extracts frequent patterns as soon as a new transaction is entered, we can obtain the latest mining results reflecting real-time information; for this reason, such algorithms are also called online mining approaches. We evaluate and compare the performance of the primitive algorithm, Lossy counting, and the latest one, hMiner. As the criteria of our performance analysis, we first consider each algorithm's total runtime and average processing time per transaction. In addition, to compare the efficiency of their storage structures, their maximum memory usage is evaluated. Lastly, we show how stably the two algorithms perform on databases featuring gradually increasing numbers of items. With respect to mining time and transaction processing, hMiner is faster than Lossy counting: since hMiner stores candidate frequent patterns in a hash structure, it can access them directly, whereas Lossy counting stores them in a lattice and must traverse multiple nodes to reach a candidate pattern. On the other hand, hMiner shows worse maximum memory usage than Lossy counting. hMiner must store the full information for each candidate pattern in its hash buckets, while Lossy counting reduces this information through the lattice, whose storage can share items concurrently included in multiple patterns, making its memory usage more efficient. However, hMiner is more efficient than Lossy counting in the scalability evaluation, for the following reasons: as the number of items increases, the number of shared items decreases, weakening Lossy counting's memory efficiency; furthermore, as the number of transactions grows, its pruning effect worsens. From the experimental results, we conclude that landmark window-based frequent pattern mining algorithms are suitable for real-time systems, although they require a significant amount of memory. Hence, their data structures need to be made more efficient so that they can also be used in resource-constrained environments such as wireless sensor networks (WSNs).
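The Lossy counting algorithm compared above maintains approximate item counts within a user-chosen error bound ε, pruning low-count entries at each bucket boundary. A minimal single-item sketch (the full algorithm, and the paper's pattern-level variant, track itemsets rather than single items); the stream and parameters are toy assumptions:

```python
import math

class LossyCounter:
    """Lossy counting over a stream: each stored count undercounts the
    true frequency by at most epsilon * n items."""
    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)   # bucket width
        self.n = 0                            # items seen so far
        self.entries = {}                     # item -> [count, max undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:          # bucket boundary: prune
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def frequent(self, support):
        """Items whose stored count clears (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k for k, (count, _) in self.entries.items()
                if count >= threshold}

stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2 + ["a"] * 4
lc = LossyCounter(epsilon=0.1)
for item in stream:
    lc.add(item)
print(lc.frequent(support=0.3))   # -> {'a', 'b'}
```

The rare items c–j are pruned at bucket boundaries, which is exactly the memory saving (and the per-boundary pruning cost) the comparison with hMiner turns on.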