• Title/Summary/Keyword: contig assembly

A Base-Calling Error Detection Program for Use in Microbial Genome Projects (미생물 유전체 프로젝트 수행을 위한 Base-Calling 오류 감지 프로그램 및 알고리즘 개발)

  • Lee, Dae-Sang;Park, Kie-Jung
    • Korean Journal of Microbiology / v.43 no.4 / pp.317-320 / 2007
  • In this paper, we developed a base-calling error detection program and algorithm that list the genes or sequences suspected of containing base-calling errors. The program detects dubious bases from several angles during a microbial genome project. The first module detects base-calling errors in the Phrap file by using contig assembly information. The second module analyzes whether a frameshift mutation originates from a real mutation or from an artifact. Finally, when annotation information for a control microbial genome is available, the third module extracts and shows the list of candidate base-calling errors by comparative genome analysis.

Development of Contig Assembly Program for Nucleotide Sequencing (염기서열 해독작업을 위한 핵산 단편 조립 프로그램의 개발)

  • 이동훈
    • Korean Journal of Microbiology / v.35 no.2 / pp.121-127 / 1999
  • An effective computer program for assembling fragments in DNA sequencing has been developed. The program, called SeqEditor (Sequence Editor), runs on personal computer systems with MS-Windows, which is the most popular operating system in Korea. It can read several sequence file formats such as GenBank, FASTA, and ASCII. In the SeqEditor program, a dynamic programming algorithm is applied to compute the maximal-scoring overlapping alignment between each pair of fragments. A novel feature of the program is that SeqEditor implements interactive operation with a graphical user interface. The performance tests of the program on fragment data from 16S and 18S rDNA sequencing projects produced satisfactory results. This program may be useful to people working on large-scale DNA sequencing projects.
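
The overlap step described in this abstract can be illustrated with a small suffix-prefix alignment. This is a minimal sketch, not SeqEditor's actual code: the function name `best_overlap` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration only.

```python
def best_overlap(a, b, match=1, mismatch=-1, gap=-2):
    """Best-scoring alignment of a suffix of `a` with a prefix of `b`.

    Returns (score, j), where j is the length of the prefix of `b`
    used by the alignment. Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # Row 0: an empty suffix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # cur[0] = 0: the alignment may start at any position in `a` for free.
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    # The alignment must reach the end of `a`; it may stop anywhere in `b`.
    best_j = max(range(m + 1), key=lambda j: prev[j])
    return prev[best_j], best_j


score, length = best_overlap("ACGTTGCA", "TGCAGGT")
print(score, length)
```

Keeping only two rows of the table bounds the memory to the length of `b`, which is sufficient when only the score and overlap length are needed rather than the full alignment.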

K-mer Based RNA-seq Read Distribution Method For Accelerating De Novo Transcriptome Assembly

  • Kwon, Hwijun;Jung, Inuk
    • Journal of the Korea Society of Computer and Information / v.25 no.8 / pp.1-8 / 2020
  • In this paper, we propose a gene family based RNA-seq read distribution method to reduce the overall computation time of transcriptome assembly. To measure the performance of our transcriptome sequence data distribution method, we tested four types of data sets from the Arabidopsis thaliana genome: Whole Unclassified Reads (WUR), Family-Classified Reads, Model-Classified Reads, and Randomly Classified Reads. When de novo transcript assembly was run on distributed nodes using the model-classified data, the generated gene contigs matched 95% of the contigs generated from WUR, and the execution time was reduced by a factor of 4.2 compared to a single-node environment using the same resources.
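
The idea of distributing reads by gene family can be sketched with a simple shared k-mer count. The abstract does not give the classification rule, so everything here is an assumption for illustration: the function names `kmers` and `assign_reads`, the choice of `k`, the toy sequences, and the rule of assigning each read to the family sharing the most k-mers with it.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(reads, family_refs, k=4):
    """Bin reads by the (hypothetical) family whose k-mer set they hit most.

    family_refs maps a family name to a list of reference sequences.
    Reads sharing no k-mer with any family go to "unclassified".
    """
    index = {
        name: set().union(*(kmers(s, k) for s in seqs))
        for name, seqs in family_refs.items()
    }
    bins = {name: [] for name in family_refs}
    bins["unclassified"] = []
    for read in reads:
        read_kmers = kmers(read, k)
        best, best_hits = "unclassified", 0
        for name, family_kmers in index.items():
            hits = len(read_kmers & family_kmers)
            if hits > best_hits:
                best, best_hits = name, hits
        bins[best].append(read)
    return bins


refs = {"famA": ["ACGTACGTAC"], "famB": ["TTTTGGGGCC"]}
bins = assign_reads(["CGTACG", "TTGGGG", "AAAAAA"], refs, k=4)
print(bins)
```

Each resulting bin could then be handed to a separate assembly node, which is the kind of parallelism the abstract reports a speedup for.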

Development of an X-window Program, XFAP, for Assembling Contigs from DNA Fragment Data (DNA 염기 서열로부터 contig 구성을 위한 프로그램 XFAP의 개발)

  • Lee, Byung-Uk;Park, Kie-Jung;Kim, Seung-Moak
    • Korean Journal of Microbiology / v.34 no.1_2 / pp.58-63 / 1998
  • The fragment assembly problem is to reconstruct DNA sequence contigs from a collection of fragment sequences. We have developed an efficient X-window program, XFAP, for assembling DNA fragments. In XFAP, the dimer frequency comparison method is used to quickly eliminate pairs of fragments that cannot overlap. This method takes advantage of the difference in dimer frequencies within the minimum acceptable overlap length in each fragment pair. The Hirschberg algorithm is applied to compute the maximal-scoring overlapping alignment in linear space. The performance of XFAP was tested on a set of DNA fragment sequences extracted from long GenBank DNA sequences by a fragmentation program, and it showed a great improvement in execution time, especially as the number of fragments increases.
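
A dimer-frequency prefilter of the kind described above might look like the following. This is a sketch under stated assumptions, not XFAP's actual criterion: the function names, the sliding-window comparison, and the threshold of `4 * max_mismatches` (one substitution changes at most two dimers, so the count difference moves by at most 4) are all illustrative choices, and the bound covers substitutions only, not insertions or deletions.

```python
from collections import Counter


def dimer_profile(seq):
    """Counts of overlapping dinucleotides in seq."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def profile_distance(p, q):
    """Sum of absolute count differences over all dimers."""
    return sum(abs(p[d] - q[d]) for d in p.keys() | q.keys())


def may_overlap(a, b, min_overlap, max_mismatches=0):
    """Cheap necessary test: can the end of `a` overlap the start of `b`?

    Compares the dimer profile of the last `min_overlap` bases of `a`
    with every window of that length in `b`. Returns False only when no
    window is close enough, so the expensive alignment can be skipped.
    """
    if len(a) < min_overlap or len(b) < min_overlap:
        return False
    tail = dimer_profile(a[-min_overlap:])
    limit = 4 * max_mismatches  # assumed bound for substitution errors
    for start in range(len(b) - min_overlap + 1):
        window = dimer_profile(b[start:start + min_overlap])
        if profile_distance(tail, window) <= limit:
            return True
    return False


print(may_overlap("GGGGGGACGTACGT", "ACGTACGTTTTT", min_overlap=8))
print(may_overlap("AAAAAAAAAA", "CCCCCCCCCC", min_overlap=5, max_mismatches=1))
```

Pairs that pass the filter would still need a full overlap alignment; the filter only removes pairs that cannot overlap under the assumed error model.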

Status of Philippine Mango Genomics: Enriching Molecular Genomics Towards a Globally Competitive Philippine Mango Industry

  • Eureka Teresa M. Ocampo;Cris Q. Cortaga;Jhun Laurence S. Rasco;John Albert P. Lachica;Darlon V. Lantican
    • Proceedings of the Korean Society of Crop Science Conference / 2022.10a / pp.28-28 / 2022
  • This paper presents the first genome assemblies of Philippine mangoes, which provide a valuable reference for varietal improvement and genomic studies on mango and related fruit crops. We sequenced whole genomes of three species: Mangifera odorata (Huani), Mangifera altissima (Paho), and Mangifera indica 'Carabao' (Sweet Elena). 'Carabao' is the major export variety of the Philippines; Paho is identified as vulnerable by the IUCN Red List of Threatened Species; Huani has acrid fruit sap, which is its primary defense mechanism against insects and birds. We used Falcon, a diploid-aware de novo assembler, to assemble SMRT-generated long-read sequences. Falcon-unzip was employed to phase the output assembly, producing larger contig sets (primary contigs) and shorter contigs corresponding to haplotypes (haplotigs). Assembly statistics were generated by comparing the assembly to a reference genome, 'Tommy Atkins', using the Quality Assessment Tool (QUAST). Moreover, the extent of duplication and the completeness of gene content were measured using Benchmarking Universal Single-Copy Orthologs (BUSCO). Draft assemblies with high duplication were processed using Purge Haplotigs and Purge Dups to reduce duplication with minimal impact on genome completeness. De novo assemblies of Huani, Paho and 'Carabao' were then generated with primary contig sizes of 463.64 Mb, 508.95 Mb and 401.51 Mb, respectively. These draft assemblies of Huani, Paho and 'Carabao' showed 96.90%, 95.17% and 99.07% complete BUSCOs, respectively, which is comparable to the 'Tommy Atkins' genome (98.6%). Using two mango transcriptome data sets (pooled RNA-seq from different mango varieties and tissues), 91-96% of reads, or 24-30 million reads, were successfully mapped back to each generated assembly, indicating a high degree of completeness. The results demonstrate highly contiguous, phased, and near-complete genome assemblies of three Philippine mango species for structural and functional annotation of gene units, especially those of economic importance.

A data management system for microbial genome projects

  • Ki-Bong Kim;Hyeweon Nam;Hwajung Seo and Kiejung Park
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.83-85 / 2000
  • Many microbial genome sequencing projects are under way in genome centers around the world, since the first genome, Haemophilus influenzae, was sequenced in 1995. The deluge of microbial genome sequence data demands a new and highly automated data flow system so that genome researchers can manage and analyze their own bulky sequence data from low level to high level. To this end, we developed an automatic data management system for microbial genome projects, which consists mainly of a local database, analysis programs, and a user-friendly interface. We designed and implemented the local database for large-scale sequencing projects; it makes systematic and consistent data management and retrieval possible and is tightly coupled with the analysis programs and the web-based user interface. That is, the results of the analysis programs can be parsed and stored in the local database, and users can retrieve the data at any level of data processing through the web-based graphical user interface. Contig assembly, homology search, and ORF prediction, which are essential in genome projects, make up the analysis programs in our system. All but the contig assembly program are in the public domain. These programs are connected with each other by a number of utility programs. As a result, this system will maximize cost and time efficiency in genome research.

Identification and Characterization of Polymorphic Microsatellite Loci using Next Generation Sequencing in Quercus variabilis (차세대 염기서열 분석을 이용한 굴참나무(Quercus variabilis)의 microsatellite 마커 개발 및 특성 분석)

  • Baek, Seung-Hoon;Lee, Jei-Wan;Hong, Kyung-Nak;Lee, Seok-Woo;Ahn, Ji-Young;Lee, Min-Woo
    • Journal of Korean Society of Forest Science / v.105 no.2 / pp.186-192 / 2016
  • This study was conducted to develop microsatellite markers in Quercus variabilis using next generation sequencing. A total of 305,771 reads (384 bp on average) were generated on a Roche GS-FLX system, yielding 117 Mbp of sequences. The de novo assembly resulted in 7,346 contigs. Of the 2,921 contigs longer than 500 bp, 606 contigs (20.75%) contained a total of 911 microsatellite loci. A total of 180 primer sets were designed from the 911 microsatellite loci and screened in eight Q. variabilis individual trees sampled from a natural stand to obtain polymorphic loci. As a result, thirteen polymorphic microsatellite loci were selected and used for estimating population genetic parameters in 54 individual trees. The mean number of effective alleles was 4.996, ranging from 2.439 to 7.515. The observed heterozygosity ranged from 0.731 to 1.000 with an average of 0.873, and the expected heterozygosity ranged from 0.590 to 0.867 with an average of 0.766. Null alleles were not detected at any locus. No significant linkage disequilibrium was detected at any locus after Bonferroni correction. In the near future, these novel polymorphic microsatellite markers will be used to study the population and conservation genetics of Q. variabilis in Korea in more detail.
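
For readers unfamiliar with the population genetic parameters reported here, the following sketch computes them for one locus using the textbook definitions: observed heterozygosity as the fraction of heterozygous individuals, expected heterozygosity as 1 - Σp², and the effective number of alleles as 1 / Σp². The function name `locus_stats` and the toy genotypes are assumptions; the study may have used a different (for example, sample-size corrected) estimator.

```python
from collections import Counter


def locus_stats(genotypes):
    """Basic diversity statistics for one diploid locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    Uses uncorrected textbook formulas (an assumption, see lead-in).
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n  # two alleles per diploid individual
    sum_p2 = sum((c / total) ** 2 for c in counts.values())
    return {
        "Ho": sum(1 for a, b in genotypes if a != b) / n,  # observed heterozygosity
        "He": 1.0 - sum_p2,                                # expected heterozygosity
        "Ne": 1.0 / sum_p2,                                # effective number of alleles
    }


stats = locus_stats([("A", "B"), ("A", "A"), ("B", "B"), ("A", "B")])
print(stats)
```

With two equally frequent alleles, as in the toy data, the effective number of alleles equals the actual number of alleles; unequal frequencies pull it below that count.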

Analysis of Genes Expressed during Pepper-Phytophthora capsici Interaction using EST Technology (EST기법을 이용한 고추와 고추역병균간의 상호작용에서 발현되는 유전자들의 분석)

  • Kim, Dongyoung;Lee, Jong-Hwan;Choi, Woobong
    • Journal of Life Science / v.24 no.11 / pp.1187-1192 / 2014
  • Pepper, consumed as a typical spice food around the world, is mainly cultivated in warm countries, including Korea, China, and Mexico. Phytophthora capsici is a pathogen of several economically important crops, including pepper. The oomycete attacks the roots, stems, leaves, and fruit of the host plants. To understand the molecular mechanisms underlying development of the disease, the genes expressed during the pepper-P. capsici interaction were explored by analyzing expressed sequence tags (ESTs). A cDNA library was constructed from total RNA extracted from pepper leaves challenged with P. capsici for three days, resulting in an early stage of symptom development for comparable interaction. Single-pass sequencing of 5,760 randomly selected cDNA clones yielded 5,148 high-quality entries for contig assembly, which generated 2,990 unigenes. A homology search of the unigenes with BLASTX resulted in 2,409 matches, of which 606 showed classified functional catalogs.

Fragment Combination From DNA Sequence Data Using Fuzzy Reasoning Method (퍼지 추론기법을 이용한 DNA 염기 서열의 단편결합)

  • Kim, Kwang-Baek;Park, Hyun-Jung
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.12 / pp.2329-2334 / 2006
  • In this paper, we propose a method that addresses failures in combining DNA fragments, a defect of conventional contig assembly programs. In the proposed method, very long DNA sequence data are made into prototype fragments of about 700 bases, the length that an automatic sequence analyzer can process at one time, and the matching ratio is then calculated by comparing a standard prototype with 3 fragmented clones of about 700 bases generated by the PCR method. In this process, the time needed to calculate the matching ratio is reduced by the Compute Agreement algorithm. Two candidate combined fragments are extracted for every prototype according to the degree of overlap of the calculated fragment pairs, and the degree of combination is then decided using a fuzzy reasoning method that utilizes the matching ratio of each extracted fragment, the A, C, G, T membership degrees of each DNA sequence, and the previous frequencies of each of A, C, G, T. DNA sequence combination is completed by iterating the process of combining the selected optimal test fragments until no fragment remains. For the experiments, fragments of about 700 bases were generated from sequences of 10,000 bases and 100,000 bases extracted from 'PCC6803', complete protein genome. In experiments applying random notations to these fragments, the proposed method was faster than the FAP program, and combination failure, a defect of conventional contig assembly programs, did not occur.