• 제목/요약/키워드: reference genome

검색결과 190건 처리시간 0.023초

Novel Discovery of LINE-1 in a Korean Individual by a Target Enrichment Method

  • Shin, Wonseok;Mun, Seyoung;Kim, Junse;Lee, Wooseok;Park, Dong-Guk;Choi, Seungkyu;Lee, Tae Yoon;Cha, Seunghee;Han, Kyudong
    • Molecules and Cells
    • /
    • 제42권1호
    • /
    • pp.87-95
    • /
    • 2019
  • Long interspersed element-1 (LINE-1 or L1) is an autonomous retrotransposon, which is capable of inserting into a new region of genome. Previous studies have reported that these elements lead to genomic variations and altered functions by affecting gene expression and genetic networks. Mounting evidence strongly indicates that genetic diseases or various cancers can occur as a result of retrotransposition events that involve L1s. Therefore, the development of methodologies to study the structural variations and interpersonal insertion polymorphisms by L1 element-associated changes in an individual genome is invaluable. In this study, we applied a systematic approach to identify human-specific L1s (i.e., L1Hs) through the bioinformatics analysis of high-throughput next-generation sequencing data. We identified 525 candidates that could be inferred to carry non-reference L1Hs in a Korean individual genome (KPGP9). Among them, we randomly selected 40 candidates and validated that approximately 92.5% of non-reference L1Hs were inserted into a KPGP9 genome. In addition, unlike conventional methods, our relatively simple and expedited approach was highly reproducible in confirming the L1 insertions. Taken together, our findings strongly support that the identification of non-reference L1Hs by our novel target enrichment method demonstrates its future application to genomic variation studies on the risk of cancer and genetic disorders.

Draft Genome Sequence of the Reference Strain of the Korean Medicinal Mushroom Wolfiporia cocos KMCC03342

  • Bogun Kim;Byoungnam Min;Jae-Gu Han;Hongjae Park;Seungwoo Baek;Subin Jeong;In-Geol Choi
    • Mycobiology
    • /
    • 제50권4호
    • /
    • pp.254-257
    • /
    • 2022
  • Wolfiporia cocos is a wood-decay brown rot fungus belonging to the family Polyporaceae. While the fungus grows, the sclerotium body of the strain, dubbed Bokryeong in Korean, is formed around the roots of conifer trees. The dried sclerotium has been widely used as a key component of many medicinal recipes in East Asia. Wolfiporia cocos strain KMCC03342 is the reference strain registered and maintained by the Korea Seed and Variety Service for commercial uses. Here, we present the first draft genome sequence of W. cocos KMCC03342 using a hybrid assembly technique combining both short- and long-read sequences. The genome has a total length of 55.5 Mb comprised of 343 contigs with N50 of 332 kb and 95.8% BUSCO completeness. The GC ratio was 52.2%. We predicted 14,296 protein-coding gene models based on ab initio gene prediction and evidence-based annotation procedure using RNAseq data. The annotated genome was predicted to have 19 terpene biosynthesis gene clusters, which was the same number as the previously sequenced W. cocos strain MD-104 genome but higher than Chinese W. cocos strains. The genome sequence and the predicted gene clusters allow us to study biosynthetic pathways for the active ingredients of W. cocos.

In Silico Identification of 6-Phosphogluconolactonase Genes that are Frequently Missing from Completely Sequenced Bacterial Genomes

  • Jeong, Hae-Young;F. Kim, Ji-Hyun;Park, Hong-Seog
    • Genomics & Informatics
    • /
    • 제4권4호
    • /
    • pp.182-187
    • /
    • 2006
  • 6-Phosphogluconolactonase (6PGL) is one of the key enzymes in the ubiquitous pathways of central carbon metabolism, but bacterial 6PGL had been long known as a missing enzyme even after complete bacterial genome sequence information became available. Although recent experimental characterization suggests that there are two types of 6PGLs (DevB and YbhE), their phylogenetic distribution is severely biased. Here we present that proteins in COG group previously described as 3-oarboxymuconate cyclase (COG2706) are actually the YbhE-type 6PGLs, which are widely distributed in Proteobacteria and Fimicutes. This case exemplifies how erroneous functional description of a member in the reference database commonly used in transitive genome annotation cause systematic problem in the prediction of genes even with universal cellular functions.

MergeReference: A Tool for Merging Reference Panels for HLA Imputation

  • Cook, Seungho;Han, Buhm
    • Genomics & Informatics
    • /
    • 제15권3호
    • /
    • pp.108-111
    • /
    • 2017
  • Recently developed computational methods allow the imputation of human leukocyte antigen (HLA) genes using intergenic single nucleotide polymorphism markers. To improve the imputation accuracy in HLA imputation, it is essential to increase the sample size and the diversity of alleles in the reference panel. Our software, MergeReference, helps achieve this goal by providing a streamlined pipeline for combining multiple reference panels into one.

Genome-Wide SNP Calling Using Next Generation Sequencing Data in Tomato

  • Kim, Ji-Eun;Oh, Sang-Keun;Lee, Jeong-Hee;Lee, Bo-Mi;Jo, Sung-Hwan
    • Molecules and Cells
    • /
    • 제37권1호
    • /
    • pp.36-42
    • /
    • 2014
  • The tomato (Solanum lycopersicum L.) is a model plant for genome research in Solanaceae, as well as for studying crop breeding. Genome-wide single nucleotide polymorphisms (SNPs) are a valuable resource in genetic research and breeding. However, to do discovery of genome-wide SNPs, most methods require expensive high-depth sequencing. Here, we describe a method for SNP calling using a modified version of SAMtools that improved its sensitivity. We analyzed 90 Gb of raw sequence data from next-generation sequencing of two resequencing and seven transcriptome data sets from several tomato accessions. Our study identified 4,812,432 non-redundant SNPs. Moreover, the workflow of SNP calling was improved by aligning the reference genome with its own raw data. Using this approach, 131,785 SNPs were discovered from transcriptome data of seven accessions. In addition, 4,680,647 SNPs were identified from the genome of S. pimpinellifolium, which are 60 times more than 71,637 of the PI212816 transcriptome. SNP distribution was compared between the whole genome and transcriptome of S. pimpinellifolium. Moreover, we surveyed the location of SNPs within genic and intergenic regions. Our results indicated that the sufficient genome-wide SNP markers and very sensitive SNP calling method allow for application of marker assisted breeding and genome-wide association studies.

Comparison of prediction accuracy for genomic estimated breeding value using the reference pig population of single-breed and admixed-breed

  • Lee, Soo Hyun;Seo, Dongwon;Lee, Doo Ho;Kang, Ji Min;Kim, Yeong Kuk;Lee, Kyung Tai;Kim, Tae Hun;Choi, Bong Hwan;Lee, Seung Hwan
    • Journal of Animal Science and Technology
    • /
    • 제62권4호
    • /
    • pp.438-448
    • /
    • 2020
  • This study was performed to increase the accuracy of genomic estimated breeding value (GEBV) predictions for domestic pigs using single-breed and admixed reference populations (single-breed of Berkshire pigs [BS] with cross breed of Korean native pigs and Landrace pigs [CB]). The principal component analysis (PCA), linkage disequilibrium (LD), and genome-wide association study (GWAS) were performed to analyze the population structure prior to genomic prediction. Reference and test population data sets were randomly sampled 10 times each and precision accuracy was analyzed according to the size of the reference population (100, 200, 300, or 400 animals). For the BS population, prediction accuracy was higher for all economically important traits with larger reference population size. Prediction accuracy was ranged from -0.05 to 0.003, for all traits except carcass weight (CWT), when CB was used as the reference population and BS as the test. The accuracy of CB for backfat thickness (BF) and shear force (SF) using admixed population as reference increased with reference population size, while the results for CWT and muscle pH at 24 hours after slaughter (pH) were equivocal with respect to the relationship between accuracy and reference population size, although overall accuracy was similar to that using the BS as the reference.

Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

  • Lim, Jong-Sung;Choi, Beom-Soon;Lee, Jeong-Soo;Shin, Chan-Seok;Yang, Tae-Jin;Rhee, Jae-Sung;Lee, Jae-Seong;Choi, Ik-Young
    • Genomics & Informatics
    • /
    • 제10권1호
    • /
    • pp.1-8
    • /
    • 2012
  • Recently, the technologies of DNA sequence variation and gene expression profiling have been used widely as approaches in the expertise of genome biology and genetics. The application to genome study has been particularly developed with the introduction of the nextgeneration DNA sequencer (NGS) Roche/454 and Illumina/ Solexa systems, along with bioinformation analysis technologies of whole-genome $de$ $novo$ assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing and mate paired-end sequencing data are important steps for constructing $de$ $novo$ assembly of novel genome sequencing data. It is necessary to have DNA sequence information from a multiplatform NGS with at least $2{\times}$ and $30{\times}$ depth sequence of genome coverage using Roche/454 and Illumina/Solexa, respectively, for effective an way of de novo assembly. Massive shortlength reading data from the Illumina/Solexa system is enough to discover DNA variation, resulting in reducing the cost of DNA sequencing. Whole-genome expression profile data are useful to approach genome system biology with quantification of expressed RNAs from a wholegenome transcriptome, depending on the tissue samples. The hybrid mRNA sequences from Rohce/454 and Illumina/Solexa are more powerful to find novel genes through $de$ $novo$ assembly in any whole-genome sequenced species. The $20{\times}$ and $50{\times}$ coverage of the estimated transcriptome sequences using Roche/454 and Illumina/Solexa, respectively, is effective to create novel expressed reference sequences. However, only an average $30{\times}$ coverage of a transcriptome with short read sequences of Illumina/Solexa is enough to check expression quantification, compared to the reference expressed sequence tag sequence.

Functional annotation of de novo variants from healthy individuals

  • Lee, Jean;Hong, Sung Eun
    • Genomics & Informatics
    • /
    • 제17권4호
    • /
    • pp.46.1-46.7
    • /
    • 2019
  • The implications of germline de novo variants (DNVs) in diseases are well documented. Despite extensive research, inconsistencies between studies remain a challenge, and the distribution and genetic characteristics of DNVs need to be precisely evaluated. To address this issue at the whole-genome scale, a large number of DNVs identified from the whole-genome sequencing of 1,902 healthy trios (i.e., parents and progeny) from the Simons Foundation for Autism Research Initiative study and 20 healthy Korean trios were analyzed. These apparently nonpathogenic DNVs were enriched in functional elements of the genome but relatively depleted in regions of common copy number variants, implying their potential function as triggers of evolution even in healthy groups. No strong mutational hotspots were identified. The pathogenicity of the DNVs was not strongly elevated, reflecting the health status of the cohort. The mutational signatures were consistent with previous studies. This study will serve as a reference for future DNV studies.

Draft Genome of an AmpC-β-Lactamase Producing Serratia marcescens Isolate from Fresh farm Tomatoes in South Africa

  • Maike Claussen;Stefan Schmidt
    • 한국미생물·생명공학회지
    • /
    • 제51권3호
    • /
    • pp.309-313
    • /
    • 2023
  • Here we report essential features of the draft genome of an AmpC-β-lactamase-producing bacterial isolate obtained from farm tomatoes in South Africa. The isolate designated strain Tom1 featured a genome of 4950426 bp with a G+C% of 59.83. It was identified as Serratia marcescens by ribosomal multilocus sequence typing (rMLST), digital DNA-DNA hybridization (dDDH), average nucleotide identity (ANI), and phylogenetic analysis using reference genomes. Its genome encoded an AmpC-β-lactamase (blaSST-1), an efflux pump providing tetracycline resistance (tet(41)), and an aminoglycoside acetyltransferase (aac(6')-Ic). Additionally, genes encoding proteins involved in prodigiosin biosynthesis and associated with adherence, biofilm formation, virulence, and pathogenicity were detected.

Accuracy of genotype imputation based on reference population size and marker density in Hanwoo cattle

  • Lee, DooHo;Kim, Yeongkuk;Chung, Yoonji;Lee, Dongjae;Seo, Dongwon;Choi, Tae Jeong;Lim, Dajeong;Yoon, Duhak;Lee, Seung Hwan
    • Journal of Animal Science and Technology
    • /
    • 제63권6호
    • /
    • pp.1232-1246
    • /
    • 2021
  • Recently, the cattle genome sequence has been completed, followed by developing a commercial single nucleotide polymorphism (SNP) chip panel in the animal genome industry. In order to increase statistical power for detecting quantitative trait locus (QTL), a number of animals should be genotyped. However, a high-density chip for many animals would be increasing the genotyping cost. Therefore, statistical inference of genotype imputation (low-density chip to high-density) will be useful in the animal industry. The purpose of this study is to investigate the effect of the reference population size and marker density on the imputation accuracy and to suggest the appropriate number of reference population sets for the imputation in Hanwoo cattle. A total of 3,821 Hanwoo cattle were divided into reference and validation populations. The reference sets consisted of 50k (38,916) marker data and different population sizes (500, 1,000, 1,500, 2,000, and 3,600). The validation sets consisted of four validation sets (Total 889) and the different marker density (5k [5,000], 10k [10,000], and 15k [15,000]). The accuracy of imputation was calculated by direct comparison of the true genotype and the imputed genotype. In conclusion, when the lowest marker density (5k) was used in the validation set, according to the reference population size, the imputation accuracy was 0.793 to 0.929. On the other hand, when the highest marker density (15k), according to the reference population size, the imputation accuracy was 0.904 to 0.967. Moreover, the reference population size should be more than 1,000 to obtain at least 88% imputation accuracy in Hanwoo cattle.