• Title/Summary/Keyword: single imputation

Imputation Accuracy from 770K SNP Chips to Next Generation Sequencing Data in a Hanwoo (Korean Native Cattle) Population using Minimac3 and Beagle (Minimac3와 Beagle 프로그램을 이용한 한우 770K chip 데이터에서 차세대 염기서열분석 데이터로의 결측치 대치의 정확도 분석)

  • An, Na-Rae;Son, Ju-Hwan;Park, Jong-Eun;Chai, Han-Ha;Jang, Gul-Won;Lim, Dajeong
    • Journal of Life Science / v.28 no.11 / pp.1255-1261 / 2018
  • Whole-genome analysis has been made possible by the development of DNA sequencing technologies and the discovery of many single nucleotide polymorphisms (SNPs). Large numbers of SNPs can be analyzed with SNP chips, since SNPs of human as well as livestock genomes are available. Among the various missing-nucleotide imputation programs, Minimac3 is reported to be highly accurate and relatively fast, with a simplified workflow. In the present study, we used Minimac3 to impute missing SNPs in 770K SNP chip data from 1,226 animals using next generation sequencing data from 311 animals. The accuracy on each chromosome was about 94~96%, and individual sample accuracy was about 92~98%. After imputation of the genotypes, the percentages of SNPs meeting R-squared ($R^2$) thresholds of 0.4, 0.6, and 0.8 were 91%, 84%, and 70%, respectively. Across seven minor allele frequency intervals, (0, 0.025), (0.025, 0.05), (0.05, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), and (0.4, 0.5), the $R^2$ values were 64~88%. The total analysis time was about 12 hr. In future SNP chip studies, as the size and complexity of genomic datasets increase, we expect that genomic imputation using Minimac3 can improve the reliability of chip data for Hanwoo discrimination.
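
The threshold and frequency-bin summaries reported in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy values, and the choice to treat each minor allele frequency interval as closed on the left and open on the right are all assumptions made for illustration.

```python
from bisect import bisect_right

# Interior edges of the seven MAF intervals listed in the abstract.
# Assumption: each interval is treated as [low, high); the abstract does not say.
MAF_EDGES = [0.025, 0.05, 0.1, 0.2, 0.3, 0.4]


def fraction_at_or_above(r2_values, threshold):
    """Share of SNPs whose imputation R^2 is at or above the threshold."""
    return sum(r2 >= threshold for r2 in r2_values) / len(r2_values)


def mean_r2_by_maf_bin(maf_values, r2_values):
    """Mean R^2 per MAF interval; None for intervals with no SNPs."""
    sums = [0.0] * (len(MAF_EDGES) + 1)
    counts = [0] * (len(MAF_EDGES) + 1)
    for maf, r2 in zip(maf_values, r2_values):
        i = bisect_right(MAF_EDGES, maf)  # index of the interval containing maf
        sums[i] += r2
        counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy values (hypothetical, not data from the study).
toy_maf = [0.45, 0.25, 0.15, 0.03, 0.01]
toy_r2 = [0.95, 0.85, 0.70, 0.50, 0.30]

for t in (0.4, 0.6, 0.8):
    print(t, fraction_at_or_above(toy_r2, t))
print(mean_r2_by_maf_bin(toy_maf, toy_r2))
```

Real per-SNP $R^2$ and frequency values would normally come from the imputation program's own output files, which this sketch does not parse.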

A Study on the Correlation between SLC25A26 Polymorphism and Gastritis and Gastric Ulcers in Koreans (한국인의 SLC25A26 유전자 다형성과 위염, 위궤양과의 상관성에 관한 연구)

  • Soyeun PARK;Dahyun HWANG
    • Korean Journal of Clinical Laboratory Science / v.55 no.4 / pp.291-297 / 2023
  • Gastritis is an inflammation of the gastric mucosa, and a gastric ulcer is a break in the mucosa of the stomach lining. Past research on gastritis and gastric ulcers has mainly been conducted from the perspective that environmental factors are the primary cause of these gastric diseases. Recently, however, the importance of genetic factors has been emphasized owing to developments in genetic research. The SLC25A26 gene is believed to be associated with the accumulation of reactive oxygen species. Oxidative stress promotes an inflammatory response, which increases the production of free radicals and causes cellular damage, and these lead to the development of gastric diseases. In this study, the correlation between SLC25A26 and gastric diseases was analyzed. Polymorphisms in SLC25A26 were analyzed in 1,369 domestic gastric disease patients and 7,471 healthy controls. As a result, 11 single nucleotide polymorphisms (SNPs) in the genotype data and 13 SNPs in the imputation data showed statistical significance (P<0.05) and a high relative risk of gastric diseases. Among them, the rs13874 allele of SLC25A26 showed a highly significant association with gastric diseases. In the genotype-based mRNA expression analysis, the minor allele (C) group showed increased mRNA expression, which could increase oxidative stress. In conclusion, SLC25A26 polymorphisms are associated with gastric diseases. These results may provide a basis for new guidelines for gastric disease management in the Korean population.

Accuracy of Imputation of Microsatellite Markers from BovineSNP50 and BovineHD BeadChip in Hanwoo Population of Korea

  • Sharma, Aditi;Park, Jong-Eun;Park, Byungho;Park, Mi-Na;Roh, Seung-Hee;Jung, Woo-Young;Lee, Seung-Hwan;Chai, Han-Ha;Chang, Gul-Won;Cho, Yong-Min;Lim, Dajeong
    • Genomics & Informatics / v.16 no.1 / pp.10-13 / 2018
  • Until now, microsatellites (MS) have been a popular choice of marker for parentage verification. Recently, many countries have moved or are in the process of moving from MS markers to single nucleotide polymorphism (SNP) markers for parentage testing. FAO-ISAG has also proposed a panel of 200 SNPs to replace MS markers in parentage verification. However, in many countries most animals have so far been genotyped with MS markers, and a sudden shift to SNP markers would render the data of those animals useless. As the National Institute of Animal Science in South Korea plans to move from the standard ISAG-recommended MS markers to SNPs, it faces the dilemma of excluding old animals that were genotyped with MS markers. This study was performed to facilitate the shift from MS to SNPs in such a way that existing animals with MS data can still be used for parentage verification. We imputed MS markers from the SNPs in the 500-kb region on either side of each MS marker. This method provides an easy option for labs to combine data from the old and the current sets of animals, and a cost-efficient replacement for genotyping with additional markers. We used 1,480 Hanwoo animals with both MS data and SNP data to impute MS markers in the validation animals. We also compared the imputation accuracy between the BovineSNP50 and BovineHD BeadChip. In our study, genotype concordances of 40% and 43% were observed for the BovineSNP50 and BovineHD BeadChip, respectively.
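
Genotype concordance, the accuracy measure quoted in this abstract, can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: a genotype is represented here as an unordered pair of allele sizes, the function name is hypothetical, and the values are toy data.

```python
def genotype_concordance(observed, imputed):
    """Fraction of animals whose imputed genotype equals the observed one.

    Each genotype is a pair of alleles; allele order within a pair is ignored.
    """
    if len(observed) != len(imputed):
        raise ValueError("observed and imputed must have the same length")
    if not observed:
        raise ValueError("at least one genotype is required")
    matches = sum(sorted(o) == sorted(i) for o, i in zip(observed, imputed))
    return matches / len(observed)


# Toy microsatellite genotypes as allele sizes in base pairs (hypothetical).
observed = [(180, 184), (182, 182), (178, 186), (184, 190)]
imputed = [(184, 180), (182, 182), (178, 186), (184, 188)]

print(genotype_concordance(observed, imputed))
```

Sorting each pair before comparing means that (180, 184) and (184, 180) count as the same genotype, which matters because the phase of the two alleles is usually not meaningful for this comparison.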

A whole genome sequence association study of muscle fiber traits in a White Duroc×Erhualian F2 resource population

  • Guo, Tianfu;Gao, Jun;Yang, Bin;Yan, Guorong;Xiao, Shijun;Zhang, Zhiyan;Huang, Lusheng
    • Asian-Australasian Journal of Animal Sciences / v.33 no.5 / pp.704-711 / 2020
  • Objective: Muscle fiber types, numbers and area are crucial aspects associated with meat production and quality. However, there are few studies of pig muscle fiber traits in terms of the detection power, false discovery rate and confidence interval precision of whole-genome quantitative trait loci (QTL). We had previously performed genome scanning for muscle fiber traits using 183 microsatellites and detected 8 significant QTLs in a White Duroc×Erhualian F2 population. The confidence intervals of these QTLs ranged between 11 and 127 centimorgan (cM), which contained hundreds of genes and hampered the identification of QTLs. A whole-genome sequence imputation of the population was used for fine mapping in this study. Methods: A whole-genome sequence association study was performed in the F2 population. Genotyping was performed for 1,020 individuals (19 F0, 68 F1, and 933 F2). The whole-genome variants were imputed and 21,624,800 single nucleotide polymorphisms (SNPs) were identified and examined for associations with 11 longissimus dorsi muscle fiber traits. Results: A total of 3,201 significant SNPs comprising 7 novel QTLs showed associations with the relative area of fiber type I (I_RA), the fiber number per square centimeter (FN) and the total fiber number (TFN). Moreover, one QTL on pig chromosome 14 was found to affect both FN and TFN. Furthermore, four plausible candidate genes associated with FN (kinase non-catalytic C-lobe domain containing [KNDC1]), TFN (KNDC1), and I_RA (solute carrier family 36 member 4, contactin associated protein like 5, and glutamate metabotropic receptor 8) were identified. Conclusion: An efficient and powerful imputation-based association approach was utilized to identify genes potentially associated with muscle fiber traits. These identified genes and SNPs could be explored to improve meat production and quality via marker-assisted selection in pigs.

A genome-wide association study on growth traits of Korean commercial pig breeds using Bayesian methods

  • Jong Hyun Jung;Sang Min Lee;Sang-Hyon Oh
    • Animal Bioscience / v.37 no.5 / pp.807-816 / 2024
  • Objective: This study aims to identify the significant regions and candidate genes of growth-related traits (adjusted backfat thickness [ABF], average daily gain [ADG], and days to 90 kg [DAYS90]) in Korean commercial GGP pig (Duroc, Landrace, and Yorkshire) populations. Methods: A genome-wide association study (GWAS) was performed using single-nucleotide polymorphism (SNP) markers for imputation to Illumina PorcineSNP60. The BayesB method was applied to calculate thresholds for the significance of SNP markers. The identified windows were considered significant if they explained ≥1% genetic variance. Results: A total of 28 window regions were related to genetic growth effects. Bayesian GWAS revealed 28 significant genetic regions including 52 informative SNPs associated with growth traits (ABF, ADG, DAYS90) in Duroc, Landrace, and Yorkshire pigs, with genetic variance ranging from 1.00% to 5.46%. Additionally, 14 candidate genes with previous functional validation were identified for these traits. Conclusion: The identified SNPs within these regions hold potential value for future marker-assisted or genomic selection in pig breeding programs. Consequently, they contribute to an improved understanding of genetic architecture and our ability to genetically enhance pigs.

Association Study of NDFIP2 Genetic Polymorphism with Asthma in the Korean Population (한국인에서 NDFIP2 유전적 다형성과 천식의 상관 연구)

  • Choi, Eun Hye;Hwang, Dahyun
    • Korean Journal of Clinical Laboratory Science / v.53 no.3 / pp.249-256 / 2021
  • Asthma is a chronic inflammatory airway disease. Many factors, both genetic and environmental, influence asthma. The mitogen-activated protein kinase (MAPK) pathway is involved in maintaining the T helper cells 1 and 2 (Th1/Th2) balance and plays an important role in the development of asthma. In this study, the correlation between asthma and the NDFIP2 gene, which regulates the MAPK pathway, was analyzed. The genetic polymorphism of the NDFIP2 gene was compared between 193 asthma patients and 3,228 healthy controls in Korea. As a result, 4 single nucleotide polymorphisms (SNPs) showed a significant correlation (P<0.05) and high relative risk with asthma. Among them, rs2783122 of NDFIP2 showed a statistically significant association with asthma (P-value=$9.76\times10^{-6}$, odds ratio (OR)=1.67, 95% confidence interval (CI)=1.33~2.10). In the SNP imputation on NDFIP2, 16 SNPs were discovered, and all of them showed a significant correlation with asthma and a high odds ratio. The genotype-based mRNA expression analysis revealed that the minor allele group of rs1408049 showed increased mRNA expression. Increased NDFIP2 expression causes the activation of the MAPK pathway, and this may influence the development of asthma. In conclusion, the polymorphisms of NDFIP2 are associated with asthma development, and this can provide the basis for new guidelines for the management of asthma in the Korean population.
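
An odds ratio with a 95% confidence interval, as reported in this abstract, can be computed from a 2×2 table of counts. The sketch below uses the standard Wald interval on the log scale, which is an assumption: the abstract does not state how its interval was obtained, and the counts shown are hypothetical rather than taken from the study.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(case_exposed, case_unexposed, control_exposed, control_unexposed,
                  level=0.95):
    """Odds ratio and Wald confidence interval from a 2x2 table of counts.

    Assumes every cell is positive; the Wald interval is undefined otherwise.
    """
    a, b, c, d = case_exposed, case_unexposed, control_exposed, control_unexposed
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


# Hypothetical allele counts: risk allele vs. other allele in cases and controls.
or_, low, high = odds_ratio_ci(30, 70, 20, 80)
print(f"OR={or_:.2f}, 95% CI={low:.2f}~{high:.2f}")
```

An interval that excludes 1, as the reported 1.33~2.10 does, corresponds to an association that is significant at the chosen level.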

Smoothed RSSI-Based Distance Estimation Using Deep Neural Network (심층 인공신경망을 활용한 Smoothed RSSI 기반 거리 추정)

  • Hyeok-Don Kwon;Sol-Bee Lee;Jung-Hyok Kwon;Eui-Jik Kim
    • Journal of Internet of Things and Convergence / v.9 no.2 / pp.71-76 / 2023
  • In this paper, we propose a smoothed received signal strength indicator (RSSI)-based distance estimation scheme using a deep neural network (DNN) for accurate distance estimation in an environment where a single receiver is used. The proposed scheme performs data preprocessing consisting of data splitting, missing value imputation, and smoothing steps to improve distance estimation accuracy, thereby deriving the smoothed RSSI values. The derived smoothed RSSI values are used as input data of the Multi-Input Single-Output (MISO) DNN model and, after passing through the input and hidden layers, are finally returned as an estimated distance in the output layer. To verify the superiority of the proposed scheme, we compared its performance with that of a linear regression-based distance estimation scheme. As a result, the proposed scheme showed 29.09% higher distance estimation accuracy than the linear regression-based distance estimation scheme.
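
The imputation and smoothing steps named in this abstract can be sketched as follows. The abstract does not say which imputation or smoothing methods were used, so mean imputation and a trailing moving average are assumptions chosen for simplicity, and the function names and RSSI readings are hypothetical.

```python
def impute_missing(values):
    """Replace None entries with the mean of the observed readings."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed readings to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def moving_average(values, window):
    """Trailing moving average; early points use as many readings as exist."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy RSSI readings in dBm with one dropped sample (hypothetical).
raw_rssi = [-60, None, -64, -62]
filled = impute_missing(raw_rssi)
smoothed_rssi = moving_average(filled, window=2)
print(smoothed_rssi)
```

In the scheme the abstract describes, values like `smoothed_rssi` would then be fed to the DNN as inputs; that model is not reproduced here.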

Whole-genome sequence association study identifies cyclin dependent kinase 8 as a key gene for the number of mummified piglets

  • Pingxian, Wu;Dejuan, Chen;Kai, Wang;Shujie, Wang;Yihui, Liu;Anan, Jiang;Weihang, Xiao;Yanzhi, Jiang;Li, Zhu;Xu, Xu;Xiaotian, Qiu;Xuewei, Li;Guoqing, Tang
    • Animal Bioscience / v.36 no.1 / pp.29-42 / 2023
  • Objective: Pigs, an ideal biomedical model for human diseases, suffer from about 50% early embryonic and fetal death, a major cause of fertility loss worldwide. However, identifying the causal variant remains a huge challenge. This study aimed to detect single nucleotide polymorphisms (SNPs) and candidate genes for the number of mummified (NM) piglets using the imputed whole-genome sequence (WGS) and validate the potential candidate genes. Methods: The imputed WGS was introduced from genotyping-by-sequencing (GBS) using a multi-breed reference population. We performed genome-wide association studies (GWAS) for NM piglets at birth from a Landrace pig population. A total of 300 Landrace pigs were genotyped by GBS. The whole-genome variants were imputed, and 4,252,858 SNPs were obtained. Various molecular experiments were conducted to determine how the genes affected NM in pigs. Results: A strong GWAS peak located on SSC11: 0.10 to 7.11 Mbp (Top SNP, SSC11:1,889,658 bp; p = 9.98E-13) was identified in the cyclin dependent kinase 8 (CDK8) gene, which plays a crucial role in embryonic retardation and lethality. Based on the molecular experiments, we found that Y-box binding protein 1 (YBX1) was a crucial transcription factor for CDK8, which mediated the effect of CDK8 in the proliferation of porcine ovarian granulosa cells via the transforming growth factor beta/small mother against decapentaplegic signaling pathway and, as a consequence, affected embryo quality, indicating that this pathway may contribute to mummified fetuses in pigs. Conclusion: A powerful imputation-based association study was performed to identify genes associated with NM in pigs. CDK8 was suggested as a functional gene for the proliferation of porcine ovarian granulosa cells, but further studies are required to determine causative mutations and the effect of loci on NM in pigs.

Accuracy of genomic-polygenic estimated breeding value for milk yield and fat yield in the Thai multibreed dairy population with five single nucleotide polymorphism sets

  • Wongpom, Bodin;Koonawootrittriron, Skorn;Elzo, Mauricio A.;Suwanasopee, Thanathip;Jattawa, Danai
    • Asian-Australasian Journal of Animal Sciences / v.32 no.9 / pp.1340-1348 / 2019
  • Objective: The objectives were to compare variance components, genetic parameters, prediction accuracies, and genomic-polygenic estimated breeding value (EBV) rankings for milk yield (MY) and fat yield (FY) in the Thai multibreed dairy population using five single nucleotide polymorphism (SNP) sets from the GeneSeek GGP80K chip. Methods: The dataset contained monthly MY and FY of 8,361 first-lactation cows from 810 farms. Variance components, genetic parameters, and EBV for five SNP sets from the GeneSeek GGP80K chip were obtained using a 2-trait single-step average-information restricted maximum likelihood procedure. The SNP sets were the complete SNP set (all available SNP; SNP100), top 75% set (SNP75), top 50% set (SNP50), top 25% set (SNP25), and top 5% set (SNP5). The 2-trait models included herd-year-season, heterozygosity and age at first calving as fixed effects, and animal additive genetic and residual as random effects. Results: The estimates of additive genetic variances for MY and FY from SNP subsets were mostly higher than those of the complete set. The SNP25 MY and FY heritability estimates (0.276 and 0.183) were higher than those from SNP75 (0.265 and 0.168), SNP50 (0.275 and 0.179), SNP5 (0.231 and 0.169), and SNP100 (0.251 and 0.159). The SNP25 EBV accuracies for MY and FY (39.76% and 33.82%) were higher than for SNP75 (35.01% and 32.60%), SNP50 (39.64% and 33.38%), SNP5 (38.61% and 29.70%), and SNP100 (34.43% and 31.61%). All rank correlations between SNP100 and SNP subsets were above 0.98 for both traits, except for SNP100 and SNP5 (0.93 for MY; 0.92 for FY). Conclusion: The high SNP25 estimates of genetic variances, heritabilities, EBV accuracies, and rank correlations between SNP100 and SNP25 for MY and FY indicated that genotyping animals with a dedicated SNP25 chip would be a suitable way to keep genotyping costs low while speeding up genetic progress for MY and FY in the Thai dairy population.

Performance Comparison of Two Gene Set Analysis Methods for Genome-wide Association Study Results: GSA-SNP vs i-GSEA4GWAS

  • Kwon, Ji-Sun;Kim, Ji-Hye;Nam, Doug-U;Kim, Sang-Soo
    • Genomics & Informatics / v.10 no.2 / pp.123-127 / 2012
  • Gene set analysis (GSA) is useful in interpreting a genome-wide association study (GWAS) result in terms of biological mechanism. We compared the performance of two different GSA implementations that accept GWAS p-values of single nucleotide polymorphisms (SNPs) or gene-by-gene summaries thereof, GSA-SNP and i-GSEA4GWAS, under the same settings of inputs and parameters. GSA runs were made with two sets of p-values from a Korean type 2 diabetes mellitus GWAS: 259,188 and 1,152,947 SNPs of the original and imputed genotype datasets, respectively. When Gene Ontology terms were used as gene sets, i-GSEA4GWAS produced 283 and 1,070 hits for the unimputed and imputed datasets, respectively. On the other hand, GSA-SNP reported 94 and 38 hits for the unimputed and imputed datasets, respectively. Similar trends, though to a lesser degree, were observed with Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets as well. The huge number of hits by i-GSEA4GWAS for the imputed dataset was probably an artifact due to the scaling step in the algorithm. The decrease in hits by GSA-SNP for the imputed dataset may be due to the fact that it relies on Z-statistics, which are sensitive to variations in the background level of associations. Judicious evaluation of the GSA outcomes, perhaps based on multiple programs, is recommended.