• Title/Summary/Keyword: genomic data

Multiple Testing in Genomic Sequences Using Hamming Distance

  • Kang, Moonsu
    • Communications for Statistical Applications and Methods / v.19 no.6 / pp.899-904 / 2012
  • High-dimensional categorical data models with small sample sizes have not been used extensively in genomic sequences that involve count (or discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a number of genes. This requires an appropriate test statistic and a corresponding multiple testing procedure, because a multivariate analysis of variance is not feasible. The family-wise error rate (FWER) is not appropriate for testing thousands of genes simultaneously in a multiple testing procedure. The false discovery rate (FDR) is better than the FWER in multiple testing problems. Data from the 2002-2003 SARS epidemic show that a conventional FDR procedure and a proposed test statistic based on a pseudo-marginal approach with Hamming distance perform better.
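As a rough illustration of the two ingredients named in this abstract, the sketch below computes a Hamming distance between two aligned sequences and applies the Benjamini-Hochberg step-up rule, which is one conventional FDR procedure. The abstract does not say which FDR procedure was used, and the paper's pseudo-marginal test statistic is not reproduced here; the function names and the example p-values are hypothetical.

```python
from typing import List, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return one rejection flag per hypothesis, controlling the FDR at level q.

    Assumption: Benjamini-Hochberg stands in for the 'conventional FDR
    procedure' mentioned in the abstract.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Step-up rule: reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    print(hamming_distance("ACGT", "ACGA"))
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
```

The step-up rule can reject a hypothesis whose own p-value exceeds its rank threshold, provided a larger p-value further down the ordering meets its threshold. This is what makes the procedure less conservative than a per-test FWER correction when thousands of positions are tested.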

Comparison Architecture for Large Number of Genomic Sequences

  • Choi, Hae-won;Ryoo, Myung-Chun;Park, Joon-Ho
    • Journal of Information Technology and Architecture / v.9 no.1 / pp.11-19 / 2012
  • Generally, a suffix tree is an efficient data structure since it reveals the detailed internal structures of given sequences within linear time. However, it is difficult to implement a suffix tree for a large number of sequences because of memory size constraints. Therefore, in order to compare multi-megabase genomic sequence sets using suffix trees, the suffix tree algorithms need to be redesigned. We introduce a new method for constructing, on secondary storage, a suffix tree for a large number of sequences. Our algorithm divides three files, in a designated sequence, into parts, storing references to the locations of edges in hash tables. In our experiments, we used 1,300,000 EST sequences (about 300 MB) to generate a suffix tree on disk.
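To make the data structure concrete, here is a deliberately naive in-memory suffix trie with a substring query. It is only a sketch of the idea behind suffix trees: it takes quadratic time and space, it is not the linear-time construction the abstract refers to, and it does not attempt the paper's disk-based partitioning or hash-table edge references. The function names are hypothetical.

```python
from typing import Dict


def build_suffix_trie(text: str) -> Dict:
    """Insert every suffix of text into a nested-dict trie (quadratic cost)."""
    root: Dict = {}
    terminated = text + "$"  # unique terminator marks the end of each suffix
    for start in range(len(terminated)):
        node = root
        for ch in terminated[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie: Dict, pattern: str) -> bool:
    """Return True if pattern occurs in the indexed text.

    Every substring is a prefix of some suffix, so walking the trie from the
    root along the pattern's characters answers the query in O(len(pattern)).
    """
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


if __name__ == "__main__":
    trie = build_suffix_trie("GATTACA")
    print(contains(trie, "TTAC"), contains(trie, "TAG"))
```

The memory growth of this toy version on even moderately long inputs hints at why an index over hundreds of megabytes of sequence has to be laid out on secondary storage.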

Bioinformatics and Genomic Medicine (생명정보학과 유전체의학)

  • Kim, Ju-Han
    • Journal of Preventive Medicine and Public Health / v.35 no.2 / pp.83-91 / 2002
  • Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational sciences. Clinical informatics has long developed methodologies to improve biomedical research and clinical care by integrating experimental and clinical information systems. The informatics revolutions both in bioinformatics and clinical informatics will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding forever much the same way that biochemistry did a generation ago. The paper describes how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics and proteomics. Basic data preprocessing with normalization, primary pattern analysis, and machine learning algorithms will be presented. Use of integrated biochip informatics technologies, text mining of factual and literature databases, and integrated management of biomolecular databases will be discussed. Each step will be given with real examples in the context of clinical relevance. Issues of linking molecular genotype and clinical phenotype information will be discussed.

Accuracy of genomic breeding value prediction for intramuscular fat using different genomic relationship matrices in Hanwoo (Korean cattle)

  • Choi, Taejeong;Lim, Dajeong;Park, Byoungho;Sharma, Aditi;Kim, Jong-Joo;Kim, Sidong;Lee, Seung Hwan
    • Asian-Australasian Journal of Animal Sciences / v.30 no.7 / pp.907-911 / 2017
  • Objective: Intramuscular fat is one of the meat quality traits that is considered in the selection strategies for Hanwoo (Korean cattle). Different methods are used to estimate the breeding value of selection candidates. In the present work, we focused on the accuracy of different genotype relationship matrices, as described by Forni, and of the pedigree-based relationship matrix. Methods: The data set included a total of 778 animals that were genotyped with the BovineSNP50 BeadChip. Among these 778 animals, 72 animals were sires for 706 reference animals and were used as a validation dataset. Single trait animal model (best linear unbiased prediction and genomic best linear unbiased prediction) was used to estimate the breeding values from genomic and pedigree information. Results: The diagonal elements for the pedigree based coefficients were slightly higher for the genomic relationship matrices (GRM) based coefficients while off diagonal elements were considerably low for GRM based coefficients. The accuracy of breeding value for the pedigree-based relationship matrix (A) was 0.13, while for GRM (GOF, G05, and Yang) it was 0.37, 0.45, and 0.38, respectively. Conclusion: Accuracy of GRM was 1.5 times higher than A in this study. Therefore, genomic information will be more beneficial than pedigree information in the Hanwoo breeding program.
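A genomic relationship matrix can be sketched in a few lines. The version below assumes the widely used form G = ZZ' / (2 Σ p(1 - p)), where Z holds allele counts (0, 1, 2) centered by twice the allele frequency. The abstract does not spell out the formulas; reading GOF as "observed allele frequencies" and G05 as "all frequencies fixed at 0.5" is an assumption here, and the Yang variant is not shown. The function name and toy genotypes are hypothetical.

```python
from typing import List, Optional, Sequence


def genomic_relationship_matrix(
    genotypes: Sequence[Sequence[int]],
    freqs: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Build G = Z Z' / (2 * sum(p * (1 - p))) from 0/1/2 allele counts.

    genotypes: one row per animal, one column per SNP.
    freqs: allele frequency per SNP; if None, observed frequencies are used
           (assumed to correspond to the 'GOF' label). Passing 0.5 for every
           SNP gives the assumed 'G05' variant.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    if freqs is None:
        freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    # Center each allele count by its expectation 2p.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


if __name__ == "__main__":
    toy = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
    for row in genomic_relationship_matrix(toy):
        print([round(x, 3) for x in row])
```

Replacing the pedigree-based matrix A with a matrix like G in the animal model is what distinguishes genomic BLUP from conventional BLUP in the comparison the abstract reports.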

Bridging Comparative Genomics and DNA Marker-aided Molecular Breeding

  • Choi, Hong-Kyu;Cook, Douglas R.
    • Korean Journal of Breeding Science / v.43 no.2 / pp.103-114 / 2011
  • In recent years, genomic resources and information have accumulated at an ever-increasing pace in many plant species through whole genome sequencing, large-scale analysis of transcriptomes, DNA markers, and functional studies of individual genes. Well-characterized species within key plant taxa, so-called "model systems", have played a pivotal role in nucleating the accumulation of genomic information and databases, thereby providing the basis for comparative genomic studies. In addition, recent advances in "Next Generation" sequencing technologies have propelled a new wave of genomics, enabling rapid, low-cost analysis of numerous genomes and the accumulation of genetic diversity data for large numbers of accessions within individual species. The resulting wealth of genomic information provides an opportunity to discern evolutionary processes that have impacted genome structure and the function of genes, using the tools of comparative analysis. Comparative genomics provides a platform to translate information from model species to crops, and to relate knowledge of genome function among crop species. Ultimately, the resulting knowledge will accelerate the development of more efficient breeding strategies through the identification of trait-associated orthologous genes and next generation functional gene-based markers.

Molecular genetic evaluation of gorals (Naemorhedus caudatus raddeanus) genetic resources using microsatellite markers (초위성체 마커를 이용한 산양의 분자유전학적 고찰)

  • Seo, Joo Hee;Lee, Yoonseok;Jeon, Gwang Joo;Kong, Hong Sik
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1043-1053 / 2017
  • In this study, 13 microsatellite markers were genotyped to assess the genetic diversity of 224 Gorals (Saanen (88), Laoshan (67), Toggenburg (32), Alpine (12), Anglonubian (9), Jamnapari (7), and Black Bengal (4)). The number of alleles per marker ranged from 4 (INRA005) to 18 (SRCRSP23). Observed heterozygosity ($H_{obs}$), expected heterozygosity ($H_{exp}$), and polymorphism information content (PIC) ranged from 0.482 to 0.786, 0.476 to 0.923, and 0.392 to 0.915, respectively. Principal Coordinates Analysis (PCoA) results were similar to the results of FCA. The NE-I (non-exclusion probability for identity of two unrelated individuals) was estimated at $2.47 \times 10^{-15}$. In conclusion, this study provides data that can serve as a basis for Goral breeding and development.
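The diversity statistics reported here have simple textbook definitions, sketched below for a single marker. The sketch assumes the uncorrected expected heterozygosity, 1 - Σ p_i², and the standard PIC formula, 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The abstract does not state which estimators or software the authors used, and the function names and toy genotypes are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Genotype = Tuple[Hashable, Hashable]  # the two alleles carried at one marker


def allele_frequencies(genotypes: Sequence[Genotype]) -> Dict[Hashable, float]:
    """Estimate allele frequencies by counting both alleles of each animal."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Fraction of animals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """1 - sum(p_i^2), without a small-sample correction (an assumption)."""
    return 1 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs: Sequence[float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over i < j of 2 * p_i^2 * p_j^2."""
    squares: List[float] = [p * p for p in freqs]
    cross = sum(
        2 * squares[i] * squares[j]
        for i in range(len(squares))
        for j in range(i + 1, len(squares))
    )
    return 1 - sum(squares) - cross


if __name__ == "__main__":
    toy = [(100, 104), (100, 100), (104, 104), (100, 104)]
    p = list(allele_frequencies(toy).values())
    print(observed_heterozygosity(toy))
    print(expected_heterozygosity(p))
    print(polymorphism_information_content(p))
```

PIC is never larger than the expected heterozygosity for the same frequencies, which matches the ordering of the upper bounds quoted in the abstract (0.915 versus 0.923).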

An Integrated Genomic Resource Based on Korean Cattle (Hanwoo) Transcripts

  • Lim, Da-Jeong;Cho, Yong-Min;Lee, Seung-Hwan;Sung, Sam-Sun;Nam, Jung-Rye;Yoon, Du-Hak;Shin, Youn-Hee;Park, Hye-Sun;Kim, Hee-Bal
    • Asian-Australasian Journal of Animal Sciences / v.23 no.11 / pp.1399-1404 / 2010
  • We have created a Bovine Genome Database, an integrated genomic resource for Bos taurus, by merging bovine data from various databases and our own data. We produced 55,213 Korean cattle (Hanwoo) ESTs from cDNA libraries from three tissues. We concentrated on genomic information based on Hanwoo transcripts and provided user-friendly search interfaces within the Bovine Genome Database. The genome browser supported alignment results for the various types of data: Hanwoo EST, consensus sequence, human gene, and predicted bovine genes. The database also provides transcript data information, gene annotation, genomic location, sequence and tissue distribution. Users can also explore bovine disease genes based on comparative mapping of homologous genes and can conduct searches centered on genes within user-selected quantitative trait loci (QTL) regions. The Bovine Genome Database can be accessed at http://bgd.nabc.go.kr.

Analysis of Microsatellite Markers on Bovine Chromosomes 1 and 14 for Potential Allelic Association with Carcass Traits in Hanwoo (Korean Cattle)

  • Choi, I.S.;Kong, H.S.;Oh, J.D.;Yoon, D.H.;Cho, B.W.;Choi, Y.H.;Kim, K.S.;Choi, K.D.;Lee, H.K.;Jeon, G.J.
    • Asian-Australasian Journal of Animal Sciences / v.19 no.7 / pp.927-930 / 2006
  • This study was conducted to investigate potential effects of previously identified QTL regions on carcass traits in Hanwoo. The data analyzed in this study were collected from 326 steers of 67 proven sires. Thirteen microsatellite markers spanning QTL regions on bovine chromosomes 1 and 14 were genotyped in the 326 steers. The following breeding values were analyzed for QTL effects: cold carcass weight breeding value (CCWBV), longissimus muscle area breeding value (LMABV), marbling score breeding value (MSBV), and backfat thickness breeding value (BFTBV). Chi-square tests were performed to compare the frequencies of individual alleles between high and low breeding value groups. Significant differences in allele frequencies were found for BMS711, MCM130, BMS4049, and BMS2263, and also for RM180, BL1029, BM4305, and BMS2055. These results showed a potential application for investigation of putative QTL locations.
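The allele-frequency comparison described above can be illustrated with a Pearson chi-square statistic on a groups-by-alleles count table. This is a minimal sketch with made-up counts; the abstract does not give the actual tables or say how the high and low breeding value groups were formed, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def chi_square_statistic(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a count table.

    Rows are groups (for example, high and low breeding value), columns are
    alleles of one marker. Every row and column must have a nonzero total.
    """
    row_totals: List[int] = [sum(row) for row in table]
    col_totals: List[int] = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under equal allele frequencies in all groups.
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("empty row or column in contingency table")
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


if __name__ == "__main__":
    # Hypothetical allele counts: rows = high / low group, columns = alleles.
    stat, df = chi_square_statistic([[10, 20], [20, 10]])
    print(round(stat, 3), df)
```

Converting the statistic to a p-value needs the chi-square distribution with the returned degrees of freedom, for example via `scipy.stats.chi2.sf` if SciPy is available.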

Characterization of Quantitative Trait Loci (QTL) for Growth using Genome Scanning in Korean Native Pig

  • Lee, H.K.;Choi, I.S.;Choi, B.H.;Kim, T.H.;Jung, I.J.
    • Reproductive and Developmental Biology / v.28 no.2 / pp.107-112 / 2004
  • Molecular genetic markers were genotyped and used to detect chromosomal regions associated with economically important traits, such as growth traits, in pigs. A three-generation resource population was constructed from a cross between Korean native boars and Landrace sows. A total of 193 F2 animals were produced from an intercross of the F1. Phenotypic data on 7 traits (birth weight; body weight at 3, 5, 12, and 30 weeks of age; live empty weight) were collected for the F2 animals. The grandparents (F0), parents (F1), and offspring (F2) were genotyped for 194 microsatellite markers covering chromosomes 1 to 18. Quantitative trait locus analyses were performed using interval mapping by regression under a line-cross model. To test for imprinting, a full genetic model including additive, dominance, and imprinting effects was fitted. Significance thresholds were determined by permutation tests. Using the full imprinting model, four QTL with imprinted effects were detected at the 5% chromosome-wide significance level for growth traits on chromosomes 1, 5, 7, 13, 14, and 16.
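The permutation approach to a chromosome-wide threshold can be sketched as follows: shuffle the phenotypes, recompute the test statistic at every marker on the chromosome, record the maximum, and take an upper quantile of those maxima. The sketch substitutes a squared correlation between marker score and phenotype for the paper's line-cross regression statistic, so it illustrates the thresholding logic only. All names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence


def squared_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def permutation_threshold(
    phenotypes: Sequence[float],
    markers: Sequence[Sequence[float]],
    n_perm: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> float:
    """Approximate (1 - alpha) quantile of the permutation null of the
    maximum statistic over all markers on one chromosome."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima: List[float] = []
    for _ in range(n_perm):
        # Shuffling breaks any marker-phenotype association while keeping
        # the correlation structure among markers intact.
        rng.shuffle(shuffled)
        maxima.append(max(squared_correlation(m, shuffled) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]


if __name__ == "__main__":
    rng = random.Random(1)
    toy_markers = [[rng.choice([0, 1, 2]) for _ in range(50)] for _ in range(10)]
    toy_phenotypes = [rng.gauss(0, 1) for _ in range(50)]
    print(permutation_threshold(toy_phenotypes, toy_markers, n_perm=200))
```

Taking the maximum over markers within each permutation is what makes the threshold chromosome-wide rather than pointwise.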

Network-based regularization for analysis of high-dimensional genomic data with group structure (그룹 구조를 갖는 고차원 유전체 자료 분석을 위한 네트워크 기반의 규제화 방법)

  • Kim, Kipoong;Choi, Jiyun;Sun, Hokeun
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1117-1128 / 2016
  • In genetic association studies with high-dimensional genomic data, regularization procedures based on penalized likelihood are often applied to identify genes or genetic regions associated with diseases or traits. A network-based regularization procedure can utilize biological network information (such as genetic pathways and signaling pathways in genetic association studies) and has outstanding selection performance compared with other regularization procedures such as lasso and elastic-net. However, network-based regularization has a limitation because it cannot be applied to high-dimensional genomic data with a group structure. In this article, we propose to combine data dimension reduction techniques, such as principal component analysis and partial least squares, with network-based regularization for the analysis of high-dimensional genomic data with a group structure. The selection performance of the proposed method was evaluated by extensive simulation studies. The proposed method was also applied to real DNA methylation data generated from the Illumina Infinium HumanMethylation27K BeadChip, where methylation beta values of around 20,000 CpG sites over 12,770 genes were compared between 123 ovarian cancer patients and 152 healthy controls. This analysis was also able to indicate a few cancer-related genes.
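For readers unfamiliar with network-based penalties, the sketch below evaluates one common form: a lasso term plus a graph-Laplacian smoothness term that penalizes differences between degree-scaled coefficients of linked genes. The abstract does not state the exact penalty, so this form, the parameter names `lam1` and `lam2`, and the toy network are all assumptions, and the paper's dimension-reduction step for grouped data is not shown.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def network_penalty(
    beta: Sequence[float],
    edges: Sequence[Tuple[int, int]],
    lam1: float,
    lam2: float,
) -> float:
    """lam1 * sum|beta_j| + lam2 * sum over edges (u, v) of
    (beta_u / sqrt(d_u) - beta_v / sqrt(d_v))^2, where d is the node degree.

    Assumed form of a network-constrained penalty; not taken from the paper.
    """
    degree: List[int] = [0] * len(beta)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    sparsity = sum(abs(b) for b in beta)
    # Nodes that appear in an edge always have degree >= 1, so the
    # divisions below are safe.
    smoothness = sum(
        (beta[u] / sqrt(degree[u]) - beta[v] / sqrt(degree[v])) ** 2
        for u, v in edges
    )
    return lam1 * sparsity + lam2 * smoothness


if __name__ == "__main__":
    linked = [(0, 1)]  # hypothetical network: genes 0 and 1 share a pathway
    print(network_penalty([1.0, 1.0, 0.0], linked, lam1=0.5, lam2=0.25))
    print(network_penalty([1.0, -1.0, 0.0], linked, lam1=0.5, lam2=0.25))
```

Two linked genes with equal effects incur only the lasso cost, while opposite effects are penalized further, which is how the network information encourages joint selection of neighboring genes.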