• Title/Abstract/Keywords: genomic visualization

Search results: 23 items (processing time: 0.023 s)

An Analysis System for Whole Genomic Sequence Using String B-Tree (스트링 B-트리를 이용한 게놈 서열 분석 시스템)

  • 최정현;조환규
    • The KIPS Transactions: Part A / Vol. 8A, No. 4 / pp. 509-516 / 2001
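  • Advances in the life sciences and the results of many genome projects are revealing the genome sequences of numerous species. An organism's sequence can be analyzed in several ways, including global alignment and local alignment; k-mer analysis is another. A k-mer is a contiguous subsequence of length k within a nucleotide sequence, and k-mer analysis explores properties such as the frequency distribution and symmetry of the k-mers in a sequence. Because a genome sequence is a very large text, existing in-memory algorithms cannot process it when k is large, so efficient data structures and algorithms are needed. The string B-tree is a data structure that is well suited to pattern matching and supports external memory. In this paper, we improve the string B-tree into a structure that is efficient for k-mer analysis and use it to analyze C. elegans and 30 other genome sequences. To show the frequency distribution and symmetry of k-mers, we present a visualization system based on CGR (Chaotic Game Representation). We define a signature as a part of a sequence that is highly similar to the genome sequence, and present an algorithm for finding minimum-length signatures with high similarity.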

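
The two ideas in this abstract, counting k-mers and placing them in a CGR image, can be sketched in a few lines of Python. This is a minimal in-memory illustration only, not the external-memory string B-tree the paper describes. The function names and the assignment of bases to corners are assumptions made for the example, since the abstract does not give the paper's layout.

```python
from collections import Counter

# Assumed corner layout (one common convention); the paper's own layout
# is not stated in the abstract.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def count_kmers(seq, k):
    """Count every contiguous length-k substring of seq, entirely in memory."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cgr_point(kmer):
    """Map a k-mer to a point in the unit square with the chaos-game rule:
    start at the centre and move halfway toward each base's corner in turn.
    Assumes the k-mer contains only A, C, G and T."""
    x, y = 0.5, 0.5
    for base in kmer:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y

def reverse_complement(kmer):
    """Return the reverse complement, useful for checking strand symmetry."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(kmer))

counts = count_kmers("ACGTACGTAC", 2)   # toy sequence, not real genome data
point = cgr_point("AC")                 # (0.125, 0.625) under the assumed layout
```

Plotting each k-mer's count at its `cgr_point` gives a frequency image, and comparing `counts[kmer]` with `counts[reverse_complement(kmer)]` is one simple way to look at symmetry. For whole genomes and large k the `Counter` would not fit in memory, which is the problem the string B-tree is meant to address.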

Development of a Knowledge Base for Korean Pharmacogenomics Research Network

  • Park, Chan Hee;Lee, Su Yeon;Jung, Yong;Park, Yu Rang;Lee, Hye Won;Kim, Ju Han
    • Genomics & Informatics / Vol. 3, No. 3 / pp. 68-73 / 2005
  • Pharmacogenomics research requires an intelligent integration of large-scale genomic and clinical data with public and private knowledge resources. We developed a web-based knowledge base for KPRN (Korea Pharmacogenomics Research Network, http://kprn.snubi.org/). Four major types of information are integrated: genetic variation, drug information, disease information, and literature annotation. Eighteen collaborating Korean pharmacogenomics research groups have submitted 859 genotype data sets for 91 disease-related genes. Integrative analysis and visualization of this large collection of data, supported by integrated biomedical pathway and ontology resources, are provided through a user-friendly interface and a visualization engine powered by the Generic Genome Browser.

MP-Lasso chart: a multi-level polar chart for visualizing group Lasso analysis of genomic data

  • Min Song;Minhyuk Lee;Taesung Park;Mira Park
    • Genomics & Informatics / Vol. 20, No. 4 / pp. 48.1-48.7 / 2022
  • Penalized regression has been widely used in genome-wide association studies for joint analyses to find genetic associations. Among penalized regression models, the least absolute shrinkage and selection operator (Lasso) method effectively removes some coefficients from the model by shrinking them to zero. To handle group structures, such as genes and pathways, several modified Lasso penalties have been proposed, including group Lasso and sparse group Lasso. Group Lasso ensures sparsity at the level of pre-defined groups, eliminating unimportant groups. Sparse group Lasso performs group selection as in group Lasso, but also performs individual selection as in Lasso. While these sparse methods are useful in high-dimensional genetic studies, interpreting the results with many groups and coefficients is not straightforward. Lasso's results are often expressed as trace plots of regression coefficients. However, few studies have explored the systematic visualization of group information. In this study, we propose a multi-level polar Lasso (MP-Lasso) chart, which can effectively represent the results from group Lasso and sparse group Lasso analyses. An R package to draw MP-Lasso charts was developed. Through a real-world genetic data application, we demonstrated that our MP-Lasso chart package effectively visualizes the results of Lasso, group Lasso, and sparse group Lasso.
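
The difference between the three penalties named in this abstract is easiest to see in their shrinkage (proximal) steps. The sketch below uses the standard textbook operators and is not taken from the MP-Lasso R package. The function names and the `lam` values are illustrative assumptions.

```python
import math

def soft_threshold(beta, lam):
    """Lasso step for one coefficient: shrink toward zero by lam and
    return exactly zero when |beta| <= lam."""
    return math.copysign(max(abs(beta) - lam, 0.0), beta)

def group_soft_threshold(group, lam):
    """Group Lasso step: scale the whole group by max(0, 1 - lam / ||group||_2),
    so the coefficients of a group are kept or dropped together."""
    norm = math.sqrt(sum(b * b for b in group))
    if norm <= lam:
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [scale * b for b in group]

def sparse_group_soft_threshold(group, lam_l1, lam_group):
    """Sparse group Lasso step: element-wise shrinkage first, then the
    group-level shrinkage on the result."""
    shrunk = [soft_threshold(b, lam_l1) for b in group]
    return group_soft_threshold(shrunk, lam_group)

kept = group_soft_threshold([3.0, 4.0], 2.5)       # norm 5 > 2.5, group survives
dropped = group_soft_threshold([0.3, 0.4], 1.0)    # norm 0.5 <= 1, group removed
mixed = sparse_group_soft_threshold([3.0, 0.5], 1.0, 1.0)
```

In `mixed` the group survives but its second coefficient is set to zero, which is the "group selection plus individual selection" behaviour the abstract attributes to sparse group Lasso.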

Parsing KEGG XML Files to Find Shared and Duplicate Compounds Contained in Metabolic Pathway Maps: A Graph-Theoretical Perspective

  • Kang, Sung-Hui;Jang, Myung-Ha;Whang, Ji-Young;Park, Hyun-Seok
    • Genomics & Informatics / Vol. 6, No. 3 / pp. 147-152 / 2008
  • The basic graph layout technique, one of many visualization techniques, deals with the problem of positioning vertices in a way to maximize some measure of desirability in a graph. The technique is becoming critically important for further development of the field of systems biology. However, applying the appropriate automatic graph layout techniques to the genomic scale flow of metabolism requires an understanding of the characteristics and patterns of duplicate and shared vertices, which is crucial for bioinformatics software developers. In this paper, we provide the results of parsing KEGG XML files from a graph-theoretical perspective, for future research in the area of automatic layout techniques in biological pathway domains.
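
As a rough illustration of the parsing step, the sketch below reads KEGG pathway XML (KGML) with Python's standard library and reports compounds that occur more than once in a map and compounds that occur in more than one map. The definitions of "duplicate" and "shared" used here are assumptions about the paper's terms, and the two inline documents are toy data with made-up pathway names, not real KEGG maps.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def compound_counts(kgml_text):
    """Return (pathway name, Counter of compound names) for one KGML document."""
    root = ET.fromstring(kgml_text)
    names = Counter()
    for entry in root.iter("entry"):
        if entry.get("type") == "compound":
            # An entry's name attribute may list several space-separated ids.
            names.update(entry.get("name", "").split())
    return root.get("name"), names

def duplicates_and_shared(kgml_texts):
    """Duplicates: compounds drawn more than once in a single map (assumed
    definition). Shared: compounds present in more than one map."""
    duplicates = {}
    maps_by_compound = defaultdict(set)
    for text in kgml_texts:
        pathway, counts = compound_counts(text)
        duplicates[pathway] = sorted(c for c, n in counts.items() if n > 1)
        for compound in counts:
            maps_by_compound[compound].add(pathway)
    shared = sorted(c for c, maps in maps_by_compound.items() if len(maps) > 1)
    return duplicates, shared

# Toy KGML fragments for illustration only.
MAP_A = ('<pathway name="path:map_a">'
         '<entry id="1" name="cpd:C00022" type="compound"/>'
         '<entry id="2" name="cpd:C00022" type="compound"/>'
         '<entry id="3" name="cpd:C00031" type="compound"/>'
         '<entry id="4" name="ko:K00001" type="ortholog"/>'
         '</pathway>')
MAP_B = ('<pathway name="path:map_b">'
         '<entry id="1" name="cpd:C00031" type="compound"/>'
         '</pathway>')

duplicates, shared = duplicates_and_shared([MAP_A, MAP_B])
```

With real data, each string would be the contents of one downloaded KGML file. A layout algorithm could then treat the duplicated vertices inside a map differently from the vertices that link separate maps.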

Visualization of chromatin higher-order structures and dynamics in live cells

  • Park, Tae Lim;Lee, YigJi;Cho, Won-Ki
    • BMB Reports / Vol. 54, No. 10 / pp. 489-496 / 2021
  • Chromatin has highly organized structures in the nucleus, and these higher-order structures are proposed to regulate gene activities and cellular processes. Sequencing-based techniques, such as Hi-C, and fluorescent in situ hybridization (FISH) have revealed a spatial segregation of active and inactive compartments of chromatin, as well as the non-random positioning of chromosomes in the nucleus, respectively. However, regardless of their efficiency in capturing target genomic sites, these techniques are limited to fixed cells. Since chromatin has dynamic structures, live cell imaging techniques are highlighted for their ability to detect conformational changes in chromatin at a specific time point, or to track various arrangements of chromatin through long-term imaging. Given that the imaging approaches to study live cells are dramatically advanced, we recapitulate methods that are widely used to visualize the dynamics of higher-order chromatin structures.

Visualization of Multicolored in vivo Organelle Markers for Co-Localization Studies in Oryza sativa

  • Dangol, Sarmina;Singh, Raksha;Chen, Yafei;Jwa, Nam-Soo
    • Molecules and Cells / Vol. 40, No. 11 / pp. 828-836 / 2017
  • Eukaryotic cells consist of a complex network of thousands of proteins present in different organelles where organelle-specific cellular processes occur. Identification of the subcellular localization of a protein is important for understanding its potential biochemical functions. In the post-genomic era, localization of unknown proteins is achieved using multiple tools including a fluorescent-tagged protein approach. Several fluorescent-tagged protein organelle markers have been introduced into dicot plants, but their use is still limited in monocot plants. Here, we generated a set of multicolored organelle markers (fluorescent-tagged proteins) based on well-established targeting sequences. We used a series of pGWB binary vectors to improve localization and co-localization experiments using monocot plants. We constructed different fluorescent-tagged markers to visualize rice cell organelles, i.e., nucleus, plastids, mitochondria, peroxisomes, Golgi body, endoplasmic reticulum, plasma membrane, and tonoplast, with four different fluorescent proteins (FPs) (G3GFP, mRFP, YFP, and CFP). Visualization of FP-tagged markers in their respective compartments has been reported for dicot and monocot plants. The comparative localization of the nucleus marker with a nucleus localizing sequence, and the similar, characteristic morphology of mCherry-tagged Arabidopsis organelle markers and our generated organelle markers in onion cells, provide further evidence for the correct subcellular localization of the Oryza sativa (rice) organelle marker. The set of eight different rice organelle markers with four different FPs provides a valuable resource for determining the subcellular localization of newly identified proteins, conducting co-localization assays, and generating stable transgenic localization in monocot plants.

Hierarchical Clustering of Gene Expression Data Based on Self Organizing Map (자기 조직화 지도에 기반한 유전자 발현 데이터의 계층적 군집화)

  • Park, Chang-Beom;Lee, Dong-Hwan;Lee, Seong-Whan
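    • Korean Society for Bioinformatics: Conference Proceedings / Proceedings of the 2nd Annual Conference of the Korean Society for Bioinformatics and Systems Biology, 2003 / pp. 170-177 / 2003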
  • Gene expression data are quantitative measurements of the expression levels and ratios of numerous genes in different situations, based on microarray image analysis results. The process of drawing meaningful information related to genomic diseases and various biological activities from gene expression data is known as gene expression data analysis. In this paper, we present a hierarchical clustering method for gene expression data based on the self organizing map, which makes the clustering result easier to analyze. With the proposed method, we could eliminate the uncertainty of cluster boundaries, an inherent disadvantage of the self organizing map, and use the visualization function of hierarchical clustering. We could also process massive data by exploiting the fast processing speed of the self organizing map, and interpret its clustering result more efficiently and in a more user-friendly way. To verify the efficiency of the proposed algorithm, we ran tests on three data sets: an animal feature data set, a yeast gene expression data set, and a leukemia gene expression data set. The results demonstrated the feasibility and utility of the proposed clustering algorithm.

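
One way to combine the two techniques named in this abstract is to train a self organizing map first and then run hierarchical clustering on the map's node (prototype) vectors instead of on every gene. The sketch below shows only that second stage, under the assumption that the prototypes have already been trained. The single-linkage rule, the function name, and the toy prototypes are illustrative choices, not details from the paper.

```python
import math

def agglomerate(prototypes, n_clusters):
    """Merge SOM prototype vectors bottom-up (single linkage) until
    n_clusters groups remain. Returns lists of prototype indices."""
    clusters = [[i] for i in range(len(prototypes))]

    def linkage(a, b):
        # Single linkage: distance between the closest members of a and b.
        return min(math.dist(prototypes[i], prototypes[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy prototype vectors standing in for trained SOM nodes.
prototypes = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
groups = agglomerate(prototypes, 2)
```

Because the number of SOM nodes is much smaller than the number of genes, the quadratic cost of the merge loop stays manageable, and recording the merge order would give the dendrogram used for visualization.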

Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care

  • Choi, In Young;Kim, Tae-Min;Kim, Myung Shin;Mun, Seong K.;Chung, Yeun-Jun
    • Genomics & Informatics / Vol. 11, No. 4 / pp. 186-190 / 2013
  • The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population.

Overview of Cytogenetic Technologies (세포유전학 기술에 관한 고찰)

  • 강지언
    • Korean Journal of Clinical Laboratory Science / Vol. 50, No. 4 / pp. 375-381 / 2018
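  • Cytogenetic analysis is very useful for studying and diagnosing many kinds of human disease. Over the past several years it has yielded highly meaningful results, and it has now become a routine test in clinical laboratories, where it is used to diagnose disease and evaluate outcomes. Microarray analysis combines molecular cytogenetic methods with conventional cytogenetic methods; it compensates for the shortcomings of the older tests and is very useful for diagnosing genetic disorders. This paper therefore reviews how the diagnosis of genetic disease has shifted from conventional cytogenetic methods to microarray-based molecular cytogenetic methods, and how meaningfully these tests are likely to be used in genetic diagnosis in the future.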

Loss of Heterozygosity at the Calcium Regulation Gene Locus on Chromosome 10q in Human Pancreatic Cancer

  • Long, Jin;Zhang, Zhong-Bo;Liu, Zhe;Xu, Yuan-Hong;Ge, Chun-Lin
    • Asian Pacific Journal of Cancer Prevention / Vol. 16, No. 6 / pp. 2489-2493 / 2015
  • Background: Loss of heterozygosity (LOH) on chromosomal regions is crucial in tumor progression and this study aimed to identify genome-wide LOH in pancreatic cancer. Materials and Methods: Single-nucleotide polymorphism (SNP) profiling data GSE32682 of human pancreatic samples snap-frozen during surgery were downloaded from the Gene Expression Omnibus database. Genotype console software was used to perform data processing. Candidate genes with LOH were screened based on the genotype calls, SNP loci of LOH and the dbSNP database. Gene annotation was performed to identify the functions of candidate genes using the NCBI (National Center for Biotechnology Information) database, followed by Gene Ontology, INTERPRO, PFAM and SMART annotation and UCSC Genome Browser track to the unannotated genes using DAVID (the Database for Annotation, Visualization and Integrated Discovery). Results: The candidate genes with LOH identified in this study were MCU, MICU1 and OIT3 on chromosome 10. MCU was found to encode a calcium transporter and MICU1 could encode an essential regulator of mitochondrial $Ca^{2+}$ uptake. OIT3 was possibly correlated with calcium binding, as revealed by the annotation analyses, and was regulated by a large number of transcription factors including STAT, SOX9, CREB, NF-kB, PPARG and p53. Conclusions: Global genomic analysis of SNPs identified MICU1, MCU and OIT3 with LOH on chromosome 10, implying involvement of these genes in progression of pancreatic cancer.
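
The screening step "based on the genotype calls" can be illustrated with the usual paired-sample definition of LOH: a SNP that is heterozygous in normal tissue but homozygous in the tumor. Whether the study used exactly this rule is not stated in the abstract, so the rule, the function name, and the SNP identifiers below are all assumptions for the sake of a small example.

```python
def loh_snps(normal, tumor):
    """Return ids of SNPs that are heterozygous in the normal sample and
    homozygous in the tumor sample. Calls are two-letter strings such as
    "AB"; SNPs with a missing or malformed call are skipped."""
    hits = []
    for snp, n_call in normal.items():
        t_call = tumor.get(snp)
        if t_call is None or len(n_call) != 2 or len(t_call) != 2:
            continue
        if n_call[0] != n_call[1] and t_call[0] == t_call[1]:
            hits.append(snp)
    return sorted(hits)

# Hypothetical genotype calls; the SNP ids are placeholders, not real rsIDs.
normal_calls = {"snp1": "AB", "snp2": "AA", "snp3": "AB", "snp4": "AB"}
tumor_calls = {"snp1": "AA", "snp2": "AA", "snp3": "AB", "snp4": "NoCall"}

loh = loh_snps(normal_calls, tumor_calls)
```

Only `snp1` qualifies here: `snp2` was already homozygous in the normal sample, `snp3` stays heterozygous, and `snp4` has no usable tumor call. Mapping the resulting SNP positions to genes is the step for which the abstract cites dbSNP and the annotation databases.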