• Title/Summary/Keyword: gene classification (유전자 분류)

Hybrid Gene Selection Method for Cancer Classification (암 분류를 위한 하이브리드 유전자 선택 기법)

  • Piao, Yongjun;Hiep, Vu Quang;Erdenetuya, Namsrai;Piao, Minghao;Ryu, Keun-Ho
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.154-156 / 2012
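  • Gene selection from microarray data for cancer classification has recently attracted considerable research attention. Microarray data consist of a very large number of genes relative to a small number of samples. A dimensionality reduction technique that selects only the genes related to the target cancer is therefore needed to improve classification accuracy. In this paper, we propose a hybrid feature selection method that uses Symmetrical Uncertainty and the Support Vector Machine (SVM). Experimental results show that the proposed method outperforms other feature selection methods.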
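
As a rough illustration of the filter stage named in this abstract, the sketch below ranks genes by Symmetrical Uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), against the class labels. The abstract does not describe how the authors combine SU with the SVM, so the SVM stage is left out. The function names (`entropy`, `symmetrical_uncertainty`, `rank_genes`) and the assumption that expression values are already discretized are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 means independent, 1 means fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    joint = entropy(list(zip(x, y)))   # H(X, Y)
    mutual_info = hx + hy - joint      # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


def rank_genes(expr, labels):
    """Return gene names sorted by SU with the class labels, highest first.

    `expr` is a hypothetical mapping: gene name -> discretized expression per sample.
    """
    scores = {gene: symmetrical_uncertainty(vals, labels) for gene, vals in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A hybrid scheme would typically pass the top-ranked genes to a classifier-driven selection step; that step is not sketched here because the abstract gives no details.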

Disease Classification using Random Subspace Method based on Gene Interaction Information and mRMR Filter (유전자 상호작용 정보와 mRMR 필터 기반의 Random Subspace Method를 이용한 질병 진단)

  • Choi, Sun-Wook;Lee, Chong-Ho
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.2 / pp.192-197 / 2012
  • With the advent of DNA microarray technologies, research on disease diagnosis has been actively pursued. In typical experiments using microarray data, performance is degraded by problems such as the large number of genes relative to the small number of samples, inherent measurement noise, and heterogeneity across samples. To overcome these problems, a method that uses functional modules (e.g., signaling pathways) as markers was proposed. It summarizes the expression values of a pathway's member genes into a pathway activity, and it showed better classification performance than existing methods based on individual genes. The activity calculation used in that method, however, has drawbacks: it ignores the correlation between individual genes and each phenotype, and it removes the characteristics of individual genes. In this paper, we propose a method based on an ensemble classifier. It builds weak classifiers on feature vectors formed from subsets of genes in selected pathways, and then infers the final classification result by combining the results of the weak classifiers. In this process, we improved performance by minimizing the search space through a filtering step that uses gene-gene interaction information and the mRMR filter. We applied the proposed method to lung cancer classification, where it showed competitive classification performance compared with existing methods.
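
The ensemble idea in this abstract (weak classifiers trained on gene subsets, combined into a final decision) can be sketched as a generic random subspace ensemble. This is only a minimal sketch under stated assumptions: the paper draws its subsets from selected pathways after filtering with gene-gene interaction information and mRMR, whereas the code below samples feature indices uniformly at random, uses a nearest-centroid weak learner, and combines by majority vote. All function names and parameters are hypothetical.

```python
import random
from collections import Counter


def centroid_classifier(X, y, features):
    """Train a nearest-centroid weak classifier restricted to the given feature indices."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(row):
        def dist(c):
            # squared Euclidean distance to class centroid, on the subspace only
            return sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c]))
        return min(centroids, key=dist)

    return predict


def random_subspace_ensemble(X, y, n_classifiers=15, subspace_size=2, seed=0):
    """Build weak classifiers on random feature subsets and combine them by majority vote."""
    rng = random.Random(seed)
    n_features = len(X[0])
    members = [
        centroid_classifier(X, y, rng.sample(range(n_features), subspace_size))
        for _ in range(n_classifiers)
    ]

    def predict(row):
        votes = Counter(member(row) for member in members)
        return votes.most_common(1)[0][0]

    return predict
```

Restricting the candidate features before sampling, as the abstract describes, shrinks the search space so that fewer weak classifiers are built on uninformative genes.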

Ovarian Cancer Microarray Data Classification System Using Marker Genes Based on Normalization (표준화 기반 표지 유전자를 이용한 난소암 마이크로어레이 데이타 분류 시스템)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.9 / pp.2032-2037 / 2011
  • Marker genes are defined as genes whose expression level characterizes a specific experimental condition. Genes whose expression levels differ significantly between groups are highly informative about the studied phenomenon. In this paper, the system first normalizes the data with the most widely used of the normalization methods proposed to date, then detects marker genes by ranking genes according to statistics. It then compares and analyzes the performance of each normalization method using a multi-layer perceptron neural network. Applying the multi-layer perceptron algorithm to a microarray data set containing eight marker genes selected with the ANOVA method after Lowess normalization yielded the highest classification accuracy, 99.32%, and the lowest prediction error estimate.
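
To make the ranking step concrete, here is a minimal sketch of marker gene selection by the one-way ANOVA F statistic. It assumes the data have already been normalized (the Lowess step is not shown) and that each gene's measurements are grouped by class. The function names and the input layout are illustrative and not taken from the paper.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_marker_genes(expr_by_gene, k):
    """Return the k gene names with the largest F statistic.

    `expr_by_gene` is a hypothetical mapping: gene name -> list of per-class groups.
    """
    ranked = sorted(expr_by_gene, key=lambda g: anova_f(expr_by_gene[g]), reverse=True)
    return ranked[:k]
```

With `k=8` this would mirror the eight-gene selection reported in the abstract, although the exact statistic and cut-off used by the authors are not stated there.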

Classification of Gene Expression Data Using Membership Function and Neural Network (소속도 함수와 신경망을 이용한 유전자 발현 정보의 분류)

  • 염해영;문영식
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.757-759 / 2004
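  • Gene expression is the process by which a gene produces mRNA and the proteins that carry out biological functions. Gene expression information plays an important role in revealing gene function and identifying relationships among genes. The DNA chip is a tool that can rapidly produce large amounts of information for such gene expression studies. The hundreds to thousands of data points obtained from a DNA chip carry no meaning on their own. A clustering method is therefore needed to find meaningful characteristics in the numerical data obtained according to the level of gene expression. This paper describes an algorithm that selects, from a large amount of gene data, the gene data judged to contain key information, computes feature values, and performs clustering using neural network learning.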

Ensemble Classifier with Negatively Correlated Features for Cancer Classification (암 분류를 위한 음의 상관관계 특징을 이용한 앙상블 분류기)

  • 원홍희;조성배
    • Journal of KIISE:Software and Applications / v.30 no.12 / pp.1124-1134 / 2003
  • The development of microarray technology has supplied a large volume of data to many fields. In particular, it has been applied to the prediction and diagnosis of cancer, where it is expected to help predict and diagnose cancer accurately. Because the amount of DNA microarray data is usually very large, it is essential to analyze it efficiently. Since accurate classification of cancer is a very important issue for treatment, it is desirable to make a decision by combining the results of various expert classifiers rather than depending on the result of only one classifier. Combining classifiers generally gives high performance and high confidence. Despite the many advantages of ensemble classifiers, an ensemble of classifiers whose errors are mutually correlated is limited in performance. In this paper, we propose an ensemble of neural network classifiers learned from negatively correlated features to classify cancer precisely, and we systematically evaluate the performance of the proposed method on three benchmark datasets. Experimental results show that the ensemble classifier with negatively correlated features produces the best recognition rate on the three benchmark datasets.

Performance Comparison of Multiclass Classification Methods for cancer Classification (암 분류를 위한 분류기법의 성능비교)

  • Park Yun-Jung;Park Seung-Soo
    • Proceedings of the Korean Information Science Society Conference / 2006.06b / pp.220-222 / 2006
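  • Microarray technology is now producing large volumes of gene expression data, particularly cancer-related data. Based on these data, one can analyze the differential expression patterns of genes across cancer types, build a classification model that discriminates cancers using the genes whose expression levels change markedly, and then use the model to diagnose or predict cancer. This paper aims to use microarray data to find the best combination of feature extraction methods and classification algorithms (Naive Bayes, k-Nearest Neighbor, Decision Tree, Support Vector Machine, and Neural Network), to analyze experimentally which algorithm is most effective, and to evaluate performance.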

cDNA Microarray Analysis Using Hotelling's T$^{2}$ Statistic (Hotelling의 T$^{2}$ 통계량을 이용한 cDNA 마이크로어레이 분석)

  • Kim, Byeong-Su;Lee, Seon-Ho;Kim, In-Yeong;Kim, Sang-Cheol;Ra, Seon-Yeong;Jeong, Hyeon-Cheol
    • Proceedings of the Korean Statistical Society Conference / 2003.05a / pp.295-297 / 2003
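  • In this discussion, we use Hotelling's T-squared statistic, a multivariate analysis method, to identify significant gene sets in cDNA microarray analysis, and we evaluate how much more efficient this approach is than one based on the univariate t-statistic when the gene sets are used to classify test data into two groups.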
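
For reference, the standard two-sample form of Hotelling's T$^{2}$ statistic is written out below. The abstract does not state the exact formulation the authors used, so the notation is an assumption: $n_1$ and $n_2$ are the group sample sizes, $p$ is the number of genes in the set, $\bar{\mathbf{x}}_i$ and $\mathbf{S}_i$ are the group mean vectors and sample covariance matrices, and $\mathbf{S}_p$ is the pooled covariance.

```latex
T^{2} = \frac{n_1 n_2}{n_1 + n_2}\,
        (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\top}
        \mathbf{S}_p^{-1}
        (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2),
\qquad
\mathbf{S}_p = \frac{(n_1 - 1)\,\mathbf{S}_1 + (n_2 - 1)\,\mathbf{S}_2}{n_1 + n_2 - 2}

% Under multivariate normality with a common covariance and equal means:
\frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\; T^{2} \;\sim\; F_{p,\; n_1 + n_2 - p - 1}
```

For $p = 1$ the statistic reduces to the square of the pooled two-sample t-statistic, which is the univariate baseline the abstract compares against. Because $\mathbf{S}_p$ must be invertible, the gene set has to be small relative to the number of samples ($p \le n_1 + n_2 - 2$).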

Phylogenetic analysis of procaryote by uridylate kinase (Uridylate kinase를 이용한 원핵생물의 분류)

  • 이동근;김철민;김상진;하배진;하종명;이상현;이재화
    • Journal of Life Science / v.13 no.6 / pp.856-864 / 2003
  • The 16S rRNA gene is the most commonly used gene in the phylogenetic analysis of procaryotes. However, the very high conservation of 16S rRNA limits its ability to discriminate closely related organisms, so another molecule was applied in this study and the result was compared with that of 16S rRNA. Only three COGs (Clusters of Orthologous Groups of proteins) were detected in the 42 procaryotes: transcription elongation factor (COG0195), bacterial DNA primase (COG0358), and uridylate kinase (COG0528). The uridylate kinase gene was selected because of its similarity and its single copy number in each genome. Bacteria belonging to the same genus, and Archaebacteria, occupied the same positions with high bootstrap values in the phylogenetic tree, as in the 16S rRNA tree. However, alpha and epsilon Proteobacteria showed different positions, and Spirochaetales of Eubacteria grouped together with Archaebacteria, unlike the 16S rRNA result. Uridylate kinase may compensate for the very high conservation of the 16S rRNA gene and could help achieve more accurate discrimination and phylogenetic analysis of bacteria.

The System Of Microarray Data Classification Using Significant Gene Combination Method based on Neural Network. (신경망 기반의 유전자조합을 이용한 마이크로어레이 데이터 분류 시스템)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.7 / pp.1243-1248 / 2008
  • As recent developments in bioinformatics technology make micro-level experiments possible, we can observe the expression pattern of the whole genome on one chip and analyze the interactions of thousands of genes at the same time. In this paper, we used cDNA microarrays of 3840 genes obtained from a neuronal differentiation experiment on cortical stem cells of white mice with cancer. After normalization, a significant gene list was extracted with the similar-scale combination method proposed in this paper and a class classification model was constructed. The results of the existing DT, NB, and SVM classifiers and a multi-layer perceptron neural network classifier, each combined with the similar-scale combination method, were then analyzed and compared. Classifying the 200 genes selected with the combination of PC (Pearson correlation coefficient) and ED (Euclidean distance coefficient) using the multi-layer perceptron neural network classifier gave an accuracy of 98.84%, an improvement in classification performance over the experiments with the other classifiers.

K-mer Based RNA-seq Read Distribution Method For Accelerating De Novo Transcriptome Assembly

  • Kwon, Hwijun;Jung, Inuk
    • Journal of the Korea Society of Computer and Information / v.25 no.8 / pp.1-8 / 2020
  • In this paper, we propose a gene family based RNA-seq read distribution method as a means to accelerate the overall transcriptome assembly computation time. To measure the performance of our transcriptome sequence data distribution method, we tested four types of data sets from the Arabidopsis thaliana genome: Whole Unclassified Reads (WUR), Family-Classified Reads, Model-Classified Reads, and Randomly Classified Reads. With de novo transcript assembly on distributed nodes using the model-classified data, the generated gene contigs showed a 95% match with the contigs generated from WUR, and the execution time was reduced by a factor of 4.2 compared to a single-node environment using the same resources.
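
One plausible way to distribute reads by k-mer content is sketched below: each read is assigned to the gene family whose reference sequences share the most k-mers with it, and each resulting bin could then be assembled on a separate node. The abstract does not specify the authors' classification procedure, so the index structure, the most-shared-k-mers rule, the alphabetical tie-breaking, and all names here are assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(family_refs, k):
    """Map each k-mer to the set of families whose reference sequences contain it."""
    index = defaultdict(set)
    for family, seqs in family_refs.items():
        for seq in seqs:
            for km in kmers(seq, k):
                index[km].add(family)
    return index


def assign_read(read, index, k):
    """Return the family sharing the most k-mers with the read, or None if no k-mer matches."""
    hits = defaultdict(int)
    for km in kmers(read, k):
        for family in index.get(km, ()):
            hits[family] += 1
    if not hits:
        return None
    # sorted() makes ties resolve to the alphabetically first family
    return max(sorted(hits), key=hits.get)


def distribute(reads, index, k):
    """Group reads into per-family bins; unmatched reads go to the None bin."""
    bins = defaultdict(list)
    for read in reads:
        bins[assign_read(read, index, k)].append(read)
    return dict(bins)
```

Reads that match no family land in the `None` bin, which would need its own handling (for example, assembly on a separate node) in a real pipeline.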