• Title/Abstract/Keywords: gene selection

871 search results (processing time: 0.031 s)

Significant Gene Selection Using Integrated Microarray Data Set with Batch Effect

  • Kim Ki-Yeol;Chung Hyun-Cheol;Jeung Hei-Cheul;Shin Ji-Hye;Kim Tae-Soo;Rha Sun-Young
    • Genomics & Informatics
    • /
    • Vol. 4, No. 3
    • /
    • pp.110-117
    • /
    • 2006
  • In microarray technology, many experimental factors can introduce bias, including RNA sources, microarray production or different platforms, diverse sample processing, and varying experimental protocols. These systematic effects are a substantial obstacle in the analysis of microarray data. When data sets derived from different experimental processes are used together, the analysis results are often inconsistent and unreliable. Therefore, one of the most pressing challenges in the microarray field is how to combine data that come from two different groups. As a novel attempt to integrate two data sets with a batch effect, we simply applied standardization to the microarray data before significant gene selection. In the gene selection step, we used a newly defined measure that considers the distance between a gene and an ideal gene as well as the between-slide and within-slide variations. We also discuss the association between biological functions and different expression patterns in the selected discriminative gene set. As a result, we could confirm that the batch effect was minimized by standardization and that the genes selected from the standardized data included various expression patterns and significant biological functions.
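
The standardization step described in this abstract can be illustrated with a minimal sketch. It assumes that "standardization" means z-scoring each gene separately within each batch; the abstract does not state the exact formula, and the function name, data layout, and values below are hypothetical.

```python
from statistics import mean, pstdev

def standardize_within_batch(expr, batches):
    """Z-score every gene separately inside each batch.

    expr:    dict mapping gene name -> list of expression values (one per sample)
    batches: list of batch labels, one per sample (hypothetical layout)
    """
    out = {}
    for gene, values in expr.items():
        scaled = [0.0] * len(values)
        for b in set(batches):
            idx = [i for i, lab in enumerate(batches) if lab == b]
            vals = [values[i] for i in idx]
            mu, sd = mean(vals), pstdev(vals)
            for i in idx:
                # A gene that is constant within a batch is mapped to 0.
                scaled[i] = (values[i] - mu) / sd if sd > 0 else 0.0
        out[gene] = scaled
    return out

# Hypothetical data: the second batch is shifted by +10 relative to the first.
expr = {"geneA": [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]}
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
z = standardize_within_batch(expr, batches)
```

After this transformation the constant offset between the two batches no longer appears in the values, which is the sense in which a batch effect of this simple additive kind is reduced.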

Searching for Optimal Ensemble of Feature-classifier Pairs in Gene Expression Profile using Genetic Algorithm

  • 박찬호;조성배
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 31, No. 4
    • /
    • pp.525-536
    • /
    • 2004
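  • Gene expression data are microarray measurements of samples taken from a specific tissue of an organism, in which the expression levels of genes appear as numerical values. Because the expression levels of relevant genes generally differ between normal and abnormal tissue, diseases can be classified from gene expression data. Not all genes are involved in such classification, so feature selection, the task of picking out the relevant genes, is required, together with a method that classifies the selected genes appropriately. This paper proposes a genetic-algorithm-based method that searches for the optimal ensemble of feature-classifier pairs, drawn from seven feature selection methods based on correlation coefficients, similarity measures, and information theory, and six representative classifiers. In experiments on two cancer-related gene expression data sets, including leave-one-out cross validation, ensembles were found that performed far better than any single feature-classifier pair on both the lymphoma data and the colon cancer data.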
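
A rough sketch of such a search is given below. It is not the authors' implementation: it assumes each of the 7 x 6 = 42 feature-classifier pairs has already produced held-out binary predictions (for example from leave-one-out cross validation), encodes an ensemble as a bit mask over the pairs, and scores it by majority-vote accuracy. The function names, GA settings, and synthetic predictions are all hypothetical.

```python
import random

def majority_vote_accuracy(mask, predictions, labels):
    """Accuracy of the majority vote over the pairs switched on in mask."""
    chosen = [p for bit, p in zip(mask, predictions) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for j, y in enumerate(labels):
        votes = sum(p[j] for p in chosen)
        pred = 1 if 2 * votes > len(chosen) else 0  # ties go to class 0
        correct += (pred == y)
    return correct / len(labels)

def search_ensemble(predictions, labels, pop_size=20, generations=30,
                    p_mut=0.05, seed=0):
    """Simple generational GA with elitism, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n = len(predictions)
    fitness = lambda m: majority_vote_accuracy(m, predictions, labels)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]  # keep the two best masks unchanged
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from the top half
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)

# Synthetic stand-in for held-out predictions: 42 pairs, each right about 70% of the time.
data_rng = random.Random(1)
labels = [data_rng.randint(0, 1) for _ in range(40)]
predictions = [[y if data_rng.random() < 0.7 else 1 - y for y in labels]
               for _ in range(42)]
best_mask, best_acc = search_ensemble(predictions, labels)
```

Scoring the ensemble on the same predictions used for the search can overstate accuracy, so a separate evaluation split would be needed for an honest estimate.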

Development of Gene-based Markers for the Allelic Selection of the Restorer-of-fertility Gene, Rfo, in Radish (Raphanus sativus)

  • Kim, Sunggil;Lim, Heerae;Cho, Kang-Hee;Park, Pue Hee;Park, Suhyung;Sung, Soon-Kee;Oh, Daegeun;Kim, Ki-Taek
    • 한국육종학회지
    • /
    • Vol. 41, No. 3
    • /
    • pp.194-204
    • /
    • 2009
  • Cytoplasmic male sterility (CMS) and fertility restoration have been utilized as valuable tools for $F_1$-hybrid seed production in many crops despite laborious breeding processes. Molecular markers for the selection of CMS-related genes help reduce expenses and breeding times. A previously reported genomic region containing the Ppr-B gene, which is responsible for restoration of fertility and corresponds to the Rfo locus, was used to develop gene-based or so-called "functional" markers for allelic selection of the restorer-of-fertility gene (Rfo) in $F_1$-hybrid breeding of radish (Raphanus sativus L.). Polymorphic sequences among Rfo alleles of diverse radish breeding lines were examined by sequencing the Ppr-B alleles. However, the presence of a Ppr-B homolog, designated Ppr-D, interfered with specific PCR amplification of Ppr-B in certain breeding lines. The organization of Ppr-D, resolved by genome walking, revealed extended homology with Ppr-B even in the promoter region. Interestingly, PCR amplification of Ppr-D was repeatedly unsuccessful in certain breeding lines, implying the lack of Ppr-D in these radishes. Ppr-B could be successfully amplified for analysis only by designing primers based on sequences unique to Ppr-B, which exclude interference from the Ppr-D gene. Four variants of Rfo alleles were identified from 20 breeding lines. A combination of three molecular markers was developed to genotype the Rfo locus based on polymorphisms among the four variants. These markers will be useful in facilitating $F_1$-hybrid cultivar development in radish.

Effects of Artificial and Natural Selection on Walking Behavior in Drosophila melanogaster

  • 주종길;이현화
    • 한국동물학회지
    • /
    • Vol. 26, No. 2
    • /
    • pp.95-106
    • /
    • 1983
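  • Using a connected test tube apparatus, rapid and slow walking behavior was selected by directional selection for 15 generations in the Oregon-R strain and a lethal-free population of Drosophila melanogaster. In addition, natural selection was applied from the 10th generation onward and the genetic effects were analyzed. 1. The rapid and slow walking traits showed a clear response to selection from the early generations, and each reached a selection plateau after the 7th generation. 2. Realized heritability calculated after 10 generations of directional selection was $9\sim14\%$ for the rapid trait and $11\sim16\%$ for the slow trait, so heritability was somewhat higher for slow than for rapid behavior. 3. Hybridization experiments to determine the dominance relationship between the genes governing the rapid trait and those governing the slow trait showed that the slow genes had a partial dominance effect over the rapid genes. 4. When natural selection was applied for 5 generations following 10 generations of directional selection, the rapid trait returned to the neutral state (6.5) in only 5 generations, whereas the slow trait showed no change at all relative to the walking index of the base population. These results indicate that the rapid and slow traits are quantitative traits controlled by a polygenic system. The rapid genes showed a homeostatic effect under natural selection, whereas slow behavior was found to be governed by a small number of major genes.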
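
For readers unfamiliar with realized heritability, the sketch below shows the standard single-generation calculation from the breeder's equation, h² = R / S, where S is the selection differential and R is the response to selection. The abstract does not say how the authors computed their values, and every number in the example is hypothetical.

```python
def realized_heritability(pop_mean, selected_mean, offspring_mean):
    """Realized heritability h^2 = R / S for one generation of selection.

    S (selection differential) = mean of selected parents - population mean
    R (response to selection)  = mean of offspring        - population mean
    """
    s = selected_mean - pop_mean
    r = offspring_mean - pop_mean
    if s == 0:
        raise ValueError("selection differential is zero; h^2 is undefined")
    return r / s

# Hypothetical walking-index means, chosen only to illustrate the arithmetic.
h2 = realized_heritability(pop_mean=6.5, selected_mean=8.5, offspring_mean=6.7)
print(f"{h2:.2f}")  # prints 0.10
```

Over several generations the same idea is usually applied to cumulative response and cumulative selection differential rather than to a single generation.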

Optimal Design for Marker-assisted Gene Pyramiding in Cross Population

  • Xu, L.Y.;Zhao, F.P.;Sheng, X.H.;Ren, H.X.;Zhang, L.;Wei, C.H.;Du, L.X.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • Vol. 25, No. 6
    • /
    • pp.772-784
    • /
    • 2012
  • Marker-assisted gene pyramiding aims to produce individuals with superior economic traits according to an optimal breeding scheme, which involves selecting a series of favorable target alleles after crossing base populations and pyramiding them into a single genotype. Inspired by evolutionary computation, we used the metaphor of hill-climbing to model the dynamic behavior of gene pyramiding. Considering traditional animal cross programs along with the features of animal segregating populations, four types of cross programs and two types of selection strategies for gene pyramiding are examined from a practical perspective. A two-population cross for pyramiding two genes (denoted II), a three-population cascading cross for pyramiding three genes (denoted III), and four-population symmetric (denoted IIII-S) and cascading (denoted IIII-C) crosses for pyramiding four genes are considered, and various schemes (denoted cross program-A-E) are designed for each cross program given different levels of initial favorable allele frequency, base population size, and trait heritability. The gene pyramiding breeding process for the various schemes is simulated and compared based on the population Hamming distance, average superior genotype frequencies, and average phenotypic values. The simulation results show that the larger the base population size and the higher the initial favorable allele frequency, the higher the efficiency of gene pyramiding. The order in which parents are crossed is shown to be the most important factor in a cascading cross, but it has no significant influence on the symmetric cross. The results also show that the genotypic selection strategy is superior to phenotypic selection in accelerating gene pyramiding. Moreover, the method and corresponding software were used to compare different cross schemes and selection strategies.
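
One of the comparison criteria above, the population Hamming distance, can be sketched as follows. The abstract does not define it, so this example assumes it is the mean Hamming distance between each individual's genotype and the target genotype that carries all favorable alleles; the string encoding of genotypes and the example population are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def population_hamming_distance(population, target):
    """Mean Hamming distance from each individual to the target genotype."""
    return sum(hamming(ind, target) for ind in population) / len(population)

# Hypothetical diploid genotypes at two target loci; 'A' and 'B' are favorable alleles.
target = "AABB"
population = ["AABB", "AaBB", "aabb"]
distance = population_hamming_distance(population, target)
```

Under this assumed definition the distance falls toward 0 as favorable alleles accumulate, which gives a simple way to track the progress of a pyramiding scheme across generations.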

A Node2Vec-Based Gene Expression Image Representation Method for Effectively Predicting Cancer Prognosis

  • 최종환;박상현
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • Vol. 8, No. 10
    • /
    • pp.397-402
    • /
    • 2019
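  • Accurately predicting a patient's prognosis, such as the progression of the cancer or the patient's survival time, in order to provide cancer patients with an appropriate treatment plan is one of the important challenges in bioinformatics. Many studies have proposed machine learning models that predict prognosis from the gene expression data of cancer patients. Because gene expression data are high-dimensional numerical data with values for about 17,000 genes, previous studies have sought to improve model performance with feature selection or dimensionality reduction strategies. In these approaches, however, feature selection is separated from the training of the prediction model, so it is difficult for the machine learning model to know how the selected genes are biologically related. This study proposes a technique that converts gene expression data into image form so that feature selection and prognosis prediction can be carried out effectively. Node2Vec was used to integrate the biological interaction relationships between genes into the gene expression data, and a convolutional neural network model was used so that the expression data, represented as 2D images, could be learned effectively. The performance of the proposed model was evaluated by double cross validation, and it was confirmed to achieve higher prognosis prediction accuracy than machine learning models that use the gene expression data as is. The new image representation of gene expression using Node2Vec avoids the information loss caused by feature selection and can therefore improve the performance of prediction models, and this approach is expected to contribute to the advancement of personalized medicine.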

An Application of the Clustering Threshold Gradient Descent Regularization Method for Selecting Genes in Predicting the Survival Time of Lung Carcinomas

  • Lee, Seung-Yeoun;Kim, Young-Chul
    • Genomics & Informatics
    • /
    • Vol. 5, No. 3
    • /
    • pp.95-101
    • /
    • 2007
  • In this paper, we consider the variable selection methods in the Cox model when a large number of gene expression levels are involved with survival time. Deciding which genes are associated with survival time has been a challenging problem because of the large number of genes and relatively small sample size (n<

An enhanced feature selection filter for classification of microarray cancer data

  • Mazumder, Dilwar Hussain;Veilumuthu, Ramachandran
    • ETRI Journal
    • /
    • Vol. 41, No. 3
    • /
    • pp.358-370
    • /
    • 2019
  • The main aim of this study is to select the optimal set of genes from microarray cancer datasets that contribute to the prediction of specific cancer types. This study proposes the enhancement of the feature selection filter algorithm based on Joe's normalized mutual information and its use for gene selection. The proposed algorithm is implemented and evaluated on seven benchmark microarray cancer datasets, namely, central nervous system, leukemia (binary), leukemia (3 class), leukemia (4 class), lymphoma, mixed lineage leukemia, and small round blue cell tumor, using five well-known classifiers, including the naive Bayes, radial basis function network, instance-based classifier, decision-based table, and decision tree. An average increase in the prediction accuracy of 5.1% is observed on all seven datasets averaged over all five classifiers. The average reduction in training time is 2.86 seconds. The performance of the proposed method is also compared with those of three other popular mutual information-based feature selection filters, namely, information gain, gain ratio, and symmetric uncertainty. The results are impressive when all five classifiers are used on all the datasets.
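
Two of the baseline filters named in this abstract, information gain and symmetric uncertainty, have standard textbook definitions that can be sketched briefly. The example does not implement the proposed filter based on Joe's normalized mutual information. It assumes the gene's expression values have already been discretized, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def information_gain(x, y):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 2 * information_gain(x, y) / denom if denom > 0 else 0.0

# Hypothetical discretized expression of two genes across four samples.
labels = ["tumor", "tumor", "normal", "normal"]
informative_gene = ["hi", "hi", "lo", "lo"]    # perfectly tracks the class
uninformative_gene = ["hi", "lo", "hi", "lo"]  # independent of the class

su_informative = symmetric_uncertainty(informative_gene, labels)
su_uninformative = symmetric_uncertainty(uninformative_gene, labels)
```

A filter of this kind ranks genes by such a score and keeps the top-scoring ones before any classifier is trained.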

Expression of Porcine Epidemic Diarrhea Virus Spike Gene in Transgenic Carrot Plants

  • Kim, Young-Sook;Kwon, Tae-Ho;Yang, Moon-Sik
    • Plant Resources
    • /
    • Vol. 6, No. 2
    • /
    • pp.108-113
    • /
    • 2003
  • This study was carried out to obtain basic information on the feasibility of an oral vaccine in carrot using an Agrobacterium-mediated transformation system. The epitope region of the spike gene of porcine epidemic diarrhea virus (PEDV), which is classified as a member of the Coronaviridae and causes acute enteritis in pigs, was successfully expressed in carrot (Daucus carota) using the Agrobacterium-mediated transformation system. Hypocotyl segments of in vitro germinated plantlets were infected with Agrobacterium tumefaciens LBA 4404 harboring the PEDV spike gene. Embryogenic callus (EC) was induced on MS selection medium with 1 mg/L 2,4-D, 50 mg/L kanamycin and 300 mg/L cefotaxime after 45 days of culture. ECs subcultured on MS selection medium without 2,4-D were converted to somatic embryos (SE) at various stages: globular, heart, and torpedo. Putative transgenic embryos were selected on MS medium with 50 mg/L kanamycin and 300 mg/L cefotaxime. Regenerated plantlets from transformed SE were induced on MS medium containing 50 mg/L kanamycin after 30 days of culture. Genomic PCR confirmed the integration of the PEDV spike gene into the nuclear genome of carrot, and northern blot analysis demonstrated the expression of the PEDV spike gene in transgenic carrot.

Genetic Diversity and Clustering of the Rhoptry Associated Protein-1 of Plasmodium knowlesi from Peninsular Malaysia and Malaysian Borneo

  • Ummi Wahidah Azlan;Yee Ling Lau;Mun Yik Fong
    • Parasites, Hosts and Diseases
    • /
    • Vol. 60, No. 6
    • /
    • pp.393-400
    • /
    • 2022
  • Human infection with the simian malaria parasite Plasmodium knowlesi is a cause for concern in Southeast Asian countries, especially in Malaysia. A previous study on the Peninsular Malaysia P. knowlesi rhoptry associated protein-1 (PkRAP1) gene discovered the existence of dimorphism. In this study, genetic analysis of PkRAP1 was conducted in a larger number of P. knowlesi samples from Malaysian Borneo. The PkRAP1 of these P. knowlesi isolates was PCR-amplified and sequenced. The newly obtained PkRAP1 gene sequences (n=34) were combined with those from the previous study (n=26) and analysed for polymorphism and natural selection. Sequence analysis revealed a higher genetic diversity of PkRAP1 compared to the previous study. Exon II of the gene had higher diversity (π=0.0172) than exon I (π=0.0128). The diversity of the total coding region (π=0.0167) was much higher than those of RAP1 orthologues such as PfRAP-1 (π=0.0041) and PvRAP1 (π=0.00088). Z-test results indicated that the gene was under purifying selection. The phylogenetic tree and haplotype network showed distinct clustering of Peninsular Malaysia and Malaysian Borneo PkRAP1 haplotypes. This geography-based clustering of PkRAP1 haplotypes provides further evidence of the dimorphism of the gene and of the possible existence of 2 distinct P. knowlesi lineages in Malaysia.
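
The π values quoted above are nucleotide diversity estimates. A minimal sketch of the usual definition, the average number of pairwise nucleotide differences per site among aligned sequences, is shown below. The abstract does not say which software or corrections the authors used, and the toy alignment is hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

# Hypothetical alignment of three 4-bp sequences with one variable site.
alignment = ["ACGT", "ACGA", "ACGT"]
pi = nucleotide_diversity(alignment)
```

Here two of the three sequence pairs differ at one of four sites, so π is 2 / (3 x 4), about 0.167. Real analyses also need rules for gaps and ambiguous bases, which this sketch ignores.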