• Title/Summary/Keyword: Gene Algorithm


Data Mining Techniques for Analyzing Promoter Sequences (프로모터 염기서열 분석을 위한 데이터 마이닝 기법)

  • 김정자;이도헌
    • Journal of the Korea Institute of Information and Communication Engineering / v.4 no.4 / pp.739-744 / 2000
  • As DNA sequences have become known through the Genome Project, techniques for handling molecule-level gene information are being actively researched. Given the vast amount of known sequence information, it is also urgent to develop new computer algorithms for building databases and analyzing them efficiently. In this respect, this paper studies association rule search algorithms for finding the characteristics revealed by the association between promoter sequences and genes, which is one of the important research areas in molecular biology. This paper treats biological data, whereas previous search algorithms used transaction data. We therefore design a transformed association rule algorithm that covers the data types and biological properties. These research results will contribute to reducing the time and cost of biological experiments by minimizing their candidates.
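The two quantities behind association rule search, support and confidence, can be sketched as follows. This is a minimal illustration on ordinary set-valued transactions, not the transformed algorithm the paper designs; the motif names, the gene class labels, and the `promoters` data are all hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    antecedent = frozenset(antecedent)
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # rule never fires, so report zero confidence
    return support(antecedent | frozenset(consequent), transactions) / base


# Hypothetical data: each promoter is the set of motifs found in it,
# plus a label for the gene it regulates.
promoters = [
    frozenset({"TATA", "CAAT", "gene_class_A"}),
    frozenset({"TATA", "GC_box", "gene_class_A"}),
    frozenset({"CAAT", "GC_box", "gene_class_B"}),
    frozenset({"TATA", "CAAT", "gene_class_A"}),
]

print(support({"TATA"}, promoters))                       # 0.75
print(confidence({"TATA"}, {"gene_class_A"}, promoters))  # 1.0
```

A rule such as "TATA implies gene_class_A" would be kept only if both values exceed user-chosen thresholds, which is how candidate lists for experiments get shortened.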


A MA-plot-based Feature Selection by MRMR in SVM-RFE in RNA-Sequencing Data

  • Kim, Chayoung
    • The Journal of Korean Institute of Information Technology / v.16 no.12 / pp.25-30 / 2018
  • A method for constructing a Gene Regulatory Network (GRN) from RNA-Sequencing (RNA-Seq) data is badly lacking and urgently required in the Big-Data setting, where GRNs have attracted substantial attention as descriptions of the interactions among relevant feature genes and their regulation. We propose a new computational method for comparative selection of feature patterns that applies a minimum-redundancy maximum-relevancy (MRMR) filter within support vector machine-recursive feature elimination (SVM-RFE), with intensity-dependent normalization (DEGSEQ) as a preprocessor to emphasize equal precision in RNA-Seq Big-Data. We found that the proposed algorithm may be more scalable and convenient, because all the libraries it needs are available as R packages, and that it may improve both the time consumed on Big-Data and the minimum-redundancy maximum-relevancy of the selected set of feature patterns.

Raw Animal Meats as Potential Sources of Clostridium difficile in Al-Jouf, Saudi Arabia

  • Taha, Ahmed E.
    • Food Science of Animal Resources / v.41 no.5 / pp.883-893 / 2021
  • Clostridium difficile present in the feces of food animals may contaminate their meat and act as a potential source of C. difficile infection (CDI) in humans. C. difficile resistance to antibiotics and its production of toxins and spores play major roles in the pathogenesis of CDI. This is the first study to evaluate C. difficile prevalence in retail raw animal meats, its antibiotic susceptibilities and its toxigenic activities in Al-Jouf, Saudi Arabia. In total, 240 meat samples were tested. C. difficile was identified by standard microbiological and biochemical methods. The Vitek-2 compact system confirmed C. difficile isolates in 15/240 samples (6.3%). Toxins A/B were not detected by Xpect C. difficile toxin A/B tests. Although all isolates were susceptible to vancomycin and metronidazole, variable degrees of reduced susceptibility to moxifloxacin, clindamycin or tetracycline were detected by Epsilon tests. C. difficile strains with reduced susceptibility to antibiotics should be investigated. Variability between the C. difficile contamination levels reported worldwide could be due to the absence of a gold standard procedure for its isolation. Establishing a unified testing algorithm for C. difficile detection in food products is essential for evaluating the inter-regional variation in its prevalence at national and international levels. Proper use of antimicrobials during animal husbandry is crucial to control the selective drug pressure on C. difficile strains associated with food animals. The protective or pathogenic potential of non-toxigenic C. difficile strains, and the possibility of gene transfer from toxigenic or antibiotic-resistant strains to non-toxigenic or antibiotic-sensitive strains, respectively, deserve attention.

Development of a new explicit soft computing model to predict the blast-induced ground vibration

  • Alzabeebee, Saif;Jamei, Mehdi;Hasanipanah, Mahdi;Amnieh, Hassan Bakhshandeh;Karbasi, Masoud;Keawsawasvong, Suraparb
    • Geomechanics and Engineering / v.30 no.6 / pp.551-564 / 2022
  • Fragmenting the rock mass is considered the most important task in open-pit mines. Ground vibration is the most hazardous side effect of blasting and can cause critical damage to surrounding structures. This paper focuses on developing an explicit model to predict ground vibration through a multi-objective evolutionary polynomial regression (MOGA-EPR). To this end, a database of 79 data sets from a quarry site in Malaysia was used. In addition, a gene expression programming (GEP) model and several empirical equations were employed to predict ground vibration, and their performance was compared with that of the MOGA-EPR model using the mean absolute error (MAE), root mean square error (RMSE), mean (𝜇), standard deviation of the mean (𝜎), coefficient of determination ($R^2$) and a20-index. The comparison showed that the MOGA-EPR model predicted ground vibration more precisely than the GEP model and the empirical equations: MOGA-EPR scored lower MAE and RMSE, 𝜇 and 𝜎 closer to the optimum value, and higher $R^2$ and a20-index. Accordingly, the proposed MOGA-EPR model can be introduced as a useful method for predicting ground vibration and has the capacity to be generalized to predict other blasting effects.
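Several of the comparison metrics named above are easy to state in code. The sketch below assumes the common definitions: $R^2$ as one minus the ratio of residual to total sum of squares, and the a20-index as the fraction of samples whose predicted/measured ratio lies within 0.80 to 1.20. The abstract does not spell out either definition, and the `measured` and `predicted` values are made up for illustration.

```python
import math


def mae(measured, predicted):
    """Mean absolute error."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def rmse(measured, predicted):
    """Root mean square error."""
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def r2(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


def a20_index(measured, predicted):
    """Fraction of samples with predicted/measured within [0.80, 1.20] (assumed definition)."""
    hits = sum(0.8 <= p / m <= 1.2 for m, p in zip(measured, predicted))
    return hits / len(measured)


# Made-up vibration values, for illustration only.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.2, 3.0, 6.0, 10.0]

print(mae(measured, predicted), rmse(measured, predicted))
print(r2(measured, predicted), a20_index(measured, predicted))
```

Lower MAE and RMSE and higher $R^2$ and a20-index indicate a better fit, which is the direction of comparison the abstract reports.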

Efficient Mining of Frequent Subgraph with Connectivity Constraint

  • Moon, Hyun-S.;Lee, Kwang-H.;Lee, Do-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.267-271 / 2005
  • The goal of data mining is to extract new and useful knowledge from large-scale datasets. As the amount of available data grows explosively, it has become vitally important to develop faster data mining algorithms for various types of data. Recently, interest in data mining algorithms that operate on graphs has increased; in particular, mining frequent patterns from structured data such as graphs has drawn the attention of many research groups. A graph is a highly adaptable representation scheme that is used in many domains, including chemistry, bioinformatics and physics. For example, the chemical structure of a given substance can be modelled by an undirected labelled graph in which each node corresponds to an atom and each edge corresponds to a chemical bond between atoms. The Internet can also be modelled as a directed graph in which each node corresponds to a web site and each edge corresponds to a hypertext link between web sites. Notably, in bioinformatics, various kinds of newly discovered data such as gene regulation networks or protein interaction networks can be modelled as graphs. There have been a number of attempts to find useful knowledge in such graph-structured data. One of the most powerful analysis tools for graph-structured data is frequent subgraph analysis: recurring patterns in graph data can provide unmatched insights into that data. However, finding recurring subgraphs is computationally very expensive. At the core lie two computationally challenging problems: 1) subgraph isomorphism and 2) enumeration of subgraphs. Related to the former are the subgraph isomorphism problem (does graph A contain graph B?) and the graph isomorphism problem (are two graphs A and B the same or not?). Even these simplified versions of the subgraph mining problem are known to be NP-complete or isomorphism-complete, and no polynomial-time algorithm is known so far. The latter is also a difficult problem: without any constraint, all $2^n$ subgraphs must be generated, where n is the number of vertices of the input graph. To find frequent subgraphs in a larger graph database, it is therefore essential to place appropriate constraints on the subgraphs sought. Most current approaches focus on the frequency of a subgraph: the higher the frequency of a graph, the more attention that graph should receive. Recently, several algorithms that use level-by-level approaches to find frequent subgraphs have been developed. Some recently emerging applications suggest that other constraints, such as connectivity, could also be useful in mining subgraphs: more strongly connected parts of a graph are more informative. If the set of subgraphs to mine is restricted to more strongly connected parts, the computational complexity could decrease significantly. In this paper, we present an efficient algorithm to mine frequent subgraphs that are more strongly connected. An experimental study shows that the algorithm scales to larger graphs with more than ten thousand vertices.
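The effect of a structural constraint on the enumeration step can be seen on a toy graph. The sketch below enumerates every non-empty vertex subset (the $2^n$ growth mentioned above) and keeps only those whose induced subgraph is connected. Plain connectedness is used here as a simple stand-in; it is an assumption for illustration and is weaker than the connectivity constraint the paper proposes, and brute-force enumeration is not the paper's algorithm.

```python
from itertools import combinations


def is_connected(vertices, adjacency):
    """Depth-first search restricted to `vertices`; True if all are reached."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adjacency[v]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def vertex_subsets(adjacency):
    """Yield every non-empty subset of vertices (2^n - 1 of them)."""
    nodes = sorted(adjacency)
    for size in range(1, len(nodes) + 1):
        yield from combinations(nodes, size)


# Toy undirected path graph a - b - c - d (hypothetical example).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

all_subsets = list(vertex_subsets(path))
connected = [s for s in all_subsets if is_connected(s, path)]
print(len(all_subsets), len(connected))  # 15 10
```

Even on four vertices the constraint removes a third of the candidates; on sparse graphs the gap widens quickly as n grows, which is the intuition behind constraining the search.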


Investigation of Conservative Genes in 711 Prokaryotes (원핵생물 711종의 보존적 유전자 탐색)

  • Lee, Dong-Geun;Lee, Sang-Hyeon
    • Journal of Life Science / v.25 no.9 / pp.1007-1013 / 2015
  • A COG (Cluster of Orthologous Groups of proteins) algorithm was applied to detect conserved genes in 711 prokaryotes. Only COG0080 (ribosomal protein L11) was common to all 711 prokaryotes analyzed, and 58 COGs were common to more than 700 prokaryotes. Nine of these 58 COGs, including COG0197 (endonuclease III) and COG0088 (ribosomal protein L4), were conserved as one gene per organism. COG0008 represented 1356 genes in 709 of the prokaryotes, the highest number of genes among the 58 COGs. Twenty-two COGs were conserved in more than 708 prokaryotes. Of these, two were transcription related, four were tRNA synthetases, eight were large ribosomal subunits, seven were small ribosomal subunits, and one was a translation elongation factor. Among the 58 COGs conserved in more than 700 prokaryotes, 50 (86.2%) were translation related and four (6.9%) were transcription related, pointing to the importance of protein synthesis in prokaryotes. Among these 58 COGs, the most conserved was COG0060 (isoleucyl tRNA synthetase), and the least conserved was COG0143 (methionyl tRNA synthetase). Archaea and eubacteria were discriminated in the genomic analysis by the average distance and the variation in distance of common COGs. The identification of these conserved genes could be useful in basic and applied research, such as antibiotic development and cancer therapeutics.
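The counting behind statements such as "common to all 711" or "common to more than 700" can be sketched as a prevalence tally. This is only an illustration of the bookkeeping, assuming each genome has already been annotated with a set of COG identifiers; the three toy genomes and their COG assignments are hypothetical, and the COG assignment step itself is not shown.

```python
from collections import Counter


def cog_prevalence(genome_to_cogs):
    """Count, for each COG, the number of genomes in which it occurs."""
    counts = Counter()
    for cogs in genome_to_cogs.values():
        counts.update(set(cogs))  # each genome contributes at most once per COG
    return counts


def conserved_cogs(genome_to_cogs, min_genomes):
    """COGs present in at least `min_genomes` genomes, sorted by identifier."""
    counts = cog_prevalence(genome_to_cogs)
    return sorted(cog for cog, n in counts.items() if n >= min_genomes)


# Hypothetical annotations for three toy genomes.
toy_genomes = {
    "genome_1": {"COG0080", "COG0088", "COG0197"},
    "genome_2": {"COG0080", "COG0088"},
    "genome_3": {"COG0080", "COG0197"},
}

print(conserved_cogs(toy_genomes, 3))  # ['COG0080']
print(conserved_cogs(toy_genomes, 2))  # ['COG0080', 'COG0088', 'COG0197']
```

Relaxing the threshold from "all genomes" to "nearly all" is what moves the result from a single COG to a larger conserved set.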

2D-QSAR analysis for hERG ion channel inhibitors (hERG 이온채널 저해제에 대한 2D-QSAR 분석)

  • Jeon, Eul-Hye;Park, Ji-Hyeon;Jeong, Jin-Hee;Lee, Sung-Kwang
    • Analytical Science and Technology / v.24 no.6 / pp.533-543 / 2011
  • The hERG (human ether-a-go-go related gene) ion channel is a main factor in cardiac repolarization, and blockade of this channel can induce arrhythmia and sudden death. Potential hERG ion channel inhibitors are therefore a primary concern in the drug discovery process, and much effort is focused on minimizing cardiotoxic side effects. In this study, $IC_{50}$ data for 202 organic compounds in HEK (human embryonic kidney) cells, taken from the literature, were used to develop a predictive 2D-QSAR model. Multiple linear regression (MLR), support vector machine (SVM), and artificial neural network (ANN) were used as machine learning methods to predict the inhibition concentration for the hERG ion channel. A population-based forward selection method with a cross-validation procedure was combined with each learning method and used to select the best subset of descriptors for each learning algorithm. The best model was the ANN model based on 14 descriptors ($R^2_{CV}$=0.617, RMSECV=0.762, MAECV=0.583), while the MLR model could describe the structural characteristics of the inhibitors and their interaction with hERG receptors. The QSAR models were validated through 5-fold cross-validation and a Y-scrambling test.

The partial matching method for effective recognizing HLA entities (효과적인 HLA개체인식을 위한 부분매칭기법)

  • Chae, Jeong-Min;Jung, Young-Hee;Lee, Tae-Min;Chae, Ji-Eun;Oh, Heung-Bum;Jung, Soon-Young
    • The Journal of Korean Association of Computer Education / v.14 no.2 / pp.83-94 / 2011
  • In the biomedical domain, the longest matching method is frequently used to recognize named entities in the literature. This method uses a dictionary as the resource for named entity recognition. If an appropriate dictionary exists for the target domain, the longest matching method can recognize entities of that domain quickly and accurately. However, it has difficulty recognizing enumerated named entities, because such entities are frequently written with some words omitted. To resolve this problem, we propose a partial matching method that uses a dictionary. The proposed method generates several candidate entities on the assumption that ellipses may be present, and then selects the most valid candidate through an optimization algorithm. We tested the longest and partial matching methods on HLA entities (HLA gene, antigen, and allele entities), which are frequently enumerated among biomedical entities. In preparation for named entity recognition, we built two new resources for HLA entities, an extended dictionary and a tag-based dictionary, and then ran the longest and partial matching methods with each dictionary. According to our experimental results, the longest matching method was effective in recognizing HLA antigen entities, in which ellipses are rare, and the partial matching method was effective in recognizing HLA gene and allele entities, in which ellipses are frequent. In particular, the partial matching method achieved a high F-score of 95.59% on HLA alleles.
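The baseline that the abstract starts from, dictionary-based longest matching, and its weakness on enumerated entities can be sketched in a few lines. This shows only the longest matching baseline under simple assumptions (whitespace tokens, exact dictionary lookup); it does not attempt the authors' partial matching or its optimization step. The dictionary entries, the example sentence, and the `max_len` parameter are hypothetical.

```python
def longest_match(tokens, dictionary, max_len=4):
    """Greedy left-to-right scan: at each position take the longest dictionary span."""
    entities = []
    i = 0
    while i < len(tokens):
        match_end = None
        # Try the longest candidate span first, then shrink it one token at a time.
        for j in range(min(len(tokens), i + max_len), i, -1):
            if " ".join(tokens[i:j]) in dictionary:
                match_end = j
                break
        if match_end is None:
            i += 1  # no entity starts here
        else:
            entities.append(" ".join(tokens[i:match_end]))
            i = match_end  # continue after the matched span
    return entities


# Hypothetical dictionary and sentence.
hla_dictionary = {"HLA-A", "HLA-B", "HLA class I"}
sentence = "HLA class I genes such as HLA-A and -B were typed".split()

found = longest_match(sentence, hla_dictionary)
print(found)  # ['HLA class I', 'HLA-A']
```

The enumerated mention "-B" stands for "HLA-B" but is missed because the prefix was omitted, which is the kind of ellipsis that motivates generating candidates with the missing words restored.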


Identification and Biochemical Characterization of Xylanase-producing Streptomyces glaucescens subsp. WJ-1 Isolated from Soil in Jeju Island, Korea (제주도 토양에서 분리한 xylanase 생산균주 Streptomyces glaucescens subsp. WJ-1의 동정 및 효소의 생화학적 특성 연구)

  • Kim, Da Som;Jung, Sung Cheol;Bae, Chang Hwan;Chi, Won-Jae
    • Microbiology and Biotechnology Letters / v.45 no.1 / pp.43-50 / 2017
  • A xylan-degrading bacterium (strain WJ-1) was isolated from soil collected from Jeju Island, Republic of Korea. Strain WJ-1 was characterized as a gram-positive, aerobic, and spore-forming bacterium. The predominant fatty acid in this bacterium was anteiso-$C_{15:0}$ (42.99%). A similarity search based on 16S rRNA gene sequences suggested that the strain belonged to the genus Streptomyces. Further, strain WJ-1 shared the highest sequence similarity with the type strains Streptomyces spinoverrucosus NBRC 14228, S. minutiscleroticus NBRC 13000, and S. glaucescens NBRC 12774. Together, they formed a coherent cluster in a phylogenetic tree based on the neighbor-joining algorithm. The DNA G+C content of strain WJ-1 was 74.7 mol%. The level of DNA-DNA relatedness between strain WJ-1 and the most closely related species, S. glaucescens NBRC 12774, was 85.7%. DNA-DNA hybridization, 16S rRNA gene sequence similarity, and the phenotypic and chemotaxonomic characteristics suggest that strain WJ-1 constitutes a novel subspecies of S. glaucescens. Thus, the strain was designated as S. glaucescens subsp. WJ-1 (Korean Agricultural Culture Collection [KACC] accession number 92086). Additionally, strain WJ-1 secreted thermostable endo-type xylanases that converted xylan to xylooligosaccharides such as xylotriose and xylotetraose. The enzymes exhibited optimal activity at pH 7.0 and $55^{\circ}C$.

Genetic Composition Analysis of Marine-Origin Euryarchaeota by using a COG Algorithm (COG 알고리즘을 통한 해양성 Euryarchaeota의 유전적 조성 분석)

  • 이재화;이동근;김철민;이은열
    • Journal of Life Science / v.13 no.3 / pp.298-307 / 2003
  • To identify the conserved genes and the newly added genes at each phylogenetic level of Archaea, the COG (clusters of orthologous groups of proteins) algorithm was applied. The number of conserved genes within 9 species of Archaea was 340, and that within 8 species of Euryarchaeota was 388. Many of the 265 conserved COGs that are specific to Archaea and absent in Bacteria and S. cerevisiae were concerned with 'information storage and processing' (94 COGs, 35.5%) and 'metabolism' (82 COGs, 30.9%). COGs related to these functions were assumed to be highly conserved and to give Archaea their peculiar life form. There appeared to be some difference in 'nucleotide transport and metabolism' and little difference in 'information storage and processing' between Euryarchaeota and Crenarchaeota. Marine-origin Euryarchaeota showed conserved COGs different from those of terrestrial Euryarchaeota; the conserved COGs related to carbohydrate transport and metabolism, among others, differed between marine- and terrestrial-origin Euryarchaeota. Hence it was assumed that their physiology might differ. This study may help in understanding the origin and conserved genes at each phylogenetic level of marine-origin Euryarchaeota, and may help in mining useful genes from marine Archaea, as in Manco et al. (Arch. Biochem. Biophys. 373, 182 (2000)).