• Title/Abstract/Keywords: Bioinformatics data

Search results: 646 items; processing time 0.025 s

Classification Methods for Automated Prediction of Power Load Patterns

  • ;박진형;이헌규;류근호
    • Korean Institute of Information Scientists and Engineers: Conference Proceedings
    • /
    • Korean Institute of Information Scientists and Engineers, Proceedings of the Korea Computer Congress 2008, Vol.35 No.1 (C)
    • /
    • pp.26-30
    • /
    • 2008
  • An automated methodology based on data mining techniques is presented for predicting customer load patterns in long-duration load profiles. The proposed approach consists of three stages: (i) data pre-processing, in which noise and outliers are removed and continuous-valued attributes are transformed into discrete values; (ii) cluster analysis, in which k-means clustering is used to create load pattern classes and a representative load profile for each class; and (iii) classification, in which several supervised learning methods are evaluated in order to select a suitable prediction method. Power load measured by an AMR (automatic meter reading) system, together with customer indexes, was used as input for clustering. The output of clustering was a set of representative load profiles (or classes). To evaluate the forecasting of load patterns, several classification methods were applied to a set of high-voltage customers of the Korean power system, with the class labels derived from clustering and other features used as input to build the classifiers. Finally, the experimental results are presented.
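
Stage (ii) of the methodology above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the toy profiles, the function names, and the farthest-first initialization are assumptions introduced here, since the abstract only states that k-means is used to form load pattern classes and representative profiles.

```python
from typing import List, Sequence, Tuple

Profile = Sequence[float]


def squared_distance(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance between two load profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles: List[Profile]) -> List[float]:
    """Element-wise mean, used as the representative profile of a class."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def initial_centroids(profiles: List[Profile], k: int) -> List[List[float]]:
    """Deterministic farthest-first seeding (an assumption, not from the paper)."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(
            profiles,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def k_means(
    profiles: List[Profile], k: int, max_iterations: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Return a class label per profile and one representative profile per class."""
    centroids = initial_centroids(profiles, k)
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = mean_profile(members)
    return labels, centroids


# Hypothetical toy data: four readings per customer.
toy_profiles = [
    [1.0, 5.0, 5.0, 1.0],  # daytime-heavy
    [1.0, 6.0, 5.0, 1.0],
    [1.0, 5.0, 6.0, 2.0],
    [5.0, 1.0, 1.0, 5.0],  # night-heavy
    [6.0, 1.0, 1.0, 5.0],
    [5.0, 2.0, 1.0, 6.0],
]
labels, representatives = k_means(toy_profiles, k=2)
```

The resulting `labels` could then serve as class labels for the supervised learners evaluated in stage (iii), alongside the other customer features.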

LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19

  • Ouyang, Sizhuo;Wang, Yuxing;Zhou, Kaiyin;Xia, Jingbo
    • Genomics & Informatics
    • /
    • Vol.19 No.3
    • /
    • pp.23.1-23.7
    • /
    • 2021
  • The coronavirus disease 2019 (COVID-19) literature has been growing dramatically, and the increased amount of text makes it possible to perform large-scale text mining and knowledge discovery. Curation of these texts has therefore become a crucial issue for the Biomedical Natural Language Processing (BioNLP) community in order to retrieve important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system that provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Inspired by the integration of multiple useful COVID-19 annotations, we merged three annotation resources into the LitCovid data set and constructed a cross-annotated corpus, LitCovid-AGAC. This corpus consists of 12 labels (Mutation, Species, Gene, and Disease from PubTator; GO and CHEBI from OGER; and Var, MPA, CPA, NegReg, PosReg, and Reg from AGAC) over 50,018 COVID-19 abstracts in LitCovid. The corpus contains sufficiently abundant information to make it possible to unveil hidden knowledge about the pathological mechanism of COVID-19.
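
The idea of merging several annotation layers over one abstract can be illustrated with a short sketch. Everything here is an assumption made for illustration: the sentence, the character offsets, the `Annotation` record, and the `merge_annotations` helper are hypothetical and are not the PubAnnotation API or the corpus's actual data.

```python
from typing import Dict, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    """One labelled character span and the resource it came from."""
    begin: int
    end: int
    label: str
    source: str


def merge_annotations(
    layers: Dict[str, List[Tuple[int, int, str]]]
) -> List[Annotation]:
    """Combine span annotations from several resources over the same text.

    Spans with an identical (begin, end, label) are kept once, attributed
    to the first resource that supplied them; the result is sorted by
    position in the text.
    """
    merged: Dict[Tuple[int, int, str], Annotation] = {}
    for source, spans in layers.items():
        for begin, end, label in spans:
            merged.setdefault(
                (begin, end, label), Annotation(begin, end, label, source)
            )
    return sorted(merged.values())


# Hypothetical sentence and spans, invented for this example.
text = "ACE2 mutation increases SARS-CoV-2 infection"
layers = {
    "PubTator": [(0, 4, "Gene"), (24, 34, "Species")],
    "OGER": [(35, 44, "GO")],
    "AGAC": [(5, 13, "Var"), (14, 23, "PosReg")],
}
cross_annotated = merge_annotations(layers)
```

Each merged record can be checked against the text with `text[a.begin:a.end]`, which is a quick way to confirm that the layers are aligned to the same character offsets.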

Emerging Patterns Mining for Classifying Non-Safe Electrical Sections in Power Distribution System

  • ;;이헌규;신진호;류근호
    • Korea Information Processing Society: Conference Proceedings
    • /
    • Korea Information Processing Society 2008 Fall Conference
    • /
    • pp.325-327
    • /
    • 2008
  • In the electrical industry, classification methodology has been an important issue for analyzing power consumption patterns. It has many applications, including decisions on energy purchasing and load switching, as well as support for infrastructure development. Our aim in this work is to classify electrical sections and identify potentially non-safe ones. For this purpose, we use emerging-pattern-based classification, in which the aggregate score of emerging patterns is used to build the classifier. The proposed methodology was applied to a set of electrical section data from the Korean power system. The test data and the related electricity information and knowledge were provided by the Korea Electric Power Research Institute (KEPRI).
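
A rough sketch of classification by aggregated emerging patterns follows. It assumes a CAEP-style score, in which each matching pattern contributes growth/(growth+1) times its support in the target class; the abstract does not give the exact scoring, so this choice, the attribute names, and the toy records are all hypothetical. Score normalization across classes is omitted for brevity.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Record = FrozenSet[str]
PatternTable = Dict[FrozenSet[str], Tuple[float, float]]  # pattern -> (support, growth)


def support(pattern: FrozenSet[str], records: List[Record]) -> float:
    """Fraction of records that contain every item of the pattern."""
    if not records:
        return 0.0
    return sum(1 for r in records if pattern <= r) / len(records)


def mine_emerging_patterns(
    target: List[Record],
    other: List[Record],
    min_growth: float = 2.0,
    max_size: int = 2,
) -> PatternTable:
    """Itemsets whose support grows by at least min_growth from other to target."""
    items = sorted(set().union(*target))
    patterns: PatternTable = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            s_target = support(pattern, target)
            if s_target == 0.0:
                continue
            s_other = support(pattern, other)
            growth = float("inf") if s_other == 0.0 else s_target / s_other
            if growth >= min_growth:
                patterns[pattern] = (s_target, growth)
    return patterns


def aggregate_score(record: Record, patterns: PatternTable) -> float:
    """Sum the contributions of all emerging patterns contained in the record."""
    score = 0.0
    for pattern, (s_target, growth) in patterns.items():
        if pattern <= record:
            weight = 1.0 if growth == float("inf") else growth / (growth + 1.0)
            score += weight * s_target
    return score


def classify(record: Record, patterns_by_class: Dict[str, PatternTable]) -> str:
    """Pick the class whose emerging patterns give the highest aggregate score."""
    return max(
        patterns_by_class,
        key=lambda label: aggregate_score(record, patterns_by_class[label]),
    )


# Hypothetical discretized section records, invented for this example.
non_safe = [
    frozenset({"load=high", "age=old"}),
    frozenset({"load=high", "age=old"}),
    frozenset({"load=high", "age=new"}),
]
safe = [
    frozenset({"load=low", "age=new"}),
    frozenset({"load=low", "age=old"}),
    frozenset({"load=low", "age=new"}),
]
patterns_by_class = {
    "non_safe": mine_emerging_patterns(non_safe, safe),
    "safe": mine_emerging_patterns(safe, non_safe),
}
```

For instance, `classify(frozenset({"load=high", "age=new"}), patterns_by_class)` favors the non-safe class here because the high-load patterns never occur among the toy safe records.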

Improved Statistical Testing of Two-class Microarrays with a Robust Statistical Approach

  • Oh, Hee-Seok;Jang, Dong-Ik;Oh, Seung-Yoon;Kim, Hee-Bal
    • Interdisciplinary Bio Central
    • /
    • Vol.2 No.2
    • /
    • pp.4.1-4.6
    • /
    • 2010
  • The most common type of microarray experiment has a simple design using microarray data obtained from two different groups or conditions. A typical method for identifying differentially expressed genes (DEGs) between two conditions is the conventional Student's t-test. The t-test is based on a simple estimate of the population variance for a gene using the sample variance of its expression levels. Although the empirical Bayes approach improves on the t-statistic by not giving a high rank to genes merely because they have a small sample variance, it rests on the same basic assumption as the ordinary t-test, namely the equality of variances across experimental groups. Both the t-test and the empirical Bayes approach suffer from low statistical power because they assume normal, unimodal distributions for the microarray data. We propose a method to address these problems that is robust to outliers and skewed data while maintaining the advantages of the classical t-test and modified t-statistics. The resulting data transformation to fit the normality assumption increases the statistical power for identifying DEGs using these statistics.
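
To make the sensitivity described above concrete, the sketch below computes the conventional equal-variance (pooled) Student's t-statistic for a single gene and shows how one outlier shrinks it. It does not implement the robust method proposed in the paper, whose details are not given in the abstract; the function name and the expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pooled_t_statistic(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Two-sample Student's t-statistic under the equal-variance assumption."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled estimate of the common variance from the two sample variances.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical expression levels of one gene under two conditions.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
treated_with_outlier = [4.0, 5.0, 60.0]

t_clean = pooled_t_statistic(control, treated)
t_outlier = pooled_t_statistic(control, treated_with_outlier)
```

Although the outlier increases the difference in group means, it inflates the pooled variance even more, so `abs(t_outlier)` ends up smaller than `abs(t_clean)`. This is the kind of power loss a robust approach aims to avoid.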

DESIGN AND IMPLEMENTATION OF METADATA MODEL FOR SENSOR DATA STREAM

  • Lee, Yang-Koo;Jung, Young-Jin;Ryu, Keun-Ho;Kim, Kwang-Deuk
    • Korean Society of Remote Sensing: Conference Proceedings
    • /
    • Korean Society of Remote Sensing 2006, Proceedings of ISRS 2006 PORSEC Volume II
    • /
    • pp.768-771
    • /
    • 2006
  • In a WSN (Wireless Sensor Network) environment, a large number of small, heterogeneous sensors continuously generate data streams in physical space. The data from these sensors consist of measured values and metadata. The metadata include various features such as location, sampling time, measurement unit, and sensor type. Until now, wireless sensors have been managed with individual specifications rather than an explicitly standardized metadata format, so it is difficult to collect data from heterogeneous sensors and to communicate between them. To solve this problem, the OGC (Open Geospatial Consortium) has proposed SensorML (Sensor Model Language), which can manage the metadata of heterogeneous sensors in a unified format. In this paper, we introduce a metadata model based on the SensorML specification to manage various sensors distributed over a wide area. In addition, we implement a metadata management module for a sensor data stream management system. It provides functions such as generating metadata files and registering and storing them according to the SensorML definition.

Correlation Analysis of forest fire data based on Clustering Method

  • 김은희;지정희;손호선;류근호;이충호
    • Korea Spatial Information System Society: Conference Proceedings
    • /
    • Korea Spatial Information System Society 2005 Fall Conference
    • /
    • pp.81-86
    • /
    • 2005
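  • In this paper, we propose a method that groups forest fire data using a data mining clustering technique in order to predict patterns of forest fire occurrence, and then uses the result to analyze correlations in the forest fire data. That is, the clustering technique classifies the forest fire data into a user-specified number of groups, and the resulting cluster model makes it possible to predict new types of forest fire patterns. In addition, to generate the result clusters, previous forest fire distribution data are stored and managed, sequences are generated through correlation analysis between clusters, and the individual cluster sequences are integrated to extract sequences of clusters, providing a way to predict the types of forest fire likely to occur after a fire has occurred. The method can be used to classify and analyze not only the types of forest fire that occurred in the past but also new types.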

Prediction of Quantitative Traits Using Common Genetic Variants: Application to Body Mass Index

  • Bae, Sunghwan;Choi, Sungkyoung;Kim, Sung Min;Park, Taesung
    • Genomics & Informatics
    • /
    • Vol.14 No.4
    • /
    • pp.149-159
    • /
    • 2016
  • With the success of the genome-wide association studies (GWASs), many candidate loci for complex human diseases have been reported in the GWAS catalog. Recently, many disease prediction models based on penalized regression or statistical learning methods were proposed using candidate causal variants from significant single-nucleotide polymorphisms of GWASs. However, there have been only a few systematic studies comparing existing methods. In this study, we first constructed risk prediction models, such as stepwise linear regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN), using a GWAS chip and GWAS catalog. We then compared the prediction accuracy by calculating the mean square error (MSE) value on data from the Korea Association Resource (KARE) with body mass index. Our results show that SLR provides a smaller MSE value than the other methods, while the numbers of selected variables in each model were similar.
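
The comparison criterion used above, mean square error, is simple enough to sketch directly. The model names and all numbers below are invented toy values for illustration only; they are not results from the KARE data or from the paper.

```python
from typing import Dict, Sequence


def mean_squared_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of squared differences between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical body mass index values and predictions from two unnamed models.
observed_bmi = [22.0, 25.0, 30.0]
predictions: Dict[str, Sequence[float]] = {
    "model_a": [23.0, 24.0, 29.0],
    "model_b": [20.0, 25.0, 33.0],
}

mse_by_model = {
    name: mean_squared_error(observed_bmi, predicted)
    for name, predicted in predictions.items()
}
# The model with the smallest MSE is preferred under this criterion.
best_model = min(mse_by_model, key=mse_by_model.get)
```

In practice the MSE would be computed on held-out samples rather than on the data used to fit each model, so that models selecting different numbers of variables are compared fairly.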

Binding Studies of Erythromycin A and its Analogues using Molecular Docking Technique

  • Kamarulzaman, Ezatul Ezleen;Mordi, Mohd Nizam;Mansur, Shariff Mahsufi;Wahab, Habibah
    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology 2005, BIOINFO 2005
    • /
    • pp.35-40
    • /
    • 2005
  • The interaction of twelve erythromycin A analogues with the 50S ribosomal subunit was studied using AutoDock 3.0.5. The results showed that all active macrolides bound at the same binding site as erythromycin A, in contrast to the inactive analogues, which bound at a location slightly different from that of erythromycin A. The binding site was consistent with the X-ray data in terms of the hydrogen bonding and hydrophobic interactions formed by erythromycins, roxithromycin, azithromycin, cethromycin, and telithromycin with the ribosome. The inactive derivatives of erythromycin A anhydride showed higher binding free energy, while 5-desosaminyl erythronolides A and B, despite having binding free energies quite similar to those of the active analogues, docked at binding sites quite different from those of the active analogues. These results suggest that the molecular docking technique can be used to predict the binding of erythromycin A analogues to their ribosomal target.

High-level Expression, Polyclonal Antibody Preparation and Bioinformatics Analysis of Bombyx mori Nucleopolyhedrovirus orf47 Encodes Protein

  • Wu, Chao;Guo, Zhongjian;Chen, Keping;Shen, Hongxing
    • International Journal of Industrial Entomology and Biomaterials
    • /
    • Vol.16 No.2
    • /
    • pp.87-92
    • /
    • 2008
  • The Bombyx mori nucleopolyhedrovirus (BmNPV) orf47 gene was characterized for the first time. The coding sequence of Bm47 was amplified and subcloned into the prokaryotic expression vector pET-30a(+) in order to produce a His-tagged fusion protein in BL21 (DE3) cells. The His-Bm47 fusion protein was expressed efficiently after induction with IPTG. The purified fusion protein was used to immunize New Zealand white rabbits to prepare a polyclonal antibody. Because the genome of BmNPV is available in GenBank and the EST database of BmNPV is expanding, novel BmNPV genes can be identified with data-mining techniques and bioinformatics tools. A structural bioinformatics approach was used to analyze the properties of the protein encoded by Bm47.

Genomic Applications of Biochip Informatics

  • 김주한
    • Genome Newsletter
    • /
    • Vol.5 No.4
    • /
    • pp.9-16
    • /
    • 2005
  • Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic expression data transforms the challenges in biomedical research into ones in bioinformatics. Clinical informatics has long developed technologies to improve biomedical research by integrating experimental and clinical information systems. Biomedical informatics, powered by high-throughput techniques, genomic-scale databases, and advanced clinical information systems, is likely to transform our biomedical understanding forever, much the same way that biochemistry did to biology a generation ago. The emergence of healthcare and biomedical informatics, revolutionizing both bioinformatics and clinical informatics, will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics.