• Title/Summary/Keyword: bayesian network

Search Result 509, Processing Time 0.027 seconds

Nonstandard Machine Learning Algorithms for Microarray Data Mining

  • Zhang, Byoung-Tak
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2001.10a
    • /
    • pp.165-196
    • /
    • 2001
  • DNA chip 또는 microarray는 다수의 유전자 또는 유전자 조각을 (보통 수천내지 수만 개)칩상에 고정시켜 놓고 DNA hybridization 반응을 이용하여 유전자들의 발현 양상을 분석할 수 있는 기술이다. 이러한 high-throughput기술은 예전에는 생각하지 못했던 여러가지 분자생물학의 문제에 대한 해답을 제시해 줄 수 있을 뿐 만 아니라, 분자수준에서의 질병 진단, 신약 개발, 환경 오염 문제의 해결 등 그 응용 가능성이 무한하다. 이 기술의 실용적인 적용을 위해서는 DNA chip을 제작하기 위한 하드웨어/웻웨어 기술 외에도 이러한 데이터로부터 최대한 유용하고 새로운 지식을 창출하기 위한 bioinformatics 기술이 핵심이라고 할 수 있다. 유전자 발현 패턴을 데이터마이닝하는 문제는 크게 clustering, classification, dependency analysis로 구분할 수 있으며 이러한 기술은 통계학과인공지능 기계학습에 기반을 두고 있다. 주로 사용된 기법으로는 principal component analysis, hierarchical clustering, k-means, self-organizing maps, decision trees, multilayer perceptron neural networks, association rules 등이다. 본 세미나에서는 이러한 기본적인 기계학습 기술 외에 최근에 연구되고 있는 새로운 학습 기술로서 probabilistic graphical model (PGM)을 소개하고 이를 DNA chip 데이터 분석에 응용하는 연구를 살펴본다. PGM은 인공신경망, 그래프 이론, 확률 이론이 결합되어 형성된 기계학습 모델로서 인간 두뇌의 기억과 학습 기작에 기반을 두고 있으며 다른 기계학습 모델과의 큰 차이점 중의 하나는 generative model이라는 것이다. 즉 일단 모델이 만들어지면 이것으로부터 새로운 데이터를 생성할 수 있는 능력이 있어서, 만들어진 모델을 검증하고 이로부터 새로운 사실을 추론해 낼 수 있어 biological data mining 문제에서와 같이 새로운 지식을 발견하는 exploratory analysis에 적합하다. 또한probabilistic graphical model은 기존의 신경망 모델과는 달리 deterministic한의사결정이 아니라 확률에 기반한 soft inference를 하고 학습된 모델로부터 관련된 요인들간의 인과관계(causal relationship) 또는 상호의존관계(dependency)를 분석하기에 적합한 장점이 있다. 군체적인 PGM 모델의 예로서, Bayesian network, nonnegative matrix factorization (NMF), generative topographic mapping (GTM)의 구조와 학습 및 추론알고리즘을소개하고 이를 DNA칩 데이터 분석 평가 대회인 CAMDA-2000과 CAMDA-2001에서 사용된cancer diagnosis 문제와 gene-drug dependency analysis 문제에 적용한 결과를 살펴본다.

  • PDF

Exploring the Feature Selection Method for Effective Opinion Mining: Emphasis on Particle Swarm Optimization Algorithms

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.11
    • /
    • pp.41-50
    • /
    • 2020
  • Sentimental analysis begins with the search for words that determine the sentimentality inherent in data. Managers can understand market sentimentality by analyzing a number of relevant sentiment words which consumers usually tend to use. In this study, we propose exploring performance of feature selection methods embedded with Particle Swarm Optimization Multi Objectives Evolutionary Algorithms. The performance of the feature selection methods was benchmarked with machine learning classifiers such as Decision Tree, Naive Bayesian Network, Support Vector Machine, Random Forest, Bagging, Random Subspace, and Rotation Forest. Our empirical results of opinion mining revealed that the number of features was significantly reduced and the performance was not hurt. In specific, the Support Vector Machine showed the highest accuracy. Random subspace produced the best AUC results.

Robust Particle Filter Based Route Inference for Intelligent Personal Assistants on Smartphones (스마트폰상의 지능형 개인화 서비스를 위한 강인한 파티클 필터 기반의 사용자 경로 예측)

  • Baek, Haejung;Park, Young Tack
    • Journal of KIISE
    • /
    • v.42 no.2
    • /
    • pp.190-202
    • /
    • 2015
  • Much research has been conducted on location-based intelligent personal assistants that can understand a user's intention by learning the user's route model and then inferring the user's destinations and routes using data of GPS and other sensors in a smartphone. The intelligence of the location-based personal assistant is contingent on the accuracy and efficiency of the real-time predictions of the user's intended destinations and routes by processing movement information based on uncertain sensor data. We propose a robust particle filter based on Dynamic Bayesian Network model to infer the user's routes. The proposed robust particle filter includes a particle generator to supplement the incorrect and incomplete sensor information, an efficient switching function and an weight function to reduce the computation complexity as well as a resampler to enhance the accuracy of the particles. The proposed method improves the accuracy and efficiency of determining a user's routes and destinations.

Gesture based Input Device: An All Inertial Approach

  • Chang Wook;Bang Won-Chul;Choi Eun-Seok;Yang Jing;Cho Sung-Jung;Cho Joon-Kee;Oh Jong-Koo;Kim Dong-Yoon
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.5 no.3
    • /
    • pp.230-245
    • /
    • 2005
  • In this paper, we develop a gesture-based input device equipped with accelerometers and gyroscopes. The sensors measure the inertial measurements, i.e., accelerations and angular velocities produced by the movement of the system when a user is inputting gestures on a plane surface or in a 3D space. The gyroscope measurements are integrated to give orientation of the device and consequently used to compensate the accelerations. The compensated accelerations are doubly integrated to yield the position of the device. With this approach, a user's gesture input trajectories can be recovered without any external sensors. Three versions of motion tracking algorithms are provided to cope with wide spectrum of applications. Then, a Bayesian network based recognition system processes the recovered trajectories to identify the gesture class. Experimental results convincingly show the feasibility and effectiveness of the proposed gesture input device. In order to show practical use of the proposed input method, we implemented a prototype system, which is a gesture-based remote controller (Magic Wand).

Ontology-based User Intention Recognition for Proactive Planning of Intelligent Robot Behavior (지능형로봇 행동의 능동적 계획수립을 위한 온톨로지 기반 사용자 의도인식)

  • Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.21 no.1
    • /
    • pp.86-99
    • /
    • 2011
  • Due to the uncertainty of intention recognition for behaviors of users, the intention is differently recognized according to the situation for the same behavior by the same user, the accuracy of user intention recognition by minimizing the uncertainty is able to be improved. This paper suggests a novel ontology-based method to recognize user intentions, and able to minimize the uncertainties that are the obstacles against the precise recognition of user intention. This approach creates ontology for user intention, makes a hierarchy and relationship among user intentions by using RuleML as well as Dynamic Bayesian Network, and improves the accuracy of user intention recognition by using the defined RuleML as well as the gathered sensor data such as temperature, humidity, vision, and auditory. To evaluate the performance of robot proactive planning mechanism, we developed a simulator, carried out some experiments to measure the accuracy of user intention recognition for all possible situations, and analyzed and detailed described the results. The result of our experiments represented relatively high level the accuracy of user intention recognition. On the other hand, the result of experiments tells us the fact that the actions including the uncertainty get in the way the precise user intention recognition.

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo;Kang, Jae-Woo;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Journal of KIISE:Databases
    • /
    • v.33 no.2
    • /
    • pp.163-174
    • /
    • 2006
  • In distributed wireless sensor network, it is difficult to transmit and analyze the entire stream data depending on limited networks, power and processor. Therefore it is suitable to use alternative stream data processing after classifying the continuous stream data. We propose a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes input as a sliding window of multivariate stream data and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a standard text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms. For supervised, we tested Bayesian classifier and SVM, and for unsupervised, we tested Jaccard, TFIDF Jaro and Jaro Winkler. In our experiments, SVM and TFIDF outperformed other classification methods. In particular, we observed that classification accuracy is improved when the correlation of attributes is also considered along with the n-gram tokens of symbols.

Human Error Probability Assessment During Maintenance Activities of Marine Systems

  • Islam, Rabiul;Khan, Faisal;Abbassi, Rouzbeh;Garaniya, Vikram
    • Safety and Health at Work
    • /
    • v.9 no.1
    • /
    • pp.42-52
    • /
    • 2018
  • Background: Maintenance operations on-board ships are highly demanding. Maintenance operations are intensive activities requiring high man-machine interactions in challenging and evolving conditions. The evolving conditions are weather conditions, workplace temperature, ship motion, noise and vibration, and workload and stress. For example, extreme weather condition affects seafarers' performance, increasing the chances of error, and, consequently, can cause injuries or fatalities to personnel. An effective human error probability model is required to better manage maintenance on-board ships. The developed model would assist in developing and maintaining effective risk management protocols. Thus, the objective of this study is to develop a human error probability model considering various internal and external factors affecting seafarers' performance. Methods: The human error probability model is developed using probability theory applied to Bayesian network. The model is tested using the data received through the developed questionnaire survey of >200 experienced seafarers with >5 years of experience. The model developed in this study is used to find out the reliability of human performance on particular maintenance activities. Results: The developed methodology is tested on the maintenance of marine engine's cooling water pump for engine department and anchor windlass for deck department. In the considered case studies, human error probabilities are estimated in various scenarios and the results are compared between the scenarios and the different seafarer categories. The results of the case studies for both departments are also compared. Conclusion: The developed model is effective in assessing human error probabilities. These probabilities would get dynamically updated as and when new information is available on changes in either internal (i.e., training, experience, and fatigue) or external (i.e., environmental and operational conditions such as weather conditions, workplace temperature, ship motion, noise and vibration, and workload and stress) factors.

A Study on Detection of Small Size Malicious Code using Data Mining Method (데이터 마이닝 기법을 이용한 소규모 악성코드 탐지에 관한 연구)

  • Lee, Taek-Hyun;Kook, Kwang-Ho
    • Convergence Security Journal
    • /
    • v.19 no.1
    • /
    • pp.11-17
    • /
    • 2019
  • Recently, the abuse of Internet technology has caused economic and mental harm to society as a whole. Especially, malicious code that is newly created or modified is used as a basic means of various application hacking and cyber security threats by bypassing the existing information protection system. However, research on small-capacity executable files that occupy a large portion of actual malicious code is rather limited. In this paper, we propose a model that can analyze the characteristics of known small capacity executable files by using data mining techniques and to use them for detecting unknown malicious codes. Data mining analysis techniques were performed in various ways such as Naive Bayesian, SVM, decision tree, random forest, artificial neural network, and the accuracy was compared according to the detection level of virustotal. As a result, more than 80% classification accuracy was verified for 34,646 analysis files.

Two-Layer Approach Using FTA and BBN for Reliability Analysis of Combat Systems (전투 시스템의 신뢰성 분석을 위한 FTA와 BBN을 이용한 2계층 접근에 관한 연구)

  • Kang, Ji-Won;Lee, Jang-Se
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.3
    • /
    • pp.333-340
    • /
    • 2019
  • A combat system performs a given mission enduring various threats. It is important to analyze the reliability of combat systems in order to increase their ability to perform a given mission. Most of studies considered no threat or on threat and didn't analyze all the dependent relationships among the components. In this paper, we analyze the loss probability of the function of the combat system and use it to analyze the reliability. The proposed method is divided into two layers, A lower layer and a upper layer. In lower layer, the failure probability of each components is derived by using FTA to consider various threats. In the upper layer, The loss probability of function is analyzed using the failure probability of the component derived from lower layer and BBN in order to consider the dependent relationships among the components. Using the proposed method, it is possible to analyze considering various threats and the dependency between components.

Diabetes prediction mechanism using machine learning model based on patient IQR outlier and correlation coefficient (환자 IQR 이상치와 상관계수 기반의 머신러닝 모델을 이용한 당뇨병 예측 메커니즘)

  • Jung, Juho;Lee, Naeun;Kim, Sumin;Seo, Gaeun;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.10
    • /
    • pp.1296-1301
    • /
    • 2021
  • With the recent increase in diabetes incidence worldwide, research has been conducted to predict diabetes through various machine learning and deep learning technologies. In this work, we present a model for predicting diabetes using machine learning techniques with German Frankfurt Hospital data. We apply outlier handling using Interquartile Range (IQR) techniques and Pearson correlation and compare model-specific diabetes prediction performance with Decision Tree, Random Forest, Knn (k-nearest neighbor), SVM (support vector machine), Bayesian Network, ensemble techniques XGBoost, Voting, and Stacking. As a result of the study, the XGBoost technique showed the best performance with 97% accuracy on top of the various scenarios. Therefore, this study is meaningful in that the model can be used to accurately predict and prevent diabetes prevalent in modern society.