Search | Korea Science

Nonstandard Machine Learning Algorithms for Microarray Data Mining

Zhang, Byoung-Tak
- Proceedings of the Korean Society for Bioinformatics Conference / 2001.10a / pp.165-196 / 2001
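A DNA chip, or microarray, is a technology in which a large number of genes or gene fragments (typically thousands to tens of thousands) are immobilized on a chip and their expression patterns are analyzed through DNA hybridization reactions. This high-throughput technology can answer molecular biology questions that were previously out of reach, and it has a broad range of potential applications, including molecular-level disease diagnosis, drug discovery, and environmental pollution problems. For practical use, the key is not only the hardware/wetware technology for fabricating DNA chips but also the bioinformatics technology for extracting as much useful new knowledge as possible from the resulting data. Data mining of gene expression patterns can be broadly divided into clustering, classification, and dependency analysis, all of which build on statistics and on machine learning from artificial intelligence. Commonly used techniques include principal component analysis, hierarchical clustering, k-means, self-organizing maps, decision trees, multilayer perceptron neural networks, and association rules. Beyond these basic machine learning techniques, this seminar introduces the probabilistic graphical model (PGM), a newer class of learning methods under active study, and reviews research applying it to DNA chip data analysis. PGMs are machine learning models formed by combining artificial neural networks, graph theory, and probability theory, and they are based on the memory and learning mechanisms of the human brain; one major difference from other machine learning models is that they are generative models. That is, once a model has been built it can generate new data, so the model can be validated and new facts can be inferred from it, which makes PGMs well suited to exploratory analysis aimed at discovering new knowledge, as in biological data mining. In addition, unlike conventional neural network models, PGMs perform probability-based soft inference rather than deterministic decision making, and the learned model is well suited to analyzing causal relationships or dependencies among the relevant factors. As concrete examples of PGMs, the seminar introduces the structure and the learning and inference algorithms of Bayesian networks, nonnegative matrix factorization (NMF), and generative topographic mapping (GTM), and reviews the results of applying them to the cancer diagnosis problem and the gene-drug dependency analysis problem used in CAMDA-2000 and CAMDA-2001, the DNA chip data analysis competitions.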

Evolutionary Learning of Hypernetwork Classifiers Based on Sequential Bayesian Sampling for High-dimensional Data (고차 데이터 분류를 위한 순차적 베이지안 샘플링을 기반으로 한 하이퍼네트워크 모델의 진화적 학습 기법)

Ha, Jung-Woo;Kim, Soo-Jin;Zhang, Byoung-Tak
- Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.336-338 / 2012
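This study presents a learning algorithm for hypernetwork models that uses an evolutionary computation technique based on sequential Bayesian sampling to classify high-dimensional data. In the proposed method, learning proceeds so as to maximize the posterior distribution of the model's conditional probability. To this end, the prior distribution is defined in terms of problem-related prior knowledge and model complexity, the measured classification performance of the model is used as the likelihood, and the model's fitness is defined from the measured prior and likelihood. As a result, the hypernetwork model can not only learn high-dimensional data efficiently but also improve in training time and classification performance. In addition, the composition of the hyperedges and the size of the model, which previously had to be supplied as parameters, can be determined adaptively during learning. To validate the proposed method, the model is applied to a classification problem on a gene expression dataset covering about 25,000 genes. The experimental results confirm that the proposed method achieves better classification performance than both existing hypernetwork learning methods and other models. Various experiments also analyze how the prior knowledge used in the prior distribution affects model learning.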

Stochastic Model Comparison for the Breakup and Atomization of a Liquid Jet using LES (LES 해석에서 액체제트의 분열에 대한 확률론적 분열 모델링 비교)

Yoo, YoungLin;Sung, Hong-Gye
- Journal of the Korean Society for Aeronautical & Space Sciences / v.45 no.6 / pp.447-454 / 2017
A three-dimensional two-phase large eddy simulation (LES) was conducted to investigate the breakup and atomization of liquid jets, such as a diesel jet in parallel flow and a water jet in cross flow. The gas-liquid two-phase flow was solved with a combined Eulerian-Lagrangian model: Eulerian for the gas flow and Lagrangian for the liquid jet. Two stochastic breakup models were implemented to simulate the liquid column and droplet breakup processes. The penetration depth and Sauter mean diameter (SMD) were analyzed and were comparable with experimental data.
https://doi.org/10.5139/JKSAS.2017.45.6.447
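The SMD reported in the abstract above is the standard ratio of the summed cubed diameters to the summed squared diameters of a droplet sample. The following is a minimal sketch of that definition only; the function name and the sample diameters are hypothetical and are not taken from the paper.

```python
def sauter_mean_diameter(diameters):
    """Return D32 = sum(d^3) / sum(d^2) for a sample of droplet diameters."""
    volume_term = sum(d ** 3 for d in diameters)
    surface_term = sum(d ** 2 for d in diameters)
    if surface_term == 0:
        raise ValueError("need at least one droplet with a non-zero diameter")
    return volume_term / surface_term


# Hypothetical droplet diameters in micrometres (illustrative, not paper data).
sample_um = [10.0, 20.0, 30.0]
smd_um = sauter_mean_diameter(sample_um)
```

Because the cubed term weights large droplets more heavily, the SMD of a mixed sample sits above its arithmetic mean diameter.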

Study of analytical probabilistic models for urban flood control detention facilities in Korea (도시 홍수 저감 저류시설 설계를 위한 해석적 확률모형 연구)

Lee, Moonyoung;Jeon, Seol;Kim, Si Yeon;An, Heejin;Jung, Kichul;Park, Daeryong
- Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.298-298 / 2021
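This study applies analytical probabilistic models to 30 years of hourly rainfall data from six regions in Korea (Seoul, Gangneung, Daejeon, Gwangju, Busan, and Jeju) and compares, by region, the accuracy of runoff prediction for the design of detention facilities intended to reduce urban flooding. To apply an analytical probabilistic model of the rainfall event distribution, independent storm events are identified by fixing the dry period that separates them; this interevent time definition (IETD) was determined using the autocorrelation coefficient and the coefficient of variation. To derive the probability density function (PDF) of runoff, which constitutes the analytical probabilistic model, runoff was defined through a rainfall-runoff relationship that applies runoff coefficients accounting for impervious and pervious areas. Rainfall characteristics such as rainfall depth, rainfall duration, and interevent time were assumed to follow one-parameter exponential PDFs. To assess the suitability of the probabilistic approach, the actual rainfall events of each region, separated according to the chosen IETD, were applied both to the analytical model and to the continuous simulation model SWMM (Storm Water Management Model), and runoff was computed for different impervious ratios. The runoff results from the two approaches were very similar in all regions, which confirms that the probabilistic approach is applicable to the planning and design of detention facilities for urban flood reduction in Korea.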
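The event separation and exponential assumption described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names, the fixed IETD value, and the rainfall series are hypothetical, and the IETD is passed in rather than derived from autocorrelation or the coefficient of variation.

```python
def _close(rain, start, last_wet):
    """Summarise one event as (start_index, duration_hours, depth_mm)."""
    return (start, last_wet - start + 1, sum(rain[start:last_wet + 1]))


def split_events(hourly_rain_mm, ietd_hours):
    """Group hourly rainfall into independent events.

    A dry gap of at least `ietd_hours` between two wet hours ends the
    current event; shorter dry gaps stay inside the event.
    """
    events = []
    start = None      # index of the first wet hour of the current event
    last_wet = None   # index of the most recent wet hour
    for i, depth in enumerate(hourly_rain_mm):
        if depth <= 0:
            continue
        dry_gap = 0 if last_wet is None else i - last_wet - 1
        if start is not None and dry_gap >= ietd_hours:
            events.append(_close(hourly_rain_mm, start, last_wet))
            start = None
        if start is None:
            start = i
        last_wet = i
    if start is not None:
        events.append(_close(hourly_rain_mm, start, last_wet))
    return events


def exponential_rate(values):
    """Maximum-likelihood rate of a one-parameter exponential PDF: 1 / mean."""
    return len(values) / sum(values)


# Hypothetical hourly rainfall in mm (illustrative, not the study's data).
series_mm = [0.0, 2.0, 3.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0]
events = split_events(series_mm, ietd_hours=3)
depth_rate = exponential_rate([depth for _, _, depth in events])
```

A longer IETD merges more wet spells into a single event, so the fitted exponential rates for depth, duration, and interevent time all depend on that choice.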

Statistical Estimation of Wind Speed in the Gwangyang-Myodo Region (광양 - 묘도 지역의 통계학적인 풍속 추정)

Bae, Yong Gwi;Han, Gwan Mun;Lee, Seong Lo
- KSCE Journal of Civil and Environmental Engineering Research / v.28 no.2A / pp.197-205 / 2008
To estimate the mean wind speed in the Gwangyang-Myodo region, this paper applies extreme-value probability distribution models to a statistical analysis of the joint distribution of daily maximum wind speed and its corresponding direction. The frequency of daily maximum records at each station is examined, and samples of the annual maximum wind speed for each of the sixteen compass directions and for the non-directional case are extracted from the daily maximum wind speed and direction data of the meteorological stations near the bridge construction site. These extreme wind speed records are fitted to Gumbel and Weibull distribution models, with parameters estimated by methods including the method of moments and the method of least squares. The fit of the distributions and parameters is then checked with the probability plot correlation coefficient test. From the fitted parameters, the annual maximum wind speed for the sixteen compass directions and the non-directional case is extrapolated, taking into account the sample size of the data and the distance of each station from the bridge construction site.
https://doi.org/10.12652/Ksce.2008.28.2A.197
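One of the steps named in the abstract above, fitting a Gumbel distribution by the method of moments and extrapolating to a return period, can be sketched as below. The annual maxima are hypothetical values chosen for illustration, the function names are assumptions, and the sketch does not reproduce the paper's directional sampling, Weibull fit, or correlation coefficient test.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Method-of-moments estimates (location, scale) for a Gumbel distribution."""
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)      # sample standard deviation
    scale = std * math.sqrt(6.0) / math.pi     # from variance = (pi * scale)^2 / 6
    loc = mean - EULER_GAMMA * scale           # from mean = loc + gamma * scale
    return loc, scale


def return_level(loc, scale, return_period_years):
    """Wind speed exceeded on average once every `return_period_years` years."""
    non_exceedance = 1.0 - 1.0 / return_period_years
    return loc - scale * math.log(-math.log(non_exceedance))


# Hypothetical annual maximum wind speeds in m/s (not data from the paper).
annual_max_ms = [21.3, 24.8, 19.6, 27.1, 22.4, 25.9, 20.7, 23.5]
loc, scale = fit_gumbel_moments(annual_max_ms)
v100 = return_level(loc, scale, 100)
```

With only a handful of annual maxima the extrapolated value is very sensitive to the sample, which is presumably why the abstract weights the result by sample size.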

Statistical Probability Analysis of Storage Temperatures of Domestic Refrigerator as a Risk Factor of Foodborne Illness Outbreak (식중독 발생 위해인자로서 가정용 냉장고의 온도에 대한 확률분포 분석)

Bahk, Gyung-Jin
- Korean Journal of Food Science and Technology / v.42 no.3 / pp.373-376 / 2010
The objective of this study was to identify an appropriate probability distribution model from survey data on food storage temperatures in domestic refrigerators. Domestic refrigerator temperature was treated as a risk factor in foodborne disease outbreaks for microbial risk assessment (MRA). Temperatures were measured with a data logger during direct visits to 139 homes from May to September 2009. The overall mean temperature of the surveyed refrigerators was $3.53 \pm 2.96\,^{\circ}\mathrm{C}$, and 23.6% of the refrigerators measured above $5\,^{\circ}\mathrm{C}$. Probability distributions were fitted to the measured temperature data using the @RISK program. Candidate distributions were ranked by goodness of fit (GOF), using the Kolmogorov-Smirnov (KS) or Anderson-Darling (AD) test, to select the appropriate model. The LogLogistic (-10.407, 13.616, 8.6107) distribution was found to be the most appropriate for the MRA model. These results may be used directly as input variables in exposure evaluation when conducting MRA.
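As a rough consistency check on the reported distribution, the sketch below evaluates a three-parameter log-logistic CDF at 5 °C. It assumes that the three numbers in the abstract are ordered as location, scale, and shape, which is how @RISK's LogLogistic function is usually parameterised; the abstract itself does not state the ordering, so treat it as an assumption.

```python
def loglogistic_cdf(x, gamma, beta, alpha):
    """CDF of a log-logistic with location gamma, scale beta, shape alpha."""
    if x <= gamma:
        return 0.0
    t = (x - gamma) / beta
    return 1.0 / (1.0 + t ** (-alpha))


# Parameters from the abstract; the (location, scale, shape) order is assumed.
GAMMA, BETA, ALPHA = -10.407, 13.616, 8.6107

p_above_5 = 1.0 - loglogistic_cdf(5.0, GAMMA, BETA, ALPHA)
median_c = GAMMA + BETA  # CDF equals 0.5 where (x - gamma) / beta == 1
```

Under that assumed ordering the modelled share of refrigerators above 5 °C comes out at roughly a quarter, in the same range as the 23.6% measured in the survey.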

Accounting for zero flows to develop a hydrological model for Yongdam Basin (무유출의 고려를 통한 용담댐 유역에 수문모형의 구축)

Lee, Dong Gi;Ahn, Kuk-Hyun
- Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.138-138 / 2020
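This study builds a probability-based gridded hydrological model for the Yongdam Dam basin that accounts for the zero flows occurring in Korea. The Yongdam Dam basin is an ephemeral catchment in which zero flows occur from time to time, and many basins in Korea fall into this category. To build the gridded hydrological model, the Sacramento Soil Moisture Accounting Model (SAC-SMA) runoff model was coupled with a routing model. A censoring error model was used to represent zero flows. This error model was compared with the commonly used normalized error model to assess the suitability of the model built in this study. Both models give reliable results, but the censoring error model fits better, and its efficiency increases as the frequency of zero flows increases. The conventional approach, moreover, produces negative values in its probabilistic representation of runoff, which is unrealistic for hydrological modeling. The results therefore indicate that ephemeral catchments must be taken into account when building hydrological models for Korea.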
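The idea behind a censoring error model can be sketched with a likelihood that treats zero-flow observations as censored at a threshold instead of as exact values. The version below assumes Gaussian errors around the simulated flow with a fixed standard deviation; the function name, the error distribution, and the sample values are illustrative assumptions, not the formulation used in the study.

```python
import math
from statistics import NormalDist


def censored_log_likelihood(observed, simulated, sigma, threshold=0.0):
    """Log-likelihood with observations at or below `threshold` left-censored."""
    total = 0.0
    for obs, sim in zip(observed, simulated):
        error_dist = NormalDist(mu=sim, sigma=sigma)
        if obs <= threshold:
            # Zero flow: we only know the latent flow is at or below the threshold.
            total += math.log(max(error_dist.cdf(threshold), 1e-300))
        else:
            # Positive flow: ordinary density of the observed value.
            total += math.log(max(error_dist.pdf(obs), 1e-300))
    return total


# Hypothetical flows (illustrative units and values only).
observed = [0.0, 0.0, 1.2, 3.4]
sim_close = [0.05, 0.0, 1.1, 3.5]   # simulation that is near zero on dry days
sim_wet = [2.0, 2.0, 1.1, 3.5]      # simulation that misses the zero flows
ll_close = censored_log_likelihood(observed, sim_close, sigma=0.5)
ll_wet = censored_log_likelihood(observed, sim_wet, sigma=0.5)
```

Because the censored term integrates all probability mass below the threshold, a latent negative flow contributes to explaining an observed zero instead of being reported as a negative runoff value.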

Goal Inference of Behavior-Based Agent Using Bayesian Network (베이지안 네트워크를 이용한 행동기반 에이전트의 목적추론)

Kim, Kyung-Joong;Cho, Sung-Bae
- Proceedings of the Korean Information Science Society Conference / 2002.10d / pp.349-351 / 2002
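A Bayesian network is a tool for probabilistically modeling cause-effect relationships among variables and is widely used to infer the goals of software users. Behavior-based robot design is an approach that generates complex behaviors by effectively combining reactive behavior modules. The behaviors are combined dynamically, taking into account the robot's goal, the external environment, and the relationships among behaviors. However, current combination models have a fixed structure determined in advance by the designer, so they cannot change the goal to match changes in the environment. This study uses a Bayesian network to set the robot goal best suited to the current situation and thereby induce flexible behavior selection. Experiments with the Khepera mobile robot simulator showed that the model with the Bayesian network selects goals appropriate to the situation and solves the problem.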
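Goal inference of the kind described in the abstract above can be illustrated with a very small discrete network in which one goal variable is the parent of two observed situation variables. Everything in this sketch is hypothetical: the goals, the sensor variables, the probabilities, and the network structure are invented for illustration and are not the network used with the Khepera simulator.

```python
# Hypothetical prior over goals and conditional probability tables.
PRIOR = {"explore": 0.6, "recharge": 0.4}
# P(variable is True | goal)
CPT = {
    "battery_low": {"explore": 0.1, "recharge": 0.8},
    "obstacle_near": {"explore": 0.3, "recharge": 0.3},
}


def goal_posterior(evidence):
    """Return P(goal | evidence) by enumerating over the goal variable."""
    scores = {}
    for goal, prior in PRIOR.items():
        p = prior
        for variable, value in evidence.items():
            p_true = CPT[variable][goal]
            p *= p_true if value else 1.0 - p_true
        scores[goal] = p
    total = sum(scores.values())
    return {goal: score / total for goal, score in scores.items()}


posterior = goal_posterior({"battery_low": True, "obstacle_near": False})
best_goal = max(posterior, key=posterior.get)
```

Re-running the inference whenever the evidence changes is what lets the selected goal track the situation, in contrast to a combination structure fixed at design time.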

Prediction Algorithm of Threshold Violation in Line Utilization using ARIMA model (ARIMA 모델을 이용한 선로 이용률의 임계값 위반 예측 기법)

Cho, Kang-Hong;Ahn, Seong-Jin;Chung, Jin-Wook
- The Journal of Korean Institute of Communications and Information Sciences / v.25 no.8A / pp.1153-1159 / 2000
This paper applies a seasonal ARIMA model to past line utilization data, which strongly influences the QoS of a network, to produce timely forecasts of line utilization together with confidence intervals, and proposes an algorithm that uses the seasonal ARIMA model to predict threshold violations in line utilization. The algorithm predicts when line utilization will violate a threshold and attaches a probability-based confidence to that prediction. The validity of the proposed model was evaluated, and suitable values for the threshold and the detection probability were estimated so as to maximize the performance of the algorithm.
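The violation test described in the abstract above can be sketched independently of the forecasting model. The code below does not fit a seasonal ARIMA model; it assumes that some forecaster has already produced a mean and standard error for each future step and that forecast errors are Gaussian. The function names and the forecast values are hypothetical.

```python
from statistics import NormalDist


def violation_probability(forecast_mean, forecast_std, threshold):
    """P(utilization > threshold) under a Gaussian forecast distribution."""
    return 1.0 - NormalDist(forecast_mean, forecast_std).cdf(threshold)


def first_violation(forecasts, threshold, min_probability):
    """Return the first step (1-based) whose violation probability is high enough.

    `forecasts` is a list of (mean, std) pairs, one per future time step.
    Returns None if no step reaches `min_probability`.
    """
    for step, (mean, std) in enumerate(forecasts, start=1):
        if violation_probability(mean, std, threshold) >= min_probability:
            return step
    return None


# Hypothetical forecasts of line utilization as fractions of capacity.
forecasts = [(0.55, 0.05), (0.68, 0.06), (0.79, 0.07), (0.86, 0.08)]
step = first_violation(forecasts, threshold=0.80, min_probability=0.5)
```

Raising `min_probability` delays the alarm but reduces false alarms, which corresponds to the trade-off between threshold and detection probability that the abstract says was tuned.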

A Method of Ontology Inference based on Bayesian Probability for Decision Making of Intelligent Home Agents (지능형 홈 에이전트의 의사결정을 위한 베이지안 확률기반 온톨로지 추론 방법)

Lim, Sung-Soo;Cho, Sung-Bae
- Proceedings of the Korean Information Science Society Conference / 2007.10c / pp.357-361 / 2007
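For an intelligent agent to provide appropriate services to users in a home network environment, it needs a model of the environment it belongs to. Ontologies are a useful tool for representing such environment models and excel at representing the organizational structure of complex domains. However, traditional ontologies are based on crisp logic and are therefore not well suited to representing real-world uncertainty. This paper proposes a Bayesian-network-based ontology inference method that compensates for this limitation and enables an intelligent home agent to make appropriate decisions in uncertain environments. The proposed method represents ontology class objects as nodes of a Bayesian network and object properties as arcs, yielding an ontology that supports probabilistic inference. The usefulness of the proposed method is evaluated through several scenarios and an analysis of design complexity.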

