• Title/Summary/Keyword: a conditional probability

The Effects of the Probability Activities in Thinking Science Program on the Development of the Probabilistic Thinking of Middle School Students (Thinking Science 프로그램의 확률 활동이 중학생의 확률적 사고 형성에 미치는 효과)

  • Kyung-In Shin;Sang-Kwon Lee;Ae-Kyung Shin;Byung-Soon Choi
    • Journal of the Korean Chemical Society / v.47 no.2 / pp.165-174 / 2003
  • This study investigated the correlation between cognitive level and probabilistic thinking level, and analyzed the effects of the probability activities in the Thinking Science (TS) program on the development of probabilistic thinking. A sample of 219 seventh-grade middle school students was divided into an experimental group and a control group. The TS probability activities were given to the experimental group, while the control group followed only the normal curriculum. The results showed that most seventh-grade students were in the concrete operational stage and used subjective and quantitative strategies simultaneously when solving probability problems. The higher a student's cognitive level, the higher the student's probabilistic thinking level. Among the constructs of probability, sample space and the probability of an event developed earlier than probability comparisons and conditional probability. The probability activities encouraged students to use quantitative strategies in probability problem solving and to recognize the probability of an event. The effect was relatively greater for students in the mid concrete operational stage than for those in any other stage.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintaining ICT infrastructure and preventing failures through anomaly detection is becoming increasingly important. System monitoring data is multidimensional time series data, and it is difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, the correlation between variables must be considered. The performance of existing probability-based, linear, and distance-based methods degrades because of the curse of dimensionality. In addition, time series data is typically preprocessed with a sliding window technique and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field: statistical methods and regression analysis were used in the early days, and machine learning and artificial neural networks are now being actively applied. Statistical methods are difficult to apply when the data is non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and then detect anomalies by comparing predicted and actual values. Their performance drops when the model is not robust or when the data contains noise or outliers, which restricts the use of training data that contains noise or outliers. An autoencoder based on artificial neural networks is trained to produce output as similar as possible to its input. It has many advantages over existing probabilistic and linear models, cluster analysis, and supervised learning: it can be applied to data that does not satisfy a probability distribution or linearity assumption, and it can be trained in an unsupervised manner without labeled data. However, it is limited in identifying local outliers in multidimensional data, and the dimension of the data grows considerably because of the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to address the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the bottleneck of the autoencoder and thereby learn their correlation. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoders for 41 variables was examined in the proposed model and the comparison models. Reconstruction performance differs by variable; for the Memory, Disk, and Network modalities the loss is small in all three autoencoder models, so reconstruction works well. The Process modality showed no significant difference across the three models, and the CPU modality showed excellent performance in CMAE. ROC curves were prepared to evaluate anomaly detection performance in the proposed model and the comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all metrics, performance ranked in the order CMAE, MAE, and UAE. In particular, the recall of CMAE was 0.9828, indicating that it detects nearly all anomalies. The accuracy of the model also improved to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. From a practical standpoint, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the increase in dimension they cause can slow down inference. The proposed model is therefore easier to apply to practical tasks in terms of inference speed and model management.
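
The accuracy, precision, recall, and F1-score compared in the abstract above are all derived from confusion-matrix counts. The following is a minimal sketch of those definitions; the function name `detection_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return standard binary-classification metrics from confusion counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives (anomaly = positive class).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(detection_metrics(tp=8, fp=2, fn=2, tn=8))
```

A recall close to 1 means few anomalies are missed (small `fn`), while precision reflects how many of the raised alarms are real.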

A probabilistic information retrieval model by document ranking using term dependencies (용어간 종속성을 이용한 문서 순위 매기기에 의한 확률적 정보 검색)

  • You, Hyun-Jo;Lee, Jung-Jin
    • The Korean Journal of Applied Statistics / v.32 no.5 / pp.763-782 / 2019
  • This paper proposes a probabilistic document ranking model that incorporates term dependencies. Document ranking is a fundamental information retrieval task: sorting the documents in a collection according to their relevance to the user query (Qin et al., Information Retrieval Journal, 13, 346-374, 2010). A probabilistic model computes the conditional probability of the relevance of each document given a query. Most widely used models assume term independence because it is challenging to compute the joint probabilities of multiple terms, yet words in natural language texts are clearly highly correlated. In this paper, we assume a multinomial distribution model to calculate the relevance probability of a document by considering the dependency structure of words, and propose an information retrieval model that ranks documents by estimating this probability with the maximum entropy method. Ranking simulation experiments in various multinomial situations show better retrieval results than a model that assumes the independence of words. Document ranking experiments using the real-world LETOR OHSUMED dataset also show better retrieval results.
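
For orientation, the term-independence baseline mentioned in the abstract above can be written out as follows. This is a sketch of the standard independence factorization, not the paper's multinomial or maximum-entropy model; the symbols $R$ (binary relevance), $d$ (document), $q$ (query), and $x_i$ (indicator that query term $i$ occurs in $d$) are notation assumed here.

```latex
% Bayes' rule for the relevance probability of document d given query q
P(R=1 \mid d, q) = \frac{P(d \mid R=1, q)\, P(R=1 \mid q)}{P(d \mid q)}

% Term-independence assumption: the joint probability factorizes over terms
P(x_1, \ldots, x_n \mid R, q) = \prod_{i=1}^{n} P(x_i \mid R, q)

% Resulting ranking score (log-odds of relevance)
\log \frac{P(R=1 \mid d, q)}{P(R=0 \mid d, q)}
  = \log \frac{P(R=1 \mid q)}{P(R=0 \mid q)}
  + \sum_{i=1}^{n} \log \frac{P(x_i \mid R=1, q)}{P(x_i \mid R=0, q)}
```

A model with term dependencies replaces the product in the second line with a joint distribution over $(x_1, \ldots, x_n)$, which is the quantity that is hard to estimate directly.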

Design and Estimation of Multiple Acceptance Sampling Plans for Stochastically Dependent Nonstationary Processes (확률적으로 종속적인 비평형 다단계 샘플링검사법의 설계 및 평가)

  • Kim, Won-Kyung
    • Journal of Korean Institute of Industrial Engineers / v.25 no.1 / pp.8-20 / 1999
  • This paper develops a design and estimation procedure for stochastically dependent, nonstationary multiple acceptance sampling plans. First, rough-cut acceptance and rejection numbers are taken from the corresponding sequential sampling plan as an initial solution. A Monte Carlo algorithm is used to find the acceptance and rejection probabilities of a lot. The conditional probability formula for a sample path is derived, and the acceptance and rejection probabilities are found for a given decision boundary. Several decision criteria and a design procedure for selecting optimal plans are suggested. A formula for measuring the performance of these sampling plans is developed, and type I and II error probabilities are also estimated. As a special case, setting the stage size to 1 in a dependent sampling plan yields a sequential sampling plan that satisfies the type I and II error probabilities more accurately and has a smaller average sample number. In a numerical example, a Polya dependent process is examined. Sampling performance is reported to compare the selection schemes and to show the effect of changing the dependency factor.
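
The idea of estimating a lot's acceptance probability by Monte Carlo under a Polya dependent process can be sketched as below. This is an illustrative simulation under stated assumptions, not the paper's algorithm: the urn parameters (`defective`, `good`, `reinforce`), the plan layout (cumulative acceptance and rejection numbers per stage), and all function names are hypothetical.

```python
import random


def polya_draws(n, defective, good, reinforce, rng):
    """Yield n draws (1 = defective, 0 = good) from a Polya urn.

    After each draw the item is returned together with `reinforce`
    extra items of the same kind, which makes the draws dependent.
    """
    for _ in range(n):
        is_defective = rng.random() < defective / (defective + good)
        if is_defective:
            defective += reinforce
        else:
            good += reinforce
        yield int(is_defective)


def accept_probability(stage_size, accept_nums, reject_nums,
                       defective, good, reinforce,
                       trials=20000, seed=0):
    """Estimate P(accept lot) for a multiple sampling plan by simulation.

    accept_nums[k] / reject_nums[k] are cumulative acceptance and
    rejection numbers after stage k. A lot that reaches the end of the
    plan without a decision is counted as not accepted.
    """
    rng = random.Random(seed)
    accepted = 0
    n_total = stage_size * len(accept_nums)
    for _ in range(trials):
        draws = polya_draws(n_total, defective, good, reinforce, rng)
        count = 0  # cumulative number of defectives observed
        for a, r in zip(accept_nums, reject_nums):
            count += sum(next(draws) for _ in range(stage_size))
            if count <= a:
                accepted += 1
                break
            if count >= r:
                break
    return accepted / trials


if __name__ == "__main__":
    # Hypothetical two-stage plan: 5 items per stage.
    p = accept_probability(5, [0, 1], [2, 2],
                           defective=1, good=9, reinforce=1)
    print(f"estimated acceptance probability: {p:.3f}")
```

Setting `reinforce=0` removes the dependence between draws, so comparing estimates for different `reinforce` values gives a rough feel for the effect of the dependency factor.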

Derivation of Design Flood Using Multisite Rainfall Simulation Technique and Continuous Rainfall-Runoff Model

  • Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2009.05a / pp.540-544 / 2009
  • Hydrologic patterns under climate change have received attention as one of the most important issues in the hydrologic science community. Rainfall and runoff are key elements of the Earth's hydrological cycle and are associated with many different aspects such as water supply, flood prevention, and river restoration. In this regard, the main objective of this study is to evaluate design floods using simulation techniques that can consider the full spectrum of uncertainty. We use a weather-state-based stochastic multivariate model as a conditional probability model for simulating the rainfall field. A major premise of this study is that large-scale climatic patterns are a major driver of persistent year-to-year changes in rainfall probabilities. Uncertainty analysis is necessary in design flood estimation in order to examine the reliability of the estimated results. To this end, this study applies a Bayesian Markov Chain Monte Carlo scheme to the widely used NWS-PC rainfall-runoff model, and a case study is performed for the Soyang Dam watershed in Korea. A comprehensive discussion of design floods under climate change is provided.

Multi-dimension Categorical Data with Bayesian Network (베이지안 네트워크를 이용한 다차원 범주형 분석)

  • Kim, Yong-Chul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.2 / pp.169-174 / 2018
  • In general, analysis of variance (ANOVA) for continuous data and the chi-square test for discrete data are used for statistical analysis of effects and associations. Multidimensional data requires analysis of hierarchical structure, for which a statistical linear model is adopted. The structure of the linear model requires normality of the data. Multidimensional categorical data analysis methods are used for analyzing causal relations, interactions, and correlations. In this paper, a Bayesian network model using probability distributions is proposed to shorten the analysis procedure and to analyze interactions and causal relationships in categorical data analysis.

Teaching Statistics through World Cup Soccer Examples (월드컵 축구 예제를 통한 통계교육)

  • Kim, Hyuk-Joo;Kim, Young-Il
    • The Korean Journal of Applied Statistics / v.23 no.6 / pp.1201-1208 / 2010
  • In teaching probability and statistics classes, we should increase efforts to develop examples that enhance teaching methodology in delivering more meaningful knowledge to students. Sports is one field that provides a variety of examples and World Cup Soccer events are a treasure house of many interesting problems. Teaching, using examples from this field, is an effective way to enhance the interest of students in probability and statistics because World Cup Soccer is a matter of national interest. In this paper, we have suggested several examples pertaining to counting the number of cases and computing probabilities. These examples are related to many issues such as possible scenarios in the preliminary round, victory points necessary for each participant to advance to the second round, and the issue of grouping teams. Based on a simulation using a statistical model, we have proposed a logical method for computing the probabilities of proceeding to the second round and winning the championship for each participant in the 2010 South Africa World Cup.
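
As an illustration of the kind of case counting described in the abstract above, the sketch below enumerates every win/draw/loss scenario in a four-team round-robin group and tabulates, for each points total of one team, how often that total is enough to finish clearly in the top two. The 3/1/0 points scheme, the definition of "clearly advances" (fewer than two rivals with at least as many points, so no tiebreaker is needed), and all names are assumptions made for this example, not taken from the paper.

```python
from itertools import combinations, product

TEAMS = 4
MATCHES = list(combinations(range(TEAMS), 2))  # 6 matches in the group


def group_scenarios():
    """Yield the points list for every possible set of match results."""
    for outcome in product(("first", "draw", "second"), repeat=len(MATCHES)):
        points = [0] * TEAMS
        for (a, b), result in zip(MATCHES, outcome):
            if result == "first":      # team a wins
                points[a] += 3
            elif result == "second":   # team b wins
                points[b] += 3
            else:                      # draw
                points[a] += 1
                points[b] += 1
        yield points


def advancement_table(team=0):
    """Map points total -> (number of scenarios, scenarios clearly advancing)."""
    table = {}
    for points in group_scenarios():
        mine = points[team]
        # Rivals that finish level with or ahead of the chosen team.
        rivals = sum(1 for t, p in enumerate(points) if t != team and p >= mine)
        total, clear = table.get(mine, (0, 0))
        table[mine] = (total + 1, clear + (1 if rivals < 2 else 0))
    return table


if __name__ == "__main__":
    for pts, (total, clear) in sorted(advancement_table().items(), reverse=True):
        print(f"{pts} points: {total} scenarios, {clear} clearly advancing")
```

Each of the 6 matches has 3 outcomes, so the enumeration covers 3^6 = 729 equally weighted scenarios; weighting outcomes by team strength would require a statistical model such as the one used in the paper.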

Sampling Based Approach for Combining Results from Binomial Experiments

  • Cho, Jang-Sik;Kim, Dal-Ho;Kang, Sang-Gil
    • Journal of the Korean Data and Information Science Society / v.12 no.1 / pp.1-9 / 2001
  • This paper considers the problem of combining information from $I$ binomial experiments, each having a distinct probability of success $\theta_i$, $i = 1, 2, \ldots, I$. Instead of using a standard exchangeable prior for $\theta = (\theta_1, \theta_2, \ldots, \theta_I)$, we consider a partition of the experiments and take the $\theta_i$'s belonging to the same partition subset to be exchangeable and the $\theta_i$'s belonging to distinct subsets to be independent. We use a Gibbs sampler for Bayesian inference on $\theta$ conditional on a partition, and illustrate the methodology with a real data set.

A Simultaneous Design of TSK - Linguistic Fuzzy Models with Uncertain Fuzzy Output

  • Kwak, Keun-Chang;Kim, Dong-Hwa
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2005.06a / pp.427-432 / 2005
  • This paper is concerned with the simultaneous design of TSK (Takagi-Sugeno-Kang)-linguistic fuzzy models with uncertain model output and a computationally efficient representation. For this purpose, we use the fundamental idea of linguistic models introduced by Pedrycz and develop a comprehensive design framework for them. The design process consists of several main phases: (a) automatic generation of the linguistic contexts from the probabilistic distribution using the CDF (conditional density function) and PDF (probability density function); (b) context-based fuzzy clustering that preserves homogeneity, based on the concept of fuzzy granulation; (c) augmentation with a bias term to compensate for bias error; and (d) combination of TSK and linguistic context in the consequent part. Finally, we compare the performance of the enhanced models with that of other fuzzy models on automobile MPG prediction data and on a coagulant dosing process in a water purification plant.

Review of Screening Procedure as Statistical Hypothesis Testing (통계적 가설검정으로서의 선별검사절차의 검토)

  • 권혁무;이민구;김상부;홍성훈
    • Journal of Korean Society for Quality Management / v.26 no.2 / pp.39-50 / 1998
  • A screening procedure, in which one or more correlated variables are used for screening, is reviewed from the viewpoint of statistical hypothesis testing. Without assuming a specific probability model for the joint distribution of the performance and screening variables, some principles are provided for establishing the best screening region. Application examples are provided for two cases: ⅰ) the case where the performance variable is dichotomous and ⅱ) the case where the performance variable is continuous. In case ⅰ), a normal model is assumed for the conditional distribution of the screening variable given the performance variable. In case ⅱ), the performance and screening variables are assumed to be jointly normally distributed.
