The performance of Bayesian network classifiers for predicting discrete data (이산형 자료 예측을 위한 베이지안 네트워크 분류분석기의 성능 비교)

  • Park, Hyeonjae;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.33 no.3 / pp.309-320 / 2020
  • Bayesian networks, which are represented by directed acyclic graphs (DAGs), are used in many areas such as medicine, meteorology, and genetics because relationships between variables can be modeled with graphs and probabilities. In particular, Bayesian network classifiers, which are used to predict discrete data, have recently become established as a data mining method. Bayesian networks can be grouped into different models depending on the structure learning method. In this study, Bayesian network models are learned with structure learning methods that have various properties and are compared with the simplest model, the naïve Bayes model. Classification results are compared by applying the learned models to various real data sets. This study also compares the relationships between variables in the data through the graphs that each model produces.
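
As a rough illustration of the baseline named in this abstract, the sketch below fits a naïve Bayes classifier to discrete features in plain Python. The function names, the Laplace smoothing parameter `alpha`, and the toy weather data are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter, defaultdict
import math


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature values (hypothetical helper)."""
    class_counts = Counter(labels)
    # feature_counts[(j, c)][v] = number of class-c rows whose feature j equals v
    feature_counts = defaultdict(Counter)
    values = defaultdict(set)  # distinct values seen for each feature
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[(j, c)][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, alpha


def predict(model, row):
    """Return the class with the largest smoothed log posterior."""
    class_counts, feature_counts, values, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior
        for j, v in enumerate(row):
            # Laplace-smoothed estimate of P(feature j = v | class c)
            num = feature_counts[(j, c)][v] + alpha
            den = nc + alpha * len(values[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data, invented for illustration only.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = fit_naive_bayes(rows, labels)
```

Naïve Bayes treats every feature as independent given the class, which corresponds to a graph in which the class node is the only parent of each feature. Other Bayesian network classifiers relax that assumption by learning additional edges between features.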

Covariate selection criteria for controlling confounding bias in a causal study (인과연구에서 중첩편향을 제거하기 위한 공변량선택기준)

  • Thepepomma, Seethad;Kim, Ji-Hyun
    • The Korean Journal of Applied Statistics / v.29 no.5 / pp.849-858 / 2016
  • It is important to control confounding bias when estimating the causal effect of a treatment in an observational study. We illustrate that covariate selection in causal inference differs from variable selection in the ANCOVA model. We then investigate three covariate selection criteria for controlling confounding bias, which can be used when there is not enough information to draw a complete causal graph. VanderWeele and Shpitser (2011) proposed one of them and claimed it was better than the other two. We show by example that their criterion also has limitations and some disadvantages. There is no clear winner; however, their criterion is better than the other two (if its condition is corrected) because it can remove the confounding bias.

A Case Study on Understanding of the Concept of Sampling and Data Analysis by Elementary 6th Graders (6학년 학생들의 표본개념 이해 및 자료 분석에 관한 연구)

  • Lee, Mi-Suk;Park, Young-Hee
    • School Mathematics / v.8 no.4 / pp.441-463 / 2006
  • The purpose of this research is to investigate how elementary school students design and carry out sampling when a situation requires them to collect data about their households and everyday lives, how they use tools such as tables and graphs to analyze the data they surveyed efficiently, and what results they obtain. To this end, the researcher set up in advance a situation that required data collection and guided students to find ways to design a survey and gather data by discussing the tasks in small groups or as a whole class and seeking solutions by themselves through trial and error. The results showed that a lesson in which students do sampling, organize statistical data, and analyze the results can be carried out in a 6th-grade elementary school class.

Analysis of Intention in Spoken Dialogue based on Classifying Sentence Patterns (문형구조의 분류에 따른 대화음성의 의도분석에 관한 연구)

  • Choi, Hwan-Jin;Song, Chang-Hwan;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea / v.15 no.1 / pp.61-70 / 1996
  • The sentence structure of an utterance, that is, its combination of words, differs according to the topic of the dialogue and the speaker's intention. Based on this fact, we propose a statistical approach, the IDT (intention decision table), which models the correlations between sentence patterns and intentions. In the IDT, a sentence is split into 5 factors, and the intention of the sentence is determined by the similarity between an intention and the 5 factors that represent the sentence. Experimental results show that the IDT improves the intention prediction rate by 10~18% over word-intention correlations and by 3~12% over the MIG (Markov intention graph), which models the intention with a transition graph over the word categories in a sentence. From these results, we conclude that the IDT is an effective method for predicting intentions.

Causal effect of urban parks on children's happiness (도시공원 면적이 유아 행복감에 미치는 영향에 대한 인과관계 연구)

  • Nayeon Kwon;Chanmin Kim
    • The Korean Journal of Applied Statistics / v.36 no.1 / pp.63-83 / 2023
  • Many existing studies have found significant correlations between green spaces, including urban parks, and children's happiness, implying that the area or proximity of urban parks would be effective in enhancing young children's happiness. However, inferring causal effects from observed data requires appropriate adjustment for confounding variables, and from this perspective the causal relationship between the area of urban parks and children's happiness has not been well understood. This study estimated the causal effect of urban parks on children's happiness using data from the Panel Study on Korean Children. Regression adjustment, weighting, and matching were used to adjust for confounding variables, and the key concepts of each method are described before the analysis results. Confounders were chosen for the analysis using a directed acyclic graph. In contrast to previous research, the analysis found no significant causal relationship between the size of urban parks and children's happiness.

Design of Robust Expected Loss Control Chart (로버스트 기대손실 관리도의 설계)

  • Lee, Hyeung-Jun;Chung, Young-Bae
    • Journal of Korean Society of Industrial and Systems Engineering / v.39 no.3 / pp.10-17 / 2016
  • A control chart is a graph that plots the characteristic values of a process. It is a statistical tool for keeping a process in a controlled state and is also used to investigate the state of a process. Many companies therefore use control charts as a tool of statistical process control (SPC). Products from a production process show random dispersion around a certain reference value. The causes of quality dispersion are classified as chance causes and assignable causes. A chance cause is a cause that cannot be managed in practice, such as differences in operator proficiency or in the work environment. An assignable cause is a manageable cause that can be removed by taking action, such as operator inattention or an error in production equipment. Traditionally, the ${\bar{x}}-R$ control chart or the ${\bar{x}}-s$ control chart is used to find and remove the causes of errors. A traditional control chart determines whether the measured data are in control and lets us take action. In contrast, the RNELCC (Reflected Normal Expected Loss Control Chart) is a control chart that, even in a controlled state, reports the economic loss incurred when a product deviates from the process target value. However, a contaminated process can make the control limits sensitive and impair the detection capability of the chart. Many studies on robust estimation using trimmed parameters have been conducted. We propose a robust RNELCC that applies the idea of trimmed parameters to the RNEL control chart, and we demonstrate the effectiveness of the new control chart by comparing ARL values among the traditional control chart, the RNELCC, and the robust RNELCC.
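
For readers unfamiliar with the traditional chart mentioned in this abstract, the following sketch computes the limits of a Shewhart x-bar and R chart. It assumes subgroups of size 5 and uses the commonly tabulated constants for that size. The `trimmed_mean` helper and the `trim` parameter are hypothetical additions that only illustrate the general idea of trimming; they are not the paper's RNELCC or its robust variant.

```python
from statistics import mean

# Commonly tabulated Shewhart constants for subgroups of size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114


def trimmed_mean(values, proportion=0.0):
    """Mean after dropping `proportion` of the points from each tail.

    `proportion` must be below 0.5 so that at least one point remains.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def xbar_r_limits(subgroups, trim=0.0):
    """Return (LCL, center, UCL) for the x-bar chart and for the R chart."""
    xbars = [mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    center = trimmed_mean(xbars, trim)  # grand mean (optionally trimmed)
    rbar = trimmed_mean(ranges, trim)   # average range (optionally trimmed)
    xbar_chart = (center - A2 * rbar, center, center + A2 * rbar)
    r_chart = (D3 * rbar, rbar, D4 * rbar)
    return xbar_chart, r_chart


# Invented measurements: three subgroups of five observations each.
subgroups = [[10, 11, 9, 10, 10], [10, 12, 8, 10, 10], [9, 10, 11, 10, 10]]
plain_xbar, plain_r = xbar_r_limits(subgroups)
robust_xbar, robust_r = xbar_r_limits(subgroups, trim=0.34)
```

With `trim=0.34` the unusually wide second subgroup no longer inflates the average range, so the limits tighten. This is the kind of sensitivity to contaminated data that motivates robust estimators.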

An Empirical Comparison of Statistical Models for Pre-service Teachers' Help Networks using Binary and Valued Exponential Random Graph Models (예비교원의 도움 네트워크에 관한 통계 모형의 경험적 비교: 이항 및 가중 ERGM을 중심으로)

  • Kim, Sung-Yeun;Kim, Chong Min
    • The Journal of the Korea Contents Association / v.20 no.4 / pp.658-672 / 2020
  • The purpose of this study is to empirically compare statistical models for pre-service teachers' help networks. We identified similarities and differences based on the results of binary and valued ERGMs. The research questions are as follows. First, what are the similarities in the factors influencing the binary and valued help networks of pre-service teachers? Second, what are the differences in the factors influencing the binary and valued help networks of pre-service teachers? We surveyed 42 pre-service teachers, focusing on their help and friend networks, happiness, and personal characteristics. The results indicated that, first, the factors influencing both the binary and valued help networks of pre-service teachers were local dependencies (reciprocity, transitivity), similarity (major, gender), activity (early childhood education, negative emotion), popularity (early childhood education), and multiplicity (friend network). Second, the factors that differed between the binary and valued help networks were the effects of activity (physical education) and popularity (GPA, negative emotion). Based on these findings, we present implications.

Analysis on the Survivor's Pension Payment with Logistic Regression Model (로지스틱 회귀모형을 이용한 유족연금 수급 분석)

  • Kim, Mi-Jung;Kim, Jin-Hyung
    • The Korean Journal of Applied Statistics / v.21 no.2 / pp.183-200 / 2008
  • Research on efficient management of the National Pension has been emphasized as society trends toward aging and a low birth rate. In this article, we suggest a statistical model for effective classification and prediction of the reserve for the survivor's pension in Korea. A logistic regression model is used; the correct classification rate and the distribution of the posterior probability for the reserve of the survivor's pension are investigated and compared with the results from general logistic models. The predictive model is also assessed with a lift graph, the ROC curve, and the K-S statistic. As an application of the suggested model, we suggest strategies for reducing financial risks in managing and planning the pension.
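
Two of the assessment measures named in this abstract can be computed directly from predicted scores. The sketch below is a minimal plain-Python illustration: the K-S statistic is taken as the largest gap between the empirical score distributions of the two classes, and the area under the ROC curve is computed by pairwise comparison. The function names and the four example scores are invented for illustration and do not come from the paper.

```python
def ks_statistic(scores, labels):
    """Largest gap between the empirical CDFs of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented predicted probabilities and true classes.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
ks = ks_statistic(scores, labels)
auc = roc_auc(scores, labels)
```

Both measures depend only on how the scores rank the two classes, so they describe how well a model separates the classes without fixing a classification cutoff.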

Dimensioning Next Generation Networks for QoS Guaranteed Voice Services (NGN에서의 품질보장형 음성서비스 제공을 위한 대역 설계 방법)

  • Kim, Yoon-Kee;Lee, Hoon;Lee, Kwang-Hui
    • Journal of the Institute of Electronics Engineers of Korea TC / v.40 no.12 / pp.9-17 / 2003
  • In this paper we propose a method for estimating the bandwidth in the next-generation IP network. In particular, we concentrate on the edge routers accommodating VoIP connections as well as a group of data connections. Bandwidth dimensioning is carried out at the call level and the packet level for voice traffic in the next-generation IP network. The call-level model incorporates a statistical technique to compute statistics of the number of voice connections simultaneously in the active mode, such as the mean and variance of the simultaneously connected calls in the network. The packet-level model represents a load map for voice and data traffic by using a non-preemptive M/G/1 queuing model with strict priority for voice over data. From the proposed traffic model, we derive a graph of upper bounds on the traffic load in terms of bandwidth for voice and data connections. We illustrate the implications of the work via numerical experiments.
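
The abstract does not give the paper's equations, so the sketch below uses only the textbook mean waiting-time result for a two-class non-preemptive priority M/G/1 queue, with voice as the high-priority class. The function name, parameter names, and example numbers are assumptions made for illustration.

```python
def priority_waits(lam_voice, lam_data, es_voice, es2_voice, es_data, es2_data):
    """Mean queueing delays (voice, data) in a non-preemptive priority M/G/1 queue.

    lam_*: Poisson arrival rates; es_*: mean service times;
    es2_*: second moments of the service times.
    """
    rho_voice = lam_voice * es_voice
    rho_data = lam_data * es_data
    if rho_voice + rho_data >= 1.0:
        raise ValueError("total load must be below 1 for a stable queue")
    # Mean residual service time seen by an arriving packet.
    w0 = (lam_voice * es2_voice + lam_data * es2_data) / 2.0
    w_voice = w0 / (1.0 - rho_voice)
    w_data = w0 / ((1.0 - rho_voice) * (1.0 - rho_voice - rho_data))
    return w_voice, w_data


# Invented example: fixed unit service times, loads of 0.2 (voice) and 0.3 (data).
w_voice, w_data = priority_waits(0.2, 0.3, 1.0, 1.0, 1.0, 1.0)
```

Because voice packets wait only behind other voice packets and the packet currently in service, their delay grows slowly with data load. Delay bounds of this kind can be translated into upper limits on admissible traffic for a given bandwidth.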

Processing large-scale data with Apache Spark (Apache Spark를 활용한 대용량 데이터의 처리)

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1077-1094 / 2016
  • Apache Spark is a fast and general-purpose cluster computing package. It provides a new abstraction named the resilient distributed dataset (RDD), which supports fault tolerance while keeping data in memory. This abstraction results in a significant speedup compared to the legacy large-scale data framework MapReduce. In particular, Spark is suitable for iterative machine learning applications, such as logistic regression and K-means clustering, and for interactive data querying. Thanks to its versatility, Spark also supports high-level libraries for various applications such as machine learning, streaming data processing, database querying, and graph data mining. In this work, we introduce the concept and programming model of Spark and show some implementations of simple statistical computing applications. We also review the machine learning package MLlib and the R language interface SparkR.