• Title/Abstract/Keywords: Statistical data

Search results: 14,779 items (processing time: 0.045 seconds)

A Study on Statistical Thinking and Developing Statistical Thoughts

  • 김상룡
    • Journal of the Korean Society of Mathematical Education Series C: Education of Primary School Mathematics / Vol. 12, No. 1 / pp. 31-38 / 2009
  • This paper aims to develop a program that cultivates statistical ability in elementary students. For this purpose, I examined the relationship between mathematical thinking and statistical thinking. I developed statistical programs covering classification, discussion of data, generation of statistical problems, and a project program. As a result, this study suggests implications for further elementary statistics education.

Contents Analysis on the Internet Sites for Statistical Information

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Korean Data and Information Science Society: Conference Proceedings / Korean Data and Information Science Society, 2006 Proceedings of Joint Conference of KDISS and KDAS / pp. 131-140 / 2006
  • The number of statistical information sites has grown as internet use has increased rapidly in recent years. In this paper, we explore and analyze internet sites for statistical information, covering statistical survey systems, education, databases, and terminology. We then classify these sites so that statistical information can be applied more easily to particular fields. In doing so, this study aims to enhance our understanding of internet sites for statistical information.

A Comparison Study on Statistical Modeling Methods

  • 노유정
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 17, No. 5 / pp. 645-652 / 2016
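  • Statistical modeling of input random variables is essential for the reliability analysis of mechanical systems, reliability-based design optimization, and the statistical validation and calibration of analysis models. Representative statistical modeling methods include the Akaike Information Criterion (AIC), the corrected AIC (AICc), the Bayesian Information Criterion (BIC), Maximum Likelihood Estimation (MLE), and Bayesian methods. These methods basically select the most suitable model among the candidate models by using the likelihood function values of the candidates computed from the given data, and, depending on the method, they take the number of data points or the number of parameters into account. However, engineers who perform statistical modeling of data in the field often lack an understanding of the strengths and weaknesses of each method and do not know which method is accurate, which makes statistical modeling difficult. This paper compares various statistical modeling methods and, by analyzing the strengths and weaknesses of each, proposes the most suitable modeling technique. To verify each method, simulations were performed by assuming various parent distributions and randomly generating samples of various sizes, and the validity of the statistical modeling methods was verified using real engineering data.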
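
The likelihood-based selection described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the two candidate distributions (normal and exponential), the function names, and the simulated sample are assumptions chosen for illustration. Only the standard definitions AIC = 2k - 2 ln L, AICc = AIC + 2k(k+1)/(n-k-1), and BIC = k ln n - 2 ln L are used, where k is the number of parameters and n the sample size.

```python
import math
import random


def normal_loglik(data):
    """Maximized log-likelihood of a normal model and its parameter count."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE of the variance (divides by n)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2


def exponential_loglik(data):
    """Maximized log-likelihood of an exponential model (data must be positive)."""
    n = len(data)
    total = sum(data)
    rate = n / total  # MLE of the rate parameter
    return n * math.log(rate) - rate * total, 1


def criteria(loglik, k, n):
    """Information criteria from a log-likelihood, k parameters, n observations."""
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def select_model(data):
    """Score every candidate and pick the lowest-scoring model per criterion."""
    n = len(data)
    candidates = {"normal": normal_loglik, "exponential": exponential_loglik}
    scores = {name: criteria(*fit(data), n) for name, fit in candidates.items()}
    best = {
        crit: min(scores, key=lambda name: scores[name][crit])
        for crit in ("AIC", "AICc", "BIC")
    }
    return scores, best


# Hypothetical usage: a simulated sample whose parent distribution is normal.
random.seed(0)
sample = [random.gauss(10.0, 1.0) for _ in range(200)]
scores, best = select_model(sample)
print(best)
```

Lower values are better for all three criteria. AICc and BIC penalize additional parameters more heavily than AIC when the sample is small, which is the dependence on data size and parameter count that the abstract mentions.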

Investigations into Coarsening Continuous Variables

  • Jeong, Dong-Myeong;Kim, Jay-J.
    • The Korean Journal of Applied Statistics / Vol. 23, No. 2 / pp. 325-333 / 2010
  • Protection against disclosure of survey respondents' identifiable and/or sensitive information is a prerequisite for statistical agencies that release microdata files from their sample surveys. Coarsening is one of the popular methods for protecting the confidentiality of the data. Grouped data can be released in the form of microdata or tabular data. Rather than releasing the data only in tabular form, making microdata available to the public with interval codes and their representative values greatly enhances the utility of the data. It allows researchers to compute covariances between variables, build statistical models, or run a variety of statistical tests on the data. It may be conjectured that the variance of the interval data is lower than that of the ungrouped data, in the sense that the coarsened data do not have the within-interval variance. This conjecture will be investigated using the uniform and triangular distributions. Traditionally, the midpoint is used to represent all the values in an interval. This approach implicitly assumes that the data are uniformly distributed within each interval. However, this assumption may not hold, especially in the last interval of economic data. In this paper, we use three distributional assumptions (uniform, Pareto, and lognormal) in the last interval and use either the midpoint or the median for the other intervals for the wage and food cost data of Statistics Korea's 2006 Household Income and Expenditure Survey (HIES), and compare these approaches in terms of the first two moments.
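
The conjecture about the variance of coarsened data can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the abstract: the data are simulated from a uniform distribution on [0, 1], four equal-width intervals are used, and the function name `coarsen_to_midpoints` is hypothetical.

```python
import random
import statistics


def coarsen_to_midpoints(values, low, high, n_intervals):
    """Replace each value by the midpoint of the equal-width interval it falls in."""
    width = (high - low) / n_intervals
    coarsened = []
    for v in values:
        # Clamp the index so that v == high falls in the last interval.
        idx = min(int((v - low) / width), n_intervals - 1)
        coarsened.append(low + (idx + 0.5) * width)
    return coarsened


# Hypothetical data: 10,000 draws from a uniform distribution on [0, 1].
random.seed(1)
raw = [random.uniform(0.0, 1.0) for _ in range(10000)]
grouped = coarsen_to_midpoints(raw, 0.0, 1.0, 4)

raw_var = statistics.pvariance(raw)
grouped_var = statistics.pvariance(grouped)
print(f"variance of raw data:       {raw_var:.5f}")
print(f"variance of coarsened data: {grouped_var:.5f}")
```

For uniform data the within-interval variance of an interval of width w is w²/12, so the coarsened variance should fall short of the raw variance by roughly that amount. The midpoint is a reasonable representative here only because the simulated data really are uniform within each interval, which is the assumption the abstract questions for the last interval of economic data.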

A Study of Association Rule Mining by Clustering through Data Fusion

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 4 / pp. 927-935 / 2007
  • Gyeongnam Province currently conducts a social index survey of its residents every year. However, analysis of this survey is limited because different surveys are carried out on a three-year cycle. Data fusion offers a solution to this problem. Data fusion is the process of combining multiple data sources in order to provide information of tactical value to the user. However, data fusion is not itself the final result, so efficient analysis of the fused data is also important. In this study, we present a data fusion method for statistical survey data. We also suggest a methodology for applying association rule mining by clustering to fused statistical survey data.

Patterns of Data Analysis?

  • Unwin, Antony
    • Journal of the Korean Statistical Society / Vol. 30, No. 2 / pp. 219-230 / 2001
  • How do you carry out data analysis? There are few texts and little theory. One approach could be to use a pattern language, an idea which has been successful in fields as diverse as town planning and software engineering. Patterns for data analysis are defined and discussed, illustrated with examples.

Review of statistical methods for survival analysis using genomic data

  • Lee, Seungyeoun;Lim, Heeju
    • Genomics & Informatics / Vol. 17, No. 4 / pp. 41.1-41.12 / 2019
  • Survival analysis mainly deals with the time to an event, such as death, onset of disease, or bankruptcy. The common characteristic of survival analysis is that it involves "censored" data, in which the time to event cannot be completely observed and the recorded time instead represents a lower bound of the time to event. Only the earlier of the time to event and the censoring time is observed. Many traditional statistical methods have been used effectively for analyzing survival data with censored observations. However, with the development of high-throughput technologies for producing "omics" data, more advanced statistical methods, such as regularization, are required to construct predictive survival models with high-dimensional genomic data. Furthermore, machine learning approaches have been adapted for survival analysis to fit nonlinear and complex interaction effects between predictors and to achieve more accurate prediction of individual survival probability. Since most clinicians and medical researchers can now easily access statistical programs for analyzing survival data, a review article is helpful for understanding the statistical methods used in survival analysis. We review traditional survival methods and regularization methods, with various penalty functions, for the analysis of high-dimensional genomic data, and describe machine learning techniques that have been adapted to survival analysis.
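
To make the role of censored observations concrete, the sketch below computes a Kaplan-Meier survival curve. The abstract does not name a specific traditional method; the Kaplan-Meier estimator is chosen here as one standard example, and the function name and toy data are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times[i]  : follow-up time of subject i
    events[i] : 1 if the event was observed at times[i], 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # The curve only drops at times where at least one event was observed.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})

    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before t, including those
        # censored exactly at t (the usual convention).
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: five subjects, two of them censored (event flag 0).
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored subjects never trigger a drop in the curve, but they remain in the risk set until their censoring time, which is how the estimator uses the lower-bound information that censoring provides.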

Zooming Statistics: Inference across scales

  • Hannig, Jan;Marron, J.S.;Riedi, R.H.
    • Journal of the Korean Statistical Society / Vol. 30, No. 2 / pp. 327-345 / 2001
  • New statistical methods are needed to analyze data in a multi-scale way. Some multi-scale extensions of standard methods, including novel visualization using dynamic graphics, are proposed. These tools are used to explore non-standard structure in internet traffic data.

Change Analysis with the Sample Fourier Coefficients

  • Jaehee Kim
    • Communications for Statistical Applications and Methods / Vol. 3, No. 1 / pp. 207-217 / 1996
  • The problem of detecting change with independent data is considered. The asymptotic distribution of the sample change process based on the sample Fourier coefficients is shown to be a Brownian bridge process. We suggest using dynamic statistics, such as a sample Brownian bridge, and graphs as statistical animation. Graphs including change PP plots are given by way of illustration with simulated data.

Statistical bioinformatics for gene expression data

  • Lee, Jae-K.
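    • Korean Society for Bioinformatics: Conference Proceedings / Korean Society for Bioinformatics and Systems Biology, 2001, The 2nd International Symposium on Bioinformatics / pp. 103-127 / 2001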
  • Gene expression studies require statistical experimental designs and validation before laboratory confirmation. Various clustering approaches, such as hierarchical clustering, K-means, and SOM, are commonly used for unsupervised learning in gene expression data. Several classification methods, such as gene voting, SVM, and discriminant analysis, are used for supervised learning, where a well-defined response classification is possible. Estimating gene-condition interaction effects requires advanced, computationally intensive statistical approaches.
