• Title/Abstract/Keywords: Statistical Data

Search results: 14,797 items (processing time 0.032 seconds)

Exploratory Methods for Joint Distribution Valued Data and Their Application

  • Igarashi, Kazuto;Minami, Hiroyuki;Mizuta, Masahiro
    • Communications for Statistical Applications and Methods / Vol. 22, No. 3 / pp. 265-276 / 2015
  • In this paper, we propose hierarchical cluster analysis and multidimensional scaling for joint distribution valued data. Advances in information technology are increasing the need for statistical methods for large and complex data. Symbolic Data Analysis (SDA) is an attractive framework for such data. In SDA, target objects are typically represented by aggregated data. Most SDA methods deal with objects represented as intervals or histograms. However, those methods cannot take into account information among variables, such as correlation. In contrast, objects represented as a joint distribution can contain information among variables. Therefore, we focus on methods for joint distribution valued data. We extend the two well-known exploratory methods using dissimilarities among joint distribution valued data based on the Hall-type relative projection index. We present a simulation study and a real-data example of the proposed methods.

Knowledge Extraction from Affective Data using Rough Sets Model and Comparison between Rough Sets Theory and Statistical Method

  • 홍승우;박재규;박성준;정의승
    • Journal of the Ergonomics Society of Korea (대한인간공학회지) / Vol. 29, No. 4 / pp. 631-637 / 2010
  • The aim of affective engineering is to develop new products by translating customer affections into design factors. Affective data have so far been analyzed with multivariate statistical analysis, but such data do not always have the linear features assumed under a normal distribution. The rough sets model is an effective method for knowledge discovery under uncertainty, imprecision, and fuzziness, and it can deal with any type of data regardless of linearity. This study therefore uses the rough sets model to extract affective knowledge from affective data. Four types of scent alternatives and four types of sounds were designed, and an experiment was performed to examine affective differences in subjects' preferences for air conditioners. The study also aims to identify the relationship between the rough sets based affective engineering method and the statistical one. The results of a case study show that the proposed approach can effectively extract affective knowledge from affective data and can discover the relationships between customer affections and design factors. The rough sets model and the statistical method give similar results; the study could be made more valuable by further comparison with fuzzy theory, neural networks, and multivariate statistical methods.

A GEE approach for the semiparametric accelerated lifetime model with multivariate interval-censored data

  • Maru Kim;Sangbum Choi
    • Communications for Statistical Applications and Methods / Vol. 30, No. 4 / pp. 389-402 / 2023
  • Multivariate or clustered failure time data often occur in medical, epidemiological, and socio-economic studies when survival data are collected from several research centers. If the data are periodically observed, as in a longitudinal study, survival times are often subject to various types of interval-censoring, creating multivariate interval-censored data. The event times of interest may then be correlated among individuals who come from the same cluster. In this article, we propose a unified linear regression method for analyzing multivariate interval-censored data. We consider a semiparametric multivariate accelerated failure time model as a statistical analysis tool and develop a generalized Buckley-James method to make inferences by imputing interval-censored observations with their conditional mean values. Since the study population consists of several heterogeneous clusters, where the subjects in the same cluster may be related, we propose a generalized estimating equations approach to accommodate potential dependence within clusters. Our simulation results confirm that the proposed estimator is robust to misspecification of the working covariance matrix and that statistical efficiency can increase when the working covariance structure is close to the truth. The proposed method is applied to a dataset from a diabetic retinopathy study.

Development of a R function for visualizing statistical information on Google static maps

  • 한경수;박세진;안정용
    • Journal of the Korean Data and Information Science Society / Vol. 23, No. 5 / pp. 971-981 / 2012
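  • Google Maps has become established as one of the common means of presenting statistical information for data that carry geographic information. In this study, we introduce how to use Google Maps in R and develop an R function for displaying various statistical graphs on Google Maps. With the developed function, statistical graphs such as bar charts, pie charts, and rectangle charts can be displayed on a map.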

Statistical Analysis of Bivariate Recurrent Event Data with Incomplete Observation Gaps

  • Kim, Yang-Jin
    • Communications for Statistical Applications and Methods / Vol. 20, No. 4 / pp. 283-290 / 2013
  • Subjects can experience two types of recurrent events in a longitudinal study. In addition, there may be intermittent dropouts that result in repeated observation gaps during which no recurrent events are observed. These periods are therefore regarded as non-risk status. In this paper, we consider a special case where information on the observation gap is incomplete; that is, the termination time of the observation gap is not available while the starting time is known. For statistical inference, the incomplete termination time is treated as interval-censored data and estimated with two approaches. A shared frailty effect is also employed to model the association between the two recurrent events. An EM algorithm is applied to recover the unknown termination times as well as the frailty effect. We apply the suggested method to data on young drivers' convictions with several suspensions.

Three-Parameter Gamma Distribution and Its Significance in Structural Reliability

  • Zhao, Yan-Gang;Alfredo H-S. Ang
    • Computational Structural Engineering : An International Journal / Vol. 2, No. 1 / pp. 1-10 / 2002
  • Information on the distribution of the basic random variables is essential for the accurate evaluation of structural reliability. The usual method for determining the distribution is to fit a candidate distribution to the histogram of the available statistical data of the variable and perform appropriate goodness-of-fit tests. Generally, such candidate distributions have two parameters that may be evaluated from the mean value and standard deviation of the statistical data. In the present paper, a three-parameter Gamma distribution, whose parameters can be directly defined in terms of the mean value, standard deviation, and skewness of the available data, is suggested. The flexibility and advantages of the distribution in fitting statistical data and its significance in structural reliability evaluation are identified and discussed. Numerical examples are presented to demonstrate these advantages.
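
As an illustration of how three parameters can be fixed by the mean, standard deviation, and skewness, here is a minimal sketch using the textbook moment matching for a shifted (location-scale) Gamma distribution. The function names are hypothetical, the sketch assumes positive skewness, and the parameterization used in the paper itself may differ.

```python
import math


def gamma3_from_moments(mean, std, skew):
    """Moment-match a shifted Gamma distribution (assumed parameterization).

    Returns (shape, scale, location) such that X = location + G,
    with G ~ Gamma(shape, scale), has the requested mean, standard
    deviation, and skewness. Only positive skewness is handled here.
    """
    if std <= 0 or skew <= 0:
        raise ValueError("this sketch requires std > 0 and skew > 0")
    shape = 4.0 / skew**2             # skewness of a Gamma is 2 / sqrt(shape)
    scale = std * skew / 2.0          # std of a Gamma is sqrt(shape) * scale
    location = mean - shape * scale   # shift so that the mean matches
    return shape, scale, location


def gamma3_moments(shape, scale, location):
    """Recover (mean, std, skew) from the shifted Gamma parameters."""
    mean = location + shape * scale
    std = math.sqrt(shape) * scale
    skew = 2.0 / math.sqrt(shape)
    return mean, std, skew


if __name__ == "__main__":
    params = gamma3_from_moments(mean=10.0, std=2.0, skew=0.5)
    print(params)
    print(gamma3_moments(*params))
```

The round trip through `gamma3_moments` is a quick consistency check: the recovered moments should equal the inputs up to floating-point error.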


Radioactive waste sampling for characterisation - A Bayesian upgrade

  • Pyke, Caroline K.;Hiller, Peter J.;Koma, Yoshikazu;Ohki, Keiichi
    • Nuclear Engineering and Technology / Vol. 54, No. 1 / pp. 414-422 / 2022
  • This paper presents a methodology for combining a Bayesian statistical approach with Data Quality Objectives (a structured decision-making method) to provide increased levels of confidence in analytical data when approaching a waste boundary. Development of sampling and analysis plans for the characterisation of radioactive waste often uses a simple, one-pass statistical approach to underpin the sampling schedule. A Bayesian statistical approach introduces the concept of prior information, giving an adaptive sampling strategy based on previous knowledge. This aligns more closely with the iterative approach demanded by the most commonly used structured decision-making tool in this area (Data Quality Objectives) and has the potential to provide a more fully underpinned justification than the more traditional statistical approach. The approach described was developed in a UK regulatory context but is applied here to a waste stream from the Fukushima Daiichi Nuclear Power Station to demonstrate how the methodology can support decision making on the ultimate disposal option for radioactive waste in a more global context.

Review on statistical methods for protecting privacy and measuring risk of disclosure when releasing information for public use

  • 이용희
    • Journal of the Korean Data and Information Science Society / Vol. 24, No. 5 / pp. 1029-1041 / 2013
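  • With the recent emergence of big data and the rapid increase in demand for information disclosure, the need to protect personal information when releasing data to the public is more urgent than ever. In this paper, focusing on microdata and statistical analysis servers, we give an overview of the statistical methods proposed to date for limiting the disclosure of personal information, the concept of information disclosure, and the criteria for measuring disclosure risk.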

Bayesian Inference for Predicting the Default Rate Using the Power Prior

  • Kim, Seong-W.;Son, Young-Sook;Choi, Sang-A
    • Communications for Statistical Applications and Methods / Vol. 13, No. 3 / pp. 685-699 / 2006
  • Commercial banks and other related sectors have developed internal models to better quantify their financial risks. Since an appropriate credit risk model plays a very important role in risk management at financial institutions, a more accurate model for forecasting credit losses is needed, and statistical inference on that model is required. In this paper, we propose a new method for estimating a default rate. It is a Bayesian approach using the power prior, which allows historical data to be incorporated in estimating the default rate. Inference on current data can be more reliable if similar data exist from previous studies. Ibrahim and Chen (2000) use such data to characterize the power prior, which allows historical data to be incorporated in estimating the model parameters. We demonstrate our methodology with a real SOHO data set and also perform a simulation study.
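
To make the power prior idea concrete, the following is a minimal sketch under assumptions that are not taken from the paper: defaults are modeled as binomial counts, the initial prior is Beta(alpha, beta), and the discounting parameter a0 is fixed in [0, 1]. The power prior raises the historical likelihood to the power a0, so historical counts enter the posterior with weight a0. All function and argument names are hypothetical.

```python
def power_prior_posterior(y, n, y0, n0, a0, alpha=1.0, beta=1.0):
    """Beta posterior for a default rate under a fixed-a0 power prior.

    y, n   : defaults and exposures in the current data
    y0, n0 : defaults and exposures in the historical data
    a0     : weight on the historical likelihood (0 ignores it, 1 pools it)
    alpha, beta : parameters of the initial Beta prior (assumed)
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    if not (0 <= y <= n and 0 <= y0 <= n0):
        raise ValueError("defaults must not exceed exposures")
    # Historical counts are discounted by a0; current counts enter fully.
    post_alpha = alpha + a0 * y0 + y
    post_beta = beta + a0 * (n0 - y0) + (n - y)
    return post_alpha, post_beta


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    for a0 in (0.0, 0.5, 1.0):
        a, b = power_prior_posterior(y=3, n=100, y0=10, n0=200, a0=a0)
        print(a0, a, b, beta_mean(a, b))
```

With a0 = 0 the historical data are ignored, and with a0 = 1 they are pooled with the current data; intermediate values interpolate between the two.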

Dual Generalized Maximum Entropy Estimation for Panel Data Regression Models

  • Lee, Jaejun;Cheon, Sooyoung
    • Communications for Statistical Applications and Methods / Vol. 21, No. 5 / pp. 395-409 / 2014
  • Limited, partial, or incomplete data give rise to what is known as an ill-posed problem. If data with ill-posed problems are analyzed by traditional statistical methods, the results are unreliable and lead to erroneous interpretations. To overcome these problems, we propose a dual generalized maximum entropy (dual GME) estimator for panel data regression models based on an unconstrained dual Lagrange multiplier method. Monte Carlo simulations for panel data regression models with exogeneity, endogeneity, and/or collinearity show that the dual GME estimator outperforms several other estimators, such as least squares and instrumental variable estimators, even in small samples. We believe that our dual GME procedure developed for the panel data regression framework will be useful for analyzing ill-posed and endogenous data sets.