• Title/Abstract/Keywords: data distributions

Exponential family of circular distributions

  • Kim, Sung-Su
    • Journal of the Korean Data and Information Science Society, Vol. 22, No. 6, pp. 1217-1222, 2011
  • In this paper, we show that any circular density can be closely approximated by an exponential family of distributions. We therefore propose an exponential family of distributions as a new family of circular distributions that is suitable for modeling circular distributions of any shape. In this family, the trigonometric moments are found to be the uniformly minimum variance unbiased estimators (UMVUEs) of the parameters of the distribution. Simulation results and a goodness-of-fit test on an asymmetric real data set show the usefulness of the new circular distribution.
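
The abstract above names the trigonometric moments as the estimators of interest but does not give the density itself. The following is only a minimal sketch of how sample trigonometric moments are computed from angles in radians; the function names and the sample values are hypothetical and are not taken from the paper.

```python
import math

def trigonometric_moment(angles, p):
    """Return the p-th sample trigonometric moment (a_p, b_p).

    a_p is the mean of cos(p * theta) and b_p the mean of sin(p * theta),
    with all angles given in radians.
    """
    n = len(angles)
    a_p = sum(math.cos(p * t) for t in angles) / n
    b_p = sum(math.sin(p * t) for t in angles) / n
    return a_p, b_p

def mean_direction_and_resultant(angles):
    """Return the mean direction and mean resultant length from the first moment."""
    a1, b1 = trigonometric_moment(angles, 1)
    return math.atan2(b1, a1), math.hypot(a1, b1)

# Hypothetical sample of directions clustered around 0 radians.
sample = [0.1, 0.3, 0.2, 6.2, 0.4]
first_moment = trigonometric_moment(sample, 1)
second_moment = trigonometric_moment(sample, 2)
```

Higher-order moments (p = 2, 3, ...) are the quantities that would capture asymmetry or multimodality, which is why an asymmetric data set is a natural test case.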

An SVD-Based Approach for Generating High-Dimensional Data and Query Sets

  • 김상욱
    • 정보기술과데이타베이스저널 (Journal of Information Technology and Database), Vol. 8, No. 2, pp. 91-101, 2001
  • Previous research on the performance evaluation of multidimensional indexes has typically used synthetic data sets distributed uniformly or normally over multidimensional space. However, recent research has shown that these kinds of data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets to resolve this problem. We first identify the features of data and query sets that are appropriate for fairly evaluating the performance of multidimensional indexes, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies these features. HDDQ_Gen supports the following features: (1) clustered distributions, (2) various object distributions in each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distributions that depend on data distributions. Using these features, users can control the distribution characteristics of data and query sets. Our contribution is important in that HDDQ_Gen provides a benchmark environment for evaluating multidimensional indexes correctly.

Data Distributions on Performance of Neural Networks for Two Year Peak Stream Discharges

  • Muttiah, Ranjan S.
    • 한국농업기계학회 (Korean Society for Agricultural Machinery) Conference Proceedings, 1996 International Conference on Agricultural Machinery Engineering Proceedings, pp. 1073-1080, 1996
  • The impact of the input and output probability distributions on the performance of neural networks in forecasting two-year peak stream flow (cubic meters per second) is examined for two major river basins of the US. The neural network input consisted of drainage area (square kilometers) and elevation (meters). When the data are normally distributed, the neural networks predict much better than when the data are non-normal and have larger tails in their distributions.

Closeness of Lindley distribution to Weibull and gamma distributions

  • Raqab, Mohammad Z.;Al-Jarallah, Reem A.;Al-Mutairi, Dhaifallah K.
    • Communications for Statistical Applications and Methods, Vol. 24, No. 2, pp. 129-142, 2017
  • In this paper, we consider the problem of model selection/discrimination among three different positively skewed lifetime distributions. The Lindley, Weibull, and gamma distributions have been used to effectively analyze positively skewed lifetime data. This paper assesses how close the Lindley distribution is to the Weibull and gamma distributions. We consider three techniques, the likelihood ratio test, the asymptotic likelihood ratio test, and the minimum Kolmogorov distance, as optimality criteria for identifying the appropriate model among the three distributions for a given data set. A Monte Carlo simulation study is performed to compute the probability of correct selection based on these criteria for various sample sizes and shape parameters. Overall, the Lindley distribution is observed to be closer to the Weibull distribution in the sense of the likelihood ratio and Kolmogorov criteria. A real data set is presented and analyzed for illustrative purposes.
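
As a rough illustration of the likelihood ratio criterion mentioned above, the sketch below fits the Lindley and Weibull distributions by maximum likelihood and compares their maximized log-likelihoods. It assumes the standard Lindley density f(x) = θ²/(1+θ) · (1+x) · exp(−θx) and the two-parameter Weibull density; the function names, the bisection bounds, and the data are hypothetical, and the gamma fit and the Kolmogorov distance criterion are omitted for brevity.

```python
import math

def lindley_mle(xs):
    """Closed-form MLE of the Lindley parameter theta from the sample mean."""
    m = sum(xs) / len(xs)
    return (-(m - 1) + math.sqrt((m - 1) ** 2 + 8 * m)) / (2 * m)

def lindley_loglik(xs, theta):
    n = len(xs)
    return (n * (2 * math.log(theta) - math.log(1 + theta))
            + sum(math.log(1 + x) for x in xs) - theta * sum(xs))

def weibull_mle(xs, lo=0.05, hi=50.0, iters=100):
    """Weibull MLE: shape k by bisection on the profile score, scale in closed form.

    The bounds lo and hi are assumed to bracket the root; this holds for
    moderately sized positive data such as the example below.
    """
    n = len(xs)
    mean_log = sum(math.log(x) for x in xs) / n

    def score(k):
        num = sum(x ** k * math.log(x) for x in xs)
        den = sum(x ** k for x in xs)
        return num / den - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:  # score is increasing in k, so the root lies below mid
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    lam = (sum(x ** k for x in xs) / n) ** (1.0 / k)
    return k, lam

def weibull_loglik(xs, k, lam):
    n = len(xs)
    return (n * (math.log(k) - k * math.log(lam))
            + (k - 1) * sum(math.log(x) for x in xs)
            - sum((x / lam) ** k for x in xs))

def select_model(xs):
    """Pick the model with the larger maximized log-likelihood."""
    ll_lindley = lindley_loglik(xs, lindley_mle(xs))
    ll_weibull = weibull_loglik(xs, *weibull_mle(xs))
    return "Lindley" if ll_lindley > ll_weibull else "Weibull"

# Hypothetical positively skewed lifetime data.
lifetimes = [0.5, 1.2, 0.8, 2.5, 3.1, 0.3, 1.7, 4.2, 0.9, 1.1]
chosen = select_model(lifetimes)
```

Repeating `select_model` over many simulated samples from a known distribution would give an estimate of the probability of correct selection that the abstract refers to.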

Derivations of moments for discrete probability distributions using backward difference operators

  • 조길호
    • Journal of the Korean Data and Information Science Society, Vol. 22, No. 3, pp. 505-513, 2011
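  • The purpose of this paper is to derive, using the backward difference operator, a formula for the r-th moment about the origin of a discrete probability distribution. With this formula, the r-th moment can be computed as a linear combination of the backward differences of $x^r$ evaluated at 0, up to the r-th order.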
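
The abstract does not state the formula itself. One standard way to obtain a result of this shape is Newton's backward-difference expansion, $x^r = \sum_{k=0}^{r} \frac{\nabla^k 0^r}{k!}\, x(x+1)\cdots(x+k-1)$, which gives $E[X^r]$ as a linear combination of rising factorial moments weighted by the backward differences of $x^r$ at 0. The sketch below assumes this expansion and may differ from the paper's derivation; all function names are hypothetical.

```python
import math

def backward_differences_at_zero(r):
    """Return the k-th backward differences of f(x) = x**r at x = 0, for k = 0..r.

    Uses nabla^k f(0) = sum_j (-1)^j * C(k, j) * f(-j).
    """
    return [sum((-1) ** j * math.comb(k, j) * (-j) ** r for j in range(k + 1))
            for k in range(r + 1)]

def rising_factorial(x, k):
    """Return x * (x + 1) * ... * (x + k - 1), with the empty product equal to 1."""
    out = 1
    for i in range(k):
        out *= x + i
    return out

def raw_moment(pmf, r):
    """Compute E[X^r] for a pmf given as {value: probability}."""
    total = 0.0
    for k, diff in enumerate(backward_differences_at_zero(r)):
        rising_moment = sum(p * rising_factorial(x, k) for x, p in pmf.items())
        total += diff / math.factorial(k) * rising_moment
    return total

# Example: binomial distribution with n = 5 trials and success probability 0.3.
n, p = 5, 0.3
binomial_pmf = {x: math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}
third_moment = raw_moment(binomial_pmf, 3)
```

For r = 2 the differences at 0 are 0, −1, and 2, so the expansion reduces to $x^2 = -x + x(x+1)$, which is easy to check by hand.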

A Projected Exponential Family for Modeling Semicircular Data

  • Kim, Hyoung-Moon
    • 응용통계연구 (The Korean Journal of Applied Statistics), Vol. 23, No. 6, pp. 1125-1145, 2010
  • For modeling (skewed) semicircular data, we derive a new exponential family of distributions. We extend it to the l-axial exponential family of distributions by a projection for modeling any arc of arbitrary length. It is straightforward to generate samples from the l-axial exponential family of distributions. An asymptotic result reveals that the linear exponential family of distributions can be used to approximate the l-axial exponential family of distributions. Some trigonometric moments are also derived in closed form. Maximum likelihood estimation is adopted to estimate the model parameters. Some hypothesis tests and confidence intervals are also developed. The Kolmogorov-Smirnov test is adopted as a goodness-of-fit test for the l-axial exponential family of distributions. Samples of orientations are used to demonstrate the proposed model.

Distributional Shape of Food Intake and Nutrition Data for Adults and Children

  • 문현경;정해랑;황성희
    • 한국식품위생안전성학회지 (Journal of Food Hygiene and Safety), Vol. 7, No. 2, pp. 113-121, 1992
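  • A food intake survey was conducted on 228 subjects to study the shape of the probability distributions. (The subjects were 96 adult men aged 19 to 54, adult women aged 20 to 46, 54 boys aged 9 to 11, and 51 girls aged 8 to 11.) The distribution of food intake differed in shape from menu to menu, and most food intakes were not normally distributed. For each of two meals, the intakes of energy, protein, fat, carbohydrate, fiber, calcium, iron, vitamin A, thiamine, riboflavin, lysine, and vitamin C were calculated, and their distributions were checked for normality. For adult women, food intake was normally distributed for some menus. For the nutrient intake of adult women, vitamin C intake at the first meal and calcium intake at the second meal were normally distributed, while the others were not. For adult women and children, some nutrients were normally distributed. No particular shape could be identified in the distributions of nutrient intake, and it was difficult to assume that food intake or nutrient intake is normally distributed. Therefore, before applying statistical analyses that assume normality to food intake or nutrient intake data, it is necessary to confirm that the data are normally distributed.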

Families of Distributions Arising from Distributions of Ordered Data

  • Ahmadi, Mosayeb;Razmkhah, M.;Mohtashami Borzadaran, G.R.
    • Communications for Statistical Applications and Methods, Vol. 22, No. 2, pp. 105-120, 2015
  • A large family of distributions arising from distributions of ordered data is proposed, which contains other models studied in the literature. This extension subsumes many cases of weighted random variables, such as order statistics, records, k-records, and many others. Such a distribution can be used for modeling data that are not identically distributed. Some properties of the theoretical model, such as moments, mean deviation, entropy criteria, symmetry, and unimodality, are derived. We also study the problem of parameter estimation and derive maximum likelihood estimators for a weighted gamma distribution. Finally, it is shown that the proposed model is the best among the previously introduced distributions for modeling a real data set.

Selection of Appropriate Probability Distribution Types for Ten Days Evaporation Data

  • 김선주;박재흥;강상진
    • 한국농공학회 (Korean Society of Agricultural Engineers) Conference Proceedings, 1998 Annual Conference Proceedings, pp. 338-343, 1998
  • This study selects appropriate probability distributions for ten-day evaporation data in order to represent the statistical characteristics of real evaporation data in Korea. Nine probability distribution functions were assumed as underlying distributions for the ten-day evaporation data of 20 stations over a period of 20 years. The parameters of each probability distribution function were estimated by the maximum likelihood approach, and appropriate probability distributions were selected using a goodness-of-fit test. The log-Pearson type III model was selected as an appropriate probability distribution for ten-day evaporation data in Korea.

Nonparametric analysis of income distributions among different regions based on energy distance with applications to China Health and Nutrition Survey data

  • Ma, Zhihua;Xue, Yishu;Hu, Guanyu
    • Communications for Statistical Applications and Methods, Vol. 26, No. 1, pp. 57-67, 2019
  • Income distribution is a major concern in economic theory. In regional economics, it is often of interest to compare income distributions in different regions. Traditional methods often compare the income inequality of different regions by assuming parametric forms of the income distributions, or using summary statistics like the Gini coefficient. In this paper, we propose a nonparametric procedure to test for heterogeneity in income distributions among different regions, and a K-means clustering procedure for clustering income distributions based on energy distance. In simulation studies, it is shown that the energy distance based method has competitive results with other common methods in hypothesis testing, and the energy distance based clustering method performs well in the clustering problem. The proposed approaches are applied in analyzing data from China Health and Nutrition Survey 2011. The results indicate that there are significant differences among income distributions of the 12 provinces in the dataset. After applying a 4-means clustering algorithm, we obtained the clustering results of the income distributions in the 12 provinces.
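
For readers unfamiliar with energy distance, the sketch below computes the usual sample version for two univariate samples, 2·E|X−Y| − E|X−X'| − E|Y−Y'|, with each expectation replaced by an average over all pairs. This is only an illustration of the statistic; the function names and income figures are hypothetical, and the paper's testing and clustering procedures are not reproduced here.

```python
def mean_abs_diff(xs, ys):
    """Average of |x - y| over all pairs (x, y) with x from xs and y from ys."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))

def energy_distance(xs, ys):
    """Sample energy distance (squared form) between two univariate samples."""
    return (2 * mean_abs_diff(xs, ys)
            - mean_abs_diff(xs, xs)
            - mean_abs_diff(ys, ys))

# Hypothetical household incomes (arbitrary units) for two regions.
region_a = [12.0, 15.5, 9.8, 20.1, 14.3]
region_b = [22.4, 18.9, 30.2, 25.0, 19.7]
distance_ab = energy_distance(region_a, region_b)
```

The statistic is zero when the two samples are identical and grows as the distributions separate, so a matrix of pairwise values between regions is one plausible input for a clustering step of the kind the abstract describes.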