• Title/Summary/Keyword: data distributions

Exponential family of circular distributions

  • Kim, Sung-Su
    • Journal of the Korean Data and Information Science Society / v.22 no.6 / pp.1217-1222 / 2011
  • In this paper, we show that any circular density can be closely approximated by an exponential family of distributions. We therefore propose this exponential family as a new family of circular distributions, suitable for modeling circular distributions of any shape. In this family, the trigonometric moments are found to be the uniformly minimum variance unbiased estimators (UMVUEs) of the parameters of the distribution. Simulation results and a goodness-of-fit test using an asymmetric real data set show the usefulness of the new circular distribution.
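
    The trigonometric moments mentioned in this abstract can be illustrated with a minimal sketch. It computes only the standard sample cosine and sine moments of a set of angles; the abstract does not give the paper's parameterization, so the function name and the example angles are assumptions made for illustration.

```python
import math


def trigonometric_moments(angles, order):
    """Return the sample cosine and sine moments of a given order.

    angles: iterable of angles in radians.
    order:  positive integer p; the moments are the averages of
            cos(p * theta) and sin(p * theta) over the sample.
    """
    angles = list(angles)
    if not angles:
        raise ValueError("need at least one angle")
    n = len(angles)
    a_p = sum(math.cos(order * t) for t in angles) / n
    b_p = sum(math.sin(order * t) for t in angles) / n
    return a_p, b_p


# Hypothetical sample: four equally spaced directions on the circle.
sample = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
a1, b1 = trigonometric_moments(sample, 1)  # both close to zero by symmetry
```

    For a symmetric sample like this one the first-order moments cancel, while a skewed sample would leave a nonzero sine moment.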

An SVD-Based Approach for Generating High-Dimensional Data and Query Sets (SVD를 기반으로 한 고차원 데이터 및 질의 집합의 생성)

  • 김상욱
    • The Journal of Information Technology and Database / v.8 no.2 / pp.91-101 / 2001
  • Previous research on the performance evaluation of multidimensional indexes has typically used synthetic data sets distributed uniformly or normally over multidimensional space. However, recent research results have shown that these kinds of data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets to resolve this problem. We first identify the features of data and query sets that are appropriate for fairly evaluating the performance of multidimensional indexes, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies these features. HDDQ_Gen supports the following features: (1) clustered distributions, (2) various object distributions in each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distributions depending on data distributions. Using these features, users are able to control the distribution characteristics of data and query sets. Our contribution is important in that HDDQ_Gen provides a benchmark environment for evaluating multidimensional indexes correctly.
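
    Two of the listed features, clustered data (1) and query distributions that follow the data (5), can be sketched as below. This is not HDDQ_Gen and does not use SVD; it is a simplified stand-in under stated assumptions: Gaussian clusters around uniformly placed centers, and queries obtained by jittering randomly chosen data points. All function names and parameters are hypothetical.

```python
import random


def make_clustered_data(n_points, n_clusters, dim, spread=0.05, seed=0):
    """Generate points scattered around random cluster centers.

    Centers are uniform in the unit hypercube; each point is drawn from an
    isotropic Gaussian around a randomly chosen center (an assumption).
    """
    rng = random.Random(seed)
    centers = [[rng.random() for _ in range(dim)] for _ in range(n_clusters)]
    data = []
    for _ in range(n_points):
        center = rng.choice(centers)
        data.append([rng.gauss(mu, spread) for mu in center])
    return centers, data


def make_queries(data, n_queries, jitter=0.01, seed=1):
    """Draw query points near existing data points, so that the query
    distribution depends on the data distribution."""
    rng = random.Random(seed)
    return [
        [x + rng.gauss(0.0, jitter) for x in rng.choice(data)]
        for _ in range(n_queries)
    ]


centers, data = make_clustered_data(n_points=200, n_clusters=4, dim=16)
queries = make_queries(data, n_queries=10)
```

    Fixing the seeds makes a generated benchmark reproducible, which matters when comparing index structures on the same workload.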

Data Distributions on Performance of Neural Networks for Two Year Peak Stream Discharges

  • Muttiah, Ranjan S.
    • Proceedings of the Korean Society for Agricultural Machinery Conference / 1996.06c / pp.1073-1080 / 1996
  • The impact of the input and output probability distributions on the performance of neural networks in forecasting two-year peak stream flow (cubic meters per second) is examined for two major river basins of the US. The neural network input consisted of drainage area (square kilometers) and elevation (meters). When the data are normally distributed, the neural networks predict much better than when the data are non-normal and have larger tails in their distributions.

Closeness of Lindley distribution to Weibull and gamma distributions

  • Raqab, Mohammad Z.;Al-Jarallah, Reem A.;Al-Mutairi, Dhaifallah K.
    • Communications for Statistical Applications and Methods / v.24 no.2 / pp.129-142 / 2017
  • In this paper, we consider the problem of model selection/discrimination among three different positively skewed lifetime distributions. The Lindley, Weibull, and gamma distributions have been used to effectively analyze positively skewed lifetime data. This paper assesses how close the Lindley distribution is to the Weibull and gamma distributions. We consider three techniques, the likelihood ratio test, the asymptotic likelihood ratio test, and the minimum Kolmogorov distance, as optimality criteria for identifying the appropriate model among the three distributions for a given data set. A Monte Carlo simulation study is performed to compute the probability of correct selection based on these criteria for various choices of sample sizes and shape parameters. It is observed that, overall, the Lindley distribution is closer to the Weibull distribution in the sense of the likelihood ratio and Kolmogorov criteria. A real data set is presented and analyzed for illustrative purposes.
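
    The Kolmogorov distance criterion can be sketched for the Lindley side of the comparison. The code below assumes the usual one-parameter Lindley distribution with CDF F(x) = 1 - (1 + θx/(1+θ)) exp(-θx), whose maximum likelihood estimate has a closed form in the sample mean. The function names and the sample values are made up for illustration, and the Weibull and gamma fits the paper compares against are not shown.

```python
import math


def lindley_cdf(x, theta):
    """CDF of the one-parameter Lindley distribution."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + theta * x / (1.0 + theta)) * math.exp(-theta * x)


def lindley_mle(sample):
    """Closed-form MLE of theta: the positive root of
    m*theta^2 + (m - 1)*theta - 2 = 0, where m is the sample mean."""
    m = sum(sample) / len(sample)
    return (-(m - 1.0) + math.sqrt((m - 1.0) ** 2 + 8.0 * m)) / (2.0 * m)


def kolmogorov_distance(sample, cdf):
    """Largest gap between the empirical CDF of the sample and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the model CDF with the empirical CDF just after and
        # just before the jump at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical lifetime data, not taken from the paper.
lifetimes = [0.3, 0.8, 1.1, 1.9, 2.6, 4.0]
theta_hat = lindley_mle(lifetimes)
distance = kolmogorov_distance(lifetimes, lambda x: lindley_cdf(x, theta_hat))
```

    Under a minimum-distance rule, the same distance would be computed for each candidate family and the family with the smallest value selected.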

Derivations of moments for discrete probability distributions using backward difference operators (후진 미분 연산자를 이용한 이산확률분포의 적률 유도)

  • Cho, Kil-Ho
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.505-513 / 2011
  • In this paper, we derive the moments of discrete probability distributions by using backward difference operators. We also present such derivations for several well-known distributions: the binomial, Poisson, geometric, hypergeometric, and negative hypergeometric distributions.

A Projected Exponential Family for Modeling Semicircular Data

  • Kim, Hyoung-Moon
    • The Korean Journal of Applied Statistics / v.23 no.6 / pp.1125-1145 / 2010
  • For modeling (skewed) semicircular data, we derive a new exponential family of distributions. We extend it to the l-axial exponential family of distributions by a projection, for modeling any arc of arbitrary length. It is straightforward to generate samples from the l-axial exponential family of distributions. An asymptotic result reveals that the linear exponential family of distributions can be used to approximate the l-axial exponential family of distributions. Some trigonometric moments are also derived in closed form. Maximum likelihood estimation is adopted to estimate the model parameters. Some hypothesis tests and confidence intervals are also developed. The Kolmogorov-Smirnov test is adopted as a goodness-of-fit test for the l-axial exponential family of distributions. Samples of orientations are used to demonstrate the proposed model.

Distributional Shape of Food Intake and Nutrition Data for Adults and Children (성인과 어린이의 식품섭취와 영양소 섭취량의 분포에 대한 연구)

  • 문현경;정해랑;황성희
    • Journal of Food Hygiene and Safety / v.7 no.2 / pp.113-121 / 1992
  • Food intake data from 228 persons (96 adult males aged 19 to 54, 27 adult females aged 20 to 46, 54 boys aged 9 to 11, and 51 girls aged 8 to 11) were studied with respect to the shape of the underlying probability distributions. The distributional shape of food intake differed for each menu item. Most of the food intake distributions differed from normal distributions. Nutrient intake data were calculated from the food intake data of two meals. For each meal, energy, protein, fat, carbohydrate, fiber, calcium, iron, vitamin A, thiamin, riboflavin, niacin, and vitamin C were computed, and their distributions were compared with normal distributions. Distributions for adult females were normal for some food items. For nutrient intake data from adult males, the distributions for vitamin C from the first meal and calcium from the second meal were marginal, and the rest differed from normal distributions. For adult females and children, the distributions for some nutrients differed from normal distributions. It is hard to find specific patterns in the distribution of each nutrient. Therefore, the assumption of normality should be verified before applying parametric techniques to such data. If the assumption is not valid, non-parametric techniques should be used to analyze the data.

Families of Distributions Arising from Distributions of Ordered Data

  • Ahmadi, Mosayeb;Razmkhah, M.;Mohtashami Borzadaran, G.R.
    • Communications for Statistical Applications and Methods / v.22 no.2 / pp.105-120 / 2015
  • A large family of distributions arising from distributions of ordered data is proposed, which contains other models studied in the literature. This extension subsumes many cases of weighted random variables, such as order statistics, records, k-records, and many others. Such a distribution can be used for modeling data that are not identically distributed. Some properties of the theoretical model, such as moments, mean deviation, entropy criteria, symmetry, and unimodality, are derived. The problem of parameter estimation is also studied, and maximum likelihood estimators are derived for a weighted gamma distribution. Finally, it is shown that the proposed model is the best among the previously introduced distributions for modeling a real data set.

Selection of Appropriate Probability Distribution Types for Ten Days Evaporation Data (순별증발량 자료의 적정 확률분포형 선정)

  • 김선주;박재흥;강상진
    • Proceedings of the Korean Society of Agricultural Engineers Conference / 1998.10a / pp.338-343 / 1998
  • This study selects appropriate probability distributions for ten-day evaporation data in order to represent the statistical characteristics of real evaporation data in Korea. Nine probability distribution functions were assumed as underlying distributions for ten-day evaporation data from 20 stations over a period of 20 years. The parameters of each probability distribution function were estimated by the maximum likelihood approach, and appropriate probability distributions were selected using a goodness-of-fit test. The log-Pearson type III model was selected as an appropriate probability distribution for ten-day evaporation data in Korea.

Nonparametric analysis of income distributions among different regions based on energy distance with applications to China Health and Nutrition Survey data

  • Ma, Zhihua;Xue, Yishu;Hu, Guanyu
    • Communications for Statistical Applications and Methods / v.26 no.1 / pp.57-67 / 2019
  • Income distribution is a major concern in economic theory. In regional economics, it is often of interest to compare income distributions in different regions. Traditional methods often compare the income inequality of different regions by assuming parametric forms for the income distributions or by using summary statistics such as the Gini coefficient. In this paper, we propose a nonparametric procedure to test for heterogeneity in income distributions among different regions, and a K-means clustering procedure for clustering income distributions based on energy distance. In simulation studies, the energy distance based method is shown to be competitive with other common methods in hypothesis testing, and the energy distance based clustering method performs well in the clustering problem. The proposed approaches are applied to data from the China Health and Nutrition Survey 2011. The results indicate that there are significant differences among the income distributions of the 12 provinces in the dataset. Applying a 4-means clustering algorithm, we obtain the clustering results for the income distributions of the 12 provinces.
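
    The energy distance used in this abstract can be sketched for two one-dimensional samples. The version below uses the common form 2E|X-Y| - E|X-X'| - E|Y-Y'| with plain sample averages over all pairs. Whether the paper uses exactly this estimator is an assumption, and some references and libraries report the square root of this quantity instead. The function names and income figures are hypothetical.

```python
def mean_abs_diff(xs, ys):
    """Average absolute difference over all pairs (x, y)."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Sample energy distance between two one-dimensional samples:
    2*E|X - Y| - E|X - X'| - E|Y - Y'|, with expectations replaced by
    averages over all pairs (including each point paired with itself)."""
    return (
        2.0 * mean_abs_diff(xs, ys)
        - mean_abs_diff(xs, xs)
        - mean_abs_diff(ys, ys)
    )


# Hypothetical incomes (arbitrary units) for two regions.
region_a = [12.0, 15.5, 18.0, 22.0, 40.0]
region_b = [10.0, 11.0, 13.5, 14.0, 16.0]
d_ab = energy_distance(region_a, region_b)
```

    A clustering procedure of the kind described would then group regions whose pairwise energy distances are small; that step is not shown here.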