• Title/Summary/Keyword: clustered data

Modeling clustered count data with discrete Weibull regression model

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.413-420 / 2022
  • In this study, we adapt the discrete Weibull regression model to clustered count data. The discrete Weibull regression model has the attractive feature that it can handle both underdispersed and overdispersed data. We analyzed the eighth Korean National Health and Nutrition Examination Survey (KNHANES VIII) from 2019 to assess the factors influencing the one-month outpatient stay in 17 different regions. We compared the results of the clustered discrete Weibull regression model with those of the Poisson, negative binomial, generalized Poisson, and Conway-Maxwell-Poisson regression models, which are widely used in count data analysis. The results show that the clustered discrete Weibull regression model with a random intercept gives the best fit. A simulation study was also conducted to investigate the performance of the clustered discrete Weibull model under various dispersion settings and zero-inflation probabilities. This paper shows that using a random effect with discrete Weibull regression can flexibly model count data with various degrees of dispersion without the risk of making wrong assumptions about the data dispersion.
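The claim that one distribution handles both under- and overdispersion can be checked numerically. The sketch below assumes the type I (Nakagawa-Osaki) discrete Weibull form P(Y = y) = q^(y^β) - q^((y+1)^β) with 0 < q < 1 and β > 0; the function names are hypothetical, and it covers only the distribution itself, not the paper's random-intercept regression model.

```python
def dw_pmf(y: int, q: float, beta: float) -> float:
    """P(Y = y) for the type I discrete Weibull: q**(y**beta) - q**((y+1)**beta)."""
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


def dw_moments(q: float, beta: float, y_max: int = 10_000):
    """Total probability, mean and variance, truncating the support at y_max."""
    probs = [dw_pmf(y, q, beta) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    second_moment = sum(y * y * p for y, p in enumerate(probs))
    return total, mean, second_moment - mean ** 2


def dispersion_index(q: float, beta: float) -> float:
    """Variance-to-mean ratio: below 1 means underdispersion, above 1 overdispersion."""
    _, mean, var = dw_moments(q, beta)
    return var / mean


print(dispersion_index(0.5, 3.0), dispersion_index(0.5, 0.5))
```

With q = 0.5, a large shape parameter (β = 3) gives a variance-to-mean ratio below 1, while a small one (β = 0.5) gives a ratio well above 1, so a single family covers both cases without a separate dispersion assumption.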

Comparison of missing data methods in clustered survival data using Bayesian adaptive B-Spline estimation

  • Yoo, Hanna;Lee, Jae Won
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.159-172 / 2018
  • In many epidemiological studies, outcome values are missing because of censoring. Censoring is what distinguishes survival analysis from other analytical methods, and many methods deal with censored data in survival analysis. However, few studies have dealt with missing covariates in survival data, and studies of missing covariates are rarer still when the data are clustered. In this paper, we conducted a simulation study to compare several missing data methods when the data were clustered and multi-structured, with missing covariates. We modeled the unknown baseline hazard and frailty with Bayesian B-splines to obtain smoother and more accurate estimates, and we used prior information to improve accuracy further. We assumed the missingness mechanism to be missing at random (MAR). We compared the performance of five different missing data techniques through simulation studies. We also present results from a multi-center study of Korean IBD patients with Crohn's disease (Lee et al., Journal of the Korean Society of Coloproctology, 28, 188-194, 2012).
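The B-spline building block behind such baseline hazard models can be illustrated with the Cox-de Boor recursion, which evaluates each basis function from lower-degree ones. This is a minimal sketch: the knot vector, the degree, and writing a log baseline hazard as a weighted sum of basis functions (`log_hazard`, a hypothetical helper) are illustrative assumptions; the paper's Bayesian adaptive knot selection, priors, and frailty terms are not reproduced.

```python
from typing import Sequence


def bspline_basis(i: int, degree: int, t: float, knots: Sequence[float]) -> float:
    """Value at t of the i-th B-spline basis function (Cox-de Boor recursion)."""
    if degree == 0:
        # Piecewise constant: 1 on the half-open knot span [knots[i], knots[i+1]).
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + degree] - knots[i]
    if left_span > 0:  # repeated knots give zero-width spans; the 0/0 term is taken as 0
        value += (t - knots[i]) / left_span * bspline_basis(i, degree - 1, t, knots)
    right_span = knots[i + degree + 1] - knots[i + 1]
    if right_span > 0:
        value += (knots[i + degree + 1] - t) / right_span * bspline_basis(i + 1, degree - 1, t, knots)
    return value


def log_hazard(t: float, coefs: Sequence[float], degree: int, knots: Sequence[float]) -> float:
    """Hypothetical spline log baseline hazard: sum_j coefs[j] * B_j(t).

    Expects len(coefs) == len(knots) - degree - 1 basis coefficients.
    """
    return sum(c * bspline_basis(j, degree, t, knots) for j, c in enumerate(coefs))


# Clamped cubic knots on [0, 3]: 10 knots - 3 - 1 = 6 basis functions.
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
print(sum(bspline_basis(j, 3, 1.5, knots) for j in range(6)))
```

Inside the knot range the basis functions sum to 1, so the spline coefficients act as local weights; exponentiating `log_hazard` keeps the hazard positive. Because the spans are half-open, the right endpoint (t = 3 here) evaluates to 0 and would need special handling in practice.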

Sample size calculations for clustered count data based on zero-inflated discrete Weibull regression models

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.31 no.1 / pp.55-64 / 2024
  • In this study, we consider the sample size determination problem for clustered count data with many zeros. Zero-inflated Poisson and binomial models are commonly used for zero-inflated data; however, in real data, the assumptions each model requires may be violated. We calculate the required sample size based on a discrete Weibull regression model, which can handle both underdispersed and overdispersed data. We use Monte Carlo simulation to compute the required sample size. With our proposed method, a single unified model with a low risk of failure can handle any dispersion type as well as data with many zeros that occur in groups or clusters sharing a common source of variation. A simulation study shows that our proposed method provides accurate results and that the sample size is affected by the skewness of the distribution, the covariance structure of the covariates, and the number of zeros. We apply our method to length-of-stay data for pancreas disorders collected in Western Australia.
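The Monte Carlo approach to sample size works by simulating many studies at a candidate number of clusters, recording how often the effect is detected, and increasing the number of clusters until the estimated power reaches a target. A minimal sketch, assuming a log(-log q) link, a normal random intercept shared within each cluster, equal cluster sizes, and, as a swapped-in simple analysis, a normal-approximation test on cluster means instead of fitting the zero-inflated discrete Weibull regression the paper uses. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def draw_zidw(rng: random.Random, eta: float, beta: float, pi_zero: float) -> int:
    """One zero-inflated discrete Weibull draw (assumed parameterisation).

    With probability pi_zero the value is a structural zero; otherwise
    P(Y >= y) = q ** (y ** beta) with log(-log q) = eta (an assumed link).
    """
    if rng.random() < pi_zero:
        return 0
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is finite
    # Inverse-CDF draw from the continuous Weibull with survival exp(-exp(eta) * x**beta);
    # its floor has exactly the discrete Weibull survival q ** (y ** beta).
    x = (-math.log(u) * math.exp(-eta)) ** (1.0 / beta)
    return math.floor(x)


def estimate_power(n_clusters: int, cluster_size: int = 10, effect: float = -0.5,
                   beta: float = 1.0, pi_zero: float = 0.3, sigma_b: float = 0.5,
                   n_sims: int = 200, z_crit: float = 1.96, seed: int = 1) -> float:
    """Monte Carlo power of a two-arm comparison with n_clusters clusters per arm."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        arms = []
        for arm_eta in (0.0, effect):  # control arm, then treatment arm
            cluster_means = []
            for _ in range(n_clusters):
                b = rng.gauss(0.0, sigma_b)  # random intercept shared within the cluster
                ys = [draw_zidw(rng, arm_eta + b, beta, pi_zero) for _ in range(cluster_size)]
                cluster_means.append(sum(ys) / cluster_size)
            arms.append(cluster_means)
        control, treated = arms
        # Stand-in analysis: normal-approximation test on cluster means (not a DW regression fit).
        se = math.sqrt(statistics.variance(control) / n_clusters
                       + statistics.variance(treated) / n_clusters)
        if se > 0 and abs(statistics.fmean(treated) - statistics.fmean(control)) / se > z_crit:
            rejections += 1
    return rejections / n_sims


def required_clusters(target: float = 0.8, start: int = 4, step: int = 2,
                      n_max: int = 200, **kwargs):
    """Smallest clusters-per-arm on the start/step grid whose estimated power reaches target."""
    for n in range(start, n_max + 1, step):
        power = estimate_power(n, **kwargs)
        if power >= target:
            return n, power
    return None, None


print(required_clusters())
```

Changing `pi_zero`, `beta`, or `sigma_b` and rerunning shows how the amount of zeros, the shape of the distribution, and the between-cluster variation move the required number of clusters; a fixed `seed` makes each power estimate reproducible.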

Assessment Analysis on Development Potential of the Clustered Settlements in the Released Green-Belt (개발제한구역 해제지역내 집단취락 개발잠재력 평가분석)

  • Choi, Im-Joo;Ahn, Jun-Hong
    • Journal of the Korean Association of Geographic Information Studies / v.11 no.4 / pp.112-121 / 2008
  • This study aimed to assess the development potential of clustered settlements released from the green belt in Gijang-gun, Busan, by computing standardized scores from pure development indexes and future development conditions and using them to set priorities. For an objective and scientific analysis based on Busan's GIS data, the study selected individual indexes in four areas: natural, physical, development, and accessibility. The results showed that large clustered settlements near the coast had high development index values and were evaluated as having high development potential, whereas smaller clustered settlements located inland, west of National Road 14, had lower individual index values and were evaluated as having lower development potential.
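Ranking settlements by standardized index scores can be sketched as: convert each index to z-scores across settlements, sum the standardized indexes per settlement, and sort. This assumes equal weights for all indexes and that every index is oriented so that higher means more potential; the study's actual weighting is not given here, and the settlement names and values below are hypothetical toy inputs, not data from the paper.

```python
import statistics


def z_scores(values):
    """Standardize one index across settlements (population standard deviation)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def rank_settlements(table):
    """table: settlement name -> tuple of index values (higher = more potential).

    Returns the names sorted by total standardized score, plus the scores.
    Equal weighting of indexes is an assumption of this sketch.
    """
    names = list(table)
    columns = list(zip(*(table[n] for n in names)))  # one tuple per index
    standardized = [z_scores(col) for col in columns]
    totals = {n: sum(col[i] for col in standardized) for i, n in enumerate(names)}
    return sorted(names, key=totals.get, reverse=True), totals


# Hypothetical (natural, physical, development, accessibility) index values.
toy = {"A": (3, 5, 8, 6), "B": (1, 2, 2, 3), "C": (2, 4, 5, 4)}
print(rank_settlements(toy)[0])
```

Standardizing first keeps an index measured on a large numeric scale from dominating the total; indexes where a high raw value lowers potential would need their sign flipped before scoring.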

Tests for homogeneity of proportions in clustered binomial data

  • Jeong, Kwang Mo
    • Communications for Statistical Applications and Methods / v.23 no.5 / pp.433-444 / 2016
  • When binary responses are observed within clusters (such as laboratory rats), they are usually correlated with one another. In clustered binomial counts, the independence assumption is violated and extra variation arises. In the presence of extra variation, ordinary statistical analyses of binomial data are inappropriate. When testing the homogeneity of proportions among several treatment groups, the classical Pearson chi-squared test fails to control Type I error rates. We focus on modifying the chi-squared statistic by incorporating variance inflation factors, and we suggest a method that adjusts the data using a dispersion estimate based on a quasi-likelihood model. We explain the testing procedure through an illustrative example and compare the performance of the modified chi-squared test with competing statistics in a Monte Carlo study.
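A simple dispersion adjustment of this kind can be sketched as: compute Pearson's homogeneity statistic from the pooled group totals, estimate a quasi-likelihood (Pearson-type) dispersion factor from cluster-level residuals, and divide the statistic by it. This is one simple variant (a single pooled dispersion factor, floored at 1), not necessarily the exact modification the paper proposes; the function names and toy counts are hypothetical.

```python
def pearson_homogeneity(groups):
    """Pearson X^2 for equal proportions; groups is a list of [(successes, trials), ...] per cluster."""
    totals = [(sum(y for y, _ in g), sum(m for _, m in g)) for g in groups]
    p_bar = sum(y for y, _ in totals) / sum(n for _, n in totals)  # pooled proportion
    return sum((y - n * p_bar) ** 2 / (n * p_bar * (1 - p_bar)) for y, n in totals)


def dispersion_estimate(groups):
    """Sum of squared cluster-level Pearson residuals divided by (clusters - groups)."""
    resid_sq, n_clusters = 0.0, 0
    for g in groups:
        p_hat = sum(y for y, _ in g) / sum(m for _, m in g)  # group-specific proportion
        for y, m in g:
            n_clusters += 1
            var = m * p_hat * (1 - p_hat)
            if var > 0:  # clusters in a group with p_hat of 0 or 1 contribute no residual
                resid_sq += (y - m * p_hat) ** 2 / var
    return resid_sq / (n_clusters - len(groups))


def adjusted_chi2(groups):
    """Return (X^2, dispersion, X^2 / dispersion); dispersion is floored at 1 (an assumption)."""
    x2 = pearson_homogeneity(groups)
    phi = max(dispersion_estimate(groups), 1.0)
    return x2, phi, x2 / phi


# Hypothetical data: two treatment groups, two litters (clusters) of 10 each.
groups = [[(2, 10), (8, 10)], [(9, 10), (9, 10)]]
print(adjusted_chi2(groups))
```

In this toy example the unadjusted statistic (about 7.62) exceeds 3.841, the 5% critical value of a chi-squared distribution with G - 1 = 1 degree of freedom, while the adjusted statistic (about 2.12) does not: the large difference between the two litters in the first group signals extra variation that the ordinary test ignores.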

Efficient striping policy of NOD data on clustered storage server (Clustered Storage Server 환경에서 뉴스 데이터에 적합한 분산 저장방법)

  • 정귀옥;박성호;김영주;정기동
    • Proceedings of the Korean Information Science Society Conference / 1998.10a / pp.89-91 / 1998
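  • The growing demand for information and the pursuit of convenience in modern society, together with advances in information and communication technology, have caused a rapid increase in multimedia data services. Because NOD (news-on-demand) data meets these demands, it is expected to attract many users, so scalability, availability, and reliability are important requirements for server implementation. Many studies have therefore tried to satisfy these requirements with storage methods that exploit the characteristics of multimedia data. However, research on NOD systems is still insufficient, and there is almost no research on news data in a clustered environment. General storage methods known to suit VOD data are not necessarily suitable for NOD data. In this paper, from among previously studied data storage methods, we identify a striping policy suited to the clustered storage server environment, taking into account characteristics of NOD data such as its small volume and skewed popularity distribution.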
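Why small files with skewed popularity matter for striping can be illustrated with expected per-node load under round-robin placement. This is a generic sketch, not the policy the paper selects: it assumes Zipf-distributed request probabilities, equal-sized files, and that each request reads every block of a file; `node_loads` and `zipf` are hypothetical helpers.

```python
def zipf(n_files: int, s: float = 1.0):
    """Request probabilities proportional to 1 / rank**s (assumed popularity model)."""
    weights = [1.0 / (rank ** s) for rank in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def node_loads(popularity, blocks_per_file: int, n_nodes: int, staggered: bool):
    """Expected blocks read from each node per request under round-robin striping.

    Block b of file i goes to node (start_i + b) % n_nodes, where
    start_i = 0 for every file (naive) or i * blocks_per_file (staggered,
    i.e. each file continues the round-robin where the previous file ended).
    """
    loads = [0.0] * n_nodes
    for i, p in enumerate(popularity):
        start = (i * blocks_per_file) % n_nodes if staggered else 0
        for b in range(blocks_per_file):
            loads[(start + b) % n_nodes] += p
    return loads


pop = zipf(8)
print(node_loads(pop, 2, 4, staggered=False))
print(node_loads(pop, 2, 4, staggered=True))
```

With 8 two-block news items on 4 nodes, starting every file on node 0 leaves two nodes idle and puts a load of 1.0 on each of the others, whereas staggering the start spreads requests so the busiest node carries roughly 0.62; the remaining imbalance comes from the popularity skew itself, which is the kind of effect a news-oriented striping policy has to account for.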

Testing Independence in Contingency Tables with Clustered Data (집락자료의 분할표에서 독립성검정)

  • 정광모;이현영
    • The Korean Journal of Applied Statistics / v.17 no.2 / pp.337-346 / 2004
  • The Pearson chi-square goodness-of-fit test and the likelihood ratio test are commonly used for testing independence in two-way contingency tables under random sampling. However, both tests may give misleading results for contingency tables with clustered observations. In this case, we consider the generalized linear mixed model, which includes random effects for clustering in addition to the fixed effects of covariates. The generalized linear mixed model can explain both the heterogeneity between clusters and the dependency within a cluster. In this paper, we introduce several types of generalized linear mixed models for testing independence in contingency tables with clustered observations, and we discuss fitting these models to a real dataset.
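As an illustration of how a random effect turns the independence test into a parameter test, one common form is a random-intercept logit model for an I × 2 table. This is an assumed example, not necessarily one of the paper's model types: subject j in cluster k has binary column response Y_{kj} and row category x_{kj} in {1, ..., I}.

```latex
\operatorname{logit} P(Y_{kj} = 1 \mid b_k) = \beta_0 + \beta_{x_{kj}} + b_k,
\qquad b_k \sim N(0, \sigma_b^2), \qquad \beta_1 = 0,
\\[4pt]
H_0:\ \beta_2 = \beta_3 = \cdots = \beta_I = 0 .
```

The shared intercept b_k makes responses within a cluster dependent, and σ_b² measures the heterogeneity between clusters; independence of rows and columns (given the cluster effect) corresponds to H_0, which can then be tested with, for example, a likelihood ratio test between the fitted models with and without the row effects.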

Genetic variation and relationship of Artemisia capillaris Thunb.(Compositae) by RAPD analysis

  • Kim, Jung-Hyun;Kim, Dong-Kap;Kim, Joo-Hwan
    • Korean Journal of Plant Resources / v.22 no.3 / pp.242-247 / 2009
  • Randomly Amplified Polymorphic DNA (RAPD) analysis was performed to define the genetic variation and relationships of Artemisia capillaris. Fifteen populations were collected according to their distribution and habitat for RAPD analysis. RAPD markers were observed mainly between 300 bp and 1600 bp. A total of 72 scorable markers from 7 primers were used to generate the genetic matrix; 69 bands were polymorphic and only 3 were monomorphic. A genetic dissimilarity matrix based on Nei's (1972) genetic distance and a UPGMA phenogram were produced from the data matrix. The populations of Artemisia capillaris clustered with high genetic affinities, and the cluster patterns were correlated with their distribution patterns. The populations formed two large groups: a southern area group and a middle area group. The closest OTUs were GW2 and GG1 in the middle area group, and GB1 from the southern area group clustered with the OTUs of the middle area group. RAPD data were useful for defining the genetic variation and relationships of A. capillaris.
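The two computational steps named above, Nei's distance and a UPGMA tree, can be sketched as follows. Nei's (1972) standard distance is D = -ln(J_XY / sqrt(J_X J_Y)), with J terms averaged over loci. The sketch assumes allele frequencies are already available per locus; turning dominant RAPD band presence/absence into frequencies needs further assumptions (such as Hardy-Weinberg equilibrium) that are not shown. The 4 × 4 distance matrix is a hypothetical toy input, not the study's data.

```python
import math


def nei_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance between populations X and Y.

    freqs_x[l] and freqs_y[l] are allele-frequency lists for locus l.
    """
    n_loci = len(freqs_x)
    j_x = sum(sum(p * p for p in locus) for locus in freqs_x) / n_loci
    j_y = sum(sum(p * p for p in locus) for locus in freqs_y) / n_loci
    j_xy = sum(sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(freqs_x, freqs_y)) / n_loci
    return -math.log(j_xy / math.sqrt(j_x * j_y))


def upgma(matrix, labels):
    """UPGMA clustering; returns a nested tuple (left, right, merge_height)."""
    trees = list(labels)
    sizes = [1] * len(labels)
    d = [row[:] for row in matrix]
    while len(trees) > 1:
        n = len(trees)
        # Closest pair of current clusters.
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda pair: d[pair[0]][pair[1]])
        merged = (trees[i], trees[j], d[i][j] / 2)  # ultrametric height = half the distance
        new_size = sizes[i] + sizes[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # Size-weighted average distance from the merged cluster to every remaining cluster.
        new_row = [(d[i][k] * sizes[i] + d[j][k] * sizes[j]) / new_size for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        trees = [trees[k] for k in keep] + [merged]
        sizes = [sizes[k] for k in keep] + [new_size]
    return trees[0]


toy = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
print(upgma(toy, ["A", "B", "C", "D"]))
```

On the toy matrix, A and B merge first at height 1.0, C joins them at 3.0, and D joins last at 5.0. Averaging distances by cluster size is what distinguishes UPGMA from single or complete linkage.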

The role of family types clustered based on the intra system dynamics elements in explaining housewives' managerial behavior. (가족체계내 역동성요소에 근거한 가족유형에 따른 주부의 가정관리행동)

  • 이연숙
    • Journal of the Korean Home Economics Association / v.34 no.4 / pp.295-308 / 1996
  • The purpose of this study was to explore how family types, clustered on the basis of intra-system dynamics, explain housewives' managerial behavior. The data were collected through a questionnaire distributed to a stratified sample of 544 housewives in Seoul who lived with their husbands and children. The questionnaire included FACES Ⅱ and Ⅲ, a Communication Scale, a Managerial Behavior Scale, and a Life Satisfaction Scale. Frequencies, percentiles, means, correlations, factor analysis, cluster analysis, one-way ANOVA with Scheffé tests, and multiple regression were used to analyze the data. The study yielded three major findings. First, families were clustered into four types: structured-separated, flexible-connected, change-oriented enmeshed, and rigid-disengaged. Second, managerial behavior differed among the four family types: housewives whose family members were more connected to each other and adapted more easily to changing situations showed better managerial behavior. Third, the managerial behavior of housewives was better explained by family type than by socio-demographic variables. Recommendations for future research and ways to foster effective managerial behavior are also suggested.

Classification of basin characteristics related to inundation using clustering (군집분석을 이용한 침수관련 유역특성 분류)

  • Lee, Han Seung;Cho, Jae Woong;Kang, Ho seon;Hwang, Jeong Geun;Moon, Hae Jin
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / p.96 / 2020
  • To establish inundation risk criteria for typhoons and heavy rainfall, research is underway to predict limit rainfall using basin characteristics, limit rainfall data, and artificial intelligence algorithms. To improve model performance in estimating limit rainfall, the training data are preprocessed before use. When 50.0% of the data were removed as outliers during preprocessing, accuracy exceeded 90%. However, this leaves a very low utilization rate of the training data, so the model cannot account for diverse basin characteristics. Accordingly, watersheds with similar characteristics were clustered, to raise the utilization rate of the training data and to predict limit rainfall that reflects diverse watershed characteristics. The clustering algorithms used were K-Means, agglomerative clustering, DBSCAN, and spectral clustering. K-Means, DBSCAN, and agglomerative clustering grouped the watersheds by impervious area ratio, whereas spectral clustering produced various groupings depending on its parameters. Applying the clustering results to the limit rainfall prediction algorithm is expected to let it account for diverse watershed characteristics while also improving prediction performance.
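How K-Means groups watersheds can be sketched with Lloyd's algorithm: assign each watershed to its nearest centroid, move each centroid to the mean of its members, and repeat until the assignments stop changing. This minimal sketch uses a deterministic farthest-point initialisation for reproducibility (an assumption; scikit-learn's `KMeans` defaults to k-means++), and the two features per watershed (an impervious area ratio plus a second attribute) are hypothetical toy values. Features on different scales should be standardized first. DBSCAN, agglomerative, and spectral clustering are not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical watersheds: (impervious area ratio, second standardized attribute).
watersheds = [(0.10, 0.20), (0.12, 0.25), (0.11, 0.22), (0.80, 0.30), (0.82, 0.28), (0.79, 0.31)]
print(kmeans(watersheds, 2)[0])
```

Here the watersheds split cleanly into a low-imperviousness group and a high-imperviousness group; when one feature has a much wider spread than the others, it dominates the distances in the same way, which is consistent with K-Means grouping mainly by impervious area ratio.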