• Title/Summary/Keyword: clustered data


High-Dimensional Image Indexing based on Adaptive Partitioning and Vector Approximation (적응 분할과 벡터 근사에 기반한 고차원 이미지 색인 기법)

  • Cha, Gwang-Ho;Jeong, Jin-Wan
    • Journal of KIISE:Databases / v.29 no.2 / pp.128-137 / 2002
  • In this paper, we propose the LPC+-file for efficient indexing of high-dimensional image data. With the proliferation of multimedia data, there is an increasing need to support the indexing and retrieval of high-dimensional image data. Recently, the LPC-file (5), which is based on vector approximation, has been developed for indexing high-dimensional data. The LPC-file performs well when the dataset is uniformly distributed; however, its performance degrades when the dataset is clustered. We improve the performance of the LPC-file for strongly clustered image datasets. The basic idea is to adaptively partition the data space to find subspaces with high-density clusters and to assign more bits to them than to other subspaces, which increases the discriminatory power of the vector approximations. The total number of bits used to represent the vector approximations is somewhat smaller than in the LPC-file, since the partitioned cells in the LPC+-file share bits. An empirical evaluation shows that the LPC+-file yields significant performance improvements on real image datasets that are strongly clustered.
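
The exact cell-splitting and bit-sharing rules of the LPC+-file are not given in this summary, so the following is only a hypothetical sketch of the general idea of density-adaptive vector approximation. A fixed coarse grid stands in for the adaptive partition, and cells holding more points than the average occupied cell receive extra bits per dimension. The function names, the [0, 1] coordinate range, and the "above-average count" rule are all assumptions, not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def cell_of(x: Sequence[float], grid: int) -> Cell:
    """Index of the coarse grid cell containing x (assumption: coordinates lie in [0, 1])."""
    return tuple(min(int(v * grid), grid - 1) for v in x)


def allocate_bits(points: List[Sequence[float]], grid: int,
                  base_bits: int, extra_bits: int) -> Dict[Cell, int]:
    """Hypothetical rule: cells denser than the average occupied cell get extra bits per dimension."""
    counts = Counter(cell_of(p, grid) for p in points)
    mean = len(points) / len(counts)
    return {c: base_bits + (extra_bits if n > mean else 0) for c, n in counts.items()}


def approximate(x: Sequence[float], grid: int, bits: Dict[Cell, int],
                default_bits: int) -> Tuple[Cell, Tuple[int, ...]]:
    """Approximate x by its cell index plus a per-dimension code relative to that cell."""
    c = cell_of(x, grid)
    levels = 2 ** bits.get(c, default_bits)  # finer quantization in dense cells
    code = []
    for v, ci in zip(x, c):
        rel = (v - ci / grid) * grid  # position inside the cell, rescaled to [0, 1]
        code.append(min(int(rel * levels), levels - 1))
    return c, tuple(code)
```

With eight points packed near (0.1, 0.1) and two scattered points, the dense cell receives 4 bits per dimension while the sparse cells keep 2, so nearby vectors in the cluster remain distinguishable in their approximations.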

An Effective Clustering Procedure for Quantitative Data and Its Application for the Grouping of the Reusable Nuclear Fuel (정량적 자료에 대한 효과적인 군집화 과정 및 사용 후 핵연료의 분류에의 적용)

  • Jing, Jin-Xi;Yoon, Bok-Sik;Lee, Yong-Joo
    • IE interfaces / v.15 no.2 / pp.182-188 / 2002
  • Clustering is widely used in various fields to investigate the structural characteristics of given data. One of the main tasks of clustering is to partition a set of objects into homogeneous groups for the purpose of data reduction. In this paper, a simple but computationally efficient clustering procedure is devised, and some statistical techniques for validating its results are discussed. In this procedure, the proper number of clusters and the cluster memberships can be determined simultaneously. The whole procedure is applied to a practical clustering problem: the classification of reusable fuels in nuclear power plants.
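
The abstract does not describe the procedure itself. As a stand-in that shows how the number of clusters and the groups can emerge together, here is a sketch of leader clustering, a generic single-pass method controlled by a distance threshold. It is not the authors' procedure; `leader_clustering` and `threshold` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def leader_clustering(points: List[Sequence[float]],
                      threshold: float) -> Tuple[List[Sequence[float]], List[int]]:
    """Single-pass leader clustering: each point joins the first leader within
    `threshold` (Euclidean distance) or else becomes the leader of a new cluster."""
    leaders: List[Sequence[float]] = []
    labels: List[int] = []
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                labels.append(i)
                break
        else:
            # no existing leader is close enough, so p starts a new cluster
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return leaders, labels
```

The number of clusters is whatever the threshold produces, which is why a separate validation step, such as the statistical checks the abstract mentions, is still needed.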

A step-by-step guide to Generalized Estimating Equations using SPSS in dental research (치의학 분야에서 SPSS를 이용한 일반화 추정방정식의 단계별 안내)

  • Lim, Hoi-Jeong;Park, Su-Hyeon
    • The Journal of the Korean dental association / v.54 no.11 / pp.850-864 / 2016
  • The Generalized Estimating Equations (GEE) approach is a widely used statistical method for analyzing longitudinal and clustered data in clinical studies. In dentistry, multiple outcomes are often obtained from one patient, so outcomes from the same patient are correlated with one another. This study focused on the basic ideas of GEE and introduced the types of covariance matrices and working correlation matrices. The quasi-likelihood information criterion (QIC) and its approximation ($QIC_u$) were used to select the best working correlation matrix and the best-fitting model for the correlated outcomes. The purpose of this study is to present the GEE analysis process in detail using SPSS software with an orthodontic miniscrew example, and to help readers understand how to use GEE analysis in dental research.
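
The paper carries out the analysis in SPSS. The Python sketch below only illustrates what three common working correlation structures (independence, exchangeable, AR(1)) look like for the repeated outcomes of one patient, together with the formula $QIC_u = -2Q + 2p$, where Q is the quasi-likelihood evaluated under the independence model and p is the number of regression parameters. The function names are hypothetical, and fitting the GEE itself is not shown.

```python
from typing import List

Matrix = List[List[float]]


def working_correlation(structure: str, n: int, alpha: float = 0.0) -> Matrix:
    """Working correlation matrix for n repeated outcomes from one cluster (e.g. one patient)."""
    if structure == "independence":
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if structure == "exchangeable":
        # every pair of outcomes in the cluster shares the same correlation alpha
        return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]
    if structure == "ar1":
        # correlation decays with the distance between measurement occasions
        return [[alpha ** abs(i - j) for j in range(n)] for i in range(n)]
    raise ValueError(f"unknown structure: {structure}")


def qic_u(quasi_log_likelihood: float, n_params: int) -> float:
    """QICu = -2Q + 2p; smaller values indicate a better-fitting mean model."""
    return -2.0 * quasi_log_likelihood + 2.0 * n_params
```

For example, three miniscrews placed in the same patient would share one 3 x 3 working correlation matrix. Under the exchangeable structure, a single parameter alpha describes the correlation of every pair.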

Variable Selection in Linear Random Effects Models for Normal Data

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.27 no.4 / pp.407-420 / 1998
  • This paper is concerned with selecting the covariates to include when building linear random effects models for clustered normal response data. It takes a Bayesian approach, proposing and developing a procedure that uses probabilistic considerations to select promising subsets of covariates. The approach reformulates the linear random effects model as a hierarchical normal and point mass mixture model by introducing a set of latent variables that are used to identify subset choices. The hierarchical model is flexible enough to accommodate sign constraints on the regression coefficients. Using the Gibbs sampler, the posterior probability of each subset of covariates is obtained. Thus, in this procedure, the most promising subset of covariates can be identified as the one with the highest posterior probability. The procedure is illustrated through a simulation study.
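
A full Gibbs sampler for the hierarchical model is beyond a short sketch. The code below shows only the final step described above. Each Gibbs iteration yields a 0/1 vector of latent indicators marking which covariates are in the model, and the relative frequency of each vector across iterations estimates its posterior probability. The draws in the usage note are made-up placeholders, and `most_promising_subset` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple

Indicators = Tuple[int, ...]


def most_promising_subset(gamma_draws: Iterable[Indicators]) -> Tuple[Indicators, float]:
    """Estimate each subset's posterior probability by its frequency among the Gibbs draws
    of the latent inclusion indicators, and return the most frequent subset."""
    counts = Counter(tuple(g) for g in gamma_draws)
    if not counts:
        raise ValueError("no Gibbs draws supplied")
    total = sum(counts.values())
    subset, n = counts.most_common(1)[0]
    return subset, n / total
```

With ten hypothetical draws in which (1, 0, 1) appears six times, the function returns that subset (covariates 1 and 3) with an estimated posterior probability of 0.6.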

Analyzing Clustered and Interval-Censored Data based on the Semiparametric Frailty Model

  • Kim, Jin-Heum;Kim, Youn-Nam
    • The Korean Journal of Applied Statistics / v.25 no.5 / pp.707-718 / 2012
  • We propose a semiparametric model to analyze clustered and interval-censored data, and we incorporate a gamma frailty into the model to measure the association among members of the same cluster. We propose an estimation procedure based on the EM algorithm. Simulation results showed that our estimation procedure appears to produce unbiased estimates. The estimated standard error was smaller than expected and gave conservative results for the estimated coverage rate; however, this trend gradually disappeared as the number of members in the same cluster increased. In addition, the proposed method is illustrated with data from diabetic retinopathy studies that evaluated the effectiveness of laser photocoagulation in delaying or preventing the onset of blindness in individuals with diabetic retinopathy.
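
The following simulation sketch shows what clustered, interval-censored data with a shared gamma frailty look like. It does not show the EM estimation. A frailty with mean 1 and variance θ multiplies the hazard of every member of a cluster, and each event time is observed only as the interval between two inspections. The exponential baseline, the equally spaced inspection times, and all function names are assumptions, since the paper's model is semiparametric. The implied within-cluster Kendall's tau of θ/(θ+2) is a standard property of gamma frailty models.

```python
import math
import random
from typing import List, Tuple


def simulate_clustered_interval_censored(n_clusters: int, cluster_size: int, theta: float,
                                         base_rate: float, visit_gap: float, n_visits: int,
                                         seed: int = 0) -> List[Tuple[int, float, float]]:
    """Return (cluster id, left, right) records; each event time is only known to lie in (left, right]."""
    rng = random.Random(seed)
    visits = [visit_gap * (j + 1) for j in range(n_visits)]
    records = []
    for k in range(n_clusters):
        # shared gamma frailty with mean 1 and variance theta, common to all members of cluster k
        w = rng.gammavariate(1.0 / theta, theta)
        for _ in range(cluster_size):
            t = rng.expovariate(w * base_rate)  # exponential event time given the frailty
            left, right = 0.0, math.inf  # right = inf: still event-free at the last visit
            for v in visits:
                if t <= v:
                    right = v
                    break
                left = v
            records.append((k, left, right))
    return records


def kendall_tau_gamma_frailty(theta: float) -> float:
    """Within-cluster Kendall's tau implied by a gamma frailty with variance theta."""
    return theta / (theta + 2.0)
```

Larger θ means stronger dependence within a cluster. At θ = 2 the implied Kendall's tau is 0.5.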

Identifying Spatial Distribution Pattern of Water Quality in Masan Bay Using Spatial Autocorrelation Index and Pearson's r (공간자기상관 지수와 Pearson 상관계수를 이용한 마산만 수질의 공간분포 패턴 규명)

  • Choi, Hyun-Woo;Park, Jae-Moon;Kim, Hyun-Wook;Kim, Young-Ok
    • Ocean and Polar Research / v.29 no.4 / pp.391-400 / 2007
  • To identify the spatial distribution pattern of water quality in Masan Bay, Pearson's correlation, a common statistical method, and Moran's I, a spatial autocorrelation statistic, were applied to hydrological data collected seasonally from Masan Bay over two years (2004-2005). Among the hydrological parameters, the spatial distributions of salinity, DO (dissolved oxygen) and silicate were strongly clustered, while the distribution of chlorophyll a was only weakly clustered. When the similarity matrix of Moran's I was compared with the correlation matrix of Pearson's r, only the pairs temperature vs. salinity, temperature vs. silicate, and silicate vs. total inorganic nitrogen showed both significant correlation and similar spatial clustering patterns. Considering the Pearson correlation and spatial autocorrelation results, the water quality distribution patterns of Masan Bay were conceptually simplified into four types. Based on these simplified types, Moran's I and Pearson's r were compared with the spatial distribution maps of salinity and silicate, which showed strongly clustered patterns, and of chlorophyll a, which showed no clustered pattern. According to these test results, the spatial distribution of water quality in Masan Bay can be summarized as four patterns. This summary should be developed into a spatial index linked with pollutant and ecological indicators for coastal health assessment.
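
Global Moran's I is defined as $I = \frac{n}{W}\,\frac{\sum_i \sum_j w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}$, where $W$ is the sum of all spatial weights. Positive values indicate that similar values cluster together, negative values indicate alternating high and low values, and the expectation under no spatial autocorrelation is $-1/(n-1)$. The sketch below computes it directly. The binary adjacency weights for four stations along a transect are a hypothetical example, not the study's weighting scheme.

```python
from typing import List, Sequence


def morans_i(values: Sequence[float], weights: List[List[float]]) -> float:
    """Global Moran's I for observations `values` with spatial weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # weighted cross-products of deviations between neighbouring stations
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)


# four stations in a row; each is a neighbour of the next (hypothetical weights)
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
```

On this chain, the values [1, 1, 5, 5] (low values next to low, high next to high) give I = 1/3, while the alternating values [1, 5, 1, 5] give I = -1.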

Peak Power Minimization for Clustered VLIW Architectures (분산된 VLIW 구조에서의 최대 전력 최소화 방법)

  • 서재원;김태환;정기석
    • Journal of KIISE:Computer Systems and Theory / v.30 no.5_6 / pp.258-264 / 2003
  • The VLIW architecture has emerged as one of the most effective architectures for multimedia applications. Multimedia applications offer ample potential for executing multiple operations in parallel, because they are typically data-intensive with limited data and/or control dependencies. As the degree of instruction-level parallelism increases, non-clustered VLIW architectures scale poorly because of the heavy register-port pressure. Therefore, clustered VLIW architectures are preferred over non-clustered ones when a high degree of parallelism is available, as in multimedia processing. However, an architecture with multiple clusters requires a large amount of hardware, so power consumption becomes a crucial issue. In this paper, we propose an algorithm that minimizes peak power consumption with little or no delay penalty. The effectiveness of the algorithm has been verified by various sets of experiments, which show up to a 30.7% reduction in peak power consumption compared with results optimized only to minimize resources.
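
The abstract does not detail the authors' algorithm. This hypothetical sketch illustrates the underlying trade-off with a generic greedy list scheduler that caps the summed power of the operations issued in each cycle: a tighter cap lowers peak power but may lengthen the schedule. Per-operation power values, single-cycle latency, and the omission of clusters and inter-cluster communication are all simplifying assumptions.

```python
from typing import Dict, List, Set


def power_capped_schedule(ops: List[str], deps: Dict[str, Set[str]], power: Dict[str, float],
                          width: int, power_cap: float) -> List[List[str]]:
    """Greedy list scheduling: at most `width` operations per cycle, and the summed power
    of one cycle never exceeds `power_cap` (single-cycle latency assumed)."""
    if any(power[op] > power_cap for op in ops):
        raise ValueError("an operation exceeds the power cap on its own")
    issued_at: Dict[str, int] = {}
    remaining = list(ops)  # list order doubles as scheduling priority
    cycles: List[List[str]] = []
    while remaining:
        cycle = len(cycles)
        chosen: List[str] = []
        used = 0.0
        for op in remaining:
            if len(chosen) == width:
                break
            # ready only if every predecessor was issued in an earlier cycle
            ready = all(issued_at.get(p, cycle) < cycle for p in deps.get(op, set()))
            if ready and used + power[op] <= power_cap:
                chosen.append(op)
                used += power[op]
        if not chosen:
            raise ValueError("dependency cycle or unknown predecessor")
        for op in chosen:
            issued_at[op] = cycle
        remaining = [op for op in remaining if op not in issued_at]
        cycles.append(chosen)
    return cycles


def peak_power(cycles: List[List[str]], power: Dict[str, float]) -> float:
    """Largest per-cycle power sum of a schedule."""
    return max(sum(power[op] for op in c) for c in cycles)
```

For four independent operations with power 3, 3, 3 and 1 on a 4-wide machine, a cap of 10 issues everything in one cycle, giving a peak of 10. A cap of 6 spreads them over two cycles, giving a peak of 6. This is a small instance of trading delay for lower peak power.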

Cure Rate Model with Clustered Interval Censored Data (군집화된 구간 중도절단자료에 대한 치유율 모형의 적용)

  • Kim, Yang-Jin
    • The Korean Journal of Applied Statistics / v.27 no.1 / pp.21-30 / 2014
  • Ordinary survival analysis cannot be applied when a significant fraction of patients may be cured. A cure rate model combines a cure fraction with a survival model and can be applied to several types of cancer. In this article, the cure rate model is considered for interval-censored data with a cluster effect. A shared frailty model is introduced to characterize the cluster effect, and an EM algorithm is used to estimate the parameters. A simulation study is conducted to evaluate the performance of the estimates. The proposed approach is applied to a smoking cessation study in which the event of interest is smoking relapse. Several covariates (including intensive care) are found to be associated with both the occurrence of relapse and the duration of smoking abstinence.
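
One common way to combine a cure fraction with a survival model is the mixture cure model, $S_{pop}(t) = \pi + (1-\pi)\,S_u(t)$, where $\pi$ is the probability of being cured and $S_u$ is the survival function of uncured subjects. The summary does not say which formulation the paper uses. The sketch below assumes a logistic link for $\pi$ and an exponential $S_u$, and it omits the shared frailty and interval censoring. Function names are hypothetical.

```python
import math


def cure_fraction(x: float, b0: float, b1: float) -> float:
    """Probability of being cured given covariate x, via a logistic link (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def population_survival(t: float, pi: float, rate: float) -> float:
    """Mixture cure model S_pop(t) = pi + (1 - pi) * S_u(t), with exponential S_u(t) = exp(-rate * t)."""
    return pi + (1.0 - pi) * math.exp(-rate * t)
```

The population survival curve starts at 1 and levels off at the cure fraction π instead of falling to 0. This plateau is the feature that ordinary survival models cannot capture.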

A GEE approach for the semiparametric accelerated lifetime model with multivariate interval-censored data

  • Maru Kim;Sangbum Choi
    • Communications for Statistical Applications and Methods / v.30 no.4 / pp.389-402 / 2023
  • Multivariate or clustered failure time data often occur in medical, epidemiological, and socio-economic studies when survival data are collected from several research centers. If the data are observed periodically, as in a longitudinal study, survival times are often subject to various types of interval-censoring, creating multivariate interval-censored data. In such data, the event times of interest may be correlated among individuals from the same cluster. In this article, we propose a unified linear regression method for analyzing multivariate interval-censored data. We consider a semiparametric multivariate accelerated failure time model as a statistical analysis tool and develop a generalized Buckley-James method to make inferences by imputing interval-censored observations with their conditional mean values. Since the study population consists of several heterogeneous clusters, where subjects in the same cluster may be related, we propose a generalized estimating equations approach to accommodate potential dependence within clusters. Our simulation results confirm that the proposed estimator is robust to misspecification of the working covariance matrix and that statistical efficiency can increase when the working covariance structure is close to the truth. The proposed method is applied to a dataset from a diabetic retinopathy study.
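
The imputation step named above replaces each interval-censored log time by $x^\top\beta + E[\varepsilon \mid L - x^\top\beta < \varepsilon \le R - x^\top\beta]$. The sketch below shows only that step. It assumes the error distribution has already been estimated as a discrete distribution given by support points and masses (in practice this estimate is updated iteratively), and it omits the GEE weighting across clusters. Function names are hypothetical.

```python
from typing import Sequence


def conditional_mean(support: Sequence[float], mass: Sequence[float],
                     left: float, right: float) -> float:
    """E[e | left < e <= right] for a discrete error distribution (support points with masses)."""
    inside = [(s, m) for s, m in zip(support, mass) if left < s <= right]
    total = sum(m for _, m in inside)
    if total == 0:
        raise ValueError("the interval carries no probability mass")
    return sum(s * m for s, m in inside) / total


def impute_log_time(log_left: float, log_right: float, linear_pred: float,
                    support: Sequence[float], mass: Sequence[float]) -> float:
    """Buckley-James-style imputation on the log-time scale: x'beta + E[e | residual interval]."""
    return linear_pred + conditional_mean(support, mass,
                                          log_left - linear_pred, log_right - linear_pred)
```

A right-censored observation is the special case `log_right = float("inf")`. For example, with support [1, 2, 3, 4], masses [0.1, 0.2, 0.3, 0.4] and residual interval (1, 3], the conditional mean is 1.3 / 0.5 = 2.6.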

A new Clustering Algorithm for GPS Trajectories with Maximum Overlap Interval (최대 중첩구간을 이용한 새로운 GPS 궤적 클러스터링)

  • Kim, Taeyong;Park, Bokuk;Park, Jinkwan;Cho, Hwan-Gue
    • KIISE Transactions on Computing Practices / v.22 no.9 / pp.419-425 / 2016
  • In navigation systems, keeping map data up to date is an important task. Manual updates are costly, and they make it difficult to reflect changes immediately. In this paper, we present a method for trajectory-center extraction, which is essential for automatic road map generation from GPS data. Although extracting the center road requires clustered trajectories, real trajectories are not clustered. To address this problem, this paper proposes a new method using the maximum overlapping interval and trajectory clustering. Finally, we apply the Virtual Running method to extract the center road from the clustered trajectories. We conducted experiments on massive real taxi GPS datasets collected throughout Gang-Nam-Gu, Sung-Nam city, and all parts of Seoul. Experimental results showed that our method is stable and efficient for extracting the center trajectory of real roads.
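
The abstract does not say how overlap intervals are derived from GPS trajectories. As a hypothetical illustration, suppose each trajectory is reduced to the interval it covers along a reference road axis (an assumption). The stretch shared by the most trajectories can then be found with a classic sweep line over interval endpoints. `max_overlap_interval` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def max_overlap_interval(intervals: List[Tuple[float, float]]) -> Tuple[int, Optional[float], Optional[float]]:
    """Sweep line over closed intervals: return (count, start, end) of a stretch covered by the most intervals."""
    events = []
    for s, e in intervals:
        if s > e:
            raise ValueError("interval start must not exceed its end")
        events.append((s, 0))  # 0 sorts before 1, so intervals that merely touch count as overlapping
        events.append((e, 1))
    events.sort()
    best: Tuple[int, Optional[float], Optional[float]] = (0, None, None)
    active = 0
    for i, (x, kind) in enumerate(events):
        if kind == 0:
            active += 1
            if active > best[0]:
                # coverage stays at `active` until the next event position
                best = (active, x, events[i + 1][0])
        else:
            active -= 1
    return best
```

For the intervals (0, 10), (2, 6), (4, 8) and (9, 12), the sweep reports that three intervals overlap on [4, 6]. The sort dominates the cost, so the search runs in O(n log n) for n trajectories.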