Enhanced Locality Sensitive Clustering in High Dimensional Space

  • Chen, Gang;Gao, Hao-Lin;Li, Bi-Cheng;Hu, Guo-En
    • Transactions on Electrical and Electronic Materials, v.15 no.3, pp.125-129, 2014
  • A dataset can be clustered by merging the bucket indices that come from the random projection of locality sensitive hashing functions; for this to work, the merging interval must be calculated first. To improve the feasibility of large-scale data clustering in high dimensional space, we propose an enhanced Locality Sensitive Hashing Clustering Method. First, multiple hashing functions are generated. Second, data points are projected to bucket indices. Third, the bucket indices are clustered to obtain class labels. Experimental results on synthetic datasets show that this method achieves high accuracy at a much higher clustering speed, which makes it well suited to clustering data in high dimensional space.
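The three steps of the LSH clustering method can be sketched with the standard p-stable (Gaussian) hash family h(v) = floor((a · v + b) / w). This is a minimal sketch under stated assumptions: the bucket width `w`, the number of functions `k`, and the rule "identical bucket signatures share a label" are illustrative choices, and the function names are hypothetical; the paper's computed merging interval is not reproduced.

```python
import math
import random


def make_hash_functions(dim, k, w, rng):
    """Step 1: draw k p-stable LSH functions h(v) = floor((a . v + b) / w)."""
    funcs = []
    for _ in range(k):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian projection vector
        b = rng.uniform(0.0, w)                          # random offset between 0 and w
        funcs.append((a, b))
    return funcs


def bucket_signature(point, funcs, w):
    """Step 2: project one point to its tuple of k bucket indices."""
    return tuple(
        math.floor((sum(ai * xi for ai, xi in zip(a, point)) + b) / w)
        for a, b in funcs
    )


def lsh_cluster(points, k=4, w=4.0, seed=0):
    """Step 3 (simplified): points with identical signatures share a class label."""
    rng = random.Random(seed)
    funcs = make_hash_functions(len(points[0]), k, w, rng)
    label_of_signature = {}
    labels = []
    for p in points:
        sig = bucket_signature(p, funcs, w)
        # A new signature gets the next unused label; a seen one reuses its label.
        labels.append(label_of_signature.setdefault(sig, len(label_of_signature)))
    return labels
```

A larger `w` puts more points into each bucket, while a larger `k` splits them more finely; the paper additionally merges bucket indices within a computed merging interval, which this sketch leaves out.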

Robustness, Data Analysis, and Statistical Modeling: The First 50 Years and Beyond

  • Barrios, Erniel B.
    • Communications for Statistical Applications and Methods, v.22 no.6, pp.543-556, 2015
  • We present a survey of contributions that have defined the nature and extent of robust statistics over the last 50 years. Starting from the pioneering work of Tukey, Huber, and Hampel on robust estimation of location parameters, we present various generalizations of these estimation procedures that cover a wide variety of models and data analysis methods. These extensions include linear models, clustered and dependent observations, time series data, binary and discrete data, models for spatial data, nonparametric methods, and forward search methods for outliers. We also describe current research interests in robust statistics and conclude with suggestions on possible future directions of this area of statistical science.
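As a concrete instance of the robust location estimation the survey starts from, Huber's M-estimator of location can be computed by iteratively reweighted averaging. This is a minimal sketch: the tuning constant c = 1.345 and the use of the rescaled median absolute deviation as a fixed scale estimate are common conventions assumed here, not details taken from the survey.

```python
import statistics


def huber_location(xs, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    xs = list(xs)
    mu = statistics.median(xs)
    # Robust scale: MAD rescaled by 1.4826 to be consistent under normality.
    scale = 1.4826 * statistics.median([abs(x - mu) for x in xs])
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Weight 1 for standardized residuals within c, downweight c/|r| beyond it.
        weights = []
        for x in xs:
            r = abs(x - mu) / scale
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

For the sample [1, 2, 3, 4, 100] the estimate stays at 3, whereas the sample mean is pulled to 22 by the single outlier.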

A Study on Measuring the Similarity Among Sampling Sites in Lake Yongdam with Water Quality Data Using Multivariate Techniques (다변량기법을 활용한 용담호 수질측정지점 유사성 연구)

  • Lee, Yosang;Kwon, Sehyug
    • Journal of Environmental Impact Assessment, v.18 no.6, pp.401-409, 2009
  • Multivariate statistical approaches are discussed for designing an optimal water quality monitoring network: sampling sites are classified by measuring their similarity in water quality data, and the characteristics of the resulting clusters are examined. For the empirical study, two years of data (2005 and 2006) were collected in Yongdam Reservoir at 9 sampling sites, covering 2 depth levels and 7 key water quality variables. Similarity among sites is measured by the Euclidean distance between their water quality variables, and the sites are grouped by hierarchical clustering. The resulting clusters are interpreted using principal component variables, with respect to the sites' geographical characteristics and the possibility of reducing the number of monitoring sites. The nine sites are clustered as follows. Sites 5, 6, and 7 form one cluster characterized by shallow water depth and location on the main stream. Sites 2 and 4 fall into the same group because of hydraulic characteristics inherited from the main stream, although their patterns of change in water quality appear to differ because site 2 is near the dam. Sites 3, 8, and 9 each stand alone, because each is influenced by a different tributary.
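The site-grouping procedure (Euclidean distances between sites' water quality profiles, followed by hierarchical clustering) can be sketched as plain agglomerative clustering. This is a minimal sketch: the abstract does not state the linkage rule or whether variables were standardized, so average linkage, a fixed target number of clusters, and the function names are assumptions for illustration.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two sites' variable vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(cluster_a, cluster_b, data):
    """Mean pairwise Euclidean distance between two clusters of site indices."""
    total = sum(euclidean(data[i], data[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(data, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], data),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because combinations yields a < b
    return sorted(sorted(c) for c in clusters)
```

With real monitoring data, each row of `data` would hold one site's values of the water quality variables; because those variables are on different scales, standardizing them first is usually advisable.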

A Study on a Robust Clustered Group Multicast in Ad-hoc Networks (에드-혹 네트워크에서 신뢰성 있는 클러스터 기반 그룹 멀티캐스트 방식에 관한 연구)

  • Park, Yang-Jae;Lee, Jeong-Hyun
    • The KIPS Transactions:PartC, v.10C no.2, pp.163-170, 2003
  • In this paper, we propose a robust clustered group multicast scheme for ad hoc networks. The proposed scheme is based on a weighted clustering algorithm. An ad hoc network is a collection of wireless mobile hosts that form a temporary network without the aid of any centralized administration or support infrastructure such as a wired network or base stations. Because of limited bandwidth and high mobility, routing protocols for ad hoc networks must be robust and simple and must minimize energy consumption. The WCGM method builds its base structure on a combined weight value, uses that value to select cluster heads, and keeps data transmission by scoped flooding, which is the advantage of the existing FGMP method. Because the method provides safe and reliable data transmission, it reduces both the overhead of maintaining the transmission structure and the overhead of data transmission.
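Selecting cluster heads by a combined weight value can be illustrated in the style of the Weighted Clustering Algorithm (WCA), where each node's weight is a weighted sum of several metrics and the node with the smallest weight among the still-uncovered nodes becomes a cluster head. This is a hypothetical sketch: the metric fields (`degree_diff`, `distance_sum`, `mobility`, `battery_used`), the coefficients in `WEIGHTS`, and the greedy election loop are assumptions, not the WCGM paper's exact definition.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    degree_diff: float   # |degree - ideal degree| (hypothetical metric)
    distance_sum: float  # sum of distances to neighbors (hypothetical metric)
    mobility: float      # average speed (hypothetical metric)
    battery_used: float  # cumulative time spent as cluster head (hypothetical metric)


# Hypothetical WCA-style coefficients summing to 1.
WEIGHTS = (0.7, 0.2, 0.05, 0.05)


def combined_weight(n: Node) -> float:
    """Weighted sum of the node's metrics; smaller is a better cluster head."""
    w1, w2, w3, w4 = WEIGHTS
    return w1 * n.degree_diff + w2 * n.distance_sum + w3 * n.mobility + w4 * n.battery_used


def elect_cluster_heads(nodes, neighbors):
    """Greedy election: lowest combined weight wins; it and its neighbors are covered."""
    by_name = {n.name: n for n in nodes}
    uncovered = set(by_name)
    heads = []
    while uncovered:
        # Ties are broken by node name so the result is deterministic.
        head = min(uncovered, key=lambda name: (combined_weight(by_name[name]), name))
        heads.append(head)
        uncovered -= {head} | set(neighbors.get(head, ()))
    return heads
```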

An Evaluation of Data Delivery Mechanisms in Clustered Sensor Networks (클러스터 기반 센서 망에서 데이터 전달 방법들의 성능 분석)

    • Park, Tae-Keun
    • The Journal of Korean Institute of Communications and Information Sciences, v.31 no.3A, pp.304-310, 2006
  • This paper evaluates the performance of three types of data delivery mechanisms in clustered sensor networks, as groundwork for developing an energy-efficient topology management scheme. In the first mechanism, one node per cluster (the clusterhead) turns on its radio (wakes up) to transmit and receive RTS/CTS/DATA/ACK messages; in the second, k nodes per cluster wake up and participate in the message exchange. In the third, clusterheads turn on their radios to exchange RTS/CTS messages, and if a clusterhead receives an RTS whose destination is its cluster m, it makes k nodes in the cluster turn on their radios to receive DATA and transmit ACK. Through simulation, we show the energy consumption of the three mechanisms as a function of the number of active nodes per cluster, the offered load, and the packet loss probability.
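A back-of-the-envelope count shows how the three mechanisms differ in the number of awake nodes involved: it tallies, for one delivery into a destination cluster, how many node-message events each mechanism needs across the four message types. This is a hypothetical toy model (the function name and the cost of one event per awake node per message are assumptions); it treats the clusterhead as separate from the k nodes, ignores losses and idle listening, and does not reproduce the paper's simulation.

```python
def radio_events(mechanism: int, k: int) -> int:
    """Awake-node message events in the destination cluster for one delivery.

    Toy model: every awake node spends one event per message it sends or receives.
    """
    if mechanism == 1:  # the clusterhead alone handles RTS, CTS, DATA, ACK
        return 4 * 1
    if mechanism == 2:  # k awake nodes all take part in all four messages
        return 4 * k
    if mechanism == 3:  # clusterhead handles RTS/CTS, then k nodes handle DATA/ACK
        return 2 * 1 + 2 * k
    raise ValueError("mechanism must be 1, 2, or 3")


for k in (1, 2, 4):
    print(k, [radio_events(m, k) for m in (1, 2, 3)])
```

Under this toy count, k = 4 gives 4, 16, and 10 events for the three mechanisms, so the third mechanism falls between the other two.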

Confidence Interval for the Difference or Ratio of Two Median Failure Times from Clustered Survival Data

  • Lee, Seung-Yeoun;Jung, Sin-Ho
    • The Korean Journal of Applied Statistics, v.22 no.2, pp.355-364, 2009
  • A simple method is proposed for constructing nonparametric confidence intervals for the difference or ratio of two median failure times. The method applies to censored clustered survival data collected under either (I) cluster randomization or (II) subunit randomization. It is simple to compute and is based on nonparametric density estimation. The proposed method is illustrated with the otology study data and the HL-A antigen study data, and simulation results are reported for practical sample sizes.
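The core ingredients, a Kaplan-Meier estimate of the median failure time under censoring and an interval that respects clustering, can be illustrated as follows. Note that this swaps in a different technique: the paper's interval is based on nonparametric density estimation, whereas this sketch resamples whole clusters with replacement (keeping within-cluster correlation intact) and takes percentile limits. The data layout (each cluster is a list of `(time, event)` pairs with event 1 for failure and 0 for censoring), the function names, the number of resamples, and the seed are assumptions.

```python
import random


def km_median(pairs):
    """Kaplan-Meier median: smallest time at which survival drops to 0.5 or below."""
    pairs = sorted(pairs)
    at_risk = len(pairs)
    surv = 1.0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Group all observations tied at time t (failures and censorings).
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            if surv <= 0.5:
                return t
        at_risk -= removed
    return None  # median not reached because of censoring


def flatten(clusters):
    return [pair for cluster in clusters for pair in cluster]


def median_diff_ci(group_a, group_b, n_boot=2000, alpha=0.05, seed=1):
    """Percentile CI for median(A) - median(B), resampling clusters with replacement."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a = km_median(flatten(rng.choices(group_a, k=len(group_a))))
        b = km_median(flatten(rng.choices(group_b, k=len(group_b))))
        if a is not None and b is not None:
            diffs.append(a - b)
    if not diffs:
        raise ValueError("median not estimable in any resample")
    diffs.sort()
    lo = diffs[int(alpha / 2 * len(diffs))]
    hi = diffs[int((1 - alpha / 2) * len(diffs)) - 1]
    return lo, hi
```

A ratio interval could be obtained the same way by collecting `a / b` instead of `a - b`.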

A Clustered Dwarf Structure to Speed up Queries on Data Cubes

  • Bao, Yubin;Leng, Fangling;Wang, Daling;Yu, Ge
    • Journal of Computing Science and Engineering, v.1 no.2, pp.195-210, 2007
  • Dwarf is a highly compressed structure that compresses the cube by eliminating semantic redundancies while the data cube is computed. Despite its high compression ratio, Dwarf is slower to query and harder to update because of its structural characteristics. Since the original purpose of a data cube is to speed up queries, we propose two novel clustering methods for query optimization: the recursion clustering method, which clusters nodes in a recursive manner to speed up point queries, and the hierarchical clustering method, which clusters nodes of the same dimension to speed up range queries. To facilitate the implementation, we design a partition strategy and a logical clustering mechanism. Experimental results show that our methods can effectively improve query performance on data cubes, and that the recursion clustering method is suitable for both point queries and range queries.
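The difference between the two clustering methods can be pictured as a choice of physical node order. Treating the Dwarf as a tree whose levels correspond to dimensions, a recursive (depth-first) order stores each node's subtree contiguously, while a level-by-level order stores nodes of the same dimension together. This is a simplified sketch under that assumption: the tiny tree `TREE` and its labels are hypothetical, and Dwarf's shared sub-dwarfs, the partition strategy, and the logical clustering mechanism are not modeled.

```python
from collections import deque

# A tiny hypothetical cube tree: node -> children (one tree level per dimension).
TREE = {
    "root": ["S1", "S2"],       # dimension 1: store
    "S1": ["S1.C1", "S1.C2"],   # dimension 2: customer
    "S2": ["S2.C1"],
    "S1.C1": [], "S1.C2": [], "S2.C1": [],
}


def recursive_layout(tree, node="root"):
    """Depth-first order: each node's subtree occupies one contiguous run."""
    order = [node]
    for child in tree[node]:
        order.extend(recursive_layout(tree, child))
    return order


def hierarchical_layout(tree, root="root"):
    """Breadth-first order: all nodes of one dimension (level) are stored together."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


print(recursive_layout(TREE))
print(hierarchical_layout(TREE))
```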

Joint Cell Grouping and User Association Scheme for Clustered Heterogeneous Cellular Networks (클러스터 이기종 셀룰러 네트워크를 위한 합동 셀 그룹핑 및 사용자 접속 기법)

  • Park, Jin-Bae;Lee, Hyung Yeol;Choi, Uri;Kim, Kwang Soon
    • The Journal of Korean Institute of Communications and Information Sciences, v.38A no.6, pp.520-527, 2013
  • In this paper, a joint cell grouping and user association technique is proposed for a semi-dynamic grouped network MIMO in a clustered heterogeneous cellular network (HCN). Small cells are being overlaid on the coverage of conventional macro base stations (BSs) to increase spectral efficiency per area, and these small cells are expected to be concentrated in hot spot areas to support exponentially increasing data traffic. The main causes of performance degradation in the clustered HCN are interference and load imbalance. The proposed scheme handles them jointly to maximize a proportional-fair metric. Results show that the proposed technique substantially improves the average user rate and the proportional fairness among users compared with conventional schemes in the clustered HCN.
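The proportional-fair metric is the sum of the logarithms of user rates, and user association interacts with load because a BS's resources are shared among the users attached to it. A minimal sketch under loudly stated assumptions: each user's rate is its peak rate to the chosen BS divided equally among that BS's users, the optimum is found by exhaustive search over a toy instance, the peak-rate matrix in the example is hypothetical, and the paper's joint cell grouping and network MIMO are not modeled.

```python
import math
from itertools import product


def pf_utility(assoc, peak):
    """Sum of log user rates when each BS splits its time equally among its users."""
    load = {}
    for bs in assoc:
        load[bs] = load.get(bs, 0) + 1
    return sum(math.log(peak[u][bs] / load[bs]) for u, bs in enumerate(assoc))


def best_association(peak):
    """Try every user-to-BS mapping and keep the one with the largest PF utility."""
    n_users, n_bs = len(peak), len(peak[0])
    return max(product(range(n_bs), repeat=n_users), key=lambda a: pf_utility(a, peak))
```

With three users whose peak rates are 10 to a macro BS and 4 to a small cell, a purely rate-based rule would attach all three to the macro BS, whereas the PF search offloads one user to the small cell; this is the load-imbalance effect described above.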

A Striping Strategy Considering Variable Bit Rate in Clustered VOD Servers (클러스터드 VOD 서버에서 가변 비트율을 고려한 스트라이핑 정책)

  • Lee, Jae-Ho;Kim, Jong-Hoon;Ahn, You-Jung
    • Journal of The Korean Association of Information Education, v.2 no.1, pp.10-18, 1998
  • In a VOD server, media data are usually encoded with a VBR compression technique such as MPEG, so media stream rates vary. We propose a striping strategy called VCS that accounts for VBR compression in clustered VOD servers. Simulations are conducted to evaluate the new strategy and compare it with a known striping strategy. The results show that the VCS strategy improves performance.
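Why VBR matters for striping can be shown by comparing two ways of spreading a stream's frames over the servers of a cluster. Round-robin placement gives each server the same number of frames, but because VBR frame sizes vary, the bytes per server can become unbalanced; a size-aware greedy placement evens them out. This is an illustrative sketch only: the abstract does not describe how VCS works, so neither hypothetical function is the paper's method, and the frame sizes in the example are hypothetical.

```python
def round_robin(frame_sizes, n_servers):
    """Frame i goes to server i mod n, regardless of its size; returns bytes per server."""
    load = [0] * n_servers
    for i, size in enumerate(frame_sizes):
        load[i % n_servers] += size
    return load


def size_aware(frame_sizes, n_servers):
    """Each frame goes to the server currently holding the fewest bytes."""
    load = [0] * n_servers
    for size in frame_sizes:
        target = load.index(min(load))  # first least-loaded server on ties
        load[target] += size
    return load
```

For alternating large and small frames [90, 10, 90, 10, 90, 10] on two servers, round-robin yields [270, 30] bytes while the size-aware rule yields [190, 110]. A real VOD striping policy must also respect playback order and per-round retrieval limits, which this storage-only comparison ignores.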


A Study on Measuring the Similarity Among Sampling Sites in Lake (저수지 수질조사 지점간 유사성 분석)

  • Lee, Yo-Sang;Koh, Deuk-Koo;Lee, Hyun-Seok
    • Proceedings of the Korea Water Resources Association Conference, 2010.05a, pp.957-961, 2010
  • Multivariate statistical approaches are used to classify sampling sites by measuring their similarity in water quality data. For the empirical study, two years of data were collected in a reservoir at 9 sampling sites, covering 2 depth levels and 7 key water quality variables. Similarity among sites is measured by the Euclidean distance between their water quality variables, and the sites are grouped by hierarchical clustering. The resulting clusters are interpreted using principal component variables, with respect to the sites' geographical characteristics and the possibility of reducing the number of monitoring sites. The nine sites are clustered as follows. Sites 5, 6, and 7 form one cluster characterized by shallow water depth and location on the main stream. Sites 2 and 4 fall into the same group because of hydraulic characteristics inherited from the main stream, although their patterns of change in water quality appear to differ because site 2 is near the dam. Sites 3, 8, and 9 each stand alone, because each is influenced by a different tributary.
