• Title/Summary/Keyword: index clustering

A Sequential Indexing Method for Multidimensional Range Queries (다차원 범위 질의를 위한 순차 색인 기법)

  • Cha Guang-Ho
    • Journal of KIISE:Databases / v.32 no.3 / pp.254-262 / 2005
  • This paper presents a new sequential indexing method called segment-page indexing (SP-indexing) for multidimensional range queries. The design objectives of SP-indexing are twofold: (1) improving the range query performance of multidimensional indexing methods (MIMs) and (2) providing a compromise between optimal index clustering and the overhead of full index reorganization. Although more than ten years of database research has produced a great variety of MIMs, most efforts have focused on data-level clustering, and fewer attempts have been made to cluster indexes. As a result, relevant index nodes are widely scattered on disk, and many random disk accesses are required during a search. SP-indexing avoids such scattering by storing relevant nodes contiguously in a segment, which is a sequence of contiguous disk pages, and improves performance by offering sequential access within a segment. Experimental results demonstrate that, in terms of total elapsed time, SP-indexing improves query performance by up to several times compared with traditional MIMs that use small disk pages, and that it reduces the waste of disk bandwidth caused by simply using large pages.
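The benefit of storing relevant index pages contiguously can be illustrated with a toy disk cost model. This is only a sketch, not the paper's cost analysis: the function names, the seek and transfer times, and the assumption that every relevant page lies in a fully used segment are all hypothetical choices made for illustration.

```python
def random_access_cost(num_pages, seek_ms, transfer_ms_per_page):
    """Relevant pages are scattered on disk: one seek is paid per page."""
    return num_pages * (seek_ms + transfer_ms_per_page)


def segment_access_cost(num_pages, pages_per_segment, seek_ms, transfer_ms_per_page):
    """Relevant pages are stored contiguously: one seek is paid per segment."""
    num_segments = -(-num_pages // pages_per_segment)  # ceiling division
    return num_segments * seek_ms + num_pages * transfer_ms_per_page


# Hypothetical parameters: 64 relevant pages, 8 ms per seek, 0.1 ms per page transfer.
scattered = random_access_cost(64, seek_ms=8.0, transfer_ms_per_page=0.1)
clustered = segment_access_cost(64, pages_per_segment=16,
                                seek_ms=8.0, transfer_ms_per_page=0.1)
print(f"scattered: {scattered:.1f} ms, clustered: {clustered:.1f} ms")
```

In this model the transfer time is identical in both cases; the saving comes entirely from replacing one seek per page with one seek per segment.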

An Empirical Comparative Study on the Clustering Measurement Using Fuzzy(Average Index Transformation) DEA and Cross-efficiency Models (퍼지(평균지수변환)DEA모형과 교차효율성모형을 이용한 클러스터링측정에 대한 실증적 비교연구)

  • Park, Ro-Kyung
    • Journal of Korea Port Economic Association / v.31 no.1 / pp.85-110 / 2015
  • The purpose of this paper is to show clustering trends, to compare the models empirically, and to choose the ports that three Korean ports (Busan, Incheon, and Gwangyang) should be clustered with. The Fuzzy (Average Index Transformation) DEA and Cross-efficiency models are applied to 38 Asian ports over 11 years (2001-2011), with 4 input variables (berth length, depth, total area, and number of cranes) and 1 output variable (container TEU). The main empirical results are as follows. First, clustering results from the Fuzzy (AIT) DEA model show that the three Korean ports can increase their efficiency: Busan by 56.29%, Incheon by 57.96%, and Gwangyang by 66.80%. Second, according to the Cross-efficiency model, Busan should be clustered with Hong Kong, Kobe, Manila, Singapore, Kaohsiung, and others; Incheon with Aqaba, Dammam, Karachi, Muhammad Bin Qasim, and Davao; and Gwangyang with Dammam, Yokohama, Nagoya, Keelung, Kaohsiung, and Bangkok. Third, when the Fuzzy (AIT) DEA and Cross-efficiency models are combined, the empirical results show that the three Korean ports can increase their efficiency: Busan by 71.38%, Incheon by 103.89%, and Gwangyang by 168.55%. A comparison of the efficiency rankings among the three models using the Wilcoxon signed-rank test showed agreement at an average level of 66%-67%. The policy implication of this paper is that Korean port policy planners should introduce the Fuzzy (AIT) DEA and Cross-efficiency models, together with the combination of the two, when clustering among Asian ports is needed to enhance the efficiency of inputs and outputs. The results of a SWOT analysis among the clustered ports should also be considered.

Incremental Fuzzy Clustering Based on a Fuzzy Scatter Matrix

  • Liu, Yongli;Wang, Hengda;Duan, Tianyi;Chen, Jingli;Chao, Hao
    • Journal of Information Processing Systems / v.15 no.2 / pp.359-373 / 2019
  • For clustering large-scale data that cannot be loaded into memory entirely, incremental clustering algorithms are very popular. Usually, these algorithms consider only within-cluster compactness and ignore between-cluster separation. In this paper, we propose two incremental fuzzy compactness and separation (FCS) clustering algorithms, Single-Pass FCS (SPFCS) and Online FCS (OFCS), based on a fuzzy scatter matrix. First, we introduce two incremental clustering methods, the single-pass and online fuzzy C-means algorithms. Then, we combine each of these methods with the weighted fuzzy C-means algorithm so that they can be applied to the FCS algorithm. Afterwards, we optimize the within-cluster matrix and the between-cluster matrix simultaneously to obtain the minimum within-cluster distance and the maximum between-cluster distance. As a result, large-scale datasets can be clustered well within limited memory. We conducted experiments on artificial datasets and real datasets separately. The experimental results show that, compared with SPFCM and OFCM, our SPFCS and OFCS are more robust to noise and to the value of the fuzzy index m.
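As background for the fuzzy C-means family mentioned above, the following is a minimal sketch of the standard (batch) fuzzy C-means iteration on one-dimensional data. It is not SPFCS, OFCS, or the FCS algorithm from the paper: it has no separation term and no incremental processing. The function names, the evenly spaced initial centers, and the sample data are assumptions made for illustration; `m` is the fuzzy index.

```python
def memberships(points, centers, m):
    """Standard FCM membership update: u_ik is proportional to d_ik^(-2/(m-1))."""
    rows = []
    for x in points:
        dists = [abs(x - v) for v in centers]
        if any(d == 0.0 for d in dists):
            # A point that coincides with a center belongs fully to it
            # (split evenly if several centers coincide).
            raw = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            raw = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(raw)
        rows.append([r / total for r in raw])
    return rows


def fcm_1d(points, c, m=2.0, max_iters=100, tol=1e-9):
    """Batch fuzzy C-means on 1-D data with c >= 2 clusters and fuzzy index m > 1."""
    lo, hi = min(points), max(points)
    # Assumption: deterministic, evenly spaced initial centers.
    centers = [lo + (hi - lo) * k / (c - 1) for k in range(c)]
    for _ in range(max_iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(c):
            weights = [row[k] ** m for row in u]
            new_centers.append(
                sum(w * x for w, x in zip(weights, points)) / sum(weights)
            )
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, u = fcm_1d(data, c=2)
print(sorted(centers))
```

Each row of `u` sums to 1, so a point between two clusters is shared between them rather than assigned to just one.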

Clustering Validity of Social Network Subgroup Using Attribute Similarity (속성유사도에 따른 사회연결망 서브그룹의 군집유효성)

  • Yoon, Han-Seong
    • Journal of Korea Society of Digital Industry and Information Management / v.17 no.1 / pp.75-84 / 2021
  • In big data analysis, social networks are increasingly used through relational data, that is, data describing the connections between entities such as people and objects. When relational data does not exist directly, a social network can be constructed by computing relational data, such as attribute similarity, from the attribute data of entities and using it as links. This paper suggests a method for constructing a social network that uses the attribute similarity between entities as the connection relationship, together with a clustering method that uses subgroups of the constructed network, and evaluates the clustering validity of the results. The analysis results can vary depending on the type and characteristics of the data, the type of attribute similarity selected, and the criterion value. In addition, the clustering validity may not be consistent across evaluation methods. Selection among these options and experimentation are therefore necessary to obtain better analysis results, and this dependence on the analysis target and the clustering options is a limitation of the approach. To evaluate clustering performance, a further study is also needed to compare the method of this paper with conventional methods such as k-means.
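The construction described above (compute attribute similarity, link entities whose similarity meets a criterion value, then extract subgroups) can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: cosine similarity stands in for the unspecified attribute similarity, connected components stand in for the unspecified subgroup definition, and the entity names, vectors, and threshold are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_network(attributes, threshold):
    """Link every pair of entities whose similarity is at least the threshold."""
    adjacency = {name: set() for name in attributes}
    for u, v in combinations(attributes, 2):
        if cosine_similarity(attributes[u], attributes[v]) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def connected_components(adjacency):
    """Subgroups, taken here to be the connected components of the network."""
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical entities with three numeric attributes each.
entities = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.9, 0.2],
}
network = build_network(entities, threshold=0.9)
groups = connected_components(network)
print(sorted(sorted(g) for g in groups))
```

Lowering the threshold adds links and merges subgroups, which is one way the criterion value changes the analysis result.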

A Cluster Validity Index for Fuzzy Clustering

  • Lee, Haiyoung
    • Journal of the Korean Institute of Intelligent Systems / v.9 no.6 / pp.621-626 / 1999
  • In this paper, a new cluster validity index is proposed. The index is heuristic, but it is able to eliminate the monotonically decreasing tendency that occurs when the number of clusters c becomes very large and close to the number of data points n. We review the FCM algorithm and some conventional cluster validity criteria, discuss the limiting behavior of the proposed validity index, and provide numerical examples showing its effectiveness.

A new clustering algorithm based on the connected region generation

  • Feng, Liuwei;Chang, Dongxia;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2619-2643 / 2018
  • In this paper, a new clustering algorithm based on connected region generation (CRG-clustering) is proposed. It is an effective and robust approach that clusters points on the basis of the connectivity between points and their neighbors. In the new algorithm, a connected region generating (CRG) algorithm is developed to obtain the connected regions and an isolated point set. Each connected region corresponds to a homogeneous cluster, which theoretically ensures the separability of an arbitrary data set. Then, a region expansion strategy and a consensus criterion are used to handle the points in the isolated point set. Experimental results on synthetic and real-world datasets show that the proposed algorithm performs well and is insensitive to noise.

A Novel Method for Clustering Critical Generator by using Stability Indices and Energy Margin (안정도 지수와 에너지 마진을 이용한 불안정 발전기의 clustering 법)

  • Chang Dong-Hwan;Jung Yun-Jae;Chun Yeonghan;Nam Hae-Kon
    • The Transactions of the Korean Institute of Electrical Engineers A / v.54 no.9 / pp.441-448 / 2005
  • On-line dynamic security assessment is becoming more and more important for the stable operation of power systems as load levels increase. The need is becoming more apparent in electricity market environments, where power system operation is exposed to a wider variety of operating conditions. For on-line dynamic security assessment, a fast transient stability analysis tool is required for contingency selection. The Transient Energy Function (TEF) method is a good candidate for this purpose. The clustering of critical generators is crucial for precise and fast calculation of the energy margin. In this paper, we propose a new method that uses stability indices to quickly determine the mode of instability. A case study shows very promising results.

Polynomial Fuzzy Radial Basis Function Neural Network Classifiers Realized with the Aid of Boundary Area Decision

  • Roh, Seok-Beom;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology / v.9 no.6 / pp.2098-2106 / 2014
  • In the area of clustering, there are numerous approaches to constructing clusters in the input space. For regression problems, when the clusters being formed are part of the overall model, the relationships between the input space and the output space are essential and have to be taken into consideration. Conditional Fuzzy C-Means (c-FCM) clustering offers an opportunity to analyze the structure in the input space with a mechanism of supervision implied by the distribution of data in the output space. However, like other clustering methods, c-FCM focuses on the distribution of the data. In this paper, we introduce a new method that uses an ambiguity index to focus on the boundaries of the clusters, whose determination is essential to the quality of the ensuing classification procedures. The introduced design is illustrated with numeric examples that provide detailed insight into the performance of the fuzzy classifiers and quantify several essential design aspects.

Identification of Multi-Fuzzy Model by means of HCM Clustering and Genetic Algorithms (HCM 클러스터링과 유전자 알고리즘을 이용한 다중 퍼지 모델 동정)

  • 박호성;오성권
    • 제어로봇시스템학회:학술대회논문집 / 2000.10a / pp.370-370 / 2000
  • In this paper, we design a Multi-Fuzzy model for a nonlinear system by means of HCM clustering and genetic algorithms. The HCM clustering method is used to determine the structure of the proposed Multi-Fuzzy model. The parameters of the membership functions of the Multi-Fuzzy model are identified by genetic algorithms. An aggregate performance index with a weighting factor is used to achieve a sound balance between the approximation and generalization abilities of the model. We use simplified inference and linear inference as the inference methods of the proposed Multi-Fuzzy model, and the standard least squares method for estimating its consequence parameters. Finally, we use numerical data to evaluate the proposed Multi-Fuzzy model and discuss its usefulness.

The Document Clustering using LSI of IR (LSI를 이용한 문서 클러스터링)

  • 고지현;최영란;유준현;박순철
    • Proceedings of the Korea Society for Industrial Systems Conference / 2002.06a / pp.330-335 / 2002
  • The most critical issue in an information retrieval system is returning adequate results for user requests. When all documents related to a user query are retrieved, finding the exact document the user wants is difficult and of limited success. Therefore, clustering methods that group related documents have been widely used. In this paper, we cluster documents on the basis of their meaning rather than the index terms they contain, and apply an LSI method for this reason. Furthermore, we identify and analyze the differences from document clustering with the widely used K-Means algorithm.
