• Title/Summary/Keyword: Cluster validity

Application of Genetic and Local Optimization Algorithms for Object Clustering Problem with Similarity Coefficients (유사성 계수를 이용한 군집화 문제에서 유전자와 국부 최적화 알고리듬의 적용)

  • Yim, Dong-Soon; Oh, Hyun-Seung
    • Journal of Korean Institute of Industrial Engineers / v.29 no.1 / pp.90-99 / 2003
  • Object clustering partitions a set of objects into groups so that objects in the same group share similar characteristics and objects in different groups do not. It has been applied in diverse areas such as information retrieval, data mining, and group technology. This study considers an object-clustering problem defined by similarity coefficients between objects. First, an evaluation function for the optimization problem is defined. Then a genetic algorithm and heuristic local optimization techniques are proposed to obtain near-optimal solutions. Solutions from the genetic algorithm are improved by local optimization based on object relocation and cluster merging. The validity and effectiveness of the proposed algorithms are tested through extensive experiments.
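
The abstract does not state the evaluation function or the exact relocation rule, so the following is only a minimal sketch of relocation-based local optimization under stated assumptions. The `score` function (sum of similarity minus a `threshold` over all same-cluster pairs), the `threshold` value, and the toy similarity matrix are all hypothetical and are not taken from the paper.

```python
from itertools import combinations

def score(labels, sim, threshold=0.5):
    """Assumed evaluation function: sum of (similarity - threshold)
    over every pair of objects placed in the same cluster."""
    return sum(sim[i][j] - threshold
               for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])

def relocate(labels, sim, threshold=0.5):
    """Move one object at a time to the existing cluster that raises the
    score the most; stop when a full pass produces no improving move."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for obj in range(len(labels)):
            best_label = labels[obj]
            best_score = score(labels, sim, threshold)
            for candidate in set(labels):
                trial = labels[:obj] + [candidate] + labels[obj + 1:]
                trial_score = score(trial, sim, threshold)
                if trial_score > best_score:
                    best_label, best_score = candidate, trial_score
            if best_label != labels[obj]:
                labels[obj] = best_label
                improved = True
    return labels

# Toy similarity coefficients: objects 0, 1, 2 are alike, object 3 is not.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
start = [0, 0, 1, 1]          # e.g. a solution handed over by a genetic algorithm
result = relocate(start, sim)
print(result, score(result, sim))
```

This sketch only moves objects between clusters that already exist; the cluster-merging step mentioned in the abstract would be a separate move type.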

Analysis and New Indices of Cluster Validity Indices in Summation Type (합형식의 군집 유효화 지수의 분석과 새로운 지수 개발)

  • Kim Minho; Ramakrishna R.S.
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.598-600 / 2005
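  • Cluster validity assessment is a technique that, with no class information given, evaluates the clustering results produced under various input parameters and finds the result that best fits the natural partition of the given data set. The measure used in this assessment is the cluster validity index. This paper first addresses, among the many existing cluster validity indices, those that have a summation form. Specifically, it analyzes the design principles of these indices and the compliance of each index. It then proposes new cluster validity indices that compensate for the shortcomings revealed by the analysis. Finally, the performance of the newly proposed indices, together with the existing cluster validity indices, is evaluated through experiments.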

Analysis and New Indices of Cluster Validity Indices in Ratio Type (비형식의 군집 유효화 지수의 분석과 새로운 지수 개발)

  • Kim Minho; Ramakrishna R.S.
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.601-603 / 2005
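  • Cluster validity assessment is growing in importance because it makes truly unsupervised learning possible for clustering algorithms. This paper analyzes the design principles of the cluster validity indices commonly used in this assessment and analyzes the compliance of existing indices. Part (I) dealt with summation-type indices; this paper deals with ratio-type indices. As with the summation-type CVIs, the low-pass filtering problem is resolved, and a new technique is presented that improves the performance of ratio-type indices without side effects. The performance of the new indices is demonstrated through experiments.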
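
The abstract does not name the indices it analyzes. As an illustration of what a validity index built as a ratio looks like, the sketch below computes the well-known Dunn index (smallest between-cluster distance divided by largest within-cluster diameter). It may or may not be among the indices treated in the paper, and the toy points are made up for the example.

```python
from itertools import combinations
from math import dist

def dunn_index(clusters):
    """Dunn index: minimum distance between points of different clusters,
    divided by the maximum distance between points of the same cluster.
    Larger values indicate compact, well-separated clusters.
    Assumes at least two clusters and at least one cluster with two points."""
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )
    return min_separation / max_diameter

# Two partitions of the same four points.
natural = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]   # groups nearby points
forced = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]    # groups distant points

print(dunn_index(natural), dunn_index(forced))
```

With no class labels available, the partition with the higher index value would be preferred, which is the role a validity index plays in the assessment described above.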

Compassion Satisfaction and Compassion Fatigue among Social Workers (사회복지사의 연민만족과 연민피로에 관한 연구)

  • Kim, Yong Seok
    • Korean Journal of Social Welfare / v.69 no.2 / pp.271-294 / 2017
  • Social workers empathize with clients' distress, feel concern for clients in distress, and try to reduce that distress. This is called compassion. Compassion is an essential component of social work practice, but it produces both positive and negative results. The purpose of this study is to evaluate the Korean version of the ProQOL, which was developed to measure compassion satisfaction and compassion fatigue in helping professionals; to identify the participants' levels of compassion satisfaction and fatigue; and to divide the participants into clusters using compassion satisfaction and compassion fatigue as the clustering variables. A total of 284 social workers residing in Seoul and surrounding areas participated in this study. Confirmatory factor analysis confirmed that the Korean version of the ProQOL is composed of a compassion satisfaction factor and a compassion fatigue factor, as reported in a previous study. Its reliability and validity were satisfactory. The participants' level of compassion satisfaction was above moderate and their level of compassion fatigue was below moderate. Those who were older, had graduate education, had more years of work experience, or held higher positions had more compassion satisfaction. Cluster analysis divided the participants into 3 clusters. Cluster 1 is characterized by moderate compassion satisfaction and low compassion fatigue, Cluster 2 by low compassion satisfaction and moderate compassion fatigue, and Cluster 3 by high compassion satisfaction and high compassion fatigue.

A Study on Estimates to Longevity Population of Small Area and Distribution Patterns using Vector based Dasymetric Mapping Method (벡터기반 대시매트릭 기법을 이용한 소지역 장수인구 추정 및 분포패턴에 관한 연구)

  • Choi, Don-Jeong; Kim, Young-Seup; Suh, Yong-Cheol
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.29 no.5 / pp.479-485 / 2011
  • A growing number of case studies use GIS-based spatial data fusion to find the distribution of the longevity population and the factors that influence it. Most of these studies adopt census administrative boundary data for the spatial analysis. However, such methods cannot fully explain the phenomenon of longevity, because spatial characteristics vary within census administrative boundaries. Studies are therefore needed on spatial units that realistically reflect human longevity. The dasymetric mapping method uses diverse spatial information to produce spatial units that are more realistic than a census administrative boundary map, together with statistical estimates for small areas. In this study, the elderly population of small areas was estimated at a statistically significant level by applying the vector-based dasymetric mapping method. Cluster analysis also confirmed that local spatial relationships vary within census administrative boundaries. The results imply the need for local-level studies of human longevity and support the validity of dasymetric mapping techniques.

The Statistically and Economically Significant Clustering Method for Economic Clusters in an Urban Region (통계적 및 경제적 유의성을 가진 경제 클러스터 탐식방법에 대한 연구)

  • Shin Jungyeop
    • Journal of the Korean Geographical Society / v.40 no.2 s.107 / pp.187-201 / 2005
  • With the trend toward urban polynucleation, detecting economic clusters or urban employment centers has become a crucial issue. However, prior research has had limitations in detecting economic clusters empirically: the inherent inefficiency of density-based clustering methods, difficulty in detecting linear spatial clusters, and a lack of consideration of economic significance. The purpose of this paper is to propose a clustering method that tests both statistical and economic significance, named VCEC (Variable Clumping method for Economic Clusters), and to apply it to a case analysis of Erie County, New York, in order to test its validity. By applying a search radius and total employment as an economic threshold, clusters that are both statistically and economically significant were detected in Erie County, and the method proved to be efficient.

A Study on Scalable PBFT Consensus Algorithm based on Blockchain Cluster (블록체인을 위한 클러스터 기반의 확장 가능한 PBFT 합의 알고리즘에 관한 연구)

  • Heo, Hoon-Sik; Seo, Dae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.45-53 / 2020
  • Blockchain can control transactions in a decentralized way and is already being considered for the manufacturing, finance, banking, logistics, and medical industries because of advantages such as transparency, security, and flexibility. It is also predicted to have a great economic effect. However, blockchain faces a trilemma: it is difficult to improve scalability, decentralization, and security at the same time. Of these, scalability is the biggest limitation, because it is very difficult to cope with the constantly increasing number of transactions and nodes. To make blockchain scalable, higher performance should be achieved by modifying existing consensus methods or by improving the characteristics and network efficiency that affect many ways of scaling. In this paper, we therefore propose a cluster-based scalable PBFT consensus algorithm called CBS-PBFT, which reduces the message complexity of PBFT, a representative blockchain consensus algorithm, from O(n^2) to O(n). Its validity is verified through simulation experiments.
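
To make the O(n^2) versus O(n) contrast concrete, the sketch below counts messages in one consensus round. `pbft_messages` follows the usual textbook accounting of PBFT's normal-case phases (pre-prepare, prepare, commit), ignoring client traffic. `star_messages` is a purely hypothetical linear pattern in which replicas exchange votes only with a single collector; it is an assumption for comparison and is not the CBS-PBFT protocol, whose details the abstract does not give.

```python
def pbft_messages(n):
    """Messages in one normal-case PBFT round with n replicas
    (textbook accounting, client request/reply not counted)."""
    pre_prepare = n - 1              # primary -> every backup
    prepare = (n - 1) * (n - 1)      # each backup -> every other replica
    commit = n * (n - 1)             # every replica -> every other replica
    return pre_prepare + prepare + commit

def star_messages(n, phases=2):
    """Hypothetical linear scheme (an assumption, not CBS-PBFT):
    after the proposal, each voting phase sends n-1 votes to one
    collector and n-1 aggregated results back."""
    proposal = n - 1
    return proposal + phases * 2 * (n - 1)

for n in (4, 16, 64, 100):
    print(n, pbft_messages(n), star_messages(n))
```

Under this accounting the all-to-all count simplifies to 2n(n-1), so doubling the number of replicas roughly quadruples the traffic, while the linear pattern only doubles it.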

A Study for Determining the Best Number of Clusters on Temporal Data (Temporal 데이터의 최적의 클러스터 수 결정에 관한 연구)

  • Cho Young-Hee; Lee Gye-Sung; Jeon Jin-Ho
    • The Journal of the Korea Contents Association / v.6 no.1 / pp.23-30 / 2006
  • Clustering of temporal data takes a model-based approach that uses an automata-based model for each cluster. Global models for the data set must be constructed in order to elicit the individual models for each cluster. The preparation for building individual models is completed by determining the number of clusters inherent in the data set. In this paper, a BIC (Bayesian Information Criterion) approximation is used to determine the number of clusters, and its applicability is confirmed. A search technique that improves efficiency is also suggested, based on an analysis of the relationship between data size and BIC values. A number of experiments on artificially generated data sets were performed to check its validity. The experiments confirmed that the BIC approximation suggests the best number of clusters, provided that the amount of data is relatively large.
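
As a minimal sketch of BIC-based selection of the number of clusters, the code below uses the standard form BIC = k ln(n) - 2 ln(L), where lower is better. The paper's specific BIC approximation for automata-based models is not given in the abstract and may differ. The function names are hypothetical, and the log-likelihoods and parameter counts are made-up illustrative inputs, not results from the paper.

```python
from math import log

def bic(log_likelihood, n_params, n_samples):
    """Standard Bayesian Information Criterion (lower is better):
    a complexity penalty n_params * ln(n_samples) minus twice the
    maximized log-likelihood."""
    return n_params * log(n_samples) - 2.0 * log_likelihood

def best_cluster_count(candidates, n_samples):
    """candidates maps a cluster count k to (log_likelihood, n_params)
    for the model fitted with k clusters; returns the k with lowest BIC."""
    return min(candidates, key=lambda k: bic(*candidates[k], n_samples))

# Made-up fits: the likelihood improves sharply from k=1 to k=2,
# then only marginally, while the parameter count keeps growing.
fits = {
    1: (-1200.0, 5),
    2: (-950.0, 10),
    3: (-940.0, 15),
    4: (-936.0, 20),
}
print(best_cluster_count(fits, n_samples=500))
```

Because the penalty grows with ln(n) while the log-likelihood grows roughly in proportion to n, the criterion separates candidate models more reliably on larger data sets, which is consistent with the abstract's remark about data size.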

A Study on the Assessment of Pollution Level of Precipitation at Kangwha, 1992 (江華地域 降水의 汚染度 評價에 關한 硏究)

  • 강공언; 강병욱; 김희강
    • Journal of Korean Society for Atmospheric Environment / v.11 no.1 / pp.57-68 / 1995
  • Precipitation samples were collected with a wet-only automatic acid precipitation sampler at Kangwha island on the western coast of Korea from January through December 1992. pH, electric conductivity, and the concentrations of major water-soluble ion components such as $NH_{4}^{+}$, $Ca^{2+}$, $K^{+}$, $Mg^{2+}$, $Na^{+}$, $NO_{3}^{-}$, $SO_{4}^{2-}$ and $Cl^{-}$ were measured. The validity of assessing the pollution level of precipitation samples by pH was checked using correlation analysis between pH and the major components and a t-test of chemical composition between acid rain and non-acid rain; pH proved unsatisfactory as a measure of pollution level. A more comprehensive method is therefore required. To evaluate the monthly chemical composition of the precipitation samples comprehensively, cluster analysis was chosen from among the various multivariate statistical methods. Separating the monthly precipitation samples into homogeneous patterns, with the concentrations of nine major water-soluble ion components as variables, yielded three patterns. The first was a group of months with average ion concentrations, the second a group of months with low ion concentrations, and the third a group of months with high ion concentrations. This indicated that the pollution level of precipitation was higher in February and lower in May, June, August and September than in the other months. As a result, this analysis method can be used to assess the chemical composition of precipitation regionally as well as monthly.

An Evaluation of Sampling Design for Estimating an Epidemiologic Volume of Diabetes and for Assessing Present Status of Its Control in Korea (우리나라 당뇨병의 역학적 규모와 당뇨병 관리현황 파악을 위한 표본설계의 평가)

  • Lee, Ji-Sung; Kim, Jai-Yong; Baik, Sei-Hyun; Park, Ie-Byung; Lee, June-Young
    • Journal of Preventive Medicine and Public Health / v.42 no.2 / pp.135-142 / 2009
  • Objectives: An appropriate sampling strategy for estimating the epidemiologic volume of diabetes was evaluated through simulation. Methods: We analyzed about 250 million medical insurance claims submitted to the Health Insurance Review & Assessment Service in 2003 with diabetes as a principal or subsequent diagnosis at least once in the year. The database was reconstructed into a 'patient-hospital profile' with 3,676,164 cases, and then into a 'patient profile' of 2,412,082 observations. The patient profile data were then used to test the validity of a proposed sampling frame and sampling methods for developing diabetes-related epidemiologic indices. Results: The simulation study showed that a stratified two-stage cluster sampling design with a total sample size of 4,000 will provide an estimate of 57.04% (95% prediction range, 49.83-64.24%) for the treatment prescription rate of diabetes. The proposed design first stratifies the nation by area into "metropolitan/city/county" and by hospital type into "tertiary/secondary/primary/clinic" with a proportion of 5:10:10:75. Hospitals are then randomly selected within the strata as the primary sampling unit, followed by random selection of patients within the hospitals as the secondary sampling unit. The difference between the estimate and the parameter value was projected to be less than 0.3%. Conclusions: The proposed sampling scheme will be applied to a subsequent nationwide field survey, not only to estimate the epidemiologic volume of diabetes but also to assess the present status of nationwide diabetes control.
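
The two-stage design described above can be sketched as follows. The abstract does not say exactly how the 5:10:10:75 proportion is applied, so this sketch assumes it is the share of the total sample of 4,000 allocated to each hospital type. The function names, the toy sampling frame, and the numbers of hospitals and patients drawn per stratum are hypothetical.

```python
import random

def allocate(total, proportions):
    """Split a total sample size across strata in the given proportions
    (assumed here to be shares of the sample per hospital type)."""
    weight_sum = sum(proportions.values())
    return {name: total * w // weight_sum for name, w in proportions.items()}

def two_stage_sample(hospitals, n_hospitals, n_patients, rng):
    """Stage 1: randomly select hospitals (primary sampling units).
    Stage 2: randomly select patients within each selected hospital
    (secondary sampling units). hospitals maps hospital id -> patient ids."""
    chosen = rng.sample(sorted(hospitals), n_hospitals)
    return {
        h: rng.sample(hospitals[h], min(n_patients, len(hospitals[h])))
        for h in chosen
    }

allocation = allocate(
    4000, {"tertiary": 5, "secondary": 10, "primary": 10, "clinic": 75}
)
print(allocation)

# Toy frame for one stratum: 10 hospitals with 20 patients each.
frame = {f"H{i}": [f"H{i}-P{j}" for j in range(20)] for i in range(10)}
sample = two_stage_sample(frame, n_hospitals=3, n_patients=5, rng=random.Random(0))
print(sample)
```

In a full design this selection would be repeated within every area-by-hospital-type stratum, with the per-stratum sizes taken from the allocation.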