• Title/Summary/Keyword: Number of Clusters


Distributed data deduplication technique using similarity based clustering and multi-layer bloom filter (SDS 환경의 유사도 기반 클러스터링 및 다중 계층 블룸필터를 활용한 분산 중복제거 기법)

  • Yoon, Dabin;Kim, Deok-Hwan
    • The Journal of Korean Institute of Next Generation Computing, v.14 no.5, pp.60-70, 2018
  • Software-defined storage (SDS) is being deployed in cloud environments to allow multiple users to virtualize physical servers, but a solution for optimizing space efficiency with limited physical resources is needed. In a conventional data deduplication system, it is difficult to deduplicate redundant data uploaded to distributed storage nodes. In this paper, we propose a distributed deduplication method using similarity-based clustering and a multi-layer Bloom filter. Rabin hashing is applied to determine the degree of similarity between virtual machine servers and to cluster similar virtual machines, which improves deduplication efficiency compared with deduplicating each storage node individually. In addition, a multi-layer Bloom filter is incorporated into the deduplication process to shorten processing time by reducing the number of false positives. Experimental results show that the proposed method improves the deduplication ratio by 9% compared to a deduplication method using IP-address-based clusters, without any difference in processing time.
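
The role of layered Bloom filters in a deduplication lookup can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `BloomFilter` and `LayeredDedupIndex`, the filter sizes, and the use of SHA-256 fingerprints (instead of Rabin hashing) are all assumptions made for illustration. A chunk is treated as a possible duplicate only if every layer answers "maybe", and an exact fingerprint set makes the final decision.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using salted SHA-256 hashes (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by prefixing a one-byte salt.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(bytes([salt]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


class LayeredDedupIndex:
    """Hypothetical two-layer index: a small filter, then a larger one."""

    def __init__(self) -> None:
        self.layers = [BloomFilter(1024, 2), BloomFilter(8192, 4)]
        self.stored = set()  # exact fingerprints, the authoritative record

    def is_duplicate(self, chunk: bytes) -> bool:
        fingerprint = hashlib.sha256(chunk).digest()
        # Consult the exact set only when every layer reports "maybe".
        if all(layer.might_contain(fingerprint) for layer in self.layers):
            if fingerprint in self.stored:
                return True
        for layer in self.layers:
            layer.add(fingerprint)
        self.stored.add(fingerprint)
        return False


index = LayeredDedupIndex()
results = [index.is_duplicate(c) for c in [b"a", b"b", b"a", b"c", b"b"]]
```

Because the exact set confirms every positive answer, a Bloom-filter false positive costs only one extra lookup and never causes data loss in this sketch.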

Protein Function Finding Systems through Domain Analysis on Protein Hub Network (단백질 허브 네트워크에서 도메인분석을 통한 단백질 기능발견 시스템)

  • Kang, Tae-Ho;Ryu, Jea-Woon;Kim, Hak-Yong;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association, v.8 no.1, pp.259-271, 2008
  • We propose a protein function finding algorithm that predicts specific molecular functions for unannotated proteins through domain analysis of a protein-protein network. To do this, we first construct a protein-protein interaction (PPI) network for Saccharomyces cerevisiae from the MIPS databases. The PPI network (3,637 proteins; 10,391 interactions) shows the characteristics of a scale-free and hierarchical network: proteins with many interactions are few, and protein clusters exhibit inherent modularity. Protein-protein interaction databases obtained from a Y2H (yeast two-hybrid) screen or a composite data set include random false positives. To filter the database, we reconstruct the PPI network based on cellular localization. We then analyze hub proteins and the network structure in the reconstructed network and define structural modules from it. We analyze protein domains in the structural modules and derive functional modules from them. From the derived functional modules with high certainty, we find tentative functions for unannotated proteins.
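
One step mentioned above, identifying hub proteins, can be sketched as a degree count over the interaction list. The abstract does not state how hubs were defined, so the degree threshold `min_degree`, the function names, and the protein labels below are hypothetical.

```python
from collections import defaultdict


def build_degree_table(interactions):
    """Return {protein: number of distinct interaction partners}."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def find_hubs(interactions, min_degree):
    """Proteins whose degree reaches the (assumed) hub threshold."""
    degrees = build_degree_table(interactions)
    return sorted(p for p, d in degrees.items() if d >= min_degree)


# Hypothetical toy network; real input would be the filtered PPI pairs.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P5", "P5")]
degrees = build_degree_table(edges)
hubs = find_hubs(edges, min_degree=3)
```

In a scale-free network most proteins have a low degree, so a fixed threshold of this kind selects only a small set of hubs.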

Determinants of Consumer Preference by type of Accommodation: Two Step Cluster Analysis (이단계 군집분석에 의한 농촌관광 편의시설 유형별 소비자 선호 결정요인)

  • Park, Duk-Byeong;Yoon, Yoo-Shik;Lee, Min-Soo
    • Journal of Global Scholars of Marketing Science, v.17 no.3, pp.1-19, 2007
  • 1. Purpose Rural tourism is undertaken by individuals with different characteristics, needs and wants. It is important to have information on the characteristics and preferences of the consumers of the different types of existing rural accommodation. The study aims to identify the determinants of consumer preference by type of accommodation. 2. Methodology 2.1 Sample Data were collected from 1000 people by telephone survey with three-stage stratified random sampling in seven metropolitan areas in Korea. Respondents were chosen at a sampling interval from the telephone book published in 2006. The survey was conducted from 4:00 to 10:30 p.m. so that the systematic sampling accounted for respondents' life cycle. 2.2 Two-step Cluster Analysis Our study uses a two-step cluster method to classify the accommodation into a reduced number of groups, so that each group constitutes a type. This method has been suggested as appropriate for clustering large data sets with mixed attributes. The method is based on a distance measure that enables data with both continuous and categorical attributes to be clustered. The measure is derived from a probabilistic model in which the distance between two clusters is equivalent to the decrease in the log-likelihood function as a result of merging. 2.3 Multinomial Logit Analysis The estimation of a Multinomial Logit model determines the characteristics of the tourists who are most likely to opt for each type of accommodation. The Multinomial Logit model constitutes an appropriate framework to explore and explain a choice process where the choice set consists of more than two alternatives. Because its parameters are easy and quick to estimate, the Multinomial Logit model has been used in many empirical studies of choice in tourism. 3. Findings The auto-clustering algorithm indicated that a five-cluster solution was the best model, because it minimized the BIC value and the change in BIC between adjacent numbers of clusters.
The accommodation establishments can be classified into five types: Traditional House, Typical Farmhouse, Farmstay House for Group Tour, Log Cabin for Family, and Log Cabin for Individuals. Group 1 (Traditional House) includes mainly the large accommodation establishments, i.e. those with ondol-style rooms, meals provided, and one shower room for family tourists, in houses of original construction style. Group 2 (Typical Farmhouse) encompasses accommodation establishments with ondol rooms, a bathroom for each room, and meals provided. It includes, in other words, the tourist accommodations known as "rural houses." Group 3 (Farmstay House for Group) comprises accommodation establishments with ondol rooms that do not provide meals, with self-cooking facilities and large rooms for more than five persons. Group 4 (Log Cabin for Family) includes mainly the popular accommodation establishments, i.e. those with ondol-style rooms and one shower room for family tourists, in Western-style log houses. While the accommodations in this group are not defined as regards type of construction, the group does include all the original Korean style construction. Finally, Group 5 (Log Cabin for Individuals) includes those accommodations that are Western-style wooden houses with bedrooms and a bathroom for each room. First, a Multinomial Logit model is estimated including all the explanatory variables considered and taking accommodation group 2 as the base alternative. The results show the variables and the estimated parameter values for the model giving the probability of each of the five types of accommodation available in rural tourism villages in Korea, according to the socio-economic and trip-related characteristics of the individuals. An initial observation of the analysis reveals that none of the variables income, number of journeys, distance, and residential style of house is explanatory in the choice of rural accommodation. The age and accompany variables are significant for accommodation establishments of group 1.
The education and rural residential experience variables are significant for accommodation establishments of groups 4 and 5. The expenditure and marital status variables are significant for accommodation establishments of group 4. The gender and occupation variables are significant for accommodation establishments of group 3. The loyalty variable is significant for accommodation establishments of groups 3 and 4. The study indicates that significant differences exist among the individuals who choose each type of accommodation at a destination. From this investigation it is evident that several profiles of tourists can be attracted by a rural destination according to the types of existing accommodations at this destination. In addition, the tourist profiles may be used as the basis for investment policy and promotion for each type of accommodation, making use in each case of the variables that indicate a greater likelihood of influencing the tourist's choice of accommodation.
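
The BIC-based choice of the number of clusters reported in the findings can be illustrated with a short sketch. It uses the standard definition BIC = -2 ln L + p ln n and simply picks the candidate with the smallest value; the two-step auto-clustering procedure also considers the change in BIC between adjacent solutions, which is omitted here. All log-likelihoods, parameter counts, and the sample size below are hypothetical and are not taken from the study.

```python
import math


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian information criterion: lower values indicate a better model."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)


def best_cluster_count(candidates, num_samples):
    """candidates maps cluster count -> (log-likelihood, parameter count)."""
    scores = {
        k: bic(log_lik, params, num_samples)
        for k, (log_lik, params) in candidates.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical fitted models for 3 to 6 clusters.
candidates = {
    3: (-5200.0, 12),
    4: (-5100.0, 16),
    5: (-5000.0, 20),
    6: (-4995.0, 24),
}
best_k, scores = best_cluster_count(candidates, num_samples=1000)
```

In this made-up example the six-cluster model fits slightly better than the five-cluster one, but its extra parameters are penalized enough that five clusters has the lowest BIC.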


An Energy Efficient Routing Protocol using Transmission Range and Direction for Sensor Networks (센서 네트워크에서 전송범위와 전송방향을 이용한 에너지 효율적인 라우팅 프로토콜)

  • Lee, Hyun-Jun;Lee, Young-Han;Lee, Kyung-Oh
    • The KIPS Transactions:PartC, v.17C no.1, pp.81-88, 2010
  • Sensors in sensor networks are powered by embedded batteries and stop working once the batteries run out. The data collected by sensors should be transferred to a sink node through efficient routes. Many energy-efficient routing algorithms have been proposed. However, the previous algorithms consume more energy than necessary because they do not consider the transmission range and direction. In this paper we propose an algorithm, TDRP (Transmission range and Direction Routing Protocol), that considers the transmission range and direction for efficient data transmission. Since TDRP does not form clusters or grids but instead divides the area into four quadrants and sends data only to the nodes in the quadrant in the direction of the sink node, it has less network overhead. Furthermore, since the proposed algorithm sends data to a smaller number of nodes than the previous algorithms, it is more energy efficient than other algorithms in fields where the communicating nodes are located in the packet transmission direction.
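
The quadrant idea can be sketched as a neighbor filter: a node forwards only to neighbors that are within its transmission range and lie in the same quadrant as the sink. The abstract does not give the protocol's details, so the quadrant convention, the function names, and the coordinates below are assumptions for illustration only.

```python
import math


def quadrant(origin, point):
    """Quadrant (1-4) of point relative to origin; axes count as the
    quadrant they bound on the non-negative side (an assumed convention)."""
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    if dx >= 0 and dy >= 0:
        return 1
    if dx < 0 and dy >= 0:
        return 2
    if dx < 0 and dy < 0:
        return 3
    return 4


def forwarding_candidates(node, sink, neighbors, tx_range):
    """Neighbors within range that lie in the sink's quadrant."""
    target = quadrant(node, sink)
    return [
        n for n in neighbors
        if math.dist(node, n) <= tx_range and quadrant(node, n) == target
    ]


# Hypothetical layout: the sink is up and to the right of the node.
node, sink = (0.0, 0.0), (10.0, 10.0)
neighbors = [(1.0, 1.0), (-1.0, 2.0), (3.0, 4.0), (6.0, 8.0), (2.0, -1.0)]
candidates = forwarding_candidates(node, sink, neighbors, tx_range=5.0)
```

Only two of the five neighbors receive the packet here, which is the kind of reduction in transmissions that the abstract credits for the energy savings.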

Analysis of the Factors and the Differences in the Awareness about the Capability Groups of the Mediator Manager in General Hospital and the Level of Performance (종합병원 중간관리자의 역량군별 중요도 인식과 수행수준 차이 및 요인분석)

  • Kim, Hee-Sook;Jo, Woo-Hyun;Kim, Young-Hoon;Kim, Tae-Hyun
    • Korea Journal of Hospital Management, v.16 no.3, pp.92-114, 2011
  • The purpose of this study is to provide basic data for strengthening the capabilities of middle managers by examining how important they perceive each capability to be and how well they perform it, and by identifying the significance of and reasons for the differences. The data came from 195 survey questionnaires completed by managers at 9 general hospitals, and the analysis methods were frequency analysis, reliability analysis, paired-sample t-test, independent-sample t-test, analysis of variance, correlation analysis, and multiple linear regression analysis using accumulated variables. The main results are as follows. First, the difference between the perceived importance of the capabilities and the level of performance of middle managers in general hospitals was largest for change management. This was followed, in order, by competence in achievement and behavior, competence in management, competence in recognition, competence in influence, competence in individual effectiveness, and competence in personal relationship service. Second, the correlation analysis of awareness and performance among middle managers showed significant positive correlations throughout. For the level of importance, the cognitive capability and the management capability had the highest correlation, with a correlation coefficient of 0.88. For the level of performance, the cognitive capability, individual capability, and the management capability had the highest correlation, with a correlation coefficient of 0.79.
Third, an examination of the reasons for the difference between perceived importance and level of performance found, in order, a lack of support, recognition, and compensation at the organizational level, an inappropriate work environment, and regulatory limits to be the leading reasons. The study concludes that an efficient capability assessment model should be created and a system for assessing managers' capabilities should be secured in order to strengthen the capabilities of middle managers in general hospitals.


Analysis of Geographic and Pairwise Distances among Chinese Cashmere Goat Populations

  • Liu, Jian-Bin;Wang, Fan;Lang, Xia;Zha, Xi;Sun, Xiao-Ping;Yue, Yao-Jing;Feng, Rui-Lin;Yang, Bo-Hui;Guo, Jian
    • Asian-Australasian Journal of Animal Sciences, v.26 no.3, pp.323-333, 2013
  • This study investigated the geographic and pairwise distances of nine Chinese local Cashmere goat populations through the analysis of 20 microsatellite DNA markers. Fluorescence PCR was used to identify the markers, which were selected based on their significance as identified by the Food and Agriculture Organization of the United Nations (FAO) and the International Society for Animal Genetics (ISAG). In total, 206 alleles were detected; the average allele number was 10.30; the polymorphism information content of loci ranged from 0.5213 to 0.7582; the number of effective alleles ranged from 4.0484 to 4.6178; the observed heterozygosity ranged from 0.5023 to 0.5602 for the practical sample; the expected heterozygosity ranged from 0.5783 to 0.6464; and allelic richness ranged from 4.7551 to 8.0693. These results indicated that Chinese Cashmere goat populations exhibited rich genetic diversity. Further, Wright's F-statistic for subpopulations within the total (FST) was 0.1184; the genetic differentiation coefficient (GST) was 0.0940; and the average gene flow (Nm) was 2.0415. All pairwise FST values among the populations were highly significant (p<0.01 or p<0.001), suggesting that the populations studied should all be considered separate breeds. Finally, the clustering analysis divided the Chinese Cashmere goat populations into at least four clusters, with the Hexi and Yashan goat populations alone in one cluster. These results provide useful, practical, and important information for the future of Chinese Cashmere goat breeding.
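
Several of the diversity statistics named above have simple textbook definitions in terms of the allele frequencies at one locus. The sketch below uses the standard formulas for expected heterozygosity (1 - Σp²), the effective number of alleles (1 / Σp²), and the polymorphism information content of Botstein et al.; it is not the software used in the study, and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for allele frequencies summing to 1."""
    return 1.0 - sum(p * p for p in freqs)


def effective_alleles(freqs):
    """Effective number of alleles: Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with two equally frequent alleles.
freqs = [0.5, 0.5]
he = expected_heterozygosity(freqs)
ne = effective_alleles(freqs)
pic = polymorphism_information_content(freqs)
```

For a locus with more alleles at even frequencies all three values rise, which is why highly polymorphic microsatellites are preferred as markers.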

Survey of genetic structure of geese using novel microsatellite markers

  • Lai, Fang-Yu;Tu, Po-An;Ding, Shih-Torng;Lin, Min-Jung;Chang, Shen-Chang;Lin, En-Chung;Lo, Ling-Ling;Wang, Pei-Hwa
    • Asian-Australasian Journal of Animal Sciences, v.31 no.2, pp.167-179, 2018
  • Objective: The aim of this study was to create a set of highly polymorphic microsatellite markers for the genetic monitoring and genetic structure analysis of local goose populations. Methods: Novel microsatellite markers were isolated from the genomic DNA of white Roman geese using short tandem repeat probes. The DNA segments, including short tandem repeats, were tested for their variability among four populations of geese from the Changhua Animal Propagation Station (CAPS). The selected microsatellite markers could then be used to monitor genetic variability and study the genetic structures of geese from local goose farms. Results: Fourteen novel microsatellite loci were isolated. Together with seven known loci, two multiplex sets were constructed for the detection of genetic variation in goose populations. The average allele number, the effective number of alleles, the observed heterozygosity, the expected heterozygosity, and the polymorphism information content were 11.09, 5.145, 0.499, 0.745, and 0.705, respectively. The results of analysis of molecular variance and principal component analysis indicated a contracting white Roman cluster and a spreading Chinese cluster. In white Roman populations, the CAPS populations were depleted to roughly two clusters when K was set equal to 6 in the Bayesian cluster analysis. The founders of private farm populations had a similar genetic structure. Among the Chinese goose populations, the CAPS populations and private populations represented different clades of the phylogenetic tree, and individuals from the private populations had uneven genetic characteristics according to various analyses. Conclusion: Based on this study's analyses, we suggest that the CAPS should institute a proper breeding strategy for white Roman geese to avoid further clustering. In addition, for preservation and stable quality, the Chinese geese in the CAPS and the aforementioned proper breeding scheme should be introduced to goose breeders.

The Classification and Regional Development's Direction of Rural Fishing Area Based on Administrative District (행정구역에 기초한 어촌지역의 유형구분과 지역개발방향)

  • Kim, Jung-Tae
    • Journal of Korean Society of Rural Planning, v.19 no.4, pp.81-93, 2013
  • The selection of land for fishing village development projects and the standard used to classify fishing villages have been determined based on guidelines developed by fishing village cooperatives. The approach the cooperatives follow is likely to classify fishing villages without first reflecting on the overall development environment of the region, such as other industries and workers in the area. It also acts as a barrier to business promotion and evaluation, because the cooperatives do not match the administrative districts, which are the units of administration and the main policy enforcement agents in regional development. Against this background, this study aimed to identify categories to situate the development direction, as well as the size and distribution of fishing villages, based on the eup, myeon, and dong administrative units as defined by the Fishing Villages and Fishery Harbors Act. This study was based on the Census of Agriculture, Forestry, and Fisheries of 2010, and analyzed 826 eups, myeons, and dongs with fishery households using principal component analysis and two-step cluster analysis. Using the covariance matrix, 95% of the variance in types of fishing villages was explained, but the result reduced to a single component focusing on the number and ratio of fishery households, so a cluster analysis focusing on the size of fishing villages was used. The clusters, based on the number of fishermen in the eups, myeons and dongs, were categorized into three types: (1) village size (682); (2) administrative district size (121); and (3) total eups, myeons and dongs (23), which revealed that most fishing villages were small.
We could explain 73% of the variance using the correlation coefficient matrix, which was divided into three types according to the three principal component scores, namely fishery household power, fishery industry power, and fishing village tourism power. Most fishing villages did not have a clear development direction because all business areas within the region were diversified, and 552 regions could be categorized under the harmonious development type, which is in need of balanced development. The fishery industry type typified by industrial strength included 159 regions in need of an approach based on industrialization of fishery product processing. Specialized production areas, which specialized in producing fishery products, were 115 regions with a high percentage of fishermen. The analysis results indicated that various situations in terms of size and development of fishing villages existed. However, because several regions exist in the form of small village units, it was necessary to approach the project in a manner that directed the diversification of regional development projects, such as places for local residents to relax or enjoy tourism experiences within the region, while considering the overall conditions of the relevant eups, myeons, and dongs. Reinforcement of individual support for fishermen based on the Fisheries Act must take precedence over providing support for fishermen through regional development. In addition, it is necessary to approach the development of fishing villages by focusing on industrializing the processing techniques of fishery products. Areas specialized in the production of fishery products are required to consider the facilities for fisheries production, and must make efforts to increase fishery resources, such as releasing fry.

Stereotactic Vacuum-Assisted Core Biopsy Results for Non-Palpable Breast Lesions

  • Agacayak, Filiz;Ozturk, Alper;Bozdogan, Atilla;Selamoglu, Derya;Alco, Gul;Ordu, Cetin;Pilanci, Kezban Nur;Killi, Refik;Ozmen, Vahit
    • Asian Pacific Journal of Cancer Prevention, v.15 no.13, pp.5171-5174, 2014
  • Background: The increase in breast cancer awareness and widespread use of mammographic screening has led to an increased detection of (non-palpable) breast cancers that cannot be discovered through physical examination. One of the methods used in the diagnosis of these cancers is vacuum-assisted core biopsy, which prevents a considerable number of patients from undergoing surgical procedures. The aim of this study was to present the results of stereotactic vacuum-assisted core biopsy for suspicious breast lesions. Materials and Methods: Files were retrospectively scanned and data on demographic, radiological and pathological findings were recorded for patients who underwent stereotactic vacuum-assisted core biopsy due to suspicious mammographic findings at the Interventional Radiology Centre of the Florence Nightingale Hospital between January 2010, and April 2013. Statistical analysis was carried out using Pearson's Chi-square, continuity correction, and Fisher's exact tests. Results: The mean age of the patients was 47 years (range: 36-70). Biopsies were performed due to BIRADS 3 lesions in 8 patients, BIRADS 4 lesions in 77 patients, and BIRADS 5 lesions in 3 patients. Mammography elucidated clusters of microcalcifications in 73 patients (83%) and focal lesions (asymmetrical density, distortion) in 15 patients (17%). In terms of complications, 1 patient had a hematoma, and 2 patients had ecchymoses (3/88; 3.3%). The histopathologic results revealed benign lesions in 63 patients (71.6%) and malignant lesions in 25 patients (28.4%). The mean duration of the procedure was 37 minutes (range: 18-55). Although all of the BIRADS 3 lesions were benign, 22 (28.6%) of the BIRADS 4 lesions and all of the BIRADS 5 lesions were malignant. Among the malignant cases, 80% were in situ, and 20% were invasive carcinomas. These patients underwent surgery. 
Conclusions: In cases where non-palpable breast lesions are considered suspicious on mammography scans, the vacuum-assisted core biopsy method provides an accurate histopathologic diagnosis, thus preventing a significant number of patients from undergoing unnecessary surgical procedures.

Prognostic Evaluation of Categorical Platelet-based Indices Using Clustering Methods Based on the Monte Carlo Comparison for Hepatocellular Carcinoma

  • Guo, Pi;Shen, Shun-Li;Zhang, Qin;Zeng, Fang-Fang;Zhang, Wang-Jian;Hu, Xiao-Min;Zhang, Ding-Mei;Peng, Bao-Gang;Hao, Yuan-Tao
    • Asian Pacific Journal of Cancer Prevention, v.15 no.14, pp.5721-5727, 2014
  • Objectives: To evaluate the performance of clustering methods used in the prognostic assessment of categorical clinical data for hepatocellular carcinoma (HCC) patients in China, and to establish a predictive prognostic nomogram for clinical decisions. Materials and Methods: A total of 332 newly diagnosed HCC patients treated with hepatic resection during 2006-2009 were enrolled. Patients were regularly followed up at outpatient clinics. Clustering methods including average linkage, k-modes, fuzzy k-modes, PAM, CLARA, protocluster, and ROCK were compared by Monte Carlo simulation, and the optimal method was applied to investigate the clustering pattern of the indices, including platelet count, platelet/lymphocyte ratio (PLR) and serum aspartate aminotransferase activity/platelet count ratio index (APRI). Then the clustering variable, age group, tumor size, number of tumors and vascular invasion were studied in a multivariable Cox regression model. A prognostic nomogram was constructed for clinical decisions. Results: ROCK performed best in both the overlapping and non-overlapping cases used to assess the prognostic value of platelet-based indices. Patients with categorical platelet-based indices split significantly across two clusters, and those with high values had a high risk of HCC recurrence (hazard ratio [HR] 1.42, 95% CI 1.09-1.86; p<0.01). Tumor size, number of tumors and blood vessel invasion were also associated with a high risk of HCC recurrence (all p<0.01). The nomogram predicted HCC patient survival well at 3 and 5 years. Conclusions: A cluster of platelet-based indices combined with other clinical covariates could be used for prognosis evaluation in HCC.
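
To show what clustering categorical indices involves, the sketch below implements the two building blocks of k-modes, one of the compared methods: a simple matching dissimilarity and a per-attribute mode. It is not the ROCK algorithm that the study found best, and the records, category labels, and function names are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Most frequent category per attribute across the given records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def assign(records, modes):
    """Index of the nearest mode for each record (k-modes assignment step)."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(r, modes[k]))
        for r in records
    ]


# Hypothetical records: (platelet count, PLR, APRI), each coded low/high.
records = [
    ("low", "low", "low"),
    ("low", "low", "high"),
    ("high", "high", "high"),
    ("high", "high", "low"),
]
modes = [("low", "low", "low"), ("high", "high", "high")]
labels = assign(records, modes)
```

A full k-modes run would alternate this assignment step with recomputing each cluster's mode via `cluster_mode` until the labels stop changing.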