• Title/Summary/Keyword: Similar Cluster


Top-down Hierarchical Clustering using Multidimensional Indexes (다차원 색인을 이용한 하향식 계층 클러스터링)

  • Hwang, Jae-Jun;Mun, Yang-Se;Hwang, Gyu-Yeong
    • Journal of KIISE:Databases / v.29 no.5 / pp.367-380 / 2002
  • Due to the recent increase in applications requiring huge amounts of data, such as spatial data analysis and image analysis, clustering on large databases has been actively studied. In a hierarchical clustering method, a tree representing a hierarchical decomposition of the database is first created and then used for efficient clustering. Existing hierarchical clustering methods have mainly adopted the bottom-up approach, which creates the tree from the bottom to the topmost level of the hierarchy. These bottom-up methods require at least one scan over the entire database to build the tree and need to search most nodes of the tree, since the clustering algorithm starts from the leaf level. In this paper, we propose a novel top-down hierarchical clustering method that uses multidimensional indexes that are already maintained in most database applications. Generally, multidimensional indexes have the clustering property of storing similar objects in the same (or adjacent) data pages. Using this property, we can find adjacent objects without calculating distances among them. We first formally define a cluster based on the density of objects. For this definition, we propose the concept of the region contrast partition, which is based on the density of the region. To speed up the clustering algorithm, we use a branch-and-bound algorithm. We propose the bounds and formally prove their correctness. Experimental results show that the proposed method is at least as effective in clustering quality as BIRCH, a bottom-up hierarchical clustering method, while reducing the number of page accesses by a factor of 26 to 187, depending on the size of the database. As a result, we believe that the proposed method significantly improves clustering performance in large databases and is practically usable in various database applications.

A Study of Post-processing Methods of Clustering Algorithm and Classification of the Segmented Regions (클러스터링 알고리즘의 후처리 방안과 분할된 영역들의 분류에 대한 연구)

  • Oh, Jun-Taek;Kim, Bo-Ram;Kim, Wook-Hyun
    • The KIPS Transactions:PartB / v.16B no.1 / pp.7-16 / 2009
  • Some clustering algorithms over-segment an image because they do not consider the spatial information between the segmented regions and because the number of clusters is defined in advance. This makes them difficult to apply in practice. This paper proposes new post-processing methods, a reclassification of inhomogeneous clusters and a region merging step using a Bayesian algorithm, that improve the segmentation results of clustering algorithms. An inhomogeneous cluster is first selected based on variance and between-class distance, and it is then reclassified into the other clusters in the reclassification step. This reclassification is repeated until the optimal number of clusters, determined by the minimum average within-class distance, is reached. Similar regions are then merged using the Bayesian algorithm, based on the Kullback-Leibler distance between adjacent regions. In this way, the over-segmentation problem can be solved effectively and the result can be used in applications. Finally, we design a classification system for the segmented regions to validate the proposed method. The segmented regions are classified by an SVM (Support Vector Machine) using the principal colors and the texture information of the segmented regions. In experiments, the proposed method proved valid for various real images and was effectively applied to the designed classification system.
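
As a rough illustration of the Kullback-Leibler distance mentioned in this abstract, the following minimal sketch compares hypothetical color histograms of three regions. The abstract does not say which features are compared or how the divergence is symmetrized, so the histograms, the symmetrized sum, and all function names here are assumptions made for illustration only.

```python
import math


def normalize(hist):
    """Turn raw bin counts into a probability distribution."""
    total = sum(hist)
    return [h / total for h in hist]


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    eps guards against division by zero for empty bins (an assumption,
    not something stated in the abstract).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Symmetrized KL distance: D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical 4-bin color histograms for three adjacent regions.
region_a = normalize([10, 20, 30, 40])
region_b = normalize([12, 18, 33, 37])  # similar to region_a
region_c = normalize([40, 30, 20, 10])  # very different from region_a

d_ab = symmetric_kl(region_a, region_b)
d_ac = symmetric_kl(region_a, region_c)
```

Under these assumptions, `d_ab` is much smaller than `d_ac`, so a merging step would prefer to merge regions a and b.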

Performance Improvement of Collaborative Filtering System Using Associative User′s Clustering Analysis for the Recalculation of Preference and Representative Attribute-Neighborhood (선호도 재계산을 위한 연관 사용자 군집 분석과 Representative Attribute -Neighborhood를 이용한 협력적 필터링 시스템의 성능향상)

  • Jung, Kyung-Yong;Kim, Jin-Su;Kim, Tae-Yong;Lee, Jung-Hyun
    • The KIPS Transactions:PartB / v.10B no.3 / pp.287-296 / 2003
  • There has been much research focused on collaborative filtering techniques in recommender systems. However, these studies have suffered from the first-rater problem and the sparsity problem. The main purpose of this paper is to solve these problems. We propose a method for predicting user preference that uses Bayesian estimated values and associative user clustering for the recalculation of preference. In addition, to compensate for the shortcoming that item attributes are not considered, we use the Representative Attribute-Neighborhood method, which extracts the representative attributes that most affect preference and uses them for prediction when finding similar neighbors. We improved efficiency by applying associative user clustering analysis to the collaborative filtering algorithm, so that the preference for a specific item is calculated within the cluster's item vector. Furthermore, to address the sparsity and first-rater problems, associative users are clustered according to genre using the Association Rule Hypergraph Partitioning algorithm. New users are classified into one of these genres by a Naive Bayes classifier. In addition, to obtain the similarity between new users and the users belonging to the classified genre, this paper assigns different estimated values, obtained through Naive Bayes learning, to the items that users have evaluated. Applying the preferences weighted by these estimated values to the Pearson correlation coefficient yields higher accuracy, because the errors caused by missing values are reduced. We evaluate our method on a large collaborative filtering database of user ratings, and it significantly outperforms previously proposed methods.
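
The Pearson correlation coefficient used for user similarity in this abstract can be sketched as follows. This is only the plain coefficient over co-rated items; it does not include the paper's Bayesian estimated values or clustering, and the user names, item names, and ratings are hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users over the items both have rated.

    ratings_a and ratings_b map item ids to numeric ratings.
    Returns 0.0 when fewer than two items are shared or a user has
    constant ratings on the shared items (a convention chosen here).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings on a 1-5 scale.
alice = {"item1": 5, "item2": 3, "item3": 4}
bob = {"item1": 4, "item2": 2, "item3": 3}    # same taste, shifted by one
carol = {"item1": 1, "item2": 5, "item3": 2}  # roughly opposite taste

sim_alice_bob = pearson(alice, bob)
sim_alice_carol = pearson(alice, carol)
```

Because only co-rated items enter the sums, sparse rating data leave few terms to correlate, which is the missing-value issue the abstract refers to.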

Molecular Characterization of Small-Spored Alternaria Species (소형의 포자를 형성하는 Alternaria 균류의 분자생물학적 특징)

  • Kim, Byung-Ryun;Park, Myung-Soo;Cho, Hye-Sun;Yu, Seung-Hun
    • Research in Plant Disease / v.11 no.1 / pp.56-65 / 2005
  • To establish a taxonomic system for morphologically similar species of small-spored Alternaria, phylogenetic analysis of internal transcribed spacer (ITS 1, ITS 2 and 5.8S rDNA) and mitochondrial small subunit (mt SSU) rDNA sequences and URP-PCR fingerprinting analysis of 11 species of Alternaria were performed. Phylogenetic analysis of the ITS and mt SSU rDNA sequences revealed that 10 of the 11 small-spored Alternaria species were phylogenetically identical, with a bootstrap value of 100%. Only A. infectoria was phylogenetically differentiated from the other species. The results suggest that the 10 small-spored Alternaria species are evolutionarily very closely related and that these markers cannot be used to differentiate them. URP-PCR fingerprinting analysis of the eleven small-spored Alternaria species using 10 URP primers showed that it was possible to differentiate the species, although genetic similarities were found among them. The Alternaria sp. from common pokeweed could be distinguished from the other species by URP-PCR analysis, and it was considered a new species. A. infectoria could be easily distinguished from the other 10 species by phylogenetic analysis of the ITS and mt SSU rDNA sequences and by the URP-PCR fingerprinting analysis.

Creative Cultural Localization Ways and IT Market of the EU to Converge the Creative Industries (창조융합시장을 위한 유럽 연합 (EU)의 시장과문화적 지역특화방안)

  • Seo, Dae-Sung
    • Journal of Distribution Science / v.13 no.1 / pp.27-33 / 2015
  • Purpose - The ICT market in the EU is lagging behind that of the US; however, algorithm and software development within the EU have grown steadily, and they involve focusing on the creative cultural convergence conceptualized as part of Horizon 2020 and connecting neighboring markets in the EE and the Mediterranean region. It is essential to study the requirements to market the EU's creative ICT development in emerging industrial countries after examining its applicability in these countries. Research design, data, and methodology - This study deals with data pertaining to the EU's creative industry and competitive edge. The global cultural expansion of the EU facilitates a new concept involving not only low-cost IT products to enhance local cultural artifacts through R&D and the construction of efficient infrastructure services, but also information exchange with a realistic commercialization of the technology that can be applied for creative cultural localization. In the European industry, research on algorithms has been applied for the benefit of consumers. We investigated how the process is conducted in the EU. Results - Europe needs to adjust its economic structure to the local culture as part of IT distribution convergence. The convergence has been converted into a production algorithm with IT in the form of low-cost production. This is because there is an attempt to improve the quality of transport infrastructure, workforce availability, and the distribution of the distance to the local industries and consumers, using IT algorithms. Integrated into the manufacturing industry, based on the ICT infrastructure and solutions, smart localized regional clusters are formed with the help of grafting. Europe has its own strategy to increase the number of hub-and-spoke cities. Europe is now becoming integrated, with an EPC system for regional cooperation rather than national competition in ICT technology. Europe has also been recognized in this study as changing the step-by-step paradigm for global competitiveness through new creative culture industries. Conclusions - As a result, there are several ways of converging with others through EU R&D intensity; therefore, the EU can be seen as successfully increasing marginal value, which is useful in developing a special industrial cluster or local cultural cities that create converged development by connecting people and objects with IT. In fact, when compared to the US, Europe has a strong culture, and the car industries tend to overshadow the IT industries with the integration of services in IT distribution. Considering the rapid environmental changes, the convergence of IT services is likely to take place in Europe, similar to the pharmaceutical industry and the automotive industry. This requires a focus on human resources and automated systems management. The trend is to move away from low-wage industries and switch to key personnel centers of the local university-industry. The EU emphasizes the creation of IT market demand in Europe involving local cultural convergence for marketing as the second step to strengthen the economic hub-and-spoke areas.

The Leaf Morphological Variation of Ten Regions of Natural Populations of Machilus thunbergii in Korea (후박나무 10개 천연집단의 엽형질 변이)

  • Yang, Byeong-Hoon;Song, Jeong-Ho;Lee, Jae-Cheon;Park, Young-Goo
    • Journal of agriculture & life science / v.45 no.3 / pp.25-33 / 2011
  • This study was conducted to examine genetic variation in the leaf characteristics of Machilus thunbergii populations. Ten populations were subjected to multivariate analysis of 9 leaf morphology characteristics. The average leaf blade length, leaf width, petiole length, and vein number were 9.8 cm, 4.0 cm, 1.8 cm, and 8.4, respectively, while the angles of the leaf base and leaf apex were $67.9^{\circ}$ and $78^{\circ}$, respectively. The coefficient of variation (C.V.) of the leaf characteristics was 20%, which indicates similar features among the populations. Nested analysis showed statistically significant differences among populations as well as among individuals within populations. Analysis of the genetic relationships between populations using the complete linkage method yielded four groups at a Euclidean distance of 1.2, and the populations did not show a tendency to cluster into the same group. Three of the 9 principal components had a meaningful eigenvalue above 1.0. Together, these three components explained 92.8% of the total variation. The first principal component (PC) explained about 40.3% and was mainly correlated with maximum leaf width; the second PC explained about 28.7% and was correlated with leaf blade length. The third PC explained about 23.8% and was correlated with petiole length ($X_3$). These characters were important factors for analyzing the relationships among natural populations of M. thunbergii.
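
The complete linkage grouping at a Euclidean distance cut described in this abstract can be sketched in a few lines. This is a generic agglomerative procedure, not the authors' analysis; the two-dimensional points stand in for hypothetical standardized leaf traits, and only the cut value of 1.2 comes from the abstract.

```python
import math
from itertools import combinations


def complete_linkage(points, cut):
    """Agglomerative clustering with complete linkage.

    The distance between two clusters is the largest Euclidean distance
    between any pair of their members. Merging stops once the closest
    pair of clusters is farther apart than `cut`.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        if cluster_distance(clusters[x], clusters[y]) > cut:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # x < y, so index x is unaffected
    return clusters


# Hypothetical standardized trait values for five populations.
populations = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.2, 3.1), (6.0, 0.0)]
groups = complete_linkage(populations, cut=1.2)
```

With these made-up points, the cut at 1.2 leaves three groups: populations 0 and 1, populations 2 and 3, and population 4 on its own.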

Statistical Analysis of the Spatio-temporal Water Quality Characteristics of the Nakdong River (낙동강수계 수질의 시·공간적 특성에 대한 통계학적 분석)

  • Seo, Mijin;Cho, Changdae;Im, Taehyo;Kim, Sanghun;Yoon, Hyunjeong;Kim, Yongseok;Kim, Gyeonghoon
    • Journal of Environmental Science International / v.28 no.3 / pp.303-320 / 2019
  • Water quality is characterized by various complex factors. Therefore, a systematic understanding of water quality trends is required to carry out a proper evaluation. In this study, we analyzed the spatio-temporal water quality characteristics of the Nakdong River using five years of data, from 2012 to 2016. Data were collected on pH, DO, BOD, COD, SS, TN, TP, TOC, WT, EC, $NH_3-N$, $NO_3-N$, $PO_4-P$, Chl-a, rainfall, and total and fecal coliforms. A total of 38 water quality measurement stations, from Andong1 to Gupo, were considered. Statistical analyses including trend, cluster, and factor analyses were conducted to identify the dominant water quality components affecting the Nakdong River. The Nakdong River was spatially classified into three groups, up-stream (Andong1 to Sangju1), mid/up-stream (Donam to Dalseong), and mid/down-stream (Hwawonnaru to Gupo), and temporally into two groups, summer/fall (July to October) and the rest of the year (November to June). The water quality of the entire Nakdong River showed trends similar to those of the mid/down-stream section, which indicates the importance of water quality management in this section. Suspended solids, phosphorus, and coliform groups were established as important factors to be considered in the summer/fall season across the river, especially in the mid/down-stream section. Nitrogen and organic matter were identified as important factors to be considered in the rest of the year, especially in the mid/up-stream section. This study could help determine the water quality components that should be intensively monitored in the Nakdong River.

Comparison of Benthic Macroinvertebrate Communities at Two Headwater Streams Located with Different Temperature Regions in South Korea (온도 분포가 다른 두 산림 하천의 저서성대형무척추동물 군집 특성 비교)

  • Lee, Da-Yeong;Lee, Dae-Seong;Park, Chanwoo;Yun, Soon Jin;Lim, Jong-Hwan;Park, Young-Seuk
    • Korean Journal of Ecology and Environment / v.54 no.2 / pp.87-95 / 2021
  • Macroinvertebrates in forest streams affect the overall health of other streams in the same water system. In this study, we compared differences in the benthic macroinvertebrate community at two headwater streams located at different latitudes in the southern and northern parts of South Korea. We calculated the community temperature index (CTI), which represents the thermal preferences of the benthic communities. Hierarchical cluster analyses (HCA) were conducted to compare the similarities among sampling sites. In addition, we analyzed the relationship between community composition and environmental and community characteristics using non-metric multidimensional scaling (NMDS). Our results showed that CTI was significantly different between the two regions, indicating that these benthic macroinvertebrate communities have different thermal preferences. These two regions were clearly distinguished from each other in the HCA; furthermore, seasonal differences in benthic community composition were observed within each region. The functional feeding groups present in the benthic macroinvertebrate communities were different even though their habitat was similar.
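
The abstract does not give the formula for the community temperature index (CTI). A common formulation, assumed here, is the abundance-weighted mean of per-taxon temperature preference values. The sketch below uses hypothetical taxon names, counts, and preference values purely to show the arithmetic.

```python
def community_temperature_index(abundance, temperature_preference):
    """Abundance-weighted mean temperature preference of a community.

    abundance: taxon -> number of individuals sampled at a site.
    temperature_preference: taxon -> preferred temperature in deg C
    (hypothetical per-taxon values; an assumption of this sketch).
    """
    total = sum(abundance.values())
    if total == 0:
        raise ValueError("site has no individuals")
    weighted = sum(n * temperature_preference[t] for t, n in abundance.items())
    return weighted / total


# Hypothetical preference values for two taxa.
preference = {"cold_taxon": 10.0, "warm_taxon": 18.0}

# Hypothetical counts at a northern and a southern headwater stream.
northern_site = {"cold_taxon": 30, "warm_taxon": 10}
southern_site = {"cold_taxon": 10, "warm_taxon": 30}

cti_north = community_temperature_index(northern_site, preference)  # 12.0
cti_south = community_temperature_index(southern_site, preference)  # 16.0
```

Under this assumed definition, a community dominated by cold-preferring taxa gets a lower CTI, which is the kind of regional difference the abstract reports.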

SIEM System Performance Enhancement Mechanism Using Active Model Improvement Feedback Technology (능동형 모델 개선 피드백 기술을 활용한 보안관제 시스템 성능 개선 방안)

  • Shin, Youn-Sup;Jo, In-June
    • The Journal of the Korea Contents Association / v.21 no.12 / pp.896-905 / 2021
  • In the field of SIEM (security information and event management), many studies use a feedback system to address the incompleteness of training data and the false positives on new attack events that occur in actual operation. However, current feedback systems require too much human input to improve the running model, and even then, feedback from inexperienced analysts can degrade model performance. Therefore, we propose an "active model improving feedback technology" to address the shortage of security analysts, increasing false positive rates, and degrading model performance. First, we cluster similar predicted events during operation, calculate feedback priorities for those clusters, and select and present representative events from the highly prioritized clusters using XAI (eXplainable AI)-based event visualization. Once analysts provide feedback on these events, we exclude less analogous events and then propagate the feedback throughout the clusters. Finally, the existing model is incrementally trained on these events. To verify the effectiveness of our proposal, we compared three distinct scenarios using PKDD2007 and CSIC2012. As a result, our proposal showed 30% higher performance on all indicators compared with the model with no feedback and with the current feedback system.

Vertical distribution and vascular plants in the Gakho mountain (Yeongdong-gun), Korea (각호산(영동군)의 관속식물과 수직분포)

  • Jung-Hyun Kim;Jin-Suk Kim;Sookyung Shin;Tae-Im Heo;Young Hoon Kim;Sunghyuk Park;Jin-Seok Kim
    • Korean Journal of Environmental Biology / v.41 no.1 / pp.60-88 / 2023
  • This study was conducted to investigate the vertical distribution and vascular plants of Gakho mountain. From the results of three field surveys conducted from May 2022 to September 2022, a total of 478 taxa, representing 426 species, 11 subspecies, 35 varieties, four forms, and two hybrids, were identified and categorized into 282 genera and 94 families. We identified the elevational distribution ranges of 398 taxa of vascular plants. Among them, 19 taxa were endemic to Korea, and one taxon was identified as a rare plant. The floristic target plants amounted to 72 taxa, specifically two taxa of grade V, two taxa of grade IV, 16 taxa of grade III, 27 taxa of grade II, and 25 taxa of grade I. Further, 71 taxa were identified as northern lineage plants. A total of 19 taxa of alien plants were identified, with a Naturalized Index of 4.0%, an Urbanization Index of 6.6%, and three plants that disturb the ecosystem. Analysis of the pattern of species richness showed a reversed hump shape, with minimum richness at mid-high elevation. A cluster analysis showed a high degree of similarity between adjacent elevation sections with similar habitat environments. The warmth index of Gakho mountain ranged from 57.2℃·month to 84.2℃·month. Our results provide basic data on vascular plants and valuable information on the elevational distribution ranges of current plant species in Gakho mountain, which could serve as a baseline for comparing shifts in elevation under future climate change.
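
The abstract reports a warmth index in ℃·month and a Naturalized Index but does not state the formulas. The sketch below assumes the commonly used definitions: the warmth index as the sum of (monthly mean temperature minus 5 ℃) over months warmer than 5 ℃, and the Naturalized Index as alien taxa divided by all recorded taxa, times 100. The monthly temperatures are hypothetical; only the counts 19 and 478 come from the abstract.

```python
def warmth_index(monthly_mean_temps, threshold=5.0):
    """Sum of (t - threshold) over months whose mean temperature exceeds it.

    monthly_mean_temps: twelve monthly mean temperatures in deg C.
    The 5 deg C threshold is the conventional value, assumed here.
    """
    return sum(t - threshold for t in monthly_mean_temps if t > threshold)


def naturalized_index(alien_taxa, total_taxa):
    """Share of alien taxa among all recorded taxa, in percent (assumed definition)."""
    return alien_taxa / total_taxa * 100.0


# Hypothetical monthly mean temperatures (deg C), January to December.
temps = [-3.0, -1.0, 4.0, 10.0, 15.0, 20.0, 23.0, 24.0, 19.0, 12.0, 5.0, -1.0]
wi = warmth_index(temps)  # 5 + 10 + 15 + 18 + 19 + 14 + 7 = 88.0

# Counts taken from the abstract: 19 alien taxa out of 478 taxa in total.
ni = naturalized_index(19, 478)
```

With the abstract's counts, `ni` rounds to 4.0, which agrees with the reported Naturalized Index of 4.0% under this assumed definition.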