• Title/Summary/Keyword: Agglomerative

Face Search Method Based on Face Feature Extraction and Clustering (얼굴 특징 추출 및 클러스터링을 활용한 얼굴 검색 기법)

  • Shin, Junho;Kim, Jong-hwan;Cho, Sukhee;Kim, Junghak;Koh, Yeong Jun
    • Proceedings of the Korean Society of Broadcast Engineers Conference / fall / pp.95-96 / 2021
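  • With recent advances in media, photos and videos containing the faces of large numbers of people are being uploaded to the Internet at a rapid pace. Face recognition technology based on artificial intelligence has advanced remarkably in step with this trend, but searching for an arbitrary person in a large-scale dataset still imposes heavy computation and storage costs. In particular, an efficient face search system is needed to find victims accurately and quickly among the countless illegally filmed videos on the Internet. This paper therefore proposes a technique that uses face feature extraction and clustering to efficiently search for a victim's videos in a massive set of illegally filmed videos. To create an experimental environment for illegal-video search, we build a proxy video set from the YouTube Faces [1] dataset and conduct the experiments in this environment. For face feature extraction, we use a ResNet100 network trained with the CosFace loss function on the Glint360K dataset [2]. After clustering the extracted face features with the HAC (Hierarchical Agglomerative Clustering) algorithm, we analyze the results of face search experiments that use cluster representatives.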
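
The cluster-then-search idea in this abstract can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes L2-normalized embeddings compared by cosine distance, average linkage stopped at a hypothetical distance `threshold`, and a cluster representative defined as the re-normalized mean embedding (the abstract does not say how representatives are computed). The helper names and the toy 2-D vectors, which stand in for real face embeddings, are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_distance(a, b):
    # Inputs are unit vectors, so the dot product equals the cosine similarity.
    return 1.0 - sum(x * y for x, y in zip(a, b))

def hac_average(vectors, threshold):
    """Naive average-linkage HAC; stops when the closest clusters are farther apart than threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def representative(vectors, members):
    """Hypothetical representative: the re-normalized mean of the member embeddings."""
    dim = len(vectors[0])
    mean = [sum(vectors[i][k] for i in members) / len(members) for k in range(dim)]
    return normalize(mean)

def search(query, vectors, clusters):
    """Compare the query with one representative per cluster; return the best cluster's members."""
    q = normalize(query)
    reps = [representative(vectors, c) for c in clusters]
    best = min(range(len(clusters)), key=lambda c: cosine_distance(q, reps[c]))
    return clusters[best]

# Toy 2-D "embeddings": faces 0-2 belong to one person, faces 3-4 to another.
faces = [normalize(v) for v in [[1, 0.1], [1, 0.0], [0.9, 0.2], [0, 1], [0.1, 1]]]
clusters = hac_average(faces, threshold=0.2)
print(clusters)                            # [[0, 1, 2], [3, 4]]
print(search([0.05, 1], faces, clusters))  # [3, 4]
```

Because a query is compared with one representative per cluster instead of every stored face, the number of comparisons scales with the number of clusters. The naive merge loop itself is far too slow for large collections and is for illustration only.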

A detailed analysis of nearby young stellar moving groups

  • Lee, Jinhee
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.63.3-63.3 / 2019
  • Nearby young moving groups (NYMGs hereafter) are gravitationally unbound, loose associations of young stars located within 100 pc of the Sun. Since NYMGs are crucial laboratories for studying low-mass stars and planets, intensive searches for NYMG members have been performed using various strategies and methods. As a result, the reliability of claimed memberships is not uniform, and a careful re-assessment of membership is required. In this study, I developed an NYMG membership probability calculation tool based on Bayesian inference (Bayesian Assessment of Moving Groups: BAMG). To develop BAMG, I constructed ellipsoidal models for nine NYMGs through iterative, self-consistent processes. Using BAMG, the memberships of members claimed in the literature (N~2000) were evaluated, and 35 per cent of them were confirmed as bona fide members of NYMGs. Because the mass function of these bona fide members shows a deficiency of low-mass members, low-mass members were searched for in Gaia DR2, and about 2000 new M dwarf and brown dwarf candidate members were identified. The memberships of ~70 members with Gaia radial velocities (RVs) were confirmed, and an additional ~20 members were confirmed via spectroscopic observation. Without relying on prior knowledge of the existence of the nine NYMGs, unsupervised machine learning analyses were applied to the NYMG members. The K-means and Agglomerative Clustering algorithms produce similar groupings. As a result, six previously known groups (TWA, beta-Pic, Carina, Argus, AB Doradus, and Volans-Carina) were rediscovered. The other three known groups are recognized as well; however, they are combined into two new groups (ThOr+Columba and TucHor+Columba).
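
The statement that K-means and agglomerative clustering produce similar groupings can be quantified with the Rand index: the fraction of point pairs that both partitions treat the same way (placed together in both, or apart in both). The sketch below is illustrative only. The toy 2-D points are hypothetical stand-ins for the kinematic features used in the study, and the farthest-point initialization and average linkage are assumptions, not the author's settings.

```python
import math
from itertools import combinations

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centroids[c])
        if new == centroids:
            break  # assignments can no longer change
        centroids = new
    return labels

def agglomerative(points, k):
    """Average-linkage agglomerative clustering cut at k clusters; returns one label per point."""
    clusters = [[i] for i in range(len(points))]

    def link(c1, c2):
        return sum(math.dist(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def rand_index(l1, l2):
    """Fraction of point pairs on which two labelings agree (label names are irrelevant)."""
    pairs = list(combinations(range(len(l1)), 2))
    agree = sum((l1[i] == l1[j]) == (l2[i] == l2[j]) for i, j in pairs)
    return agree / len(pairs)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0), (20, 1)]
print(rand_index(kmeans(points, 3), agglomerative(points, 3)))  # 1.0
```

A Rand index of 1.0 means the two partitions are identical up to relabeling; on real, overlapping groups the two methods would be expected to agree only partially.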

The Spatial Pattern and Structure of Industrial Agglomerations in Korea : Towards a Regional Innovation System (우리나라 산업집적의 공간적 패턴과 구조 분석 -한국형 지역혁신체제 구축의 시사점 -)

  • Jeong Jun-Ho;Kim Sun-Bae
    • Journal of the Economic Geographical Society of Korea / v.8 no.1 / pp.17-29 / 2005
  • This study analyzes the spatial structure of industrial agglomerations in Korea using sophisticated spatial econometric techniques. First, industrial agglomerations in Korea show a multi-polar spatial pattern: major industries form agglomerations in the Seoul Metropolitan Area, part of the Chungcheong Area, and the Dongnam Area. Second, because some industrial agglomerations extend beyond region-based administrative jurisdictions, the effects of agglomeration appear to operate across those jurisdictions. Finally, industrial agglomerations appear generally to have been produced by spatial divisions of labor in which the functions of conception and execution are separated from each other. These results suggest that, in designing regional innovation systems, the spatial coverage should be an extended region comprising a few adjacent provinces, and that networked clusters should be formed to fully capitalize on the spatial spillovers of agglomerations.

Hierarchical Clustering-Based Cloaking Algorithm for Location-Based Services (위치 기반 서비스를 위한 계층 클러스터 기반 Cloaking 알고리즘)

  • Lee, Jae-Heung
    • The Journal of the Korea institute of electronic communication sciences / v.8 no.8 / pp.1155-1160 / 2013
  • The rapid growth of smartphones has made location-based services (LBSs) widely available. However, LBSs can raise privacy issues, because they can allow adversaries to violate users' location privacy. There has been a considerable amount of research on preserving user location privacy, and most of these studies try to do so by achieving what is known as location K-anonymity. In this paper, we propose a hierarchical clustering-based spatial cloaking algorithm for LBSs. The proposed algorithm constructs a tree using a modified version of agglomerative hierarchical clustering. The experimental results show that, in terms of ASR size, the proposed algorithm is better than Hilbert Cloak and comparable to RC-AR (the R-tree Cloak implementation of Reciprocal with an Asymmetric R-tree split). In terms of ASR generation time, the proposed algorithm performs much better than RC-AR and similarly to Hilbert Cloak.
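
To make the link between agglomerative clustering and location K-anonymity concrete, here is a hypothetical, simplified sketch. It is not the paper's modified algorithm, which builds a tree: it repeatedly merges the smallest cluster that still has fewer than `k` users with the neighbouring cluster whose merged bounding rectangle has the smallest area, and then reports each user's cluster rectangle as that user's cloaked region. The merge criterion, the function names, and the toy coordinates are all assumptions made for illustration.

```python
def bbox(points):
    """Axis-aligned bounding rectangle (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def cloak(locations, k):
    """Hypothetical agglomerative cloaking: every user's rectangle contains at least k users."""
    if not 1 <= k <= len(locations):
        raise ValueError("need 1 <= k <= number of users")
    clusters = [[i] for i in range(len(locations))]

    def merged_area(c1, c2):
        return area(bbox([locations[i] for i in c1 + c2]))

    while True:
        small = [i for i, c in enumerate(clusters) if len(c) < k]
        if not small:
            break  # every cluster now satisfies K-anonymity
        ci = min(small, key=lambda i: len(clusters[i]))
        # Merge with the cluster that yields the smallest combined rectangle.
        oi = min((j for j in range(len(clusters)) if j != ci),
                 key=lambda j: merged_area(clusters[ci], clusters[j]))
        merged = clusters[ci] + clusters[oi]
        clusters = [c for j, c in enumerate(clusters) if j not in (ci, oi)] + [merged]
    regions = {}
    for c in clusters:
        box = bbox([locations[i] for i in c])
        for i in c:
            regions[i] = box
    return [regions[i] for i in range(len(locations))]

users = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(cloak(users, 3))
# [(0, 0, 1, 1), (0, 0, 1, 1), (0, 0, 1, 1), (10, 10, 11, 11), (10, 10, 11, 11), (10, 10, 11, 11)]
```

Preferring the merge with the smallest rectangle is one way to keep the cloaked regions small, which mirrors the ASR-size criterion the abstract uses for evaluation.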

Selection and Classification of Bacterial Strains Using Standardization and Cluster Analysis

  • Lee, Sang Moo;Kim, Kyoung Hoon;Kim, Eun Joong
    • Journal of Animal Science and Technology / v.54 no.6 / pp.463-469 / 2012
  • This study applied standardization and cluster analysis to the selection and classification of beneficial bacteria. A synthetic data set of 100 individual variables, each with three characteristics, was created for analysis. The three characteristics assigned to each variable were given different numeric scales, means, and standard deviations. The variables represented randomly chosen bacterial isolates, and the three characteristics were fermentation products: cell yield, antioxidant activity of the culture, and enzyme production. A standardization method based on the standard normal distribution equation was applied to the fermentation yields of each isolate to account for their different numeric scales and deviations. After this transformation, the data set was analyzed by cluster analysis in the statistical computing program R, using the Manhattan method to construct the dissimilarity matrix and the complete linkage technique, an agglomerative method of hierarchical cluster analysis. The 100 isolates were classified into groups A, B, and C. Comparing the groups, all characteristics in groups A and C were higher than those in group B. Isolates with higher cell yield were classified as group A, whereas isolates with high antioxidant activity and enzyme production were assigned to group C. The results of the cluster analysis can be useful for classifying numerous isolates and preparing an isolation pool using numerical or statistical tools. The present study suggests that a simple technique using the freely downloadable statistical computing program R can be applied to screen and select beneficial microbes.
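
The pipeline described above (z-score standardization, a Manhattan dissimilarity matrix, complete-linkage agglomeration, and a cut into three groups) corresponds in R to `cutree(hclust(dist(scale(x), method = "manhattan"), method = "complete"), k = 3)`. The sketch below reproduces the same steps in dependency-free Python. It is an illustration, not the authors' script, and the six isolates with values on deliberately different scales are hypothetical.

```python
import statistics
from itertools import combinations

def standardize(rows):
    """Column-wise z-scores using the sample standard deviation, as R's scale() does."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def complete_linkage(rows, n_groups):
    """Agglomerate until n_groups remain; cluster distance = largest pairwise distance."""
    clusters = [[i] for i in range(len(rows))]

    def link(c1, c2):
        return max(manhattan(rows[i], rows[j]) for i in c1 for j in c2)

    while len(clusters) > n_groups:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical isolates: [cell yield, antioxidant activity, enzyme production]
isolates = [
    [5.0, 20, 100], [5.2, 22, 110],   # high cell yield
    [1.0, 21, 105], [1.1, 80, 400],
    [1.2, 82, 410], [0.9, 19, 102],
]
print(complete_linkage(standardize(isolates), 3))  # [[0, 1], [2, 5], [3, 4]]
```

Without standardization, the enzyme column (hundreds of units) would dominate the Manhattan distance and the cell-yield differences would barely register, which is the scaling problem the standardization step addresses.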

Underdetermined Blind Source Separation from Time-delayed Mixtures Based on Prior Information Exploitation

  • Zhang, Liangjun;Yang, Jie;Guo, Zhiqiang;Zhou, Yanwei
    • Journal of Electrical Engineering and Technology / v.10 no.5 / pp.2179-2188 / 2015
  • Recently, much research has been devoted to the challenging problem of Blind Source Separation (BSS) in the underdetermined case, and the widely used "two-step" method first estimates the mixing matrix and then extracts the sources. To estimate the mixing matrix, conventional algorithms such as Single-Source-Point (SSP) detection exploit only the sparsity of the original signals. This paper proposes a new underdetermined mixing-matrix estimation method for time-delayed mixtures based on exploiting receiver prior information. The prior information is extracted from the specific structure of the complex-valued mixing matrix and is used to derive a special criterion for determining the SSPs. After the SSPs are selected, Agglomerative Hierarchical Clustering (AHC) is used to automatically cluster and suppress them and to estimate all the elements of the mixing matrix. Finally, a convex-model-based subspace method is applied for signal separation. Simulation results show that the proposed algorithm estimates the mixing matrix and extracts the original source signals with higher accuracy, especially in low-SNR environments, and does not need to know the number of sources beforehand, which makes it more reliable in real non-cooperative environments.
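
One reason AHC suits this step is that a distance threshold, rather than a preset cluster count, decides when merging stops, so the number of surviving clusters (and thus of mixing-matrix columns) falls out of the data. The sketch below illustrates that idea under heavy simplifying assumptions: real-valued, instantaneous two-sensor mixtures instead of the paper's complex-valued time-delayed model, average linkage on the angle between unit vectors, a hypothetical angular `threshold`, and a hypothetical `min_size` rule that drops small clusters as spurious points. It is not the paper's criterion.

```python
import math
from itertools import combinations

def unit(v):
    """Normalize to unit length and fix the sign so the first component is non-negative."""
    n = math.hypot(*v)
    s = 1.0 if v[0] >= 0 else -1.0
    return [s * x / n for x in v]

def angle(a, b):
    c = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error

def estimate_columns(ssp_vectors, threshold, min_size):
    """Average-linkage AHC on SSP directions; each sufficiently large cluster gives one column."""
    vecs = [unit(v) for v in ssp_vectors]
    clusters = [[i] for i in range(len(vecs))]

    def link(c1, c2):
        return sum(angle(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        scored = ((ab, link(clusters[ab[0]], clusters[ab[1]]))
                  for ab in combinations(range(len(clusters)), 2))
        (a, b), d = min(scored, key=lambda t: t[1])
        if d > threshold:
            break  # remaining clusters are too far apart to be the same column
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    columns = []
    for c in clusters:
        if len(c) >= min_size:  # drop sparse clusters as likely spurious points
            mean = [sum(vecs[i][k] for i in c) / len(c) for k in range(len(vecs[0]))]
            columns.append(unit(mean))
    return columns

def direction(deg, amplitude=1.0):
    r = math.radians(deg)
    return [amplitude * math.cos(r), amplitude * math.sin(r)]

ssps = [direction(d, a) for d, a in [(9, 2.0), (10, 0.5), (11, 1.0),
                                      (44, 1.5), (45, 0.8), (46, 2.2),
                                      (79, 1.0), (80, 3.0), (81, 0.7),
                                      (-40, 1.0)]]  # the last point is an isolated outlier
cols = estimate_columns(ssps, threshold=0.1, min_size=2)
print(len(cols))  # 3
print(sorted(round(math.degrees(math.atan2(c[1], c[0]))) for c in cols))  # [10, 45, 80]
```

The count of three sources was never passed in; it emerged from the threshold and the outlier rule.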

Property-based Hierarchical Clustering of Peers using Mobile Agent for Unstructured P2P Systems (비구조화 P2P 시스템에서 이동에이전트를 이용한 Peer의 속성기반 계층적 클러스터링)

  • Salvo, Michael Angel G.;Mateo, Romeo Mark A.;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.10 no.4 / pp.189-198 / 2009
  • Unstructured peer-to-peer systems are the most widely used P2P systems on today's Internet. However, file placement in these systems is random, and no correlation exists between peers and their contents, so there is no guarantee that flooding queries will find the desired data. In this paper, we propose clustering the nodes of unstructured P2P systems with the agglomerative hierarchical clustering algorithm to improve the search method. We compared the time required to cluster the nodes with our proposed algorithm and with the k-means clustering algorithm. We also simulated the delay in locating data in a network topology and recorded the system overhead using our proposed algorithm, using k-means clustering, and using no clustering. Simulation results show that our proposed algorithm has a shorter delay than the other methods and also reduces resource overhead.

A Market Segmentation Scheme Based on Customer Information and QAP Correlation between Product Networks (고객정보와 상품네트워크 유사도를 이용한 시장세분화 기법)

  • Jeong, Seok-Bong;Shin, Yong Ho;Koo, Seo Ryong;Yoon, Hyoup-Sang
    • Journal of the Korea Society for Simulation / v.24 no.4 / pp.97-106 / 2015
  • Recently, hybrid market segmentation techniques, which segment a market using both general variables and transaction-based variables, have been widely adopted. However, although their methodology and concept are easy to apply, these techniques can produce incorrect segmentation results. In this paper, we propose a novel scheme that overcomes this limitation of hybrid techniques and takes advantage of product information obtained from customers' transaction data. In this scheme, we first divide the whole market into several unit segments based on the general variables and then agglomerate the unit segments whose product networks have higher QAP correlations. Because each product network represents the purchasing patterns of its corresponding segment, the QAP correlation between the product networks of two segments is a good measure of their similarity. A case study was conducted to validate the proposed scheme, and the results show that it works effectively for Internet shopping malls.
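
The agglomeration step can be sketched as follows, assuming each segment's product network is a square matrix of co-purchase counts. The QAP correlation statistic is the Pearson correlation between the off-diagonal cells of two such matrices; a full QAP test would also assess significance by jointly permuting rows and columns, which this sketch omits. The stopping `threshold`, the rule that a merged segment's network is the element-wise sum of its parts, and the segment names and counts are hypothetical choices for illustration, not details from the paper.

```python
import math
from itertools import combinations

def offdiag(m):
    """Flatten the off-diagonal cells of a square matrix (self-ties are ignored)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a constant network; treat as unrelated
    return sxy / math.sqrt(sxx * syy)

def merge_segments(networks, threshold):
    """Repeatedly merge the pair of segments whose product networks correlate most strongly."""
    segments = [([name], m) for name, m in networks.items()]
    while len(segments) > 1:
        scored = ((ab, pearson(offdiag(segments[ab[0]][1]), offdiag(segments[ab[1]][1])))
                  for ab in combinations(range(len(segments)), 2))
        (a, b), r = max(scored, key=lambda t: t[1])
        if r < threshold:
            break  # no remaining pair is similar enough
        names = segments[a][0] + segments[b][0]
        merged = [[x + y for x, y in zip(row_a, row_b)]
                  for row_a, row_b in zip(segments[a][1], segments[b][1])]
        segments[a] = (names, merged)
        del segments[b]
    return [names for names, _ in segments]

# Hypothetical co-purchase counts among three products for three unit segments.
networks = {
    "M20s": [[0, 5, 1], [5, 0, 1], [1, 1, 0]],
    "F20s": [[0, 6, 1], [6, 0, 2], [1, 2, 0]],
    "M50s": [[0, 1, 5], [1, 0, 4], [5, 4, 0]],
}
print(merge_segments(networks, threshold=0.5))  # [['M20s', 'F20s'], ['M50s']]
```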

An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

  • Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
    • KIISE Transactions on Computing Practices / v.21 no.7 / pp.494-499 / 2015
  • As data volumes grow, identifying the properties of data by analyzing big data becomes increasingly important. In this paper, we propose an efficient k-Means-based clustering technique, called MCSK-Means (Multi centroid set k-Means), using the distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that clustering accuracy depends on the randomly created initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependence on initial centroids by using sets consisting of k centroids. In addition, we apply agglomerative hierarchical clustering to create k centroids from the centroids in the m centroid sets produced by the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
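
The core of the method as the abstract describes it (several k-means runs producing m centroid sets, then agglomerative clustering of the pooled centroids down to k) can be sketched serially as below. This omits MapReduce entirely, and the choice of average linkage, taking the mean of each final cluster as the centroid, and the seeded `random.Random` initialization are assumptions, since the abstract does not specify them. The toy 1-D data is chosen so that every k-means run converges to the same answer, which keeps the printed output predictable.

```python
import math
import random
from itertools import combinations

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means from k randomly sampled starting points."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return centroids

def agglomerate(candidates, k):
    """Average-linkage HAC over the pooled candidate centroids, cut at k clusters."""
    clusters = [[c] for c in candidates]

    def link(a, b):
        return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [tuple(sum(x) / len(c) for x in zip(*c)) for c in clusters]

def mcsk_means(points, k, m, seed=0):
    rng = random.Random(seed)
    # m independent runs give m centroid sets of size k (run serially here, not in MapReduce).
    candidates = [c for _ in range(m) for c in kmeans(points, k, rng)]
    return agglomerate(candidates, k)

points = [(0.0,), (1.0,), (2.0,), (3.0,), (100.0,), (101.0,), (102.0,), (103.0,)]
print(sorted(mcsk_means(points, k=2, m=5)))  # [(1.5,), (101.5,)]
```

On harder data, individual runs can land in different local optima. Pooling their centroids and agglomerating them is what reduces the influence of any single unlucky initialization.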

Active Learning based on Hierarchical Clustering (계층적 군집화를 이용한 능동적 학습)

  • Woo, Hoyoung;Park, Cheong Hee
    • KIPS Transactions on Software and Data Engineering / v.2 no.10 / pp.705-712 / 2013
  • Active learning aims to improve the performance of a classification model by repeatedly selecting the most helpful unlabeled data and adding it to the training set after labeling by an expert. In this paper, we propose an active learning method based on hierarchical agglomerative clustering using Ward's linkage. The proposed method actively constructs a training set that includes at least one sample from each cluster and, by expanding the existing training set, also reflects the overall data distribution. While most existing active learning methods assume that an initial training set is given, the proposed method is applicable whether or not an initial training set is given. Experimental results show the superiority of the proposed method.
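
The "at least one sample from each cluster" idea can be sketched with Ward's criterion, where the cost of merging two clusters is the increase in within-cluster sum of squares, |A||B|/(|A|+|B|) times the squared distance between their means. In the sketch below, each cluster that contains no labeled sample contributes one query: the member closest to the cluster mean. Choosing the member nearest the mean is an assumption, as are the function names and toy points. The optional `labeled` argument reflects the abstract's point that the method works with or without an initial training set.

```python
import math
from itertools import combinations

def centroid(points):
    return [sum(x) / len(points) for x in zip(*points)]

def ward_cost(c1, c2, data):
    """Increase in within-cluster sum of squares caused by merging c1 and c2 (Ward's criterion)."""
    m1 = centroid([data[i] for i in c1])
    m2 = centroid([data[i] for i in c2])
    return len(c1) * len(c2) / (len(c1) + len(c2)) * math.dist(m1, m2) ** 2

def ward_clusters(data, n_clusters):
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: ward_cost(clusters[ab[0]], clusters[ab[1]], data))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def select_queries(data, n_clusters, labeled=()):
    """Pick one sample to label from every cluster that has no labeled sample yet."""
    labeled = set(labeled)
    queries = []
    for c in ward_clusters(data, n_clusters):
        if labeled.intersection(c):
            continue  # cluster already represented in the training set
        center = centroid([data[i] for i in c])
        queries.append(min(c, key=lambda i: math.dist(data[i], center)))
    return queries

data = [(0, 0), (0, 2), (1, 1), (10, 0), (10, 2), (11, 1), (5, 10), (5, 12), (6, 11)]
print(sorted(select_queries(data, 3)))               # [2, 5, 8]
print(sorted(select_queries(data, 3, labeled=[0])))  # [5, 8]
```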