• Title/Summary/Keyword: Cluster-based Search

A Hashing Method Using PCA-based Clustering (PCA 기반 군집화를 이용한 해슁 기법)

  • Park, Cheong Hee
    • KIPS Transactions on Software and Data Engineering / v.3 no.6 / pp.215-218 / 2014
  • Hashing-based methods for approximate nearest neighbor (ANN) search map data points to k-bit binary codes and search for nearest neighbors in the binary embedding space. In this paper, we present a hashing method using a PCA-based clustering method, Principal Direction Divisive Partitioning (PDDP). PDDP is a clustering method which repeatedly partitions the cluster with the largest variance into two clusters by using the first principal direction. The proposed hashing method utilizes the first principal direction as a projective direction for binary coding. Experimental results demonstrate that the proposed method is competitive with other hashing methods.
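The idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the k bits are derived from the PDDP splits, so the code below assumes one bit per split: each split stores the centroid and first principal direction of the cluster it divides, and a point's bit is the side of that hyperplane it falls on. The function names and the use of total scatter as the "largest variance" criterion are also assumptions made for illustration, not the paper's stated procedure.

```python
import numpy as np

def first_principal_direction(X):
    """Return the centroid of X and the unit vector of its first principal direction."""
    mean = X.mean(axis=0)
    # The first right singular vector of the centered data is the first principal direction.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[0]

def pddp_hash_fit(X, k):
    """Run k PDDP splits; keep each split's (centroid, direction) as one hash function."""
    X = np.asarray(X, dtype=float)
    clusters = [X]
    hyperplanes = []
    for _ in range(k):
        # Assumed criterion: split the cluster with the largest total scatter.
        scatter = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        target = clusters.pop(int(np.argmax(scatter)))
        mean, direction = first_principal_direction(target)
        hyperplanes.append((mean, direction))
        # Partition by the sign of the projection onto the principal direction.
        proj = (target - mean) @ direction
        parts = [target[proj <= 0], target[proj > 0]]
        clusters.extend(p for p in parts if len(p) > 0)
    return hyperplanes

def pddp_hash_encode(X, hyperplanes):
    """Map each row of X to a binary code with one bit per stored hyperplane."""
    X = np.asarray(X, dtype=float)
    bits = [(X - mean) @ direction > 0 for mean, direction in hyperplanes]
    return np.stack(bits, axis=1).astype(np.uint8)
```

Calling `pddp_hash_encode(X, pddp_hash_fit(X, k))` yields an array of shape `(n, k)`; candidate neighbors would then be compared by Hamming distance between codes. Because the sign of a singular vector is arbitrary, which side of a split maps to 1 is not fixed.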

Optimal Number of Super-peers in Clustered P2P Networks (클러스터 P2P 네트워크에서의 최적 슈퍼피어 개수)

  • Kim Sung-Hee;Kim Ju-Gyun;Lee Sang-Kyu;Lee Jun-Soo
    • The KIPS Transactions:PartC / v.13C no.4 s.107 / pp.481-490 / 2006
  • In a super-peer based P2P network, the network is clustered and each cluster is managed by a special peer, called a super-peer, which has information about all peers in its cluster. This clustered P2P model is known to offer efficient information search and a lower traffic load. In this paper, we first estimate the message traffic cost caused by peers' query, join, and update actions within a cluster as well as between clusters. With these values, we present the optimal number of super-peers that minimizes the traffic cost for various sizes of super-peer based P2P networks.

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.407-411 / 2005
  • The KUGI (Korean UniGene Information) database contains annotation information for cDNA sequences obtained from samples of diseases prevalent in Koreans. A total of about 157,000 5'-EST high-throughput sequences collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients were clustered into about 35,000 contigs. From each cluster, a representative clone having the longest high-quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database together with their information analyzed by running BLAST against the RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine for the KUGI database with two types of user interfaces: attribute-based search and sequence similarity search. For attribute-based search we use DBMS technology, while for similarity search we use BLAST, which supports various similarity search options. The search system allows not only multiple queries but also various query types. The results are as follows: 1) information on clones and libraries, 2) accession keys, location on genome, gene ontology, and pathways to public databases, 3) links to external programs, and 4) sequence information of contigs and the 5'-end of clones. We believe that the KUGI database and search system may provide very useful information for studies elucidating the causes of diseases that are prevalent in Koreans.

Fast Multi-Resolution Exhaustive Search Algorithm Based on Clustering for Efficient Image Retrieval (효율적인 영상 검색을 위한 클러스터링 기반 고속 다 해상도 전역 탐색 기법)

  • Song, Byeong-Cheol;Kim, Myeong-Jun;Ra, Jong-Beom
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.2 / pp.117-128 / 2001
  • In order to achieve optimal retrieval, i.e., to find the best match to a query according to a certain similarity measure, the exhaustive search should be performed literally for all the images in a database. However, the straightforward exhaustive search algorithm is computationally expensive in large image databases. To reduce its heavy computational cost, this paper presents a fast exhaustive multi-resolution search algorithm based on image database clustering. Firstly, the proposed algorithm partitions the whole image data set into a pre-defined number of clusters having similar feature contents. Next, for a given query, it checks the lower bound of distances in each cluster, eliminating disqualified clusters. Then, it only examines the candidates in the remaining clusters. To alleviate unnecessary feature matching operations in the search procedure, the distance inequality property is employed based on a multi-resolution data structure. The proposed algorithm realizes a fast exhaustive multi-resolution search for either the best match or multiple best matches to the query. Using luminance histograms as a feature, we prove that the proposed algorithm guarantees optimal retrieval with high searching speed.
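The cluster-elimination step described in this abstract can be sketched as follows. The abstract does not give the exact lower bound or the multi-resolution data structure, so this sketch substitutes a plain triangle-inequality bound (distance from the query to a cluster center minus the cluster radius) and omits the multi-resolution part. The function names, the Euclidean distance, and the choice to visit clusters in order of increasing bound are assumptions made for illustration.

```python
import numpy as np

def build_clusters(features, labels):
    """Group feature vectors by cluster label; record member indices, center, and radius."""
    clusters = []
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        center = features[idx].mean(axis=0)
        # Radius: largest distance from the center to any member of the cluster.
        radius = np.linalg.norm(features[idx] - center, axis=1).max()
        clusters.append((idx, center, radius))
    return clusters

def best_match(query, features, clusters):
    """Return (index, distance) of the exact nearest feature vector to the query."""
    # Triangle inequality: every member x satisfies d(q, x) >= d(q, center) - radius.
    bounds = [max(np.linalg.norm(query - c) - r, 0.0) for _, c, r in clusters]
    best_idx, best_dist = -1, np.inf
    # Visit clusters by increasing lower bound so the best distance shrinks early.
    for ci in np.argsort(bounds):
        if bounds[ci] >= best_dist:
            break  # this cluster and all later ones cannot hold a closer match
        idx, _, _ = clusters[ci]
        dists = np.linalg.norm(features[idx] - query, axis=1)
        j = int(np.argmin(dists))
        if dists[j] < best_dist:
            best_idx, best_dist = int(idx[j]), float(dists[j])
    return best_idx, best_dist
```

Because a cluster is skipped only when its lower bound already exceeds the best distance found, the result equals that of a brute-force scan, which is the sense in which such a search remains exhaustive while examining fewer candidates.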

A CLUSTER SURVEY AROUND THE UNIDENTIFIED EGRET SOURCES

  • KAWASAKI WATARU;TOTANI TOMONORI
    • Journal of The Korean Astronomical Society / v.38 no.2 / pp.141-144 / 2005
  • Based on optical galaxy data, we executed a systematic search for galaxy clusters around the 15 steady unidentified EGRET GeV gamma-ray sources in the high Galactic-latitude sky ($|b| > 30^{\circ}$). We found a strong correlation at the 3.7$\sigma$ level between close cluster pairs (merging cluster candidates) and the unidentified EGRET sources but, in contrast, no correlation with single clusters. This result implies that merging clusters of galaxies are a possible candidate for the origin of high Galactic-latitude, steady unidentified EGRET gamma-ray sources.

The Effect of Shopping Orientations on Clothing Purchasing Behavior according to Residence (거주지별 쇼핑 성향이 의복 구매 행동에 미치는 영향)

  • Lim Kyung-Bock
    • The Research Journal of the Costume Culture / v.14 no.3 s.62 / pp.366-380 / 2006
  • The purpose of this study was to examine the effect of shopping orientations on clothing purchasing behavior according to residence. The data were obtained from questionnaires filled out by 530 females living in Seoul and Jecheon. For data analysis, factor analysis, ANOVA, t-test, Cronbach's $\alpha$, Duncan's multiple range test, and cluster analysis were used. For shopping orientation, five factors were found and labeled as hedonism, brand and store loyalty, conformity, economy, and rationality. Based on the five shopping orientation factors, women were classified into five clusters (self-centered and rational, recreational, economy and shopping low involvement, economical and conformative cluster). Each cluster showed significantly different clothing purchasing behaviors (problem recognition, information search, and evaluative criteria) and had different demographic variables (age, income, marital status, and school career). Finally, residence and shopping orientations influenced various clothing purchasing behaviors. In conclusion, residence was an important factor influencing shopping orientation and clothing purchasing behavior.

Search for galaxy clusters in SA22

  • Kim, Jae-Woo;Im, Myungshin;Hyun, Minhee
    • The Bulletin of The Korean Astronomical Society / v.37 no.2 / pp.83.1-83.1 / 2012
  • Galaxy clusters are good laboratories for testing the cosmological model as well as the evolution of galaxies in dense regions. However, the lack of wide and deep near-IR datasets has prevented the identification of galaxy clusters at z>1. Here we merge the wide, deep near-IR datasets of UKIDSS DXS (J and K bands) and IMS (J band) with the CFHT Legacy Survey (CFHTLS) ugriz catalogue to detect galaxy clusters. We identify candidate galaxy clusters at z>0.8, where the near-IR dataset plays an important role in detecting galaxies efficiently. The cluster mass is also estimated based on the cluster richness and a semi-analytical cosmological simulation.

A Geometrical Center based Two-way Search Heuristic Algorithm for Vehicle Routing Problem with Pickups and Deliveries

  • Shin, Kwang-Cheol
    • Journal of Information Processing Systems / v.5 no.4 / pp.237-242 / 2009
  • The classical vehicle routing problem (VRP) can be extended by including customers who want to send goods to the depot. This type of VRP is called the vehicle routing problem with pickups and deliveries (VRPPD). This study proposes a novel way to solve VRPPD by introducing a two-phase heuristic routing algorithm. It consists of a clustering phase, which uses the geometrical center of a cluster, and a route establishment phase, which applies a two-way search to each route after applying the TSP algorithm on each route. Experimental results show that the suggested algorithm can generate better initial solutions for more computation-intensive meta-heuristics than other existing methods such as the giant-tour-based partitioning method or the insertion-based method.

Stackelberg Game between Multi-Leader and Multi-Follower for Detecting Black Hole and Warm Hole Attacks In WSN

  • S.Suganthi;D.Usha
    • International Journal of Computer Science & Network Security / v.23 no.8 / pp.159-167 / 2023
  • Objective: To detect black hole and wormhole attacks in wireless sensor networks; to provide a solution for energy depletion and security breaches in wireless sensor networks; and to address the security problem using a strategic decision support system. Methods: The proposed Stackelberg game is used to model the competitive relations between multiple leaders and multiple followers. In this game, all cluster heads act as leaders, whereas agent nodes act as followers. The game is initially modeled as a quadratic program, and the backtracking search optimization algorithm is used to obtain the threshold value that determines the optimal strategies of both defender and attacker. Findings: The optimal payoffs of the leaders and followers are found from their utility functions. The attacks are easily detected based on a set of defined rules and the optimum results of the game. Finally, the simulations are executed in MATLAB, and the impact of detecting black hole and wormhole attacks is also presented in this paper. Novelty: The novelty of this study is to consider the Stackelberg game together with the backtracking search optimization algorithm (BSOA). BSOA is an iterative process that tries to minimize the objective function. Thus, we obtain better optimization results than earlier approaches.

Microblog User Geolocation by Extracting Local Words Based on Word Clustering and Wrapper Feature Selection

  • Tian, Hechan;Liu, Fenlin;Luo, Xiangyang;Zhang, Fan;Qiao, Yaqiong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.10 / pp.3972-3988 / 2020
  • Existing methods rely on statistical features to extract local words for microblog user geolocation. The extracted words contain many non-local words, which lowers geolocation accuracy. Considering both the statistical and semantic features of local words, this paper proposes a microblog user geolocation method that extracts local words based on word clustering and wrapper feature selection. First, ordinary words without positional indications are filtered out based on statistical features. Second, a word clustering algorithm based on word vectors is proposed. The remaining semantically similar words are clustered together based on the distance between their word vectors. Next, a wrapper feature selection algorithm based on sequential backward subset search is proposed. The cluster subset with the best geolocation effect is selected, and the words in the selected cluster subset are extracted as local words. Finally, a Naive Bayes classifier is trained on the local words to geolocate the microblog user. The proposed method is validated on two different types of microblog data, Twitter and Weibo. The results show that the proposed method outperforms two typical existing methods based on statistical features in terms of accuracy, precision, recall, and F1-score.
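The wrapper step in this abstract can be illustrated with a generic sequential backward search. This is a minimal sketch of the standard greedy technique, not the paper's exact algorithm: the function name, the stopping rule (stop when no single removal improves the score), and the `score` callback are assumptions. In the setting described above, `score` would stand for the geolocation performance of a classifier trained on the words of the given clusters.

```python
def sequential_backward_selection(items, score):
    """Greedily drop one item at a time for as long as doing so improves the score.

    items: unique, hashable identifiers (for example, word-cluster ids).
    score: callable taking a list of items and returning a number to maximize.
    """
    selected = list(items)
    best_score = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by removing exactly one remaining item.
        candidates = [
            (score([x for x in selected if x != item]), item) for item in selected
        ]
        cand_score, cand_item = max(candidates, key=lambda pair: pair[0])
        if cand_score <= best_score:
            break  # no single removal improves the score, so stop
        selected.remove(cand_item)
        best_score = cand_score
    return selected, best_score
```

Each round evaluates the wrapped model once per remaining item, so the cost grows quadratically with the number of clusters. This is one reason to select whole clusters rather than individual words.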