• Title/Summary/Keyword: Hashing

Search results: 213

Locality-Sensitive Hashing Techniques for Nearest Neighbor Search

  • Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems, v.12 no.4, pp.300-307, 2012
  • When the volume of data grows large, even simple tasks can become a significant concern. Nearest neighbor search, which finds the k data points in a data set that are nearest to a query, is one such task. Locality-sensitive hashing techniques have been developed for approximate but fast nearest neighbor search. This paper introduces the notion of locality-sensitive hashing and surveys locality-sensitive hashing techniques. It categorizes them according to several criteria, presents their characteristics, and compares their performance.
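
To make the idea concrete, here is a minimal sketch of one well-known LSH family, random-hyperplane LSH for cosine similarity (Charikar's SimHash). It is not taken from the survey; the class name, the single hash table, and the parameter names `k` and `seed` are illustrative assumptions. Each random hyperplane contributes one bit, similar vectors tend to land in the same bucket, and only that bucket is ranked exactly, which is where the "approximate but fast" trade-off comes from.

```python
import math
import random
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity (a single hash table)."""

    def __init__(self, dim, k, seed=0):
        rng = random.Random(seed)
        # k random Gaussian hyperplane normals; each contributes one hash bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        self.table = defaultdict(list)  # k-bit bucket key -> point ids
        self.points = []

    def _key(self, v):
        # Bit i records which side of hyperplane i the vector lies on.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0)
                     for plane in self.planes)

    def add(self, v):
        self.points.append(v)
        self.table[self._key(v)].append(len(self.points) - 1)

    def query(self, q, n=1):
        """Return up to n ids from q's bucket, ranked by exact cosine distance.

        Only the bucket is searched, so true neighbors that hashed elsewhere
        are missed: the result is approximate. Vectors must be nonzero.
        """
        def cos_dist(i):
            v = self.points[i]
            dot = sum(a * b for a, b in zip(q, v))
            return 1.0 - dot / (math.hypot(*q) * math.hypot(*v))

        return sorted(self.table.get(self._key(q), []), key=cos_dist)[:n]
```

Practical LSH indexes usually build several such tables with independent hyperplanes and merge their buckets to raise recall.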

Locality-Sensitive Hashing for Data with Categorical and Numerical Attributes Using Dual Hashing

  • Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems, v.14 no.2, pp.98-104, 2014
  • Locality-sensitive hashing techniques have been developed to efficiently handle nearest neighbor search and similar-pair identification problems for large volumes of high-dimensional data. This study proposes a locality-sensitive hashing method for nearest neighbor search on data sets containing both numerical and categorical attributes. The proposed method uses dual hashing functions: one dedicated to numerical attributes and the other to categorical attributes. The method builds an indexing structure for each of the two hashing functions, gathers and combines the candidate sets, and examines the candidates thoroughly to determine the nearest ones. The method is evaluated on several synthetic data sets, and the results show that it improves performance for large amounts of data with both numerical and categorical attributes.
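
The dual-hashing workflow (one index per attribute type, union of candidates, exhaustive check) can be sketched as below. The abstract does not say which hash functions the paper uses, so both are assumptions: quantized Gaussian random projections (as in p-stable LSH) for the numerical part, and sampling a few attribute positions (an LSH for Hamming distance) for the categorical part. The class name, the mixed distance, and `cat_weight` are likewise hypothetical.

```python
import math
import random
from collections import defaultdict


class DualHashIndex:
    """Hypothetical dual-hash LSH index for mixed numerical/categorical data."""

    def __init__(self, num_dim, cat_dim, w=1.0, n_proj=2, n_cat=2, seed=0):
        rng = random.Random(seed)
        # Numerical hash (assumed): floor((a . x + b) / w) for Gaussian a.
        self.proj = [([rng.gauss(0.0, 1.0) for _ in range(num_dim)], rng.uniform(0.0, w))
                     for _ in range(n_proj)]
        self.w = w
        # Categorical hash (assumed): values at a few sampled attribute positions.
        self.cat_pos = rng.sample(range(cat_dim), n_cat)
        self.num_table = defaultdict(list)  # one index per hash function
        self.cat_table = defaultdict(list)
        self.items = []

    def _num_key(self, x):
        return tuple(math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
                     for a, b in self.proj)

    def _cat_key(self, c):
        return tuple(c[i] for i in self.cat_pos)

    def add(self, x, c):
        idx = len(self.items)
        self.items.append((x, c))
        self.num_table[self._num_key(x)].append(idx)
        self.cat_table[self._cat_key(c)].append(idx)

    def query(self, x, c, n=1, cat_weight=1.0):
        # Gather candidates from both indexes and combine them (set union).
        cands = set(self.num_table.get(self._num_key(x), []))
        cands |= set(self.cat_table.get(self._cat_key(c), []))

        # Examine every candidate with a mixed distance (illustrative choice):
        # Euclidean on numbers plus weighted count of categorical mismatches.
        def dist(i):
            xi, ci = self.items[i]
            return math.dist(x, xi) + cat_weight * sum(a != b for a, b in zip(c, ci))

        return sorted(cands, key=dist)[:n]
```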

Extended Interactive Hashing Protocol (확장된 Interactive Hashing 프로토콜)

  • 홍도원;장구영;류희수
    • Journal of the Korea Institute of Information Security & Cryptology, v.12 no.3, pp.95-102, 2002
  • Interactive hashing is a protocol introduced by Naor, Ostrovsky, Venkatesan, and Yung [1] (the NOVY protocol) that, for a given t-bit string, has $t - 1$ round complexity and $t^2 - 1$ bits of communication complexity. In this paper, we propose an extended interactive hashing protocol that, when m is a divisor of t, is more efficient than the NOVY protocol, with $t/m - 1$ round complexity and $t^2/m - m$ bits of communication complexity, and we prove its security.
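
A small simulation of the NOVY protocol, as it is commonly described, may help make the round and bit counts concrete. In each of the $t - 1$ rounds the receiver sends a random t-bit vector $h_i$ whose first nonzero bit is at position i, and the sender answers with $\langle h_i, x \rangle \bmod 2$; afterwards exactly two t-bit strings are consistent with the transcript, one of which is x. Counting a full t-bit vector plus a one-bit reply per round is one way to arrive at $(t - 1)(t + 1) = t^2 - 1$ bits, which is an assumption about how bits are counted. This sketches only the original protocol, not the paper's extended one, and the function names are hypothetical.

```python
import random
from itertools import product


def dot2(a, b):
    """Inner product over GF(2)."""
    return sum(x & y for x, y in zip(a, b)) % 2


def novy_rounds(x, seed=0):
    """Simulate the t - 1 rounds on the sender's t-bit string x.

    Round i: receiver sends h_i = 0^i 1 r (r random bits); this echelon
    form makes all h_i linearly independent. Sender replies <h_i, x> mod 2.
    """
    t = len(x)
    rng = random.Random(seed)
    transcript = []
    for i in range(t - 1):
        h = [0] * i + [1] + [rng.randint(0, 1) for _ in range(t - i - 1)]
        transcript.append((h, dot2(h, x)))
    return transcript


def consistent_strings(transcript, t):
    """All t-bit strings matching every reply. With t - 1 independent
    linear equations over GF(2), exactly two strings remain."""
    return [list(y) for y in product((0, 1), repeat=t)
            if all(dot2(h, y) == c for h, c in transcript)]


def communication_bits(t):
    # t - 1 rounds, each a t-bit vector plus a 1-bit reply: t^2 - 1 bits.
    return (t - 1) * (t + 1)
```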

An Implementation and Evaluation of Large-Scale Dynamic Hashing Directories (대규모 동적 해싱 디렉토리의 구현 및 평가)

  • Kim, Shin-Woo;Lee, Yong-Kyu
    • Journal of Korea Multimedia Society, v.8 no.7, pp.924-942, 2005
  • Recently, large-scale directories have been developed for LINUX cluster file systems to store and retrieve huge amounts of data. One of them, the GFS directory, has attracted much attention because it is based on extendible hashing, a dynamic hashing technique, to support fast access to files. One distinctive feature of the GFS directory is its flat structure, in which all leaf nodes are located at the same level of the tree. However, one disadvantage of the inode structure is that the height of the inode tree must be increased to keep the tree flat when a byte is inserted into a full tree that cannot accommodate it. Thus, adding one byte makes the height of the whole inode tree grow, and each data block of the new tree needs one more link access than in the old tree. Another dynamic hashing technique that can be used for directories is linear hashing, and a couple of studies have shown that it can achieve better file access times than extendible hashing. In this research, we designed and implemented an extendible hashing directory and a linear hashing directory for large-scale LINUX cluster file systems and compared their performance. We used the semi-flat structure, which is known to have better access performance than the flat structure. According to the performance evaluation, the linear hashing directory performs slightly better at file inserts and accesses in most cases, whereas the extendible hashing directory is somewhat better at space utilization.
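
As a rough illustration of the linear hashing side of the comparison, here is a minimal in-memory linear hashing table: buckets split one at a time in a fixed order marked by a split pointer, so no directory is needed. The class name, the load-factor trigger `max_load`, and splitting on load factor rather than on bucket overflow are illustrative assumptions; the sketch says nothing about how the authors' on-disk directories are organized.

```python
class LinearHashTable:
    """Minimal linear hashing sketch (split pointer, no directory)."""

    def __init__(self, initial_buckets=2, max_load=2.0):
        self.n0 = initial_buckets
        self.level = 0   # completed rounds of doubling
        self.split = 0   # next bucket to split in this round
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0
        self.max_load = max_load

    def _addr(self, key):
        h = hash(key)
        b = h % (self.n0 * 2 ** self.level)
        if b < self.split:  # bucket already split this round: use next function
            b = h % (self.n0 * 2 ** (self.level + 1))
        return b

    def insert(self, key, value):
        bucket = self.buckets[self._addr(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split()

    def _split(self):
        # Split the bucket at the split pointer; its image bucket is appended
        # at index split + n0 * 2**level, which equals len(self.buckets).
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])
        modulus = self.n0 * 2 ** (self.level + 1)
        for k, v in old:
            self.buckets[hash(k) % modulus].append((k, v))
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:  # round finished
            self.level += 1
            self.split = 0

    def get(self, key, default=None):
        for k, v in self.buckets[self._addr(key)]:
            if k == key:
                return v
        return default
```

Growth happens one bucket at a time, which avoids the directory doubling that extendible hashing performs.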

A Study of Index Method Based on Main Memory (메모리 기반의 인덱스 기법에 관한 연구)

  • Hong, G.C.;Moon, B.J.
    • Electronics and Telecommunications Trends, v.16 no.6 s.72, pp.29-40, 2001
  • With the goal of improving the performance of disk-based information retrieval systems, this article compares and evaluates main-memory-based indexing techniques suited to main-memory-resident information retrieval systems. Indexes can be broadly divided into two types depending on whether the order of their keys is maintained: tree-based indexes, in which keys are kept in a fixed order, and hash-based indexes, in which keys are stored in no particular order. Tree-based indexes are useful for operations over a given range, while hash-based indexes provide fast data access by a specific key. Tree-based indexes include the AVL tree, B+ tree, and T tree; hash-based indexes include Chained Bucket Hashing (CBH), Extendible Hashing (EH), Linear Hashing (LH), Modified Linear Hashing, Multi-directory Hashing, and Extendible Chained Bucket Hashing (ECBH).
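
The trade-off between the two index families can be sketched in a few lines: a chained bucket hash (the first hash-based index listed) serves point lookups, while an order-preserving index serves range queries. A sorted list maintained with `bisect` stands in for an AVL, B+, or T tree, which is a simplification; both class names are hypothetical.

```python
import bisect


class ChainedBucketHash:
    """Chained bucket hashing: fixed array of buckets, each a chain (list)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def put(self, key, value):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def get(self, key):
        # Point lookup touches only one chain.
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None


class SortedIndex:
    """Order-preserving index (a sorted list stands in for a tree)."""

    def __init__(self):
        self.keys, self.values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def range(self, lo, hi):
        # Keys in [lo, hi] are contiguous because order is kept; a hash
        # index would have to scan every bucket to answer this.
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return list(zip(self.keys[i:j], self.values[i:j]))
```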

A Novel Perceptual Hashing for Color Images Using a Full Quaternion Representation

  • Xing, Xiaomei;Zhu, Yuesheng;Mo, Zhiwei;Sun, Ziqiang;Liu, Zhen
    • KSII Transactions on Internet and Information Systems (TIIS), v.9 no.12, pp.5058-5072, 2015
  • Quaternions have been commonly employed in color image processing, but when the existing pure quaternion representation of color images is used in perceptual hashing, it degrades robustness because it is sensitive to image manipulations. To improve robustness in color image perceptual hashing, this paper proposes a full quaternion representation of color images that introduces local image luminance variances. Based on this new representation, a novel Full Quaternion Discrete Cosine Transform (FQDCT)-based hashing is proposed, in which the Quaternion Discrete Cosine Transform (QDCT) is applied to pseudo-randomly selected regions of the full quaternion image to construct two feature matrices. A binary hash value is then generated from these two matrices. Experimental results validate the robustness improvement brought by the proposed full quaternion representation and show that the proposed FQDCT-based hashing outperforms other notable quaternion-based hashing schemes in terms of robustness and discriminability.
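
The difference between the two representations can be sketched per pixel. A color pixel is commonly written as the pure quaternion $0 + R\,i + G\,j + B\,k$; the full representation described here fills the real part with a local luminance variance. The BT.601 luminance weights, the 3×3 window clipped at the image borders, and the function names are assumptions, and the QDCT and hash-generation steps are not shown. A normalized Hamming distance is included because robustness and discriminability of binary hashes are typically judged with such a distance.

```python
def luminance(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b  # BT.601 weights (assumed)


def full_quaternion_image(img, radius=1):
    """img: H x W list of (R, G, B). Returns H x W tuples (a, R, G, B) meaning
    a + R*i + G*j + B*k, where a is the luminance variance over a
    (2*radius+1)^2 window clipped at the borders. A pure quaternion
    representation would use a = 0 instead."""
    h, w = len(img), len(img[0])
    lum = [[luminance(px) for px in row] for row in img]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            win = [lum[yy][xx]
                   for yy in range(max(0, y - radius), min(h, y + radius + 1))
                   for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(win) / len(win)
            var = sum((v - mean) ** 2 for v in win) / len(win)
            r, g, b = img[y][x]
            row.append((var, r, g, b))
        out.append(row)
    return out


def hamming_fraction(h1, h2):
    # Small for a manipulated copy (robustness), large for a different
    # image (discriminability).
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)
```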

A Hashing Method Using PCA-based Clustering (PCA 기반 군집화를 이용한 해슁 기법)

  • Park, Cheong Hee
    • KIPS Transactions on Software and Data Engineering, v.3 no.6, pp.215-218, 2014
  • Hashing-based methods for approximate nearest neighbor (ANN) search map data points to k-bit binary codes and search for nearest neighbors in the binary embedding space. In this paper, we present a hashing method using a PCA-based clustering method, Principal Direction Divisive Partitioning (PDDP). PDDP is a clustering method that repeatedly partitions the cluster with the largest variance into two clusters using its first principal direction. The proposed hashing method uses the first principal direction as a projection direction for binary coding. Experimental results demonstrate that the proposed method is competitive with other hashing methods.
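
A PDDP-based coding scheme can be sketched under assumptions the abstract does not settle: each of the k bits comes from one PDDP split, and a point's bit records which side of that split's hyperplane (through the cluster mean, normal to the first principal direction) it falls on. Power iteration replaces a full PCA to keep the example dependency-free; all function names are hypothetical.

```python
import math
import random


def _mean(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def _project(x, mean, d):
    return sum((xj - mj) * dj for xj, mj, dj in zip(x, mean, d))


def _scatter(points):
    m = _mean(points)
    return sum((xj - mj) ** 2 for p in points for xj, mj in zip(p, m))


def principal_direction(points, rng, iters=100):
    """Mean and first principal direction of `points` via power iteration."""
    mean = _mean(points)
    centered = [[xj - mj for xj, mj in zip(p, mean)] for p in points]
    v = [rng.gauss(0.0, 1.0) for _ in mean]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # w = X^T (X v) is proportional to covariance @ v.
        xv = [sum(c * vj for c, vj in zip(row, v)) for row in centered]
        w = [sum(xv[i] * centered[i][j] for i in range(len(centered)))
             for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points identical: any direction will do
            break
        v = [x / norm for x in w]
    return mean, v


def pddp_hash_functions(points, k, seed=0):
    """Run k PDDP splits; each split's (mean, direction) yields one bit."""
    rng = random.Random(seed)
    clusters, funcs = [list(points)], []
    for _ in range(k):
        # PDDP step: split the cluster with the largest scatter.
        cl = clusters.pop(max(range(len(clusters)), key=lambda i: _scatter(clusters[i])))
        mean, d = principal_direction(cl, rng)
        funcs.append((mean, d))
        left = [p for p in cl if _project(p, mean, d) < 0]
        right = [p for p in cl if _project(p, mean, d) >= 0]
        clusters.extend(c for c in (left, right) if c)
    return funcs


def encode(x, funcs):
    """k-bit binary code: the side of each split hyperplane x falls on."""
    return [int(_project(x, m, d) >= 0) for m, d in funcs]
```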

Dynamic Hashing Algorithm for Retrieval Using Hangeul Name on Navigation System

  • Lee, Jung-Hwa
    • Journal of Information and Communication Convergence Engineering, v.9 no.3, pp.282-286, 2011
  • Name retrieval functions have recently become widely used in navigation systems. In this paper, we propose a new dynamic hashing algorithm for name retrieval in such systems. The proposed algorithm, which constructs an index using the variance information of characters, outperforms existing methods in terms of storage capacity and retrieval speed. It can be useful not only for navigation systems but also for other systems with limited resources.
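
The abstract does not detail how character variance information builds the index, so the following is not the paper's algorithm. It shows one common building block for Hangeul name search: decomposing a precomposed syllable (U+AC00 to U+D7A3) arithmetically to obtain its initial consonant, then using the initial-consonant string as a hash key so that a name can be found from abbreviated input. `initials` and `NameIndex` are hypothetical names.

```python
from collections import defaultdict

# The 19 initial consonants (choseong) in Unicode syllable order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"


def initials(name):
    """Replace each precomposed Hangeul syllable by its initial consonant."""
    out = []
    for ch in name:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:                   # 19 * 21 * 28 syllables
            out.append(CHOSEONG[code // 588])   # 588 = 21 vowels * 28 finals
        else:
            out.append(ch)                      # pass other characters through
    return "".join(out)


class NameIndex:
    """Hash index keyed by the initial-consonant string of each name."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, name):
        self.buckets[initials(name)].append(name)

    def search_initials(self, query):
        return self.buckets.get(query, [])
```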

FLASH : A Main Memory Storage System

  • Kim, Pyung-Chul;Jung, Byung-Gwan;Kim, Moon-Ja
    • The Journal of Information Technology and Database, v.1 no.2, pp.103-125, 1994
  • In this paper, we introduce a new main memory storage system called FLASH that is designed for real-time applications. The FLASH system is characterized by the memory residency of data and by a new, fast, dynamic hashing scheme called extendible chained bucket hashing. We compared the performance of the new hashing algorithm with other well-known ones. We also carried out an experiment comparing the overall performance of the FLASH system with that of a commercial system. Both comparisons show that the new hashing scheme and the FLASH system outperform their competitors.
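
As its name suggests, extendible chained bucket hashing relates to extendible hashing, but the abstract does not describe its details, so the sketch below shows only plain extendible hashing: a directory of $2^d$ bucket pointers indexed here by the low-order d bits of the key's hash, where a full bucket splits and the directory doubles only when that bucket's local depth already equals the global depth. The class name and `capacity` parameter are illustrative.

```python
class ExtendibleHash:
    """Plain extendible hashing sketch (not FLASH's ECBH)."""

    class Bucket:
        def __init__(self, depth):
            self.depth = depth   # local depth
            self.items = {}

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0
        self.dir = [self.Bucket(0)]

    def _index(self, key):
        # Low-order global_depth bits of the hash select a directory slot.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.dir[self._index(key)].items.get(key, default)

    def put(self, key, value):
        while True:
            b = self.dir[self._index(key)]
            if key in b.items or len(b.items) < self.capacity:
                b.items[key] = value
                return
            self._split(b)  # bucket full: split and retry

    def _split(self, b):
        if b.depth == self.global_depth:
            self.dir = self.dir + self.dir  # double the directory
            self.global_depth += 1
        b.depth += 1
        new = self.Bucket(b.depth)
        bit = 1 << (b.depth - 1)
        # Slots that pointed to b and have the newly significant bit set
        # now point to the new bucket.
        for i, bucket in enumerate(self.dir):
            if bucket is b and i & bit:
                self.dir[i] = new
        # Redistribute the old bucket's items between b and new.
        old, b.items = b.items, {}
        for k, v in old.items():
            self.dir[self._index(k)].items[k] = v
```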

Robust 3D Hashing Algorithm Using Key-dependent Block Surface Coefficient (키 기반 블록 표면 계수를 이용한 강인한 3D 모델 해싱)

  • Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of the Institute of Electronics Engineers of Korea CI, v.47 no.1, pp.1-14, 2010
  • With the rapid growth of 3D content industries, 3D content-based hashing (or hash functions) is required for the authentication, trust, and retrieval of 3D content. A content hash can be viewed as a random variable that compactly represents content. However, 3D content-based hashing has been studied far less than 2D content-based hashing for images and video. This paper develops a robust 3D content-based hashing method based on key-dependent 3D surface features. The proposed hashing uses, as its 3D surface feature, a block surface coefficient computed from the shape coordinates of the 3D SSD and the curvedness, and it generates a binary hash using a permutation key and a random key. Experimental results verify that the proposed hashing is robust against geometry and topology attacks and produces a unique hash for each model and key.
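
The abstract says the binary hash is generated with a permutation key and a random key but does not give the construction. The sketch below is a generic illustration, not the paper's scheme: it permutes a real-valued feature vector (for example, per-block surface coefficients) with a key-seeded permutation, binarizes each coefficient against the median, and XORs the result with a key-seeded pseudo-random mask. The hash therefore depends on both keys, while small feature perturbations leave most bits unchanged. The function name is hypothetical.

```python
import random
import statistics


def keyed_binary_hash(features, perm_key, rand_key):
    """Hypothetical keyed binarization of a feature vector."""
    n = len(features)
    # 1. Key-dependent permutation of coefficient positions.
    order = list(range(n))
    random.Random(perm_key).shuffle(order)
    # 2. Median threshold: small perturbations rarely flip a bit.
    med = statistics.median(features)
    bits = [1 if features[i] >= med else 0 for i in order]
    # 3. XOR with a key-dependent pseudo-random bit stream.
    mask_rng = random.Random(rand_key)
    return [b ^ mask_rng.randint(0, 1) for b in bits]
```

Python's `random.Random` is not cryptographically secure; a deployed scheme would derive the permutation and mask from a keyed cryptographic generator instead.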