• Title/Summary/Keyword: locality-based similarity

Locality-Sensitive Hashing Techniques for Nearest Neighbor Search

  • Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.4 / pp.300-307 / 2012
  • When data volumes grow large, even simple tasks can become a significant concern. Nearest neighbor search is one such task: it finds the k data points in a data set that are nearest to each query. Locality-sensitive hashing techniques have been developed for approximate but fast nearest neighbor search. This paper introduces the notion of locality-sensitive hashing and surveys locality-sensitive hashing techniques. It categorizes them by several criteria, presents their characteristics, and compares their performance.
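
As a rough illustration of the approximate nearest neighbor search described in this abstract, the following minimal sketch uses random-hyperplane hashing, one of several LSH families. The function names, the number of bits, and the single-table design are assumptions made for illustration and are not taken from the surveyed paper.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(points, planes):
    """Group point indices by signature; nearby points tend to share a bucket."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(points):
        buckets[signature(vec, planes)].append(idx)
    return buckets


def query(q, points, planes, buckets, k=1):
    """Return up to k candidate indices from q's bucket, nearest first.

    Only one bucket is scanned, so true neighbors that hashed elsewhere
    are missed; that is the approximation being traded for speed.
    """
    candidates = buckets.get(signature(q, planes), [])

    def sq_dist(idx):
        return sum((a - b) ** 2 for a, b in zip(points[idx], q))

    return sorted(candidates, key=sq_dist)[:k]
```

A practical index would usually build several such tables with independent hyperplanes and merge their candidates, which raises recall at the cost of memory.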

A Dynamic Locality Sensitive Hashing Algorithm for Efficient Security Applications

  • Mohammad Y. Khanafseh;Ola M. Surakhi
    • International Journal of Computer Science & Network Security / v.24 no.5 / pp.79-88 / 2024
  • The information retrieval domain deals with the retrieval of unstructured data such as text documents. Document search is a main component of modern information retrieval systems. Locality Sensitive Hashing (LSH) is one of the most popular methods for searching documents in a high-dimensional space. The main benefit of LSH is its theoretical guarantee of query accuracy in a multi-dimensional space. Further enhancement of LSH can be achieved by adding a bit to its steps. In this paper, a new Dynamic Locality Sensitive Hashing (DLSH) algorithm is proposed as an improved version of the LSH algorithm. It relies on hierarchical selection of the LSH parameters (number of bands, number of shingles, and number of permutation lists), based on the similarity achieved by the algorithm, to optimize search accuracy and increase its score. The technique was applied to several tampered file structures, and its performance was evaluated. In some circumstances, the matching accuracy of DLSH exceeds 95% when the optimal parameter values are selected for the number of bands, the number of shingles, and the number of permutation lists. These results make the DLSH algorithm suitable for many critical applications that depend on accurate search, such as forensics technology.
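
To make the three parameters named in this abstract concrete (shingles, permutation lists, bands), here is a minimal MinHash-with-banding sketch. It is not the DLSH algorithm itself: the parameter selection step is not reproduced, seeded hashes stand in for explicit permutation lists, and all function names and default values are assumptions.

```python
import hashlib


def shingles(text, k=3):
    """Set of overlapping character k-grams (empty if text is shorter than k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _seeded_hash(seed, shingle):
    """64-bit hash of a shingle; each seed plays the role of one permutation."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(shingle_set, num_perm=64):
    """Signature holding the minimum hash per seed; needs a non-empty set."""
    if not shingle_set:
        raise ValueError("cannot MinHash an empty shingle set")
    return [min(_seeded_hash(seed, s) for s in shingle_set)
            for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of agreeing positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(sig, bands):
    """Split the signature into bands; leftover trailing rows are dropped."""
    rows = len(sig) // bands
    if rows == 0:
        raise ValueError("more bands than signature entries")
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]


def candidate_pair(sig_a, sig_b, bands):
    """Two documents become candidates if at least one whole band matches."""
    return any(x == y for x, y in zip(band_keys(sig_a, bands),
                                      band_keys(sig_b, bands)))
```

More bands with fewer rows each make a pair more likely to become a candidate, which is why the band count is a natural parameter to tune against the observed similarity.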

Document Summarization using Topic Phrase Extraction and Query-based Summarization (주제어구 추출과 질의어 기반 요약을 이용한 문서 요약)

  • 한광록;오삼권;임기욱
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.488-497 / 2004
  • This paper describes a hybrid document summarization method that combines indicative summarization and query-based summarization. Learning models are built from training documents in order to extract topic phrases. We use Naive Bayesian, Decision Tree, and Support Vector Machine as the machine learning algorithms. Based on these models, the system automatically extracts topic phrases from a new document and outputs a summary of the document using query-based summarization, which treats the extracted topic phrases as queries and calculates the locality-based similarity of each topic phrase. We examine how the topic phrases affect the summarization and how many phrases are appropriate for summarization. We then evaluate the extracted summary by comparing it with a manual summary, and we also compare our summarization system with the summarization method of MS-Word.

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.681-688 / 2015
  • The k nearest neighbor (k-NN) graph construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute-force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large-scale data sets. Thus MapReduce, a (key, value)-based distributed framework, is increasingly used with Locality Sensitive Hashing, which is efficient for high-dimensional, sparse data. Following a two-stage strategy, we use the locality sensitive hashing technique to divide users into small subsets, and then calculate the similarity between pairs within each small subset using a brute-force method on MapReduce. The candidate group generation stage is especially important, because brute-force calculation is performed in the following step. However, existing methods do not prevent large candidate groups. In this paper, we propose an efficient algorithm for approximate k-NN graph construction that regroups candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
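
The two-stage strategy in this abstract (group first, then brute force inside each group) can be sketched on a single machine as follows. This is an illustrative sketch only: the MapReduce distribution is omitted, the groups are assumed to come from some LSH step, and `cap_group_size` is a hypothetical placeholder that simply chunks oversized groups, since the abstract does not say how the paper's regrouping works.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cap_group_size(groups, max_size):
    """Placeholder regrouping: cut each group into chunks of at most max_size."""
    capped = []
    for members in groups:
        for start in range(0, len(members), max_size):
            capped.append(members[start:start + max_size])
    return capped


def knn_graph(vectors, groups, k):
    """Brute-force pairwise similarity inside each group, then keep top k."""
    scores = defaultdict(dict)
    for members in groups:
        for i, j in combinations(members, 2):
            sim = cosine(vectors[i], vectors[j])
            scores[i][j] = sim
            scores[j][i] = sim
    # Every node gets an entry; nodes alone in their group get an empty list.
    return {
        i: heapq.nlargest(k, scores[i], key=scores[i].get)
        for i in range(len(vectors))
    }
```

Bounding the group size matters because the pairwise step costs on the order of the square of the group size, so one oversized group can dominate the total running time.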

An advanced reversible data hiding algorithm based on the similarity between neighboring pixels

  • Jung, Soo-Mok
    • Journal of the Korea Society of Computer and Information / v.21 no.2 / pp.33-42 / 2016
  • This paper proposes an advanced reversible data hiding algorithm that takes advantage of spatial locality in images. Natural images exhibit spatial locality: the value of a pixel is similar to the values of its neighboring pixels, so the pixel value can be predicted precisely from the neighboring pixel values. When these predicted values are used, the frequency at the peak point of the difference histogram increases significantly, which increases the amount of data that can be embedded. With the proposed algorithm, a stego-image of high visual quality can be generated, and the original cover image and the embedded data can be extracted from the stego-image without distortion. The amount of data that the proposed algorithm embeds into the cover image is much larger than that of the previous algorithm. The performance of the proposed algorithm was verified by experiment. The proposed algorithm is very useful for reversible data hiding.
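
The idea of embedding at the peak of a difference histogram can be illustrated with a simplified one-dimensional sketch. It is not the paper's algorithm: it assumes the left neighbor is the only predictor, that the histogram peak sits at difference 0, and that pixel overflow at the top of the value range can be ignored. The names `embed` and `extract` are hypothetical.

```python
def embed(row, bits):
    """Hide bits in one pixel row by shifting the difference histogram.

    A difference of 0 (the assumed peak) carries one payload bit, positive
    differences are shifted up by 1 to make room, and negative differences
    are left alone. Returns the stego row and the number of bits embedded.
    """
    stego = [row[0]]  # the first pixel is kept as the reference
    used = 0
    for i in range(1, len(row)):
        diff = row[i] - row[i - 1]  # prediction error against original neighbor
        if diff == 0:
            if used < len(bits):
                bit = bits[used]
                used += 1
            else:
                bit = 0  # payload exhausted: pad with zeros
            stego.append(row[i] + bit)
        elif diff > 0:
            stego.append(row[i] + 1)
        else:
            stego.append(row[i])
    return stego, used


def extract(stego):
    """Recover the original row and every embedded bit, padding included."""
    row = [stego[0]]
    bits = []
    for i in range(1, len(stego)):
        diff = stego[i] - row[i - 1]  # uses the already restored neighbor
        if diff == 0:
            bits.append(0)
            row.append(stego[i])
        elif diff == 1:
            bits.append(1)
            row.append(stego[i] - 1)
        elif diff > 1:
            row.append(stego[i] - 1)
        else:
            row.append(stego[i])
    return row, bits
```

Each pixel changes by at most 1, and the capacity equals the number of zero differences, which is why a sharper peak in the difference histogram allows more data to be embedded.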

A Study on Malware Clustering Technique Using API Call Sequence and Locality Sensitive Hashing (API 콜 시퀀스와 Locality Sensitive Hashing을 이용한 악성코드 클러스터링 기법에 관한 연구)

  • Goh, Dong Woo;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.1 / pp.91-101 / 2017
  • API call sequence analysis is a kind of analysis that uses API call information extracted from a target program. Compared to other techniques, it has the advantage of characterizing the behavior of the target. However, existing API call sequence analysis has the problem of assigning the same characteristics to different functions during the analysis. To resolve this identification issue and improve the performance of the analysis, this study adds an API abstraction technique to the existing analysis. Similarity between target programs is then computed, and the programs are clustered into similar types, by applying LSH to the abstracted API call sequences of the analyzed targets. Thus, this study can contribute to improving the accuracy of malware analysis based on the information discovered about the types of malware identified.

Fast, Flexible Text Search Using Genomic Short-Read Mapping Model

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • ETRI Journal / v.38 no.3 / pp.518-528 / 2016
  • Searching an extensive document database for documents that are locally similar to a given query document, and subsequently detecting similar regions between such documents, is considered an essential task in the fields of information retrieval and data management. In this paper, we present a framework for such a task. The proposed framework employs the method of short-read mapping, which is used in bioinformatics to reveal similarities between genomic sequences. Documents are treated as biological objects; consequently, edit operations between locally similar documents are viewed as an evolutionary process. Accordingly, we are able to apply the method of evolution tracing to the detection of similar regions between documents. In addition, we propose heuristic methods to address issues associated with the different stages of the proposed framework, for example, a frequency-based fragment ordering method and a locality-aware interval aggregation method. Extensive experiments covering various search scenarios were conducted, and the results indicate that the proposed framework outperforms existing methods.

A Study on its Formation of the Ulsan Dutbeki Dance: Focusing on Local Features in the Ulsan District. (향토성에 의한 울산덧배기춤의 형상화에 관한 연구)

  • Choi, Heung-Kee
    • (The) Research of the performance art and culture / no.41 / pp.187-218 / 2020
  • Ulsan Dutbeki is a local dance handed down by the Ulsan people through custom. This study discusses the locality of Ulsan Dutbeki. The study proceeds as follows. It first examines Dutbeki from the perspective of Ulsan's local characteristics. First, Ulsan Dutbeki is based on the local characteristics of the southeastern coastal area of the Korean peninsula. Second, Dutbeki features the local characteristics of Ulsan as a military cultural area. Third, Dutbeki contains a local culture of Ulsan that originated from the village Dongjeol and outdoor performances. Next, the researcher examined Ulsan Dutbeki as it had been handed down through custom and approached its shape. The origins of the shape are, first, the speech tone and gestures of Ulsan people; second, folk plays related to worshiping martial arts and military training; third, the characteristics of the Dutbeki dance in coastal areas of Gyeongsangdo; and fourth, local customs displayed at the village festival of Ulsan. Ulsan belongs to the Gyeongsang culture area and has similarities with other localities. However, this study limited its comparisons with Dutbeki that originated from the local characteristics of other regions. The results of this study recognize Ulsan Dutbeki as a local dance of the Ulsan area. In other words, this study regards Dutbeki, which had been an entertaining component of traditional lifestyle, as an intangible cultural heritage and studies its form in every conceivable way from an artistic point of view.