• Title/Summary/Keyword: Content Hashing

Image Deduplication Based on Hashing and Clustering in Cloud Storage

  • Chen, Lu;Xiang, Feng;Sun, Zhixin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1448-1463 / 2021
  • With the continuous development of cloud storage, a large amount of redundant data accumulates in cloud storage, especially multimedia data such as images and videos. Data deduplication is a data reduction technology that significantly reduces storage requirements and increases bandwidth efficiency. To ensure data security, users typically encrypt data before uploading it; however, encryption conflicts with deduplication. Existing deduplication methods for regular files cannot be applied to images, because duplicate images must be detected based on their visual content. In this paper, we propose a secure image deduplication scheme based on hashing and clustering that incorporates a novel perceptual hash algorithm based on Local Binary Pattern (LBP). In this scheme, the hash value of an image is used as its fingerprint for deduplication, and the image is transmitted in encrypted form. Images are clustered to reduce the time complexity of deduplication. The proposed scheme can ensure the security of images and improves deduplication accuracy. A comparison with other image deduplication schemes shows that our scheme performs somewhat better.
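
A minimal sketch of LBP-based perceptual hashing for duplicate detection. It assumes a plain 8-neighbor LBP operator, a 256-bin LBP histogram folded into 64 groups, and a median threshold; these parameters and the helper names `lbp_codes`, `lbp_hash` and `hamming` are hypothetical, and the paper's actual hash, its encryption step and its clustering are not reproduced here.

```python
from statistics import median

# Offsets (dy, dx) of the 8 neighbors, clockwise from the top-left one.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """8-bit LBP code of every interior pixel of a 2-D grayscale image (list of rows)."""
    codes = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBORS):
                # Each bit records whether a neighbor is at least as bright as the center.
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_hash(img):
    """Hypothetical 64-bit hash: fold the 256-bin LBP histogram into 64 groups of 4 bins
    and set bit i when group i's count exceeds the median group count."""
    groups = [0] * 64
    for code in lbp_codes(img):
        groups[code // 4] += 1
    threshold = median(groups)
    return sum(1 << i for i, count in enumerate(groups) if count > threshold)


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Because LBP compares each neighbor only with its center pixel, adding a constant brightness offset leaves the hash unchanged. A deduplication check would then compare the Hamming distance between two fingerprints with a threshold tuned for the data set.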

A Novel Technique for Detection of Repacked Android Application Using Constant Key Point Selection Based Hashing and Limited Binary Pattern Texture Feature Extraction

  • MA Rahim Khan;Manoj Kumar Jain
    • International Journal of Computer Science & Network Security / v.23 no.9 / pp.141-149 / 2023
  • Repacked mobile apps constitute about 78% of all Android malware, and they greatly affect the technical ecosystem of Android. Although many methods exist for repacked app detection, most of them suffer from performance issues. In this manuscript, a novel hashing method based on Constant Key Point Selection and Limited Binary Pattern (CKPS: LBP) feature extraction is proposed for identifying repacked Android applications through visual similarity, which is a notable feature of repacked applications. The experimental results demonstrate that the proposed method can effectively detect visually similar apps, even under double-fold content manipulations. In the experimental analysis, the proposed CKPS: LBP method detected 1354 similar applications from a repository of 95124 applications with better efficiency, and the computation time was 0.91 seconds, within which a user could obtain a decision on whether the app is repacked. The overall efficiency of the proposed algorithm is 41% greater than the average of other methods, and the time complexity was reduced by 31%. The collision probability of the hashes was 41% better than the average of other state-of-the-art methods.

Improved Hashing Method for HEVC Screen Content Coding (향상된 해쉬 기법을 통한 HEVC 스크린 콘텐츠 코딩 성능 개선 기법)

  • Heo, Jeonghwan;Kim, Ilseung;Jeong, Jechang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2016.06a / pp.246-249 / 2016
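  • In this paper, we analyze the compression performance of the Intra Block Copy (IntraBC) prediction technique and propose a method that improves HEVC (High Efficiency Video Coding) screen content coding performance through an improved hashing technique. In the intra block copy technique currently adopted in the SCM (Screen Content Coding Test Model), a one-dimensional search is performed for $16{\times}16$ blocks and a hash-based global (full-frame) search is performed for $8{\times}8$ blocks, with RD-cost evaluation carried out on the blocks whose hashes match. Because the hash in the current hash-based global search is constructed mainly from gradients, hash values are not evenly distributed, and the number of RD-cost evaluations becomes excessively large. By improving how the hash is constructed for global intra block copy, the proposed method achieves a 0.46% BDBR improvement over SCM-6.1.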
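
The sketch below illustrates, under simplifying assumptions, why the distribution of block hashes matters in a hash-based IntraBC search: only blocks whose hash matches the current block become candidates for RD-cost evaluation, so a poorly distributed hash inflates the candidate lists. It hashes only block-aligned 8×8 blocks and uses CRC-32 (`zlib.crc32`) as a stand-in; this is neither the SCM hash nor the improved hash proposed in the paper, and `block_hash`, `build_hash_table` and `candidates` are hypothetical names.

```python
import zlib
from collections import defaultdict

BLOCK = 8  # block size for the hash-based search (8x8, as in the abstract)


def block_hash(frame, y, x, size=BLOCK):
    """CRC-32 of the block's 8-bit samples (a stand-in, not the SCM hash function)."""
    data = bytes(frame[y + dy][x + dx] for dy in range(size) for dx in range(size))
    return zlib.crc32(data)


def build_hash_table(frame, size=BLOCK):
    """Hash value -> top-left positions of all block-aligned blocks in the frame."""
    table = defaultdict(list)
    for y in range(0, len(frame) - size + 1, size):
        for x in range(0, len(frame[0]) - size + 1, size):
            table[block_hash(frame, y, x, size)].append((y, x))
    return table


def candidates(frame, table, y, x, size=BLOCK):
    """Other blocks whose hash matches the current block; only these would be RD-cost checked."""
    key = block_hash(frame, y, x, size)
    return [pos for pos in table.get(key, []) if pos != (y, x)]
```

An exact-match hash such as CRC-32 yields very short candidate lists but finds only identical blocks; a coarser hash admits more near matches at the price of more RD-cost evaluations, which is the trade-off the abstract describes.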

Implementation of System Retrieving Multi-Object Image Using Property of Moments (모멘트 특성을 이용한 다중 객체 이미지 검색 시스템 구현)

  • 안광일;안재형
    • Journal of Korea Multimedia Society / v.3 no.5 / pp.454-460 / 2000
  • To retrieve complex data such as images, content-based retrieval rather than keyword-based retrieval is required. In this paper, we implement a content-based image retrieval system that effectively retrieves the objects in a user query using invariant moments, which are invariant under geometric transformations such as translation, rotation, and scaling. To extract the shape features of objects in an image, we propose a labeling algorithm that extracts the objects from an image, and we apply invariant moments to each object. A hashing method is also applied to reduce retrieval time and to index images effectively. The experimental results demonstrate high retrieval efficiency, i.e., a precision of 85% and a recall of 23%. Consequently, our retrieval system performs better than conventional systems, which cannot represent the shapes of objects exactly.
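
As a rough illustration of moment-based shape indexing, the sketch below computes the first of Hu's seven invariant moments, φ1 = η20 + η02, for a binary object and quantizes it into a hash-table key. Using a single moment, the bucket width `step=0.01` and the names `phi1`, `bucket` and `build_index` are assumptions for illustration; the paper's labeling algorithm and exact hashing method are not reproduced.

```python
from collections import defaultdict


def phi1(obj):
    """First Hu invariant moment of a binary object given as rows of 0/1 values."""
    pts = [(x, y) for y, row in enumerate(obj) for x, v in enumerate(row) if v]
    m00 = len(pts)
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00
    # Central moments are taken about the centroid, which removes translation.
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    # eta_pq = mu_pq / m00 ** (1 + (p + q) / 2); for p + q = 2 the exponent is 2.
    return (mu20 + mu02) / m00 ** 2


def bucket(value, step=0.01):
    """Quantize a moment value into an integer hash key (hypothetical bucket width)."""
    return round(value / step)


def build_index(objects):
    """Hash table: bucket key -> names of objects whose phi1 falls into that bucket."""
    index = defaultdict(list)
    for name, obj in objects.items():
        index[bucket(phi1(obj))].append(name)
    return index
```

φ1 is unchanged when an object is translated or rotated by 90°, so both copies of a shape land in the same bucket; invariance under arbitrary rotations and scaling holds only approximately on a pixel grid.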

Rosary : Topology-Aware Structured P2P Overlay Network for CDN System (Rosary : CDN 시스템을 위한 구조화된 토폴러지-인식 P2P 오버레이 네트워크)

  • Shin Soo-Young;Namgoong Jung-Il;Park Soo-Hyun
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.12B / pp.818-830 / 2005
  • Recently, Peer-to-Peer (P2P) overlay networks such as CAN, Chord, Pastry, and Tapestry have offered a novel platform for scalable and decentralized distributed applications. These systems provide efficient and fault-tolerant routing, object location, and load balancing within a self-organizing overlay network. A Content Delivery Network (CDN) is an intermediate layer of infrastructure that helps efficiently deliver multimedia content from content providers to clients. In this paper, we propose Rosary, a topology-aware P2P overlay network for CDNs, in which CDN servers perform Intra-Pastry and Inter-Pastry routing over a two-level structured overlay network. The proposed system extends Pastry to adapt it to CDN environments: a semi-hashing-based scheme is introduced for Intra-Pastry routing, and dynamic landmark technology is used to construct the topology-aware overlay network. Simulations on NS-2 show that Rosary is scalable, efficient, and flexible.
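
For readers unfamiliar with Pastry, the sketch below shows only its basic key-placement rule: node names and object keys are hashed into a 128-bit circular identifier space, and each key belongs to the node whose ID is numerically closest. Deriving IDs from a truncated SHA-1 digest is an illustrative choice; Pastry's prefix-based routing toward that node, Rosary's two-level structure and its semi-hashing scheme are not modeled, and `pastry_id`, `ring_distance` and `responsible_node` are hypothetical names.

```python
import hashlib

ID_BITS = 128
ID_SPACE = 1 << ID_BITS


def pastry_id(name: str) -> int:
    """128-bit identifier: the first 16 bytes of the SHA-1 digest of a name (illustrative)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:16], "big")


def ring_distance(a: int, b: int) -> int:
    """Distance between two IDs on the circular identifier space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def responsible_node(key: str, nodes: list) -> str:
    """Node whose ID is numerically closest to the key's ID."""
    k = pastry_id(key)
    return min(nodes, key=lambda n: ring_distance(pastry_id(n), k))
```

Because IDs come from a hash, keys spread roughly evenly over the nodes without any central directory, which is the property that structured overlays such as Pastry build on.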

Concentric Circle-Based Image Signature for Near-Duplicate Detection in Large Databases

  • Cho, A-Young;Yang, Won-Keun;Oh, Weon-Geun;Jeong, Dong-Seok
    • ETRI Journal / v.32 no.6 / pp.871-880 / 2010
  • Many applications dealing with image management need a technique for removing duplicate images or for grouping related (near-duplicate) images in a database. This paper proposes a concentric circle-based image signature that makes it possible to detect near-duplicates rapidly and accurately. An image is partitioned into sub-regions by radius and angle levels from its center. Feature values are calculated from the average or variation between the partitioned sub-regions. The resulting sequence of feature values is turned into an image signature by hash generation. The hashing reduces storage space and enables fast matching. Performance was evaluated through discriminability and robustness tests, which verify, respectively, the distinctiveness of different images and the invariance of modified versions of the same image. We also measured discriminability and robustness by analyzing the distribution of the hashed bits. The proposed method is robust to various modifications, with an average detection rate of 98.99%. The experimental results show that the proposed method is suitable for near-duplicate detection in large databases.
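
A minimal sketch of a concentric circle signature under stated assumptions: the image is split into 3 rings × 8 sectors around its center, each cell's mean intensity is computed, and one bit is set when a cell is brighter than the next sector in the same ring. The grid size and this comparison rule are assumptions; the paper derives features from averages or variations between sub-regions and applies its own hash generation, which is not reproduced. `cell_means` and `circle_signature` are hypothetical names.

```python
import math


def cell_means(img, rings=3, sectors=8):
    """Mean intensity of each (ring, sector) cell, partitioned by radius and angle from the center."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r_max = math.hypot(cy, cx) + 1e-9
    sums = [[0.0] * sectors for _ in range(rings)]
    counts = [[0] * sectors for _ in range(rings)]
    for y in range(h):
        for x in range(w):
            ring = min(int(math.hypot(y - cy, x - cx) / r_max * rings), rings - 1)
            angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
            sector = min(int(angle / (2 * math.pi) * sectors), sectors - 1)
            sums[ring][sector] += img[y][x]
            counts[ring][sector] += 1
    return [[s / c if c else 0.0 for s, c in zip(srow, crow)] for srow, crow in zip(sums, counts)]


def circle_signature(img, rings=3, sectors=8):
    """rings * sectors bits: a bit is set when a cell is brighter than the next cell in its ring."""
    bits, i = 0, 0
    for ring in cell_means(img, rings, sectors):
        for s in range(sectors):
            if ring[s] > ring[(s + 1) % sectors]:
                bits |= 1 << i
            i += 1
    return bits
```

Because each bit records only an ordering between cell means, a uniform brightness change leaves the signature unchanged, and two signatures can be compared by Hamming distance.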

Technique for Estimating the Number of Active Flows in High-Speed Networks

  • Yi, Sung-Won;Deng, Xidong;Kesidis, George;Das, Chita R.
    • ETRI Journal / v.30 no.2 / pp.194-204 / 2008
  • The online collection of coarse-grained traffic information, such as the total number of flows, is gaining in importance due to a wide range of applications, such as congestion control and network security. In this paper, we focus on an active queue management scheme called SRED since it estimates the number of active flows and uses the quantity to indicate the level of congestion. However, SRED has several limitations, such as instability in estimating the number of active flows and underestimation of active flows in the presence of non-responsive traffic. We present a Markov model to examine the capability of SRED in estimating the number of flows. We show how the SRED cache hit rate can be used to quantify the number of active flows. We then propose a modified SRED scheme, called hash-based two-level caching (HaTCh), which uses hashing and a two-level caching mechanism to accurately estimate the number of active flows under various workloads. Simulation results indicate that the proposed scheme provides a more accurate estimation of the number of active flows than SRED, stabilizes the estimation with respect to workload fluctuations, and prevents performance degradation by efficiently isolating non-responsive flows.
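
The idea of estimating the number of flows from cache hits can be sketched as follows. This is a simplified SRED-style estimator under stated assumptions: it uses a cumulative hit ratio instead of SRED's exponentially weighted hit frequency, assumes all flows are equally active, and models neither the paper's Markov analysis nor HaTCh's two-level cache. The zombie-list size, replacement probability and the function name `estimate_active_flows` are illustrative.

```python
import random


def estimate_active_flows(packets, zombie_size=1000, replace_prob=0.25, seed=0):
    """Simplified SRED-style estimate of the number of active flows.

    Each arriving packet's flow ID is compared with a randomly chosen zombie-list entry.
    With N equally active flows the hit probability is 1/N, so comparisons / hits
    estimates N.
    """
    rng = random.Random(seed)
    zombies = []
    hits = comparisons = 0
    for flow in packets:
        if len(zombies) < zombie_size:  # warm-up: fill the zombie list first
            zombies.append(flow)
            continue
        i = rng.randrange(zombie_size)
        comparisons += 1
        if zombies[i] == flow:
            hits += 1
        elif rng.random() < replace_prob:  # on a miss, overwrite the zombie with some probability
            zombies[i] = flow
    return comparisons / hits if hits else float("inf")
```

With unequal flow rates the hit probability is no longer exactly 1/N (heavy flows hit more often), which is one source of the estimation bias the abstract attributes to non-responsive traffic.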

Multi-match Packet Classification Scheme Combining TCAM with an Algorithmic Approach

  • Lim, Hyesook;Lee, Nara;Lee, Jungwon
    • IEIE Transactions on Smart Processing and Computing / v.6 no.1 / pp.27-38 / 2017
  • Packet classification is one of the essential functions of Internet routers for providing quality of service. Since input packets can arrive at tens of millions per second, wire-speed packet classification has become one of the most challenging tasks. While traditional packet classification reports only a single matching result, new network applications require multiple matching results. Ternary content-addressable memory (TCAM) has been adopted to solve the multi-match classification problem because of its ability to perform fast parallel matching. However, TCAM has a fundamental issue: high power dissipation. Moreover, since TCAM is designed for a single match, its applicability to multi-match classification is limited. In this paper, we propose a cost- and energy-efficient multi-match classification architecture that combines TCAM with a tuple space search algorithm. The proposed solution uses two small TCAM modules and requires a single-cycle TCAM lookup, two SRAM accesses, and several Bloom filter query cycles for multi-match classification.
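
The abstract relies on Bloom filter queries; the sketch below shows the basic data structure only, with k bit positions derived from salted SHA-256 digests. The sizes `m` and `k`, the hashing scheme and the class name `BloomFilter` are assumptions, and how the paper combines the filters with TCAM and tuple space search is not reproduced.

```python
import hashlib


class BloomFilter:
    """m-bit Bloom filter with k hash positions derived from salted SHA-256 digests."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = 0  # Python int used as an m-bit array

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: bytes) -> bool:
        """False means definitely absent; True means present or a false positive."""
        return all(self.bits >> p & 1 for p in self._positions(item))
```

In a tuple space search, one filter per tuple could be consulted before the slower hash-table probe for that tuple: a negative answer is always correct, while a positive answer may be a false positive and still needs the real lookup.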

A Distributed Layer 7 Server Load Balancing (분산형 레이어 7 서버 부하 분산)

  • Kwon, Hui-Ung;Kwak, Hu-Keun;Chung, Kyu-Sik
    • The KIPS Transactions:PartA / v.15A no.4 / pp.199-210 / 2008
  • A clustering-based wireless Internet proxy server needs a layer-7 load balancer with URL hashing to reduce the total storage space of the servers. A layer-4 load balancer placed in front of a server cluster distributes client requests at the transport layer (TCP or UDP) to servers holding the same content, without looking at the content of the request. A layer-7 load balancer placed in front of a server cluster parses client requests at the application layer and distributes them to servers according to the type of requested content. A layer-7 load balancer therefore allows the servers to hold different, mutually exclusive content, which minimizes the total storage space of the servers and improves overall cluster performance. However, unlike a layer-4 load balancer, its scalability is limited by the high overhead of parsing requests at the application layer. To overcome this limitation, we propose a distributed layer-7 load balancer that replaces the single layer-7 load balancer of the conventional scheme with a single layer-4 load balancer in front of the server cluster and a set of layer-7 load balancers located in the server cluster. In a clustering-based wireless Internet proxy server, we implemented the conventional scheme using KTCPVS (Kernel TCP Virtual Server), a Linux-based layer-7 load balancer. We implemented the proposed scheme using IPVS (IP Virtual Server), a Linux-based layer-4 load balancer, with KTCPVS installed on each server so that they work together. We performed experiments using 16 PCs. The experimental results show that, as the number of servers grows, the proposed scheme scales better and achieves higher performance than the conventional scheme.
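
URL hashing, mentioned at the start of the abstract, can be sketched as a deterministic map from a request URL to one backend, so that each object is stored on a single server. Hashing host plus path with SHA-256 and reducing modulo the server count are assumptions for illustration, not the KTCPVS configuration used in the paper; `pick_server` is a hypothetical name.

```python
import hashlib
from urllib.parse import urlsplit


def pick_server(url: str, servers: list) -> str:
    """Map a URL to one backend so that each object is cached on exactly one server.

    Only host + path are hashed; ignoring the query string is an assumption.
    """
    parts = urlsplit(url)
    key = (parts.netloc + parts.path).encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return servers[h % len(servers)]
```

With plain modulo hashing, adding or removing a server remaps most URLs; consistent hashing is a common alternative when the server set changes often.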

A study on searching image by cluster indexing and sequential I/O (연속적 I/O와 클러스터 인덱싱 구조를 이용한 이미지 데이타 검색 연구)

  • Kim, Jin-Ok;Hwang, Dae-Joon
    • The KIPS Transactions:PartD / v.9D no.5 / pp.779-788 / 2002
  • Searching multimedia data such as images, video, and audio raises many technically difficult issues, because such data are massive and more complex than simple text-based data. Since exact matching is not suited to the feature matrices of multimedia data, similarity retrieval has been studied as a search method: basic features are extracted automatically from the multimedia data, and the search is performed over the extracted features. In this paper, data clustering and cluster indexing are proposed as a fast similarity-retrieval method for multimedia data. This approach stores similar images in clusters on adjacent disk cylinders and then builds indexes to access the clusters. To minimize the search cost, hashing is used to index the clusters. In addition, to reduce I/O time, the proposed search takes just one I/O to look up the location of the cluster containing similar objects and one sequential file I/O to read in that cluster. The proposed scheme addresses the multi-dimensionality problem by using clustering and indexing, and it achieves higher search efficiency than content-based image retrieval that uses only clustering or only an indexing structure.
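
A toy version of hashing over clusters: feature vectors are quantized onto a grid, each grid cell acts as a cluster, and a hash table maps the cell key to all of its members, so a query does one lookup and then scans one cluster. The grid quantization stands in for the paper's clustering method, and `cluster_key`, `ClusterIndex` and the cell width are hypothetical; disk layout and cylinder placement are not modeled.

```python
import math
from collections import defaultdict


def cluster_key(features, cell=10.0):
    """Quantize a feature vector onto a grid; each grid cell plays the role of one cluster."""
    return tuple(int(f // cell) for f in features)


class ClusterIndex:
    """Hash table from cluster key to the members of that cluster, stored together so that
    one lookup finds the cluster and one sequential scan reads it."""

    def __init__(self, cell=10.0):
        self.cell = cell
        self.table = defaultdict(list)

    def add(self, image_id, features):
        self.table[cluster_key(features, self.cell)].append((image_id, features))

    def query(self, features):
        """IDs of images in the query's cluster, nearest first (Euclidean distance)."""
        members = self.table.get(cluster_key(features, self.cell), [])
        ranked = sorted(members, key=lambda m: math.dist(m[1], features))
        return [image_id for image_id, _ in ranked]
```

A fixed grid can separate near neighbors that fall on opposite sides of a cell boundary; a data-driven clustering, as in the paper, avoids some of these misses.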