• Title/Summary/Keyword: Chunk Algorithm


Dynamic Prime Chunking Algorithm for Data Deduplication in Cloud Storage

  • Ellappan, Manogar; Abirami, S
    • KSII Transactions on Internet and Information Systems (TIIS), v.15 no.4, pp.1342-1359, 2021
  • Data deduplication identifies duplicates and minimizes redundant data stored on the backup server. Chunk-level deduplication plays a significant role in detecting appropriate chunk boundaries, which addresses challenges such as low throughput and high chunk-size variance in the data stream. To address these challenges, we propose a new chunking algorithm called Dynamic Prime Chunking (DPC). The main goal of DPC is to dynamically change the window size within a prime value based on the minimum and maximum chunk size. As a result, DPC provides high throughput and avoids large chunk variance in the deduplication system. The implementation and experimental evaluation were performed on multimedia and operating-system datasets, and DPC was compared with existing algorithms such as Rabin, TTTD, MAXP, and AE. Chunk count, chunking time, throughput, processing time, Bytes Saved per Second (BSPS), and Deduplication Elimination Ratio (DER) are the performance metrics analyzed in our work. The analysis shows that both throughput and BSPS improve: DPC improves throughput by more than 21% over AE, and BSPS increases by up to 11% over AE. Consequently, our algorithm minimizes total processing time and achieves higher deduplication efficiency than existing Content Defined Chunking (CDC) algorithms.
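
The listing above includes no source, so the following is a minimal sketch of content-defined chunking with enforced minimum and maximum chunk sizes, the bounds DPC manipulates; the rolling byte-sum fingerprint, the prime modulus `PRIME`, and the boundary test are illustrative assumptions, not the authors' algorithm.

```python
# Minimal content-defined chunking sketch with min/max chunk bounds.
# The prime modulus and boundary condition are illustrative stand-ins,
# not the actual Dynamic Prime Chunking (DPC) rules from the paper.

MIN_CHUNK = 2 * 1024      # smallest chunk the cutter may emit
MAX_CHUNK = 16 * 1024     # forced cut point to bound chunk-size variance
WINDOW = 48               # bytes in the rolling window
PRIME = 251               # assumed prime modulus for the boundary test

def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks."""
    start = 0
    n = len(data)
    while start < n:
        end = min(start + MAX_CHUNK, n)
        cut = end                                   # default: forced cut at MAX_CHUNK
        # Begin testing boundaries only after the minimum chunk size.
        for i in range(start + MIN_CHUNK, end):
            window = data[max(i - WINDOW, start):i]
            fingerprint = sum(window) % PRIME       # toy rolling fingerprint
            if fingerprint == 0:                    # declare a chunk boundary
                cut = i
                break
        yield (start, cut)
        start = cut

if __name__ == "__main__":
    blob = bytes(range(256)) * 200
    sizes = [e - s for s, e in chunk_boundaries(blob)]
    print(len(sizes), "chunks, sizes between", min(sizes), "and", max(sizes))
```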

Research on Keyword-Overlap Similarity Algorithm Optimization in Short English Text Based on Lexical Chunk Theory

  • Na Li; Cheng Li; Honglie Zhang
    • Journal of Information Processing Systems, v.19 no.5, pp.631-640, 2023
  • Short-text similarity calculation is one of the hot issues in natural language processing research. Conventional keyword-overlap similarity algorithms consider only lexical-item information and neglect the effect of word order, and their optimized variants that incorporate word order rely on weights that are hard to determine. Building on the keyword-overlap similarity algorithm, this paper proposes a short English text similarity algorithm based on lexical chunk theory (LC-SETSA), which introduces lexical chunk theory from cognitive psychology into short English text similarity calculation for the first time. Lexical chunks are used to segment short English texts; the segmentation results capture both the semantic connotation and the fixed word order of the lexical chunks, and the overlap similarity of the lexical chunks is then calculated accordingly. Finally, comparative experiments are carried out, and the results show that the proposed algorithm is feasible, stable, and effective.
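
As a rough illustration of chunk-overlap scoring (not the paper's LC-SETSA, which segments texts with a lexical-chunk inventory), the sketch below uses contiguous word bigrams as stand-in chunks, which at least preserves local word order, and scores two short texts by their chunk-set overlap; the bigram segmentation and the overlap coefficient are assumptions.

```python
# Toy overlap similarity over "chunks" of short English texts.
# Real LC-SETSA segments texts with a lexical-chunk inventory; here,
# contiguous word bigrams stand in for lexical chunks (an assumption).

def chunks(text: str, n: int = 2) -> set:
    """Return the set of word n-grams (stand-in lexical chunks) of a text."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_similarity(a: str, b: str) -> float:
    ca, cb = chunks(a), chunks(b)
    if not ca or not cb:
        return 0.0
    # Overlap coefficient: shared chunks over the smaller chunk set.
    return len(ca & cb) / min(len(ca), len(cb))

if __name__ == "__main__":
    s1 = "keep in touch with your old friends"
    s2 = "try to keep in touch with friends"
    print(round(overlap_similarity(s1, s2), 3))
```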

A Parallel Loop Scheduling Algorithm on Multiprocessor System Environments (다중프로세서 시스템 환경에서 병렬 루프 스케쥴링 알고리즘)

  • 이영규; 박두순
    • Journal of Korea Multimedia Society, v.3 no.3, pp.309-319, 2000
  • The purpose of parallel loop scheduling in a multiprocessor environment is to schedule a parallel application program with minimum synchronization overhead and balanced load. Each processor computes a chunk of iterations and is allocated that chunk for parallel execution. In doing so, it frequently accesses mutually exclusive global memory, which imposes considerable scheduling overhead and creates a bottleneck. Moreover, when the execution times of the iterations within the chunks allocated to the processors differ, the resulting load imbalance degrades overall scheduling performance. In this paper, we investigate the problems of conventional algorithms with respect to minimizing scheduling overhead and balancing load, and then present a new parallel loop scheduling algorithm that considers data locality and processor affinity.
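
The shared-counter pattern the abstract criticizes can be illustrated with a small chunk self-scheduling sketch: workers take decreasing chunks of iterations from a global index under a lock. The guided-self-scheduling chunk-size rule, thread counts, and loop body below are illustrative assumptions, not the algorithm proposed in the paper.

```python
# Sketch of chunk self-scheduling: workers grab decreasing chunks of
# iterations from a shared index under a lock (the bottleneck the paper
# targets). The chunk-size rule below is the guided self-scheduling
# heuristic, used here only as an illustration.

import threading

N_ITER = 10_000
N_WORKERS = 4

lock = threading.Lock()
next_index = 0
results = [0] * N_ITER

def grab_chunk():
    """Return (start, end) of the next chunk, or None when the loop is done."""
    global next_index
    with lock:                                       # mutually exclusive access
        if next_index >= N_ITER:
            return None
        remaining = N_ITER - next_index
        size = max(1, remaining // (2 * N_WORKERS))  # guided-style shrinking chunk
        start, end = next_index, next_index + size
        next_index = end
        return start, end

def worker():
    while (span := grab_chunk()) is not None:
        start, end = span
        for i in range(start, end):
            results[i] = i * i                       # stand-in loop body

threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("done:", sum(results) == sum(i * i for i in range(N_ITER)))
```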


Flexible Multimedia Streaming Based on the Adaptive Chunk Algorithm (적응 청크 알고리즘 기반 멀티미디어 스트리밍 알고리즘)

  • Kim Dong-Hwan; Kim Jung-Keun; Chang Tae-Gyu
    • The Transactions of the Korean Institute of Electrical Engineers D, v.54 no.5, pp.324-326, 2005
  • An adaptive chunk algorithm is newly devised and a collaborative streaming scheme is designed for high-quality multimedia streaming service under time-varying traffic conditions. An LMS-based prediction filter is used to compensate for the effect of time-varying background traffic on the WAN. With background traffic generated by the FARIMA (Fractional Autoregressive Integrated Moving Average) traffic model, underflow occurs for 20~28% of the data stored in the central server. The proposed algorithm is tested with MPEG-2 video files and compensates for 71~85% of the central-stream underflow.
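
The LMS-based prediction mentioned in the abstract follows the standard least-mean-squares update: predict the next sample from recent ones and correct the filter weights with the prediction error. The sketch below applies that update to a synthetic signal; the filter order, step size, and test signal are assumptions unrelated to the paper's traffic traces.

```python
# Standard LMS adaptive predictor: predict x[n] from the previous `order`
# samples and adapt the weights with the prediction error. The paper uses
# such a filter to anticipate background traffic; the parameters here are
# illustrative only.

import numpy as np

def lms_predict(x, order=4, mu=0.01):
    """Return one-step predictions of x using an LMS-adapted FIR filter."""
    w = np.zeros(order)
    preds = np.zeros_like(x)
    for n in range(order, len(x)):
        window = x[n - order:n][::-1]      # most recent sample first
        preds[n] = w @ window              # predicted value of x[n]
        err = x[n] - preds[n]              # prediction error
        w += 2 * mu * err * window         # LMS weight update
    return preds

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(2000)
    signal = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)
    preds = lms_predict(signal)
    mse = np.mean((signal[100:] - preds[100:]) ** 2)
    print(f"prediction MSE: {mse:.4f}")
```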

Improved Parallel Loop Scheduling Algorithm on Shared Memory Systems (공유메모리 시스템에서 개선된 병렬 루프 스케쥴링 알고리즘)

  • 이영규; 박두순
    • Proceedings of the Korea Multimedia Society Conference, 2000.04a, pp.453-457, 2000
  • To achieve optimal scheduling in a parallel system environment, parallel iterations must be scheduled so that synchronization overhead is minimized and load balance is achieved. For execution, the processors compute and are allocated chunks of iterations from memory. At this point, the mutually exclusive memory accesses of the processors cause considerable overhead and a bottleneck. In addition, when the execution-time distribution of the iterations within the chunk allocated to a processor is uneven, the result is load imbalance, which degrades overall scheduling performance. Therefore, to achieve optimal scheduling, this paper identifies the problems of existing scheduling methods, proposes an improved parallel loop scheduling algorithm that considers data locality and processor affinity, and shows through performance evaluation that the proposed algorithm is indeed an improvement.
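
The data-locality and processor-affinity idea shared by this and the preceding paper can be illustrated with per-processor local iteration queues plus work stealing only when a local queue runs dry; the sketch below is a generic simulation under that assumption, not the authors' algorithm.

```python
# Illustration of affinity-aware loop scheduling: iterations are first
# partitioned into per-processor local queues (preserving data locality),
# and an idle processor steals from the most loaded queue only when its
# own queue is empty. This is a generic sketch, not the paper's algorithm.

from collections import deque

def affinity_schedule(n_iter: int, n_procs: int, cost):
    """Simulate scheduling; return per-processor accumulated work time."""
    queues = [deque(range(p, n_iter, n_procs)) for p in range(n_procs)]
    busy = [0.0] * n_procs
    # Round-robin simulation: each processor executes one iteration per step.
    while any(queues):
        for p in range(n_procs):
            if not queues[p]:
                donor = max(range(n_procs), key=lambda q: len(queues[q]))
                if not queues[donor]:
                    continue
                queues[p].append(queues[donor].pop())   # steal one iteration
            busy[p] += cost(queues[p].popleft())
    return busy

if __name__ == "__main__":
    # Skewed cost: later iterations are more expensive.
    times = affinity_schedule(1000, 4, cost=lambda i: 1 + i / 1000)
    print([round(t, 1) for t in times])
```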


Accuracy Improvement of RTT Measurement on the Alternate Path in SCTP (SCTP에서 대체 경로의 RTT 정확도 향상)

  • Kim, Ye-Na; Park, Woo-Ram; Kim, Jong-Hyuk; Park, Tae-Keun
    • The Journal of Korean Institute of Communications and Information Sciences, v.34 no.5B, pp.509-516, 2009
  • The Stream Control Transmission Protocol (SCTP) is a reliable transport layer protocol that provides several features. Multihoming is one of these features and allows an association (SCTP's term for a connection) between two endpoints to use multiple paths. One of the paths, called the primary path, is used for initial data transmission, and in the case of retransmission an alternate path is used. SCTP's current retransmission policy attempts to improve the chance of success by sending all retransmissions to an alternate destination address. However, this policy has been shown to actually degrade performance in many circumstances. This is because, under Karn's algorithm, successful retransmissions on the alternate path cannot be used to update the RTT (Round-Trip Time) estimate for that path. In this paper we propose a scheme to avoid such performance degradation. We utilize two unused bits in the flag field of DATA and SACK chunks to disambiguate original transmissions from retransmissions and to keep the RTT and RTO (Retransmission Time-Out) values more accurate.
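
Karn's algorithm discards RTT samples from retransmitted packets because the matching transmission is ambiguous; the paper's flag bits remove that ambiguity so the sample can still be used. The sketch below shows a Jacobson/Karels-style RTT/RTO estimator where a disambiguation flag decides whether a retransmission's sample is kept; the constants follow common RFC 6298 defaults and the interface is an assumption, not the paper's implementation.

```python
# RTT/RTO estimator in the Jacobson/Karels style. Under plain Karn's rule,
# samples from retransmitted chunks are discarded; if retransmissions can
# be disambiguated (e.g., by spare flag bits, as the paper proposes), the
# sample can safely update the estimate. Constants follow common defaults.

class RttEstimator:
    def __init__(self, rto_initial=3.0, rto_min=1.0, rto_max=60.0):
        self.srtt = None          # smoothed RTT
        self.rttvar = None        # RTT variance estimate
        self.rto = rto_initial
        self.rto_min, self.rto_max = rto_min, rto_max

    def on_ack(self, rtt_sample: float, was_retransmitted: bool,
               retransmission_disambiguated: bool = False):
        # Karn's rule: drop ambiguous samples from retransmitted chunks,
        # unless the flag bits identify exactly which transmission was acked.
        if was_retransmitted and not retransmission_disambiguated:
            return
        if self.srtt is None:                     # first measurement
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:                                     # RFC 6298-style update
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt_sample)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt_sample
        self.rto = min(self.rto_max,
                       max(self.rto_min, self.srtt + 4 * self.rttvar))

if __name__ == "__main__":
    est = RttEstimator()
    est.on_ack(0.20, was_retransmitted=False)
    est.on_ack(0.35, was_retransmitted=True)                      # dropped
    est.on_ack(0.35, was_retransmitted=True,
               retransmission_disambiguated=True)                 # kept
    print(round(est.srtt, 3), round(est.rto, 3))
```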

HRSF: Single Disk Failure Recovery for Liberation Code Based Storage Systems

  • Li, Jun; Hou, Mengshu
    • Journal of Information Processing Systems, v.15 no.1, pp.55-66, 2019
  • Storage systems often apply erasure codes to protect against disk failure and ensure system reliability and availability. Liberation code, a type of coding scheme, has been widely used in many storage systems because its encoding and modifying operations are efficient. However, it cannot effectively achieve fast recovery from single disk failure, which strongly affects recovery performance as well as the response time of client requests. To solve this problem, we present HRSF, a Hybrid Recovery method for solving Single disk Failure, along with an optimal algorithm to accelerate the failure recovery process. Theoretical analysis proves that our scheme reads approximately 25% less data than the conventional method. In the evaluation, we perform extensive experiments with different numbers of disks and chunk sizes. The results show that HRSF outperforms the conventional method in terms of both the amount of data read and the failure recovery time.
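
Liberation coding and the HRSF hybrid read pattern are too involved for a short snippet, but the core single-failure step, rebuilding a lost chunk from the surviving chunks of its parity group, can be sketched with plain XOR row parity; the stripe layout and helper names below are assumptions, and this is not the Liberation code itself.

```python
# Simplified single-disk-failure recovery: each stripe stores k data chunks
# plus one XOR parity chunk, and a lost chunk is rebuilt by XORing the
# survivors in its stripe. This illustrates the recovery reads HRSF tries
# to minimize; it is plain row parity, not the Liberation code.

def xor_chunks(chunks):
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

def make_stripe(data_chunks):
    """Return data chunks plus their XOR parity chunk."""
    return list(data_chunks) + [xor_chunks(data_chunks)]

def recover(stripe, failed_index):
    """Rebuild the chunk at failed_index from the surviving chunks."""
    survivors = [c for i, c in enumerate(stripe) if i != failed_index]
    return xor_chunks(survivors)

if __name__ == "__main__":
    stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])   # 3 data + 1 parity
    lost = 1
    rebuilt = recover(stripe, lost)
    print(rebuilt == stripe[lost])                       # True
```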

Improving the Performance of Korean Text Chunking by Machine learning Approaches based on Feature Set Selection (자질집합선택 기반의 기계학습을 통한 한국어 기본구 인식의 성능향상)

  • Hwang, Young-Sook; Chung, Hoo-jung; Park, So-Young; Kwak, Young-Jae; Rim, Hae-Chang
    • Journal of KIISE: Software and Applications, v.29 no.9, pp.654-668, 2002
  • In this paper, we present an empirical study on improving Korean text chunking with machine learning and feature-set selection. We focus on two issues: selecting a feature set for Korean chunking, and alleviating data sparseness. To select a proper feature set, we use a heuristic method that searches the space of feature sets, using the estimated performance of a machine learning algorithm as a measure of the "incremental usefulness" of a particular feature set. In addition, to alleviate data sparseness, we suggest using a general part-of-speech tag set and selective lexical information, taking the characteristics of the Korean language into account. Experimental results show that chunk tags and lexical information within a given context window are important features, while spacing-unit information is less important, findings that are independent of the machine learning technique used. Furthermore, using selective lexical information not only has a smoothing effect but also reduces the feature space compared with using all lexical information. Korean text chunking based on memory-based learning and on decision tree learning with the selected feature set achieved precision/recall of 90.99%/92.52% and 93.39%/93.41%, respectively.
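
The "incremental usefulness" search described above is essentially greedy forward feature-set selection: repeatedly add the candidate whose inclusion most improves a held-out score, and stop when no candidate helps. The sketch below shows that loop with a pluggable evaluation callback; the feature names and toy scores are placeholders, not the paper's Korean-chunking features.

```python
# Greedy forward feature-set selection driven by an evaluation callback.
# `evaluate(feature_set)` would train a chunker (memory-based learning,
# decision tree, ...) and return held-out accuracy; here it is a stub.

def forward_select(candidates, evaluate, min_gain=0.005):
    """Grow a feature set while adding a feature keeps improving the score."""
    selected, best_score = [], evaluate(frozenset())
    remaining = set(candidates)
    while remaining:
        gains = {f: evaluate(frozenset(selected + [f])) for f in remaining}
        best_feature = max(gains, key=gains.get)
        if gains[best_feature] - best_score < min_gain:
            break                                  # no useful feature left
        selected.append(best_feature)
        best_score = gains[best_feature]
        remaining.remove(best_feature)
    return selected, best_score

if __name__ == "__main__":
    # Toy usefulness: context POS tags and chunk tags help, spacing less so.
    usefulness = {"pos_window": 0.05, "prev_chunk_tags": 0.04,
                  "lexical_selective": 0.02, "spacing_unit": 0.001}

    def score(feature_set):
        return 0.85 + sum(usefulness[f] for f in feature_set)

    feats, acc = forward_select(usefulness, score)
    print(feats, round(acc, 3))
```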

Mobile Code Authentication Schemes that Permit Overlapping of Execution and Downloading (다운로드와 수행의 병행을 허용하는 모바일 코드 인증 기법)

  • Park Yongsu; Cho Yookun
    • Journal of KIISE: Computer Systems and Theory, v.32 no.3, pp.115-124, 2005
  • When application code is downloaded to a mobile device, it is important to authenticate it. Usually, mobile code execution is overlapped with downloading to reduce transfer delay, and to the best of our knowledge there has been no algorithm for authenticating mobile code in this setting. In this paper, we present two efficient code authentication schemes that permit overlapping of execution and downloading, covering two cases: when the transmission order of the code chunks is determined before transmission, and when it is determined during transmission. The proposed methods are based on hash chaining and authentication trees, respectively. In particular, the latter scheme uses previously received authentication information to verify the currently received chunk, which reduces both communication overhead and verification delay. When the application code consists of n chunks, the communication overheads of both schemes are O(n), and their verification delays are O(1) and O(log n), respectively.
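
The hash-chaining case (chunk order known before transmission) can be sketched compactly: each packet carries the hash that authenticates the next packet, so every chunk is verified in O(1) as soon as it arrives. The snippet below uses SHA-256 and omits the provider's signature on the chain head h0; the packet layout is an assumption, not the paper's exact format.

```python
# Hash chaining for streamed code chunks: chunk i is sent together with the
# hash of the following packet, so each chunk can be verified in O(1) as
# soon as it arrives, using the hash carried by the previous (verified)
# chunk. The chain head h0 would be signed by the code provider (omitted).

import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_chain(chunks):
    """Return (h0, packets) where packets[i] = (chunk_i, hash_of_next_packet)."""
    next_hash = b""                              # nothing follows the last chunk
    packets = []
    for chunk in reversed(chunks):
        packets.append((chunk, next_hash))
        next_hash = sha256(chunk + next_hash)    # hash covering this packet
    packets.reverse()
    return next_hash, packets                    # next_hash is now h0

def verify_stream(h0, packets):
    """Verify chunks one by one while 'downloading'; True if all authentic."""
    expected = h0
    for chunk, next_hash in packets:
        if sha256(chunk + next_hash) != expected:
            return False                         # reject before executing chunk
        expected = next_hash                     # hash for the next packet
    return True

if __name__ == "__main__":
    code = [b"chunk-0", b"chunk-1", b"chunk-2"]
    h0, stream = build_chain(code)
    print(verify_stream(h0, stream))             # True
    stream[1] = (b"tampered", stream[1][1])
    print(verify_stream(h0, stream))             # False
```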