Title/Summary/Keyword: Chunk Storage

Dynamic Prime Chunking Algorithm for Data Deduplication in Cloud Storage

  • Ellappan, Manogar; Abirami, S
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1342-1359 / 2021
  • The data deduplication technique identifies duplicates and minimizes the redundant data stored on the backup server. Chunk-level deduplication plays a significant role in detecting appropriate chunk boundaries, which addresses challenges such as low throughput and high chunk-size variance in the data stream. As a solution, we propose a new chunking algorithm called Dynamic Prime Chunking (DPC). The main goal of DPC is to dynamically change the window size within a prime value based on the minimum and maximum chunk size. According to the results, DPC provides high throughput and avoids significant chunk variance in the deduplication system. The implementation and experimental evaluation were performed on multimedia and operating-system datasets, and DPC was compared with existing algorithms such as Rabin, TTTD, MAXP, and AE. Chunk count, chunking time, throughput, processing time, Bytes Saved per Second (BSPS), and Deduplication Elimination Ratio (DER) are the performance metrics analyzed in our work. Based on the analysis of the results, throughput and BSPS improved: DPC improves throughput by more than 21% over AE, and BSPS increases by up to 11% over the existing AE algorithm. For these reasons, our algorithm minimizes total processing time and achieves higher deduplication efficiency than existing Content Defined Chunking (CDC) algorithms.
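
As a rough illustration of the family DPC belongs to, here is a minimal Python sketch of generic content-defined chunking with minimum/maximum bounds; the window size, hash base, and boundary mask are illustrative assumptions, not DPC's actual prime-based parameters.

```python
import os

# Generic content-defined chunking sketch (illustrative; not the DPC algorithm).
# A boundary is declared when a rolling hash over the last WINDOW bytes
# matches a mask, subject to minimum/maximum chunk-size bounds.

MIN_CHUNK = 2 * 1024      # assumed lower bound on chunk size
MAX_CHUNK = 64 * 1024     # assumed upper bound on chunk size
WINDOW = 48               # sliding-window width (DPC varies this dynamically)
PRIME = 31                # polynomial hash base (illustrative)
MOD = 1 << 32             # hash modulus
MASK = (1 << 13) - 1      # boundary test mask; average chunk roughly 8 KiB

def chunk_boundaries(data: bytes):
    """Yield (start, end) byte offsets of content-defined chunks."""
    start, h = 0, 0
    top = pow(PRIME, WINDOW - 1, MOD)   # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i - start >= WINDOW:         # slide: drop the oldest byte's term
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * PRIME + b) % MOD
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            yield start, i + 1          # cut a chunk here
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data)          # trailing chunk

print(list(chunk_boundaries(os.urandom(200_000)))[:5])
```

Because boundaries depend only on content, an insertion near the front of the stream shifts at most a few chunks, which is what lets deduplication find repeated chunks at arbitrary offsets.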

Effect of Packaging Method on the Storage Stability of Hair Tail Products (포장방법이 칼치제품의 저장성에 미치는 영향)

  • Jo, Kil-Suk; Kim, Hyun-Ku; Kang, Tong-Sam; Shin, Dong-Hwa
    • Korean Journal of Food Science and Technology / v.20 no.1 / pp.45-51 / 1988
  • To improve the individual packaging method and extend the shelf life of hair tail (Trichiurus japonicus), salted and unsalted hair tail chunks (cut into 8-10 cm pieces) were packaged in laminated plastic film bags (Nylon/PE, 20 µm, 12 × 15 cm) with a free-O₂ absorber or under vacuum, and stored at 0 and/or 5°C. The other samples were packaged in plastic foam trays, overwrapped with oxygen-permeable film (control), and stored at the same temperatures. Volatile basic nitrogen (VBN), trimethylamine (TMA), and viable cell counts (VCC) increased with storage time, but thiobarbituric acid (TBA) values decreased gradually after reaching a maximum peak at 5-15 days. Judging from the four chemical components, VBN was the most useful indicator for quality judgement of hair tail chunks, with an upper limit of 29 mg%. A regression equation for shelf-life prediction of hair tail chunks was determined from sensory evaluation and VBN content.

A Bitmap Index for Chunk-Based MOLAP Cubes (청크 기반 MOLAP 큐브를 위한 비트맵 인덱스)

  • Lim, Yoon-Sun; Kim, Myung
    • Journal of KIISE:Databases / v.30 no.3 / pp.225-236 / 2003
  • MOLAP systems store data in a multidimensional array called a 'cube' and access it using array indexes. When a cube is placed on disk, it can be partitioned into a set of chunks of the same side length. Such a cube storage scheme is called the chunk-based MOLAP cube storage scheme. It gives a data-clustering effect, so all dimensions are guaranteed a fair chance in terms of query processing speed. To achieve high space utilization, sparse chunks are further compressed. Due to this compression, the relative position of a chunk cannot be obtained in constant time without an index. In this paper, we propose a bitmap index for chunk-based MOLAP cubes. The index can be constructed along with the corresponding cube generation. The relative position of chunks is retained in the index, so chunk retrieval can be done in constant time. We place as many chunks as possible in each index block, so that the number of index searches is minimized for OLAP operations such as range queries. We show that the proposed index is efficient by comparing it with multidimensional indexes such as the UB-tree and the grid file in terms of time and space.
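
The paper's exact index layout is not given in the abstract, so the following is only a sketch of the underlying idea: one bit per logical chunk marks whether the chunk was stored, and a stored chunk's physical position is the count of set bits before it (a rank query), which stays constant-time with precomputed per-block counts. The block size is an illustrative assumption.

```python
class ChunkBitmapIndex:
    """Sketch: map logical chunk numbers to positions in a compressed cube."""
    BLOCK = 64  # bits per index block (illustrative)

    def __init__(self, stored_flags):
        self.bits = list(stored_flags)   # 1 = chunk physically stored
        self.block_rank = []             # number of set bits before each block
        acc = 0
        for i in range(0, len(self.bits), self.BLOCK):
            self.block_rank.append(acc)
            acc += sum(self.bits[i:i + self.BLOCK])

    def physical_position(self, chunk_no):
        """Position of the chunk in the compressed cube file, or None."""
        if not self.bits[chunk_no]:
            return None                  # sparse chunk, compressed away
        blk = chunk_no // self.BLOCK
        return self.block_rank[blk] + sum(self.bits[blk * self.BLOCK:chunk_no])

idx = ChunkBitmapIndex([1, 0, 0, 1, 1])  # chunks 1 and 2 are sparse
assert idx.physical_position(3) == 1     # second stored chunk on disk
```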

A Z-Index based MOLAP Cube Storage Scheme (Z-인덱스 기반 MOLAP 큐브 저장 구조)

  • Kim, Myung; Lim, Yoon-Sun
    • Journal of KIISE:Databases / v.29 no.4 / pp.262-273 / 2002
  • MOLAP is a technology that accelerates multidimensional data analysis by storing data in a multidimensional array and accessing it using position information. Depending on how the multidimensional array is mapped onto disk, the speed of MOLAP operations such as slice and dice varies significantly. [1] proposed a MOLAP cube storage scheme that divides a cube into small chunks of equal side length, compresses sparse chunks, and stores the chunks in row-major order of their chunk indexes. This type of cube storage scheme gives a fair chance to all dimensions of the input data. Here, we develop a variant of that cube storage scheme that places chunks in a different order. Our scheme accelerates slice and dice operations by aligning chunks to physical disk block boundaries and clustering neighboring chunks; Z-indexing is used for chunk clustering. The efficiency of the proposed scheme is evaluated through experiments, which show it is efficient for the 3-5 dimensional cubes frequently used to analyze business data.
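
A minimal sketch of the Z-indexing (Morton order) step: the bits of the per-dimension chunk coordinates are interleaved, so chunks that are close in the cube get close Z-indexes and end up clustered on disk. The bit width and dimensionality below are illustrative assumptions.

```python
def z_index(coords, bits=10):
    """Interleave coordinate bits into a single Morton (Z-order) key."""
    z = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            z |= ((c >> bit) & 1) << (bit * len(coords) + dim)
    return z

# Sorting chunks by Z-index places spatial neighbors near each other on
# disk, which is what accelerates slice and dice over row-major placement.
chunks = [(x, y, t) for x in range(4) for y in range(4) for t in range(4)]
chunks.sort(key=z_index)
```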

A Study of Method to Restore Deduplicated Files in Windows Server 2012 (윈도우 서버 2012에서 데이터 중복 제거 기능이 적용된 파일의 복원 방법에 관한 연구)

  • Son, Gwancheol; Han, Jaehyeok; Lee, Sangjin
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.6 / pp.1373-1383 / 2017
  • Deduplication is a function for managing data effectively and improving the efficiency of storage space. When deduplication is applied to a system, each stored file is divided into chunks and only unique chunks are stored, making efficient use of the storage space. However, commercial digital forensic tools do not support analysis of this file system, and original files extracted by such tools cannot be executed or opened. Therefore, in this paper, we analyze the process by which a Windows Server 2012 system with deduplication enabled generates chunks of data, and the structure of the resulting files (Chunk Storage). We also analyze the case, not covered in previous studies, where chunks are compressed. Based on these results, we propose a method to collect deduplicated data and reconstruct the original file for digital forensic investigation.
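
The actual Chunk Storage structures are what the paper reverse-engineers; the hypothetical sketch below only shows the general shape of such a reconstruction: walk the file's chunk references, fetch each chunk from the chunk store, decompress where flagged, and concatenate. All names, and the use of zlib, are illustrative stand-ins, not the real Windows on-disk format.

```python
import zlib

def restore_file(stream_map, chunk_store, out_path):
    """stream_map: [(chunk_id, compressed_flag), ...] recovered from the
    file's metadata; chunk_store: chunk_id -> raw chunk bytes."""
    with open(out_path, "wb") as out:
        for chunk_id, compressed in stream_map:
            raw = chunk_store[chunk_id]
            # Windows deduplication uses an LZ-family codec; zlib stands in
            # here purely for illustration.
            out.write(zlib.decompress(raw) if compressed else raw)

store = {1: b"Hello, ", 2: zlib.compress(b"world!")}
restore_file([(1, False), (2, True)], store, "restored.bin")
```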

Large Storage Performance and Optimization Study using blockwrite (blockwrite를 이용한 대형 스토리지 성능 측정 및 최적화 연구)

  • Kim, Hyo-Ryoung; Song, Min-Gyu; Kang, Yong-Woo
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.6 / pp.1145-1152 / 2021
  • In order to optimize the performance of a 1.4 PB large storage system, the characteristics of each chunk mode were investigated, and the 512 KB chunk mode was selected in terms of I/O speed. An NVMe storage system was configured and used to measure the data-server performance of the large storage. By measuring the change in throughput according to the number of threads, the characteristics of the large storage system were identified, and performance of up to 133 Gbps was confirmed with a block size of 32 KB. A data transmission/reception experiment using globus-url-copy of GridFTP showed that this large storage achieves a throughput of 33 Gbps.
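
As a rough sketch of what such a block-size sweep involves (the study itself uses blockwrite against the real 1.4 PB system; the path, sizes, and single-threaded loop here are placeholders):

```python
import os, time

def write_throughput(path, block_size, total=64 * 1024 * 1024):
    """Write `total` bytes in `block_size` blocks and return MB/s."""
    buf = os.urandom(block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total // block_size):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())        # include time to reach stable storage
    return total / (time.perf_counter() - start) / 1e6

for bs in (4096, 32 * 1024, 512 * 1024):
    print(f"{bs:>7} B blocks: {write_throughput('/tmp/bench.bin', bs):8.1f} MB/s")
```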

HRSF: Single Disk Failure Recovery for Liberation Code Based Storage Systems

  • Li, Jun; Hou, Mengshu
    • Journal of Information Processing Systems / v.15 no.1 / pp.55-66 / 2019
  • Storage systems often apply erasure codes to protect against disk failure and ensure system reliability and availability. Liberation codes, a type of coding scheme, have been widely used in many storage systems because their encoding and modifying operations are efficient. However, they cannot effectively achieve fast recovery from a single disk failure, which strongly affects recovery performance as well as the response time of client requests. To solve this problem, we present HRSF, a Hybrid Recovery method for solving Single disk Failure, along with an optimal algorithm to accelerate the failure recovery process. Theoretical analysis proves that our scheme reads approximately 25% less data than the conventional method. In the evaluation, we perform extensive experiments with different numbers of disks and chunk sizes. The results show that HRSF outperforms the conventional method in terms of the amount of data read and the failure recovery time.
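
Liberation codes themselves are too involved for a short example; as a plainly simpler stand-in, the sketch below shows single-disk recovery in a one-parity XOR stripe, the building block whose read cost hybrid schemes like HRSF reduce by mixing row and diagonal parity chains.

```python
def xor_chunks(chunks):
    """Bytewise XOR of equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

def recover(stripe, failed):
    """Rebuild the lost chunk: XOR of all surviving chunks."""
    return xor_chunks([c for i, c in enumerate(stripe) if i != failed])

data = [bytes([i] * 8) for i in range(4)]
stripe = data + [xor_chunks(data)]     # last chunk is the parity
assert recover(stripe, 2) == data[2]   # any single failure is recoverable
```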

Sanitization of Open-Source Based Deduplicated Filesystem (오픈 소스 중복 제거 파일시스템에서의 완전 삭제)

  • Cho, Hyeonwoong; Kim, SeulGi; Kwon, Taekyoung
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.5 / pp.1141-1149 / 2016
  • A deduplicated filesystem can reduce storage usage, but deleted blocks may remain recoverable. We studied sanitization of LessFS, a deduplicated filesystem based on FUSE (Filesystem in USErspace). First, we show a vulnerability that allows deleted data to be recovered from the deduplicated filesystem. We then implement sanitization of the deduplicated filesystem, taking into account the fingerprint DB entries associated with the data blocks. Sanitization takes 60-70 times longer than deletion without it, which means that the access time to the fingerprint DB and the overhead from the increased number of chunks have a critical impact on sanitization time. However, for chunk sizes larger than 65,536 bytes, it is faster than a normal filesystem without deduplication.
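
The constraint that makes this slow can be shown in a short hypothetical sketch (fingerprint_db and the chunk-store layout are illustrative, not LessFS APIs): a chunk may only be overwritten once its reference count drops to zero, since other files may still point to it, so every delete must consult the fingerprint DB.

```python
import os

def sanitize_file(chunk_refs, fingerprint_db, chunk_store_path):
    """chunk_refs: fingerprints referenced by the file being deleted."""
    for fp in chunk_refs:
        entry = fingerprint_db[fp]
        entry["refcount"] -= 1
        if entry["refcount"] == 0:            # no other file needs this chunk
            with open(chunk_store_path, "r+b") as store:
                store.seek(entry["offset"])
                store.write(b"\x00" * entry["size"])  # overwrite in place
                store.flush()
                os.fsync(store.fileno())
            del fingerprint_db[fp]            # purge the fingerprint entry
```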

Physicochemical Properties of Brown Rice During Storage in Laminated Film Pouches (플라스틱 적층 필름 포장재를 이용한 현미의 저장중 물리화학적 변화)

  • Han, Jae-Gyoung; Kim, Kwan; Kang, Kil-Jin; Kim, Sung-Kon; Lee, Sang-Kyu
    • Korean Journal of Food Science and Technology / v.28 no.4 / pp.714-719 / 1996
  • Changes in the physicochemical properties of brown rice (Chu-chunk byeo, Japonica type) kept at 4°C, 20°C, 30°C and 40°C in laminated film (4-layer) pouches were analyzed. The hardness of brown rice grains increased during storage. As for color changes, the 'L' value increased during storage below 30°C and decreased after three months at 40°C; the 'a' value did not change below 30°C but increased at 40°C; and the 'b' value increased under all storage conditions. Percent germination was above 97% below 20°C regardless of storage period and was inhibited by heat above 30°C; the rice did not germinate after one month of storage at 40°C. Gelatinization enthalpy and gelatinization peak temperature of brown rice powder, measured by DSC, decreased during storage. As a result, storage below 20°C in laminated film pouches is recommended for brown rice.

Novel schemes of CQI Feedback Compression based on Compressive Sensing for Adaptive OFDM Transmission

  • Li, Yongjie; Song, Rongfang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.4 / pp.703-719 / 2011
  • In multi-user wireless communication systems, adaptive modulation and scheduling are promising techniques for increasing system throughput. However, feeding back the channel quality indication (CQI) of all users for every subcarrier or chunk in adaptive orthogonal frequency division multiplexing (OFDM) systems occupies a large amount of wireless resources and decreases spectrum efficiency, so numerous limited-feedback schemes have been proposed to reduce the overhead. The recently proposed compressive sensing (CS) theory provides a new framework to jointly measure and compress signals, which requires fewer sampling and storage resources than traditional approaches based on Nyquist sampling. In this paper, we propose two novel CQI feedback schemes, based on general CS and subspace CS respectively, both of which can be used in a wireless OFDM system. The feedback rate with subspace CS is greatly decreased by exploiting the subspace information of the underlying signal. Simulation results show the effectiveness of the proposed methods: at the same feedback rate, the throughput with subspace CS outperforms the commonly used discrete cosine transform (DCT) based method, and the throughput with general CS outperforms DCT when the feedback rate is larger than 0.13 bits/subcarrier.
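
As a toy illustration of the CS feedback idea (the dimensions, the Gaussian measurement matrix, and the matching-pursuit reconstruction are generic assumptions, not the paper's exact design), the terminal feeds back m random projections of the length-n CQI vector instead of all n values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 64, 8          # chunks, measurements fed back, sparsity

x = np.zeros(n)               # sparse CQI (variation) vector, toy model
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)

Phi = rng.normal(size=(m, n)) / np.sqrt(m)
y = Phi @ x                   # only m values are fed back, not n

# Minimal orthogonal matching pursuit at the base station.
residual, support = y.copy(), []
for _ in range(k):
    support.append(int(np.argmax(np.abs(Phi.T @ residual))))
    coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    residual = y - Phi[:, support] @ coef

x_hat = np.zeros(n)
x_hat[support] = coef
print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```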