• Title/Summary/Keyword: data duplication


Data Deduplication Method using PRAM Cache in SSD Storage System (SSD 스토리지 시스템에서 PRAM 캐시를 이용한 데이터 중복제거 기법)

  • Kim, Ju-Kyeong;Lee, Seung-Kyu;Kim, Deok-Hwan
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.4
    • /
    • pp.117-123
    • /
    • 2013
  • In the recent cloud storage environment, SSDs (Solid-State Drives) are increasingly replacing traditional hard disk drives. Managing SSD space efficiently has become important: an SSD provides fast I/O performance because it has no mechanical parts, but it wears out and does not support in-place updates. Data de-duplication is frequently used to manage SSD space efficiency; however, it incurs considerable overhead because it consists of data chunking, hashing, and hash-matching operations. In this paper, we propose a new data de-duplication method using a PRAM cache. The proposed method uses hierarchical hash tables and LRU (Least Recently Used) replacement for data in PRAM. The first hash table, in DRAM, stores hash values of data cached in the PRAM, and the second hash table, in PRAM, stores hash values of data in SSD storage. The method also enhances data reliability against power failure by maintaining a backup of the first hash table in PRAM. Experimental results on three workloads show that the average write frequency and operation time of the proposed method are 44.2% and 38.8% lower, respectively, than those of the existing data de-duplication method.
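
The chunk-hash-match pipeline with a two-tier hash table and LRU replacement can be sketched as follows. This is a simplified simulation, not the paper's implementation; the fixed chunk size, the cache capacity, and the use of SHA-1 are assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict

PRAM_CAPACITY = 4            # number of chunks the simulated PRAM cache holds

pram_cache = OrderedDict()   # first tier: hash -> chunk cached in PRAM (LRU order)
ssd_store = {}               # second tier: hash -> chunk already written to SSD

def write_chunk(chunk: bytes) -> str:
    """Deduplicating write: returns 'pram-hit', 'ssd-hit', or 'stored'."""
    h = hashlib.sha1(chunk).hexdigest()
    if h in pram_cache:                 # duplicate found via the DRAM hash table
        pram_cache.move_to_end(h)       # refresh its LRU position
        return "pram-hit"
    if h in ssd_store:                  # duplicate already resides on the SSD
        return "ssd-hit"
    # new data: cache it in PRAM, evicting the least-recently-used chunk to SSD
    pram_cache[h] = chunk
    if len(pram_cache) > PRAM_CAPACITY:
        old_h, old_chunk = pram_cache.popitem(last=False)
        ssd_store[old_h] = old_chunk
    return "stored"
```

Writing the same chunk twice returns "pram-hit" the second time, so the duplicate write to flash is avoided; once a chunk has been evicted, the second-tier table still catches the duplicate as "ssd-hit".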

Evolutionary and Comparative Genomics to Drive Rational Drug Design, with Particular Focus on Neuropeptide Seven-Transmembrane Receptors

  • Furlong, Michael;Seong, Jae Young
    • Biomolecules & Therapeutics
    • /
    • v.25 no.1
    • /
    • pp.57-68
    • /
    • 2017
  • Seven-transmembrane receptors (7TMRs), also known as G protein-coupled receptors, are popular targets of drug development, particularly 7TMR systems that are activated by peptide ligands. Although many pharmaceutical drugs have been discovered via conventional bulk analysis techniques, the increasing availability of structural and evolutionary data is facilitating a change to rational, targeted drug design. This article discusses the appeal of neuropeptide-7TMR systems as drug targets and provides an overview of concepts in the evolution of vertebrate genomes and gene families. Subsequently, methods that use evolutionary concepts and comparative analysis techniques to aid in gene discovery, gene function identification, and novel drug design are presented along with case-study examples.

A Phylogenetic Analysis for Hox Linked Gene Families of Vertebrates

  • Kim, Sun-Woo;Jung, Gi-La;Lee, Jae-Hyoun;Park, Ha-Young;Kim, Chang-Bae
    • Animal cells and systems
    • /
    • v.12 no.4
    • /
    • pp.261-267
    • /
    • 2008
  • The human chromosomes 2, 7, 12, and 17 show genomic homology around the Hox gene clusters, which is taken as evidence that these paralogous gene families might have arisen from an ancestral chromosomal segment through genome duplication events. We examined protein data from vertebrate and invertebrate genomes to analyze the phylogenetic history of multi-gene families with three or more representatives linked to human Hox clusters. Topology comparisons based on statistical significance, combined with chromosome location information for the genes examined, revealed that many of the linked genes coduplicated with the Hox gene clusters. Most genes linked to Hox clusters share the same evolutionary history and were duplicated in concert with each other. We conclude that the gene families linked to Hox clusters may be evidence of ancient genome duplications.

An Enhanced Handoff Mechanism for Cellular IP (Cellular IP 핸드오프 성능개선)

  • Kim, Gyeong-A;Kim, Jong-Gwon;Park, Jae-Yun
    • The KIPS Transactions:PartC
    • /
    • v.9C no.1
    • /
    • pp.89-96
    • /
    • 2002
  • Handoff is one of the most important factors that may degrade the performance of TCP connections in wireless data networks. In this paper, we present a lossless and duplication-free handoff scheme called LPM (Last Packet Marking) for improving Cellular IP semisoft handoff. LPM signals the safe handoff cue by sending a specially marked packet to mobile hosts. SPM (Semisoft rePly Message) is the only newly introduced control packet. Our performance study shows that LPM achieves lossless packet delivery without duplication and increases TCP throughput significantly.
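
The lossless, duplication-free delivery goal of LPM can be illustrated with a toy model. The sequence-number representation and the `receive_packets` function below are illustrative assumptions, not the paper's protocol code: the marked last packet tells the mobile host the old route is drained, after which buffered new-path packets are released with duplicates discarded.

```python
def receive_packets(old_path, new_path):
    """Merge packet sequence numbers seen on the old and new routes during
    semisoft handoff. The LPM mark on the final old-path packet tells the
    host the old route is drained, so buffered new-path packets can then be
    released without loss or duplication."""
    delivered = list(old_path)      # old path delivers in order until the mark
    seen = set(old_path)
    for seq in new_path:            # release packets buffered on the new path
        if seq not in seen:         # discard duplicates of old-path packets
            delivered.append(seq)
            seen.add(seq)
    return delivered
```

For example, if packets 3 through 5 were duplicated onto the new path during handoff, the host still delivers each sequence number exactly once.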

Evidence of genome duplication revealed by sequence analysis of multi-loci expressed sequence tag-simple sequence repeat bands in Panax ginseng Meyer

  • Kim, Nam-Hoon;Choi, Hong-Il;Kim, Kyung Hee;Jang, Woojong;Yang, Tae-Jin
    • Journal of Ginseng Research
    • /
    • v.38 no.2
    • /
    • pp.130-135
    • /
    • 2014
  • Background: Panax ginseng, the most famous medicinal herb, has a highly duplicated genome structure. However, the genome duplication of P. ginseng has not been characterized at the sequence level. Multiple band patterns have been consistently observed during the development of DNA markers using unique sequences in P. ginseng. Methods: We compared the sequences of multiple bands derived from unique expressed sequence tag-simple sequence repeat (EST-SSR) markers to investigate genome duplication at the sequence level. Results: Reamplification and sequencing of the individual bands revealed that, for each marker, two bands around the expected size were genuine amplicons derived from two paralogous loci. In each case, one of the two bands was polymorphic, showing different allelic forms among nine ginseng cultivars, whereas the other band was usually monomorphic. Sequences derived from the two loci showed high similarity, including the same primer-binding site, but each locus could be distinguished by SSR number variations and additional single nucleotide polymorphisms (SNPs) or InDels. A locus-specific marker designed from the SNP site between the paralogous loci produced a single band that also showed clear polymorphism among ginseng cultivars. Conclusion: Our data imply that a recent genome duplication has resulted in two highly similar paralogous regions in the ginseng genome. The two paralogous sequences can be differentiated by large SSR number variations and one or two additional SNPs or InDels in every 100 bp of genic region, which can serve as a reliable identifier for each locus.
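
The core idea, telling the two paralogous amplicons apart by their SSR repeat number, can be sketched roughly as follows. The function names and the motif-run-counting approach are illustrative assumptions, not the authors' analysis pipeline.

```python
import re

def ssr_repeat_count(seq, motif):
    """Count the longest contiguous run of a microsatellite motif in a sequence."""
    runs = re.findall(f"(?:{motif})+", seq)
    return max((len(r) // len(motif) for r in runs), default=0)

def assign_locus(band_seq, motif, locus_repeats):
    """Assign an amplicon to a paralogous locus by its SSR repeat number.
    locus_repeats: dict mapping locus name -> expected repeat count."""
    n = ssr_repeat_count(band_seq, motif)
    return min(locus_repeats, key=lambda loc: abs(locus_repeats[loc] - n))
```

With two hypothetical loci expecting 8 and 3 AT repeats, a band carrying three repeats would be assigned to the short-repeat locus; in practice the one or two diagnostic SNPs/InDels per 100 bp would confirm the call.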

Technical analysis of Cloud storage for Cloud Computing (클라우드 컴퓨팅을 위한 클라우드 스토리지 기술 분석)

  • Park, Jeong-Su;Jung, Sung-Jae;Bae, Yu-Mi;Kyung, Ji-Hun;Sung, Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2012.10a
    • /
    • pp.757-760
    • /
    • 2012
  • Cloud storage, which provides large amounts of data storage and processing, is a key component that cloud computing providers offer. Large vendors (such as Facebook, YouTube, and Google) let users quickly and easily share photos, videos, documents, and other data over the network from heterogeneous devices such as tablets and smartphones, with the data kept in cloud storage. As data continues to grow globally, the cloud storage business model is emerging. This paper analyzes the concepts and technologies of cloud storage services, a new form of network storage, including data manipulation, storage virtualization, data replication and duplication, and security, which are core to cloud computing.


Research on Minimizing Access to RDF Triple Store for Efficiency in Constructing Massive Bibliographic Linked Data (극대용량 서지 링크드 데이터 구축의 효율성을 위한 RDF 트리플 저장소 접근 최소화에 관한 연구)

  • Lee, Moon-Ho;Choi, Sung-Pil
    • Journal of Korean Library and Information Science Society
    • /
    • v.48 no.3
    • /
    • pp.233-257
    • /
    • 2017
  • In this paper, we propose an effective method to convert MEDLINE, the world's largest biomedical bibliographic database, into linked data. To do this, we first derive an appropriate RDF schema by analyzing the MEDLINE record structure in detail, and convert each record into a valid RDF file under the derived schema. We apply a dual batch registration method to streamline the subject-URI duplication check performed when merging all converted RDF files and storing them in a single RDF triple store. Applying this method reduces the number of triple store accesses for subject-URI duplication checks from 26,597,850 to 2,400, compared with constructing the linked data sequentially, one RDF file at a time. We therefore expect the results of this study to eliminate the inefficiency in converting large bibliographic record sets into linked data and to help secure promptness and timeliness.
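
The access-count reduction from batching duplication checks can be illustrated with a toy model. `TripleStore` below is a stand-in that only counts round-trips, not a real RDF store, and the batch size is an assumption; the point is that one membership query per batch replaces one query per file.

```python
class TripleStore:
    """Stand-in for an RDF triple store that counts query round-trips."""
    def __init__(self):
        self.subjects = set()
        self.accesses = 0                       # round-trips to the store

    def known_subjects(self, uris):
        self.accesses += 1                      # one access per (batched) query
        return {u for u in uris if u in self.subjects}

    def register(self, uris):
        self.subjects.update(uris)

def load_per_file(store, files):
    """Sequential loading: one duplication check per RDF file."""
    for uris in files:
        dup = store.known_subjects(uris)
        store.register(set(uris) - dup)

def load_batched(store, files, batch_size):
    """Batched loading: one duplication check per batch of files."""
    batch = []
    for uris in files:
        batch.extend(uris)
        if len(batch) >= batch_size:
            store.register(set(batch) - store.known_subjects(batch))
            batch = []
    if batch:                                   # flush the final partial batch
        store.register(set(batch) - store.known_subjects(batch))
    return store
```

Loading 100 single-subject files costs 100 store accesses per-file but only 2 accesses with a batch size of 50, mirroring (at toy scale) the paper's reduction from millions of accesses to thousands.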

Asymmetric Index Management Scheme for High-capacity Compressed Databases (대용량 압축 데이터베이스를 위한 비대칭 색인 관리 기법)

  • Byun, Si-Woo;Jang, Seok-Woo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.7
    • /
    • pp.293-300
    • /
    • 2016
  • Traditional databases exploit a record-based model, where the attributes of a record are placed contiguously on a slow hard disk to achieve high performance. For read-intensive data analysis systems, on the other hand, the column-based compressed database has become the preferred model because of its superior read performance, and flash-memory SSDs are now widely recognized as the preferred storage media for high-speed analysis systems. This paper introduces a compressed column-storage model and proposes a new index and data management scheme for a high-capacity data warehouse system. The proposed index management scheme is based on asymmetric index duplication and achieves superior search performance using a master index and a compact index, particularly for large read-mostly databases. In addition, the data management scheme contributes to read performance and high reliability by compressing the related columns and replicating them on two mirrored SSDs. Performance evaluation under high workload conditions shows that the proposed scheme outperforms the traditional scheme in terms of search throughput and response time.
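
One plausible reading of the master/compact split can be sketched as follows; the sparse-replica structure and all names here are assumptions for illustration, not the paper's actual design. The full master index answers lookups directly, while a much smaller compact replica on the mirror keeps every k-th key and falls back to a short scan.

```python
import bisect

class AsymmetricIndex:
    """Master index: full key -> location map. Compact index: every k-th key,
    small enough to replicate on a mirrored device."""

    def __init__(self, records, sparsity=4):
        self.records = sorted(records)              # (key, location) pairs
        self.master = dict(self.records)            # full index on the primary
        self.compact = self.records[::sparsity]     # sparse replica on the mirror
        self.sparsity = sparsity

    def lookup(self, key):
        """Fast path: exact hit on the master index."""
        return self.master.get(key)

    def lookup_compact(self, key):
        """Mirror-side lookup: binary-search the sparse keys, then scan at
        most `sparsity` records from the nearest anchor."""
        keys = [k for k, _ in self.compact]
        i = bisect.bisect_right(keys, key) - 1
        if i < 0:
            return None
        start = i * self.sparsity
        for k, loc in self.records[start:start + self.sparsity]:
            if k == key:
                return loc
        return None
```

The compact replica trades a bounded scan for a fourfold smaller footprint (at `sparsity=4`), which is one way a duplicated index can stay cheap enough to mirror for reliability.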

Development of Digital 3D Real Object Duplication System and Process Technology (디지털 3차원 실물복제기 시스템 및 공정기술 개발)

  • Kim D.S.;An Y.J.;Lee W.H.;Choi B.O.;Chang M.H.;Baek Y.J.;Choi K.H.
    • Proceedings of the Korean Society of Precision Engineering Conference
    • /
    • 2005.06a
    • /
    • pp.732-737
    • /
    • 2005
  • The Digital 3D Real Object Duplication System (RODS) consists of a 3D scanner and a Solid Freeform Fabrication System (SFFS). It is a device that makes three-dimensional objects directly from drawings or scan data. In this research, we developed an office-type SFFS based on the Three-Dimensional Printing process and an industrial SFFS using dual lasers. The office-type SFFS applies a sliding mode control with sliding perturbation observer (SMCSPO) algorithm for system control, and we measured process variables such as droplet diameter and powder-bed formation through experiments. In addition, to build large objects more precisely and quickly than the existing SLS process allows, this study applies a new Selective Multi-Laser Sintering (SMLS) process and a 3-axis dynamic focusing scanner that scans a large area instead of the existing $f\theta$ lens. In this process, temperature has a great influence on sintering of the polymer, and laser parameters such as beam power, scan speed, and scan spacing are also considered. This study is in progress to evaluate the effect of the experimental parameters on the sintering process.


A Study on the Development and Maintenance of Embedded SQL based Information Systems (임베디드 SQL 기반 정보시스템의 개발 및 관리 방법에 대한 연구)

  • Song, Yong-Uk
    • The Journal of Information Systems
    • /
    • v.19 no.4
    • /
    • pp.25-49
    • /
    • 2010
  • As companies have introduced ERP (Enterprise Resource Planning) systems since the mid-1990s, corporate databases have become centralized and gigantic. Companies are now developing data-mining applications on these centralized, gigantic databases for knowledge management. Most of them use $Pro^*C$/C++, an embedded SQL programming language, because it is platform-independent and fast. However, they suffer from difficulties in development and maintenance due to the characteristics of corporate databases, which intrinsically have large numbers of tables and fields. The purpose of this research is to design and implement a methodology that makes it easier to develop and maintain embedded SQL applications based on relational databases. First, this article analyzes the syntax of $Pro^*C$/C++ and addresses the concepts of repetition and duplication, which cause the difficulties in developing and maintaining corporate information systems. It then suggests a source code and database management architecture in which a preprocessor generates $Pro^*C$/C++ source code by referring to a DB table specification, solving the problem of repetition and duplication. It also suggests a DB administration architecture in which the preprocessor generates DB administration commands from the same table specification, again eliminating repetition and duplication. The preprocessor, named $PrePro^*C$, was developed in a UNIX command-line environment to preprocess $Pro^*C$/C++ source code and SQL administration commands, and is being updated for use with other DB interface environments such as ODBC and JDBC.
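
The central idea, generating repetitive embedded-SQL code and matching DDL from one shared table specification, can be sketched as follows. The spec format and the templates are illustrative assumptions, not $PrePro^*C$'s actual syntax; the point is that a schema change is made in a single place and both artifacts stay in sync.

```python
# A hypothetical table specification shared by code generation and DB admin.
TABLE_SPEC = {
    "name": "employee",
    "fields": [("emp_id", "NUMBER(6)"), ("emp_name", "VARCHAR2(40)")],
}

def gen_select(spec):
    """Generate a Pro*C-style embedded SELECT with host variables from the spec."""
    cols = ", ".join(name for name, _ in spec["fields"])
    hosts = ", ".join(f":h_{name}" for name, _ in spec["fields"])
    return (f"EXEC SQL SELECT {cols} INTO {hosts} "
            f"FROM {spec['name']} WHERE emp_id = :h_key;")

def gen_create(spec):
    """Generate the matching CREATE TABLE DDL from the same spec."""
    cols = ",\n  ".join(f"{n} {t}" for n, t in spec["fields"])
    return f"CREATE TABLE {spec['name']} (\n  {cols}\n);"
```

Adding a column to `TABLE_SPEC` updates both the generated embedded SQL and the DDL at once, which is the repetition-and-duplication problem the preprocessor addresses.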