• Title/Summary/Keyword: Duplicate Elimination

Efficient Privacy-Preserving Duplicate Elimination in Edge Computing Environment Based on Trusted Execution Environment (신뢰실행환경기반 엣지컴퓨팅 환경에서의 암호문에 대한 효율적 프라이버시 보존 데이터 중복제거)

  • Koo, Dongyoung
    • KIPS Transactions on Computer and Communication Systems / v.11 no.9 / pp.305-316 / 2022
  • With the flood of digital data from the Internet of Things and big data, cloud service providers that process and store vast amounts of data from multiple users can apply duplicate data elimination techniques for efficient data management. The edge computing paradigm, introduced as an extension of cloud computing, improves the user experience by mitigating problems such as network congestion at a central cloud server and reduced computational efficiency. However, adding edge devices that are not entirely trustworthy may increase the computational cost of the additional cryptographic operations needed to preserve data privacy during duplicate identification and elimination. In this paper, we propose an efficiency-improved, privacy-preserving duplicate data elimination protocol with an optimized user-edge-cloud communication framework that utilizes a trusted execution environment. Sharing secret information directly between the user and the central cloud server minimizes the computational burden on edge devices and enables efficient encryption algorithms on the cloud service provider's side. Users also benefit by offloading data to edge devices, which enables duplicate elimination and independent user activity. Experiments show that the proposed scheme achieves up to a 78x improvement in computation during the data outsourcing process compared to a previous study that does not exploit a trusted execution environment in an edge computing architecture.
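As a rough illustration of the general idea in the abstract above: a deterministic tag derived from a secret shared between user and cloud lets the cloud detect duplicates without seeing plaintext. The sketch below omits the TEE attestation and the edge relay entirely; all names (dedup_tag, CloudStore) are hypothetical and this is not the paper's actual protocol.

```python
# Minimal, hypothetical sketch of server-side deduplication keyed by a
# tag computed with a user/cloud shared secret; NOT the paper's protocol.
import hashlib
import hmac
import os

SHARED_KEY = os.urandom(32)  # secret negotiated between user and cloud

def dedup_tag(data: bytes) -> bytes:
    # Deterministic tag: identical plaintexts yield identical tags,
    # so the cloud can detect duplicates without seeing the plaintext.
    return hmac.new(SHARED_KEY, data, hashlib.sha256).digest()

class CloudStore:
    def __init__(self):
        self._blobs = {}  # tag -> ciphertext

    def upload(self, tag: bytes, ciphertext: bytes) -> bool:
        """Returns True if the blob was new, False if deduplicated."""
        if tag in self._blobs:
            return False  # duplicate: record ownership only, store nothing
        self._blobs[tag] = ciphertext
        return True

store = CloudStore()
data = b"sensor reading batch #1"
print(store.upload(dedup_tag(data), b"<ciphertext>"))  # True (first copy)
print(store.upload(dedup_tag(data), b"<ciphertext>"))  # False (duplicate)
```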

Analysis and Elimination of Side Channels during Duplicate Identification in Remote Data Outsourcing (원격 저장소 데이터 아웃소싱에서 발생하는 중복 식별 과정에서의 부채널 분석 및 제거)

  • Koo, Dongyoung
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.4 / pp.981-987 / 2017
  • The proliferation of cloud computing services reduces maintenance and management costs by allowing data to be outsourced to dedicated third-party remote storage. At the same time, the majority of storage service providers have adopted data deduplication for efficient utilization of storage resources. When a hash tree is employed for duplicate identification as part of the deduplication process, the size of the attested data and partial information about the tree can be deduced by eavesdropping. To eliminate these side channels, this paper presents a new duplicate identification method that exploits a multi-set hash function.
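For intuition, here is a minimal sketch of one classic multi-set hash construction (an additive hash over a large prime field). The abstract does not specify the paper's exact function, so the modulus P and the helper names are assumptions for illustration.

```python
# Sketch of an additive multiset hash: constant-size digest, independent
# of element order, so it leaks neither ordering nor tree structure.
import hashlib

P = 2**256 - 189  # a large prime modulus (assumed parameter)

def elem_hash(block: bytes) -> int:
    return int.from_bytes(hashlib.sha256(block).digest(), "big") % P

def multiset_hash(blocks) -> int:
    # Sum of per-block hashes mod P; unlike the levels of a hash tree,
    # the digest reveals no partial structure of the attested data.
    return sum(elem_hash(b) for b in blocks) % P

a = multiset_hash([b"b1", b"b2", b"b3"])
b = multiset_hash([b"b3", b"b1", b"b2"])  # same multiset, different order
assert a == b
```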

A Study on Duplicate Detection Algorithm in Union Catalog (종합목록의 중복레코드 검증을 위한 알고리즘 연구)

  • Cho, Sun-Yeong
    • Journal of the Korean Society for Library and Information Science / v.37 no.4 / pp.69-88 / 2003
  • This study develops a new duplicate detection algorithm to improve database quality. The algorithm analyzes records by language and bibliographic type, and it checks elements within the bibliographic data rather than just MARC fields. It computes a degree of similarity and weight values to avoid eliminating records because of simple input errors. The study was performed on the 7,649 records newly uploaded during the previous year, checked against a master database sample of 210,000 records. The findings show that the new algorithm improved the duplicate recall rate by 36.2%.
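The sketch below illustrates the weighted field-level similarity idea described above. The fields, weights, and threshold are hypothetical values chosen for illustration, not the ones derived in the paper.

```python
# Illustrative weighted similarity scoring for bibliographic records:
# a weighted sum over fields tolerates minor input errors in one field
# instead of eliminating records on a simple exact-match rule.
from difflib import SequenceMatcher

WEIGHTS = {"title": 0.5, "author": 0.2, "publisher": 0.15, "year": 0.15}
THRESHOLD = 0.85  # hypothetical cutoff for flagging duplicates

def field_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def duplicate_score(rec1: dict, rec2: dict) -> float:
    return sum(w * field_sim(rec1[f], rec2[f]) for f, w in WEIGHTS.items())

r1 = {"title": "Data Deduplication", "author": "Kim", "publisher": "KIPS", "year": "2003"}
r2 = {"title": "Data Deduplication.", "author": "Kim", "publisher": "KIPS", "year": "2003"}
print(duplicate_score(r1, r2) >= THRESHOLD)  # True: likely duplicates
```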

A Study on Adaptive Knowledge Automatic Acquisition Model from Case-Based Reasoning System (사례 기반 추론 시스템에서 적응 지식 자동 획득 모델에 관한 연구)

  • 이상범;김영천;이재훈;이성주
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2002.05a / pp.81-86 / 2002
  • In current CBR (Case-Based Reasoning) systems, case adaptation is usually performed by rule-based methods that use rules hand-coded by the system developer, so the CBR system designer faces a knowledge acquisition bottleneck similar to that found in traditional expert system design. In this thesis, I present a model for learning case adaptation knowledge from the case base. The feature differences of each pair of cases are noted and become the antecedent part of an adaptation rule; the differences between the solutions of the compared cases become the consequent part of the rule. However, the number of rules that a learning algorithm could possibly discover is enormous. The first method for finding cases to compare uses a syntactic measure of the distance between cases: the threshold for identifying candidates for comparison is fixed to the maximum number of differences between the target and the retrieved case over all retrievals. The second method uses a similarity metric, since the threshold method may not be an accurate measure. I also suggest a method for eliminating duplicate rules; in the elimination process, a confidence value is assigned to each rule based on its frequency. The learned adaptation rules are then applied to a given target problem. The basic process involves searching for all rules that handle at least one difference, followed by a combination process in which complete solutions are built.
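A simplified sketch of the learning loop the abstract describes: feature differences form the antecedent, the solution difference forms the consequent, and duplicate rules collapse into a single rule with a frequency-based confidence. The case representation here is hypothetical.

```python
# Learning adaptation rules from case pairs, then eliminating duplicates
# by counting: confidence = frequency share of each distinct rule.
from collections import Counter

def rule_from_pair(case_a, case_b):
    # Antecedent: which features differ; consequent: solution difference.
    antecedent = tuple(sorted(
        f for f in case_a["features"]
        if case_a["features"][f] != case_b["features"][f]))
    consequent = case_b["solution"] - case_a["solution"]
    return (antecedent, consequent)

def learn_rules(case_pairs):
    counts = Counter(rule_from_pair(a, b) for a, b in case_pairs)
    total = sum(counts.values())
    # Duplicate rules collapse into one entry with a confidence value.
    return [{"if": ant, "then": cons, "confidence": n / total}
            for (ant, cons), n in counts.items()]

c1 = {"features": {"rooms": 3, "area": 80}, "solution": 200}
c2 = {"features": {"rooms": 4, "area": 80}, "solution": 230}
print(learn_rules([(c1, c2), (c1, c2)]))
```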

Cost Benefit Analysis of Foreign Research Information Centers (외국학술지지원센터 운영 사업의 비용편익분석)

  • Kim, Kwang-Seok;Oh, Dong-Geun;Yeo, Ji-Suk
    • Journal of Korean Library and Information Science Society / v.43 no.1 / pp.287-301 / 2012
  • This article analyzes the costs and benefits of the seven individual centers of the Foreign Research Information Center. The investment feasibility analysis over a 30-year time span shows a BCR (Benefit-Cost Ratio) of 0.99, an IRR (Internal Rate of Return) of 5.49%, and an NPV (Net Present Value) of -507 million Won. Sensitivity analysis suggests that the BCR can be influenced by journal usage, the social rate of discount, and the elimination of journals duplicated among the individual centers.
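For readers unfamiliar with the three indicators, the sketch below computes NPV, BCR, and IRR on hypothetical cash flows; it does not reproduce the centers' actual cost and benefit data, and the discount rate is an assumption.

```python
# Worked sketch of the three reported indicators on made-up cash flows.

def npv(rate, cash_flows):
    # Net Present Value: sum of discounted net cash flows per year.
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def bcr(rate, benefits, costs):
    # Benefit-Cost Ratio: PV of benefits over PV of costs
    # (BCR < 1 means discounted costs exceed discounted benefits).
    pv_b = sum(b / (1 + rate) ** t for t, b in enumerate(benefits))
    pv_c = sum(c / (1 + rate) ** t for t, c in enumerate(costs))
    return pv_b / pv_c

def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-9):
    # Internal Rate of Return via bisection: the rate where NPV = 0
    # (valid for conventional streams: outflow first, inflows after).
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

benefits = [0] + [95] * 30   # hypothetical annual benefits over 30 years
costs = [600] + [30] * 30    # hypothetical initial + running costs
net = [b - c for b, c in zip(benefits, costs)]
print(bcr(0.055, benefits, costs), irr(net), npv(0.055, net))
```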

A Study on the Connection between the Construction CALS System and the Road Fine System (건설CALS시스템과 과태료부과시스템의 연계방안 연구)

  • Kim, Tae-Hak;Ju, Ki-Beom
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.12 / pp.6111-6117 / 2012
  • This paper aims to relieve the reporting burden on the business representatives responsible for enforcement, to ensure transparency, and to contribute to eliminating irregularities in the crackdown on overloaded vehicles. To avoid duplicate entry of inspection information, automatic collection equipment is installed at fixed checkpoints; when the collected information matches an existing enforcement record, it is passed to the Road Fine system through a linkage with the Construction CALS system. The regional land management office confirms the enforcement information, imposes the fine, and then tracks the progress of the work. This study was carried out so that enforcement information on overloading can be gathered efficiently and with improved accuracy in the future.

Prediction Survey on Construction Guarantee Market Due to the Restructuring of the Construction Industry's Production System (건설산업 생산체계 개편에 따른 건설보증시장 변화 예측 조사)

  • Kim, Sungil;Chang, Chulki;Yoo, Hyunji
    • Korean Journal of Construction Engineering and Management / v.22 no.1 / pp.63-71 / 2021
  • The construction guarantee market is a downstream market affected by changes in the construction industry and market. As the restructuring of the construction industry's production system proceeds, including the abolition of the regulation separating the business fields of general and specialty contractors, major changes are expected not only in the construction market but also in the construction guarantee market. The construction guarantee market currently has three Contractor Financial Cooperatives, divided by business field and business type (General, Specialty, and Plant & Mechanical). Abolishing the business field regulation will affect the construction guarantee market in various ways, such as creating a competitive structure among the three cooperatives, so the role of the construction guarantee institutions must also change. This study predicted the changes in the construction guarantee market after the restructuring of the construction industry and analyzed the ripple effects on the market. It reviewed the details of the reorganization plan for the construction industry and the policies and statistical data related to construction guarantees, and it surveyed the members of the three Contractor Financial Cooperatives to analyze their intended behavior in the future guarantee market. Based on the results, members of both the General and the Specialty Contractor Financial Cooperatives are unwilling to leave their existing institutions, but many are expected to use other institutions in duplicate. Members of the Plant & Mechanical Contractor Financial Cooperative are the most willing to use other guarantee institutions.

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • For a long time, many academic studies have investigated predicting the success of customer campaigns, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded due to the rapid growth of online business, companies carry out campaigns of various types at a scale that cannot be compared to the past. However, customers increasingly perceive campaigns as spam as fatigue from duplicate exposure grows. From a corporate standpoint, the effectiveness of campaigns is also decreasing while the cost of investing in them rises, which leads to a low actual campaign success rate. Accordingly, various studies are ongoing to improve campaign effectiveness in practice. A campaign system has the ultimate purpose of increasing the success rate of campaigns by collecting and analyzing various customer-related data and using them for campaigns; in particular, recent attempts have been made to predict campaign responses using machine learning. Selecting appropriate features is very important because campaign data have many features. If all input data are used when classifying a large amount of data, learning takes a long time as the number of classes grows, so a minimal input data set must be extracted from the entire data. In addition, when a model is trained with too many features, prediction accuracy may be degraded by overfitting or by correlation between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step in analyzing a high-dimensional data set. Among greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), and SFFS (Sequential Floating Forward Selection) are widely used as traditional feature selection techniques, but when there are many features they suffer from poor classification performance and long learning times. Therefore, this study proposes an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The purpose of this study is to improve the existing sequential SFFS method by using the statistical characteristics of the data processed in the campaign system while searching for the feature subsets that underlie machine learning model performance. Features that strongly influence performance are derived first, features with a negative effect are removed, and the sequential method is then applied, making the search more efficient and the resulting predictions more general. The proposed model showed better search and prediction performance than the traditional greedy algorithm: campaign success prediction was higher than with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, the improved feature selection algorithm was found to help analyze and interpret prediction results by providing the importance of the derived features. These include features such as age, customer rating, and sales, which were already known to be statistically important. Unlike before, features that campaign planners rarely used to select targets, such as the combined product name, the average data consumption rate over three months, and wireless data usage in the last three months, were unexpectedly selected as important features for campaign response. It was confirmed that base attributes can also be very important features depending on the type of campaign, making it possible to analyze and understand the important characteristics of each campaign type.
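As a rough sketch of the general approach described above (statistical pre-ranking followed by a sequential forward search), the code below pre-filters features by univariate F-score and then runs a plain SFS. It illustrates the idea, not the paper's exact algorithm; the data set, cutoff, and classifier are all assumptions.

```python
# Statistical pre-filter + sequential forward selection (SFS) sketch:
# drop the weakest features first, then greedily add survivors while
# cross-validated accuracy keeps improving.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Step 1: univariate F-scores stand in for the "statistical
# characteristics"; keep the top half so the search starts with less noise.
scores, _ = f_classif(X, y)
keep = np.argsort(scores)[-10:]

# Step 2: plain SFS over the surviving features.
selected, remaining, best = [], list(keep), 0.0
while remaining:
    trial = {f: cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, selected + [f]], y, cv=5).mean()
             for f in remaining}
    f_star = max(trial, key=trial.get)
    if trial[f_star] <= best:
        break  # no candidate improves accuracy: stop searching
    best = trial[f_star]
    selected.append(f_star)
    remaining.remove(f_star)

print("selected features:", selected, "cv accuracy: %.3f" % best)
```

The pre-filter is what makes the sequential stage cheap: each SFS round costs one cross-validation per surviving feature, so halving the candidate pool roughly halves the search time.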