• Title/Summary/Keyword: Data Anonymization


Hybrid Recommendation Algorithm for User Satisfaction-oriented Privacy Model

  • Sun, Yinggang;Zhang, Hongguo;Zhang, Luogang;Ma, Chao;Huang, Hai;Zhan, Dongyang;Qu, Jiaxing
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.10, pp.3419-3437, 2022
  • Anonymization is an important technology for protecting privacy during data release. Before publishing data, the data publisher usually anonymizes the original data and then publishes the anonymized version. However, for data publishers with little or no background in anonymization techniques, configuring appropriate parameters for data with different characteristics is a difficult problem. To address this, this paper adds a historical configuration scheme resource pool to the traditional anonymization process, from which configuration parameters can be recommended automatically. On this basis, a user satisfaction-oriented hybrid recommendation algorithm for privacy models is formed. The algorithm includes a forward recommendation process and a reverse recommendation process, which respectively serve users with different levels of anonymization expertise. The algorithm described in this paper suits a wider range of users and provides a simpler, more efficient, and automated solution for data anonymization. It reduces data processing time and improves the quality of the anonymized data, which enhances data protection capabilities.
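The basic idea of recommending parameters from a historical configuration pool can be sketched as a nearest-neighbour lookup. This is a minimal illustration under stated assumptions. The names (`HistoricalScheme`, `recommend`), the normalized feature vector, and the satisfaction threshold are all hypothetical. It is not the paper's hybrid algorithm and does not model its forward and reverse processes.

```python
import math
from dataclasses import dataclass


@dataclass
class HistoricalScheme:
    # Hypothetical pool entry: normalized summary features of a past dataset
    # (e.g. size, number of quasi-identifiers, share of sensitive attributes),
    # the privacy model and parameters used, and the user's satisfaction (0..1).
    features: tuple
    model: str
    params: dict
    satisfaction: float


def recommend(features, pool, min_satisfaction=0.5):
    """Return (model, params) from the most similar past dataset whose
    satisfaction meets the threshold, using Euclidean nearest neighbour."""
    candidates = [s for s in pool if s.satisfaction >= min_satisfaction]
    if not candidates:
        return None  # nothing trustworthy to recommend; configure manually
    best = min(candidates, key=lambda s: math.dist(features, s.features))
    return best.model, best.params


pool = [
    HistoricalScheme((0.10, 0.30, 0.20), "k-anonymity", {"k": 5}, 0.9),
    HistoricalScheme((0.80, 0.80, 0.60), "l-diversity", {"k": 10, "l": 3}, 0.8),
    HistoricalScheme((0.12, 0.30, 0.20), "k-anonymity", {"k": 2}, 0.3),
]
print(recommend((0.11, 0.30, 0.20), pool))  # ('k-anonymity', {'k': 5})
```

The low-satisfaction k=2 entry is filtered out even though it is just as close. The satisfaction threshold is how "user satisfaction" enters this toy version.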

ShareSafe: An Improved Version of SecGraph

  • Tang, Kaiyu;Han, Meng;Gu, Qinchen;Zhou, Anni;Beyah, Raheem;Ji, Shouling
    • KSII Transactions on Internet and Information Systems (TIIS), v.13 no.11, pp.5731-5754, 2019
  • In this paper, we redesign, implement, and evaluate ShareSafe (based on SecGraph), an open-source platform for secure graph data sharing and publishing. Within ShareSafe, we propose a De-anonymization Quantification Module and a Recommendation Module. We also model the attackers' background knowledge and evaluate the relationship between graph data privacy and graph structure. To the best of our knowledge, ShareSafe is the first platform that enables users to perform data perturbation, utility evaluation, de-anonymization (De-A) evaluation, and Privacy Quantification. Leveraging ShareSafe, we conduct a more comprehensive and advanced evaluation of utility and privacy. The results demonstrate four findings. (1) The risk of privacy leakage from an anonymized graph increases with the attackers' background knowledge. (2) For a successful de-anonymization attack, the seed mapping, even a relatively small one, plays a much more important role than the auxiliary graph. (3) The structure of a graph has a fundamental and significant effect on its utility and privacy. (4) There is no optimal anonymization/de-anonymization algorithm; the performance of each algorithm varies across environments.

Re-anonymization Technique for Dynamic Data Using Decision Tree Based Machine Learning (결정트리 기반의 기계학습을 이용한 동적 데이터에 대한 재익명화기법)

  • Kim, Young Ki;Hong, Choong Seon
    • Journal of KIISE, v.44 no.1, pp.21-26, 2017
  • In recent years, new technologies such as the Internet of Things, cloud computing, and big data have come into wide use, and the types and amount of data are increasing dramatically. This makes security an important issue in terms of the leakage of sensitive personal information. To protect confidential information, a method called anonymization is used before the data is distributed and shared. It removes personal identification elements or substitutes them with symbols. However, the existing method performs anonymization by generalizing quasi-identifiers along a hierarchy. When records are added to or removed from the data table and k-anonymity is no longer satisfied, a higher level of generalization is required. This process inevitably loses information, which is one of the factors that hinder the utility of the data. In this paper, we propose a novel anonymization technique that uses decision tree-based machine learning to improve data utility by minimizing information loss.
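The problem described above can be seen in a small full-domain generalization sketch: a single inserted record breaks k-anonymity and forces coarser generalization. The two-attribute table, the age and postal-code hierarchies, and the helper names are illustrative assumptions. This shows the baseline behaviour the paper sets out to improve, not its decision-tree technique.

```python
from collections import Counter


def generalize_age(age, level):
    # Hypothetical hierarchy: 0 = exact age, 1 = 10-year band,
    # 2 = 20-year band, 3+ = fully suppressed ("*").
    if level == 0:
        return str(age)
    if level >= 3:
        return "*"
    width = 10 * level
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zipcode, level):
    # Mask the last `level` digits of the postal code.
    keep = max(len(zipcode) - level, 0)
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


def is_k_anonymous(records, level, k):
    # Every combination of generalized quasi-identifiers must occur >= k times.
    classes = Counter(
        (generalize_age(age, level), generalize_zip(z, level)) for age, z in records
    )
    return min(classes.values()) >= k


def min_level(records, k, max_level=3):
    """Lowest uniform generalization level that makes the table k-anonymous."""
    for level in range(max_level + 1):
        if is_k_anonymous(records, level, k):
            return level
    return None


table = [(23, "13053"), (27, "13058"), (35, "14850"), (38, "14853")]
print(min_level(table, k=2))                    # 1
print(min_level(table + [(45, "14852")], k=2))  # 3
```

Adding one record pushes the whole table from level 1 to level 3, suppressing every age. This is the kind of information loss the abstract attributes to hierarchy-based re-anonymization.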

A Study on Service-based Secure Anonymization for Data Utility Enhancement (데이터 유용성 향상을 위한 서비스 기반의 안전한 익명화 기법 연구)

  • Hwang, Chikwang;Choe, Jongwon;Hong, Choong Seon
    • Journal of KIISE, v.42 no.5, pp.681-689, 2015
  • Personal information is information about a living individual that can identify that person through a name, resident registration number, image, etc. Personal information collected by institutions can be misused because it contains confidential information about the data subject. To prevent this, personal identification elements are removed before the data is distributed and shared. However, even when identifiers such as the name and resident registration number are removed or changed, personal information can still be exposed through a linking attack. This paper proposes a new anonymization technique to enhance data utility. To achieve this, attributes that are used by the service are anonymized at a low level. In addition, the proposed technique can produce two or more anonymized data tables from one original data table without exposure to linking attacks. We also verify our proposal using cooperative game theory.
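A linking attack works even after direct identifiers are removed, because the remaining quasi-identifiers can be joined with another table that still carries names. Below is a minimal sketch with made-up records and hypothetical field names (`zip`, `birth_year`, `sex`, `diagnosis`, `name`). It illustrates the attack only, not the paper's service-based defense.

```python
def linking_attack(released, external, quasi_ids, sensitive):
    """Join a de-identified table with a named public table on quasi-identifiers
    and return (name, sensitive value) for rows that match exactly one person."""
    index = {}
    for person in external:
        key = tuple(person[q] for q in quasi_ids)
        index.setdefault(key, []).append(person["name"])
    reidentified = []
    for row in released:
        names = index.get(tuple(row[q] for q in quasi_ids), [])
        if len(names) == 1:  # a unique match re-identifies the record
            reidentified.append((names[0], row[sensitive]))
    return reidentified


# Names and resident registration numbers were removed from the released table.
released = [
    {"zip": "13053", "birth_year": 1985, "sex": "F", "diagnosis": "flu"},
    {"zip": "13053", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
]
# Hypothetical public list that still carries names.
external = [
    {"name": "Alice", "zip": "13053", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "zip": "13053", "birth_year": 1990, "sex": "M"},
    {"name": "Carl", "zip": "13053", "birth_year": 1990, "sex": "M"},
]
print(linking_attack(released, external, ["zip", "birth_year", "sex"], "diagnosis"))
# [('Alice', 'flu')]
```

The second released row escapes only because two people share its quasi-identifiers. That is the intuition behind requiring each quasi-identifier combination to be shared by several records.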

Anonymizing Graphs Against Weight-based Attacks with Community Preservation

  • Li, Yidong;Shen, Hong
    • Journal of Computing Science and Engineering, v.5 no.3, pp.197-209, 2011
  • The increasing popularity of graph data, such as social and online communities, has opened a prolific research area in knowledge discovery and data mining. As more real-world graphs are released publicly, there is growing concern about privacy breaches for the entities involved. An adversary may reveal the identities of individuals in a published graph by using the topological structure and/or basic graph properties as background knowledge. However, many previous studies addressing attacks such as identity disclosure concentrate on preserving privacy in simple graphs only. In this paper, we consider the identity disclosure problem in weighted graphs. The motivation is that a weighted graph can carry much more unique information than its simple version, which makes disclosure easier. We first formalize a general anonymization model to deal with weight-based attacks. We then discuss two concrete attacks based on the weight properties of a graph: the sum and the set of adjacent weights of each vertex. We also propose a complete solution to the weight anonymization problem that protects a graph against both attacks. In addition, we investigate the impact of the proposed methods on community detection, a very popular application in graph mining. Our approaches are efficient and practical, and have been validated by extensive experiments on both synthetic and real-world datasets.
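The two weight-based attacks named above can be illustrated by computing signatures for each vertex and counting how many vertices they single out. The signatures are the sum and the set of the vertex's adjacent edge weights. The toy graph and function names are illustrative assumptions. Here the "set" of adjacent weights is modelled as a sorted tuple so that repeated weights count. The paper's anonymization solution is not reproduced.

```python
from collections import Counter, defaultdict


def vertex_signatures(edges):
    """Background knowledge an adversary might hold about each vertex:
    degree, sum of adjacent weights, and the multiset of adjacent weights."""
    adjacent = defaultdict(list)
    for u, v, w in edges:
        adjacent[u].append(w)
        adjacent[v].append(w)
    degree = {v: len(ws) for v, ws in adjacent.items()}
    weight_sum = {v: sum(ws) for v, ws in adjacent.items()}
    weight_set = {v: tuple(sorted(ws)) for v, ws in adjacent.items()}
    return degree, weight_sum, weight_set


def uniquely_identified(signature):
    # A vertex is re-identifiable if no other vertex shares its signature.
    counts = Counter(signature.values())
    return sorted(v for v, s in signature.items() if counts[s] == 1)


# A 4-cycle: structurally every vertex looks the same, but weights differ.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3), ("d", "a", 4)]
degree, weight_sum, weight_set = vertex_signatures(edges)
print(uniquely_identified(degree))      # []
print(uniquely_identified(weight_sum))  # ['b', 'd']
print(uniquely_identified(weight_set))  # ['a', 'b', 'c', 'd']
```

In this toy graph, plain degree reveals nothing, the weight sum exposes two vertices, and the weight set exposes all four. This matches the abstract's point that weights add identifying information.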

Enhanced Hybrid Privacy Preserving Data Mining Technique

  • Kundeti Naga Prasanthi;M V P Chandra Sekhara Rao;Ch Sudha Sree;P Seshu Babu
    • International Journal of Computer Science & Network Security, v.23 no.6, pp.99-106, 2023
  • Nowadays, large volumes of data are accumulating in every field due to the increasing capacity of storage devices. Data mining can be applied to these data to find useful patterns that support business growth, better services, improved health conditions, etc. Data from different sources can be combined before data mining is applied. Data gathered this way can be misused for identity theft, fraudulent credit/debit card transactions, etc. To overcome this, data mining techniques that preserve privacy are required. The literature offers several privacy-preserving data mining techniques, such as randomization, perturbation, and anonymization. This paper proposes an Enhanced Hybrid Privacy Preserving Data Mining (EHPPDM) technique. The proposed technique provides more data privacy than existing techniques while achieving better classification accuracy. The experimental results show that classification accuracy increases with the EHPPDM technique.

Development of a Privacy-Preserving Big Data Publishing System in Hadoop Distributed Computing Environments (하둡 분산 환경 기반 프라이버시 보호 빅 데이터 배포 시스템 개발)

  • Kim, Dae-Ho;Kim, Jong Wook
    • Journal of Korea Multimedia Society, v.20 no.11, pp.1785-1792, 2017
  • Big data generally contains sensitive information about individuals, so releasing it directly for public use may violate existing privacy requirements. Privacy-preserving data publishing (PPDP) has therefore been actively researched. Its goal is to share big data containing personal information for public use while protecting individuals' privacy with minimal data modification. Recently, with increasing demand for big data sharing in various areas, interest has also grown in developing software that supports privacy-preserving data publishing. In this paper, we develop a system that aims to support privacy-preserving data publishing effectively and efficiently. In particular, the system enables data owners to select an appropriate anonymization level by providing them with an information loss matrix. Furthermore, the system achieves high performance in data anonymization by using distributed Hadoop clusters.
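The abstract does not say how the information loss matrix is computed. A minimal sketch, assuming a normalized-certainty-penalty-style measure, might build one row per attribute and one column per candidate generalization level (bucket width). The measure, the widths, and the function names are assumptions. The sketch runs on a single machine and does not model the Hadoop distribution.

```python
from collections import defaultdict


def ncp_for_width(values, width):
    """Average normalized certainty penalty when values are bucketed by a fixed
    width and each value is replaced by its bucket's [min, max] span.
    0 means no generalization; 1 means every value covers the whole range."""
    span = max(values) - min(values)
    if span == 0:
        return 0.0
    buckets = defaultdict(list)
    for v in values:
        buckets[v // width].append(v)
    penalty = sum((max(b) - min(b)) * len(b) for b in buckets.values())
    return penalty / (len(values) * span)


def loss_matrix(table, widths):
    """Rows: attributes; columns: candidate generalization levels (bucket widths)."""
    return {
        attr: [round(ncp_for_width(values, w), 3) for w in widths[attr]]
        for attr, values in table.items()
    }


table = {"age": [23, 27, 35, 38, 45]}
print(loss_matrix(table, {"age": [1, 10, 20, 50]}))
# {'age': [0.0, 0.127, 0.545, 1.0]}
```

A data owner reading such a matrix can see the loss jump sharply between the 10- and 20-year levels before choosing one. This is the kind of trade-off the abstract says the system exposes.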

AMV: A k-anonymization technique minimizing the cloaking region (AMV: 클로킹 영역을 최소화하는 k-익명화 기법)

  • Song, Doohee;Heo, Minjae;Sim, Jongwon;Hwang, Sori;Song, Moonbae;Park, Kwangjin
    • Journal of Internet Computing and Services, v.15 no.6, pp.9-14, 2014
  • In this paper, we propose AMV, a scheme that supports k-anonymization for mobile clients by using motion vectors. AMV produces the minimal cloaking area using the motion vector information of users (clients). The main reason for minimizing the cloaking area is that the server has to send object information to all users who issue spatial queries. The experimental results show that AMV outperforms existing methods.
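A simple baseline shows why the cloaking area matters. Such a baseline computes the rectangle covering the requester and its k-1 nearest users. A larger rectangle typically means more object information for the server to return. This sketch uses hypothetical names and static 2-D positions. It does not use motion vectors and is not the AMV scheme.

```python
import math


def knn_cloak(requester, others, k):
    """Smallest axis-aligned rectangle covering the requester and its k-1
    nearest users: a simple k-anonymous cloaking baseline (not AMV)."""
    if k < 1 or len(others) < k - 1:
        raise ValueError("not enough users to build a k-anonymous cloak")
    nearest = sorted(others, key=lambda p: math.dist(requester, p))[: k - 1]
    points = [requester, *nearest]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def area(rect):
    x0, y0, x1, y1 = rect
    return (x1 - x0) * (y1 - y0)


users = [(1, 1), (2, 0), (10, 10), (-1, 2)]
region = knn_cloak((0, 0), users, k=3)
print(region, area(region))  # (0, 0, 2, 1) 2
```

The baseline only considers current positions. Using movement information, as AMV does with motion vectors, is the abstract's route to a smaller region.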

OHDSI OMOP-CDM Database Security Weakness and Countermeasures (OHDSI OMOP-CDM 데이터베이스 보안 취약점 및 대응방안)

  • Lee, Kyung-Hwan;Jang, Seong-Yong
    • Journal of Information Technology Services, v.21 no.4, pp.63-74, 2022
  • Researchers at medical institutions around the world are actively sharing patient cohort data to develop vaccines and treatments to overcome the COVID-19 crisis. OMOP-CDM is a common data model for efficiently sharing medical research data held independently by individual medical institutions, and it contains patient personal information (e.g., PII, PHI). Medical institutions de-identify or anonymize PII and PHI before managing and sharing them, but complete de-identification and anonymization cannot be 100% guaranteed. For this reason, the security of the OMOP-CDM database is important. However, no detailed, OMOP-CDM-specific security inspection tool exists, so risk mitigation relies on general-purpose security inspection tools. This study analyzes US database security guidelines and the NIST security controls for personal information protection. On that basis, it presents a model for implementing a tool that checks OMOP-CDM for security vulnerabilities. It also verifies the feasibility of the implementation through a field demonstration in three operating hospitals. Checking the security status of a test server and the CDM databases of the three hospitals showed that most database audit and encryption functions were insufficient. These inspection results were applied to optimizing the complex and time-consuming CDM CSF developed in the Korea Health Industry Promotion Agency's "Development of Security Framework Required for CDM-based Distributed Research" task. According to several recent newspaper articles, ransomware attacks on financially large hospitals are intensifying. Organizations that currently operate, or will operate, CDM databases need to install database auditing (proofing) and encryption (data protection) to prevent attackers from compromising them, since the OMOP-CDM database template does not provide these.

Assessing the Impact of Defacing Algorithms on Brain Volumetry Accuracy in MRI Analyses

  • Dong-Woo Ryu;ChungHwee Lee;Hyuk-je Lee;Yong S Shim;Yun Jeong Hong;Jung Hee Cho;Seonggyu Kim;Jong-Min Lee;Dong Won Yang
    • Dementia and Neurocognitive Disorders, v.23 no.3, pp.127-135, 2024
  • Background and Purpose: To ensure data privacy, the development of defacing processes, which anonymize brain images by obscuring facial features, is crucial. However, the impact of these defacing methods on brain imaging analysis raises significant concern. This study aimed to evaluate the reliability of three different defacing methods in automated brain volumetry. Methods: Magnetic resonance imaging with three-dimensional T1 sequences was performed on ten patients diagnosed with subjective cognitive decline. Defacing was performed using mri_deface, BioImage Suite Web-based defacing, and Defacer. Brain volumes were measured with the QBraVo program and FreeSurfer. The intraclass correlation coefficient (ICC) and the mean differences in brain volume between the original and defaced images were then assessed. Results: The mean age of the patients was 71.10±6.17 years, and 4 (40.0%) were male. The total intracranial volume, total brain volume, and ventricle volume showed high ICCs across the three defacing methods and the two volumetry analyses. All regional brain volumes showed high ICCs with all three defacing methods. Despite variations in some brain regions, no significant mean difference in regional brain volume was observed between the original and defaced images in any region. Conclusions: The three defacing algorithms evaluated did not significantly affect the results of image analysis for the entire brain or for specific cerebral regions. These findings suggest that these algorithms can serve as robust defacing methods in neuroimaging analysis, supporting data anonymization without compromising the integrity of brain volume measurements.
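The ICC and mean-difference comparison described in the Methods can be illustrated with small pure-Python functions. The abstract does not say which ICC form was used. This sketch therefore assumes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement). The volumes below are made up and are not the study's data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    `ratings` has one row per subject with k measurements each
    (e.g. [volume on original image, volume on defaced image])."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between conditions
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def mean_difference(ratings):
    # Average of (defaced - original); values near 0 suggest no systematic bias.
    return sum(row[1] - row[0] for row in ratings) / len(ratings)


# Made-up volumes (mL) for illustration only; not data from the study.
volumes = [[1010.0, 1008.0], [1150.0, 1152.0], [980.0, 979.0], [1205.0, 1207.0]]
print(round(icc_2_1(volumes), 4), mean_difference(volumes))
```

Under absolute agreement, a constant offset between original and defaced measurements lowers the ICC. A systematic bias introduced by defacing would therefore show up both in the ICC and in the mean difference.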