• Title/Abstract/Keyword: Data Anonymization


Hybrid Recommendation Algorithm for User Satisfaction-oriented Privacy Model

  • Sun, Yinggang;Zhang, Hongguo;Zhang, Luogang;Ma, Chao;Huang, Hai;Zhan, Dongyang;Qu, Jiaxing
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol.16 No.10 / pp.3419-3437 / 2022
  • Anonymization is an important technology for protecting privacy when data is released. Usually, before publishing data, the data publisher must anonymize the original data and then publish the anonymized result. However, for data publishers with little or no background in anonymization techniques, configuring appropriate parameters for data with different characteristics is difficult. To address this problem, this paper adds a resource pool of historical configuration schemes to the traditional anonymization process, from which configuration parameters can be recommended automatically. On this basis, a user satisfaction-oriented hybrid recommendation algorithm for privacy models is formed. The algorithm includes a forward recommendation process and a reverse recommendation process, which anonymize data for users with different levels of anonymization expertise. The proposed algorithm is suitable for a wider population, providing a simpler, more efficient, and automated solution for data anonymization that reduces data processing time, improves the quality of anonymized data, and enhances data protection capabilities.
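The core idea of a historical configuration pool can be sketched as nearest-neighbour matching on dataset characteristics. This is a minimal illustration, not the paper's algorithm; the feature choices and parameter values below are invented for the example.

```python
# Minimal sketch (not the paper's algorithm): recommend anonymization
# parameters from a pool of historical configurations by matching on
# simple dataset characteristics. All names and values are illustrative.
import math

# Each historical entry: (dataset features, parameters that worked well),
# where features = (record_count, quasi_identifier_count).
history = [
    ((10_000, 3), {"k": 5}),
    ((500_000, 8), {"k": 20}),
    ((2_000, 2), {"k": 3}),
]

def recommend(features, pool):
    """Return the parameters of the most similar historical dataset."""
    def dist(a, b):
        # Log-scale the record count so it does not dominate the distance.
        return math.hypot(math.log10(a[0]) - math.log10(b[0]), a[1] - b[1])
    best = min(pool, key=lambda entry: dist(features, entry[0]))
    return best[1]

print(recommend((8_000, 3), history))  # → {'k': 5}, the nearest pool entry
```

A real system would use richer features (attribute sensitivity, value distributions) and would update the pool with user feedback, which is where the forward/reverse recommendation processes come in.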

ShareSafe: An Improved Version of SecGraph

  • Tang, Kaiyu;Han, Meng;Gu, Qinchen;Zhou, Anni;Beyah, Raheem;Ji, Shouling
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol.13 No.11 / pp.5731-5754 / 2019
  • In this paper, we redesign, implement, and evaluate ShareSafe (based on SecGraph), an open-source secure graph data sharing/publishing platform. Within ShareSafe, we propose a De-anonymization Quantification Module and a Recommendation Module. Besides, we model the attackers' background knowledge and evaluate the relation between graph data privacy and the structure of the graph. To the best of our knowledge, ShareSafe is the first platform that enables users to perform data perturbation, utility evaluation, de-anonymization evaluation, and privacy quantification. Leveraging ShareSafe, we conduct a more comprehensive and advanced utility and privacy evaluation. The results demonstrate that (1) the risk of privacy leakage of an anonymized graph increases with the attackers' background knowledge; (2) for a successful de-anonymization attack, the seed mapping, even if relatively small, plays a much more important role than the auxiliary graph; (3) the structure of the graph has a fundamental and significant effect on its utility and privacy; and (4) there is no universally optimal anonymization/de-anonymization algorithm, as the performance of each algorithm varies across environments.

Re-anonymization Technique for Dynamic Data Using Decision Tree Based Machine Learning

  • 김영기;홍충선
    • Journal of KIISE / Vol.44 No.1 / pp.21-26 / 2017
  • With the introduction of new technologies such as the Internet of Things, cloud computing, and big data, the variety and volume of data being processed have grown, and the security issue of leaking individuals' sensitive information has become increasingly important. To protect sensitive information, anonymization techniques are used that delete part of the personal information contained in data, or transform it into an unrecognizable form, before the data is released or distributed. However, existing methods that anonymize by building generalization hierarchies over quasi-identifiers require a higher generalization level when records are added to or deleted from the data table and k-anonymity is no longer satisfied. The information loss caused by this process is unavoidable and degrades data utility. This paper therefore proposes an anonymization technique that applies decision tree based machine learning to minimize the information loss of existing anonymization methods and thereby improve data utility.
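The k-anonymity property that the abstract builds on can be checked with a few lines: every combination of quasi-identifier values must occur in at least k records, otherwise further generalization is needed. This is a generic illustration of the property, not the paper's re-anonymization method; the table values are invented.

```python
# Check k-anonymity: every quasi-identifier combination must cover >= k records.
from collections import Counter

def satisfies_k_anonymity(records, quasi_ids, k):
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

# Already-generalized table (age ranges, masked ZIP codes).
table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

# False: the ("40-49", "148**") group has only one record, so a record
# insertion/deletion like this forces a higher generalization level.
print(satisfies_k_anonymity(table, ["age", "zip"], 2))
```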

A Study on Service-based Secure Anonymization for Data Utility Enhancement

  • 황치광;최종원;홍충선
    • Journal of KIISE / Vol.42 No.5 / pp.681-689 / 2015
  • Personal information is information about a living individual through which that individual can be identified, such as a name, resident registration number, or image. Personal information containing a data subject's sensitive attributes can be abused for various crimes if leaked. To prevent this, personally identifying elements are removed before data is published or distributed. However, even if identifiers such as names and resident registration numbers are deleted or altered to restrict disclosure, personal information can still be exposed by linking and analyzing the data together with other datasets. To solve this problem, this paper proposes an anonymization technique that applies a lower level of anonymization to the attributes used by a service, increasing the utility of the information that is actually used, while also preventing linkage attacks, so that two or more anonymized tables can be provided simultaneously from a single original data table. Experiments based on cooperative game theory demonstrate the superiority of the proposal.
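The idea of service-specific anonymization levels can be illustrated by deriving two views of one table, each masking the service-relevant attribute less aggressively than the rest. This is an invented sketch of the concept, not the paper's scheme; the helper and data are hypothetical.

```python
# Sketch: two anonymized views of one table, each preserving utility
# for a different service attribute. All names and values are illustrative.
def generalize_zip(zipcode, level):
    """Replace the last `level` digits of a ZIP code with '*'."""
    return zipcode[: len(zipcode) - level] + "*" * level

original = [{"zip": "13488", "age": 34}, {"zip": "13492", "age": 41}]

# View A: ZIP is the service attribute, so it keeps more detail; age is suppressed.
view_a = [{"zip": generalize_zip(r["zip"], 2), "age": "*"} for r in original]

# View B: age is the service attribute (kept as a decade range); ZIP is masked heavily.
view_b = [{"zip": generalize_zip(r["zip"], 4), "age": r["age"] // 10 * 10} for r in original]

print(view_a[0])  # {'zip': '134**', 'age': '*'}
print(view_b[0])  # {'zip': '1****', 'age': 30}
```

The hard part the paper addresses, which this sketch omits, is ensuring that the two views cannot be joined to reconstruct records more precise than either view alone.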

Anonymizing Graphs Against Weight-based Attacks with Community Preservation

  • Li, Yidong;Shen, Hong
    • Journal of Computing Science and Engineering / Vol.5 No.3 / pp.197-209 / 2011
  • The increasing popularity of graph data, such as social and online communities, has initiated a prolific research area in knowledge discovery and data mining. As more real-world graphs are released publicly, there is growing concern about privacy breaches for the entities involved. An adversary may reveal the identities of individuals in a published graph, using the topological structure and/or basic graph properties as background knowledge. Many previous studies addressing such attacks as identity disclosure, however, concentrate on preserving privacy in simple graph data only. In this paper, we consider the identity disclosure problem in weighted graphs. The motivation is that a weighted graph can carry much more unique information than its simple version, which makes disclosure easier. We first formalize a general anonymization model to deal with weight-based attacks. Then two concrete attacks are discussed based on the weight properties of a graph, namely the sum and the set of adjacent weights for each vertex. We also propose a complete solution for the weight anonymization problem that protects a graph from both attacks. In addition, we investigate the impact of the proposed methods on community detection, a very popular application in the graph mining field. Our approaches are efficient and practical, and have been validated by extensive experiments on both synthetic and real-world datasets.
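The two background-knowledge signatures the abstract names, the sum and the set of adjacent weights per vertex, are easy to compute, which is what makes them dangerous: a vertex with a unique signature is re-identifiable. A minimal illustration on an invented toy graph:

```python
# Compute the per-vertex weight-sum and weight-set signatures that a
# weight-based attacker could use as background knowledge. Toy data only.
from collections import defaultdict

# Weighted undirected graph as (u, v, weight) edges.
edges = [("a", "b", 1.0), ("a", "c", 2.5), ("b", "c", 1.0), ("c", "d", 4.0)]

adjacent = defaultdict(list)
for u, v, w in edges:
    adjacent[u].append(w)
    adjacent[v].append(w)

weight_sum = {v: sum(ws) for v, ws in adjacent.items()}
weight_set = {v: tuple(sorted(ws)) for v, ws in adjacent.items()}

# Every vertex here has a unique weight sum, so an attacker who knows a
# target's adjacent-weight sum can pinpoint it even in an unlabeled release.
print(weight_sum)  # {'a': 3.5, 'b': 2.0, 'c': 7.5, 'd': 4.0}
```

Weight anonymization perturbs edge weights so that these signatures become ambiguous across at least k vertices, while (in this paper) also preserving community structure.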

Enhanced Hybrid Privacy Preserving Data Mining Technique

  • Kundeti Naga Prasanthi;M V P Chandra Sekhara Rao;Ch Sudha Sree;P Seshu Babu
    • International Journal of Computer Science & Network Security / Vol.23 No.6 / pp.99-106 / 2023
  • Nowadays, large volumes of data are accumulating in every field due to the increasing capacity of storage devices. Data mining can be applied to these large volumes of data to find useful patterns for business growth, improving services, improving health conditions, etc. Data from different sources can be combined before applying data mining, but the gathered data can be misused for identity theft, fraudulent credit/debit card transactions, and so on. To overcome this, data mining techniques that preserve privacy are required. Several privacy preserving data mining techniques are available in the literature, such as randomization, perturbation, and anonymization. This paper proposes an Enhanced Hybrid Privacy Preserving Data Mining (EHPPDM) technique. The proposed technique provides more data privacy than existing techniques while achieving better classification accuracy. The experimental results show that classification accuracy increases with the EHPPDM technique.

Development of a Privacy-Preserving Big Data Publishing System in Hadoop Distributed Computing Environments

  • 김대호;김종욱
    • Journal of Korea Multimedia Society / Vol.20 No.11 / pp.1785-1792 / 2017
  • Generally, big data contains sensitive information about individuals, and thus directly releasing it for public use may violate existing privacy requirements. Therefore, privacy-preserving data publishing (PPDP) has been actively researched as a way to share big data containing personal information for public use while protecting individuals' privacy with minimal data modification. Recently, with increasing demand for big data sharing in various areas, interest has also grown in software that supports privacy-preserving data publishing. Thus, in this paper, we develop a system that aims to support privacy-preserving data publishing effectively and efficiently. In particular, the developed system enables data owners to select an appropriate anonymization level by providing them with an information loss matrix. Furthermore, the system achieves high performance in data anonymization by using distributed Hadoop clusters.

AMV: A k-anonymization Technique Minimizing the Cloaking Region

  • 송두희;허민재;심종원;황소리;송문배;박광진
    • Journal of Internet Computing and Services / Vol.15 No.6 / pp.9-14 / 2014
  • In this paper, we propose AMV, a technique that supports k-anonymization of moving clients using motion vectors. AMV uses motion vector information to construct a minimal cloaking region for users (clients). The main reason for reducing the cloaking region is that the server must transmit information about many objects (the query result) to every user who issues a spatial query. Experimental results demonstrate that AMV outperforms existing techniques.
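The cloaking-region idea can be sketched without the motion-vector machinery: the server should receive the smallest rectangle that still covers k users, since a larger region means more candidate objects to return. This sketch is an invented static illustration; AMV's contribution is predicting positions with motion vectors so the region stays minimal while clients move.

```python
# Sketch: minimal k-anonymity cloaking region as the bounding rectangle
# of the k users nearest the querying user. Names and data are illustrative.
def cloaking_region(positions, k):
    """Minimum bounding rectangle of the k users nearest the first user."""
    anchor = positions[0]
    dist2 = lambda p: (p[0] - anchor[0]) ** 2 + (p[1] - anchor[1]) ** 2
    chosen = sorted(positions, key=dist2)[:k]
    xs = [p[0] for p in chosen]
    ys = [p[1] for p in chosen]
    return (min(xs), min(ys)), (max(xs), max(ys))

# Querying user first; the distant user (9, 9) would inflate the region,
# so picking the k nearest users keeps the cloaking rectangle small.
users = [(2, 3), (3, 3), (2, 4), (9, 9), (1, 1)]
print(cloaking_region(users, 3))  # ((2, 3), (3, 4))
```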

OHDSI OMOP-CDM Database Security Weakness and Countermeasures

  • 이경환;장성용
    • Journal of Information Technology Services / Vol.21 No.4 / pp.63-74 / 2022
  • Globally, researchers at medical institutions are actively sharing patient cohort data to develop vaccines and treatments to overcome the COVID-19 crisis. OMOP-CDM, a common data model for efficiently sharing medical research data that is operated independently by individual medical institutions, contains patients' personal information (e.g., PII and PHI). Although PII and PHI are managed and shared through de-identification or anonymization in medical institutions, complete de-identification and anonymization cannot be guaranteed 100%. For this reason, the security of the OMOP-CDM database is important, but there is no detailed, specific OMOP-CDM security inspection tool, so risk mitigation measures are currently taken with general security inspection tools. This study presents a model for implementing a tool that checks the security vulnerabilities of OMOP-CDM, based on an analysis of US database security guidelines and the personal information protection security controls of NIST. It also verifies the implementation's feasibility through a field demonstration in an environment of three actual hospitals. As a result of checking the security status of the test server and the CDM databases of the three hospitals in operation, most database audit and encryption functions were found to be insufficient. These inspection results were applied to the optimization study of the complex and time-consuming CDM CSF developed in the "Development of Security Framework Required for CDM-based Distributed Research" task of the Korea Health Industry Promotion Agency. According to several recent newspaper articles, ransomware attacks on financially large hospitals are intensifying. Organizations that currently operate or will operate CDM databases need to install database auditing (proofing) and encryption (data protection), which are not provided by the OMOP-CDM database template, to prevent attackers from compromising them.

Assessing the Impact of Defacing Algorithms on Brain Volumetry Accuracy in MRI Analyses

  • Dong-Woo Ryu;ChungHwee Lee;Hyuk-je Lee;Yong S Shim;Yun Jeong Hong;Jung Hee Cho;Seonggyu Kim;Jong-Min Lee;Dong Won Yang
    • Dementia and Neurocognitive Disorders / Vol.23 No.3 / pp.127-135 / 2024
  • Background and Purpose: To ensure data privacy, the development of defacing processes, which anonymize brain images by obscuring facial features, is crucial. However, the impact of these defacing methods on brain imaging analysis poses a significant concern. This study aimed to evaluate the reliability of three different defacing methods in automated brain volumetry. Methods: Magnetic resonance imaging with three-dimensional T1 sequences was performed on ten patients diagnosed with subjective cognitive decline. Defacing was executed using mri_deface, BioImage Suite Web-based defacing, and Defacer. Brain volumes were measured employing the QBraVo program and FreeSurfer, assessing the intraclass correlation coefficient (ICC) and the mean differences in brain volume measurements between the original and defaced images. Results: The mean age of the patients was 71.10±6.17 years, with 4 (40.0%) being male. The total intracranial volume, total brain volume, and ventricle volume exhibited high ICCs across the three defacing methods and two volumetry analyses. All regional brain volumes showed high ICCs with all three defacing methods. Despite variations among some brain regions, no significant mean differences in regional brain volume were observed between the original and defaced images across all regions. Conclusions: The three defacing algorithms evaluated did not significantly affect the results of image analysis for the entire brain or specific cerebral regions. These findings suggest that these algorithms can serve as robust methods for defacing in neuroimaging analysis, thereby supporting data anonymization without compromising the integrity of brain volume measurements.