• Title/Summary/Keyword: anonymization

An Anonymization Scheme Protecting User Identification Threat in Profile-based LBS Model (프로필을 고려한 위치 기반 서비스 모델에서 사용자 식별 위협을 막는 익명화 기법)

  • Chung, Seung-Joo;Park, Seog
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.170-174 / 2010
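  • Recently, users' location information has begun to be used as an information element in various wireless Internet applications, and location-based services (LBS) have attracted attention as one such application. However, in an LBS the user requesting a service sends his or her exact location to the database server, which creates a vulnerability through which the user's personal information can be exposed. Consequently, privacy protection methods were needed so that mobile users can use LBS safely and conveniently. To protect users' location information, the concept of K-anonymity, originally used for privacy protection in traditional databases, was applied, and models that perform the corresponding anonymization were proposed. However, because the previously studied models performed anonymization considering only the user's exact location as a sensitive attribute, existing anonymization alone cannot guarantee complete privacy for the later-proposed models that take user profile information into account, and additional processing is required. This study defines the problem of the additional personal identification threat that arises when a Private-to-Public query is issued in a profile-aware LBS model and proposes a solution. It also shows that the proposed technique guarantees the protection of user information and is more efficient than existing approaches.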

Efficient K-Anonymization Implementation with Apache Spark

  • Kim, Tae-Su;Kim, Jong Wook
    • Journal of the Korea Society of Computer and Information / v.23 no.11 / pp.17-24 / 2018
  • Today, we are living in the era of data and information. With the advent of the Internet of Things (IoT), the popularity of social networking sites, and the development of mobile devices, a large amount of data is being produced in diverse areas. The collection of such data generated in various areas is called big data. As the importance of big data grows, there has been a growing need to share big data containing information about individual entities. Because big data contains sensitive information about individuals, directly releasing it for public use may violate existing privacy requirements. Thus, privacy-preserving data publishing (PPDP) has been actively studied as a way to share big data containing personal information for public use while preserving the privacy of individuals. K-anonymity, the most popular method in the area of PPDP, transforms each record in a table such that at least k records have the same values for the given quasi-identifier attributes, so that each record is indistinguishable from the other records in the same class. As big data continues to grow in size, there is a growing demand for methods that can efficiently anonymize vast amounts of data. Thus, in this paper, we develop an efficient k-anonymity method using the Spark distributed framework. Experimental results show that the developed method achieves significant gains in processing time.
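
As a rough illustration of the k-anonymity property described in this abstract (not the authors' Spark implementation), the following plain-Python sketch checks whether a table is k-anonymous by counting how many records share each quasi-identifier combination. The function name `is_k_anonymous`, the column names, and the sample records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    # Group records by their tuple of quasi-identifier values (the equivalence class).
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized table: ages are bucketed, ZIP codes truncated.
table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cold"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "asthma"},
]

print(is_k_anonymous(table, ["age", "zip"], 2))
print(is_k_anonymous(table, ["age", "zip"], 3))
```

Each equivalence class in the sample table holds exactly two records, so the check passes for k = 2 and fails for k = 3. A distributed version would perform the same grouping and counting across partitions.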

Multi-Layer Bitcoin Clustering through Off-Chain Data of Darkweb (다크웹 오프체인 데이터를 이용한 다계층 비트코인 클러스터링 기법)

  • Lee, Jin-hee;Kim, Min-jae;Hur, Junbeom
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.4 / pp.715-729 / 2021
  • Bitcoin is a decentralized and transparent cryptocurrency. However, due to its anonymity, it is currently being used to transfer funds for illegal transactions in darknet markets. To address this problem, clustering heuristics based on the characteristics of Bitcoin transactions have been proposed. However, we found that the previous heuristics suffer from high false negative rates. In this study, we propose a novel heuristic for Bitcoin clustering using off-chain data. Specifically, we collected and analyzed user review data from Silk Road 4 as off-chain data. As a result, 31.68% of the review data matched actual Bitcoin transactions, and the proposed method reduced false negatives by 91.7%.

Ethical Issues on Environmental Health Study

  • Hyein WOO
    • Journal of Research and Publication Ethics / v.4 no.1 / pp.9-14 / 2023
  • Purpose: Adequate public input and participation in environmental health research must be provided to ensure accurate results from studies involving human exposure to potentially hazardous substances. By addressing the ethical issues associated with environmental health research, this study can help reduce risks for individuals participating in studies and for whole communities affected by their impactful findings. Research design, data and methodology: The current research should have followed the rules of qualitative textual research, searching and exploring adequate prior resources such as books and peer-reviewed journal articles so that the author could screen for previous works suitable for content analysis. Results: The current research has identified four ethical issues to address in order to improve environmental health studies: (1) Lack of Guidance for Collecting and Utilizing Data Ethically, (2) Insufficient Consideration Is Given to Vulnerable Populations When Conducting Studies, (3) Unclear Standards Exist for Protecting the Privacy of Participants' Personal Information, and (4) Conducting Socially and Religiously Acceptable Research in Various Communities. Conclusions: This research concludes that future researchers should consider implementing anonymization techniques where possible so that findings remain accessible while the risk posed by disclosing identifying information is minimized during the analysis and publication stages.

Models for Privacy-preserving Data Publishing : A Survey (프라이버시 보호 데이터 배포를 위한 모델 조사)

  • Kim, Jongseon;Jung, Kijung;Lee, Hyukki;Kim, Soohyung;Kim, Jong Wook;Chung, Yon Dohn
    • Journal of KIISE / v.44 no.2 / pp.195-207 / 2017
  • In recent years, data have been actively exploited in various fields, and there is therefore a strong demand for sharing and publishing data. However, publishing data that contain sensitive information about people can breach the privacy of individuals. To publish data while protecting individuals' privacy with minimal information distortion, privacy-preserving data publishing (PPDP) has been explored. PPDP assumes various attacker models and has been developed around privacy models, which are principles for protecting against privacy-breaching attacks. In this paper, we first present the concept of privacy-breaching attacks. Subsequently, we classify the privacy models according to these attacks. We further clarify the differences between the privacy models and the requirements of each.

Analysis of Privacy Violation Possibility of Partially Anonymized Big Data (온라인 상에 공개된 부분 익명화된 빅데이터의 프라이버시 침해 가능성 분석)

  • Jung, Kang-soo;Park, Seog;Choi, Dae-seon
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.3 / pp.665-679 / 2018
  • With the development of information and communication technology, especially wireless Internet technology and the spread of smartphones, the volume of digital data has increased. As a result, privacy concerns about the exposure of sensitive personal information are increasing. In this paper, we analyze the privacy vulnerability of online big data in the domestic Internet environment, focusing especially on portal services, and propose a measure to evaluate the possibility of privacy violation. For this purpose, we collected about 50 million user posts from portal service content and extracted personal information from them. We find that portal service users can be identified from the extracted personal information even though their user IDs are partially anonymized. In addition, we propose a risk measurement method that reflects both the possibility of linking personal information between services that use partially anonymized IDs and the level of personal information exposure.

Efficient Dummy Generation for Protecting Location Privacy (개인의 위치를 보호하기 위한 효율적인 더미 생성)

  • Cai, Tian-Yuan;Song, Doo-Hee;Youn, Ji-Hye;Lee, Won-Gyu;Kim, Yong-Kab;Park, Kwang-Jin
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.9 no.6 / pp.526-533 / 2016
  • Research on protecting users' locations in location-based services (LBS) has received much attention, and k-anonymity is the most popular privacy preservation method. In this setting, k-anonymization means selecting k-1 other dummies or clients to form the cloaking region, which reduces the probability of the query issuer's location being exposed to untrusted parties to 1/k. However, the query location may still be exposed to an adversary when the k-1 dummies are concentrated around the query location or when a dummy is placed where a query issuer cannot exist. Therefore, we propose a dummy system model and algorithm that take the real environment into account to protect users' location privacy, and we demonstrate the efficiency of our method through experimental results.
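
The dummy-based cloaking idea in this abstract can be sketched roughly as follows. This is not the authors' algorithm. It is a minimal Python illustration that assumes a flat 2D coordinate space and uses a hypothetical minimum-spacing rule (`min_gap`) so that dummies do not cluster on the real location. It does not model places where a user cannot exist, such as rivers or buildings. All names and parameters are assumptions.

```python
import random


def make_cloaking_set(real, k, radius, min_gap, rng, max_tries=10_000):
    """Return k points (the real location plus k-1 dummies), pairwise >= min_gap apart."""
    points = [real]
    tries = 0
    while len(points) < k:
        tries += 1
        if tries > max_tries:
            raise ValueError("could not place dummies; relax min_gap or radius")
        # Candidate dummy drawn uniformly from a square around the real location.
        cand = (real[0] + rng.uniform(-radius, radius),
                real[1] + rng.uniform(-radius, radius))
        # Reject candidates that fall too close to any point already chosen.
        if all((cand[0] - p[0]) ** 2 + (cand[1] - p[1]) ** 2 >= min_gap ** 2
               for p in points):
            points.append(cand)
    rng.shuffle(points)  # do not reveal the real location by list position
    return points


def bounding_box(points):
    """Axis-aligned cloaking region covering all points: (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


points = make_cloaking_set((0.0, 0.0), k=5, radius=10.0, min_gap=2.0,
                           rng=random.Random(0))
print(bounding_box(points))
```

With k points that an observer cannot tell apart, a uniform guess picks the real location with probability 1/k. The spacing rule addresses only the clustering weakness the abstract mentions.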

Experiment and Implementation of a Machine-Learning Based k-Value Prediction Scheme in a k-Anonymity Algorithm (k-익명화 알고리즘에서 기계학습 기반의 k값 예측 기법 실험 및 구현)

  • Muh, Kumbayoni Lalu;Jang, Sung-Bong
    • KIPS Transactions on Computer and Communication Systems / v.9 no.1 / pp.9-16 / 2020
  • The k-anonymity scheme has been widely used to protect private information when big data are distributed to a third party for research purposes. When the scheme is applied, determining an optimal k value is a difficult problem because many factors must be considered. Currently, the k value is determined almost entirely manually by human experts relying on their intuition. This degrades the performance of the anonymization and costs the experts considerable time and money. To overcome this problem, a simple idea based on machine learning has been proposed. This paper describes implementations and experiments to realize the proposed idea. In this work, a deep neural network (DNN) is implemented using TensorFlow libraries and is trained and tested on an input dataset. The experimental results show that the trend of training errors follows a typical DNN pattern, but for validation errors our model exhibits a pattern different from the one seen in a typical training process. The advantage of the proposed approach is that it reduces the time and cost for experts to determine the k value because the determination can be done semi-automatically.

Robust Data, Event, and Privacy Services in Real-Time Embedded Sensor Network Systems (실시간 임베디드 센서 네트워크 시스템에서 강건한 데이터, 이벤트 및 프라이버시 서비스 기술)

  • Jung, Kang-Soo;Kapitanova, Krasimira;Son, Sang-H.;Park, Seog
    • Journal of KIISE:Databases / v.37 no.6 / pp.324-332 / 2010
  • The majority of event detection in real-time embedded sensor network systems is based on data fusion that uses noisy sensor data collected from complicated real-world environments. Current research has produced several excellent low-level mechanisms to collect sensor data and perform aggregation. However, solutions that enable these systems to provide real-time data processing using readings from heterogeneous sensors and subsequently detect complex events of interest in real time need further research. We are developing real-time event detection approaches that allow lightweight data fusion and do not require significant computing resources. Underlying the event detection framework is a collection of real-time monitoring and fusion mechanisms that are invoked upon the arrival of sensor data. The combination of these mechanisms and the framework has the potential to significantly improve the timeliness and reduce the resource requirements of embedded sensor networks. In addition, we discuss privacy as a foundational technique for trusted embedded sensor network systems and explain an anonymization technique for ensuring privacy.

Analyzing Learners Behavior and Resources Effectiveness in a Distance Learning Course: A Case Study of the Hellenic Open University

  • Alachiotis, Nikolaos S.;Stavropoulos, Elias C.;Verykios, Vassilios S.
    • Journal of Information Science Theory and Practice / v.7 no.3 / pp.6-20 / 2019
  • Learning analytics, or educational data mining, is an emerging field that applies data mining methods and tools for the exploitation of data coming from educational environments. Learning management systems, like Moodle, offer large amounts of data concerning students' activity, performance, behavior, and interaction with their peers and their tutors. The analysis of these data can be elaborated to make decisions that will assist stakeholders (students, faculty, and administration) to elevate the learning process in higher education. In this work, the power of Excel is exploited to analyze data in Moodle, utilizing an e-learning course developed for enhancing the information computer technology skills of school teachers in primary and secondary education in Greece. Moodle log files are appropriately manipulated in order to trace daily and weekly activity of the learners concerning distribution of access to resources, forum participation, and quizzes and assignments submission. Learners' activity was visualized for every hour of the day and for every day of the week. The visualization of access to every activity or resource during the course is also obtained. In this fashion teachers can schedule online synchronous lectures or discussions more effectively in order to maximize the learners' participation. Results depict the interest of learners for each structural component, their dedication to the course, their participation in the fora, and how it affects the submission of quizzes and assignments. Instructional designers may take advice and redesign the course according to the popularity of the educational material and learners' dedication. Moreover, the final grade of the learners is predicted according to their previous grades using multiple linear regression and sensitivity analysis. These outcomes can be suitably exploited in order for instructors to improve the design of their courses, faculty to alter their educational methodology, and administration to make decisions that will improve the educational services provided.