• Title/Summary/Keyword: sensitive information

Learning fair prediction models with an imputed sensitive variable: Empirical studies

  • Kim, Yongdai;Jeong, Hwichang
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.251-261 / 2022
  • As AI increasingly influences human social life, issues of AI transparency and ethics are emerging. In particular, it is widely known that historical bias in data runs counter to ethics and to regulatory frameworks for fairness, so AI models trained on such biased data can also impose bias or unfairness on a sensitive group (e.g., non-white, women). Demographic disparities due to AI, meaning socially unacceptable bias in which an AI model favors certain groups (e.g., white, men) over other groups (e.g., black, women), have been observed frequently in many applications of AI, and many recent studies have developed AI algorithms that remove or alleviate such disparities in trained models. In this paper, we consider the problem of using the information in the sensitive variable for fair prediction when using the sensitive variable as an input variable is prohibited by laws or regulations intended to avoid unfairness. To reflect the information in the sensitive variable in prediction, we consider a two-stage procedure. First, the sensitive variable is fully included in the learning phase to obtain a prediction model that depends on it; then an imputed sensitive variable is used in the prediction phase. The aim of this paper is to evaluate this procedure by analyzing several benchmark datasets. We illustrate that using an imputed sensitive variable helps improve prediction accuracy without much loss of fairness.

An Extended Frequent Pattern Tree for Hiding Sensitive Frequent Itemsets

  • Lee, Dan-Young;An, Hyoung-Geun;Koh, Jae-Jin
    • The KIPS Transactions: Part D / v.18D no.3 / pp.169-178 / 2011
  • Data sharing between enterprises or organizations has recently become necessary for cooperative work. When an enterprise opens its database to affiliates in this process, sensitive information may be leaked. To resolve this problem, sensitive information must be hidden from the database. Previous research on hiding sensitive information applied various heuristic algorithms to maintain the quality of the database, but few studies have analyzed the effects on the items modified during the hiding process or tried to minimize the number of hidden items. This paper proposes the eFP-Tree (Extended Frequent Pattern Tree), based on the FP-Tree (Frequent Pattern Tree), to hide sensitive frequent itemsets. Unlike earlier approaches, node formation in the eFP-Tree organizes all transaction, sensitive, and border information, and uses the border to minimize the impact on non-sensitive frequent itemsets during hiding. When the eFP-Tree was applied to an example transaction database, fewer than 10% of items were lost, showing that it is more effective than the existing algorithm and maintains the quality of the database.

Privacy-Preserving IoT Data Collection in Fog-Cloud Computing Environment

  • Lim, Jong-Hyun;Kim, Jong Wook
    • Journal of the Korea Society of Computer and Information / v.24 no.9 / pp.43-49 / 2019
  • With the development of the Internet of Things, wearable devices for personal health care have become widespread. Global information and communication technology companies are developing a variety of wearable health devices that use built-in sensors to collect personal health information such as heart rate, steps, and calories. However, since individual health data includes sensitive information, the collection of irrelevant health data can lead to personal privacy issues. There is therefore a growing need for technology that collects sensitive health data from wearable health devices while preserving privacy. In recent years, local differential privacy (LDP), which enables sensitive data collection while preserving privacy, has attracted much attention. In this paper, we develop a technology for collecting vast amounts of health data from a smartwatch, one of the most popular wearable health devices, using local differential privacy. Experimental results with real data show that the proposed method can effectively collect sensitive health data from smartwatch users while preserving privacy.
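
The abstract above does not say which LDP mechanism the authors use. As a rough illustration of the general idea only, the following Python sketch assumes the standard Laplace mechanism applied on each device: a reading is clipped to a public range and perturbed before it leaves the device, and the collector averages the noisy reports. The function names, the heart-rate range, and the choice of mechanism are assumptions made for this example, not details from the paper.

```python
import random


def perturb_locally(value, lo, hi, epsilon, rng):
    """Return an epsilon-LDP report of one numeric reading (runs on the device).

    Assumption: the Laplace mechanism on a value clipped to [lo, hi].
    The sensitivity of a single clipped value is (hi - lo).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    clipped = min(max(value, lo), hi)
    scale = (hi - lo) / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return clipped + noise


def estimate_mean(reports):
    """Collector side: the noise has zero mean, so a plain average is unbiased."""
    return sum(reports) / len(reports)
```

Each individual report is very noisy, so the estimate is only useful when many users contribute; a smaller epsilon gives stronger privacy and a noisier mean.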

A Task Scheduling Strategy in Cloud Computing with Service Differentiation

  • Xue, Yuanzheng;Jin, Shunfu;Wang, Xiushuang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5269-5286 / 2018
  • Task scheduling is one of the key issues in improving system performance and optimizing resource management in a cloud computing environment. To provide appropriate services for heterogeneous users, we propose a novel task scheduling strategy with service differentiation, in which delay-sensitive tasks are assigned to the rapid cloud with high-speed processing, whereas fault-sensitive tasks are assigned to the reliable cloud with service restoration. Considering that a user can receive service from either local SaaS (Software as a Service) servers or a public IaaS (Infrastructure as a Service) cloud, we establish a system model based on a hybrid queueing network. Assuming a Poisson arrival process, we analyze the system model in steady state. Moreover, we derive performance measures in terms of the average response time of the delay-sensitive tasks and the utilization of VMs (Virtual Machines) in the reliable cloud. We provide experimental results to validate the proposed strategy and the system model. Furthermore, we investigate the Nash equilibrium behavior and the socially optimal behavior of the delay-sensitive tasks. Finally, we apply an improved intelligent search algorithm to obtain the optimal arrival rate of total tasks and present a pricing policy for the delay-sensitive tasks.

Secure Training Support Vector Machine with Partial Sensitive Part

  • Park, Saerom
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.1-9 / 2021
  • In this paper, we propose a training algorithm for a support vector machine (SVM) with a sensitive variable. Although machine learning models enable automated decision making in real-world applications, regulations prohibit the use of sensitive information in order to protect privacy. In particular, privacy protection is compulsory for legally protected attributes such as race, gender, and disability. We present an efficient least squares SVM (LSSVM) training algorithm that uses fully homomorphic encryption (FHE) to protect a partial sensitive attribute. Our framework assumes that the data owner holds both the non-sensitive attributes and a sensitive attribute, while the machine learning service provider (MLSP) receives the non-sensitive attributes and an encrypted sensitive attribute. As a result, the data owner can obtain the encrypted model parameters without exposing sensitive information to the MLSP. In the inference phase, both the non-sensitive attributes and the sensitive attribute are encrypted, and all computations are conducted in the encrypted domain. Through experiments on real data, we show that the proposed method makes it possible to implement a privacy-preserving sensitive LSSVM with FHE whose performance is comparable to that of the original LSSVM algorithm. In addition, we demonstrate that the efficient sensitive LSSVM with FHE significantly reduces the computational cost with only a small degradation in performance.

An Implementation of Web-based Unified Randomized Response System for Obtaining Sensitive Information and Application Method

  • Lee, Gi-Sung;Nam, Ki-Seong;Son, Chang-Kyoon
    • Journal of the Korean Data and Information Science Society / v.17 no.4 / pp.1237-1250 / 2006
  • In this paper, we develop a web-based unified randomized response system for obtaining more reliable responses on sensitive characteristics such as domestic violence and bribery. The survey system implements models ranging from the classical to recent research, for example from Warner's model to the two-stage model. In addition, it can link the conventional survey system with the randomized response system. Finally, it can examine how responses vary across multiple sensitive questions, and it can also be used for a single question.

Enhanced Locality Sensitive Clustering in High Dimensional Space

  • Chen, Gang;Gao, Hao-Lin;Li, Bi-Cheng;Hu, Guo-En
    • Transactions on Electrical and Electronic Materials / v.15 no.3 / pp.125-129 / 2014
  • A dataset can be clustered by merging the bucket indices that come from the random projections of locality sensitive hashing functions. For this to work, the merging interval must be calculated first. To improve the feasibility of large-scale data clustering in high dimensional space, we propose an enhanced Locality Sensitive Hashing Clustering Method. First, multiple hashing functions are generated. Second, data points are projected to bucket indices. Third, bucket indices are clustered to obtain class labels. Experimental results on synthetic datasets show that the method achieves high accuracy with much improved clustering speed. These attributes make it well suited to clustering data in high dimensional space.
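
The three steps in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' method: it assumes random-projection hashes of the common form h(x) = floor((a·x + b) / w), and it replaces the paper's merging-interval step with the crudest possible rule, giving the same label only to points whose bucket indices match under every hash. All names and default parameters are hypothetical.

```python
import math
import random


def make_hashes(dim, num_hashes, width, rng):
    """Step 1: draw (a, b) pairs for h(x) = floor((a.x + b) / width)."""
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(num_hashes)
    ]


def bucket_signature(point, hashes, width):
    """Step 2: project one point to a tuple of bucket indices."""
    return tuple(
        math.floor((sum(ai * xi for ai, xi in zip(a, point)) + b) / width)
        for a, b in hashes
    )


def lsh_cluster(points, num_hashes=4, width=4.0, seed=0):
    """Step 3 (simplified): identical bucket signatures share a class label.

    Assumption: the paper merges nearby bucket indices using a computed
    merging interval; that refinement is not reproduced here.
    """
    rng = random.Random(seed)
    hashes = make_hashes(len(points[0]), num_hashes, width, rng)
    labels, seen = [], {}
    for p in points:
        sig = bucket_signature(p, hashes, width)
        labels.append(seen.setdefault(sig, len(seen)))
    return labels
```

Because exact signature matching can split a true cluster that straddles a bucket boundary, a larger `width` or fewer hashes gives coarser clusters; merging adjacent buckets, as the abstract describes, is the step that would address this.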

Combined Procedure of Direct Question and Randomized Response Technique

  • Choi, Kyoung-Ho
    • Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.275-278 / 2003
  • In this paper, a simple procedure is presented for estimating $\pi$, the population proportion of a sensitive group. The suggested procedure combines a direct question with the randomized response technique. The proposed procedure is found to be more efficient than Warner's (1965).
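
For context, the baseline that the abstract above compares against, Warner's (1965) randomized response model, can be sketched in Python. The combined direct-question procedure proposed in the paper is not reproduced here. In Warner's design each respondent is asked, with probability p, "Do you belong to the sensitive group?" and otherwise the complementary question, so the probability of a "yes" is λ = pπ + (1 − p)(1 − π), which gives the estimator π̂ = (λ̂ − (1 − p)) / (2p − 1). The function names are illustrative.

```python
import random


def warner_response(is_sensitive, p, rng):
    """Simulate one truthful answer under Warner's design.

    With probability p the question is "Are you in the group?";
    otherwise it is "Are you NOT in the group?". The interviewer
    sees only the yes/no answer, never which question was drawn.
    """
    asked_direct = rng.random() < p
    return is_sensitive if asked_direct else not is_sensitive


def warner_estimate(answers, p):
    """Estimate pi from yes/no answers; the design requires p != 0.5."""
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p must differ from 0.5")
    lam_hat = sum(answers) / len(answers)
    return (lam_hat - (1 - p)) / (2 * p - 1)


def warner_variance(pi, p, n):
    """Variance of the estimator for a sample of size n."""
    return pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)
```

The second term of the variance is the price paid for privacy: it grows as p approaches 0.5, which is why procedures that reduce it are described as more efficient than Warner's.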

A Conditional Unrelated Question Model with Quantitative Attribute

  • Lee, Gi Sung;Hong, Ki Hak
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.753-765 / 2001
  • We suggest a quantitative conditional unrelated question model that can be used to obtain more sensitive information. Only respondents who answer "yes" to the less sensitive question B are asked about the more sensitive variable X. We extend the model to the two-sample case, in which there is no information about the true mean of the unrelated variable Y. Finally, we compare the efficiency of our model with that of Greenberg et al.'s model.

A Study on Reinforcing Non-Identifying Personal Sensitive Information Management on IoT Environment

  • Yang, Yoon-Min;Park, Soon-Tai;Kim, Yong-Min
    • The Journal of the Korea Contents Association / v.20 no.8 / pp.34-41 / 2020
  • An era of stabilizing and rapidly expanding IoT markets is coming. In an IoT environment, situations can arise in which objects take the lead in communication, and communication with unspecified IoT environments has increased the need for thorough management of personal sensitive information. Although the changes brought by IoT offer benefits, there is also the problem that personal sensitive information is transmitted in the name of big data without the person even knowing it. For the safe management of personal sensitive information transmitted through sensors in an IoT environment, the government plans to propose measures to strengthen information protection, since the use of non-identifiable personal information in IoT environments is expected to begin in earnest through the amendment of the Data 3 Act and the initial collection method.