http://dx.doi.org/10.29220/CSAM.2022.29.2.251

Learning fair prediction models with an imputed sensitive variable: Empirical studies  

Kim, Yongdai (Department of Statistics, Seoul National University)
Jeong, Hwichang (Department of Statistics, Seoul National University)
Publication Information
Communications for Statistical Applications and Methods, v.29, no.2, 2022, pp. 251-261
Abstract
As AI exerts a wide-ranging influence on human social life, issues of AI transparency and ethics are emerging. In particular, it is widely known that, because data often carry historical bias that conflicts with ethical norms or regulatory frameworks for fairness, AI models trained on such biased data can themselves be biased or unfair toward certain sensitive groups (e.g., non-white people, women). Demographic disparities due to AI, i.e., the socially unacceptable tendency of an AI model to favor certain groups (e.g., white people, men) over others (e.g., black people, women), have been observed frequently in many applications of AI, and many recent studies have developed AI algorithms that remove or alleviate such disparities in trained models. In this paper, we consider the problem of exploiting the information in the sensitive variable for fair prediction when using the sensitive variable as an input is prohibited by laws or regulations intended to prevent unfairness. As a way of reflecting this information in prediction, we consider a two-stage procedure: the sensitive variable is fully included in the learning phase, so that the prediction model depends on it, and an imputed sensitive variable is then substituted in the prediction phase. The aim of this paper is to evaluate this procedure on several benchmark datasets. We illustrate that using an imputed sensitive variable helps improve prediction accuracy without sacrificing much fairness.
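The two-stage procedure described above can be summarized with a short sketch. The Python code below is only an illustration, not the authors' implementation: the synthetic data, the plain logistic regressions (standing in for the fairness-aware learner used in the paper), and all variable names are assumptions made for the example.

```python
# Minimal sketch of the two-stage procedure (illustrative, not the paper's code).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training data: features X, sensitive variable S, binary label Y.
n = 2000
S = rng.integers(0, 2, size=n)                   # sensitive group membership (0/1)
X = rng.normal(size=(n, 3)) + 0.5 * S[:, None]   # features correlated with S
logits = X @ np.array([1.0, -0.5, 0.3]) + 0.4 * S
Y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Stage 1 (learning phase): fit the prediction model with S fully included,
# so the fitted model depends on the sensitive variable. In the paper this
# learner would be fairness-aware; a plain logistic regression stands in here.
pred_model = LogisticRegression().fit(np.column_stack([X, S]), Y)

# Also fit an imputation model that predicts S from X alone,
# for use when S is unavailable at prediction time.
imputer = LogisticRegression().fit(X, S)

# Stage 2 (prediction phase): S may not be used as an input, so plug in
# the imputed sensitive variable instead.
X_new = rng.normal(size=(5, 3))
S_hat = imputer.predict(X_new)                   # imputed sensitive variable
Y_hat = pred_model.predict(np.column_stack([X_new, S_hat]))
print(Y_hat)
```

The key design point the sketch illustrates is that the sensitive variable enters only the learning phase; at prediction time the model never sees the true sensitive variable, only its imputation from the other inputs.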
Keywords
AI; bias; fair prediction; imputed sensitive variable