Causality, causal discovery, causal inference and counterfactuals in Civil Engineering: Causal machine learning and case studies for knowledge discovery

  • M.Z. Naser (School of Civil & Environmental Engineering and Earth Sciences (SCEEES), Clemson University)
  • Arash Teymori Gharah Tapeh (School of Civil & Environmental Engineering and Earth Sciences (SCEEES), Clemson University)
  • Received : 2022.11.09
  • Reviewed : 2022.12.19
  • Published : 2023.04.25

Abstract

Many of our experiments are designed to uncover the cause(s) and effect(s) behind a phenomenon of interest (i.e., its data-generating mechanism). Uncovering such relationships allows us to identify the true workings of a phenomenon and, most importantly, to articulate a model that can be used to explore the phenomenon at hand and/or to predict it accurately. Fundamentally, such models are likely to be derived via a causal approach (as opposed to observational or empirical means). In this approach, causal discovery is required to create a causal model, which can then be applied to infer the influence of interventions and to answer hypothetical questions (i.e., in the form of "What if?") that commonly used prediction- and statistical-based models may not be able to address. Through this lens, this paper builds a case for causal discovery and causal inference and contrasts them with common machine learning approaches - all from a civil and structural engineering perspective. More specifically, this paper outlines the key principles of causality and the most commonly used algorithms and packages for causal discovery and causal inference. Finally, this paper presents a series of examples and case studies showing how causal concepts can be adopted in our domain.
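The distinction the abstract draws between prediction-based models and causal models can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a hypothetical linear structural causal model with one confounder `Z`, a treatment `X`, and an outcome `Y`, with synthetic coefficients chosen so that the true effect of `X` on `Y` is 2.0. The sketch compares a naive regression of `Y` on `X`, a regression adjusted for `Z`, and a simulated intervention do(X = x).

```python
import numpy as np


def simulate(n=20000, seed=0, do_x=None):
    """Sample from a hypothetical linear SCM: Z -> X, Z -> Y, X -> Y.

    The true causal effect of X on Y is 2.0 (an assumed, synthetic value).
    Passing do_x forces X to a fixed value, which cuts the Z -> X edge.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)  # confounder
    if do_x is None:
        x = 1.5 * z + rng.normal(size=n)  # observational regime
    else:
        x = np.full(n, float(do_x))  # interventional regime, do(X = do_x)
    y = 2.0 * x + 3.0 * z + rng.normal(size=n)
    return z, x, y


def ols_coefs(y, *regressors):
    """Ordinary least squares; returns the intercept followed by the slopes."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


# Observational data: X and Y share the common cause Z.
z, x, y = simulate()
naive = ols_coefs(y, x)[1]  # slope of Y on X alone, biased by Z
adjusted = ols_coefs(y, x, z)[1]  # slope of Y on X after adjusting for Z

# Interventional data: set X by hand and compare the mean outcomes.
_, _, y_do1 = simulate(do_x=1.0)
_, _, y_do0 = simulate(do_x=0.0)
interventional = y_do1.mean() - y_do0.mean()

print(f"naive slope:          {naive:.3f}")
print(f"adjusted slope:       {adjusted:.3f}")
print(f"E[Y|do(1)]-E[Y|do(0)]: {interventional:.3f}")
```

In this synthetic setting the naive slope overstates the effect because it absorbs the influence of the confounder, while the adjusted slope and the simulated intervention both land near the assumed value of 2.0. The adjustment is only valid here because the causal structure is known by construction; with real data, that structure would first have to come from domain knowledge or causal discovery.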
