A study on rethinking EDA in digital transformation era

A reconsideration of EDA in the DX (digital transformation) environment

  • Seoung-gon Ko (Department of Applied Statistics, Gachon University)
  • 고승곤 (가천대학교 응용통계학과)
  • Received : 2023.11.02
  • Accepted : 2023.11.27
  • Published : 2024.02.29

Abstract

Digital transformation is the process by which a company or organization uses digital technology to change or innovate its existing business model or sales activities. It requires a range of digital technologies, such as cloud computing, IoT, and artificial intelligence, to strengthen market competitiveness, improve customer experience, and discover new businesses. To derive knowledge and insight about the market, customers, and the production environment, it is also necessary to select the right data, preprocess the data into an analyzable state, and establish a sound process for systematic analysis suited to the purpose. The usefulness of such digital data depends on appropriate preprocessing and on the correct application of exploratory data analysis (EDA), which serves to explore information and hypotheses and to visualize knowledge and insights. In this paper, we reexamine the philosophy and basic concepts of EDA and, with effective visualization in mind, discuss the key information a visualization should convey, methods of expressing information based on the grammar of graphics, and the ACCENT principle as the final standard for reviewing a visualization.

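Digital transformation refers to the process by which a company or organization uses digital technology to change or newly innovate its existing business model or sales activities. It calls for the use of various digital technologies, such as cloud computing, IoT, and artificial intelligence, to strengthen competitiveness in the market, improve customer experience, and discover new businesses. It also requires establishing a sound process for selecting the right data, preprocessing the data into an analyzable state, and carrying out systematic analyses suited to the purpose, so that knowledge and insight about the market, customers, and the production environment can be derived. The usefulness of such digital big data is determined by appropriate preprocessing together with the correct application of exploratory data analysis (EDA) for exploring information and hypotheses and for visualizing knowledge and insights. In this paper, we reconsider the philosophy and basic concepts of EDA and, for effective visualization, discuss the key information of a visualization, methods of expressing information based on the grammar of graphics, and the ACCENT principle, the final criterion for reviewing a visualization.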
