PLS Path Modeling to Investigate the Relations between Competencies of Data Scientist and Big Data Analysis Performance : Focused on Kaggle Platform

  • Han, Gyeong Jin (Management of Technology, Sungkyunkwan University) ;
  • Cho, Keuntae (Management of Technology, Sungkyunkwan University)
  • Received : 2015.07.16
  • Accepted : 2015.12.24
  • Published : 2016.04.15

Abstract

This paper examines how the competencies and behavioral intention of data scientists affect big data analysis performance. The study considers nine core factors required of data scientists. We surveyed 103 data scientists who had participated in big data competitions on the Kaggle platform and analyzed the responses using factor analysis and partial least squares structural equation modeling (PLS-SEM). The results show that some key competency factors influence big data analysis performance. Theoretically, the study provides a new basis for related research by analyzing the structural relationship between individual competencies and performance; practically, it identifies the priorities among the core competencies that data scientists must have.
