Building an SNS Crawling System Using Python

  • Received : 2018.10.03
  • Accepted : 2018.10.23
  • Published : 2018.10.31

Abstract

Everything is moving into the networked world in which modern people live. The Internet of Things, which attaches sensors to objects, makes it possible to exchange data with the network in real time. Mobile devices, now a necessity, record the traces of everyday life in real time: through social network services (SNS), people's information-seeking and communication activities are left on a vast network as they happen. From a business point of view, customer needs analysis therefore begins with SNS data. In this study, we build a system that automatically collects SNS content from the web in real time using Python. By collecting unstructured data from Instagram, Twitter, and YouTube, which have large numbers of users worldwide, the system is intended to support customer needs analysis. A virtual web browser running in Python's WebDriver environment retrieves the content, which passes through mining and natural language processing (NLP) steps before being stored in a database. The results are to be offered as a service through a web page: the desired data is collected automatically from a search query alone, and time-series analysis of the collected data lets netizens' reactions to an issue be checked in real time. In addition, the time from search to displayed results was within 5 seconds, which supports the advantage of the proposed algorithm.
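The pipeline described in the abstract (headless browser, text processing, database storage, time-series view) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the table layout, the sample posts, and the use of hashtag extraction as a stand-in for the NLP step are all assumptions made for illustration. The browser step assumes Selenium and a Chrome driver are installed and is imported lazily, so the rest of the sketch runs with the standard library alone.

```python
import re
import sqlite3
from collections import Counter
from datetime import datetime


def fetch_page_source(url):
    """Load a page in a headless browser and return its rendered HTML.

    Assumption: Selenium and a Chrome driver are available. The import is
    lazy so the remainder of this sketch works without them.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def extract_hashtags(text):
    """Hypothetical stand-in for the NLP step: lowercase hashtags in a post."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]


def store_posts(conn, posts):
    """Store (source, ISO timestamp, body) tuples in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (source TEXT, posted_at TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", posts)
    conn.commit()


def hourly_counts(conn, keyword):
    """Count posts mentioning the keyword per hour (a simple time series).

    SQLite's LIKE is case-insensitive for ASCII text by default.
    """
    rows = conn.execute(
        "SELECT posted_at FROM posts WHERE body LIKE ?", (f"%{keyword}%",)
    ).fetchall()
    counts = Counter()
    for (posted_at,) in rows:
        hour = datetime.fromisoformat(posted_at).strftime("%Y-%m-%d %H:00")
        counts[hour] += 1
    return sorted(counts.items())


# Made-up sample data standing in for crawled SNS posts.
SAMPLE_POSTS = [
    ("twitter", "2018-10-01T09:05:00", "Loving the new phone #Gadget #review"),
    ("instagram", "2018-10-01T09:40:00", "Unboxing day #gadget"),
    ("youtube", "2018-10-01T10:15:00", "Full review of the gadget is up"),
    ("twitter", "2018-10-01T10:20:00", "Weather is nice today"),
]

conn = sqlite3.connect(":memory:")
store_posts(conn, SAMPLE_POSTS)
series = hourly_counts(conn, "gadget")
tags = extract_hashtags(SAMPLE_POSTS[0][2])
print(series)
print(tags)
```

In a real deployment the sample posts would be replaced by text parsed from `fetch_page_source` output, and the hashtag step by a proper morphological or sentiment analysis; the storage and hourly aggregation would stay structurally similar.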
