http://dx.doi.org/10.9723/jksiis.2018.23.5.061

Building an SNS Crawling System Using Python  

Lee, Jong-Hwa (Division of Business Administration, Pukyong National University)
Publication Information
Journal of Korea Society of Industrial Information Systems, v.23, no.5, 2018, pp. 61-76
Abstract
Everything is being drawn into the networked world in which modern people live. The Internet of Things, which attaches sensors to objects, allows real-time data transfer to and from the network. Mobile devices, now essential, play an important role in recording the traces of everyday life in real time. Through social network services (SNS), information-seeking and communication activities are left on a huge network in real time. From a business point of view, customer needs analysis begins with SNS data. In this research, we aim to build a system in Python that automatically collects SNS content from the web in real time. We aim to support customer needs analysis with a collection system for three representative services with large worldwide user bases: Instagram, Twitter, and YouTube. Using a virtual web browser in a Python web server environment, the collected content passes through an exploitation process and an NLP process and is stored in a database. Based on the results of this study, we intend to offer a service through a site where the desired data is automatically collected via a search function and netizens' responses can be checked in real time through time-series data analysis. In addition, because searches completed within 5 seconds in the execution results, the advantage of the proposed algorithm is confirmed.
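The flow described in the abstract (collect posts for a search keyword, run a text-processing step, store the result in a database, then view responses as a time series) might be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the table layout, the function names (collect, tokenize, daily_counts), and the regex tokenizer standing in for the NLP step are all hypothetical, and fake_fetch replaces the virtual web browser so the sketch runs without network access.

```python
import re
import sqlite3
from typing import Callable, Iterable, List, Tuple

# Hypothetical record shape: (platform, posted_at as ISO timestamp, text).
Post = Tuple[str, str, str]


def tokenize(text: str) -> List[str]:
    """Rough stand-in for the NLP step: lowercase word and hashtag tokens."""
    return re.findall(r"#?\w+", text.lower())


def collect(keyword: str,
            fetch: Callable[[str], Iterable[Post]],
            conn: sqlite3.Connection) -> int:
    """Fetch posts for a keyword, tokenize them, store them; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(keyword TEXT, platform TEXT, posted_at TEXT, text TEXT, tokens TEXT)"
    )
    rows = [
        (keyword, platform, posted_at, text, " ".join(tokenize(text)))
        for platform, posted_at, text in fetch(keyword)
    ]
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def daily_counts(keyword: str, conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Time series of stored post counts per day for one keyword."""
    return conn.execute(
        "SELECT substr(posted_at, 1, 10) AS day, COUNT(*) FROM posts "
        "WHERE keyword = ? GROUP BY day ORDER BY day",
        (keyword,),
    ).fetchall()


def fake_fetch(keyword: str) -> List[Post]:
    """Assumption: placeholder for the virtual-browser crawler; returns fixed data."""
    return [
        ("instagram", "2018-10-01T09:00:00", "Great #" + keyword + " today!"),
        ("twitter", "2018-10-01T12:30:00", "More " + keyword + " please"),
        ("youtube", "2018-10-02T08:15:00", keyword + " review video"),
    ]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    stored = collect("coffee", fake_fetch, connection)
    print(stored, daily_counts("coffee", connection))
```

Passing the fetcher in as a parameter keeps the storage and time-series parts testable on their own; a real crawler for each service would be substituted for fake_fetch.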
Keywords
Bigdata; SNS; Web Mining; Web Crawler; Python
Citations & Related Records
Times Cited By KSCI: 5