Structuring of unstructured big data and visual interpretation

Structuring and visual interpretation of unstructured big data using traffic-related articles from the Busan area

  • Received : 2014.10.03
  • Accepted : 2014.11.07
  • Published : 2014.11.30

Abstract

We analyzed articles from "Kukje Shinmun" and "Busan Ilbo", two local newspapers of Busan Metropolitan City, published between January 1, 2013 and December 31, 2013. From these we took the 2889 articles whose titles include both "Busan" and "Traffic" and looked for meaningful patterns inherent in the relationships among their contents and in related data. Text mining, a branch of data mining, was used to carry out a social network analysis (SNA). To structure the unstructured data, and to store, process, and analyze the big data, we used HDFS and MapReduce from the Hadoop ecosystem, an open-source framework based on Java, in a Linux environment (Ubuntu 12.04 LTS). To visualize the network more effectively than the default social network analysis provided by the existing R package, we implemented a new algorithm that assigns each node, and each line connecting two nodes, a weight according to its proportion, so that the network can be read through color and thickness.
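The pipeline described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' Hadoop or R implementation. It assumes that articles are already tokenized into terms, that an edge weight is the number of articles in which two terms co-occur, and that a node weight is the sum of the weights of its edges. The function names (`map_article`, `reduce_counts`, `style_network`), the width range, and the white-to-red color ramp are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def map_article(tokens):
    """Map step: emit ((term_a, term_b), 1) for each pair of distinct terms in one article."""
    terms = sorted(set(tokens))  # deduplicate and order so each pair has one canonical key
    for pair in combinations(terms, 2):
        yield pair, 1


def reduce_counts(emitted):
    """Reduce step: sum the emitted values per key."""
    totals = Counter()
    for key, value in emitted:
        totals[key] += value
    return totals


def scale(value, max_value, low, high):
    """Linearly map a value in [0, max_value] onto [low, high]."""
    if max_value == 0:
        return low
    return low + (high - low) * value / max_value


def shade(share):
    """Map a share in [0, 1] to a hex color from white (0) to pure red (1)."""
    level = round(255 * (1 - share))
    return f"#ff{level:02x}{level:02x}"


def style_network(articles, min_width=0.5, max_width=6.0):
    """Return (node_style, edge_style) with weight-based color and line width.

    Assumption: node weight = sum of the co-occurrence counts of its edges.
    """
    emitted = (kv for tokens in articles for kv in map_article(tokens))
    edges = reduce_counts(emitted)

    node_weight = Counter()
    for (a, b), count in edges.items():
        node_weight[a] += count
        node_weight[b] += count

    max_edge = max(edges.values(), default=0)
    max_node = max(node_weight.values(), default=0)

    edge_style = {
        pair: {"weight": count, "width": scale(count, max_edge, min_width, max_width)}
        for pair, count in edges.items()
    }
    node_style = {
        term: {"weight": weight, "color": shade(weight / max_node)}
        for term, weight in node_weight.items()
    }
    return node_style, edge_style


# Toy input: three already-tokenized "articles" (made-up data, not from the paper).
toy_articles = [
    ["busan", "traffic", "subway"],
    ["busan", "traffic", "bus"],
    ["busan", "subway"],
]
nodes, edges = style_network(toy_articles)
for pair, style in sorted(edges.items()):
    print(pair, style)
for term, style in sorted(nodes.items()):
    print(term, style)
```

The resulting widths and colors could then be passed to any graph-drawing tool as edge-width and vertex-color attributes. Scaling by the maximum weight keeps the heaviest node and edge at the top of the visual range regardless of corpus size.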
