A Method for Identifying Nicknames of a User Based on User Behavior Patterns in an Online Community

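  • Sang-Hyun Park (Department of Computer Engineering, Sogang University);
  • Seog Park (Department of Computer Engineering, Sogang University)
  • Received: 2017.07.20
  • Reviewed: 2017.11.28
  • Published: 2018.02.15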

Abstract

Unlike a Social Network Service (SNS), an online community is a virtual group service whose members share interests and hobbies anonymously under nicknames. Some users exploit this anonymity maliciously, for example by posting offensive content, and a nickname change can cause a data fragmentation problem in which the data of a single user is scattered across different nicknames. Because nicknames are changed frequently in online communities, identifying the same user is difficult. To address these problems, this paper presents a behavior pattern feature vector that reflects the characteristics of online communities, proposes a new implicit behavior pattern called the relationship pattern, and proposes a method that uses a Random Forest classifier to identify the nicknames belonging to the same user. Experiments on data collected from a real online community show that the proposed behavior patterns and classifier can identify the same user at a meaningful level.
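To illustrate the general idea of a behavior pattern feature vector for a pair of nicknames, the following is a minimal sketch under stated assumptions, not the authors' implementation. The two features (cosine similarity of hourly posting activity, and Jaccard overlap of reply partners as a rough stand-in for a relationship pattern), the field names `post_hours` and `reply_partners`, and the sample data are all hypothetical. In practice, vectors like these would be passed to a Random Forest classifier trained on labeled nickname pairs; that step is omitted here.

```python
from collections import Counter
from math import sqrt


def hourly_histogram(post_hours):
    """Count posts per hour of day (0-23) for one nickname."""
    counts = Counter(post_hours)
    return [counts.get(hour, 0) for hour in range(24)]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """Jaccard overlap of two collections treated as sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(nick_a, nick_b):
    """Hypothetical feature vector for a pair of nicknames.

    Feature 0: similarity of posting-time habits (an assumed explicit pattern).
    Feature 1: overlap of users replied to (an assumed stand-in for the
               relationship pattern; the paper's exact definition may differ).
    """
    return [
        cosine_similarity(
            hourly_histogram(nick_a["post_hours"]),
            hourly_histogram(nick_b["post_hours"]),
        ),
        jaccard_similarity(nick_a["reply_partners"], nick_b["reply_partners"]),
    ]


# Made-up sample data for two nicknames that might belong to one user.
old = {"post_hours": [22, 23, 23, 0], "reply_partners": {"kim", "lee", "choi"}}
new = {"post_hours": [22, 23, 0, 0], "reply_partners": {"kim", "lee", "han"}}
features = pair_features(old, new)
```

Each labeled pair (same user or different users) would contribute one such vector as a training example, so the classifier learns which combinations of similarities indicate a shared owner.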

Keywords

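Project Information

Research project number: Development of Differential Privacy-Based De-identification Technology

Lead research institution: Institute for Information & communications Technology Promotion (IITP)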
