A Study on the Fraud Detection in an Online Second-hand Market by Using Topic Modeling and Machine Learning

A Study on Fraud Detection in the Online C2C Second-hand Trading Market Using Topic Modeling and Machine Learning Methods

  • Dongwoo Lee (이동우), 알티데이터랩
  • Jinyoung Min (민진영), Division of Business Administration, College of Business, Chosun University
  • Received : 2021.05.17
  • Accepted : 2021.08.06
  • Published : 2021.11.30

Abstract

As the transaction volume of the C2C second-hand market grows, so does the number of frauds, in which sellers seek unfair gains by sending products different from those specified or by not sending them to buyers at all. This study explores a model that can identify fraud in the online C2C second-hand market by examining transaction postings. To this end, it collected 145,536 postings from an actual C2C second-hand market. The model is built on features extracted from the postings, such as the topics and linguistic characteristics of the product description, together with the characteristics of the products, postings, sellers, and transactions, and it is trained with the machine learning algorithm XGBoost. The results show that, compared with genuine postings, fraudulent postings are shorter, contain less and less specific information, use fewer nouns and images, and have higher ratios of numbers and white space. In addition, genuine postings concentrate on product information in their nouns, delivery information in their verbs, and actions in their adjectives, whereas fraudulent postings do not show these characteristics. This study shows that various features can be extracted from postings in C2C second-hand transactions and used to build an effective fraud detection model. The proposed model can also be applied to other C2C platforms. Overall, it can be expected to help suppress and prevent fraudulent behavior in online C2C markets.
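
The pipeline described above (features computed from each posting, fed to a gradient-boosted tree classifier) can be sketched in pure Python as follows. This is neither XGBoost itself nor the authors' code: it is a hypothetical, minimal gradient boosting of depth-1 trees (stumps) that borrows XGBoost's second-order formulas for logistic loss, namely gradient p - y, hessian p(1 - p), leaf weight -G/(H + λ), and split gain G_L²/(H_L + λ) + G_R²/(H_R + λ) - G²/(H + λ) (without the ½ factor and the γ penalty, which do not change which split wins). It omits deeper trees, subsampling, and missing-value handling. `posting_features` covers only the easily computed characteristics the abstract names (length, digit ratio, white-space ratio, image count); noun counts would need a Korean part-of-speech tagger and are left out. All function names and defaults (`rounds=50`, `eta=0.3`, `lam=1.0`) are illustrative assumptions.

```python
import math
from typing import List, Tuple

# (feature index, threshold, left-leaf weight, right-leaf weight)
Stump = Tuple[int, float, float, float]


def posting_features(text: str, image_count: int) -> List[float]:
    """Hypothetical surface features: length, digit ratio, white-space ratio, images."""
    n = len(text)
    digits = sum(ch.isdigit() for ch in text)
    spaces = sum(ch.isspace() for ch in text)
    return [float(n), digits / n if n else 0.0, spaces / n if n else 0.0, float(image_count)]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_stump(X: List[List[float]], g: List[float], h: List[float], lam: float) -> Stump:
    """Pick the split with the largest second-order gain; leaf weight is -G / (H + lam)."""
    G, H = sum(g), sum(h)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for pos in range(len(order) - 1):
            i = order[pos]
            GL += g[i]
            HL += h[i]
            lo, hi = X[i][j], X[order[pos + 1]][j]
            if lo == hi:
                continue  # no threshold can separate equal values
            GR, HR = G - GL, H - HL
            gain = GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - G ** 2 / (H + lam)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, -GL / (HL + lam), -GR / (HR + lam))
    if best is None:  # every feature is constant: fall back to a single leaf
        w = -G / (H + lam)
        return (0, math.inf, w, w)
    return best[1:]


def train(X: List[List[float]], y: List[int], rounds: int = 50,
          eta: float = 0.3, lam: float = 1.0) -> List[Stump]:
    """Boosting with logistic loss: gradient p - y, hessian p(1 - p)."""
    margins = [0.0] * len(X)  # base margin 0, i.e. initial probability 0.5
    stumps: List[Stump] = []
    for _ in range(rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]
        h = [pi * (1.0 - pi) for pi in p]
        j, thr, wl, wr = fit_stump(X, g, h, lam)
        stump = (j, thr, eta * wl, eta * wr)  # shrink each tree by the learning rate
        stumps.append(stump)
        for i, x in enumerate(X):
            margins[i] += stump[2] if x[j] <= thr else stump[3]
    return stumps


def predict_proba(stumps: List[Stump], x: List[float]) -> float:
    """Probability that a posting is fraudulent under the boosted ensemble."""
    return sigmoid(sum(wl if x[j] <= thr else wr for j, thr, wl, wr in stumps))
```

Each row of `X` would be `posting_features(...)` for one posting and `y` its fraud label (1 = fraud). With the real `xgboost` package, the corresponding step would be fitting `xgboost.XGBClassifier` on the same feature matrix.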

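Demand for online C2C second-hand trading is increasing, but so is the number of fraudsters who seek unfair monetary gains by not sending items or by sending items different from those specified. To prevent such fraud in advance, this study built a fraud detection model using machine learning methods. To this end, 145,536 transaction postings were collected from Joonggonara (중고나라), a representative C2C second-hand trading platform. Topic modeling was then used to extract the topics of the product descriptions in these postings, along with the linguistic and paralinguistic characteristics of the product descriptions and the characteristics of the products, postings, buyers, and transactions. These features were used to build an XGBoost-based machine learning model that detects fraudulent postings. The analysis showed that fraudulent postings are generally short and provide less, and less specific, information; most of them use relatively few nouns and few or no images. Fraudulent postings also had relatively high ratios of numbers and white space. Genuine postings used nouns related to product information, verbs related to delivery, and adjectives related to actions, whereas fraudulent postings showed no distinct topic. Unlike existing methods that rely on phone numbers or bank account numbers, this study builds a model that detects fraud from a variety of posting characteristics, which gives it both academic and practical implications.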
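
The abstract does not name the topic model used to extract topics from product descriptions; LDA (latent Dirichlet allocation) is assumed here only because it is the most common choice. Below is a minimal pure-Python sketch of collapsed Gibbs sampling for LDA with symmetric priors, assuming each posting has already been tokenized into a list of words (for Korean text this would normally be done with a morphological analyzer, not shown; the reported noun, verb, and adjective results suggest, but do not confirm, that tokens were grouped by part of speech). The function name `lda_gibbs` and its defaults (`alpha=0.1`, `beta=0.01`, 200 iterations) are illustrative assumptions, not values from the study.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> Tuple[List[str], List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampling for LDA (hypothetical helper).

    Returns (vocabulary, doc-topic proportions theta, topic-word distributions phi).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k

    # Start from a random topic for every token.
    z: List[List[int]] = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts ...
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # ... sample its topic from the full conditional (up to a constant) ...
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # ... and put it back under the newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior-mean estimates of the per-document and per-topic distributions.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)] for t in topics]
    return vocab, theta, phi
```

One plausible way to turn the result into classifier features is to use each posting's row of `theta` directly, or its dominant topic, e.g. `max(range(num_topics), key=lambda t: theta[d][t])`. A production pipeline would more likely use an established library implementation than this sketch.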
