http://dx.doi.org/10.13088/jiis.2017.23.3.119

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality  

Choi, Sukjae (Humanitas BigData Research Center, Kyung Hee University)
Lee, Jungwon (School of Management, Kyung Hee University)
Kwon, Ohbyung (School of Management, Kyung Hee University)
Publication Information
Journal of Intelligence and Information Systems / v.23, no.3, 2017, pp. 119-138
Abstract
Recently, social networking services (SNS) have become an important channel for marketing as well as personal communication. However, cybercrime has evolved along with information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is lost and monetary damage occurs more frequently. In this study, we propose a method for identifying which sentences and documents posted to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and emergency management. We also suggest an emergency management process that consists of Pre-Cybercriminality (e.g., risk identification) and Post-Cybercriminality steps. Among these, this paper focuses on risk identification. The main process consists of data collection, preprocessing, and analysis. First, we selected two words, 'daechul (loan)' and 'sachae (private loan)', as seed words and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers, who decided whether each item was related to cybercriminality, particularly financial fraud. We then selected some vocabulary items from these data as keywords when they were nominals or symbols. With the selected keywords, we searched web sources such as Twitter, news, and blogs, and collected more than 820,000 articles. The collected articles were refined through preprocessing and made into learning data. Preprocessing is divided into three steps: morphological analysis, stop-word removal, and selection of valid parts of speech. In the morphological analysis step, a complex sentence is decomposed into morpheme units to enable mechanical analysis. In the stop-word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text.
In the step of selecting valid parts of speech, only two kinds, nouns and symbols, are considered. Since nouns refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal the text is, the more frequently symbols are used. To turn the preprocessed data into learning data, it is necessary to classify whether each item is legitimate or not, so each item is labeled 'legal' or 'illegal'. The processed data are then converted into a corpus and a document-term matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning data set and a test data set; in this study, 70% was used as learning data and 30% as test data. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function. The cost is set higher than in general cases. To show the feasibility of the proposed idea, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence methods. Overall accuracy was used as the metric. The overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal visit sales, which is higher than that of Term Frequency, MLE, and the other methods compared. Hence, the results suggest that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying SVM-like discrimination algorithms to text analysis. Moreover, the study will also contribute to practitioners in the fields of brand management and opinion mining.
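The data-preparation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`clean`, `document_term_matrix`, `split_70_30`), the toy English documents, and the random seed are all hypothetical. Morphological analysis and part-of-speech selection are omitted because they require a Korean morphological analyzer, so whitespace tokens stand in for morphemes, and punctuation is simply dropped rather than kept as a symbol feature. SVM training is also left out; with an RBF kernel (an assumption, since the abstract does not name the kernel), the stated parameters would correspond to something like scikit-learn's `SVC(kernel="rbf", gamma=0.5, C=10)`.

```python
import random
import re
from collections import Counter


def clean(text):
    """Remove numbers and punctuation, then collapse repeated whitespace."""
    text = re.sub(r"\d+", " ", text)        # numbers
    text = re.sub(r"[^\w\s]", " ", text)    # punctuation marks
    return re.sub(r"\s+", " ", text).strip()  # double spaces


def document_term_matrix(docs):
    """Return (vocabulary, rows); rows[i][j] counts term j in document i."""
    tokenised = [clean(d).lower().split() for d in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for toks in tokenised:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        rows.append(row)
    return vocab, rows


def split_70_30(items, seed=0):
    """Shuffle a copy of items and split it 70% / 30% (learning / test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled documents, for illustration only.
labeled = [
    ("Instant loan!!! No credit check, call 010 now", "illegal"),
    ("Private loan ** same day ** cash", "illegal"),
    ("Central bank keeps loan rates unchanged", "legal"),
    ("How to compare mortgage loan offers", "legal"),
    ("Loan approved in 5 minutes $$$", "illegal"),
    ("Students ask about loan repayment rules", "legal"),
    ("Cash today, private loan, no documents!!!", "illegal"),
    ("Regulator warns about private loan advertising", "legal"),
    ("Urgent cash?? Loan without guarantor", "illegal"),
    ("Bank publishes annual loan statistics", "legal"),
]

learning, test = split_70_30(labeled)
vocab, dtm = document_term_matrix([text for text, _ in learning])
```

The rows of `dtm`, paired with the labels in `learning`, are the kind of input a classifier would be trained on, and the held-out `test` items would be used to compute overall accuracy.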
Keywords
SVM; Financial Fraud Detection; Cybercrime; Crisis Management; Text Mining;