http://dx.doi.org/10.9716/KITS.2019.18.2.143

A Deep Learning Application for Automated Feature Extraction in Transaction-based Machine Learning  

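Woo, Deock-Chae (Department of Data Science, Kookmin University)
Moon, Hyun Sil (School of Management & AI Management Research Center, Kyung Hee University)
Kwon, Suhnbeom (School of Business Administration, Kookmin University)
Cho, Yoonho (School of Business Administration, Kookmin University)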
Publication Information
Journal of Information Technology Services / v.18, no.2, 2019, pp. 143-159
Abstract
Machine learning (ML) fits given data to a mathematical model in order to derive insights or make predictions. In the age of big data, where the amount of available data grows exponentially with advances in information technology and smart devices, ML achieves high prediction performance because it detects patterns without bias. Feature engineering, which generates the features that explain the problem to be solved, strongly influences ML performance, and its importance is continually emphasized. Nevertheless, it is still considered a difficult task because it requires a thorough understanding of domain characteristics and of the source data, as well as an iterative procedure. We therefore propose methods that apply deep learning to reduce the complexity and difficulty of feature extraction and to improve the performance of ML models. The reason most often given for the superior performance of deep learning on complex unstructured data is that, unlike other techniques, it can extract features from the source data itself. To bring this advantage to business problems, we propose deep learning based methods that automatically extract features from transaction data or directly predict and classify target variables. In particular, exploiting the structural similarity between transaction data and text data, we apply techniques that have shown high performance in text processing. We also verify the suitability of each method according to the characteristics of the transaction data. Our study not only explores the possibility of automated feature extraction but also yields a benchmark model that delivers a certain level of performance before any manual feature extraction is performed. In addition, it is expected to provide guidelines for choosing a suitable deep learning model based on the business problem and the data characteristics.
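The structural similarity between transaction data and text can be illustrated with a small sketch: each customer's time-ordered purchases are treated as a "sentence" whose "words" are item codes, then integer-encoded and padded to a fixed length, which is the kind of input that text-oriented deep learning models typically consume. This is an illustration only, not the authors' implementation; the sample rows, the field layout, and the helper names `build_sequences`, `build_vocabulary`, and `encode` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical transaction log: (customer_id, timestamp, item_code).
transactions = [
    ("c1", "2019-01-03", "milk"),
    ("c2", "2019-01-04", "bread"),
    ("c1", "2019-01-01", "bread"),
    ("c1", "2019-01-05", "eggs"),
    ("c2", "2019-01-02", "milk"),
]


def build_sequences(rows):
    """Group rows by customer and order each customer's items by timestamp."""
    by_customer = defaultdict(list)
    for customer, timestamp, item in rows:
        by_customer[customer].append((timestamp, item))
    return {
        customer: [item for _, item in sorted(events)]
        for customer, events in by_customer.items()
    }


def build_vocabulary(sequences):
    """Map each distinct item to a positive integer; 0 is reserved for padding."""
    items = sorted({item for seq in sequences.values() for item in seq})
    return {item: index for index, item in enumerate(items, start=1)}


def encode(sequence, vocab, max_len):
    """Integer-encode a sequence, keep the last max_len items, left-pad with 0."""
    ids = [vocab[item] for item in sequence][-max_len:]
    return [0] * (max_len - len(ids)) + ids


sequences = build_sequences(transactions)
vocab = build_vocabulary(sequences)
encoded = {c: encode(seq, vocab, max_len=4) for c, seq in sequences.items()}

for customer in sorted(encoded):
    print(customer, sequences[customer], encoded[customer])
```

The resulting fixed-length integer sequences could then be passed to an embedding layer followed by a recurrent or convolutional model, in the same way tokenized sentences are; which architecture suits which kind of transaction data is the question the paper itself addresses.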
Keywords
Machine Learning; Deep Learning; Feature Engineering; Automated Feature Extraction; Transaction Data