• Title/Summary/Keyword: 초기 설계단계

Search Result 1,002, Processing Time 0.021 seconds

Design of Client-Server Model For Effective Processing and Utilization of Bigdata (빅데이터의 효과적인 처리 및 활용을 위한 클라이언트-서버 모델 설계)

  • Park, Dae Seo;Kim, Hwa Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.109-122
    • /
    • 2016
  • Recently, big data analysis has developed into a field of interest to individuals and non-experts as well as companies and professionals. Accordingly, it is utilized for marketing and social problem solving by analyzing the data currently opened or collected directly. In Korea, various companies and individuals are challenging big data analysis, but it is difficult from the initial stage of analysis due to limitation of big data disclosure and collection difficulties. Nowadays, the system improvement for big data activation and big data disclosure services are variously carried out in Korea and abroad, and services for opening public data such as domestic government 3.0 (data.go.kr) are mainly implemented. In addition to the efforts made by the government, services that share data held by corporations or individuals are running, but it is difficult to find useful data because of the lack of shared data. In addition, big data traffic problems can occur because it is necessary to download and examine the entire data in order to grasp the attributes and simple information about the shared data. Therefore, We need for a new system for big data processing and utilization. First, big data pre-analysis technology is needed as a way to solve big data sharing problem. Pre-analysis is a concept proposed in this paper in order to solve the problem of sharing big data, and it means to provide users with the results generated by pre-analyzing the data in advance. Through preliminary analysis, it is possible to improve the usability of big data by providing information that can grasp the properties and characteristics of big data when the data user searches for big data. In addition, by sharing the summary data or sample data generated through the pre-analysis, it is possible to solve the security problem that may occur when the original data is disclosed, thereby enabling the big data sharing between the data provider and the data user. Second, it is necessary to quickly generate appropriate preprocessing results according to the level of disclosure or network status of raw data and to provide the results to users through big data distribution processing using spark. Third, in order to solve the problem of big traffic, the system monitors the traffic of the network in real time. When preprocessing the data requested by the user, preprocessing to a size available in the current network and transmitting it to the user is required so that no big traffic occurs. In this paper, we present various data sizes according to the level of disclosure through pre - analysis. This method is expected to show a low traffic volume when compared with the conventional method of sharing only raw data in a large number of systems. In this paper, we describe how to solve problems that occur when big data is released and used, and to help facilitate sharing and analysis. The client-server model uses SPARK for fast analysis and processing of user requests. Server Agent and a Client Agent, each of which is deployed on the Server and Client side. The Server Agent is a necessary agent for the data provider and performs preliminary analysis of big data to generate Data Descriptor with information of Sample Data, Summary Data, and Raw Data. In addition, it performs fast and efficient big data preprocessing through big data distribution processing and continuously monitors network traffic. The Client Agent is an agent placed on the data user side. It can search the big data through the Data Descriptor which is the result of the pre-analysis and can quickly search the data. The desired data can be requested from the server to download the big data according to the level of disclosure. It separates the Server Agent and the client agent when the data provider publishes the data for data to be used by the user. In particular, we focus on the Big Data Sharing, Distributed Big Data Processing, Big Traffic problem, and construct the detailed module of the client - server model and present the design method of each module. The system designed on the basis of the proposed model, the user who acquires the data analyzes the data in the desired direction or preprocesses the new data. By analyzing the newly processed data through the server agent, the data user changes its role as the data provider. The data provider can also obtain useful statistical information from the Data Descriptor of the data it discloses and become a data user to perform new analysis using the sample data. In this way, raw data is processed and processed big data is utilized by the user, thereby forming a natural shared environment. The role of data provider and data user is not distinguished, and provides an ideal shared service that enables everyone to be a provider and a user. The client-server model solves the problem of sharing big data and provides a free sharing environment to securely big data disclosure and provides an ideal shared service to easily find big data.

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.221-241
    • /
    • 2018
  • Deep learning is getting attention recently. The deep learning technique which had been applied in competitions of the International Conference on Image Recognition Technology(ILSVR) and AlphaGo is Convolution Neural Network(CNN). CNN is characterized in that the input image is divided into small sections to recognize the partial features and combine them to recognize as a whole. Deep learning technologies are expected to bring a lot of changes in our lives, but until now, its applications have been limited to image recognition and natural language processing. The use of deep learning techniques for business problems is still an early research stage. If their performance is proved, they can be applied to traditional business problems such as future marketing response prediction, fraud transaction detection, bankruptcy prediction, and so on. So, it is a very meaningful experiment to diagnose the possibility of solving business problems using deep learning technologies based on the case of online shopping companies which have big data, are relatively easy to identify customer behavior and has high utilization values. Especially, in online shopping companies, the competition environment is rapidly changing and becoming more intense. Therefore, analysis of customer behavior for maximizing profit is becoming more and more important for online shopping companies. In this study, we propose 'CNN model of Heterogeneous Information Integration' using CNN as a way to improve the predictive power of customer behavior in online shopping enterprises. In order to propose a model that optimizes the performance, which is a model that learns from the convolution neural network of the multi-layer perceptron structure by combining structured and unstructured information, this model uses 'heterogeneous information integration', 'unstructured information vector conversion', 'multi-layer perceptron design', and evaluate the performance of each architecture, and confirm the proposed model based on the results. In addition, the target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, high discount shopper. In order to verify the usefulness of the proposed model, we conducted experiments using actual data of domestic specific online shopping company. This experiment uses actual transactions, customers, and VOC data of specific online shopping company in Korea. Data extraction criteria are defined for 47,947 customers who registered at least one VOC in January 2011 (1 month). The customer profiles of these customers, as well as a total of 19 months of trading data from September 2010 to March 2012, and VOCs posted for a month are used. The experiment of this study is divided into two stages. In the first step, we evaluate three architectures that affect the performance of the proposed model and select optimal parameters. We evaluate the performance with the proposed model. Experimental results show that the proposed model, which combines both structured and unstructured information, is superior compared to NBC(Naïve Bayes classification), SVM(Support vector machine), and ANN(Artificial neural network). Therefore, it is significant that the use of unstructured information contributes to predict customer behavior, and that CNN can be applied to solve business problems as well as image recognition and natural language processing problems. It can be confirmed through experiments that CNN is more effective in understanding and interpreting the meaning of context in text VOC data. And it is significant that the empirical research based on the actual data of the e-commerce company can extract very meaningful information from the VOC data written in the text format directly by the customer in the prediction of the customer behavior. Finally, through various experiments, it is possible to say that the proposed model provides useful information for the future research related to the parameter selection and its performance.