An Optimized User Behavior Prediction Model Using Genetic Algorithm On Mobile Web Structure

  • Hussan, M.I. Thariq (Department of Computer Science and Engineering, Paavai Engineering College) ;
  • Kalaavathi, B. (Department of Computer Science and Engineering, KSR Institute for Engineering & Technology)
  • Received : 2014.06.21
  • Reviewed : 2015.04.29
  • Published : 2015.05.31

Abstract

With the advancement of mobile web environments, identifying and analyzing user behavior plays a significant role, yet it remains a challenging task because of the variations observed in the model. This paper presents an efficient method for mining an optimized user behavior prediction model using a genetic algorithm on mobile web structure. The framework integrates the temporary and permanent register information and stores it immediately in the form of integrated logs, which offer higher precision and reduce the time needed to determine user behavior. Then, by applying temporal characteristics, a suitable time interval table is obtained by segmenting the logs; the table that splits the huge data logs is obtained using the genetic algorithm. The existing cluster-based temporal mobile sequential arrangement provides efficiency without reducing accuracy but compromises precision when predicting user behavior. To discover mobile users' behavior efficiently, with the prediction model associated with region and requested services, a method called optimized user behavior Prediction Model using Genetic Algorithm (PM-GA) on mobile web structure is introduced. This paper also provides a technique called MAA for use when an increase in the number of models related to region and requested services is observed. Based on our analysis, we contend that PM-GA provides improved performance in terms of precision, number of mobile models generated, execution time and prediction accuracy. Experiments are conducted with different parameters on real datasets in a mobile web environment. Analytical and empirical results show efficient and effective mining and prediction of user behavior on mobile web structure.

Keywords

1. Introduction

In mobile environments, one promising application is delivering the information a user requests within a short span of time, since users have increasingly limited time to surf web pages. With advances in resource access strategies, a few researchers have concentrated on caching and prefetching to improve system performance in World Wide Web environments.

In today's scenario, many web databases generate dynamic web pages in response to user queries. These web databases constitute the hidden Web, which usually holds a much larger amount of high-quality, typically structured information and is growing faster than the static web. Such databases are accessed through a query interface in which users submit queries. As soon as a query is received, the web server retrieves the corresponding results from the database and returns them to the user. To construct a system that lets users integrate and compare the query results returned from multiple web databases, the crucial task is to match records from different sources that refer to the same real-world entity.

Cluster-based Temporal Mobile Sequential Arrangement Mine (CTMSP-Mine) [1] identified cluster-based mobile arrangements and predicted subsequent mobile behaviors. In CTMSP-Mine, user clusters were obtained using the Cluster-Object-based Smart Cluster Affinity Search Technique (CO-Smart-CAST), in which the similarities between users were measured. Location-Based Services (LBS) for Mobile Commerce (MC) through mobile phones were presented in [2]. The method was based on a cellular network composed of several base stations but failed to predict attacks from the user side.

However, the deficiency of existing CTMSP-Mine studies is that they measured only one characteristic, i.e., either the location or the service being requested. Both movement and service requests should be measured at the same time in order to obtain complete information about user behavior arrangements. In the mobile environments of CTMSP-Mine, each submitted service request is related to the user's current location, which is acquired through positioning techniques such as GPS embedded in mobile devices. A series of requests from the user forms a location-service stream. CTMSP-Mine research aims at mining users' behavior arrangements so that appropriate services can be predicted and recommended to users.

Because information differs in nature and is distributed, it has become difficult to obtain the information related to a specific user. Although web search engines have reduced information overload to a certain extent, the information retrieved by the user still contains a lot of redundancy.

Given a database that changes over time and a sequence of user requests that changes over time with the user's preferences, the objective of similarity-profiled temporal association mining is to identify all associated items whose variations over time are similar to the reference sequence, subject to a threshold value. Similarity-profiled temporal association mining reveals several significant relationships in the data for a specific event over time.

Given data points and a distance measure, clustering divides the data set into subsets, referred to as clusters, such that the data in each subset share certain properties. These common properties are evaluated quantitatively with an optimality measure such as minimum intra-cluster distance or maximum inter-cluster distance. Clustering, one of the important tools for identifying hidden structures in large databases, has been studied by many researchers, and many algorithms have been presented in the literature.

CTMSP-Mine uses a data mining algorithm to identify the behavioral arrangements of users in mobile web environments. In addition, a few prediction strategies were presented for determining user behavior on the basis of the location and the services requested by the user. It used the movement of the user together with the requested services to identify the user's behavioral arrangements.

CTMSP-Mine efficiently discovers sequential mobile access arrangements that contain both movement and service requests simultaneously. Rules were generated accordingly in CTMSP-Mine to obtain the prediction arrangements, which were then stored in a rule warehouse for online recommendation according to the chronological behavior of the users. Concept-based User Profiles (CUP) [14] included both positive and negative preferences of the users in order to differentiate efficiently between similar and dissimilar queries. Preference mining rules were applied in CUP to improve the average similarity values for similar and dissimilar queries. However, this model did not serve as an efficient prediction model.

This paper presents the use of PM-GA to determine users' behavior patterns in mobile web systems. Furthermore, corresponding prediction strategies are proposed for determining the user's behavior in terms of location and related services. The PM-GA mechanism combines the user's chronological movement with the requested services for efficient discovery of the user's behavioral patterns. PM-GA efficiently discovers sequential mobile access patterns containing both movement and service requests simultaneously. PM-GA first generates three categories of rules, which predict the next position, the next requested service, and the next position with its related service, respectively. The generated rules are then stored by category in a rule warehouse for online recommendation according to the chronological behavior of the user.

Based on the aforementioned techniques, a novel optimized user behavior model to increase the prediction accuracy is presented using genetic algorithm associated with region and requested services. The rest of this paper is arranged as follows: Section 2 describes the survey of papers. Section 3 introduces the architecture diagram of the proposed scheme. Section 3.1 describes the overall framework whereas section 3.2 describes the proposed method and algorithm followed by section 3.3 which designs the prediction strategy. Section 4 shows the experimental results and comparison made with other methods. Section 5 describes the conclusion.

 

2. Literature Review

The most important goal of data mining is to identify the arrangements occurring in databases using associations, classification models, sequential arrangements and so on. Frequent arrangement discovery in [4] involved identifying the feature sets or items that appear frequently. The application of fuzzy logic to frequent arrangement discovery provided a mathematical framework compatible with poorly quantitative yet qualitatively significant data.

An original and effective arrangement discovery technique, based on arrangement deploying and arrangement evolving, was presented in [5]. The technique improved the effectiveness of applying and updating the discovered arrangements, which was relevant to the decision but incurred a higher cost. The approach introduced in [2] transformed the data before supplying it to the service provider, so that similarity queries could be answered on the transformed data. This technique affords interesting trade-offs between query cost and accuracy.

An indexing scheme in [8] was presented that provided energy and latency efficient processing of full-text searches over the wireless broadcast data stream. A lot of access methods and index structures in the past used for full-text searches were designed for the storage of data in disk but not in wireless broadcast channels. An energy-efficient collaborative target tracking paradigm was developed in [9]. A Mutual Information based Sensor Selection (MISS) algorithm was adopted for participation in the fusion process. MISS allowed the sensor nodes with the highest mutual information to transmit data so that the energy consumption was reduced when the desired target position estimation accuracy was met.

Challenges associated with spatiotemporal data depiction, analysis, mining and information visualization were presented in [6]. Different types of data mining tasks, including association rules, classification and clustering, for extracting the respective information from spatiotemporal datasets were examined and reviewed. The Genetic Algorithm as used in [3] generated high-quality association rules according to four metrics, namely confidence, completeness, interestingness and comprehensibility, but at the cost of time.

The challenge of mining sequences using Spatio-Temporal Association Rules was addressed in [7]. Theoretical results were exploited to develop an efficient algorithm, which was shown to have run time linear in the number of interesting sequences discovered. A lattice for drill-down and roll-up investigative analysis of the sequence arrangements was also presented. Finally, demonstrable and interesting arrangements possessing the above characteristics were established on real-world data in the field of animal tracking.

Seeds Affinity Propagation (SAP) [10], an original semi-supervised text clustering algorithm, was designed to obtain a higher F-measure and lower entropy, and it also significantly improved the clustering execution time. A new method called Dark Block Extraction (DBE) [11] was investigated for automatically estimating the number of clusters in unlabeled data sets. It is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set and uses several common image and signal processing techniques.

Logical Analysis of Data (LAD) and Shadow Clustering (SC) were combined in [12] for retrieving the logical products using a kernel approach followed by LAD. LAD consists of a breadth-first enumeration of all prime implicants whose degree is not greater than a fixed maximum d. However, the computational cost required by LAD prevents its application even for moderately small input domains. To efficiently discover users' behavior arrangements in mobile systems, a technique named Chronological Movement Related Arrangement Mine (CMRPM) was introduced.

In [13], a dynamic load balancing strategy was presented for performing association rule mining in a grid environment. This load balancing strategy was built upon a hierarchical grid model with three levels: super coordinator, coordinator and processing nodes.

To investigate mobile web structure while overcoming the above limitations, a technique named optimized user behavior Prediction Model using Genetic Algorithm (PM-GA) on mobile web structure is presented, which mines the integrated log files and reports accurate search results.

The contributions of user behavior Prediction Model using Genetic Algorithm (PM-GA) on mobile web structure include the following:

 

3. User Behavior Prediction Model using Genetic Algorithm (PM-GA)

The workflow of optimized user behavior arrangement mining using genetic algorithm on mobile web structure (PM-GA) is split into three phases, namely the Data Incorporation Phase, the Mining Phase and the Prediction Phase. Because of the distributed nature of the mobile system, users' movement positions and users' service requests are stored in different databases. Fig. 1 shows the framework of PM-GA for predicting user behavior with improved precision and prediction accuracy.

Fig. 1. System Architecture of PM-GA

The first phase in the design of PM-GA is the Data Incorporation Phase, whose purpose is to organize and integrate the records into a single dataset. These records are obtained from distributed registers, namely the Permanent Information Register (PIR) and the Temporary Information Register (TIR). The PIR stores the information of permanent subscribers in a mobile environment, whereas the TIR maintains the information of temporary users in the present region in order to serve requests from subscribers who are outside their reachable area. The information obtained from both registers is integrated to form an integrated log file, which comprises the user ID, service ID, timestamp, region and duration for each user in the PM-GA system.

During the second phase, the Mining Phase, a mobile access arrangement (MAA) is used to identify the frequent models with the help of the integrated log files. Finally, during the Prediction Phase, the user submits service requests from various positions and at various times. Each request includes the current region from which it is placed, the currently requested service and the time of the request. The prediction component then retrieves the matching rules from the repository of web log files according to the user's present behavior. Consequently, the best suggested results in PM-GA are returned to the service agent, which incorporates the recommendation results into the requested service page and sends the newly rendered page to the user.

3.1 Data Incorporation Phase

The first phase of PM-GA, the data incorporation phase, integrates the information obtained from the permanent and temporary information registers, as illustrated in Fig. 2. Each entry of the integrated log file includes the user ID, followed by the session ID of the session established for the user, the timestamp of the user's request, the corresponding region, and the duration for which the user's session is maintained.

Fig. 2. Data Incorporation
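The integration step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the `LogRecord` type, the `integrate_logs` function, the integer timestamps and the choice to drop exact duplicates are all assumptions, and the field list follows the description of the integrated log file given in Section 3.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LogRecord:
    """One integrated-log entry (hypothetical layout)."""
    user_id: str
    service_id: str
    timestamp: int  # assumed: seconds since the start of the trace
    region: str
    duration: int   # assumed: seconds spent in the session


def integrate_logs(pir: Iterable[LogRecord],
                   tir: Iterable[LogRecord]) -> List[LogRecord]:
    """Merge PIR and TIR records into one log.

    Exact duplicates are dropped and the result is ordered by user and
    time, so that each user's requests form a chronological stream.
    """
    merged = set(pir) | set(tir)
    return sorted(
        merged,
        key=lambda r: (r.user_id, r.timestamp, r.service_id, r.region, r.duration),
    )
```

Sorting by user and then by time is a design choice that makes the later extraction of per-user arrangements a single pass over the log.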

3.2 Mathematical Model of Identifying Mobile Access Arrangement

The mathematical model for identifying the mobile access arrangement (MAA) efficiently is given below. Consider two sets, $P$ as the position set and $S$ as the service set. Let us define an ordered pair

$$o = (p, s), \quad p \in P,\ s \in S$$

For each element $p$ in $P$ and $s$ in $S$, an ordered pair $o = (p, s)$ is formed, where $p$ and $s$ are the first and second elements of $o$. Two ordered pairs $(p_1, s_1)$ and $(p_2, s_2)$ are said to be equivalent when $p_1 = p_2$ and $s_1 = s_2$ in the PM-GA system. Let $O$ be the set of all ordered pairs such that

$$O = \{(p, s) \mid p \in P,\ s \in S\}$$

Let $Q = \langle (o_1, q_1), (o_2, q_2), \ldots, (o_n, q_n) \rangle$, where each element $(o_i, q_i)$ is composed of an ordered pair $o_i$ and a time point $q_i$, and $Q$ represents a mobile access arrangement of length $n$, namely an $n$-arrangement. Meanwhile, $(o_i, q_i)$ is defined as earlier than $(o_j, q_j)$ if and only if $o_i$

3.3 Construction of MAA

This section discusses in detail the second phase of PM-GA, the mining phase using the mobile access arrangement (MAA). The input to MAA is the information extracted from the permanent and temporary information registers and stored in the integrated log files. The mining phase of PM-GA consists of two parts, namely (i) design of the MAA tree and (ii) mining of mobile access models, which are described in the following sections.

The objective of constructing the MAA tree is to hold the mobile access arrangements in memory in a compact form, for effective development of arrangements and for increased precision. The advantage of using the MAA tree is that it requires only minimal access to the physical database to mine all the frequent models. The construction of MAA for the PM-GA system is illustrated with an example in Table 1 and Fig. 3.

Table 1. Sample Mobile Access Arrangement

Fig. 3. Construction of MAA Tree

Given the mobile access arrangements listed in Table 1, the construction of the MAA Tree is shown in Fig. 3. As an example, the first mobile access arrangement, <(p, 1) (q, 3) (r, 5) (s, 7)>, is extracted and inserted into the MAA Tree, and the traversal count of each node label along its path is increased by 1.
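One way to picture this insertion step is as a prefix tree whose nodes carry traversal counts. The sketch below assumes that reading; the `MAANode` class, the `build_maa_tree` function and the representation of an arrangement as a list of (label, time point) pairs are illustrative names and choices, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed encoding: an arrangement is a list of (label, time point) pairs.
Arrangement = List[Tuple[str, int]]


class MAANode:
    """A tree node holding a label, a traversal count and child nodes."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.count = 0
        self.children: Dict[str, "MAANode"] = {}


def build_maa_tree(arrangements: List[Arrangement]) -> MAANode:
    """Insert every arrangement into a prefix tree rooted at a dummy node."""
    root = MAANode("root")
    for arrangement in arrangements:
        node = root
        # Walk the elements in chronological order of their time points.
        for label, _time in sorted(arrangement, key=lambda e: e[1]):
            node = node.children.setdefault(label, MAANode(label))
            node.count += 1  # one more traversal through this node
    return root
```

For instance, inserting the arrangement from the example above creates the path p, q, r, s with a count of 1 on each node; a second arrangement that also starts with p and q raises those two counts to 2 and branches afterwards.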

3.4 Algorithm for Mining MAA

Once the MAA Tree has been constructed from the integrated log files, mining of MAA is performed. The detailed algorithm for MAA is shown below. The MAA tree is processed iteratively until the termination criteria are met. The algorithm mines the mobile access arrangements using the integrated log files as input.

Initially, the MAA Tree is scanned for labels whose count is greater than the support threshold δ, and these labels are stored in a temporary set Temp. If Temp is empty, the prefix arrangement of the current MAA Tree is returned as output and the procedure stops. Otherwise, for each label P in Temp, all the nodes with label name P are stored in a temporary set P_tmp. Every label P in P_tmp is then combined with l to form a new prefix arrangement, denoted (P, S), which is appended to the prefix arrangement of the current MAA Tree. As a result, a new MAA Tree is obtained using the new prefix arrangement.
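The recursion described above can be approximated by a depth-first walk that extends the current prefix only through nodes whose count exceeds δ. The sketch below is a simplification under stated assumptions: it mines prefix paths only, it emits every frequent prefix rather than only the final ones, and the nested-dictionary tree encoding and the name `mine_frequent_prefixes` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed tree encoding: label -> (traversal count, child subtree).
Tree = Dict[str, Tuple[int, "Tree"]]
Pattern = Tuple[Tuple[str, ...], int]  # (prefix arrangement, support count)


def mine_frequent_prefixes(tree: Tree, delta: int,
                           prefix: Tuple[str, ...] = ()) -> List[Pattern]:
    """Return every prefix arrangement whose count is greater than delta."""
    results: List[Pattern] = []
    for label, (count, children) in sorted(tree.items()):
        if count <= delta:
            # Counts never grow deeper in a prefix tree, so the whole
            # subtree can be pruned here.
            continue
        extended = prefix + (label,)
        results.append((extended, count))
        results.extend(mine_frequent_prefixes(children, delta, extended))
    return results
```

The strict comparison follows the wording "greater than the support threshold δ"; using `count < delta` instead would give the more common "at least δ" convention.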

3.5 Prediction Strategy for PM-GA

The mobile arrangements discovered using the MAA Tree are used to predict the behavior of mobile users. The prediction strategy is determined from three parameters, namely the region, the service and the time, as given below. The prediction rule used for the next region is shown in equation (3).

The prediction rule used for the next requested service is shown in equation (4).
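Since the rule equations are not reproduced in this text, the following sketch only illustrates the general idea of rule-based prediction under a simple assumption: a rule maps the current (region, service) pair to the pairs that followed it in the mined arrangements, and the most frequent follower is returned. The functions `build_rules` and `predict_next` and the frequency-vote scoring are hypothetical and should not be read as equations (3) and (4).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (region, service)


def build_rules(arrangements: List[List[Pair]]) -> Dict[Pair, Counter]:
    """Count, for every pair, which pair follows it and how often."""
    rules: Dict[Pair, Counter] = defaultdict(Counter)
    for arrangement in arrangements:
        for current, following in zip(arrangement, arrangement[1:]):
            rules[current][following] += 1
    return rules


def predict_next(rules: Dict[Pair, Counter], current: Pair,
                 field: Optional[int] = None):
    """Predict what follows `current`.

    field=None returns the next (region, service) pair, field=0 only the
    next region, field=1 only the next requested service.
    """
    followers = rules.get(current)
    if not followers:
        return None  # no rule matches the user's present behavior
    if field is None:
        return followers.most_common(1)[0][0]
    votes: Counter = Counter()
    for pair, support in followers.items():
        votes[pair[field]] += support
    return votes.most_common(1)[0][0]
```

The three values of `field` mirror the three rule categories mentioned in the introduction: next position, next requested service, and next position with its related service.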

Once the prediction rules for region and service are obtained, the temporal characteristics are applied: a suitable time interval table is obtained by segmenting the logs, and the table that splits the huge data logs is found using the Genetic Algorithm (GA). The pruning strategy of PPMWS reduces not only the number of mobile arrangements to be generated but also the execution time for mining them. Once the rules are obtained from MAA, they are applied based on the region and the service requested by the mobile user. With the help of the time interval table, it is easy to identify different user behaviors at different time intervals, and GA yields a significant time interval table. The steps involved in the design of the GA-based User Behavior Model are illustrated below:

The above GA-based User Behavior Model efficiently identifies user behavior with the help of the time interval table. The results obtained show the significant time intervals used to determine user behavior according to the next requested service and region.
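As a rough illustration of how a genetic algorithm can produce a time interval table, the sketch below evolves a set of cut points that split a day of hourly request counts into intervals of similar activity. Everything specific in it is an assumption made for the example: the chromosome encoding (a sorted tuple of cut points), the fitness (within-interval squared deviation, lower is better), the truncation selection, the crossover and mutation operators, and the parameter values. It is not the design used in PM-GA.

```python
import random
from typing import List, Sequence, Tuple

Cuts = Tuple[int, ...]  # sorted interior boundaries, each in 1..len(counts)-1


def segment_cost(counts: Sequence[float], cuts: Cuts) -> float:
    """Sum of squared deviations from the mean inside each interval."""
    bounds = [0, *cuts, len(counts)]
    cost = 0.0
    for start, end in zip(bounds, bounds[1:]):
        segment = counts[start:end]
        mean = sum(segment) / len(segment)
        cost += sum((c - mean) ** 2 for c in segment)
    return cost


def evolve_intervals(counts: Sequence[float], k: int, generations: int = 60,
                     pop_size: int = 30, seed: int = 0) -> Cuts:
    """Search for k cut points with low segment_cost using a simple GA."""
    rng = random.Random(seed)
    slots = range(1, len(counts))

    def random_cuts() -> Cuts:
        return tuple(sorted(rng.sample(slots, k)))

    def crossover(a: Cuts, b: Cuts) -> Cuts:
        pool = sorted(set(a) | set(b))  # always holds at least k cut points
        return tuple(sorted(rng.sample(pool, k)))

    def mutate(cuts: Cuts) -> Cuts:
        kept = list(cuts)
        kept.pop(rng.randrange(k))  # drop one cut point at random
        free = [s for s in slots if s not in kept]
        kept.append(rng.choice(free))  # and place it somewhere else
        return tuple(sorted(kept))

    population: List[Cuts] = [random_cuts() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: segment_cost(counts, c))
        parents = population[: pop_size // 2]  # keep the better half
        children: List[Cuts] = []
        while len(parents) + len(children) < pop_size:
            child = crossover(*rng.sample(parents, 2))
            if rng.random() < 0.3:
                child = mutate(child)
            children.append(child)
        population = parents + children
    return min(population, key=lambda c: segment_cost(counts, c))
```

A call such as `evolve_intervals(hourly_counts, k=2)` returns two hour boundaries; the intervals between them would then form the rows of the time interval table, and rules can be mined separately for each interval.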

 

4. Experimental Results

The prediction model for determining user behavior on Mobile Web Structure using GA (PM-GA) was evaluated in numerous experiments on the msnbc.com anonymous web dataset and the Entree Chicago Recommendation dataset, both obtained from the UCI repository. The model is implemented on the Java platform using the Weka tool. The msnbc.com anonymous web data are extracted from Internet Information Server (IIS) logs. Each sequence in the dataset corresponds to the page views of one user during a twenty-four-hour period, and each event in the sequence corresponds to a user's request for a page. Requests are recorded at the level of page category rather than at the level of URL, and page requests served by means of a caching mechanism are not recorded in the server logs.

The Entree Chicago Recommendation dataset comes from a restaurant recommendation system and is used for evaluating PM-GA; its recommendations are based on factors such as cuisine, price, style, atmosphere and similarity to a restaurant in another city. The performance of the proposed optimized user behavior Prediction Model using Genetic Algorithm (PM-GA) on mobile web structure is measured in terms of precision, number of mobile arrangements generated, execution time and prediction accuracy.

This work has shown how the optimized user behavior prediction model identifies user behavior on mobile web structure. The proposed system considers the number of mobile arrangements and precision. Precision measures the degree of exactness in mining data in web services. It depends on how the data are collected and is usually judged by comparing numerous measurements from the same or different sources. It is measured as a percentage (%).

Next, execution efficiency measures the time taken to execute a single instruction on the mobile data collected from the integrated log.

Fig. 4 shows the precision of mining the data using the anonymous web dataset. The experiments examine the impact of precision in mining mobile data in web services. The optimized user behavior prediction model on Mobile Web Structure (PM-GA) is compared with the existing CTMSP-Mine [1] and CUP [14]. PM-GA effectively fetches the prediction model arrangements with the help of the generated MAA tree. Experiments showed that the MAA algorithm is more accurate when handling multiple events in different situations. PM-GA achieves a 2-23% higher level of accuracy than CTMSP-Mine and 8-31% higher than CUP.

Fig. 4. Performance of Precision

Fig. 5 shows the execution time as a function of database size in the anonymous web dataset. Every sequence used by PM-GA comes from the msnbc.com anonymous web dataset and corresponds to user page views. PM-GA executes faster than CTMSP-Mine [1] and CUP [14]. This is because of the two separate prediction strategies applied in PM-GA, for the region and for the request placed by the user, according to the time interval table. With the GA-based prediction strategy, the execution efficiency improves, whereas the existing CUP, although it separated similar and dissimilar queries efficiently, was not oriented towards effective prediction. The execution time is 10-29% lower than with CTMSP-Mine and 25-45% lower than with CUP.

Fig. 5. Performance of Execution time

Fig. 6 shows the number of user behavior models generated for different sets of user groups. PM-GA uses the tree-structured mobile access arrangement to effectively recognize the position and service, and to predict the position and service of mobile users simultaneously. On the Entree Chicago Recommendation dataset, PM-GA increases the number of mobile arrangements generated compared with CTMSP-Mine [1] and CUP [14]. The efficiency obtained is 7-30% higher than with the existing CTMSP-Mine method and 23-48% higher than with CUP.

Fig. 6. Performance of number of user behavior models generated

Fig. 7 illustrates the prediction accuracy obtained, with a detailed comparison against the existing CTMSP-Mine. The prediction accuracy is higher with PM-GA because the GA-based significant time interval table increases the prediction accuracy, whereas CUP addressed the overall quality of the resulting queries and did not perform prediction. The rate of prediction accuracy is 11-19% higher than with the existing CTMSP-Mine and 18-30% higher than with CUP.

Fig. 7. Measure of Prediction accuracy with respect to the number of user groups

 

5. Conclusion

In this work, a novel architecture and algorithm based on an optimized user behavior prediction model, called PM-GA, is designed, which uses a tree structure for determining user behavior. The flexibility of the model is demonstrated: it achieves quality comparable to its counterpart while increasing the prediction accuracy and generating a significant number of mobile arrangements, and it can be made equivalent to the existing CTMSP-Mine models by adjusting the number of user groups and the size of the database. The model shows good precision with respect to the user groups and significantly increases the number of prediction model arrangements generated. The experimental assessment and sensitivity analysis under various system conditions show that PM-GA delivers improved performance in terms of precision, mobile arrangements generated, execution time and prediction accuracy using the GA-based significant time interval table. The analytical and empirical results show efficient and effective arrangement mining and prediction of behavior arrangements in mobile systems.

The importance of this contribution stems from its flexibility to accommodate mobile sequential arrangements in networks through region and service information, in addition to the time at which the behavior models are recorded. It also allows a higher precision rate to be obtained with minimum execution time by accessing the integrated logs. In addition, the number of mobile arrangements generated is increased using MAA. In future work, we plan to extend this model with prioritization for web structures that are accessed frequently, which requires a well-balanced network in terms of height and partition.

Cited by

  1. Evaluation of Mobile Application in User's Perspective: Case of P2P Lending Apps in FinTech Industry vol.11, pp.2, 2015, https://doi.org/10.3837/tiis.2017.02.027