Adaptive Cloud Offloading of Augmented Reality Applications on Smart Devices for Minimum Energy Consumption

  • Chung, Jong-Moon (School of Electrical & Electronic Engineering, Yonsei University) ;
  • Park, Yong-Suk (School of Electrical & Electronic Engineering, Yonsei University) ;
  • Park, Jong-Hong (School of Electrical & Electronic Engineering, Yonsei University) ;
  • Cho, HyoungJun (School of Electrical & Electronic Engineering, Yonsei University)
  • Received : 2015.03.26
  • Accepted : 2015.06.08
  • Published : 2015.08.31

Abstract

The accuracy of an augmented reality (AR) application is highly dependent on the resolution of the object's image and the device's computational processing capability. Naturally, a mobile smart device equipped with a high-resolution camera becomes the best platform for portable AR services. AR applications require significant energy consumption and very fast response times, which place a heavy burden on the smart device, and there are very few ways to overcome this burden. Computation offloading via mobile cloud computing has the potential to provide energy savings and enhance the performance of applications executed on smart devices. Therefore, in this paper, adaptive mobile computation offloading of mobile AR applications is considered in order to determine optimal offloading points that satisfy the required quality of experience (QoE) while consuming minimum energy of the smart device. AR feature extraction based on the SURF algorithm is partitioned into sub-stages in order to determine the optimal AR cloud computational offloading point based on conditions of the smart device, wireless and wired networks, and AR service cloud servers. Tradeoffs between energy savings and processing time are explored, taking network congestion and server load conditions into account.

1. Introduction

Augmented reality (AR) is an emerging field in information technology in which video images taken by a camera are enhanced with computer-generated virtual objects or video/audio information in real-time. As shown in Fig. 1, the image of an object may be acquired using the built-in camera of a smartphone and processed to obtain additional information about the object. This information is made available and presented to the user by overlaying it with the real view of the object, thereby augmenting the reality. AR introduces a whole new way of human-computer interaction, and it provides endless opportunities for applications in diverse fields including, but not limited to, industrial, commercial, and entertainment areas.

Fig. 1. AR application based on MVS where context-sensitive information is displayed on the smartphone’s screen after receiving the associated information from the AR cloud server

Smart devices such as smartphones and tablet computers are ideal platforms for AR applications, providing the necessary imaging, sensory, and networking peripherals. Smart devices today come equipped with powerful processors, graphics processing units, high-resolution cameras and displays, location sensors, and high-speed wireless network interfaces. As smart devices are becoming a popular and reasonably priced commodity, the number of AR applications and their users is expected to increase rapidly within a few years [1]. At the same time, AR applications can greatly enhance the mobile user experience by serving as an interface themselves, making mobile search transparent to the user and reducing search effort. AR requires little interaction from the user, since the smart device senses and analyzes the surroundings and provides location-based or context-sensitive information in real-time.

Even though smart devices are seeing an overall performance increase, they are still incomparable to desktop computers and servers in terms of performance capacity. Many applications, AR applications included, are too computationally intensive to be fully supported on a smart device. In addition, the increase in specifications and performance of smart devices has imposed more stringent energy consumption constraints on these battery-powered devices. Recently, mobile cloud computing has emerged to fill this gap in performance and save energy [2]. In mobile cloud computing, smart devices make use of external resources accessible via wireless networks. Computationally intensive tasks are offloaded to the cloud server instead of being processed locally on the mobile device. Offloading is the process of transferring a section of application execution to more powerful processing platforms such as servers or clouds. Offloading can potentially save both energy and time for completing a given task on the mobile device.

In AR, mobile visual search (MVS) applications in particular can benefit from mobile cloud offloading. MVS is based on object recognition. MVS performs visual search in which the data obtained from the image queried is compared and matched against a database of images. The database used for visual search is quite massive and cannot be located locally on the smart device due to memory constraints. Therefore, the database needs to reside on the server side and offloading becomes essential for MVS applications. MVS involves extensive search and matching for comparison. Therefore, algorithms and tasks involved in MVS are also computationally intensive, which affects the battery power consumption of the mobile device [3]. Offloading may decrease the processing load of mobile devices and save energy, and consequently, it can extend the use time and battery lifetime of the mobile device.

Although computational offloading provides certain benefits to MVS applications, it may not always be beneficial to offload from the user experience point of view. If the network is congested or if the cloud server is overloaded or unreachable, the incurred processing delay at the AR cloud server could result in an annoying or intolerable user experience. The amount of mobile network traffic and the load on cloud servers have busy day and busy hour (BDBH) periods that result in significant fluctuations in processing speed and delay time. Some of these variations have patterns that are predictable, but many are not. Therefore, real-time delay factors that affect user experience, such as mobile network traffic conditions and AR cloud server status, need to be taken into account when offloading decisions are made. This is why adaptable computation offloading control is necessary and can be very effective. Previous works related to computational offloading focus on either maximizing energy savings or optimizing mobile application responsiveness. In order to be truly useful, the focus should be on maximizing user experience, balancing energy and time savings accordingly under the given conditions and circumstances.

In this paper, mobile computation offloading of MVS AR applications is considered, taking into account varying network traffic and server conditions. In the following sections, tradeoffs between computation time, efficiency, and mobile device energy savings are analyzed. The goal is to determine potential and optimal mobile offloading points under given conditions and priorities that satisfy user quality of experience (QoE) and provide device energy savings.

 

2. Mobile Visual Search

In this section, the MVS AR application and its analysis method are explained. The basic steps involved in MVS are shown in Fig. 1. MVS applications use the smart device’s built-in camera in order to acquire a snapshot picture or motion video image of the scenery or object. Images are not compared pixel-by-pixel for object recognition. Instead, distinct characteristics called “features” are extracted from the snapshot. Pictures may be taken from different angles, distances, or lighting conditions. Therefore, the features extracted should be robust against changes in scale (i.e., different sizes), rotation, illumination, or viewpoint in order to be useful for visual search. Extracting features also makes the data to be processed smaller and more manageable. The extracted features are then compared to other sets of features previously stored in a database. Based on the number of feature matches in common, a set of candidate images is selected from the database. Geometric verification is further performed on the selected images to verify that the matching features between the two images being compared are consistent with changes in viewpoint. If two images are determined to be the same, additional information associated with the feature is retrieved and provided to the user. For example, when a snapshot of a product is taken, the product is identified by the MVS application by finding matching product features from the database. Once identified, information associated with the product such as price, manufacturer, contents, etc. can be retrieved and provided to the user. The retrieved information may be in any format the application chooses. The information may be in text format and presented to the user by overlaying it on top of the original image. If a scenery image is taken at a tourist site, video clips or voice guides may be provided. The possibilities of creating diverse applications using AR on smart devices are virtually endless.
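
To make the pipeline concrete, the following is a minimal client/server sketch of these steps using OpenCV, with SIFT as a stand-in detector (SURF requires the separate opencv-contrib build). The function names, the ratio-test threshold, and the assumption that the database image's features are already available are illustrative assumptions, not the paper's implementation.

```python
import cv2
import numpy as np

def extract_features(image_path):
    """Client side: acquire the image and extract local features."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    detector = cv2.SIFT_create()
    keypoints, descriptors = detector.detectAndCompute(img, None)
    return keypoints, descriptors

def match_and_verify(query_kp, query_desc, db_kp, db_desc, min_matches=10):
    """Server side: feature matching followed by geometric verification."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(query_desc, db_desc, k=2)
    # Lowe's ratio test keeps only distinctive matches.
    good = [m[0] for m in knn if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    if len(good) < min_matches:
        return False
    # Geometric verification: matches must be consistent with a viewpoint
    # change, checked here with a RANSAC-estimated homography.
    src = np.float32([query_kp[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([db_kp[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H is not None and int(mask.sum()) >= min_matches
```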

Although it is possible to process all the MVS steps on the smart device, due to excessive energy consumption, it is preferable to partition the tasks between the mobile client and AR cloud server as seen in Fig. 1. Image acquisition and feature extraction take place on the mobile smart device since the image needs to be acquired at the user’s location. Extracting the features of the image and transmitting them over the network also reduces the payload size compared to transmitting the original image captured. Feature matching and verification against the database takes place on the cloud server since most visual search databases are too memory intensive to be supported on the mobile smart device.

The key MVS process performed on the smart device’s platform is feature extraction. As previously mentioned, the features extracted for object recognition need to be robust enough to match images of different scales, rotations, and viewpoints. Many different algorithms for feature detection and description have been developed over the years, the best known being Scale Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF). SIFT uses difference of Gaussian and the Gaussian pyramid to find features [4]. SURF makes use of Hessian blobs and uses box filters instead of Gaussian kernels to simplify and speed up computation [5].

The theoretical complexity of SIFT and SURF is O(mn + k), where m and n represent the width and height of the image (both in units of pixels), respectively, and k represents the number of key points or interest points [6]. Interest points are the distinctive features of the image. This complexity implies that the computation increases linearly with the dimensions or size of the image to be processed. As smart devices evolve, higher-resolution cameras and displays will be at the user’s disposal, and ultra-high-resolution images ranging from 8 to 20 megapixels will have to be processed. The introduction of ultra-high-resolution images enables accurate AR feature identification, but at the same time it creates a huge burden in terms of processing data for MVS AR applications. Therefore, the computation for feature extraction on the smart device will significantly increase. Sending vast amounts of data over a wireless link for visual image search may congest the network. Processing the offloaded data on the server will also take more time and resources. In the process, the energy consumption of the mobile device will also increase due to possible retransmissions and timeouts. Therefore, it is important to determine the optimal offloading point for feature extraction that can balance the load between the cloud server and the smart device.
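
As a quick illustration of this linear scaling (a back-of-the-envelope calculation, not a result from the paper), moving from a 640x480 capture to a nominal 8-megapixel capture multiplies the pixel-dependent work, and the W x H dependent data sizes of Fig. 2, by roughly a factor of 26:

```python
# O(mn + k) scaling: per-pixel work grows linearly with image size.
base_pixels = 640 * 480          # ~0.31 megapixels
ultra_pixels = 8 * 1000 * 1000   # nominal 8-megapixel capture
print(f"growth factor: {ultra_pixels / base_pixels:.1f}x")  # ~26.0x
```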

In this paper, the optimal offloading point within the SURF feature extraction process is investigated to achieve further performance enhancement and energy savings on the mobile device when performing MVS. SIFT provides the best matching results, but SURF produces good matching performance at reduced computational complexity and faster speed [7]. Therefore, for the purposes of this paper, SURF is used for the evaluation of feature extraction offloading.

Fig. 2 shows SURF feature extraction subdivided into six steps. The step-1 Grayscale Image Generation (GIG) process changes the original JPEG image captured by the device into a gray-valued image in order to make it robust to color modifications. The step-2 Integral Image Generation (IIG) process builds an integral image from the grayscale image, which allows fast calculation of summations over image sub-regions. The step-3 Response Map Generation (RMG) process constructs the scale-space in order to detect interest points using the determinant of the image’s Hessian matrix. Using the scale response maps generated in the previous stage, the maxima and minima (which are used as the actual interest points) are detected during the Interest Point Detection (IPD) in step-4. Each detected interest point is then assigned a reproducible orientation in the Orientation Assignment (OA) process in step-5, which provides invariance to image rotation. The step-6 Descriptor Extraction (DE) is the process where an interest point is uniquely identified so that it can be distinguished from other interest points. In terms of computation, GIG and IIG are trivial, while IPD is the most complex among the steps. The processes after step-6 have to be executed at the AR cloud server, and the final AR information is returned to the smart device for display.

Fig. 2. Process steps involved in feature extraction based on SURF

The input and output data file sizes (in units of bits) at each step are also shown in Fig. 2. H, W, and I represent the image height, image width, and number of interest points, respectively. The figure shows that the output data size at each step is dependent on the size of the image queried; the output data size increases with increasing image resolution. The output data size is also dependent on the type of image, since the number of interest points detected varies depending on the image being processed. If an image has many interest points, the output data size increases. GIG, IIG, RMG, and IPD process the image on a pixel-by-pixel basis, so these stages are dependent on the size of the image (i.e., H and W), while OA and DE are also dependent on the number of interest points detected (i.e., I) in addition to H and W.

GIG outputs a 32-bit grayscale image of the query image, and since each pixel is represented as 32 bits, the output data size becomes W x H x 32 bits. IIG generates a 32-bit integral image from the grayscale image generated by GIG. Since each pixel of the integral image is also represented as 32 bits, the size of the resulting integral image is W x H x 32 bits. The integral image is used by RMG to create scale spaces, where the scale space is divided into octaves which represent a series of filter response maps. Octaves encompass a scaling factor of 2, so the image sampling is halved at each subsequent octave (i.e., H and W are divided by 2 from one octave to the next). The number of octaves may vary based on the settings; in this particular example, 4 octaves are used. For the first octave, the scale space is constructed for 4 filter sizes (9x9, 15x15, 21x21, and 27x27), which is represented as 4(W/2 x H/2) in the output equation for RMG in Fig. 2. For the second octave, a scale space is constructed for 2 filter sizes (39x39 and 51x51), which is represented as 2(W/4 x H/4). For the third octave, the scale space is constructed for filter sizes of 75x75 and 99x99, which is represented as 2(W/8 x H/8), and for the fourth octave, the scale space is constructed for filter sizes of 147x147 and 195x195, which is represented as 2(W/16 x H/16). The constructed scale space is used to detect interest points in the IPD stage, where each detected interest point is represented as a vector in rectangular coordinates of x and y. The output of the IPD stage includes the x and y coordinates of each of the I detected interest points along with the IIG file, which are sent to the OA stage. The OA process computes the orientation information, which is saved as a vector along with the x and y coordinates. The output of the OA stage contains the orientation information of each of the I detected interest points along with the IIG file, which are sent to the DE stage. The DE process generates a descriptor vector of length 64 for each interest point, which results in a size of 64 x 32 bits x I. Each element of the descriptor vector represents an intensity pattern that preserves the spatial information of the interest point.
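
The per-stage output sizes can be summarized in a short sketch. The W x H x 32 terms for GIG/IIG, the four-octave layout for RMG, and the 64 x 32 x I descriptor size for DE follow the description above; the 32-bit widths assumed here for response-map entries, interest-point coordinates, and orientations are illustrative assumptions.

```python
def surf_stage_output_bits(W, H, I):
    """Approximate output data size (bits) after each SURF sub-stage."""
    gig = W * H * 32                          # 32-bit grayscale image
    iig = W * H * 32                          # 32-bit integral image
    rmg = 32 * (4 * (W // 2) * (H // 2)       # octave 1: 4 response maps
                + 2 * (W // 4) * (H // 4)     # octave 2: 2 response maps
                + 2 * (W // 8) * (H // 8)     # octave 3: 2 response maps
                + 2 * (W // 16) * (H // 16))  # octave 4: 2 response maps
    ipd = iig + I * 2 * 32                    # integral image + (x, y) per interest point
    oa = iig + I * 3 * 32                     # integral image + (x, y, orientation)
    de = I * 64 * 32                          # 64-element descriptor per interest point
    return {"GIG": gig, "IIG": iig, "RMG": rmg, "IPD": ipd, "OA": oa, "DE": de}

# Example: a 1024x768 query image with 500 detected interest points.
print(surf_stage_output_bits(1024, 768, 500))
```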

 

3. Offloading Point Decision

The size of the data to be transmitted varies depending on the offloading point. The objective is to select an offloading point that can save energy and satisfy the user QoE requirements (i.e., time-bounded performance requirements). In this section, the basic offloading scenario for the feature extraction process is defined. Fig. 3 shows the possible offloading switching points between the smart device and the AR cloud server. The smart device offloads by transmitting the output data at step-S (ranging from step-1 to step-6 in SURF) to the AR cloud server. If S < 6, the cloud server will carry on the feature extraction process on behalf of the smart device beginning at step-S+1. If offloading takes place, the server will execute the remaining feature extraction process until completion, all the way to the final step SF, corresponding to step-6 DE (i.e., SF = 6 in SURF). The amount of data processing at step-n is represented as αn (in units of bits). Therefore, the total feature extraction data processing by the smart device can be represented as the summation of α1 to αS, and the total feature extraction data processing by the cloud server can be represented as the summation of αS+1 to αSF, as presented in Fig. 3.

Fig. 3. Computation offloading example based on several SURF feature extraction process steps between the smart device and the cloud server

Fig. 4 shows the overall time involved in mobile cloud offloading, TAR (in units of seconds), for the MVS AR application. TM represents the time required by the smart device, i.e., the feature extraction processing time spent by the smart device before offloading at step-S. The smart device needs to transmit the offloading data and receive the MVS results over the wireless link. The time overheads incurred for transmission (uplink) and reception (downlink) are TUL and TDL, respectively. Additional overhead is incurred by the data traversing various routers and switches within the wired network, which is represented by TRS. TC represents the time spent by the cloud server in performing the offloaded feature extraction from step-S+1. TDB represents the time spent by the cloud server to search and identify matching features in the AR database.

Fig. 4. Total time required for MVS AR application when mobile cloud offloading is used

The time can be further detailed as the amount of data divided by the data processing speed. Table 1 lists the detailed computation offloading parameters involved. The feature extraction processing time at the smart device can be obtained from TM(S) = (α1 + ⋯ + αS) / (vm·dm), which is based on the total data processed by the smart device (i.e., the summation of α1 to αS) divided by the smart device’s processing speed vm weighted by its delay influence factor dm. The delay influence factor dm is normalized as 0 ≤ dm ≤ 1, in which dm = 1 results in no delay and dm = 0 results in infinite delay. Other delay influence factors that need to be considered in this analysis are dc for the cloud server, dUL for the uplink, and dDL for the downlink, all of which are defined in the same way as dm. The feature extraction processing time at the server can be obtained from TC(S) = (αS+1 + ⋯ + αSF) / (vc·dc), which is based on the total data processed by the server (i.e., the summation of αS+1 to αSF) divided by the cloud server’s processing speed vc weighted by dc. TDB(I,H,W) represents the time required by the AR database search. The time consumed over the wireless network is TUL(S) = F(S) / (vUL·dUL) for the uplink and TDL = FF / (vDL·dDL) for the downlink, where F(S) and FF represent the amounts of data sent over the uplink and downlink, respectively. Commonly, FF < F(S), since FF only consists of the final results, such as the AR information of the extracted features and the position information on where to place this information on the image.
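
A small sketch of this timing model, under the expressions reconstructed above, is given below. Here alpha is a per-step list of processing amounts (bits), F is a per-step list of output sizes (bits, so F[S-1] is the output of step S), and the remaining arguments mirror the Table 1 parameters; all names and values are placeholders rather than measurements.

```python
def total_ar_time(S, alpha, F, F_F, v_m, d_m, v_c, d_c,
                  v_UL, d_UL, v_DL, d_DL, T_RS, T_DB):
    """Total AR response time T_AR when offloading after step S."""
    T_M = sum(alpha[:S]) / (v_m * d_m)    # local feature extraction, steps 1..S
    T_UL = F[S - 1] / (v_UL * d_UL)       # uplink transfer of the step-S output
    T_C = sum(alpha[S:]) / (v_c * d_c)    # remote feature extraction, steps S+1..S_F
    T_DL = F_F / (v_DL * d_DL)            # downlink transfer of the final AR result
    return T_M + T_UL + T_RS + T_C + T_DB + T_DL
```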

Table 1. Computation offloading parameters

The total time required for the AR application, TAR, is upper bounded by the required QoE time TQoE. Since TAR must be less than or equal to TQoE,

TAR = TM(S) + TUL(S) + TRS + TC(S) + TDB + TDL ≤ TQoE     (1)

becomes the constraint of the energy-minimizing adaptive offloading point control process.

The energy consumed by the smart device, ESD (based on offloading feature extraction at step S), involves the energy for processing up to step S, the energy for transmitting the output file of step S, and the energy to receive the results from the cloud server. The energy for processing up to step S can be obtained by multiplying the smart device’s per-bit energy consumption parameter ε by the amount of processed data, α1 + ⋯ + αS. The energy for transmission of the output and reception of the results can be obtained by multiplying the power consumption parameters PUL for the uplink and PDL for the downlink by the respective time durations TUL(S) and TDL. Therefore, ESD can be represented as in (2).

ESD(S) = ε·(α1 + ⋯ + αS) + PUL·TUL(S) + PDL·TDL     (2)

Transmission requires more power than reception (i.e., PDL < PUL), and the intermediate data transmitted for feature extraction is much larger than the result data returned by the cloud server (i.e., FF < F(S)). Considering the influence of both of these inequalities, it is safe to assume that PDL·TDL ≪ PUL·TUL(S). For the analysis in this paper, the term PDL·TDL will be neglected for simplification.

 

4. Experiments & Performance Analysis

In this section, a performance analysis of the AR experiments conducted on smartphones is presented. If only energy consumption is considered, the processing time may increase significantly, affecting the performance of the AR process and leading to an unbearable time delay for the user. Therefore, an execution time limit needs to be imposed when attempting to minimize the energy consumption of the smart device. Thus, the time requirement of (1) and the energy consumption profile of (2) are used together in determining the offloading point that consumes the least amount of energy for the smart device while satisfying the QoE requirements. For this analysis, (1) is first rearranged in terms of TUL(S), and the resulting inequality is inserted into (2) to obtain the energy upper bound EBound(S) of ESD(S), presented in (3).

ESD(S) ≤ ε·(α1 + ⋯ + αS) + PUL·[TQoE − TM(S) − TRS − TC(S) − TDB − TDL] = EBound(S)     (3)

Based on constraint (3), the value of S that results in the minimum ESD(S) value can be obtained.

For each MVS iteration, the adaptive computation offloading process shown in Fig. 5 is performed. First, all the relevant parameters are gathered, computed, and updated as summarized in Table 1. Then, for every possible offloading switching point S, the corresponding ESD and EBound are computed. The switching point S that satisfies the constraint ESD ≤ EBound and gives the maximum energy savings (i.e., minimum ESD) is selected as the offloading point. If no S satisfies the constraint (i.e., ESD > EBound for all S), the S with minimum ESD is selected. Local computation of feature extraction is done up to step S, and the remaining computation from step S+1 is offloaded to the cloud server.
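
A minimal sketch of this decision loop, assuming the reconstructed forms of (2) and (3) and neglecting the PDL·TDL term as in Section 3, is given below; all inputs are assumed to have been gathered beforehand per Table 1, and the list conventions follow the earlier timing sketch.

```python
def select_offload_point(alpha, F, eps, P_UL, v_m, d_m, v_c, d_c,
                         v_UL, d_UL, T_RS, T_DB, T_DL, T_QoE):
    """Return the step S after which offloading minimizes smart-device energy."""
    S_F = len(alpha)
    best_S, best_E = None, float("inf")
    fallback_S, fallback_E = None, float("inf")
    for S in range(1, S_F + 1):
        T_M = sum(alpha[:S]) / (v_m * d_m)
        T_C = sum(alpha[S:]) / (v_c * d_c)
        T_UL = F[S - 1] / (v_UL * d_UL)
        E_SD = eps * sum(alpha[:S]) + P_UL * T_UL                      # (2), P_DL*T_DL neglected
        E_bound = (eps * sum(alpha[:S])
                   + P_UL * (T_QoE - T_M - T_RS - T_C - T_DB - T_DL))  # (3)
        if E_SD <= E_bound and E_SD < best_E:
            best_S, best_E = S, E_SD            # feasible point with lowest energy so far
        if E_SD < fallback_E:
            fallback_S, fallback_E = S, E_SD    # lowest energy regardless of constraint
    # If no S satisfies the QoE constraint, fall back to the minimum-energy S.
    return best_S if best_S is not None else fallback_S
```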

Fig. 5. Flow chart of adaptive computation offloading process

Experiments were conducted based on actual measurements using a Nexus One (HTC-PB99400) smartphone, where a desktop PC server was used to emulate the AR cloud server and database, as shown in Fig. 6. The Nexus One smart device runs Android 2.3 and has a 1 GHz CPU, a 480x800 display resolution, and a 5-megapixel rear camera. The desktop server uses a Windows operating system with an Intel Core2 Quad 2.50 GHz CPU and 4 GB of RAM. Images with resolutions of 640x480, 1024x768, and 1280x960 were tested in the AR process to measure the energy consumption and processing time. The measured values were divided by the number of pixels and interest points, and their average and standard deviation values were used in the performance analysis. The average energy per bit consumed by the feature extraction process on the smartphone was measured to be ε = 0.0011 J/bit. Statistical analysis was also performed on the network traffic data sampled for the delay influence factor computation. The Kolmogorov-Smirnov (K-S) test was used to verify the probability distribution function (PDF) of the measured data. The K-S test can be used to compare a sample with a reference probability distribution. For the empirical distribution of the measured data and the cumulative distribution function (CDF) of each candidate distribution, values of the distance D and the significance level α are calculated, where n is the number of measurements and ε is the maximum difference between the empirical distribution and the CDF of the candidate distribution. The distribution with the smallest D and the largest α is considered the proper distribution of the measured data. The test results show that the measured data comes from a normal distribution.
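
The normality check can be reproduced with a standard one-sample K-S test, for example with SciPy as sketched below; the sample values are synthetic placeholders standing in for the measured network-delay data.

```python
import numpy as np
from scipy import stats

# Placeholder measurements (e.g., sampled delays in ms) standing in for the real data.
samples = np.random.normal(loc=50.0, scale=5.0, size=200)
mu, sigma = samples.mean(), samples.std(ddof=1)

# One-sample K-S test against a normal distribution fitted to the sample.
D, p_value = stats.kstest(samples, "norm", args=(mu, sigma))
print(f"K-S distance D = {D:.4f}, p-value = {p_value:.4f}")
# A large p-value means the normal-distribution hypothesis is not rejected.
```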

Fig. 6. Experiment setup showing Nexus One smartphone, power meter connections, and PC server as the emulated AR cloud server and database

The experiment measurement results were compared to the resulting values of (1), (2), and (3), which confirmed an accurate match. Fig. 7-(a) compares ESD(S) and EBound(S) based on dc = 0.01 and dUL = 1. In Fig. 7-(a), among the points that satisfy (3) (i.e., ESD ≤ EBound), the minimum energy consuming step-S can be found. By extending this method to various parameter combinations, a comprehensive view of the experimental results is presented in Fig. 7-(b), which shows the optimal offloading points for a variety of dc and dUL conditions. The graph shows that even under poor network or server conditions there are varying points at which it is more effective to offload rather than execute the entire feature extraction process on the smart device. For instance, given a server load of dc = 0.1 and maximum upload throughput of dUL = 1, offloading after the GIG step results in minimum energy consumption for the smartphone. However, under heavy server load conditions of dc = 0.01 and high uplink network traffic congestion conditions of dUL = 0.1, offloading after the DE step results in minimum energy consumption for the smartphone while satisfying the QoE requirement of (3). In conclusion, the offloading point for feature extraction that results in minimum energy consumption for the smart device can be easily found and is highly dependent on the conditions of the smart device, server, and network.

Fig. 7. Experiment results: (a) energy consumed when offloading; (b) minimum energy offloading points that satisfy (3) for SURF feature extraction process steps depending on uplink delay and server congestion

 

5. Conclusion

Smart devices are optimal platforms for AR applications. In the future, performance enhancements in smart devices and their cameras will result in more accurate and powerful AR applications, thereby contributing to the usefulness and popularity of AR services. AR applications on mobile devices that use database searches for object recognition can benefit from computational offloading to the AR cloud server, which may lead to an increase in the battery life of the smart device and also reduce the AR execution time. In order to benefit from offloading, it is crucial to determine the appropriate offloading point by taking into account the varying network and server conditions.

In this paper, offloading of the feature extraction process based on SURF for a mobile visual search AR application has been analyzed. The energy and time constraints have been considered to determine the optimal offloading point. Results show that various candidate offloading points may exist, and through proper selection, a reduction in the overall energy consumption of the smart device can be achieved. Partial execution of the process on the smart device can also decrease the load on the cloud server, thereby avoiding overloading of the cloud's processing capability and memory space, which is especially important during BDBH periods.

References

  1. T. Olsson and M. Salo, "Online User Survey on Current Mobile Augmented Reality Applications," in Proc. of IEEE ISMAR 2011, pp. 75-84, October 26-29, 2011.
  2. K. Kumar and Y. Lu, "Cloud Computing for Mobile Users: Can Offloading Computation Save Energy?," Computer, vol. 43, no. 4, pp. 51-56, 2010. https://doi.org/10.1109/MC.2010.98
  3. B. Girod, V. Chandrasekhar, R. Grzeszczuk, and Y. Reznik, "Mobile Visual Search: Architectures, Technologies, and the Emerging MPEG Standard," IEEE Multimedia, vol. 18, no. 3, pp. 86-94, 2011. https://doi.org/10.1109/MMUL.2011.48
  4. D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int. J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004. https://doi.org/10.1023/B:VISI.0000029664.99615.94
  5. H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, 2008. https://doi.org/10.1016/j.cviu.2007.09.014
  6. P. Drews, R. de Bem, and A. de Melo, "Analyzing and Exploring Feature Detectors in Images," in Proc. of IEEE INDIN 2011, pp. 305-310, 2011.
  7. L. Juan and O. Gwun, "A Comparison of SIFT, PCA-SIFT and SURF," Int. J. of Image Processing, vol. 3, no. 4, pp. 143-152, 2009.
