1. Introduction
Internet of Things (IoT) applications, such as smart monitoring, smart transportation and home automation, depend on the exact location of monitored objects. Therefore, location-based services (LBS) have become widely used. Localization technology is important to the development of LBS, because higher localization accuracy allows the provision of more relevant information. Improving localization accuracy is therefore an important issue.
LBS can be roughly divided into outdoor localization and indoor localization. Indoor and outdoor environments differ greatly in complexity and in the need to consider weather conditions. Many studies of indoor localization have been published and a certain degree of accuracy has been achieved. Outdoor localization is based on a well-developed and widely used technology, the Global Positioning System (GPS). However, the accuracy of GPS is easily affected by the weather and obstacles. The most serious shortcoming of GPS is that a GPS module consumes a lot of power, so it cannot be used to track an object for a long time. Additionally, GPS does not have a communication function; for flexible use in IoT, a communication module must be used alongside it. The Received Signal Strength Indicator (RSSI) has also been used for outdoor localization. This method combines communication and localization functions. For example, some systems use Wi-Fi or Bluetooth for localization, which consumes less power and provides communication functions. However, such communication systems have short transmission distances and their signal strength is reduced by obstacles. Therefore, more gateways must be deployed to achieve the desired localization accuracy.
Since low-power wide-area network (LPWAN) technology uses low frequency bands (operating below the GHz band), its signals penetrate obstacles well and are robust against multipath effects. Some studies have compared the stability of LoRa, an LPWAN technology, with that of Wi-Fi and BLE signals in an indoor environment [1]. Independently of the distance between the signal transmitters and the receiver in the same field, the variation of the LoRa signal strength is less than the variations of the signal strengths associated with the other two technologies, indicating that the LoRa signal is very stable [2].
Outdoor localization techniques are quite diverse. In outdoor environments with high complexity, improving the training structure of the overall algorithm is critical to improving localization accuracy. Therefore, in this study, an AI@LBS system that combines the fingerprint mechanism with an unsupervised clustering algorithm is developed for outdoor localization. The contributions of this work are as follows.
1. This study proposes an intelligent LoRa-based positioning system to provide accurate location data for outdoor environments.
2. The proposed architecture comprises four layers, which are data collecting, data preprocessing, data training and data testing layers.
3. The data noise is filtered using the DBSCAN algorithm, increasing the positioning accuracy from 95.37% to 97.38%.
4. The problem of data imbalance is addressed using the SMOTE technique, increasing the positioning accuracy from 97.38% to 99.17%.
2. Related Work
This section introduces the related work including the positioning methods, the localization techniques and the machine learning mechanisms used in the work.
2.1 Positioning Methods
The positioning methods include Angle of Arrival (AOA) [3], Time of Arrival (TOA) [4] and Time Difference of Arrival (TDOA) [5].
2.2 Localization Techniques
Proximity localization involves finding the gateway closest to the target and placing the target at that gateway [6]. The localization accuracy of this method depends on the density of the gateways. Geometric localization uses fewer gateways. This method obtains connection information between the target and multiple gateways and then performs more complex geometric operations, such as trilateration and triangulation, to estimate locations [7]. The last category involves pattern recognition and is often called fingerprint localization [8]. Fingerprinting uses characteristic patterns extracted from signal data that are captured at particular locations called reference points.
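As a hedged illustration of fingerprint localization, the following minimal sketch matches a measured RSSI vector to the closest stored reference-point fingerprint. The reference-point names, the RSSI values and the Euclidean nearest-neighbor rule are assumptions made for illustration; they are not taken from the cited works.

```python
import math

# Hypothetical fingerprint database:
# reference point -> RSSI (dBm) observed by three gateways.
FINGERPRINTS = {
    "RP_A": (-70.0, -95.0, -110.0),
    "RP_B": (-98.0, -72.0, -105.0),
    "RP_C": (-112.0, -100.0, -68.0),
}

def locate(measured):
    """Return the reference point whose fingerprint is closest to `measured`."""
    return min(FINGERPRINTS, key=lambda rp: math.dist(FINGERPRINTS[rp], measured))

# A new measurement that resembles the fingerprint stored for RP_A.
estimate = locate((-73.0, -93.0, -108.0))
```

A nearest-neighbor rule is only one possible matcher; any classifier trained on the reference-point data could take its place.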
2.3 LPWAN Localization [9,10]
In recent years, LPWAN has been developed and deployed around the world to cater for future IoT demand. LPWAN technologies can be divided into two categories according to the frequency band used: NB-IoT is a frequently discussed LPWAN technology in licensed bands, whereas Sigfox and LoRa are the most notable LPWAN technologies in unlicensed bands.
2.3.1 NB-IoT-based Localization
Several NB-IoT-related studies in the field of localization have reported results. Lin et al. [11] identify the design challenges of localization support in LTE-M and NB-IoT, provide an overview of the Observed Time Difference of Arrival (OTDOA) architecture, which is a downlink-based localization method, and summarize the designs of OTDOA localization reference signals. Hu et al. [12] apply TDOA in an NB-IoT system. To calculate OTDOA for NB-IoT localization, they propose a successive interference cancellation algorithm. The average error in the simulated environment is reduced from 80 meters to 20 meters, which shows that reducing interference greatly reduces the localization error. Radnosrati et al. [13] use OTDOA measurements to provide device tracking strategies in the NB-IoT system, which consumes less power.
2.3.2 Sigfox-based Localization
Previous research on Sigfox localization has mainly focused on signal strength. Sallouha et al. [14] use Sigfox technology to evaluate the localization performance of fingerprints in an actual outdoor environment. Their method exploits the presence of some GPS nodes by using their fingerprints in a real Sigfox deployment. This technique is suitable for scenarios in which the classes are well separated. When the classes are far apart, the location classification accuracy is as high as 100%, meaning that a larger gap between classes yields better separation; however, such a setting does not reflect practical applications. Janssen et al. [6] designed a localization method that combines open source Wi-Fi BSSID databases with Sigfox. The researchers carried mobile devices around Antwerp, Belgium, to obtain the latest Wi-Fi BSSIDs. The same research group further explored the localization accuracy of Sigfox using a fingerprint-based K-NN algorithm [15]. That work uses the data set provided by another study [16], which covers a larger area. Compared with the reference paper [16], the localization error of Janssen's study [6] is reduced from 689 meters to 340 meters, which is equivalent to doubling the localization accuracy. Machine learning algorithms combined with fingerprints can thus achieve good localization performance [17].
2.3.3 LoRa-based Localization
LoRa supports geolocation, offers high receiver sensitivity (-136 dBm), and allows long distances between the transmitter and the receiver [18]. LoRa can distinguish messages from different signal sources, so it is very useful for tracking in urban settings with reflections. Various studies of LoRa-based geolocation have been performed in recent years, and accuracies of up to 4 m have been reported. LoRa is also openly documented: information about its implementation, layers, packet structure, communication protocols, and other related features is readily available. Large-scale LoRa data sets that have been collected in rural and urban areas are available [16]. LoRa networks can use the TDOA method, ensuring accuracy over hundreds to thousands of kilometers [19].
3. Proposed AI@LBS System
The proposed AI@LBS system architecture is shown in Fig. 1. The architecture comprises four layers: the data collecting, data preprocessing, data training and data testing layers. In the data collecting layer, data that are transmitted between the LoRa node and the LoRa gateway in the same field are collected. The collected data comprise training data and testing data. The data preprocessing layer is responsible for processing the training dataset, including feature evaluation, noise filtering, and data imbalance learning. The main purpose of this layer is to reduce data clutter and make the data set as effective as possible during the training stage. In the data training layer, the machine learning algorithm is used to build the positioning system, AI@LBS. To investigate the feasibility of the proposed system, validation and field testing are performed in the data testing layer. Fig. 2 presents the system flow, which is described below with reference to a campus positioning application. Data collection and feature extraction from different gateways are introduced in Section 3.1. Feature evaluation based on ANOVA, which selects the features with higher F-values, is introduced in Section 3.2.1. Noise filtering based on DBSCAN is introduced in Section 3.2.2. Imbalance learning based on SMOTE is introduced in Section 3.2.3. The data training layer is introduced in Section 3.3, and the data testing layer is introduced in Section 3.4.
Fig. 1. Proposed AI@LBS System Architecture
Fig. 2. Proposed AI@LBS System Flow
3.1 Data Collecting Layer
The goal of the campus localization application in this work is that when an individual who is holding a mobile device walks to any location on campus, that device is located in the closest building. Collected data consist of information about landmarks at the school, such as buildings and playgrounds. GPS is used with the Google timeline function to provide the latitude and longitude of the target. Finally, the data that are collected on campus are transmitted over the LoRa network to the cloud for computation.
The original, collected data that are stored in the cloud include information such as time, bandwidth, frequency, data size, device EUI, gateway EUI, humidity, temperature, RSSI, uplink count, SNR, and more. Sample data are shown in Fig. 3. The received data include the RSSI and SNR that are measured at the LoRa gateway, and environmental factors such as temperature, humidity and carbon dioxide, as measured by sensors that are attached to the target device. These factors are referred to as features in this study.
Fig. 3. The Collected Data
Since many obstacles are present on the campus, the fingerprint mechanism described in Section 2.2 is used for localization. This mechanism first measures the feature-related data at the predefined reference points on campus. In areas on campus where the signal is blocked, the RSSI and SNR values are set to -200 dBm and +20, respectively. For example, in a three-gateway scenario, if the data at a reference point are received only by gateway A and gateway B, then gateway C receives no signal from the target, and the signal strength of gateway C is set to -200 dBm.
In this study, data from the LoRa-based target are collected every 5 s. Each datum is received by the different gateways and uploaded to the cloud server for subsequent processing. Then, the data from the various gateways are merged into a single dataset. For the on-campus application in this study, three LoRa gateways ideally receive data from the target simultaneously. However, data loss in transmission cannot be prevented, especially in a very noisy outdoor environment. Therefore, the data that are received from the three gateways are merged according to their timestamps. Fig. 4 presents sample data associated with a reference point.
Fig. 4. Data Merging Process
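The merging step can be sketched as follows. This is a minimal illustration, not the system's actual code: the data layout, the function name `merge_by_timestamp` and the sample readings are hypothetical, while the fill values (-200 dBm for RSSI, +20 for SNR) follow the text above.

```python
# Fill values that the text assigns when a gateway receives no signal.
MISSING_RSSI = -200.0
MISSING_SNR = 20.0

def merge_by_timestamp(per_gateway):
    """Merge {gateway: {timestamp: (rssi, snr)}} into one row per timestamp."""
    gateways = sorted(per_gateway)
    timestamps = sorted({t for readings in per_gateway.values() for t in readings})
    rows = []
    for t in timestamps:
        row = {"time": t}
        for gw in gateways:
            # A gateway that lost the packet gets the fill values.
            rssi, snr = per_gateway[gw].get(t, (MISSING_RSSI, MISSING_SNR))
            row[f"rssi_{gw}"] = rssi
            row[f"snr_{gw}"] = snr
        rows.append(row)
    return rows

# Hypothetical uplinks sent every 5 s; gateway C misses the packet at t = 5.
readings = {
    "A": {0: (-81.0, 7.5), 5: (-83.0, 6.0)},
    "B": {0: (-97.0, 2.0), 5: (-95.0, 3.5)},
    "C": {0: (-110.0, -4.0)},
}
merged = merge_by_timestamp(readings)
```

Each merged row then holds one RSSI and one SNR column per gateway, which is the shape a fingerprint classifier expects.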
3.2 Data Preprocessing Layer
The data preprocessing layer mostly pre-processes data to filter out features with low correlation or high noise, and then maps the dataset using a pre-defined model. Since data imbalance commonly occurs during the training process, the SMOTE algorithm is used to enhance learning efficiency. In this layer, three processes are performed: feature evaluation, noise filtering, and data imbalance learning.
3.2.1 Features Evaluation
The purpose of this process is to determine whether each feature retains its influence on, and relevance to, the localization problem. The effect of the features on subsequent model learning is also considered, along with the kind of model that should be used for learning. The number of features also influences the effectiveness of learning; for example, using more features during training increases the computation time.
The collected features include SNR and RSSI data from the three gateways, as well as environmental factors, such as humidity, temperature and CO2. The ANOVA results are shown in Fig. 5. All of the collected feature data have some validity, but in order to reduce the complexity of learning, features with F-values of less than 100 are eliminated.
Fig. 5. ANOVA-based Features Evaluation
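A minimal sketch of ANOVA-based feature screening is given below, assuming the standard one-way ANOVA F-statistic computed per feature across classes (reference points). The sample values and feature names are hypothetical; only the threshold of 100 comes from the text.

```python
def anova_f(groups):
    """One-way ANOVA F-value for one feature, with samples grouped by class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation between class means and variation inside each class.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical samples of two features at two reference points (classes).
features = {
    "rssi_gw1": [[-70.0, -71.0, -69.0, -70.5], [-90.0, -91.0, -89.5, -90.5]],
    "humidity": [[60.0, 64.0, 58.0, 62.0], [61.0, 63.0, 59.0, 62.5]],
}

F_THRESHOLD = 100.0  # cut-off stated in the text
f_values = {name: anova_f(groups) for name, groups in features.items()}
selected = [name for name, f in f_values.items() if f >= F_THRESHOLD]
```

In this toy data the RSSI feature separates the two classes strongly and is kept, whereas humidity barely differs between classes and is dropped.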
Recursive feature selection is then used to find the combination of features that provides the highest accuracy. As observed in Fig. 6, as the number of features is reduced, the amount of data to be collected is significantly reduced and the convergence time of the learning is improved. If only three features, such as the RSSIs received from the three gateways, are used, then the accuracy is only 75%. However, when the above eight features are used, the accuracy of the proposed system reaches 99%.
Fig. 6. Recursive Features Selection
3.2.2 Noise Filtering
In many localization studies, noise that is generated by wireless signals must be filtered out. If the noise is not filtered out, then it may cause instability of the constructed AI@LBS system. Therefore, DBSCAN, which is a clustering method, is used to filter the noise in this work.
Clusters found by distance-based clustering algorithms are typically "circular-like"; when faced with clusters of complex shapes, such methods usually fail to achieve satisfactory results. The density-based clustering method, the DBSCAN algorithm, effectively avoids this problem [20]. In addition, DBSCAN does not require the number of clusters to be specified in advance. The algorithm is as follows:
(1) Initialize the parameters: the data set, the radius Eps and the threshold MinPts.
(2) Select an unprocessed point from the data set.
(3) Determine whether the point is a core point.
(4) If it is not, return to (2) and continue with the next point.
(5) If it is, find all points that are density-reachable from it and form a cluster. Return to (2) and continue with the next point.
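The steps above can be sketched in plain Python as follows. This is a minimal textbook-style DBSCAN written for illustration, not the implementation used in the study; the parameter names `eps` and `min_pts` correspond to Eps and MinPts, and the sample points and parameter values are hypothetical.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # All points within radius eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:          # already processed
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:           # not a core point
            labels[i] = NOISE
            continue
        cluster += 1                       # core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:         # border point reached from a core point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point too: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical two-feature samples: two dense groups and one far-away outlier.
points = [(-70, -90), (-71, -91), (-69, -90), (-70, -89),
          (-95, -60), (-96, -61), (-94, -60), (-95, -59),
          (-120, -120)]
labels = dbscan(points, eps=3.0, min_pts=3)
filtered = [p for p, lab in zip(points, labels) if lab != NOISE]
```

Points labeled as noise are simply dropped before training, which is how the clustering result acts as a noise filter.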
In this study, the DBSCAN algorithm is used in the localization application. The left side of Fig. 7 presents the distribution of the original learning data from the NTUST campus in the feature space. For the sake of visualization, the three features with the highest F-values are shown. Green and red colors represent the distributions of data that are obtained from different classes. Both the green and the red data form obvious groups, while some data points are far away from the two groups. The DBSCAN algorithm can be used to identify and filter out these outliers, as shown on the right-hand side of Fig. 7.
Fig. 7. Data Preprocessing with/without DBSCAN Algorithm
3.2.3 Data Imbalance Learning
Since the DBSCAN algorithm filters the data of each class to unequal extents, data imbalance occurs. Without further processing, the built AI@LBS will be biased towards the class with the most learning data in the training phase, reducing the fairness of learning. SMOTE addresses this by synthesizing new samples for the under-represented class, interpolating between existing samples of that class and their nearest neighbors. The SMOTE process yields the new data points that are shown in Fig. 8. The data density of the two classes is significantly improved.
Fig. 8. Data Preprocessing with/without SMOTE Method
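As a hedged illustration of how SMOTE generates data, the sketch below creates synthetic minority-class samples by interpolating between a sample and one of its nearest same-class neighbors. It is a simplified stand-in rather than the implementation used in the study; the function name, the neighbor count `k`, the seed and the sample values are assumptions.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples between minority points and their neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbors of `base` within the minority class, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Hypothetical (RSSI, SNR) samples after noise filtering: six versus three.
majority = [(-70.0, 6.0), (-71.0, 5.5), (-69.5, 6.5),
            (-70.5, 6.2), (-72.0, 5.0), (-69.0, 7.0)]
minority = [(-95.0, -2.0), (-96.0, -3.0), (-94.0, -1.5)]

new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the region already occupied by that class.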
3.3 Data Training Layer
The Random Forest machine learning algorithm is used to train the localization classification model. Random forest is a supervised learning method that combines the Bagging algorithm from ensemble learning with the CART (Classification and Regression Tree) algorithm [21]. Bagging adopts a random sampling mechanism to reduce the impact of non-correlated data, and the CART decision tree improves the sensitivity of the model to the features, so that the combination mitigates the disadvantages of both algorithms. Random forest training is divided into three parts: data sampling, feature sampling, and decision tree generation using the CART algorithm. In this study, the inconsistency of the traditional decision tree is addressed by performing these two random samplings, which also reduces overfitting. Before machine learning is performed, the preprocessed data are typically divided into two sets, one for training and the other for validation. K-fold cross validation is a statistical method that cuts a sample into multiple small subsets, which are used in turn as training data and validation data. The 10-fold cross validation (k=10), which cuts the data into ten folds, is used herein. As shown in Fig. 9, the first fold is used as validation data, and all others are used as training data. After segmentation, a model is trained using the training folds, and the accuracy of the model is evaluated on the validation fold. Then, the same method is used with the second fold as validation data, and so on.
Fig. 9. K-fold-cross Operation
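The two random samplings and the k-fold split can be sketched as follows. This is a minimal illustration under stated assumptions: the square-root rule for the number of sampled features and the once-per-tree feature draw are common simplifications and are not specified in the text, and all function names are hypothetical.

```python
import math
import random

def bootstrap_rows(n_rows, rng):
    """First random sampling (bagging): draw n_rows row indices with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def sample_features(n_features, rng):
    """Second random sampling: pick about sqrt(n_features) distinct feature indices."""
    k = max(1, round(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))

def k_fold_indices(n_rows, k):
    """Yield (train, validation) index lists; each fold is the validation set once."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

rng = random.Random(0)
rows_for_tree = bootstrap_rows(20, rng)       # training rows for one tree
features_for_tree = sample_features(8, rng)   # eight features, as in Section 3.2.1
splits = list(k_fold_indices(20, 10))         # k = 10, as in the text
```

In practice the rows would usually be shuffled before being assigned to folds; the interleaved assignment here only keeps the sketch deterministic.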
3.4 Data Testing Layer
In the data training layer, k-fold cross validation is performed to validate the performance of the learned AI@LBS system. In the data testing layer, the AI@LBS system is tested in the field and its accuracy is analyzed. Fig. 10 presents the procedures of the field test.
Fig. 10. Field Test Procedures
4. Performance Analysis
The localization application is conducted on the campus of National Taiwan University of Science and Technology (NTUST, www.ntust.edu.tw). The NTUST topology is shown in Fig. 11. In this field application, an individual at any location on campus is accurately guided to the closest target building by information that is sent from the mobile node (Target) to the LoRa gateway. The three LoRa gateways are represented by purple circles and are located on the tenth floor of three buildings, and the campus boundaries are selected to cover the widest area. Thus, good localization is achieved using few gateways.
Fig. 11. NTUST Campus Topology (www.ntust.edu.tw)
The LoRa devices (node and gateway) are from KIWI Technology Incorporation (https://kiwi-tec.com/en/). Data are transmitted between the LoRa node and the LoRa gateway and then uploaded to the KIWI cloud for computation. The specifications of the LoRa device configuration are listed in Table 1.
Table 1. LoRa Device (LAS-301, TLG79) Configuration
4.1 Performance Analysis
Eight cases were considered to verify the accuracy and stability of the proposed AI@LBS system. Based on the density of the deployed reference points, the eight cases are divided into two groups, as presented in Table 2. Cases 1-1 to 1-4 involve 14 reference points and 4000 data samples, and Cases 2-1 to 2-4 involve six reference points and 2000 data samples. The amount of data is increased to 4000 samples using the SMOTE algorithm. The number of testing data samples is 800.
Table 2. 8 Cases Study
Fig. 12 presents the accuracies of the proposed AI@LBS system for all cases. Owing to the use of 10-fold cross validation, the standard deviations are compared to elucidate the stability in each case; the detected accuracy falls within a certain error range, as shown in Fig. 13. Fig. 14 presents the average distance errors in the field test.
Fig. 12. Performance Analysis- Accuracy
Fig. 13. Performance Analysis- Standard Deviation
Fig. 14. Performance Analysis- Average Distance Error
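As a minimal sketch of how the accuracy and standard deviation of one case could be summarized from a 10-fold run, the snippet below uses Python's standard library. The per-fold accuracies are hypothetical placeholders, not measurements from this study, and the use of the sample standard deviation is an assumption.

```python
import statistics

# Hypothetical per-fold accuracies (%) from one 10-fold run; not the paper's data.
fold_accuracy = [95.0, 95.5, 95.2, 95.8, 95.1, 95.6, 95.3, 95.4, 95.7, 95.4]

mean_accuracy = statistics.mean(fold_accuracy)
std_accuracy = statistics.stdev(fold_accuracy)  # sample standard deviation (assumed)
```

A smaller standard deviation across folds indicates that the model behaves more consistently regardless of which fold is held out.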
As Figs. 12 to 14 show, whether measured by accuracy, standard deviation or distance error, the results in Case 2-1 are worse than those in Case 1-1, as expected. The DBSCAN algorithm increases the accuracy from 95.37% to 97.37% (Case 2-1 to Case 2-2) and reduces the standard deviation from 0.336% to 0.237%, indicating that noise filtering greatly increases the stability of the data. Similarly, the SMOTE method increases the accuracy from 95.37% to 98.38% (Case 2-1 to Case 2-3). If the original data in Case 2-1 are preprocessed by the SMOTE algorithm (Case 2-3), then this case can replace Case 1-1 and the number of reference points can thus be reduced. Preprocessing the data with the DBSCAN algorithm and then the SMOTE method greatly reduces the standard deviation. The accuracy is simultaneously increased, because preprocessing with the SMOTE method increases the diversity of the effective data.
5. Conclusion
The AI@LBS positioning system is designed for intelligent localization using AI and LoRa techniques, and is suitable for use in a noisy outdoor environment, such as a campus or forest. To mitigate signal instability over long distances and to increase overall localization accuracy, a machine learning architecture with the DBSCAN algorithm and the SMOTE method is proposed for outdoor localization and applied on the NTUST campus. The proposed architecture comprises four layers: the data collecting, data preprocessing, data training and data testing layers. In the data collecting layer, data that are transmitted between the LoRa node and the LoRa gateway are collected. The data preprocessing layer is responsible for processing the training dataset, including feature evaluation, noise filtering, and data imbalance learning. In the data training layer, the machine learning algorithm is used to build the positioning system, AI@LBS. The data testing layer is responsible for investigating the feasibility of the proposed system. The fingerprint mechanism with the Random Forest algorithm is used to realize the AI@LBS system, whose accuracy and average distance error reach 95.37% and 2.72 m, respectively. When the AI@LBS system is improved using the DBSCAN algorithm and the SMOTE method, its accuracy and average distance error reach 99.17% and 0.48 m, respectively.
References
- B. Islam, M.T. Islam and S. Nirjon, "Feasibility of LoRa for Indoor Localization," Technical Reports, University of North Carolina, USA, 2017.
- E. Goldoni, L. Prando, A. Vizziello, P. Savazzi and P. Gamba, "Experimental Data Set Analysis of RSSI-based Indoor and Outdoor Localization in LoRa Networks," Internet Technology Letters, Vol.2, No.1, pp.75-80, 2018.
- A. Yassin, Y. Nasser, M. Awad, A.D. Ahmed, R. Liu, C. Yuen, R. Raulefs and E. Aboutanios, "Recent Advances in Indoor Localization: A Survey on Theoretical Approaches and Applications," IEEE Communications Surveys & Tutorials, Vol.19, No.2, pp.1327-1346, 2017. https://doi.org/10.1109/COMST.2016.2632427
- R. Kaune, "Accuracy Studies for TDOA and TOA Localization," in Proc. of the International Conference on Information Fusion, Singapore, pp.408-415, August 2012.
- S. Uebayashi, M. Shimizu and T. Fujiwara, "A Study of TDOA Positioning Using UWB Reflected Waves," in Proc. of the 78th IEEE Vehicular Technology Conference, USA, pp.1-5, September 2013.
- T. Janssen, M. Weyn and R. Berkvens, "Localization in Low Power Wide Area Networks Using Wi-Fi Fingerprints," Applied Sciences, Vol.7, No.9, pp.1-16, 2017.
- S. He and S. G. Chan, "INTRI: Contour-Based Trilateration for Indoor Fingerprint-Based Localization," IEEE Transactions on Mobile Computing, Vol.16, No.6, pp.1676-1690, 2017. https://doi.org/10.1109/TMC.2016.2604810
- Y. Shu, Y. Huang, J. Zhang, P. Coue, P. Cheng, J. Chen and K. G. Shin, "Gradient-Based Fingerprinting for Indoor Localization and Tracking," IEEE Transactions on Industrial Electronics, Vol.63, No.4, pp.2424-2433, 2016. https://doi.org/10.1109/TIE.2015.2509917
- N. Poursafar, M.E.E. Alahi and S. Mukhopadhyay, "Long-range Wireless Technologies for IoT Applications: A Review," in Proc. of the Eleventh International Conference on Sensing Technology, pp. 1-6, 2017.
- U. Raza, P. Kulkarni and M. Sooriyabandara, "Low Power Wide Area Networks: An Overview," IEEE Communications Surveys & Tutorials, Vol.19, No.2, pp.855-873, 2017. https://doi.org/10.1109/COMST.2017.2652320
- X. Lin, J. Bergman, F. Gunnarsson, O. Liberg, S.M. Razavi, H.S. Razaghi, H. Rydén and Y. Sui, "Positioning for the Internet of Things: A 3GPP Perspective," IEEE Communications Magazine, Vol.55, No.12, pp.179-185, 2017. https://doi.org/10.1109/mcom.2017.1700269
- S. Hu, A. Berg, X. Li and F. Rusek, "Improving the Performance of OTDOA Based Positioning in NB-IoT Systems," in Proc. of the IEEE Global Communications Conference, pp.1-7, 2017.
- K. Radnosrati, G. Hendeby, C. Fritsche, F. Gunnarsson and F. Gustafsson, "Performance of OTDOA Positioning in Narrowband IoT Systems," in Proc. of IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications, pp.1-7, 2017.
- H. Sallouha, A. Chiumento and S. Pollin, "Localization in Long-range Ultra Narrow Band IoT Networks using RSSI," in Proc. of the IEEE International Conference on Communications, pp.1-6, 2017.
- T. Janssen, M. Aernouts, R. Berkvens and M. Weyn, "Outdoor Fingerprinting Localization Using Sigfox," in Proc. of the International Conference on Indoor Positioning and Indoor Navigation, pp.1-6, 2018.
- M. Aernouts, R. Berkvens, K. Van Vlaenderen and M. Weyn, "Sigfox and LoRaWAN Datasets for Fingerprint Localization in Large Urban and Rural Areas," Data, Vol.3, No.2, pp.1-13, 2018.
- G.G.L. Ribeiro, L.F.d. Lima, L. Oliveira, J.J.P.C. Rodrigues, C.N.M. Marins and G.A.B. Marcondes, "An Outdoor Localization System Based on SigFox," in Proc. of the Vehicular Technology Conference, pp.1-5, 2018.
- LoRa-Alliance, "A Technical Overview of LoRa and LoRaWAN," LoRa-Alliance, San Ramon, 2015.
- B.C. Fargas and M.N. Petersen, "GPS-free Geolocation using LoRa in Low-Power WANs," in Proc. of the Global Internet of Things Summit, pp.1-6, 2017.
- Z. Wang, M. Huang, H. Du and H. Qin, "A Clustering Algorithm based on FDP and DBSCAN," in Proc. of the 14th International Conference on Computational Intelligence and Security, pp.145-149, 2018.
- A. S. More and D. P. Rana, "Review of Random Forest Classification Techniques to Resolve Data Imbalance," in Proc. of the 1st International Conference on Intelligent Systems and Information Management, pp.72-78, 2017.