DOI QR코드

DOI QR Code

Digital Forensic for Location Information using Hierarchical Clustering and k-means Algorithm

  • Lee, Chanjin (Dept. Of Computer Engineering Pukyong National University) ;
  • Chung, Mokdong (Dept. Of Computer Engineering Pukyong National University)
  • Received : 2015.11.09
  • Accepted : 2015.11.27
  • Published : 2016.01.31

Abstract

Recently, the competition among global IT companies for the market occupancy of the IoT(Internet of Things) is fierce. Internet of Things are all the things and people around the world connected to the Internet, and it is becoming more and more intelligent. In addition, for the purpose of providing users with a customized services to variety of context-awareness, IoT platform and related research have been active area. In this paper, we analyze third party instant messengers of Windows 8 Style UI and propose a digital forensic methodology. And, we are well aware of the Android-based map and navigation applications. What we want to show is GPS information analysis by using the R. In addition, we propose a structured data analysis applying the hierarchical clustering model using GPS data in the digital forensics modules. The proposed model is expected to help support the IOT services and efficient criminal investigation process.

Keywords

1. INTRODUCTION

As the digital data which is created and recorded in digital media increases at an exponential rate, it is often treated in the court. Digital forensics is defined as a concept of technology and procedure of dealing with digital data in order to submit to the court by collecting, examining, analyzing and preserving it [1,2]. The Windows 8 supports PC and mobile devices which are based on SOC (System on chip). Since it is possible to interwork Smartphone and tablet devices, we can use applications without a separate player in PC. As Smartphone are equipped with various sensors and communication capabilities, the data for operating PC applications remains by synchronizing with Smartphone. Since these traces leave the information such as an application type, run time, timestamp, and so on, they can be clues or evidences to find criminal behavior patterns in the perspective of digital forensics [3].

In this paper, we’ll check for file information techniques that can be collected based on Windows 8 Style UI and propose for the analysis and investigation procedures. Section 2 reviews related work. In section 3, we analyze Style UI apps and Android apps in terms of digital forensics. Section 4 IOT with structured data forensic service. Section 5 implementation and evaluation. Finally, section 6 draws conclusions and discusses the directions of our future research.

 

2. RELATED WORK

2.1 Research on Windows 8 Style UI Forensics

Style UI, called Metro UI in the past, is a user interface that is appropriate for mobile devices. The artifacts are created automatically by using operating systems and applications. Analysis of artifacts and traces can be classified as digital forensic analysis. When analyzing the Style UI applications, it is possible to know the user’s habits, preferences and so on. Thus, we can find the user’s various private information from e-mail and SNS (Social Network Service) applications which might be investigation clues. Table 1 shows the Style UI artifacts of basic applications in the existing researches [4,5]. We must have administrator permissions in order to access the contents of the listed paths in Table 1.

Table 1.Window8 Style UI artifacts

2.2 Research on Smartphone Instant Messenger Forensic

As Smartphones are wide spread, the number of users who use the messenger applications instead of SMS services increases rapidly. Recent instant messenger applications not only provide the feature of sending instant messages but also offer multimedia transaction including file sharing. Therefore, it is possible to obtain useful information on a criminal investigation by analyzing the messenger.

There are detailed researches on Smartphone instant messengers. Information was extracted on both Android and iOS, out of twelve messenger applications, such as ‘Facebook’ Messenger, ‘We-Chat’ and ‘KakaoTalk’ [6]. The ‘People’, the basic application of Windows 8 was analyzed and ‘Viber’ and ‘WhatsApp’ were analyzed on Android in more detail[7].

However, in the existing researches, many types of messenger applications were analyzed, but there is a lack of study on Style UI messenger applications. Among the traces left by the user, there are the friend information, message information, transmitted and received information, location information, and so on. These are possible to obtain important information needed for the investigation from these artifacts. We need to focus on analyzing the location information of these artifacts.

2.3 Research on Smpartphone GPS Information Application Forensic

D. Kim et al[6] research on the Google map, Daum map, and Naver map while collecting the time and location information for iPhone & Android based application analyze is. MapAn’ tools were proposed on this environment[8]. Maus Stenfan et al. analyzed coordinate information of the latitude and longitude where city and address information are shown in the form of text for storing location-information related research[9]. And there is a research on method of gathering GPS information stored in the picture file[10]. As such, the digital forensic research is actively related to the location information and the analysis method. We try to deal with the wide range of applications than existing researches and we want to show the visualization information about the location using location information.

In this paper, we analyze GPS information from Style UI App and Android App, and suggest ways of utilizing the data collected by 'R'.

2.4 Hierarchical Clustering

Hierarchical clustering, shown in Table 2, uses statistical analysis to data mining techniques. It performs hierarchical clustering to fulfill these clusters. Hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types[11].

Table 2.Hierarchical Clustering Algorithm

Agglomerative : This is a "bottom up" approach: each observation starts from its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Divisive : This is a "top down" approach: all observations start from one cluster, and splits are performed recursively as one moves down the hierarchy.

Each of the data is a combined method of the one cluster. There are a various ways to find the nearest cluster depending on how to define the distance between the clusters. In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.

2.5 k-means Algorithm

k-means clustering is a method of vector quantization, originated from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.

Given a set of observations (x1,x2,...,xn ), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k(≤ n) sets S = {S1,S2,...,Sk } so as to minimize the within-cluster sum of squares. In other words, its(1) objective is to find:

where μi is the mean of points in Si [12].

2.6 Geographic Profiling using the R

Offender profiling, also known as criminal profiling, is a behavioral and investigative tool that is intended to help investigators to accurately predict and profile the characteristics of unknown criminal subjects or offenders[13]. The category is as follow : criminal profiling, criminal personality profiling, criminological profiling, behavioral profiling or criminal investigative analysis. Geographic profiling is another method to profile an offender.

Representative instance of geographic profiling system is as follow. Canadian-CGT and US-CrimeStat and DRAGNET of England are representative instances of geographic profiling systems. Likewise, our government has also developed the ‘GeoPros’ in 2009. Considering the domestic regional characteristics and conditions. Recently, our government finished the upgrade work and aggressively utilized it in crime prevention and arrest activities.

 

3. WINDOWS8 STYLE UI APPS AND ANDROID APPS TO ANALYZE

We analyze ‘Viber’ and ‘Facebook’, which are instant messenger applications for Style UI. ‘Viber’ and ‘Facebook’ are widespread applications in the world and they provide many functions such as chat, free calls and sending images, where we can find a lot of information for criminal investigation.

Especially, ‘Viber’ and ‘Facebook’ are synchronized to the account applications at first launch. Therefore, we can identify some artifacts like contacts, phone numbers, and names. They have many functions to contact other people, where we can find messages, temporal information, GPS, and call logs. We present the analysis methods and procedures of the two applications more specifically in the following subsections.

3.1 Windows 8 Style App-based Instant Messenger Applications

If the application is ‘Viber’ or ‘Facebook’, we can access information of ‘Viber’ or ‘Facebook’ application through the DBbrowser(SQLite). This paper focuses on the GPS data which was shown in Table 1 and Table 3 to Table 4. The access path is as follows.

Table 3.k-means Clustering Algorithm

Table 4.Viber.db – Events Information

· Viber App’s Information.

DB-File Path. \%SystemDrive%\Users\\AppData\Local\Packages\2414FC7A.Viber-FreePhoneCallsText_p61zvh252yqyr\LocalState\viber.db

GPS- Mark Type : WGS-84

We can see the type of message and the events that occurred in conversation with each other from the ‘Events’ field in Table 1. In addition, it is possible to obtain a GPS coordinate information.

Fig. 1.GPS Information Table.

Distance errors between the actual testers had a relatively low error range less within 200m about GPS data.

Fig. 2.Check a GPS point.

· Facebook App’s Information.

DB-File Path. \%SystemDrive%\Users\\AppData\Local\Packages\Facebook.Facebook_8xx8rvfyw5nnt\LocalState\100000343378744\DB\Stories

GPS- Mark Type : WGS-84

Fig. 3.Stories – feed_section Information.

We were able to see the profile registered by the user. It’s a school history, company history, and so on. This path is the 'Stories' having a lot of information. Since most of the information is encrypted, we can see a few information in plain.

It was found the simple position information from the Fig. 4. We can check the residence of being in a particular city. And a simple GPS information could also be found.

Fig. 4.Stories–places_seciton Information.

3.2 Android-based Navigation and Map Applications

First, the experimental environment was tested in the Andriod 4.4.2ver Nexus4 Smartphone. The reason for this is to a stable looting and file system access. Check the Internal storage "\data\data" and the mounted external storage for in order to investigate the application. In this paper, target of apps analyzed are following. List are Atlan3D, KT navigation apps and Naver, Google map apps. Fig. 5 indicates the location information of each of the analyzed application.

· Atlan3D Navi App’s Information.

DB-File Path. \sdcard\atlan3D\UserData\AtlanSmartRecentDest.archive

GPS- Mark Type : WGS-84

Fig. 5.Atlan3D Navi GPS Information.

· KT Navi App’s Information.

DB-File Path. \data\data\kt.navi\databases\Db_Recentdestination

GPS- Mark Type : KTM

Pos(x,y) - Start Point

Ent(x,y) - End Point

Fig. 6.KT Navi GPS Information.

DB-File Path. \data\data\kt.navi\databases\Db_DestinationSendList

GPS- Mark Type : KTM

Pos(x,y) – Share user Point

Fig. 7.KT Navi Share user GPS Information.

· Naver map App’s Information.

DB-File Path. \data\data\com.nhn.android.nmap\databases\mapHistory

GPS- Mark Type : WGS-84

Fig. 8.Naver map GPS Information.

· Google map App’s Information.

DB-File Path. \data\data\ com.google.android.apps.maps\databases\gmm_myplaces

GPS- Mark Type : WGS-84

Fig. 9.Google map GPS Information.

Fig. 5-9 are the summary of the location information from the data. Location information collected has a coordinate system and time notation. Expressed in a variety of ways, and can be classified according to the properties of the location information. Table IV summarizes the location information notation which can be found on Android.

Table 5.Summary of Android Location Information

 

4. IOT WITH STRUCTURED DATA FORENSIC SERVICE

4.1 Co-Biz IoT Framework

Our laboratory researches on an IoT based Framework, called IoT Co-biz Framework, as shown in Fig. 10. The Framework to support context aware IoT services considering various situations. The data support the IoT Forensic services with focusing artifact of PC and Smartphone. Therefore, we propose an algorithm of location information model using the structured data extracted in point of view the digital forensics.

Fig. 10.Co-Biz IoT Framework.

4.2 Algorithm of the Proposed Model

The algorithm of the proposed model consists of a collection of common procedure performed in the digital forensic; that is, collection, research, analysis, and construction of the order report[14]. Table VI shows the overall algorithm of the proposed model[3].

Table 6.Proposed Algorithm for Location Data

 

5. IMPLEMENTATION AND EVALUATION

5.1 Implementation

As noted in related research of this paper, the basic implementation utilized big-data analytics tools ‘R-studio’ for geographic profiling. GPS data were experimental data of Windows 8 style UI App and Android App in the third section of this paper.

Fig. 11.Visualization techniques using R

Experimental environment is equipped with i5-2400 CPU and 16GB RAM of Windows 8 PC. R-Studio was used for statistical analysis. Map representation is expressed by utilizing the ‘Google-map’ package.

As shown in Fig. 12, we can check the location information of black point. GPS data is the result of processing R-studio. The index information from the data is placed on top of each point input. We’ll find central area of map. The data is distributed concentrated among the city of Busan. When these criminals(suspects) are location data, they can be inferred that they're the nearby residence. Therefore, we can guess the major action radius and its surrounding destinations. If the extracted area data shows a high density among specific location, the location will be the best rank among the investigated data.

Fig. 12.Example of including GPS data based on map.

Fig. 13 is the shortest connecting method of hierarchical clustering. It was used location information in Busan (Fig. 12). The data is mainly concentrated on the vicinity of Yeonsan-dong in Busan.

Fig. 13.An example of hierarchical clustering.

Fig. 14.Example of include GPS Big-Data based on map.

Based on previous experiments, we use public data portal, supported by the government 3.0 for big-data applications. Location big-data was used "National Women·Family-related facilities information".

As shown in Fig. 15, the most frequent places are Seoul, Gyeonggi-do and Incheon.

Fig. 15.An example of hierarchical clustering (Single).

Fig. 16 is the Average Linkage Method. ALM can be seen easily because it shows a ranking value in proportion to the average value.

Fig. 16.An example of hierarchical clustering (Complete).

Fig, 17 is the Ward Linkage Method. The figures were displayed in an easy to understand areas of high priority areas 1, 2 and 3.

Fig. 17.An example of hierarchical clustering.(Ward)

It is the only non-hierarchical clustering way of three ways.

Fig. 18.An example of k-means clustering.

Finally, this figure applied to the k-means algorithm. As we can see in the picture, it appears in the shape of the korean peninsula only using the location information. Through the map, the "National Women·Family-related facilities location information" and local groups can be seen at a glance.

Fig. 19.Total of k-means clustering on map.

Separated by administrative district on the 17th District of South Korea was set to k values. The picture is divided into 17 zones of 17 clustering.

5.2 Evaluation

In this paper, we expected effects of GPS visualization analysis model as follows. First, it is possible to reduce the scope of the investigation work through schematized tools. Second, the data can refer(guess) to major destination and residence. Third, if we periodically update the data, we expect our method to be used for pre-crime preventions by illustrating the range of crimes on the map.

 

6. CONCLUSION AND FUTURE WORK

In this paper, we extracted location information from the various applications in order to support crime investigations and application of the Messenger Style UI of Windows 8 analyzed. In addition, we analyzed location information from Android based navigation-applications and map-applications. Unlike the existing researches, the data collected might be used for location information visualization and analysis by using R-studio with hierarchical clustering and k-means algorithm.

In the future, we are going to add an improved visualization methods that are using the extracted data from Style UI & Android in order to support criminal investigation more efficiently and intuitional. Also, we are going to study the fuzzy k- means algorithm analysis technique.

References

  1. H. Lee, S. Lee, and J. Lim, "Digital Forensics Technologies," Review of Korea Institute of Information Security and Cryptology, pp. 8-16, 2002.
  2. N.A. Mutawa, I. Baggili, and A. Marrington, “Forensic Analysis of Social Networking Applications on Mobile Devices,” Journal of Digital Investigation, Vol. 9, Supplement, pp. S24-S33, 2012. https://doi.org/10.1016/j.diin.2012.05.007
  3. Y. Son and M. Chung, “Digital Forensics for Android Location Information using Hierarchical Clustering,” Journal of The Institute of Electronics and Information Engineers, Vol. 51, No. 6, pp. 143-151, 2014. https://doi.org/10.5573/ieie.2014.51.6.143
  4. Window8 Forensic(1) Metro UI and Artifacts, http://download.ahnlab.com/kr/site/magazineAhn/ahn_201207.pdf (accessed July, 2012).
  5. S. Jang and S. Lee, "Windows 8 Metro App from View Point of Digital Forensics," Proceeding of The Workshop of Digital Forensics, pp. 115-123, 2013.
  6. J. Choi, D. Kim, and S. Lee, "Smartphone Messenger Data Collection and Analysis," Proceeding of The Workshop of Digital Forensics, pp. 3-10, 2013.
  7. A. Mahajan, M.S. Dahiya, and H.P. Sanghvi, “Forensic Analysis of Instant Messenger Applications on Android Devices,” International Journal of Computer Applications, Vol. 68, No. 8, pp.38-44, 2013. https://doi.org/10.5120/11602-6965
  8. D. Kim, J. Bang, and S. Lee, "Analysis of Smartphone-Based Location Information," Computer Science and Convergence is proceedings of the 3rd FTRA International Conference on Computer Science and its Applications and The 2011 FTRA World Convergence Conference, Lecture Notes in Electrical Engineering, Vol. 114, pp. 43-53, 2012.
  9. S. Maus, H. Höfken, and M. Schuba, "Forensic Analysis of Geodata in Android Smartphones," Proceeding of International Conference on Cybercrime, pp.1-11, 2011.
  10. Y. Sun, "Geo-Location Forensic on Mobile Devices," The First International Conference on Digital Forensics and Investigation, pp. 1-8 2012.
  11. L. Rokach and O. Maimon, Clustering Methods, Data mining and knowledge discovery handbook, Springer US, New York, 2005.
  12. S.P. Lloyd, “Least Squares Quantization in PCM,” Information Theory, Institute of Electrical and Electronics Engineers Transactions on 28.2, Vol. 28, No. 2, pp. 129-137, 1982. https://doi.org/10.1109/TIT.1982.1056489
  13. Kocsis and Richard N, Applied criminal psychology: A guide to forensic behavioral sciences, Charles C Thomas Publisher, Springfield, Illinois, 2009.
  14. K. Kent, S. Chevalier, T. Grance and H. Dang, Guide to Integrating Forensic Techniques into Incident Response, National Institute of Standards and Technology Special Publication, Columbia, 2006.
  15. J. Zhou, L. Liang, and L. Chen, “Geographic Profiling Based on Multi-point Centrography with K-means Clustering,” World Academy of Science, Engineering and Technology, Vol. 6, No. 61, pp. 875-878, 2012.
  16. D. Quick, Forensic Analysis of Cloud Storage Client Data, Master of Science of University of South Australia, 2012.
  17. A. Iqba, A. Marrington, and I. Baggili, "Forensic Artifacts of the ChatON Instant Messaging Application," Systematic Approaches to Digital Forensic Engineering , 2013 Eighth International Workshop on institute of Electrical and Electronics Engineers, pp.1-6, 2013.