Search | Korea Science

Efficient k-Nearest Neighbor Join Query Processing Algorithm using MapReduce (맵리듀스를 이용한 효율적인 k-NN 조인 질의처리 알고리즘)

Yun, Deulnyeok;Jang, Miyoung;Chang, Jaewoo
- Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.767-770 / 2014
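MapReduce-based k-NN join query processing algorithms for analyzing large-scale data have recently become very important in applications built on data mining and analysis. However, the representative existing work, the Voronoi-based k-NN join query processing algorithm, is not suitable for large-scale data because the cost of constructing the Voronoi index is very high. In addition, the R-tree used to store Voronoi cell information is not well suited to distributed parallel processing in a MapReduce environment. Therefore, this paper proposes a new grid index-based k-NN join query processing algorithm. First, to solve the problem of high index construction cost, we propose a dynamic grid index generation scheme that takes the data distribution into account. Second, to perform k-NN join queries efficiently in a MapReduce environment, we propose a candidate region search and filtering algorithm that uses adjacent-cell information as a signature. Finally, through performance evaluation, we show that the proposed scheme achieves up to three times better query processing performance than the existing scheme in terms of query processing time.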
https://doi.org/10.3745/PKIPS.y2014m11a.767
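
As a rough single-machine illustration of the grid idea described in the abstract above (not the authors' MapReduce algorithm), the following sketch buckets one point set into uniform grid cells and answers each k-NN query by scanning rings of neighboring cells. The uniform `cell_size`, the function names, and the ring-expansion stopping rule are assumptions made for illustration; the paper's dynamic grid construction and signature-based filtering are not reproduced here.

```python
import heapq
import math
from collections import defaultdict


def build_grid(points, cell_size):
    """Bucket 2-D points into a uniform grid keyed by (column, row)."""
    grid = defaultdict(list)
    for p in points:
        grid[(int(p[0] // cell_size), int(p[1] // cell_size))].append(p)
    return grid


def knn_query(grid, cell_size, q, k):
    """Return up to k points nearest to q, scanning rings of cells around q."""
    if not grid:
        return []
    cx, cy = int(q[0] // cell_size), int(q[1] // cell_size)
    # Largest ring that can still contain an occupied cell.
    max_ring = max(max(abs(gx - cx), abs(gy - cy)) for gx, gy in grid)
    found = []  # (distance, point) pairs seen so far
    for ring in range(max_ring + 1):
        for gx in range(cx - ring, cx + ring + 1):
            for gy in range(cy - ring, cy + ring + 1):
                if max(abs(gx - cx), abs(gy - cy)) != ring:
                    continue  # visit only the cells on this ring's border
                for p in grid.get((gx, gy), ()):
                    found.append((math.dist(q, p), p))
        # Any point in a not-yet-visited cell is at least ring * cell_size away,
        # so the search can stop once the k-th best distance is within that bound.
        if len(found) >= k:
            kth = heapq.nsmallest(k, found)[-1][0]
            if kth <= ring * cell_size:
                break
    return [p for _, p in heapq.nsmallest(k, found)]


def knn_join(r_points, s_points, k, cell_size=1.0):
    """For every point in r_points, find its k nearest neighbors in s_points."""
    grid = build_grid(s_points, cell_size)
    return {r: knn_query(grid, cell_size, r, k) for r in r_points}
```

Points are assumed to be hashable tuples of coordinates. In a distributed setting, each grid cell would be a natural unit to assign to a worker, which is the property the abstract relies on, but that partitioning step is left out of this sketch.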

Travel Time Prediction Algorithm for Trajectory data by using Rule-Based Classification on MapReduce (맵리듀스 환경에서 규칙 기반 분류화를 이용한 궤적 데이터 주행 시간 예측 알고리즘)

Kim, JaeWon;Lee, HyunJo;Chang, JaeWoo
- Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.798-801 / 2014
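In trajectory-based services such as traveler information systems (ATIS) and traffic management systems (ITS), accurately predicting the travel time for a given trajectory query is essential to improving service quality. A representative spatial data analysis technique for this purpose is rule-based classification, which guarantees high accuracy in data classification. However, existing rule-based classification techniques consider only a single-machine environment and are therefore not suitable for processing large-scale spatial data. To address this, this study develops a travel time prediction algorithm for trajectory data that uses rule-based classification in a MapReduce environment. First, the proposed algorithm analyzes large-scale spatial data in parallel using MapReduce to generate highly useful trajectory data rules, which reduces the rule generation time for large-scale spatial data. Second, it partitions map data using a grid structure to improve search performance during user query processing. That is, when searching for the rule groups used for travel time prediction, only the grid cells that contain the query are searched, so query processing performance improves. Finally, a query processing algorithm suited to the MapReduce structure is designed to support efficient parallel query processing. To this end, the map function measures, in parallel for the selected grid cells, the travel time on the road segments included in the query. The reduce function then merges the results of the map function, based on the departure time and the per-segment travel times, to produce the final result. In this way, the processing time and result accuracy of travel time prediction through spatial big data analysis are improved.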
https://doi.org/10.3745/PKIPS.y2014m11a.798

A Study on The Optimum Earthwork Volume using GIS (GIS기법을 이용한 토공산정의 최적화)

Kim, Sung-Hun;Sim, Hee-Chul;Do, Kwang-Min;Lee, Jong-Dal
- Proceedings of the Korean Society of Hazard Mitigation Conference / 2007.02a / pp.344-348 / 2007
This study examines the process of calculating earthwork volume and earthwork haulage. It compares and analyzes the results of DAS software and GIS software. DAS is a program developed by the Korea Land Corporation. The purpose of the study is to present a method for calculating earthwork in detail, and to apply GIS methods to the earthwork calculation data produced by DAS. Applying GIS analysis can improve the accuracy of the earthwork calculation method and the efficiency of the earthwork model.

An Efficient Grid Cell Based Spatial Clustering Algorithm for Spatial Data Mining (공간데이타 마이닝을 위한 효율적인 그리드 셀 기반 공간 클러스터링 알고리즘)

Moon, Sang-Ho;Lee, Dong-Gyu;Seo, Young-Duck
- The KIPS Transactions:PartD / v.10D no.4 / pp.567-576 / 2003
Spatial data mining, i.e., the discovery of interesting characteristics and patterns that may implicitly exist in spatial databases, is a challenging task due to the huge amounts of spatial data. Clustering algorithms are attractive for the task of class identification in spatial databases. Several methods for spatial clustering have been presented in recent years, but they have the following drawbacks: they incur increased costs from computing distances among objects, and they process only memory-resident data. In this paper, we propose an efficient grid cell based spatial clustering method for spatial data mining. It focuses on resolving the disadvantages of existing clustering algorithms. In detail, it aims to further reduce cost for good efficiency on large databases. To do this, we devise a spatial clustering algorithm based on grid cell structures including cell relationships.
https://doi.org/10.3745/KIPSTD.2003.10D.4.567 KSCI
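
To make the general idea of grid cell based clustering concrete, here is a minimal sketch that avoids pairwise distance computations by working on cells instead of objects. It is a generic illustration, not the algorithm of the paper above: the density threshold `min_pts`, the 8-neighbor adjacency rule, and the function name `grid_cluster` are all assumptions.

```python
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_pts):
    """Group 2-D points by merging adjacent dense grid cells (8-neighborhood)."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

    # A cell is "dense" when it holds at least min_pts points.
    dense = {c for c, pts in cells.items() if len(pts) >= min_pts}

    labels = {}    # dense cell -> cluster id
    clusters = []  # cluster id -> list of member points
    for start in sorted(dense):
        if start in labels:
            continue
        cid = len(clusters)
        clusters.append([])
        labels[start] = cid
        queue = deque([start])
        # Breadth-first search over neighboring dense cells.
        while queue:
            cx, cy = queue.popleft()
            clusters[cid].extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in labels:
                        labels[nb] = cid
                        queue.append(nb)
    return clusters
```

Points that fall in sparse cells are treated as noise and appear in no cluster. The cost depends on the number of occupied cells rather than on the number of object pairs, which is the kind of saving the abstract refers to.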

Efficient Top-k Query Processing Algorithm Using Grid Index-based View Selection Method (그리드 인덱스 기반 뷰 선택 기법을 이용한 효율적인 Top-k 질의처리 알고리즘)

Hong, Seungtae;Youn, Deulnyeok;Chang, Jae Woo
- KIISE Transactions on Computing Practices / v.21 no.1 / pp.76-81 / 2015
Research on top-k query processing algorithms for analyzing big data has recently been in the spotlight. However, because existing top-k query processing algorithms do not provide an efficient index structure, they incur high query processing costs and cannot support various types of queries. To solve these problems, we propose a top-k query processing algorithm using a view selection method based on a grid index. By using the grid index-based view selection method, the proposed algorithm retrieves the minimum number of grid cells for the query range and thereby reduces the query processing time. Finally, we show from our performance analysis that the proposed scheme outperforms an existing scheme in terms of both query processing time and query result accuracy.
https://doi.org/10.5626/KTCP.2015.21.1.76 KSCI

A Efficient Cloaking Region Creation Scheme using Hilbert Curves in Distributed Grid Environment (분산 그리드 환경에서 힐버트 커브를 이용한 효율적인 Cloaking 영역 설정 기법)

Lee, Ah-Reum;Um, Jung-Ho;Chang, Jae-Woo
- Journal of Korea Spatial Information System Society / v.11 no.1 / pp.115-126 / 2009
Recent developments in wireless communication and mobile positioning technologies have made Location-Based Services (LBSs) popular. However, because users of LBSs send queries to database servers using their exact locations, the users' location information can be misused by adversaries. Therefore, a mechanism for protecting users' privacy is required for the safe use of LBSs by mobile users. To this end, we propose an efficient cloaking region creation scheme using Hilbert curves in a distributed grid environment, so as to protect users' privacy in LBSs. The proposed scheme generates a minimum cloaking region by analyzing the characteristics of a Hilbert curve and computing the Hilbert curve values of neighboring cells based on it, so that the resulting cloaking region satisfies K-anonymity. In addition, to reduce network communication cost, we make use of a distributed hash table structure called Chord. Finally, we show from our performance analysis that the proposed scheme outperforms the existing grid-based cloaking method.
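
The two ingredients named in this abstract, Hilbert curve values for grid cells and a K-anonymous cloaking region, can be sketched as follows. `hilbert_index` is the standard bitwise conversion from cell coordinates to a position along the curve. The `cloak` function is a simplified bucket-by-Hilbert-rank grouping chosen for illustration; it is an assumption, not the minimum-region scheme or the Chord-based distribution proposed in the paper, and the names and the `users` dictionary layout are hypothetical.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert value."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def cloak(users, target, k, n):
    """Return (xmin, ymin, xmax, ymax), a cell bounding box covering >= k users.

    users maps a user id to its (x, y) grid cell; target is one of those ids.
    """
    if len(users) < k:
        raise ValueError("fewer than k users: K-anonymity cannot be satisfied")
    order = sorted(users, key=lambda u: hilbert_index(n, *users[u]))
    rank = order.index(target)
    # Split the Hilbert ordering into buckets of k users; the last bucket
    # absorbs the remainder so that every bucket holds at least k users.
    num_buckets = len(order) // k
    bucket = min(rank // k, num_buckets - 1)
    start = bucket * k
    end = start + k if bucket < num_buckets - 1 else len(order)
    xs = [users[u][0] for u in order[start:end]]
    ys = [users[u][1] for u in order[start:end]]
    return (min(xs), min(ys), max(xs), max(ys))
```

Because every user in a bucket receives the same region, an observer who sees the region cannot tell which of the at least k users issued the query. Cells that are close on the Hilbert curve tend to be close in space, which keeps the bounding box small.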

Countinuous k-Nearest Neighbor Query Processing Algorithm for Distributed Grid Scheme (분산 그리드 기법을 위한 연속 k-최근접 질의처리 알고리즘)

Kim, Young-Chang;Chang, Jae-Woo
- Journal of Korea Spatial Information System Society / v.11 no.3 / pp.9-18 / 2009
Recently, due to advances in mobile devices and wireless communication, there have been many studies on telematics and LBS (location-based service) applications. Because moving objects usually move on spatial networks, their locations are updated frequently, which degrades retrieval performance. To manage the frequent location updates of moving objects efficiently, a new distributed grid scheme called DS-GRID (distributed S-GRID) and a k-NN (k-nearest neighbor) query processing algorithm were proposed [1]. However, the result of k-NN query processing may be invalidated as the locations of the query and the moving objects change. Therefore, a continuous k-NN query processing algorithm needs to be studied. In this paper, we propose the MCE-CKNN and MBP (Monitoring in Border Point)-CKNN algorithms for S-GRID. The MCE-CKNN algorithm splits a query route into sub-routes based on cells and improves retrieval performance by processing the query in parallel. In addition, the MBP-CKNN algorithm stores POIs from the border points of each grid cell and improves retrieval performance by decreasing the number of accesses to adjacent cells. Finally, the performance analysis shows that our CKNN algorithms achieve 15-53% better retrieval performance than Kolahdouzan's algorithm.

Performance Improvement of Declustering Algorithm by Efficient Grid-Partitioning Multi-Dimensional Space (다차원 공간의 효율적인 그리드 분할을 통한 디클러스터링 알고리즘 성능향상 기법)

Kim, Hak-Cheol
- Journal of Korea Spatial Information System Society / v.12 no.1 / pp.37-48 / 2010
In this paper, we analyze the shortcomings of previous declustering methods for high-dimensional space, which are based on grid-like partitioning and a mapping function from a cell to a disk number, and we propose a solution. The problems arise from the fact that the number of splits is small (for the most part, binary partitioning is sufficient) and that the side length of a range query is quite large even when its selectivity is small. To solve this problem, we propose a mathematical model to estimate the performance of a grid-like partitioning method. With the proposed estimation model, we can choose a good grid-like partitioning method among the possible schemes, which results in an overall improvement in declustering performance. Several experimental results show that we can improve the performance of a previous declustering method by up to 2.7 times.
KSCI

Partial Dimensional Clustering based on Projection Filtering in High Dimensional Data Space (대용량의 고차원 데이터 공간에서 프로젝션 필터링 기반의 부분차원 클러스터링 기법)

이혜명;정종진
- The Journal of Society for e-Business Studies / v.8 no.4 / pp.69-88 / 2003
In high-dimensional data, the performance of most clustering algorithms tends to degrade rapidly because of sparsity and the amount of noise. Recently, partial dimensional clustering algorithms, which perform well in clustering, have been studied. These algorithms select the dimensions closely related to clustering and discard the dimensions that are not directly related to it. However, the traditional algorithms have some problems. First, they employ grid-based techniques, but the large number of grid cells worsens their computational time and memory usage. Second, they explore the dimensions related to clustering using k-medoids, but it is very difficult to determine the best k-medoids in large amounts of high-dimensional data. In this paper, we propose an efficient partial dimensional clustering algorithm called CLIP. CLIP explores dense regions for a cluster on a certain dimension. Then, using incremental projection, the algorithm probes dense regions on the next dimension, depending on the dense regions of the dimension already explored. CLIP repeats this probing over all dimensions. Clustering by incremental projection can prune the search space substantially and reduce the computational time considerably. We evaluate the performance (efficiency, effectiveness, accuracy, etc.) of the proposed algorithm against other algorithms using common synthetic data.

GIS-Based Methods to Assess the Population Distribution Criteria for Undesirable Facilities: The Case of Nuclear Power Plants (비선호 시설의 인구분포 관련 입지기준 평가를 위한 GIS-기반 방법론 연구 -원자력 발전소의 경우-)

Lee, Sang-Il;Cho, Daeheon
- Journal of the Korean Geographical Society / v.47 no.5 / pp.755-774 / 2012
The main objective of the study is to propose GIS-based methods to assess the population distribution criteria for undesirable facilities such as nuclear power plants. First, the relevant criteria were reviewed in official documents compiled by institutions such as the IAEA (International Atomic Energy Agency), the U.S. NRC (Nuclear Regulatory Commission), and some national institutes including the Korea Institute of Nuclear Safety. The review shows that the fundamental principle underlying the various criteria is to maximize the distance between a plant and the nearest population center. Putting this principle into practice requires two interrelated GIS-based techniques: sophisticated ways of representing population distribution and of identifying population centers. A dasymetric areal interpolation is proposed for the former, and cell-based and area-based critical density methods are introduced for the latter. Grid-based population distributions at various spatial resolutions are created by means of the dasymetric areal interpolation. By applying the critical density methods to the gridded population distribution, population centers satisfying the population size and density criteria can be identified. These methods were applied to the case of the Gori-1 nuclear power plant, and their strengths and limitations were discussed. The assessment results were found to vary depending on which method was employed and what values were chosen for the various parameters. This study is expected to foster the application of methods and techniques developed in geospatial analysis and modeling to site selection and evaluation.
