• Title/Summary/Keyword: k nearest neighbor method

Search Result 316, Processing Time 0.025 seconds

Design of an Efficient Parallel High-Dimensional Index Structure (효율적인 병렬 고차원 색인구조 설계)

  • Park, Chun-Seo;Song, Seok-Il;Sin, Jae-Ryong;Yu, Jae-Su
    • Journal of KIISE:Databases
    • /
    • v.29 no.1
    • /
    • pp.58-71
    • /
    • 2002
  • Generally, multi-dimensional data such as image and spatial data require large amount of storage space. There is a limit to store and manage those large amount of data in single workstation. If we manage the data on parallel computing environment which is being actively researched these days, we can get highly improved performance. In this paper, we propose a parallel high-dimensional index structure that exploits the parallelism of the parallel computing environment. The proposed index structure is nP(processor)-n$\times$mD(disk) architecture which is the hybrid type of nP-nD and lP-nD. Its node structure increases fan-out and reduces the height of a index tree. Also, A range search algorithm that maximizes I/O parallelism is devised, and it is applied to K-nearest neighbor queries. Through various experiments, it is shown that the proposed method outperforms other parallel index structures.

K Nearest Neighbor Joins for Big Data Processing based on Spark (Spark 기반 빅데이터 처리를 위한 K-최근접 이웃 연결)

  • JIAQI, JI;Chung, Yeongjee
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.9
    • /
    • pp.1731-1737
    • /
    • 2017
  • K Nearest Neighbor Join (KNN Join) is a simple yet effective method in machine learning. It is widely used in small dataset of the past time. As the number of data increases, it is infeasible to run this model on an actual application by a single machine due to memory and time restrictions. Nowadays a popular batch process model called MapReduce which can run on a cluster with a large number of computers is widely used for large-scale data processing. Hadoop is a framework to implement MapReduce, but its performance can be further improved by a new framework named Spark. In the present study, we will provide a KNN Join implement based on Spark. With the advantage of its in-memory calculation capability, it will be faster and more effective than Hadoop. In our experiments, we study the influence of different factors on running time and demonstrate robustness and efficiency of our approach.

A Nearest Neighbor Query Processing Algorithm Supporting K-anonymity Based on Weighted Adjacency Graph in LBS (위치 기반 서비스에서 K-anonymity를 보장하는 가중치 근접성 그래프 기반 최근접 질의처리 알고리즘)

  • Jang, Mi-Young;Chang, Jae-Woo
    • Spatial Information Research
    • /
    • v.20 no.4
    • /
    • pp.83-92
    • /
    • 2012
  • Location-based services (LBS) are increasingly popular due to the improvement of geo-positioning capabilities and wireless communication technology. However, in order to enjoy LBS services, a user requesting a query must send his/her exact location to the LBS provider. Therefore, it is a key challenge to preserve user's privacy while providing LBS. To solve this problem, the existing method employs a 2PASS cloaking framework that not only hides the actual user location but also reduces bandwidth consumption. However, 2PASS does not fully guarantee the actual user privacy because it does not take the real user distribution into account. Hence, in this paper, we propose a nearest neighbor query processing algorithm that supports K-anonymity property based on the weighted adjacency graph(WAG). Our algorithm not only preserves the location of a user by guaranteeing k-anonymity in a query region, but also improves a bandwidth usage by reducing unnecessary search for a query result. We demonstrate from experimental results that our algorithm outperforms the existing one in terms of query processing time and bandwidth usage.

KNN/PFCM Hybrid Algorithm for Indoor Location Determination in WLAN (WLAN 실내 측위 결정을 위한 KNN/PFCM Hybrid 알고리즘)

  • Lee, Jang-Jae;Jung, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.47 no.6
    • /
    • pp.146-153
    • /
    • 2010
  • For the indoor location, wireless fingerprinting is most favorable because fingerprinting is most accurate among the technique for wireless network based indoor location which does not require any special equipments dedicated for positioning. As fingerprinting method,k-nearest neighbor(KNN) has been widely applied for indoor location in wireless location area networks(WLAN), but its performance is sensitive to number of neighborsk and positions of reference points(RPs). So possibilistic fuzzy c-means(PFCM) clustering algorithm is applied to improve KNN, which is the KNN/PFCM hybrid algorithm presented in this paper. In the proposed algorithm, through KNN,k RPs are firstly chosen as the data samples of PFCM based on signal to noise ratio(SNR). Then, thek RPs are classified into different clusters through PFCM based on SNR. Experimental results indicate that the proposed KNN/PFCM hybrid algorithm generally outperforms KNN and KNN/FCM algorithm when the locations error is less than 2m.

Shape Feature Extraction technique for Content-Based Image Retrieval in Multimedia Databases

  • Kim, Byung-Gon;Han, Joung-Woon;Lee, Jaeho;Haechull Lim
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.869-872
    • /
    • 2000
  • Although many content-based image retrieval systems using shape feature have tried to cover rotation-, position- and scale-invariance between images, there have been problems to cover three kinds of variance at the same time. In this paper, we introduce new approach to extract shape feature from image using MBR(Minimum Bounding Rectangle). The proposed method scans image for extracting MBR information and, based on MBR information, compute contour information that consists of 16 points. The extracted information is converted to specific values by normalization and rotation. The proposed method can cover three kinds of invariance at the same time. We implemented our method and carried out experiments. We constructed R*_tree indexing structure, perform k-nearest neighbor search from query image, and demonstrate the capability and usefulness of our method.

  • PDF

Personalized Movie Recommendation System Combining Data Mining with the k-Clique Method

  • Vilakone, Phonexay;Xinchang, Khamphaphone;Park, Doo-Soon
    • Journal of Information Processing Systems
    • /
    • v.15 no.5
    • /
    • pp.1141-1155
    • /
    • 2019
  • Today, most approaches used in the recommendation system provide correct data prediction similar to the data that users need. The method that researchers are paying attention and apply as a model in the recommendation system is the communities' detection in the big social network. The outputted result of this approach is effective in improving the exactness. Therefore, in this paper, the personalized movie recommendation system that combines data mining for the k-clique method is proposed as the best exactness data to the users. The proposed approach was compared with the existing approaches like k-clique, collaborative filtering, and collaborative filtering using k-nearest neighbor. The outputted result guarantees that the proposed method gives significant exactness data compared to the existing approach. In the experiment, the MovieLens data were used as practice and test data.

An Online Forklift Dispatching Algorithm Based on Minimal Cost Assignment Approach (최소 비용할당 기반 온라인 지게차 운영 알고리즘)

  • kwon, BoBae;Son, Jung-Ryoul;Ha, Byung-Hyun
    • Journal of the Korea Society for Simulation
    • /
    • v.27 no.2
    • /
    • pp.71-81
    • /
    • 2018
  • Forklifts in a shipyard lift and transport heavy objects. Tasks occur dynamically and the rate of the task occurrence changes over time. Especially, the rate of the task occurrence is high immediately after morning and afternoon business hours. The weight of objects varies according to task characteristic, and a forklift also has the workable or allowable weight limit. In this study, we propose an online forklift dispatching algorithm based on nearest-neighbor dispatching rule using minimal cost assignment approach in order to attain the efficient operations. The proposed algorithm considers various types of forklift and multiple jobs at the same time to determine the dispatch plan. We generate dummy forklifts and dummy tasks to handle unbalance in the numbers of forklifts and tasks by taking their capacity limits and weights. In addition, a method of systematic forklift selection is also devised considering the condition of the forklift. The performance indicator is the total travel distance and the average task waiting time. We validate our approach against the priority rule-based method of the previous study by discrete-event simulation.

An Algorithm of Curved Hull Plates Classification for the Curved Hull Plates Forming Process (곡가공 프로세스를 고려한 곡판 분류 알고리즘)

  • Noh, Ja-Ckyou;Shin, Jong-Gye
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.46 no.6
    • /
    • pp.675-687
    • /
    • 2009
  • In general, the forming process of the curved hull plates consists of sub tasks, such as roll bending, line heating, and triangle heating. In order to complement the automated curved hull forming system, it is necessary to develop an algorithm to classify the curved hull plates of a ship into standard shapes with respect to the techniques of forming task, such as the roll bending, the line heating, and the triangle heating. In this paper, the curved hull plates are classified by four standard shapes and the combination of them, or saddle, convex, flat, cylindrical shape, and the combination of them, that are related to the forming tasks necessary to form the shapes. In preprocessing, the Gaussian curvature and the mean curvature at the mid-point of a mesh of modeling surface by Coon's patch are calculated. Then the nearest neighbor method to classify the input plate type is applied. Tests to verify the developed algorithm with sample plates of a real ship data have been performed.

Treatment of Missing Data by Decomposition and Voting with Ordinal Data

  • Chun, Young-M.;Son, Hong-K.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.3
    • /
    • pp.585-598
    • /
    • 2007
  • It is so difficult to get complete data when we conduct a questionaire in actuality. And we get inefficient results if we analyze statistical tests with ignoring missing values. Therefore, we use imputation methods which evaluate quality of data. This study proposes a imputation method by decomposition and voting with ordinal data. First, data are sorted by each variable. After that, imputation methods are used by each decomposition level. And the last step is selection of values with voting. The proposed method is evaluated by accuracy and RMSE. In conclusion, missing values are related to each variable, median imputation method using decomposition and voting is powerful.

  • PDF

FREQUENCY HISTOGRAM MODEL FOR LINE TRANSECT DATA WITH AND WITHOUT THE SHOULDER CONDITION

  • EIDOUS OMAR
    • Journal of the Korean Statistical Society
    • /
    • v.34 no.1
    • /
    • pp.49-60
    • /
    • 2005
  • In this paper we introduce a nonparametric method for estimating the probability density function of detection distances in line transect sampling. The estimator is obtained using a frequency histogram density estimation method. The asymptotic properties of the proposed estimator are derived and compared with those of the kernel estimator under the assumption that the data collected satisfy the shoulder condition. We found that the asymptotic mean square error (AMSE) of the two estimators have about the same convergence rate. The formula for the optimal histogram bin width is derived which minimizes AMSE. Moreover, the performances of the corresponding k-nearest-neighbor estimators are studied through simulation techniques. In the absence of our knowledge whether the shoulder condition is valid or not a new semi-parametric model is suggested to fit the line transect data. The performances of the proposed two estimators are studied and compared with some existing nonparametric and semiparametric estimators using simulation techniques. The results demonstrate the superiority of the new estimators in most cases considered.