• Title/Summary/Keyword: Improved k-Nearest Neighbor

Search Result 47, Processing Time 0.029 seconds

DGR-Tree : An Efficient Index Structure for POI Search in Ubiquitous Location Based Services (DGR-Tree : u-LBS에서 POI의 검색을 위한 효율적인 인덱스 구조)

  • Lee, Deuk-Woo;Kang, Hong-Koo;Lee, Ki-Young;Han, Ki-Joon
    • Journal of Korea Spatial Information System Society
    • /
    • v.11 no.3
    • /
    • pp.55-62
    • /
    • 2009
  • Location based Services in the ubiquitous computing environment, namely u-LBS, use very large and skewed spatial objects that are closely related to locational information. It is especially essential to achieve fast search, which is looking for POI(Point of Interest) related to the location of users. This paper examines how to search large and skewed POI efficiently in the u-LBS environment. We propose the Dynamic-level Grid based R-Tree(DGR-Tree), which is an index for point data that can reduce the cost of stationary POI search. DGR-Tree uses both R-Tree as a primary index and Dynamic-level Grid as a secondary index. DGR-Tree is optimized to be suitable for point data and solves the overlapping problem among leaf nodes. Dynamic-level Grid of DGR-Tree is created dynamically according to the density of POI. Each cell in Dynamic-level Grid has a leaf node pointer for direct access with the leaf node of the primary index. Therefore, the index access performance is improved greatly by accessing the leaf node directly through Dynamic-level Grid. We also propose a K-Nearest Neighbor(KNN) algorithm for DGR-Tree, which utilizes Dynamic-level Grid for fast access to candidate cells. The KNN algorithm for DGR-Tree provides the mechanism, which can access directly to cells enclosing given query point and adjacent cells without tree traversal. The KNN algorithm minimizes sorting cost about candidate lists with minimum distance and provides NEB(Non Extensible Boundary), which need not consider the extension of candidate nodes for KNN search.

  • PDF

A Strategy for Neighborhood Selection in Collaborative Filtering-based Recommender Systems (협력 필터링 기반의 추천 시스템을 위한 이웃 선정 전략)

  • Lee, Soojung
    • Journal of KIISE
    • /
    • v.42 no.11
    • /
    • pp.1380-1385
    • /
    • 2015
  • Collaborative filtering is one of the most successfully used methods for recommender systems and has been utilized in various areas such as books and music. The key point of this method is selecting the most proper recommenders, for which various similarity measures have been studied. To improve recommendation performance, this study analyzes problems of existing recommender selection methods based on similarity and presents a method of dynamically determining recommenders based on the rate of co-rated items as well as similarity. Examination of performance with varying thresholds through experiments revealed that the proposed method yielded greatly improved results in both prediction and recommendation qualities, and that in particular, this method showed performance improvements with only a few recommenders satisfying the given thresholds.

Hyperparameter Tuning Based Machine Learning classifier for Breast Cancer Prediction

  • Md. Mijanur Rahman;Asikur Rahman Raju;Sumiea Akter Pinky;Swarnali Akter
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.196-202
    • /
    • 2024
  • Currently, the second most devastating form of cancer in people, particularly in women, is Breast Cancer (BC). In the healthcare industry, Machine Learning (ML) is commonly employed in fatal disease prediction. Due to breast cancer's favorable prognosis at an early stage, a model is created to utilize the Dataset on Wisconsin Diagnostic Breast Cancer (WDBC). Conversely, this model's overarching axiom is to compare the effectiveness of five well-known ML classifiers, including Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (KNN), and Naive Bayes (NB) with the conventional method. To counterbalance the effect with conventional methods, the overarching tactic we utilized was hyperparameter tuning utilizing the grid search method, which improved accuracy, secondary precision, third recall, and finally the F1 score. In this study hyperparameter tuning model, the rate of accuracy increased from 94.15% to 98.83% whereas the accuracy of the conventional method increased from 93.56% to 97.08%. According to this investigation, KNN outperformed all other classifiers in terms of accuracy, achieving a score of 98.83%. In conclusion, our study shows that KNN works well with the hyper-tuning method. These analyses show that this study prediction approach is useful in prognosticating women with breast cancer with a viable performance and more accurate findings when compared to the conventional approach.

Fast Search with Data-Oriented Multi-Index Hashing for Multimedia Data

  • Ma, Yanping;Zou, Hailin;Xie, Hongtao;Su, Qingtang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.7
    • /
    • pp.2599-2613
    • /
    • 2015
  • Multi-index hashing (MIH) is the state-of-the-art method for indexing binary codes, as it di-vides long codes into substrings and builds multiple hash tables. However, MIH is based on the dataset codes uniform distribution assumption, and will lose efficiency in dealing with non-uniformly distributed codes. Besides, there are lots of results sharing the same Hamming distance to a query, which makes the distance measure ambiguous. In this paper, we propose a data-oriented multi-index hashing method (DOMIH). We first compute the covariance ma-trix of bits and learn adaptive projection vector for each binary substring. Instead of using substrings as direct indices into hash tables, we project them with corresponding projection vectors to generate new indices. With adaptive projection, the indices in each hash table are near uniformly distributed. Then with covariance matrix, we propose a ranking method for the binary codes. By assigning different bit-level weights to different bits, the returned bina-ry codes are ranked at a finer-grained binary code level. Experiments conducted on reference large scale datasets show that compared to MIH the time performance of DOMIH can be improved by 36.9%-87.4%, and the search accuracy can be improved by 22.2%. To pinpoint the potential of DOMIH, we further use near-duplicate image retrieval as examples to show the applications and the good performance of our method.

Automatic Document Classification Based on k-NN Classifier and Object-Based Thesaurus (k-NN 분류 알고리즘과 객체 기반 시소러스를 이용한 자동 문서 분류)

  • Bang Sun-Iee;Yang Jae-Dong;Yang Hyung-Jeong
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.9
    • /
    • pp.1204-1217
    • /
    • 2004
  • Numerous statistical and machine learning techniques have been studied for automatic text classification. However, because they train the classifiers using only feature vectors of documents, ambiguity between two possible categories significantly degrades precision of classification. To remedy the drawback, we propose a new method which incorporates relationship information of categories into extant classifiers. In this paper, we first perform the document classification using the k-NN classifier which is generally known for relatively good performance in spite of its simplicity. We employ the relationship information from an object-based thesaurus to reduce the ambiguity. By referencing various relationships in the thesaurus corresponding to the structured categories, the precision of k-NN classification is drastically improved, removing the ambiguity. Experiment result shows that this method achieves the precision up to 13.86% over the k-NN classification, preserving its recall.

The Relationship between Smartphone Use and Oral Health in Adolescents

  • Ahn, Eunsuk;Han, Ji-Hyoung
    • Journal of dental hygiene science
    • /
    • v.20 no.1
    • /
    • pp.44-50
    • /
    • 2020
  • Background: Smartphones are a modern necessity. While they are convenient to use, smartphones also have side effects such as addiction. This study assessed the relationship between smartphone use, a part of everyday life in modern society, and oral health. Methods: An analysis was conducted using 2017 Korea Youth Risk Behavior Web-based Survey data. The propensity score estimation algorithm used logistic regression and 1:1 matching algorithm using nearest-neighbor matching. After matching, a total of 15,032 participants were classified into two groups containing 7,516 teenagers each who did and did not use smartphones, respectively. Results: Comparison of oral health behaviors according to smartphone use revealed a statistically significant difference in the frequency of tooth brushing per day, use of oral hygiene products, intake of foods harmful to oral health, and experience of oral health education (p<0.05). The factors affecting oral pain experience of adolescents were examined. Compared to male participants, female participants had an odds ratio of 1.627 for oral pain (p<0.05). According to the household income level, compared to the group with higher income, the group with lower income showed higher oral pain experience (p<0.05). Oral pain experience was 1.601 times more frequent among teenagers using smartphones (p<0.05). Conclusion: The results of this study indicated that use of smartphones by adolescents affected their oral health. These findings indicate the need for improved oral health management through the use of effective school oral health programs and individual counseling by oral health professionals, promotion of information dissemination through public media, and development of prevention strategies.

Exploring the Performance of Synthetic Minority Over-sampling Technique (SMOTE) to Predict Good Borrowers in P2P Lending (P2P 대부 우수 대출자 예측을 위한 합성 소수집단 오버샘플링 기법 성과에 관한 탐색적 연구)

  • Costello, Francis Joseph;Lee, Kun Chang
    • Journal of Digital Convergence
    • /
    • v.17 no.9
    • /
    • pp.71-78
    • /
    • 2019
  • This study aims to identify good borrowers within the context of P2P lending. P2P lending is a growing platform that allows individuals to lend and borrow money from each other. Inherent in any loans is credit risk of borrowers and needs to be considered before any lending. Specifically in the context of P2P lending, traditional models fall short and thus this study aimed to rectify this as well as explore the problem of class imbalances seen within credit risk data sets. This study implemented an over-sampling technique known as Synthetic Minority Over-sampling Technique (SMOTE). To test our approach, we implemented five benchmarking classifiers such as support vector machines, logistic regression, k-nearest neighbor, random forest, and deep neural network. The data sample used was retrieved from the publicly available LendingClub dataset. The proposed SMOTE revealed significantly improved results in comparison with the benchmarking classifiers. These results should help actors engaged within P2P lending to make better informed decisions when selecting potential borrowers eliminating the higher risks present in P2P lending.

Evaluation of Classification Models of Mild Left Ventricular Diastolic Dysfunction by Tei Index (Tei Index를 이용한 경도의 좌심실 이완 기능 장애 분류 모델 평가)

  • Su-Min Kim;Soo-Young Ye
    • Journal of the Korean Society of Radiology
    • /
    • v.17 no.5
    • /
    • pp.761-766
    • /
    • 2023
  • In this paper, TI was measured to classify the presence or absence of mild left ventricular diastolic dysfunction. Of the total 306 data, 206 were used as training data and 100 were used as test data, and the machine learning models used for classification used SVM and KNN. As a result, it was confirmed that SVM showed relatively higher accuracy than KNN and was more useful in diagnosing the presence of left ventricular diastolic dysfunction. In future research, it is expected that classification performance can be further improved by adding various indicators that evaluate not only TI but also cardiac function and securing more data. Furthermore, it is expected to be used as basic data to predict and classify other diseases and solve the problem of insufficient medical manpower compared to the increasing number of tests.

Real Time Face Detection and Recognition using Rectangular Feature based Classifier and Class Matching Algorithm (사각형 특징 기반 분류기와 클래스 매칭을 이용한 실시간 얼굴 검출 및 인식)

  • Kim, Jong-Min;Kang, Myung-A
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.1
    • /
    • pp.19-26
    • /
    • 2010
  • This paper proposes a classifier based on rectangular feature to detect face in real time. The goal is to realize a strong detection algorithm which satisfies both efficiency in calculation and detection performance. The proposed algorithm consists of the following three stages: Feature creation, classifier study and real time facial domain detection. Feature creation organizes a feature set with the proposed five rectangular features and calculates the feature values efficiently by using SAT (Summed-Area Tables). Classifier learning creates classifiers hierarchically by using the AdaBoost algorithm. In addition, it gets excellent detection performance by applying important face patterns repeatedly at the next level. Real time facial domain detection finds facial domains rapidly and efficiently through the classifier based on the rectangular feature that was created. Also, the recognition rate was improved by using the domain which detected a face domain as the input image and by using PCA and KNN algorithms and a Class to Class rather than the existing Point to Point technique.

Improved LTE Fingerprint Positioning Through Clustering-based Repeater Detection and Outlier Removal

  • Kwon, Jae Uk;Chae, Myeong Seok;Cho, Seong Yun
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.11 no.4
    • /
    • pp.369-379
    • /
    • 2022
  • In weighted k-nearest neighbor (WkNN)-based Fingerprinting positioning step, a process of comparing the requested positioning signal with signal information for each reference point stored in the fingerprint DB is performed. At this time, the higher the number of matched base station identifiers, the higher the possibility that the terminal exists in the corresponding location, and in fact, an additional weight is added to the location in proportion to the number of matching base stations. On the other hand, if the matching number of base stations is small, the selected candidate reference point has high dependence on the similarity value of the signal. But one problem arises here. The positioning signal can be compared with the repeater signal in the signal information stored on the DB, and the corresponding reference point can be selected as a candidate location. The selected reference point is likely to be an outlier, and if a certain weight is applied to the corresponding location, the error of the estimated location information increases. In order to solve this problem, this paper proposes a WkNN technique including an outlier removal function. To this end, it is first determined whether the repeater signal is included in the DB information of the matched base station. If the reference point for the repeater signal is selected as the candidate position, the reference position corresponding to the outlier is removed based on the clustering technique. The performance of the proposed technique is verified through data acquired in Seocho 1 and 2 dongs in Seoul.