• Title/Summary/Keyword: k-Nearest neighbor

Analysis of GPU-based Parallel Shifted Sort Algorithm by comparing with General GPU-based Tree Traversal (일반적인 GPU 트리 탐색과의 비교실험을 통한 GPU 기반 병렬 Shifted Sort 알고리즘 분석)

  • Kim, Heesu;Park, Taejung
    • Journal of Digital Contents Society / v.18 no.6 / pp.1151-1156 / 2017
  • Traversing tree data structures on a GPU often yields lower performance than expected. In this paper, we analyze the cause of this lower-than-expected performance and show that warp divergence is caused by the branch instructions ("if ... else") that commonly appear in tree-traversal CUDA code. We also compare the parallel shifted sort algorithm, which reduces the number of warp divergences, with a kd-tree CUDA implementation, and show that the shifted sort algorithm runs faster thanks to fewer warp divergences. In our analysis, the shifted sort algorithm ran about 16 times faster than the kd-tree CUDA implementation for $2^{23}$ query points and $2^{23}$ data points in $R^3$ space. The performance gap tends to grow in proportion to the number of query points and data points.

A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method (Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상)

  • Lee, Yong-Hun;Lee, Sang-Bum
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.12 / pp.5089-5096 / 2010
  • Text categorization, which classifies documents according to given criteria, is an important feature of information retrieval systems. The general approach classifies the target documents by extracting important index words and assigning weights to them. The effectiveness of the weighting algorithm is therefore important, since the performance and correctness of text categorization depend on it. In this paper, we introduce an enhanced method for text categorization that improves the word weighting technique. Okapi BM25 has proved effective in several information retrieval engines. We applied Okapi BM25 to categorization and observed good performance. It is compared with other word weighting methods: TF-IDF, TF-ICF and TF-ISF. The experiments use the Reuters-21578 collection with SVM and KNN classifiers. The modified Okapi BM25 showed the best performance.
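As a rough illustration of the word weighting step, the sketch below computes textbook Okapi BM25 term weights for tokenized documents. It is not the modified variant the abstract reports, whose details are not given here. The function name `bm25_weights`, the defaults `k1=1.2` and `b=0.75`, the smoothed IDF form and the toy documents are all assumptions made for the example.

```python
import math
from collections import Counter

def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document (textbook BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    weighted = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = k1 * (1 - b + b * len(d) / avg_len)
        weights = {}
        for term, f in tf.items():
            # Smoothed IDF (assumed variant) that never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            weights[term] = idf * f * (k1 + 1) / (f + length_norm)
        weighted.append(weights)
    return weighted

# Toy corpus, invented for illustration only.
docs = [["oil", "price", "rise"], ["oil", "export"], ["wheat", "price", "price"]]
vectors = bm25_weights(docs)
```

The resulting per-document weight dictionaries could then be turned into feature vectors for a classifier such as SVM or KNN.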

CS-Tree : Cell-based Signature Index Structure for Similarity Search in High-Dimensional Data (CS-트리 : 고차원 데이터의 유사성 검색을 위한 셀-기반 시그니쳐 색인 구조)

  • Song, Gwang-Taek;Jang, Jae-U
    • The KIPS Transactions:PartD / v.8D no.4 / pp.305-312 / 2001
  • Recently, high-dimensional index structures have been required for similarity search in database applications such as multimedia databases and data warehousing. In this paper, we propose a new cell-based signature tree, called the CS-tree, which supports efficient storage and retrieval of high-dimensional feature vectors. The CS-tree partitions a high-dimensional feature space into a group of cells and represents each feature vector by the signature of its cell. Using cell signatures rather than the real feature vectors reduces the height of the CS-tree, which leads to efficient retrieval. In addition, we present a similarity search algorithm that efficiently prunes the search space based on cells. Finally, we compare the CS-tree with the X-tree, which is regarded as an efficient high-dimensional index structure, in terms of insertion time, retrieval time for a k-nearest neighbor query, and storage overhead. The experimental results show that the CS-tree outperforms the X-tree in retrieval performance.
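The idea of replacing a feature vector with a cell signature can be sketched as follows. This is only an illustration under stated assumptions, not the CS-tree's actual layout: it assumes features are normalized to [0, 1), that each dimension is split into `2**bits` equal cells, and that a signature is the tuple of per-dimension cell indices. The names `cell_signature` and `min_dist_to_cell` are hypothetical.

```python
import math

def cell_signature(vector, bits=2):
    """Map each coordinate in [0, 1) to the index of the cell it falls in."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in vector)

def min_dist_to_cell(query, signature, bits=2):
    """Lower bound on the Euclidean distance from query to any vector in the cell."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for q, c in zip(query, signature):
        lo, hi = c * width, (c + 1) * width
        # Distance from q to the interval [lo, hi]; zero if q lies inside it.
        gap = max(lo - q, 0.0, q - hi)
        total += gap * gap
    return math.sqrt(total)

sig = cell_signature((0.1, 0.6, 0.99))
bound = min_dist_to_cell((0.9, 0.9), cell_signature((0.1, 0.1)))
```

Because the bound never exceeds the true distance, a cell whose bound is larger than the current k-th nearest distance can be skipped without reading the vectors inside it.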

Noncontact Sleep Efficiency and Stage Estimation for Sleep Apnea Patients Using an Ultra-Wideband Radar (UWB 레이더를 사용한 수면무호흡환자에 대한 비접촉방식 수면효율 및 수면 단계 추정)

  • Park, Sang-Bae;Kim, Jung-Ha
    • Journal of the Korean Society of Industry Convergence / v.23 no.3 / pp.433-444 / 2020
  • This study proposes a method to improve sleep stage and sleep efficiency estimation for sleep apnea patients using a UWB (Ultra-Wideband) radar. Motion and respiration extracted from the radar signal were used. Disturbances of the respiratory signal caused by motion artifacts and by the irregular respiration patterns of sleep apnea patients are compensated for in the preprocessing stage. Preprocessing calculates the standard deviation of the respiration signal over a sliding window of 15 seconds to estimate compensation thresholds and applies them to the breathing signal. Sleep stage estimation is based on the difference in amplitude between two smoothed respiration signals, with smoothing windows of 10 seconds and 34 seconds, respectively. The estimated feature was processed by a k-nearest neighbor classifier and a feature filtering model to discriminate between periods of rapid eye movement (REM) and non-rapid eye movement (NREM) sleep. The feature filtering model reflects the characteristics of the REM sleep that occur continuously and the characteristics that mainly occur in the latter part of this stage. Sleep efficiency is estimated from the sleep onset time and motion events. Sleep onset time is derived from features estimated from gradient changes in the breathing signal. Motion events are detected from the estimated energy change in the UWB signal. Sleep efficiency and sleep stage accuracy were assessed against polysomnography. The average sleep efficiency and sleep stage accuracy were about 96.3% and 88.8%, respectively, in 18 sleep apnea subjects.
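The two-window smoothing feature can be sketched as below. The abstract gives only the window lengths (10 s and 34 s), so the rest is assumed: a centered moving average as the smoother, the pointwise absolute difference as the "difference in amplitude", and a caller-supplied sampling rate `fs`. The function names are hypothetical.

```python
def moving_average(signal, window):
    """Centered moving average; the window shrinks at the signal edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def smoothing_difference(respiration, fs, short_s=10, long_s=34):
    """Absolute difference between short- and long-window smoothed signals."""
    short = moving_average(respiration, int(short_s * fs))
    long_ = moving_average(respiration, int(long_s * fs))
    return [abs(s - l) for s, l in zip(short, long_)]

# Synthetic inputs sampled at an assumed 1 Hz: a flat signal and a step.
flat = smoothing_difference([1.0] * 100, fs=1)
step = smoothing_difference([0.0] * 50 + [1.0] * 50, fs=1)
```

A flat signal yields a zero feature, while a change in breathing amplitude produces a nonzero value around the change, which is the kind of input a k-nearest neighbor classifier could then label.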

Combining Multiple Classifiers for Automatic Classification of Email Documents (전자우편 문서의 자동분류를 위한 다중 분류기 결합)

  • Lee, Jae-Haeng;Cho, Sung-Bae
    • Journal of KIISE:Software and Applications / v.29 no.3 / pp.192-201 / 2002
  • Automated text classification is considered an important method for managing and processing the huge and continuously growing volume of digital documents. Recently, text classification has been addressed with machine learning techniques such as k-nearest neighbor, decision trees, support vector machines and neural networks. However, few studies address real problems; most are evaluated on well-organized text corpora and do not demonstrate practical usefulness. This paper proposes and analyzes text classification methods for a real application, the classification of email documents. First, we propose a method for combining multiple neural networks that improves performance through combination by maximum and by neural networks. Second, we present another strategy that combines multiple machine learning classifiers: voting, Borda count and neural networks improve the overall classification performance. Experimental results show the usefulness of the proposed methods in a real application domain, yielding precision rates above 90%.
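Two of the combination rules named above, voting and Borda count, can be sketched as follows. The sketch assumes each classifier outputs a full ranking of the candidate classes, best first. The category labels and the five rankings are invented for illustration and do not come from the paper.

```python
from collections import Counter, defaultdict

def majority_vote(rankings):
    """Pick the class that is ranked first by the most classifiers."""
    top_choices = Counter(r[0] for r in rankings)
    return top_choices.most_common(1)[0][0]

def borda_count(rankings):
    """Each classifier gives n-1 points to its first choice, n-2 to the next, and so on."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    return max(scores, key=scores.get)

# Hypothetical output of five classifiers for one email.
rankings = [
    ["work", "spam", "personal"],
    ["work", "spam", "personal"],
    ["work", "spam", "personal"],
    ["spam", "personal", "work"],
    ["spam", "personal", "work"],
]
vote_winner = majority_vote(rankings)
borda_winner = borda_count(rankings)
```

In this toy case the two rules disagree: voting looks only at first choices, whereas Borda count also rewards a class that is consistently ranked second.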

A Study of Outer-ring Galaxies within z<0.05 (적색편이 z<0.05의 외부고리 은하에 대한 연구)

  • Chang, Hunhwi;Sohn, Jungjoo;Ahn, Hongbae
    • Journal of the Korean earth science society / v.41 no.3 / pp.211-221 / 2020
  • This study classified outer-ring galaxies using 25,308 galaxies within z=0.05 from the SDSS DR7 that have Rpet > 6 arcsec and a minor-to-major axis ratio (b/a) < 0.6. By visual inspection of the color images of the 25,308 galaxies, we selected 531 galaxies with ring-like structures; these served as the primary sample, from which we selected 90 outer-ring galaxy candidates. The final sample of 69 outer-ring galaxies was selected by examining the photometric properties of the candidate galaxies, which were determined by surface photometry on their u, g, r, i, and z images. The frequency of outer-ring galaxies was found to be 0.3% of the local galaxies. We examined the environment of the outer-ring galaxies using two measures, namely the projected distance to the nearest-neighbor galaxy and the local background density. We did not observe any notable difference between the environments of outer-ring galaxies and those of other galaxies.

Development of Traffic Prediction and Optimal Traffic Control System for Highway based on Cell Transmission Model in Cloud Environment (Cell Transmission Model 시뮬레이션을 기반으로 한 클라우드 환경 아래에서의 고속도로 교통 예측 및 최적 제어 시스템 개발)

  • Tak, Se-hyun;Yeo, Hwasoo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.15 no.4 / pp.68-80 / 2016
  • This study proposes a traffic prediction and optimal traffic control system based on the cell transmission model and a genetic algorithm in a cloud environment. The proposed system consists of four parts. 1) The data preprocessing module detects corrupted and missing data points and imputes them. 2) The data-driven traffic prediction module predicts the future traffic state using the Multi-level K-Nearest Neighbor (MK-NN) algorithm with historical data stored in an SQL database. 3) The online traffic simulation module simulates the future traffic state under various situations, including accidents, road work, and extreme weather conditions, using the traffic data predicted by MK-NN. 4) The optimal road control module produces a control strategy for a large road network using the cell transmission model and the genetic algorithm. The results show that the proposed system can reduce Vehicle Hours Traveled by up to 60%.

Vantage Point Metric Index Improvement for Multimedia Databases

  • Chanpisey, Uch;Lee, Sang-Kon Samuel;Lee, In-Hong
    • Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.112-114 / 2011
  • In multimedia databases, indexing methods for multidimensional data spaces are used to provide fast access. However, these methods presuppose the Euclidean distance as the distance measure and therefore lack flexibility. Metric indexing methods, by contrast, require only that the distance satisfy the metric axioms, so they can be applied to distance measures other than the Euclidean distance and are highly flexible. This paper proposes an improvement to the VP-tree, one of the metric indexing methods. During a search, the VP-tree descends from the root node through the nodes that match the search range. At the leaf node it finally reaches, it computes the distance between the query and every object linked from that leaf and checks whether each object lies within the search range. Search therefore slows down as the number of distance calculations in a leaf node increases. We focus on the candidate selection method that applies the triangle inequality in a leaf node, and propose using the object nearest to the query as the datum point of the triangle inequality. This makes it possible to shrink the search range and reduce the number of distance calculations. In evaluation experiments using 10,000 images, the proposed method cut search time by 5% to 12% compared with the traditional method.
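Candidate selection with the triangle inequality in a leaf can be sketched as below. This shows only the generic pivot filter, assuming the distance from a datum point (pivot) to each leaf object is stored in advance; it does not reproduce the paper's rule for choosing the nearest-neighbor object as the datum point. The function name, the 2-D points and the use of Euclidean distance via `math.dist` are assumptions for the example.

```python
import math

def leaf_range_search(query, radius, pivot, objects, pivot_dists, dist=math.dist):
    """Range search in one leaf, skipping objects ruled out by the triangle inequality."""
    d_qp = dist(query, pivot)
    hits, computed = [], 1  # one distance already spent on the pivot
    for obj, d_po in zip(objects, pivot_dists):
        # |d(q,p) - d(p,o)| <= d(q,o), so a gap above radius excludes the object.
        if abs(d_qp - d_po) > radius:
            continue
        computed += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, computed

pivot = (0.0, 0.0)
objects = [(1.0, 0.0), (5.0, 0.0), (9.0, 0.0), (5.0, 1.0)]
pivot_dists = [math.dist(pivot, o) for o in objects]  # stored at build time
hits, computed = leaf_range_search((5.0, 0.5), 1.0, pivot, objects, pivot_dists)
```

Here two of the four objects are discarded without a distance calculation, so the leaf costs three distance computations (including the one to the pivot) instead of four.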

A Design and Implementation Red Tide Prediction Monitoring System using Case Based Reasoning (사례 기반 추론을 이용한 적조 예측 모니터링 시스템 구현 및 설계)

  • Song, Byoung-Ho;Jung, Min-A;Lee, Sung-Ro
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.12B / pp.1219-1226 / 2010
  • A system with an intelligent decision-making algorithm is needed because discrimination and prediction systems for red tide are underdeveloped and studies of red tide have focused on investigating its chemical and biological causes. In this paper, we designed an inference system using case-based reasoning and implemented a knowledge base of red tide cases. We used the K-Nearest Neighbor algorithm to recommend the most similar case and entered 375 cases into the red tide case base. Using 10-fold cross-validation to minimize the influence of the training data, we obtained an average accuracy of about 84.2% for red tide cases, with the best performance when the number of neighbors k is 5. We also implemented a red tide monitoring system using the inference results.
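Retrieving the most similar cases with k-nearest neighbors can be sketched as follows, using k = 5 as reported above. The two numeric features, the labels and every value in the toy case base are invented for illustration; the abstract does not describe the features of the real 375-case base. The function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

def knn_predict(case_base, query, k=5):
    """Return the majority label among the k stored cases closest to the query."""
    nearest = sorted(case_base, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], nearest

# Toy case base: (feature vector, outcome). Values are made up.
case_base = [
    ((24.0, 32.0), "red_tide"), ((25.0, 31.5), "red_tide"),
    ((23.5, 32.5), "red_tide"), ((26.0, 31.0), "red_tide"),
    ((15.0, 33.0), "normal"), ((14.0, 34.0), "normal"),
    ((16.0, 33.5), "normal"), ((13.0, 33.0), "normal"),
]
label, neighbors = knn_predict(case_base, (24.5, 32.0))
```

In practice the features would need to be scaled to comparable ranges before computing distances, and k would be chosen by cross-validation as the abstract describes.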

Designing Hypothesis of 2-Substituted-N-[4-(1-methyl-4,5-diphenyl-1H-imidazole-2-yl)phenyl] Acetamide Analogs as Anticancer Agents: QSAR Approach

  • Bedadurge, Ajay B.;Shaikh, Anwar R.
    • Journal of the Korean Chemical Society / v.57 no.6 / pp.744-754 / 2013
  • Quantitative structure-activity relationship (QSAR) analysis of recently synthesized imidazole-(benz)azole and imidazole-piperazine derivatives was performed for their anticancer activity against breast (MCF-7) cell lines. Statistically significant 2D-QSAR models ($r^2=0.8901$; $q^2=0.8130$; F test = 36.4635; $r^2$ se = 0.1696; $q^2$ se = 0.12212; pred_$r^2=0.4229$; pred_$r^2$ se = 0.4606 and $r^2=0.8763$; $q^2=0.7617$; F test = 31.8737; $r^2$ se = 0.1951; $q^2$ se = 0.2708; pred_$r^2=0.4386$; pred_$r^2$ se = 0.3950) were developed using the molecular design suite VLifeMDS 4.2. The study was performed on a data set of 18 compounds, which was divided into training and test sets by random selection and by manual selection. Multiple linear regression (MLR) with stepwise (SW) forward-backward variable selection was used to build the QSAR models. The results of the 2D-QSAR models were further compared with 3D-QSAR models generated by kNN-MFA (k-Nearest Neighbor Molecular Field Analysis) to investigate the substitutional requirements for favorable anticancer activity. The results may be useful for designing novel imidazole-(benz)azole and imidazole-piperazine derivatives against breast (MCF-7) cell lines prior to synthesis.