• Title/Summary/Keyword: KNN

Search Results: 261

K Nearest Neighbor Joins for Big Data Processing based on Spark (Spark 기반 빅데이터 처리를 위한 K-최근접 이웃 연결)

  • Ji, Jiaqi; Chung, Yeongjee
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.9 / pp.1731-1737 / 2017
  • K Nearest Neighbor Join (KNN Join) is a simple yet effective method in machine learning that has long been applied to small datasets. As data volumes grow, running it on a single machine becomes infeasible in practical applications because of memory and time restrictions. MapReduce, a popular batch-processing model that runs on clusters with many computers, is now widely used for large-scale data processing. Hadoop is a framework that implements MapReduce, but its performance can be further improved by the newer framework Spark. The present study provides a KNN Join implementation based on Spark; thanks to Spark's in-memory computation, it is faster and more effective than a Hadoop-based implementation. The experiments examine how different factors influence running time and demonstrate the robustness and efficiency of the approach.
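
The abstract describes a KNN join on Spark. As a rough illustration only (not the paper's optimized implementation), the following brute-force sketch computes a KNN join with PySpark; the toy 2-D points, column names, and k = 2 are assumptions:

```python
# Hypothetical brute-force KNN join sketch in PySpark (not the paper's method).
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("knn-join-sketch").getOrCreate()

# Toy 2-D points: R is the query set, S is the reference set.
R = spark.createDataFrame([(1, 0.0, 0.0), (2, 5.0, 5.0)], ["r_id", "rx", "ry"])
S = spark.createDataFrame([(10, 0.1, 0.2), (11, 4.8, 5.1), (12, 9.0, 9.0)],
                          ["s_id", "sx", "sy"])

k = 2
# Cross join, compute squared Euclidean distance, keep the k nearest S rows per R row.
pairs = (R.crossJoin(S)
           .withColumn("dist2", (F.col("rx") - F.col("sx")) ** 2
                               + (F.col("ry") - F.col("sy")) ** 2))
w = Window.partitionBy("r_id").orderBy("dist2")
knn = pairs.withColumn("rank", F.row_number().over(w)).filter(F.col("rank") <= k)
knn.show()
```

A full cross join is quadratic in the input sizes, so this sketch only conveys the join semantics; scalable implementations partition or index the data first.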

Effect of Bi4Zr3O12 on the properties of (KxNa1-x)NbO3 based ceramics

  • Mgbemere, Henry E.; Akano, Theddeus T.; Schneider, Gerold A.
    • Advances in Materials Research / v.5 no.2 / pp.93-105 / 2016
  • KNN-based ceramics modified with small amounts of $Bi_4Zr_3O_{12}$ (BiZ) have been synthesized using high-throughput experimentation (HTE). X-ray diffraction shows that for samples with base composition $(K_{0.5}Na_{0.5})NbO_3$ (KNN), the phase changes from orthorhombic to pseudo-cubic above 0.2 mol% BiZ addition; for samples with base composition $(K_{0.48}Na_{0.48}Li_{0.04})(Nb_{0.9}Ta_{0.1})O_3$ (KNNLT), the phase changes from a mixture of orthorhombic and tetragonal symmetry to pseudo-cubic above 0.4 mol%, while for samples with base composition $(K_{0.48}Na_{0.48}Li_{0.04})(Nb_{0.86}Ta_{0.1}Sb_{0.04})O_3$ (KNNLST), the phase is tetragonal below 0.3 mol% BiZ addition and transforms to pseudo-cubic with further dopant addition. The microstructures show that BiZ addition decreases the average grain size and increases the volume of pores at the grain boundaries. The dielectric constant of the KNN and KNNLT compositions increases slightly with BiZ addition, while that of KNNLST decreases gradually. The dielectric loss values are between 0.02 and 0.04 for the KNNLT and KNNLST compositions and ~0.05 for the KNN samples. The resistivity increases with BiZ addition, with values in the range of $10^{10}$ to $10^{12}\,{\Omega}cm$. The piezoelectric charge coefficient ($d^*_{33}$) is highest for the KNNLST samples and decreases gradually from ~400 pm/V to ~100 pm/V with BiZ addition.

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul; Jo, Kyuri; Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • This article predicts natural gas (NG) leakage levels through feature selection based on factor analysis (FA) of an integrated dataset that combines Korean Meteorological Agency data with natural gas leakage data, so that complex factors are taken into account. The method consists of three modules. First, missing values in the integrated dataset are filled by linear interpolation, and essential features are selected using FA with OrdinalEncoder (OE)-based normalization. The dataset is then labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean squared error (MSE). The test results indicate that the OrdinalEncoder-factor analysis (OE-F)-based classification improves performance; in particular, OE-F-based KNN (OE-F-KNN) performed best, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
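
A hedged sketch of the pipeline described in this abstract, using scikit-learn and pandas; the column names, synthetic data, and hyperparameters are illustrative assumptions, not the paper's values:

```python
# Minimal OE-F-KNN-style pipeline sketch; all data and parameters are made up.
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 6)),
                  columns=["temp", "humidity", "pressure", "wind", "ch4", "co"])
df.iloc[5::17, 2] = np.nan                        # inject some missing values

df = df.interpolate(method="linear")              # 1) fill gaps by linear interpolation
X = OrdinalEncoder().fit_transform(df.round(1))   # 2) ordinal-encode (here: binned values)
X = FactorAnalysis(n_components=3, random_state=0).fit_transform(X)  # FA feature reduction
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)   # 3) K-means labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)             # 4) KNN classifier
print("accuracy:", knn.score(X_te, y_te))
```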

Evaluation of Classification Models of Mild Left Ventricular Diastolic Dysfunction by Tei Index (Tei Index를 이용한 경도의 좌심실 이완 기능 장애 분류 모델 평가)

  • Kim, Su-Min; Ye, Soo-Young
    • Journal of the Korean Society of Radiology / v.17 no.5 / pp.761-766 / 2023
  • In this paper, the Tei index (TI) was measured to classify the presence or absence of mild left ventricular diastolic dysfunction. Of the 306 data points in total, 206 were used as training data and 100 as test data, and SVM and KNN were used as the machine learning models for classification. The results confirmed that SVM showed relatively higher accuracy than KNN and was more useful for diagnosing the presence of left ventricular diastolic dysfunction. In future research, classification performance is expected to improve further by adding various indicators of cardiac function beyond TI and by securing more data. The results are also expected to serve as basic data for predicting and classifying other diseases and for addressing the shortage of medical personnel relative to the growing number of examinations.
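
A minimal illustration of the SVM-versus-KNN comparison described above, on synthetic one-dimensional data standing in for the Tei-index measurements; only the 206/100 train/test split mirrors the abstract, everything else is an assumption:

```python
# Compare SVM and KNN on a toy Tei-index-like feature; data are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
# One feature per subject; label 1 stands for mild diastolic dysfunction.
X = rng.normal(loc=0.45, scale=0.1, size=(306, 1))
y = (X[:, 0] > 0.50).astype(int)

X_train, y_train = X[:206], y[:206]   # 206 training samples, as in the abstract
X_test, y_test = X[206:], y[206:]     # 100 test samples

for name, model in [("SVM", SVC(kernel="rbf")), ("KNN", KNeighborsClassifier(5))]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))
```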

Dynamic threshold location algorithm based on fingerprinting method

  • Ding, Xuxing; Wang, Bingbing; Wang, Zaijian
    • ETRI Journal / v.40 no.4 / pp.531-536 / 2018
  • The weighted K-nearest neighbor (WKNN) algorithm can degrade positioning accuracy because it estimates the position from a fixed number of neighbors. In this paper, we propose a dynamic threshold location algorithm (DH-KNN) to improve positioning accuracy. The proposed algorithm uses a dynamic threshold to determine the number of neighbors and to filter out singular reference points (RPs). We compare its performance with the WKNN and enhanced K-nearest neighbor (EKNN) algorithms in test spaces of 20 m × 20 m, 30 m × 30 m, 40 m × 40 m, and 50 m × 50 m. Simulation results show that the maximum position accuracy of DH-KNN improves by 31.1% and its maximum position error decreases by 23.5%. The results demonstrate that the proposed method achieves better performance than other well-known algorithms.
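
A hedged sketch of fingerprint positioning with plain WKNN and a dynamic-threshold variant that drops distant reference points; the exact DH-KNN rule in the paper may differ, and the toy radio map below is an assumption:

```python
# WKNN vs. a dynamic-threshold variant on a made-up RSS fingerprint map.
import numpy as np

def wknn(fingerprints, positions, rss, k=4):
    """Weighted KNN: average the k closest reference points, weighted by 1/distance."""
    d = np.linalg.norm(fingerprints - rss, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-9)
    return (w[:, None] * positions[idx]).sum(axis=0) / w.sum()

def dynamic_threshold_knn(fingerprints, positions, rss, k_max=6):
    """Keep only neighbors whose distance is below the mean distance of the k_max closest."""
    d = np.linalg.norm(fingerprints - rss, axis=1)
    idx = np.argsort(d)[:k_max]
    keep = idx[d[idx] <= d[idx].mean()]          # dynamic cutoff filters singular RPs
    w = 1.0 / (d[keep] + 1e-9)
    return (w[:, None] * positions[keep]).sum(axis=0) / w.sum()

# Toy radio map: 25 reference points on a 20 m x 20 m grid, 3 access points.
rng = np.random.default_rng(0)
positions = np.array([(x, y) for x in range(0, 25, 5) for y in range(0, 25, 5)], float)
aps = np.array([[0.0, 0.0], [20.0, 0.0], [10.0, 20.0]])
fingerprints = -40 - 2 * np.linalg.norm(positions[:, None] - aps, axis=2)
rss = fingerprints[7] + rng.normal(0, 1, 3)      # noisy measurement near RP 7
print(wknn(fingerprints, positions, rss), dynamic_threshold_knn(fingerprints, positions, rss))
```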

A Study on the Voltage/Var Control of Distribution System Using Kohonen Neural Network (코호넨 신경회로망을 이용한 배전시스템의 전압/무효전력 제어에 관한 연구)

  • Kim, Gwang-Won; Kim, Jong-Il
    • Proceedings of the KIEE Conference / 1998.11a / pp.329-331 / 1998
  • This paper presents a modified Learning Vector Quantization rule for controlling shunt capacitor banks and feeder voltage regulators in electric distribution systems with a Kohonen Neural Network (KNN). The objective of the KNN is the on-line determination of the optimal states of the shunt capacitor banks and feeder voltage regulators that minimize the $I^{2}R$ losses of the distribution system while keeping all bus voltages within limits. The KNN is tested on a distribution system with 30 buses, 5 on-off switchable capacitor banks, and a nine-tap line voltage regulator.
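
KNN in this entry abbreviates Kohonen Neural Network rather than K-nearest neighbor. For reference only, a generic LVQ1 update (the rule the paper modifies for capacitor and regulator control) looks roughly like the following sketch on made-up data:

```python
# Generic LVQ1 training sketch; not the paper's modified rule or its power-system data.
import numpy as np

def lvq1_train(X, y, n_prototypes_per_class=2, lr=0.1, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    protos = np.vstack([X[y == c][rng.choice((y == c).sum(), n_prototypes_per_class,
                                             replace=False)] for c in classes])
    proto_labels = np.repeat(classes, n_prototypes_per_class)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            j = np.argmin(np.linalg.norm(protos - xi, axis=1))   # winning prototype
            sign = 1.0 if proto_labels[j] == yi else -1.0        # attract or repel it
            protos[j] += sign * lr * (xi - protos[j])
    return protos, proto_labels

# Toy use: two Gaussian blobs standing in for distinct system states.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
protos, labels = lvq1_train(X, y)
print(protos, labels)
```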

Semantic Word Categorization using Feature Similarity based K Nearest Neighbor

  • Jo, Taeho
    • Journal of Multimedia Information System / v.5 no.2 / pp.67-78 / 2018
  • This article proposes a modified KNN (K Nearest Neighbor) algorithm that considers feature similarity and applies it to word categorization. The texts given as features for encoding words into numerical vectors are semantically related entities rather than independent ones, and a synergy effect between word categorization and text categorization is expected by combining the two. In this research, we define a similarity metric between two vectors that includes the feature similarity, modify the KNN algorithm by replacing the existing similarity metric with the proposed one, and apply it to word categorization. The proposed KNN is empirically validated as the better approach for categorizing words in news articles and opinions. The significance of this research is the improvement of classification performance by utilizing feature similarities.
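
A rough sketch of a KNN whose similarity metric also accounts for feature-feature similarity, in the spirit of this abstract (sim(x, z) = x·S·z); the paper's actual metric, the matrix S, and the toy data here are assumptions:

```python
# KNN with a feature-feature similarity matrix folded into the vector similarity.
import numpy as np
from collections import Counter

def feature_similarity_knn(X_train, y_train, x, S, k=3):
    sims = X_train @ S @ x                     # similarity of x to each training vector
    idx = np.argsort(sims)[-k:]                # k most similar neighbors
    return Counter(y_train[idx]).most_common(1)[0][0]

rng = np.random.default_rng(0)
n_features = 5
# Feature-feature similarity: identity plus small symmetric off-diagonal relatedness.
S = np.eye(n_features) + 0.2 * rng.random((n_features, n_features))
S = (S + S.T) / 2
X_train = rng.random((40, n_features))
y_train = (X_train[:, 0] > 0.5).astype(int)
print(feature_similarity_knn(X_train, y_train, rng.random(n_features), S))
```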

Inverted Index based Modified Version of KNN for Text Categorization

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.1 / pp.17-26 / 2008
  • This research proposes a new strategy in which documents are encoded into string vectors, together with a modified version of KNN that can operate on string vectors, for text categorization. Traditionally, when KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding can be difficult depending on the application area; for example, in text categorization, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the supervised learning algorithm so that it is adaptable to string vectors for text categorization.
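
An illustrative sketch (not the paper's exact scheme) of the two ideas in this abstract: documents represented as string vectors of their top terms, and an inverted index used to find candidate neighbors whose shared terms define the KNN similarity:

```python
# String-vector KNN with an inverted index over terms; toy corpus and labels.
from collections import Counter, defaultdict

def string_vector(text, size=5):
    """Top-'size' most frequent words of a document (its string vector)."""
    return [w for w, _ in Counter(text.lower().split()).most_common(size)]

train = [("spark runs knn join on big data clusters", "computing"),
         ("hadoop mapreduce processes big data", "computing"),
         ("piezoelectric ceramics show high dielectric constant", "materials"),
         ("knn ceramics doped with bismuth zirconate", "materials")]
vectors = [(string_vector(t), label) for t, label in train]

index = defaultdict(set)                       # inverted index: term -> doc ids
for i, (vec, _) in enumerate(vectors):
    for term in vec:
        index[term].add(i)

def classify(text, k=3):
    q = string_vector(text)
    candidates = set().union(*(index[t] for t in q if t in index)) or set(range(len(vectors)))
    scored = sorted(candidates, key=lambda i: -len(set(q) & set(vectors[i][0])))[:k]
    return Counter(vectors[i][1] for i in scored).most_common(1)[0][0]

print(classify("big data knn join with spark"))
```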

Speaker and Context Independent Emotion Recognition System using Gaussian Mixture Model (GMM을 이용한 화자 및 문장 독립적 감정 인식 시스템 구현)

  • 강면구; 김원구
    • Proceedings of the IEEK Conference / 2003.07e / pp.2463-2466 / 2003
  • This paper studies pattern recognition algorithms and feature parameters for emotion recognition. The KNN algorithm was used as the pattern-matching technique for comparison, and VQ and GMM were used for speaker- and context-independent recognition. The speech parameters used as features are pitch, energy, MFCCs, and their first and second derivatives. Experimental results showed that an emotion recognizer using MFCCs and their derivatives as features performed better than one using the pitch and energy parameters. Among the pattern recognition algorithms, the GMM-based emotion recognizer was superior to the KNN- and VQ-based recognizers.
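
A minimal sketch of GMM-based classification as described above: one GaussianMixture per emotion class, scored by log-likelihood. The random features stand in for MFCC-and-derivative vectors, and the class names and sizes are assumptions:

```python
# One GMM per emotion class; classify an utterance by the highest average log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
emotions = ["neutral", "happy", "angry"]
train = {e: rng.normal(loc=i, scale=1.0, size=(200, 13)) for i, e in enumerate(emotions)}

models = {e: GaussianMixture(n_components=4, random_state=0).fit(X)
          for e, X in train.items()}

def recognize(features):
    """Pick the emotion whose GMM assigns the frames the highest average log-likelihood."""
    return max(models, key=lambda e: models[e].score(features))

test_utterance = rng.normal(loc=2, scale=1.0, size=(50, 13))   # should look "angry"
print(recognize(test_utterance))
```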

Short- and long-term outcomes of very low birth weight infants in Korea: Korean Neonatal Network update in 2019

  • Lee, Jang Hoon; Youn, YoungAh; Chang, Yun Sil
    • Clinical and Experimental Pediatrics / v.63 no.8 / pp.284-290 / 2020
  • Korea currently has the world's lowest birth rate but a rapidly increasing number of preterm infants. The Korean Neonatal Network (KNN), launched by the Korean Society of Neonatology with the support of the Korea Centers for Disease Control, has collected population-based data on very low birth weight infants (VLBWIs) born in Korea since 2013. In terms of the short-term outcomes of VLBWIs born from 2013 to 2016 and registered in the KNN, the survival rate of all VLBWIs was 86%. Respiratory distress syndrome and bronchopulmonary dysplasia were observed in 78% and 30% of all VLBWIs, respectively. Necrotizing enterocolitis occurred in 7%, while 8% of the VLBWIs needed therapy for retinopathy of prematurity in the neonatal intensive care unit (NICU). Sepsis occurred in 21% during their NICU stay. Intraventricular hemorrhage (grade ≥III) was diagnosed in 10%. In terms of the long-term outcomes of VLBWIs born from 2013 to 2014 and registered in the KNN, the post-discharge mortality rate was approximately 1.2%-1.5%, mainly owing to underlying illness. Nearly half of the VLBWIs were readmitted to hospital at least once in their first 1-2 years of life, mostly because of respiratory diseases. The overall prevalence of cerebral palsy was 6.2%-6.6% in Korea. Bilateral blindness was reported in 0.2%-0.3% of VLBWIs, while bilateral hearing loss was found in 0.8%-1.9%. Since its establishment, the KNN has published annual reports and papers that facilitate the improvement of VLBWI outcomes and the formulation of essential healthcare policies in Korea.