• Title/Summary/Keyword: K-NN(K-Nearest Neighbor)

Sleep Deprivation Attack Detection Based on Clustering in Wireless Sensor Network (무선 센서 네트워크에서 클러스터링 기반 Sleep Deprivation Attack 탐지 모델)

  • Kim, Suk-young;Moon, Jong-sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.1 / pp.83-97 / 2021
  • Wireless sensors that make up a wireless sensor network generally have extremely limited power and resources, so each sensor enters a sleep state at regular intervals to conserve power. A sleep deprivation attack is a deadly attack that drains power by preventing wireless sensors from entering the sleep state, and no clear countermeasure exists. In this paper, a sleep deprivation attack detection model based on a clustering-based binary search tree structure is proposed. The proposed model exploits a characteristic that distinguishes attack sensor nodes from normal sensor nodes, with both classes identified using machine learning; the candidate characteristics were evaluated using Long Short-Term Memory, Decision Tree, Support Vector Machine, and K-Nearest Neighbor models. The selected feature is used in the proposed algorithm to compute a detection value for each node, and the threshold for judging that value as an attack was learned by applying SVM. In experiments, the proposed detection model showed a detection rate of 94% when 35% of all sensor nodes were attack nodes, and an improvement of up to 26% in power retention.
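
Since the abstract only summarizes the detection pipeline, here is a minimal Python sketch of the final step under assumed data: a hypothetical per-node awake-time-ratio feature separates attack from normal nodes, and the detection threshold is read off a linear SVM boundary, loosely mirroring the abstract's use of SVM to learn the threshold. All values are synthetic, not the paper's.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical per-node feature: fraction of time the node stays awake
# (attack-affected nodes are kept out of the sleep state, so the ratio is high).
normal = rng.normal(loc=0.2, scale=0.05, size=200)
attack = rng.normal(loc=0.6, scale=0.10, size=100)
X = np.concatenate([normal, attack]).reshape(-1, 1)
y = np.concatenate([np.zeros(200), np.ones(100)])

# Candidate classifiers (the paper also evaluated LSTM and decision trees).
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
svm = SVC(kernel="linear").fit(X, y)

# For a 1-D linear SVM, the boundary w*x + b = 0 yields a scalar detection threshold.
threshold = -svm.intercept_[0] / svm.coef_[0, 0]
print(f"learned awake-time-ratio threshold: {threshold:.3f}")
print(f"kNN training accuracy: {knn.score(X, y):.3f}")
```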

Effect of Location Error on the Estimation of Aboveground Biomass Carbon Stock (지상부 바이오매스 탄소저장량의 추정에 위치 오차가 미치는 영향)

  • Kim, Sang-Pil;Heo, Joon;Jung, Jae-Hoon;Yoo, Su-Hong;Kim, Kyoung-Min
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.29 no.2 / pp.133-139 / 2011
  • Estimation of biomass carbon stock is an important research topic for quantifying the public benefit of forests. Previous studies of biomass carbon stock estimation have limitations that stem from the deterministic models they used; the most serious problem is that deterministic models provide no account of the effects of the relevant errors. In this study, the effects of location errors on the estimation of the biomass carbon stock of the Danyang area were analyzed using a Monte Carlo simulation, with the k-Nearest Neighbor (kNN) algorithm used for the basic estimation. Random and systematic errors were added to the locations of the sample plots, and their effect on estimation error was analyzed by tracking the resulting changes in RMSE. In the random error simulation, the mean RMSE of the estimation increased from 24.8 tonC/ha to 26 tonC/ha as location errors of 0.5 to 1 pixel were added; however, the mean RMSE converged once the added error exceeded 0.8 pixel, owing to the characteristics of the study site. In the systematic error simulation, no significant trend in RMSE was detected in the test data.
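
The abstract's Monte Carlo procedure can be illustrated with a short, purely synthetic Python sketch: a made-up raster and plot set replace the Danyang data, a kNN regressor plays the role of the basic estimator, and random location errors of 0 to 1 pixel are added before recomputing the RMSE. Numbers and variable names are illustrative only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
raster_size = 100                                       # synthetic 100 x 100 raster
spectral = rng.normal(size=(raster_size, raster_size))  # one assumed spectral band
carbon = 25 + 5 * spectral + rng.normal(scale=1.0, size=spectral.shape)  # tonC/ha

plots = rng.integers(5, raster_size - 5, size=(300, 2))  # "field plot" positions

def mean_rmse(location_error_px, n_runs=30):
    """Monte Carlo loop: perturb plot locations, refit kNN, average the RMSE."""
    rmses = []
    for _ in range(n_runs):
        shifted = plots + rng.normal(scale=location_error_px, size=plots.shape)
        shifted = np.clip(np.rint(shifted).astype(int), 0, raster_size - 1)
        X = spectral[shifted[:, 0], shifted[:, 1]].reshape(-1, 1)  # predictor at wrong cell
        y = carbon[plots[:, 0], plots[:, 1]]                       # field value at true cell
        pred = KNeighborsRegressor(n_neighbors=5).fit(X, y).predict(X)
        rmses.append(np.sqrt(mean_squared_error(y, pred)))
    return float(np.mean(rmses))

for err in (0.0, 0.5, 1.0):
    print(f"location error {err:.1f} px -> mean RMSE {mean_rmse(err):.2f} tonC/ha")
```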

A Study on Data Clustering of Light Buoy Using DBSCAN(I) (DBSCAN을 이용한 등부표 위치 데이터 Clustering 연구(I))

  • Gwang-Young Choi;So-Ra Kim;Sang-Won Park;Chae-Uk Song
    • Journal of Navigation and Port Research / v.47 no.4 / pp.231-238 / 2023
  • The position of a light buoy constantly shifts under external forces such as tides and wind, and it can be checked through the AIS (Automatic Identification System) or RTU (Remote Terminal Unit) for AtoN. An analysis of light buoy position data for the last five years (2017-2021) showed an average position error rate of 15.4%. Detecting position error data and obtaining refined position data is necessary to prevent navigation safety accidents and to support AtoN management. This study aimed to detect position error data and obtain refined position data by clustering, with DBSCAN, the position data obtained through AIS or RTU for AtoN. For this purpose, the 21 position data of the Gunsan Port No. 1 light buoy, which is equipped with an RTU and located in the western waters where position errors are most frequent, were clustered with DBSCAN using a Python library. The minPts required for DBSCAN clustering was set to the value commonly used for two-dimensional data, and epsilon was calculated by applying the k-NN (k-nearest neighbor) algorithm. As a result of DBSCAN clustering, position error data that did not satisfy minPts and epsilon were detected and refined position data were acquired. This study can serve as basic data for obtaining reliable position data of light buoys equipped with AIS or RTU for AtoN, and is expected to be of great help in preventing navigation safety accidents.
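
A minimal Python sketch of the workflow described above, using synthetic coordinates rather than the Gunsan Port No. 1 buoy data: minPts is set to the usual value for two-dimensional data, epsilon is taken from the k-NN distance curve (a percentile stands in for reading the elbow), and DBSCAN's noise label marks the candidate position errors.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
# Hypothetical buoy fixes: a tight cluster of normal positions plus a few outliers.
positions = np.vstack([
    rng.normal(loc=[126.50, 35.90], scale=0.0005, size=(200, 2)),
    rng.normal(loc=[126.50, 35.90], scale=0.0100, size=(5, 2)),   # position errors
])

min_pts = 4  # a value commonly used for two-dimensional data (2 * dimensions)

# k-distance curve: distance to each point's min_pts-th nearest neighbour.
distances, _ = NearestNeighbors(n_neighbors=min_pts).fit(positions).kneighbors(positions)
k_dist = np.sort(distances[:, -1])
eps = float(np.percentile(k_dist, 95))  # stand-in for reading the elbow of the curve

labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(positions)
print(f"epsilon = {eps:.6f}")
print("fixes flagged as position errors (noise):", int(np.sum(labels == -1)))
```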

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.241-254 / 2011
  • Financial time-series forecasting is one of the most important issues because it is essential to the risk management of financial institutions. Researchers have therefore tried to forecast financial time series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have become popular in this research area because they do not require huge amounts of training data and have a low risk of overfitting. However, a user must determine several design factors heuristically in order to use an SVM: the selection of an appropriate kernel function and its parameters, and proper feature subset selection, are the major design factors. Beyond these, proper selection of an instance subset may also improve the forecasting performance of an SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVMs, especially in the domain of stock market prediction. Instance selection tries to choose a proper instance subset from the original training data; it may be considered a method of knowledge refinement that maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize the instance selection process and the kernel parameters simultaneously; we call this model ISVM (SVM with Instance selection). Experiments on stock market data were carried out using ISVM, in which the GA searches for optimal or near-optimal kernel parameter values and relevant instances for the SVM, so the chromosome needs two sets of codes: one for the kernel parameters and one for instance selection. For the controlling parameters of the GA search, the population size is set at 50 organisms, the crossover rate at 0.7, and the mutation rate at 0.1, with 50 generations permitted as the stopping condition. The application data consist of technical indicators and the direction of change in the daily Korea stock price index (KOSPI), with a total of 2218 trading days. The whole data set is separated into training, test, and hold-out subsets of 1056, 581, and 581 samples, respectively. This study compares ISVM with several benchmark models, including logistic regression (Logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), a conventional SVM (SVM), and an SVM with kernel parameters optimized by the genetic algorithm (PSVM). The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% on the hold-out data; for ISVM, only 556 of the 1056 original training instances are used to produce this result. In addition, the two-sample test for proportions was used to examine whether ISVM significantly outperforms the other models: ISVM outperforms ANN and 1-NN at the 1% statistical significance level, and performs better than Logit, SVM, and PSVM at the 5% level.
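
The GA-plus-SVM idea can be sketched in a few dozen lines of Python. This is not the authors' ISVM code: the data, fitness function, and selection scheme are illustrative stand-ins (the crossover rate 0.7, mutation rate 0.1, and population of 50 follow the abstract, while the number of generations is shortened here). The chromosome concatenates kernel parameters with a binary instance-selection mask, as the abstract describes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Synthetic up/down labels stand in for the KOSPI technical-indicator data.
X, y = make_classification(n_samples=400, n_features=10, random_state=3)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=3)
N = len(X_tr)

POP, CX_RATE, MUT_RATE = 50, 0.7, 0.1   # population and rates as in the paper
GENS = 10                               # shortened here; the paper allows 50

def random_chromosome():
    # [log10 C, log10 gamma, instance-selection mask]
    return np.concatenate([rng.uniform(-2, 3, 2), rng.integers(0, 2, N)])

def fitness(ch):
    mask = ch[2:].astype(bool)
    if mask.sum() < 10:                 # degenerate selections score zero
        return 0.0
    model = SVC(C=10 ** ch[0], gamma=10 ** ch[1]).fit(X_tr[mask], y_tr[mask])
    return model.score(X_val, y_val)    # validation accuracy as fitness

population = [random_chromosome() for _ in range(POP)]
for _ in range(GENS):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: POP // 2]                         # truncation selection
    children = []
    while len(children) < POP:
        a, b = rng.choice(len(parents), size=2, replace=False)
        child = parents[a].copy()
        if rng.random() < CX_RATE:                       # uniform crossover
            swap = rng.random(child.shape) < 0.5
            child[swap] = parents[b][swap]
        flip = rng.random(N) < MUT_RATE                  # mutate mask bits
        child[2:][flip] = 1 - child[2:][flip]
        if rng.random() < MUT_RATE:                      # jitter kernel parameters
            child[:2] += rng.normal(scale=0.3, size=2)
        children.append(child)
    population = children

best = max(population, key=fitness)
print("instances kept:", int(best[2:].sum()), "of", N)
print(f"validation accuracy: {fitness(best):.3f}")
```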

Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels

  • Podolsky, Maxim D;Barchuk, Anton A;Kuznetcov, Vladimir I;Gusarova, Natalia F;Gaidukov, Vadim S;Tarakanov, Segrey A
    • Asian Pacific Journal of Cancer Prevention / v.17 no.2 / pp.835-838 / 2016
  • Background: Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of the total per year) and deaths (nearly one cancer death in five), because of its high case fatality. Errors in determining lung cancer type or malignant growth degrade treatment efficacy, because the anticancer strategy depends on tumor morphology. Materials and Methods: We attempted to evaluate the effectiveness of machine learning algorithms for lung cancer classification based on gene expression levels, processing four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples, and the task was to classify four cancer types and healthy tissue samples. With the University of Michigan data set of 96 samples, the task was a binary classification of adenocarcinoma and non-neoplastic tissue. The University of Toronto data set contains 39 samples, and the task was to detect recurrence, while with the Brigham and Women's Hospital data set of 181 samples it was a binary classification of malignant pleural mesothelioma and adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), naive Bayes classifiers assuming either a normal distribution of attributes or a distribution estimated through histograms, a support vector machine, and the C4.5 decision tree. Effectiveness was evaluated with the Matthews correlation coefficient. Results: The support vector machine showed the best results on the Dana-Farber Cancer Institute and Brigham and Women's Hospital data sets. All algorithms except the C4.5 decision tree showed maximum potential effectiveness on the University of Michigan data set, whereas the C4.5 decision tree showed the best results on the University of Toronto data set. Conclusions: Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on evaluation of gene expression levels.
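
A brief, hedged Python sketch of the comparison protocol: synthetic data stands in for the four gene-expression sets, scikit-learn's DecisionTreeClassifier stands in for C4.5, and every model is scored with the Matthews correlation coefficient as in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical expression matrix: few samples, many "genes", two classes.
X, y = make_classification(n_samples=180, n_features=500, n_informative=30,
                           random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=4)

models = {
    "1-NN": KNeighborsClassifier(n_neighbors=1),
    "5-NN": KNeighborsClassifier(n_neighbors=5),
    "10-NN": KNeighborsClassifier(n_neighbors=10),
    "naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=4),
}
for name, model in models.items():
    mcc = matthews_corrcoef(y_te, model.fit(X_tr, y_tr).predict(X_te))
    print(f"{name:>13}: MCC = {mcc:.3f}")
```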

Recognition of damage pattern and evolution in CFRP cable with a novel bonding anchorage by acoustic emission

  • Wu, Jingyu;Lan, Chengming;Xian, Guijun;Li, Hui
    • Smart Structures and Systems / v.21 no.4 / pp.421-433 / 2018
  • Carbon fiber reinforced polymer (CFRP) cable has good mechanical properties and corrosion resistance. However, anchoring CFRP cable remains a major issue because of the anisotropy of the CFRP material. In this article, a highly efficient bonding anchorage with a novel configuration is developed for CFRP cables. The acoustic emission (AE) technique is employed to evaluate the performance of the anchorage in a fatigue test and a post-fatigue ultimate bearing capacity test. The obtained AE signals are analyzed using a combination of unsupervised K-means clustering and supervised K-nearest neighbor classification (K-NN) to quantify the performance of the anchorage and the damage evolution. An AE feature vector (including both frequency and energy characteristics of the AE signal) is proposed for the clustering analysis, and under-sampling approaches are employed to reduce the influence of the imbalanced class distribution in the AE dataset and improve clustering quality. The results indicate that four classes exist in the AE dataset, corresponding to shear deformation of the potting compound, matrix cracking, fiber-matrix debonding, and fiber fracture in the CFRP bars. The AE intensity released by deformation of the potting compound is very slight throughout the loading process, and no obvious premature damage caused by the anchorage is observed in the CFRP bars at relatively low stress levels, indicating that the anchorage configuration in this study is reliable.
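
The two-stage signal analysis can be illustrated with a small Python sketch on synthetic AE feature vectors (peak frequency and log energy are assumed features): the dominant class of hits is randomly under-sampled, K-means finds four clusters, and a k-NN classifier trained on those cluster labels tags new hits. It is a sketch of the general approach, not the study's processing chain.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
# Hypothetical AE feature vectors: [peak frequency (kHz), log energy].
potting   = rng.normal([ 60, 1.0], [5, 0.2], size=(5000, 2))  # dominant, low energy
matrix    = rng.normal([150, 2.0], [10, 0.3], size=(400, 2))
debonding = rng.normal([250, 2.5], [15, 0.3], size=(300, 2))
fracture  = rng.normal([400, 3.5], [20, 0.4], size=(100, 2))

# Random under-sampling of the dominant class to soften the class imbalance.
keep = rng.choice(len(potting), size=500, replace=False)
features = np.vstack([potting[keep], matrix, debonding, fracture])

# Unsupervised step: four clusters, matching the four damage mechanisms.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=5).fit(features)

# Supervised step: k-NN learns the cluster labels so later hits can be classified.
knn = KNeighborsClassifier(n_neighbors=5).fit(features, kmeans.labels_)
new_hit = np.array([[390.0, 3.4]])          # e.g. a high-frequency, high-energy hit
print("predicted cluster for new AE hit:", knn.predict(new_hit)[0])
```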

Development of Interactive Content Services through an Intelligent IoT Mirror System (지능형 IoT 미러 시스템을 활용한 인터랙티브 콘텐츠 서비스 구현)

  • Jung, Wonseok;Seo, Jeongwook
    • Journal of Advanced Navigation Technology / v.22 no.5 / pp.472-477 / 2018
  • In this paper, we develop interactive content services for preventing depression of users through an intelligent Internet of Things (IoT) mirror system. For these services, an IoT mirror device measures attention and meditation data from an EEG headset device, and also measures facial expression data, classified into "sad", "angry", "disgust", "neutral", "happy", and "surprise" by a multi-layer perceptron algorithm, through a webcam. It then sends the measured data to an oneM2M-compliant IoT server. Based on the data collected in the IoT server, a machine learning model is built to classify three levels of depression (RED, YELLOW, and GREEN) assigned by a proposed merge labeling method. Experimental results verified that the k-nearest neighbor (k-NN) model could achieve about 93% accuracy. In addition, according to the classified level, a social network service agent sends a corresponding alert message to family, friends, and social workers. Thus, we were able to provide an interactive content service between users and caregivers.
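
A minimal sketch, assuming made-up sensor values rather than the oneM2M server data, of the k-NN step: EEG attention/meditation values plus an encoded facial expression are mapped to the three depression-risk levels (GREEN, YELLOW, RED). The feature encoding and training distributions here are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(6)
EXPRESSIONS = ["sad", "angry", "disgust", "neutral", "happy", "surprise"]

def sample(level, attention, meditation, expr_weights, n=100):
    """Generate n hypothetical measurements around the given means."""
    expr = rng.choice(len(EXPRESSIONS), size=n, p=expr_weights)
    att = rng.normal(attention, 8, n)
    med = rng.normal(meditation, 8, n)
    return np.column_stack([att, med, expr]), [level] * n

# Hypothetical training data for the three merged labels.
Xg, yg = sample("GREEN",  70, 65, [0.05, 0.05, 0.05, 0.25, 0.55, 0.05])
Xy, yy = sample("YELLOW", 50, 45, [0.20, 0.10, 0.10, 0.40, 0.15, 0.05])
Xr, yr = sample("RED",    30, 25, [0.45, 0.20, 0.15, 0.15, 0.03, 0.02])
X = np.vstack([Xg, Xy, Xr]); y = yg + yy + yr

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
# New measurement: low attention/meditation and a "sad" expression.
print(knn.predict([[28, 22, EXPRESSIONS.index("sad")]])[0])   # likely "RED"
```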

Overview of Research Trends in Estimation of Forest Carbon Stocks Based on Remote Sensing and GIS (원격탐사와 GIS 기반의 산림탄소저장량 추정에 관한 주요국 연구동향 개관)

  • Kim, Kyoung-Min;Lee, Jung-Bin;Kim, Eun-Sook;Park, Hyun-Ju;Roh, Young-Hee;Lee, Seung-Ho;Park, Key-Ho;Shin, Hyu-Seok
    • Journal of the Korean Association of Geographic Information Studies / v.14 no.3 / pp.236-256 / 2011
  • The change in forest carbon stocks due to land-use change is an important item of data required by the UNFCCC (United Nations Framework Convention on Climate Change). Spatially explicit estimation of forest carbon stocks based on IPCC GPG (Intergovernmental Panel on Climate Change Good Practice Guidance) Tier 3 provides high reliability, but current estimates aggregated from NFI data do not give detailed forest carbon stocks by polygon or cell. To improve such estimates, remote sensing and GIS have been used, especially in Europe and North America. We divided the research trends of the main countries into four categories: remote sensing, GIS, geostatistics, and environmental modeling that accounts for spatial heterogeneity. The easiest approach to apply is the GIS-based combination of NFI data with a forest type map. Considering the especially complicated forest structure of Korea, geostatistics is useful for estimating local variation in forest carbon. In addition, fine-scale imagery is suitable for verifying forest carbon stocks and determining CDM sites. Related domestic research is still at an initial stage, and forest carbon stocks are mainly estimated using the k-nearest neighbor (k-NN) method. To select a suitable method for Korean forests, the applicability of diverse spatial data and algorithms must be considered, and comparison between methods is required.

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

  • Choi, Hyun-Hwa;Lee, Kyu-Chul
    • The KIPS Transactions:PartD / v.19D no.2 / pp.151-160 / 2012
  • Most distributed high-dimensional indexing structures provide reasonable search performance, especially when the dataset is uniformly distributed. However, when the dataset is clustered or skewed, search performance gradually degrades compared with the uniform case. We propose a method for improving k-nearest neighbor search performance in the distributed vector approximation tree for strongly clustered or skewed datasets. The basic idea is to compute the volume of each leaf node of the top-tree of a distributed vector approximation tree and to assign a different number of bits to each node in order to preserve the discriminating power of the vector approximation; in other words, more bits are assigned to high-density clusters. We conducted experiments comparing the search performance of the distributed hybrid spill-tree and the distributed vector approximation tree on synthetic and real data sets. The experimental results show that our proposed scheme consistently yields significant performance improvements for the distributed vector approximation tree on strongly clustered or skewed datasets.
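
The bit-allocation idea can be illustrated with a rough Python sketch: K-means clusters stand in for the leaves of the top-tree, each leaf's bounding-box volume and point density are computed, and a hypothetical rule assigns more quantization bits per dimension to denser leaves. The allocation formula is an assumption for illustration, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Skewed 2-D data: one tight, dense cluster and one broad, sparse one.
data = np.vstack([rng.normal(0.0, 0.05, size=(5000, 2)),
                  rng.normal(3.0, 1.00, size=(500, 2))])

# The clusters stand in for the leaf nodes of the top-tree.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=7).fit(data)

for leaf in range(2):
    pts = data[kmeans.labels_ == leaf]
    extent = pts.max(axis=0) - pts.min(axis=0)
    volume = float(np.prod(extent))                 # bounding-box volume of the leaf
    density = len(pts) / volume
    # Hypothetical allocation rule: more bits for denser leaves, clamped to 4..8.
    bits = int(np.clip(4 + np.log2(max(density, 1.0)) / 4, 4, 8))
    cells = 2 ** bits                               # quantization cells per dimension
    edges = [np.linspace(pts[:, d].min(), pts[:, d].max(), cells + 1) for d in range(2)]
    approx = np.column_stack([np.digitize(pts[:, d], edges[d]) for d in range(2)])
    print(f"leaf {leaf}: {len(pts):5d} points, volume {volume:8.4f}, "
          f"{bits} bits/dim, {len(np.unique(approx, axis=0))} occupied cells")
```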

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most classification research has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, or the Bayesian classifier and the NNA (Neural Network Algorithm), which are known as statistics-based methods. However, there are space and time limitations when classifying the enormous number of web pages on today's internet. Moreover, most classification studies use a uni-gram feature representation, which poorly captures the real meaning of words. Korean web page classification faces additional problems because Korean words often carry multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in such environments (large data sets and polysemous words). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimensionality. This produces a new low-dimensional semantic space for representing vectors, which makes classification efficient and reveals the latent meaning of words and documents (or web pages). Although LSA is useful for classification, it has drawbacks: because SVD reduces the dimensionality of the matrix and creates the new semantic space, it considers only which dimensions represent vectors well, not which dimensions discriminate between them. This is one reason LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA that selects the dimensions that both discriminate and represent vectors well, minimizing these drawbacks and improving performance. The proposed method shows better and more stable performance than other LSA variants in low-dimensional space. In addition, we obtain further improvement in classification by creating and selecting features, removing stopwords, and weighting them with statistically derived values.
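
A hedged Python sketch of the general idea, not the paper's exact method: plain LSA via truncated SVD is followed by a supervised selection step that keeps only the SVD dimensions that discriminate the classes well (scored here with an ANOVA F-test), before a kNN classifier. English newsgroup text stands in for the Korean web pages.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# English newsgroup text stands in for the Korean web pages used in the paper.
data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.autos", "comp.graphics"],
                          remove=("headers", "footers", "quotes"))

pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english", max_features=20000),  # term-document matrix
    TruncatedSVD(n_components=300, random_state=8),             # plain LSA space
    SelectKBest(f_classif, k=50),                                # keep discriminative dims
    KNeighborsClassifier(n_neighbors=5),
)
scores = cross_val_score(pipeline, data.data, data.target, cv=3)
print("cross-validated accuracy:", scores.mean())
```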
