• Title/Abstract/Keywords: k nearest neighbor approach

96 search results (processing time: 0.025 s)

Cable anomaly detection driven by spatiotemporal correlation dissimilarity measurements of bridge grouped cable forces

  • Dong-Hui, Yang;Hai-Lun, Gu;Ting-Hua, Yi;Zhan-Jun, Wu
    • Smart Structures and Systems
    • /
    • Vol. 30, No. 6
    • /
    • pp.661-671
    • /
    • 2022
  • Stay cables are the key components for transmitting loads in cable-stayed bridges. Therefore, it is very important to evaluate the cable force condition to ensure bridge safety. An online condition assessment and anomaly localization method is proposed for cables based on the spatiotemporal correlation of grouped cable forces. First, an anomaly-sensitive feature index is obtained based on the distribution characteristics of grouped cable forces. Second, an adaptive anomaly detection method based on the k-nearest neighbor rule is used to perform dissimilarity measurements on the extracted feature index; this method can effectively remove the interference of environmental factors and vehicle loads on the online condition assessment of the grouped cable forces. Furthermore, an online anomaly isolation and localization method for stay cables is established, and the complete decomposition contributions method is used to decompose the feature matrix of the grouped cable forces and build an anomaly isolation index. Finally, case studies were carried out to validate the proposed method using an in-service cable-stayed bridge equipped with a structural health monitoring system. The results show that the proposed approach is sensitive to the abnormal distribution of grouped cable forces and is robust to the influence of interference factors. In addition, the proposed approach can localize the cables with abnormal cable forces online, so it can be applied to the field monitoring of cables in cable-stayed bridges.
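
The k-nearest-neighbor dissimilarity step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the 2-D feature values, the choice of k, and the threshold rule (a quantile of leave-one-out baseline scores) are all assumptions made for illustration, as are the names `knn_score`, `fit_threshold`, and `is_anomalous`.

```python
import math

def knn_score(x, reference, k):
    """Mean Euclidean distance from x to its k nearest reference points."""
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k]) / k

def fit_threshold(reference, k, quantile=0.95):
    """Score each baseline point against the others; return the chosen quantile."""
    scores = []
    for i, x in enumerate(reference):
        others = reference[:i] + reference[i + 1:]
        scores.append(knn_score(x, others, k))
    scores.sort()
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]

# Hypothetical feature indices collected during a healthy baseline period.
baseline = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (0.95, 1.05)]
threshold = fit_threshold(baseline, k=2)

def is_anomalous(x, k=2):
    """Flag x when its k-NN dissimilarity to the baseline exceeds the threshold."""
    return knn_score(x, baseline, k) > threshold
```

A new observation close to the baseline cloud gets a small score and passes; one far from every baseline point gets a large score and is flagged.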

Fast k-NN based Malware Analysis in a Massive Malware Environment

  • Hwang, Jun-ho;Kwak, Jin;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 13, No. 12
    • /
    • pp.6145-6158
    • /
    • 2019
  • Responding to the large volume of indiscriminately distributed malicious code, as well as to intelligent APT attacks, is a challenge for the current security industry. As a result, studies using machine learning algorithms are being conducted for proactive prevention rather than post-processing. The k-NN algorithm is widely used because it is intuitive and suitable for handling malicious code as unstructured data. In addition, in the malicious code analysis domain, the k-NN algorithm makes it easy to classify malicious code based on previously analyzed samples. For example, it is possible to classify malicious code families or analyze malicious code variants through similarity analysis with existing malicious code. However, the main disadvantage of the k-NN algorithm is that the search time increases as the training data grows. We propose a fast k-NN algorithm that improves computation speed while retaining the advantages of the k-NN algorithm. In the test environment, the k-NN search required on average only 19.71 similarity comparisons for 6.25 million malicious codes. Considering the way the algorithm works, the fast k-NN algorithm can also be used to search any data that can be vectorized, not only malware and SSDEEP. In the future, when a k-NN approach is needed and the central node can be effectively selected for clustering large amounts of data in various environments, it is expected that a sophisticated machine learning based system can be designed.

A Parameter-Free Approach for Clustering and Outlier Detection in Image Databases

  • 오현교;윤석호;김상욱
    • 전자공학회논문지CI
    • /
    • Vol. 47, No. 1
    • /
    • pp.80-91
    • /
    • 2010
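  • As the volume of image data grows, so does the need to structure it for efficient retrieval. Clustering is a representative method for structuring image data. However, existing clustering methods have the drawback that the user must supply the number of clusters as a parameter before clustering is performed. This paper discusses an approach to clustering image data without obtaining the number of clusters from the user. The proposed approach is based on Cross-Association, a method that uses the mutual associations among objects to find hidden structures or patterns in data without parameters. To apply Cross-Association to image clustering, the image data must first be converted into a graph. The generated graph is then passed to Cross-Association, and the result is interpreted from a clustering perspective. This paper also proposes a hierarchical clustering method and an outlier detection method based on Cross-Association. Experiments demonstrate the superiority of the proposed methods and identify an appropriate value of k for the k-nearest neighbor search, as well as the better graph-generation method, for clustering image data.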
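
The abstract above says the image data must be turned into a graph before clustering and that the choice of k matters. A minimal sketch of one way to do this, assuming each image is already described by a numeric feature vector: link every item to its k nearest items and store the links as a binary adjacency matrix. The function name `knn_graph`, the Euclidean metric, and the toy descriptors are assumptions; the paper's actual graph-generation methods are not given in the abstract.

```python
import math

def knn_graph(features, k):
    """Return a binary adjacency matrix linking each item to its k nearest items."""
    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other items by distance to item i.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        for j in order[:k]:
            adj[i][j] = 1
    return adj

# Hypothetical 2-D image descriptors forming two loose groups.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
adjacency = knn_graph(features, k=2)
```

With a small k the two groups stay disconnected in the matrix; a larger k starts linking items across groups, which is why the value of k affects the clustering result.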

Designing Hypothesis of 2-Substituted-N-[4-(1-methyl-4,5-diphenyl-1H-imidazole-2-yl)phenyl] Acetamide Analogs as Anticancer Agents: QSAR Approach

  • Bedadurge, Ajay B.;Shaikh, Anwar R.
    • 대한화학회지
    • /
    • Vol. 57, No. 6
    • /
    • pp.744-754
    • /
    • 2013
  • Quantitative structure-activity relationship (QSAR) analysis of recently synthesized imidazole-(benz)azole and imidazole-piperazine derivatives was carried out for their anticancer activities against breast (MCF-7) cell lines. Statistically significant 2D-QSAR models ($r^2=0.8901$; $q^2=0.8130$; F test = 36.4635; $r^2$ se = 0.1696; $q^2$ se = 0.12212; pred_$r^2=0.4229$; pred_$r^2$ se = 0.4606 and $r^2=0.8763$; $q^2=0.7617$; F test = 31.8737; $r^2$ se = 0.1951; $q^2$ se = 0.2708; pred_$r^2=0.4386$; pred_$r^2$ se = 0.3950) were developed using the molecular design suite VLifeMDS 4.2. The study was performed with 18 compounds (data set), using random selection and manual selection methods to divide the data set into training and test sets. Multiple linear regression (MLR) with a stepwise (SW) forward-backward variable selection method was used to build the QSAR models. The results of the 2D-QSAR models were further compared with 3D-QSAR models generated by kNN-MFA (k-Nearest Neighbor Molecular Field Analysis), investigating the substitutional requirements for favorable anticancer activity. The results may be useful in designing novel imidazole-(benz)azole and imidazole-piperazine derivatives against breast (MCF-7) cell lines prior to synthesis.

A Study on Target Acquisition and Tracking to Develop ARPA Radar

  • 이희용;신일식;이광일
    • 한국항해항만학회지
    • /
    • Vol. 39, No. 4
    • /
    • pp.307-312
    • /
    • 2015
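  • ARPA (Automatic Radar Plotting Aid) is a device that combines, by vector addition and subtraction, the motion vector formed by a radar target's relative course and relative bearing with the motion vector formed by own ship's course and bearing, and thereby computes the target's true course and true bearing as well as the closest point of approach and the time to it. The purpose of this study is to develop target acquisition and tracking techniques for implementing an ARPA radar; after reviewing prior studies, applicable algorithms and techniques were combined to develop basic ARPA functions. To acquire targets from radar images, a sequential image processing procedure was devised consisting of grayscale conversion, Gaussian smoothing filtering, binarization, and labeling. A nearest neighbor search algorithm was used to determine which target in the next image corresponds to a target in the previous image, and a Kalman filter was used in the motion analysis that computes the target's true course and true bearing. These techniques were implemented in software and tested aboard an actual ship, which verified that the developed ARPA functions are usable in practice.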
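
The abstract above uses a nearest neighbor search to decide which target in the next radar image corresponds to a target in the previous one. A minimal sketch of such frame-to-frame association follows. The greedy matching order, the distance gate, the function name `associate`, and the coordinates are assumptions for illustration; the paper's exact procedure is not described in the abstract.

```python
import math

def associate(previous, current, gate):
    """Greedy nearest-neighbor association of tracked targets to new detections.

    previous: {track_id: (x, y)} positions from the previous image.
    current:  list of (x, y) detections in the next image.
    Returns {track_id: detection_index}; tracks with no detection
    within `gate` are left out of the result.
    """
    pairs = sorted(
        (math.dist(pos, det), tid, j)
        for tid, pos in previous.items()
        for j, det in enumerate(current)
    )
    matches, used = {}, set()
    for d, tid, j in pairs:
        if d > gate:
            break  # all remaining pairs are even farther apart
        if tid in matches or j in used:
            continue  # each track and each detection is matched at most once
        matches[tid] = j
        used.add(j)
    return matches

# Hypothetical target centroids (pixel coordinates) in two consecutive images.
previous = {"T1": (10.0, 10.0), "T2": (50.0, 20.0)}
current = [(51.0, 21.0), (10.5, 9.5), (90.0, 90.0)]
matches = associate(previous, current, gate=5.0)
```

Detections left unmatched (index 2 here) would be candidates for newly acquired targets.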

A proposed image stitching method for web-based panoramic virtual reality for Hoseo Cyber Museum

  • 아르판 칸;홍성수
    • 한국산학기술학회논문지
    • /
    • Vol. 14, No. 2
    • /
    • pp.893-898
    • /
    • 2013
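  • Panoramic virtual reality reproduces the experience of a particular place, so that objects and information can be explored and acquired more easily and quickly in virtual reality without visiting the real-world location. In this paper, dynamic programming is used to find the ideal key points, adjacent images are merged at these points, and the images are blended for a smooth color transition. FAST and SURF detection are used to find distinctive image features, a nearest neighbor algorithm is used to match corresponding features, and RANSAC is used to estimate a homography from the matched key points. In this way, images are automatically selected and stitched.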

Exploring the Performance of Synthetic Minority Over-sampling Technique (SMOTE) to Predict Good Borrowers in P2P Lending

  • 프란시스 조셉 코스텔로;이건창
    • 디지털융복합연구
    • /
    • Vol. 17, No. 9
    • /
    • pp.71-78
    • /
    • 2019
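  • This study proposes the synthetic minority over-sampling technique as a useful tool for predicting good borrowers on P2P lending platforms and empirically validates its performance. One problem in estimating good borrowers in P2P lending is that the class imbalance is severe, so good borrowers are hard to predict unless the imbalance is addressed. To solve this problem, this study proposes SMOTE, the synthetic minority over-sampling technique, and validates its performance on the LendingClub dataset. In the validation, the SMOTE method showed statistically superior performance in comparison with support vector machine, k-nearest neighbor, logistic regression, random forest, and deep neural network classifiers.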
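
The core idea of SMOTE, which the abstract above applies to class imbalance, can be sketched in a few lines: pick a minority sample, pick one of its k nearest minority neighbors, and create a synthetic point at a random position on the segment between them. This is an illustrative sketch only; the function name `smote`, the toy data, and the parameter values are assumptions, and it is not the implementation used in the study.

```python
import math
import random

def smote(minority, n_new, k, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x, excluding x itself.
        neighbors = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical minority-class samples (two features each).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10, k=2)
```

Because every synthetic point lies between two existing minority samples, the new samples stay inside the region the minority class already occupies.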

Courses Recommendation Algorithm Based On Performance Prediction In E-Learning

  • Koffi, Dagou Dangui Augustin Sylvain Legrand;Ouattara, Nouho;Mambe, Digrais Moise;Oumtanaga, Souleymane;ADJE, Assohoun
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 21, No. 2
    • /
    • pp.148-157
    • /
    • 2021
  • The effectiveness of recommendation systems depends on the performance of the algorithms with which these systems are designed. The quality of the algorithms in turn depends on the quality of the strategies with which they were designed, and these strategies differ from author to author. Thus, designing a good recommendation system means implementing good strategies. In this context, several research works have proposed various strategies applied to algorithms to meet recommendation needs, and researchers continue to pursue better recommendation algorithms. In this paper, we propose a new algorithm for recommending learning items. Learner performance predictions and collaborative recommendation methods are used as strategies for this algorithm. The proposed performance prediction model is based on convolutional neural networks (CNN). The results of the performance predictions are used by the proposed recommendation algorithm. The prediction results show the efficiency of deep learning compared to the k-nearest neighbor (k-NN) algorithm. The proposed recommendation algorithm improves the recommendations of learners' learning items. This algorithm also has the particularity of advising against learning items in the learner's profile that are deemed inadequate for his or her training.

Protecting Accounting Information Systems using Machine Learning Based Intrusion Detection

  • Biswajit Panja
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 24, No. 5
    • /
    • pp.111-118
    • /
    • 2024
  • In general, a network-based intrusion detection system is designed to detect malicious behavior directed at a network or its resources. The key goal of this paper is to examine network data and identify whether it is normal or anomalous traffic, specifically for accounting information systems. Today there are a variety of approaches for detecting various forms of network-based intrusion. In this paper, we use supervised machine learning techniques. Classification models are used to train and validate data: the system is trained on a training dataset, and the trained system is then used to detect intrusions in the testing dataset. The proposed method detects whether network data is normal or an anomaly, which helps avoid unauthorized activity on the network and the systems under it. Decision Tree and K-Nearest Neighbor are applied in the proposed model to classify network traffic behavior as abnormal or normal. In addition, Logistic Regression and Support Vector Classification algorithms are used in our model to support the proposed concepts. Furthermore, a feature selection method is used to collect valuable information from the dataset to enhance the efficiency of the proposed approach: a Random Forest machine learning algorithm helps the system identify the crucial features and focus on them rather than on all the features. The experimental findings revealed that the suggested method for network intrusion detection has a negligible false alarm rate, with the accuracy of the result expected to be between 95% and 100%. As a result of the high precision rate, this concept can be used to detect network data intrusion and prevent vulnerabilities on the network.
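
To make the K-Nearest Neighbor classification step in the abstract above concrete, here is a minimal majority-vote sketch. The two-feature records, the labels, the value of k, and the function name `knn_predict` are hypothetical; the paper's dataset and feature set are not given in the abstract.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Label `query` by majority vote among its k nearest training records."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical scaled traffic features with known labels.
train = [
    ((0.10, 0.20), "normal"),
    ((0.20, 0.10), "normal"),
    ((0.15, 0.15), "normal"),
    ((0.90, 0.80), "anomaly"),
    ((0.80, 0.90), "anomaly"),
    ((0.85, 0.95), "anomaly"),
]
```

For example, `knn_predict(train, (0.2, 0.2), k=3)` votes among the three closest records, all of which are labeled normal in this toy data.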

Multivariate Time Series Simulation With Component Analysis

  • 이태삼;호세살라스;주하카바넨;노재경
    • 한국수자원학회:학술대회논문집
    • /
    • Proceedings of the 2008 Conference of the Korea Water Resources Association
    • /
    • pp.694-698
    • /
    • 2008
  • In hydrology, it is difficult to deal with multivariate time series, such as when modeling the streamflows of an entire complex river system. Normal-distribution-based models such as MARMA (Multivariate Autoregressive Moving Average) have been a major approach for modeling multivariate time series, but they have limitations. One is the unfavorable data transformation that forces the data to follow the normal distribution. Furthermore, a high-dimensional multivariate model requires a very large parameter matrix. An alternative is to decompose the multivariate data into independent components and model them individually. In 1985, Lins used Principal Component Analysis (PCA). Five scores, the data decomposed from the original data, were taken and formulated individually. One of the five scores was modeled with an AR-2 model while the others were modeled with an AR-1 model. From the time series analysis using the scores of the five components, he noted that "principal component time series might provide a relatively simple and meaningful alternative to conventional large MARMA models". This study is inspired by that remark to develop a multivariate simulation model. The multivariate simulation model suggested here uses Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Three modeling steps are applied for simulation. (1) PCA is used to decompose the correlated multivariate data into uncorrelated data, while ICA decomposes the data into independent components. Here, the autocorrelation structure of the decomposed data is still dominant, which is inherited from the data of the original domain. (2) Each component is resampled by block bootstrapping or K-nearest neighbor. (3) The resampled components are transformed back to the original domain.
With the suggested approach one might expect that a) the simulated data differ from the historical data, b) no data transformation is required (in the case of ICA), and c) a complex system can be decomposed into independent components and modeled individually. The models with PCA and ICA are compared using various statistics, such as basic statistics (mean, standard deviation, skewness, autocorrelation), reservoir-related statistics, and kernel density estimates.
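
Step (2) of the abstract above resamples each component by block bootstrapping or K-nearest neighbor. A minimal sketch of one common form of k-NN resampling for a single series: find the k historical values closest to the current value, pick one of them with rank-based weights, and emit the value that followed it in the record. The 1/rank weights, the function name `knn_resample`, and the toy series are assumptions; the abstract does not specify these details.

```python
import random

def knn_resample(series, length, k, seed=0):
    """Simulate a series by repeatedly jumping to the successor of a near neighbor."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, k + 1)]  # closer neighbors more likely
    current = rng.choice(series[:-1])
    out = [current]
    while len(out) < length:
        # Indices of the k historical values closest to the current value;
        # the last index is excluded because it has no successor.
        ranked = sorted(
            range(len(series) - 1),
            key=lambda t: abs(series[t] - current),
        )[:k]
        t = rng.choices(ranked, weights=weights)[0]
        current = series[t + 1]
        out.append(current)
    return out

# Hypothetical scores of one decomposed component.
series = [1.0, 2.0, 3.0, 2.5, 1.5, 2.0, 3.5, 3.0]
sim = knn_resample(series, length=20, k=3)
```

The simulated values are drawn from the historical record but in a new order, which preserves short-range dependence without assuming a normal distribution. This sketch requires k to be smaller than the series length.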
