• Title/Summary/Keyword: Weighted Euclidean Distance

An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT (함수 변환과 FFT에 기반한 조정자가 없는 XML 문서 클러스터링 기법)

  • Lee, Ho-Suk
    • The KIPS Transactions:PartD / v.14D no.2 / pp.169-180 / 2007
  • This paper discusses a new unsupervised XML document clustering technique based on the function transform and FFT (Fast Fourier Transform). An XML document is transformed into a discrete function based on the hierarchical nesting structure of its elements. The discrete function is then transformed into vectors using FFT. The vectors of two documents are compared using a weighted Euclidean distance metric. If the distance is lower than a pre-specified threshold, the two documents are considered structurally similar and are grouped into the same cluster. XML clustering can be useful for storing and searching XML documents. Experiments were conducted with 800 synthetic documents and 520 real documents. They showed that the function transform and FFT are effective for the incremental, unsupervised clustering of structurally similar XML documents.
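A minimal sketch of the pipeline described above, in Python. The abstract does not give the exact function transform, so this sketch assumes that each element contributes its nesting depth in document order. A plain DFT stands in for the FFT: it produces the same coefficients, only more slowly. The frequency weights, padding length `n`, coefficient count `k`, and threshold are all hypothetical choices, not the paper's.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def depth_sequence(xml_text):
    """Discrete function of a document: nesting depth of each element in
    document (pre-order) order. Assumed encoding, not the paper's exact one."""
    seq = []

    def walk(elem, depth):
        seq.append(depth)
        for child in elem:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 1)
    return seq


def spectrum(seq, n=32, k=8):
    """Zero-pad or truncate to n samples, apply a DFT (standing in for the
    FFT), and keep the magnitudes of the first k coefficients."""
    x = (list(seq) + [0] * n)[:n]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
            for f in range(k)]


def weighted_euclidean(u, v, w):
    """sqrt(sum_i w_i * (u_i - v_i)^2)"""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(u, v, w)))


def cluster(docs, threshold, n=32, k=8):
    """Incremental clustering: join the first cluster whose representative
    is closer than the threshold, otherwise open a new cluster."""
    weights = [1.0 / (f + 1) for f in range(k)]  # hypothetical: favour low frequencies
    reps, labels = [], []
    for doc in docs:
        vec = spectrum(depth_sequence(doc), n, k)
        for idx, rep in enumerate(reps):
            if weighted_euclidean(vec, rep, weights) < threshold:
                labels.append(idx)
                break
        else:
            reps.append(vec)
            labels.append(len(reps) - 1)
    return labels


docs = [
    "<a><b/><b/><c><d/></c></a>",
    "<x><y>text</y><y/><z><w/></z></x>",  # same shape, different tags
    "<a><b><c><d><e><f/></e></d></c></b></a>",
]
print(cluster(docs, threshold=1.0))  # [0, 0, 1]
```

This encoding depends only on nesting. Two documents with different tag names but identical structure therefore land at distance 0, which suits purely structural clustering.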

Machine Learning-Based Malicious URL Detection Technique (머신러닝 기반 악성 URL 탐지 기법)

  • Han, Chae-rim;Yun, Su-hyun;Han, Myeong-jin;Lee, Il-Gu
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.3 / pp.555-564 / 2022
  • Recently, cyberattacks on non-face-to-face environments such as telecommuting, telemedicine, and automated industrial facilities have used hacking techniques based on intelligent, advanced malicious code, and the resulting damage is increasing. Traditional information protection systems such as antivirus software detect known malicious URLs by signature patterns, so they cannot detect unknown malicious URLs. In addition, conventional malicious URL detection based on static analysis is vulnerable to dynamic loading and cryptographic attacks. This study proposes a technique that efficiently detects malicious URLs by learning dynamically from malicious URL data. In the proposed technique, malicious code is classified using machine learning-based feature selection algorithms, and accuracy is improved by removing obfuscation elements after preprocessing with the Weighted Euclidean Distance (WED). In the experiments, the proposed machine learning-based malicious URL detection technique achieved an accuracy of 89.17%, an improvement of 2.82% over the conventional method.
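The abstract says only that WED is used in preprocessing. The sketch below therefore just illustrates the distance itself on URL feature vectors. The lexical features, the per-feature weights (standing in for importance scores that a feature selection step might produce), and the 1-nearest-neighbour rule are all hypothetical. This is not the paper's method.

```python
import math
from urllib.parse import urlparse


def url_features(url):
    """Hypothetical lexical features, not the paper's feature set."""
    host = urlparse(url).netloc
    return [
        float(len(url)),                         # overall length
        float(url.count(".")),                   # dots
        float(sum(ch.isdigit() for ch in url)),  # digits
        float(url.count("%")),                   # percent-encoding, a common obfuscation sign
        1.0 if "@" in url else 0.0,              # '@' in the authority can hide the real host
        float(len(host)),                        # length of the netloc part
    ]


def wed(u, v, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (u_i - v_i)^2)."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(u, v, w)))


def classify(url, labelled, weights):
    """Label of the nearest labelled URL under the weighted distance (1-NN)."""
    x = url_features(url)
    return min(labelled, key=lambda item: wed(x, url_features(item[0]), weights))[1]


# Hypothetical weights, e.g. importance scores from a feature selection step.
weights = [0.01, 1.0, 0.1, 1.0, 5.0, 0.05]
labelled = [
    ("https://example.com/index.html", "benign"),
    ("http://login.example.biz@192.168.10.22/%61%64%6d%69%6e/a/b/c.php?id=123456", "malicious"),
]
print(classify("https://example.org/about.html", labelled, weights))  # benign
```

Raising a weight makes the corresponding feature dominate the distance. In this sketch, the high weight on the `@` flag is what pulls obfuscated-looking URLs toward the malicious reference.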

Gaussian Weighted CFCM for Blind Equalization of Linear/Nonlinear Channel

  • Han, Soo-Whan
    • Journal of the Institute of Convergence Signal Processing / v.14 no.3 / pp.169-180 / 2013
  • This paper modifies conditional Fuzzy C-Means (CFCM) with Gaussian weights (CFCM_GW) for the blind equalization of channels. The proposed CFCM_GW can handle both linear and nonlinear channels because it searches directly for the optimal desired states of an unknown channel, independent of the channel structure. The CFCM_GW search procedure uses a Bayesian likelihood fitness function, a Gaussian weighted partition matrix, and a conditional constraint. Conventional Fuzzy C-Means (FCM) uses the plain Euclidean distance. In contrast, the Gaussian weighted partition matrix and the conditional constraint make CFCM_GW more robust in heavy-noise communication environments. The channel states selected by CFCM_GW are always close to the optimal set of channel states, even when the channel is heavily corrupted by additive white Gaussian noise (AWGN). These channel states are then used as the input of a Bayesian equalizer to reconstruct the transmitted symbols. Simulation studies show that the proposed method performs better than existing conventional FCM-based approaches in accuracy and speed.
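For reference, the Bayesian equalizer that consumes the channel states can be sketched as follows. The sketch assumes BPSK symbols, a linear two-tap channel, equalizer order 2, and decision delay 0. Here the states are computed from known taps `h` purely for illustration; in CFCM_GW they would instead be the states found by the clustering search.

```python
import itertools
import math


def channel_states(h, order=2):
    """Noise-free received vectors [y(k), ..., y(k-order+1)] for every BPSK
    input sequence, each tagged with s(k) (decision delay 0 assumed)."""
    span = len(h) + order - 1  # symbols s(k) .. s(k-span+1) that shape the vector
    states = []
    for s in itertools.product((1.0, -1.0), repeat=span):
        # y(k-i) = sum_j h[j] * s(k-i-j), with s[i] standing for s(k-i)
        vec = tuple(sum(h[j] * s[i + j] for j in range(len(h))) for i in range(order))
        states.append((vec, s[0]))
    return states


def bayesian_decision(r, states, sigma):
    """Sign of sum over states of s * exp(-||r - c||^2 / (2 sigma^2))."""
    score = 0.0
    for centre, symbol in states:
        d2 = sum((a - b) ** 2 for a, b in zip(r, centre))
        score += symbol * math.exp(-d2 / (2.0 * sigma ** 2))
    return 1.0 if score >= 0 else -1.0


states = channel_states((1.0, 0.5))
print(len(states))                                 # 8
print(bayesian_decision((1.4, 0.6), states, 0.3))  # 1.0
```

For a nonlinear channel only `channel_states` would change. The decision rule works on whatever set of states it is given, which is why a direct state search covers both channel types.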

A Heuristic Method for Max ($\bar{x}$, $\bar{y}$) TSP (Max($\bar{x}$, $\bar{y}$) TSP 를 위한 발견적 해법)

  • Lee, Hwa-Ki;Seo, Sang-Moon
    • Journal of Korean Institute of Industrial Engineers / v.19 no.3 / pp.37-49 / 1993
  • This paper deals with the TSP (traveling salesman problem) in which the cost (distance) between nodes is defined as Max($\bar{x}$, $\bar{y}$). To find a satisfactory solution for this kind of problem, we generate a weighted matrix and then develop a new heuristic method that uses it. We also analyze the effectiveness of the new heuristic by comparing it with existing heuristic algorithms for the Euclidean TSP. Finally, we apply the new algorithm to real Max($\bar{x}$, $\bar{y}$) TSP problems such as PCB insertion.
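Assume Max($\bar{x}$, $\bar{y}$) means the larger of the absolute x and y offsets between two nodes, as when both axes of an insertion head move at once at equal speed. The cost is then the Chebyshev distance. The sketch uses a plain nearest-neighbour construction as a stand-in baseline; it is not the weighted-matrix heuristic the paper develops.

```python
def max_metric(p, q):
    """Assumed reading of Max(x-bar, y-bar): the larger axis-wise offset (Chebyshev distance)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def nearest_neighbour_tour(points, start=0):
    """Greedy baseline: always move to the closest unvisited node (ties -> lower index)."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: (max_metric(last, points[i]), i))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_cost(points, tour):
    """Length of the closed tour under the max metric."""
    return sum(max_metric(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


points = [(0, 0), (3, 1), (1, 4), (5, 5)]
tour = nearest_neighbour_tour(points)
print(tour, tour_cost(points, tour))  # [0, 1, 2, 3] 15
```

Under this metric, many moves tie in cost whenever the longer axis is unchanged. Heuristics tuned for the Euclidean TSP do not exploit that property.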

Blind linear/nonlinear equalization for heavy noise-corrupted channels

  • Han, Soo-Whan;Park, Sung-Dae
    • Journal of information and communication convergence engineering / v.7 no.3 / pp.383-391 / 2009
  • In this paper, blind equalization using a modified Fuzzy C-Means algorithm with Gaussian Weights (MFCM_GW) is applied to heavily noise-corrupted channels. The proposed algorithm can handle both linear and nonlinear channels because it searches directly for the optimal channel output states instead of estimating the channel parameters. Fuzzy C-Means (FCM) uses the plain Euclidean distance; in contrast, the MFCM_GW search procedure uses a Bayesian likelihood fitness function and a Gaussian weighted partition matrix. The channel states selected by MFCM_GW are always close to the optimal set of channel states, even when the channel is heavily corrupted by additive white Gaussian noise (AWGN). Simulation studies show that the proposed method performs better than the existing genetic algorithm (GA) and conventional FCM-based methods in accuracy and speed.
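The conventional FCM baseline that the Gaussian-weighted variants are compared against can be sketched on scalar channel outputs. The sketch assumes a hypothetical linear channel `h = (1.0, 0.5)` with BPSK input, low noise (σ = 0.1), equalizer order 1, and hand-picked initial centres. It does not reproduce the Gaussian weighting or the Bayesian likelihood fitness of MFCM_GW.

```python
import random


def fcm_1d(data, centres, m=2.0, iters=50, eps=1e-9):
    """Conventional fuzzy c-means on scalar samples; returns sorted centres."""
    centres = list(centres)
    c = len(centres)
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        u = []
        for x in data:
            d = [abs(x - v) + eps for v in centres]  # eps avoids division by zero
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1)) for j in range(c))
                      for i in range(c)])
        # Centre update: mean of the samples weighted by u_ik^m
        centres = [sum(u[k][i] ** m * data[k] for k in range(len(data)))
                   / sum(u[k][i] ** m for k in range(len(data)))
                   for i in range(c)]
    return sorted(centres)


rng = random.Random(1)
h = (1.0, 0.5)  # hypothetical linear channel
symbols = [rng.choice((1.0, -1.0)) for _ in range(401)]
received = [h[0] * symbols[k] + h[1] * symbols[k - 1] + rng.gauss(0.0, 0.1)
            for k in range(1, len(symbols))]
estimated = fcm_1d(received, centres=[-1.0, -0.3, 0.3, 1.0])
print([round(v, 2) for v in estimated])  # close to [-1.5, -0.5, 0.5, 1.5]
```

At this noise level, the four centres settle near the noise-free states ±0.5 and ±1.5. Under heavy noise the clusters overlap and plain FCM centres become less reliable. That is the situation the Gaussian weighting is meant to address.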

The Implementation of RRTs for a Remote-Controlled Mobile Robot

  • Roh, Chi-Won;Lee, Woo-Sub;Kang, Sung-Chul;Lee, Kwang-Won
    • 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 2005.06a / pp.2237-2242 / 2005
  • The original RRT is expanded iteratively by applying control inputs that drive the system slightly toward randomly selected states, rather than requiring point-to-point convergence as in the probabilistic roadmap approach. It is generally known that RRT performance can be improved through the choice of the metric used to select the nearest vertex and the bias technique used to choose random states. We designed a path planning algorithm based on the RRT method for a remote-controlled mobile robot. First, we used a bias technique that draws random states from a goal-biased Gaussian distribution along the command directions. Second, we selected a metric based on a weighted Euclidean distance to the random states and a weighted distance from the goal region. This avoids exploring unnecessary regions and helps the mobile robot find a feasible trajectory as quickly as possible. Finally, actuator constraints must be considered when applying the algorithm to physical mobile robots. We therefore select control inputs distributed around the commanded inputs and constrained by the maximum rate of input change, instead of random inputs. Simulation results demonstrate that the proposed algorithm plans significantly more efficiently than a basic RRT planner. It reduces the computation time needed to find a feasible trajectory and can be practically implemented on a remote-controlled mobile robot.
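A minimal sketch of two ideas from the abstract: goal-biased sampling and a nearest-vertex metric that adds a weighted distance to the goal. It assumes a holonomic point robot in an obstacle-free 10 × 10 square and samples a Gaussian around the goal rather than along command directions. It also omits the actuator-rate constraints. The weights, bias probability, and step size are hypothetical.

```python
import math
import random


def weighted_dist(p, q, w):
    """Weighted Euclidean distance between two states."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(p, q, w)))


def rrt(start, goal, rng, bounds=10.0, step=0.5, goal_radius=0.5, w=(1.0, 1.0),
        goal_weight=0.5, goal_bias=0.3, sigma=1.0, max_iter=2000):
    """Obstacle-free RRT for a 2-D point robot; returns a start-to-goal path or None."""
    nodes, parent = [start], [None]
    for _ in range(max_iter):
        # Goal-biased sampling: Gaussian around the goal with probability goal_bias.
        if rng.random() < goal_bias:
            sample = (rng.gauss(goal[0], sigma), rng.gauss(goal[1], sigma))
        else:
            sample = (rng.uniform(0.0, bounds), rng.uniform(0.0, bounds))
        # Nearest vertex: weighted distance to the sample plus weighted distance to the goal.
        i = min(range(len(nodes)),
                key=lambda j: weighted_dist(nodes[j], sample, w)
                + goal_weight * math.dist(nodes[j], goal))
        near = nodes[i]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        t = min(step, d) / d  # move at most `step` toward the sample
        new = (near[0] + t * (sample[0] - near[0]), near[1] + t * (sample[1] - near[1]))
        nodes.append(new)
        parent.append(i)
        if math.dist(new, goal) <= goal_radius:
            path, k = [], len(nodes) - 1
            while k is not None:  # walk back to the root
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None


path = rrt((1.0, 1.0), (9.0, 9.0), random.Random(0))
print(len(path) if path else "no path")
```

Setting `goal_weight=0` and `goal_bias=0` recovers a basic RRT with a (weighted) Euclidean nearest-vertex rule. This gives a simple baseline for comparison.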

Organ Recognition in Ultrasound images Using Log Power Spectrum (로그 전력 스펙트럼을 이용한 초음파 영상에서의 장기인식)

  • 박수진;손재곤;김남철
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.9C / pp.876-883 / 2003
  • In this paper, we propose an algorithm for organ recognition in ultrasound images using the log power spectrum. The algorithm consists of two main steps: feature extraction and feature classification. In feature extraction, the log power spectrum is used as a translation-invariant feature to extract echo information about the organ tissue from a preprocessed input image. In feature classification, the Mahalanobis distance is used to measure the similarity between the feature of an input image and the representative feature of each class. Experimental results on real ultrasound images show that the proposed algorithm improves the recognition rate by up to 30% over a recognition algorithm using the power spectrum and Euclidean distance. It also improves the rate by 10-40% over a recognition algorithm using the weighted quefrency complex cepstrum.
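A simplified sketch of both steps, assuming a 1-D signal in place of a 2-D image patch and a diagonal covariance for each class. Under that assumption, the Mahalanobis distance reduces to a weighted Euclidean distance with weights $1/\sigma_i^2$. The class statistics below are made-up numbers, chosen so that the two distances disagree.

```python
import cmath
import math


def log_power_spectrum(x):
    """log |X(f)|^2 of a 1-D signal via a plain DFT (small constant avoids log(0))."""
    n = len(x)
    return [math.log(abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n)
                             for t in range(n))) ** 2 + 1e-12)
            for f in range(n)]


def diag_mahalanobis(x, mean, var):
    """Mahalanobis distance with a diagonal covariance, i.e. a weighted
    Euclidean distance with weights 1 / var_i."""
    return math.sqrt(sum((a - m) ** 2 / v for a, m, v in zip(x, mean, var)))


def classify(feature, classes):
    """classes maps a label to (mean vector, per-dimension variance vector)."""
    return min(classes, key=lambda label: diag_mahalanobis(feature, *classes[label]))


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 1.0, 2.0]
shifted = signal[3:] + signal[:3]  # circular shift
print(max(abs(a - b) for a, b in zip(log_power_spectrum(signal),
                                     log_power_spectrum(shifted))))  # close to 0

classes = {"organ_A": ([0.0, 0.0], [0.25, 0.25]),  # made-up class statistics
           "organ_B": ([3.0, 3.0], [4.0, 4.0])}
print(classify([1.2, 1.2], classes))  # organ_B
```

A circular shift leaves the power spectrum unchanged, which is the translation invariance the feature relies on. For the query point, the plain Euclidean distance would pick `organ_A` (about 1.70 versus 2.55). The variance-weighted distance picks `organ_B` because that class is much more spread out.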

Deduction of Acupoints Selecting Elements on Zhenjiuzishengjing using hierarchical clustering (계층적 군집분석(hierarchical clustering)을 통한 침구자생경(鍼灸資生經) 경혈 선택 요인 분석)

  • Oh, Junho
    • Journal of Haehwa Medicine / v.23 no.1 / pp.115-124 / 2014
  • Objectives: Traditional East Asian medicine (TEAM) contains many medical records of acupuncture and moxibustion. We performed this study to find the hidden criteria in these records for choosing proper acupoints. Methods: "Zhenjiuzishengjing", an ancient TEAM book, was analyzed using document clustering techniques. A corpus built from this book contained 196 texts, each derived from a symptom. Each text was converted to a vector representing the frequencies of 349 acupoints. Distances between vectors were calculated with the weighted Euclidean distance method, and a hierarchical clustering of symptoms was built from these distances. Results: The clustering consisted of five large groups, which correlated highly with body parts: head and face, chest, abdomen, upper extremity, lower extremity, and back. Conclusions: The results suggest that the body part of a symptom is the most important criterion for selecting acupoints; some highly similar symptom vectors reinforced this result. Another criterion is the cause and pathway of illness: some symptoms with a common cause and pathway were grouped together.
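A minimal sketch of weighted-Euclidean hierarchical clustering on symptom-by-acupoint frequency vectors. The abstract does not specify the weights or the linkage rule, so average linkage and the weight vector below are assumptions. The four-symptom, four-acupoint data set is a toy example, not data from the book.

```python
import math


def wed(u, v, w):
    """Weighted Euclidean distance."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(u, v, w)))


def average_linkage(vectors, weights, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise weighted distance until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        return sum(wed(vectors[i], vectors[j], weights) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, p, q = min((link(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters)) for q in range(p + 1, len(clusters)))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy counts: rows are symptoms, columns are four acupoints (made-up data).
vectors = [[3, 2, 0, 0],   # symptom 0
           [2, 2, 0, 0],   # symptom 1
           [0, 0, 3, 2],   # symptom 2
           [0, 1, 2, 3]]   # symptom 3
weights = [1.0, 1.0, 0.5, 0.5]  # hypothetical per-acupoint weights
print(average_linkage(vectors, weights, 2))  # [[0, 1], [2, 3]]
```

Recording every merge instead of stopping at `n_clusters` yields the full dendrogram. Cutting it at different heights gives coarser or finer symptom groups.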

Using Voronoi Diagram and Power Diagram in Application Problems (응용문제에서 보로노이 다이어그램과 파워 다이어그램의 사용성 비교)

  • Kim, Donguk
    • Journal of Korean Society of Industrial and Systems Engineering / v.35 no.4 / pp.235-243 / 2012
  • The Voronoi diagram of spheres and the power diagram are known as powerful tools for analyzing the spatial characteristics of weighted points. These structures have a wide range of applications, including molecular spatial structure analysis, location-based optimization, and architectural design. Because the two diagrams are based on different distance metrics, which one is more usable depends on the application problem. In this paper, we compare the diagrams in various situations from the user's viewpoint. We show that the Voronoi diagram of spheres is more effective in problems based on the Euclidean distance metric, such as nearest neighbor search, path bottleneck locating, and internal void finding.
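The difference between the two metrics is easy to see numerically. In the Voronoi diagram of spheres, a point's distance to a sphere is its Euclidean distance to the centre minus the radius. In the power diagram it is the power distance: squared distance to the centre minus squared radius. For the two hypothetical spheres below, the bisector on the x-axis lies at x = 6 for the first metric and at x = 5.4 for the second. A point at x = 5.7 is therefore assigned differently.

```python
import math


def surface_distance(p, centre, r):
    """Voronoi diagram of spheres: Euclidean distance to the sphere's surface."""
    return math.dist(p, centre) - r


def power_distance(p, centre, r):
    """Power diagram: squared distance to the centre minus squared radius."""
    return math.dist(p, centre) ** 2 - r ** 2


def nearest(p, spheres, metric):
    """Index of the sphere that owns point p under the given metric."""
    return min(range(len(spheres)), key=lambda i: metric(p, *spheres[i]))


spheres = [((0.0, 0.0), 3.0), ((10.0, 0.0), 1.0)]  # hypothetical (centre, radius) pairs
p = (5.7, 0.0)
print(nearest(p, spheres, surface_distance), nearest(p, spheres, power_distance))  # 0 1
```

Only the surface distance matches the true Euclidean clearance from a point to a sphere. This is why queries such as nearest neighbour or bottleneck width favour the Voronoi diagram of spheres.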

Adaptation and Clustering Method for Speaker Identification with Small Training Data (화자적응과 군집화를 이용한 화자식별 시스템의 성능 및 속도 향상)

  • Kim, Se-Hyun;Oh, Yung-Hwan
    • MALSORI / no.58 / pp.83-99 / 2006
  • One key factor that hinders the widespread deployment of speaker identification technologies is the requirement for long enrollment utterances to guarantee a low error rate during identification. To gain user acceptance of speaker identification technologies, adaptation algorithms that can enroll speakers with short utterances are essential. To this end, this paper applies MLLR speaker adaptation for speaker enrollment and compares its performance against other speaker modeling techniques: GMMs and HMM. To speed up identification, we also apply a speaker clustering method that uses principal component analysis (PCA) and the weighted Euclidean distance as its distance measure. Experimental results show that the MLLR-adapted modeling method is the most effective for short enrollment utterances, and that GMMs perform better when long utterances are available.
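A sketch of how clustering can cut identification cost. The query is first compared with the cluster centroids, and only the speakers in the winning cluster are scored. The sketch assumes speakers are already represented as PCA-projected vectors and weights each component by the inverse of its variance. A real system would score the GMM or MLLR-adapted models in the second stage rather than compare vectors. All names and numbers are hypothetical.

```python
import math


def wed(u, v, w):
    """Weighted Euclidean distance."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(u, v, w)))


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def identify(query, speakers, clusters, weights):
    """Two-stage search: pick the nearest cluster centroid, then the nearest
    speaker inside that cluster only."""
    _, members = min(clusters, key=lambda c: wed(query, c[0], weights))
    return min(members, key=lambda name: wed(query, speakers[name], weights))


# Hypothetical speakers already projected onto two principal components.
speakers = {"spk1": [1.0, 0.0], "spk2": [1.2, 0.3],
            "spk3": [-1.0, 0.1], "spk4": [-1.1, -0.2]}
groups = [["spk1", "spk2"], ["spk3", "spk4"]]
clusters = [(centroid([speakers[s] for s in g]), g) for g in groups]
weights = [1 / 2.0, 1 / 0.5]  # assumed: inverse variance of each component
print(identify([1.15, 0.25], speakers, clusters, weights))  # spk2
```

With C clusters of roughly N/C speakers each, a query costs about C + N/C comparisons instead of N. The trade-off is that a query can be lost if the first stage picks the wrong cluster.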