• Title/Summary/Keyword: K-NN

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

  • Choi, Hyun-Hwa;Lee, Kyu-Chul
    • The KIPS Transactions:PartD / v.19D no.2 / pp.151-160 / 2012
  • Most distributed high-dimensional indexing structures provide reasonable search performance when the dataset is uniformly distributed. However, when the dataset is clustered or skewed, search performance degrades gradually compared with the uniformly distributed case. We propose a method for improving the k-nearest neighbor search performance of the distributed vector approximation-tree on strongly clustered or skewed datasets. The basic idea is to compute the volumes of the leaf nodes on the top-tree of a distributed vector approximation-tree and to assign a different number of bits to each of them, so that the vector approximation retains its discriminating power. In other words, more bits are assigned to the high-density clusters. We conducted experiments comparing the search performance with the distributed hybrid spill-tree and the distributed vector approximation-tree, using synthetic and real datasets. The experimental results show that the proposed scheme provides consistent results, with significant performance improvements over the distributed vector approximation-tree for strongly clustered or skewed datasets.
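
The bit-allocation idea in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming each top-tree leaf is described by an axis-aligned bounding box with positive volume and a point count. The function names, the `min_bits`/`max_bits` bounds, and the log-density scaling rule are all hypothetical choices made for illustration; the abstract does not state the actual allocation rule.

```python
import math


def box_volume(lower, upper):
    """Volume of an axis-aligned bounding box given its corner coordinates."""
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= hi - lo
    return volume


def assign_bits(leaves, min_bits=4, max_bits=8):
    """Return a per-dimension bit count for each leaf, growing with density.

    Assumption: bits scale linearly with log-density between min_bits and
    max_bits. This rule is illustrative, not the one used in the paper.
    """
    logs = [
        math.log(leaf["count"] / box_volume(leaf["lower"], leaf["upper"]))
        for leaf in leaves
    ]
    lo, hi = min(logs), max(logs)
    bits = []
    for value in logs:
        frac = 0.0 if hi == lo else (value - lo) / (hi - lo)
        bits.append(min_bits + round(frac * (max_bits - min_bits)))
    return bits


def approximate(point, lower, upper, bits):
    """Quantize each coordinate of a point into one of 2**bits cells."""
    cells = 2 ** bits
    return [
        min(int((x - lo) / (hi - lo) * cells), cells - 1)
        for x, lo, hi in zip(point, lower, upper)
    ]


leaves = [
    {"lower": (0.0, 0.0), "upper": (1.0, 1.0), "count": 1000},    # dense leaf
    {"lower": (0.0, 0.0), "upper": (10.0, 10.0), "count": 1000},  # sparse leaf
]
bits_per_leaf = assign_bits(leaves)
```

Under these assumptions, the dense leaf receives the larger bit budget, so points inside it are quantized on a finer grid than points in the sparse leaf.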

A Study on Performance of ML Algorithms and Feature Extraction to detect Malware (멀웨어 검출을 위한 기계학습 알고리즘과 특징 추출에 대한 성능연구)

  • Ahn, Tae-Hyun;Park, Jae-Gyun;Kwon, Young-Man
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.1 / pp.211-216 / 2018
  • In this paper, we study how to classify whether an unknown PE file is malware. In the classification problem of the malware detection domain, both feature extraction and the classifier are important. We therefore examine which features suit a classifier and which classifier suits the selected features, in order to find a good combination of feature and classifier for detecting malware. We ran experiments in two steps. In step one, we compared the accuracy of features using opcodes only, Windows API calls only, and both together. We found that the combined opcode and Windows API feature performs better than the others. In step two, we compared the AUC values of four classifiers: Bernoulli Naïve Bayes, k-nearest neighbor, Support Vector Machine and Decision Tree. We found that Decision Tree performs better than the others.
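
For readers unfamiliar with the AUC metric used in step two, the sketch below computes it from its pairwise definition: the probability that a randomly chosen positive (malware) sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the example labels and scores are hypothetical and are not taken from the paper's experiments.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g. malware), 0 for negative.
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores from a hypothetical classifier on four files.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Comparing classifiers by AUC then amounts to calling this function once per classifier on the same labelled test set.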

A study on neighbor selection methods in k-NN collaborative filtering recommender system (근접 이웃 선정 협력적 필터링 추천시스템에서 이웃 선정 방법에 관한 연구)

  • Lee, Seok-Jun
    • Journal of the Korean Data and Information Science Society / v.20 no.5 / pp.809-818 / 2009
  • The collaborative filtering approach predicts an active user's preference for specific items transacted in e-commerce by using other users' preference information. To improve prediction accuracy, collaborative filtering needs enough preference information from users. However, too much preference information can harm prediction accuracy, and too little can harm it as well. This research suggests a method for deciding a suitable number of neighbor users when applying the collaborative filtering algorithm, improving on existing k-nearest-neighbor selection methods. The results provide useful methods for improving prediction accuracy and also refine the exploratory data analysis approach for deciding an appropriate number of nearest neighbors.
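
To show where the number of neighbors enters the prediction, here is a minimal sketch of baseline user-based k-NN collaborative filtering. It assumes ratings are stored as nested dictionaries and uses cosine similarity over co-rated items with a similarity-weighted average; these choices and all names are illustrative assumptions, and the sketch is not the neighbor selection method proposed in the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, item, k):
    """Predict a rating from the k most similar users who rated the item."""
    candidates = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbors = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors
    return sum(sim * rating for sim, rating in neighbors) / total


# Hypothetical ratings: user "u" has not rated item "c".
ratings = {
    "u": {"a": 5, "b": 1},
    "v": {"a": 5, "b": 1, "c": 4},
    "w": {"a": 1, "b": 5, "c": 2},
}
pred_k1 = predict(ratings, "u", "c", k=1)
pred_k2 = predict(ratings, "u", "c", k=2)
```

With `k=1` only the most similar user contributes; with `k=2` the dissimilar user pulls the prediction away, which illustrates why the choice of `k` affects accuracy.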

Analysis of Twenty-Four Hours Heart Rate Variability among Patients with Major Depressive Disorder (주요우울장애 환자에서 24시간 심박변이도 분석)

  • Kang, Jung-Kun;Lee, Sun-Mi;Kang, Eun-Ho;Woo, Jong-Min
    • Anxiety and mood / v.9 no.2 / pp.140-146 / 2013
  • Objective: There have been few comprehensive studies on the analysis of 24-hour HRV in major depressive disorder (MDD). The purpose of this study was to compare the autonomic nervous system of patients with MDD with that of healthy controls and to examine the physiologic and clinical effects of 24-hour HRV by analyzing whether HRV reflects the level of depressive symptoms after symptoms improve in patients with MDD. Methods: The 24-hour HRV was measured in a patient group with MDD (n=16) and a control group (n=16). The patients with MDD received a follow-up test after two months of treatment. Results: There were significant differences between the patient and control groups in the time-domain indexes (SDNN, rMSSD, SDNN index, and pNN50) and the frequency-domain indexes (TP, VLF, LF, HF, and ULF) of HRV. The means of RR, SDNN, SDANN, and TP increased after two months of treatment compared with before treatment, but the increases were not statistically significant. Conclusion: The results of the 24-hour HRV analysis indicated a significant decrease in HRV indexes among MDD patients, which may suggest decreased parasympathetic nervous function.
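
The time-domain indexes named above have standard definitions, sketched here for a list of NN (normal-to-normal) intervals in milliseconds. The function names and the sample intervals are hypothetical. The sketch uses the sample standard deviation for SDNN, which is an assumption (some tools use the population form), and it omits SDANN and the SDNN index, which require splitting a 24-hour recording into 5-minute segments.

```python
import math
import statistics


def successive_differences(nn_ms):
    """Differences between consecutive NN intervals."""
    return [b - a for a, b in zip(nn_ms, nn_ms[1:])]


def sdnn(nn_ms):
    """Standard deviation of all NN intervals (sample form, an assumption)."""
    return statistics.stdev(nn_ms)


def rmssd(nn_ms):
    """Root mean square of successive NN interval differences."""
    diffs = successive_differences(nn_ms)
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def pnn50(nn_ms):
    """Percentage of successive differences larger than 50 ms."""
    diffs = successive_differences(nn_ms)
    return 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs)


# Made-up NN intervals in milliseconds, for illustration only.
nn_intervals = [800, 860, 790, 800]
```

Calling `sdnn(nn_intervals)`, `rmssd(nn_intervals)` and `pnn50(nn_intervals)` gives the three indexes for one recording; a group comparison such as the one in the abstract would compute them per participant first.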

Anomalous Trajectory Detection in Surveillance Systems Using Pedestrian and Surrounding Information

  • Doan, Trung Nghia;Kim, Sunwoong;Vo, Le Cuong;Lee, Hyuk-Jae
    • IEIE Transactions on Smart Processing and Computing / v.5 no.4 / pp.256-266 / 2016
  • Concurrently detected and annotated abnormal events can have a significant impact on surveillance systems. By considering the specific domain of pedestrian trajectories, this paper presents two main contributions. First, as introduced in much of the work on trajectory-based anomaly detection in the literature, only information about pedestrian paths, such as direction and speed, is considered. Differing from previous work, this paper proposes a framework that deals with additional types of trajectory-based anomalies. These abnormal events take place when a person enters prohibited areas. Those restricted regions are constructed by an online learning algorithm that uses surrounding information, including detected pedestrians and background scenes. Second, a simple data-boosting technique is introduced to overcome a lack of training data; such a problem particularly challenges all previous work, owing to the significantly low frequency of abnormal events. This technique only requires normal trajectories and fundamental information about scenes to increase the amount of training data for both normal and abnormal trajectories. With the increased amount of training data, the conventional abnormal trajectory classifier is able to achieve better prediction accuracy without falling into the over-fitting problem caused by complex learning models. Finally, the proposed framework (which annotates tracks that enter prohibited areas) and a conventional abnormal trajectory detector (using the data-boosting technique) are integrated to form a united detector. Such a detector deals with different types of anomalous trajectories in a hierarchical order. The experimental results show that all proposed detectors can effectively detect anomalous trajectories in the test phase.

A Comparison of Systematic Sampling Designs for Forest Inventory

  • Yim, Jong Su;Kleinn, Christoph;Kim, Sung Ho;Jeong, Jin-Hyun;Shin, Man Yong
    • Journal of Korean Society of Forest Science / v.98 no.2 / pp.133-141 / 2009
  • This study was conducted to support the choice of an efficient sampling design, in terms of statistical efficiency, for forest resources assessments in South Korea. To this end, different systematic sampling designs were simulated and compared on an artificial forest population built from field sample data and satellite data in Yang-Pyeong County, Korea. Using the k-NN technique, two thematic maps (growing stock and forest cover type per pixel unit) were generated across the test area; field data (n=191) and Landsat ETM+ were used as source data. Four sampling designs (systematic sampling, systematic sampling for post-stratification, systematic cluster sampling, and stratified systematic sampling) were employed as candidate optimum sampling designs. Monte Carlo simulation (k=1,000) was used to compute error variance. Sampling error and relative efficiency were then compared. When the objective of an inventory was to obtain estimates for the entire population, systematic cluster sampling was superior to the other sampling designs. When the objective was to obtain estimates for each sub-population, post-stratification gave better estimates. For efficient stratification, this procedure requires clear definitions of the strata of interest per field observation unit.
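
The Monte Carlo step can be illustrated with a much simplified sketch. It assumes a one-dimensional artificial population (a list of values rather than a pixel map), plain systematic sampling with a random start, and the mean as the estimated quantity. The function names, the `interval` parameter and the toy population are hypothetical, and the sketch does not reproduce the four designs compared in the study.

```python
import random
import statistics


def systematic_sample(population, interval, rng):
    """Take every interval-th unit, starting from a random offset."""
    start = rng.randrange(interval)
    return population[start::interval]


def monte_carlo_error_variance(population, interval, repetitions=1000, seed=0):
    """Mean squared deviation of the sample mean from the true mean."""
    rng = random.Random(seed)
    true_mean = statistics.mean(population)
    squared_errors = []
    for _ in range(repetitions):
        sample = systematic_sample(population, interval, rng)
        squared_errors.append((statistics.mean(sample) - true_mean) ** 2)
    return statistics.mean(squared_errors)


# Toy population with a linear trend, standing in for a map of growing stock.
toy_population = list(range(100))
error_variance = monte_carlo_error_variance(toy_population, interval=10)
```

Running the same routine with a different sampling function gives a second error variance, and the ratio of the two is one common way to express relative efficiency between designs.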

Review of Association between Air Pollution and Heart Rate Variability (HRV) (대기오염과 심박변이도(Heart Rate Variability, HRV)의 연관성에 대한 고찰)

  • Guak, Sooyoung;Lim, Chaeyun;Lee, Kiyoung;Park, Ji Young
    • Journal of Environmental Health Sciences / v.41 no.4 / pp.223-230 / 2015
  • Objectives: There is considerable evidence that polluted ambient air contributes to the risk of cardiovascular morbidity and mortality. Heart rate variability (HRV) is defined as the variation in heartbeat intervals and has been reported as a biological marker of cardiovascular disease. This article reviews the existing literature in order to examine the association between air pollution and HRV. Methods: Literature was searched using Web of Science with the key words "air pollution", "heart rate variability" and other related terms. A total of 156 articles were listed. For review, 21 of the listed publications were chosen after excluding chamber studies and studies of occupational environments, secondhand smoke and automobile exhaust. Results: Research methods employed in the publications were classified by type of participants (elderly/adult), air pollution monitoring (ambient/personal) and HRV monitoring (continuous/spot). Among HRV parameters, power in the low frequency range (LF), power in the high frequency range (HF) and standard deviation of all NN intervals (SDNN) were all associated with air pollutants. The chosen studies were mostly based on elderly populations. In studies based on continuous HRV monitoring, LF and SDNN significantly decreased when $PM_{2.5}$ exposure increased. Conclusion: Continuous HRV monitoring combined with personal exposure monitoring has been one of the most common study methods in recent publications. We expect that this review will be useful for the study of the association between air pollution and cardiovascular effects using HRV.

A Study on the Correlation between the Patterns of the Zone 4 of Factor AA in 7-Zone-diagnostic System and Heart Rate Variability (7구역진단기의 Factor AA 제4구역 유형과 심박변이도(HRV)와의 상관성 연구)

  • Yu, Jung-Suk;Cho, Yi-Hyun;Lee, Jin-Seok;Lee, Hui-Yong;Song, Beom-Yong
    • Journal of Acupuncture Research / v.25 no.4 / pp.71-80 / 2008
  • Objectives: The 7-zone-diagnostic system is a diagnostic device that predetermines bodily locations by measuring the energy of the body. This study investigated the relation between the different patterns of Zone 4 of Factor AA in the VEGA DFM 722 (VEGA, Germany) 7-zone-diagnostic system and heart rate variability. Methods: We formed three groups according to the Factor AA patterns of the VEGA DFM 722. In Group A, the red bar graph of zone 4 was higher than the normal range. In Group B, the red bar graph of zone 4 was within the normal range. In Group C, the red bar graph of zone 4 was lower than the normal range. We investigated how the indices of heart rate variability (HRV; LX-3202, LAXTHA, Korea) differed among the groups. Results: Complexity, HRV-index, RMSSD and SDSD values of Group B were higher than those of the other groups. pNN50 values of Group B were lower than those of the other groups. Ln(TP), Ln(VLF), Ln(LF) and Ln(HF) values of Group B were also higher than those of the other groups. Conclusions: We presumed that Group B was healthier than the other groups with respect to stress.

Relationship between Heart Rate Variability(HRV) and BDI, STAI and STAXI (심박변이도 지표에 나타난 자율신경 상태와 우울, 불안 및 분노 설문검사 척도 간의 상관성 평가)

  • Kim, Sang-Young;Seo, Hyun-Wook;Kim, Jong-Woo;Chung, Sun-Yong
    • Journal of Oriental Neuropsychiatry / v.22 no.4 / pp.87-100 / 2011
  • Objectives: This study aims to evaluate the relationship between HRV indices and scores of emotional questionnaires and to find an effective way to assess patients' emotional and physical condition. Methods: We selected 144 patients whose charts contained both HRV data and BDI, STAI and STAXI scores from among outpatients seen from July 2006 to December 2010. The relationship between the questionnaire scores and HRV indices was analyzed, and the HRV indices of patients in the top 30 percent group and the bottom 30 percent group were compared. Results: 1. There were no significant correlations between HRV indices and scores of BDI, STAI and trait anger of STAXI. 2. SDNN and TP of HRV significantly decreased with higher state anger scores of STAXI. The top 30 percent group of state anger had lower SDNN, TP, LF, HF and HRV-index and higher pNN50 than the bottom 30 percent group. 3. RMSSD of HRV significantly decreased with higher anger-in scores of STAXI. The top 30 percent group of anger-in had lower RMSSD than the bottom 30 percent group. Conclusions: HRV can be used to evaluate emotional and physical changes related to state anger and inappropriate anger expression.

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most classification research has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, and the Bayesian classifier and NNA (Neural Network Algorithm), which are known as statistics-based methods. However, these methods face space and time limitations when classifying the very large number of web pages on today's internet. Moreover, most classification studies use a uni-gram feature representation, which does not represent the real meaning of words well. Korean web page classification has the additional problem that Korean words often have multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in this environment (large datasets and polysemous words). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimensions. Through SVD, it is possible to create a new low-level semantic space for representing vectors, which can make classification efficient and reveal the latent meaning of words or documents (or web pages). Although LSA is good at classification, it has some drawbacks. When SVD reduces the dimensions of the matrix and creates the new semantic space, it considers which dimensions represent vectors well, not which dimensions discriminate between them well. This is why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA that selects the optimal dimensions for both discriminating and representing vectors, minimizing these drawbacks and improving performance. The proposed method shows better and more stable performance than other LSA variants in a low-dimensional space. In addition, we obtain further improvement in classification by creating and selecting features, removing stopwords and statistically weighting specific values.
