• Title/Summary/Keyword: Subset selection

Real-time Classification of Internet Application Traffic using a Hierarchical Multi-class SVM

  • Yu, Jae-Hak;Lee, Han-Sung;Im, Young-Hee;Kim, Myung-Sup;Park, Dai-Hee
    • KSII Transactions on Internet and Information Systems (TIIS) / v.4 no.5 / pp.859-876 / 2010
  • In this paper, we propose a hierarchical application traffic classification system as an alternative that overcomes the limitations of the port-number-based and payload-based methods traditionally used for traffic classification. The proposed system is a new classification model that hierarchically combines a binary SVM classifier with Support Vector Data Descriptions (SVDDs). It selects an optimal attribute subset from the bi-directional traffic flows generated by our traffic analysis system (KU-MON), which enables real-time collection and analysis of campus traffic. The system is composed of three layers. The first layer is a binary SVM classifier that performs rapid classification between P2P and non-P2P traffic. The second layer classifies P2P traffic into file-sharing, messenger, and TV, based on three SVDDs. The third layer performs specialized classification of all individual application traffic types. Because the proposed system enables both coarse- and fine-grained classification, it can guarantee efficient resource management, such as a stable network environment, seamless bandwidth guarantees, and appropriate QoS. Moreover, when a new application emerges, the system can be updated and scaled incrementally: only additional training for the new application traffic is needed, rather than retraining the entire system. Experiments validate the proposed system and confirm that its recall and precision are satisfactory.

Stress Detection of Railway Point Machine Using Sound Analysis (소리 정보를 이용한 철도 선로전환기의 스트레스 탐지)

  • Choi, Yongju;Lee, Jonguk;Park, Daihee;Lee, Jonghyun;Chung, Yongwha;Kim, Hee-Young;Yoon, Sukhan
    • KIPS Transactions on Software and Data Engineering / v.5 no.9 / pp.433-440 / 2016
  • Railway point machines act as actuators that provide different routes to trains by driving switchblades from the current position to the opposite one. Since point failure can significantly affect railway operations, with potentially disastrous consequences, early stress detection in point machines is critical for monitoring and managing the condition of rail infrastructure. In this paper, we propose a stress detection method for point machines in railway condition monitoring systems using sound data. The system extracts a sound feature vector from audio data, reduces its dimensionality using feature subset selection, and employs support vector machines (SVMs) for early detection of stress anomalies. Experimental results show that the system enables cost-effective detection of stress using a low-cost microphone, with accuracy exceeding 98%.

Feature Selection for Anomaly Detection Based on Genetic Algorithm (유전 알고리즘 기반의 비정상 행위 탐지를 위한 특징선택)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.9 no.7 / pp.1-7 / 2018
  • Feature selection, a data preprocessing technique, is a major research area in many applications that deal with large datasets. It has been used in pattern recognition, machine learning, and data mining, and is now widely applied in a variety of fields such as text classification, image retrieval, intrusion detection, and genome analysis. The proposed method is based on a genetic algorithm, which is a meta-heuristic algorithm. There are two approaches to finding feature subsets: the filter method and the wrapper method. In this study, we use a wrapper method, which evaluates feature subsets using a real classifier, to find an optimal feature subset. The training dataset used in the experiment has a severe class imbalance, which makes it difficult to improve classification performance for rare classes. After preprocessing the training dataset with SMOTE, we select features and evaluate them with various machine learning algorithms.
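
The wrapper idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy dataset, the leave-one-out 1-nearest-neighbour classifier used as the "real classifier", the per-feature penalty, and the GA settings (`pop_size`, `generations`, one-point crossover, single-bit mutation). SMOTE preprocessing is omitted.

```python
import random


def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the kept features."""
    feats = [i for i, keep in enumerate(mask) if keep]
    if not feats:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in feats),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Wrapper fitness: classifier accuracy minus a small cost per selected feature."""
    return loo_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size=8, generations=20, seed=0):
    """Evolve 0/1 feature masks; the better half of each generation survives."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [tuple([1] * n)] + [
        tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = list(a[:cut] + b[cut:])
            child[rng.randrange(n)] ^= 1       # flip one bit as mutation
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))


# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 8.0, 3.0],
     [1.0, 4.9, 8.8], [1.1, 1.2, 1.1], [1.2, 7.9, 3.2]]
y = [0, 0, 0, 1, 1, 1]
best_mask = ga_select(X, y)
```

Because the all-features mask is seeded into the initial population and the best individuals always survive, the returned mask never scores worse than using every feature under this fitness function.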

A Note on Finding Optimum Conditions Using Mixture Experimental Data with Process Variables (공정변수를 갖는 혼합물 실험 자료를 활용한 최적조건 찾기에 관한 소고)

  • Lim, Yong B.
    • Journal of Korean Society for Quality Management / v.41 no.1 / pp.109-118 / 2013
  • Purpose: Given several proper models for experimental data on mixture components and process variables, we propose a strategy for finding the optimal condition under which the responses perform well across those models. Methods: Given mixture experimental data with process variables, we first choose reasonable starting models from the class of admissible product models based on model selection criteria, and then search for candidate models that are subset models of the starting model using either a sequential variable selection method or the all-possible-regressions procedure. Good candidate models are screened by evaluating model selection criteria and by checking residual plots for the validity of the model assumptions. Results: We propose a strategy for finding the optimal condition under which the responses perform well across those good candidate models, adopting optimization methods developed in multiple response surface methodology. Conclusion: The proposed strategy finds the optimal condition under which the responses perform well under those proper combined models, and it is illustrated with an example in the paper.

Fuzzy discretization with spatial distribution of data and Its application to feature selection (데이터의 공간적 분포를 고려한 퍼지 이산화와 특징선택에의 응용)

  • Son, Chang-Sik;Shin, A-Mi;Lee, In-Hee;Park, Hee-Joon;Park, Hyoung-Seob;Kim, Yoon-Nyun
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.2 / pp.165-172 / 2010
  • In clinical data mining, choosing the optimal subset of features is important not only to reduce computational complexity but also to improve the usefulness of the model constructed from the given data. Moreover, the threshold values (i.e., cut-off points) of the selected features are used by experts as clinical decision criteria for the differential diagnosis of diseases. In this paper, we propose a fuzzy discretization approach based on the spatial distribution of data with continuous attributes; the approach is evaluated by measuring the degree of separation of redundant attribute values in the overlapping region. The weighted average of the redundant attribute values is then used to determine the threshold value for each feature, and rough set theory is used to select a subset of relevant features from the full feature set. To verify the validity of the proposed method, we compared it with three discretization methods (equal-width, equal-frequency, and entropy-based) on a classification problem involving 668 patients with a chief complaint of dyspnea. The experimental results confirm that discretization methods with a fuzzy partition give better results than those with a hard partition on two evaluation measures, average classification accuracy and G-mean.
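
Two of the baseline methods named in this abstract, equal-width and equal-frequency discretization, can be sketched in a few lines. This is only an illustration of the hard-partition baselines, not the proposed fuzzy approach; the function names and the sample values are hypothetical.

```python
def equal_width_cuts(values, bins):
    """Cut points that split [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]


def equal_frequency_cuts(values, bins):
    """Cut points that place roughly the same number of samples in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // bins] for i in range(1, bins)]


def discretize(value, cuts):
    """Index of the interval that `value` falls into (a hard partition)."""
    return sum(value >= c for c in cuts)


# Hypothetical measurements of one continuous attribute, with one outlier.
samples = [1.0, 2.0, 2.5, 3.0, 4.0, 10.0]
width_cuts = equal_width_cuts(samples, 3)      # [4.0, 7.0]
freq_cuts = equal_frequency_cuts(samples, 3)   # [2.5, 4.0]
labels = [discretize(v, freq_cuts) for v in samples]  # [0, 0, 1, 1, 2, 2]
```

With the outlier at 10.0, the equal-width cut points leave the middle interval empty, while the equal-frequency cut points put two samples in each bin. Both are hard partitions: a value just below a cut point and a value just above it receive different labels.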

Maximum Simplex Volume based Landmark Selection for Isomap (최대 부피 Simplex 기반의 Isomap을 위한 랜드마크 추출)

  • Chi, Junhwa
    • Korean Journal of Remote Sensing / v.29 no.5 / pp.509-516 / 2013
  • Since traditional linear feature extraction methods are unable to handle the nonlinear characteristics often exhibited in hyperspectral imagery, nonlinear feature extraction, also known as manifold learning, is receiving increased attention in the hyperspectral remote sensing community and beyond. Isomap, one of the most widely used manifold learning methods, generally yields promising results in classification and spectral unmixing tasks, but its high computational overhead is problematic, especially for large-scale remotely sensed data. Using a small subset of distinguishing points, referred to as landmarks, has been proposed as a solution. This study proposes a new robust and controllable landmark selection method based on the maximum volume of the simplex spanned by the landmarks. Experiments compare classification accuracy (with standard deviation) and processing time across sampling methods and numbers of landmarks. The proposed method can achieve both classification accuracy and computational efficiency.
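
One simple way to approximate maximum-simplex-volume selection is a greedy rule: each new landmark is the point farthest from the affine hull of the landmarks already chosen, since the volume of the enlarged simplex is proportional to that height. The sketch below assumes this greedy rule and a start at the point farthest from the centroid; neither is confirmed as the paper's algorithm, and the function names and sample points are hypothetical.

```python
import math


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual(vec, basis):
    """Part of `vec` orthogonal to every vector of an orthonormal `basis`."""
    for b in basis:
        coef = dot(vec, b)
        vec = [a - coef * c for a, c in zip(vec, b)]
    return vec


def select_landmarks(points, k):
    """Greedily pick up to k point indices that span a large simplex."""
    centroid = [sum(c) / len(points) for c in zip(*points)]
    first = max(
        range(len(points)),
        key=lambda i: dot(sub(points[i], centroid), sub(points[i], centroid)),
    )
    chosen, basis = [first], []
    origin = points[first]
    while len(chosen) < k:
        # Height of each remaining point above the hull of the chosen landmarks.
        heights = {}
        for i in range(len(points)):
            if i not in chosen:
                r = residual(sub(points[i], origin), basis)
                heights[i] = math.sqrt(dot(r, r))
        best = max(heights, key=heights.get)
        if heights[best] < 1e-12:
            break  # remaining points lie in the hull; the volume cannot grow
        r = residual(sub(points[best], origin), basis)
        basis.append([a / heights[best] for a in r])
        chosen.append(best)
    return chosen


# Hypothetical 2-D points; the triangle (4,0), (0,3), (0,0) has the largest area.
points = [(0, 0), (1, 0), (0, 1), (0.2, 0.2), (4, 0), (0, 3)]
landmarks = select_landmarks(points, 3)  # indices [4, 5, 0]
```

The greedy rule does not guarantee the globally largest simplex in general, but it avoids enumerating every subset of points.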

Reference Node Selection Scheme for Estimating Relative Locations of Mobile Robots (이동 로봇의 상대위치 추정을 위한 기준노드 선택 기법)

  • Ha, Taejin;Kim, Sunyong;Park, Sun Young;Kwon, Daehoon;Ham, Jaehyun;Lim, Hyuk
    • Journal of the Korea Institute of Military Science and Technology / v.19 no.4 / pp.508-516 / 2016
  • When GPS signals are not available, relative localization can be used instead to represent the topological relationship between mobile nodes. A relative location map of a network can be constructed from the distance information between all pairs of nodes in the network. If the network is large, a number of small local maps are constructed individually and then merged to obtain the whole map. However, this approach may result in high computation and communication overhead. In this paper, we propose a reference-node selection scheme for relative localization map construction, which chooses a subset of nodes as the reference nodes that construct local maps. The scheme is a greedy algorithm that iteratively chooses high-degree nodes as reference nodes until the chosen local maps can be successfully merged, with a sufficient number of common nodes between nearby local maps. The simulation results indicate that the proposed scheme achieves higher localization accuracy with reduced computational overhead.
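
The greedy scheme described here can be sketched as follows. Several details are assumptions made for illustration and are not stated in the abstract: a local map is taken to be a reference node plus its one-hop neighbours, two maps count as mergeable when they share at least `min_common` nodes, and selection stops once every node is covered and all chosen maps can be chained together.

```python
def local_map(graph, ref):
    """Nodes in the local map built by `ref`: itself plus its neighbours."""
    return {ref} | graph[ref]


def mergeable(maps, min_common):
    """True if all local maps can be chained together through shared nodes."""
    refs = list(maps)
    seen, stack = {refs[0]}, [refs[0]]
    while stack:
        cur = stack.pop()
        for other in refs:
            if other not in seen and len(maps[cur] & maps[other]) >= min_common:
                seen.add(other)
                stack.append(other)
    return len(seen) == len(refs)


def select_reference_nodes(graph, min_common=2):
    """Add nodes in order of decreasing degree until the maps cover and merge."""
    candidates = sorted(graph, key=lambda n: (-len(graph[n]), n))
    maps = {}
    for node in candidates:
        maps[node] = local_map(graph, node)
        covered = set().union(*maps.values())
        if covered == set(graph) and mergeable(maps, min_common):
            break
    # If the stopping condition is never met, every node ends up as a reference.
    return list(maps)


# Hypothetical connectivity graph (undirected adjacency sets).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"a", "c", "e", "f"},
    "e": {"c", "d", "f"},
    "f": {"d", "e"},
}
references = select_reference_nodes(graph)  # ['c', 'd']
```

In this example two reference nodes suffice: the local maps of `c` and `d` together cover all six nodes and share four common nodes, so they can be merged into one map.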

Model based Facial Expression Recognition using New Feature Space (새로운 얼굴 특징공간을 이용한 모델 기반 얼굴 표정 인식)

  • Kim, Jin-Ok
    • The KIPS Transactions:PartB / v.17B no.4 / pp.309-316 / 2010
  • This paper introduces a new model-based method for facial expression recognition that uses facial grid angles as the feature space. To recognize the six main facial expressions, the proposed method uses a grid approach and establishes a new feature space based on the angles formed by each grid's edges and vertices. The approach is robust against several affine transformations, such as translation, rotation, and scaling, which in other approaches are considered very harmful to the overall accuracy of a facial expression recognition algorithm. The paper also demonstrates how the feature space is created from angles and how a wrapper approach is applied to select a feature subset within this space. The selected features are classified with SVM and 3-NN classifiers, and the classification results are validated with two-tier cross validation. The proposed method shows a classification result of 94%, and the feature selection algorithm improves results by up to 10% over the full feature set.

Theoretical Analysis of MIMO Antenna Selection & Switching System to Spatial Channel Correlation using Channel Statistics (공간적 채널 상관도에 따른 통계적인 채널 특성을 이용한 다중 안테나 선택 및 스위칭 시스템의 성능 분석)

  • Lee Hakju;Park Seungil;Lee Chungyong;Park Hyuncheol;Hong Daesik
    • Journal of the Institute of Electronics Engineers of Korea TC / v.42 no.4 s.334 / pp.15-20 / 2005
  • Multiple-input multiple-output (MIMO) systems suffer from spatial channel correlation due to a lack of spatial diversity. To overcome this defect, an antenna selection and switching system has been proposed, which selects the antenna subset with the highest channel diversity gain and switches the transmission technique according to the channel environment. However, its performance analysis is insufficient because the spatial channel correlation is difficult to model. In this paper, a theoretical upper bound on the symbol error probability is derived using the statistical properties of the Frobenius norm and the minimum eigenvalue of the channel matrix. Computer simulation shows that the derived theoretical upper bound is close to the simulation results.
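
As a rough illustration of antenna subset selection, the sketch below picks the transmit antennas (columns of the channel matrix) whose sub-channel has the largest squared Frobenius norm. Using the Frobenius norm as the selection criterion is an assumption here; the abstract only says the subset with the highest channel diversity gain is chosen. The function names and the channel matrix are hypothetical.

```python
from itertools import combinations


def frobenius_sq(H, cols):
    """Squared Frobenius norm of the sub-matrix of H restricted to `cols`."""
    return sum(abs(row[c]) ** 2 for row in H for c in cols)


def select_antennas(H, k):
    """Exhaustively search for the k transmit antennas with the largest norm."""
    n_tx = len(H[0])
    return max(combinations(range(n_tx), k), key=lambda cols: frobenius_sq(H, cols))


# Hypothetical 2x3 channel matrix: 2 receive antennas, 3 transmit antennas.
H = [
    [1 + 1j, 0.1 + 0j, 2 + 0j],
    [0 + 1j, 0.2 + 0j, 0 - 2j],
]
best = select_antennas(H, 2)  # (0, 2)
```

For the Frobenius norm the search could simply keep the k strongest columns, because the squared norm is a sum of per-column energies. The exhaustive loop is kept so that the criterion can be swapped for one that does not decompose by column, such as the minimum eigenvalue mentioned in the abstract.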

Plant breeding in the 21st century: Molecular breeding and high throughput phenotyping

  • Sorrells, Mark E.
    • Proceedings of the Korean Society of Crop Science Conference / 2017.06a / pp.14-14 / 2017
  • The discipline of plant breeding is experiencing a renaissance that is impacting crop improvement as a result of new technologies; however, fundamental questions remain about predicting the phenotype and how the environment and genetics shape it. Inexpensive DNA sequencing, genotyping, new statistical methods, high throughput phenotyping, and gene editing are revolutionizing breeding methods and strategies for improving both quantitative and qualitative traits. Genomic selection (GS) models use genome-wide markers to predict performance for both phenotyped and non-phenotyped individuals. Aerial and ground imaging systems generate data on correlated traits, such as canopy temperature and normalized difference vegetation index, that can be combined with genotypes in multivariate models to further increase prediction accuracy and reduce the cost of advanced trials with limited replication in time and space. The design of a GS training population is crucial to the accuracy of prediction models and can be affected by many factors, including population structure and composition. Prediction models can incorporate performance over multiple environments and assess GxE effects to identify a highly predictive subset of environments. We have developed a methodology for analyzing unbalanced datasets using genome-wide marker effects to group environments and identify outlier environments. Environmental covariates can be identified using a crop model and used in a GS model to predict GxE in unobserved environments and to predict performance in climate change scenarios. These new tools and knowledge challenge plant breeders to ask the right questions and choose the tools that are appropriate for their crop and target traits. Contemporary plant breeding requires teams of people with expertise in genetics, phenotyping, and statistics to improve efficiency and increase prediction accuracy in terms of genotypes, experimental design, and environment sampling.
