• Title/Summary/Keyword: data selection


Efficient variable selection method using conditional mutual information (조건부 상호정보를 이용한 분류분석에서의 변수선택)

  • Ahn, Chi Kyung;Kim, Donguk
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1079-1094 / 2014
  • In this paper, we study efficient gene selection methods based on conditional mutual information. We propose gene selection methods that estimate conditional mutual information semiparametrically, using the multivariate normal distribution and the Edgeworth approximation. Using SVM classification, we compare the proposed methods with a mutual information filter, SVM-RFE, and the gene selection method of Cai et al. (2009) (MIGS-original). The experiments show that the proposed methods perform better than the mutual information filter. Furthermore, they take far less computing time than the method of Cai et al. (2009) while achieving similar performance.
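
The multivariate normal ingredient mentioned in this abstract can be illustrated with a small sketch. It assumes three jointly normal scalar variables X, Y, Z, for which the conditional mutual information has the closed form I(X; Y | Z) = -0.5 * ln(1 - r^2), where r is the partial correlation of X and Y given Z. The function name `gaussian_cmi` and the correlation values are hypothetical, and this is not the authors' semiparametric estimator, which also involves a class label and an Edgeworth correction.

```python
import math

def gaussian_cmi(r_xy, r_xz, r_yz):
    """I(X; Y | Z) in nats for jointly normal scalars X, Y, Z.

    Inputs are pairwise Pearson correlations (assumed to form a valid
    correlation matrix with |r_xz| < 1 and |r_yz| < 1).
    """
    # Partial correlation of X and Y after removing the linear effect of Z.
    partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    return -0.5 * math.log(1 - partial**2)

# Hypothetical correlations: Z unrelated to X and Y, so conditioning changes nothing.
print(gaussian_cmi(0.5, 0.0, 0.0))
# Hypothetical correlations: the X-Y correlation is fully explained by Z.
print(gaussian_cmi(0.25, 0.5, 0.5))
```

In a greedy selection loop, a candidate variable with a large value of this quantity given the already selected variables would carry information that those variables do not.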

Selection of Data-adaptive Polynomial Order in Local Polynomial Nonparametric Regression

  • Jo, Jae-Keun
    • Communications for Statistical Applications and Methods / v.4 no.1 / pp.177-183 / 1997
  • A data-adaptive order selection procedure is proposed for local polynomial nonparametric regression. For each given polynomial order, bias and variance are estimated, and the polynomial order with the smallest estimated mean squared error is selected locally at each location point. To estimate the mean squared error, the empirical bias estimate of Ruppert (1995) and the local polynomial variance estimate of Ruppert, Wand, Holst and Hossjer (1995) are used. Since the proposed method does not require fitting a polynomial model of order higher than the model order, it is simpler than the order selection method proposed by Fan and Gijbels (1995b).
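
The selection rule in this abstract (pick the order whose estimated mean squared error, squared bias plus variance, is smallest at a given point) can be sketched as follows. The function name `select_order` and the bias and variance numbers are hypothetical placeholders; in the paper these quantities come from the cited bias and variance estimators, which are not reproduced here.

```python
def select_order(bias_by_order, var_by_order):
    """Return the polynomial order with the smallest estimated MSE at one point.

    Both arguments map a candidate order to its estimated bias or variance
    at that point; estimated MSE = bias^2 + variance.
    """
    mse = {p: bias_by_order[p] ** 2 + var_by_order[p] for p in bias_by_order}
    best = min(mse, key=mse.get)
    return best, mse

# Hypothetical estimates at a single location point for orders 0, 1, 2.
bias = {0: 0.30, 1: 0.10, 2: 0.02}
var = {0: 0.01, 1: 0.02, 2: 0.06}

best_order, mse = select_order(bias, var)
print(best_order, mse)
```

With these made-up numbers the low-bias order 2 loses to order 1 because its variance dominates, which is the trade-off the local selection is meant to capture. Repeating the call at each location point gives an order that varies with x.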


A Study on Hybrid Feature Selection in Intrusion Detection System (침입탐지시스템에서 하이브리드 특징 선택에 관한 연구)

  • Han Myeong-Muk
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.279-282 / 2006
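  • As network-based computer systems play an increasingly indispensable role in modern society, they have become targets for intruders. The Intrusion Detection System (IDS), which protects them, has accordingly become an important technology. For an IDS to analyze patterns and then judge and predict whether activity is normal or abnormal, the initial stage of feature extraction or selection is very important. In this paper, feature selection, an important part of an IDS, is implemented by applying two data mining techniques, the Genetic Algorithm (GA) and the Decision Tree (DT).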


An expert system approach to the type selection of part feeders (부품공급장치 선정을 위한 전문가 시스템)

  • 조덕영;조형석
    • 제어로봇시스템학회:학술대회논문집 / 1989.10a / pp.296-301 / 1989
  • As a cornerstone of assembly automation, automatic part feeders are used to feed various kinds of parts to the assembly workstation in the desired order and fashion. In this paper, an expert system is developed that consists of a database of feeding functions and part properties plus a rule base for the selection of feeder types. The symbolic data on part properties are used as the basic factors in the rules for selecting suitable feeder types.


Bayesian Model Selection in Analysis of Reciprocals

  • Kang, Sang-Gil;Kim, Dal-Ho;Cha, Young-Joon
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.1167-1176 / 2005
  • Tweedie (1957a) proposed a method for the analysis of residuals from an inverse Gaussian population paralleling the analysis of variance in normal theory. He called it the analysis of reciprocals. In this paper, we propose a Bayesian model selection procedure based on the fractional Bayes factor for the analysis of reciprocals. We compare the proposed model selection procedure with the classical tests.


Bandwidth Selection for Local Smoothing Jump Detector

  • Park, Dong-Ryeon
    • Communications for Statistical Applications and Methods / v.16 no.6 / pp.1047-1054 / 2009
  • The local smoothing jump detection procedure is a popular method for detecting jump locations, and the performance of the jump detector depends heavily on the choice of bandwidth. However, little work has been done on this issue. In this paper, we propose a bootstrap bandwidth selection method that can be used with any kernel-based or local polynomial-based jump detector. The proposed bandwidth selection method is fully data-adaptive, and its performance is evaluated through a simulation study and a real data example.

General Set Covering for Feature Selection in Data Mining

  • Ma, Zhengyu;Ryoo, Hong Seo
    • Management Science and Financial Engineering / v.18 no.2 / pp.13-17 / 2012
  • Set covering has widely been accepted as a staple tool for feature selection in data mining. We present a generalized version of this classical combinatorial optimization model to make it better suited for the purpose and propose a surrogate relaxation-based procedure for its meta-heuristic solution. Mathematically and also numerically with experiments on 25 set covering instances, we demonstrate the utility of the proposed model and the proposed solution method.
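
For readers unfamiliar with the classical model this abstract starts from, here is a minimal sketch of set covering for feature selection on binary data, solved with the textbook greedy heuristic. It is not the generalized model or the surrogate relaxation-based procedure proposed in the paper. The function name `greedy_feature_cover` and the toy data are hypothetical. Each feature "covers" the (positive, negative) example pairs on which its value differs, and the goal is a small set of features covering every separable pair.

```python
from itertools import product

def greedy_feature_cover(rows, labels):
    """Greedily pick features until every separable (positive, negative)
    pair of examples differs on at least one chosen feature."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_features = len(rows[0])
    # covers[f] = pairs of examples that feature f tells apart.
    covers = [
        {(i, j) for i, j in product(pos, neg) if rows[i][f] != rows[j][f]}
        for f in range(n_features)
    ]
    # Only pairs that some feature can separate need to be covered.
    uncovered = set().union(*covers)
    chosen = []
    while uncovered:
        # Take the feature that separates the most still-uncovered pairs.
        best = max(range(n_features), key=lambda f: len(covers[f] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen

# Hypothetical toy data: feature 0 is constant and separates nothing.
rows = [(1, 0, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1)]
labels = [1, 1, 0, 0]
print(greedy_feature_cover(rows, labels))
```

The greedy rule gives no optimality guarantee beyond the usual logarithmic approximation bound for set cover, which is one reason exact or relaxation-based methods are studied for this model.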

Variable Bandwidth Selection for Kernel Regression

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.5 no.1 / pp.11-20 / 1994
  • In recent years, nonparametric kernel estimators of the regression function have become abundant and are widely applicable to many areas of statistics. Most modern research is concerned with selecting a fixed global bandwidth, which estimates the regression function using the same value for all x. In this paper, we propose a bootstrap-based method for selecting a locally varying bandwidth in kernel estimation of fixed design regression. The finite-sample performance of the proposed bandwidth selection method is examined via a Monte Carlo simulation study.


Language-Independent Sentence Boundary Detection with Automatic Feature Selection

  • Lee, Do-Gil
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1297-1304 / 2008
  • This paper proposes a machine learning approach for language-independent sentence boundary detection. The proposed method requires no heuristic rules or language-specific features, such as part-of-speech information, a list of abbreviations, or proper names. Using only language-independent features, we perform experiments on both an inflectional language and an agglutinative language with fairly different characteristics (in this paper, English and Korean, respectively), and we obtain good performance in both languages. We have also experimented with the method under a wide range of experimental conditions, especially for the selection of useful features.


Selection of Geospatial Features for Location Guidance Map Generation

  • Kakinohana, Issei;Nie, Yoshinori;Nakamura, Morikazu;Miyagi, Hayao;Onaga, Kenji
    • Proceedings of the IEEK Conference / 2000.07b / pp.1107-1110 / 2000
  • This paper proposes a selection procedure for geospatial data in a location guidance map generation system. The selection procedure requires some targets appointed by users as input data and outputs generation. The procedure is embedded in a prototype of an object-oriented GIS. We show sample maps generated by the system.
