• Title/Summary/Keyword: Attribute subset selection


Determining Attributes of Suicide Attempts in Korean Elderly People: Emphasis on Attribute Selection Techniques

  • Bae, Eun Chan;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information, v.20 no.9, pp.11-20, 2015
  • To prevent suicide attempts among elderly people, it is necessary to identify the attributes that influence such attempts. Previous studies have relied on qualitative approaches and simple correlation analyses to determine these attributes, but such approaches have performed poorly on complex data sets. This study therefore proposes an alternative method that applies attribute selection techniques to identify the attributes most relevant to suicide attempts among elderly Koreans. To verify the empirical validity of the proposed method, we used data from the Korea National Health and Nutrition Examination Survey (KNHANES) collected from January 2007 to December 2012. The empirical results showed that the proposed attribute selection techniques achieved better predictive performance (94.4%) than the simple statistical methods. This study offers a way to identify the determinants of suicide among the elderly and to help prevent it.
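The abstract does not name the specific attribute selection techniques, so the following is only a generic sketch of one common filter approach, assuming information gain as the ranking criterion: each discrete attribute is scored by how much it reduces the entropy of the outcome, and the top-ranked attributes are kept. The attribute names and rows are invented for illustration and are not KNHANES data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Drop in label entropy after partitioning the rows by one discrete attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels, names):
    """Return (name, gain) pairs, most informative attribute first."""
    scores = [(name, information_gain([row[j] for row in rows], labels))
              for j, name in enumerate(names)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical survey attributes and outcomes (illustrative only, not KNHANES data).
names = ["depressed_mood", "lives_alone", "smoker"]
rows = [("yes", "yes", "no"), ("yes", "no", "yes"),
        ("no", "no", "no"), ("no", "yes", "yes")]
labels = ["attempt", "attempt", "none", "none"]
ranking = rank_attributes(rows, labels, names)
```

In this toy table only `depressed_mood` separates the outcomes, so it ranks first with a gain of 1 bit, while the other two attributes carry no information about the label.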

Improvement of Classification Accuracy on Success and Failure Factors in Software Reuse using Feature Selection (특징 선택을 이용한 소프트웨어 재사용의 성공 및 실패 요인 분류 정확도 향상)

  • Kim, Young-Ok;Kwon, Ki-Tae
    • KIPS Transactions on Software and Data Engineering, v.2 no.4, pp.219-226, 2013
  • Feature selection is one of the important issues in machine learning and pattern recognition. It is a technique for finding the subset of features in the source data that gives the best classification performance; that is, it extracts the subset most closely related to the purpose of the classification. In this paper, we experimented with selecting the best feature subset to improve classification accuracy when classifying the success and failure factors of software reuse, and we compared the results with those of existing studies. We found that the feature subset selected in this study yielded better classification accuracy.
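To make the idea of searching for a feature subset concrete, here is a minimal wrapper-style sketch, assuming sequential forward selection scored by leave-one-out 1-nearest-neighbour accuracy. Neither the search strategy nor the classifier is taken from the paper, and the two numeric features and labels are hypothetical.

```python
def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `features`."""
    if not features:
        return 0.0
    correct = 0
    for i, query in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if j == i:
                continue
            d = sum((query[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, n_features):
    """Greedy sequential forward selection (a wrapper method).

    Repeatedly adds the feature that gives the highest accuracy and stops
    as soon as no remaining feature improves on the current subset.
    """
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scores = {f: loo_1nn_accuracy(rows, labels, selected + [f]) for f in remaining}
        f = max(remaining, key=scores.get)  # ties go to the lower index
        if scores[f] <= best_score:
            break
        selected.append(f)
        remaining.remove(f)
        best_score = scores[f]
    return selected, best_score


# Hypothetical data: feature 0 is informative, feature 1 is noise.
rows = [(0.0, 5.0), (0.1, 1.0), (0.2, 3.0), (1.0, 1.2), (1.1, 4.8), (1.2, 2.9)]
labels = ["failure"] * 3 + ["success"] * 3
subset, accuracy = forward_selection(rows, labels, 2)
```

Feature 0 separates the classes on its own, so the search keeps it and stops once adding the noisy feature 1 fails to improve accuracy. A wrapper is costlier than a filter because every candidate subset retrains the classifier, but it evaluates subsets with the same model that will be used.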

The Role of Site Stickiness and Its Antecedents in a Social Commerce Environment (소셜커머스에서 사이트 밀착도의 역할과 선행 요인에 관한 연구)

  • Kim, Byoungsoo
    • Journal of Information Technology Services, v.12 no.3, pp.23-37, 2013
  • Social commerce is a subset of e-commerce that uses social media and user contributions to support the online buying and selling of products and services. Given the rapid growth of social commerce sites such as Groupon, Ticketmonster, and Coupang, it has become critical to understand how customers make purchasing decisions in the social commerce environment. This study developed a theoretical model to examine the role of a social commerce site's stickiness in customers' repurchase decisions. It identifies price attribute, variety of selection, shopping enjoyment, and anger as the key antecedents of a social commerce site's stickiness. Data collected from 164 users who had been purchasing through social commerce for more than seven months were empirically tested against the research model. The results indicate that site stickiness plays an important role in enhancing customers' purchasing behavior. Moreover, price attribute and shopping enjoyment significantly influence site stickiness, whereas anger does not significantly affect consumers' purchasing decisions. However, contrary to our expectations, variety of selection negatively influences site stickiness. The theoretical and practical implications of the findings are discussed.

Fuzzy discretization with spatial distribution of data and Its application to feature selection (데이터의 공간적 분포를 고려한 퍼지 이산화와 특징선택에의 응용)

  • Son, Chang-Sik;Shin, A-Mi;Lee, In-Hee;Park, Hee-Joon;Park, Hyoung-Seob;Kim, Yoon-Nyun
    • Journal of the Korean Institute of Intelligent Systems, v.20 no.2, pp.165-172, 2010
  • In clinical data mining, choosing the optimal subset of features is important not only to reduce computational complexity but also to improve the usefulness of the model constructed from the given data. Moreover, the threshold values (i.e., cut-off points) of the selected features serve as clinical decision criteria for experts in the differential diagnosis of diseases. In this paper, we propose a fuzzy discretization approach for continuous attributes, based on the spatial distribution of the data, which is evaluated by measuring the degree of separation of redundant attribute values in the overlapping region. The weighted average of the redundant attribute values is then used to determine the threshold value for each feature, and rough set theory is used to select a subset of relevant features from the full feature set. To verify the validity of the proposed method, we applied it to a classification problem involving 668 patients with a chief complaint of dyspnea and compared the results with those of three discretization methods (equal-width, equal-frequency, and entropy-based). The experimental results confirm that discretization with fuzzy partitions outperforms discretization with hard partitions on two evaluation measures: average classification accuracy and G-mean.
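The two simplest hard-partition baselines named above can be sketched as follows. The code shows only equal-width and equal-frequency cut points for a single continuous attribute (the entropy-based baseline and the proposed fuzzy discretization are not reproduced), and the sample values are invented.

```python
def equal_width_cuts(values, k):
    """k-1 cut points splitting [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """k-1 cut points placing (roughly) the same number of values in each interval."""
    s = sorted(values)
    n = len(s)
    # Cut midway between the last value of one bin and the first value of the next.
    return [(s[i * n // k - 1] + s[i * n // k]) / 2 for i in range(1, k)]


def discretize(value, cuts):
    """0-based index of the interval that `value` falls into."""
    return sum(value > c for c in cuts)


values = [0, 1, 2, 3, 10, 20]  # hypothetical skewed attribute
width_cuts = equal_width_cuts(values, 2)
freq_cuts = equal_frequency_cuts(values, 2)
```

On this skewed sample the single equal-width cut lands at 10.0, pulled toward the outlier, whereas the equal-frequency cut lands at 2.5 and follows the data density. Either way every value is assigned to exactly one interval, which is what makes these partitions "hard".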

Real-time Classification of Internet Application Traffic using a Hierarchical Multi-class SVM

  • Yu, Jae-Hak;Lee, Han-Sung;Im, Young-Hee;Kim, Myung-Sup;Park, Dai-Hee
    • KSII Transactions on Internet and Information Systems (TIIS), v.4 no.5, pp.859-876, 2010
  • In this paper, we propose a hierarchical application traffic classification system as an alternative that overcomes the limitations of port-number-based and payload-based methods, the traditional approaches to traffic classification. The proposed system is a new classification model that hierarchically combines a binary SVM classifier and Support Vector Data Descriptions (SVDDs). It selects an optimal attribute subset from the bidirectional traffic flows generated by our traffic analysis system (KU-MON), which enables real-time collection and analysis of campus traffic. The system is composed of three layers. The first layer is a binary SVM classifier that rapidly separates P2P from non-P2P traffic. The second layer classifies P2P traffic into file-sharing, messenger, and TV, based on three SVDDs. The third layer performs specialized classification of each individual application traffic type. Because the proposed system supports both coarse- and fine-grained classification, it can guarantee efficient resource management, such as a stable network environment, seamless bandwidth guarantees, and appropriate QoS. Moreover, when a new application emerges, the system can easily be adapted through incremental updating and scaling: only the new part of the application traffic needs additional training, rather than the entire system. The performance of the proposed system is validated through experiments confirming that its recall and precision are satisfactory.
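The layered routing and the cheap incremental update can be illustrated with a simplified sketch. It rests on loudly stated assumptions: layer 1 is any binary predicate (the paper uses an SVM), each SVDD is replaced by a crude hypersphere (mean center, maximum-distance radius) rather than the quadratic-programming solution, layer 3 is omitted, and the two-dimensional feature vectors are hypothetical.

```python
from math import dist


class Hypersphere:
    """Crude stand-in for an SVDD: center = mean of the training points,
    radius = largest distance from the center. A real SVDD solves a quadratic
    program (usually with a kernel); this sketch does not."""

    def __init__(self, points):
        dim = len(points[0])
        self.center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        self.radius = max(dist(p, self.center) for p in points) or 1.0

    def score(self, x):
        """Distance to the center relative to the radius; <= 1 means inside."""
        return dist(x, self.center) / self.radius


class HierarchicalClassifier:
    """Layer 1: binary P2P / non-P2P decision. Layer 2: one descriptor per P2P category."""

    def __init__(self, is_p2p):
        self.is_p2p = is_p2p  # any binary classifier; the paper uses an SVM here
        self.spheres = {}

    def add_category(self, name, points):
        # Only the new category's descriptor is fitted; existing ones are untouched.
        self.spheres[name] = Hypersphere(points)

    def classify(self, x):
        if not self.is_p2p(x):
            return "non-P2P"
        best = min(self.spheres, key=lambda name: self.spheres[name].score(x))
        return best if self.spheres[best].score(x) <= 1.0 else "unknown P2P"


# Hypothetical layer-1 rule and training flows (illustrative only).
clf = HierarchicalClassifier(is_p2p=lambda x: x[0] > 0.5)
clf.add_category("file-sharing", [(0.9, 0.1), (0.8, 0.2), (1.0, 0.0)])
clf.add_category("messenger", [(0.9, 0.9), (0.8, 1.0), (1.0, 0.8)])
before = clf.classify((0.9, 0.5))
clf.add_category("TV", [(0.9, 0.5), (0.85, 0.55), (0.95, 0.45)])
after = clf.classify((0.9, 0.5))
```

A flow that falls outside every sphere is reported as unknown until a matching category is added; adding the TV category fits one new descriptor and leaves the existing two unchanged, mirroring the claim that only the new part needs training.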

The Generation of Control Rules for Data Mining (데이터 마이닝을 위한 제어규칙의 생성)

  • Park, In-Kyoo
    • Journal of Digital Convergence, v.11 no.11, pp.343-349, 2013
  • Rough set theory derives optimal rules in data mining by effectively selecting features from highly redundant information, using the concepts of equivalence relations and approximation spaces. Attribute reduction is one of the most important parts of rough set applications. This paper defines an information-theoretic measure, based on rough entropy, for determining the most important attribute within an association of attributes. The proposed method generates an effective reduct set and forms the core of the attribute set by eliminating redundant attributes. Control rules are then generated from a feature subset that, after the reduction, retains the accuracy of the original features.
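The reduct-and-core idea can be sketched with classical rough set machinery. Note the swap: instead of the paper's rough-entropy measure, this sketch scores attribute sets by the positive-region dependency degree gamma and builds a reduct greedily, which does not guarantee a minimal reduct. The decision table (imagined columns: headache, temperature, cough) is a hypothetical toy example.

```python
def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs), as lists of row indices."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(classes.values())


def dependency(rows, decision, attrs):
    """gamma = |POS_attrs(D)| / |U|: fraction of rows in decision-consistent classes."""
    pos = sum(len(block) for block in partition(rows, attrs)
              if len({decision[i] for i in block}) == 1)
    return pos / len(rows)


def greedy_reduct(rows, decision, n_attrs):
    """Add the attribute that raises gamma most until gamma matches the full set."""
    target = dependency(rows, decision, list(range(n_attrs)))
    reduct = []
    while dependency(rows, decision, reduct) < target:
        candidates = [a for a in range(n_attrs) if a not in reduct]
        reduct.append(max(candidates, key=lambda a: dependency(rows, decision, reduct + [a])))
    return reduct


def core(rows, decision, n_attrs):
    """Attributes whose removal lowers gamma (they belong to every reduct)."""
    full = list(range(n_attrs))
    target = dependency(rows, decision, full)
    return [a for a in full
            if dependency(rows, decision, [b for b in full if b != a]) < target]


# Hypothetical decision table: columns 0 and 2 carry the same information.
rows = [("yes", "high", "no"), ("yes", "normal", "no"),
        ("no", "high", "yes"), ("no", "normal", "yes")]
decision = ["flu", "none", "flu", "none"]
```

Here column 1 alone determines the decision, so the greedy reduct and the core are both `[1]`; columns 0 and 2 are redundant and would be dropped before rule generation.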

A Feature Selection Method Based on Fuzzy Cluster Analysis (퍼지 클러스터 분석 기반 특징 선택 방법)

  • Rhee, Hyun-Sook
    • The KIPS Transactions:PartB, v.14B no.2, pp.135-140, 2007
  • Feature selection is a preprocessing technique commonly applied to high-dimensional data. It studies how to select a subset or list of attributes for constructing models that describe the data. Feature selection methods attempt to explore the data's intrinsic properties by employing statistics or information theory. Recent developments include approaches such as correlation methods, dimensionality reduction, and mutual information techniques. Feature selection has become the focus of much research in application areas with massive and complex data sets. In this paper, we propose a feature selection method that considers data characteristics and generalization capability. It provides a computational approach to feature selection based on fuzzy cluster analysis of attribute values and performance measures. We apply it to a computer virus classification system and compare it with a heuristic method based on the contrast concept. The experimental results show that the proposed approach can produce a feature ranking, select features, and improve system performance.
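The abstract does not say which fuzzy clustering algorithm the paper uses. A common choice is fuzzy c-means, which gives each value a degree of membership in every cluster instead of a single hard label. The sketch below runs standard fuzzy c-means on the values of one attribute, assuming fuzzifier m = 2 and an evenly spread initialisation; how the paper turns memberships into a feature ranking is not described, so that step is left out, and the sample values are invented.

```python
def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy c-means on one attribute.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which values[i] belongs to cluster k (each row sums to 1).
    """
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the value range.
    centers = [lo + (k + 0.5) * (hi - lo) / c for k in range(c)]
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        memberships = []
        for x in values:
            d = [abs(x - v) for v in centers]
            if 0.0 in d:
                # x sits exactly on a center: give it full membership there.
                hit = d.index(0.0)
                memberships.append([1.0 if k == hit else 0.0 for k in range(c)])
            else:
                # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
                memberships.append([1.0 / sum((dk / dj) ** p for dj in d) for dk in d])
        # v_k = sum_i u_ik^m x_i / sum_i u_ik^m
        new = []
        for k in range(c):
            w = [row[k] ** m for row in memberships]
            new.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers, memberships


values = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]  # hypothetical attribute values
centers, memberships = fuzzy_c_means_1d(values)
```

The two centers settle near 1.0 and 5.0, and values between the groups would receive split memberships; that soft overlap is the kind of information a fuzzy approach can exploit when judging how well an attribute separates the data.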

An In-depth Analysis on Traffic Flooding Attacks Detection using Association Rule Mining (연관관계규칙을 이용한 트래픽 폭주 공격 탐지의 심층 분석)

  • Jaehak Yu;Bongsu Kang;Hansung Lee;Jun-Sang Park;Myung-Sup Kim;Daihee Park
    • Proceedings of the Korea Information Processing Society Conference, 2008.11a, pp.1563-1566, 2008
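  • In this paper, as a data preprocessing step, we performed feature selection and reduction on SNMP MIB data using attribute subset selection. We also used association rule mining, a representative interpretive analysis model in data mining, to carry out an in-depth semantic analysis in which the characteristics inherent in the SNMP MIB data of traffic flooding attacks, broken down by attack type, were extracted and analyzed in the form of rules. Extracting and analyzing pattern rules for each attack type provides a policy basis for restricting and managing service only for the protocol on which an attack occurred, thereby supporting a more stable network environment and smoother resource management. The method presented in this paper, which automatically extracts feature rules from traffic flooding attack data for each attack type and interprets them semantically, shows promise as a source of momentum for new intrusion detection methodologies and is expected to support policy making for intrusion detection and response systems.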
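Association rule mining scores a rule X -> y by its support (the fraction of records containing every item in the rule) and its confidence (the rule's support divided by the support of X). A minimal brute-force sketch, not Apriori, is shown below. It assumes the SNMP MIB counters have already been discretized into items such as `tcpInSegs=high` (the counter names are MIB-II objects, but the `=high`/`=normal` encoding is hypothetical) and that each record carries an `attack=` label item; the records themselves are invented.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force association rules X -> y (single-item consequent, |X| <= 2)."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of records that contain every item of `itemset`.
        return sum(itemset <= t for t in sets) / n

    items = sorted(set().union(*sets))
    rules = []
    for size in (2, 3):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue
            for y in combo:
                lhs = itemset - {y}
                conf = s / support(lhs)
                if conf >= min_confidence:
                    rules.append((tuple(sorted(lhs)), y, s, conf))
    return rules


# Hypothetical discretized SNMP MIB records with an attack label item.
records = [
    {"tcpInSegs=high", "tcpAttemptFails=high", "attack=SYN"},
    {"tcpInSegs=high", "tcpAttemptFails=high", "attack=SYN"},
    {"icmpInMsgs=high", "attack=ICMP"},
    {"tcpInSegs=normal", "icmpInMsgs=normal", "attack=none"},
]
rules = mine_rules(records, min_support=0.5, min_confidence=0.9)
# Keep only rules that predict an attack type.
attack_rules = [r for r in rules if r[1].startswith("attack=")]
```

Filtering on the consequent turns the general rule set into per-attack-type patterns, such as "high `tcpInSegs` and high `tcpAttemptFails` imply a SYN flood", which is the form of rule the abstract describes using as a basis for protocol-specific policy.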

Hierarchical Internet Application Traffic Classification using a Multi-class SVM (다중 클래스 SVM을 이용한 계층적 인터넷 애플리케이션 트래픽의 분류)

  • Yu, Jae-Hak;Lee, Han-Sung;Im, Young-Hee;Kim, Myung-Sup;Park, Dai-Hee
    • Journal of the Korean Institute of Intelligent Systems, v.20 no.1, pp.7-14, 2010
  • In this paper, we introduce an SVM-based hierarchical internet application traffic classification system as an alternative that overcomes the principal limitations of conventional methods based on port numbers or payload information. After selecting an optimal attribute subset of the bidirectional traffic flow data collected on campus, the proposed system classifies internet application traffic hierarchically. The system is composed of three layers: the first layer quickly separates P2P from non-P2P traffic using an SVM; the second layer classifies P2P traffic into file-sharing, messenger, and TV, based on three SVDDs; and the third layer performs fine-grained classification of all 16 application traffic types. By classifying internet application traffic at both fine and coarse granularity, the proposed system can guarantee efficient system resource management, a stable network environment, seamless bandwidth, and appropriate QoS. In addition, even when traffic from a new application is added, the system supports incremental updating and scalability by training only a new SVDD, without retraining the whole system. We validate the performance of our approach with computer experiments.

Application of Decision Tree for the Classification of Antimicrobial Peptide

  • Lee, Su Yeon;Kim, Sunkyu;Kim, Sukwon S.;Cha, Seon Jeong;Kwon, Young Keun;Moon, Byung-Ro;Lee, Byeong Jae
    • Genomics & Informatics, v.2 no.3, pp.121-125, 2004
  • The purpose of this study was to investigate the use of decision trees for classifying antimicrobial peptides. The classification was based on the activities of known antimicrobial peptides against common microbes, including Escherichia coli and Staphylococcus aureus. Feature selection was employed to select an effective subset of features from the available attribute sets. Sequential applications of decision trees, one with 17 nodes and 9 leaves and one with 13 nodes and 7 leaves, yielded classification rates of 76.74% and 74.66% against E. coli and S. aureus, respectively. The angle subtended by the positively charged face and the positive charge gave higher accuracies in both the E. coli and S. aureus datasets. In this study, we describe a successful application of decision trees that provides insight into the effects of the physicochemical characteristics of peptides on bacterial membranes.
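The abstract does not say which decision tree algorithm was used. As an assumption, the sketch below shows how a CART-style tree chooses a split on one numeric feature: it tries a threshold between each pair of adjacent distinct values and keeps the one that minimises the weighted Gini impurity of the two children. The feature values (imagined here as net positive charge) and activity labels are invented.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Threshold t minimising weighted Gini of {x <= t} vs {x > t}.

    Returns (threshold, impurity); threshold is None if all values are equal.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[0]:
            best = (impurity, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best[1], best[0]


# Hypothetical peptides: net positive charge vs. activity against a microbe.
charge = [-1, 0, 1, 4, 5, 6]
activity = ["inactive"] * 3 + ["active"] * 3
threshold, impurity = best_split(charge, activity)
```

A full tree applies this search to every feature at every node and recurses on the children; features that repeatedly win such splits, like the positive-charge features highlighted above, are the ones a tree effectively selects.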