• Title/Summary/Keyword: dimension reduction method

Search Result 251, Processing Time 0.024 seconds

On Optimizing Dissimilarity-Based Classifier Using Multi-level Fusion Strategies (다단계 퓨전기법을 이용한 비유사도 기반 식별기의 최적화)

  • Kim, Sang-Woon;Duin, Robert P. W.
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.5
    • /
    • pp.15-24
    • /
    • 2008
  • For high-dimensional classification tasks, such as face recognition, the number of samples is smaller than the dimensionality of the samples. In such cases, a problem encountered in linear discriminant analysis-based methods for dimension reduction is what is known as the small sample size (SSS) problem. Recently, to solve the SSS problem, a way of employing a dissimilarity-based classification(DBC) has been investigated. In DBC, an object is represented based on the dissimilarity measures among representatives extracted from training samples instead of the feature vector itself. In this paper, we propose a new method of optimizing DBCs using multi-level fusion strategies(MFS), in which fusion strategies are employed to represent features as well as to design classifiers. Our experimental results for benchmark face databases demonstrate that the proposed scheme achieves further improved classification accuracies.

A credit classification method based on generalized additive models using factor scores of mixtures of common factor analyzers (공통요인분석자혼합모형의 요인점수를 이용한 일반화가법모형 기반 신용평가)

  • Lim, Su-Yeol;Baek, Jang-Sun
    • Journal of the Korean Data and Information Science Society
    • /
    • v.23 no.2
    • /
    • pp.235-245
    • /
    • 2012
  • Logistic discrimination is an useful statistical technique for quantitative analysis of financial service industry. Especially it is not only easy to be implemented, but also has good classification rate. Generalized additive model is useful for credit scoring since it has the same advantages of logistic discrimination as well as accounting ability for the nonlinear effects of the explanatory variables. It may, however, need too many additive terms in the model when the number of explanatory variables is very large and there may exist dependencies among the variables. Mixtures of factor analyzers can be used for dimension reduction of high-dimensional feature. This study proposes to use the low-dimensional factor scores of mixtures of factor analyzers as the new features in the generalized additive model. Its application is demonstrated in the classification of some real credit scoring data. The comparison of correct classification rates of competing techniques shows the superiority of the generalized additive model using factor scores.

An Improved Robust Fuzzy Principal Component Analysis (잡음 민감성이 개선된 퍼지 주성분 분석)

  • Heo, Gyeong-Yong;Woo, Young-Woon;Kim, Seong-Hoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.5
    • /
    • pp.1093-1102
    • /
    • 2010
  • Principal component analysis (PCA) is a well-known method for dimension reduction while maintaining most of the variation in data. Although PCA has been applied to many areas successfully, it is sensitive to outliers. Several variants of PCA have been proposed to resolve the problem and, among the variants, robust fuzzy PCA (RF-PCA) demonstrated promising results. RF-PCA uses fuzzy memberships to reduce the noise sensitivity. However, there are also problems in RF-PCA and the convergence property is one of them. RF-PCA uses two different objective functions to update memberships and principal components, which is the main reason of the lack of convergence property. The difference between two functions also slows the convergence and deteriorates the solutions of RF-PCA. In this paper, a variant of RF-PCA, called RF-PCA2, is proposed. RF-PCA2 uses an integrated objective function both for memberships and principal components. By using alternating optimization, RF-PCA2 is guaranteed to converge on a local optimum. Furthermore, RF-PCA2 converges faster than RF-PCA and the solutions found are more similar to the desired solutions than those of RF-PCA. Experimental results also support this.

Supervised Rank Normalization for Support Vector Machines (SVM을 위한 교사 랭크 정규화)

  • Lee, Soojong;Heo, Gyeongyong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.11
    • /
    • pp.31-38
    • /
    • 2013
  • Feature normalization as a pre-processing step has been widely used in classification problems to reduce the effect of different scale in each feature dimension and error as a result. Most of the existing methods, however, assume some distribution function on feature distribution. Even worse, existing methods do not use the labels of data points and, as a result, do not guarantee the optimality of the normalization results in classification. In this paper, proposed is a supervised rank normalization which combines rank normalization and a supervised learning technique. The proposed method does not assume any feature distribution like rank normalization and uses class labels of nearest neighbors in classification to reduce error. SVM, in particular, tries to draw a decision boundary in the middle of class overlapping zone, the reduction of data density in that area helps SVM to find a decision boundary reducing generalized error. All the things mentioned above can be verified through experimental results.

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.3
    • /
    • pp.531-539
    • /
    • 2019
  • This paper proposes the training features for malware family analysis and analyzes the multi-classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use Random Forest and XGBoost algorithms which are based on decision tree. API, API-DLL, and DLL-CM features for malware detection and family classification are proposed by analyzing frequently used API and DLL information from malware and converting high-dimensional features to low-dimensional features. The proposed feature selection method provides the advantages of data dimension reduction and fast learning. In performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy of malware family dataset is 92.0% for XGBoost, and the false positive rate of malware family dataset including benign is about 3.5% for Random Forest and XGBoost.

2D-MELPP: A two dimensional matrix exponential based extension of locality preserving projections for dimensional reduction

  • Xiong, Zixun;Wan, Minghua;Xue, Rui;Yang, Guowei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.9
    • /
    • pp.2991-3007
    • /
    • 2022
  • Two dimensional locality preserving projections (2D-LPP) is an improved algorithm of 2D image to solve the small sample size (SSS) problems which locality preserving projections (LPP) meets. It's able to find the low dimension manifold mapping that not only preserves local information but also detects manifold embedded in original data spaces. However, 2D-LPP is simple and elegant. So, inspired by the comparison experiments between two dimensional linear discriminant analysis (2D-LDA) and linear discriminant analysis (LDA) which indicated that matrix based methods don't always perform better even when training samples are limited, we surmise 2D-LPP may meet the same limitation as 2D-LDA and propose a novel matrix exponential method to enhance the performance of 2D-LPP. 2D-MELPP is equivalent to employing distance diffusion mapping to transform original images into a new space, and margins between labels are broadened, which is beneficial for solving classification problems. Nonetheless, the computational time complexity of 2D-MELPP is extremely high. In this paper, we replace some of matrix multiplications with multiple multiplications to save the memory cost and provide an efficient way for solving 2D-MELPP. We test it on public databases: random 3D data set, ORL, AR face database and Polyu Palmprint database and compare it with other 2D methods like 2D-LDA, 2D-LPP and 1D methods like LPP and exponential locality preserving projections (ELPP), finding it outperforms than others in recognition accuracy. We also compare different dimensions of projection vector and record the cost time on the ORL, AR face database and Polyu Palmprint database. The experiment results above proves that our advanced algorithm has a better performance on 3 independent public databases.

A Design on Face Recognition System Based on pRBFNNs by Obtaining Real Time Image (실시간 이미지 획득을 통한 pRBFNNs 기반 얼굴인식 시스템 설계)

  • Oh, Sung-Kwun;Seok, Jin-Wook;Kim, Ki-Sang;Kim, Hyun-Ki
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.16 no.12
    • /
    • pp.1150-1158
    • /
    • 2010
  • In this study, the Polynomial-based Radial Basis Function Neural Networks is proposed as one of the recognition part of overall face recognition system that consists of two parts such as the preprocessing part and recognition part. The design methodology and procedure of the proposed pRBFNNs are presented to obtain the solution to high-dimensional pattern recognition problem. First, in preprocessing part, we use a CCD camera to obtain a picture frame in real-time. By using histogram equalization method, we can partially enhance the distorted image influenced by natural as well as artificial illumination. We use an AdaBoost algorithm proposed by Viola and Jones, which is exploited for the detection of facial image area between face and non-facial image area. As the feature extraction algorithm, PCA method is used. In this study, the PCA method, which is a feature extraction algorithm, is used to carry out the dimension reduction of facial image area formed by high-dimensional information. Secondly, we use pRBFNNs to identify the ID by recognizing unique pattern of each person. The proposed pRBFNNs architecture consists of three functional modules such as the condition part, the conclusion part, and the inference part as fuzzy rules formed in 'If-then' format. In the condition part of fuzzy rules, input space is partitioned with Fuzzy C-Means clustering. In the conclusion part of rules, the connection weight of pRBFNNs is represented as three kinds of polynomials such as constant, linear, and quadratic. Coefficients of connection weight identified with back-propagation using gradient descent method. The output of pRBFNNs model is obtained by fuzzy inference method in the inference part of fuzzy rules. The essential design parameters (including learning rate, momentum coefficient and fuzzification coefficient) of the networks are optimized by means of the Particle Swarm Optimization. The proposed pRBFNNs are applied to real-time face recognition system and then demonstrated from the viewpoint of output performance and recognition rate.

Prediction of Implicit Protein - Protein Interaction Using Optimal Associative Feature Rule (최적 연관 속성 규칙을 이용한 비명시적 단백질 상호작용의 예측)

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.4
    • /
    • pp.365-377
    • /
    • 2006
  • Proteins are known to perform a biological function by interacting with other proteins or compounds. Since protein interaction is intrinsic to most cellular processes, prediction of protein interaction is an important issue in post-genomic biology where abundant interaction data have been produced by many research groups. In this paper, we present an associative feature mining method to predict implicit protein-protein interactions of Saccharomyces cerevisiae from public protein interaction data. We discretized continuous-valued features by maximal interdependence-based discretization approach. We also employed feature dimension reduction filter (FDRF) method which is based on the information theory to select optimal informative features, to boost prediction accuracy and overall mining speed, and to overcome the dimensionality problem of conventional data mining approaches. We used association rule discovery algorithm for associative feature and rule mining to predict protein interaction. Using the discovered associative feature we predicted implicit protein interactions which have not been observed in training data. According to the experimental results, the proposed method accomplished about 96.5% prediction accuracy with reduced computation time which is about 29.4% faster than conventional method with no feature filter in association rule mining.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) have been published. The rapid increase in the number of papers related to COVID-19 is putting time and technical constraints on healthcare professionals and policy makers to quickly find important research. Therefore, in this study, we propose a method of extracting useful information from text data of extensive literature using LDA and Word2vec algorithm. Papers related to keywords to be searched were extracted from papers related to COVID-19, and detailed topics were identified. The data used the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic, updated weekly. The research methods are divided into two main categories. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of publications related to COVID-19 by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. LDA and Word2vec algorithm were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, and a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment' were extracted. did For each collected paper, detailed topics were analyzed using LDA and Word2vec algorithms, and a clustering method through PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy point from the results of this study is that the topics that were not derived from the topics derived for all papers being researched in relation to COVID-19 (

    ) were the topic modeling results for each research topic (
    ) was found to be derived from For example, as a result of topic modeling for papers related to 'vaccine', a new topic titled Topic 05 'neutralizing antibodies' was extracted. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, as a result of extracting topics from papers related to 'treatment', a new topic called Topic 05 'cytokine' was discovered. A cytokine storm is when the immune cells of our body do not defend against attacks, but attack normal cells. Hidden topics that could not be found for the entire thesis were classified according to keywords, and topic modeling was performed to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using the Skip-gram method that predicts the similar words as the central word among the Word2vec models. The combination of the LDA model and the Word2vec model tried to show better performance by identifying the relationship between the document and the LDA subject and the relationship between the Word2vec document. In addition, as a clustering method through PCA dimension reduction, a method for intuitively classifying documents by using the t-SNE technique to classify documents with similar themes and forming groups into a structured organization of documents was presented. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of academic papers related to COVID-19, it will reduce the precious time and effort of healthcare professionals and policy makers, and rapidly gain new insights. We hope to help you get It is also expected to be used as basic data for researchers to explore new research directions.

  • A Study on the Ultimate Strength Behavior for Ship Perforated Stiffened Plate (선체 유공보강판의 최종강도 거동에 관한 연구)

    • Ko Jae-Yong;Lee Jun-Kyo;Park Joo-Shin;Bae Dong-Kyun
      • Proceedings of KOSOMES biannual meeting
      • /
      • 2005.05a
      • /
      • pp.141-146
      • /
      • 2005
    • Ship have cutout inner bottom and girder and floor etc. Ship's structure is used much, and structure strength must be situated, but establish new concept when high stress interacts sometimes fatally the area. There is no big problem usually by aim of weight reduction, a person and change of freight, piping etc. Because cutout's existence grow up in this place, and, elastic buckling strength by load causes large effect in ultimate strength. Therefore, stiffened perforated plate considering buckling strength and ultimate strength is one of important design criteria which must examine when decide structural concept at initial design. Therefore, and, reasonable buckling strength about perforated stiffened plate need to ultimate strength limited design . Calculated ultimate strength varied several web height and cutout's dimension, and thickness in this investigated data. Used program(ANSYS) applied F.E.A code based on finite element method.

    • PDF

    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.