• Title/Summary/Keyword: principal component analysis (주성분분석)

Performance Enhancement of Android Malware Classification using PCA (주성분 분석을 활용한 안드로이드 악성코드 분류 성능 향상 방안)

  • Jeon, Dong-Ha; Lee, Soo-Jin
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.249-250 / 2022
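  • Research on malware detection and classification based on API calls has recently been very active. However, API-call-based data are inefficient to analyze and to build learning models on, because of their sheer volume and high-dimensional features. This study therefore takes the CICAndMal2020 dataset, which contains extensive API call information, and, instead of conventional feature selection techniques, uses principal component analysis (PCA) to reduce the dimensionality drastically before applying machine learning techniques for classification. In the experiments, the 9,503 features were reduced to 25 principal components (about 0.26% of the original number), and multi-class classification reached an accuracy of about 84%. Compared with detection models from previous studies, the approach not only improved performance measures such as accuracy and F1-score but also achieved a much better result in terms of dimensionality reduction.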
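
A minimal sketch of the pipeline the abstract describes (PCA down to 25 components, then a classifier), assuming NumPy only. The data are synthetic (2,000 features rather than the 9,503 CICAndMal2020 API-call features), and the nearest-centroid classifier is a hypothetical stand-in, since the abstract does not name the machine learning models that were used.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA by SVD of the centred data; return (mean, principal axes)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def transform(X, mean, axes):
    """Project the rows of X onto the retained principal axes."""
    return (X - mean) @ axes.T

def nearest_centroid_fit(Z, y):
    """Hypothetical stand-in classifier: one centroid per class."""
    classes = np.unique(y)
    centroids = np.array([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(Z, classes, centroids):
    # Squared Euclidean distance from every sample to every class centroid.
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

# Synthetic stand-in data: 4 classes, 150 samples each, 2,000 noisy features.
rng = np.random.default_rng(42)
n_classes, n_per_class, n_features = 4, 150, 2000
class_means = rng.normal(0.0, 1.0, size=(n_classes, n_features))
X = np.vstack([m + rng.normal(0.0, 3.0, size=(n_per_class, n_features))
               for m in class_means])
y = np.repeat(np.arange(n_classes), n_per_class)
perm = rng.permutation(len(y))
train, test = perm[:400], perm[400:]

# Fit PCA on the training split only, then reduce both splits to 25 components.
mean, axes = fit_pca(X[train], n_components=25)
Z_train = transform(X[train], mean, axes)
Z_test = transform(X[test], mean, axes)
classes, centroids = nearest_centroid_fit(Z_train, y[train])
accuracy = (nearest_centroid_predict(Z_test, classes, centroids) == y[test]).mean()
print(Z_train.shape, f"accuracy={accuracy:.3f}")
```

Fitting the PCA on the training split alone keeps test information out of the learned axes; any other classifier can be dropped in after `transform`.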

Sensitivity Analysis for Bivariate Spatial Data Using Principal Component Score (주성분점수를 이용한 이변량 공간자료에 대한 감도분석)

  • 최승배; 강창완
    • The Korean Journal of Applied Statistics / v.14 no.2 / pp.415-427 / 2001
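  • In spatial statistics, cokriging is used as a prediction method for multivariate spatial data. In this paper, instead of a sensitivity analysis of the cross-variogram estimate, which is the first step of cokriging, we propose a sensitivity analysis method based on principal component scores, taken from the perspective of general statistics. For the bivariate case, we compare the results of the sensitivity analysis of the cross-variogram with those of the proposed principal-component-score method. The validity of the proposed method is verified by simulation and reconfirmed by a case study using real data.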
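
The abstract does not spell out how the principal component scores drive the sensitivity analysis, so the sketch below, assuming NumPy, only illustrates the ingredients: principal component scores for a bivariate sample, plus a hypothetical leave-one-out measure of how far the PC1 axis turns when a single observation is dropped. It ignores spatial locations entirely and is neither the authors' procedure nor a cross-variogram computation.

```python
import numpy as np

def pc_axes_and_scores(X):
    """PCA of an (n, 2) sample; returns (scores, axes) with PC1 first."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    axes = eigvecs[:, np.argsort(eigvals)[::-1]]   # largest variance first
    return Xc @ axes, axes

def leave_one_out_pc1_angles(X):
    """Hypothetical sensitivity measure: angle (radians) by which the PC1 axis
    turns when observation i is removed from the sample."""
    _, axes = pc_axes_and_scores(X)
    v = axes[:, 0]
    angles = np.empty(len(X))
    for i in range(len(X)):
        _, axes_i = pc_axes_and_scores(np.delete(X, i, axis=0))
        cos = abs(v @ axes_i[:, 0])        # abs(): an axis has no preferred sign
        angles[i] = np.arccos(min(cos, 1.0))
    return angles

# Synthetic bivariate sample with one off-axis observation appended as row 49.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=49)
X = np.vstack([X, [4.0, -1.0]])
scores, axes = pc_axes_and_scores(X)
angles = leave_one_out_pc1_angles(X)
print(scores.shape, int(angles.argmax()))
```

Observations that score high on both components at once tilt the PC1 axis the most, which is why the appended point dominates the angle measure here.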

Classification and Selection of the Breeding materials in the Silkworm, Bombyx mori, by Multivariate Analysis 1. Classification of the Silkworm Genetic Stocks by Principal Component Analysis and Cluster Analysis (다변량 해석법에 의한 누에 육종소재의 탐색 1. 주성분분석과 집락분석을 이용한 누에품종분류)

  • 정도섭; 이인정
    • Journal of Sericultural and Entomological Science / v.31 no.2 / pp.102-112 / 1989
  • Principal component analysis and cluster analysis were performed on nine quantitative characters of 148 silkworm genetic stocks. Six major quantitative characters (cocoon yield, cocoon weight, cocoon shell weight, cocoon shell percentage, 5th-instar larval period, and total larval period) were significantly positively correlated with one another. The first three principal components extracted from the original nine variables accounted for about 80% of the original information. The first and second principal components were characterized as factors related to silk productivity and cocoon productivity, respectively. In a multivariate analysis using city block distances computed from the first three principal components to measure phenotypic diversity, the 148 silkworm genetic stocks could be clustered into seven varietal groups, and the phenotypic diversity between the varietal groups was partly related to their geographical origins. Among the seven varietal groups, groups II and IV showed higher silk and cocoon productivity.
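
A scaled-down sketch of the pipeline, assuming NumPy: PCA on standardized characters (i.e. on the correlation matrix, an assumption since the characters are measured in different units), the share of variance carried by the first three components, city block distances on the first three PC scores, and agglomerative clustering. Average linkage is a hypothetical choice because the abstract does not name the clustering criterion, and the data are 30 synthetic stocks in 3 groups rather than the 148 real stocks.

```python
import numpy as np

def pca_on_correlation(X):
    """PCA of standardized variables: returns (scores, explained-variance ratio)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Z @ eigvecs, eigvals / eigvals.sum()

def city_block(S):
    """Pairwise city block (L1) distances between the rows of S."""
    return np.abs(S[:, None, :] - S[None, :, :]).sum(axis=2)

def average_linkage(D, n_clusters):
    """Naive agglomerative clustering with average linkage on distance matrix D."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                           # b > a, so index a is unaffected
    labels = np.empty(len(D), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Synthetic stand-in: 30 stocks x 9 characters drawn from 3 latent groups.
rng = np.random.default_rng(7)
group_means = rng.normal(0.0, 3.0, size=(3, 9))
X = np.vstack([m + rng.normal(0.0, 0.5, size=(10, 9)) for m in group_means])
scores, ratio = pca_on_correlation(X)
print("first three PCs explain", round(float(ratio[:3].sum()), 3))
labels = average_linkage(city_block(scores[:, :3]), n_clusters=3)
print(labels)
```

The naive linkage loop is cubic in the number of stocks; for the full 148 stocks a library routine would be the practical choice.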

A Study on Characteristics of Highway Segments for Recreational Trips Using Principal Analysis (주성분분석을 이용한 고속도로의 여가성 도로구간 판별에 관한 연구)

  • Kim, Young-Il; Chung, Jin-Hyuk; Kum, Ki-Jung
    • Journal of Korean Society of Transportation / v.22 no.2 s.73 / pp.87-93 / 2004
  • The five-day work week has a great impact on the lifestyles of employed persons and their families. It also affects the transportation system, because travel patterns, demand, and congestion patterns change on weekends. These negative impacts on the transportation system should be examined in order to devise measures that maintain dependable levels of service on weekends. The first step is to identify the road segments most heavily affected by the increase in leisure trips. In this study, the characteristics of highway segments are derived by principal component analysis using data from the TCS database. The resulting scores are used to distinguish leisure-trip highway segments among the 197 segments considered in this study. In addition, indexes based on the principal component analysis are proposed for identifying leisure-trip highway segments.
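
A minimal sketch of ranking segments by their first principal component score, assuming NumPy. The three features and the 20-segment synthetic data are hypothetical stand-ins for the TCS-derived variables and the 197 segments, orienting PC1 so that the first feature loads positively is an assumed sign convention, and the authors' proposed indexes are not reproduced.

```python
import numpy as np

def pc1_scores(X, anchor=0):
    """PC1 scores of standardized X, sign-fixed so the loading on column
    `anchor` is positive (eigenvectors have an arbitrary sign)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    v = eigvecs[:, np.argmax(eigvals)]
    if v[anchor] < 0:
        v = -v
    return Z @ v, v

# Hypothetical features per segment: weekend/weekday volume ratio,
# Sunday-afternoon share of weekly traffic, passenger-car share.
rng = np.random.default_rng(3)
X = 1.0 + rng.normal(0.0, 0.3, size=(20, 3))
X[:5] += 2.0                         # segments 0-4 play the role of leisure segments
scores, loadings = pc1_scores(X)
ranking = np.argsort(scores)[::-1]   # highest PC1 score first
print(sorted(ranking[:5].tolist()))
```

Fixing the sign before ranking matters: without it, the same data could rank leisure segments first or last depending on the eigen-solver.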

Efficient Primary-Ambient Decomposition Algorithm for Audio Upmix (오디오 업믹스를 위한 효율적인 주성분-주변성분 분리 알고리즘)

  • Baek, Yong-Hyun; Jeon, Se-Woon; Lee, Seok-Pil; Park, Young-Cheol
    • Journal of Broadcast Engineering / v.17 no.6 / pp.924-932 / 2012
  • Decomposing a stereo signal into primary and ambient components is a key step in stereo upmixing, and it is often based on principal component analysis (PCA). However, a major shortcoming of the PCA-based method is that the accuracy of the decomposed components depends on both the primary-to-ambient power ratio (PAR) and the panning angle. A modified PCA was previously suggested to remove the PAR dependence, but its performance still depends on the panning angle of the primary signal. In this paper, we propose a new PCA-based primary-ambient decomposition algorithm whose performance is affected by neither the PAR nor the panning angle. The proposed algorithm finds scale factors based on a criterion that preserves the powers of the mixed components, so that the original primary and ambient powers are correctly retrieved. Simulation results are presented to show the effectiveness of the proposed algorithm.
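
The conventional PCA decomposition that the abstract starts from can be sketched as follows, assuming NumPy and a zero-mean stereo pair: the dominant eigenvector of the 2x2 inter-channel covariance gives the primary direction, the projection onto it is the primary estimate, and the residual is the ambient estimate. This is the baseline method, not the proposed scale-factor algorithm; the panning-angle convention and the synthetic test signal are assumptions.

```python
import numpy as np

def pca_primary_ambient(left, right):
    """Baseline PCA primary-ambient split of a zero-mean stereo pair."""
    x = np.vstack([left, right])                 # shape (2, n)
    cov = x @ x.T / x.shape[1]                   # 2x2 inter-channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    u = eigvecs[:, -1]                           # dominant (primary) direction
    primary = np.outer(u, u @ x)                 # rank-1 projection onto u
    ambient = x - primary                        # residual is treated as ambience
    panning = np.arctan2(abs(u[1]), abs(u[0]))   # assumed: 0 = hard left, pi/2 = hard right
    return primary, ambient, panning

# Synthetic test signal: one panned source plus uncorrelated, equal-power ambience.
rng = np.random.default_rng(0)
n, theta = 48_000, np.pi / 6                     # true panning angle: 30 degrees
source = rng.standard_normal(n)
ambience = 0.3 * rng.standard_normal((2, n))
left = np.cos(theta) * source + ambience[0]
right = np.sin(theta) * source + ambience[1]
primary, ambient, panning = pca_primary_ambient(left, right)
print(round(float(np.degrees(panning)), 1))
```

Because the ambient part lying along the primary direction is also assigned to the primary estimate, this leakage grows as the PAR drops, which is one way to see why the baseline method's accuracy depends on the PAR.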

Feature selection for text data via sparse principal component analysis (희소주성분분석을 이용한 텍스트데이터의 단어선택)

  • Won Son
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.501-514 / 2023
  • When analyzing high dimensional data such as text data, if we input all the variables as explanatory variables, statistical learning procedures may suffer from over-fitting problems. Furthermore, computational efficiency can deteriorate with a large number of variables. Dimensionality reduction techniques such as feature selection or feature extraction are useful for dealing with these problems. The sparse principal component analysis (SPCA) is one of the regularized least squares methods which employs an elastic net-type objective function. The SPCA can be used to remove insignificant principal components and identify important variables from noisy observations. In this study, we propose a dimension reduction procedure for text data based on the SPCA. Applying the proposed procedure to real data, we find that the reduced feature set maintains sufficient information in text data while the size of the feature set is reduced by removing redundant variables. As a result, the proposed procedure can improve classification accuracy and computational efficiency, especially for some classifiers such as the k-nearest neighbors algorithm.
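
The paper's SPCA uses an elastic net-type objective. The sketch below swaps in a simpler, related technique, a soft-thresholded power iteration for the first sparse loading vector, to show how an L1-type penalty sets loadings exactly to zero and thereby selects words. It assumes NumPy; the threshold `lam` and the synthetic document-term data (continuous, TF-IDF-like values) are hypothetical.

```python
import numpy as np

def sparse_pc1(X, lam, n_iter=200):
    """First sparse loading vector via soft-thresholded power iteration
    (a stand-in for elastic-net SPCA, not the same estimator)."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(Xc)                      # sample covariance of the words
    v = np.linalg.eigh(S)[1][:, -1]              # start from the ordinary PC1
    for _ in range(n_iter):
        w = S @ v
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)  # soft-threshold: exact zeros
        norm = np.linalg.norm(w)
        if norm == 0.0:                          # lam too large: every word dropped
            return w
        v = w / norm
    return v

# Synthetic stand-in for a document-term matrix:
# words 0-4 share one latent topic, the other 45 are independent noise.
rng = np.random.default_rng(5)
n_docs, n_words = 200, 50
topic = rng.standard_normal(n_docs)
X = 0.5 * rng.standard_normal((n_docs, n_words))
X[:, :5] += 3.0 * topic[:, None]
v = sparse_pc1(X, lam=1.0)
selected = np.flatnonzero(v)                     # indices of the retained words
print(selected)
```

Raising `lam` keeps fewer words and lowering it keeps more; the retained indices play the role of the reduced feature set.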

Nonlinear Feature Extraction using Class-augmented Kernel PCA (클래스가 부가된 커널 주성분분석을 이용한 비선형 특징추출)

  • Park, Myoung-Soo; Oh, Sang-Rok
    • Journal of the Institute of Electronics Engineers of Korea SC / v.48 no.5 / pp.7-12 / 2011
  • In this paper, we propose a new feature extraction method, named Class-augmented Kernel Principal Component Analysis (CA-KPCA), which can extract nonlinear features for classification. Among the subspace methods widely used for feature extraction, Class-augmented Principal Component Analysis (CA-PCA) is a recent one that can extract features for accurate classification without the computational difficulties of other methods such as Linear Discriminant Analysis (LDA). However, the features extracted by CA-PCA are still restricted to a linear subspace of the original data space, which limits the use of this method for the many problems that require nonlinear features. To resolve this limitation, we apply the kernel trick to develop a new version of CA-PCA that extracts nonlinear features, and we evaluate its performance in experiments on data sets from the UCI Machine Learning Repository.
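
The class-augmentation step of CA-KPCA is not described in the abstract, so the sketch below, assuming NumPy, shows only the kernel trick it builds on: plain kernel PCA, which centres the kernel matrix in feature space and projects onto its leading eigenvectors. The RBF kernel and its `gamma` value are hypothetical choices.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negatives from round-off

def kernel_pca(K, n_components):
    """Project the training points onto the top kernel principal components."""
    n = len(K)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centre the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[order], eigvecs[:, order]
    # For training points, Kc @ (alpha / sqrt(lam)) simplifies to alpha * sqrt(lam).
    return alpha * np.sqrt(lam)

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 4))

# Sanity check: with a linear kernel, kernel PCA reproduces ordinary PCA scores
# (up to the sign of each component).
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
linear_scores = kernel_pca(X @ X.T, n_components=2)
pca_scores = Xc @ vt[:2].T

# Nonlinear features with the RBF kernel (gamma chosen arbitrarily here).
rbf_scores = kernel_pca(rbf_kernel(X, gamma=0.5), n_components=3)
print(np.allclose(np.abs(linear_scores), np.abs(pca_scores)), rbf_scores.shape)
```

The linear-kernel comparison is a quick check that the centring and scaling are right; swapping in a nonlinear kernel is what lets the extracted features leave the linear subspace that limits CA-PCA.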

A Comparison of Multivariate R-Techniques in SAS, SPSS, Minitab and S-plus (SAS, SPSS, MINITAB, S-PLUS에서 다변량 R-기법의 비교)

  • 최용석; 문희정
    • The Korean Journal of Applied Statistics / v.17 no.1 / pp.153-164 / 2004
  • In this study, we compare multivariate R-techniques in the up-to-date versions of SAS, SPSS, Minitab and S-plus. The direct input method (typing in commands) is considered for SAS, while the menu-driven method is considered for SPSS, Minitab and S-plus. The comparison was made in terms of input data format, input options, charts and outputs.

A Multi-Resolution Distance Measure Using Grey Block Distance Algorithms for Principal Component Analysis (주성분분석에서의 제안된 GBD 알고리즘을 이용한 다중해상도 거리 측정)

  • Hong, Jun-Sik
    • Proceedings of the KIEE Conference / 2002.07d / pp.2671-2673 / 2002
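  • In this paper, we classify two-dimensional images using principal component analysis (PCA) and propose a new grey block distance (GBD) algorithm that, compared with the existing GBD algorithm at multiple resolutions, makes relative discrimination between two-dimensional images easier. At multiple resolutions, the proposed method improves on the existing GBD algorithm by not losing information in regions where the image changes abruptly. Simulation results confirmed that relative discrimination is easier than with the existing GBD algorithm.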

Image Classification Using Grey Block Distance Algorithms for Principal Component Analysis and Kurtosis (주성분분석과 첨도에서의 그레이 블록 거리 알고리즘을 이용한 영상분류)

  • Hong, Jun-Sik
    • Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.779-782 / 2002
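  • In this paper, we use the grey block distance (GBD) algorithm on principal component analysis (PCA) and on kurtosis to measure distances between images, and we examine through simulation how far this eases the relative discrimination between images and hence image classification. The simulation results showed that with PCA, relative discrimination became impossible at k = 9, whereas with kurtosis, blocks could be chosen only up to k = 4.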
