• Title/Summary/Keyword: Principal components analysis


Network Anomaly Detection using Hybrid Feature Selection

  • Kim Eun-Hye;Kim Se-Hun
    • Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 2006.06a / pp.649-653 / 2006
  • In this paper, we propose a hybrid feature extraction method that combines principal components analysis with an optimized k-means clustering technique. Our approach hierarchically reduces redundancy among the features with high explanatory power in principal components analysis in order to choose a good subset of features critical to improving classifier performance. Based on this result, we evaluate intrusion detection performance using a Support Vector Machine and a nonparametric approach based on k-Nearest Neighbor on data sets with reduced features. Experimental results on the KDD Cup 1999 dataset show several advantages in terms of computational complexity, and our method achieves a significant detection rate, which indicates that attacks can be detected successfully.


A Study to Calculate an Efficient Covariance Matrix of Non-local Means with Principal Components Analysis (주성분 분석을 활용한 Non-local means 에서의 효율적인 공분산 행렬 계산 연구)

  • Kim, Jeonghwan;Lee, Minjeong;Jeong, Jechang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2015.07a / pp.205-207 / 2015
  • This paper first introduces non-local means (NLM) based on principal components analysis (PCA) and then proposes an efficient way to compute the covariance matrix that PCA requires. When the neighborhood patch in NLM has size $S{\times}S=S^2$ and the image contains ${\mathcal{Q}}$ pixels in total, computing the covariance matrix requires multiplying matrices of size $S^2{\times}{\mathcal{Q}}$. We show that reducing the size of this matrix lowers the complexity of NLM without any loss in PSNR (peak signal-to-noise ratio).
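
As a worked illustration of the cost described in this abstract, the sample covariance of the patch matrix can be written out explicitly. The symbols $X$, $\mu$, and $C$ are notation assumed here for illustration; only $S$ and $\mathcal{Q}$ come from the abstract.

```latex
X \in \mathbb{R}^{S^2 \times \mathcal{Q}}, \qquad
\mu = \frac{1}{\mathcal{Q}} X \mathbf{1}, \qquad
C = \frac{1}{\mathcal{Q}}
    \left(X - \mu \mathbf{1}^{\top}\right)
    \left(X - \mu \mathbf{1}^{\top}\right)^{\top}
    \in \mathbb{R}^{S^2 \times S^2}
```

Forming $C$ directly takes on the order of $S^4 \mathcal{Q}$ multiplications, so the cost falls in proportion to any reduction in the number of columns used. The abstract does not state how the authors shrink the matrix.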


Pitching grade index in Korean pro-baseball (한국프로야구에서의 투수평가지표)

  • Lee, Jang Taek
    • Journal of the Korean Data and Information Science Society / v.25 no.3 / pp.485-492 / 2014
  • In baseball, the traditional measures of a pitcher are wins and ERA, but these statistics are influenced by luck and team strength. Sabermetricians have therefore proposed a number of indicators that predict future performance. We define a new measure, the pitching grade index (PGI), which efficiently summarizes a pitcher's performance on a numerical scale using principal components analysis. The PGI statistic can be useful for assessing a pitcher's individual contribution. The K-means clustering algorithm is also used to segment players into groups.
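
A minimal sketch of how a single PCA-based index of this kind could be computed, assuming the index is the first principal component score of standardized pitching statistics. The abstract does not give the actual PGI construction, so the choice of statistics, the sample values, and the helper names `standardize` and `first_component` are all hypothetical.

```python
import math

def standardize(rows):
    """Column-wise z-scores using the population standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n)
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def first_component(z, iters=500):
    """Leading eigenvector of the correlation matrix, by power iteration."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)]
            for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical pitchers: (ERA, WHIP, strikeouts per 9 innings).
stats = [
    (2.50, 1.05, 9.8),
    (3.40, 1.20, 8.1),
    (4.10, 1.35, 6.9),
    (5.20, 1.55, 5.5),
]

z = standardize(stats)
v = first_component(z)
# Orient the component so that a higher score means better pitching
# (positive loading on strikeouts per 9 innings).
if v[2] < 0:
    v = [-x for x in v]
scores = [sum(a * b for a, b in zip(row, v)) for row in z]
for row, s in zip(stats, scores):
    print(row, round(s, 3))
```

The sign of a principal component is arbitrary, which is why the sketch orients it explicitly before ranking.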

Clustering non-stationary advanced metering infrastructure data

  • Kang, Donghyun;Lim, Yaeji
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.225-238 / 2022
  • In this paper, we propose a clustering method for advanced metering infrastructure (AMI) data in Korea. Because AMI data are non-stationary, we consider time-dependent frequency domain principal components analysis, which is a suitable method for locally stationary time series data. We develop a new clustering method based on time-varying eigenvectors, and our method provides a meaningful result that differs from the clustering results obtained with conventional methods such as K-means and K-centres functional clustering. A simulation study demonstrates the superiority of the proposed approach. We further apply the clustering results to the evaluation of the electricity price system in South Korea and validate the reform of the progressive electricity tariff system.

Hybridization of Quercus aliena Blume and Q. serrata Murray in Korea - Analyses of Morphological variation and Flavonoid chemistry -

  • Park, Jin Hee;Park, Chong-Wook
    • Korean Journal of Environment and Ecology / v.29 no.2 / pp.145-161 / 2015
  • This study examined hybridization between Quercus aliena Blume and Q. serrata Murray in Korea, two species that show a wide range of morphological variation within species as well as overlapping interspecific characters caused by hybridization. A morphological analysis (principal components analysis; PCA) was performed on 116 individuals representing the two species and their intermediates. The two species were clearly distinguished morphologically, and the intermediate morphotypes assumed to be hybrids were mostly located between the two parent species in the PCA plot. The two species also differed clearly in trichome distribution pattern, an important diagnostic character in the taxonomy of the genus Quercus, whereas the intermediate morphotypes showed a state intermediate between the trichome distributions of the two species. Forty-two individuals representing the two species and their intermediates were examined for leaf flavonoid constituents. Twenty-three flavonoid compounds were isolated and identified; they were glycosylated derivatives of the flavonols kaempferol, quercetin, isorhamnetin, and myricetin. Q. aliena contained five glycosylated derivatives: kaempferol 3-O-galactoside, kaempferol 3-O-glucoside, quercetin 3-O-galactoside, quercetin 3-O-glucoside, and isorhamnetin 3-O-glucoside. Q. serrata contained 20 flavonol compounds, including the five found in Q. aliena. The flavonoid constituents of Q. aliena and Q. serrata thus differ clearly, and flavonoid chemistry is very useful for recognizing each species and putative hybrids. The flavonoid constituents of the intermediates were a mixture of those of the two species and generally corresponded to their morphotypes. The hybrids between Q. aliena and Q. serrata were morphologically and chemically diverse, which suggests frequent interspecific hybridization and introgression.

Multivariate Analysis and Gas Chromatographic Determination of the Smelly Nitro Compounds in Dried-Fishes (GC에 의한 건어물 냄새성분중 질소화합물 분석과 다변량해석)

  • Bae, Sun Young;Lee, Dong Sun
    • Journal of the Korean Chemical Society / v.41 no.2 / pp.105-112 / 1997
  • The smelly nitro compounds were extracted from dried fishes by simultaneous distillation and extraction and then analyzed by GC-MS. The carbon number and order of an amine could be predicted from its retention time and equivalent chain length. Anchovy, codfish, imitation crab meat, cuttle fish, file fish, pollack, shrimp, octopus, harvest fish, and hard-shelled mussel were investigated. Various smelly nitro compounds, such as methylamine, acetamide, thiazole, 2-hydroxy isopropylamine, N-methyl pyrroline, piperidine, and cyclohexylamine, were identified; however, dimethylamine, trimethylamine, and diethylamine were not detected. Principal components analysis was applied to the GC-MS profiles for pattern recognition of smelly nitro compounds in dried fishes. Multivariate analysis using principal components analysis was very useful for pattern recognition of smelly components and for assessing category similarity.


Application of Dimensional Expansion and Reduction to Earthquake Catalog for Machine Learning Analysis (기계학습 분석을 위한 차원 확장과 차원 축소가 적용된 지진 카탈로그)

  • Jang, Jinsu;So, Byung-Dal
    • The Journal of Engineering Geology / v.32 no.3 / pp.377-388 / 2022
  • Recently, several studies have used machine learning to analyze the exponentially increasing volume of seismic data efficiently and accurately. In this study, we expand earthquake information such as occurrence time, hypocentral location, and magnitude to produce a dataset for machine learning, and then reduce the dimension of the expanded data to its dominant features through principal components analysis. The dimensionally expanded data comprise statistics of the earthquake information from the Global Centroid Moment Tensor catalog, which contains 36,699 seismic events. We preprocess the data using standard and max-min scaling and extract dominant features from the scaled dataset with principal components analysis. The scaling methods significantly reduced the deviation of feature values caused by different units. Among them, the standard scaling method transforms the median of each feature with a smaller deviation than the other scaling methods. The six principal components extracted from the non-scaled dataset explain 99% of the original data. The sixteen principal components from the datasets to which standardization or max-min scaling was applied reconstruct 98% of the original datasets. These results indicate that more principal components are needed to preserve the original data information when feature values are evenly distributed. We propose a data processing method that supports efficient and accurate machine learning models for analyzing the relationship between seismic data and seismic behavior.
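
The effect described in this abstract, where scaled data need more principal components than unscaled data, can be sketched with just two features. The example below is an illustration under stated assumptions, not the authors' pipeline: the event values are invented, and the helper names `first_pc_share`, `standard_scale`, and `min_max_scale` are hypothetical. It uses the closed-form eigenvalues of a 2x2 covariance matrix, so it needs only the standard library.

```python
import math

def first_pc_share(xs, ys):
    """Share of total variance carried by the first principal component
    of two features, from the eigenvalues of their 2x2 covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    # Larger eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam1 = (trace + math.sqrt(max(trace ** 2 - 4 * det, 0.0))) / 2
    return lam1 / trace

def standard_scale(values):
    """Zero mean, unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def min_max_scale(values):
    """Rescale linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical events (illustrative values, not taken from the catalog).
depth_km = [10.0, 35.0, 70.0, 150.0, 300.0, 600.0]
magnitude = [5.1, 6.3, 5.0, 6.8, 5.5, 6.0]

raw_share = first_pc_share(depth_km, magnitude)
std_share = first_pc_share(standard_scale(depth_km), standard_scale(magnitude))
mm_share = first_pc_share(min_max_scale(depth_km), min_max_scale(magnitude))

print(f"raw: {raw_share:.5f}  standard: {std_share:.3f}  min-max: {mm_share:.3f}")
```

Without scaling, the feature with the largest numeric range (depth in km) dominates the first component almost entirely. After either scaling, the variance is spread more evenly across components, which is consistent with the abstract's observation that more components are then needed to reach the same coverage.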

Chemometric Aspects and Determination of Sugar Composition of Honey by HPLC (HPLC에 의한 꿀 중의 당조성 분석과 화학계량학적 고찰)

  • Yoon, Jung-Hyeon;Bae, Sun-Young;Kim, Kun;Lee, Dong-Sun
    • Analytical Science and Technology / v.10 no.5 / pp.362-369 / 1997
  • A chemometric technique was applied to the sugar composition, determined by HPLC, of five honeys of known botanical or geographical origin. Fructose and glucose were the predominant carbohydrates in the honeys, and a small amount of sucrose was also detected in one sample. Sugar contents in the honey samples were compared by geographical or botanical origin. The fructose/glucose ratio ranged from 0.99 to 1.55, in good agreement with literature values. The principal components analysis (PCA) plot showed that the honey samples grouped into distinct clusters by geographical or botanical origin. As the first or second principal component score increased, a higher sugar content or a lower fructose/glucose ratio was observed in the PCA plot. The chemometric approach was very useful for pattern recognition of the sugar profile or quality indices of honey samples and for detecting adulteration.


Analysis of Functional Connectivity in Human Working Memory using Positron Emission Tomography and Principal Component Analysis

  • Lee, J.S.;Ahn, J.Y.;Jang, M.J.;Lee, D.S.;Chung, J.K.;Lee, M.C.;Park, K.S.
    • Proceedings of the KOSOMBE Conference / v.1998 no.11 / pp.257-258 / 1998
  • To reveal the interconnected brain regions involved in human working memory, their functional connectivity was analyzed using principal component analysis (PCA). rCBF PET scans were performed on 5 normal volunteers during verbal and visual working memory tasks, and PCA was applied. PCA produced a first principal component related to increasing task difficulty and a second that demonstrates the dissociation of the verbal and visual memory systems.


Varietal Classification by Multivariate Analysis on Quantitative Traits in Pecan

  • Shin, Dong-Young;Nou, Ill-Sup
    • Plant Resources / v.2 no.2 / pp.75-80 / 1999
  • Twenty-two varieties of pecan, including wild types, were classified by principal component analysis (PCA) score distance based on 6 measured characters. The results are summarized as follows. The twenty-two varieties were classified into 5 groups based on PCA score distance. The five groups were distinctly characterized by many morphological characters. The total variation explained was 51%, 95%, and 99% with the first, third, and fifth principal components, respectively. Varimax rotation of the factor loadings indicated that the first component loaded highly on leaf characters and the second on fruit characters, although fruit length loaded negatively. The second, third, and fourth groups of cultivars were very similar in genetic parentage.
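
The percentages in this abstract increase with the component number, so they are most plausibly cumulative proportions of variance. A sketch of that quantity follows, where the notation $\lambda_i$ (eigenvalues of the correlation or covariance matrix, sorted in decreasing order) and $\mathrm{CPV}$ are assumptions introduced here, and $p = 6$ is the number of measured characters stated above.

```latex
\mathrm{CPV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i},
\qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0,
\qquad p = 6
```

Under this reading, the reported figures correspond to $\mathrm{CPV}(1) = 0.51$, $\mathrm{CPV}(3) = 0.95$, and $\mathrm{CPV}(5) = 0.99$.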
