• Title/Summary/Keyword: kernel principal component analysis

Search results: 61

DR-LSTM: Dimension reduction based deep learning approach to predict stock price

  • Ah-ram Lee;Jae Youn Ahn;Ji Eun Choi;Kyongwon Kim
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.213-234 / 2024
  • In recent decades, increasing research attention has been directed toward predicting stock prices in financial markets using deep learning methods. For instance, recurrent neural networks (RNNs) are known to be competitive on time-series data. Long short-term memory (LSTM) further improves on RNNs by providing an alternative approach to the gradient loss problem, and it gains predictive accuracy by retaining memory for a longer time. In this paper, we combine both supervised and unsupervised dimension reduction methods with LSTM to enhance forecasting performance and refer to this as a dimension reduction based LSTM (DR-LSTM) approach. As supervised dimension reduction methods, we use sliced inverse regression (SIR), sparse SIR, and kernel SIR. As unsupervised dimension reduction methods, we use principal component analysis (PCA), sparse PCA, and kernel PCA. Using datasets of real stock market indices (S&P 500, STOXX Europe 600, and KOSPI), we present a comparative study of predictive accuracy between six DR-LSTM methods and time series modeling.

Study on Faults Diagnosis of Induction Motor Using KPCA Feature Extraction Technique (KPCA 특징추출기법을 이용한 유도전동기 결함 진단 연구)

  • Han, Sang-Bo;Hwang, Don-Ha;Kang, Dong-Sik
    • Proceedings of the KIEE Conference / 2007.07a / pp.1063-1064 / 2007
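  • This study discusses the results of applying an algorithm that uses signals from a flux sensor mounted inside a test motor, with the aim of developing a diagnosis system for induction motors, and describes the fault classification accuracy of each classifier. Features were extracted with Kernel Principal Component Analysis (KPCA), and test samples were classified with LDA (Linear Discriminant Analysis) and k-NN (k-Nearest Neighbors). For broken rotor bars and eccentricity (dynamic/static), both classifiers achieved a high classification accuracy of 95% or more, but LDA showed low classification rates for the normal condition, faulty bearings, and shaft deformation.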
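
The KPCA feature extraction step named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the RBF kernel, the `gamma` value, the function name `rbf_kernel_pca`, and the synthetic two-ring data are all assumptions made here for the example. The resulting features would then be passed to a classifier such as LDA or k-NN.

```python
import numpy as np

def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Project the rows of X onto the leading kernel principal components."""
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # RBF (Gaussian) kernel matrix; the kernel choice is an assumption.
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))

    # Center the kernel matrix, which centers the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.maximum(eigvals[order], 0.0)
    eigvecs = eigvecs[:, order]

    # Scores of the training samples: unit eigenvector times sqrt(eigenvalue).
    return eigvecs * np.sqrt(eigvals)

# Hypothetical data: two concentric rings standing in for two signal classes.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=60)
radii = np.repeat([1.0, 3.0], 30)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

features = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print(features.shape)
```

Because the kernel matrix is centered before the eigendecomposition, each extracted feature has zero mean over the training samples, and different features are mutually orthogonal.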


Development of integrated drought index(IDI) using remote sensing data and multivariate model (원격탐사자료와 다변량 통계모형을 활용한 통합가뭄지수 개발)

  • Park, Seo-Yeon;Kim, Jong-Suk;Kim, Tae-Woong;Lee, Joo-Heon
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.359-359 / 2020
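  • Currently, drought monitoring information in Korea is produced and provided in various forms, with meteorological, agricultural, and hydrological droughts each developed as separate indices. Because each drought index is analyzed according to its own criteria and characteristics, drought experts benefit from very precise drought information, whereas the general public has difficulty receiving and understanding it. An integrated drought map that can be understood at a glance is therefore needed, and producing such a map requires an integrated drought index. In this study, remote sensing data were used to develop the Agricultural Dry Condition Index (ADCI), an agricultural drought index, and the Water Budget-based Drought Index (WBDI), a hydrological drought index. Together with the Standardized Precipitation Index (SPI), a meteorological drought index, these were combined into an integrated meteorological-agricultural-hydrological drought index. Because the index is built from several drought indices, multivariate statistical models were applied: Principal Component Analysis (PCA) as a linear model, and Kernel Entropy PCA and Kernel PCA as nonlinear models. In addition, to validate the computed integrated drought index, its applicability was assessed by evaluating the onset timing, severity, and recession patterns of past drought events and by a per-index sensitivity analysis using Intentionally Biased Bootstrap Resampling (IBBR).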


Morphological Characteristics and Classification of Zizyphus Cultivars in Korea by Multivariative Analysis (다변량 분석에 의한 국내산 대추나무 품종의 형태적 특성과 유연관계)

  • Lee Moon-Ho;Hwang Suk-In;Jang Yong-Seok
    • Korean Journal of Plant Resources / v.19 no.1 / pp.105-111 / 2006
  • This study analyzed the fruit and leaf morphological characteristics of five Zizyphus cultivars to support cultivar classification and to provide information for drawing up the UPOV TG (Test Guidelines). ANOVA tests showed statistically significant differences among the five Zizyphus cultivars in all fruit and leaf morphological characteristics at the 1% level, whereas differences in kernel characteristics were not statistically significant. Broadly, the Wolchul cultivar showed larger values and the Boeun cultivar smaller values across the characteristics. Principal component analysis (PCA) of the fruit and leaf morphological characteristics showed that the first four principal components (PCs) explained about 65.3% of the total variation. The first PC was correlated mainly with the terminal leaf length (TLL), leaf length (LL), fruit length (FL), terminal leaf width (TLW), and leaf petiole length (LPL). The second and third PCs were mainly correlated with the terminal leaf morphological index (TLMI). These characteristics were therefore important for analyzing the fruit and leaf morphology of the five Zizyphus cultivars and for classifying them. Cluster analysis using the UPGMA method based on the principal components showed that the five Zizyphus cultivars could be clustered into two groups: Group I comprises the Mudung, Wolchul, Bokjo, and Geumsung cultivars, and Group II the Boeun cultivar. These results were similar to those of the principal component analysis.

Supervised-learning-based algorithm for color image compression

  • Liu, Xue-Dong;Wang, Meng-Yue;Sa, Ji-Ming
    • ETRI Journal / v.42 no.2 / pp.258-271 / 2020
  • A correlation exists between luminance samples and chrominance samples of a color image. It is beneficial to exploit such interchannel redundancy for color image compression. We propose an algorithm that predicts chrominance components Cb and Cr from the luminance component Y. The prediction model is trained by supervised learning with Laplacian-regularized least squares to minimize the total prediction error. Kernel principal component analysis mapping, which reduces computational complexity, is implemented on the same point set at both the encoder and decoder to ensure that predictions are identical at both ends without signaling extra location information. In addition, chrominance subsampling and entropy coding for model parameters are adopted to further reduce the bit rate. Finally, luminance information and model parameters are stored for image reconstruction. Experimental results show the performance superiority of the proposed algorithm over its predecessor and JPEG, and even over JPEG-XR. The compensation version with the chrominance difference of the proposed algorithm performs close to and even better than JPEG2000 in some cases.

Context Dependent Fusion with Support Vector Machines (Support Vector Machine을 이용한 문맥 민감형 융합)

  • Heo, Gyeongyong
    • Journal of the Korea Society of Computer and Information / v.18 no.7 / pp.37-45 / 2013
  • Context dependent fusion (CDF) is a fusion algorithm that combines multiple outputs from different classifiers to achieve better performance. CDF tries to divide the problem context into several homogeneous sub-contexts and to fuse data locally with respect to each sub-context. CDF showed better performance than existing methods; however, it is sensitive to noise because of the large number of parameters to be optimized, and its innate linearity limits its application. In this paper, a variant of CDF named CDF-SVM, which uses support vector machines (SVMs) for fusion and kernel principal component analysis (K-PCA) for context extraction, is proposed to solve these problems. Kernel PCA can shape irregular clusters, including elliptical ones, through the non-linear kernel transformation, and SVM can draw a non-linear decision boundary. Regularization terms are also included in the objective function of CDF-SVM to mitigate the noise sensitivity of CDF. CDF-SVM showed better performance than CDF and its variants, as demonstrated through experiments with a landmine data set.

A Study on Genetic Nature of Korean Local Corn Lines (한국 재래종 옥수수의 유전적 특성)

  • ;Bong-Ho Chae
    • KOREAN JOURNAL OF CROP SCIENCE / v.28 no.4 / pp.473-480 / 1983
  • To obtain basic information on the Korean local corn lines, a total of 57 lines were selected from the 1,000 Korean local collections at Chungnam National University and classified by principal component analysis, and their genetic nature was investigated. There was great variation in the mean values of plant characters among the lines. The mean values of plant characters, except for density of kernels, varied with the type of crossing. All characters except tasselling date were reduced in magnitude when selfed, while those characters increased when topcrossed. The inbreeding depression varied with plant characters and lines. Characters such as yield, kernel weight per ear, ear weight, and plant height showed a great degree of inbreeding depression. Group I showed high inbreeding depression in characters such as 100 kernel weight, leaf number, plant height, and days to tasselling, while group II showed high inbreeding depression in the other plant characters. Heterosis of plant characters also varied with lines. The ear weight, kernel weight per ear, yield, 100 kernel weight, and plant height were some of the plant characters showing high heterosis. Group II showed high values of heterosis in characters such as ear length, ear diameter, ear weight, kernel weight per ear, 100 kernel weight, and leaf length, while group I was high in heterosis in the other plant characters. The degree of homozygosity was highest in ear weight (79.1%) and lowest in ear number per plant (-2.1%). Group II showed a higher degree of homozygosity than group I. Correlation coefficients between characters of sibbed and topcrossed lines were positive for all characters. Highly significant correlation coefficients between sibbed and topcrossed lines were obtained especially for characters such as ear number per plant, plant height, leaf length, and yield per plot.


Real-time Fault Diagnosis of Induction Motor Using Clustering and Radial Basis Function (클러스터링과 방사기저함수 네트워크를 이용한 실시간 유도전동기 고장진단)

  • Park, Jang-Hwan;Lee, Dae-Jong;Chun, Myung-Geun
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.20 no.6 / pp.55-62 / 2006
  • For the fault diagnosis of three-phase induction motors, we construct an experimental unit and then develop a diagnosis algorithm based on pattern recognition. The experimental unit consists of a machinery module for the induction motor drive and a data acquisition module to obtain the fault signal. As the first step of the diagnosis procedure, preprocessing is performed to simplify and normalize the acquired current. To simplify the data, the three-phase current is transformed into the magnitude of the Concordia vector. As the next step, feature extraction is performed by kernel principal component analysis (KPCA) and linear discriminant analysis (LDA). Finally, a classifier based on a radial basis function (RBF) network is used. To show its effectiveness, the proposed diagnostic system was intensively tested with various data acquired under different electrical and mechanical faults with varying load.
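
The preprocessing step in this abstract, reducing three phase currents to the magnitude of the Concordia vector, can be illustrated with a short sketch. It assumes the commonly used power-invariant form of the Concordia (alpha-beta) transform; the abstract does not state which scaling convention the authors used, and the function name and the synthetic balanced currents are hypothetical.

```python
import math

def concordia_magnitude(ia, ib, ic):
    """Magnitude of the Concordia (alpha-beta) current vector for one sample."""
    # Power-invariant scaling is an assumption; other conventions differ by a constant.
    i_alpha = math.sqrt(2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    i_beta = (ib - ic) / math.sqrt(2.0)
    return math.hypot(i_alpha, i_beta)

# Hypothetical balanced three-phase currents sampled over one electrical period.
amplitude = 10.0
magnitudes = []
for k in range(8):
    theta = 2.0 * math.pi * k / 8
    ia = amplitude * math.cos(theta)
    ib = amplitude * math.cos(theta - 2.0 * math.pi / 3.0)
    ic = amplitude * math.cos(theta + 2.0 * math.pi / 3.0)
    magnitudes.append(concordia_magnitude(ia, ib, ic))

print(magnitudes)
```

For balanced sinusoidal currents the magnitude is constant, equal to the phase amplitude times the square root of 3/2 under this scaling, so deviations from a constant value are what carry information about asymmetry in the measured currents.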

Analysis of Dimensionality Reduction Methods Through Epileptic EEG Feature Selection for Machine Learning in BCI (BCI에서 기계 학습을 위한 간질 뇌파 특징 선택을 통한 차원 감소 방법 분석)

  • Tong, Yang;Aliyu, Ibrahim;Lim, Chang-Gyoon
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.6 / pp.1333-1342 / 2018
  • Until now, electroencephalography (EEG) has been the most important and convenient method for the diagnosis and treatment of epilepsy. However, it is difficult to identify the wave characteristics of an epileptic EEG signal because it is very weak and non-stationary and has strong background noise. In this paper, we analyse the effect of dimensionality reduction methods on epileptic EEG feature selection and classification. Three dimensionality reduction methods were investigated: Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), and Linear Discriminant Analysis (LDA). The performance of each method was evaluated using Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Decision Tree (DR), and Random Forest (RF) classifiers. In the experiments, PCA reached its highest accuracy of 75% with SVM, LR, and K-NN. KPCA reached its best performance of 85% with SVM and K-NN, while LDA achieved 100% accuracy with K-NN. Thus, LDA dimensionality reduction is found to provide the best classification result for epileptic EEG signals.

A comparative study of the physical and cooking characteristics of common types of rice collected from the market by quantitative statistical analysis

  • Evan Butrus Ilia;Mahmood Fadhil Saleem;Hamed Hassanzadeh
    • Food Science and Preservation / v.30 no.4 / pp.602-616 / 2023
  • Fifteen types of rice collected from the Kurdistan region of Iraq were investigated by principal component analysis (PCA) in terms of physical properties and cooking characteristics. The dimensions of the evaluated grains were 5.05-8.75 mm in length, 1.54-2.47 mm in width, and 1.37-1.95 mm in thickness. The equivalent diameter was in the range of 5.23-10.03 mm, and the area was 13.30-28.25 mm². The sphericity values varied from 0.32 to 0.56, the aspect ratio from 0.17 to 0.39, the grain volume from 4.48 to 17.74 mm³, the hectoliter weight from 730 to 820 kg/m³, and the true density from 0.6 to 0.96 g/cm³. The broken grain ratio was 1.5-18.3%, and the thousand kernel weight was 15.88-22.42 g. The water uptake ratios after 30 min of soaking were higher at 60℃ than at 30 and 45℃. PCA was used to study the correlation of the most effective factors. For physical properties, the first (PC1) and second (PC2) components retained 63.4% and 34.8% of the total variance, respectively; PC1 was mostly related to hectoliter, broken ratio, and moisture content characteristics, while PC2 was mostly concerned with hardness and true density. For cooking properties, PC1 and PC2 retained 88.5% and 9.3% of the total variance, respectively. PC1 was mostly related to viscosity, spring value, and hardness after cooking, while PC2 was mostly concerned with spring value, hardness before cooking, and hardness after cooking.
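
The "share of total variance retained" by each component, as reported in this abstract, can be computed as in the sketch below. It is only an illustration on synthetic numbers: the data, the function name `explained_variance_ratio`, and the choice to standardize each variable before PCA are assumptions, since the abstract does not describe the preprocessing.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component."""
    # Standardize columns so variables in different units are comparable (assumed step).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized data give the component variances.
    singular_values = np.linalg.svd(Z, compute_uv=False)
    variances = singular_values ** 2 / (Z.shape[0] - 1)
    return variances / variances.sum()

# Hypothetical table: 15 samples by 4 strongly correlated measured properties.
rng = np.random.default_rng(1)
latent = rng.normal(size=(15, 1))
X = np.hstack([latent + 0.1 * rng.normal(size=(15, 1)) for _ in range(4)])

ratios = explained_variance_ratio(X)
cumulative = np.cumsum(ratios)
print(ratios, cumulative)
```

Applied to the figures in the abstract, the cumulative share is simply the sum of the per-component shares: 63.4% + 34.8% = 98.2% for the physical properties and 88.5% + 9.3% = 97.8% for the cooking properties.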