• Title/Summary/Keyword: Data Principal

Cluster Analysis Using Principal Coordinates for Binary Data

  • Chae, Seong-San;Kim, Jeong, Il
    • Communications for Statistical Applications and Methods / v.12 no.3 / pp.683-696 / 2005
  • The effect of computing principal coordinates before cluster analysis is investigated on samples of multiple binary outcomes. The retrieval ability of the known clustering algorithm improves significantly when principal coordinates are used instead of distances transformed directly from four association coefficients for multiple binary variables.
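As a rough illustration of the idea in this abstract, the sketch below computes principal coordinates (classical scaling) from a distance matrix between binary profiles. The Jaccard distance, the toy data, and the function names are assumptions made for illustration; the abstract does not name the four association coefficients or the clustering algorithm it uses.

```python
import math

def jaccard_distance(u, v):
    """Jaccard distance between two equal-length 0/1 vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def principal_coordinates(dist, k=2, iters=200):
    """Classical scaling: up to k coordinates from a distance matrix."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Power iteration for the dominant eigenpair of B.
        v = [1.0 + 0.1 * i for i in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = matvec(B, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
            lam = sum(a * b for a, b in zip(v, matvec(B, v)))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no positive eigenvalue left; remaining axes stay zero
        for i in range(n):
            coords[i][c] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenpair.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Toy data (hypothetical): two groups of identical binary profiles.
X = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
D = [[jaccard_distance(u, v) for v in X] for u in X]
coords = principal_coordinates(D, k=2)
```

The resulting `coords` can be passed to any clustering routine in place of the raw distances. Power iteration with deflation is used here only to keep the sketch dependency-free; a linear algebra library would normally do the eigendecomposition.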

A Study on the Principal Component Analysis of Anthropometric Data (인체계측치(人體計測値)의 주성분분석(主成分分析)에 관한 연구(硏究))

  • Lee, Sang-Do;Jeong, Jung-Hui;Kim, Geuk-Bae
    • Journal of the Ergonomics Society of Korea / v.2 no.1 / pp.3-11 / 1983
  • Anthropometric data are the most basic material in all studies related to the human body. Such data therefore call for more varied analysis than a consideration of variance alone. This study selected 13 body parts that adequately represent the overall characteristics of the human body, and obtained anthropometric data through actual measurements of male and female workers employed in a production factory. To interpret these data, principal component analysis, one of the multivariate analysis methods, was applied.

Evaluation of Water Quality Using Multivariate Statistic Analysis with Optimal Scaling

  • Kim, Sang-Soo;Jin, Hyun-Guk;Park, Jong-Soo;Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.16 no.2 / pp.349-357 / 2005
  • Principal component analysis (PCA) was carried out to evaluate water quality using monitoring data collected from 1997 to 2003 along the coastal area of Ulsan, Korea. To enhance the evaluation and to complement the descriptive power of traditional PCA, optimal scaling was applied to transform the original data into optimally scaled data. Cluster analysis was also applied to classify the monitoring stations according to their water quality characteristics.

A Channel Equalization Algorithm Using Neural Network Based Data Least Squares (뉴럴네트웍에 기반한 Data Least Squares를 사용한 채널 등화기 알고리즘)

  • Lim, Jun-Seok;Pyeon, Yong-Kuk
    • The Journal of the Acoustical Society of Korea / v.26 no.2E / pp.63-68 / 2007
  • Using the neural network model for oriented principal component analysis (OPCA), we propose a solution to the data least squares (DLS) problem, in which the error is assumed to lie in the data matrix only. In this paper, we applied this neural network model to channel equalization. Simulations show that the neural network based DLS outperforms ordinary least squares in channel equalization problems.

Performance Analysis of Perturbation-based Privacy Preserving Techniques: An Experimental Perspective

  • Ritu Ratra;Preeti Gulia;Nasib Singh Gill
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.81-88 / 2023
  • Enormous amounts of data are now produced every second. These data also contain private information from sources including media platforms, the banking sector, finance, healthcare, and criminal histories. Data mining is a method for searching and analyzing massive volumes of data to find usable information. Protecting personal data during data mining has become difficult, so privacy-preserving data mining (PPDM) is used for this purpose. Data perturbation is one of several techniques used in PPDM to protect data privacy. In perturbation, datasets are perturbed in order to protect personal information, addressing both data accuracy and data privacy. This paper explores and compares several perturbation strategies that may be used to protect data privacy. For the experiment, two perturbation techniques based on random projection and principal component analysis were used: Improved Random Projection Perturbation (IRPP) and Enhanced Principal Component Analysis based Technique (EPCAT). The Naive Bayes classification algorithm is used as the data mining method. The techniques are assessed on the precision, run time, and accuracy of the experimental results. The random projection-based technique (IRPP) is found to be the best perturbation method under Naive Bayes classification for both the cardiovascular and hypothyroid datasets.
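To make the random-projection idea concrete, here is a minimal sketch of plain Gaussian random projection used as a perturbation step. It is not the IRPP method itself, whose details the abstract does not give; the function name, the seed handling, and the sample records are assumptions for illustration.

```python
import math
import random

def random_projection(rows, k, seed=0):
    """Project each d-dimensional row onto k random Gaussian directions."""
    d = len(rows[0])
    rng = random.Random(seed)
    # Entries drawn from N(0, 1/k), so squared lengths (and hence pairwise
    # distances) are preserved in expectation while raw values are hidden.
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)]
         for _ in range(d)]
    return [[sum(row[i] * R[i][j] for i in range(d)) for j in range(k)]
            for row in rows]

# Made-up numeric records (hypothetical values, not from either dataset).
records = [[1.0, 0.0, 2.5, 4.0],
           [0.5, 1.0, 2.0, 3.5],
           [3.0, 1.0, 0.5, 1.0]]
perturbed = random_projection(records, k=2, seed=42)
```

A classifier such as Naive Bayes would then be trained on `perturbed` instead of `records`. How much accuracy survives depends on `k` and on the data, which is the trade-off the paper evaluates.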

An eigenspace projection clustering method for structural damage detection

  • Zhu, Jun-Hua;Yu, Ling;Yu, Li-Li
    • Structural Engineering and Mechanics / v.44 no.2 / pp.179-196 / 2012
  • An eigenspace projection clustering method is proposed for structural damage detection by combining a projection algorithm with a fuzzy clustering technique. The integrated procedure includes data selection, data normalization, projection, damage feature extraction, and a clustering algorithm for structural damage assessment. The frequency response functions (FRFs) of the healthy and the damaged structure are used as initial data, median values of the projections are considered as damage features, and the fuzzy c-means (FCM) algorithm is used to categorize these features. The performance of the proposed method has been validated using a three-story frame structure built and tested by Los Alamos National Laboratory, USA. Two projection algorithms, namely principal component analysis (PCA) and kernel principal component analysis (KPCA), are compared for better extraction of damage features, and six kinds of distances adopted in the FCM process are studied and discussed. The results reveal that the distance selection depends on the distribution of features. For the optimal choice of projections, it is recommended that the Cosine distance be used for PCA, while the Seuclidean (standardized Euclidean) distance and the Cityblock distance are suitable for KPCA. The PCA method is recommended when a large amount of data needs to be processed, because it yields more correct decisions at lower computational cost.
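The clustering step described here can be sketched with the standard fuzzy c-means updates. This is a minimal illustration under stated assumptions: it uses the plain Euclidean distance rather than the Cosine, Seuclidean, or Cityblock distances the paper compares, and the feature points and initial centres are made up.

```python
import math

def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (rows sum to 1)."""
    U = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if min(d) == 0.0:
            # Point coincides with a centre: give it full membership there.
            hit = d.index(0.0)
            U.append([1.0 if i == hit else 0.0 for i in range(len(d))])
            continue
        inv = [x ** (-2.0 / (m - 1.0)) for x in d]
        total = sum(inv)
        U.append([x / total for x in inv])
    return U

def update_centers(points, U, m=2.0):
    """Each centre is the mean of the points weighted by membership**m."""
    dim = len(points[0])
    centers = []
    for i in range(len(U[0])):
        w = [row[i] ** m for row in U]
        total = sum(w)
        centers.append([sum(wk * p[j] for wk, p in zip(w, points)) / total
                        for j in range(dim)])
    return centers

# Hypothetical 2-D damage features forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [[0.0, 0.5], [9.0, 9.0]]  # assumed initial guesses
for _ in range(20):
    U = memberships(points, centers)
    centers = update_centers(points, U)
```

Swapping `math.dist` for another distance function is the only change needed to try a different metric in the membership step. With a non-Euclidean distance, however, the weighted-mean centre update is no longer the exact minimizer of the FCM objective.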

Inverse Eigenvalue Problems with Partial Eigen Data for Acyclic Matrices whose Graph is a Broom

  • Sharma, Debashish;Sen, Mausumi
    • Kyungpook Mathematical Journal / v.57 no.2 / pp.211-222 / 2017
  • In this paper, we consider three inverse eigenvalue problems for a special type of acyclic matrices. The acyclic matrices considered in this paper are described by a graph called a broom on n + m vertices, which is obtained by joining m pendant edges to one of the terminal vertices of a path on n vertices. The problems require the reconstruction of such a matrix from given partial eigen data. The eigen data for the first problem consists of the largest eigenvalue of each of the leading principal submatrices of the required matrix, while for the second problem it consists of an eigenvalue of each of its trailing principal submatrices. The third problem has an eigenvalue and a corresponding eigenvector of the required matrix as the eigen data. The method of solution involves the use of recurrence relations among the leading/trailing principal minors of ${\lambda}I-A$, where A is the required matrix. We derive the necessary and sufficient conditions for the solutions of these problems. The constructive nature of the proofs also provides the algorithms for computing the required entries of the matrix. We also provide some numerical examples to show the applicability of our results.
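The recurrence among leading principal minors mentioned in this abstract can be illustrated numerically. The sketch below assumes a particular labeling that the abstract does not spell out: vertices 1..n form the path, vertices n+1..n+m hang off vertex n, `a` holds the diagonal entries, and `b` holds one weight per edge. The recurrence coded here was obtained by cofactor expansion along the last row and is not copied from the paper; the sample values are made up.

```python
def leading_minors(a, b, n, lam):
    """P_j = det(lam*I - A_j) for j = 0..n+m, computed by recurrence."""
    total = len(a)
    P = [1.0, lam - a[0]]                      # P_0 and P_1
    for j in range(2, total + 1):
        x = lam - a[j - 1]
        if j <= n:                             # path part: three-term recurrence
            P.append(x * P[j - 1] - b[j - 2] ** 2 * P[j - 2])
        else:                                  # pendant vertex attached to vertex n
            prod = 1.0
            for i in range(n, j - 1):          # earlier pendant vertices n+1..j-1
                prod *= lam - a[i]
            P.append(x * P[j - 1] - b[j - 2] ** 2 * P[n - 1] * prod)
    return P

def broom_matrix(a, b, n):
    """Symmetric matrix whose graph is the broom, under the labeling above."""
    total = len(a)
    A = [[0.0] * total for _ in range(total)]
    for i in range(total):
        A[i][i] = a[i]
    for e in range(total - 1):
        u = e if e + 1 < n else n - 1          # path edge, or edge from vertex n
        A[u][e + 1] = A[e + 1][u] = b[e]
    return A

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    size, sign, result = len(M), 1.0, 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-14:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign
        result *= M[c][c]
        for r in range(c + 1, size):
            f = M[r][c] / M[c][c]
            for k in range(c, size):
                M[r][k] -= f * M[c][k]
    return sign * result

# Hypothetical broom with n = 3 path vertices and m = 2 pendant vertices.
a, b, n, lam = [2.0, 1.0, 3.0, 0.5, -1.0], [1.0, 2.0, 0.5, 1.5], 3, 4.0
A = broom_matrix(a, b, n)
by_recurrence = leading_minors(a, b, n, lam)
direct = [det([[(lam if r == c else 0.0) - A[r][c] for c in range(j)]
               for r in range(j)]) for j in range(1, len(a) + 1)]
```

Comparing `by_recurrence[1:]` with `direct` checks the recurrence against a brute-force determinant for this one example. It is a sanity check and not a proof.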

Probabilistic penalized principal component analysis

  • Park, Chongsun;Wang, Morgan C.;Mo, Eun Bi
    • Communications for Statistical Applications and Methods / v.24 no.2 / pp.143-154 / 2017
  • A variable selection method based on probabilistic principal component analysis (PCA) using a penalized likelihood method is proposed. The proposed method is a two-step variable reduction method. The first step is based on the probabilistic principal component idea to identify principal components. The penalty function is used to identify important variables in each component. We then build a model on the original data space instead of building on the rotated data space through latent variables (principal components) because the proposed method achieves the goal of dimension reduction through identifying important observed variables. Consequently, the proposed method is of more practical use. The proposed estimators perform as the oracle procedure and are root-n consistent with a proper choice of regularization parameters. The proposed method can be successfully applied to high-dimensional PCA problems with a relatively large portion of irrelevant variables included in the data set. It is straightforward to extend our likelihood method to handle problems with missing observations using EM algorithms. Further, it could be effectively applied in cases where some data vectors exhibit one or more missing values at random.

Discriminant Analysis of Marketed Liquor by a Multi-channel Taste Evaluation System

  • Kim, Nam-Soo
    • Food Science and Biotechnology / v.14 no.4 / pp.554-557 / 2005
  • As a device for taste sensation, an 8-channel taste evaluation system was prepared and applied to discriminant analysis of marketed liquor. The biomimetic polymer membranes for the system were prepared through a casting procedure by employing polyvinyl chloride, bis(2-ethylhexyl) sebacate as plasticizer, and electroactive materials such as valinomycin in the ratio of 33:66:1, and were separately attached over the sensitive area of ion-selective electrodes to construct the corresponding taste sensor array. The sensor array, in conjunction with a double junction reference electrode, was connected to a high-input impedance amplifier, and the amplified sensor signals were interfaced to a personal computer via an A/D converter. When the signal data from the sensor array for 3 groups of marketed liquor (Maesilju, Soju, and beer) were analyzed by principal component analysis after normalization, the 1st, 2nd, and 3rd principal components accounted for most of the total data variance, and the analyzed liquor samples were discriminated well in the 2-dimensional principal component planes formed by the 1st-2nd and the 1st-3rd principal components.

Quantitative Analysis for Biomass Energy Problem Using a Radial Basis Function Neural Network (RBF 뉴럴네트워크를 사용한 바이오매스 에너지문제의 계량적 분석)

  • Baek, Seung Hyun;Hwang, Seung-June
    • Journal of Korean Society of Industrial and Systems Engineering / v.36 no.4 / pp.59-63 / 2013
  • In biomass gasification, quantifying energy efficiency is difficult before the process has finished. In this article, a radial basis function neural network (RBFN) is proposed to predict biomass efficiency before gasification. The RBFN is compared with principal component regression (PCR) and a multilayer perceptron neural network (MLPN). Because the data are high-dimensional, PCR first applies a principal component transform and then fits an ordinary regression on the selected principal components. The MLPN is used without any preprocessing. The data are high-dimensional near-infrared (NIR) spectra of 3 wood samples and 3 other feedstocks. Ash and char are used as response variables, and comparison results for the two responses are presented.
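The PCR baseline described in this abstract (principal component transform, then ordinary regression on the retained components) can be sketched as follows. This is a minimal illustration that keeps a single component and uses made-up data; the number of components, the preprocessing, and all function names are assumptions and are not taken from the article.

```python
import math

def top_component(Xc, iters=200):
    """Leading principal direction of centred rows Xc, by power iteration."""
    d = len(Xc[0])
    # C = Xc^T Xc; the 1/(n-1) factor is omitted because it does not change
    # the direction of the eigenvectors.
    C = [[sum(r[i] * r[j] for r in Xc) for j in range(d)] for i in range(d)]
    v = [1.0 + 0.1 * j for j in range(d)]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def fit_pcr(X, y):
    """Fit y on the first principal component score of X."""
    n, d = len(X), len(X[0])
    mean_x = [sum(r[j] for r in X) / n for j in range(d)]
    mean_y = sum(y) / n
    Xc = [[r[j] - mean_x[j] for j in range(d)] for r in X]
    v = top_component(Xc)
    scores = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
    # Ordinary least squares of the centred response on the single score.
    slope = (sum(s * (t - mean_y) for s, t in zip(scores, y))
             / sum(s * s for s in scores))
    return mean_x, mean_y, v, slope

def predict_pcr(model, row):
    mean_x, mean_y, v, slope = model
    score = sum((row[j] - mean_x[j]) * v[j] for j in range(len(row)))
    return mean_y + slope * score

# Made-up data: three perfectly collinear predictors and a linear response.
X = [[t, 2.0 * t, 3.0 * t] for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
y = [5.0 + 2.0 * t for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
model = fit_pcr(X, y)
prediction = predict_pcr(model, [5.0, 10.0, 15.0])
```

Because the toy predictors are exactly collinear, one component captures all of their variance and the regression recovers the linear response. With real NIR spectra, several components would typically be retained and the number chosen by validation.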