• Title/Summary/Keyword: Data Principal


Cluster Analysis with Air Pollutants and Meteorological Factors in Seoul

  • Kim, Jae-Hee;Lim, Ji-Won
    • Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.773-787 / 2003
  • Principal component analysis, factor analysis, and cluster analysis were performed to analyze the relationship between air pollutants and meteorological variables measured in Seoul in 1999. In the principal component analysis, the first principal component showed a contrast between $O_3$ and the other pollutants, and the second showed a contrast between CO, $SO_2$, $NO_2$ on one side and $O_3$, PM10, TSP on the other. In the factor analysis, the first factor was associated with PM10, TSP, and $NO_2$ concentrations, which are related to suspended particulates. In the cluster analysis, three clusters emerged, each representing a different air pollution level, seasonal pattern of air pollutants, and meteorological situation.


Classification for intraclass correlation pattern by principal component analysis

  • Chung, Hie-Choon;Han, Chien-Pai
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.589-595 / 2010
  • We consider discriminant analysis under an intraclass correlation pattern using principal component analysis. We assume that the two populations are equally likely and that the costs of misclassification are equal. In this setting, we consider two procedures, the test procedure and the proportion procedure, for selecting the principal components used in classification. We compare the regular classification method with the two proposed procedures. Error rates are estimated by two methods: the leave-one-out method and the bootstrap method.

Global Covariance based Principal Component Analysis for Speaker Identification (화자식별을 위한 전역 공분산에 기반한 주성분분석)

  • Seo, Chang-Woo;Lim, Young-Hwan
    • Phonetics and Speech Sciences / v.1 no.1 / pp.69-73 / 2009
  • This paper proposes an efficient global covariance-based principal component analysis (GCPCA) for speaker identification. Principal component analysis (PCA) is a feature extraction method that reduces both the dimension of the feature vectors and the correlation among them by projecting the original feature space onto a smaller subspace. However, conventional PCA computes the eigenvalue and eigenvector matrices from a full covariance matrix for each speaker, which requires a large amount of training data. The proposed method first calculates a global covariance matrix from the training data of all speakers. It then finds the eigenvalue matrix and the corresponding eigenvector matrix from this global covariance matrix. Compared with conventional PCA and Gaussian mixture model (GMM) methods, the proposed method shows better speaker identification performance while requiring less storage space and lower complexity.
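
The abstract's two steps (one covariance matrix over all speakers, then one eigendecomposition) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the global covariance is the sample covariance of all speakers' frames pooled together, and the function names, the random data, and the speaker ids are hypothetical.

```python
import numpy as np

def global_covariance_pca(features_by_speaker, n_components):
    """Fit one shared PCA basis from the pooled features of all speakers.

    features_by_speaker: dict mapping a speaker id to an (n_frames, dim) array.
    Returns (mean, basis), where basis has shape (dim, n_components).
    """
    # Assumption: "global covariance" means the covariance of the pooled frames.
    pooled = np.vstack(list(features_by_speaker.values()))
    mean = pooled.mean(axis=0)
    cov = np.cov(pooled - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, eigvecs[:, order]

def project(features, mean, basis):
    """Project feature vectors onto the shared low-dimensional basis."""
    return (features - mean) @ basis

# Synthetic stand-in for per-speaker feature vectors (not real speech data).
rng = np.random.default_rng(0)
speakers = {f"spk{i}": rng.normal(size=(50, 6)) + i for i in range(3)}
mean, basis = global_covariance_pca(speakers, n_components=2)
reduced = {name: project(x, mean, basis) for name, x in speakers.items()}
```

Because every speaker shares the same `mean` and `basis`, only one transformation needs to be stored, which is consistent with the storage saving the abstract reports.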


Combining Ridge Regression and Latent Variable Regression

  • Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.51-61 / 2007
  • Ridge regression (RR), principal component regression (PCR), and partial least squares regression (PLS) are among the popular regression methods for collinear data. While RR adds a small quantity, called the ridge constant, to the diagonal of X'X to stabilize the matrix inversion and the regression coefficients, PCR and PLS use latent variables derived from the original variables to circumvent the collinearity problem. One problem with PCR and PLS is that they are very sensitive to overfitting. A new regression method is presented that combines RR with PCR and with PLS, respectively, in a unified manner. It is intended to provide better predictive ability and improved stability for regression models. A real-world data set from NIR spectroscopy is used to investigate the performance of the newly developed regression method.
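
For orientation, the two ingredients named in the abstract can be sketched as below: ridge regression adds a constant to the diagonal of X'X, and PCR regresses on latent variables (principal component scores). This sketch does not implement the paper's combined method, whose details are not given here; the function names and the synthetic near-collinear data are assumptions for illustration.

```python
import numpy as np

def ridge_coefficients(X, y, ridge_constant):
    """Solve (X'X + kI) b = X'y for centered X and y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge_constant * np.eye(p), X.T @ y)

def pcr_coefficients(X, y, n_components):
    """Regress y on the leading principal component scores, then map back."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:n_components].T                  # (p, k) loadings
    scores = X @ V                           # (n, k) latent variables
    # The scores are orthogonal, with squared norms s**2, so each
    # latent-variable coefficient is a simple ratio.
    gamma = (scores.T @ y) / (s[:n_components] ** 2)
    return V @ gamma                         # coefficients on the original variables

# Synthetic near-collinear predictors: three noisy copies of one signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 1))
X = np.hstack([z + 0.01 * rng.normal(size=(100, 1)) for _ in range(3)])
y = z[:, 0] + 0.1 * rng.normal(size=100)
X = X - X.mean(axis=0)
y = y - y.mean()

b_ols = ridge_coefficients(X, y, 0.0)    # ridge constant 0 reduces to least squares
b_ridge = ridge_coefficients(X, y, 1.0)  # shrunken, more stable coefficients
b_pcr = pcr_coefficients(X, y, 1)        # one latent variable
```

With all components retained, PCR reproduces the least-squares solution, so the number of components plays a role loosely analogous to the ridge constant in controlling how much of the unstable, low-variance directions enters the fit.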


Comparison of Shape Variability in Principal Component Biplot with Missing Values

  • Shin, Sang-Min;Choi, Yong-Seok;Lee, Nae-Young
    • The Korean Journal of Applied Statistics / v.21 no.6 / pp.1109-1116 / 2008
  • Biplots are the multivariate analogue of scatter plots. They are useful for giving a graphical description of the data matrix, for detecting patterns, and for displaying results found by more formal methods of analysis. However, when some values in the data matrix are missing, most biplots are not directly applicable. In particular, we are interested in the shape variability, under missing values, of the principal component biplot, which is the most popular type of biplot. To study this, we estimate the missing data using the EM algorithm and mean imputation at various missing rates. Even after the missing values are estimated, the biplots of the incomplete data take different shapes depending on the imputation method and the missing rate. We therefore propose an RMS (root mean square) measure for quantifying and comparing the shape variability between the original biplots and the estimated biplots.
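
A rough sketch of the mean-imputation branch of this comparison is given below. The paper's exact RMS definition is not stated in the abstract, so this example substitutes an assumed proxy: the root mean square difference between the rank-2 approximations that a two-dimensional principal component biplot displays for the complete and the imputed data. The EM branch is omitted, and all names and the synthetic data are hypothetical.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = col_means[cols]
    return filled

def rank2_fit(X):
    """Rank-2 approximation of the column-centered matrix (what a 2-D PCA biplot shows)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (U[:, :2] * s[:2]) @ Vt[:2]

def rms_difference(A, B):
    """Root mean square of the elementwise differences (assumed proxy measure)."""
    return float(np.sqrt(np.mean((A - B) ** 2)))

# Synthetic complete data, then about 10% of entries removed at random.
rng = np.random.default_rng(0)
complete = rng.normal(size=(30, 4))
incomplete = complete.copy()
mask = rng.random(complete.shape) < 0.1
incomplete[mask] = np.nan

imputed = mean_impute(incomplete)
rms = rms_difference(rank2_fit(complete), rank2_fit(imputed))
```

Comparing rank-2 approximations rather than raw biplot coordinates sidesteps the sign ambiguity of singular vectors; a comparison of the coordinates themselves would need an alignment step first.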

Big Data Analysis Using Principal Component Analysis (주성분 분석을 이용한 빅데이터 분석)

  • Lee, Seung-Joo
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.6 / pp.592-599 / 2015
  • In a big data environment, we need a new approach to data analysis, because the characteristics of big data, such as volume, variety, and velocity, make it possible to analyze the entire data set when making inferences about the population. Traditional statistical methods, however, focus on small data, namely a random sample drawn from the population. Classical statistical analyses are therefore not well suited to big data analysis. To solve this problem, we propose an approach to efficient big data analysis. In this paper, we consider big data analysis using principal component analysis, a popular method in multivariate statistics. To verify the performance of our approach, we carry out diverse simulation studies.

Dilemma of Data Driven Technology Regulation : Applying Principal-agent Model on Tracking and Profiling Cases in Korea (데이터 기반 기술규제의 딜레마 : 국내 트래킹·프로파일링 사례에 대한 주인-대리인 모델의 적용)

  • Lee, Youhyun;Jung, Ilyoung
    • Journal of Digital Convergence / v.18 no.6 / pp.17-32 / 2020
  • This study uses principal-agent theory to analyze the regulatory issues facing the stakeholders in the data industry: the firm, the government, and the individual. While the importance of the data-driven economy is increasing rapidly, policy regulations and restrictions on data use impede the growth of the data industry. We applied a descriptive case analysis methodology based on principal-agent theory. Our analysis yielded several findings. First, among the stakeholders, the key policy actors in the data industry are data firms and the government. Second, the two major concerns are that firms frequently invade personal privacy and that global companies obtain monopolistic power in the data industry. The paper closes by suggesting policies and strategies in response to these regulatory issues. The government should activate the domestic agent system for supervising global companies and should strengthen data protection. Companies need to address discriminatory regulatory environments and expand the standards for legal data use. Finally, individuals must take an active role in giving consent.

Stream Data Analysis of the Weather on the Location using Principal Component Analysis (주성분 분석을 이용한 지역기반의 날씨의 스트림 데이터 분석)

  • Kim, Sang-Yeob;Kim, Kwang-Deuk;Bae, Kyoung-Ho;Ryu, Keun-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.28 no.2 / pp.233-237 / 2010
  • Recent advances in sensor networks and ubiquitous techniques allow data to be collected and analyzed in real time for decision making, overcoming the limitations imposed by time and space. Analysis and prediction based on the collected data can also provide users with useful and necessary information. Data collected in a sensor network environment is stream data, which is continuous, unbounded, and sequential. These properties, together with its large volume, make stream data difficult to manage. Stream data also requires a dynamic processing method because of memory constraints and access limitations. Accordingly, we analyze correlations in stream data using principal component analysis, and the results of the analysis help users make decisions.
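
One way to reconcile principal component analysis with the memory constraint mentioned above is to keep only a running mean and scatter matrix and to discard each reading after it is processed. The sketch below shows that idea with a Welford-style update; it is an assumed illustration, not the method used in the paper, and the class name and the synthetic "readings" are hypothetical.

```python
import numpy as np

class StreamingPCA:
    """Keeps a running mean and scatter matrix so no past samples are stored."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))  # running sum of deviation outer products

    def update(self, x):
        """Fold one new observation into the running statistics."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean                # deviation from the old mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)  # Welford-style update

    def components(self):
        """Eigenvalues (descending) and eigenvectors of the current covariance."""
        cov = self.scatter / (self.n - 1)
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]
        return eigvals[order], eigvecs[:, order]

# Synthetic stand-in for a stream of three correlated sensor readings.
rng = np.random.default_rng(0)
readings = rng.normal(size=(500, 3))
readings[:, 1] += 0.8 * readings[:, 0]

model = StreamingPCA(dim=3)
for row in readings:
    model.update(row)
eigvals, eigvecs = model.components()
```

Memory use depends only on the number of variables, not on the length of the stream, and `components()` can be called at any point to obtain the principal components of the data seen so far.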