• Title/Summary/Keyword: Data Principal


A Classification of Rural Area Using Principal Component Analysis and GIS (주성분 분석과 지리정보시스템을 이용한 충청북도 농촌 지역의 유형화)

  • Park, Jin-Sun;Joo, Ho-Gil;Yoon, Seong-Soo;Rhee, Shin-Ho
    • Proceedings of the Korean Society of Agricultural Engineers Conference / 2003.10a / pp.131-134 / 2003
  • This study classifies rural areas within a short distance of Cheongju, taking the Cheongju area as the center. It used principal component analysis and a geographic information system. The study region comprised Cheongju-si, Cheongwon-gun, Goesan-gun, and Eumseong-gun. We divided the indices into 22 major classes and 104 subclasses and ran the principal component analysis in SPSS. The results of the principal component analysis were then turned into graphical data with the geographic information system.


Moving Window Principal Component Analysis for Detecting Positional Fluctuation of Spectral Changes

  • Ryu, Soo-Ryeon;Noda, Isao;Jung, Young-Mee
    • Bulletin of the Korean Chemical Society / v.32 no.7 / pp.2332-2338 / 2011
  • In this study, we propose using moving window principal component analysis (MWPCA) as a sensitive diagnostic tool for detecting peak position shifts. In this approach, the moving window is constructed from a small data segment along the wavenumber axis, and a separate PCA is applied to each window, which is bounded by a narrow wavenumber region. Simulated spectra with complex spectral feature variations were analyzed to explore the potential of the MWPCA technique. This MWPCA-based detection of peak shifts, potentially coupled with 2D correlation analysis for additional verification, may offer an attractive solution.
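
The moving-window procedure in this abstract can be sketched roughly as follows. This is a minimal illustration assuming NumPy, not the authors' implementation: the function name `moving_window_pca`, the window width, the synthetic bands, and the use of the second component's variance share as a shift indicator are all assumptions made for the example.

```python
import numpy as np

def moving_window_pca(spectra, width):
    """Apply a separate PCA to each window of `width` adjacent wavenumber points.

    spectra: array of shape (n_spectra, n_points), one spectrum per row.
    Returns an array with one row per window holding the fraction of
    variance carried by each principal component in that window.
    """
    n_spectra, n_points = spectra.shape
    ratios = []
    for start in range(n_points - width + 1):
        window = spectra[:, start:start + width]
        centered = window - window.mean(axis=0)
        # Singular values of the centered window give the PCA variances.
        s = np.linalg.svd(centered, compute_uv=False)
        var = s ** 2
        total = var.sum()
        ratios.append(var / total if total > 0 else np.zeros_like(var))
    return np.array(ratios)

# Synthetic data (assumed for illustration): one band whose position
# shifts from spectrum to spectrum, one band that only changes intensity.
x = np.linspace(0, 100, 201)
shifting = np.array([np.exp(-(x - (30 + 0.5 * k)) ** 2 / 20) for k in range(10)])
growing = np.array([(1 + 0.1 * k) * np.exp(-(x - 70) ** 2 / 20) for k in range(10)])

ratios = moving_window_pca(shifting + growing, width=11)
pc2_share = ratios[:, 1]  # variance share of the second component per window
```

In this toy case a pure intensity change is rank one within a window, so the second component carries essentially no variance there, while windows covering the shifting band need more than one component.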

LS-SVM for large data sets

  • Park, Hongrak;Hwang, Hyungtae;Kim, Byungju
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.549-557 / 2016
  • In this paper we propose a multiclassification method for large data sets that ensembles least squares support vector machines (LS-SVM) trained on principal components instead of the raw input vectors. We use the revised one-vs-all method for multiclassification, a voting scheme based on combining several binary classifications. The revised one-vs-all method is performed using the hat matrix of the LS-SVM ensemble, which is obtained by ensembling LS-SVMs, each trained on a random sample from the whole large training data set. The leave-one-out cross validation (CV) function is used to choose the optimal values of the hyper-parameters that affect the performance of the multiclass LS-SVM ensemble. We present the generalized cross validation function to reduce the computational burden of the leave-one-out CV function. Experimental results from real data sets illustrate the performance of the proposed multiclass LS-SVM ensemble.
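
As a rough illustration of one building block mentioned in this abstract, the sketch below trains a single binary LS-SVM on principal component scores, assuming NumPy. It uses the standard LS-SVM linear system in its function-estimation form with targets of +1 and -1 and a linear kernel. The ensemble, the revised one-vs-all voting, and the cross validation functions from the paper are not reproduced; the function names, `gamma`, and the toy data are assumptions.

```python
import numpy as np

def pca_scores(X, k):
    """Project the rows of X onto the first k principal components."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ vt[:k].T

def lssvm_fit(Z, y, gamma=10.0):
    """Solve the LS-SVM linear system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    K = Z @ Z.T  # linear kernel (assumed for simplicity)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(Z_train, alpha, b, Z_new):
    """Classify new points by the sign of the fitted decision function."""
    return np.sign(Z_new @ Z_train.T @ alpha + b)

# Toy data (assumed): two well-separated clusters in four dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.3, size=(20, 4)),
               rng.normal(-2.0, 0.3, size=(20, 4))])
y = np.array([1.0] * 20 + [-1.0] * 20)

Z = pca_scores(X, 2)
b, alpha = lssvm_fit(Z, y)
pred = lssvm_predict(Z, alpha, b, Z)
```

Training on principal component scores keeps the kernel matrix cheap to build, which is the motivation the abstract gives for replacing the raw input vectors.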

Supervised Learning-Based Collaborative Filtering Using Market Basket Data for the Cold-Start Problem

  • Hwang, Wook-Yeon;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.13 no.4 / pp.421-431 / 2014
  • The market basket data in the form of a binary user-item matrix or a binary item-user matrix can be modelled as a binary classification problem. The binary logistic regression approach tackles the binary classification problem, where principal components are predictor variables. If users or items are sparse in the training data, the binary classification problem can be considered as a cold-start problem. The binary logistic regression approach may not function appropriately if the principal components are inefficient for the cold-start problem. Assuming that the market basket data can also be considered as a special regression problem whose response is either 0 or 1, we propose three supervised learning approaches: random forest regression, random forest classification, and elastic net to tackle the cold-start problem, comparing the performance in a variety of experimental settings. The experimental results show that the proposed supervised learning approaches outperform the conventional approaches.

Variable Arrangement for Data Visualization

  • Huh, Moon Yul;Song, Kwang Ryeol
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.643-650 / 2001
  • Some classical plots, such as scatterplot matrices and parallel coordinates, are valuable tools for data visualization. These tools are used extensively in modern data mining software to explore the inherent data structure, and hence to visually classify or cluster the database into appropriate groups. However, the interpretation of these plots is very sensitive to the arrangement of the variables. In this work, we introduce two methods for arranging the variables for data visualization. The first method is based on the work of Wegman (1999) and arranges the variables using the minimum distance among all the pairwise permutations of the variables. The second method uses the idea of principal components. We investigate the effectiveness of these methods with parallel coordinates using real data sets, and show that each of the two proposed methods has its own strengths.
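
One way to picture a minimum-distance arrangement of variables is the brute-force sketch below. It is only an illustration under stated assumptions: the abstract does not specify the distance, so `1 - |correlation|` between adjacent variables is assumed here, and the function names and toy columns are hypothetical. Exhaustive enumeration is practical only for a handful of variables.

```python
from itertools import permutations
import math

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_order(columns):
    """Return the variable ordering with the smallest total adjacent distance.

    columns: dict mapping variable name to its list of values.
    Distance between two variables is assumed to be 1 - |correlation|.
    """
    names = list(columns)
    dist = {(a, b): 1 - abs(correlation(columns[a], columns[b]))
            for a in names for b in names if a != b}

    def cost(order):
        return sum(dist[(order[i], order[i + 1])] for i in range(len(order) - 1))

    return min(permutations(names), key=cost)

# Toy data (assumed): "a" and "b" move together, as do "c" and "d".
columns = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "c": [1, -1, 1, -1, 1, -1, 1, -1],
    "b": [1, 2, 3, 4, 5, 6, 7, 9],
    "d": [2, -1, 1, -1, 1, -1, 1, -1],
}
order = best_order(columns)
```

With this distance, strongly related variables end up on neighboring axes of a parallel coordinates plot, which is where their relationship is easiest to see.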


Model-based inverse regression for mixture data

  • Choi, Changhwan;Park, Chongsun
    • Communications for Statistical Applications and Methods / v.24 no.1 / pp.97-113 / 2017
  • This paper proposes a method for sufficient dimension reduction (SDR) of mixture data. We consider mixture data containing more than one component, each with a distinct central subspace. We apply model-based sliced inverse regression (MSIR) to the mixture data in a simple and intuitive manner. We employ mixture probabilistic principal component analysis (MPPCA) to estimate each central subspace and to cluster the data points. The results from simulation studies and a real data set show that our method captures the appropriate central subspaces satisfactorily and is robust to the number of slices chosen. Discussions of root selection, estimation accuracy, and classification, along with initial value issues of MPPCA and related simulation results, are also provided.

A Model-based Collaborative Filtering Through Regularized Discriminant Analysis Using Market Basket Data

  • Lee, Jong-Seok;Jun, Chi-Hyuck;Lee, Jae-Wook;Kim, Soo-Young
    • Management Science and Financial Engineering / v.12 no.2 / pp.71-85 / 2006
  • Collaborative filtering, among other recommender systems, has been known as the most successful recommendation technique. However, it requires the user-item rating data, which may not be easily available. As an alternative, some collaborative filtering algorithms have been developed recently by utilizing the market basket data in the form of the binary user-item matrix. Viewing the recommendation scheme as a two-class classification problem, we proposed a new collaborative filtering scheme using a regularized discriminant analysis applied to the binary user-item data. The proposed discriminant model was built in terms of the major principal components and was used for predicting the probability of purchasing a particular item by an active user. The proposed scheme was illustrated with two modified real data sets and its performance was compared with the existing user-based approach in terms of the recommendation precision.

Program Development of Integrated Expression Profile Analysis System for DNA Chip Data Analysis (DNA칩 데이터 분석을 위한 유전자발연 통합분석 프로그램의 개발)

  • 양영렬;허철구
    • KSBB Journal / v.16 no.4 / pp.381-388 / 2001
  • A program for integrated gene expression profile analysis, including hierarchical clustering, K-means, fuzzy c-means, self-organizing map (SOM), principal component analysis (PCA), and singular value decomposition (SVD), was developed in Matlab for DNA chip data analysis. It also includes a normalization method for gene expression input data. The integrated data analysis program can be used effectively in DNA chip data analysis and helps researchers obtain a more comprehensive view of their own gene expression data.


Processing of Downhole S-wave Seismic Survey Data by Considering Direction of Polarization

  • Kim, Jin-Hoo;Park, Choon-B.
    • Journal of the Korean Geophysical Society / v.5 no.4 / pp.321-328 / 2002
  • Difficulties encountered in downhole S-wave (shear wave) surveys include the precise determination of shear wave travel times and the determination of geophone orientation relative to the direction of polarization caused by the seismic source. In this study, an S-wave enhancing method and a principal component analysis method were adopted as tools for determining S-wave arrivals and the direction of polarization from downhole S-wave survey data. The S-wave enhancing method can almost double the amplitudes of S-waves, and the angle between the direction of polarization and a geophone axis can be obtained by principal component analysis. Once the angle is obtained, the data recorded by the two horizontal geophones are transformed to the principal axes, yielding so-called scores. The scores gathered along depth are all in phase; consequently, the accuracy of S-wave arrival picking can be improved remarkably. Applying this processing method to field data reveals that the test site has a layered earth structure.
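
The angle estimate and the rotation to principal axes described in this abstract can be sketched for two horizontal components as follows. This is a minimal example under assumptions: it uses the closed-form first principal axis of a 2x2 covariance matrix, the function names and the synthetic wavelet are hypothetical, and it is not the authors' processing code.

```python
import math

def polarization_angle(h1, h2):
    """Angle (radians) of the first principal axis relative to the h1 geophone axis."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    c11 = sum((a - m1) ** 2 for a in h1) / n
    c22 = sum((b - m2) ** 2 for b in h2) / n
    c12 = sum((a - m1) * (b - m2) for a, b in zip(h1, h2)) / n
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance.
    return 0.5 * math.atan2(2 * c12, c11 - c22)

def to_principal_axes(h1, h2, theta):
    """Rotate the two horizontal traces by theta, giving the two score traces."""
    c, s = math.cos(theta), math.sin(theta)
    score1 = [c * a + s * b for a, b in zip(h1, h2)]
    score2 = [-s * a + c * b for a, b in zip(h1, h2)]
    return score1, score2

# Synthetic example (assumed): a damped wavelet polarized 30 degrees from h1.
wavelet = [math.sin(2 * math.pi * t / 20) * math.exp(-t / 30) for t in range(100)]
true_angle = math.radians(30)
h1 = [w * math.cos(true_angle) for w in wavelet]
h2 = [w * math.sin(true_angle) for w in wavelet]

theta = polarization_angle(h1, h2)
score1, score2 = to_principal_axes(h1, h2, theta)
```

A principal axis is defined only up to a 180 degree flip, so the sign of the first score trace may need a consistent convention before traces from different depths are compared.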


The Development of a Fault Diagnosis Model Based on Principal Component Analysis and Support Vector Machine for a Polystyrene Reactor (주성분 분석과 서포트 벡터 머신을 이용한 폴리스티렌 중합 반응기 이상 진단 모델 개발)

  • Jeong, Yeonsu;Lee, Chang Jun
    • Korean Chemical Engineering Research / v.60 no.2 / pp.223-228 / 2022
  • In chemical processes, unintended faults can cause serious accidents. To tackle them, proper fault diagnosis models should be designed to identify the root cause of faults. Designing a fault diagnosis model requires analyzing the process and its data. However, most previous research in the field of fault diagnosis handles only data sets from benchmark processes simulated in commercial programs, which indicates how hard it is to obtain fresh data sets from real processes. In this study, real faulty conditions of an industrial polystyrene process are tested. In this process, a runaway reaction occurred and caused a large loss because operators became aware of the accident too late. To design a proper fault diagnosis model, we analyzed this process and a real accident data set. First, a mode classification model based on a support vector machine (SVM) was trained, and a principal component analysis (PCA) model for each mode was constructed under normal operation conditions. The results show that the proposed model can quickly diagnose the occurrence of a fault and indicate that it can reduce the potential loss.
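
The PCA part of such a monitoring scheme is often illustrated with a residual-based check like the one below. This is a hedged sketch assuming NumPy: the abstract does not say which monitoring statistic the authors use, so the squared prediction error (SPE), the threshold taken as the largest SPE seen in normal data, the function names, and the toy data are all assumptions. The SVM mode classification step is not shown.

```python
import numpy as np

def spe(X, mean, loadings):
    """Squared prediction error: squared distance of each row from the PCA subspace."""
    centered = np.atleast_2d(X) - mean
    residual = centered - centered @ loadings @ loadings.T
    return (residual ** 2).sum(axis=1)

def fit_pca_monitor(X_normal, n_components):
    """Fit PCA on normal-operation data and derive a simple SPE threshold."""
    mean = X_normal.mean(axis=0)
    _, _, vt = np.linalg.svd(X_normal - mean, full_matrices=False)
    loadings = vt[:n_components].T
    # Assumed threshold: the largest SPE observed under normal operation.
    threshold = spe(X_normal, mean, loadings).max()
    return mean, loadings, threshold

# Toy normal-operation data (assumed): three correlated variables plus noise.
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 200)
X_normal = np.outer(t, [1.0, 2.0, -1.0]) + 0.01 * rng.normal(size=(200, 3))

mean, loadings, threshold = fit_pca_monitor(X_normal, n_components=1)

normal_sample = np.array([0.5, 1.0, -0.5])  # follows the normal correlation
faulty_sample = np.array([0.5, 1.0, 0.5])   # third variable breaks the pattern
normal_alarm = spe(normal_sample, mean, loadings)[0] > threshold
faulty_alarm = spe(faulty_sample, mean, loadings)[0] > threshold
```

In a multi-mode setting like the one described, one such model would be fitted per operating mode, with the classifier choosing which model to apply to each new sample.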