• Title/Summary/Keyword: Principal Components Analysis

Pitching grade index in Korean pro-baseball (한국프로야구에서의 투수평가지표)

  • Lee, Jang Taek
    • Journal of the Korean Data and Information Science Society / v.25 no.3 / pp.485-492 / 2014
  • In baseball, the traditional measures of a pitcher are wins and ERA, but these statistics are influenced by luck or team strength. Sabermetricians have therefore proposed a number of indicators that predict future performance. We define a new measure, which we call the pitching grade index (PGI), that efficiently summarizes a pitcher's performance on a numerical scale using principal components analysis. The PGI statistic can be useful for assessing a pitcher's individual contribution. The K-means clustering algorithm is also used to segment players into groups.
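
The abstract does not give the PGI formula, so the following is only a minimal sketch of the general idea: standardize several pitching metrics and use the first principal component score as a single composite index. The function name `pca_index`, the choice of metrics, and all numbers in `pitchers` are made up for illustration and are not taken from the paper.

```python
import numpy as np

def pca_index(stats):
    """Return (first-principal-component scores, variance share of that component)."""
    stats = np.asarray(stats, dtype=float)
    # Standardize each metric so that differing units do not dominate.
    z = (stats - stats.mean(axis=0)) / stats.std(axis=0)
    # Covariance of standardized data (proportional to the correlation matrix).
    cov = np.cov(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    first = eigvecs[:, np.argmax(eigvals)]
    # Eigenvector signs are arbitrary; orient so the first metric loads positively.
    if first[0] < 0:
        first = -first
    return z @ first, eigvals.max() / eigvals.sum()

# Hypothetical pitchers: strikeouts per nine, strikeout-to-walk ratio, negated ERA
# (negated so that "higher is better" holds for every column).
pitchers = np.array([
    [9.1, 3.5, -2.9],
    [7.2, 2.1, -4.1],
    [8.4, 3.0, -3.3],
    [6.0, 1.8, -4.8],
    [10.2, 4.2, -2.5],
    [7.8, 2.6, -3.9],
])

scores, share = pca_index(pitchers)
print("index per pitcher:", np.round(scores, 2))
print("variance share of first component:", round(float(share), 3))
```

The resulting scores could then be passed to a clustering routine such as K-means to group players, as the abstract describes.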

Clustering non-stationary advanced metering infrastructure data

  • Kang, Donghyun;Lim, Yaeji
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.225-238 / 2022
  • In this paper, we propose a clustering method for advanced metering infrastructure (AMI) data in Korea. Because AMI data are non-stationary, we consider time-dependent frequency domain principal components analysis, which is suited to locally stationary time series data. We develop a new clustering method based on time-varying eigenvectors, and our method provides a meaningful result that differs from the clustering results obtained with conventional methods such as K-means and K-centres functional clustering. A simulation study demonstrates the superiority of the proposed approach. We further apply the clustering results to the evaluation of the electricity price system in South Korea, and validate the reform of the progressive electricity tariff system.

Hybridization of Quercus aliena Blume and Q. serrata Murray in Korea - Analyses of Morphological variation and Flavonoid chemistry -

  • Park, Jin Hee;Park, Chong-Wook
    • Korean Journal of Environment and Ecology / v.29 no.2 / pp.145-161 / 2015
  • This research was conducted to understand the hybridization between Quercus aliena Blume and Q. serrata Murray in Korea, which show a wide range of morphological variation within species as well as interspecific variation with diverse overlapping characteristics caused by hybridization. A morphological analysis (principal components analysis; PCA) of 116 individuals representing the two species and their intermediates was performed. The two species were clearly distinguished in terms of morphology, and intermediate morpho-types assumed to be hybrids between the two species were mostly located between the parent species in the plot of the principal components analysis. There was a clear distinction between the two species in trichome distribution pattern, which is an important diagnostic character in the taxonomy of the genus Quercus, whereas intermediate morpho-types showed a state intermediate between the two species' trichome distributions. Forty-two individuals representing the two species and their intermediates were examined for leaf flavonoid constituents. Twenty-three flavonoid compounds were isolated and identified; they were glycosylated derivatives of the flavonols kaempferol, quercetin, isorhamnetin, and myricetin. The flavonoid constituents of Q. aliena were five glycosylated derivatives: kaempferol 3-O-galactoside, kaempferol 3-O-glucoside, quercetin 3-O-galactoside, quercetin 3-O-glucoside, and isorhamnetin 3-O-glucoside. Q. serrata had 20 diverse flavonol compounds, including the five flavonoid compounds found in Q. aliena. There is thus a clear difference in the flavonoid constituents of Q. aliena and Q. serrata, and flavonoid chemistry is very useful in recognizing each species and putative hybrids. The flavonoid constituents of the intermediates were a mixture of the two species' constituents and generally showed characteristics similar to those of the morpho-types. The hybrids between Q. aliena and Q. serrata showed morphologically and chemically diverse characteristics, and it is assumed that interspecific hybridization and introgression are frequent.

Multivariate Analysis and Gas Chromatographic Determination of the Smelly Nitro Compounds in Dried-Fishes (GC에 의한 건어물 냄새성분중 질소화합물 분석과 다변량해석)

  • Bae, Sun Young;Lee, Dong Sun
    • Journal of the Korean Chemical Society / v.41 no.2 / pp.105-112 / 1997
  • The smelly nitrogen compounds were extracted from dried fishes by simultaneous distillation and extraction and then analyzed by GC-MS. The carbon number and order of an amine could be predicted from retention time and equivalent chain length. Anchovy, codfish, imitation crab meat, cuttle fish, file fish, pollack, shrimp, octopus, harvest fish, and hard-shelled mussel were used for this investigation. Various smelly nitrogen compounds such as methylamine, acetamide, thiazole, 2-hydroxy isopropylamine, N-methyl pyrroline, piperidine, and cyclohexylamine were identified; however, dimethylamine, trimethylamine, and diethylamine were not detected. Principal components analysis was applied to the GC-MS profiles for pattern recognition of smelly nitrogen compounds in dried fishes. Multivariate analysis using principal components analysis was very useful for pattern recognition of smelly components and for assessing category similarity.

Application of Dimensional Expansion and Reduction to Earthquake Catalog for Machine Learning Analysis (기계학습 분석을 위한 차원 확장과 차원 축소가 적용된 지진 카탈로그)

  • Jang, Jinsu;So, Byung-Dal
    • The Journal of Engineering Geology / v.32 no.3 / pp.377-388 / 2022
  • Recently, several studies have utilized machine learning to efficiently and accurately analyze the exponentially growing volume of seismic data. In this study, we expand earthquake information such as occurrence time, hypocentral location, and magnitude to produce a dataset for machine learning, and then reduce the dimension of the expanded data to its dominant features through principal components analysis. The dimensionally expanded data comprise statistics of the earthquake information from the Global Centroid Moment Tensor catalog, which contains 36,699 seismic events. We preprocess the data using standard and max-min scaling and extract dominant features from the scaled dataset with principal components analysis. The scaling methods significantly reduced the deviation of feature values caused by different units. Among them, the standard scaling method yields feature medians with a smaller deviation than the other scaling methods. The six principal components extracted from the non-scaled dataset explain 99% of the original data. The sixteen principal components from the datasets to which standardization or max-min scaling was applied reconstruct 98% of the original datasets. These results indicate that more principal components are needed to preserve the original data information when feature values are evenly distributed. We propose a data processing method for efficient and accurate machine learning models that analyze the relationship between seismic data and seismic behavior.
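
The effect described above, where scaled data need more principal components than raw data to reach the same explained-variance level, can be reproduced on synthetic data. The sketch below is an illustration only: it does not use the Global Centroid Moment Tensor catalog, and the helper `n_components_for`, the feature scales, and the 0.99 threshold are assumptions chosen for the example.

```python
import numpy as np

def n_components_for(X, threshold=0.99):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`."""
    Xc = X - X.mean(axis=0)                      # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    ratio = np.cumsum(s**2) / np.sum(s**2)       # cumulative variance share
    k = int(np.searchsorted(ratio, threshold)) + 1
    return min(k, len(ratio))

# Synthetic stand-in for features recorded in very different units.
rng = np.random.default_rng(0)
unit_scales = np.array([1000.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
X = rng.normal(size=(500, 8)) * unit_scales

# Standard scaling: zero mean, unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# Max-min scaling: each feature mapped to [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

n_raw = n_components_for(X)
n_std = n_components_for(X_std)
n_minmax = n_components_for(X_minmax)
print("components needed (raw, standard, max-min):", n_raw, n_std, n_minmax)
```

In the unscaled data the feature with the largest unit dominates the total variance, so very few components suffice; after scaling, variance is spread evenly across features and more components are required.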

Chemometric Aspects and Determination of Sugar Composition of Honey by HPLC (HPLC에 의한 꿀 중의 당조성 분석과 화학계량학적 고찰)

  • Yoon, Jung-Hyeon;Bae, Sun-Young;Kim, Kun;Lee, Dong-Sun
    • Analytical Science and Technology / v.10 no.5 / pp.362-369 / 1997
  • Chemometric techniques were applied to the sugar composition, determined by HPLC, of five honeys of known botanical or geographical origin. Fructose and glucose were the predominant carbohydrates in the honeys, and a small amount of sucrose was also detected in one sample. Sugar contents in the honey samples were compared by geographical or botanical origin. The fructose/glucose ratio ranged from 0.99 to 1.55, in good agreement with literature values. The principal components analysis (PCA) plot showed that different honey samples grouped into distinct clusters according to geographical or botanical origin. As the first or second principal component score increased, a higher amount of sugar or a lower fructose/glucose ratio was observed in the PCA plot. The chemometric approach was very useful for pattern recognition of sugar profiles or quality indices of honey samples and for detecting adulteration.

Analysis of Functional Connectivity in Human Working Memory using Positron Emission Tomography and Principal Component Analysis

  • Lee, J.S.;Ahn, J.Y.;Jang, M.J.;Lee, D.S.;Chung, J.K.;Lee, M.C.;Park, K.S.
    • Proceedings of the KOSOMBE Conference / v.1998 no.11 / pp.257-258 / 1998
  • To reveal the interconnected brain regions involved in human working memory, their functional connectivity was analyzed using principal component analysis (PCA). rCBF PET scans were performed on 5 normal volunteers during verbal and visual working memory tasks, and PCA was applied. PCA produced a first principal component related to increasing task difficulty and a second that demonstrated the dissociation of the verbal and visual memory systems.

Varietal Classification by Multivariate Analysis on Quantitative Traits in Pecan

  • Shin, Dong-Young;Nou, Ill-Sup
    • Plant Resources / v.2 no.2 / pp.75-80 / 1999
  • Twenty-two varieties of pecan, including wild types, were classified on the basis of 6 measured characters using principal component analysis (PCA) score distance. The results are summarized as follows. The twenty-two varieties were classified into 5 groups based on PCA score distance. The five groups were distinctly characterized by many morphological characters. Cumulatively, 51%, 95%, and 99% of the total variation was explained through the first, third, and fifth principal components, respectively. Varimax rotation of the factor loadings indicated that the first component was highly loaded on leaf characters and the second component on fruit characters, although fruit length was negatively loaded. The second, third, and fourth groups of cultivars were very similar in genetic parentage.

A Fuzzy Neural Network Combining Wavelet Denoising and PCA for Sensor Signal Estimation

  • Na, Man-Gyun
    • Nuclear Engineering and Technology / v.32 no.5 / pp.485-494 / 2000
  • In this work, a fuzzy neural network is used to estimate the relevant sensor signal from other sensor signals. Noise components in the input signals to the fuzzy neural network are removed through the wavelet denoising technique. Principal component analysis (PCA) is used to reduce the dimension of the input space without losing a significant amount of information. A lower-dimensional input space will also usually reduce the time necessary to train a fuzzy neural network. PCA also makes it easier to select the input signals to the fuzzy neural network. The fuzzy neural network parameters are optimized by two learning methods: a genetic algorithm is used to optimize the antecedent parameters, and a least-squares algorithm is used to solve for the consequent parameters. The proposed algorithm was verified by applying it to the pressurizer water level and hot-leg flowrate measurements in pressurized water reactors.

Analyses of Power Consumption of the Heat Pump Dryer in the Automobile Drying Process by using the Principal Component Analysis and Multiple Regression (주성분 분석과 다중회귀모형을 사용한 자동차 건조 공정의 히트펌프 건조기 소모 전력 분석)

  • Lee, Chang-Yong;Song, Gensoo;Kim, Jinho
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.1 / pp.143-151 / 2015
  • In this paper, we investigate how the power consumption of a heat pump dryer depends on various factors in the drying process by analyzing the variables that affect it. Since many variables generally affect the power consumption, we use principal component analysis to reduce the number of variables (the dimensionality) to two or three and make the analysis feasible. We find that the first component is positively correlated with the entrance temperatures of various devices such as the compressor, expander, and evaporator, and that the second is negatively correlated with that of the condenser. We then model the power consumption as a multiple regression on two or three transformed variables from the selected principal components. We find that the fitted values from the multiple regression explain 80 to 90% of the observed values of the power consumption. These results can be applied to more elaborate control of the power consumption of the heat pump dryer.
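
The two-stage approach in this abstract, PCA followed by multiple regression on the leading component scores, is commonly called principal component regression. The sketch below illustrates it on synthetic data only: the function `pcr_fit`, the six made-up "temperature" columns, and the simulated power signal are assumptions for the example and do not reproduce the paper's data or coefficients.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Regress y on the first k principal component scores of standardized X.
    Returns (coefficients including intercept, R^2 of the fit)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize predictors
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:k].T                             # first k component scores
    A = np.column_stack([np.ones(len(y)), scores])    # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)      # ordinary least squares
    fitted = A @ coef
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Synthetic process data: two hidden factors drive six correlated measurements.
rng = np.random.default_rng(1)
n = 200
t1, t2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [t1 + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [t2 + 0.3 * rng.normal(size=n) for _ in range(3)]
)
power = 3.0 * t1 - 2.0 * t2 + 0.5 * rng.normal(size=n)  # simulated response

coef_two, r2_two = pcr_fit(X, power, k=2)
coef_three, r2_three = pcr_fit(X, power, k=3)
print("R^2 with 2 components:", round(float(r2_two), 3))
print("R^2 with 3 components:", round(float(r2_three), 3))
```

Regressing on a few uncorrelated component scores instead of many correlated raw variables keeps the model small and avoids multicollinearity, at the cost of coefficients that must be interpreted through the component loadings.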