• Title/Summary/Keyword: partial least squares discriminant analysis

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.575-587 / 2015
  • Advances in data collection techniques have produced high-dimensional data sets in which discrimination is an important and commonly encountered problem, one that is crucial to resolve when the data are heterogeneous (i.e., the classes do not share a common variance-covariance structure). An example is classifying microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important for studying evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high-dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high-dimensional heterogeneous data sets. It combines the recently introduced regularized elimination procedure for variable selection in partial least squares (rePLS) with quadratic discriminant analysis (QDA), a classification procedure for heterogeneous data. The proposed and existing methods are compared on simulated data; in addition, the proposed procedure is applied to classify microbial habitat preferences by codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory, ranging from 89.1% to 100% on test data. Codon/bi-codon usages of interest, and the mutual interactions among them that influence the respective habitat preferences, are identified. The proposed method also produced results that agree with known biological characteristics, which will help researchers better understand the divergence of species.
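
QDA, the heterogeneous classifier that rePLS-QDA builds on, gives every class its own covariance matrix instead of a pooled one, so the decision boundaries are quadratic. The sketch below shows plain QDA only; the rePLS variable-selection step is not reproduced. It assumes numpy, uses hypothetical function names and synthetic data, and adds a small ridge term of our own choosing so each class covariance stays invertible.

```python
import numpy as np

def fit_qda(X, y, ridge=1e-6):
    """Estimate a separate mean, covariance and prior for every class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # Class-specific (non-pooled) covariance: the "heterogeneous" assumption.
        # The small ridge term is an added assumption that keeps it invertible.
        cov = np.cov(Xc, rowvar=False) + ridge * np.eye(X.shape[1])
        params[c] = (Xc.mean(axis=0), cov, len(Xc) / len(y))
    return params

def qda_predict(X, params):
    """Pick the class with the largest quadratic discriminant score."""
    classes = list(params)
    scores = np.empty((X.shape[0], len(classes)))
    for j, c in enumerate(classes):
        mean, cov, prior = params[c]
        diff = X - mean
        _, logdet = np.linalg.slogdet(cov)
        # Row-wise Mahalanobis term diff^T cov^{-1} diff
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        scores[:, j] = -0.5 * logdet - 0.5 * maha + np.log(prior)
    return np.array(classes)[scores.argmax(axis=1)]

# Hypothetical two-class data with very different spreads (heterogeneous).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(3.0, 2.0, size=(100, 2))])
y = np.repeat([0, 1], 100)
params = fit_qda(X, y)
train_accuracy = np.mean(qda_predict(X, params) == y)
print(train_accuracy)
```

A pooled-covariance method (LDA) would force a single shared covariance on these two classes; QDA's per-class covariance is exactly what the heterogeneous setting in the abstract calls for.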

Forensic Classification of Latent Fingerprints Applying Laser-induced Plasma Spectroscopy Combined with Chemometric Methods

  • Yang, Jun-Ho;Yoh, Jai-Ick
    • Korean Journal of Optics and Photonics / v.31 no.3 / pp.125-133 / 2020
  • This study reports an innovative method for separating overlapping latent fingerprints using laser-induced plasma spectroscopy (LIPS) combined with multivariate analysis. LIPS offers real-time analysis and high-speed scanning, and it provides data on the chemical components of overlapping fingerprints. With appropriate multivariate analysis, these spectra supply valuable chemical information for the forensic classification and reconstruction of overlapping latent fingerprints. Principal component analysis (PCA) and partial least squares (PLS) techniques are used as the basis for classifying four types of fingerprints from the LIPS spectra. The proposed method is successfully demonstrated on the classification of four distinct latent fingerprints using discrimination methods such as soft independent modeling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA). The classification achieves an accuracy of more than 85% and proves sufficiently robust. In addition, by laser-scanning analysis at a spatial interval of 125 ㎛, the overlapping fingerprints were separated in two-dimensional form.
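
SIMCA, one of the discrimination methods named in this abstract, fits a separate PCA model to each class and assigns a new spectrum to the class whose model describes it best. The following is a deliberately simplified sketch, assuming numpy: it uses the raw orthogonal residual distance with no statistical critical limits (full SIMCA typically compares residual variances with an F-type test and can reject a sample from every class). The data, class names, and function names are hypothetical.

```python
import numpy as np

def fit_class_pca(X, n_components):
    """Fit one class model: its mean and leading PCA loading vectors."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def residual_norm(x, model):
    """Orthogonal distance from spectrum x to a class PCA subspace."""
    mean, P = model
    d = x - mean
    return np.linalg.norm(d - P.T @ (P @ d))

def simca_like_predict(x, models):
    """Assign x to the class whose PCA model leaves the smallest residual."""
    return min(models, key=lambda c: residual_norm(x, models[c]))

# Hypothetical 5-channel "spectra": class A varies along channel 0,
# class B along channel 1 with a constant offset on channel 2.
rng = np.random.default_rng(1)
A = np.outer(rng.normal(size=50), [1, 0, 0, 0, 0]) + 0.05 * rng.normal(size=(50, 5))
B = (np.outer(rng.normal(size=50), [0, 1, 0, 0, 0]) + [0, 0, 1, 0, 0]
     + 0.05 * rng.normal(size=(50, 5)))
models = {"A": fit_class_pca(A, 1), "B": fit_class_pca(B, 1)}
print(simca_like_predict(np.array([2.0, 0, 0, 0, 0]), models))   # expected: A
print(simca_like_predict(np.array([0, 2.0, 1.0, 0, 0]), models))  # expected: B
```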

Principal Discriminant Variate (PDV) Method for Classification of Multicollinear Data: Application to Diagnosis of Mastitic Cows Using Near-Infrared Spectra of Plasma Samples

  • Jiang, Jian-Hui;Tsenkova, Roumiana;Yu, Ru-Qin;Ozaki, Yukihiro
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1244-1244 / 2001
  • In linear discriminant analysis, two properties are important for effective discriminant function modeling. The first is the separability of the discriminant function for different classes; separability is optimized by maximizing the ratio of between-class to within-class variance. The second is the stability of the discriminant function against noise in the measurement variables. Stability can be optimized by seeking the discriminant variates in a principal variation subspace, i.e., the directions that account for the majority of the total variation in the data. An unstable discriminant function exhibits inflated variance when predicting future unclassified objects and is therefore exposed to a significantly increased risk of erroneous prediction. An ideal discriminant function should thus not only separate the classes with a minimum misclassification rate on the training set, but also be stable enough that the prediction variance for unclassified objects is as small as possible. In other words, an optimal classifier should balance separability and stability. This is especially important for multivariate spectroscopy-based classification, where multicollinearity always leads to discriminant directions located in low-spread subspaces. A new regularized discriminant analysis technique, the principal discriminant variate (PDV) method, has been developed to handle effectively the multicollinear data commonly encountered in multivariate spectroscopy-based classification. The method seeks a sequence of discriminant directions that not only optimize the separability between classes but also account for maximal variation in the data. Three formulations of the PDV method are suggested, and an effective computing procedure is proposed for one of them.
Near-infrared (NIR) spectra of blood plasma samples from mastitic and healthy cows have been used to evaluate the PDV method in comparison with principal component analysis (PCA), discriminant partial least squares (DPLS), soft independent modeling of class analogies (SIMCA) and Fisher linear discriminant analysis (FLDA). The results demonstrate that the PDV method improves prediction stability without significant loss of separability. The PDV method clearly discriminates the NIR spectra of blood plasma samples from mastitic and healthy cows. Moreover, it outperforms PCA, DPLS, SIMCA and FLDA, indicating that PDV is a promising tool for discriminant analysis of spectra-characterized samples with only small compositional differences, and thus a useful means for spectroscopy-based clinical applications.
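
The separability criterion described in this abstract is the classical Fisher ratio. For two classes with means $m_0, m_1$ and pooled within-class scatter $S_W$, $J(w) = (w^\top(m_1 - m_0))^2 / (w^\top S_W w)$, which is maximized by $w \propto S_W^{-1}(m_1 - m_0)$. The numpy sketch below illustrates only this separability part (plain Fisher LDA, not the PDV method, whose formulations are not given here) and shows how a near-collinear variable makes $S_W$ ill-conditioned. Data and function names are hypothetical.

```python
import numpy as np

def within_scatter(X, y):
    """Pooled within-class scatter matrix S_W."""
    S_w = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        d = X[y == c] - X[y == c].mean(axis=0)
        S_w += d.T @ d
    return S_w

def fisher_ratio(w, X, y):
    """Between-class over within-class variance of the projection X @ w."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return (w @ diff) ** 2 / (w @ within_scatter(X, y) @ w)

def fisher_direction(X, y):
    """Maximiser of fisher_ratio: w proportional to S_W^{-1} (m1 - m0)."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    w = np.linalg.solve(within_scatter(X, y), diff)
    return w / np.linalg.norm(w)

# Hypothetical data: three variables, classes differ only in the first one.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = np.repeat([0, 1], 100)
X[y == 1, 0] += 2.0
best = fisher_ratio(fisher_direction(X, y), X, y)
others = [fisher_ratio(rng.normal(size=3), X, y) for _ in range(100)]
print(best >= max(others))  # True: no random direction separates better

# Adding a near-copy of variable 0 makes S_W nearly singular, so the solve()
# above becomes unstable: the multicollinearity problem PDV-type methods target.
X_collinear = np.c_[X, X[:, 0] + 1e-6 * rng.normal(size=200)]
cond_collinear = np.linalg.cond(within_scatter(X_collinear, y))
print(cond_collinear)
```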

Principal Discriminant Variate (PDV) Method for Classification of Multicollinear Data with Application to Near-Infrared Spectra of Cow Plasma Samples

  • Jiang, Jian-Hui;Wu, Yuqing;Yu, Ru-Qin;Ozaki, Yukihiro
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1042-1042 / 2001
  • In linear discriminant analysis, two properties are important for effective discriminant function modeling. The first is the separability of the discriminant function for different classes; separability is optimized by maximizing the ratio of between-class to within-class variance. The second is the stability of the discriminant function against noise in the measurement variables. Stability can be optimized by seeking the discriminant variates in a principal variation subspace, i.e., the directions that account for the majority of the total variation in the data. An unstable discriminant function exhibits inflated variance when predicting future unclassified objects and is therefore exposed to a significantly increased risk of erroneous prediction. An ideal discriminant function should thus not only separate the classes with a minimum misclassification rate on the training set, but also be stable enough that the prediction variance for unclassified objects is as small as possible. In other words, an optimal classifier should balance separability and stability. This is especially important for multivariate spectroscopy-based classification, where multicollinearity always leads to discriminant directions located in low-spread subspaces. A new regularized discriminant analysis technique, the principal discriminant variate (PDV) method, has been developed to handle effectively the multicollinear data commonly encountered in multivariate spectroscopy-based classification. The method seeks a sequence of discriminant directions that not only optimize the separability between classes but also account for maximal variation in the data. Three formulations of the PDV method are suggested, and an effective computing procedure is proposed for one of them.
Near-infrared (NIR) spectra of blood plasma samples from daily monitoring of two Japanese cows have been used to evaluate the PDV method in comparison with principal component analysis (PCA), discriminant partial least squares (DPLS), soft independent modeling of class analogies (SIMCA) and Fisher linear discriminant analysis (FLDA). The results demonstrate that the PDV method improves prediction stability without significant loss of separability. The PDV method clearly discriminates the NIR spectra of blood plasma samples from the two cows. Moreover, it outperforms PCA, DPLS, SIMCA and FLDA, indicating that PDV is a promising tool for discriminant analysis of spectra-characterized samples with only small compositional differences.

Discrimination of Cultivars and Cultivation Origins from the Sepals of Dry Persimmon Using FT-IR Spectroscopy Combined with Multivariate Analysis

  • Hur, Suel Hye;Kim, Suk Weon;Min, Byung Whan
    • Korean Journal of Food Science and Technology / v.47 no.1 / pp.20-26 / 2015
  • This study aimed to establish a rapid system for discriminating the cultivation origins and cultivars of dry persimmons, using metabolite fingerprinting by Fourier transform infrared (FT-IR) spectroscopy combined with multivariate analysis. Whole-cell extracts from the sepals of four Korean cultivars and two different Chinese dry persimmons were subjected to FT-IR spectroscopy. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) of the FT-IR spectral data successfully separated the six dry persimmons into two groups according to their cultivation origins. Principal component loading values showed that the 1750-1420 and 1190-950 $\mathrm{cm}^{-1}$ regions of the FT-IR spectra were significantly important for discriminating cultivation origins. The prediction accuracy of PLS regression was 100% (p<0.01) for cultivation origins and 85.9% (p<0.05) for cultivars. These results clearly show that metabolic fingerprinting of FT-IR spectra can be applied to rapidly discriminate the cultivation origins and cultivars of commercial dry persimmons.
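
PLS-DA for a two-group origin split can be sketched as PLS regression on a 0/1 dummy response followed by a threshold. The sketch below assumes numpy, the NIPALS PLS1 algorithm, and a 0.5 cut-off; the abstract does not say which PLS algorithm, preprocessing, or decision rule the authors used, and the spectra and function names are hypothetical. Training accuracy like the one computed here is optimistic; reported accuracies should come from cross-validation or a held-out test set.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on a single (dummy-coded) response; returns the model."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # residual X block and y vector
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)          # weight vector
        t = E @ w                          # score vector
        tt = t @ t
        p_load = E.T @ t / tt              # X loading
        q_load = f @ t / tt                # y loading
        E = E - np.outer(t, p_load)        # deflate X
        f = f - q_load * t                 # deflate y
        W.append(w)
        P.append(p_load)
        q.append(q_load)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P^T W)^{-1} q
    return x_mean, y_mean, coef

def plsda_predict(X, model, threshold=0.5):
    """Class 1 if the PLS prediction of the 0/1 dummy exceeds the threshold."""
    x_mean, y_mean, coef = model
    return ((X - x_mean) @ coef + y_mean > threshold).astype(int)

rng = np.random.default_rng(3)
X = rng.normal(scale=0.1, size=(60, 100))   # 60 synthetic spectra, 100 channels
y = np.repeat([0.0, 1.0], 30)               # dummy code: 0 = origin A, 1 = origin B
X[y == 1, 40:50] += 0.5                     # hypothetical band that differs by origin
model = pls1_fit(X, y, n_components=2)
train_accuracy = np.mean(plsda_predict(X, model) == y)
print(train_accuracy)
```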

Discrimination model for cultivation origin of paper mulberry bast fiber and Hanji based on NIR and MIR spectral data combined with PLS-DA

  • Jang, Kyung-Ju;Jung, So-Yoon;Go, In-Hee;Jeong, Seon-Hwa
    • Analytical Science and Technology / v.32 no.1 / pp.7-16 / 2019
  • The objective of this study was to develop a discrimination model for the cultivation origin of paper mulberry bast fiber and Hanji using near-infrared (NIR) and mid-infrared (MIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA). Paper mulberry bast fiber was purchased from 10 different regions of Korea and used to make Hanji. PLS-DA was performed on pre-treated FT-NIR and FT-MIR spectral data for the paper mulberry bast fiber and Hanji. PLS-DA of the paper mulberry bast fiber and Hanji samples using FT-NIR spectral data achieved 100% performance in cross-validation and in the confusion-matrix measures (accuracy, sensitivity, and specificity). The discrimination models revealed four regional groups, which were separated more clearly, with much better score plots, in the NIR-based model than in the MIR-based model. Furthermore, the score plot of the NIR-based model for paper mulberry bast fiber was highly similar in shape to that of the NIR-based model for Hanji.
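
Accuracy, sensitivity, and specificity all come from the confusion matrix. For a multi-region model, one common convention (assumed here; the abstract does not state the authors' exact definition) is to compute sensitivity and specificity one-vs-rest for each class. A dependency-free Python sketch with hypothetical region labels:

```python
def one_vs_rest_counts(y_true, y_pred, label):
    """TP, FN, FP, TN for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    return tp, fn, fp, tn

def classification_metrics(y_true, y_pred):
    """Overall accuracy and per-class (sensitivity, specificity)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for label in sorted(set(y_true)):
        tp, fn, fp, tn = one_vs_rest_counts(y_true, y_pred, label)
        sensitivity = tp / (tp + fn)  # true-positive rate
        specificity = tn / (tn + fp) if tn + fp else float("nan")  # true-negative rate
        per_class[label] = (sensitivity, specificity)
    return accuracy, per_class

# Hypothetical region labels for six test samples.
y_true = ["G1", "G1", "G2", "G2", "G3", "G3"]
y_pred = ["G1", "G2", "G2", "G2", "G3", "G3"]
accuracy, per_class = classification_metrics(y_true, y_pred)
print(accuracy, per_class)
```

With these labels, accuracy is 5/6, G1 has sensitivity 0.5 and specificity 1.0 (one G1 sample was called G2), and G2 has sensitivity 1.0 and specificity 0.75. A "100%" result as reported above corresponds to every one of these values being 1.0.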

Comparison of the antioxidant properties and flavonols in various parts of Korean red onions by multivariate data analysis

  • Park, Mi Jin;Ryu, Da Hye;Cho, Jwa Yeong;Ha, In Jong;Moon, Jin Seong;Kang, Young-Hwa
    • Horticulture, Environment, and Biotechnology : HEB / v.59 no.6 / pp.919-927 / 2018
  • To compare the antioxidant properties and flavonols in different parts, the dry skin (DS) and edible portion (EP), of eight red onions (Allium cepa L.; ROs), the total phenolic (TPC), flavonoid (TFC), and anthocyanin (TAC) contents and the DPPH radical scavenging activity were estimated, and six flavonols were quantified by HPLC-PDA analysis. The major component was quercetin in the DS and quercetin-4'-glucoside in the EP. The PCA and PLS-DA score plots separated the EP and DS of the ROs on the basis of flavonol content and antioxidant properties. The PCA loading plot showed that quercetin and total flavonol content were highly correlated with the antioxidant activity of the ROs. Therefore, flavonol content and antioxidant activity can be used as markers for distinguishing the parts of ROs.
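
PCA score and loading plots like those described here can be computed from a singular value decomposition of the centered data matrix: scores give the sample coordinates, loadings the variable coordinates. A minimal numpy sketch, assuming autoscaling (a common chemometrics choice, not stated in the abstract) and hypothetical data in which two correlated variables separate the two plant parts:

```python
import numpy as np

def pca(X, n_components, autoscale=True):
    """PCA via SVD: returns scores, loadings and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    if autoscale:
        # Unit variance per column, so contents and activities weigh equally.
        Xc = Xc / Xc.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                     # variable coordinates
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

# Hypothetical data: 8 edible-portion (0) and 8 dry-skin (1) samples,
# columns = [flavonol, antioxidant activity, unrelated var 1, unrelated var 2].
rng = np.random.default_rng(4)
part = np.repeat([0, 1], 8)
flavonol = 3.0 * part + rng.normal(scale=0.3, size=16)
activity = 2.0 * flavonol + rng.normal(scale=0.3, size=16)
X = np.column_stack([flavonol, activity, rng.normal(size=16), rng.normal(size=16)])
scores, loadings, explained = pca(X, n_components=2)
print(explained)        # share of variance captured by PC1 and PC2
print(loadings[:2, 0])  # flavonol and activity load with the same sign on PC1
```

Variables whose loadings point the same way on a component, as flavonol and activity do here, are positively correlated; that is how a loading plot supports a statement such as "quercetin was highly correlated with antioxidant activity".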

Correlation analysis of human urinary metabolites related to gender and obesity using NMR-based metabolic profiling

  • Kim, Ja-Han;Park, Jung-Dae;Park, Sung-Soo;Hwang, Geum-Sook
    • Journal of the Korean Magnetic Resonance Society / v.16 no.1 / pp.46-66 / 2012
  • Metabolomic studies using human urine have shown that human metabolism is altered by a variety of environmental, cultural, and physiological factors. Comprehensive information about normal human metabolite profiles is necessary for accurate clinical diagnosis of disease and for disease prevention and treatment. In this study, metabolite correlation analyses using $^{1}\mathrm{H}$ nuclear magnetic resonance (NMR) spectroscopy coupled with multivariate statistics were performed on human urine to compare metabolic differences based on gender and/or obesity in healthy human subjects. First, we applied partial least squares discriminant analysis (PLS-DA) to the NMR spectral data set to verify the data's ability to discriminate by gender and obesity. Then, the differences in metabolite-metabolite correlation between male and female subjects, and between subjects with normal and high body mass index (obese), were investigated through pairwise correlations. Creatine and several metabolites, including isoleucine, trans-aconitate, and trimethylamine N-oxide (TMAO), exhibited different quantitative relationships depending on gender. Dimethylamine correlated differently with glycine and TMAO depending on gender. The correlation of TMAO with amino acids was considerably lower in obese subjects than in normal subjects. We expect that the results will shed light on the metabolic pathways of healthy humans and will assist in the accurate diagnosis of human disease.
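
The pairwise metabolite-metabolite comparison can be sketched as computing a correlation matrix separately within each group and then comparing the same entry across groups. This numpy sketch assumes Pearson correlation (the abstract does not name the coefficient), uses hypothetical data and function names, and does not test whether a difference is statistically significant.

```python
import numpy as np

def group_correlations(X, groups):
    """Pearson correlation matrix of the metabolite columns within each group."""
    # rowvar=False: rows are subjects, columns are metabolites
    return {g: np.corrcoef(X[groups == g], rowvar=False) for g in np.unique(groups)}

def correlation_shift(corrs, names, a, b, g1, g2):
    """How much r(a, b) in group g1 exceeds r(a, b) in group g2."""
    i, j = names.index(a), names.index(b)
    return corrs[g1][i, j] - corrs[g2][i, j]

# Hypothetical data: TMAO tracks glycine in "normal" subjects but not in "obese".
names = ["TMAO", "glycine", "creatine"]
rng = np.random.default_rng(5)
n = 40
tmao = rng.normal(size=n)
normal = np.column_stack([tmao, tmao + 0.2 * rng.normal(size=n), rng.normal(size=n)])
obese = rng.normal(size=(n, 3))
X = np.vstack([normal, obese])
groups = np.array(["normal"] * n + ["obese"] * n)
corrs = group_correlations(X, groups)
shift = correlation_shift(corrs, names, "TMAO", "glycine", "normal", "obese")
print(round(shift, 2))
```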

Application of Metabolomics to Quality Control of Natural Product Derived Medicines

  • Lee, Kyung-Min;Jeon, Jun-Yeong;Lee, Byeong-Ju;Lee, Hwanhui;Choi, Hyung-Kyoon
    • Biomolecules & Therapeutics / v.25 no.6 / pp.559-568 / 2017
  • Metabolomics has been used as a powerful tool for the analysis and quality assessment of natural product (NP)-derived medicines. It is increasingly used in the quality control and standardization of these medicines because they are composed of hundreds of natural compounds. The most common metabolomics techniques are NMR, GC-MS, and LC-MS, combined with multivariate statistical analyses such as principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA). Currently, quality control of NP-derived medicines is usually conducted using HPLC and is specified by only one or two indicators. To create a superior quality control framework and avoid adulterated drugs, standards must be determined and established on the basis of multiple ingredients using metabolic profiling and fingerprinting. The application of various analytical tools to the quality control of NP-derived medicines therefore forms the major part of this review. Veregen® (Medigene AG, Planegg/Martinsried, Germany), the first botanical prescription drug approved by the US Food and Drug Administration, is reviewed as an example that will hopefully provide future directions and perspectives on the metabolomics technologies available for the quality control of NP-derived medicines.

Discrimination of Floral Scents and Metabolites in Cut Flowers of Peony (Paeonia lactiflora Pall.) Cultivars

  • Ahn, Myung Suk;Park, Pue Hee;Kwon, Young Nam;Mekapogu, Manjulatha;Kim, Suk Weon;Jie, Eun Yee;Jeong, Jae Ah;Park, Jong Taek;Kwon, Oh Keun
    • Korean Journal of Plant Resources / v.31 no.6 / pp.641-651 / 2018
  • Floral scents and metabolites from cut flowers of 14 peony cultivars (Paeonia lactiflora Pall.) were analyzed using an electronic nose (E-nose) and Fourier transform infrared (FT-IR) spectroscopy, respectively, each combined with multivariate analysis, in order to discriminate the cultivars and to compare the Korean cultivar with cut peonies imported into Korea. The principal component analysis (PCA) and discriminant function analysis (DFA) dendrograms of the floral scents were not exactly the same, but both contained three groups made up of the same cultivars. The PCA and partial least squares discriminant analysis (PLS-DA) dendrograms of the metabolites clustered the cut peony cultivars into two major groups made up of the same cultivars. The fragrance pattern of the Korean cultivar 'Taebaek' was classified into the same group as 'Jubilee' in the PCA and DFA results, and its metabolite pattern was clearly distinguished from those of the other cultivars by PCA and PLS-DA. These results show that the 14 peony cut flowers could be discriminated according to their chemical relationships and that the metabolic profile of the Korean 'Taebaek' has distinctive characteristics. Furthermore, we suggest that these results could serve as preliminary data for breeding new cut peony cultivars and for increasing the use of Korean cut peonies in the cosmetic industry.