• Title/Summary/Keyword: Variable Importance in Projection

Comparison of Variable Importance Measures in Tree-based Classification (나무구조의 분류분석에서 변수 중요도에 대한 고찰)

  • Kim, Na-Young;Lee, Eun-Kyung
    • The Korean Journal of Applied Statistics / v.27 no.5 / pp.717-729 / 2014
  • The projection pursuit classification tree uses, at each node, the one-dimensional projection that best separates the classes. The projection coefficients contain information that distinguishes two groups of classes from each other and can be used to compute an importance measure for each variable. This paper reviews variable importance measures, which are attracting increasing interest as data sizes grow. We compared the performance of the projection pursuit classification tree with that of the classification and regression tree (CART) and random forest. The projection pursuit classification tree was found to perform better in most cases, particularly with highly correlated variables. Its importance measure performs slightly better than the importance measure of random forest.

A Prediction Model for Coating Thickness Based on PLS Model and Variable Selection (부분최소자승법과 변수선택을 이용한 코팅두께 예측모델 개발)

  • Lee, Hye-Seon;Lee, Young-Rok;Jun, Chi-Hyuck;Hong, Jae-Hwa
    • The Korean Journal of Applied Statistics / v.23 no.2 / pp.295-304 / 2010
  • Coating thickness is one of the target variables in the quality control process of the steel industry. To predict coating thickness and to control the quality of anti-fingerprint steel coils, ultraviolet-visible spectra are measured. We propose a variable-interval selection procedure based on the variable importance in projection in a partial least squares model. With the proposed variable-interval selection method, the reduced model predicts better than the full model built on the full absorbance spectra. It is also shown that first differencing works well as a data preprocessing technique for predicting coating thickness.

Development of On-line Quantitative Analysis for Bioethanol Using Infrared Spectroscopy (적외선 분광분석을 이용한 바이오 에탄올 on-line용 정량분석법 개발)

  • Kim, Hyeonguk;Ryu, Jun-Hyung;Liu, J. Jay
    • Applied Chemistry for Engineering / v.23 no.1 / pp.35-41 / 2012
  • This paper proposes a new methodology for the real-time on-line quality monitoring of biofuel processes through the integration of infrared spectroscopy and chemometrics. Partial least squares (PLS), a chemometric method, is employed for quantitative analysis of key components in bioethanol products. Among the preprocessing methods tested together with variable importance in projection (VIP), the Savitzky-Golay method showed the best performance in terms of spectrum correction, noise reduction, and model maintenance. The proposed method allows the concentrations of multiple impurities encountered in the production of bioethanol to be predicted economically. The proposed system is also accurate enough ($R^2$ > 0.99) to replace laboratory analysis.

Unraveling dynamic metabolomes underlying different maturation stages of berries harvested from Panax ginseng

  • Lee, Mee Youn;Seo, Han Sol;Singh, Digar;Lee, Sang Jun;Lee, Choong Hwan
    • Journal of Ginseng Research / v.44 no.3 / pp.413-423 / 2020
  • Background: Ginseng berries (GBs) show temporal metabolic variations among different maturation stages, determining their organoleptic and functional properties. Methods: We analyzed metabolic variations concomitant to five different maturation stages of GBs including immature green (IG), mature green (MG), partially red (PR), fully red (FR), and overmature red (OR) using mass spectrometry (MS)-based metabolomic profiling and multivariate analyses. Results: The partial least squares discriminant analysis score plot based on gas chromatography-MS datasets highlighted metabolic disparity between preharvest (IG and MG) and harvest/postharvest (PR, FR, and OR) GB extracts along PLS1 (34.9%), with MG distinctly segregated across PLS2 (18.2%). Forty-three significantly discriminant primary metabolites were identified across the five developmental stages (variable importance in projection > 1.0, p < 0.05). Among them, most amino acids, organic acids, 5-C sugars, ethanolamines, purines, and palmitic acid were detected in preharvest GB extracts, whereas 6-C sugars, phenolic acid, and oleamide levels were distinctly higher during later maturation stages. Similarly, the partial least squares discriminant analysis based on liquid chromatography-MS datasets displayed preharvest and harvest/postharvest stages clustered across PLS1 (11.1%); however, MG and PR were separated from IG, FR, and OR along PLS2 (5.6%). Overall, 24 secondary metabolites were found to be significantly discriminant (variable importance in projection > 1.0, p < 0.05), with most displaying higher relative abundance during preharvest stages, excluding ginsenosides Rg1 and Re. Furthermore, we observed strong positive correlations between total flavonoid and phenolic metabolite contents in GB extracts and antioxidant activity. Conclusion: Understanding the dynamic metabolic variations associated with GB maturation stages helps rationalize their optimal harvest time and the related agroeconomic traits.

Development of Nondestructive Detection Method for Adulterated Powder Products Using Raman Spectroscopy and Partial Least Squares Regression (라만 분광법과 부분최소자승법을 이용한 불량 분말식품 비파괴검사 기술 개발)

  • Lee, Sangdae;Lohumi, Santosh;Cho, Byoung-Kwan;Kim, Moon S.;Lee, Soo-Hee
    • Journal of the Korean Society for Nondestructive Testing / v.34 no.4 / pp.283-289 / 2014
  • This study was conducted to develop a non-destructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression (PLSR). Garlic and ginger powder, which are used as natural seasoning and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch in concentrations of 5-35%. PLSR models for adulterated garlic and ginger powders were developed and their performance was evaluated using cross-validation. The $R^2_c$ and SEC of an optimal PLSR model were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for evaluating the importance of each variable in a PLSR model. After pre-selection by VIP scores, the Raman spectral data were reduced by one third. New PLSR models, based on the reduced number of wavelengths selected by the VIP score technique, gave good predictions for the adulterated garlic and ginger powder samples.
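
To make the VIP score mentioned in this abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single response (PLS1), mean-centred data, a plain NIPALS fit, and the commonly used definition in which each variable's squared, normalized weight is averaged over components in proportion to the response variance each component explains. The function names `pls1_nipals` and `vip_scores`, the synthetic data, and the cut-off of 1.0 are illustrative assumptions.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit PLS1 by NIPALS on mean-centred data.

    Returns the weights W (p x A), scores T (n x A) and y-loadings q (A,).
    """
    X = X - X.mean(axis=0)          # new arrays, the caller's data is untouched
    y = y - y.mean()
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)      # unit-length weight vector
        t = X @ w                   # scores for this component
        tt = t @ t
        p_load = X.T @ t / tt       # X-loadings, used only for deflation
        q[a] = y @ t / tt           # y-loading
        X = X - np.outer(t, p_load) # deflate X
        y = y - q[a] * t            # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q

def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a)."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)               # y-variance explained per component
    w_norm2 = (W / np.linalg.norm(W, axis=0))**2   # squared normalized weights
    return np.sqrt(p * (w_norm2 @ ss) / ss.sum())

# Synthetic example (assumed data): only the first two variables drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=60)

W, T, q = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
selected = np.flatnonzero(vip > 1.0)   # a common, but not universal, threshold
```

Under this definition the squared VIP scores average to one across variables, which is why a threshold near 1.0 is often used for pre-selection; the appropriate cut-off for a given spectral data set remains a judgment call.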

Study of Prediction Model Improvement for Apple Soluble Solids Content Using a Ground-based Hyperspectral Scanner (지상용 초분광 스캐너를 활용한 사과의 당도예측 모델의 성능향상을 위한 연구)

  • Song, Ahram;Jeon, Woohyun;Kim, Yongil
    • Korean Journal of Remote Sensing / v.33 no.5_1 / pp.559-570 / 2017
  • A partial least squares regression (PLSR) model was developed to map the internal soluble solids content (SSC) of apples using a ground-based hyperspectral scanner that could simultaneously acquire outdoor data and capture images of large quantities of apples. We evaluated the applicability of various preprocessing techniques to construct an optimal prediction model and identified the optimal bands through the variable importance in projection (VIP) score. From the 515 bands of hyperspectral images extracted at wavelengths of 360-1019 nm, 70 reflectance spectra of apples were extracted, and the SSC ($^{\circ}$Brix) was measured using a digital photometer. The optimal prediction model was selected considering the root-mean-square error of cross-validation (RMSECV), the root-mean-square error of prediction (RMSEP), and the coefficient of determination of prediction $r_p^2$. As a result, multiplicative scatter correction (MSC)-based preprocessing methods were better than others. For example, when a combination of MSC and standard normal variate (SNV) was used, RMSECV and RMSEP were the lowest at 0.8551 and 0.8561, and $r_c^2$ and $r_p^2$ were the highest at 0.8533 and 0.6546; wavelength ranges of 360-380, 546-690, 760, 915, 931-939, 942, 953, 971, 978, 981, 988, and 992-1019 nm were most influential for SSC determination. The PLSR model built on the spectral values of these regions confirmed that the RMSEP decreased to 0.6841 and $r_p^2$ increased to 0.7795 compared with the values for the entire wavelength band. In this study, we confirmed the feasibility of using hyperspectral scanner images acquired outdoors for the SSC measurement of apples. These results indicate that the application of field data and sensors could expand in the future.
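
The two scatter-correction steps and the error metric named in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: spectra are assumed to be rows of a 2-D array, MSC is assumed to use the mean spectrum as its reference unless one is supplied, and the function names `snv`, `msc`, and `rmse` are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum (the mean spectrum by
    default, an assumption here) and the fitted offset and slope are removed.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)   # s ~ intercept + slope * ref
        corrected[i] = (s - intercept) / slope
    return corrected

def rmse(y_true, y_pred):
    """Root-mean-square error, the quantity behind RMSECV and RMSEP."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic example (assumed data): one underlying spectrum seen with
# different multiplicative and additive scatter effects.
base = np.linspace(1.0, 2.0, 50) ** 2
spectra = np.array([a * base + b for a, b in [(1.0, 0.0), (1.5, 0.2), (0.7, -0.1)]])

spectra_msc = msc(spectra)
spectra_snv = snv(spectra)
```

In this idealized example both corrections collapse the three rows onto a single spectrum, because the rows differ only by a positive scale factor and an offset. RMSECV and RMSEP apply the same `rmse` formula to cross-validation and independent test predictions, respectively.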

Operational Performance Evaluation of Korean Major Container Terminals

  • Lu, Bo;Park, Nam-Kyu
    • Journal of Navigation and Port Research / v.34 no.9 / pp.719-726 / 2010
  • As the competition among the container terminals in Korea has become increasingly fierce, every terminal is striving to increase its investments constantly and lower its operational costs in order to maintain its competitive edge and provide satisfactory services to terminal users. This unreasoned behavior, however, has led to substantial waste and inefficiency in container terminal production. Therefore, it is of great importance for a terminal to know whether it has fully used its existing infrastructure and whether output has been maximized given the input. From this perspective, data envelopment analysis (DEA) provides a more appropriate benchmark. This study applies three DEA models to obtain a variety of analytical results on the operational efficiency of Korean container terminals. Based on efficiency value analysis, this study first identifies the causes of inefficiency. It then identifies the potential areas of improvement for inefficient terminals by applying the slack variable method and giving the projection results. Finally, a returns-to-scale approach is used to assess whether each terminal is in a state of increasing, decreasing, or constant returns to scale. The results of this study can provide terminal managers with insight into resource allocation and optimization of operating performance.

Volatile Compounds for Discrimination between Beef, Pork, and Their Admixture Using Solid-Phase-Microextraction-Gas Chromatography-Mass Spectrometry (SPME-GC-MS) and Chemometrics Analysis

  • Zubayed Ahamed;Jin-Kyu Seo;Jeong-Uk Eom;Han-Sul Yang
    • Food Science of Animal Resources / v.44 no.4 / pp.934-950 / 2024
  • This study addresses the prevalent issue of meat species authentication and adulteration through a chemometrics-based approach, crucial for upholding public health and ensuring a fair marketplace. Volatile compounds were extracted and analyzed using headspace-solid-phase-microextraction-gas chromatography-mass spectrometry. Adulterated meat samples were effectively identified through principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA). Through variable importance in projection scores and a Random Forest test, 11 key compounds, including nonanal, octanal, hexadecanal, benzaldehyde, 1-octanol, hexanoic acid, heptanoic acid, octanoic acid, and 2-acetylpyrrole for beef, and hexanal and 1-octen-3-ol for pork, were robustly identified as biomarkers. These compounds exhibited a discernible trend in adulterated samples based on adulteration ratios, evident in a heatmap. Notably, lipid degradation compounds strongly influenced meat discrimination. PCA and PLS-DA yielded significant sample separation, with the first two components capturing 80% and 72.1% of total variance, respectively. This technique could be a reliable method for detecting adulteration in cooked meat.

Evaluation of benzene residue in edible oils using Fourier transform infrared (FTIR) spectroscopy

  • Joshi, Ritu;Cho, Byoung-Kwan;Lohumi, Santosh;Joshi, Rahul;Lee, Jayoung;Lee, Hoonsoo;Mo, Changyeun
    • Korean Journal of Agricultural Science / v.46 no.2 / pp.257-271 / 2019
  • The use of food grade hexane (FGH) for edible oil extraction is responsible for the presence of benzene in the crude oil. Benzene is a Group 1 carcinogen and could pose a serious threat to consumers' health. However, its detection still depends on classical chromatographic methods, so a rapid, non-destructive detection method is needed. Hence, the aim of this study was to investigate the feasibility of using Fourier transform infrared (FTIR) spectroscopy combined with multivariate analysis to detect and quantify the benzene residue in edible oil (sesame and cottonseed oil). Oil samples were adulterated with varying quantities of benzene, and their FTIR spectra were acquired with an attenuated total reflectance (ATR) method. Optimal variables for a partial least-squares regression (PLSR) model were selected using the variable importance in projection (VIP) and the selectivity ratio (SR) methods. The developed PLS models with all variables and with the VIP- and SR-selected variables were validated against an independent data set, which resulted in $R^2$ values of 0.95, 0.96, and 0.95 and standard error of prediction (SEP) values of 38.5, 33.7, and 41.7 mg/L, respectively. The proposed technique of FTIR combined with multivariate analysis and variable selection methods can detect benzene residues in edible oils quickly and simply, and can thus replace the conventional methods used for the same purpose.

Rancidity Prediction of Soybean Oil by Using Near-Infrared Spectroscopy Techniques

  • Hong, Suk-Ju;Lee, Ah-Yeong;Han, Yun-hyeok;Park, Jongmin;So, Jung Duck;Kim, Ghiseok
    • Journal of Biosystems Engineering / v.43 no.3 / pp.219-228 / 2018
  • Purpose: This study evaluated the feasibility of a near-infrared spectroscopy technique for the rancidity prediction of soybean oil. Methods: A near-infrared spectroscopy technique was used to evaluate the rancidity of soybean oils which were artificially deteriorated. A soybean oil sample was collected, and the acid values were measured using titrimetric analysis. In addition, the transmission spectra of the samples were obtained for the whole test period. The prediction model for the acid value was constructed by using a partial least-squares regression (PLSR) technique and the appropriate spectrum preprocessing methods. Furthermore, optimal wavelength selection methods such as variable importance in projection (VIP) and bootstrap of beta coefficients were applied to select the most appropriate variables from the preprocessed spectra. Results: The acid values increased significantly from the sixth day onward during the 14-day test period. In addition, it was observed that the NIR spectra that exhibited intense absorption at 1,195 nm and 1,410 nm could indicate the degradation of soybean oil. The PLSR model developed using the Savitzky-Golay $2^{nd}$ order derivative method for preprocessing exhibited the highest performance in predicting the acid value of soybean oil samples. Conclusions: The study helped establish the feasibility of predicting the rancidity of soybean oil (using its acid value) by means of NIR spectroscopy together with optimal variable selection methods. The experimental results suggested that the absorption at 1,150 nm and 1,450 nm, which was highly correlated with the largest absorption by the second and first overtones of the C-H and O-H stretching vibrational transitions, was caused by the deterioration of soybean oil.
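
The Savitzky-Golay second-derivative preprocessing named in this abstract can be illustrated with a short NumPy sketch. It is not the authors' code: it assumes evenly spaced wavelengths, an odd window, and a local least-squares polynomial fit, and it drops the half-window at each end rather than extrapolating. The function names `sg_coefficients` and `sg_derivative` and the default window and polynomial order are illustrative assumptions. Library routines such as SciPy's `savgol_filter` offer the same operation with edge handling.

```python
import numpy as np
from math import factorial

def sg_coefficients(window_length, polyorder, deriv=0, delta=1.0):
    """Filter weights that give the deriv-th derivative of a local polynomial fit."""
    if window_length % 2 == 0 or polyorder >= window_length:
        raise ValueError("window_length must be odd and larger than polyorder")
    if deriv > polyorder:
        raise ValueError("deriv must not exceed polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns 1, k, k^2, ...: least-squares design matrix for the window.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse maps window samples to the coefficient
    # of k^deriv; multiplying by deriv! gives the derivative at the centre.
    return np.linalg.pinv(A)[deriv] * factorial(deriv) / delta**deriv

def sg_derivative(y, window_length=11, polyorder=2, deriv=2, delta=1.0):
    """Apply the filter; the output is shorter by window_length - 1 samples."""
    c = sg_coefficients(window_length, polyorder, deriv, delta)
    # Correlation (not convolution) keeps the weights aligned with the offsets.
    return np.correlate(np.asarray(y, dtype=float), c, mode="valid")

# Synthetic check (assumed data): a quadratic has a constant second derivative.
x = np.linspace(0.0, 1.0, 101)
y = 3.0 * x**2 + 2.0 * x + 1.0
d2 = sg_derivative(y, window_length=11, polyorder=2, deriv=2, delta=x[1] - x[0])
```

A second derivative removes constant and linear baseline drifts from a spectrum, which is one reason it is a common choice before PLSR; the window length trades noise suppression against peak distortion and would need tuning on real spectra.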