• Title/Summary/Keyword: multivariate classification

Functional Data Classification of Variable Stars

  • Park, Minjeong;Kim, Donghoh;Cho, Sinsup;Oh, Hee-Seok
    • Communications for Statistical Applications and Methods / v.20 no.4 / pp.271-281 / 2013
  • This paper considers the problem of classifying variable stars based on functional data analysis. For a better understanding of galaxy structure and stellar evolution, various approaches to the classification of variable stars have been studied. Several features that explain the characteristics of variable stars (such as color index, amplitude, period, and Fourier coefficients) have usually been used to classify them. Excluding other factors and focusing only on the curve shapes of variable stars, Deb and Singh (2009) proposed a classification procedure using multivariate principal component analysis. However, this approach is limited in its ability to accommodate features of light curve data that are unequally spaced in the phase domain and have functional properties. In this paper, we propose a light curve estimation method that is suitable for functional data analysis, and provide a classification procedure for variable stars that combines the features of a light curve with existing functional data analysis methods. To evaluate its practical applicability, we apply the proposed classification procedure to the data sets of variable stars from the project STellar Astrophysics and Research on Exoplanets (STARE).

Study on the spectroscopic reconstruction of explosive-contaminated overlapping fingerprints using the laser-induced plasma emissions

  • Yang, Jun-Ho;Yoh, Jai-Ick
    • Analytical Science and Technology / v.33 no.2 / pp.86-97 / 2020
  • Reconstruction and separation of explosive-contaminated overlapping fingerprints constitutes an analytical challenge of high significance in forensic science. Laser-induced breakdown spectroscopy (LIBS) allows real-time chemical mapping by detecting the light emissions from laser-induced plasma and can offer a powerful means of fingerprint classification based on the chemical components of the sample. In recent years, LIBS has been studied as one of the spectroscopic techniques with greater potential for forensic science. However, despite its great sensitivity, LIBS suffers from limited detection capability because of difficulties in reconstructing overlapping fingerprints. Here, the authors propose a simple yet effective method of using chemical mapping to separate and reconstruct explosive-contaminated, overlapping fingerprints. A Q-switched Nd:YAG laser system (1064 nm), which allows the laser beam diameter and the area of the ablated crater to be controlled, was used to analyze the chemical compositions of eight samples of explosive-contaminated fingerprints (featuring two sample explosives and four individuals) via LIBS. The chemical results were then further validated by Raman spectroscopy. The results were subjected to principal component and partial least-squares multivariate analyses, which classified the contaminated fingerprints with higher than 91% accuracy. Robustness and sensitivity tests indicate that the novel method used here is effective for separating and reconstructing overlapping fingerprints with explosive traces.

APPLICATION OF MULTIVARIATE DISCRIMINANT ANALYSIS FOR CLASSIFYING PROFICIENCY OF EQUIPMENT OPERATORS

  • Ruel R. Cabahug;Ruth Guinita-Cabahug;David J. Edwards
    • International conference on construction engineering and project management / 2005.10a / pp.662-666 / 2005
  • Using data gathered from the expert opinions of plant and equipment professionals, this paper presents the key variables that may constitute a maintenance-proficient plant operator. Multivariate Discriminant Analysis (MDA) was applied to the data and subjected to sensitivity analysis. Results showed that the MDA model was able to classify plant operators' proficiency at 94.10 percent accuracy and determined nine (9) key variables of a maintenance-proficient plant operator. The key variables were: i) number of years of experience as equipment operator (PQ1); ii) eye-hand coordination (PQ9); iii) eye-hand-foot coordination (PQ10); iv) planning skills (TE16); v) pay/wage (MQ1); vi) work satisfaction (MQ4); vii) operator responsibilities as defined by management (MF1); viii) clear management policies (MF4); and ix) management pay scheme (MF5). The classification procedure with these nine variables yielded the general model: OMP (general) = 0.516PQ1 + 0.309PQ9 + 0.557PQ10 + 0.831TE16 + 0.8MQ1 + 0.0216MQ4 + 0.136MF1 + 0.28MF4 + 0.332MF5 - 4.387
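The general model quoted in this abstract is a linear discriminant function, so evaluating it for one operator is a weighted sum plus a constant. The sketch below only illustrates that arithmetic: the coefficients and constant are copied from the abstract, while the function name `omp_score`, the dictionary layout, and the example ratings are hypothetical. The abstract states neither the rating scale of each variable nor the cutoff that separates proficiency classes, so the sketch returns the raw score only.

```python
# Coefficients and constant copied from the equation quoted in the abstract.
OMP_COEFFICIENTS = {
    "PQ1": 0.516,   # years of experience as equipment operator
    "PQ9": 0.309,   # eye-hand coordination
    "PQ10": 0.557,  # eye-hand-foot coordination
    "TE16": 0.831,  # planning skills
    "MQ1": 0.8,     # pay/wage
    "MQ4": 0.0216,  # work satisfaction
    "MF1": 0.136,   # operator responsibilities as defined by management
    "MF4": 0.28,    # clear management policies
    "MF5": 0.332,   # management pay scheme
}
OMP_CONSTANT = -4.387


def omp_score(ratings):
    """Return the general OMP discriminant score for one operator.

    `ratings` maps each variable code to a numeric value. The scale of
    those values is an assumption; the abstract does not specify it.
    """
    missing = set(OMP_COEFFICIENTS) - set(ratings)
    if missing:
        raise ValueError(f"missing variables: {sorted(missing)}")
    weighted = sum(c * ratings[name] for name, c in OMP_COEFFICIENTS.items())
    return weighted + OMP_CONSTANT


# Hypothetical operator with every variable rated 1.0 (illustrative only).
example_ratings = {name: 1.0 for name in OMP_COEFFICIENTS}
score = omp_score(example_ratings)
```

Keeping the coefficients in a dictionary keyed by variable code makes a missing or misspelled variable fail loudly instead of silently shifting the score.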

Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas (메탄 가스 기반 가스 누출 위험 예측을 위한 다변량 특이치 제거)

  • Dashdondov, Khongorzul;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.11 no.12 / pp.23-30 / 2020
  • In this study, the relationship between natural gas (NG) data and gas-related environmental elements was analyzed using machine learning algorithms to predict the level of gas leakage risk without directly measuring gas leakage data. The study was based on open data provided by the server using the IoT-based remote control Picarro gas sensor specification. When natural gas leaks into the air, it poses a serious problem for air pollution, the environment, and health. The proposed method is a multivariate outlier removal method based on Random Forest (RF) classification for predicting the risk of NG leaks. After unsupervised k-means clustering, the experimental dataset was imbalanced. Therefore, we focused on how well the proposed models predict the medium- and high-risk classes. We compared the receiver operating characteristic (ROC) curve, accuracy, area under the ROC curve (AUC), and mean standard error (MSE) for each classification model. In our experiments, MOL_RF achieved an accuracy of 99.71%, an AUC of 99.57%, and an MSE of 0.0016.

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.575-587 / 2015
  • Developments in data collection techniques have resulted in high dimensional data sets, where discrimination is an important and commonly encountered problem that is crucial to resolve when the high dimensional data are heterogeneous (non-common variance-covariance structure across classes). An example of this is the classification of microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important to study for evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high dimensional heterogeneous data sets. It builds on the recently introduced regularized elimination for variable selection in partial least squares (rePLS) and on quadratic discriminant analysis (QDA), a classification procedure for heterogeneous classes. The proposed and existing methods are compared on a simulated data set; in addition, the proposed procedure is used to classify microbial habitat preferences by their codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory and ranges from 89.1% to 100% on test data. Interesting codon/bi-codon usages and their mutual interactions that influence the respective habitat preferences are identified. The proposed method also produced results that concur with known biological characteristics and that will help researchers better understand the divergence of species.

Classification of Agricultural Reservoirs Using Multivariate Analysis (다변량분석법을 활용한 농업용 저수지 수질유형분류)

  • Choi, Eun-Hee;Kim, Hyung-Joong;Park, Youmg-Suk
    • KCID journal / v.17 no.2 / pp.17-27 / 2010
  • To manage water quality in reservoirs, it is necessary to understand their temporal and spatial variation and to classify them. In this research, agricultural reservoirs are classified according to physical characteristics (depth, residence time, shape of the reservoir, etc.) and water quality using multivariate analysis (PCA and CA). Cluster Analysis (CA) classifies reservoirs into several groups according to their similarity, but it is difficult to present the full list in a single table. Principal Component Analysis (PCA) has the advantage of classifying reservoirs by water quality similarity, and it is also useful for analyzing the relationships between related factors through correlation analysis. However, PCA is limited in its ability to classify reservoirs into several groups based on their characteristics, and each user has to classify them subjectively according to the relative position of each reservoir in the figure. In conclusion, compared to conventional reservoir classification methods, both CA and PCA are considered to describe the nature of the reservoirs well, but their classification results have restrictions on use, so further research is needed to complement them.

The Classification of Forest Cover Types by Consecutive Application of Multivariate Statistical Analysis in the Natural Forest of Western Mt. Jiri (다변량 통계 분석법의 연속 적용에 의한 서부 지리산 천연림의 산림 피복형 분류)

  • Chung, Sang Hoon;Kim, Ji Hong
    • Journal of Korean Society of Forest Science / v.102 no.3 / pp.407-414 / 2013
  • This study was conducted to classify forest cover types using multivariate statistical analysis in the natural forest of western Mt. Jiri. On the basis of vegetation data collected by point quarter sampling, the adopted analytical methods were species-area curve (SAC), hierarchical cluster analysis (HCA), indicator species analysis (ISA), and multiple discriminant analysis (MDA). SAC identified outlier tree species that were unlikely to influence the classification of forest cover types; these were excluded from all analyses. Based on forest vegetation information, HCA classified the study area into 2 to 10 clusters, and ISA indicated that the optimal number of clusters was seven. MDA was used to test the clusters classified by HCA and ISA. The seven clusters were classified appropriately, with an overall classification success rate of 91.3%. The classified forest cover types were named according to the ratio of the dominant species in the upper layer of each cluster. They were (1) Quercus mongolica Pure forest, (2) Mixed mesophytic forest, (3) Q. mongolica - Q. serrata forest, (4) Abies koreana - Q. mongolica forest, (5) Fraxinus mandshurica forest, (6) Q. serrata forest, and (7) Carpinus laxiflora forest.

Classification of latent classes and analysis of influencing factors on longitudinal changes in middle school students' mathematics interest and achievement: Using multivariate growth mixture model (중학생들의 수학 흥미와 성취도의 종단적 변화에 따른 잠재집단 분류 및 영향요인 탐색: 다변량 성장혼합모형을 이용하여)

  • Rae Yeong Kim;Sooyun Han
    • The Mathematical Education / v.63 no.1 / pp.19-33 / 2024
  • This study investigates longitudinal patterns in middle school students' mathematics interest and achievement using panel data from the 4th to 6th year of the Gyeonggi Education Panel Study. Results from the multivariate growth mixture model confirmed the existence of heterogeneous characteristics in the longitudinal trajectory of students' mathematics interest and achievement. Students were classified into four latent classes: a low-level class with weak interest and achievement, a high-level class with strong interest and achievement, a middle-level-increasing class where interest and achievement rise with grade, and a middle-level-decreasing class where interest and achievement decline with grade. Each class exhibited distinct patterns in the change of interest and achievement. Moreover, an examination of the correlation between intercepts and slopes in the multivariate growth mixture model reveals a positive association between interest and achievement with respect to their initial values and growth rates. We further explore predictive variables influencing latent class assignment. The results indicated that students' educational ambition and time spent on private education positively affect mathematics interest and achievement, and the influence of prior learning varies based on its intensity. The perceived instruction method significantly impacts latent class assignment: teacher-centered instruction increases the likelihood of belonging to higher-level classes, while learner-centered instruction increases the likelihood of belonging to lower-level classes. This study has significant implications as it presents a new method for analyzing the longitudinal patterns of students' characteristics in mathematics education through the application of the multivariate growth mixture model.

Performance Comparison of Mahalanobis-Taguchi System and Logistic Regression : A Case Study (마할라노비스-다구치 시스템과 로지스틱 회귀의 성능비교 : 사례연구)

  • Lee, Seung-Hoon;Lim, Geun
    • Journal of Korean Institute of Industrial Engineers / v.39 no.5 / pp.393-402 / 2013
  • The Mahalanobis-Taguchi System (MTS) is a diagnostic and predictive method for multivariate data. In the MTS, the Mahalanobis space (MS) of the reference group is obtained using the standardized variables of normal data. The Mahalanobis space can be used for multi-class classification. Once this MS is established, the useful set of variables is identified to assist in the model analysis or diagnosis using orthogonal arrays and signal-to-noise ratios. Several other techniques have also been used for classification, such as linear discriminant analysis, logistic regression, decision trees, and neural networks. The goal of this case study is to compare the ability of the Mahalanobis-Taguchi System and logistic regression using a data set.
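The construction of the Mahalanobis space described in this abstract can be sketched as follows: standardize each variable with the mean and standard deviation of the normal (reference) group, compute the correlation matrix of the standardized data, and score any observation by its Mahalanobis distance. This is a minimal pure-Python sketch and not the authors' implementation. Dividing the squared distance by the number of variables follows the usual MTS convention and is an assumption here, as are all function names and the toy data. The variable screening step with orthogonal arrays and signal-to-noise ratios is not shown.

```python
import math


def column_stats(rows):
    """Mean and population standard deviation of each column."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(k)]
    return means, stds


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (m[i][k] - tail) / m[i][i]
    return x


def build_space(normal_rows):
    """Return (means, stds, correlation matrix) of the reference group."""
    means, stds = column_stats(normal_rows)
    k, n = len(means), len(normal_rows)
    z = [[(r[j] - means[j]) / stds[j] for j in range(k)] for r in normal_rows]
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(k)]
            for i in range(k)]
    return means, stds, corr


def scaled_md(x, space):
    """Squared Mahalanobis distance of x, divided by the number of variables."""
    means, stds, corr = space
    z = [(x[j] - means[j]) / stds[j] for j in range(len(means))]
    w = solve(corr, z)  # w = corr^{-1} z without forming the inverse
    return sum(zi * wi for zi, wi in zip(z, w)) / len(z)


# Toy reference group with two variables (hypothetical values).
normal_group = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (5.0, 5.0)]
space = build_space(normal_group)
abnormal_md = scaled_md((10.0, 0.0), space)
```

With this scaling the reference group has an average distance of one, so observations far above one stand out as candidates for the abnormal class.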

Time-Frequency Analysis of Electrohysterogram for Classification of Term and Preterm Birth

  • Ryu, Jiwoo;Park, Cheolsoo
    • IEIE Transactions on Smart Processing and Computing / v.4 no.2 / pp.103-109 / 2015
  • In this paper, a novel method for the classification of term and preterm birth is proposed based on time-frequency analysis of the electrohysterogram (EHG) using multivariate empirical mode decomposition (MEMD). EHG is a promising tool for preterm birth prediction because it is low-cost and accurate compared to other preterm birth prediction methods, such as tocodynamometry (TOCO). Previous studies on preterm birth prediction applied prefiltering based on Fourier analysis of the EHG, followed by feature extraction and classification, even though Fourier analysis is suboptimal for biomedical signals such as EHG because of their nonlinearity and nonstationarity. Therefore, the proposed method applies prefiltering based on MEMD instead of Fourier-based prefilters before extracting the sample entropy feature and classifying the term and preterm birth groups. For the evaluation, the Physionet term-preterm EHG database was used, and the proposed method was compared with the Fourier prefiltering-based method. The results showed that the area under the curve (AUC) of the receiver operating characteristic (ROC) increased by 0.0351 when MEMD was used instead of the Fourier-based prefilter.
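The sample entropy feature mentioned in this abstract measures how often patterns of length m that match within a tolerance r still match when extended to length m + 1. The sketch below follows the common textbook definition and is not taken from the paper: the function name, the defaults `m=2` and `r=0.2`, and the use of `r` as an absolute tolerance (it is often set relative to the signal's standard deviation) are all assumptions. The MEMD prefiltering step is not shown.

```python
import math


def sample_entropy(signal, m=2, r=0.2):
    """Sample entropy of a 1-D sequence with an absolute tolerance r.

    Counts template pairs (self-matches excluded) whose Chebyshev distance
    is at most r, for template lengths m and m + 1, and returns
    -ln(A / B), where A and B are the respective match counts.
    """
    n = len(signal)

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        # No matches: the entropy is undefined; report it as infinite.
        return float("inf")
    return -math.log(a / b)


# A perfectly periodic toy signal: every match of length m extends to m + 1.
regular_signal = [0.0, 1.0] * 10
regular_entropy = sample_entropy(regular_signal)
```

Lower values indicate a more regular signal. The double loop is quadratic in the signal length, which is acceptable for short segments but would need a faster neighbour search for long recordings.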