• Title/Abstract/Keywords: multivariate classification

305 results found (processing time: 0.027 s)

Functional Data Classification of Variable Stars

  • Park, Minjeong;Kim, Donghoh;Cho, Sinsup;Oh, Hee-Seok
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 20, No. 4
    • /
    • pp.271-281
    • /
    • 2013
  • This paper considers the problem of classifying variable stars based on functional data analysis. For a better understanding of galaxy structure and stellar evolution, various approaches to the classification of variable stars have been studied. Several features that explain the characteristics of variable stars (such as color index, amplitude, period, and Fourier coefficients) have usually been used to classify them. Excluding other factors and focusing only on the curve shapes of variable stars, Deb and Singh (2009) proposed a classification procedure using multivariate principal component analysis. However, this approach is limited in accommodating some features of light curve data, which are unequally spaced in the phase domain and have functional properties. In this paper, we propose a light curve estimation method that is suitable for functional data analysis, and provide a classification procedure for variable stars that combines the features of a light curve with existing functional data analysis methods. To evaluate its practical applicability, we apply the proposed classification procedure to data sets of variable stars from the project STellar Astrophysics and Research on Exoplanets (STARE).

Study on the spectroscopic reconstruction of explosive-contaminated overlapping fingerprints using the laser-induced plasma emissions

  • Yang, Jun-Ho;Yoh, Jai-Ick
    • 분석과학
    • /
    • Vol. 33, No. 2
    • /
    • pp.86-97
    • /
    • 2020
  • Reconstruction and separation of explosive-contaminated overlapping fingerprints constitutes an analytical challenge of high significance in forensic sciences. Laser-induced breakdown spectroscopy (LIBS) allows real-time chemical mapping by detecting the light emissions from laser-induced plasma and can offer a powerful means of fingerprint classification based on the chemical components of the sample. In recent years, LIBS has been studied as one of the spectroscopic techniques with the greatest capability for forensic sciences. However, despite its great sensitivity, LIBS suffers from limited detection due to difficulties in the reconstruction of overlapping fingerprints. Here, the authors propose a simple yet effective method of using chemical mapping to separate and reconstruct explosive-contaminated, overlapping fingerprints. A Q-switched Nd:YAG laser system (1064 nm), which allows the laser beam diameter and the area of the ablated crater to be controlled, was used to analyze the chemical compositions of eight samples of explosive-contaminated fingerprints (featuring two sample explosives and four individuals) via LIBS. The chemical validations were then further performed by applying Raman spectroscopy. The results were subjected to principal component and partial least-squares multivariate analyses, and showed classification of contaminated fingerprints at higher than 91% accuracy. Robustness and sensitivity tests indicate that the novel method used here is effective for separating and reconstructing overlapping fingerprints with explosive traces.

APPLICATION OF MULTIVARIATE DISCRIMINANT ANALYSIS FOR CLASSIFYING PROFICIENCY OF EQUIPMENT OPERATORS

  • Ruel R. Cabahug;Ruth Guinita-Cabahug;David J. Edwards
    • International Conference Proceedings
    • /
    • The 1st International Conference on Construction Engineering and Project Management
    • /
    • pp.662-666
    • /
    • 2005
  • Using data gathered from the expert opinion of plant and equipment professionals, this paper presents the key variables that may constitute a maintenance-proficient plant operator. Multivariate Discriminant Analysis (MDA) was applied to generate data and was tested for sensitivity analysis. Results showed that the MDA model was able to classify plant operators' proficiency at 94.10 percent accuracy and determined nine (9) key variables of a maintenance-proficient plant operator. The key variables included: i) number of years of experience as equipment operator (PQ1); ii) eye-hand coordination (PQ9); iii) eye-hand-foot coordination (PQ10); iv) planning skills (TE16); v) pay/wage (MQ1); vi) work satisfaction (MQ4); vii) operator responsibilities as defined by management (MF1); viii) clear management policies (MF4); and ix) management pay scheme (MF5). The classification procedure with nine variables formed the general model with the equation: OMP (general) = 0.516PQ1 + 0.309PQ9 + 0.557PQ10 + 0.831TE16 + 0.8MQ1 + 0.0216MQ4 + 0.136MF1 + 0.28MF4 + 0.332MF5 - 4.387
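
The general model quoted in the abstract above is a linear discriminant score, so it can be evaluated directly. The following is a minimal sketch, assuming each variable is supplied as a single numeric rating; the abstract does not state the rating scales or the cut-off that separates proficient from non-proficient operators, so the function only returns the raw score. The names `OMP_COEFFICIENTS`, `OMP_INTERCEPT`, and `omp_general_score` are hypothetical.

```python
# Coefficients and intercept copied from the "OMP (general)" equation
# in the abstract. Everything else here is an illustrative assumption.
OMP_COEFFICIENTS = {
    "PQ1": 0.516,    # years of experience as equipment operator
    "PQ9": 0.309,    # eye-hand coordination
    "PQ10": 0.557,   # eye-hand-foot coordination
    "TE16": 0.831,   # planning skills
    "MQ1": 0.8,      # pay/wage
    "MQ4": 0.0216,   # work satisfaction
    "MF1": 0.136,    # operator responsibilities as defined by management
    "MF4": 0.28,     # clear management policies
    "MF5": 0.332,    # management pay scheme
}
OMP_INTERCEPT = -4.387


def omp_general_score(ratings):
    """Return the linear discriminant score for one operator.

    `ratings` maps each variable code (e.g. "PQ1") to a numeric value.
    The scale of those values is not given in the abstract.
    """
    missing = sorted(set(OMP_COEFFICIENTS) - set(ratings))
    if missing:
        raise KeyError(f"missing ratings for: {', '.join(missing)}")
    weighted_sum = sum(
        coefficient * ratings[name]
        for name, coefficient in OMP_COEFFICIENTS.items()
    )
    return weighted_sum + OMP_INTERCEPT
```

Judging by coefficient size alone, planning skills (TE16) and pay/wage (MQ1) move the score most per unit of rating, although that comparison is only meaningful if the variables share a common scale.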


Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas

  • 홍고르출;김미혜
    • 한국융합학회논문지
    • /
    • Vol. 11, No. 12
    • /
    • pp.23-30
    • /
    • 2020
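  • In this study, we predicted the risk level of gas leakage without directly measuring gas leakage data, by using machine learning algorithms on the relationship between natural gas (NG) data and gas-related environmental factors. The study is based on the specifications of an IoT-based, remotely controlled Picarro gas sensor, using open data provided by a server. Natural gas leaks into the air and causes serious problems for air pollution, the environment, and health. The method proposed in this study is a multivariate outlier removal method based on Random Forest classification for predicting the risk of natural gas leakage. After unsupervised k-means clustering, the experimental data set is imbalanced. We therefore focus on how well the proposed model predicts the medium and high risk levels. For this purpose, we compared the receiver operating characteristic (ROC) curve, accuracy, and mean standard error (MSE) of each classification model. In the experiments, MOL_RF achieved an accuracy, area under the ROC curve (AUC), and MSE of 99.71%, 99.57%, and 0.0016, respectively.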
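
Two of the metrics compared in the abstract above, accuracy and AUC, can be computed in a few lines. This is a minimal sketch, assuming a binary reduction of the problem (for example, "high risk" coded as 1 versus everything else as 0); the paper itself works with several risk levels and does not describe its evaluation code, and the function names `roc_auc` and `accuracy` are hypothetical. AUC is computed here from its pairwise definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")

    # Count positive/negative pairs that are ranked correctly.
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)
```

On imbalanced data such as the clustered set described above, accuracy can look high even when the rare classes are predicted poorly, which is one reason to report AUC alongside it.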

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 22, No. 6
    • /
    • pp.575-587
    • /
    • 2015
  • Developments in data collection techniques result in high dimensional data sets, where discrimination is an important and commonly encountered problem that is crucial to resolve when the high dimensional data are heterogeneous (non-common variance-covariance structure across classes). An example of this is classifying microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important to study for evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high dimensional heterogeneous data sets, based on the recently introduced regularized elimination for variable selection in partial least squares (rePLS) and the heterogeneous classification procedure quadratic discriminant analysis (QDA). A comparison of the proposed and existing methods is conducted on a simulated data set; in addition, the proposed procedure is applied to classify microbial habitat preferences by their codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory and ranges from 89.1% to 100% on test data. Interesting codon/bi-codon usages and their mutual interactions that influence the respective habitat preferences are identified. The proposed method also produced results that concur with known biological characteristics, which will help researchers better understand the divergence of species.

Classification of Agricultural Reservoirs Using Multivariate Analysis

  • 최은희;김형중;박영석
    • 한국관개배수논문집
    • /
    • Vol. 17, No. 2
    • /
    • pp.17-27
    • /
    • 2010
  • To manage water quality in reservoirs, it is necessary to understand their temporal and spatial variation and to classify them. In this research, agricultural reservoirs are classified according to physical characteristics (depth, residence time, shape of the reservoir, etc.) and water quality using multivariate analysis (PCA and CA). The CA (Cluster Analysis) method classifies reservoirs into several groups by similarity, but it is difficult to present the full list in a single table. The PCA (Principal Component Analysis) method has the advantage of classifying reservoirs by water quality similarity, and it is also useful for analyzing the relationships between related factors through correlation analysis. However, PCA is limited in classifying reservoirs into several groups based on their characteristics, and each user has to classify them subjectively according to the relative position of each reservoir in the figure. In conclusion, compared to conventional reservoir classification methods, both CA and PCA describe the nature of the reservoirs well, but the classification results have restrictions on their use, so further research is needed to complement them.


The Classification of Forest Cover Types by Consecutive Application of Multivariate Statistical Analysis in the Natural Forest of Western Mt. Jiri

  • 정상훈;김지홍
    • 한국산림과학회지
    • /
    • Vol. 102, No. 3
    • /
    • pp.407-414
    • /
    • 2013
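  • This study was carried out to classify forest cover types in the natural forest of western Mt. Jiri using multivariate statistical analysis. Based on vegetation data collected by the point sampling method, the data were analyzed with multivariate statistical methods including the species-sample point curve, hierarchical cluster analysis, indicator species analysis, and multiple discriminant analysis. With the species-sample point curve, species that had no influence at all on the classification of forest cover types were removed as outliers. Based on the forest vegetation data excluding the outliers, hierarchical cluster analysis was used to classify the study area into 2 to 10 clusters, and indicator species analysis identified 7 as the appropriate number of clusters for the study area. To verify this statistically, multiple discriminant analysis was performed; 91.3% were classified correctly, indicating that seven is an appropriate number of forest cover types for the study area. According to the proportion of dominant species in the upper layer of each cluster, the forest cover types were named Quercus mongolica pure forest, mixed mesophytic forest, Quercus mongolica-Quercus serrata forest, Abies koreana-Quercus mongolica forest, Fraxinus mandshurica forest, Quercus serrata forest, and Carpinus laxiflora forest.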

Classification of latent classes and analysis of influencing factors on longitudinal changes in middle school students' mathematics interest and achievement: Using multivariate growth mixture model

  • 김래영;한수연
    • 한국수학교육학회지시리즈A:수학교육
    • /
    • Vol. 63, No. 1
    • /
    • pp.19-33
    • /
    • 2024
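  • This study analyzed data from the 4th to 6th waves of the Gyeonggi Education Panel Study to examine longitudinal changes in middle school students' mathematics interest and achievement. Analysis with a multivariate growth mixture model confirmed that the patterns of change in students' mathematics interest and achievement are heterogeneous, and students were divided into four latent classes according to their longitudinal patterns of change. Students were classified into a low-level type, with both interest and achievement low; a high-level type, with both high; a mid-level increasing type, in which both increase as students move up in grade; and a mid-level decreasing type, in which both decrease as students move up in grade. The longitudinal patterns of change in interest and achievement differed by type. In addition, an analysis of the correlations between the initial values and slopes of the multivariate growth mixture model showed that mathematics interest and achievement positively influence each other not only in their initial values but also in their rates of change. The factors affecting latent class membership were divided into individual, instructional method, and family variables, and their effects were examined. Students' educational aspirations and time spent on private tutoring had positive effects on mathematics interest and achievement, while the effect of learning ahead of the curriculum varied with its extent. Regarding the instructional method as perceived by students, teacher-centered instruction increased the probability of belonging to the group with high interest and achievement, and learner-centered instruction increased the probability of belonging to the group with low interest and achievement. This study is significant in that it presents, through the multivariate growth mixture model, a new way of analyzing how students change across various characteristics in mathematics education, including interest and achievement.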

Performance Comparison of Mahalanobis-Taguchi System and Logistic Regression: A Case Study

  • 이승훈;임근
    • 대한산업공학회지
    • /
    • Vol. 39, No. 5
    • /
    • pp.393-402
    • /
    • 2013
  • The Mahalanobis-Taguchi System (MTS) is a diagnostic and predictive method for multivariate data. In the MTS, the Mahalanobis space (MS) of the reference group is obtained using the standardized variables of normal data. The Mahalanobis space can be used for multi-class classification. Once this MS is established, the useful set of variables is identified using orthogonal arrays and signal-to-noise ratios to assist in the model analysis or diagnosis. Several other techniques have already been used for classification, such as linear discriminant analysis, logistic regression, decision trees, and neural networks. The goal of this case study is to compare the ability of the Mahalanobis-Taguchi System and logistic regression using a data set.
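
The first step described in the abstract above, building the Mahalanobis space from standardized normal data, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes population statistics for standardization and the conventional MTS scaling that divides the squared distance by the number of variables k, neither of which is spelled out in the abstract. The function names are hypothetical, and the variable-selection step with orthogonal arrays and signal-to-noise ratios is not covered.

```python
from statistics import mean, pstdev


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def scaled_mahalanobis_distance(reference, sample):
    """Scaled squared Mahalanobis distance of `sample` from the reference group.

    `reference` is a list of observations from the normal group, and
    `sample` is one observation with the same number of variables.
    """
    k = len(sample)
    n = len(reference)
    columns = list(zip(*reference))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]

    # Standardize the reference group and build its correlation matrix.
    z_ref = [[(v - mu) / sd for v, mu, sd in zip(row, means, sds)]
             for row in reference]
    corr = [[sum(row[i] * row[j] for row in z_ref) / n for j in range(k)]
            for i in range(k)]

    # Standardize the sample with the reference statistics, then compute
    # z^T R^{-1} z / k without forming the inverse explicitly.
    z = [(v - mu) / sd for v, mu, sd in zip(sample, means, sds)]
    y = solve_linear(corr, z)
    return sum(zi * yi for zi, yi in zip(z, y)) / k
```

With this scaling, the distances of the reference observations themselves average to 1, so a sample whose distance is far above 1 lies outside the Mahalanobis space.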

Time-Frequency Analysis of Electrohysterogram for Classification of Term and Preterm Birth

  • Ryu, Jiwoo;Park, Cheolsoo
    • IEIE Transactions on Smart Processing and Computing
    • /
    • Vol. 4, No. 2
    • /
    • pp.103-109
    • /
    • 2015
  • In this paper, a novel method for the classification of term and preterm birth is proposed based on time-frequency analysis of the electrohysterogram (EHG) using multivariate empirical mode decomposition (MEMD). EHG is promising for preterm birth prediction because it is low-cost and accurate compared to other preterm birth prediction methods, such as tocodynamometry (TOCO). Previous studies on preterm birth prediction applied prefiltering based on Fourier analysis of the EHG, followed by feature extraction and classification, even though Fourier analysis is suboptimal for biomedical signals such as EHG because of their nonlinearity and nonstationarity. Therefore, the proposed method applies prefiltering based on MEMD instead of Fourier-based prefilters before extracting the sample entropy feature and classifying the term and preterm birth groups. For the evaluation, the PhysioNet term-preterm EHG database was used, and the proposed method and the Fourier prefiltering-based method were compared. The results showed that the area under the curve (AUC) of the receiver operating characteristic (ROC) increased by 0.0351 when MEMD was used instead of the Fourier-based prefilter.
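
The sample entropy feature mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard definition, SampEn(m, r) = -ln(A / B), where B counts pairs of length-m templates within tolerance r (Chebyshev distance) and A counts the same for length m + 1. The abstract does not report the values of m and r, so the defaults below are assumptions, and `r` is an absolute tolerance that is often chosen as a fraction of the signal's standard deviation. The MEMD prefiltering step is not shown.

```python
import math


def sample_entropy(signal, m=2, r=0.2):
    """Sample entropy of a 1-D sequence; lower values mean a more regular signal."""
    n = len(signal)
    if n <= m + 1:
        raise ValueError("signal is too short for the chosen template length")

    def count_matching_pairs(length):
        # Use the same number of templates (n - m) for both lengths so
        # that the two counts are directly comparable.
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # self-matches excluded
                distance = max(abs(a - b)
                               for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    count += 1
        return count

    b = count_matching_pairs(m)
    a = count_matching_pairs(m + 1)
    if a == 0 or b == 0:
        raise ValueError("no matching templates; sample entropy is undefined")
    return -math.log(a / b)
```

A perfectly periodic sequence yields a sample entropy of zero, because every template that matches at length m still matches at length m + 1.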