• Title/Summary/Keyword: multivariate classification

Diagnostic Classification Based on Nonlinear Representation and Filtering of Process Measurement Data (공정측정데이터의 비선형표현과 전처리를 활용한 분류기반 진단)

  • Cho, Hyun-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.5 / pp.3000-3005 / 2015
  • Reliable monitoring and diagnosis of industrial processes is important for quality and safety. The goal of fault diagnosis is to find the process variables responsible for specific abnormalities of the process. This work presents a classification-based diagnostic scheme built on a nonlinear representation of process data. A nonlinear kernel technique reduces the size of the data considered and provides an efficient and reliable representation of the measurement data. As a filtering stage, preprocessing is performed to eliminate unwanted parts of the data and thereby enhance performance. A case study of an industrial batch process showed that the scheme outperformed other methods, and that both the nonlinear representation technique and the filtering improved diagnosis performance.

Partial Discharge Data Analysis with Unsupervised Classification (무감독분류 기법에 의한 부분방전 데이터 분석)

  • Cho, Kyungsoon;Hong, Seonhack
    • Journal of Korea Society of Digital Industry and Information Management / v.14 no.4 / pp.9-16 / 2018
  • This study describes partial discharge (PD) distribution analysis at the interface between XLPE (Cross-Linked PolyEthylene) and EPDM (Ethylene Propylene Diene Monomer) using unsupervised classification. The ${\phi}-q-n$ patterns were analyzed using phase resolved partial discharge (PRPD). K-means cluster analysis is a statistical method that forms clusters based on similarities and distances among scattered individuals and analyzes the characteristics of the formed clusters; by dividing multivariate data into several groups according to the similarity of each characteristic, it makes the data easier to navigate. As the voltage rose, the initial phase distribution, localized around $90^{\circ}$ and $300^{\circ}$, expanded over the whole phase-angle range; at 30 kV, the phase angle of the cluster with the maximum discharge charge was concentrated around $0^{\circ}$ and $180^{\circ}$. The Euclidean distance between the center of gravity and the discharge charge in the ${\phi}-q$ cluster increased with increasing applied voltage.
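
The K-means procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`euclidean`, `kmeans`, `mean_distance_to_centroid`) and the sample `points` (phase angle in degrees, discharge charge in arbitrary units) are hypothetical, the features are not rescaled, and the circular nature of the phase angle is ignored.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct starting points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

def mean_distance_to_centroid(points, labels, centroids, j):
    """Average Euclidean distance from cluster j's members to its centroid."""
    members = [p for p, lab in zip(points, labels) if lab == j]
    if not members:
        return 0.0
    return sum(euclidean(p, centroids[j]) for p in members) / len(members)

# Hypothetical (phase angle, discharge charge) samples, not measured data.
points = [(88.0, 1.0), (90.0, 1.2), (92.0, 0.9),
          (298.0, 2.0), (300.0, 2.1), (302.0, 1.9)]
centroids, labels = kmeans(points, k=2)
for j, c in enumerate(centroids):
    print(j, c, mean_distance_to_centroid(points, labels, centroids, j))
```

In practice the phase and charge axes would likely need to be scaled to comparable ranges before clustering, since the Euclidean distance is otherwise dominated by the feature with the larger numeric range.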

An Outlier Detection Using Autoencoder for Ocean Observation Data (해양 이상 자료 탐지를 위한 오토인코더 활용 기법 최적화 연구)

  • Kim, Hyeon-Jae;Kim, Dong-Hoon;Lim, Chaewook;Shin, Yongtak;Lee, Sang-Chul;Choi, Youngjin;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers / v.33 no.6 / pp.265-274 / 2021
  • Outlier detection research in ocean data has traditionally been performed using statistical and distance-based machine learning algorithms. Recently, AI-based methods have received a lot of attention, and supervised learning methods that require classification information for the data are mainly used. Supervised learning requires considerable time and cost because classification information (labels) must be manually assigned to all data used for learning. In this study, an autoencoder based on unsupervised learning was applied as an outlier detection method to overcome this problem. Two experiments were designed: univariate learning, in which only sea surface temperature (SST) data were used among the observation data of Deokjeok Island, and multivariate learning, in which SST, air temperature, wind direction, wind speed, air pressure, and humidity were used. The data cover 25 years, from 1996 to 2020, and preprocessing that considers the characteristics of ocean data was applied. We tried to detect outliers in real SST data using the trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. A quantitative evaluation of these methods gave multivariate/univariate accuracies of about 96%/91%, respectively, indicating that the multivariate autoencoder had better outlier detection performance. Outlier detection using an unsupervised learning-based autoencoder is expected to be useful in various ways because it can reduce subjective classification errors and the cost and time required for data labeling.

Affective Representations of Basic Tastes and Intensity using Multivariate Analyses (다변량분석방법을 이용한 미각 자극의 기본 맛과 강도에 따른 정서표상)

  • Chaery Park;Inik Kim;Jongwan Kim
    • Science of Emotion and Sensibility / v.26 no.2 / pp.39-52 / 2023
  • According to the core affect theory, affect consists of two independent dimensions, valence and arousal. Previous studies have found that various types of stimuli, such as pictures, videos, and music, are mapped onto the core affect space. However, affect elicited by gustatory stimuli has not been sufficiently explored. This study investigated whether the affects elicited by tastes could be mapped onto the core affect space. Stimuli were selected based on two factors (taste type and intensity). Participants were presented with each stimulus, evaluated the tastes, and rated their affective responses on taste and emotion scales. The data were analyzed using repeated-measures ANOVAs and multivariate analyses (multidimensional scaling and classification). The univariate analyses indicated that participants felt positive about sweet stimuli but negative about bitter and salty stimuli. Furthermore, participants reported high arousal at high intensity. Multidimensional scaling revealed that taste stimuli are also represented on the core affect dimensions. Specifically, in the first dimension, sweetness was represented as positive affect, while bitter and salty tastes were represented as negative affect. In the second dimension, bitterness was represented as low arousal and sourness as high arousal. Classification analyses confirmed that taste was identified consistently from the affective responses within and across participants. This study showed that taste stimuli in daily life are also located on the core affect dimensions of valence and arousal.

Estimation of Brain Connectivity during Motor Imagery Tasks using Noise-Assisted Multivariate Empirical Mode Decomposition

  • Lee, Ki-Baek;Kim, Ko Keun;Song, Jaeseung;Ryu, Jiwoo;Kim, Youngjoo;Park, Cheolsoo
    • Journal of Electrical Engineering and Technology / v.11 no.6 / pp.1812-1824 / 2016
  • The neural dynamics underlying the causal network during motor planning or imagery in the human brain are not well understood. The lack of signal processing tools suitable for the analysis of nonlinear and nonstationary electroencephalographic (EEG) signals hinders such analyses. In this study, noise-assisted multivariate empirical mode decomposition (NA-MEMD) is used to estimate causal influence in the frequency domain, i.e., partial directed coherence (PDC). Natural and intrinsic oscillations corresponding to the motor imagery tasks can be extracted owing to the data-driven approach of NA-MEMD, which does not employ predefined basis functions. Simulations based on synthetic data with a time delay between two signals demonstrated that NA-MEMD was the optimal method for estimating that delay. Furthermore, classification analysis of the motor imagery responses of 29 subjects revealed that NA-MEMD is a prerequisite process for estimating the causal network across multichannel EEG data during mental tasks.

A Study of the Integration of Individual Classification Model in Data Mining for the Credit Evaluation (신용평가를 위한 데이터마이닝 분류모형의 통합모형에 관한 연구)

  • Kim Kap Sik
    • The KIPS Transactions:PartD / v.12D no.2 s.98 / pp.211-218 / 2005
  • This study presents an integrated data mining model for the credit evaluation of the customers of a capital company. Based on customer information and financing processes in the capital market, we derived individual models from multi-layered perceptrons (MLP), multivariate discriminant analysis (MDA), and decision trees. The results from these existing models were then compared with the results from a model that integrates them using a genetic algorithm. The integrated model turned out to be superior to the existing models. This study contributes not only to verifying the existing individual models but also to overcoming the limitations of the existing approaches.

Improving data reliability on oligonucleotide microarray

  • Yoon, Yeo-In;Lee, Young-Hak;Park, Jin-Hyun
    • Proceedings of the Korean Society for Bioinformatics Conference / 2004.11a / pp.107-116 / 2004
  • The advent of microarray technologies gives an opportunity to monitor the expression of tens of thousands of genes simultaneously. Such microarray data can be degraded by experimental errors and image artifacts, which generate non-negligible outliers estimated at 15% of typical microarray data. Thus, it is important to detect and correct these faulty probes prior to high-level data analysis such as classification or clustering. In this paper, we propose a systematic procedure, based on multivariate statistical approaches, for detecting faulty probes in GeneChip arrays and correcting them properly. Principal component analysis (PCA), one of the most widely used multivariate statistical approaches, is applied to construct a statistical correlation model with 20 pairs of probes for each gene. The faulty probes are identified by inspecting the squared prediction error (SPE) of each probe from the PCA model. The outlying probes are then reconstructed by an iterative optimization approach that minimizes the SPE. We used public data from the gene chip project on human fibroblast cells. In the application study, the proposed approach showed good performance for probe correction without removing faulty probes, which may be desirable from the viewpoint of making maximum use of the data.
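
The SPE idea mentioned in this abstract can be illustrated with a small sketch. It is not the paper's procedure: it assumes a one-component PCA model found by power iteration, treats each row as one observation, and omits the iterative reconstruction step; the names `fit_one_component` and `spe` and the toy `train` data are hypothetical.

```python
import math

def fit_one_component(rows, n_iter=200):
    """Return (column means, first principal direction) via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Uniform start vector; assumes it is not orthogonal to the leading direction.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: no variance along the current vector
        v = [x / norm for x in w]
    return means, v

def spe(row, means, v):
    """Squared prediction error: squared residual after projecting onto v."""
    c = [x - m for x, m in zip(row, means)]
    score = sum(a * b for a, b in zip(c, v))
    return sum((a - score * b) ** 2 for a, b in zip(c, v))

# Toy training data lying on the line y = 2x (illustrative, not microarray data).
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]
means, v = fit_one_component(train)
print(spe((3.0, 6.0), means, v))  # consistent with the model: SPE near 0
print(spe((3.0, 9.0), means, v))  # breaks the correlation: clearly larger SPE
```

An observation whose SPE is large relative to the SPE of the training data does not follow the correlation structure captured by the model, which is the sense in which it would be flagged as faulty.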

Prognostic Value of Pretreatment Serum Alkaline Phosphatase in Nasopharyngeal Carcinoma

  • Xie, Ying;Wei, Zheng-Bo;Duan, Xu-Wei
    • Asian Pacific Journal of Cancer Prevention / v.15 no.8 / pp.3547-3553 / 2014
  • Background: The prognostic value of serum alkaline phosphatase (S-ALP) has not been fully validated for nasopharyngeal carcinoma (NPC). Materials and Methods: S-ALP levels were measured in 601 patients newly diagnosed with NPC before radical treatment, and possible associations of these levels with 5-year overall survival (OS) and tumor-free survival (TFS) were explored using univariate and multivariate analyses. Results: Elevated pretreatment S-ALP (>85 U/L) was significantly less frequent among patients classified as T1+2 or stage I+II than among those classified as T3+4 or stage III+IV. Multivariate analysis showed that elevated pretreatment S-ALP (>85 U/L), age, T classification and N stage were independent predictors of poor OS and TFS. Conclusions: Pretreatment S-ALP may be a reliable biomarker to evaluate the long-term prognosis of patients with NPC.

Application of metabolic profiling for biomarker discovery

  • Hwang, Geum-Sook
    • Proceedings of the Korean Society of Applied Pharmacology / 2007.11a / pp.19-27 / 2007
  • An important potential of the metabolomics-based approach is the possibility of developing fingerprints of diseases or of cellular responses to classes of compounds with a known common biological effect. Such fingerprints have the potential to allow classification of disease states or compounds, to provide mechanistic information on cellular perturbations and pathways, and to identify biomarkers specific for disease severity and drug efficacy. Metabolic profiles of biological fluids contain a vast array of endogenous metabolites. Changes in those profiles resulting from perturbations of the system can be observed using analytical techniques such as NMR and MS. $^1$H NMR was used to generate a molecular fingerprint of serum or urine samples, and a pattern recognition technique was then applied to identify molecular signatures associated with specific diseases or drug efficacy. Several metabolites that differentiate disease samples from the control were thoroughly characterized by NMR spectroscopy. We investigated the metabolic changes in human normal and clinical samples using $^1$H NMR. Targeted profiling and spectral binning were applied to the spectral data, and multivariate statistical data analysis (MVDA) was then used to examine in detail the modulation of small-molecule candidate biomarkers. We show that targeted profiling produces robust models, generates accurate metabolite concentration data, and provides data that can help in understanding metabolic differences between healthy and diseased populations. Such metabolic signatures could provide diagnostic markers for a disease state or biomarkers for drug response phenotypes.

A Hill-Sliding Strategy for Initialization of Gaussian Clusters in the Multidimensional Space

  • Park, J.Kyoungyoon;Chen, Yung-H.;Simons, Daryl-B.;Miller, Lee-D.
    • Korean Journal of Remote Sensing / v.1 no.1 / pp.5-27 / 1985
  • A hill-sliding technique was devised to extract Gaussian clusters from the multivariate probability density estimates of sample data as the first step of iterative unsupervised classification. The underlying assumption in this approach was that each cluster possessed a unimodal normal distribution. The key idea was that the proposed clustering function could distinguish elements of a cluster under formation from the rest of the feature space. Initial clusters were extracted one by one according to the hill-sliding tactics. A dimensionless cluster compactness parameter was proposed as a universal measure of cluster goodness and used satisfactorily in test runs with Landsat multispectral scanner (MSS) data. The normalized divergence, defined as the cluster divergence divided by the entropy of the entire sample data, was utilized as a general separability measure between clusters. An overall clustering objective function was set forth in terms of cluster covariance matrices, from which the cluster compactness measure could be deduced. Minimal improvement of initial data partitioning was evaluated by this objective function in eliminating scattered sparse data points. The hill-sliding clustering technique developed herein is potentially applicable to the decomposition of any multivariate mixture distribution into a number of unimodal distributions when an appropriate distribution function for the data set is employed.