• Title/Summary/Keyword: multivariate classification

Classification of Forest Cover Types in the Baekdudaegan, South Korea

  • Chung, Sang Hoon;Lee, Sang Tae
    • Journal of Forest and Environmental Science / v.37 no.4 / pp.269-279 / 2021
  • This study was carried out to introduce the forest cover types of the Baekdudaegan, which is home to a large number of native tree species. To understand the vegetation distribution characteristics of the Baekdudaegan, a vegetation survey was conducted on 20 major mountains of the Baekdudaegan. The vegetation data were collected from 3,959 sample points using the point-centered quarter method. Each mountain was classified into 4-7 forests using multivariate statistical methods such as cluster analysis, indicator species analysis, multiple discriminant analysis, and species composition analysis. The forests were classified mainly according to the relative abundance of Quercus mongolica. A total of 111 forests were classified, and these were integrated into the following nine forest cover types using the percentage similarity index and clustering by vegetation type: 1) Mongolian oak, 2) Mongolian oak and other deciduous, 3) Oaks (Mixed Quercus spp.), 4) Korean red pine, 5) Korean red pine and oaks, 6) ash, 7) mixed mesophytic, 8) subalpine zone coniferous, and 9) miscellaneous forest. The subalpine zone coniferous type groups forests characterized by similar environmental conditions, and the miscellaneous type groups forests that did not fit any other category.
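
The abstract does not define the percentage similarity index used to merge the 111 forests into cover types. A common form (Renkonen's index) sums, over all species, the smaller of the two stands' percentage abundances. The Python sketch below assumes that form; the function name and the example abundances are hypothetical.

```python
from collections.abc import Mapping


def percentage_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Percentage similarity (Renkonen form, assumed) between two stands.

    Each input maps species name -> abundance in any non-negative unit
    (e.g. importance value or stem count). Abundances are converted to
    percentages within each stand, and the index is the sum over species
    of the smaller percentage: 0 = no shared species, 100 = identical.
    """
    total_a = sum(a.values())
    total_b = sum(b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each stand needs a positive total abundance")
    species = set(a) | set(b)  # species missing from a stand count as 0%
    return sum(
        min(100.0 * a.get(s, 0.0) / total_a, 100.0 * b.get(s, 0.0) / total_b)
        for s in species
    )


# Hypothetical stands: they share only Quercus mongolica (60% vs. 30%).
stand_1 = {"Quercus mongolica": 60, "Pinus densiflora": 40}
stand_2 = {"Quercus mongolica": 30, "Fraxinus rhynchophylla": 70}
print(percentage_similarity(stand_1, stand_2))  # 30.0
```

Pairwise values like this can be arranged into a similarity matrix over all forests and then clustered.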

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension (고혈압 예측을 위한 이상치 탐지 알고리즘 및 데이터 통합 기법)

  • Khongorzul Dashdondov;Mi-Hye Kim;Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.417-419 / 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, its incidence has increased dramatically, not only among the elderly but also among young people, and machine-learning methods are increasingly used to diagnose its causes. In this study, we improved hypertension prediction through Mahalanobis distance-based multivariate outlier removal, using the KNHANES database from the Korean national health data and the COVID-19 dataset from Kaggle. The study consists of two modules. The first, a data preprocessing step, merges the datasets and performs feature selection based on a decision-tree classifier. The second, a predictive analysis step, removes multivariate outliers from the experimental dataset using the Mahalanobis distance and then predicts hypertension. We compared the accuracy of each classification model; the proposed MAH_RF algorithm gave the best result, with an accuracy of 82.66%. The proposed method can be used not only for hypertension but also for detecting other diseases such as stroke and cardiovascular disease.
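
A minimal sketch of Mahalanobis distance-based outlier removal, assuming a numeric feature matrix `X` with labels `y` (both hypothetical names). Each row's squared distance from the sample mean is computed under the sample covariance, and rows above a chosen threshold are dropped. The abstract does not state the cutoff used for MAH_RF; a common choice is a chi-square quantile with one degree of freedom per feature.

```python
import numpy as np


def mahalanobis_sq(X: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Pseudo-inverse guards against a singular sample covariance matrix.
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: centered[i] @ cov_inv @ centered[i].
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def remove_multivariate_outliers(X, y, threshold):
    """Keep only rows whose squared Mahalanobis distance is <= threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = mahalanobis_sq(X) <= threshold
    return X[keep], y[keep]
```

For two features, the 97.5% chi-square quantile has the closed form $-2\ln(0.025) \approx 7.38$; for more features it is usually taken from a statistics library. The cleaned data would then be passed to the classifier (a random forest, judging by the name MAH_RF).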

Superpixel-based Apple Leaf Disease Classification using Convolutional Neural Network (합성곱 신경망을 이용하는 수퍼픽셀 기반 사과잎 병충해의 분류)

  • Kim, Manbae;Choi, Changyeol
    • Journal of Broadcast Engineering / v.25 no.2 / pp.208-217 / 2020
  • The classification of plant diseases from images captured by a camera sensor has been studied over the past decades. One approach that has gained much interest is image segmentation, from which statistical features are derived and analyzed by machine learning. Recently, deep learning has been adopted in this area. However, image segmentation still struggles to achieve stable performance because of various environmental variations, and end-to-end learning in neural networks has the drawback that training images may differ from real images acquired outdoors. To address these problems, we propose a superpixel-based disease classification method using end-to-end CNN (convolutional neural network) learning. In experiments on PlantVillage apple images, the classification accuracy was 98.29% for full images and 92.43% for superpixels, and the corresponding F1-scores were 0.98 and 0.93. We therefore conclude that the superpixel-based method is comparable to the full-image method.
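
The abstract reports an F1-score for each setting but not how it was averaged over disease classes. Assuming macro averaging (the unweighted mean of per-class F1 scores, with F1 = 2TP / (2TP + FP + FN)), a plain-Python sketch follows; the class labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: per-class F1 is 0.8 (scab), 0.0 (rust), 1.0 (healthy).
print(macro_f1(["scab", "rust", "healthy", "scab"],
               ["scab", "scab", "healthy", "scab"]))  # about 0.6
```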

Application of Multispectral Remotely Sensed Imagery for the Characterization of Complex Coastal Wetland Ecosystems of southern India: A Special Emphasis on Comparing Soft and Hard Classification Methods

  • Shanmugam, Palanisamy;Ahn, Yu-Hwan;Sanjeevi, Shanmugam
    • Korean Journal of Remote Sensing / v.21 no.3 / pp.189-211 / 2005
  • This paper compares the recently developed soft classification method based on Linear Spectral Mixture Modeling (LSMM) with traditional hard classification methods based on the Iterative Self-Organizing Data Analysis (ISODATA) and Maximum Likelihood Classification (MLC) algorithms, with the aim of mapping, monitoring and preserving valuable coastal wetland ecosystems of southern India using Indian Remote Sensing Satellite (IRS) 1C/1D LISS-III and Landsat-5 Thematic Mapper (TM) image data. ISODATA and MLC were applied to these satellite images to produce maps of 5, 10, 15 and 20 wetland classes for each of three contrasting coastal wetland sites: Pitchavaram, Vedaranniyam and Rameswaram. The accuracy of the derived classes was assessed with overall accuracy, the simplest descriptive statistic, and with the kappa coefficient, a discrete multivariate technique. ISODATA produced maps of lower accuracy than MLC. However, both overall accuracy and kappa decreased systematically as more classes were derived from the IRS-1C/1D and Landsat-5 TM imagery by ISODATA and MLC. Two principal factors caused this decrease: spectral overlap/confusion and the inadequate spatial resolution of the sensors. Of the two, the latter was the major problem: the limited instantaneous field of view (IFOV) of these sensors produced many mixed pixels (mixels) in the image, which hindered the derivation of accurate wetland cover types despite the increasing spatial resolution of new-generation Earth Observation Sensors (EOS).
To improve classification accuracy, a soft classification method based on LSMM was used to estimate the spectral mixture and classify the IRS-1C/1D LISS-III and Landsat-5 TM imagery. The method determines the number of reflectance endmembers that form the scene spectra, then identifies their nature, and finally decomposes the spectra into their endmembers. To evaluate the LSMM areal estimates, the resulting endmember fractions were compared with the normalized difference vegetation index (NDVI), ground truth data, and estimates from the traditional hard classifier (MLC). NDVI values were positively correlated with vegetation fractions ($r^2$ = 0.96, 0.95 and 0.92 for Rameswaram, Vedaranniyam and Pitchavaram, respectively) and negatively correlated with soil fractions ($r^2$ = 0.53, 0.39 and 0.13), indicating the reliability of the sub-pixel classification. Compared with ground truth data, the precision of LSMM was 92% for the moisture fraction and 96% for the soil fraction. LSMM in general appears well suited to locating small wetland habitats that occur as sub-pixel inclusions and to representing continuous gradations between habitat types.
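
The two quantitative tools in this abstract can be sketched in Python under stated assumptions: overall accuracy and the kappa coefficient from a confusion matrix, and linear spectral unmixing of one pixel against known endmember spectra with a sum-to-one constraint. The unmixing function covers only the final decomposition step, assumes the endmembers are already known, and enforces the constraint softly through a weighted extra equation. It is not the authors' implementation, and fractions are not forced to be non-negative. Function names and the example spectra are hypothetical.

```python
import numpy as np


def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix
    (rows = reference classes, columns = mapped classes)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                                 # observed agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    return p_o, (p_o - p_e) / (1.0 - p_e)


def unmix_sum_to_one(pixel, endmembers, weight=1e3):
    """Linear spectral unmixing with a soft sum-to-one constraint.

    pixel: reflectance per band, shape (bands,).
    endmembers: shape (bands, m), one column per endmember spectrum.
    The constraint sum(f) = 1 is appended as a heavily weighted extra
    equation, and the augmented system is solved by least squares.
    """
    E = np.asarray(endmembers, dtype=float)
    x = np.asarray(pixel, dtype=float)
    A = np.vstack([E, weight * np.ones((1, E.shape[1]))])
    b = np.concatenate([x, [weight]])
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return fractions


# Hypothetical 3-band spectra for two endmembers; the pixel is a 30/70 mix.
E = np.array([[0.1, 0.5], [0.2, 0.4], [0.6, 0.1]])
print(unmix_sum_to_one(0.3 * E[:, 0] + 0.7 * E[:, 1], E))  # about [0.3, 0.7]
```

For example, the confusion matrix `[[45, 5], [10, 40]]` gives an overall accuracy of 0.85 and a kappa of 0.7, because half of the agreement (0.5) is expected by chance.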

Decoding Brain Patterns for Colored and Grayscale Images using Multivariate Pattern Analysis

  • Zafar, Raheel;Malik, Muhammad Noman;Hayat, Huma;Malik, Aamir Saeed
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1543-1561 / 2020
  • Creating a taxonomy of human brain activity is a complicated and challenging procedure. Its many facets, including experiment design, stimulus selection and image presentation, as well as feature extraction and selection techniques, add to the challenge. Although researchers have applied various methods to create a taxonomy of human brain activity, the use of multivariate pattern analysis (MVPA) for image recognition to catalog human brain activity is scarce. Moreover, experiment design is complex, and selecting the image type, color and order is also challenging. This research therefore bridges the gap by using MVPA to create a taxonomy of human brain activity for different categories of images, both colored and grayscale. An EEG experiment was conducted, using feature extraction, selection and classification approaches, on 25 graduates of University Technology PETRONAS (UTP) who met prequalification criteria. Participants were shown both colored and grayscale images, and their accuracy and reaction time were recorded. Using wavelet transform, t-test and support vector machine, colored images produced better results in terms of both accuracy and response time. The research concludes that MVPA is a better approach for analyzing EEG data, as more useful information can be extracted from the brain using colored images. It discusses in detail the behavior of the human brain for colored and grayscale images in a specific task and contributes to decoding the human brain with increased accuracy. Such experimental settings could also be applied to other areas, such as medicine, the military, business and lie detection.
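
As one illustration of the t-test selection step, the sketch below ranks features by the absolute Welch t-statistic between two stimulus classes (e.g., colored vs. grayscale trials) and keeps the top `k`. It assumes the features (for instance, wavelet coefficients per EEG channel) have already been extracted into a trials × features matrix; the selected columns would then be passed to a classifier such as an SVM, which is not shown. Function names are hypothetical, and the paper's exact selection rule is not given in the abstract.

```python
import numpy as np


def welch_t(X_a, X_b):
    """Per-feature Welch t-statistic between two classes (rows = trials)."""
    A = np.asarray(X_a, dtype=float)
    B = np.asarray(X_b, dtype=float)
    se = np.sqrt(A.var(axis=0, ddof=1) / len(A) + B.var(axis=0, ddof=1) / len(B))
    return (A.mean(axis=0) - B.mean(axis=0)) / se


def select_top_features(X, labels, k):
    """Indices of the k features with the largest |t| between the two labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    t = welch_t(X[labels == classes[0]], X[labels == classes[1]])
    return np.argsort(-np.abs(t))[:k]  # sort by descending |t|
```

Ranking by a univariate statistic is cheap and easy to interpret, but it ignores interactions between features; the multivariate step is left to the classifier.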

Multivariate process control procedure using a decision tree learning technique (의사결정나무를 이용한 다변량 공정관리 절차)

  • Jung, Kwang Young;Lee, Jaeheon
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.639-652 / 2015
  • In today's manufacturing environment, process data can easily be measured and transferred to a computer for real-time analysis. As a result, several correlated quality variables can be monitored simultaneously. Various multivariate statistical process control (MSPC) procedures have been proposed to detect out-of-control events. Although classical MSPC procedures give an out-of-control signal, it is difficult to determine which variable caused it. Data mining and machine learning techniques can be considered to solve this problem. In this paper, we applied decision tree learning to MSPC and ran simulations of MSPC procedures for monitoring the means of a bivariate normal process. The simulation results show that the overall performance of the MSPC procedure using decision tree learning is similar across several values of the correlation coefficient, whereas the correct classification rates for out-of-control states depend on the correlation coefficient and the shift magnitude. The proposed procedure has the advantage of providing information about assignable causes, which practitioners may require.
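
The limitation the abstract describes can be seen with the classical Hotelling $T^2$ chart for a bivariate normal process with known in-control mean and covariance. The Python sketch below assumes a correlation of 0.5 and a false-alarm rate of 0.005 (both illustrative). With two variables, the in-control $T^2$ follows a chi-square distribution with 2 degrees of freedom, so the upper control limit is $-2\ln\alpha$.

```python
import numpy as np


def hotelling_t2(x, mu0, sigma):
    """T^2 for one observation x against the in-control mean mu0 and
    known covariance sigma: (x - mu0)' sigma^{-1} (x - mu0)."""
    d = np.asarray(x, dtype=float) - np.asarray(mu0, dtype=float)
    return float(d @ np.linalg.solve(np.asarray(sigma, dtype=float), d))


# Assumed in-control bivariate normal process with correlation rho.
rho = 0.5
mu0 = np.zeros(2)
sigma = np.array([[1.0, rho], [rho, 1.0]])

# Chi-square(2) upper quantile in closed form: P(T^2 > ucl) = alpha.
alpha = 0.005
ucl = -2.0 * np.log(alpha)  # about 10.6

for x in ([3.0, 0.0], [3.0, 3.0]):
    t2 = hotelling_t2(x, mu0, sigma)
    print(x, round(t2, 3), "signal" if t2 > ucl else "in control")
```

Under these assumptions, a shift of 3 in the first mean alone, `[3, 0]`, and a shift of 3 in both means, `[3, 3]`, each give $T^2 = 12$, above the limit of about 10.6. The chart signals in both cases but cannot tell them apart, which is the gap the decision tree procedure is meant to fill.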

HPLC-tandem Mass Spectrometric Analysis of the Marker Compounds in Forsythiae Fructus and Multivariate Analysis

  • Cho, Hwang-Eui;Ahn, Su-Youn;Son, In-Seop;Hwang, Gyung-Hwa;Kim, Sun-Chun;Woo, Mi-Hee;Lee, Seung-Ho;Son, Jong-Keun;Hong, Jin-Tae;Moon, Dong-Cheul
    • Natural Product Sciences / v.17 no.2 / pp.147-159 / 2011
  • A high-performance liquid chromatography-electrospray ionization-tandem mass spectrometric method was developed to simultaneously determine eight marker constituents of Forsythiae Fructus and was subsequently applied to classify its two botanical origins. The marker compounds of Forsythia suspensa were phillyrin, pinoresinol, phillygenin, lariciresinol and forsythiaside; those of F. viridissima were arctiin, arctigenin and matairesinol. The eight analytes were separated on a phenyl-hexyl column (150 $\times$ 2.0 mm i.d., 3 $\mu$m) by gradient elution with the mobile phase (A) 10% acetonitrile in 0.5% acetic acid and (B) 40% aqueous acetonitrile. A few fragment ions specific to the lignan types, selected from the product ions generated by collision-induced dissociation (CID) of molecular ion clusters such as [M-H]$^-$ or [M+OAc]$^-$, were used both for fingerprinting analysis and for quantifying each epimer in multiple-reaction monitoring mode. The method showed good linearity ($r^2 \geq 0.9998$) over a wide range for all analytes; intra- and inter-day precision (RSD) was within 9.14%, and accuracy ranged from 84.3 to 115.1%. The analytical results for 40 drug samples, combined with multivariate statistical analyses (principal component analysis (PCA) and hierarchical cluster analysis (HCA)), clearly classified the test samples according to their botanical origin. This method could provide a practical strategy for assessing the authenticity or quality of the herbal drug.
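
A minimal PCA sketch for data like the 40-sample marker-compound table, assuming rows are samples and columns are compound contents. Autoscaling (mean-centering and unit-variance scaling) is assumed here because the abstract does not state the preprocessing. Columns with zero variance must be removed first, since they cannot be scaled. HCA is not shown, and the function name is hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples onto their first principal components.

    X: (samples, variables), e.g. marker-compound contents per sample.
    Columns are autoscaled (mean 0, unit variance) before the SVD.
    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # equals Z @ Vt[:n].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]
```

If samples of two origins differ mainly in which markers they contain, they separate along PC1 in a scores plot; the sign of each component is arbitrary.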

Anomaly Detection In Real Power Plant Vibration Data by MSCRED Base Model Improved By Subset Sampling Validation (Subset 샘플링 검증 기법을 활용한 MSCRED 모델 기반 발전소 진동 데이터의 이상 진단)

  • Hong, Su-Woong;Kwon, Jang-Woo
    • Journal of Convergence for Information Technology / v.12 no.1 / pp.31-38 / 2022
  • This paper applies MSCRED (Multi-Scale Convolutional Recurrent Encoder-Decoder), an expert-independent, unsupervised neural-network model for multivariate time series analysis. Because MSCRED is based on an autoencoder, its training data must not be contaminated; to overcome this limitation, the paper uses a training-data sampling technique called Subset Sampling Validation. Using labeled vibration data from power plant equipment, the classification performance of MSCRED is evaluated with the anomaly score in two cases: 1) when abnormal data are mixed into the training data, and 2) when the abnormal data are removed from the training data of case 1. On this basis, the paper presents an expert-independent anomaly diagnosis framework that is robust to erroneous data and offers a concise and accurate solution for multivariate time series data in various fields.
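
In the original MSCRED formulation, as we understand it, raw sensor values are not fed directly to the encoder-decoder. The model first builds signature matrices of pairwise inner products between the time series over windows of several lengths, and it scores anomalies by counting poorly reconstructed matrix entries. The sketch below shows those two steps only. The window lengths and the threshold `theta` are illustrative assumptions, and the function names are hypothetical. The convolutional recurrent encoder-decoder is omitted, and Subset Sampling Validation is not shown because the abstract does not describe its procedure.

```python
import numpy as np


def signature_matrix(series, t, w):
    """Pairwise inner products of n time series over the last w steps up to t.

    series: array of shape (n_series, T). Entry (i, j) of the result is
    sum_k x_i[k] * x_j[k] / w over k = t - w + 1, ..., t.
    """
    X = np.asarray(series, dtype=float)
    if not (w <= t + 1 <= X.shape[1]):
        raise ValueError("window must lie inside the series")
    seg = X[:, t - w + 1 : t + 1]
    return seg @ seg.T / w


def multi_scale_signatures(series, t, windows=(10, 30, 60)):
    """Stack signature matrices for several window lengths: (len(windows), n, n)."""
    return np.stack([signature_matrix(series, t, w) for w in windows])


def anomaly_score(residual, theta):
    """Count residual-matrix entries (input minus reconstruction) above theta."""
    return int(np.count_nonzero(np.abs(residual) > theta))
```

In use, the residual would be the difference between a signature matrix and the trained model's reconstruction of it, computed per time step.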

Complication After Gastrectomy for Gastric Cancer According to Hospital Volume: Based on Korean Gastric Cancer Association-Led Nationwide Survey Data

  • Sang-Ho Jeong;Moon-Won Yoo;Miyeong Park;Kyung Won Seo;Jae-Seok Min;Information Committee of the Korean Gastric Cancer Association
    • Journal of Gastric Cancer / v.23 no.3 / pp.462-475 / 2023
  • Purpose: This study aimed to analyze the incidence of and risk factors for complications following gastric cancer surgery in Korea and to compare complications among hospitals grouped by the annual number of gastrectomies performed (hospital volume). Materials and Methods: A retrospective analysis was conducted using data from 12,244 patients at 64 Korean institutions. Complications were classified using the Clavien-Dindo classification (CDC). Univariate and multivariate analyses were performed to identify risk factors for severe complications. Results: Postoperative complications occurred in 14% of patients, severe complications (CDC IIIa or higher) in 4.9%, and postoperative death in 0.2%. Age, stage, American Society of Anesthesiologists (ASA) score, Eastern Cooperative Oncology Group (ECOG) score, hospital stay, approach method, and extent of gastric resection differed significantly by hospital volume (P<0.05). In the univariate analysis, patient age, comorbidity, ASA score, ECOG score, approach method, extent of gastric resection, tumor-node-metastasis (TNM) stage, and hospital volume were significant risk factors for severe complications. However, only age, sex, ASA score, ECOG score, extent of gastric resection, and TNM stage were statistically significant in the multivariate analysis (P<0.05); hospital volume was not (P=0.152). Conclusions: Hospital volume was not a significant risk factor for complications after gastric cancer surgery. The differences in complication frequencies across hospital volumes may be attributable to larger hospitals treating younger patients with lower ASA scores, better general condition, and earlier TNM stages.
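
To make the severity cutoff concrete, the sketch below assumes one Clavien-Dindo grade (or no complication) per patient and computes the overall and severe complication rates, with severe defined as grade IIIa or higher as in the abstract. The patient list and function names are hypothetical, and the study's univariate and multivariate regression models are not reproduced.

```python
# Clavien-Dindo grades in increasing order of severity.
CDC_GRADES = ("I", "II", "IIIa", "IIIb", "IVa", "IVb", "V")


def is_severe(grade, cutoff="IIIa"):
    """True if a Clavien-Dindo grade is at or above the cutoff (default IIIa).

    Raises ValueError for a string that is not a recognized grade.
    """
    return CDC_GRADES.index(grade) >= CDC_GRADES.index(cutoff)


def complication_rates(grades):
    """Overall and severe complication rates.

    grades: one entry per patient; None means no complication,
    otherwise a Clavien-Dindo grade string such as "II" or "IIIb".
    """
    n = len(grades)
    any_rate = sum(g is not None for g in grades) / n
    severe_rate = sum(g is not None and is_severe(g) for g in grades) / n
    return any_rate, severe_rate


# Hypothetical cohort of 10 patients: 4 complications, 2 of them severe.
print(complication_rates([None] * 6 + ["I", "II", "IIIa", "V"]))  # (0.4, 0.2)
```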

IDENTIFICATION OF FALSIFIED DRUGS USING NEAR-INFRARED SPECTROSCOPY

  • Scafi, Sergio H.F.;Pasquini, Celio
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / p.3112 / 2001
  • Near-Infrared Spectroscopy (NIRS) was investigated for identifying falsified drugs. Identification is based on comparing the NIR spectrum of a sample with typical spectra of the authentic drug using multivariate modeling and classification algorithms (PCA/SIMCA). Two spectrophotometers (Brimrose Luminar 2000 and 2030) based on acousto-optic tunable filter (AOTF) technology, sharing the same controlling computer, software (Brimrose Snap 2.03) and data acquisition electronics, were employed. The Luminar 2000 scans the 850-1800 nm range and was used for transmittance/absorbance measurements of liquids with a transflectance optical bundle probe having a total optical path of 5 mm and a circular area of 0.5 $\textrm{cm}^2$. The Model 2030 scans the 1100-2400 nm range and was used for reflectance measurements of solid drugs. For each sample, 300 spectra, acquired in about 20 s, were averaged. Chemometric treatment of the spectral data, modeling and classification were performed with the Unscrambler 7.5 software (CAMO, Norway), which provides the Principal Component Analysis (PCA) and SIMCA algorithms used for modeling and classification, respectively. NIRS was first evaluated for acquiring the spectra of various drugs selected to cover the diversity of physico-chemical characteristics found among commercial products. Parameters that could affect the spectra of a given drug (especially solid tablets) were investigated, and the results showed that the first derivative can minimize spectral changes associated with tablet geometry, physical differences between tablet faces, and position relative to the probe beam. The effects of ambient humidity and temperature were also investigated.
Ambient humidity, the first of these factors, must be controlled during model construction, because it can cause spectral alterations that would lead to the misclassification of an authentic drug if the model does not account for it.
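
A minimal sketch of the approach described above, assuming spectra are stored as rows of a NumPy array. It applies first-derivative preprocessing, then a SIMCA-style one-class model that fits PCA to authentic spectra and rejects samples whose residual distance to the PCA subspace exceeds a limit. The residual limit here is a simple quantile of the training residuals, which simplifies the statistical limits used in SIMCA software. This is not the Unscrambler implementation, and the class and method names are hypothetical.

```python
import numpy as np


def first_derivative(spectra):
    """First derivative along the wavelength axis (simple finite difference)."""
    return np.diff(np.asarray(spectra, dtype=float), axis=1)


class OneClassPCAModel:
    """SIMCA-style class model: PCA of one class plus a residual limit."""

    def __init__(self, n_components=2, quantile=0.95):
        self.n_components = n_components
        self.quantile = quantile

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.loadings_ = Vt[: self.n_components].T  # (variables, components)
        # Assumed limit: a quantile of the training residuals.
        self.limit_ = np.quantile(self.residual(X), self.quantile)
        return self

    def residual(self, X):
        """Squared orthogonal distance of each sample to the PCA subspace."""
        C = np.asarray(X, dtype=float) - self.mean_
        R = C - C @ self.loadings_ @ self.loadings_.T
        return np.sum(R**2, axis=1)

    def accepts(self, X):
        """True for samples that fall within the class model."""
        return self.residual(X) <= self.limit_
```

Usage: fit on `first_derivative(authentic_spectra)`, then call `accepts(first_derivative(test_spectra))`. A `False` flags a spectrum that does not match the authentic class.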
