• Title/Abstract/Keywords: multivariate classification

Search results: 311

Surface Sediments Classification in Tidal Flats using Multivariate Kriging and KOMPSAT-2 Imagery

  • 이상원;박노욱;장동호;유희영;임효숙
    • 한국지형학회지 (Journal of the Korean Geomorphological Association) / Vol. 19, No. 3 / pp.37-49 / 2012
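  • The purpose of this paper is to propose a methodology for classifying surface sedimentary facies in tidal flats. The methodology combines high-resolution remote sensing data with field survey data through multivariate kriging. Existing methods classify remote sensing data using sediment samples that have already been categorized by sediment composition. The proposed method works in the opposite order: it first produces a distribution map for each sediment component from the field survey and remote sensing data, and it assigns categories only at the final stage. Regression kriging, a multivariate kriging technique, was used to combine the field survey and remote sensing data when producing the component distribution maps. The procedure has four steps:
    (1) For each of the sand, silt, and clay components in the field survey data, regression analysis is performed against the spectral information of the high-resolution remote sensing data to extract a trend component.
    (2) Residuals are calculated at the field survey locations.
    (3) Kriging is applied to the residuals to obtain a residual distribution map.
    (4) The trend and residual components are summed to produce a proportion map for each component, and sedimentary facies classification is then carried out.
  To evaluate the applicability of the proposed technique, a case study was conducted on the Baramarae tidal flat using high-resolution KOMPSAT-2 data. In the case study, the proposed technique achieved higher classification accuracy than the conventional classification method. It performed particularly well for fine-grained sediments. The proposed technique is therefore expected to be useful for classifying surface sedimentary facies in tidal flats from remote sensing data.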
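
The following is a minimal Python sketch of the regression-kriging steps described above. It covers the trend from spectral bands, the residuals at field sites, the kriged residual map, the sum of trend and residual, and classification as the last step. It is not the authors' implementation. The following choices are assumptions made for illustration:
- the ordinary least squares trend;
- the exponential covariance model with hand-set `sill` and `rng` (a fitted variogram would normally be used);
- simple kriging of the zero-mean residuals;
- the hypothetical `classify_dominant` labelling rule (the paper's actual facies scheme is not stated in the abstract).

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=50.0):
    """Exponential covariance C(h) = sill * exp(-h / rng); parameters assumed, not fitted."""
    return sill * np.exp(-h / rng)

def regression_kriging(bands_obs, xy_obs, z_obs, bands_grid, xy_grid,
                       sill=1.0, rng=50.0, nugget=1e-6):
    """Predict one sediment fraction (e.g. % sand) at grid cells.

    bands_obs: (n, b) spectral values at field sites; z_obs: (n,) measured fraction.
    bands_grid: (m, b) spectral values at target cells; xy_obs, xy_grid: coordinates.
    """
    # 1) Trend: OLS regression of the fraction on the spectral bands (with intercept).
    A_obs = np.column_stack([np.ones(len(z_obs)), bands_obs])
    beta, *_ = np.linalg.lstsq(A_obs, z_obs, rcond=None)
    trend_grid = np.column_stack([np.ones(len(bands_grid)), bands_grid]) @ beta

    # 2) Residuals at the field sites (mean zero because the trend has an intercept).
    resid = z_obs - A_obs @ beta

    # 3) Simple kriging of the residuals with the assumed covariance model.
    d_oo = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2)
    d_go = np.linalg.norm(xy_grid[:, None, :] - xy_obs[None, :, :], axis=2)
    C = exp_cov(d_oo, sill, rng) + nugget * np.eye(len(z_obs))
    c0 = exp_cov(d_go, sill, rng)                  # (m, n)
    weights = np.linalg.solve(C, c0.T).T           # (m, n) kriging weights
    resid_grid = weights @ resid

    # 4) Component map = trend + kriged residual.
    return trend_grid + resid_grid

def classify_dominant(sand, silt, clay):
    """Hypothetical final step: clip negatives, rescale to 100 %, label by largest fraction."""
    fr = np.clip(np.column_stack([sand, silt, clay]), 0.0, None)
    fr = 100.0 * fr / np.maximum(fr.sum(axis=1, keepdims=True), 1e-12)
    names = np.array(["sand", "silt", "clay"])
    return names[fr.argmax(axis=1)], fr
```

To use the sketch, call `regression_kriging` once each for sand, silt, and clay, then pass the three maps to `classify_dominant`. This mirrors the "component maps first, categories last" order that distinguishes the proposed method from classifying pre-categorized samples.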

Clinical Relevance of the Tumor Location-Modified Lauren Classification System of Gastric Cancer

  • Choi, Jang Kyu;Park, Young Suk;Jung, Do Hyun;Son, Sang Yong;Ahn, Sang Hoon;Park, Do Joong;Kim, Hyung Ho
    • Journal of Gastric Cancer / Vol. 15, No. 3 / pp.183-190 / 2015
  • Purpose: The Lauren classification system is a very commonly used pathological classification system for gastric adenocarcinoma. A recent study proposed modifying the Lauren classification to include the anatomical location of the tumor; the resulting three types differed significantly in their genomic expression profiles. This retrospective cohort study aimed to evaluate the clinical significance of the modified Lauren classification (MLC). Materials and Methods: The study included 677 consecutive patients who underwent curative gastrectomy for histologically confirmed gastric cancer between January 2005 and December 2007. The patients were divided according to the MLC into proximal non-diffuse (PND), diffuse (D), and distal non-diffuse (DND) types. The groups were compared in terms of clinical features and overall survival. Multivariate analysis was used to assess the association between MLC and prognosis. Results: Of the 677 patients, 48 had PND, 358 had D, and 271 had DND, with 5-year overall survival rates of 77.1%, 77.7%, and 90.4%, respectively. DND was associated with significantly better overall survival than either D or PND (both P<0.01). In multivariate analysis, age, differentiation, lympho-vascular invasion, T stage, and N stage were independent prognostic factors for overall survival, but MLC was not. In patients with early gastric cancer, however, multivariate analysis showed that MLC was an independent prognostic factor for overall survival (odds ratio, 5.946; 95% confidence interval, 1.524-23.197; P=0.010). Conclusions: MLC is prognostic for survival in patients with gastric adenocarcinoma, specifically in early gastric cancer. DND was associated with a better prognosis than PND or D.
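
The reported P value for early gastric cancer can be checked against the ratio and its 95% confidence interval. The check assumes the interval was built symmetrically on the log scale, as a Wald interval. That assumption is supported by the data: the geometric mean of the bounds, √(1.524 × 23.197) ≈ 5.946, matches the point estimate. The function name is hypothetical, and the code treats the reported odds ratio generically as a ratio estimate.

```python
import math

def p_from_ratio_ci(ratio, lower, upper, z_crit=1.959964):
    """Two-sided Wald p-value implied by a ratio estimate and its 95% CI.

    Assumes the CI is log(ratio) +/- z_crit * SE on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(ratio)
    z = math.log(ratio) / se                                 # Wald statistic
    return math.erfc(abs(z) / math.sqrt(2))                  # 2 * (1 - Phi(|z|))

p = p_from_ratio_ci(5.946, 1.524, 23.197)
print(round(p, 3))  # prints 0.01 (reported: P=0.010)
```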

Geographical Classification of Angelica gigas using UHPLC-DAD Combined Multivariate Analyses

  • 김정률;이동영;성상현;김진웅
    • 생약학회지 (Korean Journal of Pharmacognosy) / Vol. 44, No. 4 / pp.332-335 / 2013
  • In the present study, A. gigas was classified by geographical origin using UHPLC-DAD combined with multivariate data analysis techniques. Six active constituents were isolated from A. gigas: nodakenin, marmesin, decursinol, demethylsuberosin, decursin, and decursinol angelate. These constituents were determined simultaneously in 168 A. gigas samples using UHPLC-DAD. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were used to classify the samples by geographical origin (Korea or China). Using PLS-DA Y prediction, 81.6% of the Korean samples and 93.8% of the Chinese samples were correctly classified. These results show that UHPLC-DAD combined with multivariate analysis can serve as an accurate and rapid method for classifying A. gigas by geographical origin.
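
The following is a minimal NumPy sketch of PLS-DA as described above. The class label is coded as a 0/1 response (Korea = 0, China = 1 is an arbitrary, hypothetical coding). A PLS1 regression is fitted with the NIPALS algorithm, and each sample is assigned to a class by thresholding its predicted Y. The 0.5 threshold, the number of components, and the function names are assumptions; the study's software and settings are not given.

```python
import numpy as np

def plsda_fit(X, y, n_components=2):
    """PLS1 (NIPALS) regression of a 0/1 class code y on X; returns a predictor."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients in X space
    return lambda Xnew: (Xnew - x_mean) @ B + y_mean

def plsda_classify(predict, Xnew, threshold=0.5):
    """Assign class 1 when the predicted Y exceeds the (assumed) threshold."""
    return (predict(Xnew) > threshold).astype(int)
```

With all components retained, PLS1 reproduces ordinary least squares. With few components, it projects the constituent contents onto the directions most covariant with the class code, which is why it tolerates correlated predictors.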

AUTOMATED ELECTROFACIES DETERMINATION USING MULTIVARIATE STATISTICAL ANALYSIS

  • Kim Jungwhan;Lim Jong-Se
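    • 한국석유지질학회:학술대회논문집 (Korean Society of Petroleum Geology: Conference Proceedings) / Proceedings of the 5th Conference of the Korean Society of Petroleum Geology (1998) / pp.10-14 / 1998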
  • A systematic methodology is developed for determining electrofacies from wireline log data using multivariate statistical analysis. To account for the contribution of each log and to reduce the computational dimension, the multivariate logs are transformed into a single variable through principal components analysis. The resulting principal component logs are segmented with a statistical zonation method, which improves the efficiency and quality of the interpreted results. Hierarchical cluster analysis then groups the segments into electrofacies. The optimal number of groups is chosen from the ratio of within-group variance to total variance, together with core data. The technique was applied to wells on the Korea Continental Shelf. In this field application, lithology predicted from the electrofacies classification matched the core and cutting data well, with high reliability. This electrofacies classification methodology can be used to define reservoir characteristics, which supports reservoir management.
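
The PCA and hierarchical-clustering steps can be sketched in NumPy as follows. The sketch rests on these assumptions:
- the logs are standardized before PCA;
- Ward's criterion is used as the hierarchical linkage (the abstract does not name one);
- the rows passed to `ward_clusters` would be the zone (segment) averages of the principal component log rather than raw depth samples.

`within_total_ratio` computes the within-group to total variance ratio that the abstract uses, together with core data, to choose the number of electrofacies. All function names are hypothetical.

```python
import numpy as np

def pc_logs(logs, n_pc=1):
    """Standardize the log curves (columns) and project onto the leading PCs."""
    Z = (logs - logs.mean(axis=0)) / logs.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_pc].T

def ward_clusters(X, k):
    """Naive agglomerative clustering with Ward's merge cost; returns labels 0..k-1."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ia, ib = clusters[a], clusters[b]
                na, nb = len(ia), len(ib)
                d = X[ia].mean(axis=0) - X[ib].mean(axis=0)
                cost = na * nb / (na + nb) * (d @ d)  # increase in within-group SS
                if cost < best:
                    best, pair = cost, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]       # b > a, so index a stays valid
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for lab, members in enumerate(clusters):
        labels[members] = lab
    return labels

def within_total_ratio(X, labels):
    """Ratio of pooled within-group sum of squares to total sum of squares."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(((X[labels == g] - X[labels == g].mean(axis=0)) ** 2).sum()
                 for g in np.unique(labels))
    return within / total
```

The ratio falls as the number of groups grows. A common heuristic is to pick the group count where further splits stop reducing it appreciably, then confirm against core lithology.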

Prediction of arrhythmia using multivariate time series data

  • 이민혜;노호석
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 32, No. 5 / pp.671-681 / 2019
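  • As the number of patients with arrhythmia has grown, research on predicting arrhythmia with machine learning has become active. Many previous studies predicted arrhythmia from multivariate feature data extracted from RR-interval data at a single point in time. We reasoned that the way the heart's condition changes over time may also carry important information for arrhythmia prediction. This study therefore examined the usefulness of predicting arrhythmia from multivariate time series data, obtained by extracting multivariate feature vectors at regular time intervals and stacking them. The comparison centered on the 1-nearest neighbor method and learners that ensemble it. It showed that the method based on multivariate time series data achieved better classification performance when it used a time series distance function chosen to reflect the characteristics of the series.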
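
The following sketches the 1-nearest-neighbour classifier over multivariate time series. The abstract does not name the time series distance, so dynamic time warping (DTW) with a Euclidean local cost is used here as an assumed example. Each series is an array of shape (time steps, features): feature vectors extracted at regular intervals and stacked. The ensemble learners mentioned in the abstract are not shown, and the function names are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between multivariate series a (n, d) and b (m, d).

    The local cost is the Euclidean distance between feature vectors at each step.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def one_nn_predict(train_series, train_labels, query, dist=dtw_distance):
    """1-nearest-neighbour classification under a pluggable series distance."""
    dists = [dist(s, query) for s in train_series]
    return train_labels[int(np.argmin(dists))]
```

The `dist` parameter is where the choice of distance function enters. Swapping DTW for a lock-step Euclidean distance recovers a method that ignores temporal shifts in the feature pattern.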

Empirical Bayes Posterior Odds Ratio for Heteroscedastic Classification

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / Vol. 16, No. 2 / pp.92-101 / 1987
  • Our interest is to assess the relative odds, or probability, that a multivariate observation Z belongs to one of k multivariate normal populations with unequal covariance matrices. We derived the empirical Bayes posterior odds ratio for the classification rule when the population parameters are unknown. It generalizes the posterior odds ratio suggested by Geisser (1964). Many techniques developed from the sampling viewpoint require complicated distribution theory; this classification rule does not. The proposed posterior odds ratio is compared with Geisser's posterior odds ratio through a Monte Carlo study. The empirical Bayes posterior odds ratio generally performs better than Geisser's. The advantage is most prominent when Z is high-dimensional and the training sample is small.
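
The empirical Bayes ratio itself is not reproduced here. Instead, this is a hedged sketch of the baseline it generalizes: predictive posterior membership probabilities in the spirit of Geisser's approach. Each population is assumed to have a noninformative prior proportional to $|\Sigma|^{-(p+1)/2}$, under which the predictive density of Z is multivariate t. Each population keeps its own covariance (the heteroscedastic case). The posterior odds between populations i and j are `probs[i] / probs[j]`. Each training sample must have more observations than dimensions (n > p). The function names are hypothetical.

```python
import numpy as np
from math import lgamma, log, pi

def log_predictive_density(z, sample):
    """Log predictive density of z given a training sample (n, p) from one population.

    Assumes a N_p(mu, Sigma) population with prior proportional to |Sigma|^{-(p+1)/2};
    the predictive is then multivariate t with nu = n - p degrees of freedom,
    location xbar and scale (n + 1) / (n * nu) * A, A = sum-of-squares matrix.
    """
    n, p = sample.shape
    nu = n - p
    xbar = sample.mean(axis=0)
    A = (sample - xbar).T @ (sample - xbar)
    scale = (n + 1) / (n * nu) * A
    diff = z - xbar
    delta = diff @ np.linalg.solve(scale, diff)   # squared Mahalanobis-type distance
    _, logdet = np.linalg.slogdet(scale)
    return (lgamma((nu + p) / 2) - lgamma(nu / 2) - 0.5 * p * log(nu * pi)
            - 0.5 * logdet - 0.5 * (nu + p) * log(1 + delta / nu))

def posterior_probs(z, samples, priors=None):
    """Posterior membership probabilities of z over k populations (equal priors by default)."""
    k = len(samples)
    priors = np.full(k, 1.0 / k) if priors is None else np.asarray(priors, float)
    logs = np.array([log_predictive_density(z, s) for s in samples]) + np.log(priors)
    w = np.exp(logs - logs.max())                 # stabilized exponentiation
    return w / w.sum()
```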

Non-Destructive Sorting Techniques for Viable Pepper (Capsicum annuum L.) Seeds Using Fourier Transform Near-Infrared and Raman Spectroscopy

  • Seo, Young-Wook;Ahn, Chi Kook;Lee, Hoonsoo;Park, Eunsoo;Mo, Changyeun;Cho, Byoung-Kwan
    • Journal of Biosystems Engineering / Vol. 41, No. 1 / pp.51-59 / 2016
  • Purpose: This study examined how well two spectroscopic methods, combined with multivariate classification, discriminate viable pepper seeds from non-viable ones. Methods: A classification model for viable seeds was developed using partial least squares discriminant analysis (PLS-DA) with Fourier transform near-infrared (FT-NIR) spectroscopic data in the range of 9080-4150 cm$^{-1}$ (1400-2400 nm) and Raman spectroscopic data in the range of 1800-970 cm$^{-1}$. The data were divided into 70% for calibration and 30% for validation. To reduce spectral noise and compare classification results, several preprocessing methods were applied: mean, maximum, and range normalization; multiplicative scatter correction; standard normal variate; and first and second derivatives with the Savitzky-Golay algorithm. Results: With first-derivative preprocessing, the calibration accuracy was 99% for both FT-NIR and Raman spectroscopy. The validation accuracies were 90.5% with both multiplicative scatter correction and standard normal variate, and 96.4% with the raw (non-preprocessed) data. Conclusions: These results indicate that FT-NIR and Raman spectroscopy, combined with PLS-DA and a threshold value, are valuable tools for classifying and evaluating viable pepper seeds.
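
Three of the preprocessing methods listed above can be sketched with NumPy:
- standard normal variate (SNV);
- multiplicative scatter correction (MSC) against the mean spectrum;
- a Savitzky-Golay smoothed derivative built from a local polynomial least-squares fit.

The window length, polynomial order, sample standard deviation in SNV, and the choice to return only interior points are assumptions, not the study's settings. Spectra are rows of a 2-D array, and the function names are hypothetical.

```python
import math
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mu) / sd

def msc(spectra, reference=None):
    """Multiplicative scatter correction against the mean (or a given) spectrum."""
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, float)
    out = np.empty(spectra.shape, dtype=float)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)   # fit s ~ intercept + slope * ref
        out[i] = (s - intercept) / slope
    return out

def savgol_derivative(spectra, window=11, polyorder=2, deriv=1, delta=1.0):
    """Savitzky-Golay smoothed derivative along each row (interior points only)."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    V = np.vander(offsets, polyorder + 1, increasing=True)  # columns 1, t, t^2, ...
    # Row `deriv` of the pseudo-inverse gives the least-squares polynomial coefficient
    # of t**deriv; scaling by deriv! / delta**deriv turns it into the derivative.
    coeffs = np.linalg.pinv(V)[deriv] * math.factorial(deriv) / delta ** deriv
    return np.array([np.correlate(row, coeffs, mode="valid") for row in spectra])
```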

Automatic Electrofacies Classification from Well Logs Using Multivariate Statistical Techniques

  • 임종세;김정환;강주명
    • 지구물리와물리탐사 (Geophysics and Geophysical Exploration) / Vol. 1, No. 3 / pp.170-175 / 1998
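  • This study predicts lithology by determining electrofacies from well log data using multivariate statistical techniques. The method proceeds as follows:
    (1) Descriptive statistical analysis identifies the characteristics of the well log data.
    (2) Principal component analysis examines the correlations among the multivariate logs and transforms them into new variables, the principal components, reducing the dimensionality.
    (3) The principal component logs are zoned with a statistical method. This reduces the data efficiently, improves computational efficiency, and yields high-quality interpretation results.
    (4) Hierarchical cluster analysis determines the electrofacies from the zoned principal component logs.
    (5) The optimal number of electrofacies is decided by comparing the ratio of within-cluster variance to total variance, together with core data.
  The method was applied to well log data from the Korean continental shelf. The resulting electrofacies agreed well with the lithologic divisions obtained from drill core and cutting analyses. Because it provides reliable, quantitative evaluation of reservoir characteristics, this approach can be a useful tool in oil field development and production planning.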
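
The abstract does not say which statistical zonation method was used. As an assumed stand-in, the sketch below performs exact least-squares segmentation of a principal component log by dynamic programming. For a given number of zones, it chooses the boundaries that minimize the total within-zone sum of squares. The zone averages would then serve as inputs to hierarchical clustering. The function name is hypothetical.

```python
import numpy as np

def optimal_zonation(log, n_zones):
    """Split a 1-D log into n_zones contiguous zones minimizing within-zone SS.

    Exact dynamic programming (least-squares segmentation); returns the
    start index of each zone in ascending order.
    """
    x = np.asarray(log, dtype=float)
    n = len(x)
    c1 = np.concatenate([[0.0], np.cumsum(x)])        # prefix sums
    c2 = np.concatenate([[0.0], np.cumsum(x ** 2)])   # prefix sums of squares

    def sse(i, j):  # within-zone sum of squares of x[i:j]
        s = c1[j] - c1[i]
        return (c2[j] - c2[i]) - s * s / (j - i)

    cost = np.full((n_zones + 1, n + 1), np.inf)
    back = np.zeros((n_zones + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_zones + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):                 # last zone is x[i:j]
                c = cost[k - 1, i] + sse(i, j)
                if c < cost[k, j]:
                    cost[k, j], back[k, j] = c, i
    # Trace the zone boundaries back from the end of the log.
    starts, j = [], n
    for k in range(n_zones, 0, -1):
        j = int(back[k, j])
        starts.append(j)
    return sorted(starts)
```

The cost is cubic in the number of samples, which is acceptable for a single well. The number of zones itself still has to be chosen, for example by watching how the minimized within-zone sum of squares falls as zones are added.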

A Study on Forest Land Classification Using Multivariate Statistical Methods: A Case Study at Mt. Kwanak

  • 정순오
    • 한국조경학회지 (Journal of the Korean Institute of Landscape Architecture) / Vol. 13, No. 1 / pp.43-66 / 1985
  • The accelerating expansion of national land development in recent years means Korea needs sound, rational public policies on the conservation and use of forest land and other natural resources. Unfortunately, no systematic planning system exists to support these needs. Forest land use planning generally requires suitability analysis based on an efficient land classification system. The goal of this study was to classify forest land using multivariate statistical methods. A case study was carried out in the winter of 1983 on the mountainous area above 100 m elevation at Mt. Kwanak in Anyang-city, Kyung-gi-do (province). The study area covered 19.80 km$^2$ and was divided into 1,383 Operational Taxonomic Units (OTUs) by a 120 m $\times$ 120 m grid. Fourteen descriptors were identified and quantified for each OTU from existing national land data: elevation, slope, aspect, terrain form, geologic material, surface soil permeability, topsoil type, depth of the solum, soil acidity, forest cover type, stand size class, stand age class, stand density class, and simple forest soil capability class. A FORTRAN IV program was written for map data input and output, and the statistical packages SPSS and BMD were used for the multivariate statistical analysis. The fourteen variables were analyzed to characterize their frequency distributions and to estimate the correlation coefficients among them. Principal component analysis was performed to find the dimensions of forest land characteristics, and factor scores were used to select appropriate OTU samples throughout the study area. To develop the forest land classes based on 102 surrogates, cluster and discriminant analyses of the principal descriptor variable matrix were undertaken.
Results obtained through this series of multivariate statistical analyses were as follows: 1) Principal component analysis proved to be a useful tool for data selection and for identifying the principal descriptor variables, which represented the characteristics of forest land and facilitated the selection of samples.
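
The discriminant step can be illustrated with a linear discriminant rule. The rule uses a pooled within-class covariance and is trained on OTUs labelled by a prior cluster analysis, for example on principal component scores. This is a generic sketch, not the SPSS/BMD procedure used in the study. Using class priors equal to the sample proportions is an assumption, and the function names are hypothetical.

```python
import numpy as np

def lda_fit(X, labels):
    """Fit a linear discriminant rule with a pooled within-class covariance."""
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    resid = np.vstack([X[labels == c] - means[i] for i, c in enumerate(classes)])
    pooled = resid.T @ resid / (len(X) - len(classes))   # pooled covariance
    inv = np.linalg.inv(pooled)
    priors = np.array([(labels == c).mean() for c in classes])
    return classes, means, inv, priors

def lda_predict(model, X):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, inv, priors = model
    # score_c(x) = x' S^-1 m_c - 0.5 m_c' S^-1 m_c + log(prior_c)
    scores = (X @ inv @ means.T
              - 0.5 * np.sum(means @ inv * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]
```

In a workflow like the one described, the discriminant rule checks how cleanly the cluster-derived classes separate. It can also assign OTUs that were not part of the clustered sample.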

Bootstrap Confidence Intervals of Classification Error Rate for a Block of Missing Observations

  • Chung, Hie-Choon
    • Communications for Statistical Applications and Methods / Vol. 16, No. 4 / pp.675-686 / 2009
  • In this paper, we assume that there are two distinct populations, both multivariate normal with a common covariance matrix. We also assume that the two populations are equally likely and that the costs of misclassification are equal. The classification rule depends on whether the training samples include missing values. We consider bootstrap confidence intervals for the classification error rate when a block of observations is missing.
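
For the complete-data case only, a percentile bootstrap interval can be built around the plug-in estimate Φ(-D/2) of the optimal error rate. Here D is the sample Mahalanobis distance between the two training means. Under the stated assumptions (common covariance, equal priors, equal costs), Φ(-Δ/2) is the error rate of the optimal rule. The sketch does not reproduce the handling of a missing block of observations, which is the paper's actual subject. The estimator, the resampling scheme, and the function names are assumptions, and the code requires at least two variables.

```python
import numpy as np
from math import erfc, sqrt

def plugin_error_rate(x1, x2):
    """Plug-in optimal error rate Phi(-D/2) for two normal populations with a
    common covariance, equal priors and equal misclassification costs (p >= 2)."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    n1, n2 = len(x1), len(x2)
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False) +
              (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    d = m1 - m2
    D = sqrt(d @ np.linalg.solve(pooled, d))   # sample Mahalanobis distance
    return 0.5 * erfc(D / 2 / sqrt(2))         # Phi(-D/2)

def bootstrap_ci(x1, x2, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling each training sample separately."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        b1 = x1[rng.integers(0, len(x1), len(x1))]
        b2 = x2[rng.integers(0, len(x2), len(x2))]
        stats[b] = plugin_error_rate(b1, b2)
    alpha = (1 - level) / 2
    return np.quantile(stats, [alpha, 1 - alpha])
```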