• Title/Summary/Keyword: high-dimensional data (고차원 자료)

Random projection ensemble adaptive nearest neighbor classification (랜덤 투영 앙상블 기법을 활용한 적응 최근접 이웃 판별분류기법)

  • Kang, Jongkyeong; Jhun, Myoungshic
    • The Korean Journal of Applied Statistics, v.34 no.3, pp.401-410, 2021
  • Although popular in discriminant classification analysis, k-nearest neighbor classification methods are limited in that they consider only a fixed number of neighbors and therefore do not reflect the local characteristics of the data. The adaptive nearest neighbor method was developed to select the number of neighbors according to the local structure of the data. In the analysis of high-dimensional data, it is common to perform dimension reduction, such as random projection, before applying k-nearest neighbor classification. Recently, an ensemble technique has been developed that carefully combines the results of classifiers built on such random projections and makes final assignments by voting. In this paper, we propose a novel discriminant classification technique that combines the adaptive nearest neighbor method with the random projection ensemble technique for the analysis of high-dimensional data. Through simulations and real-world data analyses, we confirm that the proposed method outperforms previously developed methods in classification accuracy.
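
The voting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it uses a fixed k rather than the adaptive rule, draws plain Gaussian projections without the careful selection step the abstract mentions, and every function name and parameter value (n_proj, dim_out, k, the toy data) is an assumption made for illustration.

```python
import math
import random
from collections import Counter


def random_projection(dim_in, dim_out, rng):
    """Gaussian random projection matrix with dim_out rows and dim_in columns."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]


def project(matrix, x):
    """Project one point x into the lower-dimensional space."""
    return [sum(a * v for a, v in zip(row, x)) for row in matrix]


def knn_predict(train_x, train_y, query, k):
    """Plain k-nearest-neighbor vote with a fixed k (not the adaptive rule)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def rp_ensemble_predict(train_x, train_y, query, n_proj=11, dim_out=5, k=5, seed=0):
    """Majority vote over kNN classifiers, each fit on its own random projection."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_proj):
        matrix = random_projection(len(query), dim_out, rng)
        low_x = [project(matrix, x) for x in train_x]
        votes.append(knn_predict(low_x, train_y, project(matrix, query), k))
    return Counter(votes).most_common(1)[0][0]


def make_data(n_per_class, dim, shift, rng):
    """Two Gaussian classes whose means differ by `shift` in every coordinate."""
    xs, ys = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            xs.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            ys.append(label)
    return xs, ys


# Hypothetical toy data: 50-dimensional points, two well-separated classes.
rng = random.Random(1)
train_x, train_y = make_data(30, 50, 3.0, rng)
test_x, test_y = make_data(10, 50, 3.0, rng)
predictions = [rp_ensemble_predict(train_x, train_y, q) for q in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"ensemble accuracy: {accuracy:.2f}")
```

Using an odd number of projections and an odd k avoids ties in this two-class setting; a multi-class version would need an explicit tie-breaking rule.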

An investigation on the hyper-dimensional figure by the principle of the permanence of equivalent forms (형식불역의 원리를 통한 고차원 도형의 탐구)

  • 송상헌
    • Journal of Educational Research in Mathematics, v.13 no.4, pp.495-506, 2003
  • In this study, I investigated some properties of special hyper-dimensional figures constructed by the principle of the permanence of equivalent forms. I proposed two definitions for constructing n-dimensional figures: a cone type (simplex) and a pillar type (hypercube). We can explain that there exist only six 4-dimensional regular polytopes, just as there exist only five regular polyhedra. There are many hyper-dimensional figures, and they all satisfy a sufficient condition for the general Euler characteristic to hold. In particular, we could verify that the simplest cone type and pillar type fit Pascal's triangle and the Hasse diagram, respectively.
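
The link to Pascal's triangle and the Euler characteristic can be checked numerically. The sketch below uses the standard face-count formulas for the n-simplex and the n-cube, which are well-known facts rather than anything taken from the paper; the function names are hypothetical. The simplex counts are a row of Pascal's triangle with its leading 1 removed, and the alternating sum of face counts, with the figure itself counted as its own n-dimensional face, equals 1 in every dimension.

```python
from math import comb


def simplex_faces(n):
    """Number of k-dimensional faces of the n-simplex, for k = 0..n."""
    return [comb(n + 1, k + 1) for k in range(n + 1)]


def hypercube_faces(n):
    """Number of k-dimensional faces of the n-cube, for k = 0..n."""
    return [2 ** (n - k) * comb(n, k) for k in range(n + 1)]


def euler_characteristic(face_counts):
    """Alternating sum f_0 - f_1 + f_2 - ..., the figure itself included."""
    return sum((-1) ** k * f for k, f in enumerate(face_counts))


for n in range(1, 6):
    s, c = simplex_faces(n), hypercube_faces(n)
    print(n, s, euler_characteristic(s), c, euler_characteristic(c))
```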

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data (고차원 (유전자 발현) 자료에 대한 군집 타당성분석 기법의 성능 비교)

  • Jeong, Yun-Kyoung; Baek, Jang-Sun
    • The Korean Journal of Applied Statistics, v.20 no.1, pp.167-181, 2007
  • Many clustering algorithms and cluster validation techniques for high-dimensional gene expression data have been suggested. However, these cluster validation techniques have seldom been evaluated. In this paper, we compare various cluster validity indices on low-dimensional simulated data and real gene expression data, and find that among the internal measures Dunn's index is the most effective and robust, the Silhouette index is next, and the Davies-Bouldin index ranks last. Among the external measures, the Jaccard index is much more effective than the Goodman-Kruskal index and the adjusted Rand index.
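
As an illustration of an internal validity measure, here is a minimal sketch of Dunn's index in its classical form: the smallest distance between points in different clusters divided by the largest within-cluster diameter. Several variants of the index exist and the abstract does not say which one the paper uses, so the definition, the function name, and the toy points are all assumptions.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    groups = list(clusters.values())
    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (dist(p, q) for g in groups for p, q in combinations(g, 2)), default=0.0
    )
    # Smallest distance between two points that lie in different clusters.
    min_separation = min(
        dist(p, q) for a, b in combinations(groups, 2) for p in a for q in b
    )
    # Note: raises ZeroDivisionError if every cluster is a single point.
    return min_separation / max_diameter


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])  # clusters split left/right
bad = dunn_index(points, [0, 1, 0, 1])   # clusters split bottom/top
print(good, bad)
```

Larger values indicate compact, well-separated clusters, so the left/right split scores higher than the bottom/top split on these points.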

Efficient variable selection method using conditional mutual information (조건부 상호정보를 이용한 분류분석에서의 변수선택)

  • Ahn, Chi Kyung; Kim, Donguk
    • Journal of the Korean Data and Information Science Society, v.25 no.5, pp.1079-1094, 2014
  • In this paper, we study efficient gene selection methods based on conditional mutual information. We suggest gene selection methods that estimate conditional mutual information semiparametrically, using the multivariate normal distribution and the Edgeworth approximation. We compare the suggested methods with other methods, such as the mutual information filter, SVM-RFE, and the gene selection method of Cai et al. (2009) (MIGS-original), in SVM classification. These experiments show that the gene selection methods using conditional mutual information based on semiparametric methods perform better than the mutual information filter. Furthermore, they take far less computing time than the gene selection method of Cai et al. (2009) while achieving similar performance.
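
To show why conditional mutual information is useful for variable selection, the following sketch computes a plug-in estimate of I(X; Y | Z) for discrete samples. This is a swapped-in, simpler estimator, not the semiparametric (multivariate normal and Edgeworth) estimators described in the abstract; the function name and the toy variables are hypothetical.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from discrete samples."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    total = 0.0
    for (x, y, z), count in n_xyz.items():
        # p(x,y,z) * log[p(z) p(x,y,z) / (p(x,z) p(y,z))]; the 1/n factors cancel in the ratio.
        total += (count / n) * log(n_z[z] * count / (n_xz[(x, z)] * n_yz[(y, z)]))
    return total


# Hypothetical toy setting: `selected` is an already chosen variable, `label` is the class.
selected = [0, 0, 1, 1]
label = [0, 1, 0, 1]
redundant = conditional_mutual_information(selected, label, selected)  # candidate copies `selected`
informative = conditional_mutual_information(label, label, selected)   # candidate copies the class
print(redundant, informative)
```

A candidate that merely duplicates an already selected variable contributes nothing once that variable is conditioned on, while a candidate carrying new information about the class keeps a positive score.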

Spectral clustering: summary and recent research issues (스펙트럴 클러스터링 - 요약 및 최근 연구동향)

  • Jeong, Sanghun; Bae, Suhyeon; Kim, Choongrak
    • The Korean Journal of Applied Statistics, v.33 no.2, pp.115-122, 2020
  • K-means clustering uses a spherical or elliptical metric to group data points; however, it does not work well for non-convex data such as concentric circles. Spectral clustering, which is based on graph theory, is a generalized and robust technique for handling non-standard types of data such as non-convex data. Results obtained by spectral clustering often outperform those of traditional clustering methods such as K-means. In this paper, we review spectral clustering and discuss important issues such as determining the number of clusters K, estimating the scale parameter in the adjacency of two points, and dimension reduction when clustering high-dimensional data.

Comparison of model selection criteria in graphical LASSO (그래프 LASSO에서 모형선택기준의 비교)

  • Ahn, Hyeongseok; Park, Changyi
    • Journal of the Korean Data and Information Science Society, v.25 no.4, pp.881-891, 2014
  • Graphical models can be used as an intuitive tool for modeling a complex stochastic system with a large number of interrelated variables, because the conditional independence between random variables can be visualized as a network. The graphical least absolute shrinkage and selection operator (LASSO) is considered effective in avoiding overfitting when estimating Gaussian graphical models for high-dimensional data. In this paper, we consider the model selection problem in graphical LASSO. In particular, we compare various model selection criteria via simulations and analyze a real financial data set.

Value at Risk calculation using sparse vine copula models (성근 바인 코풀라 모형을 이용한 고차원 금융 자료의 VaR 추정)

  • An, Kwangjoon; Baek, Changryong
    • The Korean Journal of Applied Statistics, v.34 no.6, pp.875-887, 2021
  • Value at Risk (VaR) is the most popular measure of market risk. In this paper, we consider VaR estimation for a portfolio consisting of a variety of assets, based on a multivariate copula model known as the vine copula. In particular, we consider the sparse vine copula, which penalizes an excessive number of parameters. A simulation study shows that sparsity indeed improves out-of-sample forecasting of VaR. An empirical analysis of 60 KOSPI stocks over the last 5 years also demonstrates that the sparse vine copula outperforms the regular copula model.

A review on the t-distributed stochastic neighbors embedding (t-SNE에 대한 요약)

  • Kipoong Kim; Choongrak Kim
    • The Korean Journal of Applied Statistics, v.36 no.2, pp.167-173, 2023
  • This paper investigates several methods of visualizing high-dimensional data in a low-dimensional space. First, principal component analysis and multidimensional scaling are briefly introduced as linear approaches, and then kernel principal component analysis, the self-organizing map, locally linear embedding, Isomap, Laplacian eigenmaps, and local multidimensional scaling are introduced as nonlinear approaches. In particular, t-SNE, which is widely used but relatively unfamiliar in the field of statistics, is described in more detail. We also present a simple example for several methods, including t-SNE. Finally, we review several recent studies that point out the limitations of t-SNE and discuss the future research problems they present.

High-dimensional change point detection using MOSUM-based sparse projection (MOSUM 성근 프로젝션을 이용한 고차원 시계열의 변화점 추정)

  • Kim, Moonjung; Baek, Changryong
    • The Korean Journal of Applied Statistics, v.35 no.1, pp.63-75, 2022
  • This paper proposes the so-called MOSUM-based sparse projection method for change point detection in high-dimensional time series. Our method is inspired by Wang and Samworth (2018) but improves on it in two ways. First, it finds all change points at once, which minimizes sequential error. Second, it is localized, which makes it more robust to mean changes that offset each other. We also propose data-driven threshold selection using the block wild bootstrap. A comprehensive simulation study shows that our method performs reasonably well in finite samples. We also apply our method to the stock prices constituting the S&P 500 index and find four change points in the most recent 6 years.
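
The moving-sum (MOSUM) idea behind the abstract above can be sketched for a single series. This is only the basic univariate statistic, assuming unit noise variance; it omits the sparse projection step and the bootstrap threshold that the paper proposes, and the function name, bandwidth, and toy series are assumptions.

```python
from math import sqrt


def mosum(series, bandwidth):
    """MOSUM statistic |sum of next G values - sum of previous G values| / sqrt(2G)."""
    g = bandwidth
    stats = {}
    for k in range(g, len(series) - g + 1):
        right = sum(series[k:k + g])  # window starting at index k
        left = sum(series[k - g:k])   # window ending just before index k
        stats[k] = abs(right - left) / sqrt(2 * g)
    return stats


series = [0.0] * 50 + [1.0] * 50  # noiseless mean shift starting at index 50
stats = mosum(series, bandwidth=10)
estimate = max(stats, key=stats.get)
print(estimate, round(stats[estimate], 3))
```

Because each statistic compares only the two windows adjacent to k, a change far from k does not affect it, which is the sense in which the approach is localized.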

Banded vector heterogeneous autoregression models (밴드구조 VHAR 모형)

  • Sangtae Kim; Changryong Baek
    • The Korean Journal of Applied Statistics, v.36 no.6, pp.529-545, 2023
  • This paper introduces the Banded-VHAR model, which is suitable for high-dimensional long-memory time series with a band structure. In the Banded-VHAR model, nonignorable correlations exist only between adjacent dimensions because of features of the data, for example, geographical information. A row-wise estimation method is adopted for fast computation. In addition, two methods, namely the BIC and ratio methods, are proposed to estimate the width of the band. We demonstrate the asymptotic consistency of the proposed estimation methods through a simulation study. Real data applications to PM2.5 and apartment trading volume show that the Banded-VHAR model outperforms the traditional sparse VHAR model in forecasting and yields model coefficients that are easy to interpret.