• Title/Summary/Keyword: 고차원 대용량 자료 (high-dimensional large-scale data)

A study on high dimensional large-scale data visualization (고차원 대용량 자료의 시각화에 대한 고찰)

  • Lee, Eun-Kyung; Hwang, Nayoung; Lee, Yoondong
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1061-1075 / 2016
  • In this paper, we discuss various methods for visualizing high-dimensional large-scale data and review some issues associated with visualizing this type of data. High-dimensional data can be presented in a 2-dimensional space using a few selected important variables. More variables can be shown by mapping them to various aesthetic attributes of a graphic, or the projection pursuit method can be used to find an interesting low-dimensional view. For large-scale data, we discuss jittering and alpha blending, two methods that address the problem of overlapping points. We also review the R packages tabplot and scagnostics, as well as other R packages for building interactive web applications with visualization.
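The two overplotting remedies named above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper or from the tabplot or scagnostics packages: `jitter` and `stacked_opacity` are hypothetical helper names, and the opacity formula assumes standard "over" compositing of identical semi-transparent markers on a blank background.

```python
import random

def jitter(values, width, rng):
    """Add uniform noise in [-width/2, width/2] so tied points no longer coincide."""
    return [v + rng.uniform(-width / 2, width / 2) for v in values]

def stacked_opacity(alpha, k):
    """Opacity where k markers of opacity alpha overlap ('over' compositing).

    Each marker lets (1 - alpha) of the background through, so k overlapping
    markers give 1 - (1 - alpha)**k: denser regions look darker instead of
    all saturating to the same solid colour.
    """
    return 1 - (1 - alpha) ** k

rng = random.Random(0)
x = [1, 1, 1, 2, 2]               # hypothetical discrete data with many ties
print(jitter(x, 0.2, rng))        # nearby but distinct positions
for k in (1, 5, 20):
    print(k, round(stacked_opacity(0.1, k), 3))
```

With a small alpha such as 0.1, a single point is faint while a pile of 20 points is nearly opaque, which is why alpha blending reveals density that plain overplotting hides.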

Current trends in high dimensional massive data analysis (고차원 대용량 자료분석의 현재 동향)

  • Jang, Woncheol; Kim, Gwangsu; Kim, Joungyoun
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.999-1005 / 2016
  • The advent of big data brings the opportunity to answer many open scientific questions but also presents some interesting challenges. The main features of contemporary datasets are high dimensionality and massive sample size. In this paper, we give an overview of the major challenges caused by these two features: (i) noise accumulation and spurious correlations in high-dimensional data; (ii) computational scalability for massive data. We also present applications of big data in various fields, including disaster forecasting, digital humanities, and sabermetrics.
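Spurious correlation in high dimensions can be illustrated with a small simulation. This is a hypothetical example, not taken from the paper: the response and all predictors are generated independently, so every population correlation is zero. Because the screened predictor sets are nested, the maximum absolute sample correlation can only grow or stay the same as more predictors are screened, and with many predictors and few observations it is typically far from zero. The sample size `n = 50` and predictor count `p = 2000` are arbitrary choices.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def max_abs_corr(y, columns):
    """Largest |sample correlation| between y and any candidate predictor."""
    return max(abs(pearson(y, col)) for col in columns)

rng = random.Random(42)
n, p = 50, 2000                        # small sample, many predictors
y = [rng.gauss(0, 1) for _ in range(n)]
# Every predictor is independent of y, so every true correlation is 0.
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]

for k in (10, 100, 2000):
    print(k, round(max_abs_corr(y, X[:k]), 3))
```

Any variable-screening rule that picks "the most correlated predictors" will therefore select some purely noise variables once p is large relative to n.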

Extension of SHORE storage system for multimedia applications (멀티미디어 응용을 위한 SHORE 하부저장 시스템의 확장)

  • 정재욱; 장재욱
    • Proceedings of the Korean Information Science Society Conference / 1999.10a / pp.6-8 / 1999
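  • With the rapid development of computer communication technology, a storage subsystem is needed that can efficiently store and manage large volumes of multimedia data composed of diverse media such as still images, audio, and video. Content-based retrieval of such multimedia data requires both text-based retrieval and retrieval based on feature vectors such as color or texture. In this paper, we extend SHORE, a persistent object system developed at the University of Wisconsin, to implement a storage subsystem for multimedia applications. We implemented an inverted file structure for text-based retrieval and integrated the X-tree for retrieving high-dimensional feature vectors.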
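An inverted file maps each term to the list of objects that contain it, so a text query only has to touch the lists for its own terms. The following is a minimal in-memory sketch in Python, not the SHORE-based implementation described in the paper: the whitespace tokenizer, the AND-query semantics, and the toy captions are simplifying assumptions, and the X-tree index for feature vectors is not shown.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_all(index, query):
    """Return ids of documents containing every query term (AND query)."""
    postings = [set(index.get(t, ())) for t in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical text annotations attached to multimedia objects.
docs = {
    1: "sunset beach image",
    2: "beach audio clip",
    3: "city sunset video",
}
index = build_inverted_index(docs)
print(index["sunset"])                    # [1, 3]
print(search_all(index, "sunset beach"))  # [1]
```

A disk-based system would store each posting list persistently and intersect them in sorted order rather than building Python sets, but the lookup logic is the same.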

Estimating GARCH models using kernel machine learning (커널기계 기법을 이용한 일반화 이분산자기회귀모형 추정)

  • Hwang, Chang-Ha; Shin, Sa-Im
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.419-425 / 2010
  • Kernel machine learning is gaining popularity for analyzing large or high-dimensional nonlinear data. We use this technique to estimate a GARCH model for predicting the conditional volatility of stock market returns. GARCH models are usually estimated by maximum likelihood (ML), assuming that the data are normally distributed. In this paper, we show that GARCH models can be estimated using kernel machine learning, and that the kernel machine has higher predictive ability than ML methods and the support vector machine when estimating the volatility of fat-tailed financial time series data.
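For reference, the standard GARCH(1,1) conditional-variance recursion and the Gaussian log-likelihood that ML estimation maximizes can be sketched as below. This is only the textbook baseline that the abstract compares against; it does not reproduce the paper's kernel machine estimator. The function names, the (1,1) order, and starting the recursion at the unconditional variance are assumptions of this sketch.

```python
import math

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model:
    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1].
    The recursion starts at the unconditional variance omega / (1 - alpha - beta),
    which requires alpha + beta < 1.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    if not returns:
        return []
    sigma2 = [omega / (1 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2

def gaussian_loglik(returns, sigma2):
    """Gaussian log-likelihood of zero-mean returns given conditional variances."""
    return -0.5 * sum(math.log(2 * math.pi * s) + r ** 2 / s
                      for r, s in zip(returns, sigma2))

returns = [0.01, -0.03, 0.02]             # hypothetical daily returns
s2 = garch11_variance(returns, omega=1e-6, alpha=0.1, beta=0.85)
print(s2)
print(gaussian_loglik(returns, s2))
```

ML estimation would search over (omega, alpha, beta) to maximize `gaussian_loglik`; the normality assumption built into that objective is exactly what the abstract identifies as a weakness for fat-tailed returns.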

Variational Autoencoder Based Dimension Reduction and Clustering for Single-Cell RNA-seq Gene Expression (단일세포 RNA-SEQ의 유전자 발현 군집화를 위한 변이 자동인코더 기반의 차원감소와 군집화)

  • Chi, Sang-Mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.11 / pp.1512-1518 / 2021
  • Because single-cell RNA sequencing provides the expression profile of each individual cell, it offers higher cellular resolution than traditional bulk RNA sequencing. Clustering analysis is generally performed on single-cell RNA sequencing data to identify cell types and understand higher-level biological processes. To process high-dimensional single-cell RNA sequencing data effectively for clustering, this paper uses a variational autoencoder to map the high-dimensional data space into a lower-dimensional latent space, with the expectation that this latent space yields more accurate clustering results. By clustering the features in the transformed latent space, we compare the performance of various classical clustering methods on single-cell RNA sequencing data. Experimental results show that the proposed framework outperforms many state-of-the-art methods under various clustering performance metrics.
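What distinguishes a variational autoencoder from a plain autoencoder is that the encoder outputs a mean and log-variance for each latent dimension, a latent vector is sampled with the reparameterization trick, and training adds a KL penalty toward a standard normal prior to the reconstruction loss. A minimal plain-Python sketch of those two pieces follows. The encoder and decoder networks, the reconstruction loss, and the clustering step (for example k-means on the latent means) are omitted, and the `mu`/`logvar` values are hypothetical; the abstract does not specify the paper's architecture.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m ** 2 + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, logvar))

# Hypothetical encoder output for one cell, with a 2-dimensional latent space.
mu, logvar = [0.5, -1.0], [0.0, -2.0]
z = reparameterize(mu, logvar, random.Random(0))
print(z)                                  # a latent sample for this cell
print(kl_to_standard_normal(mu, logvar))  # KL term added to the reconstruction loss
```

Writing the sample as a deterministic function of `mu`, `logvar`, and independent noise is what lets gradients flow through the sampling step during training.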

Variable selection with quantile regression tree (분위수 회귀나무를 이용한 변수선택 방법 연구)

  • Chang, Youngjae
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1095-1106 / 2016
  • The quantile regression method proposed by Koenker et al. (1978) focuses on the conditional quantiles of the response given the independent variables and analyzes the relationship between the response variable and the independent variables at a given quantile. Because quantile regression coefficients are estimated by linear programming, model fitting can become difficult for large datasets. Dimension reduction (or variable selection) is therefore a useful approach to quantile regression on large datasets. In this paper, regression tree methods are applied to variable selection for quantile regression. Real data on Korea Baseball Organization (KBO) players are analyzed with this regression-tree-based variable selection approach. The analysis shows that a few important variables are selected, and these variables are meaningful for the given quantiles of the players' salary data.
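Quantile regression replaces squared error with the asymmetric check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}). The sketch below, a hypothetical illustration rather than the paper's tree-based procedure, shows that minimizing this loss over a single constant recovers a tau-quantile of the sample. With covariates, minimizing the same loss over linear predictors is the linear program the abstract refers to. Restricting the search to observed values relies on the objective being piecewise linear with kinks only at the data points.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def best_constant(y, tau):
    """Constant c minimizing sum_i rho_tau(y_i - c), searched over the observed values.

    The objective is piecewise linear and convex in c with kinks at the data
    points, so a minimizer always lies at one of the y_i.
    """
    return min(sorted(y), key=lambda c: sum(check_loss(yi - c, tau) for yi in y))

y = list(range(1, 11))                  # hypothetical sample 1, 2, ..., 10
print(best_constant(y, 0.75))           # 8, the 0.75 sample quantile
print(best_constant([3, 1, 4, 1, 5], 0.5))  # 3, the median
```

Positive residuals are weighted by tau and negative ones by 1 - tau, so a high tau pushes the fitted value toward the upper part of the distribution, for example toward the top of the salary distribution in the KBO application.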