Search | Korea Science

Lee, Eun-Kyung;Hwang, Nayoung;Lee, Yoondong
The Korean Journal of Applied Statistics / v.29 no.6 / pp.1061-1075 / 2016
In this paper, we discuss various methods for visualizing high-dimensional, large-scale data and review some issues associated with visualizing this type of data. High-dimensional data can be presented in a 2-dimensional space using a few selected important variables. We can visualize more variables by mapping them to aesthetic attributes of the graphic, or use the projection pursuit method to find an interesting low-dimensional view. For large-scale data, we discuss jittering and alpha blending, which address the problem of overlapping points. We also review the R packages tabplot and scagnostics, along with other R packages for building interactive web applications with visualization.
https://doi.org/10.5351/KJAS.2016.29.6.1061
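
The two overplotting remedies named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper, which works in R; the function names `jitter` and `stacked_opacity`, the noise width, and the sample data are assumptions made for illustration. Jittering adds small random noise so that coincident points separate, and alpha blending draws each point semi-transparently so that dense regions appear darker (under standard compositing of same-colored points, k overlapping points of opacity a give a combined opacity of 1 - (1 - a)^k).

```python
import random


def jitter(values, amount=0.2, seed=0):
    """Add uniform noise in [-amount, amount] to each value (hypothetical helper)."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


def stacked_opacity(alpha, k):
    """Combined opacity where k same-colored points of opacity alpha overlap."""
    return 1.0 - (1.0 - alpha) ** k


# Ten observations recorded at the same value would be drawn on top of each other.
raw = [3] * 10
spread = jitter(raw)

# Number of distinct plotting positions before and after jittering.
print(len(set(raw)), len(set(spread)))

# Opacity of a single point versus a stack of ten points at alpha = 0.1.
print(round(stacked_opacity(0.1, 1), 3), round(stacked_opacity(0.1, 10), 3))
```

In a plotting library the same ideas usually appear as a jitter option and an alpha (transparency) parameter on the point layer.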

Jang, Woncheol;Kim, Gwangsu;Kim, Joungyoun
The Korean Journal of Applied Statistics / v.29 no.6 / pp.999-1005 / 2016
The advent of big data brings the opportunity to answer many open scientific questions but also presents some interesting challenges. The main features of contemporary datasets are high dimensionality and massive sample size. In this paper, we give an overview of the major challenges caused by these two features: (i) noise accumulation and spurious correlations in high-dimensional data; (ii) computational scalability for massive data. We also describe applications of big data in various fields, including disaster forecasting, digital humanities, and sabermetrics.
https://doi.org/10.5351/KJAS.2016.29.6.999
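
Spurious correlation in high dimensions can be seen in a small simulation. The sketch below is an illustration under assumed settings (50 observations, standard normal noise, the helper names `pearson` and `max_abs_corr`), not the authors' experiment. Every predictor is generated independently of the response, so all true correlations are zero, yet the largest sample correlation grows as more predictors are examined.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_abs_corr(n, p, seed=0):
    """Largest |correlation| between a noise response and p independent noise predictors."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        x = [rng.gauss(0, 1) for _ in range(n)]
        best = max(best, abs(pearson(x, y)))
    return best


# Same seed, so the first 10 predictors are shared between the two runs.
small = max_abs_corr(n=50, p=10)
large = max_abs_corr(n=50, p=2000)
print(f"max |r| with 10 predictors:   {small:.3f}")
print(f"max |r| with 2000 predictors: {large:.3f}")
```

Because the two runs share a seed, the 2000-predictor search includes the 10-predictor search, so its maximum can only be equal or larger. A variable-selection step that picks the most correlated predictor would therefore select pure noise with an apparently sizeable correlation.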

정재욱;장재욱
Proceedings of the Korean Information Science Society Conference / 1999.10a / pp.6-8 / 1999
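With the rapid development of computer and communication technology, there is a need for an underlying storage system that can efficiently store and manage large volumes of multimedia data composed of various media such as still images, audio, and video. Content-based retrieval of such multimedia data requires both text-based retrieval and retrieval based on feature vectors such as color or texture. In this paper, we extend SHORE, a persistent object system developed at the University of Wisconsin, to implement an underlying storage system for multimedia applications. We implemented an inverted file structure for text-based retrieval and integrated the X-tree for retrieval of high-dimensional feature vectors.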
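
The inverted file structure mentioned in this abstract maps each term to the set of documents that contain it. The following is a minimal in-memory sketch of that idea, not the SHORE-based implementation described in the paper; the function names, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical captions attached to multimedia objects.
docs = {
    1: "sunset over the sea",
    2: "city skyline at sunset",
    3: "sea birds in flight",
}
index = build_inverted_index(docs)
print(sorted(search(index, "sunset")))      # documents mentioning "sunset"
print(sorted(search(index, "sunset sea")))  # documents mentioning both terms
```

A lookup touches only the posting sets of the query terms rather than scanning every document, which is the reason for using this structure for text-based retrieval.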

Hwang, Chang-Ha;Shin, Sa-Im
Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.419-425 / 2010
Kernel machine learning is gaining popularity for analyzing large or high-dimensional nonlinear data. We use this technique to estimate a GARCH model for predicting the conditional volatility of stock market returns. GARCH models are usually estimated using maximum likelihood (ML) procedures, assuming that the data are normally distributed. In this paper, we show that GARCH models can be estimated using kernel machine learning and that the kernel machine has higher predictive ability than ML methods and the support vector machine when estimating the volatility of financial time series data with fat tails.

Chi, Sang-Mun
Journal of the Korea Institute of Information and Communication Engineering / v.25 no.11 / pp.1512-1518 / 2021
Since single cell RNA sequencing provides the expression profiles of individual cells, it provides higher cellular differential resolution than traditional bulk RNA sequencing. Using these single cell RNA sequencing data, clustering analysis is generally conducted to find cell types and understand high-level biological processes. In order to effectively process the high-dimensional single cell RNA sequencing data for the clustering analysis, this paper uses a variational autoencoder to transform a high-dimensional data space into a lower-dimensional latent space, expecting to produce a latent space that can give more accurate clustering results. By clustering the features in the transformed latent space, we compare the performance of various classical clustering methods for single cell RNA sequencing data. Experimental results demonstrate that the proposed framework outperforms many state-of-the-art methods under various clustering performance metrics.
https://doi.org/10.6109/jkiice.2021.25.11.1512

Chang, Youngjae
The Korean Journal of Applied Statistics / v.29 no.6 / pp.1095-1106 / 2016
The quantile regression method proposed by Koenker et al. (1978) focuses on conditional quantiles given the independent variables, and analyzes the relationship between the response variable and the independent variables at a given quantile. Because the quantile regression coefficients are estimated by linear programming, model fitting can become difficult when large data sets are analyzed. Therefore, dimension reduction (or variable selection) could be a good solution for the quantile regression of large data sets. In this paper, regression tree methods are applied to variable selection for quantile regression. Real data on Korea Baseball Organization (KBO) players are analyzed following the variable selection approach based on the regression tree. The analysis results show that a few important variables are selected, and that these are also meaningful for the given quantiles of the players' salary data.
https://doi.org/10.5351/KJAS.2016.29.6.1095
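
Quantile regression minimizes the check (pinball) loss, which weights positive residuals by tau and negative residuals by 1 - tau. The sketch below covers only the intercept-only case, where the minimizer is a sample quantile; it is an illustration with assumed names (`pinball_loss`, `best_constant`) and toy data, not the regression-tree procedure or the linear programming solver discussed in the paper.

```python
def pinball_loss(y, q, tau):
    """Total check loss of predicting the constant q for observations y."""
    total = 0.0
    for value in y:
        residual = value - q
        # Positive residuals cost tau per unit, negative ones cost (1 - tau).
        total += residual * tau if residual >= 0 else -residual * (1.0 - tau)
    return total


def best_constant(y, tau):
    """Observed value that minimizes the pinball loss (brute-force search)."""
    return min(y, key=lambda q: pinball_loss(y, q, tau))


# Toy response values standing in for salary data.
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for tau in (0.25, 0.5, 0.75):
    print(tau, best_constant(y, tau))
```

With covariates, the constant q is replaced by a linear predictor and the same loss is minimized over the coefficients, which is where the linear programming formulation and its cost on large data sets come in.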