• Title/Summary/Keyword: High-dimensional data

Application of Smart Geospatial Information for Modeling and Analysis of City River (도시하천 분석과 모델링을 위한 스마트 지형공간정보의 응용)

  • Lee, Hyun Jik;Eom, Jun Sik;Yu, Young Geol;Park, Eun Gwan
    • Journal of Korean Society for Geospatial Information Science / v.21 no.4 / pp.135-142 / 2013
  • This study seeks an adequate, optimized method for applying high-quality three-dimensional spatial data, created from high-resolution digital aerial photographs and aerial LiDAR data, to the three-dimensional planning of environmentally friendly ecological river restoration, in accordance with the irrigation and flood control objectives of urban rivers. Through three-dimensional modeling of the river before and after restoration, the research also offers basic information for river restoration. In addition, moving from conventional two-dimensional planning to a three-dimensional planning environment based on smart spatial information makes it possible to secure accuracy in river analysis, analyze possible civil complaints, and suggest solutions to potential problems.

Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data

  • Ko, Hyoseok;Kim, Kipoong;Sun, Hokeun
    • Genomics & Informatics / v.14 no.4 / pp.187-195 / 2016
  • In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required to identify disease- or trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistical testing procedures based on individual tests suffer from multiple testing issues such as control of the family-wise error rate and dependence among tests. Moreover, the main interest in genetic association studies is detecting the few genes associated with a phenotype outcome among tens of thousands of genes. For this reason, regularization procedures, in which a phenotype outcome is regressed on all genomic markers and the regression coefficients are estimated from a penalized likelihood, have been considered a good alternative approach to the analysis of high-dimensional genomic data. However, the selection performance of regularization procedures has rarely been compared with that of statistical group testing procedures. In this article, we performed extensive simulation studies in which commonly used group testing procedures such as principal component analysis, Hotelling's $T^2$ test, and the permutation test are compared with group lasso (least absolute shrinkage and selection operator) in terms of true positive selection. We also applied all methods considered in the simulation studies to identify genes associated with ovarian cancer from over 20,000 genetic sites generated from the Illumina Infinium HumanMethylation27K BeadChip. We found a large discrepancy in the selected genes between the multiple group testing procedures and group lasso.
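The multiple testing issue mentioned in this abstract can be illustrated with a short calculation. The sketch below is not taken from the article: it assumes independent tests (the abstract explicitly notes that real tests are dependent), and the function names, the level 0.05, and the count of 20,000 tests are illustrative choices only.

```python
def family_wise_error_rate(alpha, n_tests):
    """P(at least one false positive) for n_tests independent tests at level alpha.

    Independence is a simplifying assumption made only for this illustration.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# Illustrative numbers: 20,000 tests, each at the 0.05 level.
n_tests = 20000
print(family_wise_error_rate(0.05, n_tests))   # uncorrected
per_test = bonferroni_threshold(0.05, n_tests)
print(per_test)                                 # corrected per-test level
print(family_wise_error_rate(per_test, n_tests))
```

Under these assumptions, testing every site at the unadjusted level makes at least one false positive nearly certain, while the Bonferroni-corrected level keeps the family-wise error rate at or below the target at the cost of a much stricter per-test threshold.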

Introduction to variational Bayes for high-dimensional linear and logistic regression models (고차원 선형 및 로지스틱 회귀모형에 대한 변분 베이즈 방법 소개)

  • Jang, Insong;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.445-455 / 2022
  • In this paper, we introduce existing Bayesian methods for high-dimensional sparse regression models and compare their performance in various simulation scenarios. In particular, we focus on the variational Bayes approach proposed by Ray and Szabó (2021), which enables scalable and accurate Bayesian inference. Using simulated data sets from sparse high-dimensional linear regression models, we compare the variational Bayes approach with other Bayesian and frequentist methods. To check the practical performance of variational Bayes in logistic regression models, a real data analysis is conducted using a leukemia data set.

Adjusted Direct Orthogonal Signal Correction For High-Dimensional Spectral Data (고차원 스펙트라 데이터 분석을 위한 Adjusted Direct Orthogonal Signal Correction 기법)

  • Kim, Sin-Young;Kim, Seoung-Bum
    • Journal of Korean Institute of Industrial Engineers / v.37 no.4 / pp.400-407 / 2011
  • Modeling and analysis of high-dimensional spectral data provide an opportunity to uncover inherent patterns in various information-rich data. Orthogonal signal correction (OSC), a preprocessing technique, has been widely used to remove unwanted variations in spectral data that do not contribute to prediction or classification. In the present study, we propose a novel OSC algorithm, called adjusted direct OSC, to improve visualization and classification ability. Experimental results with real mass spectral data from condom lubricants demonstrate the effectiveness of the proposed approach.

An Experimental Study on Smoothness Regularized LDA in Hyperspectral Data Classification (하이퍼스펙트럴 데이터 분류에서의 평탄도 LDA 규칙화 기법의 실험적 분석)

  • Park, Lae-Jeong
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.4 / pp.534-540 / 2010
  • High dimensionality and highly correlated features are the major characteristics of hyperspectral data. Linear projections such as LDA and its variants have been used to extract low-dimensional features from high-dimensional spectral data. Regularization of LDA has been introduced to alleviate the overfitting that often occurs with small training data sets and leads to poor generalization performance. Among these approaches, smoothness regularized LDA seems to be effective for feature extraction from hyperspectral data because it can exploit this high correlation. This paper experimentally studies the performance of regularized LDA in hyperspectral data classification under varying conditions of the training data. In addition, a new dual smoothness regularized LDA, which makes use of both the spectral-domain and spatial-domain correlations between neighboring pixels, is proposed and evaluated.

Mini-Review of Studies Reporting the Repeatability and Reproducibility of Diffusion Tensor Imaging

  • Seo, Jeong Pyo;Kwon, Young Hyeon;Jang, Sung Ho
    • Investigative Magnetic Resonance Imaging / v.23 no.1 / pp.26-33 / 2019
  • Purpose: Diffusion tensor imaging (DTI) data must be analyzed by an analyzer after data processing. Hence, the analyzed DTI data might depend on the analyzer, which is a major limitation. This paper reviewed previous DTI studies reporting the repeatability and reproducibility of data from the corticospinal tract (CST), one of the most actively researched neural tracts on this topic. Materials and Methods: Relevant studies published between January 1990 and December 2018 were identified by searching the PubMed, Google Scholar, and MEDLINE electronic databases using the following keywords: DTI, diffusion tensor tractography, reliability, repeatability, reproducibility, and CST. As a result, 15 studies were selected. Results: Measurements of the CST using region of interest methods on 2-dimensional DTI images generally showed excellent repeatability and reproducibility of more than 0.8, but high variability (0.29 to 1.00) between studies. In contrast, measurements of the CST using the 3-dimensional diffusion tensor tractography (DTT) method showed not only excellent repeatability and reproducibility of more than 0.9 but also low variability (repeatability, 0.88 to 1.00; reproducibility, 0.82 to 0.99) between studies. Conclusion: Both the 2-dimensional DTI and 3-dimensional DTT methods appeared to be reliable for measuring the CST, but the 3-dimensional DTT method appeared to be more reliable.

Design of an Efficient Parallel High-Dimensional Index Structure (효율적인 병렬 고차원 색인구조 설계)

  • Park, Chun-Seo;Song, Seok-Il;Sin, Jae-Ryong;Yu, Jae-Su
    • Journal of KIISE:Databases / v.29 no.1 / pp.58-71 / 2002
  • Generally, multi-dimensional data such as image and spatial data require a large amount of storage space, and there is a limit to how much of such data a single workstation can store and manage. Managing the data in a parallel computing environment, which is being actively researched these days, can greatly improve performance. In this paper, we propose a parallel high-dimensional index structure that exploits the parallelism of the parallel computing environment. The proposed index structure is an nP (processor)-n$\times$mD (disk) architecture, a hybrid of the nP-nD and 1P-nD types. Its node structure increases fan-out and reduces the height of an index tree. In addition, a range search algorithm that maximizes I/O parallelism is devised and applied to k-nearest neighbor queries. Various experiments show that the proposed method outperforms other parallel index structures.

Current trends in high dimensional massive data analysis (고차원 대용량 자료분석의 현재 동향)

  • Jang, Woncheol;Kim, Gwangsu;Kim, Joungyoun
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.999-1005 / 2016
  • The advent of big data brings the opportunity to answer many open scientific questions but also presents some interesting challenges. The main features of contemporary datasets are high dimensionality and massive sample size. In this paper, we give an overview of the major challenges caused by these two features: (i) noise accumulation and spurious correlations in high-dimensional data; (ii) computational scalability for massive data. We also present applications of big data in various fields, including disaster forecasting, digital humanities, and sabermetrics.
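The spurious correlations mentioned in this abstract can be reproduced with a small simulation. This is a minimal sketch and not the authors' analysis: the function name, the sample size of 30, the 1,000 predictors, and the Gaussian noise model are all assumptions chosen for illustration. Every predictor is generated independently of the response, so each true correlation is zero.

```python
import math
import random


def spurious_abs_correlations(n_samples, n_features, seed=0):
    """Return |sample correlation with y| for n_features pure-noise predictors.

    The response y and every predictor are independent standard normal draws,
    so any non-zero sample correlation is spurious by construction.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    y_mean = sum(y) / n_samples
    y_dev = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(d * d for d in y_dev))

    abs_corrs = []
    for _ in range(n_features):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        x_mean = sum(x) / n_samples
        x_dev = [v - x_mean for v in x]
        x_norm = math.sqrt(sum(d * d for d in x_dev))
        # Pearson correlation between this noise predictor and y.
        r = sum(a * b for a, b in zip(x_dev, y_dev)) / (x_norm * y_norm)
        abs_corrs.append(abs(r))
    return abs_corrs


corrs = spurious_abs_correlations(n_samples=30, n_features=1000)
# Largest spurious correlation among the first 10 predictors vs. among all 1000.
print(max(corrs[:10]), max(corrs))
```

Because the first 10 predictors are a subset of the 1,000, the second printed value can never be smaller than the first. With few samples and many predictors it is typically much larger, which is how an irrelevant variable can look strongly associated with the response in high dimensions.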

Simulator for Dynamic 2/3-Dimensional Switching of Computing Resources

  • Ki, Jang-Geun;Kwon, Kee-Young
    • International Journal of Internet, Broadcasting and Communication / v.12 no.3 / pp.9-17 / 2020
  • As part of research on infrastructure for highly flexible, reconfigurable data centers built on very high-speed crossbar switches, we developed a simulator that models two- and three-dimensional switch connection structures with an efficient control algorithm based on software-defined networking, and used it to verify the functions and analyze the performance. The simulator consists of a control module and a switch module, coded in Python on the Mininet and Ryu OpenFlow frameworks. The control module dynamically controls the operation of switching cells, using a shortest multipath algorithm to adaptively calculate efficient paths between configurable computing resources. Performance analysis with the simulator shows that, with the same number of switches, the three-dimensional switch architecture can accommodate more hosts per port and achieves about 1.5 times more successful 1:n connections per port than the two-dimensional architecture. Simulation results also show that connection lengths are shorter and the unused switch ratio is lower in the three-dimensional case than in the two-dimensional case.

Temperature Distributions of High Precision Spindle with Built-in Motor (모터내장형 주축의 온도분포해석에 관한 연구)

  • 김용길;김수태;박천홍;김춘배
    • Proceedings of the Korean Society of Precision Engineering Conference / 1996.04a / pp.624-628 / 1996
  • Unsteady-state temperature distributions in a high-precision spindle system with a built-in motor are studied. For the analysis, a three-dimensional model of the high-precision spindle is built. The model includes estimates of the heat generated by the bearings and the built-in motor, and thermal characteristic values such as the heat transfer coefficient. Temperature distributions are computed using the finite element method, and the analysis results are compared with measured data. The analysis shows that the temperature distributions of the high-precision spindle system can be estimated reasonably well using the three-dimensional model and the finite element method.
