• Title/Summary/Keyword: High-Dimensional Data


A Study on Selecting Principle Component Variables Using Adaptive Correlation (적응적 상관도를 이용한 주성분 변수 선정에 관한 연구)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.3
    • /
    • pp.79-84
    • /
    • 2021
  • A feature extraction method that reflects features well while maintaining the properties of the data is required in order to process high-dimensional data. Principal component analysis (PCA), which converts high-dimensional data into low-dimensional data and expresses it with fewer variables than the original, is a representative method for feature extraction. In this study, we propose a PCA method that selects principal component variables based on adaptive correlation for feature extraction from high-dimensional data. The proposed method analyzes the principal components of the data by adaptively reflecting the correlation among the input variables, and excludes highly correlated, duplicated variables from the candidate list. It analyzes the principal component hierarchy by eigenvector coefficient values, prevents the selection of principal components with a low hierarchy, and minimizes the data duplication that induces data bias through correlation analysis. In this way, we propose a method of selecting principal component variables that represent the characteristics of the actual data well by reducing the influence of data bias.
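The combination described in this abstract, screening out highly correlated variables and then applying PCA, can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm; the `0.95` threshold and the greedy screening order are assumptions.

```python
import numpy as np

def select_low_correlation_vars(X, threshold=0.95):
    """Greedily drop one variable of each highly correlated pair.
    Returns the indices of the variables kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        # keep j only if it is not strongly correlated with an already-kept variable
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def pca_components(X, n_components):
    """Plain PCA via eigen-decomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest variance first
    return eigvecs[:, order[:n_components]]

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 4))
# append a near-duplicate of column 0 to create redundancy
X = np.hstack([base, base[:, :1] + 0.01 * rng.normal(size=(200, 1))])
kept = select_low_correlation_vars(X)           # the duplicate column is dropped
W = pca_components(X[:, kept], n_components=2)
Z = X[:, kept] @ W                              # reduced representation
```

Removing the near-duplicate column before the eigen-decomposition is what keeps the duplicated variance from biasing the leading components.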

An Efficient Processing of Continuous Range Queries on High-Dimensional Spatial Data (고차원 공간 데이터를 위한 연속 범위 질의의 효율적인 처리)

  • Jang, Su-Min;Yoo, Jae-Soo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.6
    • /
    • pp.397-401
    • /
    • 2007
  • Applications involving continuous queries on moving objects are rapidly expanding into various areas. These applications require not only 2-dimensional but also high-dimensional spatial data. If previous indexes are used for overlapping continuous range queries on high-dimensional spatial data, performance degrades significantly as the number of continuous range queries over a large number of moving objects grows. We focus on stationary queries, non-exponential growth of storage cost, and efficient processing time for large data sets. In this paper, to solve these problems, we present a novel query indexing method, the PAB (Projected Attribute Bit)-based query index. We transform the information of a high-dimensional continuous range query into one-dimensional bit lists by projecting it onto each axis. The proposed query index also supports incremental updates for efficient query processing. Through various experiments, we show that our method outperforms the CES (containment-encoded squares)-based indexing method, one of the most recent approaches.
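The per-axis projection idea can be sketched in a few lines: each range query becomes one bit list per axis over a fixed grid, and a point matches a query only if its cell bit is set on every axis. This is a toy reconstruction from the abstract, not the paper's data structure; the grid resolution and the `[0, 1)` domain are assumptions.

```python
import numpy as np

class PABIndex:
    """Toy projected-attribute-bit query index: each continuous range query
    is projected onto every axis as a bit list over a fixed grid."""

    def __init__(self, n_dims, n_cells):
        self.n_dims = n_dims
        self.n_cells = n_cells
        self.bits = {}  # query_id -> (n_dims, n_cells) boolean array

    def _cell(self, x):
        # map a coordinate in [0, 1) to a grid cell index
        return min(int(x * self.n_cells), self.n_cells - 1)

    def insert(self, query_id, lows, highs):
        b = np.zeros((self.n_dims, self.n_cells), dtype=bool)
        for d in range(self.n_dims):
            b[d, self._cell(lows[d]):self._cell(highs[d]) + 1] = True
        self.bits[query_id] = b

    def matching_queries(self, point):
        """Queries whose range contains the point on every axis."""
        cells = [self._cell(x) for x in point]
        return [qid for qid, b in self.bits.items()
                if all(b[d, cells[d]] for d in range(self.n_dims))]

idx = PABIndex(n_dims=3, n_cells=10)
idx.insert("q1", lows=[0.1, 0.1, 0.1], highs=[0.5, 0.5, 0.5])
idx.insert("q2", lows=[0.6, 0.6, 0.6], highs=[0.9, 0.9, 0.9])
hits = idx.matching_queries([0.3, 0.3, 0.3])   # only q1 covers this point
```

Storage grows linearly with the number of axes rather than exponentially with the dimensionality, which is the property the abstract emphasizes; updating one query touches only its own bit lists, so incremental updates are cheap.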

Feature Extraction on High Dimensional Data Using Incremental PCA (점진적인 주성분분석기법을 이용한 고차원 자료의 특징 추출)

  • Kim Byung-Joo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.7
    • /
    • pp.1475-1479
    • /
    • 2004
  • High-dimensional data requires efficient feature extraction techniques. Although PCA (Principal Component Analysis) is a well-known feature extraction method, it requires a huge amount of memory and its computational cost is high. In this paper we use incremental PCA for feature extraction on high-dimensional data. Through experiments we show that the proposed method is superior to the APEX model.
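The memory argument for incremental PCA can be made concrete with a streaming sketch: update a running mean and covariance one batch at a time, then eigen-decompose on demand. This is a generic incremental-covariance formulation, not the specific algorithm evaluated in the paper.

```python
import numpy as np

class StreamingPCA:
    """Incremental PCA sketch: update mean and covariance batch by batch,
    then eigen-decompose the running covariance on demand. Memory is
    O(d^2) for d features, independent of the number of samples seen."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = np.zeros(n_features)
        self.M2 = np.zeros((n_features, n_features))  # sum of deviation outer products

    def partial_fit(self, X):
        for x in X:                      # Welford-style one-pass update
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.M2 += np.outer(delta, x - self.mean)

    def components(self, k):
        cov = self.M2 / max(self.n - 1, 1)
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]
        return eigvecs[:, order[:k]]

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
X[:, 0] *= 10                            # dominant-variance direction along axis 0
spca = StreamingPCA(n_features=5)
for batch in np.array_split(X, 10):      # data arrives in 10 chunks
    spca.partial_fit(batch)
W = spca.components(k=2)                 # leading component aligns with axis 0
```

Only the d-by-d accumulator is kept in memory, so the full n-by-d data matrix never needs to be materialized, which is the advantage over batch PCA that the abstract claims.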

Bayesian baseline-category logit random effects models for longitudinal nominal data

  • Kim, Jiyeong;Lee, Keunbaik
    • Communications for Statistical Applications and Methods
    • /
    • v.27 no.2
    • /
    • pp.201-210
    • /
    • 2020
  • Baseline-category logit random effects models have been used to analyze longitudinal nominal data. The models account for subject-specific variations using random effects. However, the random effects covariance matrix in these models needs to capture subject-specific variations as well as serial correlations of the nominal outcomes. To satisfy both, the covariance matrix must be heterogeneous and high-dimensional, yet it is difficult to estimate because of its high dimensionality and the positive-definiteness constraint. In this paper, we exploit the modified Cholesky decomposition to estimate the high-dimensional heterogeneous random effects covariance matrix, and propose a Bayesian methodology to estimate the parameters of interest. The proposed methods are illustrated with real data from the McKinney Homeless Research Project.
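The reason the modified Cholesky decomposition sidesteps the positive-definiteness constraint can be shown directly: a covariance built from a unit lower-triangular matrix of generalized autoregressive parameters and log innovation variances is positive definite for any unconstrained real inputs. A minimal sketch of that construction, with the parameterization details assumed:

```python
import numpy as np

def covariance_from_mcd(phi, log_d):
    """Build a covariance matrix from modified Cholesky parameters:
    phi fills the strict lower triangle of a unit lower-triangular T
    (generalized autoregressive parameters) and log_d holds log
    innovation variances. The result Sigma = T^{-1} D T^{-T} is
    positive definite for ANY real-valued inputs, so a sampler can
    work with unconstrained parameters."""
    q = len(log_d)
    T = np.eye(q)
    T[np.tril_indices(q, k=-1)] = -phi
    D = np.diag(np.exp(log_d))              # innovation variances, always > 0
    Tinv = np.linalg.inv(T)
    return Tinv @ D @ Tinv.T

rng = np.random.default_rng(4)
q = 4
Sigma = covariance_from_mcd(rng.normal(size=q * (q - 1) // 2),
                            rng.normal(size=q))
eigvals = np.linalg.eigvalsh(Sigma)         # all positive by construction
```

Because the map from (phi, log_d) to Sigma is unconstrained, Bayesian estimation can place ordinary priors on these parameters instead of sampling on the cone of positive-definite matrices.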

Effect of outliers on the variable selection by the regularized regression

  • Jeong, Junho;Kim, Choongrak
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.2
    • /
    • pp.235-243
    • /
    • 2018
  • Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly less than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 58, 267-288, 1996), and on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the k-th observation on LASSO estimates in simple linear regression. Numerical studies based on artificial and real data are given for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the selected variables is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
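The case-deletion diagnostic the abstract studies can be sketched numerically: fit the LASSO on the full data, refit with one observation removed, and compare the two active sets. The coordinate-descent solver and the penalty value below are illustrative assumptions, not the paper's analytic expression.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent:
    minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]        # partial residual
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_sq[j]
    return beta

rng = np.random.default_rng(2)
n, p = 50, 100                             # high-dimensional: p > n
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=n)

beta_full = lasso_cd(X, y, lam=0.2)
selected_full = set(np.flatnonzero(beta_full))

# case-deletion diagnostic: drop observation k, refit, compare active sets
k = 0
beta_drop = lasso_cd(np.delete(X, k, axis=0), np.delete(y, k), lam=0.2)
selected_drop = set(np.flatnonzero(beta_drop))
changed = selected_full ^ selected_drop     # variables whose selection flipped
```

The size of `changed` across all k is one simple way to see, as the paper argues, that single observations can flip the selected variable set far more readily when p exceeds n.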

Three-Dimensional Borehole Radar Modeling (3차원 시추공 레이다 모델링)

  • 예병주
    • Economic and Environmental Geology
    • /
    • v.33 no.1
    • /
    • pp.41-50
    • /
    • 2000
  • Geo-radar surveying, which has the advantages of high resolution and relatively fast acquisition, has been widely used for engineering and environmental problems. Three-dimensional effects have to be considered when interpreting geo-radar data at high resolution, but analyzing these effects is difficult, so an efficient three-dimensional numerical modeling algorithm is needed. Numerical radar modeling in three dimensions requires large memory and long computation times. In this paper, a finite-difference time-domain solution to Maxwell's equations for simulating electromagnetic wave propagation in three-dimensional media was developed as an economical algorithm requiring less memory and shorter computation time, with Liao's absorbing boundary condition applied at the boundaries. The numerical result of a cross-hole radar survey over a tunnel is compared with real data, and the two results match well. To demonstrate its applicability to three-dimensional analysis, results were examined for varying incidence angles of the tunnel relative to the survey cross-section, and for the case where the tunnel is parallel to the cross-section. This algorithm is useful in various geo-radar surveys and can provide basic data for developing data-processing and inversion programs.
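The finite-difference time-domain scheme at the core of such modeling can be illustrated in one dimension, where the leapfrog field updates fit in a few lines; the grid size, normalized units, Courant number of 0.5, and Gaussian source are all assumptions for illustration, and the paper's actual 3-D scheme with Liao's absorbing boundary is far more involved.

```python
import numpy as np

# Minimal 1-D FDTD sketch (Yee leapfrog scheme) for Maxwell's equations in
# normalized units. E and H live on staggered grids and are updated in
# alternation; the 3-D analogue stores six field arrays per cell, which is
# why memory becomes the bottleneck the abstract describes.
nz, nt = 200, 250
Ex = np.zeros(nz)
Hy = np.zeros(nz)
for t in range(nt):
    Hy[:-1] += 0.5 * (Ex[1:] - Ex[:-1])         # Courant number 0.5 folded in
    Ex[1:-1] += 0.5 * (Hy[1:-1] - Hy[:-2])
    Ex[100] += np.exp(-((t - 30) ** 2) / 100)   # soft Gaussian source mid-grid
```

With a Courant number below 1 the scheme is stable, so the injected pulse propagates outward without the field amplitudes blowing up.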


A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems
    • /
    • v.14 no.5
    • /
    • pp.1167-1175
    • /
    • 2018
  • Microarray data plays an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray datasets have very few samples and high dimensionality. Therefore, to classify microarray data, a dimensionality reduction process is required. Dimensionality reduction can eliminate data redundancy, so that the features used in classification are only those with a high correlation with their class. There are two types of dimensionality reduction, namely feature selection and feature extraction. In this paper, we use the k-means algorithm as the clustering approach for feature selection. The proposed approach categorizes features that have the same characteristics into one cluster, so that redundancy in the microarray data is removed. The clustering result is ranked using the Relief algorithm so that the best-scoring element of each cluster is obtained. The best elements of all clusters are selected and used as features in the classification process, for which the Random Forest algorithm is then used. Based on the simulations, the accuracy of the proposed approach on the Colon, Lung Cancer, and Prostate Tumor datasets was 85.87%, 98.9%, and 89%, respectively, higher than the approach using Random Forest without clustering.
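The cluster-then-pick-one pipeline can be sketched as follows. This is a simplified reconstruction: a squared-correlation score stands in for the paper's Relief ranking, the final Random Forest step is omitted, and the cluster count is an arbitrary choice.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means (Lloyd's algorithm); returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def select_features_by_cluster(X, y, k):
    """Cluster the FEATURES (rows of X.T), then keep the single feature per
    cluster that scores highest against the class labels, removing the
    redundancy of features that behave alike."""
    feat = X.T
    # normalize each feature so clustering groups features with similar shape
    feat = (feat - feat.mean(1, keepdims=True)) / (feat.std(1, keepdims=True) + 1e-12)
    labels = kmeans(feat, k)
    # squared correlation with the class as a stand-in relevance score
    score = np.array([np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(X.shape[1])])
    return [int(np.flatnonzero(labels == c)[np.argmax(score[labels == c])])
            for c in range(k) if np.any(labels == c)]

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=120)
informative = y[:, None] + 0.3 * rng.normal(size=(120, 3))  # 3 class-linked features
noise = rng.normal(size=(120, 20))
X = np.hstack([informative, noise])
picked = select_features_by_cluster(X, y, k=5)   # at most one feature per cluster
```

The selected columns of `X` would then be fed to a classifier such as Random Forest, as in the paper's pipeline.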

Progression-Preserving Dimension Reduction for High-Dimensional Sensor Data Visualization

  • Yoon, Hyunjin;Shahabi, Cyrus;Winstein, Carolee J.;Jang, Jong-Hyun
    • ETRI Journal
    • /
    • v.35 no.5
    • /
    • pp.911-914
    • /
    • 2013
  • This letter presents Progression-Preserving Projection, a dimension reduction technique that finds a linear projection that maps a high-dimensional sensor dataset into a two- or three-dimensional subspace with a particularly useful property for visual exploration. As a demonstration of its effectiveness as a visual exploration and diagnostic means, we empirically evaluate the proposed technique over a dataset acquired from our own virtual-reality-enhanced ball-intercepting training system designed to promote the upper extremity movement skills of individuals recovering from stroke-related hemiparesis.

An Efficient Content-Based High-Dimensional Index Structure for Image Data

  • Lee, Jang-Sun;Yoo, Jae-Soo;Lee, Seok-Hee;Kim, Myung-Joon
    • ETRI Journal
    • /
    • v.22 no.2
    • /
    • pp.32-42
    • /
    • 2000
  • The existing multi-dimensional index structures are not adequate for indexing high-dimensional data sets. Although they can conceptually be extended to higher dimensionalities, they usually require time and space that grow exponentially with the dimensionality. In this paper, we analyze the existing index structures and derive requirements for an index structure for content-based image retrieval. We also propose a new structure, satisfying these requirements, for indexing large amounts of point data in a high-dimensional space. To justify the performance of the proposed structure, we compare it with the existing index structures in various environments. We show through experiments that our proposed structure outperforms the existing structures in terms of retrieval time and storage overhead.


PdR-Tree : An Efficient Indexing Technique for the improvement of search performance in High-Dimensional Data (PdR-트리 : 고차원 데이터의 검색 성능 향상을 위한 효율적인 인덱스 기법)

  • Joh, Beom-Seok;Park, Young-Bae
    • The KIPS Transactions:PartD
    • /
    • v.8D no.2
    • /
    • pp.145-153
    • /
    • 2001
  • The Pyramid-Technique is based on mapping n-dimensional space data into one-dimensional values indexed by a B+-tree; by solving the problem of search-time complexity, it also avoids the "curse of dimensionality" caused by processing hypercube range queries in an n-dimensional data space. The Spherical Pyramid-Technique applies the Pyramid method's space-division strategy with spherical range queries, improving search performance to make it suitable for similarity search. However, depending on the data size and dimensionality, both techniques show significantly degraded search performance for data sets larger than one million items and dimensions greater than sixteen. In this paper, we propose a new index structure, the PdR-Tree, to improve search performance for high-dimensional data such as multimedia data. Test results using both simulated and real data demonstrate that the PdR-Tree surpasses both the Pyramid-Technique and the Spherical Pyramid-Technique in terms of search performance.
