• Title/Abstract/Keywords: dimensionality reduction

Search results: 209 (processing time: 0.031 s)

A Comparative Experiment on Dimensional Reduction Methods Applicable for Dissimilarity-Based Classifications

  • 김상운
    • Journal of the Institute of Electronics and Information Engineers / Vol. 53, No. 3 / pp. 59-66 / 2016
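  • This paper reports experimental results comparing and evaluating dimensionality reduction methods that allow dissimilarity-based classification (DBC) to be performed efficiently. In DBC, objects are classified not by using the measured values of each object (a set of feature elements) but by measuring the dissimilarities between objects. One current issue with DBC is that the dissimilarity space becomes high-dimensional when large-scale data are handled. To address this problem, prototype selection (PS) or dimension reduction (DR) methods are currently used. PS constructs the dissimilarity space from prototypes extracted from the full training data, whereas DR first constructs the dissimilarity space from the full training data and then reduces the dimensionality of that space. Instead of PS or DR, this paper proposes a DBC that constructs an eigenspace (ES) of suitable dimension by principal component analysis of the training data and then uses the $l_p$-norm distances between vectors mapped into this eigenspace as the dissimilarity measure. Measuring the classification performance of DBC in the ES with the nearest neighbor rule on publicly available artificial and real-world data confirmed that, when the eigenspace dimension is chosen appropriately, classification performance improves over DBC using PS or DR.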
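The eigenspace variant of DBC described above can be sketched in plain Python as follows. This is a minimal illustration, not the author's implementation: the power-iteration PCA, the use of the whole training set as the representation set, and the Euclidean distance used by the 1-NN rule inside the dissimilarity space are all assumptions, and the function names are hypothetical.

```python
import math
import random


def mean_vector(X):
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def covariance(X, mu):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(X), len(mu)
    C = [[0.0] * d for _ in range(d)]
    for x in X:
        z = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                C[i][j] += z[i] * z[j] / (n - 1)
    return C


def top_eigenvectors(C, k, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(C)
    A = [row[:] for row in C]
    vecs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(wi * wi for wi in w)) or 1.0
            v = [wi / norm for wi in w]
        lam = sum(v[i] * A[i][j] * v[j] for i in range(d) for j in range(d))
        vecs.append(v)
        # Remove the component just found so the next pass converges to the next one
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vecs


def project(X, mu, vecs):
    """Map each centred vector onto the eigenspace spanned by vecs."""
    return [[sum(v[j] * (x[j] - mu[j]) for j in range(len(mu))) for v in vecs] for x in X]


def lp_distance(a, b, p=2.0):
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


def dbc_eigenspace_1nn(X_train, y_train, X_test, k=1, p=2.0):
    mu = mean_vector(X_train)
    vecs = top_eigenvectors(covariance(X_train, mu), k)
    E_train, E_test = project(X_train, mu, vecs), project(X_test, mu, vecs)
    # Dissimilarity representation: l_p distances (in the eigenspace) to every training object
    D_train = [[lp_distance(a, r, p) for r in E_train] for a in E_train]
    D_test = [[lp_distance(a, r, p) for r in E_train] for a in E_test]
    preds = []
    for d in D_test:
        # 1-NN rule applied in the dissimilarity space (Euclidean distance there)
        dists = [lp_distance(d, t, 2.0) for t in D_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds
```

The eigenspace dimension `k` and the norm order `p` are the two knobs the abstract says must be chosen appropriately.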

Power and major gene-gene identification of dummy multifactor dimensionality reduction algorithm

  • 여정수;라부미;이호근;이성원;이제영
    • Journal of the Korean Data and Information Science Society / Vol. 24, No. 2 / pp. 277-287 / 2013
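  • Identifying gene-gene interactions is very important in genome-wide association studies, and much recent research has addressed this problem. Dummy multifactor dimensionality reduction (dummy MDR) is one such approach. The purpose of this study is to evaluate, through simulation, the power of dummy MDR for identifying gene-gene interactions. In addition, we applied this method to confirm the interaction effects of single nucleotide polymorphisms (SNPs) on economic traits in a Hanwoo (Korean cattle) population.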

Support vector machine and multifactor dimensionality reduction for detecting major gene interactions of continuous data

  • 이제영;이종형
    • Journal of the Korean Data and Information Science Society / Vol. 21, No. 6 / pp. 1271-1280 / 2010
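  • Traditional statistical methods have been used to identify interactions among genes that affect human diseases and livestock traits, but they are not well suited to high-dimensional data such as genetic data. Multifactor dimensionality reduction (MDR) was therefore proposed. MDR is a nonparametric method that requires no model assumptions and can be applied to binary data, but it has the drawback that it cannot be applied to continuous data. In this study, we therefore used a support vector machine algorithm, which has excellent generalization performance in classification, to transform continuous data so that MDR could be applied. We then applied this SVM-based MDR to real continuous economic-trait data from Hanwoo (Korean cattle), targeting six candidate single nucleotide polymorphisms on chromosome 6, and identified the best gene-interaction combinations associated with Hanwoo economic traits.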
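To make concrete why standard MDR is limited to binary outcomes, the sketch below shows the basic case/control version: each multi-locus genotype cell is labelled high-risk when its case:control ratio exceeds a threshold, and the resulting binary attribute is scored by balanced accuracy. This is the generic MDR procedure, not the SVM-based extension proposed in this paper; cross-validation and the search over SNP combinations are omitted, and the function names are hypothetical.

```python
from collections import defaultdict


def mdr_cell_labels(genotypes, status, threshold=None):
    """Label each multi-locus genotype cell high-risk (1) or low-risk (0).

    genotypes: one tuple per subject, e.g. (0, 2) for two SNPs coded 0/1/2
    status:    1 for a case, 0 for a control (both classes must be present)
    """
    cases, controls = defaultdict(int), defaultdict(int)
    for g, s in zip(genotypes, status):
        if s == 1:
            cases[g] += 1
        else:
            controls[g] += 1
    if threshold is None:
        # Default threshold: the overall case:control ratio
        n_case = sum(status)
        threshold = n_case / (len(status) - n_case)
    labels = {}
    for cell in set(cases) | set(controls):
        n1, n0 = cases[cell], controls[cell]
        high = (n0 == 0 and n1 > 0) or (n0 > 0 and n1 / n0 > threshold)
        labels[cell] = 1 if high else 0
    return labels


def balanced_accuracy(genotypes, status, labels):
    """Score the pooled high/low-risk attribute as a classifier of case status."""
    tp = fn = tn = fp = 0
    for g, s in zip(genotypes, status):
        pred = labels.get(g, 0)
        if s == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

The case:control ratio in each cell is only defined for a binary status, which is the limitation the SVM-based preprocessing is meant to work around.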

Human Action Recognition Based on 3D Human Modeling and Cyclic HMMs

  • Ke, Shian-Ru;Thuc, Hoang Le Uyen;Hwang, Jenq-Neng;Yoo, Jang-Hee;Choi, Kyoung-Ho
    • ETRI Journal / Vol. 36, No. 4 / pp. 662-672 / 2014
  • Human action recognition is used in areas such as surveillance, entertainment, and healthcare. This paper proposes a system to recognize both single and continuous human actions from monocular video sequences, based on 3D human modeling and cyclic hidden Markov models (CHMMs). First, for each frame in a monocular video sequence, the 3D coordinates of the joints of a human object performing multiple action cycles are extracted using 3D human modeling techniques. The 3D coordinates are then converted into a set of geometrical relational features (GRFs) to reduce dimensionality and increase discriminability. For further dimensionality reduction, k-means clustering is applied to the GRFs to generate clustered feature vectors. These vectors are used to train a separate CHMM for each type of action, based on the Baum-Welch re-estimation algorithm. To recognize continuous actions concatenated from several distinct types of actions, a purpose-designed graphical model systematically concatenates the separately trained CHMMs. The experimental results show that the proposed system performs effectively on both single and continuous action recognition problems.

Investigation of gene-gene interactions of clock genes for chronotype in a healthy Korean population

  • Park, Mira;Kim, Soon Ae;Shin, Jieun;Joo, Eun-Jeong
    • Genomics & Informatics / Vol. 18, No. 4 / pp. 38.1-38.9 / 2020
  • Chronotype is an important moderator of psychiatric illnesses and appears to be controlled partly by genetic factors. Clock genes are the most relevant genes for chronotype. In addition to the roles of individual genes, gene-gene interactions of clock genes substantially contribute to chronotype. We investigated genetic associations and gene-gene interactions of the clock genes BHLHB2, CLOCK, CSNK1E, NR1D1, PER1, PER2, PER3, and TIMELESS for chronotype in 1,293 healthy Korean individuals. Regression analysis was conducted to find associations between single nucleotide polymorphisms (SNPs) and chronotype. For gene-gene interaction analyses, the quantitative multifactor dimensionality reduction (QMDR) method, a nonparametric, model-free method for quantitative phenotypes, was applied. No individual SNP or haplotype showed a significant association with chronotype in either the regression analysis or the single-locus QMDR model. QMDR analysis identified NR1D1 rs2314339 and TIMELESS rs4630333 as the best SNP pair among two-locus interaction models associated with chronotype (cross-validation consistency [CVC] = 8/10, p = 0.041). For the three-locus interaction model, the SNP combination of NR1D1 rs2314339, TIMELESS rs4630333, and PER3 rs228669 showed the best results (CVC = 4/10, p < 0.001). However, because the mean differences between genotype combinations were minor, the clinical roles of clock gene interactions are unlikely to be critical.
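A minimal sketch of the QMDR idea for one SNP pair, assuming the commonly described procedure: a genotype cell is labelled high when its mean trait value exceeds the overall mean, and the pooled high and low groups are compared with a two-sample t-statistic. The Welch (unequal-variance) form, treating ties as low, and leaving out cross-validation consistency and permutation p-values are simplifications made here; the function names are hypothetical.

```python
import math
from collections import defaultdict


def welch_t(a, b):
    """Two-sample t-statistic with unequal variances; each group needs >= 2 values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def qmdr(genotypes, trait):
    """Return (cell labels, t-statistic) for one SNP combination and a quantitative trait."""
    overall = sum(trait) / len(trait)
    sums, counts = defaultdict(float), defaultdict(int)
    for g, t in zip(genotypes, trait):
        sums[g] += t
        counts[g] += 1
    # A cell is "high" (1) when its mean trait value exceeds the overall mean
    labels = {g: 1 if sums[g] / counts[g] > overall else 0 for g in counts}
    high = [t for g, t in zip(genotypes, trait) if labels[g] == 1]
    low = [t for g, t in zip(genotypes, trait) if labels[g] == 0]
    return labels, welch_t(high, low)
```

Comparing each cell with the overall mean replaces the case:control ratio of binary MDR, which is what lets the method handle a continuous phenotype such as a chronotype score.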

Classification of Imbalanced Data Based on MTS-CBPSO Method: A Case Study of Financial Distress Prediction

  • Gu, Yuping;Cheng, Longsheng;Chang, Zhipeng
    • Journal of Information Processing Systems / Vol. 15, No. 3 / pp. 682-693 / 2019
  • Most traditional classification methods assume that the class distribution of the data is balanced, whereas imbalanced data are common in the real world, so solving the problem of classifying imbalanced data is important. In the Mahalanobis-Taguchi system (MTS) algorithm, the classification model is built from a reference space and a measurement scale derived from a single normal group, which makes it suitable for handling imbalanced data. In this paper, an improved method, MTS-CBPSO, is constructed by introducing chaotic mapping and a binary particle swarm optimization algorithm, in place of orthogonal arrays and the signal-to-noise ratio (SNR), to select the valid variables, with G-means, F-measure, and dimensionality reduction as the classification optimization targets. The proposed method is also applied to financial distress prediction for Chinese listed companies. Compared with the traditional MTS and common classification methods such as SVM, C4.5, and k-NN, the MTS-CBPSO method achieves better prediction accuracy and dimensionality reduction.
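The reference-space part of MTS can be sketched as follows: variables are standardized with the normal group's means and standard deviations, and a sample's scaled Mahalanobis distance is $MD = z^{\top} R^{-1} z / k$, where $R$ is the normal group's correlation matrix and $k$ the number of variables. The chaotic binary PSO variable selection that MTS-CBPSO adds is not shown; the class and helper names and the small Gauss-Jordan inverse are illustrative assumptions. A G-mean helper (the geometric mean of sensitivity and specificity) is included because it is one of the optimization targets named above.

```python
import math


def _mean_std(col):
    m = sum(col) / len(col)
    return m, math.sqrt(sum((x - m) ** 2 for x in col) / (len(col) - 1))


def _inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        p = A[c][c]
        A[c] = [v / p for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


class MahalanobisSpace:
    """Reference space built only from the normal (majority) group."""

    def __init__(self, normal):
        cols = list(zip(*normal))
        self.stats = [_mean_std(c) for c in cols]
        Z = [self._standardize(x) for x in normal]
        n, self.k = len(Z), len(cols)
        # Correlation matrix of the standardized normal group
        R = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(self.k)] for i in range(self.k)]
        self.R_inv = _inverse(R)

    def _standardize(self, x):
        return [(xi - m) / s for xi, (m, s) in zip(x, self.stats)]

    def md(self, x):
        """Scaled Mahalanobis distance z^T R^-1 z / k."""
        z, k = self._standardize(x), self.k
        return sum(z[i] * self.R_inv[i][j] * z[j] for i in range(k) for j in range(k)) / k


def g_mean(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return math.sqrt(tp / (tp + fn) * tn / (tn + fp))
```

For the normal group itself the average MD works out to $(n-1)/n$, close to 1, so abnormal samples stand out as values well above 1; only the normal group is needed to build the scale, which is why MTS suits imbalanced data.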

Centroid and Nearest Neighbor based Class Imbalance Reduction with Relevant Feature Selection using Ant Colony Optimization for Software Defect Prediction

  • B., Kiran Kumar;Gyani, Jayadev;Y., Bhavani;P., Ganesh Reddy;T, Nagasai Anjani Kumar
    • International Journal of Computer Science & Network Security / Vol. 22, No. 10 / pp. 1-10 / 2022
  • Software defect prediction (SDP) is currently one of the most active research areas in software engineering. Early detection of defects lowers the cost of software and improves its reliability. Machine learning techniques are widely used to build SDP models based on programming measures. Most defect prediction models in the literature suffer from class imbalance and high dimensionality. In this paper, we propose the Centroid and Nearest Neighbor based Class Imbalance Reduction (CNNCIR) technique, which considers the distribution characteristics of the dataset to balance defective and non-defective records in imbalanced datasets. The proposed approach is compared with SMOTE (Synthetic Minority Oversampling Technique). The high-dimensionality problem is addressed by selecting relevant features with the Ant Colony Optimization (ACO) technique. We used nine different classifiers to analyze six open-source software defect datasets from the PROMISE repository and evaluated them with seven performance measures. The results show that the proposed CNNCIR method with ACO-based feature selection outperforms SMOTE in the majority of cases.
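CNNCIR itself is not described here in enough detail to reproduce, but its SMOTE baseline is standard: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. A minimal sketch, assuming the base sample is drawn at random on each step (the original SMOTE instead loops over every minority sample); the function name is hypothetical and at least two minority samples are required.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself)
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(x, minority[j]))[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE only fills in the minority region; it does not use the majority-class distribution, which is one of the dataset characteristics the abstract says CNNCIR takes into account.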

Machine Learning-based Classification of Hyperspectral Imagery

  • Haq, Mohd Anul;Rehman, Ziaur;Ahmed, Ahsan;Khan, Mohd Abdul Rahim
    • International Journal of Computer Science & Network Security / Vol. 22, No. 4 / pp. 193-202 / 2022
  • The classification of hyperspectral imagery (HSI) is essential in Earth surface observation. Because HSI data contain a large number of contiguous spectral bands, they provide rich information about the object of study; however, they suffer from the curse of dimensionality. Dimensionality reduction is an essential aspect of machine learning classification. Algorithms based on feature extraction can overcome the dimensionality issue, allowing classifiers to use comprehensive models while reducing computational costs. This paper assesses and compares three HSI classification techniques: the first is based on the Joint Spatial-Spectral Stacked Autoencoder (JSSSA) method, the second on a shallow Artificial Neural Network (SNN), and the third on a support vector machine (SVM) model. In terms of overall accuracy and Kappa coefficient, the JSSSA-based method outperforms the SNN technique, with an overall accuracy of 96.13% and a Kappa coefficient of 0.95; SNN also achieved a good accuracy of 92.40% and a Kappa coefficient of 0.90, and SVM achieved an accuracy of 82.87%. The study suggests that both JSSSA- and SNN-based techniques are efficient methods for hyperspectral classification of snow features. This work classified labeled (ground-truth) snow datasets into multiple classes; these labeled data can be valuable for applying deep neural networks such as CNNs, hybrid CNNs, and RNNs in glaciology and snow-related hazard applications.
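The two accuracy measures reported above can be computed from a confusion matrix as below. Overall accuracy is the trace divided by the total count, and Cohen's kappa corrects it for chance agreement, $\kappa = (p_o - p_e)/(1 - p_e)$. This is a generic sketch with hypothetical function names; the layout (rows = reference classes, columns = predicted classes) is an assumption, although kappa gives the same value for the transposed matrix.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohens_kappa(cm):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = sum(map(sum, cm))
    p_o = sum(cm[i][i] for i in range(len(cm))) / n
    row_totals = [sum(r) for r in cm]
    col_totals = [sum(c) for c in zip(*cm)]
    # Expected agreement if predictions were independent of the reference labels
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, a two-class matrix `[[45, 5], [10, 40]]` has an overall accuracy of 0.85 but a kappa of 0.7, since half of the agreement would be expected by chance.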

Agglomerative Hierarchical Clustering Analysis with Deep Convolutional Autoencoders

  • 박노진;고한석
    • Journal of Korea Multimedia Society / Vol. 23, No. 1 / pp. 1-7 / 2020
  • Clustering methods essentially take a two-step approach: extracting feature vectors for dimensionality reduction and then applying a clustering algorithm to the extracted feature vectors. However, for clustering images, traditional methods such as stacked auto-encoder-based k-means are not effective because they tend to ignore local information. In this paper, we propose a method that first effectively reduces data dimensionality using a convolutional auto-encoder, which captures and reflects local information, and then accurately clusters similar data samples using a hierarchical clustering approach. The experimental results confirm that the proposed model improves clustering results in terms of clustering accuracy and normalized mutual information.
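The second stage, agglomerative hierarchical clustering of the learned feature vectors, can be sketched as follows. The convolutional auto-encoder is omitted and the inputs are assumed to already be its low-dimensional codes; average linkage by default (single linkage optional) is an assumption, since the abstract does not name the linkage criterion. The function name is hypothetical, and the naive pairwise search is far slower than library implementations, so it is only meant for small inputs.

```python
import math


def agglomerative(points, n_clusters, linkage="average"):
    """Bottom-up merging of clusters until n_clusters remain; returns one label per point."""
    if linkage not in ("average", "single"):
        raise ValueError("linkage must be 'average' or 'single'")

    def cluster_distance(a, b):
        ds = [math.dist(points[i], points[j]) for i in a for j in b]
        return sum(ds) / len(ds) if linkage == "average" else min(ds)

    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair; j > i, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels
```

Unlike k-means, the merge order yields a full hierarchy, so the number of clusters can be chosen after the fact by stopping the merging at a different level.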

Clustering Performance Analysis for Time Series Data: Wavelet vs. Autoencoder

  • 황우성;임효상
    • Proceedings of the Korea Information Processing Society Conference / KIPS 2018 Fall Conference / pp. 585-588 / 2018
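  • When features of time series data are extracted and analyzed, the high dimensionality of the data makes it difficult to find useful information because of the curse of dimensionality. Dimensionality reduction techniques are widely used to address this problem, but the dilution of information during reduction changes the performance of tasks such as clustering on the time series data. To observe this effect, this paper compresses the dimensionality of time series data using the discrete wavelet transform (DWT) and an autoencoder as dimensionality reduction techniques, then applies the K-means algorithm to the compressed data to compare clustering efficiency. The comparison shows that both methods achieve good clustering performance, provided that an appropriate number of compressed dimensions is chosen for the DWT and that the autoencoder is sufficiently trained on the time series data.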
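The DWT branch of this comparison can be sketched with the Haar wavelet: each level keeps only the approximation coefficients, $a_i = (x_{2i} + x_{2i+1})/\sqrt{2}$, halving the series length, and the compressed vectors are then clustered with Lloyd's k-means. The Haar wavelet, initializing the centres with the first k series, and the helper names are assumptions, since the abstract does not state which wavelet or initialization was used; the autoencoder branch is omitted.

```python
import math


def haar_approx(series, levels):
    """Keep only Haar approximation coefficients; each level halves the length."""
    a = list(series)
    for _ in range(levels):
        if len(a) % 2:
            raise ValueError("series length must be divisible by 2**levels")
        a = [(a[2 * i] + a[2 * i + 1]) / math.sqrt(2) for i in range(len(a) // 2)]
    return a


def kmeans(X, k, iters=100):
    """Lloyd's algorithm; the first k vectors serve as initial centres."""
    centers = [list(x) for x in X[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in X]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The `levels` parameter plays the role of the compressed dimension count that the abstract identifies as decisive for DWT: each extra level halves the vector length and discards more detail.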