• Title/Abstract/Keywords: high-dimensional data

IoT Attack Detection Using PCA and Machine Learning (주성분 분석과 기계학습을 이용한 사물인터넷 공격 탐지)

  • Lee, Ji-Gu;Lee, Soo-Jin
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.245-246 / 2022
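  • Research on machine-learning-based attack detection models for IoT environments has been active recently, and detection accuracy has steadily improved. However, characteristics of IoT environments such as low-specification hardware, high-dimensional features, and massive traffic degrade detection performance. In this paper, we therefore apply Principal Component Analysis (PCA) and LightGBM to a dataset collected in an IoT environment based on the MQTT (Message Queuing Telemetry Transport) protocol, reducing the dimensionality of the dataset and classifying the attack classes. In the experiments, even after the original dataset was reduced to 3 principal components (about 9% of the original dimensions), performance was nearly the same as with all 33 features. The classification performance was also better than that of a detection model from prior work based on feature selection.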

Design of Digit Recognition System Realized with the Aid of Fuzzy RBFNNs and Incremental-PCA (퍼지 RBFNNs와 증분형 주성분 분석법으로 실현된 숫자 인식 시스템의 설계)

  • Kim, Bong-Youn;Oh, Sung-Kwun;Kim, Jin-Yul
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.1 / pp.56-63 / 2016
  • In this study, we introduce the design of a digit recognition system based on Fuzzy RBFNNs and incremental PCA for recognizing handwritten digits. Principal Component Analysis (PCA) is a widely adopted dimensionality reduction algorithm, but it incurs high computing overhead for feature extraction when the images are high-dimensional or the training set is large. To alleviate this problem, incremental PCA is proposed for computationally efficient processing as well as incremental learning of high-dimensional data in the feature extraction stage. The architecture of Fuzzy Radial Basis Function Neural Networks (RBFNNs) consists of three functional modules: the condition, conclusion, and inference parts. In the condition part, the input space is partitioned by fuzzy clustering realized with the Fuzzy C-Means (FCM) algorithm, which is used instead of a Gaussian function to reflect the characteristics of the input data. In the conclusion part, the connection weights are extended to polynomial forms such as constant, linear, quadratic, and modified quadratic. Experimental results on the benchmark MNIST handwritten digit database demonstrate the effectiveness and efficiency of the proposed digit recognition system compared with other studies.
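
The abstract above says that FCM clustering takes the place of Gaussian functions in the condition part. As a rough illustration only, the standard FCM membership formula can be sketched as below. The fuzzifier `m = 2`, the function name `fcm_memberships`, and the sample centers are assumptions made for this sketch and are not taken from the paper.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy membership of `point` in each cluster center.

    Standard FCM formula: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i.
    The fuzzifier m (assumed 2.0 here) must be greater than 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical two-cluster example: the point is three times closer to the first center.
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships((1.0, 0.0), centers)
```

The memberships always sum to 1, so each one can serve as the activation level of the corresponding hidden node in place of a Gaussian response.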

Design of an Efficient Parallel High-Dimensional Index Structure (효율적인 병렬 고차원 색인구조 설계)

  • Park, Chun-Seo;Song, Seok-Il;Sin, Jae-Ryong;Yu, Jae-Su
    • Journal of KIISE:Databases / v.29 no.1 / pp.58-71 / 2002
  • Generally, multi-dimensional data such as image and spatial data require a large amount of storage space. There is a limit to how much of such data a single workstation can store and manage. Managing the data in a parallel computing environment, which is being actively researched, can greatly improve performance. In this paper, we propose a parallel high-dimensional index structure that exploits the parallelism of the parallel computing environment. The proposed index structure is an nP (processor)-n$\times$mD (disk) architecture, a hybrid of nP-nD and 1P-nD. Its node structure increases fan-out and reduces the height of the index tree. A range search algorithm that maximizes I/O parallelism is also devised and applied to k-nearest neighbor queries. Various experiments show that the proposed method outperforms other parallel index structures.

Implementation of a Face Authentication Embedded System Using High-dimensional Local Binary Pattern Descriptor and Joint Bayesian Algorithm (고차원 국부이진패턴과 결합베이시안 알고리즘을 이용한 얼굴인증 임베디드 시스템 구현)

  • Kim, Dongju;Lee, Seungik;Kang, Seog Geun
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.9 / pp.1674-1680 / 2017
  • In this paper, an embedded system for face authentication that uses a high-dimensional local binary pattern (LBP) descriptor and the joint Bayesian algorithm is proposed. We also present a feasible embedded implementation of the proposed algorithm on a Raspberry Pi 3 Model B. Computer simulation for performance evaluation of the face authentication algorithm is carried out using a face database of 500 persons. The face data of each person consist of 2 images, one for training and the other for testing. As performance measures, we use the score distribution and the face authentication time with respect to the number of principal component analysis (PCA) dimensions. The results confirm that an embedded system with good face authentication performance can be implemented at relatively low cost in an optimized embedded environment.

A Case Study on Effect Analysis of Students' Engagement and Learning Outcomes in Higher Education (대학생의 학습참여가 학습성과에 미치는 영향에 대한 사례 연구)

  • Cho, Jin-Suk;Jeon, Young-Mee
    • The Journal of the Korea Contents Association / v.19 no.1 / pp.524-534 / 2019
  • This study analyzed students' engagement in the regular curriculum and the extra-curriculum and its effects on learning outcomes in higher education. Students' engagement was analyzed in terms of high-order learning, reflective and integrative learning, learning strategies, collaborative learning, discussions with diverse others, and high-impact activities. For this purpose, 392 students who took part in K-NSSE participated in the study. To analyze the data, frequency analysis, ANOVA, correlation analysis, and regression analysis were performed using the IBM SPSS 25.0 program. The following results were obtained. First, students' engagement was generally very low, especially in high-impact activities, which affect students' achievement. Compared with students in the college of humanities and social sciences, students in the engineering college showed very low engagement. Learning outcomes were influenced by high-impact activities, high-order learning, and discussions with diverse others. To reinforce students' engagement in the learning process, this study therefore proposes a curriculum-extracurriculum integrated system. To improve students' engagement, it also proposes developing and operating teaching and learning support programs that include high-impact activities, high-order learning, and discussions with diverse others.

A new classification method using penalized partial least squares (벌점 부분최소자승법을 이용한 분류방법)

  • Kim, Yun-Dae;Jun, Chi-Hyuck;Lee, Hye-Seon
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.931-940 / 2011
  • Classification generates a rule for assigning objects to several categories based on a learning sample. A good classification model should classify new objects with low misclassification error. Many types of classification methods have been developed, including logistic regression, discriminant analysis, and trees. This paper presents a new classification method using penalized partial least squares. Penalized partial least squares can make the model more robust and remedy the multicollinearity problem. This paper compares the proposed method with logistic regression and PCA-based discriminant analysis on real and artificial data. It is concluded that the new method has better power than the other methods.

Principal Component Analysis of Higher-Order Hyperedges in EEG Data (EEG 데이터의 고차원 하이퍼에지에서의 주성분 분석)

  • Kim, Joon-Shik;Lee, Chung-Yeon;Zhang, Byoung-Tak
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.414-416 / 2012
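  • Tensor analysis has been used as a higher-order principal component method. Studies have applied tensor analysis to electroencephalography (EEG) data and social network data to find principal components. However, tensor analysis is hard to understand intuitively and has some difficulty identifying important nodes. In this paper, we propose a new methodology that builds a two-dimensional matrix from higher-order hyperedges and uses principal component analysis to find important nodes. As data, we used EEG recorded during a Multimodal Memory Game (MMG). MMG is a memory retrieval game based on a TV drama. The Power Spectrum Density (PSD) of the beta wave is an indicator of the activity of the channel at each location. Based on random sampling, we computed the transition matrix among the channels in the top 50% by PSD. We then computed its eigenvalues and eigenvectors. The eigenvector of the largest eigenvalue represents the principal component, and each element of that eigenvector is a centrality that indicates importance. We obtained the 30 most important channels by centrality for each of three subjects and identified the channels common to all three.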
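
The last step of this abstract, taking the eigenvector of the largest eigenvalue as a centrality score and ranking channels by it, can be illustrated with plain power iteration. This is a minimal sketch under assumptions: the 3x3 matrix is made-up toy data rather than EEG, the function name `principal_eigenvector` is hypothetical, and the paper does not say which eigen-solver it used.

```python
def principal_eigenvector(matrix, iterations=200, tol=1e-12):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    Assumes a square, non-negative matrix with a single dominant eigenvalue.
    The result is scaled so that its entries sum to 1.
    """
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply the matrix by the current vector.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in w)
        w = [x / norm for x in w]
        # Stop once the vector no longer changes.
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v

# Toy transition counts among three hypothetical channels.
transitions = [
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
centrality = principal_eigenvector(transitions)

# Rank channels by centrality and keep the top k (the abstract uses the top 30).
k = 2
top_channels = sorted(range(len(centrality)), key=lambda i: centrality[i], reverse=True)[:k]
```

In this toy matrix, channels 0 and 1 are linked most strongly, so they receive equal and highest centrality, while channel 2 scores lower.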

Emotion Recognition and Expression using Facial Expression (얼굴표정을 이용한 감정인식 및 표현 기법)

  • Ju, Jong-Tae;Park, Gyeong-Jin;Go, Gwang-Eun;Yang, Hyeon-Chang;Sim, Gwi-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.04a / pp.295-298 / 2007
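  • In this paper, we extract and recognize features for four basic emotions (joy, sadness, anger, surprise) from human facial expressions and use the result to implement an emotion expression system. We first use Principal Component Analysis (PCA) to convert high-dimensional image feature data into low-dimensional feature data, then apply Linear Discriminant Analysis (LDA) to extract more efficient feature vectors and recognize the emotion. The recognized result is then applied to a facial expression system to express the emotion.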

An SVD-Based Approach for Generating High-Dimensional Data and Query Sets (SVD를 기반으로 한 고차원 데이터 및 질의 집합의 생성)

  • 김상욱
    • The Journal of Information Technology and Database / v.8 no.2 / pp.91-101 / 2001
  • Previous research on performance evaluation of multidimensional indexes has typically used synthetic data sets distributed uniformly or normally over multidimensional space. However, recent research has shown that these kinds of data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets to resolve this problem. We first identify the features of data and query sets that are appropriate for fairly evaluating the performance of multidimensional indexes, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies such features. HDDQ_Gen supports the following features: (1) clustered distributions, (2) various object distributions in each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distributions depending on data distributions. Using these features, users are able to control the distribution characteristics of data and query sets. Our contribution is important in that HDDQ_Gen provides a benchmark environment for evaluating multidimensional indexes correctly.
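
To make features (1) and (5) above concrete, the sketch below generates clustered points and then draws queries that follow the data distribution. It is not HDDQ_Gen's actual procedure: it does not use SVD, it does not model correlations among dimensions, and every name and parameter in it (`make_clustered_data`, `make_queries`, `spread`, `jitter`) is hypothetical.

```python
import random

def make_clustered_data(n_points, dim, n_clusters, spread=0.05, seed=0):
    """Generate points scattered with Gaussian noise around random cluster centers."""
    rng = random.Random(seed)
    centers = [[rng.random() for _ in range(dim)] for _ in range(n_clusters)]
    points = []
    for _ in range(n_points):
        center = rng.choice(centers)
        points.append([rng.gauss(x, spread) for x in center])
    return points

def make_queries(points, n_queries, jitter=0.01, seed=1):
    """Draw query points near randomly chosen data points.

    Queries therefore follow the data distribution instead of being uniform.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(x, jitter) for x in rng.choice(points)]
        for _ in range(n_queries)
    ]

data = make_clustered_data(n_points=1000, dim=16, n_clusters=5)
queries = make_queries(data, n_queries=10)
```

Fixed seeds keep the generated sets reproducible, so different index structures can be compared on identical data and queries.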

Design of Digits Recognition Method Based on pRBFNNs Using HOG Features (HOG 특징을 이용한 다항식 방사형 기저함수 신경회로망 기반 숫자 인식 방법의 설계)

  • Kim, Bong-Youn;Oh, Sung-Kwun
    • Proceedings of the KIEE Conference / 2015.07a / pp.1365-1366 / 2015
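  • This paper proposes the design of a digit recognition system based on polynomial radial basis function neural networks (pRBFNNs) using HOG features. The proposed system computes HOG features so that digits can be used as input data. Polynomial radial basis function neural networks are well suited to classifying classes with high-dimensional input-output data, and the initial values of the centers and spread constants of the activation functions are set by the Fuzzy C-Means (FCM) algorithm. In addition, Particle Swarm Optimization (PSO) is used to optimize the proposed classifier, and the performance of the optimized classifier is compared. For digit recognition, the performance of the classifier is evaluated and analyzed on the MNIST handwritten digit database, a publicly recognized benchmark.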