• Title/Summary/Keyword: 고차원 데이터 (high-dimensional data)

Search results: 252

Design of Pedestrian Detection System Based on Optimized pRBFNNs Pattern Classifier Using HOG Features and PCA (PCA와 HOG특징을 이용한 최적의 pRBFNNs 패턴분류기 기반 보행자 검출 시스템의 설계)

  • Lim, Myeoung-Ho;Park, Chan-Jun;Oh, Sung-Kwun;Kim, Jin-Yul
    • Proceedings of the KIEE Conference / 2015.07a / pp.1345-1346 / 2015
  • This paper proposes the design of a pedestrian detection system that extracts HOG-PCA features from pedestrian and background images and detects pedestrians using a polynomial-based RBFNNs (Radial Basis Function Neural Networks) pattern classifier together with an optimization algorithm. To detect pedestrians in an input image, features are first extracted in a preprocessing step with the HOG (Histogram of Oriented Gradients) algorithm. Because the extracted features are high-dimensional, classification with the pattern classifier involves heavy computation and long processing time. To alleviate this, PCA (Principal Component Analysis) is used to reduce the features to a lower dimension. For efficient learning of the pRBFNNs pattern classifier, the proposed classifier optimizes its structure and parameters with PSO (Particle Swarm Optimization), improving model performance. As data, 1,200 pedestrian and 1,200 background images from the INRIA2005_person data set, which is widely used for pedestrian detection, are organized into training and validation data to design the classifier; the designed optimal classifier is then applied to test images to detect pedestrians and measure the detection rate.

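Below is a minimal sketch of the HOG-to-PCA preprocessing step described in the abstract above, using scikit-image and scikit-learn; the window size, HOG parameters, and number of principal components are illustrative assumptions, and the authors' pRBFNNs/PSO classifier is not reproduced here.

```python
# Minimal sketch: HOG feature extraction followed by PCA dimensionality reduction.
# Window size (64x128), HOG parameters, and n_components are illustrative assumptions.
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA

def extract_hog(images):
    """Compute a HOG descriptor for each 64x128 grayscale window."""
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2), block_norm='L2-Hys')
        for img in images
    ])

# images: array of 64x128 grayscale pedestrian/background windows (placeholder data here)
rng = np.random.default_rng(0)
images = rng.random((100, 128, 64))

X = extract_hog(images)              # high-dimensional HOG features (3780-D per window)
pca = PCA(n_components=50)           # project to a low-dimensional space
X_reduced = pca.fit_transform(X)     # input to a pattern classifier (e.g., an RBF network)
print(X.shape, '->', X_reduced.shape)
```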

A Study on a Self-Organizing Map Based Method for Anomaly Detection and Classification of Semiconductor Process Signals (반도체 공정 신호의 이상탐지 및 분류를 위한 자기구상지도 기반 기법에 관한 연구)

  • Yun, Jae-Jun;Park, Jeong-Sul;Baek, Jun-Geol
    • Proceedings of the Korean Vacuum Society Conference / 2011.02a / pp.36-36 / 2011
  • Semiconductor process signals are divided into periodic and aperiodic signals. For periodic signals with specific patterns, research has been conducted on managing the corresponding parameters through pattern matching. For aperiodic signal data, however, pattern matching cannot be applied. In addition, because both types of data obtained from semiconductor processes involve a vast number of parameters, the current practice of building a separate control chart for each individual parameter wastes considerable cost and time. An analysis technique is therefore needed that can monitor multiple parameters of both data types simultaneously and account for the correlations inherent among the parameters. Previous work on anomaly detection for periodic signals has proposed dividing a signal into segments and applying SPC charts to each segment, or treating the value measured at each time point as a variable and applying multivariate statistical methods such as Hotelling's T-square, PCA, and PLS. These methods have many limitations in analyzing periodic signals with diverse characteristics and detecting anomalies. This paper therefore proposes a method that reflects the characteristics of signals with various shapes, using a self-organizing map to classify signals and detect process anomalies. The proposed method extracts signal features by mapping complex (high-dimensional, time-series) signals onto nodes of a two-dimensional map, and then applies logistic regression to the newly represented features to detect anomalies. The anomaly detection performance of the proposed method was evaluated using semiconductor process signals containing various abnormal situations.

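A rough sketch of the SOM-plus-logistic-regression idea from the abstract above, using the MiniSom package and scikit-learn; the map size, training length, and the choice of winning-node coordinates as features are assumptions rather than the paper's exact formulation.

```python
# Sketch: map high-dimensional process signals onto a 2-D SOM grid and use the
# winning-node coordinates as features for a logistic-regression anomaly detector.
# Map size, iteration count, and feature encoding are illustrative assumptions.
import numpy as np
from minisom import MiniSom
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
signals = rng.normal(size=(200, 100))    # 200 signals, 100 time points each (placeholder)
labels = rng.integers(0, 2, size=200)    # 0 = normal, 1 = abnormal (placeholder)

som = MiniSom(10, 10, input_len=100, sigma=1.0, learning_rate=0.5, random_seed=1)
som.train_random(signals, 1000)          # unsupervised mapping of signals to 2-D nodes

# Represent each signal by its best-matching-unit coordinates on the map.
features = np.array([som.winner(s) for s in signals], dtype=float)

clf = LogisticRegression().fit(features, labels)
print('training accuracy:', clf.score(features, labels))
```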

Forecasting Electric Power Demand Using Census Information and Electric Power Load (센서스 정보 및 전력 부하를 활용한 전력 수요 예측)

  • Lee, Heon Gyu;Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / v.18 no.3 / pp.35-46 / 2013
  • In order to develop an accurate analytical model for domestic electricity demand forecasting, we propose a method for predicting electric power demand patterns by combining the SMO classification technique with a dimension-reducing subspace clustering technique suited to high-dimensional cluster analysis. For demand pattern prediction, hourly electricity load patterns and demographic and geographic characteristics are analyzed by integrating wireless load-monitoring data with census information at the sub-regional level. Using census information and power load data from the Seoul metropolitan area, the prediction results for sub-regional demand patterns comprise a total of 18 characteristic clusters, and the prediction accuracy of the power demand patterns was approximately 85%.
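
A hedged sketch of the overall pipeline described above: cluster hourly load patterns into characteristic groups, then predict the cluster from census features. KMeans stands in for the subspace clustering step, and scikit-learn's SVC (whose libsvm solver is an SMO variant) stands in for the SMO classifier; the data shapes and settings are placeholders.

```python
# Sketch: cluster hourly load patterns, then train an SVM (SMO-style solver) that
# predicts the cluster label from census-derived features. KMeans stands in for the
# subspace clustering step; 18 clusters follows the abstract, everything else is assumed.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
load_patterns = rng.random((500, 24))    # 24-hour load profiles per sub-region (placeholder)
census_features = rng.random((500, 10))  # demographic/geographic attributes (placeholder)

# Step 1: group sub-regions into characteristic demand-pattern clusters.
pattern_labels = KMeans(n_clusters=18, n_init=10, random_state=0).fit_predict(load_patterns)

# Step 2: learn to predict the demand-pattern cluster from census information.
clf = SVC(kernel='rbf').fit(census_features, pattern_labels)
print('training accuracy:', clf.score(census_features, pattern_labels))
```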

An Improvement of FSDD for Evaluating Multi-Dimensional Data (다차원 데이터 평가가 가능한 개선된 FSDD 연구)

  • Oh, Se-jong
    • Journal of Digital Convergence / v.15 no.1 / pp.247-253 / 2017
  • Feature selection (variable selection) is a data mining scheme for selecting, from high-dimensional data, the features most relevant to the target concept. It decreases the dimensionality of the data and makes cluster analysis and classification easier. A feature selection scheme requires an evaluation function. Most current evaluation functions are based on statistics or information theory, and they can evaluate only a single feature (one-dimensional data) at a time. However, features interact with one another, so efficient feature selection requires an evaluation function for multi-dimensional data. In this study, we propose a modification of the FSDD evaluation function that evaluates multiple features jointly by using an extended distance function; the original FSDD can evaluate only single features. The proposed approach is also expected to be applicable to other single-feature evaluation methods.
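
The multi-feature evaluation idea can be illustrated with a distance-discriminant-style score computed jointly over a feature subset; the specific formula below is an assumption in the spirit of FSDD, not the paper's extended evaluation function.

```python
# Sketch of a distance-discriminant style score generalized to a feature subset:
# large distance between class centroids and small within-class scatter -> high score.
# The specific formula is an illustrative assumption, not the paper's extended FSDD.
import numpy as np

def subset_score(X, y, features):
    """Evaluate a set of feature indices jointly instead of one at a time."""
    Xs = X[:, features]
    classes = np.unique(y)
    overall = Xs.mean(axis=0)
    between, within = 0.0, 0.0
    for c in classes:
        Xc = Xs[y == c]
        p = len(Xc) / len(Xs)
        between += p * np.sum((Xc.mean(axis=0) - overall) ** 2)
        within += p * Xc.var(axis=0).sum()
    return between - within        # between-class separation minus within-class scatter

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)
X[y == 1, :2] += 2.0               # make features 0 and 1 jointly informative
print(subset_score(X, y, [0, 1]), subset_score(X, y, [3, 4]))
```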

A Big Data Analysis by Between-Cluster Information using k-Modes Clustering Algorithm (k-Modes 분할 알고리즘에 의한 군집의 상관정보 기반 빅데이터 분석)

  • Park, In-Kyoo
    • Journal of Digital Convergence / v.13 no.11 / pp.157-164 / 2015
  • This paper describes subspace clustering of categorical data for convergence and integration. Because conventional evaluation measures are designed for numerical data, they suffer from limitations when applied to categorical data owing to the absence of ordering, high dimensionality, and sparse frequencies. Hence, a conditional entropy measure is proposed to evaluate the cohesion among attributes within each cluster. We propose a new objective function that reflects the desired clustering, so that within-cluster dispersion is minimized and between-cluster separation is enhanced. We performed experiments on five real-world datasets, comparing the performance of our algorithm with four other algorithms using three evaluation metrics: accuracy, F-measure, and adjusted Rand index. According to the experiments, the proposed algorithm outperforms the compared algorithms on the considered metrics.
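
A small sketch of the k-modes-plus-conditional-entropy idea using the kmodes package; the cluster count, data, and the exact entropy-based cohesion measure are illustrative assumptions, not the paper's objective function.

```python
# Sketch: cluster categorical data with k-modes, then measure within-cluster cohesion
# of each attribute via conditional entropy H(attribute | cluster). Lower entropy means
# more homogeneous clusters. This is a generic illustration, not the paper's objective.
import numpy as np
from kmodes.kmodes import KModes

rng = np.random.default_rng(0)
data = rng.choice(['a', 'b', 'c'], size=(300, 4))   # categorical records (placeholder)

km = KModes(n_clusters=3, init='Huang', n_init=5, random_state=0)
clusters = km.fit_predict(data)

def conditional_entropy(column, clusters):
    """H(attribute | cluster) in bits, averaged over clusters by their size."""
    h = 0.0
    for c in np.unique(clusters):
        values = column[clusters == c]
        _, counts = np.unique(values, return_counts=True)
        p = counts / counts.sum()
        h += (len(values) / len(column)) * -(p * np.log2(p)).sum()
    return h

for j in range(data.shape[1]):
    print(f'attribute {j}: H(attr | cluster) = {conditional_entropy(data[:, j], clusters):.3f}')
```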

Sort-Based Distributed Parallel Data Cube Computation Algorithm using MapReduce (맵리듀스를 이용한 정렬 기반의 데이터 큐브 분산 병렬 계산 알고리즘)

  • Lee, Suan;Kim, Jinho
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.9 / pp.196-204 / 2012
  • Recently, many applications perform OLAP (On-Line Analytical Processing) over very large volumes of data. The multidimensional data cube is regarded as a core tool in OLAP analysis. This paper focuses on how to efficiently compute data cubes in parallel using MapReduce, a popular parallel processing framework. We investigate efficient ways to implement the PipeSort algorithm, a well-known data cube computation method, on the MapReduce framework. PipeSort executes several (descendant) cuboids that share the same sorting order as a pipeline, scanning one (ancestor) cuboid only once. This paper proposes four ways of implementing the PipeSort pipeline on a MapReduce framework running across 20 servers. Our experiments show that the PipeMap-NoReduce algorithm outperforms the other algorithms for high-dimensional data, whereas Post-Pipe stands out above the others for low-dimensional data.
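
The pipeline idea at the heart of PipeSort, computing all prefix cuboids of one sort order in a single scan, can be illustrated without Hadoop. In the paper each such pipeline would be one MapReduce job; the sketch below is a single-process illustration with assumed column names and measure.

```python
# Sketch of one PipeSort pipeline: scan the sorted base relation once and emit the
# aggregates of every prefix cuboid (ABC, AB, A, and the all-cuboid) in the same pass.
from collections import defaultdict

rows = [                                     # (A, B, C, measure) -- placeholder data
    ('a1', 'b1', 'c1', 3), ('a1', 'b1', 'c2', 5),
    ('a1', 'b2', 'c1', 2), ('a2', 'b1', 'c1', 7),
]
rows.sort(key=lambda r: r[:3])               # one sort shared by the whole pipeline

pipeline = [('A', 'B', 'C'), ('A', 'B'), ('A',), ()]   # prefix cuboids of the sort order
cuboids = {dims: defaultdict(int) for dims in pipeline}

for a, b, c, m in rows:                      # single scan of the ancestor cuboid
    key = (a, b, c)
    for dims in pipeline:
        cuboids[dims][key[:len(dims)]] += m  # every prefix aggregate updated in the same pass

for dims in pipeline:
    print(dims or ('all',), dict(cuboids[dims]))
```

Because every cuboid in the pipeline is a prefix of the same sort order, one pass over the sorted ancestor suffices, which is the property the MapReduce implementations in the paper exploit.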

Interface Design for Performance Improvment of CORBA Services (CORBA 서비스의 성능 향상을 위한 인터페이스 설계)

  • Kim, Sang-Ho;Chi, Jeong-Hee;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.11D no.1 / pp.203-210 / 2004
  • Recent research on geographic information systems has focused on open architecture, interoperability, and extensibility in system design. To process huge, multidimensional, and heterogeneously formatted geographic data, this research has moved away from monolithic systems toward open systems that separate the application layer and the user interface layer. Many developers now use CORBA middleware for OGIS systems. However, CORBA middleware supports only point-to-point communication and takes a long time to transfer server data to a client. In this paper, we therefore propose and implement a more efficient communication method that modifies the point-to-point scheme. The proposed method, which transfers data from server to client in a single connection, supports group communication and reduces transfer delay.
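
The performance issue described above, many small point-to-point transfers versus one transfer per connection, can be illustrated generically; the sketch below is plain Python with simulated latency, not CORBA/OGIS code, and the class name, latency figure, and record count are assumptions.

```python
# Generic sketch of the interface-design idea: replace many point-to-point calls
# (one per record) with a single call that transfers the whole result set in one
# connection. Latency figures and class names are illustrative assumptions only.
import time

SIMULATED_ROUND_TRIP = 0.001     # assumed per-call network latency (seconds)

class MapServer:
    """Stand-in for a remote geographic-data server (not actual CORBA code)."""
    def __init__(self, records):
        self._records = records

    def get_feature(self, i):            # point-to-point style: one record per call
        time.sleep(SIMULATED_ROUND_TRIP)
        return self._records[i]

    def get_features(self, indices):     # proposed style: one connection, whole group
        time.sleep(SIMULATED_ROUND_TRIP)
        return [self._records[i] for i in indices]

server = MapServer(records=list(range(500)))

t0 = time.perf_counter()
one_by_one = [server.get_feature(i) for i in range(500)]
t1 = time.perf_counter()
batched = server.get_features(range(500))
t2 = time.perf_counter()

print(f'per-record calls: {t1 - t0:.2f}s, single batched call: {t2 - t1:.4f}s')
```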

Model selection via Bayesian information criterion for divide-and-conquer penalized quantile regression (베이즈 정보 기준을 활용한 분할-정복 벌점화 분위수 회귀)

  • Kang, Jongkyeong;Han, Seokwon;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.217-227 / 2022
  • Quantile regression is widely used in many fields because it provides an efficient tool for examining complex information latent in variables. However, modern large-scale, high-dimensional data make it very difficult to estimate quantile regression models owing to limitations in computation time and storage space. Divide-and-conquer is a technique that divides the entire dataset into several sub-datasets that are easy to compute and then reconstructs estimates for the entire dataset using only summary statistics from each sub-dataset. In this paper, we study a variable selection method based on the Bayesian information criterion, obtained by applying the divide-and-conquer technique to penalized quantile regression. When the number of sub-datasets is chosen properly, the proposed method is efficient in terms of computational speed and provides variable selection results as consistent as those of classical quantile regression estimates computed on the entire dataset. The advantages of the proposed method were confirmed through simulated data and real data analysis.
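
A minimal sketch of the divide-and-conquer idea using scikit-learn's L1-penalized QuantileRegressor: fit each sub-dataset separately, average the coefficients, and choose the penalty level with a BIC-type criterion based on the check loss. The simple averaging rule and the criterion's exact form are assumptions, not the paper's estimator.

```python
# Sketch: divide-and-conquer penalized quantile regression with a BIC-type criterion.
# Coefficients of the L1-penalized fits on each sub-dataset are averaged, and the
# penalty level is chosen by a check-loss-based BIC (assumed form).
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, p, tau = 2000, 10, 0.5
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.5, 1.0] + [0.0] * (p - 3))          # sparse truth
y = X @ beta + rng.standard_t(df=3, size=n)

def check_loss(res, tau):
    return np.mean(np.where(res >= 0, tau * res, (tau - 1) * res))

def dc_fit(X, y, alpha, n_splits=4):
    """Average L1-penalized quantile-regression coefficients over sub-datasets."""
    coefs = []
    for Xs, ys in zip(np.array_split(X, n_splits), np.array_split(y, n_splits)):
        m = QuantileRegressor(quantile=tau, alpha=alpha, solver='highs').fit(Xs, ys)
        coefs.append(np.r_[m.intercept_, m.coef_])
    return np.mean(coefs, axis=0)

best = None
for alpha in [0.001, 0.01, 0.05, 0.1]:
    b = dc_fit(X, y, alpha)
    res = y - (b[0] + X @ b[1:])
    df = np.sum(np.abs(b[1:]) > 1e-8)                        # number of selected variables
    bic = n * np.log(check_loss(res, tau)) + df * np.log(n)  # BIC-type criterion (assumed)
    if best is None or bic < best[0]:
        best = (bic, alpha, b)

print('selected alpha:', best[1],
      'nonzero coefficients:', np.flatnonzero(np.abs(best[2][1:]) > 1e-8))
```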

View-Invariant Body Pose Estimation based on Biased Manifold Learning (편향된 다양체 학습 기반 시점 변화에 강인한 인체 포즈 추정)

  • Hur, Dong-Cheol;Lee, Seong-Whan
    • Journal of KIISE:Software and Applications / v.36 no.11 / pp.960-966 / 2009
  • A manifold is used to represent relationships between high-dimensional data samples in a low-dimensional space. In human pose estimation, manifolds are created in low-dimensional space for processing image data and 3D body configuration data. Manifold learning builds such a manifold, but it is vulnerable to silhouette variations, which arise from changes in viewpoint, person, and distance, as well as from noise; representing all silhouette variations in a single manifold is impossible. In this paper, we focus on the silhouette variation problem caused by viewpoint change. Previous view-invariant pose estimation methods based on manifold learning have taken two approaches: modeling a separate manifold for every viewpoint, or extracting view factors from the mapping functions. However, because of unsupervised learning, these methods do not support a one-to-one mapping between silhouettes and the corresponding body configurations, and modeling the manifolds or extracting the view factors is very complex. We therefore propose a method based on three manifolds: a view manifold, a pose manifold, and a body configuration manifold. To build these manifolds, we employ biased manifold learning, and we then learn mapping functions among the spaces (2D image space, pose manifold space, view manifold space, body configuration manifold space, and 3D body configuration space). In our experiments, we were able to estimate various body poses from 24 viewpoints.
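
The general pipeline, embedding silhouette features on a low-dimensional manifold and learning a mapping into body-configuration space, can be sketched with standard tools; Isomap and kernel ridge regression below stand in for the paper's biased manifold learning and mapping functions, and all data shapes are placeholders.

```python
# Sketch of the generic manifold-based pose-estimation pipeline: embed silhouette
# feature vectors in a low-dimensional space, then learn a mapping from that space
# to 3-D body-configuration parameters. Isomap and kernel ridge regression stand in
# for the paper's biased manifold learning and mapping functions (assumptions).
import numpy as np
from sklearn.manifold import Isomap
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
silhouettes = rng.random((300, 400))   # e.g., 20x20 silhouettes, flattened (placeholder)
poses = rng.random((300, 30))          # e.g., 10 joints x 3 angles per frame (placeholder)

embedding = Isomap(n_neighbors=10, n_components=3).fit_transform(silhouettes)

# Mapping from manifold space to body-configuration space.
to_pose = KernelRidge(kernel='rbf', alpha=1.0).fit(embedding, poses)

pred = to_pose.predict(embedding[:5])
print(pred.shape)                      # (5, 30) estimated body configurations
```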

Efficient Multi-Step k-NN Search Methods Using Multidimensional Indexes in Large Databases (대용량 데이터베이스에서 다차원 인덱스를 사용한 효율적인 다단계 k-NN 검색)

  • Lee, Sanghun;Kim, Bum-Soo;Choi, Mi-Jung;Moon, Yang-Sae
    • Journal of KIISE / v.42 no.2 / pp.242-254 / 2015
  • In this paper, we address the problem of improving the performance of multi-step k-NN search using multidimensional indexes. Due to the information loss caused by lower-dimensional transformations, existing multi-step k-NN search solutions produce a large tolerance (i.e., a large search range) and thus incur a large number of candidates, which are retrieved by a range query. These many candidates lead to overwhelming I/O and CPU overheads in the postprocessing step. To overcome this problem, we propose two efficient solutions that improve search performance by reducing the tolerance of the range query and, accordingly, the number of candidates. First, we propose a tolerance-reduction-based (approximate) solution that forcibly decreases the tolerance, determined by a k-NN query on the index, by the average ratio of high- to low-dimensional distances. Second, we propose a coefficient-control-based (exact) solution that uses c·k instead of k in the k-NN query to obtain a tighter tolerance and performs the range query using this tighter tolerance. Experimental results show that the proposed solutions significantly reduce the number of candidates and, accordingly, improve search performance in comparison with the existing multi-step k-NN solution.
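
The filter-and-refine scheme with coefficient control (using c·k instead of k on the index to obtain a tighter tolerance) can be sketched as follows; PCA plays the role of the lower-dimensional transform, brute-force scans stand in for the multidimensional index, and the value of c is an assumption.

```python
# Sketch of multi-step k-NN with coefficient control: run a (c*k)-NN query in the
# lower-dimensional space, take the k-th smallest *original-space* distance among those
# candidates as the tolerance, range-query the reduced space with that tolerance,
# and refine the candidates in the original space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.random((5000, 64))            # high-dimensional database (placeholder)
query = rng.random(64)
k, c = 10, 3                             # c is the coefficient-control parameter (assumed)

pca = PCA(n_components=8).fit(data)
low_data, low_q = pca.transform(data), pca.transform(query[None])[0]

low_d = np.linalg.norm(low_data - low_q, axis=1)            # distances on the "index"
cand = np.argsort(low_d)[: c * k]                           # step 1: (c*k)-NN on the index
high_d = np.linalg.norm(data[cand] - query, axis=1)
tol = np.sort(high_d)[k - 1]                                # tighter tolerance (upper bound)

range_cand = np.flatnonzero(low_d <= tol)                   # step 2: range query with tolerance
final_d = np.linalg.norm(data[range_cand] - query, axis=1)  # step 3: refine in original space
knn = range_cand[np.argsort(final_d)[:k]]

print('candidates examined:', len(range_cand), 'k-NN ids:', knn)
```

Because an orthogonal projection such as PCA never overestimates the true Euclidean distance, every true k-nearest neighbor falls within the tolerance obtained this way, so the coefficient-control variant remains exact while shrinking the candidate set.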