• Title/Summary/Keyword: 3-D Neural Network


A Design of Multilayer Perceptron for Camera Calibration

  • Do, Yong-Tae
    • Journal of Sensor Science and Technology
    • /
    • v.11 no.4
    • /
    • pp.239-246
    • /
    • 2002
  • In this paper, a new design of a multilayer perceptron (MLP) for camera calibration is proposed. Most existing techniques determine a transformation from 3D spatial points to their image points, and camera parameters are then estimated from that transformation. The technique proposed here, on the other hand, determines rays of sight uniquely from image points, and the parameters are estimated from this relationship using an MLP. With this approach, projection and back-projection can be done more straightforwardly. Because it is based on a geometric model, the network design process becomes less ambiguous, which is a clear merit compared with other neural network-based techniques. An MLP designed according to the proposed technique showed fast and stable learning in tests under various conditions.
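As a minimal illustration of the idea above, the following PyTorch sketch maps a 2D image point to a unit ray-of-sight direction with a small MLP. The layer sizes, the activation, and the class name RayMLP are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class RayMLP(nn.Module):
    """Maps a 2D image point (u, v) to a unit 3D ray-of-sight direction.

    A minimal sketch of the idea in the abstract; the hidden size and
    depth are illustrative assumptions, not the paper's design."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 3),
        )

    def forward(self, uv: torch.Tensor) -> torch.Tensor:
        ray = self.net(uv)
        # Normalize so the output is a direction, not an arbitrary vector.
        return ray / ray.norm(dim=-1, keepdim=True)

# Usage: train on (image point, known ray) pairs from a calibration target.
model = RayMLP()
rays = model(torch.tensor([[120.0, 85.0], [320.0, 240.0]]))
print(rays.shape)  # torch.Size([2, 3])
```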

Basic Implementation of Multi Input CNN for Face Recognition (얼굴인식을 위한 다중입력 CNN의 기본 구현)

  • Cheema, Usman;Moon, Seungbin
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.10a
    • /
    • pp.1002-1003
    • /
    • 2019
  • Face recognition is an extensively researched area of computer vision. Visible, infrared, thermal, and 3D modalities have been used to address various challenges of face recognition such as illumination, pose, expression, partial information, and disguise. In this paper, we present a multimodal approach to face recognition using convolutional neural networks. We use visible and thermal face images as two separate inputs to a multi-input deep learning network for face recognition. The experiments are performed on the IRIS visible and thermal face database, and high face verification rates are achieved.
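The abstract does not give the network's layout, but a minimal PyTorch sketch of a two-input CNN of this kind, with a visible and a thermal branch fused by concatenation, might look as follows. All channel counts, the fusion point, and the class name MultiInputCNN are assumptions.

```python
import torch
import torch.nn as nn

def branch(in_ch: int) -> nn.Sequential:
    """One convolutional branch; both modalities use the same layout here."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class MultiInputCNN(nn.Module):
    """Two-input CNN: separate branches for visible (RGB) and thermal
    (single-channel) images, fused by concatenation before the classifier."""
    def __init__(self, num_identities: int = 10):
        super().__init__()
        self.visible = branch(3)
        self.thermal = branch(1)
        self.classifier = nn.Linear(32 + 32, num_identities)

    def forward(self, vis, thr):
        fused = torch.cat([self.visible(vis), self.thermal(thr)], dim=1)
        return self.classifier(fused)

model = MultiInputCNN()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 10])
```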

Neural Network Design for Predicting Shear Modulus of Food Printability Enhancers (식품 인쇄 적성 증진제의 전단탄성률 예측 신경망 설계)

  • Yoo, Hyun-Ju;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.731-732
    • /
    • 2021
  • A printability enhancer is one of the factors that improve the printability of gel materials in food 3D printing. Printability enhancers are evaluated on the basis of the shear modulus, which indicates the degree of deformation that occurs under shear stress. However, because food raw materials are so diverse, measuring the shear modulus for each material takes considerable time and cost. This study therefore proposes a neural network design that predicts the shear modulus using an FCN and an RNN, aiming to reduce the time and cost of measuring the shear modulus of printability enhancers.
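Reading FCN here as a fully connected network, one hedged PyTorch sketch of such a design combines a fully connected encoder for static material attributes with a GRU over a measurement sequence for scalar regression. All feature dimensions and names are assumptions; the abstract does not specify them.

```python
import torch
import torch.nn as nn

class ShearModulusNet(nn.Module):
    """Hedged sketch: a fully connected encoder for static material
    attributes plus a GRU over a measurement sequence, combined for a
    scalar shear-modulus regression. Sizes are assumptions."""
    def __init__(self, n_static: int = 8, n_seq: int = 4, hidden: int = 32):
        super().__init__()
        self.fcn = nn.Sequential(nn.Linear(n_static, hidden), nn.ReLU())
        self.rnn = nn.GRU(n_seq, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, static_x, seq_x):
        _, h = self.rnn(seq_x)             # h: (1, batch, hidden)
        feats = torch.cat([self.fcn(static_x), h[0]], dim=1)
        return self.head(feats).squeeze(-1)

model = ShearModulusNet()
pred = model(torch.randn(2, 8), torch.randn(2, 20, 4))
print(pred.shape)  # torch.Size([2])
```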

Neural Network-Based Post Filtering of Atlas for Immersive Video Coding (몰입형 비디오 부호화를 위한 신경망 기반 아틀라스 후처리 필터링)

  • Lim, Sung-Gyun;Lee, Kun-Woo;Kim, Jeong-Woo;Yoon, Yong-Uk;Kim, Jae-Gon
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2022.06a
    • /
    • pp.239-241
    • /
    • 2022
  • The MIV (MPEG Immersive Video) standard efficiently compresses views captured at various positions in a limited 3D space, providing users with six-degrees-of-freedom (6DoF) immersion at arbitrary positions and orientations. TMIV (Test Model for Immersive Video), the reference software of MIV, removes the redundant regions among the multiple input views needed to provide this immersion, packs the remaining regions as patches into an atlas, and compresses and transmits it. Unlike ordinary video, atlas images contain many discontinuities, which significantly degrade coding efficiency. This paper presents a neural network-based post-filtering technique that reduces the coding loss of atlas images. Compared with the existing TMIV, the proposed technique improves the reconstructed quality of the atlas.
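The abstract does not describe the network, but a common shape for such a post-filter is a small residual CNN that predicts a correction added back to the decoded atlas frame. The following PyTorch sketch is an assumption-laden illustration of that pattern, not the authors' model.

```python
import torch
import torch.nn as nn

class AtlasPostFilter(nn.Module):
    """Hedged sketch of a residual CNN post-filter: it predicts a
    correction that is added back to the decoded atlas frame. Depth and
    channel counts are illustrative assumptions."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, decoded):
        # Residual learning: output = decoded frame + predicted correction.
        return decoded + self.body(decoded)

filt = AtlasPostFilter()
restored = filt(torch.randn(1, 3, 256, 256))
print(restored.shape)  # torch.Size([1, 3, 256, 256])
```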

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in the spotlight among various machine learning algorithms. In particular, the CNN (Convolutional Neural Network), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs in business problem solving. Specifically, this study proposes applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs have strength in interpreting images. Thus, the model proposed in this study adopts a CNN as a binary classifier that predicts the stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements to predict future price movements. Our proposed model, named CNN-FG (Convolutional Neural Network using Fluctuation Graph), consists of five steps. In the first step, it divides the dataset into intervals of 5 days; it then creates time series graphs for the divided dataset in step 2. Each graph is drawn in a 40 × 40 pixel image, with the graph of each independent variable drawn in a different color. In step 3, the model converts the images into matrices: each image is converted into a combination of three matrices expressing its color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation sets; we used 80% of the total dataset for training and the remaining 20% for validation. In the final step, CNN classifiers are trained on the images of the training dataset. Regarding the parameters of CNN-FG, we adopted two convolution filters (5 × 5 × 6 and 5 × 5 × 9) in the convolution layer. In the pooling layer, a 2 × 2 max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation function for the convolution and hidden layers was ReLU (Rectified Linear Unit), and that for the output layer was the softmax function. To validate CNN-FG, we applied it to the prediction of the KOSPI200 over 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two classes of the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset from 80% of the total dataset (1,560 samples) and the validation dataset from the remaining 20% (390 samples). The independent variables of the experimental dataset comprised twelve technical indicators popularly used in previous studies, including Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with those of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on those graphs can be effective in terms of prediction accuracy. Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
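The abstract is unusually specific about hyperparameters, so a PyTorch sketch of CNN-FG can be reconstructed fairly closely; only the ordering of the convolution and pooling stages is our assumption.

```python
import torch
import torch.nn as nn

class CNNFG(nn.Module):
    """Sketch of CNN-FG from the abstract's stated hyperparameters:
    40x40 RGB fluctuation-graph inputs, two 5x5 convolutions with 6 and
    9 filters, 2x2 max pooling, fully connected layers of 900 and 32
    nodes, and a 2-way softmax output. The exact conv/pool ordering is
    our assumption; the abstract does not fix it."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, 5), nn.ReLU(),   # 40x40 -> 36x36
            nn.MaxPool2d(2),                 # -> 18x18
            nn.Conv2d(6, 9, 5), nn.ReLU(),   # -> 14x14
            nn.MaxPool2d(2),                 # -> 7x7
            nn.Flatten(),                    # 9 * 7 * 7 = 441
        )
        self.classifier = nn.Sequential(
            nn.Linear(441, 900), nn.ReLU(),
            nn.Linear(900, 32), nn.ReLU(),
            nn.Linear(32, 2),                # softmax applied at inference
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = CNNFG()
logits = model(torch.randn(8, 3, 40, 40))   # batch of 8 graph images
probs = logits.softmax(dim=1)               # up/down probabilities
print(probs.shape)  # torch.Size([8, 2])
```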

Video Representation via Fusion of Static and Motion Features Applied to Human Activity Recognition

  • Arif, Sheeraz;Wang, Jing;Fei, Zesong;Hussain, Fida
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.7
    • /
    • pp.3599-3619
    • /
    • 2019
  • In a human activity recognition system, both static and motion information play a crucial role in achieving efficient and competitive results. Most existing methods extract video features insufficiently and cannot assess the level of contribution of the two (static and motion) components. Our work highlights this problem and proposes a Static-Motion fused features descriptor (SMFD), which intelligently leverages both static and motion features in the form of a descriptor. First, static features are learned by a two-stream 3D convolutional neural network. Second, trajectories are extracted by tracking key points, and only those trajectories located in the central region of the original video frame are selected, in order to reduce irrelevant background trajectories as well as computational complexity. Then, shape and motion descriptors are obtained along with key points by using SIFT flow. Next, a Cholesky transformation is introduced to fuse the static and motion feature vectors so as to guarantee the equal contribution of all descriptors. Finally, a Long Short-Term Memory (LSTM) network is utilized to discover long-term temporal dependencies and make the final prediction. To confirm the effectiveness of the proposed approach, extensive experiments have been conducted on three well-known datasets, i.e., UCF101, HMDB51, and YouTube. The findings show that the resulting recognition system is on par with state-of-the-art methods.
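One common reading of the "Cholesky transformation" fusion step is to mix the two normalized feature vectors through the Cholesky factor of a 2×2 correlation matrix [[1, ρ], [ρ, 1]], giving z = ρ·x + √(1 − ρ²)·y. The sketch below follows that reading; with ρ = 1/√2 the two descriptors contribute equally, matching the abstract's stated aim. The value of ρ and the normalization are assumptions.

```python
import torch

def cholesky_fuse(static_feat: torch.Tensor,
                  motion_feat: torch.Tensor,
                  rho: float = 0.7071) -> torch.Tensor:
    """Fuse two L2-normalized feature vectors via the Cholesky factor of
    the 2x2 correlation matrix [[1, rho], [rho, 1]], which yields
    z = rho * x + sqrt(1 - rho^2) * y. With rho = 1/sqrt(2) both vectors
    receive equal weight. One hedged reading of the paper's fusion step."""
    x = static_feat / static_feat.norm(dim=-1, keepdim=True)
    y = motion_feat / motion_feat.norm(dim=-1, keepdim=True)
    return rho * x + (1.0 - rho ** 2) ** 0.5 * y

fused = cholesky_fuse(torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape)  # torch.Size([4, 256])
```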

DP-LinkNet: A convolutional network for historical document image binarization

  • Xiong, Wei;Jia, Xiuhong;Yang, Dichun;Ai, Meihui;Li, Lirong;Wang, Song
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1778-1797
    • /
    • 2021
  • Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent document image binarization competition (DIBCO) and handwritten document image binarization competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://github.com/beargolden/DP-LinkNet.
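The hybrid dilated convolution (HDC) idea stacks dilated convolutions whose rates are chosen to avoid the "gridding" artifact; rates such as 1, 2, 5 are a common choice. The following PyTorch sketch illustrates the pattern with assumed channel counts; it is not DP-LinkNet's exact module (see the authors' repository linked above for that).

```python
import torch
import torch.nn as nn

class HDCBlock(nn.Module):
    """Hedged sketch of a hybrid dilated convolution (HDC) block: stacked
    3x3 convolutions with dilation rates 1, 2, 5, a combination commonly
    chosen to avoid gridding. Channels and rates are assumptions."""
    def __init__(self, channels: int = 64):
        super().__init__()
        layers = []
        for rate in (1, 2, 5):
            # padding == dilation keeps the spatial size unchanged for 3x3.
            layers += [nn.Conv2d(channels, channels, 3,
                                 padding=rate, dilation=rate),
                       nn.BatchNorm2d(channels), nn.ReLU()]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

hdc = HDCBlock()
print(hdc(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```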

Hierarchical Grouping of Line Segments for Building Model Generation (건물 형태 발생을 위한 3차원 선소의 계층적 군집화)

  • Han, Ji-Ho;Park, Dong-Chul;Woo, Dong-Min;Jeong, Tai-Kyeong;Lee, Yun-Sik;Min, Soo-Young
    • Journal of IKEEE
    • /
    • v.16 no.2
    • /
    • pp.95-101
    • /
    • 2012
  • A novel approach for the reconstruction of 3D building models from aerial image data is proposed in this paper. In this approach, a Centroid Neural Network (CNN) with a metric over line segments is used for connecting low-level linear structures. After the straight lines are extracted from an edge image using the CNN, rectangular boundaries are found by an edge-based grouping approach. To avoid producing unrealistic building models when grouping line segments, a hierarchical grouping method is proposed in this paper. The proposed hierarchical grouping method is evaluated on a set of aerial image data in the experiment. The results show that the proposed method can be successfully applied to the reconstruction of 3D building models from satellite images.
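A Centroid Neural Network keeps each centroid equal to the exact mean of its current cluster members and updates it incrementally as data points are won or lost. The sketch below shows only those exact incremental-mean updates; the full algorithm, its scheduling, and the paper's line-segment metric are not reproduced.

```python
import numpy as np

def add_to_cluster(w: np.ndarray, n: int, x: np.ndarray):
    """Winner update: centroid of n members after gaining point x."""
    return w + (x - w) / (n + 1), n + 1

def remove_from_cluster(w: np.ndarray, n: int, x: np.ndarray):
    """Loser update: centroid of n members after losing point x (n > 1)."""
    return w - (x - w) / (n - 1), n - 1

# The incremental updates reproduce the exact mean:
pts = np.random.rand(5, 4)
w, n = pts[0], 1
for p in pts[1:]:
    w, n = add_to_cluster(w, n, p)
assert np.allclose(w, pts.mean(axis=0))

w, n = remove_from_cluster(w, n, pts[0])
assert np.allclose(w, pts[1:].mean(axis=0))
```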

A Study on Design and Implementation of Driver's Blind Spot Assist System Using CNN Technique (CNN 기법을 활용한 운전자 시선 사각지대 보조 시스템 설계 및 구현 연구)

  • Lim, Seung-Cheol;Go, Jae-Seung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.2
    • /
    • pp.149-155
    • /
    • 2020
  • The Korea Highway Traffic Authority provides statistics, compiled with the Traffic Accident Analysis System (TAAS), that analyze the causes of traffic accidents occurring since 2015. It was reported through TAAS that the driver's inattention to the road ahead was the main cause of traffic accidents in 2018: using a mobile phone or watching DMB while driving accounted for 51.2 percent of accident causes, failure to keep a safe distance for 14 percent, and violation of the duty to protect pedestrians for 3.6 percent, a total of 68.8 percent. In this paper, we propose a system that improves on ADAS (Advanced Driver Assistance Systems) by utilizing a CNN (Convolutional Neural Network), one of the algorithms of deep learning. The proposed system learns a model that classifies the movements of the driver's face and eyes using Conv2D techniques, which are mainly used for image processing, while recognizing and detecting objects around the vehicle with cameras attached to its front to capture the driving environment. Then, using the learned gaze model and the driving-environment data, hazards are classified and detected in three stages according to the driver's view and the driving environment, assisting the driver with the forward view and blind spots.
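As a minimal illustration, a Conv2D gaze classifier of the kind described might map a cropped driver face/eye image to a small set of coarse gaze regions, as in the PyTorch sketch below. The number of gaze classes and every layer size are assumptions.

```python
import torch
import torch.nn as nn

class GazeClassifier(nn.Module):
    """Hedged sketch of a Conv2D gaze classifier: maps a cropped driver
    face/eye image to coarse gaze regions. All sizes are assumptions."""
    def __init__(self, n_gaze_classes: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_gaze_classes),
        )

    def forward(self, face_img):
        return self.net(face_img)

clf = GazeClassifier()
print(clf(torch.randn(2, 3, 96, 96)).shape)  # torch.Size([2, 5])
```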

Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion (멀티모달 맥락정보 융합에 기초한 다중 물체 목표 시각적 탐색 이동)

  • Jeong Hyun Choi;In Cheol Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.9
    • /
    • pp.407-418
    • /
    • 2023
  • Multi-Object Goal Visual Navigation (MultiOn) is a visual navigation task in which an agent must visit multiple object goals in an unknown indoor environment in a given order. Existing models for the MultiOn task suffer from the limitation that they cannot exploit an integrated view of multimodal context because they use only a unimodal context map. To overcome this limitation, in this paper we propose a novel deep neural network-based agent model for the MultiOn task. The proposed model, MCFMO, uses a multimodal context map containing visual appearance features, semantic features of environmental objects, and goal object features. Moreover, the proposed model effectively fuses these three heterogeneous feature maps into a global multimodal context map using a point-wise convolutional neural network module. Lastly, the proposed model adopts an auxiliary task learning module that predicts the observation status, goal direction, and goal distance, which can guide the agent to learn the navigation policy efficiently. Through various quantitative and qualitative experiments using the Habitat-Matterport3D simulation environment and scene dataset, we demonstrate the superiority of the proposed model.
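The point-wise fusion module lends itself to a compact sketch: the three context maps are concatenated channel-wise and mixed per map cell by a 1×1 convolution. The channel sizes below are assumptions, not the MCFMO model's configuration.

```python
import torch
import torch.nn as nn

class MultimodalMapFusion(nn.Module):
    """Hedged sketch of point-wise (1x1) convolutional fusion: three
    context maps (visual, semantic, goal) are concatenated channel-wise
    and mixed per map cell by a 1x1 convolution."""
    def __init__(self, c_vis=16, c_sem=8, c_goal=4, c_out=32):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(c_vis + c_sem + c_goal, c_out, kernel_size=1),
            nn.ReLU(),
        )

    def forward(self, vis_map, sem_map, goal_map):
        return self.fuse(torch.cat([vis_map, sem_map, goal_map], dim=1))

fusion = MultimodalMapFusion()
out = fusion(torch.randn(1, 16, 64, 64),
             torch.randn(1, 8, 64, 64),
             torch.randn(1, 4, 64, 64))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```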