• Title/Summary/Keyword: 3D-CNN

Search Result 157, Processing Time 0.029 seconds

Fast Motion Planning of Wheel-legged Robot for Crossing 3D Obstacles using Deep Reinforcement Learning (심층 강화학습을 이용한 휠-다리 로봇의 3차원 장애물극복 고속 모션 계획 방법)

  • Soonkyu Jeong;Mooncheol Won
    • The Journal of Korea Robotics Society
    • /
    • v.18 no.2
    • /
    • pp.143-154
    • /
    • 2023
  • In this study, a fast motion planning method for the swing motion of a 6x6 wheel-legged robot to traverse large obstacles and gaps is proposed. The motion planning method presented in the previous paper, which was based on trajectory optimization, took up to tens of seconds and was limited to two-dimensional, structured vertical obstacles and trenches. A deep neural network based on one-dimensional Convolutional Neural Network (CNN) is introduced to generate keyframes, which are then used to represent smooth reference commands for the six leg angles along the robot's path. The network is initially trained using the behavioral cloning method with a dataset gathered from previous simulation results of the trajectory optimization. Its performance is then improved through reinforcement learning, using a one-step REINFORCE algorithm. The trained model has increased the speed of motion planning by up to 820 times and improved the success rates of obstacle crossing under harsh conditions, such as low friction and high roughness.

Deep Learning based Visual-Inertial Drone Odomtery Estimation (딥러닝 기반 시각-관성을 활용한 드론 주행기록 추정)

  • Song, Seung-Yeon;Park, Sang-Won;Kim, Han-Gyul;Choi, Su-Han
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.11a
    • /
    • pp.842-845
    • /
    • 2020
  • 본 연구는 시각-관성 기반의 딥러닝 학습으로 자유분방하게 움직이는 드론의 주행기록을 정확하게 추정하는 것을 목표로 한다. 드론의 비행주행은 드론의 온보드 센서와 조정값을 이용하는 것이 일반적이다. 본 연구에서는 이 온보드 센서 데이터를 학습에 사용하여 비행주행의 위치추정을 실험하였다. 선행연구로써 DeepVO[1]룰 구현하여 KITTI[3] 데이터와 Midair[4] 데이터를 비교, 분석하였다. 3D 좌표면에서의 위치 추정에 선행연구 모델의 한계가 있음을 확인하고 IMU를 Feature로써 사용하였다. 본 모델은 FlowNet[2]을 모방한 CNN 네트워크로부터 Optical Flow Feature에 IMU 데이터를 더해 RNN으로 학습을 진행하였다. 본 연구를 통해 주행기록 예측을 다소 정확히 했다고 할 수 없지만, IMU Feature를 통해 주행기록의 예측이 가능함을 볼 수 있었다. 본 연구를 통해 시각-관성 분야에서 사람의 지식이나 조정이 들어가는 센서를 융합하는 기존의 방식에서 사람의 제어가 들어가지 않는 End-to-End 방식으로 인공지능을 학습했다. 또한, 시각과 관성 데이터를 통해 주행기록을 추정할 수 있었고 시각적으로 그래프를 그려 정답과 얼마나 차이 있는지 확인해보았다.

Super-resolution based on multi-channel input convolutional residual neural network (다중 채널 입력 Convolution residual neural networks 기반의 초해상화 기법)

  • Youm, Gwang-Young;Kim, Munchurl
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2016.06a
    • /
    • pp.37-39
    • /
    • 2016
  • 최근 Convolutional neural networks(CNN) 기반의 초해상화 기법인 Super-Resolution Convolutional Neural Networks (SRCNN) 이 좋은 PSNR 성능을 발휘하는 것으로 보고되었다 [1]. 하지만 많은 제안 방법들이 고주파 성분을 복원하는데 한계를 드러내는 것처럼, SRCNN 도 고주파 성분 복원에 한계점을 지니고 있다. 또한 SRCNN 의 네트워크 층을 깊게 만들면 좋은 PSNR 성능을 발휘하는 것으로 널리 알려져 있지만, 네트워크의 층을 깊게 하는 것은 네트워크 파라미터 학습을 어렵게 하는 경향이 있다. 네트워크의 층을 깊게 할 경우, gradient 값이 아래(역방향) 층으로 갈수록 발산하거나 0 으로 수렴하여, 네트워크 파라미터 학습이 제대로 되지 않는 현상이 발생하기 때문이다. 따라서 본 논문에서는 네트워크 층을 깊게 하는 대신에, 입력을 다중 채널로 구성하여, 네트워크에 고주파 성분에 관한 추가적인 정보를 주는 방법을 제안하였다. 많은 초해상화 기법들이 고주파 성분의 복원 능력이 부족하다는 점에 착안하여, 우리는 네트워크가 고주파 성분에 관한 많은 정보를 필요로 한다는 것을 가정하였다. 따라서 우리는 네트워크의 입력을 고주파 성분이 여러 가지 강도로 입력되도록 저해상도 입력 영상들을 구성하였다. 또한 잔차신호 네트워크(residual networks)를 도입하여, 네트워크 파라미터를 학습할 때 고주파 성분의 복원에 집중할 수 있도록 하였다. 본 논문의 효율성을 검증하기 위하여 set5 데이터와 set14 데이터에 관하여 실험을 진행하였고, SRCNN 과 비교하여 set5 데이터에서는 2, 3, 4 배에 관하여 각각 평균 0.29, 0.35, 0.17dB 의 PSNR 성능 향상이 있었으며, set14 데이터에서는 3 배의 관하여 평균 0.20dB 의 PSNR 성능 향상이 있었다.

  • PDF

Discriminant analysis of grain flours for rice paper using fluorescence hyperspectral imaging system and chemometric methods

  • Seo, Youngwook;Lee, Ahyeong;Kim, Bal-Geum;Lim, Jongguk
    • Korean Journal of Agricultural Science
    • /
    • v.47 no.3
    • /
    • pp.633-644
    • /
    • 2020
  • Rice paper is an element of Vietnamese cuisine that can be used to wrap vegetables and meat. Rice and starch are the main ingredients of rice paper and their mixing ratio is important for quality control. In a commercial factory, assessment of food safety and quantitative supply is a challenging issue. A rapid and non-destructive monitoring system is therefore necessary in commercial production systems to ensure the food safety of rice and starch flour for the rice paper wrap. In this study, fluorescence hyperspectral imaging technology was applied to classify grain flours. Using the 3D hyper cube of fluorescence hyperspectral imaging (fHSI, 420 - 730 nm), spectral and spatial data and chemometric methods were applied to detect and classify flours. Eight flours (rice: 4, starch: 4) were prepared and hyperspectral images were acquired in a 5 (L) × 5 (W) × 1.5 (H) cm container. Linear discriminant analysis (LDA), partial least square discriminant analysis (PLSDA), support vector machine (SVM), classification and regression tree (CART), and random forest (RF) with a few preprocessing methods (multivariate scatter correction [MSC], 1st and 2nd derivative and moving average) were applied to classify grain flours and the accuracy was compared using a confusion matrix (accuracy and kappa coefficient). LDA with moving average showed the highest accuracy at A = 0.9362 (K = 0.9270). 1D convolutional neural network (CNN) demonstrated a classification result of A = 0.94 and showed improved classification results between mimyeon flour (MF)1 and MF2 of 0.72 and 0.87, respectively. In this study, the potential of non-destructive detection and classification of grain flours using fHSI technology and machine learning methods was demonstrated.

Tracking Method of Dynamic Smoke based on U-net (U-net기반 동적 연기 탐지 기법)

  • Gwak, Kyung-Min;Rho, Young J.
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.21 no.4
    • /
    • pp.81-87
    • /
    • 2021
  • Artificial intelligence technology is developing as it enters the fourth industrial revolution. Active researches are going on; visual-based models using CNNs. U-net is one of the visual-based models. It has shown strong performance for semantic segmentation. Although various U-net studies have been conducted, studies on tracking objects with unclear outlines such as gases and smokes are still insufficient. We conducted a U-net study to tackle this limitation. In this paper, we describe how 3D cameras are used to collect data. The data are organized into learning and test sets. This paper also describes how U-net is applied and how the results is validated.

Deep Window Detection in Street Scenes

  • Ma, Wenguang;Ma, Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.2
    • /
    • pp.855-870
    • /
    • 2020
  • Windows are key components of building facades. Detecting windows, crucial to 3D semantic reconstruction and scene parsing, is a challenging task in computer vision. Early methods try to solve window detection by using hand-crafted features and traditional classifiers. However, these methods are unable to handle the diversity of window instances in real scenes and suffer from heavy computational costs. Recently, convolutional neural networks based object detection algorithms attract much attention due to their good performances. Unfortunately, directly training them for challenging window detection cannot achieve satisfying results. In this paper, we propose an approach for window detection. It involves an improved Faster R-CNN architecture for window detection, featuring in a window region proposal network, an RoI feature fusion and a context enhancement module. Besides, a post optimization process is designed by the regular distribution of windows to refine detection results obtained by the improved deep architecture. Furthermore, we present a newly collected dataset which is the largest one for window detection in real street scenes to date. Experimental results on both existing datasets and the new dataset show that the proposed method has outstanding performance.

Video Representation via Fusion of Static and Motion Features Applied to Human Activity Recognition

  • Arif, Sheeraz;Wang, Jing;Fei, Zesong;Hussain, Fida
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.7
    • /
    • pp.3599-3619
    • /
    • 2019
  • In human activity recognition system both static and motion information play crucial role for efficient and competitive results. Most of the existing methods are insufficient to extract video features and unable to investigate the level of contribution of both (Static and Motion) components. Our work highlights this problem and proposes Static-Motion fused features descriptor (SMFD), which intelligently leverages both static and motion features in the form of descriptor. First, static features are learned by two-stream 3D convolutional neural network. Second, trajectories are extracted by tracking key points and only those trajectories have been selected which are located in central region of the original video frame in order to to reduce irrelevant background trajectories as well computational complexity. Then, shape and motion descriptors are obtained along with key points by using SIFT flow. Next, cholesky transformation is introduced to fuse static and motion feature vectors to guarantee the equal contribution of all descriptors. Finally, Long Short-Term Memory (LSTM) network is utilized to discover long-term temporal dependencies and final prediction. To confirm the effectiveness of the proposed approach, extensive experiments have been conducted on three well-known datasets i.e. UCF101, HMDB51 and YouTube. Findings shows that the resulting recognition system is on par with state-of-the-art methods.

Artificial neural network reconstructs core power distribution

  • Li, Wenhuai;Ding, Peng;Xia, Wenqing;Chen, Shu;Yu, Fengwan;Duan, Chengjie;Cui, Dawei;Chen, Chen
    • Nuclear Engineering and Technology
    • /
    • v.54 no.2
    • /
    • pp.617-626
    • /
    • 2022
  • To effectively monitor the variety of distributions of neutron flux, fuel power or temperatures in the reactor core, usually the ex-core and in-core neutron detectors are employed. The thermocouples for temperature measurement are installed in the coolant inlet or outlet of the respective fuel assemblies. It is necessary to reconstruct the measurement information of the whole reactor position. However, the reading of different types of detector in the core reflects different aspects of the 3D power distribution. The feasibility of reconstruction the core three-dimension power distribution by using different combinations of in-core, ex-core and thermocouples detectors is analyzed in this paper to synthesize the useful information of various detectors. A comparison of multilayer perceptron (MLP) network and radial basis function (RBF) network is performed. RBF results are more extreme precision but also more sensitivity to detector failure and uncertainty, compare to MLP networks. This is because that localized neural network could offer conservative regression in RBF. Adding random disturbance in training dataset is helpful to reduce the influence of detector failure and uncertainty. Some convolution neural networks seem to be helpful to get more accurate results by use more spatial layout information, though relative researches are still under way.

EpiLoc: Deep Camera Localization Under Epipolar Constraint

  • Xu, Luoyuan;Guan, Tao;Luo, Yawei;Wang, Yuesong;Chen, Zhuo;Liu, WenKai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.2044-2059
    • /
    • 2022
  • Recent works have shown that the geometric constraint can be harnessed to boost the performance of CNN-based camera localization. However, the existing strategies are limited to imposing image-level constraint between pose pairs, which is weak and coarse-gained. In this paper, we introduce a pixel-level epipolar geometry constraint to vanilla localization framework without the ground-truth 3D information. Dubbed EpiLoc, our method establishes the geometric relationship between pixels in different images by utilizing the epipolar geometry thus forcing the network to regress more accurate poses. We also propose a variant called EpiSingle to cope with non-sequential training images, which can construct the epipolar geometry constraint based on a single image in a self-supervised manner. Extensive experiments on the public indoor 7Scenes and outdoor RobotCar datasets show that the proposed pixel-level constraint is valuable, and helps our EpiLoc achieve state-of-the-art results in the end-to-end camera localization task.

Aural-visual two-stream based infant cry recognition (Aural-visual two-stream 기반의 아기 울음소리 식별)

  • Bo, Zhao;Lee, Jonguk;Atif, Othmane;Park, Daihee;Chung, Yongwha
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.05a
    • /
    • pp.354-357
    • /
    • 2021
  • Infants communicate their feelings and needs to the outside world through non-verbal methods such as crying and displaying diverse facial expressions. However, inexperienced parents tend to decode these non-verbal messages incorrectly and take inappropriate actions, which might affect the bonding they build with their babies and the cognitive development of the newborns. In this paper, we propose an aural-visual two-stream based infant cry recognition system to help parents comprehend the feelings and needs of crying babies. The proposed system first extracts the features from the pre-processed audio and video data by using the VGGish model and 3D-CNN model respectively, fuses the extracted features using a fully connected layer, and finally applies a SoftMax function to classify the fused features and recognize the corresponding type of cry. The experimental results show that the proposed system classification exceeds 0.92 in F1-score, which is 0.08 and 0.10 higher than the single-stream aural model and single-stream visual model.