• Title/Summary/Keyword: Feature Maps


Learning-Based Multiple Pooling Fusion in Multi-View Convolutional Neural Network for 3D Model Classification and Retrieval

  • Zeng, Hui;Wang, Qi;Li, Chen;Song, Wei
    • Journal of Information Processing Systems
    • /
    • v.15 no.5
    • /
    • pp.1179-1191
    • /
    • 2019
  • We design an ingenious view-pooling method named learning-based multiple pooling fusion (LMPF) and apply it to the multi-view convolutional neural network (MVCNN) for 3D model classification and retrieval. By this means, the multi-view feature maps projected from a 3D model can be compiled into a simple and effective feature descriptor. The LMPF method fuses max pooling and mean pooling by learning a set of optimal weights. Compared with hand-crafted approaches such as max pooling and mean pooling, LMPF can effectively decrease information loss because of its "learning" ability. Experiments on the ModelNet40 and McGill datasets are presented, and the results verify that LMPF outperforms the previous methods to a great extent.
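As a rough illustration of the learned-fusion idea, the sketch below mixes max-pooled and mean-pooled view features with learnable per-dimension weights. It is a minimal PyTorch sketch under assumed shapes and names (LearnedPoolingFusion, feat_dim), not the authors' implementation of LMPF.

```python
import torch
import torch.nn as nn

class LearnedPoolingFusion(nn.Module):
    """Sketch of a learned fusion of max pooling and mean pooling over views.

    Input:  view features of shape (batch, n_views, feat_dim)
    Output: fused descriptor of shape (batch, feat_dim)
    """
    def __init__(self, feat_dim):
        super().__init__()
        # One weight per feature dimension; sigmoid keeps it in [0, 1].
        self.alpha = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, view_feats):
        max_pool = view_feats.max(dim=1).values   # (batch, feat_dim)
        mean_pool = view_feats.mean(dim=1)        # (batch, feat_dim)
        w = torch.sigmoid(self.alpha)             # learned mixing weights
        return w * max_pool + (1.0 - w) * mean_pool

# Usage: fuse 12 view descriptors of dimension 512 per 3D model.
fusion = LearnedPoolingFusion(feat_dim=512)
views = torch.randn(8, 12, 512)
descriptor = fusion(views)   # (8, 512)
```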

Adaptive Milling Process Modeling and Neural Networks Applied to Tool Wear Monitoring (밀링공정의 적응모델링과 공구마모 검출을 위한 신경회로망의 적용)

  • Ko, Tae-Jo;Cho, Dong-Woo
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.11 no.1
    • /
    • pp.138-149
    • /
    • 1994
  • This paper introduces a new monitoring technique which utilizes adaptive signal processing for feature generation, coupled with a multilayered neural network for pattern recognition. The cutting force signal in a face milling operation was modeled by a low-order discrete autoregressive model, where the parameters were estimated recursively at each sampling instant using a parameter adaptation algorithm based on an RLS (recursive least squares) method with discounted measurements. The influences of the adaptation algorithm parameters, as well as some modeling considerations, on the estimation results are discussed. The sensitivity of the estimated model parameters to the tool state (new and worn tool) is presented, and the application of a multilayered neural network to tool state monitoring using the previously generated features is also demonstrated with a high success rate. The methodology turned out to be quite suitable for in-process tool wear monitoring in the sense that the model parameters are effective as tool state features in milling operations and that the classifier successfully maps the sensor data to the correct output decision.
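For the parameter-estimation step, a minimal NumPy sketch of recursive least squares with a forgetting factor (discounted measurements) applied to a low-order AR model is given below. Function and variable names (rls_ar, lam, delta) and the simulated signal are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rls_ar(signal, order=2, lam=0.98, delta=100.0):
    """Recursive least squares estimation of AR(order) parameters
    with forgetting factor `lam` (discounted measurements)."""
    theta = np.zeros(order)              # AR parameter estimates
    P = delta * np.eye(order)            # inverse correlation matrix
    history = []
    for t in range(order, len(signal)):
        phi = signal[t - order:t][::-1]  # regressor: most recent past samples
        y = signal[t]
        err = y - phi @ theta            # a priori prediction error
        k = P @ phi / (lam + phi @ P @ phi)    # gain vector
        theta = theta + k * err
        P = (P - np.outer(k, phi @ P)) / lam
        history.append(theta.copy())
    return np.array(history)

# Usage: track AR(2) parameters of a simulated force-like signal.
x = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.1 * np.random.randn(2000)
params = rls_ar(x, order=2, lam=0.98)    # (n_samples - order, order)
```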


Knowledge-driven speech features for detection of Korean-speaking children with autism spectrum disorder

  • Seonwoo Lee;Eun Jung Yeo;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences
    • /
    • v.15 no.2
    • /
    • pp.53-59
    • /
    • 2023
  • Detection of children with autism spectrum disorder (ASD) based on speech has relied on predefined feature sets because of their ease of use and the capabilities of speech analysis. However, clinical impressions may not be adequately captured due to the broad range and large number of features included. This paper demonstrates that knowledge-driven speech features (KDSFs) specifically tailored to the speech traits of ASD are more effective and efficient for distinguishing the speech of children with ASD from that of children with typical development (TD) than a predefined feature set, the extended Geneva Minimalistic Acoustic Standard Parameter Set (eGeMAPS). The KDSFs encompass various speech characteristics related to frequency, voice quality, speech rate, and spectral features that have been identified as corresponding to distinctive speech attributes of children with ASD. The speech dataset used for the experiments consists of 63 children with ASD and 9 children with TD. To alleviate the imbalance in the number of training utterances, a data augmentation technique was applied to the TD children's utterances. The support vector machine (SVM) classifier trained with the KDSFs achieved an accuracy of 91.25%, surpassing the 88.08% obtained using the predefined set. This result underscores the importance of incorporating domain knowledge into the development of speech technologies for individuals with disorders.
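The classification stage can be pictured with the short scikit-learn sketch below, which trains an SVM on precomputed feature vectors. The data, feature dimensionality, and labels are placeholders, and the paper's actual KDSF extraction and augmentation pipeline is not reproduced.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder data: rows are utterances, columns are knowledge-driven
# speech features (frequency, voice quality, speech rate, spectral).
X = np.random.randn(200, 24)          # hypothetical KDSF vectors
y = np.random.randint(0, 2, 200)      # 1 = ASD, 0 = TD (illustrative labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print("mean CV accuracy:", scores.mean())
```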

CAttNet: A Compound Attention Network for Depth Estimation of Light Field Images

  • Dingkang Hua;Qian Zhang;Wan Liao;Bin Wang;Tao Yan
    • Journal of Information Processing Systems
    • /
    • v.19 no.4
    • /
    • pp.483-497
    • /
    • 2023
  • Depth estimation is one of the most complicated and difficult problems in light field processing. In this paper, a compound attention convolutional neural network (CAttNet) is proposed to extract depth maps from light field images. To make more effective use of the sub-aperture images (SAIs) of the light field and reduce their redundancy, we use a compound attention mechanism to weigh the channel and spatial dimensions of the feature map after primary feature extraction, so that the network can more efficiently select the required views and the important areas within each view. We modified several feature-extraction layers so that features are extracted more efficiently without adding parameters. By exploiting the characteristics of the light field, we increased the network depth and optimized the network structure to reduce the adverse impact of this change. CAttNet can efficiently utilize the correlations and features of different SAIs to generate a high-quality light field depth map. The experimental results show that CAttNet has advantages in both accuracy and time.
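The channel-then-spatial weighting pattern described above can be sketched in PyTorch as below. The module name, reduction ratio, and kernel size are assumptions made for illustration; this is a generic compound attention block, not the CAttNet architecture itself.

```python
import torch
import torch.nn as nn

class CompoundAttention(nn.Module):
    """Channel attention followed by spatial attention over a feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel attention: squeeze spatial dims, then re-weight channels.
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: a single-channel weight map over H x W.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_fc(x)      # weigh channels
        x = x * self.spatial_conv(x)    # weigh spatial positions
        return x

# Usage: re-weight a feature map extracted from stacked sub-aperture images.
attn = CompoundAttention(channels=64)
feat = torch.randn(2, 64, 32, 32)
out = attn(feat)                        # same shape, attention-weighted
```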

A Feature Map Compression Method for Multi-resolution Feature Map with PCA-based Transformation (PCA 기반 변환을 통한 다해상도 피처 맵 압축 방법)

  • Park, Seungjin;Lee, Minhun;Choi, Hansol;Kim, Minsub;Oh, Seoung-Jun;Kim, Younhee;Do, Jihoon;Jeong, Se Yoon;Sim, Donggyu
    • Journal of Broadcast Engineering
    • /
    • v.27 no.1
    • /
    • pp.56-68
    • /
    • 2022
  • In this paper, we propose a compression method for multi-resolution feature maps for VCM. The proposed method removes the redundancy between the channels and resolution levels of the multi-resolution feature map through a PCA-based transformation. According to their characteristics, the basis vectors and mean vector used for the transformation, and the transform coefficients obtained from it, are compressed using a VVC-based coder and DeepCABAC. To evaluate the performance of the proposed method, object detection performance was measured on the OpenImageV6 and COCO 2017 validation sets, and BD-rates against the MPEG-VCM anchor and the feature map compression anchor proposed in this paper were compared using bpp and mAP. The experimental results show that the proposed method achieves a 25.71% BD-rate improvement over the feature map compression anchor on OpenImageV6. Furthermore, for large objects in the COCO 2017 validation set, the BD-rate improvement reaches up to 43.72% compared with the MPEG-VCM anchor.
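The core decorrelation step, projecting feature-map channels onto a PCA basis and reconstructing them, can be sketched with NumPy as follows. The function names and the choice of 32 components are assumptions; the VVC/DeepCABAC coding stages and the multi-resolution handling from the paper are omitted.

```python
import numpy as np

def pca_transform_feature_map(fmap, n_components):
    """Project feature-map channels onto a PCA basis.

    fmap: array of shape (C, H, W) -- one resolution level.
    Returns (coeffs, basis, mean) needed to reconstruct the map.
    """
    C, H, W = fmap.shape
    X = fmap.reshape(C, H * W).T            # samples = spatial positions
    mean = X.mean(axis=0)
    Xc = X - mean
    # Basis vectors = leading right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_components]               # (n_components, C)
    coeffs = Xc @ basis.T                   # (H*W, n_components)
    return coeffs, basis, mean

def reconstruct_feature_map(coeffs, basis, mean, shape):
    C, H, W = shape
    X_hat = coeffs @ basis + mean           # (H*W, C)
    return X_hat.T.reshape(C, H, W)

# Usage: represent a 256-channel feature map with 32 principal components.
fmap = np.random.randn(256, 64, 64).astype(np.float32)
coeffs, basis, mean = pca_transform_feature_map(fmap, n_components=32)
approx = reconstruct_feature_map(coeffs, basis, mean, fmap.shape)
```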

Development of Pose-Invariant Face Recognition System for Mobile Robot Applications

  • Lee, Tai-Gun;Park, Sung-Kee;Kim, Mun-Sang;Park, Mig-Non
    • Institute of Control, Robotics and Systems Conference Proceedings (제어로봇시스템학회 학술대회논문집)
    • /
    • 2003.10a
    • /
    • pp.783-788
    • /
    • 2003
  • In this paper, we present a new approach to detect and recognize human faces in images from a vision camera mounted on a mobile robot platform. Because the camera platform is mobile, the obtained facial images are small and vary in pose. Under these conditions, the algorithm must cope with such constraints and detect and recognize faces in nearly real time. In the detection step, a 'coarse to fine' strategy is used. First, a region boundary containing the face is roughly located by dual ellipse templates of facial color, and within this region the locations of the three main facial features, the two eyes and the mouth, are estimated. For this, simplified facial feature maps based on characteristic chrominance are computed, and candidate pixels are segmented into eye or mouth pixel groups. These candidate facial features are verified by checking whether the length and orientation of feature pairs are consistent with face geometry. In the recognition step, a pseudo-convex hull area of the gray face image is defined that includes the feature triangle connecting the two eyes and the mouth. A random lattice line set is composed and laid on this convex hull area, and the 2D appearance of the area is represented. From these procedures, facial information of the detected face is obtained, and face DB images are processed in the same way for each person class. Based on the facial information of these areas, a distance measure of the match of lattice lines is calculated, and the face image is recognized using this measure as a classifier. The proposed detection and recognition algorithms overcome the constraints of the previous approach [15], make real-time face detection and recognition possible, and guarantee correct recognition regardless of some pose variation of the face. The usefulness for mobile robot applications is demonstrated.
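As a toy illustration of the geometric verification step, the sketch below checks whether candidate eye and mouth points form a plausible facial triangle. The thresholds, ratios, and function name are invented for illustration and are not taken from the paper.

```python
import numpy as np

def plausible_face_triangle(left_eye, right_eye, mouth,
                            ratio_range=(0.6, 1.6), max_tilt_deg=30.0):
    """Rough geometric check on candidate eye/mouth feature points.

    Accepts the triple if the mouth sits below the eye line, the eye-line
    tilt is moderate, and the eye-to-mouth distance has a sensible ratio
    to the inter-eye distance.
    """
    left_eye, right_eye, mouth = map(np.asarray, (left_eye, right_eye, mouth))
    eye_vec = right_eye - left_eye
    eye_dist = np.linalg.norm(eye_vec)
    if eye_dist < 1e-6:
        return False
    tilt = abs(np.degrees(np.arctan2(eye_vec[1], eye_vec[0])))
    eye_center = (left_eye + right_eye) / 2.0
    ratio = np.linalg.norm(mouth - eye_center) / eye_dist
    below = mouth[1] > eye_center[1]   # image y grows downward
    return bool(below and tilt <= max_tilt_deg
                and ratio_range[0] <= ratio <= ratio_range[1])

# Usage with pixel coordinates (x, y):
print(plausible_face_triangle((40, 50), (80, 52), (60, 95)))   # True
```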


Study of Traffic Sign Auto-Recognition (교통 표지판 자동 인식에 관한 연구)

  • Kwon, Mann-Jun
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.9
    • /
    • pp.5446-5451
    • /
    • 2014
  • Because some mistakes are made by hand when processing electronic maps for a navigation terminal, this paper proposes automatic offline recognition of traffic signs, which are a key ingredient of navigation information. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which have been used widely in 2D face recognition for computer vision and pattern recognition applications, were used to recognize the traffic signs. First, using PCA, the high-dimensional 2D image data were projected to a low-dimensional feature vector. LDA then maximized the between-class scatter matrix and minimized the within-class scatter matrix using the low-dimensional feature vectors obtained from PCA. Traffic signs extracted under a real-world road environment were recognized successfully, with a 92.3% recognition rate, using the 40 feature vectors created by the proposed algorithm.
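The PCA-then-LDA pipeline can be sketched with scikit-learn as below; the image size, number of classes, and data are placeholders chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# Placeholder data: flattened 32x32 grayscale sign images, 10 sign classes.
X = np.random.rand(400, 32 * 32)
y = np.random.randint(0, 10, 400)

# PCA reduces the raw pixels to a 40-dimensional feature vector, then LDA
# maximizes between-class scatter and minimizes within-class scatter on it;
# LDA also serves directly as the classifier here.
model = make_pipeline(
    PCA(n_components=40),
    LinearDiscriminantAnalysis(),
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```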

A Study on Residual U-Net for Semantic Segmentation based on Deep Learning (딥러닝 기반의 Semantic Segmentation을 위한 Residual U-Net에 관한 연구)

  • Shin, Seokyong;Lee, SangHun;Han, HyunHo
    • Journal of Digital Convergence
    • /
    • v.19 no.6
    • /
    • pp.251-258
    • /
    • 2021
  • In this paper, we propose an encoder-decoder model that utilizes residual learning to improve the accuracy of U-Net-based semantic segmentation. U-Net is a deep learning-based semantic segmentation method mainly used in applications such as autonomous vehicles and medical image analysis. The conventional U-Net loses features during the compression process because of the shallow structure of its encoder. This feature loss causes a lack of the context information needed to classify objects and reduces segmentation accuracy. To improve this, the proposed method efficiently extracts context information through an encoder that uses residual learning, which is effective in preventing the feature loss and gradient vanishing problems of the conventional U-Net. Furthermore, we reduced the number of down-sampling operations in the encoder to reduce the loss of spatial information contained in the feature maps. In experiments on the Cityscapes dataset, the proposed method improved the segmentation result by about 12% compared with the conventional U-Net.
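A residual encoder block of the kind described can be sketched in PyTorch as below. It is a generic residual block with an identity or 1x1-projected skip path, not the authors' exact network.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity (or 1x1-projected) skip path."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.skip = (nn.Identity() if in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual learning: the block refines the skipped features,
        # which helps prevent feature loss and vanishing gradients.
        return self.act(self.body(x) + self.skip(x))

# Usage: one encoder stage; a U-Net decoder would upsample and concatenate
# the skip feature maps as usual.
block = ResidualBlock(64, 128)
out = block(torch.randn(1, 64, 128, 128))   # (1, 128, 128, 128)
```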

Implementation of Real Time P2P Framework for Spatial Data Sharing between Mobile Devices using SIP (모바일 기기 간의 SIP기반 실시간 공간정보 공유 프레임워크 구현)

  • Park, Key-Ho;Jung, Jae-Gon
    • Proceedings of the Korean Association of Geographic Information Studies Conference
    • /
    • 2008.10a
    • /
    • pp.65-72
    • /
    • 2008
  • Mobile collaboration is an enabling technology that allows users to share information between mobile devices, and various mobile P2P platforms have been designed and implemented for it. There are, however, few research papers on applying the SIP protocol to spatial data sharing on mobile devices. In this paper, a SIP-based real-time sharing framework is proposed to compose a mobile P2P platform over which spatial data can be transferred. A new protocol based on WKT and WKB is defined to send and receive spatial objects with the SIP MESSAGE method. Base maps such as digital maps and parcel maps can be provided by a map server integrated with the SIP server after a new SIP session is established and the client agents are registered. The proposed SIP-based framework enables users to transfer spatial data such as maps and satellite images directly between mobile devices during a VoIP-based voice call, and therefore the mobile application can be applied in various domains such as forest management and national defense.
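The geometry-serialization side of such a protocol can be pictured with the shapely sketch below, which round-trips a parcel polygon through WKT and WKB. The SIP MESSAGE transport and the paper's actual message format are not shown, and the example geometry is invented.

```python
from shapely.geometry import Polygon
from shapely import wkt, wkb

# A parcel boundary as a simple polygon.
parcel = Polygon([(0, 0), (10, 0), (10, 8), (0, 8)])

# Text and binary encodings that could be carried in a message payload.
text_payload = parcel.wkt        # e.g. 'POLYGON ((0 0, 10 0, ...))'
binary_payload = parcel.wkb      # compact bytes for the same geometry

# Receiving side: rebuild the geometry from either encoding.
restored_from_text = wkt.loads(text_payload)
restored_from_binary = wkb.loads(binary_payload)
assert restored_from_text.equals(restored_from_binary)
```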


Multi-resolution DenseNet based acoustic models for reverberant speech recognition (잔향 환경 음성인식을 위한 다중 해상도 DenseNet 기반 음향 모델)

  • Park, Sunchan;Jeong, Yongwon;Kim, Hyung Soon
    • Phonetics and Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.33-38
    • /
    • 2018
  • Although deep neural network-based acoustic models have greatly improved the performance of automatic speech recognition (ASR), reverberation still degrades the performance of distant speech recognition in indoor environments. In this paper, we adopt the DenseNet, which has shown great performance in image classification tasks, to improve the performance of reverberant speech recognition. The DenseNet enables a deep convolutional neural network (CNN) to be trained effectively by concatenating the feature maps of each convolutional layer. In addition, we extend the concept of the multi-resolution CNN to a multi-resolution DenseNet for robust speech recognition in reverberant environments. We evaluate reverberant speech recognition on the single-channel ASR task of the reverberant voice enhancement and recognition benchmark (REVERB) challenge 2014. According to the experimental results, the DenseNet-based acoustic models show better performance than the conventional CNN-based ones, and the multi-resolution DenseNet provides additional performance improvement.
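The defining operation of a dense block, concatenating the feature maps of all previous layers, can be sketched in PyTorch as below. The growth rate, layer count, and input shape are assumptions for illustration; the multi-resolution input handling described in the paper is not reproduced.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of all previous feature maps."""
    def __init__(self, in_ch, growth_rate=12, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth_rate, 3, padding=1, bias=False),
            ))
            ch += growth_rate
        self.out_channels = ch

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate everything seen so far along the channel axis.
            new_feat = layer(torch.cat(features, dim=1))
            features.append(new_feat)
        return torch.cat(features, dim=1)

# Usage: a dense block over a time-frequency feature map (e.g. filterbanks).
block = DenseBlock(in_ch=16, growth_rate=12, n_layers=4)
out = block(torch.randn(2, 16, 40, 100))   # (2, 16 + 4*12, 40, 100)
```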