• Title/Summary/Keyword: Depth-wise separable convolution

Further Optimize MobileNetV2 with Channel-wise Squeeze and Excitation (채널간 압축과 해제를 통한 MobileNetV2 최적화)

  • Park, Jinho; Kim, Wonjun
    • Proceedings of the Korean Society of Broadcast Engineers Conference / fall / pp.154-156 / 2021
  • Depth-wise separable convolution is well known as a powerful and effective alternative to standard convolution in environments with limited computing resources.[1] MobileNetV2 introduces the inverted residual block. Depth-wise separable convolution comes with a loss: it gives up the opportunity to combine data across channels to create new features. The inverted residual block overcomes this loss by placing point-wise convolutions (1×1 convolutions) on both sides of the depth-wise separable convolution.[1] However, the cost of a 1×1 convolution depends on the number of channels, so as the network gets deeper it becomes a bottleneck in building an efficient, lightweight network. This paper resolves the bottleneck by partially replacing 1×1 convolutions with a channel-wise squeeze and excitation (CSE) block.
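
The channel dependence described in this abstract can be illustrated with a simple weight count. The sketch below is not taken from the paper: the function names are hypothetical, and it assumes a 3×3 kernel, equal input and output channel counts, and no bias terms. Under those assumptions it shows how the point-wise (1×1) part comes to dominate the weights of a depth-wise separable convolution as the channel count grows.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise separable convolution, split by stage.

    Returns (depthwise, pointwise):
      depthwise: one k x k filter per input channel
      pointwise: a 1 x 1 convolution mixing c_in channels into c_out
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise, pointwise


# Illustrative channel widths (assumed, not from the paper).
for channels in (32, 128, 512):
    std = standard_conv_params(channels, channels, 3)
    dw, pw = depthwise_separable_params(channels, channels, 3)
    share = pw / (dw + pw)
    print(f"C={channels}: standard={std}, depthwise={dw}, "
          f"pointwise={pw}, pointwise share={share:.3f}")
```

With equal input and output widths C and a 3×3 kernel, the point-wise share is C / (C + 9), which approaches 1 as C grows. This is one way to read the abstract's claim that the 1×1 convolution becomes the bottleneck in deeper, wider layers.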

FGW-FER: Lightweight Facial Expression Recognition with Attention

  • Huy-Hoang Dinh; Hong-Quan Do; Trung-Tung Doan; Cuong Le; Ngo Xuan Bach; Tu Minh Phuong; Viet-Vu Vu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2505-2528 / 2023
  • The field of facial expression recognition (FER) has been actively researched to improve human-computer interaction. In recent years, deep learning techniques have gained popularity for addressing FER, with numerous studies proposing end-to-end frameworks that stack or widen significant convolutional neural network layers. While this has led to improved performance, it has also resulted in larger model sizes and longer inference times. To overcome this challenge, our work introduces a novel lightweight model architecture. The architecture incorporates three key factors: Depth-wise Separable Convolution, Residual Block, and Attention Modules. By doing so, we aim to strike a balance between model size, inference speed, and accuracy in FER tasks. Through extensive experimentation on popular benchmark FER datasets, our proposed method has demonstrated promising results. Notably, it stands out due to its substantial reduction in parameter count and faster inference time, while maintaining accuracy levels comparable to other lightweight models discussed in the existing literature.

High-Speed Transformer for Panoptic Segmentation

  • Baek, Jong-Hyeon; Kim, Dae-Hyun; Lee, Hee-Kyung; Choo, Hyon-Gon; Koh, Yeong Jun
    • Journal of Broadcast Engineering / v.27 no.7 / pp.1011-1020 / 2022
  • Recent high-performance panoptic segmentation models are based on transformer architectures. However, transformer-based panoptic segmentation methods are generally slower than convolution-based methods, since the attention mechanism in the transformer has quadratic complexity w.r.t. image resolution. The sine and cosine computation for positional embedding in the transformer is a further bottleneck for computation time. To address these problems, we adopt three modules to speed up the inference runtime of transformer-based panoptic segmentation. First, we perform channel-level reduction using depth-wise separable convolution on the inputs of the transformer decoder. Second, we replace sine and cosine-based positional encoding with convolution operations, called conv-embedding. Third, we apply separable self-attention to the transformer encoder to lower the complexity from quadratic to linear in the number of image pixels. As a result, when all three modules are used, the proposed model achieves 44% higher frames per second than the baseline on the ADE20K panoptic validation dataset.
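
The gap between quadratic and linear cost in the number of pixels can be made concrete with a rough count. The sketch below is an illustration only, not the paper's method: the function names are hypothetical, each pixel of the feature map is assumed to be one token, and the linear variant is assumed to need a single score per token.

```python
def pairwise_attention_scores(height, width):
    """Scores in full self-attention: one per (query, key) token pair."""
    tokens = height * width
    return tokens * tokens


def linear_attention_scores(height, width):
    """Scores in a hypothetical linear scheme: one per token (assumed)."""
    return height * width


# Illustrative square feature-map sizes (assumed, not from the paper).
for side in (16, 32, 64):
    full = pairwise_attention_scores(side, side)
    linear = linear_attention_scores(side, side)
    print(f"{side}x{side}: pairwise={full}, linear={linear}, "
          f"ratio={full // linear}")
```

Doubling each side of the feature map multiplies the token count by 4, so the pairwise count grows by a factor of 16 while the linear count grows by a factor of 4.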

Modulation Recognition of MIMO Systems Based on Dimensional Interactive Lightweight Network

  • Aer, Sileng; Zhang, Xiaolin; Wang, Zhenduo; Wang, Kailin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.10 / pp.3458-3478 / 2022
  • Automatic modulation recognition is the core algorithm in modulation classification for communication systems. Our investigations show that deep learning (DL) based modulation recognition techniques have made effective progress for multiple-input multiple-output (MIMO) systems. However, high classification accuracy tends to come with high network complexity, which makes such models impractical. Therefore, in this paper, we propose a low-complexity dimensional interactive lightweight network (DilNet) for MIMO systems. Specifically, the signals received by different antennas are cooperatively input into the network, and the network's computational cost is reduced through depth-wise separable convolution. A two-dimensional interactive attention (TDIA) module is designed to extract interactive information across different dimensions and improve the effectiveness of the cooperation features. In addition, the TDIA module keeps complexity low by compressing the convolution dimension, so the computational burden after inserting TDIA remains acceptable. Finally, the network is trained with a penalized statistical entropy loss function. Simulation results show that, compared to existing modulation recognition methods, the proposed DilNet dramatically reduces model complexity. The network trained with penalized statistical entropy also achieves better recognition accuracy in MIMO systems.