• Title/Summary/Keyword: lightweight model

Optimizing 2-stage Tiling-based Matrix Multiplication in FPGA-based Neural Network Accelerator (FPGA기반 뉴럴네트워크 가속기에서 2차 타일링 기반 행렬 곱셈 최적화)

  • Jinse, Kwon;Jemin, Lee;Yongin, Kwon;Jeman, Park;Misun, Yu;Taeho, Kim;Hyungshin, Kim
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.6 / pp.367-374 / 2022
  • The acceleration of neural networks has become an important topic in the field of computer vision. An accelerator is essential for speeding up lightweight models. Most accelerator-supported operators focus on direct convolution operations. If the accelerator does not provide a GEMM operation, GEMM is usually performed on the CPU instead. In this paper, we propose an optimization technique for 2-stage tiling-based GEMM routines on VTA. We improve the performance of the matrix multiplication routine by maximizing the reuse of the input matrix and optimizing the operation pipelining. In addition, we apply the proposed technique to the DarkNet framework to check the performance improvement of the matrix multiplication routine. The proposed GEMM method shows a performance improvement of more than 2.4 times over the non-optimized GEMM method. The inference performance of our DarkNet framework also improves by at least 2.3 times.
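
To make the idea of 2-stage tiling in the abstract above concrete, here is a minimal sketch in plain Python. It is not the authors' VTA routine: the function names, the tile sizes `outer` and `inner`, and the loop order are assumptions chosen for illustration. The outer tiles stand in for blocks that would fit an on-chip buffer, and the inner tiles for blocks that a compute unit would consume at once.

```python
def matmul_naive(a, b):
    """Reference matrix product used to check the tiled version."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_two_stage_tiled(a, b, outer=4, inner=2):
    """Two-level tiled matrix product (illustrative tile sizes, not from the paper)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Stage 1: walk over outer tiles (hypothetically sized to an on-chip buffer).
    for i0 in range(0, n, outer):
        for j0 in range(0, m, outer):
            for p0 in range(0, k, outer):
                i_end = min(i0 + outer, n)
                j_end = min(j0 + outer, m)
                p_end = min(p0 + outer, k)
                # Stage 2: split each outer tile into inner tiles.
                for i1 in range(i0, i_end, inner):
                    for j1 in range(j0, j_end, inner):
                        for p1 in range(p0, p_end, inner):
                            for i in range(i1, min(i1 + inner, i_end)):
                                for p in range(p1, min(p1 + inner, p_end)):
                                    a_ip = a[i][p]  # loaded once, reused across j
                                    for j in range(j1, min(j1 + inner, j_end)):
                                        c[i][j] += a_ip * b[p][j]
    return c
```

For example, `matmul_two_stage_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, the same as the naive product. Tiling only changes the order in which partial products are accumulated, which is what allows an input block to be reused while it is resident in fast memory.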

Design and Its Applications of a Hypercube Grid Quorum for Distributed Pub/Sub Architectures in IoTs (사물인터넷에서 분산 발행/구독 구조를 위한 하이퍼큐브 격자 쿼럼의 설계 및 응용)

  • Bae, Ihnhan
    • Journal of Korea Multimedia Society / v.25 no.8 / pp.1075-1084 / 2022
  • The Internet of Things (IoT) has become a key technology for efficiently implementing device-to-device (D2D) services in various domains such as smart home, healthcare, smart city, agriculture, energy, logistics, and transportation. A lightweight publish/subscribe (Pub/Sub) messaging protocol not only establishes the data dissemination pattern but also supports connectivity between IoT devices and their applications. A Pub/Sub broker is deployed to facilitate data exchange among IoT devices, and scalable edge-based Pub/Sub broker overlay networks support latency-sensitive IoT applications. In this paper, we design a hypercube grid quorum (HGQ) for IoT applications based on distributed Pub/Sub systems. In designing HGQ, a network of hypercube structures suitable for the publish/subscribe model is built in the edge layer, and the proposed HGQ is designed by embedding a mesh overlay network in the hypercube. As an application, we propose an HGQ-based mechanism for disseminating sensor data or the messages/events of IoT devices in IoT environments. The performance of HGQ is evaluated using analytical models. The results show that the latency and load balancing of applications based on the distributed Pub/Sub system using HGQ are improved.

Influence of interfacial adhesive on the failure mechanisms of truss core sandwich panels under in-plane compression

  • Zarei, Mohammad J.;Hatami, Shahabeddin;Gholami, Mohammad
    • Steel and Composite Structures / v.44 no.4 / pp.519-529 / 2022
  • Sandwich structures with superior mechanical properties such as high stiffness and strength-to-weight ratio, good thermal insulation, and high energy absorption capacity are used today in the aerospace, automotive, marine, and civil engineering industries. These structures are composed of moderately stiff, thin face sheets that withstand the majority of transverse and in-plane loads, separated by a thick, lightweight core that resists shear forces. In this research, the finite element technique is used to simulate a sandwich panel with a truss core under axial compressive stress using ABAQUS software. A review of past experimental studies shows that the bondline between the core and face sheets plays a vital role in the critical failure load. Therefore, the model analyzes the damage initiation modes and debonding between the face sheet and core using cohesive surface contact with a traction-separation model. The results show that the adhesive stiffness has a significant influence on the critical failure load of the specimens. To achieve the full strength of the structure as a continuum, a lower limit is obtained for the adhesive stiffness. By providing this limit stiffness between the core and the panel face sheets, sudden failure of the structure can be prevented.

Pixel-Wise Polynomial Estimation Model for Low-Light Image Enhancement

  • Muhammad Tahir Rasheed;Daming Shi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2483-2504 / 2023
  • Most existing low-light enhancement algorithms either use a large number of training parameters or lack generalization to real-world scenarios. This paper presents a novel lightweight and robust pixel-wise polynomial approximation-based deep network for low-light image enhancement. For mapping the low-light image to the enhanced image, pixel-wise higher-order polynomials are employed. A deep convolution network is used to estimate the coefficients of these higher-order polynomials. The proposed network uses multiple branches to estimate pixel values based on different receptive fields. With a smaller receptive field, the first branch enhances local features, the second and third branches focus on medium-level features, and the last branch enhances global features. The low-light image is downsampled by a factor of $2^{b-1}$ (where b is the branch number) and fed as input to each branch. After combining the outputs of each branch, the final enhanced image is obtained. A comprehensive evaluation of our proposed network on six publicly available no-reference test datasets shows that it outperforms state-of-the-art methods on both quantitative and qualitative measures.
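
The pixel-wise polynomial mapping mentioned in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the function name `apply_pixelwise_polynomial`, the coefficient layout (one list `[a0, a1, ..., an]` per pixel), and the clamping to `[0, 1]` are hypothetical and not taken from the paper. In the paper the coefficients come from a deep network; here they are simply passed in.

```python
def apply_pixelwise_polynomial(image, coeffs):
    """Map each pixel x to a0 + a1*x + ... + an*x**n with its own coefficients.

    image:  H x W nested list of intensities, assumed to lie in [0, 1]
    coeffs: H x W nested list; coeffs[r][c] is [a0, a1, ..., an] for that pixel
    """
    enhanced = []
    for pixel_row, coeff_row in zip(image, coeffs):
        out_row = []
        for x, pixel_coeffs in zip(pixel_row, coeff_row):
            y = 0.0
            # Horner's rule: evaluate from the highest-order coefficient down.
            for a in reversed(pixel_coeffs):
                y = y * x + a
            # Clamp to the valid intensity range (an assumption of this sketch).
            out_row.append(min(1.0, max(0.0, y)))
        enhanced.append(out_row)
    return enhanced
```

With a single pixel of intensity 0.5 and coefficients `[0.0, 1.0, 1.0]`, the mapping gives 0.5 + 0.5² = 0.75, so a dark pixel is brightened by a curve that belongs to that pixel alone.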

Design and Implementation of Radar Signal Processing System for Vehicle Door Collision Prevention (차량 도어 충돌 방지용 레이다 신호처리 시스템 설계 및 구현)

  • Jeongwoo Han;Minsang Kim;Daehong Kim;Yunho Jung
    • Journal of IKEEE / v.28 no.3 / pp.397-404 / 2024
  • This paper presents the design and implementation results of a Raspberry Pi-based embedded system with an FPGA accelerator that detects and classifies objects using an FMCW radar sensor to prevent door collision accidents in vehicles. The proposed system performs radar sensor signal processing and deep learning processing that classifies objects into bicycles, automobiles, and pedestrians. Because the CNN algorithm requires substantial computation and memory, it is not well suited to embedded systems. To address this, we implemented a lightweight deep learning model, a BNN, optimized for embedded systems on an FPGA, and verified that it achieves a classification accuracy of 90.33% and an execution time of 20 ms.
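
Assuming BNN in the abstract above refers to a binarized neural network, the sketch below shows why such models suit FPGAs: a dot product of +1/-1 vectors reduces to XNOR and a bit count. The helper names `pack_signs` and `binary_dot` and the bit encoding (+1 as 1, -1 as 0) are assumptions for illustration. The abstract does not describe the authors' hardware design, so this is the commonly used XNOR-popcount formulation rather than their implementation.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int, one bit each (+1 -> 1, -1 -> 0)."""
    bits = 0
    for position, value in enumerate(values):
        if value == 1:
            bits |= 1 << position
    return bits


def binary_dot(a_bits, w_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    # XNOR marks the positions where the two signs agree.
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n
```

For activations `[1, -1, 1, 1]` and weights `[1, 1, -1, 1]`, `binary_dot(pack_signs(a), pack_signs(w), 4)` gives 0, matching the ordinary dot product 1 - 1 - 1 + 1.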

Lightening of Human Pose Estimation Algorithm Using MobileViT and Transfer Learning

  • Kunwoo Kim;Jonghyun Hong;Jonghyuk Park
    • Journal of the Korea Society of Computer and Information / v.28 no.9 / pp.17-25 / 2023
  • In this paper, we propose a MobileViT-based model that performs human pose estimation with fewer parameters and faster estimation. The base model achieves its lightweight design through a structure that combines features of convolutional neural networks with features of the Vision Transformer. The Transformer, which is a major mechanism in this study, has become more influential as Transformer-based models outperform convolutional neural network-based models in the field of computer vision. Similarly, in the field of human pose estimation, the Vision Transformer-based ViTPose maintains the best performance on all human pose estimation benchmarks such as COCO, OCHuman, and MPII. However, because the Vision Transformer has a heavy model structure with a large number of parameters and requires a relatively large amount of computation, it is costly for users to train. Accordingly, the base model addresses the Vision Transformer's lack of inductive bias, which leads to a large amount of computation, with local representations obtained through a convolutional neural network structure. Finally, the proposed model obtained a mean average precision of 0.694 on the MS COCO benchmark with 3.28 GFLOPs and 9.72 million parameters, which are 1/5 and 1/9 of the corresponding figures for ViTPose, respectively.

Strength Design of Lightweight Composite Bicycle Frame (복합재료 라미네이트 경량화 자전거 프레임의 강도 설계)

  • Lee, Jin Ah;Hong, Hyoung Taek;Chun, Heung Jae
    • Transactions of the Korean Society of Mechanical Engineers A / v.37 no.2 / pp.265-270 / 2013
  • Strength design for a lightweight bicycle frame made of carbon/epoxy composite laminates was studied using Tsai-Wu's failure criterion. For the design of bicycle frames, reducing the weight of the frame is of great importance. Furthermore, the frame should satisfy the required strength under specific loading cases. In accordance with the European EN 14764 standard for bicycle frames, three loading cases (pedaling, vertical, and level loadings) were investigated in this study. Because of the anisotropic characteristics of composite materials, it is important to decide the appropriate stacking sequence and the number of layers to be used in the composite bicycle frame. From finite element analysis results, the most suitable stacking sequence of the fiber orientation and the number of layers were determined. The stacking sequences of $[0]_{8n}$, $[90]_{8n}$, $[0/90]_{2ns}$, $[{\pm}45]_{2ns}$, $[0/{\pm}45/90]_{ns}$ (n = 1, 2, 3, 4) were used in the analysis. The results indicated that the $[0/{\pm}45/90]_{3s}$ lay-up model was suitable for a composite bicycle frame. Furthermore, the weakest point and layer were investigated.
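
The Tsai-Wu failure criterion named in the abstract above can be evaluated per ply as in the sketch below, which uses the standard plane-stress form. The function name `tsai_wu_index` is hypothetical. The strength values in the usage note are illustrative, not from the paper. The interaction term uses the common default $F_{12} = -0.5\sqrt{F_{11}F_{22}}$, which is an assumption because the abstract does not say which value the authors used.

```python
import math


def tsai_wu_index(s1, s2, t12, Xt, Xc, Yt, Yc, S, f12_star=-0.5):
    """Plane-stress Tsai-Wu failure index for one ply; failure is predicted at >= 1.

    s1, s2, t12: stresses in the material axes (fiber, transverse, in-plane shear)
    Xt, Xc:      tensile / compressive strength along the fibers (positive values)
    Yt, Yc:      tensile / compressive strength transverse to the fibers (positive)
    S:           in-plane shear strength (positive)
    f12_star:    normalized interaction coefficient (assumed -0.5 by default)
    """
    F1 = 1.0 / Xt - 1.0 / Xc
    F2 = 1.0 / Yt - 1.0 / Yc
    F11 = 1.0 / (Xt * Xc)
    F22 = 1.0 / (Yt * Yc)
    F66 = 1.0 / S ** 2
    F12 = f12_star * math.sqrt(F11 * F22)
    return (F1 * s1 + F2 * s2
            + F11 * s1 ** 2 + F22 * s2 ** 2 + F66 * t12 ** 2
            + 2.0 * F12 * s1 * s2)
```

As a sanity check, with illustrative strengths `Xt=1500, Xc=1200, Yt=50, Yc=250, S=70` (MPa), a pure fiber-direction stress equal to `Xt` gives an index of 1, the failure boundary. A stacking-sequence study like the one described would evaluate this index ply by ply for each loading case and look for the lay-up that keeps it below 1 everywhere.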

S-PRESENT Cryptanalysis through Know-Plaintext Attack Based on Deep Learning (딥러닝 기반의 알려진 평문 공격을 통한 S-PRESENT 분석)

  • Se-jin Lim;Hyun-Ji Kim;Kyung-Bae Jang;Yea-jun Kang;Won-Woong Kim;Yu-Jin Yang;Hwa-Jeong Seo
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.2 / pp.193-200 / 2023
  • Cryptanalysis can be performed using various techniques such as known-plaintext attacks, differential attacks, and side-channel analysis. Recently, many studies have been conducted on cryptanalysis using deep learning. A known-plaintext attack is a technique that uses a known plaintext and ciphertext pair to find a key. In this paper, we use deep learning to perform a known-plaintext attack against S-PRESENT, a reduced version of the lightweight block cipher PRESENT. This paper is significant in that it is the first known-plaintext attack based on deep learning performed on a reduced lightweight block cipher. For cryptanalysis, MLP (Multi-Layer Perceptron) and 1D and 2D CNN (Convolutional Neural Network) models are used and optimized, and the performance of the three models is compared. The 2D convolutional neural network showed the highest performance, but the attack succeeded only for a limited part of the key space. This shows that known-plaintext attacks using the MLP model and convolutional neural networks are limited in the number of key bits they can attack.

Analysis of Scoring Difficulty in Different Match Situations in Relation to First Athlete to Score in World Taekwondo Athletes (세계태권도 겨루기 선수들의 선제득점에 따른 경기 내용별 득점 난이도 분석)

  • Mi-Na Jin;Jung-Hyun Yun;Chang-Jin Lee
    • Journal of Industrial Convergence / v.22 no.4 / pp.21-29 / 2024
  • This study analyzed the difficulty of scoring in different match situations in relation to which competitor scored first. The study analyzed the data from the 2022 Guadalajara World Taekwondo Championships. The analysis was performed for two separate weight classes: lightweight and heavyweight. Four game content variables were used: whether the athlete scored first, attack type, attack area, and game situation. Descriptive statistics, the Rasch model, and discrimination function questions were applied for data processing. SPSS and Winsteps were used for the statistical analysis, and the statistical significance level was set at 0.05. Consequently, in the lightweight class, the scoring frequency of the first scorer was high for all the game variables. In the heavyweight class, the scoring frequency for the first scorer was high for the attack type and attack area. By contrast, those who did not score first were more frequently found to be in a loss situation. By analyzing the scoring difficulties in different match situations based on whether the competitor scored first, the athletes who scored first in attack type most easily scored first. In losing situations, the athletes who scored first in attack area scored most easily, whereas those who did not score first scored most easily in body and match situations. For the heavyweight class, those who scored first in terms of attack type, counter-attack, and attack area scored the most easily while winning in body and match situations.

Energy-efficient Routing in MIMO-based Mobile Ad hoc Networks with Multiplexing and Diversity Gains

  • Shen, Hu;Lv, Shaohe;Wang, Xiaodong;Zhou, Xingming
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.2 / pp.700-713 / 2015
  • It is critical to design energy-efficient routing protocols for battery-limited mobile ad hoc networks, especially those in which energy-consuming MIMO techniques are employed. However, there are several challenges in such a design: first, it is difficult to characterize the energy consumption of a MIMO-based link; second, without a careful design, the broadcasted RREP packets, which are used in most energy-efficient routing protocols, could flood the network, and the destination node cannot decide when to reply to the communication request; third, due to node mobility and persistent channel degradation, the selected route paths would break down frequently and hence the protocol overhead is increased further. To address these issues, in this paper, a novel Greedy Energy-Efficient Routing (GEER) protocol is proposed: (a) a generalized energy consumption model for the MIMO-based link, considering the trade-off between multiplexing and diversity gains, is derived to minimize link energy consumption and obtain the optimal transmit model; (b) a simple greedy route discovery algorithm and a novel adaptive reply strategy are adopted to speed up path setup with a reduced establishment overhead; (c) a lightweight route maintenance mechanism is introduced to adaptively rebuild the broken links. Extensive simulation results show that, in comparison with the conventional solutions, the proposed GEER protocol can significantly reduce the energy consumption by up to 68.74%.