• Title/Summary/Keyword: memory size reduction

Search Result 97, Processing Time 0.024 seconds

Comparison of Fall Detection Systems Based on YOLOPose and Long Short-Term Memory

  • Seung Su Jeong;Nam Ho Kim;Yun Seop Yu
    • Journal of information and communication convergence engineering
    • /
    • v.22 no.2
    • /
    • pp.139-144
    • /
    • 2024
  • In this study, four types of fall detection systems - designed with YOLOPose, principal component analysis (PCA), convolutional neural network (CNN), and long short-term memory (LSTM) architectures - were developed and compared in the detection of everyday falls. The experimental dataset encompassed seven types of activities: walking, lying, jumping, jumping in activities of daily living, falling backward, falling forward, and falling sideways. Keypoints extracted from YOLOPose were entered into the following architectures: RAW-LSTM, PCA-LSTM, RAW-PCA-LSTM, and PCA-CNN-LSTM. For the PCA architectures, the reduced input size stemming from a dimensionality reduction enhanced the operational efficiency in terms of computational time and memory at the cost of decreased accuracy. In contrast, the addition of a CNN resulted in higher complexity and lower accuracy. The RAW-LSTM architecture, which did not include either PCA or CNN, had the least number of parameters, which resulted in the best computational time and memory while also achieving the highest accuracy.

Performance Analysis of Adaptive Partition Cache Replacement using Various Monitoring Ratios for Non-volatile Memory Systems

  • Hwang, Sang-Ho;Kwak, Jong Wook
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.4
    • /
    • pp.1-8
    • /
    • 2018
  • In this paper, we propose an adaptive partition cache replacement policy and evaluate the performance of our scheme using various monitoring ratios to help lifetime extension of non-volatile main memory systems without performance degradation. The proposal combines conventional LRU (Least Recently Used) replacement policy and Early Eviction Zone (E2Z), which considers a dirty bit as well as LRU bits to select a candidate block. In particular, this paper shows the performance of non-volatile memory using various monitoring ratios and determines optimized monitoring ratio and partition size of E2Z for reducing the number of writebacks using cache hit counter logic and hit predictor. In the experiment evaluation, we showed that 1:128 combination provided the best results of writebacks and runtime, in terms of performance and complexity trade-off relation, and our proposal yielded up to 42% reduction of writebacks, compared with others.

A New Predictive EC Algorithm for Reduction of Memory Size and Bandwidth Requirements in Wavelet Transform (웨이블릿 변환의 메모리 크기와 대역폭 감소를 위한 Prediction 기반의 Embedded Compression 알고리즘)

  • Choi, Woo-Soo;Son, Chang-Hoon;Kim, Ji-Won;Na, Seong-Yu;Kim, Young-Min
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.7
    • /
    • pp.917-923
    • /
    • 2011
  • In this paper, a new prediction based embedded compression (EC) codec algorithm for the JPEG2000 encoder system is proposed to reduce excessive memory requirements. The EC technique can reduce the 50 % memory requirement for intermediate low-frequency coefficients during multiple discrete wavelet transform (DWT) stages compared with direct implementation of the DWT engine of this paper. The LOCO-I predictor and MAP are widely used in many lossless picture compression codec. The proposed EC algorithm use these predictor which are very simple but surprisingly effective. The predictive EC scheme adopts a forward adaptive quantization and fixed length coding to encoding the prediction error. Simulation results show that our LOCO-I and MAP based EC codecs present only PSNR degradation of 0.48 and 0.26 dB in average, respectively. The proposed algorithm improves the average PSNR by 1.39 dB compared to the previous work in [9].

Synthesis and application of Pt and hybrid Pt-$SiO_2$ nanoparticles and control of particles layer thickness (Pt 나노입자와 Hybrid Pt-$SiO_2$ 나노입자의 합성과 활용 및 입자박막 제어)

  • Choi, Byung-Sang
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.4 no.4
    • /
    • pp.301-305
    • /
    • 2009
  • Pt nanoparticles with a narrow size distribution (dia. ~4 nm) were synthesized via an alcohol reduction method and used for the fabrication of hybrid Pt-$SiO_2$ nanoparticles. Also, the self-assembled monolayer of Pt nanoparticles (NPs) was studied as a charge trapping layer for non-volatile memory (NVM) applications. A metal-oxide-semiconductor (MOS) type memory device with Pt NPs exhibits a relatively large memory window. These results indicate that the self-assembled Pt NPs can be utilized for NVM devices. In addition, it was tried to show the control of thin-film thickness of hybrid Pt-$SiO_2$ nanoparticles indicating the possibility of much applications for the MOS type memory devices.

  • PDF

Twiddle Factor Index Generate Method for Memory Reduction in R2SDF FFT (R2SDF FFT의 메모리 감소를 위한 회전인자 인덱스 생성방법)

  • Yang, Seung-Won;Kim, Yong-Eun;Lee, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.5
    • /
    • pp.32-38
    • /
    • 2009
  • FTT(Fast Fourier Transform) processor is widely used in OFDM(Orthogonal Frequency Division Multiplesing) system. Because of the increased requirement of mobility and bandwidth in the OFDM system, they need large point FTT processor. Since the size of memory which stores the twiddle factor coefficients are proportional to the N of FFT size, we propose a new method by which we can reduce the size of the coefficient memory. In the proposed method, we exploit a counter and unsigned multiplier to generate the twiddle factor indices. To verify the proposed algorithm, we design TFCGs(Twiddle Factor Coefficient Generator) for 1024pint FFTs with R2SDF(Radix-2 Single-Path Delay Feedback), $R2^3SDF,\;R2^3SDF,\;R2^4SDF$ architectures. The size of ROM is reduced to 1/8N. In the case of $R2^4SDF$ architecture, the area and the power are reduced by 57.9%, 57.5% respectively.

A Study of Integral Image Hardware Design for Memory Size Efficiency (메모리 크기에 효율적인 적분영상 하드웨어 설계 연구)

  • Lee, Su-Hyun;Jeong, Yong-Jin
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.9
    • /
    • pp.75-81
    • /
    • 2014
  • The integral image is the sum of input image pixel values. It is mainly used to speed up processing of a box filter operation, such as Haar-like features. However, large memory for integral image data can be an obstacle on an embedded hardware environment with limited memory resources. Therefore, an efficient method to store the integral image is necessary. In this paper, we propose a memory size reduction hardware design for integral image. The hardware design is used two methods. It is the new integral image memory and modulo calculation for reducing integral image data. The new integral image memory has additional calculation overhead, but it is not obstacle in hardware environment that parallel processing is possible. In the Xilinx Virtex5-LX330T targeted experimental result, integral image memory can be reduced by 50% on a $640{\times}480$ 8-bit gray-scale input image.

Fast Super-Resolution Algorithm Based on Dictionary Size Reduction Using k-Means Clustering

  • Jeong, Shin-Cheol;Song, Byung-Cheol
    • ETRI Journal
    • /
    • v.32 no.4
    • /
    • pp.596-602
    • /
    • 2010
  • This paper proposes a computationally efficient learning-based super-resolution algorithm using k-means clustering. Conventional learning-based super-resolution requires a huge dictionary for reliable performance, which brings about a tremendous memory cost as well as a burdensome matching computation. In order to overcome this problem, the proposed algorithm significantly reduces the size of the trained dictionary by properly clustering similar patches at the learning phase. Experimental results show that the proposed algorithm provides superior visual quality to the conventional algorithms, while needing much less computational complexity.

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Bae Keunsung
    • MALSORI
    • /
    • no.52
    • /
    • pp.111-120
    • /
    • 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle several hundred words of vocabulary size as well as speaker independency. First, we develop an HMM-based speech recognition system on the PC that operates on the frame basis with parallel processing of feature extraction and Viterbi decoding to make the processing delay as small as possible. Many techniques such as linear discriminant analysis, state-based Gaussian selection, and phonetic tied mixture model are employed for reduction of computational burden and memory size. The system is then properly optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486kbytes of memory for data and acoustic models, and 24.5 kbytes for program code. Maximum required time of 29.2 ms for processing a frame of 32 ms of speech validates real-time operation of the implemented system.

  • PDF

270 MHz Full HD H.264/AVC High Profile Encoder with Shared Multibank Memory-Based Fast Motion Estimation

  • Lee, Suk-Ho;Park, Seong-Mo;Park, Jong-Won
    • ETRI Journal
    • /
    • v.31 no.6
    • /
    • pp.784-794
    • /
    • 2009
  • We present a full HD (1080p) H.264/AVC High Profile hardware encoder based on fast motion estimation (ME). Most processing cycles are occupied with ME and use external memory access to fetch samples, which degrades the performance of the encoder. A novel approach to fast ME which uses shared multibank memory can solve these problems. The proposed pixel subsampling ME algorithm is suitable for fast motion vector searches for high-quality resolution images. The proposed algorithm achieves an 87.5% reduction of computational complexity compared with the full search algorithm in the JM reference software, while sustaining the video quality without any conspicuous PSNR loss. The usage amount of shared multibank memory between the coarse ME and fine ME blocks is 93.6%, which saves external memory access cycles and speeds up ME. It is feasible to perform the algorithm at a 270 MHz clock speed for 30 frame/s real-time full HD encoding. Its total gate count is 872k, and internal SRAM size is 41.8 kB.

Memory Reduction of IFFT Using Combined Integer Mapping for OFDM Transmitters (CIM(Combined Integer Mapping)을 이용한 OFDM 송신기의 IFFT 메모리 감소)

  • Lee, Jae-Kyung;Jang, In-Gul;Chung, Jin-Gyun;Lee, Chul-Dong
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.47 no.10
    • /
    • pp.36-42
    • /
    • 2010
  • FFT(Fast Fourier Transform) processor is one of the key components in the implementation of OFDM systems for many wireless standards such as IEEE 802.22. To improve the performances of FFT processors, various studies have been carried out to reduce the complexities of multipliers, memory interface, control schemes and so on. While the number of FFT stages increases logarithmically $log_2N$) as the FFT point-size (N) increases, the number of required registers (or, memories) increases linearly. In large point-size FFT designs, the registers occupy more than 70% of the chip area. In this paper, to reduce the memory size of IFFT for OFDM transmitters, we propose a new IFFT design method based on a combined mapping of modulated data, pilot and null signals. The proposed method focuses on reducing the sizes of the registers in the first two stages of the IFFT architectures since the first two stages require 75% of the total registers. By simulations of 2048-point IFFT design for cognitive radio systems, it is shown that the proposed IFFT design method achieves more than 38.5% area reduction compared with previous IFFT designs.