• Title/Summary/Keyword: 연산 지도


Design of Unified HEVC/VP9 4×4 Transform Block (HEVC/VP9 4×4 Transform 통합 블록 설계)

  • Jung, Seulkee;Lee, Seongsoo
    • Journal of IKEEE / v.19 no.3 / pp.392-399 / 2015
  • This paper proposes a unified $4{\times}4$ transform architecture for the HEVC and VP9 codecs to reduce hardware size. It performs the HEVC $4{\times}4$ IDCT, HEVC $4{\times}4$ IDST, VP9 $4{\times}4$ IDCT, and VP9 $4{\times}4$ IADST in a single hardware block. The HEVC and VP9 $4{\times}4$ IDCTs share the same IDCT computation except for the scales of the coefficients. Similarly, the HEVC $4{\times}4$ IDST and the VP9 $4{\times}4$ IADST share the same IDST computation except for the scales of the coefficients. Furthermore, the IDCT and IDST are similar enough to share some hardware. The proposed design therefore performs all four operations in one block, where each operation has its own multiplication coefficients and the butterfly adders are shared. The block synthesized in 0.18-µm technology occupies 6,679 gates, a 25.3% reduction in gate count compared with conventional designs.
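The shared-butterfly idea in this abstract can be illustrated with a small sketch. It uses the HEVC $4{\times}4$ core transform matrix and shows that a 1-D inverse transform computed with one butterfly structure matches the plain matrix product. The function names and the coefficient parameters are hypothetical, rounding and shift stages are omitted, and this is not the architecture from the paper.

```python
# HEVC 4x4 core transform matrix (each row is one basis vector).
HEVC_DCT4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def idct4_matrix(coeffs):
    """Reference 1-D inverse: out[n] = sum over k of M[k][n] * coeffs[k]."""
    return [sum(HEVC_DCT4[k][n] * coeffs[k] for k in range(4)) for n in range(4)]


def idct4_butterfly(coeffs, c_even=64, c_odd_a=83, c_odd_b=36):
    """Same result via an even/odd butterfly.

    The coefficients are parameters (a hypothetical interface) to mirror the
    idea that only the multiplier values change between codecs while the
    adders stay the same.
    """
    x0, x1, x2, x3 = coeffs
    e0 = c_even * (x0 + x2)            # even part
    e1 = c_even * (x0 - x2)
    o0 = c_odd_a * x1 + c_odd_b * x3   # odd part
    o1 = c_odd_b * x1 - c_odd_a * x3
    return [e0 + o0, e1 + o1, e1 - o1, e0 - o0]


if __name__ == "__main__":
    sample = [10, -3, 7, 2]
    print(idct4_matrix(sample))
    print(idct4_butterfly(sample))
```

The butterfly needs six multiplications instead of the sixteen in the matrix form, which is the kind of saving that makes sharing the adder network attractive.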

Design and Fabrication of a Processing Element for 2-D Systolic FFT Array (고속 퓨리어변환용 2차원 시스토릭 어레이를 위한 처리요소의 설계 및 제작)

  • Lee, Moon-Key;Shin, Kyung-Wook;Choi, Byeong-Yoon
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.3 / pp.108-115 / 1990
  • This paper describes the design and fabrication of a processing element to be used as a component of a two-dimensional systolic array for FFT. The chip performs data shuffling and radix-2 decimation-in-time (DIT) butterfly arithmetic. It consists of a data routing unit, internal control logic, and an HBA unit that computes the butterfly arithmetic. The 6.5K-transistor processing element, designed with standard cells, was fabricated in a 2-µm double-metal CMOS process and evaluated by wafer probing. The measured characteristics show that an HBA can be computed in 0.5 µs with a 20-MHz clock, and it is estimated that a 1024-point FFT can be computed in 11.2 µs.
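For readers unfamiliar with the radix-2 DIT butterfly mentioned above, the following is a minimal software sketch. The function names are hypothetical, and the recursive driver is only a convenient way to exercise the butterfly; it says nothing about how the systolic array schedules its processing elements.

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 DIT butterfly: returns (a + w*b, a - w*b)."""
    t = w * b
    return a + t, a - t


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_dit(x[0::2])   # FFT of even-indexed samples
    odd = fft_dit(x[1::2])    # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


if __name__ == "__main__":
    print(fft_dit([1, 0, 0, 0, 0, 0, 0, 0]))
```

An N-point radix-2 FFT uses (N/2)·log2(N) butterflies, so a 1024-point transform needs 5,120 of them.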


Accuracy Analysis of Fixed Point Arithmetic for Hardware Implementation of Binary Weight Network (이진 가중치 신경망의 하드웨어 구현을 위한 고정소수점 연산 정확도 분석)

  • Kim, Jong-Hyun;Yun, SangKyun
    • Journal of IKEEE / v.22 no.3 / pp.805-809 / 2018
  • In this paper, we analyze how accuracy changes when fixed-point arithmetic is used instead of floating-point arithmetic in a binary weight network (BWN). We observed the change in accuracy while varying the total bit width and the fraction bit width. If the integer part is unchanged by the fixed-point approximation, accuracy does not decrease significantly compared with floating-point operation. When overflow occurs in the integer part, approximating the value by the maximum or minimum of the fixed-point representation minimizes the decrease in accuracy. The results of this paper can be applied to minimizing memory and hardware resource requirements in the implementation of an FPGA-based BWN accelerator.
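The overflow handling described in this abstract corresponds to saturating fixed-point conversion. A minimal sketch follows, assuming a signed two's-complement format in which `total_bits` includes the sign bit; the function name and parameters are hypothetical and are not taken from the paper.

```python
def to_fixed(x, total_bits, frac_bits):
    """Quantize x to signed fixed point and return the represented value.

    On overflow the value saturates to the largest or smallest representable
    number instead of wrapping around. Python's round() uses
    round-half-to-even, which may differ from a given hardware rounding mode.
    """
    scale = 1 << frac_bits
    q_max = (1 << (total_bits - 1)) - 1
    q_min = -(1 << (total_bits - 1))
    q = round(x * scale)
    q = max(q_min, min(q_max, q))   # saturate
    return q / scale


if __name__ == "__main__":
    print(to_fixed(0.3, 8, 4))      # small rounding error only
    print(to_fixed(100.0, 8, 4))    # saturates at the maximum
    print(to_fixed(-100.0, 8, 4))   # saturates at the minimum
```

With 8 total bits and 4 fraction bits the representable range is -8.0 to 7.9375 in steps of 0.0625, so values outside that range are clamped to its ends.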

Simplified Factorizing-Technique for Airborne FMCW-SAR Image Reconstruction (항공기 기반 FMCW-SAR 영상복원을 위한 간소화된 분할연산기법)

  • Hwang, Ji-Hwan;Kim, Duk-Jin;Kim, Jin-Woo;Ok, Jae-Woo;Shin, Hee-Sub;You, Eung-Noh
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.28 no.9 / pp.723-732 / 2017
  • This paper suggests a simplified factorizing technique that improves the computational efficiency and reduces the complexity of the conventional back-projection algorithm used to reconstruct airborne FMCW-SAR images, and presents the SAR image reconstruction process based on this technique. The technique can be applied efficiently to airborne FMCW-SAR systems with a relatively narrow beamwidth and a long synthetic aperture length; its basic rationale is to exclude data that contribute little to the result during computation. Using raw data from a practical airborne FMCW-SAR system, the performance of the proposed technique, such as SAR image quality and processing time, was compared and analyzed.

High Speed Implementation of LEA on ARMv8 (ARMv8 상에서 LEA 암호화 고속 구현)

  • Seo, Hwa-jeong
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.10 / pp.1929-1934 / 2017
  • The lightweight block cipher LEA (Lightweight Encryption Algorithm) is among the most promising block cipher algorithms owing to its efficient implementation and high security level. LEA is widely used in real-world applications, and many efforts have been made to improve its execution time so that it remains available under any circumstances. In this paper, we improve the performance of the LEA block cipher on ARMv8 processors. The implementation is optimized with the SIMD instructions of the NEON engine, and 24 LEA encryption operations are performed simultaneously in parallel. To reduce the number of memory accesses, all NEON registers are used to retain intermediate results. Finally, we evaluated the performance of the LEA implementation: the proposed implementations achieved 2.4 cycles/byte on Apple A7 and 2.2 cycles/byte on Apple A9.

Design of Hash Processor for SHA-1, HAS-160, and Pseudo-Random Number Generator (SHA-1과 HAS-160과 의사 난수 발생기를 구현한 해쉬 프로세서 설계)

  • Jeon, Shin-Woo;Kim, Nam-Young;Jeong, Yong-Jin
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.1C / pp.112-121 / 2002
  • In this paper, we present the design of a hash processor for data security systems. Two standard hash algorithms, SHA-1 (American) and HAS-160 (Korean), are implemented on a single hash engine to support real-time processing. The hash processor can also be used as a PRNG (pseudo-random number generator) by utilizing SHA-1 hash iterations, an approach that is used in the Intel software library. Because SHA-1 and HAS-160 have the same step operation, we could reduce hardware complexity by sharing the computation unit. Precomputation of message variables and a two-stage pipelined structure shortened the critical path of the processor and increased overall performance. We estimate the performance of the hash processor at about 624 Mbps for SHA-1 and HAS-160 and 195 Mbps for pseudo-random number generation, both at a 100 MHz clock, based on the Samsung 0.5-µm CMOS standard cell library. To our knowledge, this gives the best performance for processing these hash algorithms.
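As a rough plausibility check on the reported throughput, the sketch below converts throughput and clock rate into clock cycles per message block. It assumes 512-bit message blocks, which is the block size of both SHA-1 and HAS-160; the function name is hypothetical, and the interpretation of the result is an inference rather than a figure stated in the abstract.

```python
def cycles_per_block(block_bits, clock_hz, throughput_bps):
    """Clock cycles spent per message block at a given throughput."""
    return block_bits * clock_hz / throughput_bps


if __name__ == "__main__":
    # 512-bit block, 100 MHz clock, 624 Mbps hashing throughput.
    print(cycles_per_block(512, 100e6, 624e6))
```

The result is about 82 cycles per block, which would be consistent with one step operation per cycle for the 80 steps of either algorithm plus a small overhead.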

A Compressed Hot-Cold Clustering to Improve Index Operation Performance of Flash Memory-SSD Systems (플래시메모리-SSD의 인덱스 연산 성능 향상을 위한 압축된 핫-콜드 클러스터링 기법)

  • Byun, Si-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.1 / pp.166-174 / 2010
  • SSDs are among the best storage media for portable and desktop computers. Their features include non-volatility, low power consumption, and fast access time for read operations, which make flash memory suitable as a major database storage component for desktop and server computers. However, traditional index management schemes based on the B-Tree need to be improved because flash memory operations are relatively slow compared with RAM. To achieve this goal, we propose a new index management scheme based on compressed hot-cold clustering, called CHC-Tree. CHC-Tree-based index management improves index operation performance by dividing index nodes into hot or cold segments, compressing pointers and keys in the index nodes, and clustering the hot or cold segments. The offset compression technique, which uses the unused free area in cold index nodes, reduces the number of slow erase operations during index node insertion and deletion. Simulation results show that our scheme significantly reduces write and erase operation overheads, improving the index search performance of the B-Tree by up to 26 percent and the index update performance by up to 23 percent.

Design of a Real-time Algorithm Using Block-DCT for the Recognition of Speed Limit Signs (Block-DCT를 이용한 속도 제한 표지판 실시간 인식 알고리듬의 설계)

  • Han, Seung-Wha;Cho, Han-Min;Kim, Kwang-Soo;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.12B / pp.1574-1585 / 2011
  • This paper proposes a real-time algorithm for speed limit sign recognition in advanced safety vehicle systems. The proposed algorithm uses Block-DCT to extract features from a given ROI (region of interest) instead of using all pixel values as in previous works. It selects a subset of the DCT coefficients according to the proposed discriminant factor, and uses correlation coefficients and variances among ROIs from training samples to reduce the number of arithmetic operations without degrading classification performance. The algorithm recognizes speed limit signs using the information obtained during training by calculating LDA and the Mahalanobis distance. To increase the recognition hit rate, it uses classification results accumulated over a sequence of frames. Experimental results show that the recognition hit rate for sequential frames reaches up to 100%. Compared with previous works, the numbers of multiply and add operations are reduced by 69.3% and 67.9%, respectively.
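The feature extraction step can be pictured with a small block-DCT sketch. It computes an orthonormal 2-D DCT-II of a square block and keeps only the low-frequency corner as features. Keeping the top-left coefficients is a stand-in chosen for illustration; the paper selects coefficients with its own discriminant factor, which the abstract does not define. All names here are hypothetical.

```python
import math


def dct2_block(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def low_freq_features(block, keep):
    """Flatten the keep x keep low-frequency corner into a feature vector."""
    coeffs = dct2_block(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    print(low_freq_features(flat, 2))
```

For a constant block all of the energy lands in the DC coefficient, which is why a handful of coefficients can stand in for the full pixel array.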

A Variable-Length FFT/IFFT Processor for Multi-standard OFDM Systems (다중표준 OFDM 시스템용 가변길이 FFT/IFFT 프로세서)

  • Yeem, Chang-Wan;Shin, Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.2A / pp.209-215 / 2010
  • This paper describes the design of a variable-length FFT/IFFT processor (VL_FCore) for OFDM-based multi-standard communication systems. The VL_FCore adopts an in-place single-memory architecture and uses a hybrid structure of radix-4 and radix-2 DIF algorithms to accommodate FFT lengths in the range of $N=64{\times}2^k\;(0{\leq}k{\leq}7)$. To achieve both reduced memory size and improved SQNR, a two-step conditional scaling technique is devised, which conditionally scales the intermediate results of each computational stage. The performance analysis shows that the average SQNRs of 64- to 8,192-point FFTs are over 60 dB. The VL_FCore synthesized with a $0.35\;{\mu}m$ CMOS cell library has 23,000 gates and 32 Kbytes of memory, and it can operate with a 75-MHz clock at 3.3 V. The 64-point and 8,192-point FFTs can be computed in $2.25\;{\mu}s$ and $762.7\;{\mu}s$, respectively, so the processor satisfies the specifications of various OFDM-based systems.
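The supported lengths and one possible radix-4/radix-2 split can be enumerated with a short sketch. The decomposition below uses as many radix-4 stages as possible and adds a single radix-2 stage when log2(N) is odd. That is a common choice and an assumption here; the abstract does not state the actual stage schedule of the VL_FCore. The function names are hypothetical.

```python
def supported_lengths():
    """FFT lengths N = 64 * 2**k for k = 0..7."""
    return [64 * 2 ** k for k in range(8)]


def stage_plan(n):
    """Return (radix4_stages, radix2_stages) such that 4**a * 2**b == n."""
    log2n = n.bit_length() - 1
    if n <= 0 or (1 << log2n) != n:
        raise ValueError("n must be a positive power of two")
    return log2n // 2, log2n % 2


if __name__ == "__main__":
    for n in supported_lengths():
        print(n, stage_plan(n))
```

Under this assumption the even powers of two (64, 256, 1024, 4096) need only radix-4 stages, while the remaining lengths need one extra radix-2 stage.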

Low-Complexity Speech Enhancement Algorithm Based on IMCRA Algorithm for Hearing Aids (보청기를 위한 IMCRA 기반 저연산 음성 향상 알고리즘)

  • Jeon, Yuyong;Lee, Sangmin
    • Journal of rehabilitation welfare engineering & assistive technology / v.11 no.4 / pp.363-370 / 2017
  • In this paper, we propose a low-complexity speech enhancement algorithm based on improved minima controlled recursive averaging (IMCRA) and the log minimum mean square error (logMMSE) estimator. The IMCRA algorithm tracks the minimum of the input power within buffers in a local window and identifies speech presence using the ratio between the input power and its minimum. This process requires a large number of operations. To reduce the number of operations of the IMCRA algorithm, the minimum is instead tracked using time-varying, frequency-dependent smoothing based on the speech presence probability. The proposed algorithm enhanced speech quality by 2.778%, 3.481%, 2.980%, and 2.162% at SNRs of 0, 5, 10, and 15 dB, respectively, and reduced computational complexity by 9.570% on average.
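The idea of smoothing with a factor that depends on the speech presence probability can be sketched for a single frequency bin as follows. This is the recursive update form commonly used in MCRA-style noise estimators, shown only for illustration; the abstract does not give the paper's exact update rule, and the function name and the default `alpha` are hypothetical.

```python
def update_estimate(prev, power, presence_prob, alpha=0.85):
    """One recursive-averaging step for a single frequency bin.

    The effective smoothing factor approaches 1 (estimate frozen) when speech
    is likely present, and falls back to alpha when speech is absent.
    """
    a = alpha + (1.0 - alpha) * presence_prob
    return a * prev + (1.0 - a) * power


if __name__ == "__main__":
    print(update_estimate(1.0, 5.0, 1.0))   # speech present: estimate held
    print(update_estimate(1.0, 5.0, 0.0))   # speech absent: estimate adapts
```

Each update costs a few multiply-adds per bin and needs no buffer of past frames, which is the kind of saving a window-based minimum search cannot offer.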