• Title/Summary/Keyword: 연산 지도

Search Result 3,998, Processing Time 0.026 seconds

Differential CORDIC-based High-speed Phase Calculator for 3D Depth Image Extraction from TOF Sensor (TOF 센서용 3차원 깊이 영상 추출을 위한 차동 CORDIC 기반 고속 위상 연산기)

  • Koo, Jung-Youn;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.3
    • /
    • pp.643-650
    • /
    • 2014
  • A hardware implementation of phase calculator for extracting 3D depth image from TOF(Time-Of-Flight) sensor is described. The designed phase calculator adopts redundant binary number systems and a pipelined architecture to improve throughput and speed. It performs arctangent operation using vectoring mode of DCORDIC(Differential COordinate Rotation DIgital Computer) algorithm. Fixed-point MATLAB simulations are carried out to determine the optimal bit-widths and number of iteration. The phase calculator has ben verified by FPGA-in-the-loop verification using MATLAB/Simulink. A test chip has been fabricated using a TSMC $0.18-{\mu}m$ CMOS process, and test results show that the chip functions correctly. It has 82,000 gates and the estimated throughput is 400 MS/s at 400Mhz@1.8V.

A Robust Algorithm for Moving Object Segmentation and VOP Extraction in Video Sequences (비디오 시퀸스에서 움직임 객체 분할과 VOP 추출을 위한 강력한 알고리즘)

  • Kim, Jun-Ki;Lee, Ho-Suk
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.4
    • /
    • pp.430-441
    • /
    • 2002
  • Video object segmentation is an important component for object-based video coding scheme such as MPEG-4. In this paper, a robust algorithm for segmentation of moving objects in video sequences and VOP(Video Object Planes) extraction is presented. The points of this paper are detection, of an accurate object boundary by associating moving object edge with spatial object edge and generation of VOP. The algorithm begins with the difference between two successive frames. And after extracting difference image, the accurate moving object edge is produced by using the Canny algorithm and morphological operation. To enhance extracting performance, we app]y the morphological operation to extract more accurate VOP. To be specific, we apply morphological erosion operation to detect only accurate object edges. And moving object edges between two images are generated by adjusting the size of the edges. This paper presents a robust algorithm implementation for fast moving object detection by extracting accurate object boundaries in video sequences.

A Study on the New Binary Block Matching Algorithm for Motion Estimation of Real time Video Coding (실시간 비디오 압축의 움직임 추정을 위한 새로운 이진 블록 정합 알고리즘에 관한 연구)

  • 이완범;김환용
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.5 no.2
    • /
    • pp.126-131
    • /
    • 2004
  • Full search algorithm(FA) provides the best performance but this is usually impractical because of the large number of computations required for large search region. Fast search and conventional Boolean matching algorithms reduce computational complexity and data processing time but this algorithms have disadvantages that is difficult of implementation of hardware because of high control overhead and that is less performance than FA. This paper presents new Boolean matching algorithm, called BCBM(Bit Converted Boolean Matching). Proposed algorithm has performance closed to the FA by Boolean only block matching that may be very efficiently implemented in hardware for real time video communication. Simulation results show that the PSNR of the proposed algorithm is about 0.08㏈ loss than FA but is about 0.96∼2.02㏈ gain than fast search algorithm and conventional Boolean matching algorithm.

  • PDF

frequency Domain processor nor ADSL G.LITE Modem (ADSL G.LITE모뎀을 위한 주파수 영역 프로세서의 설계)

  • 고우석;기준석;고태호;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.12C
    • /
    • pp.233-239
    • /
    • 2001
  • Among the operations in frequency domain for ADSL G.LITE Modem to perform, FFT and FEQ are most computation-intensive part, of which many researches have been focused on the efficient implementation. Previous papers suggested hardwares suitable for ADSL G.DMT system, which is not feasible for simple G.LITE system. The analysis of frequency domain operations and computational efficiency according to the allocation of hardware resources is performed in this paper. The suggested processor has the structure of one real multiplier and two real adders connected in parallel, which can perform the operations efficiently through the pipeline- and/or parallel-type job scheduling. The suggested processor uses less hardware resources than Kiss\`s ALU structure or FFT/IFFT processor suggested by Wang, so the suggested one is more suitable for G.LITE system than previous works.

  • PDF

Design of Parallel Decimal Floating-Point Arithmetic Unit for High-speed Operations (고속 연산을 위한 병렬 구조의 십진 부동소수점 연산 장치 설계)

  • Yun, Hyoung-Kie;Moon, Dai-Tchul
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.12
    • /
    • pp.2921-2926
    • /
    • 2013
  • In this paper, a decimal floating-point arithmetic unit(DFP) was proposed and redesigned to support high speed arithmetic operation employed parallel processing technique. The basic architecture of the proposed DFP was based on the L.K.Wang's DFP and improved it enabling high speed operation by parallel processing for two operands with same size of exponent. The proposed DFP was synthesized as a target device of xc2vp30-7ff896 using Xilinx ISE and verified by simulation using Flowrian tool of System Centroid co. Compared to L.K.Wang's DFP and reference [6]'s method, the proposed DFP improved data processing speed about 8.4% and 3% respectively in case of same input data.

Low Area Hardware Design of Efficient SAO for HEVC Encoder (HEVC 부호기를 위한 효율적인 SAO의 저면적 하드웨어 설계)

  • Cho, Hyunpyo;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.1
    • /
    • pp.169-177
    • /
    • 2015
  • This paper proposes a hardware architecture for an efficient SAO(Sample Adaptive Offset) with low area for HEVC(High Efficiency Video Coding) encoder. SAO is a newly adopted technique in HEVC as part of the in-loop filter. SAO reduces mean sample distortion by adding offsets to reconstructed samples. The existing SAO requires a great deal of computational and processing time for UHD(Ultra High Definition) video due to sample by sample processing. To reduce SAO processing time, the proposed SAO hardware architecture processes four samples simultaneously, and is implemented with a 2-step pipelined architecture. In addition, to reduce hardware area, it has a single architecture for both luma and chroma components and also uses optimized and common operators. The proposed SAO hardware architecture is designed using Verilog HDL(Hardware Description Language), and has a total of 190k gates in TSMC $0.13{\mu}m$ CMOS standard cell library. At 200MHz, it can support 4K UHD video encoding at 60fps in real time, but operates at a maximum of 250MHz.

Design of Serial Decimal Multiplier using Simultaneous Multiple-digit Operations (동시연산 다중 digit을 이용한 직렬 십진 곱셈기의 설계)

  • Yu, ChangHun;Kim, JinHyuk;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.4
    • /
    • pp.115-124
    • /
    • 2015
  • In this paper, the method which improves the performance of a serial decimal multiplier, and the method which operates multiple-digit simultaneously are proposed. The proposed serial decimal multiplier reduces the delay by removing encoding module that generates 2X, 4X multiples, and by generating partial product using shift operation. Also, this multiplier reduces the number of operations using multiple-digit operation. In order to estimate the performance of the proposed multiplier, we synthesized the proposed multiplier with design compiler with SMIC 110nm CMOS library. Synthesis results show that the area of the proposed serial decimal multiplier is increased by 4%, but the delay is reduced by 5% compared to existing serial decimal multiplier. In addition, the trade off between area and latency with respect to the number of concurrent operations in the proposed multiple-digit multiplier is confirmed.

Design of an Efficient MAC Unit for RSA Cryptoprocessors (RSA 암호화 프로세서에 적용 가능한 효율적인 누적곱셈 연산기 설계)

  • Moon, Sang-Gook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.1
    • /
    • pp.65-70
    • /
    • 2008
  • RSA crypto-processors equipped with more than 1024 bits of key space handle the entire key stream in units of blocks. The RSA processor which will be the target design in this paper defines the length of the basic word as 128 bits, and uses an 256-bits register as the accumulator. For efficient execution of 128-bit multiplication, 32b${\times}$32b multiplier was designed and adopted and the results are stored in 8 separate 128-bit registers according to the status flag. In this paper, an efficient method to execute 128-bit MAC (multiplication and accumulation) operation is proposed. The suggested method pre-analyze the all possible cases so that the MAC unit can remove unnecessary calculations to speed up the execution. The proposed architecture prototype of the MAC unit was automatically synthesized, and successfully operated at 20MHz, which will be the operation frequency in the target RSA processor.

A Study on Implementation of the High Speed Feature Extraction System Based on Block Type Classification (블록 유형 분류 알고리즘 기반 고속 특징추출 시스템 구현에 관한 연구)

  • Lee, Juseong;An, Ho-Myoung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.12 no.3
    • /
    • pp.186-191
    • /
    • 2019
  • In this paper, we propose a implementation approach of the high-speed feature extraction algorithm. The proposed method is based on the block type classification algorithm which reduces the computation time when target macro block is divided to smooth block type that has no image features. It is quantitatively identified that occurs at 29.5% of the total image using 200 standard test images with $64{\times}64$ macro block size. This means that within a standard test image containing various image information, 29.5% can reduce the complexity of the operation. When the proposed approach is applied to the Canny edge detection, the required latency of the edge detection can be completely eliminated, such as 2D derivative filter, gradient magnitude/direction computation, non-maximal suppression, adaptive threshold calculation, hysteresis thresholding. Also, it is expected that operation time of the feature detection can be reduced by applying block type classification algorithm to various feature extraction algorithms in this way.

Evaluating Computational Efficiency of Spatial Analysis in Cloud Computing Platforms (클라우드 컴퓨팅 기반 공간분석의 연산 효율성 분석)

  • CHOI, Changlock;KIM, Yelin;HONG, Seong-Yun
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.21 no.4
    • /
    • pp.119-131
    • /
    • 2018
  • The increase of high-resolution spatial data and methodological developments in recent years has enabled a detailed analysis of individual experiences in space and over time. However, despite the increasing availability of data and technological advances, such individual-level analysis is not always possible in practice because of its computing requirements. To overcome this limitation, there has been a considerable amount of research on the use of high-performance, public cloud computing platforms for spatial analysis and simulation. The purpose of this paper is to empirically evaluate the efficiency and effectiveness of spatial analysis in cloud computing platforms. We compare the computing speed for calculating the measure of spatial autocorrelation and performing geographically weighted regression analysis between a local machine and spot instances on clouds. The results indicate that there could be significant improvements in terms of computing time when the analysis is performed parallel on clouds.