• Title/Summary/Keyword: computation time reduction (연산시간 감소)

Search Result 401, Processing Time 0.03 seconds

A Study on Performance Improvement of Mobile Rake Finger System for the IMT-2000 (IMT-2000을 위한 이동국 Rake Finger 시스템 성능개선에 관한 연구)

  • 정우열;이선근
    • Journal of the Korea Society of Computer and Information / v.7 no.3 / pp.135-142 / 2002
  • In this paper, we propose a new Rake Finger structure that uses a Walsh switch, a shared accumulator, and a pipelined FWHT algorithm to reduce the signal processing complexity caused by the growing number of data correlators. With 4 Walsh code channels, the proposed data correlators require 160 additions, about 3.2 times fewer operations than the conventional correlators. The data processing time of the proposed Rake Finger architecture is 90,496 ns, compared with 110,696 ns for the conventional one, which makes it 18.3% faster.
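
The FWHT named in this abstract can be illustrated with a minimal software sketch. The version below is a generic in-place fast Walsh-Hadamard transform in natural (Hadamard) ordering, not the pipelined hardware design of the paper; the function names `fwht` and `direct_wht` are chosen here for illustration. It needs N log2 N additions and subtractions, where a direct correlation against all N Walsh codes needs on the order of N^2.

```python
def fwht(values):
    """Return the Walsh-Hadamard transform of a sequence whose length is a power of two."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    span = 1
    while span < n:
        # Each butterfly replaces a pair (a, b) with (a + b, a - b).
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b
        span *= 2
    return data


def direct_wht(values):
    """Reference O(N^2) correlation against every Hadamard-ordered Walsh code."""
    n = len(values)
    # Entry (k, j) of the Hadamard matrix is (-1) ** popcount(k & j).
    return [
        sum(v * (-1) ** bin(k & j).count("1") for j, v in enumerate(values))
        for k in range(n)
    ]
```

Both functions return the same correlation values; only the operation count differs.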

Fast Coding Unit Decision Algorithm Based on Region of Interest by Motion Vector in HEVC (움직임 벡터에 의한 관심영역 기반의 HEVC 고속 부호화 유닛 결정 방법)

  • Hwang, In Seo;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.11 / pp.41-47 / 2016
  • High efficiency video coding (HEVC) employs a coding tree unit (CTU) to improve coding efficiency. A CTU consists of coding units (CUs), prediction units (PUs), and transform units (TUs). All possible block partitions must be evaluated at each depth level to obtain the best combination of CUs, PUs, and TUs. To reduce the complexity of the block partitioning process, this paper proposes a PU mode skip algorithm with region of interest (RoI) selection using motion vectors. In addition, this paper presents a CU depth level skip algorithm that uses co-located block information from previously encoded frames. First, the RoI selection algorithm distinguishes between dynamic CTUs and static CTUs, and asymmetric motion partitioning (AMP) blocks are then skipped in the static CTUs. Second, the depth level skip algorithm predicts the most probable target depth level from the average depth in one CTU. The experimental results show that the proposed fast CU decision algorithm reduces the total encoding time by up to 44.8% compared with the HEVC test model (HM) 14.0 reference software encoder, with only a 2.5% Bjontegaard delta bit rate (BDBR) loss.

Interframe Wavelet Coding by Considering time-band Properties (시간 밴드 특성을 고려한 인터프레임 웨이블릿 부호화)

  • 정세윤;김원하;김규헌;김진웅
    • Proceedings of the Korea Multimedia Society Conference / 2003.11a / pp.183-186 / 2003
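  • Interframe wavelet coding, also called 3D subband coding, offers better compression efficiency than conventional DCT-based video coding and, in particular, excellent scalability. Conventional interframe wavelet coding applies the same spatial wavelet filter to every temporal-band image. This paper proposes applying different wavelet filters to the low band and the high band, taking the characteristics of the temporal-band images into account. Specifically, the 9/7 filter is applied to the low band and the Haar filter to the high band. This simplifies the inverse wavelet transform, which requires the most computation in the coding process, and thereby reduces decoder complexity. In PSNR experiments, the results show almost no difference from the conventional case that uses only the 9/7 filter.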
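
To see why a Haar filter keeps the inverse transform cheap, here is a minimal one-level, one-dimensional sketch. It uses the unnormalized averaging form of the Haar transform, which is an assumption made for readability; the paper applies the filter spatially to temporal high-band images, and the function names are illustrative only. Each reconstructed sample needs a single addition or subtraction, whereas the 9/7 filter uses 9-tap and 7-tap kernels.

```python
def haar_forward(signal):
    """One-level Haar analysis: split an even-length signal into low and high bands."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def haar_inverse(low, high):
    """One-level Haar synthesis: each output sample is one add or one subtract."""
    signal = []
    for l, h in zip(low, high):
        signal.extend((l + h, l - h))
    return signal
```

Calling `haar_inverse(*haar_forward(x))` returns the original samples of `x`.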

Fast SHVC Decoder using PU-based On-the-fly Up-Sampling (PU 기반 On-the-fly 업샘플링을 이용한 SHVC 복호화기 고속화 방법)

  • Kim, Seoung-Hwi;Lee, Dongkyu;Chae, Chan-Yup;Sim, Donggyu;Kang, Jung-Won;Oh, Seoung-Jun
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2015.11a / pp.110-113 / 2015
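  • SHVC (Scalable High efficiency Video Coding) is a standard that uses spatial, temporal, and quality scalability to achieve high coding efficiency across diverse multimedia service environments. Because SHVC performs multi-layer encoding and decoding, it requires more complexity than single-layer HEVC (High Efficiency Video Coding). This paper analyzes the complexity of the SHVC decoder and accelerates frame-based up-sampling, which accounts for a large share of that complexity, by means of PU-based on-the-fly up-sampling and SIMD operations. The SHVC decoder with the proposed algorithm decodes on average 1.23 times faster than the existing SHVC decoder, and the share of complexity taken by up-sampling falls from 24.7% to 1.9%. The on-the-fly up-sampling process reduces execution time by 90.3% on average compared with the existing frame-level up-sampling process.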

A Parallel Sphere Decoder Algorithm for High-order MIMO System (고차 MIMO 시스템을 위한 저 복잡도 병렬 구형 검출 알고리즘)

  • Koo, Jihun;Kim, Jaehoon;Kim, Yongsuk;Kim, Jaeseok
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.5 / pp.11-19 / 2014
  • In this paper, a low-complexity parallel sphere decoder algorithm is proposed for high-order MIMO systems. It reduces computational complexity relative to the fixed-complexity sphere decoder (FSD) algorithm through static and dynamic tree pruning using scalable node operators, and it offers near-maximum-likelihood decoding performance. It also provides a hardware-friendly node operation algorithm by fixing the variable computational complexity caused by the sequential nature of the conventional SD algorithm. A Monte Carlo simulation shows that the proposed algorithm decreases the average number of expanded nodes by 55%, with only a 6.3% increase in normalized decoding time, compared with a fully parallelized FSD algorithm for a high-order MIMO communication system with 16-QAM modulation.

Design of Hash Processor for SHA-1, HAS-160, and Pseudo-Random Number Generator (SHA-1과 HAS-160과 의사 난수 발생기를 구현한 해쉬 프로세서 설계)

  • Jeon, Shin-Woo;Kim, Nam-Young;Jeong, Yong-Jin
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.1C / pp.112-121 / 2002
  • In this paper, we present the design of a hash processor for data security systems. Two standard hash algorithms, SHA-1 (American) and HAS-160 (Korean), are implemented on a single hash engine to support real-time processing. The hash processor can also be used as a PRNG (pseudo-random number generator) by utilizing SHA-1 hash iterations, as is done in the Intel software library. Because SHA-1 and HAS-160 have the same step operation, we could reduce hardware complexity by sharing the computation unit. Precomputation of message variables and a two-stage pipelined structure shorten the critical path of the processor and increase overall performance. We estimate the performance of the hash processor at about 624 Mbps for SHA-1 and HAS-160, and 195 Mbps for pseudo-random number generation, both at a 100 MHz clock, based on the Samsung 0.5 µm CMOS standard cell library. To our knowledge, this is the best reported performance for processing these hash algorithms.
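
The two throughput figures can be related with a back-of-the-envelope calculation. SHA-1 and HAS-160 both consume 512-bit message blocks and produce 160-bit digests. The sketch below assumes 82 clock cycles per block and 160 output bits per PRNG iteration. Neither value is stated in the abstract; 82 was chosen only because it reproduces both reported numbers at 100 MHz.

```python
def throughput_mbps(bits_per_block, cycles_per_block, clock_mhz):
    """Bits per clock cycle multiplied by the clock rate in MHz gives Mbit/s."""
    return bits_per_block * clock_mhz / cycles_per_block


# Assumption: cycles per 512-bit block; not given in the abstract.
CYCLES_PER_BLOCK = 82

# Hashing consumes 512 message bits per block.
hash_rate = throughput_mbps(512, CYCLES_PER_BLOCK, 100)

# Assumption: each PRNG iteration emits one 160-bit digest.
prng_rate = throughput_mbps(160, CYCLES_PER_BLOCK, 100)
```

Under these assumptions `hash_rate` is about 624.4 and `prng_rate` about 195.1, in line with the figures quoted in the abstract.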

Acceleration of ECC Computation for Robust Massive Data Reception under GPU-based Embedded Systems (GPU 기반 임베디드 시스템에서 대용량 데이터의 안정적 수신을 위한 ECC 연산의 가속화)

  • Kwon, Jisu;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.7 / pp.956-962 / 2020
  • As the size of the data used in embedded systems increases, ECC decoding is increasingly needed to receive massive data robustly. In this paper, we propose a method to accelerate the computation of syndrome vectors when ECC decoding is performed with a Hamming code in an embedded system with a built-in GPU. The proposed method expresses the matrix-vector multiplication of the decoding operation in the CSR format, one of the data structures for representing sparse matrices, and performs it in parallel in a CUDA kernel on the GPU. We evaluated the proposed method on a target embedded board with a GPU, and the results show that execution time is reduced when the ECC decoding operation is accelerated on the GPU compared with using only the CPU.
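
The syndrome computation described here can be sketched on a small case. The example below stores the parity-check matrix of a Hamming(7,4) code in CSR form and multiplies it by a received word over GF(2). The code size, the sequential Python loop, and all names are assumptions for illustration; the paper runs the equivalent product in a CUDA kernel. Because every stored entry is 1, only the column indices and row pointers are kept.

```python
# CSR form of the 3x7 Hamming(7,4) parity-check matrix whose column j
# (1-based) is the binary representation of j. All nonzero values are 1,
# so the CSR value array is omitted.
COL_INDICES = [0, 2, 4, 6, 1, 2, 5, 6, 3, 4, 5, 6]
ROW_POINTERS = [0, 4, 8, 12]


def syndrome(received):
    """Sparse matrix-vector product over GF(2): one parity bit per matrix row."""
    result = []
    for row in range(len(ROW_POINTERS) - 1):
        parity = 0
        # Rows are independent, so each could be handled by its own GPU thread.
        for k in range(ROW_POINTERS[row], ROW_POINTERS[row + 1]):
            parity ^= received[COL_INDICES[k]]
        result.append(parity)
    return result


def error_position(syndrome_bits):
    """Return 0 for no detected error, else the 1-based index of the flipped bit."""
    return syndrome_bits[0] + 2 * syndrome_bits[1] + 4 * syndrome_bits[2]
```

For the valid codeword `[1, 1, 1, 0, 0, 0, 0]` the syndrome is all zeros; flipping its fifth bit yields a syndrome that decodes to position 5.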

A Design of the Vehicle Crisis Detection System(VCDS) based on vehicle internal and external data and deep learning (차량 내·외부 데이터 및 딥러닝 기반 차량 위기 감지 시스템 설계)

  • Son, Su-Rak;Jeong, Yi-Na
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.2 / pp.128-133 / 2021
  • The autonomous vehicle market is currently commercializing Level 3 autonomous vehicles, but accidents may occur even during fully autonomous driving because of stability issues. In fact, autonomous vehicles have recorded 81 accidents. This is because, unlike Level 3 vehicles, autonomous vehicles at Level 4 and above must judge and respond to emergency situations by themselves. This paper therefore proposes a vehicle crisis detection system (VCDS) that collects and stores information from outside the vehicle through a CNN, and uses the stored information together with vehicle sensor data to output the crisis level of the vehicle as a number between 0 and 1. The VCDS consists of two modules. The vehicle external situation collection module (VESCM) collects data on surrounding vehicles and pedestrians using a CNN-based neural network model. The vehicle crisis situation determination module detects a crisis situation by using the output of the VESCM and the vehicle's internal sensor data. In the experiment, the average operation time was 55 ms for VESCM, 74 ms for R-CNN, and 101 ms for CNN. R-CNN shows a computation time similar to VESCM when the number of pedestrians is small, but it takes longer than VESCM as the number of pedestrians increases. On average, the computation time of VESCM was 25.68% shorter than that of R-CNN and 45.54% shorter than that of CNN, and the accuracy of all three models stayed above 80%.
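
The reported percentages follow from the three average operation times when the improvement is measured relative to the slower model. A short check, with a helper name chosen here for illustration:

```python
def percent_faster(baseline_ms, candidate_ms):
    """Relative time saving of the candidate, as a percentage of the baseline time."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100


# Average operation times quoted in the abstract: VESCM 55 ms, R-CNN 74 ms, CNN 101 ms.
vs_rcnn = round(percent_faster(74, 55), 2)
vs_cnn = round(percent_faster(101, 55), 2)
```

Here `vs_rcnn` evaluates to 25.68 and `vs_cnn` to 45.54, matching the abstract.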

Design of an Efficient Bit-Parallel Multiplier using Trinomials (삼항 다항식을 이용한 효율적인 비트-병렬 구조의 곱셈기)

  • 정석원;이선옥;김창한
    • Journal of the Korea Institute of Information Security & Cryptology / v.13 no.5 / pp.179-187 / 2003
  • Recently, efficient implementation of finite field operations has received a lot of attention. Among the GF($2^m$) arithmetic operations, multiplication is the most basic and critical operation, and it determines the speed of the hardware. We propose a hardware architecture using the Mastrovito method to reduce processing time. Existing Mastrovito multipliers using the special generating trinomial $p(x)=x^m+x^n+1$ require $m^2-1$ XOR gates and $m^2$ AND gates. The proposed multiplier needs $m^2$ AND gates and $m^2+(n^2-3n)/2$ XOR gates, a count that depends on the intermediate term $x^n$. The time complexity of existing multipliers is $T_A+((m-2)/(m-n)+1+\log_2(m))T_X$ and that of the proposed method is $T_X+(1+\log_2(m-1)+n/2)T_X$. The proposed architecture is efficient for the extension degrees $m$ suggested in the standards SEC2 and ANSI X9.63. On average, XOR space complexity increases by 1.18% while time complexity is reduced by 9.036%.
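
For readers unfamiliar with trinomial-based field arithmetic, the following is a minimal software sketch of multiplication in GF($2^m$) modulo $x^m+x^n+1$. It is a bit-serial illustration in polynomial basis, not the bit-parallel Mastrovito architecture of the paper; the function name and the integer encoding (bit i holds the coefficient of $x^i$) are assumptions. The reduction step uses the identity $x^k = x^{k-m+n} + x^{k-m}$ for $k \ge m$, which is why a trinomial needs only two XORs per reduced term.

```python
def gf2m_multiply(a, b, m, n):
    """Multiply a and b in GF(2^m) defined by the trinomial x^m + x^n + 1."""
    if not 0 < n < m:
        raise ValueError("require 0 < n < m")
    if a >> m or b >> m:
        raise ValueError("operands must have degree below m")
    # Carry-less polynomial multiplication: addition of coefficients is XOR.
    product = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    # Reduce from the highest possible degree (2m - 2) down to m.
    for k in range(2 * m - 2, m - 1, -1):
        if (product >> k) & 1:
            # Replace x^k by x^(k-m+n) + x^(k-m).
            product ^= (1 << k) | (1 << (k - m + n)) | (1 << (k - m))
    return product
```

With `m=3, n=1` (the trinomial $x^3+x+1$), `gf2m_multiply(2, 4, 3, 1)` multiplies $x$ by $x^2$ and reduces $x^3$ to $x+1$.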

Fast RSA Montgomery Multiplier and Its Hardware Architecture (고속 RSA 하드웨어 곱셈 연산과 하드웨어 구조)

  • Chang, Nam-Su;Lim, Dae-Sung;Ji, Sung-Yeon;Yoon, Suk-Bong;Kim, Chang-Han
    • Journal of the Korea Institute of Information Security & Cryptology / v.17 no.1 / pp.11-20 / 2007
  • Fast Montgomery multiplication is important in the design of RSA cryptosystems. Montgomery multiplication consists of two additions, which are computed using a CSA or an RBA. With a CSA, the multiplier is implemented using a 4-2 CSA or a 5-2 CSA. With an RBA, the multiplier is designed on the redundant binary system. In [1], a new redundant binary adder that adds two binary signed-digit numbers was proposed and applied to a Montgomery multiplier. In this paper, we reconstruct the logic structure of the RBA in [1] to reduce time and space complexity. In particular, the proposed RB multiplier has no coupler like the RBA in [1]. The proposed RB multiplier is also suited to binary exponentiation because of its modified input and output forms. We simulate the proposed NRBA using gates from the SAMSUNG STD130 $0.18{\mu}m$ 1.8V CMOS Standard Cell Library. The result is smaller by 18.5% and 6.3%, and faster by 25.24% and 14%, than the 4-2 CSA and the existing RBA, respectively. Moreover, the result is smaller by 44.3% and faster by 2.8% than the RBA in [1].
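
The "two additions" mentioned in this abstract appear in each iteration of bit-serial Montgomery multiplication. The sketch below is a generic radix-2 software version that uses ordinary integer addition, whereas the paper implements these additions in hardware with carry-save or redundant binary adders. The function name, the operand values, and the 7-bit width are illustrative assumptions.

```python
def montgomery_multiply(a, b, modulus, bits):
    """Return a * b * 2^(-bits) mod modulus for odd modulus < 2^bits and a, b < modulus."""
    if modulus % 2 == 0:
        raise ValueError("modulus must be odd")
    acc = 0
    for i in range(bits):
        # First addition: add the partial product selected by bit i of a.
        if (a >> i) & 1:
            acc += b
        # Second addition: add the modulus when needed so the halving is exact.
        if acc & 1:
            acc += modulus
        acc >>= 1
    # acc stays below 2 * modulus, so one conditional subtraction suffices.
    if acc >= modulus:
        acc -= modulus
    return acc


# Usage: multiply 45 by 76 modulo 97 via the Montgomery domain (R = 2^7).
MODULUS, BITS = 97, 7
R = 1 << BITS
x_mont = (45 * R) % MODULUS
y_mont = (76 * R) % MODULUS
product_mont = montgomery_multiply(x_mont, y_mont, MODULUS, BITS)
# Multiplying by 1 removes the remaining factor of R.
product = montgomery_multiply(product_mont, 1, MODULUS, BITS)
```

After the final conversion step, `product` equals `(45 * 76) % 97`.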