• Title/Summary/Keyword: Arithmetic Operation Algorithm

Search Result 94, Processing Time 0.029 seconds

Efficient Pipeline Architecture of CABAC in H.264/AVC (H.264/AVC의 효율적인 파이프라인 구조를 적용한 CABAC 하드웨어 설계)

  • Choi, Jin-Ha;Oh, Myung-Seok;Kim, Jae-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.7
    • /
    • pp.61-68
    • /
    • 2008
  • In this paper, we propose an efficient hardware architecture and algorithm to increase an encoding process rate and implement a hardware for CABAC (Context Adaptive Binary Arithmetic Coding) which is used with one of the entropy coding ways for the latest video compression technique, H.264/AVC (Advanced Video Coding). CABAC typically provides a better high compression performance maximum 15% compared with CAVLC. However, the complexity of operation of CABAC is significantly higher than the CAVLC. Because of complicated data dependency during the encoding process, the complexity of operation is higher. Therefore, various architectures were proposed to reduce an amount of operation. However, they have still latency on account of complicated data dependency. The proposed architecture has two techniques to implement efficient pipeline architecture. The one is quick calculation of 7, 8th bits used to calculate a probability is the first step in Binary arithmetic coding. The other is one step reduced pipeline arcbitecture when the type of the encoded symbols is MPS. By adopting these two techniques, the required processing time was reduced about 27-29% compared with previous architectures. It is designed in a hardware description language and total logic gate count is 19K using 0.18um standard cell library.

A Performance Comparison of DSE-MMA and DQE-MMA Adaptive Equalization Algorithm using Dither Signal (Dither 신호를 이용한 DSE-MMA와 DQE-MMA 적응 등화 알고리즘의 성능 비교)

  • Lim, Seung-Gag;You, Jeong-Bong;Kang, Dae-Soo
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.1
    • /
    • pp.45-50
    • /
    • 2022
  • This paper compares the equalization performance of the DSE-MMA (Dithered Signed Error-MMA) and DQE-MMA (Dithered Quantized Error-MMA) adaptive equalization algorithm based on the dither signal in order to reduce the intersymbol interference occurs at communication channel. These algorithm was emerged in ordr to reduction of arithmetic operation than current MMA, it makes the independent and identical distribute the quantized error component by performing the 1 or N bit quautizer after adding the dither singal in obtaining the error signal for adapting process. It is possible to improve the robustness performance of adaptive algorithm, but degrade the MSE performance in steady state by dither signal. The paper directly compare the DSE-MMA and DQE-MMA adaptive equalization performance of the same concept of dithering in the same communication channel and signal to noise ratio by computer simulation. As a result of simulation, the DQE-MMA has more better in the every performance index, equalizer output constellation, residual isi, MSE and SER performance, but not in convergence speed.

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.81-93
    • /
    • 2000
  • In a methodological approach to improve the processing performance of modulo exponentiation which is the primary arithmetic in RSA crypto algorithm, we present a new RSA hardware architecture based on high-radix modulo multiplication and CRT(Chinese Remainder Theorem). By implementing the modulo multiplier using radix-16 arithmetic, we reduced the number of PE(Processing Element)s by quarter comparing to the binary arithmetic scheme. This leads to having the number of clock cycles and the delay of pipelining flip-flops be reduced by quarter respectively. Because the receiver knows p and q, factors of N, it is possible to apply the CRT to the decryption process. To use CRT, we made two s/2-bit multipliers operating in parallel at decryption, which accomplished 4 times faster performance than when not using the CRT. In encryption phase, the two s/2-bit multipliers can be connected to make a s-bit linear multiplier for the s-bit arithmetic operation. We limited the encryption exponent size up to 17-bit to maintain high speed, We implemented a linear array modulo multiplier by projecting horizontally the DG of Montgomery algorithm. The H/W proposed here performs encryption with 15Mbps bit-rate and decryption with 1.22Mbps, when estimated with reference to Samsung 0.5um CMOS Standard Cell Library, which is the fastest among the publications at present.

Efficient Hardware Architecture of SEED S-box for Smart Cards

  • Hwang, Joon-Ho
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.4 no.4
    • /
    • pp.307-311
    • /
    • 2004
  • This paper presents an efficient architecture that optimizes the design of SEED S-box using composite field arithmetic. SEED is the Korean standard 128-bit block cipher algorithm developed by Korea Information Security Agency. The nonlinear function S-box is the most costly operation in terms. of size and power consumption, taking up more than 30% of the entire SEED circuit. Therefore the S-box design can become a crucial factor when implemented in systems where resources are limited such as smart cards. In this paper, we transform elements in $GF(2^8)$ to composite field $GF(((2^2)^2)^2)$ where more efficient computations can be implemented and transform the computed result back to $GF(2^8)$. This technique reduces the S-box portion to 15% and the entire SEED algorithm can be implemented at 8,700 gates using Samsung smart card CMOS technology.

A Study on Constructing Inverse Element Generator over $GF(3^{m})$

  • Park Chun Myoung;Song Hong Bok
    • Proceedings of the IEEK Conference
    • /
    • 2004.08c
    • /
    • pp.514-518
    • /
    • 2004
  • This paper presents an algorithm generating inverse element over finite fields $GF(3^{m})$, and constructing method of inverse element generator based on inverse element generating algorithm. A method computing inverse of an element over $GF(3^{m})$ which corresponds to a polynomial over $GF(3^{m})$ with order less than equal to m-l. Here, the computation is based on multiplication, square and cube method derived from the mathematics properties over finite fields.

  • PDF

A Study on State Synthesis Algorithm for ICSC(InCheon Silicon Compiler) (ICSC(InCheon Silicon Compiler)를 위한 상태 합성알고리즘에 대한 연구)

  • Cho, Joong-Hwee
    • Proceedings of the KIEE Conference
    • /
    • 1988.07a
    • /
    • pp.521-524
    • /
    • 1988
  • This paper describes BSDL(Behavioral/Structural Description Language), CDTF(Control Data Text File) and state synthesizer built for use in ICSC(InCheon Silicon Compiler). BSDL describes structral and behaviral specifications of an ASIC(Application Specific IC) for digital system design. ICSC's paser generates CDTF consists of if-then-else, arithmetic and data transfer statement according to each BSDL statement. State synthesizer generates CCG(Control Constraint Graph) in consideration of execution of statement and generates VCG (Variable Constraint Graph) in consideration use of variable generation and use of variable. Also, it involves allocating algorithm operation nodes in the data path and the control path to machine states with minimum state number and as small area as possible.

  • PDF

Implementation of Lattice Reduction-aided Detector using GPU on SDR System (SDR 시스템에서 GPU를 사용한 Lattice Reduction-aided 검출기 구현)

  • Kim, Tae Hyun;Leem, Hyun Seok;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.7 no.3
    • /
    • pp.55-61
    • /
    • 2011
  • This paper presents an implementation of Lattice Reduction (LR)-aided detector for Multiple-Input Multiple-Output (MIMO) system using Graphics Processing Unit (GPU). GPU is a parallel processor which has a number of Arithmetic Logic Units (ALUs), thus, it can minimize the operation time of LR algorithm through the parallelization using multiple threads in the GPU. Through the implemented LR-aided detector, we verify that the LR-aided detector operates a lot faster than Maximum Likelihood (ML) detector. The implemented LR-aided detector has been applied to WiMAX system to show the feasibility of its real-time processing. In addition, we demonstrate that the processing time can be reduced at the cost of 3dB SNR loss by limiting the repeating loop in Lenstra-Lenstra-Lovasz (LLL) algorithm which is frequently used in LR-aided detector.

A Study on Minimization Algorithm for ESOP of Multiple - Valued Function (다치 논리 함수의 ESOP 최소화 알고리즘에 관한 연구)

  • Song, Hong-Bok
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.7
    • /
    • pp.1851-1864
    • /
    • 1997
  • This paper presents an algorithm simplifying the ESOP function by several rules. The algorithm is repeatedly performing operations based on the state of each terms by the product transformation operation of two functions and thus it is simplifying the ESOP function through the reduction of the product terms. Through the minimization of the product terms of the multi-valued input binary multi-output function, an optimization of the input has been done using EXOR PLA with input decoder. The algorithm when applied to four valued arithmetic circuit has been used for a EXOR logic circuit design and the two bits input decoder has been used for a EXOR-PLA design. It has been found from a computer simulation(IBM PC486) that the suggested algorithm can reduce the product terms of the output function remarkably regardless of the number of input variables when the variable AND-EXOR PLA is applied to the poperation circuit.

  • PDF

Fast Stream Cipher AA32 for Software Implementation (소프트웨어 구현에 적합한 고속 스트림 암호 AA32)

  • Kim, Gil-Ho;Park, Chang-Soo;Kim, Jong-Nam;Cho, Gyeong-Yeon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.6B
    • /
    • pp.954-961
    • /
    • 2010
  • Stream cipher was worse than block cipher in terms of security, but faster in execution speed as an advantage. However, since so far there have been many algorithm researches about the execution speed of block cipher, these days, there is almost no difference between them in the execution speed of AES. Therefore an secure and fast stream cipher development is urgently needed. In this paper, we propose a 32bit output fast stream cipher, AA32, which is composed of ASR(Arithmetic Shifter Register) and simple logical operation. Proposed algorithm is a cipher algorithm which has been designed to be implemented by software easily. AA32 supports 128bit key and executes operations by word and byte unit. As Linear Feedback Sequencer, ASR 151bit is applied to AA32 and the reduction function is a very simple structure stream cipher, which consists of two major parts, using simple logical operations, instead of S-Box for a non-linear operation. The proposed stream cipher AA32 shows the result that it is faster than SSC2 and Salsa20 and satisfied with the security required for these days. Proposed cipher algorithm is a fast stream cipher algorithm which can be used in the field which requires wireless internet environment such as mobile phone system and real-time processing such as DRM(Digital Right Management) and limited computational environments such as WSN(Wireless Sensor Network).

MPEG-4 Audio Decoding Technique using Integer Operations for Real-time Playback on Embedded Processor (휴대용 임베디드 프로세서에서의 MPEG-4 오디오의 실시간 재생을 위한 정수 디코딩 기법)

  • Cha, Kyung-Ae
    • Journal of Broadcast Engineering
    • /
    • v.13 no.3
    • /
    • pp.415-418
    • /
    • 2008
  • Some embedded microprocessors do not have an FPU(Floating Point Unit) due to a circuit complexity and power consumption. The performance speed of MPEG-4 AAC decoder on this hardware environment would be slower than corresponding speed for playing back of the decoded results. Therefore, irritating and high-pitched noises are interleaved in the original the audio data. So, in order to play MPEG-4 AAC file on such PDA, a new algorithm that transforms floating-point arithmetic to one with integers, is needed. We have developed a transformation algorithm from floating-point operation to integer operation and implemented the PDA's AAC Player. We also show the efficiency of our proposed method with the experimental results.