• Title/Summary/Keyword: arithmetic processors

Search Result 30, Processing Time 0.057 seconds

Optimized AntNet-Based Routing for Network Processors (네트워크 프로세서에 적합한 개선된 AntNet기반 라우팅 최적화기법)

  • Park Hyuntae;Bae Sung-il;Ahn Jin-Ho;Kang Sungho
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.42 no.5 s.335
    • /
    • pp.29-38
    • /
    • 2005
  • In this paper, a new modified and optimized AntNet algorithm which can be implemented efficiently onto network processor is proposed. The AntNet that mimics the activities of the social insect is an adaptive agent-based routing algorithm. This method requires a complex arithmetic calculating system. However, since network processors have simple arithmetic units for a packet processing, it is very difficult to implement the original AntNet algorithm on network processors. Therefore, the proposed AntNet algorithm is a solution of this problem by decreasing arithmetic executing cycles for calculating a reinforcement value without loss of the adaptive performance. The results of the simulations show that the proposed algorithm is more suitable and efficient than the original AntNet algorithm for commercial network processors.

Study of Modular Multiplication Methods for Embedded Processors

  • Seo, Hwajeong;Kim, Howon
    • Journal of information and communication convergence engineering
    • /
    • v.12 no.3
    • /
    • pp.145-153
    • /
    • 2014
  • The improvements of embedded processors make future technologies including wireless sensor network and internet of things feasible. These applications firstly gather information from target field through wireless network. However, this networking process is highly vulnerable to malicious attacks including eavesdropping and forgery. In order to ensure secure and robust networking, information should be kept in secret with cryptography. Well known approach is public key cryptography and this algorithm consists of finite field arithmetic. There are many works considering high speed finite field arithmetic. One of the famous approach is Montgomery multiplication. In this study, we investigated Montgomery multiplication for public key cryptography on embedded microprocessors. This paper includes helpful information on Montgomery multiplication implementation methods and techniques for various target devices including 8-bit and 16-bit microprocessors. Further, we expect that the results reported in this paper will become part of a reference book for advanced Montgomery multiplication methods for future researchers.

A design of transcendental function arithmetic unit for lighting operation of mobile 3D graphic processor (모바일 3차원 그래픽 프로세서의 조명처리 연산을 위한 초월함수 연산기 구현)

  • Lee, Sang-Hun;Lee, Chan-Ho
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.715-718
    • /
    • 2005
  • Mobile devices is getting to include more functions according to the demand of digital convergence. Applications based on 3D graphic calculation such as 3D games and navigation are one of the functions. 3D graphic calculation requires heavy calculation. Therefore, we need dedicated 3D graphic hardware unit with high performance. 3D graphic calculation needs a lot of complicated floating-point arithmetic operation. However, most of current mobile 3D graphics processors do not have efficient architecture for mobile devices because they are based on those for conventional computer systems. In this paper, we propose arithmetic units for special functions of lighting operation of 3D graphics. Transcendental arithmetic units are designed using approximation of logarithm function. Special function units for lighting operation such as reciprocal, square root, reciprocal of square root, and power can be obtained. The proposed arithmetic unit has lower error rate and smaller silicon area than conventional arithmetic architecture.

  • PDF

Hardware Design of High Performance Arithmetic Unit with Processing of Complex Data for Multimedia Processor (복소수 데이터 처리가 가능한 멀티미디어 프로세서용 고성능 연산회로의 하드웨어 설계)

  • Choi, Byeong-yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.1
    • /
    • pp.123-130
    • /
    • 2016
  • In this paper, a high-performance arithmetic unit which can efficiently accelerate a number of algorithms for multimedia application was designed. The 3-stage pipelined arithmetic unit can execute 38 operations for complex and fixed-point data by using efficient configuration for four 16-bit by 16-bit multipliers, new sign extension method for carry-save data, and correction constant scheme to eliminate sign-extension in compression operation of multiple partial multiplication results. The arithmetic unit has about 300-MHz operating frequency and about 37,000 gates on 45nm CMOS technology and its estimated performance is 300 MCOPS(Million Complex Operations Per Second). Because the arithmetic unit has high processing rate and supports a number of operations dedicated to various applications, it can be efficiently applicable to multimedia processors.

A Fixed-point Digital Signal Processor Development System Employing an Automatic Scaling (자동 스케일링 기능이 지원되는 고정 소수집 디지털 시그날 프로세서 개발 시스템)

  • 김시현;성원용
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.29A no.3
    • /
    • pp.96-105
    • /
    • 1992
  • The use of fixed-point digital signal processors, such as the TMS 320C25, requires scaling of data at each arithmetic step to prevent overflows while keeping the accuracy. A software which automatizes this process is developed for TMS 320C25. The programmers use a model of a hypothetical floating-point digital signal processor and a floating-point format for data representation. However, the program and data are automatically translated to a fixed-point version by this software. Thus, the execution speed is not sacrificed. A fixed-point variable has a unique binary-point location, which is dependent on the range of the variable. The range is estimated from the floating-point simulation. The number of shifts needed for arithmetic or data transfer step is determined by the binary-points of the variables associated with the operation. A fixed-point code generator is also developed by using the proposed automatic scaling software. This code generator produces floating-point assembly programs from the specifiations of FIR, IIR, and adaptive transversal filters, then floating-point programs are transformed to fixed-point versions by the automatic scaling software.

  • PDF

A Hardware Implementation of Ogg Vorbis Audio Decoder with Embedded Processor

  • Kosaka, Atsushi;Yamaguchi, Satoshi;Okuhata, Hiroyuki;Onoye, Takao;Shirakawa, Isao
    • Proceedings of the IEEK Conference
    • /
    • 2002.07a
    • /
    • pp.94-97
    • /
    • 2002
  • A VLSI architecture of an Ogg Vorbis decoder is proposed : which is dedicated to portable audio appliances. Referring to the computational cost analysis of the decoding processes, the LSP (Line Spectrum Pair) process, which takes more than 50% of the total processing time, can be regarded as a bottleneck to achieve realtime processing by embedded Processors. Thus in our decoder a specific hardware architecture is devised for the LSP process so as to be integrated into a single chip together with an ARM7TDMI processor. In addition, in order to reduce the total hardware cost, instead of the floating point arithmetic, the fixed point arithmetic is adopted. The LSP module has been implemented with 9,740 gates by using a Virtual Silicon 0.l5$\mu\textrm{m}$ CMOS technology, which operates at 58.8MHz with the total CPU load reduced by 57%. It is also verified that the use of the fixed point arithmetic does not incur any significant sound distortion.

  • PDF

A Design of a 8-Thread Graphics Processor Unit with Variable-Length Instructions

  • Lee, Kwang-Yeob;Kwak, Jae-Chang
    • Journal of information and communication convergence engineering
    • /
    • v.6 no.3
    • /
    • pp.285-288
    • /
    • 2008
  • Most of multimedia processors for 2D/3D graphics acceleration use a lot of integer/floating point arithmetic units. We present a new architecture with an efficient ALU, built in a smaller chip size. It reduces instruction cycles significantly based on a foundation of multi-thread operation, variable length instruction words, dual phase operation, and phase instruction's coordination. We can decrease the number of instruction cycles up to 50%, and can achieve twice better performance.

Algebraic Accuracy Verification for Division-by-Convergence based 24-bit Floating-point Divider Complying with OpenGL (Division-by-Convergence 방식을 사용하는 24-비트 부동소수점 제산기에 대한 OpenGL 정확도의 대수적 검증)

  • Yoo, Sehoon;Lee, Jungwoo;Kim, Kichul
    • Journal of IKEEE
    • /
    • v.17 no.3
    • /
    • pp.346-351
    • /
    • 2013
  • Low-cost and low-power are important requirements in mobile systems. Thus, when a floating-point arithmetic unit is needed, 24-bit floating-point format can be more useful than 32-bit floating-point format. However, a 24-bit floating-point arithmetic unit can be risky because it usually has lower accuracy than a 32-bit floating-point arithmetic unit. Consecutive floating-point operations are performed in 3D graphic processors. In this case, the verification of the floating-point operation accuracy is important. Among 3D graphic arithmetic operations, the floating-point division is one of the most difficult operations to satisfy the accuracy of $10^{-5}$ which is the required accuracy in OpenGL ES 3.0. No 24-bit floating-point divider, whose accuracy is algebraically verified, has been reported. In this paper, a 24-bit floating-point divider is analyzed and it is algebraically verified that its accuracy satisfies the OpenGL requirement.

Design and Simulation of ARM Processor with Floating Point Instructions (부동소수점 명령어를 지원하는 ARM 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.2
    • /
    • pp.187-193
    • /
    • 2020
  • Floating point arithmetic in microprocessor is the computation of addition, subtraction, multiplication, and division of floating point data to improve accuracy. In general, when designing a processor, floating point instructions are often excluded because of its complexity and only integer instructions are provided. However, in order to carry out the computations for not only engineering and technical operations but also artificial intelligence and neural networks that are in the spotlight today, floating point operations must be included. In this paper, we design a 32-bit ARMv4 family of processors with floating-point arithmetic instructions using VHDL and verify with ModelSim. As a result, ARM's floating point instructions are successfully executed.

Efficient programmable power-of-two scaler for the three-moduli set {2n+p, 2n - 1, 2n+1 - 1}

  • Taheri, MohammadReza;Navi, Keivan;Molahosseini, Amir Sabbagh
    • ETRI Journal
    • /
    • v.42 no.4
    • /
    • pp.596-607
    • /
    • 2020
  • Scaling is an important operation because of the iterative nature of arithmetic processes in digital signal processors (DSPs). In residue number system (RNS)-based DSPs, scaling represents a performance bottleneck based on the complexity of intermodulo operations. To design an efficient RNS scaler for special moduli sets, a body of literature has been dedicated to the study of the well-known moduli sets {2n - 1, 2n, 2n + 1} and {2n, 2n - 1, 2n+1 - 1}, and their extension in vertical or horizontal forms. In this study, we propose an efficient programmable RNS scaler for the arithmetic-friendly moduli set {2n+p, 2n - 1, 2n+1 - 1}. The proposed algorithm yields high speed and energy-efficient realization of an RNS programmable scaler based on the effective exploitation of the mixed-radix representation, parallelism, and a hardware sharing technique. Experimental results obtained for a 130 nm CMOS ASIC technology demonstrate the superiority of the proposed programmable scaler compared to the only available and highly effective hybrid programmable scaler for an identical moduli set. The proposed scaler provides 43.28% less power consumption, 33.27% faster execution, and 28.55% more area saving on average compared to the hybrid programmable scaler.