• Title/Summary/Keyword: 연산지연

Search Result 453, Processing Time 0.026 seconds

Design of a NAND Plash File System for Embedded Devices (임베디드 기기를 위한 NAND 플래시 파일 시스템의 설계)

  • Park Song-Hwa;Lee Tae-Hoon;Chung Ki-Dong
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.06a
    • /
    • pp.151-153
    • /
    • 2006
  • 본 논문은 NAND 플래시 메모리를 기반으로 한 임베디드 시스템에서 빠른 부팅을 지원하는 파일 시스템을 제안한다. 플래시 메모리는 비휘발성이며 기존의 하드디스크와 같은 자기 매체에 비해서 크기가 작고 전력소모도 적으며 내구성이 높은 장점을 지니고 있다. 그러나 제자리 덮어쓰기가 불가능하고 지움 연산단위가 쓰기 연산 단위보다 크다. 또한 지움 연산 획수가 제한되는 단점이 있다. 이러한 특성 때문에 기존의 파일 시스템들은 갱신 연산 발생 시, 갱신된 데이터를 다른 위치에 기록한다. 따라서 마운팅 시, 최신의 데이터를 얻기 위해 전체 플래시 메모리 공간을 읽어야만 한다. 이러한 파일 시스템의 마운팅 과정은 전체 시스템의 부팅 시간을 지연시킨다. 본 논문은 임베디드 시스템에서 빠른 부팅을 제공할 수 있는 NAND 플래시 메모리 파일 시스템의 구조를 제안한다. 제안된 시스템은 플래시 메모리 이미지 정보와 메타 데이터 블록만을 읽어 파일 시스템을 구축한다. 메타 데이터가 데이터 위치를 포함하기 때문에 마운팅 시, 전체 플래시 메모리 영역을 읽을 필요가 없으며 파일 데이터 위치 저장을 위한 별도의 자료 구조를 RAM 상에 유지할 필요가 없다. 실험 결과, YAFFS에 비해 $76%{\sim}85%$ 마운팅 시간은 감소시켰다. 또한 YAFFS에 비해 $64%{\sim}75%$ RAM 사용량을 감소시켰다.

  • PDF

A 32-bit Microprocessor with enhanced digital signal process functionality (디지털 신호처리 기능을 강화한 32비트 마이크로프로세서)

  • Moon, Sang-ook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • v.9 no.2
    • /
    • pp.820-822
    • /
    • 2005
  • We have designed a 32-bit microprocessor with fixed point digital signal processing functionality. This processor, combines both general-purpose microprocessor and digital signal processor functionality using the reduced instruction set computer design principles. It has functional units for arithmetic operation, digital signal processing and memory access. They operate in parallel in order to remove stall cycles after DSP or load/store instructions, which usually need one or more issue latency cycles in addition to the first issue cycle. High performance was achieved with these parallel functional units while adopting a sophisticated five-stage pipeline stucture.

  • PDF

Hardware Design of Efficient Montgomery Multiplier for Low Area RSA (저면적 RSA를 위한 효율적인 Montgomery 곱셈기 하드웨어 설계)

  • Nti, Richard B.;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.575-577
    • /
    • 2017
  • In public key cryptography such as RSA, modular exponentiation is the most time-consuming operation. RSA's modular exponentiation can be computed by repeated modular multiplication. To attain high efficiency for RSA, fast modular multiplication algorithms have been proposed to speed up decryption/encryption. Montgomery multiplication is limited by the carry propagation delay from the addition of long operands. In this paper, we propose a hardware structure that reduces the area of the Montgomery multiplication implementation for lightweight applications of RSA. Experimental results showed that the new design can achieve higher performance and reduce hardware area. A frequency of 884.9MHz and 250MHz were achieved with 84K and 56K gates respectively using the 90nm technology.

  • PDF

Design of an Efficient Turbo Decoder by Initial Threshold Setting (초기 임계값 설정에 의한 효율적인 터보 복호기 설계)

  • 김동한;황선영
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.5B
    • /
    • pp.582-591
    • /
    • 2001
  • 터보 부호는 반복적인 복호 알고리즘을 사용함으로써 가산성 백색 가우시안 잡음(AWGN) 채널 환경에서 Shannon 한계에 가까운 성능을 보이는 오류정정 방식으로 제안되었으나, 반복 연산량에 따른 복호 지연과 인터리버에 따른 지연에 의해 실시간 처리의 어려움이라는 문제점을 안고 있다. 본 논문에서는 터보 부호의 성능을 저하시키지 않는 범위에서 적절한 초기 임계값 설정에 따라 불필요한 반복 복호 횟수를 줄일 수 있는 터보 복호기 구조를 제안한다. 적절한 초기 임계값 설정은 LLR(Log-Likelihood Ratio)값의 평균값과 분산, 복호기의 출력에 대한 BER에 근거하여 여러 번의 모의 실험을 통해서 최적의 값으로 결정된다. 제안한 방식은 초기 임계값을 적절히 선택하면 손실이 없는 범위 내에서 반복횟수를 감소시킴으로써 기존의 정해진 반복횟수로 인한 큰 복호 지연을 미연에 방지하고, 이에 따른 계산량 감소는 저전력의 효과도 가져온다. 성능 평가를 위해 BER = $10^{-6}$이내이고, 전송속도가 32kbps 이상인 IMT2000의 고속 데이터 전송 환경에서 모의 실험을 하였다. 실험 결과로 기존의 정해진 반복횟수를 갖는 터보 복호기에 비해 SNR 변동(0~3dB)에서 평균적으로 55~90% 정도의 감소된 반복횟수를 검증하였다.

  • PDF

Performance Comparison of Deep Reinforcement Learning based Computation Offloading in MEC (MEC 환경에서 심층 강화학습을 이용한 오프로딩 기법의 성능비교)

  • Moon, Sungwon;Lim, Yujin
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.05a
    • /
    • pp.52-55
    • /
    • 2022
  • 5G 시대에 스마트 모바일 기기가 기하급수적으로 증가하면서 멀티 액세스 엣지 컴퓨팅(MEC)이 유망한 기술로 부상했다. 낮은 지연시간 안에 계산 집약적인 서비스를 제공하기 위해 MEC 서버로 오프로딩하는 특히, 태스크 도착률과 무선 채널의 상태가 확률적인 MEC 시스템 환경에서의 오프로딩 연구가 주목받고 있다. 본 논문에서는 차량의 전력과 지연시간을 최소화하기 위해 로컬 실행을 위한 연산 자원과 오프로딩을 위한 전송 전력을 할당하는 심층 강화학습 기반의 오프로딩 기법을 제안하였다. Deep Deterministic Policy Gradient (DDPG) 기반 기법과 Deep Q-network (DQN) 기반 기법을 차량의 전력 소비량과 큐잉 지연시간 측면에서 성능을 비교 분석하였다.

Path Delay Testing for Micropipeline Circuits (마이크로파이프라인 회로를 위한 지연 고장 테스트)

  • Kang, Yong-Seok;Huh, Kyung-Hoi;Kang, Sung-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.38 no.8
    • /
    • pp.72-84
    • /
    • 2001
  • The timings of all computational elements in the micropipeline circuits are important. The previous researches on path delay testing using scan methods make little account of the characteristic of the path delay tests that the second test pattern must be more controllable. In this paper, a new scan latch is proposed which is suitable to path delay testing of the micropipelines and has small area overhead. Results show that path delay faults in the micropipeline circuits using the new scan are testable robustly and the fault coverage is higher than the previous researches. In addition, the new scan latch for path delay faults testing in the micropipeline circuits can be easily expanded to the applications such as BIST for stuck-at faults.

  • PDF

Fast Bit-Serial Finite Field Multipliers (고속 비트-직렬 유한체 곱셈기)

  • Chang, Nam-Su;Kim, Tae-Hyun;Lee, Ok-Suk;Kim, Chang-Han
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.2
    • /
    • pp.49-54
    • /
    • 2008
  • In cryptosystems based on finite fields, a modular multiplication operation is the most crucial part of finite field arithmetic. Also, in multipliers with resource constrained environments, bit-serial output structures are used in general. This paper proposes two efficient bit-serial output multipliers with the polynomial basis representation for irreducible trinomials. The proposed multipliers have lower time complexity compared to previous bit-serial output multipliers. One of two proposed multipliers requires the time delay of $(m+1){\cdot}MUL+(m+1){\cdot}ADD$ which is more efficient than so-called Interleaved Multiplier with the time delay of $m{\cdot}MUL+2m{\cdot}ADD$. Therefore, in elliptic curve cryptosystems and pairing based cryptosystems with small characteristics, the proposed multipliers can result in faster overall computation. For example, if the characteristic of the finite fields used in cryprosystems is small then the proposed multipliers are approximately two times faster than previous ones.

Design of Multipliers Optimized for CNN Inference Accelerators (CNN 추론 연산 가속기를 위한 곱셈기 최적화 설계)

  • Lee, Jae-Woo;Lee, Jaesung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.10
    • /
    • pp.1403-1408
    • /
    • 2021
  • Recently, FPGA-based AI processors are being studied actively. Deep convolutional neural networks (CNN) are basic computational structures performed by AI processors and require a very large amount of multiplication. Considering that the multiplication coefficients used in CNN inference operation are all constants and that an FPGA is easy to design a multiplier tailored to a specific coefficient, this paper proposes a methodology to optimize the multiplier. The method utilizes 2's complement and distributive law to minimize the number of bits with a value of 1 in a multiplication coefficient, and thereby reduces the number of required stacked adders. As a result of applying this method to the actual example of implementing CNN in FPGA, the logic usage is reduced by up to 30.2% and the propagation delay is also reduced by up to 22%. Even when implemented with an ASIC chip, the hardware area is reduced by up to 35% and the delay is reduced by up to 19.2%.

Design of Efficient NTT-based Polynomial Multiplier (NTT 기반의 효율적인 다항식 곱셈기 설계)

  • Lee, SeungHo;Lee, DongChan;Kim, Yongmin
    • Journal of IKEEE
    • /
    • v.25 no.1
    • /
    • pp.88-94
    • /
    • 2021
  • Public-key cryptographic algorithms such as RSA and ECC, which are currently in use, have used mathematical problems that would take a long time to calculate with current computers for encryption. But those algorithms can be easily broken by the Shor algorithm using the quantum computer. Lattice-based cryptography is proposed as new public-key encryption for the post-quantum era. This cryptographic algorithm is performed in the Polynomial Ring, and polynomial multiplication requires the most processing time. Therefore, a hardware model module is needed to calculate polynomial multiplication faster. Number Theoretic Transform, which called NTT, is the FFT performed in the finite field. The logic verification was performed using HDL, and the proposed design at the transistor level using Hspice was compared and analyzed to see how much improvement in delay time and power consumption was achieved. In the proposed design, the average delay was improved by 30% and the power consumption was reduced by more than 8%.

A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems (방출단층촬영 시스템을 위한 GPU 기반 반복적 기댓값 최대화 재구성 알고리즘 연구)

  • Ha, Woo-Seok;Kim, Soo-Mee;Park, Min-Jae;Lee, Dong-Soo;Lee, Jae-Sung
    • Nuclear Medicine and Molecular Imaging
    • /
    • v.43 no.5
    • /
    • pp.459-467
    • /
    • 2009
  • Purpose: The maximum likelihood-expectation maximization (ML-EM) is the statistical reconstruction algorithm derived from probabilistic model of the emission and detection processes. Although the ML-EM has many advantages in accuracy and utility, the use of the ML-EM is limited due to the computational burden of iterating processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on GPU (graphic processing unit) for ML-EM algorithm. Materials and Methods: Using Geforce 9800 GTX+ graphic card and CUDA (compute unified device architecture) the projection and backprojection in ML-EM algorithm were parallelized by NVIDIA's technology. The time delay on computations for projection, errors between measured and estimated data and backprojection in an iteration were measured. Total time included the latency in data transmission between RAM and GPU memory. Results: The total computation time of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 see, respectively. In this case, the computing speed was improved about 15 times on GPU. When the number of iterations increased into 1024, the CPU- and GPU-based computing took totally 18 min and 8 see, respectively. The improvement was about 135 times and was caused by delay on CPU-based computing after certain iterations. On the other hand, the GPU-based computation provided very small variation on time delay per iteration due to use of shared memory. Conclusion: The GPU-based parallel computation for ML-EM improved significantly the computing speed and stability. The developed GPU-based ML-EM algorithm could be easily modified for some other imaging geometries.