• Title/Summary/Keyword: 연산 복잡도 (computational complexity)

Design and Simulation of ARM Processor with Floating Point Instructions (부동소수점 명령어를 지원하는 ARM 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.187-193 / 2020
  • Floating-point arithmetic in a microprocessor performs addition, subtraction, multiplication, and division on floating-point data to improve accuracy. Processor designs often omit floating-point instructions because of their complexity and provide only integer instructions. However, floating-point operations must be included to support computations in engineering and technical applications, and also in artificial intelligence and neural networks, which are in the spotlight today. In this paper, we design a 32-bit ARMv4-family processor with floating-point arithmetic instructions in VHDL and verify it with ModelSim. The ARM floating-point instructions execute successfully in simulation.
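
Floating-point instructions operate on operands that are packed into sign, exponent, and fraction fields. The Python sketch below splits a 32-bit word into those fields and reassembles it. It assumes the IEEE 754 single-precision (binary32) layout, which the abstract does not state. The function names are illustrative and are not taken from the paper's VHDL design.

```python
import struct

def decode_binary32(value: float) -> tuple[int, int, int]:
    """Split a float into assumed IEEE 754 binary32 fields: (sign, biased exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, bias 127
    fraction = bits & 0x7FFFFF      # 23 bits; normal numbers have an implicit leading 1
    return sign, exponent, fraction

def encode_binary32(sign: int, exponent: int, fraction: int) -> float:
    """Reassemble the three fields into a Python float (exact for binary32 values)."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]

if __name__ == "__main__":
    # -1.5 = -1 * 1.1b * 2^0: sign 1, biased exponent 127, fraction with only the top bit set
    print(decode_binary32(-1.5))  # → (1, 127, 4194304)
```

An FPU adder or multiplier datapath works on these unpacked fields. It aligns or adds the exponents, operates on the significands, and then normalizes and repacks the result.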

Design of a Pipelined PC Cluster using Idle PCs on LAN (LAN상의 유휴 PC들을 사용한 파이프라인 방식의 PC Cluster의 설계)

  • Kim, Young-Gyun;Oh, Gil-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2003.11b / pp.1037-1040 / 2003
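  • In this paper, we study a PC cluster system that uses idle PCs on a LAN for computation. The system runs cluster computations during the idle time of the PCs in a PC lab. As a result, it needs no additional hardware or installation space for a dedicated cluster. During the day, the lab PCs are used mainly for classes and lab work. Their idle time from 6 p.m. to 9 a.m., when no lab sessions are held, can be used to form a PC cluster that runs CPU-intensive jobs in parallel, giving a low-cost, high-performance system. Specific nodes are assigned to dedicated operations, and each node passes its results to neighboring nodes so that the next operation in the sequence can be applied. This arranges the cluster as a pipeline. Overlapping computations in the pipelined PC cluster increases throughput. Lab PCs connected by a LAN are more stable and reliable than computing resources on the Internet, so complex security techniques are unnecessary. In addition, the computation time is fixed to the idle period, so complex load-balancing or scheduling techniques that account for network and node load are not needed.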
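
The throughput gain from overlapping stages can be estimated with the textbook pipeline timing model. The Python sketch below is illustrative only. It assumes k stages of equal duration t and ignores LAN transfer time between nodes; the abstract quantifies neither. Under these assumptions, sequential processing of n tasks takes n*k*t, while a full pipeline takes (k + n - 1)*t.

```python
def sequential_time(n_tasks: int, n_stages: int, stage_time: float) -> float:
    """No overlap: each task passes through every stage before the next task starts."""
    return n_tasks * n_stages * stage_time

def pipelined_time(n_tasks: int, n_stages: int, stage_time: float) -> float:
    """Overlapped stages: the first task fills the pipeline, then one task completes per step."""
    if n_tasks == 0:
        return 0.0
    return (n_stages + n_tasks - 1) * stage_time

if __name__ == "__main__":
    n, k, t = 100, 4, 1.0  # hypothetical workload: 100 tasks, 4 pipeline nodes, unit stage time
    seq = sequential_time(n, k, t)
    pipe = pipelined_time(n, k, t)
    print(f"sequential={seq}, pipelined={pipe}, speedup={seq / pipe:.2f}")
```

With n = 100 and k = 4, the speedup is 400/103, about 3.88, and it approaches k as n grows. Real clusters lose some of this gain to communication between nodes.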

Multilevel Threshold Selection Method Based on Gaussian-Type Finite Mixture Distributions (가우시안형 유한 혼합 분포에 기반한 다중 임계값 결정법)

  • Seo, Suk-T.;Lee, In-K.;Jeong, Hye-C.;Kwon, Soon-H.
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.6 / pp.725-730 / 2007
  • Gray-level histogram-based threshold selection methods, such as Otsu's method and Huang and Wang's method, have been widely used in image processing. They are simple and effective, but finding the optimal multilevel thresholds takes too much time as the number of thresholds increases. In this paper, we measure the correlation between gray levels with a Gaussian function. We then define a Gaussian-type finite mixture distribution that combines the Gaussian distribution function with the gray-level histogram, and use it to propose a fast and effective threshold selection method. Experimental results on three images demonstrate the effectiveness of the proposed method. A comparison of its computational complexity with that of Otsu's method demonstrates its efficiency.
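
The Python sketch below shows why exhaustive multilevel thresholding slows down as thresholds are added. It implements Otsu's between-class-variance criterion with a brute-force search over every threshold combination. For L gray levels and k thresholds, there are C(L-1, k) candidates. This is a baseline illustration only and does not implement the Gaussian-type mixture method proposed in the paper.

```python
from itertools import combinations

def between_class_variance(hist: list[int], thresholds: tuple[int, ...]) -> float:
    """Otsu's criterion: sum over classes of w_c * (mu_c - mu_T)^2 on the normalized histogram."""
    total = sum(hist)
    probs = [h / total for h in hist]
    mu_total = sum(i * p for i, p in enumerate(probs))
    bounds = (0, *thresholds, len(hist))  # class c covers gray levels [bounds[c], bounds[c+1])
    variance = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        w = sum(probs[lo:hi])
        if w == 0:
            continue  # an empty class contributes nothing
        mu = sum(i * probs[i] for i in range(lo, hi)) / w
        variance += w * (mu - mu_total) ** 2
    return variance

def otsu_multilevel(hist: list[int], k: int) -> tuple[int, ...]:
    """Exhaustive search over all C(L-1, k) threshold tuples; the cost grows combinatorially in k."""
    candidates = combinations(range(1, len(hist)), k)
    return max(candidates, key=lambda t: between_class_variance(hist, t))

if __name__ == "__main__":
    hist = [5, 5, 0, 0, 5, 5, 0, 0, 5, 5]  # toy 10-level histogram with three peaks
    print(otsu_multilevel(hist, 2))
```

For a 256-level image, two thresholds already mean 32,385 candidates and three mean about 2.7 million. Each candidate is evaluated in O(L) here. This growth is the cost that faster selection methods aim to avoid.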

New Parallel MDC FFT Processor for Low Computation Complexity (연산복잡도 감소를 위한 새로운 8-병렬 MDC FFT 프로세서)

  • Kim, Moon Gi;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.3 / pp.75-81 / 2015
  • This paper proposes a new eight-parallel MDC (multipath delay commutator) FFT processor that uses an eight-parallel MDC architecture and an efficient scheduling scheme. The proposed FFT processor supports the 256-point FFT based on a modified radix-$2^6$ FFT algorithm. The proposed scheduling scheme reduces the number of complex multipliers from eight to six without increasing the delay buffers or computation cycles. Moreover, the proposed FFT processor can be used in OFDM systems that require high throughput and low hardware complexity. It was designed and implemented in a 90 nm CMOS technology, and the experimental results show an area of $0.27\,\mathrm{mm}^2$. Furthermore, the proposed eight-parallel MDC FFT processor achieves a throughput of up to 2.7 GSample/s at 388 MHz.
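
The modified radix-$2^6$ MDC pipeline is beyond a short example. The underlying idea can still be shown in software: an N-point DFT is split into smaller DFTs that are joined by twiddle-factor multiplications. The Python sketch below substitutes a plain recursive radix-2 decimation-in-time FFT and counts its complex twiddle multiplications, which is the work that hardware complex multipliers perform. It is not the paper's schedule. The optional `counter` argument is a hypothetical instrumentation hook.

```python
import cmath
from typing import Optional

def fft_radix2(x: list[complex], counter: Optional[list[int]] = None) -> list[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2], counter)  # N/2-point DFT of even-indexed samples
    odd = fft_radix2(x[1::2], counter)   # N/2-point DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # one complex twiddle multiplication
        if counter is not None:
            counter[0] += 1
        out[k] = even[k] + t             # butterfly, upper output
        out[k + n // 2] = even[k] - t    # butterfly, lower output
    return out

def dft_reference(x: list[complex]) -> list[complex]:
    """Direct O(N^2) DFT, used only to check the FFT result."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)) for k in range(n)]

if __name__ == "__main__":
    count = [0]
    spectrum = fft_radix2([complex(i) for i in range(8)], count)
    print(count[0], "twiddle multiplications")
```

This recursion performs (N/2)·log2 N twiddle multiplications, or 1024 for N = 256. Many of them are trivial, such as multiplication by W^0. Higher-radix algorithms such as radix-$2^k$ reorganize the butterflies so that more of these multiplications become trivial.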

An Ultrasonic Vessel-Pattern Imaging Algorithm with Low Computational Complexity (낮은 연산 복잡도를 지니는 초음파 혈관 패턴 영상 알고리즘)

  • Um, Ji-Yong
    • Journal of IKEEE / v.26 no.1 / pp.27-35 / 2022
  • This paper proposes an ultrasound vessel-pattern imaging algorithm with low computational complexity. The proposed algorithm reconstructs blood-vessel patterns by detecting only blood flow, and it can be applied to real-time signal-processing hardware that extracts ultrasonic finger-vessel patterns. Unlike the blood-flow imaging mode of a typical ultrasound medical imaging device, the proposed algorithm reconstructs only the presence of blood flow as an image. It does not use I/Q demodulation and instead detects blood flow by accumulating the absolute value of the clutter-filter output, so its structure is relatively simple. To verify the complexity of the proposed algorithm, a finger-vessel simulation model was implemented with the Field-II program. Behavioral simulation confirmed that the proposed algorithm's processing time is about 54 times shorter than that of the typical color-flow mode. Considering the main building blocks required and the amount of computation, the proposed algorithm is simple to implement in hardware such as an FPGA or an ASIC.
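
The detection step described above, clutter filtering followed by accumulating absolute values, can be sketched for each pixel's slow-time ensemble. The abstract does not specify the paper's clutter filter or decision threshold. This Python sketch therefore assumes a simple mean-subtraction (DC-removal) clutter filter and uses a hypothetical `threshold` parameter.

```python
def clutter_filter(ensemble: list[float]) -> list[float]:
    """Assumed clutter filter: subtract the slow-time mean to remove the stationary tissue echo."""
    mean = sum(ensemble) / len(ensemble)
    return [s - mean for s in ensemble]

def flow_power(ensemble: list[float]) -> float:
    """Accumulate |clutter-filter output| over the ensemble; no I/Q demodulation is involved."""
    return sum(abs(s) for s in clutter_filter(ensemble))

def vessel_mask(ensembles: list[list[float]], threshold: float) -> list[bool]:
    """Mark a pixel as vessel when its accumulated flow power exceeds a (hypothetical) threshold."""
    return [flow_power(e) > threshold for e in ensembles]

if __name__ == "__main__":
    stationary = [5.0] * 8                               # tissue: echo constant across pulses
    flowing = [5.0, 6.0, 4.0, 7.0, 3.0, 6.0, 4.0, 5.0]   # blood: echo fluctuates across pulses
    print(vessel_mask([stationary, flowing], threshold=1.0))
```

Each pixel needs only additions, subtractions, and absolute values, which suits an FPGA or ASIC. A practical clutter filter would usually be a higher-order high-pass filter rather than plain mean removal.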

An Intra Prediction Hardware Design for High Performance HEVC Encoder (고성능 HEVC 부호기를 위한 화면내 예측 하드웨어 설계)

  • Park, Seung-yong;Guard, Kanda;Ryoo, Kwang-ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.875-878 / 2015
  • In this paper, we propose an intra prediction hardware architecture for a high-performance HEVC encoder with shorter processing time, fewer computations, and a smaller hardware area. The proposed architecture uses common operation units to reduce computational complexity and processes $4\times4$ blocks to reduce hardware area. To reduce operation time, a single common operation unit generates both the predicted pixels and the filtered pixels in all prediction modes. The architecture processes PUs in $4\times4$ units to reduce hardware area and uses internal registers to support $32\times32$ PU processing. It uses ten common operation units, which reduces the number of execution cycles for intra prediction. The design is written in Verilog HDL (Hardware Description Language) and has a total of 41.5k gates in the TSMC $0.13\,\mu\mathrm{m}$ CMOS standard cell library. At 150 MHz it supports real-time 4K UHD video encoding at 30 fps, and its maximum operating frequency is 200 MHz.
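
A shared operation unit can serve several modes because HEVC intra predictors reduce to weighted sums of neighboring reference samples, plus a rounding offset and a right shift. The Python sketch below computes planar and DC prediction for an N×N block using the HEVC specification's formulas. It omits reference-sample smoothing and DC boundary filtering, and the function names are illustrative rather than the paper's hardware blocks.

```python
def planar_predict(top: list[int], left: list[int], top_right: int, bottom_left: int) -> list[list[int]]:
    """HEVC planar mode for an N x N block (N a power of two); returns pred[y][x]."""
    n = len(top)
    shift = n.bit_length()  # equals log2(N) + 1 when N is a power of two
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            horizontal = (n - 1 - x) * left[y] + (x + 1) * top_right
            vertical = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y][x] = (horizontal + vertical + n) >> shift  # rounded average of both interpolations
    return pred

def dc_predict(top: list[int], left: list[int]) -> list[list[int]]:
    """HEVC DC mode without boundary filtering: rounded mean of the 2N neighbors."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) >> n.bit_length()
    return [[dc] * n for _ in range(n)]

if __name__ == "__main__":
    block = planar_predict([100] * 4, [100] * 4, top_right=200, bottom_left=100)
    for row in block:
        print(row)
```

Both functions use the same multiply-accumulate-round-shift pattern. This pattern is what makes a common datapath across modes plausible.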

Diffusive DTW Algorithm for Optimizing Distance Matrix Computation Structure (거리 행렬 연산 구조 최적화를 위한 확산 동적 시간 왜곡(Diffusive DTW) 알고리즘)

  • Kim, Young-tak;Jin, Kyo-hong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.93-96 / 2022
  • DTW can align sequences of different lengths and measure the similarity of their patterns, but its time and space complexity make it computationally expensive on large datasets. In this paper, we propose the DDTW algorithm, which reduces computational cost and also introduces no error in its results. We also compare the algorithmic complexity of DTW and DDTW by measuring computation time as a function of sequence length. Simulation results show a noticeable reduction in computation time for DDTW compared with DTW.
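
For reference, the baseline DTW fills an (n+1)×(m+1) cumulative-cost matrix, which is the source of its O(nm) time and space cost. The Python sketch below shows that standard recurrence and assumes absolute difference as the local distance. It does not reproduce DDTW, whose details the abstract does not give.

```python
import math

def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic DTW: D[i][j] = |a_i - b_j| + min(D[i-1][j], D[i][j-1], D[i-1][j-1])."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]  # full O(n*m) distance matrix
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])            # assumed local distance
            D[i][j] = cost + min(D[i - 1][j],          # repeat b_j (insertion)
                                 D[i][j - 1],          # repeat a_i (deletion)
                                 D[i - 1][j - 1])      # advance both (match)
    return D[n][m]

if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # different lengths, same pattern
```

The two sequences in the usage example differ in length but follow the same shape, so DTW reports zero distance. Every cell of the matrix is computed, which is the cost that optimized variants try to reduce without changing the result.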

Design of 1-D DCT processor using a new efficient computation sharing multiplier (새로운 연산 공유 승산기를 이용한 1차원 DCT 프로세서의 설계)

  • Lee, Tae-Wook;Cho, Sang-Bock
    • The KIPS Transactions:PartA / v.10A no.4 / pp.347-356 / 2003
  • The DCT algorithm requires an efficient hardware architecture for computing inner products. Conventional methods have high hardware complexity, so a computation sharing multiplier was proposed for implementing inner products. However, the existing multiplier has an inefficient hardware architecture in its precomputer and select units, which degrades its performance. In this paper, we propose a new, efficient computation sharing multiplier and apply it to the implementation of a 1-D DCT processor. Comparisons of hardware architectures and logic synthesis results show that the new multiplier is more efficient than the previous one. The 1-D DCT processor built with the proposed multiplier achieves higher performance than typical designs.
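
The general computation sharing idea works as follows. A precomputer bank forms a few odd multiples of the input x once. Each constant coefficient is then applied by selecting, shifting, and adding those shared values instead of using a full multiplier. This Python sketch assumes 8-bit unsigned coefficients split into two 4-bit nibbles, with the odd alphabet {1, 3, ..., 15}. It illustrates the general technique, not the improved precomputer and select units proposed in the paper.

```python
def precompute(x: int) -> dict[int, int]:
    """Precomputer bank: odd multiples of x, computed once and shared by every coefficient."""
    return {a: a * x for a in range(1, 16, 2)}

def select_shift(nibble: int, bank: dict[int, int]) -> int:
    """Select unit: write a 4-bit nibble as odd * 2**s and return bank[odd] << s."""
    if nibble == 0:
        return 0
    shift = (nibble & -nibble).bit_length() - 1  # number of trailing zero bits
    return bank[nibble >> shift] << shift

def csm_multiply(coeff: int, bank: dict[int, int]) -> int:
    """Multiply the shared input by an 8-bit constant using two selections, a shift and one add."""
    if not 0 <= coeff <= 0xFF:
        raise ValueError("coefficient must be an 8-bit unsigned value")
    high, low = coeff >> 4, coeff & 0xF
    return (select_shift(high, bank) << 4) + select_shift(low, bank)

def multiply_by_constants(x: int, coeffs: list[int]) -> list[int]:
    """One precomputation per input sample, reused for all coefficients it meets (e.g. a DCT column)."""
    bank = precompute(x)
    return [csm_multiply(c, bank) for c in coeffs]

if __name__ == "__main__":
    print(multiply_by_constants(7, [64, 83, 36]))  # sample constants, not the paper's coefficients
```

In a DCT inner product, each input sample is multiplied by several fixed coefficients. Sharing one precomputer bank across those products is what saves hardware compared with independent multipliers.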

An Efficient Bit-serial Systolic Multiplier over GF($2^m$) (GF($2^m$)상의 효율적인 비트-시리얼 시스톨릭 곱셈기)

  • Lee Won-Ho;Yoo Kee-Young
    • Journal of KIISE:Computer Systems and Theory / v.33 no.1_2 / pp.62-68 / 2006
  • Important arithmetic operations over finite fields include multiplication and exponentiation. With the binary method, exponentiation over GF($2^m$) can be implemented as a series of squaring and multiplication operations, so fast algorithms and efficient hardware for multiplication are important. This paper presents an efficient bit-serial systolic array for MSB-first multiplication in GF($2^m$) based on the polynomial representation. Compared with related multipliers, the proposed systolic multiplier has advantages in input-pin count and area-time complexity. Furthermore, it has regularity, modularity, and unidirectional data flow, and is therefore well suited to VLSI implementation.
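
MSB-first polynomial-basis multiplication can be written as the classic interleaved shift-and-add loop. At each step, the partial product is multiplied by x and reduced modulo the field polynomial, and A is added when the current bit of B is set. The Python sketch below uses integers as bit vectors and also includes the binary (square-and-multiply) exponentiation mentioned above. The field GF(2^4) with P(x) = x^4 + x + 1 is only an example, and the loop stands in for the systolic cells without modeling their timing.

```python
def gf2m_mul_msb_first(a: int, b: int, poly: int, m: int) -> int:
    """MSB-first multiplication in GF(2^m), polynomial basis; poly includes the x^m term."""
    result = 0
    for i in range(m - 1, -1, -1):  # scan bits of b from b_{m-1} down to b_0
        result <<= 1                # multiply the partial product by x
        if result >> m:             # degree reached m: reduce by XOR with the field polynomial
            result ^= poly
        if (b >> i) & 1:
            result ^= a             # add (XOR) A when bit b_i is set
    return result

def gf2m_pow(a: int, e: int, poly: int, m: int) -> int:
    """Binary method: square for every exponent bit, multiply by a when the bit is 1."""
    result = 1
    for bit in bin(e)[2:]:
        result = gf2m_mul_msb_first(result, result, poly, m)
        if bit == "1":
            result = gf2m_mul_msb_first(result, a, poly, m)
    return result

if __name__ == "__main__":
    P, M = 0b10011, 4  # example field: GF(2^4) with x^4 + x + 1
    print(bin(gf2m_mul_msb_first(0b1000, 0b0010, P, M)))  # x^3 * x = x^4 = x + 1
```

Each iteration needs only shifts, a conditional XOR with P, and a conditional XOR with A. This regular step maps naturally onto one processing cell per bit in a systolic array.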

Parallel Pipelined Spatial Join Method for Efficient Query Processing In Distributed Spatial Database Systems (분산 공간 데이터베이스 시스템에서의 효율적인 질의 처리를 위한 병렬 연쇄 공간 죠인 기법)

  • Ko, Ju-Il;Lee, Hwan-Jae;Kim, Myoung-Keun;Lee, Soon-Jo;Bae, Hae-Young
    • Proceedings of the Korea Information Processing Society Conference / 2002.04a / pp.11-14 / 2002
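  • Spatial join queries are executed frequently in distributed spatial database systems. Because spatial data are large and complex, these queries overload the CPU and disk I/O of the server that performs the spatial operations. This paper proposes a method that executes costly spatial join queries between remote sites of a distributed spatial database system in a parallel, pipelined manner. The method orders the relations involved in the spatial join according to the characteristics of the spatial operations. It then splits the join by bisecting one of the relations in the lowest-level join and distributes the resulting join operations to the two servers that take part in query execution. Each server processes its part of the spatial join concurrently in a pipelined fashion, and the partial results are merged to produce the final join result. Because a relation is partitioned before joining, the method halves the number of objects involved in each spatial operation and reduces the number and range of spatial index (e.g., R-Tree) searches. In addition, pipelined query processing creates no temporary relations for intermediate join results, so even complex queries over large volumes of data can be executed without restriction.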
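
The bisection step can be illustrated with a minimal filter-step spatial join on minimum bounding rectangles (MBRs). The Python sketch below uses a nested-loop MBR intersection test in place of an R-Tree, and two worker threads stand in for the two servers. It shows only that joining each half of one relation and merging the partial results gives the same answer as the full join. It does not model the paper's ordering or pipelining of multiple relations.

```python
from concurrent.futures import ThreadPoolExecutor

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def intersects(r: Rect, s: Rect) -> bool:
    """Two MBRs intersect when they overlap on both axes (touching edges count)."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def spatial_join(left: list[tuple[int, Rect]], right: list[tuple[int, Rect]]) -> set[tuple[int, int]]:
    """Nested-loop filter step: return the id pairs whose MBRs intersect."""
    return {(lid, rid) for lid, lr in left for rid, rr in right if intersects(lr, rr)}

def split_join(left: list[tuple[int, Rect]], right: list[tuple[int, Rect]]) -> set[tuple[int, int]]:
    """Bisect `left`, join each half with `right` on its own worker, then merge the partial results."""
    mid = len(left) // 2
    halves = [left[:mid], left[mid:]]
    with ThreadPoolExecutor(max_workers=2) as pool:  # threads simulate the two servers
        parts = pool.map(lambda half: spatial_join(half, right), halves)
        return set().union(*parts)

if __name__ == "__main__":
    R = [(1, (0, 0, 2, 2)), (2, (5, 5, 6, 6)), (3, (1, 1, 3, 3)), (4, (8, 8, 9, 9))]
    S = [(10, (1, 1, 4, 4)), (11, (5.5, 5.5, 7, 7)), (12, (20, 20, 21, 21))]
    print(sorted(split_join(R, S)))
```

Each worker tests only half of `left` against `right`, so each performs half the pairwise comparisons. CPython threads do not provide true CPU parallelism here; separate processes or machines would be needed to realize the speedup.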
