• Title/Summary/Keyword: SIMD instructions

Efficient Maximum Intensity Projection using SIMD Instruction and Streaming Memory Transfer (단일 명령 복수 데이터 연산과 순차적 메모리 참조를 이용한 효율적인 최대 휘소 투영 볼륨 가시화)

  • Kye, Hee-Won
    • Journal of Korea Multimedia Society / v.12 no.4 / pp.512-520 / 2009
  • Maximum intensity projection (MIP) is a volume rendering method that extracts the maximum value along the viewing direction through volume data. Because it visualizes high-density structures well, for example in angiographic datasets, it is frequently used in medical imaging systems. We propose an efficient two-step MIP acceleration method for recent CPUs. First, we use SIMD instructions to reduce the conditional branch instructions that take up a considerable part of the whole rendering process, which improves rendering speed. Second, we propose a new method that modifies shear-warp rendering so that volume and image data are accessed sequentially. This improves the memory access pattern and reduces cache misses. On current CPUs, our method is about 7 times faster than shear-warp rendering.
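
The branch-reduction idea in this abstract can be sketched in a few lines. The following is an illustration only, not the paper's implementation: it assumes an axis-aligned view (no shear or warp step), stores the volume as a list of flat slices, and uses the hypothetical function name `mip_axis_aligned`. Plain Python stands in for the packed-maximum instruction a SIMD version would use; the abstract does not say which instruction the authors chose.

```python
def mip_axis_aligned(volume):
    """Maximum intensity projection along the slice axis.

    volume: list of slices, each a flat list of voxel intensities
    (assumed layout; all slices have the same length).
    """
    image = list(volume[0])
    # Slices are visited in storage order, so both the volume and the
    # image buffer are read and written sequentially.
    for slice_ in volume[1:]:
        # Elementwise max replaces a per-voxel "if voxel > pixel" branch.
        # A SIMD implementation would apply a packed max to several
        # lanes at once at this point.
        image = list(map(max, image, slice_))
    return image


projection = mip_axis_aligned([[1, 5, 2, 0], [3, 4, 9, 0], [2, 6, 1, 0]])
```

Keeping the slice loop outermost is what makes the memory access pattern sequential in this sketch; a ray-by-ray loop over the same data would stride through memory instead.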

On the Conceptual Design of the SIMD Vector Machine Attachable to SISD Machine (SISD 머신에 부착 가능한 SIMD 벡터 머신의 개념적 설계)

  • Cho Young-Il;Ko Young-Woong
    • The KIPS Transactions:PartA / v.12A no.3 s.93 / pp.263-272 / 2005
  • In a von Neumann (SISD) computer, operand addressing is handled in software because the hardware provides no address counter for operands. Consequently, addressing a vector in software requires specifying and using as many variables as the vector has elements. This is because a counter, the PC (program counter), is physically implemented in hardware only for instructions, not for operands. A vector is characterized by its structure and dimension. In this paper we propose a hardware unit, physically external to the CPU, dedicated to addressing the elements of a vector according to its structure and dimension. For high-speed vector processing, the unit is designed as a SIMD pipeline mechanism. The proposed mechanism is evaluated through simulation. Our results show a 12% to 30% performance improvement over the CRAY architecture under the same hardware conditions (processing units).

Parallelization method of IDCT with SIMD for fast HEVC decoding (HEVC 고속 복호화를 위한 SIMD 기반의 IDCT 병렬 프로그래밍 기법)

  • Hong, Seungbo;Choi, Kiho;Park, Sang-Hyo;Jang, Euee Seon
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2013.06a / pp.113-116 / 2013
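  • Industries such as broadcasting, medicine, aerospace, gaming, UCC, and mobile phones increasingly demand video that is close to reality, and this is becoming possible with the advent of 3D and Ultra High Definition (UHD) video. To reach a compression ratio suited to UHD, the Joint Collaborative Team on Video Coding (JCT-VC) began developing High Efficiency Video Coding (HEVC) as the next-generation codec to succeed MPEG-4 Part 10 AVC/H.264. HEVC compresses more than 40% better than the existing MPEG-4 Part 10 AVC/H.264 codec, but its complexity has also increased. Complexity matters especially in the decoder, where the inverse discrete cosine transform (IDCT) accounts for 8% to 16% of total decoding time. This paper proposes techniques for parallelizing the IDCT efficiently with SIMD instructions, one form of parallel programming, in order to reduce its execution time. The proposed techniques reduced IDCT execution time by 59% on average.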
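
One common way to map an IDCT onto SIMD lanes is to transform several columns at once: each basis weight is broadcast to all lanes, multiplied by a row of coefficients, and accumulated. The sketch below illustrates that pattern for a 4-point inverse transform using the HEVC 4x4 core transform matrix. It is not the paper's code: the function name `idct4_columns`, the list-of-rows layout, and the single rounding shift of 7 are assumptions made for illustration, and clipping and the second transform pass are omitted.

```python
# HEVC 4x4 core transform matrix; each row is one basis function.
M4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def idct4_columns(coeffs, shift=7):
    """1-D 4-point inverse transform applied to every column.

    coeffs: 4 rows of N coefficients; column j is one input vector.
    Each Python list models one SIMD register with N lanes.
    """
    n_lanes = len(coeffs[0])
    rounding = 1 << (shift - 1)
    out = []
    for n in range(4):
        acc = [0] * n_lanes
        for k in range(4):
            w = M4[k][n]  # one constant, broadcast to all lanes
            # Multiply-accumulate across lanes: the step a packed
            # multiply-add instruction would perform in hardware.
            acc = [a + w * c for a, c in zip(acc, coeffs[k])]
        out.append([(a + rounding) >> shift for a in acc])
    return out


dc_only = idct4_columns([[4, 0, 8, -4], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
```

With only the DC coefficient set, every output row equals the scaled DC value, which gives a quick sanity check of the matrix orientation.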

An Efficient high-speed reverse conversion method of the SIMD base for the decoder of the H.264 (H.264의 복호화기를 위한 SIMD기반의 효율적인 고속 역 변환 방법)

  • Yu Sang-Jun;Kim Seong-Hoon;Oh Seoung-Jun;Sohn Chae-Bong;Ahn Chang-Beom;Park Ho-Chong
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2004.11a / pp.99-102 / 2004
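  • This paper proposes a method that uses SIMD instructions to speed up the inverse integer transform and inverse quantization in an H.264 decoder. The proposed fast inverse transform method gains additional speed by skipping the inverse transform and inverse quantization for ZERO blocks. For the low-motion Akiyo sequence, it is 7.52 times faster at QP=0 and 8.1 times faster at QP=24 than the inverse integer transform and inverse quantization of the reference code. For the high-motion Stefan sequence, it is 6.7 times faster at QP=0 and 7.83 times faster at QP=36 than the corresponding steps of the reference code.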

A High-Speed SIMD MAC Unit (고속 SIMD형 곱셈 누산기)

  • 조민석;오형철
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.694-696 / 2004
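  • In this paper, we design a two-stage SIMD multiply-accumulate (MAC) unit for a 130 MHz pipeline that delivers the lower 32 bits of a $32\times32$-bit multiplication in one clock cycle. As part of this work, we design a Booth encoder that handles signed operands while reducing the delay of partial-product generation. The generated partial products are summed by a Wallace tree whose size is selected according to the SIMD instruction, and all results other than the lower 32 bits of the $32\times32$-bit multiplication are obtained in the second pipeline stage. When synthesized with Samsung 0.18 $\mu\textrm{m}$ standard cells, the SIMD MAC unit has a critical path delay of about 7.61 ns at 1.65 V and $+125^{\circ}$C.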
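
To make the role of the Booth encoder concrete, here is a small software model of signed multiplication through Booth recoding. It assumes radix-4 recoding, which the abstract does not specify, and it sums the partial products with ordinary addition instead of modelling a Wallace tree. The function names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(y, bits=32):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2}; digit i has weight 4**i.
    bits must be even.
    """
    y &= (1 << bits) - 1
    digits = []
    prev = 0  # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, bits=32):
    """Return (full signed product, lower `bits` bits of the product)."""
    digits = booth_radix4_digits(y, bits)
    # One partial product per digit: half as many as bit-by-bit
    # multiplication, and the recoding handles a negative y directly.
    product = sum((d * x) << (2 * i) for i, d in enumerate(digits))
    return product, product & ((1 << bits) - 1)


full, low32 = booth_multiply(-7, 13)
```

In this model the lower 32 bits are simply masked out of the full product; in the unit described above they are the part of the result made available after the first pipeline stage.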

Multi-Core Processor for Real-Time Sound Synthesis of Gayageum (가야금의 실시간 음 합성을 위한 멀티코어 프로세서 구현)

  • Choi, Ji-Won;Cho, Sang-Jin;Kim, Cheol-Hong;Kim, Jong-Myon;Chong, Ui-Pil
    • The KIPS Transactions:PartA / v.18A no.1 / pp.1-10 / 2011
  • Physical modeling has been widely used for sound synthesis because it produces high-quality sound close to that of real musical instruments. However, physical modeling requires many parameters to synthesize a large number of sounds simultaneously, which hinders real-time processing. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) based multi-core processor that supports real-time sound synthesis of the gayageum, a representative Korean traditional musical instrument. The proposed SIMD-based multi-core processor consists of 12 processing elements (PEs) that control the 12 strings of the gayageum, with each PE modeling its corresponding string. Given the excitation signals and parameters of each string as input, the processor generates the synthesized sounds of all 12 strings simultaneously. Experimental results using a 44.1 kHz sampling rate and 16-bit quantization show that the sound synthesized by the proposed multi-core processor was very similar to the original sound. In addition, the proposed multi-core processor outperforms commercial processors (TI's TMS320C6416, ARM926EJ-S, ARM1020E) in terms of execution time ($5.6{\sim}11.4{\times}$ better) and energy efficiency (about $553{\sim}1,424{\times}$ better).
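
The one-PE-per-string arrangement can be illustrated with a toy model in which every string runs the same update step on its own data. The sketch below substitutes a basic Karplus-Strong plucked-string loop for the paper's physical model, which the abstract does not describe; the function name `synthesize_strings`, the `damping` parameter, and the excitation format are all assumptions.

```python
from collections import deque


def synthesize_strings(excitations, num_samples, damping=0.996):
    """Run one delay-line string model per excitation, in lockstep.

    excitations: one list of initial delay-line samples per string
    (at least two samples each). Returns one output list per string.
    """
    lines = [deque(e) for e in excitations]
    outputs = [[] for _ in lines]
    for _ in range(num_samples):
        # Same instruction sequence for every string, different data:
        # on the processor described above each string maps to one PE.
        for line, out in zip(lines, outputs):
            first = line.popleft()
            out.append(first)
            # Averaging filter plus loss, fed back into the delay line.
            line.append(damping * 0.5 * (first + line[0]))
    return outputs


sounds = synthesize_strings([[1.0, 0.0], [0.0, 2.0, 0.0]], 3, damping=1.0)
```

The length of each excitation list sets the delay-line length and therefore the pitch of that string, so 12 lists of different lengths would model 12 differently tuned strings.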

Implementation of Parallel Processor for Sound Synthesis of Guitar (기타의 음 합성을 위한 병렬 프로세서 구현)

  • Choi, Ji-Won;Kim, Yong-Min;Cho, Sang-Jin;Kim, Jong-Myon;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea / v.29 no.3 / pp.191-199 / 2010
  • Physical modeling is a synthesis method that produces high-quality sound close to that of real musical instruments. However, because physical modeling requires many parameters to synthesize the sound of an instrument, it hinders real-time processing for instruments that produce a large number of sounds simultaneously. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) parallel processor that supports real-time sound synthesis of the guitar, a representative plucked string instrument. To control the six strings of the guitar, we use a SIMD parallel processor consisting of six processing elements (PEs), each of which models its corresponding string. Given the excitation signals and parameters of each string as input to a parallel synthesis algorithm, the proposed SIMD processor generates the synthesized sounds of all six strings simultaneously. Experimental results using a 44.1 kHz sampling rate and 16-bit quantization indicate that the sounds synthesized by the proposed parallel processor were very similar to the original sound. In addition, the proposed parallel processor outperforms the commercial TI TMS320C6416 in terms of execution time (8.9x better) and energy efficiency (39.8x better).

Implementation of Multi-Core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 알고리즘을 위한 멀티코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • The KIPS Transactions:PartA / v.18A no.2 / pp.45-52 / 2011
  • In the past, a patient had to go to the room where an ultrasound imaging device was installed in order to be examined by a doctor. Today, a doctor can visit a patient who stays in their own room and examine them with a handheld ultrasound device. However, such devices implement only basic functions and cannot meet the high performance required by the ultrasound beam focusing algorithm, which determines the quality of the ultrasound image. In addition, a mobile ultrasound device must have low energy consumption. To satisfy these requirements, this paper proposes a high-performance, low-power single instruction, multiple data (SIMD) based multi-core processor that supports a representative beamforming algorithm among the focusing methods for mobile ultrasound image signals. The proposed SIMD multi-core processor, which consists of 16 processing elements (PEs), meets the performance required by the beamforming algorithm by exploiting the considerable data-level parallelism inherent in ultrasound echo image data. Experimental results show that the proposed multi-core processor outperforms a commercial high-performance processor, TI DSP C6416, in terms of execution time (15.8 times better), energy efficiency (6.9 times better), and area efficiency (10 times better).
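
The abstract does not name the beamforming algorithm it accelerates. As an illustration of the data-level parallelism involved, the sketch below assumes the widely used delay-and-sum scheme with integer sample delays; the function name `delay_and_sum` and the input layout are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each receive channel by its delay and sum across channels.

    channels: one list of echo samples per transducer element.
    delays: non-negative integer sample delay per channel (assumed
    precomputed from the focusing geometry).
    """
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    # Every output sample is computed independently of the others,
    # so the work can be divided evenly among processing elements.
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays))
        for n in range(n_out)
    ]


focused = delay_and_sum([[0, 1, 0, 0], [0, 0, 1, 0]], [0, 1])
```

In the example, the echo arrives one sample later on the second channel; after alignment the two pulses add coherently into a single peak.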

Design and Implementation of a DSP Chip for Portable Multimedia Applications (휴대 멀티미디어 응용을 위한 DSP 칩 설계 및 구현)

  • 윤성현;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.12 / pp.31-39 / 1998
  • This paper presents the design and implementation of a new multimedia fixed-point DSP (MDSP) core for portable multimedia applications. The MDSP instruction set was designed by analyzing multimedia algorithms and existing DSP instruction sets. The MDSP architecture employs parallel processing techniques, such as SIMD and vector processing, as well as DSP techniques. The instruction set handles various data formats, and the MDSP can perform two MAC operations in parallel. The switching network and packing network increase performance by overlapping data rearrangement cycles with computation cycles. We designed Verilog HDL models and used the 0.6 $\mu\textrm{m}$ Samsung KG75000 SOG library. The total gate count is 68,831 and the clock frequency is 30 MHz.

Control Unit Design and Implementation for SIMD Programmable Unified Shader (SIMD 프로그래머블 통합 셰이더를 위한 제어 유닛 설계 및 구현)

  • Kim, Kyeong-Seob;Lee, Yun-Sub;Yu, Byung-Cheol;Jung, Jin-Ha;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.7 / pp.37-47 / 2011
  • Photorealistic, high-quality computer graphics are widely used in various fields, and the shader processor, a key part of a graphics processor, has evolved into the programmable unified shader. However, existing graphics processors are optimized for commercial algorithms, so developing an algorithm that is not based on them requires an independent shader processor. In this paper, we design and implement a control unit that supports high-quality three-dimensional computer graphics on a programmable unified shader processor. We evaluated the designed control unit through functional-level simulation. Hardware resource usage was measured by implementing the design directly on a Virtex-4 FPGA, and execution speed was verified by applying an ASIC library. The evaluation shows that the control unit supports about 1.5 times as many instructions as the control units of other shader processors with similar behavior, and that, for the number of processing units used in the shader processor, overall performance improves by about 3.1 GFLOPS compared with the other processors.