Search | Korea Science

Design-for-Testability of The Floating-Point DSP Processor (부동 소수점 DSP 프로세서의 테스트 용이 설계)

Yun, Dae-Han;Song, Oh-Young;Chang, Hoon
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.26 no.5B
- /
- pp.685-691
- /
- 2001
본 논문은 4단계 파이프 라인과 VLIW (Very Long Instruction Word) 구조를 갖는 FLOVA라는 DSP 프로세서의 테스트용이 설계 기법을 다룬다. Full-scan design, BIST(Built-In-Self-Test), IEEE 1149.1의 기법들이 플립플롭과 floaing point unit, 내장된 메모리, I/O cell 등에 각각 적용되었다. 이러한 기법들은 테스트 용이도의 관점에서 FLOVA의 구조에 적절하게 적용되었다. 본 논문에서는 이와 같이 FLOVA에 적용된 테스트 용이 설계의 특징들을 중심으로 상세하게 기술한다.
PDF

Parallelizing Imperfectly Nested Loops

Kim, Ki-Chang
- Journal of Electrical Engineering and information Science
- /
- v.1 no.1
- /
- pp.140-150
- /
- 1996
Loops are some of the richest program constructs where parallelism is available. Exploiting fine-grain parallelizm out these constructs is particularly important in light of the growing popularity of superscalar and VLIW machines. This paper explains how the fine-grain parallelization techniques can be generalized to handle nested loops. Our technique integrates nested loop parallelization techniques at the fine-grain level, thus exposing more fine-grain parallelism, and is flexible enough to handle non-perfectly nested loops. Examples and some experimental results are presented to illustrate our approach.
PDF

A Predicate-Sensitive Scheduling Algorithm in Instruction-Level Parallelism Processors (ILP 프로세서를 위한 조건실행 지원 스케쥴링 알고리즘)

Yoo, Byung-Kang;Lee, Sang-Jeong
- The Transactions of the Korea Information Processing Society
- /
- v.5 no.1
- /
- pp.202-214
- /
- 1998
Exploitation of instruction-level parallelism(ILP) is an effective mechanism for improving the performance of modern super-scalar and VLIW processors. Various software techniques can be applied to increase ILP. Among these techniques, predicated execution is the one that increases the degree of ILP by allowing instructions from different basic blocks to be converted to a single basic block by removing branch instructions. In this paper, a global predicate-sensitive scheduling algorithm is proposed to improve the performance for ILP processors that support predicated execution. In order to examine the performance of proposed algorithm, a C compiler and a simulator are developed. By simulating various benchmark programs with the compiler and the simulator, the performance results of this algorithm are measured and the effectiveness of the algorithm is verified. As a result of measure performance with I, 2, 4 issue execution, this study was confirmed average performance by 20% or more.
PDF

Energy-efficient Reconfigurable FEC Processor for Multi-standard Wireless Communication Systems

Li, Meng;der Perre, Liesbet Van;van Thillo, Wim;Lee, Youngjoo
- JSTS:Journal of Semiconductor Technology and Science
- /
- v.17 no.3
- /
- pp.333-340
- /
- 2017
In this paper, we describe HW/SW co-optimizations for reconfigurable application specific instruction-set processors (ASIPs). Based on our previous very long instruction word (VLIW) ASIP, the proposed framework realizes various forward error-correction (FEC) algorithms for wireless communication systems. In order to enhance the energy efficiency, we newly introduce several design methodologies including high-radix algorithms, task-level out-of-order executions, and intensive resource allocations with loop-level rescheduling. The case study on the radix-4 turbo decoding shows that the proposed techniques improve the energy efficiency by 3.7 times compared to the previous architecture.
https://doi.org/10.5573/JSTS.2017.17.3.333 인용 PDF KSCI

Real-time implementation of the G.728 speech codec using the Vincent6 DSP core (Vincent6 DSP코어를 이용한 G.728 음성 부호화기의 실시간 구현)

성호상
- Proceedings of the IEEK Conference
- /
- 2000.09a
- /
- pp.131-135
- /
- 2000
본 논문에서는 고성능 고정 소수점 DSP (Digital Signal Processor) 코어인 Vincent6 코어 [1]를 이용하여 ITU-T C.728 음성 부호화기를 실시간으로 구현하였다 G.728 은 16 kb/s전송률의 ITU-T표준 음성 부호화기이며, 입력신호는 8 kHz로 샘플링되며 샘플 당 16 bit 로 양자화된 PCM 신호이다. G.728 은 LD-CELP(Low Delay Code Excited Linear Prediction)라고도 하며, 알고리 듬 delay는 0.625ms 이다. Vincent6 DSP core 는 VLIW (Very-Long Instruction Word) 특성을 가지므로 다중 명령 (multiple instruction)을 수행할 수 있다 이를 위해서 G.728 annex G를 이용하여 고정 소숫점 연산으로 코드를 작성한 후, 이를 vincent6 어셈블리 코드로 구현하였다. 최종적으로 구현된 코드는 ITU-T 의 test vector 에 대 해 bit exact 한 결과를 보이며 34 MCPS (Million Cycles Per Second)의 계산량을 가지며 사용 메모리크기는 데이터 메모리가 약 9KByte, 프로그램 메모리가 약 57 KByte 이다.
PDF

Forecasting the Diffusion of Microprocessor Technology Based on Bibliometrics (Bibliometrics를 이용한 마이크로프로세서의 기술확산 예측)

손소영;안병주
- Journal of Korean Society for Quality Management
- /
- v.28 no.1
- /
- pp.27-40
- /
- 2000
Technological forecasting for microprocessor market can provide timely insight into the prospects for significant technological changes in computer hardware as well as software. In this paper, we use bibliometrics to forecast R&D trend on microprocessor technology. Cumulative numbers of US Patents on several generations of microprocessor technology (pipeline, superpipeline, supersclar and VLIW) approved since 1980 are applied to fit diffusion models. Our study results provide both the maximum market potential and the maturity time for each generation of microprocessor technology. Such information is expected to make contribution on making better decisions with regard to strategic corporate planning, R&D management, product development and investment in new technology of microprocessor.
PDF

Design and Verification of PCI Controller in a Multimedia Processor (멀티미디어 프로세서의 PCI 컨트롤러 디자인 및 검증)

이준희;남상준;김병운;임연호;권영수;경종민
- Proceedings of the IEEK Conference
- /
- 1999.11a
- /
- pp.499-502
- /
- 1999
This paper presents a PCI (Peripheral Component Interconnect) controller embedded in a multimedia processor, called FLOVA (FLOating point VLIW Architecture), targeting for 3D graphics applications. Fast I/O interfaces are essential for multimedia processors which usually handle large amount of multimedia data. Therefore, in FLOVA, PCI bus is adopted for I/O interface due to fast burst transaction. However, there are several problems in implementation and verification to use burst transaction of PCI. It is difficult to handle data transaction between two units which have two different operating frequency. FLOVA has more higher operating frequency about 100MHz than that of PCI local bus and it makes lower utilization of FLOVA bus. Also, traditional simulation is not sufficient for verification of PCI functionality. In this paper, we propose buffering schemes to implement the PCI controller with wide bandwidth and high bus utilization. Also, this paper shows how to verify the PCI controller using real PCI bus environments before its fabrication.
PDF

Design of a Parallel Pipelined Processor Architecture (병렬 파이프라인 프로세서 아키덱처의 설계)

이상정;김광준
- Journal of the Korean Institute of Telematics and Electronics B
- /
- v.32B no.3
- /
- pp.11-23
- /
- 1995
In this paper, a parallel pipelined processor model which acts as a small VLIW processor architecture and a scheduling algorithm for extracting instruction-level parallelism on this architecture are proposed. The proposed model has a dual-instruction mode which has maximum 4 basic operations being executed in parallel. By combining these basic operations, variable instruction set can be designed for various applications. The scheduling algorithm schedules basic operations for parallel execution and removes pipeline hazards by examining data dependency and resource conflict relations. In order to examine operation and evaluate the performance,a C compiler and a simulator are developed. By simulating various test programs with the compiler and the simulator, the characteristics and the performance result of the proposed architecture are measured.
PDF

A Low Power Design of H.264 Codec Based on Hardware and Software Co-design

Park, Seong-Mo;Lee, Suk-Ho;Shin, Kyoung-Seon;Lee, Jae-Jin;Chung, Moo-Kyoung;Lee, Jun-Young;Eum, Nak-Woong
- Information and Communications Magazine
- /
- v.25 no.12
- /
- pp.10-18
- /
- 2008
In this paper, we present a low-power design of H.264 codec based on dedicated hardware and software solution on EMP(ETRI Multi-core platform). The dedicated hardware scheme has reducing computation using motion estimation skip and reducing memory access for motion estimation. The design reduces data transfer load to 66% compared to conventional method. The gate count of H.264 encoder and the performance is about 455k and 43Mhz@30fps with D1(720x480) for H.264 encoder. The software solution is with ASIP(Application Specific Instruction Processor) that it is SIMD(Single Instruction Multiple Data), Dual Issue VLIW(Very Long Instruction Word) core, specified register file for SIMD, internal memory and data memory access for memory controller, 6 step pipeline, and 32 bits bus width. Performance and gate count is 400MHz@30fps with CIF(Common Intermediated format) and about 100k per core for H.264 decoder.
PDF KSCI

A Software And Hardware Scheme For Reducing The Branch Penalty In Parallel Computers (병렬구조 컴퓨터에서 Branch penalty를 감소시키기 위한 소프트웨어와 하드웨어 방법)

함찬숙;조종현;조영일
- Journal of the Korean Institute of Telematics and Electronics B
- /
- v.30B no.11
- /
- pp.11-16
- /
- 1993
VLIW architecture capable of testing multiple conditions in a cycle must support an efficient mechanism for multi-way branches. This paper proposes a mechanism to speed up the execution of multi-way branches and an efficient memory packing method of instructions, which reduced the wasted memory space. Also, we develops a new compiler technique which can transform program segments that are not applied to multi-way branches into ones that are applied to multi-way branches. The benefits gained by the transformation are to reduce branch penalty and to increase instruction-level parallelism.
PDF

Search Result 54, Processing Time 0.026 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)