Search | Korea Science

Implementation of MPEG/Audio Decoder based on RISC Processor With Minimized DSP Accelerator (DSP 가속기가 내장된 RISC 프로세서 기반 MPEG/Audio 복호화기의 구현)

Bang Kyoung Ho;Lee Ken Sup;Park Young Cheol;Youn Dae Hee
- The Journal of Korean Institute of Communications and Information Sciences / v.29 no.12C / pp.1617-1622 / 2004
An MPEG/Audio decoder for mobile multimedia systems requires low power consumption. Implementations of an AV decoder on a single RISC processor often consume considerable power because of cache misses when cache memory is insufficient. In this paper, we present an MPEG/Audio decoder for mobile handset applications and implement it on a RISC processor with an embedded, minimized DSP accelerator. The audio decoding algorithm is split into two parts: a computation-intensive part and a control-intensive part. These parts are allocated to the DSP and the RISC core, respectively, which are designed to run in parallel to increase processing efficiency. The proposed system implements MP3 and AAC decoders at clock rates of 17 MHz and 24 MHz, reductions in complexity of 48% and 40%, respectively, compared with implementations on a single RISC processor. The proposed method is suitable for mobile multimedia applications with insufficient cache memory.

Processor Design Technique for Low-Temperature Filter Cache (필터 캐쉬의 저온도 유지를 위한 프로세서 설계 기법)

Choi, Hong-Jun;Yang, Na-Ra;Lee, Jeong-A;Kim, Jong-Myon;Kim, Cheol-Hong
- Journal of the Korea Society of Computer and Information / v.15 no.1 / pp.1-12 / 2010
Processor performance has improved dramatically in recent years. Unfortunately, as process technology scales down, energy consumption in a processor increases significantly even as processor performance continues to improve. Moreover, peak temperature in the processor rises sharply due to increased power density, resulting in a serious thermal problem. For this reason, performance, energy consumption, and thermal behavior should be considered together when designing modern processors. This paper proposes three modified filter cache schemes to alleviate the thermal problem in the filter cache, which is one of the most energy-efficient design techniques in hierarchical memory systems: Bypass Filter Cache (BFC), Duplicated Filter Cache (DFC), and Partitioned Filter Cache (PFC). The BFC scheme enables direct access to the L1 cache when the temperature of the filter cache exceeds a threshold, which reduces the temperature of the filter cache. The DFC scheme lowers the temperature of the filter cache by adding a second filter cache alongside the existing one. The filter cache in the PFC scheme is composed of two half-size filter caches, lowering the temperature by reducing the access frequency. According to our simulations using Wattch and Hotspot, the proposed partitioned filter cache shows the lowest peak temperature in the filter cache, leading to higher processor reliability.
https://doi.org/10.9708/jksci.2010.15.1.001
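
The bypass rule of the BFC scheme described in this abstract can be sketched as a toy model. This is only an illustration of the threshold decision, not the authors' simulator: the class name, the temperature constants, and the linear heating and cooling steps are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BypassFilterCache:
    """Toy model of the bypass decision only; every constant is hypothetical."""

    threshold: float = 80.0        # bypass threshold in degrees C (assumed)
    ambient: float = 45.0          # floor the filter cache cools toward (assumed)
    heat_per_access: float = 0.5   # warming per filter-cache access (assumed)
    cool_per_bypass: float = 0.3   # cooling per bypassed access (assumed)
    temperature: float = 45.0

    def access(self) -> str:
        """Return which cache serves this access and update the temperature."""
        if self.temperature > self.threshold:
            # Too hot: go straight to L1 and let the filter cache cool down.
            self.temperature = max(self.ambient,
                                   self.temperature - self.cool_per_bypass)
            return "L1"
        # Cool enough: use the filter cache, which warms it up slightly.
        self.temperature += self.heat_per_access
        return "filter"


cache = BypassFilterCache()
trace = [cache.access() for _ in range(200)]
```

In this model the filter cache serves every access until its temperature crosses the threshold, after which accesses alternate between the L1 cache and the filter cache so that the temperature stays near the threshold.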

An Efficient H.264/AVC Entropy Decoder Design (효율적인 H.264/AVC 엔트로피 복호기 설계)

Moon, Jeon-Hak;Lee, Seong-Soo
- Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.12 / pp.102-107 / 2007
This paper proposes an H.264/AVC entropy decoder that requires neither an embedded processor nor a memory fabrication process. Many previously reported H.264/AVC entropy decoders require a ROM or RAM fabrication process, which is difficult to implement in a general digital logic fabrication process. Furthermore, many require embedded processors for bitstream manipulation, which increases area and power consumption. This paper proposes a hardwired H.264/AVC entropy decoder without an embedded processor, which improves data processing speed and reduces power consumption. In addition, its CAVLC decoder optimizes the lookup table and internal buffer so that no embedded memory is needed, which reduces hardware size and allows implementation in a general digital logic fabrication process without a ROM or RAM fabrication process. The designed entropy decoder was embedded in an H.264/AVC video decoder and verified to operate correctly in the system. Synthesized in a TSMC 90 nm fabrication process, it has a maximum operating frequency of 125 MHz. It supports the QCIF, CIF, and QVGA image formats. With slight modification of the nC register and other blocks, it also supports the VGA image format.

Efficient Loop Accelerator for Motion Estimation Specific Instruction-set Processor (움직임 추정 전용 프로세서를 위한 효율적인 루프 가속기)

Ha, Jae Myung;Jung, Ho Sun;Sunwoo, Myung Hoon
- Journal of the Institute of Electronics and Information Engineers / v.50 no.7 / pp.159-166 / 2013
This paper proposes an efficient loop accelerator for a motion estimation (ME) specific instruction-set processor. ME algorithms inherently contain complex, multiple loop operations. To support efficient hardware (HW) loop operations, this paper introduces four loop instructions and their specific HW architecture. Simulation results show that the proposed loop accelerator reduces average instruction cycles by about 29% for ME early-termination schemes, compared with a typical implementation that combines compare and conditional jump instructions. The proposed loop accelerator can significantly reduce the number of program memory accesses and thus power consumption. Hence, it is well suited to low-power, flexible ME implementations.
https://doi.org/10.5573/ieek.2013.50.7.159
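
The following toy cycle count shows why replacing a compare and conditional jump pair with a hardware loop can save instruction cycles. It is not the paper's architecture or its four loop instructions: the one-cycle costs, the single setup instruction, and the iteration and body sizes are illustrative assumptions, and the resulting figure is not meant to reproduce the reported 29%.

```python
def software_loop_cycles(iterations: int, body_cycles: int) -> int:
    """Each iteration pays for the body plus a compare and a conditional jump."""
    compare_and_jump = 2  # assumed: one cycle for each of the two instructions
    return iterations * (body_cycles + compare_and_jump)


def hardware_loop_cycles(iterations: int, body_cycles: int) -> int:
    """One setup instruction, then the loop hardware tracks the count itself."""
    setup = 1  # assumed: a single loop-setup instruction
    return setup + iterations * body_cycles


# Hypothetical inner loop: 256 iterations of a 4-cycle body.
sw = software_loop_cycles(iterations=256, body_cycles=4)
hw = hardware_loop_cycles(iterations=256, body_cycles=4)
saving = 1 - hw / sw  # fraction of cycles removed in this toy model
```

The shorter the loop body, the larger the share of cycles spent on loop control, which is why tight ME inner loops are a natural target for this kind of accelerator.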

ECC Processor Supporting Elliptic Curve B-233 over GF(2^m) using 32-b WMM (GF(2^m) 상의 타원곡선 B-233을 지원하는 32-비트 WMM 기반 ECC 프로세서)

Lee, Sang-Hyun;Shin, Kyung-Wook
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.169-170 / 2018
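An elliptic curve cryptography processor supporting the elliptic curve B-233 over a binary field was designed using a 32-bit word-based Montgomery multiplier. A modified Montgomery ladder algorithm was applied to scalar multiplication to provide resistance to simple power analysis, and Lopez-Dahab projective coordinates and Fermat's little theorem were applied to remove the division and inversion operations, which consume large hardware resources, resulting in a low-area design. The designed ECC processor was functionally verified using Xilinx ISim. Synthesized with a $0.18{\mu}m$ CMOS cell library, it was implemented with 9,614 GEs and 4 Kbit of RAM at an operating frequency of 100 MHz, and its maximum operating frequency was estimated to be 125 MHz.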
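
As a minimal sketch of how Fermat's little theorem replaces a dedicated inverter, the code below inverts an element of GF(2^233) using only field multiplications. It assumes the standard NIST reduction polynomial x^233 + x^74 + 1 and uses a plain bit-serial multiplier on Python integers; it does not model the 32-bit word-based Montgomery multiplier of the paper, and the function names are made up for the example.

```python
M = 233
# NIST reduction polynomial for the 233-bit binary curves: x^233 + x^74 + 1.
F = (1 << 233) | (1 << 74) | 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^233) elements stored as bit vectors in Python ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> M:           # degree reached 233: reduce by XORing f(x)
            a ^= F
    return result


def gf_inv(a: int) -> int:
    """Invert a nonzero element as a^(2^233 - 2), using only multiplications."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    exponent = (1 << M) - 2
    result, base = 1, a
    while exponent:          # plain square-and-multiply exponentiation
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


x = 0x1234567890ABCDEF
x_inv = gf_inv(x)            # gf_mul(x, x_inv) gives the field identity 1
```

Because every nonzero element satisfies a^(2^233 - 1) = 1, the inverse is a^(2^233 - 2), so the multiplier that already exists for point arithmetic can be reused and no separate division circuit is needed.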

224-bit ECC Processor supporting the NIST P-224 elliptic curve (NIST P-224 타원곡선을 지원하는 224-비트 ECC 프로세서)

Park, Byung-Gwan;Shin, Kyung-Wook
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.188-190 / 2017
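This paper describes the design of a 224-bit elliptic curve cryptography (ECC) processor that supports scalar multiplication using projective coordinates. It supports finite field operations such as addition, subtraction, and multiplication over the prime field GF(p), and reduces hardware complexity by eliminating the division operation, which is costly in both computation and hardware resources. Scalar multiplication is controlled by a modified Montgomery ladder algorithm, which is more secure against simple power analysis. A scalar multiplication takes at most 2,615,201 clock cycles. The designed ECC-P224 processor was functionally verified using Xilinx ISim. Synthesized for a Xilinx Virtex5 FPGA device, it was implemented with 7,078 slices and operated at up to 79 MHz.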
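
The structure of a Montgomery ladder can be sketched on a toy curve. The example below uses the small textbook curve y^2 = x^3 + 2x + 2 over GF(17) with affine coordinates, not NIST P-224 or projective coordinates, and it shows the basic ladder rather than the modified variant used in the paper. Python integers are also not constant-time, so this illustrates only the control flow.

```python
P = 17        # toy prime field, not the P-224 prime
A = 2         # toy curve y^2 = x^3 + 2x + 2 over GF(17)
G = (5, 1)    # a point on the toy curve


def add(p1, p2):
    """Affine point addition; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                      # p2 is the negative of p1
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def montgomery_ladder(k: int, point, bits: int = 8):
    """Compute k*point with one add and one double for every scalar bit."""
    r0, r1 = None, point                 # invariant: r1 == r0 + point
    for i in reversed(range(bits)):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), add(r1, r1)
        else:
            r0, r1 = add(r0, r0), add(r0, r1)
    return r0
```

Each scalar bit triggers exactly one addition and one doubling regardless of its value, which is the property that makes the ladder harder to attack with simple power analysis than a plain double-and-add loop. The modular inverse uses the three-argument `pow`, which requires Python 3.8 or later.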

233-bit ECC processor supporting NIST B-233 elliptic curve (NIST B-233 타원곡선을 지원하는 233-비트 ECC 프로세서)

Park, Byung-Gwan;Shin, Kyung-Wook
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.158-160 / 2016
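This paper describes the design of a 233-bit elliptic curve cryptography (ECC) processor used for digital signatures (ECDSA), key exchange (ECDH), and similar applications. It supports finite field operations such as addition, multiplication, and division over $GF(2^{233})$, implemented using only shift and XOR operations, which consume few hardware resources. Scalar multiplication is implemented with a modified Montgomery ladder algorithm, which does not expose information about the integer k and is more secure against simple power analysis. A scalar multiplication takes at most 490,699 clock cycles. Correct operation of the designed ECC processor was confirmed by comparing simulation results from Xilinx ISim with values from the reference implementation of the Korea Internet & Security Agency (KISA). Synthesized for a Xilinx Virtex5 XC5VSX95T FPGA device, it was implemented with 1,576 slices and has a maximum operating frequency of 189 MHz.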
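
To illustrate how binary field arithmetic reduces to shifts and XORs, here is a minimal MSB-first (Horner style) multiplier for $GF(2^{233})$. It assumes the NIST reduction polynomial x^233 + x^74 + 1 and represents field elements as Python integers; the function names are hypothetical and the code is a software sketch, not the datapath of the processor described above.

```python
M = 233
F = (1 << 233) | (1 << 74) | 1   # x^233 + x^74 + 1 (NIST B-233 field polynomial)


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^m) is a bitwise XOR."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """MSB-first shift-and-XOR multiplication modulo F."""
    acc = 0
    for i in reversed(range(M)):
        acc <<= 1                # multiply the accumulator by x
        if acc >> M:             # degree 233 appeared: reduce with one XOR
            acc ^= F
        if (b >> i) & 1:
            acc ^= a             # add a when the current bit of b is set
    return acc


# x^232 times x wraps around to x^74 + 1 under this reduction polynomial.
wrapped = gf_mul(1 << 232, 2)
```

The loop body contains only a one-bit shift, a conditional XOR with the field polynomial, and a conditional XOR with the operand, which maps directly onto a bit-serial hardware multiplier.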

Low-Energy Intra-Task Voltage Scheduling using Static Timing Analysis (정적 시간 분석을 이용한 저전력 태스크내 전압 스케줄링)

Sin, Dong-Gun;Kim, Ji-Hong;Lee, Seong-Su
- Journal of KIISE:Computer Systems and Theory / v.28 no.11 / pp.561-572 / 2001
Since the energy consumption of CMOS circuits has a quadratic dependency on the supply voltage, lowering the supply voltage is the most effective way of reducing energy consumption. We propose an intra-task voltage scheduling algorithm for low-energy hard real-time applications. Based on a static timing analysis technique, the proposed algorithm controls the supply voltage within an individual task boundary. By fully exploiting all the slack times, a program scheduled by the proposed algorithm always completes its execution near the deadline, thus achieving a high energy reduction ratio. To validate the effectiveness of the proposed algorithm, we built a software tool that automatically converts a DVS-unaware program into an equivalent low-energy program. Experimental results show that the low-energy version of an MPEG-4 encoder/decoder (converted by the software tool) consumes less than 7~25% of the energy of the original program running on a fixed-voltage system with a power-down mode.
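
The quadratic relationship behind this approach can be worked through in a few lines. The sketch assumes that supply voltage scales linearly with clock speed, a common simplification that the abstract does not state, and the function names and cycle counts are hypothetical.

```python
def slowest_speed(remaining_cycles: float, cycles_to_deadline: float) -> float:
    """Lowest normalized speed (1.0 = full speed) that still meets the deadline.

    cycles_to_deadline is the number of full-speed cycles that fit before
    the deadline; remaining_cycles is the worst-case work still to be done.
    """
    return min(1.0, remaining_cycles / cycles_to_deadline)


def energy_ratio(speed: float) -> float:
    """Energy relative to full speed for the same number of cycles.

    Assumes voltage is proportional to speed, so energy per cycle
    (proportional to V^2) scales with speed^2.
    """
    if not 0.0 < speed <= 1.0:
        raise ValueError("speed must be in (0, 1]")
    return speed ** 2


# Hypothetical task: 4 million worst-case cycles left, with room for
# 8 million full-speed cycles before the deadline.
speed = slowest_speed(4e6, 8e6)
ratio = energy_ratio(speed)   # fraction of the full-speed energy
```

Under these assumptions, running at half speed finishes exactly at the deadline and uses a quarter of the energy. An intra-task scheduler can repeat this calculation inside the task whenever the remaining worst-case cycle count drops, for example after a branch that skips a long path.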

Real-time Scheduling for (m,k)-firm Deadline Tasks on Energy-constrained Multiprocessors (한정된 전력량을 가진 멀티프로세서 시스템에서 (m,k)-firm 데드라인 태스크를 위한 실시간 스케줄링 기법)

Kong, Yeonhwa;Cho, Hyeonjoong
- KIPS Transactions on Computer and Communication Systems / v.2 no.6 / pp.237-244 / 2013
We propose Energy-constrained Multiprocessor Real-Time Scheduling algorithms for (m,k)-firm deadline constrained tasks (EMRTS-MK). Rather than simply saving as much energy as possible, we treat energy as a hard constraint under which the system must remain functional and deliver acceptable performance at least for the prescribed mission time. We evaluate the EMRTS-MK algorithms in several experiments, which show quantitatively that they achieve the scheduling objectives.
https://doi.org/10.3745/KTCCS.2013.2.6.237
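
For readers unfamiliar with the (m,k)-firm model, a task satisfies it when at least m of every k consecutive jobs meet their deadlines. The checker below sketches that definition only; it is not part of the EMRTS-MK algorithms, and the function name and sample job histories are made up for illustration.

```python
from collections import deque
from typing import Iterable


def satisfies_mk_firm(outcomes: Iterable[bool], m: int, k: int) -> bool:
    """True if every window of k consecutive jobs has at least m met deadlines.

    outcomes lists, in release order, whether each job met its deadline.
    """
    if not 0 < m <= k:
        raise ValueError("need 0 < m <= k")
    window = deque(maxlen=k)             # sliding window of the last k jobs
    for met in outcomes:
        window.append(met)
        if len(window) == k and sum(window) < m:
            return False
    return True


# Hypothetical job histories for a (2,3)-firm task.
ok_history = [True, False, True, True, False, True]
bad_history = [True, False, False, True]
```

Here `ok_history` tolerates one miss in every window of three jobs, while `bad_history` violates the constraint because two consecutive misses fall inside the same window.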

Research on Event Mechanism for Reducing Power Overheads in Cache Memory Synchronization (캐시 메모리 동기화 전력 감소를 위한 이벤트 메커니즘에 대한 연구)

Pak, Young-Jin;Jeong, Ha-Young;Lee, Yong-Surk
- Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.3 / pp.69-75 / 2011
In this paper, we propose an anycast event-driven synchronization mechanism to reduce power overheads. The proposed mechanism reduces unnecessary polling operations in the SHI (Snoop Hit Invalidate) or SHR (Snoop Hit Read) states. It prevents wasted bandwidth and reduces the power overhead of polling. It also decreases the transition power of state changes compared with the broadcast model. Simulation results indicate that the proposed architecture reduces power by about 15.3% compared with the spin-lock model and by about 4.7% compared with the broadcast model. Overall, the results indicate that the proposed synchronization mechanism can increase the power efficiency of multi-core systems by reducing power overheads.

