Search | Korea Science

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
- Journal of the Korea Society of Computer and Information
- /
- v.16 no.10
- /
- pp.11-21
- /
- 2011
This paper introduces an SIMD(Single Instruction Multiple Data) based parallel processor that efficiently processes massive data inherent in multimedia. In addition, this paper implements MMX(MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes the performance of the MMX-type instructions. The reference data parallel processor consists of 16 processors each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieves a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieves 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia specific instructions including MMX-type have potentials for widely used many-core GPU(Graphics Processing Unit) and any types of parallel processors.
https://doi.org/10.9708/jksci.2011.16.10.011 인용 PDF KSCI

Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms (병렬 알고리즘의 가속화를 위한 GP-GPU의 Thread할당 기법)

Lee, Kwan-Ho;Kim, Chi-Yong
- Journal of IKEEE
- /
- v.21 no.1
- /
- pp.92-95
- /
- 2017
In this paper, we proposed a way to improve function of small scale GP-GPU. Instead of using superscalar which increase scheduling-complexity, we suggested the application of simple core to maximize GP-GPU performance. Our studies also demonstrated that simplified Stream Processor is one of the way to achieve functional improvement in GP-GPU. In addition, we found that developing of optimal thread-assigning method in Warp Scheduler for specific application improves functional performance of GP-GPU. For examination of GP-GPU functional performance, we suggested the thread-assigning way which coordinated with Deep-Learning system; a part of Neural Network. As a result, we found that functional index in algorithm of Neural Network was increased to 90%, 98% compared with Intel CPU and ARM cortex-A15 4 core respectively.
https://doi.org/10.7471/ikeee.2017.21.1.92 인용 PDF KSCI

Motion Estimation Specific Instructions and Their Hardware Architecture for ASIP (ASIP을 위한 움직임 추정 전용 연산기 구조 및 명령어 설계)

Hwang, Sung-Jo;SunWoo, Myung-Hoon
- Journal of the Institute of Electronics Engineers of Korea SP
- /
- v.48 no.3
- /
- pp.106-111
- /
- 2011
This paper presents an ASIP (Application-specific Instruction Processor) for motion estimation that employs specific IME instructions and its programmable and reconfigurable hardware architecture for various video codecs, such as H.264/AVC, MPEG4, etc. With the proposed specific instructions and hardware accelerator, it can handle the real-time processing requirement of High Definition (HD) video. With the parallel operations and SAD unit control using pattern information, the proposed IME instruction supports not only full search algorithm but also other fast search algorithms. The hardware size is 77K gates for each Processing Element Group (PEG) which has 256 SAD PEs. The proposed ASIP runs at 160MHz with sixteen PEGs and it can handle 1080p@30 frame in real time.
PDF KSCI

Implementation of SIMD-based Many-Core Processor for Efficient Image Data Processing (효율적인 영상데이터 처리를 위한 SIMD기반 매니코어 프로세서 구현)

Choi, Byong-Kook;Kim, Cheol-Hong;Kim, Jong-Myon
- Journal of the Korea Society of Computer and Information
- /
- v.16 no.1
- /
- pp.1-9
- /
- 2011
Recently, as mobile multimedia devices are used more and more, the needs for high-performance and low-energy multimedia processors are increasing. Application-specific integrated circuits (ASIC) can meet the needed high performance for mobile multimedia, but they provide limited, if any, generality needed for various application requirements. DSP based systems can used for various types of applications due to their generality, but they require higher cost and energy consumption as well as less performance than ASICs. To solve this problem, this paper proposes a single instruction multiple data (SIMD) based many-core processor which supports high-performance and low-power image data processing while keeping generality. The proposed SIMD based many-core processor composed of 16 processing elements (PEs) exploits large data parallelism inherent in image data processing. Experimental results indicate that the proposed SIMD-based many-core processor higher performance (22 times better), energy efficiency (7 times better), and area efficiency (3 times better) than conversional commercial high-performance processors.
https://doi.org/10.9708/jksci.2011.16.1.001 인용 PDF KSCI

Dual-mode Pseudorandom Number Generator Extension for Embedded System (임베디드 시스템에 적합한 듀얼 모드 의사 난수 생성 확장 모듈의 설계)

Lee, Suk-Han;Hur, Won;Lee, Yong-Surk
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.46 no.8
- /
- pp.95-101
- /
- 2009
Random numbers are used in many sorts of applications. Some applications, like simple software simulation tests, communication protocol verifications, cryptography verification and so forth, need various levels of randomness with various process speeds. In this paper, we propose a fast pseudorandom generator module for embedded systems. The generator module is implemented in hardware which can run in two modes, one of which can generate random numbers with higher randomness but which requires six cycles, the other providing its result within one cycle but with less randomness. An ASIP (Application Specific Instruction set Processor) was designed to implement the proposed pseudorandom generator instruction sets. We designed a processor based on the MIPS architecture,, by using LISA, and have run statistical tests passing the sequence of the Diehard test suite. The HDL models of the processor were generated using CoWare's Processor Designer and synthesized into the Dong-bu 0.18um CMOS cell library using the Synopsys Design Compiler. With the proposed pseudorandom generator module, random number generation performance was 239% faster than software model, but the area increased only 2.0% of the proposed ASIP.
PDF KSCI

Design and Inplementation of S/W for a Davinci-based Smart Camera (다빈치 기반 스마트 카메라 S/W 설계 및 구현)

Yu, Hui-Jse;Chung, Sun-Tae;Jung, Souhwan
- Proceedings of the Korea Contents Association Conference
- /
- 2008.05a
- /
- pp.116-120
- /
- 2008
Smart Camera provides intelligent vision functionalities which can interpret captured video, extract context-aware information and execute a necessary action in real-timeliness in addition to the functionality of network cameras which transmit the compressed acquired videos through networks. Intelligent vision algorithms demand tremendous computations so that real-time processing of computation of intelligent vision algorithms as well as compression and transmission of videos simultaneously is too much burden for a single CPU. Davinci processor of Texas Instruments is a popular ASSP(Application Specific Standard Product) which has dual core architecture of ARM core and DSP core and provides various I/O interfaces as well as networking interface and video acquiring interface necessary for developing digital video embedded applications. In this paper, we report the results of designing and implementing S/W for Davinci-based smart camera. We implement a face detection as an example of vision application and verify the implementation works well. In the future, for the development of a smart camera with more broad and real-time vision functionalities, it is necessary to study about more efficient vision application S/W architecture and optimization of vision algorithms on DSP core of Davichi processor.
PDF

Converting Interfaces on Application-specific Network-on-chip

Han, Kyuseung;Lee, Jae-Jin;Lee, Woojoo
- JSTS:Journal of Semiconductor Technology and Science
- /
- v.17 no.4
- /
- pp.505-513
- /
- 2017
As mobile systems are performing various functionality in the IoT (Internet of Things) era, network-on-chip (NoC) plays a pivotal role to support communication between the tens and in the future potentially hundreds of interacting modules in system-on-chips (SoCs). Owing to intensive research efforts more than a decade, NoCs are now widely adopted in various SoC designs. Especially, studies on application-specific NoCs (ASNoCs) that consider the heterogeneous nature of modern SoCs contribute a significant share to use of NoCs in actual SoCs, i.e., ASNoC connects non-uniform processing units, memory, and other intellectual properties (IPs) using flexible router positions and communication paths. Although it is not difficult to find the prior works on ASNoC synthesis and optimization, little research has addressed the issues how to convert different protocols and data widths to make a NoC compatible with various IPs. Thus, in this paper, we address important issues on ASNoC implementation to support and convert multiple interfaces. Based on the in-depth discussions, we finally introduce our FPGA-proven full-custom ASNoC.
https://doi.org/10.5573/JSTS.2017.17.4.505 인용 PDF KSCI

A Novel Instruction Set for Packet Processing of Network ASIP (패킷 프로세싱을 위한 새로운 명령어 셋에 관한 연구)

Chung, Won-Young;Lee, Jung-Hee;Lee, Yong-Surk
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.34 no.9B
- /
- pp.939-946
- /
- 2009
In this paper, we propose a new network ASIP(Application Specific Instruction-set Processor) which was designed for simulation models by a machine descriptions language LISA(Language for Instruction Set Architecture). This network ASIP is aimed for an exclusive engine undertaking packet processing in a router. To achieve the purpose, we added a new necessary instruction set for processing a general ASIP based on MIPS(Microprocessor without Interlock Pipeline Stages) architecture in high speed. The new instructions can be divided into two groups: a classification instruction group and a modification instruction group, and each group is to be processed by its own functional unit in an execution stage. The functional unit was optimized for area and speed through Verilog HDL, and the result after synthesis was compared with the area and operation delay time. Moreownr, it was allocated to the Macro function ana low-level standardized programming language C using CKF(Compiler Known Function). Consequently, we verified performance improvement achieved by analysis and comparison of execution cycles of application programs.
PDF KSCI

Assistant Professor, Department of Computer Engineering Pukyong Universisty (한국형 방송 프로그램 시스템 디코더 ASSP의 개발)

Jo, Gyeong-Yeon
- The Transactions of the Korea Information Processing Society
- /
- v.3 no.5
- /
- pp.1229-1239
- /
- 1996
The increase of additional information broadcasting of TV demands a graphic overlay processor. This paper is about the design, implementation and testing of a graphic overlay processor called by KBPS decoder ASSP (Applicatio n Specific Standard Product) which is compliance with Korea Broadcast Programming System. KBPS decoder ASSP consists of embedded 8 bit microprocessor Z80, graphic overlay controller, KBPS schedule decoder, memory controller, priority interrupt controller, MIDI controller, infrared raccoon receiver, async scrial communication controller, timer, bus controller, universal parallel input-output port and serial-parallel interface. The 0.8 micron CMOS Sea of Gate is used to implement the ASSP in amount of about 31,500 gates, and it is running at 14.318MHz.
PDF

Code Generation Techniques for the Optimized Energy Consumption (최적화된 에너지 소비를 위한 코드 생성 기술)

Ko, Kwang-Man;So, Kyoung-Young
- The Journal of the Korea Contents Association
- /
- v.8 no.12
- /
- pp.63-71
- /
- 2008
Recently, together with a new advent of embedded processor developed to support specific application area, and it evolution, a new study of software development to support the embedded processor and its commercial use has been revitalized. Specially, In a mobile device that is built-in embedded processor, software management is as important as hardware management for the limited power/energy. In this paper, we suggest that the code generation technique considering the energy dissipation through the verified retargetable compiler backend tool, EXPRESSION. For this goals, we describes the efficient code generation patterns and showed the variable performance results.
https://doi.org/10.5392/JKCA.2008.8.12.063 인용 PDF

Search Result 74, Processing Time 0.02 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)