• Title/Summary/Keyword: Multi-processors

Implementation and Verification of a Multi-Core Processor including Multimedia Specific Instructions (멀티미디어 전용 명령어를 내장한 멀티코어 프로세서 구현 및 검증)

  • Seo, Jun-Sang;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications / v.8 no.1 / pp.17-24 / 2013
  • In this paper, we present a multi-core processor with multimedia-specific instructions for processing multimedia data efficiently in mobile environments. The multimedia-specific instructions exploit subword-level parallelism (SLP), while the multi-core processor exploits data-level parallelism (DLP). Combined, these two forms of parallelism improve the performance of multimedia processing applications. The proposed processor is implemented and tested using the Xilinx ISE 10.1 tool and a SoCMaster3 testbed system that includes a Virtex-4 FPGA. Experimental results using a fire detection algorithm show that the multimedia-specific instructions outperform baseline instructions on the same multi-core architecture in performance (1.2x better), energy efficiency (1.37x better), and area efficiency (1.23x better).
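
Subword-level parallelism can be illustrated in software with a packed addition that treats one 32-bit word as four independent 8-bit lanes. The sketch below is not the instruction set from the paper; the lane width, the wrap-around behavior, and the helper names `pack`, `unpack`, and `packed_add_u8` are assumptions chosen for illustration.

```python
# Illustrative sketch of subword-level parallelism (SLP): one 32-bit word
# carries four unsigned 8-bit lanes that are added in a single pass.
# Lane width and wrap-around semantics are assumptions, not the paper's ISA.

MASK_LOW7 = 0x7F7F7F7F  # low 7 bits of every 8-bit lane
MASK_HIGH = 0x80808080  # top bit of every 8-bit lane


def pack(lanes):
    """Pack four 8-bit values into one 32-bit word (lane 0 is the lowest byte)."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def packed_add_u8(a, b):
    """Add four 8-bit lanes at once; each lane wraps modulo 256."""
    # Adding only the low 7 bits keeps every carry inside its own lane.
    low = (a & MASK_LOW7) + (b & MASK_LOW7)
    # The top bit of each lane is then combined without carrying out.
    return (low ^ ((a ^ b) & MASK_HIGH)) & 0xFFFFFFFF
```

For example, `unpack(packed_add_u8(pack([250, 1, 128, 255]), pack([10, 2, 128, 1])))` gives `[4, 3, 0, 0]`: each lane wraps independently, and no carry leaks into its neighbor.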

Low-Cost Elliptic Curve Cryptography Processor Based On Multi-Segment Multiplication (멀티 세그먼트 곱셈 기반 저비용 타원곡선 암호 프로세서)

  • LEE Dong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD / v.42 no.8 s.338 / pp.15-26 / 2005
  • In this paper, we propose an efficient $GF(2^m)$ multi-segment multiplier architecture and study its application to elliptic curve cryptography (ECC) processors. The multi-segment ECC datapath has a very small combinational multiplier for computing partial products, most of its internal data buses are word-sized, and it has only a single m-bit multiplexer and a single m-bit register. As a result, the resource requirements of the proposed ECC datapath shrink as the number of segments increases and the word size decreases, and the datapath uses resources more efficiently than an ECC processor based on digit-serial multiplication. The resource requirement of an ECC processor implementation depends not only on the number of basic hardware components but also on the complexity of the interconnections among them. To show the realistic area efficiency of the proposed ECC processors, we implemented ECC processors based on both the proposed multi-segment multiplication and digit-serial multiplication and compared their FPGA resource usage. The experimental results show that the proposed multi-segment multiplication method makes it possible to implement ECC coprocessors that require about half the FPGA resources of those based on digit-serial multiplication.
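
The idea of building a $GF(2^m)$ product from small word-sized partial products can be sketched in software. The code below is a minimal illustration, not the datapath from the paper: it assumes both operands are split into w-bit segments, and the function names and the final bit-serial reduction step are hypothetical choices.

```python
# Illustrative sketch: GF(2^m) multiplication assembled from w-bit partial
# products. Segmenting both operands and reducing at the end are assumptions
# made for clarity; the paper's hardware datapath may be organized differently.

def clmul(a, b):
    """Carry-less multiply: polynomial product over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def reduce_poly(x, modulus, m):
    """Reduce x modulo the degree-m field polynomial `modulus`."""
    for bit in range(x.bit_length() - 1, m - 1, -1):
        if (x >> bit) & 1:
            x ^= modulus << (bit - m)
    return x


def gf2m_mul_segmented(a, b, modulus, m, w):
    """Multiply a and b in GF(2^m) using only w-bit by w-bit partial products."""
    mask = (1 << w) - 1
    segments = (m + w - 1) // w  # number of w-bit segments per operand
    acc = 0
    for i in range(segments):
        a_i = (a >> (i * w)) & mask
        for j in range(segments):
            b_j = (b >> (j * w)) & mask
            # Small partial product, shifted to its position and accumulated.
            acc ^= clmul(a_i, b_j) << ((i + j) * w)
    return reduce_poly(acc, modulus, m)
```

With the AES field polynomial 0x11B (m = 8) and 4-bit segments, `gf2m_mul_segmented(0x57, 0x83, 0x11B, 8, 4)` returns `0xC1`, the worked example given in FIPS 197. A smaller w means a smaller partial-product multiplier but more iterations, which is the resource trade-off the abstract describes.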

SAMBA Type MPSoC Bus Architecture Optimization under Performance Constraints (성능 제약 조건 하에서의 SAMBA 형 MPSoC 버스 구조 최적화)

  • Kim, Hong-Yeom;Jung, Sung-Chul;Shin, Hyun-Chul
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.1 / pp.94-101 / 2010
  • Optimizing the interconnects among processors and memories becomes important as multiple processors and memories are integrated on a Multi-Processor System-on-Chip (MPSoC). Since the optimal interconnection architecture usually depends on the application, a systematic design methodology that handles various data transfer requirements is necessary. In this paper, we focus on bus interconnection for MPSoC applications that use 4 to 16 processors. We propose a new systematic bus design methodology under performance constraints using Single Arbitration Multiple Bus Accesses (SAMBA) style bus architectures. The methodology finds an optimized bus architecture that satisfies the performance constraints for a single application or for multiple applications. Compared to the unoptimized architecture, our method can reduce the bus switch logic circuits significantly (in some cases by more than 50%). Furthermore, low-cost bus architectures can be found that satisfy the performance constraints for multiple applications.

Time-Efficient Event Processing Using Provisioning-to-Signaling Method in Data Transport Systems Requiring Multiple Processors

  • Kim, Bup-Joong;Ryoo, Jeong-dong;Cho, Kyoungrok
    • ETRI Journal / v.39 no.1 / pp.41-50 / 2017
  • In connection-oriented data transport services, data loss can occur when a service experiences a problem in its end-to-end path. To resolve the problem promptly, the data transport systems providing the service must quickly modify their internal configurations, which are distributed among different locations within each system. The configurations are modified through a series of problem (event) handling procedures, which are carried out by multiple control processors in the system. This paper proposes a provisioning-to-signaling method for inter-control-processor messaging to improve the time efficiency of event processing. This method simplifies the sharing of the runtime event and minimizes the time variability caused by the amount of event data, which decreases latency and increases time determinacy when processing global events. The proposed method was tested for an event that required 4,000 internal path changes and was found to reduce the latency of global event processing by about 50% compared with general methods; in addition, it reduced the impact of the event data on the event processing time to about 30%.

Multicore-Aware Code Co-Positioning to Reduce WCET on Dual-Core Processors with Shared Instruction Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering / v.6 no.1 / pp.12-25 / 2012
  • For real-time systems, it is important to obtain an accurate worst-case execution time (WCET). Furthermore, improving the WCET of applications that run on multicore processors is both significant and challenging, as the WCET can be strongly affected by inter-core interference in shared resources such as the shared L2 cache. To solve this problem, we propose an innovative approach that adopts a code positioning method to reduce the inter-core L2 cache interference between the different real-time threads that run adaptively on a multicore processor, using different strategies. The worst-case-oriented strategy is designed to make the worst-case WCET among these threads as low as possible. The other two strategies aim to reduce the WCET of each thread by an almost equal percentage or amount. Our experiments indicate that the proposed multicore-aware code positioning approaches not only improve the worst-case performance of the real-time threads but also make good tradeoffs between efficiency and fairness for threads that run on multicore platforms.
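
The effect of code placement on a shared cache can be shown with a simple set-overlap count. The sketch below is not the WCET analysis or the positioning strategies from the paper: the cache geometry (`LINE_SIZE`, `NUM_SETS`), the function names, and the use of overlapping set indices as a stand-in for interference are all assumptions.

```python
# Illustrative sketch: count the shared-cache sets that two code regions both
# map to, then choose a start address for one region that minimizes overlap.
# Cache geometry and the overlap metric are assumptions for illustration only.

LINE_SIZE = 64   # assumed cache line size in bytes
NUM_SETS = 128   # assumed number of sets in the shared cache


def cache_sets(start, size):
    """Return the set indices touched by the code region [start, start + size)."""
    first_line = start // LINE_SIZE
    last_line = (start + size - 1) // LINE_SIZE
    return {line % NUM_SETS for line in range(first_line, last_line + 1)}


def conflicts(region_a, region_b):
    """Number of cache sets that both (start, size) regions map to."""
    return len(cache_sets(*region_a) & cache_sets(*region_b))


def best_start(fixed_region, movable_size, candidate_starts):
    """Pick the candidate start address with the fewest overlapping sets."""
    return min(
        candidate_starts,
        key=lambda start: conflicts(fixed_region, (start, movable_size)),
    )
```

Under these assumptions, a 2 KB region at 0x0000 and another at 0x2000 share 32 sets, while moving the second region to 0x2800 leaves no shared sets.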

An Optimal SMT Processor Architecture for IPv4 Packet Routing (IPv4 라우팅에 적합한 SMT 아키텍처 개발)

  • 임정빈;홍인표;조정현;이용석
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.3A / pp.347-357 / 2004
  • Network systems have evolved to deliver both high packet-forwarding performance and the flexibility to provide various services, which led to the emergence of network processors. To improve the performance of network processors, fast external interfaces and special functional units have been used. Recently, the SMT (Simultaneous Multithreading) architecture has been proposed as an architectural means of improving performance, but it is difficult to implement because of its complexity. Research on architectural optimization is therefore needed to develop SMT network processors. In this paper, we analyze each functional unit while running network algorithms and propose an optimized SMT network processor architecture.

Two-Level Multi-Scan Scheduler Using Resource Partition Strategy by Loose Processor-Affinity

  • Sohn, Jong-Moon;Kim, Gil-Yong
    • Journal of Electrical Engineering and Information Science / v.2 no.3 / pp.105-112 / 1997
  • The performance of a shared memory multiprocessor system is very sensitive to process scheduling. We can enhance the performance of the whole system as well as of an individual process by taking the multiprocessor characteristics into account in the design of the process scheduler. In this paper, we propose a general-purpose scheduler for a shared memory multiprocessor, called the Two-Level Multi-Scan (TLMS) process scheduler, that considers processor affinity loosely and greatly decreases the interference among multiple processors. The TLMS scheduler is composed of a local scheduler at each processor and a semi-global scheduler that balances the load among processors. In particular, the semi-global scheduler tries to minimize priority inversion, which is an important factor in system performance. The TLMS scheduler also tries to reduce the number of resources to be shared and improves processor utilization. To meet these requirements, the semi-global scheduler interacts with the operation of the local scheduler when the need arises, hence the name loose processor affinity. We also show that the proposed scheduling technique can be extended to other types of resources, making it a general-purpose resource management queue.
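
The two-level structure described above, with a local queue per processor and a semi-global balancer that steps in only when needed, can be sketched as follows. This is a simplified illustration, not the TLMS algorithm itself: the class name, the imbalance threshold, and the migration rule are hypothetical, and priorities and priority inversion are not modeled.

```python
# Illustrative sketch of a two-level scheduler with loose processor affinity.
# The threshold and the migration rule are assumptions; the TLMS scheduler in
# the paper also handles priorities, which this sketch leaves out.
from collections import deque


class TwoLevelScheduler:
    def __init__(self, num_cpus, imbalance_threshold=2):
        self.local = [deque() for _ in range(num_cpus)]  # one run queue per CPU
        self.threshold = imbalance_threshold

    def submit(self, pid, preferred_cpu):
        """Queue a process on its preferred CPU (processor affinity)."""
        self.local[preferred_cpu].append(pid)

    def pick_next(self, cpu):
        """Local scheduler: take the next process from this CPU's own queue."""
        queue = self.local[cpu]
        return queue.popleft() if queue else None

    def rebalance(self):
        """Semi-global step: migrate one process if the queues are too uneven."""
        cpus = range(len(self.local))
        busiest = max(cpus, key=lambda c: len(self.local[c]))
        idlest = min(cpus, key=lambda c: len(self.local[c]))
        gap = len(self.local[busiest]) - len(self.local[idlest])
        if gap >= self.threshold:
            # Affinity is "loose": it is overridden only when load is uneven.
            self.local[idlest].append(self.local[busiest].pop())
            return True
        return False
```

In this sketch a process normally stays on its preferred processor, and the balancer overrides that preference only when the queue lengths differ by at least the threshold.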

Performance Evaluation of a New Scheduling Algorithm for the Simultaneous MultiThreading Microprocessor (동시 다중 쓰레딩 마이크로프로세서를 위한 스케줄링 알고리즘의 성능 평가)

  • Lee Jung-Hoon;Kim Jin Suk
    • The KIPS Transactions:PartA / v.12A no.2 s.92 / pp.145-150 / 2005
  • Recently, many processor manufacturers have implemented simultaneous multithreading technology, which can execute independent threads simultaneously in one processor cycle, as a way of increasing processor efficiency; one particular example is Hyper-Threading. Hyper-Threading technology, which enables multiple logical processors to reside in one physical processor, differs from the conventional multiprocessing environment with many independent processors and calls for a work assignment method optimized for the Hyper-Threading environment. Thus, in this paper, we propose a scheduling algorithm compatible with Hyper-Threading technology and analyze its performance using various methods. The results suggest that efficient performance can be expected when the Hyper-Threading system is properly understood and managed.

Web Based Monitoring Systems for Multi-Axis Force/Torque Sensors Using Embedded Systems

  • Nam, Hyun-Do;Lim, Hong-Sik;Kang, Chul-Goo
    • 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 2004.08a / pp.1675-1678 / 2004
  • In this paper, web-based monitoring systems are implemented for the multi-axis force control systems of an intelligent robot. A brief review of the principle of multi-axis force sensors is presented, along with a method that can reduce the effect of signal noise on sensor performance. A web-based monitoring system is implemented by porting Linux to embedded systems that include XScale processors. A device driver is developed to receive data from multi-axis force sensors under the Linux operating system. To control this device driver, a socket program for a web browser is also developed. Experiments are performed to investigate the effectiveness of the proposed methods. The experimental results show that the values of the force sensors can be monitored from remote PCs.
