• Title/Summary/Keyword: Multi-Processor

A Study on the Parallel Routing in Hybrid Optical Networks-on-Chip (하이브리드 광학 네트워크-온-칩에서 병렬 라우팅에 관한 연구)

  • Seo, Jung-Tack;Hwang, Yong-Joong;Han, Tae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.8 / pp.25-32 / 2011
  • Networks-on-chip (NoC) are emerging as a key technology for overcoming severe bus traffic in increasingly complex multiprocessor systems-on-chip (MPSoC); however, traditional NoC architectures based on electrical interconnects are expected to reach bandwidth and power-consumption limits in the near future. To cope with these problems, hybrid optical NoC architectures, which use electrical and optical interconnects together, have been widely investigated. In hybrid optical NoCs, wormhole switching and simple deterministic X-Y routing are used on the electrical interconnects, which are responsible for setting up the routing path and the optical routers that transmit optical data over the optical interconnects. The optical NoC uses circuit switching to send payload data along the preset paths and routers. However, a conventional hybrid optical NoC does not allow concurrent transmissions, which limits the achievable performance improvement. In this paper, we propose a new routing algorithm that uses circuit switching and an adaptive algorithm on the electrical interconnects so that data can be transmitted over multiple paths simultaneously. We also propose an efficient method to prevent livelock. Experimental results show up to 60% throughput improvement over a hybrid optical NoC and a 65% power reduction compared with an electrical NoC.
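
The deterministic X-Y routing mentioned in this abstract can be illustrated with a minimal sketch. It assumes a 2D mesh whose routers are addressed by `(x, y)` coordinates; the function name `xy_route` is hypothetical and does not come from the paper, and the paper's own adaptive multi-path algorithm is not shown here.

```python
def xy_route(src, dst):
    """Return the list of mesh routers visited from src to dst.

    Deterministic X-Y routing: move along the X dimension until the
    column matches, then along the Y dimension. Coordinates are
    (x, y) tuples on a 2D mesh (an assumption of this sketch).
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: correct the X coordinate first.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: then correct the Y coordinate.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

Because the path depends only on the source and destination, every packet between the same pair of routers takes the same route, which is why an adaptive scheme is needed before multiple paths can be used at once.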

A study on the implementation of a digital video/audio system to support multi-audio format (다양한 오디오 포맷을 지원하는 비디오/오디오 시스템 구현에 관한 연구)

  • Park In-Gyu
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.4 s.310 / pp.123-132 / 2006
  • In this paper, a digital video and audio system is improved so that it can decode the various digital video data formats on DVD discs and the digital audio data formats received through S/PDIF ports. It is not easy to implement all video and audio decoding functions in a single DVD processor. A special structure for the audio decoding circuit is proposed so that the system delivers nearly equal video and audio quality at the same time. By separating the decoding circuit into a video part and an audio part, audio quality can be dramatically improved while supporting several audio formats and several effects. An audio system that supports the audio decoding formats perfectly could be built simply by using an expensive, complicated decoder. However, such a decoder may not be easy to obtain in the near future. It is therefore preferable to adopt a downloading method, in which the host detects the type of the incoming audio bit stream and downloads the appropriate code into memory. This method is shown to be potentially efficient in terms of sharing audio data resources for video decoding.

A Study On the Design of a Floating Point Unit for MPEG-2 AAC Decoder (MPEG-2 AAC 복호기를 위한 부동소수점유닛 설계에 관한 연구)

  • 구대성;김필중;김종빈
    • Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.4 / pp.355-355 / 2002
  • In this paper, we design a floating-point unit (FPU), a component that is very important and requires high density in digital audio design. Most audio systems must support multiple channels and deliver high quality. Implementing the floating-point arithmetic of MPEG-2 AAC in hardware enables real-time decoding in a DSP realization. This matters because MPEG-2 AAC is compatible with the audio part of MPEG-4 and later standards. We designed the FPU in hardware to speed up the floating-point operations, which account for much of the computation in the MPEG-2 AAC decoder. The FPU consists of a multiplier and an adder. The multiplier uses the radix-4 Booth algorithm, and the adder adopts the 1's complement method for speed. The floating-point format has an 8-bit exponent and a 24-bit mantissa. It is compatible with the IEEE single-precision format and adopts a pipelined architecture to increase processor speed. All sub-blocks are based on the ISO/IEC 13818-7 standard. The algorithm was tested in C, and the design was written in VHDL (VHSIC Hardware Description Language). The maximum operating speed is 23.2 MHz, and the stable operating speed is 19 MHz.
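
As a rough software illustration of the radix-4 Booth algorithm named in this abstract, the sketch below recodes the multiplier into digits from {-2, -1, 0, 1, 2} and sums the shifted partial products. The function name `booth_radix4_multiply` and the `bits` parameter are hypothetical; the sketch assumes a signed two's complement multiplier, so an unsigned 24-bit mantissa would need extra leading zero bits (for example `bits=26`). It is not the paper's VHDL design.

```python
def booth_radix4_multiply(x, y, bits=24):
    """Multiply x by y using radix-4 Booth recoding of y.

    y is treated as a signed two's complement number of width `bits`
    (must be even). Each pair of multiplier bits, together with the
    bit below it, selects one partial product from -2x, -x, 0, x, 2x.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= y <= high:
        raise ValueError("y does not fit in the given width")

    y_pattern = y & ((1 << bits) - 1)  # two's complement bit pattern
    product = 0
    previous_bit = 0                   # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        bit0 = (y_pattern >> i) & 1
        bit1 = (y_pattern >> (i + 1)) & 1
        digit = bit0 + previous_bit - 2 * bit1  # Booth digit in -2..2
        product += (digit * x) << i             # weight 4**(i // 2)
        previous_bit = bit1
    return product


print(booth_radix4_multiply(7, -3, bits=8))
```

Recoding two bits at a time halves the number of partial products compared with a bit-by-bit multiplier, which is the usual reason for choosing radix-4 Booth in hardware.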

Application-specific Traffic Generator (응용 프로그램의 특성 반영이 가능한 트래픽 생성기)

  • Yeo, Phil-Koo;Cho, Keol;Yu, Dae-Chul;Hwang, Young-Si;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.9 / pp.40-49 / 2011
  • Integrating large numbers of components and applying low-power policies have been actively investigated for system-on-chip designs. In recent years, however, finding the optimal interconnection structure among heterogeneous components has emerged as a critical system design issue. Various simulation tools that model interconnection designs are therefore being developed, and the results of simulation-based performance evaluation are reflected in real designs. Most simulation environments, however, generate traffic from mathematical probability functions, and such traffic cannot fully cover the various situations that may occur in a real system. As a result, demand for traffic pattern generation based on real applications is increasing, yet few simulators adopt application-specific traffic generators. This paper proposes a novel traffic generation method for simulating various interconnection structures in multi-processor system-on-chip design. The proposed method generates traffic patterns that reflect the actual characteristics of the application, so the performance of an interconnection structure can be evaluated under more realistic conditions than with traffic patterns based on mathematical probability functions. By comparing the proposed method with one based on mathematical probability functions, this paper shows the advantages of the proposed traffic generation method.
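
For context, the probability-based traffic generation that this abstract argues against can be sketched as follows. The sketch assumes a Bernoulli injection process with uniformly random destinations, a common synthetic pattern; the function name `uniform_random_traffic` and its parameters are hypothetical, and the paper's application-specific method is not reproduced here.

```python
import random


def uniform_random_traffic(num_nodes, cycles, injection_rate, seed=0):
    """Generate synthetic packets as (cycle, source, destination) tuples.

    In every cycle each node injects a packet with probability
    `injection_rate`, addressed to a uniformly chosen other node.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    packets = []
    for cycle in range(cycles):
        for source in range(num_nodes):
            if rng.random() < injection_rate:
                destination = rng.randrange(num_nodes - 1)
                if destination >= source:
                    destination += 1  # skip the source itself
                packets.append((cycle, source, destination))
    return packets


for packet in uniform_random_traffic(num_nodes=4, cycles=3, injection_rate=0.5):
    print(packet)
```

Nothing in this generator depends on what the application does: every node pair is equally likely to communicate at any time, which is the limitation that motivates application-specific traffic.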

Parallel SystemC Cosimulation using Virtual Synchronization (가상 동기화 기법을 이용한 SystemC 통합시뮬레이션의 병렬 수행)

  • Yi, Young-Min;Kwon, Seong-Nam;Ha, Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory / v.33 no.12 / pp.867-879 / 2006
  • This paper concerns fast and time-accurate HW/SW cosimulation for MPSoC (Multi-Processor System-on-Chip) architectures in which multiple software and/or hardware components exist. It is becoming increasingly common to use MPSoC architectures to design complex embedded systems. In cosimulation of such an architecture, the time synchronization overhead among simulators grows as the number of component simulators participating in the cosimulation increases, resulting in low overall cosimulation performance. Although SystemC cosimulation frameworks show high cosimulation performance, that performance is inversely proportional to the number of simulators. In this paper, we extend a novel technique called virtual synchronization, which boosts cosimulation speed by reducing time synchronization overhead, in two ways: (1) SystemC simulation is supported seamlessly in the virtual synchronization framework without modifying the SystemC kernel, and (2) parallel execution of component simulators with virtual synchronization is supported. We compared the performance and accuracy of the proposed parallel SystemC cosimulation framework with MaxSim, a well-known commercial SystemC cosimulation framework; the proposed framework was 11 times faster for an H.263 decoder example, while the error was kept below 5%.

Lightweight Loop Invariant Code Motion for Java Just-In-Time Compiler on Itanium (Itanium상의 자바 적시 컴파일러를 위한 가벼운 루프 불변 코드 이동)

  • Yu Jun-Min;Choi Hyung-Kyu;Moon Soo-Mook
    • Journal of KIISE:Software and Applications / v.32 no.3 / pp.215-226 / 2005
  • Loop invariant code motion (LICM) optimization involves relatively heavy code analyses and is therefore not readily applicable to Java Just-In-Time (JIT) compilation, where the JIT compilation time is part of the total running time. 'Classical' LICM optimization first analyzes the code and constructs both the def-use chains and the use-def chains, which are then used to perform code motions. This paper proposes a lightweight LICM algorithm that requires only the def-use chains of loop invariant code (without use-def chains) by exploiting the fact that the Java virtual machine is based on a stack machine and hence generates code with simpler patterns. We also propose two techniques that allow more code motions than classical LICM techniques. First, unlike previous JIT techniques that use LICM only in single-path loops for simplicity, we apply LICM safely to multi-path loops (natural loops) for partially redundant code. Second, we move loop-invariant, partially redundant null pointer check code using the predication support in Itanium. The proposed techniques were implemented in a JIT compiler for the Itanium processor on Intel's ORP (Open Runtime Platform) Java virtual machine. On the SPECjvm98 benchmarks, the proposed technique increases the JIT compilation overhead by a geometric mean of 1.3%, yet it improves the total running time by a geometric mean of 2.2%.
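
To make the notion of loop-invariant code concrete, here is a minimal sketch of the classical detection step on a toy three-address representation. The instruction format `(dest, op, operands)` and the function name `find_loop_invariants` are assumptions made for illustration; the sketch ignores side effects, exceptions, and partial redundancy, and it is not the lightweight def-use-chain algorithm proposed in the paper.

```python
from collections import Counter


def find_loop_invariants(loop_body):
    """Return the names whose defining instruction is loop invariant.

    loop_body is a list of (dest, op, operands) tuples. An instruction
    is invariant when its destination is defined exactly once in the
    loop and every operand is a constant, a name never defined inside
    the loop, or the result of another invariant instruction.
    """
    definitions = Counter(dest for dest, _, _ in loop_body)
    invariant = set()
    changed = True
    while changed:  # iterate until no new invariant is discovered
        changed = False
        for dest, _, operands in loop_body:
            if dest in invariant or definitions[dest] != 1:
                continue
            if all(
                isinstance(operand, int)        # literal constant
                or operand not in definitions   # defined outside the loop
                or operand in invariant         # already proven invariant
                for operand in operands
            ):
                invariant.add(dest)
                changed = True
    return invariant


body = [
    ("t1", "mul", ("a", "b")),   # a and b come from outside the loop
    ("t2", "add", ("t1", 4)),
    ("i", "add", ("i", 1)),      # loop counter, changes every iteration
    ("t3", "add", ("t2", "i")),
]
print(sorted(find_loop_invariants(body)))
```

In this example `t1` and `t2` could be hoisted in front of the loop, while `i` and `t3` must stay inside because they change on every iteration.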

Hardware-Software Cosynthesis of Multitask Multicore SoC with Real-Time Constraints (실시간 제약조건을 갖는 다중태스크 다중코어 SoC의 하드웨어-소프트웨어 통합합성)

  • Lee Choon-Seung;Ha Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory / v.33 no.9 / pp.592-607 / 2006
  • This paper proposes a technique that selects processors and hardware IPs and maps tasks onto the selected processing elements, aiming to achieve high performance at minimal system cost when multitask applications with real-time constraints run on a multicore SoC. Such a technique is called a 'Hardware-Software Cosynthesis Technique'. A cosynthesis technique was already presented in our earlier work [1], where we divided the complex cosynthesis problem into three subproblems and solved each separately: selection of appropriate processing components, mapping and scheduling of function blocks onto the selected processing components, and schedulability analysis. Despite its good features, our previous technique has a serious limitation: a task monopolizes the entire system resource to obtain the minimum schedule length. In general, however, higher performance can be obtained in a multitask multicore system if multiple independent tasks run concurrently on different processor cores. In this paper, we present two mapping techniques, the task mapping avoidance technique (TMA) and the task mapping pinning technique (TMP), which are applicable to general cases with diverse operating policies in a multicore environment. We obtained significant performance improvement for a real-time multimedia application, a multi-channel Digital Video Recorder system, and for randomly generated multitask graphs taken from related work.

Design and Implementation of Initial OpenSHMEM Based on PCI Express (PCI Express 기반 OpenSHMEM 초기 설계 및 구현)

  • Joo, Young-Woong;Choi, Min
    • KIPS Transactions on Computer and Communication Systems / v.6 no.3 / pp.105-112 / 2017
  • PCI Express is a bus technology that connects the processor and peripheral I/O devices, and it is widely used as an industry standard because of its high speed and low power. In addition, PCI Express is a system interconnect technology, like Ethernet and InfiniBand, used in high-performance computing and computer clusters. The PGAS (partitioned global address space) programming model is often used to implement one-sided RDMA (remote direct memory access) in multi-host systems such as computer clusters. In this paper, we design and implement an OpenSHMEM API based on PCI Express that maintains the existing features of OpenSHMEM, in order to implement RDMA over PCI Express. We evaluate the implemented OpenSHMEM API with a matrix multiplication example on a system in which PCs are connected by the NTB (non-transparent bridge) technology of PCI Express. PCI Express interconnection networks are currently very expensive and not yet widely available to the general public. Nevertheless, we implemented and evaluated a PCI Express based interconnection network on the RDK evaluation board. In addition, we implemented the OpenSHMEM software stack, which has attracted great interest recently.

Design and Implementation of Adaptive Beam-forming System for Wi-Fi Systems (무선랜 시스템을 위한 적응형 빔포밍 시스템의 설계 및 구현)

  • Oh, Joohyeon;Gwag, Gyounghun;Oh, Youngseok;Cho, Sungmin;Oh, Hyukjun
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.9 / pp.2109-2116 / 2014
  • This paper presents the design and implementation of an advanced Wi-Fi system with a beam-forming antenna that radiates its power in the direction of the user equipment to improve overall throughput, in contrast to general Wi-Fi systems equipped with omnidirectional antennas. The system consists of a patch array antenna, a DSP, an FPGA, and Qualcomm's commercial chip. The beam-forming system on the FPGA uses the packet information from Qualcomm's commercial chip to control the phase shifters and attenuators of the patch array antenna. A PCI Express interface is used to maximize the communication speed between the DSP and the FPGA. The directions of arrival of users are managed in a database, and each user is identified by the MAC address taken from the packet information. When the system transmits a packet to a user, it forms a beam toward the direction of arrival stored in the database for that user to maximize throughput. The directions of arrival of users are estimated from the preamble received in each packet so as to make the SINR as high as possible. The proposed beam-forming system was implemented using an FPGA together with Qualcomm's commercial chip. The implemented system showed considerable throughput improvement over an existing general AP system with an omnidirectional antenna in a multi-user communication environment.
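
The per-user beam steering described in this abstract can be sketched in a few lines. The sketch assumes a uniform linear array with half-wavelength element spacing and angles measured from broadside; the names `steering_phases`, `array_gain`, and `doa_database`, as well as the example MAC address and angle, are hypothetical and not taken from the paper, whose patch array geometry is not given here.

```python
import cmath
import math


def steering_phases(angle_deg, num_elements=4, spacing_wavelengths=0.5):
    """Phase shift in radians for each element to point at angle_deg."""
    step = -2 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return [n * step for n in range(num_elements)]


def array_gain(phases, angle_deg, spacing_wavelengths=0.5):
    """Magnitude of the array factor seen from direction angle_deg."""
    k = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return abs(sum(cmath.exp(1j * (n * k + p)) for n, p in enumerate(phases)))


# Hypothetical database: MAC address -> estimated direction of arrival.
doa_database = {"00:11:22:33:44:55": 30.0}


def phases_for_user(mac_address):
    """Look up the stored direction for a user and steer toward it."""
    return steering_phases(doa_database[mac_address])


phases = phases_for_user("00:11:22:33:44:55")
print(array_gain(phases, 30.0), array_gain(phases, -30.0))
```

With these phase settings the element contributions add coherently in the stored direction and largely cancel elsewhere, which is the effect the phase shifters are used for.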

Development of the HEMP Generation, Propagation Analysis, and Optimal Shelter Design Tool (고 고도 전자기파(HEMP) 발생과 전파해석 및 방호실 최적 설계 Tool 개발)

  • Kim, Dong Il;Min, Gyeong Chan
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.10 / pp.2331-2338 / 2014
  • The HEMP threat may have acquired new and urgent relevance as the proliferation of nuclear weapons and missile technology accelerates. North Korea, for example, is assessed as having already developed a few atomic weapons and as already having, or being on the verge of having, missiles capable of delivering a nuclear warhead against South Korea. ITU K.78 and K.81 and the IEC recommend countermeasures for industrial facilities, including navigation and sailing facilities, to prevent malfunctions of all processor-equipped systems caused by EMP/HEMP; however, the effects can only be evaluated by computer simulation, which was studied in the USA/AFWL papers of 1960-1990. This result is significant for South Korea, which is under threat from North Korea, because the export of all HEMP-related products has been strongly restricted. The HEMP code newly developed by KTI includes HEMP generation and propagation analysis, an optimal shelter design tool, analysis of EM energy attenuation in various multi-layered soils and rocks, and a HEMP filter design tool. In particular, the least-squares fitting method was adopted to analyze EM energy attenuation in soils and rocks, because this attenuation shows varied characteristics across the many field test reports.