• Title/Summary/Keyword: Pipelining


An optimization of synchronous pipeline design for IP-based H.264 decoder design (IP기반 H.264 디코더 설계를 위한 동기화 파이프라인 최적화)

  • Ko, Byung-Soo; Kong, Jin-Hyeung
    • Proceedings of the IEEK Conference / 2008.06a / pp.407-408 / 2008
  • This paper presents a synchronous pipeline design for an IP-based H.264 decoding system. The first pipelining optimization efficiently resolves the data dependency caused by the motion compensation/intra prediction feedback data flow in the H.264 decoder. The second improves the execution efficiency of each pipeline stage by exploring the optimal latency and number of stages. The result is a 3-stage pipeline, CAVLD&ITQ|MC/IP&Rec.|DF, which yields the best throughput and implementation. In experiments, the synchronous pipelined H.264 decoding system, built from existing IPs, handles Full HD video in real time at 125.34 MHz.

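
The timing argument behind a synchronous pipeline can be sketched as follows. This is a minimal illustration, not the authors' design: the per-macroblock cycle counts are hypothetical, and the macroblock count assumes a 1920x1088 coded frame split into 16x16 macroblocks. In a synchronous pipeline every stage advances in lockstep, so the slot length is set by the slowest stage.

```python
def pipeline_cycles(stage_cycles, num_items):
    """Cycles to push num_items through a synchronous (lockstep) pipeline.

    The slot length equals the slowest stage; the first item needs one slot
    per stage, and each further item completes one slot later.
    """
    slot = max(stage_cycles)
    return (num_items + len(stage_cycles) - 1) * slot


def sequential_cycles(stage_cycles, num_items):
    """Cycles when each item runs through all stages before the next starts."""
    return num_items * sum(stage_cycles)


# Hypothetical cycle counts per macroblock for the three stages
# (CAVLD&ITQ, MC/IP&Rec., DF); these are NOT figures from the paper.
stages = [400, 500, 300]
macroblocks_per_frame = (1920 // 16) * (1088 // 16)  # assumed frame size

if __name__ == "__main__":
    print(pipeline_cycles(stages, macroblocks_per_frame))
    print(sequential_cycles(stages, macroblocks_per_frame))
```

Balancing the stages matters because any stage shorter than the slowest one idles for the rest of each slot.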

A design of synchronous nonlinear and parallel for pipeline stage on IP-based H.264 decoder implementation (IP기반 H.264 디코더 설계를 위한 동기식 비선형 및 병렬화 파이프라인 설계)

  • Ko, Byung-Soo; Kong, Jin-Hyeung
    • Proceedings of the IEEK Conference / 2008.06a / pp.409-410 / 2008
  • This paper presents a nonlinear and parallel design for synchronous pipelining in an IP-based H.264 decoder implementation. Because the H.264 decoder contains a feedback loop in its dataflow, the data dependency requires one NOP stage per pipeline latency, which halves the throughput. Further, in terms of execution time, the stage scheduled for MC is more heavily occupied than the stages for CAVLD/ITQ/DF. The less efficient stages are improved by nonlinear scheduling, while the fully utilized stage is accelerated by parallel scheduling of IPs. The optimization yields a 3 nonlinear {CAVLD&ITQ}|3 parallel (MC/IP&Rec.)|3 nonlinear {DF} pipelined architecture for the IP-based H.264 decoder. In experiments, the nonlinear and parallel pipelined H.264 decoder, built from existing IPs, handles Full HD video in real time at 41.86 MHz.

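
The two effects named in the abstract, a NOP slot that halves throughput and parallel IPs that shorten the busiest stage, can be illustrated with a small idealised model. The function names and cycle counts are assumptions made for this sketch and do not come from the paper.

```python
import math
from fractions import Fraction


def items_per_cycle(slot_cycles, nop_slots_per_item=0):
    """Throughput of a lockstep pipeline, in items per clock cycle.

    Each item occupies one useful slot plus nop_slots_per_item idle slots
    inserted to wait for feedback data.
    """
    return Fraction(1, slot_cycles * (1 + nop_slots_per_item))


def slot_with_parallel_ip(stage_cycles, copies):
    """Idealised effective cycles per item when `copies` identical IP
    blocks work on different items at the same time."""
    return math.ceil(stage_cycles / copies)


if __name__ == "__main__":
    # Hypothetical 500-cycle slot: one NOP slot per item halves throughput.
    print(items_per_cycle(500), items_per_cycle(500, nop_slots_per_item=1))
    # Hypothetical 900-cycle stage served by three parallel IP copies.
    print(slot_with_parallel_ip(900, 3))
```

The model ignores arbitration and memory contention between the parallel IP copies, so real gains would be smaller.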

VLSI Implementation for the MPDSAP Adaptive Filter

  • Choi, Hun; Kim, Young-Min; Ha, Hong-Gon
    • Journal of the Institute of Convergence Signal Processing / v.11 no.3 / pp.238-243 / 2010
  • A new implementation method for the MPDSAP (Maximally Polyphase Decomposed Subband Affine Projection) adaptive filter is proposed. The affine projection (AP) adaptive filter converges quickly; however, its implementation is expensive because the weight update requires a matrix inversion. Maximally polyphase decomposed subband filtering allows the AP adaptive filter to avoid the matrix inversion. Moreover, with a pipelining technique, the simple subband-structured AP filter is suitable for VLSI implementation with respect to throughput, power dissipation, and area. Computer simulations are presented to verify the performance of the proposed algorithm.
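
For orientation, the textbook full-band AP update shows where the costly inversion arises. This is the standard form, not the MPDSAP recursion from the paper; the symbols (projection order P, filter length N, step size \mu, regularization \delta) are notation assumed here.

```latex
\begin{aligned}
\mathbf{e}(n)   &= \mathbf{d}(n) - \mathbf{X}^{T}(n)\,\mathbf{w}(n), \\
\mathbf{w}(n+1) &= \mathbf{w}(n) + \mu\,\mathbf{X}(n)
                   \left[\mathbf{X}^{T}(n)\,\mathbf{X}(n) + \delta\,\mathbf{I}\right]^{-1}
                   \mathbf{e}(n),
\end{aligned}
```

Here \mathbf{X}(n) is the N \times P matrix of the P most recent input vectors, so the bracketed term is a P \times P matrix that must be inverted at every update.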

MLP Design Method Optimized for Hidden Neurons on FPGA (FPGA 상에서 은닉층 뉴런에 최적화된 MLP의 설계 방법)

  • Kyoung Dong-Wuk; Jung Kee-Chul
    • The KIPS Transactions:PartB / v.13B no.4 s.107 / pp.429-438 / 2006
  • Neural networks (NNs) are applied to a wide variety of nonlinear problems in areas such as image processing and pattern recognition. Although an NN can be simulated in software, many potential NN applications require real-time processing and therefore need to be implemented in hardware. Hardware implementations of multi-layer perceptrons (MLPs), one of several kinds of NN, usually use fixed-point arithmetic because of its simpler logic and shorter processing time compared with floating-point arithmetic. However, a fixed-point MLP has the drawback that it cannot be used with MLP software that uses floating-point arithmetic. We propose a design method for MLPs with a fully pipelined architecture based on floating-point arithmetic. Its processing speed is proportional to the number of hidden nodes. The numbers of input and output nodes of an MLP are generally constrained by the given problem, but the number of hidden nodes can be optimized based on the user's experience. Our design method therefore uses an optimized number of hidden nodes to improve processing speed, especially for repetitive processing such as image processing and pattern recognition.

A Study on Efficiency Improvement of USN Logistics Management System applied Pipelining Techniques (파이프라이닝 기법을 적용한 USN 물류관리 시스템 효율성 향상에 관한 연구)

  • Kim, Seok-Soo; Jung, Sung-Mo
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.6 / pp.1214-1219 / 2009
  • Many studies apply USN (Ubiquitous Sensor Network) technology in various fields, including logistics management in the world's large retail stores and warehouses, and use of the technology is increasing. However, handling and processing real-time data is never easy when these huge warehouses use too many sensors, and real-time data correction is almost impossible. Software implementation and high-speed hardware are insufficient to solve these complex problems; a key solution is to implement high-speed software. Hence, this paper proposes a USN logistics management system that applies pipelining techniques to make real-time data correction more efficient and to reduce errors in generated values.

Pipelining Semantically-operated Services Using Ontology-based User Constraints (온톨로지 기반 사용자 제시 조건을 이용한 시맨틱 서비스 조합)

  • Jung, Han-Min; Lee, Mi-Kyoung; You, Beom-Jong
    • The Journal of the Korea Contents Association / v.9 no.10 / pp.32-39 / 2009
  • Semantically-operated services, which differ from Web services or semantic Web services with semantic markup, can be defined as services that provide search or reasoning functions using ontologies. Such a service performs a pre-defined task by exploiting URIs, ontology classes, and ontology properties. This study introduces a method for pipelining semantically-operated services based on a semantic broker, which refers to the ontologies and service descriptions stored in a service manager and is invoked with user constraints. The constraints consist of input instances, an output class, a visualization type, service names, and properties. The method provides the user with automatically generated service pipelines, including composite services and a simple workflow. The pipelines provided by the semantic broker can be executed fully automatically to find a set of meaningful semantic pipelines. This study is expected to contribute significantly to the development of a portal service by supporting human service planners who want to find specific composite services pipelined from distributed semantically-operated services.

Multi-Channel Pipelining for Energy Efficiency and Delay Reduction in Wireless Sensor Network (무선 센서 네트워크에서 에너지 효율성과 지연 감소를 위한 다중 채널 파리프라인 기법)

  • Lee, Yoh-Han; Kim, Daeyoung
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.11 / pp.11-18 / 2014
  • Most energy-efficient MAC protocols for wireless sensor networks (WSNs) are based on duty cycling in a single channel and perform competitively with a small number of traffic flows; however, under multiple concurrent flows, their performance degrades significantly because of contention and collisions. We propose a multi-channel pipelining (MCP) method for convergecast WSNs to address these problems. In MCP, a staggered dynamic phase shift (SDPS) algorithm is devised to minimize end-to-end latency by dynamically staggering the wake-up schedules of nodes on a multi-hop path. In addition, a phase-locking identification (PLI) algorithm is proposed to optimize energy efficiency. Based on these algorithms, multiple flows can be dynamically pipelined, each in one of multiple channels, and handled in succession by the sink as it switches to each channel. We present an analytical model to compute the duty cycle and latency of MCP and validate the model by simulation. Simulation results show that our proposal outperforms the existing protocols X-MAC and DPS-MAC in terms of duty cycle, end-to-end latency, delivery ratio, and aggregate throughput.
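
Why staggering wake-up times along a path reduces latency can be shown with a toy model. This is not the SDPS algorithm itself: the fixed one-slot offset per hop, the function names, and the timing values are all assumptions made for illustration.

```python
def staggered_offsets(num_hops, slot_ms, period_ms):
    """Wake-up offset (ms within one duty cycle) of each receiver on the
    path, with every next hop waking exactly one slot after the previous."""
    return [(hop * slot_ms) % period_ms for hop in range(num_hops)]


def path_latency(rx_offsets, slot_ms, period_ms, start_ms=0):
    """End-to-end latency of one packet: at each hop the sender waits until
    the receiver wakes, then needs one slot to hand the packet over."""
    t = start_ms
    for offset in rx_offsets:
        t += (offset - t) % period_ms  # sleep-wait until the receiver wakes
        t += slot_ms                   # transmission takes one slot
    return t - start_ms


if __name__ == "__main__":
    hops, slot, period = 4, 10, 100    # hypothetical values
    print(path_latency(staggered_offsets(hops, slot, period), slot, period))
    print(path_latency([0] * hops, slot, period))  # all nodes wake together
```

With aligned wake-ups the packet waits almost a full period at every hop after the first, while staggered wake-ups let it move on as soon as it arrives.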

Bounding Worst-Case DRAM Performance on Multicore Processors

  • Ding, Yiqiang; Wu, Lan; Zhang, Wei
    • Journal of Computing Science and Engineering / v.7 no.1 / pp.53-66 / 2013
  • Bounding the worst-case DRAM performance for a real-time application is a challenging problem that is critical for computing worst-case execution time (WCET), especially for multicore processors, where the DRAM memory is usually shared by all of the cores. Typically, DRAM commands from consecutive DRAM accesses can be pipelined on DRAM devices according to the spatial locality of the data fetched by them. By considering the effect of DRAM command pipelining, we propose a basic approach to bounding the worst-case DRAM performance. An enhanced approach is proposed to reduce the overestimation from the invalid DRAM access sequences by checking the timing order of the co-running applications on a dual-core processor. Compared with the conservative approach, which assumes that no DRAM command pipelining exists, our experimental results show that the basic approach can bound the WCET more tightly, by 15.73% on average. The experimental results also indicate that the enhanced approach can further improve the tightness of WCET by 4.23% on average as compared to the basic approach.

Design of Grid Workflow System Scheduler for Task Pipelining (작업 파이프라이닝을 위한 그리드 워크플로우 스케줄러 설계)

  • Lee, In-Seon
    • Journal of the Korea Society of Computer and Information / v.15 no.7 / pp.1-10 / 2010
  • The power of computational Grid resources can be brought to users' desktops by employing workflow managers, which also help scientists conveniently assemble and run their own scientific workflows. Generally, stage-in, processing, and stage-out are executed serially, and workflow systems help automate this process. However, as data sizes increase exponentially and more and more scientific workflows require multiple processing steps to obtain the desired output, we argue that data movement will account for a large portion of the overall running time. In this paper, we improve staging time and design a new scheduler with which the system can execute as many jobs concurrently as possible. Our simulation study shows that a 10% to 40% improvement in running time can be achieved with our approach.
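
The gain from overlapping staging with processing can be sketched with a three-phase flow-shop recurrence. This is a simplified model, not the scheduler from the paper: it assumes one stage-in link, one compute resource, and one stage-out link, and the job durations are hypothetical.

```python
def serial_makespan(jobs):
    """Running time when stage-in, process, and stage-out of every job
    are executed strictly one after another."""
    return sum(sum(job) for job in jobs)


def pipelined_makespan(jobs):
    """Running time when the phases of different jobs overlap.

    A phase starts once the same job's previous phase has finished and the
    previous job has released that phase's resource.
    """
    done = [0, 0, 0]  # finish time of the latest job in each phase
    for job in jobs:
        prev_phase_end = 0
        for phase, duration in enumerate(job):
            start = max(prev_phase_end, done[phase])
            done[phase] = start + duration
            prev_phase_end = done[phase]
    return done[-1]


if __name__ == "__main__":
    # Hypothetical (stage-in, process, stage-out) durations per job.
    jobs = [(4, 6, 2)] * 3
    print(serial_makespan(jobs), pipelined_makespan(jobs))
```

The achievable saving depends on how evenly the three phases are balanced; when one phase dominates, overlapping the other two hides little.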

A Hardware Design Space Exploration toward Low-Area and High-Performance Architecture for the 128-bit Block Cipher Algorithm SEED (128-비트 블록 암호화 알고리즘 SEED의 저면적 고성능 하드웨어 구조를 위한 하드웨어 설계 공간 탐색)

  • Yi, Kang
    • Journal of KIISE:Computing Practices and Letters / v.13 no.4 / pp.231-239 / 2007
  • This paper presents the trade-off between area and performance in a hardware design space exploration for SEED, the Korean national standard 128-bit block cipher algorithm. We compare four hardware design types for the SEED algorithm: (1) Design 1, a 16-round fully pipelined approach; (2) Design 2, a one-round looping approach; (3) Design 3, a G-function sharing and looping approach; and (4) Design 4, a one-round approach with internal 3-stage pipelining. Designs 1, 2, and 3 are existing approaches, while Design 4 is newly proposed in this paper. The new design pipelines the three G functions and adders that make up the F function, which requires less area than Design 2 and achieves higher performance than Design 2 and Design 3 through pipelining and module sharing. We design and implement all four approaches on real FPGA hardware for exact performance and area analysis. The experimental results show that Design 4 has the highest performance except for Design 1, which pursues very aggressive parallelism at the expense of area. Our proposed design (Design 4) shows the best throughput/area ratio among all the alternatives, by a factor of 2.8. Therefore, our new design for SEED is the most efficient compared with the existing designs.
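
The throughput/area comparison rests on simple arithmetic, sketched below. Only the 128-bit block size and the 16-round structure come from the abstract; the clock rate, cycle counts per block, and area figures are hypothetical placeholders, not measurements from the paper.

```python
def throughput_mbps(block_bits, clock_mhz, cycles_per_block):
    """Steady-state throughput in Mbit/s: one block of block_bits leaves
    the core every cycles_per_block clock cycles."""
    return block_bits * clock_mhz / cycles_per_block


def throughput_per_area(throughput, area_units):
    """Efficiency metric: throughput divided by occupied area."""
    return throughput / area_units


if __name__ == "__main__":
    # Hypothetical 100 MHz clock and made-up area units.
    fully_pipelined = throughput_mbps(128, 100, 1)   # one block per cycle
    one_round_loop = throughput_mbps(128, 100, 16)   # 16 cycles per block
    print(throughput_per_area(fully_pipelined, 16000))
    print(throughput_per_area(one_round_loop, 1500))
```

A fully pipelined core wins on raw throughput but replicates the round logic, which is why a smaller looping design can come out ahead on the ratio.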