• Title/Summary/Keyword: Data Latency

Dual-Port SDRAM Optimization with Semaphore Authority Management Controller

  • Kim, Jae-Hwan;Chong, Jong-Wha
    • ETRI Journal / v.32 no.1 / pp.84-92 / 2010
  • This paper proposes a semaphore authority management (SAM) controller to optimize dual-port SDRAM (DPSDRAM) in mobile multimedia systems. Recently, DPSDRAM with a shared bank that enables high-speed data exchange between two processors has been developed for dual-processor mobile multimedia systems. However, the latency introduced by the semaphore that prevents access contention at the shared bank slows the data transfer rate and reduces memory bandwidth. SAM increases the data transfer rate by minimizing this semaphore latency: it avoids the latency of reading the DPSDRAM semaphore register and reduces the time spent waiting for authority over the shared bank to change. It also reduces the number of authority requests and the number of authority changes. Experimental results using a 1 Gb DPSDRAM (OneDRAM) with SAM controllers at 66 MHz show a 1.6-fold improvement in the data transfer rate between the two processors compared with the traditional controller. In addition, SAM improves bandwidth by up to 38% for port A and 31% for port B compared with the traditional controller.

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

  • Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
    • IEIE Transactions on Smart Processing and Computing / v.3 no.1 / pp.1-9 / 2014
  • This paper proposes a low-latency Sample Adaptive Offset (SAO) filter architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), the SAO operation is performed only at the picture level. Most real-time decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly lower memory requirements, and 2) a low-latency property that enables efficient pipelined multi-core decoding. In addition, SIMD optimization of SAO filtering can reduce the SAO filtering time significantly. The simulation results showed that the proposed low-latency SAO architecture, with significantly less memory usage, achieves a decoding time similar to that of picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.

HTSC and FH HTSC: XOR-based Codes to Reduce Access Latency in Distributed Storage Systems

  • Shuai, Qiqi;Li, Victor O.K.
    • Journal of Communications and Networks / v.17 no.6 / pp.582-591 / 2015
  • A massive distributed storage system is the foundation for big data operations. Access latency is a key metric in distributed storage systems because it greatly impacts user experience, yet existing codes mainly focus on improving metrics such as storage overhead and repair cost. By generating parity nodes from parity nodes, in this paper we design new XOR-based erasure codes, hierarchical tree structure code (HTSC) and high failure tolerant HTSC (FH HTSC), to reduce access latency in distributed storage systems. By comparing with other popular and representative codes, we show that, under the same repair cost, HTSC and FH HTSC codes can reduce access latency while maintaining favorable performance in other metrics. In particular, under the same repair cost, FH HTSC can achieve lower access latency, higher or equal failure tolerance, and lower computation cost compared with the representative codes, with similar storage overhead. Accordingly, FH HTSC is a superior choice for applications requiring both low access latency and outstanding failure tolerance.
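
The XOR-based parity idea behind such erasure codes can be illustrated with a minimal sketch. This is generic single-parity XOR coding, not the HTSC or FH HTSC construction from the paper; the function names `xor_blocks` and `recover_block` are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    # XOR-ing the parity with every surviving block cancels them out,
    # leaving only the missing block.
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data blocks and one parity block (illustrative data only).
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)

# Pretend data[1] is unavailable and rebuild it from the rest.
recovered = recover_block([data[0], data[2]], parity)
```

A single XOR parity tolerates only one lost block. Codes of the kind described in the abstract add further parity nodes, including parities computed from other parities, which this sketch does not attempt to model.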

Low Latency Encoding Algorithm for Duo-Binary Turbo Codes with Tail Biting Trellises (이중 입력 터보 코드를 위한 저지연 부호화 알고리즘)

  • Park, Sook-Min;Kwak, Jae-Young;Lee, Kwy-Ro
    • Journal of the Institute of Electronics Engineers of Korea SC / v.46 no.2 / pp.47-51 / 2009
  • A low-latency encoder for high-data-rate duo-binary turbo codes with tail-biting trellises is considered. An encoder hardware architecture is proposed that exploits an inherent encoding property of duo-binary turbo codes. We show that the proposed architecture can reduce both the execution time and the energy by half.

Latency Hiding based Warp Scheduling Policy for High Performance GPUs

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information / v.24 no.4 / pp.1-9 / 2019
  • The LRR (Loose Round Robin) warp scheduling policy for GPU architectures yields high warp-level parallelism and balanced loads across multiple warps. However, the traditional LRR policy makes multiple warps execute long-latency operations at the same time. When no more warps can be issued during these long-latency operations, GPU throughput may degrade significantly. In this paper, we propose a new warp scheduling policy that exploits latency hiding, leading to better utilization of memory resources in high-performance GPUs. The proposed warp scheduler prioritizes memory instructions based on the GTO (Greedy Then Oldest) policy in order to reduce memory stalls. When no warp can execute a memory instruction any more, the warp scheduler selects a warp for a computation instruction in a round-robin manner. Furthermore, the proposed technique achieves high performance by using additional information about recently committed warps. According to our experimental results, the proposed technique improves GPU performance by 12.7% and 5.6% over LRR and GTO on average, respectively.
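
The two-level selection rule described in this abstract can be sketched as a small software model. This is only an illustration under stated assumptions, not the authors' hardware scheduler: the `Warp` fields, the `TwoLevelScheduler` class, and the use of a smaller `age` value to mean an older warp are all hypothetical, and the abstract's use of information about recently committed warps is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    warp_id: int
    age: int                # hypothetical: smaller value means an older warp
    next_is_memory: bool    # True if the warp's next instruction is a memory access
    ready: bool = True


class TwoLevelScheduler:
    """Memory instructions first (greedy-then-oldest), otherwise round robin."""

    def __init__(self):
        self.last_greedy = None   # warp_id that last issued a memory instruction
        self.rr_pointer = 0       # next index to try for round-robin selection

    def select(self, warps):
        memory_ready = [w for w in warps if w.ready and w.next_is_memory]
        if memory_ready:
            # Greedy: stay on the same warp while it still has memory work.
            for warp in memory_ready:
                if warp.warp_id == self.last_greedy:
                    return warp
            # Otherwise fall back to the oldest warp with a memory instruction.
            chosen = min(memory_ready, key=lambda w: w.age)
            self.last_greedy = chosen.warp_id
            return chosen
        # No memory instruction can issue: pick a ready warp in round-robin order.
        count = len(warps)
        for offset in range(count):
            index = (self.rr_pointer + offset) % count
            if warps[index].ready:
                self.rr_pointer = (index + 1) % count
                return warps[index]
        return None  # nothing can issue this cycle


warps = [
    Warp(0, age=5, next_is_memory=False),
    Warp(1, age=2, next_is_memory=True),
    Warp(2, age=1, next_is_memory=True),
]
scheduler = TwoLevelScheduler()
first = scheduler.select(warps)    # oldest warp with a memory instruction
second = scheduler.select(warps)   # greedy: same warp again
```

Keeping the greedy choice and the round-robin pointer as separate state means the compute fallback resumes where it left off each time the memory-ready set becomes empty.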

Performance Evaluation and Analysis of Multiple Scenarios of Big Data Stream Computing on Storm Platform

  • Sun, Dawei;Yan, Hongbin;Gao, Shang;Zhou, Zhangbing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.7 / pp.2977-2997 / 2018
  • In the big data era, fresh data grows rapidly every day. More than 30,000 gigabytes of data are created every second, and the rate is accelerating. Many organizations rely heavily on real-time streaming, and big data stream computing helps them spot opportunities and risks in real-time big data. Storm, one of the most common online stream computing platforms, has been used for big data stream computing, with response times ranging from milliseconds to sub-seconds. The performance of Storm plays a crucial role in different application scenarios. However, few studies have evaluated it. In this paper, we investigate the performance of Storm under different application scenarios. Our experimental results show that the throughput and latency of Storm are greatly affected by the number of instances of each vertex in the task topology and by the number of available resources in the data center. The fault-tolerant mechanism of Storm works well in most big data stream computing environments. We therefore suggest that a dynamic topology, an elastic scheduling framework, and a memory-based fault-tolerant mechanism are necessary for providing high-throughput, low-latency services on the Storm platform.

Slotted Transmission: A New MAC Scheme for Reduced Frame Latency in Ad-hoc Networks

  • Rahman, Md. Mustafizur;Hong, Choong-Seon
    • Annual Conference of KIPS / 2007.05a / pp.1294-1296 / 2007
  • The IEEE 802.11 DCF forces the neighboring nodes of an active transmitter to switch into an inactive state. This conservative behavior introduces frame latency in the transmitter's neighborhood. This work exploits the IEEE 802.11n Frame Aggregation scheme to allow simultaneous transmissions from nodes that are neighbors of each other. This is accomplished by synchronizing control and data transmissions in slots of fixed length. The proposed scheme reduces frame latency and improves aggregate network throughput.

Muscle Latency Time and Activation Patterns for Upper Extremity During Reaching and Reach to Grasp Movement

  • Choi, Sol-a;Kim, Su-jin
    • Physical Therapy Korea / v.25 no.3 / pp.51-59 / 2018
  • Background: Although muscle latency times and patterns have been used as broad examination tools to diagnose disease and recovery, previous studies have not compared the dominant arm to the non-dominant arm in muscle latency time and muscle recruitment patterns during reaching and reach-to-grasp movements. Objectives: The present study aimed to investigate dominant and non-dominant hand differences in muscle latency time and recruitment pattern during reaching and reach-to-grasp movements. In addition, by manipulating the speed of movement, we examined the effect of movement speed on neuromuscular control of both right and left hands. Methods: A total of 28 right-handed (measured by the Edinburgh Handedness Inventory) healthy subjects were recruited. Using surface electromyography, we recorded muscle latency time and muscle recruitment patterns of four upper extremity muscles (i.e., anterior deltoid, triceps brachii, flexor digitorum superficialis, and extensor digitorum) from each left and right arm. Mixed-effect linear regression was used to detect differences between hands, between reaching and reach-to-grasp, and between the fast and preferred speed conditions. Results: There were no significant differences in muscle latency time between dominant and non-dominant hands or between reaching and reach-to-grasp tasks (p>.05). However, muscle latency time was significantly longer in the preferred speed condition than in the fast speed condition for both reaching and reach-to-grasp tasks (p<.05). Conclusion: These findings showed similar muscle latency time and muscle activation patterns with respect to movement speeds and tasks. We hope our findings provide normative muscle physiology data for both right and left hands, aiding the understanding of abnormal movements in patients and the development of appropriate rehabilitation strategies specific to dominant and non-dominant hands.

A Study on Automatic Interface Generation by Protocol Mapping (Protocol Mapping을 이용한 인터페이스 자동생성 기법 연구)

  • Lee Ser-Hoon;Kang Kyung-Goo;Hwang Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.8A / pp.820-829 / 2006
  • IP-based design methodology has been widely employed for SoC design to reduce design complexity and to cope with time-to-market pressure. Because current mobile systems demand high performance, embedded SoC designs need multiple processors to handle highly complex problems and real-time data processing such as multimedia, DMB, and image processing. Interface modules for communication between system buses and processors are required, since many IPs employ different protocols. High-performance processors require interface modules that minimize the latency of data transmission during read-write operations and enhance the performance of the top-level system. This paper proposes an automatic interface generation system based on an FSM generated from the common protocol description sequence of a bus and an IP. The proposed interface does not use a buffer that stores data temporarily, which would add data transmission latency. Experimental results show that the area of the interface circuits generated by the proposed system is reduced by 48.5% on average compared to buffer-based interface circuits. Data transmission latency is reduced by 59.1% for single data transfer and by 13.3% for burst-mode data transfer. With the proposed system, a high-performance interface circuit can be generated automatically.

Register-Based Parallel Pipelined Scheme for Synchronous DRAM (동기식 기억소자를 위한 레지스터를 이용한 병렬 파이프라인 방식)

  • Song, Ho Jun
    • Journal of the Korean Institute of Telematics and Electronics A / v.32A no.12 / pp.108-114 / 1995
  • Recently, along with the advance of high-performance systems, synchronous DRAMs (SDRAMs), which provide consecutive data output synchronized with an external clock signal, have been reported. However, in conventional SDRAMs that use a multi-stage serial pipelined scheme, the column path is divided into multiple stages depending on the CAS latency N. Thus, as the operating speed and CAS latency increase, new stages must be added, causing a large area penalty due to additional latches and I/O lines. In the proposed register-based parallel pipelined scheme, (N-1) registers are located between the read data bus line pair and the data output buffer, and the incoming data are stored sequentially. Since the column data path is not divided and the read data is transmitted directly to the registers, the burst read operation can easily be achieved at higher frequencies without a large area penalty or degradation of the internal timing margin. Simulation results for a 4-bank 64M SDRAM in 0.32 um technology show good operation at 200 MHz, and the area increment is less than 0.1% when the CAS latency N is increased from 3 to 4. This pipelined scheme becomes more advantageous as the operating frequency increases.
