• Title/Abstract/Keywords: data latency


Dual-Port SDRAM Optimization with Semaphore Authority Management Controller

  • Kim, Jae-Hwan;Chong, Jong-Wha
    • ETRI Journal / Vol. 32, No. 1 / pp. 84-92 / 2010
  • This paper proposes the semaphore authority management (SAM) controller to optimize dual-port SDRAM (DPSDRAM) in mobile multimedia systems. DPSDRAM with a shared bank that enables high-speed data exchange between two processors has recently been developed for dual-processor mobile multimedia systems. However, the latency introduced by the semaphore that prevents access contention at the shared bank slows the data transfer rate and reduces memory bandwidth. SAM increases the data transfer rate by minimizing this semaphore latency. It avoids the latency of reading the DPSDRAM semaphore register and reduces the latency of waiting for authority over the shared bank to change. It also reduces the number of authority requests and the number of authority changes. Experiments using a 1-Gb DPSDRAM (OneDRAM) with SAM controllers at 66 MHz show a 1.6-fold improvement in the data transfer rate between the two processors compared with a traditional controller. In addition, SAM improves bandwidth by up to 38% for port A and 31% for port B compared with the traditional controller.
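
To make the semaphore-register overhead concrete, here is a minimal cost model, not the paper's controller. The function `shared_bank_cost`, the assumption that port A owns the bank at reset, and the cycle costs are all hypothetical. It captures only one SAM idea: the controller tracks the current owner locally instead of polling the semaphore register before every access, which is valid only if every authority change passes through this controller. It does not model how SAM also reduces the number of authority requests and changes.

```python
def shared_bank_cost(trace, read_cost, change_cost, shadow_owner):
    """Estimate semaphore-handling cycles for a sequence of shared-bank accesses.

    trace:        ports ("A" or "B") in the order they access the shared bank
    read_cost:    cycles to read the semaphore register (hypothetical value)
    change_cost:  cycles to hand authority to the other port (hypothetical value)
    shadow_owner: if True, keep a local copy of the current owner instead of
                  reading the semaphore register before each access
    """
    owner = "A"  # assumption: port A holds authority after reset
    cycles = reads = changes = 0
    for port in trace:
        if not shadow_owner:
            cycles += read_cost      # traditional: poll the semaphore register
            reads += 1
        if port != owner:
            cycles += change_cost    # request authority and wait for the handover
            changes += 1
            owner = port
    return {"cycles": cycles, "register_reads": reads, "authority_changes": changes}


traditional = shared_bank_cost("AABBA", read_cost=3, change_cost=10, shadow_owner=False)
shadowed = shared_bank_cost("AABBA", read_cost=3, change_cost=10, shadow_owner=True)
```

With these made-up costs the traditional controller spends 35 cycles (5 register reads plus 2 handovers), while the shadowed controller spends 20, because only the handovers remain.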

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

  • Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
    • IEIE Transactions on Smart Processing and Computing / Vol. 3, No. 1 / pp. 1-9 / 2014
  • This paper proposes a low-latency Sample Adaptive Offset (SAO) filter architecture and a Single Instruction Multiple Data (SIMD) optimization scheme for it, to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. In the HEVC standard and its Test Model (HM), SAO is performed only at the picture level. Most real-time decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce latency and memory bandwidth. The proposed low-latency SAO architecture has two advantages over picture-based SAO: 1) significantly lower memory requirements, and 2) low latency, which enables efficient pipelined multi-core decoding. In addition, SIMD optimization can significantly reduce SAO filtering time. Simulation results show that the proposed low-latency SAO architecture, while using significantly less memory, achieves a decoding time similar to that of picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared with the existing look-up table approach of HM.
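
The per-sample work that SAO filtering performs can be seen in a scalar sketch of HEVC edge-offset (EO) classification for the horizontal class. Each sample is compared with its two neighbours, mapped to one of four categories, and shifted by that category's offset. The function names, the 1-D row, and leaving the first and last samples untouched are simplifying assumptions. A CTU-based decoder would instead need the deblocked, not-yet-SAO-filtered neighbours across CTU boundaries. A SIMD version would vectorize the same two comparisons across many samples instead of using HM's look-up table.

```python
def _sign(x):
    return (x > 0) - (x < 0)


def eo_category(a, c, b):
    """Edge-offset category of sample c with neighbours a and b (0 = no offset)."""
    s = _sign(c - a) + _sign(c - b)
    # -2: local minimum, -1: concave edge, 0: none, +1: convex edge, +2: local maximum
    return {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[s]


def sao_eo_horizontal(row, offsets, bit_depth=8):
    """Apply horizontal edge offset to one row; offsets maps category 1..4 to an offset."""
    max_val = (1 << bit_depth) - 1
    out = list(row)
    for i in range(1, len(row) - 1):
        # classify using the unfiltered input samples, never already-filtered ones
        cat = eo_category(row[i - 1], row[i], row[i + 1])
        if cat:
            out[i] = min(max(row[i] + offsets[cat], 0), max_val)
    return out
```

For example, `sao_eo_horizontal([10, 5, 10, 12, 12, 8], {1: 2, 2: 1, 3: -1, 4: -2})` raises the local minimum 5 to 7 and lowers the two convex-edge samples from 12 to 11.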

HTSC and FH HTSC: XOR-based Codes to Reduce Access Latency in Distributed Storage Systems

  • Shuai, Qiqi;Li, Victor O.K.
    • Journal of Communications and Networks / Vol. 17, No. 6 / pp. 582-591 / 2015
  • A massive distributed storage system is the foundation for big data operations. Access latency is a key metric in distributed storage systems because it strongly affects user experience, yet existing codes mainly focus on other metrics such as storage overhead and repair cost. By generating parity nodes from parity nodes, this paper designs new XOR-based erasure codes, the hierarchical tree structure code (HTSC) and the high failure tolerant HTSC (FH HTSC), to reduce access latency in distributed storage systems. Through comparison with other popular and representative codes, we show that, under the same repair cost, HTSC and FH HTSC reduce access latency while maintaining favorable performance in other metrics. In particular, under the same repair cost, FH HTSC achieves lower access latency, higher or equal failure tolerance, and lower computation cost than the representative codes while having similar storage overhead. FH HTSC is therefore a strong choice for applications that require both low access latency and high failure tolerance.
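
The core operation behind XOR-based codes that build "parity nodes from parity nodes" can be sketched with a hypothetical two-level layout. This is not the actual HTSC or FH HTSC construction. Four data blocks are protected by two local parities, and a second-level parity is computed from those parities. Repairing a lost data block touches only its local group, which is the kind of access-latency saving the paper targets.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must have equal length")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Hypothetical layout: four data blocks, two level-1 parities, one level-2 parity.
d = [b"\x01\x02", b"\x10\x20", b"\x0f\x00", b"\xaa\x55"]
p01 = xor_blocks(d[0], d[1])   # level-1 parity over d0 and d1
p23 = xor_blocks(d[2], d[3])   # level-1 parity over d2 and d3
q = xor_blocks(p01, p23)       # level-2 parity generated from parity nodes

recovered_d1 = xor_blocks(d[0], p01)   # lost d1 rebuilt from two blocks only
recovered_p01 = xor_blocks(q, p23)     # lost level-1 parity rebuilt from the level above
```

Because XOR is associative, `q` also equals the XOR of all four data blocks, so it can serve as a global parity when a local group loses more than one block.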

Low Latency Encoding Algorithm for Duo-Binary Turbo Codes with Tail Biting Trellises

  • 박숙민;곽재영;이귀로
    • 전자공학회논문지SC (Journal of the Institute of Electronics Engineers of Korea SC) / Vol. 46, No. 2 / pp. 47-51 / 2009
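  • This paper studies turbo code structures with multiple inputs and proposes a turbo encoding algorithm and hardware that efficiently reduce latency in the tail-biting technique. By exploiting the inherent properties of the duo-binary turbo encoder adopted in Mobile WiMAX, DVB-RCS, and similar standards, and implementing it as parallel-processing hardware, the latency required for tail-biting was reduced to about 47% of that of the conventional approach, and power consumption was also reduced.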
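
Tail-biting latency comes from the conventional two-pass encoding. The block is first encoded from the zero state to learn its final state. A "circulation" start state is then derived, and the block is encoded again from that state so that it ends where it began. The sketch uses a hypothetical 3-bit binary RSC encoder (feedback 1+D^2+D^3, feedforward 1+D+D^3) as a stand-in for the duo-binary encoder, and the function names are illustrative. It shows the baseline two-pass procedure, not the authors' parallel algorithm. The circulation state is found through linearity over GF(2). Standards typically replace this search with a small precomputed table.

```python
def rsc_step(state, u):
    """One step of a hypothetical 3-bit binary RSC encoder; returns (next_state, parity)."""
    r1, r2, r3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    w = u ^ r2 ^ r3          # feedback bit (1 + D^2 + D^3)
    parity = w ^ r1 ^ r3     # parity output (1 + D + D^3)
    return (w << 2) | (r1 << 1) | r2, parity


def encode(bits, start_state):
    """Encode a block from start_state; return (parity bits, final state)."""
    state, parities = start_state, []
    for u in bits:
        state, p = rsc_step(state, u)
        parities.append(p)
    return parities, state


def circulation_state(bits):
    """Start state to which the encoder returns after the whole block."""
    n = len(bits)
    if n % 7 == 0:
        # this encoder's state sequence has period 7, so no unique solution exists
        raise ValueError("block length must not be a multiple of 7 for this encoder")
    _, s_zero = encode(bits, 0)              # pass 1: encode from the all-zero state
    for s in range(8):
        _, s_free = encode([0] * n, s)       # zero-input response of candidate state s
        # linearity: final(s, bits) = final(s, zeros) XOR final(0, bits)
        if s_free ^ s_zero == s:
            return s
    raise RuntimeError("no circulation state found")


def tail_biting_encode(bits):
    sc = circulation_state(bits)
    parities, final = encode(bits, sc)       # pass 2: encode from the circulation state
    if final != sc:
        raise RuntimeError("trellis did not close")
    return sc, parities
```

The second full pass over the block is the extra latency that tail-biting adds compared with zero-terminated encoding.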

Latency Hiding based Warp Scheduling Policy for High Performance GPUs

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 24, No. 4 / pp. 1-9 / 2019
  • The LRR (Loose Round Robin) warp scheduling policy for GPU architectures yields high warp-level parallelism and balanced loads across warps. However, the traditional LRR policy makes multiple warps execute long-latency operations at the same time. When no more warps can be issued while these long-latency operations are pending, GPU throughput may degrade significantly. In this paper, we propose a new warp scheduling policy that exploits latency hiding to make better use of memory resources in high-performance GPUs. The proposed warp scheduler prioritizes memory instructions based on the GTO (Greedy Then Oldest) policy to reduce memory stalls. When no warp can execute a memory instruction, the scheduler selects a warp for a computation instruction in a round-robin manner. Furthermore, the proposed technique uses additional information about recently committed warps to achieve high performance. Experimental results show that the proposed technique improves GPU performance by 12.7% and 5.6% on average over LRR and GTO, respectively.
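
The issue rule described above can be sketched as a per-cycle selection function. Memory instructions are issued GTO-style: stay with the same warp while it can issue, else switch to the oldest ready warp. Only when no memory instruction is ready does the scheduler fall back to round-robin over compute-ready warps. The `Warp` fields, the "lower id means older" convention, and the class name are assumptions. The paper's use of recently committed warps is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warp:
    wid: int            # assumption: lower id = older warp
    ready: bool         # operands available, not stalled
    next_is_mem: bool   # next instruction is a load/store


class LatencyHidingScheduler:
    """Hypothetical scheduler: GTO among memory instructions, round robin among compute ones."""

    def __init__(self, num_warps: int) -> None:
        self.num_warps = num_warps
        self.last_mem: Optional[int] = None  # warp that issued the last memory instruction
        self.rr_next = 0                     # round-robin pointer for compute instructions

    def pick(self, warps: List[Warp]) -> Optional[int]:
        mem_ready = [w for w in warps if w.ready and w.next_is_mem]
        if mem_ready:
            # Greedy: keep issuing from the same warp while it has a ready memory instruction.
            if any(w.wid == self.last_mem for w in mem_ready):
                return self.last_mem
            # Then oldest: otherwise switch to the oldest warp with a ready memory instruction.
            oldest = min(mem_ready, key=lambda w: w.wid)
            self.last_mem = oldest.wid
            return oldest.wid
        # No memory instruction can issue: choose a compute warp in round-robin order.
        compute_ready = {w.wid for w in warps if w.ready and not w.next_is_mem}
        for k in range(self.num_warps):
            wid = (self.rr_next + k) % self.num_warps
            if wid in compute_ready:
                self.rr_next = (wid + 1) % self.num_warps
                return wid
        return None  # every warp is stalled this cycle
```

Issuing memory instructions early and back-to-back starts their long latencies sooner, so compute warps selected afterwards can overlap with the outstanding loads.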

Performance Evaluation and Analysis of Multiple Scenarios of Big Data Stream Computing on Storm Platform

  • Sun, Dawei;Yan, Hongbin;Gao, Shang;Zhou, Zhangbing
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 7 / pp. 2977-2997 / 2018
  • In the big data era, new data grows rapidly every day. More than 30,000 gigabytes of data are created every second, and the rate is accelerating. Many organizations rely heavily on real-time streaming, and big data stream computing helps them spot opportunities and risks in real-time big data. Storm, one of the most common online stream computing platforms, has been used for big data stream computing, with response times ranging from milliseconds to sub-seconds. Storm's performance plays a crucial role in different application scenarios; however, few studies have evaluated it. In this paper, we investigate the performance of Storm under different application scenarios. Our experimental results show that Storm's throughput and latency are greatly affected by the number of instances of each vertex in the task topology and by the amount of available resources in the data center. Storm's fault-tolerant mechanism works well in most big data stream computing environments. We therefore suggest that a dynamic topology, an elastic scheduling framework, and a memory-based fault-tolerant mechanism are necessary for providing high-throughput, low-latency services on the Storm platform.
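
Throughput and latency are the two metrics the evaluation turns on. This minimal sketch derives both from per-tuple emit and acknowledgement timestamps, such as a test harness might log. The timestamp source, the integer-millisecond units, and the function name are assumptions. This is generic bookkeeping, not Storm's metrics API. Tail latency uses the nearest-rank percentile.

```python
import math


def stream_metrics(events):
    """Summarize (emit_ms, ack_ms) pairs of completed tuples."""
    if not events:
        raise ValueError("no events")
    latencies = sorted(ack - emit for emit, ack in events)
    span_ms = max(ack for _, ack in events) - min(emit for emit, _ in events)
    if span_ms <= 0:
        raise ValueError("measurement window must be positive")

    def percentile(p):
        # nearest-rank percentile over the sorted latencies
        rank = max(1, math.ceil(p * len(latencies) / 100))
        return latencies[rank - 1]

    return {
        "throughput_per_s": len(events) * 1000 / span_ms,
        "mean_latency_ms": sum(latencies) / len(latencies),
        "p50_latency_ms": percentile(50),
        "p99_latency_ms": percentile(99),
    }
```

Running the same harness while varying the number of instances per vertex is one way to reproduce the kind of throughput/latency trade-off the abstract reports.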

Slotted Transmission: A New MAC Scheme for Reduced Frame Latency in Ad-hoc Networks

  • Rahman, Md. Mustafizur;Hong, Choong-Seon
    • 한국정보처리학회:학술대회논문집 (Korea Information Processing Society: Conference Proceedings) / KIPS 2007 Spring Conference / pp. 1294-1296 / 2007
  • The IEEE 802.11 DCF forces the neighboring nodes of an active transmitter into an inactive state. This conservative behavior increases frame latency in the transmitter's neighborhood. This work exploits the IEEE 802.11n frame aggregation scheme to allow simultaneous transmissions from nodes that are neighbors of each other. This is achieved by synchronizing control and data transmissions in fixed-length slots. The proposed scheme reduces frame latency and improves aggregate network throughput.
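
The two mechanical pieces of a slotted scheme are easy to sketch: waiting for the next fixed-length slot boundary, and packing queued frames into one aggregate that fits the slot. The slot length, the fixed per-slot overhead, in-order (FIFO) packing, and the function names are all hypothetical. The paper's actual slot structure and control exchange are not described in the abstract.

```python
def next_slot_start(now_us, slot_us):
    """Earliest slot boundary at or after now_us (integer microseconds)."""
    return -(-now_us // slot_us) * slot_us   # ceiling division, then scale back


def aggregate_for_slot(queue_us, slot_us, overhead_us):
    """Take frames from the head of the queue, in order, while the aggregate fits in one slot."""
    used, batch = overhead_us, []
    for duration in queue_us:
        if used + duration > slot_us:
            break
        used += duration
        batch.append(duration)
    return batch
```

For instance, a node ready at 1250 µs with 500 µs slots waits until 1500 µs. With 60 µs of per-slot overhead, it can aggregate the first two queued frames of 120 µs and 200 µs, but not the third.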

Muscle Latency Time and Activation Patterns for Upper Extremity During Reaching and Reach to Grasp Movement

  • Choi, Sol-a;Kim, Su-jin
    • 한국전문물리치료학회지 (Physical Therapy Korea) / Vol. 25, No. 3 / pp. 51-59 / 2018
  • Background: Although muscle latency times and patterns have been used as broad examination tools to diagnose disease and assess recovery, previous studies have not compared the dominant and non-dominant arms in muscle latency time and muscle recruitment patterns during reaching and reach-to-grasp movements. Objectives: This study aimed to investigate differences between the dominant and non-dominant hands in muscle latency time and recruitment pattern during reaching and reach-to-grasp movements. In addition, by manipulating movement speed, we examined the effect of speed on neuromuscular control of both the right and left hands. Methods: A total of 28 healthy right-handed subjects (handedness measured by the Edinburgh Handedness Inventory) were recruited. We recorded surface electromyography (EMG) muscle latency times and recruitment patterns of four upper-extremity muscles (anterior deltoid, triceps brachii, flexor digitorum superficialis, and extensor digitorum) in each arm. Mixed-effect linear regression was used to detect differences between hands, between reaching and reach-to-grasp tasks, and between the fast and preferred speed conditions. Results: There were no significant differences in muscle latency time between the dominant and non-dominant hands or between the reaching and reach-to-grasp tasks (p>.05). However, muscle latency time was significantly longer in the preferred speed condition than in the fast speed condition for both reaching and reach-to-grasp tasks (p<.05). Conclusion: These findings showed similar muscle latency time and muscle activation patterns with respect to movement speeds and tasks. We hope these findings provide normative muscle physiology data for both the right and left hands, thereby aiding the understanding of patients' abnormal movements and the development of rehabilitation strategies specific to the dominant and non-dominant hands.
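
Muscle latency time is measured from a movement cue to EMG onset. One common onset rule flags the first point where the rectified signal exceeds the baseline mean plus k standard deviations and stays there for a few samples. The abstract does not say which criterion the authors used, so k = 3, the 3-sample persistence rule, and the function name are assumptions. Real pipelines usually band-pass filter and smooth the EMG first, which this sketch omits.

```python
import statistics


def emg_onset(signal, baseline_len, k=3.0, min_consecutive=3):
    """Index of EMG onset after the baseline window, or None if none is found."""
    rect = [abs(x) for x in signal]                  # full-wave rectification
    base = rect[:baseline_len]
    threshold = statistics.mean(base) + k * statistics.pstdev(base)
    run = 0
    for i in range(baseline_len, len(rect)):
        if rect[i] > threshold:
            run += 1
            if run == min_consecutive:
                return i - min_consecutive + 1       # first sample of the sustained run
        else:
            run = 0
    return None
```

Latency then follows as `(onset - cue_index) / fs_hz` seconds, where `cue_index` is the sample at which the movement cue was given and `fs_hz` is the sampling rate (both hypothetical here).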

A Study on Automatic Interface Generation by Protocol Mapping

  • 이서훈;강경구;황선영
    • 한국통신학회논문지 (The Journal of Korean Institute of Communications and Information Sciences) / Vol. 31, No. 8A / pp. 820-829 / 2006
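  • SoC design has adopted IP-based design methodologies to cope with growing complexity and short time-to-market. Because the market demands high performance from mobile devices, embedded SoCs must use multiple processors to run complex, data-intensive programs such as multimedia, DMB, and image processing in real time. Using processors whose protocols differ from that of the system bus within a single SoC requires interface circuits that convert the processor protocol to the system bus protocol. Interface circuits for high-speed processors should minimize transfer delay on data writes and reads to improve overall system performance. Buffer-based interface architectures increase data transfer latency because data are temporarily stored in the buffer. This paper therefore proposes a method that automatically generates an interface circuit with a single FSM and no buffers, by exploiting the operation sequences common to the bus protocol and the master-module protocol. Compared with buffer-based interface circuits, the automatically generated circuits reduced area by 48.5% on average and reduced data transfer latency by 59.1% on average for single transfers and by 13.3% for burst-mode transfers. The proposed system can automatically generate high-performance interface circuits that minimize data transfer latency.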
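
The idea of building a single FSM from the "common operation sequences" of two protocols can be illustrated with a sequence alignment. In this hypothetical sketch, each protocol is reduced to an ordered list of operation names. A longest-common-subsequence table identifies the shared steps, which become single states serving both sides, while protocol-specific steps become their own states. This is one simple way to show the concept, not the authors' generation algorithm. The operation names and function names are made up.

```python
def lcs_suffix_table(a, b):
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = 1 + t[i + 1][j + 1]
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def merge_protocols(master, bus):
    """Merge two operation sequences into one FSM state list, sharing common steps."""
    t = lcs_suffix_table(master, bus)
    i = j = 0
    states = []
    while i < len(master) and j < len(bus):
        if master[i] == bus[j]:
            states.append((master[i], "shared"))   # one state serves both protocols
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            states.append((master[i], "master"))   # master-only step
            i += 1
        else:
            states.append((bus[j], "bus"))         # bus-only step
            j += 1
    states += [(op, "master") for op in master[i:]]
    states += [(op, "bus") for op in bus[j:]]
    return states
```

With a master sequence `["req", "addr", "wdata", "ack"]` and a bus sequence `["req", "grant", "addr", "wdata", "resp"]`, the merge yields six states, three of them shared. No buffer stage is needed because each shared step drives both sides in the same state.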

Register-Based Parallel Pipelined Scheme for Synchronous DRAM

  • Song, Ho Jun
    • 전자공학회논문지A (Journal of the Institute of Electronics Engineers of Korea A) / Vol. 32A, No. 12 / pp. 108-114 / 1995
  • Recently, with the advance of high-performance systems, synchronous DRAMs (SDRAMs), which provide consecutive data output synchronized with an external clock signal, have been reported. However, in conventional SDRAMs that use a multi-stage serial pipelined scheme, the column path is divided into multiple stages depending on the CAS latency N. Thus, as the operating speed and CAS latency increase, new stages must be added, causing a large area penalty due to additional latches and I/O lines. In the proposed register-based parallel pipelined scheme, (N-1) registers are placed between the read data bus line pair and the data output buffer, and incoming data are stored in them sequentially. Since the column data path is not divided and the read data are transmitted directly to the registers, burst read operation can easily be achieved at higher frequencies without a large area penalty or degradation of the internal timing margin. Simulation results for a 0.32 µm 4-bank 64M SDRAM show correct operation at 200 MHz, with an area increase of less than 0.1% when the CAS latency N is increased from 3 to 4. This pipelined scheme becomes more advantageous as the operating frequency increases.
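
The behaviour of the (N-1) registers can be checked with a small cycle-level model that uses them as a ring. Each burst word is latched into a register when it arrives on the read data bus, and leaves for the output buffer N-1 cycles later. At that point the freed register immediately receives the next word. The timing assumptions are hypothetical: READ is issued at cycle 0, and one word per cycle arrives from cycle 1. The function name is illustrative, and the model ignores circuit-level timing.

```python
def simulate_burst_read(burst, cas_latency):
    """Cycle-level model of an (N-1)-register output pipeline for a READ issued at cycle 0.

    Assumption: word k of the burst appears on the read data bus at cycle k + 1
    and must reach the output buffer at cycle k + cas_latency.
    Returns {cycle: word} for the output buffer.
    """
    if cas_latency < 2:
        raise ValueError("this model needs CAS latency >= 2 (at least one register)")
    depth = cas_latency - 1
    regs = [None] * depth                 # the N-1 registers, used as a ring
    out = {}
    for cycle in range(1, len(burst) + cas_latency):
        # Output side: the word that arrived `depth` cycles ago leaves its register.
        k_out = cycle - depth - 1
        if 0 <= k_out < len(burst):
            out[cycle] = regs[(k_out + 1) % depth]
        # Input side: the next burst word is latched into the register just freed.
        k_in = cycle - 1
        if k_in < len(burst):
            regs[cycle % depth] = burst[k_in]
    return out
```

Raising the CAS latency from 3 to 4 only adds one register to the ring, while the column path itself is unchanged, which is consistent with the small area increase reported above.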
