• Title/Summary/Keyword: High-Throughput Computing

Accelerating Soft-Decision Reed-Muller Decoding Using a Graphics Processing Unit

  • Uddin, Md. Sharif;Kim, Cheol Hong;Kim, Jong-Myon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.4 no.2 / pp.369-378 / 2014
  • The Reed-Muller code is an efficient algorithm for multiple-bit error correction; however, the high computational requirement inherent in its decoding process prohibits its use in practical applications. To solve this problem, this paper proposes a graphics processing unit (GPU)-based parallel error control approach using Reed-Muller R(r, m) coding for real-time wireless communication systems. The GPU offers a high-throughput parallel computing platform that can achieve the desired high-performance decoding by exploiting the massive parallelism inherent in the algorithm. In addition, we compare the performance of the GPU-based approach with that of the equivalent sequential approach running on a traditional CPU. The experimental results indicate that the proposed GPU-based approach substantially outperforms the sequential approach in terms of execution time, yielding over 70× speedup.
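To make the kind of parallelism involved concrete, here is a minimal sequential sketch of soft-decision decoding for the first-order code R(1, m) only, using a fast Walsh-Hadamard transform. It is an illustration under stated assumptions, not the paper's GPU kernel or its general R(r, m) decoder: the function names `fwht`, `rm1_encode`, and `rm1_soft_decode` are hypothetical, and the mapping of bit 0 to +1 and bit 1 to -1 is an assumed convention.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform; len(values) must be a power of two."""
    out = list(values)
    h = 1
    while h < len(out):
        # Butterflies within one stage are independent of each other,
        # which is the kind of work a GPU could run in parallel.
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def rm1_encode(a0, a, m):
    """Encode (a0, a) into R(1, m): bit at position x is a0 XOR <a, x> over GF(2).

    `a` is an m-bit integer mask holding the linear coefficients (assumed layout).
    """
    return [a0 ^ (bin(a & x).count("1") & 1) for x in range(1 << m)]


def rm1_soft_decode(received):
    """Decode soft values (bit 0 -> +1, bit 1 -> -1, plus noise) to (a0, a)."""
    spectrum = fwht(received)
    # The largest-magnitude coefficient identifies the linear part;
    # its sign identifies the constant bit.
    a = max(range(len(spectrum)), key=lambda u: abs(spectrum[u]))
    a0 = 1 if spectrum[a] < 0 else 0
    return a0, a
```

For example, encoding with `rm1_encode(1, 5, 3)`, mapping each bit `b` to `1.0 - 2 * b`, and passing the result to `rm1_soft_decode` recovers the pair `(1, 5)`.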

Enhancing the performance of taxi application based on in-memory data grid technology (In-memory data grid 기술을 활용한 택시 애플리케이션 성능 향상 기법 연구)

  • Choi, Chi-Hwan;Kim, Jin-Hyuk;Park, Min-Kyu;Kwon, Kaaen;Jung, Seung-Hyun;Nazareno, Franco;Cho, Wan-Sup
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1035-1045 / 2015
  • Recent studies in big data analysis show promising results from using main memory for rapid data processing. In-memory computing technology can be highly advantageous when used with high-performance servers that have tens of gigabytes of RAM and multi-core processors. The network constraints of such infrastructure can be lessened by combining in-memory technology with distributed parallel processing. This paper applies this concept to a test taxi-hailing application while preserving its underlying RDBMS structure. Applying IMDG technology in the application's backend API, without restructuring the database schema, yields a 6- to 9-fold increase in data processing performance and throughput. In particular, throughput changes very little even as the data processing load increases.

Design of Efficient Big Data Collection Method based on Mass IoT devices (방대한 IoT 장치 기반 환경에서 효율적인 빅데이터 수집 기법 설계)

  • Choi, Jongseok;Shin, Yongtae
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.4 / pp.300-306 / 2021
  • Recent advances in IT have improved the hardware used in IoT equipment, so smart systems built on low-cost, high-performance RF and computing devices are being developed. However, in an infrastructure environment where a large number of IoT devices are installed, big data collection places a load on the collection server because of bottlenecks among the transmitted data. As a result, data sent to the collection server suffers packet loss and reduced throughput. This paper therefore proposes an efficient big data collection technique for infrastructure environments in which a vast number of IoT devices are installed. In the performance evaluation, which measured packet loss and data throughput, the proposed technique completed transfers without loss of the transmitted file. In the future, the system needs to be implemented based on this design.

A Study on Improving TCP Performance over ABR/UBR Services in ATM Network (ATM 망에서 ABR/UBR 서비스상의 TCP 성능 향상에 관한 연구)

  • 김명희;박승섭
    • Journal of Internet Computing and Services / v.1 no.2 / pp.1-10 / 2000
  • ATM network technology is generally used to integrate multimedia services in the high-speed Internet. When Internet protocols run over ATM services, the loss of a single cell in the ATM layer causes the entire TCP packet to be lost, which degrades TCP performance. To reduce cell loss when congestion occurs, the UBR+EPD mechanism has been proposed to improve throughput for TCP over UBR, and the ER scheme has been suggested for TCP over ABR. In this paper, we analyze the performance improvement obtained with UBR+EPD combined with FRR (Fast Retransmission and Recovery), with adjustment of the EPD threshold parameter (R), and with variation of the MTU (Maximum Transmission Unit) size. The analysis shows that the proposed scheme improves throughput and fairness.

Multi-threaded system to support reconfigurable hardware accelerators on Zynq SoC (Zynq SoC에서 재구성 가능한 하드웨어 가속기를 지원하는 멀티쓰레딩 시스템 설계)

  • Shin, Hyeon-Jun;Lee, Joo-Heung
    • Journal of IKEEE / v.24 no.1 / pp.186-193 / 2020
  • In this paper, we propose a multi-threading system to support reconfigurable hardware accelerators on the Zynq SoC. We implement a high-performance JPEG decoder with reconfigurable 2D IDCT hardware accelerators to achieve the maximum performance available on the platform. In this system, up to four reconfigurable hardware accelerators synchronized with software threads can be dynamically reconfigured to provide adaptive computing capabilities according to the given image resolution and compression ratio. JPEG decoding is evaluated using images at 480p, 720p, and 1080p resolutions with compression ratios from 7:1 to 109:1. We show that significant performance improvements are achieved as the image resolution or the compression ratio increases. For 1080p resolution, the performance improvement is up to 79.11 times, with a throughput of 99 fps at a compression ratio of 17:1.

Optimal Implementation of Lightweight Block Cipher PIPO on CUDA GPGPU (CUDA GPGPU 상에서 경량 블록 암호 PIPO의 최적 구현)

  • Kim, Hyun-Jun;Eum, Si-Woo;Seo, Hwa-Jeong
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.6 / pp.1035-1043 / 2022
  • With the spread of the Internet of Things (IoT), cloud computing, and big data, the need for high-speed encryption in applications is growing. GPU optimization can also be used to validate, in a reasonable time, theoretically obtained cryptanalysis results or their reduced versions. In this paper, the PIPO lightweight cipher, which has previously been implemented in various environments, is implemented on a GPU. The implementation is optimized with brute-force attacks on PIPO in mind. In particular, the optimized implementation applies the bit-slicing technique and uses GPU resources as fully as possible. As a result, the proposed implementation achieves a throughput of about 19.5 billion per second in an RTX 3060 environment, about 122 times higher than that of the previous study.

A Systolic Array for High-Speed Computing of Full Search Block Matching Algorithm

  • Jung, Soon-Ho;Woo, Chong-Ho
    • Journal of Korea Multimedia Society / v.14 no.10 / pp.1275-1286 / 2011
  • This paper proposes a high-speed systolic array architecture for the full search block matching algorithm (FBMA). The pixels of the search area for a reference block are input only once to find the matching candidate block and are reused to compute the sum of absolute differences (SAD) for adjacent candidate blocks. Each row of the designed two-dimensional systolic array compares the reference block with the adjacent blocks in the same row of the search area. The lower rows of the array receive pixels from the row above and compute the SAD by reusing the overlapping pixels of the candidate blocks within the same column of the search area. The array requires no data broadcasting and no global paths. A comparison with existing architectures shows that this array is superior in terms of throughput, though it requires a little more hardware.
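As a point of reference for what the systolic array computes, the following is a minimal software sketch of full search block matching with the SAD criterion. It is a plain sequential model, not the paper's hardware architecture; the function names `sad` and `full_search` and the list-of-lists image layout are assumptions made for illustration.

```python
def sad(ref, area, top, left):
    """Sum of absolute differences between `ref` and the candidate block
    of the same size whose top-left corner is (top, left) in `area`."""
    return sum(
        abs(ref[i][j] - area[top + i][left + j])
        for i in range(len(ref))
        for j in range(len(ref[0]))
    )


def full_search(ref, area):
    """Evaluate every candidate position and return (best_sad, top, left)."""
    rows = len(area) - len(ref) + 1
    cols = len(area[0]) - len(ref[0]) + 1
    best = None
    for top in range(rows):
        for left in range(cols):
            cost = sad(ref, area, top, left)
            # Keep the first position with the smallest SAD seen so far.
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best
```

Adjacent candidate positions share most of their pixels, which is the overlap the abstract describes reusing; this sketch simply re-reads them for clarity.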

A High-Speed Hardware Design of IDEA Cipher Algorithm by Applying of Fermat's Theorem (Fermat의 소정리를 응용한 IDEA 암호 알고리즘의 고속 하드웨어 설계)

  • Choi, Young-Min;Kwon, Yong-Jin
    • Journal of KIISE:Computing Practices and Letters / v.7 no.6 / pp.696-702 / 2001
  • In this paper, we design hardware for the IDEA cipher algorithm, which is cryptographically superior to DES. To improve the encryption throughput, we propose an efficient design methodology for high-speed implementation of the multiplicative inverse modulo $2^{16}+1$, which requires the most computing power in IDEA. The efficient hardware architecture for the multiplicative inverse is derived by applying Fermat's Theorem. Our proposal reduces the computation required for the multiplicative inverse by 50% compared with the existing method based on the extended Euclidean algorithm. We implement IDEA by applying a single iterative round method together with our proposal for the multiplicative inverse. With a system clock frequency of 20 MHz, the designed hardware permits a data conversion rate of more than 116 Mbit/s. This result shows that the designed device operates about 2 times faster than the result reported by H. Bonnenberg et al. From a speed point of view, our proposal for the multiplicative inverse proves to be efficient.
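The idea of obtaining the inverse from Fermat's little theorem can be sketched in a few lines. IDEA multiplies 16-bit words modulo the prime 2^16 + 1 = 65537, with the all-zero word standing for 2^16, so the inverse of a is a^(p-2) mod p. The helper names below are hypothetical, and the sketch uses Python's built-in modular exponentiation rather than the paper's hardware design.

```python
MODULUS = (1 << 16) + 1  # 65537, the prime modulus of IDEA multiplication


def idea_mul(x, y):
    """IDEA multiplication of two 16-bit words (the word 0 represents 2**16)."""
    a = x if x != 0 else 1 << 16
    b = y if y != 0 else 1 << 16
    return (a * b) % MODULUS & 0xFFFF  # a result of 2**16 maps back to 0


def idea_mul_inverse(word):
    """Multiplicative inverse of a 16-bit word via Fermat's little theorem."""
    a = word if word != 0 else 1 << 16
    inv = pow(a, MODULUS - 2, MODULUS)  # a**(p-2) is a**-1 modulo the prime p
    return inv & 0xFFFF                 # 2**16 maps back to the word 0
```

Multiplying any word by its inverse with `idea_mul` yields 1, which is the property key-schedule inversion for decryption relies on.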

Fuzzy Logic-driven Virtual Machine Resource Evaluation Method for Cloud Provisioning Service (클라우드 프로비저닝 서비스를 위한 퍼지 로직 기반의 자원 평가 방법)

  • Kim, Jae-Kwon;Lee, Jong-Sik
    • Journal of the Korea Society for Simulation / v.22 no.1 / pp.77-86 / 2013
  • Cloud computing is a distributed computing environment that draws on many computing resources. A cloud environment uses virtual machines to process requested jobs. To balance the workload and process jobs rapidly, it applies a provisioning technique and assigns tasks according to the status of each virtual machine. However, a scheduling method for cloud computing requires a definition of virtual machine availability, whose meaning is ambiguous. In this paper, we propose Fuzzy logic driven Virtual machine Provisioning scheduling using Resource Evaluation (FVPRE). FVPRE analyzes the state of every virtual machine and derives a concrete value for resource availability. FVPRE thus provides efficient provisioning scheduling based on a precise evaluation of resource availability, and it shows high throughput and utilization for job processing in cloud environments.

Column-aware Polarization Scheme for High-Speed Database Systems (고속 데이터베이스 시스템을 위한 컬럼-인지 양분화 기법)

  • Byun, Si-Woo
    • Journal of Internet Computing and Services / v.13 no.3 / pp.83-91 / 2012
  • Recently, column-oriented storage has become a progressive model for high-speed database systems because of its superior I/O performance. In this paper, we analyze the traditional row-oriented storage model and then propose a new column-aware storage management model that uses a flash memory drive and an assist drive to improve the effective performance of high-speed column-oriented database systems. Our storage management scheme, called column-aware polarization, improves the performance of update operations by dividing and compressing table columns into active and inactive columns, and by balancing congested update operations with an assist drive during high-workload periods. Experimental results show that our scheme improves the update throughput of column-oriented storage by 19 percent and the response time by up to 49 percent.