• Title/Summary/Keyword: Parallel Decoding


Performance improvement for Streaming of High Capacity Panoramic Video (대용량 파노라마 비디오 스트리밍의 성능개선)

  • Kim, Young-Back;Kim, Tae-Ho;Lee, Dae-Gyu;Kim, Jae-Joon
    • Journal of Internet Computing and Services / v.11 no.2 / pp.143-153 / 2010
  • Providing high-quality panoramic video across the Internet, mobile communications, and broadcasting requires a video codec that offers both high compression efficiency and random access. High compression efficiency is needed to stream high-volume panoramic data, and random access lets the user move the viewpoint and direction freely. In this paper, we propose a parallel processing scheme that operates on cell units to improve the performance of streaming large-screen panoramic video over a 10 Mbps bandwidth, based on H.264/AVC with its high compression rate. The improved algorithm divides the screen into cells smaller than $256{\times}256$, encodes them, and decodes only the cells in the current view. Encoding and decoding are processed in parallel, one cell unit at a time. Also, since only the cells included in the current view are packed and transmitted, experiments show that processing is possible without extricating blocks.
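
The cell-based idea in this abstract can be illustrated with a minimal Python sketch. It assumes a fixed cell edge of 256 pixels, a rectangular viewport that does not wrap around the panorama, and a stand-in `decode_cell` function; all of these names and values are hypothetical and do not come from the paper, which uses H.264/AVC for the actual coding.

```python
from concurrent.futures import ThreadPoolExecutor

CELL = 256  # assumed cell edge in pixels, chosen only for illustration


def cells_in_view(view_x, view_y, view_w, view_h, cell=CELL):
    """Return (row, col) indices of every cell that overlaps the viewport."""
    first_col, last_col = view_x // cell, (view_x + view_w - 1) // cell
    first_row, last_row = view_y // cell, (view_y + view_h - 1) // cell
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


def decode_cell(index):
    """Hypothetical stand-in for decoding one independently coded cell."""
    row, col = index
    return f"decoded cell r{row} c{col}"


def decode_view(view_x, view_y, view_w, view_h):
    """Decode only the visible cells, one worker task per cell."""
    cells = cells_in_view(view_x, view_y, view_w, view_h)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(decode_cell, cells))
```

Because each cell is handled independently, the set of cells to transmit and decode depends only on the current viewport, which is what makes per-cell parallelism possible in a scheme of this kind.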

Performance analysis on the complexity of turbo code with short frame sizes (프레임 크기가 작은 터보 코드의 복잡도에 대한 성능 분석)

  • Kim, Yeun-Goo;Ko, Young-Hoon;Kim, Nam
    • The Journal of Korean Institute of Communications and Information Sciences / v.24 no.7A / pp.1046-1051 / 1999
  • It is well known that parallel concatenated convolutional codes (turbo codes) perform well for long block sizes. This work analyzes the performance of turbo codes for voice and control frames with short frame sizes in future mobile communication systems. It also compares turbo codes and convolutional codes at similar decoding complexity for speech/control frames, and considers their applicability to such a system. As a result, turbo codes with short frame sizes show a BER performance of $10^{-3}$ or more over 3 iterations in the future mobile communication system. However, at a BER of $10^{-3}$ and equal complexity, the rate 1/2 turbo code with K = 5 performs better than the convolutional code with K = 9 at low $E_b/N_0$, and the turbo code with K = 3 is superior to the convolutional code with K = 7. Rate 1/3 turbo codes with K = 3 and K = 5 perform similarly to the rate 1/2 turbo code.


Design of Viterbi Decoder for Wireless LAN (무선 LAN용 비터비 복호기의 효율적인 설계)

  • 정인택;송상섭
    • Journal of the Korea Institute of Information and Communication Engineering / v.5 no.1 / pp.61-66 / 2001
  • In this paper, we design a high-speed Viterbi decoding algorithm aimed at wireless LAN. Wireless LAN transmits data at rates of 6 to 54 Mbps. At such speeds, a Viterbi decoder is not easy to implement with a single ACS unit, so a parallel ACS butterfly structure is used and several time-dependent problems must be solved. We simulate the Viterbi algorithm using a new branch metric calculation method to save time, and consider a trace-back algorithm suited to a high-speed Viterbi decoder. Based on the simulation, we determine the structure of the Viterbi decoder. This architecture is applicable to high-speed, low-power Viterbi decoders.
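
To make the ACS butterfly mentioned in this abstract concrete, here is a minimal Python sketch of one trellis step built from add-compare-select operations. It assumes the common shift-register state numbering in which states `j` and `j + N/2` both feed states `2j` and `2j + 1`; the function names and the `branch_metric` callback are hypothetical, and the paper's new branch metric method is not reproduced here.

```python
def acs(pm_a, bm_a, pm_b, bm_b):
    """Add-compare-select: return (survivor metric, decision bit)."""
    cand_a = pm_a + bm_a  # add
    cand_b = pm_b + bm_b
    # compare and select the smaller candidate; decision 0 means path a
    return (cand_a, 0) if cand_a <= cand_b else (cand_b, 1)


def acs_step(path_metrics, branch_metric):
    """Advance all path metrics by one trellis step.

    branch_metric(prev_state, next_state) is a caller-supplied function.
    Each iteration of the outer loop is one butterfly and is independent
    of the others, so hardware could evaluate them in parallel.
    """
    n = len(path_metrics)
    half = n // 2
    new_metrics = [0] * n
    decisions = [0] * n
    for j in range(half):
        a, b = j, j + half  # the two predecessor states of this butterfly
        for bit in (0, 1):
            nxt = 2 * j + bit
            new_metrics[nxt], decisions[nxt] = acs(
                path_metrics[a], branch_metric(a, nxt),
                path_metrics[b], branch_metric(b, nxt))
    return new_metrics, decisions
```

The stored decision bits are what a trace-back stage would later read to recover the surviving path.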


Turbo Coded MIMO System with Adaptive Turbo Space-Time Processing for High-Speed Wireless Communications (고속 무선 통신을 위한 적응형 터보 시공간 처리를 갖는 터보 부호화된 다중 입출력 시스템)

  • 조동균;김상준;박주남;황금찬
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.9C / pp.843-850 / 2003
  • Turbo coding and turbo processing are known as methods that approach the Shannon limit in wireless MIMO communications, as they do in wireless single-antenna communications. Iterative processing can maximize the mutual effect of coding and interference cancellation, but turbo coding has not been used for turbo processing because of its inherent decoding delay. This paper proposes a turbo coded MIMO system with adaptive turbo parallel space-time (Turbo-PAST) processing for high-speed wireless communications, together with an enhanced cyclic redundancy check (E-CRC) scheme as an efficient and simple a priori stopping criterion. Simulation results show that Turbo-PAST outperforms the conventional system by 1.3 dB and that the proposed E-CRC scheme effectively reduces the average number of turbo processing iterations.
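
The abstract does not describe how E-CRC works, so the following Python sketch shows only the plain, widely used idea of stopping an iterative decoder once a CRC check passes. The frame layout (payload followed by a 4-byte CRC-32), the function names, and the `estimates` iterable standing in for successive decoder outputs are all assumptions made for illustration.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload (assumed frame layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Check whether the trailing 4 bytes match the CRC-32 of the payload."""
    payload, tail = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tail


def decode_with_crc_stop(estimates, max_iters=8):
    """Consume one hard-decision frame per iteration and stop early.

    `estimates` is a hypothetical iterable that yields the decoder's
    current frame estimate after each iteration.
    Returns (last frame seen, number of iterations used).
    """
    used, frame = 0, None
    for used, frame in enumerate(estimates, start=1):
        if crc_ok(frame) or used >= max_iters:
            break
    return frame, used
```

With a criterion of this kind, frames that converge quickly cost fewer iterations, which is how a stopping rule lowers the average iteration count.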

Image Browse for JPEG Decoder

  • Chong, Ui-Pil
    • Journal of IKEEE / v.2 no.1 s.2 / pp.96-100 / 1998
  • Due to the expected widespread use of DCT-based image/video coding standards, it is advantageous to process data directly in the DCT domain rather than decoding the source back to the spatial domain. The block processing algorithm provides a parallel processing method, since multiple input data are processed in the block filter structure, and is therefore well suited to fast implementation. In this paper, we propose JPEG browsing by Block Transform Domain Filtering (BTDF) using subband filter banks. Instead of decompressing the entire image from the compressed format to retrieve it at full resolution, a user can select the required level of expansion $(2^N{\times}2^N)$. This approach also reduces CPU time by reducing the number of multiplications through BTDF in the filter banks.


Molecular genetic decoding of malformations of cortical development

  • Lim, Jae Seok;Lee, Jeong Ho
    • Journal of Genetic Medicine / v.12 no.1 / pp.12-18 / 2015
  • Malformations of cortical development (MCD) cover a broad spectrum of developmental disorders that cause various clinical manifestations, including epilepsy, developmental delay, and intellectual disability. MCD have been clinically classified based on the disruption of developmental processes such as proliferation, migration, and organization. Molecular genetic studies of MCD have improved our understanding of these disorders at a molecular level beyond the clinical classification. These recent advances have resulted from the development of massively parallel sequencing technology, also known as next-generation sequencing (NGS), which has allowed researchers to uncover novel molecular genetic pathways associated with inherited or de novo mutations. Although an increasing number of disease-related genes and genetic variations have been identified, genotype-phenotype correlation is hampered when the biological or pathological functions of the identified genetic variations are not fully understood. To elucidate the causality of genetic variations, in vivo disease models that reflect these variations are required. In this review, we describe the use of NGS technology to identify genes involved in MCD, and discuss how the functions of these genes can be validated through in vivo disease modeling.

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Son Jongmok;Kwon Hongseok;Kim Siho;Bae Keunsung
    • Proceedings of the IEEK Conference / summer / pp.391-394 / 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle a vocabulary of several hundred words as well as speaker independence. First, we develop an HMM-based speech recognition system on the PC that operates on a frame basis, with parallel processing of feature extraction and Viterbi decoding to make the processing delay as small as possible. Techniques such as linear discriminant analysis, state-based Gaussian selection, and a phonetic tied mixture model are employed to reduce the computational burden and memory size. The system is then optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486 kbytes of memory for data and acoustic models, and 24.5 kbytes for program code. A maximum required time of 29.2 ms to process a 32 ms frame of speech validates real-time operation of the implemented system.


Design and Performance Evaluation of Multilevel LDPC Codes (다중 레벨 LDPC 부호의 설계 및 성능 분석)

  • ;Yu Yi;Jia Hou
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.15 no.1 / pp.51-59 / 2004
  • We design multilevel coding (MLC) with a semi bit-interleaved coded modulation (BICM) scheme based on low-density parity-check (LDPC) codes. Unlike traditional designs, we join MLC and BICM by using Gray mapping, which can transmit multimedia data over several equivalent channels with different code rates. To achieve good performance at a signal-to-noise ratio (SNR) very close to the capacity of the additive white Gaussian noise (AWGN) channel, a random regular LDPC code and a simple semi-algebra LDPC (SA-LDPC) code are discussed in MLC with parallel independent decoding (PID). Finally, the numerical results demonstrate that the proposed scheme can achieve both power and bandwidth efficiency for multimedia communication systems.

On the Hardware Complexity of Tree Expansion in MIMO Detection

  • Kong, Byeong Yong;Lee, Youngjoo;Yoo, Hoyoung
    • Journal of Semiconductor Engineering / v.2 no.3 / pp.136-141 / 2021
  • This paper analyzes tree expansion for multiple-input multiple-output (MIMO) detection from the viewpoint of hardware implementation. Tree expansion, performed on every visit to a node while traversing the detection tree, calculates the path metrics of the child nodes. Accordingly, the tree-expansion unit (TEU), which is responsible for this task, is an essential component of a MIMO detector. Despite its paramount importance, the analyses of TEUs in the literature are not thorough enough. We therefore further investigate the hardware complexity of TEUs to suggest a guideline for selection. In this paper, we focus on two major ways to implement the TEU: 1) a fully parallel realization; 2) a transformation of the formulae followed by common subexpression elimination (CSE). For a logical comparison, the numbers of multipliers and adders are first enumerated. To evaluate them in a more practical manner, the TEUs are implemented in a 65-nm CMOS process, and their propagation delays, gate counts, and power consumption are measured explicitly. Considering the target specification of a MIMO system together with the implementation results, one can choose which architecture to adopt in realizing a detector.
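
As a rough illustration of what a tree-expansion step computes, the Python sketch below uses the textbook path-metric update for tree-search MIMO detection after QR decomposition: each child metric is the parent metric plus the squared distance between the interference-cancelled observation and the scaled candidate symbol. The function name `expand_node`, the QPSK constellation, and the argument layout are assumptions for illustration; the paper's transformed formulae and CSE optimization are not shown.

```python
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]  # assumed constellation


def expand_node(parent_metric, y_i, r_ii, r_off, decided, constellation=QPSK):
    """Compute the path metric of every child of one tree node.

    y_i     : entry of the rotated received vector for the current layer
    r_ii    : diagonal entry of the upper-triangular matrix R for this layer
    r_off   : off-diagonal entries of R in this row (already-decided layers)
    decided : symbols already chosen along the path to this node
    """
    # Cancel the contribution of symbols decided in earlier layers.
    interference = sum(r * s for r, s in zip(r_off, decided))
    residual = y_i - interference
    # A fully parallel unit would evaluate all candidates at once.
    return [(s, parent_metric + abs(residual - r_ii * s) ** 2)
            for s in constellation]
```

The multiplications inside `interference` and `r_ii * s` are the kind of operations whose count drives the multiplier and adder totals compared in the abstract.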

GPU-Based ECC Decode Unit for Efficient Massive Data Reception Acceleration

  • Kwon, Jisu;Seok, Moon Gi;Park, Daejin
    • Journal of Information Processing Systems / v.16 no.6 / pp.1359-1371 / 2020
  • In transmitting and receiving a large amount of data, reliable data communication is crucial for the normal operation of a device and for preventing abnormal operations caused by errors. Therefore, this paper assumes that an error correction code (ECC), which can detect and correct errors by itself, is used in an environment where massive data is received sequentially. Because an embedded system has limited resources, such as a low-performance processor or a small memory, its applications must operate efficiently. In this paper, we propose an accelerated ECC-decoding technique that uses a graphics processing unit (GPU) built into the embedded system when receiving a large amount of data. In the matrix-vector multiplication that forms the Hamming code used as a function of the ECC operation, the matrix is expressed in compressed sparse row (CSR) format, and a sparse matrix-vector product is used. The multiplication is performed in the kernel of the GPU, and the Hamming code computation is also accelerated so that the ECC operation can be performed in parallel. The proposed technique is implemented with CUDA on a GPU-embedded target board, the NVIDIA Jetson TX2, and its execution time is compared with that of the CPU.
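
The CSR-based matrix-vector product described in this abstract can be sketched on the CPU in plain Python. The example assumes the small Hamming(7,4) code with a parity-check matrix whose columns are the binary representations of 1 to 7; the code size, the array names, and the functions `syndrome` and `correct` are illustrative choices, not the paper's CUDA implementation.

```python
# Parity-check matrix H of the assumed Hamming(7,4) code in CSR form.
# All nonzero values are 1, so only the row pointers and column indices
# are stored.
INDPTR = [0, 4, 8, 12]
INDICES = [0, 2, 4, 6,   # row 0: bit positions 1, 3, 5, 7
           1, 2, 5, 6,   # row 1: bit positions 2, 3, 6, 7
           3, 4, 5, 6]   # row 2: bit positions 4, 5, 6, 7


def syndrome(word):
    """Sparse matrix-vector product H * word over GF(2)."""
    out = []
    for row in range(len(INDPTR) - 1):  # a GPU could assign one thread per row
        acc = 0
        for k in range(INDPTR[row], INDPTR[row + 1]):
            acc ^= word[INDICES[k]]  # addition modulo 2
        out.append(acc)
    return out


def correct(word):
    """Fix at most one flipped bit in a 7-bit received word."""
    s = syndrome(word)
    position = s[0] + 2 * s[1] + 4 * s[2]  # 1-based error position, 0 if none
    fixed = list(word)
    if position:
        fixed[position - 1] ^= 1
    return fixed
```

Since each row of the product and each received word can be processed independently, this computation maps naturally onto many parallel threads.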