• Title/Summary/Keyword: Parallel Operation Processing (병렬 연산 처리)


Hardware Design of In-loop Filter for High Performance HEVC Encoder (고성능 HEVC 부호기를 위한 루프 내 필터 하드웨어 설계)

  • Park, Seungyong;Im, Junseong;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.2
    • /
    • pp.335-342
    • /
    • 2016
  • This paper proposes an efficient hardware structure of an in-loop filter for a high-performance HEVC (High Efficiency Video Coding) encoder. HEVC uses an in-loop filter, consisting of a deblocking filter and SAO (Sample Adaptive Offset), to improve the picture quality of the reconstructed image degraded by quantization error. However, the in-loop filter increases complexity because of the additional operations required in both the encoder and the decoder. The proposed in-loop filter is implemented as a three-stage pipeline to perform the deblocking filtering and SAO operations with a reduced number of cycles. The proposed deblocking filter is itself implemented as a six-stage pipeline to improve efficiency and applies a new filtering order for an efficient memory architecture. The proposed SAO processes six pixels in parallel to reduce execution cycles. The proposed in-loop filter encoder architecture is designed in Verilog HDL and implemented with 131K logic gates in a TSMC 0.13 μm process. At 164 MHz, it can support real-time 4K Ultra HD video encoding at 60 fps.
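As context for the SAO stage described above, the following is a minimal Python sketch of HEVC's SAO edge-offset pixel classification, the per-pixel operation that such hardware replicates several times per cycle. It is a hypothetical software model for illustration, not the paper's Verilog design.

```python
def sign(x):
    # three-valued sign used by the SAO edge classifier
    return (x > 0) - (x < 0)

def sao_edge_categories(row):
    """Classify interior pixels of one row (horizontal edge class).

    HEVC category mapping: 1 = local minimum, 2 = concave corner,
    0 = flat/monotonic, 3 = convex corner, 4 = local maximum.
    """
    cats = []
    for i in range(1, len(row) - 1):
        s = sign(row[i] - row[i - 1]) + sign(row[i] - row[i + 1])
        cats.append({-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[s])
    return cats

print(sao_edge_categories([10, 8, 9, 9, 12, 11, 11, 13]))  # [1, 3, 2, 4, 2, 2]
```

A hardware version would instantiate this comparison network six times so that six classifications complete in one cycle, which is how the cycle count is reduced.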

Benchmark Results of a Monte Carlo Treatment Planning System (몬테카를로 기반 치료계획시스템의 성능평가)

  • Cho, Byung-Chul
    • Progress in Medical Physics
    • /
    • v.13 no.3
    • /
    • pp.149-155
    • /
    • 2002
  • Recent advances in radiation transport algorithms, computer hardware performance, and parallel computing make the clinical use of Monte Carlo based dose calculations possible. To compare the speed and accuracy of dose calculations between different codes, benchmark tests were proposed at the XIIth ICCR (International Conference on the Use of Computers in Radiation Therapy, Heidelberg, Germany, 2000). A Monte Carlo treatment planning system comprising 28 Intel Pentium CPUs was implemented for routine clinical use. The purpose of this study was to evaluate the performance of our system using these benchmark tests. The benchmark consists of three parts: a) speed of photon beam dose calculation, within 2% statistical uncertainty, inside a given phantom 30.5 cm × 39.5 cm × 30 cm deep filled with 5 mm³ voxels; b) speed of electron beam dose calculation inside the same phantom; and c) accuracy of photon and electron beam calculations inside a heterogeneous slab phantom, compared with reference EGS4/PRESTA results. In the speed benchmark, it took 5.5 minutes to reach less than 2% statistical uncertainty for 18 MV photon beams. Although the net calculation for electron beams was an order of magnitude faster than for photon beams, the overall calculation time was similar because of the overhead required to maintain parallel processing. Since our Monte Carlo code is EGSnrc, an improved version of EGS4, the accuracy tests showed, as expected, very good agreement with the reference data. In conclusion, our Monte Carlo treatment planning system produces clinically meaningful results. Although more efficient codes such as MCDOSE and VMC++ have been developed, BEAMnrc, based on the EGSnrc code system, may be used for routine clinical Monte Carlo treatment planning in conjunction with clustering techniques.
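The 2% stopping criterion above is typically evaluated with batch statistics: the histories are split into batches, and the spread of the batch means gives the relative uncertainty of the dose estimate. Below is a hypothetical, self-contained Python sketch of that bookkeeping; the exponential "dose per history" is a stand-in for illustration, not a transport model.

```python
import math
import random

def batch_uncertainty(n_batches=10, histories_per_batch=100_000):
    """Estimate (mean dose, relative statistical uncertainty) from batches."""
    batch_means = []
    for _ in range(n_batches):
        # stand-in "dose per history": an exponential deposit, illustrative only
        total = sum(random.expovariate(1.0) for _ in range(histories_per_batch))
        batch_means.append(total / histories_per_batch)
    mean = sum(batch_means) / n_batches
    var = sum((m - mean) ** 2 for m in batch_means) / (n_batches * (n_batches - 1))
    return mean, math.sqrt(var) / mean        # standard error of mean / mean

mean, rel = batch_uncertainty()
print(f"dose = {mean:.4f}, relative uncertainty = {rel:.2%}")
```

Since the uncertainty shrinks as 1/sqrt(N), a run is simply extended until the estimate crosses the 2% threshold, which is what the 5.5-minute figure above measures for 18 MV photons.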


Trace-Back Viterbi Decoder with Sequential State Transition Control (순서적 역방향 상태천이 제어에 의한 역추적 비터비 디코더)

  • 정차근
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.40 no.11
    • /
    • pp.51-62
    • /
    • 2003
  • This paper presents novel survivor memory management and decoding techniques with sequential backward state-transition control for the trace-back Viterbi decoder. The Viterbi algorithm is a maximum likelihood decoding scheme that estimates the most likely encoder state sequence for channel error detection and correction. The scheme is applied to a broad range of digital communication problems such as intersymbol interference removal and channel equalization. To achieve an area-efficient, high-throughput VLSI design of the Viterbi decoder, whose operation is inherently recursive, a simple and systematic parallel ACS architecture and survivor memory management are required. To address this problem, this paper presents a progressive decoding algorithm with sequential backward state-transition control in the trace-back Viterbi decoder. Compared to conventional trace-back decoding techniques, the proposed method greatly reduces the total required memory. Furthermore, it can be implemented as a simple pipelined systolic-array architecture. No peripheral logic circuit for memory-access control is required, and the memory-access bandwidth is reduced. Therefore, the proposed method offers high area efficiency and low power consumption together with high throughput. Finally, decoding examples on received data with channel noise and an application result are provided to evaluate the efficiency of the proposed method.
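For orientation, here is a minimal Python model of conventional trace-back over a survivor memory, the baseline operation the paper reorganizes. It is a hypothetical sketch under an assumed state-numbering convention, not the paper's sequential control scheme.

```python
def trace_back(survivors, end_state, k):
    """Decode by walking a survivor memory backward.

    Assumed convention: next_state = ((state << 1) | input_bit) & (2**(k-1) - 1),
    so the input bit at time t is the LSB of the state after t, and
    survivors[t][s] stores the MSB shifted out when the ACS chose the
    winning path into state s.
    """
    state, bits = end_state, []
    for t in range(len(survivors) - 1, -1, -1):
        bits.append(state & 1)                    # recovered input bit at time t
        msb = survivors[t][state]                 # survivor bit from the ACS
        state = (state >> 1) | (msb << (k - 2))   # step to the predecessor state
    return bits[::-1]

# toy k = 2 (two-state) example: decodes the input sequence [0, 1, 1]
print(trace_back([[0, 0], [1, 0], [0, 1]], end_state=1, k=2))
```

The memory-management problem the paper attacks is visible here: every decoded bit requires a read of the full survivor history, so the access order and storage layout dominate the cost.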

Thermodynamics-Based Weight Encoding Methods for Improving Reliability of Biomolecular Perceptrons (생체분자 퍼셉트론의 신뢰성 향상을 위한 열역학 기반 가중치 코딩 방법)

  • Lim, Hee-Woong;Yoo, Suk-I.;Zhang, Byoung-Tak
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.12
    • /
    • pp.1056-1064
    • /
    • 2007
  • Biomolecular computing is a new computing paradigm that uses biomolecules such as DNA for information representation and processing. The huge number of molecules in a small volume and their innate massive parallelism inspired a novel computation method, and various computation models and molecular algorithms have been developed for problem solving. Meanwhile, the use of biomolecules for information processing supports the possibility of DNA computing as an application to biological problems; it has potential as an analysis tool for biochemical information such as gene expression patterns. In this context, a DNA computing-based model of a biomolecular perceptron was previously proposed together with the results of its experimental implementation. The weight encoding and weighted-sum operation, the main components of a biomolecular perceptron, are based on competitive hybridization reactions between the input molecules and weight-encoding probe molecules. However, that scheme assumes thermodynamic symmetry among the competitive hybridizations, so the weight representation can be erroneous depending on the probe species in use. Here we suggest a generalized model of hybridization reactions that accounts for asymmetric thermodynamics in competitive hybridizations, and we present a weight encoding method for the reliable implementation of a biomolecular perceptron based on this model. We compare the accuracy of our weight encoding method with that of the previous one via computer simulations and present the condition on probe composition required to satisfy the error limit.
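To make the asymmetry issue concrete, here is a hypothetical toy model in Python: two probe species compete for an input strand, and when their binding affinities differ (k1 ≠ k2) the captured fractions are no longer proportional to concentrations alone, so the concentrations must be corrected. The simple competition law and all names are illustrative assumptions, not the paper's full thermodynamic model.

```python
def bound_fractions(c1, c2, k1, k2):
    """Fractions of input captured by each probe in simple competition."""
    a1, a2 = k1 * c1, k2 * c2
    return a1 / (a1 + a2), a2 / (a1 + a2)

def concentrations_for_weight(w, k1, k2, total=1.0):
    """Pick probe concentrations so probe 1 captures fraction w of the input,
    compensating for the affinity asymmetry (k1 != k2)."""
    # solve k1*c1 / (k1*c1 + k2*c2) = w  with  c1 + c2 = total
    c1 = w * k2 * total / (k1 * (1 - w) + k2 * w)
    return c1, total - c1

c1, c2 = concentrations_for_weight(0.3, k1=2.0, k2=1.0)
print(round(bound_fractions(c1, c2, 2.0, 1.0)[0], 3))  # 0.3 despite k1 != k2
```

Under the symmetric assumption (k1 = k2) the correction disappears and the concentrations alone encode the weight, which is exactly the case the earlier scheme relied on.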

40Gb/s Forward Error Correction Architecture for Optical Communication System (광통신 시스템을 위한 40Gb/s Forward Error Correction 구조 설계)

  • Lee, Seung-Beom;Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.2
    • /
    • pp.101-111
    • /
    • 2008
  • This paper introduces a high-speed Reed-Solomon (RS) decoder with reduced hardware complexity and presents an RS decoder-based FEC architecture for 40 Gb/s optical communication systems. We introduce a new pipelined degree-computationless modified Euclidean (pDCME) algorithm architecture with high throughput and low hardware complexity. The proposed 16-channel RS FEC architecture consists of two 8-channel RS FEC blocks, each with eight syndrome computation blocks and a single shared KES block. It reduces hardware complexity by about 30% compared to the conventional 16-channel 3-parallel FEC architecture, which uses four syndrome computation blocks and a single shared KES block. The proposed RS FEC architecture has been designed and implemented in 0.18 μm CMOS technology with a 1.8 V supply voltage. The results show a total gate count of 250K and a data processing rate of 5.1 Gb/s at a clock frequency of 400 MHz. The proposed area-efficient architecture can be readily applied to next-generation FEC devices for high-speed optical communications as well as wireless communications.
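As background for the syndrome computation blocks mentioned above, this is a minimal Python sketch of RS syndrome evaluation over GF(2^8), assuming the common RS(255, 239) parameters of optical-link FEC. It illustrates the standard computation feeding the key-equation solver, not the paper's pDCME architecture itself.

```python
PRIM = 0x11D  # GF(2^8) reduction polynomial: x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    # carry-less multiply with modular reduction in GF(2^8)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r

def syndromes(received, n_parity=16):
    """S_i = r(alpha^i) for i = 1..n_parity; all zero iff r(X) is a codeword.

    `received` lists coefficients from the highest degree down (Horner order);
    alpha = 2 (the element x) is primitive for this polynomial.
    """
    out = []
    alpha_i = 1
    for _ in range(n_parity):
        alpha_i = gf_mul(alpha_i, 2)      # advance to the next power of alpha
        s = 0
        for c in received:                # Horner evaluation at alpha^i
            s = gf_mul(s, alpha_i) ^ c
        out.append(s)
    return out

print(all(s == 0 for s in syndromes([0] * 255)))  # True: zero word is a codeword
```

Per the abstract, each channel keeps its own syndrome stage of this kind while the downstream KES block is shared, which is where the 30% complexity saving comes from.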

The Motion Estimator Implementation with Efficient Structure for Full Search Algorithm of Variable Block Size (다양한 블록 크기의 전역 탐색 알고리즘을 위한 효율적인 구조를 갖는 움직임 추정기 설계)

  • Hwang, Jong-Hee;Choe, Yoon-Sik
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.11
    • /
    • pp.66-76
    • /
    • 2009
  • Motion estimation occupies the largest part of the computation in a video encoding system, so a motion estimator with an efficient structure is required for real-time operation, preferably as a dedicated hardware module that performs the encoding process at high speed. This paper proposes a motion estimation detection block (MED), a block computing the 41 SADs (Sums of Absolute Differences), and a minimum-SAD calculation and motion vector generation block, all based on parallel processing, which effectively reduces the amount of computation. The minimum-SAD calculation and MED blocks use a pre-computation technique to reduce the switching activity of the input signals, enabling high-speed operation. The MED and 41-SAD calculation blocks are composed of adder trees, which create the critical path; the proposed structure therefore replaces the commonly used ripple carry adder (RCA) in the adder tree with a carry skip adder (CSA), enabling the tree to operate at high speed. In addition, key variables such as the search-range control signal can easily be controlled from outside, increasing the efficiency of the hardware structure. Simulation and FPGA verification results show that the delay of the MED block, which generates the critical path of the motion estimator, is reduced by about 19.89% compared with the conventional structure.
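The count of 41 SADs comes from the variable block sizes of full-search motion estimation: sixteen 4x4 SADs are computed once and merged by an adder tree into the 4x8, 8x4, 8x8, 8x16, 16x8, and 16x16 sums (16+8+8+4+2+2+1 = 41). A hypothetical Python model of that reuse follows; it shows the combining structure, not the paper's RCA/CSA circuit.

```python
def sad4x4(cur, ref, y0, x0):
    # SAD of one 4x4 sub-block at (y0, x0)
    return sum(abs(cur[y0 + y][x0 + x] - ref[y0 + y][x0 + x])
               for y in range(4) for x in range(4))

def all_41_sads(cur, ref):
    """All 41 variable-block-size SADs of a 16x16 block from 16 base SADs."""
    s44 = {(y, x): sad4x4(cur, ref, 4 * y, 4 * x)
           for y in range(4) for x in range(4)}                       # 16 of 4x4
    s48 = {(y, x): s44[y, 2 * x] + s44[y, 2 * x + 1]
           for y in range(4) for x in range(2)}                       # 8 of 4x8
    s84 = {(y, x): s44[2 * y, x] + s44[2 * y + 1, x]
           for y in range(2) for x in range(4)}                       # 8 of 8x4
    s88 = {(y, x): s48[2 * y, x] + s48[2 * y + 1, x]
           for y in range(2) for x in range(2)}                       # 4 of 8x8
    s8x16 = {y: s88[y, 0] + s88[y, 1] for y in range(2)}              # 2 of 8x16
    s16x8 = {x: s88[0, x] + s88[1, x] for x in range(2)}              # 2 of 16x8
    s16x16 = s8x16[0] + s8x16[1]                                      # 1 of 16x16
    return s44, s48, s84, s88, s8x16, s16x8, s16x16

cur = [[(3 * x + 5 * y) % 17 for x in range(16)] for y in range(16)]
ref = [[0] * 16 for _ in range(16)]
parts = all_41_sads(cur, ref)
print(sum(len(p) if isinstance(p, dict) else 1 for p in parts))  # 41
```

In hardware, every addition in this merge hierarchy is a node of the adder tree, which is why the choice of adder (RCA vs. CSA) sets the critical path.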

A new warp scheduling technique for improving the performance of GPUs by utilizing MSHR information (GPU 성능 향상을 위한 MSHR 정보 기반 워프 스케줄링 기법)

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.13 no.3
    • /
    • pp.72-83
    • /
    • 2017
  • GPUs can provide high throughput with latency hiding by executing many warps in parallel. The MSHRs (Miss Status Holding Registers) of the L1 data cache track cache miss requests until the required data is serviced from lower-level memory. In recent GPUs, excessive requests for cache resources cause underutilization of GPU resources due to cache resource reservation failures. In this paper, we propose a new warp scheduling technique to reduce stall cycles under MSHR resource shortage. The cache miss rate of each warp is predicted based on the observation that a warp shows a similar miss rate over a long period. Warps showing low miss rates, as well as computation-intensive warps, are given high issue priority when the MSHRs are full. Our proposal improves GPU performance by using cache resources more efficiently, based on cache miss rate prediction and monitoring of the MSHR entries. According to our experimental results, reservation fail cycles are reduced by 25.7% and IPC is increased by 6.2% with the proposed scheduling technique compared to the loose round-robin (LRR) scheduler.
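A hypothetical software model of this policy is sketched below: warps accumulate a miss history, and when the MSHRs are full the ready warp with the lowest predicted miss rate is issued; otherwise a loose round-robin order is followed. Class and method names are invented for illustration; this is not the simulator code used in the paper.

```python
from collections import deque

class MissAwareScheduler:
    def __init__(self, warp_ids, history=64):
        self.order = deque(warp_ids)                    # round-robin baseline order
        self.hist = {w: deque(maxlen=history) for w in warp_ids}

    def record(self, warp, was_miss):
        self.hist[warp].append(was_miss)                # True for miss, False for hit

    def miss_rate(self, warp):
        h = self.hist[warp]
        return sum(h) / len(h) if h else 0.0            # predicted from recent history

    def next_warp(self, ready, mshr_full):
        if mshr_full and ready:
            # MSHRs exhausted: issue the warp least likely to add a miss request
            return min(ready, key=self.miss_rate)
        for _ in range(len(self.order)):                # loose round-robin otherwise
            self.order.rotate(-1)
            if self.order[0] in ready:
                return self.order[0]
        return None

sched = MissAwareScheduler([0, 1, 2, 3])
sched.record(1, True)
sched.record(2, False)
print(sched.next_warp(ready={1, 2}, mshr_full=True))    # 2: lower predicted miss rate
```

The key design point matches the abstract: the priority rule only takes over while the MSHRs are full, so the baseline scheduler's fairness is preserved when cache resources are available.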

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from computer system inspection and process optimization to customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amounts of log data of banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, to gather, store, categorize, and analyze the log data generated while processing a client's business, a separate log data processing system needs to be established. However, in existing computing environments it is difficult to realize the flexible storage expansion needed to process massive amounts of unstructured log data and to execute the considerable number of functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to process using the analysis tools and management systems of existing computing infrastructure. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of massive amounts of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions that keep the system operating after it recovers from a malfunction. Finally, by establishing a distributed database using NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data, and their strict schemas make it difficult to expand nodes by distributing the stored data when the amount of data increases rapidly. NoSQL does not provide the complex computations that relational databases offer, but it can easily expand through node dispersion when data grow rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. NoSQL data models are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a free schema structure, is used in the proposed system. MongoDB is adopted because its flexible schema makes it easy to process unstructured log data, it facilitates flexible node expansion when the amount of data increases rapidly, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis performed by the MongoDB module, the Hadoop-based analysis module, and the MySQL module, per analysis time and per type of aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in a graph according to the user's analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance against a system that uses only MySQL demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through MongoDB insert-performance evaluations over various chunk sizes.
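As a flavor of the document-oriented storage path, here is a minimal, hypothetical sketch of a log collector's routing step using pymongo. It assumes a local MongoDB instance; the database, collection, and field names are invented, and the real-time branch is a placeholder for the MySQL module.

```python
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
logs = client["banklogs"]["raw"]                 # schema-free document store
logs.create_index([("type", 1), ("ts", 1)])      # per-type time-range queries

def handle_realtime(entry: dict):
    print("realtime:", entry)                    # placeholder for the MySQL path

def collect(entry: dict):
    """Classify one log entry by type and route it to the proper store."""
    entry.setdefault("ts", datetime.now(timezone.utc))
    if entry.get("type") == "realtime":
        handle_realtime(entry)                   # real-time analysis path
    else:
        logs.insert_one(entry)                   # any fields allowed per document

collect({"type": "transaction", "branch": "A01", "msg": "transfer ok"})
```

Because each document may carry arbitrary fields, new log formats need no schema migration, which is the flexibility argument the abstract makes against a MySQL-only design.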

An SoC-based Context-Aware System Architecture (SoC 기반 상황인식 시스템 구조)

  • Sohn, Bong-Ki;Lee, Keon-Myong;Kim, Jong-Tae;Lee, Seung-Wook;Lee, Ji-Hyong;Jeon, Jae-Wook;Cho, Jun-Dong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.4
    • /
    • pp.512-516
    • /
    • 2004
  • Context-aware computing has been attracting attention as an approach to alleviating the inconvenience of human-computer interaction. This paper proposes a context-aware system architecture to be implemented on an SoC (System-on-a-Chip). The proposed architecture supports sensor abstraction, a notification mechanism for context changes, modular development, easy service composition using if-then rules, and flexible context-aware service implementation. It consists of the communication unit, the processing unit, the blackboard, and the rule-based system unit; the first three components reside in the microprocessor part of the SoC, and the rule-based system unit is implemented in hardware. For the proposed architecture, an SoC system has been designed and tested on the SystemC SoC development platform, and the feasibility of the behavior modules for the microprocessor part has been evaluated by implementing software modules on a conventional computer platform. This SoC-based context-aware system architecture is intended for mobile intelligent robots that assist elderly people at home in a context-aware manner.
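The blackboard-plus-rules pattern described above can be sketched in a few lines of Python. The sketch is purely illustrative: the paper implements the rule engine in SoC hardware, and all names here are invented.

```python
class Blackboard:
    """Shared context store with if-then rules fired on every update."""

    def __init__(self):
        self.ctx, self.rules = {}, []

    def on(self, condition, action):
        self.rules.append((condition, action))   # register an if-then rule

    def post(self, key, value):
        self.ctx[key] = value                    # sensor abstraction writes here
        for cond, act in self.rules:             # notification on context change
            if cond(self.ctx):
                act(self.ctx)

bb = Blackboard()
bb.on(lambda c: c.get("room") == "kitchen" and c.get("stove") == "on",
      lambda c: print("reminder: stove is on"))
bb.post("room", "kitchen")
bb.post("stove", "on")                           # -> fires the reminder rule
```

Keeping rules as independent condition/action pairs is what makes service composition easy: adding a new context-aware service is just registering another rule against the same blackboard.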

A Chosen Plaintext Linear Attack on Block Cipher CIKS-1 (CIKS-1 블록 암호에 대한 선택 평문 선형 공격)

  • 이창훈;홍득조;이성재;이상진;양형진;임종인
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.13 no.1
    • /
    • pp.47-57
    • /
    • 2003
  • In this paper, we first evaluate the resistance of the reduced 5-round version of the block cipher CIKS-1 against linear cryptanalysis (LC) and show that we can attack full-round CIKS-1 with its 256-bit key through the canonical extension of our attack. A feature of CIKS-1 is the use of both data-dependent permutations (DDP) and internal key scheduling, which consists of data-dependent transformations of the round subkeys. Taking into account the structure of CIKS-1, we investigate linear approximations. That is, we consider 16 linear approximations with p = 3/4 for the 16 parallel modulo $2^2$ additions to construct a one-round linear approximation, and derive a one-round linear approximation with probability $P = 1/2 + 2^{-17}$ by the Piling-up lemma. We then present a 3-round linear approximation with probability $1/2 + 2^{-17}$ using this one-round approximation, and attack the reduced 5-round CIKS-1 with 64-bit blocks by LC. In conclusion, our attack requires $2^{38}$ chosen plaintexts with a success probability of 99.9% and about $2^{67.7}$ encryptions to recover the last round key (for the full-round CIKS-1, our attack requires about $2^{166}$ encryptions).
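The Piling-up step quoted above can be verified directly: combining n = 16 independent approximations, each with bias 1/4 (p = 3/4), gives a combined bias of 2^(n-1) * (1/4)^n = 2^15 * 2^-32 = 2^-17, i.e. overall probability 1/2 + 2^-17. A quick, illustrative Python check:

```python
from fractions import Fraction

n, eps = 16, Fraction(1, 4)                   # 16 approximations, bias 1/4 each
combined_bias = 2 ** (n - 1) * eps ** n       # Piling-up lemma
print(combined_bias == Fraction(1, 2 ** 17))  # True -> P = 1/2 + 2^-17
```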