• Title/Summary/Keyword: 병렬프로세서 (parallel processor)


Three-dimensional Wave Propagation Modeling using OpenACC and GPU (OpenACC와 GPU를 이용한 3차원 파동 전파 모델링)

  • Kim, Ahreum;Lee, Jongwoo;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.20 no.2 / pp.72-77 / 2017
  • We calculated 3D frequency- and Laplace-domain wavefields using time-domain modeling followed by a Fourier or Laplace transform. We adopted OpenACC and a GPU for efficient parallel calculation. OpenACC makes it easy to use GPU accelerators by adding directives to code written in conventional C, C++, or Fortran. Accordingly, one does not have to learn a GPGPU programming language such as CUDA or OpenCL to use a GPU. An OpenACC program allocates GPU memory, transfers data between the host CPU and the GPU device, and performs GPU operations either automatically or following user-defined directives. Through numerical tests, we compared the performance of 3D wave propagation modeling programs using OpenACC and a GPU with that of programs using a single CPU core. Results for a homogeneous model and the SEG/EAGE salt model show that the OpenACC programs are approximately 53 and 30 times faster, respectively, than the single-core CPU programs.
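
The abstract above obtains frequency- and Laplace-domain wavefields by transforming time-domain traces. The following is a minimal Python sketch of that transform step for a single trace, assuming a simple rectangle-rule sum. The function name `transform_trace` and the test signal are illustrative and do not come from the paper; the OpenACC/GPU time-domain modeling itself is not shown.

```python
import cmath
import math


def transform_trace(trace, dt, s):
    """Approximate U(s) = integral of u(t) * exp(-s * t) dt with a rectangle-rule sum.

    A real s (damping constant) gives a Laplace-domain value;
    s = 1j * omega gives a Fourier-domain value.
    """
    return sum(u * cmath.exp(-s * k * dt) for k, u in enumerate(trace)) * dt


# Illustrative trace: u(t) = exp(-2 t), whose exact transform is 1 / (s + 2).
dt = 0.001
trace = [math.exp(-2.0 * k * dt) for k in range(10000)]
laplace_value = transform_trace(trace, dt, 3.0)   # close to 1 / 5
fourier_value = transform_trace(trace, dt, 5j)    # close to 1 / (2 + 5j)
```

Because the sum is accumulated one time sample at a time, the same update could in principle be applied inside the time-stepping loop, so the full time history would not need to be stored.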

A Design of Multimedia Content Management through Cloud Computing Paradigm (클라우드 컴퓨팅 파라다임을 통한 멀티미디어 컨텐츠 관리 설계)

  • Tolentino, Randy;Kim, Yong-Tae;Jeong, Yoon-Su
    • Journal of Digital Convergence / v.10 no.11 / pp.343-349 / 2012
  • Usage control models are a new breed of access control models that allow the description of comprehensive policies for the usage of protected content. In this paradigm, decisions regarding access to objects are not limited to request time. They are coupled with the usage of the protected objects and become a continuous process carried out in parallel with the usage. The realization of usage control has been a long-standing research problem aimed at overcoming the loss of control in secure document dissemination. With the emergence of cloud computing, documents are stored in the cloud, and the document viewers and editors themselves reside in the cloud and are accessed from thin clients such as browsers. We note that such scenarios provide an ideal opportunity to realize usage control for securing the usage of documents based on the stakeholders' policies. In this paper, we propose Multimedia Content Management (MCM) for a better realization of multimedia content in cloud-based applications. We design a robust architecture that provides fine-grained control over the usage of protected objects through the emerging cloud computing paradigm. We present the design principles for this realization and discuss our proposed architecture.

Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units (범용 그래픽 처리 장치의 메모리 설계를 위한 그래픽 처리 장치의 메모리 특성 분석)

  • Choi, Hongjun;Kim, Cheolhong
    • Smart Media Journal / v.3 no.1 / pp.33-38 / 2014
  • Even though microprocessor performance improves continuously, the performance of the overall computing system has become hard to improve because of drawbacks such as increased power consumption. To solve this problem, general-purpose computing on graphics processing units (GPGPU), which executes general-purpose applications on graphics processing units (GPUs) used as specialized parallel-processing devices, has attracted attention. However, the characteristics of graphics applications differ substantially from those of general-purpose applications. Therefore, because of various constraints, GPUs cannot fully exploit their outstanding computational resources when executing general-purpose applications. When designing GPUs for GPGPU, the memory system is important for exploiting the GPU effectively, since general-purpose applications typically require more memory accesses than graphics applications. In particular, external memory accesses with long latency impose a large overhead on GPU performance. GPU performance is therefore expected to improve if a hierarchical memory architecture that reduces the number of external memory accesses is applied. For this reason, we analyze GPU performance under different hierarchical cache architectures while executing various benchmarks.

A Scalable Word-based RSA Cryptoprocessor with PCI Interface Using Pseudo Carry Look-ahead Adder (가상 캐리 예측 덧셈기와 PCI 인터페이스를 갖는 분할형 워드 기반 RSA 암호 칩의 설계)

  • Gwon, Taek-Won;Choe, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.8 / pp.34-41 / 2002
  • This paper describes a scalable implementation method for a word-based RSA cryptoprocessor using a pseudo carry look-ahead adder. The basic organization of the modular multiplier consists of two layers of carry-save adders (CSAs) and a reduced carry generation and propagation scheme, called the pseudo carry look-ahead adder, for high-speed final addition. The proposed modular multiplier does not need complicated shift and alignment blocks to generate the next word at each clock cycle. Therefore, the proposed architecture reduces the hardware resources and speeds up the modular computation. We implemented a single-chip 1024-bit RSA cryptoprocessor based on the word-based modular multiplier with 256 datapaths in 0.5 $\mu\text{m}$ SOG technology after verifying the proposed architecture on an FPGA with a PCI bus.
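
The two-layer carry-save structure mentioned in the abstract can be illustrated with a short Python sketch. It only shows the general carry-save idea on arbitrary-precision integers: the function names are hypothetical, and the final addition uses Python's ordinary `+` as a stand-in, since the paper's pseudo carry look-ahead adder is not specified in the abstract.

```python
def carry_save_add(a, b, c):
    """Reduce three non-negative integers to a (sum, carry) pair.

    Each bit position is handled independently, so no carry ripples
    through the word: a + b + c == partial_sum + carry.
    """
    partial_sum = a ^ b ^ c                       # bitwise sum without carries
    carry = ((a & b) | (b & c) | (a & c)) << 1    # majority bits, shifted one place left
    return partial_sum, carry


def add_four_operands(w, x, y, z):
    """Add four operands using two carry-save layers and one final addition."""
    s1, c1 = carry_save_add(w, x, y)      # first CSA layer: three operands to two
    s2, c2 = carry_save_add(s1, c1, z)    # second CSA layer: absorb the fourth operand
    return s2 + c2                        # final carry-propagating addition (stand-in)


total = add_four_operands(0xFFFF, 0x1234, 0xABCD, 0x0F0F)
```

Only the last step propagates carries across the full word, which is why a fast final adder matters in this kind of design.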

Efficient DSP Architecture For High-Quality Audio Algorithms (고음질 오디오 알고리즘을 위한 효율적인 DSP 설계)

  • Moon, Jong-Ha;SunWoo, Myung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.5 / pp.112-117 / 2007
  • This paper presents specialized DSP instructions and their hardware architecture for audio coding algorithms such as MPEG-2/4 Advanced Audio Coding (AAC), Dolby AC-3, and MPEG-2 Backward Compatible (BC). The proposed architecture is specially designed and optimized for the MDCT/IMDCT (Modified Discrete Cosine Transform and its inverse) and the Huffman decoding of the AAC decoding algorithm. Performance comparisons show a significant improvement over the TMS320C62x and ADSP-21060 for the MDCT/IMDCT computation. In addition, the dedicated Huffman decoding accelerator performs decoding and operand preparation in a single cycle. The proposed DPU (Data Processing Unit) consists of 107,860 gates and achieves 150 MIPS.
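
For readers unfamiliar with the MDCT/IMDCT pair that the proposed DSP accelerates, here is a direct O(N²) Python sketch of the textbook definitions. It is a reference formulation under the common 1/N inverse normalization, not the paper's instruction-level implementation; the function names and the sample signal are illustrative.

```python
import math


def mdct(frame):
    """Direct MDCT: 2N input samples produce N coefficients."""
    n_half = len(frame) // 2
    return [
        sum(x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Direct IMDCT: N coefficients produce 2N time-aliased samples."""
    n_half = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)) / n_half
        for n in range(2 * n_half)
    ]


# Two frames overlapping by N samples: adding the overlapped halves of the
# two inverse transforms cancels the aliasing and recovers the shared samples.
N = 4
signal = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 3.0, 1.0, -2.0, 0.0, 0.5, 1.25]
first = imdct(mdct(signal[0:2 * N]))
second = imdct(mdct(signal[N:3 * N]))
middle = [first[N + i] + second[i] for i in range(N)]
```

Practical codecs apply a window before and after the transform and use fast algorithms instead of this direct double loop, which is where dedicated DSP instructions pay off.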

Design of a Realtime Stereo Vision System using Adaptive Support-weight (적응적 영역 가중치를 이용한 실시간 스테레오 비전 시스템 설계)

  • Ryu, Donghoon;Park, Taegeun
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.11 / pp.90-98 / 2013
  • Stereo systems based on local matching are very popular because of their algorithmic simplicity, but their applicability is limited because low matching rates lead to poor quality. In this paper, we propose and design a realtime stereo system based on an adaptive support-weight; the system shows low error rates and realtime performance. In the adaptive support-weight algorithm, intermediate computing results generally cannot be reused to reduce the number of computations. In this research, we modify the scheduling so that intermediate results are reused, improving performance by processing rows and columns separately. Nonlinear functions such as the exponential and arc tangent are implemented with piecewise linear and step functions chosen through empirical simulations and error analysis. The proposed architecture is composed of 9 processing elements for realtime performance. The proposed stereo system has been designed and synthesized using the Dongbu HiTek 0.18 µm standard cell library and can run at an operating frequency of up to 350 MHz (33 frames per second) with 424K gates.
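
The piecewise-linear treatment of nonlinear functions described above can be sketched in Python as follows. This is a generic illustration with uniformly spaced breakpoints: the segment count, the input range, and the name `make_piecewise_linear` are assumptions, and the paper's actual breakpoints and step functions are not given in the abstract.

```python
import math


def make_piecewise_linear(func, x_min, x_max, segments):
    """Build a piecewise-linear approximation of func on [x_min, x_max]."""
    step = (x_max - x_min) / segments
    # Table of function values at the breakpoints (a small lookup table in hardware).
    knots = [func(x_min + i * step) for i in range(segments + 1)]

    def approx(x):
        x = min(max(x, x_min), x_max)                     # clamp to the table range
        i = min(int((x - x_min) / step), segments - 1)    # segment index
        t = (x - x_min) / step - i                        # position inside the segment, 0..1
        return knots[i] + t * (knots[i + 1] - knots[i])   # linear interpolation

    return approx


# Illustrative use: approximate exp(-x) on [0, 8] with 16 segments
# and measure the worst-case error on a fine grid.
exp_neg = make_piecewise_linear(lambda x: math.exp(-x), 0.0, 8.0, 16)
max_error = max(abs(exp_neg(i * 0.01) - math.exp(-i * 0.01)) for i in range(801))
```

Measuring `max_error` for different segment counts is one simple way to carry out the kind of error analysis the abstract refers to when trading table size against accuracy.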

Research on efficient HW/SW co-design method of light-weight cryptography using GEZEL (경량화 암호의 GEZEL을 이용한 효율적인 하드웨어/소프트웨어 통합 설계 기법에 대한 연구)

  • Kim, Sung-Gon;Kim, Hyun-Min;Hong, Seok-Hie
    • Journal of the Korea Institute of Information Security & Cryptology / v.24 no.4 / pp.593-605 / 2014
  • In this paper, we propose an efficient HW/SW co-design method using GEZEL for light-weight ciphers such as HIGHT, PRESENT, and PRINTcipher. First, the symmetric cryptographic algorithms were designed in the GEZEL language, which is well suited to HW/SW co-design. To improve performance, HW optimization techniques such as unfolding and retiming were applied to the cryptographic HW module, which is controlled by an FSMD. The operation modes of the algorithms were implemented in C on an 8051 microprocessor, so they are compatible with various platforms. To provide reliable communication between HW and SW and to prevent time delays, an improved handshake protocol was adopted to enhance the performance of the HW/SW connection. The improved protocol can run the communication core and the cryptography core on the HW in parallel, so that messages can be transmitted to the SW after a HW operation and received from the SW during an encryption operation.

An Efficient Data Distribution Method on a Distributed Shared Memory Machine (분산공유 메모리 시스템 상에서의 효율적인 자료분산 방법)

  • Min, Ok-Gee
    • The Transactions of the Korea Information Processing Society / v.3 no.6 / pp.1433-1442 / 1996
  • Data distribution in the SPMD (Single Program Multiple Data) pattern is one of the main features of HPF (High Performance Fortran). This paper describes design issues for such data distribution and an efficient execution model for it on the TICOM IV computer, named SPAX (Scalable Parallel Architecture computer based on X-bar network). SPAX has a hierarchical clustering structure that uses distributed shared memory (DSM). With such a memory structure, uniformly applying either SMDD (Shared Memory Data Distribution) or DMDD (Distributed Memory Data Distribution) cannot fully utilize the system. We therefore propose another data distribution model, DSMDD (Distributed Shared Memory Data Distribution), based on a hierarchical master-slave scheme. In this model, a remote master and slaves are designated in each node; a shared address scheme is used within a node and a message passing scheme between nodes. In our simulation, assuming a node size that minimizes system performance degradation, DSMDD is more effective than SMDD and DMDD. In particular, the larger the number of logical processors and the weaker the data dependency between distributed data, the better the performance.
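
As background for the data distribution discussed above, the following Python sketch maps array indices to owners under HPF-style BLOCK and CYCLIC distributions, plus a hypothetical two-level mapping (blocks across nodes, then blocks across processors within a node). The two-level function only illustrates the idea of a hierarchical distribution; it is an assumption and not the paper's DSMDD model.

```python
def block_owner(index, n_elements, n_procs):
    """HPF-style BLOCK distribution: contiguous chunks of size ceil(n / p)."""
    block = -(-n_elements // n_procs)   # ceiling division
    return index // block


def cyclic_owner(index, n_procs):
    """HPF-style CYCLIC distribution: elements dealt out round-robin."""
    return index % n_procs


def two_level_owner(index, n_elements, n_nodes, procs_per_node):
    """Hypothetical hierarchical mapping: BLOCK across nodes, then BLOCK inside a node.

    Returns (node, local_processor).
    """
    node_block = -(-n_elements // n_nodes)
    node = index // node_block
    local_index = index - node * node_block
    return node, block_owner(local_index, node_block, procs_per_node)


block_map = [block_owner(i, 10, 4) for i in range(10)]
cyclic_map = [cyclic_owner(i, 4) for i in range(10)]
```

In a two-level scheme of this kind, elements that share a node can be reached through shared addresses, while elements on different nodes require a message, which matches the within-node versus between-node split the abstract describes.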


A Design of AES-based CCMP core for IEEE 802.11i Wireless LAN Security (IEEE 802.11i 무선 랜 보안을 위한 AES 기반 CCMP 코어 설계)

  • Hwang Seok-Ki;Kim Jong-Whan;Shin Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.6A / pp.640-647 / 2006
  • This paper describes the design of an AES-based CCMP (Counter mode with CBC-MAC Protocol) core for IEEE 802.11i wireless LAN security. To maximize the performance of the CCMP core, two AES cores are used: one operates in counter mode for data confidentiality, and the other operates in CBC mode for authentication and data integrity. The S-box, which requires the largest hardware in the AES core, is implemented using composite field arithmetic, and the gate count is reduced by about 27% compared with a conventional LUT (Lookup Table)-based design. The CCMP core was verified using an Excalibur SoC kit, and an MPW chip was fabricated using a 0.35 µm CMOS standard cell technology. The test results show that all functions of the fabricated chip work correctly. The CCMP processor has 17,000 gates, and the estimated throughput is about 353 Mbps at 116 MHz and 3.3 V, satisfying the 54 Mbps data rate of the IEEE 802.11a and 802.11g specifications.

A Workqueue Replication Scheduling Algorithm Using Static Information on Grid Systems (그리드 시스템에서 정적정보를 활용한 작업큐 중복 스케줄링 알고리즘)

  • Kang, Oh-Han;Kang, Sang-Sung;Song, Hee-Heon
    • The KIPS Transactions:PartA / v.16A no.1 / pp.9-16 / 2009
  • Because a grid system consists of heterogeneous computing resources distributed on a wide scale, applications cannot be executed efficiently with the scheduling algorithms of conventional parallel systems, which assume homogeneous and controllable resources. To arrive at an algorithm that fully reflects the characteristics of a grid system, our research examines the types of information used in current scheduling algorithms and derives factors that could improve them. The analysis of these algorithms shows that static information about resources, such as capacity or the number of processors, can help scheduling algorithms. It also confirms that using real-time load information decreases efficiency, owing to intrinsic characteristics of a grid system such as relatively long computing times, and that a means of avoiding infeasible or slow resources is needed. In this paper, we propose a new algorithm that incorporates static information into the logic of the WQR (Workqueue Replication) algorithm, and we show through simulation that it performs better than the existing method.