Search | Korea Science

Implementation and Translation of Major OpenMP Directives for Chip Multiprocessor without using OS (단일 칩 다중 프로세서상에서 운영체제를 사용하지 않은 OpenMP 구현 및 주요 디렉티브 변환)

Jeun, Woo-Chul;Ha, Soon-Hoi
- Journal of KIISE:Computer Systems and Theory / v.34 no.4 / pp.145-157 / 2007
OpenMP is an attractive parallel programming model for chip multiprocessors: no standard parallel programming method exists for them, and parallel programs are easy to write in OpenMP. Chip multiprocessor systems, however, can have different architectures depending on their target applications, so OpenMP must be implemented differently for each system. In this paper, we propose an implementation and an effective translation of the major OpenMP directives for a chip multiprocessor that runs without an OS, improving performance without special hardware and without extending the OpenMP directives. We present experimental results on our target platform, the CT3400.
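As a rough illustration of what translating a directive for an OS-less target can involve, the sketch below outlines a `parallel for` loop into a function that each core calls with its own ID and a statically computed block of iterations. This is not the translator described in the paper; `NUM_CORES`, `struct loop_args`, `outlined_loop`, and `run_on_all_cores` are hypothetical names, and the cores are simulated by sequential calls.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 4   /* assumption: core count fixed at build time */

/* Arguments shared by all cores for one translated loop (hypothetical layout). */
struct loop_args {
    const int *in;
    int *out;
    size_t n;
};

/* Outlined body of:
 *     #pragma omp parallel for
 *     for (i = 0; i < n; i++) out[i] = 2 * in[i];
 * Each core runs this with its own core_id and handles one contiguous block. */
static void outlined_loop(int core_id, const struct loop_args *a)
{
    size_t chunk = (a->n + NUM_CORES - 1) / NUM_CORES;  /* ceil(n / cores) */
    size_t begin = (size_t)core_id * chunk;
    size_t end = begin + chunk;
    if (begin > a->n) begin = a->n;   /* clamp the trailing, possibly empty blocks */
    if (end > a->n) end = a->n;
    for (size_t i = begin; i < end; i++)
        a->out[i] = 2 * a->in[i];
}

/* Stand-in for the runtime. On real hardware every core would enter
 * outlined_loop itself and then wait at a barrier; here the calls are
 * made one after another so the sketch runs anywhere. */
void run_on_all_cores(const struct loop_args *a)
{
    assert(a != NULL);
    for (int core = 0; core < NUM_CORES; core++)
        outlined_loop(core, a);
}
```

Static block partitioning is used here because it needs no shared scheduler state, which suits a system without OS services; other schedules would need a shared counter and some form of locking.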

OpenMP Implementation using POSIX thread library on ARM11MPCore (ARM11MPCore에서 POSIX 쓰레드를 이용한 OpenMP 구현)

Lee, Jae-Won;Jeun, Woo-Chul;Ha, Soon-Hoi
- Proceedings of the Korean Information Science Society Conference / 2007.10b / pp.414-418 / 2007
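In multiprocessor environments, OpenMP makes parallel programming easier than MPI, and it is accepted as a de facto standard in a parallel programming world that lacks a formal one. Because OpenMP must be implemented differently for each target platform, a new processor calls for a matching OpenMP implementation. In this paper, we build an OpenMP environment based on POSIX threads on the ARM11MPCore, a multiprocessor system-on-chip, and measure its performance.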

Implementation of Underwater Simulation of a Net using OpenMP (OpenMP 병렬프로그램을 이용한 그물의 수중형상 시뮬레이션 구현)

Park, Myeong-Chul;Park, Seok-Gyu
- Journal of the Korea Society of Computer and Information / v.13 no.2 / pp.11-17 / 2008
The shape of a net under water is affected by various force vectors. Computing the effect of every vector on each particle of the net increases accuracy and realism, but the large amount of computation increases the running time. Previous techniques sacrificed physical realism and built underwater virtual environments that improved only visual realism. In this paper, the particles are processed in parallel to produce a simulation that satisfies both physical realism and real-time performance. OpenMP is used for the parallel processing and OpenGL for realistic rendering. The simulation proposed in this paper could serve as basic data for model analysis or expert systems in games and the marine field.
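A minimal sketch of per-particle parallelism of the kind the abstract describes is given below. The force model (linear drag, a constant submerged weight, zero-rest-length springs between neighbours), the constants, and the names `vec2` and `net_step` are all assumptions made for illustration; the paper's actual model is not given in the abstract.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } vec2;

/* All constants below are illustrative assumptions, not values from the paper. */
#define K_SPRING 50.0   /* stiffness of the twine between neighbouring particles */
#define K_DRAG    2.0   /* linear water drag coefficient */
#define WEIGHT   -1.0   /* submerged weight per particle (gravity minus buoyancy) */

/* Advance a chain of n net particles by one semi-implicit Euler step of
 * size dt. Particle 0 is pinned (e.g. tied to a float). force[] is scratch
 * space with n entries. Unit mass is assumed. */
void net_step(vec2 *pos, vec2 *vel, vec2 *force, size_t n, double dt)
{
    assert(dt > 0.0);

    /* Phase 1: each particle sums the vectors acting on it. Iterations only
     * read pos/vel and write their own force[i], so they are independent. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        vec2 f = { -K_DRAG * vel[i].x, WEIGHT - K_DRAG * vel[i].y };
        if (i > 0) {
            f.x += K_SPRING * (pos[i - 1].x - pos[i].x);
            f.y += K_SPRING * (pos[i - 1].y - pos[i].y);
        }
        if (i + 1 < (long)n) {
            f.x += K_SPRING * (pos[i + 1].x - pos[i].x);
            f.y += K_SPRING * (pos[i + 1].y - pos[i].y);
        }
        force[i] = f;
    }

    /* Phase 2: integrate velocities, then positions, skipping the pinned particle. */
    #pragma omp parallel for
    for (long i = 1; i < (long)n; i++) {
        vel[i].x += dt * force[i].x;
        vel[i].y += dt * force[i].y;
        pos[i].x += dt * vel[i].x;
        pos[i].y += dt * vel[i].y;
    }
}
```

Splitting the step into a force phase and an integration phase keeps every iteration free of writes to data that other iterations read, so no locking is needed inside either loop.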

Efficient Translation of OpenMP Directives for Cluster Systems (클러스터 시스템을 위한 효과적인 OpenMP 디렉티브 변환)

기양석;하순회
- Proceedings of the Korean Information Science Society Conference / 2003.04a / pp.10-12 / 2003
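As SMP clusters emerge as a platform for high-performance computing, interest in programming environments that exploit these systems is growing. In this paper, we introduce ParADE, a new programming environment that is easy to use, highly portable, and capable of high performance. ParADE is an OpenMP programming environment built on a multithreaded DSM system that implements a variant of the HLRC protocol. In particular, this paper focuses on the role of the OpenMP translator in improving performance. The OpenMP translator bridges the OpenMP programming model and the execution model of the runtime system. Specifically, it translates synchronization directives and uses collective communication functions to keep small variables in critical sections memory-consistent. In experiments with microbenchmark programs that measure the performance of synchronization directives, ParADE outperformed existing DSM systems.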
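The idea of replacing a critical-section update of a small variable with a collective operation can be sketched as follows. This is a single-process simulation under stated assumptions, not ParADE's translator output: `NODES`, `allreduce_sum`, and `translated_sum` are hypothetical names, the update is assumed to be a sum, and the collective is modelled by a plain loop where a cluster would use something like `MPI_Allreduce`.

```c
#include <assert.h>
#include <stddef.h>

#define NODES 4  /* assumption: number of cluster nodes in this simulation */

/* Hypothetical stand-in for a collective sum: combine one value per node
 * and give every node the same result. */
static void allreduce_sum(long value[NODES])
{
    long total = 0;
    for (int node = 0; node < NODES; node++)
        total += value[node];
    for (int node = 0; node < NODES; node++)
        value[node] = total;
}

/* Source form:
 *     #pragma omp parallel for
 *     for (i = 0; i < n; i++) {
 *         #pragma omp critical
 *         sum += data[i];
 *     }
 * Translated form: each node accumulates into its own copy of `sum`
 * without taking a lock, and a single collective call then makes all
 * copies consistent. */
long translated_sum(const long *data, size_t n)
{
    assert(data != NULL || n == 0);
    long sum_copy[NODES] = {0};
    for (int node = 0; node < NODES; node++) {             /* simulated nodes */
        for (size_t i = (size_t)node; i < n; i += NODES)   /* cyclic share of iterations */
            sum_copy[node] += data[i];
    }
    allreduce_sum(sum_copy);
    return sum_copy[0];  /* every copy now holds the same total */
}
```

The point of the transformation is that one collective exchange replaces a lock acquisition and a consistency action on every iteration, which is where a page-based DSM tends to be slow for small variables.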

OpenMP application to implement CUDA for FDTD algorithm and performance measurement (CUDA로 구현한 FDTD알고리즘의 OpenMP기술 적용 및 성능 측정)

Jung, Bok-Jae;Oh, Seung-Take;Lee, Cheol-Hoon
- Proceedings of the Korean Society of Computer Information Conference / 2013.01a / pp.3-6 / 2013
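In semiconductor manufacturing, simulations are run to verify the fabrication process and thereby reduce device production costs. These simulations analyze the behavior of impurities inside a semiconductor device by computing the physical quantities within it. The underlying algorithms solve physical differential equations that describe three-dimensional geometry, and numerical methods such as the finite-difference time-domain (FDTD) method are used to compute them accurately. In practice, FDTD computation accounts for more than 90% of the execution time of a semiconductor process simulation. To speed up this computation, this paper takes an existing FDTD algorithm implemented in CUDA (Compute Unified Device Architecture), reduces its execution time by controlling multiple GPUs through OpenMP, and measures the resulting performance improvement.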
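The structure of driving several devices from OpenMP threads can be sketched on the CPU with a one-dimensional FDTD update, where each OpenMP thread owns one slab of the grid. This is only an analogy under assumptions: the paper's code is 3D and runs on GPUs, and `DEVICES`, `COEF`, `update_h`, `update_e`, and `fdtd_step` are hypothetical names. In a real multi-GPU version each thread would typically bind to its device (for example with `cudaSetDevice`) and launch a kernel instead of running the inner loop.

```c
#include <assert.h>
#include <stddef.h>

#define DEVICES 2     /* assumption: number of GPUs, simulated here by slabs */
#define COEF    0.5   /* assumption: update coefficient in normalized units */

/* Update one slab [begin, end) of the magnetic field from the electric field. */
static void update_h(double *hy, const double *ez, size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i++)
        hy[i] += COEF * (ez[i + 1] - ez[i]);
}

/* Update one slab [begin, end) of the electric field from the magnetic field. */
static void update_e(double *ez, const double *hy, size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i++)
        ez[i] += COEF * (hy[i] - hy[i - 1]);
}

/* One 1D FDTD time step. ez has n entries and hy has n - 1;
 * the end values ez[0] and ez[n - 1] are held fixed. */
void fdtd_step(double *ez, double *hy, size_t n)
{
    assert(n >= 2);

    /* H update over cells [0, n - 1), one slab per simulated device. */
    #pragma omp parallel for num_threads(DEVICES)
    for (int d = 0; d < DEVICES; d++) {
        size_t begin = (n - 1) * (size_t)d / DEVICES;
        size_t end = (n - 1) * (size_t)(d + 1) / DEVICES;
        update_h(hy, ez, begin, end);
    }
    /* The implicit barrier above ensures every H value is final here. */

    /* E update over cells [1, n - 1), again one slab per simulated device. */
    #pragma omp parallel for num_threads(DEVICES)
    for (int d = 0; d < DEVICES; d++) {
        size_t begin = 1 + (n - 2) * (size_t)d / DEVICES;
        size_t end = 1 + (n - 2) * (size_t)(d + 1) / DEVICES;
        update_e(ez, hy, begin, end);
    }
}
```

Each slab reads one neighbouring value across its boundary. On shared memory that read is free, whereas separate GPU memories would need those boundary values exchanged between devices after each half step.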

Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OPenMP를 이용한 빠르고 효율적인 신경망 구현)

Park, An-Jin;Jang, Hong-Hoon;Jung, Kee-Chul
- Journal of KIISE:Software and Applications / v.36 no.4 / pp.253-260 / 2009
Many algorithms for computer vision and pattern recognition have recently been implemented on GPUs (graphics processing units) for faster computation. However, such implementations have two problems. First, the programmer must master the fundamentals of graphics shading languages, which require prior knowledge of computer graphics. Second, in a job that needs much cooperation between the CPU and the GPU, which is common in image processing and pattern recognition but not in graphics, the CPU should generate as much raw feature data for GPU processing as possible to utilize the GPU effectively. This paper proposes a faster and more efficient implementation of neural networks on both the GPU and a multi-core CPU. To solve the first problem, we use CUDA (compute unified device architecture), which is easy to program thanks to its simple C-like style, instead of graphics shading languages. Moreover, OpenMP (Open Multi-Processing) is used to process multiple data concurrently with a single instruction on the multi-core CPU, which makes effective use of GPU memory. In the experiments, we implemented a neural network-based text extraction system using the proposed architecture, and it ran about 15 times faster than a GPU-only implementation without OpenMP.

Development of High speed FFT system using OpenMP on TI multicore DSP (OpenMP를 활용한 TI 다중코어 DSP기반의 고속 FFT 처리부 개발)

Nam, Kyungho;Oh, Woojin
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.962-964 / 2014
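The FFT is widely used in signal processing systems, and much research has gone into making it faster. The FFT is used, directly or in modified form, in many areas such as communications, image processing, and radar, but the FFT length is often limited by real-time processing speed and cost. This study presents the results of implementing the FFT with OpenMP parallel processing on TI's TMS320C6678, a high-speed 8-core DSP. For several parallelization schemes aimed at improving speed, we report single-FFT performance by length and propose a method for processing multiple FFTs. Such an OpenMP-based FFT can be improved further by parallel processing across multiple DSPs connected via HyperLink, and this study shows that extending to 16 cores improves performance by about 30%. The results are applicable to medical imaging, ultra-high-resolution image processing, high-precision radar, and other areas that require very high-speed signal processing.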

Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OpenMP를 이용한 신경망 구현)

Jang, Hong-Hoon;Jung, Kee-Chul
- Proceedings of the Korean Information Science Society Conference / 2008.06a / pp.289-290 / 2008

A Performance Analysis on Task Scheduling Mechanisms Using CPU Pinning in OpenMP Based on Xen Virtualization (Xen 가상화 기반 OpenMP 환경에서 물리 CPU 지정에 따른 태스크 스케줄링 기법들의 성능 분석)

Song, ChungGeon;Myung, Rohyoung;Choi, HeeSeok;Yu, HeonChang;Lee, EunYoung
- Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.223-226 / 2015
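A growing number of services now implement HPC on Xen virtualization environments that support the cloud. As a result, the computational efficiency of OpenMP, the standard library for SMP-based parallel computing, is becoming more important. In this paper, we ran experiments to measure how the performance of different task scheduling methods changes depending on whether CPU pinning is applied in a Xen-based OpenMP environment. With CPU pinning applied, static scheduling improved by 3.7%, dynamic scheduling by 3.4%, and task-directive scheduling by 3.8%. These results suggest a direction for designing efficient parallel computing techniques in Xen virtualization environments.
https://doi.org/10.3745/PKIPS.y2015m10a.223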
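For readers unfamiliar with the three mechanisms compared above, the sketch below expresses the same computation with static scheduling, dynamic scheduling, and the task directive. The workload `work`, the chunk size of 4, and the function names are assumptions chosen for illustration; the benchmark used in the paper is not described in the abstract, and CPU pinning itself is configured outside the program.

```c
#include <assert.h>

/* Hypothetical unit of work whose cost varies with the index. */
static long work(long i)
{
    assert(i >= 0);
    long acc = 0;
    for (long k = 0; k <= i % 8; k++)
        acc += i;
    return acc;
}

/* Static scheduling: iterations are divided among threads up front. */
long sum_static(long n)
{
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (long i = 0; i < n; i++)
        total += work(i);
    return total;
}

/* Dynamic scheduling: threads fetch chunks of 4 iterations as they finish. */
long sum_dynamic(long n)
{
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (long i = 0; i < n; i++)
        total += work(i);
    return total;
}

/* Task directive: one thread creates a task per iteration and the
 * runtime distributes the tasks across the team. */
long sum_tasks(long n)
{
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        for (long i = 0; i < n; i++) {
            #pragma omp task firstprivate(i) shared(total)
            {
                long r = work(i);
                #pragma omp atomic
                total += r;
            }
        }
        #pragma omp taskwait
    }
    return total;
}
```

All three functions return the same value; only the way iterations are handed to threads differs, which is the variable such an experiment isolates.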

Parallel implementation of HEVC deblocking filter with OpenMP (OpenMP를 이용한 HEVC 디블록킹 필터의 병렬화 구현)

Jo, Hyun-Ho;Seo, Junghan;Ryu, Eun-Kyung;Sim, Dong-Gyu
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.11a / pp.328-330 / 2011
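This paper proposes parallelizing the deblocking filter of an HEVC decoder using OpenMP. To parallelize the HEVC deblocking filter, a slice is divided evenly into as many regions as there are cores available for parallel processing, and one core is assigned to each region. Each core filters the LCUs in its own region in raster-scan order, first performing horizontal filtering on every LCU in the region. Once the horizontal filtering is complete, vertical filtering is performed on the same region. With the proposed OpenMP-based parallelization of the HEVC deblocking filter, the execution time of deblocking filtering in the decoder was reduced by a factor of about 2.51 in a 4-core environment.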
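The region-per-core scheme described above can be sketched as follows. The filters are stubs that only record that a pass ran, since real deblocking modifies pixel samples along block edges; `CORES`, `lcu_t`, `filter_horizontal`, `filter_vertical`, and `deblock_slice` are hypothetical names, and the slice is modelled as a flat array of LCUs in raster-scan order.

```c
#include <assert.h>
#include <stddef.h>

#define CORES 4  /* assumption: matches the 4-core setup in the abstract */

/* Hypothetical per-LCU state used to check pass ordering. */
typedef struct {
    int horizontal_done;
    int vertical_done;
    int order_ok;   /* set if the horizontal pass ran before the vertical one */
} lcu_t;

static void filter_horizontal(lcu_t *lcu)
{
    lcu->horizontal_done = 1;
}

static void filter_vertical(lcu_t *lcu)
{
    lcu->order_ok = lcu->horizontal_done;
    lcu->vertical_done = 1;
}

/* Split the slice into CORES contiguous regions of near-equal size and
 * let each thread run both passes over its own region. */
void deblock_slice(lcu_t *lcus, size_t count)
{
    assert(lcus != NULL || count == 0);
    #pragma omp parallel for num_threads(CORES)
    for (int region = 0; region < CORES; region++) {
        size_t begin = count * (size_t)region / CORES;
        size_t end = count * (size_t)(region + 1) / CORES;
        for (size_t i = begin; i < end; i++)   /* pass 1: horizontal direction */
            filter_horizontal(&lcus[i]);
        for (size_t i = begin; i < end; i++)   /* pass 2: vertical direction */
            filter_vertical(&lcus[i]);
    }
}
```

This sketch orders the two passes only within a region. If filtering near a region boundary depends on samples from the neighbouring region, the two passes would have to be split into separate parallel loops so that a barrier falls between them.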
