Search | Korea Science

Implementation and Performance Analysis of High Performance Computing Library for Parallel Processing

김영태;이용권
- Journal of KIISE: Computer Systems and Theory
- v.31 no.7
- pp.379-386
- 2004
We designed a portable parallel library, HPCL (High Performance Computing Library), with the following objectives: (1) to keep the parallel code closely related to the original sequential code, so that future versions of the sequential code can be carried over more easily, and (2) to enhance the performance of the parallel code. The library is an interface, written in C and Fortran, between MPI (Message Passing Interface) and parallel Fortran programs. Performance was measured on PC clusters and an IBM SP4.

Parallel Pipeline Architecture of H.264 Decoder and U-Chip Based on Parallel Array

Suk, Jung-Hee;Lyuh, Chun-Gi;Roh, Tae Moon
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 2013.11a
- pp.161-164
- 2013
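To process various multimedia codecs at high speed, this paper proposes a U-Chip (Universal-Chip) architecture based on a parallel array processor rather than dedicated hardware, fabricated as a chip with 11,865,090 gates in a TSMC 80nm process. The U-Chip consists of a parallel array processor of $4{\times}16$ processing units for inverse quantization (IQ), inverse transform (IT), and motion compensation (MC); a bitstream processor for context-adaptive variable-length decoding (CAVLC); a sequential processor for intra prediction (IP) and deblocking filter (DF) operations; and a sequencer processor that controls DMAC data transfers and each processor to handle parallel pipeline scheduling. Mapping one macroblock's data to each processing unit, a total of 64 macroblocks are processed in parallel. Overlapping the transfer of the large data volume for 64 macroblocks with the computation of each processor in a parallel pipeline raises overall performance. We developed an H.264 decoder program with this parallel pipeline structure and confirmed on the fabricated U-Chip a throughput of 30 fps for $720{\times}480$ Baseline Profile video, with the core running at 192 MHz and DDR memory at 96 MHz.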
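The throughput figures above imply a concrete per-batch budget. The Python sketch below works through that arithmetic under stated assumptions: standard 16x16 H.264 macroblocks, one macroblock per processing unit per pass (so a frame takes several passes through the 64-unit array), and an idealized pipeline whose stage times are made-up placeholders. It illustrates why overlapping transfer with computation helps; it is not a model of the actual U-Chip scheduler.

```python
import math
from typing import Sequence

# Values taken from the abstract
FRAME_W, FRAME_H = 720, 480   # Baseline Profile test sequence
NUM_PU = 4 * 16               # 4x16 processing units, one macroblock each
TARGET_FPS = 30
CORE_HZ = 192_000_000         # 192 MHz core clock

MB = 16  # assumption: standard 16x16 H.264 luma macroblock


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks covering one frame."""
    return math.ceil(width / MB) * math.ceil(height / MB)


def batches_per_frame(num_mb: int, num_pu: int = NUM_PU) -> int:
    """Passes through the array if each pass maps one macroblock per unit."""
    return math.ceil(num_mb / num_pu)


def cycles_per_batch(num_batches: int, fps: int = TARGET_FPS,
                     core_hz: int = CORE_HZ) -> float:
    """Core cycles available per batch to sustain the target frame rate."""
    return core_hz / (fps * num_batches)


def serial_time(stage_times: Sequence[float], num_batches: int) -> float:
    """Every batch runs its transfer and compute stages back to back."""
    return num_batches * sum(stage_times)


def pipelined_time(stage_times: Sequence[float], num_batches: int) -> float:
    """Ideal pipeline: fill once, then one batch completes per slowest stage."""
    return sum(stage_times) + (num_batches - 1) * max(stage_times)


if __name__ == "__main__":
    n_mb = macroblocks_per_frame(FRAME_W, FRAME_H)
    n_batch = batches_per_frame(n_mb)
    print(n_mb, n_batch, round(cycles_per_batch(n_batch)))
    # Hypothetical stage times (arbitrary units): DMA in, array compute, DMA out
    stages = [3.0, 5.0, 2.0]
    print(serial_time(stages, n_batch), pipelined_time(stages, n_batch))
```

Run as a script, it reports 1350 macroblocks per 720x480 frame, 22 passes of the 64-unit array, and roughly 290,909 core cycles per pass at 30 fps. With the placeholder stage times, the overlapped schedule finishes in 115 time units versus 220 for back-to-back execution, because once the pipeline is full each additional batch costs only the slowest stage.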

A Parallel Transmission Overlay Multicast Scheme for Massive Contents Delivery

Park, Jin-Hong;;Kim, Seon-Ho;Shin, Yong-Tae;Shin, Seok-Kyoo
- Journal of KIISE: Information Networking
- v.32 no.5
- pp.593-602
- 2005
Overlay multicast is a new approach in which multicast functionality is implemented in the application layer of end hosts, given that IP multicast remains sparsely deployed. However, existing overlay multicast protocols are not standardized, and many restrictions arise when delivering high-capacity content. A new delivery mechanism is therefore required for overlay-multicast-based delivery of high-capacity content. In this paper, we separate the group management and delivery management of overlay multicast and describe a capable group-management scheme. We also define a high-speed delivery method that uses collaborative distributed downloading and outperforms existing overlay multicast, improving the efficiency of massive content transmission.
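The abstract does not say how chunks are divided among the cooperating senders. One plausible policy, shown below as a hypothetical Python sketch, splits the content into equal chunks and greedily queues each chunk on whichever source peer would finish it earliest, given that peer's (assumed known) bandwidth. Peer names and bandwidth values are illustrative.

```python
from typing import Dict, List, Tuple


def schedule_chunks(num_chunks: int, chunk_size: float,
                    peer_bandwidth: Dict[str, float]) -> Tuple[Dict[str, List[int]], float]:
    """Assign chunks to source peers, each to whichever would finish it first.

    Returns the per-peer chunk lists and the estimated completion time.
    """
    busy_until = {peer: 0.0 for peer in peer_bandwidth}
    plan: Dict[str, List[int]] = {peer: [] for peer in peer_bandwidth}
    peers = sorted(peer_bandwidth)  # deterministic tie-breaking by name
    for chunk in range(num_chunks):
        # Earliest finish time if this chunk were queued on each peer
        best = min(peers, key=lambda p: busy_until[p] + chunk_size / peer_bandwidth[p])
        busy_until[best] += chunk_size / peer_bandwidth[best]
        plan[best].append(chunk)
    return plan, max(busy_until.values())
```

For example, `schedule_chunks(6, 1.0, {"A": 2.0, "B": 1.0})` gives peer A chunks 0, 1, 3 and 4 and peer B chunks 2 and 5, finishing at time 2.0, whereas the faster peer alone would need 3.0.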

Merged-Packet based Effective Queuing Mechanism for Underwater Networks

Shin, Soo Young;Park, Soo-Hyun;Namgung, Jung Il
- Journal of the Institute of Electronics and Information Engineers
- v.54 no.2
- pp.61-67
- 2017
In this paper, an adaptive MAC technique is proposed for various underwater environments with narrow bandwidth and low transmission speed. The previously published Underwater Packet Flow Control (UPFC) technique proposed three transmission types (normal, block, and parallel transmission) based on the number of transmissions and the transmission time. The proposed technique improves on UPFC with a more effective queuing technique for merged transmission. A mathematical model of the proposed queuing scheme was constructed, and its increased efficiency per transmission was verified by simulation.
https://doi.org/10.5573/ieie.2017.54.2.061
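The queuing model itself is not given here. As a minimal sketch of the merged-transmission idea, assuming a FIFO transmit queue and a fixed maximum payload per channel access, the Python function below packs consecutive queued packets into merged frames; each merged frame then costs one transmission instead of one per packet. The packet sizes and payload limit are hypothetical.

```python
from collections import deque
from typing import Deque, List


def merge_queue(queue: Deque[int], max_payload: int) -> List[List[int]]:
    """Drain a FIFO queue of packet sizes (bytes) into merged frames.

    Consecutive packets are packed into one frame until the next packet
    would exceed max_payload; FIFO order is preserved.
    """
    frames: List[List[int]] = []
    current: List[int] = []
    used = 0
    while queue:
        size = queue[0]
        if size > max_payload:
            raise ValueError(f"packet of {size} bytes exceeds max payload")
        if used + size > max_payload:
            # Close the current frame; the head packet starts the next one
            frames.append(current)
            current, used = [], 0
            continue
        current.append(queue.popleft())
        used += size
    if current:
        frames.append(current)
    return frames
```

With packet sizes 40, 30, 50, 20 and 60 bytes and a 100-byte limit, the five packets go out in three transmissions: [40, 30], [50, 20] and [60]. Keeping FIFO order is a design choice; a policy that reorders packets could fill frames more tightly at the cost of delivery order.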

Improvement of Parallel Transmission Throughput for Transporting Real-time Mass Media Based on MMT

An, Eun-bin;Kim, A-young;Won, Kwang-eun;Yoon, Jae-kwan;Seo, Kwang-deok
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 2018.06a
- pp.338-339
- 2018
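As user demand for real-time, high-volume media grows, transmission techniques for smooth video playback are being actively studied. MPEG MMT is attracting attention as a next-generation transport standard for such high-volume media. However, real-time media keep growing in size, so more efficient and faster transmission calls for research from several angles. In this paper, we propose parallel transmission to improve MMT-based delivery of real-time, high-volume media and introduce an interface that shows how MMT is used for it.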
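With parallel transmission, packets of one media flow can arrive out of order across paths. A minimal Python sketch, assuming round-robin striping over a fixed number of paths and a per-packet sequence number, shows the sender-side split and a receiver-side reorder buffer. The function and class names are hypothetical; the paper's actual MMT interface is not described in the abstract.

```python
from typing import Dict, List, Tuple


def stripe(packets: List[bytes], num_paths: int) -> List[List[Tuple[int, bytes]]]:
    """Sender: tag packets with sequence numbers and spread them round-robin."""
    paths: List[List[Tuple[int, bytes]]] = [[] for _ in range(num_paths)]
    for seq, payload in enumerate(packets):
        paths[seq % num_paths].append((seq, payload))
    return paths


class ReorderBuffer:
    """Receiver: hold early packets and release them in sequence order."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: Dict[int, bytes] = {}

    def push(self, seq: int, payload: bytes) -> List[bytes]:
        self.pending[seq] = payload
        released: List[bytes] = []
        # Release every packet that is now contiguous with what was delivered
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released
```

If path 1 delivers its packets (1 and 3) before path 0 delivers 0, 2 and 4, the buffer holds 1 and 3, releases 0 and 1 together when packet 0 arrives, then 2 and 3, then 4, so playback still sees the original order.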

The Design and Implementation of VIA-based Cluster System for spatial data's parallelism

Park, Si-Yong;Park, Sung-Ho;Chung, Ki-Dong
- Proceedings of the Korea Information Processing Society Conference
- 2000.10a
- pp.653-656
- 2000
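In this paper, we propose a cluster system that takes the parallelism of spatial data into account. The cluster is built on VIA (Virtual Interface Architecture), which reduces the message-transfer overhead caused by multi-layer protocol stacks, a major drawback of cluster systems. Across storage servers, data are placed according to the locality of the spatial data; within each storage server, data are placed using EPR (Enhanced Parallel R-tree) to exploit spatial-data parallelism. On this cluster system, we ran experiments to determine an appropriate transfer data size and number of transfers.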
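The abstract states that data are distributed across storage servers according to spatial locality but does not name the scheme. As one swapped-in illustration, the Python sketch below orders map tiles along a Z-order (Morton) curve and gives each server a contiguous run of that order, so tiles that are close in space tend to land on the same server. It does not model the EPR index used inside each server, and the tile coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (column, row) of a hypothetical map tile


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x (even positions) and y (odd positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def assign_tiles(tiles: List[Tile], num_servers: int) -> Dict[Tile, int]:
    """Sort tiles along the Z-order curve and cut the sequence into
    contiguous runs, one run per server."""
    ordered = sorted(tiles, key=lambda t: morton_key(*t))
    per_server = -(-len(ordered) // num_servers)  # ceiling division
    return {tile: i // per_server for i, tile in enumerate(ordered)}
```

On a 4x4 grid of tiles with four servers, each server receives exactly one 2x2 quadrant, which is the locality-preserving behavior this kind of placement aims for.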

Modulation Level-Controlled Multicarrier CDMA System

Whang, Bong-Jun;Park, Hyung-Kun
- Journal of the Korea Institute of Information and Communication Engineering
- v.12 no.9
- pp.1646-1653
- 2008
In this paper, we propose a multicarrier CDMA system that applies the concept of modulation-level control, in order to make the system robust to frequency selectivity and to provide the maximum data rate while maintaining acceptable transmission quality over various channel environments. The system selects a higher data rate when the channel has low delay spread and slow fading; when the fading changes very fast and the delay spread is very long, it selects a lower data rate. In both cases, the system controls the number of serial-to-parallel converted streams and the number of fed streams. Because the number of subcarriers is fixed, the product of the number of serial-to-parallel converted streams and the number of fed streams is always kept constant. Feeding the same data to different subcarriers achieves frequency diversity, and a RAKE receiver is also used to achieve path (time) diversity.
https://doi.org/10.6109/jkiice.2008.12.9.1646
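The constraint that the product of parallel streams and copies stays fixed can be made concrete. The Python sketch below assumes N subcarriers, P serial-to-parallel streams, and N/P copies of each stream's symbol placed on subcarriers spaced P apart (reading "number of fed streams" as this repetition factor). Spreading codes, the channel, and the RAKE receiver are not modeled, and the interleaved placement is an assumption.

```python
from typing import List, Sequence, Tuple


def valid_configs(num_subcarriers: int) -> List[Tuple[int, int]]:
    """All (streams, copies) pairs whose product equals the fixed subcarrier count."""
    return [(p, num_subcarriers // p) for p in range(1, num_subcarriers + 1)
            if num_subcarriers % p == 0]


def map_to_subcarriers(symbols: Sequence[complex], num_streams: int,
                       num_subcarriers: int) -> List[complex]:
    """Place one symbol per stream on num_subcarriers // num_streams subcarriers.

    Subcarrier k carries stream k mod P, so the copies of each symbol are
    spread across the band for frequency diversity.
    """
    if num_subcarriers % num_streams:
        raise ValueError("number of streams must divide the number of subcarriers")
    if len(symbols) != num_streams:
        raise ValueError("need exactly one symbol per stream")
    return [symbols[k % num_streams] for k in range(num_subcarriers)]
```

With eight subcarriers the admissible settings are (1, 8), (2, 4), (4, 2) and (8, 1). A benign channel (low delay spread, slow fading) would favor large P for a high data rate; a harsh channel would favor small P, trading rate for more frequency-diverse copies of each symbol.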

Deployment and Performance Analysis of Data Transfer Node Cluster for HPC Environment

Hong, Wontaek;An, Dosik;Lee, Jaekook;Moon, Jeonghoon;Seok, Woojin
- KIPS Transactions on Computer and Communication Systems
- v.9 no.9
- pp.197-206
- 2020
Collaborative research in science applications based on HPC services requires rapid transfer of massive data between research colleagues over wide area networks. To meet this requirement, research on enhancing data transfer performance between major superfacilities in the U.S. has recently been conducted. In this paper, we deploy multiple data transfer nodes (DTNs) over high-speed science networks to rapidly move large amounts of data in the parallel file system of KISTI's Nurion supercomputer, and we perform transfer experiments between endpoints with a round-trip time of approximately 130 ms. We present and compare transfer throughput for file sets of different sizes. In addition, we confirm that a DTN cluster of three nodes provides about 1.8 and 2.7 times the transfer throughput of a single node under two types of concurrency and parallelism settings.
https://doi.org/10.3745/KTCCS.2020.9.9.197
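A 130 ms round-trip time makes the bandwidth-delay product the main tuning constraint for such transfers. The Python sketch below computes it, the window each of several parallel streams would need if they share the path evenly, and the scaling efficiency of the reported 1.8x and 2.7x three-node speedups. The 100 Gbit/s link rate and the eight-stream setting are assumptions for illustration; the abstract does not state them.

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return link_bps * rtt_s / 8


def window_per_stream(link_bps: float, rtt_s: float, streams: int) -> float:
    """Window each of `streams` parallel streams needs if they split the BDP evenly."""
    return bdp_bytes(link_bps, rtt_s) / streams


def scaling_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear speedup achieved by a multi-node cluster."""
    return speedup / nodes


if __name__ == "__main__":
    RTT = 0.130            # ~130 ms, from the abstract
    LINK = 100e9           # assumption: 100 Gbit/s path
    print(f"BDP: {bdp_bytes(LINK, RTT) / 1e9:.3f} GB")
    print(f"per-stream window, 8 streams: {window_per_stream(LINK, RTT, 8) / 1e6:.1f} MB")
    for s in (1.8, 2.7):   # reported three-node speedups
        print(f"{s}x on 3 nodes -> {scaling_efficiency(s, 3):.0%} efficiency")
```

Under these assumptions the path needs about 1.625 GB in flight, or roughly 203.1 MB per stream with eight streams, which is why parallel streams and multiple DTNs help on long-RTT paths. The reported speedups correspond to about 60% and 90% of ideal linear scaling on three nodes.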

Generating Raster DSM from Airborne Laser Scanned Data Using Parallel Processing and Virtual Grid

Han, Soo-Hee;Heo, Joon;Kim, Sung-Sam;Kim, Sung-Hoon
- Proceedings of the Korean Association of Geographic Information Studies Conference
- 2008.06a
- pp.318-321
- 2008
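This study proposes a data-processing technique based on a virtual grid and parallel processing for rapidly generating a raster (regular-grid) digital surface model from large volumes of airborne laser scanning points. When airborne laser scanning is repeated over areas of tens to hundreds of square kilometers, the number of points reaches hundreds of millions to billions, more than a typical system can handle. We therefore propose distributing the data across a PC cluster built for parallel processing and processing them with a virtual grid: the master node reads the point data and sends each point to a slave node according to its planimetric coordinates, and each slave node stores the received points in a virtual grid and then performs interpolation. Inverse Distance Weighting (IDW) is used for interpolation, and to evaluate the efficiency of the proposed method we measured processing time as a function of the number of slave nodes.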
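The master/slave flow described above can be sketched in Python. Assumptions: the master splits the area into equal-width strips along x (the abstract says only that routing follows planimetric coordinates), each worker buckets its points into a virtual grid, and each DSM cell takes the IDW estimate (power 2) from the points in its 3x3 cell neighborhood. Cell size, power, and neighborhood are illustrative choices, and points are assumed to lie within [x_min, x_max].

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


def partition_by_x(points: List[Point], x_min: float, x_max: float,
                   num_workers: int) -> Dict[int, List[Point]]:
    """Master step: route each point to the worker that owns its x strip."""
    width = (x_max - x_min) / num_workers
    strips: Dict[int, List[Point]] = {w: [] for w in range(num_workers)}
    for p in points:
        w = min(int((p[0] - x_min) / width), num_workers - 1)
        strips[w].append(p)
    return strips


def build_virtual_grid(points: List[Point], cell: float) -> Dict[Tuple[int, int], List[Point]]:
    """Worker step: bucket points by grid cell so neighbors are found quickly."""
    grid: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for p in points:
        grid[(math.floor(p[0] / cell), math.floor(p[1] / cell))].append(p)
    return grid


def idw(points: List[Point], qx: float, qy: float, power: float = 2.0) -> float:
    """Inverse Distance Weighting estimate of z at (qx, qy)."""
    num = den = 0.0
    for x, y, z in points:
        d = math.hypot(x - qx, y - qy)
        if d == 0.0:
            return z  # query coincides with a sample point
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def interpolate_cell(grid: Dict[Tuple[int, int], List[Point]], i: int, j: int,
                     cell: float, power: float = 2.0) -> Optional[float]:
    """DSM value at the center of cell (i, j) from points in the 3x3 neighborhood."""
    nbrs = [p for di in (-1, 0, 1) for dj in (-1, 0, 1)
            for p in grid.get((i + di, j + dj), [])]
    if not nbrs:
        return None  # no samples nearby: leave as no-data
    return idw(nbrs, (i + 0.5) * cell, (j + 0.5) * cell, power)
```

A real deployment would also send each worker a margin of points from adjacent strips, since cells near a strip boundary otherwise miss neighbors held by the other worker; the sketch omits that overlap.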

Parallel Intersection Detection Algorithm using CUDA

Lee, Yeon-Hee;Kim, Young-J.
- Proceedings of the HCI Society of Korea Conference
- 2008.02a
- pp.451-455
- 2008
In this paper, we present how we implement the low-level triangle intersection routine in a massively parallel fashion using NVIDIA's new GPGPU language, CUDA. Triangle intersection often becomes the computational bottleneck in collision detection. Because of the relatively low bandwidth between the CPU and the GPU, implementing efficient object-space collision detection between triangle sets has been challenging. However, thanks to the improved data transfer rates of the CUDA architecture, we made triangle intersection substantially faster than an optimized CPU counterpart.
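The abstract does not describe the intersection kernel. As a stand-in, the Python sketch below tests a pair of triangles by checking each edge of one against the other with the Möller–Trumbore segment-triangle test; two non-coplanar triangles intersect exactly when some edge of one crosses the other. Coplanar pairs are not handled. Because every triangle pair is independent, the pair loop in the hypothetical `batch_intersect` is the part a CUDA version would spread across GPU threads, one pair per thread.

```python
from itertools import product
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Tri = Tuple[Vec, Vec, Vec]
EPS = 1e-12


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(p: Vec, q: Vec, tri: Tri) -> bool:
    """Moller-Trumbore ray test restricted to the segment p->q (t in [0, 1])."""
    v0, v1, v2 = tri
    d = sub(q, p)
    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(d, e2)
    a = dot(e1, h)
    if abs(a) < EPS:  # segment parallel to the triangle's plane (coplanar case not handled)
        return False
    f = 1.0 / a
    s = sub(p, v0)
    u = f * dot(s, h)  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    qv = cross(s, e1)
    v = f * dot(d, qv)  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    t = f * dot(e2, qv)  # position along the segment
    return 0.0 <= t <= 1.0


def edges(tri: Tri):
    a, b, c = tri
    return ((a, b), (b, c), (c, a))


def triangles_intersect(t1: Tri, t2: Tri) -> bool:
    """Non-coplanar triangles intersect iff an edge of one crosses the other."""
    return (any(segment_hits_triangle(p, q, t2) for p, q in edges(t1)) or
            any(segment_hits_triangle(p, q, t1) for p, q in edges(t2)))


def batch_intersect(set_a: Sequence[Tri], set_b: Sequence[Tri]) -> List[Tuple[int, int]]:
    """Index pairs of intersecting triangles; each pair is an independent work item."""
    return [(i, j) for (i, ta), (j, tb) in product(enumerate(set_a), enumerate(set_b))
            if triangles_intersect(ta, tb)]
```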
