• Title/Summary/Keyword: multi-core processing

Parallel Processing of Airborne Laser Scanning Data Using a Hybrid Model Based on MPI and OpenMP (MPI와 OpenMP기반 하이브리드 모델을 이용한 항공 레이저 스캐닝 자료의 병렬 처리)

  • Han, Soo-Hee;Park, Il-Suk;Heo, Joon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.30 no.2 / pp.135-142 / 2012
  • This study introduces a parallel processing method that runs on a multi-core PC cluster to produce a digital surface model (DSM) and a digital terrain model (DTM) from very large airborne laser scanning datasets. A hybrid model combining the message passing interface (MPI) and OpenMP was devised by revising a conventional MPI-only model, and was tested on a multi-core PC cluster to validate its performance. The hybrid model did not outperform the MPI-only model in the interpolation step that produces the DSM, but its overall performance was better because it makes fewer MPI calls. In addition, OpenMP's scheduling function improved performance by balancing the unequal loads placed on the cores by the irregular distribution of airborne laser scanning data.
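The load-balancing effect of scheduling mentioned in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the per-tile costs are hypothetical, `static_makespan` only approximates a block-wise static loop schedule, and `dynamic_makespan` approximates a schedule in which each idle worker takes the next pending tile.

```python
import heapq


def static_makespan(costs, workers):
    """Finish time when tasks are split into contiguous, equal-sized chunks."""
    chunk = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk)]
    return max(loads)


def dynamic_makespan(costs, workers):
    """Finish time when each idle worker picks up the next pending task."""
    finish_times = [0] * workers  # min-heap of per-worker finish times
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical tile costs: the dense tiles are clustered at the start,
# as can happen when scanned points are irregularly distributed.
tile_costs = [9, 8, 9, 8, 1, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(tile_costs, 4))   # 26
print(dynamic_makespan(tile_costs, 4))  # 11
```

With these assumed costs, the static split leaves one worker holding most of the heavy tiles, while the dynamic assignment finishes close to the ideal of total work divided by the number of workers.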

Performance Comparison of Tilera Many-core and x86-64 Multi-core Systems (Tilera 다중코어와 x86-64 멀티코어 시스템의 성능 비교)

  • Choi, HeeSeok;Lyoo, TaeMuk;Park, JiSu;Jung, Daeyong;Lim, JongBeom;Lee, Jungha;Suh, Teaweon;Yu, Heonchang
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.102-105 / 2013
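  • Multi-core systems have recently been evolving into many-core systems that connect a larger number of cores to improve computer performance. However, the performance of a multi-core system varies with the architecture and the number of the cores it uses. To analyze how core architecture and core count affect performance, this paper compares Tilera's many-core systems, Tile-Gx36 and TilePro64, with Intel's x86-64 multi-core system, Core i5. To examine how performance changes as core utilization increases, we measured the single-core performance of each system with the SPEC CPU 2006 benchmark, and used an OpenMP benchmark program to measure performance by input data size when all cores of a system are used. In the experiments, single-core performance of the Core i5 was about 87% faster than that of the Tile-Gx36 when measured with integer data, and about 94% faster when measured with floating-point data. However, in the all-core results, when the integer array size was at or above a certain size (the value is missing from the source), the processing speed of the Tile-Gx36 system was on average about 7.6 times that of the Core i5 system. Tilera's many-core systems therefore show lower single-core performance because of their clock speed and architecture, but improved performance in high-speed computation that uses parallel processing.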

Comparison of Go and C++ TBB on Parallel Processing (Go와 C++ TBB의 병렬처리 비교)

  • Park, Dong-Ha;Moon, Bong-Kyo
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.64-67 / 2017
  • Applying concurrent structures and parallel processing is a common concern in today's programs. In this research, dynamic programming is used to compare the parallel performance of the Go language and Intel C++ Threading Building Blocks (TBB). The experiment was performed on a 4-core machine, and the results include execution times under a simultaneous multi-threading (SMT) environment. The static optimal binary search tree problem was used as the example. In the results, the speed-up of Go was higher than the number of cores, and that of TBB was close to it. TBB performed better in general, but at larger scales Go was faster in some cases.
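For reference, the static optimal binary search tree problem used as the benchmark can be solved with the classic dynamic-programming recurrence. The sketch below is sequential and is not the code used in the paper; the function name and the frequency list are illustrative assumptions. It fills the table one diagonal at a time, and because every cell on a diagonal depends only on shorter key ranges, that inner loop is the part a Go or TBB version could plausibly run in parallel.

```python
def optimal_bst_cost(freq):
    """Minimum weighted search cost of a static BST for sorted keys.

    freq[i] is the access frequency of the i-th key in sorted order.
    """
    n = len(freq)
    if n == 0:
        return 0
    prefix = [0]
    for f in freq:
        prefix.append(prefix[-1] + f)

    # cost[i][j] is the optimal cost for keys i..j inclusive.
    cost = [[0] * n for _ in range(n)]
    for length in range(1, n + 1):
        # Cells on this diagonal are independent of one another.
        for i in range(n - length + 1):
            j = i + length - 1
            weight = prefix[j + 1] - prefix[i]
            best = min(
                (cost[i][r - 1] if r > i else 0)
                + (cost[r + 1][j] if r < j else 0)
                for r in range(i, j + 1)
            )
            cost[i][j] = best + weight
    return cost[0][n - 1]


print(optimal_bst_cost([34, 8, 50]))  # 142
```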

Performance Analysis and Identifying Characteristics of Processing-in-Memory System with Polyhedral Benchmark Suite (프로세싱 인 메모리 시스템에서의 PolyBench 구동에 대한 동작 성능 및 특성 분석과 고찰)

  • Jeonggeun Kim
    • Journal of the Semiconductor & Display Technology / v.22 no.3 / pp.142-148 / 2023
  • This paper identifies performance issues that arise when compute kernels from PolyBench are executed on Processing-in-Memory (PIM) devices; these kernels are the core computational units of various data-intensive workloads such as deep learning. Using our in-house simulator, we measured and compared various performance metrics of the workloads on traditional out-of-order and in-order processors and on PIM-based systems. The PIM-based system improves performance over the other computing models because of the short-term data reuse characteristic of PolyBench's computational kernels. However, some kernels perform poorly on PIM-based systems without a multi-layer cache hierarchy because of their long-term data reuse characteristics. Our evaluation and analysis therefore suggest that further research should consider dynamic, workload-pattern-adaptive approaches to overcome the performance degradation of computational kernels with long-term data reuse characteristics and hidden data locality.

Adaptive Multi-view Video Interpolation Method Based on Inter-view Nonlinear Moving Blocks Estimation (시점 간 비선형 움직임 블록 예측에 기초한 적응적 다시점 비디오 보상 보간 기법)

  • Kim, Jin-Soo
    • The Journal of the Korea Contents Association / v.14 no.4 / pp.9-18 / 2014
  • Recently, much research has focused on multi-view video applications and services such as wireless video surveillance networks, wireless video sensor networks, and wireless mobile video. In multi-view video signal processing, exploiting the strong correlation between images acquired by different cameras plays a major role in developing core multi-view video coding techniques. This paper proposes an adaptive multi-view video interpolation technique that is applicable to multi-view distributed video coding without requiring any cooperation among the cameras. The proposed algorithm estimates the non-linear moving blocks, employs disparity-compensated view prediction, and then fills in the unreliable blocks. Computer simulations show that the proposed method outperforms conventional methods.

An Optimization Tool for Determining Processor Affinity of Networking Processes (통신 프로세스의 프로세서 친화도 결정을 위한 최적화 도구)

  • Cho, Joong-Yeon;Jin, Hyun-Wook
    • KIPS Transactions on Software and Data Engineering / v.2 no.2 / pp.131-136 / 2013
  • Multi-core processors can improve the parallelism of application processes and thus enhance system throughput. Researchers have recently shown that processor affinity is an important factor in network I/O performance because of the architectural characteristics of multi-core processors, and many are therefore trying to devise schemes that decide an optimal processor affinity. Existing schemes that decide processor affinity dynamically can adapt transparently to system changes, such as application modifications and hardware upgrades, but they have limited access to the characteristics of application behavior and to run-time information that can be collected heuristically. As a result, they can provide only sub-optimal processor affinity. In this paper, we define meaningful system variables for determining optimal processor affinity and propose a tool that gathers this information. We show that the implemented tool overcomes the limitations of existing schemes and improves network bandwidth.
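As background for the term, processor affinity restricts a process to a chosen set of CPUs. The sketch below only shows the mechanism through Python's standard `os.sched_getaffinity` and `os.sched_setaffinity` calls, which are available on some Unix systems such as Linux. It is not the tool described in the paper, and the choice of the lowest-numbered CPU is an arbitrary assumption rather than an optimized decision.

```python
import os


def pin_current_process_to_first_cpu():
    """Pin the calling process to the lowest-numbered CPU it may run on.

    Returns the new affinity set, or None on platforms where
    os.sched_setaffinity is not available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
    target = {min(allowed)}            # arbitrary choice for illustration
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)
```

A tool like the one the abstract describes would presumably choose the target set from measured system variables instead of a fixed rule.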

Optimal Design Space Exploration of Multi-core Architecture for Real-time Lane Detection Algorithm (실시간 차선인식 알고리즘을 위한 최적의 멀티코어 아키텍처 디자인 공간 탐색)

  • Jeong, Inkyu;Kim, Jongmyon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.3 / pp.339-349 / 2017
  • This paper proposes a four-stage algorithm for detecting lanes from a moving car. In the first stage, it extracts regions of interest from an image. In the second stage, it applies a median filter to remove noise. In the third stage, a binarization algorithm classifies the input image into background and foreground. Finally, an image erosion algorithm produces clear lanes by removing the noise and edges that remain after binarization. However, the proposed lane detection algorithm is computationally expensive. To address this issue, the paper presents a parallel implementation of the real-time lane detection algorithm on a multi-core architecture. In addition, we implement and simulate 8 different processing element (PE) architectures to select an optimal PE architecture for the target application. Experimental results indicate that the 40×40 PE architecture shows the best performance, energy efficiency, and area efficiency.
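The four stages named in this abstract can be sketched on a small grayscale image stored as nested lists. This is an illustrative sequential sketch, not the paper's implementation: the region of interest (rows from `roi_top` downward), the 3x3 window size, the border clamping, and the threshold are all assumptions. Each output pixel depends only on a small input neighborhood, which is the property that makes such stages suitable for data-parallel execution.

```python
from statistics import median


def crop_roi(img, top):
    """Stage 1: keep rows from `top` downward (assumed road region)."""
    return [row[:] for row in img[top:]]


def neighborhood(img, y, x):
    """Values in the 3x3 window around (y, x), clamped at the border."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def median_filter(img):
    """Stage 2: 3x3 median filter to suppress isolated noise pixels."""
    return [[median(neighborhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def binarize(img, threshold):
    """Stage 3: 1 marks foreground (bright), 0 marks background."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


def erode(img):
    """Stage 4: a pixel stays 1 only if its whole 3x3 window is 1."""
    return [[min(neighborhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def detect_lanes(img, roi_top, threshold):
    """Run the four stages in order and return a binary lane mask."""
    filtered = median_filter(crop_roi(img, roi_top))
    return erode(binarize(filtered, threshold))
```

For example, on a dark image with a bright stripe three pixels wide and one isolated bright pixel, the median filter removes the isolated pixel and the erosion thins the stripe to its center column.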

Hybrid Transactional Memory using Sampling-based Retry Policy in Multi-Core Environment (멀티코어 환경에서 샘플링 기반 재시도 정책을 이용한 하이브리드 트랜잭셔널 메모리)

  • Kang, Moon-Hwan;Jang, Yeon-Woo;Yoon, Min;Chang, Jae-Woo
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.2 / pp.49-61 / 2017
  • Transactional memory (TM) has greatly changed the parallel programming paradigm for transaction processing, and is classified into software (STM), hardware (HTM), and hybrid (HyTM) transactional memory according to whether it is built on software, hardware, or both. However, existing studies apply a static retry policy to all workloads. To solve this problem, we propose a hybrid transactional memory scheme that uses a sampling-based adaptive retry policy in a multi-core environment. First, the proposed scheme decides whether to use STM or HTM according to the characteristics of a transaction; otherwise, it executes HTM and STM concurrently by using a Bloom filter. Second, the proposed scheme provides an adaptive retry policy for HTM according to the characteristics of the transactions in each workload. Finally, in an experimental performance evaluation using STAMP, the proposed scheme performs 10~20% better than existing schemes.

Current Status of SOFC Materials and Processing Core Technology (고체산화물 연료전지 소재공정 요소기술 개발 현황)

  • Lee, Jong-Ho;Son, Jiwon;Kim, Heryong;Kim, Byong-Kook;Lee, Hae-Weon
    • 한국신재생에너지학회:학술대회논문집 / 2010.06a / pp.123.1-123.1 / 2010
  • The solid oxide fuel cell (SOFC) has attracted a great deal of attention because of its high electrical efficiency, high waste-heat utilization, fuel flexibility, and application versatility. However, SOFC technology is not yet mature enough to meet the practical requirements for commercialization. Research and development activities therefore focus mainly on developing practically viable SOFCs with higher performance and better reliability. We succeeded in fabricating high-performance anode-supported unit cells by employing hierarchically controlled multi-layered electrodes for both structural reliability and high performance. In addition, a novel composite sealing gasket made it possible to achieve excellent sealing integrity even with considerable surface irregularities in a multi-cell planar arrayed stack.

Multi-core-based Parallel Query of 3D Point Cloud Indexed in Octree (옥트리로 색인한 3차원 포인트 클라우드의 다중코어 기반 병렬 탐색)

  • Han, Soohee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.31 no.4 / pp.301-310 / 2013
  • This study aims to speed up queries on a large 3D point cloud indexed in an octree by querying in parallel on multiple cores. In particular, it focuses on methods for accessing multiple leaf nodes of the octree concurrently to query the points lying within a radius of given coordinates. To this end, two parallel query methods are proposed that use different strategies to distribute the query overhead across cores: one relies on OpenMP's automatic division of 'for' loops in the code, and the other is based on spatial division. Approximately 18 million 3D points gathered by a terrestrial laser scanner were indexed in an octree and tested on a system with an 8-core CPU to evaluate the performance of a non-parallel method and the two parallel methods. The two parallel methods outperformed the non-parallel one by several times, and they competed closely with each other across the various query radii. Parallel queries are expected to become faster as strategies for distributing the query overhead across cores improve.
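The idea of scanning several octree leaves concurrently for a radius query can be sketched as follows. This is a minimal illustration and not the paper's method: the `Octree` class, its leaf capacity, and the use of a thread pool are assumptions, and all points are assumed to lie inside the root cube. In CPython, threads mainly show the structure of the approach; real speed-ups would more likely need processes or a compiled implementation such as the OpenMP one the abstract describes.

```python
from concurrent.futures import ThreadPoolExecutor


class Octree:
    """Minimal point octree: a leaf stores points, an inner node 8 children."""

    def __init__(self, center, half, capacity=8):
        self.center, self.half, self.capacity = center, half, capacity
        self.points, self.children = [], None

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # The size guard stops endless splitting on duplicate points.
        if len(self.points) > self.capacity and self.half > 1e-6:
            self._split()

    def _child_for(self, p):
        # Bit k of the child index is set when p is on the upper side of axis k.
        index = sum(1 << k for k in range(3) if p[k] >= self.center[k])
        return self.children[index]

    def _split(self):
        q = self.half / 2
        self.children = [
            Octree(tuple(self.center[k] + (q if (i >> k) & 1 else -q)
                         for k in range(3)), q, self.capacity)
            for i in range(8)
        ]
        pending, self.points = self.points, []
        for p in pending:
            self._child_for(p).insert(p)

    def leaves_near(self, c, r):
        """Return the leaves whose cube intersects the query sphere."""
        # Squared distance from the sphere center to this node's cube.
        d2 = sum(max(abs(c[k] - self.center[k]) - self.half, 0.0) ** 2
                 for k in range(3))
        if d2 > r * r:
            return []
        if self.children is None:
            return [self]
        return [leaf for child in self.children
                for leaf in child.leaves_near(c, r)]


def scan_leaf(leaf, c, r):
    """Filter one leaf's points by exact distance to the query center."""
    return [p for p in leaf.points
            if sum((p[k] - c[k]) ** 2 for k in range(3)) <= r * r]


def radius_query(tree, c, r, workers=4):
    """Scan the candidate leaves concurrently and merge the results."""
    leaves = tree.leaves_near(c, r)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda leaf: scan_leaf(leaf, c, r), leaves)
        return [p for part in parts for p in part]
```

Handing whole leaves to workers corresponds loosely to the loop-division strategy in the abstract; a spatial-division strategy would instead assign each worker a fixed region of the octree.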