• Title/Summary/Keyword: Parallel running

GPU-based Stereo Matching Algorithm with the Strategy of Population-based Incremental Learning

  • Nie, Dong-Hu;Han, Kyu-Phil;Lee, Heng-Suk
    • Journal of Information Processing Systems / v.5 no.2 / pp.105-116 / 2009
  • To solve the general problems surrounding the application of genetic algorithms in stereo matching, two measures are proposed. Firstly, the strategy of simplified population-based incremental learning (PBIL) is adopted to reduce the problems with memory consumption and search inefficiency, and a scheme for controlling the distance of neighbors for disparity smoothness is inserted to obtain a wide-area consistency of disparities. In addition, an alternative version of the proposed algorithm, without the use of a probability vector, is also presented for simpler set-ups. Secondly, programmable graphics hardware (GPU) consists of multiple multiprocessors and can perform operations in parallel at low cost. Therefore, to decrease the running time further, a model of the proposed algorithm that can run on the GPU is presented for the first time. The algorithms are implemented on the CPU as well as on the GPU and are evaluated by experiments. The experimental results show that the proposed algorithm offers better performance than traditional BMA methods with a deliberate relaxation and its modified version in terms of both running speed and stability. A comparison of computation times on the GPU and the CPU shows that the GPU's speed-up over the CPU grows with image size.
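
The PBIL strategy named in this abstract can be illustrated with a minimal sketch. It is generic PBIL on a toy bit-counting objective, not the authors' stereo-matching algorithm: the function names `pbil` and `onemax`, the learning rate, and the population size are all assumptions chosen for illustration. The point is that a single probability vector replaces a stored population, which is where the memory saving comes from.

```python
import random

def pbil(fitness, n_bits, pop_size=30, learning_rate=0.1, generations=100, seed=0):
    """Generic PBIL: evolve a probability vector instead of keeping a population."""
    rng = random.Random(seed)
    prob = [0.5] * n_bits                      # probability that each bit is 1
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample candidate bit strings from the current probability vector.
        samples = [[1 if rng.random() < p else 0 for p in prob]
                   for _ in range(pop_size)]
        winner = max(samples, key=fitness)
        if fitness(winner) > best_fit:
            best, best_fit = winner, fitness(winner)
        # Shift the probability vector toward this generation's best sample.
        prob = [(1 - learning_rate) * p + learning_rate * b
                for p, b in zip(prob, winner)]
    return best, best_fit

def onemax(bits):
    """Toy objective (hypothetical): count the bits set to 1."""
    return sum(bits)

best, score = pbil(onemax, n_bits=16)
```

In a stereo-matching setting the bit string would presumably encode disparities and the objective a matching cost, but that mapping is not given in the abstract.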

Design and Implementation of The Priority based Round Robin Scheduling Operating System for Compact Size Embedded System (소규모 임베디드 시스템을 위한 우선 순위 기반 라운드 로빈 스케줄링 운영체제의 설계 및 구현)

  • 남상엽;이상원;박인정
    • Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.4 / pp.222-231 / 2003
  • In this paper, an operating system using priority-based round-robin scheduling is designed and implemented. With this scheduler, real-time operation is possible because the high-priority task runs first while the other tasks run in parallel. Intertask communication, device drivers, and an operating system suited to compact embedded systems were also implemented. This operating system therefore enables efficient and rapid implementation of applications for compact embedded systems.
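
A rough sketch of priority-based round-robin scheduling follows, under simplifying assumptions that are not taken from the paper: all tasks are ready at time zero, a lower number means higher priority, and tasks of equal priority share the CPU in fixed time slices. The function name `priority_round_robin` and the task names are hypothetical.

```python
from collections import deque

def priority_round_robin(tasks, quantum=2):
    """tasks: list of (name, priority, burst). Returns the (name, ticks) slices in run order."""
    queues = {}
    for name, priority, burst in tasks:
        queues.setdefault(priority, deque()).append([name, burst])
    timeline = []
    # Serve priority levels from highest (smallest number) to lowest.
    for priority in sorted(queues):
        queue = queues[priority]
        # Within one level, rotate through the tasks one quantum at a time.
        while queue:
            name, remaining = queue.popleft()
            ran = min(quantum, remaining)
            timeline.append((name, ran))
            if remaining > ran:
                queue.append([name, remaining - ran])
    return timeline

timeline = priority_round_robin(
    [("sensor", 0, 3), ("logger", 1, 4), ("display", 1, 2)])
```

Here `sensor` finishes first because it alone has the highest priority, while `logger` and `display` alternate. A real embedded kernel would also preempt on new arrivals and timer interrupts, which this sketch leaves out.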

Development of a Korea Broadcast Programming System Decoder ASSP (한국형 방송 프로그램 시스템 디코더 ASSP의 개발)

  • Jo, Gyeong-Yeon (Assistant Professor, Department of Computer Engineering, Pukyong University)
    • The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1229-1239 / 1996
  • The growth of supplementary information broadcasting on TV demands a graphic overlay processor. This paper describes the design, implementation and testing of a graphic overlay processor called the KBPS decoder ASSP (Application Specific Standard Product), which complies with the Korea Broadcast Programming System. The KBPS decoder ASSP consists of an embedded 8-bit Z80 microprocessor, graphic overlay controller, KBPS schedule decoder, memory controller, priority interrupt controller, MIDI controller, infrared remote-control receiver, asynchronous serial communication controller, timer, bus controller, universal parallel input-output port and serial-parallel interface. The ASSP is implemented in a 0.8 micron CMOS sea-of-gates array using about 31,500 gates, and runs at 14.318 MHz.

COMPUTATIONAL DETERMINATION OF NEUTRON DOSE EQUIVALENT LEVEL AT THE MAZE ENTRANCE OF A MEDICAL ACCELERATOR FACILITY

  • Kim, Hong-Suk;Lee, Jai-Ki
    • Journal of Radiation Protection and Research / v.32 no.1 / pp.15-20 / 2007
  • An empirical formula for the neutron dose equivalent at the maze entrance of medical accelerator treatment rooms was derived on the basis of a Monte Carlo simulation. The neutron dose equivalents around the Varian medical accelerator simulated with the MCNPX code were employed. Two cases of target rotational planes were considered: parallel and perpendicular to the maze walls. Most of the maximum neutron dose equivalents at the doorway were found when the target rotational planes were parallel to the maze walls and the beams were directed to the inner maze entrances. The neutron dose equivalents at the outer maze entrances were calculated for about 698 medical accelerator facilities, which were generated from the geometry configurations of running treatment rooms, based on the gantry rotation that produces the maximum neutron dose at the doorway. The results calculated with the empirical formula in this study were compared with those calculated by the Kersey method for 7 operating facilities. The maximum disagreement between the calculation of this study and that of the Kersey method was a factor of 8.54, with the value calculated by the Kersey method exceeding that of this study. It was concluded that the Kersey method estimates the neutron dose equivalent at the doorway more conservatively than the MCNPX-based technique of this study.

Aerodynamic Simulation of Korea next generation high speed train using open source CFD code (오픈 소스 CFD 코드를 이용한 차세대 고속열차 공력 해석)

  • Kim, B.Y.;Gill, J.H.;Kwon, H.B.
    • Korean Society for Computational Fluids Engineering: Conference Proceedings (한국전산유체공학회:학술대회논문집) / 2011.05a / pp.327-330 / 2011
  • CFD simulation is widely used in industry, universities and research centers. In Korea, most researchers, especially in industry, use foreign commercial software packages. However, commercial CFD packages have drawbacks such as limited access to source code and very high license fees. Over the past several years, open source CFD codes have therefore spread widely as an alternative, but in Korea there are still few users. Insufficient validation of accuracy, robustness, convenience and parallel speed-up is a major obstacle to adopting open source code. We therefore tested validation cases for incompressible external aerodynamics and internal flows, and are now working on compressible flows. As the first stage of compressible flow validation, we simulated the Korean next-generation high-speed train (HEMU). Its running speed is 400 km/h and the maximum Mach number reaches 0.4. With the high-speed train we tested the accuracy, robustness and parallel performance of the open source CFD code OpenFOAM. Because no experimental data are available, we compared the results with a widely used commercial code. With the $1^{st}$ order upwind scheme, the aerodynamic forces are very similar to those of the commercial code, but with the $2^{nd}$ order upwind scheme there was some discrepancy. The reason for the difference is not yet clear. Mesh manipulation, domain decomposition, post-processing and robustness are satisfactory. Parallel performance is similar to that of the commercial code.

Efficiency of Marine Hydropower Farms Consisting of Multiple Vertical Axis Cross-Flow Turbines

  • Georgescu, Andrei-Mugur;Georgescu, Sanda-Carmen;Cosoiu, Costin Ioan;Alboiu, Nicolae
    • International Journal of Fluid Machinery and Systems / v.4 no.1 / pp.150-160 / 2011
  • This study focuses on the Achard turbine, a vertical axis, cross-flow, marine current turbine module. Similar modules can be superposed to form towers. A marine or river hydropower farm consists of a cluster of barges, each gathering several parallel rows of towers, running in stabilized current. Two-dimensional numerical modelling is performed in a horizontal cross-section of all towers, using FLUENT and COMSOL Multiphysics. The numerical models are validated against experimental results through the velocity distribution, measured by Acoustic Doppler Velocimetry, in the wake of the middle turbine within a farm model. As long as the numerical flow in the wake fits the experiments, the numerical results for the power coefficient (turbine efficiency) are trustworthy. The overall farm efficiency, with respect to the spatial arrangement of the towers, was obtained by 2D modelling of the unsteady flow inside the farm, using COMSOL Multiphysics. Rows of overlapping parallel towers increase the global efficiency of the farm.

Design criteria of wind barriers for traffic -Part 1: wind barrier performance

  • Kwon, Soon-Duck;Kim, Dong Hyawn;Lee, Seung Ho;Song, Ho Sung
    • Wind and Structures / v.14 no.1 / pp.55-70 / 2011
  • This study investigates the design criteria required for wind barriers to protect vehicles running on an expressway under a high side wind. In the first stage of this study, the lateral deviations of vehicles in crosswinds were computed with the commercial software CarSim and TruckSim, and the critical wind speeds for a car accident were then evaluated from a predefined car accident index. The critical wind speeds for driving stability were found to be 35 m/s for a small passenger car and 30 m/s for a truck and a bus. From the wind tunnel tests, the minimum height of a wind barrier required to reduce the wind speed by 50% was found to be 12.5% of the road width. In the case of parallel bridges, the placement of two edge wind barriers plus one wind barrier at the center was recommended for a separation distance larger than 20 m (four lanes) and 10 m (six lanes) respectively; otherwise two wind barriers were recommended.

Parallel Multithreaded Processing for Data Set Summarization on Multicore CPUs

  • Ordonez, Carlos;Navas, Mario;Garcia-Alvarado, Carlos
    • Journal of Computing Science and Engineering / v.5 no.2 / pp.111-120 / 2011
  • Data mining algorithms should exploit new hardware technologies to accelerate computations. Such a goal is difficult to achieve in a database management system (DBMS) because of its complex internal subsystems and because numeric data mining computations on large data sets are difficult to optimize. This paper explores taking advantage of the existing multithreaded capabilities of multicore CPUs, as well as caching in RAM, to efficiently compute summaries of a large data set, a fundamental data mining problem. We introduce parallel algorithms working on multiple threads, which overcome the row aggregation processing bottleneck of accessing secondary storage, while maintaining linear time complexity with respect to data set size. Our proposal is based on a combination of table scans and parallel multithreaded processing among multiple cores in the CPU. We introduce several database-style and hardware-level optimizations: caching row blocks of the input table, managing available RAM, interleaving I/O and CPU processing, and tuning the number of working threads. We experimentally benchmark our algorithms with large data sets on a DBMS running on a computer with a multicore CPU. We show that our algorithms outperform existing DBMS mechanisms in computing aggregations of multidimensional data summaries, especially as dimensionality grows. Furthermore, we show that local memory allocation (RAM block size) does not have a significant impact when the thread management algorithm distributes the workload among a fixed number of threads. Our proposal is unique in the sense that we do not modify or require access to the DBMS source code; instead, we extend the DBMS with analytic functionality by developing User-Defined Functions.
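
The block-wise, multithreaded summarization described above can be sketched as follows. The abstract does not say which statistics make up a summary, so this example assumes a row count plus per-column sums and sums of squares. The names `summarize_block`, `merge` and `parallel_summary` are hypothetical, the input is assumed to be non-empty, and no DBMS or User-Defined Function is involved.

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_block(rows):
    """Partial summary of one row block: count, column sums, column sums of squares."""
    d = len(rows[0])
    sums = [0.0] * d
    squares = [0.0] * d
    for row in rows:
        for j, x in enumerate(row):
            sums[j] += x
            squares[j] += x * x
    return len(rows), sums, squares

def merge(a, b):
    """Combine two partial summaries; all three components are additive."""
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [x + y for x, y in zip(a[2], b[2])])

def parallel_summary(rows, block_size=2, threads=2):
    # Split the table scan into row blocks and hand them to a fixed pool of threads.
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(summarize_block, blocks))
    total = partials[0]
    for part in partials[1:]:
        total = merge(total, part)
    return total

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
n, sums, squares = parallel_summary(data)
```

Because the partial summaries are additive, the work stays linear in the number of rows regardless of how the blocks are distributed. In CPython the global interpreter lock limits the speed-up of pure-Python threads, so this shows the structure rather than the performance.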

Parallel Processing of Airborne Laser Scanning Data Using a Hybrid Model Based on MPI and OpenMP (MPI와 OpenMP기반 하이브리드 모델을 이용한 항공 레이저 스캐닝 자료의 병렬 처리)

  • Han, Soo-Hee;Park, Il-Suk;Heo, Joon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.30 no.2 / pp.135-142 / 2012
  • In the present study, a parallel processing method running on a multi-core PC cluster is introduced to produce a digital surface model (DSM) and a digital terrain model (DTM) from huge airborne laser scanning data sets. A hybrid model using both the message passing interface (MPI) and OpenMP was devised by revising a conventional model that uses only MPI, and was tested on a multi-core PC cluster for performance validation. The hybrid model did not perform better in the interpolation process that produces the DSM, but its overall performance was better thanks to the reduced number of MPI calls. Additionally, the scheduling function of OpenMP was shown to enhance performance by balancing the unequal loads placed on cores by the irregular distribution of airborne laser scanning data.
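
The load-balancing effect attributed to OpenMP scheduling can be illustrated with a small simulation. It does not use MPI or OpenMP and does not reproduce the authors' implementation. It only compares the finishing time of a static split into equal-sized contiguous chunks with a dynamic policy in which each idle worker takes the next item. The per-item costs and the function names are assumptions.

```python
import heapq

def static_makespan(costs, workers):
    """Finishing time when items are split into equal-sized contiguous chunks."""
    chunk = -(-len(costs) // workers)          # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))

def dynamic_makespan(costs, workers):
    """Finishing time when whichever worker is free first takes the next item."""
    finish = [0] * workers                     # min-heap of worker finish times
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)

# Irregular workload: a few expensive items clustered at the front.
costs = [8, 7, 6, 1, 1, 1, 1, 1]
static_time = static_makespan(costs, workers=2)
dynamic_time = dynamic_makespan(costs, workers=2)
```

With these costs the static split leaves one worker with almost all of the work, while the dynamic policy divides it evenly. This mirrors the kind of imbalance an irregular point distribution can cause.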

A Global Framework for Parallel and Distributed Application with Mobile Objects (이동 객체 기반 병렬 및 분산 응용 수행을 위한 전역 프레임워크)

  • Han, Youn-Hee;Park, Chan-Yeol;Hwang, Chong-Sun;Jeong, Young-Sik
    • Journal of KIISE:Computing Practices and Letters / v.6 no.6 / pp.555-568 / 2000
  • The World Wide Web has become the largest virtual system, almost universal in scope. Recent research has shown that idle hosts on the World Wide Web can be used effectively to run applications that require a substantial amount of computation. This novel computing paradigm has been referred to as global computing. In this paper, we propose and implement a mobile object-based global computing framework called Tiger, whose primary goal is to present novel object-oriented programming libraries that support distribution, dispatching and migration of objects, and concurrency among computational activities. The programming libraries provide programmers with access, location and migration transparency for distributed and mobile objects. Tiger's second goal is to provide a system supporting the requisites of a global computing environment: scalability, and resource and location management. The Tiger system and its programming libraries allow a programmer to easily develop an object-oriented parallel and distributed application using globally extended computing resources. We also present the performance improvement obtained in experiments with highly intensive computations such as parallel fractal image processing and genetic-neuro-fuzzy algorithms.
