• Title/Summary/Keyword: Speedup

Composite Stock Cutting using Distributed Simulated Annealing (분산 시뮬레이티드 어닐링을 이용한 복합 재료 재단)

  • Hong, Chul-Eui
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.20-29 / 2002
  • The composite stock cutting problem is to allocate rectangular and/or irregular patterns onto a large composite stock sheet of finite dimensions so that the resulting scrap is minimized. In this paper, distributed simulated annealing with a new cost-error-tolerant spatial decomposition is applied to the composite stock cutting problem in an MPI environment. The cost-error-tolerant scheme relaxes synchronization and chooses small perturbations on states asynchronously, with a dynamically changing stream length, to preserve the convergence property of sequential annealing. This paper proposes efficient data structures for representing patterns and their affinity relations, and shows how to determine move generation, annealing parameters, and a cost function. The spatial decomposition method is addressed in detail. The results show that final solution quality is not degraded while almost linear speedup is achieved. Composite stock shapes are not constrained to convex polygons or even regular shapes, but the number of allowed rotations is limited to 2 or 4 because of the composite nature of the material.

Design of a Parallel Rendering Processor Architecture with Effective Memory System (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 설계)

  • Park Woo-Chan;Yoon Duk-Ki;Kim Kyoung-Su
    • The KIPS Transactions:PartA / v.13A no.4 s.101 / pp.305-316 / 2006
  • Current rendering processors are organized mainly to process a single triangle as fast as possible, and parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high performance in processing triangles, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively. Moreover, the proposed architecture significantly reduces the latency caused by pixel cache misses. To achieve these two goals, effective memory organizations, including a new pixel cache architecture, are presented. The experimental results show that the proposed architecture achieves almost linear speedup in the best case, even with sixteen rasterizers.

Discussion for Ride Evaluation of High Speed Train by Using Inferential Statistics (추리통계학을 이용한 고속철도 승차감 평가에 대한 고찰)

  • Hwang, Hee-Soo;Kim, Seog-Won;Park, Chan-Kyeong;Mok, Jin-Yong;Kim, Ki-Hwan;Kim, Young-Guk
    • Journal of the Korean Society for Railway / v.11 no.6 / pp.543-549 / 2008
  • Ride comfort becomes more important as train speed increases. It is generally defined in terms of vehicle vibration. There are many studies on methods for evaluating ride comfort for railways. The ride comfort of the Korean high speed train (HSR 350x), however, has been assessed by a statistical method according to UIC 513R. In this paper, the ride indices measured on the Korean high speed train are analyzed and reviewed using inferential statistics such as the t-test, analysis of variance (ANOVA), and regression analysis.

An Efficient Computation of Matrix Triple Products (삼중 행렬 곱셈의 효율적 연산)

  • Im, Eun-Jin
    • Journal of the Korea Society of Computer and Information / v.11 no.3 / pp.141-149 / 2006
  • In this paper, we introduce an improved algorithm for computing the matrix triple product that commonly arises in primal-dual optimization methods. In computing $P=AHA^{t}$, we devise a single-pass algorithm that exploits the block diagonal structure of the matrix $H$. This one-phase scheme requires fewer floating point operations and roughly half the memory of the generic two-phase algorithm, in which the product is computed in two steps, first $Q=HA^{t}$ and then $P=AQ$. The one-phase scheme achieved a speedup of 2.04 over the two-phase scheme on an Intel Itanium II platform. The performance improvement was also evaluated through performance modeling based on memory latency and modeled cache miss rates. Our research contributes to the study of performance tuning of complex sparse matrix operations, whereas most previous work has focused on performance tuning of basic operations.
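
The relationship between the two schemes in the abstract above can be illustrated with a small sketch. It is not the paper's implementation (the abstract gives no code, and the paper targets sparse matrices); it only assumes the standard identity $P=\sum_k A_k H_k A_k^{t}$, where $H_k$ is the $k$-th diagonal block of $H$ and $A_k$ is the matching column slice of $A$. The function names and the tiny dense example matrices are hypothetical, chosen for illustration.

```python
def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def block_diag(blocks):
    """Assemble a dense block diagonal matrix H from its square diagonal blocks."""
    n = sum(len(b) for b in blocks)
    H = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                H[offset + i][offset + j] = value
        offset += len(b)
    return H


def triple_two_phase(A, blocks):
    """Generic scheme: materialize Q = H A^t, then compute P = A Q."""
    Q = matmul(block_diag(blocks), transpose(A))
    return matmul(A, Q)


def triple_one_phase(A, blocks):
    """Block-wise scheme: accumulate A_k H_k A_k^t without forming H or Q."""
    m = len(A)
    P = [[0.0] * m for _ in range(m)]
    offset = 0
    for Hk in blocks:
        size = len(Hk)
        Ak = [row[offset:offset + size] for row in A]  # columns of A for block k
        Pk = matmul(matmul(Ak, Hk), transpose(Ak))
        for i in range(m):
            for j in range(m):
                P[i][j] += Pk[i][j]
        offset += size
    return P


# Hypothetical example: A is 2x3, H has one 2x2 block and one 1x1 block.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
blocks = [[[2.0, 1.0], [1.0, 2.0]], [[4.0]]]
P_one = triple_one_phase(A, blocks)
P_two = triple_two_phase(A, blocks)
print(P_one == P_two)
```

The block-wise version never stores the full intermediate $Q$, which is one plausible reading of the memory saving the abstract reports; the reported 2.04 speedup depends on details not shown here.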

Deterministic Parallelism for Symbolic Execution Programs based on a Name-Freshness Monad Library

  • Ahn, Ki Yung
    • Journal of the Korea Society of Computer and Information / v.26 no.2 / pp.1-9 / 2021
  • In this paper, we extend a generic library framework based on the state monad to exploit deterministic parallelism in the purely functional language Haskell, and provide benchmarks for the extended features on a multicore machine. Although purely functional programs are known to be well suited to exploiting parallelism, unintended sequential data dependencies can prevent effective parallelism. Symbolic execution programs usually implement fresh name generation in order to prevent confusion between variables with the same name in different scopes. Such implementations are often based on sequential state management, which works against parallelism. We provide reusable primitives to help develop parallel symbolic execution programs with unbound-generics, a generic name-binding library for Haskell, avoiding sequential dependencies in fresh name generation. Our parallel extension does not modify the internal implementation of the unbound-generics library, so it cannot degrade existing serial implementations of symbolic execution based on unbound-generics. Therefore, our extension can be applied only to the parts of the source code that need parallel speedup.

GPU-based Parallel Ant Colony System for Traveling Salesman Problem

  • Rhee, Yunseok
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.1-8 / 2022
  • In this paper, we design and implement a GPU-based parallel algorithm to effectively solve the traveling salesman problem (TSP) with an ant colony system. The iterative process of generating hundreds or thousands of tours simultaneously uses the GPU's task-level parallelism, and the update of the pheromone trail data exploits data parallelism with 32x32 thread blocks. In particular, through simultaneous memory access by multiple threads, coalesced accesses to contiguous memory addresses and concurrent accesses to shared memory are supported. The experiments used data sets of 127 to 1002 cities provided by TSPLIB, and compared the performance of the sequential and parallel algorithms on a system with an Intel Core i9-9900K CPU and an Nvidia Titan RTX. GPU parallelization yields a speedup of about 10.13 to 11.37 times.

An implementation of 2D/3D Complex Optical System and its Algorithm for High Speed, Precision Solder Paste Vision Inspection (솔더 페이스트의 고속, 고정밀 검사를 위한 이차원/삼차원 복합 광학계 및 알고리즘 구현)

  • 조상현;최흥문
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.3 / pp.139-146 / 2004
  • A 2D/3D complex optical system and its vision inspection algorithm are proposed and implemented as a single probe system for high speed, precise vision inspection of solder paste. A one-pass run-length labeling algorithm is proposed instead of the conventional two-pass labeling algorithm for fast extraction of the 2D shape of the solder paste image from recent line-scan cameras as well as conventional area-scan cameras, and optical probe path generation is also proposed for efficient 2D/3D inspection. A Moire interferometry-based phase shift algorithm and its optical system implementation are introduced, instead of the conventional laser slit-beam method, for high precision 3D vision inspection. All of the time-critical algorithms are parallel-coded with MMX SIMD instructions for further speedup. The proposed system is implemented for simultaneous 2D/3D inspection of a 10 mm × 10 mm FOV with resolutions of 10 µm for both the x and y axes and 1 µm for the z axis. Experiments conducted on several PCBs show that the 2D/3D inspection of an FOV, excluding image capture, takes about 0.011 s and 0.01 s, respectively, with ±1 µm height accuracy.

R Based Parallelization of a Climate Suitability Model to Predict Suitable Area of Maize in Korea (국내 옥수수 재배적지 예측을 위한 R 기반의 기후적합도 모델 병렬화)

  • Hyun, Shinwoo;Kim, Kwang Soo
    • Korean Journal of Agricultural and Forest Meteorology / v.19 no.3 / pp.164-173 / 2017
  • Alternative cropping systems would be one option for climate change adaptation. Suitable areas for a crop can be identified using a climate suitability model. The EcoCrop model has been used to assess the climate suitability of crops using monthly climate surfaces, e.g., digital climate maps at high spatial resolution. Still, a high-performance computing approach is needed to assess climate suitability while taking into account the complex terrain of Korea, which requires considerably large climate data sets. The objectives of this study were to implement a script for R, an open source statistical analysis platform, in order to run the EcoCrop model in a parallel computing environment, and to assess the climate suitability of maize using digital climate maps at high spatial resolution, e.g., 1 km. The total running time decreased as the number of CPU (Central Processing Unit) cores increased, although the speedup was not linear in the number of cores. For example, the wall clock time for assessing the climate suitability index at 1 km spatial resolution was reduced by 90% with 16 CPU cores. However, computing the climate suitability index took about 1.5 times as long as the theoretical time for the given number of CPU cores. Implementing a climate suitability assessment system based on MPI (Message Passing Interface) would allow support for digital climate maps at ultra-high spatial resolution, e.g., 30 m, which would help the site-specific design of cropping systems for climate change adaptation.
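
The figures quoted in the abstract above can be related with a short calculation. This is a sketch under one assumption that the abstract does not state explicitly: the "theoretical time" is the single-core time divided by the number of cores. The function and variable names are hypothetical.

```python
def speedup_from_reduction(reduction):
    """Speedup implied by a fractional reduction in wall clock time."""
    return 1.0 / (1.0 - reduction)


cores = 16          # number of CPU cores quoted in the abstract
reduction = 0.90    # 90% reduction in wall clock time quoted in the abstract

speedup = speedup_from_reduction(reduction)  # ratio of serial to parallel time
efficiency = speedup / cores                 # fraction of ideal linear speedup
slowdown_vs_ideal = cores / speedup          # measured time over ideal time T1/cores

print(f"speedup ~ {speedup:.1f}x")
print(f"parallel efficiency ~ {efficiency:.3f}")
print(f"time relative to ideal ~ {slowdown_vs_ideal:.1f}x")
```

Under that assumption, a 90% reduction on 16 cores corresponds to roughly a tenfold speedup and a run time about 1.6 times the ideal, which is in line with the "about 1.5 times" quoted above given that both figures are rounded.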

Low Complexity Motion Estimation Based on Spatio-Temporal Correlations (시간적-공간적 상관성을 이용한 저 복잡도 움직임 추정)

  • Yoon Hyo-Sun;Kim Mi-Young;Lee Guee-Sang
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1142-1149 / 2004
  • Motion estimation (ME) has been developed to reduce temporal redundancy in digital video signals and increase the data compression ratio. ME is an important part of video encoding systems, since it can significantly affect the output quality of encoded sequences. However, because ME has high computational complexity, it is difficult to apply to real-time video transmission. For this reason, motion estimation algorithms with low computational complexity are viable solutions. In this paper, we present an efficient method with low computational complexity based on spatial and temporal correlations of motion vectors. The proposed method uses temporally and spatially correlated motion information, namely the motion vector of the block with the same coordinates in the reference frame and the motion vectors of the blocks neighboring the current block in the current frame, to decide the search pattern and the location of the search starting point adaptively. Experiments show that, compared with MVFAST (Motion Vector Field Adaptive Search Technique) and PMVFAST (Predictive Motion Vector Field Adaptive Search Technique), the proposed method improves image quality by 0.01~0.3 dB and is about 1.12~1.33 times faster, owing to its lower computational complexity.

Optimal-synchronous Parallel Simulation for Large-scale Sensor Network (대규모 센서 네트워크를 위한 최적-동기식 병렬 시뮬레이션)

  • Kim, Bang-Hyun;Kim, Jong-Hyun
    • Journal of KIISE:Computer Systems and Theory / v.35 no.5 / pp.199-212 / 2008
  • Software simulation has been widely used for the design and application development of large-scale wireless sensor networks. The level of detail of the simulation must be high in order to verify the behavior of the network and to estimate the execution time and power consumption of an application program as accurately as possible. However, as the level of detail increases, the simulation time increases. Moreover, as the number of sensor nodes increases, the time tends to become extremely long. We propose an optimal-synchronous parallel discrete-event simulation method to shorten the simulation time for large-scale sensor networks. In this method, sensor nodes are partitioned into subsets, and each PC, interconnected with the others through a network, is in charge of simulating one of the subsets. Experiments using the parallel simulator developed in this study show that, when the number of sensor nodes is large, the speedup tends to approach the square of the number of PCs participating in the simulation. In such a case, the overhead due to parallel simulation is so small relative to the total simulation time that it can be ignored. Therefore, as long as PCs are available, the number of sensor nodes that can be simulated is not limited. In addition, our parallel simulation environment can be constructed easily and at low cost because PCs interconnected through a LAN are used without modification.