• Title/Summary/Keyword: Parallel processor

Molecular Interaction Interface Computing Based on Voxel Map (복셀맵을 기반으로 한 분자 간 상호작용 인터페이스의 계산)

  • Choi, Jihoon;Kim, Byungjoo;Kim, Ku-jin
    • Journal of the Korea Computer Graphics Society / v.18 no.3 / pp.1-7 / 2012
  • In this paper, we propose a method to compute the interface between protein molecules. When a molecule is represented as a set of spheres with van der Waals radii, the distance from a spatial point p to the molecule is the distance from p to the closest sphere. The molecular interface consists of the points equidistant from the two molecules. Our algorithm decomposes space into a set of voxels and constructs a voxel map by storing, for each voxel, information about the spheres that intersect it. Using the voxel map, we compute the distance between a point and the molecule. We also use a GPU for parallel processing to efficiently approximate the interface of a pair of molecules.
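
The voxel-map distance query described in this abstract can be sketched as follows. This is a minimal CPU illustration, not the authors' GPU implementation: the function names, the uniform grid, the `margin` parameter, and the bounding-box registration test are all assumptions made for the example.

```python
import math
from collections import defaultdict

def build_voxel_map(spheres, cell, margin):
    """Map each voxel index to the ids of spheres that may lie within `margin` of it.

    spheres: list of (x, y, z, radius). A sphere is registered in every voxel
    touched by the bounding box of the sphere inflated by `margin`
    (a conservative test, so some voxels list spheres they do not need).
    """
    vmap = defaultdict(list)
    for sid, (x, y, z, r) in enumerate(spheres):
        reach = r + margin
        lo = [math.floor((c - reach) / cell) for c in (x, y, z)]
        hi = [math.floor((c + reach) / cell) for c in (x, y, z)]
        for ix in range(lo[0], hi[0] + 1):
            for iy in range(lo[1], hi[1] + 1):
                for iz in range(lo[2], hi[2] + 1):
                    vmap[(ix, iy, iz)].append(sid)
    return vmap

def distance_to_molecule(p, spheres, vmap, cell):
    """Distance from p to the closest sphere surface (negative inside a sphere).

    Only spheres listed in p's voxel are tested. Returns None when the voxel
    is empty; values larger than the build margin are upper bounds only.
    """
    key = tuple(math.floor(c / cell) for c in p)
    candidates = vmap.get(key, ())
    if not candidates:
        return None
    return min(math.dist(p, spheres[s][:3]) - spheres[s][3] for s in candidates)

def on_interface(p, mol_a, map_a, mol_b, map_b, cell, tol):
    """True when p is (within tol) equidistant from both molecules."""
    da = distance_to_molecule(p, mol_a, map_a, cell)
    db = distance_to_molecule(p, mol_b, map_b, cell)
    return da is not None and db is not None and abs(da - db) <= tol

# Two one-atom "molecules" 4 units apart; the midpoint lies on the interface.
mol_a = [(0.0, 0.0, 0.0, 1.0)]
mol_b = [(4.0, 0.0, 0.0, 1.0)]
map_a = build_voxel_map(mol_a, cell=1.0, margin=2.0)
map_b = build_voxel_map(mol_b, cell=1.0, margin=2.0)
print(on_interface((2.0, 0.0, 0.0), mol_a, map_a, mol_b, map_b, 1.0, 1e-9))
```

Each query point is evaluated independently of the others, which is the property that makes this kind of computation a natural fit for GPU parallelism.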

FPGA-Based Hardware Accelerator for Feature Extraction in Automatic Speech Recognition

  • Choo, Chang;Chang, Young-Uk;Moon, Il-Young
    • Journal of information and communication convergence engineering / v.13 no.3 / pp.145-151 / 2015
  • We describe a hardware-based scheme for speeding up a real-time automatic speech recognition (ASR) system by designing a parallel feature extraction algorithm on a field-programmable gate array (FPGA). A computationally intensive block in the algorithm is identified and implemented in hardware logic on the FPGA. One such block is the mel-frequency cepstral coefficient (MFCC) algorithm used for feature extraction. We demonstrate that the FPGA platform can perform feature extraction for speech recognition more efficiently than general-purpose CPUs, including the ARM processor. The Xilinx Zynq-7000 System on Chip (SoC) platform is used for the MFCC implementation. With this implementation, we confirmed that the FPGA platform is approximately 500× faster than a sequential CPU implementation and 60× faster than a sequential ARM implementation. We thus verified that a parallelized and optimized MFCC architecture on the FPGA platform can significantly improve the execution time of an ASR system compared to the CPU and ARM platforms.
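
For orientation, the MFCC computation for a single frame can be sketched sequentially as below. This follows the commonly used textbook pipeline (window, power spectrum, mel filterbank, logarithm, DCT) and is not the paper's FPGA architecture; the function names, the frame length, and the filter and coefficient counts are illustrative assumptions.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=12):
    """Return n_coeffs MFCCs for one frame of audio samples."""
    n = len(frame)
    # 1. Hamming window to reduce spectral leakage.
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # 2. Power spectrum via a naive DFT (bins 0 .. n/2); an FFT would be used in practice.
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        power.append(abs(acc) ** 2 / n)
    # 3. Triangular filters spaced evenly on the mel scale.
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * j / (n_filters + 1)) for j in range(n_filters + 2)]
    edges = [f * n / sample_rate for f in edges_hz]  # fractional bin positions
    log_energies = []
    for j in range(1, n_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        energy = 0.0
        for k in range(n_bins):
            if left < k <= centre:
                energy += power[k] * (k - left) / (centre - left)
            elif centre < k < right:
                energy += power[k] * (right - k) / (right - centre)
        log_energies.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    # 4. DCT-II of the log filterbank energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]

# A 256-sample frame of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2.0 * math.pi * 440.0 * i / 8000.0) for i in range(256)]
coeffs = mfcc_frame(tone, 8000)
print(len(coeffs))
```

Every DFT bin and every mel filter in this sketch is computed independently of the others, which is the kind of structure a parallel hardware implementation can exploit.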

Development of Automated Surface Inspection System using the Computer Vision (컴퓨터 비젼을 이용한 표면결함검사장치 개발)

  • Lee, Jong-Hak;Jung, Jin-Yang
    • Proceedings of the KIEE Conference / 1999.07b / pp.668-670 / 1999
  • We have spent several years developing an automatic surface inspection system for cold-rolled strips in the steelmaking process. We have experience with various kinds of surface inspection systems installed in cold-rolled strip production lines, including the linear CCD camera type and the laser type. However, we were not satisfied with these systems because of their insufficient detection and classification rates, their real-time processing performance, and the limited line speed they support on real production lines. To increase detection and computing power, we used dark-field illumination with infrared LEDs, bright-field illumination with a xenon lamp, a parallel computing processor with an area-type CCD camera, and a fully software-based image processing technique for ease of upgrading and maintenance. In this paper, we introduce the automatic inspection system and a real-time image processing technique that uses object detection, defect detection, and classification algorithms. In experiments on a high-speed processing line (up to 1000 meters per minute), the defect detection rate was above 90% for all defects occurring on the real line, the defect name classification rate was about 80% for the 8 most frequently occurring defects, and the defect grade classification rate was 84% for name-classified defects.

A New Fast Algorithm for Short Range Force Calculation (근거리 힘 계산의 새로운 고속화 방법)

  • Lee, Sang-Hwan;Ahn, Cheol-O
    • 유체기계공업학회:학술대회논문집 / 2006.08a / pp.383-386 / 2006
  • In this study, we propose a new fast algorithm for calculating short-range forces in molecular dynamics. The algorithm uses a new hierarchical tree data structure that adapts well to the particle distribution. It can divide a parent cell into k daughter cells, and the tree structure is independent of the coordinate system and particle distribution. We investigated the characteristics and performance of the tree structure as a function of k. For parallel computation, we used the orthogonal recursive bisection method for domain decomposition to distribute particles among the processors, and the numerical experiments were performed on a 32-node Linux cluster. We compared the performance of the oct-tree and the newly developed algorithm for different particle distributions, problem sizes, and numbers of processors. The comparison was performed using a tree-independent method, so the results are independent of computing platform, parallelization, and programming language. We found that the new algorithm can reduce the computing cost for a large problem whose search range is short compared to the computational domain. However, the differences in wall-clock time are small, because the proposed algorithm requires more time to construct the tree structure than the oct-tree and the performance gain is small compared to the time for a single time-step calculation.
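
The orthogonal recursive bisection step mentioned in this abstract can be sketched as follows. This is a generic illustration of the technique, assuming a power-of-two processor count and a median split along the longest axis; the function name and the splitting rule are assumptions, and the paper's k-daughter tree structure is not reproduced here.

```python
def orb_partition(points, n_parts):
    """Split points into n_parts equal-sized groups by recursive median bisection.

    points: list of coordinate tuples. n_parts must be a power of two and no
    larger than the number of points. At each level the set is cut at the
    median along the axis with the largest extent.
    """
    if n_parts < 1 or n_parts & (n_parts - 1):
        raise ValueError("n_parts must be a power of two")
    if len(points) < n_parts:
        raise ValueError("need at least one point per part")
    if n_parts == 1:
        return [list(points)]
    dims = len(points[0])
    extents = [max(p[d] for p in points) - min(p[d] for p in points)
               for d in range(dims)]
    axis = extents.index(max(extents))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (orb_partition(ordered[:mid], n_parts // 2)
            + orb_partition(ordered[mid:], n_parts // 2))

# 64 particles on an 8 x 8 grid, distributed over 8 processors.
particles = [(i % 8, i // 8) for i in range(64)]
domains = orb_partition(particles, 8)
print([len(d) for d in domains])
```

Cutting at the median keeps the particle count per processor balanced even when the particle distribution is non-uniform.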

Performance Evaluation of Value Predictor in High Performance Microprocessors (고성능 마이크로프로세서에서 값 예측기의 성능평가)

  • Jeon Byoung-Chan;Kim Hyeock-Jin;RU Dae-Hee
    • Journal of the Korea Society of Computer and Information / v.10 no.2 s.34 / pp.87-95 / 2005
  • Value prediction in high-performance microprocessors is a technique that exploits instruction-level parallelism (ILP) by predicting the outcome of an instruction, which breaks true data dependences and allows dependent instructions to execute. In this paper, we compare and analyze value predictors, which improve ILP in microprocessors by issuing and executing instructions in parallel using predicted values. We measure and assess the prediction accuracy, the prediction rate, and the mean performance improvement of each predictor according to the point in time at which each table is updated. To verify validity, the SPECint95 benchmarks are compared through simulation using an execution-driven system.
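
The two metrics named in this abstract, prediction rate and prediction accuracy, can be illustrated with a simple last-value predictor. The abstract does not say which predictor designs the paper evaluates, so the class below, its confidence counter, and the thresholds are assumptions chosen only to show how the metrics are measured.

```python
class LastValuePredictor:
    """Predicts that an instruction will produce the same value as last time."""

    def __init__(self, threshold=2, max_confidence=3):
        self.table = {}  # instruction address -> [last_value, confidence]
        self.threshold = threshold
        self.max_confidence = max_confidence

    def predict(self, pc):
        """Return the predicted value, or None when confidence is too low."""
        entry = self.table.get(pc)
        if entry is not None and entry[1] >= self.threshold:
            return entry[0]
        return None

    def update(self, pc, actual):
        """Train the table once the real result is known."""
        entry = self.table.setdefault(pc, [actual, 0])
        if entry[0] == actual:
            entry[1] = min(entry[1] + 1, self.max_confidence)
        else:
            entry[0], entry[1] = actual, 0

def simulate(trace, predictor):
    """Run a trace of (pc, value) pairs; return (predicted, correct, total)."""
    predicted = correct = 0
    for pc, value in trace:
        guess = predictor.predict(pc)
        if guess is not None:
            predicted += 1
            correct += guess == value
        predictor.update(pc, value)  # update immediately after each instruction
    return predicted, correct, len(trace)

# One instruction that produces 5 six times and then 7.
trace = [(0x10, 5)] * 6 + [(0x10, 7)]
predicted, correct, total = simulate(trace, LastValuePredictor())
print("prediction rate:", predicted / total)
print("prediction accuracy:", correct / predicted)
```

Here the table is updated right after each instruction; delaying the call to `update` is one way to model the effect of update timing that the abstract refers to.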

The study on the Efficient methodology to apply the GPU for military information system improvement (국방정보시스템 성능향상을 위한 효율적인 GPU적용방안 연구)

  • Kauh, Janghyuk;Lee, Dongho
    • Journal of Korea Society of Digital Industry and Information Management / v.11 no.1 / pp.27-35 / 2015
  • As the number of GPU (Graphics Processing Unit) cores has increased, studies on high-performance computing platforms using GPUs have been actively pursued in recent years. This trend has led to the development of GPGPU (General-Purpose GPU) computing and the CUDA (Compute Unified Device Architecture) framework. In this paper, we explain the benefits of GPU-based systems and propose the ICIDF (Identify Compute-Intensive Data set and Function) methodology for applying GPU technology to legacy military information systems to improve performance. To demonstrate the efficiency of this methodology, we applied it to a CPU-based AES program obtained from an Internet web site. Simply changing the data structure improved the performance of the AES program. As a result, the performance of the GPU-based AES program improved gradually, by up to 10 times. Depending on the developer's ability, additional performance improvement can be expected. The remaining problem is heat, but this has been much mitigated by developments in cooling technology.

A Study on the design of RNS Multiplier to speed up the Graphic Process (고속 그래픽 처리를 위한 잉여수계 승산기 설계에 관한 연구)

  • Kim, Yong-Sung;Cho, Won-Kyung
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.1 / pp.25-37 / 1996
  • To process computer graphics in real time, high-speed operations (multiplication and addition) are needed to increase the speed of graphics processing. RNS (Residue Number System) is an integer number system that supports parallel, high-speed operations. It also allows both high-speed multipliers and adders to be designed, since in a cyclic group there is an isomorphism between multiplication and addition in RNS. In this paper, DRNS (Double Residue Number System) is proposed and used for the multiplier and the adder, which are designed using a circulative code for a high-speed graphics processor in RNS. The designed multiplier would operate at 87 MHz using two TTL parts, the 74S09 and 74S32.
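
The isomorphism between multiplication and addition mentioned in this abstract can be illustrated with index (discrete logarithm) tables: for a prime modulus, the nonzero residues form a cyclic group, so multiplying two residues amounts to adding their indices. The sketch below is a generic software illustration under that assumption. The moduli, primitive roots, and function names are chosen for the example, and the paper's DRNS and circulative code are not reproduced.

```python
from math import prod

MODULI = (5, 7, 11)          # illustrative pairwise-coprime primes (range 0..384)
PRIMITIVE_ROOTS = (2, 3, 2)  # a generator of the multiplicative group for each modulus

def build_index_tables(m, g):
    """Return (log, antilog) tables with antilog[k] = g**k mod m."""
    antilog = [pow(g, k, m) for k in range(m - 1)]
    log = {value: k for k, value in enumerate(antilog)}
    return log, antilog

TABLES = [build_index_tables(m, g) for m, g in zip(MODULI, PRIMITIVE_ROOTS)]

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Channel-wise addition; no carries cross between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Channel-wise multiplication done as index addition modulo m - 1."""
    out = []
    for x, y, m, (log, antilog) in zip(a, b, MODULI, TABLES):
        if x == 0 or y == 0:
            out.append(0)  # zero has no index and is handled separately
        else:
            out.append(antilog[(log[x] + log[y]) % (m - 1)])
    return tuple(out)

def from_rns(r):
    """Recover the integer with the Chinese remainder theorem."""
    big_m = prod(MODULI)
    total = 0
    for residue, m in zip(r, MODULI):
        partial = big_m // m
        total += residue * partial * pow(partial, -1, m)
    return total % big_m

print(from_rns(rns_mul(to_rns(12), to_rns(17))))
print(from_rns(rns_add(to_rns(100), to_rns(23))))
```

Because each residue channel is processed independently, the channels can be evaluated in parallel, which is where the speed advantage of RNS arithmetic comes from.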

Comparison of PWM Strategies for Three-Phase Current-fed DC/DC Converters

  • Cha, Han-Ju;Choi, Soon-Ho;Han, Byung-Moon
    • Journal of Power Electronics / v.8 no.4 / pp.363-370 / 2008
  • In this paper, three kinds of PWM strategies for a three-phase current-fed dc/dc converter are proposed and compared in terms of losses and voltage transfer ratio. Each PWM strategy is described graphically and its switching losses are analyzed. With the proposed PWM C strategy, one turn-off switching of each bridge switch is eliminated to reduce switching losses at the same switching frequency. In addition, the RMS current through the bridge switches is lowered by using a parallel connection between two bridge switches, and thus conduction losses of the switches are reduced. Further, copper losses of the transformer are decreased due to the reduced RMS current of each transformer winding. Therefore, total losses are minimized and the efficiency of the converter is improved by using the proposed PWM C strategy. A digital signal processor (DSP: TI320LF2407) and a field-programmable gate array (FPGA: EPM7128) board are used to generate PWM patterns for the three-phase bridge and clamp MOSFETs. A 500W prototype converter is built, and its experimental results verify the validity of the proposed PWM strategies.

Reducing False Sharing based on Memory Reference Patterns in Distributed Shared Memory Systems (분산 공유 메모리 시스템에서 메모리 참조 패턴에 근거한 거짓 공유 감속 기법)

  • Jo, Seong-Je
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1082-1091 / 2000
  • In distributed shared memory systems, false sharing occurs when two different data items, not shared but accessed by two different processors, are allocated to a single block, and it is an important factor in degrading system performance. This paper first analyzes shared memory allocation and reference patterns in parallel applications that allocate memory for shared data objects using a dynamic memory allocator. The shared objects are allocated sequentially and generally show different reference patterns. If objects of the same size are requested successively as many times as there are processors, each object is referenced by only one particular processor. If objects of the same size are requested successively many more times than there are processors, two or more successive objects are referenced by only particular processors. On the basis of these analyses, we propose a memory allocation scheme that allocates objects requested by different processors to different pages, and we evaluate existing memory allocation techniques for reducing false sharing faults. For some applications, our allocation scheme considerably reduces false sharing faults at the cost of a little additional memory space.
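
The difference between sequential allocation and per-processor page allocation can be sketched with a small simulation. This is a toy model, not the paper's allocator: the page size, the function names, and the assumption that each object is referenced only by the processor that requested it are all illustrative.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def sequential_alloc(requests):
    """Pack objects one after another, regardless of which processor asked.

    requests: list of (processor, size). Returns (processor, address, size).
    """
    address = 0
    placed = []
    for proc, size in requests:
        placed.append((proc, address, size))
        address += size
    return placed

def per_processor_alloc(requests, page_size=PAGE_SIZE):
    """Give each processor its own pages so no page mixes two processors' objects."""
    state = {}      # processor -> [next_address, bytes_left_in_current_page]
    next_page = 0
    placed = []
    for proc, size in requests:
        if size > page_size:
            raise ValueError("this sketch only handles objects up to one page")
        slot = state.get(proc)
        if slot is None or slot[1] < size:
            slot = [next_page * page_size, page_size]  # open a fresh page
            next_page += 1
            state[proc] = slot
        placed.append((proc, slot[0], size))
        slot[0] += size
        slot[1] -= size
    return placed

def pages_with_false_sharing(placed, page_size=PAGE_SIZE):
    """Count pages holding objects that belong to more than one processor."""
    owners = {}
    for proc, address, size in placed:
        for page in range(address // page_size, (address + size - 1) // page_size + 1):
            owners.setdefault(page, set()).add(proc)
    return sum(1 for procs in owners.values() if len(procs) > 1)

# Four processors each request four 64-byte objects in round-robin order.
requests = [(proc, 64) for _ in range(4) for proc in range(4)]
print(pages_with_false_sharing(sequential_alloc(requests)))
print(pages_with_false_sharing(per_processor_alloc(requests)))
```

The per-processor layout uses four pages where the sequential layout uses one, which mirrors the trade-off the abstract describes: fewer false sharing faults in exchange for some additional memory.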

A study on the implementation simulation and system for 2-D doppler system using second-order sampling (2차 샘플링을 이용한 2-D 도플러 시스템의 시뮬레이션과 시스템구현에 관한 연구)

  • 임춘성;임용곤
    • Journal of Biomedical Engineering Research / v.11 no.1 / pp.147-156 / 1990
  • A two-dimensional pulsed Doppler system for ultrasonic blood velocity Doppler signals is studied and implemented. The second-order sampling method and serial data processing procedures are utilized in the system, which eliminates the untuning problems at phase channels in the quadrature detection method as well as in the channels of parallel data processing. The digital signal processor used in this system allows hardware savings and flexible design options. The efficiency of various mean frequency estimators in the second-order sampling system is examined by computer simulation as a function of the intersequence sample delay time. The temporal delay for the quadrature component is changed from $1/(4f_o)$ to $3/(4f_o)$ and $5/(4f_o)$, where $f_o$ is the center frequency of the transducer. It is found that the autocorrelator is the optimum frequency estimator for second-order sampling with intersequence sample delays of $1/(4f_o)$, $3/(4f_o)$, and $5/(4f_o)$. The qualitative variation and information proportional to blood velocity in the vessel system are obtained in in vivo experiments.
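
The combination of second-order sampling and an autocorrelation mean frequency estimator can be sketched as follows. This is an idealized, noise-free model written for illustration: it assumes a single-frequency echo, a quadrature delay of $1/(4f_o)$, and the common lag-one autocorrelation estimator, and all function names and parameter values are hypothetical rather than taken from the paper.

```python
import cmath
import math

def rf_echo(f_o, phase, tau):
    """Idealized RF echo at carrier frequency f_o with a Doppler phase offset."""
    return math.cos(2.0 * math.pi * f_o * tau + phase)

def second_order_samples(f_o, f_d, prf, n_pulses):
    """Build complex samples from two RF samples per pulse.

    The in-phase sample is taken at tau = 0 and the quadrature sample a
    quarter of a carrier period later, at tau = 1 / (4 * f_o). The sign flip
    turns cos(phase + pi/2) = -sin(phase) into +sin(phase).
    """
    delay = 1.0 / (4.0 * f_o)
    samples = []
    for n in range(n_pulses):
        phase = 2.0 * math.pi * f_d * n / prf  # Doppler phase advance per pulse
        in_phase = rf_echo(f_o, phase, 0.0)
        quadrature = -rf_echo(f_o, phase, delay)
        samples.append(complex(in_phase, quadrature))
    return samples

def autocorrelator_mean_frequency(samples, prf):
    """Mean frequency from the phase of the lag-one autocorrelation."""
    r1 = sum(a.conjugate() * b for a, b in zip(samples, samples[1:]))
    return cmath.phase(r1) * prf / (2.0 * math.pi)

# 3 MHz transducer, 500 Hz Doppler shift, 5 kHz pulse repetition frequency.
samples = second_order_samples(f_o=3.0e6, f_d=500.0, prf=5000.0, n_pulses=64)
estimate = autocorrelator_mean_frequency(samples, prf=5000.0)
print(round(estimate, 3))
```

The estimate is unambiguous only while the Doppler shift stays below half the pulse repetition frequency, since the phase of the autocorrelation wraps at ±π.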
