• Title/Abstract/Keywords: Parallel computation

594 search results (processing time: 0.019 s)

FPGA-Based Hardware Accelerator for Feature Extraction in Automatic Speech Recognition

  • Choo, Chang;Chang, Young-Uk;Moon, Il-Young
    • Journal of information and communication convergence engineering / Vol. 13, No. 3 / pp.145-151 / 2015
  • This paper describes a hardware-based scheme for speeding up a real-time automatic speech recognition (ASR) system by designing a parallel feature extraction algorithm on a Field-Programmable Gate Array (FPGA). A computationally intensive block in the algorithm is identified and implemented in hardware logic on the FPGA. One such block is the mel-frequency cepstrum coefficient (MFCC) algorithm used for feature extraction. We demonstrate that the FPGA platform can perform feature extraction in the speech recognition system more efficiently than general-purpose CPUs, including the ARM processor. The Xilinx Zynq-7000 System on Chip (SoC) platform is used for the MFCC implementation. With this implementation, we confirmed that the FPGA platform is approximately 500× faster than a sequential CPU implementation and 60× faster than a sequential ARM implementation. We thus verified that a parallelized and optimized MFCC architecture on the FPGA platform can significantly reduce the execution time of an ASR system compared with the CPU and ARM platforms.
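
For orientation, the MFCC computation named in this abstract can be sketched in software as follows. This is a minimal textbook pipeline (Hamming window, power spectrum, triangular mel filterbank, logarithm, DCT-II) and not the authors' FPGA architecture. The function names, the 20-filter bank, the 12 coefficients, and the naive O(N^2) DFT are all assumptions chosen to keep the example self-contained.

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 / N for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(w * math.cos(2 * math.pi * k * i / n) for i, w in enumerate(win))
        im = -sum(w * math.sin(2 * math.pi * k * i / n) for i, w in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_pts = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_pts]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=12):
    """Return n_coeffs cepstral coefficients for one frame of samples."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log filterbank energies; the floor avoids log(0) on silent bands.
    log_e = [math.log(max(sum(f * s for f, s in zip(filt, spec)), 1e-10))
             for filt in bank]
    # DCT-II decorrelates the log energies into cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]
```

Each frame is processed independently of the others, and within a frame the spectrum bins and filterbank sums are independent as well, which is the kind of structure a parallel hardware implementation can exploit.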

Target Recognition Using Multiple Neocognitron Modules

  • 주기현;서춘원;류충상;김은수
    • The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지) / Vol. 21, No. 11 / pp.2739-2749 / 1996
  • This paper introduces a multiple Neocognitron module approach for effective target recognition. The Neocognitron, which is designed to classify a pattern by extracting local features from it, appears to be a unique neural-network method for pattern recognition. Because of its rigid structure, however, the Neocognitron must be reconstructed whenever the number of classes changes. This is a difficult problem for target recognition applications, which require a huge amount of computation and many classes. In this paper, we construct several smaller Neocognitron modules and train each module on one class. After constructing the modules, we integrate them in parallel so that they receive the input at the same time and each produces a score for its learned class. This approach reduces the size of the networks and adapts to an increase in the number of classes, while retaining the inherent distortion, shift, and scale invariance and the slight rotation invariance of the general Neocognitron. The paper demonstrates the effectiveness of the proposed approach through experiments and analyzes the inhibitory interconnections in the architecture of the multiple-module structure.

A New Fast Algorithm for Short Range Force Calculation

  • 안상환;안철오
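    • Korean Fluid Machinery Association: Conference Proceedings (유체기계공업학회:학술대회논문집) / Proceedings of the 4th Korea Fluid Engineering Conference, 2006 (유체기계공업학회 2006년 제4회 한국유체공학학술대회 논문집) / pp.383-386 / 2006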
  • In this study, we propose a new fast algorithm for calculating short-range forces in molecular dynamics. The algorithm uses a new hierarchical tree data structure that adapts well to the particle distribution. It can divide a parent cell into k daughter cells, and the tree structure is independent of the coordinate system and particle distribution. We investigated the characteristics and performance of the tree structure as a function of k. For parallel computation, we used the orthogonal recursive bisection method for domain decomposition to distribute particles among processors, and the numerical experiments were performed on a 32-node Linux cluster. We compared the performance of the oct-tree and the new algorithm for different particle distributions, problem sizes, and numbers of processors. The comparison was performed using a tree-independent method, so the results do not depend on the computing platform, parallelization, or programming language. The new algorithm was found to reduce the computing cost for large problems in which the search range is short compared with the computational domain. However, the differences in wall-clock time are small, because the proposed algorithm takes more time than the oct-tree to construct the tree structure and the performance gain is small compared with the time for a single time-step calculation.
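
The orthogonal recursive bisection (ORB) step mentioned in this abstract can be illustrated with a short sketch. It is a generic version under stated assumptions, not the authors' code: the function name `orb_partition`, the choice of the longest axis as the cut direction, and the proportional split for processor counts that are not powers of two are all illustrative.

```python
def orb_partition(points, n_parts):
    """Split points into n_parts spatially compact groups of near-equal size.

    At each level the point set is cut by a plane orthogonal to the axis
    with the largest extent, and the two halves are partitioned recursively.
    """
    if n_parts == 1:
        return [points]
    if not points:
        return [[] for _ in range(n_parts)]
    dims = len(points[0])
    # Cut along the axis where the particles are spread out the most.
    axis = max(range(dims),
               key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    left_parts = n_parts // 2
    # Give each side a share of particles proportional to its processor count.
    cut = len(pts) * left_parts // n_parts
    return (orb_partition(pts[:cut], left_parts)
            + orb_partition(pts[cut:], n_parts - left_parts))
```

For example, `orb_partition(particles, 32)` would return one particle list per node of a 32-node cluster. Balancing by particle count is the simplest criterion; a production code would more likely balance by estimated work per particle.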

Numerical Investigation of the Flow in a Micronozzle for Seal Dispenser

  • 박규진;곽호상;손병철;김경진
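    • Korean Society of Computational Fluids Engineering: Conference Proceedings (한국전산유체공학회:학술대회논문집) / Proceedings of the 2007 Fall Conference (한국전산유체공학회 2007년도 추계 학술대회논문집) / pp.236-242 / 2007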
  • A theoretical and numerical investigation is performed on the flow in a micronozzle for a precision-controlled seal dispenser. The working fluid is a highly viscous epoxy used as a sealant in producing LCD panels, which contains a number of tiny solid spacers. Flow analysis is conducted in order to achieve the optimal design of the internal geometry of the nozzle. A simplified design analysis methodology is proposed for predicting the flow in the nozzle, based on the assumption that the Reynolds number is much less than O(1). Parallel numerical computations are performed using the CFD package FLUENT. The comparison shows that the theoretical model gives a good prediction of the distribution of pressure and wall shear stress in the nozzle. However, the theoretical model has difficulty predicting the maximum wall shear stress, which the numerical computation finds in a limited region near the edge. The theoretical and numerical simulations provide a good guideline for designing a dispensing micronozzle.

Performance Evaluation of Large Eddy Simulation for Recirculating and Swirling Flows

  • 황철홍;이창언
    • Transactions of the Korean Society of Mechanical Engineers B (대한기계학회논문집B) / Vol. 30, No. 4 / pp.364-372 / 2006
  • The objective of this study is to evaluate the efficiency and prediction accuracy of a developed large eddy simulation (LES) program for complex turbulent flows, such as recirculating and swirling flows. To reduce the computational cost, a Beowulf cluster system consisting of 16 processors was constructed. The flows in a backward-facing step and a dump combustor were examined as representative recirculating and swirling flows. First, a direct numerical simulation (DNS) of laminar backward-facing step flows was conducted to validate the overall performance of the program. LES was then carried out for turbulent backward-facing step flows. The laminar flow results showed qualitative and quantitative agreement between simulations and experiments. The simulations of the turbulent flow also showed reasonable results. Second, LES results for non-swirling and swirling flows in a dump combustor were compared with the results of Reynolds-averaged Navier-Stokes (RANS) simulations using the standard $k-{\varepsilon}$ model. The results show that LES predicts the mean axial and azimuthal velocities, the corner recirculation zone (CRZ), and the center toroidal recirculation zone (CTRZ) better than RANS. Finally, the capability of LES to describe unsteady phenomena was examined.

An applied model for steel reinforced concrete columns

  • Lu, Xilin;Zhou, Ying
    • Structural Engineering and Mechanics / Vol. 27, No. 6 / pp.697-711 / 2007
  • Though extensive research has been carried out on the ultimate strength of steel reinforced concrete (SRC) members under static and cyclic load, there is only limited information on applied analysis models. The inelastic response of SRC members can be modeled using a microcosmic model. However, the generally used microcosmic model, which usually contains a group of parameters, is too complicated to apply in the nonlinear structural computation of large whole buildings. The intent of this paper is to develop an effective modeling approach for the reliable prediction of the inelastic response of SRC columns. First, five SRC columns were tested under cyclic static load and constant axial force. Based on the experimental results, normalized trilinear skeleton curves were then put forward. A theoretical equation for the normalizing point (ultimate strength point) was built up according to the load-bearing mechanism of RC columns and verified against the 5 specimens in this test and 14 SRC columns from parallel tests. Since no obvious strength deterioration or pinch effect was observed in the load-displacement curve, a hysteresis rule considering only stiffness degradation was proposed through regression analysis. Compared with the experimental results, the applied analysis model captures the overall cyclic response of SRC columns well enough that it can be easily used in both static and dynamic analysis of whole SRC structural systems.

Application of Multi-Frontal Method in Collaborative Engineering Environment

  • Cho, Seong-Wook;Choi, Young;Lee, Gyu-Bong;Kwon, Ki-Eak
    • International Journal of CAD/CAM / Vol. 3, No. 1_2 / pp.51-60 / 2003
  • The growth of the World Wide Web and the advances in high-speed network access have greatly changed the existing CAD/CAE environment. The WWW has enabled us to share various distributed product data and to collaborate in the design process. An international standard for product model data, STEP, and a standard for distributed object technology, CORBA, are very important technological components for interoperability in the advanced design and manufacturing environment. These two technologies provide the background for sharing product data and integrating applications on the network. This paper describes a distributed CAD/CAE environment that is integrated on the network by CORBA and the product model data standard STEP. Several prototype application modules were implemented to verify the proposed concept, and the test results are discussed. The finite element analysis server is further distributed into several frontal servers to implement a distributed parallel solution of the finite element system equations. Distributed computation on the analysis server is also implemented using CORBA to generalize the proposed method.

Requirement Analysis and Drag Prediction for the Aerodynamic Configuration of a Bearingless Rotor Hub

  • 강희정
    • Aerospace Engineering and Technology (항공우주기술) / Vol. 11, No. 1 / pp.19-26 / 2012
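  • The aerodynamic hub drag requirement allocated in the development of a bearingless rotor hub system was analyzed and refined so that it can be verified by the method specified in the requirement. Drag was predicted for the initial hub configuration on the basis of aerodynamic coefficients, and design changes were proposed to meet the requirement. Drag was then predicted for the final configuration using computational fluid dynamics, and the result was confirmed to satisfy the refined requirement. The predicted hub drag was also confirmed to lie within the range that can be inferred from the trend line of previously developed helicopters.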

CUDA based Lossless Asynchronous Compression of Ultra High Definition Game Scenes using DPCM-GR

  • 김영식
    • Journal of Korea Game Society (한국게임학회 논문지) / Vol. 14, No. 6 / pp.59-68 / 2014
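  • The memory bandwidth required for ultra-high-definition (UHD, $4096{\times}2160$) game scenes increases exponentially. To solve the memory bandwidth problem without degrading image quality, this paper implements lossless compression in a CUDA environment using the DPCM-GR method, a modification of the DDPCM-GR compression algorithm of [4], which supports a bit-parallel pipeline. Efficiency was increased by using CUDA shared memory, and various configurations that overlap kernel execution with data transfer were implemented through asynchronous transfers from page-locked host memory. In experiments, a speedup of up to 31.3× over the CPU implementation was achieved, and varying the asynchronous transfer configuration reduced execution time by up to 30.3%.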
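
To make the two building blocks in the name DPCM-GR concrete, the sketch below combines differential pulse-code modulation (each pixel is predicted by its left neighbor) with Golomb-Rice coding of the residuals. It is a sequential CPU illustration of the general technique only, written under assumptions: the abstract does not give the paper's predictor, Rice parameter, or bit layout, so the helper names, the zigzag mapping, the default `k=2`, and the bit-string output are all hypothetical.

```python
def zigzag(v):
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,... -> 0,1,2,3,..."""
    return 2 * v if v >= 0 else -2 * v - 1

def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(values, k):
    """Golomb-Rice code: unary quotient (q ones and a zero), then k remainder bits."""
    bits = []
    for u in values:
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")
        if k:
            bits.append(format(r, "0{}b".format(k)))
    return "".join(bits)

def rice_decode(bits, count, k):
    """Decode `count` Golomb-Rice codewords from a bit string."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                      # skip the terminating zero
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append((q << k) | r)
    return out

def dpcm_encode(row, k=2):
    """Encode one pixel row: residual = pixel - previous pixel (first predictor is 0)."""
    prev, residuals = 0, []
    for px in row:
        residuals.append(zigzag(px - prev))
        prev = px
    return rice_encode(residuals, k)

def dpcm_decode(bits, count, k=2):
    """Exactly reconstruct the pixel row from the bit string (lossless)."""
    prev, row = 0, []
    for u in rice_decode(bits, count, k):
        prev += unzigzag(u)
        row.append(prev)
    return row
```

Smooth image regions give small residuals and therefore short codewords, while decoding always reproduces the input exactly. In this sketch the first pixel of a row is coded against a predictor of 0, so it is expensive; a practical codec would store it raw or choose `k` adaptively.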

Numerical Simulation of Thermal Control of a Hot Plate for Thermal Nanoimprint Lithography Machines

  • 박규진;곽호상;신동원;이재종
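    • Korean Society of Computational Fluids Engineering: Conference Proceedings (한국전산유체공학회:학술대회논문집) / Proceedings of the 2007 Spring Conference (한국전산유체공학회 2007년도 춘계 학술대회논문집) / pp.153-158 / 2007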
  • Since the introduction of Nanoimprint in the mid-1990s, Nanoimprint lithography, a low-cost, non-conventional method, has been the dominant lithography technology that guarantees high-throughput patterning of nanostructures. Based on a mechanical embossing mechanism, Nanoimprint lithography creates nanopatterns on the polymer material cast on the substrate. In essence, the process needs nanofabrication equipment for printing with adequate control of temperature, pressure, and the parallelism between the stamp and substrate. This article examines the feasibility and practice of thermal control of the hot plate using a CFD code. Numerical computation has been conducted to assess the feasibility of a hot plate ($120{\times}120\;\mathrm{mm}^2$). PID control is adopted to ensure high temperature uniformity in several zones. Parallel experiments have also been performed to verify thermal performance. The results not only show the optimum number of thermocouples in relation to the controllers but also suggest that thermal simulation using a CFD code is a cost-effective alternative method for designing and developing thermal control equipment.
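
The PID temperature control mentioned in this abstract can be illustrated with a toy simulation. This is a sketch under strong assumptions rather than the authors' CFD model: the hot plate is reduced to a single zone with a first-order lumped thermal model, and the function name, the time constant `tau`, the heater `gain`, the ambient temperature, and the controller gains in the example are all hypothetical.

```python
def simulate_pid(setpoint, kp, ki, kd, steps=2000, dt=0.1,
                 ambient=25.0, tau=20.0, gain=2.0):
    """Simulate a discrete PID loop on a one-zone heater and return the final temperature.

    Assumed plant: tau * dT/dt = -(T - ambient) + gain * power
    """
    temp = ambient
    integral = 0.0
    prev_err = setpoint - temp
    for _ in range(steps):
        err = setpoint - temp
        integral += err * dt                  # integral term removes steady-state offset
        deriv = (err - prev_err) / dt         # derivative term damps fast changes
        power = max(0.0, kp * err + ki * integral + kd * deriv)  # a heater cannot cool
        temp += dt * (-(temp - ambient) + gain * power) / tau    # explicit Euler step
        prev_err = err
    return temp
```

With the hypothetical gains `kp=2.0, ki=0.5, kd=0.0`, `simulate_pid(150.0, 2.0, 0.5, 0.0)` settles close to the 150 degree setpoint within the simulated 200 s. A multi-zone plate would run one such loop per zone, with coupling between zones supplied by the thermal model.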
