• Title/Summary/Keyword: Software Pipelining

Search Result 23, Processing Time 0.019 seconds

Optimization of Gaussian Mixture Computation of ASR on DSP 67x (DSP 67x 기반 음성인식 시스템의 가우시안 확률 계산 최적화 구현)

  • Choi Taeil;Kim Taeyun;Ko Hanseok
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.53-56
    • /
    • 2004
  • 본 논문은 HMM 기반 임베디드 음성인식 시스템 구현에 관한 몇 가지 주제들을 설명한다. 임베디드 환경은 한정된 자원을 가지고 있고 그러한 가운데 타당한 인식률과 향상된 인식 속도를 얻기 위해서 몇가지 방법들을 이 논문에서 설명한다. 구현 환경은 DSP6711 기반에서 이루어졌다. 가우시안 mixture 계산 루틴을 부동소수점 연산에서 고정소수점 연산 및 software pipelining을 적용하였다. 고정소수점 변환 전과 후 비슷한 인식률을 얻었고 고정소수점 변환과 software pipelining 적용 후 연산 속도의 향상을 얻었다.

  • PDF

MLP Design Method Optimized for Hidden Neurons on FPGA (FPGA 상에서 은닉층 뉴런에 최적화된 MLP의 설계 방법)

  • Kyoung Dong-Wuk;Jung Kee-Chul
    • The KIPS Transactions:PartB
    • /
    • v.13B no.4 s.107
    • /
    • pp.429-438
    • /
    • 2006
  • Neural Networks(NNs) are applied for solving a wide variety of nonlinear problems in several areas, such as image processing, pattern recognition etc. Although NN can be simulated by using software, many potential NN applications required real-time processing. Thus they need to be implemented as hardware. The hardware implementation of multi-layer perceptrons(MLPs) in several kind of NNs usually uses a fixed-point arithmetic due to a simple logic operation and a shorter processing time compared to the floating-point arithmetic. However, the fixed-point arithmetic-based MLP has a drawback which is not able to apply the MLP software that use floating-point arithmetic. We propose a design method for MLPs which has the floating-point arithmetic-based fully-pipelining architecture. It has a processing speed that is proportional to the number of the hidden nodes. The number of input and output nodes of MLPs are generally constrained by given problems, but the number of hidden nodes can be optimized by user experiences. Thus our design method is using optimized number of hidden nodes in order to improve the processing speed, especially in field of a repeated processing such as image processing, pattern recognition, etc.

A Study on Efficiency Improvement of USN Logistics Management System applied Pipelining Techniques (파이프라이닝 기법을 적용한 USN 물류관리 시스템 효율성 향상에 관한 연구)

  • Kim, Seok-Soo;Jung, Sung-Mo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.10 no.6
    • /
    • pp.1214-1219
    • /
    • 2009
  • Many studies are being applied for various parts of USN (Ubiquitous Sensor Network) technology. The world's large retail stores and warehouses that apply logistic management are also studied. With this, USN technology is increasing in its utilization. However, to handle and process real-time data will never be never easy if these huge warehouses are using too many sensors, and real-time data correction is almost impossible. Software implementation and high-speed hardware are insufficient to solve these complex problems. To solve this problem, a key solution is to implement high-speed software. Hence, this paper suggests a USN logistics management system that applies pipelining techniques for efficiency in real-time data correction and reduces errors of generated values.

A Vectorization Technique at Object Code Level (목적 코드 레벨에서의 벡터화 기법)

  • Lee, Dong-Ho;Kim, Ki-Chang
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.5
    • /
    • pp.1172-1184
    • /
    • 1998
  • ILP(Instruction Level Parallelism) processors use code reordering algorithms to expose parallelism in a given sequential program. When applied to a loop, this algorithm produces a software-pipelined loop. In a software-pipelined loop, each iteration contains a sequence of parallel instructions that are composed of data-independent instructions collected across from several iterations. For vector loops, however the software pipelining technique can not expose the maximum parallelism because it schedules the program based only on data-dependencies. This paper proposes to schedule differently for vector loops. We develop an algorithm to detect vector loops at object code level and suggest a new vector scheduling algorithm for them. Our vector scheduling improves the performance because it can schedule not only based on data-dependencies but on loop structure or iteration conditions at the object code level. We compare the resulting schedules with those by software-pipelining techniques in the aspect of performance.

  • PDF

New Partitioning Techniques in Hrdware-Software Codesign (하드웨어-소프트웨어 통합설계에서의 새로운 분할 방법)

  • 김남훈;신현철
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.5
    • /
    • pp.1-10
    • /
    • 1998
  • In this paper, a new hardware-software patitioning algorithm is presented, in which the system behavioral description containing a mixture of hardware and softwae components is partitioned into the hardware part and the software part. In this research, new techniques to optimally partition a mixed system under certain specified constaints such as performance, area, and delay, have been developed. During the partitioning process, the overhead due to the communication between the hardware and software parts are considered. New featues have been added to adjust the hierarchical level of partitioning. Power consumption, memory cost, and the effect of pipelining can also be considered during partitioning. Another new feature is the ability to partition a DSP system under throughput constraints. This feature is important for real time processing. The developed partitioning system can also be used to evaluate various design alternatives and architectures.

  • PDF

Priority Data Handling in Pipeline-based Workflow (파이프라인 기반 워크플로우의 우선 데이터 처리 방안)

  • Jeon, Wonpyo;Heo, Daeyoung;Hwang, Suntae
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.12
    • /
    • pp.691-697
    • /
    • 2017
  • Volcanic ash has been predicted to be the main source of damage caused by a potential volcanic disaster around Mount Baekdu and the regions of the Korean peninsula. Computer simulations to predict the diffusion of volcanic ash should be performed according to prevalent meteorological situations within a predetermined time. Therefore, a workflow using pipelining is proposed to parallelize the software used for this computation. Due to the nature of volcanic calamities, the simulations need to be carried out for various plausible conditions given that the parameters cannot be precisely determined during the simulations, even at the time of a volcanic eruption. Among the given conditions, computations need to be first performed for the condition with the highest probability so that a response to the volcanic disaster can be provided using these results. Further action can then be performed later based on subsequent results. The computations need to be performed using a volcanic disaster damage prediction system on a computing server with limited computing performance. Hence, an optimal distribution of the computing resources is required. We propose a method through which specific data can be provided first to the proposed pipeline-based workflow.

A VLSI Architecture of Systolic Array for FET Computation (고속 퓨리어 변환 연산용 VLSI 시스토릭 어레이 아키텍춰)

  • 신경욱;최병윤;이문기
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.9
    • /
    • pp.1115-1124
    • /
    • 1988
  • A two-dimensional systolic array for fast Fourier transform, which has a regular and recursive VLSI architecture is presented. The array is constructed with identical processing elements (PE) in mesh type, and due to its modularity, it can be expanded to an arbitrary size. A processing element consists of two data routing units, a butterfly arithmetic unit and a simple control unit. The array computes FFT through three procedures` I/O pipelining, data shuffling and butterfly arithmetic. By utilizing parallelism, pipelining and local communication geometry during data movement, the two-dimensional systolic array eliminates global and irregular commutation problems, which have been a limiting factor in VLSI implementation of FFT processor. The systolic array executes a half butterfly arithmetic based on a distributed arithmetic that can carry out multiplication with only adders. Also, the systolic array provides 100% PE activity, i.e., none of the PEs are idle at any time. A chip for half butterfly arithmetic, which consists of two BLC adders and registers, has been fabricated using a 3-um single metal P-well CMOS technology. With the half butterfly arithmetic execution time of about 500 ns which has been obtained b critical path delay simulation, totla FFT execution time for 1024 points is estimated about 16.6 us at clock frequency of 20MHz. A one-PE chip expnsible to anly size of array is being fabricated using a 2-um, double metal, P-well CMOS process. The chip was layouted using standard cell library and macrocell of BLC adder with the aid of auto-routing software. It consists of around 6000 transistors and 68 I/O pads on 3.4x2.8mm\ulcornerarea. A built-i self-testing circuit, BILBO (Built-In Logic Block Observation), was employed at the expense of 3% hardware overhead.

  • PDF

PDA-based Text Extraction System using Client/Server Architecture (Client/Server구조를 이용한 PDA기반의 문자 추출 시스템)

  • Park Anjin;Jung Keechul
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.2
    • /
    • pp.85-98
    • /
    • 2005
  • Recently, a lot of researches about mobile vision using Personal Digital Assistant(PDA) has been attempted. Many CPUs for PDA are integer CPUs, which have no floating-computation component. It results in slow computation of the algorithms peformed by vision system or image processing, which have much floating-computation. In this paper, in order to resolve this weakness, we propose the Client(PDA)/server(PC) architecture which is connected to each other with a wireless LAN, and we construct the system with pipelining processing using two CPUs of the Client(PDA) and the Server(PC) in image sequence. The Client(PDA) extracts tentative text regions using Edge Density(ED). The Server(PC) uses both the Multi-1.aver Perceptron(MLP)-based texture classifier and Connected Component(CC)-based filtering for a definite text extraction based on the Client(PDA)'s tentativel99-y extracted results. The proposed method leads to not only efficient text extraction by using both the MLP and the CC, but also fast running time using Client(PDA)/server(PC) architecture with the pipelining processing.

Preprocessing Methods for Effective Modulo Scheduling on High Performance DSPs (고성능 디지털 신호 처리 프로세서상에서 효율적인 모듈로 스케쥴링을 위한 전처리 기법)

  • Cho, Doo-San;Paek, Yun-Heung
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.5
    • /
    • pp.487-501
    • /
    • 2007
  • To achieve high resource utilization for multi-issue DSPs, production compiler commonly includes variants of iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, unduly restrict modulo scheduling freedom. As a result, replicated functional units in multi-issue DSPs are often under-utilized. To address this resource under-utilization problem, our paper describes a novel compiler preprocessing strategy for effective modulo scheduling. The preprocessing strategy proposed capitalizes on two new transformations, which are referred to as cloning and dismantling. Our preprocessing strategy has been validated by an implementation for StarCore SC140 DSP compiler.

Enhanced Pipeline Scheduling for IA-64 (IA-64를 위한 향상된 소프트웨어 파이프라인 명령어 스케줄링)

  • Lee Jae-Mok;Moon Soo-Mook
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.11a
    • /
    • pp.826-828
    • /
    • 2005
  • 인텔의 IA-64 프로세서는 명령어 수준의 병렬수행을 지원하는 EPIC (Explicitly Parallel Instruction Computing) 구조를 채택하고 있으며 컴파일러가 순차적 코드에서 병렬 수행이 가능한 독립적인 명령어들을 스케줄링 하도록 되어있다. 본 논문에서는 IA-64 스케줄링을 위해 향상된 파이프라인 스케줄링 (Enhanced Pipeline Scheduling, EPS) 기법[1]을 적용한 결과를 소개한다. EPS는 루프수준의 병렬화를 위한 소프트웨어 파이프라이닝 (software pipelining)기법으로 전역 스케줄링 (global Scheduling) 기법을 기반으로 하고 있다. 우리는 IA-64 프로세서를 위한 공개소스 컴파일러인 ORC (Open Research Compiler)에 EPS를 구현하고 실제 프로세서인 Itanium에서 실험을 수행하였다. 상용 프로세서와 컴파일러에 구현과 튜닝을 하는 과정에서 얻은 경험을 소개하고 기존의 ORC 컴파일러와 비교하여 얻은 성능 향상을 보고하고 분석한다.

  • PDF