• Title/Summary/Keyword: Shift-and-add


Parallel Approximate String Matching with k-Mismatches for Multiple Fixed-Length Patterns in DNA Sequences on Graphics Processing Units (GPU을 이용한 다중 고정 길이 패턴을 갖는 DNA 시퀀스에 대한 k-Mismatches에 의한 근사적 병열 스트링 매칭)

  • Ho, ThienLuan;Kim, HyunJin;Oh, SeungRohk
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.6 / pp.955-961 / 2017
  • In this paper, we propose a parallel approximate string matching algorithm with k-mismatches for multiple fixed-length patterns (PMASM) in DNA sequences. PMASM is developed from parallel single-pattern approximate string matching algorithms to efficiently calculate the Hamming distances for multiple patterns of a fixed length. In the preprocessing phase of PMASM, all target patterns are binary encoded and stored in a look-up memory. With each input character from the input string, the Hamming distances between a substring and all patterns can be updated at the same time based on the binary encoding information in the look-up memory. Moreover, PMASM adopts graphics processing units (GPUs) to process the data computations in parallel. This paper presents three PMASM implementation methods on GPUs: thread PMASM, block-thread PMASM, and shared-mem PMASM. The shared-mem PMASM method shows how to make effective use of the GPU's parallel capacity, and it also exploits special features of the CUDA (Compute Unified Device Architecture) memory structure to optimize performance. In the experiments with DNA sequences, the proposed PMASM on GPU is 385, 77, and 64 times faster than the traditional naive algorithm, the shift-add algorithm, and the single-thread PMASM implementation on CPU, respectively. On the same NVIDIA GPU model, the performance of the proposed approach is up to 44% and 21% higher than that of the naive and shift-add algorithms, respectively.
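The shift-add baseline mentioned in this abstract can be illustrated with a minimal single-pattern sketch in Python. This is not the authors' PMASM or their GPU code; it is the classic shift-add idea under simplifying assumptions: one pattern, Python's arbitrary-precision integers as the state word, and counters wide enough to hold the full pattern length so that no overflow handling is needed. The function name `shift_add_k_mismatch` is hypothetical.

```python
def shift_add_k_mismatch(text, pattern, k):
    """Return (start, distance) for every window of text whose Hamming
    distance to pattern is at most k, using shift-add state updates."""
    m = len(pattern)
    b = m.bit_length()                 # bits per counter; holds values 0..m
    field_mask = (1 << b) - 1
    state_mask = (1 << (b * m)) - 1    # keep exactly m counters

    # Preprocessing: for each character, field j is 1 if pattern[j] mismatches it.
    table = {}
    for c in set(text) | set(pattern):
        t = 0
        for j, p in enumerate(pattern):
            if p != c:
                t |= 1 << (b * j)
        table[c] = t

    state = 0
    hits = []
    for i, c in enumerate(text):
        # Shift every counter up one position, then add this character's mismatches.
        state = ((state << b) + table[c]) & state_mask
        if i >= m - 1:
            # The top counter is the Hamming distance of the window ending at i.
            d = (state >> (b * (m - 1))) & field_mask
            if d <= k:
                hits.append((i - m + 1, d))
    return hits


print(shift_add_k_mismatch("ACGTACGA", "ACGT", 1))
```

Each input character costs one shift and one addition regardless of pattern length, which is the property that a multi-pattern scheme with a look-up memory builds on.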

Synchronization Algorithm and Demodulation using the Phase Transition Detection in the DSP based MPSK Receiver (DSP 기반 MPSK 수신기에서 위상천이 검출을 이용한 동기 알고리즘과 복조)

  • Lee Jun-Seo;Maing Jun-Ho;Ryu Heung-Gyoon;Park Cheol-Sun;Jang Won
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.15 no.10 s.89 / pp.952-960 / 2004
  • PSK (Phase Shift Keying) is useful because it is a power- and spectrum-efficient modulation. In the suggested DSP scheme, no additional hardware is needed to support various transmit modes. We design and implement a synchronization algorithm for an M-ary PSK (M=2, 4) demodulator based on a DSP scheme, instead of a complex analog PSK demodulator. The TMS320C6203 is used as the DSP. We check all kinds of waveforms via the graph view window after programming the emulation on the DSP tool. The implementation results show that a demodulator using the suggested algorithm performs as well as a demodulator using analog circuits.

CNN Accelerator Architecture using 3D-stacked RRAM Array (3차원 적층 구조 저항변화 메모리 어레이를 활용한 CNN 가속기 아키텍처)

  • Won Joo Lee;Yoon Kim;Minsuk Koo
    • Journal of IKEEE / v.28 no.2 / pp.234-238 / 2024
  • This paper presents a study on the integration of 3D-stacked dual-tip RRAM with a CNN accelerator architecture, leveraging its low drive current characteristics and scalability in a 3D stacked configuration. The dual-tip structure is utilized in a parallel connection format in a synaptic array to implement multi-level capabilities. It is configured within a Network-on-chip style accelerator along with various hardware blocks such as DAC, ADC, buffers, registers, and shift & add circuits, and simulations were performed for the CNN accelerator. The quantization of synaptic weights and activation functions was assumed to be 16-bit. Simulation results of CNN operations through a parallel pipeline for this accelerator architecture achieved an operational efficiency of approximately 370 GOPs/W, with accuracy degradation due to quantization kept within 3%.

A Modified SaA Architecture for the Implementation of a Multiplierless Programmable FIR Filter for Medical Ultrasound Signal Processing (곱셈기가 제거된 의료 초음파 신호처리용 프로그래머블 FIR 필터 구현을 위한 수정된 SaA 구조)

  • Han, Ho-San;Song, Jae-Hee;Kim, Hak-Hyun;Goh, Bang-Young;Song, Tai-Kyong
    • Journal of Biomedical Engineering Research / v.28 no.3 / pp.423-428 / 2007
  • Programmable FIR filters are used in various signal processing tasks in medical ultrasound imaging and are one of the major factors increasing hardware complexity. A widely used method to reduce the hardware complexity of a programmable FIR filter is to encode the filter coefficients in the canonic signed digit (CSD) format to minimize the number of nonzero digits (NZD), so that the multiplier for each filter coefficient can be replaced with fixed shifters and programmable multiplexers (PM). In this paper, a new structure for programmable FIR filters with an improved frequency response and a reduced hardware complexity compared to the conventional shift-and-add (SaA) architecture using PM is proposed for implementing a very small portable ultrasound scanner. The CSD codes are optimized such that there exists at least one common nonzero digit between neighboring coefficients. Such common digits are then implemented with the same shifters. For comparison, synthesizable VHDL models for programmable FIR filters are developed based on the proposed and the conventional architectures. When these filters have the same hardware complexity, the pass-band and stop-band ripples of the proposed filter are lower than those of the conventional filter by about 0.01 to 0.19 dB and by about 5 to 10 dB, respectively. For the same filter performance, the hardware complexity of the proposed architecture is reduced by more than 20% compared to the conventional SaA architecture.
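The CSD-based shift-and-add multiplication that this abstract builds on can be sketched as follows. This is only an illustration of the general technique, not the modified architecture proposed in the paper; it assumes integer coefficients (a real filter would use scaled fixed-point values), and the names `to_csd` and `csd_multiply` are hypothetical.

```python
def to_csd(n):
    """Return the canonic signed digit form of integer n as a list of
    digits in {-1, 0, 1}, least significant digit first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, coeff):
    """Multiply x by coeff using only shifts, additions and subtractions."""
    acc = 0
    for shift, d in enumerate(to_csd(coeff)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


# 7 is 111 in binary (three nonzero digits) but 8 - 1 in CSD (two nonzero digits).
print(to_csd(7), csd_multiply(13, 7))
```

Each nonzero CSD digit corresponds to one shifter output feeding an adder or subtractor, so fewer nonzero digits means less hardware per coefficient.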

Image Resolution Enhancement by Improved S&A Method using POCS (POCS 이론을 이용한 개선된 S&A 방법에 의한 영상의 화질 향상)

  • Yoon, Soo-Ah;Lee, Tae-Gyoun;Lee, Sang-Heon;Son, Myoung-Kyu;Kim, Duk-Gyoo;Won, Chul-Ho
    • Journal of Korea Multimedia Society / v.14 no.11 / pp.1392-1400 / 2011
  • In most digital imaging applications, high-resolution images or videos are usually desired for later image processing and analysis. The image signal obtained from a general imaging system is degraded during image acquisition by the optics, physical constraints, and atmospheric effects. Super-resolution reconstruction, one solution to this problem, is an image reconstruction technique that produces a high-resolution image from several low-resolution frames in a video sequence. In this paper, we propose an improved super-resolution method that uses the Projection onto Convex Sets (POCS) method based on Shift & Add (S&A). Images produced by conventional algorithms are sensitive to noise. To solve this problem, we propose a fusion algorithm of S&A and POCS. We also address the problem by using a BLPF (Butterworth Low-pass Filter) in the frequency domain as the optical blur. Our method is robust to noise and enhances sharpness. Experimental results show that the proposed super-resolution method has better resolution enhancement performance than other super-resolution methods.
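The Shift & Add step that the proposed method starts from can be shown with a minimal one-dimensional sketch. It assumes the frame shifts are already known and are whole numbers of high-resolution samples; it omits registration, blur, and the POCS refinement described in the paper. The function name `shift_and_add` and its parameters are hypothetical.

```python
def shift_and_add(frames, shifts, factor):
    """Fuse low-resolution 1-D frames onto a grid `factor` times finer.

    frames: list of equal-length lists of samples
    shifts: offset of each frame, in high-resolution samples
    Returns the averaged high-resolution signal; positions that no
    frame covers are returned as None.
    """
    length = len(frames[0]) * factor
    total = [0.0] * length
    count = [0] * length
    for frame, shift in zip(frames, shifts):
        for i, value in enumerate(frame):
            pos = i * factor + shift      # place the sample on the fine grid
            if 0 <= pos < length:
                total[pos] += value
                count[pos] += 1
    return [t / c if c else None for t, c in zip(total, count)]


high_res = [1, 2, 3, 4, 5, 6, 7, 8]
frames = [high_res[0::2], high_res[1::2]]   # two frames, offset by one fine sample
print(shift_and_add(frames, [0, 1], 2))
```

Because every sample is simply placed and averaged, noise in the input frames passes into the result, which is consistent with the noise sensitivity the abstract attributes to conventional algorithms.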

Development of Few-second 40 kV, 280 kW High Voltage Pulse Power Supply (수 초 지속 40 kV, 280 kW 고전압 펄스전원장치 개발)

  • Kim, S.C.;Nam, S.H.;Heo, H.;Heo, H.;Moon, C.;Kim, J.H.;Oh, S.S.;Yang, J.W.;Sho, J.H.
    • Proceedings of the KIEE Conference / 2015.07a / pp.990-991 / 2015
  • To drive a magnetron injection gun, this paper describes the design, fabrication, and analysis results of a proposed compact capacitor charging power supply (CCPS) formed from resonant full-bridge inverters for an electron gun power supply (EGPS). The EGPS must deliver an output voltage of -40 kV and an output power of 280 kW continuously for a few seconds, and must be designed for rise and fall times of less than 1 ms with a ripple stability of the output voltage below 1%. To meet these requirements, we used eight resonant full-bridge modules operated in parallel. Each resonant full-bridge module can supply a current of 0.9 A at a voltage of 40 kV and is operated with an N-phase shift switching pattern. In this paper, we present the design, simulation, and test results of the interleaved CCPS.


A Study on the classification of Underwater Acoustic Signal Using an Artificial Neural Network (신경회로망을 이용한 수중음향 신호의 식별에 관한 연구)

  • Na, Young-Nam;Shim, Tae-Bo;Han, Jeong-Woo;Kim, Chun-Duck
    • The Journal of the Acoustical Society of Korea / v.14 no.2E / pp.57-64 / 1995
  • In this study, we examine the applicability of a classifier based on an artificial neural network (ANN) to low-frequency acoustic signals in a shallow water environment. Estimates of the Doppler shift and the frequency spreading effect at 220 Hz reveal a frequency variation of less than 2 Hz with time. This small variation enables the ANN-based classifier to identify signals using only tonal frequency information. The ANN consists of 4 layers and has 60 input processing elements (PEs) and 4 output PEs. When measured tonal signals in the 200-250 Hz frequency range are applied to the ANN-based classifier, it can identify more than 67% of the signals for an instantaneous frame and more than 91% when averaged over 5 frames.


A low-power systolic structure for MP3 IMDCT Using addition and shift operation (덧셈과 쉬프트 연산을 사용한 MP3 IMDCT의 저전력 Systolic 구조)

  • Jang Young Beom;Lee Won Sang
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.10C / pp.1451-1459 / 2004
  • In this paper, a low-power 32-point IMDCT structure is proposed for MP3. By reordering the IMDCT matrices, we propose a systolic structure operating with 16, 8, 4, 2, and 1 cycles, respectively. To reduce power consumption, the multiplications of each sub-block are implemented by add and shift operations with CSD (canonic signed digit) form coefficients. To further reduce the number of adders, we use common sub-expression sharing techniques. With these techniques, the relative power consumption of the proposed structure is reduced by 58.4% compared to the conventional structure using only 2's complement form coefficients. The validity of the proposed structure is verified through Verilog-HDL coding.

Audio /Speech Codec Using Variable Delay MDCT/IMDCT (가변 지연 MDCT/IMDCT를 이용한 오디오/음성 코덱)

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.2 / pp.69-76 / 2023
  • A high-quality audio/voice codec using the MDCT/IMDCT process can perfectly restore the current frame through an overlap-add process with the previous frame. In the overlap-add process, an algorithm delay equal to the frame length occurs. In this paper, we propose an MDCT/IMDCT process that reduces the algorithm delay by using a variable phase shift. A low-delay audio/speech codec is proposed by applying the low-delay MDCT/IMDCT algorithm to the ITU-T standard codec G.729.1. The algorithm delay in the MDCT/IMDCT process can be reduced from 20 ms to 1.25 ms. The performance of the decoded output signal of the audio/speech codec to which the low-delay MDCT/IMDCT is applied is evaluated through the PESQ test, an objective quality test method. Despite the reduction in transmission delay, it was confirmed that there is no difference in sound quality from the conventional method.
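The conventional overlap-add reconstruction that this abstract takes as its starting point can be sketched in plain Python. This shows only the standard MDCT/IMDCT with a sine window and 50% overlap, not the variable phase shift proposed in the paper; the frame size, the 2/N scaling convention, and all function names are assumptions made for the illustration.

```python
import math


def sine_window(N):
    # Satisfies w[n]^2 + w[n + N]^2 = 1, which overlap-add reconstruction needs.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def basis(n, k, N):
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct(frame, N):
    """2N windowed input samples -> N coefficients."""
    w = sine_window(N)
    return [sum(w[n] * frame[n] * basis(n, k, N) for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs, N):
    """N coefficients -> 2N windowed, time-aliased output samples."""
    w = sine_window(N)
    return [(2.0 / N) * w[n] * sum(coeffs[k] * basis(n, k, N) for k in range(N))
            for n in range(2 * N)]


def analysis_synthesis(signal, N):
    """Run MDCT then IMDCT on frames with hop N and overlap-add the results."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        y = imdct(mdct(signal[start:start + 2 * N], N), N)
        for n in range(2 * N):
            out[start + n] += y[n]
    return out


N = 4
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(32)]
restored = analysis_synthesis(signal, N)
# Samples covered by two frames are restored; the first and last N are not.
print(max(abs(a - b) for a, b in zip(signal[N:-N], restored[N:-N])))
```

A sample is only correct once the following frame has been added to it, which is the source of the frame-length algorithm delay that the paper aims to reduce.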

Improving streamflow and flood predictions through computational simulations, machine learning and uncertainty quantification

  • Venkatesh Merwade;Siddharth Saksena;Pin-Ching Li;Tao Huang
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.29-29 / 2023
  • To mitigate the damaging impacts of floods, accurate prediction of runoff, streamflow, and flood inundation is needed. The conventional approach of simulating hydrology and hydraulics using loosely coupled models cannot capture the complex dynamics of surface and sub-surface processes. Additionally, the scarcity of data in ungauged basins and the quality of data in gauged basins add uncertainty to model predictions, which needs to be quantified. In this presentation, the role of integrated modeling in creating accurate flood simulations and inundation maps will be presented first, with a specific focus on urban environments. Next, the use of machine learning in producing streamflow predictions will be presented, with a specific focus on incorporating covariate shift and the application of theory-guided machine learning. Finally, a framework to quantify the uncertainty in flood models using Hierarchical Bayesian Modeling Averaging will be presented. Overall, this presentation will highlight that creating accurate information on flood magnitude and extent requires innovation and advancement in different aspects of hydrologic prediction.
