• Title/Summary/Keyword: On-Chip Memory


Fixed Point Implementation of the QCELP Speech Coder

  • Yoon, Byung-Sik;Kim, Jae-Won;Lee, Won-Myoung;Jang, Seok-Jin;Choi, Song_in;Lim, Myoung-Seon
    • ETRI Journal
    • /
    • v.19 no.3
    • /
    • pp.242-258
    • /
    • 1997
  • The Qualcomm code excited linear prediction (QCELP) speech coder was adopted to increase the capacity of the CDMA Mobile System (CMS). In this paper, we implemented the QCELP speech coding algorithm on a TMS320C50 fixed-point DSP chip, and the fixed-point simulation was written in C. The computational complexity of QCELP on the TMS320C50 was 10k words, and the data memory was 4k words. In a normal call test on the CMS, where a mobile-to-mobile call was made in bypass mode without double vocoding, the mean opinion score (MOS) for speech quality was 3.11.
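The abstract does not describe the arithmetic itself, but fixed-point ports to 16-bit DSPs such as the TMS320C50 commonly rest on Q15 arithmetic with saturation. The following is a minimal illustration under that assumption, not the authors' code; the names `sat16`, `q15_add`, and `q15_mul` are hypothetical, and the right shift assumes that `>>` on a negative `int32_t` is an arithmetic shift, as on common compilers.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the 16-bit Q15 range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Q15 addition that saturates instead of wrapping around. */
int16_t q15_add(int16_t a, int16_t b)
{
    return sat16((int32_t)a + (int32_t)b);
}

/* Q15 multiply: the 32-bit product of two Q15 values is Q30.
 * Adding 0x4000 rounds to nearest, and shifting right by 15 returns
 * to Q15. Only (-1.0) * (-1.0) overflows; it saturates to 0x7FFF.
 * Assumes >> on a negative int32_t is an arithmetic shift. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return sat16((p + 0x4000) >> 15);
}
```

For example, `q15_mul(16384, 16384)` (0.5 × 0.5) yields 8192 (0.25), and `q15_add(30000, 10000)` saturates to 32767 instead of wrapping to a negative value, which would be audible as a click in decoded speech.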


An experimental study on Intel KNL processor to improve the performance of high bandwidth on-chip memory (인텔 KNL 프로세서 사례를 통한 고성능 온칩 메모리의 성능 병목 분석 및 해결 방안 연구)

  • Byun, Eun-Kyu
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.11a
    • /
    • pp.92-95
    • /
    • 2020
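  • To meet the ever-growing demand for data throughput, processors equipped with dozens of cores and multiple channels of high-bandwidth memory have been adopted in top-ranked supercomputer systems. This scale-out approach can greatly raise the performance ceiling, but performance may suffer when work is not distributed properly. In this study, we benchmarked the high-performance on-chip memory of the Intel KNL processor and confirmed that bottlenecks actually exist. We also identified the resulting performance-degradation patterns, analyzed their causes, and propose complementary measures at the hardware and system-software levels to minimize such problems in future systems.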

ASIC Design of Wavelet Transform Filter for Moving Picture (동영상용 웨이브렛 변환 필터의 ASIC 설계)

  • Kang, Bong-Hoon;Lee, Ho-Joon;Koh, Hyung-Hwa
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.36S no.12
    • /
    • pp.67-75
    • /
    • 1999
  • In this paper, we present an ASIC (Application Specific Integrated Circuit) design of a wavelet transform filter. The wavelet transform is used in many application fields, including image compression, because of its excellent energy compaction. The operating characteristics and performance of the wavelet transform filter are analyzed using Verilog-HDL (Hardware Description Language). The designed filter uses line memory to improve the data processing rate. When DRAM data are read and written in Fast Page Mode, input and output are very fast in the horizontal direction but substantially slower in the vertical direction; the line memory solves this slow vertical processing. As a result, although the chip becomes larger, the processing time for one image frame is 4.66 ms. Since the per-frame processing limit for TV video is generally 33 ms, the design is suitable for TV video.
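The line-memory idea can be illustrated with a streaming vertical filter stage: rows are read in raster order, the fast direction for Fast Page Mode DRAM, and the even row of each pair waits in an on-chip line buffer until its partner arrives, so no column-wise DRAM access is needed. The abstract does not name the wavelet, so this sketch assumes a simple unnormalised integer Haar pair; `MAX_WIDTH`, `vhaar_t`, `vhaar_init`, and `vhaar_push` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WIDTH 1024  /* hypothetical maximum line length in pixels */

/* Streaming vertical Haar stage backed by one line of on-chip memory. */
typedef struct {
    int line[MAX_WIDTH];  /* line memory holding the pending even row */
    size_t width;
    int have_even;        /* 1 when line[] holds a row awaiting its pair */
} vhaar_t;

/* Returns 0 on success, -1 if the width does not fit the line memory. */
int vhaar_init(vhaar_t *s, size_t width)
{
    if (width == 0 || width > MAX_WIDTH) return -1;
    s->width = width;
    s->have_even = 0;
    return 0;
}

/* Feed one row in raster order. When a row pair is complete, writes the
 * vertical low-pass and high-pass rows and returns 1; otherwise buffers
 * the row in the line memory and returns 0. */
int vhaar_push(vhaar_t *s, const int *row, int *low, int *high)
{
    if (!s->have_even) {
        memcpy(s->line, row, s->width * sizeof row[0]);
        s->have_even = 1;
        return 0;
    }
    for (size_t x = 0; x < s->width; x++) {
        low[x]  = s->line[x] + row[x];  /* unnormalised Haar sum */
        high[x] = s->line[x] - row[x];  /* unnormalised Haar difference */
    }
    s->have_even = 0;
    return 1;
}
```

The buffer costs one image line of on-chip storage, which matches the abstract's trade-off of a larger chip for faster frame processing; wavelet filters with more taps would need correspondingly more buffered lines.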


FPGA Implementation of SURF-based Feature extraction and Descriptor generation (SURF 기반 특징점 추출 및 서술자 생성의 FPGA 구현)

  • Na, Eun-Soo;Jeong, Yong-Jin
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.4
    • /
    • pp.483-492
    • /
    • 2013
  • SURF is an algorithm that extracts feature points from input images and generates their descriptors; it is used in many applications such as object recognition, tracking, and panorama construction. Although SURF is known to be robust to changes in scale, rotation, and viewpoint, it is hard to run in real time because of its complex and repetitive computations. In our experiment on a 3.3 GHz Pentium, extracting feature points and creating descriptors for a VGA image containing about 1,000 feature points took 240 ms, which means that a software implementation cannot meet real-time requirements, especially in embedded systems. In this paper, we present a hardware architecture that computes the SURF algorithm very fast while consuming minimal hardware resources. The two key concepts of our architecture are parallelism (for repetitive computations) and efficient line memory usage (obtained by analyzing memory access patterns). Synthesized for a Xilinx Virtex5LX330 FPGA, the design occupies 101,348 LUTs and 1,367 KB of on-chip memory and achieves 30 frames per second at a 100 MHz clock.
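SURF evaluates its box-filter approximations of the Hessian on an integral image, so any rectangular sum costs four memory reads regardless of filter size. The sketch below shows only that building block in plain C; it does not reproduce the paper's parallel datapath or its line-memory scheme, and `integral_image` and `box_sum` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Build an integral image of size (w+1) x (h+1) with a zero first row
 * and column: ii[y][x] = sum of img over rows < y and columns < x. */
void integral_image(const uint8_t *img, size_t w, size_t h, uint32_t *ii)
{
    size_t W = w + 1;
    for (size_t x = 0; x <= w; x++) ii[x] = 0;
    for (size_t y = 1; y <= h; y++) {
        uint32_t row_sum = 0;
        ii[y * W] = 0;
        for (size_t x = 1; x <= w; x++) {
            row_sum += img[(y - 1) * w + (x - 1)];
            ii[y * W + x] = ii[(y - 1) * W + x] + row_sum;
        }
    }
}

/* Sum of the bw x bh rectangle whose top-left pixel is (x0, y0),
 * using four lookups. Unsigned wrap-around in the intermediate terms
 * cancels out because the true result is non-negative. */
uint32_t box_sum(const uint32_t *ii, size_t w,
                 size_t x0, size_t y0, size_t bw, size_t bh)
{
    size_t W = w + 1;
    size_t x1 = x0 + bw, y1 = y0 + bh;
    return ii[y1 * W + x1] - ii[y0 * W + x1]
         - ii[y1 * W + x0] + ii[y0 * W + x0];
}
```

For a VGA frame of 8-bit pixels, the largest possible sum is 640 × 480 × 255 = 78,336,000, which fits comfortably in `uint32_t`.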

A Study on Design and Implementation of Speech Recognition System Using ART2 Algorithm

  • Kim, Joeng Hoon;Kim, Dong Han;Jang, Won Il;Lee, Sang Bae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.4 no.2
    • /
    • pp.149-154
    • /
    • 2004
  • In this research, we chose speech recognition as the means of controlling an electric wheelchair by voice alone, and used DTW (Dynamic Time Warping), which is speaker-dependent and has a relatively high recognition rate among speech recognition methods. However, real-time operation requires a small memory footprint and fast processing. We therefore introduced VQ (Vector Quantization), which is widely used as a compression algorithm in speaker-independent recognition, to achieve fast recognition with small memory. However, we found that the recognition rate decreased after applying VQ. To improve the recognition rate, we applied the ART2 (Adaptive Resonance Theory 2) algorithm as a post-processing step and obtained an improvement of about 5%. ART2 requires an error range: the error range is applied when, among the distances obtained by DTW, the second distance minus the first distance is 20 or more. With ART2 applied in this way, we obtained both fast processing and a high recognition rate. Moreover, since the system is mounted on a moving object, it must be implemented as an embedded system. We therefore selected the TMS320C32 chip, which can perform a large number of calculations relatively quickly. Because speech requires a large amount of data storage, we used 128 KB of RAM and 64 KB of ROM. For speech input, we used a 16-bit stereo audio codec, whose high resolution yields relatively accurate data.
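The matching step can be sketched as a textbook DTW distance plus the 20-point margin test that decides when the error range is applied. This is an illustration, not the authors' code: it assumes one-dimensional feature sequences with an absolute-difference local cost (a real recognizer would compare feature vectors), reads "first" and "second distance" as the smallest and second-smallest template distances, and uses the hypothetical names `dtw_distance`, `error_range_applies`, and `ERROR_RANGE_GAP`. The ART2 post-processing itself is not sketched.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

#define ERROR_RANGE_GAP 20.0  /* threshold stated in the abstract */

/* DTW distance between a[0..n-1] and b[0..m-1] with the standard
 * recurrence D[i][j] = cost(i, j) + min of the three predecessors.
 * Returns -1.0 if memory allocation fails. */
double dtw_distance(const double *a, size_t n, const double *b, size_t m)
{
    size_t cols = m + 1;
    double *D = malloc((n + 1) * cols * sizeof *D);
    if (D == NULL) return -1.0;

    for (size_t k = 0; k < (n + 1) * cols; k++) D[k] = INFINITY;
    D[0] = 0.0;

    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            double d = a[i - 1] - b[j - 1];
            double cost = d < 0 ? -d : d;
            double best = D[(i - 1) * cols + j];           /* from (i-1, j) */
            if (D[i * cols + j - 1] < best)                /* from (i, j-1) */
                best = D[i * cols + j - 1];
            if (D[(i - 1) * cols + j - 1] < best)          /* from (i-1, j-1) */
                best = D[(i - 1) * cols + j - 1];
            D[i * cols + j] = cost + best;
        }
    }
    double result = D[n * cols + m];
    free(D);
    return result;
}

/* Returns 1 when the second-smallest template distance exceeds the
 * smallest by at least ERROR_RANGE_GAP, 0 otherwise (also 0 when
 * fewer than two distances are given). */
int error_range_applies(const double *dists, size_t count)
{
    if (count < 2) return 0;
    double best = INFINITY, second = INFINITY;
    for (size_t k = 0; k < count; k++) {
        if (dists[k] < best) { second = best; best = dists[k]; }
        else if (dists[k] < second) second = dists[k];
    }
    return (second - best) >= ERROR_RANGE_GAP;
}
```

For instance, template distances of 75, 50, and 90 give a margin of 25, so the error range applies; distances of 50, 60, and 90 give a margin of 10, so it does not.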

Bus Splitting Techniques for MPSoC to Reduce Bus Energy (MPSoC 플랫폼의 버스 에너지 절감을 위한 버스 분할 기법)

  • Chung Chun-Mok;Kim Jin-Hyo;Kim Ji-Hong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.699-708
    • /
    • 2006
  • The bus splitting technique reduces bus energy by placing frequently communicating modules close together and activating only the bus segments a communication needs. However, previous bus splitting techniques cannot be used on MPSoC platforms, because these platforms use a cache coherence protocol that requires every processor to observe all bus transactions. In this paper, we propose a bus splitting technique for MPSoC platforms that reduces bus energy. The proposed technique divides a bus into several segments, some for private memory and others for shared memory, so it minimizes the bus energy consumed by private memory accesses without causing cache coherence problems. We also propose a task allocation technique that allocates tasks to processors according to their numbers of bus transactions and the cache coherence protocol, reducing the bus energy consumed by shared memory references. Simulation results show that the bus splitting technique reduces the bus energy consumed by private memory accesses by up to 83%, and that the task allocation technique reduces the bus energy consumed by shared memory references by up to 30%. We expect both techniques to be applicable to multiprocessor platforms to reduce bus energy without interfering with the cache coherence protocol.

High Throughput Parallel Decoding Method for H.264/AVC CAVLC

  • Yeo, Dong-Hoon;Shin, Hyun-Chul
    • ETRI Journal
    • /
    • v.31 no.5
    • /
    • pp.510-517
    • /
    • 2009
  • A high throughput parallel decoding method is developed for context-based adaptive variable length codes. In this paper, several new design ideas are devised and implemented for scalable parallel processing, a reduction in area, and a reduction in power requirements. First, simplified logical operations instead of memory lookups are used for parallel processing. Second, the codes are grouped based on their lengths for efficient logical operation. Third, up to M bits of the input stream can be analyzed simultaneously. For comparison, we designed a logical-operation-based parallel decoder for M=8 and a conventional parallel decoder. High-speed parallel decoding becomes possible with our method. In addition, for similar decoding rates (1.57 codes/cycle for M=8), our new approach uses 46% less chip area than the conventional method.

Development of new Multifunction Voltage Recorder (다기능 디지털 전압기록장치 시스템 개발)

  • Shon, Su-Goog;Choi, Sang-Joon
    • Proceedings of the KIEE Conference
    • /
    • 1999.11c
    • /
    • pp.693-696
    • /
    • 1999
  • This paper describes a new voltage recorder for managing the voltage of power distribution lines using a new voltage measurement technique. Assuming a sinusoidal input voltage, the RMS (Root Mean Square) voltage of the power line is measured with a full-wave rectifier and a half-adder built from operational amplifier (OP) circuits. An A/D converter based on a dual-slope converter converts the analog voltage signal into a serial pulse train. The pulses are counted by a single-chip microcontroller, converted to an RMS voltage, and saved to flash memory. As a result, a compact, multifunction voltage recorder was developed. A Voltage Management System that analyzes the stored data via an RS-232C cable was also developed on Windows 95 with Visual C++.
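Under the sinusoidal-input assumption, the mean of a full-wave-rectified sine is $2V_p/\pi$ and its RMS value is $V_p/\sqrt{2}$, so the RMS voltage follows from the rectified mean through the constant form factor $\pi/(2\sqrt{2}) \approx 1.1107$. The sketch below applies that conversion to a dual-slope pulse count; the abstract gives no calibration constants, so the `volts_per_count` scale and the function names are hypothetical. The estimate is exact only for a pure sine; harmonics on the line would bias it.

```c
#include <stddef.h>

/* Form factor of a sine wave: RMS / rectified mean = pi / (2 * sqrt(2)). */
#define SINE_FORM_FACTOR 1.1107207345

/* Mean of |x| over n samples, i.e. what a full-wave rectifier followed
 * by an averaging converter measures. Returns 0 for n == 0. */
double rectified_mean(const double *x, size_t n)
{
    if (n == 0) return 0.0;
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) sum += x[k] < 0 ? -x[k] : x[k];
    return sum / (double)n;
}

/* RMS estimate; valid only for a sinusoidal input. */
double sine_rms_from_rectified_mean(double mean_abs)
{
    return SINE_FORM_FACTOR * mean_abs;
}

/* Hypothetical conversion from a dual-slope pulse count, where
 * volts_per_count is an assumed calibration constant. */
double sine_rms_from_count(unsigned long count, double volts_per_count)
{
    return sine_rms_from_rectified_mean((double)count * volts_per_count);
}
```

For a 220 V RMS line, the rectified mean is about 198.1 V, and `sine_rms_from_rectified_mean(198.1)` returns roughly 220.0.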


An efficient LIN MCU design for In-Vehicle Networks

  • Yeon, Kyu-Bong;Chong, Jong-Wha
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.13 no.5
    • /
    • pp.451-458
    • /
    • 2013
  • This paper describes the design of a LIN MCU with an efficient memory-access architecture that fetches data and addresses concurrently for faster communication. Slew rate control reduces EMI emission while satisfying the required communication specifications. To verify the efficiency of the LIN MCU, we developed an SoC and tested it with several data packets. Measurements show that the LIN MCU improves network efficiency by up to 17.19% and response time by up to 31.26% in nominal cases, and reduces EMI radiation by up to 10 dB.

CMOS Temperature Sensor with Ring Oscillator for Mobile DRAM Self-refresh Control (링 오실레이터를 가진 CMOS 온도 센서)

  • Kim, Chan-kyung;Lee, Jae-Goo;Kong, Bai-Sun;Jun, Young-Hyun
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.485-486
    • /
    • 2006
  • This paper proposes a novel low-cost CMOS temperature sensor for controlling the self-refresh period of a mobile DRAM. The sensor obtains the chip temperature using ring oscillators composed of cascaded inverter stages. Compared with traditional temperature sensors based on analog bandgap reference circuits, this method is highly area-efficient, simple, and easy to implement in an IC. The proposed sensor was fabricated in an 80 nm 3-metal DRAM process. It occupies less than $0.02\;mm^2$ of silicon area at $10^{\circ}C$ resolution and consumes under 5 µW at a processing rate of 1 sample/s. This area is about 33% of that of a conventional temperature sensor in mobile DRAM.
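A ring-oscillator sensor is typically read digitally: a counter accumulates oscillator cycles over a fixed reference window, and the count is mapped to a temperature. The abstract gives neither the transfer curve nor the calibration procedure, so the sketch below assumes a linear count-to-temperature relation fixed by a two-point calibration, then reports the result in 10 °C steps to match the stated resolution. `ro_calib_t`, `ro_temperature_c`, and `quantize_10c` are hypothetical names, and the mapping from temperature to self-refresh period is not sketched.

```c
/* Hypothetical two-point calibration of a ring-oscillator counter. */
typedef struct {
    long count_t0;  /* cycle count measured at calibration temperature t0 */
    long count_t1;  /* cycle count measured at calibration temperature t1 */
    int  t0_c;      /* calibration temperatures in degrees Celsius */
    int  t1_c;
} ro_calib_t;

/* Linear interpolation from a cycle count to degrees Celsius.
 * Integer division truncates toward zero. Returns t0_c if the two
 * calibration counts are equal (no usable slope). */
int ro_temperature_c(const ro_calib_t *c, long count)
{
    long dcount = c->count_t1 - c->count_t0;
    if (dcount == 0) return c->t0_c;
    long dt = (long)(c->t1_c - c->t0_c);
    return c->t0_c + (int)((count - c->count_t0) * dt / dcount);
}

/* Round a temperature down to the nearest 10 degC step (floor, also
 * for negative values), matching a 10 degC sensor resolution. */
int quantize_10c(int t)
{
    int q = t / 10;
    if (t % 10 != 0 && t < 0) q--;
    return q * 10;
}
```

For instance, with a hypothetical calibration of 1000 counts at 25 °C and 900 counts at 85 °C, a count of 950 maps to 55 °C and is reported as 50 °C. Whether the count falls or rises with temperature, and how linearly, would have to be measured for the actual process.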
