• Title/Summary/Keyword: Processing-in-Memory


An Efficient Array Algorithm for VLSI Implementation of Vector-radix 2-D Fast Discrete Cosine Transform (Vector-radix 2차원 고속 DCT의 VLSI 구현을 위한 효율적인 어레이 알고리듬)

  • 신경욱;전흥우;강용섬
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.12 / pp.1970-1982 / 1993
  • This paper describes an efficient array algorithm for parallel computation of the vector-radix two-dimensional (2-D) fast discrete cosine transform (VR-FCT), and its VLSI implementation. By mapping the 2-D VR-FCT onto a 2-D array of processing elements (PEs), the butterfly structure of the VR-FCT can be efficiently implemented with high concurrency and local communication geometry. The proposed array algorithm features architectural modularity, regularity and locality, so it is well suited to VLSI realization. Also, no transposition memory is required, which is inevitable in the conventional row-column decomposition approach. It has a time complexity of O(N + Nnzd·log2 N) for an (N×N) 2-D DCT, where Nnzd is the number of non-zero digits in the canonic signed digit (CSD) code. By adopting CSD arithmetic in the circuit design, the number of additions is reduced by about 30% compared with 2's complement arithmetic. A computational accuracy analysis for finite-wordlength processing is presented. From simulation results, it is estimated that an (8×8) 2-D DCT (with Nnzd=4) can be computed in about 0.88 μs at a 50 MHz clock frequency, resulting in a throughput of about 72 megapixels per second.
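
The abstract attributes the reduction in additions to CSD coding, which represents a constant with digits from {-1, 0, 1} such that no two adjacent digits are non-zero. The following is a minimal sketch of that encoding for non-negative integers, not the authors' circuit; the function names `to_csd` and `nonzero_digits` are hypothetical and chosen only for illustration.

```python
def to_csd(n: int) -> list[int]:
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or 1, and no two adjacent digits are non-zero.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 -> digit +1, n mod 4 == 3 -> digit -1
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def nonzero_digits(digits: list[int]) -> int:
    """Count non-zero digits; each one costs an add or subtract in a shift-add multiplier."""
    return sum(1 for d in digits if d != 0)


if __name__ == "__main__":
    for value in (7, 119):
        csd = to_csd(value)
        print(value, "binary ones:", bin(value).count("1"), "CSD non-zeros:", nonzero_digits(csd))
```

For example, 119 is 1110111 in binary (six ones) but 128 - 8 - 1 in CSD (three non-zero digits), which is the kind of saving the abstract refers to.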


Improvement of Residual Delay Compensation Algorithm of KJJVC (한일상관기의 잔차 지연 보정 알고리즘의 개선)

  • Oh, Se-Jin;Yeom, Jae-Hwan;Roh, Duk-Gyoo;Oh, Chung-Sik;Jung, Jin-Seung;Chung, Dong-Kyu;Oyama, Tomoaki;Kawaguchi, Noriyuki;Kobayashi, Hideyuki;Kawakami, Kazuyuki;Ozeki, Kensuke;Onuki, Hirohumi
    • Journal of the Institute of Convergence Signal Processing / v.14 no.2 / pp.136-146 / 2013
  • In this paper, a residual delay compensation algorithm is proposed for the FX-type KJJVC. In the initial design of the algorithm, integer arithmetic and a cos/sin table for the phase compensation coefficients were introduced to speed up the calculation. Mismatches between the data timing and the residual delay phase, and between the bit-jump and the residual delay phase, were found and fixed. In the final design of the KJJVC residual delay compensation algorithm, an initialization problem in the rotation memory of the residual delay compensation was found when the residual-delay-compensated value was applied to the FFT segment; this problem was also fixed by modifying the FPGA code. With the proposed algorithm, the band shape of the cross power spectrum becomes flat, which means there is no significant loss over the whole bandwidth. To verify the effectiveness of the proposed algorithm, we conducted correlation experiments on real observation data using the simulator and KJJVC. We confirmed that the designed residual delay compensation algorithm works well in KJJVC and that the signal-to-noise ratio increases by about 8%.
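
The abstract mentions integer arithmetic with a cos/sin table for the phase compensation coefficients. A rough sketch of that general idea is shown below; it is not the KJJVC FPGA design. The table size, the fixed-point scale, the sign convention of the rotation, and the names `rotate` and `compensate` are all assumptions made for illustration.

```python
import math

TABLE_BITS = 10                  # assumed: 1024 table entries per full turn
TABLE_SIZE = 1 << TABLE_BITS
SCALE_BITS = 14                  # assumed fixed-point scale of the table values
SCALE = 1 << SCALE_BITS

# Integer cos/sin lookup tables, computed once.
COS_TABLE = [round(math.cos(2 * math.pi * k / TABLE_SIZE) * SCALE) for k in range(TABLE_SIZE)]
SIN_TABLE = [round(math.sin(2 * math.pi * k / TABLE_SIZE) * SCALE) for k in range(TABLE_SIZE)]


def rotate(re: int, im: int, phase_index: int) -> tuple[int, int]:
    """Rotate the integer sample (re + j*im) by -2*pi*phase_index/TABLE_SIZE."""
    k = phase_index % TABLE_SIZE
    c, s = COS_TABLE[k], SIN_TABLE[k]
    # (re + j*im) * (c - j*s) = (re*c + im*s) + j*(im*c - re*s)
    out_re = (re * c + im * s) >> SCALE_BITS
    out_im = (im * c - re * s) >> SCALE_BITS
    return out_re, out_im


def compensate(spectrum: list[tuple[int, int]], delay_units: int, n_fft: int) -> list[tuple[int, int]]:
    """Apply a linear phase slope across the frequency channels of one FFT segment.

    delay_units is the residual delay in units of 1/TABLE_SIZE of a sample, so
    channel k is rotated by 2*pi*k*delay/n_fft, i.e. table index k*delay_units/n_fft.
    """
    out = []
    for k, (re, im) in enumerate(spectrum):
        phase_index = (k * delay_units) // n_fft
        out.append(rotate(re, im, phase_index))
    return out
```

With a residual delay of zero the spectrum passes through unchanged; a delay of one sample in a 4-point segment rotates channel 1 by a quarter turn.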

Deinterlacing Method for improving Motion Estimator based on multi arithmetic Architecture (다중연산구조기반의 고밀도 성능향상을 위한 움직임추정의 디인터레이싱 방법)

  • Lee, Kang-Whan
    • Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.1 / pp.49-55 / 2007
  • A de-interlacing algorithm is proposed to improve multi-resolution fast hierarchical motion estimation. It is effective in terms of both performance and VLSI implementation, and it covers a large search area for field-based as well as frame-based image processing in SoC design. In this paper, we simulated various picture modes with M=2 or M=3. Compared with the full-search block matching algorithm, the proposed algorithm showed an average PSNR degradation of 0.7 dB in motion estimation performance, which did not affect the subjective quality of the reconstructed images at all. To make fast hierarchical motion estimation more suitable for SoC design, we exploit a foreground and background search algorithm (FBSA) based on the dual arithmetic processor element (DAPE). This makes it possible to estimate motion displacement over a large search area using half the number of PEs required by general operation methods. The proposed MHME architecture improves the VLSI hardware design by using the proposed FBSA structure with DAPE to remove the local memory. The proposed FBSA, which uses bit array processing in the search area, can improve structures such as the multiple processor array unit (MPAU).

Implementation of a TCP/IP Offload Engine Using Lightweight TCP/IP on an Embedded System (임베디드 시스템상에서 Lightweight TCP/IP를 이용한 TCP/IP Offload Engine의 구현)

  • Yoon In-Su;Chung Sang-Hwa;Choi Bong-Sik;Jun Yong-Tae
    • Journal of KIISE:Computer Systems and Theory / v.33 no.7 / pp.413-420 / 2006
  • The speed of present-day network technology exceeds a gigabit per second and is increasing rapidly. When TCP/IP is used in these high-speed networks, processing the TCP/IP protocol places a high load on the host CPU. To solve this problem, research has been carried out on the TCP/IP Offload Engine (TOE). The TOE processes TCP/IP on a network adapter instead of the host CPU, which reduces the processing burden on the host CPU. In this paper, we developed two software-based TOEs: one using embedded Linux and the other using Lightweight TCP/IP (lwIP). The TOE using embedded Linux did not achieve a bandwidth of more than 62 Mbps. To overcome this poor performance, we ported lwIP to the embedded system and enhanced it for high performance. We eliminated the memory copy overhead of lwIP, added delayed ACK and TCP Segmentation Offload (TSO) features, and modified the default parameters of lwIP for large data transfers. With these modifications, the TOE using the modified lwIP shows a bandwidth of 194 Mbps.
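
As a rough illustration of what segmentation offload does, the sketch below splits one large send buffer into MSS-sized pieces and assigns each a TCP sequence number, so the host hands over a single buffer instead of many small packets. It is not lwIP's API or the authors' implementation; the function name `segment_payload` and the default MSS of 1460 bytes are assumptions.

```python
from typing import List, Tuple

SEQ_MODULUS = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def segment_payload(payload: bytes, start_seq: int, mss: int = 1460) -> List[Tuple[int, bytes]]:
    """Split payload into (sequence_number, data) segments of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        seq = (start_seq + offset) % SEQ_MODULUS
        segments.append((seq, payload[offset:offset + mss]))
    return segments


if __name__ == "__main__":
    for seq, data in segment_payload(bytes(3000), start_seq=1000):
        print(seq, len(data))
```

A 3000-byte buffer with an MSS of 1460 yields three segments of 1460, 1460 and 80 bytes.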

Work Hours and Cognitive Function: The Multi-Ethnic Study of Atherosclerosis

  • Charles, Luenda E.;Fekedulegn, Desta;Burchfiel, Cecil M.;Fujishiro, Kaori;Hazzouri, Adina Zeki Al;Fitzpatrick, Annette L.;Rapp, Stephen R.
    • Safety and Health at Work / v.11 no.2 / pp.178-186 / 2020
  • Background: Cognitive impairment is a public health burden. Our objective was to investigate associations between work hours and cognitive function. Methods: Multi-Ethnic Study of Atherosclerosis (MESA) participants (n = 2,497; 50.7% men; age range 44-84 years) reported hours per week worked in all jobs in Exams 1 (2000-2002), 2 (2002-2004), 3 (2004-2005), and 5 (2010-2011). Cognitive function was assessed (Exam 5) using the Cognitive Abilities Screening Instrument (version 2), a measure of global cognitive functioning; the Digit Symbol Coding, a measure of processing speed; and the Digit Span test, a measure of attention and working memory. We used a prospective approach and linear regression to assess associations for every 10 hours of work. Results: Among all participants, associations of hours worked with cognitive function of any type were not statistically significant. In occupation-stratified analyses (interaction p = 0.051), longer work hours were associated with poorer global cognitive function among Sales/Office and blue-collar workers, after adjustment for age, sex, physical activity, body mass index, race/ethnicity, educational level, annual income, history of heart attack, diabetes, apolipoprotein E-epsilon 4 allele (ApoE4) status, birth-place, number of years in the United States, language spoken at MESA Exam 1, and work hours at Exam 5 (β = -0.55, 95% CI = -0.99, -0.09) and (β = -0.80, -1.51, -0.09), respectively. In occupation-stratified analyses (interaction p = 0.040), we also observed an inverse association with processing speed among blue-collar workers (adjusted β = -0.80, -1.52, -0.07). Sex, race/ethnicity, and ApoE4 did not significantly modify associations between work hours and cognitive function. Conclusion: Weak inverse associations were observed between work hours and cognitive function among Sales/Office and blue-collar workers.

Hardware Design of High Performance HEVC Deblocking Filter for UHD Videos (UHD 영상을 위한 고성능 HEVC 디블록킹 필터 설계)

  • Park, Jaeha;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.1 / pp.178-184 / 2015
  • This paper proposes a hardware architecture for a high-performance deblocking filter (DBF) in High Efficiency Video Coding (HEVC) for UHD (Ultra High Definition) videos. To reduce processing time, the proposed architecture uses a 4-stage pipeline with two filters and a parallel boundary strength module. The proposed filter can also be used in low-voltage designs because the 4-stage pipeline uses clock gating. The segmented memory architecture solves the hazard issue that arises when single-port SRAM is accessed. The proposed filtering order shortens the delay that arises when data is stored into the single-port SRAM at the pre-processing stage. The DBF hardware proposed in this paper was designed in Verilog HDL and was implemented with 22k logic gates when synthesized using the TSMC 0.18 um CMOS standard cell library. Furthermore, the design can process UHD 8K ($7680{\times}4320$) video at 60 fps at an operating frequency of 150 MHz, and the maximum operating frequency is 285 MHz. Analysis shows that the number of operation cycles needed by the proposed DBF hardware to process one coding unit is improved by 32% over the previous design.
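
To give a feel for the throughput figures quoted above, the short calculation below converts a resolution, frame rate and clock frequency into the average number of samples that must be handled per clock cycle. It counts luma samples only, which is an assumption, and says nothing about how the filter actually schedules edges; the function name is hypothetical.

```python
def samples_per_cycle(width: int, height: int, fps: int, clock_hz: float) -> float:
    """Average number of luma samples per clock cycle needed to keep up in real time."""
    samples_per_second = width * height * fps
    return samples_per_second / clock_hz


if __name__ == "__main__":
    # Figures taken from the abstract: 7680x4320 at 60 fps, 150 MHz clock.
    print(samples_per_cycle(7680, 4320, 60, 150e6))
```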

AB9: A neural processor for inference acceleration

  • Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
    • ETRI Journal / v.42 no.4 / pp.491-504 / 2020
  • We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow, providing a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with various other industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 is superior in both to a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 mm². Delivery is expected later this year.

Development of a High-Performance Vehicle Imaging Information System for an Efficient Vehicle Imaging Stabilization (효율적인 차량 영상 안정화를 위한 고성능 차량 영상 정보 시스템 개발)

  • Hong, Sung-Il;Lin, Chi-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.12 no.6 / pp.78-86 / 2013
  • In this paper, we propose the design of a high-performance vehicle imaging information system for efficient vehicle image stabilization. The algorithm of the proposed system is divided into motion estimation and motion compensation. Motion estimation consists of local motion vector estimation, irregular local motion vector detection, and global motion vector (GMV) estimation. Motion compensation corrects the image in four directions, using the estimated GMV, to compensate for the shake of the vehicle video image. The designed algorithm was implemented as a motion compensation chip by applying it as an IP for vehicle image stabilization. The experimental results show that the proposed system is effective compared with other methods, because it stabilizes the images of a moving vehicle in real time without using memory. In addition, calculation time is reduced by performing the arithmetic operations through block matching.
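
The pipeline described above (local motion vectors by block matching, then a global motion vector) can be sketched as follows. This is a generic illustration, not the paper's chip design: the sum-of-absolute-differences cost, the exhaustive search, and the use of a component-wise median to suppress irregular local vectors are all assumptions, as are the function names.

```python
from statistics import median_low


def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def local_motion_vector(cur, ref, bx, by, size, search):
    """Exhaustive block matching: return the (dx, dy) with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best_key, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose displaced block falls outside the reference frame.
            if by + dy < 0 or bx + dx < 0 or by + dy + size > height or bx + dx + size > width:
                continue
            # Ties on cost are broken in favour of the smaller displacement.
            key = (sad(cur, ref, bx, by, dx, dy, size), abs(dx) + abs(dy))
            if best_key is None or key < best_key:
                best_key, best_mv = key, (dx, dy)
    return best_mv


def global_motion_vector(local_vectors):
    """Component-wise median of local vectors, so a few irregular vectors are ignored."""
    xs = [v[0] for v in local_vectors]
    ys = [v[1] for v in local_vectors]
    return median_low(xs), median_low(ys)
```

A stabilizer would then shift the current frame by the negated global vector; that step is omitted here.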

Massive Parallel Processing Algorithm for Semiconductor Process Simulation (반도체 공정 시뮬레이션을 위한 초고속 병렬 연산 알고리즘)

  • 이제희;반용찬;원태영
    • Journal of the Korean Institute of Telematics and Electronics D / v.36D no.3 / pp.48-58 / 1999
  • In this paper, a new parallel computation method is presented that fully utilizes parallel processors in both mesh generation and FEM calculation for 2D/3D process simulation. The high-performance parallel FEM and parallel linear algebra solving techniques showed that the excessive memory and CPU time requirements of three-dimensional simulation can be handled successfully. Our parallelized numerical solver successfully reproduced the transient enhanced diffusion (TED) of dopants and the irregular shape of R-LOCOS within 15 minutes. The Monte Carlo technique requires excessive CPU time; therefore, a high-performance parallel solving technique was applied to our cascade sputter simulation. Our sputter simulator achieved a calculation time of 520 sec and a speedup of 25 using 30 processors. We found that the optimal number of injected ions for our MC sputter simulation is 30,000.
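
The reported speedup of 25 on 30 processors can be put in context with a few standard parallel-performance measures. The sketch below computes the parallel efficiency, the serial run time implied by the speedup, and the Karp-Flatt estimate of the serial fraction. These derived numbers are not stated in the abstract; they follow only from the quoted figures and the usual definitions.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def implied_serial_time(parallel_time: float, speedup: float) -> float:
    """Single-processor run time implied by a measured parallel time and speedup."""
    return parallel_time * speedup


def karp_flatt(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1 / speedup - 1 / processors) / (1 - 1 / processors)


if __name__ == "__main__":
    # Figures from the abstract: 520 s, speedup 25, 30 processors.
    print("efficiency:", parallel_efficiency(25, 30))
    print("implied serial time (s):", implied_serial_time(520, 25))
    print("serial fraction estimate:", karp_flatt(25, 30))
```

With these inputs the efficiency is about 83% and the estimated serial fraction is 1/145, or roughly 0.7%.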


Development of Computer Based Ultrasonic Flaw Detector for Nondestructive Testing (컴퓨터 내장형 비파괴검사용 초음파탐상기 개발)

  • Lee, Weon-Heum;Kim, J.K.;Kim, Y.R.;Choi, K.S.;Kim, S.H.;Lee, S.H.
    • Journal of the Korean Society for Nondestructive Testing / v.17 no.2 / pp.108-113 / 1997
  • Ultrasonic testing is one of the most widely used methods of nondestructive testing for pre-service inspection (PSI) and in-service inspection (ISI) of structures such as bridges, power plants, chemical plants, and heavy industrial facilities. It is very important for estimating the safety, life, and quality of structures. In addition, much research on quantitative evaluation and analysis of inspection data is in progress. However, traditional portable ultrasonic flaw detectors have the following disadvantages. 1) Analog ultrasonic flaw detectors reduce the credibility of ultrasonic tests because they cannot save data or perform digital signal processing. 2) Stand-alone digital ultrasonic flaw detectors cannot effectively evaluate received signals because they lack storage memory. To overcome these shortcomings, we developed a computer-based ultrasonic flaw detector for nondestructive testing. It can store and effectively evaluate the received signal, thereby enhancing the reliability of the test results.
