• Title/Summary/Keyword: Clock performance

Search Result 564, Processing Time 0.029 seconds

Hardware Design of VLIW coprocessor for Computer Vision Application (컴퓨터 비전 응용을 위한 VLIW 보조프로세서의 하드웨어 설계)

  • Choi, Byeong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.9
    • /
    • pp.2189-2196
    • /
    • 2014
  • In this paper, a VLIW(Very Long Instruction Word) vision coprocessor which can efficiently accelerate computer vision algorithm for automotive is designed. The VLIW coprocessor executes four instructions per clock cycle via 8-stage pipelined structure and has 36 integer and floating-point instructions to accelerate computer vision algorithm for pedestrian detection. The processor has about 300-MHz operating frequency and about 210,900 gates under 45nm CMOS technology and its estimated performance is 1.2 GOPS(Giga Operations Per Second). The vision system composed of vision primitive engine and eight VLIW coprocessors can execute pedestrian detection at 25~29 frames per second(FPS). Because the VLIW coprocessor has high detection rate and loosely coupled interface with host processor, it can be efficiently applicable to a wide range of vision applications.

Neuropsychological Assessment of Adult Patients with Shunted Hydrocephalus

  • Bakar, Emel Erdogan;Bakar, Bulent
    • Journal of Korean Neurosurgical Society
    • /
    • v.47 no.3
    • /
    • pp.191-198
    • /
    • 2010
  • Objective : This study is planned to determine the neurocognitive difficulties of hydrocephalic adults. Methods : The research group contained healthy adults (control group, n : 15), and hydrocephalic adults (n : 15). Hydrocephalic group consisted of patients with idiopathic aquaduct stenosis and post-meningitis hydrocephalus. All patients were followed with shunted hydrocephalus and not gone to shunt revision during last two years. They were chosen from either asymptomatic or had only minor symptoms without motor and sensorineural deficit. A neuropsychological test battery (Raven Standart Progressive Matrices, Bender-Gestalt Test, Cancellation Test, Clock Drawing Test, Facial Recognition Test, Line Orientation Test, Serial Digit Learning Test, Stroop Color Word Interference Test-TBAG Form, Verbal Fluency Test, Verbal Fluency Test, Visual-Aural Digit Span Test-B) was applied to all groups. Results : Neuropsychological assessment of hydrocephalic patients demonstrated that they had poor performance on visual, semantic and working memory, visuoconstructive and frontal functions, reading, attention, motor coordination and executive function of parietal lobe which related with complex and perseverative behaviour. Eventually, these patients had significant impairment on the neurocognitive functions of their frontal, parietal and temporal lobes. On the other hand, the statistical analyses performed on demographic data showed that the aetiology of the hydrocephalus, age, sex and localization of the shunt (frontal or posterior parietal) did not affect the test results. Conclusion : This prospective study showed that adult patients with hydrocephalus have serious neuropsychological problems which might be directly caused by the hydrocephalus; and these problems may cause serious adaptive difficulties in their social, cultural, behavioral and academic life.

Recent Developments in High Resolution Delta-Sigma Converters

  • Kim, Jaedo;Roh, Jeongjin
    • Journal of Semiconductor Engineering
    • /
    • v.2 no.1
    • /
    • pp.109-118
    • /
    • 2021
  • This review paper describes the overall operating principle of a discrete-time delta-sigma modulator (DTDSM) and a continuous-time delta-sigma modulator (CTDSM) using a switched-capacitor (SC). In addition, research that has solved the problems related to each delta-sigma modulator (DSM) is introduced, and the latest developments are explained. This paper describes the chopper-stabilization technique that mitigates flicker noise, which is crucial for the DSM. In the case of DTDSM, this paper addresses the problems that arise when using SC circuits and explains the importance of the operational transconductance amplifier performance of the first integrator of the DSM. In the case of CTDSM, research that has reduced power consumption, and addresses the problems of clock jitter and excess loop delay is described. The recent developments of the analog front end, which have become important due to the increasing use of wireless sensors, is also described. In addition, this paper presents the advantages and disadvantages of the three-opamp instrumentation amplifier (IA), current feedback IA (CFIA), resistive feedback IA, and capacitively coupled IA (CCIA) methods for implementing instrumentation amplifiers in AFEs.

An efficient interconnection network topology in dual-link CC-NUMA systems (이중 연결 구조 CC-NUMA 시스템의 효율적인 상호 연결망 구성 기법)

  • Suh, Hyo-Joong
    • The KIPS Transactions:PartA
    • /
    • v.11A no.1
    • /
    • pp.49-56
    • /
    • 2004
  • The performance of the multiprocessor systems is limited by the several factors. The system performance is affected by the processor speed, memory delay, and interconnection network bandwidth/latency. By the evolution of semiconductor technology, off the shelf microprocessor speed breaks beyond GHz, and the processors can be scalable up to multiprocessor system by connecting through the interconnection networks. In this situation, the system performances are bound by the latencies and the bandwidth of the interconnection networks. SCI, Myrinet, and Gigabit Ethernet are widely adopted as a high-speed interconnection network links for the high performance cluster systems. Performance improvement of the interconnection network can be achieved by the bandwidth extension and the latency minimization. Speed up of the operation clock speed is a simple way to accomplish the bandwidth and latency betterment, while its physical distance makes the difficulties to attain the high frequency clock. Hence the system performance and scalability suffered from the interconnection network limitation. Duplicating the link of the interconnection network is one of the solutions to resolve the bottleneck of the scalable systems. Dual-ring SCI link structure is an example of the interconnection network improvement. In this paper, I propose a network topology and a transaction path algorism, which optimize the latency and the efficiency under the duplicated links. By the simulation results, the proposed structure shows 1.05 to 1.11 times better latency, and exhibits 1.42 to 2.1 times faster execution compared to the dual ring systems.

Design of Systolic Multipliers in GF(2$^{m}$ ) Using an Irreducible All One Polynomial (기약 All One Polynomial을 이용한 유한체 GF(2$^{m}$ )상의 시스톨릭 곱셈기 설계)

  • Gwon, Sun Hak;Kim, Chang Hun;Hong, Chun Pyo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.8C
    • /
    • pp.1047-1054
    • /
    • 2004
  • In this paper, we present two systolic arrays for computing multiplications in CF(2$\^$m/) generated by an irreducible all one polynomial (AOP). The proposed two systolic mays have parallel-in parallel-out structure. The first systolic multiplier has area complexity of O(㎡) and time complexity of O(1). In other words, the multiplier consists of m(m+1)/2 identical cells and produces multiplication results at a rate of one every 1 clock cycle, after an initial delay of m/2+1 cycles. Compared with the previously proposed related multiplier using AOP, our design has 12 percent reduced hardware complexity and 50 percent reduced computation delay time. The other systolic multiplier, designed for cryptographic applications, has area complexity of O(m) and time complexity of O(m), i.e., it is composed of m+1 identical cells and produces multiplication results at a rate of one every m/2+1 clock cycles. Compared with other linear systolic multipliers, we find that our design has at least 43 percent reduced hardware complexity, 83 percent reduced computation delay time, and has twice higher throughput rate Furthermore, since the proposed two architectures have a high regularity and modularity, they are well suited to VLSI implementations. Therefore, when the proposed architectures are used for GF(2$\^$m/) applications, one can achieve maximum throughput performance with least hardware requirements.

Design and Implementation for Portable Low-Power Embedded System (저전력 휴대용 임베디드 시스템 설계 및 구현)

  • Lee, Jung-Hwan;Kim, Myung-Jung
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.7
    • /
    • pp.454-461
    • /
    • 2007
  • Portable embedded systems have recently become smaller in size and offer a variety of junctions for users. These systems require high performance processors to handle the many functions and also a small battery to fit inside the system. However, due to its size, the battery life has become a major issue. It is important to have both efficient power design and management for each function, while optimizing processor voltage and clock frequency in order to extend the battery life of the system. In this paper, we calculated the efficiency of power in optimizing power rail. This system has two microprocessors. One is used to play music and movie files while the other is for DMB. In order to reduce power consumption, the DMB microprocessor is turned of while music or videos are played. Lastly, DVFS is applied to the processor in the system to reduce power consumption. Experimental results of the implemented system have resulted in reduced power consumption.

Hardware Architecture of High Performance Cipher for Security of Digital Hologram (디지털 홀로그램의 보안을 위한 고성능 암호화기의 하드웨어 구조)

  • Seo, Young-Ho;Yoo, Ji-Sang;Kim, Dong-Wook
    • Journal of Broadcast Engineering
    • /
    • v.17 no.2
    • /
    • pp.374-387
    • /
    • 2012
  • In this paper, we implement a new hardware for finding the significant coefficients of a digital hologram and ciphering them using discrete wavelet packet transform (DWPT). Discrete wavelet transform (DWT) and packetization of subbands is used, and the adopted ciphering technique can encrypt the subbands with various robustness based on the level of the wavelet transform and the threshold of subband energy. The hologram encryption consists of two parts; the first is to process DWPT, and the second is to encrypt the coefficients. We propose a lifting based hardware architecture for fast DWPT and block ciphering system with multi-mode for the various types of encryption. The unit cell which calculates the repeated arithmetic with the same structure is proposed and then it is expanded to the lifting kernel hardware. The block ciphering system is configured with three block cipher, AES, SEED and 3DES and encrypt and decrypt data with minimal latency time(minimum 128 clocks, maximum 256 clock) in real time. The information of a digital hologram can be hided by encrypting 0.032% data of all. The implemented hardware used about 200K gates in $0.25{\mu}m$ CMOS library and was stably operated with 165MHz clock frequency in timing simulation.

Optimized Hardware Design of Deblocking Filter for H.264/AVC (H.264/AVC를 위한 디블록킹 필터의 최적화된 하드웨어 설계)

  • Jung, Youn-Jin;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.1
    • /
    • pp.20-27
    • /
    • 2010
  • This paper describes a design of 5-stage pipelined de-blocking filter with power reduction scheme and proposes a efficient memory architecture and filter order for high performance H.264/AVC Decoder. Generally the de-blocking filter removes block boundary artifacts and enhances image quality. Nevertheless filter has a few disadvantage that it requires a number of memory access and iterated operations because of filter operation for 4 time to one edge. So this paper proposes a optimized filter ordering and efficient hardware architecture for the reduction of memory access and total filter cycles. In proposed filter parallel processing is available because of structured 5-stage pipeline consisted of memory read, threshold decider, pre-calculation, filter operation and write back. Also it can reduce power consumption because it uses a clock gating scheme which disable unnecessary clock switching. Besides total number of filtering cycle is decreased by new filter order. The proposed filter is designed with Verilog-HDL and functionally verified with the whole H.264/AVC decoder using the Modelsim 6.2g simulator. Input vectors are QCIF images generated by JM9.4 standard encoder software. As a result of experiment, it shows that the filter can make about 20% total filter cycles reduction and it requires small transposition buffer size.

A Novel Globally Asynchronous, Locally Dynamic System Bus Architecture Based on Multitasking Bus (다중처리가 가능한 새로운 Globally Asynchronous, Locally Dynamic System 버스 구조)

  • Choi, Chang-Won;Shin, Hyeon-Chul;Wee, Jae-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.5
    • /
    • pp.71-81
    • /
    • 2008
  • In this paper, we propose a novel Globally Asynchronous, Locally Dynamic System(GALDS) bus and demonstrate its performance. The proposed GALDS bus is the bidirectional multitasking bus with the segmented bus architecture supporting the concurrent operation of multi-masters and multi-slaves. By analyzing system tasks, the bus architecture chooses the optimal frequency for each If among multiples of bus frequency and thus we can reduce the overall power consumption. For efficient data communications between IPs operating in different frequencies, we designed an asynchronous and bidirectional FIFO based on an asynchronous wrapper with hand-shaking interface. In addition, since systems can be easily expandable by inserting bus segments, the proposed architecture has advantages in IP reusability and structural flexibility As a test example, a four-segment bus haying four masters and four slaves were designed by using Verilog HDL. We demonstrate multitasking operations with read/write data transfers by simulation when the ratios of operation frequency are 1:1, 1:2, 1:4 and 1:8. The data transfer mode is a 16 burst increment mode compatible with Advanced Microcontroller Bus Architecture(AMBA). The maximum operation latency of the proposed GALDS bus is 22 clock cycles for the bus write operation, and 44 clock cycles for read.

Study of Cross Correlation Using DRS(Delayed Reference Sample) for Precision Time Measurement of Input Signal on Multilateration (다변측정감시시스템 신호 입력 시각 정밀 측정을 위한 DRS(Delayed Reference Sample)를 이용한 Cross Correlation 방안 연구)

  • Chang, Jae-Won;Lee, Sang Jeong
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.46 no.3
    • /
    • pp.244-250
    • /
    • 2018
  • Multilateration acquires the transponder signal of target from receivers installed on the ground and calculates the position of the target using the difference of the signal acquisition time of each receiver. One of the factors that influence the positioning accuracy of Multilateration using the TDOA calculation method is the error due to the precision measurement of signal input time. When measuring the signal input time at the receiver, the input signal is sampled using the reference clock of the receiver and a reference sample having the same sampling rate is applied to the cross correlation technique. Therefore, the accuracy of the signal input time is proportional to the reference clock. In this paper, the algorithm for precisely measuring the signal input time by performing cross correlation between the input signal of the receiver and DRS(Delayed Reference Sample) is proposed. In order to verify this, we implemented the pulse signal of the transponder that is transmitted from the target using Matlab. Through the simulation, cross correlation between the proposed DRS and the input signal was performed. From this result, the performance of the precise measurement of signal input time was analyzed.