Search | Korea Science

High-Performance Low-Power FFT Cores

Han, Wei;Erdogan, Ahmet T.;Arslan, Tughrul;Hasan, Mohd.
ETRI Journal / v.30 no.3 / pp.451-460 / 2008
The power consumption of integrated circuits has recently been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are widely used both in traditional research fields, such as satellite communications, and in thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high-throughput, power-efficient FFT cores. Different combinations of hybrid low-power techniques are exploited to reduce power consumption: multiplierless units that replace the complex multipliers in FFTs, low-power commutators based on an advanced interconnection, and parallel-pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that power savings of up to 38% and 55% can be achieved by the proposed pipelined FFTs and parallel-pipelined FFTs, respectively, compared to conventional pipelined FFT processor architectures.
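For readers unfamiliar with the computation these cores implement, the following is a minimal software sketch of a radix-2 decimation-in-time FFT. It is an illustration only and not the paper's pipelined hardware architecture; the function name `fft_radix2` is an assumption made for this example. The comment marks the twiddle-factor multiplication, which is the complex multiply that the abstract says the multiplierless units replace.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT.

    len(x) must be a power of two. Returns a list of complex values.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle-factor multiplication: one complex multiply per butterfly.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

A unit impulse transforms to a flat spectrum, so `fft_radix2([1, 0, 0, 0])` yields four values equal to 1.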

A study on the genetic algorithms for the scheduling of parallel computation (병렬계산의 스케쥴링에 있어서 유전자알고리즘에 관한 연구)

성기석;박지혁
Proceedings of the Korean Operations and Management Science Society Conference / 1997.10a / pp.166-169 / 1997
For parallel processing, the compiler partitions a loaded program into a set of tasks and builds a schedule for those tasks that minimizes the parallel processing time of the program. Building an optimal schedule for a given set of partitioned tasks is known to be NP-complete. In this paper we introduce a genetic algorithm (GA)-based scheduling method in which a chromosome consists of two parts of a string, which determine the number and the order of tasks on each processor. An additional computation is used to enforce the feasibility constraint on the chromosome. By granularity theory, a partitioned program is categorized as either coarse-grain or fine-grain. Good heuristic algorithms exist for coarse-grain type partitioning. We suggest another GA adapted to coarse-grain type partitioning. The infeasibility of chromosomes is overcome by the encoding and the operators. The number of processors is decided while the GA finds the minimum parallel processing time.

A two-level parallel algorithm for material nonlinearity problems

Lee, Jeeho;Kim, Min Seok
Structural Engineering and Mechanics / v.38 no.4 / pp.405-416 / 2011
An efficient two-level domain decomposition parallel algorithm is suggested for solving large-DOF structural problems with nonlinear material models that generate unsymmetric tangent matrices, such as a group of plastic-damage material models. A parallel version of the stabilized bi-conjugate gradient method is developed to solve unsymmetric coarse problems iteratively. In the present approach, the coarse DOF system, rather than the whole system equation, is solved in parallel on each processor to minimize data communication between processors, which helps maintain computing performance on a non-supercomputer-level cluster system. The performance test results show that the suggested algorithm provides scalable computing performance and an efficient approach to solving large-DOF nonlinear structural problems on a cluster system.
https://doi.org/10.12989/sem.2011.38.4.405 KSCI

Vibration and precision position control of dual actuators with parallel type piezoactuator (이단 압전 구동기를 가진 이중 구동기의 진동 및 정밀위치제어)

Lee, Yong-Gwon;Cho, Won-Ik;Yang, Hyun-Suk;Park, Young-Pil
Proceedings of the KSME Conference / 2000.04a / pp.475-480 / 2000
A new positioning mechanism is proposed for optical disk drives or near-field recording type drives. It combines a parallel type actuator made of piezoelectric material with dual type actuators consisting of a voice coil motor (VCM) and a piezoactuator, and high-speed position and vibration control are investigated. A parallel type bimorph piezoactuator is used as the fine motion actuator with a self-sensing technique, which allows a piezoelectric material to sense and actuate concurrently in a closed-loop framework, and a positive position feedback control algorithm is adopted to further control residual vibration. For positioning control of the VCM, a PID control algorithm is adopted.

Cellular Parallel Processing Networks-based Dynamic Programming Design and Fast Road Boundary Detection for Autonomous Vehicle (셀룰라 병렬처리 회로망에 의한 동적계획법 설계와 자율주행 자동차를 위한 도로 윤곽 검출)

홍승완;김형석
The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.7 / pp.465-472 / 2004
An optimal road boundary detection algorithm for autonomous vehicles, based on an analog cellular parallel processing network (CPPN), is proposed. The CPPN is a massively connected analog parallel array processor. In the paper, dynamic programming, an efficient algorithm for finding the optimal path, is implemented with the CPPN algorithm. If the road-boundary information in the image is used as the inter-cell distance, and the goal and start lines are positioned at the top and the bottom of the image, respectively, the optimal path finding algorithm can be exploited for optimal road boundary detection. By virtue of the parallel and analog processing of the CPPN and the optimal solution of dynamic programming, the proposed road boundary detection algorithm is expected to be very fast and robust if it is implemented in circuits. The proposed road boundary algorithm is described and simulation results are reported.
KSCI
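The path-finding formulation in the abstract above (start line at the bottom of the image, goal line at the top, image values used as inter-cell distances) can be sketched as an ordinary sequential dynamic program. This is not the analog CPPN implementation. The function name `min_cost_path`, the cost grid, and the rule that a path may shift at most one column per row are all assumptions made for illustration.

```python
def min_cost_path(cost):
    """Cheapest bottom-to-top path through a 2-D cost grid.

    cost[r][c] is the cost of visiting cell (r, c). Row 0 is the top (goal
    line) and the last row is the bottom (start line). From a cell, the path
    moves to the row above and may shift by at most one column (assumed).
    Returns (total_cost, columns), with columns listed from bottom to top.
    """
    rows, cols = len(cost), len(cost[0])
    # best[r][c]: minimum cost of reaching the top row starting at (r, c).
    best = [row[:] for row in cost]
    # step[r][c]: column chosen in row r - 1 on the best path from (r, c).
    step = [[0] * cols for _ in range(rows)]
    for r in range(1, rows):
        for c in range(cols):
            candidates = [
                (best[r - 1][k], k) for k in (c - 1, c, c + 1) if 0 <= k < cols
            ]
            b, k = min(candidates)
            best[r][c] = cost[r][c] + b
            step[r][c] = k
    # Pick the cheapest start cell on the bottom row, then trace upward.
    c = min(range(cols), key=lambda j: best[rows - 1][j])
    total = best[rows - 1][c]
    path = [c]
    for r in range(rows - 1, 0, -1):
        c = step[r][c]
        path.append(c)
    return total, path
```

For the grid `[[5, 1, 5], [5, 1, 5], [5, 5, 1]]`, the cheapest path starts in the bottom-right cell and moves to the middle column, giving `(3, [2, 1, 1])`. Each cell's update depends only on its three neighbours in the adjacent row, which is the kind of local computation a cell array can perform in parallel.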

Development of Parallel Eigenvalue Solution Algorithm with Substructuring Techniques (부구조기법을 이용한 병렬 고유치해석 알고리즘 개발)

김재홍;성창원;박효선
Proceedings of the Computational Structural Engineering Institute Conference / 1999.10a / pp.411-420 / 1999
A computational model and a new eigenvalue solution algorithm for large-scale structures are presented in the form of parallel computation. The computational load and data storage required during the solution process are drastically reduced by distributing the computational load evenly across the processors. As the parallel computational model, multiple personal computers are connected by 10 Mbit/s Ethernet cards. In this study, substructuring techniques and the static condensation method are adopted for modeling a large-scale structure. To reduce the size of the eigenvalue problem, the interface degrees of freedom and one lateral degree of freedom are selected as the master degrees of freedom in each substructure. The performance of the proposed parallel algorithm is demonstrated by applying it to the dynamic analysis of two-dimensional structures.
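As background for the static condensation step mentioned in the abstract above, the textbook (Guyan) form of the reduction is sketched below. The subscripts $m$ (master) and $s$ (slave) and the symbols $K$, $M$, $T$ are notation assumed here; the paper's exact formulation may differ.

```latex
% Partition the stiffness and mass matrices into master (m) and slave (s) DOFs
K = \begin{bmatrix} K_{mm} & K_{ms} \\ K_{sm} & K_{ss} \end{bmatrix},
\qquad
M = \begin{bmatrix} M_{mm} & M_{ms} \\ M_{sm} & M_{ss} \end{bmatrix}

% Static relation between slave and master displacements (no load on slave DOFs)
u_s = -K_{ss}^{-1} K_{sm}\, u_m ,
\qquad
T = \begin{bmatrix} I \\ -K_{ss}^{-1} K_{sm} \end{bmatrix}

% Condensed matrices and the reduced eigenvalue problem
K_r = T^{\mathsf{T}} K T = K_{mm} - K_{ms} K_{ss}^{-1} K_{sm},
\qquad
M_r = T^{\mathsf{T}} M T,
\qquad
K_r \,\phi_m = \lambda\, M_r \,\phi_m
```

The reduced problem has only as many unknowns as there are master degrees of freedom, which is why the choice of masters in each substructure controls the problem size.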

Parallel Computation Algorithm of Gauss Elimination in Power system Analysis (전력계통해석을 위한 자코비안행렬 가우스소거의병렬계산 알고리즘)

서의석;오태규
The Transactions of the Korean Institute of Electrical Engineers / v.43 no.2 / pp.189-196 / 1994
This paper describes a parallel computing algorithm for Gauss elimination of the Jacobian matrix of a large-scale power system. The structure of the Jacobian matrix depends on the method used to order the buses. In sequential computation, buses are ordered to minimize the number of fill-ins in the triangulation of the Jacobian matrix. The proposed method develops parallelism in the Gauss elimination by using nested dissection (ND) ordering. In this procedure, the level structure of the power system network is transformed to be long and narrow by using end buses, which balances the computing load among processes and maximizes parallel computation. Each processor uses the sequential computation method to preserve the sparsity of the matrix.

Performance Enhancement of Parallel Prime Sieving with Hybrid Programming and Pipeline Scheduling (혼합형 병렬처리 및 파이프라이닝을 활용한 소수 연산 알고리즘)

Ryu, Seung-yo;Kim, Dongseung
KIPS Transactions on Computer and Communication Systems / v.4 no.10 / pp.337-342 / 2015
We develop a new parallelization method for the Sieve of Eratosthenes algorithm that enhances both computation speed and energy efficiency. Pipeline scheduling is included for better load balancing after proper workload partitioning. The method runs on multicore CPUs with a hybrid parallel programming model that uses both message passing and multithreading. Experiments performed on both small-scale clusters and a PC with a mobile processor show significant improvements in execution time and energy consumption.
https://doi.org/10.3745/KTCCS.2015.4.10.337 KSCI
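To make the idea of workload partitioning for the Sieve of Eratosthenes concrete, here is a minimal sequential sketch of a segmented sieve. It is not the authors' hybrid message-passing and multithreaded implementation; the function names and the `block_size` parameter are assumptions for this example. Each call to `sieve_block` needs only the shared list of base primes, so in a parallel setting the blocks could be handed to different workers.

```python
import math


def base_primes(limit):
    """Plain Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            # Clear every multiple of p starting at p * p.
            flags[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, f in enumerate(flags) if f]


def sieve_block(lo, hi, primes):
    """Primes in [lo, hi), given every prime p with p * p < hi; lo >= 2."""
    flags = bytearray([1]) * (hi - lo)
    for p in primes:
        if p * p >= hi:
            break
        # First multiple of p inside the block that is not p itself.
        start = max(p * p, (lo + p - 1) // p * p)
        for m in range(start, hi, p):
            flags[m - lo] = 0
    return [lo + i for i, f in enumerate(flags) if f]


def segmented_primes(n, block_size=1024):
    """All primes <= n, sieved one independent block at a time."""
    if n < 2:
        return []
    primes = base_primes(math.isqrt(n))
    result = []
    for lo in range(2, n + 1, block_size):
        hi = min(lo + block_size, n + 1)
        result.extend(sieve_block(lo, hi, primes))
    return result
```

For example, `segmented_primes(30, block_size=7)` returns `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`. Blocks of equal width do not carry equal work, which is one reason a scheme like the abstract's pipeline scheduling may be needed for load balancing.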

Design of High-speed Digit Serial-Parallel Multiplier in Finite Field GF($2^m$) (Finite Field GF($2^m$)상의 Digit Serial-Parallel Multiplier 구현)

Choi, Won-Ho;Hong, Sung-Pyo
Proceedings of the KIEE Conference / 2003.11c / pp.928-931 / 2003
This paper presents a digit-serial/parallel multiplier for finite fields GF($2^m$). The hardware requirements of the implemented multiplier are less than those of the existing multiplier of the same class, while processing time and area complexity. The implemented multiplier is regular and modular, and is therefore well suited to VLSI implementation. If the digit size D of the implemented digit-serial multiplier is chosen appropriately, it can meet the throughput requirement of a given application with minimum hardware. The multipliers and squarers analyzed in this paper can be used efficiently in crypto processors for elliptic curve cryptosystems.
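The role of the digit size D in the abstract above can be illustrated with a small software model of digit-serial multiplication in GF($2^m$). This is a generic most-significant-digit-first scheme, not the paper's circuit; the function names, the polynomial-basis representation, and the use of the AES field polynomial in the example are assumptions. A larger `digit_size` means fewer outer iterations (higher throughput) at the cost of a wider partial-product step (more hardware).

```python
def _mul_by_digit(a, digit, m, poly):
    """Return a * digit in GF(2^m) by shift-and-XOR; digit is a small polynomial."""
    result = 0
    while digit:
        if digit & 1:
            result ^= a
        digit >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly  # reduce modulo the field polynomial
    return result


def gf2m_mul_digit_serial(a, b, m, poly, digit_size):
    """Multiply a * b in GF(2^m), consuming digit_size bits of b per step.

    a and b are field elements in polynomial basis (integers below 2**m).
    poly is the irreducible polynomial including its x^m term.
    """
    mask = (1 << digit_size) - 1
    num_digits = -(-m // digit_size)  # ceil(m / digit_size)
    acc = 0
    # Most-significant digit first: acc = acc * x^D + a * digit (mod poly).
    for i in reversed(range(num_digits)):
        digit = (b >> (i * digit_size)) & mask
        for _ in range(digit_size):
            acc <<= 1
            if (acc >> m) & 1:
                acc ^= poly
        acc ^= _mul_by_digit(a, digit, m, poly)
    return acc
```

With `m=8` and `poly=0x11B`, `gf2m_mul_digit_serial(0x57, 0x83, 8, 0x11B, 4)` returns `0xC1`, and the result is the same for any digit size; only the number of steps changes.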

David II: A new architecture for parallel rendering processors with effective memory system (David II: 효과적인 메모리 시스템을 가지는 병렬 렌더링 프로세서)

Lee, Kil-Whan;Park, Woo-Chan;Kim, Il-San;Han, Tack-Don
Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.1655-1658 / 2004
Current rendering processors are organized mainly to process a single triangle as fast as possible, and parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high performance in processing triangles, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture, called DAVID II, that resolves this consistency problem effectively. Moreover, the proposed architecture significantly reduces the latency caused by pixel cache misses. The experimental results show that DAVID II achieves almost linear speedup in the best case, even with sixteen rasterizers.

Search Result 482, Processing Time 0.026 seconds
