• Title/Summary/Keyword: computation-intensive


Fast Motion Estimation for Variable Motion Block Size in H.264 Standard (H.264 표준의 가변 움직임 블록을 위한 고속 움직임 탐색 기법)

  • 최웅일;전병우
    • Journal of the Institute of Electronics Engineers of Korea SP, v.41 no.6, pp.209-220, 2004
  • The main advantages of the H.264 standard over conventional video standards are its high coding efficiency and network friendliness. Despite these outstanding features, it is not easy to implement an H.264 codec as a real-time system because of its high memory-bandwidth requirements and intensive computation. Variable block size motion compensation with multiple reference frames is one of the key coding tools behind its performance gain, but it demands substantial computation, because the SAD (Sum of Absolute Differences) must be calculated for all possible combinations of coding modes to find the best motion vector. To speed up motion estimation, this paper therefore proposes fast algorithms for both integer-pel and fractional-pel motion search. Since many conventional fast integer-pel motion estimation algorithms are not suitable for H.264, with its variable motion block sizes, we propose a motion field adaptive search that uses a hierarchical block structure based on the diamond search and is applicable to variable motion block sizes. We also propose a fast fractional-pel motion search that uses a small diamond search centered on the predicted motion vector, based on the statistical characteristics of motion vectors.
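A small diamond search driven by SAD can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' motion field adaptive or fractional-pel algorithm: frames are assumed to be 2-D lists of integer luma samples, only integer-pel positions are searched, and the names `sad` and `small_diamond_search` are hypothetical. The search starts from the predicted motion vector, evaluates the four neighbours of the current centre by SAD, moves to the best one, and stops when the centre itself is best.

```python
# Minimal sketch (hypothetical names): SAD-based small diamond search for one block,
# integer-pel only, frames given as 2-D lists of luma samples.

SMALL_DIAMOND = ((1, 0), (-1, 0), (0, 1), (0, -1))


def sad(cur, ref, bx, by, size, dx, dy):
    """Sum of absolute differences between the size x size block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total


def small_diamond_search(cur, ref, bx, by, size, pred_mv=(0, 0), search_range=8):
    """Greedy small diamond search starting at pred_mv; returns (motion_vector, sad)."""
    height, width = len(ref), len(ref[0])

    def valid(dx, dy):
        # Candidate must stay inside the search window and inside the reference frame.
        return (abs(dx) <= search_range and abs(dy) <= search_range
                and 0 <= bx + dx and bx + dx + size <= width
                and 0 <= by + dy and by + dy + size <= height)

    best = tuple(pred_mv) if valid(*pred_mv) else (0, 0)
    best_cost = sad(cur, ref, bx, by, size, *best)
    while True:
        center = best
        for ox, oy in SMALL_DIAMOND:
            cand = (center[0] + ox, center[1] + oy)
            if valid(*cand):
                cost = sad(cur, ref, bx, by, size, *cand)
                if cost < best_cost:
                    best, best_cost = cand, cost
        if best == center:  # no neighbour improves on the centre: local minimum
            return best, best_cost
```

For example, on a synthetic linear ramp where the current 4x4 block equals the reference block displaced by (2, 1), a search started at (0, 0) walks (0, 1), (1, 1), (2, 1) and returns ((2, 1), 0). Starting closer to the true vector, as a good predicted motion vector allows, shortens that walk, which is the motivation for centring the search on the prediction.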

Fixed-Point Modeling and Performance Analysis of a SIFT Keypoints Localization Algorithm for SoC Hardware Design (SoC 하드웨어 설계를 위한 SIFT 특징점 위치 결정 알고리즘의 고정 소수점 모델링 및 성능 분석)

  • Park, Chan-Ill;Lee, Su-Hyun;Jeong, Yong-Jin
    • Journal of the Institute of Electronics Engineers of Korea SD, v.45 no.6, pp.49-59, 2008
  • SIFT (Scale Invariant Feature Transform) is an algorithm that extracts feature vectors from the pixels around keypoints, i.e., points whose pixel colors differ strongly from their neighbors, such as the vertices and edges of an object. The SIFT algorithm is being actively researched for various image processing applications, including 3-D image construction, and its most computation-intensive stage is keypoint localization. In this paper, we develop a fixed-point model of keypoint localization and propose an efficient hardware architecture for it for embedded applications. The bit lengths of key variables are determined based on two performance measures: localization accuracy and error rate. Compared with the original algorithm (implemented in Matlab), the proposed fixed-point model achieves an accuracy of 93.57% and an error rate of 2.72%. In addition, we found that most of the missing keypoints appear at the edges of objects, which are not very important for keypoint matching. We estimate that the hardware implementation will achieve a processing speed of 10~15 frames/sec, whereas the fixed-point software implementation takes 10 seconds per frame on a Pentium Core2Duo (2.13 GHz) and one hour per frame on an ARM9 (400 MHz).
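To see how the bit length of a variable trades off against localization accuracy, consider the 1-D case of the standard SIFT sub-pixel refinement, where the keypoint offset is x_hat = -D'/D'' (the 1-D form of x_hat = -H^-1 * grad D). The Python sketch below is a hypothetical illustration, not the authors' fixed-point model: it computes the offset once in floating point and once with integer arithmetic on inputs quantized to `frac_bits` fractional bits. The function names and the use of 1-D central differences are assumptions.

```python
def to_fixed(value, frac_bits):
    """Round a real value to a signed fixed-point integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))


def offset_float(dm, d0, dp):
    """Sub-pixel offset of the extremum of the parabola through (-1, dm), (0, d0), (1, dp)."""
    grad = (dp - dm) / 2.0        # central-difference first derivative D'
    hess = dp - 2.0 * d0 + dm     # second derivative D''
    return -grad / hess


def offset_fixed(dm, d0, dp, frac_bits):
    """Same offset using only integer arithmetic on fixed-point inputs.

    Returns the offset rounded down to a multiple of 2**-frac_bits (as a float for
    display), or None when the quantized second derivative is zero.
    """
    qm, q0, qp = (to_fixed(v, frac_bits) for v in (dm, d0, dp))
    grad2 = qp - qm               # 2 * D', scaled by 2**frac_bits
    hess = qp - 2 * q0 + qm       # D'', scaled by 2**frac_bits
    if hess == 0:
        return None
    # -D'/D'' = -grad2 / (2 * hess); shifting the numerator keeps frac_bits fraction bits.
    q = (-grad2 << frac_bits) // (2 * hess)   # integer (floor) division
    return q / (1 << frac_bits)
```

With samples (0.2, 1.0, 0.6) the exact offset is 1/6, about 0.1667; the sketch returns 0.125 with 4 fractional bits and 0.1640625 with 8. Sweeping `frac_bits` over many keypoints in this way gives the kind of accuracy-versus-bit-length curve from which word lengths can be chosen.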

Automatic Face and Eyes Detection: A Scale and Rotation Invariant Approach based on Log-Polar Mapping (Log-Polar 사상의 크기와 회전 불변 특성을 이용한 얼굴과 눈 검출)

  • Choi, Il;Chien, Sung-Il
    • Journal of the Korean Institute of Telematics and Electronics S, v.36S no.8, pp.88-100, 1999
  • Automatically detecting a human face and its facial landmarks in an image is an essential step toward a fully automatic face recognition system. In this paper, we present a new approach that automatically detects a face and its eyes in an input image containing scale and rotation variations of faces, using intensity-based template matching with a single log-polar face template. In template-based matching, the scale changes and rotations of an input image must be normalized to those of the template. The log-polar mapping, which simulates the space-variant human visual system, converts scale changes and rotations of the input image into constant horizontal shifts and cyclic vertical shifts in the output plane. Exploiting this property, candidate log-polar faces mapped at various fixation points of the input image can simply be shifted over the log-polar plane to be matched against the template. The proposed method thus eliminates the need for multi-template and multi-resolution schemes, which inevitably involve intensive computation to cope with scale and rotation variations of faces. Through this scale- and rotation-invariant matching, the method can detect a face and its eyes simultaneously. Experimental results on a database of 795 images show a detection rate of over 98%.

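The shift property of the log-polar mapping can be checked numerically. The minimal Python sketch below assumes a single fixation point (cx, cy), takes rho = ln r as the horizontal axis and theta as the vertical axis, and works on point coordinates rather than resampled image intensities; the names `log_polar`, `scale_rotate`, and `template_shift` are hypothetical. Scaling by s adds ln s to rho, and rotating by phi adds phi to theta modulo 2*pi, so on a discrete grid both become plain shifts.

```python
import math


def log_polar(x, y, cx=0.0, cy=0.0):
    """Map (x, y) to (rho, theta) = (ln r, angle in [0, 2*pi)) about the fixation point (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        raise ValueError("the fixation point itself has no log-polar image")
    return math.log(r), math.atan2(dy, dx) % (2.0 * math.pi)


def scale_rotate(x, y, s, phi, cx=0.0, cy=0.0):
    """Scale by s and rotate by phi radians about (cx, cy)."""
    dx, dy = x - cx, y - cy
    c, sn = math.cos(phi), math.sin(phi)
    return cx + s * (c * dx - sn * dy), cy + s * (sn * dx + c * dy)


def template_shift(s, phi, n_rho, n_theta, r_min, r_max):
    """Shift, in (columns, rows), that scaling by s and rotating by phi produce on an
    n_rho x n_theta log-polar grid covering radii r_min..r_max (rows wrap cyclically)."""
    d_rho = math.log(r_max / r_min) / n_rho
    d_theta = 2.0 * math.pi / n_theta
    return math.log(s) / d_rho, (phi / d_theta) % n_theta
```

For instance, on a 32 x 64 grid covering radii 1 to 256, doubling the face size shifts the log-polar image by 4 columns, and a 90-degree rotation shifts it cyclically by 16 rows. This is why a single template can be matched by shifting rather than by rescaling or rotating the input.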

Design of a Low Power Reconfigurable DSP with Fine-Grained Clock Gating (정교한 클럭 게이팅을 이용한 저전력 재구성 가능한 DSP 설계)

  • Jung, Chan-Min;Lee, Young-Geun;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD, v.45 no.2, pp.82-92, 2008
  • Digital signal processing (DSP) applications such as H.264, CDMA, and MP3 have recently become predominant tasks for modern high-performance portable devices. These applications are generally computation-intensive and therefore require quite complicated accelerator units to improve performance. Designing such specialized yet fixed DSP accelerators takes a lot of effort, so DSPs with multiple accelerators often have a very poor time-to-market and an unacceptable area overhead. To avoid this long time-to-market and high area overhead, dynamically reconfigurable DSP architectures have attracted much attention lately. Dynamically reconfigurable DSPs typically employ a multi-functional DSP accelerator that executes multiple similar, yet different, kinds of computations for DSP applications. With this type of dynamically reconfigurable DSP accelerator, time-to-market is reduced significantly. However, integrating multiple functionalities into a single IP often results in excessive control and area overhead, and delay and power consumption consequently turn out to be quite excessive. In this thesis, we propose a novel fine-grained clock gating scheme to reduce the power consumption of dynamically reconfigurable IPs, and a compact multiplier-less multiplication unit, in which shifters and adders carry out constant multiplications, to reduce their size.
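A common way to build a multiplier-less constant multiplier is to recode the constant in canonical signed digit form (CSD, also called non-adjacent form), so that each nonzero digit becomes one shifted add or subtract. The abstract does not say which recoding the authors use, so the Python sketch below is an assumption for illustration; `csd_terms` and `shift_add_multiply` are hypothetical names.

```python
def csd_terms(c):
    """Recode integer c as [(shift, sign), ...] with c == sum(sign * 2**shift).

    Uses the non-adjacent form: no two nonzero digits are adjacent, which
    minimises the number of adders/subtractors needed for the constant.
    """
    terms = []
    shift = 0
    while c != 0:
        if c & 1:
            # c mod 4 == 1 -> digit +1, c mod 4 == 3 -> digit -1, so that
            # the remaining value is divisible by 4 and the next digit is zero.
            digit = 2 - (c & 3)
            terms.append((shift, digit))
            c -= digit
        c >>= 1
        shift += 1
    return terms


def shift_add_multiply(x, terms):
    """Multiply x by the recoded constant using only shifts, adds, and subtracts."""
    acc = 0
    for shift, sign in terms:
        if sign > 0:
            acc += x << shift
        else:
            acc -= x << shift
    return acc
```

For the constant 7, `csd_terms(7)` gives [(0, -1), (3, 1)], i.e. 7x = (x << 3) - x: one subtractor instead of the two adders that the plain binary form 111 would need. In hardware, each term corresponds to a fixed wiring shift feeding an adder or subtractor.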

A new warp scheduling technique for improving the performance of GPUs by utilizing MSHR information (GPU 성능 향상을 위한 MSHR 정보 기반 워프 스케줄링 기법)

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing, v.13 no.3, pp.72-83, 2017
  • GPUs can provide high throughput, hiding latency by executing many warps in parallel. The MSHRs (Miss Status Holding Registers) of the L1 data cache track cache miss requests until the required data is serviced from lower-level memory. In recent GPUs, excessive requests for cache resources cause cache resource reservation failures, which leave GPU resources underutilized. In this paper, we propose a new warp scheduling technique to reduce stall cycles under MSHR resource shortage. The cache miss rate of each warp is predicted, based on the observation that each warp shows similar cache miss rates over long periods. When the MSHRs are full, warps showing low miss rates and computation-intensive warps are given high priority for issue. Our proposal improves GPU performance by using cache resources more efficiently, based on cache miss rate prediction and monitoring of the MSHR entries. According to our experimental results, the proposed scheduling technique reduces reservation fail cycles by 25.7% and increases IPC by 6.2% compared with the loose round-robin scheduler.
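The issue policy described above can be sketched as a toy scheduler. This Python model is hypothetical and much simplified: each warp's past miss rate is used directly as its prediction, a boolean `next_is_memory` stands in for knowing whether the next instruction needs the L1 data cache, and plain round robin stands in for loose round-robin (which in real hardware also skips stalled warps). When the MSHRs are full, computation-only warps are preferred, then the warp with the lowest predicted miss rate.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    wid: int
    next_is_memory: bool = True   # hypothetical: does the next instruction access the L1 data cache?
    accesses: int = 0
    misses: int = 0

    def record_access(self, hit):
        """Update the per-warp history used as the miss-rate prediction."""
        self.accesses += 1
        if not hit:
            self.misses += 1

    def predicted_miss_rate(self):
        # Past miss rate is used as the prediction (miss rates assumed stable over long periods).
        return self.misses / self.accesses if self.accesses else 0.0


class MissAwareScheduler:
    """Round robin while MSHR entries are free; miss-rate-aware priority when they are full."""

    def __init__(self, warps, mshr_entries):
        self.warps = list(warps)
        self.mshr_entries = mshr_entries
        self.mshr_used = 0
        self.rr_next = 0

    def mshr_full(self):
        return self.mshr_used >= self.mshr_entries

    def pick(self):
        if not self.mshr_full():
            warp = self.warps[self.rr_next % len(self.warps)]
            self.rr_next += 1
            return warp
        # MSHRs full: compute warps first (False sorts before True), then lowest predicted miss rate.
        return min(self.warps, key=lambda w: (w.next_is_memory, w.predicted_miss_rate()))
```

The design point the sketch illustrates is that the priority change is applied only while the MSHRs are full. At other times the baseline order is kept, so warps with high miss rates are delayed only when their requests would fail reservation anyway.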

Adaptive Data Hiding Techniques for Secure Communication of Images (영상 보안통신을 위한 적응적인 데이터 은닉 기술)

  • 서영호;김수민;김동욱
    • The Journal of Korean Institute of Communications and Information Sciences, v.29 no.5C, pp.664-672, 2004
  • The widespread popularity of wireless data communication devices, coupled with the availability of higher bandwidths, has led to increased user demand for content-rich media such as images and videos. Since such content is often private, sensitive, or paid for, this communication needs to be secured. However, solutions that rely only on traditional compute-intensive security mechanisms are unsuitable for resource-constrained wireless and embedded devices. In this paper, we propose a selective partial image encryption scheme for image data hiding, which enables highly efficient secure communication of image data to and from resource-constrained wireless devices. The encryption is invoked during image compression, between the quantizer and the entropy coder stages. Three data selection schemes are proposed: subband selection, data bit selection, and random selection. We show that these schemes make secure communication of images feasible for constrained embedded devices. In addition, we demonstrate how these schemes can be dynamically configured to trade off the amount of data hiding achieved against the computation requirements imposed on the wireless devices. Experiments conducted on over 500 test images reveal that, with our techniques, the fraction of data to be encrypted varies between 0.0244% and 0.39% of the original image size. The peak signal-to-noise ratios (PSNR) of the encrypted images were observed to range from about 9.5 dB to 7.5 dB. In addition, visual tests indicate that our schemes provide a high degree of data hiding at much lower computational cost.
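Selective encryption between the quantizer and the entropy coder can be sketched as follows. This is a hypothetical Python illustration, not the authors' scheme: the abstract does not name a cipher, so a SHA-256 counter-mode keystream from the standard `hashlib` module is used purely as a stand-in (not a vetted cipher), quantized coefficients are assumed to arrive as a flat list with the lowest-frequency subband first, and data bit selection is omitted. Only the selected coefficients are touched, so the encrypted fraction is `len(selected) / len(coeffs)`.

```python
import hashlib
import random


def keystream(key: bytes, n: int) -> bytes:
    """Illustrative keystream: SHA-256 over key || counter (not a vetted cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def subband_selection(n_coeffs, n_low):
    """Select the first n_low coefficients, assumed to be the lowest-frequency subband."""
    return list(range(min(n_low, n_coeffs)))


def random_selection(n_coeffs, n_pick, seed):
    """Select n_pick coefficient positions pseudo-randomly from a seed shared by both ends."""
    return sorted(random.Random(seed).sample(range(n_coeffs), n_pick))


def encrypt_selected(coeffs, key, selected):
    """XOR the low 8 bits of the selected quantized coefficients with the keystream.

    XOR is its own inverse, so calling this again with the same key and selection
    decrypts. Unselected coefficients pass through unchanged to the entropy coder.
    """
    ks = keystream(key, len(selected))
    out = list(coeffs)
    for k, idx in zip(ks, selected):
        out[idx] = coeffs[idx] ^ k
    return out
```

Because the cost scales with the number of selected coefficients rather than with the image size, shrinking or growing the selection is what lets the amount of data hiding be traded against the computation available on the device.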

An Iterative, Interactive and Unified Seismic Velocity Analysis (반복적 대화식 통합 탄성파 속도분석)

  • Suh Sayng-Yong;Chung Bu-Heung;Jang Seong-Hyung
    • Geophysics and Geophysical Exploration, v.2 no.1, pp.26-32, 1999
  • Among the various seismic data processing sequences, velocity analysis is the most time-consuming and man-hour-intensive processing step. Production seismic data processing requires a good velocity analysis tool as well as a high-performance computer, and the tool must give fast and accurate velocity analysis. There are two different approaches to velocity analysis: batch and interactive. In batch processing, a velocity plot is made at every analysis point. Generally, the plot consists of a semblance contour, a super gather, and a stack panel. The interpreter chooses the velocity function by analyzing the velocity plot. The technique is highly dependent on the interpreter's skill and requires human effort. As high-speed graphic workstations have become more popular, various interactive velocity analysis programs have been developed. Although these programs enable faster picking of velocity nodes with a mouse, their main improvement is simply the replacement of the paper plot by the graphic screen. The velocity spectrum is highly sensitive to noise, especially the coherent noise often found in the shallow region of marine seismic data. For accurate velocity analysis, this noise must be removed before the spectrum is computed. The velocity analysis must also be carried out by carefully choosing the locations of the analysis points and computing the spectrum accurately. The analyzed velocity function must be verified by mute and stack, and in most cases the sequence must be repeated. An iterative, interactive, and unified velocity analysis tool is therefore highly needed. An interactive velocity analysis program, xva (X-Window based Velocity Analysis), was developed. The program handles all processes required in velocity analysis, such as composing the super gather, computing the velocity spectrum, NMO correction, mute, and stack.
Most parameter changes produce the final stack in a few mouse clicks, enabling iterative and interactive processing. A simple trace indexing scheme is introduced, along with a program that makes the index of a Geobit seismic disk file. The index is used to reference the original input, i.e., the CDP sort, directly. A transformation technique for the mute function between the T-X domain and the NMOC domain is introduced and adopted in the program. The result of the transform is similar to the remove-NMO technique in suppressing shallow noise such as the direct wave and refracted waves, but it has two improvements: no interpolation error and much faster computation. With this technique, mute times can easily be designed in the NMOC domain and applied to the super gather in the T-X domain, producing a more accurate velocity spectrum interactively. The xva program consists of 28 files, 12,029 lines, 34,990 words, and 304,073 characters. The program references the Geobit utility libraries and can be installed in an environment where Geobit is preinstalled. The program runs in an X-Window/Motif environment, and its menu is designed according to the Motif style guide. Brief usage of the program is also described. The program allows fast and accurate seismic velocity analysis, which is necessary for computing AVO (Amplitude Versus Offset) based DHIs (Direct Hydrocarbon Indicators) and for making high-quality seismic sections.

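The core computations behind one column of a velocity spectrum, and a hyperbolic mapping of mute times between the NMOC and T-X domains, can be sketched as follows. This is a textbook illustration in Python, not xva's code: traces are assumed to be lists sampled every `dt` seconds, NMO uses the hyperbola t = sqrt(t0^2 + (x/v)^2) with linear interpolation, and semblance is the ratio of stacked energy to total energy in a window around t0. The mute mapping assumes the transform is simply this hyperbolic time mapping evaluated with the picked velocity; the abstract does not give xva's formula, and all function names are hypothetical.

```python
import math


def nmo_time(t0, x, v):
    """Hyperbolic moveout: two-way time at offset x for zero-offset time t0 and velocity v."""
    return math.sqrt(t0 * t0 + (x / v) ** 2)


def sample(trace, t, dt):
    """Linearly interpolate a trace at time t; zero outside the recorded range."""
    pos = t / dt
    k = int(pos)
    if k < 0 or k + 1 >= len(trace):
        return 0.0
    frac = pos - k
    return (1.0 - frac) * trace[k] + frac * trace[k + 1]


def semblance(gather, offsets, dt, t0, v, half_window):
    """Semblance of the NMO-corrected gather in a window of +-half_window samples around t0."""
    num = den = 0.0
    for j in range(-half_window, half_window + 1):
        tj = t0 + j * dt
        if tj < 0.0:
            continue
        amps = [sample(tr, nmo_time(tj, x, v), dt) for tr, x in zip(gather, offsets)]
        num += sum(amps) ** 2
        den += sum(a * a for a in amps)
    return num / (len(gather) * den) if den > 0.0 else 0.0


def velocity_scan(gather, offsets, dt, t0, velocities, half_window=12):
    """Trial velocity with maximum semblance at t0 (one column of a velocity spectrum)."""
    return max(velocities, key=lambda v: semblance(gather, offsets, dt, t0, v, half_window))


def mute_nmoc_to_tx(t0_mute, x, v):
    """Map a mute time designed in the NMO-corrected domain to offset x in the T-X domain."""
    return nmo_time(t0_mute, x, v)


def mute_tx_to_nmoc(t_mute, x, v):
    """Inverse mapping; times earlier than x / v have no real zero-offset time and map to 0."""
    arg = t_mute * t_mute - (x / v) ** 2
    return math.sqrt(arg) if arg > 0.0 else 0.0
```

On a synthetic gather with one Gaussian event at t0 = 0.8 s and 2000 m/s, recorded at offsets 0 to 800 m, scanning 1500 to 2500 m/s in 100 m/s steps picks 2000 m/s. Because the mute mapping evaluates a closed-form formula per offset rather than resampling traces, it involves no interpolation, which matches the advantage the abstract claims over remove-NMO.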