• Title/Summary/Keyword: Parallel Image Processing

Implementation of handwritten digit recognition CNN structure using GPGPU and Combined Layer (GPGPU와 Combined Layer를 이용한 필기체 숫자인식 CNN구조 구현)

  • Lee, Sangil;Nam, Kihun;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology / v.3 no.4 / pp.165-169 / 2017
  • CNN (Convolutional Neural Network) is one of the machine learning algorithms that show superior performance in image recognition and classification. A CNN is structurally simple, but it requires a large amount of computation and therefore takes a long time. In this paper, we parallelized the convolution layer, pooling layer, and fully connected layer, which consume most of the processing time in a CNN, using the SIMT (Single Instruction Multiple Thread) structure of GPGPU (General-Purpose computing on Graphics Processing Units). We also expect to improve performance by reducing the number of memory accesses: the output of the convolution layer is fed directly into the pooling layer instead of being stored first. We verify this experiment on the MNIST dataset and confirm that the proposed CNN structure performs 12.38% better than the existing structure.
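
The combined-layer idea, consuming each convolution output inside the pooling step instead of writing the whole feature map to memory, can be sketched on the CPU as follows. This is a hedged, single-channel illustration assuming a valid 2D convolution (cross-correlation, as CNN frameworks compute it) followed by non-overlapping 2×2 max pooling; the function names are hypothetical and this is not the authors' GPGPU kernel.

```python
def conv_at(image, kernel, r, c):
    """Valid 2D cross-correlation (what CNN 'convolution' layers compute) at (r, c)."""
    kh, kw = len(kernel), len(kernel[0])
    return sum(image[r + i][c + j] * kernel[i][j]
               for i in range(kh) for j in range(kw))


def conv_then_pool(image, kernel, p=2):
    """Baseline: store the full feature map, then apply p x p max pooling."""
    oh = len(image) - len(kernel) + 1
    ow = len(image[0]) - len(kernel[0]) + 1
    fmap = [[conv_at(image, kernel, r, c) for c in range(ow)] for r in range(oh)]
    return [[max(fmap[pr * p + i][pc * p + j] for i in range(p) for j in range(p))
             for pc in range(ow // p)]
            for pr in range(oh // p)]


def fused_conv_pool(image, kernel, p=2):
    """Combined layer: each pooled output computes its p x p convolution values
    on the fly and keeps only the maximum, so no feature map is written out."""
    oh = len(image) - len(kernel) + 1
    ow = len(image[0]) - len(kernel[0]) + 1
    out = []
    for pr in range(oh // p):          # on a GPU, each (pr, pc) could map to one thread
        row = []
        for pc in range(ow // p):
            row.append(max(conv_at(image, kernel, pr * p + i, pc * p + j)
                           for i in range(p) for j in range(p)))
        out.append(row)
    return out


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(fused_conv_pool(img, [[1]]))  # 1x1 identity kernel -> plain 2x2 max pooling: [[6, 8], [14, 16]]
```

Both functions return the same result; the fused version simply never materializes `fmap`, which is where the saving in memory accesses would come from.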

High Performance Coprocessor Architecture for Real-Time Dense Disparity Map (실시간 Dense Disparity Map 추출을 위한 고성능 가속기 구조 설계)

  • Kim, Cheong-Ghil;Srini, Vason P.;Kim, Shin-Dug
    • The KIPS Transactions:PartA / v.14A no.5 / pp.301-308 / 2007
  • This paper proposes a high-performance coprocessor architecture for real-time dense disparity computation based on a phase-based binocular stereo matching technique called local weighted phase correlation (LWPC). The algorithm, which consists of four stages, combines the robustness of wavelet-based phase-difference methods with the basic control strategy of phase-correlation methods. For parallel and efficient hardware implementation, the proposed architecture employs a SIMD (Single Instruction Multiple Data) architecture for each functional stage, and all stages operate in pipelined mode. The newly devised pipelined linear array processor is optimized for row-column image processing, eliminating the need for transpose memory while preserving generality and high throughput. The proposed architecture is implemented with the Xilinx HDL tool, and the required hardware resources are calculated in terms of look-up tables, flip-flops, slices, and memory. The results show that the proposed architecture can be integrated into a single chip while maintaining video-rate processing speed.
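
The phase-correlation principle that LWPC builds on can be shown in one dimension: normalizing the cross-power spectrum of two rows leaves only phase differences, and its inverse transform peaks at the shift between them. This is a hedged sketch of plain global phase correlation, not LWPC's wavelet-based, locally weighted version; it assumes circular shifts and uses a naive pure-Python DFT for clarity, and all names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (kept dependency-free for clarity)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def phase_correlation_shift(left_row, right_row, eps=1e-12):
    """Return the circular shift d for which right_row[k] ~= left_row[k - d].

    The cross-power spectrum is normalized to unit magnitude so only phase
    differences remain; its inverse transform then peaks at the shift."""
    cross = []
    for a, b in zip(dft(left_row), dft(right_row)):
        c = b * a.conjugate()
        cross.append(c / (abs(c) + eps))
    corr = [v.real for v in idft(cross)]
    return corr.index(max(corr))


left = [1, 5, 2, 8, 3, 0, 4, 7]
right = [left[(k - 3) % len(left)] for k in range(len(left))]  # circular shift by 3
print(phase_correlation_shift(left, right))  # 3
```

Because every frequency contributes with equal weight after normalization, the peak is sharp and insensitive to overall brightness differences between the two views, which is the robustness phase-based stereo methods aim for.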

Development of rotational pulse-echo ultrasonic propagation imaging system capable of inspecting cylindrical specimens

  • Ahmed, Hasan;Lee, Young-Jun;Lee, Jung-Ryul
    • Smart Structures and Systems / v.26 no.5 / pp.657-666 / 2020
  • A rotational pulse-echo ultrasonic propagation imager that can inspect cylindrical specimens for nondestructive material evaluation is proposed herein. In this system, a laser-generated ultrasonic bulk wave is used for inspection, which enables clear visualization of subsurface defects with precise reproduction of the damage shape and size. The ultrasonic waves are generated by a Q-switched laser that impinges on the outer surface of the specimen wall. The generated waves travel through the wall, and their echo is detected by a Laser Doppler Vibrometer (LDV) at the same point. To obtain the optimal Signal-to-Noise Ratio (SNR) of the measured signal, the LDV requires the sensed surface to be perpendicular to the laser beam and at a predefined constant standoff distance from the laser head. For flat specimens, these constraints are easily satisfied by performing a raster scan with a dual-axis linear stage. However, this arrangement cannot be used for cylindrical specimens because of their curvature. To inspect cylindrical specimens, a circular scan technique is newly proposed for pulse-echo laser ultrasound: a rotational stage is coupled with a single-axis linear stage to inspect the desired area of the specimen. This arrangement maintains the standoff distance and beam incidence angle while the cylindrical specimen is being inspected, which enables inspection of a curved specimen at the optimal SNR. The measurement result is displayed in parallel with the ongoing inspection. The inspection data are mapped from rotational coordinates to linear coordinates for visualization and post-processing of the results. A graphical user interface, implemented in C++ using the Qt framework, controls all the individual blocks of the system and implements the necessary image processing, scan calculations, data acquisition, signal processing, and result visualization.
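
The mapping from rotational to linear coordinates amounts to unrolling the cylinder: a point at angle θ on a surface of radius R lands at arc length x = Rθ. The abstract does not give the exact formula used, so this is a hedged sketch assuming samples are recorded as (angle in degrees, axial stage position, amplitude) on the outer surface of radius R; the function names are hypothetical.

```python
import math


def unwrap_scan(samples, radius_mm):
    """Map rotational-scan samples (theta_deg, z_mm, value) onto flat
    (x_mm, z_mm, value) coordinates by unrolling the cylinder surface:
    x is the arc length along the circumference, x = R * theta (radians)."""
    return [(radius_mm * math.radians(theta_deg), z_mm, value)
            for theta_deg, z_mm, value in samples]


def column_pitch_mm(radius_mm, step_deg):
    """Spacing between adjacent columns of the unrolled image for a fixed angular step."""
    return radius_mm * math.radians(step_deg)


flat = unwrap_scan([(0.0, 1.0, 0.5), (90.0, 1.0, 0.8)], radius_mm=20.0)
print(flat)
print(column_pitch_mm(50.0, 0.5))  # spatial step between columns for a 0.5 degree rotation
```

Keeping the angular step constant gives evenly spaced columns in the unrolled image, so the result can be displayed and post-processed like a raster scan of a flat plate.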

A Fast SAD Algorithm for Area-based Stereo Matching Methods (영역기반 스테레오 영상 정합을 위한 고속 SAD 알고리즘)

  • Lee, Woo-Young;Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications / v.7 no.2 / pp.8-12 / 2012
  • Area-based stereo matching algorithms are widely used for image analysis in stereo vision. The SAD (Sum of Absolute Differences) algorithm is one of the well-known area-based stereo matching algorithms and is a data-intensive computing application. Therefore, it requires very high computational capability, and its processing speed becomes very slow in a software realization. This paper proposes a fast SAD algorithm utilizing SSE (Streaming SIMD Extensions) instructions based on SIMD (Single Instruction Multiple Data) parallelism. A CPU supporting SSE instructions has 16 128-bit XMM registers. For performance evaluation of the proposed scheme, we compare the processing speed of SAD with and without SSE instructions. The proposed scheme achieves a fourfold performance improvement over the general SAD, which shows the feasibility of a real-time software SAD algorithm.
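
A scalar version of SAD block matching makes clear which loop SIMD accelerates. This is a hedged sketch with a square window and winner-take-all disparity selection; the window size, names, and test images are assumptions, not the paper's setup.

```python
def sad(left, right, y, x, d, half):
    """Sum of absolute differences between the (2*half+1)^2 window centred at
    (y, x) in the left image and the window at (y, x - d) in the right image."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):  # this inner loop is what SIMD vectorizes
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total


def best_disparity(left, right, y, x, max_disp, half=1):
    """Winner-take-all: the disparity in [0, max_disp] with the smallest SAD.
    The caller must keep both windows inside the image (x - half - max_disp >= 0)."""
    costs = [sad(left, right, y, x, d, half) for d in range(max_disp + 1)]
    return costs.index(min(costs))


left = [[r * 7 + c * c for c in range(12)] for r in range(5)]
# Synthetic right view: every row of the left view shifted two pixels to the left.
right = [[left[r][c + 2] if c + 2 < 12 else 0 for c in range(12)] for r in range(5)]
print(best_disparity(left, right, y=2, x=6, max_disp=3))  # 2
```

With 8-bit pixels, one common way to vectorize the inner loop is SSE2's `PSADBW` instruction (intrinsic `_mm_sad_epu8`), which takes absolute differences of 16 unsigned bytes and sums them in two groups of eight; the abstract does not state which SSE instructions the authors used.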

Fine-scalable SPIHT Hardware Design for Frame Memory Compression in Video Codec

  • Kim, Sunwoong;Jang, Ji Hun;Lee, Hyuk-Jae;Rhee, Chae Eun
    • JSTS:Journal of Semiconductor Technology and Science / v.17 no.3 / pp.446-457 / 2017
  • To reduce frame memory size or bus bandwidth, frame memory compression (FMC) recompresses the reconstructed or reference frames of video codecs. This paper proposes a novel FMC design based on the discrete wavelet transform (DWT) and set partitioning in hierarchical trees (SPIHT), which supports fine-scalable throughput and is area-efficient. In the proposed design, multiple cores with small block sizes are used in parallel instead of a single core with a large block size, and an appropriate pipelining schedule is proposed. Compared to the previous design, the proposed design achieves a processing speed closer to the target system speed and is therefore more efficient in hardware utilization. In addition, a scheme called the merged refinement pass (MRP), in which two passes of SPIHT are merged into one, is proposed. Because the number of shifters decreases and the bit width of the remaining shifters is reduced, the size of the SPIHT hardware decreases significantly. The proposed FMC encoder and decoder designs achieve throughputs of 4,448 and 4,000 Mpixels/s, respectively, with gate counts of 76.5K and 107.8K. When applied to High Efficiency Video Coding (HEVC), the proposed design achieves a 1.96% lower average BDBR and a 0.05 dB higher average BDPSNR than the previous FMC design.
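
SPIHT codes wavelet coefficients bit-plane by bit-plane, starting from the highest plane in which some coefficient is significant. The abstract does not name the wavelet filter, so this hedged sketch substitutes a one-level integer Haar transform computed by lifting (lossless, and shift-and-add only, which suits hardware), plus a helper that finds the starting bit-plane; all names are hypothetical.

```python
def haar_forward(x):
    """One level of the integer (lifting) Haar transform: returns (lowpass, highpass).
    Requires an even-length list of ints."""
    evens, odds = x[0::2], x[1::2]
    high = [o - e for e, o in zip(evens, odds)]          # predict step
    low = [e + (h >> 1) for e, h in zip(evens, high)]    # update step; >> 1 is floor(h / 2)
    return low, high


def haar_inverse(low, high):
    """Exact inverse of haar_forward (integer arithmetic, no rounding loss)."""
    out = []
    for s, h in zip(low, high):
        e = s - (h >> 1)
        out.extend([e, e + h])
    return out


def top_bitplane(coeffs):
    """Highest n such that some |c| >= 2**n: the plane where SPIHT-style coding starts."""
    m = max(abs(c) for c in coeffs)
    return m.bit_length() - 1 if m else -1


row = [10, 12, 9, 7, 100, 98, 3, 4]
low, high = haar_forward(row)
print(low, high)                    # [11, 8, 99, 3] [2, -2, -2, 1]
print(top_bitplane(low + high))     # 6, since 64 <= 99 < 128
```

Smooth image regions produce small highpass coefficients, so most of them stay insignificant until the low bit-planes; that energy compaction is what lets SPIHT reach a target compression ratio with a fixed bit budget.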

Parallel $XY{\theta}$ Table Design and Implementation for Precision Positioning (고정밀 위치 제어용 병렬 $XY{\theta}$ 테이블 설계 및 구현)

  • Han, Joo-Hun;Oh, Choon-Suk;Ryu, Young-Kee
    • Journal of the Korean Institute of Telematics and Electronics S / v.36S no.7 / pp.62-70 / 1999
  • Precision positioning requires a working area within $5\,\mathrm{mm} \times 5\,\mathrm{mm}$ and a positioning error within ${\pm}4\,\mu\mathrm{m}$. A general three-layered table has a working range from several centimeters to a few tens of centimeters, so compared with a precision positioning table it has the disadvantages of a larger working range and coarser accuracy. In this paper, we design and implement a parallel $XY{\theta}$ table with three linear actuators, one in the horizontal direction and the other two in the vertical direction, which provide the $X$, $Y$, and $\theta$ degrees of freedom. Finally, experimental results of precision positioning are shown using new image processing algorithms with two CCD cameras.

Evaluation of MR-SENSE Reconstruction by Filtering Effect and Spatial Resolution of the Sensitivity Map for the Simulation-Based Linear Coil Array (선형적 위상배열 코일구조의 시뮬레이션을 통한 민감도지도의 공간 해상도 및 필터링 변화에 따른 MR-SENSE 영상재구성 평가)

  • Lee, D.H.;Hong, C.P.;Han, B.S.;Kim, H.J.;Suh, J.J.;Kim, S.H.;Lee, C.H.;Lee, M.W.
    • Journal of Biomedical Engineering Research / v.32 no.3 / pp.245-250 / 2011
  • Parallel imaging techniques can provide several advantages for a multitude of MRI applications. In the SENSE technique in particular, sensitivity maps are always required to determine the reconstruction matrix; therefore, a number of different approaches using coil sensitivity information have been demonstrated to improve image quality. Moreover, many filtering methods, such as the adaptive matched filter and nonlinear diffusion, have been proposed to optimize the suppression of background noise and improve image quality. In this study, we performed SENSE reconstruction using computer simulations to determine the most suitable method, examining the effect of filtering and of the order of the polynomial fit applied as the spatial resolution of the sensitivity map varies. The image was obtained on a 0.32T MRI system (Magfinder II, Genpia, Korea) using a spin-echo pulse sequence (TR/TE = 500/20 ms, FOV = 300 mm, matrix = $128{\times}128$, thickness = 8 mm). For the simulation, the obtained image was multiplied by four linear-array coil sensitivities modeled as 2D Gaussian distributions, and complex white Gaussian noise was added. Two processing methods, polynomial fitting and filtering, were applied according to the spatial resolution of the sensitivity map, and each coil image was subsampled according to a reduction factor (r-factor) of 2 or 4. The results were compared using the mean geometry factor (g-factor) and artifact power (AP) for r-factors of 2 and 4. As the spatial resolution of the sensitivity map and the r-factor changed, the polynomial fit methods gave better results than the general filtering methods. Although our study is limited to computer simulation rather than experiment, and to a linear coil geometry, our method may be useful for determining the optimal sensitivity map in a linear coil array.
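
The role of the sensitivity map in SENSE is easiest to see at a single aliased pixel. With r-factor 2, each coil c records a_c = S_c(p1)ρ1 + S_c(p2)ρ2, and the two true pixel values are recovered by least squares, ρ = (SᴴS)⁻¹Sᴴa; the g-factor g_i = sqrt([(SᴴS)⁻¹]_ii [SᴴS]_ii) measures noise amplification. This is a hedged sketch assuming identity noise covariance; the sensitivity values are made up (not the paper's Gaussian maps), r-factor 4 would need a general 4-unknown solve, and the function name is hypothetical.

```python
import math


def sense_unfold_r2(coil_values, sens):
    """Unfold one aliased pixel for reduction factor R = 2.

    coil_values: aliased complex value a_c seen by each coil c.
    sens: per coil, [S_c(p1), S_c(p2)], the sensitivities at the two true pixels
          that fold onto this aliased pixel.
    Solves a = S rho by least squares, rho = (S^H S)^-1 S^H a, assuming
    uncorrelated, equal-variance coil noise. Returns (rho, g)."""
    # A = S^H S (2x2 Hermitian) and b = S^H a
    A = [[sum(s[i].conjugate() * s[j] for s in sens) for j in range(2)] for i in range(2)]
    b = [sum(s[i].conjugate() * a for s, a in zip(sens, coil_values)) for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv = [[A[1][1] / det, -A[0][1] / det],
           [-A[1][0] / det, A[0][0] / det]]
    rho = [inv[i][0] * b[0] + inv[i][1] * b[1] for i in range(2)]
    # g_i = sqrt([(S^H S)^-1]_ii * [S^H S]_ii); g = 1 means no noise amplification
    g = [math.sqrt((inv[i][i] * A[i][i]).real) for i in range(2)]
    return rho, g


sens = [[1.0, 0.2], [0.8, 0.4], [0.4, 0.8], [0.2, 1.0]]   # 4 coils, illustrative values
truth = [3 + 1j, -2 + 0.5j]
aliased = [s[0] * truth[0] + s[1] * truth[1] for s in sens]
rho, g = sense_unfold_r2(aliased, sens)
print(rho, g)
```

Errors or noise in the sensitivity map enter this solve directly, which is why the smoothing applied to the map (polynomial fitting versus filtering) changes both the g-factor and the residual artifacts.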

An Implementation of 3D Graphic Accelerator for Phong Shading (퐁 음영법을 위한 3차원 그래픽 가속기의 구현)

  • Lee, Hyung;Park, Youn-Ok;Park, Jong-Won
    • Journal of Korea Multimedia Society / v.3 no.5 / pp.526-534 / 2000
  • There has been much research on high-speed 3D graphic accelerators, driven by the needs of CAD/CAM, 3D modeling, virtual reality, and medical imaging. In this paper, an SIMD processor architecture for a 3D graphic accelerator is proposed to reduce the processing time of 3D graphics, and a parallel Phong shading algorithm is presented to evaluate the performance of the proposed architecture. The proposed architecture consists of a PCI local bus interface, 16 Processing Elements (PEs), and Park's multi-access memory system (MAMS), which has 17 memory modules. A serial Phong shading algorithm is modified for the architecture; the key idea is to divide a polygon into $4\times{4}$ squares. To process a square, 4 PEs are logically treated as one PE Group. Since MAMS supports the block access type with interval 1, 4 PE Groups can process a square at a time; consequently, 16 pixels are processed simultaneously. The proposed SIMD processor architecture is simulated with CADENCE Verilog-XL, a hardware simulation package. The parallel algorithm produces the same results as the serial algorithm, with a speedup of 5.68 over the serial one.
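
Phong shading interpolates the surface normal per pixel, renormalizes it, and evaluates the reflection model at every pixel, so each pixel of a 4×4 square is independent work that maps naturally onto 16 PEs. This hedged sketch uses the standard Phong reflection model with assumed coefficients, and, as a simplification, interpolates bilinearly from four corner normals rather than along polygon edges and scanlines as a full rasterizer would; all names are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong_intensity(n, light, view, ka=0.1, kd=0.7, ks=0.2, shininess=16):
    """Phong reflection at one pixel; n, light, view are unit vectors pointing away from the surface."""
    ndl = dot(n, light)
    diffuse = max(0.0, ndl)
    r = tuple(2 * ndl * nc - lc for nc, lc in zip(n, light))   # reflect L about N
    specular = max(0.0, dot(r, view)) ** shininess if ndl > 0 else 0.0
    return ka + kd * diffuse + ks * specular


def shade_block(n00, n03, n30, n33, light, view, size=4):
    """Shade a size x size block from its corner normals (top-left, top-right,
    bottom-left, bottom-right). Each pixel's normal is interpolated and then
    renormalized, the step that distinguishes Phong from Gouraud shading.
    Every pixel is independent, so the 16 evaluations could run on 16 PEs at once."""
    light, view = normalize(light), normalize(view)
    block = []
    for i in range(size):
        v = i / (size - 1)
        row = []
        for j in range(size):
            u = j / (size - 1)
            n = tuple((1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
                      for a, b, c, d in zip(n00, n03, n30, n33))
            row.append(phong_intensity(normalize(n), light, view))
        block.append(row)
    return block


tilted = shade_block((0, 0, 1), (0.5, 0, 1), (0, 0.5, 1), (0.5, 0.5, 1),
                     light=(0, 0, 1), view=(0, 0, 1))
print([round(x, 3) for x in tilted[0]])
```

On the SIMD architecture described above, the outer two loops would disappear: each PE would evaluate `phong_intensity` for its own pixel, with MAMS supplying the 16 pixels of a square in one block access.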

Double-Gauss Optical System Design with Fixed Magnification and Image Surface Independent of Object Distance (물체거리가 변하여도 배율과 상면이 고정되는 이중 가우스 광학계의 설계)

  • Ryu, Jae Myung;Ryu, Chang Ho;Kim, Kang Min;Kim, Byoung Young;Ju, Yun Jae;Jo, Jae Heung
    • Korean Journal of Optics and Photonics / v.29 no.1 / pp.19-27 / 2018
  • A change in object distance generally changes the magnification of an optical system. In this paper, we propose and design a double-Gauss optical system whose magnification and image surface remain fixed regardless of changes in object distance, achieved by moving the lens groups in front of and behind the stop slightly and independently, parallel to the optical axis. By maintaining a constant image size despite changes in object distance in a projection system such as a head-up display (HUD) or head-mounted display (HMD), we can prevent the field of view from changing during focusing. Also, to precisely check the state of the wiring that connects semiconductor chips and IC circuit boards, we can keep the magnification of the optical system constant even when the object distance changes due to vertical movement of a testing device along the optical axis. Additionally, if this double-Gauss optical system is used as a vision system for testing large numbers of electronic boards in a manufacturing system, the additional image processing otherwise needed to enhance image quality can be systematically eliminated, dramatically reducing test time. The Gaussian bracket method was used to find the moving distance of each group so as to achieve the desired specifications and fix the magnification and image surface simultaneously. After the initial design, the optical system was optimized using Synopsys optical design software.

Design and Implementation of an Efficient Web Services Data Processing Using Hadoop-Based Big Data Processing Technique (하둡 기반 빅 데이터 기법을 이용한 웹 서비스 데이터 처리 설계 및 구현)

  • Kim, Hyun-Joo
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.1 / pp.726-734 / 2015
  • Relational databases, which store data in structured form, are currently the most widely used means of data management. However, in relational databases, service becomes slower as the amount of data increases because of constraints on the read and write operations used to save or query data. Furthermore, when a new task is added, the database grows and consequently requires additional infrastructure, such as parallel hardware configuration, CPU, memory, and network, to support smooth operation. In this paper, to improve web information services that are slowing down due to the growth of data in relational databases, we implemented a model that extracts large amounts of data quickly and safely for users by sending data to the Hadoop Distributed File System (HDFS), unifying and reconstructing it, and then processing the HDFS files. We implemented our model in a web-based civil affairs system that stores image files, i.e., irregular (unstructured) data. Our proposed system's data processing was found to be 0.4 s faster than that of a relational database system. Thus, we found that a Hadoop-based big data processing technique can support web information services that must process large amounts of data, as conventional relational databases do. Furthermore, since Hadoop is open source, our model has the advantage of reducing software costs. The proposed system is expected to serve as a model for web services that provide fast information processing to organizations that require efficient big data processing because of the growing size of their conventional relational databases.