• Title/Summary/Keyword: Stream processor

An Implementation of a Memory Operation System Architecture for Memory Latency Penalty Reduction in SIMT Based Stream Processor (Memory Latency Penalty를 개선한 SIMT 기반 Stream Processor의 Memory Operation System Architecture 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE / v.18 no.3 / pp.392-397 / 2014
  • In this paper, we propose a memory operation system architecture that reduces the memory latency penalty in a SIMT-based stream processor. The proposed architecture applies a non-blocking cache to reduce the cache miss penalty incurred by a blocking cache. We verified that the proposed memory operation architecture improves the performance of the stream processor by comparing the processing performance of various algorithms, and we measured how the improvement varies with the ratio of memory instructions in each algorithm. As a result, we confirmed that the performance of the stream processor improves by a minimum of 8.2% and a maximum of 46.5%.

A Design of a High Performance Stream Processor without Superscalar Architecture (슈퍼스칼라 구조를 갖지 않는 고성능 Stream Processor 설계)

  • Lee, Kwan-Ho; Kim, Chi-Yong
    • Journal of IKEEE / v.21 no.1 / pp.77-80 / 2017
  • In this paper, we propose a way to improve the performance of a GP-GPU by removing superscalar issue from its original design. First, we simplified the structure of the stream processor in order to eliminate superscalar issue. Under this condition, the hardware size was preserved and the number of threads was increased, which improved the GP-GPU's performance. Because the number of threads grew, we propose a new warp scheduler model that adjusts the grouping of threads. This warp scheduler, which has no superscalar issue, dispatches instructions to the warp activated by round-robin scheduling. Performance was compared using Gaussian filtering, and the results indicate that the newly designed GP-GPU performs 7.89 times better than the original.
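
The round-robin, single-issue dispatch described in this abstract can be illustrated with a minimal sketch. It is not the authors' scheduler: the function name `round_robin_schedule`, the warp ids, and the instruction mnemonics are all hypothetical, and the model assumes every warp is always ready (no stalls or thread regrouping).

```python
from collections import deque


def round_robin_schedule(warp_programs):
    """Return the issue order as (warp_id, instruction) pairs.

    warp_programs maps a hypothetical warp id to its list of pending
    instructions. Exactly one instruction issues per step, which models
    a single-issue (non-superscalar) pipeline.
    """
    # Queue of warps that still have instructions, in round-robin order.
    ready = deque(
        (wid, deque(instrs)) for wid, instrs in warp_programs.items() if instrs
    )
    trace = []
    while ready:
        wid, instrs = ready.popleft()          # next warp in rotation
        trace.append((wid, instrs.popleft()))  # issue one instruction
        if instrs:
            ready.append((wid, instrs))        # back of the queue
    return trace


if __name__ == "__main__":
    programs = {0: ["ld", "mul", "st"], 1: ["ld", "add"], 2: ["mul"]}
    for warp_id, instruction in round_robin_schedule(programs):
        print(warp_id, instruction)
```

Each warp gets one issue slot per rotation, so a warp that finishes early simply drops out of the queue and the remaining warps share the slots.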

A Fast SIFT Implementation Based on Integer Gaussian and Reconfigurable Processor

  • Su, Le Tran; Lee, Jong Soo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.2 no.3 / pp.39-52 / 2009
  • Scale Invariant Feature Transform (SIFT) is an effective algorithm for object recognition, panorama stitching, and image matching. However, due to its complexity, real-time processing is difficult to achieve with software approaches. This paper proposes using a reconfigurable hardware processor with an integer half-kernel Gaussian. The integer half-kernel Gaussian reduces the complexity of the Gaussian pyramid by about half, and the reconfigurable processor carries out a parallel implementation of a full-search Fast SIFT algorithm. We use a low-memory, fine-grain single instruction stream, multiple data stream (SIMD) pixel processor that is currently being developed. This implementation fully exposes the available parallelism of the SIFT algorithm and exploits the processing and I/O capabilities of the processor, which results in a system that can perform real-time image and video compression. We apply this novel implementation to images and measure its effectiveness. Experimental simulation results indicate that the proposed implementation is capable of real-time applications.

The Design of a Multiplexer for Multiview Image Processing

  • Kim, Do-Kyun; Lee, Yong-Joo; Koo, Gun-Seo; Lee, Yong-Surk
    • Proceedings of the IEEK Conference / 2002.07a / pp.682-685 / 2002
  • In this paper, we define the necessary operations and functional blocks of a multiplexer for 3-D video systems and present our multiplexer design. We adopted ITU-T Recommendation H.222.0 to define the operations and functions of the multiplexer, and we explain the data structures and details of the design for multiview image processing. The TS (Transport Stream) and PES (Packetized Elementary Stream) data structures in ITU-T Recommendation H.222.0 do not fit our multiview image processing system, because the recommendation covers a wide scope of transmission of non-telephone signals. Therefore, we modified the TS and PES structures: the TS is modified to DSS (3D System Stream) and the PES is modified to SPDU (DSS Program Data Unit). We constructed the multiplexer from the modified DSS and SPDU. There are nine multiview image channels, and the image class employed is MPEG-2 SD (Standard Definition) level, which requires a bandwidth of 2 to 6 Mbps. The required clock speed, which is the outer interface clock speed, should be faster than 54 MHz (= 6 × 9). The inside of the multiplexer requires a clock speed of only 1/8 of 54 MHz, since it operates in units of bytes. We used ALTERA Quartus II for the simulation and FPGA verification.

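
The clock figures quoted in the multiplexer abstract above follow from simple arithmetic, sketched here. The function name `mux_clock_requirements` and its parameter names are hypothetical, and the sketch assumes the outer interface is bit-serial (one bit per clock), which is what makes 54 Mbps correspond to 54 MHz.

```python
def mux_clock_requirements(channels=9, max_mbps_per_channel=6, bits_per_unit=8):
    """Return (outer_mhz, inner_mhz) lower bounds for the multiplexer clocks.

    Assumption: the outer interface moves one bit per clock cycle, so the
    aggregate bit rate in Mbps equals the minimum clock in MHz. The inner
    datapath moves bits_per_unit bits (one byte) per clock.
    """
    outer_mhz = channels * max_mbps_per_channel  # 9 channels x 6 Mbps each
    inner_mhz = outer_mhz / bits_per_unit        # byte-wide internal datapath
    return outer_mhz, inner_mhz


if __name__ == "__main__":
    outer, inner = mux_clock_requirements()
    print(f"outer interface: > {outer} MHz, inner datapath: > {inner} MHz")
```

With the abstract's values (nine channels at up to 6 Mbps), the bounds are 54 MHz for the outer interface and 6.75 MHz for the byte-wide interior.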

DVB-T PSI(Program Specific Information) Parser using Design of Ali M3330 MPEG-2 decoder processor (ALi M3330 MPEG-2 디코더 프로세서를 이용한 DVB-T PSI(Program Specific Information) 해석기 설계)

  • Jun, Do-Young; Kim, Min-Sung; Kim, Su-Hyun; You, Hong-Yean; Hong, Sung-Hoon
    • Proceedings of the KIEE Conference / 2007.04a / pp.278-280 / 2007
  • In this paper, we design a Program Specific Information (PSI) parser and its On-Screen Display (OSD) on the middleware of the ALi M3330 MPEG-2 decoder processor to analyze DVB-T Transport Stream (TS) information. To test the functional operation of the designed parser, we implement a DVB-T test board, including the RF tuner, using the ALi M3330 MPEG-2 decoder processor, and confirm correct operation using an input TS generated by a DVB-T stream generator. The developed PSI parser could be used for test environments, various channel extensions, and the development of a DVB-T reception module.

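
As background to the PSI parser abstract above, the first step of any PSI analysis is reading the 4-byte header of each 188-byte TS packet to find its PID. The sketch below shows that step only; it is not the authors' parser or the ALi M3330 middleware API, and the function name `parse_ts_header` and the example packet are hypothetical. The field layout follows the MPEG-2 Systems transport packet header, in which PID 0x0000 carries the Program Association Table.

```python
TS_PACKET_SIZE = 188  # fixed MPEG-2 transport packet length in bytes
SYNC_BYTE = 0x47      # first byte of every transport packet


def parse_ts_header(packet: bytes) -> dict:
    """Decode selected fields of the 4-byte TS packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    return {
        # Set when a new PSI section or PES packet starts in this payload.
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit packet identifier: low 5 bits of byte 1, all of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # 2-bit field: 1 = payload only, 2 = adaptation only, 3 = both.
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        # 4-bit counter used to detect lost packets on a given PID.
        "continuity_counter": packet[3] & 0x0F,
    }


if __name__ == "__main__":
    # Hypothetical packet: PID 0 (PAT), payload start set, payload only.
    example = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)
    print(parse_ts_header(example))
```

A full parser would go on to reassemble sections from the payloads of the selected PIDs and check their CRC, which this sketch leaves out.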

Development of DVB-T reception module based on Ali M3330 MPEG-2 decoder processor (ALi M3330 MPEG-2 디코더 프로세서 기반의 DVB-T 수신 모듈 개발)

  • Kim, Min-Sung; Jun, Do-Young; Yang, So-Jung; You, Hong-Hyun; Hong, Sung-Hoon
    • Proceedings of the KIEE Conference / 2007.04a / pp.169-171 / 2007
  • This paper presents the development of a DVB-T reception module that includes the RF tuner. To develop the reception module, we design a board using the ALi M3330 MPEG-2 decoder processor and implement its device driver. Simple On-Screen Display (OSD) applications are also designed on the middleware of the ALi M3330 MPEG-2 decoder processor. To evaluate the performance of the reception module, we test its decoding operations using an input TS generated by a DVB-T stream generator and confirm the correctness of its functional operations.

Design and implementation of a media processor for mobile multimedia broadcasting (이동멀티미디어 방송을 위한 미디어 처리기 설계 및 구현)

  • 안상우; 이용주; 최진수; 김진웅
    • Journal of Broadcast Engineering / v.8 no.3 / pp.259-267 / 2003
  • In this paper, we propose a media processor to provide interactive services in mobile multimedia broadcasting environments. The proposed system is designed to support various functions: generation of MPEG-4 IOD (Initial Object Descriptor), OD (Object Descriptor), and BIFS (Binary Format for Scene) data; encapsulation of MPEG-4 AVC (Advanced Video Coding) and BSAC (Bit Sliced Arithmetic Coding) streams and the generated IOD/OD/BIFS data into SL (Sync Layer) packets; packetization of SL packets into TS (Transport Stream) packets; and multiplexing. The proposed media processor can provide MPEG-4 based interactive services for users.

Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms (병렬 알고리즘의 가속화를 위한 GP-GPU의 Thread할당 기법)

  • Lee, Kwan-Ho; Kim, Chi-Yong
    • Journal of IKEEE / v.21 no.1 / pp.92-95 / 2017
  • In this paper, we propose a way to improve the performance of a small-scale GP-GPU. Instead of using a superscalar design, which increases scheduling complexity, we suggest applying a simple core to maximize GP-GPU performance. Our study also shows that a simplified stream processor is one way to improve GP-GPU performance. In addition, we found that developing an optimal thread-assignment method in the warp scheduler for a specific application improves GP-GPU performance. To examine GP-GPU performance, we propose a thread-assignment method tailored to a deep learning system, a kind of neural network. As a result, we found that the performance index for the neural network algorithm increased to 90% and 98% compared with an Intel CPU and a quad-core ARM Cortex-A15, respectively.

Development of a Cell-based Long-term Hydrologic Model Using Geographic Information System(II) - Pre and Post Processor Development - (지리정보시스템을 이용한 장기유출모형의 개발(II) -전.후처리 시스템 개발-)

  • 최진용; 정하우; 김대식
    • Magazine of the Korean Society of Agricultural Engineers / v.39 no.2 / pp.103-112 / 1997
  • CELTHYM (CEll-based Long-term HYdrologic Model), together with a pre-processor and a post-processor that can be integrated with a geographic information system (GIS), was developed to predict the stream flow of a small agricultural watershed. The pre-processor, which interfaces CELTHYM with the GIS, consists of three routines: a watershed boundary extraction routine (WBER), a curve number calculation routine (CNR), and a maximum available soil moisture calculation routine (MASR). The post-processor consists of two routines, a grapher and a map composer, which adapt CELTHYM output to chart making and GIS map making. The developed pre- and post-processors were useful for GIS integration and for spatial comprehension of the CELTHYM output.

Electronic Processor Design for Thermal Imager with Serial/Parallel Scan type (직병렬 주사방식 일정장비의 신호처리기 설계 연구)

  • 송인섭; 유위경; 윤은석; 홍영철; 홍석민
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.1 / pp.49-56 / 1994
  • This paper describes the design principles and methods of an electronic processor for a thermal imager with a SPRITE detector operating in the 8-12 micron band. The thermal imager consists of an optical scanner containing the detector and an electrical signal processor. The optical scanner, which uses a rotating polygon and an oscillating mirror, is a two-dimensional serial/parallel scan type using five detector elements. The electronic processor pre-processes the five channels of thermal signals from the detector and performs digital scan conversion to reform the parallel data stream into serial analog data compatible with conventional RS-170 video. With the designed electronic processor, we acquired a satisfactory thermal image. The MRTD (Minimum Resolvable Temperature Difference) is 0.5°K at 7.5 cycles/mm.