• Title/Summary/Keyword: Computer architecture

Design of a High-Performance Mobile GPGPU with SIMT Architecture based on a Small-size Warp Scheduler (작은 크기의 Warp 스케쥴러 기반 SIMT구조 고성능 모바일 GPGPU 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE / v.25 no.3 / pp.479-484 / 2021
  • This paper proposes and designs a structure that achieves high performance with a small number of cores in a GPGPU with an SIMT architecture. A GPGPU for mobile devices needs a structure that raises performance relative to power consumption. To reduce power consumption, the number of cores was decreased; to improve performance, the warp size managed by the warp scheduler was set to 4, far smaller than the 32 used in a typical GPGPU. Reducing the warp size can reduce the number of idle cycles in the pipelines and makes it possible to hide memory latency efficiently, reducing the miss penalty on cache accesses. The computational performance of the designed GPGPU was measured with a test program that includes floating-point operations, and its power consumption was measured for a 28 nm CMOS process, yielding a performance per power of 104.5 GFLOPS/W. This is about four times better than the performance per power of Nvidia's Tegra K1.
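
The claim that a smaller warp size leaves fewer idle slots can be illustrated with a toy occupancy model. The sketch below is not taken from the paper: the function name `idle_lanes` and the thread count of 37 are hypothetical, and the model only counts lane slots left empty when a group of threads is packed into fixed-size warps, ignoring scheduling, divergence, and memory effects.

```python
import math


def idle_lanes(active_threads: int, warp_size: int) -> int:
    """Lane slots left idle when threads are packed into fixed-size warps.

    Toy model (an assumption, not the paper's design): every warp occupies
    `warp_size` lanes whether or not all of them hold an active thread.
    """
    if active_threads < 0 or warp_size <= 0:
        raise ValueError("active_threads must be >= 0 and warp_size > 0")
    warps = math.ceil(active_threads / warp_size)
    return warps * warp_size - active_threads


if __name__ == "__main__":
    # Compare the conventional warp size of 32 with the paper's size of 4.
    for warp_size in (32, 4):
        print(f"warp size {warp_size}: {idle_lanes(37, warp_size)} idle lanes")
```

Under this model, 37 threads leave 27 lanes idle with 32-wide warps but only 3 with 4-wide warps, which is the kind of waste the smaller warp size is meant to reduce.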

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

  • Ahn, Saehyun;Kang, Suk-Ju
    • Journal of Broadcast Engineering / v.26 no.2 / pp.167-174 / 2021
  • Recently, deep learning has been widely used in computer vision applications such as object recognition, classification, and image generation. In particular, deep learning-based super-resolution has shown significant performance improvements. The fast super-resolution convolutional neural network (FSRCNN) is a well-known deep learning-based super-resolution model in which the output image is generated by a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural network accelerator designed for parallel computing efficiency. We also propose Optimal-FSRCNN, a modification of the FSRCNN structure. It reduces the number of multipliers by a factor of 3.47 compared with FSRCNN while achieving a similar PSNR. The result is a real-time image processing technique implemented on an FPGA.
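
The abstract compares models by PSNR without defining it. As a reminder of the standard definition, PSNR = 10 log10(MAX² / MSE), here is a minimal sketch over flat lists of pixel values. The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions for illustration; the paper's evaluation setup is not described in the abstract.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel pairs.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [100, 100, 100, 100]
    upscaled = [110, 90, 100, 100]
    print(f"PSNR: {psnr(original, upscaled):.2f} dB")
```

A higher PSNR means the reconstructed image is closer to the reference, so "similar PSNR" indicates that the reduced multiplier count did not noticeably degrade output quality.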

An Evaluation Method of Understanding SW Architectures in an Arduino-based SW Lecture for Non-major Undergraduates (비전공자 대상 아두이노 활용 SW 강좌에서 SW 구조 이해도 평가 방법)

  • Hur, Kyeong
    • Journal of Practical Engineering Education / v.11 no.1 / pp.17-23 / 2019
  • For SW education for non-major undergraduates, we applied physical computing lessons using Arduino. A teaching method for the basic problem-solving process based on computational thinking has previously been proposed for physical computing classes using Arduino. However, when teaching the computational thinking process, it is also necessary to teach and evaluate students' understanding of SW structures. The proper flow of SW education is to first understand SW structures and then produce creative outputs by applying the computational thinking process. Yet there are few examples of how to evaluate the understanding of SW structures in classes using Arduino. In this paper, we propose a one-semester curriculum for SW education lectures using Arduino for non-majors. We also propose and analyze an evaluation method for the understanding of SW structures, together with the evaluation problems developed for this course.

Analysis of Tensor Processing Unit and Simulation Using Python (텐서 처리부의 분석 및 파이썬을 이용한 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.3 / pp.165-171 / 2019
  • Studies in computer architecture have shown that major improvements in price-energy-performance stem from domain-specific hardware. This paper analyzes the tensor processing unit (TPU), an ASIC that accelerates inference for artificial neural networks (NNs). The core of the TPU is a MAC matrix multiplier capable of high-speed operation, together with software-managed on-chip memory. The TPU's execution model can meet the response-time requirements of neural networks better than existing CPU and GPU execution models, and it does so with a small area and low power consumption even though it has many MACs and a large memory. On benchmarks using the TensorFlow framework, the TPU can achieve higher performance and better power efficiency than a CPU or GPU. In this paper, we analyze the TPU, simulate OpenTPU, which is modeled in Python, and synthesize the matrix multiplication unit, which is the key hardware.
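
To make the role of the MAC matrix multiplier concrete, the following is a minimal pure-Python sketch of matrix multiplication written as explicit multiply-accumulate steps. It is not OpenTPU code and does not model a systolic array or on-chip memory; the function name `mac_matmul` is hypothetical, and the loop simply shows the arithmetic that each MAC cell performs.

```python
def mac_matmul(a, b):
    """Multiply matrices a (rows x inner) and b (inner x cols) using MAC steps."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # accumulator register of one MAC unit
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one multiply-accumulate operation
            out[i][j] = acc
    return out


if __name__ == "__main__":
    weights = [[1, 2], [3, 4]]
    activations = [[5, 6], [7, 8]]
    print(mac_matmul(weights, activations))
```

In hardware, many such accumulations run in parallel across an array of MAC units, which is where the speedup over a sequential loop like this one comes from.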

High-quality data collection for machine learning using block chain (블록체인을 활용한 양질의 기계학습용 데이터 수집 방안 연구)

  • Kim, Youngrang;Woo, Junghoon;Lee, Jaehwan;Shin, Ji Sun
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.1 / pp.13-19 / 2019
  • The accuracy of machine learning is greatly affected by the amount and quality of the training data. Existing Web-based collection of training data risks gathering data unrelated to the actual learning task, and it cannot guarantee data transparency. In this paper, we propose a method in which blocks in a blockchain structure collect data directly and in parallel, and the data collected by each block is compared with the data in other blocks so that only good data is selected. In the proposed system, blocks share data with one another through the chain of blocks and use the All-reduce structure of Parallel-SGD to select only good-quality data by comparison with other blocks' data, thereby constructing a training data set. To verify the performance of the proposed architecture, we use an existing benchmark data set to confirm that, among modulated images, only the original images are selected as good data.

Implementation of Neural Network Accelerator for Rendering Noise Reduction on OpenCL (OpenCL을 이용한 랜더링 노이즈 제거를 위한 뉴럴 네트워크 가속기 구현)

  • Nam, Kihun
    • The Journal of the Convergence on Culture Technology / v.4 no.4 / pp.373-377 / 2018
  • In this paper, we propose an implementation of a neural network accelerator for reducing rendering noise using OpenCL. Among rendering algorithms, we select ray tracing to ensure high-quality graphics. Ray tracing renders with rays: using fewer rays results in noise, while using more rays produces a higher-quality image but takes longer. To reduce the operation time while using fewer rays, the Learning Based Filtering algorithm, which uses a neural network, was applied. However, it does not always produce an optimal result. In this paper, we present a new approach to matrix multiplication, based on general matrix multiplication, for improved performance. As the development environment, we used OpenCL, which is specialized for high-speed parallel processing. The proposed architecture was verified on a Kintex UltraScale XKU6909T-2FDFG1157C FPGA board. The time taken to calculate the parameters is about 1.12 times faster than that of the Verilog-HDL structure.

Implementation and Performance Evaluation of Migration Agent for Seamless Virtual Environment System in Grid Computing Network (그리드 컴퓨팅 네트워크에서 Seamless 가상 환경 시스템 구축을 위한 마이그레이션 에이전트 구현 및 성능 평가)

  • Won, Dong Hyun;An, Dong-Un
    • KIPS Transactions on Computer and Communication Systems / v.7 no.11 / pp.269-274 / 2018
  • An MMORPG is a role-playing game that tens of thousands of people access online at the same time. Users connect to a server through the game client and play with their own characters. When a user moves beyond the area managed by one server into the area managed by another, the user's information must be transmitted to the destination server. In an actual game, the existing information and the transferred information must be synchronized. In this paper, we propose a migration agent server for virtual systems. We implement a seamless virtual server using the grid method to experiment with a seamless server architecture for virtual systems, and we propose a method to minimize delay and balance load when a user moves to another server region in the virtual environment. The migration agent acts as a cache server to reduce response time; the response time was reduced by 50% in the case of 70,000 users.

Development of ITB Risk Mgt. Model Based on AI in Bidding Phase for Oversea EPC Projects (플랜트 EPC 해외 사업을 위한 입찰단계 시 AI 기반의 ITB Risk 관리 모델 개발)

  • Lee, Don-Hee;Yoon, Gun-Ho;Kim, Jeong-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.4 / pp.151-160 / 2019
  • For EPC companies to continue operating overseas, it is becoming increasingly apparent that risk is no longer something to be avoided but something to be managed. During the bidding stage, the requirements, specifications, and project line items in the bid package must be studied in detail to analyze the various risk factors and avoid cost overruns. However, reviewing vast quantities of bidding documents is time-consuming and labor-intensive, and this is where automated information technology can help. For this study, I constructed an ITB analysis model based on Watson AI that can analyze and apply a vast amount of documents more effectively in a short time. The configuration of the Watson Explorer AI architecture for the AI-based ITB risk management model, the selection of learning procedures and analysis subjects, and the performance evaluation criteria were defined, and a test bed was constructed to conduct pilot research. As a result, I verified the effectiveness of the model in terms of reduced analysis time, the quality of its results, and VOC operations by professionals.

Design and Implementation of Feature Detector for Object Tracking (객체 추적을 위한 특징점 검출기의 설계 및 구현)

  • Lee, Du-hyeon;Kim, Hyeon;Cho, Jae-chan;Jung, Yun-ho
    • Journal of IKEEE / v.23 no.1 / pp.207-213 / 2019
  • In this paper, we propose a low-complexity feature detection algorithm for object tracking and present a hardware architecture design and implementation results for real-time processing. The existing Shi-Tomasi algorithm performs well in object tracking applications but has high computational complexity. We therefore propose an efficient feature detection algorithm that reduces the operational complexity while performing similarly to the Shi-Tomasi algorithm, and we present its real-time implementation results. The proposed feature detector was implemented on an FPGA with 1,307 logic slices, 5 DSP48s, and 86.91 Kbits of memory. It supports real-time processing at 54 fps at an operating frequency of 114 MHz for $1920 \times 1080$ FHD images.
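
For context on where the baseline's cost comes from, the standard Shi-Tomasi corner score is the smaller eigenvalue of the 2×2 structure tensor built from windowed sums of image-gradient products. The sketch below shows only this baseline score, not the paper's proposed low-complexity detector, whose details the abstract does not give. The function name `shi_tomasi_score` and its argument names are hypothetical.

```python
import math


def shi_tomasi_score(sxx: float, sxy: float, syy: float) -> float:
    """Smaller eigenvalue of the structure tensor [[sxx, sxy], [sxy, syy]].

    sxx, sxy, syy are assumed to be windowed sums of Ix*Ix, Ix*Iy, Iy*Iy,
    where Ix and Iy are the horizontal and vertical image gradients.
    """
    half_trace = (sxx + syy) / 2.0
    # Distance of each eigenvalue from the half-trace; needs a square root.
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    return half_trace - radius


if __name__ == "__main__":
    # Strong gradients in both directions: a good corner candidate.
    print(shi_tomasi_score(2.0, 0.0, 3.0))
    # Gradients along one direction only (an edge): score is zero.
    print(shi_tomasi_score(1.0, 1.0, 1.0))
```

A pixel is kept as a feature when this score exceeds a threshold. The multiplications and square root per pixel are the kind of operations that make the baseline expensive in hardware.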

A Comparative Analysis of PKI Internet Banking and Blockchain Payment Transactions (PKI 인터넷 뱅킹과 블록체인 지불 거래의 비교 분석)

  • Park, Seungchul
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.5 / pp.604-612 / 2019
  • In PKI Internet banking, users register their public keys with the banking server together with their identity information, and the server uses the registered public keys to verify signatures for both user and transaction authentication. Although Blockchain-based financial systems such as Bitcoin adopt a similar digital signature-based authentication scheme, there is no server with which participants can register public keys, because they perform P2P payment transactions. The purpose of this paper is to identify the advantages and disadvantages of Blockchain-based payment transactions by analyzing the differences between the most common PKI Internet banking and Blockchain payment systems. Based on this analysis, the paper suggests the issues that need to be addressed in terms of architecture and security for Blockchain payment transaction systems to be applied universally.