• Title/Summary/Keyword: multi-core processor (멀티코어프로세서)

A Study on the Scalability of Multi-core-PC Cluster for Seismic Design of Reinforced-Concrete Structures based on Genetic Algorithm (유전알고리즘 기반 콘크리트 구조물의 최적화 설계를 위한 멀티코어 퍼스널 컴퓨터 클러스터의 확장 가능성 연구)

  • Park, Keunhyoung;Choi, Se Woon;Kim, Yousok;Park, Hyo Seon
    • Journal of the Computational Structural Engineering Institute of Korea / v.26 no.4 / pp.275-281 / 2013
  • This paper evaluates the scalability of a cluster composed of commodity personal computers when optimizing reinforced-concrete structures with a genetic algorithm. The goal is to assess the potential of a multi-core PC cluster for optimizing the seismic design of reinforced-concrete structures. As the number of processor cores in the cluster increased, the computation time per generation of the genetic algorithm decreased. After classifying the components of a single personal computer, the expected bottlenecks were estimated and the measured wall-clock time was compared with the prediction of Amdahl's law. The scalability of the cluster showed a complex tendency. To separate physical bottlenecks from algorithmic ones, different population sizes were selected for the genetic-algorithm cases. With 64 processor cores, the efficiency of the cluster was as low as 31.2% compared with the efficiency predicted by Amdahl's law.
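
The comparison with Amdahl's law mentioned in the abstract above can be illustrated with a minimal sketch. The function names, and the reading of "efficiency" as measured speedup divided by the Amdahl-predicted speedup, are assumptions made for illustration; the abstract does not state how the authors computed their figure.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup predicted by Amdahl's law for a given core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_efficiency(t_single: float, t_parallel: float,
                        parallel_fraction: float, cores: int) -> float:
    """Measured speedup divided by the Amdahl-predicted speedup.

    This ratio is an assumed definition, not one confirmed by the paper.
    """
    measured_speedup = t_single / t_parallel
    return measured_speedup / amdahl_speedup(parallel_fraction, cores)
```

With a hypothetical parallel fraction of 0.99, `amdahl_speedup(0.99, 64)` gives roughly 39 rather than 64, which shows how even a small serial portion limits scaling before any hardware bottleneck is considered.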

Design and Verification of the Class-based Architecture Description Language (클래스-기반 아키텍처 기술 언어의 설계 및 검증)

  • Ko, Kwang-Man
    • Journal of Korea Multimedia Society / v.13 no.7 / pp.1076-1087 / 2010
  • With the advent and evolution of embedded processors developed for specific application areas, research on software development for these processors, along with commercial efforts, has been revitalized. Retargetability is typically achieved by providing target machine information, written in an architecture description language (ADL), as input. ADLs are used to specify processor and memory architectures and to generate software toolkits including compilers, simulators, assemblers, profilers, and debuggers. The EXPRESSION ADL follows a mixed-level approach: it can capture both structure and behavior, supporting a natural specification of programmable architectures consisting of processor cores, coprocessors, and memories. It was originally designed to capture processor/memory architectures and generate a software toolkit to enable compiler-in-the-loop exploration of SoC architectures. In this paper, we design a class-based ADL based on the EXPRESSION ADL to improve writability and extensibility, and we verify the validity of its grammar. To this end, we defined six core classes and generated EXPRESSION's compiler and simulator from a MIPS R4000 description.

A Bus Data Compression Method for High Resolution Mobile Multimedia SoC (고해상 모바일 멀티미디어 SoC를 위한 온칩 버스 데이터 압축 방법)

  • Lee, Jin;Lee, Jaesung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.05a / pp.345-348 / 2013
  • This paper provides a method for compression and transmission of on-chip bus data. As the data traffic on on-chip buses is rapidly increasing with enlarged video resolutions, many video processor chips suffer from a lack of bus bandwidth and their IP cores have to wait for a longer time to get a bus grant. In multimedia data such as images and video, the adjacent data signals very often have little or no difference between them. Taking advantage of this point, this paper develops a simple bus data compression method to improve the chip performance and presents its hardware implementation. The method is applied to a Video Codec - 1 (VC-1) decoder chip and reduces the processing time of one macro-block by 13.6% and 10.3% for SD and HD videos, respectively.
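
The observation that adjacent multimedia data words often differ little or not at all can be sketched with simple delta encoding. This is a stand-in technique chosen for illustration, not the compression scheme of the paper, whose details the abstract does not give; the function names and the 32-bit word width are assumptions.

```python
from typing import List


def delta_encode(words: List[int], width: int = 32) -> List[int]:
    """Replace each bus word with its difference from the previous word.

    Words are assumed to lie in [0, 2**width); differences wrap modulo 2**width.
    """
    mask = (1 << width) - 1
    encoded = []
    previous = 0
    for word in words:
        encoded.append((word - previous) & mask)
        previous = word
    return encoded


def delta_decode(deltas: List[int], width: int = 32) -> List[int]:
    """Invert delta_encode by accumulating the differences."""
    mask = (1 << width) - 1
    decoded = []
    previous = 0
    for delta in deltas:
        previous = (previous + delta) & mask
        decoded.append(previous)
    return decoded


def zero_fraction(deltas: List[int]) -> float:
    """Share of words whose delta is zero, i.e. that need not be resent in full."""
    return sum(1 for d in deltas if d == 0) / len(deltas) if deltas else 0.0
```

For smooth image data, many deltas are zero or small, so a hardware encoder could send a short code in place of a full word; how such codes are framed on the bus is outside this sketch.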

The Implementation of Real-time Performance Monitor for Multi-thread Application (멀티스레드 어플리케이션을 위한 실시간 성능모니터의 구현)

  • Kim, Jin-Hyuk;Shin, Kwang-Sik;Yoon, Wan-Oh;Lee, Chang-Ho;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.3 / pp.82-90 / 2011
  • Multi-core systems are becoming more common with advances in microprocessors. Owing to this shift in the performance-improvement paradigm, conventional single-threaded applications are being replaced by multi-threaded ones. Because multi-threaded applications are complex to develop, performance monitoring tools are used to optimize application performance. Conventional performance monitoring tools focus on performance itself rather than on user friendliness or real-time support. A real-time performance monitor can identify problems while a multi-threaded application is running and can check the application's operating status in real time. It is therefore a more effective tool for finding the cause of a problem than a non-real-time monitor that offers only simple performance indicators. In this paper, we propose RMPM (Real-time Multi-core Performance Monitor), a real-time performance monitoring tool for multi-core systems. The observation period is optimized by comparing the overhead caused by the performance evaluation period against accuracy. Our performance monitor shows not only the CPU usage, memory usage, and network usage of the whole system but also the distribution of overhead across the threads of an application.

Analysis on the Cooling Efficiency of High-Performance Multicore Processors according to Cooling Methods (기계식 쿨링 기법에 따른 고성능 멀티코어 프로세서의 냉각 효율성 분석)

  • Kang, Seung-Gu;Choi, Hong-Jun;Ahn, Jin-Woo;Park, Jae-Hyung;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information / v.16 no.7 / pp.1-11 / 2011
  • Many researchers have studied methods to improve processor performance. However, the highly integrated semiconductor technology used to improve performance causes problems such as reduced battery life, high power density, and hotspots. In particular, because hotspots have a critical impact on chip reliability, thermal problems should be considered together with performance and power consumption when designing high-performance processors. Various studies have sought to alleviate processor thermal problems. In the past, mechanical cooling methods were used to control processor temperature. However, recent microprocessors cause severe thermal problems, resulting in increased cooling cost. Recent studies have therefore focused on architecture-level thermal-aware design techniques rather than mechanical cooling methods. Although architecture-level thermal-aware design techniques are effective at reducing processor temperature, they inevitably degrade performance. If mechanical cooling methods can manage processor thermal problems efficiently, performance can be improved by reducing the degradation caused by architecture-level techniques such as dynamic thermal management. In this paper, we analyze the cooling efficiency of high-performance multicore processors under different mechanical cooling methods. In our experiments with an air cooler and a liquid cooler, the liquid cooler consumed more power than the air cooler but reduced the temperature more efficiently. Notably, the cost of reducing the temperature by 1°C varied with the environment. Therefore, if mechanical cooling methods are used appropriately, the temperature of high-performance processors can be managed more efficiently.

H.264/AVC Decoder Parallelization Methods for Real-time Full-HD Image Processing (Full-HD 영상의 실시간 처리를 위한 H.264/AVC 디코더 병렬화 기법)

  • Yoo, Hosun;Kim, Ilseung;Kim, Taeho;Jeon, Jeehyun;Jeong, Jechang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.07a / pp.453-456 / 2012
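  • With the growing use of multi-core processors, parallelization techniques such as OpenMP and SIMD are being applied in many fields, including image processing and technologies that require large-scale data processing. In image processing in particular, as demand grows for high-complexity content such as Full-HD, UHD, and 3D TV, various techniques have been proposed for parallelizing existing single-core codecs. This paper proposes a method for decoding Full-HD video in real time by applying existing parallel processing techniques, such as OpenMP and SIMD, to the decoder of JM 18.2, the reference software of the H.264/AVC codec. The experimental results show an average frame rate of 38.338 fps, and the frame rate more than doubles on average with parallel processing, demonstrating that Full-HD video can be processed in real time.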

Dedicated Software Platform and Network Function Virtualization for Middlebox Services (미들박스 서비스를 위한 전용 소프트웨어 플랫폼과 네트워크 기능 가상화)

  • Park, Gyeong-Su
    • Information and Communications Magazine / v.31 no.6 / pp.32-38 / 2014
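  • Software-based network middlebox systems have recently attracted considerable attention because they are free from dependence on specific hardware and can flexibly provide a wide variety of functions. Moreover, recent advances in multi-core and many-core processors, together with the arrival of high-bandwidth network cards, show that high-performance middlebox services can readily be provided in software alone, even on inexpensive commodity computing hardware. However, the networking software stacks used in developing existing software-based network middlebox systems make it inconvenient to build and maintain multiple middlebox services. First, general-purpose operating systems such as Linux support networking stacks for end nodes, such as Berkeley sockets, but do not provide a dedicated stack for building network middlebox services. As a result, functions commonly used in middleboxes, such as flow management, have to be implemented anew starting from IP packet processing. Second, the absence of a dedicated stack means that multiple middlebox services with the same functionality cannot share an implementation even when they coexist. In addition, when multiple middlebox services run on a single physical machine, inconsistent interfaces cause the same operations to be performed redundantly, wasting resources. This paper examines the need for a dedicated software stack that eases the design and implementation of next-generation software-based middlebox services and explores the possibilities that such a stack could open up.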

Design and Implementation of an InfiniBand System Interconnect for High-Performance Cluster Systems (고성능 클러스터 시스템을 위한 인피니밴드 시스템 연결망의 설계 및 구현)

  • Mo, Sang-Man;Park, Kyung;Kim, Sung-Nam;Kim, Myung-Jun;Im, Ki-Wook
    • The KIPS Transactions:PartA / v.10A no.4 / pp.389-396 / 2003
  • InfiniBand technology is being accepted as the future system interconnect to serve as the high-end enterprise fabric for cluster computing. This paper presents the design and implementation of an InfiniBand system interconnect, focusing on an InfiniBand host channel adapter (HCA) based on dual ARM9 processor cores. The HCA is an SoC called KinCA, which connects a host node to the InfiniBand network in both hardware and software. Since the ARM9 processor core does not provide the features necessary for a multiprocessor configuration, novel inter-processor communication and interrupt mechanisms between the two processors were designed and embedded within the KinCA chip. KinCA was fabricated as a 564-pin enhanced BGA (Ball Grid Array) device using 0.18 μm CMOS technology. Mounted on host nodes, it provides 10 Gbps outbound and inbound channels for transmit and receive, respectively, resulting in a high-performance cluster system.

Improving Haskell GC-Tuning Time Using Divide-and-Conquer (분할 정복법을 이용한 Haskell GC 조정 시간 개선)

  • An, Hyungjun;Kim, Hwamok;Liu, Xiao;Kim, Yeoneo;Byun, Sugwoo;Woo, Gyun
    • KIPS Transactions on Computer and Communication Systems / v.6 no.9 / pp.377-384 / 2017
  • The performance improvement of single-core processors has reached its limit because circuit density can no longer be increased owing to overheating. Multicore and manycore architectures have therefore emerged as viable approaches, and parallel programming is becoming more important. Haskell, a purely functional language, is gaining popularity in this situation because it naturally supports parallel programming through beneficial features such as implicit parallelism in evaluating expressions and monadic tools that support parallel constructs. However, the performance of parallel Haskell programs is strongly influenced by the performance of the run-time system, including the garbage collector. Although a memory profiling tool called GC-tune has been proposed, a more systematic way to use it is needed. Because GC-tune finds the optimal memory size by executing the target program with all possible GC options, GC tuning takes too long. This paper suggests a basic divide-and-conquer method that reduces the number of GC-tune executions by reducing the search area by one-quarter at every search step. Applying this method to two parallel programs, a maximal independent set program and a K-means program, reduces the memory tuning time by a factor of 7.78 with 98% accuracy on average.
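
The divide-and-conquer idea in the abstract above can be sketched as a quadrant search over a two-dimensional grid of option values. This is one plausible reading only: the assumption that each step keeps a single quadrant, the two-parameter search space, the `cost` callback (standing in for one run of the target program), and all names below are hypothetical and not taken from the paper or from GC-tune.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open interval [lo, hi) of option indices


def split(r: Range) -> List[Range]:
    """Split a half-open range into its non-empty halves."""
    lo, hi = r
    mid = (lo + hi) // 2
    return [part for part in ((lo, mid), (mid, hi)) if part[1] > part[0]]


def center(r: Range) -> int:
    """Representative (middle) index of a half-open range."""
    lo, hi = r
    return (lo + hi - 1) // 2


def quadrant_search(cost: Callable[[int, int], float],
                    x_range: Range, y_range: Range) -> Tuple[int, int, int]:
    """Return (best_x, best_y, evaluations) using a quadrant heuristic.

    Each step evaluates the center of every quadrant and keeps the cheapest
    one, so far fewer points are measured than in an exhaustive sweep.
    Both input ranges are assumed to be non-empty.
    """
    evaluations = 0
    while x_range[1] - x_range[0] > 1 or y_range[1] - y_range[0] > 1:
        best = None
        for xr in split(x_range):
            for yr in split(y_range):
                value = cost(center(xr), center(yr))
                evaluations += 1
                if best is None or value < best[0]:
                    best = (value, xr, yr)
        _, x_range, y_range = best
    return x_range[0], y_range[0], evaluations
```

On a 16 x 16 grid this heuristic measures at most 16 points instead of 256. It is not guaranteed to find the exhaustive optimum when the cost surface has several local minima, which is consistent with the abstract reporting accuracy below 100%.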

Performance Characterization of Tachyon Supercomputer using Hybrid Multi-zone NAS Parallel Benchmarks (하이브리드 병렬 프로그램을 이용한 타키온 슈퍼컴퓨터의 성능)

  • Park, Nam-Kyu;Jeong, Yoon-Su;Yi, Hong-Suk
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.1 / pp.138-144 / 2010
  • The Tachyon primary system, which was introduced recently, is a high-performance supercomputer composed of AMD Barcelona nodes. In this paper, we evaluate the performance and parallel scalability of Tachyon using the multi-zone NAS Parallel Benchmarks (NPB), a program suite that uses a hybrid parallel method. To test the performance of hybrid parallel execution, classes B and C of BT-MZ in NPB version 3.3 were used. The parallel scalability test was completed using 1024 processes on Tachyon. This is the first time in Korea that a result has been obtained for a hybrid parallel computation using more than 1024 processes. The results show that the hybrid parallel method can be very efficient and useful on high-performance computing systems with multi-core technology such as Tachyon.