• Title/Summary/Keyword: multicore (멀티코어)


Investigation on TLB Miss Impact through TLB Lockdown in Multi-core Systems (멀티코어 시스템에서 TLB Lockdown에 의한 TLB Miss 영향 분석)

  • Song, Daeyoung;Park, Sihyeong;Kim, Hyungshin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.1
    • /
    • pp.59-65
    • /
    • 2022
  • Real-time systems use virtual memory to ensure system safety through memory protection. However, the TLB misses that virtual memory introduces make the WCET of a real-time system more pessimistic. TLB lockdown can be applied to mitigate this problem, but on processors with a limited number of TLB lockdown entries, a selection criterion is needed to use those entries efficiently. In this paper, the most frequently accessed virtual pages of a process are identified through memory profiling and placed in the TLB lockdown entries. The results show that the micro data TLB miss stall cycles and main data TLB miss stall cycles of the processor decreased by at least 4.7% and by up to 29.7%.
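
The selection step described in this abstract, choosing the most frequently accessed virtual pages for a limited number of lockdown entries, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the 4 KiB page size, the function name `select_lockdown_pages`, and the address trace are all assumptions made for the example.

```python
from collections import Counter

PAGE_SHIFT = 12  # assumption: 4 KiB pages; the abstract does not state a page size


def select_lockdown_pages(addresses, num_entries, page_shift=PAGE_SHIFT):
    """Return the virtual page numbers accessed most often in a profiled trace.

    `addresses` is a hypothetical list of virtual addresses from memory
    profiling; `num_entries` is the number of TLB lockdown entries available.
    """
    counts = Counter(addr >> page_shift for addr in addresses)
    # Rank by descending access count; break ties by page number so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:num_entries]]


# Hypothetical trace: page 1 is hit four times, page 2 twice, page 3 once.
trace = [0x1000, 0x1004, 0x2000, 0x1FF8, 0x3000, 0x2010, 0x1008]
locked = select_lockdown_pages(trace, num_entries=2)
```

With two lockdown entries, the sketch picks pages 1 and 2 from the sample trace. How the selected pages are actually written into lockdown entries is processor specific and is not shown.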

Design and Implementation of Userspace Read-Copy Update scheme using Page Faults (페이지 폴트를 이용한 Userspace Read-Copy Update 기법 설계 및 구현)

  • Kim, Inhyuk;Shin, Eunhwan;Eom, Youngik
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.11a
    • /
    • pp.1721-1724
    • /
    • 2010
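  • With the advent of multicore processors and the spread of parallel programming, interest in and demand for lock-free synchronization techniques continue to grow, yet most such techniques see little practical use because of their implementation complexity and runtime overhead. RCU (Read-Copy Update) is an exception: it has been implemented and adopted in a variety of operating systems, and attempts to use it in applications such as game servers have recently increased. However, previously proposed URCU (Userspace RCU) schemes do not deliver sufficient performance, owing to the memory barrier calls needed to resolve memory-ordering problems and to IPC between readers and updaters. This paper therefore proposes a URCU scheme that uses page faults, implements it, and evaluates it experimentally against existing URCU schemes.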

A Study of Advanced Hybrid Execution Using Reverse Traversal (역방향 탐색을 사용하는 하이브리드 분석 기법에 관한 연구)

  • Jang, Seong-Soo;Choi, Young-Hyun;Lim, Hun-Jung;Chung, Tai-Myoung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.883-885
    • /
    • 2011
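  • As software analysis techniques have advanced and become able to detect many kinds of malicious code, techniques for evading them have emerged. To protect systems from evolved malicious code, such as code that modifies itself at run time, this paper presents a technique that can also inspect paths in a program that are not executed. The proposed technique reads the program, generates a CFG, and traverses it in reverse from each exit point to obtain all execution paths. The resulting overhead can be mitigated by multitasking on a multicore processor.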
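
The reverse traversal described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the CFG representation (a dictionary of successor lists), the function name `reverse_paths`, and the rule of skipping nodes already on the current path (so that loops do not cause infinite recursion) are assumptions made for the example.

```python
def reverse_paths(cfg, entry):
    """Enumerate entry-to-exit paths by walking backwards from each exit node.

    `cfg` maps each node to its list of successors (hypothetical format).
    Nodes already on the current path are skipped, so each path visits a
    node at most once; this is a simplification for CFGs with loops.
    """
    # Build the predecessor map needed for the backward walk.
    preds = {node: [] for node in cfg}
    for node, succs in cfg.items():
        for succ in succs:
            preds.setdefault(succ, []).append(node)

    # Exit nodes are the nodes without successors.
    exits = [node for node in preds if not cfg.get(node)]
    paths = []

    def walk(node, suffix):
        path = [node] + suffix
        if node == entry:
            paths.append(path)
            return
        for pred in preds[node]:
            if pred not in path:
                walk(pred, path)

    for exit_node in exits:
        walk(exit_node, [])
    return sorted(paths)


# Hypothetical CFG with two exit nodes, D and E.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}
all_paths = reverse_paths(cfg, "A")
```

Because each exit node is walked independently, the per-exit traversals could in principle be distributed across cores, which is one plausible reading of the multicore mitigation the abstract mentions.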

DVFS based Memory-Contention Aware Scheduling Method for Multi-threaded Workloads (멀티쓰레드 워크로드를 위한 DVFS 기반 메모리 경합 인지 스케줄링 기법)

  • Nam, Yoonsung;Kang, Minkyu;Yeom, HeonYoung;Eom, Hyeonsang
    • KIISE Transactions on Computing Practices
    • /
    • v.24 no.1
    • /
    • pp.10-16
    • /
    • 2018
  • Consolidating server workloads is critical to reducing datacenter costs. However, as more workloads are consolidated on a single server, their performance may degrade because they contend for limited shared resources. Reducing this degradation requires scheduling that mitigates shared-resource contention. In this paper, we present a Dynamic Voltage and Frequency Scaling (DVFS) based memory-contention aware scheduling method for multi-threaded workloads. The proposed method uses two approaches: running memory-intensive threads on a limited set of cores to avoid concurrent memory accesses, and reducing the frequencies of the cores that run memory-intensive threads. With the proposed algorithm, we increased performance by 43% and reduced power consumption by 38% compared to the Completely Fair Scheduler (CFS), the default scheduler of Linux.

A Code-level Parallelization Methodology to Enhance Interactivity of Smartphone Entertainment Applications (스마트폰 엔터테인먼트 애플리케이션의 상호작용성 개선을 위한 코드 수준 병렬화 방법론)

  • Kim, Byung-Cheol
    • Journal of Digital Convergence
    • /
    • v.13 no.12
    • /
    • pp.381-390
    • /
    • 2015
  • One of the fundamental requirements of entertainment applications is interactivity with users. Mobile devices such as smartphones, however, cannot guarantee it because of the limited computing power of the application processor, limited memory size, and limited battery power. This paper proposes a methodology that boosts the responsiveness of interactive applications by taking advantage of the parallel architecture of mobile devices, which have, for instance, dual-core, quad-core, or octa-core processors. To harness the multi-core architecture, it uses POSIX threads, a platform-independent thread library that can be used on various mobile platforms such as Android and iOS. As an application example of the methodology, a heavy matrix calculation function was transformed into a parallelized version that ran around 2.5 to 3 times faster than the original version in a real-world usage environment.

A Function-characteristic Aware Thread-mapping Strategy for an SEDA-based Message Processor in Multi-core Environments (멀티코어 환경에서 SEDA 기반 메시지 처리기의 수행함수 특성을 고려한 쓰레드 매핑 기법)

  • Kang, Heeeun;Park, Sungyong;Lee, Younjeong;Jee, Seungbae
    • Journal of KIISE
    • /
    • v.44 no.1
    • /
    • pp.13-20
    • /
    • 2017
  • A message processor is server software that receives messages in various formats from clients, creates the corresponding threads to process them, and finally delivers the results to the destination. Considering that each function of an SEDA-based message processor has its own characteristics, such as being CPU-bound or IO-bound, this paper proposes a thread-mapping strategy called "FC-TM" (function-characteristic aware thread mapping) that schedules threads to cores based on the function characteristics in multi-core environments. This paper assumes that message-processor functions are static in the sense that they are pre-defined when the message processor is built; therefore, we profile each function in advance and use that information to map each thread to a core so as to maximize throughput. The benchmarking results show that throughput increased by up to 72% compared with previous studies when the ratio of IO-bound functions to CPU-bound functions exceeds a certain percentage.

Implementation of 40 Gb/s Network Processor of Wire-Speed Flow Management (40 Gb/s 실시간 플로우 관리 네트워크 프로세서 구현)

  • Doo, Kyeong-Hwan;Lee, Bhum-Cheol;Kim, Whan-Woo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37B no.9
    • /
    • pp.814-821
    • /
    • 2012
  • In this paper, we propose a network processor, called the OmniFlow processor, that is capable of wire-speed flow management through hardware-based flow admission control (FAC). Because the OmniFlow processor can set up and release flow connections at wire speed, the flow update period can be set to a short time, and only active flows are managed, since a flow with no packet transmitted within this period is terminated. Therefore, the FAC can be used to provide reliable transmission for UDP as well as TCP applications. The processor is fabricated in 65 nm CMOS technology, and the total gate count is 25 million. It achieves 40 Gb/s throughput using 32 RISC cores at a maximum operating frequency of 555 MHz.
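
The rule of keeping only active flows by terminating any flow that has sent no packet within the update period can be modelled in software as below. This is only an illustrative sketch of the rule as stated in the abstract; the OmniFlow processor implements flow management in hardware, and the class name `FlowTable`, its methods, and the integer timestamps are hypothetical.

```python
class FlowTable:
    """Software model of inactivity-based flow termination (illustrative only)."""

    def __init__(self, update_period):
        # Assumption: times are integers in arbitrary ticks.
        self.update_period = update_period
        self.last_seen = {}

    def on_packet(self, flow_id, now):
        """Set up the flow if it is new and record the time of its latest packet."""
        self.last_seen[flow_id] = now

    def expire(self, now):
        """Release and return flows with no packet within the update period."""
        stale = [
            flow_id
            for flow_id, seen in self.last_seen.items()
            if now - seen > self.update_period
        ]
        for flow_id in stale:
            del self.last_seen[flow_id]
        return stale


table = FlowTable(update_period=10)
table.on_packet("a", now=0)
table.on_packet("b", now=5)
table.on_packet("a", now=12)      # flow "a" stays active
released = table.expire(now=16)   # flow "b" has been idle for 11 ticks
```

A shorter update period releases idle flows sooner, which keeps the table limited to active flows at the cost of more frequent set-up and release operations.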

Efficient Task Distribution for Pig Monitoring Applications Using OpenCL (OpenCL을 이용한 돈사 감시 응용의 효율적인 태스크 분배)

  • Kim, Jinseong;Choi, Younchang;Kim, Jaehak;Chung, Yeonwoo;Chung, Yongwha;Park, Daihee;Kim, Hakjae
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.10
    • /
    • pp.407-414
    • /
    • 2017
  • Pig monitoring applications consisting of many tasks can take advantage of inherent data parallelism and enable parallel processing using performance accelerators. In this paper, we propose a method for distributing the tasks of pig monitoring applications across a heterogeneous computing platform consisting of a multicore CPU and a manycore GPU. That is, a parallel program written in OpenCL is developed, and then the most suitable processor for each task is determined from the task's measured execution time. The proposed method is simple but very effective, and can be applied to parallelize other applications consisting of many tasks on a heterogeneous computing platform consisting of a CPU and a GPU. Experimental results on three different heterogeneous computing platforms show that the proposed task distribution method outperforms the typical GPU-only method, in which every task is executed on the GPU device, by factors of 1.5, 8.7, and 2.7, respectively.
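
The core decision in this abstract, assigning each task to the processor with the lower measured execution time, can be sketched as follows. This is a simplified illustration and not the authors' OpenCL code: the task names and timings are invented for the example, and data-transfer costs between CPU and GPU are ignored.

```python
def assign_tasks(measured):
    """Map each task to the device with the smallest measured execution time.

    `measured` maps a task name to a dict of {device: seconds}; both the
    task names and the timings used below are hypothetical.
    """
    return {task: min(times, key=times.get) for task, times in measured.items()}


# Hypothetical per-task timings in seconds on each device.
measured = {
    "background_subtraction": {"cpu": 4.0, "gpu": 1.5},
    "labeling": {"cpu": 0.8, "gpu": 2.1},
    "tracking": {"cpu": 1.2, "gpu": 0.9},
}
placement = assign_tasks(measured)
```

With the sample timings, `labeling` is placed on the CPU and the other two tasks on the GPU, which is how a mixed placement can beat a GPU-only configuration.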

Acceleration of Intrusion Detection for Multi-core Video Surveillance Systems (멀티 코어 프로세서 기반의 영상 감시 시스템을 위한 침입 탐지 처리의 가속화)

  • Lee, Gil-Beom;Jung, Sang-Jin;Kim, Tae-Hwan;Lee, Myeong-Jin
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.12
    • /
    • pp.141-149
    • /
    • 2013
  • This paper presents a high-speed intrusion detection process for multi-core video surveillance systems. The high-speed intrusion detection is designed as a parallel process. Based on an analysis of the conventional process, a parallel intrusion detection process is proposed that is accelerated by the multiple processing cores of contemporary computing systems. The proposed process performs intrusion detection in a per-frame parallel manner while taking the data dependency between frames into account. The proposed process was validated by implementing a multi-threaded intrusion detection program. On a system with eight processing cores, the detection speed of the proposed program is higher than that of the conventional one by up to 353.76% in terms of frame rate.

Programming Model for SODA-II: a Baseband Processor for Software Defined Radio Systems (SDR용 기저대역 프로세서를 위한 프로그래밍 모델)

  • Lee, Hyun-Seok;Yi, Joon-Hwan;Oh, Hyuk-Jun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.7
    • /
    • pp.78-86
    • /
    • 2010
  • This paper discusses the programming model of SODA-II, a baseband processor for software defined radio (SDR) systems. Signal processing On-Demand Architecture II (SODA-II) is an on-chip multiprocessor architecture consisting of four processor cores, each of which has both a wide SIMD datapath and a scalar datapath. This architecture is appropriate for baseband processing, which is a mixture of vector computations and scalar computations. The programming model of the SODA-II is based on C library routines. Because the library routines hide the details of the complex SIMD datapath control procedures, end users can easily program the SODA-II without a deep understanding of its architecture. In this paper, we discuss the details of the library routines and how these routines are exploited in the implementation of baseband signal processing algorithms. As application examples, we show the implementation results of a W-CDMA multipath searcher and an OFDM demodulator on the SODA-II.