• Title/Summary/Keyword: 코어 (core)

Impact of Process Scheduling on Network Performance over Multi-Core Systems (멀티 코어 시스템에서 통신 프로세스의 스케줄링에 따른 성능 분석)

  • Jang, Hye-Churn;Jin, Hyun-Wook
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.827-829 / 2009
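  • Multi-core processors are now deployed in many servers, and the number of cores in a single processor package will keep growing. However, current operating systems treat multi-core systems almost identically to multiprocessor environments, and attempts at performance optimization that account for multi-core characteristics remain insufficient. This paper quantitatively analyzes network throughput and the amount of idle core resources while varying the processor affinity of communication processes and network interrupts on multi-core processors with SMP and NUMA architectures. The measurements show that processor affinity does not greatly change communication throughput but strongly affects the demand for processor resources. The paper also shows that this effect on processor resources is closely related to the cache-sharing structure and the distributed memory structure of multi-core processors.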
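
As a minimal sketch of what varying the processor affinity of a process looks like in practice, the snippet below pins the calling process to a single core with Python's `os.sched_setaffinity`, which is available on Linux only. The helper name `pin_current_process` is hypothetical and does not come from the paper, and the sketch does not cover network interrupt affinity, which the paper also varies.

```python
import os


def pin_current_process(core_id):
    """Pin the calling process to one core and return the resulting affinity set.

    Linux only: relies on os.sched_getaffinity / os.sched_setaffinity.
    """
    allowed = os.sched_getaffinity(0)  # pid 0 means "the calling process"
    if core_id not in allowed:
        raise ValueError(f"core {core_id} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core_id})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    target = min(original)              # pick any core the process may run on
    print("pinned to:", pin_current_process(target))
    os.sched_setaffinity(0, original)   # restore the original affinity mask
```

Running a communication process under different affinity sets while recording throughput and idle CPU time would approximate the kind of measurement the abstract describes.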

8×8 HEVC Inverse Core Transform Architecture Using Multiplier Reuse (곱셈기를 재사용하는 8×8 HEVC 코어 역변환기 설계)

  • Lee, Jong-Bae;Lee, Seongsoo
    • Journal of IKEEE / v.17 no.4 / pp.570-578 / 2013
  • This paper proposes an 8×8 HEVC inverse core transform architecture that reuses multipliers. In the HEVC core transform, the processing of a smaller block is identical to the even part of the next larger block, so an 8×8 core transform architecture can process both 8×8 and 4×4 core transforms. However, when the 8×8 architecture is used for 4×4 core transforms, the frame processing time doubles, because the 8×8 and 4×4 core transforms process 8 and 4 pixels concurrently, respectively. The proposed architecture, based on multiplier reuse, runs either as one 8×8 inverse core transformer or as two 4×4 inverse core transformers. Its frame processing time is the same for 8×8 and 4×4 core transforms, and it reduces the gate count by 12%.
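
The statement that the smaller transform is the even part of the larger one can be checked directly on the coefficient matrices. The sketch below hard-codes the 8-point and 4-point HEVC core transform coefficients as they are defined in the HEVC standard; the names `T8`, `T4`, and `even_part` are illustrative and are not taken from the paper, and the check says nothing about the proposed hardware itself.

```python
# 8-point HEVC core transform coefficient matrix (rows are basis vectors).
T8 = [
    [64,  64,  64,  64,  64,  64,  64,  64],
    [89,  75,  50,  18, -18, -50, -75, -89],
    [83,  36, -36, -83, -83, -36,  36,  83],
    [75, -18, -89, -50,  50,  89,  18, -75],
    [64, -64, -64,  64,  64, -64, -64,  64],
    [50, -89,  18,  75, -75, -18,  89, -50],
    [36, -83,  83, -36, -36,  83, -83,  36],
    [18, -50,  75, -89,  89, -75,  50, -18],
]

# 4-point HEVC core transform coefficient matrix.
T4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def even_part(matrix):
    """Return the even-indexed rows of a matrix, truncated to its left half."""
    half = len(matrix) // 2
    return [row[:half] for row in matrix[::2]]


if __name__ == "__main__":
    # The even rows of the 8-point matrix reproduce the 4-point matrix.
    print(even_part(T8) == T4)
```

Because the even rows are also mirror-symmetric, the right half of each even row carries no extra coefficients, which is what makes sharing multipliers between the two sizes plausible.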

A Method of Selecting Candidate Core for Shared-Based Tree Multicast Routing Protocol (공유기반 트리 멀티캐스트 라우팅 프로토콜을 위한 후보 코어 선택 방법)

  • Hwang Soon-Hwan;Youn Sung-Dae
    • Journal of Korea Multimedia Society / v.7 no.10 / pp.1436-1442 / 2004
  • A shared-based tree established by the Core Based Tree (CBT) multicast routing protocol, Protocol Independent Multicast Sparse-Mode (PIM-SM), or Core-Manager based Multicast Routing (CMMR) is rooted at a center node called the core or Rendezvous Point (RP). The routes from the core (or RP) to the members of the multicast group are shortest paths, so the cost of the tree and the packet delays depend on the location of the core. In this paper, we propose three methods for selecting the set of candidate cores: k-minimum average cost, k-maximum degree, and k-maximum weight. They are compared with a method that selects the candidate cores randomly, using three performance measures: tree cost, mean packet delay, and maximum packet delay. Our simulation results show that the three proposed methods produce lower tree cost and significantly lower mean and maximum packet delay than random selection.
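
One of the three selection rules can be sketched as follows. The abstract does not define k-maximum degree precisely, so this snippet assumes it means "pick the k nodes with the most neighbors"; the function name, the adjacency-dict representation, and the tie-breaking by node name are all assumptions made for illustration.

```python
def k_maximum_degree(adjacency, k):
    """Return the k nodes with the highest degree.

    adjacency maps each node to the set of its neighbors.
    Ties are broken by node name so the result is deterministic.
    """
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    return ranked[:k]


if __name__ == "__main__":
    # Small hypothetical topology: "a" has three neighbors, "b" and "c" two, "d" one.
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(k_maximum_degree(graph, 2))
```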

An Efficient Core-Based Multicast Tree using Weighted Clustering in Ad-hoc Networks (애드혹 네트워크에서 가중치 클러스터링을 이용한 효율적인 코어-기반 멀티캐스트 트리)

  • Park, Yang-Jae;Han, Seung-Jin;Lee, Jung-Hyun
    • The KIPS Transactions:PartC / v.10C no.3 / pp.377-386 / 2003
  • This study proposes a technique for maintaining an efficient core-based multicast tree using weighted clustering factors in mobile ad hoc networks. The biggest problem in core-based multicast tree routing is deciding the position of the core node, because the data transmission distance varies with that position. Clustering is used because rebuilding the multicast tree when the core node moves imposes a large overhead on the entire network. Among the cluster head nodes on the multicast tree within the core area, the one with the smallest weighted factor is chosen as the head core node. Simulation shows that building and maintaining the multicast tree with weighted clustering factors in this way improves the transmission distance and control overhead, which depend on the position and mobility of the core node, compared with existing multicast methods. It also verifies that selecting a core node with low mobility near the center of the network stabilizes the multicast tree and shortens the transmission distance.
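
The selection step (choose the cluster head with the smallest weighted factor) can be sketched as below. The abstract does not give the weighting formula, so the linear combination of mobility and distance from the network center, the coefficients `w_mobility` and `w_distance`, and all names are assumptions chosen only to match the stated preference for low mobility and a central position.

```python
def weighted_factor(mobility, distance_to_center, w_mobility=0.5, w_distance=0.5):
    """Hypothetical weighted factor: lower is better (slow and central)."""
    return w_mobility * mobility + w_distance * distance_to_center


def select_head_core(cluster_heads):
    """Return the cluster head with the smallest weighted factor.

    cluster_heads maps a node name to (mobility, distance_to_center).
    Ties are broken by node name.
    """
    return min(
        cluster_heads,
        key=lambda node: (weighted_factor(*cluster_heads[node]), node),
    )


if __name__ == "__main__":
    heads = {
        "h1": (0.9, 1.0),  # central but highly mobile
        "h2": (0.2, 1.0),  # central and slow
        "h3": (0.3, 5.0),  # slow but far from the center
    }
    print(select_head_core(heads))
```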

A Study of Trace-driven Simulation for Multi-core Processor Architectures (멀티코어 프로세서의 명령어 자취형 모의실험에 대한 연구)

  • Lee, Jong-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.3 / pp.9-13 / 2012
  • To overcome the complexity and power problems of superscalar processors, multi-core architectures have recently become prevalent. Although execution-driven simulation is widespread, trace-driven simulation has a speed advantage over it. We present a methodology for simulating multi-core architectures with a trace-driven simulator. Using the SPEC 2000 benchmarks as input, trace-driven simulation was performed extensively for 2 to 16 cores. On average, the 16-core processor achieved 4.1 IPC and a 13.3-fold speedup over a single-core processor.
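
The reported figures can be put in context with a short calculation. The sketch below computes parallel efficiency (speedup divided by core count) from the numbers in the abstract; the function name is illustrative, and the efficiency metric itself is a standard derived quantity, not one the abstract reports.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / cores


if __name__ == "__main__":
    # Figures quoted in the abstract: 13.3x speedup on 16 cores.
    efficiency = parallel_efficiency(13.3, 16)
    print(f"parallel efficiency: {efficiency:.1%}")
```

An efficiency of roughly 83% means the 16-core configuration falls about one sixth short of ideal linear scaling on these benchmarks.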

An Improving Method of Android Boot Speed in Multi-core based Embedded System (멀티코어 기반의 임베디드 시스템에서 안드로이드 부팅 속도 향상 방법)

  • Choi, Jin-Yong;Lee, Jae-Heung
    • Journal of IKEEE / v.17 no.4 / pp.564-569 / 2013
  • Embedded devices are rapidly moving to multi-core processors, and these devices demand fast boot times. However, the conventional boot method uses only one core. The proposed method combines parallelization techniques with a modified CPU frequency policy. After the Android boot process was analyzed with an analysis tool, parallelization was applied at the points with heavy CPU activity. The CPU frequency policy was modified so that the cores run at high performance. The proposed method was applied to S5PV310 dual-core and Exynos4412 quad-core embedded systems. In the experiments, the proposed method shortened boot time by about 20.71% and 31.34% in the dual-core and quad-core environments, respectively, compared with the previous method.

Performance Study of Multicore Digital Signal Processor Architectures (멀티코어 디지털 신호처리 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.4 / pp.171-177 / 2013
  • Due to the demand for high-speed 3D graphics rendering, video file format conversion, compression, and encryption and decryption technologies, the importance of digital signal processor systems is growing rapidly. Satisfying real-time constraints requires high-performance digital signal processors. Therefore, as in general-purpose computer systems, digital signal processors should also be designed as multicore architectures. Using the UTDSP benchmarks as input, trace-driven simulation was performed and analyzed extensively for 2- to 16-core digital signal processor architectures, with cores ranging from simple RISC to in-order and out-of-order superscalar processors and with various window sizes.

The Effect of Mesh Interconnection Network on the Performance of Manycore System. (다중코어 시스템의 메쉬구조 상호연결망이 성능에 미치는 영향)

  • Kim, Han-Yee;Kim, Young-Hwan;Suh, Taeweon
    • Proceedings of the Korea Information Processing Society Conference / 2011.11a / pp.116-119 / 2011
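  • A many-core system connects many cores through an interconnection network and provides more parallel computing resources than single-core or multi-core systems. According to Amdahl's law, the parallelized portion of a program can in theory be accelerated in proportion to the number of processors, but many factors, including transmission delay in the interconnection network, hinder the speedup. In particular, in most many-core systems that support a cache coherence protocol, the data transfer delay caused by cache misses can strongly affect the performance of parallelized code. Effective parallel programs therefore require research on interconnection networks grounded in an understanding of the cache structure. This paper uses TilePro64, a 64-core many-core system with a mesh structure, to measure the sensitivity of program performance to data transfer delay in the interconnection network. The results show a linear relationship in which task execution time increases by 4.27% on average for each additional hop between cores.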
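
The two quantitative ideas in this abstract can be sketched together. `amdahl_speedup` is the standard form of Amdahl's law. `mesh_hops` assumes hop count equals the Manhattan distance between tile coordinates, and `estimated_time` assumes the 4.27% figure applies linearly relative to a zero-hop baseline; both are assumptions about how the measurement is defined, and all function names are illustrative.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def mesh_hops(src, dst):
    """Hop count between two (x, y) tiles, assuming Manhattan-distance routing."""
    (sx, sy), (dx, dy) = src, dst
    return abs(sx - dx) + abs(sy - dy)


def estimated_time(base_time, hops, growth_per_hop=0.0427):
    """Assumed linear model: execution time grows 4.27% of base_time per hop."""
    return base_time * (1.0 + growth_per_hop * hops)


if __name__ == "__main__":
    print(amdahl_speedup(0.95, 64))              # 95% parallel code on 64 cores
    corner_to_corner = mesh_hops((0, 0), (7, 7))  # opposite corners of an 8x8 mesh
    print(corner_to_corner, estimated_time(100.0, corner_to_corner))
```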

Analysis on the Temperature of 3D Multi-core Processors according to Vertical Placement of Core and L2 Cache (코어와 L2 캐쉬의 수직적 배치 관계에 따른 3차원 멀티코어 프로세서의 온도 분석)

  • Son, Dong-Oh;Ahn, Jin-Woo;Park, Jae-Hyung;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information / v.16 no.6 / pp.1-10 / 2011
  • In designing multi-core processors, interconnection delay is one of the major constraints on performance improvement. To address this problem, 3-dimensional integration technology has been adopted in multi-core processor design. The 3D multi-core architecture reduces the physical wire length by stacking cores vertically, which reduces interconnection delay and power consumption. However, the power density of the 3D multi-core architecture is significantly higher than that of the traditional 2D multi-core architecture, which raises the temperature of the processor. In this paper, floorplan methods that change the vertical placement of the core and the level-2 cache are analyzed as a way to solve the thermal problems of 3D multi-core processors. According to the experimental results, stacking the core adjacent to the level-2 cache is an effective way to reduce the temperature of the processor. Compared to the floorplan where cores are stacked adjacent to each other, the floorplan where the core is stacked adjacent to the level-2 cache reduces the temperature by 22% with 4 layers and by 13% with 2 layers.

Power-efficient Scheduling of Periodic Real-time Tasks on Lightly Loaded Multicore Processors (저부하 멀티코어 프로세서에서 주기적 실시간 작업들의 저전력 스케쥴링)

  • Lee, Wan-Yeon
    • Journal of the Korea Society of Computer and Information / v.17 no.8 / pp.11-19 / 2012
  • In this paper, we propose a power-efficient scheduling scheme for lightly loaded multicore processors, which contain more processing cores than running tasks. The proposed scheme activates a portion of the available cores and deactivates the unused cores to save power. The tasks are assigned to the activated cores by a heuristic mechanism for fast task assignment. Each activated core executes its assigned tasks at the optimal clock frequency, which minimizes the power consumption of the tasks while meeting their deadlines. Evaluation shows that the proposed scheme saves up to 78% of the power consumed by the previous method, which activates as many processing cores as possible for the execution of the given tasks.
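
The idea of running each core at the lowest clock frequency that still meets deadlines can be illustrated with a textbook model, which is not necessarily the one used in the paper. The sketch assumes periodic tasks with deadlines equal to their periods, EDF scheduling on one core, execution time that scales inversely with clock speed, and dynamic power proportional to the cube of the speed. Under those assumptions the lowest feasible normalized speed equals the total utilization. All names are hypothetical.

```python
def min_feasible_speed(tasks):
    """Lowest normalized clock speed (1.0 = full speed) that keeps EDF feasible.

    tasks is a list of (wcet_at_full_speed, period) pairs for one core.
    Assumes deadlines equal periods and execution time scales as 1/speed.
    """
    utilization = sum(wcet / period for wcet, period in tasks)
    if utilization > 1.0:
        raise ValueError("task set is not schedulable on one core at full speed")
    return utilization


def relative_dynamic_power(speed):
    """Dynamic power relative to full speed under an assumed cubic model."""
    return speed ** 3


if __name__ == "__main__":
    core_tasks = [(1, 4), (1, 2)]   # hypothetical (wcet, period) pairs
    speed = min_feasible_speed(core_tasks)
    print(speed, relative_dynamic_power(speed))
```

This model ignores static (leakage) power, which is one reason deactivating unused cores, as the abstract describes, matters in addition to frequency scaling.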