Search | Korea Science

Design and Performance Analysis of Segment Directory Method for Multiprocessor Systems (다중 프로세서 시스템을 위한 세그먼트 디렉토리 방식의 설계 및 성능 분석)

Choe, Jong-Hyeok;Lee, Chang-Gyu;Park, Gyu-Ho
- Journal of KIISE:Computer Systems and Theory / v.27 no.11 / pp.919-931 / 2000
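In this paper, we propose the segment directory, a new directory organization that is intermediate between the full-vector directory and the pointer directory. It can be used in most pointer-based directory schemes to improve directory storage efficiency. Whereas a pointer identifies only a single processor, a segment directory element uses almost the same number of bits as a pointer yet can identify several processors at once. We apply the segment directory to four existing limited directory schemes and measure and analyze the resulting performance improvement. By eliminating up to 71% of directory overflows, the factor that degrades the performance of limited directory schemes, the segment directory reduced bandwidth requirements, directory controller occupancy, and memory access latency for every benchmark program run under these four schemes, and so sped up program execution. Moreover, the segment directory can be implemented simply, without additional hardware overhead or protocol complexity.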
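The abstract does not give the bit layout of a segment directory element, so the following is only a minimal sketch under an assumed encoding: the processors are split into fixed-size segments, and one element stores a segment index plus a small bit vector over the processors inside that segment. The function names and the segment size of 4 are hypothetical and chosen for illustration.

```python
def pointer_bits(n_procs: int) -> int:
    """Bits needed for a plain pointer that names one of n_procs processors."""
    return (n_procs - 1).bit_length()


def segment_element_bits(n_procs: int, seg_size: int) -> int:
    """Bits for an assumed segment element: segment index + per-segment bit vector."""
    n_segments = n_procs // seg_size
    return (n_segments - 1).bit_length() + seg_size


def encode_segment(sharers, seg_size):
    """Pack sharers that all lie in one segment into (segment_index, bit_vector)."""
    segments = {p // seg_size for p in sharers}
    if len(segments) != 1:
        raise ValueError("sharers span more than one segment")
    index = segments.pop()
    vector = 0
    for p in sharers:
        vector |= 1 << (p % seg_size)  # set the bit for this processor's offset
    return index, vector


def decode_segment(index, vector, seg_size):
    """Recover the processor numbers named by one segment element."""
    return [index * seg_size + i for i in range(seg_size) if (vector >> i) & 1]
```

Under these assumptions, with 64 processors and segments of 4, a pointer takes 6 bits and a segment element takes 8 bits, but the element can name up to four sharers in the same segment instead of one.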

Semi-dynamic Task Allocation for Parallel Spatial Joins (병렬 공간 조인을 위한 준동적 태스크 할당)

김진덕;서영덕;홍봉희
- Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.13-15 / 2001
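Recently, research has been under way on improving the performance of spatial joins using parallel systems. However, as the number of processors increases, the processor efficiency achieved by parallel processing drops sharply. This is because parallel spatial joins suffer more severe disk bottlenecks and message-passing overhead than sequential spatial joins. This paper proposes a task allocation technique that alleviates the bottleneck caused by concurrent disk access from multiple processors in a shared-disk architecture and minimizes message passing. The proposed technique was applied to two spatial join methods, and the reduction in the number of disk accesses and message transmissions was evaluated experimentally. In a range of experiments on a parallel system with a MIMD architecture and shared disks, the proposed semi-dynamic task allocation technique outperformed both static and dynamic allocation.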

A Processor Allocation Scheme Using Task Relocation (태스크 재배치를 이용한 프로세서 할당방법)

Lee, Won-Joo;Jeon, Chang-Ho
- Proceedings of the Korea Information Processing Society Conference / 2003.05a / pp.125-128 / 2003
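This paper proposes a new submesh allocation method for mesh-structured multicomputer systems. Its distinguishing feature is that it shortens task waiting time by minimizing the allocation delay caused by external fragmentation. In a 2D mesh, an allocated submesh splits the remaining processors into fragments above and below it and to its left and right, and these fragments cannot be joined to form a larger free submesh. Because of this structural limitation, submesh allocation is delayed by external fragmentation. Such allocation delay increases task waiting time and therefore degrades system performance. When allocation is delayed by external fragmentation, the proposed method relocates the tasks running in allocated submeshes to other free submeshes and merges the processor fragments for allocation, which reduces task waiting time. Simulation shows that the proposed method outperforms existing allocation methods in reducing task waiting time.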

Design and Implementation of 10Gigabit Ethernet System with IPC and Frame MUX/DEMUX Architecture (10기가비트 이더넷 인터페이스를 위한 프레임 다중화기/역다중화기와 IPC를 갖는 10기가비트 이더넷 시스템의 설계 및 구현)

조규인;김유진;정해원;조경록
- Journal of the Institute of Electronics Engineers of Korea TC / v.41 no.5 / pp.27-36 / 2004
In this paper, we propose an Ethernet Inter-Processor Communication (IPC) network architecture and a 10 Gigabit Ethernet frame multiplexer/demultiplexer architecture for a Linux-based edge switch system that has a 10 Gigabit Ethernet port and a capacity of 72 Gbps. We discuss Ethernet IPC using an Ethernet switch, and we present the design and implementation of the Ethernet IPC network architecture and of a multiple Gigabit Ethernet frame multiplexing/demultiplexing scheme that handles 10 Gigabit Ethernet frames without using a 10 Gigabit network processor. The Ethernet IPC network architecture and the 10 Gigabit Ethernet frame MUX/DEMUX architecture are then designed, verified, and implemented.

Design and Performance Analysis of a Parallel Optimal Branch-and-Bound Algorithm for MIN-based Multiprocessors (MIN-based 다중 처리 시스템을 위한 효율적인 병렬 Branch-and-Bound 알고리즘 설계 및 성능 분석)

Yang, Myung-Kook
- Journal of IKEEE / v.1 no.1 s.1 / pp.31-46 / 1997
In this paper, a parallel Optimal Best-First search Branch-and-Bound (B&B) algorithm (pobs) is designed and evaluated for MIN-based multiprocessor systems. The proposed algorithm decomposes a problem into G subproblems, where each subproblem is processed on a group of P processors. Each processor group uses the sub-Global Best-First search technique to find a local solution. The local solutions are broadcast through the network to compute the global solution. This broadcast provides not only the comparison of the G local solutions but also load balancing among the processor groups. A performance analysis is then conducted to estimate the speed-up of the proposed parallel B&B algorithm. The analytical model is developed from the probabilistic properties of the B&B algorithm. It considers both computation time and communication overhead to evaluate the realistic performance of the algorithm in a parallel processing environment. To validate the proposed evaluation model, the parallel B&B algorithm is also simulated on a MIN-based system. The results from analysis and simulation match closely. It is also shown that the proposed Optimal Best-First search B&B algorithm performs better than other reported schemes thanks to advantageous features such as fewer subproblem evaluations, preferable load balancing, and a limited scope of remote communication.

An Efficient Parallel Information Retrieval System using Document Clustering (문서 클러스터링에 의한 효율적인 병렬 정보검색 시스템)

Gang, Yu-Gyeong;Ryu, Gwang-Ryeol;Jeong, Sang-Hwa
- Journal of KIISE:Software and Applications / v.28 no.2 / pp.157-167 / 2001
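This paper presents a parallel information retrieval system that delivers high-quality information quickly and has a good price-performance ratio. The system partitions the document library into multiple clusters and, at retrieval time, assigns work to processors in units of clusters. This keeps the unit of work at an appropriate size and also removes any need for inter-processor communication when document scores are computed. Retrieval first finds the relevant clusters at the cluster level and then finds the actual documents within those clusters. Because this hierarchical structure allows filtering after the first stage, the overall retrieval load is reduced. In addition, making each cluster a group of documents that are as similar as possible minimizes the chance that unnecessary clusters are searched, which improves performance. The system was implemented on a multi-transputer system with a distributed-memory MIMD architecture, and experiments confirmed that clustering by document similarity outperforms random clustering.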
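The two-stage retrieval described above can be sketched as follows. This is a sequential illustration only, not the authors' parallel implementation: the term-overlap scoring, the function names, and the `top_clusters` and `top_docs` parameters are all assumptions made for the example. In the system described, each selected cluster would be scored on its own processor.

```python
from collections import Counter


def score(query_terms, text_terms):
    """Assumed scoring: count occurrences of the query terms in the text."""
    counts = Counter(text_terms)
    return sum(counts[t] for t in query_terms)


def two_stage_search(query, clusters, top_clusters=1, top_docs=3):
    """clusters maps a cluster name to a list of document strings."""
    q = query.lower().split()

    # Stage 1: rank clusters by scoring the query against each cluster's pooled terms.
    def cluster_score(name):
        pooled = [t for doc in clusters[name] for t in doc.lower().split()]
        return score(q, pooled)

    selected = sorted(clusters, key=cluster_score, reverse=True)[:top_clusters]

    # Stage 2: score documents only inside the selected clusters.
    hits = []
    for name in selected:
        for doc in clusters[name]:
            s = score(q, doc.lower().split())
            if s > 0:
                hits.append((s, doc))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [doc for _, doc in hits[:top_docs]]
```

Because stage 2 touches only the documents of the selected clusters, each cluster can be scored independently, which is what makes communication-free scoring per processor plausible.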

Advanced Victim Cache with Processor Reuse Information (프로세서의 재사용 정보를 이용하는 개선된 고성능 희생 캐쉬)

Kwak Jong Wook;Lee Hyunbae;Jhang Seong Tae;Jhon Chu Shik
- Journal of KIISE:Computer Systems and Theory / v.31 no.12 / pp.704-715 / 2004
Recent single-processor and multiprocessor systems use a hierarchical memory structure to reduce the gap between processor clock rate and memory access time. A cache memory system typically includes two or three levels of cache to reduce this gap. One of the most important factors in a hierarchical memory system is the hit rate of the level 1 cache, because the level 1 cache interfaces directly with the processor. A high hit rate in the level 1 cache is therefore critical for system performance. A victim cache, another high-level cache, is also important because it assists the level 1 cache by reducing conflict misses in the high-level cache. In this paper, we propose an advanced high-level cache management scheme based on processor reuse information. The technique is a cache replacement policy that uses the frequency of the processor's memory accesses and keeps cache locations holding more frequently accessed addresses resident longer than those holding less frequently accessed ones. We simulate our policy using Augmint, an event-driven simulator, and analyze the results. The simulation results show that the modified processor reuse information scheme (LIVMR) outperforms the level 1 cache with a simple victim cache (LIV) by up to 6.7% and by 0.5% on average, and that the performance benefit grows as the number of processors increases.
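The abstract does not specify how reuse information is stored or compared, so the following is only a minimal sketch of a frequency-aware victim cache under assumed rules: each resident line carries an access count, and when the victim cache is full the line with the lowest count is evicted. The class name and its methods are hypothetical and are not the LIVMR scheme itself.

```python
class FrequencyVictimCache:
    """Toy victim cache that evicts the least frequently accessed line (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = {}  # address -> access count of each resident line

    def insert(self, address, count=1):
        """Store a line evicted from L1; return the address pushed out, or None."""
        evicted = None
        if address not in self.freq and len(self.freq) >= self.capacity:
            # Evict the resident line with the lowest access count.
            evicted = min(self.freq, key=self.freq.get)
            del self.freq[evicted]
        self.freq[address] = self.freq.get(address, 0) + count
        return evicted

    def lookup(self, address):
        """Return True on a victim-cache hit and record the reuse."""
        if address in self.freq:
            self.freq[address] += 1
            return True
        return False
```

In this sketch a line that is hit again in the victim cache accumulates a higher count and so survives later evictions, which mirrors the idea of keeping frequently reused addresses resident longer.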

Biased Multistage Interconnection Network in Multiprocessor System (다중프로세서 시스템에서 편향된 다단계 상호연결망)

Choi, Chang-Hoon
- Journal of the Korea Academia-Industrial cooperation Society / v.12 no.4 / pp.1889-1896 / 2011
Much research has sought techniques that provide redundant paths, thereby making Multistage Interconnection Networks (MINs) fault tolerant. So far, redundant paths in MINs have been realized by adding hardware such as extra stages or duplicated data links. This paper presents a new MIN topology called the Hierarchical MIN (HMIN). The proposed MIN is constructed with 2.5N-4 switching elements, far fewer than in classical MINs. Even though it uses less hardware than classical MINs, the HMIN has the full-access property and also provides alternative paths for fault tolerance. Furthermore, because the HMIN has a shortcut for localized communication, it can exploit the locality of reference in multiprocessor systems. Its performance under varying degrees of localized communication is analyzed and simulated.
https://doi.org/10.5762/KAIS.2011.12.4.1889

Design and Implementation of an InfiniBand System Interconnect for High-Performance Cluster Systems (고성능 클러스터 시스템을 위한 인피니밴드 시스템 연결망의 설계 및 구현)

Mo, Sang-Man;Park, Kyung;Kim, Sung-Nam;Kim, Myung-Jun;Im, Ki-Wook
- The KIPS Transactions:PartA / v.10A no.4 / pp.389-396 / 2003
InfiniBand technology is being accepted as the future system interconnect to serve as the high-end enterprise fabric for cluster computing. This paper presents the design and implementation of an InfiniBand system interconnect, focusing on an InfiniBand host channel adapter (HCA) based on dual ARM9 processor cores. The HCA is an SoC called KinCA, which connects a host node to the InfiniBand network in both hardware and software. Because the ARM9 processor core does not provide the features needed for a multiprocessor configuration, novel inter-processor communication and interrupt mechanisms between the two processors were designed and embedded within the KinCA chip. KinCA was fabricated as a 564-pin enhanced BGA (Ball Grid Array) device using 0.18 $\mu$m CMOS technology. Mounted on host nodes, it provides 10 Gbps outbound and inbound channels for transmit and receive, respectively, resulting in a high-performance cluster system.
https://doi.org/10.3745/KIPSTA.2003.10A.4.389

Parallel Programming for Exploiting Hybrid Parallel Model of CLUMP system and its Performance Evaluation (다중 메모리 모델의 CLUMP 시스템을 이용하기 위한 병렬 프로그래밍 기법과 성능 평가)

이용욱;라마크리쉬나
- Proceedings of the Korean Information Science Society Conference / 2000.10c / pp.621-623 / 2000
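SMPs have appeared on the market as a new alternative for the unit nodes that make up a cluster. Such a cluster of multiprocessors (CLUMP) combines multiple memory architectures in a single system. To use these memory architectures effectively, this paper proposes a nested parallel programming model. The nested parallel model is divided into nested loop-level parallelism, nested task-level parallelism, and multiply nested parallelism. This paper evaluates the performance of nested loop-level parallelism and compares it with a parallel program written for a single memory architecture. In the experiments, the nested parallel model performed better than the single-memory-architecture program, but because of inherent characteristics of loop-level parallelism, the performance gain shrank as the number of participating nodes grew. Both the performance gain and the scalability of the program improved with larger problem sizes.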

