Search | Korea Science

A dual-link CC-NUMA System Tolerant to the Multiprogramming Environment (다중 프로그램 환경에 적합한 이중 연결 CC-NUMA 시스템)

Suh, Hyo-Joong
- The KIPS Transactions:PartA / v.11A no.3 / pp.199-206 / 2004
In a multiprogrammed environment, the performance of a multiprocessor system is affected by the operating system's process allocation policy. The lowest communication cost is achieved when related processes are placed on adjacent processors. However, effective allocation is quite difficult in practice, and executing the allocation policy itself consumes computation time. Dual-ring CC-NUMA systems show a considerable performance difference depending on the process allocation policy because of heavily unbalanced memory transactions on the interconnection network. In this paper, I propose a load-balanced dual-link CC-NUMA system that does not require a process allocation policy. In program-driven simulation, the proposed system shows no remarkable difference according to the allocation policy, while the dual-ring system shows a 10% performance improvement from process allocation. In addition, the proposed system outperforms the dual-ring system by about 1.5 times.
https://doi.org/10.3745/KIPSTA.2004.11A.3.199

A Study on Buffer and Shared Memory Optimization for Multi-Processor System (다중 프로세서 시스템에서의 버퍼 및 공유 메모리 최적화 연구)

Kim, Jong-Su;Mun, Jong-Uk;Im, Gang-Bin;Jeong, Gi-Hyeon;Choe, Gyeong-Hui
- The KIPS Transactions:PartA / v.9A no.2 / pp.147-162 / 2002
A multi-processor system with fast I/O devices improves processing performance and reduces the bottleneck caused by I/O concentration. In such a system, performance is influenced by the shared memory used for exchanging data between processors, and it varies with configuration and utilization. This paper suggests a prediction model for buffer and shared memory optimization under an interrupt recognition method using a mailbox. Ethernet (IEEE 802.3) packets are used as the system input, and the amount of memory used is measured for different network bandwidths and degrees of burstiness. Empirical studies show that the amount of buffer and shared memory varies with the packet concentration rate as well as the I/O bandwidth. The studies also show the correlation between the two memories.
https://doi.org/10.3745/KIPSTA.2002.9A.2.147

A Parallel Loop Scheduling Algorithm on Multiprocessor System Environments (다중프로세서 시스템 환경에서 병렬 루프 스케쥴링 알고리즘)

이영규;박두순
- Journal of Korea Multimedia Society / v.3 no.3 / pp.309-319 / 2000
The purpose of parallel loop scheduling in a multiprocessor environment is to schedule with minimum synchronization overhead and to balance the load of a parallel application program. Each processor calculates a chunk of iterations and is allocated that chunk to execute in parallel. In doing so, the processors frequently access mutually exclusive global memory, which imposes considerable scheduling overhead and a bottleneck. Also, when the distribution of parallel iterations differs among the chunks allocated to the processors, the differing execution time of each chunk causes load imbalance and degrades overall scheduling performance. In this paper, we investigate the problems of conventional algorithms with respect to achieving minimum scheduling overhead and load balance. We then present a new parallel loop scheduling algorithm that considers data locality and processor affinity.
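As a rough illustration of the chunk calculation that the abstract above refers to, the following sketch computes chunk sizes under guided self-scheduling, a well-known conventional scheme. It is not the algorithm proposed in the paper, which the abstract does not specify; the function name `guided_chunks` and its parameters are hypothetical.

```python
import math


def guided_chunks(total_iterations, num_processors):
    """Return the successive chunk sizes handed out by guided self-scheduling.

    Each time a processor asks for work it receives
    ceil(remaining / num_processors) iterations, so chunks shrink as the
    loop nears completion. This is a conventional scheme shown only for
    illustration, not the paper's proposed algorithm.
    """
    chunks = []
    remaining = total_iterations
    while remaining > 0:
        size = math.ceil(remaining / num_processors)
        chunks.append(size)
        remaining -= size
    return chunks


if __name__ == "__main__":
    # Every entry in the list corresponds to one access to the shared
    # loop counter, which is where the synchronization overhead arises.
    print(guided_chunks(100, 4))
```

Large early chunks keep the number of accesses to the shared counter low, while small late chunks even out the finishing times, which is the overhead versus balance trade-off the abstract describes.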

A Heuristic Load Balancing Algorithm by using Iterative Load Transfer (반복적인 부하 이동에 의한 휴리스틱 부하 평형 알고리즘)

Song Eui-Seok;Oh Ha-Ryung;Seong Yeong-Rak
- The KIPS Transactions:PartA / v.11A no.7 s.91 / pp.499-510 / 2004
This paper proposes a heuristic load balancing algorithm for multiprocessor systems. The algorithm minimizes the number of idle links to distribute load traffic and reduces its communication cost. Each processor iteratively tries to transfer a unit load to or from each of its neighboring processors. However, the real load transfer is done collectively after the load traffic calculation is complete, to minimize useless traffic. The proposed algorithm can be employed in various interconnection topologies with slight modifications. In this paper, it is applied to both hypercube and mesh environments. Performance is evaluated by simulation and compared with that of two well-known algorithms. The results show that the proposed algorithm always balances the loads perfectly. Furthermore, compared to the existing algorithms, it reduces the communication cost by 70% to 90% in the hypercube and by 75% in the mesh.
https://doi.org/10.3745/KIPSTA.2004.11A.7.499
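The two-phase idea in the abstract above (plan unit transfers first, move real load only once the plan is complete) can be sketched as follows. This is a simplified illustration on a hypercube, not the paper's heuristic: the transfer rule (move one unit whenever a neighbor holds at least two units less) and the name `plan_and_apply` are assumptions, and this rule only guarantees that neighboring loads end up within one unit of each other.

```python
def plan_and_apply(loads, dim):
    """Plan unit transfers on a hypercube, then apply the net flows once.

    loads: list of 2**dim integer loads, indexed by node id.
    Returns (final_loads, net_flow), where net_flow[(i, j)] with i < j is
    the signed number of units sent from node i to node j.
    Assumed rule for illustration only; not the published algorithm.
    """
    n = 1 << dim
    assert len(loads) == n
    virtual = list(loads)   # planning copy; no real load moves yet
    planned = {}            # (src, dst) -> planned unit transfers

    moved = True
    while moved:
        moved = False
        for i in range(n):
            for d in range(dim):
                j = i ^ (1 << d)  # neighbor across dimension d
                if virtual[i] - virtual[j] >= 2:
                    virtual[i] -= 1
                    virtual[j] += 1
                    planned[(i, j)] = planned.get((i, j), 0) + 1
                    moved = True

    # Cancel opposite transfers on the same link so only net traffic is sent.
    net_flow = {}
    for (src, dst), units in planned.items():
        key, signed = ((src, dst), units) if src < dst else ((dst, src), -units)
        net_flow[key] = net_flow.get(key, 0) + signed

    # Collective transfer: apply the net flows to the real loads.
    final = list(loads)
    for (i, j), units in net_flow.items():
        final[i] -= units
        final[j] += units
    return final, net_flow


if __name__ == "__main__":
    final, flows = plan_and_apply([8, 0, 0, 0, 0, 0, 0, 0], dim=3)
    print(final, flows)
```

Planning on a copy lets opposite transfers on the same link cancel before any load is sent, which is one way to avoid the useless traffic the abstract mentions.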

Efficient Processor Allocation based on Join Selectivity in Multiple Hash Joins using Synchronization of Page Execution Time (페이지 실행시간 동기화를 이용한 다중 해쉬 결합에서 결합률에 따른 효율적인 프로세서 할당 기법)

Lee, Gyu-Ok;Hong, Man-Pyo
- Journal of KIISE:Computer Systems and Theory / v.28 no.3 / pp.144-154 / 2001
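Efficiently processing the many join operators contained in a multi-join query requires an efficient parallel algorithm. Recently, the method based on an allocation tree has been known to be the best for processing multiple hash join queries. However, this method incurs an unavoidable delay at each node of the allocation tree during the actual join. The delay is caused by the difference between the cost of reading the outer relation from disk page by page in the tuple-probing phase and the cost of the hash join on the pages already read. A technique called "synchronization of page execution time" was proposed to match these two execution times as closely as possible, and it reduced the delay in executing a node of the allocation tree. However, the number of processors to be allocated to minimize the delay, that is, the page execution time synchronization coefficient (k), varies considerably with the join selectivity of the actual join, so a multiple hash join that ignores this difference inevitably suffers a large performance loss. In this paper, assuming that the join selectivity can be predicted to some degree before the join, we improve the execution performance of multiple hash joins by allocating the optimal number of processors to each node according to the join selectivity, so as to minimize the delay that can occur during execution. We also build an analytical cost model and demonstrate its validity through various performance comparisons with existing methods.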

A Low-Complexity Processor for Joint QR decomposition and Lattice Reduction for MIMO Systems (다중 입력 다중 출력 통신 시스템을 위한 저 복잡도의 Joint QR decomposition-Lattice Reduction 프로세서)

Park, Min-Woo;Lee, Sang-Woo;Kim, Tae-Hwan
- Journal of the Institute of Electronics and Information Engineers / v.52 no.8 / pp.40-48 / 2015
This paper presents a processor that performs QR decomposition (QRD) as well as lattice reduction (LR) for multiple-input multiple-output (MIMO) systems. By sharing the operations commonly required in QRD and LR, the hardware complexity of the proposed processor is reduced significantly. In addition, the proposed processor is based on a multi-cycle architecture to further reduce the hardware complexity. The proposed processor is implemented with 139k logic gates in a 0.18-μm CMOS process, and its latency is 5 μs for 8×8 MIMO preprocessing with both QRD and LR at an operating frequency of 117 MHz.
https://doi.org/10.5573/ieie.2015.52.8.040

A Study on the Simplex and Distributed Multiplex type System for the Radar Data Processing (레이다 정보처리를 위한 단일형 및 분산다중형 시스템에 관한 연구)

김춘길
- The Journal of Korean Institute of Communications and Information Sciences / v.18 no.11 / pp.1785-1796 / 1993
Thanks to the data processing facilities of modern digital computers, the performance of radar, one of the main components of command and control systems along with computer communications, has improved greatly. In this study, radar data integrating and processing systems were designed to process various information from many kinds of radar in a single data processing system. The performance of the data integrating system was analyzed by applying queueing theory. A radar data integrating network was designed for synchronous relational operations among the information processing systems, and the transmission characteristics were also analyzed with specific models for each system. The designed data integrating systems can be divided into a simplex type and a distributed multiplex type.

Application Behavior-oriented Adaptive Remote Access Cache in Ring based NUMA System (링 구조 NUMA 시스템에서 적응형 다중 그레인 원격 캐쉬 설계)

곽종욱;장성태;전주식
- Journal of KIISE:Computer Systems and Theory / v.30 no.9 / pp.461-476 / 2003
Owing to its ease of implementation and its alleviation of the memory bottleneck, the NUMA architecture has dominated multiprocessor systems for the past several years. However, because a NUMA system distributes memory across nodes, frequent remote memory access is a key cause of performance degradation. Efficient design of the RAC (Remote Access Cache) in a NUMA system is therefore critical for performance improvement. In this paper, we suggest a Multi-Grain RAC that can adaptively control the RAC line size according to the behavior of each application. We then simulate a NUMA system with the multi-grain RAC using MINT, an event-driven memory hierarchy simulator, and analyze the performance results. First, with a profile-based determination method, we verify the optimal RAC line size for each application. We then compare and analyze the performance differences among NUMA systems with a normal RAC, with an optimal line size RAC, and with the multi-grain RAC. The simulation shows that the worst case can always be avoided and that the results are very close to the optimal case for any combination of application and RAC format.

Implementation and Performance Evaluation of Task Creation/Assignment Algorithms in Parallel Spatial Join using R-tree (R-tree를 이용한 병렬공간 조인의 태스크 생성/할당 알고리즘의 구현 및 성능평가)

서영덕;김진덕;홍봉희
- Proceedings of the Korean Information Science Society Conference / 1998.10b / pp.111-113 / 1998
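Spatial join is one of the main operations for spatial analysis in geographic information systems. Its computation time grows exponentially as the number of spatial objects involved increases. Processing techniques to reduce the spatial operation time on large-scale spatial data are therefore being studied. However, there has been no concrete study of task creation and task assignment methods that alleviate the bottleneck caused by multiple processors accessing the disk concurrently in a shared-disk architecture and that minimize message passing between processors. We therefore first analyze the causes of performance degradation in parallel spatial join and propose ways to improve performance. Specifically, we propose an object caching method to reduce disk access time, and task creation and assignment methods that exploit temporal and spatial locality. Experimental evaluation shows that the proposed methods achieve a performance improvement of up to 7.2 times.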

Cache Performance Analysis of Multiprocessor Systems for OLTP Applications based on a Memory-Resident DBMS (메모리 상주 DBMS 기반의 OLTP 응용을 위한 다중프로세서 시스템 캐쉬 성능 분석)

Chung, Yong-Wha;Hahn, Woo-Jong;Yoon, Suk-Han;Park, Jin-Won;Lee, Kang-Woo;Kim, Yang-Woo
- Journal of KIISE:Computing Practices and Letters / v.6 no.4 / pp.383-392 / 2000
Currently, multiprocessors are evaluated almost exclusively with scientific applications. Commercial applications are rarely explored because it is difficult to obtain the source code of a commercial DBMS. Even when the source code is available, as it is for POSTGRES, understanding it well enough to perform detailed, meaningful performance evaluations is a daunting task for computer architects. To evaluate multiprocessors with commercial applications, we have developed our own DBMS, called EZDB. EZDB is a parallelized DBMS, loosely inspired by POSTGRES, that runs on top of a software architecture simulator. It is capable of executing parallel programs written in SQL. Unlike POSTGRES, EZDB is not intended as a prototype for a production-quality DBMS. Its purpose is to make it easy to run commercial applications on multiprocessor architectures and evaluate their performance. To illustrate the usefulness of EZDB, we show the cache performance data collected for the TPC-B benchmark on a shared-memory multiprocessor. The simulation results show that the data structures exhibit unique sharing characteristics and that their locality properties and working sets are very different from those in scientific applications.

