Search | Korea Science

Performance and Scalability of OpenMP Programs on Chip-MultiThreading Server (칩 멀티쓰레딩 서버에서 OpenMP 프로그램의 성능과 확장성)

Lee Myung-Ho;Kim Yong-Kyu
The KIPS Transactions: Part A, v.13A no.2 s.99, pp.137-146, 2006
Shared Memory Multiprocessor (SMP) systems adopting Chip-level MultiThreading (CMT) technology are becoming mainstream servers for commercial applications as well as High Performance Computing (HPC) applications. OpenMP has become the standard paradigm for parallelizing applications on SMP systems, mostly because of its ease of use. As the demand for computing power in HPC applications grows rapidly, obtaining high performance and scalability for applications parallelized with the OpenMP APIs will become more important. In this paper, we study the performance and scalability of SPEC OMPL, a standard OpenMP benchmark suite of HPC applications, on the Sun Fire E25K server, which adopts CMT technology. We also study the effect of CMT on SPEC OMPL.
https://doi.org/10.3745/KIPSTA.2006.13A.2.137

An efficient interconnection network topology in dual-link CC-NUMA systems (이중 연결 구조 CC-NUMA 시스템의 효율적인 상호 연결망 구성 기법)

Suh, Hyo-Joong
The KIPS Transactions: Part A, v.11A no.1, pp.49-56, 2004
The performance of multiprocessor systems is limited by several factors: processor speed, memory delay, and interconnection network bandwidth and latency. With advances in semiconductor technology, off-the-shelf microprocessor speeds have passed the GHz mark, and processors can be scaled up into multiprocessor systems by connecting them through interconnection networks. In this situation, system performance is bounded by the latency and bandwidth of the interconnection network. SCI, Myrinet, and Gigabit Ethernet are widely adopted as high-speed interconnection links for high-performance cluster systems. Interconnection network performance can be improved by extending bandwidth and minimizing latency. Raising the operating clock speed is a simple way to improve both, but the physical distance of the links makes a high-frequency clock difficult to attain. Hence system performance and scalability suffer from the limitations of the interconnection network. Duplicating the links of the interconnection network is one way to resolve this bottleneck in scalable systems; the dual-ring SCI link structure is one example. In this paper, I propose a network topology and a transaction path algorithm that optimize latency and efficiency under duplicated links. In simulation, the proposed structure shows 1.05 to 1.11 times better latency and 1.42 to 2.1 times faster execution than dual-ring systems.
https://doi.org/10.3745/KIPSTA.2004.11A.1.049

Design and Implementation of Anti-virus Software for Server Systems supporting Dynamic Load Balancing (동적 부하균형을 지원하는 서버용 안티바이러스 소프트웨어 설계 및 구현)

Choi, Ju-Young;Sung, Ji-Yeon;Bang, He-Mi;Choi, Eun-Jung;Kim, Myuhng-Joo
Convergence Security Journal, v.6 no.1, pp.13-23, 2006
It is more desirable to execute anti-virus (AV) software on server systems than on clients in order to minimize damage from malicious code. AV software on a server system, however, may aggravate the server's load. In this paper, we propose new AV software, built on a monitor and multi-agent model, that runs on a server system without additional load. The new AV software supports dynamic load balancing that reflects the features of the AV engine, and thus it can be executed efficiently on server systems. The results of a performance evaluation attest to the strengths of the new AV software.

Locking in Practice : Performance of a Database System on a Multicore Machine (락의 실제 : 멀티코어 상의 데이터베이스 성능 분석)

Han, Hyuck
The Journal of the Korea Contents Association, v.14 no.8, pp.22-29, 2014
A lock is a general and popular way of serializing accesses to shared data in multiprocessor environments. Since mutual exclusion was first introduced in the 1960s, many spinlock algorithms have been proposed and deployed in real systems such as operating systems and (transactional) database systems. In this study, we measure the impact of the lock mechanism on a database system under various CPU configurations using a high-end multicore system. For the evaluation, we use the most up-to-date version of MySQL (version 5.6) with the InnoDB engine, which has been substantially re-architected to improve scalability on multicore machines. We changed the original spinlock function of InnoDB to evaluate various spinlock mechanisms on multicore machines.
https://doi.org/10.5392/JKCA.2014.14.08.022
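
To make the idea of a spinlock concrete, here is a minimal sketch of a test-and-set spinlock with exponential backoff. It is not InnoDB's spinlock function or any of the variants evaluated in the paper; the class name `TestAndSetSpinLock`, the helper `count_with_spinlock`, and the backoff constants are hypothetical, and the non-blocking `threading.Lock.acquire` call only stands in for an atomic test-and-set instruction.

```python
import threading
import time


class TestAndSetSpinLock:
    """Spinlock sketch: busy-wait on a non-blocking test-and-set."""

    def __init__(self):
        # Assumption: a non-blocking acquire plays the role of an
        # atomic test-and-set instruction on a shared flag.
        self._flag = threading.Lock()

    def acquire(self):
        delay = 1e-6
        while not self._flag.acquire(blocking=False):
            time.sleep(delay)              # back off instead of hammering the flag
            delay = min(delay * 2, 1e-3)   # exponential backoff, capped at 1 ms

    def release(self):
        self._flag.release()


def count_with_spinlock(num_threads=4, increments=1000):
    """Increment a shared counter from several threads under the spinlock."""
    lock = TestAndSetSpinLock()
    total = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                total[0] += 1              # critical section
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Because every increment happens inside the critical section, `count_with_spinlock(4, 1000)` returns exactly `num_threads * increments`. Changing the waiting policy in `acquire` (pure spinning, fixed delay, backoff) is the kind of variation a spinlock comparison would explore.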

Efficient task allocation algorithms for reducing processors on real-time multiprocessor system (실시간 다중프로세서 환경에서 프로세서 수의 감소를 위한 효율적인 타스크 배치방식)

신명호;이정태;박승규
The Journal of Korean Institute of Communications and Information Sciences, v.21 no.11, pp.2801-2809, 1996
Scheduling problems in real-time systems are known to be NP-hard, so heuristic approaches are generally applied to solve certain classes of systems. One such case is allocating periodic tasks to multiprocessors while ensuring that the deadline constraints of the real-time system are met. Studies on the allocation of periodic tasks include the RMNF, RMFF, FFDUF, and Next-Fit-M algorithms, which first form a set of task groups and then allocate them to processors. This paper proposes several algorithms based on Next-Fit-M. To analyze the four proposed methods, a simulation was carried out in which sample tasks were randomly generated with various time intervals. The proposed algorithms reduce the number of processors compared with the conventional methods.
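
As an illustration of next-fit allocation of periodic tasks, the sketch below assigns task utilizations to processors and opens a new processor whenever the current one would exceed the Liu-Layland rate-monotonic bound n(2^(1/n) - 1). This is a generic rate-monotonic next-fit sketch, not the Next-Fit-M algorithm or any of the paper's proposed variants; the function name `rm_next_fit` and the sample utilizations are hypothetical, and tasks are assumed to arrive already ordered (for example by period).

```python
def rm_next_fit(utilizations):
    """Assign task utilizations to processors with a next-fit rule.

    A task joins the current processor only if that processor's total
    utilization stays within the Liu-Layland bound n * (2**(1/n) - 1)
    for its n tasks; otherwise a new processor is opened and earlier
    processors are never revisited.
    """
    processors = []
    for u in utilizations:
        if processors:
            current = processors[-1]
            n = len(current) + 1
            bound = n * (2 ** (1.0 / n) - 1)
            if sum(current) + u <= bound:
                current.append(u)
                continue
        processors.append([u])  # open a new processor for this task
    return processors


# Hypothetical task set, each value is execution time / period.
groups = rm_next_fit([0.4, 0.4, 0.3, 0.2, 0.5])
print(len(groups), groups)
```

For this sample input the sketch uses three processors. The bound is sufficient but not necessary for schedulability, so next-fit with this test can use more processors than strictly needed, which is the gap that improved allocation heuristics try to close.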

An Efficient Processor Synchronization Scheme on Shared Memory Multiprocessor (공유메모리 다중처리기에서 효율적인 프로세서 동기화 기법)

윤석한;원철호;김덕진
Journal of the Korean Institute of Telematics and Electronics B, v.32B no.5, pp.683-692, 1995
Many kinds of large-scale multiprocessing and parallel-processing systems have recently been developed. Contention on shared data among multiple processors may degrade system performance, so processor synchronization has become one of the important issues in these systems. To solve the synchronization issues, many software and hardware schemes based on spin locks have been proposed. Although software schemes are easy to implement, hardware schemes are preferred in many systems to gain optimized performance. This paper proposes an efficient processor synchronization scheme, called QCX, and describes its design considerations, hardware, algorithm, and protocol. The performance of QCX is also evaluated against QOLB[5] and LBP[7] using simulation. The simulation, varying the number of processors and the contention on shared variables, measured the average execution time of a workload. The simulation results show that QCX performs best when practicability is considered. QCX is more efficient than QOLB and LBP in two respects. First, the hardware of QCX is simpler and more cost-effective because the cache structure need not be changed. Second, QCX is more general because it uses a generic atomic instruction.

An Efficient List Scheduling Algorithm for Multiprocessor Systems (다중 처리기 시스템을 위한 효율적인 리스트 스케줄링 알고리듬)

Park, Gyeong-Rin;Chu, Hyeon-Seung;Lee, Jeong-Hun
The Transactions of the Korea Information Processing Society, v.7 no.7, pp.2060-2071, 2000
Scheduling parallel tasks, represented as a Directed Acyclic Graph (DAG) or task graph, on a multiprocessor system has been an important research area in the past decades. List scheduling algorithms assign priorities to the nodes or edges of an input DAG and then generate a schedule according to the assigned priorities. This paper proposes a list scheduling algorithm with an effective method of priority assignment. The paper also analyzes the worst-case performance and the optimality condition of the proposed algorithm. The performance comparison study shows that the proposed algorithm outperforms existing scheduling algorithms, especially for input DAGs with high communication overheads. The performance improvement over existing algorithms becomes larger as the input DAG becomes denser and the level of parallelism in the DAG increases.
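
The general list scheduling procedure described above (assign priorities, then schedule in priority order) can be sketched as follows. This is a generic textbook-style sketch, not the priority assignment proposed in the paper: it assumes the bottom level (longest path to an exit task) as the priority and ignores communication costs. The function name `list_schedule` and the sample graph are hypothetical.

```python
def list_schedule(durations, successors, num_procs):
    """Schedule a task DAG on num_procs identical processors.

    The priority of a task is its bottom level: the longest path, by
    duration, from the task to any exit task. Communication costs
    are ignored in this sketch.
    """
    blevel = {}

    def bottom(t):
        # Memoized longest path from t to an exit task.
        if t not in blevel:
            blevel[t] = durations[t] + max(
                (bottom(s) for s in successors.get(t, [])), default=0)
        return blevel[t]

    for t in durations:
        bottom(t)

    preds = {t: set() for t in durations}
    for t, succs in successors.items():
        for s in succs:
            preds[s].add(t)

    finish = {}                    # task -> finish time, once scheduled
    proc_free = [0] * num_procs    # time at which each processor is free
    schedule = {}                  # task -> (processor, start time)
    remaining = set(durations)
    while remaining:
        # A task is ready once all of its predecessors are scheduled.
        ready = [t for t in remaining if all(p in finish for p in preds[t])]
        # Highest bottom level first; ties broken by task name.
        task = min(ready, key=lambda t: (-blevel[t], t))
        proc = min(range(num_procs), key=lambda i: proc_free[i])
        start = max([proc_free[proc]] + [finish[p] for p in preds[task]])
        finish[task] = start + durations[task]
        proc_free[proc] = finish[task]
        schedule[task] = (proc, start)
        remaining.remove(task)
    return schedule, max(finish.values(), default=0)


# Hypothetical diamond-shaped task graph: a -> {b, c} -> d.
durations = {"a": 2, "b": 3, "c": 1, "d": 2}
successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
schedule, makespan = list_schedule(durations, successors, num_procs=2)
print(schedule, makespan)
```

With two processors the sample graph finishes at time 7, which equals the critical path a, b, d. Accounting for communication overhead would change both the priority computation and the choice of processor, and the abstract reports that those are the inputs where the proposed algorithm gains the most.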

Process Algebra for Multiple Shared Resources (다중 공유 자원을 위한 프로세스 대수)

Yoo, Hee-Jun;Lee, Ki-Huen;Choi, Jin-Young
Journal of KIISE: Computer Systems and Theory, v.27 no.3, pp.337-344, 2000
In this paper, we define a process algebra, ACSMR (Algebra of Communicating Shared Multiple Resources), for the specification and verification of systems that use multiple resources. ACSMR extends ACSR, a formal method based on process algebra, with the concept of multiple resources. We present two specification and verification examples. One is the specification of system behavior on a multiprocessor using EDF (Earliest-Deadline-First), a scheduling algorithm for real-time systems. The other is a specification describing timing analysis and resource restrictions in a superscalar processor using multi-port registers.
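
For readers unfamiliar with EDF, the following sketch simulates preemptive Earliest-Deadline-First scheduling in unit time steps. It is a plain simulation on a single processor, not an ACSMR specification and not the multiprocessor setting of the paper's example; the function name `edf_schedule` and the sample jobs are hypothetical.

```python
def edf_schedule(jobs):
    """Simulate preemptive EDF on one processor in unit time steps.

    jobs maps a name to (release, execution_time, absolute_deadline),
    all non-negative integers. Returns (timeline, missed), where
    timeline[t] is the job run in slot t (None if idle) and missed is
    the set of jobs that finished after their deadline.
    """
    remaining = {name: c for name, (_, c, _) in jobs.items()}
    timeline, missed = [], set()
    t = 0
    while any(remaining.values()):
        ready = [n for n, (r, _, _) in jobs.items()
                 if r <= t and remaining[n] > 0]
        if ready:
            # Run the released job with the earliest absolute deadline.
            job = min(ready, key=lambda n: (jobs[n][2], n))
            remaining[job] -= 1
            if remaining[job] == 0 and t + 1 > jobs[job][2]:
                missed.add(job)
            timeline.append(job)
        else:
            timeline.append(None)   # no released job: idle slot
        t += 1
    return timeline, missed


# Hypothetical job set: name -> (release, execution time, deadline).
jobs = {"A": (0, 2, 5), "B": (1, 1, 2), "C": (0, 1, 6)}
timeline, missed = edf_schedule(jobs)
print(timeline, missed)
```

In the sample run, job B preempts job A at time 1 because its deadline is earlier, and all three jobs meet their deadlines.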

Theoretical Performance Bounds and Parallelization of a Two-Dimensional Packing Algorithm (이차원 팩킹 알고리즘의 이론적 성능 분석과 병렬화)

Hwang, In-Jae;Hong, Dong-Kweon
The KIPS Transactions: Part A, v.10A no.1, pp.43-48, 2003
A two-dimensional packing algorithm can be used for allocating submeshes in mesh multiprocessor systems. Previously, we developed an efficient packing algorithm called the TP heuristic and showed how the packing results could be used for allocating submeshes. In this paper, we present theoretical performance bounds for the TP heuristic. We also present a parallel version of the algorithm that takes less time when executed by multiple processors in a mesh multiprocessor.
https://doi.org/10.3745/KIPSTA.2003.10A.1.043

Design and implementation of a dynamic controller for Hong-Ik Direct Drive Arm (홍익 직접 구동팔의 동적 제어기 개발)

이재완;이종수;최경삼
제어로봇시스템학회:학술대회논문집, 1993.10a, pp.1052-1057, 1993
A SCARA-type Direct Drive Arm (DDA) with two degrees of freedom is designed and implemented. A direct drive motor is used to supply large torque and to reduce the modeling error caused by gears and chains. To control the DDA, a multiprocessor control structure with a multirate dynamic control algorithm is designed. In the control algorithm, the system dynamics are used to calculate the nominal control torque, and the feedback controls are calculated with a parallel processing algorithm for each joint. Laboratory experiments on the Hong-Ik DDA with the dynamic control algorithm are presented and compared with those of a PID control algorithm. The results show that the proposed controller guarantees small trajectory error and stability. With this research, the Hong-Ik DDA is expected to be utilized as a basic tool for robotics and control engineering.
