• Title/Summary/Keyword: multiprocessing (멀티프로세싱)

Implementation of SIMD-based Many-Core Processor for Efficient Image Data Processing (효율적인 영상데이터 처리를 위한 SIMD기반 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.1
    • /
    • pp.1-9
    • /
    • 2011
  • As mobile multimedia devices become more widely used, the need for high-performance, low-energy multimedia processors is increasing. Application-specific integrated circuits (ASICs) can meet the performance that mobile multimedia requires, but they provide limited, if any, generality for varied application requirements. DSP-based systems can be used for many types of applications because of their generality, but they cost more, consume more energy, and deliver lower performance than ASICs. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) based many-core processor that supports high-performance, low-power image data processing while retaining generality. The proposed SIMD-based many-core processor, composed of 16 processing elements (PEs), exploits the large data parallelism inherent in image data processing. Experimental results indicate that the proposed SIMD-based many-core processor achieves higher performance (22 times better), energy efficiency (7 times better), and area efficiency (3 times better) than conventional commercial high-performance processors.
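
The data-parallel idea in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical mapping in which a row of pixels is split into 16 contiguous chunks, one per PE, and every PE applies the same operation. The function names and the brightness operation are illustrative assumptions and are not taken from the paper.

```python
NUM_PES = 16  # the abstract states 16 processing elements


def split_among_pes(pixels, num_pes=NUM_PES):
    """Split a flat pixel list into num_pes contiguous chunks (hypothetical mapping)."""
    chunk = -(-len(pixels) // num_pes)  # ceiling division
    return [pixels[i * chunk:(i + 1) * chunk] for i in range(num_pes)]


def simd_apply(chunks, op):
    """Apply one operation to every chunk, mimicking one instruction on multiple data."""
    return [[op(p) for p in chunk] for chunk in chunks]


def brighten(pixel, delta=10):
    """Saturating add on an 8-bit pixel value."""
    return min(255, pixel + delta)


row = list(range(0, 256, 4))      # 64 sample pixel values
chunks = split_among_pes(row)     # 4 pixels per PE
result = [p for chunk in simd_apply(chunks, brighten) for p in chunk]
```

In real SIMD hardware the PEs execute the operation in lockstep; the Python loop only models which data each PE would own.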

VDI deployment and performance analysis for multi-core-based applications (멀티코어 기반 어플리케이션 운용을 위한 데스크탑 가상화 구성 및 성능 분석)

  • Park, Junyong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.10
    • /
    • pp.1432-1440
    • /
    • 2022
  • Recently, as Virtual Desktop Infrastructure (VDI) has come to be widely used not only in office work environments but also for workloads that run high-spec multi-core-based applications, the requirements for real-time responsiveness and stability of VDI are increasing. Accordingly, the display protocol used for remote access in VDI and the performance optimization of virtual machines have also become more important. In this paper, we propose two ways to configure desktop virtualization for running multi-core-based applications. First, we propose a codec configuration for the display protocol that gives optimal performance under the high load caused by multiprocessing. Second, we propose a virtual CPU scheduling optimization method that reduces scheduling delay when virtual machines contend for the CPU. The tests confirmed that the H.264 codec of Blast Extreme delivered the best and most stable frame performance and that virtual CPU scheduling performance improved through the scheduling optimization.

A Service Architecture to support IP Multicast Service over UNI 4.0 based ATM Networks (UNI 4.0 기반 ATM 망에서의 IP 멀티캐스트 지원 방안을 위한 서비스 구조)

  • Lee, Mee-Jeong;Jung, Sun;Kim, Ye-kyung
    • Journal of KIISE:Information Networking
    • /
    • v.27 no.3
    • /
    • pp.348-359
    • /
    • 2000
  • Most important real-time multimedia applications require multipoint transmission. To support these applications in ATM-based Internet environments, it is important to provide efficient IP multicast transport over ATM networks. The IETF proposed MARS (Multicast Address Resolution Server) as the service architecture for transporting connectionless IP multicast flows over connection-oriented ATM VCs. MARS assumes UNI 3.0/3.1 signalling. Since UNI 3.0/3.1 does not provide any means for receivers to request to join a multicast ATM VC, MARS provides an overlay service that relays join requests from IP multicast group members to the sources of the multicast group. Later, the ATM Forum standardized UNI 4.0 signalling, which includes a new signalling mechanism called LIJ (Leaf Initiated Join). LIJ enables receivers to signal the source of an ATM VC directly in order to join. In this paper, we propose a new service architecture that provides IP multicast flow transport over ATM networks deploying UNI 4.0 signalling. The proposed architecture is named UNI4MARS. It comprises the same service components as MARS. The main function of UNI4MARS is to provide source information to the receivers so that they can use LIJ to join multicast ATM VCs dynamically. The implementation overhead of UNI4MARS and that of MARS are compared through a series of simulations. The simulation results show that UNI4MARS supports dynamic IP multicast group changes more efficiently with respect to processing, memory, and bandwidth overhead.

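Threats and Countermeasures for Password Attacks on Accredited Certificates through Execution of the SEED Cipher Algorithm on GPUs (GPU에서의 SEED암호 알고리즘 수행을 통한 공인인증서 패스워드 공격 위협과 대응)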

  • Kim, Jong-Hoi;Ahn, Ji-Min;Kim, Min-Jae;Joo, Yons-Sik
    • Review of KIISC
    • /
    • v.20 no.6
    • /
    • pp.43-50
    • /
    • 2010
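  • As the computing power of GPUs (graphics processing units), which rely on parallel processing, grows faster by the day, interest in GPUs is rising. A GPU consists of tens of times more cores than a CPU so that it can process many threads, and these cores can share processing results to enable parallel programming. The effectiveness of hash authentication attacks that exploit this GPU computing power has recently been demonstrated many times overseas, and GPU-based authentication attacks are also being attempted in Korea, where password-based authentication is widespread. In this paper, we improve the private-key decryption process of the accredited certificates used in the Korean financial sector so that it can run at high speed on a GPU. On this basis, we attempt a brute-force password attack to show that the passwords used for accredited certificates are no longer a safe zone for security. We also propose improvements to the password policies commonly applied to accredited certificates and similar systems, in step with ever-increasing hardware computing speeds.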

Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder (JPEG 인코더를 위한 고성능 병렬 프로세서 하드웨어 설계 및 검증)

  • Kim, Yong-Min;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.6 no.2
    • /
    • pp.100-107
    • /
    • 2011
  • As the use of mobile multimedia devices has increased in recent years, the need for high-performance multimedia processors is growing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and operates with a 3-stage pipeline. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance and energy efficiency. In addition, the proposed parallel processor architecture was developed and verified with Verilog HDL and an FPGA prototype system.

A Design and Implementation of Packet Processing Engine for Handling Large Volumes of Traffic (대용량 트래픽 처리를 위한 패킷 처리 엔진 설계 및 구현)

  • Yoon, Joo-Yeong;Kim, Myoung-Soo;Chang, Hoon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2020.07a
    • /
    • pp.325-326
    • /
    • 2020
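  • Recently, under the influence of 5G, the number of people and devices connected to the Internet has grown further, and traffic volume is rising sharply as new Internet of Things (IoT) applications become possible. However, many Korean companies rely on expensive foreign products to analyze this traffic. These products are used mainly to store and display statistical data about the traffic handled on the network, and they have the drawback of making detailed packet analysis difficult. This paper therefore proposes an efficient packet processing engine for handling large volumes of traffic. The engine uses multiple core processes to make maximal use of system resources and, through multiprocessing, keeps the workload of each node balanced, which reduces task waiting time and minimizes the execution time of each task. The proposed packet processing engine is shown to be superior to existing packet processing engines for traffic handling in terms of improving the performance of high-performance computing systems.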
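
The abstract does not say how the engine keeps node workloads balanced. As a minimal sketch of one common approach, the following assumes a greedy least-loaded policy in which each task goes to the worker with the smallest accumulated cost. The function name, the task costs, and the policy itself are illustrative assumptions, not the paper's algorithm.

```python
import heapq


def assign_least_loaded(task_costs, num_workers):
    """Greedily give each task to the worker with the smallest current load."""
    heap = [(0, w) for w in range(num_workers)]  # entries are (load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    for task_id, cost in enumerate(task_costs):
        load, worker = heapq.heappop(heap)       # least-loaded worker so far
        assignment[worker].append(task_id)
        heapq.heappush(heap, (load + cost, worker))
    loads = {w: sum(task_costs[t] for t in tasks) for w, tasks in assignment.items()}
    return assignment, loads


# Six hypothetical packet batches with different processing costs, three workers.
assignment, loads = assign_least_loaded([5, 3, 8, 2, 7, 4], num_workers=3)
```

For these sample costs the resulting loads are 12, 9, and 8, which is closer to even than assigning tasks in fixed round-robin order (7, 10, 12).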

Web crawler Improvement and Dynamic process Design and Implementation for Effective Data Collection (효과적인 데이터 수집을 위한 웹 크롤러 개선 및 동적 프로세스 설계 및 구현)

  • Wang, Tae-su;Song, JaeBaek;Son, Dayeon;Kim, Minyoung;Choi, Donggyu;Jang, Jongwook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.11
    • /
    • pp.1729-1740
    • /
    • 2022
  • Recently, large amounts of data have been generated as information becomes more diverse and more widely used. The importance of big data analysis for collecting, storing, processing, and predicting data has grown, and the ability to collect only the necessary information is required. More than half of the web consists of text, and much data is generated through the organic interaction of users. Crawling is a representative technique for collecting text data, but many crawlers are developed without consideration for web servers or administrators because they focus only on obtaining data. In this paper, we design and implement an improved dynamic web crawler that fetches data efficiently, after examining the problems that may occur during crawling and the precautions to be considered. The crawler, which addresses the problems of existing crawlers, was designed as a multi-process program, and the work time was reduced by a factor of 4 on average.
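
A multi-process crawler of the kind described here might be organized as in the following minimal sketch. It assumes that URLs are grouped by host so that each worker process handles one host's pages in sequence, which is one way to avoid flooding a single server. The grouping policy, the function names, and the placeholder fetch are illustrative assumptions; the paper's actual design is not given in the abstract.

```python
from multiprocessing import Pool
from urllib.parse import urlsplit


def group_by_host(urls):
    """Group URLs by host so each worker handles one host's pages in sequence."""
    groups = {}
    for url in urls:
        groups.setdefault(urlsplit(url).netloc, []).append(url)
    return groups


def fetch_group(urls):
    """Placeholder fetch: a real crawler would request each URL with a polite delay."""
    return [(url, len(url)) for url in urls]  # stand-in for (url, page content)


def crawl(urls, workers=4):
    """Process each host's URL group in a separate worker process."""
    groups = list(group_by_host(urls).values())
    with Pool(processes=workers) as pool:
        return pool.map(fetch_group, groups)
```

When running this as a script, call `crawl(...)` under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start workers by spawning a fresh interpreter.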

Dynamic Power Management Framework for Mobile Multi-core System (모바일 멀티코어 시스템을 위한 동적 전력관리 프레임워크)

  • Ahn, Young-Ho;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.7
    • /
    • pp.52-60
    • /
    • 2010
  • In this paper, we propose a dynamic power management framework for multi-core systems. We reduced the power consumption of multi-core processors such as the Intel Centrino Duo and ARM11 MPCore, which are used in the consumer electronics and personal computer markets. Each processor uses a different technique to reduce its power usage, but no embedded multi-core processor has a precise power control mechanism such as dynamic voltage scaling. The proposed framework is suitable for smartphones whose operating system provides multiprocessing capability. Our framework follows the intuitive idea that reducing the power consumption of idle cores is the most effective way to lower the overall power consumption of a multi-core processor. We minimized the energy consumed by idle cores with application-targeted policies that reflect the characteristics of active workloads. We defined a set of application properties for analyzing performance requirements in real time and automated the management process so that results can be verified quickly. We tested the proposed framework on popular processors such as the Intel Centrino Duo and ARM11 MPCore and found that it dynamically reduced the power consumption of multi-core processors while satisfying the performance requirement of each program.

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.7
    • /
    • pp.305-317
    • /
    • 2008
  • As the mobile computing environment changes rapidly, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and size. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which takes into account that color components are perceptually less significant, supports parallel operations on two packed, compressed 16-bit YCbCr values (6 bits for Y and 5 bits each for Cb and Cr) in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the reduced data format size lowers system cost, and the reduction in data bandwidth simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor performance. This is in contrast to MMX (a representative Intel multimedia extension), which achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves performance and efficiency with a mere 3% increase in system area and a 5% increase in system power, while MMX requires a 14% increase in system area and a 16% increase in system power.
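
The packed data format described in this abstract can be sketched as follows: two 16-bit YCbCr pixels, each with 6 bits of Y and 5 bits each of Cb and Cr, placed in one 32-bit word. The field order within each 16-bit pixel and the position of each pixel within the word are assumptions for illustration, since the abstract does not specify them, and the function names are hypothetical.

```python
def pack_ycbcr16(y, cb, cr):
    """Pack Y (6 bits), Cb (5 bits), Cr (5 bits) into one 16-bit value.

    Placing Y in the top bits is an assumed layout, not the paper's.
    """
    assert 0 <= y < 64 and 0 <= cb < 32 and 0 <= cr < 32
    return (y << 10) | (cb << 5) | cr


def unpack_ycbcr16(value):
    """Recover (Y, Cb, Cr) from a 16-bit packed pixel."""
    return (value >> 10) & 0x3F, (value >> 5) & 0x1F, value & 0x1F


def pack_two_pixels(p0, p1):
    """Place two 16-bit pixels side by side in a 32-bit word (p0 in the low half)."""
    return (pack_ycbcr16(*p1) << 16) | pack_ycbcr16(*p0)


def unpack_two_pixels(word):
    """Split a 32-bit word back into its two (Y, Cb, Cr) pixels."""
    return unpack_ycbcr16(word & 0xFFFF), unpack_ycbcr16(word >> 16)


word = pack_two_pixels((63, 0, 31), (1, 2, 3))
```

With this assumed layout, `word` is `0x0443FC1F`, and `unpack_two_pixels(word)` returns the two original pixels.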

A Performance Improvement of Linux TCP/IP Stack based on Flow-Level Parallelism in a Multi-Core System (멀티코어 시스템에서 흐름 수준 병렬처리에 기반한 리눅스 TCP/IP 스택의 성능 개선)

  • Kwon, Hui-Ung;Jung, Hyung-Jin;Kwak, Hu-Keun;Kim, Young-Jong;Chung, Kyu-Sik
    • The KIPS Transactions:PartA
    • /
    • v.16A no.2
    • /
    • pp.113-124
    • /
    • 2009
  • As multicore systems become more common, much effort has gone into improving the performance of the applications that run on them. Because a multicore system contains multiple processing units, its processing power is greater than that of a single-core system. In many cases, however, the advantages of multicore cannot be fully exploited because existing software and hardware were designed for a single core. When existing software runs on a multicore system, its performance improvement is limited by bottlenecks on shared resources and by inefficient use of cache memory. As a result, performance does not improve as the number of cores increases, and in the worst case it drops. In this paper, we propose a method for improving the performance of multicore systems by applying flow-level parallelism to the existing TCP/IP network application and operating system. The proposed method sets up the execution environment so that each core operates as independently as possible across the network application, the TCP/IP stack in the operating system, the device driver, and the network interface. It also distributes network traffic to each core through an L2 switch. The method minimizes the sharing of application data, data structures, sockets, device drivers, and network interfaces between cores. It also minimizes contention among cores for resources and increases the cache hit ratio. We implemented the proposed method on an 8-core system and ran experiments. The results show that network access speed and bandwidth increase linearly with the number of cores.
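
The core idea of flow-level parallelism, keeping every packet of a flow on the same core, can be sketched in software as below. The abstract states that traffic is distributed through an L2 switch; the hash-based mapping here is a substitute technique chosen for illustration, and the function name and sample addresses are hypothetical.

```python
import zlib


def core_for_flow(src_ip, src_port, dst_ip, dst_port, proto, num_cores=8):
    """Map a flow's 5-tuple to a core index with a stable hash (assumed policy)."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}".encode()
    return zlib.crc32(key) % num_cores


packets = [
    ("10.0.0.1", 40000, "10.0.0.9", 80, "tcp"),
    ("10.0.0.2", 40001, "10.0.0.9", 80, "tcp"),
    ("10.0.0.1", 40000, "10.0.0.9", 80, "tcp"),  # same flow as the first packet
]
cores = [core_for_flow(*p) for p in packets]
```

Because the mapping depends only on the 5-tuple, the first and third packets always land on the same core, so per-flow state such as the socket and its cache lines need not be shared between cores.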