• Title/Summary/Keyword: multi-core (멀티코어)

Search Result 415, Processing Time 0.025 seconds

IPC-based Dynamic SM management on GPGPU for Executing AES Algorithm

  • Son, Dong Oh;Choi, Hong Jun;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information / v.25 no.2 / pp.11-19 / 2020
  • Modern GPUs can execute general-purpose computation and provide high performance by exploiting their many cores. Running the AES algorithm efficiently requires parallel computational resources. However, the computational resources of a CPU architecture are insufficient for cryptographic algorithms such as AES, whereas a GPU architecture offers massively parallel computational resources. Therefore, this paper reduces AES execution time by employing the parallel computational resources of a GPGPU. Unfortunately, AES cannot fully utilize those resources because it is not well suited to the GPGPU architecture. In this paper, an IPC-based dynamic SM management technique is proposed to execute AES efficiently on a GPGPU. The technique increases or decreases the number of active SMs at run time according to the measured IPC. According to simulation results, the proposed technique improves performance by increasing resource utilization compared with the baseline GPGPU architecture. The results show that AES performance improves by 41.2% on average.
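
The abstract does not state the rule used to adjust the SM count, so the following is only a minimal sketch of one plausible controller. It assumes a hill-climbing policy sampled once per interval; the function name, the `tolerance` threshold, and the SM limits are hypothetical and not taken from the paper.

```python
def next_active_sms(active_sms, prev_ipc, curr_ipc, last_step,
                    min_sms=1, max_sms=16, tolerance=0.02):
    """Return (new_active_sms, step) for the next sampling interval.

    Assumed policy (not from the paper): keep moving in the same direction
    while IPC improves, reverse direction when IPC drops, and hold the
    current SM count when IPC is roughly flat.
    """
    if prev_ipc <= 0:
        change = 0.0
    else:
        change = (curr_ipc - prev_ipc) / prev_ipc

    if change > tolerance:
        # IPC improved: continue in the previous direction (default: add an SM).
        step = last_step if last_step != 0 else 1
    elif change < -tolerance:
        # IPC got worse: undo the previous direction (default: remove an SM).
        step = -last_step if last_step != 0 else -1
    else:
        step = 0

    # Clamp the result to the allowed range of active SMs.
    new_active = max(min_sms, min(max_sms, active_sms + step))
    return new_active, step
```

For example, if the previous interval removed one SM and IPC rose from 1.0 to 1.2, the sketch removes another SM; if IPC instead fell to 0.8, it adds one back.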

An Admission Control for End-to-end Performance Guarantee in Next Generation Networks (Next Generation Networks에서의 단대단 성능 보장형 인입제어)

  • Joung, Jin-Oo;Choi, Jeong-Min
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.8B / pp.1141-1149 / 2010
  • Next Generation Networks (NGN) are defined as IP-based networks that support multiple services over multiple access networks. A variety of services and access technologies coexist within NGN. Consequently, numerous transport technologies are in use, such as Differentiated Services (DiffServ), Multi-protocol Label Switching (MPLS), and combinations of the two. In such an environment, flows are aggregated and de-aggregated multiple times along their end-to-end paths. In this research, we propose a method for calculating the end-to-end delay bound for such a flow, provided that information about flow aggregates is exchanged among networks, in particular the maximum burst size of a flow aggregate entering a network. We suggest an admission control mechanism that can decide whether the requested performance for a flow can be met. We further verify the suggested calculation and admission algorithm with a few realistic scenarios.
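
The abstract does not give the delay-bound formula, so the sketch below substitutes a standard network-calculus simplification rather than the authors' method. It assumes each network behaves as a rate-latency server and that the aggregate is token-bucket constrained, which gives a per-network bound of latency + burst / service rate and a burst that grows by rate × latency at each hop. All names and parameters are hypothetical.

```python
def end_to_end_delay_bound(burst, rate, networks):
    """Sum per-network delay bounds for a flow aggregate.

    burst:    maximum burst size entering the first network (bits)
    rate:     sustained rate of the aggregate (bits/s)
    networks: list of (service_rate, latency) pairs, one per network,
              each modeled as a rate-latency server (an assumption)
    """
    total = 0.0
    for service_rate, latency in networks:
        if rate > service_rate:
            # The network cannot sustain the aggregate: no finite bound.
            return float("inf")
        total += latency + burst / service_rate
        # Burst size reported to the next network grows with the latency.
        burst += rate * latency
    return total


def admit(requested_delay, burst, rate, networks):
    """Admit the flow only if the computed bound meets the request."""
    return end_to_end_delay_bound(burst, rate, networks) <= requested_delay
```

The burst update step is where the exchanged information matters: each network needs the maximum burst size of the aggregate as it enters, not as it left the source.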

Performance Improvement of Prediction-Based Parallel Gate-Level Timing Simulation Using Prediction Accuracy Enhancement Strategy (예측정확도 향상 전략을 통한 예측기반 병렬 게이트수준 타이밍 시뮬레이션의 성능 개선)

  • Yang, Seiyang
    • KIPS Transactions on Computer and Communication Systems / v.5 no.12 / pp.439-446 / 2016
  • In this paper, an efficient prediction accuracy enhancement strategy is proposed to improve the performance of prediction-based parallel event-driven gate-level timing simulation. The proposed strategy adopts static double prediction and dynamic prediction for the input and output values of local simulations. Double prediction uses a second set of static prediction data for a secondary prediction once the first prediction fails, and dynamic prediction uses the ongoing simulation results, accumulated during the actual parallel simulation run, as prediction data. As a result, the communication and synchronization overheads, which are the main bottlenecks of parallel simulation, are reduced as far as possible. With the two proposed prediction enhancement techniques, we observed about a 5x simulation performance improvement over a commercial parallel multi-core simulation for six test designs.
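
As a rough illustration of why additional prediction sources reduce synchronization, the sketch below counts how many time steps of a local simulation would still need to synchronize. It is a toy model under stated assumptions: values are compared per time step, the dynamic prediction is simply the most recently observed value, and the function name is hypothetical. The paper's actual data structures and ordering are not described in the abstract.

```python
def count_synchronizations(actual, primary, secondary):
    """Count time steps where no prediction matches the actual value.

    actual:    values produced by the real simulation, one per time step
    primary:   first static prediction per time step
    secondary: second static prediction, tried when the first one fails
    The dynamic prediction is modeled as the last observed actual value
    (an assumption made for this sketch).
    """
    syncs = 0
    last_observed = None  # dynamic prediction data accumulated at run time
    for step, value in enumerate(actual):
        candidates = (primary[step], secondary[step], last_observed)
        if value not in candidates:
            # Every prediction failed: the local simulation must synchronize.
            syncs += 1
        last_observed = value
    return syncs
```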

Implementation of Real-time Data Stream Processing for Predictive Maintenance of Offshore Plants (해양플랜트의 예지보전을 위한 실시간 데이터 스트림 처리 구현)

  • Kim, Sung-Soo;Won, Jongho
    • Journal of KIISE / v.42 no.7 / pp.840-845 / 2015
  • In recent years, Big Data has been a topic of great interest for the production and operation work of offshore plants as well as for enterprise resource planning. The ability to predict future equipment performance from historical results can be useful for shifting assets to more productive areas. Specifically, a centrifugal compressor is one of the major pieces of equipment in offshore plants. This machinery is very dangerous because it can explode when it fails, so its performance must be monitored in real time. In this paper, we present a stream data processing architecture that can be used to compute the performance of the centrifugal compressor. Our system consists of two major components: a virtual tag stream generator and a real-time data stream manager. To make the system scalable, we use a parallel programming approach that exploits multi-core CPUs to process the massive amount of stream data. In addition, we provide experimental evidence that demonstrates improvements in stream data processing for the centrifugal compressor.

An Effective Parallel Implementation of Sound Synthesis of Guitar using GPU (GPU를 이용한 기타의 음 합성을 위한 효과적인 병렬 구현)

  • Kang, Sung-Mo;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.18 no.8 / pp.1-8 / 2013
  • This paper proposes an effective parallel implementation of physical modeling synthesis of a guitar in a GPU environment. We used appropriate filter coefficients and adjusted the length of the delay line for each open string to generate 44,100 six-polyphonic guitar sounds (E2, A2, D3, G3, B3, E4) by physical modeling synthesis. In addition, we analyzed the physical modeling synthesis algorithm and observed that the parallelism inherent in the length of the delay line can be exploited. Thus, we assigned as many CUDA cores as the length of the delay line and implemented the physical modeling synthesis on the GPU to achieve the highest performance. Experimental results indicated that the guitar sounds synthesized on the GPU were very similar to the original sounds when their spectra were compared. In addition, the GPU achieved 68x and 3x better performance than a high-performance TI DSP and a CPU, respectively. Furthermore, this paper implemented and evaluated multi-GPU systems for the physical modeling algorithm.
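
To make the role of the delay line concrete, here is a minimal sequential sketch using the well-known Karplus-Strong plucked-string algorithm. The abstract does not name the synthesis algorithm or its filter coefficients, so Karplus-Strong, the 44,100 Hz sample rate, the `damping` value, and all function names are assumptions for illustration only. The delay-line length is the sample rate divided by the string's fundamental frequency, which is why each open string needs its own length.

```python
import random

SAMPLE_RATE = 44100  # samples per second (assumed)

# Fundamental frequencies of the open strings in standard tuning (Hz).
OPEN_STRINGS = {"E2": 82.41, "A2": 110.00, "D3": 146.83,
                "G3": 196.00, "B3": 246.94, "E4": 329.63}


def delay_line_length(frequency_hz, sample_rate=SAMPLE_RATE):
    """Number of delay-line samples needed for the given pitch."""
    return round(sample_rate / frequency_hz)


def pluck(frequency_hz, num_samples, damping=0.996, seed=0):
    """Synthesize a plucked string with a Karplus-Strong style loop."""
    rng = random.Random(seed)
    n = delay_line_length(frequency_hz)
    # Excite the string by filling the delay line with white noise.
    line = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    out = []
    for i in range(num_samples):
        current = line[i % n]
        following = line[(i + 1) % n]
        out.append(current)
        # Low-pass filter: damped average of two neighboring samples.
        line[i % n] = damping * 0.5 * (current + following)
    return out
```

Calling `pluck(OPEN_STRINGS["E2"], SAMPLE_RATE)` yields one second of the low E string. A GPU version along the lines of the abstract would distribute the delay-line elements across CUDA cores; this sketch keeps the loop sequential for clarity.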

MPEG-D USAC: Unified Speech and Audio Coding Technology (MPEG-D USAC: 통합 음성 오디오 부호화 기술)

  • Lee, Tae-Jin;Kang, Kyeong-Ok;Kim, Whan-Woo
    • The Journal of the Acoustical Society of Korea / v.28 no.7 / pp.589-598 / 2009
  • As mobile devices become multi-functional and converge into a single platform, there is a strong need for a codec that provides consistent quality for both speech and music content. MPEG-D USAC standardization activities started at the 82nd MPEG meeting with a CfP, and WD3 was approved at the 88th MPEG meeting. MPEG-D USAC is a convergence of the AMR-WB+ and HE-AAC V2 technologies. Specifically, USAC uses three core codecs (AAC, ACELP, and TCX) for low-frequency regions, SBR for high-frequency regions, and the MPEG Surround tool for stereo information. USAC provides consistent sound quality for both speech and music content and can be applied to various applications such as multimedia download to mobile devices, digital radio, mobile TV, and audio books.

Design and Evaluation of a High-performance Journaling Scheme for Non-volatile Memory (비휘발성 메모리를 고려한 고성능 저널링 기법 설계 및 평가)

  • Han, Hyuck
    • The Journal of the Korea Contents Association / v.20 no.8 / pp.368-374 / 2020
  • Journaling file systems (JFS) record file system changes that are not yet committed in a data structure known as a journal, so that the file system can be restored after an unexpected failure. The extra write operations required for journaling degrade the performance of JFS. High-performance, byte-addressable non-volatile memory (NVM) was expected to mitigate these performance problems easily by providing NVM space as journal storage. However, even with such non-volatile memory technologies, performance problems still arise because of scalability problems inherent in how JFS processes transactions. To solve this problem, we propose a technique for processing file system transactions with scalable performance. To this end, lock-free data structures are used, and multiple I/O requests are processed simultaneously on high-performance storage devices with multiple I/O channels. We evaluate a file system with the proposed technique against the original ext4 file system and a recently proposed NVM-based journaling file system on a multi-core server. Experimental results show that our file system performs up to 2.9 times and 2.3 times better than the original ext4 file system and the recent NVM-based journaling file system, respectively.
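
The following sketch illustrates only the basic journaling principle from the first sentence of the abstract: uncommitted changes live in a journal, and recovery applies committed transactions while discarding the rest. It does not model the paper's lock-free structures or multi-channel I/O, and the class and method names are hypothetical.

```python
class JournaledStore:
    """Toy key-value store with a write-ahead journal."""

    def __init__(self):
        self.data = {}     # stands in for the on-disk file system state
        self.journal = []  # stands in for the journal area (e.g., on NVM)

    def log_write(self, txid, updates):
        """Record a transaction's updates in the journal before applying them."""
        self.journal.append(("write", txid, dict(updates)))

    def commit(self, txid):
        """Mark a transaction as committed."""
        self.journal.append(("commit", txid, None))

    def recover(self):
        """Replay committed transactions in order; drop uncommitted ones."""
        committed = {txid for kind, txid, _ in self.journal if kind == "commit"}
        for kind, txid, updates in self.journal:
            if kind == "write" and txid in committed:
                self.data.update(updates)
        self.journal.clear()
```

If transaction 1 is committed and transaction 2 is only logged before a crash, `recover()` restores the effects of transaction 1 alone.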

Hybrid Method using Frame Selection and Weighting Model Rank to improve Performance of Real-time Text-Independent Speaker Recognition System based on GMM (GMM 기반 실시간 문맥독립화자식별시스템의 성능향상을 위한 프레임선택 및 가중치를 이용한 Hybrid 방법)

  • 김민정;석수영;김광수;정호열;정현열
    • Journal of Korea Multimedia Society / v.5 no.5 / pp.512-522 / 2002
  • In this paper, we propose a hybrid method that combines frame selection with the weighting model rank method, based on the GMM (Gaussian mixture model), for a real-time text-independent speaker recognition system. In the system, maximum likelihood estimation is used for GMM parameter optimization, and maximum likelihood is the basic recognition criterion. The proposed hybrid method has two steps. First, a likelihood score is calculated at the frame level from the speaker models and the test data, and the difference between the largest and second-largest likelihood values is computed. The frame is selected if this difference is larger than a threshold. Second, for each selected frame, a weighting value is used instead of the calculated likelihood to compute the total score. Cepstrum coefficients and regressive coefficients were used as feature parameters. The test and training database consists of data collected at different times, and the data for the experiments were selected randomly. In the experiments, we applied each method to the baseline system and tested it. In the speaker recognition experiments, the proposed hybrid method achieved on average 4% higher recognition accuracy than the frame selection method and 1% higher than the weighting model rank (W) method, which demonstrates its effectiveness.
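
The two steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-frame log-likelihoods for each speaker model have already been computed, and the function name, the `rank_weights` values, and the threshold are hypothetical.

```python
def identify_speaker(frame_scores, threshold, rank_weights):
    """Pick a speaker using frame selection plus rank-based weighting.

    frame_scores: list of dicts mapping speaker -> log-likelihood, one per frame
    threshold:    minimum gap between the best and second-best score
    rank_weights: weight given to the 1st, 2nd, ... ranked speaker in a frame
    """
    totals = {}
    for scores in frame_scores:
        ranked = sorted(scores, key=scores.get, reverse=True)
        # Step 1, frame selection: keep only frames with a clear winner.
        if len(ranked) < 2 or scores[ranked[0]] - scores[ranked[1]] <= threshold:
            continue
        # Step 2, weighting model rank: add a rank weight, not the likelihood.
        for rank, speaker in enumerate(ranked[:len(rank_weights)]):
            totals[speaker] = totals.get(speaker, 0.0) + rank_weights[rank]
    if not totals:
        return None  # no frame passed the selection step
    return max(totals, key=totals.get)
```

Replacing raw likelihoods with rank weights keeps a single frame with an extreme score from dominating the total, while frame selection discards frames in which the top two models are nearly tied.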

Implementation of an Effective Educational Community Service System by using Metadata and Category (MetaData와 Category를 이용한 효과적인 교육용 커뮤니티 서비스 시스템(ECSS) 구현)

  • Yoon, Sun-Jung;Kim, Mi-Jin;Kim, Chee-Yong
    • Journal of Korea Multimedia Society / v.9 no.10 / pp.1332-1343 / 2006
  • This paper proposes an educational community service system that intensively manages high-quality information without duplication and provides an effective search function by using a personal community with many user layers. The system raises the efficiency of search and management by using Metadata and Category. It is a self-directed educational community system that highlights the strengths and improves the weak points of existing approaches. To verify the system, we built our own Blog chain service, called the 'EduLOG (Educational Blog) service'. In particular, we extracted Metadata suitable for this service on the basis of the world-famous Dublin Core Metadata. We also created a new category on the basis of categories proposed by several educational community sites and public educational authorities, and applied it to the system. To determine whether the service system provides adequate functions, we prepared a questionnaire based on a website appraisal table and had it evaluated by experts. The system received good scores (above 3.5/5.0) for accuracy of evaluation, low-end reappearance ratio, ease of registering and accessing information, and intensive management of categorical information, which confirms the efficiency of the ECSS system. We therefore firmly believe that the ECSS system will play an efficient role in information storage and search in the near future.

Design and Implementation of Initial OpenSHMEM Based on PCI Express (PCI Express 기반 OpenSHMEM 초기 설계 및 구현)

  • Joo, Young-Woong;Choi, Min
    • KIPS Transactions on Computer and Communication Systems / v.6 no.3 / pp.105-112 / 2017
  • PCI Express is a bus technology that connects the processor to peripheral I/O devices and is widely used as an industry standard because of its high speed and low power consumption. In addition, PCI Express can serve as a system interconnect technology, like Ethernet and InfiniBand, in high-performance computing and computer clusters. The PGAS (partitioned global address space) programming model is often used to implement one-sided RDMA (remote direct memory access) in multi-host systems such as computer clusters. In this paper, we design and implement an OpenSHMEM API based on PCI Express that maintains the existing features of OpenSHMEM in order to implement RDMA over PCI Express. We evaluate the implemented OpenSHMEM API with a matrix multiplication example on a system in which PCs are connected using the NTB (non-transparent bridge) technology of PCI Express. PCI Express interconnection networks are currently very expensive and are not yet widely available to the general public. Nevertheless, we implemented and evaluated a PCI Express-based interconnection network on the RDK evaluation board. In addition, we implemented the OpenSHMEM software stack, which has recently attracted great interest.