• Title/Summary/Keyword: multi-core processing

Search Result 218, Processing Time 0.026 seconds

Multi-Dimensional Record Scan with SIMD Vector Instructions (SIMD 벡터 명령어를 이용한 다차원 레코드 스캔)

  • Cho, Sung-Ryong;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.6
    • /
    • pp.732-736
    • /
    • 2010
  • Processing a large amount of data becomes more important than ever. Particularly, the information queries which require multi-dimensional record scan can be efficiently implemented with SIMD instruction sets. In this article, we present a SIMD record scan technique which employs row-based scanning. Our technique is different from existing SIMD techniques for predicate processes and aggregate operations. Those techniques apply SIMD instructions to the attributes in the same column of the database, exploiting the column-based record organization of the in-memory database systems. Whereas, our SIMD technique is useful for multi-dimensional record scanning. As the sizes of registers and the memory become larger, our row-based SIMD scan can have bigger impact on the performance. Moreover, since our technique is orthogonal to the parallelization techniques for multi-core processors, it can be applied to both uni-processors and multi-core processors without too many changes in the software architectures.

Dynamic Power Management Framework for Mobile Multi-core System (모바일 멀티코어 시스템을 위한 동적 전력관리 프레임워크)

  • Ahn, Young-Ho;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.7
    • /
    • pp.52-60
    • /
    • 2010
  • In this paper, we propose a dynamic power management framework for multi-core systems. We reduced the power consumption of multi-core processors such as Intel Centrino Duo and ARM11 MPCore, which have been used at the consumer electronics and personal computer market. Each processor uses a different technique to save its power usage, but there is no embedded multi-core processor which has a precise power control mechanism such as dynamic voltage scaling technique. The proposed dynamic power management framework is suitable for smart phones which have an operating system to provide multi-processing capability. Basically, our framework follows an intuitive idea that reducing the power consumption of idle cores is the most effective way to save the overall power consumption of a multi-core processor. We could minimize the energy consumption used by idle cores with application-targeted policies that reflect the characteristics of active workloads. We defined some properties of an application to analyze the performance requirement in real time and automated the management process to verify the result quickly. We tested the proposed framework with popular processors such as Intel Centrino Duo and ARM11 MPCore, and were able to find that our framework dynamically reduced the power consumption of multi-core processors and satisfied the performance requirement of each program.

Inspection of guided missiles applied with parallel processing algorithm (병렬처리 알고리즘 적용 유도탄 점검)

  • Jung, Eui-Jae;Koh, Sang-Hoon;Lee, You-Sang;Kim, Young-Sung
    • Journal of Advanced Navigation Technology
    • /
    • v.25 no.4
    • /
    • pp.293-298
    • /
    • 2021
  • In general, the guided weapon seeker and the guided control device process the target, search, recognition, and capture information to indicate the state of the guided missile, and play a role in controlling the operation and control of the guided weapon. The signals required for guided weapons are gaze change rate, visual signal, and end-stage fuselage orientation signal. In order to process the complex and difficult-to-process missile signals of recent missiles in real time, it is necessary to increase the data processing speed of the missiles. This study showed the processing speed after applying the stop and go and inverse enumeration algorithm among the parallel algorithm methods of PINQ and comparing the processing speed of the signal data required for the guided missile in real time using the guided missile inspection program. Based on the derived data processing results, we propose an effective method for processing missile data when applying a parallel processing algorithm by comparing the processing speed of the multi-core processing method and the single-core processing method, and the CPU core utilization rate.

MSFM: Multi-view Semantic Feature Fusion Model for Chinese Named Entity Recognition

  • Liu, Jingxin;Cheng, Jieren;Peng, Xin;Zhao, Zeli;Tang, Xiangyan;Sheng, Victor S.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1833-1848
    • /
    • 2022
  • Named entity recognition (NER) is an important basic task in the field of Natural Language Processing (NLP). Recently deep learning approaches by extracting word segmentation or character features have been proved to be effective for Chinese Named Entity Recognition (CNER). However, since this method of extracting features only focuses on extracting some of the features, it lacks textual information mining from multiple perspectives and dimensions, resulting in the model not being able to fully capture semantic features. To tackle this problem, we propose a novel Multi-view Semantic Feature Fusion Model (MSFM). The proposed model mainly consists of two core components, that is, Multi-view Semantic Feature Fusion Embedding Module (MFEM) and Multi-head Self-Attention Mechanism Module (MSAM). Specifically, the MFEM extracts character features, word boundary features, radical features, and pinyin features of Chinese characters. The acquired font shape, font sound, and font meaning features are fused to enhance the semantic information of Chinese characters with different granularities. Moreover, the MSAM is used to capture the dependencies between characters in a multi-dimensional subspace to better understand the semantic features of the context. Extensive experimental results on four benchmark datasets show that our method improves the overall performance of the CNER model.

A Packet Processing of Handling Large-capacity Traffic over 20Gbps Method Using Multi Core and Huge Page Memory Approache

  • Kwon, Young-Sun;Park, Byeong-Chan;Chang, Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.6
    • /
    • pp.73-80
    • /
    • 2021
  • In this paper, we propose a packet processing method capable of handling large-capacity traffic over 20Gbps using multi-core and huge page memory approaches. As ICT technology advances, the global average monthly traffic is expected to reach 396 exabytes by 2022. With the increase in network traffic, cyber threats are also increasing, increasing the importance of traffic analysis. Traffic analyzed as an existing high-cost foreign product simply stores statistical data and visually shows it. Network administrators introduce and analyze many traffic analysis systems to analyze traffic in various sections, but they cannot check the aggregated traffic of the entire network. In addition, since most of the existing equipment is of the 10Gbps class, it cannot handle the increasing traffic every year at a fast speed. In this paper, as a method of processing large-capacity traffic over 20Gbps, the process of processing raw packets without copying from single-core and basic SMA memory approaches to high-performance packet reception, packet detection, and statistics using multi-core and NUMA memory approaches suggest When using the proposed method, it was confirmed that more than 50% of the traffic was processed compared to the existing equipment.

A Study on Parallel Performance Optimization Method for Acceleration of High Resolution SAR Image Processing (고해상도 SAR 영상처리 고속화를 위한 병렬 성능 최적화 기법 연구)

  • Lee, Kyu Beom;Kim, Gyu Bin;An, Sol Bo Reum;Cho, Jin Yeon;Lim, Byoung-Gyun;Kim, Dong-Hyun;Kim, Jeong Ho
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.46 no.6
    • /
    • pp.503-512
    • /
    • 2018
  • SAR(Synthetic Aperture Radar) is a technology to acquire images by processing signals obtained from radar, and there is an increasing demand for utilization of high-resolution SAR images. In this paper, for high-speed processing of high-resolution SAR image data, a study for SAR image processing algorithms to achieve optimal performance in multi-core based computer architecture is performed. The performance deterioration due to a large amount of input/output data for high resolution images is reduced by maximizing the memory utilization, and the parallelization ratio of the code is increased by using dynamic scheduling and nested parallelism of OpenMP. As a result, not only the total computation time is reduced, but also the upper bound of parallel performance is increased and the actual parallel performance on a multi-core system with 10 cores is improved by more than 8 times. The result of this study is expected to be used effectively in the development of high-resolution SAR image processing software for multi-core systems with large memory.

Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OPenMP를 이용한 빠르고 효율적인 신경망 구현)

  • Park, An-Jin;Jang, Hong-Hoon;Jung, Kee-Chul
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.253-260
    • /
    • 2009
  • Many algorithms for computer vision and pattern recognition have recently been implemented on GPU (graphic processing unit) for faster computational times. However, the implementation has two problems. First, the programmer should master the fundamentals of the graphics shading languages that require the prior knowledge on computer graphics. Second, in a job that needs much cooperation between CPU and GPU, which is usual in image processing and pattern recognition contrary to the graphic area, CPU should generate raw feature data for GPU processing as much as possible to effectively utilize GPU performance. This paper proposes more quick and efficient implementation of neural networks on both GPU and multi-core CPU. We use CUDA (compute unified device architecture) that can be easily programmed due to its simple C language-like style instead of GPU to solve the first problem. Moreover, OpenMP (Open Multi-Processing) is used to concurrently process multiple data with single instruction on multi-core CPU, which results in effectively utilizing the memories of GPU. In the experiments, we implemented neural networks-based text extraction system using the proposed architecture, and the computational times showed about 15 times faster than implementation on only GPU without OpenMP.

Design and Analysis of MPEG-2 MP@HL Decoder in Multi-Processor Environments

  • Yoo, Seung-Hwan;Lee, Hyun-Seung;Lee, Sang-Jo;Park, Rae-Hong;Kim, Do-Hyung
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2009.01a
    • /
    • pp.211-216
    • /
    • 2009
  • As demands for high-definition television (HDTV) increase, the implementation of real-time decoding of high-definition (HD) video becomes an important issue. The data size for HD video is so large that real-time processing of the data is difficult to implement, especially with software. In order to implement a fast moving picture expert group-2 decoder for HDTV, we compose five scenarios that use parallel processing techniques such as data decomposition, task decomposition, and pipelining. Assuming the multi digital signal processor environments, we analyze each scenario in three aspects: decoding speed, L1 memory size, and bandwidth. By comparing the scenarios, we decide the most suitable cases for different situations. We simulate the scenarios in the dual-core and dual-central processing unit environment by using OpenMP and analyze the simulation results.

  • PDF

Design of Parallel Processing of Lane Detection System Based on Multi-core Processor (멀티코어를 이용한 차선 검출 병렬화 시스템 설계)

  • Lee, Hyo-Chan;Moon, Dai-Tchul;Park, In-hag;Heo, Kang
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.9
    • /
    • pp.1778-1784
    • /
    • 2016
  • we improved the performance by parallelizing lane detection algorithms. Lane detection, as a intellectual assisting system, helps drivers make an alarm sound or revise the handle in response of lane departure. Four kinds of algorithms are implemented in order as following, Gaussian filtering algorithm so as to remove the interferences, gray conversion algorithm to simplify images, sobel edge detection algorithm to find out the regions of lanes, and hough transform algorithm to detect straight lines. Among parallelized methods, the data level parallelism algorithm is easy to design, yet still problem with the bottleneck. The high-speed data level parallelism is suggested to reduce this bottleneck, which resulted in noticeable performance improvement. In the result of applying actual road video of black-box on our parallel algorithm, the measurement, in the case of single-core, is approximately 30 Frames/sec. Furthermore, in the case of octa-core parallelism, the data level performance is approximately 100 Frames/sec and the highest performance comes close to 150 Frames/sec.