• Title/Summary/Keyword: K-코어 알고리즘

Search Result 124, Processing Time 0.027 seconds

Acceleration of Intrusion Detection for Multi-core Video Surveillance Systems (멀티 코어 프로세서 기반의 영상 감시 시스템을 위한 침입 탐지 처리의 가속화)

  • Lee, Gil-Beom;Jung, Sang-Jin;Kim, Tae-Hwan;Lee, Myeong-Jin
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.12
    • /
    • pp.141-149
    • /
    • 2013
  • This paper presents a high-speed intrusion detection process for multi-core video surveillance systems. The high-speed intrusion detection was designed to a parallel process. Based on the analysis of the conventional process, a parallel intrusion detection process was proposed so as to be accelerated by utilizing multiple processing cores in contemporary computing systems. The proposed process performs the intrusion detection in a per-frame parallel manner, considering the data dependency between frames. The proposed process was validated by implementing a multi-threaded intrusion detection program. For the system having eight processing cores, the detection speed of the proposed program is higher than that of the conventional one by up to 353.76% in terms of the frame rate.

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.10
    • /
    • pp.11-21
    • /
    • 2011
  • This paper introduces an SIMD(Single Instruction Multiple Data) based parallel processor that efficiently processes massive data inherent in multimedia. In addition, this paper implements MMX(MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes the performance of the MMX-type instructions. The reference data parallel processor consists of 16 processors each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieves a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieves 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia specific instructions including MMX-type have potentials for widely used many-core GPU(Graphics Processing Unit) and any types of parallel processors.

Implementation and Performance Evaluation of Vector based Rasterization Algorithm using a Many-Core Processor (매니코어 프로세서를 이용한 벡터 기반 래스터화 알고리즘 구현 및 성능평가)

  • Shon, Dong-Koo;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.8 no.2
    • /
    • pp.87-93
    • /
    • 2013
  • In this paper, we implemented and evaluated the performance of a vector-based rasterization algorithm of 3D graphics using a SIMD-based many-core processor that consists of 4,096 processing elements. In addition, we compared the performance and efficiency of the rasterization algorithm using the many-core processor and commercial GPU (Graphics Processing Unit) system which consists of 7 GPUs and each of which have 512 cores. Experimental results showed that the SIMD-based many-core processor outperforms the commercial GPU system in terms of execution time (3.13x speedup), energy efficiency (17.5x better), and area efficiency (13.3x better). These results demonstrate that the SIMD-based many-core processor has potential as an embedded mobile processor.

A Developed Collision Resolution Algorithm in MAC Protocol for IEEE 802.11b Wireless LANs (IEEE 802.11b 무선 LAN의 MAC 프로토콜을 위한 개선된 충돌 해결 알고리즘)

  • Pan Ce;Park Hyun;Kim Byun-Gon;Chung Kyung-Taek;Chon Byoung-Sil
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.41 no.6 s.324
    • /
    • pp.33-39
    • /
    • 2004
  • Design of efficient Medium Access Control (MAC) protocols with both high throughput performances is a major focus in distributed contention based MAC protocol research. In this paper, we propose an efficient contention based MAC protocol for wireless Local Area Networks, namely, the Developed Collision Resolution (DCR) algorithm. This algorithm is developed based on the following innovative ideas: to speed up the collision resolution, we actively redistribute the backoff timers for all active nodes; to reduce the average number of idle slots, we use smaller contention window sizes for nodes with successful packet transmissions and reduce the backoff timers exponentially fast when a fixed number of consecutive idle slots are detected. We show that the proposed DCR algorithm provides high throughput performance and low latency in wireless LANs.

Programming Model for SODA-II: a Baseband Processor for Software Defined Radio Systems (SDR용 기저대역 프로세서를 위한 프로그래밍 모델)

  • Lee, Hyun-Seok;Yi, Joon-Hwan;Oh, Hyuk-Jun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.7
    • /
    • pp.78-86
    • /
    • 2010
  • This paper discusses the programming model of SODA-II that is a baseband processor for software defined radio (SDR) systems. Signal processing On-Demand Architecture Ⅱ (SODA-II) is an on-chip multiprocessor architecture consisting of four processor cores and each core has both an wide SIMD datapath and a scalar datapath. This architecture is appropriate for baseband processing that is a mixture of vector computations and scalar computations. The programming model of the SODA-II is based on C library routines. Because the library routines hide the details of complex SIMD datapath control procedures, end users can easily program the SODA-II without deep understanding on its architecture. In this paper, we discuss the details of library routines and how these routines are exploited in the implementation of baseband signal processing algorithms. As application examples, we show the implementation result of W-CDMA multipath searcher and OFDM demodulator on the SODA-II.

Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms (병렬 알고리즘의 가속화를 위한 GP-GPU의 Thread할당 기법)

  • Lee, Kwan-Ho;Kim, Chi-Yong
    • Journal of IKEEE
    • /
    • v.21 no.1
    • /
    • pp.92-95
    • /
    • 2017
  • In this paper, we proposed a way to improve function of small scale GP-GPU. Instead of using superscalar which increase scheduling-complexity, we suggested the application of simple core to maximize GP-GPU performance. Our studies also demonstrated that simplified Stream Processor is one of the way to achieve functional improvement in GP-GPU. In addition, we found that developing of optimal thread-assigning method in Warp Scheduler for specific application improves functional performance of GP-GPU. For examination of GP-GPU functional performance, we suggested the thread-assigning way which coordinated with Deep-Learning system; a part of Neural Network. As a result, we found that functional index in algorithm of Neural Network was increased to 90%, 98% compared with Intel CPU and ARM cortex-A15 4 core respectively.

Parallel clustering technology for real-time LWIR band image processing (실시간 LWIR 밴드 영상 처리를 위한 병렬 클러스터링 기술)

  • Cho, Yongjin;Lee, Kyou-seung;Hong, Seongha;Oh, Jong-woo;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.158-158
    • /
    • 2017
  • 비닐포장 하부에 위치한 콩의 생장 초기에 발생한 초엽을 인식하기 위한 연구를 수행중이다. 선행 연구에서 비닐포장에 접촉한 콩 초엽으로 인해 비닐포장 상부 표면의 열 반응 분포에 변화가 있음을 발견하였다. 현장에서 주행 중에 콩 초엽의 위치를 실시간으로 인식하고 연동된 선형 또는 회전형 엑츄에이터를 제어하여 정확한 위치에 천공을 수행하기 위해서는 계측 시스템과 제어 시스템간의 시간적 차이를 최소할 수 있는 실시간 신호 처리 기술이 필수적이다. 선행 연구에서 사용한 다중 IR 센서의 분해능은 $16{\times}4pixel$이며 주파수는 3 Hz로, 폭이 30cm 내외인 비닐포장 상부의 정밀 분석에 한계가 있음을 발견하였다. 이를 해결하기 위하여 분해능과 계측 주기를 개선할 수 있는 초소형 ($1cm{\times}1cm{\times}1cm$) 열화상 센서를 이용하였다. LWIR(Longwave infrared)영역에 해당하는 $8{\mu}m{\sim}14{\mu}m$의 영역에서 $0.05^{\circ}C$의 분해능을 보이는 $ Lepton^{TM}$ (500-0690-00, FLIR, Goleta, CA)모델을 사용하였다. 프레임당 $80{\times}60$ 픽셀의 정보가 2 Byte의 단위로 계측이 되며 9 Hz의 주파수로 대상면의 열 분포를 측정할 수 있다. 이론적으로 초당 정보 전송량은 86,400 Byte ($80{\times}60{\times}2{\times}9$)이며, 1 m를 진행하는 주행형 천공기에 적용할 경우 1 프레임당 10cm 정도의 면적을 측정하므로, 최대 위치 판정 분해능은 약 10 cm / 60 pixel = 0.17 cm/pixel로 상대적으로 정밀한 위치 판별이 가능하다. $80{\times}60{\times}2Byet$의 정보를 0.1초 이내에 분석해야 하는 기술적 과제를 해결하기 위하여 천공 작업기에 적합한 상용 SBC(Single board computer)의 클럭 속도(1 Ghz)로 처리 가능한 공간 분포 분석 알고리즘을 개발하였다. 전체 이미지 도메인을 한 번에 분석하는데 소요되는 시간을 최소화하기 위하여 공간정보 행렬을 균등히 배분하고 별도의 프로세서에서 Feature를 분석한 후 개별 프로세서의 결과를 경합식으로 판정하는 기술을 연구하였다. 오픈 소스인 MPICH(www.mpich.org) 라이브러리를 이용하여 개발한 신호 분석 프로그램을 클러스터링으로 연동된 개별 코어에 설치/수행 하였다. 2D 행렬인 열분포 정보를 공간적으로 균등 분배하여 개별 코어에서 행렬의 Spatial domain analysis를 수행하였다. $20{\times}20$의 클러스터링 단위를 이용할 경우 총 12개의 코어가 필요하였으며, 초당 10회의 연산이 가능함을 확인하였다. 병렬 클러스터링 기술을 이용하여 1m/s 내외의 주행 속도에 대응이 가능한 비닐포장 상부 열 분포 분석 시스템을 구현하였다.

  • PDF

Scheduling and Load Balancing Methods of Multithread Parallel Linear Solver of Finite Element Structural Analysis (유한요소 구조해석 다중쓰레드 병렬 선형해법의 스케쥴링 및 부하 조절 기법 연구)

  • Kim, Min Ki;Kim, Seung Jo
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.42 no.5
    • /
    • pp.361-367
    • /
    • 2014
  • In this paper, task scheduling and load balancing methods of multifrontal solution methods of finite element structural analysis in a modern multicore machine are introduced. Many structural analysis problems have generally irregular grid and many kinds of properties and materials. These irregularities and heterogeneities lead to bottleneck of parallelization and cause idle time to analysis. Therefore, task scheduling and load balancing are desired to reduce inefficiency. Several kinds of multithreaded parallelization methods are presented and comparison between static and dynamic task scheduling are shown. To reduce the idle time caused by irregular partitioned subdomains, computational load balancing methods, Balancing all tasks and minmax task pairing balancing, are invented. Theoretical and actual elapsed time are shown and the reason of their performance gap are discussed.

Voltage-Frequency-Island Aware Energy Optimization Methodology for Network-on-Chip Design (전압-주파수-구역을 고려한 에너지 최적화 네트워크-온-칩 설계 방법론)

  • Kim, Woo-Joong;Kwon, Soon-Tae;Shin, Dong-Kun;Han, Tae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.22-30
    • /
    • 2009
  • Due to high levels of integration and complexity, the Network-on-Chip (NoC) approach has emerged as a new design paradigm to overcome on-chip communication issues and data bandwidth limits in conventional SoC(System-on-Chip) design. In particular, exponentially growing of energy consumption caused by high frequency, synchronization and distributing a single global clock signal throughout the chip have become major design bottlenecks. To deal with these issues, a globally asynchronous, locally synchronous (GALS) design combined with low power techniques is considered. Such a design style fits nicely with the concept of voltage-frequency-islands (VFI) which has been recently introduced for achieving fine-grain system-level power management. In this paper, we propose an efficient design methodology that minimizes energy consumption by VFI partitioning on an NoC architecture as well as assigning supply and threshold voltage levels to each VFI. The proposed algorithm which find VFI and appropriate core (or processing element) supply voltage consists of traffic-aware core graph partitioning, communication contention delay-aware tile mapping, power variation-aware core dynamic voltage scaling (DVS), power efficient VFI merging and voltage update on the VFIs Simulation results show that average 10.3% improvement in energy consumption compared to other existing works.

Efficient Test Data Compression Method (효율적인 테스트 데이터 압축 방법)

  • Jung, Jun-Mo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2012.05a
    • /
    • pp.690-692
    • /
    • 2012
  • This pape presents the efficient test data compression method considering test power dissipation in scan test of IP core. There are many researches about test data compression using scan slice selective encoding except power dissipation. We present the new algorithm that assigns the don't care value to be a minimal hamming distance between adjacent slices. Experimental results show that the power dissipation is reduced.

  • PDF