• Title/Summary/Keyword: 멀티코어

Search Result 413, Processing Time 0.025 seconds

Design of QoS Manager related in Radio Resource Allocation within All-IP Network (All-IP 망에서 무선 자원 할당과 연계된 QoS 관리자의 설계)

  • Go, Hui-Chang;Wang, Chang-Jong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.8S
    • /
    • pp.2722-2728
    • /
    • 2000
  • 현재의 인터넷 망을 이용하여 음성, 화상 정보를 실시간으로 이용하고자 하는 다양한 응용이 시도되고 있다. 차세대 통신으로 주목 받고 있는 IMT-2000에서도 기존의 회선 교환망 대신 인터넷 망을 이용함으로써 경제성, 관리의 편의성, 새로운 서비스의 창출이 가능한 등의 이점이 있다. 인터넷 망이 최선의 노력(best effort)만을 제공하기 때문에 발생되는 신뢰성과 지연의 문제는 이미 많은 연구가 있어왔고 현재 어느 정도의 서비스 품질을 획득하여 VoIP(Voice Over Internet Protocol)와 같은 서비스가 실제로 이용되고 있다. 그러나 무선 통신의 경우는 이에 더하여 무선 구간에서의 자원 할당의 문제가 남아 있다. 본 연구에서는 코어 망으로 인터넷 프로토콜을 사용하는 차세대 All-IP 망에서, 무선 이동단말 간의 멀티미디어 서비스가 가능하도록 효율적인 주파수 할당을 지원하는 QoS 관리자를 설계하였다. 제안한 QoS(Quality Of Service)관리자는 요구 대역폭이 다른 멀티미디어 호 요청에 대해 융통성 있는 주파수 할당이 가능하도록 대국의 QoS 관리자와의 협상을 통해 제한된 범위 내에서 서비스 품질을 조절하여 보다 많은 호 연결 요청이 성공할 수 있도록 한다.

  • PDF

Synchronous Segmented Bus Architecture for Multitasking on Multimedia System (멀티미디어용 다중작업이 가능한 동기 세그먼트 구조)

  • Jun Chi-Hoon;Yeon Gyu-Sung;Hwang Tae-Jin;Wee Jae-Kyung
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2004.11a
    • /
    • pp.299-302
    • /
    • 2004
  • 본 논문은 OCP(Open Core Protocol)에 호환되는 파이프라인 구조를 가진 시스템 버스와 MPEG 시스템에 적합한 메모리 버스를 갖는 계층 구조를 가지는 새로운 동기 세그먼트 버스를 제안한다. 이 구조는 MPEG 시스템의 모바일 제품에 사용되는 영상 데이터 처리를 위한 메모리 인터페이스에 기반을 둔 버스 구조와 Multi-master와 Multi-slave를 사용하여 고성능의 다중 처리를 위한 양방향 다중 버스 구조(bi-direction multiple bus architecture)를 가진다. 효율적인 데이터 처리를 위하여 파이프라인 stage와 결합된 Master와 Slave의 주소번지가 latency를 결정하며, 시스템의 특성에 따라서 IP 코어를 배치하였다. 제안된 버스는 저 전력 구현을 위하여 세그먼트 버스 구조를 가지고, 멀티미디어 SoC 시스템의 성능 저하 없이 다중 작업이 가능한 구조를 갖는다. Wirability를 고려하여 양방향 구조를 채택하였고, Testablility를 위하여 단방향(uni-direction) 구조와 대체 가능하다. 또한, Local arbiter의 수정만으로 Master의 추가가 가능한 확장 구조를 가진다. Latency를 줄이기 위하여 직접 제어 방식과 단순한 구조의 Central arbiter로 구현되었다.

  • PDF

Tile-based Parallelizing for a Fast HEVC Encoder (HEVC 부호화기 고속화를 위한 타일 기반 병렬화)

  • Kim, Younhee;Jun, DongSan;Jung, Soon-Heung;Seok, Jinwuk;Choi, Jin Soo
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2012.07a
    • /
    • pp.290-293
    • /
    • 2012
  • 본 논문에서는 기존 AVC 보다 50% 압축성능 향상을 목표로 표준화가 진행되고 있는 차세대표준인 HEVC 부호화기의 속도를 높이기 위한 방안으로, HEVC 의 기술 중 화면 분할 기술인 타일(Tile)을 기반으로 효율적으로 부호화기를 병렬화하는 구조를 제안한다. 부호화기에서 복잡도가 높은 율왜곡 기반 모드 결정 과정을 멀티코어 병렬프로그래밍으로 구현하고, 병렬처리에 의한 속도 개선 결과를 제시한다. 타일은 병렬처리를 지원하기 위해 HEVC 가 채택한 구조로, 화면을 여러 개로 분할하여 부/복호화 할 수 있어 병렬처리 단위로 적합하며, 표준화의 기고서를 통해 화면분할로 인한 압축성능 변화량은 여러 차례 보고되고 있다. 본 논문의 결과에 의하면 타일의 수만큼 쓰레드를 생성하여 각 타일 단위로 율왜곡 기반 부호화 모드 결정을 하도록 병렬화 하였을 때 기존 참조 소프트웨어 대비 12 개의 쓰레드 생성 시 6 배의 속도 개선을 보인다. 향후 병렬로 처리할 수 있는 모듈을 확장하면 쓰레드 수 증가에 따른 속도개선 효과가 증대되어 부호화기 실용화를 위한 실시간 부호화기 개발에 한 걸음 다가갈 수 있을 것이라 기대한다.

  • PDF

Implementation of SDR-based LTE-A PDSCH Decoder for Supporting Multi-Antenna Using Multi-Core DSP (멀티코어 DSP를 이용한 다중 안테나를 지원하는 SDR 기반 LTE-A PDSCH 디코더 구현)

  • Na, Yong;Ahn, Heungseop;Choi, Seungwon
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.15 no.4
    • /
    • pp.85-92
    • /
    • 2019
  • This paper presents a SDR-based Long Term Evolution Advanced (LTE-A) Physical Downlink Shared Channel (PDSCH) decoder using a multicore Digital Signal Processor (DSP). For decoder implementation, multicore DSP TMS320C6670 is used, which provides various hardware accelerators such as turbo decoder, fast Fourier transformer and Bit Rate Coprocessors. The TMS320C6670 is a DSP specialized in implementing base station platforms and is not an optimized platform for implementing mobile terminal platform. Accordingly, in this paper, the hardware accelerator was changed to the terminal implementation to implement the LTE-A PDSCH decoder supporting the multi-antenna and the functions not provided by the hardware accelerator were implemented through core programming. Also pipeline using multicore was implemented to meet the transmission time interval. To confirm the feasibility of the proposed implementation, we verified the real-time decoding capability of the PDSCH decoder implemented using the LTE-A Reference Measurement Channel (RMC) waveform about transmission mode 2 and 3.

Implementation and Performance Evaluation of Preempt-RT Based Multi-core Motion Controller for Industrial Robot (산업용 로봇 제어를 위한 Preempt-RT 기반 멀티코어 모션 제어기의 구현 및 성능 평가)

  • Kim, Ikhwan;Ahn, Hyosung;Kim, Taehyoun
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.12 no.1
    • /
    • pp.1-10
    • /
    • 2017
  • Recently, with the ever-increasing complexity of industrial robot systems, it has been greatly attention to adopt a multi-core based motion controller with high cost-performance ratio. In this paper, we propose a software architecture that aims to utilize the computing power of multi-core processors. The key concept of our architecture is to use shared memory for the interplay between threads running on separate processor cores. And then, we have integrated our proposed architecture with an industrial standard compliant IDE for automatic code generation of motion runtime. For the performance evaluation, we constructed a test-bed consisting of a motion controller with Preempt-RT Linux based dual-core industrial PC and a 3-axis industrial robot platform. The experimental results show that the actuation time difference between axes is 10 ns in average and bounded up to 689 ns under $1000{\mu}s$ control period, which can come up with real-time performance for industrial robot.

VHDL Design for Out-of-Order Superscalar Processor of A Fully Pipelined Scheme (완전한 파이프라인 방식의 비순차실행 수퍼스칼라 프로세서의 VHDL 설계)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.21 no.1
    • /
    • pp.99-105
    • /
    • 2021
  • Today, a superscalar processor is the basic unit or an essential component of a multi-core processor, SoCs, and GPUs. Hence, a high-performance out-of-order superscalar processor must be adopted for these systems to maximize its performance. The superscalar processor fetches, issues, executes, and writes back multiple instructions per cycle by utilizing reorder buffers and reservation stations to dynamically schedule instructions in a pipelined scheme. In this paper, a fully pipelined out-of-order superscalar processor with speculative execution is designed with VHDL and verified with GHDL. As a result of the simulation, the program composed of ARM instructions is successfully performed.

Coreset Construction for Character Recognition of PCB Components Based on Deep Learning (딥러닝 기반의 PCB 부품 문자인식을 위한 코어 셋 구성)

  • Gang, Su Myung;Lee, Joon Jae
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.3
    • /
    • pp.382-395
    • /
    • 2021
  • In this study, character recognition using deep learning is performed among the various defects in the PCB, the purpose of which is to check whether the printed characters are printed correctly on top of components, or the incorrect parts are attached. Generally, character recognition may be perceived as not a difficult problem when considering MNIST, but the printed letters on the PCB component data are difficult to collect, and have very high redundancy. So if a deep learning model is trained with original data without any preprocessing, it can lead to over fitting problems. Therefore, this study aims to reduce the redundancy to the smallest dataset that can represent large amounts of data collected in limited production sites, and to create datasets through data enhancement to train a flexible deep learning model can be used in various production sites. Moreover, ResNet model verifies to determine which combination of datasets is the most effective. This study discusses how to reduce and augment data that is constantly occurring in real PCB production lines, and discusses how to select coresets to learn and apply deep learning models in real sites.

Collaborative Streamlined On-Chip Software Architecture on Heterogenous Multi-Cores for Low-Power Reactive Control in Automotive Embedded Processors (차량용 임베디드 프로세서에서 저전력 반응적 제어를 위한 이기종 멀티코어 협력적 스트리밍 온-칩 소프트웨어 구조)

  • Jisu, Kwon;Daejin, Park
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.6
    • /
    • pp.375-382
    • /
    • 2022
  • This paper proposes a multi-core cooperative computing structure considering the heterogeneous features of automotive embedded on-chip software. The automotive embedded software has the heterogeneous execution flow properties for various hardware drives. Software developed with a homogeneous execution flow without considering these properties will incur inefficient overhead due to core latency and load. The proposed method was evaluated on an target board on which a automotive MCU (micro-controller unit) with built-in multi-cores was mounted. We demonstrate an overhead reduction when software including common embedded system tasks, such as ADC sampling, DSP operations, and communication interfaces, are implemented in a heterogeneous execution flow. When we used the proposed method, embedded software was able to take advantage of idle states that occur between heterogeneous tasks to make efficient use of the resources on the board. As a result of the experiments, the power consumption of the board decreased by 42.11% compared to the baseline. Furthermore, the time required to process the same amount of sampling data was reduced by 27.09%. Experimental results validate the efficiency of the proposed multi-core cooperative heterogeneous embedded software execution technique.

Complexity-based Sample Adaptive Offset Parallelism (복잡도 기반 적응적 샘플 오프셋 병렬화)

  • Ryu, Eun-Kyung;Jo, Hyun-Ho;Seo, Jung-Han;Sim, Dong-Gyu;Kim, Doo-Hyun;Song, Joon-Ho
    • Journal of Broadcast Engineering
    • /
    • v.17 no.3
    • /
    • pp.503-518
    • /
    • 2012
  • In this paper, we propose a complexity-based parallelization method of the sample adaptive offset (SAO) algorithm which is one of HEVC in-loop filters. The SAO algorithm can be regarded as region-based process and the regions are obtained and represented with a quad-tree scheme. A offset to minimize a reconstruction error is sent for each partitioned region. The SAO of the HEVC can be parallelized in data-level. However, because the sizes and complexities of the SAO regions are not regular, workload imbalance occurs with multi-core platform. In this paper, we propose a LCU-based SAO algorithm and a complexity prediction algorithm for each LCU. With the proposed complexity-based LCU processing, we found that the proposed algorithm is faster than the sequential implementation by a factor of 2.38 times. In addition, the proposed algorithm is faster than regular parallel implementation SAO by 21%.

Improving Haskell GC-Tuning Time Using Divide-and-Conquer (분할 정복법을 이용한 Haskell GC 조정 시간 개선)

  • An, Hyungjun;Kim, Hwamok;Liu, Xiao;Kim, Yeoneo;Byun, Sugwoo;Woo, Gyun
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.9
    • /
    • pp.377-384
    • /
    • 2017
  • The performance improvement of a single core processor has reached its limit since the circuit density cannot be increased any longer due to overheating. Therefore, the multicore and manycore architectures have emerged as viable approaches and parallel programming becomes more important. Haskell, a purely functional language, is getting popular in this situation since it naturally supports parallel programming owing to its beneficial features including the implicit parallelism in evaluating expressions and the monadic tools supporting parallel constructs. However, the performance of Haskell parallel programs is strongly influenced by the performance of the run-time system including the garbage collector. Though a memory profiling tool namely GC-tune has been suggested, we need a more systematic way to use this tool. Since GC-tune finds the optimal memory size by executing the target program with all the different possible GC options, the GC-tuning time takes too long. This paper suggests a basic divide-and-conquer method to reduce the number of GC-tune executions by reducing the search area by one-quarter for every searching step. Applying this method to two parallel programs, a maximally independent set and a K-means programs, the memory tuning time is reduced by 7.78 times with accuracy 98% on average.