• Title/Summary/Keyword: multi-core (멀티코어)


Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases / v.37 no.5 / pp.275-284 / 2010
  • GPUs are multi-core stream processors that can process large volumes of data at high speed with high memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, the use of GPUs in general-purpose computing has become widespread. The CUDA architecture from Nvidia is one of the efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm that uses a simple nested-loop structure. To employ the CUDA programming model, we apply optimization techniques that make our skyline algorithm fit within the performance restrictions of the CUDA architecture. According to our experimental results, our optimization techniques improve the original skyline algorithm by 80%.
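
The nested-loop skyline computation mentioned in the abstract above can be illustrated with a minimal sequential sketch. It is written in Python rather than CUDA and is not the authors' implementation; the names `dominates` and `skyline_nested_loop` are hypothetical, and "smaller is better" in every dimension is an assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p is no worse than q in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_nested_loop(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates."""
    result = []
    for i, p in enumerate(points):
        # The dominance test for each p reads the data but writes nothing
        # shared, so every outer iteration is independent of the others.
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            result.append(p)
    return result


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
    print(skyline_nested_loop(data))
```

Because each outer iteration is independent, one plausible GPU mapping assigns one thread per candidate point; the abstract does not say whether the paper uses this mapping.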

Design and Implementation of an InfiniBand System Interconnect for High-Performance Cluster Systems (고성능 클러스터 시스템을 위한 인피니밴드 시스템 연결망의 설계 및 구현)

  • Mo, Sang-Man;Park, Kyung;Kim, Sung-Nam;Kim, Myung-Jun;Im, Ki-Wook
    • The KIPS Transactions:PartA / v.10A no.4 / pp.389-396 / 2003
  • InfiniBand technology is being accepted as the future system interconnect to serve as the high-end enterprise fabric for cluster computing. This paper presents the design and implementation of an InfiniBand system interconnect, focusing on an InfiniBand host channel adapter (HCA) based on dual ARM9 processor cores. The HCA is an SoC called KinCA, which connects a host node to the InfiniBand network in both hardware and software. Since the ARM9 processor core does not provide the features necessary for a multiprocessor configuration, novel inter-processor communication and interrupt mechanisms between the two processors were designed and embedded within the KinCA chip. KinCA was fabricated as a 564-pin enhanced BGA (Ball Grid Array) device using 0.18 $\mu\mathrm{m}$ CMOS technology. Mounted on host nodes, it provides 10 Gbps outbound and inbound channels for transmit and receive, respectively, resulting in a high-performance cluster system.

Design of the 1.5kVA Class Wireless Power Transfer Device for Battery Charging of Integrated Power Control System in MSAP (군 이동기지국시스템(MSAP) 통합전원제어장치 배터리 충전용 1.5kVA급 무선전력전송기기의 설계)

  • Kim, Jin-Sung;Kim, Byung-Jun;Park, Hyeon-Jeong;Seo, Min-Sung
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.3 / pp.413-420 / 2020
  • The Tactical Information and Communication Network system provides real-time multimedia services such as voice and data by utilizing the Mobile Subscriber Access Point. An external transmission path is constructed through the Low Capacity Trunk Radio and High Capacity Trunk Radio systems. The communication devices of each wireless transmission system are mounted on a tactical vehicle, and a secondary battery is used to prevent a power interruption when the supply power to the tactical vehicle is transferred to the integrated power control device. In this paper, we present the basic design of a Wireless Power Transfer device for charging the battery of the integrated power control system of the mobile base station system, using the Loading Distribution Method and checking the number of primary windings and the core material selection by air gap through the Finite Element Method.

The Study on the Design and Optimization of Storage for the Recording of High Speed Astronomical Data (초고속 관측 데이터 수신 및 저장을 위한 기록 시스템 설계 및 성능 최적화 연구)

  • Song, Min-Gyu;Kang, Yong-Woo;Kim, Hyo-Ryoung
    • The Journal of the Korea institute of electronic communication sciences / v.12 no.1 / pp.75-84 / 2017
  • Storage that supports high-speed recording and stable access from a network environment is becoming more and more important. As a field of basic science that produces massive astronomical data, VLBI (Very Long Baseline Interferometer) now demands higher data-writing performance, which is directly related to astronomical observation with high resolution and sensitivity. However, most existing storage systems are based on a cloud model aimed at high throughput for general IT, finance, and administrative services, and are therefore not the best choice for recording large data streams. In this study, we design a storage system optimized for high I/O performance and concurrency. To this end, we implement packet read and write modules using libpcap and the pf_ring API in a multi-core CPU environment, and build scalable storage based on software RAID (Redundant Array of Inexpensive Disks) for efficient processing of data arriving from an external network.

A Design and Implementation of Educational Mobile Robot System including Remote Control Function (원격 제어 기능을 포함한 교육용 모바일 로봇 시스템의 설계 및 구현)

  • Chung, Joong-Soo;Jung, Kwang-Wook
    • Journal of the Korea Society of Computer and Information / v.20 no.4 / pp.33-40 / 2015
  • This paper presents the design and implementation of an educational remote-controlled robot system, including remote sensing, in an embedded environment. The design of the sensing information processing, the software design, and the template design mechanism for programming practice are introduced. The development environment uses the LPC1769 with a Cortex-M3 core as the CPU, LPCXPRESSO as the debugging environment, C as the firmware development language, and FreeRTOS as the OS. The control command from the server is received via RF communication by the robot system, which operates by driving its various sensors. The educational procedure starts with hands-on practice using the robot demo operation program, followed by compiling and loading the basic robot operation program that is already supplied. Thereafter, the result is verified by running the basic robot operation, reproducing the demo operation as in the hands-on training procedure. An original protocol is designed for RF communication between the server and the robot system, and satisfactory performance is demonstrated by analyzing the robot's sensing data processing.

Design and Implementation of System in Package for a HF/UHF Multi-band RFID Reader (HF/UHF 멀티밴드 RFID 리더의 SiP 설계 및 구현)

  • An, Kwang-Dek;Yi, Kyeong-Il;Kim, Ji-Gon;Cho, Jung-Hyun;Kim, Shi-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.10 / pp.59-65 / 2008
  • We propose a UHF/HF multi-band RFID reader and implement it as a system in package (SiP). The proposed SiP RFID reader is designed to support both the EPCglobal Class1 Generation2 protocol in the UHF band and the 13.56 MHz RFID protocols of the ISO14443 A/B type and ISO15693 standards. The operating mode is controlled by an embedded RISC core and can be selected by users. The area of the implemented SiP is $40\,\mathrm{mm} \times 40\,\mathrm{mm}$ with 4 metal layers. The implemented reader SiP operates at a single supply voltage of 3.3 V. The maximum current consumption is 210 mA. The operating distances are 5 cm for the 13.56 MHz modes and 20 cm for the UHF mode.

Providing Fairness in Diffserv Architecture using Buffer Management Method (차등서비스 구조에서 버퍼관리기법을 이용한 공평성 제공)

  • 김중규
    • Proceedings of the Korea Society for Industrial Systems Conference / 2003.05a / pp.8-13 / 2003
  • Historically, IP-based internets have been able to provide a simple best-effort delivery service to all applications they carry. Best effort treats all packets equally, with no guarantees on service level, packet loss, or delay. But the needs of users have changed. They want to use new real-time, multimedia, and multicasting applications. Thus, there is a strong need to support a variety of traffic with a variety of quality-of-service requirements. The DiffServ architecture, proposed by the Internet Engineering Task Force (IETF), has become the most viable solution for provisioning QoS over IP networks. The DiffServ architecture does not specify any handling method between AF out-profile packets and BE packets. This paper proposes a mechanism for supporting inter-class fairness in the DiffServ architecture: a modified Weighted Round Robin method that protects BE traffic from AF out-profile packets in the core routers. The proposed technique is evaluated through simulation. Simulation results indicate that the proposed method provides better protection not only for BE packets from AF out-profile packets, but also for AF in-profile packets in congested networks.
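
For orientation, plain Weighted Round Robin, the baseline that the abstract above modifies, can be sketched as follows. This shows the textbook scheduler only, not the paper's modified method, whose details the abstract does not give. The function name `weighted_round_robin`, the queue labels, and the weights are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Dict, List


def weighted_round_robin(queues: Dict[str, Deque[str]],
                         weights: Dict[str, int]) -> List[str]:
    """Drain the queues in rounds; each round serves up to weights[name]
    packets from each queue. Returns the resulting service order."""
    if any(weights[name] < 1 for name in queues):
        raise ValueError("every queue needs a weight of at least 1")
    order: List[str] = []
    while any(queues.values()):  # an empty deque is falsy
        for name, q in queues.items():
            for _ in range(weights[name]):
                if not q:
                    break  # this queue is empty; move on to the next one
                order.append(q.popleft())
    return order


if __name__ == "__main__":
    # Hypothetical example: AF traffic weighted 2, BE traffic weighted 1.
    qs = {"AF": deque(["a1", "a2", "a3", "a4"]), "BE": deque(["b1", "b2"])}
    print(weighted_round_robin(qs, {"AF": 2, "BE": 1}))
```

With these weights the BE queue is served once per round however many AF packets are waiting. This is the kind of isolation a scheduler can give BE traffic against excess AF packets.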


Topology Design for Energy/Latency Optimized Application-specific Hybrid Optical Network-on-Chip (HONoC) (특정 용도 하이브리드 광학 네트워크-온-칩에서의 에너지/응답시간 최적화를 위한 토폴로지 설계 기법)

  • Cui, Di;Lee, Jae Hoon;Kim, Hyun Joong;Han, Tae Hee
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.11 / pp.83-93 / 2014
  • It is a widespread concern that network-on-chip (NoC) designs based on electrical interconnects will face limits in communication bandwidth, transmission latency, and power consumption in the near future. With the development of silicon photonics technology, the hybrid optical network-on-chip (HONoC), which embraces both electrical and optical interconnects, is emerging as a promising solution to these problems. Today's leading-edge systems-on-chips (SoCs) comprise heterogeneous many-cores for higher energy efficiency; therefore, study beyond regular-topology NoCs is required. This paper proposes a topology design technique for HONoC that optimizes energy and latency by taking into account the traffic characteristics of target applications. The proposed technique is implemented with a genetic algorithm, and simulation results show reductions of 13.84% in power loss and 28.14% in average latency.

IPC-based Dynamic SM management on GPGPU for Executing AES Algorithm

  • Son, Dong Oh;Choi, Hong Jun;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information / v.25 no.2 / pp.11-19 / 2020
  • Modern GPUs can execute general-purpose computation on the graphics processing unit and provide high performance by exploiting the many cores on the GPU. Running the AES algorithm efficiently requires parallel computational resources. However, the computational resources of a CPU architecture are insufficient for cryptographic algorithms such as AES, whereas a GPU architecture has massively parallel computational resources. Therefore, this paper reduces the time to execute AES by employing the parallel computational resources of a GPGPU. Unfortunately, AES cannot fully utilize the computational resources of a GPGPU because it is not well suited to the GPGPU architecture. In this paper, an IPC-based dynamic SM management technique is proposed to execute AES efficiently on a GPGPU. IPC-based dynamic SM management can increase and decrease the number of active SMs at run time by using IPC. According to simulation results, the proposed technique improves performance by increasing resource utilization compared to the baseline GPGPU architecture. The results show that AES performance improves by 41.2% on average.
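
One way to picture run-time adjustment of the active SM (streaming multiprocessor) count from measured IPC (instructions per cycle) is a simple hill-climbing controller. The abstract does not describe the paper's actual policy, so the sketch below is purely an assumption. The function `next_active_sms`, the one-SM step, and the "reverse direction when IPC drops" rule are all hypothetical.

```python
from typing import Tuple


def next_active_sms(active: int, step: int, prev_ipc: float, curr_ipc: float,
                    min_sms: int, max_sms: int) -> Tuple[int, int]:
    """Hypothetical hill-climbing policy: keep moving the SM count in the
    current direction while IPC does not drop, reverse when it drops.
    Returns (new_active_sm_count, new_step)."""
    if curr_ipc < prev_ipc:
        step = -step  # the last adjustment hurt IPC, so go the other way
    new_active = max(min_sms, min(max_sms, active + step))  # clamp to limits
    return new_active, step


if __name__ == "__main__":
    # Illustrative IPC samples, one per sampling interval (made-up numbers).
    active, step, prev = 8, 1, 0.0
    for ipc in [1.0, 1.2, 1.1, 1.3]:
        active, step = next_active_sms(active, step, prev, ipc, 1, 16)
        prev = ipc
        print(active, step)
```

A real controller would also need to choose a sampling interval and allow for noise in the IPC measurements; both are left out of this sketch.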

An Admission Control for End-to-end Performance Guarantee in Next Generation Networks (Next Generation Networks에서의 단대단 성능 보장형 인입제어)

  • Joung, Jin-Oo;Choi, Jeong-Min
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.8B / pp.1141-1149 / 2010
  • Next Generation Networks (NGN) are defined as IP-based networks with multiple services and multiple access networks. A variety of services and access technologies coexist within NGN. Therefore, there are numerous transport technologies, such as Differentiated Services (DiffServ), Multi-protocol Label Switching (MPLS), and combinations of them. In such an environment, flows are aggregated and de-aggregated multiple times along their end-to-end paths. In this research, we propose a method for calculating the end-to-end delay bound for such a flow, provided that information about flow aggregates is exchanged among networks, especially the maximum burst size of a flow aggregate entering a network. We suggest an admission control mechanism that can decide whether the requested performance for a flow can be met. We further verify the suggested calculation and admission algorithm with a few realistic scenarios.
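
The admission decision described above can be sketched in a much simplified form. The paper's own delay-bound calculation is not given in the abstract. The sketch below instead assumes the standard latency-rate server bound, burst / rate + latency, for each network, and sums these bounds along the path. The names `end_to_end_delay_bound` and `admit` and the numbers in the example are hypothetical.

```python
from typing import List, Tuple

# One entry per network on the path (all values are assumed inputs):
# (max burst of the flow aggregate entering the network, in bits;
#  service rate allocated to the aggregate, in bits per second;
#  fixed latency of the network, in seconds)
Hop = Tuple[float, float, float]


def end_to_end_delay_bound(hops: List[Hop]) -> float:
    """Sum the per-network bounds burst / rate + latency along the path."""
    return sum(burst / rate + latency for burst, rate, latency in hops)


def admit(hops: List[Hop], requested_delay_s: float) -> bool:
    """Admit the flow only if the computed bound meets the requested delay."""
    return end_to_end_delay_bound(hops) <= requested_delay_s


if __name__ == "__main__":
    path = [(1e6, 1e9, 0.001), (2e6, 1e9, 0.002)]  # made-up example path
    print(end_to_end_delay_bound(path), admit(path, 0.01))
```

This per-network bound depends on the maximum burst size of the aggregate entering that network, which is the information the abstract says must be exchanged among networks.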