• Title/Summary/Keyword: multiple CPU's

Scalable Prediction Models for Airbnb Listing in Spark Big Data Cluster using GPU-accelerated RAPIDS

  • Muralidharan, Samyuktha;Yadav, Savita;Huh, Jungwoo;Lee, Sanghoon;Woo, Jongwook
    • Journal of information and communication convergence engineering / v.20 no.2 / pp.96-102 / 2022
  • We aim to build predictive models for Airbnb prices using GPU-accelerated RAPIDS in a big data cluster. The Airbnb Listings datasets are used for the predictive analysis. Several machine-learning algorithms were adopted to build models that predict the price of Airbnb listings. We compare the results of traditional and big data approaches to machine learning for price prediction and discuss the performance of the models. We built big data models using a Databricks Spark cluster, a distributed parallel computing system. Furthermore, we implemented models on multiple GPUs using RAPIDS in the Spark cluster. The GPU model was developed using the XGBoost algorithm, whereas the other models were developed using traditional central processing unit (CPU)-based algorithms. This study compared all models in terms of accuracy metrics and computing time. We observed that the XGBoost model with RAPIDS using GPUs had the highest accuracy and computing time.

Ubiquitous Workspace Synchronization in a Cloud-based Framework (클라우드 기반 프레임워크에서 유비쿼터스 워크스페이스 동기화)

  • Elijorde, Frank I.;Yang, Hyunho;Lee, Jaewan
    • Journal of Internet Computing and Services / v.14 no.1 / pp.53-62 / 2013
  • It is common for users to have multiple computing devices and to access their files or do work at different locations. To achieve file consistency as well as mobility in this scenario, an efficient approach to workspace synchronization should be used. However, file synchronization alone cannot guarantee the mobility of the work environment, which allows activities to be resumed at any place and time. This paper proposes a ubiquitous synchronization approach that provides cloud-based access to a user's workspace. Efficient synchronization is achieved by combining session monitoring with file system management. Experimental results show that the proposed mechanism outperforms Cloud Master-replica Synchronization in terms of the number of I/O operations, CPU utilization, and the average and maximum latencies in responding to client requests.

A Study on Distributed System Construction and Numerical Calculation Using Raspberry Pi

  • Ko, Young-ho;Heo, Gyu-Seong;Lee, Sang-Hyun
    • International journal of advanced smart convergence / v.8 no.4 / pp.194-199 / 2019
  • As system performance increases, data is increasingly processed in parallel rather than one item at a time. Today's CPU architectures have been developed to leverage multiple cores, and data processing methods are being developed to enable parallel processing. In recent years, desktop CPUs have gained more cores, data is growing exponentially, and the need for data processing is growing as artificial intelligence develops. The neural networks used in artificial intelligence consist of matrices, which makes them well suited to parallel processing. Against this background, this paper aims to speed up processing by using Raspberry Pi boards to build a cluster and implement a parallel processing system. The Raspberry Pi is a credit card-sized single-board computer made by the Raspberry Pi Foundation in England, developed for education in schools and developing countries. It is cheap, and because many people use it, the information needed to work with it is easy to find. Distributed processing systems must be supported by programs that connect multiple computers in parallel and operate on the built system. Each Raspberry Pi is connected to a switching hub, the connected boards communicate over the internal network, and parallel processing is implemented using the Message Passing Interface (MPI). Parallel programs can be written in Python, and C or Fortran can also be used. The system was tested by multiplying a two-dimensional array of size 10000 by 0.1. The tests showed a reduction in computation time and that the work can be parallelized up to the maximum number of cores in the system. The systems in this paper are built on Linux-based single-board computers and are thought to require testing on systems in different environments.
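
The test described in this abstract (scaling a two-dimensional array by 0.1 across cluster nodes) follows a scatter, compute, gather pattern. The sketch below illustrates only that data flow in a single Python process; it does not use MPI, and the function names, the 10x4 array, and the worker count are assumptions made for illustration, not details from the paper.

```python
def split_rows(matrix, n_workers):
    """Split rows into n_workers contiguous chunks of near-equal size."""
    base, extra = divmod(len(matrix), n_workers)
    chunks, start = [], 0
    for rank in range(n_workers):
        # The first `extra` ranks receive one additional row.
        size = base + (1 if rank < extra else 0)
        chunks.append(matrix[start:start + size])
        start += size
    return chunks


def scale_chunk(chunk, factor):
    """Work done by one node: multiply every element of its rows by factor."""
    return [[value * factor for value in row] for row in chunk]


def scatter_scale_gather(matrix, factor, n_workers):
    """Single-process stand-in for scatter, per-node compute, and gather."""
    chunks = split_rows(matrix, n_workers)                        # scatter
    partial = [scale_chunk(chunk, factor) for chunk in chunks]    # compute
    return [row for part in partial for row in part]              # gather


# Small hypothetical input: 10 rows of 4 values, split over 3 workers.
matrix = [[float(r * 4 + c) for c in range(4)] for r in range(10)]
result = scatter_scale_gather(matrix, 0.1, n_workers=3)
```

In an actual MPI program each chunk would be sent to a different rank and the list comprehension in the compute step would run concurrently on the nodes; the partitioning logic stays the same.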

GPU-based Parallel Ant Colony System for Traveling Salesman Problem

  • Rhee, Yunseok
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.1-8 / 2022
  • In this paper, we design and implement a GPU-based parallel algorithm to effectively solve the traveling salesman problem (TSP) through an ant colony system. The iterative process of generating hundreds or thousands of tours simultaneously exploits the GPU's task-level parallelism, and the update of the pheromone trail data exploits data parallelism with 32x32 thread blocks. In particular, through simultaneous memory access by multiple threads, coalesced accesses to contiguous memory addresses and concurrent accesses to shared memory are supported. The experiments used 127- to 1002-city data sets provided by TSPLIB and compared the performance of the sequential and parallel algorithms on a system with an Intel Core i9-9900K CPU and an Nvidia Titan RTX. GPU parallelization yields a speedup of about 10.13 to 11.37 times.
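
The pheromone update mentioned in this abstract is the data-parallel step: every matrix cell can be updated independently. The following is a minimal sequential sketch of the textbook Ant System rule (evaporation followed by a deposit of `q / tour_length` on each tour edge). It is not the paper's GPU kernel, and the parameter names `rho` and `q`, the four-city distance matrix, and the two tours are assumptions chosen for illustration.

```python
def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def update_pheromone(pheromone, tours, dist, rho=0.5, q=1.0):
    """Evaporate all trails, then deposit pheromone along every tour."""
    n = len(pheromone)
    # Evaporation: each cell is independent, which suits data parallelism.
    new = [[(1.0 - rho) * pheromone[i][j] for j in range(n)] for i in range(n)]
    for tour in tours:
        deposit = q / tour_length(tour, dist)  # shorter tours deposit more
        for i in range(len(tour)):
            a, b = tour[i], tour[(i + 1) % len(tour)]
            new[a][b] += deposit
            new[b][a] += deposit  # symmetric TSP
    return new


# Hypothetical 4-city instance: unit-square sides cost 1, diagonals cost 2.
dist = [[0, 1, 2, 1],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [1, 2, 1, 0]]
pheromone = [[1.0] * 4 for _ in range(4)]
tours = [[0, 1, 2, 3], [0, 2, 1, 3]]
updated = update_pheromone(pheromone, tours, dist)
```

Edges that appear only in the longer tour end up with less pheromone than edges on the shorter tour, which is what biases later tour construction toward short edges.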

Fast Image Pre-processing Algorithms Using SSE Instructions (SSE 명령어를 이용한 영상의 고속 전처리 알고리즘)

  • Park, Eun-Soo;Cui, Xuenan;Kim, Jun-Chul;Im, Yu-Cheong;Kim, Hak-Il
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.2 / pp.65-77 / 2009
  • This paper proposes fast image processing algorithms using SSE (Streaming SIMD Extensions) instructions. CPUs that support SSE instructions have 128-bit XMM registers; the data held in these registers are processed simultaneously in SIMD (Single Instruction Multiple Data) mode. This paper develops new SIMD image processing algorithms for the mean filter, the Sobel horizontal edge detector, and the morphological erosion operation, which are the most widely used in automated optical inspection systems, and compares their processing times. In order to evaluate the processing time objectively, the developed algorithms are compared with OpenCV 1.0 operated in SISD (Single Instruction Single Data) mode, and with Intel's IPP 5.2 and MIL 8.0, which are fast image processing libraries supporting SIMD mode. The experimental results show that the proposed algorithms are on average 8 times faster than the SISD-mode image processing library and 1.4 times faster than the SIMD fast image processing libraries. The proposed algorithms demonstrate their applicability to practical high-speed image processing systems without commercial image processing libraries or additional hardware.
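
A 128-bit register holds sixteen 8-bit pixels, so one SIMD instruction can process sixteen pixels at once. The sketch below mimics that idea in plain Python for a simplified 1x3 horizontal erosion (minimum over a pixel and its two neighbours): one version works pixel by pixel, the other works on blocks of 16 "lanes" using three shifted loads. It is an illustration of the processing pattern only, not the paper's algorithm; the function names, the border handling, and the one-dimensional kernel are assumptions.

```python
LANES = 16  # a 128-bit register holds sixteen 8-bit pixels


def erode_row_scalar(row):
    """1x3 horizontal erosion, one pixel per step (SISD-style reference)."""
    n = len(row)
    return [min(row[max(i - 1, 0)], row[i], row[min(i + 1, n - 1)])
            for i in range(n)]


def erode_row_lanes(row):
    """Same result, computed LANES pixels per step to mimic SIMD processing."""
    n = len(row)
    padded = [row[0]] + list(row) + [row[-1]]  # replicate the border pixels
    out = []
    for start in range(0, n, LANES):
        width = min(LANES, n - start)
        # Three "loads" shifted by one pixel each: left, centre, right.
        left = padded[start:start + width]
        centre = padded[start + 1:start + 1 + width]
        right = padded[start + 2:start + 2 + width]
        # One lane-wise minimum over the three loads.
        out.extend(min(a, b, c) for a, b, c in zip(left, centre, right))
    return out


# Hypothetical 40-pixel row of 8-bit values.
row = [(i * 37) % 256 for i in range(40)]
eroded = erode_row_lanes(row)
```

With real SSE code the inner generator would be replaced by packed byte-minimum instructions on whole registers, which is where the speedup over the scalar loop comes from.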

Matching Points Filtering Applied Panorama Image Processing Using SURF and RANSAC Algorithm (SURF와 RANSAC 알고리즘을 이용한 대응점 필터링 적용 파노라마 이미지 처리)

  • Kim, Jeongho;Kim, Daewon
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.4 / pp.144-159 / 2014
  • Techniques for making a single panoramic image from multiple pictures are widely studied in areas such as computer vision and computer graphics. A panoramic image can be applied to fields such as virtual reality and robot vision, which require wide-angle shots, as a useful way to overcome the limitations of an image taken from a single camera, such as picture angle, resolution, and internal information. It is also meaningful in that a panoramic image usually provides a better feeling of immersion than a plain image. Although there are many ways to build a panoramic image, most of them extract feature points and matching points from each image to make a single panoramic image. In addition, those methods use the RANSAC (RANdom SAmple Consensus) algorithm with the matching points and a homography matrix to transform the image. The SURF (Speeded Up Robust Features) algorithm, which is used in this paper to extract feature points, uses an image's black-and-white information and local spatial information. SURF is widely used because it is robust to changes in image size and viewpoint and is faster than the SIFT (Scale Invariant Feature Transform) algorithm. SURF has the shortcoming that errors made when extracting an image's feature points slow down the RANSAC algorithm. As a result, this may increase CPU usage. Errors in detecting matching points can be a critical cause of degraded accuracy and clarity in the panoramic image. In this paper, in order to minimize errors in extracting matching points, we used the RGB pixel values of the 3×3 region around each matching point's coordinates to perform an intermediate filtering process that removes wrong matching points. We also present analysis and evaluation results for the improved speed of producing a panorama image, the CPU usage rate, the reduction rate of extracted matching points, and accuracy.
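
The intermediate filtering step described here can be pictured as a colour check on the 3×3 neighbourhood of each matched pair before RANSAC runs. The sketch below is one plausible reading, not the paper's exact criterion: it compares the mean RGB values of the two patches and drops a pair when any channel differs by more than a threshold. The function names, the `max_diff` threshold, the border clamping, and the tiny test images are all assumptions.

```python
def patch_mean_rgb(image, x, y):
    """Mean (R, G, B) of the 3x3 patch centred on (x, y), clamped at borders."""
    h, w = len(image), len(image[0])
    sums = [0, 0, 0]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            px = min(max(x + dx, 0), w - 1)
            py = min(max(y + dy, 0), h - 1)
            for ch in range(3):
                sums[ch] += image[py][px][ch]
    return [s / 9.0 for s in sums]


def filter_matches(img_a, img_b, matches, max_diff=20.0):
    """Keep only pairs whose 3x3 patch colours agree within max_diff."""
    kept = []
    for (xa, ya), (xb, yb) in matches:
        mean_a = patch_mean_rgb(img_a, xa, ya)
        mean_b = patch_mean_rgb(img_b, xb, yb)
        if all(abs(a - b) <= max_diff for a, b in zip(mean_a, mean_b)):
            kept.append(((xa, ya), (xb, yb)))
    return kept


# Hypothetical 5x5 images: img_a is uniformly grey, img_b is grey on the
# left (x < 3) and red on the right.
img_a = [[(100, 100, 100) for _ in range(5)] for _ in range(5)]
img_b = [[(100, 100, 100) if x < 3 else (200, 0, 0) for x in range(5)]
         for _ in range(5)]
matches = [((2, 2), (1, 2)),   # lands on grey in both images
           ((2, 2), (4, 2))]   # lands on red in img_b
kept = filter_matches(img_a, img_b, matches)
```

Removing such pairs before RANSAC reduces the number of outliers it has to reject, which is consistent with the speed and CPU usage improvements the abstract reports.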

Parallel Approximate String Matching with k-Mismatches for Multiple Fixed-Length Patterns in DNA Sequences on Graphics Processing Units (GPU을 이용한 다중 고정 길이 패턴을 갖는 DNA 시퀀스에 대한 k-Mismatches에 의한 근사적 병열 스트링 매칭)

  • Ho, ThienLuan;Kim, HyunJin;Oh, SeungRohk
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.6 / pp.955-961 / 2017
  • In this paper, we propose a parallel approximate string matching algorithm with k-mismatches for multiple fixed-length patterns (PMASM) in DNA sequences. PMASM is developed from parallel single-pattern approximate string matching algorithms to effectively calculate the Hamming distances for multiple patterns of a fixed length. In the preprocessing phase of PMASM, all target patterns are binary encoded and stored in a look-up memory. With each input character from the input string, the Hamming distances between a substring and all patterns can be updated at the same time based on the binary encoding information in the look-up memory. Moreover, PMASM adopts graphics processing units (GPUs) to process the data computations in parallel. This paper presents three kinds of PMASM implementation methods on GPUs: the thread PMASM, block-thread PMASM, and shared-mem PMASM methods. The shared-mem PMASM method gives an example of making effective use of the GPU's parallel capacity. Moreover, it also exploits special features of the CUDA (Compute Unified Device Architecture) memory structure to optimize performance. In the experiments with DNA sequences, the proposed PMASM on GPU is 385, 77, and 64 times faster than the traditional naive algorithm, the shift-add algorithm, and the single-thread PMASM implementation on CPU, respectively. With the same NVIDIA GPU model, the performance of the proposed approach is enhanced by up to 44% and 21% compared with the naive and the shift-add algorithms, respectively.
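
The look-up idea in this abstract can be sketched sequentially: precompute, for every character and every pattern position, a 0/1 mismatch flag per pattern, then add those flags for each text window to obtain all Hamming distances together. The code below is a simplified CPU illustration under that reading, not the PMASM implementation; the function names, the table layout, and the sample text and patterns are assumptions.

```python
def build_lookup(patterns, alphabet="ACGT"):
    """lookup[c][j][p] is 1 if pattern p mismatches character c at position j."""
    m = len(patterns[0])
    assert all(len(p) == m for p in patterns), "patterns must share one length"
    return {c: [[0 if p[j] == c else 1 for p in patterns] for j in range(m)]
            for c in alphabet}


def find_matches(text, patterns, k):
    """Return (start, pattern_index, distance) for windows within k mismatches."""
    lookup = build_lookup(patterns)
    m = len(patterns[0])
    hits = []
    for start in range(len(text) - m + 1):
        dist = [0] * len(patterns)
        for j in range(m):
            flags = lookup[text[start + j]][j]
            # One table row updates the distances of all patterns together.
            for p in range(len(patterns)):
                dist[p] += flags[p]
        for p, d in enumerate(dist):
            if d <= k:
                hits.append((start, p, d))
    return hits


# Hypothetical example: two length-4 patterns, at most one mismatch allowed.
hits = find_matches("ACGTACGA", ["ACGT", "TCGA"], k=1)
```

On a GPU the outer loop over window starts is the natural unit to distribute across threads, since each window's distances are independent of the others.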

Improvement of learning performance and control of a robot manipulator using neural network with adaptive learning rate (적응 학습률을 이용한 신경회로망의 학습성능개선 및 로봇 제어)

  • Lee, Bo-Hee;Lee, Taek-Seung;Kim, Jin-Geol
    • Journal of Institute of Control, Robotics and Systems / v.3 no.4 / pp.363-372 / 1997
  • In this paper, we mainly discuss the design and implementation of an adaptive learning rate neural network controller for an articulated robot developed in our Automatic Control Laboratory. The controller reduces the software computational load through distributed processing using multiple CPUs, and simplifies the hardware structure through time-division control with a TMS320C31 DSP chip. The proposed neural network controller, whose adaptive learning rate structure uses an expert's heuristics, can improve learning speed. The superiority of the proposed controller is verified by comparing the response characteristics of a conventional controller with those of the proposed controller, obtained from experiments on a 5-axis vertical articulated robot. We also present the generalization property of the proposed controller for unlearned trajectories and changes of load through experimental data.
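
The abstract does not state which heuristics drive the adaptive learning rate. As a generic illustration of the idea, the sketch below uses the well-known "bold driver" rule: grow the rate after a step that lowers the error, and shrink it (and reject the step) otherwise. This rule, the function name, the `up` and `down` factors, and the one-dimensional quadratic loss are all assumptions, not the controller described in the paper.

```python
def train_adaptive(grad, loss, w, lr=0.1, up=1.1, down=0.5, steps=50):
    """Gradient descent whose learning rate adapts to the observed error."""
    prev = loss(w)
    history = [prev]
    for _ in range(steps):
        candidate = w - lr * grad(w)
        current = loss(candidate)
        if current < prev:
            # Error went down: accept the step and be slightly bolder.
            w, prev = candidate, current
            lr *= up
        else:
            # Error did not improve: reject the step and be more cautious.
            lr *= down
        history.append(prev)
    return w, lr, history


def quadratic_loss(w):
    """Toy error surface with its minimum at w = 3."""
    return (w - 3.0) ** 2


def quadratic_grad(w):
    return 2.0 * (w - 3.0)


w_final, lr_final, history = train_adaptive(quadratic_grad, quadratic_loss, w=0.0)
```

Because rejected steps leave the weight unchanged, the recorded error never increases, while the growing rate lets the early iterations converge faster than a fixed small rate would.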

Acceleration of Viewport Extraction for Multi-Object Tracking Results in 360-degree Video (360도 영상에서 다중 객체 추적 결과에 대한 뷰포트 추출 가속화)

  • Heesu Park;Seok Ho Baek;Seokwon Lee;Myeong-jin Lee
    • Journal of Advanced Navigation Technology / v.27 no.3 / pp.306-313 / 2023
  • Realistic and graphics-based virtual reality content is based on 360-degree videos, and viewport extraction driven by the viewer's intention or an automatic recommendation function is essential. This paper designs a viewport extraction system based on multiple object tracking in 360-degree videos and proposes a parallel computing structure necessary for multiple viewport extraction. The viewport extraction process in 360-degree videos is parallelized by composing pixel-wise threads, through 3D spherical surface coordinate transformation from ERP coordinates and 2D coordinate transformation of 3D spherical surface coordinates within the viewport. The proposed structure was evaluated on the computation time for up to 30 viewport extraction processes in aerial 360-degree video sequences, and achieved up to 5240 times acceleration compared with CPU-based computation, whose time is proportional to the number of viewports. When using high-speed I/O or memory buffers that can reduce ERP frame I/O time, viewport extraction time can be further accelerated by 7.82 times. The proposed parallelized viewport extraction structure can be applied to simultaneous multi-access services for 360-degree videos or virtual reality content and to video summarization services for individual users.
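
The per-pixel coordinate transformations between the viewport, the sphere, and the ERP (equirectangular) frame can be sketched as follows. This version uses the common inverse mapping (viewport pixel to 3D ray to longitude/latitude to ERP position), which may differ from the direction used in the paper; the function name, the axis conventions, and the yaw/pitch/field-of-view parameters are assumptions. Each loop iteration is independent, which is what makes one thread per pixel feasible.

```python
import math


def viewport_to_erp(vw, vh, fov, yaw, pitch, erp_w, erp_h):
    """Map every viewport pixel to fractional (u, v) coordinates in an ERP frame."""
    half = math.tan(fov / 2.0)  # half-width of the image plane at z = 1
    coords = []
    for j in range(vh):
        row = []
        for i in range(vw):
            # Ray through the pixel centre on a plane one unit ahead.
            x = (2.0 * (i + 0.5) / vw - 1.0) * half
            y = (1.0 - 2.0 * (j + 0.5) / vh) * half * vh / vw
            z = 1.0
            # Rotate by pitch (about the x axis), then yaw (about the y axis).
            y1 = y * math.cos(pitch) + z * math.sin(pitch)
            z1 = -y * math.sin(pitch) + z * math.cos(pitch)
            x2 = x * math.cos(yaw) + z1 * math.sin(yaw)
            z2 = -x * math.sin(yaw) + z1 * math.cos(yaw)
            # 3D ray to spherical angles.
            norm = math.sqrt(x2 * x2 + y1 * y1 + z2 * z2)
            lon = math.atan2(x2, z2)      # range [-pi, pi]
            lat = math.asin(y1 / norm)    # range [-pi/2, pi/2]
            # Spherical angles to ERP pixel coordinates.
            u = (lon / (2.0 * math.pi) + 0.5) * erp_w
            v = (0.5 - lat / math.pi) * erp_h
            row.append((u, v))
        coords.append(row)
    return coords


# Hypothetical 3x3 viewport with a 90-degree field of view on a 400x200 frame.
front = viewport_to_erp(3, 3, math.pi / 2, 0.0, 0.0, 400, 200)
```

A complete extractor would then sample the ERP frame at each `(u, v)` (for example with bilinear interpolation) to fill the viewport image.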

An Implementation of Graphic Offloading Computing using GPU Virtualization based on API Remoting on a Server-based Software Service (서버 기반 SW 서비스에서 API 리모팅 기반의 GPU 가상화를 이용한 그래픽 분할 실행의 구현)

  • Choi, Won-Hyuk;Kim, Won-Young
    • Journal of Internet Computing and Services / v.12 no.6 / pp.53-62 / 2011
  • In this paper, we introduce a graphic offloading computing method that uses GPU virtualization technology to provide demanding software, such as 3D software, as an online software service. When the offloading software is executed in the server's software virtualization environment, its graphics work is processed on the client's GPU using GPU virtualization, while its data work is processed on the server's CPU. To do this, we propose a method of rendering graphics information on the client-side GPU using API remoting. We also show that this achieves better performance than server-based rendering when serving offloading software that includes dynamic 3D graphics whose displayed images change frequently. Moreover, we describe a method to virtualize the offloading software at the process level and to manage clients' configuration information in order to decrease the server's load when providing the software service to multiple clients.