• Title/Summary/Keyword: General purpose computing


Processing large-scale data with Apache Spark (Apache Spark를 활용한 대용량 데이터의 처리)

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1077-1094 / 2016
  • Apache Spark is a fast, general-purpose cluster computing package. It provides a new abstraction, the resilient distributed dataset (RDD), which supports fault tolerance while keeping data in memory. This abstraction yields a significant speedup over the legacy large-scale data processing framework MapReduce. In particular, the Spark framework is well suited to iterative machine learning applications, such as logistic regression and K-means clustering, and to interactive data querying. Thanks to its versatility, Spark also provides high-level libraries for various applications such as machine learning, streaming data processing, database querying, and graph data mining. In this work, we introduce the concept and programming model of Spark and present implementations of simple statistical computing applications. We also review the machine learning package MLlib and the R language interface SparkR.
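
As a concrete illustration of the RDD programming model described in this abstract, the following is a minimal PySpark sketch, assuming a local Spark installation with the pyspark package available; it is not taken from the paper, which also covers MLlib and SparkR.

```python
# Minimal PySpark sketch: create an in-memory RDD and run a simple
# map/reduce-style statistical computation (the mean of a numeric range).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         .appName("rdd-demo")
         .getOrCreate())
sc = spark.sparkContext

# A resilient distributed dataset (RDD), cached in memory so that
# repeated (iterative) passes avoid re-reading the data.
data = sc.parallelize(range(1_000_000)).cache()

total = data.map(float).reduce(lambda a, b: a + b)
mean = total / data.count()
print(mean)

spark.stop()
```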

A Parallel Bulk Loading Method for $B^+$-Tree Using CUDA (CUDA를 활용한 병렬 $B^+$-트리 벌크로드 기법)

  • Sung, Joo-Ho;Lee, Yoon-Woo;Han, A;Choi, Won-Ik;Kwon, Dong-Seop
    • Journal of KIISE:Computing Practices and Letters / v.16 no.6 / pp.707-711 / 2010
  • Most relational database systems provide $B^+$-trees as their main index structures and use bulk-loading techniques to build new $B^+$-trees from scratch on existing data. Although bulk loading is more effective than inserting keys one by one, it is still time-consuming because all the keys in a large dataset must be sorted. To improve the performance of bulk loading, this paper proposes an efficient parallel bulk-loading method for $B^+$-trees based on CUDA, a parallel computing architecture developed by NVIDIA to harness the computing power of graphics processing units for general-purpose computing. Experimental results show that the proposed method improves performance by more than 70 percent compared to existing bulk-loading methods.
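
To make the bulk-loading idea concrete, here is a small CPU-only Python sketch of bottom-up $B^+$-tree construction from sorted keys; the sort and node-packing steps are the kind of work the paper offloads to CUDA, and the node layout (ORDER, separator keys) is a simplification assumed for illustration, not the paper's structure.

```python
# Illustrative CPU-only sketch of bottom-up B+-tree bulk loading from sorted keys.
from dataclasses import dataclass, field
from typing import List, Optional

ORDER = 4  # maximum entries per node (illustrative)

@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for leaves

def bulk_load(keys: List[int]) -> Optional[Node]:
    keys = sorted(keys)                       # the step a GPU would parallelize
    if not keys:
        return None
    # Pack sorted keys into leaf nodes.
    level = [Node(keys[i:i + ORDER]) for i in range(0, len(keys), ORDER)]
    # Build internal levels until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), ORDER):
            group = level[i:i + ORDER]
            # Separator keys: smallest key of each child except the first.
            parents.append(Node([c.keys[0] for c in group[1:]], group))
        level = parents
    return level[0]

root = bulk_load(list(range(100, 0, -1)))
print(len(root.keys), len(root.children))  # -> 1 2
```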

Pub/Sub-based Sensor virtualization framework for Cloud environment

  • Ullah, Mohammad Hasmat;Park, Sung-Soon;Nob, Jaechun;Kim, Gyeong Hun
    • International journal of advanced smart convergence / v.4 no.2 / pp.109-119 / 2015
  • The interaction between wireless sensors, such as Internet of Things (IoT) devices, and the Cloud is a new paradigm of communication virtualization that overcomes resource and efficiency restrictions. Cloud computing provides virtually unlimited platforms, resources, and services and covers almost every area of computing. Wireless Sensor Networks (WSNs), in turn, have gained attention for their potential in applications such as the IoT, environment monitoring, healthcare, military systems, critical infrastructure monitoring, home and industrial automation, transportation, and business, while virtual groups and social networks play a major role in information sharing. However, sensor networks lack resources, storage capacity, and computational power, as well as extensibility, fault tolerance, reliability, and openness, and their data are not yet available to community groups or cloud environments for general-purpose research or utilization. If we reduce the gap between the real and virtual worlds by bringing this WSN-driven data into cloud environments and virtual communities, the data can attract wide attention and provide benefits in various sectors. We propose a Pub/Sub-based sensor virtualization framework for the Cloud environment. This integration provides resources, services, and storage together with sensor-driven data to the community. Physical sensors are virtualized as virtual sensors on cloud computing, and the middleware and virtual sensors are provisioned automatically to end users whenever required. Our architecture provides services to end users without their being concerned about implementation details. Furthermore, we propose an efficient content-based event matching algorithm that analyzes subscriptions and publishes the proper contents in a cost-effective manner. Our evaluation shows that the algorithm performs better than previously proposed algorithms.
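
As a rough sketch of what content-based event matching means here, the following Python snippet matches a published sensor event against attribute-predicate subscriptions; the data model, operators, and subscription names are assumptions for illustration, not the paper's algorithm.

```python
# Hypothetical content-based Pub/Sub matching: subscriptions are attribute
# predicates, and a published sensor event is delivered to every subscriber
# whose predicates it satisfies.
import operator

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

subscriptions = {
    "user-1": [("type", "=", "temperature"), ("value", ">", 30.0)],
    "user-2": [("type", "=", "humidity")],
}

def match(event: dict) -> list:
    """Return subscribers whose every predicate holds for this event."""
    matched = []
    for sub_id, predicates in subscriptions.items():
        if all(attr in event and OPS[op](event[attr], val)
               for attr, op, val in predicates):
            matched.append(sub_id)
    return matched

print(match({"type": "temperature", "value": 35.2, "sensor": "vs-07"}))
# -> ['user-1']
```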

Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU (GPU 성능 저하 해결을 위한 내부 자원 활용/비활용 상태 분석)

  • Choi, Hongjun;Son, Dongoh;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association / v.15 no.7 / pp.1-11 / 2015
  • In recent high-performance computing systems, GPGPU has been widely used to process general-purpose applications as well as graphics applications, since the GPU can provide optimized computational resources for massive parallel processing. Unfortunately, GPGPU does not fully exploit the computational resources of the GPU when executing general-purpose applications, because these applications cannot always be optimized for the GPU architecture. Therefore, we provide a GPU research guideline for improving the performance of computing systems that use GPGPU. To accomplish this, we analyze the factors that degrade GPU performance. In order to classify the causes of this degradation clearly, GPU core status is divided into five states: fully active, partially active, idle, memory stall, and GPU core stall. Every state except fully active causes performance degradation. We evaluate the ratio of each GPU core state depending on the characteristics of the benchmarks to find the specific reasons why GPU performance degrades. According to our simulation results, the partially active, idle, memory stall, and GPU core stall states are induced by underutilization of computational resources, low parallelism, high memory request rates, and structural hazards, respectively.
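
A small Python sketch, under assumed sample fields and thresholds, of how per-cycle simulator samples could be classified into the five states above and reported as ratios; this is an illustration only, not the authors' simulation framework.

```python
# Classify per-cycle GPU core samples into the five states and report ratios.
# The sample format and fields are assumptions for illustration.
from collections import Counter

def classify(sample: dict, warp_slots: int) -> str:
    if sample["stalled_on_memory"]:
        return "memory stall"
    if sample["structural_hazard"]:
        return "GPU core stall"
    active = sample["active_warps"]
    if active == 0:
        return "idle"
    return "fully active" if active == warp_slots else "partially active"

samples = [
    {"active_warps": 48, "stalled_on_memory": False, "structural_hazard": False},
    {"active_warps": 12, "stalled_on_memory": False, "structural_hazard": False},
    {"active_warps": 0,  "stalled_on_memory": True,  "structural_hazard": False},
    {"active_warps": 0,  "stalled_on_memory": False, "structural_hazard": False},
]

counts = Counter(classify(s, warp_slots=48) for s in samples)
for status, n in counts.items():
    print(f"{status}: {n / len(samples):.0%}")
```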

Design of a scalable general-purpose parallel associative processor using content-addressable memory (Content-Addressable Memory를 이용한 확장 가능한 범용 병렬 Associative Processor 설계)

  • Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD / v.43 no.2 s.344 / pp.51-59 / 2006
  • The von Neumann architecture suffers from the bottleneck at the interface between the central processing unit and memory, known as the 'von Neumann bottleneck'. In this paper, we propose a scalable general-purpose associative processor (AP) based on content-addressable memory (CAM), which alleviates this problem and is well suited to search-oriented applications. We propose an efficient instruction set and a structurally scalable design that can be extended to larger applications. We define twelve instructions and provide several reduced instructions, which execute two instructions in a single instruction cycle, to speed up processing. The proposed AP operates in a bit-serial, word-parallel fashion and can be regarded as a 32-bit general-purpose parallel processor with a massively parallel SIMD structure. We design and simulate maximum/minimum search, greater-than/less-than search, and parallel addition to verify the proposed architecture. The algorithms execute in constant time O(k), where k is the word length in bits, regardless of the number of input data.
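
The bit-serial, word-parallel style can be illustrated with a minimal Python model of the maximum search: one candidate-filtering step per bit position, i.e. O(k) steps for k-bit words independent of the number of words. This models the algorithmic idea only, not the proposed CAM hardware or instruction set.

```python
# Bit-serial, word-parallel maximum search over a set of k-bit words.
def cam_max(words, k=32):
    candidates = set(range(len(words)))          # all words start as candidates
    for bit in range(k - 1, -1, -1):             # MSB -> LSB, one "cycle" each
        ones = {i for i in candidates if (words[i] >> bit) & 1}
        if ones:                                 # keep only words with a 1 here
            candidates = ones
    return candidates                            # indices holding the maximum

data = [0x1A2B, 0xFFEE, 0x0042, 0xFFEE]
print(cam_max(data))  # -> {1, 3}
```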

GPU Accelerating Methods for Pease FFT Processing (Pease FFT 처리를 위한 GPU 가속 기법)

  • Oh, Se-Chang;Joo, Young-Bok;Kwon, Oh-Young;Huh, Kyung-Moo
    • Journal of Institute of Control, Robotics and Systems / v.20 no.1 / pp.37-41 / 2014
  • The FFT (Fast Fourier Transform) is widely used in fields such as image processing, voice processing, physics, astronomy, and applied mathematics. Because of its importance, much research has been conducted on the FFT, and recently new FFT algorithms using a GPU (Graphics Processing Unit) have been developed for faster processing. In this paper, a new optimized FFT algorithm based on the Pease FFT algorithm is proposed that reflects the hardware configuration of a GPGPU (General-Purpose computing on GPU). According to the experiments, the proposed algorithm outperformed CUFFT by 3% to 43%.
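
For readers unfamiliar with the stage structure that GPU FFT kernels parallelize, the following is a standard iterative radix-2 FFT in Python/NumPy; it is not the paper's Pease-based CUDA implementation, but each of its log2(n) butterfly stages is the kind of fully data-parallel step a GPU executes.

```python
import numpy as np

# Standard iterative radix-2 FFT, shown only to illustrate the per-stage
# butterfly structure that constant-geometry (Pease-style) kernels exploit.
def fft_iterative(x):
    a = np.asarray(x, dtype=complex).copy()
    n = len(a)                       # n must be a power of two
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages of butterflies; each stage is fully data-parallel.
    m = 2
    while m <= n:
        w_m = np.exp(-2j * np.pi / m)
        for k in range(0, n, m):
            w = 1.0
            for t in range(m // 2):
                u, v = a[k + t], w * a[k + t + m // 2]
                a[k + t], a[k + t + m // 2] = u + v, u - v
                w *= w_m
        m *= 2
    return a

x = np.random.rand(1024)
print(np.allclose(fft_iterative(x), np.fft.fft(x)))  # -> True
```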

Distributed Process of Approximate Shape Optimization Based on the Internet (인터넷 기반 근사 형상최적설계의 분산처리)

  • Lim, O-Kaung;Choi, Eun-Ho;Kim, Woo-Hyun
    • Journal of the Computational Structural Engineering Institute of Korea / v.21 no.4 / pp.317-324 / 2008
  • Optimum design of general or complex structures requires a large number of structural analyses. However, a conventional single-processor computing environment cannot deliver high efficiency in the structural analysis and design process for complex structures. In this paper, a virtual parallel computing system is constructed from personal computers and workstations communicating over the Internet. In addition, a routine that automatically executes Pro/E, ANSYS, and the optimization algorithm is adopted in the distributed processing of sequential approximate optimization, in order to enhance the flexibility of applying the method to general structures. By employing this distributed processing technique for structural analyses performed with commercial applications, the total computation time can be reduced, which enhances the applicability of the proposed technique to general complex structures.
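
As a rough, hypothetical illustration of the distributed-processing idea (the paper drives Pro/E and ANSYS over networked PCs and workstations), the sketch below farms out the independent analyses of one approximate-optimization iteration to parallel workers; run_analysis is a placeholder, not a real solver interface.

```python
# Distribute the independent analyses of one optimization iteration.
from concurrent.futures import ProcessPoolExecutor

def run_analysis(design):
    # Placeholder objective; a real system would launch a commercial solver here.
    return (design["t"] - 2.0) ** 2 + (design["r"] - 5.0) ** 2

def evaluate_iteration(designs):
    # Each design point is analyzed independently, so the wall-clock time is
    # roughly one analysis time rather than the sum of all analyses.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(run_analysis, designs))

if __name__ == "__main__":
    candidates = [{"t": 1.5 + 0.5 * i, "r": 4.0 + 0.5 * i} for i in range(4)]
    print(evaluate_iteration(candidates))
```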

Comparative Analysis of Index Terms and Social Tags: Medical Subject Headings vs. BibSonomy and Delicious

  • Lee, Danielle H.
    • Journal of the Korean Society for Library and Information Science / v.49 no.2 / pp.291-311 / 2015
  • This paper presents a comparative analysis of the similarities and differences between Medical Subject Headings (MeSH) and social tags. Both types of metadata have the same purpose, succinctly abstracting the content of a given document, but they are created from heterogeneous viewpoints: MeSH terms reflect the perspective of publication-related professionals, whereas social tags reflect the perspectives of general readers. When both types of metadata are assigned to the same publications, do they consist of different nomenclatures reflecting these heterogeneous viewpoints, or are they similar because both describe the same publications? Social tags are also compared with the family terms of MeSH terms in the MeSH hierarchy, in order to understand the specificity of social tags relative to MeSH terms. Lastly, given that readers assign social tags casually without any controlled vocabulary, we tested how many social tags contain consumer health terms, which are familiar to laypeople. Through these comparisons, we ultimately aim to examine how well the highly controlled publication index reflects general readers' cognitive understanding, and we stress the necessity of general readers' involvement in the publication indexing process.

Performance Evaluation of the GPU Architecture Executing Parallel Applications (병렬 응용프로그램 실행 시 GPU 구조에 따른 성능 분석)

  • Choi, Hong-Jun;Kim, Cheol-Hong
    • The Journal of the Korea Contents Association / v.12 no.5 / pp.10-21 / 2012
  • The role of the GPU has evolved from graphics-specific processing to general-purpose processing with the development of the unified shader core architecture. In particular, execution methods for general-purpose parallel applications on the GPU have been researched intensively, since the parallel hardware architecture can be utilized efficiently when such applications are executed. However, the current GPU architecture has limitations in executing general-purpose parallel applications, because the GPU is not yet specialized for general-purpose computing. To improve GPU performance when general-purpose parallel applications are executed, the GPU architecture should evolve. In this work, we analyze GPU performance as the architecture varies in the number of cores and the clock frequency. Our simulation results show that GPU performance improves by up to 125.8% and 16.2% as the number of cores and the clock frequency increase, respectively. However, the performance improvement saturates even as the number of cores and the clock frequency continue to increase, since data cannot be supplied to the GPU fast enough due to the limited memory bandwidth. Consequently, to achieve high performance efficiency on the GPU, computational resources must be balanced more carefully against the available memory bandwidth.
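
The saturation effect described above can be illustrated with a toy roofline-style model; the numbers below are assumptions chosen for illustration, not the paper's simulation parameters.

```python
# Toy analytical model: throughput is capped by memory bandwidth once the
# workload becomes bandwidth-bound, so adding cores eventually stops helping.
def throughput(cores, clock_ghz, flops_per_core_per_cycle=2,
               bytes_per_flop=1.0, mem_bw_gbs=150):
    compute_limit = cores * clock_ghz * flops_per_core_per_cycle   # GFLOP/s
    bandwidth_limit = mem_bw_gbs / bytes_per_flop                  # GFLOP/s
    return min(compute_limit, bandwidth_limit)

for cores in (8, 16, 32, 64, 128):
    print(cores, throughput(cores, clock_ghz=1.0))
# Throughput grows with core count until the 150 GFLOP/s bandwidth ceiling
# is reached, after which more cores yield no further gain.
```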

A Study on Improved Image Matching Method using the CUDA Computing (CUDA 연산을 이용한 개선된 영상 매칭 방법에 관한 연구)

  • Cho, Kyeongrae;Park, Byungjoon;Yoon, Taebok
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.4 / pp.2749-2756 / 2015
  • Recently, as the quality and volume of image data have increased, image processing has become increasingly time-consuming, and image processing algorithms need to be accelerated. In this paper, a character recognition system based on image matching is implemented for English alphabet character images of constant, standardized size, and its computing speed and performance are compared across a traditional CPU implementation, an OpenMP implementation, and a CUDA (Compute Unified Device Architecture) based implementation on a GPGPU (General Purpose GPU) programming platform. When the four cores of an Intel i5 2500 are used with OpenMP, the speedup falls short of four times over the existing CPU implementation because of the delay introduced by partitioning and merging the data, yielding an improvement of about 3.2 times. In contrast, the parallel processing on the video card achieves a performance gain of about 21 times compared to the sequential CPU-based processing.
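
A simplified Python sketch of the matching step being accelerated: each candidate region is scored against every learned character template independently, which is the parallelism that OpenMP threads or CUDA blocks exploit. The template format and scoring function are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Score a candidate character region against each learned template with a
# sum of squared differences and pick the best match.
def recognize(region: np.ndarray, templates: dict) -> str:
    scores = {label: float(np.sum((region.astype(float) - tmpl.astype(float)) ** 2))
              for label, tmpl in templates.items()}
    return min(scores, key=scores.get)

# Toy 8x8 "templates" for two letters; the per-template scores are
# independent, so they can be computed in parallel.
rng = np.random.default_rng(0)
templates = {"A": rng.integers(0, 255, (8, 8)), "B": rng.integers(0, 255, (8, 8))}
sample = templates["B"].copy()
print(recognize(sample, templates))  # -> 'B'
```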