• Title/Summary/Keyword: SPARK

Search results: 1,508 items (processing time 0.03 seconds)

Design of Spark SQL Based Framework for Advanced Analytics (Spark SQL 기반 고도 분석 지원 프레임워크 설계)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering
    • /
    • Vol. 5, No. 10
    • /
    • pp.477-482
    • /
    • 2016
  • As advanced analytics on big data becomes indispensable for agile decision-making and tactical planning in enterprises, distributed processing platforms such as Hadoop and Spark, which distribute and process large volumes of data across multiple nodes, have received great attention in the field. In the Spark platform stack, Spark SQL was recently unveiled to give Spark a distributed processing framework based on SQL. However, Spark SQL cannot effectively handle advanced analytics involving machine learning and graph processing, in terms of iterative tasks and task allocation. Motivated by these issues, this paper proposes the design of a SQL-based big data optimal processing engine and a processing framework to support advanced analytics in Spark environments. The big data optimal processing engine copes with complex SQL queries involving multiple parameters and join, aggregation, and sorting operations in a distributed/parallel manner, and the proposed framework optimizes the machine learning process in terms of relational operations.
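
The multi-operator queries the engine targets (join, aggregation, sorting) can be illustrated with a self-contained sqlite3 sketch. This is not the paper's engine, and the table names and data are invented for illustration, but Spark SQL would accept essentially the same statement via `spark.sql(...)`:

```python
# A join + aggregation + sort query of the shape the abstract describes,
# run on sqlite3 so the example is standalone. Tables and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, region TEXT);
    CREATE TABLE sales (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO sales VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5);
""")
rows = conn.execute("""
    SELECT c.region, SUM(s.amount) AS total
    FROM sales s JOIN customers c ON s.customer_id = c.id
    GROUP BY c.region
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('east', 17.5), ('west', 7.5)]
```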

An Experimental Study on the Effects of Spark Plug on the Strength of Electromagnetic Waves Radiating at the Spark Ignition System (불꽃 점화시스템에서 복사되는 전자파의 세기에 스파크 플러그가 미치는 영향에 대한 실험적 연구)

  • Choe, Gwang-Je;Jho, Shi-Gie;Jang, Sung-Kuk
    • Transactions of the Korean Society of Automotive Engineers
    • /
    • Vol. 15, No. 6
    • /
    • pp.94-101
    • /
    • 2007
  • In this paper, we analyzed the measured data of the radiated power spectrum of electromagnetic waves and the standing wave ratio (SWR) of the spark plug cable and spark plug. The measured data are the power strength of the electromagnetic waves radiated from the spark ignition system, over a frequency range of 110 to 610 MHz. The results show that the strength and bandwidth of the radiated power spectrum are related to the SWR of the spark plug cable and spark plug, and that the SWR differs among manufacturers because the resistor characteristics of the spark plug differ. From these results, it can be concluded that the smaller the SWR, the weaker the maximum level of the power spectrum and the narrower the bandwidth above the reference level.
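
The SWR quantity the paper correlates with radiated power can be computed from a reflection coefficient. The following is a generic textbook sketch, not the paper's measurement setup; the impedance values are hypothetical:

```python
# SWR from a complex reflection coefficient — illustrative only.

def reflection_coefficient(z_load: complex, z0: float = 50.0) -> complex:
    """Reflection coefficient of a load Z_L against line impedance Z0."""
    return (z_load - z0) / (z_load + z0)

def swr(z_load: complex, z0: float = 50.0) -> float:
    """SWR = (1 + |Gamma|) / (1 - |Gamma|)."""
    gamma = abs(reflection_coefficient(z_load, z0))
    return (1 + gamma) / (1 - gamma)

# A matched load gives SWR = 1; any mismatch raises it.
print(swr(50.0))  # 1.0
print(swr(75.0))  # 1.5
```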

Comparison of Scala and R for Machine Learning in Spark (스파크에서 스칼라와 R을 이용한 머신러닝의 비교)

  • Woo-Seok Ryu
    • The Journal of the Korea Institute of Electronic Communication Sciences
    • /
    • Vol. 18, No. 1
    • /
    • pp.85-90
    • /
    • 2023
  • Data analysis methodology in the healthcare field is shifting from traditional statistics-oriented research methods to predictive research using machine learning. In this study, we survey various machine learning tools and compare several programming models that utilize R and Spark, in order to apply R, a statistical tool widely used in the healthcare field, to machine learning. In addition, we compare the performance of a linear regression model implemented in Scala, the base language of Spark, and in R. In our experiments, the training execution time with SparkR increased by 10 to 20% compared to Scala. Despite this performance degradation, SparkR's distributed processing was confirmed to be useful in that R, the traditional statistical analysis tool, can be used as it is.
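
The simple linear regression fitted in both environments reduces to closed-form ordinary least squares. This plain-Python sketch shows the underlying computation only, not the SparkR or Spark MLlib APIs; the sample data are made up:

```python
# Closed-form OLS for simple linear regression (slope and intercept that
# minimize squared error). Illustrative data, not the paper's dataset.

def fit_simple_ols(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]
slope, intercept = fit_simple_ols(xs, ys)
```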

Spatial Computation on Spark Using GPGPU (GPGPU를 활용한 스파크 기반 공간 연산)

  • Son, Chanseung;Kim, Daehee;Park, Neungsoo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • Vol. 5, No. 8
    • /
    • pp.181-188
    • /
    • 2016
  • Recently, as the amount of spatial information has increased, interest in spatial information processing has grown. Spatial database systems extended from traditional relational database systems have difficulty handling large data sets because of limited scalability. SpatialHadoop, extended from the Hadoop system, suffers low performance because its spatial computations require many writes of intermediate results to disk. In this paper, Spatial Computation Spark (SC-Spark), an in-memory distributed processing framework, is proposed. SC-Spark extends Spark to efficiently perform spatial operations on large-scale data. In addition, a GPGPU-based SC-Spark is developed to further improve performance. SC-Spark takes advantage of Spark's keeping of intermediate results in memory, and the GPGPU-based SC-Spark can perform spatial operations in parallel using the many processing elements of a GPU. To verify the proposed work, experiments were performed on a single AMD system using SC-Spark and GPGPU-based SC-Spark for Point-in-Polygon and spatial join operations. The experimental results showed that SC-Spark and GPGPU-based SC-Spark were up to 8 times faster than SpatialHadoop.
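
The Point-in-Polygon predicate used in the experiments can be sketched with the standard ray-casting test. This sequential version is illustrative only; SC-Spark's GPGPU variant would evaluate the same predicate over many points in parallel:

```python
# Ray-casting Point-in-Polygon: count crossings of a horizontal ray going
# right from the point; an odd count means the point is inside.

def point_in_polygon(px, py, polygon):
    """Return True if (px, py) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the ray's y level
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))  # True
print(point_in_polygon(5, 2, square))  # False
```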

Distributed Processing of Big Data Analysis based on R using SparkR (SparkR을 이용한 R 기반 빅데이터 분석의 분산 처리)

  • Ryu, Woo-Seok
    • The Journal of the Korea Institute of Electronic Communication Sciences
    • /
    • Vol. 17, No. 1
    • /
    • pp.161-166
    • /
    • 2022
  • In this paper, we analyze the problems that occur when performing big data analysis using R as a data analysis tool, and present the usefulness of data analysis with SparkR, which connects R and Spark to support distributed processing of big data effectively. First, we study the memory allocation problem of R, which occurs when loading large amounts of data and performing operations, and the characteristics and programming environment of SparkR. We then compare the execution performance of linear regression analysis in each environment. The analysis showed that R can be used for data analysis through SparkR without learning an additional language, and that code written in R can be effectively processed in a distributed manner as the number of nodes in the cluster increases.

A study on knock model in spark ignition engine (스파크 점화 기관의 노크 모델에 관한 연구)

  • 장종관;이종태;이성열
    • Journal of the korean Society of Automotive Engineers
    • /
    • Vol. 14, No. 5
    • /
    • pp.30-40
    • /
    • 1992
  • Spark knock obstructs any improvement in the efficiency and performance of an engine. As knock mechanisms of the spark ignition engine, the detonation theory and the autoignition theory have been offered. In this paper, a knock model was established that can predict the onset and timing of knock in a spark ignition engine on the basis of autoignition theory. The model is a function of engine speed and equivalent air-fuel ratio. When the established knock model was tested against engine speed data from 1000 rpm to 3000 rpm, the maximum error between measured and predicted knock time was 2 degrees of crank angle. The main results of the experimental analysis of spark knock were as follows. 1) Knock frequency increased as engine speed increased. 2) Knock amplitude increased as the mass of end gas increased. 3) Knock occurred only above a minimum end-gas mass fraction of 18%.
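
A common autoignition-based knock criterion in this family is the Livengood-Wu integral, which predicts knock onset when the integrated reciprocal ignition delay reaches 1. The sketch below is a generic illustration; the Arrhenius constants are purely hypothetical, not the paper's calibrated model:

```python
# Livengood-Wu knock-onset integral over discretized pressure/temperature
# traces. Constants A, n, B below are illustrative placeholders.
import math

def ignition_delay(p_bar: float, T_k: float, A=0.01, n=1.7, B=3800.0) -> float:
    """Arrhenius-type ignition delay tau = A * p^-n * exp(B/T), in seconds."""
    return A * p_bar ** (-n) * math.exp(B / T_k)

def knock_onset_time(pressure_trace, temperature_trace, dt: float):
    """Time at which the integral of dt/tau first reaches 1, else None."""
    integral = 0.0
    for step, (p, T) in enumerate(zip(pressure_trace, temperature_trace)):
        integral += dt / ignition_delay(p, T)
        if integral >= 1.0:
            return step * dt
    return None
```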


An Estimate of the Spark Plug Gap by Measuring Breakdown Voltage (방전전압 측정에 의한 점화플러그의 간극 추정)

  • Jeon, Chang-Sung;Kim, Jung-Il
    • Proceedings of the KIEE Conference
    • /
    • Proceedings of the 2005 KIEE Fall Conference (Electrical Properties and Applications Section)
    • /
    • pp.210-213
    • /
    • 2005
  • This article describes a method of estimating the spark plug gap by measuring the breakdown voltage. The breakdown voltage is a function of spark plug gap, pressure, temperature, and humidity; however, it is dominated mainly by the spark plug gap. This technique is applied to the in-line process test of the spark plug gap in automobile engine production. The breakdown voltage of normal spark plugs scatters slightly under ordinary conditions, and if there is dust or a burr in the gap, the breakdown voltage becomes lower. This technique saves repair time for bad spark plugs and contributes to improving the quality of automobile engines.
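
The inversion idea can be sketched with a uniform-field approximation: air at 1 atm breaks down near 3 kV/mm, so the gap can be estimated from the measured voltage. That constant is a textbook figure, not a value from the paper, and real conditions differ:

```python
# Toy inversion of the monotone voltage-gap relation the abstract relies on.

E_BD_KV_PER_MM = 3.0  # approximate dielectric strength of air at 1 atm

def estimate_gap_mm(breakdown_kv: float) -> float:
    """Invert V ≈ E_bd * d to estimate the gap in millimetres."""
    return breakdown_kv / E_BD_KV_PER_MM

# In this toy model a ~1 mm gap breaks down near 3 kV; a noticeably lower
# measured voltage suggests a contaminated gap, as the abstract notes.
print(estimate_gap_mm(3.0))  # 1.0
print(estimate_gap_mm(1.8))
```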


Large Scale Cooperative Coevolution Differential Evolution (대규모 협동진화 차등진화)

  • Shin, Seong-Yoon;Tan, Xujie;Shin, Kwang-Seong;Lee, Hyun-Chang
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022 Spring Conference of the Korea Institute of Information and Communication Engineering
    • /
    • pp.665-666
    • /
    • 2022
  • Differential evolution is an efficient algorithm for continuous optimization problems. However, applying differential evolution to solve large-scale optimization problems quickly degrades performance and exponentially increases runtime. To overcome this problem, a new cooperative coevolution differential evolution based on Spark (referred to as SparkDECC) is proposed. The divide-and-conquer strategy is used in SparkDECC.
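
The base algorithm that SparkDECC decomposes and distributes is standard differential evolution. This minimal DE/rand/1/bin sketch runs on the sphere benchmark; the hyperparameters are common defaults, not values from the paper:

```python
# Minimal differential evolution (DE/rand/1/bin) on the sphere function.
import random

def differential_evolution(f, dim, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=42):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutate: base vector plus scaled difference of two others.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantee at least one mutated gene
            trial = [
                min(hi, max(lo, pop[a][j] + F * (pop[b][j] - pop[c][j])))
                if (rng.random() < CR or j == j_rand) else pop[i][j]
                for j in range(dim)
            ]
            ft = f(trial)
            if ft <= fitness[i]:  # greedy selection
                pop[i], fitness[i] = trial, ft
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

sphere = lambda x: sum(v * v for v in x)
best, value = differential_evolution(sphere, dim=5, bounds=(-5.0, 5.0))
```

With a fixed seed the run is deterministic; on this small problem the sphere minimum near 0 is approached closely within 200 generations.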


Cooperative Coevolution Differential Evolution (협력적 공진화 차등진화)

  • Shin, Seong-Yoon;Lee, Hyun-Chang;Shin, Kwang-Seong;Kim, Hyung-Jin;Lee, Jae-Wan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021 Fall Conference of the Korea Institute of Information and Communication Engineering
    • /
    • pp.559-560
    • /
    • 2021
  • Differential evolution is an efficient algorithm for solving continuous optimization problems. However, applying differential evolution to solve large-scale optimization problems dramatically degrades performance and exponentially increases runtime. Therefore, a novel cooperative coevolution differential evolution based on Spark (known as SparkDECC) is proposed. The divide-and-conquer strategy is used in SparkDECC.


A Study on Buffer Optimization System for Improving Performance in Spark Cluster (Spark 클러스터 환경에서 분산 처리 성능 향상을 위한 Buffer 최적화 시스템 연구)

  • Seok-Min Hong;So-Yeoung Lee;Yong-Tae Shin
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023 Spring Conference of the Korea Information Processing Society
    • /
    • pp.396-398
    • /
    • 2023
  • According to a Statista survey, the scale of data is expected to grow every year, and interest in big data processing frameworks is rising. In the big data processing framework Spark, data is transferred between nodes during the Shuffle phase. To send the distributed data over the network, a serialization step is required that converts objects into a byte stream and places it in a memory buffer. However, if the byte stream is larger than the memory buffer, an additional memory allocation step occurs, which can lead to overall performance degradation in Spark. In this paper, we propose a serialization buffer optimization system to improve distributed processing performance in a Spark environment. In the proposed method, before the Spark Driver assigns tasks to Executors, it measures the serialized data size and sets serialization options so that an appropriately sized buffer can be allocated to each Executor. Performance evaluation in an actual Spark cluster environment is required to validate the proposed method.
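
The measure-then-allocate idea can be sketched in plain Python with pickle. Spark itself serializes with its Java/Kryo serializers and exposes buffer sizing via the `spark.kryoserializer.buffer` options, so this is an illustration of the principle only:

```python
# Measure the serialized size first, then allocate the buffer once, so no
# mid-write regrowth (the overhead the paper targets) is needed.
import io
import pickle

def serialized_size(obj) -> int:
    """How many bytes the object serializes to."""
    return len(pickle.dumps(obj))

def serialize_into_buffer(obj, initial_capacity: int) -> io.BytesIO:
    """Serialize into a buffer sized up-front from the measured size."""
    needed = serialized_size(obj)
    capacity = max(initial_capacity, needed)  # grow once, before writing
    buf = io.BytesIO(bytes(capacity))
    buf.seek(0)
    buf.write(pickle.dumps(obj))
    return buf

record = {"partition": 7, "values": list(range(1000))}
buf = serialize_into_buffer(record, initial_capacity=64)
```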