• Title/Summary/Keyword: SparkR (스파크R)

Comparison of Scala and R for Machine Learning in Spark (스파크에서 스칼라와 R을 이용한 머신러닝의 비교)

  • Woo-Seok Ryu
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.1 / pp.85-90 / 2023
  • Data analysis methodology in the healthcare field is shifting from traditional statistics-oriented research to predictive research using machine learning. In this study, we survey various machine learning tools and compare several programming models that combine R and Spark in order to apply R, a statistical tool widely used in healthcare, to machine learning. In addition, we compare the performance of a linear regression model implemented in Scala, the base language of Spark, and in R. In the experiment, training time with SparkR was 10 to 20% longer than with Scala. Even with this performance degradation taken into account, SparkR's distributed processing was confirmed to be useful because R, a traditional statistical analysis tool, can be used as is.

A Performance Comparison of Machine Learning Library based on Apache Spark for Real-time Data Processing (실시간 데이터 처리를 위한 아파치 스파크 기반 기계 학습 라이브러리 성능 비교)

  • Song, Jun-Seok;Kim, Sang-Young;Song, Byung-Hoo;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2017.01a / pp.15-16 / 2017
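  • With the advent of the IoT era, large-scale data is being generated in real time, and interest is growing in distributed processing and machine learning as ways to process and use that data efficiently. Apache Spark is a distributed processing platform that supports RDD-based in-memory processing and integrates with various machine learning libraries, and it has recently attracted attention as a next-generation big data analysis engine. In this paper, we compare the data processing performance of MLlib, Apache Mahout, and SparkR, machine learning libraries that can be integrated with Apache Spark. To this end, we use the naive Bayes algorithm, a representative machine learning algorithm, and compare training time and prediction time to identify the machine learning library best suited to real-time data processing on Apache Spark.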

Processing large-scale data with Apache Spark (Apache Spark를 활용한 대용량 데이터의 처리)

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1077-1094 / 2016
  • Apache Spark is a fast, general-purpose cluster computing package. It provides a new abstraction, the resilient distributed dataset (RDD), which supports fault tolerance while keeping data in memory. This abstraction yields a significant speedup over the legacy large-scale data framework MapReduce. In particular, the Spark framework is suitable for iterative machine learning applications such as logistic regression and K-means clustering, and for interactive data querying. Thanks to its versatility, Spark also supports high-level libraries for applications such as machine learning, streaming data processing, database querying, and graph data mining. In this work, we introduce the concept and programming model of Spark and show implementations of some simple statistical computing applications. We also review the machine learning package MLlib and the R language interface SparkR.

S-PARAFAC: Distributed Tensor Decomposition using Apache Spark (S-PARAFAC: 아파치 스파크를 이용한 분산 텐서 분해)

  • Yang, Hye-Kyung;Yong, Hwan-Seung
    • Journal of KIISE / v.45 no.3 / pp.280-287 / 2018
  • Recently, the use of recommendation systems and of tensor data analysis, which deals with high-dimensional data, has been increasing, as tensor analysis allows us to extract latent elements and patterns. However, because tensors are large and complex, they need to be decomposed before the data can be analyzed. Several tools are used for tensor decomposition, such as rTensor, pyTensor, and MATLAB, but since they run on a single machine they cannot handle large data. Distributed tensor decomposition tools based on Hadoop can handle large-scale tensors, but their computing speed is too slow. In this paper, we propose S-PARAFAC, a tensor decomposition tool based on Apache Spark that runs in distributed in-memory environments. We converted the PARAFAC algorithm into an Apache Spark version that enables rapid processing of tensor data. We also compared the performance of the Hadoop-based tensor tool and S-PARAFAC. The results showed that S-PARAFAC is approximately 4 to 25 times faster than the Hadoop-based tensor tool.
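
    For reference, the PARAFAC (CP) model named in this abstract is commonly written, for a third-order tensor, as a sum of rank-one terms. The notation below (the tensor X, the factor vectors a_r, b_r, c_r, and the rank R) is the usual textbook form and is an assumption here; it is not taken from the paper itself.

```latex
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}
```

    Here \circ denotes the vector outer product, so each term is a rank-one tensor and decomposition amounts to estimating the three factor matrices whose columns are a_r, b_r, and c_r.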

Distributed Processing of Big Data Analysis based on R using SparkR (SparkR을 이용한 R 기반 빅데이터 분석의 분산 처리)

  • Ryu, Woo-Seok
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.1 / pp.161-166 / 2022
  • In this paper, we analyze the problems that arise when performing big data analysis with R as the data analysis tool, and present the usefulness of data analysis with SparkR, which connects R and Spark to support distributed processing of big data effectively. First, we study the memory allocation problem of R, which occurs when loading large amounts of data and performing operations, and the characteristics and programming environment of SparkR. We then compare execution performance when linear regression analysis is performed in each environment. The analysis showed that R can be used for data analysis through SparkR without learning an additional language, and that code written in R can be processed effectively in a distributed manner as the number of nodes in the cluster increases.

A Study on the Characteristics of Spark Ignition Engine Cleanliness by Low Level Bio-Alcohol Blending (저농도 바이오알코올 혼합에 따른 스파크 점화 엔진 청정 특성 연구)

  • CHA, GYUSOB;NO, SOOYOUNG
    • Transactions of the Korean hydrogen and new energy society / v.30 no.5 / pp.428-435 / 2019
  • A comparative evaluation of engine cleanliness was performed on transport gasoline blended with bio-alcohols, in support of the aim of greenhouse gas reduction in Korea. In particular, the fuel blended with bio-ethanol and bio-butanol showed the best engine cleaning performance for both combustion chamber deposits and intake valve deposits. The deposit control gasoline additive was effective at removing intake valve deposits. In contrast, the amount of combustion chamber deposits tended to increase even when fuels blended with bio-alcohols were used. Overall, fuels blended with bio-alcohols still showed outstanding performance in terms of engine cleanliness compared with fossil fuels.

Effects of Swirl on Flame Development and Late Combustion Characteristic in a High Speed Single-Shot Visualized SI Engine (고속 단발 가시화 스파크 점화 엔진에서의 연소 특성에 대한 선회효과 연구)

  • Kim, S.S.;Kim, S.S.
    • Transactions of the Korean Society of Automotive Engineers / v.3 no.1 / pp.54-64 / 1995
  • The effects of swirl on early flame development and late combustion characteristics were investigated using a high-speed single-shot visualized SI engine. LDV measurements were performed to gain a better understanding of the flow field in the combustion chamber. Spark plugs were located at half radius (R/2) and at the center of the bore. High-speed schlieren photographs at 20,000 frames/sec were taken, together with cylinder pressure measurements, to visualize the detailed formation and development of the flame kernel. The study showed that high swirl had favorable effects on combustion-related performance in terms of maximum cylinder pressure and flame growth rate, regardless of spark position. However, with ignition at R/2, low swirl, which showed desirable effects at low engine speed, gave worse performance than no swirl as engine speed increased. There were distinct signs of a slowdown in flame growth while the flame front expanded from 2.5 mm to 5.0 mm in radius, apparently due to the presence of the ground electrode. There also seemed to be a heat transfer effect on the flame expansion speed: in the high-swirl case the late flame front slowed down, presumably because of relatively large heat loss from the burned gas to the wall compared with the low- or no-swirl cases.

Thermal Property of Mo-5~20 wt%. Cu Alloys Synthesized by Planetary Ball Milling and Spark Plasma Sintering Method (유성볼밀링 및 스파크 플라즈마 소결법으로 제조한 Mo-5~20 wt%. Cu 합금의 열적 특성)

  • Lee, Han-Chan;Moon, Kyoung-Il;Shin, Paik-Kyun
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.29 no.8 / pp.516-521 / 2016
  • Mo-Cu alloys have been widely used for heat sink materials, vacuum technology, automobiles, and many other applications due to their excellent physical and electrical properties. In particular, Mo-Cu composites with 5~20 wt.% copper are widely used for heavy-duty service contacts due to properties such as a low coefficient of thermal expansion, wear resistance, high-temperature strength, and outstanding electrical and thermal conductivity. Most applications require highly dense Mo-Cu materials with a homogeneous microstructure for better performance. In this study, Mo-Cu alloys were prepared by PBM (planetary ball milling) and SPS (spark plasma sintering). The effect of a Cu content of 5~20 wt.% on the microstructure and thermal properties of the Mo-Cu alloys was investigated.

Influence of Low Level Bio-Alcohol Fuels on Fuel Economy and Emissions in Spark Ignition Engine Vehicles (저농도 바이오알코올 혼합 연료가 스파크 점화 엔진 차량의 연비 및 배출가스에 미치는 영향)

  • CHA, GYUSOB;NO, SOOYOUNG
    • Transactions of the Korean hydrogen and new energy society / v.31 no.2 / pp.250-258 / 2020
  • This study analyzes the impact of low-level bio-alcohol blends, which can be used without vehicle modification, with the aim of improving air quality in Korea. The emissions and fuel economy of gasoline blended with low levels of bio-alcohols were studied in spark ignition vehicles with direct injection and with port fuel injection. The evaluation showed that the particle number (PN) was reduced for all evaluated fuels compared to the sub-octane gasoline without oxygen, but the correlation between PN and increasing oxygen content was not clear. In the CVS-75 mode, CO emissions tended to decrease compared to sub-octane gasoline, but no significant correlation was found for NMHC, NOx, and fuel economy. In addition, aldehyde emissions increased with the oxygenated fuels, and the amount of aldehyde generated did not differ among the bio-alcohol blended fuels.

A Study on the Variation of Explosion Characteristics by the Block in Closed Vessel (밀폐 공간내 Block에 의한 폭발특성 변화에 관한 연구)

  • Oh Kyuhyung;Kim Jongbok;Lee Seungeun;Kim Hong;Lee Youngchul;Park Sungsu
    • Journal of the Korean Institute of Gas / v.3 no.3 s.8 / pp.23-28 / 1999
  • The variation of explosion characteristics caused by blocks in a closed vessel was investigated to analyze the effects of block volume (volume blockage) and block surface area (ratio of block surface area to vessel volume). The volume and surface area of the blocks in the explosion vessel were varied by combining blocks. The volume of the explosion vessel was 270 liters, and LPG-air or NG-air mixtures were ignited by an electric spark. Explosion pressure was measured with a strain-type pressure transducer. The experimental results show that explosion pressure decreased as the volume blockage and the block surface area increased, and that the decrease in explosion pressure was affected more by the volume blockage than by the surface area.
