• Title/Summary/Keyword: SPARK


A Survey on the Performance Comparison of Map Reduce Technologies and the Architectural Improvement of Spark

  • Raghavendra, GS;Manasa, Bezwada;Vasavi, M.
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.121-126 / 2022
  • Hadoop and Apache Spark are open source projects of the Apache Software Foundation, and both are leading big data analytics tools. Hadoop has led the big data industry for five years. Spark's processing speed can be significantly higher, up to 100 times faster. However, the amount of data handled differs: Hadoop MapReduce can process data sets that are far larger than those Spark can handle. This article compares the performance of Spark and MapReduce and discusses the advantages and disadvantages of both technologies.

Ecological Based Hybrid Differential Evolution (생태 기반 하이브리드 차등 진화)

  • Shin, Seong-Yoon;Cho, Gwang-Hyun;Cho, Seung-Pyo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.416-417 / 2022
  • In this paper, we propose a hybrid differential evolution (DE) algorithm based on an ecological model, called SparkHDE-EM. The algorithm parallelizes various DE variants by introducing a Spark-based island model, and uses the Monod model to maintain a balance between resources.


Iowa Liquor Sales Data Predictive Analysis Using Spark

  • Ankita Paul;Shuvadeep Kundu;Jongwook Woo
    • Asia Pacific Journal of Information Systems / v.31 no.2 / pp.185-196 / 2021
  • This paper aims to analyze and predict liquor sales in the state of Iowa by applying machine learning algorithms to build predictive models. We use Azure ML and Spark ML for our predictive analysis, which represent legacy machine learning (ML) systems and Big Data ML systems, respectively. We work with the Iowa liquor sales dataset, comprising records from 2012 to 2019 in 24 columns and approximately 1.8 million rows. We conclude by comparing the models built with different algorithms and their accuracy in predicting sales using both Azure ML and Spark ML. We find that the Linear Regression model has the highest precision and Decision Forest Regression has the fastest computing time on the sample data set using the legacy Azure ML system. The Decision Tree Regression model in Spark ML has the highest accuracy and the shortest computing time on the entire data set using the Big Data Spark system.

Techniques to Guarantee Real-Time Fault Recovery in Spark Streaming Based Cloud System (Spark Streaming 기반 클라우드 시스템에서 실시간 고장 복구를 지원하기 위한 기법들)

  • Kim, Jungho;Park, Daedong;Kim, Sangwook;Moon, Yongshik;Hong, Seongsoo
    • Journal of KIISE / v.44 no.5 / pp.460-468 / 2017
  • In a real-time cloud environment, the data analysis framework plays a pivotal role. Among existing frameworks, Spark Streaming meets most real-time requirements. However, it does not meet the requirement of second-scale real-time fault recovery. Spark Streaming's fault recovery time increases in proportion to the length of the transformation history, called lineage, because it recovers the last state data from the cumulative lineage recorded during normal operation. Therefore, the fault recovery time is not bounded. In addition, a second-scale fault recovery time is impossible to achieve because reading the initial state data from fault-tolerant storage takes tens of seconds. In this paper, we propose two techniques to solve these problems. We apply the proposed techniques to Spark Streaming 1.6.2. Experimental results show that the fault recovery time is bounded and the average fault recovery time is reduced by up to 41.57%.

Framework Implementation of Image-Based Indoor Localization System Using Parallel Distributed Computing (병렬 분산 처리를 이용한 영상 기반 실내 위치인식 시스템의 프레임워크 구현)

  • Kwon, Beom;Jeon, Donghyun;Kim, Jongyoo;Kim, Junghwan;Kim, Doyoung;Song, Hyewon;Lee, Sanghoon
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.11 / pp.1490-1501 / 2016
  • In this paper, we propose an image-based indoor localization system using parallel distributed computing. To reduce the computation time for indoor localization, a scale-invariant feature transform (SIFT) algorithm is executed in parallel using Apache Spark. Toward this goal, we propose a novel image processing interface for Apache Spark. The experimental results show that the proposed system is about 3.6 times faster than the conventional system.

Design and Development of Micro Combustor (II) - Design and Test of Micro Electric Spark discharge Device for Power MEMS - (미세 연소기 개발 (II) - 미세동력 장치용 미세 전극의 제작과 성능평가 -)

  • Gwon, Se-Jin;Lee, Dae-Hun;Park, Dae-Eun;Yun, Jun-Bo;Han, Cheol-Hui
    • Transactions of the Korean Society of Mechanical Engineers B / v.26 no.4 / pp.524-530 / 2002
  • A micro electric spark discharge device was fabricated on a FOTURAN glass wafer using MEMS processing techniques, and its electron discharge performance and the subsequent formation of an ignition kernel were tested. The micro electric spark device is an essential subsystem of a power MEMS that has been under development in this laboratory. In a combustion chamber of sub-millimeter depth, spark electrodes are formed by electroplating Ni on a base plate of FOTURAN glass wafer. Optimization of the spark voltage and spark gap is crucial for stable ignition and for the endurance of the electrodes. Namely, a wider spark gap ensures stable ignition but requires a higher ignition voltage to overcome the spark barrier. Also, electron discharge at a higher voltage tends to erode the electrodes, limiting the endurance of the overall system. In the present study, the discharge characteristics of the prototype ignition device were measured in terms of electrical quantities such as voltage and current, with spark gap and end shape as parameters. The discharge voltage decreases slightly at widths of less than 50 µm and increases with electrode gap size. A reliability test shows no severe damage over $10^6$ discharges, indicating satisfactory performance for application to the proposed power MEMS devices.

Processing large-scale data with Apache Spark (Apache Spark를 활용한 대용량 데이터의 처리)

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1077-1094 / 2016
  • Apache Spark is a fast, general-purpose cluster computing package. It provides a new abstraction, the resilient distributed dataset, which supports fault tolerance while keeping data in memory. This abstraction yields a significant speedup over the legacy large-scale data framework MapReduce. In particular, Spark is suitable for iterative machine learning applications such as logistic regression and K-means clustering, and for interactive data querying. Thanks to its versatility, Spark also supports high-level libraries for applications such as machine learning, streaming data processing, database querying, and graph data mining. In this work, we introduce the concept and programming model of Spark and show implementations of some simple statistical computing applications. We also review the machine learning package MLlib and the R language interface SparkR.

Scalable Ontology Reasoning Using GPU Cluster Approach (GPU 클러스터 기반 대용량 온톨로지 추론)

  • Hong, JinYung;Jeon, MyungJoong;Park, YoungTack
    • Journal of KIISE / v.43 no.1 / pp.61-70 / 2016
  • In recent years, there has been a need for large-scale ontology inference techniques that can infer new knowledge from existing knowledge at high speed and support a diversity of semantic services. With recent advances in distributed computing, ontology inference engines have mostly been developed on Hadoop or Spark frameworks running on large clusters. Parallel programming techniques using GPGPU, which utilizes many more cores than a CPU, are also used for ontology inference. In this paper, by combining the advantages of both techniques, we propose a new method for reasoning over large RDFS ontology data using a Spark in-memory framework and inferencing distributed data at high speed using GPGPU. Using GPGPU, ontology reasoning over high-capacity data can be performed at a low cost and with higher efficiency than conventional inference methods. In addition, we show that GPGPU can reduce the data workload on each node of the Spark cluster. To evaluate our approach, we used LUBM ranging from 10 to 120. Our experimental results show that the proposed reasoning engine performs 7 times faster than a conventional approach that uses a Spark in-memory inference engine.

An Experimental Study on the Secondary Waveform Analysis according to Measure of Electronic Control Waveform (가솔린엔진의 전자제어 센서파형 측정을 통한 점화2차 파형 분석에 관한 실험적 연구)

  • Yoo, Jong-Sik;Kim, Chul-Soo;Cha, Kyoung-Ok
    • Transactions of the Korean Society of Automotive Engineers / v.19 no.1 / pp.95-100 / 2011
  • The test was performed on cars travelling at speeds of 20 km/h, 60 km/h and 100 km/h in the performance testing mode of a chassis dynamometer. In this test, secondary waveforms were measured, including those obtained with faulty MAP sensors, oxygen sensors and spark plugs. The results of these measurements and the analysis of the secondary waveforms can be summarized as follows: 1) The secondary waveform measured with the faulty oxygen sensor showed a lot of noise around the peak voltage and in the rising and falling sections of the spark line, which means that the air-fuel mixture was non-homogeneous. 2) The secondary waveform measured with the faulty MAP sensor showed the worst shape compared to the other sensors, including variation in the spark line, the state of the air-fuel mixture and the velocity of the flame front. 3) The secondary waveform measured with a faulty spark plug displayed the shortest spark line time and the smallest spark line energy, which means that a misfire occurred.

Generation of Silver Nanoparticles by Spark Discharge Aerosol Generator Using Air as a Carrier Gas (공기 분위기에서 스파크 방전을 이용한 은 나노입자 생성)

  • Oh, Hyun-Cheol;Jung, Jae-Hee;Park, Hyung-Ho;Ji, Jun-Ho;Kim, Sang-Soo
    • Transactions of the Korean Society of Mechanical Engineers B / v.30 no.2 s.245 / pp.170-176 / 2006
  • A spark discharge aerosol generator using air as a carrier gas has been successfully applied to silver nanoparticle production. The spark discharge between two silver electrodes, obtained periodically by discharging a capacitor, produced sufficiently high temperatures to evaporate a small fraction of the silver electrodes. The silver vapor was subsequently supersaturated by rapid cooling and formed silver nanoparticles through nucleation and condensation. The morphology of the generated particles, observed by transmission electron microscopy, was spherical. The elemental composition of the nanoparticles, determined by energy dispersive X-ray spectroscopy, was silver. X-ray diffraction (XRD) analysis showed that the particles spark-generated under an air atmosphere were composed of silver and silver oxide phases, while the nanoparticles generated under a nitrogen atmosphere had only a silver phase. These XRD data indicate that some fraction of the evaporated silver vapor could be oxidized in an air atmosphere by reaction with oxygen. Stable operation of the spark discharge generator was achieved. The size and concentration of the particles can be easily controlled by altering the repetition frequency, capacitance, gap distance and flow rate of the spark discharge system.