• Title/Summary/Keyword: Spark Platform

Big data platform for health monitoring systems of multiple bridges

  • Wang, Manya; Ding, Youliang; Wan, Chunfeng; Zhao, Hanwei
    • Structural Monitoring and Maintenance / v.7 no.4 / pp.345-365 / 2020
  • At present, many machine learning and data mining methods are used to analyze and predict structural response characteristics. However, a platform that combines big data analysis methods with online and offline analysis modules has not yet been used in actual projects. This work develops a multifunctional Hadoop-Spark big data platform that monitors and evaluates the serviceability of bridges based on their structural health monitoring systems, enabling rapid processing, analysis and storage of the collected monitoring data. The platform contains offline computing and online analysis modules built on a Hadoop-Spark environment: Hadoop provides the overall framework and the storage subsystem, while Spark is used for online computing. Finally, the computational performance of the platform is verified through several real analysis tasks. Experiments show that the Hadoop-Spark big data platform offers good fault tolerance, scalability and online analysis performance, meeting the daily analysis requirement of 5 s per run for one bridge and 40 s per run for 100 bridges.
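
As a rough illustration of the online-analysis step such a platform performs, the PySpark sketch below computes daily response statistics for monitoring data stored on HDFS. The file path, schema and column names (bridge_id, sensor_id, strain) are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not from the paper): per-sensor daily response statistics
# for bridge monitoring data stored on HDFS, with Spark doing the online step.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("bridge-shm-online-analysis")
         .getOrCreate())

# Hypothetical schema/path: timestamp, bridge_id, sensor_id, strain reading.
readings = (spark.read
            .option("header", True)
            .option("inferSchema", True)
            .csv("hdfs:///shm/bridge_readings/*.csv"))

# Daily summary per bridge and sensor: mean, max and standard deviation of strain.
summary = (readings
           .withColumn("day", F.to_date("timestamp"))
           .groupBy("bridge_id", "sensor_id", "day")
           .agg(F.mean("strain").alias("mean_strain"),
                F.max("strain").alias("max_strain"),
                F.stddev("strain").alias("std_strain")))

summary.write.mode("overwrite").parquet("hdfs:///shm/daily_summary")
```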

Design of Spark SQL Based Framework for Advanced Analytics (Spark SQL 기반 고도 분석 지원 프레임워크 설계)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering / v.5 no.10 / pp.477-482 / 2016
  • As advanced analytics on big data becomes indispensable for agile decision-making and tactical planning in enterprises, distributed processing platforms such as Hadoop and Spark, which distribute and handle large volumes of data on multiple nodes, are receiving great attention in the field. In the Spark platform stack, Spark SQL was recently unveiled to let Spark support SQL-based distributed processing. However, Spark SQL cannot effectively handle advanced analytics that involves machine learning and graph processing, in terms of iterative tasks and task allocation. Motivated by these issues, this paper proposes the design of a SQL-based big data processing engine and framework to support advanced analytics in Spark environments. The engine copes with complex SQL queries involving multiple parameters and join, aggregation and sorting operations in a distributed/parallel manner, and the proposed framework optimizes the machine learning process in terms of relational operations.
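
The sketch below illustrates the kind of workload such a framework targets: a Spark SQL query with a join, an aggregation and a sort whose result is handed directly to an iterative machine-learning step. The table names, columns and the choice of KMeans are illustrative assumptions, not the authors' design.

```python
# Illustrative sketch (assumed table/column names): a Spark SQL query with a join,
# aggregation and sort, whose result feeds a machine-learning step in the same job.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("spark-sql-advanced-analytics").getOrCreate()

orders = spark.read.parquet("hdfs:///warehouse/orders")        # assumed dataset
customers = spark.read.parquet("hdfs:///warehouse/customers")  # assumed dataset
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")

# Complex SQL handled by the distributed engine: join + aggregation + sorting.
features = spark.sql("""
    SELECT c.customer_id,
           COUNT(o.order_id) AS order_count,
           SUM(o.amount)     AS total_amount
    FROM customers c JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id
    ORDER BY total_amount DESC
""")

# The relational result is passed directly to an iterative ML algorithm.
assembled = VectorAssembler(inputCols=["order_count", "total_amount"],
                            outputCol="features").transform(features)
model = KMeans(k=4, featuresCol="features").fit(assembled)
```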

Detection of Abnormal Ship Operation using a Big Data Platform based on Hadoop and Spark (하둡 및 스파크 기반 빅데이터 플랫폼을 이용한 선박 운항 효율 이상 상태 분석)

  • Lee, Taehyeon; Yu, Eun-seop; Park, Kaemyoung; Yu, Seongsang; Park, Jinpyo; Mun, Duhwan
    • Journal of the Korean Society of Manufacturing Process Engineers / v.18 no.6 / pp.82-90 / 2019
  • To reduce emissions of marine pollutants, regulations are being tightened around the world, and the shipbuilding and shipping industries are putting forward various countermeasures. As there are limits to applying such countermeasures to ships already in operation, however, it is necessary for these vessels to use energy efficiently. The sensors installed on ships typically gather a very large amount of data, so a big data platform is needed to manage and analyze it. In this paper, we build a big data analysis platform based on Hadoop and Spark and present a method for detecting abnormal ship operation using the platform. We also discuss a data analysis experiment that uses real ship operation data.
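
A minimal sketch of one way such anomaly detection could be expressed on Spark (not the authors' exact method): records whose fuel consumption deviates strongly from the norm at a similar speed are flagged with a simple 3-sigma rule. The input path and column names are assumptions.

```python
# Rough sketch (column names assumed): flag operation records whose fuel
# consumption deviates strongly from the norm at a similar speed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("ship-anomaly-detection").getOrCreate()

# Assumed sensor log: ship_id, timestamp, speed_knots, fuel_per_hour.
logs = spark.read.parquet("hdfs:///ships/operation_logs")

# Bucket by speed, then compare each record to the mean/stddev of its bucket.
bucketed = logs.withColumn("speed_bucket", F.round("speed_knots"))
stats = Window.partitionBy("speed_bucket")

flagged = (bucketed
           .withColumn("mean_fuel", F.mean("fuel_per_hour").over(stats))
           .withColumn("std_fuel", F.stddev("fuel_per_hour").over(stats))
           .withColumn("z_score",
                       (F.col("fuel_per_hour") - F.col("mean_fuel")) / F.col("std_fuel"))
           .filter(F.abs(F.col("z_score")) > 3))   # simple 3-sigma rule

flagged.select("ship_id", "timestamp", "speed_knots", "fuel_per_hour", "z_score").show()
```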

A New SoC Platform with an Application-Specific PLD (전용 PLD를 가진 새로운 SoC 플랫폼)

  • Lee, Jae-Jin; Song, Gi-Yong
    • Journal of the Institute of Convergence Signal Processing / v.8 no.4 / pp.285-292 / 2007
  • SoC, which deploys software modules as well as hardware IPs on a single chip, is a major revolution in the implementation of system designs, and high-level synthesis is an important step in SoC design methodology. Recently, the SPARK parallelizing high-level synthesis tool was developed: it takes behavioral ANSI-C code as input, schedules it using code motion and various code transformations, and finally generates synthesizable RTL VHDL code. Although SPARK employs various loop transformation algorithms, its synthesis results are not acceptable for basic signal and image processing algorithms with nested loops. In this paper we propose a SoC platform with an application-specific PLD targeting the local operations that characterize many loop algorithms used in signal and image processing, and demonstrate a design process that maps a behavioral specification with nested loops, written in a high-level language (ANSI-C), onto a 2D systolic array. Finally, the derived systolic array is implemented on the proposed application-specific PLD of the SoC platform.

Big Data Astronomy : Let's "PySpark" the Universe (빅데이터 천문학 : PySpark를 이용한 천문자료 분석)

  • Hong, Sungryong
    • The Bulletin of The Korean Astronomical Society / v.43 no.1 / pp.63.1-63.1 / 2018
  • Modern large-scale surveys and state-of-the-art cosmological simulations produce various kinds of big data composed of millions to billions of galaxies. Inevitably, we need to adopt modern Big Data platforms to handle such large-scale data sets properly. In my talk, I will briefly introduce Apache Spark, the de facto standard among modern Big Data platforms, and present some examples demonstrating how Apache Spark can be utilized to solve data-driven astronomical problems.
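
For readers new to PySpark, the sketch below shows the flavor of analysis described: binning a large galaxy catalog by redshift and counting objects per bin in parallel. The catalog path and column names are assumptions, not details from the talk.

```python
# Small PySpark sketch: redshift-binned galaxy counts over an assumed catalog.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-galaxy-catalog").getOrCreate()

# Assumed catalog with columns ra, dec, redshift, r_mag.
catalog = spark.read.parquet("hdfs:///surveys/galaxy_catalog")

counts = (catalog
          .filter((F.col("redshift") > 0) & (F.col("r_mag") < 22.0))
          .withColumn("z_bin", F.floor(F.col("redshift") / 0.1) * 0.1)
          .groupBy("z_bin")
          .count()
          .orderBy("z_bin"))

counts.show()
```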

An Empirical Evaluation Analysis of the Performance of In-memory Bigdata Processing Platform (메모리 기반 빅데이터 처리 프레임워크의 성능개선 연구)

  • Lee, Jae hwan; Choi, Jun; Koo, Dong hun
    • Journal of Korea Society of Industrial Information Systems / v.21 no.3 / pp.13-19 / 2016
  • Spark, an in-memory big-data processing framework, is popular for real-time processing workloads. Spark can store all intermediate data in cluster memory, minimizing I/O access. However, when the resident memory of a workload is larger than the physical memory of the cluster, overall performance can drop dramatically. In this paper, we experimentally analyze the bottleneck factors of the memory-intensive PageRank application, deploy Spark together with the Tachyon file system to manage memory and remove the bottleneck, and thereby improve performance by about 18%.
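
The memory-hungry workload discussed is PageRank; the sketch below is the standard iterative RDD formulation, in which the cached adjacency lists are what exhaust cluster memory at scale. The paper's mitigation is to back Spark's storage with the Tachyon file system (now Alluxio), which exposes an HDFS-compatible interface for intermediate data; the input path and iteration count here are assumptions.

```python
# Minimal sketch of the memory-hungry workload: iterative PageRank over an edge list.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pagerank-memory-study").getOrCreate()
sc = spark.sparkContext

# Assumed input: one "src dst" pair per line.
edges = sc.textFile("hdfs:///graphs/edges.txt") \
          .map(lambda line: tuple(line.split())) \
          .distinct()
links = edges.groupByKey().cache()          # adjacency lists kept in cluster memory
ranks = links.mapValues(lambda _: 1.0)

for _ in range(10):                          # each iteration re-uses the cached links RDD
    contribs = links.join(ranks).flatMap(
        lambda kv: [(dst, kv[1][1] / len(kv[1][0])) for dst in kv[1][0]])
    ranks = contribs.reduceByKey(lambda a, b: a + b) \
                    .mapValues(lambda s: 0.15 + 0.85 * s)

ranks.saveAsTextFile("hdfs:///graphs/pageranks")
```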

Predictive Analysis of Financial Fraud Detection using Azure and Spark ML

  • Priyanka Purushu; Niklas Melcher; Bhagyashree Bhagwat; Jongwook Woo
    • Asia Pacific Journal of Information Systems / v.28 no.4 / pp.308-319 / 2018
  • This paper aims at providing valuable insights on financial fraud detection in mobile money transaction activity. We predict and classify transactions as normal or fraudulent on both a small sample and a massive data set, using Azure (a traditional system) and Spark ML (a Big Data platform), respectively. Experimenting with the sample dataset in Azure, we found that the Decision Forest model is the most accurate in terms of recall. For the massive data set in Spark ML, the Random Forest classifier proves to be the best algorithm. We show that the Spark cluster builds and evaluates models much faster as more servers are added to the cluster, with the same accuracy, which demonstrates that the large-scale data set can be handled predictively on a Big Data platform. Finally, we reached a recall score of 0.73, which implies satisfactory quality in predicting fraudulent transactions.
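
A hedged sketch of the Spark ML side of such an experiment follows: a Random Forest classifier over transaction records, evaluated by recall on the fraud class. The feature columns, label name, dataset path and hyperparameters are assumptions, not the authors' exact setup.

```python
# Hedged sketch (assumed columns and path): Random Forest fraud classification
# in Spark ML, evaluated by recall on the positive (fraud) class.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("fraud-detection-rf").getOrCreate()

data = spark.read.parquet("hdfs:///finance/transactions")   # assumed dataset
assembler = VectorAssembler(
    inputCols=["amount", "old_balance", "new_balance", "step"],  # assumed features
    outputCol="features")
rf = RandomForestClassifier(labelCol="is_fraud", featuresCol="features", numTrees=100)

train, test = data.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, rf]).fit(train)
predictions = model.transform(test)

# Recall on the fraud label (requires Spark 3.x for recallByLabel/metricLabel).
recall = MulticlassClassificationEvaluator(
    labelCol="is_fraud", predictionCol="prediction",
    metricName="recallByLabel", metricLabel=1.0).evaluate(predictions)
print(f"recall on fraud class: {recall:.2f}")
```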

Performance Comparison of Python and Scala APIs in Spark Distributed Cluster Computing System (Spark 기반에서 Python과 Scala API의 성능 비교 분석)

  • Ji, Keung-yeup; Kwon, Youngmi
    • Journal of Korea Multimedia Society / v.23 no.2 / pp.241-246 / 2020
  • Hadoop is a framework for processing large data sets in a distributed way across clusters of nodes. It has been a popular platform for big data processing, but in recent years other platforms have become competitive depending on the characteristics of the application. Spark is a distributed platform that enables real-time data processing and improves overall performance over Hadoop by introducing in-memory processing instead of disk I/O. Whereas Hadoop is designed around Java and data analysis is written against its Java API, Spark provides APIs for Scala, Python, Java and R. The goal of this paper is to find out whether APIs in different programming languages affect performance in Spark. We chose two popular APIs, Python and Scala: Python is easy to learn and widely used in the AI domain, while Scala offers advantages for parallelism. Our experiment shows much faster processing with the Scala API than with the Python API. For performance issues in AI-based analysis, further study is needed.
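
A job of the kind used in such a comparison might look as follows in the Python API; the same DataFrame operations can be written almost verbatim in Scala, so timing both versions isolates the language/API overhead. The input path and column name are assumptions.

```python
# Sketch of a timed PySpark job for an API comparison (input path assumed).
import time
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("api-benchmark-python").getOrCreate()

start = time.time()
df = spark.read.option("header", True).csv("hdfs:///bench/events.csv")
result = df.groupBy("event_type").count().orderBy(F.desc("count"))
result.collect()                      # force execution before stopping the clock
print(f"PySpark elapsed: {time.time() - start:.1f} s")

# Scala counterpart of the same aggregation (for the other side of the comparison):
#   df.groupBy("event_type").count().orderBy(desc("count")).collect()
```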

Appingpot : Application curation platform based on Hadoop and Spark (Appingpot : 하둡 및 스파크를 활용한 어플리케이션 큐레이션 플랫폼)

  • Jeon, Sangwoo; Shim, Euiseok; Chi, Jeonghee
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.372-373 / 2016
  • Curation services are currently operating actively not only abroad but also in Korea. In the explosively growing application market, it is becoming difficult for users to find and install apps that suit them. In response, this paper proposes Appingpot, an application curation service. Based on app log data collected from users and Facebook friend information, Appingpot uses Hadoop and Spark to provide a service that recommends suitable apps to users.
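
A hypothetical sketch of the recommendation step described above, using Spark MLlib's ALS collaborative filtering over implicit app-usage logs; the schema, path and hyperparameters are assumptions, not details from the paper.

```python
# Hypothetical sketch: collaborative filtering with ALS over implicit app-usage logs.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("appingpot-recommender").getOrCreate()

# Assumed log schema: user_id, app_id (integer IDs), usage_count (implicit feedback).
logs = spark.read.parquet("hdfs:///appingpot/app_logs")

als = ALS(userCol="user_id", itemCol="app_id", ratingCol="usage_count",
          implicitPrefs=True, rank=10, regParam=0.1, coldStartStrategy="drop")
model = als.fit(logs)

# Top-10 app recommendations per user, to be served by the curation front end.
recommendations = model.recommendForAllUsers(10)
recommendations.show(truncate=False)
```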

An Implementation of Web-Enabled OLAP Server in Korean HealthCare BigData Platform (한국 보건의료 빅데이터 플랫폼에서 웹 기반 OLAP 서버 구현)

  • Ly, Pichponreay; Kim, Jin-hyuk; Jung, Seung-hyun; Lee, Kyung-hee; Cho, Wan-sup
    • Proceedings of the Korea Contents Association Conference / 2017.05a / pp.33-34 / 2017
  • In 2015, the Ministry of Health and Welfare of Korea announced a research and development plan for using Korean healthcare data to support decision making, reduce costs and improve treatment. This project relies on the adoption of Big Data technologies such as Apache Hadoop and Apache Spark to store and process healthcare data from various institutions. Here we present the design and implementation of an OLAP server in the Korean HealthCare BigData platform. This approach establishes a basis for promoting personalized healthcare research for decision making, disease forecasting and the development of customized diagnosis and treatment.
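
As an illustration only (the fact table and columns are assumptions), the kind of OLAP aggregation such a web-enabled server could push down to Spark SQL can be expressed with CUBE, pre-computing counts and costs across region, disease code and year for interactive drill-down.

```python
# Illustrative sketch (assumed fact table): an OLAP cube pre-computed with Spark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("healthcare-olap").getOrCreate()

claims = spark.read.parquet("hdfs:///healthcare/claims")   # assumed fact table

cube = (claims
        .cube("region", "disease_code", "year")            # all grouping-set combinations
        .agg(F.count(F.lit(1)).alias("claim_count"),
             F.sum("cost").alias("total_cost"))
        .orderBy("region", "disease_code", "year"))

cube.write.mode("overwrite").parquet("hdfs:///healthcare/olap_cube")
```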
