• Title/Summary/Keyword: Spark Platform

Search Result 37, Processing Time 0.02 seconds

Big Data Astronomy: Large-scale Graph Analyses of Five Different Multiverses

  • Hong, Sungryong
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.43 no.2
    • /
    • pp.36.3-37
    • /
    • 2018
  • By utilizing large-scale graph analytic tools in the modern Big Data platform, Apache Spark, we investigate the topological structures of five different multiverses produced by cosmological n-body simulations with various cosmological initial conditions: (1) one standard universe, (2) two different dark energy states, and (3) two different dark matter densities. For the Big Data calculations, we use a custom build of stand-alone Spark cluster at KIAS and Dataproc Compute Engine in Google Cloud Platform with the sample sizes ranging from 7 millions to 200 millions. Among many graph statistics, we find that three simple graph measurements, denoted by (1) $n_\k$, (2) $\tau_\Delta$, and (3) $n_{S\ge5}$, can efficiently discern different topology in discrete point distributions. We denote this set of three graph diagnostics by kT5+. These kT5+ statistics provide a quick look of various orders of n-points correlation functions in a computationally cheap way: (1) $n = 2$ by $n_k$, (2) $n = 3$ by $\tau_\Delta$, and (3) $n \ge 5$ by $n_{S\ge5}$.

  • PDF

Leveraging Big Data for Spark Deep Learning to Predict Rating

  • Mishra, Monika;Kang, Mingoo;Woo, Jongwook
    • Journal of Internet Computing and Services
    • /
    • v.21 no.6
    • /
    • pp.33-39
    • /
    • 2020
  • The paper is to build recommendation systems leveraging Deep Learning and Big Data platform, Spark to predict item ratings of the Amazon e-commerce site. Recommendation system in e-commerce has become extremely popular in recent years and it is very important for both customers and sellers in daily life. It means providing the users with products and services they are interested in. Therecommendation systems need users' previous shopping activities and digital footprints to make best recommendation purpose for next item shopping. We developed the recommendation models in Amazon AWS Cloud services to predict the users' ratings for the items with the massive data set of Amazon customer reviews. We also present Big Data architecture to afford the large scale data set for storing and computation. And, we adopted deep learning for machine learning community as it is known that it has higher accuracy for the massive data set. In the end, a comparative conclusion in terms of the accuracy as well as the performance is illustrated with the Deep Learning architecture with Spark ML and the traditional Big Data architecture, Spark ML alone.

Development of Big-data Management Platform Considering Docker Based Real Time Data Connecting and Processing Environments (도커 기반의 실시간 데이터 연계 및 처리 환경을 고려한 빅데이터 관리 플랫폼 개발)

  • Kim, Dong Gil;Park, Yong-Soon;Chung, Tae-Yun
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.16 no.4
    • /
    • pp.153-161
    • /
    • 2021
  • Real-time access is required to handle continuous and unstructured data and should be flexible in management under dynamic state. Platform can be built to allow data collection, storage, and processing from local-server or multi-server. Although the former centralize method is easy to control, it creates an overload problem because it proceeds all the processing in one unit, and the latter distributed method performs parallel processing, so it is fast to respond and can easily scale system capacity, but the design is complex. This paper provides data collection and processing on one platform to derive significant insights from various data held by an enterprise or agency in the latter manner, which is intuitively available on dashboards and utilizes Spark to improve distributed processing performance. All service utilize dockers to distribute and management. The data used in this study was 100% collected from Kafka, showing that when the file size is 4.4 gigabytes, the data processing speed in spark cluster mode is 2 minute 15 seconds, about 3 minutes 19 seconds faster than the local mode.

Design and Implementation of a Big Data Analytics Framework based on Cargo DTG Data for Crackdown on Overloaded Trucks

  • Kim, Bum-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.12
    • /
    • pp.67-74
    • /
    • 2019
  • In this paper, we design and implement an analytics platform based on bulk cargo DTG data for crackdown on overloaded trucks. DTG(digital tachograph) is a device that stores the driving record in real time; that is, it is a device that records the vehicle driving related data such as GPS, speed, RPM, braking, and moving distance of the vehicle in one second unit. The fast processing of DTG data is essential for finding vehicle driving patterns and analytics. In particular, a big data analytics platform is required for preprocessing and converting large amounts of DTG data. In this paper, we implement a big data analytics framework based on cargo DTG data using Spark, which is an open source-based big data framework for crackdown on overloaded trucks. As the result of implementation, our proposed platform converts real large cargo DTG data sets into GIS data, and these are visualized by a map. It also recommends crackdown points.

A performance comparison for Apache Spark platform on environment of limited memory (제한된 메모리 환경에서의 아파치 스파크 성능 비교)

  • Song, Jun-Seok;Kim, Sang-Young;Lee, Jung-June;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2016.01a
    • /
    • pp.67-68
    • /
    • 2016
  • 최근 빅 데이터를 이용한 시스템들이 여러 분야에서 활발히 이용되기 시작하면서 대표적인 빅데이터 저장 및 처리 플랫폼인 하둡(Hadoop)의 기술적 단점을 보완할 수 있는 다양한 분산 시스템 플랫폼이 등장하고 있다. 그 중 아파치 스파크(Apache Spark)는 하둡 플랫폼의 속도저하 단점을 보완하기 위해 인 메모리 처리를 지원하여 대용량 데이터를 효율적으로 처리하는 오픈 소스 분산 데이터 처리 플랫폼이다. 하지만, 아파치 스파크의 작업은 메모리에 의존적이므로 제한된 메모리 환경에서 전체 작업 성능은 급격히 낮아진다. 본 논문에서는 메모리 용량에 따른 아파치 스파크 성능 비교를 통해 아파치 스파크 동작을 위해 필요한 적정 메모리 용량을 확인한다.

  • PDF

Design of a Platform for Collecting and Analyzing Agricultural Big Data (농업 빅데이터 수집 및 분석을 위한 플랫폼 설계)

  • Nguyen, Van-Quyet;Nguyen, Sinh Ngoc;Kim, Kyungbaek
    • Journal of Digital Contents Society
    • /
    • v.18 no.1
    • /
    • pp.149-158
    • /
    • 2017
  • Big data have been presenting us with exciting opportunities and challenges in economic development. For instance, in the agriculture sector, mixing up of various agricultural data (e.g., weather data, soil data, etc.), and subsequently analyzing these data deliver valuable and helpful information to farmers and agribusinesses. However, massive data in agriculture are generated in every minute through multiple kinds of devices and services such as sensors and agricultural web markets. It leads to the challenges of big data problem including data collection, data storage, and data analysis. Although some systems have been proposed to address this problem, they are still restricted either in the type of data, the type of storage, or the size of data they can handle. In this paper, we propose a novel design of a platform for collecting and analyzing agricultural big data. The proposed platform supports (1) multiple methods of collecting data from various data sources using Flume and MapReduce; (2) multiple choices of data storage including HDFS, HBase, and Hive; and (3) big data analysis modules with Spark and Hadoop.

Item Recommendation Technique Using Spark (Spark를 이용한 항목 추천 기법에 관한 연구)

  • Yun, So-Young;Youn, Sung-Dae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.5
    • /
    • pp.715-721
    • /
    • 2018
  • With the spread of mobile devices, the users of social network services or e-commerce sites have increased dramatically, and the amount of data produced by the users has increased exponentially. E-commerce companies have faced a task regarding how to extract useful information from a vast amount of data produced by the users. To solve this problem, there are various studies applying big data processing technique. In this paper, we propose a collaborative filtering method that applies the tag weight in the Apache Spark platform. In order to elevate the accuracy of recommendation, the proposed method refines the tag data in the preprocessing process and categorizes the items and then applies the information of periods and tag weight to the estimate rating of the items. After generating RDD, we calculate item similarity and prediction values and recommend items to users. The experiment result indicated that the proposed method process large amounts of data quickly and improve the appropriateness of recommendation better.

Meta's Metaverse Platform Design in the Pre-launch and Ignition Life Stage

  • Song, Minzheong
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.14 no.4
    • /
    • pp.121-131
    • /
    • 2022
  • We look at the initial stage of Meta (previous Facebook)'s new metaverse platform and investigate its platform design in pre-launch and ignition life stage. From the Rocket Model (RM)'s theoretical logic, the results reveal that Meta firstly focuses on investing in key content developers by acquiring virtual reality (VR), video, music content firms and offering production support platform of the augmented reality (AR) content, 'Spark AR' last three years (2019~2021) for attracting high-potential developers and users. In terms of three matching criteria, Meta develops an Artificial Intelligence (AI) powered translation software, partners with Microsoft (MS) for cloud computing and AI, and develops an AI platform for realistic avatar, MyoSuite. In 'connect' function, Meta curates the game concept submitted by game developers, welcomes other game and SNS based metaverse apps, and expands Horizon Worlds (HW) on VR devices to PCs and mobile devices. In 'transact' function, Meta offers 'HW Creator Funding' program for metaverse, launches the first commercialized Meta Avatar Store on Meta's conventional SNS and Messaging apps by inviting all fashion creators to design and sell clothing in this store. Mata also launches an initial test of non-fungible token (NFT) display on Instagram and expands it to Facebook in the US. Lastly, regarding optimization, especially in the face of recent data privacy issues that have adversely affected corporate key performance indicators (KPIs), Meta assures not to collect any new data and to make its privacy policy easier to understand and update its terms of service more user friendly.

Safety Autonomous Platform Design with Ensemble AI Models (앙상블 인공지능 모델을 활용한 안전 관리 자율운영 플랫폼 설계)

  • Dongyeop Lee;Daesik Lim;Soojeong Woo;Youngho Moon;Minjeong Kim;Joonwon Lee
    • Journal of Advanced Navigation Technology
    • /
    • v.28 no.1
    • /
    • pp.159-162
    • /
    • 2024
  • This paper proposes a novel safety autonomous platform (SAP) architecture that can automatically and precisely manage on-site safety through ensemble artificial intelligence models generated from video information, worker's biometric information, and the safety rule to estimate the risk index. We practically designed the proposed SAP architecture by the Hadoop ecosystem with Kafka/NiFi, Spark/Hive, Hue, ELK (Elasticsearch, Logstash, Kibana), Ansible, etc., and confirmed that it worked well with safety mobility gateways for providing various safety applications.

A System Design for Real-Time Monitoring of Patient Waiting Time based on Open-Source Platform (오픈소스 플랫폼 기반의 실시간 환자 대기시간 모니터링 시스템 설계)

  • Ryu, Wooseok
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.4
    • /
    • pp.575-580
    • /
    • 2018
  • This paper discusses system for real-time monitoring of patient waiting time in hospitals based on open-source platform. It is necessary to make use of open-source projects to develop a high-performance stream processing system, which analyzes and processes stream data in real time, with less cost. The Hadoop ecosystem is a well-known big data processing platform consisting of numerous open-source subprojects. This paper first defines several requirements for the monitoring system, and selects a few projects from the Hadoop ecosystem that are suited to meet the requirements. Then, the paper proposes system architecture and a detailed module design using Apache Spark, Apache Kafka, and so on. The proposed system can reduce development costs by using open-source projects and by acquiring data from legacy hospital information system. High-performance and fault-tolerance of the system can also be achieved through distributed processing.