• Title/Summary/Keyword: SparkR

A Performance Comparison of Machine Learning Library based on Apache Spark for Real-time Data Processing (실시간 데이터 처리를 위한 아파치 스파크 기반 기계 학습 라이브러리 성능 비교)

  • Song, Jun-Seok;Kim, Sang-Young;Song, Byung-Hoo;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2017.01a / pp.15-16 / 2017
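  • With the arrival of the IoT era, large-scale data are being generated in real time, and interest is growing in distributed processing and machine learning as ways to process and use such data efficiently. Apache Spark is a distributed processing platform that supports RDD-based in-memory processing. Because it integrates with a variety of machine learning libraries, it has recently drawn attention as a next-generation big data analysis engine. This paper compares the data processing performance of MLlib, Apache Mahout, and SparkR, which are machine learning libraries that can be used with Apache Spark. To this end, the naive Bayes algorithm, a representative machine learning algorithm, is used, and training time and prediction time are compared to identify the machine learning library best suited to real-time data processing on Apache Spark.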
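The kind of measurement described in this abstract (training time versus prediction time for naive Bayes) can be illustrated with a minimal single-machine sketch. This is not the authors' benchmark and does not use MLlib, Mahout, or SparkR; the function names `train_naive_bayes`, `predict`, and `timed`, and the toy data, are assumptions made purely for illustration.

```python
import math
import time
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model (word counts per class)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(model, words):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical toy data; a real comparison would use a large dataset.
docs = [["spark", "fast"], ["spark", "memory"], ["disk", "slow"], ["disk", "batch"]]
labels = ["in_memory", "in_memory", "on_disk", "on_disk"]

model, train_seconds = timed(train_naive_bayes, docs, labels)
prediction, predict_seconds = timed(predict, model, ["spark", "memory"])
```

Timing the two phases separately matters for real-time use, since a library can train slowly yet still predict quickly, or the reverse.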

Performance Comparison of Python and Scala APIs in Spark Distributed Cluster Computing System (Spark 기반에서 Python과 Scala API의 성능 비교 분석)

  • Ji, Keung-yeup;Kwon, Youngmi
    • Journal of Korea Multimedia Society / v.23 no.2 / pp.241-246 / 2020
  • Hadoop is a framework for processing large data sets in a distributed way across clusters of nodes. It has been a popular platform for big data processing, but in recent years other platforms have become competitive, depending on the characteristics of the application. Spark is a distributed platform that enables real-time data processing and improves overall processing performance over Hadoop by using in-memory processing instead of disk I/O. Whereas Hadoop is designed to work with Java and data analysis is done through the Java API, Spark provides APIs for Scala, Python, Java, and R. In this paper, the goal is to find out whether the APIs of different programming languages affect performance in Spark. We chose two popular APIs: Python and Scala. Python is easy to learn and is widely used in the AI domain. Scala is a programming language with advantages in parallelism. Our experiment shows much faster processing with the Scala API than with the Python API. For performance issues in AI-based analysis, further study is needed.

Distributed Indexing Methods for Moving Objects based on Spark Stream

  • Lee, Yunsou;Song, Seokil
    • International Journal of Contents / v.11 no.1 / pp.69-72 / 2015
  • Existing parallel main-memory spatial index structures generally use light-weight locking techniques to avoid the trade-off between query freshness and CPU cost. However, lock-based methods still have limits, such as thrashing, which is a well-known problem in such methods. In this paper, we propose a distributed index structure for moving objects that exploits the parallelism of multiple machines. The proposed index is a lock-free multi-version concurrency technique based on the D-Stream model of Spark Streaming. The proposed method exploits the multi-version nature of D-Streams in Spark Streaming.
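The idea of lock-free multi-version access can be sketched as follows, under strong simplifying assumptions: each micro-batch of position updates produces a new immutable index version, and readers query whichever version they hold without taking locks. This is a single-process illustration, not the authors' index and not Spark Streaming code; the class `VersionedIndex` and its methods are hypothetical names.

```python
from types import MappingProxyType


class VersionedIndex:
    """Keeps one read-only snapshot of object positions per micro-batch."""

    def __init__(self):
        # Version 0 is the empty index.
        self._versions = [MappingProxyType({})]

    def apply_batch(self, updates):
        """Build a new version from the latest one plus a batch of updates."""
        new_state = dict(self._versions[-1])
        new_state.update(updates)
        self._versions.append(MappingProxyType(new_state))
        return len(self._versions) - 1  # number of the new version

    def snapshot(self, version=-1):
        """Return a read-only view of the requested (default: latest) version."""
        return self._versions[version]


index = VersionedIndex()
v1 = index.apply_batch({"car1": (0.0, 0.0), "car2": (5.0, 5.0)})
old_view = index.snapshot(v1)
v2 = index.apply_batch({"car1": (1.0, 1.0)})
```

A query that started on `old_view` still sees `car1` at its old position after the second batch is applied, which is what removes the need for locking between readers and the writer. Copying the whole dictionary per batch is only for brevity; a practical structure would share unchanged parts between versions.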

Particulate Emissions from a Direct Injection Spark-ignition Engine Fuelled with Gasoline and LPG (가솔린 및 LPG 연료를 사용하는 직접분사식 불꽃점화엔진에서 배출되는 극미세입자 배출 특성에 관한 연구)

  • Lee, Seok-Hwan;Oh, Seung-Mook;Kang, Kern-Yong;Cho, Jun-Ho;Cha, Kyoung-Ok
    • Transactions of the Korean Society of Automotive Engineers / v.19 no.3 / pp.65-72 / 2011
  • In this study, the numbers and sizes of particles from a single-cylinder direct injection spark-ignition (DISI) engine fuelled with gasoline and LPG are examined over a wide range of engine operating conditions. Tests are conducted at various engine loads (2 to 10 bar IMEP) and fuel injection pressures (60, 90, and 120 bar) at an engine speed of 1,500 rpm. Particles are sampled directly from the exhaust pipe using a rotating disk thermodiluter. The size distributions are measured using a scanning mobility particle sizer (SMPS), and the particle number concentrations are measured using a condensation particle counter (CPC). The results show that the maximum brake torque (MBT) timing for LPG is less sensitive to engine load, and its combustion stability is better than that of gasoline. The total particle number concentration for LPG was lower than that for gasoline by a factor of 100, owing to the good vaporization characteristics of LPG. The test results indicate that using LPG in a direct injection spark-ignition engine helps reduce the particle emission level.

Compressive Deformation Behavior of Al-10Si-5Fe-1Zr Powder Alloys Consolidated by Spark Plasma Sintering Process (Spark Plasma Sintering법에 의해 예비 성형된 Al-10Si-5Fe-1Zr 분말합금의 고온 압축변형 거동)

  • Park, Sang-Choon;Kim, Mok-Soon;Kim, Kyung-Taek;Shin, Seung-Young;Lee, Jeong-Keun;Ryu, Kwan-Ho
    • Korean Journal of Metals and Materials / v.49 no.11 / pp.853-859 / 2011
  • The compressive deformation behavior of an Al-10Si-5Fe-1Zr (wt%) alloy preform fabricated by spark plasma sintering (SPS) of gas-atomized powder was investigated at temperatures from 380 to $480^{\circ}$C and strain rates from $1.0\times10^{-3}$ to $1.0\times10^{0}$ s$^{-1}$. The stress-strain curves showed a peak stress ($\sigma_p$) during the initial stage of deformation, followed by steady-state flow at all temperatures and strain rates tested. The peak stress $\sigma_p$ decreased with increasing temperature and with decreasing strain rate. Nearly full densification was found to occur in the compressively deformed specimens irrespective of test conditions. TEM observation revealed restricted grain growth during steady-state flow.

Mechanical Property Evaluation of WC-Co-B4C Hard Materials by a Spark Plasma Sintering Process (방전플라즈마 소결 공정을 이용한 WC-Co-B4C 소재의 기계적 특성평가)

  • Lee, Jeong-Han;Park, Hyun-Kuk
    • Korean Journal of Materials Research / v.31 no.7 / pp.397-402 / 2021
  • In this study, binderless-WC, WC-6 wt% Co, and WC-6 wt% 1 and 2.5 B4C materials are fabricated by the spark plasma sintering (SPS) process. Each fabricated WC material is almost completely dense, with a relative density of up to 99.5% under a simultaneously applied pressure of 60 MPa. Adding Co and Co-B4C to WC resulted in crystal growth. WC, which has an HCP crystal structure, has interfacial energies that depend on the grain growth direction (basal facet direction: 1.07 to 1.34 J·m$^{-2}$; prismatic direction: 1.43 to 3.02 J·m$^{-2}$). It is confirmed that continuous grain growth, biased toward the basal facet with its relatively low energy, is promoted at the WC/Co interface. As abnormal grain growth takes place, the grain size more than doubles, from 0.37 to 0.8 μm. Analysis also shows that the hardness decreases greatly along with the grain growth, from about 2661.4 to 1721.4 kg/mm$^2$.

Consolidation of Bulk Metallic Glass Composites

  • Lee, Jin-Kyu;Kim, Hwi-Jun;Kim, Taek-Soo;Shin, Seung-Yong;Bae, Jung-Chan
    • Proceedings of the Korean Powder Metallurgy Institute Conference / 2006.09b / pp.848-849 / 2006
  • Bulk metallic glass (BMG) composites combining a $Cu_{54}Ni_{6}Zr_{22}Ti_{18}$ matrix with brass powders or $Zr_{62}Al_{8}Ni_{13}Cu_{17}$ metallic glass powders were fabricated by spark plasma sintering. The brass powders and Zr-based metallic glass powders, added to enhance plasticity, are homogeneously distributed in the Cu-based metallic glass matrix after consolidation. The BMG composites show macroscopic plasticity after yielding, and the plastic strain increased to around 2% without a decrease in strength for the composite containing 20 vol% Zr-based amorphous powders. A proper combination of strength and plasticity in the BMG composites was obtained by introducing a second phase into the metallic glass matrix.

k-NN Join Based on LSH in Big Data Environment

  • Ji, Jiaqi;Chung, Yeongjee
    • Journal of information and communication convergence engineering / v.16 no.2 / pp.99-105 / 2018
  • k-Nearest neighbor join (k-NN Join) is a computationally intensive algorithm that is designed to find k-nearest neighbors from a dataset S for every object in another dataset R. Most related studies on k-NN Join are based on single-computer operations. As the data dimensions and data volume increase, running the k-NN Join algorithm on a single computer cannot generate results quickly. To solve this scalability problem, we introduce the locality-sensitive hashing (LSH) k-NN Join algorithm implemented in Spark, an approach for high-dimensional big data. LSH is used to map similar data onto the same bucket, which can reduce the data search scope. In order to achieve parallel implementation of the algorithm on multiple computers, the Spark framework is used to accelerate the computation of distances between objects in a cluster. Results show that our proposed approach is fast and accurate for high-dimensional and big data.
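The bucketing idea described in this abstract can be sketched on a single machine as follows. This is an illustrative approximation, not the authors' Spark implementation: it assumes random-hyperplane (sign) hashing as the LSH family, and the names `make_hyperplanes`, `signature`, and `lsh_knn_join` are hypothetical. In a cluster setting, the signature would plausibly serve as the key for grouping points across workers.

```python
import heapq
import math
import random
from collections import defaultdict


def make_hyperplanes(dim, n_planes, seed=0):
    """Draw random hyperplane normals; each one contributes one hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(point, planes):
    """Hash a point to a tuple of bits: which side of each hyperplane it lies on."""
    return tuple(sum(p * w for p, w in zip(point, plane)) >= 0 for plane in planes)


def lsh_knn_join(R, S, k, planes):
    """For each r in R, return up to k nearest points of S from r's bucket only."""
    buckets = defaultdict(list)
    for s in S:
        buckets[signature(s, planes)].append(s)
    result = {}
    for r in R:
        candidates = buckets.get(signature(r, planes), [])
        result[r] = heapq.nsmallest(k, candidates, key=lambda s: math.dist(r, s))
    return result


planes = make_hyperplanes(dim=2, n_planes=4)
R = [(1.0, 1.0)]
S = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0)]
neighbors = lsh_knn_join(R, S, k=2, planes=planes)
```

Because distances are computed only within a bucket, the search scope shrinks, at the cost of possibly missing true neighbors that hash to a different bucket. Using several hash tables is a common way to reduce such misses.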

A Study for its Characteristics with Electric Variation in an Electrical Discharge Machining (방전가공에서 전기적 변화가 갖는 방전 특성에 관한 연구)

  • 신근하
    • Journal of the Korean Society of Manufacturing Technology Engineers / v.6 no.4 / pp.72-79 / 1997
  • This study is an experiment to determine the optimum discharge cutting conditions for surface roughness, discharge speed, and electrode wear ratio in an electric spark machine, with Ton, Toff, and voltage (V) as input conditions according to the current (Ip). 1) Cu and graphite are used as electrodes. 2) Carbon steel is used as the workpiece. The experimental conditions are as follows. 1) The current is varied from 0.7 A to 50 A, and the discharge time applied to the workpiece in each run is 30 to 60 min. 2) The upper surface of the workpiece is measured with a stylus of 5 $\mu$m radius, and the surface roughness is analyzed by tabulating and plotting the resulting Rmax data. 3) The electrode wear ratio is obtained as follows: (1) the copper electrode is weighed before and after machining with an electronic balance; (2) the graphite electrode is weighed before machining in the same way, its post-machining value is obtained from volume $\times$ specific gravity, and the data are tabulated and plotted. 4) To keep the voltage applied to the workpiece accurate, an A.V.R. is installed and a memory scope is attached to the electric spark machine. 5) To preserve the precision of the current, to remove the noise caused by the internal resistance of the electric spark machine, and to force-inject the discharge fluid, a fixed table for the workpiece is made to minimize work errors caused by operator mistakes during discharging.

S-PARAFAC: Distributed Tensor Decomposition using Apache Spark (S-PARAFAC: 아파치 스파크를 이용한 분산 텐서 분해)

  • Yang, Hye-Kyung;Yong, Hwan-Seung
    • Journal of KIISE / v.45 no.3 / pp.280-287 / 2018
  • Recently, the use of recommendation systems and of tensor data analysis, which deals with high-dimensional data, has been increasing, as they allow us to analyze a tensor and extract latent elements and patterns. However, because of the large size and complexity of a tensor, it needs to be decomposed before the tensor data can be analyzed. Several tools are used for tensor decomposition, such as rTensor, pyTensor, and MATLAB, but since these tools run on a single machine, they cannot handle large data. Distributed tensor decomposition tools based on Hadoop can handle tensors at scale, but their computing speed is too slow. In this paper, we propose S-PARAFAC, a tensor decomposition tool based on Apache Spark that runs in distributed in-memory environments. We converted the PARAFAC algorithm into an Apache Spark version that enables rapid processing of tensor data. We also compared the performance of the Hadoop-based tensor tool and S-PARAFAC. The results show that S-PARAFAC is approximately 4 to 25 times faster than the Hadoop-based tensor tool.
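For orientation, the PARAFAC (CP) model that such tools fit can be written out for a three-way tensor. The notation below is the standard textbook form rather than anything taken from this abstract: the symbols $I$, $J$, $K$, $R$ and the factor matrices $A$, $B$, $C$ are assumptions, and the update shown is the usual alternating least squares (ALS) step, which may differ from what S-PARAFAC implements.

```latex
% Rank-R PARAFAC (CP) model of a three-way tensor X of size I x J x K,
% with factor matrices A (I x R), B (J x R), C (K x R)
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}

% One ALS step: update A with B and C held fixed
% X_{(1)}: mode-1 unfolding, \odot: Khatri-Rao product,
% *: elementwise product, \dagger: pseudoinverse
A \leftarrow X_{(1)} \, (C \odot B) \, \left( C^{\top} C * B^{\top} B \right)^{\dagger}
```

The updates for $B$ and $C$ follow by symmetry. Each step is dominated by the product of an unfolded tensor with a Khatri-Rao product, which is the part that benefits most from distributed in-memory computation.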