• Title/Summary/Keyword: Large Scale Data

Search Result 2,796, Processing Time 0.044 seconds

Leveraging Big Data for Spark Deep Learning to Predict Rating

  • Mishra, Monika;Kang, Mingoo;Woo, Jongwook
    • Journal of Internet Computing and Services / v.21 no.6 / pp.33-39 / 2020
  • This paper builds recommendation systems that leverage deep learning and the Big Data platform Spark to predict item ratings on the Amazon e-commerce site. Recommendation systems in e-commerce have become extremely popular in recent years and are important to both customers and sellers: they provide users with the products and services they are interested in. Recommendation systems need users' previous shopping activities and digital footprints to make the best recommendation for the next item to purchase. We developed the recommendation models on Amazon AWS cloud services to predict users' ratings for items from the massive data set of Amazon customer reviews. We also present a Big Data architecture that can accommodate the large-scale data set for storage and computation. We adopted deep learning because it is known to achieve higher accuracy on massive data sets. Finally, we compare the accuracy and performance of the deep learning architecture with Spark ML against the traditional Big Data architecture, Spark ML alone.

A Probabilistic Tensor Factorization approach for Missing Data Inference in Mobile Crowd-Sensing

  • Akter, Shathee;Yoon, Seokhoon
    • International Journal of Internet, Broadcasting and Communication / v.13 no.3 / pp.63-72 / 2021
  • Mobile crowd-sensing (MCS) is a promising sensing paradigm that leverages mobile users with smart devices to perform large-scale sensing tasks in order to provide services to specific applications in various domains. However, MCS sensing tasks may not always be completed successfully or on time for various reasons, such as users accidentally leaving tasks incomplete, asynchronous transmission, or connection errors. This results in missing sensing data at specific locations and times, which can degrade the performance of the applications and lead to serious casualties. Therefore, in this paper, we propose a missing data inference approach, called missing data approximation with probabilistic tensor factorization (MDI-PTF), to approximate the missing values as closely as possible to the actual values while taking into account the asynchronous data transmission times and the different sensing locations of the mobile users. The proposed method first normalizes the data to limit the range of possible values. Next, a probabilistic model of tensor factorization is formulated, and finally, the data are approximated using the gradient descent method. The performance of the proposed algorithm is verified by simulations under various conditions using different datasets.
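The three steps named in this abstract (normalize, formulate a probabilistic factorization, approximate by gradient descent) can be illustrated with a minimal sketch. It is not the authors' MDI-PTF: it assumes a rank-2 CP factorization with a Gaussian likelihood and Gaussian priors, so the MAP estimate reduces to masked, L2-regularized least squares. The tensor axes, rank, learning rate, and all function names are hypothetical, and the data are synthetic.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factor matrices A (I x R), B (J x R), C (K x R)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def masked_cp_gradient_descent(X, mask, rank=2, lam=0.01, lr=0.01, steps=3000, seed=0):
    """Fit CP factors to the observed entries of X by plain gradient descent.

    Assumption: Gaussian noise and Gaussian priors, so the objective is
    0.5 * sum over observed entries of (X_hat - X)^2 + 0.5 * lam * ||factors||^2.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.uniform(0.1, 0.6, (I, rank))
    B = rng.uniform(0.1, 0.6, (J, rank))
    C = rng.uniform(0.1, 0.6, (K, rank))
    losses = []
    for _ in range(steps):
        # Residual on observed entries only; missing entries contribute zero.
        E = np.where(mask, cp_reconstruct(A, B, C) - X, 0.0)
        reg = np.sum(A**2) + np.sum(B**2) + np.sum(C**2)
        losses.append(0.5 * np.sum(E**2) + 0.5 * lam * reg)
        # Gradients of the objective with respect to each factor matrix.
        gA = np.einsum("ijk,jr,kr->ir", E, B, C) + lam * A
        gB = np.einsum("ijk,ir,kr->jr", E, A, C) + lam * B
        gC = np.einsum("ijk,ir,jr->kr", E, A, B) + lam * C
        A -= lr * gA
        B -= lr * gB
        C -= lr * gC
    return cp_reconstruct(A, B, C), losses

# Synthetic toy data (hypothetical axes: location x time slot x user).
rng = np.random.default_rng(42)
raw = 5.0 + 10.0 * cp_reconstruct(rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2)))

# Step 1: min-max normalize to [0, 1] to bound the value range.
lo, hi = raw.min(), raw.max()
X = (raw - lo) / (hi - lo)

# Hide roughly 30% of the entries to play the role of missing sensing data.
mask = rng.random(X.shape) < 0.7

# Steps 2 and 3: fit the factor model and fill in the missing entries.
X_hat, losses = masked_cp_gradient_descent(X, mask)
filled = np.where(mask, X, X_hat) * (hi - lo) + lo  # back to the original scale
```

Observed entries are kept as measured, and only the hidden ones are replaced by the model's estimate before undoing the normalization.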

Image-based rainfall prediction from a novel deep learning method

  • Byun, Jongyun;Kim, Jinwon;Jun, Changhyun
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.183-183 / 2021
  • Deep learning methods and their applications have become an essential part of prediction and modeling in water-related research areas, including hydrological processes and climate change. The application of deep learning is known to increase the availability of data sources in hydrology, which makes it useful for analyzing precipitation, runoff, groundwater level, evapotranspiration, and so on. However, microclimate analysis and prediction with deep learning methods remain limited because of the scarcity of gauge-based data and the shortcomings of existing technologies. In this study, a real-time rainfall prediction model was developed from a sky image data set using convolutional neural networks (CNNs). These daily image data were collected at Chung-Ang University and Korea University. To achieve high accuracy, the proposed model considers data classification, image processing, and adjustment of the ratio of no-rain data. The predicted rainfall was compared with per-minute rainfall data from rain gauge stations close to the image sensors. The results indicate that the proposed model could complement the current rainfall observation system through interpolation and has large potential to fill observation gaps. Information from small-scale areas can advance accurate weather forecasting and hydrological modeling at the micro scale.

Analysis of the Present Status and Future Prospects for Smart Agriculture Technologies in South Korea Using National R&D Project Data

  • Lee, Sujin;Park, Jun-Hwan;Kim, EunSun;Jang, Wooseok
    • Journal of Information Science Theory and Practice / v.10 no.spc / pp.112-122 / 2022
  • Food security and food sovereignty have become key issues due to changes in the international situation. In response, many countries are now paying attention to smart agriculture, which aims to increase production efficiency through data-based systems. The Korean government has also attempted to promote smart agriculture by 1) implementing the agri-food ICT (information and communications technology) policy, and 2) more than doubling the R&D budget in recent years. However, these efforts have centered on large-scale farms, and many domestic farmers have rarely used the resulting technologies in their own farming. To promote smart agriculture more effectively, we diagnosed government R&D trends in smart agriculture based on NTIS (National Science and Technology Information Service) data. We identified the research trends for each R&D period by analyzing three pieces of information: region, research actor, and topic. Based on these findings, we suggest systematic R&D directions and implications.

Efficient and Secure Identity-Based Public Auditing for Dynamic Outsourced Data with Proxy

  • Yu, Haiyang;Cai, Yongquan;Kong, Shanshan;Ning, Zhenhu;Xue, Fei;Zhong, Han
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.10 / pp.5039-5061 / 2017
  • Cloud storage has become a new trend, with more and more users moving their data to cloud storage servers (CSSs). To ensure the security of cloud storage, many cloud auditing schemes have been proposed to check the integrity of users' cloud data. However, most of them are based on public key infrastructure, which leads to complex certificate management and verification. Moreover, most existing auditing schemes are inefficient when a user uploads a large amount of data or when a third party auditor (TPA) audits multiple users' data on different CSSs. To overcome these problems, we propose an efficient and secure auditing scheme based on identity-based cryptography. To relieve the user's computation burden, we introduce a proxy, which is delegated to generate and upload homomorphic verifiable tags for the user. We extend our auditing scheme to support auditing of dynamic data operations. We further extend it to support batch auditing in the multi-user, multi-CSS setting, which is practical and efficient for large-scale cloud storage systems. Extensive security analysis shows that our scheme is provably secure in the random oracle model. Performance analysis demonstrates that our scheme is highly efficient, especially in reducing the computation cost of the proxy and the TPA.

Accuracy evaluation of near-surface air temperature from ERA-Interim reanalysis and satellite-based data according to elevation

  • Ryu, Jae-Hyun;Han, Kyung-Soo;Park, Eun-Bin
    • Korean Journal of Remote Sensing / v.29 no.6 / pp.595-600 / 2013
  • Previous studies have used satellite and reanalysis methods to spatially interpolate near-surface air temperature (Ta). Reanalysis Ta is generally more accurate than satellite-based Ta, but its spatial resolution is too coarse for local-scale studies. Our purpose is to evaluate the accuracy of reanalysis Ta and satellite-based Ta according to elevation from April 2011 to March 2012 in Northeast Asia, a region with varied topographic features. We used ERA-Interim reanalysis data produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), and estimated satellite-based Ta using a Digital Elevation Model (DEM), the Normalized Difference Vegetation Index (NDVI), the difference between the brightness temperatures at $11\,\mu\mathrm{m}$ and $12\,\mu\mathrm{m}$, and Land Surface Temperature (LST) data. The DEM data were used as auxiliary data, and Ta observed at 470 meteorological stations was used to evaluate accuracy. Over the whole data set, satellite-based Ta was less accurate than ERA-Interim Ta. When the analysis was divided into nine elevation classes, ERA-Interim Ta was more accurate than satellite-based Ta at low elevations (below 500 m), whereas satellite-based Ta was more accurate at higher elevations from 500 to 3500 m. The spread between the upper and lower quartiles was also large from 2500 to 3500 m. These results indicate that ERA-Interim Ta does not account for elevation because of its coarse spatial resolution. Therefore, satellite-based Ta was more effective than ERA-Interim Ta in regions ranging from 500 m to 3500 m, and satellite-based Ta is recommended for regions above 2500 m.

An Analysis of the Port Transportation System (항만운송시스템의 분석에 관한 연구)

  • 이철영;문성혁
    • Journal of the Korean Institute of Navigation / v.7 no.1 / pp.1-32 / 1983
  • Delay due to congestion has recently attracted widespread attention in the analysis of overall port operations. However, the situation is complex because a large number of factors bear on the outcome. Queueing theory is applicable to a large-scale transportation system associated with the arrival of vessels at a large port. This paper attempts an extensive analysis of the port transport system and its economic implications, viewing the port as a physical distribution facility and as a queueing system whose customers are ships and cargoes. Analysis of real data from the Port of Pusan shows that the port can be represented as a set of identical channels with Poisson arrivals and Erlang service times, and confirms that the following formula is suitable for calculating the mean delay in this port: $W_4=\frac{\rho}{\lambda(1-\rho)}\cdot\frac{e_N(\rho\cdot N)}{D_{N-1}(\rho\cdot N)}$, where $\lambda$ is the mean arrival rate, $\mu$ the mean servicing rate, $N$ the number of servicing channels, $\rho$ the utilization rate ($\lambda/N\mu$), and $e_N$ the Poisson function. To come to grips with the essentials of the cost of delay due to congestion, a simple ship journey cost model is adopted and the sensitivity of operating profit to variation in port time is examined; for the future development of port pricing services, the marginal cost is approximately calculated on the basis of queueing theory.
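To show how a multi-channel mean-delay formula of this shape is evaluated, the sketch below computes the mean queueing delay for the standard textbook M/M/N queue (the Erlang C result). This is a swapped-in, simpler model, not the paper's formula: it assumes exponential service times (the Erlang-1 special case), and because the abstract does not define $D_{N-1}$, the denominator used here is the standard Erlang C one. Function names are hypothetical.

```python
import math

def poisson_term(k, a):
    """Poisson function e_k(a) = exp(-a) * a**k / k!."""
    return math.exp(-a) * a**k / math.factorial(k)

def mmn_mean_wait(lam, mu, n):
    """Mean waiting time in queue for an M/M/N system (Erlang C).

    lam: mean arrival rate, mu: mean service rate per channel,
    n: number of identical channels. Requires rho = lam / (n * mu) < 1.
    """
    a = lam / mu   # offered load, equal to rho * N
    rho = a / n    # utilization rate
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    e_n = poisson_term(n, a)
    # Standard Erlang C denominator (an assumption; the source does not define D_{N-1}).
    denom = (1.0 - rho) * sum(poisson_term(k, a) for k in range(n)) + e_n
    p_wait = e_n / denom  # probability that an arriving ship must wait
    return rho / (lam * (1.0 - rho)) * p_wait

# One berth at 50% utilization, and two berths sharing an arrival rate of 1.
w_single = mmn_mean_wait(lam=0.5, mu=1.0, n=1)
w_double = mmn_mean_wait(lam=1.0, mu=1.0, n=2)
```

With one channel the expression reduces to the familiar M/M/1 result $\rho/(\mu(1-\rho))$, which gives a quick sanity check on the implementation.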

Design of the new parallel processing architecture for commercial applications (상용 응용을 위한 병렬처리 구조 설계)

  • 한우종;윤석한;임기욱
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.5 / pp.41-51 / 1996
  • This paper proposes a new parallel processing system based on a cluster architecture that provides the scalability of a parallel processing system while maintaining the characteristics of a shared memory multiprocessor. Low-cost, high-performance microprocessors have recently led to the construction of large-scale parallel processing systems. Such systems provide large scalability but are mainly used for scientific applications with large data parallelism. A shared memory multiprocessor system such as TICOM is currently used as a server for commercial applications; however, shared memory multiprocessor systems are known to have very limited scalability. The proposed architecture supports the scalability and performance of a parallel processing system while providing adaptability for commercial applications, and can therefore overcome the limitation of the shared memory multiprocessor. The architecture and characteristics of the proposed system are described. A proprietary hierarchical crossbar network is designed for this system, and its protocol, routing and switching technique, and signal transfer technique are optimized for the proposed architecture. The design trade-offs for the network are described, and simulation using SES/workbench shows that the network fits the proposed architecture.

Fluctuation of Transport Properties of Random Heterogeneous Media (비정형 혼합재 이동성질의 변동)

  • Kim, In-Chan
    • Transactions of the Korean Society of Mechanical Engineers B / v.20 no.9 / pp.3015-3029 / 1996
  • The notion of an effective transport property of a heterogeneous medium implies that the medium is large enough that the ergodic theorem holds and local fluctuation of the property can be neglected. When the medium is not large enough compared to its characteristic microstructure length scale, the effective property fluctuates and differs from the value for a sufficiently large medium. Diffusion was considered as a representative transport phenomenon, and the diffusion coarseness $C_k$ was defined as a parameter quantifying the fluctuation of the effective diffusion property. The scaled effective diffusion property $\langle k^*\rangle/k_1$ and $C_k$ were computed for two-phase random media consisting of a matrix of diffusion coefficient $k_1$ and spheres of diffusion coefficient $k_2$. Numerical simulations were performed using the so-called first passage time technique, and data were collected for the existing microstructure models of hard spheres (HS), overlapping spheres (OS), and penetrable concentric shells (PCS).

Large Eddy Simulation and Parametric Study of Turbulent Flow Characteristics in the Internal Combustion Chamber using SGS Model (연소실 내 난류유동장 특성에 대한 아격자 모델을 사용한 LES 모사 및 관련인자 영향 평가)

  • Nam, Seung Man;Lee, Kye Bock
    • Journal of Energy Engineering / v.21 no.3 / pp.228-236 / 2012
  • Large eddy simulation (LES) is increasingly used as a tool for studying the dynamics of turbulence in combustion chamber flows because it promises wider generality and more accurate results than Reynolds-averaged Navier-Stokes (RANS) models. This study presents an appropriate subgrid-scale (SGS) model in LES for predicting the turbulent flow field in an internal combustion engine. The effects of model and numerical parameters such as the discretization scheme, initial condition, time step, and SGS model were studied. The LES results obtained with the SGS model were found to be in good agreement with experimental data.