• Title/Summary/Keyword: Big Data Environment

A Study on the Data Collection Methods based Hadoop Distributed Environment (하둡 분산 환경 기반의 데이터 수집 기법 연구)

  • Jin, Go-Whan
    • Journal of the Korea Convergence Society / v.7 no.5 / pp.1-6 / 2016
  • Many studies have recently been carried out on big data utilization and analysis technology. Government agencies and companies are gradually adopting Hadoop as a processing platform for big data analysis. As interest in processing and analyzing big data has grown, data collection technology has become a major issue alongside it. However, research on collection technology remains scarce compared with research on data analysis techniques. This paper therefore builds a Hadoop cluster as a big data analysis platform and collects structured data from relational databases through Apache Sqoop. In addition, it provides a system that uses Apache Flume to collect unstructured data, such as sensor data, Web application data files, and log files, as streams. Data collected through this convergence can serve as basic material for big data analysis.

Distributed Processing of Big Data Analysis based on R using SparkR (SparkR을 이용한 R 기반 빅데이터 분석의 분산 처리)

  • Ryu, Woo-Seok
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.1 / pp.161-166 / 2022
  • In this paper, we analyze the problems that occur when performing big data analysis using R as a data analysis tool, and present the usefulness of data analysis with SparkR, which connects R and Spark to support distributed processing of big data effectively. First, we study the memory allocation problem of R, which occurs when loading large amounts of data and performing operations, and the characteristics and programming environment of SparkR. We then compare execution performance when linear regression analysis is performed in each environment. The analysis showed that R can be used for data analysis through SparkR without learning an additional language, and that code written in R is processed in a distributed manner effectively as the number of nodes in the cluster increases.
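
One common way a linear regression fit can be spread across cluster nodes is to compute small summary statistics per data partition and merge them afterwards. The sketch below illustrates that idea in plain Python rather than SparkR, for a single predictor. The names `Stats`, `partial_stats`, `merge`, `fit`, and the sample `partitions` are hypothetical and are not taken from the paper or from the SparkR API.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Stats:
    """Sufficient statistics for simple linear regression y = a*x + b."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


def partial_stats(rows):
    """Summarize one partition; in a cluster this would run on each node."""
    s = Stats()
    for x, y in rows:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.sxy += x * y
    return s


def merge(a, b):
    """Combine two partition summaries; the order of merging does not matter."""
    return Stats(a.n + b.n, a.sx + b.sx, a.sy + b.sy, a.sxx + b.sxx, a.sxy + b.sxy)


def fit(stats):
    """Ordinary least squares slope and intercept from the merged summary."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    if denom == 0:
        raise ValueError("x values are constant; slope is undefined")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return slope, intercept


# Hypothetical data split into three partitions, all lying on y = 2x + 1.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]

if __name__ == "__main__":
    total = reduce(merge, (partial_stats(p) for p in partitions))
    print(fit(total))
```

Because each partition is reduced to five numbers before merging, only those summaries need to move between nodes, which is one reason this kind of computation can scale with cluster size.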

Eco-System: REC Price Prediction Simulation in Cloud Computing Environment (Eco-System: 클라우드 컴퓨팅환경에서 REC 가격예측 시뮬레이션)

  • Cho, Kyucheol
    • Journal of the Korea Society for Simulation / v.23 no.4 / pp.1-8 / 2014
  • Cloud computing supports big data processing, using IT resources to produce a variety of information. The government has started the RPS (Renewable Portfolio Standard) to induce the production of electricity with renewable energy equipment, and it manages a system that gathers geographically distributed big data. Through this system, companies can purchase RECs (Renewable Energy Certificates) from other electricity generation companies to cover shortfalls in their obligations. Because the RPS relies on a voluntary competitive market for REC trading and prices vary widely, an equitable REC price needs to be predicted from RPS big data. This paper proposes a fuzzy-logic-based REC price prediction method, modeled in a cloud computing environment, that uses the price trend and trading conditions in the REC market. Cloud computing helps analyze the correlations and variables that affect the REC price within RPS big data, and the analysis can predict the REC price by simulation. Fuzzy logic yields balanced REC average trading prices from the trading quantity and price, and the method helps induce a well-converged price in the long run in a cloud computing environment.

On Implementing a Learning Environment for Big Data Processing using Raspberry Pi (라즈베리파이를 이용한 빅 데이터 처리 학습 환경 구축)

  • Hwang, Boram;Kim, Seonggyu
    • Journal of Digital Convergence / v.14 no.4 / pp.251-258 / 2016
  • Big data processing is a broad term for processing data sets so large or complex that traditional data processing applications are inadequate. Widespread use of smart devices results in a huge impact on the way we process data. Many organizations are contemplating how to incorporate or integrate those devices into their enterprise data systems. We have proposed a way to process big data by way of integrating Raspberry Pi into a Hadoop cluster as a computational grid. We have then shown the efficiency through several experiments and the ease of scaling of the proposed system.

A Empirical Study on Effects of Dynamic Capabilities and Entrepreneurial Orientation of SMEs on Big Data Utilization Intention (중소기업의 동적역량과 기업가지향성이 빅 데이터 활용의도에 미치는 영향에 관한 실증연구)

  • Han, Byung Jae;Yang, Dong Woo
    • Journal of Digital Convergence / v.16 no.11 / pp.237-253 / 2018
  • In a rapidly changing environment, dynamic resources have become important factors for companies, and the use of Big Data has come into focus as a new core value of business, but research on the major resources and capabilities of companies is insufficient. This study explores the effect of the dynamic capability and entrepreneurial orientation of SMEs on Big Data utilization intention. For the empirical analysis, a survey of 364 domestic SMEs was conducted, and a parallel multi-parameter analysis using SPSS Win Ver.22.0 and PROCESS macro v3.0 was performed to analyze the effect of dynamic capability on Big Data utilization intention through entrepreneurial orientation. Hypothesis testing showed that dynamic resources and entrepreneurial orientation had a positive influence on Big Data utilization intention. The study offers suggestions on the determinants of Big Data utilization related to AI, improving the understanding of dynamic capabilities and entrepreneurial orientation and helping to improve the management of SMEs.

A Study on Securing Global Big Data Competitiveness based on its Environment Analysis (빅데이터 환경 분석과 글로벌 경쟁력 확보 방안에 대한 연구)

  • Moon, Seung Hyeog
    • The Journal of the Convergence on Culture Technology / v.5 no.2 / pp.361-366 / 2019
  • The amount of data created in today's intelligent information society is beyond imagination. Big data is highly diverse, ranging from information on SNS and the internet to data created by governments and enterprises. This varied data is close at hand and, like crude oil, has infinite value. Analyzing and utilizing big data through data mining across every area of modern industrial society is becoming more important for finding useful correlations and strengthening forecasting power against future uncertainty. This paper studies the efficient management and utilization of the big data produced by a complex modern society. It also addresses strategies and methods for securing overall industrial competitiveness, creating synergy among industries, reducing costs, and applying big data effectively in the era of the 4th Industrial Revolution.

Risk based policy at big data era: Case study of privacy invasion (빅 데이터 시대 위험기반의 정책 - 개인정보침해 사례를 중심으로 -)

  • Moon, Hyejung;Cho, Hyun Suk
    • Informatization Policy / v.19 no.4 / pp.63-82 / 2012
  • Korea, which has the world's best level of ICT (Information, Communication and Technology) infrastructure, has experienced the world's worst level of ICT accidents. The number of major privacy invasion accidents has been three times larger than the total number of Internet users in Korea. These severe accidents were caused by the big data environment, which has consequently become an important policy agenda. This paper analyzes data spill accident cases to study policy issues for ICT security from a social science perspective focusing on risk. The results of the case analysis are as follows. First, ICT risk can be categorized as 'severe, strong, intensive and individual' according to the level of both probability and impact. Second, risk management strategies can be designated 'avoid, transfer, mitigate, accept' by understanding the culture type of the relevant group, such as 'hierarchy, egalitarianism, fatalism and individualism'. Third, personal data exhibits the characteristics of big data, such as 'volume, velocity, variety', in each risk situation. Therefore, the government needs to establish a standing organization responsible for ICT risk policy and management in the new big data era, and ICT risk management policy needs to balance 'technology, norms, laws, and market'.

Advanced Resource Management with Access Control for Multitenant Hadoop

  • Won, Heesun;Nguyen, Minh Chau;Gil, Myeong-Seon;Moon, Yang-Sae
    • Journal of Communications and Networks / v.17 no.6 / pp.592-601 / 2015
  • Multitenancy has gained growing importance with the development and evolution of cloud computing technology. In a multitenant environment, multiple tenants with different demands can share a variety of computing resources (e.g., CPU, memory, storage, network, and data) within a single system, while each tenant remains logically isolated. This useful multitenancy concept offers highly efficient, and cost-effective systems without wasting computing resources to enterprises requiring similar environments for data processing and management. In this paper, we propose a novel approach supporting multitenancy features for Apache Hadoop, a large scale distributed system commonly used for processing big data. We first analyze the Hadoop framework focusing on "yet another resource negotiator (YARN)", which is responsible for managing resources, application runtime, and access control in the latest version of Hadoop. We then define the problems for supporting multitenancy and formally derive the requirements to solve these problems. Based on these requirements, we design the details of multitenant Hadoop. We also present experimental results to validate the data access control and to evaluate the performance enhancement of multitenant Hadoop.

Research on Big Data Integration Method

  • Kim, Jee-Hyun;Cho, Young-Im
    • Journal of the Korea Society of Computer and Information / v.22 no.1 / pp.49-56 / 2017
  • In this paper we propose an approach to big data integration for analyzing, visualizing, and predicting market trends: an integrated data model built with the R language, which is the future of statistics, and Hadoop, which provides parallel processing of data. Four approaches using R and Hadoop are described and their strengths and weaknesses analyzed: the ff package in R, R with Streaming as a Hadoop utility, and Rhipe and RHadoop as R and Hadoop interface packages. Rhipe and RHadoop are proposed as a complete set for the data integration model. Integrating R, which is popular for statistical algorithms, with Hadoop, which contains a distributed file system and a resource management platform and can implement the MapReduce programming model, gives us a new environment in which R code can be written and deployed in Hadoop without any data movement. This model enables high-performance predictive analysis and a deep understanding of big data.
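
The MapReduce programming model and the Streaming utility mentioned above can be illustrated with a small word-count sketch. It is written in Python instead of R and runs locally, imitating the streaming convention in which a mapper emits tab-separated key/value lines, the framework sorts them by key, and a reducer aggregates each key. The names `mapper`, `reducer`, and `run_local` are hypothetical and do not come from the paper, Rhipe, or RHadoop.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts of each key; the input must already be sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


def run_local(lines):
    """Stand-in for the cluster: map, sort (the shuffle step), then reduce."""
    shuffled = sorted(mapper(lines))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["Big data", "big Data Hadoop"]  # hypothetical input
    print(run_local(sample))
```

On a real cluster the mapper and reducer would be separate programs reading standard input, and the sort between them would be performed by Hadoop rather than by `sorted`.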

Comparative study on NoSQL for Processing a Big Data (빅데이터 처리에 관한 NoSQL 비교연구)

  • Jang, Rae-Young;Bae, Jung-Min;Jung, Sung-Jae;Soh, Woo-Young;Sung, Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.351-354 / 2014
  • The emergence of big data has brought many changes to the database management environment. The total amount of big data keeps increasing, but each individual data item is smaller and simpler. This characteristic has called for new data processing techniques. Accordingly, a variety of database technologies specializing in big data processing have been provided, collectively defined as NoSQL. How NoSQL is used differs according to the characteristics of the data, so it is difficult to define as a single thing. This paper classifies NoSQL by the characteristics of each type and proposes an appropriate NoSQL.
