• Title/Summary/Keyword: Big data platform

Cloud Computing Platforms for Big Data Adoption and Analytics

  • Hussain, Mohammad Jabed;Alsadie, Deafallah
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.290-296 / 2022
  • Big Data is a data analysis technology enabled by recent advances in technology and architecture. However, big data entails a huge commitment of hardware and processing resources, making the adoption costs of big data technology prohibitive for small and medium-sized businesses. Cloud computing offers the promise of big data implementation to small and medium-sized businesses. Big data processing is performed through a programming paradigm known as MapReduce. Typically, implementing the MapReduce paradigm requires network-attached storage and parallel processing. The computing needs of MapReduce programming are often beyond what small and medium-sized businesses can commit. Cloud computing is on-demand network access to computing resources provided by an outside entity. Common deployment models for cloud computing include platform as a service (PaaS), software as a service (SaaS), infrastructure as a service (IaaS), and hardware as a service (HaaS).
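
The MapReduce paradigm named in this abstract can be illustrated with a minimal single-process sketch. This is not code from the paper: the word-count task and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are assumptions chosen for illustration, and a real deployment would distribute the map and reduce calls across a cluster rather than run them in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document is mapped independently, which is the step a
    # cluster would run in parallel on separate nodes.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data platform", "big data analytics"])
```

Because every `map_phase` call and every `reduce_phase` call is independent of the others, the work parallelizes naturally, which is why the paradigm benefits from the shared storage and parallel processing the abstract mentions.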

Design of Distributed Hadoop Full Stack Platform for Big Data Collection and Processing (빅데이터 수집 처리를 위한 분산 하둡 풀스택 플랫폼의 설계)

  • Lee, Myeong-Ho
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.45-51 / 2021
  • With the rapid shift to a non-face-to-face environment and mobile-first strategies, the explosive yearly growth of structured and unstructured data demands new decision making and services based on big data in all fields. However, there have been few reference cases that use the Hadoop Ecosystem to collect this rapidly growing big data, load it into a standard platform applicable in a practical environment, and then store and process the refined big data in a relational database. Therefore, in this study, unstructured data retrieved by keyword from social network services was collected on Hadoop 2.0 through three virtual machine servers in a Spring Framework environment and loaded into the Hadoop Distributed File System and HBase. Based on the loaded unstructured data, the system was designed and implemented to store standardized big data in a relational database using a morpheme analyzer. Future work should continue research on clustering, classification, and analysis with machine learning, using Hive or Mahout for deeper data analysis.

Methodology for Evaluating Big Data Platforms Performance in the Domestic Electronic Power Industry (국내 전력산업에서의 빅데이터 플랫폼 성과 평가 방법론)

  • Cho, Chisun;Lee, Nangyu;Hahm, Yukun
    • The Journal of Bigdata / v.5 no.1 / pp.97-108 / 2020
  • As the domestic electric power industry moves toward a smart grid, big data platforms for demand management, facility management, and customer service have been deployed. However, due to the nature of big data projects, these platforms take time to realize their value in business processes. It is therefore not easy to evaluate the performance of early big data platforms using known or theoretical evaluation methods. In this paper, we propose a methodology for evaluating big data platform performance based on specific information quality criteria, such as information completeness/sufficiency, reliability, relevancy, comparability, unbiasedness, and timeliness, in relation to the volume, diversity, and velocity of big data.

A Study on Open API of Securities and Investment Companies in Korea for Activating Big Data

  • Ryu, Gui Yeol
    • International journal of advanced smart convergence / v.8 no.2 / pp.102-108 / 2019
  • Big data is associated with three key concepts: volume, variety, and velocity. Securities and investment services produce and store large amounts of text and numeric data. They also hold the most data per company on average in the US. Gartner found that the demand for big data in finance was 25%, which was the highest. Securities and investment companies therefore produce the most data, such as text and numbers, and have the highest demand. In Korea, insurance companies and credit card companies use big data more actively than banks. Research on the use of big data in securities and investment companies has been scarce. We surveyed 22 major securities and investment companies in Korea with the aim of activating big data. We found that they actively use AI for investment recommendations. For the big data of securities and investment companies, we studied open APIs. Of the 22 major companies, only six offer open APIs. The user OS is Windows in every case, and the languages used are mainly VB, C#, MFC, and Excel on Windows. Real-time analysis and decision making are difficult because developers cannot receive data directly through Hadoop, the big data platform. Development manuals are mainly provided on the Web, and only three companies provide them as files. Development documentation in file format is more convenient than the web type. To activate big data in the securities and investment fields, we found that companies should support Linux, Java, and Python, and provide easy-to-read development manuals and videos, for example on YouTube.

An Analysis of Utilization on Virtualized Computing Resource for Hadoop and HBase based Big Data Processing Applications (Hadoop과 HBase 기반의 빅 데이터 처리 응용을 위한 가상 컴퓨팅 자원 이용률 분석)

  • Cho, Nayun;Ku, Mino;Kim, Baul;Xuhua, Rui;Min, Dugki
    • Journal of Information Technology and Architecture / v.11 no.4 / pp.449-462 / 2014
  • In the big data era, a processing system has many parts to consider for capturing, storing, and analyzing stored or streaming data. Unlike traditional data handling systems, a big data processing system must account for the characteristics (format, velocity, and volume) of the data it handles. In this situation, the virtualized computing platform is emerging as a platform for handling big data effectively, since virtualization technology makes it possible to manage computing resources dynamically and elastically with minimal effort. In this paper, we analyze the utilization of virtualized computing resources to discover suitable deployment models in an Apache Hadoop and HBase-based big data processing environment. The results show that the Task Tracker service has high CPU utilization and high disk I/O overhead during MapReduce phases. Moreover, the HRegion service shows high network resource consumption for transferring traffic data from the DataNode to the Task Tracker. The DataNode shows high memory utilization and disk I/O overhead for reading stored data.

Cross platform classification of microarrays by rank comparison

  • Lee, Sunho
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.475-486 / 2015
  • Mining the microarray data accumulated in public data repositories can save experimental cost and time and provide valuable biomedical information. Big data analysis that pools multiple data sets increases statistical power, improves the reliability of the results, and reduces the bias specific to an individual study. However, integrating several data sets from different studies requires dealing with many problems. In this study, I limited the focus to cross-platform classification, in which the platform of a test sample differs from the platform of the training set, and suggested a simple classification method based on rank. This method is compared with diagonal linear discriminant analysis, the k-nearest neighbor method, and the support vector machine using real cross-platform example data sets for two cancers.
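
The abstract does not spell out the rank-based rule, so the following is only a sketch of why ranks help across platforms, not the author's method. It assumes a nearest-centroid classifier on within-sample ranks; the function names, the toy expression values, and the labels `tumor` and `normal` are all hypothetical.

```python
def to_ranks(values):
    """Replace each value by its rank within the sample (1 = smallest).

    Ties are broken by position; a real analysis would need a tie rule.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def fit_centroids(samples, labels):
    """Average the rank vectors of the training samples in each class."""
    ranked = [to_ranks(sample) for sample in samples]
    centroids = {}
    for label in set(labels):
        rows = [r for r, l in zip(ranked, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign the class whose rank centroid is closest to the sample."""
    ranks = to_ranks(sample)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(ranks, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Training data on one (hypothetical) platform with a small value scale.
train = [[1.0, 5.0, 9.0], [2.0, 6.0, 8.0], [9.0, 5.0, 1.0], [8.0, 4.0, 2.0]]
labels = ["tumor", "tumor", "normal", "normal"]
centroids = fit_centroids(train, labels)

# Test sample from another platform with a very different value scale.
prediction = predict(centroids, [120.0, 800.0, 4000.0])
```

Ranks depend only on the ordering of genes within a sample, so a test sample measured on a different intensity scale maps to the same rank vector as a comparable training sample.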

Efficient Association Rule Mining based SON Algorithm for a Bigdata Platform (빅데이터 플랫폼을 위한 SON알고리즘 기반의 효과적인 연관 룰 마이닝)

  • Nguyen, Giang-Truong;Nguyen, Van-Quyet;Nguyen, Sinh-Ngoc;Kim, Kyungbaek
    • Journal of Digital Contents Society / v.18 no.8 / pp.1593-1601 / 2017
  • In a big data platform, association rule mining applications can bring benefits. For instance, in an agricultural big data platform, an association rule mining application could recommend specific products for farmers to grow, which could increase income. The key process of association rule mining is frequent itemset mining, which finds sets of products that frequently appear together. Earlier approaches to this problem, e.g. Apriori, are not satisfactory because the huge number of possible sets can overload memory. To deal with this, the SON algorithm has been proposed, which divides the considered set into many smaller ones and handles them sequentially. On a single machine, however, the SON algorithm is very time consuming. In this paper, we present a method to find association rules in our Hadoop-based big data platform by parallelizing the SON algorithm. The entire process of association rule mining, including pre-processing, SON algorithm-based frequent itemset mining, and association rule finding, is implemented on the Hadoop-based big data platform. Through an experiment with a real dataset, it is confirmed that the proposed method outperforms a brute force method.
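
The two-pass structure of the SON algorithm described in this abstract can be sketched as follows. This is a single-machine illustration under stated assumptions, not the paper's Hadoop implementation: the function names, the `n_chunks` and `max_size` parameters, and the toy crop transactions are hypothetical, and the per-chunk miner is a simple enumeration standing in for Apriori.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(chunk, threshold, max_size):
    """Find itemsets that meet the scaled threshold within one chunk."""
    counts = Counter()
    for transaction in chunk:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset for itemset, n in counts.items() if n >= threshold}


def son(transactions, min_support, n_chunks=2, max_size=2):
    """Return itemsets (up to max_size) with global count >= min_support."""
    chunk_len = max(1, ceil(len(transactions) / n_chunks))
    chunks = [transactions[i:i + chunk_len]
              for i in range(0, len(transactions), chunk_len)]

    # Pass 1: each chunk is mined independently (the parallelizable step).
    # The threshold is scaled by the chunk's share of the data, so every
    # globally frequent itemset is locally frequent in at least one chunk.
    candidates = set()
    for chunk in chunks:
        local_threshold = min_support * len(chunk) / len(transactions)
        candidates |= local_frequent(chunk, local_threshold, max_size)

    # Pass 2: count only the candidates over the full data set and
    # discard the false positives from pass 1.
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for itemset in candidates:
            if items.issuperset(itemset):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


baskets = [["rice", "corn"], ["rice", "corn", "bean"],
           ["rice", "bean"], ["corn"]]
frequent = son(baskets, min_support=2)
```

Pass 1 never misses a globally frequent itemset but may propose extra candidates, so pass 2 is needed to confirm the true counts. In a MapReduce setting, each chunk in pass 1 and each slice of the counting in pass 2 could be handled by a separate mapper.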

A Study on the Data Collection Methods based Hadoop Distributed Environment (하둡 분산 환경 기반의 데이터 수집 기법 연구)

  • Jin, Go-Whan
    • Journal of the Korea Convergence Society / v.7 no.5 / pp.1-6 / 2016
  • Many studies have recently been carried out on big data utilization and analysis technology. Government agencies and companies are increasingly introducing Hadoop as a processing platform for analyzing big data. Alongside the growing interest in processing and analyzing big data, data collection technology has become a major issue. However, studies of collection technology remain scarce compared with studies of data analysis techniques. Therefore, in this paper, we build a Hadoop cluster as a big data analysis platform and collect structured data from relational databases through Apache Sqoop. In addition, we provide a system that collects unstructured data, such as sensor data, Web application data files, and log files, as a stream through Apache Flume. Data collected through this convergence could be used as basic material for big data analysis.

Policy Achievements and Tasks for Using Big-Data in Regional Tourism -The Case of Jeju Special Self-Governing Province- (지역관광 빅데이터 정책성과와 과제 -제주특별자치도를 사례로-)

  • Koh, Sun-Young;JEONG, GEUNOH
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.3 / pp.579-586 / 2021
  • This study examines the application of big data to tourism and the remaining tasks, based on the case of Jeju Special Self-Governing Province, which used big data for regional tourism policy. Through the use of big data, it is possible to understand rapidly changing tourism trends and trends in the tourism industry in a timely and detailed manner, and big data can also be used to refine existing tourism statistics. In addition, beyond big data analysis for understanding tourism phenomena, its scope has expanded to providing a platform for real-time customized services. This was made possible by the cooperative governance of industry, government, and academia for data building, analysis, infrastructure, and utilization. Remaining tasks include the limitation of budget dependence and institutional problems, such as the infrastructure for building personal-level data for personalized services, which are the ultimate goal of smart tourism, and the Personal Information Protection Act. In addition, limitations in expertise and technology for data analysis and data linkage remain.

Review of Fintech and Bigdata Technology (핀테크와 빅데이터 기술에 대한 리뷰)

  • Choi, Gi Woo
    • The Journal of Bigdata / v.1 no.1 / pp.77-84 / 2016
  • We investigate the types and characteristics of Fintech, which has become a major issue. From this, we conclude that the essence of Fintech is platform business and market occupancy. For a Fintech business to succeed, the price of Fintech services needs to be lower than that of traditional financial services. The solution is to take advantage of big data and big data analysis. Finally, we believe that win-win cooperation between Fintech startups and financial companies is the direction to take.
