• Title/Summary/Keyword: Big-data Software

Development of the design methodology for large-scale database based on MongoDB

  • Lee, Jun-Ho;Joo, Kyung-Soo
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.57-63 / 2017
  • Big data has grown rapidly in recent years and is characterized by continuous generation, large volume, and unstructured formats. Existing relational database technologies are inadequate for handling such data because of their limited processing speed and the significant cost of expanding storage. Thus, big data processing technologies, which are normally based on distributed file systems, distributed database management, and parallel processing, have emerged as core technologies for implementing big data repositories. In this paper, we propose a design methodology for large-scale databases based on MongoDB by extending the information engineering methodology based on the E-R data model.
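
The abstract does not describe the mapping rules of the proposed methodology. As a rough illustration only, the sketch below shows one common way a one-to-many E-R relationship can be turned into MongoDB-style documents by embedding the child rows in the parent. The entity names (`customers`, `orders`), the field names, and the helper `embed_one_to_many` are hypothetical, and plain dictionaries stand in for documents so that no database driver is needed.

```python
from collections import defaultdict

def embed_one_to_many(parents, children, parent_key, foreign_key, field):
    """Embed child rows into their parent as a list-valued field.

    Illustrative assumption: the 1:N relationship is mapped by embedding,
    which is one possible design choice, not the paper's stated rule.
    """
    grouped = defaultdict(list)
    for child in children:
        # Drop the foreign key: the nesting itself now expresses the link.
        item = {k: v for k, v in child.items() if k != foreign_key}
        grouped[child[foreign_key]].append(item)
    return [{**p, field: grouped.get(p[parent_key], [])} for p in parents]

# Hypothetical relational rows derived from an E-R model.
customers = [{"customer_id": 1, "name": "Kim"}, {"customer_id": 2, "name": "Lee"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]

documents = embed_one_to_many(customers, orders, "customer_id", "customer_id", "orders")
print(documents)
```

Embedding suits relationships where the children are always read together with the parent. Referencing by identifier is the usual alternative when the child collection grows without bound.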

A review of big data analytics and healthcare (빅데이터 분석과 헬스케어에 대한 동향)

  • Moon, Seok-Jae;Lee, Namju
    • Journal of the Korean Applied Science and Technology / v.37 no.1 / pp.76-82 / 2020
  • Big data analysis in healthcare research appears to be a necessary strategy for the convergence of sports science and technology in the era of the Fourth Industrial Revolution. The purpose of this study is to provide a basic review that supports diverse convergence of big data and healthcare by discussing the concept, analysis methods, and application examples of big data. Text mining, data mining, opinion mining, process mining, cluster analysis, and social network analysis are currently used. Big data analysis would make it possible to identify high-risk factors for a given condition, determine specific health determinants for diseases, monitor biosignals, predict diseases, provide training and treatments, and analyze healthcare measurements. As future work, the characteristics of big data provide an appropriate basis for using promising software platforms to develop applications that can handle big data in healthcare and, even more so, in sports science.

Development of a Privacy-Preserving Big Data Publishing System in Hadoop Distributed Computing Environments (하둡 분산 환경 기반 프라이버시 보호 빅 데이터 배포 시스템 개발)

  • Kim, Dae-Ho;Kim, Jong Wook
    • Journal of Korea Multimedia Society / v.20 no.11 / pp.1785-1792 / 2017
  • Big data generally contains sensitive information about individuals, so releasing it directly for public use may violate existing privacy requirements. Therefore, privacy-preserving data publishing (PPDP) has been actively researched as a way to share big data containing personal information for public use while protecting the privacy of individuals with minimal data modification. Recently, with increasing demand for big data sharing in various areas, there is also growing interest in software that supports privacy-preserving data publishing. Thus, in this paper, we develop a system that aims to support privacy-preserving data publishing effectively and efficiently. In particular, the system enables data owners to select an appropriate anonymization level by providing them with an information loss matrix. Furthermore, the system achieves high performance in data anonymization by using distributed Hadoop clusters.
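
The trade-off between anonymization level and information loss can be illustrated with a small sketch. The abstract does not say which anonymization model or loss measure the system uses. The code below assumes generalization-based k-anonymity on a single quasi-identifier (`age`) and a hypothetical loss score equal to the fraction of the age domain hidden by one bucket. The records, field names, and all function names are invented for illustration, and the sketch runs on one machine rather than on Hadoop.

```python
from collections import Counter

def generalize_age(age, width):
    """Replace an exact age with a bucket label such as '20-29' (width=10)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def anonymize(records, width):
    """Generalize the 'age' quasi-identifier at the given bucket width."""
    return [{**r, "age": generalize_age(r["age"], width)} for r in records]

def smallest_group(records, quasi_identifiers):
    """Size of the smallest equivalence class: the data is k-anonymous for this k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def information_loss(width, domain_size):
    """Hypothetical loss score in [0, 1]: share of the age domain hidden by one bucket."""
    return min(1.0, (width - 1) / (domain_size - 1))

# Invented sample data: 'age' and 'zip' are quasi-identifiers, 'disease' is sensitive.
records = [
    {"age": 23, "zip": "100", "disease": "flu"},
    {"age": 27, "zip": "100", "disease": "cold"},
    {"age": 31, "zip": "100", "disease": "flu"},
    {"age": 38, "zip": "100", "disease": "asthma"},
]

# One row per candidate anonymization level: bucket width, achieved k, loss score.
for width in (1, 5, 10, 20):
    k = smallest_group(anonymize(records, width), ["age", "zip"])
    print(width, k, round(information_loss(width, domain_size=100), 3))
```

A data owner reading such a table would pick the smallest bucket width that still reaches the required k, since wider buckets raise k but discard more detail.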

A Leading Study of Data Lake Platform based on Big Data to support Business Intelligence (Business Intelligence를 지원하기 위한 Big Data 기반 Data Lake 플랫폼의 선행 연구)

  • Lee, Sang-Beom
    • Proceedings of the Korean Society of Computer Information Conference / 2018.01a / pp.31-34 / 2018
  • We live in the digital era, and the characteristics of customers in this era are constantly changing. For this reason, understanding business requirements and converting them into technical requirements is essential, which in turn requires understanding the data model behind the business layout. Moreover, BI (Business Intelligence) is at the crux of revolutionizing enterprises to minimize losses and maximize profits. In this paper, we describe a preliminary study of the current state of desktop BI (software products and programming languages) on the front-end side, and of a Data Lake platform based on Big Data, built through data modeling, on the back-end side to support business intelligence.

In Small and Medium Business the Government 3.0-based Big Data Utilization Policy (중소기업에서 정부 3.0기반의 빅 데이터 활용정책)

  • Cho, Young-Bok;Woo, Seng-hee;Lee, Sang-Ho
    • Journal of Convergence Society for SMB / v.3 no.1 / pp.15-22 / 2013
  • In Korea, small and medium enterprises currently lack innovation, and a large proportion of them have weak capabilities. In addition, the sales of small businesses and medium-scale ventures are vulnerable, because growth is difficult to expect under these conditions. This paper therefore presents ways for small businesses and medium-scale ventures to take advantage of big data based on Government 3.0. A Government 3.0-based big data infrastructure is required so that small businesses and small and medium-sized ventures can make use of the platform autonomously.

A Prediction of Number of Patients and Risk of Disease in Each Region Based on Pharmaceutical Prescription Data (의약품 처방 데이터 기반의 지역별 예상 환자수 및 위험도 예측)

  • Chang, Jeong Hyeon;Kim, Young Jae;Choi, Jong Hyeok;Kim, Chang Su;Aziz, Nasridinov
    • Journal of Korea Multimedia Society / v.21 no.2 / pp.271-280 / 2018
  • Recently, big data has been growing rapidly due to the development of information technology. In the medical field in particular, big data is used to provide services such as patient-customized medical care, disease management, and disease prediction. In Korea, the 'National Health Alarm Service' is provided by the National Health Insurance Corporation. However, its prediction model has two problems: it is limited to short-term prediction within 3 days, and the social data it uses are unreliable. To solve these problems, this paper proposes a disease prediction model using medicine prescription data generated from actual patients. The model predicts the total number of patients and the risk of disease in each region, and uses the ARIMA model for long-term prediction.

Optimization Driven MapReduce Framework for Indexing and Retrieval of Big Data

  • Abdalla, Hemn Barzan;Ahmed, Awder Mohammed;Al Sibahee, Mustafa A.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.1886-1908 / 2020
  • With technical advances, the amount of big data is increasing day by day, to the point that traditional software tools struggle to handle it. Additionally, the presence of imbalanced data in big data is a major concern for the research community. To ensure effective management of big data and to deal with imbalanced data, this paper proposes a new indexing algorithm for retrieving big data in the MapReduce framework. In the mappers, data clustering is performed with the Sparse Fuzzy-c-means (Sparse FCM) algorithm. The reducer combines the clusters generated by the mappers and again performs data clustering with the Sparse FCM algorithm. Two-level query matching is then performed to determine the requested data: the first level determines the cluster, and the second level accesses the requested data within it. The data are ranked using the proposed Monarch chaotic whale optimization algorithm (M-CWOA), which is designed by combining Monarch butterfly optimization (MBO) [22] and the chaotic whale optimization algorithm (CWOA) [21]. Here, the Parametric Enabled-Similarity Measure (PESM) is adapted for matching the similarities between two datasets. The proposed M-CWOA outperformed other methods, with a maximal precision of 0.9237, recall of 0.9371, and F1-score of 0.9223.
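
The two-level query matching described in this abstract can be sketched in a few lines. This is a simplified single-process illustration under stated assumptions, not the paper's method. The clusters are given by hand rather than produced by Sparse FCM in mappers and a reducer, plain Euclidean distance stands in for PESM, and no M-CWOA ranking is applied. The function names and sample points are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def build_index(clusters):
    """Index each cluster by its centroid: a list of (centroid, members) pairs."""
    return [(centroid(members), members) for members in clusters]

def two_level_match(index, query):
    # Level 1: choose the cluster whose centroid is closest to the query.
    _, members = min(index, key=lambda entry: math.dist(entry[0], query))
    # Level 2: choose the closest record inside that cluster only.
    return min(members, key=lambda p: math.dist(p, query))

# Hand-made clusters standing in for the output of the clustering stage.
clusters = [
    [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)],
    [(8.0, 8.0), (9.0, 9.5), (8.5, 9.0)],
]
index = build_index(clusters)
print(two_level_match(index, (8.4, 8.8)))  # (8.5, 9.0)
```

The benefit of the two levels is that the second search touches only one cluster's records instead of the whole dataset, at the cost of possibly missing a closer record that sits just across a cluster boundary.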

ISO/IEC 9126 Quality Model-based Assessment Criteria for Measuring the Quality of Big Data Analysis Platform (빅데이터 분석 플랫폼 평가를 위한 ISO/IEC 9126 품질 모델 기반 평가준거 개발)

  • Lee, Jong Yun
    • Journal of KIISE / v.42 no.4 / pp.459-467 / 2015
  • The analysis platform for remote-sensing big data is a system that downloads data from satellites, transforms it into the L3 data type, analyzes it, and produces analysis results. The objective of this paper is to develop assessment criteria based on the ISO/IEC 9126-1 software quality model in order to evaluate the quality of remote-sensing big data analysis platforms. The detailed research contents are as follows. First, the ISO/IEC 9126 standards and previous software evaluation models are reviewed. Second, this paper defines evaluation areas, evaluation elements, and evaluation items for measuring the quality of big data analysis platforms. Third, the validity of the assessment criteria is verified by statistical experiments covering content validity, reliability, and construct validity, using SPSS 20.0 and Amos 20.0. Construct validity is examined by performing confirmatory factor analysis and path analysis. Lastly, the significance of our research is that it presents the first evaluation criteria for measuring the quality of big data analysis platforms. Our assessment criteria are also expected to serve as a basis for evaluation criteria for platforms developed in the future.

Development of Data Profiling Software Supporting a Microservice Architecture (마이크로 서비스 아키텍처를 지원하는 데이터 프로파일링 소프트웨어의 개발)

  • Chang, Jae-Young;Kim, Jihoon;Jee, Seowoo
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.5 / pp.127-134 / 2021
  • Recently, acquiring high-quality data has become an important issue with the expansion of the big data industry. To acquire high-quality data, data quality must first be evaluated accurately. The quality of data can be evaluated through meta-information such as statistics on the data, and the task of extracting such meta-information is called data profiling. Until now, data profiling software has typically been provided as a component or an additional service of traditional data quality or visualization tools, so it has not been suitable for direct use in various environments. To address this problem, this paper presents data profiling software based on a microservice architecture that can be served in various environments. The presented data profiler provides an easy-to-use interface in which requests for meta-information are served through a RESTful API. In addition, the proposed data profiler is independent of any specific environment and can therefore be integrated efficiently with various big data platforms and data analysis tools.
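
To make "meta-information such as statistics on data" concrete, the sketch below computes a minimal per-column profile and serializes it as JSON, the kind of payload a profiling service might return. The abstract does not list which statistics the software extracts or how its API is shaped, so the chosen statistics, the function names `profile_column` and `profile_table`, and the sample rows are all assumptions. The RESTful layer itself is left out.

```python
import json
from statistics import mean

def profile_column(values):
    """Basic meta-information for one column; None is treated as a missing value."""
    present = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Add numeric statistics only when every present value is a number.
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if present and len(numeric) == len(present):
        profile.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    return profile

def profile_table(rows):
    """Profile every column that appears in a list of row dictionaries."""
    columns = sorted({key for row in rows for key in row})
    return {c: profile_column([row.get(c) for row in rows]) for c in columns}

# Invented sample rows.
rows = [
    {"id": 1, "city": "Seoul", "age": 34},
    {"id": 2, "city": "Busan", "age": None},
    {"id": 3, "city": "Seoul", "age": 40},
]
report = profile_table(rows)
print(json.dumps(report, indent=2))
```

Returning a plain JSON-serializable dictionary keeps the profiling logic independent of any particular web framework or storage platform, which is in the spirit of the environment independence the abstract emphasizes.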

Analysis of Encryption Algorithm Performance by Workload in BigData Platform (빅데이터 플랫폼 환경에서의 워크로드별 암호화 알고리즘 성능 분석)

  • Lee, Sunju;Hur, Junbeom
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.6 / pp.1305-1317 / 2019
  • Although encryption for data protection is essential in the big data platform environments of public institutions and corporations, few studies have verified the performance of encryption algorithms under actual big data workloads. In this paper, we analyze how the performance of AES, ARIA, and 3DES changes for each of six big data workloads as data and nodes are added in a MongoDB environment. This enables us to identify the optimal block-based cryptographic algorithm for each workload in the big data platform environment, and to test the performance of MongoDB under various workloads and data and node configurations using the NoSQL Database Benchmark (YCSB). We propose an optimized architecture that takes into account.