• Title/Summary/Keyword: Engineering Big Data


Analysis of Repository Systems for Designing an Archive System of Large Science Data (대용량 과학데이터 아카이브 시스템 설계를 위한 리포지터리 시스템 분석)

  • Lim, Jongtae;Seo, Indeok;Song, Heesub;Yoo, Seunghun;Jeong, Jaeyun;Cho, Jungkwon;Paul, Aniruddha;Ko, Geonsik;Kim, Byounghoon;Park, Yunjeong;Song, Jinwoo;Lee, Seohee;Jeon, Hyeonwook;Choi, Minwoong;Noh, Yeonwoo;Choi, Dojin;Kim, Yeonwoo;Bok, Kyoungsoo;Lee, Jeonghoon;Lee, Sanghwon;Yoo, Jaesoo
    • Proceedings of the Korea Contents Association Conference / 2016.05a / pp.21-22 / 2016
  • In this paper, we analyze existing repository systems for the design of a large-scale scientific data archive system. To design an archive system architecture that efficiently collects and stores large volumes of scientific data, we examine various scientific data repository systems currently in service. Based on this analysis, we derive the technical requirements for designing a large-scale scientific data archive system architecture.


Text Mining and Visualization of Unstructured Data Using Big Data Analytical Tool R (빅데이터 분석 도구 R을 이용한 비정형 데이터 텍스트 마이닝과 시각화)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.9 / pp.1199-1205 / 2021
  • In the era of big data, it is important to effectively analyze not only structured data that is well organized in databases, but also unstructured big data such as web documents, e-mails, and social data generated in real time on the Internet, social network services, and mobile environments. Big data analysis is the process of creating new value by discovering meaningful new correlations, patterns, and trends in big data held in data storage. This study summarizes and visualizes the results of a frequency analysis of unstructured article data using the R language, a big data analysis tool. The data comprised a total of 104 papers published in the journal of the Korea Institute of Information and Communication Engineering through May 2021. In the final analysis, the most frequently mentioned keyword was "Data", which ranked first with 1,538 occurrences. Based on these results, the limitations of the study and its theoretical implications are discussed.
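
The paper performs its analysis in R; as a rough illustration of the same workflow (tokenize the abstracts, count term frequencies, plot the top keywords), here is a minimal Python sketch. The input file name, its layout, and the stop-word list are assumptions, not the authors' setup.

```python
# Minimal sketch of the frequency-analysis workflow described above,
# in Python rather than the paper's R. File name and format are assumed.
import re
from collections import Counter

import matplotlib.pyplot as plt

# Assume a plain-text dump of the 104 abstracts, one per line.
with open("jkiice_2021_abstracts.txt", encoding="utf-8") as f:
    text = f.read().lower()

# Tokenize into words and drop short words and a toy stop list.
stopwords = {"the", "and", "of", "in", "to", "for", "with", "is", "are"}
tokens = [t for t in re.findall(r"[a-z]+", text)
          if len(t) > 2 and t not in stopwords]

# Count term frequencies and take the top 10 keywords.
freq = Counter(tokens).most_common(10)
words, counts = zip(*freq)

# Visualize as a bar chart (the paper also uses word-cloud style plots).
plt.bar(words, counts)
plt.xticks(rotation=45, ha="right")
plt.ylabel("frequency")
plt.title("Top keywords")
plt.tight_layout()
plt.show()
```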

Forecasting Housing Demand with Big Data

  • Kim, Han Been;Kim, Seong Do;Song, Su Jin;Shin, Do Hyoung
    • International Conference on Construction Engineering and Project Management / 2015.10a / pp.44-48 / 2015
  • Housing price is a key indicator of housing demand. The Actual Transaction Price Index of Apartments (ATPIA) released by the Korea Appraisal Board is useful for understanding the current level of housing prices, but it does not forecast future prices. Big data, such as the frequency of internet search queries, is more accessible and timely than ever, and forecasting future housing demand from it would be very helpful to the housing market. The objective of this study is to develop a forecasting model of ATPIA as a part of forecasting housing demand. The model applies a concept of time shift between the search data and the index. The model with a time shift of 5 months shows the highest coefficient of determination and was therefore selected as the optimal model. Its mean error rate is 2.95%, a promising result.
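
The paper does not publish its model code; the sketch below only illustrates the time-shift idea it describes: lag the search-query series by k months, fit a linear model of the index on it, and keep the lag with the highest coefficient of determination. The data here is synthetic and the series names are hypothetical.

```python
# Sketch of time-shift model selection: try monthly lags of the
# search-query series and keep the lag whose linear fit to the price
# index has the highest R^2. Synthetic data, illustrative only.
import numpy as np

rng = np.random.default_rng(0)
months = 60
queries = rng.normal(100, 10, months).cumsum()                  # assumed query index
index = np.roll(queries, 5) * 0.8 + rng.normal(0, 5, months)    # index lagging ~5 months

def r_squared(x, y):
    """Coefficient of determination of a simple linear fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1 - resid.var() / y.var()

best_lag, best_r2 = None, -np.inf
for lag in range(1, 13):          # candidate time shifts of 1..12 months
    x, y = queries[:-lag], index[lag:]   # query at t vs. index at t+lag
    r2 = r_squared(x, y)
    if r2 > best_r2:
        best_lag, best_r2 = lag, r2

print(f"optimal time shift: {best_lag} months (R^2 = {best_r2:.3f})")
```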


Design of Ecosystems to Analyze Big Data Market (빅데이터 시장 분석을 위한 에코시스템 설계)

  • Lee, Sangwon;Park, Sungbum;Shin, Seong-yoon
    • Proceedings of the Korean Society of Computer Information Conference / 2014.01a / pp.433-434 / 2014
  • A Big Data service is composed of a Big Data user, a Big Data service provider, and a Big Data application provider, and the service can be extended to reciprocal interactions among these three subjects, such as providing, being provided, connecting, and being connected. In this paper, we propose an ecosystem for Big Data and a framework for its service.


A Study on the Strategy of the Use of Big Data for Cost Estimating in Construction Management Firms based on the SWOT Analysis (SWOT분석을 통한 CM사 견적업무 빅데이터 활용전략에 관한 연구)

  • Kim, Hyeon Jin;Kim, Han Soo
    • Korean Journal of Construction Engineering and Management / v.23 no.2 / pp.54-64 / 2022
  • As interest in big data grows rapidly, various types of big data research and development have been conducted in the construction industry. Among the various application areas, cost estimating is a task where the use of big data can provide clear benefits. For firms to make efficient use of big data in estimating tasks, they need a strategy based on a multifaceted analysis of their internal and external environments. The objective of this study is to develop and propose a strategy for the use of big data in construction management (CM) firms' cost estimating tasks, based on a SWOT analysis. Through a literature review, a questionnaire survey, interviews, and the SWOT analysis, the study suggests that CM firms need to maintain their current receptive culture for big data and incrementally expand their information resources. It also proposes that they reinforce weak areas, including big data experts and practice infrastructure, to improve big data-based cost estimating.

Development of the design methodology for large-scale database based on MongoDB

  • Lee, Jun-Ho;Joo, Kyung-Soo
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.57-63 / 2017
  • The recent surge of big data is characterized by continuous generation, large volume, and unstructured formats. Existing relational database technologies are inadequate for such big data because of their limited processing speed and the significant cost of storage expansion. Big data processing technologies, normally based on distributed file systems, distributed database management, and parallel processing, have therefore arisen as core technologies for implementing big data repositories. In this paper, we propose a design methodology for large-scale databases based on MongoDB, extending the information engineering methodology built on the E-R data model.
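
The paper's methodology maps E-R models to MongoDB document structures; as an illustration of the general idea only (not the paper's specific transformation rules), the following sketch embeds a one-to-many E-R relationship as an array of subdocuments using pymongo. The collection, field names, and connection string are invented.

```python
# Hypothetical sketch of mapping an E-R one-to-many relationship
# (Experiment 1..N Measurement) to a MongoDB document design: the
# dependent entity becomes an embedded array of subdocuments.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["science_archive"]

# In a relational design this would be two tables joined by a foreign
# key; in the document design the child rows are embedded in the parent.
experiment = {
    "name": "run-42",
    "instrument": "spectrometer-A",
    "measurements": [                  # embedded 1:N relationship
        {"t": 0.0, "value": 1.07},
        {"t": 0.1, "value": 1.12},
    ],
}
db.experiments.insert_one(experiment)

# A query that would have required a join now reads a single document.
doc = db.experiments.find_one({"name": "run-42"})
print(len(doc["measurements"]), "measurements embedded")
```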

A research paper for e-government's role for public Big Data application (공공의 빅데이터 활용을 위한 전자정부 역할 연구)

  • Bae, Yong-guen;Cho, Young-Ju;Choung, Young-chul
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2176-2183 / 2017
  • Big data, a key driver of the fourth Industrial Revolution, enhances industrial productivity in the private sector and supports administrative services for citizens and corporations in the public sector. Countries with advanced ICT are rapidly adopting big data in the public sector; in social crisis management especially, they operate forecasting systems. The Korean government also emphasizes public-sector big data for social crisis management, but the vulnerability of the overall infrastructure calls for concrete preparation and operational measures against social problems. Accordingly, we need to analyze the problems of big data application, benchmark precedent cases, and diversify policy directions. This paper therefore analyzes the problems arising from big data application and proposes roles and rules for e-government: opening public information, improving the legal and institutional framework, addressing privacy-threatening aspects of big data services within the big data ecosystem, and securing the operational and analytical technologies that big data and its related technologies require.

An Analysis of Utilization on Virtualized Computing Resource for Hadoop and HBase based Big Data Processing Applications (Hadoop과 HBase 기반의 빅 데이터 처리 응용을 위한 가상 컴퓨팅 자원 이용률 분석)

  • Cho, Nayun;Ku, Mino;Kim, Baul;Xuhua, Rui;Min, Dugki
    • Journal of Information Technology and Architecture / v.11 no.4 / pp.449-462 / 2014
  • In the big data era, there are many aspects to consider in systems that capture, store, and analyze stored or streaming data. Unlike traditional data handling systems, a big data processing system must account for the characteristics (format, velocity, and volume) of the data it handles. Virtualized computing platforms are emerging as an effective way to handle big data, since virtualization technology makes it possible to manage computing resources dynamically and elastically with minimal effort. In this paper, we analyze the utilization of virtualized computing resources to discover suitable deployment models in an Apache Hadoop and HBase-based big data processing environment. The results show that the TaskTracker service exhibits high CPU utilization and high disk I/O overhead during MapReduce phases, the HRegion service incurs high network consumption when transferring data from the DataNode to the TaskTracker, and the DataNode shows high memory utilization and disk I/O overhead when reading stored data.
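
The study rests on per-node measurements of CPU, memory, disk, and network utilization on virtual machines. As a simple illustration of that kind of measurement, and not the authors' instrumentation, here is a sketch that samples the same counters with the psutil library.

```python
# Illustrative sketch of per-node utilization sampling of the kind the
# study relies on: periodically record CPU, memory, disk I/O, and
# network counters, e.g. while a MapReduce job runs on the node.
import psutil

def sample(interval_s=1.0, samples=10):
    """Collect coarse utilization snapshots at a fixed interval."""
    rows = []
    for _ in range(samples):
        cpu = psutil.cpu_percent(interval=interval_s)  # % over the interval
        mem = psutil.virtual_memory().percent
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        rows.append({
            "cpu_pct": cpu,
            "mem_pct": mem,
            "disk_read_mb": disk.read_bytes / 2**20,
            "disk_write_mb": disk.write_bytes / 2**20,
            "net_sent_mb": net.bytes_sent / 2**20,
            "net_recv_mb": net.bytes_recv / 2**20,
        })
    return rows

for row in sample(samples=5):
    print(row)
```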

Optimization Driven MapReduce Framework for Indexing and Retrieval of Big Data

  • Abdalla, Hemn Barzan;Ahmed, Awder Mohammed;Al Sibahee, Mustafa A.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.1886-1908 / 2020
  • With technical advances, the amount of big data increases day by day, to the point where traditional software tools struggle to handle it. The presence of imbalanced data within big data is a further concern for the research community. To ensure effective management of big data and to deal with imbalanced data, this paper proposes a new indexing algorithm for retrieving big data in the MapReduce framework. In the mappers, data clustering is performed with the Sparse Fuzzy c-means (Sparse FCM) algorithm. The reducer combines the clusters generated by the mappers and again clusters the data with Sparse FCM. Two-level query matching is then performed to determine the requested data: the first level identifies the relevant cluster, and the second level retrieves the requested data within it. Data ranking uses the proposed Monarch chaotic whale optimization algorithm (M-CWOA), designed by combining Monarch butterfly optimization (MBO) [22] and the chaotic whale optimization algorithm (CWOA) [21], while the Parametric Enabled-Similarity Measure (PESM) is adapted for matching the similarities between two datasets. The proposed M-CWOA outperformed other methods, with a maximal precision of 0.9237, recall of 0.9371, and F1-score of 0.9223.
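
As a rough illustration of the two-level query matching described above, and with deliberate simplifications (plain k-means stands in for Sparse FCM, cosine similarity stands in for PESM, and no MapReduce distribution), here is a minimal sketch: match the query against cluster centroids first, then rank records inside the chosen cluster.

```python
# Simplified two-level query matching: level 1 selects the closest
# cluster centroid, level 2 ranks that cluster's members. K-means and
# cosine similarity replace the paper's Sparse FCM and PESM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 16))   # indexed records as feature vectors

# Offline: cluster the records; the centroids act as the index.
km = KMeans(n_clusters=8, n_init=10, random_state=1).fit(data)

def retrieve(query, top_k=5):
    """Two-level matching: closest cluster, then top-k similar members."""
    q = query.reshape(1, -1)
    # Level 1: match the query against the cluster centroids.
    best_cluster = cosine_similarity(q, km.cluster_centers_).argmax()
    members = np.where(km.labels_ == best_cluster)[0]
    # Level 2: rank records inside the chosen cluster by similarity.
    sims = cosine_similarity(q, data[members])[0]
    return members[np.argsort(sims)[::-1][:top_k]]

print(retrieve(rng.normal(size=16)))
```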