• Title/Summary/Keyword: Bigdata Collection

Comprehensive Knowledge Archive Network harvester improvement for efficient open-data collection and management

  • Kim, Dasol;Gil, Myeong-Seon;Nguyen, Minh Chau;Won, Heesun;Moon, Yang-Sae
    • ETRI Journal / v.43 no.5 / pp.835-855 / 2021
  • With the recent increase in data disclosure, the Comprehensive Knowledge Archive Network (CKAN), an open-source data distribution platform, is drawing much attention. CKAN is used together with additional extensions, such as Datastore and Datapusher for data management and Harvest and DCAT for data collection. This study identifies problems in CKAN itself and in the Harvest Extension. First, data deletion in CKAN causes two problems: data inconsistency and wasted storage space. Second, the Harvest Extension causes three additional problems: source deletion, which deletes only the sources without deleting the data themselves; job stop, in which a job cannot be deleted during data collection; and service interruption, in which service cannot be provided even if data exist. Based on these observations, we propose an improved CKAN that provides a new deletion function solving the data inconsistency and storage space waste problems. In addition, we present an improved Harvest Extension that solves the three problems of the legacy Harvest Extension. We verify the correctness and usefulness of the improved CKAN and Harvest Extension functions through actual implementation and extensive experiments.

Designing Bigdata Platform for Multi-Source Maritime Information

  • Junsang Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.1 / pp.111-119 / 2024
  • In this paper, we propose a big data platform that can collect information from various sources at sea. Currently operating ocean-related big data platforms focus on storing and sharing already-created data, and each data provider is responsible for data collection and preprocessing. Collecting and integrating data in a marine environment, where communication networks are poor compared to those on land, is costly and inefficient, which makes it difficult to implement the related infrastructure. In particular, fields that require real-time data collection and analysis, such as weather information and radar and sensor data, must consider a number of issues beyond those of land-based systems, such as data security, the characteristics of organizations and ships, and data collection costs, in addition to communication network issues. This paper first defines these problems and presents solutions. To design a big data platform that reflects them, we first propose a data source, hierarchical MEC, and data flow structure, and then present an overall platform structure that integrates them all.

Visual Cell : Image Analysis and Visual Retrieval System for Biology Cell Image Bigdata (Visual Cell : 바이오세포 이미지 빅데이터를 위한 이미지 분석 및 시각적 검색 시스템)

  • Park, Beomjun;Jo, Sunhwa;Lee, Suan;Shin, Jiwoon;Yoo, Hyuk Sang;Kim, Jinho
    • The Journal of Bigdata / v.4 no.1 / pp.53-61 / 2019
  • The extracellular matrix, which provides structural and biochemical support to surrounding cells, is a physiological modulator that controls cell division and differentiation. In the bio sector, scaffolds (three-dimensional supports for tissue engineering) are produced, and stem cells are cultured in them and transplanted into animals to assess tissue regeneration. Regeneration depends on components such as collagen in the tissue. It is therefore very important to identify the inclusion rate and distribution of components in the tissue, and these data are obtained by analyzing the color of stained tissue images. The process from image collection to analysis is costly, and the collected and analyzed data are managed in different formats by different research institutions. As a result, integrated data management and searching of analysis results are not being performed. In this paper, we establish a database that can manage the relevant big data in an integrated manner, and propose an integrated bio-image management and retrieval system that can be searched by color, an important analytical measure in this field of study.

A Study on Perception of Educational Big Data Utilization and Current State of Data Utilization of Officials of the Provincial Office of Education (교육청 공무원의 데이터 활용실태 및 교육 빅데이터 활용에 관한 인식 연구 - A도교육청을 중심으로)

  • Shin, Jong-Ho
    • Journal of Digital Convergence / v.18 no.9 / pp.39-47 / 2020
  • This study investigates the actual state of data utilization and the perception of big data utilization among officials of a provincial Office of Education, and derives implications for establishing big data utilization strategies. An online survey of 440 people was conducted. The results showed that the types and sources of data used for work varied, and that data collection and refining were the most difficult parts. The infrastructure for data utilization was insufficient and was identified as the most necessary factor. The purpose of big data utilization was related to the establishment of the educational policy agenda.

Designing an Agricultural Data Sharing Platform for Digital Agriculture Data Utilization and Service Delivery (디지털 농업 데이터 활용 및 서비스 제공을 위한 농산업 데이터 공유 플랫폼 설계)

  • Seung-Jae Kim;Meong-Hun Lee;Jin-Gwang Koh
    • The Journal of Bigdata / v.8 no.1 / pp.1-10 / 2023
  • This paper presents the design process of an agricultural data sharing platform intended to address major challenges faced by the domestic agricultural industry. The platform was designed with a user interface that prioritizes user requirements for ease of use and offers various analysis techniques to provide growth prediction for field environment, growth, management, and control data. Additionally, the platform supports File to DB and DB to DB linkage methods to ensure seamless linkage between the platform and farmhouses. The UI design process utilized HTML/CSS-based languages, JavaScript, and React to provide a comprehensive user experience from platform login to data upload, analysis, and detailed inquiry visualization. The study is expected to contribute to the development of Korean smart farm models and provide reliable data sets to agricultural industry sites and researchers.

Development of IIoT Edge Middleware System for Smart Services (스마트서비스를 위한 경량형 IIoT Edge 미들웨어 시스템 개발)

  • Lee, Han;Hwang, Joon Suk;Kang, Dae Hyun;Jeong, Seok Chan
    • The Journal of Bigdata / v.6 no.1 / pp.115-125 / 2021
  • Due to various ICT innovations and digital transformation, the Internet of Things (IoT) environment increasingly requires intelligent, decentralized, and automated services. In particular, the Industrial Internet of Things (IIoT), where communication networks (5G), data analysis and artificial intelligence (AI), and digital twin technology are combined, requires an advanced and stable smart service environment. In this study, we propose an IIoT Edge middleware system that provides a flexible interface with heterogeneous devices such as facilities and sensors at various industrial sites and supports quick and stable data collection and processing.

Artificial Intelligence Algorithms, Model-Based Social Data Collection and Content Exploration (소셜데이터 분석 및 인공지능 알고리즘 기반 범죄 수사 기법 연구)

  • An, Dong-Uk;Leem, Choon Seong
    • The Journal of Bigdata / v.4 no.2 / pp.23-34 / 2019
  • Recently, crimes that exploit digital platforms have been continuously increasing: about 140,000 cases occurred in 2015 and about 150,000 in 2016. Traditional investigation techniques are therefore considered limited in handling such online crimes. The manual online searches and cognitive investigation methods that investigators widely use today are not enough to proactively cope with rapidly changing civil crimes. In addition, the fact that content is posted to unspecified users of social media makes investigations more difficult. Considering the characteristics of the online media where infringement crimes occur, this study suggests site-based collection and the Open API among the web content collection methods. Since illegal content is published and deleted quickly, and new words and variations are generated rapidly and in many forms, it is difficult to recognize them quickly with dictionary-based morphological analysis that relies on manually registered entries. To solve this problem, we propose applying tokenization through WPM (Word Piece Model) to the existing dictionary-based morphological analysis, as a data preprocessing method for quickly recognizing and responding to illegal content posted in online infringement crimes. In the data analysis, the optimal precision is verified through a vote-based ensemble method using classification models based on supervised learning for the investigation of illegal content. This study applies a classification algorithm model centered on illegal multilevel business cases to proactively recognize crimes that harm the public economy, and presents an empirical study on effective social data collection and content investigation.

Designing Cost Effective Open Source System for Bigdata Analysis (빅데이터 분석을 위한 비용효과적 오픈 소스 시스템 설계)

  • Lee, Jong-Hwa;Lee, Hyun-Kyu
    • Knowledge Management Research / v.19 no.1 / pp.119-132 / 2018
  • Many advanced products and services are emerging in the market thanks to data-based technologies such as the Internet of Things (IoT), big data, and AI. Building a data processing system in an IoT network environment is not simple to configure and is heavily restricted by the high cost of constructing a high-performance server environment. Therefore, in this paper, we design a low-cost, practical development environment for a large-scale data analysis computing platform using open source. Specifically, this study implements a big data processing system using Raspberry Pi, an ultra-small PC environment, and open source APIs. This big data processing system includes building a portable server system, building a web server for web mining, developing Python IDE classes for crawling, and developing R libraries for NLP and visualization. Through this research, we develop a web environment that can control real-time data collection and analysis of web media in a mobile environment and present it as a curriculum for non-IT specialists.

Implementation and Performance Analysis of Efficient Big Data Processing System Through Dynamic Configuration of Edge Server Computing and Storage Modules (BigCrawler: 엣지 서버 컴퓨팅·스토리지 모듈의 동적 구성을 통한 효율적인 빅데이터 처리 시스템 구현 및 성능 분석)

  • Kim, Yongyeon;Jeon, Jaeho;Kang, Sungjoo
    • IEMEK Journal of Embedded Systems and Applications / v.16 no.6 / pp.259-266 / 2021
  • Edge computing enables real-time big data processing by performing computing close to the physical location of the user or data source. However, in an edge computing environment, various situations that affect big data processing performance may occur depending on temporary service requirements or changes in physical resources in the field. In this paper, we propose BigCrawler, a system that dynamically configures the computing module and storage module according to the big data collection status and computing resource usage in the edge computing environment. We also analyze the characteristics of big data processing workloads according to the arrangement of the computing and storage modules.

A Study on the Analysis of Aviation Safety Data Structure and Standard Classification (항공안전데이터 구조 분석 및 표준 분류체계에 관한 연구)

  • Kim, Jun Hwan;Lim, Jae Jin;Lee, Jang Ryong
    • Journal of the Korean Society for Aviation and Aeronautics / v.28 no.4 / pp.89-101 / 2020
  • To enhance the safety of the international aviation industry, the International Civil Aviation Organization has recommended establishing an operational foundation for the systematic and integrated collection, storage, analysis, and sharing of aviation safety data. Accordingly, the Korean aviation industry also needs to comprehensively manage the safety data generated and collected by various stakeholders related to aviation safety and, through this, to identify and remove in advance hazards that may cause accidents. For more effective data management and utilization, a standard structure should be established to enable integrated management and sharing of safety data. Therefore, this study proposes a framework for managing and integrating aviation safety data for a big data-based aviation safety management and sharing platform.