• Title/Summary/Keyword: Bigdata

A Study on Data Processing Technology based on a open source R to improve utilization of the Geostationary Ocean Color Imager(GOCI) Products (천리안해양관측위성 산출물 활용성 향상을 위한 오픈소스 R 기반 데이터 처리기술 연구)

  • OH, Jung-Hee;CHOI, Hyun-Woo;LEE, Chol-Young;YANG, Hyun;HAN, Hee-Jeong
    • Journal of the Korean Association of Geographic Information Studies / v.22 no.4 / pp.215-228 / 2019
  • The HDF5 data format is used to store and distribute large volumes of Geostationary Ocean Color Imager (GOCI) satellite data efficiently. The Korea Ocean Satellite Center has developed and provided the GOCI Data Processing System (GDPS) for general users who are not familiar with the HDF5 format. Nevertheless, merging and processing Hierarchical Data Format version 5 (HDF5) data remains difficult: it requires an understanding of satellite data characteristics and of how to use GDPS, and the format stores location and attribute information separately. Therefore, open source R and the rhdf5, data.table, and matrixStats packages were used to develop an algorithm that makes satellite data in HDF5 format easy to use without going through GDPS.
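
The core difficulty named in this abstract, location and attribute information being stored separately, can be illustrated with a small sketch. It is written in Python rather than the R packages the authors used, and it is not their algorithm: plain nested lists stand in for HDF5 datasets, and the function name, the sample values, and the fill value of -999.0 are all assumptions made for illustration.

```python
from statistics import mean

def merge_grids(lat, lon, values, fill_value=-999.0):
    """Join separate location and attribute grids into (lat, lon, value) rows.

    fill_value is a hypothetical missing-data flag, not one taken from GOCI.
    """
    rows = []
    for i, value_row in enumerate(values):
        for j, value in enumerate(value_row):
            if value != fill_value:  # skip pixels flagged as missing
                rows.append((lat[i][j], lon[i][j], value))
    return rows

# Hypothetical 2x2 scene: two location grids plus one attribute grid.
lat = [[36.0, 36.0], [35.5, 35.5]]
lon = [[129.0, 129.5], [129.0, 129.5]]
chl = [[0.8, -999.0], [1.2, 1.0]]

table = merge_grids(lat, lon, chl)
scene_mean = mean(value for _, _, value in table)
```

Once the grids are flattened into one table, per-pixel values can be filtered, merged across scenes, or summarized without tracking array indices by hand.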

Performance Evaluation of Machine Learning Optimizers (기계학습 옵티마이저 성능 평가)

  • Joo, Gihun;Park, Chihyun;Im, Hyeonseung
    • Journal of IKEEE / v.24 no.3 / pp.766-776 / 2020
  • As interest in machine learning (ML) has grown and ML-based research has become more active, finding an optimal hyperparameter combination for various ML models has become increasingly important. In this paper, among the various hyperparameters, we focused on ML optimizers, and measured and compared the performance of major optimizers on various datasets. In particular, we compared nine optimizers, ranging from SGD, the most basic, to Momentum, NAG, AdaGrad, RMSProp, AdaDelta, Adam, AdaMax, and Nadam, using the MNIST, CIFAR-10, IRIS, TITANIC, and Boston Housing Price datasets. Experimental results showed that with Adam or Nadam the loss of various ML models decreased most rapidly and their F1 scores also increased. Meanwhile, AdaMax was notably unstable during training, and AdaDelta converged more slowly and performed worse than the other optimizers.
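
For readers unfamiliar with how these optimizers differ, the sketch below writes out the textbook update rules for three of the nine (SGD, Momentum, and Adam) on the toy objective f(x) = x². It is not the paper's experimental setup: the objective, learning rate, and step counts are assumptions chosen for illustration, and the sketch makes no claim about which optimizer converges fastest.

```python
import math

def grad(x):
    return 2.0 * x  # gradient of the toy objective f(x) = x**2

def sgd(x, lr=0.1, steps=50):
    for _ in range(steps):
        x -= lr * grad(x)  # step directly against the gradient
    return x

def momentum(x, lr=0.1, beta=0.9, steps=50):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x)  # accumulate a decaying sum of past gradients
        x -= lr * v
    return x

def adam(x, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=50):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g      # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g  # second-moment estimate
        m_hat = m / (1 - b1 ** t)      # bias correction for early steps
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

Note that Adam's first step has a size of roughly `lr` regardless of the gradient's magnitude, because the bias-corrected moment estimates normalize it; SGD's step scales with the gradient itself.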

An Exploratory Study on Improvement Method of the Subway Congestion Based Big Data Convergence (지하철 혼잡도 개선방안에 관한 빅데이터융합 기반의 탐색적 연구)

  • Kim, KeunWon;Kim, DongWoo;Noh, Kyoo-Sung;Lee, Joo-Yeoun
    • Journal of Digital Convergence / v.13 no.2 / pp.35-42 / 2015
  • As the value of big data has become widely recognized, public agencies, including the government, and the private sector have begun to take an interest in it. Because data sources are diverse and a variety of planning and analysis methods based on them have emerged, big data is expected to become a tool for creating new high-quality information and for decision making based on new insights. The purpose of this study is to find an alternative solution to the subway congestion problem, which has not improved despite various measures. We explored ways to relieve congestion on the Seoul Subway using Seoul Metropolitan public data. Finally, the study derived a policy alternative: establishing a new bus route that runs around metro stations with a high level of congestion.

A Study on Perception of Educational Big Data Utilization and Current State of Data Utilization of Officials of the Provincial Office of Education (교육청 공무원의 데이터 활용실태 및 교육 빅데이터 활용에 관한 인식 연구 - A도교육청을 중심으로)

  • Shin, Jong-Ho
    • Journal of Digital Convergence / v.18 no.9 / pp.39-47 / 2020
  • This study investigated the actual state of data utilization and the perception of big data utilization among officials of a provincial Office of Education, in order to derive implications for establishing big data utilization strategies. An online survey of 440 people was conducted. The results showed that the types and sources of data used for work varied, and that data collection and refining were the most difficult parts. Infrastructure for data utilization was insufficient and was also the most needed factor. The purpose of big data utilization was related to establishing the educational policy agenda.

Blockchain Technology for Healthcare Big Data Sharing (헬스케어 빅데이터 유통을 위한 블록체인기술 활성화 방안)

  • Yu, Hyeong Won;Lee, Eunsol;Kho, Wookyun;Han, Ho-seong;Han, Hyun Wook
    • The Journal of Bigdata / v.3 no.1 / pp.73-82 / 2018
  • At the core of future medicine is the realization of Precision Medicine centered on individuals. This requires an open ecosystem in which healthcare data can be viewed, managed, and distributed anytime, anywhere. However, since healthcare data contain sensitive personal information, a significant level of reliability and security is required at the same time. To solve this problem, the healthcare industry is paying attention to blockchain technology. Unlike the existing information and communication infrastructure, which stores and manages transaction information on a central server, blockchain technology is a distributed network in which data are distributed to and managed by all users participating in the network. In this study, we discuss the technical and legal aspects necessary for demonstrating healthcare data distribution using blockchain technology and introduce the KOREN SDI Network-based Healthcare Big Data Distribution Demonstration Study. In addition, we discuss policy strategies for activating blockchain technology in healthcare.
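
The reliability property that draws healthcare to blockchain can be illustrated with a minimal hash chain. This sketch covers only tamper evidence on a single machine; it leaves out the distribution across participants and the consensus step that the abstract describes, and the block fields and record strings are hypothetical rather than taken from the demonstration study.

```python
import hashlib
import json

def block_hash(index, record, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = {"index": index, "record": record, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_block(index, record, prev_hash):
    return {"index": index, "record": record, "prev_hash": prev_hash,
            "hash": block_hash(index, record, prev_hash)}

def is_valid(chain):
    """Check every stored hash and every link to the preceding block."""
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["record"], block["prev_hash"])
        if block["hash"] != expected:
            return False  # block contents were altered after hashing
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True

# Hypothetical records; real healthcare data would not be stored in plain text.
chain = [make_block(0, "genesis", "0" * 64)]
chain.append(make_block(1, "access granted: hypothetical-record-1", chain[-1]["hash"]))
```

Because each block's hash covers the previous block's hash, changing any earlier record invalidates the chain from that point on, which is what lets participants detect tampering without trusting a central server.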

Prediction Model of Real Estate Transaction Price with the LSTM Model based on AI and Bigdata

  • Lee, Jeong-hyun;Kim, Hoo-bin;Shim, Gyo-eon
    • International Journal of Advanced Culture Technology / v.10 no.1 / pp.274-283 / 2022
  • Korea is facing a number of difficulties arising from rising housing prices. As housing takes the lion's share of personal assets, many difficulties are expected to arise from fluctuating housing prices. The purpose of this study is to create a housing price prediction model to prevent such risks and encourage reasonable real estate purchases. The study made many attempts to understand real estate instability and to create an appropriate housing price prediction model. It predicted and validated housing prices using LSTM, a type of deep learning technique. An LSTM is a network that adds a cell state, which plays the role of a conveyor belt, to the hidden state of an existing RNN and computes the cell state and hidden state recursively. The real sale prices of apartments in autonomous districts from January 2006 to December 2019 were collected through the Ministry of Land, Infrastructure, and Transport's real sale price open system, and basic apartment and commercial district information was collected through the Public Data Portal and the Seoul Metropolitan City Data. The collected real sale price data were scaled based on the monthly average sale price, and a total of 168 data points were organized by preprocessing the data based on address. To predict prices, the LSTM was implemented with a training period of 29 months (April 2015 to August 2017), a validation period of 13 months (September 2017 to September 2018), and a test period of 13 months (December 2018 to December 2019), following the time series data set. The results were as follows. First, the study obtained a prediction similarity of 76 percent. We tried to design a prediction model of real estate transaction price with the LSTM Model based on AI and Bigdata. The final prediction model was created by collecting time series data, which showed that a model with 76 percent similarity can be built. This supports the reliability of predicting rates of return with the LSTM method.
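
The recursive calculation of cell state and hidden state described in this abstract can be written out for a single scalar unit. This is the standard LSTM update, not the authors' trained model: the weights are hypothetical constants and the price series is made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One recursive LSTM update for scalar input, hidden state, and cell state."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g  # cell state: the "conveyor belt" carried forward
    h = o * math.tanh(c)    # hidden state passed to the next time step
    return h, c

# Hypothetical weights (all 0.5) and a made-up scaled monthly price series.
names = ("wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg")
weights = {name: 0.5 for name in names}

h, c = 0.0, 0.0
for price in [0.91, 0.95, 1.02, 1.08]:
    h, c = lstm_step(price, h, c, weights)
```

The cell state is updated only by elementwise scaling and addition, which is why it is often compared to a conveyor belt: information can pass through many time steps with little change unless the gates intervene.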

Designing an Agricultural Data Sharing Platform for Digital Agriculture Data Utilization and Service Delivery (디지털 농업 데이터 활용 및 서비스 제공을 위한 농산업 데이터 공유 플랫폼 설계)

  • Seung-Jae Kim;Meong-Hun Lee;Jin-Gwang Koh
    • The Journal of Bigdata / v.8 no.1 / pp.1-10 / 2023
  • This paper presents the design process of an agricultural data sharing platform intended to address major challenges faced by the domestic agricultural industry. The platform was designed with a user interface that prioritizes user requirements for ease of use and offers various analysis techniques to provide growth prediction for field environment, growth, management, and control data. Additionally, the platform supports File to DB and DB to DB linkage methods to ensure seamless linkage between the platform and farmhouses. The UI design process utilized HTML/CSS-based languages, JavaScript, and React to provide a comprehensive user experience from platform login to data upload, analysis, and detailed inquiry visualization. The study is expected to contribute to the development of Korean smart farm models and provide reliable data sets to agricultural industry sites and researchers.
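
The two linkage methods named in this abstract, File to DB and DB to DB, can be sketched with the standard library. The abstract does not specify the platform's schema or database engine, so the table name, column names, sample CSV, and the use of in-memory SQLite are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)"

def file_to_db(csv_text, conn):
    """File to DB: parse CSV text and insert its rows into a database."""
    conn.execute(SCHEMA)
    rows = [(r["sensor"], float(r["value"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

def db_to_db(src, dst):
    """DB to DB: copy rows from a source database into a destination database."""
    dst.execute(SCHEMA)
    rows = src.execute("SELECT sensor, value FROM readings").fetchall()
    dst.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    dst.commit()
    return len(rows)

# Hypothetical farm-side and platform-side databases, both in memory.
farm_db = sqlite3.connect(":memory:")
platform_db = sqlite3.connect(":memory:")

loaded = file_to_db("sensor,value\ntemp,21.5\nhumidity,60\n", farm_db)
copied = db_to_db(farm_db, platform_db)
```

A File to DB path suits farms that only export files, while a DB to DB path suits farms that already run their own database; supporting both lowers the barrier to joining the platform.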

Design and Implementation of Bigdata Platform for Vessel Traffic Service (해상교통 관제 빅데이터 체계의 설계 및 구현)

  • Hye-Jin Kim;Jaeyong Oh
    • Journal of the Korean Society of Marine Environment & Safety / v.29 no.7 / pp.887-892 / 2023
  • Vessel traffic service (VTS) centers are equipped with RADAR, AIS (Automatic Identification System), weather sensors, and VHF (Very High Frequency) radio. VTS operators use this equipment to observe the movement of ships operating in the VTS area and to provide information. The VTS data generated by these devices are highly valuable for analyzing the maritime traffic situation. However, owing to a lack of compatibility between system manufacturers or to policy issues, the data are often not managed systematically. Therefore, we developed the VTS Bigdata Platform, which can efficiently collect, store, and manage control data collected by the VTS, and this paper describes its design and implementation. A microservice architecture was applied to secure operational stability, one of the important issues in developing the platform. In addition, the performance of the platform was improved by dualizing the storage for real-time navigation information. The implemented system was tested with real maritime data to check its performance, identify additional improvements, and assess its feasibility in a real VTS environment.
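
The abstract does not say how the storage for real-time navigation information was dualized. One possible reading is a split between a fast store holding only each ship's latest position and an append-only store holding the full history. The sketch below illustrates that reading only; the class, its methods, and the sample reports are hypothetical and are not the platform's actual design.

```python
class DualStore:
    """Keeps the latest report per ship separately from the full history."""

    def __init__(self):
        self.latest = {}   # ship_id -> most recent report, for real-time lookups
        self.history = []  # every report in arrival order, for later analysis

    def ingest(self, ship_id, timestamp, lat, lon):
        report = (ship_id, timestamp, lat, lon)
        self.history.append(report)
        current = self.latest.get(ship_id)
        # Only replace the live position if this report is not older.
        if current is None or timestamp >= current[1]:
            self.latest[ship_id] = report

    def position(self, ship_id):
        return self.latest.get(ship_id)

    def track(self, ship_id):
        return [r for r in self.history if r[0] == ship_id]

# Hypothetical AIS-style reports; the second one arrives out of order.
store = DualStore()
store.ingest("ship-A", 100, 35.10, 129.04)
store.ingest("ship-A", 90, 35.09, 129.03)
store.ingest("ship-B", 95, 34.90, 128.80)
```

Under this reading, operator displays query only the small latest-position store, while traffic analysis reads the history, so neither workload slows the other.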

Development of Big Data and AutoML Platforms for Smart Plants (스마트 플랜트를 위한 빅데이터 및 AutoML 플랫폼 개발)

  • Jin-Young Kang;Byeong-Seok Jeong
    • The Journal of Bigdata / v.8 no.2 / pp.83-95 / 2023
  • Big data analytics and AI play a critical role in the development of smart plants. This study presents a big data platform for plant data and an 'AutoML platform' for AI-based plant O&M (Operation and Maintenance). The big data platform collects, processes, and stores large volumes of data generated in plants using Hadoop, Spark, and Kafka. The AutoML platform is a machine learning automation system aimed at constructing predictive models for equipment prognostics and process optimization in plants. The developed platforms configure a data pipeline that takes into account compatibility with existing plant OISs (Operation Information Systems) and employ a web-based GUI to enhance both accessibility and convenience for users. They also provide functions for loading user-customizable modules into data processing and learning algorithms, which increases process flexibility. This paper demonstrates the operation of the platforms for a specific process of an oil company in Korea and presents an example of an effective data utilization platform for smart plants.
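
The idea of loading user-customizable modules into a processing pipeline can be sketched with a simple registry. The abstract does not describe the platform's actual module interface, so the decorator, the step names, and the sample data below are hypothetical.

```python
REGISTRY = {}

def register(name):
    """Decorator that adds a user-written processing step to the registry."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator

@register("drop_missing")
def drop_missing(values):
    return [v for v in values if v is not None]

@register("scale_to_peak")
def scale_to_peak(values):
    peak = max(values)
    return [v / peak for v in values]

def run_pipeline(values, step_names):
    """Apply registered steps in the order the user lists them."""
    for name in step_names:
        values = REGISTRY[name](values)
    return values

# Hypothetical sensor readings with one missing value.
result = run_pipeline([2.0, None, 4.0], ["drop_missing", "scale_to_peak"])
```

Because the pipeline looks steps up by name, users can add or reorder processing steps without changing the pipeline code itself.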

An Analysis of Library User and Circulation Status based on Bigdata Logs: A Case Study of National Library of Korea, Sejong (빅데이터 로그 기반 도서관 이용자 및 대출 현황 분석 - 국립세종도서관을 중심으로 -)

  • Kim, Tae-Young;Baek, Ji-Yeon;Oh, Hyo Jung
    • Journal of Korean Library and Information Science Society / v.49 no.2 / pp.357-388 / 2018
  • This study analyzes library user and circulation status based on big data logs in order to identify characteristics by user group and to propose methods for efficient library management. The logs analyzed consist of user information, circulation information, and service usage information registered at the National Library of Korea, Sejong. The user information logs contain 107,369 age records, 106,918 gender records, and 106,838 residence records. The circulation information logs contain 536,083 circulation user records and 6,509,369 circulation count records, and the service usage information logs contain 82,813 records. To analyze characteristics by user group, the data were used to examine user status by age, gender, and residence, and circulation status by year, month, and day. In addition, this study conducted an FGI (Focus Group Interview) and a linkage analysis with external data to identify the factors behind the analysis results. Based on the results, improvement methods were proposed to help the library make effective decisions. The study is significant in that it empirically analyzes user and circulation status based on big data logs, unlike preceding studies that used less analysis data.
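
The grouping step behind this kind of analysis, counting users and loans by user group, can be sketched as follows. The record layout, the decade-based age groups, and the sample data are assumptions made for illustration and do not reproduce the library's actual log format.

```python
from collections import Counter

def age_group(age):
    """Map an age to a decade label such as '30s'."""
    return f"{(age // 10) * 10}s"

def summarize(users, loans):
    """Count users and loans per age group, skipping loans by unknown users."""
    users_by_group = Counter(age_group(age) for age in users.values())
    loans_by_group = Counter()
    for user_id, _month in loans:
        if user_id in users:
            loans_by_group[age_group(users[user_id])] += 1
    return users_by_group, loans_by_group

# Hypothetical logs: user_id -> age, and (user_id, month) circulation events.
users = {"u1": 34, "u2": 38, "u3": 12}
loans = [("u1", "2017-03"), ("u1", "2017-04"), ("u3", "2017-03"), ("u9", "2017-05")]

users_by_group, loans_by_group = summarize(users, loans)
```

The same pattern extends to gender, residence, or month by swapping the grouping function, which is how a single set of logs can support several breakdowns.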